Neural Networks in Atmospheric Remote Sensing
William J. Blackwell, Frederick W. Chen
A neural network is a system of interconnected artificial neurons that mimic the properties of biological neurons to perform sophisticated, intelligent tasks. This authoritative reference offers a comprehensive understanding of the underpinnings and practical applications of artificial neural networks and their use in the retrieval of geophysical parameters. Professionals find expert guidance on the development and evaluation of neural network algorithms that process data from a new generation of hyperspectral sensors. Engineers discover how to use neural networks to approximate remote sensing inverse functions, with emphasis on model selection, preprocessing, initialization, training, and performance evaluation.
Year:
2009
Edition:
1
Publisher:
Artech House
Language:
English
Pages:
234 / 232
ISBN 10:
1596933720
ISBN 13:
9781596933729
Neural Networks in Atmospheric Remote Sensing

William J. Blackwell
Frederick W. Chen

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the U.S. Library of Congress.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

ISBN-13: 978-1-59693-372-9

Cover design by Yekaterina Ratner

© 2009 Massachusetts Institute of Technology
Lincoln Laboratory
244 Wood Street
Lexington, MA 02420

All rights reserved. This work was funded in part by the National Oceanic and Atmospheric Administration under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

Printed and bound in the United States of America. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Artech House cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

Disclaimer: This eBook does not include the ancillary media that was packaged with the original printed version of the book.
To our families

Contents

Preface

1 Introduction
  1.1 Present Challenges
  1.2 Solutions Based on Neural Networks
  1.3 Mathematical Notation
  References

2 Physical Background of Atmospheric Remote Sensing
  2.1 Overview of the Composition and Thermal Structure of the Earth’s Atmosphere
    2.1.1 Chemical Composition of the Atmosphere
    2.1.2 Vertical Distribution of Pressure and Density
    2.1.3 Thermal Structure of the Atmosphere
    2.1.4 Cloud Microphysics
  2.2 Electromagnetic Wave Propagation
    2.2.1 Maxwell’s Equations and the Wave Equation
    2.2.2 Polarization
    2.2.3 Reflection and Transmission at a Planar Boundary
  2.3 Absorption of Electromagnetic Waves by Atmospheric Gases
    2.3.1 Mechanisms of Molecular Absorption
    2.3.2 Line Shapes
    2.3.3 Absorption Coefficients and Transmission Functions
    2.3.4 The Atmospheric Absorption Spectra
  2.4 Scattering of Electromagnetic Waves by Atmospheric Particles
    2.4.1 Mie Scattering
    2.4.2 The Rayleigh Approximation
    2.4.3 Comparison of Scattering and Absorption by Hydrometeors
  2.5 Radiative Transfer in a Nonscattering Planar-Stratified Atmosphere
    2.5.1 Equilibrium Radiation: Planck and Kirchhoff’s Laws
    2.5.2 Radiative Transfer Due to Emission and Absorption
    2.5.3 Integral Form of the Radiative Transfer Equation
    2.5.4 Weighting Function
  2.6 Passive Spectrometer Systems
    2.6.1 Optical Spectrometers
    2.6.2 Microwave Spectrometers
  2.7 Summary
  References

3 An Overview of Inversion Problems in Atmospheric Remote Sensing
  3.1 Mathematical Notation
  3.2 Optimality
  3.3 Methods That Exploit Statistical Dependence
    3.3.1 The Bayesian Approach
    3.3.2 Linear and Nonlinear Regression Methods
  3.4 Physical Inversion Methods
    3.4.1 The Linear Case
    3.4.2 The Nonlinear Case
  3.5 Hybrid Inversion Methods
    3.5.1 Improved Retrieval Accuracy
    3.5.2 Improved Retrieval Efficiency
  3.6 Error Analysis
    3.6.1 Analytical Analysis
    3.6.2 Perturbation Analysis
  3.7 Summary
  References

4 Signal Processing and Data Representation
  4.1 Analysis of the Information Content of Hyperspectral Data
    4.1.1 Shannon Information Content
    4.1.2 Degrees of Freedom
  4.2 Principal Components Analysis (PCA)
    4.2.1 Nonlinear PCA
    4.2.2 Linear PCA
    4.2.3 Principal Components Transforms
    4.2.4 The Projected PC Transform
    4.2.5 Evaluation of Radiance Compression Performance Using Two Different Metrics
  4.3 Representation of Nonlinear Features
  4.4 Summary
  References

5 Introduction to Multilayer Perceptron Neural Networks
  5.1 A Brief Overview of Machine Learning
    5.1.1 Supervised and Unsupervised Learning
    5.1.2 Classification and Regression
    5.1.3 Kernel Methods
    5.1.4 Support Vector Machines
    5.1.5 Feedforward Neural Networks
  5.2 Feedforward Multilayer Perceptron Neural Networks
    5.2.1 Network Topology
    5.2.2 Network Training
  5.3 Simple Examples
    5.3.1 Single-Input Networks
    5.3.2 Two-Input Networks
  5.4 Summary
  5.5 Exercises
  References

6 A Practical Guide to Neural Network Training
  6.1 Data Set Assembly and Organization
    6.1.1 Data Set Integrity
    6.1.2 The Importance of an Extensive and Comprehensive Data Set
    6.1.3 Data Set Partitioning
  6.2 Model Selection
    6.2.1 Number of Inputs
    6.2.2 Number of Hidden Layers and Nodes
    6.2.3 Adaptive Model Building Techniques
  6.3 Network Initialization
  6.4 Network Training
    6.4.1 Calculation of the Error Gradient Using Backpropagation
    6.4.2 First-Order Optimization: Gradient Descent
    6.4.3 Second-Order Optimization: Levenberg-Marquardt
  6.5 Underfitting and Overfitting
  6.6 Regularization Techniques
    6.6.1 Treatment of Noisy Data
    6.6.2 Weight Decay
  6.7 Performance Evaluation
  6.8 Summary
  References

7 Pre- and Post-Processing of Atmospheric Data
  7.1 Mathematical Overview
  7.2 Data Compression
  7.3 Filtering of Interfering Signals
    7.3.1 The Wiener Filter
    7.3.2 Stochastic Cloud Clearing
  7.4 Data Warping
    7.4.1 Function of Time of Day
    7.4.2 Function of Geolocation
    7.4.3 Function of Time of Year
  7.5 Summary
  References

8 Neural Network Jacobian Analysis
  8.1 Calculation of the Neural Network Jacobian
  8.2 Neural Network Error Analysis Using the Jacobian
    8.2.1 The Network Weight Jacobian
    8.2.2 The Network Input Jacobian
    8.2.3 Use of the Jacobian to Assess Noise Contribution
  8.3 Retrieval System Optimization Using the Jacobian
    8.3.1 Noise Smoothing Versus Atmospheric Smoothing
    8.3.2 Optimization Approach
    8.3.3 Optimization Results
  8.4 Summary
  References

9 Neural Network Retrieval of Precipitation from Passive Microwave Observations
  9.1 Structure of the Algorithm
    9.1.1 Physical Basis of Preprocessing
    9.1.2 Physical Basis of Post-Processing
  9.2 Signal Processing Components
    9.2.1 Limb-and-Surface Corrections
    9.2.2 Precipitation Detection
    9.2.3 Cloud Clearing by Regional Laplacian Interpolation
    9.2.4 Temperature-Profile and Water-Vapor-Profile Principal Components
    9.2.5 Image Sharpening
  9.3 Development of the Algorithm
  9.4 Retrieval Performance Evaluation
    9.4.1 Image Comparisons of NEXRAD and AMSU/HSB
    9.4.2 Numerical Comparisons of NEXRAD and AMSU/HSB Retrievals
    9.4.3 Global Retrievals of Rain and Snow
  9.5 Summary
  References

10 Neural Network Retrieval of Atmospheric Profiles from Microwave and Hyperspectral Infrared Observations
  10.1 The PPC/NN Algorithm
    10.1.1 Network Topology
    10.1.2 Network Training
  10.2 Retrieval Performance Comparisons with Simulated Clear-Air AIRS Radiances
    10.2.1 Simulation of AIRS Radiances
    10.2.2 An Iterated Minimum-Variance Technique for the Retrieval of Atmospheric Profiles
    10.2.3 Retrieval Performance Comparisons
    10.2.4 Discussion
  10.3 Validation of the PPC/NN Algorithm with AIRS/AMSU Observations of Partially Cloudy Scenes over Land and Ocean
    10.3.1 Cloud Clearing of AIRS Radiances
    10.3.2 AIRS/AMSU/ECMWF Data Set
    10.3.3 AIRS/AMSU Channel Selection
    10.3.4 PPC/NN Retrieval Enhancements for Variable Sensor Scan Angle and Surface Pressure
    10.3.5 Retrieval Performance
    10.3.6 Retrieval Performance Sensitivity Analyses
    10.3.7 Discussion and Future Work
  10.4 Summary and Conclusions
  References

11 Discussion of Future Work
  11.1 Bayesian Approaches for Neural Network Training and Error Characterization
  11.2 Soft Computing: Neuro-Fuzzy Systems
  11.3 Nonstationarity Considerations: Neural Network Applications for Climate Studies
  References

About the Authors

Index

Preface

This book is intended to provide a practical, applications-oriented treatment of neural network methodologies for use in atmospheric remote sensing. We focus on the retrieval of atmospheric parameters, such as the Earth’s temperature and water vapor profiles and precipitation rate, but the techniques described can be applied to a wide variety of problems where function approximation is required. We use simple, largely theoretical examples to provide the reader with intuition on how performance is affected by basic neural network attributes such as model selection, initialization, and training methodology, and we then build these simple techniques into larger, “real-world” applications that are common throughout the field of atmospheric remote sensing. Many of the examples are accompanied by MATLAB™ (www.mathworks.com) software codes (available on the accompanying CD-ROM in the back of the book) that can be used as building blocks for larger and more complex problems.
These codes were written using the freely available NETLAB Neural Network package and do not require any MATLAB™ “add-on” toolboxes.

The book is organized as follows. Chapter 1 presents an overview of the issues and challenges related to the retrieval of atmospheric parameters from remote measurements of atmospheric emission and scattering. Chapter 2 provides a summary of many of the relevant physical processes at the foundation of atmospheric remote sensing, including atmospheric composition, wave propagation, atmospheric absorption and scattering, radiative transfer, and spectrometer systems. Chapter 3 then follows with an overview of mathematical inversion methods commonly used in retrievals of atmospheric parameters from radiometric data, including iterative minimum variance approaches, regression, and Bayesian techniques. Constraints imposed on solutions through the use of regularization techniques are also discussed. Chapter 4 presents theoretical background on many of the signal processing techniques commonly used in conjunction with neural network processing, including principal components analysis (PCA), Wiener filtering, periodic data representation, and blind estimation. Chapter 5 introduces multilayer perceptron neural networks and provides a general description of network topology and optimization in the broad context of machine learning. Chapter 6 presents detailed methodologies for network model selection, initialization, and training. Guidance is also provided for the use of these methodologies in practice, and common pitfalls are discussed. Chapter 7 discusses pre- and post-processing techniques that can be used to improve the effectiveness of the concomitant retrieval operators by reducing the volume of data that must be processed, and/or removing unwanted, interfering signals, such as noise, clouds, or surface variability.
Chapter 8 provides practical guidance on the evaluation of network performance and discusses the important concepts of under- and overfitting, early stopping and weight decay, and network stability. The book culminates with two examples of complete neural network retrieval systems used to retrieve atmospheric parameters from passive spaceborne spectrometers. Chapter 9 presents an example of a highly nonlinear problem, retrieval of precipitation rate from passive microwave measurements, and Chapter 10 presents an example of a high-dimensional problem, retrieval of temperature and moisture profiles from combined microwave and infrared measurements. A discussion of possible future work is given in Chapter 11.

A great number of people have contributed to the work presented in this book. George Aumann, Chris Barnet, Mous Chahine, Mitch Goldberg, Tom Pagano, Bill Smith, and Joel Susskind of the AIRS Science Team have provided helpful suggestions and constructive criticisms that have helped to shape the course of much of our neural network retrieval research involving the AIRS products. Larrabee Strow and Scott Hannon provided the AIRS transmittance codes and guidance on their use. Many in the broad NPOESS community have provided valuable feedback and support, including Karen St. Germain, Degui Gu, Xu Liu, Steve Mango, and Dan Zhou. We would like to thank Laura Bickmeier, Chuck Cho, Monica Coakley, Harry Finkle, Chris Gittins, Laura Jairam, John Kerekes, Dan Mooney, Mike Pieper, Phil Rosenkranz, Chinnawat Surussavadee, and Kevin Wong for many helpful discussions. We are grateful to Dave Weitz, Vince Leslie, and Dimitris Manolakis for thoughtful comments on the manuscript and to Seth Hall for computer support. We thank Greg Berthiaume, Hsiao-hua Burke, and Roger Sudbury for their support and encouragement over the course of this project.
A special debt of gratitude is expressed to Dave Staelin for the guidance and support he has provided on all facets of this work, including preparation of this book. His thoughtful comments have added substantially to the clarity and completeness of the presentation. William J. Blackwell would especially like to thank Megan for putting up with his preoccupation with this project over more than a few nights and weekends and for offering many fresh and insightful perspectives.

1 Introduction

Measurements of the state and composition of the Earth’s surface and atmosphere have been made using passive microwave and infrared sensors for over 50 years [1]. Applications of these remote measurements are numerous and span fields including meteorology, oceanography, geology, and ecology. For example, satellite measurements of atmospheric temperature are used to improve weather forecasting models, analyze climate change, and study the radiation budget of Earth [2].

Recent advances in airborne and spaceborne sounding platforms have made atmospheric measurements possible on a global scale, and advances in sensor technologies have pushed the limits of achievable spatial and temporal resolution to unprecedented levels. These performance improvements, however, are not without concomitant data processing difficulties. The vast amount of data generated by present and next-generation sounding systems must be transmitted and processed in a timely manner (usually near real time), which requires processing algorithms that are both computationally efficient and robust to sensor and atmospheric anomalies (an erupting volcano, for example).
1.1 Present Challenges

A principal complication in the retrieval of geophysical parameters such as the global three-dimensional atmospheric temperature and moisture profile from satellite radiance observations is the nonlinear, non-Gaussian, and ill-posed physical and mathematical relationship between the radiance observed by a remote sensing instrument and the desired retrieved quantity. Great strides have recently been made to improve and better characterize the models that are used to capture these relationships, but these models are seldom invertible by direct means, usually due to the complex nature of the underlying physics of the relevant geophysical processes. Common inversion approaches involve iterated numerical optimization methodologies that minimize a cost function subject to constraints imposed by a set of regularization parameters constructed so that the optimization tends toward solutions that are more “statistically probable” and/or “physically realistic.” These regularization parameters are often largely subjective, and the construction of effective retrieval algorithms therefore requires a substantial component of “black art” to balance the use of the information content in the measured upwelling atmospheric radiances with the plausibility of the retrieval.

A logistical drawback to iterated, model-based inversion techniques is the computational burden required to carry out the numerical optimizations. Modern thermal infrared sensors measure spectral radiances in tens of thousands of separate wavebands (sometimes termed “hyperspectral” or even “ultraspectral”) for each observed pixel. The computational complexity of the optimization routines typically scales as the square (or cube) of the number of channels, and it is rare that all of the information available in the radiance spectrum is used.
The widespread presence of clouds further degrades performance, and therefore a separate preprocessing stage is often employed prior to (or in concert with) numerical inversion to correct the substantial radiance errors that can be introduced due to the high opacity of cloud formations in the infrared wavelengths.

1.2 Solutions Based on Neural Networks

An alternative to the numerical inversion approach described above is statistical regression (parameterized function approximation), where an ensemble of input/output pairs is used to empirically derive statistical relationships between the inputs and the outputs. In the case of linear regression, second-order statistical moments (covariances) are used to compute a linear fit that minimizes the sum-squared error between the fit and the data. A linear representation is seldom sufficient to fully characterize the complex statistical relationships endemic in atmospheric data, and nonlinear regression techniques must be used. Artificial neural networks are a special class of nonlinear regression operators; the mathematical structure of a neural network is chosen to afford several desirable properties, including scalability and differentiability. Patterned after the human nervous system, an artificial neural network (hereafter, simply a neural net) consists of interconnected neurons, or nodes, that implement a simple, nonlinear function of the inputs. Usually, the inputs are linearly weighted (the weights modulate each input and the biases provide an offset) and passed through an activation function (often nonlinear). The power of neural networks, both from the standpoint of their capabilities and the derivation of their free parameters, stems from the parallel structure of the computational elements. In this book, we primarily consider feedforward connections of layers of nodes with sigmoidal (soft-limit) activation functions.
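A minimal sketch of the node computation described above may help fix ideas. It assumes a single hidden layer of tanh (sigmoidal) nodes followed by a linear output layer. The function names, layer sizes, weights, and biases are hypothetical values chosen only for illustration; they are not taken from the book, whose accompanying codes are written in MATLAB with NETLAB.

```python
import math


def node(inputs, weights, bias, activation=math.tanh):
    """One node: linearly weight the inputs, add the bias, apply the activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)


def feedforward(inputs, hidden_weights, hidden_biases, output_weights, output_biases):
    """Single hidden layer of tanh nodes followed by a linear output layer."""
    hidden = [node(inputs, w, b) for w, b in zip(hidden_weights, hidden_biases)]
    # The output nodes use the identity activation, i.e. a plain weighted sum.
    return [node(hidden, w, b, activation=lambda a: a)
            for w, b in zip(output_weights, output_biases)]


# Hypothetical parameters for a 2-input, 3-hidden-node, 1-output network.
hidden_weights = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
hidden_biases = [0.1, -0.2, 0.0]
output_weights = [[1.0, -2.0, 0.5]]
output_biases = [0.25]

estimate = feedforward([0.2, -0.4], hidden_weights, hidden_biases,
                       output_weights, output_biases)
print(estimate)
```

Once the weights and biases are fixed, evaluating the network is a handful of multiply-adds and activation evaluations per node, and every operation in the chain is differentiable.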
Many other variations can be used, but the feedforward variety is most common and the techniques described here are readily applied to other topologies.

The neural network approach offers several substantial advantages over iterated, model-based inversion methodologies. Once the weights and biases are derived (during the training process), the network operates very quickly and can be easily implemented in software. This simplicity and speed greatly facilitate the development and maintenance, and therefore reduce the cost, of complex geophysical retrieval systems that process high volumes of hyperspectral data. The trained neural networks are continuous and differentiable, which simplifies error propagation and therefore performance sensitivity analyses. Finally, neural networks can approximate functions with arbitrarily high degrees of nonlinearity with a sufficient number of nodes and layers. These advantages have spurred the recent use of neural network estimation algorithms for geophysical parameter retrievals [3–5]. Methods based on neural networks for data classification have also become commonplace, although we will focus on regression in this book. Many of the tips and techniques discussed, however, are directly applicable to both types of problems.

1.3 Mathematical Notation

One of the primary goals of this book is to cohesively unite the fields of statistics and estimation, mathematical inversion, machine learning, and radiative transfer in the context of the atmospheric retrieval problem. A principal challenge in this endeavor is to reconcile the often disparate sets of mathematical notation used in the literature for each field. For example, Rodgers’s classic treatment of the retrieval of atmospheric state variables [6] denotes the state vector to be retrieved (i.e., the output of the retrieval algorithm) as “x,” whereas the statistical and machine learning literature almost always reserves x to denote the input of the algorithm.
Our objective has therefore been to develop a notational convention that maximizes the commonality of the notations of the various communities. To minimize confusion and ambiguity, we have also tried to choose mnemonic notations, where possible. The conventions and variable notations we have adopted are shown in Tables 1.1 and 1.2, respectively.

Table 1.1 Mathematical Notation

Variable and Operator Types                 Notation
Scalars and functions with scalar output    Lowercase letters
Vectors                                     Uppercase letters
Matrices                                    Boldface uppercase letters
Vector-valued functions                     Boldface lowercase letters
Expected value                              $E(\cdot)$
Transpose                                   $(\cdot)^T$
Noisy variable                              $\widetilde{(\cdot)}$
Estimate                                    $\widehat{(\cdot)}$

Table 1.2 Variable Names

Variable Name                                   Notation
Radiance measurement vector (retrieval input)   R
Atmospheric state vector (retrieval output)     S
Neural network target (or truth) vector         T
Generic input vector                            X
Generic output vector                           Y
Noise vector                                    Ψ
Weight matrix                                   W
Covariance matrix                               C
Noise covariance matrix                         C_ΨΨ
Error covariance matrix                         C [subscript missing]
Kernel function                                 k(X, X′)
Feature map                                     Φ(X)

References

[1] D. H. Staelin. “Passive remote sensing at microwave wavelengths.” Proceedings of the IEEE, 57(4):427–439, April 1969.
[2] G. L. Stephens. Remote Sensing of the Lower Atmosphere. Oxford University Press, New York, 1994.
[3] P. M. Atkinson and A. R. L. Tatnall. “Introduction to neural networks in remote sensing.” Int. J. Remote Sensing, 18(4):699–709, 1997.
[4] V. M. Krasnopolsky and F. Chevallier. “Some neural network applications in environmental sciences. Part I: Forward and inverse problems in geophysical remote measurements.” Neural Netw., 16(3–4):321–334, 2003.
[5] V. M. Krasnopolsky and F. Chevallier. “Some neural network applications in environmental sciences. Part II: Advancing computational efficiency of environmental numerical models.” Neural Netw., 16(3–4):335–348, 2003.
[6] C. D. Rodgers. “Retrieval of atmospheric temperature and composition from remote measurements of thermal radiation.” J. Geophys. Res., 41(7):609–624, July 1976.

2 Physical Background of Atmospheric Remote Sensing

We begin with a broad overview of relevant physical issues in passive atmospheric sounding to provide background and context to results developed later in the book. For additional details, the reader is referred to excellent references on tropospheric remote sensing [1], atmospheric science [2, 3], atmospheric radiation [4], and electromagnetic wave propagation [5].

2.1 Overview of the Composition and Thermal Structure of the Earth’s Atmosphere

The Earth’s atmosphere extends over 100 km from its surface, and can roughly be categorized into four layers based on the thermal and chemical phenomena that occur within each layer. These layers are (in increasing altitude) the troposphere, the stratosphere, the mesosphere, and the thermosphere. The boundaries between adjacent layers are usually not well defined, but they do show characteristic features; they are the tropopause, stratopause, and mesopause, respectively.

The troposphere extends from the surface to an altitude of approximately 12 km (as low as 7 km near the poles and as high as 17 km near the equator) and is characterized by a steady decrease in temperature with altitude. Approximately 80% of the total mass of the atmosphere is contained in the troposphere, and almost all of the Earth’s weather is created there. The troposphere is therefore the focus of most atmospheric sounding research, including the examples presented in this book. The tropopause marks the region of the atmosphere where the temperature gradually changes from decreasing with altitude to increasing with altitude, and forms a somewhat nebulous boundary layer between the troposphere and the stratosphere.
The stratosphere extends to an altitude of approximately 40 km, and is characterized by relatively high concentrations of ozone (a few parts per million). A sharp increase in temperature with altitude occurs in the stratosphere due to the absorption of ultraviolet radiation by ozone. High cirrus clouds sometimes form in the lower stratosphere, but for the most part there are no significant weather patterns in this layer, and horizontal and vertical atmospheric variability is much smaller than in the troposphere.

The mesosphere extends from approximately 40 to 80 km, and is characterized by a decreasing temperature with altitude. Extremely low temperatures (∼ −150 °C) present at the top of the mesosphere sometimes allow the presence of noctilucent clouds, thought to be made of ice crystals that have formed on dust particles.

The transition from the mesosphere to the thermosphere begins at an altitude of approximately 80 km. The thermosphere is characterized by warmer temperatures caused by the absorption of the sun’s shortwave ultraviolet radiation. This radiation penetrates the upper atmosphere and causes the atmospheric particles to become positively charged. These ionized particles build up to form a series of layers, often referred to as the ionosphere.

2.1.1 Chemical Composition of the Atmosphere

The Earth’s atmosphere is composed of a variety of gases. Each gas interacts characteristically with electromagnetic radiation of a given frequency. This relationship forms the physical basis by which the atmospheric temperature can be measured by observing radiation of different frequencies that has been emitted by and transmitted through the atmosphere. The average fractional volumes of various species in the Earth’s atmosphere are given in Table 2.1. Perhaps the most important gases in the atmosphere, from the point of view of their interaction with electromagnetic radiation, are water vapor, oxygen, carbon dioxide, and ozone.
Oxygen and carbon dioxide are well mixed in the atmosphere below approximately 100 km, and therefore frequencies near the resonances of these molecules are desirable for temperature sounding. The vertical distribution of ozone reaches maximum concentration near 25 km. Above 30 km, ozone is rapidly formed by photochemical reactions from oxygen so that an equilibrium is established during the daylight hours. Below this level, ozone is created more slowly and is highly variable [1]. Water vapor is perhaps the most influential atmospheric gas from the perspective of weather and climate processes. This is primarily due to its high temporal and spatial variability in the lower troposphere and its large role in energy transfer.

Table 2.1 Composition of the Earth’s Atmosphere (Source: [1])

Molecule            Volume Fraction†   Comments
N₂                  0.7808             Photochemical dissociation high in the ionosphere; mixed at lower levels
O₂                  0.2095             Photochemical dissociation above 95 km; mixed at lower levels
H₂O                 < 0.04             Highly variable; photodissociates above 80 km
Ar                  9.34 × 10⁻³        Mixed up to 110 km; diffusive separation above
CO₂                 3.45 × 10⁻⁴        Slightly variable; mixed up to 100 km; dissociated above
CH₄                 1.6 × 10⁻⁶         Mixed in troposphere; dissociated in mesosphere
N₂O                 3.5 × 10⁻⁷         Slightly variable at surface; dissociated in stratosphere and mesosphere
CO                  7 × 10⁻⁸           Variable photochemical and combustion product
O₃                  ∼ 10⁻⁸             Highly variable; photochemical origin
CFCl₃ and CF₂Cl₂    1–2 × 10⁻¹⁰        Industrial origin; mixed in troposphere, dissociated in stratosphere

† Fraction of lower tropospheric air.

2.1.2 Vertical Distribution of Pressure and Density

The pressure and density of the Earth’s atmosphere can vary substantially in the vertical dimension. It is therefore helpful to define a reference or “standard”¹ atmosphere that is a representation of the atmosphere as a function of height only.
Below an altitude of 100 km, the atmospheric pressure and density are almost always within ±30% of that of the standard atmosphere [1].

Atmospheric density decreases with altitude due to the Earth’s gravitational field. If a condition of static equilibrium is assumed, the relationship between density and pressure as a function of altitude may be expressed by the following differential equation:

$$\frac{dp}{dz} = -g\rho \tag{2.1}$$

where p and ρ are the pressure and density at altitude z measured vertically upward from the surface. The change in gravitational force with altitude is small enough over the relatively short extent of the atmosphere to be ignored. The ideal gas equation pV = nRT can be used to relate the density of an ideal gas of molecular weight M_r to its temperature and pressure:

$$\rho = \frac{M_r p}{RT} \tag{2.2}$$

where R is the gas constant per mole, and T is the temperature (K). Equation (2.1) can then be expressed as

$$\frac{dp}{p} = -\frac{dz}{H} \tag{2.3}$$

which can be integrated to find the pressure p at altitude z:

$$p = p_0 \exp\left(-\int_0^z \frac{dz}{H}\right) \tag{2.4}$$

where p_0 is the surface pressure and H = RT/(M_r g) is known as the scale height. The scale height is the increase in altitude necessary to reduce the pressure by a factor of e. In the troposphere, H typically varies from ∼6 km at T = 210 K to ∼8.5 km at T = 290 K [2].

¹ The Earth’s atmosphere does vary substantially in the horizontal and over time, on global and seasonal scales. A wide variety of “standard” atmospheres have been tabulated for various geographical regions and seasonal periods [3].

2.1.3 Thermal Structure of the Atmosphere

The macroscopic thermal features of the atmosphere were outlined previously. We now examine features that occur on a finer vertical scale, with a focus on the lower troposphere. The bottom 1–2 km of the atmosphere exhibits the greatest thermal variability due to strong surface interactions and diurnal variations.
At some latitudes, temperature inversions exist in the lowest 2–3 km of the atmosphere. Above 3 km, there is a systematic decrease of temperature with altitude that can be characterized by an adiabatic lapse rate, as follows. Continuing the assumption of the previous section that the atmosphere is in hydrostatic equilibrium, the first law of thermodynamics can be applied to a unit “parcel” of atmospheric mass:

$$ dq = c_v\,dT + p\,dV \tag{2.5} $$

where $c_v$ is the specific heat at constant volume. Provided no heat enters or leaves the parcel (i.e., the process is adiabatic), the quantity $dq$ is zero. Equation (2.5) can then be combined with the differential form of the ideal gas law and the hydrostatic relation (2.1) to yield:

$$ \frac{dT}{dz} = -\frac{g}{c_p} = -\Gamma \tag{2.6} $$

where $c_p$ is the specific heat at constant pressure and $\Gamma$ is the lapse rate. Equation (2.6) shows that the rate of change of temperature with altitude is constant, for constant $c_p$ and $g$. Under typical tropospheric conditions, $c_p$ varies slightly with altitude, and the dry adiabatic lapse rate in the troposphere is approximately 10 K/km. If the latent heat released by the condensation of rising moist air is considered, the average lapse rate is approximately 6.5 K/km.

2.1.4 Cloud Microphysics

Clouds affect the energy balance of the atmosphere through two mechanisms: (1) water cycle changes, including the release of latent heat through condensation and the removal of liquid water through precipitation, and (2) radiation budget changes, including the scattering, absorption, and emission of solar and terrestrial radiation. In Chapter 9, the microphysical properties of clouds (taken here to mean the size and shape of the particles and their volume concentration) will be used to characterize regions of precipitation by examining their interaction with microwave radiation. It is therefore useful to review several of the salient details of the microphysical structure of clouds and precipitation.
The microphysical properties of clouds depend highly on the size, shape, and phase of the water particles. Water droplets are typically smaller than 100 µm and are spherical [1]. The distribution of water droplet concentration (the number of droplets per volume existing in a differential radius range $dr$) is reasonably approximated by analytic functions. A modified Gamma distribution is often used for this purpose. Table 2.2 gives average values of the number of particles ($N_0$), mean droplet radius ($r_m$), and cloud liquid water density ($l$) for a variety of clouds.

Table 2.2 Representative Drop Size Concentrations ($N_0$), Mean Particle Radius ($r_m$), and Liquid Water Content ($l$) for Several Cloud Types (Source: [6])

Cloud Type | N₀ (cm⁻³) | r_m (µm) | l (g m⁻³)
Stratus (ocean) | 50 | 10 | 0.1–0.5
Stratus (land) | 300–400 | 6 | 0.1–0.5
Fair-weather cumulus | 300–500 | 4 | 0.3
Maritime cumulus | 50 | 15 | 0.5
Cumulonimbus | 70 | 20 | 2.5
Cumulus congestus | 60 | 24 | 2.0
Altostratus | 200–400 | 5 | 0.6

Raindrops are generally nonspherical, resembling oblate spheroids with an aspect ratio (width-to-length ratio) that decreases as the drop size increases. One analytic function that is commonly used to relate raindrop size distributions to rainrate is the Marshall-Palmer distribution [7]. Ice crystals form in a wide variety of sizes and shapes. In addition to simple polyhedron forms, irregular crystals or combinations of simple shapes readily appear in nature.

2.2 Electromagnetic Wave Propagation

The thermal and compositional state of the atmosphere affects both the generation and propagation of electromagnetic (EM) waves. For now, we ignore the source of the EM waves and focus instead on their propagation through a homogeneous, lossless medium.
2.2.1 Maxwell’s Equations and the Wave Equation

In a source-free, homogeneous, and isotropic medium with permittivity $\epsilon$ and permeability $\mu$, the spatial and temporal variations of the electric and magnetic fields are related according to Maxwell’s equations:

$$ \nabla \times \vec{E} = -\mu \frac{\partial \vec{H}}{\partial t} \tag{2.7} $$

$$ \nabla \times \vec{H} = \epsilon \frac{\partial \vec{E}}{\partial t} \tag{2.8} $$

$$ \nabla \cdot \vec{E} = 0 \tag{2.9} $$

$$ \nabla \cdot \vec{H} = 0 \tag{2.10} $$

A wave equation can be derived by taking the curl of (2.7) and substituting (2.8). After using the vector identity $\nabla \times (\nabla \times \vec{E}) = \nabla(\nabla \cdot \vec{E}) - \nabla^2 \vec{E}$ and (2.9), we find:

$$ \nabla^2 \vec{E} = \mu\epsilon \frac{\partial^2 \vec{E}}{\partial t^2} \tag{2.11} $$

where the Laplacian operator $\nabla^2$ in a rectangular coordinate system is

$$ \nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} $$

The wave equation (2.11) is a second-order partial differential equation of space and time coordinates $x$, $y$, $z$, and $t$. A simple solution to the wave equation is

$$ \vec{E}(\vec{r}, t) = \vec{E}_0 \cos(2\pi\nu t \pm \vec{k} \cdot \vec{r}) \tag{2.12} $$

where $\vec{k} = \hat{x}k_x + \hat{y}k_y + \hat{z}k_z$ and $\vec{r} = \hat{x}x + \hat{y}y + \hat{z}z$. Equation (2.12) represents two waves propagating in opposite directions along $\vec{k}$ with temporal phase $2\pi\nu t$ and spatial phase $\vec{k} \cdot \vec{r}$. A spectrum of values of frequency ($\nu$) found in atmospheric remote sensing systems is shown in Figure 2.1.

2.2.2 Polarization

The electric field vector of a uniform plane wave traveling in the $+z$ direction must lie in the $xy$-plane perpendicular to the $z$-axis. As time progresses, the tip of the electric field vector traces a curve in the $xy$-plane. It is the shape of this curve (linear, circular, or elliptical) that determines the polarization of the plane wave. If the curve is circular or elliptical, the tip may move in either a clockwise or counterclockwise direction. The interaction of electromagnetic waves with matter often depends (sometimes entirely) on the polarization state of the wave. Some remote sensing systems exploit polarization dependence to extract information about a polarized target.
For example, polarimetric microwave measurements of the ocean surface reveal information about sea surface wind speed and direction due to the polarimetric signature of the resulting ocean waves.

Figure 2.1 The electromagnetic spectrum. The diagram shows those parts of the electromagnetic spectrum that are important in remote sensing, together with the conventional names of the various regions of the spectrum. Also shown are wavelength regions of sensors mentioned in this book: AIRS, AMSU, IASI, CrIS, and ATMS. (After: [1].)

2.2.2.1 Stokes Parameters

A mathematical discussion of polarization can be facilitated by decomposing the $\vec{E}$ vector into two components perpendicular to the direction of propagation, $\vec{k}$, for a fixed point in space:

$$ \vec{E} = \hat{h}E_h + \hat{v}E_v = \hat{h}\,e_h \cos(2\pi\nu t - \varphi_h) + \hat{v}\,e_v \cos(2\pi\nu t - \varphi_v) \tag{2.13} $$

where $\vec{k}$, $\hat{h}$, and $\hat{v}$ form a mutually orthogonal system. The four Stokes parameters may then be defined as follows:

$$ I = \frac{1}{\eta}\left(e_h^2 + e_v^2\right) \tag{2.14} $$

$$ Q = \frac{1}{\eta}\left(e_h^2 - e_v^2\right) \tag{2.15} $$

$$ U = \frac{2}{\eta}\,e_h e_v \cos(\varphi) \tag{2.16} $$

$$ V = \frac{2}{\eta}\,e_h e_v \sin(\varphi) \tag{2.17} $$

where $\varphi$ is the phase difference $\varphi_h - \varphi_v$ and $\eta = \sqrt{\mu/\epsilon}$ is the characteristic impedance. In practice, it is often easier to measure the Stokes parameters than to measure $e_h$, $e_v$, and $\varphi$ directly. For a completely polarized wave, the four Stokes parameters are related as follows:

$$ I^2 = Q^2 + U^2 + V^2 \tag{2.18} $$

2.2.3 Reflection and Transmission at a Planar Boundary

Electromagnetic radiation from the sun or the cosmic background can reflect off cloud tops and the surface of the Earth.
The characterization of the transmitted and reflected components of radiation is necessary to develop cloud and surface models. Consider a linearly polarized plane wave propagating in free space along direction $\vec{k}_i$ that is incident upon a planar dielectric material with index of refraction $n = c\sqrt{\mu\epsilon}$ at an incidence angle of $\theta_i$. The electric fields for the incident, reflected, and transmitted waves can be expressed as (assuming a TE wave propagating in the $xz$-plane):

$$ \begin{aligned} \vec{E}_i &= \hat{y}\,E_i\,e^{-j\vec{k}_i \cdot \vec{r}} \\ \vec{E}_r &= \hat{y}\,\Gamma E_i\,e^{-j\vec{k}_r \cdot \vec{r}} \\ \vec{E}_t &= \hat{y}\,T E_i\,e^{-j\vec{k}_t \cdot \vec{r}} \end{aligned} \tag{2.19} $$

where $\Gamma$ and $T$ are the complex reflection and transmission coefficients, respectively. The tangential component of the net electric field must be continuous across the boundary at every point, which requires the tangential components of all three $\vec{k}$ vectors to be equal along the boundary. The tangential components of the $\vec{k}$ vectors can be expressed in terms of the angles of incidence, reflection, and transmission to yield

$$ k_i \sin\theta_i = k_r \sin\theta_r = k_t \sin\theta_t \tag{2.20} $$

where $k_i = k_r = \omega\sqrt{\mu_i \epsilon_i}$ is the magnitude of the propagation vectors $\vec{k}_i$ and $\vec{k}_r$. The magnitude of the transmitted wave vector is $k_t = \omega\sqrt{\mu_t \epsilon_t}$, which is in general not equal to $k_i$. Substitution into (2.20) gives the reflection law and Snell’s law

$$ \theta_r = \theta_i \tag{2.21} $$

$$ \frac{\sin\theta_t}{\sin\theta_i} = \frac{k_i}{k_t} = \frac{n_i}{n_t} \tag{2.22} $$

Given $\vec{k}_r$ and $\vec{k}_t$, the complex reflection and transmission coefficients can be found by supplementing the boundary condition for continuity of the electric field with a similar equation for the tangential magnetic field. For TE waves,

$$ \Gamma_{TE} = \frac{\eta_t \cos\theta_i - \eta_i \cos\theta_t}{\eta_t \cos\theta_i + \eta_i \cos\theta_t} \tag{2.23} $$

$$ T_{TE} = \frac{2\eta_t \cos\theta_i}{\eta_t \cos\theta_i + \eta_i \cos\theta_t} \tag{2.24} $$

and for TM waves,

$$ \Gamma_{TM} = \frac{\eta_i \cos\theta_i - \eta_t \cos\theta_t}{\eta_i \cos\theta_i + \eta_t \cos\theta_t} \tag{2.25} $$

$$ T_{TM} = \frac{2\eta_i \cos\theta_i}{\eta_i \cos\theta_i + \eta_t \cos\theta_t} \tag{2.26} $$

As an important consequence of the preceding equations, unpolarized radiation incident upon a planar dielectric surface can become partially or totally polarized on reflection.
For example, a portion of the unpolarized microwave radiation emitted by the atmosphere is reflected by the ocean surface and another portion is absorbed and reemitted by the ocean surface. However, the TE and TM components of the emitted radiation are different when viewed from oblique angles, a characteristic that can be used to discriminate surface water from rainfall [1].

2.3 Absorption of Electromagnetic Waves by Atmospheric Gases

A knowledge of the mechanisms of electromagnetic radiation interaction with matter, as well as some of the fundamental properties of matter itself, is necessary to infer and interpret information about the atmosphere. In the following two subsections, the interactions are described on a microscopic (molecular) and macroscopic (particle) level.

2.3.1 Mechanisms of Molecular Absorption

The total internal energy of an isolated molecule consists of three types of energy states,

$$ E = E_e + E_v + E_r \tag{2.27} $$

where $E_e$ = electronic energy, $E_v$ = vibrational energy, and $E_r$ = rotational energy. Rotational energy is associated with rotational motions of the atoms of the molecule about its center of mass, and vibrational energy is associated with vibrational motions of the atoms about their equilibrium positions. Radiation is absorbed or emitted when a transition takes place from one energy state to another. The frequency ($\nu$) of the absorbed (or emitted) photon is given by the Bohr frequency condition,

$$ \nu = \frac{E_h - E_l}{h} \tag{2.28} $$

where $h$ is Planck’s constant and $E_h$ and $E_l$ are the internal energies of the higher and lower molecular states, respectively. The absorption spectrum due to a single transition is called an absorption line. Absorption by molecules in the mid- and near-infrared occurs by vibration (although a mixture of vibrations and rotations is usually induced at these frequencies). In the microwave and far-infrared, rotational transitions are the dominant mechanism of energy transfer.
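As a rough numerical illustration of the Bohr frequency condition (2.28), the sketch below converts between a transition frequency and the corresponding energy gap. The function names are illustrative, the constants are the SI-defined values of Planck's constant and the speed of light, and the two sample inputs (183.31 GHz and a 15-µm wavelength, both of which appear later in this chapter) are assumed here only to compare orders of magnitude.

```python
PLANCK_H = 6.62607015e-34   # Planck's constant (J s)
LIGHT_C = 299792458.0       # speed of light in vacuum (m/s)


def transition_frequency_hz(e_high_j, e_low_j):
    """Bohr frequency condition (2.28): nu = (E_h - E_l) / h."""
    return (e_high_j - e_low_j) / PLANCK_H


def energy_gap_j(frequency_hz):
    """Inverse of (2.28): energy difference for a photon of this frequency."""
    return PLANCK_H * frequency_hz


# Illustrative inputs (assumed for this sketch, not taken from a line catalog).
microwave_gap = energy_gap_j(183.31e9)          # a microwave frequency
infrared_gap = energy_gap_j(LIGHT_C / 15e-6)    # a 15-um infrared wavelength
ratio = infrared_gap / microwave_gap            # roughly two orders of magnitude
```

The much larger energy gap at the infrared wavelength is consistent with the text: closely spaced rotational levels are probed in the microwave, while more widely spaced vibrational levels are probed in the infrared.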
2.3.2 Line Shapes

Based on (2.28), the absorption (or emission) spectrum of an isolated, unperturbed, stationary molecule consists of sharply defined frequency lines corresponding to transitions between quantized energy levels of the molecule. Atmospheric gases, however, consist of molecules that are in constant motion, interacting and colliding with one another. These disturbances cause the absorption lines to broaden. The two most important sources of line broadening are Doppler (thermal) broadening and pressure (collision) broadening, the latter of which is dominant for most frequencies up to an altitude of approximately 40 km [8].

2.3.3 Absorption Coefficients and Transmission Functions

Line shape $f(\nu - \nu_0)$, line position ($\nu_0$), and line strength ($S$) mathematically define the absorption coefficient:

$$ \kappa_\nu = S\,f(\nu - \nu_0) \tag{2.29} $$

The line strength of a specific atmospheric gas is governed by the number of absorbing molecules of that gas per unit volume, the temperature of the gas, and the molecular parameters associated with that transition.

Absorption of radiation by gases in the Earth’s atmosphere is described in terms of transmission functions (or simply, transmittance). Lambert’s law states that the change in radiance intensity along a path $ds$ is proportional to the amount of matter along the path:

$$ dR_\nu = -\kappa_\nu R_\nu\,ds \tag{2.30} $$

where $\kappa_\nu$ is the volume absorption coefficient. Integration of Lambert’s law along the path connecting $s_1$ and $s_2$ yields

$$ R_\nu(s_2) = T_\nu(s_1, s_2)\,R_\nu(s_1) \tag{2.31} $$

where $T_\nu(s_1, s_2)$ is the monochromatic transmittance defined as

$$ T_\nu(s_1, s_2) = \exp\left(-\int_{s_1}^{s_2} \kappa_\nu\,ds\right) \tag{2.32} $$

The optical path (or thickness)² between $s_1$ and $s_2$ is defined as

$$ \tau_\nu(s_1, s_2) = \int_{s_1}^{s_2} \kappa_\nu\,ds \tag{2.33} $$

The absorption coefficient, transmittance, and optical path form the mathematical basis for the subject of Section 2.5.2: radiative transfer.
In practice, these quantities are not monochromatic, but band-averaged over some spectral response function of the instrument.

² The related quantities optical depth and opacity will be defined later.

2.3.4 The Atmospheric Absorption Spectra

The atmospheric absorption spectrum for microwave frequencies is shown in Figure 2.2. Notable features include the water vapor absorption lines centered at 22.235, 183.31, and 325.15 GHz (lines at 380.20 and 448.00 GHz are difficult to identify on the plot) and oxygen absorption lines near 60, 118.75, 368.50, 424.76, and 487.25 GHz. The atmospheric absorption spectrum for infrared wavelengths between 3 and 15 µm is shown in Figure 2.3. Notable features include the water vapor absorption lines near 6–7 µm, ozone absorption lines near 10 µm, and carbon dioxide absorption lines near 4.3–4.6 µm and 13–15 µm.

Figure 2.2 The microwave absorption spectrum. Two calculations for the percent transmission (nadir view) using the 1976 Standard Atmosphere are shown, one assuming no water vapor and one assuming 1.5 g/cm² of water vapor. (Axes: transmission in percent versus frequency, 0–500 GHz; H₂O and O₂ features are labeled.)

2.4 Scattering of Electromagnetic Waves by Atmospheric Particles

In addition to the molecular absorption mechanisms discussed earlier, electromagnetic waves are also scattered and absorbed by much larger particles often found in the atmosphere, such as cloud water droplets, raindrops, or even dust. The scattering of electromagnetic waves upon interaction with atmospheric particles provides a tool that can be used to help retrieve many microphysical parameters related to clouds and precipitation.

2.4.1 Mie Scattering

A suspended particle of geometrical cross-section $A$ will absorb a fraction of incident power and will also scatter incident power in all directions.
The ratio of absorbed power $P_a$ (W) to incident power density $S$ (W/m²) is known as the absorption cross-section

$$ C_a = \frac{P_a}{S} \tag{2.34} $$

and the ratio of $C_a$ to the physical cross-section $A$ is known as the efficiency factor $Q_a$. For a spherical particle of radius $r$, $A = \pi r^2$ and therefore

$$ Q_a = \frac{C_a}{\pi r^2} \tag{2.35} $$

Analogous quantities for scattering, that is, the scattering cross-section $C_s$ and the scattering efficiency $Q_s$, are defined as

$$ C_s = \frac{P_s}{S} \tag{2.36} $$

$$ Q_s = \frac{C_s}{\pi r^2} \tag{2.37} $$

The total power lost due to absorption and scattering (together known as the extinction) is $P_a + P_s$ and the resulting extinction cross-section $C_e$ and efficiency $Q_e$ are

$$ C_e = C_a + C_s \tag{2.38a} $$

$$ Q_e = Q_a + Q_s \tag{2.38b} $$

The solution for the scattering and absorption of electromagnetic waves in free space by a dielectric sphere of radius $r$ was formulated by Mie in terms of the “size parameter”

$$ \chi = \frac{2\pi r}{\lambda} \tag{2.39} $$

and

$$ n = \sqrt{\epsilon_c} \tag{2.40} $$

where $\lambda$ is the wavelength of the incident wave, $n$ is the complex refractive index of the particle, and $\epsilon_c$ is the corresponding complex dielectric constant.

Figure 2.3 The thermal infrared absorption spectrum. Two calculations for the percent transmission (nadir view) using the 1976 Standard Atmosphere are shown, one assuming no water vapor and one assuming 1.5 g/cm² of water vapor. (Axes: transmission in percent versus wavelength, 3–15 µm; CO₂, O₃, and H₂O features are labeled.)
Mie’s expressions for the scattering and extinction efficiencies of the sphere are given by

$$ Q_s(n, \chi) = \frac{2}{\chi^2} \sum_{m=1}^{\infty} (2m + 1)\left(|a_m|^2 + |b_m|^2\right) \tag{2.41a} $$

$$ Q_e(n, \chi) = \frac{2}{\chi^2} \sum_{m=1}^{\infty} (2m + 1)\,\mathrm{Re}\{a_m + b_m\} \tag{2.41b} $$

where $a_m$ and $b_m$ are known as the Mie coefficients

$$ a_m = -\frac{j_m(n\chi)[\chi j_m(\chi)]' - j_m(\chi)[n\chi j_m(n\chi)]'}{j_m(n\chi)[\chi h_m(\chi)]' - h_m(\chi)[n\chi j_m(n\chi)]'} \tag{2.42a} $$

$$ b_m = -\frac{j_m(\chi)[n\chi j_m(n\chi)]' - n^2 j_m(n\chi)[\chi j_m(\chi)]'}{h_m(\chi)[n\chi j_m(n\chi)]' - n^2 j_m(n\chi)[\chi h_m(\chi)]'} \tag{2.42b} $$

where $j_m(\cdot)$ and $h_m(\cdot)$ are the spherical Bessel and Hankel functions of the first kind, and the $(\cdot)'$ operator denotes differentiation with respect to the argument.

2.4.2 The Rayleigh Approximation

The Mie expressions for $Q_s$ and $Q_e$ can be approximated with negligible error if the particle size is much smaller than the wavelength of the incident wave ($|n\chi| \ll 1$). The Rayleigh approximation is obtained by retaining only the most significant terms in the series expansion:

$$ Q_s = \frac{8}{3}\chi^4 |K|^2 \tag{2.43} $$

$$ Q_e = 4\chi\,\mathrm{Im}\{-K\} + \frac{8}{3}\chi^4 |K|^2 \tag{2.44} $$

and

$$ Q_a = 4\chi\,\mathrm{Im}\{-K\} \tag{2.45} $$

where $K$ is a complex quantity defined in terms of the complex index of refraction $n$:

$$ K = \frac{n^2 - 1}{n^2 + 2} = \frac{\epsilon_c - 1}{\epsilon_c + 2} \tag{2.46} $$

Note that in the Rayleigh limit the scattering efficiency scales as the fourth power of frequency, whereas the absorption efficiency scales linearly with frequency, for a fixed particle size and a frequency-independent index of refraction. For water, the index of refraction is frequency-dependent, and the absorption efficiency scales as frequency squared (for frequencies below 100 GHz or so) when this dependence is included.

2.4.3 Comparison of Scattering and Absorption by Hydrometeors

Figure 2.4 shows scattering and absorption contributions of water spheres, both in the liquid and ice phases. Deirmendjian’s recursive procedure [9] was used to calculate the Mie coefficients; 80 terms were used to approximate the series. For liquid droplets, absorption is dominant in the Rayleigh region and scattering is dominant in the Mie region.
For ice, scattering is dominant for all but the lowest microwave frequencies. The frequency dependence of scattering and absorption can be used to retrieve information about the particle size distributions of clouds, and the related quantity, rainrate. The distribution and type of hydrometeors found in typical clouds vary widely, and monodispersive models are inadequate. More complicated modeling is beyond the scope of this book; an excellent discussion can be found in [10].

2.5 Radiative Transfer in a Nonscattering Planar-Stratified Atmosphere

A sensor high above the Earth’s surface receives emission from the Earth and its atmosphere, along with any reflected components of solar and cosmic background radiation. Measurements of this emission allow the retrieval of many atmospheric parameters, including the temperature and water vapor profile, the amount of cloud liquid water, rainrates, and sea surface temperatures.

Figure 2.4 Scattering and absorption efficiency for water spheres with 1-mm radius. Liquid water spheres (273 K) are shown in the top plot and ice spheres (266 K) are shown in the bottom plot. (Axes: efficiency factor, 10⁻⁸ to 10², versus frequency, 1–1000 GHz.)
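The Rayleigh-limit expressions (2.43)–(2.46) of Section 2.4.2 are simple enough to evaluate directly, and a minimal sketch follows. It assumes the refractive index is written as n′ − jn″ (so that Im{−K} is positive for a lossy particle), and the droplet radius, wavelength, and refractive index used in the example are hypothetical placeholders rather than measured values. It is not a substitute for the full Mie series used to produce Figure 2.4.

```python
import math


def rayleigh_efficiencies(radius_m, wavelength_m, n):
    """Rayleigh-limit efficiencies from (2.39) and (2.43)-(2.46).

    n is the complex refractive index, written here as n' - j n''
    (assumed sign convention), so that Im{-K} > 0 for a lossy particle.
    Returns (chi, Q_s, Q_a, Q_e).
    """
    chi = 2.0 * math.pi * radius_m / wavelength_m      # size parameter (2.39)
    eps_c = n * n                                      # from n = sqrt(eps_c) (2.40)
    k = (eps_c - 1.0) / (eps_c + 2.0)                  # K of (2.46)
    q_s = (8.0 / 3.0) * chi**4 * abs(k) ** 2           # scattering (2.43)
    q_a = 4.0 * chi * (-k).imag                        # absorption (2.45)
    return chi, q_s, q_a, q_s + q_a                    # extinction (2.44)


# Hypothetical inputs: a 10-um droplet at a 1-cm wavelength with a
# placeholder refractive index (illustrative only, not a measured value).
N_PLACEHOLDER = complex(8.0, -2.0)
chi, q_s, q_a, q_e = rayleigh_efficiencies(10e-6, 1e-2, N_PLACEHOLDER)

# Halve the wavelength (double the frequency) while holding n fixed.
chi2, q_s2, q_a2, q_e2 = rayleigh_efficiencies(10e-6, 0.5e-2, N_PLACEHOLDER)
```

With the index of refraction held fixed, doubling the frequency multiplies the scattering efficiency by 16 and the absorption efficiency by 2, which is the scaling noted in the text. A frequency-dependent index, as for real water, would change the absorption scaling.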
2.5.1 Equilibrium Radiation: Planck and Kirchhoff’s Laws

The solution to the simple differential equation presented by Lambert’s law (2.30) is referred to as Beer’s law

$$ R_\nu(s'') = R_\nu(s')\exp\left(-\int_{s'}^{s''} \kappa_\nu(s)\,ds\right) \tag{2.47} $$

In addition to the absorption of radiation by the gas contained within the path from $s'$ to $s''$, Kirchhoff’s law states that a gas in thermal equilibrium also emits radiation in an amount proportional to the absorption coefficient $\kappa_\nu$:

$$ R_\nu^{\text{emission}} = \kappa_\nu J_\nu(T) \tag{2.48} $$

where $J_\nu(T)$ is the radiation intensity produced (at each of two orthogonal polarizations) by a blackbody at temperature $T$ and frequency $\nu$:

$$ J_\nu(T) = \frac{h\nu^3}{c^2}\,\frac{1}{e^{h\nu/kT} - 1} \quad \mathrm{W \cdot m^{-2} \cdot ster^{-1} \cdot Hz^{-1}} \tag{2.49} $$

The Planck equation exhibits a nonlinear relationship between intensity and temperature. The degree of the nonlinearity is frequency-dependent, and is shown in Figure 2.5. The nonlinearity is most severe at the higher frequencies (shorter wavelengths) and almost nonexistent at the microwave frequencies. The approximation of the Planck radiance by the linear Taylor series term is called the Rayleigh-Jeans (RJ) approximation, and the microwave brightness temperature is defined as the scaled intensity:

$$ B_\nu = \frac{c^2}{2\nu^2 k}\,R_\nu \tag{2.50} $$

Note that if a radiometer is calibrated against a blackbody and all departures from the Rayleigh-Jeans law are ignored, brightness temperature is effectively redefined as

$$ B_\nu = \frac{c^2}{2\nu^2 k}\,R_\nu + \frac{h\nu}{2k} \tag{2.51} $$

and accuracy is better than 0.1 K for frequencies up to 300 GHz and terrestrial temperatures. When extremely cold temperatures are encountered (e.g., the cosmic background), corrections to the RJ approximation are needed.
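The accuracy claim for (2.51) can be checked numerically. The sketch below evaluates the Planck function (2.49) and the two brightness temperature definitions (2.50) and (2.51). It rests on one interpretive assumption that the text does not state explicitly: that $R_\nu$ in (2.50) is the intensity summed over both polarizations, so a blackbody gives $R_\nu = 2J_\nu(T)$. The function names and the sample frequency and temperature are illustrative.

```python
import math

H = 6.62607015e-34      # Planck's constant (J s)
K_B = 1.380649e-23      # Boltzmann's constant (J/K)
C = 299792458.0         # speed of light (m/s)


def planck_single_pol(nu_hz, temp_k):
    """Blackbody intensity per polarization, as in (2.49)."""
    return H * nu_hz**3 / C**2 / math.expm1(H * nu_hz / (K_B * temp_k))


def brightness_rj(nu_hz, radiance):
    """Rayleigh-Jeans brightness temperature, as in (2.50)."""
    return C**2 * radiance / (2.0 * nu_hz**2 * K_B)


def brightness_corrected(nu_hz, radiance):
    """Brightness temperature with the h*nu/(2k) offset of (2.51)."""
    return brightness_rj(nu_hz, radiance) + H * nu_hz / (2.0 * K_B)


# Sample case at the edge of the range quoted in the text.
nu = 300e9      # Hz
temp = 200.0    # K
# Assumption: R_nu sums both polarizations, so a blackbody gives 2 * J_nu.
total = 2.0 * planck_single_pol(nu, temp)
err_rj = brightness_rj(nu, total) - temp
err_corrected = brightness_corrected(nu, total) - temp
```

Under that assumption, the uncorrected definition (2.50) underestimates the physical temperature by several kelvin at 300 GHz, while the offset in (2.51) brings the residual below 0.1 K for this case.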
2.5.2 Radiative Transfer Due to Emission and Absorption

The net change in radiation along $ds$ due to the combination of emission and absorption is

$$ dR_\nu = dR_\nu^{\text{emission}} + dR_\nu^{\text{absorption}} \tag{2.52} $$

Substitution of (2.30) and (2.48) into (2.52) yields the Schwarzschild equation

$$ \frac{dR_\nu}{ds} = -\kappa_\nu\left[R_\nu - J_\nu(T)\right] \tag{2.53} $$

which mathematically describes how radiation is transferred from one layer to another layer as a result of absorption and emission. The intensity of radiation leaving the path is therefore a function of both the absorber along the path and the temperature along the path. Passive (emission-based) sounding of constituent concentration and temperature is based upon this principle.

Figure 2.5 Nonlinearity of the Planck function as a function of wavelength. (Axes: Radiance(T)/Radiance(300) versus temperature, 200–300 K; curves are shown for wavelengths of 5 mm, 15 µm, and 4 µm.)

2.5.3 Integral Form of the Radiative Transfer Equation

Differentiation of (2.33) gives

$$ d\tau_\nu(s) = -\kappa_\nu(s)\,ds \tag{2.54} $$

where we adopt the convention that $\tau$ increases from zero downward from the top of the atmosphere to a maximum value $\tau^*$ (the opacity of the atmosphere) at the surface.
Multiplying both sides of (2.53) by $e^{-\tau_\nu(s)}$ and combining terms gives

$$ \frac{d\left[R_\nu e^{-\tau_\nu(s)}\right]}{d\tau_\nu} = -J_\nu e^{-\tau_\nu(s)} \tag{2.55} $$

which upon integration along the path from $s'$ to $s''$ yields

$$ R_\nu(s'')e^{-\tau_\nu(s'')} - R_\nu(s')e^{-\tau_\nu(s')} = \int_{\tau_\nu(s'')}^{\tau_\nu(s')} J_\nu(s)\,e^{-\tau_\nu(s)}\,d\tau_\nu(s) \tag{2.56} $$

Equation (2.56) can be rearranged into the integral form of the radiative transfer equation as follows:

$$ R_\nu(s'') = R_\nu(s')e^{-[\tau_\nu(s') - \tau_\nu(s'')]} + \int_{\tau_\nu(s'')}^{\tau_\nu(s')} J_\nu(s)\,e^{-[\tau_\nu(s) - \tau_\nu(s'')]}\,d\tau_\nu(s) \tag{2.57} $$

The equivalent relation in terms of the absorption coefficient $\kappa_\nu$ is

$$ R_\nu(s'') = R_\nu(s')\exp\left(-\int_{s'}^{s''} \kappa_\nu(s)\,ds\right) + \int_{s'}^{s''} \kappa_\nu(s)J_\nu(s)\exp\left(-\int_{s}^{s''} \kappa_\nu(\sigma)\,d\sigma\right)ds \tag{2.58} $$

The angular properties of emission have thus far been neglected, but can easily be included for the case of a horizontally homogeneous, vertically stratified atmosphere by noting that an angular tilt of $\theta$ results in an increase in the path length by a factor of $\sec\theta$ (see Figure 2.6). Optical depth is related to optical path as follows:

$$ \tau(s) = \tau(z)\sec(\theta) \tag{2.59} $$

After including the angular terms, the final form of the radiative transfer equation describing the radiation intensity observed at altitude $L$ and viewing angle $\theta$ can be formulated by including reflected atmospheric and cosmic contributions and the radiance emitted by the surface:

$$ \begin{aligned} R_\nu(L) = {} & \int_0^L \kappa_\nu(z)J_\nu[T(z)]\exp\left(-\int_z^L \sec\theta\,\kappa_\nu(z')\,dz'\right)\sec\theta\,dz \\ & + \rho_\nu e^{-\tau^*\sec\theta}\int_0^L \kappa_\nu(z)J_\nu[T(z)]\exp\left(-\int_0^z \sec\theta\,\kappa_\nu(z')\,dz'\right)\sec\theta\,dz \\ & + \rho_\nu e^{-2\tau^*\sec\theta}J_\nu(T_c) \\ & + \varepsilon_\nu e^{-\tau^*\sec\theta}J_\nu(T_s) \end{aligned} \tag{2.60} $$

where $\varepsilon_\nu$ is the surface emissivity, $\rho_\nu$ is the surface reflectivity, $T_s$ is the surface temperature, and $T_c$ is the cosmic background temperature (2.736 ± 0.017 K).

Figure 2.6 Geometry of the planar-stratified atmospheric radiative transfer equation. (The sketch relates a slant path element to a vertical element through $ds = \sec\theta\,dz$.)
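To make the structure of (2.60) concrete, the following sketch evaluates its four terms on a discrete altitude grid. The trapezoidal quadrature, the function and argument names, and the uniform sample profile are all assumptions made for illustration, and the emissivity and reflectivity are passed separately because the text does not tie them together. It is a minimal monochromatic sketch, not an operational radiative transfer model.

```python
import math

H = 6.62607015e-34      # Planck's constant (J s)
K_B = 1.380649e-23      # Boltzmann's constant (J/K)
C = 299792458.0         # speed of light (m/s)


def planck(nu_hz, temp_k):
    """Blackbody intensity per polarization, as in (2.49)."""
    return H * nu_hz**3 / C**2 / math.expm1(H * nu_hz / (K_B * temp_k))


def trapezoid(y, x):
    """Trapezoidal integral of samples y on the grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def observed_radiance(nu_hz, z, kappa, temp, theta_rad,
                      emissivity, reflectivity, t_surface, t_cosmic=2.736):
    """Evaluate the four terms of (2.60) on a grid z[0] = 0 ... z[-1] = L."""
    sec = 1.0 / math.cos(theta_rad)
    # Cumulative zenith optical depth measured upward from the surface.
    cum = [0.0]
    for i in range(len(z) - 1):
        cum.append(cum[-1] + 0.5 * (kappa[i] + kappa[i + 1]) * (z[i + 1] - z[i]))
    tau_star = cum[-1]                      # zenith opacity of the atmosphere
    idx = range(len(z))
    src = [kappa[i] * planck(nu_hz, temp[i]) * sec for i in idx]
    # Term 1: upwelling emission, attenuated between z and the sensor at L.
    up = trapezoid([src[i] * math.exp(-sec * (tau_star - cum[i])) for i in idx], z)
    # Term 2 integrand: downwelling emission, attenuated between z and the surface.
    down = trapezoid([src[i] * math.exp(-sec * cum[i]) for i in idx], z)
    t_path = math.exp(-tau_star * sec)      # one-way slant transmittance
    return (up
            + reflectivity * t_path * down
            + reflectivity * t_path**2 * planck(nu_hz, t_cosmic)
            + emissivity * t_path * planck(nu_hz, t_surface))


# Hypothetical uniform profile: 20-km column, constant absorption, isothermal.
z_grid = [10.0 * i for i in range(2001)]            # altitude (m)
kappa_profile = [1.0e-5] * len(z_grid)              # absorption coefficient (1/m)
temp_profile = [250.0] * len(z_grid)                # temperature (K)
radiance = observed_radiance(50e9, z_grid, kappa_profile, temp_profile,
                             math.radians(30.0), 1.0, 0.0, 250.0)
```

A useful sanity check on any such implementation is the isothermal case with a black surface at the same temperature: the atmospheric and surface terms then sum to the Planck intensity at that temperature, regardless of the opacity or viewing angle.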
2.5.4 Weighting Function

The first term in (2.60) can be recast in terms of the transmittance function $T_\nu(z)$:

$$ R_\nu(L) = \int_0^L J_\nu[T(z)]\,\frac{dT_\nu(z)}{dz}\,dz \tag{2.61} $$

The derivative of the transmittance function with respect to altitude is often called the weighting function

$$ W_\nu(z) \triangleq \frac{dT_\nu(z)}{dz} \tag{2.62} $$

and gives the relative contribution of the radiance emanating from each altitude. Note that the Planck radiances are weighted, not the temperature profile. It is sometimes useful to define a temperature weighting function, where the temperature profile is weighted directly. One approach is to express the radiance intensity $R_\nu(L)$ in terms of a “blackbody-equivalent” brightness temperature $T_{B,\nu}(L)$ (the temperature of a blackbody that produces a radiance equivalent to $R_\nu(L)$; note that $T_{B,\nu}(L) \neq B_\nu$) and linearize about a nominal temperature profile $T_0(z)$ and corresponding radiance $R_{0,\nu}(L)$.

2.5.4.1 Temperature Weighting Function

For a particular frequency, the blackbody-equivalent brightness temperature may be written as follows:

$$ T_{B,\nu}(L) = J_\nu^{-1}\left(W_\nu\left(J_\nu(T_z)\right)\right) \tag{2.63} $$

where $J_\nu(\cdot)$ is the Planck function, $W_\nu(\cdot)$ is the integration against the weighting function, and $J_\nu^{-1}(\cdot)$ is the inverse Planck function.
The first-order Taylor series approximation of $J_\nu^{-1}(W_\nu(J_\nu(\cdot)))$ is then

$$ R_\nu(L) = R_{0,\nu}(L) + \frac{dJ_\nu^{-1}}{dW_\nu}\frac{dW_\nu}{dJ_\nu}\frac{dJ_\nu}{dT}\left[T(z) - T_0(z)\right] \tag{2.64} $$

$$ \phantom{R_\nu(L)} = W_{T,\nu}(z)\left[T(z) - T_0(z)\right] + R_{0,\nu}(L) \tag{2.65} $$

where $W_{T,\nu}(z)$ is defined as the temperature weighting function:

$$ W_{T,\nu}(z) \triangleq \left.\frac{dJ_\nu^{-1}}{dW_\nu}\frac{dW_\nu}{dJ_\nu}\frac{dJ_\nu}{dT}\right|_{T_0(z)} \tag{2.66} $$

The Planck radiance function can be linearized about some nominal temperature profile $T_0$, and a temperature weighting function (sometimes called an incremental weighting function) can be defined:

$$ R_\nu(L) = \int_0^L W_\nu(z)J_\nu[T_0(z)]\,dz + \int_0^L \left[T(z) - T_0(z)\right]W_{T,\nu}(z)\,dz \tag{2.67} $$

where the temperature weighting function is defined as

$$ W_{T,\nu}(z) = \frac{dJ_\nu[T_0(z)]}{dT}\,W_\nu(z) = \frac{hc\nu}{k}\,\frac{J_\nu[T_0(z)]}{T_0^2(z)}\,W_\nu(z) \tag{2.68} $$

The difference between the Planck weighting function and the temperature weighting function can be significant for short-wavelength channels, as shown in Figure 2.7. The temperature weighting functions are sharper and peak lower in the atmosphere. The RMS errors (in units of blackbody-equivalent brightness temperature) resulting from the use of the first-order approximation given by (2.67) (and assuming the weighting functions are independent of atmospheric parameters) over a representative set of atmospheric profiles $T(z)$ (with $T_0(z) = E[T(z)]$) are shown in Figure 2.8 for the channel set of the NASA Atmospheric Infrared Sounder (AIRS) launched on the Aqua satellite in 2002 [11]. The two dominant sources of error are the nonlinearity of the Planck function (most evident in the short-wavelength channels) and the nonlinearity of the atmospheric transmittance (most evident in the water vapor channels). Temperature weighting functions are almost never used directly to evaluate the radiative transfer equation because of the large errors introduced by the exclusion of nonlinearities. Nevertheless, the temperature weighting functions provide a useful characterization of the way different layers of the atmosphere at different temperatures contribute to the radiation emitted at the top of the atmosphere.

Figure 2.7 The Planck radiance weighting function and the temperature weighting function for two infrared channels. (Panels: 14.2 µm and 4.2 µm; axes: altitude in km versus weight in 1/km.)

Figure 2.8 AIRS RMS radiance error due to first-order Planck approximation. (Axes: RMS error in K versus wavelength in µm.)

Returning to the special case of microwave frequencies and terrestrial temperatures (i.e., the Rayleigh-Jeans regime), (2.60) can be expressed in terms of the physical temperature profile $T(z)$ and the brightness temperature $B_\nu$:

$$ \begin{aligned} B_\nu(L) = {} & \int_0^L \kappa_\nu(z)T(z)\exp\left(-\int_z^L \sec\theta\,\kappa_\nu(z')\,dz'\right)\sec\theta\,dz \\ & + \rho_\nu e^{-\tau^*\sec\theta}\int_0^L \kappa_\nu(z)T(z)\exp\left(-\int_0^z \sec\theta\,\kappa_\nu(z')\,dz'\right)\sec\theta\,dz \\ & + \rho_\nu e^{-2\tau^*\sec\theta}\,\tilde{T}_c \\ & + \varepsilon_\nu e^{-\tau^*\sec\theta}\,T_s \end{aligned} \tag{2.69} $$

where $\tilde{T}_c$ is the corrected cosmic background temperature

$$ \tilde{T}_c = \frac{h\nu}{2k}\,\frac{e^{h\nu/kT_c} + 1}{e^{h\nu/kT_c} - 1} \tag{2.70} $$

Note that in the microwave case, $W_{T,\nu}(z) = W_\nu(z)$.

2.6 Passive Spectrometer Systems

Measurement of the Earth’s radiation at a spectral resolution high enough to study details of molecular absorption bands is achieved using spectrometer sensor systems. The terms “multispectral,” “hyperspectral,” and more recently “ultraspectral” have been used to denote spectrometer systems that measure radiance in tens, hundreds (or sometimes a few thousand), and thousands of spectral channels, respectively. The chapter concludes with a brief discussion of various spectrometer technologies, with a focus on performance advantages, disadvantages, and engineering tradeoffs that must be considered when selecting an instrument to carry out a given remote sensing task. The concepts presented here are rudimentary; for more details, the reader is referred to [12–14].
2.6.1 Optical Spectrometers

For the purpose of an elementary discussion, optical spectrometers can be grouped into three system types: prism dispersion, diffraction grating, and radiation interference. The systems differ primarily in the mechanisms used to separate incident radiation into discrete spectral components.

2.6.1.1 Prism Dispersion Systems

A prism spectrometer produces radiance spectra by passing the incident radiation through a prism. The prism causes the radiation to disperse (bend) by a frequency-dependent angle. The degree to which the radiation is dispersed is determined by the refractive index of the prism. The spectra are usually detected either by sweeping the dispersed radiation across a fixed detector (for example, a photomultiplier), or sweeping the detector across the radiation field. The spectral resolution achieved by prism spectrometers is relatively coarse, and they are therefore used typically in imaging systems. The calibration of prism systems is also problematic because of the frequency dependence of the index of refraction of the prism.

2.6.1.2 Diffraction Grating Systems

A diffraction grating disperses radiation into spectra through angular-dependent interference patterns that result when radiation is passed through a dense array of small slits. Diffraction grating systems operate by either transmitting radiation through or reflecting radiation from a series of closely spaced parallel lines etched on plastic film (for transmission) or a metallic surface (for reflection). Transmission gratings generally perform poorly in comparison with reflection gratings, which are used in high-performance space spectrometers. The spectral resolving power of grating spectrometers typically exceeds that of prism spectrometers by an order of magnitude, at the expense of increased instrumentation complexity. The AIRS instrument, which is discussed in Chapter 10, is a diffraction grating spectrometer.
2.6.1.3 Interferometer Systems

The interferometer spectrometer operates quite differently from the prism or grating spectrometer in that interference effects instead of dispersion effects are used to separate spectra. One of the simplest types of interferometers is the Michelson interferometer, which splits incoming radiation into two beams of unequal length by a partially silvered plate (beam splitter) and later recombines the beams with a known path difference. The path difference can be varied uniformly by moving a mirror at a constant speed, causing the two beams to move in and out of phase at the detector. The intensity of the resulting waveform (termed the interferogram) is related to the spectral intensity of the incident radiation by the Fourier transform. The interferogram is the autocorrelation function of the optical signal. The performance of the interferometer spectrometer relative to the grating spectrometer depends on a number of factors, including the nature (e.g., mechanical and electrical) and origin (e.g., photon and thermal) of system noise. Examples of interferometer sounding systems include the NPOESS Aircraft Sounder Testbed–Infrared (NAST-I) [15], the Cross-track Infrared Sounder (CrIS) [16], and the Infrared Atmospheric Sounding Interferometer (IASI) [17].

2.6.2 Microwave Spectrometers

Microwave and optical spectrometer systems are conceptually similar. Perhaps the most pernicious source of error in microwave spectrometer systems is the instability of the receiver, and the primary difference among microwave systems is the way in which receiver sensitivity is compromised for receiver stability. Three types of microwave spectrometers (footnote 3) are now discussed.

2.6.2.1 Total Power Spectrometer

The simplest type of microwave spectrometer measures the power of incident radiation over a collection of bandwidths $B_n$, integrated over a time $\tau$.
The RMS sensitivity of the measurement at any given channel is a function of the receiver noise ($T_R$, expressed in units of temperature), the incident radiation ($T_A$, expressed in units of temperature), the bandwidth ($B_n$, Hz), and the integration time ($\tau$, sec; footnote 4):

$$
\Delta T_{\mathrm{rms}} = \frac{T_R + T_A}{\sqrt{B_n \tau}} \tag{2.71}
$$

Equation (2.71) assumes that the receiver gain is perfectly stable. Fluctuations in receiver gain reduce the system sensitivity as follows:

$$
\Delta T_{\mathrm{rms}} = (T_R + T_A)\sqrt{\frac{1}{B_n \tau} + \left(\frac{\Delta G}{G}\right)^2} \tag{2.72}
$$

where $\Delta G/G$ is the fractional receiver gain drift. It is not uncommon for the gain drift component to dominate the noise expressed in (2.72). Examples of total-power microwave spectrometer sounding systems include the NPOESS Aircraft Sounder Testbed–Microwave (NAST-M) [18], the Advanced Microwave Sounding Unit (AMSU) [19], and the Advanced Technology Microwave Sounder (ATMS) [20].

Footnote 3. The term “radiometer” is used when incident electromagnetic power is measured across a given frequency band. The term “spectrometer” is used when power across several frequency bands (or channels) is measured.

Footnote 4. It is assumed for the purposes of this discussion that the detector signal is convolved with a boxcar of length $\tau$. Other averaging kernels may be used, with tradeoffs between sensitivity and memory effects.

2.6.2.2 Dicke Spectrometer

The Dicke spectrometer is essentially a total-power spectrometer with two additional features: (1) a switch used to modulate the receiver input signal, and (2) a synchronous detector, placed between the detector and integrator. The modulation consists of periodically switching the receiver input between the antenna and a reference source ($T_{\mathrm{ref}}$) at a rate higher than the highest significant spectral component of the gain variation.
If the noise temperature of the reference source is close to the antenna temperature $T_A$, the system sensitivity of the Dicke spectrometer becomes

$$
\Delta T_{\mathrm{rms}} = \frac{2(T_R + T_A)}{\sqrt{B_n \tau}} \tag{2.73}
$$

2.6.2.3 Correlation Spectrometer

Another possible method of stabilizing a receiver system involves the correlation of signals. Two separate receivers are used in a correlation spectrometer, and the resulting output voltages are multiplied and detected. The average value of a product of two independent noise temperatures is zero, and because only correlated noise voltages yield a DC output, receiver gain instabilities will not affect the sensitivity of the correlation spectrometer. The sensitivity of the correlation spectrometer is a factor of $\sqrt{2}$ better than the Dicke spectrometer. However, two separate receivers are needed.

2.7 Summary

The Earth’s atmosphere and its interaction with electromagnetic radiation have been examined on microscopic (molecular absorption) and macroscopic (particle extinction) levels. If the atmosphere is assumed to be nonscattering, horizontally homogeneous, and vertically stratified, straightforward relations can be derived for the radiation intensity observed by a downward-viewing satellite or aircraft sensor. The frequency dependence of scattering, absorption, and the Planck radiance offers various advantages for atmospheric profile sounding in the presence of clouds (see Table 2.3). Various instrument technologies present performance advantages and disadvantages that must be considered when implementing a remote sounding system.

Table 2.3 Comparison of Certain Characteristics of the 4.3-µm, 15.0-µm, and 5.0-mm Spectral Regions. Detector noise RMS: 0.15 K (IR) and 0.7 K (MW). (Source: [21])

Spectral    Energy (relative       Temperature sensitivity        Cloud transmission
region      Planck radiance)       (relative to detector noise)
            200 K      300 K       200 K      300 K               Water      Ice
4.3 µm      1.25       200         1          20                  6%         1%
15.0 µm     5,000      15,000      10         6                   1%         1%
5.0 mm      1          1           4          1                   96%        99.98%
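The sensitivity relations (2.71) through (2.73) are easy to compare numerically. The following is a minimal sketch, not part of the original text: the function names and the channel parameters (receiver temperature, antenna temperature, bandwidth, integration time, and fractional gain drift) are illustrative assumptions and do not describe any particular instrument.

```python
import math


def total_power_sensitivity(t_r, t_a, bandwidth_hz, tau_s, gain_drift=0.0):
    """RMS sensitivity in K from (2.72); gain_drift=0 reduces it to (2.71)."""
    return (t_r + t_a) * math.sqrt(1.0 / (bandwidth_hz * tau_s) + gain_drift**2)


def dicke_sensitivity(t_r, t_a, bandwidth_hz, tau_s):
    """RMS sensitivity in K from (2.73), assuming T_ref is close to T_A."""
    return 2.0 * (t_r + t_a) / math.sqrt(bandwidth_hz * tau_s)


def correlation_sensitivity(t_r, t_a, bandwidth_hz, tau_s):
    """A factor of sqrt(2) better than the Dicke value, as stated in the text."""
    return dicke_sensitivity(t_r, t_a, bandwidth_hz, tau_s) / math.sqrt(2.0)


# Illustrative (assumed) channel parameters, not taken from a real instrument.
T_R, T_A = 500.0, 250.0   # receiver and antenna temperatures, K
B_N, TAU = 400e6, 0.02    # bandwidth in Hz, integration time in s

for label, value in [
    ("total power, stable gain", total_power_sensitivity(T_R, T_A, B_N, TAU)),
    ("total power, dG/G = 1e-3", total_power_sensitivity(T_R, T_A, B_N, TAU, 1e-3)),
    ("Dicke", dicke_sensitivity(T_R, T_A, B_N, TAU)),
    ("correlation", correlation_sensitivity(T_R, T_A, B_N, TAU)),
]:
    print(f"{label:26s} {value:.3f} K")
```

With these assumed numbers the gain-drift term dominates the total-power result, which is the situation the Dicke and correlation designs are meant to address.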
References

[1] G. L. Stephens. Remote Sensing of the Lower Atmosphere. Oxford University Press, New York, 1994.
[2] J. T. Houghton. The Physics of Atmospheres. Cambridge University Press, Cambridge, U.K., 1986.
[3] J. M. Wallace and P. V. Hobbs. Atmospheric Science: An Introductory Survey. Elsevier, New York, second edition, 2006.
[4] K. N. Liou. An Introduction to Atmospheric Radiation. Academic Press, Orlando, Florida, 1980.
[5] D. H. Staelin, A. W. Morgenthaler, and J. A. Kong. Electromagnetic Waves. Prentice Hall, Upper Saddle River, New Jersey, 1994.
[6] B. J. Mason. The Physics of Clouds. Oxford University Press, Oxford, U.K., 1971.
[7] J. S. Marshall and W. Palmer. “The distribution of raindrops with size.” Journal of Meteorology, 5:165–166, 1948.
[8] C. Elachi. Introduction to the Physics and Techniques of Remote Sensing. Wiley, New York, 1987.
[9] D. Deirmendjian. Electromagnetic Scattering on Spherical Polydispersions. American Elsevier Publishing Co., Inc., New York, 1969.
[10] A. J. Gasiewski. “Microwave radiative transfer in hydrometeors.” Atmospheric Remote Sensing by Microwave Radiometry, M. A. Janssen, Ed., Chapter 3, Wiley, New York, 1993.
[11] H. H. Aumann, et al. “AIRS/AMSU/HSB on the Aqua mission: Design, science objectives, data products, and processing systems.” IEEE Trans. Geosci. Remote Sens., 41(2):253–264, February 2003.
[12] H. S. Chen. Space Remote Sensing Systems: An Introduction. Academic Press, New York, 1985.
[13] J. D. Kraus. Radio Astronomy. Cygnus-Quasar Books, Powell, Ohio, second edition, 1986.
[14] M. A. Janssen. Atmospheric Remote Sensing by Microwave Radiometry. Wiley, New York, 1993.
[15] D. Cousins and M. J. Gazarik. NAST Interferometer Design and Characterization: Final Report. Project Report NOAA-26, MIT Lincoln Laboratory, July 1999.
[16] H. J. Bloom. “The Cross-track Infrared Sounder (CrIS): A sensor for operational meteorological remote sensing.” IEEE International Geoscience and Remote Sensing Symposium, 3:1341–1343, July 2001.
[17] G. Chalon, F. Cayla, and D. Diebel. “IASI: An advanced sounder for operational meteorology.” Proceedings of the 52nd Congress of IAF, pages 1–5, October 2001.
[18] W. J. Blackwell, J. W. Barrett, F. W. Chen, R. V. Leslie, P. W. Rosenkranz, M. J. Schwartz, and D. H. Staelin. “NPOESS aircraft sounder testbed-microwave (NAST-M): Instrument description and initial flight results.” IEEE Trans. Geosci. Remote Sens., 39(11):2444–2453, November 2001.
[19] B. H. Lambrigtsen. “Calibration of the AIRS microwave instruments.” IEEE Trans. Geosci. Remote Sens., 41(2):369–378, February 2003.
[20] C. Muth, P. S. Lee, J. C. Shiue, and W. A. Webb. “Advanced technology microwave sounder on NPOESS and NPP.” IEEE International Geoscience and Remote Sensing Symposium, 4:2454–2458, September 2004.
[21] W. L. Smith. “Satellite techniques for observing the temperature structure of the atmosphere.” Bulletin of the American Meteorological Society, 53(11):1074–1082, November 1972.

3 An Overview of Inversion Problems in Atmospheric Remote Sensing

In this book, we focus on the retrieval of geophysical state parameters (for example, the atmospheric temperature profile) from radiometric measurements observed in a number of spectral bands. This retrieval almost always requires mathematical inversion of some form of a physical model with vector-valued inputs and outputs, often termed the “forward model.” In the simplest case, this equation can be cast in matrix form by discretizing the relevant parameters (atmospheric profiles are inherently continuous quantities, for example) and ignoring nonlinear terms.
Even in this simple case, the system of linear equations may be overdetermined, in which case no exact solution exists in general, or underdetermined, in which case an infinite number of solutions exist. Problems of this type are often classified as ill-posed, and additional assumptions or constraints must be introduced to allow unique solutions to be obtained [1]. For example, an assumption that the desired solution is close in the Euclidean sense to some a priori value might be used to allow the linear system of equations to be solved with least-squares techniques [2], or a constraint might be imposed that the solution must be sufficiently smooth by including a term related to the second derivative of the solution in the cost function to be minimized [3, 4]. Mathematical regularization techniques like those discussed above are used to increase the stability of the solutions to ill-posed problems.

Inversion problems in atmospheric remote sensing and the relationships of the variables involved are seldom linear, Gaussian, or well-posed. For these reasons, sophisticated methodologies must be used to derive a useful solution. The strategies employed can be categorized into three mutually exclusive and collectively exhaustive categories that we will term physical methods, statistical dependence methods, and hybrid methods. Physical approaches essentially propagate a first guess of the atmospheric state through a forward model (for example, the radiative transfer equation (2.60)) and use iterative, numerical procedures to match the modeled (i.e., simulated) measurements to the actual observations by updating the guess at each iteration. Statistical regularization is often used (but is not required), for example, to introduce a tendency of the optimization towards a likely value.
It is for this reason that we differentiate between “statistical dependence methods” and “statistical methods,” as a physical method that uses statistical regularization is also a statistical method. Statistical dependence methods explicitly use (or empirically derive) a statistical relationship between the observations (i.e., the independent variables) and the geophysical state parameters (i.e., the dependent variables). No physical models are required in a statistical dependence method. Finally, hybrid methods use both physical models and statistical dependence to derive a solution to an inverse problem. For example, a forward model can be used to generate an ensemble of simulated observations and geophysical state parameters, which then can be used with a statistical dependence method to carry out the inversion. We now present each of these three categories of inversion methodologies in detail.

3.1 Mathematical Notation

For the following analyses, we assume that a noisy observation of a random radiance vector $\tilde{R}$ is related to some atmospheric state vector $S$ through a forward model $f(\cdot)$ as follows

$$
\tilde{R} = f(S) + \Psi = R + \Psi \tag{3.1}
$$

where $\Psi$ is a random noise vector (that may depend on $S$), and $R$ is the “noise-free” radiance observation. The retrieval seeks to estimate the state vector $S$ given an observation of $\tilde{R}$, where we use $\hat{S}(\tilde{R})$ to denote the estimate of $S$ given an observation of $\tilde{R}$.

3.2 Optimality

Almost all inversion techniques are designed to optimize something. There are many choices of suitable mathematical parameters to optimize: sum-squared error, probabilistic likelihood, resolution, and signal-to-noise ratio, to name a few. In the context of practical remote sensing inversion algorithms, there are two key points that must be stressed.
First, it is difficult to guarantee optimality with respect to even the simplest metrics due to the assumptions that must be satisfied by the retrieval system. We will therefore be careful to distinguish between a theoretically optimal retrieval algorithm and one that is employed in practice, where it is impossible to guarantee that all the necessary assumptions are universally satisfied. Second, a useful atmospheric retrieval algorithm must be robust with respect to a variety of metrics, some of which may even behave in direct opposition (for example, resolution and signal-to-noise ratio). In practice, we therefore usually choose a theoretically optimal algorithm (for some chosen metric, usually sum-squared error) and evaluate performance for a wide variety of other metrics to ensure that the algorithm performs well even in pathological, but meteorologically important, cases that may not be well represented in global statistics. For example, atmospheric profile retrieval algorithms are often evaluated on a global perspective using sum-squared error with respect to a comprehensive set of “ground truth,” typically radiosondes or numerical model fields. It is also illuminating to examine performance on a case-by-case basis to verify that interesting atmospheric phenomenology is captured with the necessary fidelity.

3.3 Methods That Exploit Statistical Dependence

We begin with a discussion of techniques that directly utilize the joint probability distribution function (pdf) of $\tilde{R}$ and $S$, $P(\tilde{R}, S)$, or statistics based on this pdf, such as the cross-covariance.

3.3.1 The Bayesian Approach

The Bayesian approach to estimation involves the incorporation of a priori knowledge about the state vector $S$ with knowledge gained by measuring $\tilde{R}$. Mathematically, this knowledge is formulated in terms of five related probability density functions (pdfs):

$P(S)$: The prior (i.e., before the measurement) pdf of state $S$
$P(\tilde{R})$: The prior pdf of the measurement $\tilde{R}$
$P(\tilde{R}, S)$: The joint prior pdf of $\tilde{R}$ and $S$
$P(\tilde{R}|S)$: The conditional pdf of $\tilde{R}$ given state $S$
$P(S|\tilde{R})$: The conditional pdf of $S$ after measurement $\tilde{R}$. This is the quantity of interest for the solution of the estimation problem.

Bayes’ theorem relates the conditional probabilities as follows:

$$
P(S|\tilde{R}) = \frac{P(\tilde{R}|S)\,P(S)}{P(\tilde{R})} \tag{3.2}
$$

Therefore, the Bayesian framework allows probabilities to be assigned to possible choices of $\hat{S}(\tilde{R})$ given knowledge of the joint and conditional probabilities of $\tilde{R}$ and $S$. A reasonable choice for $\hat{S}(\tilde{R})$ is the value of $S$ for which $P(S|\tilde{R})$ is the largest (known as the maximum a posteriori, or MAP, estimator; it is sometimes loosely called the maximum likelihood estimator, although the two coincide only when the prior $P(S)$ is uniform).

3.3.1.1 Bayes’ Least-Squares Estimator

An alternative to the MAP estimator is the estimator $g(\cdot)$ that minimizes some suitable cost criterion, $C$:

$$
\hat{S}(\cdot) = \arg\min_{g(\cdot)} C(S, g(\tilde{R})) \tag{3.3}
$$

The sum-squared error (SSE) cost criterion

$$
C = E[(S - \hat{S})^T (S - \hat{S})] \tag{3.4}
$$

is commonly chosen for this purpose and results in the following estimator, sometimes called the Bayes’ least-squares (BLS) estimator:

$$
\hat{S}(\tilde{R}) = E[S|\tilde{R}] \tag{3.5}
$$

The BLS and MAP estimators may be identical under some circumstances, for example, if $S$ and $\tilde{R}$ are jointly Gaussian.

3.3.1.2 Bayes’ Linear Least-Squares Estimator

The BLS estimator has two disadvantages: it is often a nonlinear function of $\tilde{R}$, and it requires a complete statistical representation of the relationship between $\tilde{R}$ and $S$, which is rarely available in practice. If we constrain the estimator $g(\cdot)$ in (3.3) to be linear, the resulting estimator depends only on a second-order characterization of the statistical relationship between $\tilde{R}$ and $S$. This estimator is the linear least-squares estimator (LLSE):

$$
\hat{S}(\tilde{R}) = \mathbf{C}_{S\tilde{R}}\,\mathbf{C}_{\tilde{R}\tilde{R}}^{-1}\,\tilde{R} \equiv \mathbf{L}_{S\tilde{R}}\,\tilde{R} \tag{3.6}
$$

with error covariance (that is, the covariance of the error $\epsilon = S - \hat{S}$)

$$
\mathbf{C}_{\epsilon\epsilon} = \mathbf{C}_{SS} - \mathbf{C}_{S\tilde{R}}\,\mathbf{C}_{\tilde{R}\tilde{R}}^{-1}\,\mathbf{C}_{S\tilde{R}}^T \tag{3.7}
$$

We have assumed without loss of generality that $S$ and $\tilde{R}$ are zero-mean and have used $\mathbf{C}_{XY}$ to denote the cross-covariance of $X$ and $Y$. The LLS and BLS estimators are identical when $\tilde{R}$ and $S$ are jointly Gaussian.

3.3.2 Linear and Nonlinear Regression Methods

Direct application of the BLS and LLS estimators is often precluded in practical applications because the needed expected values (in the case of BLS) and covariance matrices (in the case of LLS) both depend on the joint probability distribution functions of $S$ and $\tilde{R}$, and these pdfs are very difficult to calculate directly. A much more convenient approach is to first estimate the needed statistical parameters from the available sample data and then derive the estimator from these sample statistics. We now present two examples of this approach, where statistical relationships are derived empirically from sample data, taken here to mean an ensemble of $N$ pairs of $S$ and $\tilde{R}$:

$$
(S_1, \tilde{R}_1), \ldots, (S_N, \tilde{R}_N) \in \mathcal{S} \times \mathcal{R} \tag{3.8}
$$

This ensemble could be assembled, for example, from colocated satellite radiance measurements and radiosonde observations of temperature as a function of altitude. Linear regression operators can be calculated directly from “sample covariances” that are empirically derived from the sample data. Nonlinear regression operators generalize this approach to allow more complicated mathematical relationships between $S$ and $\tilde{R}$ to be represented. A parameterized, nonlinear function (for example, a polynomial) is often used to fit the sample data.
The parameters can be chosen using a numerical optimization technique that minimizes a cost function usually involving the Euclidean distance between the actual data and the derived fit.

3.3.2.1 Linear Regression

A given set of $N$ observations of $P$ parameters can be arranged into a $P \times N$ matrix, $\mathbf{X}$. The sample mean, $M_X$, is a column vector where each element is the average of the corresponding row of $\mathbf{X}$. The sample covariance $\hat{\mathbf{C}}_{XX}$ is given by

$$
\hat{\mathbf{C}}_{XX} = \frac{\bar{\mathbf{X}}\,\bar{\mathbf{X}}^T}{N - 1} \tag{3.9}
$$

where $\bar{\mathbf{X}}$ is calculated by removing the sample mean from each column of $\mathbf{X}$.

Given a matrix of noisy radiance observations, $\tilde{R}$, where each column corresponds to an observation and each row corresponds to a spectral channel, and a matrix of geophysical observations, $S$, where each column corresponds to an observation and each row corresponds to a vertical level, for example, the linear regression estimate is

$$
\hat{S}(\tilde{R}) = M_S + \hat{\mathbf{C}}_{S\tilde{R}}\,\hat{\mathbf{C}}_{\tilde{R}\tilde{R}}^{-1}\,(\tilde{R} - M_{\tilde{R}}) \tag{3.10}
$$

where $\hat{\mathbf{C}}_{S\tilde{R}}$ and $\hat{\mathbf{C}}_{\tilde{R}\tilde{R}}$ are the sample covariance matrices. If the additive random noise term in (3.1) is zero mean and uncorrelated with the radiances, $R$, and the geophysical state vector, $S$, then (3.10) can be expressed in a simplified form as follows:

$$
\hat{S}(\tilde{R}) = M_S + \hat{\mathbf{C}}_{SR}\,(\hat{\mathbf{C}}_{RR} + \mathbf{C}_{\Psi\Psi})^{-1}\,(\tilde{R} - M_R) \tag{3.11}
$$

where $\mathbf{C}_{\Psi\Psi}$ is the noise covariance. If the noise covariance is known a priori, it can be used directly in (3.11); otherwise, it can be estimated using techniques presented in Chapter 4. Note that the noise covariance matrix can serve as a form of mathematical regularization when $\hat{\mathbf{C}}_{RR}$ is nearly singular. This can occur frequently with hyperspectral/ultraspectral measurements due to the very high degree of correlation among some of the channels.

It is interesting, and maybe not obvious, that linear regression can provide optimal estimates even in cases where the observations are nonlinearly related to the variables to be estimated. We illustrate this with a simple example.
Suppose we wish to retrieve a scalar variable $s$ from two scalar observations $r_1$ and $r_2$ and these variables are nonlinearly related as follows:

$$
r_1 = a_1 + b_1 s + c_1 s^2 \tag{3.12}
$$

$$
r_2 = a_2 + b_2 s + c_2 s^2 \tag{3.13}
$$

Forming $c_2 r_1 - c_1 r_2$ cancels the quadratic term, so (provided $c_2 b_1 \neq c_1 b_2$) a linear combination of $r_1$ and $r_2$ recovers $s$ exactly:

$$
\hat{s} = \alpha + \beta r_1 + \gamma r_2 \tag{3.14}
$$

where

$$
\alpha = \frac{c_1 a_2 - c_2 a_1}{c_2 b_1 - c_1 b_2} \tag{3.15}
$$

$$
\beta = \frac{c_2}{c_2 b_1 - c_1 b_2} \tag{3.16}
$$

$$
\gamma = \frac{c_1}{c_1 b_2 - c_2 b_1} \tag{3.17}
$$

While trivial in the case presented here, this example provides insight into recent work ([5, 6], for example) demonstrating that linear estimators perform very well when used to retrieve temperature and water vapor from hyperspectral infrared observations. There the relationships are quite nonlinear due to the Planck function and the dependence of atmospheric absorption on water vapor content, but the large number of spectral channels allows the nonlinear dependence to be “unraveled” by appropriate linear combinations. We will revisit this example from a different perspective in Section 5.1.3.

3.3.2.2 Nonlinear Parametric Regression

The linear form afforded by (3.11) is simple and convenient to apply in practice, but can lead to substantial inaccuracies in cases where the relationship between $R$ and $S$ is nonlinear. The linear regression framework presented above can be readily expanded by including nonlinear functions of $R$ as inputs to the linear regression. For example, simple polynomial terms can be constructed and the linear regression operator can be used to optimize the coefficients that modulate these terms. Polynomial regression is an example of a parameterized method, where the model structure and complexity are predetermined and only the free parameters in the model need to be derived. Other parameterized, nonlinear functions can also be constructed, and the parameters can be obtained using numerical optimization techniques.
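The two-observation example of (3.12) through (3.14) can be checked numerically with the sample-covariance regression of (3.9) and (3.10). The sketch below is illustrative only: the coefficient values, the sampling range of $s$, and all variable names are assumptions chosen for the demonstration, and the analytic coefficients are obtained by eliminating the $s^2$ term between (3.12) and (3.13).

```python
import numpy as np

# Hypothetical coefficients for r_k = a_k + b_k*s + c_k*s**2 (assumed values).
a1, b1, c1 = 0.5, 1.0, 0.3
a2, b2, c2 = -1.0, 0.4, 0.8

rng = np.random.default_rng(0)
s = rng.uniform(-2.0, 2.0, size=200)            # true scalar states
r = np.vstack([a1 + b1 * s + c1 * s**2,         # observation r1
               a2 + b2 * s + c2 * s**2])        # observation r2, shape (2, N)

# Analytic coefficients: c2*r1 - c1*r2 cancels the quadratic term.
den = c2 * b1 - c1 * b2
alpha = (c1 * a2 - c2 * a1) / den
beta = c2 / den
gamma = -c1 / den

# Sample-statistics regression, following (3.9) and (3.10).
n = s.size
r_mean = r.mean(axis=1, keepdims=True)
s_mean = s.mean()
r0 = r - r_mean                                 # mean-removed observations
s0 = s - s_mean                                 # mean-removed states
c_rr = r0 @ r0.T / (n - 1)                      # 2 x 2 sample covariance
c_sr = s0 @ r0.T / (n - 1)                      # cross-covariance, length 2
coeffs = np.linalg.solve(c_rr, c_sr)            # C_SR C_RR^{-1} (C_RR symmetric)
intercept = s_mean - coeffs @ r_mean[:, 0]

s_hat = intercept + coeffs @ r
print("max |s_hat - s| =", np.max(np.abs(s_hat - s)))
print("regression:", intercept, coeffs)
print("analytic:  ", alpha, beta, gamma)
```

Because the data here are noise free and $s$ is an exact linear function of $r_1$ and $r_2$, the regression operator should reproduce the analytic coefficients to within rounding error.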
3.3.2.3 Nonlinear Nonparametric Regression

Nonparametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data during the training process. The term nonparametric is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance. A neural network can be broadly categorized as a special case of nonparametric nonlinear regression and will be discussed in detail in Chapter 5.

We end this section with a temperature retrieval example using both linear and polynomial regression techniques. The NOAA-88b radiosonde set contains approximately 7,500 global atmospheric profiles of temperature, water vapor, and ozone. A radiative transfer package was used to simulate microwave sounding observations in 100 spectral bands near the 118.75-GHz oxygen line. Figure 3.1 shows the temperature retrieval RMS errors as a function of altitude for linear regression and polynomial regression. The a priori error (not shown) is approximately 10 K throughout the troposphere. Although both the linear and nonlinear regressions have substantially reduced the a priori error, the polynomial containing at least third-order terms clearly improves the accuracy of the retrieval.

[Figure 3.1 RMS temperature retrieval error for linear regression and nonlinear (polynomial) regression. A hypothetical microwave sounder with 100 channels near the 118.75-GHz oxygen line was used in the retrieval simulation.]

3.3.2.4 Ridge Regression

It may be desirable to constrain the magnitude of the coefficients in the regression operator, $\mathbf{L}_{S\tilde{R}}$. This is a form of regularization that stabilizes the solution if the covariance matrix is nearly singular.
The modified cost function is b T (S − S)] b + γtrace{LT L} C = E[(S − S) (3.18) and the solution is b R) e = C e C e e + γI S( SR RR −1 e R (3.19) This modification to the standard linear regression operator is termed ridge regression [7]. An Overview of Inversion Problems in Atmospheric Remote Sensing 45 3.4 Physical Inversion Methods The previous estimators are based entirely on the statistical relationship between R and S. Other methods use knowledge of the forward model f (·) together with whatever limited statistical characterization of R and S is available. If we return to (3.1) and make the assumptions that the error and the a priori state distributions are Gaussian, the probability distributions in the numerator of (3.2) are proportional to the following terms: 1 e T −1 e e P (RS) ∼ exp − (R − R) CΨΨ (R − R) (3.20) 2 1 P (S) ∼ exp − (S − Sa )T C−1 (3.21) SS (S − Sa ) 2 e is often a where Sa is the a priori state vector. The denominator in (3.2), P (R), normalizing factor in practice and can be neglected [8]. The most likely value e is therefore the maximum of the product of (3.20) and (3.21), of P (SR) or equivalently, the maximum of the sum of their natural logarithms. This maximization is equivalent to minimizing: e − R) + (S − Sa )T C−1 (S − Sa ) e − R)T C−1 (R ξmin = (R SS ΨΨ (3.22) Most physical retrieval approaches attempt to minimize a cost function similar in form to that given in (3.22), that is, b = arg min ξ S(·) min S (3.23) although many variations of this cost function could be used in practice. For example, it might be advantageous to minimize a weighted sum of the two terms in (3.22). Other physical approaches seek to minimize a quite different cost function, for example, the vertical resolution of the retrieval [9]. We will not discuss these methods here – the interested reader is referred to Twomey [1] and Rodgers [8] for detailed treatment of these topics. 
3.4.1 The Linear Case

The solution to (3.23) can be found analytically only under certain circumstances. The earlier assumption of Gaussianity must hold, and the relationship between $\tilde{R}$ and $S$ must be linear:

$$
\tilde{R} = \mathbf{W} S + \Psi \tag{3.24}
$$

where $\mathbf{W}$ is sometimes called the weighting function matrix. It can be shown [8] that the solution in this case can be expressed by two equivalent relations:

$$
\hat{S}_m(\tilde{R}) = S_a + (\mathbf{W}^T \mathbf{C}_{\Psi\Psi}^{-1} \mathbf{W} + \mathbf{C}_{SS}^{-1})^{-1}\,\mathbf{W}^T \mathbf{C}_{\Psi\Psi}^{-1}\,(\tilde{R} - \mathbf{W} S_a) \tag{3.25}
$$

$$
\hat{S}_n(\tilde{R}) = S_a + \mathbf{C}_{SS} \mathbf{W}^T (\mathbf{W} \mathbf{C}_{SS} \mathbf{W}^T + \mathbf{C}_{\Psi\Psi})^{-1}\,(\tilde{R} - \mathbf{W} S_a) \tag{3.26}
$$

where the subscript on $\hat{S}$ ($m$ or $n$) denotes the order of the matrix to be inverted. Note the similarity of the n-form solution with the linear regression estimate given by (3.11).

3.4.1.1 The Minimum-Information Retrieval

The minimum-information retrieval picks the $\hat{S}$ which is “closest” in the least-squares sense to $S_a$ and satisfies

$$
(\tilde{R} - R)^T (\tilde{R} - R) = M\sigma^2 \tag{3.27}
$$

where $M\sigma^2$ is a scalar quantity related to the measurement error. In the linear case, the minimum-information solution in the presence of noise with covariance $\mathbf{C}_{\Psi\Psi}$ is then

$$
\hat{S}(\tilde{R}) = S_a + \mathbf{W}^T (\mathbf{W}\mathbf{W}^T + \beta\,\mathbf{C}_{\Psi\Psi})^{-1}\,(\tilde{R} - \mathbf{W} S_a) \tag{3.28}
$$

where $\beta$ is some constant (in units of 1/K$^2$). Note that the minimum-information retrieval is the n-form solution given in (3.26) with the a priori state covariance replaced by a scaled identity matrix, $\mathbf{C}_{SS} = \mathbf{I}/\beta$. Figure 3.2 shows the performance of three operators used to retrieve the temperature profile ($S = T$) from simulated Advanced Microwave Sounding Unit (AMSU) radiances in clear air: the minimum-information retrieval (3.28), the linear model with known $\mathbf{C}_{SS}$ (3.26), and the linear regression estimator (3.11). The graph demonstrates the significant impact of a priori statistics on the retrieval performance.

3.4.2 The Nonlinear Case

The minimization of (3.23) often requires numerical methods. We apply the canonical approach and set the derivative of the cost function (3.22) to zero and numerically find a root of the resulting equation.
The derivative of the cost function is calculated as follows (a constant factor of 2 is dropped because only the root matters):

$$
h(S) = -[\nabla_S f(S)]^T \mathbf{C}_{\Psi\Psi}^{-1}\,[\tilde{R} - f(S)] + \mathbf{C}_{SS}^{-1}(S - S_a) \tag{3.29}
$$

[Figure 3.2 Comparison of temperature retrieval techniques. The minimum-information retrieval (with $\beta = 1$ K$^{-2}$) only uses information contained in the weighting function matrix. The optimal linear model retrieval uses the weighting function matrix and $\mathbf{C}_{TT}$ only. The direct multiple regression retrieval uses statistical characterizations of $\mathbf{C}_{TR}$ and $\mathbf{C}_{RR}$. Axes: altitude (km) versus RMS temperature profile retrieval error (K). Curves: a priori; minimum information; $\mathbf{C}_{TT}$; $\mathbf{C}_{TR}$, $\mathbf{C}_{RR}$.]

Newton’s method iteratively solves $h(S) = 0$ by stepping from the current guess ($S_i$) to the next guess ($S_{i+1}$) according to:

$$
S_{i+1} = S_i - [\nabla_S h(S_i)]^{-1}\,h(S_i) \tag{3.30}
$$

Substitution of (3.29) into (3.30) yields the update step needed to minimize the cost function:

$$
\begin{aligned}
S_{i+1} = S_i &+ \left\{ \mathbf{C}_{SS}^{-1} + \mathbf{K}_i^T \mathbf{C}_{\Psi\Psi}^{-1} \mathbf{K}_i - [\nabla_S \mathbf{K}_i]^T \mathbf{C}_{\Psi\Psi}^{-1}\,[\tilde{R} - f(S_i)] \right\}^{-1} \\
&\times \left\{ \mathbf{K}_i^T \mathbf{C}_{\Psi\Psi}^{-1}\,[\tilde{R} - f(S_i)] - \mathbf{C}_{SS}^{-1}(S_i - S_a) \right\}
\end{aligned}
\tag{3.31}
$$

where the matrix $\mathbf{K}_i$ contains the derivatives of the forward model with respect to the state vector, that is, $\mathbf{K}_i = \nabla_S f(S_i)$. The implementation of this method is complicated in practice by the calculation of the Hessian matrix, $\nabla_S \mathbf{K}_i$, which is computationally intensive to evaluate. It is often reasonable to neglect the Hessian term in moderately linear problems [10], and the update step becomes:

$$
\begin{aligned}
S_{i+1} = S_i &+ \left( \mathbf{C}_{SS}^{-1} + \mathbf{K}_i^T \mathbf{C}_{\Psi\Psi}^{-1} \mathbf{K}_i \right)^{-1} \\
&\times \left\{ \mathbf{K}_i^T \mathbf{C}_{\Psi\Psi}^{-1}\,[\tilde{R} - f(S_i)] - \mathbf{C}_{SS}^{-1}(S_i - S_a) \right\}
\end{aligned}
\tag{3.32}
$$

Alternatively, the update can be expressed relative to the a priori state, as follows:

$$
S_{i+1} = S_a + \mathbf{C}_{SS} \mathbf{K}_i^T (\mathbf{K}_i \mathbf{C}_{SS} \mathbf{K}_i^T + \mathbf{C}_{\Psi\Psi})^{-1} \left\{ \tilde{R} - f(S_i) + \mathbf{K}_i (S_i - S_a) \right\} \tag{3.33}
$$

This equation will be used in a retrieval example presented in Chapter 10, at which point we also discuss convergence criteria and computational efficiency.
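The iteration in (3.33) can be sketched in a few lines. The example below is a toy illustration under stated assumptions, not the retrieval used later in the book: the weakly nonlinear forward model, its weighting matrix `W`, the nonlinearity strength `NONLIN`, the covariances, and the fixed iteration count are all hypothetical. The gradient from (3.29) is evaluated at the final iterate as a simple convergence check.

```python
import numpy as np

# Hypothetical weakly nonlinear forward model: f(S) = x + NONLIN * x**2, x = W S.
W = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0],
              [0.8, 0.2, 0.5],
              [0.4, 0.9, 0.3]])
NONLIN = 0.05


def forward(s):
    x = W @ s
    return x + NONLIN * x**2


def jacobian(s):
    """K = d f / d S, evaluated at s."""
    x = W @ s
    return (1.0 + 2.0 * NONLIN * x)[:, None] * W


def cost_gradient(s, r_obs, s_a, c_ss, c_psi):
    """h(S) from (3.29)."""
    k = jacobian(s)
    return (-k.T @ np.linalg.solve(c_psi, r_obs - forward(s))
            + np.linalg.solve(c_ss, s - s_a))


def gauss_newton_retrieval(r_obs, s_a, c_ss, c_psi, n_iter=15):
    """Iterate (3.33), starting from the a priori state."""
    s_i = s_a.copy()
    for _ in range(n_iter):
        k_i = jacobian(s_i)
        innovation = r_obs - forward(s_i) + k_i @ (s_i - s_a)
        s_i = s_a + c_ss @ k_i.T @ np.linalg.solve(
            k_i @ c_ss @ k_i.T + c_psi, innovation)
    return s_i


rng = np.random.default_rng(2)
s_true = np.array([1.0, -0.5, 0.8])
sigma = 0.05                                  # assumed noise standard deviation
c_psi = sigma**2 * np.eye(W.shape[0])
c_ss = np.eye(3)                              # assumed a priori covariance
s_a = np.zeros(3)

r_obs = forward(s_true) + sigma * rng.normal(size=W.shape[0])
s_hat = gauss_newton_retrieval(r_obs, s_a, c_ss, c_psi)

print("retrieved state:", s_hat)
print("gradient norm:  ", np.linalg.norm(cost_gradient(s_hat, r_obs, s_a, c_ss, c_psi)))
```

A fixed iteration count is used here only for brevity; a practical implementation would stop on a convergence criterion, as the text notes for Chapter 10.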
3.5 Hybrid Inversion Methods

Some knowledge about the relationship between $\tilde{R}$ and $S$ must be available to allow the formulation of a suitable estimate $\hat{S}(\tilde{R})$, and we have seen that statistical dependence methods exploit statistical relationships and physical methods exploit physical relationships. It may be possible to improve retrieval accuracy and/or efficiency by using both statistical and physical knowledge of the state parameters and the measured radiances.

3.5.1 Improved Retrieval Accuracy

The physical inversion approaches previously discussed are less effective as the relevant processes deviate from linearity and Gaussianity, and some atmospheric processes present formidable retrieval challenges. Most notable are clouds and precipitation, which can be highly variable, dynamic, nonlinear, and non-Gaussian. Recent work on the retrieval of precipitation rate from passive microwave measurements [11–17] has demonstrated the utility of microphysical precipitation models. The complexity and nonlinearity of these models, however, have precluded their direct use in retrieval algorithms. One fruitful approach has been to generate training data using the microphysical precipitation model and subsequently derive the relevant statistical relationships with a nonlinear regression (neural network) method.

3.5.2 Improved Retrieval Efficiency

One advantage of decoupling the physical model from the retrieval algorithm is an increase in computational efficiency. Many physical models, especially those that must represent complicated, nonlinear processes in many spectral bands, are computationally prohibitive for retrieval systems that must operate in near-real-time. An alternative approach is to use the forward model “offline” from the retrieval to generate a database of radiance–state pairs. The statistical relationships are also derived offline, and optimal estimators can be calculated.
The operational retrieval can then implement these estimators.

3.6 Error Analysis

Once a suitable retrieval operator has been constructed, it is necessary to assess the performance of the estimates. Many reasonable metrics could be examined, including the sensitivity of the retrieval to sensor noise, the resolution of the retrieval, the degree to which the retrieval system (i.e., the retrieval algorithm and the sensor) is “blind” to the atmospheric parameters of interest, and so forth. These metrics, and others, can be readily calculated from the retrieval operators in many cases. Perturbation analysis can be used when complexity and/or nonlinearity preclude simple, direct techniques.

3.6.1 Analytical Analysis

We begin with the linear case, where the state estimate takes the form

$$\hat{S}(\tilde{R}) = D\tilde{R} \tag{3.34}$$

where we have encapsulated the offset term into $D$ by adding a new element to $\tilde{R}$ and setting this element to one. The $D$ matrix may be derived in many ways, for example, using (3.11), (3.26), or (3.28). Under the assumption of a linear forward model, (3.24) can be substituted into (3.34), and we obtain:

$$\hat{S}(\tilde{R}) = DWS + D\Psi \tag{3.35}$$

and we observe that the estimated state vector is a linear function of the true state vector. In the ideal, linear case, this decomposition yields the two basic contributions of retrieval imperfection: smoothing, represented by the averaging kernel, $A = DW$, and retrieval noise, represented by $D\Psi$. In practice, the forward model does not perfectly capture reality and is usually nonlinear. These complications can be handled by including additional forward model error terms and linearizing the forward model about a suitable operating point [8]. This results in (at least) a third contribution to retrieval imperfection, in the form of a retrieval bias.
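The decomposition in (3.35) can be checked numerically on a toy problem. The sketch below forms the averaging kernel $A = DW$ and confirms that a linear estimate splits into a smoothing part $AS$ and a noise part $D\Psi$. The matrices `W` and `D`, the state `S`, and the noise realization `psi` are hypothetical numbers chosen only for illustration, and the offset term is assumed to be zero, so $\tilde{R}$ is not augmented with a constant element here.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def matvec(a, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

# Hypothetical weighting function matrix W: 3 channels, 2 state elements.
W = [[1.0, 0.2],
     [0.5, 0.5],
     [0.1, 1.0]]

# Hypothetical linear retrieval operator D: 2 state elements, 3 channels.
D = [[0.8, 0.2, -0.1],
     [-0.1, 0.3, 0.7]]

A = matmul(D, W)  # averaging kernel A = DW (2 x 2)

S = [2.0, -1.0]             # "true" state
psi = [0.05, -0.02, 0.01]   # one measurement-noise realization
R = [r + n for r, n in zip(matvec(W, S), psi)]  # linear forward model plus noise

S_hat = matvec(D, R)         # estimate, as in (3.34)
smooth_part = matvec(A, S)   # smoothing contribution A S
noise_part = matvec(D, psi)  # retrieval-noise contribution D Psi

print("A =", A)
print("S_hat =", S_hat)
print("A S + D Psi =", [a + b for a, b in zip(smooth_part, noise_part)])
```

An averaging kernel equal to the identity would mean no smoothing at all; the off-diagonal entries of `A` show how much each retrieved element is contaminated by the other state element.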
The relative contributions of smoothing error and retrieval noise to the total retrieval error covariance can be readily calculated:

$$C_{\mathrm{smooth}} = (A - I)\, C_{SS}\, (A - I)^T \tag{3.36}$$

and

$$C_{\mathrm{noise}} = D\, C_{\Psi\Psi}\, D^T \tag{3.37}$$

where we have assumed that the measurement noise is uncorrelated with the atmospheric state. The direct calculation of the error contributions due to forward model errors is difficult using analytical methods unless assumptions and approximations are used. Numerical techniques such as Monte Carlo analysis can be helpful in such circumstances.

3.6.2 Perturbation Analysis

3.6.2.1 Atmospheric Smoothing

In the linear case, the derivative of the estimated state with respect to the true state is simply the averaging kernel:

$$\frac{\partial \hat{S}}{\partial S} = A \tag{3.38}$$

Analysis of more complicated cases is facilitated by decomposing this derivative into the product of the forward model Jacobian and the retrieval Jacobian:

$$\frac{\partial \hat{S}}{\partial S} = \frac{\partial \hat{S}}{\partial R}\, \frac{\partial R}{\partial S} \tag{3.39}$$

The forward model and the retrieval operator can then be linearized about some operating point, and these derivatives can be calculated; numerical techniques can be used, if necessary. We will see in Chapter 8 that neural network estimators are particularly amenable to perturbation analyses because the Jacobians are very easy to compute.

3.6.2.2 Retrieval Noise

The contribution of retrieval noise to the total retrieval error can be approximated using a second-order propagation of errors approach, as follows:

$$C_{\mathrm{noise}} \approx \frac{\partial \hat{S}}{\partial R}\, C_{\Psi\Psi} \left( \frac{\partial \hat{S}}{\partial R} \right)^T \tag{3.40}$$

3.7 Summary

The set of equations relating an atmospheric state $S$ to the observed radiances $\tilde{R}$ is seldom directly invertible. The inversion is often ill-posed, where an infinite number of solutions exist, or ill-conditioned, where small perturbations of the radiance values lead to very large perturbations of the solution.
Regularization techniques in the form of constraints that penalize deviations of the solution from an a priori state, for example, are used to improve the stability of the retrieval. Relationships between $S$ and $\tilde{R}$ can be derived using physical or statistical means, or both. In some cases, retrieval error components can be isolated and attributed to biases, atmospheric smoothing, sensor noise, and so forth.

References

[1] S. Twomey. Introduction to the Mathematics of Inversion in Remote Sensing and Indirect Measurements. Elsevier Scientific Publishing Company, New York, 1977.
[2] G. Strang. Linear Algebra and Its Applications. Academic Press, New York, 1980.
[3] A. Tikhonov. “On the solution of incorrectly stated problems and a method of regularization.” Dokl. Acad. Nauk SSSR, 151:501–504, 1963.
[4] A. N. Tikhonov, A. V. Goncharsky, V. V. Stepanov, and A. G. Yagola. Numerical Methods for the Solution of Ill-Posed Problems. Kluwer, Boston, Massachusetts, 1995.
[5] M. D. Goldberg, Y. Qu, L. M. McMillin, W. Wolff, L. Zhou, and M. Divakarla. “AIRS near-real-time products and algorithms in support of operational numerical weather prediction.” IEEE Trans. Geosci. Remote Sens., 41(2):379–389, February 2003.
[6] E. Weisz, H. L. Huang, J. Li, E. Borbas, K. Baggett, P. K. Thapliyal, and L. Guan. “International MODIS and AIRS processing package: AIRS products and applications.” J. App. Rem. Sens., 1:1–23, July 2007.
[7] A. E. Hoerl. “Application of ridge analysis to regression problems.” Chemical Engineering Progress, 58:54–59, 1962.
[8] C. D. Rodgers. Inverse Methods for Atmospheric Sounding. World Scientific, New York, 2000.
[9] G. E. Backus and J. F. Gilbert. “Uniqueness in the inversion of inaccurate gross earth data.” Phil. Trans. Roy. Soc. London, 266:123–192, 1970.
[10] C. D. Rodgers. “Retrieval of atmospheric temperature and composition from remote measurements of thermal radiation.” J. Geophys. Res., 41(7):609–624, July 1976.
[11] C. Surussavadee and D. H. Staelin. “Comparison of AMSU millimeter-wave satellite observations, MM5/TBSCAT predicted radiances, and electromagnetic models for hydrometeors.” IEEE Trans. Geosci. Remote Sens., 44(10):2667–2678, October 2006.
[12] C. Surussavadee and D. H. Staelin. “Precipitation retrieval accuracies for geo-microwave sounders.” IEEE Trans. Geosci. Remote Sens., 45(10):3150–3159, October 2007.
[13] C. Surussavadee and D. H. Staelin. “Millimeter-wave precipitation retrievals and observed-versus-simulated radiance distributions: Sensitivity to assumptions.” J. Atmos. Sci., 64(11):3808–3826, November 2007.
[14] C. Surussavadee and D. H. Staelin. “Global millimeter-wave precipitation retrievals train