Neural Networks in Atmospheric Remote Sensing

A neural network is a set of interconnected artificial neurons that mimic the properties of biological neurons to perform sophisticated, intelligent tasks. This authoritative reference offers a comprehensive understanding of the underpinnings and practical applications of artificial neural networks and their use in the retrieval of geophysical parameters. Professionals find expert guidance on the development and evaluation of neural network algorithms that process data from a new generation of hyperspectral sensors. Engineers discover how to use neural networks to approximate remote sensing inverse functions with emphasis on model selection, preprocessing, initialization, training, and performance evaluation.
Year: 2009
Edition: 1
Publisher: MIT
Language: English
Pages: 234 / 232
ISBN 10: 1596933720
ISBN 13: 9781596933729


Neural Networks in Atmospheric
Remote Sensing
William J. Blackwell
Frederick W. Chen

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the U.S. Library of Congress.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

ISBN-13 978-1-59693-372-9

Cover design by Yekaterina Ratner

© 2009 Massachusetts Institute of Technology
Lincoln Laboratory
244 Wood Street
Lexington, MA 02420
All rights reserved.
This work was funded in part by the National Oceanic and Atmospheric Administration
under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and
recommendations are those of the authors and are not necessarily endorsed by the United States
Government.

Printed and bound in the United States of America. No part of this book may be reproduced or
utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the
publisher. All terms mentioned in this book that are known to be trademarks or service marks
have been appropriately capitalized. Artech House cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark
or service mark.

Disclaimer:
This eBook does not include the ancillary media that was
packaged with the original printed version of the book.

To our families

Contents

Preface

1 Introduction
  1.1 Present Challenges
  1.2 Solutions Based on Neural Networks
  1.3 Mathematical Notation
  References

2 Physical Background of Atmospheric Remote Sensing
  2.1 Overview of the Composition and Thermal Structure of the Earth’s Atmosphere
    2.1.1 Chemical Composition of the Atmosphere
    2.1.2 Vertical Distribution of Pressure and Density
    2.1.3 Thermal Structure of the Atmosphere
    2.1.4 Cloud Microphysics
  2.2 Electromagnetic Wave Propagation
    2.2.1 Maxwell’s Equations and the Wave Equation
    2.2.2 Polarization
    2.2.3 Reflection and Transmission at a Planar Boundary
  2.3 Absorption of Electromagnetic Waves by Atmospheric Gases
    2.3.1 Mechanisms of Molecular Absorption
    2.3.2 Line Shapes
    2.3.3 Absorption Coefficients and Transmission Functions
    2.3.4 The Atmospheric Absorption Spectra
  2.4 Scattering of Electromagnetic Waves by Atmospheric Particles
    2.4.1 Mie Scattering
    2.4.2 The Rayleigh Approximation
    2.4.3 Comparison of Scattering and Absorption by Hydrometeors
  2.5 Radiative Transfer in a Nonscattering Planar-Stratified Atmosphere
    2.5.1 Equilibrium Radiation: Planck and Kirchhoff’s Laws
    2.5.2 Radiative Transfer Due to Emission and Absorption
    2.5.3 Integral Form of the Radiative Transfer Equation
    2.5.4 Weighting Function
  2.6 Passive Spectrometer Systems
    2.6.1 Optical Spectrometers
    2.6.2 Microwave Spectrometers
  2.7 Summary
  References

3 An Overview of Inversion Problems in Atmospheric Remote Sensing
  3.1 Mathematical Notation
  3.2 Optimality
  3.3 Methods That Exploit Statistical Dependence
    3.3.1 The Bayesian Approach
    3.3.2 Linear and Nonlinear Regression Methods
  3.4 Physical Inversion Methods
    3.4.1 The Linear Case
    3.4.2 The Nonlinear Case
  3.5 Hybrid Inversion Methods
    3.5.1 Improved Retrieval Accuracy
    3.5.2 Improved Retrieval Efficiency
  3.6 Error Analysis
    3.6.1 Analytical Analysis
    3.6.2 Perturbation Analysis
  3.7 Summary
  References

4 Signal Processing and Data Representation
  4.1 Analysis of the Information Content of Hyperspectral Data
    4.1.1 Shannon Information Content
    4.1.2 Degrees of Freedom
  4.2 Principal Components Analysis (PCA)
    4.2.1 Nonlinear PCA
    4.2.2 Linear PCA
    4.2.3 Principal Components Transforms
    4.2.4 The Projected PC Transform
    4.2.5 Evaluation of Radiance Compression Performance Using Two Different Metrics
  4.3 Representation of Nonlinear Features
  4.4 Summary
  References

5 Introduction to Multilayer Perceptron Neural Networks
  5.1 A Brief Overview of Machine Learning
    5.1.1 Supervised and Unsupervised Learning
    5.1.2 Classification and Regression
    5.1.3 Kernel Methods
    5.1.4 Support Vector Machines
    5.1.5 Feedforward Neural Networks
  5.2 Feedforward Multilayer Perceptron Neural Networks
    5.2.1 Network Topology
    5.2.2 Network Training
  5.3 Simple Examples
    5.3.1 Single-Input Networks
    5.3.2 Two-Input Networks
  5.4 Summary
  5.5 Exercises
  References

6 A Practical Guide to Neural Network Training
  6.1 Data Set Assembly and Organization
    6.1.1 Data Set Integrity
    6.1.2 The Importance of an Extensive and Comprehensive Data Set
    6.1.3 Data Set Partitioning
  6.2 Model Selection
    6.2.1 Number of Inputs
    6.2.2 Number of Hidden Layers and Nodes
    6.2.3 Adaptive Model Building Techniques
  6.3 Network Initialization
  6.4 Network Training
    6.4.1 Calculation of the Error Gradient Using Backpropagation
    6.4.2 First-Order Optimization: Gradient Descent
    6.4.3 Second-Order Optimization: Levenberg-Marquardt
  6.5 Underfitting and Overfitting
  6.6 Regularization Techniques
    6.6.1 Treatment of Noisy Data
    6.6.2 Weight Decay
  6.7 Performance Evaluation
  6.8 Summary
  References

7 Pre- and Post-Processing of Atmospheric Data
  7.1 Mathematical Overview
  7.2 Data Compression
  7.3 Filtering of Interfering Signals
    7.3.1 The Wiener Filter
    7.3.2 Stochastic Cloud Clearing
  7.4 Data Warping
    7.4.1 Function of Time of Day
    7.4.2 Function of Geolocation
    7.4.3 Function of Time of Year
  7.5 Summary
  References

8 Neural Network Jacobian Analysis
  8.1 Calculation of the Neural Network Jacobian
  8.2 Neural Network Error Analysis Using the Jacobian
    8.2.1 The Network Weight Jacobian
    8.2.2 The Network Input Jacobian
    8.2.3 Use of the Jacobian to Assess Noise Contribution
  8.3 Retrieval System Optimization Using the Jacobian
    8.3.1 Noise Smoothing Versus Atmospheric Smoothing
    8.3.2 Optimization Approach
    8.3.3 Optimization Results
  8.4 Summary
  References

9 Neural Network Retrieval of Precipitation from Passive Microwave Observations
  9.1 Structure of the Algorithm
    9.1.1 Physical Basis of Preprocessing
    9.1.2 Physical Basis of Post-Processing
  9.2 Signal Processing Components
    9.2.1 Limb-and-Surface Corrections
    9.2.2 Precipitation Detection
    9.2.3 Cloud Clearing by Regional Laplacian Interpolation
    9.2.4 Temperature-Profile and Water-Vapor-Profile Principal Components
    9.2.5 Image Sharpening
  9.3 Development of the Algorithm
  9.4 Retrieval Performance Evaluation
    9.4.1 Image Comparisons of NEXRAD and AMSU/HSB
    9.4.2 Numerical Comparisons of NEXRAD and AMSU/HSB Retrievals
    9.4.3 Global Retrievals of Rain and Snow
  9.5 Summary
  References

10 Neural Network Retrieval of Atmospheric Profiles from Microwave and Hyperspectral Infrared Observations
  10.1 The PPC/NN Algorithm
    10.1.1 Network Topology
    10.1.2 Network Training
  10.2 Retrieval Performance Comparisons with Simulated Clear-Air AIRS Radiances
    10.2.1 Simulation of AIRS Radiances
    10.2.2 An Iterated Minimum-Variance Technique for the Retrieval of Atmospheric Profiles
    10.2.3 Retrieval Performance Comparisons
    10.2.4 Discussion
  10.3 Validation of the PPC/NN Algorithm with AIRS/AMSU Observations of Partially Cloudy Scenes over Land and Ocean
    10.3.1 Cloud Clearing of AIRS Radiances
    10.3.2 AIRS/AMSU/ECMWF Data Set
    10.3.3 AIRS/AMSU Channel Selection
    10.3.4 PPC/NN Retrieval Enhancements for Variable Sensor Scan Angle and Surface Pressure
    10.3.5 Retrieval Performance
    10.3.6 Retrieval Performance Sensitivity Analyses
    10.3.7 Discussion and Future Work
  10.4 Summary and Conclusions
  References

11 Discussion of Future Work
  11.1 Bayesian Approaches for Neural Network Training and Error Characterization
  11.2 Soft Computing: Neuro-Fuzzy Systems
  11.3 Nonstationarity Considerations: Neural Network Applications for Climate Studies
  References

About the Authors

Index

Preface
This book is intended to provide a practical, applications-oriented treatment
of neural network methodologies for use in atmospheric remote sensing.
We focus on the retrieval of atmospheric parameters, such as the Earth’s
temperature and water vapor profiles and precipitation rate, but the techniques
described can be applied to a wide variety of problems where function
approximation is required. We use simple, largely theoretical examples to
provide the reader with intuition on how performance is affected by basic
neural network attributes such as model selection, initialization, and training
methodology, and we then build these simple techniques into larger, “real-world” applications that are common throughout the field of atmospheric
remote sensing. Many of the examples are accompanied by MATLAB™
(www.mathworks.com) software codes (available on the accompanying CD-ROM in the back of the book) that can be used as building blocks for larger and
more complex problems. These codes were written using the freely available
NETLAB Neural Network package and do not require any MATLAB™ “add-on” toolboxes.
The book is organized as follows. Chapter 1 presents an overview of
the issues and challenges related to the retrieval of atmospheric parameters
from remote measurements of atmospheric emission and scattering. Chapter 2
provides a summary of many of the relevant physical processes at the foundation of atmospheric remote sensing, including atmospheric composition,
wave propagation, atmospheric absorption and scattering, radiative transfer,
and spectrometer systems. Chapter 3 then follows with an overview of
mathematical inversion methods commonly used in retrievals of atmospheric
parameters from radiometric data, including iterative minimum variance
approaches, regression, and Bayesian techniques. Constraints imposed on
solutions through the use of regularization techniques are also discussed.
Chapter 4 presents theoretical background on many of the signal processing
techniques commonly used in conjunction with neural network processing,
including principal components analysis (PCA), Wiener filtering, periodic
data representation, and blind estimation. Chapter 5 introduces multilayer
perceptron neural networks and provides a general description of network
topology and optimization in the broad context of machine learning. Chapter 6
presents detailed methodologies for network model selection, initialization,
and training. Guidance is also provided for the use of these methodologies
in practice, and common pitfalls are discussed. Chapter 7 discusses pre- and
post-processing techniques that can be used to improve the effectiveness of
the concomitant retrieval operators by reducing the volume of data that must
be processed, and/or removing unwanted, interfering signals, such as noise,
clouds, or surface variability. Chapter 8 provides practical guidance on the
evaluation of network performance and discusses the important concepts of
under- and overfitting, early stopping and weight decay, and network stability.
The book culminates with two examples of complete neural network retrieval
systems used to retrieve atmospheric parameters from passive spaceborne
spectrometers. Chapter 9 presents an example of a highly nonlinear problem,
retrieval of precipitation rate from passive microwave measurements, and
Chapter 10 presents an example of a high-dimensional problem, retrieval of
temperature and moisture profiles from combined microwave and infrared
measurements. A discussion of possible future work is given in Chapter 11.
A great number of people have contributed to the work presented in
this book. George Aumann, Chris Barnet, Mous Chahine, Mitch Goldberg,
Tom Pagano, Bill Smith, and Joel Susskind of the AIRS Science Team have
provided helpful suggestions and constructive criticisms that have helped to
shape the course of much of our neural network retrieval research involving
the AIRS products. Larrabee Strow and Scott Hannon provided the AIRS
transmittance codes and guidance on their use. Many in the broad NPOESS
community have provided valuable feedback and support, including Karen
St. Germain, Degui Gu, Xu Liu, Steve Mango, and Dan Zhou. We would
like to thank Laura Bickmeier, Chuck Cho, Monica Coakley, Harry Finkle,
Chris Gittins, Laura Jairam, John Kerekes, Dan Mooney, Mike Pieper, Phil
Rosenkranz, Chinnawat Surussavadee, and Kevin Wong for many helpful
discussions. We are grateful to Dave Weitz, Vince Leslie, and Dimitris
Manolakis for thoughtful comments on the manuscript and to Seth Hall for
computer support. We thank Greg Berthiaume, Hsiao-hua Burke, and Roger
Sudbury for their support and encouragement over the course of this project.
A special debt of gratitude is expressed to Dave Staelin for the guidance
and support he has provided on all facets of this work, including preparation
of this book. His thoughtful comments have added substantially to the clarity
and completeness of the presentation.
William J. Blackwell would especially like to thank Megan for putting
up with his preoccupation with this project over more than a few nights and
weekends and for offering many fresh and insightful perspectives.

1
Introduction
Measurements of the state and composition of the Earth’s surface and
atmosphere have been made using passive microwave and infrared sensors for
over 50 years [1]. Applications of these remote measurements are numerous,
and encompass fields ranging from meteorology and oceanography to geology and
ecology. For example, satellite measurements of atmospheric temperature are
used to improve weather forecasting models, analyze climate change, and
study the radiation budget of Earth [2].
Recent advances in airborne and spaceborne sounding platforms have
made atmospheric measurements possible on a global scale, and advances
in sensor technologies have pushed the limits of achievable spatial and temporal resolution to unprecedented levels. These performance improvements,
however, are not without concomitant data processing difficulties. The vast
amount of data generated by present and next generation sounding systems
must be transmitted and processed in a timely manner (usually near real time),
which requires processing algorithms that are both computationally efficient
and robust to sensor and atmospheric anomalies (an erupting volcano, for
example).

1.1 Present Challenges
A principal complication in the retrieval of geophysical parameters such as
the global three-dimensional atmospheric temperature and moisture profile
from satellite radiance observations is the nonlinear, non-Gaussian, and ill-posed physical and mathematical relationship between the radiance observed
by a remote sensing instrument and the desired retrieved quantity. Great
strides have recently been made to improve and better characterize the models
that are used to capture these relationships, but these models are seldom
invertible by direct means, usually due to the complex nature of the underlying
physics of the relevant geophysical processes. Common inversion approaches
involve iterated numerical optimization methodologies that minimize a cost
function subject to constraints imposed by a set of regularization parameters
constructed so that the optimization tends toward solutions that are more
“statistically probable” and/or “physically realistic.” These regularization
parameters are often largely subjective, and the construction of effective
retrieval algorithms therefore requires a substantial component of “black art”
to balance the use of the information content in the measured upwelling
atmospheric radiances with the plausibility of the retrieval.
A logistical drawback to iterated, model-based inversion techniques is
the computational burden required to carry out the numerical optimizations.
Modern thermal infrared sensors measure spectral radiances in tens of
thousands of separate wavebands (sometimes termed “hyperspectral” or even
“ultraspectral”) for each observed pixel. The computational complexity of the
optimization routines typically scales as the square (or cube) of the number
of channels, and it is rare that all of the information available in the radiance
spectrum is used. The widespread presence of clouds further degrades performance,
and therefore a separate preprocessing stage is often employed prior to (or in
concert with) numerical inversion to correct the substantial radiance errors that
can be introduced due to the high opacity of cloud formations in the infrared
wavelengths.

1.2 Solutions Based on Neural Networks
An alternative to the numerical inversion approach described
above is statistical regression (parameterized function approximation), where
an ensemble of input/output pairs is used to empirically derive statistical
relationships between the inputs and the outputs. In the case of linear regression, second-order statistical moments (covariances) are used to compute a linear fit
that minimizes the sum-squared error between the fit and the data. A
linear representation is seldom sufficient to fully characterize the complex
statistical relationships endemic in atmospheric data, and nonlinear regression
techniques must be used. An artificial neural network is a special class
of nonlinear regression operators – the mathematical structure of a neural
network is chosen to afford several desirable properties, including scalability
and differentiability. Patterned after the human nervous system, an artificial
neural network (hereafter, simply a neural net) consists of interconnected
neurons, or nodes, that implement a simple, nonlinear function of the inputs.
Usually, the inputs are linearly weighted (the weights modulate each input
and the biases provide an offset) and passed through an activation function
(often nonlinear). The power of neural networks, both from the standpoint of
their capabilities and the derivation of their free parameters, stems from the
parallel structure of the computational elements. In this book, we primarily
consider feedforward connections of layers of nodes with sigmoidal (soft-limit) activation functions. Many other variations can be used, but the feedforward
variety is most common and the techniques described here are readily applied
to other topologies.
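
The node computation described above (a weighted sum of the inputs plus a bias, passed through a sigmoidal activation function) can be sketched in a few lines. The following is a minimal illustration in Python, not the MATLAB/NETLAB code that accompanies the book; the function name `mlp_forward`, the choice of `tanh` as the sigmoidal function, the linear output layer, and all weight values are assumptions made for this example.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a feedforward network with one hidden layer.

    x        : list of input values
    w_hidden : one list of input weights per hidden node
    b_hidden : one bias per hidden node
    w_out    : one list of hidden-layer weights per output node
    b_out    : one bias per output node
    """
    # Hidden nodes: weighted sum of the inputs plus a bias, then tanh
    # (an assumed sigmoidal, soft-limiting activation function).
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    # Output nodes: weighted sum plus bias with a linear activation,
    # an assumed (but common) choice for regression problems.
    return [
        sum(w * h for w, h in zip(weights, hidden)) + bias
        for weights, bias in zip(w_out, b_out)
    ]

# Hypothetical weights, chosen only for illustration (not trained values).
W_HIDDEN = [[0.5, -0.3], [0.8, 0.2]]
B_HIDDEN = [0.0, -0.1]
W_OUT = [[1.0, -1.0]]
B_OUT = [0.25]

if __name__ == "__main__":
    print(mlp_forward([1.0, 2.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT))
```

Once the weights and biases are fixed, evaluating the network requires only these sums and function evaluations, with no iteration.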
The neural network approach offers several substantial advantages
over iterated, model-based inversion methodologies. Once the weights and
biases are derived (during the training process), the network operates very
quickly and can be easily implemented in software. This simplicity and
speed greatly facilitate the development and maintenance, and therefore reduce
the cost, of complex geophysical retrieval systems that process high volumes
of hyperspectral data. The trained neural networks are continuous and
differentiable, which simplifies error propagation and therefore performance
sensitivity analyses. Finally, neural networks can approximate functions
with arbitrarily high degrees of nonlinearity with a sufficient number of
nodes and layers. These advantages have spurred the recent use of neural
network estimation algorithms for geophysical parameter retrievals [3–5].
Methods based on neural networks for data classification have also become
commonplace, although we will focus on regression in this book. Many of the
tips and techniques discussed, however, are directly applicable to both types
of problems.
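
The linear regression case mentioned at the start of this section, in which second-order moments yield the least-squares fit, can be illustrated for a single input and a single output. This is a minimal sketch in Python under that scalar assumption; the function name `linear_fit` and the sample data in the usage note are hypothetical.

```python
def linear_fit(xs, ys):
    """Least-squares line y ~ slope * x + intercept from second-order moments."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance between input and output, and variance of the input.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    # The slope that minimizes the sum-squared error is cov(x, y) / var(x).
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For example, `linear_fit([0, 1, 2, 3], [1, 3, 5, 7])` recovers the slope and intercept of the line on which those points lie. A neural network replaces this single linear operator with a nonlinear one whose parameters are found by training.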

1.3 Mathematical Notation
One of the primary goals of this book is to cohesively unite the fields
of statistics and estimation, mathematical inversion, machine learning, and
radiative transfer in the context of the atmospheric retrieval problem. A
principal challenge in this endeavor is to reconcile the often disparate sets
of mathematical notation used in the literature for each field. For example,
Rodgers’ classic treatment of the retrieval of atmospheric state variables [6]
denotes the state vector to be retrieved (i.e., the output of the retrieval
algorithm) as “x,” whereas the statistical and machine learning literature
almost always reserves x to denote the input of the algorithm. Our objective
has therefore been to develop a notational convention that maximizes the
commonality of the notations of the various communities. To minimize
confusion and ambiguity, we have also tried to choose mnemonic notations,
where possible. The conventions and variable notations we have adopted are
shown in Tables 1.1 and 1.2, respectively.

Table 1.1
Mathematical Notation

Variable and Operator Types                 Notation
Scalars and functions with scalar output    Lowercase letters
Vectors                                     Uppercase letters
Matrices                                    Boldface uppercase letters
Vector-valued functions                     Boldface lowercase letters
Expected value                              E(·)
Transpose                                   (·)^T
Noisy variable                              $\widetilde{(\cdot)}$
Estimate                                    $\widehat{(\cdot)}$

Table 1.2
Variable Names

Variable Name                                    Notation
Radiance measurement vector (retrieval input)    R
Atmospheric state vector (retrieval output)      S
Neural network target (or truth) vector          T
Generic input vector                             X
Generic output vector                            Y
Noise vector                                     Ψ
Weight matrix                                    W
Covariance matrix                                C
Noise covariance matrix                          C_ΨΨ
Error covariance matrix                          C
Kernel function                                  k(X, X′)
Feature map                                      Φ(X)

References
[1] D. H. Staelin. “Passive remote sensing at microwave wavelengths.” Proceedings of the
IEEE, 57(4):427–439, April 1969.
[2] G. L. Stephens. Remote Sensing of the Lower Atmosphere. Oxford University Press, New
York, 1994.
[3] P. M. Atkinson and A. R. L. Tatnall. “Introduction to neural networks in remote sensing.”
Int. J. Remote Sensing, 18(4):699–709, 1997.
[4] V. M. Krasnopolsky and F. Chevallier. “Some neural network applications in
environmental sciences. Part I: Forward and inverse problems in geophysical remote
measurements.” Neural Netw., 16(3-4):321–334, 2003.
[5] V. M. Krasnopolsky and F. Chevallier. “Some neural network applications in
environmental sciences. Part II: Advancing computational efficiency of environmental
numerical models.” Neural Netw., 16(3-4):335–348, 2003.
[6] C. D. Rodgers. “Retrieval of atmospheric temperature and composition from remote
measurements of thermal radiation.” J. Geophys. Res., 41(7):609–624, July 1976.

2
Physical Background of
Atmospheric Remote Sensing
We begin with a broad overview of relevant physical issues in passive
atmospheric sounding to provide background and context to results developed
later in the book. For additional details, the reader is referred to excellent
references on tropospheric remote sensing [1], atmospheric science [2, 3],
atmospheric radiation [4], and electromagnetic wave propagation [5].

2.1 Overview of the Composition and Thermal
Structure of the Earth’s Atmosphere
The Earth’s atmosphere extends over 100 km from its surface, and can roughly
be categorized into four layers based on the thermal and chemical phenomena
that occur within each layer. These layers are (in increasing altitude) the
troposphere, the stratosphere, the mesosphere, and the thermosphere. The
boundaries between adjacent layers are usually not well defined, but do show
characteristic features. They are the tropopause, stratopause, and mesopause,
respectively. The troposphere extends from the surface to an altitude of
approximately 12 km (as low as 7 km near the poles and as high as 17 km
near the equator) and is characterized by a steady decrease in temperature
with altitude. Approximately 80% of the total mass of the atmosphere
is contained in the troposphere, and almost all of the Earth’s weather is
created there. The troposphere is therefore the focus of most atmospheric
sounding research, including the examples presented in this book. The
tropopause marks the region of the atmosphere where the temperature
gradually changes from decreasing with altitude to increasing with altitude,
and forms a somewhat nebulous boundary layer between the troposphere and
the stratosphere. The stratosphere extends to an altitude of approximately
40 km, and is characterized by relatively high concentrations of ozone
(a few parts per million). A sharp increase in temperature with altitude
occurs in the stratosphere due to the absorption of ultraviolet radiation
by ozone. High cirrus clouds sometimes form in the lower stratosphere,
but for the most part there are no significant weather patterns in this
layer, and horizontal and vertical atmospheric variability is much smaller
than in the troposphere. The mesosphere extends from approximately 40
to 80 km, and is characterized by a decreasing temperature with altitude.
Extremely low temperatures (∼ −150 °C) present at the top of the mesosphere
sometimes allow the presence of noctilucent clouds, thought to be made
of ice crystals that have formed on dust particles. The transition from the
mesosphere to the thermosphere layer begins at an altitude of approximately
80 km. The thermosphere is characterized by warmer temperatures caused
by the absorption of the sun’s short-wave ultraviolet radiation. This radiation
penetrates the upper atmosphere and causes the atmospheric particles to
become positively charged. These ionized particles build up to form a series
of layers, often referred to as the ionosphere.

2.1.1 Chemical Composition of the Atmosphere
The Earth’s atmosphere is composed of a variety of gases. Each gas interacts
characteristically with electromagnetic radiation of a given frequency. This
relationship forms the physical basis by which the atmospheric temperature
can be measured by observing radiation of different frequencies that has been
emitted by and transmitted through the atmosphere.
The average fractional volumes of various species in the Earth’s
atmosphere are given in Table 2.1. Perhaps the most important gases in the
atmosphere, from the point of view of their interaction with electromagnetic
radiation, are water vapor, oxygen, carbon dioxide, and ozone. Oxygen and
carbon dioxide are well-mixed in the atmosphere below approximately 100
km, and therefore frequencies near the resonances of these molecules are
desirable for temperature sounding. The vertical distribution of ozone reaches
maximum concentration near 25 km. Above 30 km, ozone is rapidly formed
by photochemical reactions from oxygen so that an equilibrium is established
during the daylight hours. Below this level, ozone is created more slowly and
is highly variable [1]. Water vapor is perhaps the most influential atmospheric
gas from the perspective of weather and climate processes. This is primarily
due to its high temporal and spatial variability in the lower troposphere and its
large role in energy transfer.

Table 2.1
Composition of the Earth’s Atmosphere (Source: [1])

Molecule            Volume Fraction†    Comments
N2                  0.7808              Photochemical dissociation high in the ionosphere; mixed at lower levels
O2                  0.2095              Photochemical dissociation above 95 km; mixed at lower levels
H2O                 < 0.04              Highly variable; photodissociates above 80 km
Ar                  9.34 × 10^-3        Mixed up to 110 km; diffusive separation above
CO2                 3.45 × 10^-4        Slightly variable; mixed up to 100 km; dissociated above
CH4                 1.6 × 10^-6         Mixed in troposphere; dissociated in mesosphere
N2O                 3.5 × 10^-7         Slightly variable at surface; dissociated in stratosphere and mesosphere
CO                  7 × 10^-8           Variable photochemical and combustion product
O3                  ∼ 10^-8             Highly variable; photochemical origin
CFCl3 and CF2Cl2    1–2 × 10^-10        Industrial origin; mixed in troposphere, dissociated in stratosphere

† Fraction of lower tropospheric air.

2.1.2 Vertical Distribution of Pressure and Density
The pressure and density of the Earth’s atmosphere can vary substantially
in the vertical dimension. It is therefore helpful to define a reference or
“standard”¹ atmosphere that is a representation of the atmosphere as a
function of height only. Below an altitude of 100 km, the atmospheric
pressure and density are almost always within ±30% of that of the standard
atmosphere [1].
Atmospheric density decreases with altitude due to the Earth’s gravitational field. If a condition of static equilibrium is assumed, the relationship
between density and pressure as a function of altitude may be expressed by
the following differential equation:
\[ dp = -g\rho\,dz \tag{2.1} \]

where p and ρ are the pressure and density at altitude z measured vertically
upward from the surface. The change in gravitational force with altitude is
small enough over the relatively short extent of the atmosphere to be ignored.
The ideal gas equation pV = nRT can be used to relate the density of an ideal
gas of molecular weight M_r to its temperature and pressure:

\[ \rho = \frac{M_r\,p}{R\,T} \tag{2.2} \]

where R is the gas constant per mole, and T is the temperature (K).
Equation (2.1) can then be expressed as

\[ \frac{dp}{p} = -\frac{dz}{H} \tag{2.3} \]

which can be integrated to find the pressure p at altitude z:

\[ p = p_0 \exp\left(-\int_0^z \frac{dz}{H}\right) \tag{2.4} \]

where p_0 is the surface pressure and H = RT/(M_r g) is known as the scale
height. The scale height is the increase in altitude necessary to reduce the
pressure by a factor of e. In the troposphere, H typically varies from
∼6 km at T = 210 K to ∼8.5 km at T = 290 K [2].
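
As a numerical check on the scale heights quoted above, the following minimal Python sketch evaluates H = RT/(M_r g) and the pressure profile of (2.4). It assumes an isothermal layer, so that H is constant and the integral reduces to z/H; the constant values (in particular the mean molar mass of dry air and the default surface pressure) and the function names are assumptions of this example rather than values taken from the text.

```python
import math

R_GAS = 8.314     # gas constant per mole, J/(mol K)
M_AIR = 0.02896   # assumed mean molar mass of dry air, kg/mol
G = 9.81          # gravitational acceleration, m/s^2, taken as constant

def scale_height(temperature_k):
    """Scale height H = R T / (M_r g), in meters."""
    return R_GAS * temperature_k / (M_AIR * G)

def pressure(z_m, temperature_k, p0_hpa=1013.25):
    """Pressure (hPa) at altitude z_m (m) for an isothermal layer.

    With constant H, equation (2.4) reduces to p = p0 * exp(-z / H).
    """
    return p0_hpa * math.exp(-z_m / scale_height(temperature_k))

if __name__ == "__main__":
    for t in (210.0, 290.0):
        print(f"T = {t:.0f} K: H = {scale_height(t) / 1000.0:.1f} km")
```

In the real atmosphere T varies with altitude, so the integral in (2.4) would have to be evaluated layer by layer.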

¹ The Earth’s atmosphere does vary substantially in the horizontal dimension and in time, on global and seasonal scales. A wide variety of “standard” atmospheres have been tabulated for
various geographical regions and seasonal periods [3].

2.1.3 Thermal Structure of the Atmosphere
The macroscopic thermal features of the atmosphere were outlined previously.
We now examine features that occur on a finer vertical scale, with a focus
on the lower troposphere. The bottom 1–2 km of the atmosphere exhibits
the greatest thermal variability due to strong surface interactions and diurnal
variations. At some latitudes, temperature inversions exist in the lowest 2–
3 km of the atmosphere. Above 3 km, there is a systematic decrease of
temperature with altitude that can be characterized by an adiabatic lapse
rate, as follows. Continuing the assumption of the previous section that the
atmosphere is in hydrostatic equilibrium, the first law of thermodynamics can
be applied to a unit “parcel” of atmospheric mass:
$$ dq = c_v\,dT + p\,dV \tag{2.5} $$

where cv is the specific heat at constant volume. Provided no heat enters
or leaves the parcel (i.e., the process is adiabatic), the quantity dq is zero.
Equation (2.5) can then be combined with the differential form of the ideal
gas law and the hydrostatic relation (2.1) to yield:
$$ \frac{dT}{dz} = -\frac{g}{c_p} = -\Gamma \tag{2.6} $$
where cp is the specific heat at constant pressure and Γ is the lapse rate.
Equation (2.6) shows that the change in temperature with altitude is constant,
for constant cp and g. Under typical tropospheric conditions, cp varies
slightly with altitude, and the dry adiabatic lapse rate in the troposphere is
approximately 10 K/km. If the latent heat released by the condensation of
rising moist air is considered, the average lapse rate is approximately 6.5 K/km.
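
The intermediate steps between (2.5) and (2.6) can be sketched as follows. This is one standard route, written per unit mass, and it introduces two symbols that do not appear in the text: the specific volume V = 1/ρ and the specific gas constant R′ = R/M_r.

$$
\begin{aligned}
0 &= c_v\,dT + p\,dV && \text{(adiabatic case of (2.5), } dq = 0\text{)} \\
p\,dV + V\,dp &= R'\,dT && \text{(differential of } pV = R'T\text{)} \\
(c_v + R')\,dT &= V\,dp && \text{(eliminating } p\,dV\text{)} \\
c_p\,dT &= \frac{dp}{\rho} = -g\,dz && \text{(using } c_p = c_v + R' \text{ and (2.1))} \\
\frac{dT}{dz} &= -\frac{g}{c_p}
\end{aligned}
$$

With g ≈ 9.8 m/s² and c_p ≈ 1004 J kg⁻¹ K⁻¹ for dry air (an assumed representative value), this gives Γ ≈ 9.8 K/km, consistent with the figure of approximately 10 K/km quoted above.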
2.1.4 Cloud Microphysics
Clouds affect the energy balance of the atmosphere through two mechanisms:
(1) water cycle changes, including the release of latent heat through
condensation and the removal of liquid water through precipitation, and
(2) radiation budget changes, including the scattering, absorption, and
emission of solar and terrestrial radiation. In Chapter 9, the microphysical
properties of clouds (taken here to mean the size and shape of the particles
and their volume concentration) will be used to characterize regions of
precipitation by examining their interaction with microwave radiation. It is
therefore useful to review several of the salient details of the microphysical
structure of clouds and precipitation.
The microphysical properties of clouds depend strongly on the size, shape,
and phase of the water particles. Water droplets are typically smaller than
100 µm and are spherical [1]. The distribution of water droplet concentration
(the number of droplets per volume existing in a differential radius range
dr) is reasonably approximated by analytic functions. A modified Gamma
distribution is often used for this purpose. Table 2.2 gives average values of
the number concentration (N0), mean droplet radius (rm), and cloud liquid water
density (l) for a variety of clouds. Raindrops are generally nonspherical,
resembling oblate spheroids with an aspect ratio (width-to-length ratio) that
decreases as the drop size increases. One analytic function that is commonly
used to relate raindrop size distributions to rainrate is the Marshall-Palmer
distribution [7]. Ice crystals form in a wide variety of sizes and shapes. In
addition to simple polyhedron forms, irregular crystals or combinations of
simple shapes readily appear in nature.

Table 2.2
Representative Drop Size Concentrations (N0), Mean Particle Radius (rm), and
Liquid Water Content (l) for Several Cloud Types (Source: [6])

Cloud Type              N0 (cm−3)    rm (µm)    l (g m−3)
Stratus (ocean)         50           10         0.1–0.5
Stratus (land)          300–400      6          0.1–0.5
Fair-weather cumulus    300–500      4          0.3
Maritime cumulus        50           15         0.5
Cumulonimbus            70           20         2.5
Cumulus congestus       60           24         2.0
Altostratus             200–400      5          0.6
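
The Marshall-Palmer distribution mentioned above is not written out in this chapter. As a hedged illustration, the sketch below uses its commonly quoted form, N(D) = N0 exp(−ΛD) with N0 = 8000 m⁻³ mm⁻¹ and Λ = 4.1 R^(−0.21) mm⁻¹ for a rainrate R in mm/h; these constants come from the general literature rather than from this text, and the function name is illustrative.

```python
import math

N0_MP = 8000.0  # intercept parameter, m^-3 mm^-1 (commonly quoted value; assumption)

def marshall_palmer(diameter_mm, rain_rate_mm_per_hr):
    """Drop concentration per unit diameter, N(D), in m^-3 mm^-1."""
    slope = 4.1 * rain_rate_mm_per_hr ** -0.21   # Lambda, mm^-1
    return N0_MP * math.exp(-slope * diameter_mm)

if __name__ == "__main__":
    for rate in (1.0, 10.0, 50.0):
        print(rate, [round(marshall_palmer(d, rate), 2) for d in (0.5, 1.0, 2.0, 4.0)])
```

The exponential slope flattens as the rainrate increases, so heavier rain contains proportionally more large drops.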

2.2 Electromagnetic Wave Propagation
The thermal and compositional state of the atmosphere affects both the
generation and propagation of electromagnetic (EM) waves. For now, we
ignore the source of the EM waves and focus instead on their propagation
through a homogeneous, lossless medium.

2.2.1 Maxwell’s Equations and the Wave Equation
In a source-free, homogeneous, and isotropic medium with permittivity ε and
permeability µ, the spatial and temporal variations of the electric and magnetic
fields are related according to Maxwell’s equations:
$$ \nabla \times \vec{E} = -\mu \frac{\partial \vec{H}}{\partial t} \tag{2.7} $$
$$ \nabla \times \vec{H} = \epsilon \frac{\partial \vec{E}}{\partial t} \tag{2.8} $$
$$ \nabla \cdot \vec{E} = 0 \tag{2.9} $$
$$ \nabla \cdot \vec{H} = 0 \tag{2.10} $$

A wave equation can be derived by taking the curl of (2.7) and substituting
(2.8). After using the vector identity $\nabla \times (\nabla \times \vec{E}) = \nabla(\nabla \cdot \vec{E}) - \nabla^2 \vec{E}$ and
(2.9), we find:
$$ \nabla^2 \vec{E} = \mu\epsilon \frac{\partial^2 \vec{E}}{\partial t^2} \tag{2.11} $$
where the Laplacian operator ∇² in a rectangular coordinate system is
$$ \nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} $$
The wave equation (2.11) is a second-order partial differential equation in the
space and time coordinates x, y, z, and t. A simple solution to the wave
equation is
$$ \vec{E}(\vec{r}, t) = \vec{E}_0 \cos(2\pi\nu t \pm \vec{k} \cdot \vec{r}) \tag{2.12} $$
where $\vec{k} = \hat{x}k_x + \hat{y}k_y + \hat{z}k_z$ and $\vec{r} = \hat{x}x + \hat{y}y + \hat{z}z$, provided that $|\vec{k}| = 2\pi\nu\sqrt{\mu\epsilon}$. Equation (2.12) represents two waves propagating in opposite directions along $\vec{k}$
with temporal phase 2πνt and spatial phase $\vec{k} \cdot \vec{r}$. A spectrum of values
of frequency (ν) found in atmospheric remote sensing systems is shown in
Figure 2.1.

2.2.2 Polarization
The electric field vector of a uniform plane wave traveling in the +z direction
must lie in the xy-plane perpendicular to the z-axis. As time progresses, the
tip of the electric field vector traces a curve in the xy-plane. It is the shape of
this curve (linear, circular, or elliptical) that determines the polarization of the
plane wave. If the curve is circular or elliptical, the tip may move in either a
clockwise or counterclockwise direction. The interaction of electromagnetic
waves with matter often depends (sometimes entirely) on the polarization state
of the wave. Some remote sensing systems exploit polarization dependence
to extract information about a polarized target. For example, polarimetric
microwave measurements of the ocean surface reveal information about sea
surface wind speed and direction due to the polarimetric signature of the
resulting ocean waves.

Figure 2.1 The electromagnetic spectrum. The diagram shows those parts of the
electromagnetic spectrum that are important in remote sensing, together
with the conventional names of the various regions of the spectrum. Also
shown are wavelength regions of sensors mentioned in this book: AIRS,
AMSU, IASI, CrIS, and ATMS. (After: [1].)

2.2.2.1 Stokes Parameters

A mathematical discussion of polarization can be facilitated by decomposing
the $\vec{E}$ vector into two components perpendicular to the direction of propagation, $\vec{k}$, for a fixed point in space:
$$ \vec{E} = \hat{h}E_h + \hat{v}E_v = \hat{h}\,e_h \cos(2\pi\nu t - \varphi_h) + \hat{v}\,e_v \cos(2\pi\nu t - \varphi_v) \tag{2.13} $$
where $\vec{k}$, ĥ, and v̂ form a mutually orthogonal system. The four Stokes
parameters may then be defined as follows:
$$ I = \frac{1}{\eta}\left(e_h^2 + e_v^2\right) \tag{2.14} $$
$$ Q = \frac{1}{\eta}\left(e_h^2 - e_v^2\right) \tag{2.15} $$
$$ U = \frac{2}{\eta}\,e_h e_v \cos(\varphi) \tag{2.16} $$
$$ V = \frac{2}{\eta}\,e_h e_v \sin(\varphi) \tag{2.17} $$
where ϕ is the phase difference ϕh − ϕv and $\eta = \sqrt{\mu/\epsilon}$ is the characteristic
impedance. In practice, it is often easier to measure the Stokes parameters
than to measure eh, ev, and ϕ directly. For the completely polarized wave of (2.13), the four Stokes parameters are
related as follows:
$$ I^2 = Q^2 + U^2 + V^2 \tag{2.18} $$
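
As an illustration of (2.14) through (2.18), the sketch below computes the Stokes parameters from the amplitudes and phase difference of a single fully polarized wave. The function name and the default free-space impedance of about 376.73 Ω are assumptions made for the example.

```python
import math

ETA_0 = 376.73  # characteristic impedance of free space, ohms (assumed medium)

def stokes_parameters(e_h, e_v, phase_diff_rad, impedance_ohm=ETA_0):
    """Return (I, Q, U, V) from amplitudes e_h, e_v and phase difference phi_h - phi_v."""
    i = (e_h**2 + e_v**2) / impedance_ohm
    q = (e_h**2 - e_v**2) / impedance_ohm
    u = 2.0 * e_h * e_v * math.cos(phase_diff_rad) / impedance_ohm
    v = 2.0 * e_h * e_v * math.sin(phase_diff_rad) / impedance_ohm
    return i, q, u, v

if __name__ == "__main__":
    # Equal amplitudes with a 90-degree phase difference: circular polarization.
    i, q, u, v = stokes_parameters(1.0, 1.0, math.pi / 2.0)
    print(i, q, u, v)
    print("I^2 - (Q^2 + U^2 + V^2) =", i**2 - (q**2 + u**2 + v**2))
```

For circular polarization Q and U vanish and V carries all of the intensity, while a purely horizontal wave (e_v = 0) gives Q = I.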
2.2.3 Reflection and Transmission at a Planar Boundary
Electromagnetic radiation from the sun or the cosmic background can
reflect off cloud tops and the surface of the Earth. The characterization
of the transmitted and reflected components of radiation is necessary to
develop cloud and surface models. Consider a linearly polarized plane wave
propagating in free space along direction $\vec{k}_i$ that is incident upon a planar
dielectric material with index of refraction $n = c\sqrt{\mu\epsilon}$ at an incidence angle of
θi. The electric fields for the incident, reflected, and transmitted waves can be
expressed as (assuming a TE wave propagating in the xz-plane):
$$
\begin{aligned}
\vec{E}_i &= \hat{y}\,E_i\,e^{-j\vec{k}_i \cdot \vec{r}} \\
\vec{E}_r &= \hat{y}\,\Gamma E_i\,e^{-j\vec{k}_r \cdot \vec{r}} \\
\vec{E}_t &= \hat{y}\,T E_i\,e^{-j\vec{k}_t \cdot \vec{r}}
\end{aligned}
\tag{2.19}
$$
where Γ and T are the complex reflection and transmission coefficients,
respectively. The tangential components of the net electric field must be
continuous across the boundary, requiring the tangential components of all three $\vec{k}$ vectors to
be equal along the boundary. The tangential components of the $\vec{k}$ vectors can
be expressed in terms of the angles of incidence, reflection, and transmission
to yield
$$ k_i \sin\theta_i = k_r \sin\theta_r = k_t \sin\theta_t \tag{2.20} $$

where $k_i = k_r = \omega\sqrt{\mu_i \epsilon_i}$ is the magnitude of the propagation vectors $\vec{k}_i$ and
$\vec{k}_r$. The magnitude of the transmitted wave vector is $k_t = \omega\sqrt{\mu_t \epsilon_t}$, which is
in general not equal to $k_i$. Substitution into (2.20) gives the reflection law
$$ \theta_r = \theta_i \tag{2.21} $$
and Snell’s law
$$ \frac{\sin\theta_t}{\sin\theta_i} = \frac{k_i}{k_t} = \frac{n_i}{n_t} \tag{2.22} $$
Given $\vec{k}_r$ and $\vec{k}_t$, the complex reflection and transmission coefficients can be
found by supplementing the boundary condition for continuity of the electric
field with a similar equation for the tangential magnetic field. For TE waves,
$$ \Gamma_{TE} = \frac{\eta_t \cos\theta_i - \eta_i \cos\theta_t}{\eta_t \cos\theta_i + \eta_i \cos\theta_t} \tag{2.23} $$
$$ T_{TE} = \frac{2\eta_t \cos\theta_i}{\eta_t \cos\theta_i + \eta_i \cos\theta_t} \tag{2.24} $$
and for TM waves,
$$ \Gamma_{TM} = \frac{\eta_i \cos\theta_i - \eta_t \cos\theta_t}{\eta_i \cos\theta_i + \eta_t \cos\theta_t} \tag{2.25} $$
$$ T_{TM} = \frac{2\eta_i \cos\theta_i}{\eta_i \cos\theta_i + \eta_t \cos\theta_t} \tag{2.26} $$


As an important consequence of the preceding equations, unpolarized
radiation incident upon a planar dielectric surface can become partially or
totally polarized on reflection. For example, a portion of the unpolarized
microwave radiation emitted by the atmosphere is reflected by the ocean
surface and another portion is absorbed and re-emitted by the ocean surface.
However, the TE and TM components of the emitted radiation are different
when viewed from oblique angles, a characteristic that can be used to
discriminate surface water from rainfall [1].
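
The polarization dependence of reflection can be illustrated by evaluating (2.22) through (2.26) directly. The sketch below is a simplified example that assumes lossless, nonmagnetic media with real refractive indices, so that each characteristic impedance is proportional to 1/n; the function name and argument names are illustrative.

```python
import cmath
import math

def fresnel_coefficients(theta_i_rad, n_t, n_i=1.0):
    """Return (Gamma_TE, T_TE, Gamma_TM, T_TM) following (2.23)-(2.26).

    Assumes real refractive indices and nonmagnetic media, so the
    impedances can be taken as eta = 1/n (in units of the free-space value).
    """
    sin_t = n_i * math.sin(theta_i_rad) / n_t        # Snell's law (2.22)
    cos_t = cmath.sqrt(1.0 - sin_t**2)               # complex beyond the critical angle
    cos_i = math.cos(theta_i_rad)
    eta_i, eta_t = 1.0 / n_i, 1.0 / n_t
    gamma_te = (eta_t * cos_i - eta_i * cos_t) / (eta_t * cos_i + eta_i * cos_t)
    t_te = 2.0 * eta_t * cos_i / (eta_t * cos_i + eta_i * cos_t)
    gamma_tm = (eta_i * cos_i - eta_t * cos_t) / (eta_i * cos_i + eta_t * cos_t)
    t_tm = 2.0 * eta_i * cos_i / (eta_i * cos_i + eta_t * cos_t)
    return gamma_te, t_te, gamma_tm, t_tm

if __name__ == "__main__":
    n = 1.5  # illustrative dielectric, not a value from the text
    for deg in (0.0, 30.0, math.degrees(math.atan(n)), 80.0):
        g_te, _, g_tm, _ = fresnel_coefficients(math.radians(deg), n)
        print(f"{deg:6.2f} deg  |Gamma_TE|^2 = {abs(g_te)**2:.4f}  |Gamma_TM|^2 = {abs(g_tm)**2:.4f}")
```

At the Brewster angle (tan θi = n_t/n_i) the TM reflection coefficient vanishes, so unpolarized incident radiation is reflected as purely TE radiation, which is the total polarization on reflection described above.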

2.3 Absorption of Electromagnetic Waves by
Atmospheric Gases
A knowledge of the mechanisms of electromagnetic radiation interaction with
matter, as well as some of the fundamental properties of matter itself, is
necessary to infer and interpret information about the atmosphere. In the
following two subsections, the interactions are described on a microscopic
(molecular) and macroscopic (particle) level.

2.3.1 Mechanisms of Molecular Absorption
The total internal energy of an isolated molecule consists of three types of
energy states,
$$ E = E_e + E_v + E_r \tag{2.27} $$
where Ee = electronic energy, Ev = vibrational energy, and Er = rotational
energy. Rotational energy is associated with rotational motions of the atoms of
the molecule about its center of mass, and vibrational energy is associated with
vibrational motions of the atoms about their equilibrium positions. Radiation
is absorbed or emitted when a transition takes place from one energy state to
another. The frequency (ν) of the absorbed (or emitted) photon is given by the
Bohr frequency condition,
$$ \nu = \frac{E_h - E_l}{h} \tag{2.28} $$
where h is Planck’s constant and Eh and El are the internal energies of the
higher and lower molecular states, respectively. The absorption spectrum due
to a single transition is called an absorption line. Absorption by molecules in
the mid- and near-infrared occurs by vibration (although a mixture of vibrations
and rotations is usually induced at these frequencies). In the microwave
and far-infrared, rotational transitions are the dominant mechanism of energy
transfer.
2.3.2 Line Shapes
Based on (2.28), the absorption (or emission) spectrum of an isolated,
unperturbed, stationary molecule consists of sharply defined frequency lines
corresponding to transitions between quantized energy levels of the molecule.
Atmospheric gases, however, consist of molecules that are in constant
motion, interacting and colliding with one another. These disturbances
cause the absorption lines to broaden. The two most important sources of
line broadening are Doppler (thermal) broadening and pressure (collision)
broadening, which is dominant for most frequencies up to an altitude of
approximately 40 km [8].
2.3.3 Absorption Coefficients and Transmission Functions
Line shape f (ν − ν0 ), line position (ν0 ), and line strength (S) mathematically
define the absorption coefficient:
$$ \kappa_\nu = S\,f(\nu - \nu_0) \tag{2.29} $$

The line strength of a specific atmospheric gas is governed by the number of
absorbing molecules of that gas per unit volume, the temperature of the gas,
and the molecular parameters associated with that transition.
Absorption of radiation by gases in the Earth’s atmosphere is described
in terms of transmission functions (or simply, transmittance). Lambert’s law
states that the change in radiance intensity along a path ds is proportional to
the amount of matter along the path:
$$ dR_\nu = -\kappa_\nu R_\nu\,ds \tag{2.30} $$
where κν is the volume absorption coefficient. Integration of Lambert’s law
along the path connected by s1 and s2 yields
$$ R_\nu(s_2) = \mathcal{T}_\nu(s_1, s_2)\,R_\nu(s_1) \tag{2.31} $$
where $\mathcal{T}_\nu(s_1, s_2)$ is the monochromatic transmittance defined as
$$ \mathcal{T}_\nu(s_1, s_2) = e^{-\int_{s_1}^{s_2} \kappa_\nu\,ds} \tag{2.32} $$
The optical path (or thickness)² between s1 and s2 is defined as
$$ \tau_\nu(s_1, s_2) = \int_{s_1}^{s_2} \kappa_\nu\,ds \tag{2.33} $$


The absorption coefficient, transmittance, and optical path form the mathematical basis for the subject of Section 2.5.2: radiative transfer. In practice,
these quantities are not monochromatic, but band-averaged over some spectral
response function of the instrument.
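
The relationship between the absorption coefficient, the optical path (2.33), and the transmittance (2.32) can be sketched numerically. The example below assumes a monochromatic absorption coefficient supplied as a function of path position and uses a simple trapezoidal rule; the function names and the sample profile are illustrative assumptions.

```python
import math

def optical_path(kappa, s1, s2, n_steps=1000):
    """Trapezoidal approximation of (2.33); kappa(s) is in 1/m, s in m."""
    ds = (s2 - s1) / n_steps
    total = 0.5 * (kappa(s1) + kappa(s2))
    for i in range(1, n_steps):
        total += kappa(s1 + i * ds)
    return total * ds

def transmittance(kappa, s1, s2, n_steps=1000):
    """Monochromatic transmittance (2.32): exp(-optical path)."""
    return math.exp(-optical_path(kappa, s1, s2, n_steps))

if __name__ == "__main__":
    # Hypothetical absorber whose concentration decays with a 2-km scale height.
    kappa = lambda s: 5.0e-4 * math.exp(-s / 2000.0)
    print("optical path :", optical_path(kappa, 0.0, 20000.0))
    print("transmittance:", transmittance(kappa, 0.0, 20000.0))
```

Because optical paths add along consecutive segments, the corresponding transmittances multiply, which is the property that (2.31) relies on.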
2.3.4 The Atmospheric Absorption Spectra
The atmospheric absorption spectrum for microwave frequencies is shown in
Figure 2.2. Notable features include the water vapor absorption lines centered
at 22.235, 183.31, and 325.15 GHz (lines at 380.20 and 448.00 GHz are
difficult to identify on the plot) and oxygen absorption lines near 60, 118.75,
368.50, 424.76, and 487.25 GHz. The atmospheric absorption spectrum
for infrared wavelengths between 3 and 15 µm is shown in Figure 2.3.
Notable features include the water vapor absorption lines near 6–7 µm, ozone
absorption lines near 10 µm, and carbon dioxide absorption lines near 4.3–4.6
µm and 13–15 µm.
2. The related quantities optical depth and opacity will be defined later.

Figure 2.2 The microwave absorption spectrum. Two calculations for the percent
transmission (nadir view) using the 1976 Standard Atmosphere are
shown, one assuming no water vapor and one assuming 1.5 g/cm².
[Axes: transmission (percent) versus frequency (GHz), 0–500 GHz; absorption features labeled H2O and O2.]

2.4 Scattering of Electromagnetic Waves by
Atmospheric Particles
In addition to the molecular absorption mechanisms discussed earlier,
electromagnetic waves are also scattered and absorbed by much larger
particles often found in the atmosphere, such as cloud water droplets,
raindrops, or even dust. The scattering of electromagnetic waves upon
interaction with atmospheric particles provides a tool that can be used to help
retrieve many microphysical parameters related to clouds and precipitation.
Figure 2.3 The thermal infrared absorption spectrum. Two calculations for the
percent transmission (nadir view) using the 1976 Standard Atmosphere
are shown, one assuming no water vapor and one assuming 1.5 g/cm².
[Axes: transmission (percent) versus wavelength (µm), 3–15 µm.]

2.4.1 Mie Scattering
A suspended particle of geometrical cross-section A will absorb a fraction of
incident power and will also scatter incident power in all directions. The ratio
of absorbed power Pa (W) to incident power density S (W/m²) is known as
the absorption cross-section
$$ C_a = \frac{P_a}{S} \tag{2.34} $$
and the ratio of Ca to the physical cross-section A is known as the efficiency
factor Qa. For a spherical particle of radius r, A = πr² and therefore
$$ Q_a = \frac{C_a}{\pi r^2} \tag{2.35} $$
Analogous quantities for scattering, that is, the scattering cross-section Cs and
the scattering efficiency Qs, are defined as
$$ C_s = \frac{P_s}{S} \tag{2.36} $$
$$ Q_s = \frac{C_s}{\pi r^2} \tag{2.37} $$
The total power lost due to absorption and scattering (together known as
the extinction) is Pa + Ps and the resulting extinction cross-section Ce and
efficiency Qe are
$$ C_e = C_a + C_s \tag{2.38a} $$
$$ Q_e = Q_a + Q_s \tag{2.38b} $$
The solution for the scattering and absorption of electromagnetic waves
in free space by a dielectric sphere of radius r was formulated by Mie in terms
of the “size parameter”
$$ \chi = \frac{2\pi r}{\lambda} \tag{2.39} $$
and
$$ n = \sqrt{\epsilon_c} \tag{2.40} $$
where λ is the wavelength of the incident wave, n is the complex refractive
index of the particle, and εc is the corresponding complex dielectric constant.
Mie’s expressions for the scattering and extinction efficiencies of the sphere
are given by
$$ Q_s(n, \chi) = \frac{2}{\chi^2} \sum_{m=1}^{\infty} (2m+1)\left(|a_m|^2 + |b_m|^2\right) \tag{2.41a} $$
$$ Q_e(n, \chi) = \frac{2}{\chi^2} \sum_{m=1}^{\infty} (2m+1)\,\mathrm{Re}\{a_m + b_m\} \tag{2.41b} $$
where am and bm are known as the Mie coefficients
$$ a_m = -\frac{j_m(n\chi)\,[\chi j_m(\chi)]' - j_m(\chi)\,[n\chi j_m(n\chi)]'}{j_m(n\chi)\,[\chi h_m(\chi)]' - h_m(\chi)\,[n\chi j_m(n\chi)]'} \tag{2.42a} $$
$$ b_m = -\frac{j_m(\chi)\,[n\chi j_m(n\chi)]' - n^2 j_m(n\chi)\,[\chi j_m(\chi)]'}{h_m(\chi)\,[n\chi j_m(n\chi)]' - n^2 j_m(n\chi)\,[\chi h_m(\chi)]'} \tag{2.42b} $$
where jm(·) and hm(·) are the spherical Bessel and Hankel functions of the
first kind, and the prime denotes differentiation with respect to the argument.
2.4.2 The Rayleigh Approximation
The Mie expressions for Qs and Qe can be approximated with negligible error
if the particle size is much smaller than the wavelength of the incident wave
($|n\chi| \ll 1$). The Rayleigh approximation is obtained by retaining only the
most significant terms in the series expansion:
$$ Q_s = \frac{8}{3}\chi^4 |K|^2 \tag{2.43} $$
$$ Q_e = 4\chi\,\mathrm{Im}\{-K\} + \frac{8}{3}\chi^4 |K|^2 \tag{2.44} $$
and
$$ Q_a = 4\chi\,\mathrm{Im}\{-K\} \tag{2.45} $$
where K is a complex quantity defined in terms of the complex index of
refraction n
$$ K = \frac{n^2 - 1}{n^2 + 2} = \frac{\epsilon_c - 1}{\epsilon_c + 2} \tag{2.46} $$
Note that in the Rayleigh limit the scattering efficiency scales as the fourth
power of frequency, whereas the absorption efficiency scales linearly with
frequency, for a fixed particle size and a frequency-independent index of
refraction. For water, the index of refraction is frequency-dependent, and the
absorption efficiency scales as frequency squared (for frequencies below 100
GHz or so) when this dependence is included.
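
The frequency scaling described above can be checked with a short numerical sketch of (2.43) through (2.46). It assumes the sign convention n = n′ − jn″ for a lossy medium (so that Im{−K} is nonnegative) and a frequency-independent refractive index; the sample index and function name are illustrative and are not values for water.

```python
import math

def rayleigh_efficiencies(radius_m, wavelength_m, n):
    """Return (Q_s, Q_a, Q_e) from (2.43)-(2.45); n may be complex (n' - j n'')."""
    chi = 2.0 * math.pi * radius_m / wavelength_m      # size parameter (2.39)
    n = complex(n)
    k = (n**2 - 1.0) / (n**2 + 2.0)                    # (2.46)
    q_s = (8.0 / 3.0) * chi**4 * abs(k)**2
    q_a = 4.0 * chi * (-k).imag
    return q_s, q_a, q_s + q_a

if __name__ == "__main__":
    n_demo = 3.0 - 1.0j   # hypothetical lossy dielectric
    for wavelength in (0.10, 0.05):   # halving the wavelength doubles the frequency
        print(wavelength, rayleigh_efficiencies(50e-6, wavelength, n_demo))
```

Doubling the frequency multiplies Qs by 16 and Qa by 2 in this sketch, as expected for a fixed particle size and a constant index of refraction.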
2.4.3 Comparison of Scattering and Absorption by
Hydrometeors
Figure 2.4 shows scattering and absorption contributions of water spheres,
both in the liquid and ice phases. Deirmendjian’s recursive procedure [9] was
used to calculate the Mie coefficients; 80 terms were used to approximate
the series. For liquid droplets, absorption is dominant in the Rayleigh region
and scattering is dominant in the Mie region. For ice, scattering is dominant
for all but the lowest microwave frequencies. The frequency dependence
of scattering and absorption can be used to retrieve information about the
particle size distributions of clouds, and the related quantity, rainrate. The
distribution and type of hydrometeors found in typical clouds vary widely,
and monodispersive models are inadequate. More complicated modeling is
beyond the scope of this book; an excellent discussion can be found in [10].

2.5 Radiative Transfer in a Nonscattering
Planar-Stratified Atmosphere
A sensor high above the Earth’s surface receives emission from the Earth
and its atmosphere, along with any reflected components of solar and
cosmic background radiation. Measurements of this emission allow the
retrieval of many atmospheric parameters, including the temperature and
water vapor profile, the amount of cloud liquid water, rainrates, and sea
surface temperatures.

Figure 2.4 Scattering and absorption efficiency for water spheres with 1-mm radius.
Liquid water spheres (273 K) are shown in the top plot and ice spheres
(266 K) are shown in the bottom plot.
[Axes in both plots: efficiency factor (logarithmic) versus frequency (GHz), 1–1000 GHz.]

2.5.1 Equilibrium Radiation: Planck and Kirchhoff’s Laws
The solution to the simple differential equation presented by Lambert’s
law (2.30) is referred to as Beer’s law
$$ R_\nu(s'') = R_\nu(s')\,e^{-\int_{s'}^{s''} \kappa_\nu(s)\,ds} \tag{2.47} $$
In addition to absorption of radiation by the gas contained within path s′ to
s″, Kirchhoff’s law states that if in thermal equilibrium, the gas also emits
radiation (per unit path length) in an amount proportional to the absorption coefficient κν:
$$ R_\nu^{\mathrm{emission}} = \kappa_\nu J_\nu(T) \tag{2.48} $$
where Jν(T) is the radiation intensity produced (at each of two orthogonal
polarizations) by a blackbody at temperature T and frequency ν:
$$ J_\nu(T) = \frac{h\nu^3}{c^2}\,\frac{1}{e^{h\nu/kT} - 1} \quad \mathrm{W \cdot m^{-2} \cdot sr^{-1} \cdot Hz^{-1}} \tag{2.49} $$
The Planck equation exhibits a nonlinear relationship between intensity and
temperature. The degree of the nonlinearity is frequency-dependent, and is
shown in Figure 2.5. The nonlinearity is most severe at the higher frequencies
(shorter wavelengths) and almost nonexistent at the microwave frequencies.
The approximation of the Planck radiance by the linear Taylor series term is
called the Rayleigh-Jeans (RJ) approximation, and the microwave brightness
temperature is defined as the scaled intensity:
$$ B_\nu = \frac{c^2}{2\nu^2 k}\,R_\nu \tag{2.50} $$
Note that if a radiometer is calibrated against a blackbody and all departures
from the Rayleigh-Jeans law are ignored, brightness temperature is effectively
redefined as
$$ B_\nu = \frac{c^2}{2\nu^2 k}\,R_\nu + \frac{h\nu}{2k} \tag{2.51} $$
and accuracy is better than 0.1 K for frequencies up to 300 GHz and terrestrial
temperatures. When extremely cold temperatures are encountered (e.g., the
cosmic background) corrections to the RJ approximation are needed.
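
The accuracy claim for (2.51) can be checked with a short calculation. The sketch below assumes that Rν in (2.50) and (2.51) denotes the intensity summed over both polarizations, that is, twice the single-polarization Planck intensity of (2.49); this reading is an assumption made so that the factor of 2 in (2.50) is consistent, and the function names are illustrative.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
K = 1.380649e-23     # Boltzmann constant, J/K
C = 299792458.0      # speed of light, m/s

def planck_single_pol(freq_hz, temp_k):
    """Single-polarization blackbody intensity, (2.49), in W m^-2 sr^-1 Hz^-1."""
    x = H * freq_hz / (K * temp_k)
    return H * freq_hz**3 / C**2 / math.expm1(x)

def brightness_temperature(freq_hz, radiance_two_pol, corrected=True):
    """(2.50), or (2.51) when corrected=True; radiance summed over both polarizations."""
    b = C**2 * radiance_two_pol / (2.0 * freq_hz**2 * K)
    if corrected:
        b += H * freq_hz / (2.0 * K)
    return b

if __name__ == "__main__":
    freq = 300e9
    for temp in (300.0, 250.0, 200.0, 2.736):
        radiance = 2.0 * planck_single_pol(freq, temp)
        print(temp, brightness_temperature(freq, radiance, corrected=False),
              brightness_temperature(freq, radiance))
```

Under this assumption the corrected brightness temperature stays within 0.1 K of the physical temperature at 300 GHz for terrestrial temperatures, while at the cosmic background temperature it differs by several kelvin.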
Figure 2.5 Nonlinearity of the Planck function as a function of wavelength.
[Axes: Radiance(T)/Radiance(300) versus temperature (K), 200–300 K; curves for 5 mm, 15 µm, and 4 µm.]

2.5.2 Radiative Transfer Due to Emission and Absorption
The net change in radiation along ds due to the combination of emission and
absorption is
$$ dR_\nu = dR_\nu^{\mathrm{emission}} + dR_\nu^{\mathrm{absorption}} \tag{2.52} $$
Substitution of (2.30) and (2.48) into (2.52) yields the Schwarzschild equation
$$ \frac{dR_\nu}{ds} = -\kappa_\nu\,[R_\nu - J_\nu(T)] \tag{2.53} $$

which mathematically describes how radiation is transferred from one layer to
another layer as a result of absorption and emission. The intensity of radiation
leaving the path is therefore a function of both the absorber along the path
and the temperature along the path. Passive (emission-based) sounding of
constituent concentration and temperature is based upon this principle.
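
The behavior of (2.53) along a path can be sketched by stepping through thin layers. The example below treats each step as a homogeneous layer, for which (2.53) has the exact solution R → R e^(−κ Δs) + J (1 − e^(−κ Δs)); the function name, the callable arguments, and the sample numbers (in arbitrary radiance units) are assumptions for illustration.

```python
import math

def propagate_radiance(r_start, kappa, source, s_start, s_end, n_steps=1000):
    """Integrate (2.53) from s_start to s_end.

    kappa(s) is the absorption coefficient (1/m) and source(s) the Planck
    term J evaluated at the local temperature; both are sampled at step midpoints.
    """
    ds = (s_end - s_start) / n_steps
    r = r_start
    for i in range(n_steps):
        s_mid = s_start + (i + 0.5) * ds
        t = math.exp(-kappa(s_mid) * ds)            # layer transmittance
        r = r * t + source(s_mid) * (1.0 - t)       # attenuated input plus layer emission
    return r

if __name__ == "__main__":
    # Homogeneous slab: the output relaxes from the incident value toward J.
    for length in (1e3, 1e4, 1e5):
        print(length, propagate_radiance(5.0, lambda s: 2e-4, lambda s: 100.0, 0.0, length))
```

For an optically thin path the incident radiation dominates, whereas for an opaque path the emerging radiation approaches the source term J, so the measurement carries information on both the absorber amount and the temperature along the path.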
2.5.3 Integral Form of the Radiative Transfer Equation
Differentiation of (2.33) gives
$$ d\tau_\nu(s) = -\kappa_\nu(s)\,ds \tag{2.54} $$
where we adopt the convention that τ increases from zero downward from the
top of the atmosphere to a maximum value τ* (the opacity of the atmosphere)
at the surface. Multiplying both sides of (2.53) by $e^{-\tau_\nu(s)}$ and combining
terms gives
$$ \frac{d\left[R_\nu\,e^{-\tau_\nu(s)}\right]}{d\tau_\nu} = -J_\nu\,e^{-\tau_\nu(s)} \tag{2.55} $$
which upon integration from path s′ to s″ yields
$$ R_\nu(s'')\,e^{-\tau_\nu(s'')} - R_\nu(s')\,e^{-\tau_\nu(s')} = \int_{\tau_\nu(s'')}^{\tau_\nu(s')} J_\nu(s)\,e^{-\tau_\nu(s)}\,d\tau_\nu(s) \tag{2.56} $$
Equation (2.56) can be rearranged into the integral form of the radiative
transfer equation as follows:
$$ R_\nu(s'') = R_\nu(s')\,e^{-[\tau_\nu(s') - \tau_\nu(s'')]} + \int_{\tau_\nu(s'')}^{\tau_\nu(s')} J_\nu(s)\,e^{-[\tau_\nu(s) - \tau_\nu(s'')]}\,d\tau_\nu(s) \tag{2.57} $$
The equivalent relation in terms of the absorption coefficient κν is
$$ R_\nu(s'') = R_\nu(s')\,e^{-\int_{s'}^{s''} \kappa_\nu(s)\,ds} + \int_{s'}^{s''} \kappa_\nu(s)\,J_\nu(s)\,e^{-\int_{s}^{s''} \kappa_\nu(\sigma)\,d\sigma}\,ds \tag{2.58} $$
The angular properties of emission have thus far been neglected, but
can easily be included for the case of a horizontally homogeneous vertically
stratified atmosphere by noting that an angular tilt of θ results in an increase
in the path length by a factor of sec θ (see Figure 2.6). Optical depth is related
to optical path as follows:
$$ \tau(s) = \tau(z)\sec\theta \tag{2.59} $$
After including the angular terms, the final form of the radiative transfer
equation describing the radiation intensity observed at altitude L and viewing
angle θ can be formulated by including reflected atmospheric and cosmic
contributions and the radiance emitted by the surface:
$$
\begin{aligned}
R_\nu(L) = {} & \int_0^L \kappa_\nu(z)\,J_\nu[T(z)]\,e^{-\int_z^L \sec\theta\,\kappa_\nu(z')\,dz'}\,\sec\theta\,dz \\
& + \rho_\nu\,e^{-\tau^* \sec\theta} \int_0^L \kappa_\nu(z)\,J_\nu[T(z)]\,e^{-\int_0^z \sec\theta\,\kappa_\nu(z')\,dz'}\,\sec\theta\,dz \\
& + \rho_\nu\,e^{-2\tau^* \sec\theta}\,J_\nu(T_c) \\
& + \varepsilon_\nu\,e^{-\tau^* \sec\theta}\,J_\nu(T_s)
\end{aligned}
\tag{2.60}
$$
where εν is the surface emissivity, ρν is the surface reflectivity, Ts is the
surface temperature, and Tc is the cosmic background temperature (2.736 ±
0.017 K).

Figure 2.6 Geometry of the planar-stratified atmospheric radiative transfer equation.
[The sketch shows a slant path at angle θ from the vertical through layers between the surface (z = 0, pressure ps) and altitudes z1 and z2, with ds = sec θ dz.]

2.5.4 Weighting Function
The first term in (2.60) can be recast in terms of the transmittance function
$\mathcal{T}_\nu(z)$:
$$ R_\nu(L) = \int_0^L J_\nu[T(z)]\,\frac{d\mathcal{T}_\nu(z)}{dz}\,dz \tag{2.61} $$
The derivative of the transmittance function with respect to altitude is often
called the weighting function
$$ W_\nu(z) \triangleq \frac{d\mathcal{T}_\nu(z)}{dz} \tag{2.62} $$
and gives the relative contribution of the radiance emanating from each
altitude. Note that the Planck radiances are weighted, not the temperature
profile. It is sometimes useful to define a temperature weighting function,
where the temperature profile is weighted directly. One approach is to express
the radiance intensity Rν(L) in terms of a “blackbody-equivalent” brightness
temperature TB,ν(L) (the temperature of a blackbody that produces a radiance
equivalent to Rν(L); note that TB,ν(L) ≠ Bν) and linearize about a nominal
temperature profile T0(z) and corresponding radiance R0,ν(L).
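
The shape of a weighting function can be illustrated with an idealized absorber. The sketch below assumes a nadir-like geometry with the sensor effectively above the entire absorber, and an absorption coefficient that decays exponentially with altitude, κ(z) = κ0 exp(−z/H); for this case the transmittance from z to the top is exp(−κ(z) H sec θ) and (2.62) has a closed form. The profile, parameter values, and function names are assumptions made for the example.

```python
import math

def weighting_function(z_m, kappa0, scale_height_m, sec_theta=1.0):
    """W(z) = d(transmittance)/dz for kappa(z) = kappa0 * exp(-z/H), sensor above the absorber."""
    kappa = kappa0 * math.exp(-z_m / scale_height_m)
    tau_above = kappa * scale_height_m * sec_theta      # optical path from z to the top
    return kappa * sec_theta * math.exp(-tau_above)

def peak_altitude_m(kappa0, scale_height_m, sec_theta=1.0):
    """Altitude where W(z) peaks, i.e., where the optical path above z equals one."""
    return scale_height_m * math.log(kappa0 * scale_height_m * sec_theta)

if __name__ == "__main__":
    kappa0, h = 1.0e-3, 2000.0     # hypothetical surface absorption (1/m) and scale height (m)
    print("peak altitude (m):", peak_altitude_m(kappa0, h))
    for z in range(0, 10001, 2000):
        print(z, weighting_function(float(z), kappa0, h))
```

In this idealized case the weighting function peaks where the slant optical path to the sensor is unity, so stronger absorption or a more oblique view moves the peak higher in the atmosphere.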

2.5.4.1 Temperature Weighting Function

For a particular frequency, the blackbody-equivalent brightness temperature may be written
as follows:
$$ T_{B,\nu}(L) = J_\nu^{-1}\left(W_\nu\left(J_\nu(T_z)\right)\right) \tag{2.63} $$
where Jν(·) is the Planck function, Wν(·) is the integration against the
weighting function, and Jν⁻¹(·) is the inverse Planck function. The first-order
Taylor series approximation of Jν⁻¹(Wν(Jν(·))) is then
$$ R_\nu(L) = R_{0,\nu}(L) + \frac{dJ_\nu^{-1}}{dW_\nu}\,\frac{dW_\nu}{dJ_\nu}\,\frac{dJ_\nu}{dT}\,[T(z) - T_0(z)] \tag{2.64} $$
$$ \phantom{R_\nu(L)} = W_{T,\nu}(z)\,[T(z) - T_0(z)] + R_{0,\nu}(L) \tag{2.65} $$
where WT,ν(z) is defined as the temperature weighting function:
$$ W_{T,\nu}(z) \triangleq \left. \frac{dJ_\nu^{-1}}{dW_\nu}\,\frac{dW_\nu}{dJ_\nu}\,\frac{dJ_\nu}{dT} \right|_{T_0(z)} \tag{2.66} $$

The Planck radiance function can be linearized about some nominal
temperature profile T0, and a temperature weighting function (sometimes
called an incremental weighting function) can be defined:
$$ R_\nu(L) = \int_0^L W_\nu(z)\,J_\nu[T_0(z)]\,dz + \int_0^L [T(z) - T_0(z)]\,W_{T,\nu}(z)\,dz \tag{2.67} $$
where the temperature weighting function is defined as
$$ W_{T,\nu}(z) = \frac{dJ_\nu[T_0(z)]}{dT}\,W_\nu(z) = \frac{hc\nu}{k}\,\frac{J_\nu[T_0(z)]}{T_0^2(z)}\,W_\nu(z) \tag{2.68} $$

The difference between the Planck weighting function and the temperature weighting function can be significant for short-wavelength channels,
as shown in Figure 2.7. The temperature weighting functions are sharper
and peak lower in the atmosphere. The RMS errors (in units of blackbody-equivalent brightness temperature) resulting from the use of the first-order
approximation given by (2.67) (and assuming the weighting functions
are independent of atmospheric parameters) over a representative set of
atmospheric profiles T(z) (with T0(z) = E[T(z)]) are shown in Figure 2.8 for
the channel set of the NASA Atmospheric Infrared Sounder (AIRS) launched
on the Aqua satellite in 2002 [11]. The two dominant sources of error are
the nonlinearity of the Planck function (most evident in the short-wavelength
channels) and the nonlinearity of the atmospheric transmittance (most evident
in the water vapor channels). Temperature weighting functions are almost
never used directly to evaluate the radiative transfer equation because of the
large errors introduced by the exclusion of nonlinearities. Nevertheless, the
temperature weighting functions provide a useful characterization of the way
different layers of the atmosphere at different temperatures contribute to the
radiation emitted at the top of the atmosphere.

Figure 2.7 The Planck radiance weighting function and the temperature weighting
function for two infrared channels.
[Two panels, 4.2 µm and 14.2 µm: altitude (km) versus weight (1/km), with curves for the Planck weight and the temperature weight.]

Figure 2.8 AIRS RMS radiance error due to first-order Planck approximation.
[Axes: RMS error (K) versus wavelength (µm).]

Returning to the special case of microwave frequencies and terrestrial
temperatures (i.e., the Rayleigh-Jeans regime), (2.60) can be expressed in
terms of the physical temperature profile T(z) and the brightness temperature
Bν:
$$
\begin{aligned}
B_\nu(L) = {} & \int_0^L \kappa_\nu(z)\,T(z)\,e^{-\int_z^L \sec\theta\,\kappa_\nu(z')\,dz'}\,\sec\theta\,dz \\
& + \rho_\nu\,e^{-\tau^* \sec\theta} \int_0^L \kappa_\nu(z)\,T(z)\,e^{-\int_0^z \sec\theta\,\kappa_\nu(z')\,dz'}\,\sec\theta\,dz \\
& + \rho_\nu\,e^{-2\tau^* \sec\theta}\,\tilde{T}_c \\
& + \varepsilon_\nu\,e^{-\tau^* \sec\theta}\,T_s
\end{aligned}
\tag{2.69}
$$
where $\tilde{T}_c$ is the corrected cosmic background temperature
$$ \tilde{T}_c = \frac{h\nu}{2k}\,\frac{e^{h\nu/kT_c} + 1}{e^{h\nu/kT_c} - 1} \tag{2.70} $$
Note that in the microwave case, WT,ν(z) = Wν(z).
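
The four terms of (2.69) can be evaluated with a simple layered approximation. The sketch below is not the authors' implementation: it assumes a specular surface with ρν = 1 − εν, divides the atmosphere into thin homogeneous layers, and takes the (corrected) cosmic background temperature as an input; the function and argument names are illustrative.

```python
import math

def brightness_temperature_toa(kappa, temperature, top_m, surface_temp_k,
                               emissivity=1.0, sec_theta=1.0,
                               cosmic_temp_k=2.736, n_layers=2000):
    """Layered evaluation of (2.69); kappa(z) in 1/m and temperature(z) in K."""
    reflectivity = 1.0 - emissivity                      # assumption: rho = 1 - epsilon
    dz = top_m / n_layers
    z_mid = [(i + 0.5) * dz for i in range(n_layers)]
    layer_tau = [kappa(z) * sec_theta * dz for z in z_mid]   # slant optical path per layer
    layer_emit = [temperature(z) * (1.0 - math.exp(-t)) for z, t in zip(z_mid, layer_tau)]
    total_tau = sum(layer_tau)

    upwelling, tau_above = 0.0, 0.0
    for i in reversed(range(n_layers)):                  # attenuate by layers above
        upwelling += layer_emit[i] * math.exp(-tau_above)
        tau_above += layer_tau[i]

    downwelling, tau_below = 0.0, 0.0
    for i in range(n_layers):                            # attenuate by layers below
        downwelling += layer_emit[i] * math.exp(-tau_below)
        tau_below += layer_tau[i]

    t_atm = math.exp(-total_tau)
    return (upwelling
            + reflectivity * t_atm * downwelling
            + reflectivity * t_atm**2 * cosmic_temp_k
            + emissivity * t_atm * surface_temp_k)

if __name__ == "__main__":
    kappa = lambda z: 1.0e-4 * math.exp(-z / 8000.0)     # hypothetical absorber profile
    temp = lambda z: max(288.0 - 6.5e-3 * z, 216.0)      # 6.5 K/km lapse rate, floor at 216 K
    print(brightness_temperature_toa(kappa, temp, 50e3, 288.0, emissivity=0.6))
```

Two limiting cases are useful sanity checks: an isothermal atmosphere over a black surface at the same temperature returns that temperature, and a transparent atmosphere returns εν Ts plus the reflected cosmic term.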

2.6 Passive Spectrometer Systems
Measurement of the Earth’s radiation at a spectral resolution high enough to
study details of molecular absorption bands is achieved using spectrometer
sensor systems. The terms “multispectral,” “hyperspectral,” and more recently
“ultraspectral” have been used to denote spectrometer systems that measure
radiance in tens, hundreds (or sometimes a few thousand), and thousands
of spectral channels, respectively. The chapter concludes with a brief
discussion of various spectrometer technologies, with a focus on performance
advantages, disadvantages, and engineering trade-offs that must be considered
when selecting an instrument to carry out a given remote sensing task. The
concepts presented here are rudimentary; for more details, the reader is
referred to [12–14].

2.6.1 Optical Spectrometers
For the purpose of an elementary discussion, optical spectrometers can be
grouped into three system types: prism dispersion, diffraction grating, and
radiation interference. The systems differ primarily in the mechanisms used
to separate incident radiation into discrete spectral components.
2.6.1.1 Prism Dispersion Systems

A prism spectrometer produces radiance spectra by passing the incident
radiation through a prism. The prism causes the radiation to disperse (bend)
by a frequency-dependent angle. The degree to which the radiation is
dispersed is determined by the refractive index of the prism. The spectra are
usually detected either by sweeping the dispersed radiation across a fixed
detector (for example, a photomultiplier), or sweeping the detector across
the radiation field. The spectral resolution achieved by prism spectrometers
is relatively coarse, and they are therefore used typically in imaging systems.
The calibration of prism systems is also problematic because of the frequency
dependence of the index of refraction of the prism.
2.6.1.2 Diffraction Grating Systems

A diffraction grating disperses radiation into spectra through angular-dependent interference patterns that result when radiation is passed through
a dense array of small slits. Diffraction grating systems operate by either
transmitting radiation through or reflecting radiation from a series of closely
spaced parallel lines etched on plastic film (for transmission) or a metallic
surface (for reflection). Transmission gratings generally perform poorly in
comparison with reflection gratings, which are used in high-performance
space spectrometers. The spectral resolving power of grating spectrometers
typically exceeds that of prism spectrometers by an order of magnitude, at
the expense of increased instrumentation complexity. The AIRS instrument,
which is discussed in Chapter 10, is a diffraction grating spectrometer.
2.6.1.3 Interferometer Systems

The interferometer spectrometer operates quite differently than the prism or
grating spectrometer in that interference effects instead of dispersion effects
are used to separate spectra. One of the simplest types of interferometers
is the Michelson interferometer, which splits incoming radiation into two
beams of unequal length by a partially silvered plate (beam splitter) and later
recombines the beams with a known path difference. The path difference
can be varied uniformly by moving a mirror at a constant speed, causing
the two beams to move in and out of phase at the detector. The intensity of
the resulting waveform (termed the interferogram) is related to the spectral
intensity of the incident radiation by the Fourier transform. The interferogram
is the autocorrelation function of the optical signal. The performance of the
interferometer spectrometer relative to the grating spectrometer depends on a
number of factors, including the nature (e.g., mechanical and electrical) and
origin (e.g., photon and thermal) of system noise. Examples of interferometer
sounding systems include the NPOESS Aircraft Sounder Testbed–Infrared
(NAST-I) [15], the Cross-track Infrared Sounder (CrIS) [16], and the Infrared
Atmospheric Sounding Interferometer (IASI) [17].
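
The Fourier-transform relationship between the interferogram and the incident spectrum can be illustrated with a small numerical sketch. The two-line spectrum, the path-difference step, and the number of samples below are hypothetical values chosen so that the lines fall exactly on the discrete wavenumber grid; real instruments must also deal with apodization, sampling errors, and noise, none of which are modeled here.

```python
import numpy as np

n_samples = 1000
dx_cm = 1.0e-3                       # path-difference step (cm), assumed
x = np.arange(n_samples) * dx_cm     # optical path difference (cm)

# Hypothetical incident spectrum: two narrow lines (wavenumber in cm^-1, amplitude)
lines = [(100.0, 1.0), (250.0, 0.5)]

# Modulated part of the interferogram: each line contributes a cosine in x
interferogram = sum(a * np.cos(2.0 * np.pi * sigma * x) for sigma, a in lines)

# Fourier transform back to the spectral domain
spectrum = 2.0 * np.abs(np.fft.rfft(interferogram)) / n_samples
wavenumber = np.fft.rfftfreq(n_samples, d=dx_cm)   # cm^-1

# The two largest spectral values sit at the two input line positions
strongest = wavenumber[np.argsort(spectrum)[-2:]]
print(sorted(float(w) for w in strongest))
```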
2.6.2 Microwave Spectrometers
Microwave and optical spectrometer systems are conceptually similar. Perhaps the most pernicious source of error in microwave spectrometer systems
is the instability of the receiver, and the primary difference among microwave
systems is the way in which receiver sensitivity is traded for receiver
stability. Three types of microwave spectrometers (see footnote 3) are now discussed.
2.6.2.1 Total Power Spectrometer

The simplest type of microwave spectrometer measures the power of incident
radiation over a collection of bandwidths $B_n$, integrated over a time $\tau$. The
RMS sensitivity of the measurement at any given channel is a function of the
receiver noise ($T_R$, expressed in units of temperature), the incident radiation
($T_A$, expressed in units of temperature), the bandwidth ($B_n$, Hz), and the
integration time ($\tau$, sec; see footnote 4):

$$\Delta T_{rms} = \frac{T_R + T_A}{\sqrt{B_n \tau}} \qquad (2.71)$$

Equation (2.71) assumes that the receiver gain is perfectly stable. Fluctuations
in receiver gain reduce the system sensitivity as follows:

$$\Delta T_{rms} = (T_R + T_A)\sqrt{\frac{1}{B_n \tau} + \left(\frac{\Delta G}{G}\right)^2} \qquad (2.72)$$

3. The term “radiometer” is used when incident electromagnetic power is measured across a
given frequency band. The term “spectrometer” is used when power across several frequency
bands (or channels) is measured.
4. It is assumed for the purposes of this discussion that the detector signal is convolved with a
boxcar of length τ . Other averaging kernels may be used, with trade-offs between sensitivity
and memory effects.

where ∆G/G is the fractional receiver gain drift. It is not uncommon
for the gain drift component to dominate the noise expressed in (2.72).
Examples of total-power microwave spectrometer sounding systems include
the NPOESS Aircraft Sounder Testbed–Microwave (NAST-M) [18], the
Advanced Microwave Sounding Unit (AMSU) [19], and the Advanced
Technology Microwave Sounder (ATMS) [20].
2.6.2.2 Dicke Spectrometer

The Dicke spectrometer is essentially a total-power spectrometer with two
additional features: (1) a switch used to modulate the receiver input signal,
and (2) a synchronous detector, placed between the detector and integrator.
The modulation consists of periodically switching the receiver input between
the antenna and a reference source (Tref ) at a rate higher than the highest
significant spectral component of the gain variation. If the noise temperature
of the reference source is close to the antenna temperature TA , the system
sensitivity of the Dicke spectrometer becomes

$$\Delta T_{rms} = \frac{2(T_R + T_A)}{\sqrt{B_n \tau}} \qquad (2.73)$$

2.6.2.3 Correlation Spectrometer

Another possible method of stabilizing a receiver system involves the
correlation of signals. Two separate receivers are used in a correlation
spectrometer, and the resulting output voltages are multiplied and detected.
The average value of a product of two independent noise temperatures is zero,
and because only correlated noise voltages yield a DC output, receiver gain
instabilities will not affect the sensitivity of the correlation spectrometer. The
sensitivity of the correlation spectrometer is a factor of $\sqrt{2}$ better than that of the
Dicke spectrometer. However, two separate receivers are needed.
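
To make the sensitivity trade-offs of (2.71) through (2.73) concrete, the short Python sketch below evaluates them for one set of receiver parameters. The function names and all numerical values (receiver and antenna temperatures, bandwidth, integration time, gain drift) are hypothetical and serve only as an illustration.

```python
import math

def total_power_sensitivity(t_r, t_a, bandwidth_hz, tau_s, gain_drift=0.0):
    """Eq. (2.72); reduces to (2.71) when gain_drift (Delta G / G) is zero."""
    return (t_r + t_a) * math.sqrt(1.0 / (bandwidth_hz * tau_s) + gain_drift ** 2)

def dicke_sensitivity(t_r, t_a, bandwidth_hz, tau_s):
    """Eq. (2.73), valid when the reference temperature is close to T_A."""
    return 2.0 * (t_r + t_a) / math.sqrt(bandwidth_hz * tau_s)

def correlation_sensitivity(t_r, t_a, bandwidth_hz, tau_s):
    """A factor of sqrt(2) better than the Dicke spectrometer, per the text."""
    return dicke_sensitivity(t_r, t_a, bandwidth_hz, tau_s) / math.sqrt(2.0)

# Hypothetical receiver: T_R = 500 K, T_A = 250 K, 100-MHz channel, 0.1-s integration
args = (500.0, 250.0, 100.0e6, 0.1)
print("total power, stable gain :", total_power_sensitivity(*args))
print("total power, 0.1% drift  :", total_power_sensitivity(*args, gain_drift=1.0e-3))
print("Dicke                    :", dicke_sensitivity(*args))
print("correlation              :", correlation_sensitivity(*args))
```

With these assumed numbers the gain-drift term dominates the total-power result, which is the situation the Dicke and correlation designs are intended to avoid.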

2.7 Summary
The Earth’s atmosphere and its interaction with electromagnetic radiation
have been examined on microscopic (molecular absorption) and macroscopic
(particle extinction) levels. If the atmosphere is assumed to be nonscattering, horizontally homogeneous, and vertically stratified, straightforward
relations can be derived for the radiation intensity observed by a downward-viewing satellite or aircraft sensor. The frequency dependence of scattering,
absorption, and the Planck radiance offers various advantages for atmospheric
profile sounding in the presence of clouds (see Table 2.3). Various instrument
technologies present performance advantages and disadvantages that must be
considered when implementing a remote sounding system.

Table 2.3
Comparison of Certain Characteristics of the 4.3-µm, 15.0-µm, and 5.0-mm Spectral
Regions. Detector Noise RMS: 0.15 K (IR) and 0.7 K (MW) (Source: [21])

                                                  4.3 µm    15.0 µm    5.0 mm
Energy                                  200 K     1.25      5,000      1
(relative Planck radiance)              300 K     200       15,000     1
Temperature sensitivity                 200 K     1         10         4
(relative to detector noise)            300 K     20        6          1
Cloud transmission                      Water     6%        1%         96%
                                        Ice       1%        1%         99.98%

References
[1] G. L. Stephens. Remote Sensing of the Lower Atmosphere. Oxford University Press, New
York, 1994.
[2] J. T. Houghton. The Physics of Atmospheres. Cambridge University Press, Cambridge,
U. K., 1986.
[3] J. M. Wallace and P. V. Hobbs. Atmospheric Science: An Introductory Survey. Elsevier,
New York, second edition, 2006.
[4] K. N. Liou. An Introduction to Atmospheric Radiation. Academic Press, Orlando,
Florida, 1980.
[5] D. H. Staelin, A. W. Morgenthaler, and J. A. Kong. Electromagnetic Waves. Prentice
Hall, Upper Saddle River, New Jersey, 1994.
[6] B. J. Mason. The Physics of Clouds. Oxford University Press, Oxford, U. K., 1971.
[7] J. S. Marshall and W. Palmer. “The distribution of raindrops with size.” Journal of
Meteorology, 5:165–166, 1948.
[8] C. Elachi. Introduction to the Physics and Techniques of Remote Sensing. Wiley, New
York, 1987.
[9] D. Deirmendjian. Electromagnetic Scattering on Spherical Polydispersions. American
Elsevier Publishing Co., Inc., New York, 1969.
[10] A. J. Gasiewski. “Microwave radiative transfer in hydrometeors.” Atmospheric Remote
Sensing by Microwave Radiometry, M. A. Janssen, Ed., Chapter 3, Wiley, New York,
1993.
[11] H. H. Aumann, et al. “AIRS/AMSU/HSB on the Aqua mission: Design, science
objectives, data products, and processing systems.” IEEE Trans. Geosci. Remote Sens.,
41(2):253–264, February 2003.
[12] H. S. Chen. Space Remote Sensing Systems: An Introduction. Academic Press, New
York, 1985.
[13] J. D. Kraus. Radio Astronomy. Cygnus-Quasar Books, Powell, Ohio, second edition,
1986.
[14] M. A. Janssen. Atmospheric Remote Sensing by Microwave Radiometry. Wiley, New
York, 1993.
[15] D. Cousins and M. J. Gazarik. NAST Interferometer Design and Characterization: Final
Report. Project Report NOAA-26, MIT Lincoln Laboratory, July 1999.
[16] H. J. Bloom. “The Cross-track Infrared Sounder (CrIS): A sensor for operational
meteorological remote sensing.” IEEE International Geoscience and Remote Sensing
Symposium, 3:1341–1343, July, 2001.
[17] G. Chalon, F. Cayla, and D. Diebel. “IASI: An advanced sounder for operational
meteorology.” Proceedings of the 52nd Congress of IAF, pages 1–5, October 2001.
[18] W. J. Blackwell, J. W. Barrett, F. W. Chen, R. V. Leslie, P. W. Rosenkranz, M. J. Schwartz,
and D. H. Staelin. “NPOESS aircraft sounder testbed-microwave (NAST-M): Instrument
description and initial flight results.” IEEE Trans. Geosci. Remote Sens., 39(11):2444–
2453, November 2001.
[19] B. H. Lambrigtsen. “Calibration of the AIRS microwave instruments.” IEEE Trans.
Geosci. Remote Sens., 41(2):369–378, February 2003.
[20] C. Muth, P. S. Lee, J. C. Shiue, and W. A. Webb. “Advanced technology microwave
sounder on NPOESS and NPP.” IEEE International Geoscience and Remote Sensing
Symposium, 4:2454–2458, September 2004.
[21] W. L. Smith. “Satellite techniques for observing the temperature structure of the
atmosphere.” Bulletin of the American Meteorological Society, 53(11):1074–1082,
November 1972.

3 An Overview of Inversion Problems in Atmospheric Remote Sensing
In this book, we focus on the retrieval of geophysical state parameters (for
example, the atmospheric temperature profile) from radiometric measurements observed in a number of spectral bands. This retrieval almost
always requires mathematical inversion of some form of a physical model
with vector-valued inputs and outputs, often termed the “forward model.” In
the simplest case, this model can be cast in matrix form by discretizing
the relevant parameters (atmospheric profiles are inherently continuous
quantities, for example) and ignoring nonlinear terms. Even in this simple
case, the system of linear equations may be overdetermined, in which case no
exact solution generally exists, or underdetermined, in which case an infinite number of solutions
exist. Problems of this type are often classified as ill-posed, and additional
assumptions or constraints must be introduced to allow unique solutions to be
obtained [1]. For example, an assumption that the desired solution is close
in the Euclidean sense to some a priori value might be used to allow the
linear system of equations to be solved with least-squares techniques [2], or
a constraint might be imposed that the solution must be sufficiently smooth
by including a term related to the second derivative of the solution in the cost
function to be minimized [3, 4]. Mathematical regularization techniques like
those discussed above are used to increase the stability of the solutions to
ill-posed problems.
Inversion problems in atmospheric remote sensing and the relationships
of the variables involved are seldom linear, Gaussian, or well-posed. For these
reasons, sophisticated methodologies must be used to derive a useful solution.
The strategies employed can be categorized into three mutually exclusive
and collectively exhaustive categories that we will term physical methods,
statistical dependence methods, and hybrid methods. Physical approaches
essentially propagate a first guess of the atmospheric state through a forward
model (for example, the radiative transfer equation (2.60)) and use iterative,
numerical procedures to match the modeled (i.e., simulated) measurements
to the actual observations by updating the guess at each iteration. Statistical
regularization is often used (but is not required) to introduce a tendency
of the optimization towards a likely value, for example. It is for this
reason that we differentiate between “statistical dependence methods” and
“statistical methods,” as a physical method that uses statistical regularization
is also a statistical method. Statistical dependence methods explicitly use
(or empirically derive) a statistical relationship between the observations
(i.e., the independent variables) and the geophysical state parameters (i.e.,
the dependent variables). No physical models are required in a statistical
dependence method. Finally, hybrid methods use both physical models
and statistical dependence to derive a solution to an inverse problem. For
example, a forward model can be used to generate an ensemble of simulated
observations and geophysical state parameters which then can be used with a
statistical dependence method to carry out the inversion. We now present each
of these three categories of inversion methodologies in detail.

3.1 Mathematical Notation
For the following analyses, we assume that a noisy observation of a random
radiance vector $\tilde{R}$ is related to some atmospheric state vector $S$ through a
forward model $f(\cdot)$ as follows:

$$\tilde{R} = f(S) + \Psi = R + \Psi \qquad (3.1)$$

where $\Psi$ is a random noise vector (that may depend on $S$), and $R$ is the “noise-free” radiance observation. The retrieval seeks to estimate the state vector $S$
given an observation of $\tilde{R}$, where we use $\hat{S}(\tilde{R})$ to denote the estimate of $S$
given an observation of $\tilde{R}$.

3.2 Optimality
Almost all inversion techniques are designed to optimize something. There are
many choices of suitable mathematical parameters to optimize: sum-squared
error, probabilistic likelihood, resolution, and signal-to-noise ratio, to name a
few. In the context of practical remote sensing inversion algorithms, there are
two key points that must be stressed. First, it is difficult to guarantee optimality
with respect to even the most simple metrics due to the assumptions that must
be satisfied by the retrieval system. We will therefore be careful to distinguish
between a theoretically optimal retrieval algorithm and one that is employed in
practice, where it is impossible to guarantee that all the necessary assumptions
are universally satisfied. Second, a useful atmospheric retrieval algorithm
must be robust with respect to a variety of metrics, some of which may even
behave in direct opposition (for example, resolution and signal-to-noise ratio).
We therefore in practice usually choose a theoretically optimal algorithm (for
some chosen metric, usually sum-squared error) and evaluate performance
for a wide variety of other metrics to ensure that the algorithm performs
well even in pathological, but meteorologically important, cases that may
not be well-represented in global statistics. For example, atmospheric profile
retrieval algorithms are often evaluated from a global perspective using sum-squared error with respect to a comprehensive set of “ground truth,” typically
radiosondes or numerical model fields. It is also illuminating to examine
performance on a case-by-case basis to verify that interesting atmospheric
phenomenology is captured with the necessary fidelity.

3.3 Methods That Exploit Statistical Dependence
We begin with a discussion of techniques that directly utilize the joint
probability distribution function (pdf) of $\tilde{R}$ and $S$, $P(\tilde{R}, S)$, or statistics based
on this pdf, such as the cross-covariance.

3.3.1 The Bayesian Approach
The Bayesian approach to estimation involves the incorporation of a priori
knowledge about the state vector $S$ with knowledge gained by measuring
$\tilde{R}$. Mathematically, this knowledge is formulated in terms of five related
probability density functions (pdfs):

$P(S)$                 The prior (i.e., before the measurement) pdf of state $S$
$P(\tilde{R})$         The prior pdf of the measurement $\tilde{R}$
$P(\tilde{R}, S)$      The joint prior pdf of $\tilde{R}$ and $S$
$P(\tilde{R}|S)$       The conditional pdf of $\tilde{R}$ given state $S$
$P(S|\tilde{R})$       The conditional pdf of $S$ after measurement $\tilde{R}$. This is
                       the quantity of interest for the solution of
                       the estimation problem.

Bayes’ theorem relates the conditional probabilities as follows:

$$P(S|\tilde{R}) = \frac{P(\tilde{R}|S)\,P(S)}{P(\tilde{R})} \qquad (3.2)$$

Therefore, the Bayesian framework allows probabilities to be assigned to
possible choices of $\hat{S}(\tilde{R})$ given knowledge of the joint and conditional
probabilities of $\tilde{R}$ and $S$. A reasonable choice for $\hat{S}(\tilde{R})$ is the value of $S$ for
which $P(S|\tilde{R})$ is the largest (known as the maximum a posteriori, or MAP,
estimator; it coincides with the maximum likelihood estimator, which maximizes
$P(\tilde{R}|S)$, when the prior $P(S)$ is uniform).
3.3.1.1 Bayes’ Least-Squares Estimator

An alternative to the MAP estimator is the estimator $g(\cdot)$ that minimizes some
suitable cost criterion, $C$:

$$\hat{S}(\cdot) = \arg\min_{g(\cdot)} C(S, g(\tilde{R})) \qquad (3.3)$$

The sum-squared error (SSE) cost criterion

$$C = E[(S - \hat{S})^T (S - \hat{S})] \qquad (3.4)$$

is commonly chosen for this purpose and results in the following estimator,
sometimes called the Bayes’ least-squares (BLS) estimator:

$$\hat{S}(\tilde{R}) = E[S|\tilde{R}] \qquad (3.5)$$

The BLS and MAP estimators may be identical under some circumstances,
for example, if $S$ and $\tilde{R}$ are jointly Gaussian.
3.3.1.2 Bayes’ Linear Least-Squares Estimator

The BLS estimator has two disadvantages: it is often a nonlinear function
of $\tilde{R}$, and it requires a complete statistical representation of the relationship
between $\tilde{R}$ and $S$, which is rarely available in practice. If we constrain the
estimator $g(\cdot)$ in (3.3) to be linear, the resulting estimator depends only on a
second-order characterization of the statistical relationship between $\tilde{R}$ and $S$.
This estimator is the linear least-squares estimator (LLSE):

$$\hat{S}(\tilde{R}) = C_{S\tilde{R}}\, C_{\tilde{R}\tilde{R}}^{-1}\, \tilde{R} \equiv L_{S\tilde{R}}\, \tilde{R} \qquad (3.6)$$

with error covariance

$$C = C_{SS} - C_{S\tilde{R}}\, C_{\tilde{R}\tilde{R}}^{-1}\, C_{S\tilde{R}}^T \qquad (3.7)$$

We have assumed without loss of generality that $S$ and $\tilde{R}$ are zero-mean and
have used $C_{XY}$ to denote the cross-covariance of $X$ and $Y$. The LLS and
BLS estimators are identical when $\tilde{R}$ and $S$ are jointly Gaussian.
3.3.2 Linear and Nonlinear Regression Methods
Direct application of the BLS and LLS estimators is often precluded in
practical applications because the needed expected values (in the case of
BLS) and covariance matrices (in the case of LLS) both depend on the joint
probability distribution functions of $S$ and $\tilde{R}$, and these pdfs are very difficult
to calculate directly. A much more convenient approach is to first estimate the
needed statistical parameters from the available sample data and then derive
the estimator from these sample statistics. We now present two examples
of this approach, where statistical relationships are derived empirically from
sample data, taken here to mean an ensemble of $N$ pairs of $S$ and $\tilde{R}$:

$$(S_1, \tilde{R}_1), \ldots, (S_N, \tilde{R}_N) \in \mathcal{S} \times \mathcal{R} \qquad (3.8)$$

This ensemble could be assembled, for example, from colocated satellite
radiance measurements and radiosonde observations of temperature as a
function of altitude. Linear regression operators can be calculated directly
from “sample covariances” that are empirically derived from the sample
data. Nonlinear regression operators generalize this approach to allow more
complicated mathematical relationships between $S$ and $\tilde{R}$ to be represented.
A parameterized, nonlinear function (for example, a polynomial) is often
used to fit the sample data. The parameters can be chosen using a numerical
optimization technique that minimizes a cost function usually involving the
Euclidean distance between the actual data and the derived fit.
3.3.2.1 Linear Regression

A given set of $N$ observations of $P$ parameters can be arranged into a $P \times N$
matrix, $X$. The sample mean, $M_X$, is a column vector where each element is
the average of each row of $X$. The sample covariance $\hat{C}_{XX}$ is given by

$$\hat{C}_{XX} = \frac{\bar{X}\bar{X}^T}{N - 1} \qquad (3.9)$$

where $\bar{X}$ is calculated by removing the sample mean from $X$.
Given a matrix of noisy radiance observations, $\tilde{R}$, where each column
corresponds to an observation and each row corresponds to a spectral channel,
and a matrix of geophysical observations, $S$, where each column corresponds
to an observation and each row corresponds to a vertical level, for example,
the linear regression estimate is

$$\hat{S}(\tilde{R}) = M_S + \hat{C}_{S\tilde{R}}\, \hat{C}_{\tilde{R}\tilde{R}}^{-1}\, (\tilde{R} - M_{\tilde{R}}) \qquad (3.10)$$

where $\hat{C}_{S\tilde{R}}$ and $\hat{C}_{\tilde{R}\tilde{R}}$ are the sample covariance matrices. If the additive
random noise term in (3.1) is zero mean and uncorrelated with the radiances,
$R$, and the geophysical state vector, $S$, then (3.10) can be expressed in a
simplified form as follows:

$$\hat{S}(\tilde{R}) = M_S + \hat{C}_{SR}\, (\hat{C}_{RR} + C_{\Psi\Psi})^{-1}\, (\tilde{R} - M_R) \qquad (3.11)$$

where $C_{\Psi\Psi}$ is the noise covariance. If the noise covariance is known a priori, it
can be used directly in (3.11); otherwise, it can be estimated using techniques
presented in Chapter 4. Note that the noise covariance matrix can serve as a form
of mathematical regularization when $\hat{C}_{RR}$ is near singular. This can occur
frequently with hyperspectral/ultraspectral measurements due to the very high
degree of correlation among some of the channels.
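
A minimal NumPy sketch of the sample-covariance regression of (3.9) and (3.10) follows. The synthetic ensemble (a random linear "forward model" W with additive noise), the dimensions, and the function names are assumptions made purely for illustration, and the error is evaluated on the training ensemble itself rather than on independent data.

```python
import numpy as np

def center(X):
    """Return the sample mean (column vector) and the mean-removed matrix."""
    mean = X.mean(axis=1, keepdims=True)
    return mean, X - mean

def fit_linear_regression(S, R_noisy):
    """Build the operator of (3.10) from P x N matrices (one column per case)."""
    n_obs = S.shape[1]
    M_S, S_c = center(S)
    M_R, R_c = center(R_noisy)
    C_SR = S_c @ R_c.T / (n_obs - 1)      # sample cross-covariance
    C_RR = R_c @ R_c.T / (n_obs - 1)      # sample covariance, as in (3.9)
    # L = C_SR C_RR^{-1}; C_RR is symmetric, so solve the transposed system
    L = np.linalg.solve(C_RR, C_SR.T).T
    return M_S, M_R, L

def apply_linear_regression(M_S, M_R, L, R_noisy):
    """Evaluate (3.10) for every column of R_noisy."""
    return M_S + L @ (R_noisy - M_R)

# Synthetic ensemble (all values hypothetical)
rng = np.random.default_rng(0)
S = rng.normal(size=(3, 2000))                      # "state" vectors
W = rng.normal(size=(5, 3))                         # toy linear forward model
R_noisy = W @ S + 0.1 * rng.normal(size=(5, 2000))  # noisy "radiances"

M_S, M_R, L = fit_linear_regression(S, R_noisy)
S_hat = apply_linear_regression(M_S, M_R, L, R_noisy)

rms_prior = np.sqrt(np.mean((S - M_S) ** 2))
rms_retrieval = np.sqrt(np.mean((S - S_hat) ** 2))
print("a priori RMS:", rms_prior, " retrieval RMS:", rms_retrieval)
```

Solving the linear system instead of forming the inverse explicitly is a common numerical choice; it does not change the estimator.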
It is interesting, and maybe not obvious, that linear regression can provide optimal estimates even in cases where the observations are nonlinearly
related to the variables to be estimated. We illustrate this with a simple
example. Suppose we wish to retrieve a scalar variable $s$ from two scalar
observations $r_1$ and $r_2$ and these variables are nonlinearly related as follows:

$$r_1 = a_1 + b_1 s + c_1 s^2 \qquad (3.12)$$
$$r_2 = a_2 + b_2 s + c_2 s^2 \qquad (3.13)$$

A linear combination of $r_1$ and $r_2$ recovers $s$ exactly (provided that $c_2 b_1 \neq c_1 b_2$):

$$\hat{s} = \alpha + \beta r_1 + \gamma r_2 \qquad (3.14)$$

where

$$\alpha = \frac{c_1 a_2 - c_2 a_1}{c_2 b_1 - c_1 b_2} \qquad (3.15)$$
$$\beta = \frac{c_2}{c_2 b_1 - c_1 b_2} \qquad (3.16)$$
$$\gamma = \frac{c_1}{c_1 b_2 - c_2 b_1} \qquad (3.17)$$
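
The exact recovery claimed for this example can be checked numerically. In the sketch below the coefficient values are arbitrary, and the three weights are written over the common denominator $c_2 b_1 - c_1 b_2$ that results from eliminating the $s^2$ term between (3.12) and (3.13).

```python
import numpy as np

# Arbitrary (hypothetical) coefficients of the two quadratic observation models
a1, b1, c1 = 1.0, 2.0, 0.5
a2, b2, c2 = -1.0, 0.5, 0.3

den = c2 * b1 - c1 * b2          # must be nonzero for the inversion to exist
alpha = (c1 * a2 - c2 * a1) / den
beta = c2 / den
gamma = -c1 / den

s = np.linspace(-3.0, 3.0, 7)                    # "true" values to retrieve
r1 = a1 + b1 * s + c1 * s ** 2                   # (3.12)
r2 = a2 + b2 * s + c2 * s ** 2                   # (3.13)

s_hat = alpha + beta * r1 + gamma * r2           # (3.14)
print(np.max(np.abs(s_hat - s)))                 # difference at round-off level
```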

While trivial in the case presented here, this example provides insight
into recent work ([5, 6], for example) demonstrating that linear estimators
perform very well when used to retrieve temperature and water vapor
from hyperspectral infrared observations, where the relationships are quite
nonlinear due to the Planck function and the dependence of atmospheric
absorption on water vapor content, but there are a large number of spectral
channels to allow the nonlinear dependence to be “unraveled” by appropriate
linear combinations. We will revisit this example from a different perspective
in Section 5.1.3.
3.3.2.2 Nonlinear Parametric Regression

The linear form afforded by (3.11) is simple and convenient to apply
in practice, but can lead to substantial inaccuracies in cases where the
relationship between R and S is nonlinear. The linear regression framework
presented above can be readily expanded by including nonlinear functions of
R as inputs to the linear regression. For example, simple polynomial terms can
be constructed and the linear regression operator can be used to optimize the
coefficients that modulate these terms. Polynomial regression is an example
of a parameterized method, where the model structure and complexity is
predetermined and only the free parameters in the model need to be derived.
Other parameterized, nonlinear functions can also be constructed, and the
parameters can be obtained using numerical optimization techniques.
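
The idea of feeding nonlinear functions of the observations into a linear regression can be sketched as follows. The two toy "channels," the noise level, and the restriction to element-wise powers (no cross terms) are assumptions for illustration; np.linalg.lstsq is used here in place of the covariance form, which yields the same least-squares fit on the training data.

```python
import numpy as np

def polynomial_inputs(R, degree):
    """Stack element-wise powers R, R**2, ..., R**degree (cross terms omitted)."""
    return np.vstack([R ** d for d in range(1, degree + 1)])

def fit(S, F):
    """Least-squares coefficients for S ~ A @ [1; F], one column per sample."""
    G = np.vstack([np.ones((1, F.shape[1])), F])
    A_T, *_ = np.linalg.lstsq(G.T, S.T, rcond=None)
    return A_T.T

def predict(A, F):
    G = np.vstack([np.ones((1, F.shape[1])), F])
    return A @ G

# Hypothetical scalar state observed through two nonlinear channels
rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(1, 500))
r = np.vstack([s + 0.5 * s ** 3, np.tanh(2.0 * s)])
r = r + 0.01 * rng.normal(size=r.shape)

rms = {}
for degree in (1, 3):
    F = polynomial_inputs(r, degree)
    A = fit(s, F)
    rms[degree] = float(np.sqrt(np.mean((s - predict(A, F)) ** 2)))
print(rms)   # training RMS error for linear (1) and cubic (3) inputs
```

Because the cubic inputs contain the linear inputs as a subset, the training error of the cubic fit cannot exceed that of the linear fit; performance on independent data must be checked separately.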
3.3.2.3 Nonlinear Nonparametric Regression

Nonparametric models differ from parametric models in that the model
structure is not specified a priori but is instead determined from data during
the training process. The term nonparametric is not meant to imply that
such models completely lack parameters but that the number and nature
of the parameters are flexible and not fixed in advance. A neural network
can be broadly categorized as a special case of nonparameterized nonlinear
regression and will be discussed in detail in Chapter 5.
We end this section with a temperature retrieval example using both
linear and polynomial regression techniques. The NOAA88b radiosonde set
contains approximately 7,500 global atmospheric profiles of temperature,
water vapor, and ozone. A radiative transfer package was used to simulate
microwave sounding observations in 100 spectral bands near the 118.75-GHz oxygen line. Figure 3.1 shows the temperature retrieval RMS errors as
a function of altitude for linear regression and polynomial regression. The
a priori error (not shown) is approximately 10 K throughout the troposphere.
Although both the linear and nonlinear regressions have substantially reduced
the a priori error, the polynomial containing at least third-order terms clearly
improves the accuracy of the retrieval.

Figure 3.1 RMS temperature retrieval error for linear regression and nonlinear
(polynomial) regression. A hypothetical microwave sounder with 100
channels near the 118.75-GHz oxygen line was used in the retrieval
simulation.

3.3.2.4 Ridge Regression

It may be desirable to constrain the magnitude of the coefficients in the
regression operator, $L_{S\tilde{R}}\tilde{R}$. This is a form of regularization that stabilizes the
solution if the covariance matrix is nearly singular. The modified cost function
is

$$C = E[(S - \hat{S})^T (S - \hat{S})] + \gamma\, \mathrm{trace}\{L^T L\} \qquad (3.18)$$

and the solution is

$$\hat{S}(\tilde{R}) = C_{S\tilde{R}}\, (C_{\tilde{R}\tilde{R}} + \gamma I)^{-1}\, \tilde{R} \qquad (3.19)$$

This modification to the standard linear regression operator is termed ridge
regression [7].
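
The stabilizing effect of the ridge term in (3.19) can be seen in a small sketch with two almost identical channels, for which the sample covariance is nearly singular. The data, the noise level, the values of $\gamma$, and the function name are hypothetical.

```python
import numpy as np

def ridge_operator(C_SR, C_RR, gamma):
    """Operator of (3.19): C_SR (C_RR + gamma I)^{-1}, using a linear solve."""
    n = C_RR.shape[0]
    return np.linalg.solve(C_RR + gamma * np.eye(n), C_SR.T).T

# Two nearly redundant channels observing the same scalar state
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
R = np.vstack([x + 1e-3 * rng.normal(size=2000),
               x + 1e-3 * rng.normal(size=2000)])
S = x[None, :]

R_c = R - R.mean(axis=1, keepdims=True)
S_c = S - S.mean(axis=1, keepdims=True)
C_RR = R_c @ R_c.T / (R.shape[1] - 1)
C_SR = S_c @ R_c.T / (R.shape[1] - 1)

norms = []
for gamma in (0.0, 1e-4, 1e-2):
    L = ridge_operator(C_SR, C_RR, gamma)
    norms.append(float(np.trace(L.T @ L)))   # penalty term of (3.18) without gamma
    print(gamma, L, norms[-1])
```

Increasing $\gamma$ shrinks the coefficient magnitudes; the appropriate value in practice depends on the noise and on how the retrieval error behaves on independent data.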

3.4 Physical Inversion Methods
The previous estimators are based entirely on the statistical relationship
between R and S. Other methods use knowledge of the forward model f (·)
together with whatever limited statistical characterization of R and S is
available. If we return to (3.1) and make the assumptions that the error and
the a priori state distributions are Gaussian, the probability distributions in the
numerator of (3.2) are proportional to the following terms:

$$P(\tilde{R}|S) \sim \exp\left[-\frac{1}{2}(\tilde{R} - R)^T C_{\Psi\Psi}^{-1} (\tilde{R} - R)\right] \qquad (3.20)$$

$$P(S) \sim \exp\left[-\frac{1}{2}(S - S_a)^T C_{SS}^{-1} (S - S_a)\right] \qquad (3.21)$$

where $S_a$ is the a priori state vector. The denominator in (3.2), $P(\tilde{R})$, is a
normalizing factor that does not depend on $S$ and can be neglected [8]. The most likely value
of $S$ given $\tilde{R}$ is therefore the one that maximizes the product of (3.20) and (3.21),
or equivalently, the sum of their natural logarithms. This
maximization is equivalent to minimizing:

$$\xi_{min} = (\tilde{R} - R)^T C_{\Psi\Psi}^{-1} (\tilde{R} - R) + (S - S_a)^T C_{SS}^{-1} (S - S_a) \qquad (3.22)$$

Most physical retrieval approaches attempt to minimize a cost function similar
in form to that given in (3.22), that is,

$$\hat{S}(\cdot) = \arg\min_{S}\, \xi_{min} \qquad (3.23)$$

although many variations of this cost function could be used in practice. For
example, it might be advantageous to minimize a weighted sum of the two
terms in (3.22). Other physical approaches seek to minimize a quite different
cost function, for example, the vertical resolution of the retrieval [9]. We
will not discuss these methods here – the interested reader is referred to
Twomey [1] and Rodgers [8] for detailed treatment of these topics.
3.4.1 The Linear Case
The solution to (3.23) can be found analytically only under certain circumstances. The earlier assumption of Gaussianity must hold, and the relationship
between $\tilde{R}$ and $S$ must be linear:

$$\tilde{R} = WS + \Psi \qquad (3.24)$$

where $W$ is sometimes called the weighting function matrix. It can be
shown [8] that the solution in this case can be expressed by two equivalent
relations:

$$\hat{S}_m(\tilde{R}) = S_a + (W^T C_{\Psi\Psi}^{-1} W + C_{SS}^{-1})^{-1} W^T C_{\Psi\Psi}^{-1} (\tilde{R} - W S_a) \qquad (3.25)$$

$$\hat{S}_n(\tilde{R}) = S_a + C_{SS} W^T (W C_{SS} W^T + C_{\Psi\Psi})^{-1} (\tilde{R} - W S_a) \qquad (3.26)$$

where the subscript on $\hat{S}$ ($m$ or $n$) denotes the order of the matrix to be
inverted. Note the similarity of the n-form solution with the linear regression
estimate given by (3.11).
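
The equivalence of the two forms (3.25) and (3.26) is easy to confirm numerically. In the following sketch the dimensions, the random weighting function matrix, and the covariances are arbitrary stand-ins; the m-form inverts a matrix of the size of the state vector and the n-form a matrix of the size of the measurement vector.

```python
import numpy as np

rng = np.random.default_rng(1)
n_state, n_chan = 6, 4                       # hypothetical dimensions

W = rng.normal(size=(n_chan, n_state))       # stand-in weighting function matrix
A = rng.normal(size=(n_state, n_state))
C_SS = A @ A.T + np.eye(n_state)             # symmetric positive definite prior covariance
C_PP = 0.25 * np.eye(n_chan)                 # noise covariance (C_PsiPsi)
S_a = rng.normal(size=n_state)               # a priori state
R_obs = rng.normal(size=n_chan)              # noisy observation

innovation = R_obs - W @ S_a
C_PP_inv = np.linalg.inv(C_PP)
C_SS_inv = np.linalg.inv(C_SS)

# (3.25): invert an n_state x n_state matrix
S_m = S_a + np.linalg.solve(W.T @ C_PP_inv @ W + C_SS_inv,
                            W.T @ C_PP_inv @ innovation)

# (3.26): invert an n_chan x n_chan matrix
S_n = S_a + C_SS @ W.T @ np.linalg.solve(W @ C_SS @ W.T + C_PP, innovation)

print(np.max(np.abs(S_m - S_n)))             # agreement to round-off
```

In practice the choice between the two forms is usually driven by which of the two matrices is smaller or better conditioned.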
3.4.1.1 The Minimum-Information Retrieval

The minimum-information retrieval picks the $\hat{S}$ which is “closest” in the least-squares sense to $S_a$ and satisfies

$$(\tilde{R} - R)^T (\tilde{R} - R) = M\sigma^2 \qquad (3.27)$$

where $M\sigma^2$ is a scalar quantity related to the measurement error. In the
linear case, the minimum-information solution in the presence of noise with
covariance $C_{\Psi\Psi}$ is then

$$\hat{S}(\tilde{R}) = S_a + W^T (W W^T + \beta C_{\Psi\Psi})^{-1} (\tilde{R} - W S_a) \qquad (3.28)$$

where $\beta$ is some constant (in units of 1/K$^2$). Note that the minimum-information retrieval is the n-form solution given in (3.26) with the a priori
state covariance replaced by a scaled identity matrix, $C_{SS} = I/\beta$.
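
The relationship between (3.28) and the n-form (3.26) can be verified with a few lines of NumPy. For a general $\beta$ the equivalent a priori covariance is $I/\beta$, which reduces to the identity matrix when $\beta = 1$. The matrices and the value of $\beta$ below are hypothetical.

```python
import numpy as np

def n_form(R_obs, S_a, W, C_SS, C_PP):
    """n-form solution (3.26)."""
    return S_a + C_SS @ W.T @ np.linalg.solve(W @ C_SS @ W.T + C_PP, R_obs - W @ S_a)

def minimum_information(R_obs, S_a, W, C_PP, beta):
    """Minimum-information solution (3.28)."""
    return S_a + W.T @ np.linalg.solve(W @ W.T + beta * C_PP, R_obs - W @ S_a)

rng = np.random.default_rng(2)
W = rng.normal(size=(4, 6))          # stand-in weighting function matrix
C_PP = 0.25 * np.eye(4)              # noise covariance
S_a = rng.normal(size=6)
R_obs = rng.normal(size=4)
beta = 3.0                           # arbitrary value for the check

S_min = minimum_information(R_obs, S_a, W, C_PP, beta)
S_ref = n_form(R_obs, S_a, W, np.eye(6) / beta, C_PP)
print(np.max(np.abs(S_min - S_ref)))
```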
Figure 3.2 shows the performance of three operators used to retrieve the
temperature profile (S = T ) from simulated Advanced Microwave Sounding
Unit (AMSU) radiances in clear-air: the minimum-information retrieval
(3.28), the linear model with known CSS (3.26), and the linear regression
estimator (3.11). The graph demonstrates the significant impact of a priori
statistics on the retrieval performance.
3.4.2 The Nonlinear Case
The minimization of (3.23) often requires numerical methods. We apply the
canonical approach and set the derivative of the cost function (3.22) to zero
and numerically find a root of the resulting equation. The derivative of the cost
function is calculated as follows:

$$-[\nabla_S f(S)]^T C_{\Psi\Psi}^{-1} [\tilde{R} - f(S)] + C_{SS}^{-1} (S - S_a) \qquad (3.29)$$

Figure 3.2 Comparison of temperature retrieval techniques (RMS temperature profile retrieval error in K versus altitude in km, for the a priori, minimum-information, $C_{TT}$, and $C_{TR}$, $C_{RR}$ cases). The minimum information retrieval (with $\beta = 1$ K$^{-2}$) only uses information contained in the
weighting function matrix. The optimal linear model retrieval uses the
weighting function matrix and $C_{TT}$ only. The direct multiple regression
retrieval uses statistical characterizations of $C_{RR}$ and $C_{TR}$.

Newton’s method iteratively solves $h(S) = 0$, where $h(S)$ here denotes the derivative (3.29), by stepping from the current
guess ($S_i$) to the next guess ($S_{i+1}$) according to:

$$S_{i+1} = S_i - [\nabla_S h(S_i)]^{-1} h(S_i) \qquad (3.30)$$

Substitution of (3.29) into (3.30) yields the update step needed to minimize
the cost function:

$$S_{i+1} = S_i + \left\{ C_{SS}^{-1} + K_i^T C_{\Psi\Psi}^{-1} K_i - [\nabla_S K_i]^T C_{\Psi\Psi}^{-1} [\tilde{R} - f(S_i)] \right\}^{-1} \times \left\{ K_i^T C_{\Psi\Psi}^{-1} [\tilde{R} - f(S_i)] - C_{SS}^{-1} (S_i - S_a) \right\} \qquad (3.31)$$

where the matrix $K_i$ contains the derivatives of the forward model with
respect to the state vector, that is, $K_i = \nabla_S f(S_i)$. The implementation of this
method is complicated in practice by the calculation of the Hessian matrix,
$\nabla_S K_i$, which is computationally intensive to evaluate. It is often reasonable
to neglect the Hessian term in moderately linear problems [10], and the update
step becomes:

$$S_{i+1} = S_i + \left( C_{SS}^{-1} + K_i^T C_{\Psi\Psi}^{-1} K_i \right)^{-1} \times \left\{ K_i^T C_{\Psi\Psi}^{-1} [\tilde{R} - f(S_i)] - C_{SS}^{-1} (S_i - S_a) \right\} \qquad (3.32)$$

Alternatively, the update can be expressed relative to the a priori state, as
follows:

$$S_{i+1} = S_a + C_{SS} K_i^T (K_i C_{SS} K_i^T + C_{\Psi\Psi})^{-1} \times \left\{ \tilde{R} - f(S_i) + K_i (S_i - S_a) \right\} \qquad (3.33)$$

This equation will be used in a retrieval example presented in Chapter
10, at which point we also discuss convergence criteria and computational
efficiency.
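
Ahead of that example, the iteration (3.33) can be sketched on a toy problem. The mildly nonlinear forward model, its analytic Jacobian, the covariances, the fixed iteration count, and the function names below are all hypothetical; a practical implementation would use a radiative transfer model and a convergence test in place of the fixed number of iterations.

```python
import numpy as np

W = np.array([[1.0, 0.2], [0.3, 1.0], [0.5, 0.5]])   # toy weighting matrix

def forward(S):
    """Hypothetical forward model: linear term plus a weak quadratic term."""
    lin = W @ S
    return lin + 0.05 * lin ** 2

def jacobian(S):
    """K = grad_S f(S) for the toy model above."""
    lin = W @ S
    return (1.0 + 0.1 * lin)[:, None] * W

def iterate_physical_retrieval(R_obs, S_a, C_SS, C_PP, n_iter=20):
    """Repeat the update (3.33), starting from the a priori state."""
    S = S_a.copy()
    for _ in range(n_iter):
        K = jacobian(S)
        rhs = R_obs - forward(S) + K @ (S - S_a)
        S = S_a + C_SS @ K.T @ np.linalg.solve(K @ C_SS @ K.T + C_PP, rhs)
    return S

S_true = np.array([0.5, -0.3])
S_a = np.zeros(2)
C_SS = np.eye(2)
C_PP = 0.01 * np.eye(3)
R_obs = forward(S_true)          # noise-free observation, for reproducibility

S_hat = iterate_physical_retrieval(R_obs, S_a, C_SS, C_PP)

# At convergence the derivative of the cost function, (3.29), should vanish
K = jacobian(S_hat)
gradient = (-K.T @ np.linalg.solve(C_PP, R_obs - forward(S_hat))
            + np.linalg.solve(C_SS, S_hat - S_a))
print(S_hat, np.linalg.norm(gradient))
```

The retrieved state is pulled slightly toward the a priori value, as expected from the second term of (3.22).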

3.5 Hybrid Inversion Methods
Some knowledge about the relationship between $\tilde{R}$ and $S$ must be available
to allow the formulation of a suitable estimate $\hat{S}(\tilde{R})$, and we have seen that
statistical dependence methods exploit statistical relationships and physical
methods exploit physical relationships. It may be possible to improve retrieval
accuracy and/or efficiency by using both statistical and physical knowledge of
the state parameters and the measured radiances.
3.5.1 Improved Retrieval Accuracy
The physical inversion approaches previously discussed are less effective
as the relevant processes deviate from linearity and Gaussianity, and some
atmospheric processes present formidable retrieval challenges. Most notable
are clouds and precipitation, which can be highly variable, dynamic,
nonlinear, and non-Gaussian. Recent work on the retrieval of precipitation rate
from passive microwave measurements [11–17] has demonstrated the utility
of microphysical precipitation models. The complexity and nonlinearity
of these models, however, has precluded their direct use in retrieval
algorithms. One fruitful approach has been to generate training data using
the microphysical precipitation model and subsequently derive the relevant
statistical relationships with a nonlinear regression (neural network) method.

3.5.2 Improved Retrieval Efficiency
One advantage of decoupling the physical model from the retrieval algorithm
is an increase in computational efficiency. Many physical models, especially
those that must represent complicated, nonlinear processes in many spectral
bands, are computationally prohibitive for retrieval systems that must operate
in near-real-time. An alternative approach is to use the forward model
“off-line” from the retrieval to generate a database of radiance–state pairs. The
statistical relationships are also derived off-line, and optimal estimators can
be calculated. The operational retrieval then can implement these estimators.
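
The off-line/on-line split can be sketched as follows. This is an illustration under stated assumptions: `slow_forward_model` is a hypothetical stand-in for an expensive physical model, the state statistics and noise level are invented, and an ordinary least-squares estimator with an offset term is used in place of a neural network purely for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def slow_forward_model(s):
    """Hypothetical stand-in for an expensive physical forward model."""
    w = np.array([[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]])
    lin = w @ s
    return lin + 0.05 * lin**2

# Off-line step 1: build a database of radiance-state pairs.
n_train = 5000
states = rng.normal(size=(n_train, 2))                  # assumed state statistics
radiances = np.array([slow_forward_model(s) for s in states])
radiances += 0.05 * rng.normal(size=radiances.shape)    # assumed sensor noise

# Off-line step 2: derive the estimator (least squares with an offset column).
design = np.hstack([radiances, np.ones((n_train, 1))])
coeffs, *_ = np.linalg.lstsq(design, states, rcond=None)

def fast_retrieval(r):
    """On-line step: apply the precomputed estimator; no forward model call."""
    return np.append(r, 1.0) @ coeffs

# Quick check on independent simulated cases.
test_states = rng.normal(size=(1000, 2))
test_rad = np.array([slow_forward_model(s) for s in test_states])
test_rad += 0.05 * rng.normal(size=test_rad.shape)
estimates = np.array([fast_retrieval(r) for r in test_rad])
rmse = np.sqrt(np.mean((estimates - test_states) ** 2, axis=0))
```

The operational cost of `fast_retrieval` is a single vector-matrix product, independent of how expensive the forward model was to run during database generation.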

3.6 Error Analysis
Once a suitable retrieval operator has been constructed, it is necessary to
assess the performance of the estimates. Many reasonable metrics could
be examined, including the sensitivity of the retrieval to sensor noise, the
resolution of the retrieval, the degree to which the retrieval system (i.e., the
retrieval algorithm and the sensor) is “blind” to the atmospheric parameters
of interest, and so forth. These metrics, and others, can be readily calculated
from the retrieval operators in many cases. Perturbation analysis can be used
when complexity and/or nonlinearity preclude simple, direct techniques.
3.6.1 Analytical Analysis
We begin with the linear case, where the state estimate takes the form
\[
\hat{S}(\tilde{R}) = D\tilde{R} \tag{3.34}
\]

where we have encapsulated the offset term into D by adding a new element
to $\tilde{R}$ and setting this element to one. The D matrix may be derived in many
ways, for example, using (3.11), (3.26), or (3.28). Under the assumption of a
linear forward model, (3.24) can be substituted into (3.34), and we obtain:
\[
\hat{S}(\tilde{R}) = DWS + D\Psi \tag{3.35}
\]

and we observe that the estimated state vector is a linear function of the
true state vector. In the ideal, linear case, this decomposition yields the
two basic contributions of retrieval imperfection: smoothing, represented by
the averaging kernel, A = DW, and retrieval noise, represented by DΨ.
In practice, the forward model does not perfectly capture reality and is
usually nonlinear. These complications can be handled by including additional
forward model error terms and linearizing the forward model about a suitable
operating point [8]. This results in (at least) a third contribution to retrieval
imperfection, in the form of a retrieval bias. The relative contributions of
smoothing error and retrieval noise to the total retrieval error covariance can
be readily calculated:
\[
C_{\mathrm{smooth}} = (A - I) C_{SS} (A - I)^T \tag{3.36}
\]
and
\[
C_{\mathrm{noise}} = D C_{\Psi\Psi} D^T \tag{3.37}
\]
where we have assumed that the measurement noise is uncorrelated with the
atmospheric state. The direct calculation of the error contributions due to
forward model errors is difficult using analytical methods unless assumptions
and approximations are used. Numerical techniques such as Monte Carlo
analysis can be helpful in such circumstances.
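
As an illustration of (3.36) and (3.37), the sketch below evaluates both error covariances for a small linear problem and checks their sum against a Monte Carlo estimate. The weighting function matrix, the covariances, and the choice of D (here the linear minimum-mean-square-error gain for a zero-mean state, so that no offset element is needed) are assumptions made for the example only.

```python
import numpy as np

# Hypothetical linear problem: R = W S + Psi with zero-mean S and Psi.
W = np.array([[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]])
c_ss = np.array([[1.0, 0.3], [0.3, 0.5]])    # assumed state covariance
c_psi = 0.04 * np.eye(3)                     # assumed noise covariance

# Assumed choice of D: linear minimum-mean-square-error gain.
D = c_ss @ W.T @ np.linalg.inv(W @ c_ss @ W.T + c_psi)

A = D @ W                                    # averaging kernel
I = np.eye(2)
c_smooth = (A - I) @ c_ss @ (A - I).T        # smoothing error, (3.36)
c_noise = D @ c_psi @ D.T                    # retrieval noise, (3.37)
c_total = c_smooth + c_noise

# Monte Carlo check: simulate states and noise, then retrieve.
rng = np.random.default_rng(1)
n = 200_000
s = rng.multivariate_normal(np.zeros(2), c_ss, size=n)
psi = rng.multivariate_normal(np.zeros(3), c_psi, size=n)
r = s @ W.T + psi                            # each row is one simulated radiance
s_hat = r @ D.T                              # each row is one state estimate
c_mc = np.cov(s_hat - s, rowvar=False)       # empirical error covariance
```

The two terms add because the noise is assumed uncorrelated with the state. The same Monte Carlo loop extends to cases with forward model error, where the analytical expressions are harder to obtain.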
3.6.2 Perturbation Analysis
3.6.2.1 Atmospheric Smoothing

In the linear case, the derivative of the estimated state with respect to the true
state is simply the averaging kernel:
\[
\frac{\partial \hat{S}}{\partial S} = A \tag{3.38}
\]
Analysis of more complicated cases is facilitated by decomposing this
derivative into the product of the forward model Jacobian and the retrieval
Jacobian:
\[
\frac{\partial \hat{S}}{\partial S} = \frac{\partial R}{\partial S} \frac{\partial \hat{S}}{\partial R} \tag{3.39}
\]
The forward model and the retrieval operator can then be linearized about
some operating point, and these derivatives can be calculated – numerical
techniques can be used, if necessary. We will see in Chapter 8 that neural
network estimators are particularly amenable to perturbation analyses because
the Jacobians are very easy to compute.
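
A minimal numerical sketch of this decomposition follows. The forward model, the retrieval operator, and the operating point are hypothetical, and central differences stand in for whatever Jacobian calculation is available. With Jacobians stored as (outputs × inputs) arrays, the matrix product is taken with the retrieval Jacobian on the left.

```python
import numpy as np

def numerical_jacobian(func, x0, eps=1e-6):
    """Central-difference Jacobian: jac[i, j] = d func_i / d x_j at x0."""
    x0 = np.asarray(x0, dtype=float)
    f0 = np.asarray(func(x0))
    jac = np.zeros((f0.size, x0.size))
    for j in range(x0.size):
        step = np.zeros_like(x0)
        step[j] = eps
        jac[:, j] = (func(x0 + step) - func(x0 - step)) / (2 * eps)
    return jac

# Hypothetical nonlinear forward model and retrieval operator.
W = np.array([[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]])
D = np.array([[0.6, -0.2, 0.3], [-0.1, 0.8, 0.2]])

def forward(s):
    lin = W @ s
    return lin + 0.05 * lin**2

def retrieval(r):
    return D @ np.tanh(r)

s0 = np.array([1.0, -0.5])                        # assumed operating point
r0 = forward(s0)

j_forward = numerical_jacobian(forward, s0)       # dR/dS, shape (3, 2)
j_retrieval = numerical_jacobian(retrieval, r0)   # dS_hat/dR, shape (2, 3)
a_chain = j_retrieval @ j_forward                 # linearized averaging kernel

# Direct differentiation of the composite map, for comparison.
a_direct = numerical_jacobian(lambda s: retrieval(forward(s)), s0)
```

Rows of `a_chain` show how a perturbation in each true state element spreads across the estimated state at this operating point.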
3.6.2.2 Retrieval Noise

The contribution of retrieval noise to the total retrieval error can be
approximated using a second-order propagation of errors approach, as
follows:
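\[
C_{\mathrm{noise}} \approx \frac{\partial \hat{S}}{\partial R} \, C_{\Psi\Psi} \left( \frac{\partial \hat{S}}{\partial R} \right)^T \tag{3.40}
\]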
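
The approximation in (3.40) can be checked against a Monte Carlo estimate for a simple nonlinear retrieval. In this sketch the retrieval operator, its analytic Jacobian, the operating radiance, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical nonlinear retrieval operator with an analytic Jacobian.
D = np.array([[0.6, -0.2, 0.3], [-0.1, 0.8, 0.2]])

def retrieval(r):
    return D @ np.tanh(r)

def retrieval_jacobian(r):
    """dS_hat/dR: column j of D scaled by 1 - tanh(r_j)^2."""
    return D * (1.0 - np.tanh(r) ** 2)

r0 = np.array([0.78, -0.30, 0.57])           # assumed operating radiance
c_psi = 0.05**2 * np.eye(3)                  # assumed noise covariance

# Propagate the noise covariance through the Jacobian, as in (3.40).
J = retrieval_jacobian(r0)
c_noise_lin = J @ c_psi @ J.T

# Monte Carlo estimate: perturb the radiance and look at the output spread.
rng = np.random.default_rng(2)
noise = rng.multivariate_normal(np.zeros(3), c_psi, size=100_000)
outputs = np.tanh(r0 + noise) @ D.T          # each row is retrieval(r0 + noise)
c_noise_mc = np.cov(outputs, rowvar=False)
```

Agreement between the two estimates degrades as the noise grows relative to the curvature of the retrieval operator, which is one way to judge whether the linearization is adequate.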

3.7 Summary
The set of equations relating an atmospheric state S to the observed
radiances $\tilde{R}$ is seldom directly invertible. The inversion is often ill-posed,
where an infinite number of solutions exist, or ill-conditioned, where small
perturbations of the radiance values lead to very large perturbations of the
solution. Regularization techniques in the form of constraints that penalize
deviations of the solution from an a priori state, for example, are used to
improve the stability of the retrieval. Relationships between S and $\tilde{R}$ can be
derived using physical or statistical means, or both. In some cases, retrieval
error components can be isolated and attributed to biases, atmospheric
smoothing, sensor noise, and so forth.

References
[1] S. Twomey. Introduction to the Mathematics of Inversion in Remote Sensing and Indirect
Measurements. Elsevier Scientific Publishing Company, New York, 1977.
[2] G. Strang. Linear Algebra and Its Applications. Academic Press, New York, 1980.
[3] A. Tikhonov. “On the solution of incorrectly stated problems and a method of
regularization.” Dokl. Acad. Nauk SSSR, 151:501–504, 1963.
[4] A. N. Tikhonov, A. V. Goncharsky, V. V. Stepanov, and A. G. Yagola. Numerical Methods
for the Solution of Ill-Posed Problems. Kluwer, Boston, Massachusetts, 1995.
[5] M. D. Goldberg, Y. Qu, L. M. McMillin, W. Wolff, L. Zhou, and M. Divakarla. “AIRS
near-real-time products and algorithms in support of operational numerical weather
prediction.” IEEE Trans. Geosci. Remote Sens., 41(2):379–389, February 2003.
[6] E. Weisz, H. L. Huang, J. Li, E. Borbas, K. Baggett, P. K. Thapliyal, and L. Guan.
“International MODIS and AIRS processing package: AIRS products and applications.”
J. App. Rem. Sens., 1:1–23, July 2007.
[7] A. E. Hoerl. “Application of ridge analysis to regression problems.” Chemical
Engineering Progress, 58:54–59, 1962.
[8] C. D. Rodgers. Inverse Methods for Atmospheric Sounding. World Scientific, New York,
2000.
[9] G. E. Backus and J. F. Gilbert. “Uniqueness in the inversion of inaccurate gross earth
data.” Phil. Trans. Roy. Soc. London, 266:123–192, 1970.
[10] C. D. Rodgers. “Retrieval of atmospheric temperature and composition from remote
measurements of thermal radiation.” J. Geophys. Res., 41(7):609–624, July 1976.
[11] C. Surussavadee and D. H. Staelin. “Comparison of AMSU millimeter-wave satellite
observations, MM5/TBSCAT predicted radiances, and electromagnetic models for
hydrometeors.” IEEE Trans. Geosci. Remote Sens., 44(10):2667–2678, October 2006.
[12] C. Surussavadee and D. H. Staelin. “Precipitation retrieval accuracies for geo-microwave
sounders.” IEEE Trans. Geosci. Remote Sens., 45(10):3150–3159, October 2007.
[13] C. Surussavadee and D. H. Staelin. “Millimeter-wave precipitation retrievals and
observed-versus-simulated radiance distributions: Sensitivity to assumptions.” J. Atmos.
Sci., 64(11):3808–3826, November 2007.
[14] C. Surussavadee and D. H. Staelin. “Global millimeter-wave precipitation retrievals
train