Correlation Analysis of Environmental Pollutants and Meteorological Variables Applying Neuronal Networks
A. Vega-Corona (a), Diego-Andina (b), F.S. Buendía-Buendía (b), J.M. Barrón-Adame (a)
(a) Universidad de Guanajuato, México.
(b) Universidad Politécnica de Madrid, España.
Abstract
- To develop an environmental contingency forecasting tool for decision making, a pattern recognition method applying Neural Networks is presented.
- SO2 and PM10 concentration time series are analyzed on an hourly and daily basis, together with a variety of meteorological variables.
- These pollutant concentrations and meteorological variables are self-organized into different classes by means of a SOM Neural Network.
- The classes are used in the training phase of a General Regression Neural Network (GRNN) classifier to provide an air quality forecast.
In this case, a time series set obtained from the Environmental Monitoring Network (EMN) of the city of Salamanca, Guanajuato, México is used.
Results verify the potential of this method versus other statistical classification methods, and the correlation among the variables is also resolved.
Introduction
Air pollution:
- Is one of the most important environmental problems.
- Is the result of human activities.
Pollution has several causes and sources:
- Industrial, commercial, agricultural and domestic activities.
- Combustion, used to generate heat, electricity or movement, which produces many pollutants.
The city of Salamanca, Guanajuato, in México is a special case with heavy pollution: it ranks fourth among the most polluted cities in México.
An Environmental Monitoring Network (EMN) was installed three years ago. It provides time series of pollutant concentrations, such as sulphur dioxide (SO2) and PM10 particles, together with meteorological variables.
Principal Air Pollutant Features
Clean air is a gaseous mixture composed of:
- Nitrogen (78%), oxygen (21%), and argon, carbon dioxide, ozone and other gases in small quantities (1%).
Pollutants:
- Primary: present in the atmosphere as originally emitted by the source.
- Secondary: those that undergo chemical changes as a result of meteorological effects or combination with other pollutants.
Air Quality Index (AQI):
- The AQI is a single number on a scale from 0 to 500.
- The AQI informs the population about the level of health concern.
Air quality is evaluated for:
- Sulfur dioxide (SO2).
- Suspended particles, or breathable fragments, with diameters between 0.3 and 10 microns (PM10).
What does the AQI mean?
| Air Quality | AQI values |
|---|---|
| Good | 0 to 50 |
| Moderate | 51 to 100 |
| Unhealthy for sensitive groups | 101 to 150 |
| Unhealthy | 151 to 300 |
| Dangerous | 301 to 500 |
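The AQI bands in the table can be turned into a small lookup. The sketch below is only an illustration: the function name `aqi_category` is hypothetical, and the treatment of fractional values (a value such as 50.5 falls into the next band) is an assumption, since the table lists integer ranges only.

```python
from bisect import bisect_left

# Upper bounds and labels copied from the AQI table in this document.
AQI_UPPER_BOUNDS = [50, 100, 150, 300, 500]
AQI_CATEGORIES = [
    "Good",
    "Moderate",
    "Unhealthy for sensitive groups",
    "Unhealthy",
    "Dangerous",
]


def aqi_category(aqi: float) -> str:
    """Return the air quality category for an AQI value in [0, 500]."""
    if not 0 <= aqi <= 500:
        raise ValueError(f"AQI must lie between 0 and 500, got {aqi}")
    # bisect_left finds the first band whose upper bound is >= aqi.
    return AQI_CATEGORIES[bisect_left(AQI_UPPER_BOUNDS, aqi)]
```

For instance, `aqi_category(120)` falls in the 101 to 150 band.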
Model and Theoretical Fundament
- The proposed method performs an automatic multivariable analysis of the time series obtained from the EMN.
- The aim is to determine the correlation among all the variables involved in decision making about health risk for the population.
- In this research, only one field of perception of the sensor is considered.
We consider that:
- In some multi-dimensional engineering problems, it is necessary to recognize certain patterns without knowing the nature of the data or their statistical distribution.
- Some pattern recognition techniques apply NNs to solve problems without requiring prior knowledge of the data distribution or statistical assumptions.
- Consequently, NNs are an ideal tool for the problem presented here, since their operation can be treated as a black box that minimizes an energy function.
Variables definition
Variables are defined in order to build a feature vector $x_j$ and to define a pattern set $X^* = \{x_1, x_2, \ldots, x_j, \ldots, x_n\}$. Let $X_{SO_2}$ be the sulfur dioxide set and let $X_{PM_{10}}$ be the particle concentration set; their corresponding pattern is defined as $x_j = \{x_1, x_2, x_3, x_4\}$.
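| Variables $x_i$ | $X_{SO_2}$ pattern $x_j$ | $X_{PM_{10}}$ pattern $x_j$ |
|---|---|---|
| $x_1$ | SO2 | PM10 |
| $x_2$ | T | T |
| $x_3$ | HR | HR |
| $x_4$ | VV | VV |

Here T is the temperature, HR the relative humidity and VV the wind speed.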
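The variable table can be read as a recipe for assembling the pattern matrix: one row per hourly pattern $x_j$, one column per variable. The following is a minimal sketch under stated assumptions. The function names are hypothetical, and the min-max rescaling is an assumed preprocessing step (the slides do not say how the variables were scaled), added here only because the four variables have different units.

```python
import numpy as np


def build_patterns(pollutant, temperature, humidity, wind_speed):
    """Stack four equally long series into an (n, 4) pattern matrix.

    Row j is the pattern x_j = (pollutant, T, HR, VV) at time step j.
    """
    columns = [np.asarray(c, dtype=float)
               for c in (pollutant, temperature, humidity, wind_speed)]
    if len({c.shape for c in columns}) != 1 or columns[0].ndim != 1:
        raise ValueError("all series must be 1-D and of equal length")
    return np.column_stack(columns)


def min_max_scale(patterns):
    """Rescale each column to [0, 1] (assumed step, not from the slides)."""
    lo = patterns.min(axis=0)
    span = patterns.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by 0
    return (patterns - lo) / span
```

Calling `build_patterns` once with the SO2 series and once with the PM10 series would give the two pattern sets $X_{SO_2}$ and $X_{PM_{10}}$.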
Proposed Model
Data Base and Pre-processing
- We obtain a real, historical time series database from the EMN.
- Data series covering December to February for the years 2002 to 2005 are used.
- The time series comprise a total of 6,480 multidimensional patterns of pollutant and meteorological variables.
Figure: Correlation between SO2 and meteorological variables. The figure shows the data of a typical day and illustrates the complicated nature of this problem.
Black Box
- Time series data are obtained from the EMN.
- The time series are self-organized into different classes by means of a SOM Neural Network.
- The classes are used in the training phase of a General Regression Neural Network (GRNN) classifier to provide an air quality forecast.

Clustering Method
Health concern levels with respect to the Air Quality Index and their category map representation for SO2 and PM10:

| Air Quality | AQI | Cluster Prototype |
|---|---|---|
| Good | 0 to 50 | Z1 |
| Moderate | 51 to 100 | Z2 |
| Unhealthy for Sensitive Groups | 101 to 150 | Z3 |
| Unhealthy | 151 to 300 | Z4 |
| Dangerous | 301 to 500 | Z5 |
| * Noise | | Z6 |
Self-Organized Neural Network
Each cluster prototype is $Z_i = \{\mu_{i1}, \mu_{i2}, \mu_{i3}, \mu_{i4}\}$, where:
- $\mu_{i1}$ is the SO2 or PM10 concentration.
- $\mu_{i2}$ is the temperature.
- $\mu_{i3}$ is the relative humidity.
- $\mu_{i4}$ is the wind speed.
A priori knowledge about the patterns is unavailable.
SOM Neural Network:
- Maps a high-dimensional feature space to a much lower-dimensional output map.
- Preserves the topological order.
Pattern sets are built in order to design a classifier:
- Six clusters are sought, and therefore six prototypes or weights $Z_i$, matching the index classification.
The center of each class is built as:
- $Z_i = \{\mu_{i1}, \mu_{i2}, \mu_{i3}, \mu_{i4}\}$
where $\mu_{i1}$ is the SO2 or PM10 concentration, $\mu_{i2}$ is the temperature, $\mu_{i3}$ is the relative humidity and $\mu_{i4}$ is the wind speed.
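To make the clustering step concrete, here is a minimal NumPy sketch of a one-dimensional SOM with six units, so that each unit's weight vector plays the role of one prototype $Z_i$. It is an illustration under assumptions, not the authors' implementation: the map layout (a 1-D chain), the linear decay schedules, and the parameters `n_epochs`, `lr0`, `sigma0` and `seed` are all hypothetical choices that the slides do not specify.

```python
import numpy as np


def train_som(patterns, n_units=6, n_epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D SOM and return its prototypes Z, shape (n_units, d)."""
    rng = np.random.default_rng(seed)
    patterns = np.asarray(patterns, dtype=float)
    # Initialise the prototypes from randomly chosen training patterns.
    idx = rng.choice(len(patterns), size=n_units,
                     replace=len(patterns) < n_units)
    Z = patterns[idx].copy()
    grid = np.arange(n_units)  # positions of the units on the 1-D map
    n_steps = n_epochs * len(patterns)
    step = 0
    for _ in range(n_epochs):
        for x in patterns[rng.permutation(len(patterns))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            winner = np.argmin(np.sum((Z - x) ** 2, axis=1))
            # The Gaussian neighbourhood on the map also pulls the winner's
            # neighbours toward x, which preserves the topological order.
            h = np.exp(-((grid - winner) ** 2) / (2.0 * sigma ** 2))
            Z += lr * h[:, None] * (x - Z)
            step += 1
    return Z


def assign_clusters(patterns, Z):
    """Return, for each pattern, the index i of its nearest prototype Z_i."""
    patterns = np.asarray(patterns, dtype=float)
    d2 = ((patterns[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

The indices returned by `assign_clusters` could then serve as class labels for the classifier design stage. Which unit corresponds to which air quality category is not determined by the SOM itself and would have to be decided by inspecting the prototypes.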
The Air Quality Index (AQI)
- Provides daily information on the air pollution concentration.
- Is a value that informs the population about actions to reduce air pollution or about environmental forecasts.
- Is a single number on a scale from 0 to 500.
Classifier Design
- The main advantage of the GRNN: it needs only a single learning pass to achieve optimal performance in classification.
- The estimator
- A reduced Gaussian kernel
- The GRNN operation:
- The input layer simply passes the patterns $x$ to all units in the hidden layer, which is composed of kernel functions $\exp\left(-D_i^2 / (2\rho^2)\right)$ and computes the squared distances $D_i^2$ between the new pattern $x$ and the training samples $x_i$.
- The hidden-to-output weights are just the targets $y_i$; thus the output $y(x)$ is simply a weighted average of the target values $y_i$ of the training cases $x_i$ close to the given input case $x$.
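The GRNN operation described above amounts to a kernel-weighted average, $y(x) = \sum_i y_i \exp\left(-D_i^2 / (2\rho^2)\right) / \sum_i \exp\left(-D_i^2 / (2\rho^2)\right)$ with $D_i^2 = (x - x_i)^T (x - x_i)$. The sketch below implements this standard form in NumPy. The function name `grnn_predict` and the default `rho=0.5` are hypothetical, and the slides do not state how the smoothing parameter was chosen.

```python
import numpy as np


def grnn_predict(x_train, y_train, x_new, rho=0.5):
    """Kernel-weighted average of the targets y_i (single-pass GRNN).

    x_train: (n, d) training samples x_i
    y_train: (n,) scalar targets or (n, c) one-hot class targets
    x_new:   (m, d) new patterns, or a single pattern of length d
    rho:     kernel width (smoothing parameter)
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.atleast_2d(np.asarray(x_new, dtype=float))
    # Squared distances D_i^2 between each new pattern and each sample.
    d2 = ((x_new[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    # Subtracting the row minimum leaves the ratio unchanged but stops all
    # kernels from underflowing to zero for inputs far from the samples.
    d2 -= d2.min(axis=1, keepdims=True)
    k = np.exp(-d2 / (2.0 * rho ** 2))
    weights = k / k.sum(axis=1, keepdims=True)
    # The hidden-to-output weights are simply the targets y_i.
    return weights @ y_train
```

There is no iterative training: "learning" consists of storing the samples, which is the single learning pass mentioned above. For classification, one option is to pass one-hot class targets as `y_train` and take the `argmax` of each output row.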