Correlation Analysis of Environmental Pollutants and Meteorological Variables Applying Neural Networks

A. Vega-Corona^a, Diego-Andina^b, F.S. Buendía-Buendía^b, J.M. Barrón-Adame^a

^a Universidad de Guanajuato, Mexico.

^b Universidad Politécnica de Madrid, Spain.


Abstract

In order to develop an environmental contingency forecasting tool for decision making, the correlation between pollutant concentrations and meteorological variables is analyzed by applying neural networks. In this case, a time series set of these variables, obtained from the Environmental Monitoring Network (EMN) of the city of Salamanca, Guanajuato, Mexico, is used. The results verify the potential of this method versus other statistical classification methods and also resolve the correlation among the variables.

Introduction

Air pollution is:

Pollution has the following causes and sources:
  1. Industrial, commercial, agricultural, and domestic activities.
  2. Combustion used to generate heat, electricity, or motion, which produces many pollutants.

The city of Salamanca, Guanajuato, in Mexico is a special case of severe pollution: it ranks fourth in pollution in Mexico.

Introduction

An Environmental Monitoring Network (EMN) was installed three years ago. It provides time series of pollutant concentrations, such as sulfur dioxide (SO2) and particulate matter (PM10), together with meteorological variables.
[Figure: EMN monitoring stations]

Introduction

Principal Air Pollutant Features

Clean air is a gaseous mixture composed of:

Pollutants:

Air Quality Index

  • The AQI is a single number on a scale from 0 to 500.
  • The AQI is a value used to inform the population about the level of health concern.


What does the AQI mean?

 
Air Quality                       AQI values
Good                              0 to 50
Moderate                          51 to 100
Unhealthy for sensitive groups    101 to 150
Unhealthy                         151 to 300
Dangerous                         301 to 500

 

Health Concern Levels

Model and Theoretical Fundamentals

We consider:

Variable definitions

Variables are defined in order to build a feature vector $\mathbf{x}_j$ and a pattern set $X^* = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_j, \ldots, \mathbf{x}_n\}$. Let $X_{SO_2}$ be the sulfur dioxide pattern set and $X_{PM_{10}}$ the particle concentration pattern set; in both sets, each pattern is defined as $\mathbf{x}_j = \{x_1, x_2, x_3, x_4\}$.

 

 

Variable x_i    X_SO2 pattern x_j    X_PM10 pattern x_j
x_1             SO2                  PM10
x_2             T                    T
x_3             HR                   HR
x_4             VV                   VV

(T: temperature; HR: relative humidity; VV: wind speed)

Proposed Model

Database and Pre-processing

  • We obtain a real, historical time series database from the EMN.
  • Data series from December to February of the years 2002 to 2005 are used.
  • The time series comprise a total of 6,480 multidimensional patterns of pollutant and meteorological variables.
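The pre-processing step can be sketched as follows. This is a minimal illustration, not the authors' code: the `Record` layout and its field names are hypothetical, and it assumes that pre-processing means keeping December to February records from 2002 to 2005 and discarding any record with a missing variable. Each surviving record becomes one pattern $\mathbf{x}_j$ = (pollutant, T, HR, VV).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Record:
    """One EMN measurement (hypothetical layout; field names are illustrative)."""
    timestamp: datetime
    so2: Optional[float]   # SO2 concentration
    pm10: Optional[float]  # PM10 concentration
    temp: Optional[float]  # T, temperature
    rh: Optional[float]    # HR, relative humidity
    wind: Optional[float]  # VV, wind speed


WINTER_MONTHS = {12, 1, 2}  # December to February


def build_patterns(records: List[Record], pollutant: str = "so2",
                   first_year: int = 2002, last_year: int = 2005) -> List[Tuple]:
    """Return feature vectors x_j = (pollutant, T, HR, VV) for the study period."""
    patterns = []
    for r in records:
        ts = r.timestamp
        if (ts.month not in WINTER_MONTHS) or not (first_year <= ts.year <= last_year):
            continue  # outside the December-February window or the 2002-2005 range
        x = (getattr(r, pollutant), r.temp, r.rh, r.wind)
        if any(v is None for v in x):
            continue  # assumption: incomplete records are discarded
        patterns.append(x)
    return patterns
```

Passing `pollutant="pm10"` builds the $X_{PM_{10}}$ set from the same records, so both pattern sets share the three meteorological components.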

Proposed Model


 
  • A figure shows the complex nature of this problem.
  • A figure shows the data for a typical day.
  • A figure shows the correlation between SO2 and the meteorological variables.
  [Figure: SOM1]
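A quick way to quantify the linear correlation between SO2 and each meteorological variable, before any clustering, is the Pearson coefficient. The sketch below is an illustrative assumption (the slides do not say which correlation measure is used); it takes patterns shaped like the $\mathbf{x}_j$ vectors, with the pollutant in the first position.

```python
import math
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_with_pollutant(patterns, names=("SO2", "T", "HR", "VV")) -> Dict[str, float]:
    """Correlate column 0 (the pollutant) with every meteorological column."""
    columns = list(zip(*patterns))  # transpose: one tuple per variable
    return {names[k]: pearson(columns[0], columns[k]) for k in range(1, len(names))}
```

Pearson's r only measures linear association, and this version raises `ZeroDivisionError` when a column is constant.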

Black Box

  • Time series data are obtained from the EMN.
  • The time series data are self-organized into different classes by a Self-Organizing Map (SOM) neural network.
  • The classes are used in the training phase of a General Regression Neural Network (GRNN) classifier to provide an air quality forecast.

[Figure: black box model]

Clustering Method

Health concern levels with respect to the Air Quality Index and their category map representation for SO2 and PM10

Air Quality                       AQI           Cluster Prototype
Good                              0 to 50       Z1
Moderate                          51 to 100     Z2
Unhealthy for Sensitive Groups    101 to 150    Z3
Unhealthy                         151 to 300    Z4
Dangerous                         301 to 500    Z5
* Noise                           -             Z6
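The table above can be encoded as a lookup that labels a pattern with its category and cluster prototype from its AQI value. A sketch, under one loudly stated assumption: the slides do not define the noise class, so here a missing value or one outside 0 to 500 is treated as noise (Z6).

```python
# (upper AQI bound, category, prototype), taken from the table above.
AQI_CLASSES = [
    (50, "Good", "Z1"),
    (100, "Moderate", "Z2"),
    (150, "Unhealthy for Sensitive Groups", "Z3"),
    (300, "Unhealthy", "Z4"),
    (500, "Dangerous", "Z5"),
]


def aqi_class(aqi):
    """Return (category, prototype) for an AQI value."""
    if aqi is not None and 0 <= aqi <= 500:
        for upper, category, prototype in AQI_CLASSES:
            if aqi <= upper:
                return category, prototype
    # Assumption (not from the slides): missing or out-of-range values are noise.
    return "Noise", "Z6"
```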

Clustering Method

Self-Organizing Map (SOM) Neural Network

[Figure: SOM]

$Z_i = \{\mu_{i1}, \mu_{i2}, \mu_{i3}, \mu_{i4}\}$

  • where $Z_i$ is the class center and
    $\mu_{i1}$ is the SO2 or PM10 concentration,
    $\mu_{i2}$ is the temperature,
    $\mu_{i3}$ is the relative humidity, and
    $\mu_{i4}$ is the wind speed.

Clustering Method

No prior knowledge about the patterns is available, so a SOM neural network is used for:
  • Mapping a high-dimensional feature space to a much lower-dimensional output map.
  • Preserving the topological order.
The goal is to build the pattern sets needed to design a classifier:
  • Six clusters are formed, and therefore six prototypes or weights $Z_i$, one for each class of the index classification.
The center of each class is built as
  • $Z_i = \{\mu_{i1}, \mu_{i2}, \mu_{i3}, \mu_{i4}\}$,
where $\mu_{i1}$ is the SO2 or PM10 concentration (from $X_{SO_2}$ or $X_{PM_{10}}$), $\mu_{i2}$ is the temperature, $\mu_{i3}$ is the relative humidity, and $\mu_{i4}$ is the wind speed.
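A minimal self-organizing map in plain Python illustrates how six prototypes $Z_i$ can be obtained from the four-dimensional patterns. The map topology (a one-dimensional chain of six units), the Gaussian neighborhood, the decay schedules and the parameter values are assumptions, since the slides do not give them; the patterns are also assumed to be scaled to comparable ranges beforehand, because concentration, temperature, humidity and wind speed have different units.

```python
import math
import random
from typing import Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def train_som(patterns, n_units=6, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D SOM and return n_units prototype vectors Z_i.

    Assumptions (not from the slides): chain topology, Gaussian neighborhood,
    linearly decaying learning rate and radius, patterns already scaled.
    """
    rng = random.Random(seed)
    protos = [list(p) for p in rng.sample(patterns, n_units)]  # init from data
    total_steps = epochs * len(patterns)
    step = 0
    for _ in range(epochs):
        order = list(patterns)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            # Best-matching unit: the prototype closest to x.
            bmu = min(range(n_units), key=lambda i: _sq_dist(protos[i], x))
            for i in range(n_units):
                # Units near the BMU on the chain also move toward x,
                # which is what preserves the topological order.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                protos[i] = [z + lr * h * (v - z) for z, v in zip(protos[i], x)]
            step += 1
    return protos


def assign(patterns, protos):
    """Index of the nearest prototype (class label) for each pattern."""
    return [min(range(len(protos)), key=lambda i: _sq_dist(protos[i], x))
            for x in patterns]
```

Each returned prototype plays the role of $Z_i = \{\mu_{i1}, \mu_{i2}, \mu_{i3}, \mu_{i4}\}$, and `assign` gives the class index used to label the training patterns for the classifier.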

Introduction

Principal Air Pollutant Features

  • Air quality evaluation

Sulfur Dioxide (SO2)

  • Continuous measurement (24 hrs.).
  • 340 µg/m3 (0.13 ppm), equivalent to 100 AQI units.

Suspended particles, or breathable fraction, with diameters between 0.3 and 10 microns (PM10)

  • Continuous measurement (24 hrs.).
  • 150 µg/m3, equivalent to 100 AQI units.
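Both reference points tie a 24-hour concentration to 100 AQI units, which allows a rough conversion from concentration to index value. The sketch assumes the index is proportional to concentration; this is a simplification, since operational indices are usually piecewise linear between breakpoints, and the breakpoints used in Salamanca are not given here.

```python
# 24-h concentrations equivalent to 100 AQI units (values from the slide above).
REFERENCE_UG_M3 = {"SO2": 340.0, "PM10": 150.0}


def aqi_subindex(pollutant: str, concentration_ug_m3: float) -> float:
    """Approximate AQI for a 24-h mean concentration in µg/m3.

    Assumption: the index scales linearly through the 100-unit reference point.
    """
    return 100.0 * concentration_ug_m3 / REFERENCE_UG_M3[pollutant]
```

For example, under this assumption a 24-hour PM10 mean of 75 µg/m3 maps to 50 AQI units, the upper edge of the "Good" band.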

Introduction

The Air Quality Index (AQI)

  • Provides daily information on air pollution concentrations.
  • Is a value used to inform the population about actions to reduce air pollution and about environmental forecasts.
  • Is a single number on a scale from 0 to 500.

     

Model and Theoretical Fundamentals

  • The proposed method performs an automatic multivariable analysis of the time series obtained from the EMN.
  • Its aim is to determine the correlation among all the variables involved in decision making on health risks for the population.

Model and Theoretical Fundamentals

Black Box

  • In some multidimensional engineering problems, it is necessary to recognize certain patterns without knowing the nature of the data or their statistical distribution.
  • Some pattern recognition techniques apply neural networks (NNs) to solve problems without prior knowledge of the data distribution and without making statistical assumptions.
  • Consequently, NNs are well suited to the problem addressed here, because their operation can be analyzed as a black box that minimizes an energy function.

[Figure: black box model]

Classifier Design

  • The main advantage of the GRNN:
    • It needs only a single learning pass to achieve optimal performance in classification.
  • The estimator
  • A reduced Gaussian kernel
  • The GRNN operation:
    1. The input layer simply passes the pattern $\mathbf{x}$ to all units of the hidden layer, which are composed of kernel functions $\exp(-D_i^2/(2\rho^2))$ and compute the squared distances $D_i^2$ between the new pattern $\mathbf{x}$ and the training samples $\mathbf{x}_i$.
    2. The hidden-to-output weights are simply the targets $y_i$; thus the output $y(\mathbf{x})$ is a weighted average of the target values $y_i$ of the training cases $\mathbf{x}_i$ close to the given input case $\mathbf{x}$.
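Taken together, the two operating steps give the GRNN estimator $y(\mathbf{x}) = \sum_i y_i \exp(-D_i^2/(2\rho^2)) \big/ \sum_i \exp(-D_i^2/(2\rho^2))$, with $D_i^2 = (\mathbf{x}-\mathbf{x}_i)^T(\mathbf{x}-\mathbf{x}_i)$. The sketch below implements it in plain Python. Using one-hot class codes as targets and taking the largest output as the predicted class is an assumption about how the GRNN is turned into a classifier, and `rho` is left as a free parameter because its value is not given.

```python
import math


def grnn_predict(x, train_x, train_y, rho=1.0):
    """GRNN output y(x): Gaussian-kernel weighted average of the targets y_i."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in train_x]  # D_i^2
    d_min = min(d2)
    # Shifting every D_i^2 by d_min multiplies all weights by the same factor,
    # so the weighted average is unchanged, but the largest weight becomes 1
    # and the denominator can no longer underflow to zero.
    w = [math.exp(-(d - d_min) / (2.0 * rho ** 2)) for d in d2]
    total = sum(w)
    # Hidden-to-output weights are the stored targets themselves.
    return [sum(wi * yi[k] for wi, yi in zip(w, train_y)) / total
            for k in range(len(train_y[0]))]


def grnn_classify(x, train_x, labels, n_classes, rho=1.0):
    """Assumed classification rule: one-hot targets, pick the largest output."""
    onehot = [[1.0 if c == label else 0.0 for c in range(n_classes)] for label in labels]
    scores = grnn_predict(x, train_x, onehot, rho)
    return max(range(n_classes), key=lambda c: scores[c])
```

Training amounts to storing `train_x` and the targets, which is the single learning pass mentioned above; the cost is paid at prediction time, with one kernel evaluation per stored sample.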