Given the initial measured values <math>x_0</math>, final observed or measured values <math>x_m</math>, and final calculated values <math>x_c</math>, there are several goodness-of-fit statistics which can be calculated. The definitions of some of the more common ones are provided below.


= Dimensional Statistics =
== Mean Error ==
The mean error (ME), also referred to as the bias (B), is given by
{{Equation|<math>
  ME =  \langle x_c - x_m  \rangle  =  \langle x_c \rangle - \langle x_m \rangle 
</math>|1}}


Smaller absolute ME values indicate better agreement between measured and calculated values. Positive values indicate positively biased computed values (overprediction) while negative values indicate negatively biased computed values (underprediction).  


Example Matlab code:
  ME = mean(xc(:)-xm(:));

== Mean-Absolute Error ==
The mean absolute error (MAE) is given by
{{Equation|<math>
  MAE =  \bigg\langle \big| x_c - x_m \big| \bigg\rangle
</math>|2}}

As with the RMSE (defined below), smaller MAE values indicate better agreement between measured and calculated values.

Example Matlab code:
  MAE = mean(abs(xc(:)-xm(:)));

== Root-Mean-Squared Error ==
The Root-Mean-Squared Error (RMSE), also referred to as the Root-Mean-Squared Deviation (RMSD), is defined as
{{Equation|<math>
  RMSE = \sqrt{ \bigg\langle \big( x_c - x_m  \big)^2  \bigg\rangle  }
</math>|3}}
 
The RMSE has the same units as the measured and calculated data. Smaller values indicate better agreement between measured and calculated values.  
 
Example Matlab Code:
  RMSE = sqrt(mean((xc(:)-xm(:)).^2));
 
== Standard Deviation of Residuals ==
The standard deviation of residuals (SDR) is calculated as
{{Equation|<math>
  SDR = \sqrt{ \bigg\langle \Big[ \big( x_c - x_m \big) - \big( \langle x_c \rangle - \langle x_m \rangle \big) \Big]^2 \bigg\rangle }
</math>|4}}
 
The SDR is a measure of the dynamical correspondence between the measured and calculated values. Smaller values indicate better agreement. The RMSE, ME, and SDR are related by the following formula
 
{{Equation|<math>
  RMSE^2 = ME^2 + SDR^2
</math>|5}}


Example Matlab Code: 
  SDR = sqrt(mean((xc(:)-xm(:)-mean(xc(:))+mean(xm(:))).^2));
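
As a quick numerical check of Equation (5), the sketch below evaluates all three statistics for two short hypothetical vectors (the values of xm and xc are assumed only for illustration); the residual err should be on the order of machine precision:
  % Check of Eq. (5): RMSE^2 = ME^2 + SDR^2
  xm   = [0.2 0.5 0.9 1.3 1.1];    % hypothetical measured values
  xc   = [0.3 0.4 1.0 1.2 1.3];    % hypothetical calculated values
  ME   = mean(xc(:)-xm(:));
  RMSE = sqrt(mean((xc(:)-xm(:)).^2));
  SDR  = sqrt(mean((xc(:)-xm(:)-mean(xc(:))+mean(xm(:))).^2));
  err  = RMSE^2 - (ME^2 + SDR^2)   % expected to be near machine precision (eps)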
 
== Normalization ==
The dimensional statistics above, namely the RMSE, MAE, and bias (ME), can be normalized to produce nondimensional statistics. When a statistic is normalized it is commonly prefixed by the letter N for normalized or R for relative (e.g. NRMSE, NMAE, and NB). Normalization facilitates the comparison between datasets or models which have different scales. For example, comparisons against laboratory data produce relatively smaller dimensional goodness-of-fit statistics than comparisons against field data. One drawback of normalization is that there is no consistent normalization convention; different types of data are normalized differently in the literature. For example, water levels are commonly normalized by the tidal range, while wave heights may be normalized by the offshore wave height. In some cases, the range of the measured data is a good choice, where the range is defined as the maximum value minus the minimum value
 
{{Equation|<math>x_N = \mathrm{range}(x_m) = \max(x_m)-\min(x_m)</math>|6}}
 
Another common approach to normalization is to use the mean value of the measurements
{{Equation|<math>x_N = \langle x_m \rangle</math>|7}}
 
When the RMSE is normalized by the mean measured value, it is sometimes referred to as the scatter index (SI) (Zambresky 1989). When the RMSE is normalized by a specific measured value used to drive a model, it is sometimes referred to as the Operational Performance Index (OPI) (Ris et al. 1999). The OPI can be used, for example, to estimate the performance of a nearshore wave height transformation model relative to the offshore measured wave height.
 
More important than the choice of normalization variable is to properly describe how the statistics have been normalized.
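
For illustration, a range-normalized RMSE (Equation 6) and a mean-normalized RMSE (Equation 7, i.e. the scatter index) could be computed as in the sketch below; the variable names NRMSE and SI are simply convenient labels, and the chosen normalization should always be stated:
  % RMSE normalized by the range of the measurements, Eq. (6)
  NRMSE = sqrt(mean((xc(:)-xm(:)).^2))/(max(xm(:))-min(xm(:)));
  % RMSE normalized by the mean of the measurements, Eq. (7): scatter index
  SI    = sqrt(mean((xc(:)-xm(:)).^2))/mean(xm(:));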


= Nondimensional Statistics =
== Performance Scores ==
There are several goodness-of-fit statistics in the literature of the form
{{Equation|<math>
  PS = 1 - \frac{\bigg\langle \big(x_c-x_m\big)^2  \bigg\rangle}{\bigg\langle  \big(x_m - x_R \big)^2 \bigg\rangle }
</math>|8}}

where <math> x_R </math> is a reference value or set of values. When the reference value is equal to the base or initial measurements, <math> x_R = x_0</math>, the Performance Score is referred to as the Brier Skill Score (BSS) or Brier Skill Index (BSI). When the reference value is equal to the mean measured value, <math> x_R = \langle x_m \rangle</math>, the Performance Score is referred to as the Nash-Sutcliffe Coefficient (E) or Nash-Sutcliffe Score (ES) (Nash and Sutcliffe 1970). When the reference value is a specific measured value, such as a model forcing value, the score is referred to as the Model Performance Index (MPI) or Model Performance Score (MPS).

The performance scores range between negative infinity and one. A score of 1 indicates perfect agreement between measured and calculated values. Scores equal to or less than 0 indicate that the reference value is as accurate as, or more accurate than, the calculated values. Recommended qualifications for different performance score ranges are provided in Table 1.


'''Table 1. Performance Score Qualifications'''
{|border="1"
|'''Range''' ||'''Qualification'''
|-
|0.8<PS<1.0 || Excellent
|-
|0.6<PS<0.8 || Good
|-
|0.3<PS<0.6 || Reasonable
|-
|0<PS<0.3 || Poor
|-
|PS<0 || Bad
|}


Example Matlab Code: 
  BSS = 1 - mean((xc(:)-xm(:)).^2)/mean((xm(:)-x0(:)).^2);
  ES  = 1 - mean((xc(:)-xm(:)).^2)/mean((xm(:)-mean(xm(:))).^2);
  MPS = 1 - mean((xc(:)-xm(:)).^2)/mean((xm(:)-xR).^2);
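
The qualifications in Table 1 can also be assigned programmatically; the sketch below is one possible mapping (the treatment of the interval endpoints is a choice, since Table 1 lists open ranges):
  % Map a performance score PS to a Table 1 qualification (illustrative only)
  if PS >= 0.8
      qual = 'Excellent';
  elseif PS >= 0.6
      qual = 'Good';
  elseif PS >= 0.3
      qual = 'Reasonable';
  elseif PS >= 0
      qual = 'Poor';
  else
      qual = 'Bad';
  end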


== Index of Agreement ==
The index of agreement (IA or d) is given by (Willmott et al. 1985)
{{Equation|<math>
  IA = 1 - \frac{\bigg\langle \big(x_c-x_m\big)^2  \bigg\rangle}{\bigg\langle  \big(| x_c - \langle x_m \rangle | + | x_m - \langle x_m \rangle |\big)^2 \bigg\rangle }
</math>|9}}


The denominator in the above equation is referred to as the potential error. IA is a nondimensional and bounded measure with values closer to 1 indicating better agreement.  


Example Matlab code:
  IA = 1 - mean((xc(:)-xm(:)).^2)/max(mean((abs(xc(:)-mean(xm(:)))+abs(xm(:)-mean(xm(:)))).^2),eps);


== Correlation Coefficient ==
Correlation is a measure of the strength and direction of a linear relationship between two variables. The correlation coefficient <math> R </math> is defined as 
{{Equation|<math>
   R = \frac { \langle x_m x_c \rangle - \langle x_m \rangle \langle x_c \rangle  }{ \sqrt{ \langle x_m^2 \rangle - \langle x_m \rangle ^2} \sqrt{ \langle x_c^2 \rangle - \langle x_c \rangle ^2} }
</math>|10}}

A correlation of 1 indicates a perfect one-to-one linear relationship, while -1 indicates a perfect inverse (negative) linear relationship. The square of the correlation coefficient describes how much of the variance between the two variables is explained by a linear fit.


Example Matlab code:  
  Rtmp = corrcoef(xc(:),xm(:));   % corrcoef returns a 2x2 matrix
  R    = Rtmp(2,1);               % scalar correlation coefficient


= References =
* Nash, J.E., and Sutcliffe, J.V. 1970. River flow forecasting through conceptual models part I: A discussion of principles. Journal of Hydrology, 10(3), 282-290.
* Ris, R.C., Holthuijsen, L.H., and Booij, N. 1999. A third-generation wave model for coastal regions 2, verification. Journal of Geophysical Research, 104(C4), 7667-7681.
* Willmott, C.J., Ackleson, S.G., Davis, R.E., Feddema, J.J., Klink, K.M., Legates, D.R., O’Donnell, J., and Rowe, C.M. 1985. Statistics for the evaluation and comparison of models. Journal of Geophysical Research, 90(C5), 8995-9005.
* Zambresky, L. 1989. A verification study of the global WAM model, December 1987 – November 1988. GKSS Forschungzentrum Geesthacht GMBH, Report GKSS 89/E/37.
= Symbols =
A description of all the symbols in the equations above is provided in Table 3.


'''Table 3. Description of symbols'''
{|border="1"
|'''Symbol''' ||'''Description'''
|-
| <math>x_m</math>  || Measured values
|-
| <math>x_c</math>  || Calculated values
|-
| <math>x_0</math>  || Initial measured values
|-
| <math>x_R</math>  || Reference value(s)
|-
| <math>x_N</math>  || Normalization value
|-
| <math>\langle \rangle</math> || Expectation (averaging) operator
|}


