Statistics: Difference between revisions

From CIRPwiki

Revision as of 22:15, 2 November 2012

Given the initial measured values $x_{0}$, the final observed or measured values $x_{m}$, and the final calculated values $x_{c}$, several goodness-of-fit statistics can be calculated. Definitions of some of the more common ones are provided below.

Brier Skill Score

The Brier Skill Score (BSS) is given by

$$BSS = 1 - \frac{\big\langle (x_m - x_c)^2 \big\rangle}{\big\langle (x_m - x_0)^2 \big\rangle} \qquad (1)$$

The BSS ranges from negative infinity to one. A BSS of 1 indicates perfect agreement between measured and calculated values. A score of 0 or less indicates that the initial values are as accurate as, or more accurate than, the calculated values. Recommended qualifications for different BSS ranges are provided in Table 1.

Table 1. Brier Skill Score Qualifications

Range	Qualification
0.8<BSS<1.0	Excellent
0.6<BSS<0.8	Good
0.3<BSS<0.6	Reasonable
0<BSS<0.3	Poor
BSS<0	Bad

Example Matlab Code:

 BSS = 1 - mean((xc(:)-xm(:)).^2)/mean((xm(:)-x0(:)).^2);
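As a minimal sketch, the snippet below evaluates Eq. (1) on a few hypothetical values (`x0`, `xm` and `xc` are made up for illustration) and maps the result to the qualifications in Table 1. Table 1 uses strict inequalities, so the handling of exact boundary values is an assumption here: 0.3, 0.6 and 0.8 are assigned to the higher class, and a score of exactly 0 is treated as Bad, consistent with the text above.

```matlab
% Hypothetical example data (not from any real dataset)
x0 = [0.0 0.0 0.0 0.0];   % initial measured values
xm = [1.0 2.0 3.0 4.0];   % final measured values
xc = [1.5 2.0 2.5 4.5];   % final calculated values

% Brier Skill Score, Eq. (1)
BSS = 1 - mean((xc(:)-xm(:)).^2)/mean((xm(:)-x0(:)).^2);

% Map the score to the qualifications in Table 1
% (boundary handling is an assumption, see text)
if BSS >= 0.8
    qual = 'Excellent';
elseif BSS >= 0.6
    qual = 'Good';
elseif BSS >= 0.3
    qual = 'Reasonable';
elseif BSS > 0
    qual = 'Poor';
else
    qual = 'Bad';
end
fprintf('BSS = %.4f (%s)\n', BSS, qual);
```

For these values the mean squared error is 0.1875 and the reference term is 7.5, so BSS = 1 - 0.025 = 0.975, which falls in the Excellent range.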

Nash-Sutcliffe Coefficient

The Nash-Sutcliffe Coefficient (E) is commonly used to assess the predictive power of a model. It is defined as

$$E = 1 - \frac{\big\langle (x_m - x_c)^2 \big\rangle}{\big\langle (x_m - \langle x_m \rangle)^2 \big\rangle} \qquad (2)$$

The Nash-Sutcliffe efficiency coefficient ranges from negative infinity to one. An efficiency of 1 corresponds to a perfect match between measured and calculated values. An efficiency of 0 or less indicates that the mean of the observed values is as accurate as, or more accurate than, the calculated values. Recommended qualifications for different E ranges are provided in Table 2.

Table 2. Nash-Sutcliffe Coefficient Qualifications

Range	Qualification
0.8<E<1.0	Excellent
0.6<E<0.8	Good
0.3<E<0.6	Reasonable
0<E<0.3	Poor
E<0	Bad

Example Matlab Code:

 E = 1 - mean((xc(:)-xm(:)).^2)/mean((xm(:)-mean(xm(:))).^2);
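Comparing Eqs. (1) and (2), E has the same form as the BSS with the initial values $x_{0}$ replaced by the mean of the measured values $\langle x_{m} \rangle$. The sketch below checks this numerically on hypothetical data; the variable names follow the examples above.

```matlab
% Hypothetical example data
xm = [1.0 2.0 3.0 4.0];   % measured values
xc = [1.5 2.0 2.5 4.5];   % calculated values

% Nash-Sutcliffe coefficient, Eq. (2)
E = 1 - mean((xc(:)-xm(:)).^2)/mean((xm(:)-mean(xm(:))).^2);

% BSS, Eq. (1), using the measured mean as the reference values
x0 = mean(xm(:))*ones(size(xm));
BSS = 1 - mean((xc(:)-xm(:)).^2)/mean((xm(:)-x0(:)).^2);

fprintf('E = %.4f, BSS with x0 = <xm>: %.4f\n', E, BSS);
```

Here the variance of the measured values is 1.25 and the mean squared error is 0.1875, so both statistics equal 0.85.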

Root-Mean-Squared Error

The Root-Mean-Squared Error (RMSE), also referred to as the Root-Mean-Squared Deviation (RMSD), is defined as

$$RMSE = \sqrt{\big\langle (x_m - x_c)^2 \big\rangle} \qquad (3)$$

The RMSE has the same units as the measured and calculated data. Smaller values indicate better agreement between measured and calculated values.

Example Matlab Code:

 RMSE = sqrt(mean((xc(:)-xm(:)).^2));
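A short worked example on hypothetical values, writing out the intermediate steps of Eq. (3): the differences, their squares, the mean, and the square root.

```matlab
% Hypothetical example data
xm = [1.0 2.0 3.0 4.0];   % measured values
xc = [1.5 2.0 2.5 4.5];   % calculated values

d    = xc(:) - xm(:);     % differences: 0.5, 0, -0.5, 0.5
msd  = mean(d.^2);        % mean squared difference: 0.1875
RMSE = sqrt(msd);         % same units as xm and xc

fprintf('RMSE = %.4f\n', RMSE);
```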

Mean-Absolute Error

The Mean-Absolute Error (MAE) is given by

$$MAE = \big\langle |x_m - x_c| \big\rangle \qquad (4)$$

As with the RMSE, the MAE has the same units as the data, and smaller values indicate better agreement between measured and calculated values.

Example Matlab code:

 MAE = mean(abs(xc(:)-xm(:)));
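For the same data the RMSE is always greater than or equal to the MAE, because squaring gives more weight to large differences. The sketch below uses hypothetical values with a single large error to show how far the two can diverge.

```matlab
% Hypothetical example data: the last calculated value has one large error
xm = [1.0 2.0 3.0 4.0];   % measured values
xc = [1.5 2.0 2.5 8.0];   % calculated values

MAE  = mean(abs(xc(:)-xm(:)));        % Eq. (4)
RMSE = sqrt(mean((xc(:)-xm(:)).^2));  % Eq. (3)

% RMSE >= MAE always holds; the gap grows as the error magnitudes spread out
fprintf('MAE = %.4f, RMSE = %.4f\n', MAE, RMSE);
```

Here MAE = 1.25 while RMSE = sqrt(4.125), about 2.03, almost entirely due to the single outlier.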

Correlation Coefficient

Correlation is a measure of the strength and direction of a linear relationship between two variables. The correlation coefficient $R$ is defined as

$$R = \frac{\langle x_m x_c \rangle - \langle x_m \rangle \langle x_c \rangle}{\sqrt{\langle x_m^2 \rangle - \langle x_m \rangle^2}\,\sqrt{\langle x_c^2 \rangle - \langle x_c \rangle^2}} \qquad (5)$$

A correlation of 1 indicates a perfect positive linear relationship (not necessarily a one-to-one match), and -1 indicates a perfect negative linear relationship. The square of the correlation coefficient, $R^2$, gives the fraction of the variance in one variable that is explained by a linear fit to the other.

Example Matlab code:

 C = corrcoef(xc(:),xm(:)); % 2x2 correlation matrix
 R = C(1,2);                % off-diagonal entry is the correlation coefficient
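The sketch below evaluates Eq. (5) directly with averages and compares it with the built-in `corrcoef`; the normalization factors used by `corrcoef` cancel in the ratio, so both give the same R. The hypothetical calculated values are a linear function of the measured ones with slope 2, which illustrates that R = 1 does not imply a one-to-one match.

```matlab
% Hypothetical example: xc is a linear function of xm, but not one-to-one
xm = [1.0 2.0 3.0 4.0];   % measured values
xc = 2*xm + 1;            % calculated values: slope 2, offset 1

% Eq. (5) written out with averages
num   = mean(xm.*xc) - mean(xm)*mean(xc);
den   = sqrt(mean(xm.^2) - mean(xm)^2) * sqrt(mean(xc.^2) - mean(xc)^2);
R_eq5 = num/den;

% Built-in: corrcoef returns a 2x2 matrix; the off-diagonal entry is R
C = corrcoef(xc(:), xm(:));
R = C(1,2);

% The agreement is far from perfect even though R = 1
RMSE = sqrt(mean((xc(:)-xm(:)).^2));
fprintf('R (Eq. 5) = %.4f, R (corrcoef) = %.4f, RMSE = %.4f\n', R_eq5, R, RMSE);
```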

Bias

The bias (B) is defined as

$$B = \langle x_c - x_m \rangle \qquad (6)$$

The bias measures the average over- or underprediction of a variable. Positive values indicate overprediction, and negative values indicate underprediction.

Example Matlab code:

 B = mean(xc(:)-xm(:));
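Because overpredictions and underpredictions cancel in the average, a bias near zero does not by itself indicate good agreement. The sketch below uses hypothetical values where the errors cancel exactly, which is one reason to report B together with the RMSE or MAE.

```matlab
% Hypothetical example: over- and underpredictions of equal size
xm = [1.0 2.0 3.0 4.0];   % measured values
xc = [1.5 1.5 3.5 3.5];   % calculated values

B    = mean(xc(:)-xm(:));             % Eq. (6): positive = overprediction
RMSE = sqrt(mean((xc(:)-xm(:)).^2));  % Eq. (3)

fprintf('B = %.4f, RMSE = %.4f\n', B, RMSE);
```

Here B = 0 while RMSE = 0.5.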

Normalization

The dimensional statistics above, namely the RMSE, MAE, and B, can be normalized to produce nondimensional statistics. A normalized statistic is commonly prefixed with the letter N for normalized or R for relative (e.g. NRMSE, NMAE, and NB). Normalization also facilitates comparisons between datasets or models with different scales. For example, comparisons with laboratory data will generally produce smaller dimensional goodness-of-fit statistics than comparisons with field data, simply because of the smaller scales involved. One drawback of normalization is that there is no consistent method for it, and different types of data are normalized differently in the literature. For example, water levels are commonly normalized by the tidal range, while wave heights may be normalized by the offshore wave height. In some cases, the range of the measured data is a good choice. The range is defined as the maximum value minus the minimum value.

$$x_N = \mathrm{range}(x_m) = \max(x_m) - \min(x_m) \qquad (7)$$

More important than the choice of normalization variable is a clear description of how the statistics were normalized.
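A minimal sketch of normalization by the range in Eq. (7), assuming each dimensional statistic is simply divided by $x_{N}$ (this form is an assumption; as noted above, other normalization values are common). The data are hypothetical, and the range is computed with `max` and `min` directly so the snippet does not depend on any toolbox.

```matlab
% Hypothetical example data
xm = [1.0 2.0 3.0 4.0];   % measured values
xc = [1.5 2.0 2.5 4.5];   % calculated values

% Normalization value, Eq. (7): range of the measured data
xN = max(xm(:)) - min(xm(:));

% Normalized statistics (assumed form: dimensional statistic / xN)
NRMSE = sqrt(mean((xc(:)-xm(:)).^2))/xN;
NMAE  = mean(abs(xc(:)-xm(:)))/xN;
NB    = mean(xc(:)-xm(:))/xN;

fprintf('NRMSE = %.4f, NMAE = %.4f, NB = %.4f\n', NRMSE, NMAE, NB);
```

Whatever normalization value is chosen, stating it explicitly (here, the range of $x_{m}$) is what allows the results to be compared with other studies.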

Symbols

A description of all the symbols in the equations above is provided in Table 3.

Table 3. Description of symbols

Symbol	Description
$x_{m}$	Measured values
$x_{c}$	Calculated values
$x_{0}$	Initial measured values
$x_{N}$	Normalization value
$\langle \cdot \rangle$	Expectation (averaging) operator


Retrieved from "https://cirpwiki.info/index.php?title=Statistics&oldid=9346"
