ESMValTool: Change example correlation diagnostic so it uses a more meaningful statistic

Created on 2 Oct 2018 · 3 Comments · Source: ESMValGroup/ESMValTool

So that we do not forget the discussion in #596: this code
https://github.com/ESMValGroup/ESMValTool/blob/60a89f7828025c599615bcb5932b1917a40fb333/esmvaltool/diag_scripts/examples/correlate.py#L48

should probably be updated so that it uses one of:
scipy.stats.mstats.ks_twosamp
scipy.stats.ks_2samp
or this:
scipy.stats.anderson_ksamp
since some people seem a bit critical of the KS test.
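To make the options above concrete, here is a minimal sketch of how the two-sample KS test and the Anderson-Darling k-sample test could be called side by side. The helper name `compare_distributions` and the synthetic samples are hypothetical and not taken from `correlate.py`; only `scipy.stats.ks_2samp` and `scipy.stats.anderson_ksamp` are real scipy functions.

```python
import numpy as np
from scipy import stats


def compare_distributions(sample_a, sample_b):
    """Compare two samples with the KS and Anderson-Darling k-sample tests.

    Hypothetical helper for illustration; not part of ESMValTool.
    """
    # Flatten to 1-D float arrays, since both tests expect 1-D samples.
    a = np.ravel(np.asarray(sample_a, dtype=float))
    b = np.ravel(np.asarray(sample_b, dtype=float))

    # Two-sample Kolmogorov-Smirnov test: (statistic, p-value).
    ks = stats.ks_2samp(a, b)

    # Anderson-Darling k-sample test; takes a list of samples.
    ad = stats.anderson_ksamp([a, b])

    return {
        "ks_statistic": float(ks[0]),
        "ks_pvalue": float(ks[1]),
        "ad_statistic": float(ad.statistic),
    }


# Synthetic data (an assumption for the demo): two normal samples
# whose means differ by one standard deviation.
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=500)
shifted = rng.normal(loc=1.0, scale=1.0, size=500)

result = compare_distributions(reference, shifted)
print(result)
```

Note that `anderson_ksamp` may emit a warning when its p-value falls outside the tabulated range; the sketch only reads the test statistic for that reason.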

diagnostic

All 3 comments

I have assigned myself to this one. My intention is to start looking into developing a serious statistical module for ESMValTool, and this is a good starting point.

Regarding this issue/enhancement, here is some information that I hope will be useful.

  • It might be interesting to check the _R-Forge libraries_, for instance those related to the Wilcox robust statistics functions (https://rdrr.io/rforge/WRS/man/) or those in robustbase (https://rdrr.io/rforge/robustbase/man/). Some of them are already implemented in scipy, but not all. It is useful to keep the rpy2 package in mind for reuse or double-checking.

  • In general, the Pearson cross-correlation is not robust and makes assumptions about the joint distribution similar to those of linear regression. However, there are slight improvements that could at least solve the sensitivity to outliers, such as the percentage bend correlation coefficient (https://link.springer.com/article/10.1007/BF02294395) or the Winsorized correlation (which relies only on the trimmed mean and trimmed variance).

  • About the k-sample methods like those mentioned above (Anderson-Darling, Kruskal-Wallis, etc.), the ksamples package has information, but using it requires some knowledge of rank-based tests. Other possibilities are rank correlation measures.
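As a rough illustration of the points above, the sketch below compares the plain Pearson coefficient with a Winsorized correlation and a rank (Spearman) correlation on data containing a single outlier. The helper `winsorized_correlation`, the 10% limits, and the synthetic data are assumptions made for the demo; the Winsorized variant here is simply Pearson applied to Winsorized samples, which is one common definition and not necessarily the one any ESMValTool module would adopt.

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats


def winsorized_correlation(x, y, limits=0.1):
    """Pearson correlation of Winsorized samples (hypothetical helper).

    `limits` is the fraction clipped at each tail of x and of y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # winsorize returns a masked array; getdata extracts the plain values.
    x_w = np.ma.getdata(mstats.winsorize(x, limits=(limits, limits)))
    y_w = np.ma.getdata(mstats.winsorize(y, limits=(limits, limits)))
    return float(np.corrcoef(x_w, y_w)[0, 1])


# Synthetic, strongly correlated data (an assumption for the demo) ...
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = x + 0.3 * rng.normal(size=100)
# ... with one gross outlier that contradicts the overall relationship.
x[0], y[0] = 10.0, -10.0

pearson_r = float(np.corrcoef(x, y)[0, 1])
winsorized_r = winsorized_correlation(x, y)
spearman_r = float(stats.spearmanr(x, y)[0])

print("Pearson:   ", pearson_r)
print("Winsorized:", winsorized_r)
print("Spearman:  ", spearman_r)
```

On data like this, the single outlier pulls the Pearson coefficient well below the two robust alternatives, which is the outlier dependency mentioned in the second bullet.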

We will also support R diagnostics in the near future, see https://github.com/ESMValGroup/ESMValTool/pull/631, so there is no need to use rpy2.
