ESMValTool: Change example correlation diagnostic so it uses a more meaningful statistic

Created on 2 Oct 2018 · 3 Comments · Source: ESMValGroup/ESMValTool

So that we do not forget the discussion in #596: this code
https://github.com/ESMValGroup/ESMValTool/blob/60a89f7828025c599615bcb5932b1917a40fb333/esmvaltool/diag_scripts/examples/correlate.py#L48

should probably be updated to use one of the two-sample Kolmogorov-Smirnov tests:
scipy.stats.mstats.ks_twosamp
scipy.stats.ks_2samp
or the k-sample Anderson-Darling test:
scipy.stats.anderson_ksamp
as some people seem a bit critical of the KS test.
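
As a rough illustration of the tests named above, the following sketch runs `scipy.stats.ks_2samp` and `scipy.stats.anderson_ksamp` on two synthetic samples. It is not the code in `correlate.py`: the helper name `compare_distributions`, the synthetic data, and the choice to drop non-finite values are all assumptions made for the example.

```python
import numpy as np
from scipy import stats


def compare_distributions(sample_a, sample_b):
    """Compare two 1-D samples with the KS and Anderson-Darling tests.

    Hypothetical helper for illustration only.
    """
    a = np.asarray(sample_a, dtype=float).ravel()
    b = np.asarray(sample_b, dtype=float).ravel()
    # Assumption: non-finite values (NaN, inf) are simply dropped.
    a = a[np.isfinite(a)]
    b = b[np.isfinite(b)]

    ks = stats.ks_2samp(a, b)          # two-sample Kolmogorov-Smirnov test
    ad = stats.anderson_ksamp([a, b])  # k-sample Anderson-Darling test
    return {
        "ks_statistic": float(ks.statistic),
        "ks_pvalue": float(ks.pvalue),
        "ad_statistic": float(ad.statistic),
        # Critical values for the 25%, 10%, 5%, 2.5%, 1%, ... levels.
        "ad_critical_values": np.asarray(ad.critical_values),
    }


# Synthetic stand-ins for a dataset and a reference dataset.
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=500)
shifted = rng.normal(loc=2.0, scale=1.0, size=500)

result = compare_distributions(reference, shifted)
print(result)
```

Both tests ask whether the two samples could come from the same distribution, which is a different question from how strongly they are linearly related. Note that `anderson_ksamp` may emit a warning when its p-value falls outside the tabulated range.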

diagnostic


bouweandela

All 3 comments

I assigned myself to this one. My intention is to start developing a serious statistical module for ESMValTool, and this is a good starting point.

valeriupredoi on 2 Oct 2018

🎉2

Regarding this issue/enhancement, here is some information that I hope will be useful.

It might be interesting to check the _R-Forge libraries_, for instance those related to Wilcox's robust statistics functions (https://rdrr.io/rforge/WRS/man/) or those in robustbase (https://rdrr.io/rforge/robustbase/man/). Some of them are already implemented in scipy, but not all. It is useful to keep the rpy2 package in mind for reuse or double checking.
In general, the Pearson cross-correlation is not robust, and it makes assumptions about the joint distribution similar to those of linear regression. However, there are slight improvements that could at least reduce the dependence on outliers, such as the percentage bend correlation coefficient (https://link.springer.com/article/10.1007/BF02294395) or the Winsorized correlation (which relies only on the trimmed mean and trimmed variance).
About the k-sample methods mentioned above (Anderson-Darling, Kruskal-Wallis, etc.), the ksamples package has information, but using it requires some knowledge of rank-based tests. Other possibilities are rank correlation measures.
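
To make the outlier sensitivity concrete, here is a minimal sketch comparing the Pearson correlation with a Winsorized correlation and a rank (Spearman) correlation on a toy series with one gross outlier. The helper `winsorized_correlation`, the 10% Winsorizing fraction, and the toy data are assumptions for illustration. The sketch takes the Winsorized correlation to be the Pearson correlation of the separately Winsorized variables, and it does not implement the percentage bend coefficient.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from scipy.stats.mstats import winsorize


def winsorized_correlation(x, y, fraction=0.1):
    """Pearson correlation of x and y after Winsorizing each separately.

    Hypothetical helper; `fraction` is the share of values clipped at
    each tail (an assumed default, not a recommendation).
    """
    x_w = np.asarray(winsorize(np.asarray(x, dtype=float),
                               limits=(fraction, fraction)))
    y_w = np.asarray(winsorize(np.asarray(y, dtype=float),
                               limits=(fraction, fraction)))
    return float(np.corrcoef(x_w, y_w)[0, 1])


# Toy data: a perfect linear relation spoiled by one gross outlier.
x = np.arange(20, dtype=float)
y = x.copy()
y[-1] = -1000.0

pearson_r = float(pearsonr(x, y)[0])
winsorized_r = winsorized_correlation(x, y)
spearman_r = float(spearmanr(x, y)[0])

print(f"Pearson:    {pearson_r:.2f}")
print(f"Winsorized: {winsorized_r:.2f}")
print(f"Spearman:   {spearman_r:.2f}")
```

In this toy case the single outlier is enough to make the Pearson coefficient negative, while the Winsorized and rank-based measures stay positive.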

RCHG on 16 Oct 2018

We will also support R diagnostics in the near future, see https://github.com/ESMValGroup/ESMValTool/pull/631, so there is no need to use rpy2.

bouweandela on 16 Oct 2018

👍1
