So that we do not forget the discussion in #596: this code
https://github.com/ESMValGroup/ESMValTool/blob/60a89f7828025c599615bcb5932b1917a40fb333/esmvaltool/diag_scripts/examples/correlate.py#L48
should probably be updated so that it uses either:
scipy.stats.mstats.ks_twosamp
scipy.stats.ks_2samp
or this:
scipy.stats.anderson_ksamp
as some people seem a bit critical of the KS test.
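For reference, a minimal sketch of how the two-sample KS test and the Anderson-Darling k-sample test could be run side by side on the same data. The helper name `compare_two_samples` and the synthetic samples are made up for illustration and are not taken from `correlate.py`.

```python
import warnings

import numpy as np
from scipy import stats


def compare_two_samples(a, b):
    """Run the KS and Anderson-Darling tests on two 1-D samples."""
    ks = stats.ks_2samp(a, b)
    with warnings.catch_warnings():
        # anderson_ksamp warns when its p-value is capped or floored.
        warnings.simplefilter("ignore")
        ad = stats.anderson_ksamp([a, b])
    return {
        "ks_statistic": ks.statistic,
        "ks_pvalue": ks.pvalue,
        "ad_statistic": ad.statistic,
        # Critical values start at the 25%, 10% and 5% significance levels.
        "ad_critical_values": ad.critical_values,
    }


# Hypothetical data: same spread, shifted mean.
rng = np.random.default_rng(0)
sample_a = rng.normal(0.0, 1.0, 500)
sample_b = rng.normal(0.5, 1.0, 500)
result = compare_two_samples(sample_a, sample_b)
print(result)
```

Both tests should reject the null hypothesis for samples this far apart; the interesting cases for a diagnostic are those where the two tests disagree, for example when the samples differ mainly in the tails.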
I have assigned myself to this one. My intention is to start looking into developing a serious statistical module for ESMValTool, and this is a good starting point.
Regarding this issue/enhancement, here is some information that I hope will be useful.
It might be interesting to check the _R-Forge libraries_, for instance those related to the Wilcox robust statistics functions (https://rdrr.io/rforge/WRS/man/) or those in robustbase (https://rdrr.io/rforge/robustbase/man/). Some of them are already implemented in scipy, but not all. It is useful to keep the rpy2 package in mind for reuse or double-checking.
In general, the Pearson cross-correlation is not robust and makes assumptions about the joint distribution similar to those of linear regression. However, there are slight improvements that could solve at least the sensitivity to outliers, such as the percentage bend correlation coefficient (https://link.springer.com/article/10.1007/BF02294395) or the Winsorized correlation (which relies only on the trimmed mean and trimmed variance).
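To illustrate the outlier problem, here is a rough sketch of a Winsorized correlation built from `scipy.stats.mstats.winsorize` and `scipy.stats.pearsonr`. The function name `winsorized_corr`, the 10% limits and the synthetic data are assumptions for illustration; this is one common formulation (clip each variable, then take Pearson), not necessarily the exact estimator in the WRS functions.

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats


def winsorized_corr(x, y, limit=0.1):
    """Pearson correlation of x and y after Winsorizing each one.

    `limit` is the fraction clipped at each tail (assumed 10% here).
    """
    xw = np.asarray(mstats.winsorize(np.asarray(x, dtype=float),
                                     limits=(limit, limit)))
    yw = np.asarray(mstats.winsorize(np.asarray(y, dtype=float),
                                     limits=(limit, limit)))
    return stats.pearsonr(xw, yw)[0]


# Hypothetical data: a near-perfect linear relation with one gross outlier.
rng = np.random.default_rng(0)
x = np.arange(50.0)
y = x + rng.normal(0.0, 0.5, x.size)
y[-1] = -1000.0

pearson_r = stats.pearsonr(x, y)[0]
robust_r = winsorized_corr(x, y)
print(pearson_r, robust_r)
```

A single bad value is enough to flip the sign of the plain Pearson coefficient here, while the Winsorized version stays close to the correlation of the clean data.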
As for the k-sample methods mentioned above (Anderson-Darling, Kruskal-Wallis, etc.), the ksamples package has information, but using it requires some knowledge of rank-based tests. Other possibilities are rank correlation measures.
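On the Python side, scipy already covers some of these rank-based options. A small sketch, with synthetic data chosen only for illustration, using `scipy.stats.spearmanr`, `scipy.stats.kendalltau` and `scipy.stats.kruskal`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: a monotonic but strongly nonlinear relation.
x = rng.normal(size=200)
y = np.exp(x)

pearson_r = stats.pearsonr(x, y)[0]      # penalised by the nonlinearity
spearman_rho = stats.spearmanr(x, y)[0]  # depends only on the ranks
kendall_tau = stats.kendalltau(x, y)[0]  # depends only on pair ordering

# Kruskal-Wallis: rank-based test that k samples share one distribution.
groups = [rng.normal(loc, 1.0, 100) for loc in (0.0, 0.0, 1.0)]
kw = stats.kruskal(*groups)

print(pearson_r, spearman_rho, kendall_tau, kw.pvalue)
```

Because the rank measures ignore the actual magnitudes, they report a perfect monotonic association here, whereas Pearson does not.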
We will also support R diagnostics in the near future (see https://github.com/ESMValGroup/ESMValTool/pull/631), so there is no need to use rpy2.