I'm trying to compute the distance correlation between columns; see the code below. Most of the time it returns a value greater than 1, which should not be possible, because distance correlation lies between 0 and 1. The returned value should be around 0.5.
import numpy as np
from scipy.spatial import distance
x = np.random.uniform(-1, 1, 10000)
print(distance.correlation(x, x**2))
What is wrong here?
Are you sure the function is supposed to compute distance correlation?
The documentation says it computes correlation distance, which may be a different thing; cf. e.g. https://reference.wolfram.com/language/ref/CorrelationDistance.html
Closing: the function appears to compute correlation distance, and this quantity probably has little relation to distance correlation. Please reopen if doubts remain.
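A quick way to check this reading of the docstring is to compare the function's output with one minus the Pearson correlation coefficient. The snippet below is only a sketch: the fixed seed and the variable names are choices made for illustration, not part of the original report.

```python
import numpy as np
from scipy.spatial import distance

# Same setup as in the report, with a fixed seed (an illustrative choice).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10000)
y = x**2

# Pearson correlation coefficient between x and y.
pearson_r = np.corrcoef(x, y)[0, 1]

# Correlation distance as computed by SciPy.
corr_dist = distance.correlation(x, y)

# If the function computes 1 - r, these two numbers agree up to rounding.
print(corr_dist, 1 - pearson_r)
```

Since x and x**2 are nearly uncorrelated in the linear sense for a symmetric x, the result hovers around 1 and lands above 1 whenever the sample correlation happens to be slightly negative, which would explain the values seen in the report.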
scipy.spatial.distance.correlation does not compute distance correlation, which is a nonlinear measure of dependence.
See e.g. my take: http://jpktd.blogspot.ca/2012/06/non-linear-dependence-measures-distance.html
However, "correlation" in scipy.spatial.distance.correlation is a bit misleading because according to the formula in the docstring it's a distance measure and not a correlation.
Two perfectly correlated vectors, with correlation coefficient equal to 1, have zero distance.
Two perfectly negatively correlated vectors, with correlation coefficient equal to -1, have the maximal distance of 2.
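For comparison, here is a minimal NumPy sketch of the sample distance correlation for two 1-D samples, following the usual double-centering definition. The function name `distance_correlation` is hypothetical (it is not a SciPy API), and the sample size is reduced to 1000 because this naive version builds full n-by-n distance matrices.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D arrays (naive O(n^2) sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")

    # Pairwise absolute distances within each sample.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])

    # Double centering: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    # Squared sample distance covariance and distance variances.
    dcov2_xy = (A * B).mean()
    dvar2_x = (A * A).mean()
    dvar2_y = (B * B).mean()

    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0  # at least one sample is constant
    # Clip tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))

# Smaller sample than in the report to keep the n-by-n matrices small.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
print(distance_correlation(x, x**2))
```

Unlike the correlation distance, this quantity stays within [0, 1] and is clearly above zero for x and x**2, even though their Pearson correlation is close to zero.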