I'm proposing addition of an additional optional parameter for pct_change calculation to help with negative numbers calculations. The new optional calculation should be calculated as s.diff()/s.shift().abs()
.
s = pd.Series([-1,-2,-1, 1])
res = pd.concat((s, s.pct_change(), s.diff()/s.shift().abs()),1)
res.columns = ['series','current pct_change','proposed alternative']
res
series current pct_change proposed alternative
0 -1 NaN NaN
1 -2 1.0 -1.0
2 -1 -0.5 0.5
3 1 -2.0 2.0
In finance, when going from -1 to -2 one is going from less debt to more debt , hence one would expect negative -100% change not positive +100% change. Say you had -1$ debt, and now you have -2$ debt, your debt increased by -100%. In absolute terms, the value changed by 100%, but the direction is negative not positive. Similarly, when going reducing debt from -2 to -1, pandas produces -50% instead of +50%.
A more complex example is going from -1 (negative) to +1 (positive), one would think of that as a positive change, that is a person went from negative balance to positive balance. However, pandas produces negative -200% instead of +200%.
The issue is that there is no universal approach for calculating pct_change when there are negative numbers in the series. For that reason, pandas should provide an optional parameter for dealing with negative numbers.
To give you more details of the issue, why pandas pct_change calculation produces nonsensical values when negative numbers are in the series, you can take a look at this article:
(https://www.excelcampus.com/functions/percentage-change-formula-negative-numbers/)
pd.show_versions()
pandas: 0.23.4
Original post updated to incorporate below comments.
Can you please strip down your example to the bare minimum code that shows the issue? There is a decent amount of extraneous items here that make it more difficult to sift through
Here are two examples only . If we read the below table, row by row, like a standard excel sheet, and if we apply, df.pct_change(), current incorrect formula = new/old -1, we'll get values shown in column "incorrect", which is what pandas is doing at the moment. If we apply proposed changed (new - old)/abs(old), then we will get values shown in the correct column.
old new incorrect correct
-1 -2 1.0 -1.0
-2 1 -1.5 1.5
The code that produces above table
data = [-1,-2,1]
df = pd.DataFrame(data,range(len(data)))
dfc = pd.concat([df.shift(), df, df.pct_change(), (df-df.shift())/df.shift().abs() ],1)
dfc.columns = ['old','new','incorrect','correct']
dfc.dropna()
Here is a simplified example. For future reference the code you've provided is too complex and brings in way too many things that are unrelated to the point you are trying to make, which is confusing for those trying to help
In [14]: data = [-1,-2,1]
...: df = pd.DataFrame(data,range(len(data)))
...:
In [15]: df
Out[15]:
0
0 -1
1 -2
2 1
In [16]: df.pct_change()
Out[16]:
0
0 NaN
1 1.0
2 -1.5
I believe your point is that the percent change values should be NaN
, -1.0
and 1.5
. I'm not sure that's the case however as -2 is in fact a doubling of -1 so a 100% increase there makes sense. Are you seeing a formal definition elsewhere that conflicts with this?
In finance, when going from -1 to -2 one is going from less debt to more debt , hence one would expect negative -100% change not positive +100% change. Say you had -1$ debt, and now you have -2$ debt, your debt increased by -100%. In absolute terms, the value changed by 100%, but the direction is negative not positive.
A more complex example is going from -2 (negative) to +1 (positive), one would think of that as a positive change, that is a person went from negative balance to positive balance. However, pandas produces negative -1.5 or -150%. That is completely counter intuitive.
The issue is that there is no universal approach for calculating pct_change when there are negative numbers in the series. For that reason, pandas should provide an optional parameter for dealing with negative numbers.
To give you more details of the issue, why pandas pct_change calculation produces nonsensical numbers, you can take a look at this article that goes in more details:
(https://www.excelcampus.com/functions/percentage-change-formula-negative-numbers/)
The issue is that there is no universal approach for calculating pct_change when there are negative numbers in the series. For that reason, pandas should provide an optional parameter for dealing with negative numbers.
Agreed on the first point but not on the second. I don't see a keyword providing much value here over a user using abs
as desired to get the calculation they want. Closing as a result
probably a bit late to the game :)
But something like :
import numpy as np
df[column].pct_change()*np.sign(df[column].shift(periods=1))
should do the job
Most helpful comment
probably a bit late to the game :)
But something like :
import numpy as np
df[column].pct_change()*np.sign(df[column].shift(periods=1))
should do the job