Pandas: pct_change calculation should have an additional optional parameter for calculation with negative numbers

Created on 5 Sep 2018  路  6Comments  路  Source: pandas-dev/pandas

Problem description

I'm proposing addition of an additional optional parameter for pct_change calculation to help with negative numbers calculations. The new optional calculation should be calculated as s.diff()/s.shift().abs().

Example code of current and proposed behavior

s = pd.Series([-1,-2,-1, 1])
res = pd.concat((s, s.pct_change(), s.diff()/s.shift().abs()),1)
res.columns = ['series','current pct_change','proposed alternative']
res

    series  current pct_change  proposed alternative
0    -1     NaN               NaN
1    -2         1.0              -1.0
2    -1        -0.5               0.5
3         1            -2.0           2.0

Intuition and Expected Behavior

In finance, when going from -1 to -2 one is going from less debt to more debt , hence one would expect negative -100% change not positive +100% change. Say you had -1$ debt, and now you have -2$ debt, your debt increased by -100%. In absolute terms, the value changed by 100%, but the direction is negative not positive. Similarly, when going reducing debt from -2 to -1, pandas produces -50% instead of +50%.

A more complex example is going from -1 (negative) to +1 (positive), one would think of that as a positive change, that is a person went from negative balance to positive balance. However, pandas produces negative -200% instead of +200%.

The issue is that there is no universal approach for calculating pct_change when there are negative numbers in the series. For that reason, pandas should provide an optional parameter for dealing with negative numbers.

To give you more details of the issue, why pandas pct_change calculation produces nonsensical values when negative numbers are in the series, you can take a look at this article:

(https://www.excelcampus.com/functions/percentage-change-formula-negative-numbers/)

Output of pd.show_versions()

pandas: 0.23.4

Original post updated to incorporate below comments.

Needs Info

Most helpful comment

probably a bit late to the game :)
But something like :
import numpy as np
df[column].pct_change()*np.sign(df[column].shift(periods=1))

should do the job

All 6 comments

Can you please strip down your example to the bare minimum code that shows the issue? There is a decent amount of extraneous items here that make it more difficult to sift through

Here are two examples only . If we read the below table, row by row, like a standard excel sheet, and if we apply, df.pct_change(), current incorrect formula = new/old -1, we'll get values shown in column "incorrect", which is what pandas is doing at the moment. If we apply proposed changed (new - old)/abs(old), then we will get values shown in the correct column.

 old  new     incorrect       correct
 -1    -2          1.0          -1.0
 -2     1         -1.5           1.5

The code that produces above table
data = [-1,-2,1]
df = pd.DataFrame(data,range(len(data)))
dfc = pd.concat([df.shift(), df, df.pct_change(), (df-df.shift())/df.shift().abs() ],1)
dfc.columns = ['old','new','incorrect','correct']
dfc.dropna()

Here is a simplified example. For future reference the code you've provided is too complex and brings in way too many things that are unrelated to the point you are trying to make, which is confusing for those trying to help

In [14]: data = [-1,-2,1]
    ...: df = pd.DataFrame(data,range(len(data)))
    ...: 

In [15]: df
Out[15]: 
   0
0 -1
1 -2
2  1

In [16]: df.pct_change()
Out[16]: 
     0
0  NaN
1  1.0
2 -1.5

I believe your point is that the percent change values should be NaN, -1.0 and 1.5. I'm not sure that's the case however as -2 is in fact a doubling of -1 so a 100% increase there makes sense. Are you seeing a formal definition elsewhere that conflicts with this?

In finance, when going from -1 to -2 one is going from less debt to more debt , hence one would expect negative -100% change not positive +100% change. Say you had -1$ debt, and now you have -2$ debt, your debt increased by -100%. In absolute terms, the value changed by 100%, but the direction is negative not positive.

A more complex example is going from -2 (negative) to +1 (positive), one would think of that as a positive change, that is a person went from negative balance to positive balance. However, pandas produces negative -1.5 or -150%. That is completely counter intuitive.

The issue is that there is no universal approach for calculating pct_change when there are negative numbers in the series. For that reason, pandas should provide an optional parameter for dealing with negative numbers.

To give you more details of the issue, why pandas pct_change calculation produces nonsensical numbers, you can take a look at this article that goes in more details:

(https://www.excelcampus.com/functions/percentage-change-formula-negative-numbers/)

The issue is that there is no universal approach for calculating pct_change when there are negative numbers in the series. For that reason, pandas should provide an optional parameter for dealing with negative numbers.

Agreed on the first point but not on the second. I don't see a keyword providing much value here over a user using abs as desired to get the calculation they want. Closing as a result

probably a bit late to the game :)
But something like :
import numpy as np
df[column].pct_change()*np.sign(df[column].shift(periods=1))

should do the job

Was this page helpful?
0 / 5 - 0 ratings