Pandas: QST: is the new behavior of df.apply(my_func, axis=1) in v1.1.0 intended?

Created on 31 Jul 2020 · 2 Comments · Source: pandas-dev/pandas

  • [x] I have searched the [pandas] tag on StackOverflow for similar questions.

  • [x] I have asked my usage related question on StackOverflow.


Question about pandas

import pandas as pd
def test_func(row):
    row['c'] = str(row['a']) + str(row['b'])
    row['d'] = row['a'] + 1
    return row

df = pd.DataFrame({'a': [1,2,3], 'b': ['i','j', 'k']})
df.apply(test_func, axis=1)

The above code, run on pandas 1.1.0, returns:

   a  b   c  d
0  1  i  1i  2
1  1  i  1i  2
2  1  i  1i  2

While in pandas 1.0.5 it returns:

   a   b    c  d
0  1   i   1i  2
1  2   j   2j  3
2  3   k   3k  4

Using python 3.8.3 and IPython 7.16.1.

The Question:

What is the right way of getting the v1.0.5 behavior in v1.1.0?

I did see this release note, but I honestly can't tell whether this is an intended or unintended side effect of it: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.1.0.html#apply-and-applymap-on-dataframe-evaluates-first-row-column-only-once

thanks

Duplicate

Most helpful comment

As a general rule, one should not mutate a container while iterating over it. Copying the row before modifying it avoids the problem:

def test_func(row):
    row = row.copy()
    row['c'] = str(row['a']) + str(row['b'])
    row['d'] = row['a'] + 1
    return row

gives

   a  b   c  d
0  1  i  1i  2
1  2  j  2j  3
2  3  k  3k  4
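Another way to follow the "don't mutate" advice, as a minimal sketch only: have the function return just the new values and join them back onto the frame, so the row handed over by `apply` is never touched. The helper name `new_cols` is made up for illustration and is not part of pandas.

```python
import pandas as pd

def new_cols(row):
    # Hypothetical helper: build a fresh Series instead of writing into `row`.
    return pd.Series({'c': str(row['a']) + str(row['b']), 'd': row['a'] + 1})

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['i', 'j', 'k']})

# apply() collects the returned Series into a frame with columns 'c' and 'd';
# join() then attaches them to the original columns by index.
result = df.join(df.apply(new_cols, axis=1))
```

This is still a row-by-row loop, so it is no faster than the copy-based version; it only removes the need to remember the `.copy()`.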

Of course, the vectorized version of this will be much faster:

````
%%timeit

df['c'] = df['a'].astype(str) + df['b']
df['d'] = df['a'] + 1
````

gives 564 µs ± 5.97 µs per loop whereas your version is 5.34 ms ± 16.9 µs per loop.
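If you would rather not modify `df` in place, the same vectorized expressions can be passed to `DataFrame.assign`, which returns a new frame. A rough sketch, reusing the example data from the question (the variable name `vectorized` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['i', 'j', 'k']})

# assign() returns a new DataFrame; the original df keeps only 'a' and 'b'.
vectorized = df.assign(
    c=df['a'].astype(str) + df['b'],  # element-wise string concatenation
    d=df['a'] + 1,                    # element-wise integer addition
)
```

No timing is claimed for this variant; it uses the same column-wise operations as the snippet above, just without the in-place assignment.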

All 2 comments

Thanks @manihamidi for the report. Same issue as #35462 so closing as duplicate.
