Pandas: dataframe.replace() does not work on a subset of rows

Created on 11 Mar 2017 · 4 Comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

data = pd.DataFrame( {'ab' : ['A','B','A','A','B'], 'num' : ['01','02','01','01','01']})

a_replacements = { 'num' : { '01' : 'funny', '02' : 'serious' }}
b_replacements = { 'num' : { '01' : 'beginning', '02' : 'end' }}

data[data.ab == 'A'].replace(inplace=True, to_replace=a_replacements)

Problem description

I use the mask (data.ab == 'A') because A and B each have their own two levels. If I were to run data.replace(inplace=True, to_replace=a_replacements) on the whole frame, the num column for rows where ab == 'B' would be encoded as funny or serious instead of beginning or end.

Expected Output

My thought is that the last line of the above code block should result in a data frame that has all the 01's and 02's replaced for all rows that have 'ab' == 'A'. But this does not work. No exception is thrown. Not quite sure what I'm doing wrong.

Output of pd.show_versions()


```
INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Darwin
OS-release: 16.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.0
scipy: None
statsmodels: None
xarray: None
IPython: 5.3.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: None
pandas_datareader: None
```

Indexing Usage Question

All 4 comments

Copy-pastable example:


In [28]: data = pd.DataFrame({"num": ['01', '02']})

In [29]: data[[True, False]].replace({"num": {"01": "funny", "02": "begining"}}, inplace=True)
/Users/tom.augspurger/Envs/py3/lib/python3.6/site-packages/pandas/pandas/core/generic.py:3664: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  regex=regex)

In [30]: data[[True, True]].replace({"num": {"01": "funny", "02": "begining"}}, inplace=True)

In [31]: data
Out[31]:
  num
0  01
1  02

So I think In[29] shows why this isn't working for you. Your slice data[data.ab == 'A'] may be a copy, and so your inplace replace is operating on a copy, not the original, so it looks like it's not working.
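A minimal sketch of the copy behaviour described above, using the data from the original report. It avoids inplace so that it runs without warnings; the name subset is introduced here for illustration only.

```python
import pandas as pd

data = pd.DataFrame({"ab": ["A", "B", "A", "A", "B"],
                     "num": ["01", "02", "01", "01", "01"]})
a_replacements = {"num": {"01": "funny", "02": "serious"}}

# Boolean-mask indexing builds a new DataFrame holding the selected rows.
subset = data[data.ab == "A"]

# replace() returns a modified object; nothing is written back to `data`.
subset = subset.replace(to_replace=a_replacements)

print(subset["num"].tolist())  # replaced values live only in `subset`
print(data["num"].tolist())    # the original frame is unchanged
```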

The potential bug here is why an all-True mask didn't raise the SettingWithCopy warning.

As you can see, you're probably better off not using inplace.

This is not correct usage. As the warning indicates, it might work, but it is not idiomatic.

In [4]: data[data.ab == 'A'].replace(inplace=True, to_replace=a_replacements)
/Users/jreback/pandas/pandas/core/generic.py:3664: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  regex=regex)

Instead, use this pattern. The rhs is aligned to the lhs, which is what pandas does for you by default. (You can also use .loc[data.ab=='A'].replace(...) on the rhs if it's clearer.)

In [14]: data.loc[data.ab=='A'] = data.replace(to_replace=a_replacements)

In [15]: data
Out[15]: 
  ab    num
0  A  funny
1  B     02
2  A  funny
3  A  funny
4  B     01
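The variant with .loc on the rhs, mentioned in passing above, can be sketched as follows. This is an illustration built on the data from the original report, not a statement of the only correct form; the name mask is introduced here for readability.

```python
import pandas as pd

data = pd.DataFrame({"ab": ["A", "B", "A", "A", "B"],
                     "num": ["01", "02", "01", "01", "01"]})
a_replacements = {"num": {"01": "funny", "02": "serious"}}

# Select the rows once, so the same mask is used on both sides.
mask = data.ab == "A"

# The rhs is a new DataFrame with the replacements applied to the masked
# rows only; assigning through .loc writes it back into `data`.
data.loc[mask] = data.loc[mask].replace(to_replace=a_replacements)

print(data)
```

Because only the masked rows are read and written, the rows where ab == 'B' keep their original num values.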

Thanks for the solution. It works well when I'm trying to replace values in num for each case (ab == 'A', ab == 'B').

Let's say I have 26 cases (ab == 'C' ... ab == 'Z') and I use a for loop to iterate through those cases. I then get a TypeError:
TypeError: cannot replace ['a_replacements'] with method pad on a DataFrame

code:

for letter in data.ab.unique():
    data.loc[data.ab == letter] = data.replace(to_replace=letter+"_replacements")

This gives:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-96-acd3197ceef4> in <module>()
      1 for letter in data.ab.unique():
      2     print(letter.lower()+"_replacements")
----> 3     data.loc[data.ab == letter] = data.replace(to_replace=letter.lower()+"_replacements")

/Users/alokshenoy/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/pandas/core/generic.py in replace(self, to_replace, value, inplace, limit, regex, method, axis)
   3427             if isinstance(to_replace, (tuple, list)):
   3428                 return _single_replace(self, to_replace, method, inplace,
-> 3429                                        limit)
   3430 
   3431             if not is_dict_like(to_replace):

/Users/alokshenoy/.pyenv/versions/miniconda3-latest/lib/python3.6/site-packages/pandas/core/generic.py in _single_replace(self, to_replace, method, inplace, limit)
     70     if self.ndim != 1:
     71         raise TypeError('cannot replace {0} with method {1} on a {2}'
---> 72                         .format(to_replace, method, type(self).__name__))
     73 
     74     orig_dtype = self.dtype

TypeError: cannot replace ['a_replacements'] with method pad on a DataFrame

Also tried the rhs = lhs way, and that throws the same error. Curious as to what changes once inside the for loop?

@alokshenoy you should ask on Stack Overflow
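One likely reading of the traceback above: letter.lower() + "_replacements" builds the string 'a_replacements', not a reference to the a_replacements dict, so replace() receives a plain string as to_replace. A sketch of one way around this, assuming the per-letter mappings are collected in a dict keyed by the values of ab (the name replacements_by_letter is hypothetical and not from the thread):

```python
import pandas as pd

data = pd.DataFrame({"ab": ["A", "B", "A", "A", "B"],
                     "num": ["01", "02", "01", "01", "01"]})

# Hypothetical lookup table: one replacement mapping per value of "ab".
replacements_by_letter = {
    "A": {"num": {"01": "funny", "02": "serious"}},
    "B": {"num": {"01": "beginning", "02": "end"}},
}

for letter in data.ab.unique():
    mask = data.ab == letter
    # Look the mapping up by key instead of building a variable name as text.
    mapping = replacements_by_letter[letter]
    data.loc[mask] = data.loc[mask].replace(to_replace=mapping)

print(data)
```

Looking the mapping up by key passes an actual dict to replace(), which is what the working single-case call did.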
