Pandas: DataFrame.iat will create new column if .iat is used to set None on int Series

Created on 19 Oct 2018 · 5 Comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

>>> df = pd.DataFrame({'a':[0,1],'b':[4,5]})
>>> df
   a  b
0  0  4
1  1  5
>>> df.iat[0, 0] = None
>>> df
   a  b   0
0  0  4 NaN
1  1  5 NaN

Problem description

This is problematic for multiple reasons:

Inconsistency between iloc and iat.
Creation of a brand new column is almost surely not the intended or expected behavior.
At the very least, I would expect it to simply bail on the operation with a warning about incompatible types.

This is likely related to the non-intuitive behavior of Series which has already been documented here:
https://github.com/pandas-dev/pandas/issues/20643#issuecomment-431244590

Expected Output

I would expect it to do what it does when using .iloc:

>>> df = pd.DataFrame({'a':[0,1],'b':[4,5]})
>>> df
   a  b
0  0  4
1  1  5
>>> df.iloc[0, 0] = None
>>> df
     a  b
0  NaN  4
1  1.0  5
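
One way to sidestep the dtype mismatch, sketched here as an assumption and not as a fix proposed in this thread, is to cast the integer column to float before the scalar assignment, so that NaN is a valid value for the column. The helper name `set_missing_by_position` is hypothetical and exists only for illustration.

```python
import numpy as np
import pandas as pd


def set_missing_by_position(df, row, col):
    """Hypothetical helper: mark one cell as missing using positional indices."""
    out = df.copy()
    label = out.columns[col]
    # Cast the target column to float first so that NaN fits its dtype.
    out[label] = out[label].astype("float64")
    # With a float column, the positional scalar setter has no dtype conflict.
    out.iat[row, col] = np.nan
    return out


df = pd.DataFrame({"a": [0, 1], "b": [4, 5]})
result = set_missing_by_position(df, 0, 0)
```

The sketch assumes unique column labels and leaves the original frame untouched, since it works on a copy.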

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Bug Indexing

Source

jimmywan

All 5 comments

I tried it with 0.23.4 and had the same output.

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None

jimmywan on 19 Oct 2018

Thanks for the report. Investigations and PRs welcome!

mroeschke on 20 Oct 2018

I think we can conclude that, since df.iat is capable of creating and setting values, it will create an index entry for 0, which is not present in the column index (a, b).
We cannot access those values using iat if the column labels are of char type.

aditya0811 on 22 Oct 2018

👎1

iat indexes by integer position and not label, so it shouldn't matter if 0 is not in the columns; it should modify the value in row 0 column A

IIRC iat and at don't perform as many data validation checks, so this may be a "fallback" assignment and broadcasting.

mroeschke on 22 Oct 2018

👍2
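
To illustrate the position-versus-label distinction made in the comment above, here is a minimal sketch that reads scalars with `.iat` (integer positions) and `.at` (labels) on the frame from the report. It only reads values, so it does not depend on how a given pandas version handles the assignment.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1], "b": [4, 5]})

# .iat takes integer positions: row position 0, column position 0.
by_position = df.iat[0, 0]

# .at takes labels: row label 0, column label "a".
by_label = df.at[0, "a"]

# Column position 1 is the column labeled "b", whatever the labels are.
second_column = df.iat[1, 1]
```

Both accessors reach the same cell here; the string column labels do not prevent positional access through `.iat`.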

I do not agree with @aditya0811:

I think we can conclude that as df.iat is capable of creating and setting values, it will create index for 0 which is not present as column index (a, b).

As explained by @mroeschke here:

iat indexes by integer position and not label, so it shouldn't matter if 0 is not in the columns; it should modify the value in row 0 column A

jimmywan on 23 Oct 2018
