I don't know if this is a doc bug, a code bug, or an expectation error on my part.
The documentation of itertuples says this function will "Iterate over DataFrame rows as namedtuples", but the resulting items do not behave the way I expected namedtuples to: they cannot be indexed by name.
import pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
                  index=['a', 'b'])
for row in df.itertuples():
    print(row)
    print(row['col1'])
Actual output:
$ python pandasbug.py
Pandas(Index='a', col1=1, col2=0.10000000000000001)
Traceback (most recent call last):
File "pandasbug.py", line 8, in <module>
print(row['col1'])
TypeError: tuple indices must be integers, not str
Expected output:
$ python pandasbug.py
Pandas(Index='a', col1=1, col2=0.10000000000000001)
1
Pandas(Index='b', col1=2, col2=0.20000000000000001)
2
Output of pd.show_versions():
$ python -c 'import pandas as pd; pd.show_versions()'
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.0
nose: None
pip: 8.1.1
setuptools: 2.2
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: None
boto: None
I don't think namedtuples can be indexed by name?
In [38]: namedtuple('y', list('abc'))(1,2,3)['a']
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-b6fb7ee08b27> in <module>()
----> 1 namedtuple('y', list('abc'))(1,2,3)['a']
TypeError: tuple indices must be integers or slices, not str
Namedtuples restrict field names to valid identifiers (e.g. no spaces). They support attribute access and integer indexing, but not getitem access by name.
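To make that concrete, here is a minimal sketch using only the standard library. The Row and Renamed types are hypothetical stand-ins for what itertuples yields, not pandas' own classes. It also shows rename=True, which swaps invalid field names for positional ones; the pandas docs describe the same renaming for column names that are not valid identifiers.

```python
from collections import namedtuple

# Hypothetical row type, shaped like the rows printed in the report above.
Row = namedtuple("Row", ["Index", "col1", "col2"])
row = Row("a", 1, 0.1)

by_attribute = row.col1   # attribute access works
by_position = row[1]      # integer indexing works, as for any tuple

try:
    row["col1"]           # string keys are not supported
except TypeError as exc:
    error_message = str(exc)

# Field names must be valid identifiers. With rename=True, invalid names
# are replaced by positional names instead of raising ValueError.
Renamed = namedtuple("Renamed", ["Index", "col 1", "col2"], rename=True)
renamed_fields = Renamed._fields
```

So a column called "col 1" would have to be read through its positional name, or by integer position, rather than by its original label.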
Ah, so an expectation error on my part. I should have been using print(row.col1), which works fine.
You can simply use this instead to get around the lack of getitem access with namedtuples:
getattr(row, "col1")
I think that's equivalent to row.col1, but it lets you use a variable for the field name, just like row["col1"] would.
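A short sketch of that workaround, reusing the DataFrame from the original report. The variable names (column, values_getattr, values_asdict) are just for illustration. It also shows _asdict(), a standard namedtuple method, as another way to get key-style lookup, assuming the column names are valid identifiers so they survive as field names.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [0.1, 0.2]}, index=["a", "b"])

column = "col1"  # column name held in a variable

values_getattr = []
values_asdict = []
for row in df.itertuples():
    # getattr looks the field up by name, like row.col1
    values_getattr.append(getattr(row, column))
    # _asdict() converts the namedtuple to a mapping keyed by field name
    values_asdict.append(row._asdict()[column])
```

getattr avoids building a dict for every row, so it is probably the lighter option when only one or two fields are needed.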