import pandas as pd
data = [
    {'id': 1, 'content': [{'values': 3}]},
    {'id': 2, 'content': u'whats going on'},
    {'id': 3, 'content': u'whaaaaaaaaat'},
    {'id': 4, 'content': [{'values': 4}]}
]

if __name__ == '__main__':
    df = pd.DataFrame.from_dict(data)

    # First filter: works
    v = [u'whats going on', u'whaaaaaat']
    print df[df.content.isin(v)]

    # Second filter: raises TypeError
    v = [u'whats going on', u'what']
    print df[df.content.isin(v)]
The first print statement executes successfully and filters to the single row 'id': 2, 'content': u'whats going on'. The second filter throws an error, even though the only difference is the length of one of the elements in the list v.
Output for the code snippet above:
content id
1 whats going on 2
/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/indexes/range.py:473: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
return max(0, -(-(self._stop - self._start) // self._step))
Traceback (most recent call last):
File "test_pandas.py", line 15, in <module>
print df[df.content.isin(v)]
File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 2804, in isin
return self._constructor(result, index=self.index).__finalize__(self)
File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 264, in __init__
raise_cast_failure=True)
File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/series.py", line 3269, in _sanitize_array
if len(subarr) != len(index) and len(subarr) == 1:
File "/home/attila/digital/env/local/lib/python2.7/site-packages/pandas/core/indexes/range.py", line 473, in __len__
return max(0, -(-(self._stop - self._start) // self._step))
TypeError: unhashable type: 'list'
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None
pandas: 0.22.0
pytest: 2.9.2
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.14.2
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
I have a different output:
In [7]: df.content.isin(v)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
TypeError: unhashable type: 'list'
The above exception was the direct cause of the following exception:
SystemError Traceback (most recent call last)
<ipython-input-7-5a60788e7bc7> in <module>()
----> 1 df.content.isin(v)
~/sandbox/pandas/pandas/core/series.py in isin(self, values)
3576 Name: animal, dtype: bool
3577 """
-> 3578 result = algorithms.isin(self, values)
3579 return self._constructor(result, index=self.index).__finalize__(self)
3580
~/sandbox/pandas/pandas/core/algorithms.py in isin(comps, values)
444 comps = comps.astype(object)
445
--> 446 return f(comps, values)
447
448
~/sandbox/pandas/pandas/core/algorithms.py in <lambda>(x, y)
419
420 # faster for larger cases to use np.in1d
--> 421 f = lambda x, y: htable.ismember_object(x, values)
422
423 # GH16012
~/sandbox/pandas/pandas/_libs/hashtable_func_helper.pxi in pandas._libs.hashtable.ismember_object()
470
471 kh_destroy_pymap(table)
--> 472 return result.view(np.bool_)
473
474
SystemError: <built-in method view of numpy.ndarray object at 0x1078a93f0> returned a result with an error set
In general, nested data like this isn't well supported at the moment. The upcoming 0.23 release lays some groundwork to better support it, but that will take some time.
I have a similar issue: with a single value in sdf.id.values the following error occurs, but with two or more values there is no error.
(Pdb) df.isin(sdf.id.values)
* SystemError:
Still not working in pandas version '0.24.2'. I am getting the same error as @TomAugspurger using Python 3.7.3. It worked perfectly in Python 2.7.15.
Any idea to sort this out?
I don't think anyone has investigated deeply. Could you @javi-clear-image-ai?
I did (a bit), but without much luck. I ended up moving from pandas to numpy (df.values) and working with the numpy array. It worked for me, so that is the workaround I would suggest for the moment.
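For anyone who wants to try that workaround, here is a minimal sketch of one way it could look. It assumes Python 3 and reuses the data from the original report; building the mask with a plain comprehension is my own choice, not necessarily what was done above. Only str cells are ever tested for membership, so the list cells are never hashed.

```python
import numpy as np
import pandas as pd

data = [
    {'id': 1, 'content': [{'values': 3}]},
    {'id': 2, 'content': 'whats going on'},
    {'id': 3, 'content': 'whaaaaaaaaat'},
    {'id': 4, 'content': [{'values': 4}]},
]
df = pd.DataFrame(data)
v = ['whats going on', 'what']

# Drop down to the underlying object array instead of calling Series.isin.
content = df['content'].values
wanted = set(v)

# Test membership only for str cells; list cells short-circuit to False
# and are therefore never hashed.
mask = np.array([isinstance(x, str) and x in wanted for x in content], dtype=bool)

result = df[mask]
print(result['id'].tolist())  # [2]
```

The boolean array can be used to index the DataFrame directly, so the rest of the code can stay in pandas.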
Simpler test case:
pd.Series([0, [1, 2]]).isin(['a', 'b'])
(so unrelated to indexing, or DataFrame).
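To check whether a column is affected before calling isin, something like the following sketch might help. The is_hashable helper is a hypothetical name introduced here, not a pandas function; it simply tries hash() on each cell and reports the index labels where that fails.

```python
import pandas as pd

s = pd.Series([0, [1, 2]])


def is_hashable(x):
    """Hypothetical helper: True if hash(x) succeeds, False otherwise."""
    try:
        hash(x)
    except TypeError:
        return False
    return True


# Index labels of the cells that cannot be hashed (here the list [1, 2]).
unhashable_rows = [label for label, x in s.items() if not is_hashable(x)]
print(unhashable_rows)  # [1]
```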
The problem in my case was that my column was stored as a generic object in pandas instead of as str, i.e. my data was an array and I was using a list to perform the comparison directly.
I just cast both sides to str and compare them one by one:
# Iterate over the words in a column
for textract_value in textract_keywords:
    textract_value = str(textract_value).lower()
    for handwerkskammer in handwerkskammer_name:
        handwerkskammer = str(handwerkskammer).lower()
        if textract_value == handwerkskammer:
            print(f'Contains: {handwerkskammer}')
This solved my problem.
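The same idea can probably be written without the nested loop. This is only a sketch: the sample values below are made up for illustration, and it assumes textract_keywords is a Series and handwerkskammer_name a list. Casting the column to str first means isin only ever sees hashable strings.

```python
import pandas as pd

# Made-up sample data; the real values are not shown in the thread.
textract_keywords = pd.Series(['Handwerkskammer Berlin', ['nested', 'list'], 42])
handwerkskammer_name = ['handwerkskammer berlin', 'handwerkskammer muenchen']

# Cast every cell to str and lower-case it, mirroring str(value).lower().
normalized = textract_keywords.astype(str).str.lower()
wanted = [str(name).lower() for name in handwerkskammer_name]

# Every cell is now a plain string, so isin has nothing unhashable to handle.
mask = normalized.isin(wanted)
print(textract_keywords[mask].tolist())  # ['Handwerkskammer Berlin']
```

Note that the list cell is compared by its string representation, so it only matches if a name happens to equal that representation exactly.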