In[0]:
store = pd.HDFStore("d:/org.h5")
a = store.get(table["table_name"])
a.index
​
Out[0]:
Index(['{7D23701A-991B-11E7-977E-448A5B7647CF}',
'{7D23701E-991B-11E7-977E-448A5B7647CF}',
'{7D237020-991B-11E7-977E-448A5B7647CF}',
...
'{9E1257B7-99E4-11E7-B36E-448A5B7647CF}',
'{9E1257B9-99E4-11E7-B36E-448A5B7647CF}',
'{9E1257CB-99E4-11E7-B36E-448A5B7647CF}'],
dtype='object', name='object_id', length=6710)
In [1]:
a[a['trade_dt'] == '2017-09-14'].index
Out[1]:
Index(['{7D23701A-991B-11E7-977E-448A5B7647CF}',
'{7D23701E-991B-11E7-977E-448A5B7647CF}',
'{7D237020-991B-11E7-977E-448A5B7647CF}',
...
'{7D236E8C-991B-11E7-977E-448A5B7647CF}',
'{7D236E9A-991B-11E7-977E-448A5B7647CF}',
'{7D236EA0-991B-11E7-977E-448A5B7647CF}'],
dtype='object', name='object_id', length=3353)
In [2]:
store.select(table["table_name"], where="index in %s" % list(a[a['trade_dt'] == '2017-09-14'].index))
Out[2]:
(3353, 22)
In [3]:
store.remove(table["table_name"], where="index in %s" % list(a[a['trade_dt'] == '2017-09-14'].index))
Out[3]:
6710
The “where" condition in HDFStore.select returns selected 3353 rows. I supposed using same condition with "where" in HDFStore.remove should do the same which is remove this 3353 rows. But it remove 6710 rows which is all lines in this table.
I tested with only one single row in the "index in [list]" where condition. It removes one row which is I expected.
In [4]:
[a[a['trade_dt'] == '2017-09-14'].index[0]]
Out[4]:
['{7D23701A-991B-11E7-977E-448A5B7647CF}']
In [5]:
store.select(table["table_name"], where="index in %s" % [a[a['trade_dt'] == '2017-09-14'].index[0]]).shape
Out[5]:
(1, 22)
In [6]:
store.remove(table["table_name"], where="index in %s" % [a[a['trade_dt'] == '2017-09-14'].index[0]])
Out[6]:
1
I tested with equivalent condition ops such as "=" returns the same
pd.show_versions()'0.20.3'
So you would have to show a reproducible example; the following works.
In [1]: df = pd.DataFrame({'A': np.arange(10), 'B': pd.date_range('20130101',periods=10)}).set_index('B')
...: store = pd.HDFStore('test.h5')
...: store.append('df', df)
...: assert len(store.select('df')) == 10
...: store.remove('df', where='index in ["20130102", "20130103"]')
...: assert len(store.select('df')) == 8
...:
In [2]: store.select('df')
Out[2]:
A
B
2013-01-01 0
2013-01-04 3
2013-01-05 4
2013-01-06 5
2013-01-07 6
2013-01-08 7
2013-01-09 8
2013-01-10 9
@jreback
I did a slight change to your test code then it reproduces now.
In [1]:
df = pd.DataFrame({'A': np.arange(1000), 'B': pd.date_range('20130101',periods=1000)}).set_index('B')
select = list(df.reset_index().B.sample(100))
store = pd.HDFStore('test.h5')
store.put('df', df, format='table')
assert len(store.select('df')) == 1000
print('select len %d' % len(store.select('df', where='index in %s' % select)))
store.remove('df', where='index in %s' % select)
print('after remove %d' % len(store.select('df')))
assert len(store.select('df')) == 900
select len 100
after remove 0
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-136-cf9062a4d575> in <module>()
7 store.remove('df', where='index in %s' % select)
8 print('after remove %d' % len(store.select('df')))
----> 9 assert len(store.select('df')) == 900
AssertionError:
In [2]:
store.select('df')
Out[2]:
A
B
ok, would appreciate if you can debug. there are not very many test cases for this, so certainly can add this one and fix.
Tangentially related to #15937. The remove method uses numexpr so selecting as shown above would work as long as <= 31 elements. A warning pointing to http://pandas.pydata.org/pandas-docs/stable/io.html#selecting-using-a-where-mask seems reasonable.
Swapping the commented lines here will return an empty dataframe.
```python
In [2]:
df = pd.DataFrame({'A': np.arange(1000),
'B': pd.date_range('20130101', periods=1000)}
).set_index('B')
select = list(df.reset_index().B.sample(32))
store = pd.HDFStore('temp.h5')
store.put('df', df, format='table')
where = df.index.isin(select)
store.select('df', where=where)
store.remove('df', where=where)
Out[2]:
32
In [3]:
len(store.select('df'))
Out[3]:
968
Most helpful comment
Tangentially related to #15937. The
removemethod usesnumexprso selecting as shown above would work as long as <= 31 elements. A warning pointing to http://pandas.pydata.org/pandas-docs/stable/io.html#selecting-using-a-where-mask seems reasonable.Swapping the commented lines here will return an empty dataframe.
```python
In [2]:
df = pd.DataFrame({'A': np.arange(1000),
'B': pd.date_range('20130101', periods=1000)}
).set_index('B')
select = list(df.reset_index().B.sample(32))
store = pd.HDFStore('temp.h5')
store.put('df', df, format='table')
where = df.index.isin(select)
store.select('df', where=where)
store.remove('df', where='index in %s' % select)
store.remove('df', where=where)
Out[2]:
32
In [3]:
len(store.select('df'))
Out[3]:
968