Would be nice to have the query method support partial string matching, so you could do the equivalent of df[df['A'].str.contains("abc")] using query: df.query("A contains 'abc'").
Sure, pull requests welcome!
Would it support regex? Or be more like the standard python in operator?
Because I was thinking, similar to
In [13]: df = pd.DataFrame({'a': ['abcde', 'fghij']})
In [14]: 'a' in 'abcd'
Out[14]: True
you could also do:
df.query("'a' in a")
However, I don't know to what extent this would conflict with the current use of in inside query.
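For reference, a small sketch of how in currently behaves inside query, as far as I understand it: it acts as a membership test along the lines of isin, not as a substring test. The frame is the one from the example above; the wanted list is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['abcde', 'fghij']})
wanted = ['abcde', 'zzz']  # hypothetical list of values to keep

# Inside query, `in` checks whether each value of column a is an element of the list
by_query = df.query("a in @wanted")

# The same selection written with isin
by_isin = df[df['a'].isin(wanted)]

print(by_query)
```

So a substring meaning for in would overlap with this existing membership meaning, which may be one reason a separate keyword such as contains was suggested.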
Maybe I'm missing something, but seems this is still an open issue. I ended up here via a lot of googling. I'm having the same sort of challenge. I've tried in both pandas 0.18.1 and 0.20.1.
I would love to be able to do df.query("A contains 'abc'") as @johanekholm suggested. It's understood that this would be a slower operation than a simpler condition such as == or != but I don't see any downside to having the option.
@rea725 this is an open issue as the tag indicates
if you want this implemented the quickest route would be a pull request to do so
It looks like I found a solution, by reading the pandas documentation. I get the behavior I seek by passing engine='python'. The explanation totally makes sense, including the recommendation to avoid doing so unless you really need to, since this would be slow compared to the default option. I'm not sure any additional action is merited.
More specifically, I had to do df.query('A.str.contains("abc")', engine='python'), which is maybe not quite as elegant as df.query("A contains 'abc'"), but it is good enough for my purposes.
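A minimal sketch of that workaround next to the boolean-mask form from the original request. The sample data is invented for illustration; only the column name A and the search string come from this thread.

```python
import pandas as pd

df = pd.DataFrame({'A': ['abcde', 'fghij', 'xabc']})  # made-up sample data

# Boolean-mask form from the original request
by_mask = df[df['A'].str.contains("abc")]

# Same filter through query, evaluated with the python engine
by_query = df.query('A.str.contains("abc")', engine='python')

print(by_query)
```

Both forms should select the same rows; the query version trades speed for being usable inside a query string.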
Not sure whether I need to open a new issue. If needed, will do.
So I've been using the Series string methods to do some comparisons with an input string. I'm using
series.str.contains(word,case=False)
and creating a new dataframe with the results. What I've observed is that if there is a plus sign (+) in the word I supply for the search, the method returns zero results.
Below is a snippet
import pandas as pd
datadf = pd.DataFrame({'description': ["i am good boy", "i am a bad boy", "i am an ugly boy", "i am a + boy"]})
print(datadf)
word = "i am a + boy"
# keep only the rows whose description contains word (case-insensitive)
resultdf = datadf[datadf['description'].str.contains(word, case=False)]
# prints 0, although the last row is exactly the search string
print(len(resultdf.index))
The issue is not only with +, but also with *
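A likely explanation, offered as a sketch and not as a confirmed diagnosis from this thread: str.contains treats the search string as a regular expression by default, so + and * act as quantifiers instead of literal characters. Passing regex=False, or escaping the string with re.escape, makes the match literal. The data below is the same as in the snippet above.

```python
import re
import pandas as pd

datadf = pd.DataFrame({'description': ["i am good boy", "i am a bad boy",
                                       "i am an ugly boy", "i am a + boy"]})
word = "i am a + boy"

# Default: word is compiled as a regex, so " +" means "one or more spaces"
as_regex = datadf[datadf['description'].str.contains(word, case=False)]

# Literal substring search, no regex interpretation
literal = datadf[datadf['description'].str.contains(word, case=False, regex=False)]

# Alternative: keep regex matching but escape the special characters first
escaped = datadf[datadf['description'].str.contains(re.escape(word), case=False)]

print(len(as_regex.index), len(literal.index), len(escaped.index))
```

regex=False is the simpler choice when the search string is never meant to be a pattern; re.escape is useful when the escaped string has to be combined with other regex pieces.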

Also, does this have anything to do with the note below in the official docs?

Closing. Contributions welcome