import re
import pandas as pd
import regex
df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "1", "2"]})
pattern = r"\d"
df.b.str.match(pattern)
df.b.str.match(re.compile(pattern))
df.b.str.match(regex.compile(pattern))  # raises TypeError
TypeError Traceback (most recent call last)
9 df.b.str.match(pattern)
10 df.b.str.match(re.compile(pattern))
---> 11 df.b.str.match(regex.compile(pattern))
~/.virtualenvs/edgar/lib/python3.6/site-packages/pandas/core/strings.py in match(self, pat, case, flags, na, as_indexer)
2421 def match(self, pat, case=True, flags=0, na=np.nan, as_indexer=None):
2422 result = str_match(self._data, pat, case=case, flags=flags, na=na,
-> 2423 as_indexer=as_indexer)
2424 return self._wrap_result(result)
2425
~/.virtualenvs/edgar/lib/python3.6/site-packages/pandas/core/strings.py in str_match(arr, pat, case, flags, na, as_indexer)
736 flags |= re.IGNORECASE
737
--> 738 regex = re.compile(pat, flags=flags)
739
740 if (as_indexer is False) and (regex.groups > 0):
~/.virtualenvs/edgar/lib/python3.6/re.py in compile(pattern, flags)
231 def compile(pattern, flags=0):
232 "Compile a regular expression pattern, returning a pattern object."
--> 233 return _compile(pattern, flags)
234
235 def purge():
~/.virtualenvs/edgar/lib/python3.6/re.py in _compile(pattern, flags)
298 return pattern
299 if not sre_compile.isstring(pattern):
--> 300 raise TypeError("first argument must be string or compiled pattern")
301 p = sre_compile.compile(pattern, flags)
302 if not (flags & DEBUG):
TypeError: first argument must be string or compiled pattern
A simpler way to demonstrate the problem is:
```python
re.compile(regex.compile(pattern))
```
TypeError Traceback (most recent call last)
<ipython-input-64-38578ab20aeb> in <module>()
----> 1 re.compile(regex.compile(pattern))
~/.virtualenvs/edgar/lib/python3.6/re.py in compile(pattern, flags)
231 def compile(pattern, flags=0):
232 "Compile a regular expression pattern, returning a pattern object."
--> 233 return _compile(pattern, flags)
234
235 def purge():
~/.virtualenvs/edgar/lib/python3.6/re.py in _compile(pattern, flags)
298 return pattern
299 if not sre_compile.isstring(pattern):
--> 300 raise TypeError("first argument must be string or compiled pattern")
301 p = sre_compile.compile(pattern, flags)
302 if not (flags & DEBUG):
TypeError: first argument must be string or compiled pattern
The regex library does not seem to be supported by pandas. I'm not sure whether you want to add support for it, but I had a quick look and it seems relatively straightforward to do (and it would make maintenance easier for projects that have already opted for regex).
I think the required steps are:
1. pandas.core.dtypes.inference.is_re should return True for regex compiled patterns too (assuming that regex is installed, of course).
2. The string methods should check whether the pattern is already compiled before calling re.compile() (as is being done e.g. here):

       if not is_re(pat):
           pat = re.compile(pat, flags)
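The two steps above can be sketched roughly as follows. This is a minimal illustration, not pandas' actual code: `is_compiled_pattern` and `ensure_compiled` are hypothetical names standing in for `is_re` and the guard shown above, and the sketch assumes that treating `regex` as an optional import is acceptable.

```python
import re

try:
    import regex  # optional third-party module; may not be installed
except ImportError:
    regex = None

# Types that count as "already compiled". The regex type is only
# added when the module could be imported.
_pattern_types = [type(re.compile(""))]
if regex is not None:
    _pattern_types.append(type(regex.compile("")))
_PATTERN_TYPES = tuple(_pattern_types)


def is_compiled_pattern(obj):
    """Hypothetical stand-in for is_re: accept re and regex patterns."""
    return isinstance(obj, _PATTERN_TYPES)


def ensure_compiled(pat, flags=0):
    """Compile with re only when pat is not already a compiled pattern."""
    if not is_compiled_pattern(pat):
        pat = re.compile(pat, flags)
    return pat


compiled = ensure_compiled(r"\d")
print(bool(compiled.match("1")))
```

With a guard like this, a pattern compiled by regex is passed through untouched instead of reaching `re.compile()`, which is the call that raises the TypeError in the traceback.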
pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.17.5-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.3
pytest: 3.7.1
pip: 18.0
setuptools: 40.0.0
Cython: 0.28.5
numpy: 1.15.0
scipy: 1.1.0
pyarrow: 0.10.0
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.7.3
bs4: 4.6.1
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Are patterns compiled by regex instances of typing.re.Pattern?
No, they are not.
import re
import typing
import regex
re_pat = re.compile(r"\d")
regex_pat = regex.compile(r"\d")
re_pat.__class__.mro() # [_sre.SRE_Pattern, object]
isinstance(re_pat, typing.Pattern) # True
regex_pat.__class__.mro() # [_regex.Pattern, object]
isinstance(regex_pat, typing.Pattern) # False
Hi @pmav99, any luck with this? Or did you happen to create a workaround for yourself?
@madimov, I think I used vanilla re for pandas, and regex for everything else. Not nice, but there was no feedback and I needed to move on.
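For anyone hitting the same error, one possible stopgap (not the workaround described above, just a sketch) is to hand pandas the pattern's source text rather than the compiled regex object. `pattern_source` is a hypothetical helper. It assumes the pattern only uses syntax that re also understands, and it drops any flags the pattern was compiled with.

```python
import re

try:
    import regex  # optional third-party module; may not be installed
except ImportError:
    regex = None


def pattern_source(pat):
    """Return the pattern text for a str or any compiled pattern object."""
    if isinstance(pat, str):
        return pat
    # Both re and regex compiled patterns expose the source as .pattern
    return pat.pattern


# Fall back to re so the sketch also runs without regex installed.
engine = regex if regex is not None else re
compiled = engine.compile(r"\d")

# Recompiling from the source text is what pandas would do internally
# when it receives a plain string.
recompiled = re.compile(pattern_source(compiled))
print(bool(recompiled.match("1")), bool(recompiled.match("a")))
```

With pandas this would be used as `df.b.str.match(pattern_source(regex_pat))`. Patterns that rely on regex-only features will still fail to compile under re.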
We could probably optionally import regex and append its type to the re types we handle.
@TomAugspurger that would be great! Might you have the time to give it a go?
No, I won't have time.
@TomAugspurger
> Are patterns compiled by regex instances of typing.re.Pattern?

If the answer to this is "no", then that's an upstream bug IMO.
That said, here is my attempt at a fix: https://github.com/pandas-dev/pandas/compare/master...gwerbin:patch-2
I just made the edits here on GitHub, so I haven't actually run any tests yet.
> Might you have the time to give it a go?

Tom didn't have time, but PRs are welcome.
@jbrockmendel did you take a look at my proposed patch? It will probably need a major rebase obviously. Just want to make sure what I did is an acceptable approach before I put more time into it.
That looks roughly correct. You'll need to update some of the CI envs in ci/deps to include regex and skip the test if it isn't present.
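Skipping a test when regex is absent could look something like the sketch below. It uses only the standard library's unittest so that it runs anywhere. In a pytest-based suite, `pytest.importorskip("regex")` is the usual equivalent. The test class and its assertions are placeholders for illustration, not the test from the proposed patch.

```python
import importlib.util
import re
import unittest

# True only when the optional third-party regex package can be imported.
HAS_REGEX = importlib.util.find_spec("regex") is not None


class TestRegexCompiledPattern(unittest.TestCase):
    @unittest.skipUnless(HAS_REGEX, "regex is not installed")
    def test_pattern_is_usable_but_not_an_re_pattern(self):
        import regex  # imported lazily so the module loads without it

        pat = regex.compile(r"\d")
        # The object matches like an re pattern ...
        self.assertIsNotNone(pat.match("1"))
        self.assertIsNone(pat.match("a"))
        # ... but it is a distinct type, which is why re.compile() rejects it.
        self.assertNotIsInstance(pat, type(re.compile("")))


def run_suite():
    """Run the placeholder test case and return the unittest result."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestRegexCompiledPattern)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In an environment without regex the test is reported as skipped rather than failed, so the same file can run in CI environments that do and do not install the package.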
@gwerbin thanks for pinging on this. Yah, that looks a lot less invasive than I expected, seems reasonable.