import io
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(1000000, 10), columns=('COL{}'.format(i) for i in range(10)))
csv = io.StringIO(df.to_csv(index=False))
df2 = pd.read_csv(csv)
`pd.read_csv()` using `_libs.parsers.TextReader`'s `read()` method is 3.5X slower on Pandas 0.23.4 on Python 3.7.1 compared to Pandas 0.22.0 on Python 3.5.2.
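For anyone reproducing outside IPython, here is a minimal sketch of how a comparable profile can be gathered with the standard cProfile module (`%prun` in IPython is equivalent; the setup is the same snippet as above):

```python
import cProfile
import io

import numpy as np
import pandas as pd

# Build a 1M-row, 10-column CSV in memory, as in the snippet above.
df = pd.DataFrame(np.random.randn(1000000, 10),
                  columns=('COL{}'.format(i) for i in range(10)))
csv = io.StringIO(df.to_csv(index=False))

# Profile the parse and sort by internal time, like %prun does.
cProfile.run('df2 = pd.read_csv(csv)', sort='tottime')
```

__Python 3.7.1 & Pandas 0.23.4 -- Slow__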
4244 function calls (4210 primitive calls) in 10.273 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 10.202 10.202 10.204 10.204 {method 'read' of 'pandas._libs.parsers.TextReader' objects}
1 0.039 0.039 0.039 0.039 internals.py:5017(_stack_arrays)
1 0.011 0.011 10.262 10.262 parsers.py:414(_read)
1 0.011 0.011 10.273 10.273 <string>:1(<module>)
1 0.004 0.004 0.004 0.004 parsers.py:1685(__init__)
321 0.001 0.000 0.002 0.000 common.py:811(is_integer_dtype)
__Python 3.5.2 & Pandas 0.22.0 -- Fast__
3229 function calls (3222 primitive calls) in 2.944 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 2.881 2.881 2.882 2.882 {method 'read' of 'pandas._libs.parsers.TextReader' objects}
1 0.045 0.045 0.045 0.045 internals.py:4801(_stack_arrays)
1 0.010 0.010 2.944 2.944 parsers.py:423(_read)
1 0.004 0.004 0.004 0.004 parsers.py:1677(__init__)
320 0.001 0.000 0.001 0.000 common.py:777(is_integer_dtype)
1 0.001 0.001 0.001 0.001 {method 'close' of 'pandas._libs.parsers.TextReader' objects}
pd.show_versions() -- Latest Python 3.7.1 Pandas 0.23.4 : Slow Read CSV
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 2008ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 3.9.2
pip: 18.1
setuptools: 40.4.3
Cython: 0.29
numpy: 1.15.3
scipy: 1.1.0
pyarrow: 0.11.0
xarray: 0.10.9
IPython: 7.0.1
sphinx: 1.8.1
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: 1.6.1
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.0
openpyxl: 2.5.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None
pd.show_versions() -- Older Python 3.5.2 Pandas 0.22.0 : Fast Read CSV
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 3.5.0
pip: 9.0.3
setuptools: 20.10.1
Cython: 0.28.1
numpy: 1.14.2
scipy: 1.0.1
pyarrow: 0.9.0
xarray: 0.10.2
IPython: 6.3.0
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml: 4.2.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.6
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
Can you benchmark just the change in python, and just the change in pandas separately?
> Can you benchmark just the change in python, and just the change in pandas separately?
Yes, are there older builds of Pandas on Python 3.7.1? I suppose I can try newer pandas version on old Python.
I think 0.23.2 is the first version of pandas to support 3.7
I ran the test on an older Python 3.5 stack with the latest Pandas version 0.23.4 but with a lot of older versions of other modules, and it looks to be running faster on Python 3.5. Now, I'm not quite sure if it's pandas directly on Python 3.7.1 or one of its dependencies.
Does the parser's `_read()` method rely on some other library that may be the culprit?
%prun df2 = pd.read_csv(csv)
5154 function calls (5041 primitive calls) in 2.004 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.960 1.960 1.962 1.962 {method 'read' of 'pandas._libs.parsers.TextReader' objects}
1 0.030 0.030 0.030 0.030 internals.py:5017(_stack_arrays)
1 0.006 0.006 2.004 2.004 parsers.py:414(_read)
1 0.003 0.003 0.003 0.003 parsers.py:1685(__init__)
321 0.001 0.000 0.001 0.000 common.py:811(is_integer_dtype)
518 0.000 0.000 0.000 0.000 common.py:1835(_get_dtype_type)
Installed Python 3.5 Stack With Latest pandas 0.23.4:
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 3.5.0
pip: 18.0
setuptools: 38.6.0
Cython: 0.28.1
numpy: 1.14.2
scipy: 1.0.1
pyarrow: 0.10.0
xarray: 0.10.2
IPython: 6.3.0
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml: 4.2.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.6
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
Interestingly, if I specify `float_precision='round_trip'` I get similar parsing speeds. If I specify `'high'` or None then it's back to the same 3.5x difference.
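For context -- and this is my reading of the 0.23.x C parser sources, so treat the mapping as an assumption -- `float_precision=None` uses the fast `xstrtod`, `'high'` uses `precise_xstrtod`, and `'round_trip'` calls back into Python's own float parser. A quick way to compare all three outside IPython:

```python
import io
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000000, 10),
                  columns=('COL{}'.format(i) for i in range(10)))
data = df.to_csv(index=False)

for fp in (None, 'high', 'round_trip'):
    # Rebuild the buffer each run, since read_csv consumes it.
    t = timeit.timeit(
        lambda: pd.read_csv(io.StringIO(data), float_precision=fp), number=3)
    print(fp, round(t / 3, 3))
```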
__Python 3.7.1__
%prun df2 = pd.read_csv(csv, float_precision='round_trip')
4320 function calls (4286 primitive calls) in 4.074 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 3.862 3.862 3.864 3.864 {method 'read' of 'pandas._libs.parsers.TextReader' objects}
1 0.168 0.168 4.074 4.074 <string>:1(<module>)
1 0.030 0.030 0.030 0.030 internals.py:5017(_stack_arrays)
1 0.006 0.006 3.906 3.906 parsers.py:414(_read)
1 0.003 0.003 0.003 0.003 parsers.py:1685(__init__)
321 0.001 0.000 0.002 0.000 common.py:811(is_integer_dtype)
516 0.000 0.000 0.001 0.000 common.py:1835(_get_dtype_type)
952 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
__Python 3.5.2__
%prun df2 = pd.read_csv(csv, float_precision='round_trip')
4582 function calls (4545 primitive calls) in 3.716 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 3.665 3.665 3.667 3.667 {method 'read' of 'pandas._libs.parsers.TextReader' objects}
1 0.031 0.031 0.031 0.031 internals.py:5017(_stack_arrays)
1 0.006 0.006 3.710 3.710 parsers.py:414(_read)
1 0.006 0.006 3.716 3.716 <string>:1(<module>)
1 0.003 0.003 0.003 0.003 parsers.py:1685(__init__)
321 0.001 0.000 0.001 0.000 common.py:811(is_integer_dtype)
518 0.000 0.000 0.001 0.000 common.py:1835(_get_dtype_type)
Adding another data point: if I specify the 'python' engine, it looks like on Python 3.7.1 `pandas._libs.lib.maybe_convert_numeric` is 3X slower than on Python 3.5.2.
Could this be due to the Cython version?
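To isolate that call from the rest of the python-engine machinery, something along these lines should work (a sketch; `maybe_convert_numeric` is an internal function and its signature may differ across pandas versions):

```python
import timeit

import numpy as np
from pandas._libs import lib

# One million float strings, roughly what the python engine hands over.
values = np.array(['{:.17g}'.format(x) for x in np.random.randn(1000000)],
                  dtype=object)

# Time just the string-to-number conversion step (empty set of NA values).
t = timeit.timeit(lambda: lib.maybe_convert_numeric(values, set()), number=5)
print(round(t / 5, 3))
```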
__Python 3.7.1__
%prun df2 = pd.read_csv(csv, engine='python')
7003613 function calls (7003575 primitive calls) in 14.411 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
10 9.698 0.970 9.698 0.970 {pandas._libs.lib.maybe_convert_numeric}
1000004 3.221 0.000 3.221 0.000 {built-in method builtins.next}
1 0.386 0.386 4.066 4.066 parsers.py:2926(_get_lines)
1 0.263 0.263 14.399 14.399 parsers.py:1029(read)
4 0.154 0.038 0.247 0.062 parsers.py:2738(_remove_empty_lines)
1000002 0.138 0.000 3.359 0.000 parsers.py:2681(_next_iter_line)
2000072 0.125 0.000 0.125 0.000 {method 'append' of 'list' objects}
1 0.116 0.116 0.116 0.116 {pandas._libs.lib.to_object_array}
1000001 0.103 0.000 0.138 0.000 parsers.py:2869(<genexpr>)
2000130/2000117 0.069 0.000 0.069 0.000 {built-in method builtins.len}
14 0.067 0.005 0.204 0.015 {built-in method builtins.max}
__Python 3.5.2__
%prun df2 = pd.read_csv(csv, engine='python')
7004040 function calls (7004000 primitive calls) in 8.411 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1000003 3.662 0.000 3.662 0.000 {built-in method builtins.next}
10 3.270 0.327 3.270 0.327 {pandas._libs.lib.maybe_convert_numeric}
1 0.378 0.378 4.492 4.492 parsers.py:2926(_get_lines)
1 0.263 0.263 8.398 8.398 parsers.py:1029(read)
4 0.167 0.042 0.245 0.061 parsers.py:2738(_remove_empty_lines)
1000002 0.141 0.000 3.803 0.000 parsers.py:2681(_next_iter_line)
1 0.128 0.128 0.128 0.128 {pandas._libs.lib.to_object_array}
2000067 0.109 0.000 0.109 0.000 {method 'append' of 'list' objects}
1000001 0.108 0.000 0.133 0.000 parsers.py:2869(<genexpr>)
14 0.060 0.004 0.193 0.014 {built-in method builtins.max}
These last numbers are with what pandas version?
> These last numbers are with what pandas version?
They are both Pandas 0.23.4
I tried building the latest Pandas version from source on Python 3.7.1 and still got the same slower performance. Are there any build/compile/cython flags I can set to optimize the parser?
the entire perf issue is simply the precision flag
you can choose higher precision but it takes more time; this is rarely useful though
> the entire perf issue is simply the precision flag
> you can choose higher precision but it takes more time; this is rarely useful though
I tried all three different `float_precision=` flags, and for 'high' and None the 3.5x slowdown was still present on Python 3.7.1 vs Python 3.5.2.
I also tried specifying a `float_format=` in `pd.to_csv()` and I still see the same consistent 3.5x gap.
Can you reproduce in Python 3.6?
I should reiterate this perf difference is on the same version of Pandas 0.23.4, just different versions of Python.
is there a way to specify 'xstrtod', or is that specified by float_precision=None?
I see no performance changes between 'high' and None.
Is anyone able to reproduce this on Python 3.7.1? I tested the code above on Python 3.7.0 using the Python.org interactive interpreter and it seemed to run faster than in my local 3.7.1 install.
Something is definitely up. I did a side-by-side comparison reading the same CSV file on disk. Python 3.5 reads at 111 MB/sec and Python 3.7 reads at only 28 MB/sec from the same SSD, both running Pandas 0.23.4.
Could Python 3.7 have changed something in their I/O system?
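One way to separate raw file I/O from parsing is to time a plain read of the same file with no pandas involved (a sketch; `out.csv` is the file from the benchmarks above):

```python
import os
import time

path = 'out.csv'  # the benchmark file written earlier
size_mb = os.path.getsize(path) / 1e6

# Raw binary read: no decoding, no parsing.
start = time.perf_counter()
with open(path, 'rb') as f:
    f.read()
print('binary: {:.0f} MB/s'.format(size_mb / (time.perf_counter() - start)))

# Text-mode read: adds decoding, still no parsing.
start = time.perf_counter()
with open(path, 'r') as f:
    f.read()
print('text:   {:.0f} MB/s'.format(size_mb / (time.perf_counter() - start)))
```

If both interpreters show similar throughput here, the gap is in the parser rather than in the I/O layer.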
__Python 3.5.2 & Pandas 0.23.4__
In [38]: %timeit pd.read_csv(r'out.csv', float_precision='high')
1.86 s ± 13.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
__Python 3.7.1 & Pandas 0.23.4__
In [17]: %timeit pd.read_csv(r'out.csv', float_precision='high')
7.97 s ± 19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I don't see the difference you're seeing
Python 3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 16:30:03)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '0.23.4'
In [3]: %time _ = pd.read_csv('out.csv', float_precision='high')
CPU times: user 2.59 s, sys: 214 ms, total: 2.81 s
Wall time: 2.73 s
3.7
Python 3.7.1 (default, Oct 23 2018, 14:07:42)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.1.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '0.23.4'
In [3]: %time _ = pd.read_csv('out.csv', float_precision='high')
CPU times: user 2.61 s, sys: 211 ms, total: 2.82 s
Wall time: 2.74 s
Both of those are using Anaconda's packages.
Tom, thanks for running this benchmark. Can you post your pd.show_versions()? I want to re-create your stack exactly to do some more testing.
3.5
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.6.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: None
pip: 10.0.1
setuptools: 40.2.0
Cython: None
numpy: 1.15.2
scipy: None
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
3.7:
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.5.0
Cython: None
numpy: 1.15.4
scipy: None
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
I tried several different fresh Python installs on Windows. Every Python 3.7 install, 32- or 64-bit, with Pandas 0.23.4 pip-installed results in the slower CSV parsing speed. For fun I tried a fresh Python 3.6.7 install, and it again parses the same CSV 3X faster.
Is there anyone who could test this on Windows 10 and Python 3.7.1? 😕
cc @chris-b1 in case you can test on Windows
Indeed, I can confirm that there is a 3.5X slowdown when using Python 3.7.1 on Windows 10.
When I use Python 3.5.6, the performance is unchanged from `0.22.0` to `0.23.4`.
These observations are consistent with what @dragoljub was observing and appear to suggest that this is a Cython / Python issue and not a `pandas` one.
On Windows 10 I note a noticeable slowdown as well, on both Python 3.6 and Python 3.7.
python 3.6
(py36) PS C:\Users\ttttt> ipython
Python 3.6.4 | packaged by conda-forge | (default, Dec 24 2017, 10:11:43) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.1.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd
In [2]: %time _ = pd.read_csv('out.csv', float_precision='high')
Wall time: 7.03 s
In [3]: %time _ = pd.read_csv('out.csv')
Wall time: 7.04 s
python 3.7
```python
(py37) PS C:\Users\ttttt> ipython
Python 3.7.1 (default, Oct 28 2018, 08:39:03) [MSC v.1912 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.1.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd
In [2]: df = pd.DataFrame(np.random.randn(1000000, 10), columns=('COL{}'.format(i) for i in range(10)))
In [6]: df.to_csv('out.csv')
In [7]: %time _ = pd.read_csv('out.csv', float_precision='high')
Wall time: 29.4 s
In [8]: %time _ = pd.read_csv('out.csv')
Wall time: 31.3 s
```
For people on windows, how are you installing pandas? From source, wheels, or conda packages? And if conda, from defaults or from conda-forge?
Here conda-forge:
PS C:\Users\ttttt> activate py37
(py37) PS C:\Users\ttttt> conda install ipython pandas
Solving environment: done
## Package Plan ##
environment location: C:\Miniconda\envs\py37
added / updated specs:
- ipython
- pandas
The following packages will be downloaded:
package | build
---------------------------|-----------------
ipython-7.1.1 |py37h39e3cac_1000 1.1 MB conda-forge
wcwidth-0.1.7 | py_1 17 KB conda-forge
six-1.11.0 | py37_1001 21 KB conda-forge
pytz-2018.7 | py_0 226 KB conda-forge
icc_rt-2017.0.4 | h97af966_0 8.0 MB
pygments-2.2.0 | py_1 622 KB conda-forge
pickleshare-0.7.5 | py37_1000 12 KB conda-forge
certifi-2018.10.15 | py37_1000 137 KB conda-forge
backcall-0.1.0 | py_0 13 KB conda-forge
mkl_random-1.0.1 | py37h77b88f5_1 267 KB
decorator-4.3.0 | py_0 10 KB conda-forge
numpy-1.15.4 | py37ha559c80_0 36 KB
mkl-2019.0 | 118 178.1 MB
pandas-0.23.4 |py37h830ac7b_1000 8.7 MB conda-forge
prompt_toolkit-2.0.7 | py_0 218 KB conda-forge
python-dateutil-2.7.5 | py_0 218 KB conda-forge
colorama-0.4.0 | py_0 15 KB conda-forge
mkl_fft-1.0.6 | py37hdbbee80_0 120 KB
jedi-0.13.1 | py37_1000 228 KB conda-forge
intel-openmp-2019.0 | 118 1.7 MB
parso-0.3.1 | py_0 59 KB conda-forge
traitlets-4.3.2 | py37_1000 130 KB conda-forge
ipython_genutils-0.2.0 | py_1 21 KB conda-forge
numpy-base-1.15.4 | py37h8128ebf_0 3.9 MB
blas-1.0 | mkl 6 KB
------------------------------------------------------------
Total: 203.7 MB
Thanks @toniatop. Can you create a couple environments with just defaults to see if it's an issue with how it was compiled for conda-forge?
I redid everything forcing --channel anaconda; same results.
cc @jjhelmus any thoughts on https://github.com/pandas-dev/pandas/issues/23516#issuecomment-436958298? The tl;dr is that `pd.read_csv` is 3-4x slower on python 3.7 vs. python 3.6.

I have also tested this with the pre-built wheels from https://www.lfd.uci.edu/~gohlke/pythonlibs/ with the same results.
I also ran a source build of the latest GitHub code with `python setup.py bdist_wheel` on Python 3.7.1 and got the same results that way.
I wonder if something in the build script has changed or some compile flag on windows.
> I also ran a source build of the latest GitHub code with python setup.py bdist_wheel on python 3.7.1 and got the same results that way.
Did you also try building from source with 3.6?
I will try building with python 3.6.7 today. I will also use the latest version of cython in case that’s the culprit.
Thanks for all those who confirmed this on windows.
I just rebuilt the latest Pandas version (0.23.4+) from source with the latest Cython 0.29 on Python 3.6.7 on Windows 10, and the parsing speed is _fast_. So it seems somehow related to Python 3.7 on Windows. Not sure what it could be. Does the Python IO system pass data to the Cython/C parser? Was the Python 3.7 version compiled without optimizations?
Microsoft Windows [Version 10.0.16299.726]
(c) 2017 Microsoft Corporation. All rights reserved.
Python 3.6.7 (v3.6.7:6ec5cf24b7, Oct 20 2018, 13:35:33) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.1.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd
In [2]: pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0+unknown
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: 0.29
numpy: 1.15.4
scipy: None
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.8
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
In [3]: import io
In [4]: import numpy as np
In [5]: %time df = pd.DataFrame(np.random.randn(1000000, 10), columns=('COL{}'.format(i) for i in range(10)))
Wall time: 207 ms
In [6]: %time csv = io.StringIO(df.to_csv(index=False))
Wall time: 13.2 s
In [7]: %time df2 = pd.read_csv(csv, float_precision='high')
Wall time: 1.96 s
I posted an issue linking this one on the Python issue tracker so at least they can take a look. Since the performance seems good on Linux, I'm hopeful a compile/config fix could address it for Windows.
Just to cross a plausible suspect off the list - the MSVC version does not seem to matter.
# builds with MSVC 2017 with python 3.7 (assuming installed)
λ python setup.py build_ext -i
# %timeit pd.read_csv('tmp.csv')
# 14.6 s ± 701 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# builds with MSVC 2015
λ "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" x64
λ python setup.py build_ext -i -f
# %timeit pd.read_csv('tmp.csv')
# 15.2 s ± 2.27 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
> Was the Python 3.7 version compiled without optimizations?
No, 3.7 extensions are continuing to be built with /Ox
Is the slowdown only in IO-related methods?
It seems the issue is actually the float parsing? Which is odd, because our `xstrtod` doesn't (?) interact with Python at all.
In fact, as @dragoljub noted, the `round_trip` parser is faster, which DOES call back into Python:
In [1]: %timeit pd.read_csv('tmp.csv')
15.2 s ± 2.27 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [2]: %timeit pd.read_csv('tmp.csv', float_precision='precise')
18.9 s ± 984 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %timeit pd.read_csv('tmp.csv', float_precision='round_trip')
8.67 s ± 205 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I tested pure 0-9 INT parsing and it's also ~3.5X slower on Python 3.7.1 on Windows. So I have a feeling this issue is with the data coming in before we get to any parsing (float or otherwise).
I also see a suspicious line of code being called in Python 3.7 but not in 3.5.2:
{built-in method _thread.allocate_lock}
Could this be something new in Python 3.7.1 that is interrupting the parser? Do we need to release the GIL differently for Python 3.7?
__DataFrame Setup:__
import io
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0, 9, (1000000, 10)), columns=('COL{}'.format(i) for i in range(10)))
csv = io.StringIO(df.to_csv(index=False))
```python
print(df.head().to_csv(index=False))
COL0,COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8,COL9
2,1,7,3,2,3,0,4,5,5
6,3,1,7,0,3,6,3,0,8
6,8,4,2,1,5,1,4,3,3
0,8,5,8,0,4,1,8,4,1
4,8,0,0,4,0,3,0,6,3
```
__Python 3.7.1 & Pandas 0.23.4 -- Slow__
```python
3676 function calls (3642 primitive calls) in 2.132 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 2.075 2.075 2.075 2.075 {method 'read' of 'pandas._libs.parsers.TextReader' objects}
1 0.039 0.039 0.039 0.039 internals.py:5017(_stack_arrays)
1 0.009 0.009 2.132 2.132 parsers.py:414(_read)
1 0.004 0.004 0.004 0.004 parsers.py:1685(__init__)
1 0.000 0.000 0.000 0.000 {method 'close' of 'pandas._libs.parsers.TextReader' objects}
161 0.000 0.000 0.000 0.000 common.py:811(is_integer_dtype)
206 0.000 0.000 0.000 0.000 common.py:1835(_get_dtype_type)
824 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
2 0.000 0.000 0.000 0.000 {built-in method nt.stat}
8/2 0.000 0.000 0.001 0.000 <frozen importlib._bootstrap>:978(_find_and_load)
4/3 0.000 0.000 0.000 0.000 base.py:255(__new__)
76 0.000 0.000 0.000 0.000 base.py:61(is_dtype)
2 0.000 0.000 0.001 0.000 {pandas._libs.lib.clean_index_list}
1 0.000 0.000 2.132 2.132 {built-in method builtins.exec}
462 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}
41 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.array}
89/78 0.000 0.000 0.000 0.000 {built-in method builtins.len}
86 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
149 0.000 0.000 0.000 0.000 generic.py:7(_check)
1 0.000 0.000 0.040 0.040 internals.py:4880(form_blocks)
3/2 0.000 0.000 0.001 0.001 series.py:166(__init__)
13 0.000 0.000 0.000 0.000 internals.py:3148(get_block_type)
245 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
7 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.empty}
12 0.000 0.000 0.000 0.000 cast.py:971(maybe_cast_to_datetime)
3 0.000 0.000 0.000 0.000 internals.py:237(mgr_locs)
1 0.000 0.000 0.000 0.000 internals.py:3363(_rebuild_blknos_and_blklocs)
8 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:157(_get_module_lock)
2 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.array_equivalent_object}
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:882(_find_spec)
12 0.000 0.000 0.000 0.000 series.py:4019(_sanitize_array)
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1356(find_spec)
16 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock}
1 0.000 0.000 2.075 2.075 parsers.py:1846(read)
2 0.000 0.000 0.000 0.000 missing.py:189(_isna_ndarraylike)
2 0.000 0.000 0.000 0.000 {method 'reduce' of 'numpy.ufunc' objects}
1 0.000 0.000 2.117 2.117 parsers.py:1029(read)
1 0.000 0.000 2.132 2.132 parsers.py:542(parser_f)
```
__Python 3.5.2 & Pandas 0.23.4 -- Fast__
%prun df2 = pd.read_csv(csv, float_precision=None)
2623 function calls (2616 primitive calls) in 0.661 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.604 0.604 0.605 0.605 {method 'read' of 'pandas._libs.parsers.TextReader' objects}
1 0.039 0.039 0.039 0.039 internals.py:4801(_stack_arrays)
1 0.011 0.011 0.661 0.661 parsers.py:423(_read)
1 0.004 0.004 0.004 0.004 parsers.py:1677(__init__)
1 0.000 0.000 0.000 0.000 {method 'close' of 'pandas._libs.parsers.TextReader' objects}
3/2 0.000 0.000 0.001 0.000 base.py:181(__new__)
160 0.000 0.000 0.000 0.000 common.py:777(is_integer_dtype)
186 0.000 0.000 0.000 0.000 common.py:1773(_get_dtype_type)
622 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
108 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
1 0.000 0.000 0.605 0.605 parsers.py:1837(read)
1 0.000 0.000 0.000 0.000 {pandas._libs.lib.clean_index_list}
129 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
118 0.000 0.000 0.000 0.000 generic.py:7(_check)
1 0.000 0.000 0.039 0.039 internals.py:4645(form_blocks)
10 0.000 0.000 0.000 0.000 cast.py:935(maybe_cast_to_datetime)
61 0.000 0.000 0.000 0.000 dtypes.py:85(is_dtype)
415 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}
1 0.000 0.000 0.661 0.661 {built-in method builtins.exec}
1 0.000 0.000 0.661 0.661 parsers.py:557(parser_f)
I also found this bug fix from a while ago that made some changes to `pandas/src/parser/io.c` where PyGILState_Ensure() was called, possibly interacting differently on Windows systems with threads on Python 3.7.
https://github.com/pandas-dev/pandas/pull/11790/files#diff-006bfcefd42c2be38e538fdd3219dbfdL125
I'd be surprised that change matters, but I'm at a loss here, so maybe! Another possibility is that cython made some tweaks to threading logic for python 3.7 compat - again, wouldn't think that's the issue here, but possible some kind of bad interaction.
https://github.com/cython/cython/issues/1978
> I'd be surprised that change matters, but I'm at a loss here, so maybe! Another possibility is that cython made some tweaks to threading logic for python 3.7 compat - again, wouldn't think that's the issue here, but possible some kind of bad interaction.
> cython/cython#1978
Good info. I'm just surprised that people do not see this on Linux. I'll try OSX next.
FYI my timings above were on OSX (no slowdown)
Looks like the slowdown first shows up in Python `3.7.0a4`:
>C:\python-3.7.0a3-amd64\python.exe -m cProfile -s tottime pandascsv.py
235992 function calls (229477 primitive calls) in 21.525 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
200 11.316 0.057 11.316 0.057 {method 'astype' of 'numpy.ndarray' objects}
100 6.596 0.066 6.596 0.066 {pandas._libs.writers.write_csv_rows}
1 2.111 2.111 2.112 2.112 {method 'read' of 'pandas._libs.parsers.TextReader' objects}
>C:\python-3.7.0a4-amd64\python.exe -m cProfile -s tottime pandascsv.py
236639 function calls (230127 primitive calls) in 26.550 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 9.849 9.849 9.850 9.850 {method 'read' of 'pandas._libs.parsers.TextReader' objects}
200 8.766 0.044 8.766 0.044 {method 'astype' of 'numpy.ndarray' objects}
100 6.469 0.065 6.469 0.065 {pandas._libs.writers.write_csv_rows}
Very interesting. I'll try Py 3.7.0a3 to confirm this on my systems. Is the diff between `0a3` and `0a4` easy to find from Python's release notes?
> Is the diff easy to find from Python's release notes?
https://docs.python.org/3.7/whatsnew/changelog.html#python-3-7-0-alpha-4
Maybe bpo-29240: Add a new UTF-8 mode: implementation of the PEP 540?
See also https://github.com/python/cpython/compare/v3.7.0a3...v3.7.0a4
I can also confirm the changes from Python 3.7.0a3 to 3.7.0a4 show the slowdown on my Win10 test system. Thanks for finding when the slowdown occurred.
__Python 3.7.0a3 -- Fast Parse__
%prun df2 = pd.read_csv(csv)
5781 function calls (5743 primitive calls) in 3.062 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 2.953 2.953 2.955 2.955 {method 'read' of 'pandas._libs.parsers.TextReader' objects}
1 0.063 0.063 0.063 0.063 internals.py:5017(_stack_arrays)
1 0.016 0.016 3.052 3.052 parsers.py:414(_read)
1 0.009 0.009 3.062 3.062 <string>:1(<module>)
1 0.009 0.009 0.009 0.009 parsers.py:1685(__init__)
32 0.004 0.000 0.004 0.000 {built-in method nt.stat}
1 0.001 0.001 0.001 0.001 {method 'close' of 'pandas._libs.parsers.TextReader' objects}
321 0.001 0.000 0.002 0.000 common.py:811(is_integer_dtype)
516 0.001 0.000 0.001 0.000 common.py:1835(_get_dtype_type)
7 0.001 0.000 0.001 0.000 {built-in method numpy.core.multiarray.empty}
32 0.000 0.000 0.005 0.000 <frozen importlib._bootstrap_external>:1235(find_spec)
988 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
163 0.000 0.000 0.000 0.000 common.py:1527(is_float_dtype)
718 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}
192 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:59(<listcomp>)
192 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:57(_path_join)
8 0.000 0.000 0.005 0.001 <frozen importlib._bootstrap_external>:1119(_get_spec)
133 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
68 0.000 0.000 0.000 0.000 generic.py:7(_check)
__Python 3.7.0a4 -- Slow Parse__
%prun df2 = pd.read_csv(csv)
8007 function calls (7219 primitive calls) in 14.192 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 14.092 14.092 14.094 14.094 {method 'read' of 'pandas._libs.parsers.TextReader' objects}
1 0.061 0.061 0.062 0.062 internals.py:5017(_stack_arrays)
1 0.016 0.016 14.192 14.192 parsers.py:414(_read)
1 0.008 0.008 0.008 0.008 parsers.py:1685(__init__)
32 0.004 0.000 0.004 0.000 {built-in method nt.stat}
1 0.001 0.001 0.001 0.001 {method 'close' of 'pandas._libs.parsers.TextReader' objects}
321 0.001 0.000 0.002 0.000 common.py:811(is_integer_dtype)
516 0.001 0.000 0.001 0.000 common.py:1835(_get_dtype_type)
7 0.001 0.000 0.001 0.000 {built-in method numpy.core.multiarray.empty}
115/4 0.000 0.000 0.001 0.000 abc.py:194(__subclasscheck__)
32 0.000 0.000 0.005 0.000 <frozen importlib._bootstrap_external>:1322(find_spec)
1324/988 0.000 0.000 0.002 0.000 {built-in method builtins.isinstance}
937/725 0.000 0.000 0.002 0.000 {built-in method builtins.issubclass}
163 0.000 0.000 0.000 0.000 common.py:1527(is_float_dtype)
192 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:57(_path_join)
192 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:59(<listcomp>)
8 0.000 0.000 0.005 0.001 <frozen importlib._bootstrap_external>:1206(_get_spec)
89/78 0.000 0.000 0.000 0.000 {built-in method builtins.len}
192 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
I tried playing around with the UTF-8 mode settings with ENV variables and cmd line args on Windows and was not able to get faster parsing speed on Python 3.7.0a4.
https://www.python.org/dev/peps/pep-0540/#proposal
> The benefit of the locale coercion approach is that it helps ensure that encoding handling in binary extension modules and child processes is consistent with Python's encoding handling. The upside of the UTF-8 Mode approach is that it allows an embedding application to change the interpreter's behaviour without having to change the process global locale settings.
So is it possible that somewhere in the C parser extension we could just set the locale to UTF-8 so this issue would go away on Windows? I was hoping the ENV variable settings would fix the issue, but they did not make a difference in my testing.
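For reference, UTF-8 mode (PEP 540) is decided at interpreter startup -- `PYTHONUTF8=1` in the environment or `python -X utf8` on the command line -- and whether it actually took effect can be checked in-process via a flag that exists as of 3.7:

```python
import sys

# 1 if UTF-8 mode is active (python -X utf8 or PYTHONUTF8=1), else 0.
print(sys.flags.utf8_mode)
```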
I compared the statement `df2 = pd.read_csv(csv)` on Python 3.7.0a3 and a4 in the Visual Studio profiler. The culprit is the `isdigit` function called in the `parsers` extension module. On `3.7.0a3` the function is fast at ~8% of samples. On `3.7.0a4` the function is slow at ~64% of samples because it calls the `_isdigit_l` function, which seems to update and restore the locale in the current thread every time...
3.7.0a3:
Function Name Inclusive Samples Exclusive Samples Inclusive Samples % Exclusive Samples % Module Name
+ [parsers.cp37-win_amd64.pyd] 705 347 28.52% 14.04% parsers.cp37-win_amd64.pyd
isdigit 207 207 8.37% 8.37% ucrtbase.dll
- _errno 105 39 4.25% 1.58% ucrtbase.dll
toupper 24 24 0.97% 0.97% ucrtbase.dll
isspace 21 21 0.85% 0.85% ucrtbase.dll
[python37.dll] 1 1 0.04% 0.04% python37.dll
3.7.0a4:
Function Name Inclusive Samples Exclusive Samples Inclusive Samples % Exclusive Samples % Module Name
+ [parsers.cp37-win_amd64.pyd] 8,613 478 83.04% 4.61% parsers.cp37-win_amd64.pyd
+ isdigit 6,642 208 64.04% 2.01% ucrtbase.dll
+ _isdigit_l 6,434 245 62.03% 2.36% ucrtbase.dll
+ _LocaleUpdate::_LocaleUpdate 5,806 947 55.98% 9.13% ucrtbase.dll
+ __acrt_getptd 2,121 1,031 20.45% 9.94% ucrtbase.dll
FlsGetValue 647 647 6.24% 6.24% KernelBase.dll
- RtlSetLastWin32Error 296 235 2.85% 2.27% ntdll.dll
_guard_dispatch_icall_nop 101 101 0.97% 0.97% ucrtbase.dll
GetLastError 46 46 0.44% 0.44% KernelBase.dll
+ __acrt_update_multibyte_info 1,475 246 14.22% 2.37% ucrtbase.dll
- __crt_state_management::get_current_state_index 1,229 513 11.85% 4.95% ucrtbase.dll
+ __acrt_update_locale_info 1,263 235 12.18% 2.27% ucrtbase.dll
- __crt_state_management::get_current_state_index 1,028 429 9.91% 4.14% ucrtbase.dll
_ischartype_l 383 383 3.69% 3.69% ucrtbase.dll
Great work debugging this. I would guess any other code paths calling isdigit would also be slowed down on windows.
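One way to confirm the CRT regression independently of pandas would be to call ucrtbase's `isdigit` directly through ctypes (a sketch, Windows-only; the ctypes call overhead dominates, but a locale lookup inside `isdigit` should still widen the gap between interpreters):

```python
import ctypes
import timeit

# Load the Universal CRT and bind the C isdigit (Windows-only).
ucrt = ctypes.CDLL('ucrtbase')
isdigit = ucrt.isdigit
isdigit.argtypes = [ctypes.c_int]
isdigit.restype = ctypes.c_int

c = ord('7')
print(timeit.timeit(lambda: isdigit(c), number=1000000))
```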
Just a note for people looking at `xstrtod` (and thanks for doing so BTW, this looks like a really tough issue): there are two of them (#19361). Off the top of my head I'm not sure which is used in what context.
I may have found a pure Python example that seems to show a similar but smaller 2.5X slowdown. Also note the variability is 15X higher for the 3.7.1 run, possibly indicating that the locale argument is passed/used in some calls but not others.
Can someone test this on Linux and see if you see a difference?
__Python 3.7.1__
digits = ''.join([str(i) for i in range(10)]*10000000)
%timeit digits.isdigit() # --> 2.5X slower on python 3.7.1
537 ms ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
__Python 3.5.2__
digits = ''.join([str(i) for i in range(10)]*10000000)
%timeit digits.isdigit() # --> 2.5X slower on python 3.7.1
215 ms ± 986 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
--> Based on comments from https://bugs.python.org/msg329789 it appears this is a pure Unicode test, so it may be unrelated.
@cgohlke has posted a nice minimal example showing the slowdown: https://bugs.python.org/msg329790 Thanks! 👍
Thanks for the investigation @cgohlke - for 0.24 I suppose we should just shim in an ASCII `isdigit` function?
(MUSL, MIT licensed)
https://github.com/esmil/musl/blob/master/src/ctype/isdigit.c#L5
@chris-b1 I was thinking the same thing since it's quite a simple function; however, support for changed locales would then be limited. I wonder how the Windows isdigit function ends up calling the locale version. I don't think that source is available.
> I wonder how the Windows isdigit function ends up calling the locale version. I don't think that source is available.
The source code for the Windows UCRT is available with recent Windows SDKs. It is usually installed under `C:\Program Files (x86)\Windows Kits\10\Source`.
The `isdigit` and `_isdigit_l` functions are defined in `ucrt\convert\_ctype.cpp`:
extern "C" extern __inline int (__cdecl isdigit)(int const c)
{
return __acrt_locale_changed()
? (_isdigit_l)(c, nullptr)
: fast_check(c, _DIGIT);
}
extern "C" extern __inline int (__cdecl _isdigit_l)(int const c, _locale_t const locale)
{
_LocaleUpdate locale_update(locale);
return _isdigit_l(c, locale_update.GetLocaleT());
}
The following comment is from the `_wsetlocale` function:
// If no call has been made to setlocale to change locale from "C" locale
// to some other locale, we keep locale_changed = 0. Other functions that
// depend on locale use this variable to optimize performance for C locale
// which is normally the case in applications.
So if I'm understanding it correctly, even if we set the locale in Python to "C", the Windows isdigit function would still resort to calling the locale isdigit version, slowing down parsing, because the locale has 'changed'.
Is that the case in Python 3.7.0a3? Does setting the locale to "C" slow parsing down?
@jreback @TomAugspurger Do you think a simple shim of the isdigit function in the C parser code would be a fix we could entertain?
It would assume ASCII-compatible encoding for numeric columns, which I think should cover all/most CSV file encodings for digits.
int isdigit(int c)
{
return (unsigned)c-'0' < 10;
}
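As a quick sanity check, the unsigned-compare trick in that shim can be verified against the obvious range test by emulating 32-bit unsigned arithmetic in Python:

```python
def shim_isdigit(c):
    # Emulates (unsigned)c - '0' < 10: a negative difference wraps around
    # to a huge unsigned value, which then fails the < 10 test.
    return ((c - ord('0')) & 0xFFFFFFFF) < 10

assert all(shim_isdigit(c) == (ord('0') <= c <= ord('9')) for c in range(256))
print('shim matches ASCII 0-9 for all byte values')
```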
Yeah, if you want to submit a PR, ping me; if not, I'll try to get to it soon.
@chris-b1 Go for it! 😄
@chris-b1 @jreback Thanks for getting this done! Really appreciate it! 👍 😄