In [1]: %matplotlib gtk3
In [2]: import pandas
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-2-38d4b0363d82> in <module>()
----> 1 import pandas
~/.local/lib/python3.6/site-packages/pandas/__init__.py in <module>()
56
57 from pandas.util._print_versions import show_versions
---> 58 from pandas.io.api import *
59 from pandas.util._tester import test
60 import pandas.testing
~/.local/lib/python3.6/site-packages/pandas/io/api.py in <module>()
17 from pandas.io.stata import read_stata
18 from pandas.io.pickle import read_pickle, to_pickle
---> 19 from pandas.io.packers import read_msgpack, to_msgpack
20 from pandas.io.gbq import read_gbq
21
~/.local/lib/python3.6/site-packages/pandas/io/packers.py in <module>()
66
67 from pandas.io.msgpack import Unpacker as _Unpacker, Packer as _Packer, ExtType
---> 68 from pandas.util._move import (
69 BadMove as _BadMove,
70 move_into_mutable_buffer as _move_into_mutable_buffer,
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 3: invalid start byte
If I use tk or qt instead, this doesn't happen.
Can you show pd.show_versions()?
How'd you install pandas?
Well, as you can see, I'm unable to import pandas after setting the mpl backend, so I'm reporting the output of show_versions without setting a backend first. I'm not sure whether this matters or not.
In [2]: pandas.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.13-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
The installation was a standard pip one: pip install --user pandas
No idea what's going on here, sorry.
@tacaswell have you seen anything like this before? pandas.util._move is a C-extension module.
Maybe it's using some gtk api.
I was having the same problem as you, but without using matplotlib directly.
I was using graph_tool (which has a draw module that imports GTK internally) and pandas. I just changed the order in which I imported the libs.
My code before the workaround:
import graph_tool.all as gt
import pandas as pd
My code after the workaround:
import pandas as pd
import graph_tool.all as gt
So I think you should try importing pandas first and setting the matplotlib backend afterwards.
We do some GTK-related things for read_clipboard. Perhaps we're doing something wrong there? Would either of you mind taking a look?
Anyone able to debug this further?
On my system (Debian Testing), the script
import matplotlib.pyplot as plt
import pandas as pd
causes
Traceback (most recent call last):
File "test.py", line 2, in <module>
import pandas as pd
File "/usr/lib/python3/dist-packages/pandas/__init__.py", line 58, in <module>
from pandas.io.api import *
File "/usr/lib/python3/dist-packages/pandas/io/api.py", line 18, in <module>
from pandas.io.packers import read_msgpack, to_msgpack
File "/usr/lib/python3/dist-packages/pandas/io/packers.py", line 69, in <module>
from pandas.util._move import (
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 3: invalid start byte
when using the gtk3cairo and gtk3agg backends. The error does not occur if the order of the import statements is switched, or if I use the TkCairo or TkAgg backends. /usr/lib/python3/dist-packages/pandas/io/packers.py doesn't have any non-ASCII bytes in it.
This doesn't appear to have anything to do with whether the session is interactive or not; the behavior is the same.
I'd dig deeper, but I wouldn't know where to look.
Edit: pd.show_versions() gives
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.candidate.1
python-bits: 64
OS: Linux
OS-release: 4.15.0-2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 0.19.1
xarray: None
IPython: 5.5.0
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None
My pandas is Debian version 0.20.3-11. Should we remove the "Needs Info" label at this point?
I'm not sure where to look either, sorry.
This 0x89 is 'CHARACTER TABULATION WITH JUSTIFICATION', so I am pretty sure this isn't intentional.
My guess is that something is going wrong in PyInit__move but no guess as to _why_.
PS, I like the note of this function "If you want to use this function you are probably wrong."
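In UTF-8 terms, what matters about 0x89 is its bit pattern, 10001001: any byte of the form 10xxxxxx is a continuation byte, which may only follow a lead byte and can never begin a character. That is exactly why the codec reports "invalid start byte". (0x89 is also the first byte of the PNG file signature, and read as Latin-1 it is U+0089, the control character named above.) The helper names below are made up for illustration; this is a minimal sketch using only the standard library.

```python
def utf8_byte_role(byte):
    """Classify a single byte by the role it can play in UTF-8."""
    if byte < 0x80:
        return "ascii"
    if byte < 0xC0:
        return "continuation"  # 10xxxxxx: only valid *after* a lead byte
    if byte in (0xC0, 0xC1) or byte > 0xF4:
        return "never valid"
    return "lead"


def first_utf8_error(data):
    """Return (position, byte, reason) for the first UTF-8 decode failure, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start], exc.reason
    return None


PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

print(utf8_byte_role(0x89))                    # continuation
print(first_utf8_error(PNG_SIGNATURE))         # (0, 137, 'invalid start byte')
print(first_utf8_error(b"_m\x89ve"))           # (2, 137, 'invalid start byte')
print(repr(bytes([0x89]).decode("latin-1")))   # '\x89', i.e. U+0089
```

The second call mimics the thread's message: a single stray 0x89 at index 2 of an otherwise ASCII name is enough to produce "position 2: invalid start byte".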
I am pretty sure this isn't intentional
Oh, definitely. There aren't any non-ASCII bytes in the source at all.
I'm still having this problem, which makes it impossible to use the gtk3 matplotlib backend.
impossible to use the gtk3 matplotlib backend
This is definitely something that should be fixed, but as per my previous comment, you should be able to use gtk3 by importing pandas before matplotlib.
That never worked for me:
~:: ipython
Python 3.6.6 (default, Jun 27 2018, 13:11:40)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
--------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-1-7dd3504c366f> in <module>()
----> 1 import pandas as pd
~/.local/lib/python3.6/site-packages/pandas/__init__.py in <module>()
55
56 from pandas.util._print_versions import show_versions
---> 57 from pandas.io.api import *
58 from pandas.util._tester import test
59 import pandas.testing
~/.local/lib/python3.6/site-packages/pandas/io/api.py in <module>()
17 from pandas.io.stata import read_stata
18 from pandas.io.pickle import read_pickle, to_pickle
---> 19 from pandas.io.packers import read_msgpack, to_msgpack
20 from pandas.io.gbq import read_gbq
21
~/.local/lib/python3.6/site-packages/pandas/io/packers.py in <module>()
67
68 from pandas.io.msgpack import Unpacker as _Unpacker, Packer as _Packer, ExtType
---> 69 from pandas.util._move import (
70 BadMove as _BadMove,
71 move_into_mutable_buffer as _move_into_mutable_buffer,
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 3: invalid start byte
~:: ipython
Python 3.6.6 (default, Jun 27 2018, 13:11:40)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
--------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-1-7dd3504c366f> in <module>()
----> 1 import pandas as pd
~/.local/lib/python3.6/site-packages/pandas/__init__.py in <module>()
55
56 from pandas.util._print_versions import show_versions
---> 57 from pandas.io.api import *
58 from pandas.util._tester import test
59 import pandas.testing
~/.local/lib/python3.6/site-packages/pandas/io/api.py in <module>()
17 from pandas.io.stata import read_stata
18 from pandas.io.pickle import read_pickle, to_pickle
---> 19 from pandas.io.packers import read_msgpack, to_msgpack
20 from pandas.io.gbq import read_gbq
21
~/.local/lib/python3.6/site-packages/pandas/io/packers.py in <module>()
67
68 from pandas.io.msgpack import Unpacker as _Unpacker, Packer as _Packer, ExtType
---> 69 from pandas.util._move import (
70 BadMove as _BadMove,
71 move_into_mutable_buffer as _move_into_mutable_buffer,
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 3: invalid start byte
Those are two identical code blocks... And in any case, it looks like you're importing numpy in some kind of start up script, before you even get the Python prompt. Is it possible that you've imported matplotlib by then as well?
It runs during IPython startup; it could be that IPython is importing matplotlib before my imports, yes. I will check that tomorrow and let you know.
@alex-robbins no luck here, this is the only config I need to make ipython throw the exception:
c = get_config()  # noqa
c.InteractiveShellApp.matplotlib = 'gtk3'
c.TerminalIPythonApp.exec_lines = [
    'import pandas as pd',
    'import matplotlib.pyplot as plt',
]
So it has been months since the last time I was able to use ipython with gtk. It's not a minor issue IMO.
If you remove the IPython startup scripts, can you reproduce the order-dependent success?
Can you reproduce this in a plain python prompt?
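One way to answer both questions without IPython in the loop is to run each import order in a fresh plain interpreter via `python -c`, optionally with `MPLBACKEND` set as in the later reproduction in this thread. The `try_imports` helper below is a hypothetical sketch, not part of pandas or matplotlib; it only reports whether the imports succeeded and the last line of stderr.

```python
import os
import subprocess
import sys


def try_imports(modules, extra_env=None):
    """Import `modules` in the given order in a fresh interpreter.

    Uses `python -c`, so no IPython profile or interactive startup file runs.
    Returns (succeeded, last line of stderr).
    """
    code = "\n".join(f"import {name}" for name in modules)
    env = dict(os.environ, **(extra_env or {}))
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        env=env,
    )
    lines = result.stderr.strip().splitlines()
    return result.returncode == 0, (lines[-1] if lines else "")
```

For example, comparing `try_imports(["pandas", "matplotlib.pyplot"], {"MPLBACKEND": "GTK3Agg"})` with the reversed order would show whether the order dependence survives outside IPython; on an affected system the failing case's last line should be the UnicodeDecodeError.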
I have no startup script configured in IPython. If I remove all config, the bare minimum I have to do to reproduce this from the REPL is what I wrote in the first post of this issue.
I wanted to chime in that this is still an issue. I have installed pandas using both pip and pacman, and both had the same error. When I remove the auto-import of matplotlib from my ipython_config.py (below), the issue is resolved and I am able to use pandas inside IPython.
$ cat ~/.ipython/profile_default/ipython_config.py
c = get_config()
c.InteractiveShellApp.exec_lines = [
    'from importlib import reload',
    'import numpy',
    'from matplotlib import pyplot',
]
$ python -c "import pandas; print(pandas.show_versions())"
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.16-arch1-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 40.6.2
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
None
I don’t think any of the maintainers have been able to reproduce this. Any debugging you can do would be appreciated.
@TomAugspurger this is the next-to-minimal scenario in which I was able to reproduce the issue (the minimal one is listed below):
~:: MPLBACKEND=GTK3Agg python
Python 3.7.1 (default, Oct 22 2018, 10:41:28)
[GCC 8.2.1 20180831] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib.pyplot
>>> import pandas
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/carlos/.local/lib/python3.7/site-packages/pandas/__init__.py", line 57, in <module>
from pandas.io.api import *
File "/home/carlos/.local/lib/python3.7/site-packages/pandas/io/api.py", line 19, in <module>
from pandas.io.packers import read_msgpack, to_msgpack
File "/home/carlos/.local/lib/python3.7/site-packages/pandas/io/packers.py", line 69, in <module>
from pandas.util._move import (
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 2: invalid start byte
First of all, 0x89 is the byte PNG files start with, but I'm not sure what position 2 refers to: the beginning of a file, of a chunk, or something else. Also, it's hard to see why the importer is using UTF-8 at all. I checked sys.meta_path after the pyplot import, and indeed gobject is adding its own finder, but even if I remove it the error persists. I don't see any I/O-related code in the initialization function of move.c, so I still think it must be the importer. Since pandas.util._move imports without problems (i) when pyplot is not imported first and (ii) when pyplot is imported first with a non-GTK backend, I assume this is related to the combination of PyGObject and pandas. Indeed:
~:: python
Python 3.7.1 (default, Oct 22 2018, 10:41:28)
[GCC 8.2.1 20180831] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gi
>>> from gi.repository import Gtk
__main__:1: PyGIWarning: Gtk was imported without specifying a version first. Use gi.require_version('Gtk', '3.0') before import to ensure that the right version gets loaded.
>>> import pandas
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/carlos/.local/lib/python3.7/site-packages/pandas/__init__.py", line 57, in <module>
from pandas.io.api import *
File "/home/carlos/.local/lib/python3.7/site-packages/pandas/io/api.py", line 19, in <module>
from pandas.io.packers import read_msgpack, to_msgpack
File "/home/carlos/.local/lib/python3.7/site-packages/pandas/io/packers.py", line 69, in <module>
from pandas.util._move import (
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 2: invalid start byte
Now we can restate the problem in worse terms: pandas doesn't work with PyGObject.
I also commented out everything in util/__init__.py and removed util/__pycache__/*, to no avail. I set a breakpoint just before the offending import, looked at sys.meta_path there, and even removed the extra finders, again to no avail.
I'm giving up for now but hope this helps you track the ultimate cause of the problem.
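The sys.meta_path check described above can be scripted by snapshotting the finder list around a single import and diffing it. `finders_added_by` is a hypothetical helper name; in the thread's setup the module to pass would be something like "gi.repository.Gtk", which is an assumption about how you would reproduce it rather than something verified here.

```python
import importlib
import sys


def finders_added_by(module_name):
    """Import `module_name` and return any finders that newly appeared on sys.meta_path."""
    before = list(sys.meta_path)
    importlib.import_module(module_name)
    return [finder for finder in sys.meta_path if finder not in before]
```

Note that a module which is already imported adds nothing the second time, so this is only meaningful as the first import of that module in the session.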
I commented out the offending import in packers.py and was able to import pandas; explicitly importing pandas.util._move then triggers the error:
~:: python
Python 3.7.1 (default, Oct 22 2018, 10:41:28)
[GCC 8.2.1 20180831] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gi
>>> from gi.repository import Gtk
__main__:1: PyGIWarning: Gtk was imported without specifying a version first. Use gi.require_version('Gtk', '3.0') before import to ensure that the right version gets loaded.
>>> import pandas
>>> import pandas.util._move
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 2: invalid start byte
Importing _move directly from inside the util directory also fails:
~/.lo/lib/pyt/sit/pan/util:: python
Python 3.7.1 (default, Oct 22 2018, 10:41:28)
[GCC 8.2.1 20180831] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gi
>>> from gi.repository import Gtk
__main__:1: PyGIWarning: Gtk was imported without specifying a version first. Use gi.require_version('Gtk', '3.0') before import to ensure that the right version gets loaded.
>>> import _move
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 2: invalid start byte
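To take the finders out of the picture entirely, one option is to load a module straight from its file path with importlib, which builds a spec for that file and executes it without consulting sys.meta_path. The sketch below is self-checked with a throwaway pure-Python module; for the thread's case, `path` would be the _move.cpython-37m-x86_64-linux-gnu.so file in pandas/util/ (importlib picks the extension-module loader from the file suffix). `load_from_path` is a hypothetical helper name.

```python
import importlib.util
import os
import tempfile


def load_from_path(module_name, path):
    """Execute the module at `path` directly, without consulting sys.meta_path finders."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"no loader for {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Self-check with a temporary pure-Python module.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_module.py")
    with open(demo_path, "w") as handle:
        handle.write("ANSWER = 42\n")
    demo = load_from_path("demo_module", demo_path)
```

If the error still appears this way after `from gi.repository import Gtk`, that would point below the import machinery rather than at any finder.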
Reading the .so file directly in text mode complains about a different byte at a different position, so it doesn't seem to be the same operation the importer is doing:
~/.lo/lib/pyt/sit/pan/util:: python
Python 3.7.1 (default, Oct 22 2018, 10:41:28)
[GCC 8.2.1 20180831] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = open('_move.cpython-37m-x86_64-linux-gnu.so').read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 24: invalid start byte
Btw, position 2 of the .so file is not 0x89:
~/.lo/lib/pyt/sit/pan/util:: od -x _move.cpython-37m-x86_64-linux-gnu.so | head
0000000 457f 464c 0102 0001 0000 0000 0000 0000
0000020 0003 003e 0001 0000 0ab0 0000 0000 0000
0000040 0040 0000 0000 0000 1b48 0000 0000 0000
0000060 0000 0000 0040 0038 0006 0040 001b 001a
0000100 0001 0000 0005 0000 0000 0000 0000 0000
0000120 0000 0000 0000 0000 0000 0000 0000 0000
0000140 0f1c 0000 0000 0000 0f1c 0000 0000 0000
0000160 0000 0020 0000 0000 0001 0000 0006 0000
0000200 1000 0000 0000 0000 1000 0020 0000 0000
0000220 1000 0020 0000 0000 09d0 0000 0000 0000
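When reading the dump above, keep in mind that `od -x` prints 16-bit words in host byte order, which is little-endian on x86_64, and that its offsets are octal (0000020 is byte 16). So the word 457f stands for the bytes 7f 45. The sketch below, assuming a little-endian host as in the thread, rebuilds the first 32 bytes from the first two rows and checks two things: byte 2 is 0x4c ('L' of the ELF magic), and a UTF-8 decode of those bytes first fails at position 24 on 0xb0, which matches the text-mode read above.

```python
import struct

# The first two rows of the `od -x` dump above (file offsets 0 and 16).
OD_ROWS = [
    "457f 464c 0102 0001 0000 0000 0000 0000",
    "0003 003e 0001 0000 0ab0 0000 0000 0000",
]


def od_words_to_bytes(rows):
    """Turn `od -x` words (little-endian 16-bit values) back into raw bytes."""
    words = [int(word, 16) for row in rows for word in row.split()]
    return b"".join(struct.pack("<H", word) for word in words)


def first_utf8_error(data):
    """Return (position, byte) of the first byte UTF-8 rejects, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


header = od_words_to_bytes(OD_ROWS)
print(header[:4])                # b'\x7fELF', the ELF magic number
print(hex(header[2]))            # 0x4c, not 0x89
print(first_utf8_error(header))  # (24, 176): byte 0xb0 at position 24
```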
Importing GLib instead is OK:
>>> from gi.repository import GLib
>>> import pandas
>>>
Gdk also works. The problem is triggered as soon as class Widget(Gtk.Widget): is left in gi/overrides/Gtk.py. There, Gtk = get_introspection_module('Gtk'), where
def get_introspection_module(namespace):
    """
    :Returns:
        An object directly wrapping the gi module without overrides.

    Might raise gi._gi.RepositoryError
    """
    if namespace in _introspection_modules:
        return _introspection_modules[namespace]

    version = gi.get_required_version(namespace)
    module = IntrospectionModule(namespace, version)
    _introspection_modules[namespace] = module
    return module
I can leave only
from ..module import get_introspection_module
Gtk = get_introspection_module('Gtk')
Gtk.Widget # or for example Gtk.Container
in the entire Gtk.py module and the problem still happens. I've instrumented every method of IntrospectionModule and none of them is called just before the error occurs.
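For anyone repeating the instrumentation step, a small class-level tracer is enough: wrap every plain function defined on the class so each call is appended to a log. Dunder methods are included on purpose, since attribute access such as Gtk.Widget on IntrospectionModule goes, as far as I know, through its __getattr__. `trace_methods` is a hypothetical helper; it does not wrap staticmethods, classmethods, or inherited methods.

```python
import functools
import inspect


def trace_methods(cls, log):
    """Wrap every plain function defined directly on `cls` so each call is logged."""
    for name, attr in list(vars(cls).items()):
        if inspect.isfunction(attr):
            setattr(cls, name, _recording(f"{cls.__name__}.{name}", attr, log))
    return cls


def _recording(label, func, log):
    """Return a wrapper that appends `label` to `log` before calling `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.append(label)
        return func(*args, **kwargs)
    return wrapper
```

Applied as `trace_methods(IntrospectionModule, calls)` before the failing import, the last entry in `calls` shows which method ran most recently before the error.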
Clearly the problem is very localized: we've now reached a point where no element can be removed without losing the ability to reproduce the error. Just three lines of the entire Gtk.py module get us into trouble.
OTOH, the UTF-8 aspect could be unrelated to the real problem. Maybe something binary or incorrectly encoded is being written to the terminal, obscuring the real nature of the problem.
Finally, at the point of the error the exception shows this trace:
<frozen importlib._bootstrap>(983)_find_and_load()
<frozen importlib._bootstrap>(967)_find_and_load_unlocked()
<frozen importlib._bootstrap>(677)_load_unlocked()
<frozen importlib._bootstrap_external>(728)exec_module()
<frozen importlib._bootstrap>(219)_call_with_frames_removed()
> /home/carlos/.local/lib/python3.7/site-packages/pandas/io/packers.py(76)<module>()
I've also reported this at https://gitlab.gnome.org/GNOME/pygobject/issues/285; maybe they can help us.
Christoph Reiter from the PyGObject team was able to track the issue down to:
import ctypes
lib = ctypes.CDLL("libgtk-3.so.0", ctypes.RTLD_GLOBAL)
import pandas
Also, the offending line on the pandas side seems to be https://github.com/pandas-dev/pandas/blob/9f2c7164ec581a484ee167c04478bf9e4f81426b/pandas/util/move.c#L245.
We're getting closer.
cc @llllllllll if you have any thoughts on why that would be failing.
Thanks @memeplex for narrowing it down to just those three lines. I am able to reproduce this on the head of master and am looking into the problem now. My initial guess is that something is messing up the static data, corrupting the statically allocated C strings for the module or method names. Python tries to create a PyUnicodeObject out of these C strings to generate the __name__ and __qualname__ fields. The fact that it is consistently UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 2: invalid start byte across systems leads me to believe it is _not_ just reading uninitialized memory somewhere.
Now that I look more carefully, given that this fails with just a dlopen, no Python imports required, I think what may be happening is that there is a conflicting symbol defined in this file and libgtk, so when you dlopen(..., RTLD_GLOBAL) you are linking against the version in libgtk, which must be a static buffer that contains 0x89 at position 2. I don't think we have any extern values in this file, but I will compare the symbols provided in both libraries.
I see, method is conflicting so you get the wrong array. I will open a PR to make this static.
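A quick way to check the symbol-conflict explanation from Python on Linux is to ask whether a name resolves in the process's global symbol scope before and after loading libgtk with RTLD_GLOBAL, as in the reproducer above. `ctypes.CDLL(None)` corresponds to dlopen(NULL), whose lookups see the main program, its dependencies, and every library loaded with RTLD_GLOBAL. Which symbol name to test is left open here: a candidate would come from comparing the exported symbols of both libraries (for example with `nm -D`). Making the symbol `static` in move.c, as proposed, gives it internal linkage, so it is no longer exported and cannot collide.

```python
import ctypes


def resolves_globally(symbol):
    """Return True if `symbol` is visible in the process's global symbol scope (Linux).

    ctypes.CDLL(None) wraps dlopen(NULL); attribute lookup on it calls dlsym and
    raises AttributeError for undefined symbols, which hasattr turns into False.
    """
    return hasattr(ctypes.CDLL(None), symbol)
```

Usage would be to call `resolves_globally(name)` for the suspected name, then run `ctypes.CDLL("libgtk-3.so.0", ctypes.RTLD_GLOBAL)` and call it again; a change from False to True means libgtk now supplies that symbol globally.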
Thanks everyone for the help in tracking this down.
I can confirm that this is working for me after the last system update. I'm pretty bleeding-edge on Arch, so YMMV.