Small description
When opening an xlsx file, I get the following error:
xml.etree.ElementTree.ParseError: not well-formed
Expected result
No error.
Actual result with screenshot
Sorry, no screenshot. Technical difficulties.
If you get an unexpected error, please include the full stack trace that you get with Ctrl-E.
Traceback (most recent call last):
File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/visidata/threads.py", line 215, in _toplevelTryFu
t.status = func(*args, **kwargs)
File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/visidata/sheets.py", line 888, in reload
for r in vd.Progress(itsource, gerund='loading', total=0):
File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/visidata/threads.py", line 70, in __iter__
for item in self.iterable:
File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/visidata/loaders/xlsx.py", line 32, in iterload
for row in Progress(worksheet.iter_rows(), total=worksheet.max_row or 0):
File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/visidata/threads.py", line 70, in __iter__
for item in self.iterable:
File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/openpyxl/worksheet/_read_only.py", line 82, in _c
for idx, row in parser.parse():
File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py", line 139, in pars
for _, element in it:
File "/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/xml/etree/ElementTree.py",
yield from pullparser.read_events()
File "/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/xml/etree/ElementTree.py",
raise event
File "/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/xml/etree/ElementTree.py",
self._parser.feed(data)
File "<string>", line None
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 202, column 31102623
Steps to reproduce with sample data and a .vd
Please attach the commandlog (saved with Ctrl-D) to show the steps that led to the issue.
See here for more details.
Sorry, but due to the sensitive nature of this data I cannot provide a sample.
If I can determine exactly what in the data is causing the issue, or build a minimal dataset that reproduces the error, I will update this post.
Additional context
Please include the version of VisiData.
VisiData v2.0.1
Notably, this file is parsed successfully by the R library readxl (and I can subsequently open it in VisiData with the rvisidata R library).
The data seems to be well formed when opened in R, so this looks like a limitation of the openpyxl library (or possibly of the xml library).
Hi @suntzuisafterU!
Is it possible to share the xlsx file that triggers this problem? We cannot reproduce it with our sample xlsx file.
One thing left for me to try is to check if this is a Python 3.9 xml problem. @ajkerrigan, do you see this with a homebrew'd VisiData?
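For comparing environments, a minimal sketch that reports the interpreter version and the expat build that `xml.etree` parses with. The helper name `xml_stack_versions` is hypothetical; `platform.python_version()` and `pyexpat.EXPAT_VERSION` come from the standard library.

```python
import platform
import pyexpat


def xml_stack_versions():
    """Collect the versions most relevant to an ElementTree ParseError report."""
    return {
        # Interpreter version, e.g. the Homebrew Python that VisiData runs under
        "python": platform.python_version(),
        # Version string of the expat library that pyexpat was built against
        "expat": pyexpat.EXPAT_VERSION,
    }


if __name__ == "__main__":
    for name, version in xml_stack_versions().items():
        print(f"{name}: {version}")
```

Running this under both the Homebrew install and another install would show whether the two setups actually differ in their XML stack.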
Sorry, I cannot share the file.
I should have tried this earlier, but attempting to open the file in Excel failed.
Actually, Excel repaired it by removing the offending table and producing a blank file.
I think this problem is out of VisiData's control. Apparently readxl is the gold standard in repairing Excel files :).
Hi @suntzuisafterU, if you can find a way to read the file using some kind of Python library, I'd be very interested. Ideally we would submit a fix upstream to openpyxl if we figured out what was going wrong.
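One way to narrow this down without sharing the file is to find which XML part inside the workbook fails to parse. The sketch below relies only on the fact that an xlsx file is a zip archive of XML parts. The function name `find_malformed_xml` is hypothetical, and it uses only the standard library rather than openpyxl's own reader, so it is a diagnostic aid and not a fix.

```python
import zipfile
import xml.etree.ElementTree as ET


def find_malformed_xml(xlsx_path_or_file):
    """Return (member_name, (line, column), message) for each XML part that fails to parse."""
    problems = []
    with zipfile.ZipFile(xlsx_path_or_file) as archive:
        for name in archive.namelist():
            # Only the XML parts of the package are of interest here
            if not name.endswith((".xml", ".rels")):
                continue
            with archive.open(name) as part:
                try:
                    # iterparse streams the part, so a very large sheet is not held in memory
                    for _event, element in ET.iterparse(part):
                        element.clear()
                except ET.ParseError as err:
                    # err.position is the (line, column) pair reported by the parser
                    problems.append((name, err.position, str(err)))
    return problems


if __name__ == "__main__":
    import sys

    for member, position, message in find_malformed_xml(sys.argv[1]):
        print(f"{member}: {message} at {position}")
```

The reported member name and position (for example, a single worksheet part) could be posted here without exposing the data, and might be enough to build a minimal reproduction for openpyxl.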