VisiData: xlsx file loader: xml.etree.ElementTree.ParseError: not well-formed

Created on 1 Dec 2020 · 3 Comments · Source: saulpw/visidata

Small description
When opening an xlsx file, I get the following error:

xml.etree.ElementTree.ParseError: not well-formed (invalid token)

Expected result
No error.

Actual result with screenshot
Sorry, no screenshot; technical difficulties.
If you get an unexpected error, please include the full stack trace that you get with Ctrl-E.

```text
Traceback (most recent call last):
  File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/visidata/threads.py", line 215, in _toplevelTryFu
    t.status = func(*args, **kwargs)
  File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/visidata/sheets.py", line 888, in reload
    for r in vd.Progress(itsource, gerund='loading', total=0):
  File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/visidata/threads.py", line 70, in __iter__
    for item in self.iterable:
  File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/visidata/loaders/xlsx.py", line 32, in iterload
    for row in Progress(worksheet.iter_rows(), total=worksheet.max_row or 0):
  File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/visidata/threads.py", line 70, in __iter__
    for item in self.iterable:
  File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/openpyxl/worksheet/_read_only.py", line 82, in _c
    for idx, row in parser.parse():
  File "/usr/local/Cellar/visidata/2.0.1/libexec/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py", line 139, in pars
    for _, element in it:
  File "/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/xml/etree/ElementTree.py",
    yield from pullparser.read_events()
  File "/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/xml/etree/ElementTree.py",
    raise event
  File "/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/xml/etree/ElementTree.py",
    self._parser.feed(data)
  File "<string>", line None
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 202, column 31102623
```

Steps to reproduce with sample data and a .vd
Please attach the commandlog (saved with Ctrl-D) to show the steps that led to the issue.
See here for more details.

Sorry, but due to the sensitive nature of this data I cannot provide a sample.
If I can determine exactly what in the data is causing this issue, or can build a minimal dataset that reproduces the error, I will update the post.
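One way to narrow this down without sharing the data: an .xlsx file is a zip archive of XML parts, so each part can be parsed directly with Python's standard library to see which one fails and what sits near the reported line and column. This is a minimal sketch, not part of VisiData or openpyxl; `find_xml_errors` is a hypothetical helper, and it assumes the parts are UTF-8 encoded (the usual case for .xlsx).

```python
import sys
import zipfile
import xml.etree.ElementTree as ET


def find_xml_errors(xlsx, context=40):
    """Parse every XML part under xl/ in an .xlsx archive (path or file object).

    Returns a list of (part_name, line, column, snippet) tuples, one for each
    part that the standard-library parser (expat) rejects. The snippet is a
    window of text around the reported position (approximate).
    """
    problems = []
    with zipfile.ZipFile(xlsx) as zf:
        for name in zf.namelist():
            if not (name.startswith("xl/") and name.endswith(".xml")):
                continue
            data = zf.read(name)
            try:
                ET.fromstring(data)
            except ET.ParseError as err:
                line_no, col = err.position  # line is 1-based, column is 0-based
                lines = data.splitlines()
                raw = lines[line_no - 1] if 0 < line_no <= len(lines) else b""
                # Worksheet XML is often one huge line, so keep only a small window.
                text = raw.decode("utf-8", errors="replace")
                snippet = text[max(col - context, 0):col + context]
                problems.append((name, line_no, col, snippet))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, line_no, col, snippet in find_xml_errors(sys.argv[1]):
        # repr() makes invisible control characters visible, e.g. '\x01'.
        print(f"{name}: line {line_no}, column {col}: {snippet!r}")
```

Only the short window around each error is printed, so the offending character can be inspected without dumping the rest of the sheet.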

Additional context
Please include the version of VisiData.

VisiData v2.0.1

Notably, this file is parsed successfully by the R library readxl (and I can then open the file in VisiData with the rvisidata R library).
The data appears well formed when opened in R, so this seems to be a limitation of the openpyxl library (or possibly of Python's xml library).
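For context, the error text comes from expat, the parser behind Python's xml.etree, and XML 1.0 treats well-formedness violations as fatal errors, so a strict parser is expected to stop where a more lenient reader may recover. The sketch below (the helper `parse_error` is only for illustration) shows two common kinds of content that produce exactly "not well-formed (invalid token)": a raw control character and a byte that is not valid UTF-8. Whether either is what this particular file contains is an assumption; the traceback alone does not say.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_bytes):
    """Return the ParseError message for xml_bytes, or None if it parses."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return str(err)
    return None


# U+0001 is not an allowed character anywhere in an XML 1.0 document.
control_char = parse_error(b"<c><v>abc\x01</v></c>")
# A Latin-1 "e acute" byte (0xE9) is not valid UTF-8, the default XML encoding.
bad_utf8 = parse_error(b"<c><v>abc\xe9</v></c>")
# The same cell without the bad byte is well formed.
clean = parse_error(b"<c><v>abc</v></c>")

for label, message in [("control char", control_char), ("bad UTF-8", bad_utf8), ("clean", clean)]:
    print(f"{label}: {message}")
```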

bug

All 3 comments

Hi @suntzuisafterU!

Could you share the xlsx file where you saw this problem? We cannot reproduce it with our sample xlsx file.

One thing left for me to try is checking whether this is a Python 3.9 xml problem. @ajkerrigan, do you see this with a Homebrew-installed VisiData?
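Checking whether the Python 3.9 XML stack is involved starts with knowing which versions are actually in play. A small standard-library sketch; `xml_stack_versions` is a hypothetical helper, and the distribution names "openpyxl" and "visidata" are assumed to match how those packages were installed.

```python
import platform
import sys
from importlib import metadata
from xml.parsers import expat


def xml_stack_versions():
    """Collect the versions that matter for an xlsx/XML parsing bug report."""
    versions = {
        "python": platform.python_version(),
        "executable": sys.executable,
        # Version of the expat C library used by xml.etree, a string like "expat_X.Y.Z".
        "expat": expat.EXPAT_VERSION,
    }
    for dist in ("openpyxl", "visidata"):
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None  # not installed in this interpreter
    return versions


for key, value in xml_stack_versions().items():
    print(f"{key}: {value}")
```

Running it with the same interpreter VisiData uses (the one in the traceback paths) matters, since a different Python install may use a different expat build.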

Sorry, I cannot share the file.
I should have tried this earlier, but opening the file in Excel failed.
Actually, Excel "repaired" it by removing the offending table, producing a blank file.

I think this problem is outside VisiData's control. Apparently readxl is the gold standard for repairing Excel files :).

Hi @suntzuisafterU, if you can find a way to read the file using some kind of Python library, I'd be very interested. Ideally we would submit a fix upstream to openpyxl if we figured out what was going wrong.
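If the culprit does turn out to be stray control characters (an assumption; the file may be damaged in other ways), one standard-library workaround is to copy the archive while deleting the characters XML 1.0 forbids, then open the cleaned copy. This is a lossy repair sketch, not an openpyxl or VisiData feature, and `strip_illegal_xml_chars` is a hypothetical name. It does not handle invalid UTF-8, illegal character references such as `&#1;`, or truncated XML.

```python
import re
import zipfile

# C0 control characters other than tab, LF and CR are forbidden in XML 1.0.
# Assumes UTF-8 parts: UTF-8 never uses bytes below 0x80 inside multi-byte
# sequences, so these can be deleted at the byte level without corrupting text.
_ILLEGAL_XML_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_illegal_xml_chars(src, dst):
    """Copy the .xlsx zip at src to dst, deleting forbidden control characters
    from every XML part. Returns {part_name: bytes_removed} for changed parts."""
    removed = {}
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.endswith((".xml", ".rels")):
                data, count = _ILLEGAL_XML_BYTES.subn(b"", data)
                if count:
                    removed[item.filename] = count
            # Non-XML parts (images, binaries) are copied through unchanged.
            zout.writestr(item, data)
    return removed
```

For example, `strip_illegal_xml_chars("data.xlsx", "data-clean.xlsx")` (placeholder file names) reports how many bytes were dropped from each part, which also indicates where the bad data was.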
