openpyxl and xlrd do not support XLSB. I'm curious whether anyone has looked at integrating (or, more likely, creating) this functionality in pandas. It looks like it could be a Python package in itself.
Spec from Microsoft:
http://msdn.microsoft.com/en-us/library/cc313133(v=office.12).aspx
Based on https://github.com/python-excel/xlrd/issues/83, it seems that xlrd won't have XLSB support for a while (if ever).
The only liberally licensed tool with XLSB support is https://github.com/SheetJS/js-xlsx, which is written in JavaScript but ships with a Node.js-powered script that can be run from the command line.
I found that link too, but I was hoping I could compile a command line utility that I would call from within Python instead of having to run a node server to execute the conversions.
@kevindavenport you need to install node but don't need to run it as a server. It's like running a PHP script with the PHP CLI
Am I doing something wrong, then, by running:
$NodeJS/bin/node js-xlsx-master/bin/xlsx.njs
I get:
module.js:340
throw err;
^
Error: Cannot find module 'jszip'
...
...
@kevindavenport If you downloaded the source directly, you need to run npm install from the js-xlsx-master directory.
If you run npm install -g xlsx, it creates a symlink /usr/local/bin/xlsx which you can use like:
$ xlsx test.xlsx Sheet1
1,2,3
4,5,6
5,7,9
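For calling this from within Python rather than by hand, here is a minimal sketch using subprocess. It assumes the xlsx command is on the PATH and prints the requested sheet as CSV on stdout, as in the example above; the function names are made up for illustration.

```python
import csv
import io
import subprocess


def parse_csv_output(text):
    """Parse CSV text (as printed by the CLI) into a list of rows of strings."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def read_sheet_via_cli(path, sheet, executable="xlsx"):
    """Run `<executable> <path> <sheet>` and return the parsed rows.

    Assumption: the command writes the sheet as CSV to stdout.
    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        [executable, path, sheet],
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=True,
    )
    return parse_csv_output(result.stdout)


if __name__ == "__main__":
    # Parse the sample output shown above without invoking node at all.
    print(parse_csv_output("1,2,3\n4,5,6\n5,7,9\n"))
```

Note that every value comes back as a string; converting to numbers or dates would be up to the caller.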
Would this library help in the implementation of this feature?
https://pypi.org/project/pyxlsb/
See the following solution on Stack Overflow:
https://stackoverflow.com/questions/45019778/read-xlsb-file-in-pandas-python?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
Being able to do this directly from pandas would be great.
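In the meantime, a rough sketch of reading a sheet with pyxlsb by hand might look like the following. It relies on pyxlsb's open_workbook, get_sheet and rows calls, with each cell's value in its .v attribute; the helper names, the DemoCell stand-in and the choice to treat the first row as a header are my own assumptions.

```python
from collections import namedtuple

# Stand-in for a pyxlsb cell, used only to try the helper without a real file.
DemoCell = namedtuple("DemoCell", ["r", "c", "v"])


def rows_to_records(rows):
    """Turn an iterable of rows of cells into a list of dicts.

    Each cell must expose its value as `.v`; the first row is
    treated as the header (an assumption about the sheet layout).
    """
    values = [[cell.v for cell in row] for row in rows]
    if not values:
        return []
    header, *body = values
    return [dict(zip(header, row)) for row in body]


def read_xlsb_sheet(path, sheet=1):
    """Read one sheet of an .xlsb file into a list of dicts using pyxlsb."""
    # Imported lazily so rows_to_records works without pyxlsb installed.
    from pyxlsb import open_workbook

    with open_workbook(path) as wb:
        # pyxlsb sheets are addressed by 1-based index or by name.
        with wb.get_sheet(sheet) as ws:
            return rows_to_records(ws.rows())


if __name__ == "__main__":
    demo = [
        [DemoCell(0, 0, "a"), DemoCell(0, 1, "b")],
        [DemoCell(1, 0, 1.0), DemoCell(1, 1, 2.0)],
    ]
    print(rows_to_records(demo))
```

The resulting list of dicts can be passed straight to pandas.DataFrame, which is roughly what a native reader would do for you.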
PRs would be welcome
I would love to take a crack at it, but whose endorsement do we need before spending the time trying to integrate pyxlsb natively into pandas' read functions?
@kevindavenport : Hey there, sorry that this conversation suddenly went dark. We are more than open to an implementation / PR at this point. If you have the time and are able, just go for it!
@gfyoung : So would integration of pyxlsb, as @velxundussa suggested, be an acceptable solution?
@talamb : If you can implement and submit as a PR, we will definitely take a look.
@talamb If you're interested in trying a PR, you might want to take a look at #25427 and #25092, which added reading support for other formats. In a nutshell, you would want to copy the existing test files to .xlsb format and add the appropriate parametrization in the test_readers.py module. Then subclass _BaseExcelReader and things should fall into place.
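To give a feel for the subclassing step, here is a very rough, self-contained sketch. BaseReaderSketch is a hypothetical stand-in, not the actual _BaseExcelReader from pandas, and its method names are assumptions about the general shape of such a reader; check the real base class in the pandas source before writing the PR. The pyxlsb calls (open_workbook, sheets, get_sheet, rows) are the library's own.

```python
import abc


class BaseReaderSketch(abc.ABC):
    """Hypothetical stand-in for a base Excel reader (not the pandas class)."""

    def __init__(self, filepath_or_buffer):
        self.book = self.load_workbook(filepath_or_buffer)

    @abc.abstractmethod
    def load_workbook(self, filepath_or_buffer):
        """Open the file and return an engine-specific workbook object."""

    @property
    @abc.abstractmethod
    def sheet_names(self):
        """Return the list of sheet names in the workbook."""

    @abc.abstractmethod
    def get_sheet_by_name(self, name):
        """Return the engine-specific sheet object with the given name."""

    @abc.abstractmethod
    def get_sheet_data(self, sheet):
        """Return the sheet contents as a list of rows of plain values."""


class PyxlsbReaderSketch(BaseReaderSketch):
    """Sketch of an XLSB reader that delegates to pyxlsb."""

    def load_workbook(self, filepath_or_buffer):
        # Imported lazily so the class can be defined without pyxlsb installed.
        from pyxlsb import open_workbook

        return open_workbook(filepath_or_buffer)

    @property
    def sheet_names(self):
        return self.book.sheets

    def get_sheet_by_name(self, name):
        return self.book.get_sheet(name)

    def get_sheet_data(self, sheet):
        # pyxlsb yields rows of cells; keep only each cell's value.
        return [[cell.v for cell in row] for row in sheet.rows()]
```

A real implementation would also need to handle things this sketch ignores, such as lookup by sheet index, empty cells and date conversion.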
@WillAyd Thanks! Should be seeing a PR from me in the near future.
Is this issue fixed?
@praful-potphode We have a PR open (#29836) that is trying to address this issue. If you have any thoughts on pushing that PR forward, that would be great!