Pandas: Enhancement: XLSB support in read_excel()

Created on 11 Oct 2014 · 15 Comments · Source: pandas-dev/pandas

openpyxl and xlrd do not support XLSB. I'm curious whether anyone has looked at integrating (or, more likely, creating) this functionality in pandas. It looks like it could be a Python package in itself.

Spec from Microsoft:
http://msdn.microsoft.com/en-us/library/cc313133(v=office.12).aspx

Enhancement IO Excel

All 15 comments

Based on https://github.com/python-excel/xlrd/issues/83 it seems that XLRD won't have XLSB support for a while (if ever)

The only liberally licensed tool for XLSB support is https://github.com/SheetJS/js-xlsx which uses JS but ships with a nodejs-powered script that can be run from the command line

I found that link too, but I was hoping I could compile a command line utility that I would call from within Python instead of having to run a node server to execute the conversions.

@kevindavenport you need to install node but don't need to run it as a server. It's like running a PHP script with the PHP CLI

Am I doing something wrong, then? When I run
$NodeJS/bin/node js-xlsx-master/bin/xlsx.njs
I get:

module.js:340
    throw err;
          ^
Error: Cannot find module 'jszip'
...
...

@kevindavenport If you downloaded the source directly, you need to run npm install from the js-xlsx-master directory

If you run npm install -g xlsx, it creates a symlink /usr/local/bin/xlsx which you can use like:

$ xlsx test.xlsx Sheet1 
1,2,3
4,5,6
5,7,9
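
For anyone who wants to call this from Python rather than from the shell, here is a minimal sketch. It assumes the xlsx command installed by npm install -g xlsx is on PATH and prints the requested sheet as CSV, as in the example above. The function names are made up for illustration.

```python
import csv
import io
import subprocess


def parse_csv_text(text):
    """Parse CSV text (as printed by the CLI) into a list of rows of strings."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def read_sheet_via_cli(path, sheet, cli="xlsx"):
    """Run `<cli> <path> <sheet>` and return the parsed rows.

    Assumption: `cli` is the command created by `npm install -g xlsx`
    and it writes the sheet to stdout as CSV.
    """
    result = subprocess.run(
        [cli, path, sheet],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return parse_csv_text(result.stdout)
```

Calling read_sheet_via_cli("test.xlsx", "Sheet1") would then return the rows as lists of strings; converting them to numbers is left to the caller.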

Would this library help in the implementation of this feature?

https://pypi.org/project/pyxlsb/
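
For reference, a minimal sketch of reading one sheet with pyxlsb, based on the usage shown in its README (open_workbook, get_sheet, rows, and the .v attribute of each cell). The helper names and the default sheet index are assumptions for illustration, not part of pyxlsb or pandas.

```python
def rows_to_values(rows):
    """Turn rows of cell objects (each exposing a `.v` value) into lists of plain values."""
    return [[cell.v for cell in row] for row in rows]


def read_xlsb_sheet(path, sheet=1):
    """Return the cell values of one sheet of an .xlsb file as a list of lists.

    `sheet` is passed straight to pyxlsb's get_sheet, which uses 1-based indexes.
    """
    # Imported lazily so the helper above works without pyxlsb installed.
    from pyxlsb import open_workbook

    with open_workbook(path) as wb:
        with wb.get_sheet(sheet) as ws:
            return rows_to_values(ws.rows())
```

If the first row holds the column names, the result can be turned into a DataFrame with something like pd.DataFrame(values[1:], columns=values[0]).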

See the following solution on Stack Overflow:
https://stackoverflow.com/questions/45019778/read-xlsb-file-in-pandas-python?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

Being able to do this directly from pandas would be great.

PRs would be welcome

I would love to take a crack at it, but whose endorsement do we need before spending the time trying to integrate pyxlsb natively into pandas' reader?

@kevindavenport : Hey there, sorry that this conversation suddenly went dark. We are more than open to an implementation / PR at this point. If you have the time and are able, just go for it!

@gfyoung : So would integration of pyxlsb, as @velxundussa suggested, be an acceptable solution?

@talamb : If you can implement and submit as a PR, we will definitely take a look.

@talamb if you're interested in trying a PR, you might want to take a look at #25427 and #25092, which added reading support for other formats. In a nutshell, for this you would want to copy the existing test files to .xlsb format and add the appropriate parametrization in the test_readers.py module. Then subclass _BaseExcelReader and things should fall into place.
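
To illustrate the subclassing step only, here is a rough sketch. BaseReaderSketch is a hypothetical stand-in written for this example; it is not pandas' private _BaseExcelReader, whose actual interface should be checked in the pandas source. The pyxlsb calls (open_workbook, sheets, get_sheet, rows) follow that package's README.

```python
from abc import ABC, abstractmethod


class BaseReaderSketch(ABC):
    """Hypothetical stand-in for a reader base class (NOT pandas' real one)."""

    def __init__(self, path):
        # The subclass decides how the workbook is opened.
        self.book = self.load_workbook(path)

    @abstractmethod
    def load_workbook(self, path):
        """Open the file and return a workbook object."""

    @abstractmethod
    def sheet_names(self):
        """Return the sheet names in the workbook."""

    @abstractmethod
    def get_sheet_data(self, name):
        """Return the sheet's cell values as a list of lists."""


class XlsbReaderSketch(BaseReaderSketch):
    """Fills in the engine-specific pieces using pyxlsb."""

    def load_workbook(self, path):
        from pyxlsb import open_workbook  # lazy import: optional dependency

        return open_workbook(path)

    def sheet_names(self):
        return list(self.book.sheets)

    def get_sheet_data(self, name):
        with self.book.get_sheet(name) as sheet:
            return [[cell.v for cell in row] for row in sheet.rows()]
```

The idea is that the base class owns the shared parsing logic and each engine only supplies how to open a workbook, list its sheets, and extract raw cell values.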

@WillAyd Thanks! Should be seeing a PR from me in the near future.

Is this issue fixed?

@praful-potphode We have a PR open (#29836) that is trying to address this issue. If you have any thoughts on pushing that PR forward, that would be great!
