Openrefine: open refine can not export excel tables larger than 65536

Created on 21 Feb 2015  Â·  10Comments  Â·  Source: OpenRefine/OpenRefine

I am trying to export a 100k table to a excel file. all the other export options seem to work except excel.

HTTP ERROR 500

Problem accessing /command/core/export-rows/concat-xlsx.xls. Reason:

Invalid row number (65536) outside allowable range (0..65535)

Caused by:

java.lang.IllegalArgumentException: Invalid row number (65536) outside allowable range (0..65535)
at org.apache.poi.hssf.usermodel.HSSFRow.setRowNum(HSSFRow.java:232)
at org.apache.poi.hssf.usermodel.HSSFRow.(HSSFRow.java:86)
at org.apache.poi.hssf.usermodel.HSSFRow.(HSSFRow.java:70)
at org.apache.poi.hssf.usermodel.HSSFSheet.createRow(HSSFSheet.java:205)
at org.apache.poi.hssf.usermodel.HSSFSheet.createRow(HSSFSheet.java:71)
at com.google.refine.exporters.XlsExporter$1.addRow(XlsExporter.java:96)
at com.google.refine.exporters.CustomizableTabularExporterUtilities$1.visit(CustomizableTabularExporterUtilities.java:157)
at com.google.refine.browsing.util.ConjunctiveFilteredRows.visitRow(ConjunctiveFilteredRows.java:76)
at com.google.refine.browsing.util.ConjunctiveFilteredRows.accept(ConjunctiveFilteredRows.java:65)
at com.google.refine.exporters.CustomizableTabularExporterUtilities.exportRows(CustomizableTabularExporterUtilities.java:171)
at com.google.refine.exporters.XlsExporter.export(XlsExporter.java:138)
at com.google.refine.commands.project.ExportRowsCommand.doPost(ExportRowsCommand.java:107)
at com.google.refine.RefineServlet.service(RefineServlet.java:177)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

bug export

All 10 comments

I believe that this can be fixed by updating the version of poi library. I already did this upgrade on my own fork:
https://github.com/jackyq2015/OpenRefine/

@daviddou82, If you can use the same file to test against the build from above, that will be helpful.

@tfmorris Is there a quick workaround without upgrading the poi?

You can use XSSF SXSSF instead ..

The POI library has been updated for a while now so it would be worth trying again to see if this is still an issue.

FIXED.

Dataset tested (converted to XLSX then loaded into OpenRefine): https://www.kaggle.com/akhilv11/border-crossing-entry-data/version/1

Result: PASS
Annotation 2019-12-20 131344

@thadguidry note that this bug was about exporting, not importing XLS files… Have you checked in the other direction?

XLS has never supported more than that. Only XLSX does.

In a Xls workbook, the row limit is 65,536 (216) and 256 columns (28) which corresponds to the column IV. Now with xlsx workbooks (and xlsm), limits are 1,048,576 rows (220) and 16,384 columns (214) or the column XFD.

Exporting 346733 rows as an XLSX file does cause an error however, if you 2nd run it...

  1. expose the XLS error "larger than 65536" by attempting to save as XLS format.
  2. see the error in the console and a 2nd browser tab that appears.
  3. then go back to other browser tab with OpenRefine and try to save as XLSX, you will get the following error ...
13:44:19.229 [                   refine] POST /command/core/export-rows/Border_Crossing_Entry_Data.xlsx (14024ms)
ERROR:  'org.mortbay.jetty.EofException'
13:44:38.360 [        ExportRowsCommand] error:Fail to save: an error occurs while saving the package : The part /docProps/app.xml failed to be saved in the stream with marshaller org.apache.poi.openxml4j.opc.internal.marshallers.DefaultMarshaller@4251036a (19131ms)
13:44:38.360 [          org.mortbay.log] /command/core/export-rows/Border_Crossing_Entry_Data.xlsx (0ms)
org.apache.poi.openxml4j.exceptions.OpenXML4JRuntimeException: Fail to save: an error occurs while saving the package : The part /docProps/app.xml failed to be saved in the stream with marshaller org.apache.poi.openxml4j.opc.internal.marshallers.DefaultMarshaller@4251036a
        at org.apache.poi.openxml4j.opc.ZipPackage.saveImpl(ZipPackage.java:558)
        at org.apache.poi.openxml4j.opc.OPCPackage.save(OPCPackage.java:1488)
        at org.apache.poi.ooxml.POIXMLDocument.write(POIXMLDocument.java:242)
        at com.google.refine.exporters.XlsExporter.export(XlsExporter.java:143)
        at com.google.refine.commands.project.ExportRowsCommand.doPost(ExportRowsCommand.java:119)
        at com.google.refine.RefineServlet.service(RefineServlet.java:189)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
        at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
        at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:830)
Caused by: org.apache.poi.openxml4j.exceptions.OpenXML4JException: The part /docProps/app.xml failed to be saved in the stream with marshaller org.apache.poi.openxml4j.opc.internal.marshallers.DefaultMarshaller@4251036a
        at org.apache.poi.openxml4j.opc.ZipPackage.saveImpl(ZipPackage.java:547)
        ... 27 more

However, I am able to save the XLSX file format without error, if doing a 1st run export.

Thanks a lot for the testing! So if I understand correctly this is still an issue with POI. Let's see if updating it works!

Tested (dropped in POI 4.1.1 jars), that fixes the 2nd run error with updated POI 4.1.1 (no longer get the OpenXML4J exception there)

The 1st run error when saving more than 65536 rows into an XLS file will still occur (as expected)

Shall we close this as fixed now?

14:29:17.794 [                   refine] POST /command/core/export-rows/Border_Crossing_Entry_Data.xls (26326ms)
14:29:18.375 [        ExportRowsCommand] error:Invalid row number (65536) outside allowable range (0..65535) (581ms)
14:29:18.376 [          org.mortbay.log] /command/core/export-rows/Border_Crossing_Entry_Data.xls (1ms)
java.lang.IllegalArgumentException: Invalid row number (65536) outside allowable range (0..65535)
        at org.apache.poi.hssf.usermodel.HSSFRow.setRowNum(HSSFRow.java:233)
        at org.apache.poi.hssf.usermodel.HSSFRow.<init>(HSSFRow.java:87)
        at org.apache.poi.hssf.usermodel.HSSFRow.<init>(HSSFRow.java:71)
        at org.apache.poi.hssf.usermodel.HSSFSheet.createRow(HSSFSheet.java:263)
        at org.apache.poi.hssf.usermodel.HSSFSheet.createRow(HSSFSheet.java:86)
        at com.google.refine.exporters.XlsExporter$1.addRow(XlsExporter.java:100)
        at com.google.refine.exporters.CustomizableTabularExporterUtilities$1.visit(CustomizableTabularExporterUtilities.java:168)
        at com.google.refine.browsing.util.ConjunctiveFilteredRows.visitRow(ConjunctiveFilteredRows.java:76)
        at com.google.refine.browsing.util.ConjunctiveFilteredRows.accept(ConjunctiveFilteredRows.java:65)
        at com.google.refine.exporters.CustomizableTabularExporterUtilities.exportRows(CustomizableTabularExporterUtilities.java:182)
        at com.google.refine.exporters.XlsExporter.export(XlsExporter.java:140)
        at com.google.refine.commands.project.ExportRowsCommand.doPost(ExportRowsCommand.java:119)
        at com.google.refine.RefineServlet.service(RefineServlet.java:189)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
        at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
        at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:830)
Was this page helpful?
0 / 5 - 0 ratings

Related issues

wetneb picture wetneb  Â·  3Comments

thadguidry picture thadguidry  Â·  3Comments

kushthedude picture kushthedude  Â·  3Comments

davidegiunchidiennea picture davidegiunchidiennea  Â·  3Comments

dantexier picture dantexier  Â·  4Comments