Sheetjs: Invalid HTML: could not find <table>

Created on 22 May 2018 · 9 Comments · Source: SheetJS/sheetjs

Usually it succeeds, but sometimes I get this error and it crashes. Any idea?

if(!mtch) throw new Error("Invalid HTML: could not find <table>");
^

Error: Invalid HTML: could not find <table>
at html_to_sheet (/.../node_modules/xlsx/xlsx.js:17000:19)
at Object.html_to_book [as to_workbook] (/.../node_modules/xlsx/xlsx.js:17056:28)
at parse_xlml_xml (/.../node_modules/xlsx/xlsx.js:13714:26)
at parse_xlml (/.../node_modules/xlsx/xlsx.js:14362:53)
at readSync (/../node_modules/xlsx/xlsx.js:18394:21)
at Object.readFileSync (/.../node_modules/xlsx/xlsx.js:18412:9)
at Request._callback (/.../server/routes/api.js:137:24)
at Request.self.callback (/.../node_modules/request/request.js:186:22)
at emitTwo (events.js:106:13)
at Request.emit (events.js:191:7)
at Request.<anonymous> (/.../node_modules/request/request.js:1163:10)
at emitOne (events.js:101:20)
at Request.emit (events.js:188:7)
at IncomingMessage.<anonymous> (/.../node_modules/request/request.js:1085:12)
at IncomingMessage.g (events.js:291:16)
at emitNone (events.js:91:20)
at IncomingMessage.emit (events.js:185:7)
at endReadableNT (_stream_readable.js:974:12)
at _combinedTickCallback (internal/process/next_tick.js:74:11)
at process._tickCallback (internal/process/next_tick.js:98:9)
[nodemon] app crashed - waiting for file changes before starting...

All 9 comments

That error is triggered when the file is suspected to be HTML but the actual TABLE tag cannot be found. Can you capture and share an example file?

Hey,
It is an actual XLSM file. Sometimes it converts fine, but sometimes it crashes. Sorry, the file is confidential so I cannot share it, but it is a basic .xlsm file.

If it's an actual XLSM file, it would start with 0x50 0x4B ("PK") and would never trigger the error. Given that you are using request, it's likely you are seeing bad data (e.g. doing an HTTP request and receiving an HTML payload with the 404 error). Try wrapping XLSX.read in a callback and saving the data to a file. Likely it's an HTML error page.
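
The suggestion above could be sketched roughly as follows. This is only an illustration under assumptions: parseOrDump and the dump path are hypothetical names that do not appear in the thread, and the parser is passed in as a function so the helper does not depend on how the library is loaded.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of SheetJS): run a parser on the payload and,
// if it throws, save the raw bytes to disk so they can be inspected later.
function parseOrDump(buf, parse, dumpPath) {
  try {
    return parse(buf);
  } catch (err) {
    // Keep the offending payload; it may turn out to be an HTML error page.
    fs.writeFileSync(dumpPath, buf);
    err.message += " (raw payload saved to " + dumpPath + ")";
    throw err;
  }
}

// Possible usage inside a request callback, assuming the request was made
// with `encoding: null` so that `body` is a Buffer:
//
//   const XLSX = require("xlsx");
//   const wb = parseOrDump(body, function (b) {
//     return XLSX.read(b, { type: "buffer" });
//   }, "bad-payload.bin");
```

Opening the dumped file in a text editor should show quickly whether the server returned a workbook or an error page. The error is still rethrown, so the caller has to catch it to keep the process from crashing.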

The same thing happens to me. Everything looks fine in the console log, but it throws the same error.

I found the error! In ReactJS, the xlsx file must be in the 'public' folder.

I had the same issue when using Node to read the Excel file in a Docker container. It shows Error: Invalid HTML: could not find <table>.

I debugged the xlsx.js file and found that firstbyte(d, o)[0] returns 0x3C, which is treated as an XLML file. It is supposed to return 0x50. The relevant piece of the module source is below:

switch((n = firstbyte(d, o))[0]) {
case 0xD0: return read_cfb(CFB.read(d, o), o);
case 0x09: return parse_xlscfb(d, o);
case 0x3C: return parse_xlml(d, o);
case 0x49: if(n[1] === 0x44) return read_wb_ID(d, o); break;
case 0x54: if(n[1] === 0x41 && n[2] === 0x42 && n[3] === 0x4C) return DIF.to_workbook(d, o); break;
case 0x50: return (n[1] === 0x4B && n[2] < 0x09 && n[3] < 0x09) ? read_zip(d, o) : read_prn(data, d, o, str);
case 0xEF: return n[3] === 0x3C ? parse_xlml(d, o) : read_prn(data, d, o, str);
case 0xFF: if(n[1] === 0xFE) { return read_utf16(d, o); } break;
case 0x00: if(n[1] === 0x00 && n[2] >= 0x02 && n[3] === 0x00) return WK_.to_workbook(d, o); break;
case 0x03: case 0x83: case 0x8B: case 0x8C: return DBF.to_workbook(d, o);
case 0x7B: if(n[1] === 0x5C && n[2] === 0x72 && n[3] === 0x74) return RTF.to_workbook(d, o); break;
case 0x0A: case 0x0D: case 0x20: return read_plaintext_raw(d, o);
}

"I found the error! In ReactJS, the xlsx file must be in the 'public' folder."

What do you mean by the 'public' folder? Should we name the folder "public", or make the folder's permissions public?

@wodeleeway @MedinaGitHub file type is deduced by looking at the magic (first few bytes), that's what you're seeing in that code block. If the file starts with "<" the immediate guess is that it's an HTML or XML-based format.
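
As a rough illustration of that deduction, the sketch below classifies a payload by its first bytes before it is handed to the parser. It is not the library's code: sniffFormat and its return labels are hypothetical, and it covers only the 0x50 0x4B, 0xD0 and 0x3C cases from the switch quoted above, plus the UTF-8 byte order mark case.

```javascript
// Hypothetical helper: guess the container type from the leading bytes.
// Accepts a Node Buffer, a Uint8Array, or a plain array of byte values.
function sniffFormat(bytes) {
  // ZIP-based files (XLSX, XLSM) start with "PK" (0x50 0x4B).
  if (bytes.length >= 2 && bytes[0] === 0x50 && bytes[1] === 0x4B) return "zip";
  // Matches the 0xD0 case in the switch, which is routed to the CFB reader.
  if (bytes[0] === 0xD0) return "cfb";
  // Skip a UTF-8 byte order mark (EF BB BF) before looking for "<".
  let i = 0;
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) i = 3;
  // "<" (0x3C) suggests HTML or an XML-based format, including HTML error pages.
  if (bytes[i] === 0x3C) return "markup";
  return "unknown";
}
```

If a file that is expected to be XLSX or XLSM comes back as "markup", the bytes are most likely an error page rather than a workbook.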

If you are using XHR or fetch and requesting a file that isn't available, you'll get back the 404 response. Usually the body is a small HTML page, which is why you see the error in question.

Your code should defend against it by checking the status of the request. With fetch it is available as res.status in the first callback:

fetch(myRequest).then(function(res) {
  if(res.status == 404) { /* file not found */ }
  /* otherwise read the response body and pass it to XLSX.read */
});
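
A slightly fuller version of the same check might look like the sketch below. It is an assumption-laden example: fetchWorkbookBytes is a hypothetical name, it rejects every non-2xx status through res.ok instead of testing only for 404, and it relies on a global fetch (browsers, or recent Node versions) unless another implementation is passed in.

```javascript
// Hypothetical helper: download a workbook and fail early on HTTP errors,
// so that an HTML error page never reaches the parser.
async function fetchWorkbookBytes(url, fetchImpl = fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error("Request for " + url + " failed with HTTP status " + res.status);
  }
  // Return the raw bytes of the response body.
  return new Uint8Array(await res.arrayBuffer());
}

// Possible usage, assuming XLSX is already loaded and the URL is your own:
//
//   fetchWorkbookBytes("/files/report.xlsx").then(function (bytes) {
//     const wb = XLSX.read(bytes, { type: "array" });
//   }).catch(function (err) { console.error(err.message); });
```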

How to resolve the 404 errors is beyond the scope of this project.

"I found the error! In ReactJS, the xlsx file must be in the 'public' folder."

This worked for me too.
