Sheetjs: Weird Characters parsed

Created on 11 Aug 2020 · 3 Comments · Source: SheetJS/sheetjs

I have this example CSV file. When I parse it using:
const workbook = XLSX.read(data, { type: 'array' })
the output contains characters like <p>Â </p>, where the file actually has a space.
example_error_character.csv.zip

All 3 comments

It's a UTF-8 CSV, but it is missing the BOM. You can see this using xxd (the bytes c2a0 are the UTF-8 encoding of a no-break space):

00000070: 6972 204d 6174 222c 223c 703e c2a0 3c2f  ir Mat","<p>..</
00000080: 703e 0a3c 703e c2a0 3c2f 703e 222c 2c2c  p>.<p>..</p>",,,

To force a UTF-8 interpretation, pass the option codepage: 65001:

const workbook = XLSX.read(data, { type: 'array', codepage: 65001 })

Thanks for the response.
I tried adding the codepage option, but it still doesn't work. It still outputs:
<p>Â </p>

You're right, the array case in https://github.com/SheetJS/sheetjs/blob/master/bits/40_harb.js#L888 does not handle the codepage argument. As a temporary workaround, convert to binary string as shown in https://jsfiddle.net/7Lrmxb8c/ :

/* assuming data is an Array or Uint8Array */
const binary = [...data].map(x => String.fromCharCode(x)).join("");
const workbook = XLSX.read(binary, { type: 'binary', codepage: 65001 });
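
For larger files, the byte-to-binary-string conversion above can also be done in chunks, which avoids building an intermediate array with one string per byte. This is only a sketch of the same workaround: the function name toBinaryString and the chunk size are assumptions, not part of SheetJS, and the XLSX.read call is left as a comment because it is the call from the workaround above.

```javascript
// Hypothetical helper: convert an Array or Uint8Array of bytes into a
// "binary string" where each character's code unit equals one byte.
function toBinaryString(data, chunkSize = 0x8000) {
  let out = "";
  for (let i = 0; i < data.length; i += chunkSize) {
    // slice() works for both plain Arrays and typed arrays; apply() accepts
    // either as the argument list. Chunking keeps the argument count small.
    out += String.fromCharCode.apply(null, data.slice(i, i + chunkSize));
  }
  return out;
}

// Usage with the workaround above (requires the xlsx library to be loaded):
// const workbook = XLSX.read(toBinaryString(data), { type: 'binary', codepage: 65001 });
```

The result is the same string as the spread-and-map version, so the rest of the workaround is unchanged.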