I have this example CSV file. When I parse it using:
const workbook = XLSX.read(data, { type: 'array' })
It outputs characters like <p>脗 </p>, where the garbled character is actually a space.
example_error_character.csv.zip
It's a UTF8 CSV, but it is missing the BOM. You can see this using xxd:
00000070: 6972 204d 6174 222c 223c 703e c2a0 3c2f ir Mat","<p>..</
00000080: 703e 0a3c 703e c2a0 3c2f 703e 222c 2c2c p>.<p>..</p>",,,
To force a UTF8 interpretation, pass the option codepage: 65001:
const workbook = XLSX.read(data, { type: 'array', codepage: 65001 })
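To illustrate why the missing BOM matters, here is a minimal sketch that does not use the library at all. It takes the bytes around `c2a0` from the dump and decodes them two ways. The variable names are made up for the example, and the exact garbled glyph you see in practice depends on which codepage the parser falls back to.

```javascript
// Bytes from the hex dump: "<p>", C2 A0, "</p>"
const bytes = new Uint8Array([0x3c, 0x70, 0x3e, 0xc2, 0xa0, 0x3c, 0x2f, 0x70, 0x3e]);

// Decoded as UTF-8, the pair C2 A0 is one character: U+00A0 (no-break space).
const asUtf8 = new TextDecoder("utf-8").decode(bytes);

// Decoded one byte per character (a single-byte reading), C2 and A0 become
// two separate characters, which is how stray glyphs end up in the output.
const asSingleByte = Array.from(bytes, b => String.fromCharCode(b)).join("");

console.log(asUtf8.length);                      // 8
console.log(asSingleByte.length);                // 9
console.log(asUtf8.charCodeAt(3).toString(16));  // a0
```

Without a BOM or an explicit codepage there is nothing in the file that says which of these readings is intended.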
Thanks for the response.
I've tried adding the codepage option, but it still doesn't work. It still outputs:
<p>脗 </p>
You're right: the array case in https://github.com/SheetJS/sheetjs/blob/master/bits/40_harb.js#L888 does not handle the codepage argument. As a temporary workaround, convert the data to a binary string, as shown in https://jsfiddle.net/7Lrmxb8c/ :
/* assuming data is an Array or Uint8Array */
const binary = [...data].map(x => String.fromCharCode(x)).join("");
const workbook = XLSX.read(binary, { type: 'binary', codepage: 65001 });
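For larger files, the same conversion can be done in chunks rather than one character at a time. This is only a sketch: `toBinaryString` and its `chunkSize` parameter are hypothetical names for illustration, not part of the library.

```javascript
// Hypothetical helper: convert an Array or Uint8Array of byte values into a
// "binary string" (one character per byte, char code equal to the byte value).
function toBinaryString(bytes, chunkSize = 0x8000) {
  const parts = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // Copy the chunk into a plain Array so it can be passed to apply().
    const chunk = Array.prototype.slice.call(bytes, i, i + chunkSize);
    parts.push(String.fromCharCode.apply(null, chunk));
  }
  return parts.join("");
}

// Usage with the workaround above (requires the XLSX library to be loaded):
// const workbook = XLSX.read(toBinaryString(data), { type: 'binary', codepage: 65001 });
```

The result should be the same string as the one-liner produces. Chunking just keeps each `String.fromCharCode` call to a bounded number of arguments.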