We're attempting to automate data-driven API testing by generating a Postman environment file including query output for use with Newman.
We're using the standard Windows PowerShell cmdlets for handling JSON data and text files to do this, and they work well with one exception: when no encoding is specified, Out-File writes UTF-16LE _with a byte order mark_. Newman chokes on the BOM, emitting this error:
Unexpected token � in JSON at position 0 while parsing near '��{...'
The same environment file imports successfully into Postman.
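For anyone who wants to reproduce this without PowerShell, here is a minimal Node sketch. It assumes the file bytes get decoded as UTF-8 before JSON.parse is called, which is a guess about the read path rather than confirmed Newman internals, and the JSON payload is made up for illustration:

```javascript
// Build the bytes of a UTF-16LE file with a BOM (FF FE), similar to default Out-File output.
// The payload is a made-up example, not a real Postman environment.
const bytes = Buffer.concat([
  Buffer.from([0xff, 0xfe]),
  Buffer.from('{"id":"demo"}', 'utf16le')
]);

// Assumption: the reader decodes the bytes as UTF-8 before parsing.
const asUtf8 = bytes.toString('utf8');

let parseError = null;
try {
  JSON.parse(asUtf8);
} catch (err) {
  parseError = err; // a SyntaxError, since the text no longer starts with valid JSON
}

// The two BOM bytes are not valid UTF-8, so each decodes to U+FFFD.
console.log(asUtf8.charCodeAt(0).toString(16), asUtf8.charCodeAt(1).toString(16));
console.log(parseError && parseError.name);
```

The two replacement characters at the start of the decoded string line up with the '��{' shown in the error above.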
Explicitly specifying ASCII encoding when generating the environment file allows it to work with Newman, but the test data may contain international text (accented characters, East Asian characters, etc.), and encoding to ASCII may break any test that happens to hit that data.
Please add UTF-8 and UTF-16 (LE & BE) BOM support to the Newman .json parser, in addition to straight ASCII/ANSI.
Steps to reproduce the problem (probably): newman run "test.json" -e "env.json" (filenames shortened)
Yep. I haven’t tested this personally yet. But I can imagine that this could be a problem if we are doing a plain JSON.parse.
@codenirvana - I have handled this in liquid-json - https://github.com/postmanlabs/liquid-json
Can we replace every JSON.parse of input files with this and add tests?
Well, since we are using another module for better json error help, this may be tricky. But it seems that .trim() removes BOM too.
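A quick sketch of the .trim() behaviour, assuming the file has already been decoded into a string that begins with U+FEFF (as a UTF-8 file with a BOM would). The sample JSON is made up for illustration:

```javascript
// A decoded string that starts with the BOM character U+FEFF,
// followed by JSON containing accented and Chinese text.
const raw = '\uFEFF{"city":"Zürich","greeting":"你好"}';

// JSON.parse does not accept U+FEFF as JSON whitespace, so parsing the raw string fails.
let rawFailed = false;
try {
  JSON.parse(raw);
} catch (err) {
  rawFailed = true;
}

// String.prototype.trim treats U+FEFF as whitespace and strips it from the start.
const trimmed = raw.trim();
const parsed = JSON.parse(trimmed);

console.log(rawFailed);       // whether the untrimmed parse threw
console.log(parsed.city, parsed.greeting);
```

Note that trim() only touches the start and end of the string, and it only helps when the bytes were decoded with the right encoding in the first place.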
That's not surprising as the BOM is actually an encoded optional-whitespace character...
I hesitate to jump on the .trim() bandwagon absent testing to ensure that doing so does not lead to corruption of any Unicode data that may be in the file...
I haven't actually gotten test data with that yet (I'm still working on the basic data-driven testing framework itself), but I'd like to ask that you ensure your test suite for this has a file containing non-Latin Unicode data (like Chinese characters) to verify it continues to be handled properly. I don't see anything like that in your commit.
Thanks!
Yeah. Test suite has Unicode characters. And that did not fail.
@jhardin-aptos https://github.com/postmanlabs/newman/pull/1874 should fix your issue. Can you check that out? Clone the repo, switch to that branch, and then run npm install with that directory as the source. If that's too much work ;-) then we will keep this issue updated as the release happens.
At the moment just forcing ASCII encoding has unblocked us. When the next Newman release happens I'll test it and we'll upgrade the QA environment. If we _do_ get any tests that fail in the meantime due to that mangling non-Latin characters I'll note the cause and we'll ignore it.
Thanks!
Alright, spoke too soon (thanks @codenirvana). The file you sent has more than the BOM; its entire encoding is UTF8-LE (how stupid of me for even thinking otherwise).
Automatic character encoding detection is tricky and using any third-party library to do that has the following two challenges:
So, what makes sense is for us to open up options that allow users to specify which encoding should be used to read files.
I have stumbled upon some great work on a library called chardet (https://github.com/runk/node-chardet), where the tradeoff could be performance. I'm checking that out as I type.
It is actually UTF16-LE.
I can see the pain in trying to detect whether a file is ANSI or UTF8... _But:_ you have a BOM, so that's not necessary. I'm a bit surprised JS isn't automagically interpreting that if you've specified you're reading from a text file.
Perhaps rather than just blindly discarding the BOM via trim(), do this: after reading the file check whether there is a BOM present (the file contents string starts with \xfeff or \xfffe), and if so then re-read the file explicitly specifying the correct encoding based on the BOM?
Alternatively: peek the first two bytes of the file to determine the encoding, then read it once with the correct encoding.
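The peek-the-BOM idea could look roughly like the sketch below. The helper name decodeWithBom is hypothetical and this is not Newman's actual implementation; it also assumes that a file without a BOM is UTF-8, and it does not try to tell a UTF-16LE BOM apart from a UTF-32LE one.

```javascript
// Hypothetical helper: choose a decoding from the leading BOM bytes of a Buffer.
function decodeWithBom(buffer) {
  // UTF-8 BOM: EF BB BF
  if (buffer.length >= 3 && buffer[0] === 0xef && buffer[1] === 0xbb && buffer[2] === 0xbf) {
    return buffer.subarray(3).toString('utf8');
  }
  // UTF-16LE BOM: FF FE
  if (buffer.length >= 2 && buffer[0] === 0xff && buffer[1] === 0xfe) {
    return buffer.subarray(2).toString('utf16le');
  }
  // UTF-16BE BOM: FE FF
  if (buffer.length >= 2 && buffer[0] === 0xfe && buffer[1] === 0xff) {
    // Buffer has no 'utf16be' encoding, so swap each byte pair on a copy
    // and decode the result as UTF-16LE.
    const body = Buffer.from(buffer.subarray(2));
    // swap16 needs an even byte count; drop a stray trailing byte if present.
    const even = body.subarray(0, body.length - (body.length % 2));
    return even.swap16().toString('utf16le');
  }
  // Assumption: no BOM means UTF-8.
  return buffer.toString('utf8');
}

// Usage with made-up data standing in for fs.readFileSync(path) without an encoding argument.
const sample = Buffer.concat([
  Buffer.from([0xff, 0xfe]),
  Buffer.from('{"name":"café 中文"}', 'utf16le')
]);
console.log(JSON.parse(decodeWithBom(sample)).name);
```

Reading the file once as raw bytes and decoding afterwards avoids the second read, at the cost of holding the whole file in memory.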
I have updated the branch to do auto detection - for a limited subset of encodings.
@jhardin-aptos Thanks for reporting, this has been fixed in Newman v4.4.0.
Know more about supported file encodings: https://github.com/postmanlabs/newman#file-encoding
Just upgraded to Newman 4.4.1 and ran a test with accented data from the database saved in a UTF-16LE (standard PowerShell text output) environment file, and it worked correctly.
Thanks!