Json: [json.exception.type_error.316] invalid UTF-8 byte at index 1: 0xC3

Created on 4 Dec 2018 · 5 Comments · Source: nlohmann/json

What is the issue you have?
Can't dump a JSON object into a string.
Please describe the steps to reproduce the issue. Can you provide a small but working code example?

    nlohmann::json fJson;
    std::string codigo_ativo("ÇÃO");
    fJson["CODIGO_ATIVO"] = codigo_ativo;
    fJson.dump();

What is the expected behavior?
The .dump() method should generate the serialized string of the JSON object.
And what is the actual behavior instead?
Exception thrown: [json.exception.type_error.316] invalid UTF-8 byte at index 1: 0xC3
Which compiler and operating system are you using? Is it a supported compiler?
cmake version 3.11.4 with -utf-8 compile option
Did you use a released version of the library or the version from the develop branch?
Release version nº 3.1.2 (https://github.com/nlohmann/json/releases/tag/v3.1.2)
If you experience a compilation error: can you compile and run the unit tests?
no compilation error

I've noticed similar errors in issues https://github.com/nlohmann/json/issues/1022 and
https://github.com/nlohmann/json/issues/1131
To try to fix it, I added the -utf-8 flag to the compiler. Before assigning the value to the fJson object, I printed the content of the codigo_ativo variable to check its bytes in hex:

    // Print the index and hex value of each byte in the string
    for (size_t i = 0; i < codigo_ativo.size(); ++i)
    {
        std::cout << i << " " << std::hex << static_cast<int>(static_cast<uint8_t>(codigo_ativo[i])) << std::endl;
    }

outputs:

    0 c7
    1 c3
    2 4f

invalid

FabioNevesRezende
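
To see why the exception names index 1, the bytes C7 C3 4F printed above can be run through a structural UTF-8 check. The following is a minimal sketch, not the library's actual implementation: the function name `first_invalid_utf8_byte` is hypothetical, and the check is deliberately simplified (it only looks at lead and continuation byte patterns).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only (not part of nlohmann/json).
// Returns the index of the first byte that breaks UTF-8 structure, or
// std::string::npos if the structure looks valid. For a sequence cut off at
// the end of the string, the index of its lead byte is returned.
// Simplified: does not reject overlong encodings, surrogates, or code points
// above U+10FFFF.
std::size_t first_invalid_utf8_byte(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t continuation_count = 0;

        if (lead < 0x80)                { continuation_count = 0; } // ASCII
        else if ((lead & 0xE0) == 0xC0) { continuation_count = 1; } // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) { continuation_count = 2; } // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) { continuation_count = 3; } // 11110xxx
        else { return i; } // stray continuation byte or invalid lead byte

        for (std::size_t k = 1; k <= continuation_count; ++k)
        {
            if (i + k >= s.size())
            {
                return i; // sequence is truncated
            }
            const unsigned char next = static_cast<unsigned char>(s[i + k]);
            if ((next & 0xC0) != 0x80)
            {
                return i + k; // expected a 10xxxxxx continuation byte
            }
        }
        i += continuation_count + 1;
    }
    return std::string::npos;
}
```

For the bytes C7 C3 4F, the byte C7 announces a two-byte sequence, but C3 at index 1 is not a continuation byte, so the sketch reports index 1, matching the index in the exception message.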

All 5 comments

The string is not UTF-8 encoded. The string ÇÃO consists of the code points U+00C7, U+00C3, and U+004F and should therefore yield the UTF-8 byte sequence C3 87 C3 83 4F. Your example program prints the former (C7 C3 4F), not the latter. This is not a bug in the library (it in fact detects that the byte C3 at index 1 is not valid UTF-8 at that position); the cause is your compiler or the encoding of the source code file.

nlohmann on 4 Dec 2018
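
If the bytes really are one-byte-per-character code points as described above (that is, the string is Latin-1 / ISO-8859-1 encoded, which is an assumption about the reporter's setup), they can be re-encoded as UTF-8 before being stored in the JSON object. A minimal sketch with a hypothetical helper `latin1_to_utf8`:

```cpp
#include <string>

// Hypothetical helper, for illustration only (not part of nlohmann/json).
// Assumes every input byte is a Latin-1 code point (U+0000..U+00FF) and
// re-encodes it as UTF-8.
std::string latin1_to_utf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2); // each byte becomes at most two bytes
    for (unsigned char byte : latin1)
    {
        if (byte < 0x80)
        {
            // ASCII range: identical in Latin-1 and UTF-8
            utf8.push_back(static_cast<char>(byte));
        }
        else
        {
            // U+0080..U+00FF: two-byte sequence 110000xx 10xxxxxx
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}
```

Applied to the bytes C7 C3 4F, this yields C3 87 C3 83 4F, the UTF-8 sequence named in the comment above. If the source string uses a different legacy code page, this mapping does not apply.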

The string is encoded in ASCII, but aren't ASCII codes equivalent to their respective codes in UTF-8?

see:
https://stackoverflow.com/questions/2347783/how-to-convert-an-ascii-string-to-an-utf8-string-in-c

FabioNevesRezende on 4 Dec 2018

ASCII is a subset of UTF-8. From your string, only the last character can be expressed by ASCII.

You may want to have a look at https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ and https://utf8everywhere.org

nlohmann on 4 Dec 2018

"ASCII is a subset of UTF-8", so if the API accepts UTF-8, it should accept ASCII. And all three characters can be expressed in ASCII; see this table:

https://theasciicode.com.ar/

Decimal 199 = Ã
Decimal 128 = Ç

FabioNevesRezende on 4 Dec 2018

That is extended ASCII. ASCII can only express 128 characters - from 0x00 to 0x7F.

nlohmann on 4 Dec 2018

👍3
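
The 0x00 to 0x7F limit mentioned above is easy to check in code. A minimal sketch, with `is_ascii` as a hypothetical helper name:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper, for illustration only.
// True if every byte is in the 7-bit ASCII range 0x00..0x7F.
bool is_ascii(const std::string& s)
{
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return c < 0x80; });
}
```

For the bytes C7 C3 4F from the issue, this returns false, because C7 and C3 lie above 0x7F; only the final byte 4F ("O") is ASCII.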
