Json: [json.exception.type_error.316] invalid UTF-8 byte at index 1: 0xC3

Created on 4 Dec 2018 · 5 Comments · Source: nlohmann/json

  • What is the issue you have?
    I can't dump a JSON object into a string.

  • Please describe the steps to reproduce the issue. Can you provide a small but working code example?

    nlohmann::json fJson;
    std::string codigo_ativo("ÇÃO");
    fJson["CODIGO_ATIVO"] = codigo_ativo;
    fJson.dump();  // throws [json.exception.type_error.316]

  • What is the expected behavior?
    The .dump() method should generate the serialized string of the JSON object.

  • And what is the actual behavior instead?
    Exception thrown: [json.exception.type_error.316] invalid UTF-8 byte at index 1: 0xC3

  • Which compiler and operating system are you using? Is it a supported compiler?
    CMake 3.11.4, with the -utf-8 compile option

  • Did you use a released version of the library or the version from the develop branch?
    Release version 3.1.2 (https://github.com/nlohmann/json/releases/tag/v3.1.2)

  • If you experience a compilation error: can you compile and run the unit tests?
    no compilation error

I've noticed similar errors in issues https://github.com/nlohmann/json/issues/1022 and
https://github.com/nlohmann/json/issues/1131
To try to fix it, I added the -utf-8 flag to the compiler. Before assigning the value to the fJson object, I printed the contents of the codigo_ativo variable to check its bytes in hex:

// Print the index and hex value of each byte in codigo_ativo
for (std::size_t i = 0; i < codigo_ativo.size(); ++i)
{
    std::cout << i << " " << std::hex << static_cast<int>(static_cast<uint8_t>(codigo_ativo[i])) << std::endl;
}

outputs:

0 c7
1 c3
2 4f

invalid


All 5 comments

The string is not UTF-8 encoded. The string ÇÃO should yield the code points C7 C3 4F and thus the UTF-8 byte sequence C3 87 C3 83 4F. The former is what your example program prints. This is not a bug in the library (it correctly detects that 0xC3 at index 1 is not the continuation byte that the preceding lead byte 0xC7 requires); the problem lies in your compiler settings or in the encoding of the source file.
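To see why the error points at index 1 rather than index 0, here is a simplified sketch of the structural check a UTF-8 decoder performs. The helper `first_invalid_utf8_byte` is hypothetical and is not nlohmann's actual implementation; it also skips the overlong-form and surrogate checks a full validator needs. The byte 0xC7 is a legal lead byte announcing a two-byte sequence, so the decoder expects a continuation byte in the range 0x80..0xBF next. 0xC3 is not one, so index 1 is reported.

```cpp
#include <cstddef>
#include <string>

// Simplified UTF-8 structure check (hypothetical helper).
// Returns the index of the first byte that breaks the lead/continuation
// pattern, or std::string::npos if the whole string follows it.
// NOTE: does not reject overlong encodings or surrogates.
std::size_t first_invalid_utf8_byte(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len;
        if (b < 0x80) len = 1;                     // ASCII
        else if (b >= 0xC2 && b <= 0xDF) len = 2;  // 110xxxxx
        else if (b >= 0xE0 && b <= 0xEF) len = 3;  // 1110xxxx
        else if (b >= 0xF0 && b <= 0xF4) len = 4;  // 11110xxx
        else return i;  // stray continuation byte or invalid lead byte
        for (std::size_t k = 1; k < len; ++k) {
            if (i + k >= s.size()) return i + k;   // input ends mid-sequence
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return i + k;  // not 10xxxxxx
        }
        i += len;
    }
    return std::string::npos;
}
```

With the bytes from the report, `first_invalid_utf8_byte("\xC7\xC3O")` returns 1, while the correctly encoded `"\xC3\x87\xC3\x83O"` passes.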

The string is encoded in ASCII, but aren't ASCII codes the same as their counterparts in UTF-8?

see:
https://stackoverflow.com/questions/2347783/how-to-convert-an-ascii-string-to-an-utf8-string-in-c

ASCII is a subset of UTF-8. From your string, only the last character can be expressed by ASCII.
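Only bytes 0x00..0x7F carry over unchanged. If the source bytes are ISO-8859-1 (Latin-1), which matches the printed values C7 C3 4F, every byte above 0x7F must become two UTF-8 bytes. For Ç = U+00C7, the 11-bit value 000 1100 0111 splits into the high five bits 00011 and the low six bits 000111; placing them into 110xxxxx 10xxxxxx gives C3 87. The sketch below assumes Latin-1 input; `latin1_to_utf8` is a hypothetical helper, not part of nlohmann/json. Windows-1252 assigns different characters to 0x80..0x9F, which this sketch would mis-convert.

```cpp
#include <string>

// Hypothetical helper: re-encode an ISO-8859-1 (Latin-1) byte string as UTF-8.
// Assumes each input byte is a Latin-1 code point U+0000..U+00FF.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // each byte becomes at most two bytes
    for (char ch : in) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out += static_cast<char>(b);                  // ASCII: copied as-is
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));    // lead byte 110xxxxx
            out += static_cast<char>(0x80 | (b & 0x3F));  // continuation 10xxxxxx
        }
    }
    return out;
}
```

With such a helper, the assignment from the report would become `fJson["CODIGO_ATIVO"] = latin1_to_utf8(codigo_ativo);`. Alternatively, saving the source file as UTF-8 makes the literal UTF-8 encoded in the first place.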

You may want to have a look at https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ and https://utf8everywhere.org

"ASCII is a subset of UTF-8", so if the API accepts UTF-8, it should accept ASCII. And all three characters can be expressed in ASCII; see this table:

https://theasciicode.com.ar/

Decimal 199 = Ã
Decimal 128 = Ç

That is extended ASCII. ASCII can only express 128 characters - from 0x00 to 0x7F.
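The 0x7F boundary is easy to check mechanically. Both values quoted from the table, 199 and 128, lie above 127, so they come from an 8-bit "extended ASCII" code page rather than ASCII itself. Such code pages also disagree with each other: the table lists Ç as 128, while the program above printed Ç as 0xC7 (199). The helper `is_ascii` below is a hypothetical illustration.

```cpp
#include <algorithm>
#include <string>

// Returns true if every byte lies in the 7-bit ASCII range 0x00..0x7F.
bool is_ascii(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) <= 0x7F;
    });
}
```

For the bytes from the report, `is_ascii("\xC7\xC3O")` is false; only the trailing `O` (0x4F) is ASCII.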
