Detailed report: https://oss-fuzz.com/testcase?key=5854339613589504
Project: json
Fuzzer: libFuzzer_json_parse_cbor_fuzzer
Job Type: libfuzzer_asan_json
Platform Id: linux
Crash Type: Stack-overflow
Crash Address: 0x7ffc55e22ce8
Crash State:
nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<cha
nlohmann::detail::binary_reader<nlohmann::basic_json<std::__1::map, std::__1::ve
nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<cha
Sanitizer: address (ASAN)
Regressed: https://oss-fuzz.com/revisions?job=libfuzzer_asan_json&range=201701031958:201701032147
Reproducer Testcase: https://oss-fuzz.com/download?testcase_id=5854339613589504
Issue filed automatically.
See https://github.com/google/oss-fuzz/blob/master/docs/reproducing.md for more information.
This bug is subject to a 90 day disclosure deadline. If 90 days elapse
without an upstream patch, then the bug report will automatically
become visible to the public.
clusterfuzz-testcase-minimized-5854339613589504.dms.zip
Stack trace:
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1==ERROR: AddressSanitizer: stack-overflow on address 0x7ffc55e22ce8 (pc 0x000000515f22 bp 0x7ffc55e23530 sp 0x7ffc55e22cd0 T0)
SCARINESS: 10 (stack-overflow)
#0 0x515f21 in operator new(unsigned long) _asan_rtl_
#1 0x5a6470 in __allocate /usr/local/include/c++/v1/new:226:10
#2 0x5a6470 in allocate /usr/local/include/c++/v1/memory:1747
#3 0x5a6470 in std::__1::vector<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer>, std::__1::allocator<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> > >* nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer>::create<std::__1::vector<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer>, std::__1::allocator<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> > >>() json/src/json.hpp:7939
#4 0x5a5fcf in nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer>::json_value::json_value(nlohmann::detail::value_t) json/src/json.hpp:8013:29
#5 0x5a0061 in nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer>::basic_json(nlohmann::detail::value_t) json/src/json.hpp:8249:22
#6 0x59ed6a in nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> nlohmann::detail::binary_reader<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> >::get_cbor_array<int>(int) json/src/json.hpp:5374:32
#7 0x59bd2e in nlohmann::detail::binary_reader<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> >::parse_cbor_internal(bool) json/src/json.hpp:4717:24
#8 0x5a4bd0 in nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> nlohmann::detail::binary_reader<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> >::get_cbor_array<int>(int)::{lambda()#1}::operator()() const json/src/json.hpp:5377:20
#9 0x59ee2e in generate_n<std::__1::back_insert_iterator<std::__1::vector<nlohmann::basic_json<std::map, std::vector, std::__1::basic_string<char>, bool, long, unsigned long, double, std::allocator, adl_serializer>, std::__1::allocator<nlohmann::basic_json<std::map, std::vector, std::__1::basic_string<char>, bool, long, unsigned long, double, std::allocator, adl_serializer> > > >, int, (lambda at ../src/json.hpp:5375:73)> /usr/local/include/c++/v1/algorithm:2186:20
#10 0x59ee2e in nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> nlohmann::detail::binary_reader<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> >::get_cbor_array<int>(int) json/src/json.hpp:5375
#11 0x59bd2e in nlohmann::detail::binary_reader<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> >::parse_cbor_internal(bool) json/src/json.hpp:4717:24
#12 0x5a4bd0 in nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> nlohmann::detail::binary_reader<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> >::get_cbor_array<int>(int)::{lambda()#1}::operator()() const json/src/json.hpp:5377:20
#13 0x59ee2e in generate_n<std::__1::back_insert_iterator<std::__1::vector<nlohmann::basic_json<std::map, std::vector, std::__1::basic_string<char>, bool, long, unsigned long, double, std::allocator, adl_serializer>, std::__1::allocator<nlohmann::basic_json<std::map, std::vector, std::__1::basic_string<char>, bool, long, unsigned long, double, std::allocator, adl_serializer> > > >, int, (lambda at ../src/json.hpp:5375:73)> /usr/local/include/c++/v1/algorithm:2186:20
#14 0x59ee2e in nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> nlohmann::detail::binary_reader<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> >::get_cbor_array<int>(int) json/src/json.hpp:5375
#15 0x59bd2e in nlohmann::detail::binary_reader<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> >::parse_cbor_internal(bool) json/src/json.hpp:4717:24
#16 0x5a4bd0 in nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> nlohmann::detail::binary_reader<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> >::get_cbor_array<int>(int)::{lambda()#1}::operator()() const json/src/json.hpp:5377:20
#17 0x59ee2e in generate_n<std::__1::back_insert_iterator<std::__1::vector<nlohmann::basic_json<std::map, std::vector, std::__1::basic_string<char>, bool, long, unsigned long, double, std::allocator, adl_serializer>, std::__1::allocator<nlohmann::basic_json<std::map, std::vector, std::__1::basic_string<char>, bool, long, unsigned long, double, std::allocator, adl_serializer> > > >, int, (lambda at ../src/json.hpp:5375:73)> /usr/local/include/c++/v1/algorithm:2186:20
#18 0x59ee2e in nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> nlohmann::detail::binary_reader<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> >::get_cbor_array<int>(int) json/src/json.hpp:5375
#19 0x59bd2e in nlohmann::detail::binary_reader<nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool, long, unsigned long, double, std::__1::allocator, nlohmann::adl_serializer> >::parse_cbor_internal(bool) json/src/json.hpp:4717:24
I analyzed the testcase. It more or less consists of a lot of nested arrays. On my machine, there is a stack overflow at depth 1450. I am not sure how to cope with such issues. We would have the same problem if we parsed a JSON file like
[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[ ...
At some point, the parser would explode. Any ideas on this?
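For reference, an equivalent deeply nested CBOR payload is easy to construct by hand: 0x81 is the CBOR initial byte for a one-element array, so n copies of 0x81 followed by 0x00 (the integer 0) encode n nested arrays. A small sketch (not part of the library; the function name is made up for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Build a CBOR payload encoding `depth` nested one-element arrays
// around the integer 0, e.g. depth 2 -> [[0]] -> {0x81, 0x81, 0x00}.
std::vector<std::uint8_t> make_nested_cbor(std::size_t depth) {
    std::vector<std::uint8_t> payload(depth, 0x81); // 0x81 = array of length 1
    payload.push_back(0x00);                        // innermost value: unsigned 0
    return payload;
}
```

Feeding such a payload with a large depth to from_cbor would be expected to reproduce the crash on a typical stack.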
The only ways to "fix" this are to impose an arbitrary limit on the nesting depth, or to change whatever is overflowing the stack from a recursive algorithm to an iterative one.
Yes, and imposing such a limit would be OK with RFC 7159 (Sect. 9):
An implementation may set limits on the maximum depth of nesting.
But I would not like it...
The recursion is:
parse_cbor_internal -> get_cbor_array -> parse_cbor_internal -> get_cbor_array -> ...
According to the stack above, it's actually a four-function recursion, which I think would be extremely difficult to break:
json nlohmann::detail::binary_reader<json>::get_cbor_array<int>(int)
nlohmann::detail::binary_reader<json>::parse_cbor_internal(bool)
json nlohmann::detail::binary_reader<json>::get_cbor_array<int>(int)::{lambda()}::operator()() const
generate_n<back_insert_iterator<vector<json>>, int, (lambda)>
Sorry, I simplified it a bit. And it all makes sense when you try to parse a recursive structure with a recursive descent parser...
I think there is nothing we can do about this. JSON and CBOR are based on recursion, and adding arbitrary limits is not a solution.
Strangely, I got this message yesterday:
Updates:
Labels: ClusterFuzz-Verified
Status: Verified
Comment #2 on issue 4234 by ClusterFuzz-External: json: Stack-overflow in nlohmann::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<cha
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=4234#c2
ClusterFuzz testcase 5854339613589504 is verified as fixed, so closing issue as verified.
If this is incorrect, please file a bug on https://github.com/google/oss-fuzz/issues/new
Hi,
Could I ask you to reconsider making changes to address this issue? My arguments are below :)
While I agree with you that an arbitrary limit is not a proper solution, leaving it as is forces the library user to look for workarounds.
A potential solution (which I think might be acceptable) is to give the library user a choice, instead of setting an arbitrary limit.
For example, there could be a way for the library user to say "I want at most 10 levels of recursion (since I know that in my system there will never be any more than that)". In this case the default could be "no limit".
Or, to make things stable/secure by default, it can probably be assumed (citation needed) that in the wild 99.9% of payloads have a recursion depth of at most e.g. 50 (which is probably a huge exaggeration already, while still being sane with regards to stack usage), and if the library user needs more, they could specify a different limit (or no limit).
Comments / thoughts are welcomed :)
Cheers,
Gynvael
You mean something like a preprocessor macro JSON_MAXIMAL_RECURSION_DEPTH?
Hi Niels,
Originally I was thinking of a preprocessor macro for the default and a run-time parameter to override it, e.g.:
#ifndef JSON_MAXIMAL_RECURSION_DEPTH
# define JSON_MAXIMAL_RECURSION_DEPTH 50
#endif
...
static basic_json from_cbor(
detail::input_adapter i,
const bool strict = true,
some_integer_type_maybe_unsigned recursion_limit = JSON_MAXIMAL_RECURSION_DEPTH)
I'm not fully convinced of adding another default parameter to from_something functions. Perhaps having a from_cbor_ex-like function that takes additional options might be a better idea (pardon the WinAPI-like "Ex" naming).
That said, on one hand having a runtime method to overwrite the recursion limit might make some users happy (ones who, like me, like to set the limit as low as possible, probably on a per-payload-scheme basis).
On the other hand I recognize that probably just having the compilation-time setting might be enough.
Cheers,
Gynvael
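To illustrate the kind of guard being discussed, here is a minimal sketch of a depth-limited recursive descent over nested brackets. The names (parse_nested, the default of 50) are hypothetical and not part of the library's API:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch: parse a string of nested '[' ... ']' pairs,
// throwing once the recursion exceeds max_depth.
std::size_t parse_nested(const std::string& s, std::size_t pos,
                         std::size_t depth, std::size_t max_depth) {
    if (depth > max_depth)
        throw std::runtime_error("maximal recursion depth exceeded");
    if (pos < s.size() && s[pos] == '[') {
        pos = parse_nested(s, pos + 1, depth + 1, max_depth); // recurse into child
        if (pos >= s.size() || s[pos] != ']')
            throw std::runtime_error("unbalanced brackets");
        return pos + 1;
    }
    return pos; // empty value
}

// Convenience wrapper with a compile-time default, overridable at runtime.
bool parse(const std::string& s, std::size_t max_depth = 50) {
    try { return parse_nested(s, 0, 0, max_depth) == s.size(); }
    catch (const std::runtime_error&) { return false; }
}
```

The point is that the check costs one comparison per level, and the limit can come either from a macro default or a per-call argument.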
Hello!
Maybe you could try to get the available stack size using:
getrlimit(RLIMIT_STACK, struct rlimit *);
pthread_getattr_np(pthread_t thread, pthread_attr_t *attr); // on linux
void WINAPI GetCurrentThreadStackLimits(
_Out_ PULONG_PTR LowLimit,
_Out_ PULONG_PTR HighLimit
); // on windows
then, regularly measure the remaining stack size (declare a variable on the stack, take its address, cast it to an integer, and subtract it from the top-of-stack address), and stop parsing when the remaining stack goes below some threshold?
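A rough sketch of the probing part; the base address and the threshold are illustrative, and real code would take the actual stack bounds from the APIs above rather than a recorded local address:

```cpp
#include <cstddef>
#include <cstdint>

// Record an address near the stack base, then estimate how much stack a
// deeper frame has consumed by comparing local-variable addresses.
// Works regardless of stack growth direction; illustrative only.
static std::uintptr_t g_stack_base = 0;

void record_stack_base() {
    char probe;
    g_stack_base = reinterpret_cast<std::uintptr_t>(&probe);
}

std::size_t stack_used() {
    char probe;
    std::uintptr_t here = reinterpret_cast<std::uintptr_t>(&probe);
    return here > g_stack_base ? here - g_stack_base : g_stack_base - here;
}

// A recursive parser could then bail out early instead of overflowing:
bool descend(int n, std::size_t limit) {
    volatile char pad[512];                  // simulate per-frame stack usage
    pad[0] = static_cast<char>(n);
    if (stack_used() > limit) return false;  // stop before overflowing
    if (n == 0) return true;
    bool ok = descend(n - 1, limit);
    pad[1] = ok ? 1 : 0;                     // keep the frame live (no tail call)
    return ok;
}
```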
@JoelStienlet
That's a pretty cool idea technically, but I'm not sure it would be worth maintaining and working around the resulting problems.
Also, doing it dynamically adds some latency and allows untrusted payloads to be processed longer than necessary.
Still, I like the idea from a technical perspective - a pretty neat trick :)
Indeed, limiting the maximum recursion depth is simpler, more portable, and dismisses bad packets faster.
I came up with this idea because it could have been a way to adjust the limit dynamically, without bothering the user for a reasonable value or choosing an arbitrary value in their place.
As you said, a default value of 50 is certainly sane 99.99% of the time, but I also like the idea that the user should be able to customize that value if he wants to, per-payload as you said (the library could be used to parse strings coming from unknown clients, and from some trusted IPC inside the server at the same time). I think the key is setting a sane default value, yet providing flexibility for the rare cases that require a larger or best-fit value.
To move the discussion forward a little: are there any arguments against introducing the limits as described?
I currently have not found the time to propose an implementation. That said, PRs are welcome :)
Ah, sorry Niels, didn't mean to pressure you :). I was under the impression you wanted to discuss the issue more ("state: please discuss"), so wanted to make sure we progress the discussion.
Of course I would like to have more discussion on this - but it seems no one else wants to join in.
Well, if you wanted "unlimited depth" you could always throw to a trampoline on occasion. If it is not used, there is no running cost, and if it is used, well, it's better than overflowing the stack...
I think this would be much easier combined with some of the previous discussions about creating a policy class to hold the templates rather than having all of the template parameters. The only other ways I see to do it are adding another parameter to from_json or making it compile-time, and I don't think we want to require all users to change from_json to support this.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Not stale, just awaiting its turn :)
FYI: The JSON parser in #971 is now non-recursive. It uses a std::vector<bool> to track the nesting of objects and arrays. This approach may also be applicable to the binary formats. In any case, it is now simpler to set and check a limit.
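The idea can be sketched as follows (illustrative only, not the actual #971 code): the explicit std::vector<bool> replaces the call stack, so the nesting depth is bounded by heap memory rather than stack size, and a depth limit becomes a single comparison.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Iteratively validate bracket nesting, using std::vector<bool> as an
// explicit stack: true marks an open array, false an open object.
bool check_nesting(const std::string& input, std::size_t max_depth) {
    std::vector<bool> open; // grows on the heap, not on the call stack
    for (char c : input) {
        switch (c) {
            case '[': open.push_back(true); break;
            case '{': open.push_back(false); break;
            case ']':
                if (open.empty() || !open.back()) return false; // '}' expected
                open.pop_back();
                break;
            case '}':
                if (open.empty() || open.back()) return false;  // ']' expected
                open.pop_back();
                break;
            default: break; // other characters irrelevant for nesting
        }
        if (open.size() > max_depth) return false; // trivial limit check
    }
    return open.empty();
}
```

Because nothing recurses, even inputs with hundreds of thousands of nesting levels are handled without touching the stack limit.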
So basically there are some options on this, once we have the SAX parser merged:
Option 1 is already possible.
Option 2(i) is another argument for reworking the parser function interface (see also #971), because the parameters are piling up there. Maybe it is really time to put all parsing parameters into a policy class and only pass an input adapter and the policy to the parser. I think this is what @gregmarr proposed earlier.
Option 2(ii) is much simpler, but would then require re-compilation if the depth should be adjusted.
I would be happy to discuss this. I am removing the "wontfix" badge, because this should be addressed in the next feature release.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This concrete issue is fixed with #971.
As discussed in https://github.com/nlohmann/json/issues/1419#issuecomment-452831070, the recursive model of JSON values can lead to a stack overflow during the destruction of deeply nested objects.
These are the options:
Determine the max. stack size at runtime and tune the maximum depth: This is an interesting approach, but I doubt we can find a stable and reliable way of doing that. Different API calls have different stack overhead, and it is not straightforward to calculate the maximum depth from the available stack size.
Redesigning data model to a non-recursive model: This will solve such issues but requires a major overhaul. Additionally, having a recursive representation of JSON objects is very intuitive for future developments.
Let the user specify the max. depth (preferably at runtime): This is probably the simplest yet effective approach. We can ship the library with a reasonable default maximum depth (e.g. 1000). We can also provide an API for the user to change this depth, but we still cannot fully protect them from a stack overflow.
Rewrite the recursive implementations to non-recursive: This has been done in the parser. But there are other places around the code that should be taken care of.
Personally, I prefer option (4), which doesn't impose any limitations on the user and doesn't introduce any settings (e.g. max_depth) the user would have to be concerned with.
Regarding the stack overflow during the destruction of deeply nested objects (https://github.com/nlohmann/json/issues/1419), I believe we can free up our JSON object tree iteratively instead of relying on automatic destruction.
I have done some experiments as a proof of concept and the results seem promising. To continue, I wanted to know whether the basic_json type is _movable_, since it is essential to be able to move the JSON objects out of their current container into a flat std::vector.
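The flattening idea can be sketched as follows; Node is a hypothetical stand-in for a recursive JSON value, not the library's basic_json:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical recursive value type; the point is the teardown strategy.
struct Node {
    std::vector<std::unique_ptr<Node>> children;
};

// Free a deeply nested tree without recursive destructor calls: move every
// child into a flat vector first, so each Node is destroyed only after its
// children have already been moved out of it.
std::size_t destroy_iteratively(std::unique_ptr<Node> root) {
    std::vector<std::unique_ptr<Node>> flat;
    flat.push_back(std::move(root));
    for (std::size_t i = 0; i < flat.size(); ++i) {
        for (auto& child : flat[i]->children)
            flat.push_back(std::move(child)); // flatten: no recursion
        flat[i]->children.clear();
    }
    return flat.size(); // every Node is now childless; ~flat is shallow
}

// Build an n-deep chain iteratively (a recursive build would also overflow).
std::unique_ptr<Node> make_chain(std::size_t n) {
    auto root = std::make_unique<Node>();
    Node* tip = root.get();
    for (std::size_t i = 0; i < n; ++i) {
        tip->children.push_back(std::make_unique<Node>());
        tip = tip->children.back().get();
    }
    return root;
}
```

Destroying the same chain by simply letting the unique_ptr chain unwind would recurse once per level, which is exactly the failure mode under discussion.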
To quickly answer your question: Yes, basic_json objects are movable.
There have been some discussions earlier about separating the implementation of the values (using std::string, std::vector, and std::map) from their interface, allowing different implementations thereof. As this would mean giant refactoring work, these ideas have not been pursued so far. It would be great to see some progress here!