The recently merged PR #1436 introduced quadratic complexity in the destruction of JSON objects, due to the use of std::vector::reserve in a loop.
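For context, the problematic pattern looks roughly like the following simplified sketch. The node type and function names are illustrative only, not the library's actual code; the point is the reserve() call inside the loop:

#include <algorithm>
#include <iterator>
#include <vector>

struct node {
  std::vector<node> children;
};

// Iterative destruction: flatten children onto a stack instead of recursing.
void destroy_iteratively(node root) {
  std::vector<node> stack;
  std::move(root.children.begin(), root.children.end(),
            std::back_inserter(stack));
  while (!stack.empty()) {
    node current = std::move(stack.back());
    stack.pop_back();
    // Calling reserve() on every iteration grows the capacity in small
    // steps, so the buffer may be reallocated on each call; with N nested
    // arrays this adds up to quadratic work.
    stack.reserve(stack.size() + current.children.size());
    std::move(current.children.begin(), current.children.end(),
              std::back_inserter(stack));
    // current's children are now empty, so destroying it here is cheap.
  }
}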
Benchmark code:
#include "benchmark/benchmark.h"
#include "nlohmann/json.hpp"
static void benchmark_json_creation_destruction(benchmark::State &state) {
size_t N = state.range(0);
std::string s;
for (size_t i = 0; i < N; ++i) {
s += "[1,";
}
s += "2";
for (size_t i = 0; i < N; ++i) {
s += "]";
}
for (auto _ : state) {
nlohmann::json j;
j = nlohmann::json::parse(s);
}
state.SetComplexityN(N);
}
BENCHMARK(benchmark_json_creation_destruction)
->Range(1, 10000)->Complexity(benchmark::oAuto);
int main(int argc, char** argv) {
benchmark::Initialize(&argc, argv);
benchmark::RunSpecifiedBenchmarks();
return 0;
}
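For reference, the benchmark builds with something along these lines, assuming Google Benchmark and nlohmann/json are installed in the default include and library paths (the file name bench.cpp is my own):

g++ -O2 -std=c++11 bench.cpp -lbenchmark -lpthread -o bench
./bench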
Expected behavior: linear complexity.
Actual behavior with 3.7.2: quadratic complexity:
------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations
------------------------------------------------------------------------------------
benchmark_json_creation_destruction/1             450 ns          450 ns      1521254
benchmark_json_creation_destruction/8            2283 ns         2282 ns       319664
benchmark_json_creation_destruction/64          19322 ns        19322 ns        33992
benchmark_json_creation_destruction/512        290041 ns       290035 ns         2409
benchmark_json_creation_destruction/4096      9455262 ns      9455230 ns           68
benchmark_json_creation_destruction/10000    53660018 ns     53657883 ns           12
benchmark_json_creation_destruction_BigO        0.54 N^2        0.54 N^2
benchmark_json_creation_destruction_RMS             2 %             2 %
Same version, with reserve calls removed:
------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations
------------------------------------------------------------------------------------
benchmark_json_creation_destruction/1             444 ns          444 ns      1376396
benchmark_json_creation_destruction/8            2111 ns         2111 ns       325257
benchmark_json_creation_destruction/64          16713 ns        16713 ns        41177
benchmark_json_creation_destruction/512        118116 ns       118113 ns         5793
benchmark_json_creation_destruction/4096       916147 ns       916078 ns          729
benchmark_json_creation_destruction/10000     2236044 ns      2235964 ns          302
benchmark_json_creation_destruction_BigO       223.63 N        223.62 N
benchmark_json_creation_destruction_RMS             0 %             0 %
The previous version, 3.7.1, exhibits linear complexity:
------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations
------------------------------------------------------------------------------------
benchmark_json_creation_destruction/1             387 ns          387 ns      1805004
benchmark_json_creation_destruction/8            1578 ns         1578 ns       435509
benchmark_json_creation_destruction/64          10501 ns        10501 ns        65116
benchmark_json_creation_destruction/512        100457 ns       100454 ns        6841
benchmark_json_creation_destruction/4096       764401 ns       764402 ns         871
benchmark_json_creation_destruction/10000     1867051 ns      1867057 ns         371
benchmark_json_creation_destruction_BigO       186.71 N        186.71 N
benchmark_json_creation_destruction_RMS             0 %             0 %
Ubuntu 19.10 with GCC 9.2:
$ gcc --version
gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
Which version of the library did you use? Release 3.7.2.
Great benchmarking job! I didn't know that the Google Benchmark library could measure and report computational complexity, too. That comes in handy.
I agree that we should remove the vector::reserve calls, given that they don't make any notable difference on real-life JSON (see https://github.com/nlohmann/json/pull/1436#issuecomment-552386759) while performing significantly worse in worst-case scenarios.
Can you explain why these reserve calls are harmful? It seems that they are just preparing the stack variable to hold the elements that are about to be push_back'd into it. How is that slowing things down? I know you have benchmarks, but I'm trying to understand _why_ this is true.
@jaredgrubb Wild guess, maybe calculating the object/array size is not a constant time operation?
No, vector and map have constant-time size functions.
I think the Benchmark tool is not detecting complexity correctly.
In all the iteration cases, the runtime goes down by a constant 14% ... if your patch were improving complexity, then we would expect the improvement to _scale_ with the number of elements.
At a high-level, these reserve calls are preparing for something that is about to happen, and "common wisdom" says these should help -- but we all know how "common wisdom" isn't always true.
The fact that they're actually slowing things down is very interesting! But we should try to explain why this is occurring before we accept this change. I would hate to see, for example, that we're working around a GCC-9.2/stdlibc++ bug, and when this test runs on other targets it actually gets worse (like "common wisdom" suggests it should).
@jaredgrubb If you reserve in a loop with a slowly increasing size, you'll get an allocation pattern like this:
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
If you don't reserve, then you'll get an allocation pattern like this, assuming a growth factor of 4:
4 16 64
Each time reserve() actually has to grow the buffer, you pay for an allocation, a copy (or move) of every existing element, the destruction of the old elements (which may be trivial), and a deallocation. The only time you should call reserve() inside a loop is when you know the final size up front, or when you know it's a few items of large size, so that you get a better allocation pattern than the built-in growth factor.
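To see the effect in isolation, here is a small self-contained sketch (all names are mine) that counts how often the buffer gets reallocated under each strategy. Note that common implementations make reserve() allocate exactly the requested capacity, which is what defeats geometric growth:

#include <cstdio>
#include <vector>

static size_t count_reallocations(bool reserve_each_step, size_t n) {
  std::vector<int> v;
  size_t reallocations = 0;
  size_t old_capacity = v.capacity();
  for (size_t i = 0; i < n; ++i) {
    if (reserve_each_step) {
      v.reserve(v.size() + 1);  // bypasses geometric growth
    }
    v.push_back(0);
    if (v.capacity() != old_capacity) {
      ++reallocations;
      old_capacity = v.capacity();
    }
  }
  return reallocations;
}

int main() {
  const size_t n = 100000;
  // reserve() in the loop: roughly one reallocation per element, and each
  // reallocation copies all existing elements -- quadratic total work.
  std::printf("reserve in loop: %zu reallocations\n",
              count_reallocations(true, n));
  // Default growth is geometric, so only O(log n) reallocations.
  std::printf("default growth:  %zu reallocations\n",
              count_reallocations(false, n));
  return 0;
}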
Here is a comment from STL (Stephan T. Lavavej), the maintainer of the Visual C++ Standard Library:
https://www.reddit.com/r/cpp/comments/1qhloy/fun_and_danger_with_stdvectorreserve/
To be fair, this is a non-obvious gotcha. reserve() is the only member function allowed to bypass geometric growth, and is therefore the most subtle way to trigger quadratic complexity out of the ordinarily excellently-behaved vector. (There are obvious and stupid ways, like insert()-at-front, which the STL's interface intentionally makes difficult and obvious by not providing push_front().)
I myself made this mistake in my personal library about 10 years ago when I was getting started with the STL.
Further elaboration of the above: https://www.reddit.com/r/cpp/comments/1qhloy/fun_and_danger_with_stdvectorreserve/cdd7dcx/
Ah! That makes a lot of sense thanks for sharing that. I learned something.
(Also, I looked back to the numbers to see what I missed. I made a mistake in my "14%..." claim, as I was taking the last two benchmark sets, which are clearly both marked as linear. Looking at the first two, the improvement is clearly scaling as claimed ... so I take all that back)