https://github.com/facebook/zstd/blob/659e9f05cf227b14b2d2aafbc29b28184e669f0d/lib/compress/zstd_compress.c#L3758-L3762
https://github.com/facebook/zstd/blob/659e9f05cf227b14b2d2aafbc29b28184e669f0d/lib/compress/zstd_compress.c#L3800-L3802
https://github.com/facebook/zstd/blob/659e9f05cf227b14b2d2aafbc29b28184e669f0d/lib/compress/zstd_compress.c#L3862-L3864
https://github.com/facebook/zstd/blob/659e9f05cf227b14b2d2aafbc29b28184e669f0d/lib/compress/zstdmt_compress.c#L1789
https://github.com/facebook/zstd/blob/659e9f05cf227b14b2d2aafbc29b28184e669f0d/lib/decompress/zstd_decompress.c#L621
https://github.com/facebook/zstd/blob/659e9f05cf227b14b2d2aafbc29b28184e669f0d/lib/decompress/zstd_decompress.c#L672-L673
https://github.com/facebook/zstd/blob/659e9f05cf227b14b2d2aafbc29b28184e669f0d/lib/decompress/zstd_decompress.c#L1493-L1498
Possible variants:
1) Edit Zstd project, edit LZ4 project, edit over 10,000 project on github.
2) Configure compiler options.
3) Fix a bug in the compiler.
You have choose 1 variant! Why?
Controversial commit: https://github.com/facebook/zstd/commit/659e9f05cf227b14b2d2aafbc29b28184e669f0d
@remittor, IMHO it's plain old C code, no magic here. What do you find disturbing or magic?
@luben
You can imagine that in Linux kernel in the same way corrected UBSan?
Linux developers fix it like this: compiler.h, compiler-gcc.h, compiler-clang.h, compiler-attributes.h, etc.
Google writes this: https://github.com/google/sanitizers/wiki/AddressSanitizer#turning-off-instrumentation
I still not see how highlighted lines has any relation to ASan or UBSan. Can you point specific problem?
Fix a bug in the compiler.
Null pointer addition is undefined behaviour in C/C++. It is better to fix this ticking bomb now then debugging weird behaviour of zstd in the future due to new compiler releases.
You can propose to C/C++ standard committee a paper and make a behaviour of null pointer addition as defined.
I think (not sure) this optimization (based on known UB) can be disabled by -fno-delete-null-checks compiler flag. Linux kernel is safe, since kernel uses this flag.
We want zstd to be as widely portable as possible, and work on many different platforms and compilers, which means removing undefined behavior.
The kernel is special, because it only compiles with gcc (and now clang), and controls the platform it runs on (since it is the platform).
Previously, it was like this:
https://github.com/facebook/zstd/blob/d9646dcbb5d6fd265bfe19b5376ef8e15cf3cf11/lib/compress/zstd_compress.c#L3778-L3782
Now it has become like this:
https://github.com/facebook/zstd/blob/659e9f05cf227b14b2d2aafbc29b28184e669f0d/lib/compress/zstd_compress.c#L3797-L3802
Surely someone can see the safe code in this.
But I only see here dirty hack for spoofing sanitizer.
Passing the sanitizer is not our primary goal here. The sanitizer raised a valid (if somewhat recondite) issue.
We want Zstd to run correctly on as many platforms as possible, including exotic ones. The best way to ensure that Zstd will work on other people's hardware/compilers/runtimes is to write code that complies with the C Language Standard. (To get yourself into the mindset of writing cross-platform C, you might be interested in this quick quiz.)
At any rate, section 6.5.6.8 of the C18 standard states that:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression
Ppoints to thei-th element of an array object, the expressions(P)+N(equivalently,N+(P)) and(P)-N(whereNhas the valuen) point to, respectively, thei + n-th andi − n-th elements of the array object, provided they exist. Moreover, if the expressionPpoints to the last element of an array object, the expression(P)+1points one past the last element of the array object, and if the expressionQpoints one past the last element of an array object, the expression(Q)-1points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
[Emphasis mine.]
A pointer to a non-array object is treated like a pointer to the first element of an array of length one (6.5.6.7). It is therefore undefined behavior to add anything other than zero (or maybe one?) to a null pointer.
We use null pointers as sentinel values in these variables in some cases. These constructs are therefore necessary for Zstd to avoid undefined behavior (which can eat your lunch).
I'm sorry that you find them distracting.
Finally, note that gcc and clang squash these constructs: https://godbolt.org/z/4wRAc2, so they do not impose a runtime overhead on platforms where out-of-bounds pointer addition happens to behave like in-bounds pointer addition.
I hope that clarifies things!