Zstd: Why are magic tricks used in code? Stop, plz!

Created on 7 Dec 2019  Â·  7Comments  Â·  Source: facebook/zstd

All 7 comments

@remittor, IMHO it's plain old C code, no magic here. What do you find disturbing or magic?

@luben
You can imagine that in Linux kernel in the same way corrected UBSan?

Linux developers fix it like this: compiler.h, compiler-gcc.h, compiler-clang.h, compiler-attributes.h, etc.

Google writes this: https://github.com/google/sanitizers/wiki/AddressSanitizer#turning-off-instrumentation

I still not see how highlighted lines has any relation to ASan or UBSan. Can you point specific problem?

Fix a bug in the compiler.

Null pointer addition is undefined behaviour in C/C++. It is better to fix this ticking bomb now then debugging weird behaviour of zstd in the future due to new compiler releases.

You can propose to C/C++ standard committee a paper and make a behaviour of null pointer addition as defined.

I think (not sure) this optimization (based on known UB) can be disabled by -fno-delete-null-checks compiler flag. Linux kernel is safe, since kernel uses this flag.

We want zstd to be as widely portable as possible, and work on many different platforms and compilers, which means removing undefined behavior.

The kernel is special, because it only compiles with gcc (and now clang), and controls the platform it runs on (since it is the platform).

Previously, it was like this:
https://github.com/facebook/zstd/blob/d9646dcbb5d6fd265bfe19b5376ef8e15cf3cf11/lib/compress/zstd_compress.c#L3778-L3782
Now it has become like this:
https://github.com/facebook/zstd/blob/659e9f05cf227b14b2d2aafbc29b28184e669f0d/lib/compress/zstd_compress.c#L3797-L3802

Surely someone can see the safe code in this.
But I only see here dirty hack for spoofing sanitizer.

Passing the sanitizer is not our primary goal here. The sanitizer raised a valid (if somewhat recondite) issue.

We want Zstd to run correctly on as many platforms as possible, including exotic ones. The best way to ensure that Zstd will work on other people's hardware/compilers/runtimes is to write code that complies with the C Language Standard. (To get yourself into the mindset of writing cross-platform C, you might be interested in this quick quiz.)

At any rate, section 6.5.6.8 of the C18 standard states that:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

[Emphasis mine.]

A pointer to a non-array object is treated like a pointer to the first element of an array of length one (6.5.6.7). It is therefore undefined behavior to add anything other than zero (or maybe one?) to a null pointer.

We use null pointers as sentinel values in these variables in some cases. These constructs are therefore necessary for Zstd to avoid undefined behavior (which can eat your lunch).

I'm sorry that you find them distracting.

Finally, note that gcc and clang squash these constructs: https://godbolt.org/z/4wRAc2, so they do not impose a runtime overhead on platforms where out-of-bounds pointer addition happens to behave like in-bounds pointer addition.

I hope that clarifies things!

Was this page helpful?
0 / 5 - 0 ratings