The docs for LocalKey say:
This key uses the fastest possible implementation available to it for the target platform. It is instantiated with the thread_local! macro and the primary method is the with method.
This is sort of true, but sort of not at the same time. I'm not sure if this means the docs should be clarified, or if this is an implementation bug.
Consider this, for example: https://godbolt.org/z/WEvrPn
Simply accessing a value in TLS is much, much heavier in Rust than in C.
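The linked snippet is not reproduced here. As a rough sketch of the kind of access being discussed, assuming a hypothetical Cell<u32> counter named COUNTER and a helper named bump (neither is taken from the link), a thread-local read and write through LocalKey::with looks like this:

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical per-thread counter, used only for illustration.
    static COUNTER: Cell<u32> = Cell::new(0);
}

/// Increments the calling thread's counter and returns the new value.
/// Every call goes through `LocalKey::with`, which is the access path
/// whose cost is being compared against plain C thread-local access.
fn bump() -> u32 {
    COUNTER.with(|c| {
        let next = c.get() + 1;
        c.set(next);
        next
    })
}

fn main() {
    let first = bump();
    let second = bump();
    // Each thread owns its own copy, so a fresh thread starts from zero.
    let spawned = thread::spawn(bump).join().unwrap();
    println!("main: {first}, {second}; spawned thread: {spawned}");
}
```

Building something like this with optimizations and inspecting the assembly for bump is one way to see how much work sits between the call and the actual load and store.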
There are two issues here:
1. [...] dlopen) but slower. You can see this in the C code as well if you compile with -fPIC. We could use the local-exec TLS model in Rust if we know that the current crate is going to be embedded directly in an executable binary (as opposed to a shared library). You can try it on nightly with -Z tls-model=local-exec.
2. The thread_local! macro can't recognize constant expressions. This adds unnecessary checks on the "constructed" flag.
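To make the "constructed" flag point concrete, here is a simplified model of the two access patterns. It is not the standard library's implementation: LazySlot, ConstSlot, flag_checks and forty_two are hypothetical names invented for this sketch. The lazy slot has to test a flag on every access because its initializer is an arbitrary function, while a slot known to hold a constant could skip that test entirely.

```rust
use std::cell::Cell;

/// Hypothetical model of a lazily initialized slot (not the std implementation).
struct LazySlot {
    constructed: Cell<bool>,
    value: Cell<u32>,
    init: fn() -> u32,
    /// Counts how often the flag was tested; exists only for illustration.
    flag_checks: Cell<u32>,
}

impl LazySlot {
    fn new(init: fn() -> u32) -> Self {
        LazySlot {
            constructed: Cell::new(false),
            value: Cell::new(0),
            init,
            flag_checks: Cell::new(0),
        }
    }

    /// Every access tests the flag, even after the value has been built.
    fn get(&self) -> u32 {
        self.flag_checks.set(self.flag_checks.get() + 1);
        if !self.constructed.get() {
            self.value.set((self.init)());
            self.constructed.set(true);
        }
        self.value.get()
    }
}

/// Hypothetical slot whose initial value is a constant: no flag is needed.
struct ConstSlot {
    value: Cell<u32>,
}

impl ConstSlot {
    const fn new(value: u32) -> Self {
        ConstSlot { value: Cell::new(value) }
    }

    /// A plain load with no initialization check.
    fn get(&self) -> u32 {
        self.value.get()
    }
}

fn forty_two() -> u32 {
    42
}

fn main() {
    let lazy = LazySlot::new(forty_two);
    assert_eq!(lazy.get(), 42);
    assert_eq!(lazy.get(), 42);
    // Two accesses, two flag tests, although only the first one did any work.
    assert_eq!(lazy.flag_checks.get(), 2);

    let eager = ConstSlot::new(42);
    assert_eq!(eager.get(), 42);
}
```

In this model the second call to lazy.get() still pays for the branch, which is the overhead the comment describes for initializers that are in fact constant.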