Zig: thread local variables

Created on 15 Apr 2018 · 25 comments · Source: ziglang/zig

var x: i32 = 1; // global variable
threadlocal var y: i32 = 2; // thread local variable

Here's one use case (taken from std/debug/index.zig):

var panicking: u8 = 0; // TODO make this a bool

pub fn panicExtra(trace: ?&const builtin.StackTrace, first_trace_addr: ?usize,
    comptime format: []const u8, args: ...) noreturn
{
    @setCold(true);

    if (@atomicRmw(u8, &panicking, builtin.AtomicRmwOp.Xchg, 1, builtin.AtomicOrder.SeqCst) == 1) {
        // Panicked during a panic.

        // TODO detect if a different thread caused the panic, because in that case
        // we would want to return here instead of calling abort, so that the thread
        // which first called panic can finish printing a stack trace.
        os.abort();
    }
    const stderr = getStderrStream() catch os.abort();
    stderr.print(format ++ "\n", args) catch os.abort();
    if (trace) |t| {
        dumpStackTrace(t);
    }
    dumpCurrentStackTrace(first_trace_addr);

    os.abort();
}
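A minimal sketch of how a `threadlocal` flag could address the TODO in the snippet above. A per-thread flag tells a nested panic on the same thread apart from a panic that started on a different thread. The names `panicking_here`, `Action`, and `enterPanic` are hypothetical, and the code assumes Zig 0.12 or later (`std.atomic.Value`, `.seq_cst`), not the 2018 syntax of the snippet.

```zig
const std = @import("std");

// Process-wide: becomes 1 as soon as any thread starts panicking.
var panicking = std.atomic.Value(u8).init(0);

// Per-thread: true only on a thread that has itself entered the panic path.
threadlocal var panicking_here: bool = false;

// Hypothetical outcomes for one call into the panic handler.
const Action = enum {
    report, // first panic in the process: print the message and trace
    abort_nested, // this thread panicked while handling its own panic
    wait_for_other, // another thread is already reporting; let it finish
};

fn enterPanic() Action {
    // Checked first, so a nested panic is never mistaken for another thread's.
    if (panicking_here) return .abort_nested;
    panicking_here = true;
    // Xchg returns the previous value: 1 means some other thread got here first.
    if (panicking.swap(1, .seq_cst) == 1) return .wait_for_other;
    return .report;
}
```

In a real handler, `.wait_for_other` would map to the "return here instead of calling abort" branch the TODO describes, while `.abort_nested` keeps the existing `os.abort()` behavior.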

Labels: accepted, proposal


andrewrk


All 25 comments

The compiler should emit a warning/error if a thread-local variable is used by a single thread only.
Taking a pointer to such a variable and passing it to other threads may be forbidden by the compiler.
When the value is initialized at comptime, is it calculated once, stored somewhere, and then used as the initial value for every thread? What if comptime evaluation returns a different value each time, say some unique number?

PavelVozenilek on 16 Apr 2018
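To illustrate the initialization question, here is a minimal sketch assuming Zig 0.12 or later on an ELF target such as Linux. A thread that changes its own copy does not affect the starting value seen by a newly spawned thread. `counter` and `bumpAndRead` are hypothetical names.

```zig
const std = @import("std");

// The initializer is evaluated once, at compile time. On ELF targets the
// result is stored in the TLS initialization image (.tdata), and each new
// thread's copy of `counter` starts from that image.
threadlocal var counter: i32 = 2;

// Adds 10 to the calling thread's copy and reports the result.
fn bumpAndRead(out: *i32) void {
    counter += 10;
    out.* = counter;
}
```

Because a container-level initializer must be comptime-known, it is computed once per build rather than once per thread, so every thread starts from the same value: here the spawned thread sees 2 + 10 = 12 even after the test thread set its own copy to 100.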

Instead of making this a language-level feature, could this be implemented in the standard library?

phase on 16 Apr 2018

Technically yes, but you'd miss out on out-of-the-box tool support. gdb and lldb know about the .tbss and .tdata sections; they would first need to be taught any Zig-specific scheme.

The compiler should emit a warning/error if a thread-local variable is used by a single thread only.

Taking a pointer to such a variable and passing it to other threads may be forbidden by the compiler.

I don't know if that is a reasonable expectation; that kind of data flow analysis is a Hard Problem. I'm not even sure it's possible in general without imposing additional restrictions à la linear types.

bnoordhuis on 16 Apr 2018

For thread safety you may want to consider thread local by default and require the user to declare memory as global. Thread-local by default, plus a thread-safe way to move memory between threads, would be a huge win.

bheads on 16 Apr 2018

For thread safety you may want to consider thread local by default and require the user to declare memory as global.

I understand this is what D does, but I'm not convinced this is the best thing to do. Thread local data has a very specific use case but it is not a general solution to data races within a thread. For example, thread local buffers cannot be used in a function which is directly or indirectly recursive. It also comes at a cost. Less thread local data makes threads less expensive.

andrewrk on 16 Apr 2018
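A minimal sketch of the recursion problem mentioned above, assuming Zig 0.12 or later. The hypothetical function `describe` formats into a thread-local scratch buffer, and its recursive call overwrites the text that the outer call is still holding a slice to.

```zig
const std = @import("std");

// Per-thread scratch buffer: never shared between threads, but shared by
// every call frame on the same thread.
threadlocal var scratch: [32]u8 = undefined;

// Formats `n` into `scratch`, then recurses `depth` times with `n + 1`.
// "value=" plus at most 10 digits always fits in 32 bytes.
fn describe(n: u32, depth: u32) []const u8 {
    const s = std.fmt.bufPrint(&scratch, "value={d}", .{n}) catch unreachable;
    if (depth > 0) _ = describe(n + 1, depth - 1);
    return s; // points into `scratch`, which the inner call may have rewritten
}
```

`describe(7, 0)` returns `value=7`, but `describe(7, 1)` returns `value=8`: thread-local storage removes sharing across threads, not reentrancy on a single thread.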


@andrewrk Just spent an hour dealing with a thread local bug in D, I think you are right...

bheads on 4 May 2018

I'm curious...what alternatives to thread-local-by-default are you thinking of, @andrewrk?

Rust-style ownership and borrowing?

Pony-style builtin actor-model with thread-local deterministic GC?

nordlow on 26 Jun 2018

@nordlow, who said the alternative is more complex than the simple approach above: threadlocal var y: i32 = 2;?

isaachier on 26 Jun 2018

Doesn't that put Zig in the inter-thread-data-races-by-default language group?

Isn't that something languages such as Rust and D have "designed away", and that I thought no new systems language would ever have again?

nordlow on 26 Jun 2018

As I've said in another issue, that is exactly what Zig's specialty is: allowing precise control over dangerous scenarios as long as errors are returned explicitly and not thrown up the stack. That way the user is in control of the behavior, whether or not races are involved. I think Zig competes with Rust because it is much more flexible and does not burden the user with proofs of correctness upfront. That can always be added later using external tooling/annotations (see Frama-C for an example of this). Personally, I don't see the use of globals being thread-local by default even when threads aren't being used.

isaachier on 26 Jun 2018


Are you saying that Zig will statically detect data races and report them to the developer as compilation errors?

nordlow on 26 Jun 2018

I doubt it. Does C do that? More likely a runtime tool will exist to check for races (similar to Clang's ThreadSanitizer). If you are looking for a language that makes it impossible to shoot yourself in the foot, I think Zig is not the answer. If you want a language that lets you shoot yourself in the foot, but provides a myriad of ways to avoid it and detect it, then Zig is the answer.

isaachier on 26 Jun 2018

I'm curious...what alternatives to thread-local-by-default are you thinking of, @andrewrk?

It sounds like you see thread-local-by-default as solving some problem, and I challenge that here: https://github.com/ziglang/zig/issues/924#issuecomment-381673973

For concurrency (See #174), I'm experimenting with async/await (an event loop with coroutines multiplexed upon kernel threads) and atomics in the self-hosted compiler. If I can show that you can use higher level abstractions in this style relatively easily, then I think the problem is solved.

andrewrk on 26 Jun 2018

No, of course I realize that thread-local storage has its pitfalls as well. But race conditions are less likely in a multi-threaded context when top-level variables are thread-local by default. That's at least my experience. My private language of choice is D. I am, however, very interested in the progression of other languages such as Rust and Zig, and want to understand all the different ways in which we can make the best use of multi-core CPUs as safely as possible. D attacks this using strong or weak function purity (pure), default thread-local storage, and immutable GC-backed allocation. Rust uses ownership and borrowing combined with atomics and refcounted allocation at the bottom of its stack. I'm very curious whether Zig has a similar or a different strategy for tackling memory-safe concurrency (task-based parallelism).

Update: Ahh, sorry, I'm confusing thread-local storage with function purity and strong immutability (shareable by default). To safely send data by either immutable or isolated references, we need some kind of built-in data qualifiers for expressing immutability and isolatedness. What is Zig's take on these issues?

nordlow on 27 Jun 2018

I strongly disagree. The simplest optimization for multithreading I know of is to create a thread pool. Work is put into a queue, then operated on. Now imagine task A is interested in queueing task B but assumes it shares the same variables. By making all variables thread local we actually are more likely to make an error if task B runs on another thread than we would have if the variables were truly global.

isaachier on 27 Jun 2018
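A minimal sketch of the thread-pool pitfall described above, assuming Zig 0.12 or later. A single spawned thread stands in for a pool worker, and `request_id`, `Seen`, and `taskB` are hypothetical names. Task B reads the worker's copy of the thread-local, not the copy task A set, so the value has to travel with the task explicitly.

```zig
const std = @import("std");

// Hypothetical per-task context that task A sets before queueing task B.
threadlocal var request_id: u32 = 0;

const Seen = struct { via_tls: u32, via_arg: u32 };

// Task B, executed on a worker thread. It records both the worker's own copy
// of the thread-local and the value task A passed along as an argument.
fn taskB(captured_id: u32, out: *Seen) void {
    out.* = .{ .via_tls = request_id, .via_arg = captured_id };
}
```

When task A sets `request_id = 42` and hands B to a worker, B observes `via_tls == 0` but `via_arg == 42`. With a truly global variable, B would have seen 42 either way, which is the point being made here.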


Note, given that some architectures/OSes do not support TLS, this makes #1764 especially important. With #1764 I feel comfortable accepting this issue, because we can make thread local variables be global variables when --single-threaded is selected. This protects the OSes/architectures that do not support TLS.

andrewrk on 21 Nov 2018

This protects the OSes/architectures that do not support TLS.

Isn't TLS always possible, but not always fast/efficient?
You can always degenerate into perthreadvariables[gettid()]

daurnimator on 1 Feb 2019

You could do that if you were always in control of the creation of threads. But if you are, for example, a library, and the thread is created externally and then calls your function, then you would have no perthreadvariables global. It has to be created when thread memory is allocated. That's the main point of TLS as a language feature: it goes into object files and libraries, and the linker keeps track of the perthreadvariables.

andrewrk on 1 Feb 2019

a library, and the thread is created externally and then calls your function, then you would have no perthreadvariables global.

Why couldn't it be local to the library? (And in fact the space per thread only needs to account for the amount of thread-local storage used by that library.)

daurnimator on 1 Feb 2019

Can you elaborate in detail how it would work? Here are some example questions I have: Where is the per thread memory? If allocated statically, how do you know the total number of threads that will ever be created? How do you know that two calls to gettid() which return the same value, refer to the same thread, and not a recycled tid? If allocated dynamically, how do you deal with allocation failure, when a variable load and store cannot fail?

andrewrk on 1 Feb 2019

Where is the per thread memory?

Statically, via loading/linking the library.

If allocated statically, how do you know the total number of threads that will ever be created?

You may not! perthreadvariables would need to be of length max_thread_id. This isn't even that bad if there is an MMU.

How do you know that two calls to gettid() which return the same value, refer to the same thread, and not a recycled tid?

Good question. This does seem to kill the concept when you have no thread-cleanup. I guess we can't count on posixy robust mutexes here?
When I saw this idea last applied it was at the kernel level where you could at least set up handlers for thread cleanup.

daurnimator on 1 Feb 2019
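A minimal sketch of the library-private emulation discussed above, assuming Zig 0.12 or later. Instead of a static array of length max_thread_id it uses a hash map keyed by `std.Thread.getCurrentId()`; `EmulatedTls` and its methods are hypothetical. It makes the objections concrete: every access takes a lock, the first access from a thread allocates (so loads and stores become fallible), and nothing removes a slot when a thread exits.

```zig
const std = @import("std");

// Hypothetical emulation of one thread-local i32, private to a library.
const EmulatedTls = struct {
    mutex: std.Thread.Mutex = .{},
    slots: std.AutoHashMap(std.Thread.Id, i32),
    initial: i32,

    fn init(allocator: std.mem.Allocator, initial: i32) EmulatedTls {
        return .{
            .slots = std.AutoHashMap(std.Thread.Id, i32).init(allocator),
            .initial = initial,
        };
    }

    fn deinit(self: *EmulatedTls) void {
        self.slots.deinit();
    }

    // Unlike a real threadlocal load, this can fail: the first access from a
    // thread allocates its slot. Values are copied out under the lock because
    // a pointer into the map would dangle after another thread's insert.
    fn get(self: *EmulatedTls) !i32 {
        self.mutex.lock();
        defer self.mutex.unlock();
        const gop = try self.slots.getOrPut(std.Thread.getCurrentId());
        if (!gop.found_existing) gop.value_ptr.* = self.initial;
        return gop.value_ptr.*;
    }

    fn set(self: *EmulatedTls, value: i32) !void {
        self.mutex.lock();
        defer self.mutex.unlock();
        // No thread-exit hook ever removes this entry, so a recycled
        // thread id would inherit the stale value.
        try self.slots.put(std.Thread.getCurrentId(), value);
    }
};
```

A compiler-supported `threadlocal` avoids all three costs because the runtime or loader sets up each thread's block when the thread itself is created.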

Now that we have #1764 done, this issue should be a breeze. The important thing to note here is that on some architecture/operating system combinations, thread local storage is not available. On these targets, --single-threaded is always on, and thus in Zig you can always use thread local storage, because it will become global variables in this case.

If someone knows of a target that supports threading and does not support thread local storage, I would love to know about that.

andrewrk on 2 Feb 2019
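A minimal sketch of why this works for user code, assuming Zig 0.12 or later, where the option is spelled `-fsingle-threaded` and `@import("builtin").single_threaded` reports it. `call_count`, `countCall`, and `countOnOtherThread` are hypothetical. The `threadlocal` declaration is written once; only the path that spawns threads needs to check the build mode.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Written once. In a single-threaded build there is only one thread, so this
// can be stored as an ordinary global, as proposed in this thread.
threadlocal var call_count: u32 = 0;

fn countCall() u32 {
    call_count += 1;
    return call_count;
}

// Calls countCall on a second thread when threads exist. The condition is
// comptime-known, so in a single-threaded build the spawning branch is never
// compiled and the count simply continues on the only thread.
fn countOnOtherThread() !u32 {
    if (builtin.single_threaded) {
        return countCall();
    } else {
        var result: u32 = 0;
        const t = try std.Thread.spawn(.{}, struct {
            fn run(out: *u32) void {
                out.* = countCall();
            }
        }.run, .{&result});
        t.join();
        return result;
    }
}
```

After two calls on the main thread, `countOnOtherThread()` yields 1 in a multi-threaded build (the new thread starts from its own zeroed copy) and 3 in a single-threaded build (one shared global).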

Some references I found:

Here is Ulrich Drepper's paper on the TLS implementation in ELF:
https://akkadia.org/drepper/tls.pdf

If someone knows of a target that supports threading and does not support thread local storage, I would love to know about that.

Motivation of D's TLS by default:
http://www.drdobbs.com/cpp/its-not-always-nice-to-share/217600495

Walter mentions that OSX does not have TLS, or more specifically it did not as of 2009 although C++11 was supposed to push this due to the standardised keyword.
Manual implementations back then (2010/2009):
http://www.drdobbs.com/architecture-and-design/implementing-thread-local-storage-on-os/228701185
https://lifecs.likai.org/2010/05/mac-os-x-thread-local-storage.html

If I am not mistaken, support for the __thread C keyword was added in OS X 10.7 (2011).

Clang in XCode 8 added support for the C++ keyword as seen on TV (2016):
https://developer.apple.com/videos/play/wwdc2016/405/?time=354

They also mention the differences/limitations of the C++ (all types, compatible but slower) vs. C keyword (basic types+POD only but faster).

The Mach-O TLS section was added around 2015 I believe (based on clang commits) so I assume that before one of the workarounds was used. I am by no means an expert here so correct me if I am wrong.

But, based on this example, it might be safe to assume that a platform with threads does not necessarily provide built-in TLS support, depending on its object format. Libraries/compilers may then need to roll their own, which is what D did back in 2010.

bfloch on 3 Feb 2019

But, based on this example, it might be safe to assume that a platform with threads does not necessarily provide built-in TLS support, depending on its object format.

Thanks for doing this research. However, I'm not sure I agree with your conclusion.

The way I would go about this is starting with the LLVM documentation, which says:

Not all targets support thread-local variables.

Unfortunately it doesn't say more than this in the documentation, so it is necessary to dive into the source to find the actual list of targets and whether they support TLS.

Next, look at each target which does not support TLS one by one and try to come up with a program that uses threads. I suspect that for each of these, in Zig, we can make --single-threaded unconditionally enabled. Any use cases which are exceptions to this we should examine explicitly, and not in an abstract sense.

andrewrk on 3 Feb 2019


I have this working for Linux x86_64 (need to polish it up a bit before committing). Next, the other supported targets. MacOS and FreeBSD are probably easy since they always link libc, and thus handle the thread local storage setup before calling main. The Windows one is a complete mystery; I have not looked up how that will work yet.