Rust: RwLock and Mutex on Window theoretically allows undefined behavior in safe code

Created on 19 Aug 2016 · 40Comments · Source: rust-lang/rust

According to this back and forth on a Microsoft blog post, it is currently undefined behavior to even try to acquire an SRWLock recursively, even recursive read locks.

It might (and probably will) cause the lock to fail to fulfil its contract in the future (e.g,. allow two simultaneous exclusive acquisitions). And since wait nodes are threaded on the stack, it can result in stack memory corruption.

~~Also apparently NT keyed events have no stability guarantee so the current implementation of parking_lot on Windows could theoretically break with a new version of Windows.~~ No longer an issue as parking_lot uses the stable WaitOnAddress on newer Windows.

So uh, what do?

C-bug E-help-wanted I-unsound 💥 O-windows P-medium T-libs

Source

retep998

Most helpful comment

So this is solved then?

The official documentation is pretty clear now. And @kennykerr verified it. :)

And as @RalfJung already mentioned, making an SRW lock do anything other than deadlock on a recursive lock() call would require more effort. There's not really any space in such a lock to keep track of the thread that locked it.

The only thing left is the contradictory comments on that archived blog post. But those are a bit vague (and a bit aggressive :eyes:), and feels mostly like miscommunication. The author seems to misunderstand that a deadlock is the outcome we want, as opposed to succesfully recursively locking it: "The fact that they cannot be acquired recursively is what makes them slim!" (They are slim because they don't have to keep track of the thread id, which would mean recursive lock() deadlocks, which is good, that's what we want.) The author also says he isn't sure what Peter means with memory unsafe. There are not many languages where getting a deadlock in this case would be a good thing, so the response seems mostly something like 'just don't do this because it's always wrong, and since you shouldn't do this, we're not making any promises'. And that make sense from a C or C++ perspective, where there's not much reason to specify the difference between a deterministic deadlock and undefined behaviour, as both are unwanted and should be avoided anyway. Things are a bit different for Rust, where the distinction is very important to guard the unsafe/safe border. Luckily they decided to do make this promise after all and update the documentation.

m-ou-se on 1 Oct 2020

👍2

All 40 comments

cc @rust-lang/libs

nagisa on 19 Aug 2016

I don't see any easy way out of this. Consider this example:

let lock = RwLock::new();
mem::forget(lock.read());
lock.read();

To avoid undefined behavior we would have to either:

Have the RwLock track the thread IDs of all active readers so that we can detect a recursive lock.
Have each thread track the RwLocks that it has locked in thread local storage.
Both of these solutions require O(n) space and a lot of overhead to manage it.

Amanieu on 20 Aug 2016

…or document recursive reader lock as UB for now?

nagisa on 20 Aug 2016

@nagisa So we're going to accept, even temporarily, that a safe function in std can invoke UB?

retep998 on 20 Aug 2016

@nagisa But the code I've just written is entirely safe code. We can't allow UB in safe code.

Amanieu on 20 Aug 2016

👍1

Is there not a non-slim R/W lock primitive on Windows?

durka on 20 Aug 2016

No, there isn't.

On Aug 20, 2016 5:44 AM, "Alex Burka" [email protected] wrote:

Is there not a non-slim R/W lock primitive on Windows?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/35836#issuecomment-241174282,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AApc0gCFIgk5JgZR7j6tqhtgee6hWulpks5qhmoWgaJpZM4Jo4pj
.

nagisa on 20 Aug 2016

Even though the Windows API might not have non-slim RW locks, it does have loads of other synchronization objects that you can combine to form a recursive RW lock implementation. I wouldn't know what the best ones to use are, sorry.

andlabs on 20 Aug 2016

Boost.Interprocess provides a very complete mutex implementation that allows recursive lock acquisition and upgradeability. (I am not familiar with Rust, but I'd assume you have some way to call into C/C++ code if you're able to call Windows functions).

http://www.boost.org/doc/libs/1_61_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.sharable_upgradable_mutexes

yilangmok on 20 Aug 2016

We are definitely not depending on Boost. We can however look at their implementation to create our own mutex inspired by it.

retep998 on 21 Aug 2016

That page is for inter-process implementations of synchronisation primitives. It has considerably different design constraints compared to inter-thread stuff (documented here). The implementation of RWLocks in Boost seems to use a number (3) of semaphores in addition to some state.

nagisa on 21 Aug 2016

Thanks for the correction @nagisa. I knew Boost had a RWLock, so I assumed the first page I found would be the right one. Oops!

@retep998, of course it's up to the Rust community to decide what to use or not use. However, I've found that for some areas (including concurrency and cryptography), it's best to use an existing library that has been battle tested. Boost provides primitives that are well tested, performant and cross-platform. Why not consider it?

yilangmok on 21 Aug 2016

@yilangmok Because Boost is a massive dependency to pull in. Right now libstd is lightweight and pure Rust. Pulling in Boost would be hugely detrimental to libstd's weight. Also Boost is C++ so it would depend on the C++ runtime and standard library as well. Furthermore Rust is only able to easily work with C APIs, C++ is a massively complicated beast. Basically, I don't see any chances of Rust depending on Boost for something as simple as mutexes.

retep998 on 21 Aug 2016

👍2

Discussion at @rust-lang/libs triage today concluded that this seems like a good bug to fix, but not high-enough priority for P-high

alexcrichton on 23 Aug 2016

A nice suggestion from the comment section of the linked blog post:

Why can’t you keep using SRW locks and just keep a separate (possibly thread-local) variable to tell you whether you’ve already acquired the lock or not?

Extending the Windows RWLock structure as thus:

pub struct RWLock { 
    inner: UnsafeCell<c::SRWLOCK>,
    thread_local_reading: DWORD,
}

impl RWLock {
    pub /*const*/ fn new() -> RWLock { // cannot const anymore
        RWLock { 
           inner: UnsafeCell::new(c::SRWLOCK_INIT),
           thread_local_reading: c::TlsAlloc(), // free in drop!
        }
    }
    #[inline]
    pub unsafe fn read(&self) {
        if c::TlsGetValue(self.thread_local_reading) != 0 { panic!("recursive read") }
        TlsSetValue(self.thread_local_reading, 1);
        c::AcquireSRWLockShared(self.inner.get())
    }
    /* similar for try_read */

    #[inline]
    pub unsafe fn read_unlock(&self) {
        TlsSetValue(self.thread_local_reading, 0);
        c::ReleaseSRWLockShared(self.inner.get())
    }
}

would allow to fix the issue. Of course the TLS safeguard could be moved to any of the layers above plain wrapper; idea stays the same.

Alas, this limits the number of locks user could have at the same time to number of available TLS slots, which is 1088.

nagisa on 24 Aug 2016

Is the license for the source code in David Butenhof's "Programming with POSIX Threads" book known? The "best" information that I could find was from the pthreads-win32 COPYING file which has this to say:

The file tests/rwlock7.c is derived from code written by
Dave Butenhof for his book 'Programming With POSIX(R) Threads'.
The original code was obtained by free download from his website
http://home.earthlink.net/~anneart/family/Threads/source.html
and did not contain a copyright or author notice. It is assumed to
be freely distributable.

Currently, the source is available on the Informit.com website, which appears to be associated with the current publisher:
http://www.informit.com/store/programming-with-posix-threads-9780201633924

If the license is acceptable for use in Rust, I've written a fairly direct translation of his rwlock_t to use the Windows Vista+ primitives. By source examination, as it prefers readers over writers, recursive read locks will succeed, recursive write locks will deadlock, and acquiring a write lock while the same thread holds a read lock will deadlock. The text for the book describes how to convert it to a writer preference, which should make most recursive operations deadlock eventually, but as that text is not monetarily freely available, I have not provided that as an alternative. There should be no memory corruptions during normal operations†.

As I'm not a Rust developer, it's in C (maybe with some accidental C++ constructs). Someone more experienced in Rust could likely translate it, but that task feels like a bad fit for "my first Rust program".

The translation is available here:
https://gist.github.com/mwinterb/fcf29c312950e2c51ffa47822c8c5241

It's likely not the most efficient implementation of a reader/writer mutex as it is 10 years old, but it's O(1) space and has no dynamic memory allocations, so presumably pub const fn new() -> RWLock can remain const.

† Internally, some counters use int, so if there are more than INT_MAX read acquisitions, pending readers, or pending writers, undefined behavior will technically occur in the C implementation.

mwinterb on 26 Aug 2016

@nagisa That could work with my thread_local crate which gives you per-object thread-local storage.

However I still think that the proper solution to this is to use the parking_lot crate which provides implementations of Mutex, RwLock and Condvar that work on all platforms including Windows XP. This was proposed in a RFC (https://github.com/rust-lang/rfcs/pull/1632) but it in the end it was not accepted.

Amanieu on 26 Aug 2016

👍2

On 2016-08-25 18:26:57-0700, Amanieu wrote:

@nagisa That could work with my thread_local crate
which gives you per-object thread-local storage.

However I still think that the proper solution to this is to use the
parking_lot crate which provides implementations of
Mutex, RwLock and Condvar that work on all platforms including Windows XP. This was proposed
in a RFC (https://github.com/rust-lang/rfcs/pull/1632) but it in the end it was not accepted.

I do not disagree. I’m just brainstorming for simple and quick-to-implement fixes which could paper
over immediate issue without us doing any sort of large scale change.

-- You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/rust-lang/rust/issues/35836#issuecomment-242595153

nagisa on 26 Aug 2016

I'm interested in fixing this.

I think the best long-term solution is to use parking_lot in the stdlib. For now though, @nagisa's solution (above) is much better than UB, no? It's very unlikely that anyone is using more than 1,088 TLS + RwLock, and we can at least panic if that happens.

I'll send in a PR to do that unless someone has a better idea.

mattico on 13 Mar 2017

Making RwLock::new() non-const _shouldn't_ be an issue because:

const_fn is unstable so who cares 😄
Everyone should be using sync::RWLock which does not have a const constructor

_except_ panicking.rs in libstd uses the underlying const-fn impl now that StaticRWLock has been removed.

Not an insurmountable problem but there's going to have to be some hackery, like an ad-hoc reimplementation of lazy_static! - _which should be in std, IMO_.

mattico on 13 Mar 2017

@mattico I think such a solution would limit a process to at most 1088 instances of RwLock, right? If that's so that seems... unfortunately too low :(

alexcrichton on 13 Mar 2017

Oh that's per process, I thought it was per-thread. That might actually be an issue, yes.

mattico on 14 Mar 2017

Maybe allocate an array per thread (how big?) to use as a bit-set for the RwLock flags, and store a pointer to it in TLS. Then add an index into the set to each RwLock. Now we're only using one TLS per thread that has a RwLock, but it's a bit more of a hack 🙁 . I'll write it up and see how bad it is.

mattico on 14 Mar 2017

@mattico Have a look at the thread_local crate. It allows you to have per-object TLS without using up TLS indexes.

Amanieu on 14 Mar 2017

👍1

It might be better to handle this in the sync::RwLock impl instead of the windows platform implementation. It allows folks who know what they're doing (like std) to continue to use the unsafe API. This issue is similar to lock poisoning which we already deal with at this level. It is unfortunate to introduce platform-specific code into sync, though.

mattico on 14 Mar 2017

Shouldn't this be tagged as a soundness issue since this it allows UB in safe code?

CodesInChaos on 15 Feb 2018

Nobody has been able to actually demonstrate UB in safe code using this yet. It's only theoretical at the moment.

We really just need to migrate to parking_lot already.

retep998 on 15 Feb 2018

😕2 👍1

What is that supposed to mean? It's either defined or not. Do you mean it won't be tagged I-unsound unless someone demonstrates memory corruption?

durka on 15 Feb 2018

Nobody has been able to actually demonstrate UB in safe code using this yet. It's only theoretical at the moment.

The burden of proof is on the code author to show that their code does not have UB. If the code has UB, it might do anything, including working perfectly well. One cannot show absence of UB by example.

The Windows Mutex implementation is also wrong (lack of proper initialization), and RwLock is UB on macOS as well (and possibly other non-Linux POSIX platforms) because macOS also says recursive locking of an RwLock is UB when a write lock is involved. (Only recursive write and read-write locking though. Making recursive reads UB is just... wtf, what were they thinking?? And the documentation for the relevant function does not even mention this "detail". Wow.)

Oh and the Mutex situation is horrible on all POSIX platforms because there is no const fn way to initialize a sane Mutex. So Mutex initialization is split into two parts in the std::sys layer, and using the "half-initialized" one (which we do in a couple places in libstd) makes recursive locking UB as well. Together with the restrictions about moving things, the API provided by sys::mutex is just plain horrible and I am somewhat nervous about the fact that it might be used incorrectly somewhere in libstd.

I implemented a reentrancy checker for Mutex (which is easier than RwLock) because there are no read locks), and it makes my microbenchmarks slower by ~20%. Not great.

RalfJung on 7 Aug 2018

👍1

Oh, and we are also using SRWLock for Mutex on Windows, so it will have the same UB.

We really just need to migrate to parking_lot already.

I have been thinking about implementing a lazily initialized const fn-capable Mutex for POSIX and maybe even rearrange things a bit to get rid of the restriction about not moving a Mutex around.

But, given all the other problems in particular around RwLock (where at least two major platforms just do not have a reasonable platform-specific implementation) -- it seems more and more reasonable to just roll our own implementation. I also hear that parking_lot is faster than the platform implementations even where those have reasonable semantics.

So, how realistic is it to use parking_lot? Alternatively, we might be able to rearrange the abstractions a bit, moving the (un)park stuff to sys (implemented directly on top of the platform APIs; whether mutexes are recursive should not be a concern here) and then implement Mutex, RwLock and Condvar on top of that ourselves... which I guess is what parking_lot does anyway?

RalfJung on 7 Aug 2018

👍1

I've opened an internals post to continue more long-form discussion on the topic of continuing to fix these issues.

alexcrichton on 20 Sep 2018

I just looked at the SRW lock docs again and now it says explicitly

Exclusive mode SRW locks cannot be acquired recursively. If a thread tries to acquire a lock that it already holds, that attempt will fail (for TryAcquireSRWLockExclusive) or deadlock (for AcquireSRWLockExclusive)

So, uh, did they retroactively define behavior? And if yes, how far back in terms of platform support does this guarantee go?

RalfJung on 29 Feb 2020

Cc @retep998 @mati865 would be good if someone more knowledgeable in Windows matters could decide if there still is a bug here.

We have a direct contradiction between the statements in this post (including @retep998 asking "is it really actually UB" and the answer being "yes") and the docs change added here. The former says

It's a programming error. It is your responsibility as a programmer not to call AcquireSRWLockShared or AcquireSRWLockExclusive from a thread that has already acquired the lock. Failing to comply with this rule will result in undefined behavior.

and the latter now says (and thus presumably guarantees)

Exclusive mode SRW locks cannot be acquired recursively. If a thread tries to acquire a lock that it already holds, that attempt will fail (for TryAcquireSRWLockExclusive) or deadlock (for AcquireSRWLockExclusive)

Looks like Raymond Chen and whoever changed the docs have a different interpretation of what "cannot be acquired recursively" means. The docs version makes total sense; actually counting how often the current thread holds the write lock would require extra state so a simple naive ("slim") RwLock implementation will deadlock on recursive write locks.

I wonder if there is any way to ask Raymond Chen again in light of this new information?

RalfJung on 4 May 2020

I don't know anyone other than retep998 who could know those parts of Windows API well enough.

mati865 on 4 May 2020

I wanted a simple mutex that is faster than std::sync::Mutex but didn't want to pull in the whole parking_lot dependency so I wrote this crate: https://github.com/stjepang/simple-mutex

The implementation is fairly simple (it should be reviewable at least). Not quite as efficient as parking_lot but still better than std. The overhead is two words + something allocated on the heap on first case of contention.

I wonder if we should consider implementing a simpler version of parking_lot with reasonable performance and get that into the standard library. Think something with 20% of its complexity and 80% of performance.

stjepang on 25 May 2020

@rustbot ping windows

If any one of you is familiar enough with the details of SRWLocks on Windows, would be great if you could look into the question posed here. :)

RalfJung on 11 Jun 2020

Hey Windows Group! This bug has been identified as a good "Windows candidate".
In case it's useful, here are some [instructions] for tackling these sorts of
bugs. Maybe take a look?
Thanks! <3
cc @arlosi @danielframpton @gdr-at-ms @kennykerr @luqmana @lzybkr @retep998 @rylev @sivadeilra @spastorino

rustbot on 11 Jun 2020

If the lock is already held by the calling thread, AcquireSRWLockExclusive will hang and TryAcquireSRWLockExclusive will return false. Internally, there is a lock bit and this is used to check whether the lock may be acquired irrespective of thread.