Rust: Match on repr(C) enum returned from C library with unknown value leads to UB

Created on 3 Oct 2016 · 20Comments · Source: rust-lang/rust

Summary

It is not uncommon for C libraries to add more values to an enum in their API while considering it a non-breaking change. However, with a rust library linked to the C library it causes undefined behavior when the change is made if the rust enum definition is not updated.

Examples

I've reduced the issue down to this sample code where get_bad_kind() represents a call into a C library that returns a new enum value unknown to rust which leads to a crash at runtime (signal: 11, SIGSEGV: invalid memory reference or stack overflow): https://play.rust-lang.org/?gist=f066d8e489a6e220866958065891a812&version=stable&backtrace=0

The C version of this code does not result in this issue, instead it hits the default case and returns -1: https://gist.github.com/tcosprojects/49d0c809d40b8f008e25dacf49c508bf

Tested on

rustc 1.12.0 (3191fbae9 2016-09-23)
binary: rustc
commit-hash: 3191fbae9da539442351f883bdabcad0d72efcb6
commit-date: 2016-09-23
host: x86_64-unknown-linux-gnu
release: 1.12.0

and

rustc 1.12.0 (3191fbae9 2016-09-23)
binary: rustc
commit-hash: 3191fbae9da539442351f883bdabcad0d72efcb6
commit-date: 2016-09-23
host: i686-pc-windows-msvc
release: 1.12.0

Discussion

Discussing the issue on the #rust channel led to the following:

A binding library would need to match the rust definition of the enum with the C definition exactly at all times when linked to a C library to prevent either this undefined behavior or passing unsupported flags to older versions of the C library. This could be done in the cargo build script that checks the C library version and enables feature flags for these enum values. It would still not not guarantee safety though when using dynamically linked libraries. Or it could be resolved by not using repr(C) enums and instead translating them at runtime with a function that handles invalid values. If the latter should be done, it would be helpful to have it mentioned in the docs somewhere. It also makes me uncertain what the purpose of repr(C) enum is, if not to be compatible with and used with C FFI calls.

In any case, as a C programmer this bug was surprising and took some significant debugging. It required going into the assembly output of the match statement to see it jump based on the constant of the highest value from the rust enum definition and then corrupt the stack to pin down the cause of the crash.

A warning about this compatibility issue regarding repr(C) enum in the FFI docs would be helpful. I suspect it is not uncommon for rust bindings libraries to use repr(C) enum this way while expecting it to work.

If the match expression could jump to unreachable!() in debug builds when encountering an unknown value on a repr(C) enum type it would help catch this issue without so much debugging.

Questions

In the discussion on #rust there was some disagreement over whether this is undefined behavior for C. Is it?
In the discussion there was also disagreement over whether this should be considered a breaking ABI change by the C library. In practice it does not seem uncommon for C libraries to do this while considering it non-breaking. Is it?
Should this be a concern for rustc? Or handled elsewhere?
Is it possible for rust to match the behavior of the C program's switch when an enum is repr(C)?

C-feature-request T-lang

Source

tcosprojects

Most helpful comment

The worst part of UB is that it _might_ just do the "reasonable" thing you'd want...

cuviper on 21 May 2017

😄4 👍3

All 20 comments

match in Rust has exhaustive semantics and it is very prominently documented pretty much everywhere where match is mentioned. repr(C) only changes the representation of the enum (i.e. how it appears in memory) to match C representation, but no representational change affects (or should affect) the behaviour of rust expressions.

TL;DR: match is match, not switch by design, no matter what representation the enum has.

In the discussion there was also disagreement over whether this should be considered a breaking ABI change by the C library. In practice it does not seem uncommon for C libraries to do this while considering it non-breaking. Is it?

It is not a ABI breaking change, as adding a variant to a C enum (as long as the size of the actual integer which represents the enum in memory does not change) has no influence on the registers/memory locations where this enum gets placed when returned or passed as an argument.

It is, however, an API-breaking change.

Should this be a concern for rustc? Or handled elsewhere?

Rustc (its match) is behaving as intended IMO.

Is it possible for rust to match the behavior of the C program's switch when an enum is repr(C)?

Matching on enum as i64 gets you switch-like semantics. Though, it is by no means convenient.

nagisa on 4 Oct 2016

Ok, so some research suggests that 1) C and C++ have similar, but slightly different rules about the valid values of an enumeration type. However, it seems that neither are limited to the values defined by the declaration.

As far as I can tell, an enumeration type in C can take on any value of the underlying type. This means that adding an enum constant may-or-may-not be a breaking ABI change depending on the compiler and target (because of course it is). Fortunately, most platforms will default to i32 unless specifically fixed to another type.

With that in mind, we should probably do two things:

Not supply range metadata to LLVM for repr(C) enums, that eliminates one source of UB and allows for manual checking of the values if necessary by casting to an integer type.
Generate a default case if there isn't one already in the match when debug_assertions is enabled. I think that keeping exhaustiveness is valuable. It's also easier.

Aatch on 4 Oct 2016

The fact that Rust enums cause UB when they hold a value that isn't a valid variant is the primary reason I don't use Rust enums in winapi and instead use simple integer constants _{(The other reason being that integer constants don't require any trait impls or methods and significantly reduce compile times)}. C/C++ will often even use enums as bitflags which definitely don't align with Rust enums.

In the discussion on #rust there was some disagreement over whether this is undefined behavior for C. Is it?

Windows API uses enums as bitflags in both C and C++ which indicates that it is not undefined behavior in MSVC C/C++.

retep998 on 4 Oct 2016

Sure. We purged all Rust enums from rustllvm unless we control the other side. Maybe prohibit repr(C) on enums (we would have to find another repr tag)?

arielb1 on 4 Oct 2016

Matching on enum as i64 gets you switch-like semantics. Though, it is by no means convenient.

That's wrong. Creating a Rust enum with an out-of-range value is instant UB - we generate !range metadata, which would cause your switch to be optimized out.

arielb1 on 4 Oct 2016

I think that there should be a lint against using an enum type in FFI, except for the case where the enum cannot have been created or modified by foreign code. It is way too easy to mess this up.

A newtype of an integer is a much better solution.

DemiMarie on 31 Oct 2016

👍4 👎1

Perhaps we should allow extern functions which take an enum by value, but lint on any which return an enum or take a mutable pointer to an enum. And by perhaps, I mean we really need this because this is a giant footgun waiting to catch someone out.

retep998 on 31 Oct 2016

👍3

/subscribe

If a C API takes an enum by value, there's a decent chance it's actually bitflags, i.e. you're expected to be able to do E_THIS|E_THAT|E_THEOTHER in the call.

zackw on 6 Jan 2017

The usage of C enums as bitflags can be handled in Rust with things like the bitflags crate (or something similar). So that shouldn't be a big deal.

But this issue of undefined behavior for "out of range" enums is a real concern for me. bindgen even generates Rust enums from C enums. Describing this current state of affairs as a "foot gun" is a perfect description. Rust really needs an ergonomic solution to using C enums in a Rust FFI setting. Otherwise Rust's safety promises are moot.

mjbshaw on 26 Apr 2017

Wouldn't the simplest option be allowing the user to specify the default value to be selected in the event the enum is out-of-range? Something like:

#[repr(C, default = Invalid)]
enum Thing {
    Invalid,
    ...

At least for the enums I happen to be working with in this FFI, they all support a good choice of a default option.

tonychain on 26 Apr 2017

@mjbshaw

bindgen even generates Rust enums from C enums

Yes, but bindgen's generated bindings are intended to be created at compile time, so this should only be a problem for users that generate bindings from the command line.

Rust really needs an ergonomic solution to using C enums in a Rust FFI setting.

I would first approach this with a proc-macro that generates constants at compile time before adding yet-another-feature to the language.

Otherwise Rust's safety promises are moot.

I do not agree with this, I do not think that C enumerations are part of Rust's safety measures, what happens is that this problem is unsolvable (in my humble opinion) within the Rust language itself, rust-bindgen does great job trying to force users to generate bindings at compile time, thus avoiding UB with C enums. The only problem I see with rust-bindgen is the dependency of libclang (and the unstable API it has).

jeandudey on 27 Apr 2017

👎1

rust-bindgen does great job trying to force users to generate bindings at compile time, thus avoiding UB with C enums.

I don't think this provides the guarantees that you want! C enums can still have values outside of the explicitly stated variants. AIUI, basically anything within the same bit range is still legal.

cuviper on 27 Apr 2017

Note, also, that bindgen allows customizing the enums as needed (generating plain integers, for example).

The possibilities would be much larger with https://github.com/rust-lang/rfcs/pull/1758, though.

emilio on 27 Apr 2017

I would first approach this with a proc-macro that generates constants at compile time before adding yet-another-feature to the language.

Or you could use a regular declarative macro to define enums using normal syntax and have them turned into constants like winapi does.

retep998 on 27 Apr 2017

@jeandudey

Yes, but bindgen's generated bindings are intended to be created at compile time, so this should only be a problem for users that generate bindings from the command line.

How does generating bindings at compile time help? As I mentioned in servo/rust-bindgen#667 if the Rust program dynamically links to the C library then the C library could easily return an "out of range" enum (e.g. if the user's computer has a slightly different version of the C library; C programmers generally don't consider adding a value to an enum to be an API or ABI breaking change). Whether Rust's bindings are generated at compile time or not is irrelevant here.

mjbshaw on 27 Apr 2017

👍3

Before I take the time to switch over to using numerical constants instead of enums in my own project, may I ask if this is still a concern? I just tried running the playground example that was linked in the original post on every rust version from 1.12.0 up to 1.17.0:

1.12.0:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
error: Process didn't exit successfully: `target/debug/playground_application` (signal: 6, SIGABRT: process abort signal)

1.13.0-1.16.0:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow

On 1.17.0, however, everything seems to be fine:

Finished dev [unoptimized + debuginfo] target(s) in 0.22 secs
     Running `target/debug/playground_application`
-1

It exhibits the same behavior when run on play.rust-lang.org; not one of stable, beta, or nightly segfaults.

EDIT: It also doesn't segfault even if you don't have a _ => ... case: https://play.rust-lang.org/?gist=4f40cea0cb5f402019d0398341153119&version=stable&backtrace=0

Cldfire on 21 May 2017

The worst part of UB is that it _might_ just do the "reasonable" thing you'd want...

cuviper on 21 May 2017

😄4 👍3

Have you tried it with optimization? In my case whether the undefined behavior manifested depended on surrounding code and if it was optimized.

tcosprojects on 24 May 2017

If by 'optimization' you mean release mode, then yeah. Other than that, no.

Regardless, what cuviper said made sense; I see now that it would be foolhardy to rely on this behavior :)

Cldfire on 24 May 2017

So, it's been two years. re-reading this issue, it seems that this is working as intended, generally. As such, I'm going to give it a close. Anything that changes here seems big enough to require an RFC, and probably should be pursued through those channels. Thanks!