It is not uncommon for C libraries to add more values to an enum in their API while considering it a non-breaking change. However, with a rust library linked to the C library it causes undefined behavior when the change is made if the rust enum definition is not updated.
I've reduced the issue down to this sample code where get_bad_kind() represents a call into a C library that returns a new enum value unknown to rust which leads to a crash at runtime (signal: 11, SIGSEGV: invalid memory reference or stack overflow): https://play.rust-lang.org/?gist=f066d8e489a6e220866958065891a812&version=stable&backtrace=0
The C version of this code does not result in this issue, instead it hits the default case and returns -1: https://gist.github.com/tcosprojects/49d0c809d40b8f008e25dacf49c508bf
rustc 1.12.0 (3191fbae9 2016-09-23)
binary: rustc
commit-hash: 3191fbae9da539442351f883bdabcad0d72efcb6
commit-date: 2016-09-23
host: x86_64-unknown-linux-gnu
release: 1.12.0
and
rustc 1.12.0 (3191fbae9 2016-09-23)
binary: rustc
commit-hash: 3191fbae9da539442351f883bdabcad0d72efcb6
commit-date: 2016-09-23
host: i686-pc-windows-msvc
release: 1.12.0
Discussing the issue on the #rust channel led to the following:
A binding library would need to match the rust definition of the enum with the C definition exactly at all times when linked to a C library to prevent either this undefined behavior or passing unsupported flags to older versions of the C library. This could be done in the cargo build script that checks the C library version and enables feature flags for these enum values. It would still not not guarantee safety though when using dynamically linked libraries. Or it could be resolved by not using repr(C) enums and instead translating them at runtime with a function that handles invalid values. If the latter should be done, it would be helpful to have it mentioned in the docs somewhere. It also makes me uncertain what the purpose of repr(C) enum is, if not to be compatible with and used with C FFI calls.
In any case, as a C programmer this bug was surprising and took some significant debugging. It required going into the assembly output of the match statement to see it jump based on the constant of the highest value from the rust enum definition and then corrupt the stack to pin down the cause of the crash.
A warning about this compatibility issue regarding repr(C) enum in the FFI docs would be helpful. I suspect it is not uncommon for rust bindings libraries to use repr(C) enum this way while expecting it to work.
If the match expression could jump to unreachable!() in debug builds when encountering an unknown value on a repr(C) enum type it would help catch this issue without so much debugging.
match
in Rust has exhaustive semantics and it is very prominently documented pretty much everywhere where match
is mentioned. repr(C)
only changes the representation of the enum (i.e. how it appears in memory) to match C representation, but no representational change affects (or should affect) the behaviour of rust expressions.
TL;DR: match
is match
, not switch
by design, no matter what representation the enum has.
In the discussion there was also disagreement over whether this should be considered a breaking ABI change by the C library. In practice it does not seem uncommon for C libraries to do this while considering it non-breaking. Is it?
It is not a ABI breaking change, as adding a variant to a C enum (as long as the size of the actual integer which represents the enum in memory does not change) has no influence on the registers/memory locations where this enum gets placed when returned or passed as an argument.
It is, however, an API-breaking change.
Should this be a concern for rustc? Or handled elsewhere?
Rustc (its match
) is behaving as intended IMO.
Is it possible for rust to match the behavior of the C program's switch when an enum is repr(C)?
Matching on enum as i64
gets you switch-like semantics. Though, it is by no means convenient.
Ok, so some research suggests that 1) C and C++ have similar, but slightly different rules about the valid values of an enumeration type. However, it seems that neither are limited to the values defined by the declaration.
As far as I can tell, an enumeration type in C can take on any value of the underlying type. This means that adding an enum constant may-or-may-not be a breaking ABI change depending on the compiler and target (because of course it is). Fortunately, most platforms will default to i32
unless specifically fixed to another type.
With that in mind, we should probably do two things:
repr(C)
enums, that eliminates one source of UB and allows for manual checking of the values if necessary by casting to an integer type.debug_assertions
is enabled. I think that keeping exhaustiveness is valuable. It's also easier.The fact that Rust enums cause UB when they hold a value that isn't a valid variant is the primary reason I don't use Rust enums in winapi and instead use simple integer constants (The other reason being that integer constants don't require any trait impls or methods and significantly reduce compile times). C/C++ will often even use enums as bitflags which definitely don't align with Rust enums.
In the discussion on #rust there was some disagreement over whether this is undefined behavior for C. Is it?
Windows API uses enums as bitflags in both C and C++ which indicates that it is not undefined behavior in MSVC C/C++.
Sure. We purged all Rust enums from rustllvm unless we control the other side. Maybe prohibit repr(C)
on enums (we would have to find another repr tag)?
Matching on
enum as i64
gets you switch-like semantics. Though, it is by no means convenient.
That's wrong. Creating a Rust enum with an out-of-range value is instant UB - we generate !range
metadata, which would cause your switch to be optimized out.
I think that there should be a lint against using an enum type in FFI, except for the case where the enum cannot have been created or modified by foreign code. It is way too easy to mess this up.
A newtype of an integer is a much better solution.
Perhaps we should allow extern
functions which take an enum by value, but lint on any which return an enum or take a mutable pointer to an enum. And by perhaps, I mean we really need this because this is a giant footgun waiting to catch someone out.
/subscribe
If a C API takes an enum by value, there's a decent chance it's actually bitflags, i.e. you're expected to be able to do E_THIS|E_THAT|E_THEOTHER
in the call.
The usage of C enums as bitflags can be handled in Rust with things like the bitflags crate (or something similar). So that shouldn't be a big deal.
But this issue of undefined behavior for "out of range" enums is a real concern for me. bindgen even generates Rust enums from C enums. Describing this current state of affairs as a "foot gun" is a perfect description. Rust really needs an ergonomic solution to using C enums in a Rust FFI setting. Otherwise Rust's safety promises are moot.
Wouldn't the simplest option be allowing the user to specify the default value to be selected in the event the enum is out-of-range? Something like:
#[repr(C, default = Invalid)]
enum Thing {
Invalid,
...
At least for the enums I happen to be working with in this FFI, they all support a good choice of a default option.
@mjbshaw
bindgen even generates Rust enums from C enums
Yes, but bindgen
's generated bindings are intended to be created at compile time, so this should only be a problem for users that generate bindings from the command line.
Rust really needs an ergonomic solution to using C enums in a Rust FFI setting.
I would first approach this with a proc-macro
that generates constants at compile time before adding yet-another-feature to the language.
Otherwise Rust's safety promises are moot.
I do not agree with this, I do not think that C enumerations are part of Rust's safety measures, what happens is that this problem is unsolvable (in my humble opinion) within the Rust language itself, rust-bindgen
does great job trying to force users to generate bindings at compile time, thus avoiding UB with C enums. The only problem I see with rust-bindgen
is the dependency of libclang
(and the unstable API it has).
rust-bindgen does great job trying to force users to generate bindings at compile time, thus avoiding UB with C enums.
I don't think this provides the guarantees that you want! C enums can still have values outside of the explicitly stated variants. AIUI, basically anything within the same bit range is still legal.
Note, also, that bindgen allows customizing the enums as needed (generating plain integers, for example).
The possibilities would be much larger with https://github.com/rust-lang/rfcs/pull/1758, though.
I would first approach this with a proc-macro that generates constants at compile time before adding yet-another-feature to the language.
Or you could use a regular declarative macro to define enums using normal syntax and have them turned into constants like winapi does.
@jeandudey
Yes, but
bindgen
's generated bindings are intended to be created at compile time, so this should only be a problem for users that generate bindings from the command line.
How does generating bindings at compile time help? As I mentioned in servo/rust-bindgen#667 if the Rust program dynamically links to the C library then the C library could easily return an "out of range" enum (e.g. if the user's computer has a slightly different version of the C library; C programmers generally don't consider adding a value to an enum to be an API or ABI breaking change). Whether Rust's bindings are generated at compile time or not is irrelevant here.
Before I take the time to switch over to using numerical constants instead of enums in my own project, may I ask if this is still a concern? I just tried running the playground example that was linked in the original post on every rust version from 1.12.0 up to 1.17.0:
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
error: Process didn't exit successfully: `target/debug/playground_application` (signal: 6, SIGABRT: process abort signal)
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
On 1.17.0, however, everything seems to be fine:
Finished dev [unoptimized + debuginfo] target(s) in 0.22 secs
Running `target/debug/playground_application`
-1
It exhibits the same behavior when run on play.rust-lang.org; not one of stable, beta, or nightly segfaults.
EDIT: It also doesn't segfault even if you don't have a _ => ...
case: https://play.rust-lang.org/?gist=4f40cea0cb5f402019d0398341153119&version=stable&backtrace=0
The worst part of UB is that it _might_ just do the "reasonable" thing you'd want...
Have you tried it with optimization? In my case whether the undefined behavior manifested depended on surrounding code and if it was optimized.
If by 'optimization' you mean release mode, then yeah. Other than that, no.
Regardless, what cuviper said made sense; I see now that it would be foolhardy to rely on this behavior :)
So, it's been two years. re-reading this issue, it seems that this is working as intended, generally. As such, I'm going to give it a close. Anything that changes here seems big enough to require an RFC, and probably should be pursued through those channels. Thanks!
Most helpful comment
The worst part of UB is that it _might_ just do the "reasonable" thing you'd want...