Rust: Regression from stable: pointer to usize conversion no longer compiles

Created on 1 Oct 2018  路  44Comments  路  Source: rust-lang/rust

The following code works with Rust 1.29.0:

union TransmuteHack<T: Copy, U: Copy> {
    from: T,
    to: U,
}

pub static VALUE: usize = 42;
pub static ADDRESS: usize = unsafe { TransmuteHack { from: &VALUE }.to };

But it fails with the current beta (and nightly):

error[E0080]: this static likely exhibits undefined behavior

 --> <source>:7:1

  |

7 | pub static ADDRESS: usize = unsafe { TransmuteHack { from: &VALUE }.to };

  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ type validation failed: encountered a pointer, but expected the type usize

  |

  = note: The rules on what exactly is undefined behavior aren't clear, so this check might be overzealous. Please open an issue on the rust compiler repository if you believe it should not be considered undefined behavior

error: aborting due to previous error

For more information about this error, try `rustc --explain E0080`.

Compiler returned: 1

I think this code should still be valid. I don't see any reason for this to be undefined behavior. This is a hack, yes, but it's very useful for FFI-related work.

A-const-eval T-lang disposition-close finished-final-comment-period regression-from-stable-to-beta

Most helpful comment

Yea, this kind of constant is very dangerous. I personally don't think there can be a good enough use case to outweigh the problems with it.

You can always use a *const () for FFI, which also makes your intent clearer. Pointers are not integers and treating them like that will get you into trouble not just in Rust

All 44 comments

This check was added in #51361 so mentioning @oli-obk and @RalfJung. It looks quite intentional and justifiable to reject this code. The same code without the unsafe hack is already disallowed in 1.29.0.

pub static ADDRESS: usize = &VALUE as *const usize as usize;
error[E0018]: raw pointers cannot be cast to integers in statics
 --> src/lib.rs:7:29
  |
7 | pub static ADDRESS: usize = &VALUE as *const usize as usize;
  |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I would expect this to be reconsidered after stabilizing #51910. Here is the relevant message on nightly:

error[E0658]: casting pointers to integers in statics is unstable (see issue #51910)
 --> src/lib.rs:7:29
  |
7 | pub static ADDRESS: usize = &VALUE as *const usize as usize;
  |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = help: add #![feature(const_raw_ptr_to_usize_cast)] to the crate attributes to enable

Yea, this kind of constant is very dangerous. I personally don't think there can be a good enough use case to outweigh the problems with it.

You can always use a *const () for FFI, which also makes your intent clearer. Pointers are not integers and treating them like that will get you into trouble not just in Rust

Agreed, this was a deliberate change in behavior (and we cratered it, I assume?). The union hack was more of a loophole that we closed.

Now, to be fair, this is a static and not a const. For const there are plenty of examples of such a "pointer disguising as an integer" causing problems. For static, however, the situation is less clear: Consts cannot refer to static, so the final value being bad for CTFE is not necessarily a big problem. And the final value is fine for run-time consumption. @oli-obk what do you think? Would it make sense to relax the validity check for statics? (The validity check needs an argument anyway to switch between checking const and runtime validity, because I want to use it in miri.)

A static can refer to another static's value though. and then we have the same problem.

And I don't think we want to make referring to other statics be dependent on the other static's validity checks succeeding

Oh right, good point. Also, promoted code can refer to statics I assume. Same problem.

So, I think this issue should be closed as "this regression is a bugfix". The work-around is to no longer lie about the actual type of the data stored in that static, and make it e.g. of type *const usize.

A static can refer to another static's value though.

Can they? I thought a static could only refer to another static's address, not its value. Which is pretty different.

Works just fine on stable: https://play.rust-lang.org/?gist=a07a6558a87a86b0ed2e23855f2b4c44&version=stable&mode=debug&edition=2015

This was implemented half a year ago by lifting a bunch of restrictions of the old const evaluator

And here's an example of how the static it complains about can cause a problem. There's no reasonable way to implement "divide a pointer by 2" in CTFE.

Works just fine on stable: https://play.rust-lang.org/?gist=a07a6558a87a86b0ed2e23855f2b4c44&version=stable&mode=debug&edition=2015

This was implemented half a year ago by lifting a bunch of restrictions of the old const evaluator

Interesting! I wasn't aware of those changes. Thanks for the demo.

And here's an example of how the static it complains about can cause a problem. There's no reasonable way to implement "divide a pointer by 2" in CTFE.

I guess I don't really see what's wrong in that example. It fails to compile (which is correct) with a reasonable error message in a reasonable location. Seems okay to me. I don't see why this has to be preemptively prevented by disabling the union transmute hack.

Things get more tricky if the static with the "bad" usize is in a different crate than the place where it is used.

Also, if you write e.g. &(STATIC / 2) in normal code, it will get promoted to 'static lifetime. That will cause trouble if the STATIC does not follow the rules for const safety.

Also, if you write e.g. &(STATIC / 2) in normal code, it will get promoted to 'static lifetime. That will cause trouble if the STATIC does not follow the rules for const safety.

That only works for constants, I hope. Accessing a static's value only works in statics.

But yes. Creating a bad constant and using it in runtime code can cause everything from weird errors to undefined behaviour if used in runtime places.

Still this is probably sth. that some team should look at. AFAIK we do not have a team decision on this so far, and it seems prudent to get one. So maybe we should reopen and tag for, eh, T-lang?

Creating a bad constant and using it in runtime code can cause everything from weird errors to undefined behaviour if used in runtime places.

Yes, but usize doesn't have any "bad" values (I'm assuming "bad" means undefined behavior values (e.g. a zero NonZeroUsize) or a trap representation (e.g. a signaling NaN)), does it?

It doesn't at run-time (except for Uninit aka undef/poison, i.e., except for uninitialized data). It has for constant evaluation. See this blog post for some more thoughts on that.

So, @RalfJung, after going through the blog post (you definitely understand this better than I do, so let me know if I've misunderstood this), I'm still not sure why this has to be disallowed for CTFE. The expression &VALUE is not a constant expression and should not be (and indeed cannot be) (fully) evaluated at compile time. The address of VALUE is determined by the linker (at best) or the executable loader at runtime. I don't think this expression should have all the strict const eval rules applied to it because it's not a constant expression.

Every expression you write in the initializer of a static or const is a constant expression and evaluated at compile time. If the value of a static or const is a pointer, we emit appropriate relocations for LLVM to handle, but that doesn't change the fact that all this is a constant expression -- e.g. if you do ptr.wrapping_offset(4).wrapping_offset(8), we will compute the offset 12 at compile time (well, we would if we allowed those methods to be called, but the underlying engine can handle all that).

What is the problem with using a pointer type here? Pointers are not integers, and for CTFE this is apparent even by looking at what we send to LLVM (for pointers, we need relocations). This distinction is important to describe optimizing compilers even if you ignore constant evaluation, but it surfaces even more in const context.

The problem is that you can already do

let x: &'static u32 = &42;

on stable Rust since a year or so. We need to keep supporting it, so we're very careful to not allow any oddball features like pointer values with integer types. Since &SOME_CONSTANT is (almost) always promoted to a static, we also need to ensure that constants are somewhat sane (which they were by definition until miri came along).

It was an accident that we ever allwoed it.

Some context about my use case: Objective-C type encoding uses different encodings for integers vs pointers. Some FFI-related data structures use usize rather than *mut T (because they pack extra tag flags in the low two bits) (and to be clear: I'm just writing the Rust FFI bindings; I'm not the author of the native library that I'm interacting with).

I could use *mut T but it would mean I would have to special-case the Objective-C type encodings for each struct that does this, and I'd probably have to create a union (of usize and *mut T) so I can assign the correct pointer to the static variable (and later at run time twiddle the bits as needed).

I'll stop pushing for this, though.

so I can assign the correct pointer to the static variable (and later at run time twiddle the bits as needed).

Why can't you just use the *mut T and during your runtime twiddling convert to a usize, do the twiddling, and then convert back? That seems strictly more correct than storing a pointer in a usize just because you sometimes need to modify its bits direclty.

It might also be an option to turn this into a deny-by-default lint for cases when people really are asking for a footgun?

The interaction with promotion is ugly, though.

Why can't you just use the *mut T and during your runtime twiddling convert to a usize, do the twiddling, and then convert back?

That's what I would need to do. But storing the value in *mut T means that the type encoding for the struct can no longer be automatically derived, since *mut T and usize are encoded differently.

It might also be an option to turn this into a deny-by-default lint for cases when people really are asking for a footgun?

I personally would love for this to be a lint. It can be a bit of a footgun, yes, but low level FFI programming is full of footguns.

The footgun here, however, is of a different nature: If you ever write &(ADDRESS/2), that code will fail to compile because the compiler thinks it can execute it at compile-time, and when it finds out it cannot, it is too late.

Forgive my ignorance, but why is that a problem? Failure to compile is exactly what I'd expect in that scenario.

Do you really? This code compiles:

union TransmuteHack<T: Copy> {
    from: T,
    to: usize,
}

pub static VALUE: usize = 42;

fn main() {
    println!("{:?}", unsafe { TransmuteHack { from: &VALUE }.to } / 2);
}

But with promotion, this would not:

union TransmuteHack<T: Copy> {
    from: T,
    to: usize,
}

pub static VALUE: usize = 42;
pub static ADDRESS: usize = unsafe { TransmuteHack { from: &VALUE }.to };

fn main() {
    println!("{:?}", ADDRESS / 2);
}

Notice that println! adds an & in front of the arguments. So in the second case, we have &(ADDRESS/2), which gets promoted and then fails during CTFE. And working around this is quite hard actually, you have to write something like (|| ADDRESS)() / 2 to disable promotion.

@oli-obk Why does the last example compile on stable? Shouldn't it promote and then fail...?

EDIT: Seems like we don't promote uses of a static. Interesting.

@rust-lang/lang I think there is a decision to be made here. (I hope I am using these tags correctly...)

The problem is that there are some values that are safe but not const-safe (using the terminology from my blog post). Namely, &0 as *const _ as usize is a perfectly fine and safe usize at run-time, but not so at compile-time because there are integer operations that we cannot perform on it (e.g. division).

We have a check running on every constant and static making sure that it follows the invariants implied by its type, and this check currently rejects pointer values at integer type on the ground that using them in further constant computations can cause unexpected failure. In particular, things get nasty once promotion is involved: If we decide to promote but then later const evaluation fails, we just made a program not compile for no good reason. So we should only promote if we can be sure that all the values used in there are const-safe. (This is why promoting const fn is such a can of worms, and we will have the same issue with generic consts.)

In this issue, however, we have a usecase for actually storing such a "const-bad" value in a static. It seems that it is not possible to refer to statics in constants or promoteds, but just in other statics, making the "const-bad" values slightly less of a problem. It is still the case that if someone uses a "const-bad" static in what seems like a safe usage of a usize, they can get an error:

union TransmuteHack<T: Copy> {
    from: T,
    to: usize,
}

pub static VALUE: usize = 42;
pub static ADDRESS: usize = unsafe { TransmuteHack { from: &VALUE }.to };

pub static AD2: usize = ADDRESS/2; // Error: Cannot divide a pointer by 2

But that's not fundamentally different from other failures induced by a static not having the value any more that it used to have -- i.e., changing a pub static is semver-breaking.

So, as long as we decide we will never want to promote expressions depending on the value of a static, I think we could change the sanity checks on static (and on static only!) to check for "run-time safety" as opposed to "const-safatey", and hence accept pointer values stored in integers. Alternatively, we could keep the current check but on statics downgrade it to a lint that can be allowed. Either way, for const we still have to rule out pointer values at integer type because they might flow into promotion. @oli-obk please correct me if that is wrong.

We should either error on them as we do now, or not lint at all. We should not add a lint to the compiler that we immediately expect to have unfixable false positives.

If we only warn for runtime-bad data -- like a NULL value at reference type -- which unfixable false positives are you thinking of?

For these I don't see the point in warning. If having a runtime local of reference type with NULL as its value is UB, why isn't a static once the program runs?

Oh I see, you are saying if the info is any good we should make it a hard error. Makes sense.

@RalfJung

So, as long as we decide we will never want to promote expressions depending on the value of a static

I didn't quite understand this part. You did say that "It seems that it is not possible to refer to statics in constants or promoteds", so I guess you mean that -- today -- everything should be fine, since the tainted value stays in "runtime land". (Also, since constants cannot refer to statics (e.g., const X: u32 = Y), so they can't sneak into a promoted indirectly that way.)

But, if we accept these sorts of statics, then in the future we could not lift those restrictions without the results depending on the precise value of the static?

It's an either or. Either we allow referring to statics in constants and promoteds or we allow creating statics with const-unsafe values. We can't have both.

| | allow referring to statics | don't allow referring to statics |
|--------------------------------------------------------------------|------------------------------------------------------------------------------|----------------------------------|
| allow statics with compile-time bad values | can turn runtime code into hard errors without it being a problem at runtime | OK |
| check statics' initializer value like const values are checked | OK | OK (today) |

Indeed. However, the bottom-left corner of that square has some other quirks -- we have to be very careful in what we let you do with the statics. Taking the address is always okay, but that remains the case even if we allow bad statics! So what remains is read-access to immutable statics (their type must be Freeze). We cannot allow read-access to mutable statics (static mut, but also static _: Cell<T>) because they could read differently at run-time.

So we don't lose all that much here.

Addresses are not ok if we ever want to allow dereferencing. or do we already do that? Dereferencing a reference to a static is just as bad as directly accessing it

Ah right. I think it'd be fine to never allow dereferencing in promoteds, but for const that seems awfully restrictive.

These comments have helped clarify some of my initial confusion as to why ADDRESS/2 would be problematic. I didn't realize that:

  1. Constant promotion is a one-way street. That is, once the compiler decides to promote something to a constant and evaluate it at compile time, the compiler won't fall back and "demote" it to a runtime value if it determines it can't actually compute the value at compile time.
  2. Constant promotion is done in non-constant statements and functions (i.e. regular fns, not const fns).

I don't really understand why either of these things are the case. Item 1 would be taken care of if the compiler could "demote" something it had previously promoted and just fall back to evaluating it at run time. Item 2 seems to be unnecessary? The constant propagation optimization pass already exists, so I'm not sure what additional performance benefits are to be gained by doing item 2, and I can't see any additional non-performance benefits that come from constant promotion in regular fns.

I'm just a user of Rust, though, and obviously lack much of the big-picture design efforts and goals that are ongoing right now, so forgive me if my reply is super naive or outright wrong.

I'm not sure what additional performance benefits are to be gained by doing item 2

Promotion is not at all about performance. It's about letting you write

fn foo() -> &'static i32 { &42 } // note the lifetime!

I suggest you have a look at the RFC for some further motivation.

This also connects to item 1: The order of events is "we decide what to promote", "we do borrow checking", "we do const evaluation". Promoted things get the 'static lifetime, which is exploited by the borrow checker. If later const evaluation fails... well, bad luck, we already did borrow checking assuming it had 'static lifetime so we can't take that back.

@rfcbot fcp close

Team member @aturon has proposed to close this. The next step is review by the rest of the tagged teams:

  • [x] @Centril
  • [x] @aturon
  • [x] @cramertj
  • [x] @eddyb
  • [ ] @joshtriplett
  • [x] @nikomatsakis
  • [x] @nrc
  • [x] @pnkfelix
  • [x] @scottmcm
  • [ ] @withoutboats

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Just for context, the lang team is suggesting to close this because it got accidentally stabilized. This could still be a useful feature, but then it should go through the usual process for new features (i.e., an RFC), and not happen by accident.

Promotion is not at all about performance. It's about letting you write

fn foo() -> &'static i32 { &42 } // note the lifetime!

I see; that makes sense. Thank you. Sorry I kept missing the significance of this.

Okay, well with that in mind I really can't argue against closing this. Thank you all (especially @RalfJung!) for looking into this thoroughly (and for your patience with me).

discussed at compiler team meeting. FCP (to close) in process.

:bell: This is now entering its final comment period, as per the review above. :bell:

closing ahead of actual release in a few hours.

Was this page helpful?
0 / 5 - 0 ratings