Rust: [tracking issue] raw ptr to usize cast inside constants

Created on 29 Jun 2018  路  29Comments  路  Source: rust-lang/rust

This is the tracking issue for the const_raw_ptr_to_usize_cast feature.

Activating the feature allows casting *const T and *mut T to usize.

I did not open an RFC for this because I believe we want to experiment with the feature (and we can already emulate it with union transmute hacks).

A-const-eval A-const-fn B-unstable C-tracking-issue T-lang

Most helpful comment

@mjbshaw IIUC, the problem is that the correctness of const depends on a const fn returning the same result every time it is called, because the resulting value can be used in contexts like type definitions. For example, consider:

const BAD : [f32; (&1u8 as *const u8 as usize) % 2] = [4.2];

If such code is allowed, this array definition (and thus the underlying program) will sometimes compile, sometimes throw a compiler error, depending on star alignment. This is, by definition, compile-time undefined behavior, and already looks like something Rust might not want to support without at least some kind of unsafe {} guard rail. Furthermore, once we start adding layers of abstraction...

const fn coin() -> usize {
    let randomness = 1u8;
    (&randomness as *const u8 as usize) % 2
}

/* lots of other code */

const BAD : [f32; coin()] = [4.2; coin()];

...it may not be immediately obvious to either a human or a compiler that something evil is going on, and accepting this fate by forcing compilers to do the in-depth check may preclude improving the efficiency of rustc's const evaluator through optimizations that deduplicate code in the future.

All 29 comments

Just encountered this, when i try to define a constant that uses ::winapi::um::winuser::IDC_ARROW as an integer, which is defined as:

pub const IDC_ARROW: LPCWSTR = 32512 as LPCWSTR;

@crlf0710 what is your use case for casting a raw pointer to an integer? Is there any reason you can't use *const () or something similar?

@oli-obk actually it is an integer in the first place, however the upstream library (winapi) cast the integer to a raw pointer (maybe for convenience using it together with the raw Windows APIs) when it defined that variable (see above comment). So i can't use the original integer value in constant evaluation context now :(

I can work around this by (re-)defining those integers myself again. (Arguably the winapi crate can change its definitions to avoid this.) I'm not seeking for any changes, just want to record this use case.

I can offer you a stable workaround, but it's very unsafe and if you screw up by placing a real pointer in an integer constant we will error, and if we don't, we might so in the future. If you do undefined behaviour at compile-time, it means the behaviour (compilation success) is also undefined.

union Transmuter {
    from: LPCWSTR,
    to: usize,
}
const MY_CONST: usize = unsafe { Transmuter { from: IDC_ARROW }.to };

@oli-obk Great to know, thanks a lot!

if you screw up by placing a real pointer in an integer constant we will error, and if we don't, we might so in the future.

I'm confused, what's wrong with putting a real pointer in an integer constant?

Btw, the following code compiles just fine:

use std::ptr::null;

#[repr(C)]
struct Fields {
    field1: u32,
    field2: [u8; 3],
    field3: u64,
    field4: i32,
}

union Transmuter<T: 'static> {
    ptr: *const T,
    reference: &'static T,
    integer: usize,
}

const FIELD4_OFFSET: usize =
    unsafe { Transmuter { reference: &(Transmuter { ptr: null::<Fields>() }.reference).field4 }.integer };

const _STATIC_ASSERT_EQ_: [(); 16] = [(); FIELD4_OFFSET];

you are converting a null pointer to a reference (which is UB) and then back to an integer (which is fine, since 0 is a valid integer and not a pointer). What I'm talking about is if you transmute a reference to e.g. a static to an integer.

Now that https://github.com/rust-lang/rust/pull/62946 is on its merry way, this issue becomes unactionable. @RalfJung I think we should just close it and make ppl use raw pointers instead of integers?

Hmm actually, if the value is an int it's still fine. So we'd still allow ptr::null() as usize... idk if that is a useful feature

One case where union transmute hacks are used currently is to have a const-capable offsetof!. I am not sure what it would take to avoid this. Ultimately this has to subtract two pointers from each other -- either two integer-valued pointers (but then it did some illegal ptr arithmetic) or two pointer-valued pointers into the same allocation (then we would need new code somewhere if we really wanted to support this).

I don't know if this kind of use case has been listed yet:
https://gist.github.com/Geobert/743fe8075c1fde1b4c7d2ef38a538094

Some C function needs a pointer to channelConfig[0], I tried to do it in Rust with some compromise but it ended having a struct in RAM (https://users.rust-lang.org/t/how-to-convert-this-c-code/33485) which made the C function Hardfaulting. The C function is closed source so I don't know what it's doing but I guess some kind of reading optim makes it crash. I even tried to put the whole config struct in RAM but the function crashed as well.
I ended putting the config in a C file and linked against it.

@Geobert I am afraid I have no idea what this has to do with const-eval. You are calling the C function at run-time, so the lack of const-time raw ptr casts should not be any problem?

If you need something that has an address to give to C, you want a static, not a const.
The purpose of const is to avoid writing the same Rust expression over and over again. The semantics of const is the same as inlining the initializer everywhere. I don't think that's what you want.

@RalfJung indeed, in the forum's thread, I've changed to static, but I had to have some of them as static mut because I couldn't convert a pointer to an u32. And having them mut pushed them in RAM instead of Flash which made the C func hardfault. I can paste an (non compilable) example tomorrow :)

And having them mut pushed them in RAM instead of Flash which made the C func hardfault.

Oh wow. I'm afraid that kind of low-level stuff is out of my league...

I have no idea why having mut should help at all -- it shouldn't. But I don't think this issue here is your problem; you do use the (awful) union-based hack to get around the unsupported cast. So congrats, you cheated your way around our stability checks -- and also, no surprise that you got some crazy behavior. ;) I am going to respond in that thread.

To be clear, I'm not happy abusing union. I did it to satisfy this ugly proprietary C function ;)

Why does regular casting requires unsafe? ~is it because we have no idea what to put in that usize and we cannot promise that it will keep working in the future? if so isn't that redundant by already being unstable?~

#![feature(const_raw_ptr_to_usize_cast)]

const A: usize = &42 as *const i32 as usize;

Error:

error[E0133]: cast of pointer to int is unsafe and requires unsafe function or block
 --> src/lib.rs:3:18
  |
3 | const A: usize = &42 as *const i32 as usize;
  |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^ cast of pointer to int
  |
  = note: casting pointers to integers in constants

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=8b159ba225b57fd39512d1f6d9af9956

EDIT: I see that after adding "unsafe" it prints "pointer-to-integer cast" needs an rfc before being allowed inside constants so I think the bug here is printing about the unsafe part instead of the RFC comment

You can cast pointers that you derived from integers. You just can't cast real pointers. This works as intended, and the reason the feature gate is not stable is because we can't detect statically whether your cast is going to be fine. We actually need to evaluate the constant with the cast as you saw.

In case of associated constants of generic traits, we can't actually do the cast unless you instantiate the trait with specific types. This means that you could build an associated constant that is broken, but only users of your trait see that and not you (that's why it's unsafe. This operation is "const unsafe").

One temporary way by now is to using *const T raw pointer in const structures, but converting it into a usize in constants would be more nice and convenient.

I find it less and less likely for raw ptr to int casts to ever become stable in CTFE. See also #51909 for the related discussion on unconst things and soundness concerns in CTFE.

The thing is, if we don't stabilize the casts, people will just transmute instead once that is stable...

But in consquence that means that we should also stabilize raw pointer comparisons as people will otherwise just cast to usize and compare there.

Depends on what you mean by "stabilize". I think the == operator should continue to always error when either operand is a Scalar::Ptr -- no matter whether this is == at type usize or == at type raw pointer. That ensure people can not compare raw pointers by casting to usize.

But you have a point... almost anything people would usually do after casting to usize does not actually work (like doing arithmetic on that integer or comparing it). Literally the only thing you can do is cast it back. So in that regard, disallowing the cast could be useful -- the error could explain what the cast would be rather pointless anyway. Maybe it should be a lint instead of a hard error.

One could argue that if they transmute, they are doing something unsafe, so it is their problem...

Pointer comparison and casting raw pointers to usize would also be unsafe if we stabilized it in const fn

This might be a stupid question, but why impose such strict restrictions on pointers? What's so bad about subtracting two pointers, or comparing them, or casting to usize and back (possibly with some arithmetic while usize)?

I understand that you don't want to allow const BAD_PTR: *const u8 = &1u8 as *const u8; because that's a meaningless constant pointing to some random spot in the const VM's address space. If that's the problem these restrictions are trying to solve then this sure seems like throwing out the baby with the bathwater, so to speak.

@mjbshaw IIUC, the problem is that the correctness of const depends on a const fn returning the same result every time it is called, because the resulting value can be used in contexts like type definitions. For example, consider:

const BAD : [f32; (&1u8 as *const u8 as usize) % 2] = [4.2];

If such code is allowed, this array definition (and thus the underlying program) will sometimes compile, sometimes throw a compiler error, depending on star alignment. This is, by definition, compile-time undefined behavior, and already looks like something Rust might not want to support without at least some kind of unsafe {} guard rail. Furthermore, once we start adding layers of abstraction...

const fn coin() -> usize {
    let randomness = 1u8;
    (&randomness as *const u8 as usize) % 2
}

/* lots of other code */

const BAD : [f32; coin()] = [4.2; coin()];

...it may not be immediately obvious to either a human or a compiler that something evil is going on, and accepting this fate by forcing compilers to do the in-depth check may preclude improving the efficiency of rustc's const evaluator through optimizations that deduplicate code in the future.

@mjbshaw please read #51909 if you are interested in the icky details. There's no point in rehashing it here in detail as it's just another shade of the same thing. But yea, as explained by @HadrienG2, the TLDR is that const eval cannot possibly compute (pointer as usize) / 2 and will throw up (current impl) or decide to compile your binary into a giant nasal demon after LLVM is done with it (possible impl). There's also a video of a lang team meeting where we discuss this https://youtu.be/b3p2vX8wZ_c?t=1887 (it should link to the 31 minute mark where the whole unsafety thing is explained).

If such code is allowed, this array definition (and thus the underlying program) will sometimes compile, sometimes throw a compiler error, depending on star alignment.

That by itself is not a problem we already have this if you make an array length rely on std::mem::size_of which wildly differs between platforms and compiler versions and other things.

Speaking specifically about subtraction, we could (dynamically) allow subtraction ((ptr1 as usize) - (ptr2 as usize)) for some pointers, but IMO it is clearer to say that this will never work (unless both pointers were originally cast from an integer). For the cases that do work, people should use offset_from, which conveniently is restrictive enough such that all cases where it is non-UB (at runtime) can actually be computed fine during CTFE.

For comparison, unfortunately we don't have a convenient runtime operation to borrow from. But again I think we should rather not (dynamically) allow (ptr1 as usize) == (ptr2 as usize) ever (unless both were cast from an integer), and instead make people use ptr_guaranteed_eq/ptr_guaranteed_ne, thus explicitly acknowledging that CTFE has special rules. For ptr1 == ptr2, then again it feels better to (statically) disallow such that people are forced to use the explicit ptr_guaranteed_* methods.

Casting to usize and back we could support. Indeed, if you replace cast by transmute we cannot prevent it. The only reason to forbid it would be to help people avoid the above pitfalls where operations such as comparison and subtraction only work under specific circumstances. Thus my suggestion to make it a lint. This is based on me being unable to imagine any correct CTFE code that uses ptr-int-casts and that cannot be written better without them -- thus, the lint would not have false positives.

I just had another idea, based on overarching principles rather than a case-by-case analysis: we could statically forbid all safe operations that can cause CTFE trouble: raw ptr comparison and ptr <-> int casts. Dynamically, we make no attempt to support pointer values in integer operations (so if people transmute to get around the lack of casts, that doesn't buy them anything).

As a replacement for what is lost this way, we provide dedicated raw pointer operations that are meaningful during CTFE: offset_from, ptr_gauranteed_{eq,ne}.

This means transmute_ptr_to_int(&4) + 0 will error when compile-time-evaluated, you are expected to use (&4 as *const _).wrapping_offset(0) instead. How it errors is an open question; my personal current thinking is to treat such "unconst" operations as UB, IOW transmute_ptr_to_int(&4) + 0 during CTFE is the same as an out-of-bounds access during CTFE. But we could also guarantee that the compiler will detect this problem during evaluation. This question is orthogonal to the rest that is being discussed here.

Was this page helpful?
0 / 5 - 0 ratings