This is a tracking issue for the RFC "Deprecate uninitialized
in favor of a new MaybeUninit
type" (rust-lang/rfcs#1892).
Steps:
Unresolved questions:
&mut T
?MaybeUninit
? (https://github.com/rust-lang/rust/pull/56138)into_inner
? Should it be more like take
instead and take &mut self
?MaybeUninit<T>
be Copy
for T: Copy
?get_ref
and get_mut
(but not reading from the returned references) before data got initialized? (AKA: "Are references to uninitialized data insta-UB, or only UB when being read from?") Should we rename it similar to into_inner
?into_inner
(or whatever it ends up being called) panic when T
is uninhabited, like mem::uninitialized
does currently?mem::zeroed
. We should however remember to also update its documentation together with MaybeUninit
, make sure people are aware that this is insta-UB if all-0-bits does not satisfy the type's validity invariant.cc @RalfJung
[ ] Implement the RFC
I can help implement the RFC.
Awesome, I can help reviewing :)
I'd like some clarification on this part of the RFC:
Make calling uninitialized on an empty type trigger a runtime panic which also prints the deprecation message.
Should only mem::uninitialized::<!>()
panic? Or should this also cover structs (and maybe enums?) that contain the empty type (e.g. (!, u8)
)?
AFAIK we only do the really harmful code generation for !
. Most other uses of mem::uninitialized
are just as incorrect, but the compiler does not happen to exploit them.
So I'd do it for !
only, but also for mem::zeroed
. (I forgot to amend that part when I added zeroed
to the RFC, it seems.)
We could start off by making this:
https://github.com/rust-lang/rust/blob/8928de74394f320d1109da6731b12638a2167945/src/librustc_codegen_llvm/intrinsic.rs#L184-L198
check whether fn_ty.ret.layout.abi
is Abi::Uninhabited
and at the very least emit a trap, e.g.: https://github.com/rust-lang/rust/blob/8928de74394f320d1109da6731b12638a2167945/src/librustc_codegen_llvm/mir/operand.rs#L400-L403
Once you've seen the trap (i.e. intrinsics::abort
) in action, you can see if there's any nice way of triggering a panic. It'' be tricky because of unwinding, we'll need to special-case them here: https://github.com/rust-lang/rust/blob/8928de74394f320d1109da6731b12638a2167945/src/librustc_codegen_llvm/mir/block.rs#L445-L447
To actually panic, you'd need something like this: https://github.com/rust-lang/rust/blob/8928de74394f320d1109da6731b12638a2167945/src/librustc_codegen_llvm/mir/block.rs#L360-L407
(you can ignore the EvalErrorKind::BoundsCheck
arm)
@eddyb Thanks for the pointers.
I'm now fixing (several) deprecation warnings and I feel (very) tempted to just run sed -i s/mem::uninitialized()/mem::MaybeUninit::uninitialized().into_inner()/g
but I guess that would miss the point ... Or is that OK if I know that the value is a concrete (Copy) type? e.g. let x: [u8; 1024] = mem::uninitialized();
.
That would exactly miss the point, yeah.^^
At least for now, I would like to consider mem::MaybeUninit::uninitialized().into_inner()
UB for all non-union types. Notice that Copy
is certainly not sufficient; both bool
and &'static i32
are Copy
and your snippet is intended to be insta-UB for them. We may want an exception for "types where all bit patterns are okay" (integer types, essentially), but I would be opposed to making such an exception because undef
is not a normal bit pattern. That's why the RFC says you need to fully initialize before calling into_inner
.
It also says that for get_mut
, but the RFC discussion brought up desired by some folks to relax the restriction here. That's an option I could live with. But not for into_inner
.
I'm afraid all these uses of uninitialized
will have to be more carefully reviewed, and in fact this was one of the intents of the RFC. We'd like the wider ecosystem to be more careful here, if everyone just uses into_inner
immediately then the RFC was worthless.
We'd like the wider ecosystem to be more careful here, if everyone just uses
into_inner
immediately then the RFC was worthless.
This gives me an idea... perhaps we should lint (group: "correctness") for this sort of code? cc @oli-obk
I'm now fixing (several) deprecation warnings
We should only ship Nightly with those warnings once the recommended replacement is available at least on Stable. See similar discussion at https://github.com/rust-lang/rust/pull/52994#issuecomment-411413493
@RalfJung
We may want an exception for "types where all bit patterns are okay" (integer types, essentially)
You've participated in discussion about this before, but I'll post here to circulate more widely: this is already something we have many existing use-cases for in Fuchsia, and we have a trait for this (FromBytes
) and a derive macro for these types. There was also an internals Pre-RFC for adding these to the standard library (cc @gnzlbg @joshlf).
I would be opposed to making such an exception because undef is not a normal bit pattern.
Yeah, this is an aspect in which mem::zeroed()
is significantly different from mem::uninitialized()
.
@cramertj
You've participated in discussion about this before, but I'll post here to circulate more widely: this is already something we have many existing use-cases for in Fuchsia, and we have a trait for this (FromBytes) and a derive macro for these types. There was also an internals Pre-RFC for adding these to the standard library (cc @gnzlbg @joshlf).
Those discussions were about ways of allowing safe memcpy
s across types, but I think that's pretty much orthogonal to whether the memory being copied is initialized or not - if you put uninitialized memory in, you get uninitialized memory out.
The consensus also was that it would be unsound for any approach discussed to allow reading padding bytes, which are a form of uninitialized memory, in safe Rust. That is if you put initialized memory in, you can't get uninitialized memory out.
IIRC, nobody there suggested or discussed any approach in which you could put uninitialized memory in and get initialized memory out, so I don't follow what those discussions have to do with this one. To me they are completely orthogonal.
To drive the point home a bit more, LLVM defines uninitialized data as Poison, which is distinct from "some arbitrary but valid bit pattern." Branching based on a Poison value or using it to compute an address which is then dereferenced is UB. So, unfortunately, "types where all bit patterns are okay" are still not safe to construct because using them without separately initializing them will be UB.
Right, sorry, I should have clarified what I meant. I was trying to say that "types where all bit patterns are okay" is already something that we're interested in defining for other reasons. Like @RalfJung said above,
I would be opposed to making such an exception because undef is not a normal bit pattern.
Thank god there are people who can read, because apparently I can't...
Right, so what I meant to say is: We definitely have types where all initialized bit patterns are okay -- all the i*
and u*
types, raw pointers, I think f*
as well and then tuples/structs only consisting of such types.
What is an open question is under which circumstances which of these types are allowed to be uninitialized, i.e., poison. My own preferred answer is "never".
The consensus also was that it would be unsound for any approach discussed to allow reading padding bytes, which are a form of uninitialized memory, in safe Rust. That is if you put initialized memory in, you can't get uninitialized memory out.
Reading padding bytes as MaybeUninit<u8>
should be fine.
The consensus also was that it would be unsound for any approach discussed to allow reading padding bytes, which are a form of uninitialized memory, in safe Rust. That is if you put initialized memory in, you can't get uninitialized memory out.
Reading padding bytes as MaybeUninit
should be fine.
The discussion in a nutshell was about providing a trait, Compatible<T>
, with a safe method fn safe_transmute(self) -> T
that "reinterprets"/"memcpys" the bits of self
into a T
. The guarantee of this method is that if self
is properly initialized, so is the resulting T
. It was proposed for the compiler to fill in transitive implementations automatically, e.g., if there is an impl Compatible<V> for U
, and animpl Compatible<W> for V
then there is an impl Compatible<W> for U
(either because it was provided manually, or the compiler auto generates it - how this could be implemented was completely handwaved).
It was proposed that it should be unsafe
to implement the trait: if you implement it for a T
that has padding bytes where Self
has fields, then everything is fine at least until you try to use the T
and your program behavior ends up depending on the contents of the uninitialized memory.
I have no idea what any of this has to do with MaybeUninit<u8>
, maybe you could elaborate on that?
The only thing I can imagine is that we could add a blanket impl: unsafe impl<T> Compatible<[MaybeUninit<u8>; size_of::<T>()]> for T { ... }
since transmuting any type into a [MaybeUninit<u8>; N]
of its size is safe for all types. I don't know how useful such an impl would be, given that MaybeUninit
is an union, and whoever uses the [MaybeUninit<u8>; N]
has no idea of whether a particular element of the array is initialized or not.
@gnzlbg back then you were talking about FromBits<T> for [u8]
. That is where I say we have to use [MaybeUninit<u8>]
instead.
I discussed this proposal with @nikomatsakis at RustConf, and he encouraged me to go forward with an RFC. I was going to do it in a few weeks, but if there's interest, I can try getting one done this weekend. Would that be useful for this discussion?
@joshlf which proposal are you talking about?
@RalfJung
@gnzlbg back then you were talking about FromBits
for [u8]. That is where I say we have to use [MaybeUninit ] instead.
Gotcha, fully agree here. Had completely forgotten that we also wanted to do that 😆
@joshlf which proposal are you talking about?
A FromBits
/IntoBits
proposal. TLDR: T: FromBits<U>
means that any bit pattern which is a valid U
corresponds to a valid T
. U: IntoBits<T>
means the same thing. The compiler automatically infers both for all pairs of types given certain rules, and this unlocks lots of fun goodness that currently requires unsafe
. There's a draft of this RFC here that I wrote a while back, but I intend to change large parts of it, so don't take that text as anything more than a rough guide.
@joshlf I think such a pair of traits would more build on top of this discussion than be part of it. AFAIK we have two open questions in terms of validity:
MaybeUninit::get_mut
docs accordingly (it is not actually UB to use that before completing initialization, but it is UB to dereference it before completing initialization). However, we first have to make that decision for validity, and I am not sure what the right venue is for that. Probably a dedicated RFC?u8
(and other integer types, floating point, raw pointer) have to be initialized, i.e., is MaybeUinit<u8>::uninitialized().into_inner()
insta-UB? I think so, but mostly based on a gut feeling that we want to keep the places where we allow poison
/undef
to a minimum. However, I could be persuaded otherwise if there are plenty of uses of this pattern (and I hope to use miri to help determining this).Does it recurse below references?
@RalfJung can you show an example of what you mean with "recursing below references"?
Does a u8 (and other integer types, floating point, raw pointer) have to be initialized, i.e., is MaybeUinit
::uninitialized().into_inner() insta-UB?
What happens if it isn't instant UB? What can I do with that value? Can I match on it? If so, is the program behavior deterministic?
I feel like if I can't match on the value without introducing UB, then we have re-invented mem::uninitialized
. If I can match on the value and the same branch is always taken across all architectures, opt-levels, etc. we have re-invented mem::zeroed
(and are kind of making the use of the MaybeUninit
type a bit moot). If the program behavior isn't deterministic, and changes with optimization levels, across architectures, depending on external factors (like whether the OS gave the process zeroed pages), etc., then I feel like we would be introducing a huge footgun into the language.
Does a
u8
(and other integer types, floating point, raw pointer) have to be initialized, i.e., isMaybeUinit<u8>::uninitialized().into_inner()
insta-UB? I think so, but mostly based on a gut feeling that we want to keep the places where we allowpoison
/undef
to a minimum. However, I could be persuaded otherwise if there are plenty of uses of this pattern (and I hope to use miri to help determining this).
FWIW, two of the benefits of this not being UB are that a) it lines up with what LLVM does and, b) it allows more flexibility wrt optimizations. It also seems more consistent with your recent proposal for defining safety at use time, not at construction time.
What happens if it isn't instant UB? What can I do with that value? Can I match on it? If so, is the program behavior deterministic?
I feel like if I can't match on the value without introducing UB, then we have re-invented
mem::uninitialized
. If I can match on the value and the same branch is always taken across all architectures, opt-levels, etc. we have re-inventedmem::zeroed
(and are kind of making the use of theMaybeUninit
type a bit moot). If the program behavior isn't deterministic, and changes with optimization levels, across architectures, depending on external factors (like whether the OS gave the process zeroed pages), etc., then I feel like we would be introducing a huge footgun into the language.
Why would you want to be able to match on something that's uninitialized? Defining it as UB to branch or index based on uninitialized values affords LLVM more room to optimize, so I don't think tying its hands more is a good idea, especially if there's not a compelling use case.
Why would you want to be able to match on something that's uninitialized?
I didn't say I wanted to, I stated that if this cannot be done, I don't understand the difference between MaybeUinit<u8>::uninitialized().into_inner()
and just mem::uninitialized()
.
@RalfJung can you show an example of what you mean with "recursing below references"?
Essentially, the question is whether we allow the following:
let mut b = MaybeUninit::<bool>::uninitialized();
let bref = b.get_mut(); // insta-UB?
If we decide that a reference is valid only if it points to something valid (that's what I mean by "recursing below references"), this code is UB.
What happens if it isn't instant UB? What can I do with that value? Can I match on it? If so, is the program behavior deterministic?
You cannot inspect an uninitialized u8
in any way. match
can do many things, both binding names and actually testing for equality; the former is okay but the latter not. But you can write it back to memory.
Essentially, this is what miri currently implements.
I feel like if I can't match on the value without introducing UB, then we have re-invented mem::uninitialized.
Why that? The biggest problem with mem::uninitialized
was around types that have restrictions for what their valid values are. We could decide that u8
has no such restrictions, so mem::uninitialized()
was okay for u8
. It was just almost impossible to use correctly in generic code, so it is better to entirely get rid of it.
Either way, it is still not okay to pass an uninitialized u8
to safe code, but it might be okay to carefully use it in unsafe code.
You cannot "match" on a &mut
pointing to invalid data either. IOW, I think that the bool
example I gave above is fine, but the following is certainly not:
let mut b = MaybeUninit::<bool>::uninitialized();
let bref = b.get_mut();
match bref {
&b => // insta-UB! We have a bad bool in scope.
}
This is using match
to do a normal pointer dereference.
FWIW, two of the benefits of this not being UB are that a) it lines up with what LLVM does and, b) it allows more flexibility wrt optimizations. It also seems more consistent with your recent proposal for defining safety at use time, not at construction time.
Which optimizations would this allow?
Notice that LLVM does optimizations on essentially untyped code, so none of this is a concern there. We are talking only about MIR optimizations here.
I am essentially coming form the perspective that we should allow as little as possible until we have a clear use. We can always allow more stuff later, but not the other way around. That said, some good uses of byte slices that can old any data have come up recently, which might be enough of an argument to do this at least for u*
and i*
.
If we decide that a reference is valid only if it points to something valid (that's what I mean by "recursing below references"), this code is UB.
Gotcha.
The biggest problem with mem::uninitialized was around types that have restrictions for what their valid values are.
mem::uninitialized
also has the problem that you pointed out above: that creating a reference to an uninitialized value might be undefined behavior (or not). So is the following UB?
let mut b = MaybeUninit::<u8>::uninitialized().into_inner();
let bref = &mut b; // Insta UB ?
I thought that one of the reasons for introducing MaybeUninit
was to avoid this problem by always having the union initialized (e.g. to unit), which allows you to take a reference to it, and mutate its contents, by e.g. setting the active field to the u8
and giving it a value via ptr::write
without introducing UB.
So this is why I am a bit confused. I don't see how into_inner
is any better than:
let mut b: u8 = uninitialized();
let bref = &mut b; // Insta UB ?
Both look like undefined behavior time bombs to me.
Which optimizations would this allow?
Notice that LLVM does optimizations on essentially untyped code, so none of this is a concern there. We are talking only about MIR optimizations here.
If we say that undefined memory has some value, and thus you're allowed to branch on it according to the Rust semantics, then we can't lower it to LLVM's version of undefined, because it'd be unsound.
I am essentially coming form the perspective that we should allow as little as possible until we have a clear use. We can always allow more stuff later, but not the other way around.
That's fair.
That said, some good uses of byte slices that can old any data have come up recently, which might be enough of an argument to do this at least for
u*
andi*
.
Do any of these use cases include having byte slices which hold uninitialized values?
One place that an uninitialized-but-not-poison &mut [u8]
could be valuable is for Read::read
- we'd like to be able to avoid needing to zero the buffer just because some weird Read
impl could read out of it rather than just writing into it.
One place that an uninitialized-but-not-poison
&mut [u8]
could be valuable is forRead::read
- we'd like to be able to avoid needing to zero the buffer just because some weirdRead
impl could read out of it rather than just writing into it.
I see, so the idea is that MaybeUninit
would represent a type which is initialized, but with undefined contents, while other types of uninitialized data (e.g., padding fields) would still be fully uninitialized in the LLVM poison sense?
I don't think it'd need to apply to MaybeUninit generally. There could in theory be some API to "freeze" the contents from undefined to defined-but-arbitrary.
If we say that undefined memory has some value, and thus you're allowed to branch on it according to the Rust semantics, then we can't lower it to LLVM's version of undefined, because it'd be unsound.
That was never the proposal. It is and will remain UB to branch on poison
.
The question is whether it is UB to merely "have" a poison
in a local u8
.
Do any of these use cases include having byte slices which hold uninitialized values?
Slices are like references, so &mut [u8]
of uninitialized data is fine as long as it is only written into (assuming that is the solution we take for reference validity).
@sfackler
One place that an uninitialized-but-not-poison &mut [u8] could be valuable is for Read::read - we'd like to be able to avoid needing to zero the buffer just because some weird Read impl could read out of it rather than just writing into it.
Well, without &out
you will only ever be able to do that if you know the impl. The question is not whether safe code has to handle poison
in u8
(it does not, that is not an okay use of safe code!), the question is whether unsafe code may carefully handle it this way. (See that blog post that I wanted to write today about the distinction between safety invariants and validity invariants...)
Maybe I'm late, but I'd suggest changing signature of set()
method to return &mut T
. This way, it'd be safe to write completely safe code working with MaybeUninit
(at least in some situations).
fn init(dest: &mut MaybeUninit<u8>) -> &mut u8 {
dest.set(produce_value())
}
This is practically a static guarantee that init()
will either initialize the value or diverge. (If it tried to return something else, the lifetime would be wrong and &'static mut u8
is impossible in safe code.) Maybe it could be used as a part of placer API in the future.
@Kixunil It has been that way before, and I agree it is nice. I just find the same set
confusing for a function that returns something.
@Kixunil
This is practically a static guarantee that
init()
will either initialize the value or diverge. (If it tried to return something else, the lifetime would be wrong and&'static mut u8
is impossible in safe code.)
Not quite; you can get one with Box::leak
.
In a codebase I wrote recently, I came up with a similar scheme; it's a bit more complicated, but does provide a true static guarantee that the provided reference was initialized. Instead of
fn init(dest: &mut MaybeUninit<u8>) -> &mut u8
I have
fn init<'a>(dest: Uninitialized<'a, u8>) -> DidInit<'a, u8>
The trick is that Uninitialized
and DidInit
are both invariant on their lifetime parameters, so there's no way to reuse a DidInit
with a different lifetime parameter, even e.g. 'static
.
DidInit
impls Deref
and DerefMut
, so safe code can use it as a reference, like in your example. But the guarantee that it was actually the original passed-in reference that got initialized, not some random other reference, is helpful for unsafe code. It means you can define initializers structurally:
struct Foo {
a: i32,
b: u8,
}
fn init_foo<'a>(dest: Uninitialized<'a, Foo>,
init_a: impl for<'x> FnOnce(Uninitialized<'x, i32>) -> DidInit<'x, i32>,
init_b: impl for<'x> FnOnce(Uninitialized<'x, u8>) -> DidInit<'x, u8>)
-> &'a mut DidInit<'a, Foo> {
let ptr: *mut Foo = dest.ptr;
unsafe {
init_a(Uninitialized::new(&mut (*ptr).a));
init_b(Uninitialized::new(&mut (*ptr).b));
dest.did_init()
}
}
This function initializes a pointer to struct Foo
by initializing each of its fields in turn, using the user-provided initialization callbacks. It requires that the callbacks return DidInit
s, but doesn't care about their values; the fact that they exist is enough. Once all fields have been initialized, it knows that the entire Foo
is valid – so it calls did_init()
on the Uninitialized<'a, Foo>
, which is an unsafe method that just casts it to the corresponding DidInit
type, which init_foo
then returns.
I also have a macro that automates the process of writing such functions, and the real version is a bit more careful about destructors and panics (though it needs improvement).
Anyway, I wonder if something like this could be implemented in the standard library.
(Note: DidInit<'a, T>
is actually a type alias for &'a mut _DidInitMarker<'a, T>
, to avoid lifetime issues with DerefMut
.)
By the way, whereas the above-linked approach ignores destructors, a slightly different approach would be to make DidInit<‘a, T>
responsible for running T
’s destructor. In this case it would have to be a struct, not an alias; and it could only hand out references to T
that live as long as the DidInit
itself, not for all of ’a
(since otherwise you could continue accessing it after destruction).
+1 for including a method to give the behavior I had previously asked for in set
, but I'm fine with it being available via another name.
Any good ideas for what that name could be? set_and_as_mut
?^^
set_and_borrow_mut
?
insert
/insert_mut
? The Entry
type has a somewhat similar or_insert
method (but OccupiedEntry
also has insert
which returns the old value, so that's not similar at all).
Is there a really compelling reason for having two separate methods? It seems simple enough to ignore the return value, and I'd imagine the function would be marked as #[inline]
so I wouldn't expect any real runtime cost.
Is there a really compelling reason for having two separate methods? It seems simple enough to ignore the return value
I guess the only reason is that seeing set
return something is rather surprising.
Maybe I'm missing something, but what could save us from having invalid value? I mean if we
let mut foo: MaybeUninit<T> = MaybeUninit {
uninit: (),
};
let mut foo_ref = &mut foo as *mut MaybeUninit<T>;
unsafe {
some_native_function(&mut (*foo_ref).value, val);
}
what if some_native_function
is no-op and doesn't actually init the value? Is it still UB? How could it be handled?
@Pzixel this is all covered by the API documentation for MaybeUninit
.
If some_native_function
is a NOP, nothing happens; if you then later use foo_ref.value
(or rather do foo_ref.as_mut()
as you can only use the public API), that is UB because the function may only be called once everything is initialized.
MaybeUninit
does not prevent having invalid values -- if it could, it would be safe, but that's not possible. However, it makes working with invalid values less of a footgun because now the information that the value might be invalid is encoded in the type, for both the compiler and the programmer to see.
I wanted to document an IRC conversation I had with @sfackler regarding a hypothetical issue that could arise in the future.
The main question is whether mem::zeroed
is a valid in-memory representation for the current implementation proposal for MaybeUninit<NonZeroU8>
. In my thinking, in the “uninit” state the value is only padding, which the compiler can use for any purpose, and in the "value" state, all possible values except mem::zeroed
are valid (because of NonZero
).
A future type layout system with more advanced enum discriminant packing (than we have now) might then store a discriminant in the padding of the "uninit" state/zeroed memory in the "value" state. In that hypothetical system, the size of Option<MaybeUninit<NonZeroU8>>
is 1, whereas it currently is 2. Furthermore, in that hypothetical system, Some(MaybeUninit::uninitialized())
would be indistinguishable from None
. I think we can probably fix this by changing the implementation of MaybeUninit
(but not its public API) once we do move to such a system.
I see no difference between NonZeroU8
and &'static i32
in this regard. Both of these are types where "0" is not valid. So for both of these, MaybeUninit<T>::zeroed().into_inner()
is insta-UB.
Whether Option<Union>
can do layout optimizations depends on what validity for a union is. This is not decided yet for all cases, but there is general agreement that for unions that have a variant of type ()
, any bit-pattern is valid and hence no layout optimizations are possible. This covers MaybeUninit
. So Option<MaybeUninit<NonZeroU8>>
will never have size 1.
there is general agreement that for unions that have a variant of type (), any bit-pattern is valid and hence no layout optimizations are possible.
Is this a special case for “unions that have a variant of type ()”? Does the stabilization of this feature implicitly stabilize that part of the Rust ABI? What about a union
containing struct UnitType;
or struct NewType(());
? What about struct Padded
(below)? What about a union
containing struct Padded
?
#[repr(C, align(4))]
struct Padded {
a: NonZeroU8,
b: (),
c: NonZeroU16
}
My wording was awfully specific because this is literally the only thing that I am pretty sure we have general agreement on. :) I think we would want to make this dependent on the size only (i.e., all ZSTs would get this), but actually I think this variant shouldn't even be needed and unions will just never get layout optimizations by default (but eventually users may be able to opt-in using attributes). But that is just my opinion.
We will have a proper discussion to gauge the current consensus and maybe get agreement on more things in one of the next discusions in the UCG repo, and you are welcome to join there when it happens.
Does the stabilization of this feature implicitly stabilize that part of the Rust ABI?
We are talking about validity invariants here, not data layout (which I assume you refer to when you bring up the ABI). So none of this would stabilize any ABI. These are related but distinct, and in fact there is currently an ongoing discussion on the ABI of unions.
These are related but distinct, and in fact there is currently an ongoing discussion on the ABI of unions.
AFAICT that discussion is about the memory representation of unions only, and does not include how the unions are passed through function boundaries and other things that might be relevant for an ABI. I don't think the objective of the UCG repo is to create an ABI for Rust.
Well, the objective is to define enough things for interop with C. Things like "Rust bool
and C bool
are ABI-compatible".
But indeed, for repr(Rust)
, I think there are no plans to define a function call ABI -- but that should ideally be an explicit statement in whatever form the resulting document takes, not just an omission.
I'm curious to know whether there's some argument against layout-optimizing Option<Foo>
where Foo
is defined like this:
union Foo {
bar: NonZeroUsize,
baz: &'static str,
}
@Kixunil could you raise that at https://github.com/rust-rfcs/unsafe-code-guidelines/issues/13? Your question really is not related to MaybeUninit
.
I want to know which section will be contains static variables without initialization?
In "C" I can write uint8_t a[100];
in high level of file, and I know that a symbol will be put to .bss section. If I write uint8_t a[100] = {};
the a symbol will be put to .data section (which will be copied from FLASH to RAM before main).
It is small example in Rust which used MaybeUninit:
struct A {
data: MaybeUninit<[u8; 100]>,
len: usize,
}
impl A {
pub const fn new() -> Self {
Self {
data: MaybeUninit::uninitialized(),
len: 0,
}
}
}
static mut a: MaybeUninit<[u8; 100]> = MaybeUninit::uninitialized();
static mut b: A = A::new();
Which section will be contains a and b symbols?
P. S. I know about symbol mangling but it is no matter for this question.
@qwerty19106 In your example both a
and b
will be place in .bss
. LLVM treats undef
values, like MaybeUninit::uninitialized()
, as zeros when selecting which section a variable will go in.
If A::new()
initialized len
to 1 then b
would have ended in .data
. If a static
contains any non-zero value then the variable will go in .data
. Padding is treated as zero values.
This is what LLVM does. Rust makes no ~guarantees~ promises (*) about which linker section a static
variable will go in. It just inherits LLVM's behavior.
(*) Unless you use #[link_section]
Fun fact: At some point LLVM considered undef
as a non-zero value so variable a
in your example ended in .data
. See #41315.
Thanks @japaric for your answer. It was very helpful for me.
I have the new idea.
One can use .init_array section to initialize static mut variables before call main.
This is proof of concept:
#[macro_export]
macro_rules! static_singleton {
($name_var: ident, $ty:ty, $name_init_fn: ident, $name_init_var: ident, $init_block: block) => {
static mut $name_var: MaybeUninit<$ty> = unsafe {MaybeUninit::uninitialized()};
extern "C" fn $name_init_fn() {
unsafe {
$init_block
}
}
#[link_section = ".init_array"]
#[used]
static $name_init_var: [extern "C" fn(); 1] = [$name_init_fn];
};
}
The test code:
static_singleton!(A, u8, a_init_fn, A_INIT_VAR, {
let ptr = A.get_mut();
*ptr = 5;
});
fn main() {
println!("A inited to {}", unsafe {&A.get_ref()});
}
Result: A inited to 5
Full example: playground
Unresolved question:
I could not use concat_idents to generate a_init_fn and A_INIT_VAR. It seems that #1628 is not yet ready for use.
This test not very useful. But it can be useful in embedded for initialize complicated structs (it will be placed in .bss, thus allows to economy FLASH).
Why rustc don't use .init_array section? It is standartized section of ELF format (link).
@qwerty19106 Because life before main() is considered a misfeature and was explicitly disinvited from Rust's semantics.
Ok, it is good lang design.
But in #[no_std] we have not good alternative now (maybe I was searching bad).
We can use spin::Once, but it is very expensive (Ordering::SeqCst on every reference get).
I would like to have a compile-time check on embedded.
it is very expensive (
Ordering::SeqCst
on every reference get).
That doesn't sound right to me. Aren't all "once" abstractions supposed to be relaxed on access, and synchronize on initialization? Or am I thinking of something else?
cc @Amanieu @alexcrichton
@qwerty19106:
When you say "embedded", are you referring to bare-metal? It's worth noting that .init_array
is not, in fact, part of the ELF format itself ¹ - It's not even part of the System V ABI ² that extends it; only .init
is. You don't find .init_array
until you get to the System V ABI draft update, which the Linux ABI inherits from.
As a result, if you are running on bare metal, .init_array
may not even function reliably for your use case - after all, it's implemented on non-bare-metal by code in the dynamic loader and/or libc. Unless your bootloader takes responsibility for running code referenced in .init_array
, it'd do nothing at all.
1: See page 28, figure 1-13 "Special Sections"
2: See page 63, figure 4-13 "Special Sections (continued)"
@eddyb You need an Acquire
load at the very least when reading the Once
. This is a normal load on x86 and a load + fence on ARM.
The current implementation uses load(SeqCst)
, but in practice this generates the same asm as load(Acquire)
on all architectures.
(Would you mind moving these discussions elsewhere? They do not have anything to do with MaybeUninit vs mem::uninitialized any more. Both behave the same wrt what LLVM does - generate undef. What happens with that undef later is not on topic here.)
Am 13. September 2018 00:59:20 MESZ schrieb Amanieu notifications@github.com:
@eddyb You need an
Acquire
load at the very least when reading the
Once
. This is a normal load on x86 and a load + fence on ARM.The current implementation uses
load(SeqCst)
, but in practice this
generates the same asm asload(Acquire)
on all architectures.--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/rust-lang/rust/issues/53491#issuecomment-420825802
MaybeUninit
has landed in master and will be in the next nightly. :)
https://github.com/rust-lang/rust/issues/54470 proposes using Box<[MaybeUninit<T>]>
in RawVec<T>
. To enable this and possibly other interesting combinations with boxes and slices with fewer transmutes, maybe we could add some more APIs to the standard library?
In particular to allocate without initializing (I think Box::new(MaybeUninit::uninitialized())
would still copy size_of::<T>()
padding bytes?):
impl<T> Box<MaybeUninit<T>> {
pub fn new_uninit() -> Self {…}
pub unsafe fn assert_init(s: Self) -> Box<T> { transmute(s) }
}
impl<T> Box<[MaybeUninit<T>]> {
pub fn new_uninit_slice(len: usize) -> Self {…}
pub unsafe fn assert_init(s: Self) -> Box<[T]> { transmute(s) }
}
In core::slice
/ std::slice
, can be used after taking a sub-slice:
pub unsafe fn assert_init<T>(s: &[MaybeUninit<T>]) -> &[T] { transmute(s) }
pub unsafe fn assert_init_mut<T>(s: &mut [MaybeUninit<T>]) -> &mut [T] { transmute(s) }
I think Box::new(MaybeUninit::uninitialized()) would still copy size_of::
() padding bytes
It should not, and there is a codegen test intended to test that.
Padding bytes do not have to be copied as their bit representation does not matter (anything that would observe the bit representation is UB anyway).
Ok, so maybe Box::new_uninit
is unnecessary? The slice version is different, though, since Box::new
requires T: Sized
.
I'd like to advocate for MaybeUninit::zeroed
to be a const fn
. There are some FFI-related uses that I would have for it (e.g. a static that must be initialized to zero) and I believe others might find it useful. I would be happy to volunteer my time to const-ify the zeroed
function.
@mjbshaw you'll need to use #[rustc_const_unstable(feature = "const_maybe_uninit_zeroed")]
for that since zeroed
does things that does not pass the min_const_fn
check (https://github.com/rust-lang/rust/issues/53555) which means that MaybeUninit::zeroed
's constness will be unstable even if the function is stable.
Could implementation/stabilization of this be split into a couple of steps, in order to make the MaybeUninit type available to the wider ecosystem sooner? The steps could be:
1) add MaybeUninit
2) convert all uses of mem::uninitialized/zeroed and deprecate
@scottjmaddox
add MaybeUninit
https://doc.rust-lang.org/nightly/core/mem/union.MaybeUninit.html :)
Nice! So is the plan to stabilize MaybeUninit ASAP?
Next step is to figure out why https://github.com/rust-lang/rust/pull/54668 regresses perf so badly (in a few benchmarks). I won't have much time to look at that this week though, would be happy if someone else could take a look. :D
Also I don't think we should rush this. We got the last API for handling uninitialized data wrong, let's not hurry and screw up again. ;)
That said, I'd also prefer not to add any unnecessary delays, so that we can finally deprecate the old footgun. :)
Oh, and something else just occurred to me... with https://github.com/rust-lang/rust/pull/54667 landed, the old APIs actually protect against some of the worst footguns. I wonder if we could get some of that for MaybeUninit
as well? It's not blocking stabilization, but we could try to find a way to make MaybeUninit::into_inner
panic when called on an uninhabited type. In debug builds, I could also imagine *x
to panic when x: &[mut] T
with T
uninhabited.
Status update: To make progress with https://github.com/rust-lang/rust/pull/54668, we likely need someone to tweak layout computation for unions. @eddyb is willing to mentor, but we need someone to do the implementation. :)
I think a method that moves out of the wrapper, replacing it with an uninitialized value, would be useful:
pub unsafe fn take(&mut self) -> T
Shall I submit this?
@shepmaster This feels very similar to the existing into_inner
method. Maybe we can try to avoid duplication here?
Also "replacing with" is likely the wrong picture here, this shouldn't change the content of self
at all. Just ownership is transferred away so it is now effectively in the same state as when constructed uninitialized.
change the content of
self
at all
Sure, so the implementation would basically be ptr::read
, but from a usage point of view I would encourage framing it as replacement of a valid value with an uninitialized value.
avoid duplication
I have no strong objection, as I expect the implementation of one to call the other. I just don't know what the final state would be.
I feel into_inner
is a way too innocent function name. People, probably without reading the docs too carefully, still do MaybeUninit::uninitialized().into_inner()
. Can we change the name to something like was_initialized_unchecked
or so that indicates that you must only call this after the data has been initialized?
I think the same probably applies to take
.
While a bit awkward, something like unchecked_into_initialized
might work?
Or should those methods be removed entirely and docs give examples with x.as_ptr().read()
?
@SimonSapin into_inner
consumes self
though which is nice.
But for @shepmaster's take
, doing as_mut_ptr().read()
would do the same... though of course then why would you even bother with a mutable pointer?
How about take_unchecked
and into_inner_unchecked
?
That would be a backup plan I guess, but I'd prefer if it could indicate that you must have initialized.
Putting both the emphasis that it has to be initialized and a description of what it does (unwrap/into_inner/etc.) into one name gets rather unwieldy, so how about just doing the former with assert_initialized
and leaving the latter implied by the signature? Possible unchecked_assert_initialized
to avoid implying a runtime check like assert!()
has.
Possible unchecked_assert_initialized to avoid implying a runtime check like assert!() has.
We differentiate between assumptions and assertions already via intrinsics::assume(foo)
vs assert!(foo)
, so maybe assume_initialized
?
assume
is unstable API, a stable example of assume vs assert is unreachable_unchecked
vs unreachable
and get_unchecked
vs get
. So I think unchecked
is the right term.
I'd say that foo_unchecked
only makes sense when there is a corresponding foo
, otherwise the sheer nature of the function being unsafe
indicates to me that something "different" is going on.
This bikeshed is clearly the wrong color
With this specific API, we've already seen and will continue to see programmers assume that the unsafety is because "uninitialized data is garbage so you can cause UB if you handle it carelessly", rather than the intended "it is UB to call this on uninitialized data, period". I don't know for sure whether an arguably redundant ⚠️ like unchecked
will help with that but I'd rather err on the side of being more perplexing (= more likely to cause people to ask around or read the docs very carefully).
@RalfJung
I feel
into_inner
is a way too innocent function name. People, probably without reading the docs too carefully, still doMaybeUninit::uninitialized().into_inner()
. Can we change the name to something likewas_initialized_unchecked
or so that indicates that you must only call this after the data has been initialized?
I _really_ like this idea; I feel strongly that it says the right thing both about the semantics and that this is potentially dangerous.
@rkruppe
Putting both the emphasis that it has to be initialized and a description of what it does (unwrap/into_inner/etc.) into one name gets rather unwieldy, so how about just doing the former with
assert_initialized
and leaving the latter implied by the signature? Possibleunchecked_assert_initialized
to avoid implying a runtime check likeassert!()
has.
I have no qualms about unwieldy long-ass names for dangerous things. If it makes it more people think twice then even was_initialized_into_inner_unchecked
is entirely fine by me. Making it unergonomic (within reason) to write unsafe code is a feature, not a bug ;)
Remember that a vast majority of people will likely be using an IDE with some form of autocomplete, so a long name is a minor roadbump.
I do not particularly care for the ergonomics of using this function, but I do think past a certain point names tend to be skimmed rather than read, and this name really should be read to understand what's happening. Additionally, I expect this function will be discussed/explained almost as often as it's actually used (since it's relatively niche and very subtle), and while typing long identifiers in source code can be fine (e.g. thanks to IDEs), typing them out from memory in a chat system is... less nice (I'm half joking about this point, but only half).
@shepmaster Sure; I use an IDE with auto completion as well; but I think a longer name with unchecked
inside including inside an unsafe
block would give at lest myself some extra pause.
@rkruppe
typing them out from memory in a chat system is... less nice (I'm half joking about this point, but only half).
I would make that trade-off. If a name is a bit special that can even make it more memorable. ;)
Any of (or similar names that include the same semantic connotations):
was_initialized_unchecked
was_initialized_into_inner_unchecked
is_initialized_unchecked
is_initialized_into_inner_unchecked
was_init_unchecked
was_init_into_inner_unchecked
is_init_unchecked
is_init_into_inner_unchecked
assume_initialized_unchecked
assume_init_unchecked
are fine by me.
What about initialized_into_inner
? Or initialized_into_inner_unchecked
, if you think that unchecked
is really necessary, though I tend to agree with @shepmaster that unchecked
is only necessary to distinguish from some other _checked_ variant of the same functionality, where runtime checks are not happening.
When manually implementing a self-borrowing generator I ended up using ptr::drop_in_place(maybe_uninit.as_mut_ptr())
multiple times, it seems like this would work well as an inherent method unsafe fn drop_in_place(&mut self)
on MaybeUninit
.
There is precedent with ManuallyDrop::drop
.
I'd say that foo_unchecked only makes sense when there is a corresponding foo, otherwise the sheer nature of the function being unsafe indicates to me that something "different" is going on.
I don't think not having a safe version is a good reason to remove the warning sign from the unsafe version.
remove the warning sign from the unsafe version
Being a tad hyperbolic, when should an unsafe
function not have _unchecked
stuck on the end then? What's the point of having two warnings that say the same thing?
That's a fair question. :) But I think the answer is "almost never", and I actually regret that we have offset
as unsafe function on pointers which in no way expresses that it is unsafe. It doesn't have to be literally unchecked
, but IMO there should be something. When I am in an unsafe block anyway and accidentally write .offset
instead of .wrapping_offset
, I made a promise to the compiler that I did not intend to make.
as unsafe function on pointers which in no way expresses that it is unsafe
This sums up my bemusement at this stage.
@shepmaster so you don't think it is realistic that someone will edit code inside an existing unsafe
block (maybe a large one, maybe inside a large unsafe fn
which implicitly has an unsafe
block), and not be aware that the call they are adding is unsafe
?
someone will edit code inside an existing
unsafe
block [...] and not be aware that the call they are adding isunsafe
Sorry, I didn't intend to dismiss this possibility and it does seem real. My opinion is that tacking qualifiers onto the name of a function to indicate that a function is unsafe because the existing unsafe
qualifier doesn't help us seems to indicate a deeper failure.
Maybe it's a failure that we can't fix in a backwards compatible way and adding words to names is the only possible solution, but I hope that's not the case.
maybe a large one, maybe inside a large
unsafe fn
which implicitly has anunsafe
block
I've been asked why Rust allows for variables to be shadowed because shadowing is clearly a bad idea when you have a multiple-hundred line function. I'm personally very dismissive of such cases because I believe such code is generally accepted as bad form to start with.
Now, if some aspect of Rust forces unsafe blocks to be bigger than they "need" to be, maybe that also indicates a more fundamental issue.
As an aside, I wonder if IDEs + RLS can identify any function marked as unsafe and highlight those specially. My editor already highlights the unsafe
keyword, for example.
Now, if some aspect of Rust forces unsafe blocks to be bigger than they "need" to be, maybe that also indicates a more fundamental issue.
Well there is https://github.com/rust-lang/rfcs/pull/2585 ;)
As an aside, I wonder if IDEs + RLS can identify any function marked as unsafe and highlight those specially.
That would be great! Not everybody only reads code in IDEs though -- i.e., reviews are not usually done in IDEs.
Now, if some aspect of Rust forces unsafe blocks to be bigger than they "need" to be, maybe that also indicates a more fundamental issue.
I think unsafe methods in chains are one of the bigger examples - splitting out a method in the middle to a let
-binding can be pretty unergonomic, but unless you do so, the entire chain is covered.
Not quite 'forces', but definitely 'motivates'.
l there is rust-lang/rfcs#2585 ;)
Yep, but I didn't mention it because is wouldn't actually help your case either. People can always just add an unsafe
block around the entire body (as is mentioned in the comments) and then you are right back to the same problem: unsafe function calls "sneak in".
Not everybody only reads code in IDEs though
Yep, which is why I put that as an aside. I suppose I should have more clearly stated that.
I guess my problem is that, effectively, you are advocating for this:
unsafe fn unsafe_real_name_of_function() { ... }
^~~~~~ for humans
^~~~~~ for the compiler
This allows you to clearly see every unsafe function when reading code. The repetition greatly annoys me and indicates that something is suboptimal.
This allows you to clearly see every unsafe function when reading code. The repetition greatly annoys me and indicates that something is suboptimal.
I understand. You could also see this repetition as implementing the 4-eyes principle, where the compiler supplies two eyes. ;)
@shepmaster I think this is getting a bit off-track, but IMO the original point is that it's not clear what this method's invariants are - i.e. when is the unsafe
code actually not UB, with the simpler name.
I agree "unchecked" isn't the best option, but it has precedent as "easily violating invariants".
This makes me wish we had a naming convention along the lines of initialized_or_ub
.
I think this is getting a bit off-track
I was about to say so myself. I've said my piece (and apparently no one agrees with me), so I'll let it lie; y'all pick whatever you want.
we had a naming convention along the lines of
initialized_or_ub
You mean like maybe_uninit(ialized)
? Something that could be broadly applied to a set of related methods somehow? 😇
No, I mean like unwrap_or_else
- putting what happens in the "unhappy case" in the method name.
@eddyb Hey that's not so bad... .initialized_or_unsound
maybe?
In general, adding type information to identifier names is considered an anti-pattern (e.g. foo_i32
, bar_mutex
, baz_iterator
) because that's what types are there for.
Yet when it comes to functions, even though unsafe
is part of the fn
type, adding _unchecked
, _unsafe
, _you_better_know_what_you_are_doing
appears to be pretty common.
I wonder, why is this the case?
Also, FYI, there is an issue (https://github.com/rust-analyzer/rust-analyzer/issues/190) in rust-analyzer
to expose whether functions are unsafe
or not. Editors and IDEs should be able to emphasize operations that require unsafe
within unsafe
blocks, which include not only calling unsafe
functions (independently of whether they are suffixed with an identifier like, e.g., _unchecked
or not), but also dereferencing raw pointers, etc.
Arguably, rust-analyzer
cannot do this yet (EDIT: intellij-Rust kind of can: https://github.com/intellij-rust/intellij-rust/issues/3013#issuecomment-440442306), but if the intent is to make it clear that calling this inside an unsafe
block requires unsafe
, syntax highlighting is a possible alternative to suffixing this with anything. I mean, if you really want this right now, you can probably add this function's name as a "keyword" to your syntax highlighter in a couple of minutes and call it a day.
@gnzlbg
In general, adding type information to identifier names is considered an anti-pattern (e.g.
foo_i32
,bar_mutex
,baz_iterator
) because that's what types are there for.
Sure, hungarian notation is in general considered an anti-pattern. I would agree. However, in general, safety is not considered in these discussions and I think that given the danger that UB presents there are good reasons to make an exception here.
Yet when it comes to functions, even though
unsafe
is part of thefn
type, adding_unchecked
,_unsafe
,_you_better_know_what_you_are_doing
appears to be pretty common.I wonder, why is this the case?
Simply put: unsafety. When unsafety is involved, redundancy is imo your friend. This applies both to code and to safety critical hardware and such.
Also, FYI, there is an issue (rust-analyzer/rust-analyzer#190) in
rust-analyzer
to expose whether functions areunsafe
or not. Editors and IDEs should be able to emphasize operations that requireunsafe
withinunsafe
blocks, which include not only callingunsafe
functions (independently of whether they are suffixed with an identifier like, e.g.,_unchecked
or not), but also dereferencing raw pointers, etc.Arguably,
rust-analyzer
cannot do this yet, but if the intent is to make it clear that calling this inside anunsafe
block requiresunsafe
, syntax highlighting is a possible alternative to suffixing this with anything.
All of this is pretty awesome. However, as @RalfJung noted: "Not everybody only reads code in IDEs though -- i.e., reviews are not usually done in IDEs." It strikes me as unlikely that GitHub would embed rust-analyzer in its UI so as to show whether a called function / operation is unsafe or not.
If the tradeoff is between being ugly and being prone to incorrect (and thus unsound) use of unsafe
, I think we should always prefer the former. A lot can be said for making it so that a programmer _has_ to pause and think to themselves, "wait, am I doing this right?"
For example, if you want to use an insecure cryptographic operation in Mundane, you have to:
insecure
moduleallow(deprecated)
or just live with the compiler warnings emitted whenever you use that operationlet mut hash = InsecureSha1::default(); hash.insecure_write(bytes); ...
It's all documented here in more detail. I don't think that we need to be _that_ aggressive in all circumstances, but I think the right philosophy is that, if the operation is dangerous enough to worry about, then there shouldn't be any way for a programmer to miss the gravity of what they're doing.
Since we are 95% worried about people misusing this type, and only 5% worried about long names, let's start out by renaming the type to MaybeUninitialized
. The extra 7 characters are worth it.
Rename it toMaybeUninitializedOrUndefinedBehavior
to really hammer it home to end users.
Opt for this type to have no methods and everything can be an associated function, reinforcing the point on every function call, as desired:
MaybeUninitializedOrUndefinedBehavior::into_inner(value)
MaybeUninitializedOrUndefinedBehaviorReadTheDocsAllOfThemYesThisMeansYou
Well... truthfully, having a long name such as MaybeUninitializedOrUndefinedBehavior
in the type seems misplaced to me. It's the operation .into_inner()
that needs the good name because that's the potentially problematic bit that needs extra attention. Having no methods could be a good idea tho. MaybeUninit::initialized_or_undefined(foo)
seems pretty clear.
IMO we shouldn't be going out of our way to make unsafe operations unergonomic like this. We need ergonomic names and ways to write correct unsafe code. If we clutter it with a bunch of excessively long names and unclear utilities and conversions it's going to discourage users from writing correct unsafe code and make unsafe code harder to read and validate.
Remember that a vast majority of people will likely be using an IDE with some form of autocomplete, so a long name is a minor roadbump.
Until RLS is more functional, this is not the case for me at least.
I think most of us agree that
More descriptive names are good
Less ergonomic names are bad
and the question is about which way to resolve things when these are in tension.
Even so I think into_inner
in particular is a bad name for this method (to use a fancy term, it's not on the Pareto frontier). The general convention is that we have an into_inner
when a Foo<T>
contains exactly one T
, and you want to get it out. But this not true of MaybeUninit<T>
. It contains zero or one T
s.
So a less bad option, at least, would be to call it unwrap
, or perhaps unwrap_unchecked
.
I also think from_initialized
or from_initialized_unchecked
sound alright, although "from" usually appears in the names of static methods.
Maybe unwrap_initialized_unchecked
would be okay?
Call it take_initialized
and make it &mut self
instead of self
. The name makes it clear that it expects the inner value to be initialized. In the context of MaybeUninit
, unsafe
and the fact that it doesn't return an Option
/Result
also makes it clear that this operation is unchecked.
Having it take a &mut self
seems like a footgun that makes it easier to lose track of whether you've semantically moved out of the MaybeUninit
.
Alternative name: since it really is moving ownership just like methods named into
imply, maybe into_initialized_unchecked
?
Having it take a &mut self seems like a footgun that makes it easier to lose track of whether you've semantically moved out of the MaybeUninit.
It's a method that has been requested though, and as long as you track otherwise that this doesn't happen twice, it's okay.
And it doesn't seem worth having both the borrowed and the consuming variant.
I like take_initialized
, or the more explicit variant take_initialized_unchecked
.
let's start out by renaming the type to MaybeUninitialized
Anybody up for preparing a PR?
for preparing a PR?
I can use my awesome sed
skills since I suggested it ;-)
I think it would be an improvement to call the into_inner
method something that emphasizes that it assumes it is initialized, but I think the addition of unchecked
is superfluous and unhelpful. We have a way of informing users that unsafe functions are unsafe: we generate a compiler error if they don't wrap them in an unsafe block.
EDIT: take_initialized
seems good
What about assume_initialized
? This:
assume
intrinsic, is UB if wrongly assumedAnybody up for preparing a PR?
Never mind. The libs team decided this wasn't worth it.
Is there a reason why MaybeUninit<T>
is not Copy
when T: Copy
?
@tommit because that MaybeUninit<T>
relies on ManuallyDrop<T>
, we programmers should guarantee that the inner value is dropped when our structs are out of scope. If it implements Copy
, I think it may be harder for Rust new-comers to remember to drop the inner value T
everytime, of either the struct itself or its copies. By this way it may produce more unconspicuous memory leaks that we are not expecting to.
@luojia65 Not sure that line of reasoning applies when T
itself is Copy
, regardless of what ManuallyDrop
and MaybeUninit
do.
I don't think there is any reason. Just nobody thought of adding #[derive(Copy)]
;)
An observation of maybe a somewhat subtle aspect of this:
I believe that even though MaybeUninit<T>
should be Copy
when T: Copy
, MaybeUninit<T>
should not be Clone
when T: Clone
and T
is not Copy
.
Oh yes, we definitely cannot just call clone
.
I keep forgetting that Copy: Clone
...
That’s fine, we can implement Clone for MaybeUninit<T> where T: Copy
based on returning *self
.
I did my best to update the issue description with all the questions that have come up here. Let me know if I missed anything!
The documentation for ManuallyDrop::drop
says
This function runs the destructor of the contained value and thus the wrapped value now represents uninitialized data. It is up to the user of this method to ensure the uninitialized data is not actually used.
Any suggestions for how to improve that wording so that it cannot possibly be confused with the kind of "uninitializedness" that MaybeUninit
handles?
In my terms, a dropped ManuallyDrop<T>
is no longer a safe T
, but it is a valid T
... at least as far as layout optimizations care.
"stale"/"invalid", perhaps? It is initialized.
FWIW I think the wording is clear (at least to me) making sure that an
object isn’t dropped twice is a “safety” issue. If we had a small document
in the UCG defining “safety” one probably should hyperlink it. You could
add that T must be “valid” and hyperlink “valid” definition, but since we
don’t have these definitions written down anywhere yet... I don’t know. I
don’t think we should paraphrase them all over the docs.
Can we stabilize MaybeUninit before deprecating uninitialized?
@RalfJung I would say that the place is "moved from". FWIW we should use the same kind of terminology in std::ptr::read
, but it's not very clear there either.
@bluss we shouldn’t ever deprecate anything widely used without a “better
solution” / “migration path” for current users.
Deprecation warnings should be: “X is deprecated, use Y instead”. If we
don’t have an Y and X is widely used... then we should consider holding on
the deprecation warning until we have an Y.
Otherwise, we would be sending a really weird message.
@cramertj "invalid" is not a good choice, as it still (must!) satisfy the validity invariant.
If we had a small document in the UCG defining “safety” one probably should hyperlink it. You could add that T must be “valid” and hyperlink “valid” definition, but since we don’t have these definitions written down anywhere yet...
We should definitely do that once we got something :D
@RalfJung I don't think "validity invariant" is in the lexicon of most (nearly any?) Rust users-- I think referring to "invalid data" colloquially is acceptable (the ManuallyDrop<T>
is no longer usable as a T
). Saying that it has to uphold certain representation invariants that the compiler uses for optimizations doesn't make it any less invalid data.
I don't think "validity invariant" is in the lexicon of most (nearly any?) Rust users
Fair enough, the term is not official (yet). But we should pick an official term for this eventually, and then we should avoid such clashes. We can say that "valid" data is what I call "safe" in my post, but then we need another word for what I call "valid".
@shepmaster wrote a while ago
I think a method that moves out of the wrapper, replacing it with an uninitialized value, would be useful:
pub unsafe fn take(&mut self) -> T
I think my biggest concern with this is that with such a function, it is terribly easy to accidentally copy non-copy data. In case you need that, is it really so bad to do maybe_uninit.as_ptr().read()
?
I think I may have suggested somewhere up there to have something like take
replace something like into_inner
. I don't think that that's a good idea any more: Most of the time, the additional restriction that into_inner
consumes self
is actually helpful.
@RalfJung In the end, all methods of MaybeUninit
are unsafe and are just convenience wrappers around as_ptr
. However I expect take
to be one of the most common operations, sinceMaybeUninit
is effectively just an Option
where the tag is managed externally. This is useful in many cases, for example an array where not all elements are initialized (e.g. a hash table).
In https://github.com/rust-lang/rust/pull/57045, I am proposing to add two new operations to MaybeUninit
:
/// Get a pointer to the first contained values.
pub fn first_ptr(this: &[MaybeUninit<T>]) -> *const T {
this as *const [MaybeUninit<T>] as *const T
}
/// Get a mutable pointer to the first contained values.
pub fn first_mut_ptr(this: &mut [MaybeUninit<T>]) -> *mut T {
this as *mut [MaybeUninit<T>] as *mut T
}
See that PR for motivation and discussion.
When removing zeroed
it seems like it is only replaced by MaybeUninit::zeroed().into_inner()
which becomes an equivalent way to write the same thing. There is no practical change. With uninit values we instead have the practical change of all uninitialized data being kept stored in values of the type MaybeUninit
or equivalent union.
For this reason, I'd consider keeping std::mem::zeroed
as it is since it is a widely used function in FFI. Deprecation would make it emit loud warnings, which is almost tantamount to it being removed, and at least very annoying — this can also lead to a growing number of #[allow(deprecated)]
which can hide other more important issues.
This exercise of clarifying Rust's model and guidelines for unsafe
-marked code is super useful, but let's avoid changes like for zeroed
where it is only recasting the same practical effect using a new way to say it.
@bluss My understanding (which might be wrong) is that std::mem:zeroed
is equally as dangerous as std::mem::uninitialized
, and is just as likely to result in UB. Perhaps it is being used to initialize byte arrays, which would be better initialized with vec![0; N]
or [0; N]
, in which case perhaps a rustfix
rule could be added to automate the change? Outside of initializing byte or integer arrays, though, my understanding is that there's a good chance that using std::mem::zeroed
can lead to UB.
@scottjmaddox It's very easy to invoke UB with std::mem:zeroed
, but unlike std::mem::uninitialized
, there are some types for which std::mem:zeroed
is perfectly valid (e.g., native types, many FFI-related struct
s, etc.). Like many unsafe
functions, zeroed()
is not to be used lightly, but it's not nearly as problematic as uninitialized()
. I for one would be sad to have to use MaybeUninit::zeroed().into_inner()
instead of std::mem:zeroed()
since there isn't any difference between the two in terms of safety and the MaybeUninit
version is more unwieldy and a bit less legible (and when I must use unsafe code, I highly value legibility).
@mjbshaw
unlike std::mem::uninitialized, there are some types for which std::mem:zeroed is perfectly valid (e.g., native types,
There are some types for which mem::uninitialized
is perfectly safe (e.g. unit
), while there are some "native" types (e.g. bool
, &T
, etc.) for which mem::zeroed
invokes undefined behavior.
There appears to be a misconception here that MaybeUninit
is somehow about uninitialized memory (and I can see why: "Uninitialized" is in its name).
The danger that we are trying to prevent is the one caused by the creation of an _invalid_ value, whether the _invalid_ value contains all zeros, or uninitialized bits, or something else (e.g. a bool
from a bit-pattern that isn't true
or false
), doesn't really matter - mem::zeroed
and mem::uninitialized
both can be used to create an _invalid_ value, and are therefore pretty much equally dangerous from my point-of-view.
OTOH MaybeUninit::zeroed()
and MaybeUninit::uninitialized()
are _safe_ methods because they return an union
. MaybeUninit::into_inner
is unsafe
, and calling it is only _safe_ if the precondition that the current bits in the MaybeUninit<T>
represent a _valid_ value of T
is satisfied. If the bit-pattern is _invalid_, the behavior is undefined. Whether the bit-pattern is invalid because it contains all zeros, uninitialized bits, or something else, doesn't really matter.
@RalfJung I'm starting to get the feeling that the name MaybeUninit
might be a bit misleading. Maybe we should rename it to MaybeInvalid
or something like that to better convey the problem it solves and the dangers it avoids. EDIT: following @Centril's suggestions I've posted on the bikeshed issue.
EDIT: FWIW, I do think that having an ergonomic way (e.g. without directly using MaybeUninit
) to create zeroed memory safely would be useful, but mem::zeroed
isn't it. We could add a Zeroed
trait similar to Default
that is only implemented for types for which the all zeroes bit-pattern is valid or something like that, as a way to achieve a similar effect to what mem::zeroed
does now, but without its pitfalls.
In general, I do think that we shouldn't deprecate functionality until a migration path for current users to a better solution is in place. MaybeUninit
is a better solution than mem::zeroed
in my eyes, even though it might not be perfect (it is safer, but it is not as ergonomic), so I would be fine with deprecating mem::zeroed
as soon as MaybeUninit
lands, even if by the time that happens we don't have a more ergonomic replacement in place.
Maybe we should rename it to MaybeInvalid or something like that to better convey the problem it solves and the dangers it avoids.
Bikeshed in https://github.com/rust-lang/rust/pull/56138.
@gnzlbg
there are some "native" types (e.g.
bool
As long as bool
is FFI-safe (which it generally is considered to be, despite RFC 954 being rejected and then unofficially-officially accepted), it should be safe to use mem::zeroed
for it.
,
&T
, etc.) for whichmem::zeroed
invokes undefined behavior.
Yes, but these types that have UB for mem::zeroed
also have UB for MaybeUninit::zeroed().into_inner()
(I was careful to intentionally include .into_inner()
in my original comment). MaybeUninit
adds nothing if the user just immediately calls .into_inner()
(which is precisely what I and many others would do if mem::zeroed
was deprecated, because I'm only using mem::zeroed
for types which are zero-safe).
As long as bool is FFI-safe (which it generally is considered to be, despite RFC 954 being rejected and then unofficially-officially accepted), it should be safe to use mem::zeroed for it.
I didn't wanted to get into the specifics of this, but bool
is FFI-safe in the sense that it is defined to be equal to C's _Bool
. However, the true
and false
values of C's _Bool
are not defined in the C standard (although they might be some day, maybe in C20), so whether mem::zeroed
creates a valid bool
or not is technically implementation-defined.
Yes, but these types that have UB for mem::zeroed also have UB for MaybeUninit::zeroed().into_inner() (I was careful to intentionally include .into_inner() in my original comment). MaybeUninit adds nothing if the user just immediately calls .into_inner() (which is precisely what I and many others would do if mem::zeroed was deprecated, because I'm only using mem::zeroed for types which are zero-safe).
I don't really understand which point you are trying to make here.MaybeUninit
adds the option of calling or not calling into_inner
, which mem::zeroed
doesn't have, and there is value in that since that is the operation that can introduce undefined behavior (constructing the union as uninitialized or zeroed is safe).
Why would anyone blindly translate mem::zeroed
to MayeUninit
+into_inner
? That is not the appropriate way to "fix" the deprecation warning of mem::zeroed
, and silencing the deprecation warning has the same effect and a much lower cost.
The appropriate way of moving from mem::zeroed
to MaybeUninit
is to evaluate whether it is safe to call into_inner
, in which case one can just do so and write a comment explaining why that is safe, or just keep working with MaybeUninit
as an union
until calling into_inner
becomes safe (one might need to change a lot of code until that's the case, do API breaking changes to return MaybeUninit
instead of T
s, etc.).
I didn't wanted to get into the specifics of this, but
bool
is FFI-safe in the sense that it is defined to be equal to C's_Bool
. However, thetrue
andfalse
values ofC's
_Boolare not defined in the C standard (although they might be some day, maybe in C20), so whether
mem::zeroedcreates a valid
bool` or not is technically implementation-defined.
Apologies for continuing the tangent, but C11 requires that all-bits-set-to-zero represents the value 0 for integer types (see section 6.2.6.2 "Integer types", paragraph 5) (which includes _Bool
). Additionally, the values of true
and false
are explicitly defined (see the section 7.18 "Boolean type and values <stdbool.h>
").
I don't really understand which point you are trying to make here.
MaybeUninit
adds the option of calling or not callinginto_inner
, whichmem::zeroed
doesn't have, and there is value in that since that is the operation that can introduce undefined behavior (constructing the union as uninitialized or zeroed is safe).
There is value in MaybeUninit
and MaybeUninit::zeroed
. We both agree on that. I'm not arguing for MaybeUninit::zeroed
to be removed. My point is that there is also value in std::mem::zeroed
.
There are some types for which mem::uninitialized is perfectly safe (e.g. unit), while there are some "native" types (e.g. bool, &T, etc.) for which mem::zeroed invokes undefined behavior.
This is a red herring. Just because both zeroed
and uninitialized
are valid for some subset of types doesn't make them comparable in actual use. You need to look at the size of those subsets. The number of types for which mem::uninitialized
is valid is very very small (in fact, is it only zero-sized types?), and no one would actually write code that does that (e.g. for ZSTs you would just use the type constructor). On the other hand, there are many many types for which mem::zeroed
is valid. mem::zeroed
is valid for at least the following types (hope I got this right):
bool
, as mentioned above)Option<T>
where T triggers enum layout optimization. T
includes:NonZeroXXX
(all integer types)NonNull<U>
&U
&mut U
fn
-pointersstruct
where any field is a type in this list.struct
, or union
consisting only of types in this list.Yes, both uninitialized
and zeroed
deal with potentially-invalid values. However, programmers use these primitives in very different ways.
The common pattern for mem::uninitialized
is:
let val = MaybeUninit::uninitialized();
initialize_value(val.as_mut_ptr()); // or val.set
val.into_inner()
If you are not writing your use of uninitialized values this way, you are most likely making a big mistake.
The most common use of mem::zeroed
today is for types described above, and this is perfectly valid. I completely agree with @bluss that I don't see any footgun-prevention gain by replacing mem::zeroed()
everywhere by MaybeUninit::zeroed().into_inner()
.
To summarize, common use of uninitialized
is for types for which can have invalid values. Common use of zeroed
is for types which are valid if zeroed.
A Zeroed
trait or similar (e.g. Pod
, but note that T: Zeroed
does not imply T: Pod
) as has been suggested seems like a fine thing to add in the future, but let's not deprecate fn zeroed<T>() -> T
until we actually have a stable fn zeroed2<T: Zeroed>() -> T
.
@mjbshaw
Apologies for continuing the tangent, but C11 requires that
Indeed! It's only C++'s bool
which leaves the valid values unspecified! Thanks for correcting me, gonna send a PR to the UCG with this guarantee.
@jethrogb
You need to look at the size of those subsets. The number of types for which
mem::uninitialized
is valid is very very small (in fact, is it only zero-sized types?), and no one would actually write code that does that (e.g. for ZSTs you would just use the type constructor).
It's not even correct for all ZSTs if you factor in privacy with which it's possible to have ZSTs as a sort of "proof of work" or "token for resource" or just "proof witness" in general. A trivial example:
mod refl {
use core::marker::PhantomData;
use core::mem;
/// Having an object of type `Id<A, B>` is a proof witness that `A` and `B`
/// are nominally equal type according to Rust's type system.
pub struct Id<A, B> {
witness: PhantomData<(
// Make sure `A` is Id is invariant wrt. `A`.
fn(A) -> A,
// Make sure `B` is Id is invariant wrt. `B`.
fn(B) -> B,
)>
}
impl<A> Id<A, A> {
/// The type `A` is always equal to itself.
/// `REFL` provides a proof of this trivial fact.
pub const REFL: Self = Id { witness: PhantomData };
}
impl<A, B> Id<A, B> {
/// Casts a value of type `A` to `B`.
///
/// This is safe because the `Id` type is always guaranteed to
/// only be inhabited by `Id<A, B>` types by construction.
pub fn cast(self, value: A) -> B {
unsafe {
// Transmute the value;
// This is safe since we know by construction that
// A == B (including lifetime invariance) always holds.
let cast_value = mem::transmute_copy(&value);
// Forget the value;
// otherwise the destructor of A would be run.
mem::forget(value);
cast_value
}
}
}
}
fn main() {
use core::mem::uninitialized;
// `Id<?A, ?B>` is a ZST; let's make one out of thin air:
let prf: refl::Id<u8, String> = unsafe { uninitialized() };
// Segfault:
let _ = prf.cast(42u8);
}
@Centril this is kind of a tangent, but I'm not sure if your code is actually an example of a type for which calling uninitialized
creates an invalid value. You're using unsafe code to violate the internal invariants that Id
is supposed to uphold. There are many ways to do this, for example transmute(())
, or type-casting raw pointers.
@jethrogb My only points are that a) please be more careful with wording, b) privacy doesn't seem sufficiently reasoned about in discussions about what valid values even are. It seems to me that "violate the internal invariants" and "invalid value" are the same thing; there's a side-condition here "if A != B
then Id<A, B>
is uninhabited.".
It seems to me that "violate the internal invariants" and "invalid value" are the same thing; there's a side-condition here "if
A != B
thenId<A, B>
is uninhabited.".
Invariants "imposed by library code" are different from invariants "imposed by the compiler" in several ways, see @RalfJung's blog post about the topic. In that terminology, your Id
example has a safety invariant and mem::zeroed
or other ways to generically synthesize a Id<A, B>
cannot be safe, but it is not is not immediate UB to just construct a wrong Id
value with mem::zeroed
or mem::uninitialized
because Id
has no validity invariant. While unsafe code authors certainly need to keep both kinds of invariants in mind, there are some reasons why this discussions mostly focus on validity:
mem::zeroed::<T>()
based on T
's safety invariant, we may not want to.After reading @jethrogb's comment, I agree that mem::zeroed
should not be deprecated with the introduction of MaybeUninit
.
@jethrogb Small nit:
any array of any type in this list
any struct where any field is a type in this list.
Not sure if this is a simple typo or a semantic difference, but I think you need to out-dent these two bullets-- I don't believe it's necessarily the case that None
of e.g. Option<[&u8; 2]>
has bitwise-zeros as a valid representation (it could e.g. use [0, 24601]
as the representation of the None
case-- only one of the inner values must take on a niche representation -- cc @eddyb to check me on this). I doubt we do this today, but it doesn't seem completely impossible that something like this could appear in the future.
@jethrogb
The most common use of mem::zeroed today is for types described above, and this is perfectly valid.
Is there a source for this?
On the other hand, there are many many types for which mem::zeroed is valid.
There are also infinitely many cases for which it can be used incorrectly.
I understand that for those using mem::zeroed
heavily and correctly, delaying the deprecation until a more ergonomic solution is available is a very appealing alternative.
I prefer the trade-off of reducing or eliminating the number of incorrect usages of mem::zeroed
even if that incurs a temporary ergonomic cost. A deprecation warns users that what they are doing does potentially invoke undefined behavior (particularly new users which use it for the first time), and we have a sound solution to what to do instead, which makes the warning actionable.
I use MaybeUninit
often and it is less ergonomic to use than mem::zeroed
and mem::uninitialized
, but it hasn't been painfully unergonomic for me. If MaybeUninit
is as painful as some comments in this discussion claim, then a library and / or RFC for a safe mem::zeroed
alternative will pop up in no time (nothing is blocking anyone here AFAICT).
Alternatively, users can ignore the warning and keep using mem::zeroed
, that's up to them, we can't ever remove mem::zeroed
from libcore
anyways.
But people using mem::zeroed
heavily should be actively inspecting if all their usages are correct anyways. Particularly those using mem::zeroed
heavily, those using it in generic code, those using it as a "less scary" alternative to mem::uninitialized
, etc. Delaying the deprecation just delays warning users that what they are doing might be undefined behavior.
@bluss
When removing zeroed it seems like it is only replaced by MaybeUninit::zeroed().into_inner() which becomes an equivalent way to write the same thing. There is no practical change. With uninit values we instead have the practical change of all uninitialized data being kept stored in values of the type MaybeUninit or equivalent union.
This is true when we are talking about integers, but once we look at e.g. reference types, mem::zeroed()
becomes a problem as well.
However, I agree that it is much more likely that people will actually realize that mem::zeroed::<&T>()
is a problem, than people realizing that mem::uninitialized::<bool>()
is a problem. So maybe it makes sense to keep mem::zeroed()
.
Notice, however, that we might still decide that mem::uninitialized::<u32>()
is fine -- if we allow uninitialized bits in integer types, mem::uninitialized()
becomes valid for almost all "POD types". I don't think we should allow this, but we still have to have this discussion.
The number of types for which mem::uninitialized is valid is very very small (in fact, is it only zero-sized types?), and no one would actually write code that does that (e.g. for ZSTs you would just use the type constructor).
FWIW, some slice iterator code actually has to create a ZST in generic code without being able to write a type constructor. It uses mem::zeroed()
/MaybeUninit::zeroed().into_inner()
for that.
mem::zeroed()
is useful for certain FFI cases where you are expected to zero a value with memset(&x, 0, sizeof(x))
before calling a C function. I think this is a sufficient reason to keep it undeprecated.
@Amanieu That seems unnecessary. The Rust construct matching memset
is write_bytes
.
mem::zeroed() is useful for certain FFI cases
Also, the last time I checked, mem::zeroed
was the idiomatic way to initialize libc
structures with private or platform-dependent fields.
@RalfJung The full code in question is usually Type x; memset(&x, 0, sizeof(x));
and the first part doesn't have a great Rust equivalent. Using MaybeUninit
for this pattern is a lot of line noise (and much worse codegen without optimizations) when the memory is never actually invalid after the memset
.
I have a question about the design of MaybeUninit
: Is there any way to write to a single field of the T
contained inside a MaybeUninit<T>
such that you could over time write to all of the fields and end up with a valid/initialized type?
Suppose we have a struct like the following:
// Let us suppose that Foo can in principle be any struct containing arbitrary types
struct Foo {bar: bool, baz: String}
Does generating an &mut Foo reference, and then writing to it trigger UB?
main () {
let uninit_foo = MaybeUninitilized::<Foo>::uninitialized();
unsafe { *uninit_foo.get_mut().bar = true; }
unsafe { *uninit_foo.get_mut().baz = "hello world".to_owned(); }
}
Does using a raw pointer instead of a reference avoid this problem?
main () {
let uninit_foo = MaybeUninitilized::<Foo>::uninitialized();
unsafe { *uninit_foo.as_mut_pointer().bar = true; }
unsafe { *uninit_foo.as_mut_pointer().baz = "hello world".to_owned(); }
}
Or is there any other way in which this pattern can be implemented without triggering UB? Intuitively, it seems to me that as long as I'm not reading uninitialized/invalid memory, then everything should be fine, but several of the comments in this thread lead me to doubt that.
My use case for this functionality would be for an in-place builder pattern for types where some of the fields are required to be specified by the user (and don't have a sensible default), but some of the field do have a default value.
Is there any way to write to a single field of the T contained inside a MaybeUninit
such that you could over time write to all of the fields and end up with a valid/initialized type?
Yes. Use
ptr::write(&mut *(uninit.as_mut_ptr()).bar, val1);
ptr::write(&mut *(uninit.as_mut_ptr()).baz, val2);
...
You must not use get_mut()
for this, that's why the docs for get_mut
say that the value must be initialized before calling this method. We might relax that rule in the future, that is being discussed at https://github.com/rust-rfcs/unsafe-code-guidelines/.
@RalfJung Wouldn't *(uninit.as_mut_ptr()).bar = val1;
risk dropping the value previously in bar
, which might be uninitialized? I think it's necessary to do
ptr::write(&mut (*uninit.as_mut_ptr()).bar, val1);
@scottjmaddox ah, right. I forgot about Drop
. I will update the post.
In what way does this variant of writing to uninitialized fields exhibit less undefined behaviour than get_mut()
? At the code point where the first argument to ptr::write
is evaluated, the code has created a &mut _
to the inner field which should be just as undefined as the reference to the whole struct that would otherwise be created. Should the compiler not be allowed to assume this to be in initialized state already then?
Would that not necessitate a new pointer-projection method that does not require exposed &mut _
intermediates?
Slightly interesting example:
pub struct A { inner: bool }
pub fn init(mut uninit: MaybeUninit<A>) -> A {
unsafe {
let mut previous: [u8; std::mem::size_of::<bool>()] = [0];
{
// Doesn't the temorary reference assert inner was in valid state before?
let inner_ptr: *mut _ = &mut (*uninit.as_mut_ptr()).inner;
ptr::copy(inner_ptr as *const [u8; 1], (&mut previous) as *mut _, 1);
// With the assert below, couldn't the compiler drop this?
std::ptr::write(inner_ptr, true);
}
// Assert Inner wasn't false before, so it must have been true already!
assert!(previous[0] != 0);
// initialized all fields, good to proceed.
uninit.into_inner()
}
}
But if the compiler may assume &mut _
to be a valid representation, it may just outright throw away to ptr::write
? If we get past the assert, the content was not 0
but the only other valid bool is true/1
. So it could assume this to be a no-op if we get past the assert. Since the value is not accessed before, after reordering we could end up with this? It doesn't look like llvm exploits this right now, but I'm very unsure if this would be guaranteed.
If we instead create our own MaybeUninit
within the function, we get a slightly different reality. On the playground we instead find out it assumes that the assert can never trigger, presumably as it assumes str::ptr::write
is the only write to inner
thus it must have happened already before we read from previous
? This seems a bit fishy anyways. To support this theory, watch what happens when you change the pointer write to false
instead.
I realize this tracking issue may not be best place for this question.
@RalfJung @scottjmaddox Thank you for your answers. These nuances are exactly why I asked.
@HeroicKatora Yes, I was wondering about that.
Perhaps the correct incantation is this?
struct Foo {bar: bool, baz: String}
fn main () {
let mut uninit_foo = MaybeUninit::<Foo>::uninitialized();
unsafe { ptr::write_unaligned(&mut ((*uninit_foo.as_mut_ptr()).bar) as *mut bool, true); }
unsafe { ptr::write_unaligned(&mut ((*uninit_foo.as_mut_ptr()).baz) as *mut String, "".to_string()); }
}
I read a comment on Reddit (which unfortunately I can no longer find) which suggested that immediately casting a reference to a pointer (&mut foo as *mut T
) actually compiles to just creating a pointer. However, the *uninit_foo.as_mut_ptr()
bit worries me. Is it ok to dereference the pointer to the unitialized memory like this? We're not actually reading anything, but it's unclear to me whether the compiler knows that.
I figured the unaligned
variant of ptr::write
might be required for generic code over MaybeUninit<T>
as not all types will have aligned fields?
No need for write_unaligned
. The compiler handles field alignment for you. And the as *mut bool
shouldn't be necessary either, since the compiler can infer that it needs to coerce the &mut
into a *mut
. I think this inferred coercion is why it's safe/valid. If you want to be explicit and do as *mut _
, that should be fine, too. If you want to save the pointer in a variable, then it's necessary to do coerce it into a pointer.
@scottjmaddox Is ptr::write
still safe even if the struct is #[repr(packed)]
? ptr::write
says the pointer must be correctly aligned, so I assume ptr::write_unaligned
is required in cases where you are writing some generic code that needs to handle packed representations (though to be honest I'm not sure I can think of an example of "generic code over MaybeUninit<T>
" that wouldn't know whether the field was properly aligned or not).
@nicoburns
which suggested that immediately casting a reference to a pointer (&mut foo as *mut T) actually compiles to just creting a pointer.
What it compiles to is distinct from the semantics the compiler is allowed to use to perform this compilation. Even if it is a no-op in IR, it can still have a semantic effect such as asserting additional assumptions to the compiler. @scottjmaddox is correct in which operations are at play here but the critical part of the question is the creation of the mutable reference which happens before and independently of the ref-to-ptr coercion. Then @mjbshaw is technically correct about the general safety requiring ptr::write_unaligned
when the argument is an unknown generic argument.
I don't remember where I read this (nomicon? One of @RalfJung's blog posts?) but I'm fairly certain that field access via raw pointer dereference, reference, and immediate conversion of the reference into a pointer (either via coercion or casting) is special cased.
In what way does this variant of writing to uninitialized fields exhibit less undefined behaviour than get_mut()? At the code point where the first argument to ptr::write is evaluated, the code has created a &mut _ to the inner field which should be just as undefined as the reference to the whole struct that would otherwise be created. Should the compiler not be allowed to assume this to be in initialized state already then?
Very good question! These concerns are one reason why I opened https://github.com/rust-lang/rfcs/pull/2582. With that RFC accepted, the code I showed does not create an &mut
, it creates a *mut
.
@mjbshaw Touché. Yes, I suppose you're right about the possibility of the struct being packed, and therefor needing ptr::write_unaligned
. I had not considered that before, primarily because I've yet to use packed structures in rust. This should probably be a clippy lint, if it's not already.
Edit: I didn't see a relevant clippy lint, so I submitted an issue: https://github.com/rust-lang/rust-clippy/issues/3659
I opened a PR to un-deprecate mem::zeroed
: https://github.com/rust-lang/rust/pull/57825
I've opened an issue in the RFC repo to fork the discussion about safe memory zeroing, so that we can deprecate mem::zeroed
at some point once we have a better solution to that problem: https://github.com/rust-lang/rfcs/issues/2626
Would it be possible to stabilize const uninitialized
, as_ptr
and
as_mut_ptr
ahead of the rest of the API? It seems very likely to me that these
will stabilized as they are right now. Plus, the rest of the API can be built on
top of as_ptr
and as_mut_ptr
, so once stabilized it would be possible to
have a MaybeUninitExt
trait on crates.io that provides, on stable, the API
that's currently being discussed letting more people (e.g. stable-only users)
give feedback on it.
In embedded, instead of a global allocator (unstable), we use static variables,
a lot. Without MaybeUninit
there's no way to have uninitialized memory in
static variables on stable. This prevents us from placing fixed capacity
collections in static variables and initializing static variables at runtime, at
zero cost. Stabilizing this subset of the API will unblock these use cases.
To give you a sense of how important this is for the embedded community, we did
[a survey] asking the community about their pain points and needs. Stabilizing
MaybeUninit
came out as the second most requested thing to stabilize (behind
const fn
with trait bounds) and, overall, ended in 7th place out of dozens of
rust-lang/* related requests. After further deliberation within the WG we bumped
its priority to, overall, third place due to its expected impact on the ecosystem.
(On a more personal note, I'm the author of an embedded concurrency framework
that would benefit from internally using MaybeUninit
(memory usage in
applications could be reduced by 10-50% with zero changes in user code). I
could provide a nightly-only Cargo feature for this, but after years of
nightly-only embedded and only just recently making it to stable I feel that
providing a nightly-only feature would be the wrong message to send to my users
so I'm eagerly waiting for this API to be stabilized.)
@japaric That would certainly avoid the naming discussions around into_inner
and friends. However, I am still concerned about the semantic discussion, e.g. about people doing let r = &mut *foo.as_mut_ptr();
and hence asserting that they have a valid reference, while we are not sure yet what the validity requirements for references are -- i.e., we are not sure yet whether having a ref to invalid data is insta-UB. For a concrete example:
let x: MaybeUninit<!> = MaybeUninit::uninitialized();
let r: &! = &*x.as_ptr() // is this UB?
This discussion just recently started in the UCG WG.
My hope was that would could stabilize MaybeUninit
in a single, coherent "package" with a proper story for uninitialized data, so that people only have to re-learn these things once, instead of releasing it piece-by-piece and maybe having to change some of the rules along the way. But maybe that's not a good idea, and it is more important that we get something out to improve the status quo?
But either way I think we shouldn't stabilize anything before we accepted https://github.com/rust-lang/rfcs/pull/2582, so we that we can at least tell people with certainty that the following is not UB:
let x: MaybeUninit<(!, u32)> = MaybeUninit::uninitialized();
let r1: *const ! = &(*x.as_ptr()).1; // immediately coerced to raw ptr, no UB
let r2 = &(*x.as_ptr()).1 as *const !; // immediately cast to raw ptr, no UB
(Note that as usual !
is a red herring here, and all of the examples in this post we the same, UB-wise, if we used bool
instead.)
My hope was that would could stabilize MaybeUninit in a single, coherent "package" with a proper story for uninitialized data, so that people only have to re-learn these things once, instead of releasing it piece-by-piece and maybe having to change some of the rules along the way.
I find this argument very compelling.
I think the most immediate need is to have some clear message for how to handle uninitialized memory without UB. If that's currently simply "use raw pointers and ptr::read_unaligned
and ptr::write_unaligned
", then that's fine, but we definitely need some well defined way to get raw pointers to uninitialized stack values and to struct/tuple fields. rust-lang/rfcs#2582 (plus some documentation) seems to fulfill the immediate needs, whereas MaybeUninit
does not.
@scottjmaddox how's that RFC but without MaybeUninit
any good for uninitialized (stack) memory?
@RalfJung I suppose that depends on whether or not the following is UB:
let x: bool = mem::uninitialized();
ptr::write(&x as *mut bool, false);
assert_eq!(x, false);
My implicit assumption was that rust-lang/rfcs#2582 would make the above example valid and well defined. Is that not the case?
@scottjmaddox
let x: bool = mem::uninitialized();
This is UB. It has nothing to do with references.
My implicit assumption was that rust-lang/rfcs#2582 would make the above example valid and well defined.
I am thoroughly surprised by this. That RFC is about references only. Why do you assume it changes something about booleans?
@RalfJung
This is UB. It has nothing to do with references.
The documentation to mem::uninitialized() says:
Bypasses Rust's normal memory-initialization checks by pretending to produce a value of type T, while doing nothing at all.
The documentation says nothing about T*
.
@kpp What are you trying to say? There is no *
and no &
in that one line of code:
let x: bool = mem::uninitialized();
Why do you claim this line to be UB?
Because a bool
must always be true
or false
, and this one isn't. Also see https://github.com/rust-rfcs/unsafe-code-guidelines/blob/master/reference/src/glossary.md#validity-and-safety-invariant.
@kpp for that statement to have defined behavior mem::uninitialized
would need to materialize a _valid_ bool
.
On all currently-supported platforms bool
has only two _valid_ values, true
(bit-pattern: 0x1
) and false
(bit-pattern: 0x0
).
mem::uninitialized
, however, produces a bit-pattern where all bits have the value uninitialized
. This bit-pattern is neither 0x0
nor 0x1
, therefore, the resulting bool
is _invalid_, and the behavior is undefined.
To make the behavior defined we would need to change the definition of bool
to support three valid values: true
, false
, or uninitialized
. We can't, however, do that, because T-lang and T-compiler already RFC'ed that bool
is identical to C's _Bool
and we can't break that guarantee (this allows bool
to be used portably in C FFI).
Arguably, C does not have exactly the same definition of validity that Rust does, but C "trap representations" come very close. In a nutshell, there is not much that one can do in C with a _Bool
whose value does not represent true
or false
without invoking undefined behavior.
If you are correct then the following safe code must be UB too:
let x: bool;
x = true;
Which is obviously not.
If you are rcorrect then the following safe code must be UB too:
let x: bool;
does not initialize x
to an uninitialized
bit-pattern, it does not initialize x
at all. The x = true;
initializes x
(note: if you don't initialize x
before using it you get a compilation error).
This is different from C's behavior, where, depending on context, _Bool x;
initializes x
to an _indeterminate_ value.
Nope, there the compiler knows that x
is not initialized.
The issue with mem::uninitialized
is that it is initializing a variable, as far as the compiler's initialization tracking is concerned.
let x: bool;
does not on its own even reserve any space for x
to be stored at, it just reserves a name. let x = foo;
reserves some space and initializes it using foo
. let x: bool = mem::uninitialized();
reserves 1 byte of space for x
but leaves it uninitialized, and that is a problem.
This is such an easy way to shoot your leg designed API that it must be documented both in mem::uninitialized and intrinsics::uninit with a specialization for mem::uninitialized
Does it also mean that initializing any struct with a bool in it with mem::uninitialized is UB too?
@kpp
Does it also mean that initializing any struct with a bool in it with mem::uninitialized is UB too?
Yes - as you are probably discovering, mem::uninitialized
makes it trivial to shoot yourself in the foot, I'd go as far as saying that it is almost impossible to use correctly. That's why we are trying to deprecate it in favor of MaybeUninit
, which is a bit more verbose to use, but has the benefit that, because it is an union, you can initialize values "by parts" without actually materializing the value itself in an _invalid_ state. The value only has to be fully _valid_ by the time one calls into_inner()
.
You might be interested in reading the sections of the nomicon about checked and unchecked (un)initialization: https://doc.rust-lang.org/nomicon/checked-uninit.html They cover how the let x: bool;
initialization works in safe Rust. Please do fill issues if the explanation is not clear or there is anything that you don't understand. Also keep in mind that most of the explanations there are "non-normative" since they haven't gone through the RFC process yet. The Unsafe Code Guidelines WG will attempt to submit an RFC documenting and guaranteeing the current behavior sometime this year.
This is such an easy way to shoot your leg designed API that it must be documented both in mem::uninitialized and intrinsics::uninit
The issue is that currently there's no correct way to do this -- which is why we are hard at work to get MaybeUninit
stabilized so that these functions can have their documentation replaced by a fat "DO NOT USE".
Discussions like this and issues like this make me agree more and more with @japaric that we should get something out of the door ASAP. Basically we need this and this list of checkboxes to get ticked, I'd say. Then we have enough together to provide some basic patterns.
Would it be possible to stabilize const uninitialized, as_ptr and
as_mut_ptr ahead of the rest of the API? It seems very likely to me that these
will stabilized as they are right now.
+1 for this. It would be great to have this functionality available on stable. It would enable people to experiment with a variety of different higher-level (and potentially safe) APIs on top of this basic low-level one. And it seems like this aspect of the API is pretty uncontroversial.
Additionally I would like to suggest that get_ref
and get_mut
are never stabilized, and are removed entirely. Normally, working with references is safer than working with raw pointers (and thus people might be tempted to use these methods over as_ptr
and as_mut_ptr
even though they are marked unsafe), but in this case they are strictly more dangerous than the raw pointer methods as they can cause UB whereas the pointer methods can't.
If the rule is "never created a reference to uninitialized memory" then I think we should help people to comply with this rule by making it only possible to create such a reference by explicitly doing so, rather than having a helper method which does so internally.
Assuming https://github.com/rust-lang/rfcs/pull/2582, are we completely sure that (1) isn't even UB even though (2) is, and (1) also contains a derefencing of a pointer that points to uninitialized memory?
(1) unsafe { ptr::write_unaligned(&mut ((*uninit_foo.as_mut_ptr()).bar) as *mut bool, true); }
(2) let x: bool = mem::uninitialized();
And if so, what is the logic behind that (hopefully we can put some of the discussion on this issue into the documentation for MaybeUninit)? I'm guessing something like because in (1) the dereferenced value always remains an "rvalue" and never becomes and "lvalue", whereas in (2) the invalid bool becomes an "lvalue" and thus actually has to be materialized in memory (I'm not quite sure what the correct term for this is in Rust, but I've seen these terms used for C++).
And do others think it would be worth creating an RFC for field access syntax on raw pointers that evaluates directly into a raw pointer to the field so as to avoid this confusion in the first place?
If the rule is "never created a reference to uninitialized memory"
I don't think that should be the rule, but it might be. It's being discussed right now in the UCG.
are we completely sure that (1) isn't even UB even though (2) is, and (1) also contains a derefencing of a pointer that points to uninitialized memory?
Good question! But yes, we are -- out of shear necessity, basically. Think of &mut foo as *mut bool
as &raw mut foo
, an atomic expression of type *mut bool
. There is no reference here, just a raw ptr to uninitialized memory -- and that's definitely okay.
let x: bool = mem::uninitialized();
This is UB. It has nothing to do with references.
My implicit assumption was that rust-lang/rfcs#2582 would make the above example valid and well defined.
I am thoroughly surprised by this. That RFC is about references only. Why do you assume it changes something about booleans?
@RalfJung I suppose I thought it was not UB because the undefined value was unobservable because it was immediately overwritten with a valid bool value. But I guess that is not the case?
For more complicated examples, in which the value in x implements Drop, a raw pointer would be required to overwrite the value, and that's why I thought rfc 2582 was necessary to avoid UB.
I suppose I thought it was not UB because the undefined value was unobservable because it was immediately overwritten with a valid bool value. But I guess that is not the case?
The semantics proceeds statement by statement (looking at the MIR). Every single statement must make sense. let x: bool = mem::uninitialized();
materializes a bad boolean, and it doesn't matter what happens later -- you must not materialize a bad boolean.
I understand that the value of x is invalid, but does that necessitate undefined behavior? I can see how it could, in general, taken out of context. But in the context of that particular example, is the behavior not well defined? I suppose my basic hangup is that I don't fully understand the meaning of "undefined behavior".
We want the compiler to be able to rely on certain invariants. These are invariants only if they always hold. Once we start adding exceptions, it becomes a mess.
Maybe you are expecting something more of the form "inspecting a value requires the validity invariant to hold". Here, "inspecting" a bool
would be using it in an if
. That's a reasonable specification, but a less useful one: now the compiler has to prove that the value is actually "inspected" before it can assume the invariant.
does that necessitate undefined behavior?
We pick what is and is not undefined behavior. That's part of designing a language. Undefined behavior is hardly ever "necessary" per se -- but it is necessary to enable more optimizations. So the art here is to find a definition of undefined behavior (as contradicting as that may sound^^) that both enables the desired optimizations, and conforms with (unsafe) programmers' expectations.
I don't fully understand the meaning of "undefined behavior".
I wrote a blog post about that, but the short answer is that undefined behavior is a contract between you and the compiler -- and the contract says that it is your obligation to make sure that no undefined behavior occurs. It's a proof obligation. "dereferencing a NULL pointer is UB" is equivalent to saying "every time a pointer is dereferenced, the programmer is expected to prove that this pointer cannot possibly be NULL". This helps the compiler understand the code, because every time a pointer is dereferenced, the compiler can now deduce "aha! here the programmer proved that the pointer is not NULL, and so I can use that information for optimizations and code generation. thank you, programmer!"
What exactly the contract says is up to the programming language. There are constraints, of course (for example, we are constrained by LLVM). In our case, the UCG believes (in accordance with what we heard from the language and compiler teams) that we want the contract to contain the following clause: "Every time an rvalue is created, the programmer has to prove that this rvalue will always satisfy the validity invariant." There's no law of physics or computers that forces us to have this clause in the contract, but it is deemed a reasonable compromise between many different choices.
In particular, we already emit information for LLVM that we could not rightfully emit with a weaker contract. We could decide to change what we tell LLVM, of course -- but if the choice is between "unsafe code must use MaybeUninit
whenever dealing with uninitialized memory" and "all code can be optimized less", the former seems like the better choice.
Taking your example:
let x: bool = mem::uninitialized();
This code is UB in rustc today. If you look at the (unoptimized) LLVM IR for mem::uninitialized::<bool>()
, this is what you get:
; core::mem::uninitialized
; Function Attrs: inlinehint nonlazybind uwtable
define zeroext i1 @_ZN4core3mem13uninitialized17h6c99c480737239c2E() unnamed_addr #0 !dbg !5 {
start:
%tmp_ret = alloca i8, align 1
%0 = load i8, i8* %tmp_ret, align 1, !dbg !14, !range !15
%1 = trunc i8 %0 to i1, !dbg !14
br label %bb1, !dbg !14
bb1: ; preds = %start
ret i1 %1, !dbg !16
}
; snip
!15 = !{i8 0, i8 2}
Essentially, this function allocates 1 byte on the stack and then loads that byte. However the load is marked with !range
, which tells LLVM that the byte must be between 0 <= x < 2, i.e. it can only be 0 or 1. LLVM will assume that this is true, and behavior is undefined if this constraint is violated.
In summary, the problem isn't so much uninitialized variables themselves, it's the fact that you are copying and moving values that violate their type constraints.
Thank you both for the exposition! It is much clearer now!
I suppose my basic hangup is that I don't fully understand the meaning of "undefined behavior".
This series of blog posts (which has rather interesting/scary examples in the second post) is quite helpful, I think: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
I feel this is really in need of good documentation. The change here is probably a good thing for several reasons I can list and probably others I can't. But correct use of uninitialized memory (and other uses of unsafe) can be remarkably counter-intuitive. The Nomicon has a section on uninitialized
(which presumably would be updated to talk about this type), but it doesn't seem to express the full complexity of the issue.
(Not that I'm volunteering to write such documentation. I nominate... whoever knows more about this than me.)
An interesting idea from https://github.com/rust-lang/rust/issues/55422#issuecomment-433943803: We could turn methods like into_inner
into functions, so that you have to write MaybeUninit::into_inner(foo)
instead of foo.into_inner()
-- that documents much more clearly what is happening.
In https://github.com/rust-lang/rust/pull/58129 I am adding some docs, return an &mut T
from set
and renaming into_inner
to into_initialized
.
I think after this, and once https://github.com/rust-lang/rust/pull/56138 is resolved, we could proceed with stabilizing parts of the API (the constructors, as_ptr
, as_mut_ptr
, set
, into_initialized
).
Why isn't MaybeUninit::zeroed()
a const fn
? (MaybeUninit::uninitialized()
is a const fn
)
EDIT: can it actually be made a const fn
using nightly Rust ?
Why isn't
MaybeUninit::zeroed()
aconst fn
? (MaybeUninit::uninitialized()
is aconst fn
)
@gnzlbg I tried, but it requires one of the following:
init
intrinsic a const fn
. This is doable but Ralf will r- any attempts to do that; orstd::ptr::write_bytes
a const fn
. That requires taking an &mut
in a const
, which currently isn't permitted.The one thing that worries me most about moving to stabilization soon-ish is the total lack of feedback from people who actually use this type. It seems that everyone is waiting for this to be stabilized before they will start using it. That is a problem, because it means we will notice API issues too late.
@rust-lang/libs what is the usual conditions under which you will use a function instead of a method? I am wondering if some of the operations here should be functions so that people have to write, e.g., MaybeUninit::as_ptr(...)
. I am worried this will blow up the code so as to become unreadable -- but OTOH, some functions on ManuallyDrop
did exactly this.
@RalfJung My understanding is that methods are avoided on things that deref to generic parameters, to avoid hiding methods from the user's type -- hence ManuallyDrop::take
.
Since MaybeUninit<T>
will never be Deref<Target = T>
, I think methods are appropriate here.
Ask for feedback and ye shall receive. I used MaybeUninit
to implement new functionality in std
recently.
get_mut
incorrectly, thinking references and raw pointers would be equivalent (fixed in 928efca1). I was already in an unsafe block so I didn't really notice the difference at first.const fn new()
is the same as an array initializer in a C header file. I'm using zeroed
followed by set
to try to make sure “don't care” bits are 0. I don't know if this is correct usage, but it seems to work fine.out.get_mut() as *mut _
!= out.as_mut_ptr()
. Looks really C++ish. I hope it would be fixed somehow.What's the point of get_mut()
?
One thing I was wondering recently was whether MaybeUninit<T>
was guaranteed to have the same layout as T
, and whether something like that could be used to partially initialize values on the heap then turn it into a fully initialized value, e.g. something like (full playground)
struct Foo {
x: i32,
}
let mut partial: Box<MaybeUninit<Foo>> = Box::new(MaybeUninit::uninitialized());
let complete: Box<Foo> = unsafe {
ptr::write(&mut (*partial.as_mut_ptr()).x, 5);
mem::transmute(partial)
};
according to Miri this example works (although, I now realize that I don't know if transmuting boxes of types with identical layout is itself sound).
@Nemo157 why do you need the same memory layout when you have into_inner
?
@Pzixel to avoid copying the value after initialization, imagine it contains a 100MB buffer that will cause a stack overflow if allocated on the stack. Although, writing a testcase it seems this requires an extra API fn uninit_boxed<T>() -> Box<MaybeUninit<T>>
to allow allocating an uninitialized box without touching the stack.
Using box
syntax to allow allocating the uninitialized heap space you can see that transmuting like this works, while attempting to use into_initialized
causes a stack overflow: playground
@Nemo157 Maybe it's better to enforce compiler to optimize copying away? I think it should do it anyway, but there could be an attribute to ensure compiles does it.
@Nemo157
One thing I was wondering recently was whether
MaybeUninit<T>
was guaranteed to have the same layout asT
, and whether something like that could be used to partially initialize values on the heap then turn it into a fully initialized value,
I believe that this is guaranteed, and that you code is valid, with a couple of caveats:
ptr::write_unaligned
.This is also a use case that I'm interested in, as I believe it could be combined with a proc-macro to provide a safe in-place builder abstraction.
@Pzixel If it has the same memory layout then you can avoid copying the whole data structure once you've constructed it. Of course the compiler may elide the copy, and it may not matter for small structures. But it's definitely a nice-to-have.
@nicoburns yes, I see it now. I'm just talking that there may be some attribute, e.g. #[same_layout]
or #[elide_copying]
, or they both, or something else, to make sure it works the same way as transmute
. Or maybe change into_constructed
implementation to avoid extra copying. I'd expect this to be a default behaviour, not just for smart guys who read the docs about layout. I mean I have my code that calls into_constructed
and I get an extra copy, but @Nemo157 just calls transmute
and he's fine. There is no reason why into_constructed
cannot do the same thing.
I'd be very confused if
out.get_mut() as *mut _
!=out.as_mut_ptr()
. Looks really C++ish. I hope it would be fixed somehow.
What's the point of
get_mut()
?
I made a similar point above that get_mut()
and get_ref()
are potentially confusing / make it easy to accidentally invoke undefined behaviour (because they give the illusion of being safer alternatives to as_ptr()
and as_mut_ptr()
, but are in fact less safe than those methods).
I believe they are not in the subset of the API that @RalfJung proposed stabilising (see: https://www.ralfj.de/blog/2019/02/12/all-hands-recap.html)
@RalfJung Regarding your proposal for a ptr::freeze()
method (https://www.ralfj.de/blog/2019/02/12/all-hands-recap.html):
Would it make sense to have a similar method for constructing MaybeUninit
?(MaybeUninit::frozen()
, MaybeUninit::abitrary()
or similar). Intuitively, it seems that such memory would be as performant as truly uninitialized memory for many use cases, without having the cost of writing to the memory like zeroed
. Perhaps it could even be recommended over the uninitialized
constructor unless people are really sure that they need uninitialized memory?
On that note, what are the use cases where you really need "uninitialized" memory rather than "frozen" memory?
@Pzixel
1. I'd be very confused if `out.get_mut() as *mut _` != `out.as_mut_ptr()`. Looks really C++ish. I hope it would be fixed somehow.
Noted. The reason why some people propose this is that it can be useful to declare &mut !
uninhabited (as in, having such a value is UB). However, with MaybeUninit::<!>::uninitiailized().get_mut()
, we have created such a value. That's why as_mut_ptr
is less dangerous -- it avoids creating a reference.
@nicoburns (Note that freeze
is not my idea, I just was part of the discussion and I like the proposal a lot.)
I believe they are _not_ in the subset of the API that @RalfJung proposed stabilising
Correct. And indeed maybe we should just not have them at all.
Would it make sense to have a similar method for constructing
MaybeUninit
?(MaybeUninit::frozen()
,MaybeUninit::abitrary()
or similar).
Yes! I was going to propose adding this once a basic MaybeUninit
is stable and ptr::freeze
has landed.
On that note, what are the use cases where you really need "uninitialized" memory rather than "frozen" memory?
This needs more studying and benchmarking, the expectation is that it might cost performance because LLVM will not do optimizations that it could otherwise do.
(I will come back to the other comments as well, once I have time.)
@Pzixel being able to construct objects directly into pre-allocated memory is non-trivial, Rust had two RFCs accepted to implement such a thing (over 4 years ago!), but they have since been un-accepted and most of the implementation removed (except the box
syntax I used above). If you want more details the i.rl.o thread about the removal would be the best place to start.
As @nicoburns mentions MaybeUninit
could potentially be used as a building block for a less-ergonomic library-based solution to the same issue, very useful as a way to start experimenting with the concept and seeing what sort of APIs it allows building. That just depends on whether MaybeUninit
can provide the guarantees required for building such a solution.
@Nemo157 I only suggest to use it in one place, nothing to deal with non-trivial generic case.
@jethrogb Thanks a lot! So it seems the API works fine for you right now?
2. In sys/sgx/rwlock.rs, I'm using it to make sure the bit pattern of a
const fn new()
is the same as an array initializer in a C header file.
Woah, that's crazy.^^ But I guess it should work, it is a const fn
with no arguments after all so it should always return the same thing...
One thing I was wondering recently was whether
MaybeUninit<T>
was guaranteed to have the same layout asT
, and whether something like that could be used to partially initialize values on the heap then turn it into a fully initialized value
On the list of things we should add eventually is something like
fn into_initialized_box(Box<MaybeUninit<T>>) -> Box<T>
that transmutes the Box
.
But yes, I think we should allow such transmutes. Is there precedent for saying in the docs "you may transmute this the following way"? I think usually we prefer to add helper methods instead of people doing their own transmutes.
- Depending on the type you are using (and notably in generic code), you may need
ptr::write_unaligned
.
In generic code you cannot access fields. I think if you can access fields you usually know if the struct is packed, and if it is not then ptr::write
is good enough. (Don't use assignment though because that might drop! I keep forgetting that...)
Although, writing a testcase it seems this requires an extra API
fn uninit_boxed<T>() -> Box<MaybeUninit<T>>
to allow allocating an uninitialized box without touching the stack.
That's a bug, but since that bug might be hard to fix it might also be a good idea to offer a separate constructor for this. Not sure how to implement it, though. And then we probably also want something like zeroed_box
that avoids zeroeing a stack slot and then memcpying, and so on... I don't like all this duplication. :/
So I'd propose that after/in parallel to the initial stabilization, some folks that have usecases for uninitialized memor on the heap (basically, mixing Box
and MaybeUninit
) get together and design the minimal possible API extension for that. @eddyb also expressed interest in this. That's not really related to just deprecating mem::uninitialized
any more, so I think that should get its own place for discussion, outside of this (way-too-big-already) tracking issues.
My own bit of feedback: I'm generally happy with MaybeUninit<T>
. I don't have any big complaints. It is less of a footgun than mem::uninitialized
, which is nice. The const
new
and uninitialized
methods are nice. I wish more of the methods were const, but as I understand it many of them require more progress to be made on const fn
in general before they can be made const
.
I'd like a stronger guarantee than "same layout" for T
and MaybeUninit<T>
. I'd like them be ABI-compatible (effectively, #[repr(transparent)]
, though I know that attribute can't be applied to unions) and FFI-safe (i.e., if T
is FFI-safe, then MaybeUninit<T>
should be FFI-safe too). (Tangentially, I wish we could use #[repr(transparent)]
on unions that have only one positively sized field (like we can for structs))
I actually rely on the ABI of MaybeUninit<T>
in my project to assist with an optimization (but not in an unsafe way, so don't panic). I'm happy go into details if anyone is interested, but I'll keep this comment brief and omit the details for now.
@mjbshaw Thanks!
I wish we could use
#[repr(transparent)]
on unions that have only one positively sized field (like we can for structs).
Once that attribute exists, adding it to MaybeUninit
would be a no-brainer. And in fact the logic for this has already been implemented in rustc (MaybeUninit<T>
de-facto is ABI-compatible with T
, but we do not guarantee that.)
All it takes is for someone to write an RFC and see it through, and add some checks that make sure repr(transparent)
unions only have one non-ZST field. Would you like to give that a try? :D
All it takes is for someone to write an RFC and see it through, and add some checks that make sure
repr(transparent)
unions only have one non-ZST field. Would you like to give that a try? :D
@RalfJung Ask and ye shall receive!
Cc https://github.com/rust-lang/rust/pull/58468
This leaves only the API I think we can reasonably stabilize in maybe_uninit
, and moves the rest into separate feature gates.
Okay, the preparatory PRs all landed, and into_inner
is gone as well.
However, I'd really like for https://github.com/rust-lang/rfcs/pull/2582 to be accepted before stabilizing, otherwise we do not even have a way to initialize a struct field-by-field -- and that seems like a prime use-case for MaybeUninit
. We're very close to having all the necessary boxes for FCP to start.
I just converted my code to use MaybeUninit
. There are quite a few places where I could have used a take
method which works on &mut self
rather than self
. I am currently using x.as_ptr().read()
but I feel that x.take()
or x.take_initialized()
would be much clearer.
@Amanieu This feels very similar to the existing into_inner
method. Maybe we can try to avoid duplication here?
The method take
of Option
have other semantic. x.as_ptr().read()
not change inner value of x, but Option::take
try replace value. It might be misleading for me.
@qwerty19106 x.as_ptr().read()
on a MaybeUninit
_semantically_ takes the value out and leaves the wrapper uninitialised again, it just happens that the uninitialised value left behind has the same bit-pattern as the value that was taken out.
I am currently using
x.as_ptr().read()
but I feel thatx.take()
orx.take_initialized()
would be much clearer.
I find that curious, could you explain why?
In my view, a take
-like method is somewhat misleading because unlike both take
and into_initialized
, it does not protect against take-ing twice. Actually, for Copy
types (and in fact for Copy
values such as None as Option<Box<T>>
), take-ing twice is completely fine! So, the analogy with take
doesn't really hold, from my perspective.
We could call it read_initialized()
, but at that point I am seriously wondering if that is indeed any clearer than as_ptr().read()
.
x.as_ptr().read()
on aMaybeUninit
_semantically_ takes the vale out and leaves the wrapper uninitialised again, it just happens that the uninitialised value left behind has the same bit-pattern as the value that was taken out.
MaybeUninit
doesn't really have a useful semantic invariant, so I am not sure I fully agree with that statement. TBH I am not convinced that it is useful to consider the operations on MaybeUninit
in any other way than just their raw operational effect.
@RalfJung hmm, maybe "semantically" is the wrong word here. In terms of how a user should use the type you should assume the value is uninitialized again after you read out of it (unless you concretely know that the type is Copy
).
If you only ever look at the raw operational effect you get weird interactions like this where you can violate safety invariants of other unsafe APIs without technically reading uninitialized memory. (I was sort of hoping that Miri would still track 0 length reads of uninitialized memory, but it doesn't appear so).
@RalfJung In all of my cases, this involves a static mut
into which a value is placed, and later taken out. Since I cannot consume a static, I cannot use into_uninitialized
.
@Amanieu what I was asking is, why do you think x.take_initialized()
is clearer than x.as_ptr().read()
?
@Nemo157
I was sort of hoping that Miri would still track 0 length reads of uninitialized memory, but it doesn't appear so
A 0-length read of uninitialized memory is never UB, so why would Miri care about them?
If you only ever look at the raw operational effect you get weird interactions like this where you can violate safety invariants of other unsafe APIs without technically reading uninitialized memory.
Sure, you can violate safety invariants without ever reading uninitialized memory. You can also just use MaybeUninit::zeroed().into_initialized()
for that. I don't see the problem.
The "weird interaction" here is that you created two values of a type that you had no right to create. This is all about the safety invariant of Spartacus
, and has nothing to do with validity invariants.
This is why I think read_initialized()
conveys better what happens: We read the data out, and we claim it is properly initialized (which includes making sure we are actually allowed to create this value at that type). This has no effect on the bit pattern still stored in MaybeUninit
.
@RalfJung I am essentially treating MaybeUninit
as an Option
, but without the tag. In fact, I was previously using the untagged-option crate for exactly this purpose, and it has a take
method to extract the value from the union.
@Amanieu @shepmaster I added a read_initialized
in https://github.com/rust-lang/rust/pull/58660. I still think that is a better name than take_initialized
. Does this meet your needs?
That PR also adds examples to some of the other methods, feedback welcome!
I am happy with read_initialized
.
While I was at it, I also made MaybeUninit<T>: Copy
if T: Copy
. Seems no good reason not to do that.
Hm, maybe get_initialized
would be a better name? It kinds of complements set
, after all.
Or maybe set
should be renamed to write
? That would also achieve consistency.
I've been converting my code to use MaybeUninit
and have found that working with uninitialized slices is very unergonomic. I think that this could be improved if we had functions for the following:
&mut [T]
to &mut [MaybeUninit<T>]
. This effectively allows &out
parameters to be emulated using &mut [MaybeUninit<T>]
, which is useful e.g. for read
.&mut [MaybeUninit<T>]
to &mut [T]
(and the same for &[T]
), to be used once we have called .set
on every element of the slice.The APIs that I have look something like this:
// The returned slice is truncated to the number of elements actually read.
fn read<T>(out: &mut [MaybeUninit<T>]) -> Result<&mut [T]>;
I agree that working with slices is unergonomic currently, and that is why I added first_ptr
and first_ptr_mut
. But that's probably far from the best API.
However, I'd prefer if we could concentrate on getting the "core API" shipped first, and then look at the interaction with slices (and with Box
).
I like the idea of renaming set
to write
, providing consistency with ptr::write
.
In the same vein, is read_initialized
really better than just read
? If the concern is over accidental usage that becomes hidden, perhaps make it a function instead of a method, i.e. MaybeUninit::read(&mut v)
? The same could be done for write
, i.e. MaybeUninit::write(&mut v)
for consistency. The tradeoff in both cases is between usability and explicitness, and if explicitness is considered better in one case, I don't see why it would be different in the other.
Regardless, until these API's are hammered out, I strongly support stabilizing with a bare minimum API, i.e. new
, uninitialized
, zeroed
, as_ptr
, as_mut_ptr
, and maybe get_ref
and get_mut
.
and maybe
get_ref
andget_mut
.
These should only be stabilized once we resolved https://github.com/rust-rfcs/unsafe-code-guidelines/issues/77, and that seems like it could take a while...
stabilizing with a bare minimum API, i.e.
new
,uninitialized
,zeroed
,as_ptr
,as_mut_ptr
My plan was for into_initialized
, set
/write
, and read_initialized
to be part of that minimal set. But maybe it shouldn't be? set
/write
and read_initialized
can easily be implemented with the rest, so I am now also leaning towards not stabilizing them in the first batch. But having something like into_initialized
from the start is desirable, IMO.
perhaps make it a function instead of a method, i.e.
MaybeUninit::read(&mut v)
? The same could be done forwrite
, i.e.MaybeUninit::write(&mut v)
for consistency.
From what was discussed here before, we only use the explicit function approach to avoid trouble with Deref
instances. I don't think we should introduce precedence for another reason to use a function instead of a method.
is
read_initialized
really better than justread
?
Good question! I don't know. This was for symmetry with into_initialized
. But into_inner
is a common method where one might lose the overview what type it is called on, read
is much less common. And maybe it should just be initialized
instead of into_initialized
? So many options...
From what was discussed here before, we only use the explicit function approach to avoid trouble with
Deref
instances. I don't think we should introduce precedence for another reason to use a function instead of a method.
Except ptr::read
and ptr::write
are functions, not methods. So the precedence is already set in favor of MaybeUninit::read
and MaybeUninit::write
.
Edit: Okay, apparently there are read
and write
methods on pointers, too... Never noticed those before... But they consume the pointer, which doesn't really make sense for MaybeUninit
.
So many options...
Agreed. Until there's a lot more bike-shedding on the other methods, I think only new
, uninitialized
, zeroed
, as_ptr
, as_mut_ptr
are really ready for stabilization.
Except
ptr::read
andptr::write
are functions, not methods. So the precedence is already set
They are not part of a data structure, of course they are free-standing functions. And as you remark, they nowadays exist as methods, too.
But they consume the pointer
Raw pointers are Copy
, so nothing really gets consumed.
Raw pointers are
Copy
, so nothing really gets consumed.
Good point...
Well, v.as_ptr().read()
is already pretty concise and clear. The as_ptr
followed by read
should make it stand out as something to think about carefully, much more so than into_initialized
does. Personally, I'm in favor of only exposing as_ptr
and as_mut_ptr
, at least for now. And, new
, uninitialized
, and zeroed
, of course.
@Amanieu What about something more like what Cell
has, where there are safe conversions for &mut MaybeUninit<[T]>
to and from &mut [MaybeUninit<T>]
?
That would allow the following, which seems pretty natural to me:
fn read<T>(out: &mut MaybeUninit<[T]>) -> Result<&mut [T]> {
let split = out.as_mut_slice_of_uninit();
// ... operate on split ...
return Some(unsafe { split[0..n].as_uninit_mut_slice().get_mut() })
}
It also feels like it more accurately represents the semantics to the caller. The function taking an &mut [MaybeUninit<T>]
would feel, to me, like it might have some distinguishing logic for which ones are okay and which ones aren't. Taking &mut MaybeUninit<[T]>
, on the other hand expresses that it's not going to be distinguishing between the cells when it comes to what data is already in them.
(The names of the methods are, of course, subject to bikeshedding - I just mimicked what Cell
does.)
@eternaleye MaybeUninit<[T]>
is not a valid type because unions can't be DSTs.
Mm, right
Until there's a lot more bike-shedding on the other methods, I think only
new
,uninitialized
,zeroed
,as_ptr
,as_mut_ptr
are really ready for stabilization.
Well, I think we should accept this RFC before stabilizing anything -- otherwise we do not even have a sanctioned way to initialize a struct field-by-field, which seems like the bare minimum.
So while we wait for the experiments, we can bikeshed a bit about the names for what is currently called set
, read_initialized
and into_initialized
. The following renames have been suggested:
set
-> write
. The best metaphor for .as_ptr().read()
seems to be "read", not "get", but then the complement (.as_ptr_mut().write()
) should be "write", not "set".read_initialized
-> read
. Nicely matches write
, but is unsafe. Is that (plus documentation) enough of a warning that you must manually make sure that data is already initialized? There was a lot of agreement that an unsafe into_inner
is not enough, which is why I renamed it to into_initialized
.into_initialized
-> initialized
. If we have both read_initialized
and into_initialized
, that has a nice consistency to it IMO -- but if it's read
, then into_initialized
sticks out a bit. The method name is quite long. Still, most consuming operations are called into_*
, from what I know.Any objections for (1)? And I am mostly leaning against (3). For (2) I am undecided: read
is easier to type, but read_initialized
IMO works better when reading such code -- and code is read and reviewed more often than written. It seems nice to call out the place where we actually assume things to be initialized.
Thoughts, opinions?
Well, I think we should accept this RFC before stabilizing anything -- otherwise we do not even have a sanctioned way to initialize a struct field-by-field, which seems like the bare minimum.
Is this where I put a plug in for offset_of!
? :)
Note that read_initialized
is a strict superset of into_initialized
(takes &self
instead of self
). Does it make much sense to support both?
Is this where I put a plug in for
offset_of!
? :)
If you can get that stabilized before my RFC gets accepted, sure. ;)
Does it make much sense to support both?
IMO yes. into_initialized
is safer as it prevents using the same value twice, and hence it should be preferred over read_initialized
whenever possible.
So @nikomatsakis kind of made this point before but did not made it a hard blocker.
I just ported a lot of code to use MaybeUninit<T>
and into_initialized
and I find it unnecessarily verbose. The code is already a lot more verbose than it was before where it was "incorrectly" usingmem::uninitialized
.
I think MaybeUninit<T>
should be just called Uninit<T>
, because for all practical purposes, if you get an unknown MaybeUninit<T>
you have to assume that it is uninitialized, so Uninit<T>
would summarize that correctly. Also, into_uninitialized
should only be into_init()
or similar for consistency reasons.
We could also call the type Uninitialized<T>
and the method into_initialized
, but using an abbreviation for the type and the long form for the method or vice-versa is a painful inconsistency. Ideally I should just need to remember that "Rust APIs use abbreviations / long forms" and that's it.
Because abbreviations can be ambiguous to different people, I prefer to just use long forms everywhere and call it a day. But using a mixture is IMO the worst of both worlds. Rust tends to use abbreviations more often than longer forms, so I'd have nothing against Uninit<T>
as an abbreviation and .into_init()
as another abbreviation for the method.
I don't like into_initialized()
, because it sounds like there's a transformation going on to initialize the value. I much prefer take_initialized()
. I realize the type signature departs from other take
methods, but I think it is much clearer, semantically, and I believe semantic clarity should supersede borrow/move consistency. Other alternatives that don't already have a precedence of being mutable borrows could be move_initialized
or consume_initialized
.
As for set()
vs write()
, I strongly favor write()
in order to invoke the similarity to as_ptr().write()
, for which it would be an alias.
And finally, if there's going to be a take_initialized()
or similar, then I favor read_initialized()
over read()
due to the explicitness of the former.
Edit: but to clarify, I think sticking with as_ptr().write()
and as_ptr().read()
is even more clear and more likely to trigger the DANGER DANGER mental circuits.
@gnzlbg we had an FCP for the name of the type, I am not sure if we should re-open that discussion.
However, I do like the proposal of using "init" consistently, as in MaybeUninit::uninit()
and x.into_init()
.
I don't like
into_initialized()
, because it sounds like there's a transformation going on to initialize the value.
into
methods frequently don't actually do any transformation other than viewing the same (owned) data at a particular type -- see for example the various into_vec
methods.
I'm fine with a take_initialized(&mut self)
(in addition to an into_init), but I think that it should revert the internal state back to undef
.
revert the internal state back
https://github.com/rust-lang/rust/issues/53491#issuecomment-437811282
this shouldn't change the content of
self
at all. Just ownership is transferred away so it is now effectively in the same state as when constructed uninitialized.
Many of these things have already been discussed in the 200+ hidden comments.
Many of these things have already been discussed in the 200+ hidden comments.
I've been following the discussion for a while, and I might be mistaken, but don't think this point has been brought up before. In particular, the comment you quote doesn't suggest to "revert the internal state back to undef
", but making it equivalent to ptr::read
(which is to leave the internal state unchanged). What I'm suggesting is the conceptual equivalent of mem::replace(self, MaybeUninit::uninitialized())
.
the conceptual equivalent of
mem::replace(self, MaybeUninit::uninitialized())
.
Because of the meaning of undef
, that's equivalent to read
: https://rust.godbolt.org/z/e0-Gyu
@scottmcm no it is not. With read
, the following is legal:
let mut x = MaybeUninit::<u32>::uninitialized();
x.set(13);
let x1 = unsafe { x.read_initialized() };
// `u32` is `Copy`, so we may read multiple times.
let x2 = unsafe { x.read_initialized() };
assert_eq!(x1, x2);
With the proposed take
, this would be illegal since x2
would be undef
.
Just because two functions generate the same assembly does not mean they are equivalent.
However, I see no benefit from overwriting the contents with undef
. It just introduces more ways for people to shoot themselves into the foot. @jethrogb you haven't given any motivation, could you explain why you think this is a good idea?
I'm fine with a
take_initialized(&mut self)
(in addition to an into_init), but I think that it should revert the internal state back toundef
.
I was proposing take_initialized(self)
instead of into_initialized(self)
, because I believe the former name more accurately describes the operation. Again, I understand that the take
typically takes a &mut self
and into
typically takes a self
, but I believe semantically accurate naming is more important than consistently typed naming. Perhaps a different name should be used, though, such as move_initialized
or transmute_initialized
.
And, again, as for v.write()
and v.read_initialized()
, I don't see any positive value over v.as_ptr().write()
and v.as_ptr().read()
. The latter two seem less likely to be misused.
And, again, as for
v.write()
andv.read_initialized()
, I don't see any positive value overv.as_ptr().write()
andv.as_ptr().read()
. The latter two seem less likely to be misused.
v.write()
(or v.set()
or whatever it is we're calling it these days) is safe. v.as_ptr().write()
requires an unsafe
block, which is kind of annoying. Though I agree about v.read_init()
vs v.as_ptr().read()
. v.read_init()
seems superfluous.
I was proposing take_initialized(self) instead of into_initialized(self), because I believe the former name more accurately describes the operation. Again, I understand that the take typically takes a &mut self and into typically takes a self, but I believe semantically accurate naming is more important than consistently typed naming.
I strongly feel that into_init(ialized)
also semantically is more accurate here -- it consumes the MaybeUninit
, after all.
@mjbshaw Ah, yes, so it is. I did not notice that... Okay, well, in that case I revoke all my previous comments about set
/write
. Maybe set
makes more sense; Cell
and Pin
already define set
methods. The primary difference would be that MaybeUninit::set
would not drop any values previously stored; perhaps that's still closer to write
... I don't know. Either way, the documentation is pretty clear.
@RalfJung Okay, forget take...
then. What about a new name, such as move...
, consume...
, or transmute...
or something? I think into_init(ialized)
is too confusing; too me, it implies the value is being initialized, when really we're implicitly asserting that it already was initialized.
when really we're implicitly asserting that it already was initialized.
I think it is worth it to call out again that the only thing that into_init
asserts is that the value satisfies the _validity invariant_ of T
, which is not to be confused with T
being "initialized" in any general sense of the word.
For example:
pub mod foo {
pub struct AlwaysTrue(bool);
impl AlwaysTrue {
pub fn new() -> Self { Self(true) }
/// It is impossible to initialize `AlwaysTrue` to false
/// and unsafe code can rely on `is_true` working properly:
pub fn is_true(x: bool) -> bool { x == self.0 }
}
}
pub unsafe fn improperly_initialized() -> foo::AlwaysTrue {
let mut v: MaybeUninit<foo::AlwaysTrue> = MaybeUninit::uninitialized();
// let v = v.into_init(); // UB: v is invalid
*(v.as_mut_ptr() as *mut u8) = 3; // OK
// let v = v.inti_init(); // UB v is invalid
*(v.as_mut_ptr() as *mut bool) = false; // OK
let v = v.into_init(); // OK: v is valid, even though AlwaysTrue is false
v
}
Here the return value of improperly_initialized
is "initialized" in the sense that it satisfies the _validity invariant_ of T
, but not in the sense that it satisfies the _safety invariant_ of T
, and the distinction is subtle but important, because in this case this distinction is what requires improperly_initialized
to be declared as an unsafe fn
.
When most users talk about something being "initialized", they typically do not have the "valid but MaybeUnsafe" semantics of MaybeUninit::into_init
.
If we wanted to be excruciatingly verbose about these, we could have Invalid<T>
and Unsafe<T>
, have Invalid<T>::into_valid() -> Unsafe<T>
, and require users to write uninit.into_valid().into_safe()
. Then above improperly_initialized
would return Unsafe<T>
, and only after the user properly sets the value of AlwaysTrue
to true
can they actually obtain the safe T:
// note: this is now a safe fn
fn improperly_uninitialized() -> Unsafe<foo::AlwaysTrue>;
fn initialized() -> foo::AlwaysTrue {
let mut v: Unsafe<foo::AlwaysTrue> = improperly_uninitialized();
unsafe { v.as_mut_ptr() as *mut bool } = true;
unsafe { v.into_safe() }
}
Note that this allows improperly_uninitialized
to become a safe fn
, because now the invariant that the AlwaysTrue
is not safe is not encoded in "comments" around the function, but in the types.
I don't know whether the painfully excruciating approach is worth pursuing. MaybeUninit
goal is to compromise, to allow users to handle uninitialized and invalid memory, but without putting these distinctions in the users face. I personally think that we can't expect users to know these distinctions unless we put them explicitly on their faces, and one has to know this distinction to be able to use MaybeUninit
correctly. Otherwise people might write fn improperly_uninitialized() -> AlwaysTrue
as a safe fn
, and just return an unsafe AlwaysTrue
because well, they "initialized" it.
One thing one could also do with Invalid<T>
and Unsafe<T>
is have two traits, ValidityCheckeable
and UnsafeCheckeable
, with two methods, ValidityCheckeable::is_valid(Invalid<Self>)
and UnsafeCheckeable::is_safe(Unsafe<Self>)
, and have the Invalid::into_valid
and Unsafe::into_safe
methods assert_validity!
and assert_safety!
on them.
Instead of writing the safety invariant in a comment, you could just write the code for the check.
I think it is worth it to call out again that the only thing that into_init asserts is that the value satisfies the validity invariant of T, which is not to be confused with T being "initialized" in any general sense of the word.
This is correct. OTOH, I feel "initialized" is a reasonable proxy for this in a first explanation.
Otherwise people might write fn improperly_uninitialized() -> AlwaysTrue as a safe fn, and just return an unsafe AlwaysTrue because well, they "initialized" it.
I feel we can make a reasonable point that this is not properly "initialized". I agree that we need a proper documentation of how these two invariants interact somewhere (and I am not sure what the best place would be), but I also think that most people's intuition will say that improperly_uninitialized
is not an okay function to export. "Breaking other people's invariants" is a concept that, I think, arises naturally when you think about "all the safe functions I am exporting must be such that safe code cannot use them to wreck havoc".
One thing one could also do with Invalid
and Unsafe is have two traits, ValidityCheckeable and UnsafeCheckeable, with two methods, ValidityCheckeable::is_valid(Invalid ) and UnsafeCheckeable::is_safe(Unsafe ), and have the Invalid::into_valid and Unsafe::into_safe methods assert_validity! and assert_safety! on them.
In the vast majority of cases, the safety invariant will not be checkable. Even the validity invariant is likely not checkable for references. (Well this depends a bit on how we factor things.)
@scottjmaddox
What about a new name, such as move..., consume..., or transmute... or something? I think into_init(ialized) is too confusing; too me, it implies the value is being initialized, when really we're implicitly asserting that it already was initialized.
How does move_init
convey an "assertion" more than into_init
does?
assert_init(italized)
has been previously suggested.
However, notice that read
or read_initialized
or as_ptr().read
also don't really say anything about asserting anything.
If we wanted to be excruciatingly verbose about these, we could have
Invalid<T>
andUnsafe<T>
, haveInvalid<T>::into_valid() -> Unsafe<T>
, and require users to writeuninit.into_valid().into_safe()
. Then aboveimproperly_initialized
would returnUnsafe<T>
, and only after the user properly sets the value ofAlwaysTrue
totrue
can they actually obtain the safe T:
@gnzlbg Hey that's pretty nifty. I like that this throws the distinction into user's faces in an unavoidable way. It's probably a good teaching moment wrt. "validity" and "safety" that will make folks think twice? uninit.into_valid().into_safe()
isn't that verbose anyways as compared to uninit.assume_initialized()
or whatnot. Of course to make this distinction we'll need to find agreement around the model in the first place. 😅 I feel we should investigate this model some more.
assert_init(italized)
has been previously suggested.
@RalfJung We also have assume_initialized
due to @eternaleye (I think). See https://github.com/rust-lang/rust/issues/53491#issuecomment-440730699 with a list of justifications which are pretty compelling.
TBH I feel like having two types is way too verbose.
@RalfJung Can we dig deeper into that? possibly with some comparisons of examples that you think show the high degree of verbosity?
Hmm... if we are considering more verbose APIs, then
uninit.into_inner(uninit.assert_initialized());
could work quite well semantically. The first method returns a token which records your assertion. The second method returns the inner type, but requires you to have asserted it's valid.
I'm not entirely convinced this is worth the extra effort though, as the abstraction might just make people more confused and thus likely to make mistakes.
We also have assume_initialized due to @eternaleye (I think). See #53491 (comment) with a list of justifications which are pretty compelling.
Fair. assume_initialized
sounds good to me.
Or maybe it's assume_init
? That should likely be consistent with the constructor, MaybeUninit::uninit()
vs MaybeUninit::uninitialized()
-- and that one is slated to be stabilized with the first batch, so we should make that call soon.
@nicoburns I don't see the benefit we'd gain from adding an indirection through a token here.
Can we dig deeper into that? possibly with some comparisons of examples that you think show the high degree of verbosity?
Well, it is clear that it is more verbose than "just" MaybeUninit
, right? There's a lot of additional mental burden (having to understand two types), there is the double-unwrapping, and it means I have to choose which type to use. So there's some additional cost here that I feel you need to justify.
I actually generally doubt the usefulness of Unsafe
. From the compiler perspective, it would be entirely a NOP; the compiler never assumes that your data satisfies the safety invariant. From a library implementation perspective, I highly doubt that code readability will improve if, in the Vec
implementation, we transmute things to Unsafe<Vec<T>>
whenever we temporarily violate the safety invariant. And from a teaching perspective, I doubt that anybody will be surprised when they create a Vec<T>
that is valid but unsafe, give that to some safe code, and then everything explodes.
Contrast this with MaybeUninit
which is needed from a compiler perspective, and where the fact that you even need to be careful about "bad" bool
in your own private code might come as a surprise to some.
Given its significant cost, I think Unsafe
needs way stronger motivation. I don't see how it'd actually help to prevent bugs or improve code readability.
I can see the arguments for renaming MaybeUninit
to MaybeInvalid
. However, "invalid" is extremely vague (invalid for what?), I have seen people confused by my distinction between "valid" and "safe" -- one might assume that a "valid Vec
" is valid for any kind of usage. "uninitialized" at least triggers basically the right associations for most people. Maybe we should rename "validity invariant" to "initialization invariant" or so?
Additionally, the mere presence of Unsafe<T>
can be misleading (by wrongly implying that all values not wrapped in it are safe) unless we adopt a strong widespread convention against having unsafe values outside of this wrapper. This would be a large project, requiring another RFC and broader community consensus. I expect it to be somewhat controversial (@RalfJung gave some good reasons against it above), and with weaker arguments on its side than MaybeUninit
since there's no UB involved -- it is essentially a style question. As such, I am skeptical whether such a convention will ever be universal in the Rust community even if an RFC is accepted and the standard library and docs are updated.
So IMO anyone who wants to see that convention happen has bigger fish to fry than bikeshedding the MaybeUninit
API, and I would suggest not delaying its stabilization further to wait on the resolution of that process. If we stabilize MaybeUninit<T> -> T
conversions, future Rust generations could still write MaybeUninit<Unsafe<T>>
to indicate data that is first uninitialized, and then possibly still not safe after being initialized.
@RalfJung
Or maybe it's
assume_init
? That should likely be consistent with the constructor,MaybeUninit::uninit()
vsMaybeUninit::uninitialized()
-- and _that_ one is slated to be stabilized with the first batch, so we should make that call soon.
If we can have 3-way consistency with the type, constructor, and the -> T
function that would be even better. As the type doesn't have the -ialized
suffix I think ::uninit()
and .assume_init()
is probably the way to go.
Well, it is clear that it is more verbose than "just"
MaybeUninit
, right?
Depends... I think foo.assume_init().assume_safe()
(or foo.init().safe()
if one is inclined to be brief) isn't all that longer. We can also offer the combination as foo.assume_init_safe()
if necessary. The combination still has the advantage that it spells out the two assumptions.
There's a lot of additional mental burden (having to understand two types), there is the double-unwrapping, and it means I have to choose which type to use. So there's some additional cost here that I feel you need to justify.
Hopefully the complexity comes from having to understand the underlying concepts behind validity and safety. Once that is done I don't think there's much additional mental complexity. I feel the underlying concepts are important to convey tho.
I actually generally doubt the usefulness of
Unsafe
. From the compiler perspective, it would be entirely a NOP; the compiler never assumes that your data satisfies the safety invariant.
Sure; I agree that from a compiler POV it's useless. Any usefulness from the distinction is as a sort of "session types" interface.
Given its significant cost, I think
Unsafe
needs way stronger motivation. I don't see how it'd actually help to prevent bugs or improve code readability.
The aspect that caught my eye was the teachability one. I do think mistakes are bound to happen when people think that .assume_init()
means that "OK; I have checked the validity invariant and now I have a good T
". The current scheme of MaybeUninit<T>
is sort of unhelpful in this way. I'm not however married to Unsafe<T>
and Invalid<T>
as names. I merely think that the separation into two types, whatever their names, can be helpful educationally. Perhaps there are other ways such as beefing up the documentation that can compensate for this within the current framework?
I _can_ see the arguments for renaming
MaybeUninit
toMaybeInvalid
. However, "invalid" is extremely vague (invalid for _what_?), I have seen people confused by my distinction between "valid" and "safe" -- one might assume that a "validVec
" is valid for any kind of usage. "uninitialized" at least triggers basically the right associations for most people. Maybe we should rename "validity invariant" to "initialization invariant" or so?
I definitely agree with "validity" and "safety" being confusing due to the way "valid" sounds. I've been partial to "machine invariant" as a replacement for "validity" and "type system invariant" for "safety".
@rkruppe
So IMO anyone who wants to see that convention happen has bigger fish to fry than bikeshedding the
MaybeUninit
API, and I would suggest not delaying its stabilization further to wait on the resolution of that process. If we stabilizeMaybeUninit<T> -> T
conversions, future Rust generations could still writeMaybeUninit<Unsafe<T>>
to indicate data that is first uninitialized, and then possibly still not safe after being initialized.
Good points, especially re. MaybeUninit<Unsafe<T>>
; You could probably also add some type alias to make the type name less verbose.
If we can have 3-way consistency with the type, constructor, and the -> T function that would be even better. As the type doesn't have the -ialized suffix I think ::uninit() and .assume_init() is probably the way to go.
Agreed. I am a bit sad about losing the into
prefix, but see no good way to retain it.
So what about read
/read_init
then? Is the similarity with ptr::read
enough to trigger "you better make sure this is actually initialized"? Does read_init
have issue similar to into_init
, where it sounds like it makes it initialized instead of having that as an assumption? Should maybe assume_init
be like read
is now?
Hopefully the complexity comes from having to understand the underlying concepts behind validity and safety. Once that is done I don't think there's much additional mental complexity. I feel the underlying concepts are important to convey tho.
Could you give a code example if something in Vec
properly using this to reflect when Vec
's invariants are violated? I think it'd be extremely verbose and entirely obscure what actually happens.
I think adding a type like this is the wrong way to convey the underlying concept.
I do think mistakes are bound to happen when people think that .assume_init() means that "OK; I have checked the validity invariant and now I have a good T".
I find it highly unlikely that anybody will be like "I have initialized this Vec<i32>
by writing it full of 0xFF
, now it is initialized, that means I can push to it". I would like to see at least an indication, better solid data, that this is actually a mistake people make.
In my experience, people have a pretty solid intuition that when they hand out data to unknown code, or call library operations on some data, then the library invariants need to be upheld.
Things have calmed down a bit here. So what about the following plan:
MaybeUninit::uninitialized
and rename it to MaybeUninit::uninit
.MaybeUninit::{new, uninit, zeroed, as_ptr, as_mut_ptr}
.This leaves open the question around set
/write
, into_init[ialized]
/assume_init[ialized]
and read[_init[italized]]
. Currently, I lean towards assume_init
, write
and read
, but I have changed my mind about this before. Unfortunately I do not have a good idea how to come to a decision here.
- Once that has landed
Does that mean that there will be a period where there is no avenue to create an uninitialized value without either (a) a deprecation warning or (b) using unstable features? That is not a sustainable practice.
When deprecating something that we don't plan on effectively removing, a stable replacement needs to be available whenever the deprecation warning is added. Otherwise, people will just add an annotation to ignore the warning and move on with their lives.
Does that mean that there will be a period where there is no avenue to create an uninitialized value without either (a) a deprecation warning or (b) using unstable features?
I am confused. I am proposing to deprecate an unstable method and introduce another unstable method instead.
Notice that I was talking about MaybeUninit::uninitialized
, not mem::uninitialized
.
Unfortunately I do not have a good idea how to come to a decision here.
@RalfJung Just do it (and r? me if you like) as you did before with the other renaming PRs and if anyone objects we can deal with that in FCP. :)
Just do it (and r? me if you like) as you did before with the other renaming PRs and if anyone objects we can deal with that in FCP. :)
Well, I am going to wait a bit because these don't have to be part of the initial stabilization.
deprecate an unstable method and introduce another unstable method instead
Ah, gotcha. Carry on then.
All right, doing the renames in https://github.com/rust-lang/rust/pull/59284:
uninitialized -> uninit
into_initialized -> assume_init
read_initialized -> read
set -> write
I like the newly proposed names. I'm a little worried about read
being misused, but that seems much less likely than into_initialized
being misused, primarily because of the association with ptr::read
. Overall, I think the new naming is totally acceptable for stabilization.
I prepare a PR to stabilize MaybeUninit::{new, uninit, zeroed, as_ptr, as_mut_ptr}.
Any chance this could make it into 1.35-beta (due in ~2 weeks)?
I am a bit conflicted about pushing this given how entirely up in the air https://github.com/rust-lang/rfcs/pull/2582 still is. :/ Without that RFC, gradual initialization of a struct is still not possible, but people will do it anyway.
OTOH, MaybeUninit
has waited long enough. And it's not like the code for gradual initialization that people write currently is better than what they'd write with MaybeUninit
.
That said, https://github.com/rust-lang/rust/pull/59284 hasn't even landed yet, so we'd have to rush to get this into 1.35. TBH I'd prefer to wait one more cycle so people get at least some time to play with the new method names and see how they feel.
Is there any chance that the constructing functions on MaybeInit
could be const
?
init
and new
are const
. zeroed
is not, we need some extensions to what const functions can do before it can be const
.
I wanted to provide some feedback on MaybeUninit
, the actual code changes can be seen here https://github.com/Thomasdezeeuw/mio-st/pull/71. Overall my (limited) experience with the API was positive.
The only small issue I encountered was that returning &mut T
in MaybeUninit::set
leads to having to use let _ = ...
(https://github.com/Thomasdezeeuw/mio-st/pull/71/files#diff-1b9651542d08c6eca04e6025b1c6fd53R116), which is a bit awkward but not a huge problem.
I also have to addition APIs I would like when working with unitialised arrays, often in combination with C.
&mut [MaybeUninit<T>]
to &mut [T]
would be nice, the user must ensure that all values in the slice a properly initialiseduninitialized_array
, would be really nice addition as well.I wanted to provide some feedback on MaybeUninit
Thanks a lot!
returning &mut T in MaybeUninit::set leads to having to use let _ = ...
Why that? You can just "throw away" return values, in fact the examples in the docs do not do let _ = ...
. (write
/set
does not have an example yet... but really it's pretty much the same as read
, maybe it should just link.)
foo.write(bar);
works just fine without let
.
working with unitialised arrays
Yea, that is definitely an area of future interest.
@RalfJung
returning &mut T in MaybeUninit::set leads to having to use let _ = ...
Why that? You can just "throw away" return values, in fact the examples in the docs do not do
let _ = ...
. (write
/set
does not have an example yet... but really it's pretty much the same asread
, maybe it should just link.)
I've enabled the warning for unused_results
, so without let _ = ...
it would produce a warning. I forgot that isn't the default.
Ah, I didn't know about that warning. Interesting.
That might be an argument to make write
not return a reference, and provide a separate method for that if there is more demand.
An public array initialiser function or macro, like
uninitialized_array
, would be really nice addition as well.
This would just be [MaybeUninit::uninit(); EVENTS_CAP]
. See https://github.com/rust-lang/rust/issues/49147.
I forgot that isn't the default.
That might be an argument to make
write
not return a reference, and provide a separate method for that if there is more demand.
Seems niche? If there's more demand in the future we can add a method that doesn't return a reference.
Seems niche?
Yeah there are tons of methods that set a value and then return a mutable reference to it.
@Centril Heh, I don't think I saw your comment here when I wrote this elsewhere: https://github.com/rust-lang/rust/issues/54542#issuecomment-478261027
Removing the deprecated old renamed functions in https://github.com/rust-lang/rust/pull/59912.
After that, I guess the next thing to do is to propose stabilization... :tada:
I am a bit conflicted about pushing this given how entirely up in the air rust-lang/rfcs#2582 still is. :/ Without that RFC, gradual initialization of a struct is still not possible, but people will do it anyway.
OTOH,MaybeUninit
has waited long enough. And it's not like the code for gradual initialization that people write currently is better than what they'd write withMaybeUninit
.
After that, I guess the next thing to do is to propose stabilization... 🎉
@RalfJung How is the state of documentation here? If we can mitigate "people will do it anyway" with some clear docs that would help me sleep better... :)
Reading over the docs of MaybeUninit
, in particular that of assume_init
, it isn't clear in the "Safety" section that if you are calling mu.assume_init()
and then returning that result in a safe fn
, then you must also uphold the safety invariants as well. Before stabilizing, it would be good to enhance those docs and give snippets with of library provided safety invariants that must also be upheld when using MaybeUninit
.
How is the state of documentation here? If we can mitigate "people will do it anyway" with some clear docs that would help me sleep better... :)
I'll probably add a section on gradual initialization of structs, saying that that's not currently supported. People reading this will be like "WTF, really?".
TBH I find this rather frustrating. :( I think it was very possible for us to come up with some advise for that by now and I am sad that we were not able to do that.
it isn't clear in the "Safety" section that if you are calling mu.assume_init() and then returning that result in a safe fn, then you must also uphold the safety invariants as well. Before stabilizing, it would be good to enhance those docs and give snippets with of library provided safety invariants that must also be upheld when using MaybeUninit.
You are basically suggesting to turn this into docs explaining the entire idea of data-type invariants and how it pans out in Rust. I think MaybeUninit
is the wrong place for that; that would make it sound like this concern is specific to MaybeUninit
when really it is not. The things you are asking should be explained in some more higher-level place like the Nomicon. I plan to focus the docs of MaybeUninit
on the core issue of this type. Feel free to expand them though if you think that is useful. :)
You are basically suggesting to turn this into docs explaining the entire idea of data-type invariants and how it pans out in Rust.
This is a bit strong... I'm only suggesting a "Oh, __by the way__, remember the safety invariant matters too" in some strategic place in MaybeUninit<T>
's documentation. I'm not suggesting we add a novel. ;) That novel can reside in the Nomicon but chances are that most people using MaybeUninit<T>
will mostly interface with the standard library documentation.
All right, I tried to incorporate all of that into the stabilization PR: https://github.com/rust-lang/rust/pull/60445
I just stumbled upon a usage of mem::uninitialized
in the documentation of the standards library, didn't really know where else to note that the last example of core::ptr::drop_in_place
needs to be updated (also kind of ironic that it exhibits the other form of UB that would only be sanctioned by https://github.com/rust-lang/rfcs/pull/2582, so personally I'd remove it).
@HeroicKatora thanks! I incorporated the fix for that into https://github.com/rust-lang/rust/pull/60445.
We can't really do anything about the ref-to-unaligned-field currently, not sure if removing the doc is a good idea though.
Maybe add trait PartialUninit
(or PartialInit
) which would initialize data partially based on metadata.
Example: MODULEENTRY32W.
The first field (dwSize
) must be initialized by the structure size (size_of::<MODULEENTRY32W>()
).
pub trait PartialUninit: Sized {
fn uninit() -> MaybeUninit<Self>;
}
impl<T> PartialUninit for T {
default fn uninit() -> MaybeUninit<Self> {
MaybeUninit::uninit()
}
}
impl PartialUninit for MODULEENTRY32W {
unsafe fn uninit() -> MaybeUninit<MODULEENTRY32W> {
let uninit = MaybeUninit { uninit: () };
uninit.get_mut().dwSize = size_of::<MODULEENTRY32W>();
uninit
}
}
How do you think?
@kgv I'm afraid I don't understand your suggestion. Perhaps some extra context explaining what problem you're trying to solve could help? And maybe a more complete example of your suggested solution?
@scottjmaddox fixed. Is it clearer?
@kgv what is the problem this is solving (as opposed to someone just writing a helper function for this)? I don't see why libstd has to do anything here.
Notice that assignment-based partial initialization of structs only works for types that don't need dropping. uninit.get_mut().foo = bar
will otherwise drop foo
, which is UB.
@RalfJung The problem I'm trying to solve - unified work with FFI structures, some fields of which do not depend on self
(only Self
or do not depend on anything (constant)), for example - one of field is size of Self
.
@kgv I have to agree with @RalfJung here that such a use case is better handled by a helper module or crate.
The stabilization PR landed, just in time for the beta. :) It's been around 8 months since I started looking into the situation around unions and uninitialized memory, and finally we have something that will (most likely) be shipped in 6 weeks. What a journey! Thanks a ton to everyone who helped with that. :D
Of course, we're far from done. There's https://github.com/rust-lang/rfcs/pull/2582 to be resolved. libstd still has quite a few uses of mem::uninitialized
(mostly in platform-specific code) that need porting. The stable API we have now is very minimal: we need to figure out what to do with read
and write
, and we should come up with APIs that help working with arrays and boxes of MaybeUninit
. And we have a lot of explaining to do to slowly move the entire ecosystem away from mem::uninitialized
.
But we'll get there, ans this first step was probably the most important one. :)
and we should come up with APIs that help working with arrays and boxes of
MaybeUninit
.
@RalfJung To that end; maybe now's the time to start working on https://github.com/rust-lang/rust/issues/49147? =P
Also, we should probably split and close this tracking issue in favor of smaller ones for the remaining bits.
To that end; maybe now's the time to start working on #49147? =P
Did you just volunteer? ;) (I'm afraid I won't have time for that.)
we should probably split and close this tracking issue in favor of smaller ones for the remaining bits.
I'll leave that to the process experts. But I tend to agree.
Did you just volunteer? ;) (I'm afraid I won't have time for that.)
What have I done... =D -- I already have a project I'm working on so it will probably take some time. Maybe someone else is interested? (if so, hop into the tracking issue)
I'll leave that to the process experts. But I tend to agree.
That would be me... ;) I'll try to split and close it soon-ish.
@RalfJung about your statement that let x: bool = mem::uninitialized();
is UB, the question is why invalid primitives are considered being so? As I understand you have to read a value to observe it's invalid to trigger UB. But if you don't read it then what?
I feel that even creating a value is a bad thing, but I'd like to know reasons why rust disallow it anyway? It seems that there is no harm if you don't observe an invalid state. Is it just for early errors sake or maybe something else?
Is there any real cases in compiler when it relies on these assumptions?
For example, we annotate functions like foo(x: bool)
telling LLVM that x
is a valid boolean. That makes it UB to pass a bool
that is not true
or false
even if the function originally did not look at x
. This is useful because sometimes the compiler wants to introduce uses of previously unused variables (in particular, this happens when moving statements out of loops without proving that the loop is taken at least once).
AFAIK we also set (or want to set) some of these annotations within a function, not just at function boundaries. And we might find more places in the future where such information can be useful. We might be able to cover that with a clever definition of "using a variable" (a term you used without defining it, and it is indeed not easy to define), but I think when it comes to UB in unsafe code, it is important to have simple rules where we can.
So, we want to make sure that even in unsafe code, the types in the code mean something. That's only possible by treating uninitialized memory in a proper way with a dedicated type, instead of the ad-hoc "yolo" approach of lying to the compiler about the content of a variable ("I claim this is a bool
, but really I'll not initialize it").
For example, we annotate functions like foo(x: bool) telling LLVM that x is a valid boolean. That makes it UB to pass a bool that is not true or false even if the function originally did not look at x. This is useful because sometimes the compiler wants to introduce uses of previously unused variables (in particular, this happens when moving statements out of loops without proving that the loop is taken at least once).
This may be considered as a usage. I'm asking about initing a value and never reading/passing it anywhere before it was overwritten with valid value.
I don't see any useful use cases for initing value in such a nuanced way, but just wondering.
In a nutshell, my question is if this code is UB (according to docs - it it), and if so, what exactly can break if I write so?
let _: bool = unsafe { mem::unitialized };
Another question about the subject itself: we know that we have box
syntax which allows you to allocate memory directly on the heap, and it works always unlike Box::new()
which sometimes stackalloc memory. So if I do box MaybeUninit::new()
and then fill it how could I convert Box<MaybeUninit<T>>
into Box<T>
? Should I write any transmutes or what? Maybe I merely missed this point in the documentation.
@Pzixel we've actually discussed interactions between Box
and MaybeUninit
already on this thread :smile:
@Centril having a sub-issue to discuss that might be good when you split this.
Yep, I remember that discussion, but I don't remember any specific api.
In a nutshell, I want to have something like
fn into_inner<A,T>(value: A<MaybeUninit<T>>) -> A<T> { unsafe { std::mem::transmute() } }
But I don't think there is such API, and it seems that it couldn't be implemented without compiler support at this point of language evolution.
I thought about it a little more and it seems that it should work on any level of nesting. So Vec<Result<Option<MaybeUninit<u8>>>>
should have into_inner
method which returns Vec<Result<Option<u8>>>
I was assuming get_ref and get_mut were going to be stabilized at the same time (all of the features point at this issue). Is there a reason not to? They're nice and are the only indication that performing the action they perform is allowed (which imo should obviously be true).
This may be considered as a usage.
So let x: bool = mem::uninitialized()
is not using the bool
(even though it gets assigned to x
!), but
fn id(x: bool) -> bool { x }
let x: bool = id(mem::uninitialized());
does use it? What about
fn uninit() -> bool { mem::uninitialized() }
let x: bool = uninit();
Is the return here a use?
This very quickly gets very subtle. So the answer I think we should be giving is that every assignment (really every copy, as in, every assignment after lowering to MIR) is a use, and that includes the assignment in let x: bool = mem::uninitialized()
.
I was assuming get_ref and get_mut were going to be stabilized at the same time (all of the features point at this issue). Is there a reason not to? They're nice and are the only indication that performing the action they perform is allowed (which imo should obviously be true).
This is blocked on resolving https://github.com/rust-lang/unsafe-code-guidelines/issues/77: is it safe to have an &mut bool
that points to uninitialized memory? I think the answer should be "yes", but people disagree.
This is blocked on resolving rust-lang/unsafe-code-guidelines#77
I don't think that blocking needs to happen. You can stabilize it and say "it is UB to use this if the memory is uninitialized" and then later soften the requirement if we determine it's fine. It's a nice method to have for post-initialization.
and then later soften the requirement
Which means that if I code against the future version's documentation but someone compiles my code using the (API compatible!) old version of the compiler, there's now UB?
@Gankro
I don't think that blocking needs to happen. You can stabilize it and say "it is UB to use this if the memory is uninitialized" and then later soften the requirement if we determine it's fine. It's a nice method to have for post-initialization.
That seems very footgun-y to me. Why not just write &mut *foo.as_mut_ptr()
? Once you have everything initialized, why would that not work? IOW, I am now wondering about you saying
the only indication that performing the action they perform is allowed
because why would it not be? If we exhaustively list everything you can do once you initialized the value, that'll be a long list.^^
@shepmaster
Which means that if I code against the future version's documentation but someone compiles my code using the (API compatible!) old version of the compiler, there's now UB?
That's true today if people do &mut *foo.as_mut_ptr()
. I see no way to avoid it.
Also, there only is UB if we actually have to change anything when doing that documentation. Otherwise, we are in the weird situation where there would have been UB if that same code was executed with the same compiler before we made a guarantee, but now that we make the guarantee there is no UB any more. UB is a property not just of the compiler but also the spec, and the spec can change retroactively. ;)
Right, I was assuming the process was
@RalfJung
Is the return here a use?
Yes, returning a value or passing it anywhere is a usage.
This very quickly gets very subtle. So the answer I think we should be giving is that every assignment (really every copy, as in, every assignment after lowering to MIR) is a use, and that includes the assignment in let x: bool = mem::uninitialized().
Looks valid.
Anyway, that's about arbitrart MaybeUninit nesting? Could it be transmuted safely without requiring user writing the transmute for every wrapper type?
@Pzixel I am not sure if I understand your question, but I think it is being discussed at https://github.com/rust-lang/rust/issues/61011.
I saw that the still-unstabilized MaybeUninit::write()
method is not unsafe
although it can skip calling drop on an already-present T
, which I'd have assumed to be unsafe. Is there precedent for this being considered safe?
https://doc.rust-lang.org/nomicon/leaking.html#leaking
https://doc.rust-lang.org/nightly/std/mem/fn.forget.html
forget
is not marked asunsafe
, because Rust's safety guarantees do not include a guarantee that destructors will always run.
However, also see https://github.com/rust-lang-nursery/nomicon/issues/135
Could we add a MaybeUninit<T> -> NonNull<T>
method to MaybeUninit
? AFAICT the pointer returned by MaybeUninit::as_mut_ptr() -> *mut T
is never null. That would reduce the churn of having to interface with APIs that use NonNull<T>
, from:
let mut x = MaybeUninit<T>::uninit();
foo(unsafe { NonNull::new_unchecked(x.as_mut_ptr() });
to:
let mut x = MaybeUninit<T>::uninit();
foo(x.ptr());
the pointer returned by MaybeUninit::as_mut_ptr() -> *mut T is never null.
This is correct.
Generally (and I think I've seen @Gankro say this), NonNull
works fairly well "at rest" but when actually using pointers one wants to get to a raw pointer ASAP. That's just much more readable.
However, adding a method that returns NonNull
seems fine. What should it be called though? Is there precedence?
There’s precedent with https://github.com/rust-lang/rust/issues/47336 but the name is not good and I’m not sure we’re gonna stabilize this method.
Has the crater run mentioned in https://github.com/rust-lang/rust/pull/60445#issuecomment-488818677 happened?
The idea of 3 months of available time that @centril mentions doesn't materialize for people who want to be warning-free on all of beta, stable and nightly. 1.36.0 has been released less than a week ago and nightly is already emitting warnings.
Could maybe deprecation be postponed to 1.40.0?
Deprecation warnings aren't always isolated to the crate responsible for them. E.g. when a crate exposes a macro that uses std::mem::uninitialized
internally, usages by third party crates still invoke the deprecation warning. I noticed this today when I compiled one of my projects with the nightly compiler. Even though the code doesn't contain a single mention of uninitialized
, I got the deprecation warning because it invoked glium's implement_vertex
macro.
Running cargo +nightly test
on glium master gives me over 1400 lines of output, mostly comprised of deprecation warnings of the uninitialized
function (I count the warning 200 times, but it is likely capped as the number that rg "uninitialized" | wc -l
outputs is 561).
What are the remaining concerns blocking stabilization of the rest of the methods? Doing everything through *foo.as_mut_ptr()
gets very tedious, and sometimes (for write
) involves more unsafe
blocks than necessary.
@SimonSapin To emulate write
, you could replace the whole MaybeUninit
with no unsafe using *val = MaybeUninit::new(new_val)
where val: &mut MaybeUninit<T>
and new_val: T
or you could use std::mem::replace
if you want the old value.
@est31 these are good points. I'd be fine pushing back deprecation by a release or two.
Any objections?
We've already said in the 1.36.0 release blog post:
As MaybeUninit
is the safer alternative, starting with Rust 1.38, the function mem::uninitialized will be deprecated.
As such, I think we should avoid flip-floppery on this one as that does not send a good message and it is confusing. Moreover the deprecation date should also be widely known given that it has been mentioned in the blog post.
It maybe be late to go back on the deprecation of uninitialized
. But maybe we could decide on a policy to only emit deprecation warnings in Nightly after the replacement has been on the Stable channel for some time?
For example, Firefox compromised on requiring a new Rust version two weeks after its release.
We've already said in the 1.36.0 release blog post:
I disagree that a mention of a date in a blog post is such an iron-clad degree. It's in a repo and we can submit an edit.
As such, I think we should avoid flip-floppery on this one as that does not send a good message and it is confusing.
"flip-floppery" is a bad thing, but changing our minds based on data and feedback is not that.
I don't much care one way or the other about the actual decision, but I don't believe that people will be confused by the proposal. Those that have seen the blog post or the deprecation warning can move to the new thing. People who haven't just won't care for another few releases.
"flip-floppery" is a bad thing, but changing our minds based on data and feedback is not that.
Fully agreed. I don't see a bad message being sent by saying "hey, our deprecation schedule was a bit too aggressive, we moved things back by a release". Quite the opposite, actually.
In fact IIRC I mentioned during landing the stabilization PR that the precedent is to deprecate 3 releases in the future and not 2, but for some reason we went with 2. Three releases means 1 entire release between stable-gets-released-with-the-deprecation-announcement and deprecated-on-nightly, that seems like fair time for people tracking nightly. 6 weeks is an eon, right? ;)
So I plan to submit a PR tomorrow that changes the deprecation version to 1.39.0. I can also submit a PR to update that blog post if people think it is important.
So I plan to submit a PR tomorrow that changes the deprecation version to 1.39.0. I can also submit a PR to update that blog post if people think it is important.
I will agree to 1.39 but no later than that. You will also need to update the release notes in addition to the blog-post.
Submitted PR for changed deprecation schedule: https://github.com/rust-lang/rust/pull/62599.
@SimonSapin
What are the remaining concerns blocking stabilization of the rest of the methods? Doing everything through *foo.as_mut_ptr() gets very tedious, and sometimes (for write) involves more unsafe blocks than necessary.
For as_ref
/as_mut
, I honestly wanted to wait until we know if references have to point to initialized data. Otherwise the documentation for those methods is just so preliminary.
For read
/write
, I am fine stabilizing them if everyone agrees that the names and signatures make sense. I think this should be coordinated with ManuallyDrop::take/read
, and maybe there should also be ManuallyDrop::write
?
I honestly wanted to wait until we know if references have to point to initialized data.
What does it take for the Unsafe Code Guidelines WG and the Language Team to come to a decision on this subject? Do you expect it’ll more likely happen in a few weeks, a few months, or a few years?
In the meantime, as_mut
being unstable doesn’t stop users from writing &mut *manually_drop.as_mut_ptr()
what that’s when they need to get something done.
What does it take for the Unsafe Code Guidelines WG and the Language Team to come to a decision on this subject? Do you expect it’ll more likely happen in a few weeks, a few months, or a few years?
Months, maybe years.
In the meantime, as_mut being unstable doesn’t stop users from writing &mut *manually_drop.as_mut_ptr() what that’s when they need to get something done.
Yes I know. The hope is to nudge people towards delaying the &mut
part as much as possible, and working with raw pointers. Of course without https://github.com/rust-lang/rfcs/pull/2582 that is often hard.
The documentation on MaybeUninit seems like a prime place to at least discuss that this is an ambiguity in the language's semantics and that users should conservatively assume it's not OK.
True, that would be the other option.
Even with a conservative assumption, as_mut
is valid after the value is fully initialized.
One way to be conservative with arrays is using MaybeUninit<[MaybeUninit<Foo>; N]>
. The outer wrappers allows creating the array with a single uninit()
call. (I think the [expr; N]
literal requires Copy
?) The inner wrappers make it safe even in the conservative assumption to use the convenience of slice::IterMut
in order to traverse the array, and then initialize the Foo
values one by one.
@SimonSapin see the unstable uninitialized_array!
macro in libcore.
@RalfJung maybe uninit_array!
would be a better name.
@Stargateur Absolutely, this is definitely not going to be stabilized with its current name. It is hopefully not going to get stabilized ever if https://github.com/rust-lang/rust/issues/49147 happens Soon (TM).
@RalfJung Ugh, that's my fault, I was blocking the PR without a great reason: https://github.com/rust-lang/rust/pull/61749#issuecomment-512867703
@eddyb this works for libcore, yay! But somehow when I try to use the feature in liballoc, it doesn't compile even though I set the flag. See https://github.com/rust-lang/rust/commit/4c2c7e0cc9b2b589fe2bab44173acc2170b20c09.
Building stage1 std artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Compiling alloc v0.0.0 (/home/r/src/rust/rustc.2/src/liballoc)
error[E0277]: the trait bound `core::mem::MaybeUninit<K>: core::marker::Copy` is not satisfied
--> <::core::macros::uninit_array macros>:1:32
|
1 | ($ t : ty ; $ size : expr) => ([MaybeUninit :: < $ t > :: uninit () ; $ size])
| - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `core::marker::Copy` is not implemented for `core::mem::MaybeUninit<K>`
| _|
| |
2 | | ;
| |_- in this expansion of `uninit_array!`
|
::: src/liballoc/collections/btree/node.rs:109:19
|
109 | keys: uninit_array![_; CAPACITY],
| -------------------------- in this macro invocation
|
= help: consider adding a `where core::mem::MaybeUninit<K>: core::marker::Copy` bound
= note: the `Copy` trait is required because the repeated element will be copied
error[E0277]: the trait bound `core::mem::MaybeUninit<V>: core::marker::Copy` is not satisfied
--> <::core::macros::uninit_array macros>:1:32
|
1 | ($ t : ty ; $ size : expr) => ([MaybeUninit :: < $ t > :: uninit () ; $ size])
| - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `core::marker::Copy` is not implemented for `core::mem::MaybeUninit<V>`
| _|
| |
2 | | ;
| |_- in this expansion of `uninit_array!`
|
::: src/liballoc/collections/btree/node.rs:110:19
|
110 | vals: uninit_array![_; CAPACITY],
| -------------------------- in this macro invocation
|
= help: consider adding a `where core::mem::MaybeUninit<V>: core::marker::Copy` bound
= note: the `Copy` trait is required because the repeated element will be copied
error[E0277]: the trait bound `core::mem::MaybeUninit<collections::btree::node::BoxedNode<K, V>>: core::marker::Copy` is not satisfied
--> <::core::macros::uninit_array macros>:1:32
|
1 | ($ t : ty ; $ size : expr) => ([MaybeUninit :: < $ t > :: uninit () ; $ size])
| - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `core::marker::Copy` is not implemented for `core::mem::MaybeUninit<collections::btree::node::BoxedNode<K, V>>`
| _|
| |
2 | | ;
| |_- in this expansion of `uninit_array!`
|
::: src/liballoc/collections/btree/node.rs:162:20
|
162 | edges: uninit_array![_; 2*B],
| --------------------- in this macro invocation
|
= help: the following implementations were found:
<core::mem::MaybeUninit<T> as core::marker::Copy>
= note: the `Copy` trait is required because the repeated element will be copied
error: aborting due to 3 previous errors
Mystery solved: the uses of the repeat expression in libcore actually were for types that are copy.
And the reason it doesn't work in liballoc is that MaybeUninit::uninit
is not promotable.
@RalfJung Maybe open a PR removing uses of the macro where it's completely unnecessary?
@eddyb I made that part of https://github.com/rust-lang/rust/pull/62799.
maybe_uninit_ref
For as_ref/as_mut, I honestly wanted to wait until we know if references have to point to initialized data. Otherwise the documentation for those methods is just so preliminary.
Unstable get_ref
/ get_mut
are definitely adviseable because of that; however, there are cases where get_ref
/ get_mut
may be used when the MaybeUninit
has been init: to get a safe handle to the (now known initialised) data while avoiding any memcpy
(instead of assume_init
, which may trigger a memcpy
).
Because of this, I imagine assume_init_by_ref
/ assume_init_by_mut
could be nice to have (since into_inner
has been called assume_init
, I seems plausible that the ref
/ ref mut
getters also get a special name to reflect this).
There are two / three options for this, related to Drop
interaction:
Exact same API as get_ref
and get_mut
, which can lead to memory leaks when there is drop glue;
get_ref
/ get_mut
, but with a Copy
bound;Closure style API, to guarantee drop:
impl<T> MaybeUninit<T> {
/// # Safety
///
/// - the contents must have been initialised
unsafe
fn assume_init_with_mut<R, F> (mut self: MaybeUninit<T>, f: F) -> R
where
F : FnOnce(&mut T) -> R,
{
if mem::needs_drop::<T>().not() {
return f(unsafe { self.get_mut() });
}
let mut this = ::scopeguard::guard(self, |mut this| {
ptr::drop_in_place(this.as_mut_ptr());
});
f(unsafe { MaybeUninit::<T>::get_mut(&mut *this) })
}
}
(Where scopeguard
's logic can easily be reimplemented, so there is no need to depend on it)
These could be stabilized faster than get_ref
/ get_mut
, given the explicit assume_init
requirement.
If a variant of option .1
was chosen, and get_ref
/ get_mut
were to become usable without the assume_init
situation, then this API would become almost strictly inferior (I say almost because with the proposed API, reading from the reference would be okay, which it can never be in the case of get_ref
and get_mut
)
Similar to what @danielhenrymantilla wrote about get_{ref,mut}
, I am beginning to think that read
should probably be renamed to read_init
or read_assume_init
or so, something that indicates that this may only be done after initialization is complete.
@RalfJung I have a question about this:
fn foo<T>() -> T {
let newt = unsafe { MaybeUninit::<T>::zeroed().assume_init() };
newt
}
For example, we call foo<NonZeroU32>
. Does this trigger UB when we declare a foo
function (because it have to be valid for all T
s or when we instantinate it with a type which triggers UB? Sorry if it's a wrong place to ask a question.
@Pzixel code can only cause UB when it runs.
So, foo::<i32>()
is fine. But foo::<NonZeroU32>()
is UB.
The property of being valid for all possible ways to call is called "soundness", also see the reference. The general contract in Rust is that the safe API surface of a library must be sound. This so that users of a library do not have to worry about UB. The entire safety story of Rust rests on libraries with sound APIs.
@RalfJung thanks.
So if I get you correctly this function is unsound (and thus, invalid), but if we mark it as unsafe
then this body becomes valid and sound
@Pzixel if you mark it unsafe, soundness is just not a concept that even applies any more. "Is this sound" only makes sense as a question for safe code.
Yes, you should mark the function unsafe
because some inputs can trigger UB. But even if you do, those inputs still trigger UB, so the function still must not be called that way. It is never okay to trigger UB, not even in unsafe code.
Yes, of course, I understand that. I only wanted to conclude that partial functuon must be marked as unsafe
. Makes sense to me but I didn't think about it before you replied.
Since the discussion on this tracking issue is so long now, can we break it out into a few other tracking issues for each feature of MaybeUninit
that's still unstable?
maybe_uninit_extra
maybe_uninit_ref
maybe_uninit_slice
Seems reasonable. There also is https://github.com/rust-lang/rust/issues/63291.
Closing this in favor of a meta-issue which tracks MaybeUninit<T>
more generally: #63566
Most helpful comment
mem::zeroed()
is useful for certain FFI cases where you are expected to zero a value withmemset(&x, 0, sizeof(x))
before calling a C function. I think this is a sufficient reason to keep it undeprecated.