In this page:
https://doc.rust-lang.org/nightly/std/mem/union.MaybeUninit.html
I suggest to add an equivalent example of "how to safely initialize an array of Vecs" using MaybeUninit instead of uninitialized, if possible:
https://doc.rust-lang.org/nightly/std/mem/fn.uninitialized.html
What would the example look like? Something like this?
let arr: [MaybeUninit<Vec<u8>>; 5] = [MaybeUninit::uninitialized(); 5];
for v in &mut arr {
v.set(vec![]);
}
let arr: [Vec<u8>; 5] = mem::transmute(arr);
cc @RalfJung
As arr array length is better to use 50 instead of 5, because Default:default goes up to about 30.
One gotcha is that the example doesn't even work as-is because MaybeUninit
isn't Copy
:
If MaybeUninit
is intended to be compatible with arrays then maybe that was overlooked?
In https://github.com/rust-lang/rust/pull/53508/commits/7c37c6d33ef890f73c16323fe87cdbe403ba2ac8#diff-322c42fd5e8ec8e8f8093261d59a5782R87, @eddyb and @RalfJung mention that MaybeUninit
“distributes over products” such as arrays, so you could instead do something like:
let mut arr: MaybeUninit<[Vec<u8>; 5]> = MaybeUninit::uninitialized();
for i in 0..5 {
ptr::write(&mut arr.get_mut()[i], vec![]);
}
let arr: [Vec<u8>; 5] = arr.into_inner();
...if it is safe to get a pointer or reference to the uninitialized inner value or its fields/elements, as long as this pointer isn't used to read from uninitialized portions of the value. It would only be safe to read from an element after it is initialized, and only safe to read from the whole (e.g. using into_inner
) when all the elements are initialized.
Note that the expression arr.get_mut()
above creates an &mut T
that points to an uninitialized T
, so this can only work if we decide that such references are “unsafe but valid” (i.e., not instant UB). Also it may be a bit risky passing this to the safe Index::index
function, so a more conservative version might use only raw pointers:
(arr.as_mut_ptr() as *mut Vec<u8>).offset(i).write(vec![]);
So would this be a valid example of how to initialize an array of Vecs then?
#![feature(maybe_uninit)]
use std::mem::{self, MaybeUninit};
fn main() {
let mut arr: [MaybeUninit<Vec<u8>>; 50] = {
// The compiler complains about `Copy` bounds if we try to directly
// create a `[MaybeUninit<Vec<u8>>; 50]`, so we use the fact that
// MaybeUninit<[T; N]> == [MaybeUninit<T>; N] to work around that
let arr: MaybeUninit<[Vec<u8>; 50]> = MaybeUninit::uninitialized();
unsafe { mem::transmute(arr) }
};
// Initialize the elements one at a time. The `Iterator` trait isn't
// implemented for arrays over size 32 (integer generics when!?),
// so let's use good ol' indexing instead.
for idx in 0..arr.len() {
arr[idx].set(vec![]);
}
// Now that everything is initialized (and they'd better be fully
// initialized or things WILL blow up here), we can transmute to
// a fully valid array of Vecs
let arr: [Vec<u8>; 50] = unsafe { mem::transmute(arr) };
// `Debug` is also unimplemented for arrays of this size, so let's
// index-loop again to show our completed work!
for idx in 0..arr.len() {
println!("Index {}: {:?}", idx, arr[idx]);
}
}
Is it worth adding to the Rust std lib a helper macro like this, to simplify the creation of arrays of noncopy values?
So would this be a valid example of how to initialize an array of Vecs then?
My code from above was meant to be an entire solution, without any transmuting necessary. Here's a complete example program:
use std::mem::MaybeUninit;
fn main() {
const N: usize = 5;
let arr: [Vec<u8>; N] = unsafe {
let mut arr = MaybeUninit::uninitialized();
for i in 0..N {
(arr.as_mut_ptr() as *mut Vec<u8>).add(i).write(vec![]);
}
arr.into_inner()
};
for idx in 0..arr.len() {
println!("Index {}: {:?}", idx, arr[idx]);
}
}
Ah nice, that's way more concise.
@mbrubeck One extremely subtle point about your code is the Nice to have no transmute, though as *mut Vec<u8>
is pretty much the same thing. But yeah, I agree that is correct. Might be nicer to do the arr.as_mut_ptr() as *mut Vec<u8>
only once? Also I think arr
should have a type ascribed, inference is not entirely trivial here.
use std::mem::MaybeUninit;
fn main() {
const N: usize = 5;
let arr: [Vec<u8>; N] = unsafe {
let mut arr: MaybeUninit<[Vec<u8>; N]> = MaybeUninit::uninitialized();
let arr_ptr = arr.as_mut_ptr() as *mut Vec<u8>; // pointer to 1st element
for i in 0..N {
arr_ptr.add(i).write(vec![]);
}
arr.into_inner()
};
for idx in 0..arr.len() {
println!("Index {}: {:?}", idx, arr[idx]);
}
}
@mbrubeck One extremely subtle point about your code is the Nice to have no transmute, though
as *mut Vec<u8>
is pretty much the same thing. But yeah, I agree that is correct. Might be nicer to do thearr.as_mut_ptr() as *mut Vec<u8>
only once? Also I thinkarr
should have a type ascribed, inference is not entirely trivial here.use std::mem::MaybeUninit; fn main() { const N: usize = 5; let arr: [Vec<u8>; N] = unsafe { let mut arr: MaybeUninit<[Vec<u8>; N]> = MaybeUninit::uninitialized(); let arr_ptr = arr.as_mut_ptr() as *mut Vec<u8>; // pointer to 1st element for i in 0..N { arr_ptr.add(i).write(vec![]); } arr.into_inner() }; for idx in 0..arr.len() { println!("Index {}: {:?}", idx, arr[idx]); } }
I think the ascribed type is also very helpful!
When I try to run this on the Playground I get an error however:
error[E0599]: no method named `into_inner` found for type `std::mem::MaybeUninit<[std::vec::Vec<u8>; 5]>` in the current scope
When I try to run this on my own machine with nightly-1.35.0 I get the same error. Am I missing something?
Ultimately I think we should somehow create a good safe abstraction for this. I passed this to the security workgroup to see if there is something we can come up with.
The API has changed, the following should work now:
#![feature(maybe_uninit)]
use std::mem::MaybeUninit;
fn main() {
const N: usize = 5;
let arr: [Vec<u8>; N] = unsafe {
let mut arr: MaybeUninit<[Vec<u8>; N]> = MaybeUninit::uninit();
let arr_ptr = arr.as_mut_ptr() as *mut Vec<u8>; // pointer to 1st element
for i in 0..N {
arr_ptr.add(i).write(vec![]);
}
arr.assume_init()
};
for idx in 0..arr.len() {
println!("Index {}: {:?}", idx, arr[idx]);
}
}
But that's basically a hack. Better interaction of MaybeUninit
with Box
/arrays/slices is definitely on the roadmap, but let's first land the basics.
Isn't it simpler and safer to turn &mut MaybeUninit<[Vec<u8>; N]>
into &mut [MaybeUninit<Vec<u8>>; N]
and iterate that normally, instead of iterating with raw pointers?
I am not saying the above is the API we should use.^^
Probably it makes most sense to start with [MaybeUninit<Vec<u8>>; N]
created by uninitialized_array![Vec<u8>; N]
. But then we need a good way to go from that to [Vec<u8>; N]
. We cannot write that as a generic function yet, unfortunately (nor your array pointer conversion function, FWIW).
So, here's a trick...
&mut MaybeUninit<[T; N]>
should coerce to &mut MaybeUninit<[T]>
(do we have to solve union
/ custom DST unresolved questions for this?)
Then you don't need const generics to turn it into &mut [MaybeUninit<T>]
.
Btw, we should just make [MaybeUninit::uninit(); N]
and even [vec![]; N]
work in the NLL borrowck, we were waiting for Rust 2018 to ship, but it should be easy now.
We can even bypass const fn
promotability for vec![]
by special-casing it to use a perma-unstable Vec::EMPTY
associated const! (but Vec::new()
working would be nice)
cc @oli-obk (all we have to do is promote Rvalue::Repeat
's element, where possible, and move the Copy
check from typeck to old & new borrowck)
There is an issue open for this, I think, but I can't find it easily right now.
I suggest to add an equivalent example of "how to safely initialize an array of Vecs"
The docs have been expanded to contain such an example:
use std::mem::{self, MaybeUninit};
use std::ptr;
let data = {
// Create an uninitialized array of `MaybeUninit`. The `assume_init` is
// safe because the type we are claiming to have initialized here is a
// bunch of `MaybeUninit`s, which do not require initialization.
let mut data: [MaybeUninit<Vec<u32>>; 1000] = unsafe {
MaybeUninit::uninit().assume_init()
};
// Dropping a `MaybeUninit` does nothing, so if there is a panic during this loop,
// we have a memory leak, but there is no memory safety issue.
for elem in &mut data[..] {
unsafe { ptr::write(elem.as_mut_ptr(), vec![42]); }
}
// Everything is initialized. Transmute the array to the
// initialized type.
unsafe { mem::transmute::<_, [Vec<u32>; 1000]>(data) }
};
assert_eq!(&data[0], &[42]);
@RalfJung This pattern is a way to avoid unnecessary initialization of the array. And it is crystal clear :)
In a context where unnecessary initialization is not wanted, it is likely that unnecessary moves are also unwanted. Is it somehow guaranteed by the compiler that returning unsafe { mem::transmute::(...) }
from the block in this case will not copy all bytes to another location owned by the new variable data
? If not, how to ensure that results will not be moved around?
Should this be more explicit in this example of the doc, or is it the wrong place to bring up a move-elision-related topic?
AFAIK Rust has no guaranteed move elision, and I am afraid I don't know how well the optimizer can elide such moves in practice. This is below the layers of Rust that I am familiar with.
@iago-lito I think the feature you are asking for was also requested at https://github.com/rust-lang/rust/issues/61011.
Most helpful comment
The docs have been expanded to contain such an example: