PR https://github.com/rust-lang/rust/pull/49669 adds a GlobalAlloc
trait, separate from Alloc
. This issue track the stabilization of this trait and some related APIs, to provide the ability to change the global allocator, as well as allocating memory without abusing Vec::with_capacity
+ mem::forget
.
Defined in or reexported from the std::alloc
module:
Update: below is the API proposed when this issue was first opened. The one being stabilized is at https://github.com/rust-lang/rust/issues/49668#issuecomment-393263510.
/// #[global_allocator] can only be applied to a `static` item that implements this trait
pub unsafe trait GlobalAlloc {
unsafe fn alloc(&self, layout: Layout) -> *mut Opaque;
unsafe fn dealloc(&self, ptr: *mut Opaque, layout: Layout);
unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut Opaque {
// Default impl: self.alloc() and ptr::write_bytes()
}
unsafe fn realloc(&self, ptr: *mut Opaque, layout: Layout, new_size: usize) -> *mut Opaque {
// Default impl: self.alloc() and ptr::copy_nonoverlapping() and self.dealloc()
}
// More methods with default impls may be added in the future
}
extern {
pub type Opaque;
}
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct Layout { /* private */ }
impl Layout {
pub fn from_size_align(size: usize: align: usize) -> Result<Self, LayoutError> {…}
pub unsafe fn from_size_align_unchecked(size: usize: align: usize) -> Self {…}
pub fn size(&self) -> usize {…}
pub fn align(&self) -> usize {…}
pub fn new<T>() -> Self {…}
pub fn for_value<T: ?Sized>(t: &T) -> Self {…}
}
#[derive(Copy, Clone, Debug)]
pub struct LayoutError { /* private */ }
/// Forwards method calls to the `static` item registered
/// with `#[global_allocator]` if any, or the operating system’s default.
pub struct Global;
/// The operating system’s default allocator.
pub struct System;
impl GlobalAlloc for Global {…}
impl GlobalAlloc for System {…}
CC @rust-lang/libs, @glandium
Global
type. GlobalAllocator
?Void
type. Lower-case void
would match C (our *mut void
is very close to C’s void*
type), but violates the convension of camel-case for non-primitive types. But Void
in camel-case is already the name of an empty enum (not an extern type like here) in a popular crate. (An extern type is desirable so that <*mut _>::offset
cannot be called without first casting to another pointer type.) Maybe call it Opaque
? Unknown
?Opaque
.Layout
by reference, or making it Copy
https://github.com/rust-lang/rust/issues/48458.~ Copy
: https://github.com/rust-lang/rust/pull/50109GlobalAlloc::owns
: https://github.com/rust-lang/rust/issues/44302 proposes making it required for allocators to be able to tell if it “owns” a given pointer.~ Not to be required by this trait since glibc and Windows do not support this.usable_size
method (for now), the layout passed to dealloc must be the same as was passed to alloc. (Same as with Alloc::usable_size
’s default impl returning (layout.size(), layout.size())
.)oom
method is part of GlobalAlloc
so that implementation-specific printing to stderr does not need to be part of liballoc and instead goes through #[global_allocator]
, but this does not really belong in the GlobalAlloc
trait. Should another mechanism like #[global_allocator]
be added to have pluggable OOM handling?~ https://github.com/rust-lang/rust/pull/50144Alloc
traitLayout
methodsAllocErr
typeWe’re not ready to settle on a design for collections generic over the allocation type. Discussion of this use case should continue on the tracking issue for the Alloc
trait: https://github.com/rust-lang/rust/issues/32838.
I was initially confused by the name Void
. I confused it with the aforementioned empty enum. Bikeshed: I wonder if it could be called Allocation
or Memory
?
Relevant discussions pre-PR:
But I don’t know if any platform’s system allocator (like libc’s malloc) supports this, let alone all of them.
The OSX system allocator supports that. Jemalloc does too, optionally. Glibc malloc and Windows heap allocator don't.
Taking Layout by reference, or making it Copy #48458
The fact that realloc
now takes the new size as usize
rather than a Layout
strongly points towards just making Layout
implement Copy
and treating it as a simple (length, alignment) pair.
Has the Heap
type been renamed to Global
?
@alexreg Yes, as described in the PR message of #49669.
Naming bikesheds:
Given that the attribute is global_allocator
, should the trait be called GlobalAllocator
instead? alloc
vs. allocator
is inconsistent.
Also, I am not very happy with Void
. void
is probably the most misunderstood type in C. It is often described as "empty", but it actually is a unit type ()
, not the empty never type !
. void*
is NOT just a pointer to ()
though, but an independent concept. I'd rather we do not step into the middle of this confusion.
From the alternatives proposed above, I prefer Unknown
. Some more random suggestions: Blob
, Extern
, Data
. I think from all of these, my personal favorite is Blob
, which I have often seen used as a term for "not further specified binary data". But really, any of these is strictly better than Void
IMHO.
If GlobalAlloc
is renamed to GlobalAllocator
, should Alloc
be Allocator
? And the alloc
method allocate
? (And similarly other methods.)
Agreed that Void
or void
is probably best avoided, though I’m not sure what to pick instead.
I would personally keep the name "alloc" for traits and methods, the global_allocator
is talking about a specific instance whereas the methods/traits are actions/groups of types.
std::alloc::Global
could also be something like GlobalHeap
(I think suggested by @glandium?) or DefaultHeap
or DefaultAlloc
. I don't think we can choose the name GlobalAlloc
without renaming the trait GlobalAlloc
itself.
I don't think we should implement owns
. The sheer fact that two tier-1 platforms don't implement it I feel like is a nail in that coffin.
The GlobalAlloc
trait should require Sync
as a super trait, no?
Yeah, Send
as well.
I don't think Send
is actually necessary. A GlobalAlloc
is never dropped and does not require a &mut
.
My rule of thumb for Send
is: "is it OK to drop this type in a different thread than the one it was created in?"
@fitzgen @sfackler I don't think either trait is necessarily required per se though, inherently sticking it into a static
will require both traits for sure. I think the main benefit may be being informative early on, but we've tended to shy away from those types of trait bounds.
As Alex said, Sync
is already required by static
items which are the only ones where #[global_allocator]
can be used.
@alexcrichton
sticking it into a static will require both traits for sure.
That should only require Sync
, as @SimonSapin said, but not Send
, right? Only &T
is shared across threads.
from_size_align_unchecked
is not an unsafe method? Are there any invariants that Layout
upholds?
Why Void
and not fn alloc<T>() -> *mut T
?
from_size_align_unchecked
is unsafe, that was a mistake. See https://doc.rust-lang.org/nightly/core/heap/struct.Layout.html#method.from_size_align for invariants.
alloc_one<T>()
is a convenience method on top of alloc(Layout)
, you can’t always statically construct a type that has the desired layout.
Every time I see this trait name I get it confused with GlobalAlloc
the Windows API function: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366574
@retep998 What do you think would be a better name for the trait?
[Reposting in the right tracking issue this time...]
So, the name currently decided on by @SimonSapin for the allocated memory type seems to be Opaque. I concur that Void is a dubious name, but I don't think Opaque is great, mainly due to its complete lack of descriptiveness. I propose one of the following:
Blob
Mem
MemBlob
MemChunk
OpaqueMem
– not my favourite, but at least slightly more explicit than just OpaqueI would add Raw
to the list, since allocation returns raw untyped memory.
RawMem
would work too, yeah. Just Raw
is too vague though.
What word does OED define as the following perfect description of our use case?
Used to emphasize a lack of restriction in referring to any thing or amount, no matter what.
answer
This is a serious suggestion by the way.
extern {
fn memcpy(dest: *mut Whatever, src: *const Whatever, n: size_t);
}
unsafe impl GlobalAlloc for BetterAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut Whatever {
/* ... */ as *mut Whatever
}
}
It’s not a regular noun though. (An interrogative/relative pronoun yes, but those are a bit different.)
What is the caller going to store in this memory that I allocated for them?
Well, whatever they want.
What type of data am I copying with this memcpy?
I don't know, ¯\_(ツ)_/¯ whatever happens to be there I guess.
IMO this captures the essence of void *
.
@rfcbot fcp merge
The global allocator APIs and associated infrastructure have been around for a very long time at this point and I think that it's high-time to propose stabilization. With @SimonSapin's recent PR to refactor out a GlobalAlloc
trait I'd like to FCP to stabilize just that trait and associated items (plus a conservative API on Layout
)
Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:
No concerns currently listed.
Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
As for the name of Opaque
, I'm hoping we can decide during stabilization. I'm personally currently in favor of *mut Whatever
There could be some more bikeshedding on the Global
name too. (https://github.com/rust-lang/rust/commit/c32a3de476768cb222f5062efdf71c73b38739fa#commitcomment-28405600)
I’d rather stick with Opaque
than go with Whatever
. Both are too vague, but especially Whatever
, which is really nebulous.
As motivated in https://github.com/rust-lang/rust/issues/49668#issuecomment-381219288, *mut Whatever
conveys a pointer that points to whatever data happens to exist there, of whatever type. One can allocate memory intended to hold whatever the caller wants there, one may copy whatever data from one pointer to another, etc.
While I’m all for keeping momentum, there’s still a number of open issues to resolve before stabilization. (The semantics of layout fit is a subtle one that I don’t quite understand.)
One open question that came up during the all-hands is Layout::for_value
. This does not need to actually dereference the pointer, so maybe it should be changed to a raw pointer. The trouble with making it a reference is that references (maybe?) assert that the value they point to is valid, while for_value
is called with a reference to a dead value e.g. by Arc::drop
.
Cc @nikomatsakis
@RalfJung It calls std::mem::size_of_val
which takes &T
@SimonSapin Good point. Probably we should have raw pointer versions of that (and align_of_val
).
@dtolnay Okay, I see your point. The problem for me is still that Whatever
doesn’t really work as a noun.
I would like to repeat @alexreg's suggestion of RawMem
. Either that or keep Opaque
, I think that Whatever
feels too informal.
@SimonSapin indeed yeah there are some unresolved questions, but I'd consider them largely bikesheds by this point. (I also see our bikeshed for this as more of an airplane hangar it's got so many coats of paint)
extern type
- I am personally mostly a fan of *mut u8
, but I realize that this isn't popular. If extern type Foo
doesn't work out though I think we'll be forced to switch to *mut u8
to get .offset
working in terms of bytes. I wouldn't see this as the end of the world.Global
naming - a bikeshed!usable_size
, I believe this is settled? That is, you must deallocate and reallocate with precisely the same size and alignment. (alignment has our hand forced by Windows I believe)oom
here. Ok, sounds good. (FWIW I’m also fine with *mut u8
.)
@alexcrichton why are we stabilizing GlobalAlloc::oom
? I still do not understand why the allocator is the thing that decides how OOM is treated.
OOM in general to me feels pretty closely coupled with the global allocator (in that you often want to customize both). On a technical level it's pretty awkward for us to move OOM handling off the GlobalAlloc
trait and if we were to not stabilize it then customizing OOM wouldn't be possible and you also wouldn't even get a nice message saying that OOM happened.
They're still on separable axes in that only the final crate decides the global allocator, and it's up to that crate to pick-and-choose implementations if it likes.
It's true you often want to customize both but they're totally orthogonal customizations. If they're conflated, what does that actually look like? Are people expected to make their own GlobalAlloc
implementation that wraps another and delegates all methods except for oom
? How would System::oom
and Jemalloc::oom
differ?
Yes if you want to customize OOM and not the allocator you'd have to make your own implementation of the trait and dispatch all methods except one. That, however, doesn't seem like that much overhead to me (it's not really that common to override OOM?). I would imagine Jemalloc::oom
is the same as System::oom
, but we have to make sure that it's possible to define it as that.
So it seems like we're trying to hitch OOM customization to gobal allocator customization because we happen to already have the glue in place for it. What is the specific driver for stabilizing it in the first place? Is there concrete demand for customizing OOM? The "delegate n-1 methods" workflow seems really bad.
We somehow need to stablize when you're using jemalloc that OOM prints out a nice message like the system allocator would by default. Currently we're not really in a position to remove OOM handling from the trait to decouple oom handling from the GlobalAlloc
trait, which in my mind forces our hand on stabilizing GlobalAlloc::oom
.
If we can figure out how to move the method off the trait and a good way and have custom allocators by default route to "print an OOM message" then we should be able to avoid stabilizing it for sure.
That being said, I don't really see the erognomic downside here as really that bad, it's something you do vanishing rarely and we're about enabling workflows rather than making the workflows perfect right now.
I feel like I must be missing something because this doesn't seem like a hard problem to solve:
Option 1: Replace the code in liballoc that calls the_global_allocator.oom()
with code that calls rtabort!("out of memory")
or whatever System::oom
does.
Option 2: Keep GlobalAlloc::oom
, give it a default impl (which every implementation will want anyway, right?), and leave it unstable.
@sfackler Keep in mind that liballoc can't depend on any crates other than core, so the only implementation of oom
that liballoc could provide would be to call intrinsics::abort
. This isn't very user-friendly since it just crashes the program with an illegal instruction.
There are only really 2 options to provide a user-friendly OOM message:
GlobalAlloc
, which allows allocator implementations to call libc to print a message and abort.no_std
programs to use liballoc (since liballoc relies on this lang item existing).As far as the workflow being bad, if I want to pick what happens on OOM, I want to write something like
#[oom_handler]
fn oom() -> ! {
// log stuff!
}
not
use alloc::{Layout, GlobalAlloc, Opaque};
use system_alloc::SystemAllocator;
#[global_allocator]
static _ALLOC: OomHandlingAllocator<SystemAllocator> = OomHandlingAllocator(SystemAllocator);
struct OomHandlingAllocator<A>(A);
impl<A> GlobalAlloc for OomHandlingAllocator<A>
where
A: GlobalAlloc
{
unsafe fn alloc(&self, layout: Layout) -> *mut Opaque {
self.0.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut Opaque, layout: Layout) {
self.0.dealloc(ptr, layout)
}
unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut Opaque {
self.0.alloc_zeroed(layout);
}
unsafe fn realloc(&self, ptr: *mut Opaque, layout: Layout, new_size: usize) -> *mut Opaque {
self.0.realloc(ptr, layout, new_size)
}
// hopefully I didn't forget to delegate anything...
fn oom(&self) -> ! {
// log stuff!
}
}
We should put the work into making this reasonable.
OOM and the global allocator seem to be two examples of "compiler-endorsed singletons" ("stable lang-items"?) or whatever one may want to call them; I think I've seen this brought up elsewhere where people wanted to use a similar mechanism for something else. Maybe it makes more sense to make them both instances of the same mechanism, rather than bolting them together in one trait?
@sfackler as @Amanieu mentioned unfortunately we can't tweak the definiton as-is today. The GlobalAlloc
trait is defined in core
which has no access to primitives like write
, so it can't define an implementation that either delegates to or implements the "reasonable expectation" by default. (but libstd can do this).
We could perhaps move GlobalAlloc
but we could only move it up as far as alloc
which as the same issues, it doesn't have access to system resources as it purely assumes global allocation.
We could, however, use an entirely new mechanism for defining OOM handlers like panic handlers and such, but I really do think that's overkill. What you mentioned is indeed not super ergonomic, but I don't think it's something we must fix. Your example could alternatively be written as:
extern crate some_crates_io_crate;
use std::alloc::System;
use some_crates_io_crate::{CustomOom, Oom};
#[global_allocator]
static ALLOC: CustomOom<System, MyOom> = CustomOom(System, MyOom);
struct MyOom;
impl Oom for MyOom {
fn oom() -> ! { ... }
}
I don't understand what you mean by overkill.
The amount of compiler support necesary for custom panic implementations, custom global allocators, custom panic runtimes, etc, is quite large and difficult to maintain. Adding "yet another one" is overkill in my mind (as I've implemented all of these in the compiler so far) compared to just having a crate on crates.io that allows customizing just OOM
oom seems very similar to panic_fmt - why would it not be able to use the same logic as that but be pulled in through liballoc rather than libcore?
I think OOM is actually very similar to panic in the kinds of ways you'd want to control it. Setting something like this statically at build time is the least flexible way of doing things, and I think we'd want to end up in a place in std where we have something like std::panic::set_hook
but for OOM. This allows you more control to easily change your OOM behavior based on runtime config.
@sfackler yeah we can concoct anything we want to design an OOM handler into the system, my point is that I feel like it's all overkill. We've got a workable solution today and it's not clear to me why we'd want to strong optimize differently for various use cases.
It's always possible, yes, for alloc
to have another "weak lang item" defined in libstd, and then libstd has infrastructure for saying "here's the real oom handler". That comes at the downside of adding another assumption to alloc
(an OOM handler and a global allocator).
How would System::oom and Jemalloc::oom differ?
Today, jemallocator::Jemalloc::oom
is defined as a call to std::alloc::System::oom
.
Is there concrete demand for customizing OOM?
We have an accepted RFC https://github.com/rust-lang/rust/issues/48043 for some way to opt into OOM causing a panic rather than an abort. The exact mechanism is not defined by the RFC and left up to the implementation, but this is not implemented yet. fn oom() -> !
(whether it’s a method of #[global_allocator]
or a free function with its own similar attribute) could be that mechanism.
:bell: This is now entering its final comment period, as per the review above. :bell:
We've got a workable solution today and it's not clear to me why we'd want to strong optimize differently for various use cases.
My point is that the current solution is bad API design. The fact that jemallocator
depends on alloc_system
just so that it doesn't need to copy all of this stuff is bad: https://github.com/rust-lang/rust/blob/master/src/liballoc_system/lib.rs#L370-L428. Dlmalloc just doesn't implement oom, not really sure what the deal is there: https://github.com/alexcrichton/dlmalloc-rs/blob/c99638dc2ecfc750cc1656f6edb2bd062c1e0981/src/global.rs#L123-L125.
The GlobalAlloc
trait should be responsible for allocating memory, not allocating memory and figuring out how to print an error message on OOM. When someone in the future asks "why is OOM behavior configured through the global allocator?", what is the answer other than "we didn't have the energy to make a lang item and it's a convenient place to stick it"?
I am willing to do the work of making a #[lang="oom"]
. It's another lang item you need to define if you're working in alloc-but-no_std-land, but I think that is a reasonable price to pay.
@sfackler thanks for doing the legwork to turn it into a lang item!
Now that #50144 has moved OOM to its own lang item, this may not be the right place to talk about this, so please redirect me to the right place if not.
In Firefox, we have the issue that we need to differentiate OOM crashes from other crashes, and we also need to collect the requested size that led to OOM when it happens. The problem is that the GlobalAlloc changes removed the oom function argument that could have allowed that, and the new oom lang item has kept the new signature.
The alternatives are not great:
How does the C side of the codebase deal with this?
The C side has different function calls for fallible and infallible allocations. Fallible allocations return NULL when they fail, and infallible allocations panic and handle recording the OOM requested size.
How would the old oom function have supported what you want, though? It's still a separate function that's called after the function that failed, so you'd still need thread local tracking of what went wrong.
Since you control the allocator, you could have the allocation functions directly abort like the C versions do?
How would the old oom function have supported what you want, though? It's still a separate function that's called after the function that failed, so you'd still need thread local tracking of what went wrong.
The old oom function was taking an AllocErr, which contained the requested Layout.
Since you control the allocator, you could have the allocation functions directly abort like the C versions do?
As I said, that would mean making all rust allocations infallible, even for code that does want to handle allocation failure.
Ah sure - the AllocErr parameter was removed a bit before it turned into a separate function and I had forgotten about that.
It seems reasonable to me for oom take a Layout as a parameter.
Shall I just go ahead with a PR doing that?
Sure!
Should it be Layout
or just usize
for the allocation size?
Since checking for null and calling oom(…)
might be a very common pattern, it might make sense to provide convenience API(s) for it. But to avoid users having to pass the layout (or size) twice, it’d have to be a whole set of parallel APIs for alloc
, alloc_zeroed
, and realloc
. With Alloc::*_excess
, it’s now two axes where we potentially want all combinations.
Considering alloc might fail because of wrong alignment requirements, an oom handler may want to check that too.
(Thank you for reminding me that I wanted to add alloc_zeroed_excess
)
So the downside is that, with AllocErr now being a ZST, there are places in raw_vec where it's actually not convenient to pass a Layout to oom:
https://github.com/rust-lang/rust/blob/9e8f683476d5a9d72c6c1e9383a519cf0fb27494/src/liballoc/raw_vec.rs#L319
https://github.com/rust-lang/rust/blob/9e8f683476d5a9d72c6c1e9383a519cf0fb27494/src/liballoc/raw_vec.rs#L328
https://github.com/rust-lang/rust/blob/9e8f683476d5a9d72c6c1e9383a519cf0fb27494/src/liballoc/raw_vec.rs#L445
https://github.com/rust-lang/rust/blob/9e8f683476d5a9d72c6c1e9383a519cf0fb27494/src/liballoc/raw_vec.rs#L555
The final comment period is now complete.
Another problem with the oom lang item: it doesn't allow to override the one from std:
error[E0152]: duplicate lang item found: `oom`.
= note: first defined in crate `std`.
Presumably, this would be fixed by making it an actual weak lang item.
Continuation of my earlier comment about passing Layout to oom being painful, it's particularly painful in
https://github.com/rust-lang/rust/blob/9e8f683476d5a9d72c6c1e9383a519cf0fb27494/src/liballoc/raw_vec.rs#L555
where the patch looks like:
@@ -552,7 +553,7 @@ impl<T, A: Alloc> RawVec<T, A> {
pub fn reserve(&mut self, used_cap: usize, needed_extra_cap: usize) {
match self.try_reserve(used_cap, needed_extra_cap) {
Err(CapacityOverflow) => capacity_overflow(),
- Err(AllocErr) => oom(),
+ Err(AllocErr) => oom(Layout::array::<T>(self.amortized_new_size(used_cap, needed_extra_cap).unwrap()).unwrap()),
Ok(()) => { /* yay */ }
}
}
It ends up similarly painful in
https://github.com/rust-lang/rust/blob/9e8f683476d5a9d72c6c1e9383a519cf0fb27494/src/libstd/collections/hash/table.rs#L773
https://github.com/rust-lang/rust/blob/9e8f683476d5a9d72c6c1e9383a519cf0fb27494/src/libstd/collections/hash/table.rs#L812
https://github.com/rust-lang/rust/blob/9e8f683476d5a9d72c6c1e9383a519cf0fb27494/src/libstd/collections/hash/map.rs#L787
where the patch similarly needs to call a function.
Actually, the ones for collections are just plain horrible, especially the one for HashMap, which I haven't implemented because of the error-proneness. At this point, I'm tempted to say infallibility should be handled by the allocator itself. Adding methods is not incredibly attractive, so maybe a parameter to the alloc functions?
Another problem with the oom lang item: it doesn't allow to override the one from std:
The thing to do here is to make an API like std::panic::set_hook
. It is already a weak lang item.
@glandium want to open a separate tracking issue for customizing OOM behavior? (through an API like @sfackler is thinking)
set_hook
or a lang item – either sounds like a good way to handle OOM for the GlobalAlloc
. Is there any reason for preferring one over the other?
Oh, I didn't know about restriction 2. As for 1, it can be circumvented easily as you have a lang item. :-)
I noticed that the compiler will accept both a normal and an extern
function for the oom
lang item:
// This is how rust_oom is defined in std
#[lang = "oom"]
extern fn rust_oom() -> ! { loop {} }
// The compiler also accepts this
#[lang = "oom"]
fn rust_oom() -> ! { loop {} }
Is this intentional? Does rustc automatically pick the correct ABI for the lang item?
OTOH, global_allocator is set at compile time, and you get one by default if you don't set one yourself. Why should oom handling be different?
I don't understand. Why would we want to be less flexible here?
Why would you not want global_allocator to be similarly more flexible, then?
Anyways, this discussion also doesn't address the problem I raised in my last comments, where trying to pass a layout to the oom function is difficult in some places. As mentioned by @SimonSapin this can be addressed by adding helper functions to the Alloc
trait, but the multiplication of functions is not helping.
I'm now thinking of something like:
struct Infallible<A: Alloc>(A);
impl<A: Alloc> Alloc for Infallible<A> { ... }
I was thinking that the infallible APIs could maybe return NonNull
, but that only works if they’re separate APIs.
Why would you not want global_allocator to be similarly more flexible, then?
Because we can't. panic::set_hook
requires allocating, and the runtime allocates before main.
Because we can't. panic::set_hook requires allocating, and the runtime allocates before main.
That's a fair reason. OTOH, I'm not convinced the flexibility for oom would be any useful. And "because lang items can be defined only once" doesn't seem like a compelling reason, when, as mentioned earlier, the problem doesn't exist for #[global_allocator]
. IOW, I don't see why oom can't be handled more similarly to #[global_allocator]
.
(or #[panic_implementation]
, for that matter)
@Amanieu lang items aren't type checked.
@glandium It is way easier to implement a panic-hook style interface than another thing like global_allocator. panic_implementation behaves identically to oom.
I was thinking that the infallible APIs could maybe return NonNull, but that only works if they’re separate APIs.
Callers are not calling directly the GlobalAlloc
trait methods, though. And the Alloc
methods already return NonNull
, albeit in a Result
with an AllocErr
. It would sure be better if an infallible API returned !
instead of AllocErr
. In the Infallible<A>
scheme I suggested above, that could be done by adding an Error
associated type to the Alloc
trait.
Something like panic_hook
requires dynamic dispatch, which is not a cost we wanted to pay for allocators. Panics are the extra-slow path, like OOM, so don't need to be optimized at all.
So I gave Infallible<A>
a shot, and while it does help with many places calling oom
, it doesn't help with all the places involving CollectionAllocErr
. This would be equally true with infallible methods rather than Infallible<A>
. It seems, though, it could be reasonable to add the Layout
to CollectionAllocErr::AllocErr
, although that would kill From<AllocErr> for CollectionAllocErr
.
Ok, I have a working PoC for Infallible<A>
and oom(Layout)
, and I haven't touched anything related to GlobalAlloc
, so I'll move the discussion to #48043 and #32838 and will file a separate issue for OOM itself.
Refocusing on GlobalAlloc
itself, there doesn't seem to have been a conclusion on the bikeshedding.
cc @SimonSapin
I just went through https://github.com/rust-lang/rust/pull/49669 which does not offer much rationale about some of the changes but points to this issue as the correct place for discussion, and also mentions the All-Hands but I wasn't there (maybe someone that was there can answer my questions).
It looks like https://github.com/rust-lang/rust/pull/49669 takes many steps in the opposite direction that I at least thought that the allocator design was going, but I don't see these issues being mentioned here, and they are IMO fundamental.
I thought that the plan was to:
Before https://github.com/rust-lang/rust/pull/49669 we were already at the step where we had the different allocator crates implementing their own traits, so I thought the next steps towards the allocator system structure would be to just replace them with submodules pointing to the crates in the nursery.
However, https://github.com/rust-lang/rust/pull/49669 changes all these crates to not implement any allocator traits anymore, instead implementing GlobalAlloc
for the allocators in liballoc/alloc.rs
and implementing Alloc
on top of the more restricted GlobalAlloc
API.
This prevents allocators from providing more efficient implementations of the trait methods. For example, liballoc_jemalloc
used to provide more efficient implementations of Alloc:shrink_in_place
and Alloc::grow_in_place
using xallocx
, but this is no longer the case. Also, Alloc::usable_size
used to return the correct available size for jemalloc but now it returns the size that was requested, which I'd consider a bug.
have one crate per allocator
Once upon a time, customizing the global allocator was done through an "allocator crate" with some dedicated crate-level attributes (and I think some free functions expected to be at the root?) Maybe you’re thinking of this?
Now we have the #[global_allocator]
attribute that goes on a static
item whose type implements GlobalAlloc
, but that type can be defined anywhere. Crates are not special there, the same crate could have multiple types that implement that trait.
have each allocator implement the traits themselves (GlobalAlloc, Alloc, etc.)
I don’t think there ever was a requirement that any type implements both traits. jemallocator
does for convenience of users.
get the rust version of jemalloc
When System
becomes the default, I expect the alloc_jemalloc
crate to be removed and I’d say there will no longer be a "rust version of jemalloc". Unless you consider the nursery to be part of "the rust project" but I view it more as crates that happen to be maintained by some of the same people as std.
I thought the next steps towards the allocator system structure would be to just replace them with submodules pointing to the crates in the nursery
As far as I know there never was a plan for jemallocator
to be a sumbodule of this repository and be distributed together with std
. Rather, users can opt to add it to their [dependencies]
in Cargo.toml
, and define themselves a static
with #[global_allocator]
.
Alloc:shrink_in_place
,Alloc::grow_in_place
,Alloc::usable_size
These methods are not part of the initial stabilization:
The high level idea that lead to #49669 is that we want to stabilize #[global_allocator]
and the trait that supports it, but we’re not ready to stabilize the Alloc
trait. From in-person discussion (sorry if I didn’t explain the reasoning enough here) we settled on having a separate trait with only "libc-equivalent functionality", minimizing the API surface in order to minimize the number of blockers to initial stabilization.
allow allocators, including the global one, to provide more specialized implementations of the GlobalAlloc and the Alloc traits?
jemallocator
already does.
@SimonSapin thanks for taking the time to go over all that on IRC.
To summarize my thoughts. I think that after understanding the next steps of unifying __rde_
/__rdl_
, removing liballoc_jemalloc
, ... the general direction looks to me to be the right one.
The only thing I think might be better to do differently is the implementation details of the liballoc_system
crate. This is a nitpick, but on some platforms like FreeBSD, NetBSD, and newer Android versions, jemalloc is the system allocator, and I think it would be cool if in these systems liballoc_system
would have feature parity with the jemallocator
crate. For example, I'd expect that in these systems the specializations of the Alloc
trait for jemalloc, that are available in the jemallocator
crate, are also implemented in the impl of Alloc
for the System
allocator and for the Global
allocator when Global == System
.
Yeah, seems reasonable for alloc_system to become more tuned for individual platforms over time.
To summarize a bit the discussion in internals, these are the changes that I would do to the global allocator trait. It is a fallible API that always returns the "Excess" and would look like this (where Excess
is now called Allocation
):
/// Represents a memory allocation, that is, a combination of a starting address
/// and the allocation layout.
pub struct Allocation(*mut Opaque, Layout);
impl Allocation {
/// Instantiates a memory allocation from a pointer and a layout.
pub fn new(ptr: *mut Opaque, layout: Layout) -> Self;
/// Starting address of the memory allocation.
pub fn ptr(&self) -> *mut Opaque;
/// Layout of the allocation.
pub fn layout(&self) -> Layout;
}
/// The `AllocErr` error specifies that an allocation failed.
///
/// This can only happen due to resource exhaustion or because the allocator
/// does not support the provided `Layout`.
#[derive(Clone, PartialEq, Eq, Debug)]
pub struct AllocErr(Layout);
impl AllocErr {
/// Layout that produced the allocation to fail.
pub fn layout() -> Layout;
}
/// Reallocation error.
///
/// This can happen either due to resource exhaustion, because the allocator
/// does not support the provided `Layout`, if the `Allocation` was allocated
/// with a differen allocator, etc.
///
/// It contains the `AllocErr` containing the provided `Layout` and the
/// `Allocation` that failed to reallocate.
pub struct ReallocErr(AllocErr, Allocation);
impl Realloc {
/// Layout of the reallocation request.
fn layout_request(&self) -> Layout;
/// Allocation that failed to reallocate.
fn allocation(self) -> Allocation;
}
/// A memory allocator that can be registered to be the one backing `std::alloc::Global`
/// though the `#[global_allocator]` attributes.
pub unsafe trait GlobalAlloc {
/// Allocates memory as described by the given `layout`.
fn alloc(&self, layout: Layout) -> Result<Allocation, AllocErr>;
/// Deallocate the memory `allocation`.
///
/// # Safety
///
/// The `allocation` pointer must come from this allocator and its size
/// and alignment must be either the requested or returned.
unsafe fn dealloc(&self, allocation: Allocation);
/// Allocates zeroed memory as described by the given `layout`.
fn alloc_zeroed(&self, layout: Layout) -> Result<Allocation, AllocErr>;
/// Shink or grow a memory `allocation` to the given `new_layout`.
/// # Safety
///
/// The `allocation` pointer must come from this allocator and its size
/// and alignment must be either the requested or returned.
unsafe fn realloc(&self, allocation: Allocation,
new_layout: Layout) -> Result<Allocation, ReallocErr>;
}
A couple of notes on the changes:
Excess
is renamed to Allocation
and contains a pointer to the allocation and a layout. The pointer can be null, and the layout can have size zero. As it was the case, if that is a valid the allocation, calling deallocate with that allocation should succeed.alloc
and alloc_zeroed
are not unsafe fn
s. They either succeed or return an error. If an allocator implementation returns Ok
then that's a valid allocation that can be passed to dealloca
and realloc
.Allocation
are still unsafe because that Allocation
could come from a different allocator, and thus cause undefined behavior. It is a precondition on these methods that the Allocation
s provided come from calling one of the allocation methods of these allocators.Allocation
is neither Copy
nor Clone
, double frees cannot happen without the user using unsafe code to duplicate an Allocation
. realloc
consumes the allocation and, if there was an error, returns it. # Safety
needs to be tuned such that collections like Vec
and Box
can recompute the Allocation
from the capacity and alignment of the type, where the capacity might be the size requested, or the size returned, and the same applies to the alignment. That is, recomputing an Allocation
from the pointer, the size returned, and the alignment requested, should work out. I can try to send a working implementation of this as a PR over the weekend, that way we can see how much does this change the design, and do a perf run to see the impact on performance. I expect that suitable #[inline]
reduce the impact to close to zero when jemalloc is not used, since the only thing we would do is access the pointer and drop the Layout
.
Since AllocErr
now contains the layout, it should be possible to just pass it to oom
without contortions @glandium . An alternative is to provide an oom()
method to AllocErr
and ReallocErr
that do this for you. The only thing we would need to allow this AFAICT is a Display
impl for Layout
.
From the top of my head, at quick glance:
Using a pointer instead of NonNull
makes Result
s larger than they should be. Also, how can a null pointer be an Ok
result when there's AllocErr
? Are callers supposed to check both now? There's still the question whether a zero-size Layout
should be valid or not, both for codegen reasons (see last paragraph of the first message in https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487) and consistency reasons (many things assume ZSTs have an address in the null page, allowing the allocator to whatever it pleases with that sounds dangerous).
Returning the layout as part of the AllocErr
is actually making things awful from a codegen perspective because to use that value to call oom
(the only reason it's there would be to feed it there) means things need to be moved between registers. See https://github.com/rust-lang/rust/issues/32838#issuecomment-377097485 (beware, it's in the "hidden items"). Using the Layout
the caller already has is actually better in that regard, and I have a PR underway that deals with the oom
thing without too much contortion already, I just needed to adjust some methods to make things easier.
One thing that is nice-ish about Allocation
is that there could be:
impl From<*mut T> for Allocation {
fn from(ptr: *mut T) -> Self {
Allocation::new(ptr as *mut Opaque, Layout::new::<T>())
}
}
I'm not particularly convinced by ReallocErr
.
Because these functions consume the allocation, and Allocation is neither Copy nor Clone, double frees cannot happen without the user using unsafe code to duplicate an Allocation. realloc consumes the allocation and, if there was an error, returns it.
Is that true? I think I can implement Clone
in safe code:
fn clone_alloc(a: &Allocation) -> Allocation {
Allocation::new(a.ptr(), a.layout())
}
@RalfJung indeed, Allocation::new
should be unsafe
.
@glandium
Using a pointer instead of NonNull makes Results larger than they should be. Also, how can a null pointer be an Ok result when there's AllocErr?
This is already the case. The current allocation API states that if you allocate a Layout
with size() == 0
the allocator can either return an error, or return a unique pointer value on which deallocate
must succeed. This value can be Null
. Quoting the current docs:
A note regarding zero-sized types and zero-sized layouts: many methods in the Alloc trait state that allocation requests must be non-zero size, or else undefined behavior can result.
However, some higher-level allocation methods (alloc_one, alloc_array) are well-defined on zero-sized types and can optionally support them: it is left up to the implementor whether to return Err, or to return Ok with some pointer.
If an Alloc implementation chooses to return Ok in this case (i.e. the pointer denotes a zero-sized inaccessible block) then that returned pointer must be considered "currently allocated". On such an allocator, all methods that take currently-allocated pointers as inputs must accept these zero-sized pointers, without causing undefined behavior.
In other words, if a zero-sized pointer can flow out of an allocator, then that allocator must likewise accept that pointer flowing back into its deallocation and reallocation methods.
The API I proposed is a non-functional change, so it allows this as well, and therefore it cannot use NonNull
. If we were to lift this requirement, then the API could use NonNull
.
Returning the layout as part of the AllocErr is actually making things awful from a codegen perspective because to use that value to call oom (the only reason it's there would be to feed it there) means things need to be moved between registers.
I went through the posts there but I failed to find any convincing argument about why this is bad from a codegen perspective beyond "it just happens". Is this a codegen bug? I think that for ease of use we could just add Layout::oom(self) -> !
method that calls the oom
lang item passing itself as argument. I have tried with and without a Layout in the AllocError
, and at least on godbolt both cases always produce the same assembly https://godbolt.org/g/MiWHpP What am I missing? I can't think of any situation in which if they don't generate the exact same code then that would be a codegen bug, but we should not design the APIs to workaround codegen bugs, we should identify them, asses whether they are solvable, and if so report them/fix them.
It is a bit how to follow exactly what the proposed API for global alloc is or should be, because we don't have a design document anywhere for that, and one has to extract the information from the implementation, the PRs, and the comments that are scattered on many different places.
If @SimonSapin @sfackler and @glandium agree that an API like the one I suggested is worth giving a try, I can send a PR upstream so that we can try it out and identify codegen issues. It might well be that it doesn't work out.
Maybe it would be worth it to summarize exactly what the current GlobalAlloc
API looks like and why somewhere as well.
For one, you're talking about GlobalAlloc
and are quoting the Alloc
trait doc, which is not the same thing. Secondly, you're quoting the doc as it was multiple weeks ago, and that's nowhere "current". Many things changed when GlobalAlloc
landed, including the Alloc
trait. This is the current doc: https://doc.rust-lang.org/nightly/core/alloc/trait.Alloc.html. Methods don't return null anymore.
I’ll update my proposed API to use non-null then. There are still comments
I the source code lying around these issues though.
On Sat 19. May 2018 at 13:05, Mike Hommey notifications@github.com wrote:
For one, you're talking about GlobalAlloc and are quoting the Alloc trait
doc, which is not the same thing. Secondly, you're quoting the doc as it
was multiple weeks ago, and that's nowhere "current". Many things changed
when GlobalAlloc landed, including the Alloc trait. This is the current
doc: https://doc.rust-lang.org/nightly/core/alloc/trait.Alloc.html.
Methods don't return null anymore.—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/49668#issuecomment-390397535,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AA3Npn7nI8m_kB2ise4TwvWCMNqndfJrks5tz_xngaJpZM4THipZ
.
@glandium methods are still able to return null for GlobalAlloc
though, and they do in some cases: https://github.com/rust-lang/rust/blob/master/src/liballoc_system/lib.rs#L121
Why can GlobalAlloc
return null ptrs but Alloc cannot?
GlobalAlloc
methods that return nullable pointers don’t return a Result
. Null there is equivalent to Err
in Result<NonNull<Opaque>, AllocErr>
.
To summarize a bit the discussion in internals, these are the changes that I would do to the global allocator trait.
This is a lot of changes, they’d need a lot of justification to do this late in the process after FCP to stabilize is already over. The burden of proof is on you, at this point.
Why does Allocation
contain a full Layout
rather than just the excess size? Is the alignment in it always the same as was passed to e.g. alloc()
? What makes it worth the cost of a larger return type?
Is the Layout
inside of AllocErr
always the same that was passed to e.g. alloc()
? If so why is it there at all?
Because the current AllocErr
is zero-size, Result<NonZero<Opaque>, AllocErr>
can have the size of a single pointer (using null to represent errors). Your proposal quadruples this size.
It is a precondition on these methods that the Allocations provided come from calling one of the allocation methods of these allocators. […] Because these functions consume the allocation, and Allocation is neither Copy nor Clone, double frees cannot happen without the user using unsafe code
This doesn’t sound useful at all. Collection instances typically do not store the equivalent of a full Layout
(so doing so would add space overhead), and even if they did deallocation typically happens when we get &mut self
exclusive access but not ownership to e.g. move fields out of a struct. If every single use needs to use the unsafe constructor, what is the point of ownership semantics for Allocation
?
Why does Allocation contain a full Layout rather than just the excess size?
Some allocators (e.g. the system allocator on windows) require the alignment used on deallocation to be exactly the same to the alignment that the memory was allocated with. Therefore, at least on windows, the memory address and the full layout is what describes a memory allocation.
Is the alignment in it always the same as was passed to e.g. alloc()?
Not necessarily, the allocator can return a Layout
containing a higher alignment.
The cost of returning an extra usize
is tiny compared to the cost of allocating and touching memory, and zero if the value is just propagated without using it and the allocation methods are inlined.
Is the Layout inside of AllocErr always the same that was passed to e.g. alloc()?
Yes.
If so why is it there at all?
Because otherwise one needs to bubble it up together with AllocErr
if oom
is to be called at a higher level than where the allocation happened and that's inconvenient. See also: https://github.com/glandium/rust/commit/5a522352904f2873c2622899ecda067c25368dd5 .
Because the current AllocErr is zero-size, Result
, AllocErr> can have the size of a single pointer (using null to represent errors). Your proposal quadruples this size.
AFAICT the proposal only requires this size to double. The Ok
variant contains a single pointer, and the Err
variant contains two usize
s. Note that neither the pointer nor the Layout
alignment can be zero (and the Layout
size should probably never be zero as well). Currently on nightly, the sizes triples, due to https://github.com/rust-lang/rust/issues/46213.
If every single use needs to use the unsafe constructor, what is the point of ownership semantics for Allocation?
This is a good point, and from porting RawVec
and Box
to use Allocation
I am pretty sure that none of the std
collections will store it internally.
But not storing a full Allocation
is an optimization that some collections can perform. Box doesn't need to store the allocation size, but RawVec cannot perform this optimization and needs to store both the pointer and the size. At the same time, neither Box
nor RawVec
need to store the alignment, but an OverAlignedRawVec
that stores it might just store the full Allocation
directly.
EDIT: FWIW I would prefer not to have the alignment be part of Allocation
, but the API (in particular that of Alloc
) did get a lot simpler by just using Layout
and Allocation
consistently.
Therefore, at least on windows, the memory address and the full layout is what describes a memory allocation.
That sounds correct in abstract terms, but doesn’t say why everything that "describes" the allocation needs to be part of the return value of alloc()
.
if the value is just propagated without using it and the allocation methods are inlined.
To enable #[global_allocator]
, there are symbols like extern { fn __rust_alloc(…) } that cannot be inlined without LTO (which IIRC is not enabled by default in Cargo’s release mode).
an
OverAlignedRawVec
that stores it might just store the fullAllocation
directly.
I think that hypothetical is not sufficient to justify a guiding principle for redesigning this entire API.
which IIRC is not enabled by default in Cargo’s release mode
ThinLTO is enabled by default on release mode.
I think that hypothetical is not sufficient to justify a guiding principle for redesigning this entire API.
This was tangentially discussed on IRC last week (maybe @sfackler has the logs?). The only thing that would be backwards incompatible to change here is the definition of Layout
which in my branch can't have zero size.
For everything else, we can always add a GlobalAlloc2
trait in the future, and require that either a type implementing GlobalAlloc
or GlobalAlloc2
is available in the dependency graph.
@Amanieu and I discussed (in person at impl days) this trait and the remaining open issues, consensus between us is to propose the changes below. @rust-lang/libs, what do you think?
extern
types themselves are not stable, and we’d rather not block on them. To avoid that, we can change Opaque
to be defined instead as struct Opaque(u8);
.
We make it have a size of 1 byte instead of zero so that <*mut Opaque>::offset
is not a no-op (which might be unexpected). Unlike with an extern type, *mut Opaque
can be dereferenced. But then what can be done with that dereference is very limited because we make Opaque
not implement Copy
or Clone
(or any other trait).
Rather than raw *mut Opaque
where null represent an error, change GlobalAlloc
to use Result<NonNull<Opaque>, AllocErr>
in return types (and NonNull<Opaque>
in parameter types).
We provide a conversion from raw pointers so that impls based on malloc
-style APIs stay convenient to define:
struct AllocErr;
impl AllocErr {
/// Convenience conversion for `GlobalAlloc` or `Alloc` impls
#[inline]
pub fn from_null<T>(ptr: *mut T) -> Result<NonNull<Opaque>, AllocErr> {
NonNull::new(ptr as _).ok_or(AllocErr)
}
}
Since AllocErr
is zero-size, this result has the same memory representation as a raw pointer and this conversion optimizes to a no-op.
We’re still sort of undecided about zero-size allocations. Alternatives are: 1. make them UB (alloc()
is an unsafe fn
), 2. make them guaranteed to return a non-null aligned pointer (such as NonNull::dangling
, but that likely requires an explicit check in every impl, including in dealloc
), or 3. make them safe but implementation-defined (either an error or a non-null aligned pointer). Or some combination of these, such as 1 in GlobalAlloc
and 2 in Alloc
.
Option 2 in AlloC
may allow Vec
and other users to simplify their code and remove some special code paths for not making zero-size allocations.
We can add default methods in the future, but let’s not count on GlobalAlloc2
. As far as I know the type system cannot express things like "either this or that trait". (And even if it could, let’s just not.)
For zero-sized allocations, I would like to point to some prior art in C++'s operator new
:
If this argument is zero, the function still returns a distinct non-null pointer on success (although dereferencing this pointer leads to undefined behavior).
In practice (in both Clang/libc++ and GCC/libstdc++) this is implemented as a quick check that changes the size to 1 if it is zero. I believe that the overhead in this is sufficiently low since nobody in C++-land is complaining about it.
There are 3 approaches that we can take with regards to zero-sized arguments:
1) Keep things as they are: alloc
is an unsafe function and passing a zero-sized layout is UB. This would apply to both Alloc
and GlobalAlloc
.
2) Allow zero-sized layouts in both Alloc
and GlobalAlloc
. In that case the Alloc
implementation would just forward to GlobalAlloc
, which itself would simply return the layout alignment if the size is zero (since the alignment is itself a non-zero and suitably aligned address). The alloc
method in both traits would become a safe function.
3) Allow zero-sized layouts in Alloc
but not GlobalAlloc
. This would allow the zero-check in Alloc
to be inlined into the caller and eliminated if the caller already performs a zero-check. My feeling is that the GlobalAlloc
trait is not really meant to be used directly and should instead be used through the Alloc
implementation on Global
.
I personally prefer option 2 since it keeps the API consistent between Alloc
and GlobalAlloc
, and LTO will allow inlining which can eliminate the zero check in most cases.
@SimonSapin
We’re still sort of undecided about zero-size allocations. Alternatives are: 1. make them UB (alloc() is an unsafe fn),
@Amanieu
Keep things as they are: alloc is an unsafe function and passing a zero-sized layout is UB. This would apply to both Alloc and GlobalAlloc.
There is another alternative (implemented here: https://github.com/gnzlbg/rust/commit/806d11ba4a09e1f361a8456a8c7af36d17d58c65) which makes alloc
and alloc_zeroed
safe by making impl Layout { fn from_size_align(size: usize, align: usize) -> Result<Layout, LayoutErr> }
return LayoutErr
if size == 0
. That is, what's UB is calling the unsafe fn Layout::from_size_align_unchecked
with size == 0
.
This allows making the size
inside Layout
a NonZeroUsize
as well (along with the align), which isn't a big deal if the GlobalAlloc
API does not propagate Layout
s on errors, but which would allow shrinking the size of those enums further once we get niche filling for enums.
Option 2 in AlloC may allow Vec and other users to simplify their code and remove some special code paths for not making zero-size allocations.
Currently box
does something like this where it just returns align
as the address of a zero sized allocation, but RawVec
is probably still going to need quite a bit of logic to handle zero-sized types anyways. I haven't tried but I wouldn't bet that it will simplify the implementation of RawVec
significantly (maybe not at all).
pub fn from_null<T>(ptr: *mut T) -> Result<NonNull<Opaque>, AllocErr>
That name is somewhat odd, sounds like I have to pass it a null pointer? from_raw
would be more accurate IMHO.
- make them guaranteed to return a non-null aligned pointer (such as NonNull::dangling, but that likely requires an explicit check in every impl, including in dealloc),
Seems like either every user or every implementer of GlobalAlloc
has to special-case zero-sized types. Not sure which there will be fewer of, probably implementers? Also, couldn't there be some kind of wrapper provided by rustc that would take care of handling size 0? Essentially, putting the bruden on all allocators but then factoring out the common part so it is only implemented once?
Related, @glandium
many things assume ZSTs have an address in the null page, allowing the allocator to whatever it pleases with that sounds dangerous
Do you have an example? While libstd will frequently generate ZST pointers into the null page, why would anything rely on that? Notice that if you have a ZST field in a larger struct, taking the address of that field will give you a value that's well outside the null page! So anything making such assumptions is probably already incorrect.
from_null
is a bit of a strawman. I made it a method so that is not yet-another-import. But naming it "form" is not really accurate since it doesn’t return Self
or Result<Self, _>
. It could also be a method of Opaque
, but there too I’m not sure what name fits.
My preference for what to do with zero sized allocations is to just forbid zero sized Layouts. This makes alloc safe, and changes the Layout convenience methods like new<T>
to return Option<Layout>
. I don't think that's all that bad, since ~all allocation code is already branching on zero-sizedness and that could be changed to match on the result of Layout::new
.
Seems like either every user or every implementer of GlobalAlloc has to special-case zero-sized types. Not sure which there will be fewer of, probably implementers?
This assumes that if GlobalAlloc
handles zero-sized types users won't need to handle them themsselves. Under this assumption, I would agree that it would be better for GlobalAlloc
to just handle zero-sized types since there will be fewer implementations of GlobalAlloc
than users.
But this assumption doesn't hold. For example, RawVec
and box
need to handle zero-sized types independently of whether GlobalAlloc
supports them or not:
RawVec
needs to handle the fact that vectors of zero-sized types have always capacity isize::MAX
, etc.Box
stores a Unique
internally so it just manually checks for zero size, and stores the align_of
as the non-zero address. If you'd get a NonZero
pointer from an GlobalAlloc
it would be undefined behavior to dereference this pointer.I'd find it cleaner if GlobalAlloc
would not support zero sizes or zero Layouts, but it would properly returning errors instead.
That is, if a user wants to allocate some memory, then they can use GlobalAlloc
. If they try to allocate zero memory and are using the safe API, they get an Err
. If they don't want to allocate any memory (e.g. because of ZSTs), they can't use GlobalAlloc
so they have to handle this special case differently, but this is something they have to do anyways.
@gnzlbg @sfackler
The problem with disallowing zero-sized layouts is that Layout
is used for more than just passing a size and alignment to an allocator. There are methods which allow you to construct complex layouts, which involves concatenating and repeating existing layouts. Disallowing zero-sized layouts would limit the usefulness of these methods (e.g. for generic types which might be ZSTs or zero-length arrays).
@gnzlbg
Currently RawVec
must handle zero-length allocations in two cases: ZSTs, which can be optimized away at compile time, and zero-length arrays which depend on runtime information. Currently the code for RawVec
still has checks for this second case, which could be eliminated if allocators support zero-length allocations.
As for Box
, I don't understand what you are saying? A NonZero
pointer returned by GlobalAlloc
is still required to be properly aligned, which is exactly what I propose that the allocator implementation should do.
As for Box, I don't understand what you are saying? A NonZero pointer returned by GlobalAlloc is still required to be properly aligned, which is exactly what I propose that the allocator implementation should do.
I think I was a bit confused about what dereferencing a pointer to a ZST means. Thinking about it a bit more, I think that for box
and Box
we can probably move these branches from the box
implementation into the allocator: https://github.com/rust-lang/rust/blob/master/src/liballoc/alloc.rs#L110 .
There are methods which allow you to construct complex layouts, which involves concatenating and repeating existing layouts.
With non-zero-sized Layout
s, these methods still work as intended, since existing Layouts
must be valid.
Disallowing zero-sized layouts would limit the usefulness of these methods (e.g. for generic types which might be ZSTs or zero-length arrays).
The helper methods for constructing layouts of a single type, or an array of objects of the same type also still work as intended. If the type is zero-sized, they return LayoutErr
. That's still generic.
Currently RawVec must handle zero-length allocations in two cases: ZSTs, which can be optimized away at compile time, and zero-length arrays which depend on runtime information. Currently the code for RawVec still has checks for this second case,
It's not about whether the code can be optimized away, but whether the user still has to write it. The code also checks for the case in which the T
in RawVec<T>
is a ZST, and sets the capacity of the RawVec
to a fixed value: https://github.com/rust-lang/rust/blob/master/src/liballoc/raw_vec.rs#L212
I don't see how supporting Layout
s of zero size would allow RawVec
to remove this code which most collections supporting ZSTs have in one form or another.
With non-zero-sized Layouts, these methods still work as intended, since existing Layouts must be valid.
That's not true, consider this example:
// Allocates an A followed by an array of B
fn alloc_a_b<A, B>(num_b: usize) -> Result<NonNull<Opaque>, AllocErr> {
let a = Layout::new::<A>();
let b = Layout::new::<B>();
let bs = b.repeat(num_b).map_err(|_| AllocErr); // Only returns None on arithmetic overflow
let layout = a.extend(bs).map_err(|_| AllocErr); // Only returns None on arithmetic overflow
Global.alloc(layout)
}
Having to manually handle all the cases where the creation of the intermediate Layout
s could fail would make this code much more complicated for no good reason. The HashMap
implementation is an example of the sort of code which performs this sort of allocations (although it currently performs the layout calculation manually since the code dates from before Layout
was introduced).
It's not about whether the code can be optimized away, but whether the user still has to write it.
There are quite a few place that check for cap == 0
which could be eliminated or greatly simplified if the zero check is moved into the allocator:
It seems to me GlobalAlloc
is the wrong level to handle ZSTs and whatnot. The Alloc
implementation for GlobalAlloc
would be more appropriate.
@glandium This is the 3rd option that I suggested in my comment.
@Amanieu
With non-zero-sized Layouts, these methods still work as intended, since existing Layouts must be valid.
That's not true, consider this example: [...]
How does that example show that the methods do not work as intended?
Having to manually handle all the cases where the creation of the intermediate Layouts could fail would make this code much more complicated for no good reason.
fn alloc_a_b<A, B>(num_b: usize) -> Result<NonNull<Opaque>, AllocErr> {
let a = Layout::new::<A>()?;
let b = Layout::new::<B>()?;
let bs = b.repeat(num_b)?;
let l = a.extend(bs)?;
Global.alloc(l)
}
Care to elaborate in which way is this code "much more complicated for no good reason" ? If zero Layout
s are not allowed, and one uses the safe API, this code appears to me to be as complicated as it needs to be since there are a bunch of Layout
calculations that can fail. Also, if all of them succeed, calling alloc
is safe. In any case, at least from a readability point-of-view, I don't find it significantly less readable than the example you posted.
Your code doesn't do the same thing at all. For example, if A is not a ZST and B is, this would be expected to allocate for the size of A. With your code, that ends with a failure.
@glandium ah, that makes sense!
To handle that case without zero-sized Layout
s and without support for zero-sized allocations in GlobalAlloc
one does need quite a bit of extra code (playground):
fn alloc_a_b<A, B>(num_b: usize) -> Result<NonNull<Opaque>, AllocErr> {
if let Ok(a) = Layout::new::<A>() {
if let Ok(b) = Layout::new::<B>() {
if let Ok(bs) = b.repeat(num_b) {
let l = a.extend(bs)?;
return Global.alloc(l);
}
}
return Global.alloc(a);
}
let b = Layout::new::<B>()?;
let bs = b.repeat(num_b)?;
Global.alloc(bs)
}
In particular, it is not clear whether the Layout
operations failed due to overflow or due to a zero input size.
We could move all the "utility" methods that are used for building Layout
s into a builder API (playground):
fn alloc_a_b<A, B>(num_b: usize) -> Result<NonNull<Opaque>, AllocErr> {
let a = LayoutBuilder::new::<A>();
let b = LayoutBuilder::new::<B>();
let l: Layout = a.extend(b.repeat(num_b)).try_into()?;
Global.alloc(l)
}
This is something that std
wouldn't need to expose at first, but collections like HashMap
could still use it internally.
Whether this is worth it isn't clear to me. This adds a bit of machinery but so does supporting zero-sized allocations :/
@Amanieu and I discussed (in person at impl days) this trait and the remaining open issues, consensus between us is to propose the changes below.
Implementation PR for these changes: https://github.com/rust-lang/rust/pull/51160
I opened #51163 as an example of how layouts can be used in practice. This code depends on Layout
s supporting zero sizes to handle the cases where K
and V
are both ZSTs.
There is one point when @SimonSapin and I disagree: should Global
implement GlobalAlloc
.
Pros:
Alloc
isn't going to be stabilized yet, so GlobalAlloc
will be the only way for code to directly allocate memory (without using the Vec
hack). Stabilizing Alloc
will likely take a while since we will want to have some experience using the trait in the std collections before freezing it.Cons:
GlobalAlloc
trait is intended only to allow a type to be used as #[global_allocator]
. Specifying Global
as the global allocator does not make sense since it would just recursively call itself.Alloc
is stabilized, we will want all user allocations to go through that trait. This means that we will effectively be deprecating the GlobalAlloc
implementation on Global
when Alloc
is stabilized.Here is what I feel are the remaining issues:
Alloc
and GlobalAlloc
).Global
implement GlobalAlloc
?Opaque
? Global
?The impl GlobalAlloc for Global
is a bit icky, I agree. One of the alternatives we discussed at the all hands is free functions for alloc/dealloc/realloc in std::heap. They'll end up deprecated either way, but it does avoid a kind of confusing trait impl, as well as having a type that implements two traits with methods of the same names.
I'm OK with global functions, with the understanding that they will be deprecated once Alloc
is stabilized.
I’ve previously argued that if we have a set of free functions that have the exact same signature as a trait, they might as well be a trait impl. But it is a good point that functions would be easier to deprecate, so I’m fine with that as well.
Right now we can't deprecate trait impls AFAIK: https://github.com/rust-lang/rust/issues/39935
@Amanieu
For zero-sized allocations, I would like to point to some prior art in C++'s operator new:
If this argument is zero, the function still returns a distinct non-null pointer on success (although dereferencing this pointer leads to undefined behavior).
In practice (in both Clang/libc++ and GCC/libstdc++) this is implemented as a quick check that changes the size to 1 if it is zero.
AFAICT this is not exactly how it is implemented in neither libc++ [0] nor libstdc++ [1]. When a size of zero is requested, the size is as you mention changed to one, but then, a real memory allocation for this new size is performed with the requested alignment, and a unique pointer to the address is returned.
That is, for allocations of zero size, C++ calls the system allocator, potentially performing a context switch to the kernel, etc.
I believe that the overhead in this is sufficiently low since nobody in C++-land is complaining about it.
C++ does not have zero-sized types: all types are at least 1 byte in size. For example, if you create an array of length zero of an empty type struct A{}; auto a = new A[0];
, that array still has sizeof(A[0]) == 1
, so the raw allocation functions will not be called with size == 0
. In fact, I think triggering a call to new(size = 0)
is actually really hard unless one starts writing very weird C++ code, and probably impossible if one writes modern C++.
If your language does not have a concept of zero-sized types, zero-sized allocations do not really make much sense, and I argue that the real reason nobody is complaining about these is because nobody is triggering calls to new(size = 0)
in C++ yet in such a way that the answer isn't "don't write that code, it's horrible".
Many C++ developers have, however, complained about the lack of zero-sized types in the language, and ZST are already part of the C++20 standard and opt-in in some situations, but none that could trigger new(size = 0)
AFAICT. Basically, empty types can opt-in to becoming zero-sized when used within other types only.
I think a better example than C++ new
would be C malloc
, which people do have to manually pass a size argument all the time. The C11 standard says (7.20.3 Memory management functions):
If the size of the space requested is zero, the behavior is implementation defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.
Which if the implementation decides to return something different than null
means:
The pointer returned if the allocation succeeds is suitably aligned [...]. The lifetime of an allocated object extends from the allocation until the deallocation. The pointer returned points to the start (lowest byte address) of the allocated space. Each such allocation shall yield a pointer to an object disjoint from any other object.
So AFAICT for zero-sized allocations a compliant C11 implementation returns a pointer that cannot be dereferenced but can be freed. There are no requirements on the pointers to be unique. Since NULL
can be free
d without issues (its a no-op), it can return the same address for all zero-sized allocations, or some other address, or a unique address for each like C++ does.
There is a potentially interesting presentation about the topic [2] discussing it in the context of security vulnerabilities, but these probably don't apply to Rust and I haven't gone through it all yet.
[0] https://github.com/llvm-mirror/libcxx/blob/master/src/new.cpp#L71
[1] https://github.com/gcc-mirror/gcc/blob/da8dff89fa9398f04b107e388cb706517ced9505/libstdc%2B%2B-v3/libsupc%2B%2B/new_opa.cc#L95
[2] Automated vulnerability analysis of zero sized heap allocations - Julien Vanegue - Microsoft Security Engineering Center (MSEC) Penetration testing team: http://www.hackitoergosum.org/2010/HES2010-jvanegue-Zero-Allocations.pdf
I believe that the overhead in this is sufficiently low since nobody in C++-land is complaining about it.
I would like to clarify that the point I'm trying to make: The overhead of the extra zero-check in the common case (non-zero length allocation) is tiny to the point of being irrelevant. As such, we should aim to provide an API with fewer footguns, especially considering that ZSTs are much more common in Rust.
By the way, I would like to add another data point for how allocation APIs handle zero sizes. jemalloc's internal API (je_mallocx
, which we use instead of je_malloc
in Rust's jemalloc wrapper) has undefined behavior if a size of zero is passed [1]
[1] http://jemalloc.net/jemalloc.3.html
The mallocx() function allocates at least size bytes of memory, and returns a pointer to the base address of the allocation. Behavior is undefined if size is 0.
The libs team discussed this thread today today and consensus was to:
Stick with the status quo for zero-size allocations: alloc()
is an unsafe fn
and callers must ensure that the layout is not zero-size. We felt that making this method safe is not a significant win since the kind of code calling it needs typically needs unsafe
anyway (for dereferencing the returned pointer or of deallocating), and didn’t feel confident relying on (Thin)LTO and Sufficiently Advanced optimizers to not regress performance
Also keep raw pointers in GlobalAlloc
rather than results. Rationale: https://github.com/rust-lang/rust/pull/51160#issuecomment-392871051
Remove the Opaque
type entirely and use pointers to u8
. If we’re not waiting on extern types, an dedicated struct does not buy us much over u8
. And not having a new type avoids entirely questions of its name or what traits it should implement.
Replace impl GlobalAlloc for Global
with a set of free functions
Finally stabilize this subset (after modifications listed above) of the API:
/// #[global_allocator] can only be applied to a `static` item that implements this trait
pub unsafe trait GlobalAlloc {
unsafe fn alloc(&self, layout: Layout) -> *mut u8;
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);
unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
// Default impl: self.alloc() and ptr::write_bytes()
}
unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
// Default impl: self.alloc() and ptr::copy_nonoverlapping() and self.dealloc()
}
// More methods with default impls may be added in the future
}
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct Layout { /* private */ }
impl Layout {
pub fn from_size_align(size: usize: align: usize) -> Result<Self, LayoutError> {…}
pub unsafe fn from_size_align_unchecked(size: usize: align: usize) -> Self {…}
pub fn size(&self) -> usize {…}
pub fn align(&self) -> usize {…}
pub fn new<T>() -> Self {…}
pub fn for_value<T: ?Sized>(t: &T) -> Self {…}
}
#[derive(Copy, Clone, Debug)]
pub struct LayoutError { /* private */ }
/// Forwards to the `static` item registered
/// with `#[global_allocator]` if any, or the operating system’s default.
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {…}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {…}
unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {…}
unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {…}
/// The operating system’s default allocator.
pub struct System;
impl GlobalAlloc for System {…}
Update: also:
pub fn oom(layout: Layout) -> ! {…}
The C99 Rationale V5.10 document has rationale on why malloc
supports zero-sized allocation (section 7.20):
The treatment of null pointers and zero-length allocation requests in the definition of these
functions was in part guided by a desire to support this paradigm:OBJ * p; // pointer to a variable list of OBJs /* initial allocation */ p = (OBJ *) calloc(0, sizeof(OBJ)); /* ... */ /* reallocations until size settles */ while(1) { p = (OBJ *) realloc((void *)p, c * sizeof(OBJ)); /* change value of c or break out of loop */ }
This coding style, not necessarily endorsed by the Committee, is reported to be in widespread use.
Some implementations have returned non-null values for allocation requests of zero bytes. Although this strategy has the theoretical advantage of distinguishing between “nothing” and “zero” (an unallocated pointer vs. a pointer to zero-length space), it has the more compelling theoretical disadvantage of requiring the concept of a zero-length object. Since such objects cannot be declared, the only way they could come into existence would be through such allocation requests.
The C89 Committee decided not to accept the idea of zero-length objects. The allocation functions may therefore return a null pointer for an allocation request of zero bytes
It is not 100% clear to me what this rationale means, but it appears to mean that they decided to allow malloc(0)
because, since C does not have ZSTs, too many people were emulating them with malloc(0)
in the wild before standardization.
About C++, I've asked on stackoverflow, and s Bo Persson pointed to C++'s standard 9th defect report:
Scott Meyers, in a comp.std.c++ posting: I just noticed that section 3.7.3.1 of CD2 seems to allow for the possibility that all calls to operator new(0) yield the same pointer, an implementation technique specifically prohibited by ARM 5.3.3. Was this prohibition really lifted? [...]
Proposed resolution:
[...]
Change 3.7.3.1/2, next-to-last sentence, from :
If the size of the space requested is zero, the value returned shall not be a null pointer value (4.10).
to:
Even if the size of the space requested is zero, the request can fail. If the request succeeds, the value returned shall be a non-null pointer value (4.10) p0 different from any previously returned value p1, unless that value p1 was since passed to an operator delete.
[...]
Rationale:
So it seems that in the original draft new(0)
could return any non-zero pointer, including the same one for different allocations, but "ARM 5.3.3 specifically prohibited it" (no rationale given) so they changed it to the current wording. The relevant section of the C++ standard has a footnote saying that the intent of new
is to be implementable by a call to malloc
modulo the zero size exception.
@SimonSapin
If the Layout
used to call alloc
cannot have zero-size, what did the libs team give as rationale for
allowing Layout
s of zero size in the first place?
@gnzlbg The use case of intermediate possibly-zero-size Layout
s being used to build larger Layout
s given earlier in this thread https://github.com/rust-lang/rust/issues/49668#issuecomment-392580785 sounds reasonable to me. Having a separate LayoutBuilder
API seems like a lot just to allow alloc()
to be safe (given that callers typically need some unsafe
code anyway).
@SimonSapin
Basically, the trait API looks ok, but what is missing there is the comments expressing the function requirements, semantics, and guarantees.
Right, this still needs lots of docs to write out the API contracts. Returning null indicates an error or failure. Maybe the OS is out of memory, maybe the given Layout
is not supported at all by this allocator (e.g. align overflowing u32
on macOS’s System
allocator)
Calling alloc
with a zero-size layout is undefined behavior, a violation of the unsafe fn
contract. Meaning: impls are allowed to do anything in that case, including returning null.
Thanks for typing this up @SimonSapin! That all looks accurate and good to me
I'm not too happy with the lack of Result
and not handling zero-sized allocations, but I can accept it with the knowledge that we still have a chance to revise the API later on with the Alloc
trait. However we should make it clear (as I discussed with @SimonSapin at Rustfest) that the free alloc::alloc()
functions will be deprecated as soon as Alloc
is stabilized, in favour of Global.alloc()
.
As a side note, should we add a global_
prefix to the free functions? A bare alloc
might be confusing in a module full of allocation-related definitions (e.g. Alloc
which differs only by casing). However this is only a weak objection, I was just wondering if the option had been discussed in the libs team meeting.
I don't think a global_ prefix is all that necessary. It's a free function so it has to be talking to the global allocator, right? There aren't any other options.
Or a global
sub-module? So the full path might be std::alloc::global::alloc()
Stabilization PR: https://github.com/rust-lang/rust/pull/51241 (finally!)
Most helpful comment
As for the name of
Opaque
, I'm hoping we can decide during stabilization. I'm personally currently in favor of*mut Whatever