Rust: Tracking issue for the GlobalAlloc trait and related APIs

Created on 4 Apr 2018  ·  148Comments  ·  Source: rust-lang/rust

PR https://github.com/rust-lang/rust/pull/49669 adds a GlobalAlloc trait, separate from Alloc. This issue track the stabilization of this trait and some related APIs, to provide the ability to change the global allocator, as well as allocating memory without abusing Vec::with_capacity + mem::forget.

Defined in or reexported from the std::alloc module:

Update: below is the API proposed when this issue was first opened. The one being stabilized is at https://github.com/rust-lang/rust/issues/49668#issuecomment-393263510.

/// #[global_allocator] can only be applied to a `static` item that implements this trait
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut Opaque;
    unsafe fn dealloc(&self, ptr: *mut Opaque, layout: Layout);

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut Opaque { 
        // Default impl: self.alloc() and ptr::write_bytes()
    }
    unsafe fn realloc(&self, ptr: *mut Opaque, layout: Layout, new_size: usize) -> *mut Opaque {
        // Default impl: self.alloc() and ptr::copy_nonoverlapping() and self.dealloc()
    }

    // More methods with default impls may be added in the future
}

extern {
    pub type Opaque;
}

#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct Layout { /* private */ }

impl Layout {
    pub fn from_size_align(size: usize: align: usize) -> Result<Self, LayoutError> {…}
    pub unsafe fn from_size_align_unchecked(size: usize: align: usize) -> Self {…}
    pub fn size(&self) -> usize {…}
    pub fn align(&self) -> usize {…}
    pub fn new<T>() -> Self {…}
    pub fn for_value<T: ?Sized>(t: &T) -> Self {…}
}

#[derive(Copy, Clone, Debug)]
pub struct LayoutError { /* private */ }

/// Forwards method calls to the `static` item registered
/// with `#[global_allocator]` if any, or the operating system’s default.
pub struct Global;

/// The operating system’s default allocator.
pub struct System;

impl GlobalAlloc for Global {…}
impl GlobalAlloc for System {…}

CC @rust-lang/libs, @glandium

Unresolved questions or otherwise blocking

  • [x] Wait for extern types https://github.com/rust-lang/rust/issues/43467 to be stable in the language before stabilazing one in the standard library.
  • [x] Name of the Global type. GlobalAllocator?
  • [x] Name of the Void type. Lower-case void would match C (our *mut void is very close to C’s void* type), but violates the convension of camel-case for non-primitive types. But Void in camel-case is already the name of an empty enum (not an extern type like here) in a popular crate. (An extern type is desirable so that <*mut _>::offset cannot be called without first casting to another pointer type.) Maybe call it Opaque? Unknown? Renamed to Opaque.

    • [x] Rename again to something else?

  • [x] ~Taking Layout by reference, or making it Copy https://github.com/rust-lang/rust/issues/48458.~ Copy: https://github.com/rust-lang/rust/pull/50109
  • [x] ~GlobalAlloc::owns: https://github.com/rust-lang/rust/issues/44302 proposes making it required for allocators to be able to tell if it “owns” a given pointer.~ Not to be required by this trait since glibc and Windows do not support this.
  • [x] ~Settle semantics of layout "fit" and other safety conditions.~ Without an usable_size method (for now), the layout passed to dealloc must be the same as was passed to alloc. (Same as with Alloc::usable_size’s default impl returning (layout.size(), layout.size()).)
  • [x] ~An oom method is part of GlobalAlloc so that implementation-specific printing to stderr does not need to be part of liballoc and instead goes through #[global_allocator], but this does not really belong in the GlobalAlloc trait. Should another mechanism like #[global_allocator] be added to have pluggable OOM handling?~ https://github.com/rust-lang/rust/pull/50144

Not proposed (yet) for stabilization

  • The Alloc trait
  • The rest of Layout methods
  • The AllocErr type

We’re not ready to settle on a design for collections generic over the allocation type. Discussion of this use case should continue on the tracking issue for the Alloc trait: https://github.com/rust-lang/rust/issues/32838.

A-allocators B-unstable C-tracking-issue T-libs disposition-merge final-comment-period finished-final-comment-period

Most helpful comment

As for the name of Opaque, I'm hoping we can decide during stabilization. I'm personally currently in favor of *mut Whatever

All 148 comments

I was initially confused by the name Void. I confused it with the aforementioned empty enum. Bikeshed: I wonder if it could be called Allocation or Memory?

But I don’t know if any platform’s system allocator (like libc’s malloc) supports this, let alone all of them.

The OSX system allocator supports that. Jemalloc does too, optionally. Glibc malloc and Windows heap allocator don't.

Taking Layout by reference, or making it Copy #48458

The fact that realloc now takes the new size as usize rather than a Layout strongly points towards just making Layout implement Copy and treating it as a simple (length, alignment) pair.

Has the Heap type been renamed to Global?

@alexreg Yes, as described in the PR message of #49669.

Naming bikesheds:

Given that the attribute is global_allocator, should the trait be called GlobalAllocator instead? alloc vs. allocator is inconsistent.

Also, I am not very happy with Void. void is probably the most misunderstood type in C. It is often described as "empty", but it actually is a unit type (), not the empty never type !. void* is NOT just a pointer to () though, but an independent concept. I'd rather we do not step into the middle of this confusion.
From the alternatives proposed above, I prefer Unknown. Some more random suggestions: Blob, Extern, Data. I think from all of these, my personal favorite is Blob, which I have often seen used as a term for "not further specified binary data". But really, any of these is strictly better than Void IMHO.

If GlobalAlloc is renamed to GlobalAllocator, should Alloc be Allocator? And the alloc method allocate? (And similarly other methods.)

Agreed that Void or void is probably best avoided, though I’m not sure what to pick instead.

I would personally keep the name "alloc" for traits and methods, the global_allocator is talking about a specific instance whereas the methods/traits are actions/groups of types.

std::alloc::Global could also be something like GlobalHeap (I think suggested by @glandium?) or DefaultHeap or DefaultAlloc. I don't think we can choose the name GlobalAlloc without renaming the trait GlobalAlloc itself.

I don't think we should implement owns. The sheer fact that two tier-1 platforms don't implement it I feel like is a nail in that coffin.

The GlobalAlloc trait should require Sync as a super trait, no?

Yeah, Send as well.

I don't think Send is actually necessary. A GlobalAlloc is never dropped and does not require a &mut.

My rule of thumb for Send is: "is it OK to drop this type in a different thread than the one it was created in?"

@fitzgen @sfackler I don't think either trait is necessarily required per se though, inherently sticking it into a static will require both traits for sure. I think the main benefit may be being informative early on, but we've tended to shy away from those types of trait bounds.

As Alex said, Sync is already required by static items which are the only ones where #[global_allocator] can be used.

@alexcrichton

sticking it into a static will require both traits for sure.

That should only require Sync, as @SimonSapin said, but not Send, right? Only &T is shared across threads.

from_size_align_unchecked is not an unsafe method? Are there any invariants that Layout upholds?

Why Void and not fn alloc<T>() -> *mut T?

from_size_align_unchecked is unsafe, that was a mistake. See https://doc.rust-lang.org/nightly/core/heap/struct.Layout.html#method.from_size_align for invariants.

alloc_one<T>() is a convenience method on top of alloc(Layout), you can’t always statically construct a type that has the desired layout.

Every time I see this trait name I get it confused with GlobalAlloc the Windows API function: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366574

@retep998 What do you think would be a better name for the trait?

[Reposting in the right tracking issue this time...]

So, the name currently decided on by @SimonSapin for the allocated memory type seems to be Opaque. I concur that Void is a dubious name, but I don't think Opaque is great, mainly due to its complete lack of descriptiveness. I propose one of the following:

  • Blob
  • Mem
  • MemBlob
  • MemChunk
  • OpaqueMem – not my favourite, but at least slightly more explicit than just Opaque
    I'd probably lean towards Mem, since it's the most pithy, but the others are okay too.

I would add Raw to the list, since allocation returns raw untyped memory.

RawMem would work too, yeah. Just Raw is too vague though.

What word does OED define as the following perfect description of our use case?

Used to emphasize a lack of restriction in referring to any thing or amount, no matter what.


answer


*mut Whatever


This is a serious suggestion by the way.

extern {
    fn memcpy(dest: *mut Whatever, src: *const Whatever, n: size_t);
}

unsafe impl GlobalAlloc for BetterAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut Whatever {
        /* ... */ as *mut Whatever
    }
}

It’s not a regular noun though. (An interrogative/relative pronoun yes, but those are a bit different.)

What is the caller going to store in this memory that I allocated for them?

Well, whatever they want.

What type of data am I copying with this memcpy?

I don't know, ¯\_(ツ)_/¯ whatever happens to be there I guess.

IMO this captures the essence of void *.

@rfcbot fcp merge

The global allocator APIs and associated infrastructure have been around for a very long time at this point and I think that it's high-time to propose stabilization. With @SimonSapin's recent PR to refactor out a GlobalAlloc trait I'd like to FCP to stabilize just that trait and associated items (plus a conservative API on Layout)

Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:

  • [x] @Kimundi
  • [x] @SimonSapin
  • [x] @alexcrichton
  • [x] @aturon
  • [x] @dtolnay
  • [x] @sfackler
  • [x] @withoutboats

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

As for the name of Opaque, I'm hoping we can decide during stabilization. I'm personally currently in favor of *mut Whatever

There could be some more bikeshedding on the Global name too. (https://github.com/rust-lang/rust/commit/c32a3de476768cb222f5062efdf71c73b38739fa#commitcomment-28405600)

I’d rather stick with Opaque than go with Whatever. Both are too vague, but especially Whatever, which is really nebulous.

As motivated in https://github.com/rust-lang/rust/issues/49668#issuecomment-381219288, *mut Whatever conveys a pointer that points to whatever data happens to exist there, of whatever type. One can allocate memory intended to hold whatever the caller wants there, one may copy whatever data from one pointer to another, etc.

While I’m all for keeping momentum, there’s still a number of open issues to resolve before stabilization. (The semantics of layout fit is a subtle one that I don’t quite understand.)

One open question that came up during the all-hands is Layout::for_value. This does not need to actually dereference the pointer, so maybe it should be changed to a raw pointer. The trouble with making it a reference is that references (maybe?) assert that the value they point to is valid, while for_value is called with a reference to a dead value e.g. by Arc::drop.

Cc @nikomatsakis

@RalfJung It calls std::mem::size_of_val which takes &T

@SimonSapin Good point. Probably we should have raw pointer versions of that (and align_of_val).

@dtolnay Okay, I see your point. The problem for me is still that Whatever doesn’t really work as a noun.

I would like to repeat @alexreg's suggestion of RawMem. Either that or keep Opaque, I think that Whatever feels too informal.

@SimonSapin indeed yeah there are some unresolved questions, but I'd consider them largely bikesheds by this point. (I also see our bikeshed for this as more of an airplane hangar it's got so many coats of paint)

  • extern type - I am personally mostly a fan of *mut u8, but I realize that this isn't popular. If extern type Foo doesn't work out though I think we'll be forced to switch to *mut u8 to get .offset working in terms of bytes. I wouldn't see this as the end of the world.
  • Global naming - a bikeshed!
  • "fit" semantics - if we're not stabilizing usable_size, I believe this is settled? That is, you must deallocate and reallocate with precisely the same size and alignment. (alignment has our hand forced by Windows I believe)
  • OOM methods - we discussed this a bit on IRC but I think that we'll want to stabilize oom here.

Ok, sounds good. (FWIW I’m also fine with *mut u8.)

@alexcrichton why are we stabilizing GlobalAlloc::oom? I still do not understand why the allocator is the thing that decides how OOM is treated.

OOM in general to me feels pretty closely coupled with the global allocator (in that you often want to customize both). On a technical level it's pretty awkward for us to move OOM handling off the GlobalAlloc trait and if we were to not stabilize it then customizing OOM wouldn't be possible and you also wouldn't even get a nice message saying that OOM happened.

They're still on separable axes in that only the final crate decides the global allocator, and it's up to that crate to pick-and-choose implementations if it likes.

It's true you often want to customize both but they're totally orthogonal customizations. If they're conflated, what does that actually look like? Are people expected to make their own GlobalAlloc implementation that wraps another and delegates all methods except for oom? How would System::oom and Jemalloc::oom differ?

Yes if you want to customize OOM and not the allocator you'd have to make your own implementation of the trait and dispatch all methods except one. That, however, doesn't seem like that much overhead to me (it's not really that common to override OOM?). I would imagine Jemalloc::oom is the same as System::oom, but we have to make sure that it's possible to define it as that.

So it seems like we're trying to hitch OOM customization to gobal allocator customization because we happen to already have the glue in place for it. What is the specific driver for stabilizing it in the first place? Is there concrete demand for customizing OOM? The "delegate n-1 methods" workflow seems really bad.

We somehow need to stablize when you're using jemalloc that OOM prints out a nice message like the system allocator would by default. Currently we're not really in a position to remove OOM handling from the trait to decouple oom handling from the GlobalAlloc trait, which in my mind forces our hand on stabilizing GlobalAlloc::oom.

If we can figure out how to move the method off the trait and a good way and have custom allocators by default route to "print an OOM message" then we should be able to avoid stabilizing it for sure.

That being said, I don't really see the erognomic downside here as really that bad, it's something you do vanishing rarely and we're about enabling workflows rather than making the workflows perfect right now.

I feel like I must be missing something because this doesn't seem like a hard problem to solve:

Option 1: Replace the code in liballoc that calls the_global_allocator.oom() with code that calls rtabort!("out of memory") or whatever System::oom does.
Option 2: Keep GlobalAlloc::oom, give it a default impl (which every implementation will want anyway, right?), and leave it unstable.

@sfackler Keep in mind that liballoc can't depend on any crates other than core, so the only implementation of oom that liballoc could provide would be to call intrinsics::abort. This isn't very user-friendly since it just crashes the program with an illegal instruction.

There are only really 2 options to provide a user-friendly OOM message:

  • A trait method in GlobalAlloc, which allows allocator implementations to call libc to print a message and abort.
  • A lang item, which allows libstd to provide an OOM handler. The downside of this is that it adds yet another lang item that we will need to stabilize later on to allow no_std programs to use liballoc (since liballoc relies on this lang item existing).

As far as the workflow being bad, if I want to pick what happens on OOM, I want to write something like

#[oom_handler]
fn oom() -> ! {
    // log stuff!
}

not

use alloc::{Layout, GlobalAlloc, Opaque};
use system_alloc::SystemAllocator;

#[global_allocator]
static _ALLOC: OomHandlingAllocator<SystemAllocator> = OomHandlingAllocator(SystemAllocator);

struct OomHandlingAllocator<A>(A);

impl<A> GlobalAlloc for OomHandlingAllocator<A>
where
    A: GlobalAlloc
{
    unsafe fn alloc(&self, layout: Layout) -> *mut Opaque {
        self.0.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut Opaque, layout: Layout) {
        self.0.dealloc(ptr, layout)
    }

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut Opaque {
        self.0.alloc_zeroed(layout);
    }

    unsafe fn realloc(&self, ptr: *mut Opaque, layout: Layout, new_size: usize) -> *mut Opaque {
        self.0.realloc(ptr, layout, new_size)
    }

    // hopefully I didn't forget to delegate anything...

    fn oom(&self) -> ! {
        // log stuff!
    }
}

We should put the work into making this reasonable.

OOM and the global allocator seem to be two examples of "compiler-endorsed singletons" ("stable lang-items"?) or whatever one may want to call them; I think I've seen this brought up elsewhere where people wanted to use a similar mechanism for something else. Maybe it makes more sense to make them both instances of the same mechanism, rather than bolting them together in one trait?

@sfackler as @Amanieu mentioned unfortunately we can't tweak the definiton as-is today. The GlobalAlloc trait is defined in core which has no access to primitives like write, so it can't define an implementation that either delegates to or implements the "reasonable expectation" by default. (but libstd can do this).

We could perhaps move GlobalAlloc but we could only move it up as far as alloc which as the same issues, it doesn't have access to system resources as it purely assumes global allocation.

We could, however, use an entirely new mechanism for defining OOM handlers like panic handlers and such, but I really do think that's overkill. What you mentioned is indeed not super ergonomic, but I don't think it's something we must fix. Your example could alternatively be written as:

extern crate some_crates_io_crate;

use std::alloc::System;
use some_crates_io_crate::{CustomOom, Oom};

#[global_allocator]
static ALLOC: CustomOom<System, MyOom> = CustomOom(System, MyOom);

struct MyOom;

impl Oom for MyOom {
    fn oom() -> ! { ... }
}

I don't understand what you mean by overkill.

The amount of compiler support necesary for custom panic implementations, custom global allocators, custom panic runtimes, etc, is quite large and difficult to maintain. Adding "yet another one" is overkill in my mind (as I've implemented all of these in the compiler so far) compared to just having a crate on crates.io that allows customizing just OOM

oom seems very similar to panic_fmt - why would it not be able to use the same logic as that but be pulled in through liballoc rather than libcore?

I think OOM is actually very similar to panic in the kinds of ways you'd want to control it. Setting something like this statically at build time is the least flexible way of doing things, and I think we'd want to end up in a place in std where we have something like std::panic::set_hook but for OOM. This allows you more control to easily change your OOM behavior based on runtime config.

@sfackler yeah we can concoct anything we want to design an OOM handler into the system, my point is that I feel like it's all overkill. We've got a workable solution today and it's not clear to me why we'd want to strong optimize differently for various use cases.

It's always possible, yes, for alloc to have another "weak lang item" defined in libstd, and then libstd has infrastructure for saying "here's the real oom handler". That comes at the downside of adding another assumption to alloc (an OOM handler and a global allocator).

How would System::oom and Jemalloc::oom differ?

Today, jemallocator::Jemalloc::oom is defined as a call to std::alloc::System::oom.

Is there concrete demand for customizing OOM?

We have an accepted RFC https://github.com/rust-lang/rust/issues/48043 for some way to opt into OOM causing a panic rather than an abort. The exact mechanism is not defined by the RFC and left up to the implementation, but this is not implemented yet. fn oom() -> ! (whether it’s a method of #[global_allocator] or a free function with its own similar attribute) could be that mechanism.

:bell: This is now entering its final comment period, as per the review above. :bell:

We've got a workable solution today and it's not clear to me why we'd want to strong optimize differently for various use cases.

My point is that the current solution is bad API design. The fact that jemallocator depends on alloc_system just so that it doesn't need to copy all of this stuff is bad: https://github.com/rust-lang/rust/blob/master/src/liballoc_system/lib.rs#L370-L428. Dlmalloc just doesn't implement oom, not really sure what the deal is there: https://github.com/alexcrichton/dlmalloc-rs/blob/c99638dc2ecfc750cc1656f6edb2bd062c1e0981/src/global.rs#L123-L125.

The GlobalAlloc trait should be responsible for allocating memory, not allocating memory and figuring out how to print an error message on OOM. When someone in the future asks "why is OOM behavior configured through the global allocator?", what is the answer other than "we didn't have the energy to make a lang item and it's a convenient place to stick it"?

I am willing to do the work of making a #[lang="oom"]. It's another lang item you need to define if you're working in alloc-but-no_std-land, but I think that is a reasonable price to pay.

50144

@sfackler thanks for doing the legwork to turn it into a lang item!

Now that #50144 has moved OOM to its own lang item, this may not be the right place to talk about this, so please redirect me to the right place if not.

In Firefox, we have the issue that we need to differentiate OOM crashes from other crashes, and we also need to collect the requested size that led to OOM when it happens. The problem is that the GlobalAlloc changes removed the oom function argument that could have allowed that, and the new oom lang item has kept the new signature.

The alternatives are not great:

  • Moving OOM handling to the allocator function would mean turning it to an infallible allocator function, meaning code paths that actually want to handle allocation failures gracefully can't anymore.
  • Making the allocator record the size of allocations that fail, so that the OOM handler can check it out afterwards is not reliable, because a caller may handle the failure gracefully, and you'd have to reset the oom record for a subsequent crash not to look like an oom. It would be possible, though, to store the oom record in a thread local variable, and then use that thread local variable from the oom handler to feed the crash reporter. That seems convoluted. It could work in Firefox, because we control the allocator as well (although that's not true for ASAN builds), but that's not a general purpose solution.

How does the C side of the codebase deal with this?

The C side has different function calls for fallible and infallible allocations. Fallible allocations return NULL when they fail, and infallible allocations panic and handle recording the OOM requested size.

How would the old oom function have supported what you want, though? It's still a separate function that's called after the function that failed, so you'd still need thread local tracking of what went wrong.

Since you control the allocator, you could have the allocation functions directly abort like the C versions do?

How would the old oom function have supported what you want, though? It's still a separate function that's called after the function that failed, so you'd still need thread local tracking of what went wrong.

The old oom function was taking an AllocErr, which contained the requested Layout.

Since you control the allocator, you could have the allocation functions directly abort like the C versions do?

As I said, that would mean making all rust allocations infallible, even for code that does want to handle allocation failure.

Ah sure - the AllocErr parameter was removed a bit before it turned into a separate function and I had forgotten about that.

It seems reasonable to me for oom take a Layout as a parameter.

Shall I just go ahead with a PR doing that?

Sure!

Should it be Layout or just usize for the allocation size?

Since checking for null and calling oom(…) might be a very common pattern, it might make sense to provide convenience API(s) for it. But to avoid users having to pass the layout (or size) twice, it’d have to be a whole set of parallel APIs for alloc, alloc_zeroed, and realloc. With Alloc::*_excess, it’s now two axes where we potentially want all combinations.

Considering alloc might fail because of wrong alignment requirements, an oom handler may want to check that too.

(Thank you for reminding me that I wanted to add alloc_zeroed_excess)

The final comment period is now complete.

Another problem with the oom lang item: it doesn't allow to override the one from std:

error[E0152]: duplicate lang item found: `oom`.
   = note: first defined in crate `std`.

Presumably, this would be fixed by making it an actual weak lang item.

Continuation of my earlier comment about passing Layout to oom being painful, it's particularly painful in
https://github.com/rust-lang/rust/blob/9e8f683476d5a9d72c6c1e9383a519cf0fb27494/src/liballoc/raw_vec.rs#L555
where the patch looks like:

@@ -552,7 +553,7 @@ impl<T, A: Alloc> RawVec<T, A> {
     pub fn reserve(&mut self, used_cap: usize, needed_extra_cap: usize) {
         match self.try_reserve(used_cap, needed_extra_cap) {
             Err(CapacityOverflow) => capacity_overflow(),
-            Err(AllocErr) => oom(),
+            Err(AllocErr) => oom(Layout::array::<T>(self.amortized_new_size(used_cap, needed_extra_cap).unwrap()).unwrap()),
             Ok(()) => { /* yay */ }
          }
      }

It ends up similarly painful in
https://github.com/rust-lang/rust/blob/9e8f683476d5a9d72c6c1e9383a519cf0fb27494/src/libstd/collections/hash/table.rs#L773
https://github.com/rust-lang/rust/blob/9e8f683476d5a9d72c6c1e9383a519cf0fb27494/src/libstd/collections/hash/table.rs#L812
https://github.com/rust-lang/rust/blob/9e8f683476d5a9d72c6c1e9383a519cf0fb27494/src/libstd/collections/hash/map.rs#L787

where the patch similarly needs to call a function.

Actually, the ones for collections are just plain horrible, especially the one for HashMap, which I haven't implemented because of the error-proneness. At this point, I'm tempted to say infallibility should be handled by the allocator itself. Adding methods is not incredibly attractive, so maybe a parameter to the alloc functions?

Another problem with the oom lang item: it doesn't allow to override the one from std:

The thing to do here is to make an API like std::panic::set_hook. It is already a weak lang item.

@glandium want to open a separate tracking issue for customizing OOM behavior? (through an API like @sfackler is thinking)

set_hook or a lang item – either sounds like a good way to handle OOM for the GlobalAlloc. Is there any reason for preferring one over the other?

  1. Allowing people to configure something at runtime is strictly more flexible than only allowing them to configure it at compile time.
  2. Lang items can only be defined once, and std defines the oom lang item.

Oh, I didn't know about restriction 2. As for 1, it can be circumvented easily as you have a lang item. :-)

I noticed that the compiler will accept both a normal and an extern function for the oom lang item:

// This is how rust_oom is defined in std
#[lang = "oom"]
extern fn rust_oom() -> ! { loop {} }

// The compiler also accepts this
#[lang = "oom"]
fn rust_oom() -> ! { loop {} }

Is this intentional? Does rustc automatically pick the correct ABI for the lang item?

OTOH, global_allocator is set at compile time, and you get one by default if you don't set one yourself. Why should oom handling be different?

I don't understand. Why would we want to be less flexible here?

Why would you not want global_allocator to be similarly more flexible, then?

Anyways, this discussion also doesn't address the problem I raised in my last comments, where trying to pass a layout to the oom function is difficult in some places. As mentioned by @SimonSapin this can be addressed by adding helper functions to the Alloc trait, but the multiplication of functions is not helping.
I'm now thinking of something like:

struct Infallible<A: Alloc>(A);
impl<A: Alloc> Alloc for Infallible<A> { ... }

I was thinking that the infallible APIs could maybe return NonNull, but that only works if they’re separate APIs.

Why would you not want global_allocator to be similarly more flexible, then?

Because we can't. panic::set_hook requires allocating, and the runtime allocates before main.

Because we can't. panic::set_hook requires allocating, and the runtime allocates before main.

That's a fair reason. OTOH, I'm not convinced the flexibility for oom would be any useful. And "because lang items can be defined only once" doesn't seem like a compelling reason, when, as mentioned earlier, the problem doesn't exist for #[global_allocator]. IOW, I don't see why oom can't be handled more similarly to #[global_allocator].

(or #[panic_implementation], for that matter)

@Amanieu lang items aren't type checked.

@glandium It is way easier to implement a panic-hook style interface than another thing like global_allocator. panic_implementation behaves identically to oom.

I was thinking that the infallible APIs could maybe return NonNull, but that only works if they’re separate APIs.

Callers are not calling directly the GlobalAlloc trait methods, though. And the Alloc methods already return NonNull, albeit in a Result with an AllocErr. It would sure be better if an infallible API returned ! instead of AllocErr. In the Infallible<A> scheme I suggested above, that could be done by adding an Error associated type to the Alloc trait.

Something like panic_hook requires dynamic dispatch, which is not a cost we wanted to pay for allocators. Panics are the extra-slow path, like OOM, so don't need to be optimized at all.

So I gave Infallible<A> a shot, and while it does help with many places calling oom, it doesn't help with all the places involving CollectionAllocErr. This would be equally true with infallible methods rather than Infallible<A>. It seems, though, it could be reasonable to add the Layout to CollectionAllocErr::AllocErr, although that would kill From<AllocErr> for CollectionAllocErr.

Ok, I have a working PoC for Infallible<A> and oom(Layout), and I haven't touched anything related to GlobalAlloc, so I'll move the discussion to #48043 and #32838 and will file a separate issue for OOM itself.

Refocusing on GlobalAlloc itself, there doesn't seem to have been a conclusion on the bikeshedding.

cc @SimonSapin

I just went through https://github.com/rust-lang/rust/pull/49669 which does not offer much rationale about some of the changes but points to this issue as the correct place for discussion, and also mentions the All-Hands but I wasn't there (maybe someone that was there can answer my questions).

It looks like https://github.com/rust-lang/rust/pull/49669 takes many steps in the opposite direction that I at least thought that the allocator design was going, but I don't see these issues being mentioned here, and they are IMO fundamental.

I thought that the plan was to:

  • have one crate per allocator
  • have each allocator implement the traits themselves (GlobalAlloc, Alloc, etc.) so that they can provide specializations when available
  • eventually replace the allocator crates in rust-lang/rust like liballoc_jemalloc with the rust-lang-nursery crates like jemallocator, so that users that choose to include a separate allocator crate get the same allocator that they would have gotten when this was used through rust. For example, a crate compiled with system malloc as global allocator that wants to use Rust jemalloc for a particular thing could just include jemallocator and get the rust version of jemalloc

Before https://github.com/rust-lang/rust/pull/49669 we were already at the step where we had the different allocator crates implementing their own traits, so I thought the next steps towards the allocator system structure would be to just replace them with submodules pointing to the crates in the nursery.

However, https://github.com/rust-lang/rust/pull/49669 changes all these crates to not implement any allocator traits anymore, instead implementing GlobalAlloc for the allocators in liballoc/alloc.rs and implementing Alloc on top of the more restricted GlobalAlloc API.

This prevents allocators from providing more efficient implementations of the trait methods. For example, liballoc_jemalloc used to provide more efficient implementations of Alloc:shrink_in_place and Alloc::grow_in_place using xallocx, but this is no longer the case. Also, Alloc::usable_size used to return the correct available size for jemalloc but now it returns the size that was requested, which I'd consider a bug.

  • So why were these changes made?
  • Is it still a plan to replace the rust-lang/rust allocators with the ones in the nursery? If not, why not?
  • Is it still a plan to allow allocators, including the global one, to provide more specialized implementations of the GlobalAlloc and the Alloc traits? If not why not? And if yes, why did that PR introduce this regression?

have one crate per allocator

Once upon a time, customizing the global allocator was done through an "allocator crate" with some dedicated crate-level attributes (and I think some free functions expected to be at the root?) Maybe you’re thinking of this?

Now we have the #[global_allocator] attribute that goes on a static item whose type implements GlobalAlloc, but that type can be defined anywhere. Crates are not special there, the same crate could have multiple types that implement that trait.

have each allocator implement the traits themselves (GlobalAlloc, Alloc, etc.)

I don’t think there ever was a requirement that any type implements both traits. jemallocator does for convenience of users.

get the rust version of jemalloc

When System becomes the default, I expect the alloc_jemalloc crate to be removed and I’d say there will no longer be a "rust version of jemalloc". Unless you consider the nursery to be part of "the rust project" but I view it more as crates that happen to be maintained by some of the same people as std.

I thought the next steps towards the allocator system structure would be to just replace them with submodules pointing to the crates in the nursery

As far as I know there never was a plan for jemallocator to be a sumbodule of this repository and be distributed together with std. Rather, users can opt to add it to their [dependencies] in Cargo.toml, and define themselves a static with #[global_allocator].

Alloc:shrink_in_place, Alloc::grow_in_place, Alloc::usable_size

These methods are not part of the initial stabilization:

The high level idea that lead to #49669 is that we want to stabilize #[global_allocator] and the trait that supports it, but we’re not ready to stabilize the Alloc trait. From in-person discussion (sorry if I didn’t explain the reasoning enough here) we settled on having a separate trait with only "libc-equivalent functionality", minimizing the API surface in order to minimize the number of blockers to initial stabilization.

allow allocators, including the global one, to provide more specialized implementations of the GlobalAlloc and the Alloc traits?

jemallocator already does.

@SimonSapin thanks for taking the time to go over all that on IRC.

To summarize my thoughts. I think that after understanding the next steps of unifying __rde_/__rdl_, removing liballoc_jemalloc, ... the general direction looks to me to be the right one.

The only thing I think might be better to do differently is the implementation details of the liballoc_system crate. This is a nitpick, but on some platforms like FreeBSD, NetBSD, and newer Android versions, jemalloc is the system allocator, and I think it would be cool if in these systems liballoc_system would have feature parity with the jemallocator crate. For example, I'd expect that in these systems the specializations of the Alloc trait for jemalloc, that are available in the jemallocator crate, are also implemented in the impl of Alloc for the System allocator and for the Global allocator when Global == System.

Yeah, seems reasonable for alloc_system to become more tuned for individual platforms over time.

To summarize a bit the discussion in internals, these are the changes that I would do to the global allocator trait. It is a fallible API that always returns the "Excess" and would look like this (where Excess is now called Allocation):

/// Represents a memory allocation, that is, a combination of a starting address
/// and the allocation layout.
pub struct Allocation(*mut Opaque, Layout);

impl Allocation {
    /// Instantiates a memory allocation from a pointer and a layout.
    pub fn new(ptr: *mut Opaque, layout: Layout) -> Self;
    /// Starting address of the memory allocation.
    pub fn ptr(&self) -> *mut Opaque; 
    /// Layout of the allocation.
    pub fn layout(&self) -> Layout;
}

/// The `AllocErr` error specifies that an allocation failed.
///
/// This can only happen due to resource exhaustion or because the allocator
/// does not support the provided `Layout`.
#[derive(Clone, PartialEq, Eq, Debug)]
pub struct AllocErr(Layout);

impl AllocErr {
    /// Layout that produced the allocation to fail.
    pub fn layout() -> Layout;
}

/// Reallocation error.
///
/// This can happen either due to resource exhaustion, because the allocator
/// does not support the provided `Layout`, if the `Allocation` was allocated
/// with a differen allocator, etc.
///
/// It contains the `AllocErr` containing the provided `Layout` and the
/// `Allocation` that failed to reallocate.
pub struct ReallocErr(AllocErr, Allocation);

impl Realloc {
    /// Layout of the reallocation request.
    fn layout_request(&self) -> Layout;
    /// Allocation that failed to reallocate.
    fn allocation(self) -> Allocation;
}

/// A memory allocator that can be registered to be the one backing `std::alloc::Global`
/// though the `#[global_allocator]` attributes.
pub unsafe trait GlobalAlloc {
    /// Allocates memory as described by the given `layout`.
    fn alloc(&self, layout: Layout) -> Result<Allocation, AllocErr>;

    /// Deallocate the memory `allocation`.
    ///
    /// # Safety
    ///
    /// The `allocation` pointer must come from this allocator and its size 
    /// and alignment must be either the requested or returned.  
    unsafe fn dealloc(&self, allocation: Allocation);

    /// Allocates zeroed memory as described by the given `layout`.
    fn alloc_zeroed(&self, layout: Layout) -> Result<Allocation, AllocErr>;

    /// Shink or grow a memory `allocation` to the given `new_layout`.
    /// # Safety
    ///
    /// The `allocation` pointer must come from this allocator and its size 
    /// and alignment must be either the requested or returned.  
    unsafe fn realloc(&self, allocation: Allocation,
                      new_layout: Layout) -> Result<Allocation, ReallocErr>;
}

A couple of notes on the changes:

  • Excess is renamed to Allocation and contains a pointer to the allocation and a layout. The pointer can be null, and the layout can have size zero. As it was the case, if that is a valid the allocation, calling deallocate with that allocation should succeed.
  • alloc and alloc_zeroed are not unsafe fns. They either succeed or return an error. If an allocator implementation returns Ok then that's a valid allocation that can be passed to dealloca and realloc.
  • functions taking an Allocation are still unsafe because that Allocation could come from a different allocator, and thus cause undefined behavior. It is a precondition on these methods that the Allocations provided come from calling one of the allocation methods of these allocators.
  • Because these functions consume the allocation, and Allocation is neither Copy nor Clone, double frees cannot happen without the user using unsafe code to duplicate an Allocation. realloc consumes the allocation and, if there was an error, returns it.
  • The wording of # Safety needs to be tuned such that collections like Vec and Box can recompute the Allocation from the capacity and alignment of the type, where the capacity might be the size requested, or the size returned, and the same applies to the alignment. That is, recomputing an Allocation from the pointer, the size returned, and the alignment requested, should work out.

I can try to send a working implementation of this as a PR over the weekend, that way we can see how much does this change the design, and do a perf run to see the impact on performance. I expect that suitable #[inline] reduce the impact to close to zero when jemalloc is not used, since the only thing we would do is access the pointer and drop the Layout.

Since AllocErr now contains the layout, it should be possible to just pass it to oom without contortions @glandium . An alternative is to provide an oom() method to AllocErr and ReallocErr that do this for you. The only thing we would need to allow this AFAICT is a Display impl for Layout.

From the top of my head, at quick glance:

Using a pointer instead of NonNull makes Results larger than they should be. Also, how can a null pointer be an Ok result when there's AllocErr? Are callers supposed to check both now? There's still the question whether a zero-size Layout should be valid or not, both for codegen reasons (see last paragraph of the first message in https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487) and consistency reasons (many things assume ZSTs have an address in the null page, allowing the allocator to whatever it pleases with that sounds dangerous).

Returning the layout as part of the AllocErr is actually making things awful from a codegen perspective because to use that value to call oom (the only reason it's there would be to feed it there) means things need to be moved between registers. See https://github.com/rust-lang/rust/issues/32838#issuecomment-377097485 (beware, it's in the "hidden items"). Using the Layout the caller already has is actually better in that regard, and I have a PR underway that deals with the oom thing without too much contortion already, I just needed to adjust some methods to make things easier.

One thing that is nice-ish about Allocation is that there could be:

impl From<*mut T> for Allocation {
    fn from(ptr: *mut T) -> Self {
        Allocation::new(ptr as *mut Opaque, Layout::new::<T>())
    }
}

I'm not particularly convinced by ReallocErr.

Because these functions consume the allocation, and Allocation is neither Copy nor Clone, double frees cannot happen without the user using unsafe code to duplicate an Allocation. realloc consumes the allocation and, if there was an error, returns it.

Is that true? I think I can implement Clone in safe code:

fn clone_alloc(a: &Allocation) -> Allocation {
  Allocation::new(a.ptr(), a.layout())
}

@RalfJung indeed, Allocation::new should be unsafe.


@glandium

Using a pointer instead of NonNull makes Results larger than they should be. Also, how can a null pointer be an Ok result when there's AllocErr?

This is already the case. The current allocation API states that if you allocate a Layout with size() == 0 the allocator can either return an error, or return a unique pointer value on which deallocate must succeed. This value can be Null. Quoting the current docs:

A note regarding zero-sized types and zero-sized layouts: many methods in the Alloc trait state that allocation requests must be non-zero size, or else undefined behavior can result.

However, some higher-level allocation methods (alloc_one, alloc_array) are well-defined on zero-sized types and can optionally support them: it is left up to the implementor whether to return Err, or to return Ok with some pointer.

If an Alloc implementation chooses to return Ok in this case (i.e. the pointer denotes a zero-sized inaccessible block) then that returned pointer must be considered "currently allocated". On such an allocator, all methods that take currently-allocated pointers as inputs must accept these zero-sized pointers, without causing undefined behavior.

In other words, if a zero-sized pointer can flow out of an allocator, then that allocator must likewise accept that pointer flowing back into its deallocation and reallocation methods.

The API I proposed is a non-functional change, so it allows this as well, and therefore it cannot use NonNull. If we were to lift this requirement, then the API could use NonNull.

Returning the layout as part of the AllocErr is actually making things awful from a codegen perspective because to use that value to call oom (the only reason it's there would be to feed it there) means things need to be moved between registers.

I went through the posts there but I failed to find any convincing argument about why this is bad from a codegen perspective beyond "it just happens". Is this a codegen bug? I think that for ease of use we could just add Layout::oom(self) -> ! method that calls the oom lang item passing itself as argument. I have tried with and without a Layout in the AllocError, and at least on godbolt both cases always produce the same assembly https://godbolt.org/g/MiWHpP What am I missing? I can't think of any situation in which if they don't generate the exact same code then that would be a codegen bug, but we should not design the APIs to workaround codegen bugs, we should identify them, asses whether they are solvable, and if so report them/fix them.


It is a bit how to follow exactly what the proposed API for global alloc is or should be, because we don't have a design document anywhere for that, and one has to extract the information from the implementation, the PRs, and the comments that are scattered on many different places.

If @SimonSapin @sfackler and @glandium agree that an API like the one I suggested is worth giving a try, I can send a PR upstream so that we can try it out and identify codegen issues. It might well be that it doesn't work out.

Maybe it would be worth it to summarize exactly what the current GlobalAlloc API looks like and why somewhere as well.

For one, you're talking about GlobalAlloc and are quoting the Alloc trait doc, which is not the same thing. Secondly, you're quoting the doc as it was multiple weeks ago, and that's nowhere "current". Many things changed when GlobalAlloc landed, including the Alloc trait. This is the current doc: https://doc.rust-lang.org/nightly/core/alloc/trait.Alloc.html. Methods don't return null anymore.

I’ll update my proposed API to use non-null then. There are still comments
I the source code lying around these issues though.

On Sat 19. May 2018 at 13:05, Mike Hommey notifications@github.com wrote:

For one, you're talking about GlobalAlloc and are quoting the Alloc trait
doc, which is not the same thing. Secondly, you're quoting the doc as it
was multiple weeks ago, and that's nowhere "current". Many things changed
when GlobalAlloc landed, including the Alloc trait. This is the current
doc: https://doc.rust-lang.org/nightly/core/alloc/trait.Alloc.html.
Methods don't return null anymore.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/49668#issuecomment-390397535,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AA3Npn7nI8m_kB2ise4TwvWCMNqndfJrks5tz_xngaJpZM4THipZ
.

@glandium methods are still able to return null for GlobalAlloc though, and they do in some cases: https://github.com/rust-lang/rust/blob/master/src/liballoc_system/lib.rs#L121

Why can GlobalAlloc return null ptrs but Alloc cannot?

GlobalAlloc methods that return nullable pointers don’t return a Result. Null there is equivalent to Err in Result<NonNull<Opaque>, AllocErr>.

To summarize a bit the discussion in internals, these are the changes that I would do to the global allocator trait.

This is a lot of changes, they’d need a lot of justification to do this late in the process after FCP to stabilize is already over. The burden of proof is on you, at this point.

Why does Allocation contain a full Layout rather than just the excess size? Is the alignment in it always the same as was passed to e.g. alloc()? What makes it worth the cost of a larger return type?

Is the Layout inside of AllocErr always the same that was passed to e.g. alloc()? If so why is it there at all?

Because the current AllocErr is zero-size, Result<NonZero<Opaque>, AllocErr> can have the size of a single pointer (using null to represent errors). Your proposal quadruples this size.

It is a precondition on these methods that the Allocations provided come from calling one of the allocation methods of these allocators. […] Because these functions consume the allocation, and Allocation is neither Copy nor Clone, double frees cannot happen without the user using unsafe code

This doesn’t sound useful at all. Collection instances typically do not store the equivalent of a full Layout (so doing so would add space overhead), and even if they did deallocation typically happens when we get &mut self exclusive access but not ownership to e.g. move fields out of a struct. If every single use needs to use the unsafe constructor, what is the point of ownership semantics for Allocation?

Why does Allocation contain a full Layout rather than just the excess size?

Some allocators (e.g. the system allocator on windows) require the alignment used on deallocation to be exactly the same to the alignment that the memory was allocated with. Therefore, at least on windows, the memory address and the full layout is what describes a memory allocation.

Is the alignment in it always the same as was passed to e.g. alloc()?

Not necessarily, the allocator can return a Layout containing a higher alignment.

The cost of returning an extra usize is tiny compared to the cost of allocating and touching memory, and zero if the value is just propagated without using it and the allocation methods are inlined.

Is the Layout inside of AllocErr always the same that was passed to e.g. alloc()?

Yes.

If so why is it there at all?

Because otherwise one needs to bubble it up together with AllocErr if oom is to be called at a higher level than where the allocation happened and that's inconvenient. See also: https://github.com/glandium/rust/commit/5a522352904f2873c2622899ecda067c25368dd5 .

Because the current AllocErr is zero-size, Result, AllocErr> can have the size of a single pointer (using null to represent errors). Your proposal quadruples this size.

AFAICT the proposal only requires this size to double. The Ok variant contains a single pointer, and the Err variant contains two usizes. Note that neither the pointer nor the Layout alignment can be zero (and the Layout size should probably never be zero as well). Currently on nightly, the sizes triples, due to https://github.com/rust-lang/rust/issues/46213.

If every single use needs to use the unsafe constructor, what is the point of ownership semantics for Allocation?

This is a good point, and from porting RawVec and Box to use Allocation I am pretty sure that none of the std collections will store it internally.

But not storing a full Allocation is an optimization that some collections can perform. Box doesn't need to store the allocation size, but RawVec cannot perform this optimization and needs to store both the pointer and the size. At the same time, neither Box nor RawVec need to store the alignment, but an OverAlignedRawVec that stores it might just store the full Allocation directly.


EDIT: FWIW I would prefer not to have the alignment be part of Allocation, but the API (in particular that of Alloc) did get a lot simpler by just using Layout and Allocation consistently.

Therefore, at least on windows, the memory address and the full layout is what describes a memory allocation.

That sounds correct in abstract terms, but doesn’t say why everything that "describes" the allocation needs to be part of the return value of alloc().

if the value is just propagated without using it and the allocation methods are inlined.

To enable #[global_allocator], there are symbols like extern { fn __rust_alloc(…) } that cannot be inlined without LTO (which IIRC is not enabled by default in Cargo’s release mode).

an OverAlignedRawVec that stores it might just store the full Allocation directly.

I think that hypothetical is not sufficient to justify a guiding principle for redesigning this entire API.

which IIRC is not enabled by default in Cargo’s release mode

ThinLTO is enabled by default on release mode.

I think that hypothetical is not sufficient to justify a guiding principle for redesigning this entire API.

This was tangentially discussed on IRC last week (maybe @sfackler has the logs?). The only thing that would be backwards incompatible to change here is the definition of Layout which in my branch can't have zero size.

For everything else, we can always add a GlobalAlloc2 trait in the future, and require that either a type implementing GlobalAlloc or GlobalAlloc2 is available in the dependency graph.

@Amanieu and I discussed (in person at impl days) this trait and the remaining open issues, consensus between us is to propose the changes below. @rust-lang/libs, what do you think?

  • extern types themselves are not stable, and we’d rather not block on them. To avoid that, we can change Opaque to be defined instead as struct Opaque(u8);.

    We make it have a size of 1 byte instead of zero so that <*mut Opaque>::offset is not a no-op (which might be unexpected). Unlike with an extern type, *mut Opaque can be dereferenced. But then what can be done with that dereference is very limited because we make Opaque not implement Copy or Clone (or any other trait).

  • Rather than raw *mut Opaque where null represent an error, change GlobalAlloc to use Result<NonNull<Opaque>, AllocErr> in return types (and NonNull<Opaque> in parameter types).

    We provide a conversion from raw pointers so that impls based on malloc-style APIs stay convenient to define:

    struct AllocErr;
    
    impl AllocErr {
      /// Convenience conversion for `GlobalAlloc` or `Alloc` impls
      #[inline]
      pub fn from_null<T>(ptr: *mut T) -> Result<NonNull<Opaque>, AllocErr> {
          NonNull::new(ptr as _).ok_or(AllocErr)
      }
    }
    

    Since AllocErr is zero-size, this result has the same memory representation as a raw pointer and this conversion optimizes to a no-op.

  • We’re still sort of undecided about zero-size allocations. Alternatives are: 1. make them UB (alloc() is an unsafe fn), 2. make them guaranteed to return a non-null aligned pointer (such as NonNull::dangling, but that likely requires an explicit check in every impl, including in dealloc), or 3. make them safe but implementation-defined (either an error or a non-null aligned pointer). Or some combination of these, such as 1 in GlobalAlloc and 2 in Alloc.

    Option 2 in AlloC may allow Vec and other users to simplify their code and remove some special code paths for not making zero-size allocations.

We can add default methods in the future, but let’s not count on GlobalAlloc2. As far as I know the type system cannot express things like "either this or that trait". (And even if it could, let’s just not.)

For zero-sized allocations, I would like to point to some prior art in C++'s operator new:

If this argument is zero, the function still returns a distinct non-null pointer on success (although dereferencing this pointer leads to undefined behavior).

In practice (in both Clang/libc++ and GCC/libstdc++) this is implemented as a quick check that changes the size to 1 if it is zero. I believe that the overhead in this is sufficiently low since nobody in C++-land is complaining about it.

There are 3 approaches that we can take with regards to zero-sized arguments:

1) Keep things as they are: alloc is an unsafe function and passing a zero-sized layout is UB. This would apply to both Alloc and GlobalAlloc.

2) Allow zero-sized layouts in both Alloc and GlobalAlloc. In that case the Alloc implementation would just forward to GlobalAlloc, which itself would simply return the layout alignment if the size is zero (since the alignment is itself a non-zero and suitably aligned address). The alloc method in both traits would become a safe function.

3) Allow zero-sized layouts in Alloc but not GlobalAlloc. This would allow the zero-check in Alloc to be inlined into the caller and eliminated if the caller already performs a zero-check. My feeling is that the GlobalAlloc trait is not really meant to be used directly and should instead be used through the Alloc implementation on Global.

I personally prefer option 2 since it keeps the API consistent between Alloc and GlobalAlloc, and LTO will allow inlining which can eliminate the zero check in most cases.

@SimonSapin

We’re still sort of undecided about zero-size allocations. Alternatives are: 1. make them UB (alloc() is an unsafe fn),

@Amanieu

Keep things as they are: alloc is an unsafe function and passing a zero-sized layout is UB. This would apply to both Alloc and GlobalAlloc.

There is another alternative (implemented here: https://github.com/gnzlbg/rust/commit/806d11ba4a09e1f361a8456a8c7af36d17d58c65) which makes alloc and alloc_zeroed safe by making impl Layout { fn from_size_align(size: usize, align: usize) -> Result<Layout, LayoutErr> } return LayoutErr if size == 0. That is, what's UB is calling the unsafe fn Layout::from_size_align_unchecked with size == 0.

This allows making the size inside Layout a NonZeroUsize as well (along with the align), which isn't a big deal if the GlobalAlloc API does not propagate Layouts on errors, but which would allow shrinking the size of those enums further once we get niche filling for enums.

Option 2 in AlloC may allow Vec and other users to simplify their code and remove some special code paths for not making zero-size allocations.

Currently box does something like this where it just returns align as the address of a zero sized allocation, but RawVec is probably still going to need quite a bit of logic to handle zero-sized types anyways. I haven't tried but I wouldn't bet that it will simplify the implementation of RawVec significantly (maybe not at all).

pub fn from_null<T>(ptr: *mut T) -> Result<NonNull<Opaque>, AllocErr>

That name is somewhat odd, sounds like I have to pass it a null pointer? from_raw would be more accurate IMHO.

  1. make them guaranteed to return a non-null aligned pointer (such as NonNull::dangling, but that likely requires an explicit check in every impl, including in dealloc),

Seems like either every user or every implementer of GlobalAlloc has to special-case zero-sized types. Not sure which there will be fewer of, probably implementers? Also, couldn't there be some kind of wrapper provided by rustc that would take care of handling size 0? Essentially, putting the bruden on all allocators but then factoring out the common part so it is only implemented once?

Related, @glandium

many things assume ZSTs have an address in the null page, allowing the allocator to whatever it pleases with that sounds dangerous

Do you have an example? While libstd will frequently generate ZST pointers into the null page, why would anything rely on that? Notice that if you have a ZST field in a larger struct, taking the address of that field will give you a value that's well outside the null page! So anything making such assumptions is probably already incorrect.

from_null is a bit of a strawman. I made it a method so that is not yet-another-import. But naming it "form" is not really accurate since it doesn’t return Self or Result<Self, _>. It could also be a method of Opaque, but there too I’m not sure what name fits.

My preference for what to do with zero sized allocations is to just forbid zero sized Layouts. This makes alloc safe, and changes the Layout convenience methods like new<T> to return Option<Layout>. I don't think that's all that bad, since ~all allocation code is already branching on zero-sizedness and that could be changed to match on the result of Layout::new.

Seems like either every user or every implementer of GlobalAlloc has to special-case zero-sized types. Not sure which there will be fewer of, probably implementers?

This assumes that if GlobalAlloc handles zero-sized types users won't need to handle them themsselves. Under this assumption, I would agree that it would be better for GlobalAlloc to just handle zero-sized types since there will be fewer implementations of GlobalAlloc than users.

But this assumption doesn't hold. For example, RawVec and box need to handle zero-sized types independently of whether GlobalAlloc supports them or not:

  • RawVec needs to handle the fact that vectors of zero-sized types have always capacity isize::MAX, etc.
  • Box stores a Unique internally so it just manually checks for zero size, and stores the align_of as the non-zero address. If you'd get a NonZero pointer from an GlobalAlloc it would be undefined behavior to dereference this pointer.

I'd find it cleaner if GlobalAlloc would not support zero sizes or zero Layouts, but it would properly returning errors instead.

That is, if a user wants to allocate some memory, then they can use GlobalAlloc. If they try to allocate zero memory and are using the safe API, they get an Err. If they don't want to allocate any memory (e.g. because of ZSTs), they can't use GlobalAlloc so they have to handle this special case differently, but this is something they have to do anyways.

@gnzlbg @sfackler

The problem with disallowing zero-sized layouts is that Layout is used for more than just passing a size and alignment to an allocator. There are methods which allow you to construct complex layouts, which involves concatenating and repeating existing layouts. Disallowing zero-sized layouts would limit the usefulness of these methods (e.g. for generic types which might be ZSTs or zero-length arrays).

@gnzlbg

Currently RawVec must handle zero-length allocations in two cases: ZSTs, which can be optimized away at compile time, and zero-length arrays which depend on runtime information. Currently the code for RawVec still has checks for this second case, which could be eliminated if allocators support zero-length allocations.

As for Box, I don't understand what you are saying? A NonZero pointer returned by GlobalAlloc is still required to be properly aligned, which is exactly what I propose that the allocator implementation should do.

As for Box, I don't understand what you are saying? A NonZero pointer returned by GlobalAlloc is still required to be properly aligned, which is exactly what I propose that the allocator implementation should do.

I think I was a bit confused about what dereferencing a pointer to a ZST means. Thinking about it a bit more, I think that for box and Box we can probably move these branches from the box implementation into the allocator: https://github.com/rust-lang/rust/blob/master/src/liballoc/alloc.rs#L110 .

There are methods which allow you to construct complex layouts, which involves concatenating and repeating existing layouts.

With non-zero-sized Layouts, these methods still work as intended, since existing Layouts must be valid.

Disallowing zero-sized layouts would limit the usefulness of these methods (e.g. for generic types which might be ZSTs or zero-length arrays).

The helper methods for constructing layouts of a single type, or an array of objects of the same type also still work as intended. If the type is zero-sized, they return LayoutErr. That's still generic.

Currently RawVec must handle zero-length allocations in two cases: ZSTs, which can be optimized away at compile time, and zero-length arrays which depend on runtime information. Currently the code for RawVec still has checks for this second case,

It's not about whether the code can be optimized away, but whether the user still has to write it. The code also checks for the case in which the T in RawVec<T> is a ZST, and sets the capacity of the RawVec to a fixed value: https://github.com/rust-lang/rust/blob/master/src/liballoc/raw_vec.rs#L212

I don't see how supporting Layouts of zero size would allow RawVec to remove this code which most collections supporting ZSTs have in one form or another.

With non-zero-sized Layouts, these methods still work as intended, since existing Layouts must be valid.

That's not true, consider this example:

// Allocates an A followed by an array of B
fn alloc_a_b<A, B>(num_b: usize) -> Result<NonNull<Opaque>, AllocErr> {
    let a = Layout::new::<A>();
    let b = Layout::new::<B>();
    let bs = b.repeat(num_b).map_err(|_| AllocErr); // Only returns None on arithmetic overflow
    let layout = a.extend(bs).map_err(|_| AllocErr); // Only returns None on arithmetic overflow
    Global.alloc(layout)
}

Having to manually handle all the cases where the creation of the intermediate Layouts could fail would make this code much more complicated for no good reason. The HashMap implementation is an example of the sort of code which performs this sort of allocations (although it currently performs the layout calculation manually since the code dates from before Layout was introduced).

It's not about whether the code can be optimized away, but whether the user still has to write it.

There are quite a few place that check for cap == 0 which could be eliminated or greatly simplified if the zero check is moved into the allocator:

It seems to me GlobalAlloc is the wrong level to handle ZSTs and whatnot. The Alloc implementation for GlobalAlloc would be more appropriate.

@glandium This is the 3rd option that I suggested in my comment.

@Amanieu

With non-zero-sized Layouts, these methods still work as intended, since existing Layouts must be valid.

That's not true, consider this example: [...]

How does that example show that the methods do not work as intended?

Having to manually handle all the cases where the creation of the intermediate Layouts could fail would make this code much more complicated for no good reason.

playground:

fn alloc_a_b<A, B>(num_b: usize) -> Result<NonNull<Opaque>, AllocErr> {
    let a = Layout::new::<A>()?;
    let b = Layout::new::<B>()?;
    let bs = b.repeat(num_b)?;
    let l = a.extend(bs)?;
    Global.alloc(l)
}

Care to elaborate in which way is this code "much more complicated for no good reason" ? If zero Layouts are not allowed, and one uses the safe API, this code appears to me to be as complicated as it needs to be since there are a bunch of Layout calculations that can fail. Also, if all of them succeed, calling alloc is safe. In any case, at least from a readability point-of-view, I don't find it significantly less readable than the example you posted.

Your code doesn't do the same thing at all. For example, if A is not a ZST and B is, this would be expected to allocate for the size of A. With your code, that ends with a failure.

@glandium ah, that makes sense!

To handle that case without zero-sized Layouts and without support for zero-sized allocations in GlobalAlloc one does need quite a bit of extra code (playground):

fn alloc_a_b<A, B>(num_b: usize) -> Result<NonNull<Opaque>, AllocErr> {
    if let Ok(a) = Layout::new::<A>() {
        if let Ok(b) = Layout::new::<B>() {
            if let Ok(bs) = b.repeat(num_b) {
                let l = a.extend(bs)?;
                return Global.alloc(l);
            }
        } 
        return Global.alloc(a);
    }
    let b = Layout::new::<B>()?;
    let bs = b.repeat(num_b)?;
    Global.alloc(bs)
}

In particular, it is not clear whether the Layout operations failed due to overflow or due to a zero input size.

We could move all the "utility" methods that are used for building Layouts into a builder API (playground):

fn alloc_a_b<A, B>(num_b: usize) -> Result<NonNull<Opaque>, AllocErr> {
    let a = LayoutBuilder::new::<A>();
    let b = LayoutBuilder::new::<B>(); 
    let l: Layout = a.extend(b.repeat(num_b)).try_into()?;
    Global.alloc(l)
}

This is something that std wouldn't need to expose at first, but collections like HashMap could still use it internally.

Whether this is worth it isn't clear to me. This adds a bit of machinery but so does supporting zero-sized allocations :/

@Amanieu and I discussed (in person at impl days) this trait and the remaining open issues, consensus between us is to propose the changes below.

Implementation PR for these changes: https://github.com/rust-lang/rust/pull/51160

I opened #51163 as an example of how layouts can be used in practice. This code depends on Layouts supporting zero sizes to handle the cases where K and V are both ZSTs.

There is one point when @SimonSapin and I disagree: should Global implement GlobalAlloc.

Pros:

  • Alloc isn't going to be stabilized yet, so GlobalAlloc will be the only way for code to directly allocate memory (without using the Vec hack). Stabilizing Alloc will likely take a while since we will want to have some experience using the trait in the std collections before freezing it.

Cons:

  • I feel that the GlobalAlloc trait is intended only to allow a type to be used as #[global_allocator]. Specifying Global as the global allocator does not make sense since it would just recursively call itself.
  • Once Alloc is stabilized, we will want all user allocations to go through that trait. This means that we will effectively be deprecating the GlobalAlloc implementation on Global when Alloc is stabilized.

Here is what I feel are the remaining issues:

  • Behavior of allocating zero-sized layouts (for Alloc and GlobalAlloc).
  • Should Global implement GlobalAlloc?
  • Bikeshed: Opaque? Global?

    • At this point I would strongly prefer just sticking with the existing names unless there is a strong consensus to switch to a different one.

The impl GlobalAlloc for Global is a bit icky, I agree. One of the alternatives we discussed at the all hands is free functions for alloc/dealloc/realloc in std::heap. They'll end up deprecated either way, but it does avoid a kind of confusing trait impl, as well as having a type that implements two traits with methods of the same names.

I'm OK with global functions, with the understanding that they will be deprecated once Alloc is stabilized.

I’ve previously argued that if we have a set of free functions that have the exact same signature as a trait, they might as well be a trait impl. But it is a good point that functions would be easier to deprecate, so I’m fine with that as well.

Right now we can't deprecate trait impls AFAIK: https://github.com/rust-lang/rust/issues/39935

@Amanieu

For zero-sized allocations, I would like to point to some prior art in C++'s operator new:

If this argument is zero, the function still returns a distinct non-null pointer on success (although dereferencing this pointer leads to undefined behavior).

In practice (in both Clang/libc++ and GCC/libstdc++) this is implemented as a quick check that changes the size to 1 if it is zero.

AFAICT this is not exactly how it is implemented in neither libc++ [0] nor libstdc++ [1]. When a size of zero is requested, the size is as you mention changed to one, but then, a real memory allocation for this new size is performed with the requested alignment, and a unique pointer to the address is returned.

That is, for allocations of zero size, C++ calls the system allocator, potentially performing a context switch to the kernel, etc.

I believe that the overhead in this is sufficiently low since nobody in C++-land is complaining about it.

C++ does not have zero-sized types: all types are at least 1 byte in size. For example, if you create an array of length zero of an empty type struct A{}; auto a = new A[0];, that array still has sizeof(A[0]) == 1, so the raw allocation functions will not be called with size == 0. In fact, I think triggering a call to new(size = 0) is actually really hard unless one starts writing very weird C++ code, and probably impossible if one writes modern C++.

If your language does not have a concept of zero-sized types, zero-sized allocations do not really make much sense, and I argue that the real reason nobody is complaining about these is because nobody is triggering calls to new(size = 0) in C++ yet in such a way that the answer isn't "don't write that code, it's horrible".

Many C++ developers have, however, complained about the lack of zero-sized types in the language, and ZST are already part of the C++20 standard and opt-in in some situations, but none that could trigger new(size = 0) AFAICT. Basically, empty types can opt-in to becoming zero-sized when used within other types only.

I think a better example than C++ new would be C malloc, which people do have to manually pass a size argument all the time. The C11 standard says (7.20.3 Memory management functions):

If the size of the space requested is zero, the behavior is implementation defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.

Which if the implementation decides to return something different than null means:

The pointer returned if the allocation succeeds is suitably aligned [...]. The lifetime of an allocated object extends from the allocation until the deallocation. The pointer returned points to the start (lowest byte address) of the allocated space. Each such allocation shall yield a pointer to an object disjoint from any other object.

So AFAICT for zero-sized allocations a compliant C11 implementation returns a pointer that cannot be dereferenced but can be freed. There are no requirements on the pointers to be unique. Since NULL can be freed without issues (its a no-op), it can return the same address for all zero-sized allocations, or some other address, or a unique address for each like C++ does.

There is a potentially interesting presentation about the topic [2] discussing it in the context of security vulnerabilities, but these probably don't apply to Rust and I haven't gone through it all yet.

[0] https://github.com/llvm-mirror/libcxx/blob/master/src/new.cpp#L71
[1] https://github.com/gcc-mirror/gcc/blob/da8dff89fa9398f04b107e388cb706517ced9505/libstdc%2B%2B-v3/libsupc%2B%2B/new_opa.cc#L95
[2] Automated vulnerability analysis of zero sized heap allocations - Julien Vanegue - Microsoft Security Engineering Center (MSEC) Penetration testing team: http://www.hackitoergosum.org/2010/HES2010-jvanegue-Zero-Allocations.pdf

I believe that the overhead in this is sufficiently low since nobody in C++-land is complaining about it.

I would like to clarify that the point I'm trying to make: The overhead of the extra zero-check in the common case (non-zero length allocation) is tiny to the point of being irrelevant. As such, we should aim to provide an API with fewer footguns, especially considering that ZSTs are much more common in Rust.

By the way, I would like to add another data point for how allocation APIs handle zero sizes. jemalloc's internal API (je_mallocx, which we use instead of je_malloc in Rust's jemalloc wrapper) has undefined behavior if a size of zero is passed [1]

[1] http://jemalloc.net/jemalloc.3.html

The mallocx() function allocates at least size bytes of memory, and returns a pointer to the base address of the allocation. Behavior is undefined if size is 0.

The libs team discussed this thread today today and consensus was to:

  • Stick with the status quo for zero-size allocations: alloc() is an unsafe fn and callers must ensure that the layout is not zero-size. We felt that making this method safe is not a significant win since the kind of code calling it needs typically needs unsafe anyway (for dereferencing the returned pointer or of deallocating), and didn’t feel confident relying on (Thin)LTO and Sufficiently Advanced optimizers to not regress performance

  • Also keep raw pointers in GlobalAlloc rather than results. Rationale: https://github.com/rust-lang/rust/pull/51160#issuecomment-392871051

  • Remove the Opaque type entirely and use pointers to u8. If we’re not waiting on extern types, an dedicated struct does not buy us much over u8. And not having a new type avoids entirely questions of its name or what traits it should implement.

  • Replace impl GlobalAlloc for Global with a set of free functions

  • Finally stabilize this subset (after modifications listed above) of the API:

/// #[global_allocator] can only be applied to a `static` item that implements this trait
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { 
        // Default impl: self.alloc() and ptr::write_bytes()
    }
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        // Default impl: self.alloc() and ptr::copy_nonoverlapping() and self.dealloc()
    }

    // More methods with default impls may be added in the future
}

#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct Layout { /* private */ }

impl Layout {
    pub fn from_size_align(size: usize: align: usize) -> Result<Self, LayoutError> {…}
    pub unsafe fn from_size_align_unchecked(size: usize: align: usize) -> Self {…}
    pub fn size(&self) -> usize {…}
    pub fn align(&self) -> usize {…}
    pub fn new<T>() -> Self {…}
    pub fn for_value<T: ?Sized>(t: &T) -> Self {…}
}

#[derive(Copy, Clone, Debug)]
pub struct LayoutError { /* private */ }

/// Forwards to the `static` item registered
/// with `#[global_allocator]` if any, or the operating system’s default.
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {…}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {…}
unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {…}
unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {…}

/// The operating system’s default allocator.
pub struct System;

impl GlobalAlloc for System {…}

Update: also:

pub fn oom(layout: Layout) -> ! {…}

The C99 Rationale V5.10 document has rationale on why malloc supports zero-sized allocation (section 7.20):

The treatment of null pointers and zero-length allocation requests in the definition of these
functions was in part guided by a desire to support this paradigm:

OBJ * p; // pointer to a variable list of OBJs
/* initial allocation */
p = (OBJ *) calloc(0, sizeof(OBJ));
 /* ... */
/* reallocations until size settles */
while(1) {
p = (OBJ *) realloc((void *)p, c * sizeof(OBJ));
/* change value of c or break out of loop */
}

This coding style, not necessarily endorsed by the Committee, is reported to be in widespread use.

Some implementations have returned non-null values for allocation requests of zero bytes. Although this strategy has the theoretical advantage of distinguishing between “nothing” and “zero” (an unallocated pointer vs. a pointer to zero-length space), it has the more compelling theoretical disadvantage of requiring the concept of a zero-length object. Since such objects cannot be declared, the only way they could come into existence would be through such allocation requests.

The C89 Committee decided not to accept the idea of zero-length objects. The allocation functions may therefore return a null pointer for an allocation request of zero bytes

It is not 100% clear to me what this rationale means, but it appears to mean that they decided to allow malloc(0) because, since C does not have ZSTs, too many people were emulating them with malloc(0) in the wild before standardization.


About C++, I've asked on stackoverflow, and s Bo Persson pointed to C++'s standard 9th defect report:

Scott Meyers, in a comp.std.c++ posting: I just noticed that section 3.7.3.1 of CD2 seems to allow for the possibility that all calls to operator new(0) yield the same pointer, an implementation technique specifically prohibited by ARM 5.3.3. Was this prohibition really lifted? [...]

Proposed resolution:

[...]

Change 3.7.3.1/2, next-to-last sentence, from :

If the size of the space requested is zero, the value returned shall not be a null pointer value (4.10).

to:

Even if the size of the space requested is zero, the request can fail. If the request succeeds, the value returned shall be a non-null pointer value (4.10) p0 different from any previously returned value p1, unless that value p1 was since passed to an operator delete.

[...]

Rationale:

So it seems that in the original draft new(0) could return any non-zero pointer, including the same one for different allocations, but "ARM 5.3.3 specifically prohibited it" (no rationale given) so they changed it to the current wording. The relevant section of the C++ standard has a footnote saying that the intent of new is to be implementable by a call to malloc modulo the zero size exception.

@SimonSapin

If the Layout used to call alloc cannot have zero-size, what did the libs team give as rationale for
allowing Layouts of zero size in the first place?

@gnzlbg The use case of intermediate possibly-zero-size Layouts being used to build larger Layouts given earlier in this thread https://github.com/rust-lang/rust/issues/49668#issuecomment-392580785 sounds reasonable to me. Having a separate LayoutBuilder API seems like a lot just to allow alloc() to be safe (given that callers typically need some unsafe code anyway).

@SimonSapin

  • Is alloc allowed to return a null ptr, if so, what does that mean?

Basically, the trait API looks ok, but what is missing there is the comments expressing the function requirements, semantics, and guarantees.

Right, this still needs lots of docs to write out the API contracts. Returning null indicates an error or failure. Maybe the OS is out of memory, maybe the given Layout is not supported at all by this allocator (e.g. align overflowing u32 on macOS’s System allocator)

Calling alloc with a zero-size layout is undefined behavior, a violation of the unsafe fn contract. Meaning: impls are allowed to do anything in that case, including returning null.

Thanks for typing this up @SimonSapin! That all looks accurate and good to me

I'm not too happy with the lack of Result and not handling zero-sized allocations, but I can accept it with the knowledge that we still have a chance to revise the API later on with the Alloc trait. However we should make it clear (as I discussed with @SimonSapin at Rustfest) that the free alloc::alloc() functions will be deprecated as soon as Alloc is stabilized, in favour of Global.alloc().

As a side note, should we add a global_ prefix to the free functions? A bare alloc might be confusing in a module full of allocation-related definitions (e.g. Alloc which differs only by casing). However this is only a weak objection, I was just wondering if the option had been discussed in the libs team meeting.

I don't think a global_ prefix is all that necessary. It's a free function so it has to be talking to the global allocator, right? There aren't any other options.

Or a global sub-module? So the full path might be std::alloc::global::alloc()

Stabilization PR: https://github.com/rust-lang/rust/pull/51241 (finally!)

Was this page helpful?
0 / 5 - 0 ratings