Rust: Consider reverting the merge of collections into alloc

Created on 7 Jul 2017 · 74 comments · Source: rust-lang/rust

This PR merges the collections crate into the alloc crate, with the intent of enabling this PR.

Here are some reasons against the merge:

  • It is a violation of layering / separation of responsibilities. There is no conceptual reason for collections and allocation to be in the same crate. It seems to have been done to solve a language limitation, for the enablement of a fairly marginal feature. The tradeoff does not seem worth it to me.

  • It forces any no_std projects that want allocation to also take collections with them. There are presumably use cases for wanting allocator support without the Rust collections design (we know the collections are insufficient for some use cases).

  • It gives the std collections crate special capabilities that other collections may not implement themselves - no other collections will be able to achieve feature parity with the conversion this merger is meant to enable.

Personally I value the decomposition of the standard library into individual reusable components and think the merge is moving in the wrong direction.

I am curious to know what embedded and OS developers think of this merge. cc @japaric @jackpot51.

C-bug P-high T-libs final-comment-period

All 74 comments

@brson I'm not sure I agree with the reasons you laid out. In my mind liballoc provides the ability to depend on dynamic memory allocation, but nothing else. On top of that one assumption you can build up the entire crate (pointers + collections) with no more assumptions than libcore already has. In that sense I don't see this as a violation of layering but rather just giving you one layer and then everything else that it implies.

It forces any no_std projects that want allocation to also take collections with it.

Is this a problem? Linkers gc'ing sections should take care of this? Isn't this equivalent to libcore bringing utf-8 formatting when you otherwise don't need it?

It gives the std collections crate special capabilities that other collections may not implement themselves

Do you have an example of this?

I'm in favor of undoing the merge.

In general, I'd prefer not to reduce the number of crates in the std facade. I'm
afraid that these seemingly "small" changes may eventually become hard, if not
impossible, to undo once stable std API depends on these crates being merged
(cf. HashMap confined to std even though it doesn't depend on OS mechanisms).
And that may lock us from creating a set of stable fine grained standard /
blessed crates for no-std development.

In the particular case of alloc / collections it feels odd to have a full suite
of collections available to me when I'm implementing the Allocator trait.
Things are likely to go wrong if I use those collections in the implementation
of my allocator, right? I think allocators are not supposed to do any dynamic
allocation for their internal data structures.

Personally, I'd prefer if alloc only contained the Allocator interface and
Arc was in another crate since it depends on dynamic allocation.

cc @Ericson2314 @whitequark

If we're going to separate this crate then I think we need to be super clear about why. For example the previous incarnation of alloc I don't think made any sense to keep collections separate because it already had smart pointers and such. If we wanted to remove everything then we should just move the Alloc trait to core in my opinion, not make a whole new crate for just a trait. The purpose of alloc is to provide the assumption of a global allocator, notably the Heap type and value.

Things are likely to go wrong if I use those collections in the implementation
of my allocator, right?

Yes, and to me this is not a reason to separate the crates. This is already a "problem" in the standard library where the "fix" is otherwise too unergonomic to work with.

It gives the std collections crate special capabilities that other collections may not implement themselves - no other collections will be able to achieve feature parity with the conversion this merger is meant to enable.

If cargo had the ability to run with multiple different forks of std (see this RFC), this would be much less of an issue.

Is this a problem? Linkers gc'ing sections should take care of this? Isn't this equivalent to libcore bringing utf-8 formatting when you otherwise don't need it?

It is a problem. I do not think depending on the linker to throw away unwanted code is a reasonable excuse to make unsatisfactory architectural decisions. If this were an out-of-tree project where people have to spend CPU time compiling code, then telling people to just merge your abstraction layers, compile it all, and throw most of it away later, would not be acceptable.

Do you have an example of this?

Yes. The entire motivation for this merge is to follow it up with this PR, which is seemingly not possible without the collections being literally in the alloc crate. No other collections crate can achieve this.

For example the previous incarnation of alloc I don't think made any sense to keep collections separate because it already had smart pointers and such.

This was indeed a weakness of the previous alloc crate. I'd love to pull the smart pointers out if possible, and would love to not make the conflation worse.

If we wanted to remove everything then we should just move the Alloc trait to core in my opinion, not make a whole new crate for just a trait

I don't think this follows. Each layer in the facade adds new capabilities. Alloc is a crucial capability that separates different classes of Rust software.

I'd love to pull the smart pointers out if possible, and would love to not make the conflation worse.

So merging alloc and collections happened in order to be able to impl<T> From<Vec<T>> for Arc<[T]>. If instead Arc (and Rc) moved to collections, the impl would also be fine -- right?

If the goal is to minimize library code bloat, should we move to libbtree_map, libbtree_set, libvec, libvec_deque, libbinary_heap, etc?

@brson thanks for reopening discussion, and @japaric thanks for the ping.

Personally I value the decomposition of the standard library into individual reusable components and think the merge is moving in the wrong direction.

Exactly. I have a few reasons why I think combining facade crates relating to allocation is bad, but I want to start positively with why I think breaking facade crates further apart in general is good. Writing this with @joshlf's help.

Small, quickly compiled binaries are nice, but more important is the efficiency of the development process. We want a large, diverse, and flexible no-std ecosystem, because ultimately we are targeting a large diverse space of projects. Small libraries are key to this both because they reduce the cognitive load of everyone developing those libraries, and allow more developers to participate.

For cognitive load, it's important both to be utterly ignorant of the code that doesn't matter---as in, don't know what you don't know---and to incrementally learn the code that does. --gc-sections or super lazy incremental compilation strip away the noise after the fact, but do nothing to help the developer reading the code do the same. Small libraries help ensure as little dead code as possible, allowing the developer and tools alike to stay happily ignorant of everything else out there. For incremental learning, it's crucial to be able to digest the code one small bite at a time. Rust libraries allow all sorts of intra-crate (inter-module) cycles, meaning that given a big unknown library, it can be really hard to know where to begin as a first-time reader. But fine-grained dependency graphs between crates enforce acyclicity, so both a top-down learning approach (start with the nice abstractions) and a bottom-up one (complete understanding every step of the way) have clear curricula.

For distributed development and diverse goals, the benefits are probably more obvious. Lighter coupling means less coordination so more people can scratch their own itches in isolation. But it also allows complexity to be managed by splitting the ecosystem into small, modular components - this allows people to take only what they need, and thus only opt-in to as much complexity as their application requires.

All this begs the question---how far do we go down the road of splitting coarse crates into fine crates? I think quite far. It is my hope that as crates behind the facade and the language mature, more crates will be able to be written in stable (or otherwise trustworthy) code, and be moved outside rust-lang/rust into their own repo. Likewise, std should be able to (immediately and transitively) depend on third party crates just fine---the lockfile keeps this safe. Rustbuild, and my RFC #1133 are our friends here. To put it another way, there should be as little magic in rust-lang/rust as possible because magic cannot be replicated outside of rust-lang/rust. By splitting crates, we decrease the risk that a given piece of near-stable code will be bundled with code that cannot be stabilized, thus preventing the former from ever becoming stabilized and thus "de-magicked" in the sense of becoming eligible for being moved outside of rust-lang/rust. This runs counter to the arguments in favor of the collections/alloc merge.


In concrete terms, I recall there being talk of incorporating lazy_static into std, and I know there was talk of using parking_lot for locks too. lazy_static optionally uses spin and so should parking_lot, but spin probably shouldn't be exposed in std because it's a bad idea in hosted environments outside of bootstrapping such core abstractions. Yet it would also be weird if spin were included in the main repo as a dependency but not exposed: I generally like the idea of shrinking, not enlarging, the main repo, but would feel especially weird about including non-reexported functionality in it.

Back to allocation in particular, @joshlf's been doing some great work with object allocation (type-parametric allocators that only allocate objects of a particular type and cache initialized objects for performance), and it would be nice to use the default collections with that: the fact is, most collections only need to allocate a few types of objects, and so could work with object allocators just fine. Now if we were to combine alloc and collections and the object allocator traits in one crate, that would be a very unwieldy crate playing not just double but triple duty.


if we wanted to remove everything then we should just move the Alloc trait to core in my opinion, not make a whole new crate for just a trait.

Besides the technical reasons, as @japaric and I mentioned elsewhere, including anything allocation related in core, even something as harmless as a trait that need not be used, will scare away developers of real-time systems.

OTOH, I like the idea of allocator types being usable without Heap. Heap represents "ambient authority" for allocation, to use the capabilities lingo. As a general principle, ambient authority is suspicious, and in particular this complicates safely booting an OS where one doesn't want to allocate accidentally before any allocator is set up.

Also, there's an analogous problematic cycle to avoid when defining the static allocator even if it is initialized statically: For crates which implement global allocators it's incorrect for them to use Heap, yet so long as it's in the same crate as the alloc traits, there's nothing to prevent them from doing so. Technically the language still wouldn't stop them if it's in another crate, but at least they would have to import something suspicious explicitly. This is just like @japaric's concern with allocators accidentally using collections.

CC @eternaleye

Given @Ericson2314 's comment, we (again, jointly) would like to make a proposal for how all of these various components might be structured.

We have the following goals:

  • The Alloc trait should have no dependencies other than core. In particular, it should not depend on other items currently in the same crate, such as collections or the Heap.
  • collections should have no dependencies other than core and the Alloc trait (e.g., it should not depend on the Heap); it should be possible for a no-std program to define its own allocator and use it to back the standard collections.
  • On the other hand, it is desirable for ergonomics to provide variants of each of the collections that default to using the Heap.

Thus, we propose the following:

  • The Alloc trait should either be in core or in its own crate (we'll call it alloc here). The former is simpler, but may be off-putting to embedded/real-time developers, and goes against the original philosophy of core - the bare minimum of nice abstractions to define the always-needed parts of the runtime, not every self-contained abstraction under the sun. The latter involves making a tiny crate whose sole purpose is to hold Alloc and related types (although it would become less tiny if an ObjectAlloc trait were added later). We lean towards a separate crate, but either works.
  • collections should be its own crate with dependencies only on core and alloc (if it exists). It should be no-std.
  • Heap can either be defined in its own crate with dependencies only on core and alloc (if it exists) or simply be defined in std.
  • std will re-export all of the collections with a default Alloc parameter of Heap. There may be other default values that std needs to set as well (e.g., the OS-dependent randomness provider for HashMap).

@sfackler

If the goal is to minimize library code bloat, should we move to libbtree_map, libbtree_set, libvec, libvec_deque, libbinary_heap, etc?

I don't think that the arguments provided suggest that that should happen. Keeping all of the collections together doesn't increase the difficulty of understanding how things work because, for the most part, collections do not depend on one another, and do not share magic under the hood. From a usability perspective, they're logically related, so it would not be surprising to a developer to find them together in the same crate. The arguments we and others have presented do suggest splitting collections into its own thing - separate from, e.g., Alloc - but do not suggest further splitting collections into multiple crates.

All this begs the question---how far do we go down the road of splitting coarse crates into fine crates? I think quite far.

@joshlf ^ seems to imply to me that it would suggest that level of granularity.

Ah, we definitely didn't mean to imply that. @Ericson2314 can weigh in when he gets a chance, but speaking for myself, I would interpret that as "quite far within reason." I don't think that our reasoning provides a good argument for splitting collections up into separate crates, although maybe @Ericson2314 will disagree and will think we should even go that far.

Well I... wouldn't fight that level of granularity if many others want it :).

Trying to think of a principle for why it's more important to separate alloc from collections than each collection from the others, I arrived at a sort of tick-tock model where one crate (tick) adds some new capability, and the next (tock) builds a bunch of things with the capabilities added so far (it's "marginally pure"). Crates like alloc or kernel bindings (e.g. libc without the other junk) would be ticks, while collections and the old libsync would be tocks. I think it's more important that the ticks be minimal than the tocks.

The entire motivation for this merge is to follow it up with this PR, which is seemingly not possible without the collections being literally in the alloc crate. No other collections crate can achieve this.

@brson: Just a minor correction here: The referenced PR doesn't require collections to be within the alloc crate. It only requires Vec to be visible from the alloc crate, in order to implement From<Vec<T>> for Rc<[T]> and Arc<[T]>. Prior to the merge, this was not possible as it would have meant a cyclical dependency.

An out-of-tree collections crate would be able to make the same impls, as it could have both alloc and collections as dependencies.

@murarth

Just a minor correction here: The referenced PR doesn't require collections to be within the alloc crate. It only requires Vec to be visible from the alloc crate, in order to implement From<Vec<T>> for Rc<[T]> and Arc<[T]>. Prior to the merge, this was not possible as it would have meant a cyclical dependency.

Presumably the visibility requirement exists because Rc and Arc are defined in alloc? If that's the case, would moving them either to their own crate or into collections solve the problem?

@joshlf: Yes, that's correct.

In that case, that'd be my suggestion - to either make a separate box crate (or similar name) or to move them into collections.

@brson

I do not think depending on the linker to throw away unwanted code is a reasonable excuse to make unsatisfactory architectural decisions. If this were an out-of-tree project where people have to spend CPU time compiling code, then telling people to just merge your abstraction layers, compile it all, and throw most of it away later, would not be acceptable.

I don't think I entirely agree with this. The whole premise of this crate is that we're shipping it in binary form so the compilation time doesn't matter too much. We absolutely rely on gc-sections for so many other features that I think the ship has long since sailed on making it an optional part of how we invoke the linker.

I think it's also important to keep this all in perspective and extra concrete. On 2017-06-01 liballoc took 0.3s to compile in release mode and libcollections took 3s. Today (2017-07-11) liballoc takes 3.2s to compile in release mode. This is practically nothing compared to crates in the ecosystem.

Yes. The entire motivation for this merge is to follow it up with this PR, which is seemingly not possible without the collections being literally in the alloc crate. No other collections crate can achieve this.

I think this is too quick an interpretation, though. As @murarth pointed out above, we're not empowering std collections with extra abilities. Any collection outside std can have these impls.

I don't think this follows. Each layer in the facade adds new capabilities. Alloc is a crucial capability that separates different classes of Rust software.

I think my main point is that we should not automatically go back to what we were doing before. I believe the separation before didn't make sense, and I believe the current separation makes more sense. If there's a desire to separate the concept of allocation from the default collections that's fine by me, but I don't think we should blindly attempt to preserve what existed previously, which wasn't really all that well thought out (the alloc crate looked basically exactly the same as when I first made it ever so long ago).


I'd personally find it quite useful if we stick to concrete suggestions here. The facade is full of subtle intricacies that make seemingly plausible proposals impossible to implement today and only marginally possible in the future.

One alternative is the title of this issue, "Consider reverting the merge of collections into alloc". I have previously stated why I disagree with this.

Another alternative by @joshlf above relies on defining collection types that don't know about Heap. This is not possible today because you can't add default type parameters after-the-fact. This may be possible one day but it is not possible today.

I also furthermore disagree with the rationale that keeps Alloc out of libcore. There's no technical reason that I know of why it can't be in libcore and I'd personally find it a perfect fit for libcore. I can imagine a good number of contexts that want libcore, not libstd, and still use dynamic allocation.

@alexcrichton

Another alternative by @joshlf above relies on defining collection types that don't know about Heap. This is not possible today because you can't add default type parameters after-the-fact. This may be possible one day but it is not possible today.

I wasn't thinking that you'd add default type parameters after-the-fact, but rather re-export as a newtype. E.g., in collections:

pub struct Vec<T, A: Alloc> { ... }

And then in std:

use collections::vec;
use heap::Heap;

pub type Vec<T, A: Alloc = Heap> = vec::Vec<T, A>;

I also further more disagree with rationale that keeps Alloc out of libcore. There's no technical reason that I know of why it can't be in libcore and I'd personally find it a perfect fit for libcore. I can imagine a good number of contexts that want libcore, not libstd, and still use dynamic allocation.

That's fine - as we mentioned, keeping Alloc out of core is merely a slight preference, and our proposal still works if Alloc is in core.

The whole premise of this crate is that we're shipping it in binary form so the compilation time doesn't matter too much.

no-std devs will be recompiling it.

3.2s to compile in release mode

How much longer does it take to build a final binary that depends only on alloc not collections? I suspect that will tell a different story?

As @murarth pointed out above

Huh? The issue is that Vec, Arc, and Rc need to live in the same crate, but that crate need not contain the allocator traits. I'd say we do indeed have a problem, and while moving all 3 of those to collections is a good step, there's still a problem because anyone else writing their own vec-like thing runs into the same issue.

I think my main point is that we should not automatically go back to what we were doing before.

I'd personally find it quite useful if we stick to concrete suggestions here.

I think there is some consensus besides you that the first step could be making a smaller alloc than before: with no Rc or Arc, and maybe not Box either? Heap and the Alloc traits would stay in alloc, and then as a second step either the traits would move to core, or Heap would move to its own crate.

This is not possible today because you can't add default type parameters after-the-fact. This may be possible one day but it is not possible today.

@joshlf beat me to it with an alias (or, if that fails, a newtype). CC @Gankro because HashMap and the hasher are 100% analogous.

I can imagine a good number of contexts that want libcore, not libstd, and still use dynamic allocation.

So can I, but nobody was saying just put it in libstd! libcore <- liballoc <- {libcollections | liballoc-system} <- libstd: I can see any sub-graph (including libcore) being useful.

  • libcore alone: hard real-time
  • libcore, liballoc: soft real time---carefully denying ability to do dynamic allocation to time-critical sections.
  • libcore, liballoc, liballoc-global: Linux kernel module

@joshlf

I wasn't thinking that you'd add default type parameters after-the-fact, but rather re-export as a newtype. E.g., in collections:

Yes I understand, and I'm re-emphasizing that this does not work today. Whether it will ever work is still up in the air. As a result it's not a viable proposal at this time.


@Ericson2314

How much longer does it take to build a final binary that depends only on alloc not collections? I suspect that will tell a different story?

No, I highly doubt it. Everything in collections is generic which is pay-for-what-you-use.

Huh? The issue is that Vec, Arc, and Rc need to live in the same crate, but that crate need not contain the allocator traits.

This is missing the point. @brson originally thought that by moving Vec to liballoc we're empowering the Vec type with a capability not available to any other collection. This is not correct; the only problem was that the Box type could not reference the Vec name, due to how we organized the crates. This point has nothing to do with the allocator traits.

there's still a problem because anyone else writing their own vec-like thing runs into the same issue.

Can you articulate precisely what you think this problem is?

I think there is some consensus besides you that the first step could be making a smaller alloc than before: with no Rc or Arc, and maybe not Box either?

I disagree with two things here. I don't think it makes sense to couple Heap and Alloc. The Alloc trait is a generic form of allocation, assuming nothing. The Heap type assumes there is a global allocator available and that's quite significant. The second part I disagree with is a "smaller alloc", which I think should be called core::heap. The core::heap module would not contain the Heap type.

@joshlf beat me to it with an alias (or, if that fails, a newtype). CC @Gankro because HashMap and the hasher are 100% analogous.

Again, this is not possible. I was the one that put HashMap in std because it wouldn't fit in collections. We have not solved this problem, this is not a viable solution today. Whether it's a possibility in some future compiler needs to be explicitly accounted for as it means we cannot take action at this time.

Let's consider this from another point of view. I see the current crate hierarchy as follows:

  • libcore: no allocation, no I/O
  • liballoc: infallible allocation, no I/O
  • libstd: infallible allocation, hosted I/O

As a no-std/embedded developer, I do not see any practical use in having what's currently in liballoc split into any number of crates. It is permeated by infallibility on every level, from Box to BTreeMap (and we don't even have HashMap on no-std in the first place...) If I have to replace parts of it, I'm going to have to start from Box, and go downhill from there. Otherwise, there's no reason to use only alloc but not collections; Vec is as fundamental as Box.

The savings in terms of code size do not exist because the majority of embedded software is compiled with LTO and at least opt-level 1 even for debugging; without LTO, libcore alone would blow through the entire storage budget, and without opt-level 1, the generated code is all of: too large, too slow, and too hard to read at that special moment you have to read assembly listings.

It seems straightforward and obviously good to put the Alloc trait into libcore, since there is no cost for doing so and then it can become a common interface for any allocators people implement.

It seems straightforward and obviously good to put the Alloc trait into libcore, since there is no cost for doing so and then it can become a common interface for any allocators people implement.

If you, @whitequark, as an embedded developer, don't mind putting the trait in libcore, then I think your opinion overrides mine and @Ericson2314's since we aren't embedded devs :)

@joshlf Mostly it's that I don't really understand the argument for keeping it out. The argument for having separate libcore and liballoc goes: libcore's only dependency in terms of linking is libcompiler_builtins, and it introduces no global entities, whereas liballoc introduces linking dependencies to the __rust_allocate, etc, symbols, as well as such global entities as an OOM handler. This presents clear practical argument for keeping them separate.

A trait that isn't used has no cost.

@alexcrichton

Everything in collections is generic which is pay-for-what-you-use.

Right; my apologies; I forgot about that.

Can you articulate precisely what you think this problem is?

I thought it was a coherence issue. If it's a name-reachability issue with Box, that still counts: the ability to be named in the implementation of Box is a special ability only available to types in the same crate as Box or an upstream crate.

We have not solved this problem, this is not a viable solution today.

Ah, there is some naming confusion here because @joshlf used a type alias. A wrapper struct is what I consider a newtype, and that would work, right? Wrapping every inherent method or using a one-off trait is annoying, but works.

@whitequark

Yes, an unused trait is harmless with regard to what the code does and what the final binary looks like. But

  • I want to be able to blacklist liballoc, and then be sure that no library I link is trying to use the trait behind my back (unlikely but....).

  • I don't expect real-time devs coming from C to immediately grasp the harmlessness of an unused trait.

  • Finally, I think everything in core should either be a lang item, or be used (immediately or transitively) by a lang item in core---adding other stuff is just making libcore needlessly big even if those items are harmless when ignored.

@whitequark I hope that teaching collections to work with an infallible global allocator will be far algorithmically easier than teaching them to work with a fallible (global or local) allocator.

@Ericson2314

A wrapper struct is what I consider a newtype, and that would work, right?

In a technical sense, yes. In a more practical sense we've discussed doing this in the past and have always decided against it because then std::vec::Vec and a hypothetical collections::vec::Vec wouldn't unify, bifurcating the ecosystem.
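To illustrate with a tiny stand-in (hypothetical names, nothing to do with the real types): the newtype is a genuinely different type, so code written against one spelling rejects the other.

use std::marker::PhantomData;

struct InnerVec<T, A> { _marker: PhantomData<(T, A)> } // stand-in for an allocator-parameterized Vec
struct DefaultAlloc;                                    // stand-in for the global allocator type

// Newtype re-export: a distinct type, not another name for the same type.
struct StdVec<T>(InnerVec<T, DefaultAlloc>);

fn takes_inner<T>(_v: &InnerVec<T, DefaultAlloc>) {}

fn main() {
    let v: StdVec<u32> = StdVec(InnerVec { _marker: PhantomData });
    // takes_inner(&v);  // rejected: StdVec<u32> and InnerVec<u32, DefaultAlloc> do not unify
    takes_inner(&v.0);   // callers must unwrap/re-wrap at every API boundary instead
}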

Yes, an unused trait is harmless with regard to what the code does and what the final binary looks like. But

I personally do not view it useful to belabor this point about balking at the mere presence of allocation. Do you really think that someone would shun Rust immediately because the Alloc trait exists in core but they would not shun Rust because of:

  • 128-bit integers
  • float support
  • atomics
  • utf-8 decoding/parsing
  • fallible core::cell types
  • panicking support

Are any of those really that "less offensive" than an allocator trait? If so, why? I don't think you're designing for the right audience.

@Ericson2314

I don't expect real-time devs coming from C to immediately grasp the harmlessness of an unused trait.

I think you should give more credit to embedded developers. Aggressive dead code elimination is a baseline requirement in this space. And many vendor board support packages do come with some sort of allocator (usually a bad one). It's not like we're frightened by the word "alloc".

I hope that teaching collections to work with an infallible global allocator will be far algorithmically easier than teaching them to work with a fallible (global or local) allocator.

This makes no real difference to embedded development. For simple, non-mission-critical things using a single common heap is usually fine enough, and for more serious applications infallible allocators aren't cutting it. For that matter, any mature library would have to consider allocation failures, and yes, it means that right now it's nearly impossible to implement a mature Rust library that uses dynamic allocation.


@alexcrichton

Let's go through this list.

  • float support

    Actually, higher-end microcontrollers come with floating-point support; e.g. the Cortex-M4F series. We're using one right now in ionpak to implement a PID controller, and we similarly use soft-float in ARTIQ. There's nothing thorny about libcore's floats, not even the fact that you cannot turn them off; the marginal usefulness of getting a slightly more intelligible error is well offset by all downstream code being infected by #[cfg(float)] or something.

  • atomics

    On small single-core systems the largest supported atomic size is also the largest register size, so atomic memory accesses lower into normal ones. On symmetric or even asymmetric multiprocessor systems, assuming the microarchitecture includes cache coherency and the silicon vendor didn't screw it up (lucky!), atomics are absolutely necessary.

  • utf-8 decoding/parsing

    This is a perfectly reasonable library routine that would be stripped by DCE.

  • fallible core::cell types

    This is actually a serious issue. For example, the soft-CPU we use in ARTIQ doesn't have a debug port; even if it had, the amount of legwork required to get JTAG working with a soft-CPU inside an FPGA is inordinate; and even if I got that working, or used a gdb stub (which I also tried), gdb support for the architecture is broken beyond belief and I could never get it to do anything. There's no lldb support. And I had to patch binutils just to get everything to link, because it resolved a relocation wrong. Good times...

    You say "what a horror", for me it's Tuesday. I ended up writing a borrow_mut! macro that embeds the file:line into every site where RefCell::borrow_mut() is called, and relying more on Cell as well as other ways of managing state.

  • panicking support

    There is no problem with panicking _per se_. It's just that every embedded dev out there permanently sets panic = abort and forgets about it. Even if I was nuts enough to implement stack unwinding for an embedded platform (in fact, I did, but that was for other purposes), it's far too slow and finicky to be used as a mechanism of error recovery. You need caches for the DWARF walking code to become fast, and enabling the unwinder caches makes your timings unpredictable. It's better to let the watchdog reset the entire thing, just to be sure.

  • 128-bit integers

    Oh boy this one is bad. It's particularly contentious because right now it means Rust cannot be used on a large number of architectures with less-tested LLVM backends that break on sight of an arithmetic op on i128, and eager codegen for libcore is basically stress-testing the backend. At M-Labs this means we cannot upgrade the Rust compiler used in ARTIQ for the time being, for example.

@whitequark oh I think we're in agreement, what you mentioned above mirrors what I'm thinking. @Ericson2314 seems to be claiming that allocation scares away embedded devs, so I was pointing out a number of other features already in libcore that could hypothetically be construed as scaring away embedded devs. I personally believe that none of those is an impediment to embedded development in Rust.

OK, so it seems like here's where we are:

  • We agree that the Alloc trait (and friends) can go in core
  • We haven't yet agreed on whether collections should have a dependency on Heap

Does that sound accurate?

@joshlf on the first point yes, on the second point I believe we don't have an option. We can't have a type alias which adds a default (not implemented language feature) and having a wrapper struct introduces two types that don't unify. At this point in time I think the only answer is "yes, Vec must mention Heap"

We haven't yet agreed on whether collections should have a dependency on Heap

Speaking of this, why can't the libcollections collections have an alternative constructor that takes a &Alloc? This gives us infallible local allocators without the backwards compatibility nightmare. Sure it is one more indirection, but is it really that expensive?

@alexcrichton

We can't have a type alias which adds a default (not implemented language feature) and having a wrapper struct introduces two types that don't unify. At this point in time I think the only answer is "yes, Vec must mention Heap"

I see the reasoning here, but I worry that if we do that, then we are hamstringing ourselves moving forwards. In particular, if collections have a default Alloc parameter of Heap, and then in the future we gain the ability to have the type alias trick actually work, we won't be able to drop the default parameter without breaking code that relies on it.

@whitequark

Speaking of this, why can't the libcollections collections have an alternative constructor that takes a &Alloc? This gives us infallible local allocators without the backwards compatibility nightmare. Sure it is one more indirection, but is it really that expensive?

I'm not sure I follow - what problem is this trying to solve, and how?

I'm not sure I follow - what problem is this trying to solve, and how?

Nevermind, what I thought of wouldn't work anyway.

@alexcrichton

In a more practical sense we've discussed doing this in the past and have always decided against it because then std::vec::Vec and a hypothetical collections::vec::Vec wouldn't unify, bifurcating the ecosystem.

Well, if it's a temporary measure (resolved before crates are stable), then I think that's perfectly fine.

Also, I forgot about doing type Vec<T> = Vec<T, GlobalAlloc>; in std. It might adversely impact type inference, but otherwise should be fine, right? Not exposing any custom allocator at all in std is not a regression.

We could just wait and hope the chalk work injects some new energy into this, but I'd rather get an earlier start---work on cleaning up the crates behind the facade has long been stalled on things like this.

Do you really think that someone would shun Rust immediately because the Alloc trait exists in core but they would not shun Rust because of: ...

I would like to put those behind feature gates, but fair point: Alloc and friends could be behind one in core too. The important thing is a crate can declare which of these it uses, and be held to that standard.

Also, I forgot about doing type Vec<T> = Vec<T, GlobalAlloc>; in std. It might adversely impact type inference, but otherwise should be fine, right? Not exposing any custom allocator at all in std is not a regression.

I totally hadn't thought of that! That works. Also, I don't think it affects type inference. At the very least, this toy example works fine: https://is.gd/2KN7aC
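Roughly, a self-contained toy along those lines (stand-in names, and the linked example may differ): the alias is just another name for the instantiated generic type, so the two spellings unify.

use std::marker::PhantomData;

struct GenericVec<T, A> { _marker: PhantomData<(T, A)> } // stand-in for an allocator-parameterized Vec
struct GlobalAlloc;                                      // stand-in for the default/global allocator

// The non-generic alias std could expose.
type StdVec<T> = GenericVec<T, GlobalAlloc>;

fn takes_generic<T>(_v: &GenericVec<T, GlobalAlloc>) {}

fn main() {
    let v: StdVec<u8> = GenericVec { _marker: PhantomData };
    takes_generic(&v); // accepted: StdVec<u8> is just another name for GenericVec<u8, GlobalAlloc>
}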

Anyways, given this realization, I put my support firmly in this camp. I think this addresses all of our (my and @Ericson2314's) original concerns, and addresses (or comes close to addressing) @alexcrichton's concerns about re-exporting in std.

I would like to put those behind feature gates, but fair point: Alloc and friends could be behind one in core too. The important thing is a crate can declare which of these it uses, and be held to that standard.

Are you suggesting a separate feature gate beyond the one we're already using for the alloc API (allocator_api)?

It would be pretty bizarre to have to pull in libcollections and explicitly import Vec from there whenever you wanted to switch out the allocator.

The important thing is a crate can declare which of these it uses, and be held to that standard.

This doesn't seem useful to me. A crate can allocate memory without ever touching liballoc or libstd or even libcore. It could call out to mmap directly or drag in its own allocator, however that works.

@sfackler it's a stop gap. If we officially give up on default parameters in aliases, then we can reconsider.

I'm sort of losing track of how this discussion is related to this issue. https://github.com/rust-lang/rust/issues/42774 is the tracking issue for dealing with custom allocators in standard collections. This issue is instead about reverting the change to fuse liballoc/libcollections.

@alexcrichton Sorry for increasing the scope of this, but they do overlap in that before the merge HEAP and collections did not live in the same crate, and yet they still would if we just move Alloc out into core.

To move towards finishing up this issue. I now suggest:

  1. Your idea: move Alloc and friends to libcore. I don't really like it alone as the end result, but I think we all agree it is still better than the status quo.

  2. Moving everything remaining but HEAP out of liballoc into libcollections, so liballoc just deals with global allocation. Even if collections needs HEAP, we don't want collections to have special powers from being in the same crate as HEAP. @japaric likes this I think, @whitequark seems to think it's pointless as the larger issue is that Rust is unsafe/unergonomic at fallible allocation (I agree that is a big issue), @brson I'm guessing would like it. You I'm guessing wouldn't.

  3. In #42774 decide whether collections need depend on the new liballoc containing just HEAP.

  4. I'll open a new issue for either feature-gating Alloc in libcore or moving it to its own crate.


@joshlf I meant Cargo features, not language features, for those parts of libcore which may not be relevant/supported on all platforms. The portability lint might also work.

I don't think your point (2) is actionable in the absence of discussion about point (3); it seems to assume a particular outcome. This is also notably missing feedback from @japaric and @brson

Right, I don't want to put words in their mouths. I'm guessing/hoping based on the rest of the thread.

(2) Doesn't depend on (3)---there would be "libcollections -> liballoc (just with heap) -> libcore (with Alloc trait)" dependencies to begin with, which would only be changed if (3) went through. I still prefer more little libraries, and the deferred-global tricks used to implement Heap are exactly the sort of tricks I'd expect to defer the stabilization of collections were we not to do this. Conversely, with the split crates, collections may find itself just using stable features before those tricks are stabilized.

I may be biased coming from the "no operating system except for a heap" perspective of no_std usage, but to me things like Box and Vec are foundational and one of my biggest pain points as a no_std user has been trying to use them from no_std environments. There are crates I want to leverage from the greater Rust ecosystem with features that depend on Box and Vec, and right now that's not easy.

My singular goal getting allocators working in my code was always to unlock Box, Vec, and String (and in some cases things like BTreeMap), and in that regard I thought the original liballoc/libcollections merger made sense. Having a "batteries included" set of heap allocated types that come free with an allocator is much more important to me than being able to craft my own artisanal data structures, although I could see how that might be different for someone writing a kernel.

I concur with @tarcieri, but add one more issue: IO traits. These depend on String being available, and lack of any way of using it is seriously harming embedded libraries. Right now there are several crates attempting to provide Read/Write, they're incompatible and some of them are rather limited.

Yeah, I've also been bitten by IO traits. Also: std::error::Error

Why on earth is std::error::Error not in core?

It used to be, however my understanding is it has some entanglements with Box. There are some comments about it here too:

https://github.com/rust-lang/rust/blob/master/src/libstd/error.rs#L45

We discussed moving it into liballoc at the error-chain meeting.

@tarcieri

My singular goal getting allocators working in my code was always to unlock Box, Vec, and String (and in some cases things like BTreeMap), and in that regard I thought the original liballoc/libcollections merger made sense. Having a "batteries included" set of heap allocated types that come free with an allocator is much more important to me than being able to craft my own artisanal data structures, although I could see how that might be different for someone writing a kernel.

I assume the current proposal of moving the Alloc trait to libcore but keeping Heap in libcollections and having collections take an Alloc parameter that defaults to Heap works for this use case?

This is perhaps getting a bit off topic, but what I really want is Box, String, and Vec added to the prelude if an #[allocator] is defined:

https://internals.rust-lang.org/t/sgx-target-and-no-operating-system-x86-x86-64-targets-in-general/5493/5?u=bascule

@tarcieri the best solution to that is custom preludes.

@Ericson2314 custom preludes don't really solve the problem I'd like to solve, which is allowing crates to leverage Box, Vec, and String from no_std without explicit dependencies on nightly. Since custom preludes are scoped to individual crates, every crate that wants to consume Box, Vec, and/or String still needs to do #[feature(alloc)] (or if this change goes through, #[feature(collections)] again)

The whole point of automatically adding them to a core/alloc prelude would be to remove this explicit dependency. Then use of Box, Vec, and/or String could just be gated on cargo features with no explicit ties to nightly.

@tarcieri I'm not sure what to tell you, sorry. Some insta-stable trick to expose items through abnormal means based on whether anything implements allocator anywhere is... unwise in my book. I'd say stabilize the crates quicker, but @whitequark brings up a good point that our current story around handling allocation failure in everything but the allocator trait itself is atrocious: unergonomic and unsafe. I'm loath to stabilize the "beneath the facade" interfaces until that is fixed.

unergonomic and unsafe

What? That's the exact opposite of reality. It is safe (because crashes are safe), and it's ergonomic, because explicitly handling allocation failures in typical server, desktop, or even hosted embedded software has a high cost/benefit ratio.

Consider this. With mandatory explicit representation of allocation failure, almost every function that returns or mutates a Vec, Box or String would have to return Result<X>, where X is an error representing allocation failure. The caller then has two choices, both of them bad:

  • either call unwrap, or
  • use ? and return Result<Error>, which may fail because Error has a Box in it!
    If the crate defines its own error type, then that can contain OutOfMemory, but this still pessimizes standalone impls. We'll also need to add OutOfMemory to io::ErrorKind.

Also, let's say you have a public function that returns T. You want to optimize it by using a hashmap or a cache or something, but that means it'd now need to return Result<T>, and that breaks your public interface. So you use unwrap(), resulting in essentially the same thing as what we have today, but with ugly source code (don't we all say that unwrap is an antipattern?) and bloated machine code (because of the inlined condition).

The only remotely workable solution I see is extending the API of Box, Vec, etc, adding a fallible sibling to every function that may allocate. But I believe the libs policy is explicitly against that.

To add to this, the majority of small embedded devices whose memory is truly scarce cope in one of two ways:

  • they partition memory statically or early in boot, so that any fallible allocations, e.g. network packet buffers, happen in pools and can be easily handled, or
  • they acknowledge that recovering from an out-of-memory condition is a complicated affair--you have to have some logic for what to free, you may just have memory too fragmented, etc--and instead, reset, just like in a case of a panic resulting from a violated invariant, an assertion, a lockup, a cosmic ray hitting the CPU core, etc.

As such, I feel like the primary structure providing fallible allocations would be some sort of memory pool. This can be easily handled outside the alloc crate.

As a former libs team member, I'm not opposed to adding try_push, try_reserve, etc to the stdlib at this point in time. Someone just needs to put in the legwork to design the API, which I think was partially blocked on landing allocator APIs -- to provide guidance on what allocation error types are like -- when this first came up.

I believe the gecko team broadly wants these functions, as there are some allocations (often user-controlled, like images) which are relatively easy and useful to make fallible.
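For concreteness, a minimal sketch of the shape such try_* siblings could take (the container, method names, and error type here are hypothetical stand-ins, not a proposed std API):

#[derive(Debug)]
struct AllocErr; // stand-in for whatever allocation-failure error type the real API would use

struct BoundedVec<T> {
    items: Vec<T>,
    limit: usize, // stand-in for "how much the allocator is willing to hand out"
}

impl<T> BoundedVec<T> {
    fn with_limit(limit: usize) -> Self {
        BoundedVec { items: Vec::new(), limit }
    }

    // Fallible counterpart to push: surface allocation failure to the caller
    // instead of aborting the process.
    fn try_push(&mut self, value: T) -> Result<(), AllocErr> {
        if self.items.len() == self.limit {
            return Err(AllocErr); // a real implementation would hit this when the allocator refuses to grow the buffer
        }
        self.items.push(value);
        Ok(())
    }
}

fn main() {
    let mut v = BoundedVec::with_limit(1);
    assert!(v.try_push(1u32).is_ok());
    assert!(v.try_push(2u32).is_err()); // the caller decides how to recover
}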

@whitequark Sorry, I meant handling allocation in the client is unergonomic/unsafe. Everyone agrees that what we do today is both safe and ergonomic, but not flexible in that regard.

As a former libs team member, I'm not opposed to adding try_push, try_reserve, etc to the stdlib at this point in time.

Then this sounds like the best solution to me, yeah.

This thread has gone in a lot of different directions, and I'm having
a hard time following it now, but it is originally about how to
respond to the out-of-process merge of the collections and alloc
crates.

I'm afraid I framed this incorrectly by presenting a defense of the
unmerged crates, but the onus here is on proponents of merged
collections and alloc crates to justify that design, and to do it in a
way that is fair to the rest of us. I don't believe that argument has
been made successfully.

This was a major breaking architectural decision, done out of process.
In cases like this our default stance should be, and usually is, to
revert. This needs to be done before the change hits stable. We have
about 6 weeks right now.

I suggest we take the following actions:

  • Immediately un-deprecate the collections crate to stem the downstream churn

  • Revert the merge of collections and alloc

  • Implement #43112 some other way. It has been said that it can be
    done without the merge, so please do it.

  • Proponents of changing the std facade write an RFC

My gut feeling is there are some complex interrelationships between these types which are difficult to model given the previous alloc/collections split. I think that was the motivation for the merge in the first place.

As an example, at the error-chain meeting we discussed moving std::error::Error to liballoc. My gut feeling (and it seems at least @whitequark agrees) is that it belongs in libcore. However, because blanket impls for Box required knowing that &str: !Error (or so say the comments in error.rs), it's coupled to Box, which I guess was the motivation for moving it to std in the first place.

This means any crates that want to work with no_std can't use std::error::Error, which unfortunately is a big blocker for making much of the extant crate ecosystem available in no_std.

I guess my question is: is there still a path forward for unlocking std::error::Error in no_std if this merge is reverted? Could std::error::Error be moved to libcollections instead of liballoc? (which feels even weirder, but I'm not sure there are other options due to its entanglement with Box)

@tarcieri with or without the revert, it can go in alloc. If Box is moved to collections, it can go in there. [Based on http://aturon.github.io/blog/2017/04/24/negative-chalk/, I now think we actually will be able to get negative impls (but not necessarily user-writable negative bounds). Then it can be in core. But I digress.]

@brson Ah, I wasn't actually sure what the process is for changing crates whose very existence is unstable. Thanks for clarifying.

My apologies if that's all orthogonal to the revert. If that's the case, I have no objections.

A couple of process points:

  • I don't know that this unstable arrangement of crates needed the full RFC process, but it should've gotten buy-in from more stakeholders, clearly.

  • That said, my feeling is that this has landed on nightly, it's already caused churn, and it would be better to see if we can quickly reach a rough consensus on the direction we want to take things before changing direction. We should try to minimize pain and confusion in the ecosystem while we try to correct our process mistake.

From the libs team meeting, it sounded like @alexcrichton was quite confident that the original crate organization was buying us nothing, and that the problems people are interested in solving are better addressed in some other way. I think it'd be good for @alexcrichton and @brson to sync up, and then summarize their joint perspective on thread, before we make more changes here.

There doesn't seem to have been any progress on this in the last 3 weeks; and PR #42565 is blocked on this resolving one way or the other. What steps do we need to take to unstick this?

"Watch me hit this beehive with a big ol' stick", he said, pressing Comment

42565 was the sort of thing I was alluding to earlier. Is there a path forward for that if the merger were to be reverted?

42565 was the sort of thing I was alluding to earlier. Is there a path forward for that if the merger were to be reverted?

Yes -- move just Arc and Rc to libcollections. As long as Arc, Rc and Vec are all in the same crate, the orphan impl check (or whatever we call it in Rust ;) should go through.
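A rough single-crate illustration with stand-in types (not the real Vec/Rc/Arc): once both types are local to the same crate, the conversion impl in question is plainly allowed.

struct LocalVec<T>(Vec<T>);         // stand-in for the collections Vec
struct LocalArc<T: ?Sized>(Box<T>); // stand-in for Arc; a real Arc would be reference-counted

impl<T> From<LocalVec<T>> for LocalArc<[T]> {
    fn from(v: LocalVec<T>) -> Self {
        LocalArc(v.0.into_boxed_slice())
    }
}

fn main() {
    let a: LocalArc<[u32]> = LocalVec(vec![1, 2, 3]).into();
    assert_eq!(a.0.len(), 3);
}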

I've re-read this entire thread, and to me the key points are:

Clarity of layering. I think @whitequark put it best:

  • libcore: no allocation, no I/O
  • liballoc: infallible allocation, no I/O
  • libstd: infallible allocation, hosted I/O

though we are working toward supporting fallible allocation in liballoc as well. But the point is that there's a crystal clear rationale for this coarse-grained organization, in terms of requirements imposed.

Crates providing features that aren't used. It's true that the crates.io ecosystem in general skews toward small crates, but these core library crates are a very important exception to that. The thread has made clear that bloat isn't an issue (due to linkers), nor is compilation time (which is trivial here, due to generics).

Special treatment of core crates. Grouping type and/or trait definitions together can enable impls that are not possible externally. However, (1) the very same issues apply to any choice of breaking up crates and (2) the standard library already makes substantial and vital use of its ability to provide impls locally.

Separating collections from the global heap assumption. There is not currently a plausible path to do this, but with some language additions there may eventually be. But by the same token, this kind of question also seems amenable to a feature flag treatment.

Personally, I find the new organization has a much more clear rationale than the current one, and is simpler as well. Stakeholders from impacted communities (the no_std and embedded communities) also appear to be on board. I think we should stay the course.

@rfcbot fcp close

Team member @aturon has proposed to close this. The next step is review by the rest of the tagged teams:

  • [x] @BurntSushi
  • [x] @Kimundi
  • [x] @alexcrichton
  • [x] @aturon
  • [x] @brson
  • [x] @dtolnay
  • [x] @sfackler

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Bummer. Then I suppose the next fronts for pushing for modularity are:

:bell: This is now entering its final comment period, as per the review above. :bell:

Please don't revert it. This change has been out in the wild for a long time, and many projects have updated to the new organization of alloc. The new organization has benefits as listed above (https://github.com/rust-lang/rust/issues/40475), and the downsides seem to be smaller than the benefits, especially considering that us folks tracking the nightly, and using the allocation API, have already adjusted to it.

I'm going to close this issue now that the full libs team has signed off. (@jackpot51, to be clear, that means: we will not revert.)
