Rust: Tracking issue for slice::{ref_slice, mut_ref_slice}

Created on 13 Aug 2015 · 39 Comments · Source: rust-lang/rust

This is a tracking issue for the unstable ref_slice feature in the standard library. These functions seem fairly innocuous but it's also somewhat questionable how useful they are. They're probably ready for either stabilization or deprecation as-is.

cc @nikomatsakis

B-unstable T-libs final-comment-period

All 39 comments

Speaking personally, I love these functions. I don't need 'em often, but when I do, I'm happy I have them. =)

Although they use unsafe code internally, the implementations of these functions are one-liners (based on slice::from_raw_parts{,_mut}) that can be easily replicated outside of std. So I wouldn't miss them too much.

(Compare with functionality that may be rarely used but is impossible to reasonably implement outside of std because e.g. it requires accessing private fields of std types.)
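For readers following along, the one-liners described above might look roughly like this. It is a sketch built on slice::from_raw_parts{,_mut}, not necessarily the exact std source; the safety comments are my own reading of why the wrappers are sound.

```rust
use std::slice;

/// Converts a reference to `T` into a slice of length 1 (without copying).
pub fn ref_slice<T>(s: &T) -> &[T] {
    // SAFETY (assumed reasoning): `s` points to exactly one valid, aligned,
    // initialized `T`, and the returned slice borrows for the same lifetime.
    unsafe { slice::from_raw_parts(s, 1) }
}

/// Converts a mutable reference to `T` into a mutable slice of length 1.
pub fn mut_ref_slice<T>(s: &mut T) -> &mut [T] {
    // SAFETY (assumed reasoning): as above; the exclusive borrow of `s`
    // is carried over to the slice, so no aliasing is introduced.
    unsafe { slice::from_raw_parts_mut(s, 1) }
}
```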

Nominating for 1.5 resolution (I'd personally want to deprecate)

It's a safe abstraction of unsafe code, I think it would be suited for the standard library.

Just to be clear for where I use this (in case you don't know), it's in a situation like this. Often I have an enum where the variants have different numbers of things associated with them (let's say integers here). Sometimes it's handy to match and only have to deal with the exact right number of integers for each case, but other times I want to just walk over all the integers. For some reason, it often happens that the variants have either 0, 1, or N such things. This works out quite nicely:

enum Foo {
    A, // no integers
    B(u32), // 1 integer
    C(Vec<u32>), // >1 integer
}

impl Foo {
    fn get_integers(&self) -> &[u32] {
        match *self {
            Foo::A => &[],
            Foo::B(ref i) => ref_slice(i),
            Foo::C(ref v) => v,
        }
    }
}

This seems to come up semi-regularly for me. (Sadly, the methods that let you go from &(T, T) to &[T] were removed, or it would work even for an arbitrary number of integers.)
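For anyone trying the pattern above on a current toolchain: std later gained std::slice::from_ref (and from_mut), which does the same job as ref_slice. A minimal sketch of the same enum using it, assuming Rust 1.28 or newer:

```rust
use std::slice;

enum Foo {
    A,           // no integers
    B(u32),      // 1 integer
    C(Vec<u32>), // >1 integer
}

impl Foo {
    fn get_integers(&self) -> &[u32] {
        match self {
            Foo::A => &[],
            // `slice::from_ref` plays the role of `ref_slice` here.
            Foo::B(i) => slice::from_ref(i),
            Foo::C(v) => v,
        }
    }
}
```

Callers can then loop over `foo.get_integers()` without caring which variant they hold.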

The other situation where this fn comes in handy for me is when you have some generic helper that takes a &[T], but you happen to have an &T lying around. You can easily reuse this helper.

I think it's ok for this to live in libstd. It's not really duplicating anything else; I mean it's true you can write them in terms of other APIs, but only by dropping into unsafe code and going into a whole 'nother level of abstraction.

(That said, if there were to be another crate that ALSO included the ability to convert tuples into slices and so forth, that's more interesting. Though that's probably something we can't "safely" do until we agree to commit to the way tuples are represented in memory, though I could never imagine us changing this.)

This issue is now entering its cycle-long FCP for resolution in 1.5

The libs team was a little up in the air about deprecation or stabilization here, comments are certainly welcome!

:+1: It's somewhat superfluous, yes, but being that it's a safe wrapper around unsafe code it's not a bad abstraction overall. It'd just end up in some utility crate somewhere, like the Apache Commons for Rust.

Please stabilize these. Personally, I use them for reading single bytes from a reader (with read_exact).
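The single-byte read described above could be sketched like this. The helper name read_u8 is hypothetical, and std::slice::from_mut (available in later Rust versions) stands in for mut_ref_slice:

```rust
use std::io::{self, Read};
use std::slice;

/// Reads exactly one byte from `reader` (hypothetical helper).
fn read_u8<R: Read>(reader: &mut R) -> io::Result<u8> {
    let mut byte = 0u8;
    // View the single `u8` as a `&mut [u8]` of length 1 so that
    // `read_exact` can fill it directly, with no temporary array.
    reader.read_exact(slice::from_mut(&mut byte))?;
    Ok(byte)
}
```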

The documentation says "Converts a pointer to A into a slice of length 1 (without copying)." Shouldn't "pointer" be "reference"?

Yes.

Definitely stabilize. Don't force me to use unsafe for something so simple.

Personally, I would like to see them deprecated. They are simple wrappers around unsafe code but are not marked as unsafe. This is somewhat confusing and inconsistent.

@photino They are not marked unsafe because it's safe to convert any &T or &mut T where T: Sized to, respectively, &[T] or &mut [T] with a length of 1. The layout of the data in the pointed-to area of memory is the same, it doesn't extend the lifetime of the reference, and it doesn't change an &T to an &mut [T].

This is absolutely consistent and is actually a very concise example of a core Rust idiom: that safe wrappers can be created around unsafe code if they can guarantee certain invariants. Here, the invariants are just very easy to enforce.

@cybergeek94 Thanks for your explanation. I can't figure out any other reasons for not providing such safe wrappers.

The libs team discussed this during triage today and the decision was to deprecate these functions. The standard library currently doesn't house many nuggets of functionality like this (e.g. see the recent discussions around Box::leak), and we've also deprecated similar functionality (Option::as_slice).

While certainly useful from time to time, it's not clear that these functions belong in the standard library right now. There can always be a helper crate implementing all sorts of conversions like this, however!

That seems like a really weird decision. This is such a fundamental conversion between primitive datatypes it probably belongs at the language level (ie. &x as &[T]). In C they _are_ the same type. The standard library has stuff for doing networking, hashing, spawning processes and threads all of which could be spun off into separate libraries but if I want to treat a value as an array of length one I need an auxiliary crate?

What does it matter if people only need it from time to time? It's such a trivial function it's not like it's adding bloat.

I mean it's like saying "Oh sure you can parse the file table of a fat32 drive out-of-the-box but if you want to upcast a u32 to a u64 you'll need to download the int_conversion crate". That's kinda weird.

Would have liked to see these stabilized as well as I was using ref_slice a while back but when converting the code to only use stable features I had to store a 1-sized array instead https://github.com/Marwes/embed_lang/blob/master/vm/src/vm.rs#L485-L491 (forgot about from_raw_parts at the time).

They may be easy enough to implement and put in a crate but having a crate with just these two functions seems a bit annoying (in the same sense as @canndrew is saying) and I am having a hard time coming up with other similar functions to fit into this.

@alexcrichton

If we want this, we also need to ensure TBAA knows T and [T; 1] are the same type (because a &[T; 1] can alias the interior of an Option<T>) - so this probably belongs in the standard library.

@canndrew , @Marwes

I think the fundamental-ness of these kinds of methods is somewhat subjective, for example I don't think this is "as fundamental" as a conversion like pointer/length to a slice. These methods can be trivially implemented elsewhere (e.g. mem::transmute::<&T, &[T; 1]>(ptr)) and a safe wrapper can be placed around them. Conversions like slices, on the other hand, need to happen in the standard library or with some standard structure for the ordering of the ptr/length fields.
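The "implemented elsewhere" route mentioned above might look like the sketch below. A raw pointer cast stands in for mem::transmute::<&T, &[T; 1]>, the name ref_slice_via_array is hypothetical, and the whole thing relies on T and [T; 1] having the same layout:

```rust
/// Goes from `&T` to `&[T]` by way of `&[T; 1]` (hypothetical helper).
fn ref_slice_via_array<T>(r: &T) -> &[T] {
    // SAFETY (assumption): `[T; 1]` has the same size and alignment as `T`,
    // so reinterpreting the pointer is valid for the same lifetime.
    let arr: &[T; 1] = unsafe { &*(r as *const T as *const [T; 1]) };
    // `&[T; 1]` then coerces to `&[T]` safely.
    arr
}
```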

Also note that this isn't a permanent-until-the-end-of-time decision. We can always choose to include these functions in the future!


@arielb1

That sounds... worrisome! I do not understand any of the specifics, but it seems worth opening a separate issue to track that.

@arielb1 we don't currently employ any TBAA, right?

@alexcrichton
@Gankro

LLVM supports something called "TBAA", which marks every memory access with the type-path used to access it.

The idea is that each C "object" has exactly one type, so if you have code like the following

struct foo {
    int a[16];
};
struct bar {
    int b;
};
void test(struct foo* foo, struct bar *bar) {
    for(int i=0; i<16; i++)
        foo->a[i] += bar->b;
}

then bar can't point into the middle of foo (because otherwise the "object" would need to have the types both struct foo and struct bar), and the compiler can hoist the load of bar->b out of the loop. This is implemented by giving every memory access a type-path (struct foo → int → memory and struct bar → int → memory in this case) and assuming accesses with incompatible paths don't alias (in C, accesses to char use the tag memory, and therefore are compatible with everything).

The problem is that nothing in safe Rust is preventing us from having [u32] → u32 → memory, or even [u32; 1] → [u32] → u32 → memory, as a TBAA tag, because you can't access a u32 field in a struct via a &[u32] array reference, or even a subslice of a &[u32; 3] via a &[u32; 1] (AFAIK). On the other hand, this will break ref_slice (as e.g. Option<u32> → u32 → memory isn't compatible with [u32] → u32 → memory), and in addition any theoretical method turning a &[u32] to a &[u32; 1] after a bounds check. We need to decide which semantics we prefer, and having ref_slice in std decides it solidly.

Anyway, we don't use TBAA at this moment, and it is not as important because of Rust's aliasing guarantees, but we want to reserve the right to add it in the future.
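To make the two cases above concrete, here is a small sketch of the access patterns in question. The function names are hypothetical; std::slice::from_ref stands in for ref_slice, and the TryFrom conversion from &[T] to &[T; N] (both from later Rust versions) stands in for the "theoretical method" with a bounds check:

```rust
use std::convert::TryFrom;
use std::slice;

/// A `&[u32]` that points into the interior of an `Option<u32>`.
fn interior_slice(opt: &Option<u32>) -> &[u32] {
    match opt {
        Some(x) => slice::from_ref(x),
        None => &[],
    }
}

/// A `&[u32; 1]` obtained from a `&[u32]` after a length check.
fn first_as_array(s: &[u32]) -> Option<&[u32; 1]> {
    // `get(..1)` performs the bounds check; `try_from` checks the length.
    <&[u32; 1]>::try_from(s.get(..1)?).ok()
}
```

Both functions read the same u32 through a type path other than the one it was stored under, which is the kind of access a strict TBAA scheme would have to permit.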

(FWIW, I am not a fan of TBAA in general. I think the gains are too small relative to the risks of people using casts and transmutes and not being aware of the TBAA-rules.)


Yeah I was under the impression that TBAA had become quite maligned in recent years.

Although upon reflection I may have heard that from @nikomatsakis ^_^

Yeah I was under the impression that TBAA had become quite maligned in recent years.

https://lwn.net/Articles/316126/ :D
And I've seen a lot of similar frustration on isocpp forums from people doing low-level programming.

On the other hand I've heard from Clang developers that TBAA is crucial for C++ performance (and similarly for performance of Rust raw pointers, I suppose), that's why they push for even stricter aliasing rules.

My opinion was also informed by a conversation with Paul Pedriana, who said that at EA they disable TBAA because it's too hard for them to reason about and buys too little. (I also find the rules hard to reason about, personally.) In any case, I suspect the importance of TBAA for optimization will depend a lot on what piece of code we're talking about, also -- i.e., maybe it is crucial for auto-vectorization, where more reordering is necessary, but less crucial for other things, or something like that. Anyway, somewhat off-topic for this thread I guess (though @arielb1's original point was correct, that rules like TBAA will affect whether ref_slice is valid unsafe code or not).

/me disappointed by decision to remove ref_slice, but moving on with his life. ;)


If anyone still wants these functions, I put them in a small crate: https://crates.io/crates/ref_slice

@nikomatsakis

Will we specify that ref_slice is legal?

The lang team decided that &mut T -> &Cell<T> wasn't crazy, so I imagine tupling on a 1_usize to a pointer is OK.
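For comparison, the &mut T -> &Cell<T> conversion referred to above is available in later Rust versions as Cell::from_mut. A minimal sketch (the function name is hypothetical):

```rust
use std::cell::Cell;

/// Mutates a `u32` through two shared `&Cell<u32>` handles.
fn bump_through_cells(value: &mut u32) -> u32 {
    // Exclusive access is traded for a shared, interior-mutable view.
    let cell: &Cell<u32> = Cell::from_mut(value);
    let alias = cell; // shared references are `Copy`
    cell.set(cell.get() + 1);
    alias.set(alias.get() + 1);
    cell.get()
}
```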

@huonw Do you have a link to that decision?

That crate and its documentation. (You might be able to get the actual things that were written down at the time in the lang team meeting minutes somewhere in July or August, but I don't think they'll offer anything more than that link.)

@steveklabnik Thanks for the crate.

@notriddle Thanks.

NB. that discussion isn't the lang team (I believe it was an independent invention of the same idea). The notes that were taken are at the end of https://github.com/rust-lang/meeting-minutes/blob/master/lang-team/2015-07-23.md

@huonw Thank you, I tried finding them, but I didn't.

Note that with the recent effort to reduce the number of dependencies in some projects, downstream crates end up copying these functions into their own code rather than taking on one more dependency.

This is IMO quite unfortunate, and it means more unsafe code for everyone to review.

https://github.com/CraneStation/cranelift/blob/c47ca7bafc8fc48358f1baa72360e61fc1f7a0f2/cranelift-codegen/src/ref_slice.rs#L1-L8

OH WOW, I never saw that. Thanks!
