rust 🚀 - repr(packed) allows invalid unaligned loads

I... honestly thought that was expected behaviour. What can possibly be done about this is general, other than making references into packed fields a different type (or forbidding them)?

Gankra on 16 Jul 2015

This is UB, so it's clearly not expected... but yes. it's more of a design flaw than an implementation issue.

Note that it's possible to make code like this misbehave even without SIMD, although it's a bit trickier; LLVM performs optimizations based on the alignment of loads.

eefriedman on 16 Jul 2015

Yeah, I only used SIMD because it was the simplest way to demonstrate the problem on x86. I believe platforms like ARM are generally stricter about load alignments, so even, say, u16 may crash in simple cases like the above.

huonw on 16 Jul 2015

Is there a way to tell LLVM that the value could be unaligned, and so LLVM should emit code that tries to read it in a safe way for the given architecture, even if it results in slower code?

retep998 on 21 Jul 2015

@retep998: That's exactly what LLVM should have done for a packed struct. This looks like a LLVM codegen bug to me.

vadimcn on 21 Jul 2015

@vadimcn this is entirely our (or rather: my) fault. We used to not emit alignment data at all, which caused misaligned accesses for small aggregates (see #23431). But now we unconditionally emit alignment data purely based on the type that we're loading, ignoring where that value is stored. In fact at the point where we create the load, we currently don't even know where that pointer comes from and can't properly handle packed structs.

dotdash on 21 Jul 2015

Perhaps we need some sort of alignment attribute that we can attach to pointers?

retep998 on 21 Jul 2015

In general we have no idea where a reference comes from, e.g. fn foo(x: &f32x4) { let y = *x; ... in some external crate can't know if we happen to pass in &val.y.0 in the code above. The only way to codegen to handle unaligned references properly is to assume pointers are never aligned, which would be extremely unfortunate. We don't currently have type-attributes, so it would be somewhat strange to introduce one here, instead of using types (for example).

huonw on 21 Jul 2015

Can we make creation of references into packed structs unsafe?

vadimcn on 22 Jul 2015

triage: P-high

nikomatsakis on 23 Jul 2015

Seems clear there's a bug here. First step is to evaluate how widely used packed is, so huon is going to do a crater run with #[repr(packed)] feature gated.

nikomatsakis on 23 Jul 2015

I made a patch that feature gated repr(packed) and did a crater run. Results: https://gist.github.com/huonw/8262a4fb2da0a052f5f1 . Summary:

High-level summary:

several unnecessary uses of repr(packed)
most necessary ones are to match the declaration of a struct in C
many "necessary" uses can be replaced by byte arrays/arrays of smaller types
8 crates are currently on stable themselves (unsure about deps), 4 are already on nightly
- 1 of the 8, http2parse, is essentially only used by a nightly-only crate (tendril)
- 4 of the stable and 1 of the nightly crates don't need repr(packed) at all

| | stable | needed | FFI only |
| --- | --- | --- | --- |
| image | ✓ | | |
| nix | ✓ | ✓ | ✓ |
| tendril | | ✓ | |
| assimp-sys | ✓ | ✓ | ✓ |
| stemmer | ✓ | | |
| x86 | ✓ | ✓ | ✓ |
| http2parse | ✓ | ✓ | |
| nl80211rs | ✓ | ✓ | ✓ |
| openal | ✓ | | |
| elfloader | | ✓ | ✓ |
| x11 | ✓ | | |
| kiss3d | ✓ | | |

As you can see from the "huon references this issue ..." spam above this comment, I've opened PRs against the libraries where the repr(packed) seems to be unnecessary.

huonw on 25 Jul 2015

The bug here is the testcase, it's intentionally violating CPU alignment constraints. In C, packed is unsafe, and you're not supposed to take pointers to the fields of a packed structure, for fear of unaligned access. So, a simple path forward to me seems to be disallow taking references to fields of a struct with #[repr(packed)].

emberian on 25 Jul 2015

I like @cmr's suggestion, on which I see 3 variations:

hard error on all borrows (most conservative)
copy an rvalue out and borrow that (likely to avoid errors in existing code)
make borrows unsafe

The first two have the advantage that we could pack things tighter, e.g. bool fields could be bit-fields.
Not sure that's something we want to do to #[repr(packed)], but nevertheless, opting into no references to fields will be needed at some point for more layout optimizations.

@huonw Could those crates be checked with an error being emitted here if cmt refers to a field of a #[repr(packed)] struct, or something inside such a field?

eddyb on 25 Jul 2015

👍1

cc https://github.com/rust-lang/rfcs/issues/311 (and also #[repr(simd)], all of which are in the same situation)

glaebhoerl on 25 Jul 2015

brief sketch of how all of these could work:

for interior & refs, copy out onto stack first
for interior &mut, copy out first, and copy _back_ immediately after borrow ends (subtlety: if ownership of &mut is passed to another fn, copy-back is _still_ done by the caller (because callee can't know whether it should); difference should not be observable because &mut implies no other references exist through which to observe; same logic suggests mem::forget shouldn't be a problem either)
wrinkle: interior mutability aka UnsafeCell, needs to either fall back to normal repr, compile error ("can't #repr(foo) on types containing UnsafeCell..."; wrinkle: what about generic types?), or some other special handling
question: are "UnsafeCell" and "not UnsafeCell" the only two possible relevant cases?

(sorry for brevity, been wanting to write this down for a while but short of time)

glaebhoerl on 25 Jul 2015

@glaebhoerl That's an interesting suggestion, and I can't think of any issues with the &mut handling.
So for a struct with no interior mutability, would the only difference with #[repr(packed)] be the lifetime of field borrows, i.e. &'a Struct -> &'a Field wouldn't work?

eddyb on 25 Jul 2015

Oh, hmm. To be honest I hadn't even thought of that. (In case it's not clear to anyone else (it only became clear to me after I had written out a comment asking for clarification): if I have a PackedStruct on my stack, and wish to call an fn get_field(&PackedStruct) -> &Field on it, that plainly won't work if get_field copies the Field out onto its _own_ stack, which'll be deallocated by the time it returns.) Too bad: I had been hoping all #[repr(s)] could keep the same observable semantics, and even a subtlety like this rules out that possibility. Apparently there's no free lunch.

glaebhoerl on 25 Jul 2015

@glaebhoerl I do expect that such a scheme would allow most code to continue compiling, given the small number of repr(packed) uses.

@huonw have you grepped for repr(packed) in the cargo dir to find occurrences in code that does not otherwise compile?
(Just making sure nothing is missed, I don't foresee anything significant showing up.)

eddyb on 25 Jul 2015

I use and need repr(packed) for one of my libraries, because I do memory mapping tricks. I'm fine with it being unsafe, just as long as I can continue to use it from within stable Rust.

retep998 on 25 Jul 2015

The bug here is the testcase, it's intentionally violating CPU alignment constraints. In C, packed is unsafe, and you're not supposed to take pointers to the fields of a packed structure, for fear of unaligned access. So, a simple path forward to me seems to be disallow taking references to fields of a struct with #[repr(packed)].

Of course the bug is in the test case... that's why I filed an issue with it ;) It's a compiler bug that the test case compiles and then crashes: it should either not compile, or not crash. It's triggering undefined behaviour (at the very least, we're telling LLVM that the load is aligned when it isn't), and, notably, it's particularly egregious, since there's no explicit internal references at all: our current (broken) scheme could probably be tweaked to make that specific example compile, without making any other changes.

(Of course, the fundamental problem won't be fixed by making a few tweaks to how we codegen struct field accesses.)

I use and need repr(packed) for one of my libraries, because I do memory mapping tricks. I'm fine with it being unsafe, just as long as I can continue to use it from within stable Rust.

Could you go into more detail about what the struct looks like?

huonw on 26 Jul 2015

have you grepped for repr(packed) in the cargo dir to find occurrences in code that does not otherwise compile?

Nothing new.

huonw on 26 Jul 2015

@huonw https://github.com/NoLifeDev/nx-rs/blob/master/src/repr.rs

retep998 on 26 Jul 2015

It looks like that might be able to get away with [u32; 2] + some transmutes in a pinch (highly not great of course...).

(I assume you've thought about the endianness issues that comes with mmaping data directly?)

huonw on 27 Jul 2015

Going to throw in my voice in that #[repr(packed)] is rather important for http://arcnmx.github.io/nue/pod/ (has a safe/stable interface via syntex). However, yes, they will get aligned wrong on ARMv5 (an arch that rust doesn't support), but will work on ARMv6 (rust's current minimum supported arm platform). Loading from a pointer with incorrect alignment is considered UB in LLVM though, so... ouch. Just because it works in most cases on x86 doesn't mean it can be ignored :(

Note that borrows are only UB/unsafe for types that have an alignment > 1 - I would hope to retain this behaviour at the very least, so that my Le, u8, and other such types can continue to function properly in stable and safe code.

My proposal: If packed can be exposed as a marker trait, OIBIT could be used to ensure that packed types only contain other packed types (and thus all have alignments of 1, and borrowing any of its members will be safe). Then you can opt in to packed via an unsafe impl if you really want to use unaligned accesses, or don't care for whatever reason.

arcnmx on 28 Jul 2015

Not that it really matters due to the LLVM UB issue, but it's still possible to disable unaligned accesses on ARM with a CPU flag; GCC and Clang expose -mno-unaligned-access to generate code that supports this.

comex on 28 Jul 2015

Can this cause memory unsafety?

If it always leads to an exception and abort, maybe it is something that Rust can allow.

bluss on 29 Jul 2015

@bluss it can. Look at the linked issue about unaligned reads on ARM, it'll just silently give you wrong/unexpected values when reading from a pointer. It's also explicitly stated as undefined behaviour in LLVM:

It is the responsibility of the code emitter to ensure that the alignment information is correct. Overestimating the alignment results in undefined behavior. Underestimating the alignment may produce less efficient code. An alignment of 1 is always safe. The maximum possible alignment is 1 << 29.

arcnmx on 29 Jul 2015

Well, I put an initial implementation of this into http://arcnmx.github.io/nue/packed/index.html

#[packed] ensures that all types are safe to use unaligned before applying #[repr(packed)], and includes some helpers along with pod's usual endian primitives. The API needs some work, and some of my assumptions may not be entirely correct, but it gives a rough idea of how something similar could work for std!

arcnmx on 30 Jul 2015

I've written https://github.com/rust-lang/rfcs/pull/1240 which proposes making taking a reference into a repr(packed) struct unsafe.

huonw on 7 Aug 2015

I don't feel like this is the correct approach to take, will write up some notes in the RFC comments in a bit.

arcnmx on 7 Aug 2015

Also, I'd like to point out that, as it turns out, a bunch of types in the Windows SDK are packed, so #[repr(packed)] really needs to stay in stable Rust for winapi's sake.

retep998 on 1 Sep 2015

Also, I'd like to point out that, as it turns out, a bunch of types in the Windows SDK are packed, so #[repr(packed)] really needs to stay in stable Rust for winapi's sake.

Might also be worth noting that some Windows API structs are packed to different alignments, e.g. #pragma pack(2), whereas Rust seems to currently only allow 1-byte alignment (although this can be easily worked around with additional padding fields).

ghost on 2 Sep 2015

Could we perhaps expand the syntax to #[repr(packed(N))] then?

retep998 on 2 Sep 2015

Note that rust-lang/rfcs#1240 was accepted. I have re-used this as the tracking issue.

nikomatsakis on 18 Sep 2015

@rust-lang/compiler it'd be great to find someone to implement this! :)

nikomatsakis on 2 Oct 2015

I'll take a look at it this weekend, I won't assign the issue to myself until I actually start working on it, just in case I decide that I don't have time (or forget!)

Aatch on 2 Oct 2015

A new align metadata was added to the load instruction in LLVM.

http://reviews.llvm.org/rL248721

retep998 on 5 Oct 2015

That doesn't solve the problem, unless we mark _all_ loads out of a &T as having alignment 1 (i.e. the current set-up means we could never assume a &T is correctly aligned), which would be very unfortunate from a performance point of view.

huonw on 6 Oct 2015

Hey @Aatch, have you got an update the situation?

huonw on 12 Nov 2015

@huonw
This is basically waiting on MIR passes.

arielb1 on 13 Nov 2015

triage: P-medium

also waiting for mir

nrc on 19 Nov 2015

Came up today that repr(packed) may also introduce data races. For instance, accessing a misaligned AtomicPtr, or the fields next to it, may do strange things. I'm unsure if this can be triggered in any way that isn't just "misaligned load". I don't know a lot about the relevant instruction sets and things LLVM does.

Gankra on 15 Dec 2015

I'd be fine with not having atomics in repr(packed)

retep998 on 15 Dec 2015

Given that this has been sat around for a while and that it sounds like it is not going to get implemented that soon, I think we should feature-gate #[repr(packed)] in the meantime. I propose we start with a warning cycle and uplift that to beta then put a feature gate in nightly now. Alternatively, we could just warn on use of #[repr(packed)] and never move to a feature gate (since it is widely used in WinAPI).

I think we should do something. Having potential memory unsafety sat around without the compiler telling the user about it at all is pretty bad.

nrc on 7 Feb 2016

Meanwhile in winapi #[repr(packed)] is the only tool I have to lower the alignment of structs to make their layout match that of the C version which uses #pragma pack. Please don't feature gate it without providing an alternative. Many features that I'd like for FFI, such as unions, I can manage without by using my own methods and macros. However without some way to place a 4 byte field on a 2 byte boundary, I'm simply stuck and can't represent those types.

retep998 on 8 Feb 2016

I don't know, packed is used widely enough, and the vast majority of
uses are fine. I expect the fix will come in a month or two, once the
MIR transition has made more progress, but perhaps that is wildly
optimistic.

On Sun, Feb 07, 2016 at 01:48:31PM -0800, Nick Cameron wrote:

Given that this has been sat around for a while and that it sounds like it is not going to get implemented that soon, I think we should feature-gate #[repr(packed)] in the meantime. I propose we start with a warning cycle and uplift that to beta then put a feature gate in nightly now. Alternatively, we could just warn on use of #[repr(packed)] and never move to a feature gate (since it is widely used in WinAPI).

I think we should do something. Having potential memory unsafety sat around without the compiler telling the user about it at all is pretty bad.

Reply to this email directly or view it on GitHub:
https://github.com/rust-lang/rust/issues/27060#issuecomment-181126967

nikomatsakis on 10 Feb 2016

After some discussion in compiler team meeting:

We can easily feature-gate references into fields of packed structs specifically.
_But_ MIR may be mature enough to just port over the unsafety checking and do this The Right Way. I a thinking of taking a stab at that because it sounds fun and I want to do some more MIR work. But if others really want to and I fail to make progress please ping me. :)

nikomatsakis on 11 Feb 2016

Maybe I didn't understand everything, but it seems to me there are 2 different problems here: letting rustc squeeze every bit of space possible (e.g. bool on 1 bit, optimized enums) to have compact data structures, and choosing each struct member's footprint to match an FFI object layout (e.g. including a C bitfield with arbitrary length) - which can't always be done automatically.
So repr(packet) is not an universal solution for data exchange with foreign libs.

bestouff on 30 May 2016

On x86-64, is it always possible to just use unaligned instructions? GHC had to edit the output of LLVM to deal with alignment-requiring AVX instructions.

DemiMarie on 27 Jun 2016

@drbo That doesn't solve the problem on other architectures where alignment is critically important.

retep998 on 27 Jun 2016

@retep998 Still it would solve the problem temporarily, at least for some architectures.

My preferred solution is to disallow borrows from such fields (which are thus required to be Copy), and to emit unaligned instructions for reading and writing to the misaligned fields.

DemiMarie on 8 Jul 2016

@DemiMarie The current plan already is to make borrows into such fields unsafe (but not forbidding them outright, nor requiring such types to be Copy although it would be really hard to use such a field if it wasn't), and to make access into such fields otherwise safe by emitting unaligned instructions for those fields.

retep998 on 18 Aug 2016

It would be nice if borrows of fields within a repr(packed) struct could remain safe if the type of the field was also repr(packed) - or perhaps more generally if the field has byte alignment. I have an unpublished crate that I'm hoping to finish off some day that relies on users of the crate being able to do this.

davidlattimore on 30 Aug 2016

It would be nice if unsafe was only required when the alignment of the field's type is greater than the packing of the struct itself, and if it is only less than or equal it should merely be a lint.

Also, I'm pretty sure the only reason this wasn't implemented yet was because it was blocked by MIR. Now that old trans is completely gone can someone hurry up and implement this already? Maybe squeeze in #[repr(align(N))] and #[repr(packed(N))] in while they're at it.

retep998 on 30 Aug 2016

👍1

@retep998 This sounds very tricky to do if you take generics into account since the type layout is only known after monomorphization. I think just making packed structs unsafe to access is enough for a start, we can always look into loosening the restrictions later.

Amanieu on 31 Aug 2016

👍1

@Amanieu That's not entirely correct, we can do partial layout queries at any point after typeck completes.
_In fact_, that's how transmute has been checked since rust-lang/rust#32939.

eddyb on 31 Aug 2016

The trick in the original bug report of assigning to an intermediate variable, which the simd crate also uses, seems to work in debug builds, but currently does not appear to work in optimized builds (anymore?).

Looking at emmintrin.h from clang, it seems that LLVM doesn't have internal intrinsics for unaligned SSE2 loads but instead the dereferencing a pointer to a single-member struct with __attribute__((__packed__, __may_alias__)) and then extracting the member is supposed to generate unaligned SSE2 loads (and analogously for stores). I.e. the same thing the simd crate does.

Since there isn't an LLVM intrinsic to call directly, it's rather important for the packed idiom for getting LLVM to emit unaligned SSE2 loads/stores to work reliably.

This was previously marked as blocking on MIR. Now that MIR has landed, what should Rust user expectations be in terms of controlling aligned vs. unaligned SSE2 load/store generation?

hsivonen on 4 Nov 2016

@hsivonen

This was previously marked as blocking on MIR. Now that MIR has landed, what should Rust user expectations be in terms of controlling aligned vs. unaligned SSE2 load/store generation?

The next step is to port the unsafety check to operate on MIR, instead of HIR. I'd be happy to mentor such a thing, though it's probably complex enough that it makes sense to sync up online.

nikomatsakis on 4 Nov 2016

The next step is to port the unsafety check to operate on MIR, instead of HIR. I'd be happy to mentor such a thing, though it's probably complex enough that it makes sense to sync up online.

I followed up on internals about a simpler way to address the SIMD load/store use case.

hsivonen on 7 Nov 2016

I would just like to note, that as I described in issue #39376 (which is probably a duplicate of this one), that some architectures don't support unaligned access. packed is definitely a minefield, not only for supporting those platforms, but for users of the language that may not be familiar with the many subtleties involved. With that in mind, I agree with the suggestion that accessing packed structs may need to be considered "unsafe".

binarycrusader on 2 Feb 2017

Just to put the packed knowledge in the same place, in rustc there is a safe and sound (hopefully) use of packed to have the compiler generate the appropriate load of an u32 from an unaligned pointer. The point is that self is used by value and align of Unaligned<u32> is 1.

#[repr(packed)]
#[derive(Copy, Clone)]
struct Unaligned<T>(T);

impl<T> Unaligned<T> {
    fn get(self) -> T { self.0 }
}

bluss on 3 Feb 2017

I would just like to note, that as I described in issue #39376 (which is probably a duplicate of this one), that some architectures don't support unaligned access.

I don't see why platform support for unaligned access is required. LLVM could always expand the access into a byte-by-byte load, or call an intrinsic (I don't remember what it usually does).

whitequark on 3 Feb 2017

@whitequark is right. We even have a read_unaligned function in nightly which is portable across all architectures: https://doc.rust-lang.org/nightly/src/core/ptr.rs.html#175-181

pub unsafe fn read_unaligned<T>(src: *const T) -> T {
    let mut tmp: T = mem::uninitialized();
    copy_nonoverlapping(src as *const u8,
                        &mut tmp as *mut T as *mut u8,
                        mem::size_of::<T>());
    tmp
}

solson on 8 Feb 2017

I discovered that most #[derive(...)] impls are unsafe for packed structs due to taking references to unaligned fields (see https://github.com/rust-lang/rust/pull/39682).

@bluss's rustc code sample above also has an unsafe derived Clone impl which won't compile once we fix this issue:

   fn clone(&self) -> Unaligned<T> {
        match *self {
            Unaligned(ref __self_0_0) => // BAD: reference to unaligned packed struct field
            Unaligned(::std::clone::Clone::clone(&(*__self_0_0))),
        }
    }

EDIT: I sent a PR to fix this code in the compiler and also opened an issue about the general problem.

solson on 9 Feb 2017

👍4

Nice try, GitHub. I only said "when we fix #27060".

@eddyb Can you reopen?

solson on 10 Feb 2017

😄1

Nominating for compiler team discussion. We've been talking about implementing @huonw's RFC 1240 for over a year at this point. I'm not sure how difficult that is to do now, but the longer we wait, the greater the chance of fallout. So I think it'd be good to re-triage.

aturon on 10 Feb 2017

Leaving this nominated for now because @nikomatsakis is not in for the meeting.

arielb1 on 16 Feb 2017

@aturon We've been able to do it more or less ever since we fully switched to MIR.
IMO we should track unsafety in VisibilityScope and move all unsafety checking to work on MIR.
I've been wanting to do this for a while and IIRC @nikomatsakis agrees with my plan, I just didn't get to it.
I might be able to mentor someone to do it (we still have no way to advertise this, do we?).

eddyb on 16 Feb 2017

@eddyb yes, I've been wanting to do the same, and I've actually had this on my (personal) list of "things to write mentoring instructions for". I'll try to write up some instructions today.

nikomatsakis on 17 Feb 2017

@nikomatsakis Did you get a chance to write up that information? I'd be interested in implementing this.

cramertj on 22 Feb 2017

We discussed this in the @rust-lang/compiler meeting today. Nobody has written up any instructions =) but we did discuss some of the more subtle points:

Currently, we strip dead code out of MIR very early. This would mean that if we do 'unsafe' checking on MIR, we will ignore dead code (e.g., return; <this code>). We already ignore a number of errors in dead-code, and borrowck will totally ignore it, so this seems potentially just fine, but it's worth making a firm decision here.

There is another option: we can run the "unsafe check" before we strip dead code. The tricky part is that in that case some of the types in MIR may not match, since the type-checker ignores data-flow out of dead-code. This isn't really a problem for the unsafeck, but more generally it would mean the MIR is not "well-typed" and it'd be sort of nice to have an invariant that MIR is always well-typed.

One last point: we have to be careful that if false { <sort-of-dead-code> } doesn't get stripped out of MIR before we are done doing the various safety checks! It's one thing to suppress errors in code that is obviously dead, but quite another to led the results of constant propagation feed into safety analysis in that way.

nikomatsakis on 23 Feb 2017

@nikomatsakis

One last point: we have to be careful that if false { } doesn't get stripped out of MIR before we are done doing the various safety checks! It's one thing to suppress errors in code that is obviously dead, but quite another to led the results of constant propagation feed into safety analysis in that way.

Especially because that conditional might look like if cfg!(...) or similar.

joshtriplett on 23 Feb 2017

👍1

Actually, it's not even necessary to write & to get such unaligned loads -- having Drop type in a packed struct is also a problem, because drop gets an &mut and hence assumes it is aligned. (I noticed this while playing around with miri and suddenly getting unaligned access errors for boxes in packed structs.)

So, beyond making references to packed fields unsafe, it seems we also have to forbid types in packed structs that implement Drop. Will this need an RFC (despite being a soundness fix)?

RalfJung on 13 Jul 2017

👍1

Because references should always be aligned... how are we supposed to get a raw pointer to a given field to check whether it is in fact aligned or even do an unaligned read or write? Doing &foo as *const T doesn't really work because it goes through an intermediate reference.

retep998 on 13 Jul 2017

@RalfJung huh, interesting. We could amend the existing RFC to cover that case -- i.e., open a PR amending the original text -- but I think it also suffices to open up a Rust issue, tag it as T-lang, and do the FCP there; I feel like it's in the spirit of the existing RFC, so I don't know that a full PR on the RFC repository is needed. (It'd be nice to clarify these procedures, this question comes up somewhat regularly; but the intention is that an RFC, once landed, can be amended in reasonable ways, and any concerns will be addressed and discussed in the tracking issue and -- ultimately -- during stabilization.)

nikomatsakis on 17 Jul 2017

Notes from IRC: https://botbot.me/mozilla/rustc/2017-07-31/?msg=89222091&page=2

RalfJung on 9 Aug 2017

The next step is to port the unsafety check to operate on MIR, instead of HIR. I'd be happy to mentor such a thing, though it's probably complex enough that it makes sense to sync up online.

@nikomatsakis Is this still relevant? This sounds like a significant task of its own, should it be a separate issue?

SimonSapin on 19 Sep 2017

@SimonSapin

I am working on it right now.

arielb1 on 19 Sep 2017

👍2

For Drop fields, could we not move them into an aligned temporary before calling drop?

Zoxc on 27 Sep 2017

@Zoxc

That wouldn't work for unsized fields (yes you can have repr(packed) + unsized), and it could create unexpected stack copies.

I think we should try to forbid repr(packed) structs with destructors and see how that goes.

arielb1 on 27 Sep 2017

This is fixed by #44884, isn't it?

RalfJung on 29 Dec 2017

@RalfJung Last time I tried, rustc itself was no longer crashing with SIGBUS. But code compiled natively on sparc64 still crashed (rustc being cross-compiled for sparc64) with SIGBUS.

I have to perform more tests on sparc64 first to be able to give a qualified answer.

glaubitz on 29 Dec 2017

Ah, this issue now seems to be about two things:

Tracking issue for RFC 1240. That RFC is now implemented in nightly, but it is a warning currently, not an error: https://play.rust-lang.org/?gist=7ebeed4f9cc64fb970a4ba2e604a5697&version=nightly. Probably the tracking issue should remain open until the RFC is fully implemented, i.e., this is a hard error (in nightly at least)?
The code in the first post compiling incorrectly. The code doesn't actually take a reference to an unsafe field, so it seems rather orthogonal to the RFC -- this is purely a codegen issue. I am a little puzzled that these are in the same issue.

RalfJung on 29 Dec 2017

Could somebody summarize exactly what remains to be done here ?

It is currently hard, although not impossible, to write C FFI code without repr(packed) and repr(packed(N)) that does not invoke undefined behavior. Because of how hard this is, libc, mach, and other low-level OS api libraries prefer to actually just silently invoke undefined behavior. AFAICT only rust-bindgen does consistently the right thing here, and the results are pretty horrible.

I only really need this for C FFI so I would be fine with completely forbidding all references to packed structs if that could lead to a minimally usable subset of this being stabilized quicker.

Also, the current RFC2366: portable packed SIMD vector types does not mention the interaction between SIMD vector types and packed, it just assumes that all vector types are always stored at a multiple of their alignment. I don't think that implicitly doing unaligned loads would be a sane default, and I'd rather avoid having to assert! on every vector method that &self is properly aligned. I would be fine with RFC2366 requiring that vector types must always be stored at a multiple of their alignment, which would prevent them from being stored in repr(packed) structs.

gnzlbg on 16 Apr 2018

Could somebody summarize exactly what remains to be done here ?

I only really need this for C FFI so I would be fine with completely forbidding all references to packed structs if that could lead to a minimally usable subset of this being stabilized quicker.

AFAIK everything is stable already (and has been since 1.0), but the problem is that some code is accepted without unsafe. https://github.com/rust-lang/rust/issues/46043 tracks turning the warning this generates into a hard error.

I don't know about the codegen error this issue was originally about.

It is currently hard, although not impossible, to write C FFI code without repr(packed) and repr(packed(N)) that does not invoke undefined behavior.

Is that because the C side is packed, or because repr(C) somehow doesn't do the right thing?

I'd rather avoid having to assert! on every vector method that &self is properly aligned.

You do not have to. References are always aligned; only raw pointers may be unaligned and only if they are used with write_unaligned and read_unaligned. We already tell LLVM that references are aligned, so vector types shouldn't be any different here.

RalfJung on 16 Apr 2018

Is that because the C side is packed, or because repr(C) somehow doesn't do the right thing?

Because the C side is packed with #pragma pack N (and/or a similar packed attribute). In many cases repr(packed) isn't enough and one needs repr(packed(N)) which is not stable yet. This issue is one of its unresolved issues.

gnzlbg on 16 Apr 2018

Is there some reason that everything needs to be aligned?

On at least x86, amd64, and ARM64 I cannot see the harm in telling LLVM
things may not be aligned.

On Mon, Apr 16, 2018, 5:31 AM Ralf Jung notifications@github.com wrote:

Could somebody summarize exactly what remains to be done here ?

I only really need this for C FFI so I would be fine with completely
forbidding all references to packed structs if that could lead to a
minimally usable subset of this being stabilized quicker.

AFAIK everything is stable already (and has been since 1.0), but the
problem is that some code is accepted without unsafe. #46043
https://github.com/rust-lang/rust/issues/46043 tracks turning the
warning this generates into a hard error.

I don't know about the codegen error this issue was originally about.

It is currently hard, although not impossible, to write C FFI code without
repr(packed) and repr(packed(N)) that does not invoke undefined behavior.

Is that because the C side is packed, or because repr(C) somehow doesn't
do the right thing?

I'd rather avoid having to assert! on every vector method that &self is
properly aligned.

You do not have to. References are always aligned; only raw pointers
may be unaligned and only if they are used with write_unaligned
https://doc.rust-lang.org/beta/std/ptr/fn.write_unaligned.html and
read_unaligned
https://doc.rust-lang.org/beta/std/ptr/fn.read_unaligned.html. We
already tell LLVM that references are aligned, so vector types shouldn't be
any different here.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/27060#issuecomment-381537915,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGGWB9taIj5c7dcvGBg4eZRJyg7Kb01Pks5tpGTlgaJpZM4FZlMJ
.

DemiMarie on 16 Apr 2018

@DemiMarie A compiler shouldn't generate code which produces unaligned access. Depending on the target platform, unaligned accesses can either result in performance penalties or even or a crash with SIGBUS.

Rust supports more than just x86 where unaligned access doesn't have such a huge impact. I don't know about the performance impact on arm64 though.

glaubitz on 16 Apr 2018

That depends on the platform, though. On some platforms, generating an
unaligned load or store might be the best option. Can LLVM do that when it
should?

On Mon, Apr 16, 2018, 1:34 PM John Paul Adrian Glaubitz <
[email protected]> wrote:

@DemiMarie https://github.com/DemiMarie A compiler shouldn't generate
code which produces unaligned access. Depending on the target platform,
unaligned accesses can either result in performance penalties or even or a
crash with SIGBUS.

Rust supports more than just x86 where unaligned access doesn't have such
a huge impact. I don't know about the performance impact on arm64 though.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/27060#issuecomment-381686574,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGGWB_WOArzdsx9923j4Utll9JbHhErJks5tpNYygaJpZM4FZlMJ
.

DemiMarie on 16 Apr 2018

The user expects the compiler to generate aligned accesses by default since this aligned accesses are normally the safest and fastest option.

glaubitz on 16 Apr 2018

On some platforms, generating an
unaligned load or store might be the best option. Can LLVM do that when it
should?

Yes, LLVM of course does that. It compares the alignment of data with alignment required by the selected instruction and adjusts the pointer/shifts the data if necessary.

whitequark on 17 Apr 2018

@DemiMarie Notice that alignment doesn't just affect loads. Once we tell LLVM about the alignment 8which we do), it is UB for the pointer value to not have the given alignment even if we do not load/store. For example, if we do bit operations on the least significant bits of the pointer, LLVM will optimize them (or so I am told) based on the assumption that these bits all have to be 0 due to alignment. Also see the example of using alignment for layout optimizations that I mentioned at the end of this post.

RalfJung on 17 Apr 2018

I see. My thought was that on most platforms, telling LLVM pointers are
aligned does not help optimization, so there is no need to do it. Does it
actually help?

On Tue, Apr 17, 2018, 3:57 AM Ralf Jung notifications@github.com wrote:

@DemiMarie https://github.com/DemiMarie Notice that alignment doesn't
just affect loads. Once we tell LLVM about the alignment 8which we do), it
is UB for the pointer value to not have the given alignment even if we do
not load/store. For example, if we do bit operations on the least
significant bits of the pointer, LLVM will optimize them (or so I am told)
based on the assumption that these bits all have to be 0 due to alignment.
Also see the example of using alignment for layout optimizations that I
mentioned at the end of this post
https://github.com/rust-lang/rust/issues/46043#issuecomment-381570961.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/27060#issuecomment-381886959,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGGWBxwUqNEdNXoXE1yhNhV4AX0tjJqmks5tpaBcgaJpZM4FZlMJ
.

DemiMarie on 18 Apr 2018

Does it actually help?

Does it matter? The question is: what do we do in those platforms in which unaligned loads are illegal. Should we have two different programming languages?

gnzlbg on 18 Apr 2018

👍1

Food for thought, another use case that may flavour the solution here is device drivers.

Having control over how a structure lays out in memory is important to systems programmers who work with board support packages and drivers. A common technique in writing C device drivers, where the device registers are memory mapped, is to create a struct that perfectly overlays the registers. gcc and other compilers often require #pragmas to ensure packing and/or alignment are as required and its not uncommon to see C structs created as fully packed and have the alignment dealt with explicitly by the programmer adding explicit padding fields. (This makes it easy to verify by code inspection that the code matches what the device's data book describes.)

You then create a pointer to that struct with the base address of the device's registers. The device registers are read and written by reading and writing to the fields of the struct. You also have to ensure that you use volatile so that the compiler doesn't optimize out register reads not knowing that the device is changing them.

I've seen similar methods used for heterogeneous single board computers that communicated via shared memory and also for messages sent over communications channels where the on-the-wire layout was important to match with whatever was on the other end.

ibkevg on 19 Jul 2018

👍1

@ibkevg You can do this today, with #[repr(C)] structs (for defined layout) and a wrapper type to force accesses to individual registers to be volatile. For example: https://docs.rs/cortex-m/0.5.2/cortex_m/peripheral/cpuid/struct.RegisterBlock.html

However these device registers are typically designed so that none of them is at a misaligned address, aren’t they? So #[repr(C)] is enough and you probably don’t need #[repr(packed)] for this.

SimonSapin on 19 Jul 2018

Great - I might have been thrown off the trail by the name #[repr(C)] because this is needed even if there is no C code present at all and a driver is being writing in 100% Rust. I’d suggest renaming or adding an alias that isn’t FFI specific.

Another thing that led me to write the comment above was skimming this thread and noticing at one point musing about fixing the issue in software by copying values into aligned memory and then reading that, which would lead to extra reads. I didn’t read it carefully but it brought to mind that in the event the compiler for whatever reason can’t decide what’s the right thing to do and decides to play it safe in this way, it could cause trouble for a driver. I’ve worked with devices that have “read clear” registers. So reading the register clears one or more bits in it and when you read it next, it’s gone. Now imagine edge cases of a solution that copies values into memory, reading extra addresses that inadvertently read clears.

So I just wanted to offer that caution - but yes it would be strange for hw to be designed that would require unaligned accesses. Even if a part was used that might have been designed for an older memory architecture, it would typically be interfaced to via an FPGA that could present a “normal” set of registers to software, that or tricks using address lines, etc.

(Barely related, mostly off topic, war story: not every device is going to be memory mapped :) I did have to interface to an Ethernet switch chip once where we only had an SPI (serial) interface into it that was wired to the CPU’s general purpose I/O registers. My code had direct control of a data line and a clock line by writing to the I/O register and was literally twiddling bits to feed in cmds and read data out. Lowest level code I’ve ever written.)

ibkevg on 19 Jul 2018

We have been warning about borrows to packed struct fields requiring unsafe for a long time. How should this issue progress? It is blocking the stabilization of repr(packed(N)) which is required by libc, mach, and winap (widely used libraries), for correctness.

The only unresolved question I can find here is whether we want to make this unsafe on all cases or some cases only (e.g. when the specified packing is less than the field's natural alignment). However, there is also the option of making this always safe, by requiring that the reference is directly casted into a raw pointer (e.g. &foo.field as *const _).

So we could progress this by either turning the current warning (requiring unsafe code) into an error, or by changing the warning to warn about references to fields of packed structs that are not directly turned into pointers, so that we can turn that into an error in the future.

gnzlbg on 19 Oct 2018

@gnzlbg I'm fine with making this a hard error. There's just one thing: I have some ideas for how to make it even harder in the future to create unaligned references. Namely:

I want to propose considering &expr as *[mut,const] _ as a single primitive operation that creates a raw ptr without an intermediate ref every existing.
I want to make it illegal (well, first warn-by-default then maybe later err-by-default and then hard error, maybe in a new edition) even in unsafe code to use anything but the above operation when taking the address of a field of a packed struct. The error should state clearly that even just creating an unaligned reference is UB. Creating a raw ptr to a field of a packed struct is actually safe.

I am a bit worried about having people change the same code twice, that's my only potential reservation with proceeding to make safe-ref-to-packed-field an error. But I don't think that should actually stop us, just bringing it up in case anyone disagrees.

RalfJung on 19 Oct 2018

I am a bit worried about having people change the same code twice, that's my only potential reservation with proceeding to make safe-ref-to-packed-field an error. But I don't think that should actually stop us, just bringing it up in case anyone disagrees.

Im worried to, which is why I would prefer to change the warning to require &expr as *[mut,const] _ which would not require any unsafe code. Otherwise, if we make not using unsafe an error, we are basically forcing everyone to use unsafe { &x.unaligned } which potentially would still be UB, without really telling them that they need to write &x.unaligned as *const _ which does not require unsafe.

gnzlbg on 19 Oct 2018

I think the current status is that we still just warn about these references-to-packed-fields. And indeed adding unsafe doesn't actually help, it's still UB. What would help is stabilizing https://github.com/rust-lang/rust/issues/64490 and then slowly making the references a hard error.

@rust-lang/lang is that also your plan, and do you have a rough timescale for that?

RalfJung on 16 Apr 2020

then slowly making the references a hard error.

I hope you mean a hard error when not in an unsafe block? (Though I could potentially see emitting a warning even in unsafe blocks, because of how easily problematic it can be.) If we _do_ want to make it an error even in unsafe blocks, we definitely need &raw sooner rather than later so we can start linting against it there as well.

It can be sound to take a reference to a packed member when you know the packed member to be aligned, due to a known alignment of the packed datatype, even if it isn't guaranteed by the language.

It'd be kind of sad if the way to say "yes, this is actually aligned" had to be &*&raw const packed.x "just" to help avoid accidentally taking references to packed fields. (Though to be perfectly clear: I understand the issue with silently allowing references to packed fields in unsafe, due in part to implicit autoref and the scope of unsafe for other reasons potentially covering one. I just happen to have recently written and used a packed structure guaranteed to be fully aligned, which is only packed to remove trailing padding for space reasons.)

CAD97 on 16 Apr 2020

Strawman: Create an Unaligned<T> type; deprecate #[repr(packed)] in favor of Unaligned<T> plus #[repr(C)].

comex on 16 Apr 2020

👍1

I hope you mean a hard error when not in an unsafe block? (Though I could potentially see emitting a warning even in unsafe blocks, because of how easily problematic it can be.) If we do want to make it an error even in unsafe blocks, we definitely need &raw sooner rather than later so we can start linting against it there as well.

It is UB even in unsafe blocks. So whether or not there is an unsafe block should make no difference for our lints/hard errors, once we have a stable alternative.

It can be sound to take a reference to a packed member when you know the packed member to be aligned, due to a known alignment of the packed datatype, even if it isn't guaranteed by the language.

The current check already takes that into account, I think, when it is guaranteed. For example, taking a reference to a u8 in a packed struct is fine (no lint, and no errors in the future).

RalfJung on 16 Apr 2020

It'd be kind of sad if the way to say "yes, this is actually aligned" had to be &*&raw const packed.x "just" to help avoid accidentally taking references to packed fields. (Though to be perfectly clear: I understand the issue with silently allowing references to packed fields in unsafe, due in part to implicit autoref and the scope of unsafe for other reasons potentially covering one. I just happen to have recently written and used a packed structure guaranteed to be fully aligned, which is only packed to remove trailing padding for space reasons.)

Using packed for this seems a bit hacky. After all, if you do a direct load or store of one of the fields, LLVM will generate an unaligned load even though, according to you, everything is actually properly aligned. In theory that could have a performance cost; in practice, maybe not, but the mismatch suggests that it's not the ideal solution for your use case.

Perhaps there should be a way to just disable trailing padding for a struct. But that would make it dangerous to create an array of that struct, and I'm not sure how to prevent it from becoming a footgun...

Still, I think making it easy/ergonomic to implicitly assert alignment should be, well, an anti-goal. After all, for typical uses of packed, the data really is expected to be misaligned.

For similar reasons, I'm not thrilled with &raw as a long-term solution. It's better than having no way to avoid UB... but an unaligned raw pointer is still a footgun, because you can accidentally dereference it without using {read|write}_unaligned.

Hence my suggestion for a wrapper type. Not only does that avoid the footgun, it also avoids the need for unsafety altogether, since reading and writing an Unaligned<T> would be safe operations.

comex on 16 Apr 2020

Using packed for this seems a bit hacky. After all, if you do a direct load or store of one of the fields, LLVM will generate an unaligned load even though, according to you, everything is actually properly aligned.

It's not, though. The struct itself could be insufficiently aligned. repr(packed) sets the required alignment of the struct to 1, which is the only reason it can leave away trailing padding. The field might be at an aligned offset within the struct, but the stack/heap allocation containing the struct could be insufficiently aligned.

So @CAD97 I think your code is actually wrong.

For similar reasons, I'm not thrilled with &raw as a long-term solution. It's better than having no way to avoid UB... but an unaligned raw pointer is still a footgun, because you can accidentally dereference it without using {read|write}_unaligned.

That's fair. However, my main concern in this issue is to plug the existing soundness hole -- that seems at least mostly orthogonal to developing more ergonomic ways to deal with unaligned data.

RalfJung on 16 Apr 2020

My guess is that @CAD97 manually ensured their packed struct would be placed at an aligned address, by embedding it in a higher-aligned struct or something like that.

comex on 16 Apr 2020

That's possible, but then it seems entirely reasonable that you need to add some unsafe to promise that you got this right... it's non-trivial.

RalfJung on 16 Apr 2020

Having to use Unaligned<T> would be pretty atrocious in FFI bindings. Large structs full of fields would need every single field's type wrapped in Unaligned<T>, and if it was conditional based on something like architecture you'd end up with duplication...

retep998 on 16 Apr 2020

repr(packed(n)) supports alignments other than 1, so any Unaligned<T> would need to take an alignment value to have feature parity with repr(packed).

bitshifter on 16 Apr 2020

I'm nominating for lang-team meeting to discuss @RalfJung's question of the plans around &raw.

nikomatsakis on 16 Apr 2020

My own personal take:

I am semi-convinced that we should "rethink" raw pointer syntax and try to adopt a new kind of pointer that is raw, non-null, and (probably?) doesn't make the const/mut distinction. @withoutboats was pitching &unsafe (and &unsafe T) as a possible syntax for this, and I don't hate that, though there are parsing ambiguities to consider (&unsafe { 22 } being legal syntax now).

All the same, I think we should consider moving on &raw sooner rather than later, especially if "old-style raw pointers" would be soft-deprecated. It probably takes some time to prove out an &unsafe-like design. It feels like a big deal, in part because I think it would best be accompanied by other changes around the Unsafe Code Guidelines. (Though maybe I'm expecting too much, perhaps &unsafe isn't as hard as all that.)

Regardless, a change around the syntax of raw pointers is clearly the sort of thing that would best be done around an edition, I think, since it's a big shift in developer practices, and in concert with other supporting changes, which is why I brought up UCG-related work.

nikomatsakis on 16 Apr 2020

❤1

(Yes, my code code ensures the alignment of the packed structs by construction, with a kind of "placement new" that requires proper alignment, and never actually holding an instance of the type on the stack. packed(N) wouldn't be enough for this use case, as it includes a structure whose safety alignment is "8+4" (i.e. 4 but not 8), which is not expressible in the type system at all. The code is on GitHub if people want to gawk at the cursed code. In actuality, the code just does an aligned read from the packed fields, it doesn't actually need to create a reference.)

My position has mellowed a bit overnight; I definitely think it's fair to warn on taking a reference to a packed field (when the field isn't guaranteed to be aligned by the Rust type system), even in unsafe blocks, due to the ease of accidentally causing UB through an invisibly added reference (method call) or that just happens to be in an unsafe scope for a different reason, _once we have an alternative_ (&raw and/or &unsafe). Said warning could then potentially be upgraded to a hard error in the (far) future.

CAD97 on 16 Apr 2020

👍1

I am semi-convinced that we should "rethink" raw pointer syntax and try to adopt a new kind of pointer that is raw, non-null, and (probably?) doesn't make the const/mut distinction. @withoutboats was pitching &unsafe (and &unsafe T) as a possible syntax for this, and I don't hate that, though there are parsing ambiguities to consider (&unsafe { 22 } being legal syntax now).

I agree the entire area of raw pointers needs a lot of work. But I also would like to finally see this soundness hole closed. I am torn between not wanting to rush in an addition that we might regret later, and wanting to do something to guide people away from taking references to packed fields (which currently is blocked on there not being anything to guide them to). Maybe exposing &raw in a macro is a better "hotfix" than full-blown new syntax...

RalfJung on 16 Apr 2020

If I have a simple packed struct like this:

#[repr(packed)]
#[derive(PartialEq)]
struct p{
    a: usize,
    b: u8,
    c: usize,
}

And I have 2 variables like that:

let x = p {a: 1usize, b: 2u8, c: 3usize};
let y = p {a: 1usize, b: 2u8, c: 4usize};

Then will x == y on platforms that don't support unaligned access? (Some platforms that don't support unaligned access just ignore the lowest bits so when reading out the field c it will read out field b and the first bytes except the last of c)

I guess my question is: I get that its unsafe to access references which point into a packed struct, but what kind of code will #[derive(PartialEq)] generate? Is it going to take into account that the struct is packed and has to perform unaligned memory accesses? Or is it just generates broken code right now? Unfortunatly I wasn't able to cross compile to test this, but if Rust generates broken code for PartialEq for packed structs then just requiring to put unsafe block around packed-struct-field accesses aren't gonna fix the problem.

alvardes on 17 Apr 2020

repr(packed) + derive(PartialEq) shows a warning:

warning: `#[derive]` can't be used on a `#[repr(packed)]` struct that does not derive Copy (error E0133)
 --> src/main.rs:2:10
  |
2 | #[derive(PartialEq)]
  |          ^^^^^^^^^
  |
  = note: `#[warn(safe_packed_borrows)]` on by default
  = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
  = note: for more information, see issue #46043 <https://github.com/rust-lang/rust/issues/46043>
  = note: this warning originates in a derive macro (in Nightly builds, run with -Z macro-backtrace for more info)

This one is probably going on long enough that we can move towards making it a hard error. Is it worth taking small steps here even while there is no proper solution for taking references yet?

RalfJung on 17 Apr 2020

If you add Copy to the derives, Miri is happy with alignment in the auto-derived code: https://github.com/rust-lang/miri/pull/1343.

RalfJung on 17 Apr 2020

I am semi-convinced that we should "rethink" raw pointer syntax and try to adopt a new kind of pointer that is raw, non-null, and (probably?) doesn't make the const/mut distinction. @withoutboats was pitching &unsafe (and &unsafe T) as a possible syntax for this, and I don't hate that, though there are parsing ambiguities to consider (&unsafe { 22 } being legal syntax now).

Do you by chance have a link to somewhere this was being discussed? It sounds very churny to me, but I’m curious about the motivations behind it.

comex on 17 Apr 2020

👍1

@comex I don't believe there is a great link, it's an idea that's been quietly simmering. I agree that it is would cause a great deal of churn, and I think we'd want to think very carefully about that aspect of it.

If we were going to pursure this, I think a good idea if we wanted to pursue this would be to try and document the motivations, explore what a "green field" design might look like, and also think about smaller steps that might help. (It seems like a good place for a project group, to me.) But that work isn't really happening in earnest afaik.

--

@RalfJung Hmm, I actually quite like the idea of using a macro. I've been meaning for some time to write a blog post talking about how I think we underutilize that option -- i.e., using "ugly" macros to create experimental base capabilities, so that we can iterate more using procedural macros on the higher-level details. This might be a perfect opportunity to do that.

nikomatsakis on 20 Apr 2020

❤1

Oops, send that message a bit early. @comex I wanted to add that some of the motivations I can think of would be:

Ergonomic and safety hurdles around raw pointers: the language doesn't make the "most obvious" thing the correct and performant thing here, which is what we usually attempt to do (e.g., &raw const x would generally be better than &x, but I predict the latter is what most people will continue to reach for.)
Similarly, the panoply of types like Unique<T>, NonNull<T>, and so forth are complex to use but kind of required to get peak performance, better layout, etc. Maybe we can do better?
Bringing unsafe raw pointers closer to Rust and more "in line" with coding elsewhere in the language. We know that most pointers are non-null, and we know that Option is a nice way to signal when this is not the case, and it's how you would do it elsewhere in Rust, but unsafe code tends to be more like C.
Consistency, since the type of raw pointers (*const T) doesn't align with the way they are created (&raw const), unlike most types.

Stepping back, I think there's a general need for us to explore how we can make unsafe less of an "all or nothing" proposition. I'd really like to see layers of unsafe:

Fully safe code
Capabilities like "forget", "safe transmute", or AtomicU32, that permit one to do "subtle" things that require caution, but which are not memory unsafe.
Ergonomic unsafe things that require some proof from users but only simple ones.
"The most unsafe", that require complex reasoning from users.

Thirdly, I also think we need to look at some of the nifty, but unstable, things that we've long held in unsafe code (inline assembly is a very simple example of this, but there are many others) and think about how we can actually expose those.

And then there's the work of gradually bringing conclusions from "Unsafe Code Guidelines" work out and cementing them (which @RalfJung has been pursuing, 🙏).

Lots to do...

nikomatsakis on 20 Apr 2020

❤2

I actually quite like the idea of using a macro.

Note that a macro has been mentioned during the RFC discussion for &raw (https://github.com/rust-lang/rfcs/pull/2582#issuecomment-438850491, https://github.com/rust-lang/rfcs/pull/2582#issuecomment-499845969). One response was that just a macro isn't enough, because we need a new MIR operation, and also people didn't want a "magic macro" that directly goes to a MIR primitive. Otherwise there was not much discussion about it.

But I don't think using a macro just for stabilization purposes has been discussed. However, I assume this would mean the parser has to still always accept the new syntax, so that it can be used in macro expansions?

RalfJung on 20 Apr 2020

Yes, presumably the new syntax would remain unstable and there's a macro that expands to it (using allow_internal_unstable) that gets stabilized sooner. If and when some "proper" syntax for the underlying operation is stabilized later, it becomes a trivial macro any user could have written.

hanna-kruppe on 20 Apr 2020

How about allowing p.field on raw pointers, which returns a raw pointer to that field?

CodesInChaos on 20 Apr 2020

@CodesInChaos that has been suggested multiple times in RFCs and elsewhere. The conclusion was that it would be really confusing if reference.field dereferences the pointer but raw_pointer.field does not. The two expressions look very similar but would have wildly different types. It's also not how any other language works.

Also this issue is about what we can do to mitigate the problems around packed structs. This is not the right place to discuss &raw except insofar as it helps to solve the problem at hand. Your question better fits the tracking issue. To be fair, we already derailed that above a bit with the macro discussion -- but the goal there was to stabilize "something like &raw" faster. Re-opening the already RFC'd syntax bikeshed will certainly not help that goal.

RalfJung on 20 Apr 2020

👍1

@nikomatsakis Thanks for the more detailed explanation :)

comex on 20 Apr 2020

I personally don't have a problem with magic macros. Well, to be specific, I'm thinking more and more about the idea of some kind of syntax that allows us to expose "core operations" in a way that is intentionally signaled as a building block. Macro syntax is one way to go about that, but not the only way.

For example, we might use something like _::<id> as a reserved bit of syntax (I don't think that's legal today?) such that we could write things like _::borrow_raw(x) or something as the "underlying syntax".

Anyway, this tracking thread isn't the best place to talk about that, but I think having an established convention that we use in various places would be a good idea.

nikomatsakis on 22 Apr 2020

We discussed this in the lang team meeting. We'd absolutely like to move forward with raw references. We'd like to see a crater run showing the extent of this problem, and then we may make this change over an edition boundary.

We'd also like to standardize a syntax for "non-breaking syntax/keywords", so that it'd be possible to introduce new keywords understood by the compiler in a namespace that nobody especially wants to type in their code regularly, but that people can experiment with surface syntax for in macros or proc macros. This syntax would be stabilized for a given feature, and continue working even after we introduce some new surface syntax (e.g. at an edition boundary). One such proposal would be r#$token, where token will always get interpreted as a keyword. Then, we could have e.g. r#$raw_ref(...) and r#$raw_mut_ref(...), and decide what the surface syntax should look like later, deferring that either to an edition (with a new keyword) or a point where we find backward-compatible syntax.

joshtriplett on 30 Apr 2020

We'd like to see a crater run showing the extent of this problem

You mean, a crater run that makes references to packed fields a hard error (unsafe or not)?

We'd also like to standardize a syntax for "non-breaking syntax/keywords"

So the plan would be to only go ahead with raw-references-to-packed-fields once such a syntax materializes? (That sounds like it needs its own RFC first.)

RalfJung on 30 Apr 2020

You mean, a crater run that makes references to packed fields a hard error (unsafe or not)?

I think that was the proposal, yes.

So the plan would be to only go ahead with raw-references-to-packed-fields once such a syntax materializes? (That sounds like it needs its own RFC first.)

Not necessarily. It's more that either 1) we need to finish bikeshedding the raw reference syntax, or 2) we need a way to move forward with things that we know we want semantically but haven't finished developing the syntax for. Either one would unblock this.

EDIT: https://github.com/rust-lang/rfcs/pull/2919

joshtriplett on 30 Apr 2020

The topic of &raw versus &#$raw versus raw_ref! has been discussed a bunch by the lang design team, as you can see from the comments above and from issue #64490

At this point we are going to take baby steps here, namely via steps like PR #72279

So I think we can safely unnominate this from lang team's plate (and thus, as a nice side effect, stop talking about it at the T-compiler pre-triage sessions).

pnkfelix on 20 May 2020

A lint to detect references to packed fields (suggested by @retep998) is now available on nightly: #![deny(unaligned_references)]. You can use this to test how much your codebase will be affected when this gradually turns into a hard error.

RalfJung on 28 May 2020

👍1

Is there a Crater run planned with it being denied scheduled for those not watching this issue?

mathstuf on 28 May 2020

@mathstuf in fact that crater run is already queued. ;) See https://github.com/rust-lang/rust/pull/72644.

RalfJung on 28 May 2020

👍1

Is RUSTFLAGS=-Dunaligned_references expected to apply to dependencies, despite Cargo’s use of --cap-lints to normally inhibit warnings there?

SimonSapin on 28 May 2020

@SimonSapin I have no idea how RUSTFLAGS interacts with --cap-lints, but this lint is not doing anything special here.

RalfJung on 28 May 2020

@joshtriplett

We'd like to see a crater run showing the extent of this problem

Crater results are in: https://crater-reports.s3.amazonaws.com/pr-72644/index.html
However, also note @retep998's comment due to crater being Linux-only.

RalfJung on 2 Jun 2020

In particular, there are quite a few sys crates that have all their code on one line... is that bindgen (or so) generating these invalid references?

RalfJung on 2 Jun 2020

Yes, rust-bindgen and anything else that uses https://crates.io/crates/quote to generate source code will have that code all in one line. rust-bindgen then attempts to run rustfmt on that code by default, maybe that’s not available in the Crater environment?

SimonSapin on 2 Jun 2020

As requested by @RalfJung, here you can find the list of all functions that trigger “borrow of packed field” lint. Our dataset is based on a snapshot of crates.io made on 2020-01-14.

vakaras on 10 Oct 2020

Thanks @vakaras :-)
When I asked if it is possible to gather that data, I didn't quite expect you'd just go and get it. ;)

RalfJung on 10 Oct 2020

Thanks @vakaras :-)
When I asked if it is possible to gather that data, I didn't quite expect you'd just go and get it. ;)

I would say that we just got lucky this time that @fpoli decided to extract the reasons why the Rust compiler thinks that a specific function needs to have an unsafe block, which also includes this reason.

vakaras on 10 Oct 2020

😄1

https://github.com/rust-lang/rust/pull/75534 should help make progress on getting crates to update their code once the necessary primitives have been stabilized.

Aaron1011 on 12 Oct 2020

Rust: repr(packed) allows invalid unaligned loads

Original Issue Description

Most helpful comment

All 143 comments

Related issues