This is now a tracking issue for RFC 1240, "Taking a reference into a struct marked repr(packed)
should become unsafe
", but originally it was a bug report that led to the development of that RFC.
#![feature(simd, test)]
extern crate test;
// simd types require high alignment or the CPU faults
#[simd]
#[derive(Debug, Copy, Clone)]
struct f32x4(f32, f32, f32, f32);
#[repr(packed)]
#[derive(Copy, Clone)]
struct Unalign<T>(T);
struct Breakit {
x: u8,
y: Unalign<f32x4>
}
fn main() {
let val = Breakit { x: 0, y: Unalign(f32x4(0.0, 0.0, 0.0, 0.0)) };
test::black_box(&val);
println!("before");
let ok = val.y;
test::black_box(ok.0);
println!("middle");
let bad = val.y.0;
test::black_box(bad);
println!("after");
}
Will print, on playpen:
before
middle
playpen: application terminated abnormally with signal 4 (Illegal instruction)
The assembly for the ok
load is:
movups 49(%rsp), %xmm0
movaps %xmm0, (%rsp)
#APP
#NO_APP
But for the bad
one, it is
movaps 49(%rsp), %xmm0
movaps %xmm0, (%rsp)
#APP
#NO_APP
Specifically, the movups
(unaligned) became a movaps
(aligned), but the pointer isn't actually aligned, hence the CPU faults.
I... honestly thought that was expected behaviour. What can possibly be done about this is general, other than making references into packed fields a different type (or forbidding them)?
This is UB, so it's clearly not expected... but yes. it's more of a design flaw than an implementation issue.
Note that it's possible to make code like this misbehave even without SIMD, although it's a bit trickier; LLVM performs optimizations based on the alignment of loads.
Yeah, I only used SIMD because it was the simplest way to demonstrate the problem on x86. I believe platforms like ARM are generally stricter about load alignments, so even, say, u16
may crash in simple cases like the above.
Is there a way to tell LLVM that the value could be unaligned, and so LLVM should emit code that tries to read it in a safe way for the given architecture, even if it results in slower code?
@retep998: That's exactly what LLVM should have done for a packed struct. This looks like a LLVM codegen bug to me.
@vadimcn this is entirely our (or rather: my) fault. We used to not emit alignment data at all, which caused misaligned accesses for small aggregates (see #23431). But now we unconditionally emit alignment data purely based on the type that we're loading, ignoring where that value is stored. In fact at the point where we create the load, we currently don't even know where that pointer comes from and can't properly handle packed structs.
Perhaps we need some sort of alignment attribute that we can attach to pointers?
In general we have no idea where a reference comes from, e.g. fn foo(x: &f32x4) { let y = *x; ...
in some external crate can't know if we happen to pass in &val.y.0
in the code above. The only way to codegen to handle unaligned references properly is to assume pointers are never aligned, which would be extremely unfortunate. We don't currently have type-attributes, so it would be somewhat strange to introduce one here, instead of using types (for example).
Can we make creation of references into packed structs unsafe?
triage: P-high
Seems clear there's a bug here. First step is to evaluate how widely used packed is, so huon is going to do a crater run with #[repr(packed)]
feature gated.
I made a patch that feature gated repr(packed) and did a crater run. Results: https://gist.github.com/huonw/8262a4fb2da0a052f5f1 . Summary:
High-level summary:
repr(packed)
repr(packed)
at all | | stable | needed | FFI only |
| --- | --- | --- | --- |
| image | β | | |
| nix | β | β | β |
| tendril | | β | |
| assimp-sys | β | β | β |
| stemmer | β | | |
| x86 | β | β | β |
| http2parse | β | β | |
| nl80211rs | β | β | β |
| openal | β | | |
| elfloader | | β | β |
| x11 | β | | |
| kiss3d | β | | |
As you can see from the "huon references this issue ..." spam above this comment, I've opened PRs against the libraries where the repr(packed)
seems to be unnecessary.
The bug here is the testcase, it's intentionally violating CPU alignment constraints. In C, packed
is unsafe, and you're not supposed to take pointers to the fields of a packed structure, for fear of unaligned access. So, a simple path forward to me seems to be disallow taking references to fields of a struct with #[repr(packed)]
.
I like @cmr's suggestion, on which I see 3 variations:
The first two have the advantage that we could pack things tighter, e.g. bool
fields could be bit-fields.
Not sure that's something we want to do to #[repr(packed)]
, but nevertheless, opting into no references to fields will be needed at some point for more layout optimizations.
@huonw Could those crates be checked with an error being emitted here if cmt
refers to a field of a #[repr(packed)]
struct
, or something inside such a field?
cc https://github.com/rust-lang/rfcs/issues/311 (and also #[repr(simd)]
, all of which are in the same situation)
brief sketch of how all of these could work:
&
refs, copy out onto stack first&mut
, copy out first, and copy _back_ immediately after borrow ends (subtlety: if ownership of &mut
is passed to another fn, copy-back is _still_ done by the caller (because callee can't know whether it should); difference should not be observable because &mut
implies no other references exist through which to observe; same logic suggests mem::forget
shouldn't be a problem either)UnsafeCell
, needs to either fall back to normal repr, compile error ("can't #repr(foo)
on types containing UnsafeCell
..."; wrinkle: what about generic types?), or some other special handlingUnsafeCell
" and "not UnsafeCell
" the only two possible relevant cases?(sorry for brevity, been wanting to write this down for a while but short of time)
@glaebhoerl That's an interesting suggestion, and I can't think of any issues with the &mut
handling.
So for a struct with no interior mutability, would the only difference with #[repr(packed)]
be the lifetime of field borrows, i.e. &'a Struct -> &'a Field
wouldn't work?
Oh, hmm. To be honest I hadn't even thought of that. (In case it's not clear to anyone else (it only became clear to me after I had written out a comment asking for clarification): if I have a PackedStruct
on my stack, and wish to call an fn get_field(&PackedStruct) -> &Field
on it, that plainly won't work if get_field
copies the Field
out onto its _own_ stack, which'll be deallocated by the time it returns.) Too bad: I had been hoping all #[repr(s)]
could keep the same observable semantics, and even a subtlety like this rules out that possibility. Apparently there's no free lunch.
@glaebhoerl I do expect that such a scheme would allow most code to continue compiling, given the small number of repr(packed)
uses.
@huonw have you grepped for repr(packed)
in the cargo dir to find occurrences in code that does not otherwise compile?
(Just making sure nothing is missed, I don't foresee anything significant showing up.)
I use and need repr(packed)
for one of my libraries, because I do memory mapping tricks. I'm fine with it being unsafe, just as long as I can continue to use it from within stable Rust.
The bug here is the testcase, it's intentionally violating CPU alignment constraints. In C, packed is unsafe, and you're not supposed to take pointers to the fields of a packed structure, for fear of unaligned access. So, a simple path forward to me seems to be disallow taking references to fields of a struct with #[repr(packed)].
Of course the bug is in the test case... that's why I filed an issue with it ;) It's a compiler bug that the test case compiles and then crashes: it should either not compile, or not crash. It's triggering undefined behaviour (at the very least, we're telling LLVM that the load is aligned when it isn't), and, notably, it's particularly egregious, since there's no explicit internal references at all: our current (broken) scheme could probably be tweaked to make that specific example compile, without making any other changes.
(Of course, the fundamental problem won't be fixed by making a few tweaks to how we codegen struct field accesses.)
I use and need repr(packed) for one of my libraries, because I do memory mapping tricks. I'm fine with it being unsafe, just as long as I can continue to use it from within stable Rust.
Could you go into more detail about what the struct looks like?
have you grepped for repr(packed) in the cargo dir to find occurrences in code that does not otherwise compile?
Nothing new.
It looks like that might be able to get away with [u32; 2]
+ some transmutes in a pinch (highly not great of course...).
(I assume you've thought about the endianness issues that comes with mmaping data directly?)
Going to throw in my voice in that #[repr(packed)]
is rather important for http://arcnmx.github.io/nue/pod/ (has a safe/stable interface via syntex). However, yes, they will get aligned wrong on ARMv5 (an arch that rust doesn't support), but will work on ARMv6 (rust's current minimum supported arm platform). Loading from a pointer with incorrect alignment is considered UB in LLVM though, so... ouch. Just because it works in most cases on x86 doesn't mean it can be ignored :(
Note that borrows are only UB/unsafe for types that have an alignment > 1 - I would hope to retain this behaviour at the very least, so that my Le
My proposal: If packed
can be exposed as a marker trait, OIBIT could be used to ensure that packed types only contain other packed types (and thus all have alignments of 1, and borrowing any of its members will be safe). Then you can opt in to packed via an unsafe impl
if you really want to use unaligned accesses, or don't care for whatever reason.
Not that it really matters due to the LLVM UB issue, but it's still possible to disable unaligned accesses on ARM with a CPU flag; GCC and Clang expose -mno-unaligned-access
to generate code that supports this.
Can this cause memory unsafety?
If it always leads to an exception and abort, maybe it is something that Rust can allow.
@bluss it can. Look at the linked issue about unaligned reads on ARM, it'll just silently give you wrong/unexpected values when reading from a pointer. It's also explicitly stated as undefined behaviour in LLVM:
It is the responsibility of the code emitter to ensure that the alignment information is correct. Overestimating the alignment results in undefined behavior. Underestimating the alignment may produce less efficient code. An alignment of 1 is always safe. The maximum possible alignment is 1 << 29.
Well, I put an initial implementation of this into http://arcnmx.github.io/nue/packed/index.html
#[packed]
ensures that all types are safe to use unaligned before applying #[repr(packed)]
, and includes some helpers along with pod's usual endian primitives. The API needs some work, and some of my assumptions may not be entirely correct, but it gives a rough idea of how something similar could work for std!
I've written https://github.com/rust-lang/rfcs/pull/1240 which proposes making taking a reference into a repr(packed)
struct unsafe
.
I don't feel like this is the correct approach to take, will write up some notes in the RFC comments in a bit.
Also, I'd like to point out that, as it turns out, a bunch of types in the Windows SDK are packed, so #[repr(packed)]
really needs to stay in stable Rust for winapi
's sake.
Also, I'd like to point out that, as it turns out, a bunch of types in the Windows SDK are packed, so
#[repr(packed)]
really needs to stay in stable Rust for winapi's sake.
Might also be worth noting that some Windows API structs are packed to different alignments, e.g. #pragma pack(2)
, whereas Rust seems to currently only allow 1-byte alignment (although this can be easily worked around with additional padding fields).
Could we perhaps expand the syntax to #[repr(packed(N))]
then?
Note that rust-lang/rfcs#1240 was accepted. I have re-used this as the tracking issue.
@rust-lang/compiler it'd be great to find someone to implement this! :)
I'll take a look at it this weekend, I won't assign the issue to myself until I actually start working on it, just in case I decide that I don't have time (or forget!)
A new align metadata was added to the load instruction in LLVM.
That doesn't solve the problem, unless we mark _all_ loads out of a &T
as having alignment 1 (i.e. the current set-up means we could never assume a &T
is correctly aligned), which would be very unfortunate from a performance point of view.
Hey @Aatch, have you got an update the situation?
@huonw
This is basically waiting on MIR passes.
triage: P-medium
also waiting for mir
Came up today that repr(packed)
may also introduce data races. For instance, accessing a misaligned AtomicPtr, or the fields next to it, may do strange things. I'm unsure if this can be triggered in any way that isn't just "misaligned load". I don't know a lot about the relevant instruction sets and things LLVM does.
I'd be fine with not having atomics in repr(packed)
Given that this has been sat around for a while and that it sounds like it is not going to get implemented that soon, I think we should feature-gate #[repr(packed)]
in the meantime. I propose we start with a warning cycle and uplift that to beta then put a feature gate in nightly now. Alternatively, we could just warn on use of #[repr(packed)]
and never move to a feature gate (since it is widely used in WinAPI).
I think we should do something. Having potential memory unsafety sat around without the compiler telling the user about it at all is pretty bad.
Meanwhile in winapi
#[repr(packed)]
is the only tool I have to lower the alignment of structs to make their layout match that of the C version which uses #pragma pack
. Please don't feature gate it without providing an alternative. Many features that I'd like for FFI, such as unions, I can manage without by using my own methods and macros. However without some way to place a 4 byte field on a 2 byte boundary, I'm simply stuck and can't represent those types.
I don't know, packed is used widely enough, and the vast majority of
uses are fine. I expect the fix will come in a month or two, once the
MIR transition has made more progress, but perhaps that is wildly
optimistic.
On Sun, Feb 07, 2016 at 01:48:31PM -0800, Nick Cameron wrote:
Given that this has been sat around for a while and that it sounds like it is not going to get implemented that soon, I think we should feature-gate
#[repr(packed)]
in the meantime. I propose we start with a warning cycle and uplift that to beta then put a feature gate in nightly now. Alternatively, we could just warn on use of#[repr(packed)]
and never move to a feature gate (since it is widely used in WinAPI).I think we should do something. Having potential memory unsafety sat around without the compiler telling the user about it at all is pretty bad.
Reply to this email directly or view it on GitHub:
https://github.com/rust-lang/rust/issues/27060#issuecomment-181126967
After some discussion in compiler team meeting:
Maybe I didn't understand everything, but it seems to me there are 2 different problems here: letting rustc squeeze every bit of space possible (e.g. bool
on 1 bit, optimized enum
s) to have compact data structures, and choosing each struct member's footprint to match an FFI object layout (e.g. including a C bitfield with arbitrary length) - which can't always be done automatically.
So repr(packet)
is not an universal solution for data exchange with foreign libs.
On x86-64, is it always possible to just use unaligned instructions? GHC had to edit the output of LLVM to deal with alignment-requiring AVX instructions.
@drbo That doesn't solve the problem on other architectures where alignment is critically important.
@retep998 Still it would solve the problem temporarily, at least for some architectures.
My preferred solution is to disallow borrows from such fields (which are thus required to be Copy
), and to emit unaligned instructions for reading and writing to the misaligned fields.
@DemiMarie The current plan already is to make borrows into such fields unsafe (but not forbidding them outright, nor requiring such types to be Copy
although it would be really hard to use such a field if it wasn't), and to make access into such fields otherwise safe by emitting unaligned instructions for those fields.
It would be nice if borrows of fields within a repr(packed) struct could remain safe if the type of the field was also repr(packed) - or perhaps more generally if the field has byte alignment. I have an unpublished crate that I'm hoping to finish off some day that relies on users of the crate being able to do this.
It would be nice if unsafe was only required when the alignment of the field's type is greater than the packing of the struct itself, and if it is only less than or equal it should merely be a lint.
Also, I'm pretty sure the only reason this wasn't implemented yet was because it was blocked by MIR. Now that old trans is completely gone can someone hurry up and implement this already? Maybe squeeze in #[repr(align(N))]
and #[repr(packed(N))]
in while they're at it.
@retep998 This sounds very tricky to do if you take generics into account since the type layout is only known after monomorphization. I think just making packed structs unsafe to access is enough for a start, we can always look into loosening the restrictions later.
@Amanieu That's not entirely correct, we can do partial layout queries at any point after typeck completes.
_In fact_, that's how transmute
has been checked since rust-lang/rust#32939.
The trick in the original bug report of assigning to an intermediate variable, which the simd crate also uses, seems to work in debug builds, but currently does not appear to work in optimized builds (anymore?).
Looking at emmintrin.h from clang, it seems that LLVM doesn't have internal intrinsics for unaligned SSE2 loads but instead the dereferencing a pointer to a single-member struct with __attribute__((__packed__, __may_alias__))
and then extracting the member is supposed to generate unaligned SSE2 loads (and analogously for stores). I.e. the same thing the simd crate does.
Since there isn't an LLVM intrinsic to call directly, it's rather important for the packed
idiom for getting LLVM to emit unaligned SSE2 loads/stores to work reliably.
This was previously marked as blocking on MIR. Now that MIR has landed, what should Rust user expectations be in terms of controlling aligned vs. unaligned SSE2 load/store generation?
@hsivonen
This was previously marked as blocking on MIR. Now that MIR has landed, what should Rust user expectations be in terms of controlling aligned vs. unaligned SSE2 load/store generation?
The next step is to port the unsafety check to operate on MIR, instead of HIR. I'd be happy to mentor such a thing, though it's probably complex enough that it makes sense to sync up online.
The next step is to port the unsafety check to operate on MIR, instead of HIR. I'd be happy to mentor such a thing, though it's probably complex enough that it makes sense to sync up online.
I followed up on internals about a simpler way to address the SIMD load/store use case.
I would just like to note, that as I described in issue #39376 (which is probably a duplicate of this one), that some architectures don't support unaligned access. packed is definitely a minefield, not only for supporting those platforms, but for users of the language that may not be familiar with the many subtleties involved. With that in mind, I agree with the suggestion that accessing packed structs may need to be considered "unsafe".
Just to put the packed knowledge in the same place, in rustc there is a safe and sound (hopefully) use of packed
to have the compiler generate the appropriate load of an u32
from an unaligned pointer. The point is that self
is used by value and align of Unaligned<u32>
is 1.
#[repr(packed)]
#[derive(Copy, Clone)]
struct Unaligned<T>(T);
impl<T> Unaligned<T> {
fn get(self) -> T { self.0 }
}
I would just like to note, that as I described in issue #39376 (which is probably a duplicate of this one), that some architectures don't support unaligned access.
I don't see why platform support for unaligned access is required. LLVM could always expand the access into a byte-by-byte load, or call an intrinsic (I don't remember what it usually does).
@whitequark is right. We even have a read_unaligned
function in nightly which is portable across all architectures: https://doc.rust-lang.org/nightly/src/core/ptr.rs.html#175-181
pub unsafe fn read_unaligned<T>(src: *const T) -> T {
let mut tmp: T = mem::uninitialized();
copy_nonoverlapping(src as *const u8,
&mut tmp as *mut T as *mut u8,
mem::size_of::<T>());
tmp
}
I discovered that most #[derive(...)]
impls are unsafe for packed structs due to taking references to unaligned fields (see https://github.com/rust-lang/rust/pull/39682).
@bluss's rustc code sample above also has an unsafe derived Clone
impl which won't compile once we fix this issue:
fn clone(&self) -> Unaligned<T> {
match *self {
Unaligned(ref __self_0_0) => // BAD: reference to unaligned packed struct field
Unaligned(::std::clone::Clone::clone(&(*__self_0_0))),
}
}
EDIT: I sent a PR to fix this code in the compiler and also opened an issue about the general problem.
Nice try, GitHub. I only said "when we fix #27060".
@eddyb Can you reopen?
Nominating for compiler team discussion. We've been talking about implementing @huonw's RFC 1240 for over a year at this point. I'm not sure how difficult that is to do now, but the longer we wait, the greater the chance of fallout. So I think it'd be good to re-triage.
Leaving this nominated for now because @nikomatsakis is not in for the meeting.
@aturon We've been able to do it more or less ever since we fully switched to MIR.
IMO we should track unsafety in VisibilityScope
and move all unsafety checking to work on MIR.
I've been wanting to do this for a while and IIRC @nikomatsakis agrees with my plan, I just didn't get to it.
I might be able to mentor someone to do it (we still have no way to advertise this, do we?).
@eddyb yes, I've been wanting to do the same, and I've actually had this on my (personal) list of "things to write mentoring instructions for". I'll try to write up some instructions today.
@nikomatsakis Did you get a chance to write up that information? I'd be interested in implementing this.
We discussed this in the @rust-lang/compiler meeting today. Nobody has written up any instructions =) but we did discuss some of the more subtle points:
Currently, we strip dead code out of MIR very early. This would mean that if we do 'unsafe' checking on MIR, we will ignore dead code (e.g., return; <this code>
). We already ignore a number of errors in dead-code, and borrowck will totally ignore it, so this seems potentially just fine, but it's worth making a firm decision here.
There is another option: we can run the "unsafe check" before we strip dead code. The tricky part is that in that case some of the types in MIR may not match, since the type-checker ignores data-flow out of dead-code. This isn't really a problem for the unsafeck, but more generally it would mean the MIR is not "well-typed" and it'd be sort of nice to have an invariant that MIR is always well-typed.
One last point: we have to be careful that if false { <sort-of-dead-code> }
doesn't get stripped out of MIR before we are done doing the various safety checks! It's one thing to suppress errors in code that is obviously dead, but quite another to led the results of constant propagation feed into safety analysis in that way.
@nikomatsakis
One last point: we have to be careful that if false {
} doesn't get stripped out of MIR before we are done doing the various safety checks! It's one thing to suppress errors in code that is obviously dead, but quite another to led the results of constant propagation feed into safety analysis in that way.
Especially because that conditional might look like if cfg!(...)
or similar.
Actually, it's not even necessary to write &
to get such unaligned loads -- having Drop
type in a packed struct is also a problem, because drop
gets an &mut
and hence assumes it is aligned. (I noticed this while playing around with miri and suddenly getting unaligned access errors for boxes in packed structs.)
So, beyond making references to packed fields unsafe, it seems we also have to forbid types in packed structs that implement Drop. Will this need an RFC (despite being a soundness fix)?
Because references should always be aligned... how are we supposed to get a raw pointer to a given field to check whether it is in fact aligned or even do an unaligned read or write? Doing &foo as *const T
doesn't really work because it goes through an intermediate reference.
@RalfJung huh, interesting. We could amend the existing RFC to cover that case -- i.e., open a PR amending the original text -- but I think it also suffices to open up a Rust issue, tag it as T-lang, and do the FCP there; I feel like it's in the spirit of the existing RFC, so I don't know that a full PR on the RFC repository is needed. (It'd be nice to clarify these procedures, this question comes up somewhat regularly; but the intention is that an RFC, once landed, can be amended in reasonable ways, and any concerns will be addressed and discussed in the tracking issue and -- ultimately -- during stabilization.)
Notes from IRC: https://botbot.me/mozilla/rustc/2017-07-31/?msg=89222091&page=2
The next step is to port the unsafety check to operate on MIR, instead of HIR. I'd be happy to mentor such a thing, though it's probably complex enough that it makes sense to sync up online.
@nikomatsakis Is this still relevant? This sounds like a significant task of its own, should it be a separate issue?
@SimonSapin
I am working on it right now.
For Drop
fields, could we not move them into an aligned temporary before calling drop
?
@Zoxc
That wouldn't work for unsized fields (yes you can have repr(packed)
+ unsized), and it could create unexpected stack copies.
I think we should try to forbid repr(packed)
structs with destructors and see how that goes.
This is fixed by #44884, isn't it?
@RalfJung Last time I tried, rustc itself was no longer crashing with SIGBUS. But code compiled natively on sparc64 still crashed (rustc being cross-compiled for sparc64) with SIGBUS.
I have to perform more tests on sparc64 first to be able to give a qualified answer.
Ah, this issue now seems to be about two things:
Tracking issue for RFC 1240. That RFC is now implemented in nightly, but it is a warning currently, not an error: https://play.rust-lang.org/?gist=7ebeed4f9cc64fb970a4ba2e604a5697&version=nightly. Probably the tracking issue should remain open until the RFC is fully implemented, i.e., this is a hard error (in nightly at least)?
The code in the first post compiling incorrectly. The code doesn't actually take a reference to an unsafe field, so it seems rather orthogonal to the RFC -- this is purely a codegen issue. I am a little puzzled that these are in the same issue.
Could somebody summarize exactly what remains to be done here ?
It is currently hard, although not impossible, to write C FFI code without repr(packed)
and repr(packed(N))
that does not invoke undefined behavior. Because of how hard this is, libc
, mach
, and other low-level OS api libraries prefer to actually just silently invoke undefined behavior. AFAICT only rust-bindgen does consistently the right thing here, and the results are pretty horrible.
I only really need this for C FFI so I would be fine with completely forbidding all references to packed structs if that could lead to a minimally usable subset of this being stabilized quicker.
Also, the current RFC2366: portable packed SIMD vector types does not mention the interaction between SIMD vector types and packed, it just assumes that all vector types are always stored at a multiple of their alignment. I don't think that implicitly doing unaligned loads would be a sane default, and I'd rather avoid having to assert!
on every vector method that &self
is properly aligned. I would be fine with RFC2366 requiring that vector types must always be stored at a multiple of their alignment, which would prevent them from being stored in repr(packed)
structs.
Could somebody summarize exactly what remains to be done here ?
I only really need this for C FFI so I would be fine with completely forbidding all references to packed structs if that could lead to a minimally usable subset of this being stabilized quicker.
AFAIK everything is stable already (and has been since 1.0), but the problem is that some code is accepted without unsafe
. https://github.com/rust-lang/rust/issues/46043 tracks turning the warning this generates into a hard error.
I don't know about the codegen error this issue was originally about.
It is currently hard, although not impossible, to write C FFI code without repr(packed) and repr(packed(N)) that does not invoke undefined behavior.
Is that because the C side is packed, or because repr(C)
somehow doesn't do the right thing?
I'd rather avoid having to assert! on every vector method that &self is properly aligned.
You do not have to. References are always aligned; only raw pointers may be unaligned and only if they are used with write_unaligned
and read_unaligned
. We already tell LLVM that references are aligned, so vector types shouldn't be any different here.
Is that because the C side is packed, or because repr(C) somehow doesn't do the right thing?
Because the C side is packed with #pragma pack N
(and/or a similar packed attribute). In many cases repr(packed)
isn't enough and one needs repr(packed(N))
which is not stable yet. This issue is one of its unresolved issues.
Is there some reason that everything needs to be aligned?
On at least x86, amd64, and ARM64 I cannot see the harm in telling LLVM
things may not be aligned.
On Mon, Apr 16, 2018, 5:31 AM Ralf Jung notifications@github.com wrote:
Could somebody summarize exactly what remains to be done here ?
I only really need this for C FFI so I would be fine with completely
forbidding all references to packed structs if that could lead to a
minimally usable subset of this being stabilized quicker.AFAIK everything is stable already (and has been since 1.0), but the
problem is that some code is accepted without unsafe. #46043
https://github.com/rust-lang/rust/issues/46043 tracks turning the
warning this generates into a hard error.I don't know about the codegen error this issue was originally about.
It is currently hard, although not impossible, to write C FFI code without
repr(packed) and repr(packed(N)) that does not invoke undefined behavior.Is that because the C side is packed, or because repr(C) somehow doesn't
do the right thing?I'd rather avoid having to assert! on every vector method that &self is
properly aligned.You do not have to. References are always aligned; only raw pointers
may be unaligned and only if they are used with write_unaligned
https://doc.rust-lang.org/beta/std/ptr/fn.write_unaligned.html and
read_unaligned
https://doc.rust-lang.org/beta/std/ptr/fn.read_unaligned.html. We
already tell LLVM that references are aligned, so vector types shouldn't be
any different here.β
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/27060#issuecomment-381537915,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGGWB9taIj5c7dcvGBg4eZRJyg7Kb01Pks5tpGTlgaJpZM4FZlMJ
.
@DemiMarie A compiler shouldn't generate code which produces unaligned access. Depending on the target platform, unaligned accesses can either result in performance penalties or even or a crash with SIGBUS.
Rust supports more than just x86 where unaligned access doesn't have such a huge impact. I don't know about the performance impact on arm64 though.
That depends on the platform, though. On some platforms, generating an
unaligned load or store might be the best option. Can LLVM do that when it
should?
On Mon, Apr 16, 2018, 1:34 PM John Paul Adrian Glaubitz <
[email protected]> wrote:
@DemiMarie https://github.com/DemiMarie A compiler shouldn't generate
code which produces unaligned access. Depending on the target platform,
unaligned accesses can either result in performance penalties or even or a
crash with SIGBUS.Rust supports more than just x86 where unaligned access doesn't have such
a huge impact. I don't know about the performance impact on arm64 though.β
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/27060#issuecomment-381686574,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGGWB_WOArzdsx9923j4Utll9JbHhErJks5tpNYygaJpZM4FZlMJ
.
The user expects the compiler to generate aligned accesses by default since this aligned accesses are normally the safest and fastest option.
On some platforms, generating an
unaligned load or store might be the best option. Can LLVM do that when it
should?
Yes, LLVM of course does that. It compares the alignment of data with alignment required by the selected instruction and adjusts the pointer/shifts the data if necessary.
@DemiMarie Notice that alignment doesn't just affect loads. Once we tell LLVM about the alignment 8which we do), it is UB for the pointer value to not have the given alignment even if we do not load/store. For example, if we do bit operations on the least significant bits of the pointer, LLVM will optimize them (or so I am told) based on the assumption that these bits all have to be 0 due to alignment. Also see the example of using alignment for layout optimizations that I mentioned at the end of this post.
I see. My thought was that on most platforms, telling LLVM pointers are
aligned does not help optimization, so there is no need to do it. Does it
actually help?
On Tue, Apr 17, 2018, 3:57 AM Ralf Jung notifications@github.com wrote:
@DemiMarie https://github.com/DemiMarie Notice that alignment doesn't
just affect loads. Once we tell LLVM about the alignment 8which we do), it
is UB for the pointer value to not have the given alignment even if we do
not load/store. For example, if we do bit operations on the least
significant bits of the pointer, LLVM will optimize them (or so I am told)
based on the assumption that these bits all have to be 0 due to alignment.
Also see the example of using alignment for layout optimizations that I
mentioned at the end of this post
https://github.com/rust-lang/rust/issues/46043#issuecomment-381570961.β
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/27060#issuecomment-381886959,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGGWBxwUqNEdNXoXE1yhNhV4AX0tjJqmks5tpaBcgaJpZM4FZlMJ
.
Does it actually help?
Does it matter? The question is: what do we do in those platforms in which unaligned loads are illegal. Should we have two different programming languages?
Food for thought, another use case that may flavour the solution here is device drivers.
Having control over how a structure lays out in memory is important to systems programmers who work with board support packages and drivers. A common technique in writing C device drivers, where the device registers are memory mapped, is to create a struct that perfectly overlays the registers. gcc and other compilers often require #pragmas to ensure packing and/or alignment are as required and its not uncommon to see C structs created as fully packed and have the alignment dealt with explicitly by the programmer adding explicit padding fields. (This makes it easy to verify by code inspection that the code matches what the device's data book describes.)
You then create a pointer to that struct with the base address of the device's registers. The device registers are read and written by reading and writing to the fields of the struct. You also have to ensure that you use volatile so that the compiler doesn't optimize out register reads not knowing that the device is changing them.
I've seen similar methods used for heterogeneous single board computers that communicated via shared memory and also for messages sent over communications channels where the on-the-wire layout was important to match with whatever was on the other end.
@ibkevg You can do this today, with #[repr(C)]
structs (for defined layout) and a wrapper type to force accesses to individual registers to be volatile. For example: https://docs.rs/cortex-m/0.5.2/cortex_m/peripheral/cpuid/struct.RegisterBlock.html
However these device registers are typically designed so that none of them is at a misaligned address, arenβt they? So #[repr(C)]
is enough and you probably donβt need #[repr(packed)]
for this.
Great - I might have been thrown off the trail by the name #[repr(C)] because this is needed even if there is no C code present at all and a driver is being writing in 100% Rust. Iβd suggest renaming or adding an alias that isnβt FFI specific.
Another thing that led me to write the comment above was skimming this thread and noticing at one point musing about fixing the issue in software by copying values into aligned memory and then reading that, which would lead to extra reads. I didnβt read it carefully but it brought to mind that in the event the compiler for whatever reason canβt decide whatβs the right thing to do and decides to play it safe in this way, it could cause trouble for a driver. Iβve worked with devices that have βread clearβ registers. So reading the register clears one or more bits in it and when you read it next, itβs gone. Now imagine edge cases of a solution that copies values into memory, reading extra addresses that inadvertently read clears.
So I just wanted to offer that caution - but yes it would be strange for hw to be designed that would require unaligned accesses. Even if a part was used that might have been designed for an older memory architecture, it would typically be interfaced to via an FPGA that could present a βnormalβ set of registers to software, that or tricks using address lines, etc.
(Barely related, mostly off topic, war story: not every device is going to be memory mapped :) I did have to interface to an Ethernet switch chip once where we only had an SPI (serial) interface into it that was wired to the CPUβs general purpose I/O registers. My code had direct control of a data line and a clock line by writing to the I/O register and was literally twiddling bits to feed in cmds and read data out. Lowest level code Iβve ever written.)
We have been warning about borrows to packed struct fields requiring unsafe for a long time. How should this issue progress? It is blocking the stabilization of repr(packed(N))
which is required by libc
, mach
, and winap
(widely used libraries), for correctness.
The only unresolved question I can find here is whether we want to make this unsafe on all cases or some cases only (e.g. when the specified packing is less than the field's natural alignment). However, there is also the option of making this always safe, by requiring that the reference is directly casted into a raw pointer (e.g. &foo.field as *const _
).
So we could progress this by either turning the current warning (requiring unsafe code) into an error, or by changing the warning to warn about references to fields of packed structs that are not directly turned into pointers, so that we can turn that into an error in the future.
@gnzlbg I'm fine with making this a hard error. There's just one thing: I have some ideas for how to make it even harder in the future to create unaligned references. Namely:
&expr as *[mut,const] _
as a single primitive operation that creates a raw ptr without an intermediate ref every existing.I am a bit worried about having people change the same code twice, that's my only potential reservation with proceeding to make safe-ref-to-packed-field an error. But I don't think that should actually stop us, just bringing it up in case anyone disagrees.
I am a bit worried about having people change the same code twice, that's my only potential reservation with proceeding to make safe-ref-to-packed-field an error. But I don't think that should actually stop us, just bringing it up in case anyone disagrees.
Im worried to, which is why I would prefer to change the warning to require &expr as *[mut,const] _
which would not require any unsafe
code. Otherwise, if we make not using unsafe
an error, we are basically forcing everyone to use unsafe { &x.unaligned }
which potentially would still be UB, without really telling them that they need to write &x.unaligned as *const _
which does not require unsafe
.
I think the current status is that we still just warn about these references-to-packed-fields. And indeed adding unsafe
doesn't actually help, it's still UB. What would help is stabilizing https://github.com/rust-lang/rust/issues/64490 and then slowly making the references a hard error.
@rust-lang/lang is that also your plan, and do you have a rough timescale for that?
then slowly making the references a hard error.
I hope you mean a hard error when not in an unsafe
block? (Though I could potentially see emitting a warning even in unsafe
blocks, because of how easily problematic it can be.) If we _do_ want to make it an error even in unsafe
blocks, we definitely need &raw
sooner rather than later so we can start linting against it there as well.
It can be sound to take a reference to a packed member when you know the packed member to be aligned, due to a known alignment of the packed datatype, even if it isn't guaranteed by the language.
It'd be kind of sad if the way to say "yes, this is actually aligned" had to be &*&raw const packed.x
"just" to help avoid accidentally taking references to packed fields. (Though to be perfectly clear: I understand the issue with silently allowing references to packed fields in unsafe
, due in part to implicit autoref and the scope of unsafe
for other reasons potentially covering one. I just happen to have recently written and used a packed structure guaranteed to be fully aligned, which is only packed to remove trailing padding for space reasons.)
Strawman: Create an Unaligned<T>
type; deprecate #[repr(packed)]
in favor of Unaligned<T>
plus #[repr(C)]
.
I hope you mean a hard error when not in an unsafe block? (Though I could potentially see emitting a warning even in unsafe blocks, because of how easily problematic it can be.) If we do want to make it an error even in unsafe blocks, we definitely need &raw sooner rather than later so we can start linting against it there as well.
It is UB even in unsafe blocks. So whether or not there is an unsafe block should make no difference for our lints/hard errors, once we have a stable alternative.
It can be sound to take a reference to a packed member when you know the packed member to be aligned, due to a known alignment of the packed datatype, even if it isn't guaranteed by the language.
The current check already takes that into account, I think, when it is guaranteed. For example, taking a reference to a u8
in a packed struct is fine (no lint, and no errors in the future).
It'd be kind of sad if the way to say "yes, this is actually aligned" had to be
&*&raw const packed.x
"just" to help avoid accidentally taking references to packed fields. (Though to be perfectly clear: I understand the issue with silently allowing references to packed fields inunsafe
, due in part to implicit autoref and the scope ofunsafe
for other reasons potentially covering one. I just happen to have recently written and used a packed structure guaranteed to be fully aligned, which is only packed to remove trailing padding for space reasons.)
Using packed
for this seems a bit hacky. After all, if you do a direct load or store of one of the fields, LLVM will generate an unaligned load even though, according to you, everything is actually properly aligned. In theory that could have a performance cost; in practice, maybe not, but the mismatch suggests that it's not the ideal solution for your use case.
Perhaps there should be a way to just disable trailing padding for a struct. But that would make it dangerous to create an array of that struct, and I'm not sure how to prevent it from becoming a footgun...
Still, I think making it easy/ergonomic to implicitly assert alignment should be, well, an anti-goal. After all, for typical uses of packed
, the data really is expected to be misaligned.
For similar reasons, I'm not thrilled with &raw
as a long-term solution. It's better than having no way to avoid UB... but an unaligned raw pointer is still a footgun, because you can accidentally dereference it without using {read|write}_unaligned
.
Hence my suggestion for a wrapper type. Not only does that avoid the footgun, it also avoids the need for unsafety altogether, since reading and writing an Unaligned<T>
would be safe operations.
Using packed for this seems a bit hacky. After all, if you do a direct load or store of one of the fields, LLVM will generate an unaligned load even though, according to you, everything is actually properly aligned.
It's not, though. The struct itself could be insufficiently aligned. repr(packed)
sets the required alignment of the struct to 1
, which is the only reason it can leave away trailing padding. The field might be at an aligned offset within the struct, but the stack/heap allocation containing the struct could be insufficiently aligned.
So @CAD97 I think your code is actually wrong.
For similar reasons, I'm not thrilled with &raw as a long-term solution. It's better than having no way to avoid UB... but an unaligned raw pointer is still a footgun, because you can accidentally dereference it without using {read|write}_unaligned.
That's fair. However, my main concern in this issue is to plug the existing soundness hole -- that seems at least mostly orthogonal to developing more ergonomic ways to deal with unaligned data.
My guess is that @CAD97 manually ensured their packed struct would be placed at an aligned address, by embedding it in a higher-aligned struct or something like that.
That's possible, but then it seems entirely reasonable that you need to add some unsafe
to promise that you got this right... it's non-trivial.
Having to use Unaligned<T>
would be pretty atrocious in FFI bindings. Large structs full of fields would need every single field's type wrapped in Unaligned<T>
, and if it was conditional based on something like architecture you'd end up with duplication...
repr(packed(n))
supports alignments other than 1, so any Unaligned<T>
would need to take an alignment value to have feature parity with repr(packed)
.
I'm nominating for lang-team meeting to discuss @RalfJung's question of the plans around &raw
.
My own personal take:
I am semi-convinced that we should "rethink" raw pointer syntax and try to adopt a new kind of pointer that is raw, non-null, and (probably?) doesn't make the const/mut
distinction. @withoutboats was pitching &unsafe
(and &unsafe T
) as a possible syntax for this, and I don't hate that, though there are parsing ambiguities to consider (&unsafe { 22 }
being legal syntax now).
All the same, I think we should consider moving on &raw
sooner rather than later, especially if "old-style raw pointers" would be soft-deprecated. It probably takes some time to prove out an &unsafe
-like design. It feels like a big deal, in part because I think it would best be accompanied by other changes around the Unsafe Code Guidelines. (Though maybe I'm expecting too much, perhaps &unsafe
isn't as hard as all that.)
Regardless, a change around the syntax of raw pointers is clearly the sort of thing that would best be done around an edition, I think, since it's a big shift in developer practices, and in concert with other supporting changes, which is why I brought up UCG-related work.
(Yes, my code code ensures the alignment of the packed structs by construction, with a kind of "placement new" that requires proper alignment, and never actually holding an instance of the type on the stack. packed(N)
wouldn't be enough for this use case, as it includes a structure whose safety alignment is "8+4" (i.e. 4 but not 8), which is not expressible in the type system at all. The code is on GitHub if people want to gawk at the cursed code. In actuality, the code just does an aligned read from the packed fields, it doesn't actually need to create a reference.)
My position has mellowed a bit overnight; I definitely think it's fair to warn on taking a reference to a packed field (when the field isn't guaranteed to be aligned by the Rust type system), even in unsafe
blocks, due to the ease of accidentally causing UB through an invisibly added reference (method call) or that just happens to be in an unsafe
scope for a different reason, _once we have an alternative_ (&raw
and/or &unsafe
). Said warning could then potentially be upgraded to a hard error in the (far) future.
I am semi-convinced that we should "rethink" raw pointer syntax and try to adopt a new kind of pointer that is raw, non-null, and (probably?) doesn't make the const/mut distinction. @withoutboats was pitching &unsafe (and &unsafe T) as a possible syntax for this, and I don't hate that, though there are parsing ambiguities to consider (&unsafe { 22 } being legal syntax now).
I agree the entire area of raw pointers needs a lot of work. But I also would like to finally see this soundness hole closed. I am torn between not wanting to rush in an addition that we might regret later, and wanting to do something to guide people away from taking references to packed fields (which currently is blocked on there not being anything to guide them to). Maybe exposing &raw
in a macro is a better "hotfix" than full-blown new syntax...
If I have a simple packed struct like this:
#[repr(packed)]
#[derive(PartialEq)]
struct p{
a: usize,
b: u8,
c: usize,
}
And I have 2 variables like that:
let x = p {a: 1usize, b: 2u8, c: 3usize};
let y = p {a: 1usize, b: 2u8, c: 4usize};
Then will x == y
on platforms that don't support unaligned access? (Some platforms that don't support unaligned access just ignore the lowest bits so when reading out the field c
it will read out field b
and the first bytes except the last of c
)
I guess my question is: I get that its unsafe to access references which point into a packed struct, but what kind of code will #[derive(PartialEq)] generate? Is it going to take into account that the struct is packed and has to perform unaligned memory accesses? Or is it just generates broken code right now? Unfortunatly I wasn't able to cross compile to test this, but if Rust generates broken code for PartialEq
for packed structs then just requiring to put unsafe block around packed-struct-field accesses aren't gonna fix the problem.
repr(packed)
+ derive(PartialEq)
shows a warning:
warning: `#[derive]` can't be used on a `#[repr(packed)]` struct that does not derive Copy (error E0133)
--> src/main.rs:2:10
|
2 | #[derive(PartialEq)]
| ^^^^^^^^^
|
= note: `#[warn(safe_packed_borrows)]` on by default
= warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
= note: for more information, see issue #46043 <https://github.com/rust-lang/rust/issues/46043>
= note: this warning originates in a derive macro (in Nightly builds, run with -Z macro-backtrace for more info)
This one is probably going on long enough that we can move towards making it a hard error. Is it worth taking small steps here even while there is no proper solution for taking references yet?
If you add Copy
to the derives, Miri is happy with alignment in the auto-derived code: https://github.com/rust-lang/miri/pull/1343.
I am semi-convinced that we should "rethink" raw pointer syntax and try to adopt a new kind of pointer that is raw, non-null, and (probably?) doesn't make the
const/mut
distinction. @withoutboats was pitching&unsafe
(and&unsafe T
) as a possible syntax for this, and I don't hate that, though there are parsing ambiguities to consider (&unsafe { 22 }
being legal syntax now).
Do you by chance have a link to somewhere this was being discussed? It sounds very churny to me, but Iβm curious about the motivations behind it.
@comex I don't believe there is a great link, it's an idea that's been quietly simmering. I agree that it is would cause a great deal of churn, and I think we'd want to think very carefully about that aspect of it.
If we were going to pursure this, I think a good idea if we wanted to pursue this would be to try and document the motivations, explore what a "green field" design might look like, and also think about smaller steps that might help. (It seems like a good place for a project group, to me.) But that work isn't really happening in earnest afaik.
--
@RalfJung Hmm, I actually quite like the idea of using a macro. I've been meaning for some time to write a blog post talking about how I think we underutilize that option -- i.e., using "ugly" macros to create experimental base capabilities, so that we can iterate more using procedural macros on the higher-level details. This might be a perfect opportunity to do that.
Oops, send that message a bit early. @comex I wanted to add that some of the motivations I can think of would be:
&raw const x
would generally be better than &x
, but I predict the latter is what most people will continue to reach for.)Unique<T>
, NonNull<T>
, and so forth are complex to use but kind of required to get peak performance, better layout, etc. Maybe we can do better?Option
is a nice way to signal when this is not the case, and it's how you would do it elsewhere in Rust, but unsafe code tends to be more like C.*const T
) doesn't align with the way they are created (&raw const
), unlike most types.Stepping back, I think there's a general need for us to explore how we can make unsafe less of an "all or nothing" proposition. I'd really like to see layers of unsafe:
AtomicU32
, that permit one to do "subtle" things that require caution, but which are not memory unsafe.Thirdly, I also think we need to look at some of the nifty, but unstable, things that we've long held in unsafe code (inline assembly is a very simple example of this, but there are many others) and think about how we can actually expose those.
And then there's the work of gradually bringing conclusions from "Unsafe Code Guidelines" work out and cementing them (which @RalfJung has been pursuing, π).
Lots to do...
I actually quite like the idea of using a macro.
Note that a macro has been mentioned during the RFC discussion for &raw
(https://github.com/rust-lang/rfcs/pull/2582#issuecomment-438850491, https://github.com/rust-lang/rfcs/pull/2582#issuecomment-499845969). One response was that just a macro isn't enough, because we need a new MIR operation, and also people didn't want a "magic macro" that directly goes to a MIR primitive. Otherwise there was not much discussion about it.
But I don't think using a macro just for stabilization purposes has been discussed. However, I assume this would mean the parser has to still always accept the new syntax, so that it can be used in macro expansions?
Yes, presumably the new syntax would remain unstable and there's a macro that expands to it (using allow_internal_unstable
) that gets stabilized sooner. If and when some "proper" syntax for the underlying operation is stabilized later, it becomes a trivial macro any user could have written.
How about allowing p.field
on raw pointers, which returns a raw pointer to that field?
@CodesInChaos that has been suggested multiple times in RFCs and elsewhere. The conclusion was that it would be really confusing if reference.field
dereferences the pointer but raw_pointer.field
does not. The two expressions look very similar but would have wildly different types. It's also not how any other language works.
Also this issue is about what we can do to mitigate the problems around packed structs. This is not the right place to discuss &raw
except insofar as it helps to solve the problem at hand. Your question better fits the tracking issue. To be fair, we already derailed that above a bit with the macro discussion -- but the goal there was to stabilize "something like &raw
" faster. Re-opening the already RFC'd syntax bikeshed will certainly not help that goal.
@nikomatsakis Thanks for the more detailed explanation :)
I personally don't have a problem with magic macros. Well, to be specific, I'm thinking more and more about the idea of some kind of syntax that allows us to expose "core operations" in a way that is intentionally signaled as a building block. Macro syntax is one way to go about that, but not the only way.
For example, we might use something like _::<id>
as a reserved bit of syntax (I don't think that's legal today?) such that we could write things like _::borrow_raw(x)
or something as the "underlying syntax".
Anyway, this tracking thread isn't the best place to talk about that, but I think having an established convention that we use in various places would be a good idea.
We discussed this in the lang team meeting. We'd absolutely like to move forward with raw references. We'd like to see a crater run showing the extent of this problem, and then we may make this change over an edition boundary.
We'd also like to standardize a syntax for "non-breaking syntax/keywords", so that it'd be possible to introduce new keywords understood by the compiler in a namespace that nobody especially wants to type in their code regularly, but that people can experiment with surface syntax for in macros or proc macros. This syntax would be stabilized for a given feature, and continue working even after we introduce some new surface syntax (e.g. at an edition boundary). One such proposal would be r#$token
, where token
will always get interpreted as a keyword. Then, we could have e.g. r#$raw_ref(...)
and r#$raw_mut_ref(...)
, and decide what the surface syntax should look like later, deferring that either to an edition (with a new keyword) or a point where we find backward-compatible syntax.
We'd like to see a crater run showing the extent of this problem
You mean, a crater run that makes references to packed fields a hard error (unsafe
or not)?
We'd also like to standardize a syntax for "non-breaking syntax/keywords"
So the plan would be to only go ahead with raw-references-to-packed-fields once such a syntax materializes? (That sounds like it needs its own RFC first.)
You mean, a crater run that makes references to packed fields a hard error (
unsafe
or not)?
I think that was the proposal, yes.
So the plan would be to only go ahead with raw-references-to-packed-fields once such a syntax materializes? (That sounds like it needs its own RFC first.)
Not necessarily. It's more that either 1) we need to finish bikeshedding the raw reference syntax, or 2) we need a way to move forward with things that we know we want semantically but haven't finished developing the syntax for. Either one would unblock this.
The topic of &raw
versus &#$raw
versus raw_ref!
has been discussed a bunch by the lang design team, as you can see from the comments above and from issue #64490
At this point we are going to take baby steps here, namely via steps like PR #72279
So I think we can safely unnominate this from lang team's plate (and thus, as a nice side effect, stop talking about it at the T-compiler pre-triage sessions).
A lint to detect references to packed fields (suggested by @retep998) is now available on nightly: #![deny(unaligned_references)]
. You can use this to test how much your codebase will be affected when this gradually turns into a hard error.
Is there a Crater run planned with it being denied scheduled for those not watching this issue?
@mathstuf in fact that crater run is already queued. ;) See https://github.com/rust-lang/rust/pull/72644.
Is RUSTFLAGS=-Dunaligned_references
expected to apply to dependencies, despite Cargoβs use of --cap-lints
to normally inhibit warnings there?
@SimonSapin I have no idea how RUSTFLAGS
interacts with --cap-lints
, but this lint is not doing anything special here.
@joshtriplett
We'd like to see a crater run showing the extent of this problem
Crater results are in: https://crater-reports.s3.amazonaws.com/pr-72644/index.html
However, also note @retep998's comment due to crater being Linux-only.
In particular, there are quite a few sys crates that have all their code on one line... is that bindgen (or so) generating these invalid references?
Yes, rust-bindgen and anything else that uses https://crates.io/crates/quote to generate source code will have that code all in one line. rust-bindgen then attempts to run rustfmt
on that code by default, maybe thatβs not available in the Crater environment?
As requested by @RalfJung, here you can find the list of all functions that trigger βborrow of packed fieldβ lint. Our dataset is based on a snapshot of crates.io made on 2020-01-14.
Thanks @vakaras :-)
When I asked if it is possible to gather that data, I didn't quite expect you'd just go and get it. ;)
Thanks @vakaras :-)
When I asked if it is possible to gather that data, I didn't quite expect you'd just go and get it. ;)
I would say that we just got lucky this time that @fpoli decided to extract the reasons why the Rust compiler thinks that a specific function needs to have an unsafe block, which also includes this reason.
https://github.com/rust-lang/rust/pull/75534 should help make progress on getting crates to update their code once the necessary primitives have been stabilized.
Most helpful comment
I discovered that most
#[derive(...)]
impls are unsafe for packed structs due to taking references to unaligned fields (see https://github.com/rust-lang/rust/pull/39682).@bluss's rustc code sample above also has an unsafe derived
Clone
impl which won't compile once we fix this issue:EDIT: I sent a PR to fix this code in the compiler and also opened an issue about the general problem.