Rust: null trait object raw pointers are UB

Created on 24 Aug 2019 · 25Comments · Source: rust-lang/rust

Opening after some discussion in https://github.com/rust-lang/miri/issues/918#issuecomment-524554552

This playground snippet crashes on an illegal instruction on release: https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=7c69493026add62256996d204e1278c0
Are vtable ptrs in trait object ptrs NonNull? Is that intended? This (2 year old) comment would suggest so https://github.com/rust-lang/rfcs/issues/433#issuecomment-345497470

CC @RalfJung

T-lang

Source

nico-abram

Most helpful comment

If we decide this is not a bug, we'd have to adjust https://doc.rust-lang.org/nightly/nomicon/what-unsafe-does.html (and also the reference). It would be a very special case to require anything for raw pointers that are not dereferenced.

RalfJung on 24 Aug 2019

👍2

All 25 comments

trait T { }

fn main() {
    a(unsafe { std::mem::zeroed() });
}

fn a(_: *mut dyn T) {}

; ...
    call void @_ZN10playground1a17hbc18b5e956a8e54dE({}* %1, [3 x i64]* noalias readonly align 8 dereferenceable(24) %2), !dbg !145
; ...

The vtable is marked as dereferencable(24).

bjorn3 on 24 Aug 2019

Thanks for reporting!

In https://github.com/rust-lang/unsafe-code-guidelines/issues/166 we have been discussing, rather theoretically, the validity requirements for wide raw pointers. We were not actually aware that there already are validity requirements in place for them. That seems very surprising to me. Was this ever discussed in depth, or did it "just happen"?

Cc @rust-lang/wg-unsafe-code-guidelines @eddyb

RalfJung on 24 Aug 2019

@bjorn3 on top of that, *const dyn Send has size 16 -- so, we do enum niche optimizations.

RalfJung on 24 Aug 2019

👍2

The assumption is that the vtable is a valid reference, so that all raw pointers have valid pointee types.

*mut dyn Trait corresponds to (pseudo-notation) dyn T: Trait.*mut T or, further expanded, exists T.(*mut T, &<T as Trait>::vtable).

IMO, if the information associated with that T is invalid, or doesn't require a T to exist in the first place, that complicates the typesystem (or rather, the supposed typesystem that may have never been fully formally modeled - I've been meaning to do a writeup on dyn for a while now).

In custom DST terms, we could treat *mut T as (*mut T::Data, MaybeUninit<T::Meta>) (whereas &T would always have an initialized T::Meta), so overall maybe the cost of supporting this difference between raw pointers and references might not be that large.

There's also something to be said about having a null vtable, that Go people have some experience with (but I don't, so I'm not the best person to comment on it).

eddyb on 24 Aug 2019

The assumption is that the vtable is a valid reference, so that all raw pointers have valid pointee types.

The curious part here is that no one involved in the UCG realized that this is the current status.

IMO, if the information associated with that T is invalid, or doesn't require a T to exist in the first place, that complicates the typesystem

Where is there a complication here? I mean not in terms of the implementation, that shouldn't be a primary concern in terms of language design, but in terms of documentation and teaching.

Requiring vtables of raw pointers to be valid complicates the language as well, it would be an exception from the principle to not require things off of raw pointers.

In custom DST terms, we could treat mut T as (mut T::Data, MaybeUninit)

I was more thinking of *mut dyn Trait as (usize, usize). But your proposal scales better to more generalized metadata, indeed. I like it.

RalfJung on 24 Aug 2019

I don't even think I'm the first to come up with that, but I can't remember who did.

In the end, I guess it's a tradeoff between:

safe size_of_val for *mut T, even for T: !Sized
ptr::null::<T>(), even for T: !Sized
- I think this was discussed before, since I remember Go being brought up

(these are not the only outcomes, but they should be somewhat representative)

eddyb on 24 Aug 2019

If one reads the custom DST proposal, there's a reason that we expect raw pointers to DSTs to have valid metadata. The rawness of the pointer is unrelated to the rawness of the metadata, imo -- even if you have *mut [T], you expect the length to be a usize.

strega-nil on 25 Aug 2019

safe size_of_val for *mut T, even for T: !Sized

There's also the ability for methods taking self: *const Self to be object safe. For now, such methods can be written and are treated as object safe, but the functionality is guarded by the arbitrary_self_types feature. The tracking issue for that feature has some discussion of whether raw pointer methods should be object safe.

(It claims that "even today you could UB in safe code with mem::size_of_val(foo)", where foo: *const dyn Foo, but I don't think that's true? size_of_val doesn't currently work on raw pointers.)

comex on 25 Aug 2019

It seems like this is at least somewhat expected, because if all pointers could be zeroed then we wouldn't have had so many conversations about how to make ptr::null() work with extern type, and would have just changed its implementation to unsafe { mem::zeroed() }.

Also, do any of the choices here have any impacts to the optimizations to never reload vtables when multiple calls get made?

scottmcm on 25 Aug 2019

FWIW, ptr::null being restricted to Sized types is not just because of trait object vtables, there's also concerns about interaction with custom DSTs whose metadata may be a fair bit more complicated than just an usize or a pointer.

hanna-kruppe on 25 Aug 2019

👍1

safe size_of_val for *mut T, even for T: !Sized

That won't hold up with custom DST anyway, e.g. if T is a "thin" trait object.
So, if we want to be future-compatible with custom DST, I think requiring the metadata to be valid for raw pointers buys us pretty much nothing?

There's also the ability for methods taking self: *const Self to be object safe.

Isn't the issue there just that calling methods with a "raw receiver type" will be unsafe?

RalfJung on 25 Aug 2019

there's also concerns about interaction with custom DSTs whose metadata may be a fair bit more complicated than just an usize or a pointer.

But if we go for MaybeUninit<T::Meta>, then that's uniform across all types.

That won't hold up with custom DST anyway, e.g. if T is a "thin" trait object.

Yeah it would be nice to not tie anything to the T::Meta but always require a reference, since you wouldn't need several traits, just something like DynSized. I wonder what we lose in that case.

Isn't the issue there just that calling methods with a "raw receiver type" will be unsafe?

So "it's UB to call with invalid Self::Meta"? Sounds reasonable.

Overall I feel like we should look into this a bit more, but I can see how the "low-level compromise" lets us focus on &T and not worry about raw pointers (e.g. regarding custom DSTs).

It's almost like raw pointers are union { &UnsafeCell<T>, usize }, huh.

eddyb on 25 Aug 2019

Turns out that for raw slices, safe code can cause them to have invalid metadata:

123456789 as *const [(); usize::max_value()] as *const [()] as *const [u64]

Ouch.

RalfJung on 26 Aug 2019

@RalfJung Wow, so we couldn't actually have <[T]>::Meta = (T::Meta, usize), it would have to be something opaque that depends on the size of T (which in the future might get very complicated with T: !Sized slice elements).

eddyb on 26 Aug 2019

Looks like probably it should be just usize, and the max-size comes in via the requirement that references (including slices) must point to memory that is allocated in a single allocation -- which for LLVM means it cannot be bigger than isize::MAX.

So, I was just interpreting that length requirement the wrong way so far. I'll adjust the various PRs (Reference, Nomicon, Miri) accordingly.

RalfJung on 26 Aug 2019

👍1

@RalfJung The issue is not invalid metadata -- invalid metadata is fine. The big thing that you want is _initialized_ metadata.

*const UnsizedType should contain <UnsizedType>::Meta no matter what, but there's no guarantee on the validity of that metadata for the pointed to object.

strega-nil on 26 Aug 2019

"initialized" as a term turned out to be pretty unsuited. I mean "valid" specifically as in "validity invariant".

RalfJung on 26 Aug 2019

Sure, whatever. It needs to be a valid value of type <T>::Meta

strega-nil on 26 Aug 2019

@ubsan You can circumvent that by defining raw pointers as having MaybeUninit<T::Meta> instead of T::Meta directly.

eddyb on 29 Aug 2019

@eddyb you can, but I don't think there's a reasonable argument for it. Creating and destructuring raw pointers safely seems really nice to me.

strega-nil on 29 Aug 2019

"really nice" is IMO not a sufficient argument for introducing this kind of UB. We should have some optimization that's worth it, ir some amount of spec simplification that's worth it.

The "default" answer should be "let's avoid UB, to make unsafe code author's lives easier".

RalfJung on 29 Aug 2019

@RalfJung how does one introduce UB in this case? Are people generally returning mem::zeroed::<*mut Trait>()?

I would argue that this is removing UB that people get when calling methods through a *mut Trait

strega-nil on 29 Aug 2019

how does one introduce UB in this case? Are people generally returning mem::zeroed::<*mut Trait>()?

That would be an example of an operation that becomes UB under your semantics. Doesn't seem too strange to me to zero-initialize a struct with a wide raw pointer field.

I would argue that this is removing UB that people get when calling methods through a *mut Trait

There is literally no program that is UB under my semantics but not under yours.

This is not about the safety invariant, it is about the validity invariant.

RalfJung on 29 Aug 2019

@RalfJung _shrug_

I don't have a horse in this race. I just think they're nicer semantics. If you think it's more important to allow mem::zeroed than access to the metadata being safe, that's your prerogative.

strega-nil on 30 Aug 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Callback parameter names are missing from rustdoc

dtolnay · 3Comments

nightly ICEs on invalid code with break outside of loop

zhendongsu · 3Comments

core::marker::Send and raw pointers

drewcrawford · 3Comments

Inherited trait bounds are not completely checked

behnam · 3Comments

Use #[repr(C)] HList's to infer type-erased fmt fn pointers in format_args!'s static data.

eddyb · 3Comments