Rust: null trait object raw pointers are UB

Created on 24 Aug 2019  路  25Comments  路  Source: rust-lang/rust

Opening after some discussion in https://github.com/rust-lang/miri/issues/918#issuecomment-524554552

This playground snippet crashes on an illegal instruction on release: https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=7c69493026add62256996d204e1278c0
Are vtable ptrs in trait object ptrs NonNull? Is that intended? This (2 year old) comment would suggest so https://github.com/rust-lang/rfcs/issues/433#issuecomment-345497470

CC @RalfJung

T-lang

Most helpful comment

If we decide this is not a bug, we'd have to adjust https://doc.rust-lang.org/nightly/nomicon/what-unsafe-does.html (and also the reference). It would be a very special case to require anything for raw pointers that are not dereferenced.

All 25 comments

trait T { }

fn main() {
    a(unsafe { std::mem::zeroed() });
}

fn a(_: *mut dyn T) {}
; ...
    call void @_ZN10playground1a17hbc18b5e956a8e54dE({}* %1, [3 x i64]* noalias readonly align 8 dereferenceable(24) %2), !dbg !145
; ...

The vtable is marked as dereferencable(24).

Thanks for reporting!

In https://github.com/rust-lang/unsafe-code-guidelines/issues/166 we have been discussing, rather theoretically, the validity requirements for wide raw pointers. We were not actually aware that there already are validity requirements in place for them. That seems very surprising to me. Was this ever discussed in depth, or did it "just happen"?

Cc @rust-lang/wg-unsafe-code-guidelines @eddyb

@bjorn3 on top of that, *const dyn Send has size 16 -- so, we do enum niche optimizations.

If we decide this is not a bug, we'd have to adjust https://doc.rust-lang.org/nightly/nomicon/what-unsafe-does.html (and also the reference). It would be a very special case to require anything for raw pointers that are not dereferenced.

The assumption is that the vtable is a valid reference, so that all raw pointers have valid pointee types.

*mut dyn Trait corresponds to (pseudo-notation) dyn T: Trait.*mut T or, further expanded, exists T.(*mut T, &<T as Trait>::vtable).

IMO, if the information associated with that T is invalid, or doesn't require a T to exist in the first place, that complicates the typesystem (or rather, the supposed typesystem that may have never been fully formally modeled - I've been meaning to do a writeup on dyn for a while now).

In custom DST terms, we could treat *mut T as (*mut T::Data, MaybeUninit<T::Meta>) (whereas &T would always have an initialized T::Meta), so overall maybe the cost of supporting this difference between raw pointers and references might not be that large.

There's also something to be said about having a null vtable, that Go people have some experience with (but I don't, so I'm not the best person to comment on it).

The assumption is that the vtable is a valid reference, so that all raw pointers have valid pointee types.

The curious part here is that no one involved in the UCG realized that this is the current status.

IMO, if the information associated with that T is invalid, or doesn't require a T to exist in the first place, that complicates the typesystem

Where is there a complication here? I mean not in terms of the implementation, that shouldn't be a primary concern in terms of language design, but in terms of documentation and teaching.

Requiring vtables of raw pointers to be valid complicates the language as well, it would be an exception from the principle to not require things off of raw pointers.

In custom DST terms, we could treat mut T as (mut T::Data, MaybeUninit)

I was more thinking of *mut dyn Trait as (usize, usize). But your proposal scales better to more generalized metadata, indeed. I like it.

I don't even think I'm the first to come up with that, but I can't remember who did.

In the end, I guess it's a tradeoff between:

  • safe size_of_val for *mut T, even for T: !Sized
  • ptr::null::<T>(), even for T: !Sized

    • I think this was discussed before, since I remember Go being brought up

(these are not the only outcomes, but they should be somewhat representative)

If one reads the custom DST proposal, there's a reason that we expect raw pointers to DSTs to have valid metadata. The rawness of the pointer is unrelated to the rawness of the metadata, imo -- even if you have *mut [T], you expect the length to be a usize.

  • safe size_of_val for *mut T, even for T: !Sized

There's also the ability for methods taking self: *const Self to be object safe. For now, such methods can be written and are treated as object safe, but the functionality is guarded by the arbitrary_self_types feature. The tracking issue for that feature has some discussion of whether raw pointer methods should be object safe.

(It claims that "even today you could UB in safe code with mem::size_of_val(foo)", where foo: *const dyn Foo, but I don't think that's true? size_of_val doesn't currently work on raw pointers.)

It seems like this is at least somewhat expected, because if all pointers could be zeroed then we wouldn't have had so many conversations about how to make ptr::null() work with extern type, and would have just changed its implementation to unsafe { mem::zeroed() }.

Also, do any of the choices here have any impacts to the optimizations to never reload vtables when multiple calls get made?

FWIW, ptr::null being restricted to Sized types is not just because of trait object vtables, there's also concerns about interaction with custom DSTs whose metadata may be a fair bit more complicated than just an usize or a pointer.

safe size_of_val for *mut T, even for T: !Sized

That won't hold up with custom DST anyway, e.g. if T is a "thin" trait object.
So, if we want to be future-compatible with custom DST, I think requiring the metadata to be valid for raw pointers buys us pretty much nothing?

There's also the ability for methods taking self: *const Self to be object safe.

Isn't the issue there just that calling methods with a "raw receiver type" will be unsafe?

there's also concerns about interaction with custom DSTs whose metadata may be a fair bit more complicated than just an usize or a pointer.

But if we go for MaybeUninit<T::Meta>, then that's uniform across all types.

That won't hold up with custom DST anyway, e.g. if T is a "thin" trait object.

Yeah it would be nice to not tie anything to the T::Meta but always require a reference, since you wouldn't need several traits, just something like DynSized. I wonder what we lose in that case.

Isn't the issue there just that calling methods with a "raw receiver type" will be unsafe?

So "it's UB to call with invalid Self::Meta"? Sounds reasonable.


Overall I feel like we should look into this a bit more, but I can see how the "low-level compromise" lets us focus on &T and not worry about raw pointers (e.g. regarding custom DSTs).

It's almost like raw pointers are union { &UnsafeCell<T>, usize }, huh.

Turns out that for raw slices, safe code can cause them to have invalid metadata:

123456789 as *const [(); usize::max_value()] as *const [()] as *const [u64]

Ouch.

@RalfJung Wow, so we couldn't actually have <[T]>::Meta = (T::Meta, usize), it would have to be something opaque that depends on the size of T (which in the future might get very complicated with T: !Sized slice elements).

Looks like probably it should be just usize, and the max-size comes in via the requirement that references (including slices) must point to memory that is allocated in a single allocation -- which for LLVM means it cannot be bigger than isize::MAX.

So, I was just interpreting that length requirement the wrong way so far. I'll adjust the various PRs (Reference, Nomicon, Miri) accordingly.

@RalfJung The issue is not invalid metadata -- invalid metadata is fine. The big thing that you want is _initialized_ metadata.

*const UnsizedType should contain <UnsizedType>::Meta no matter what, but there's no guarantee on the validity of that metadata for the pointed to object.

"initialized" as a term turned out to be pretty unsuited. I mean "valid" specifically as in "validity invariant".

Sure, whatever. It needs to be a valid value of type <T>::Meta

@ubsan You can circumvent that by defining raw pointers as having MaybeUninit<T::Meta> instead of T::Meta directly.

@eddyb you can, but I don't think there's a reasonable argument for it. Creating and destructuring raw pointers safely seems really nice to me.

"really nice" is IMO not a sufficient argument for introducing this kind of UB. We should have some optimization that's worth it, ir some amount of spec simplification that's worth it.

The "default" answer should be "let's avoid UB, to make unsafe code author's lives easier".

@RalfJung how does one introduce UB in this case? Are people generally returning mem::zeroed::<*mut Trait>()?

I would argue that this is removing UB that people get when calling methods through a *mut Trait

how does one introduce UB in this case? Are people generally returning mem::zeroed::<*mut Trait>()?

That would be an example of an operation that becomes UB under your semantics. Doesn't seem too strange to me to zero-initialize a struct with a wide raw pointer field.

I would argue that this is removing UB that people get when calling methods through a *mut Trait

There is literally no program that is UB under my semantics but not under yours.

This is not about the safety invariant, it is about the validity invariant.

@RalfJung _shrug_

I don't have a horse in this race. I just think they're nicer semantics. If you think it's more important to allow mem::zeroed than access to the metadata being safe, that's your prerogative.

Was this page helpful?
0 / 5 - 0 ratings