Rust: Tracking issue for ptr::offset_from (feature: ptr_offset_from)

Created on 5 Apr 2017 · 43Comments · Source: rust-lang/rust

PR: #40943

Adds an offset_to method to calculate the distance between two raw pointers.

List o' stuff:

[x] Rename to offset_from and make unsafe (with wrapping_offset_from for the safe one) https://github.com/rust-lang/rust/issues/41079#issuecomment-316905233 -- done in https://github.com/rust-lang/rust/pull/49297
- [ ] Should offset_from abort instead of panic on ZSTs?
[x] Given that the difference between arbitrary pointers is meaningless and subject to change at the whims of LLVM anyway, do we even want the safe version? https://github.com/rust-lang/rust/issues/41079#issuecomment-374426786 Experience reports from the field appreciated.
[ ] Should there be more methods like this with different restrictions (to allow nuw/nsw, perhaps) or that return usize (like how isize-taking offset is more conveniently done with usize-taking add these days)
[x] Ensure that our safety requirements align with the proposed llvm.psub https://github.com/rust-lang/rust/issues/57818

A-raw-pointers B-unstable C-tracking-issue Libs-Tracked T-libs

Source

Amanieu

🎉2 👍1

Most helpful comment

I'd like to see this stabilized. Looks like 18 months ago most concerns were already resolved, except maybe for the isize/usize issue that @scottmcm mentioned. Is that issue still present?

I am surprised there would be a problem here, since in LLVM an allocation cannot be larger than isize::MAX, so even if you know the relative order of the pointers, isize can still always represent their difference as as usize will be lossless.

RalfJung on 21 Jun 2020

🚀4

All 43 comments

I wonder whether this should have the unsafe + wrapping_ split that offset does.

The unsafe one could, for example, use sdiv exact by requiring that the pointers are to the beginning or past-the-end of real objects inside the same allocated object.

scottmcm on 21 Jul 2017

👍1

It seems suspicious at best to calculate the offset between two unrelated pointers... I agree with @scottmcm

strega-nil on 22 Jul 2017

@pornel made the point that this should be offset_from, with the order of the subtraction reversed, which I agree with. (Or, perhaps, both methods should exist.)

joshtriplett on 17 Oct 2017

Also, I find it unfortunate that this forces handling the ZST case even if you know the pointer type and know that it doesn't point to a zero-sized type.

Could we use traits to provide a method that only exists for non-zero-sized types, and then returns isize?

joshtriplett on 17 Oct 2017

👍3

I like @scottmcm's suggestion to have an unsafe ptr::offset_to that uses sdiv exact. This is the situation sdiv exact was designed for, and it often reduces a 5-instruction sequence down to 2. And, if the unsafe version could be documented to have UB if the pointers don't point into the same array (or one-past-the-end), it would make me less worried about breaking pervasive assumptions that LLVM makes about pointer differences [0]. And it would be sufficient for Vec and related use cases.

I suggest omitting the wrapping_ form though. Even if the implementation is entirely "safe" code, I agree with @ubsan that it's suspicious at best and doesn't seem like the kind of thing that should be encouraged via a convenient standard library function.

[0] For example, GetUnderlyingObject, which is used throughout the optimizer, assumes that getelementptr X, Y aliases a subset of X. This doesn't work if Y can be the difference from X to an independent object.

sunfishcode on 10 Nov 2017

I think that returning None for the zero-sized-type case is unnecessary. It is a very unlikely event, and a majority of the time the user knows they will never get the None case, so unwrapping is just clutter.

iitalics on 4 Jan 2018

👍1

Renaming to offset_from and swapping the subtraction order seems easy enough.

I don’t really understand the unsafe / sdiv exact aspect, though. I’ll leave that decision to someone else.

@Amanieu since you added this, what do you think?

SimonSapin on 17 Mar 2018

I have no objection to renaming it to offset_from.

The unsafe / sdiv exact issue basically boils around one question: what happens if the distance between the two given pointers is not a multiple of sizeof_of::<T>(). There's only 2 ways we can realistically handle this:

Make this situation UB (in which case offset_from becomes an unsafe function).
Round the result of the division towards zero.

The current implementation rounds towards zero, however I am considering going with the UB route instead. Note that this situation can only happen if one of the pointers is misaligned, so we could just require that both pointers be properly aligned for their type.

Amanieu on 17 Mar 2018

this situation can only happen if one of the pointers is misaligned

Can’t it also happen with aligned pointers, if size_of::<T>() > align_of::<T>()?

SimonSapin on 19 Mar 2018

That's a good point, I'm not sure why I thought of alignment there. You are correct.

Amanieu on 19 Mar 2018

@SimonSapin That's a good point. C manages to avoid that issue, because in C it's undefined behavior if the pointers aren't both pointing to elements of the same array (6.5.6p9). I don't know whether Rust's offset_from wants a similar rule, though you can get into dangerous territory with pointer aliasing if you access one object using a pointer to another plus the distance between the two.

sunfishcode on 19 Mar 2018

@sunfishcode it seems... incredibly suspect to allow offset_from between two pointers from different objects.

strega-nil on 20 Mar 2018

I started on a PR for this at https://github.com/rust-lang/rust/pull/49297; feedback and suggestions appreciated.

scottmcm on 23 Mar 2018

Amanieu and sunfishcode commented that doing fn(*const _, *const _) -> isize needs a subtraction that doesn't fit with any of the "does not overflow" flags available for LLVM sub (like the GEP rules are specific to it, and not available on a normal add instruction).

As such, should this API change form to something where it can be -> usize, and thus use nuw by requiring an order between the arguments? Such a thing probably shouldn't use the word offset; my strawman proposal is something involving distance (since I'm primed by C++).

scottmcm on 24 Mar 2018

Keep in mind that std::distance and pointer subtraction in C++ both return a signed value. This is one of the reasons why I proposed making offset_to return a signed value.

LLVM IR generated by Clang is just a sub (without nsw/nuw) and a sdiv exact, so we currently don't lose anything compared to C.

I considered the name distance, however I'm not a big fan of it because the name somewhat implies that the order of parameters doesn't matter and that an absolute value is returned.

Amanieu on 24 Mar 2018

❤1

In Nightly this is the tracking issue for three different inherent methods of both *const T and *mut T.

pub fn offset_to(self, other: *const T) -> Option<isize> where T: Sized {…}
pub unsafe fn offset_from(self, origin: *const T) -> isize where T: Sized {…}
pub fn wrapping_offset_from(self, origin: *const T) -> isize where T: Sized {…}

(Note: the *mut T methods also take a *const T parameter. I don’t know if this is intentional or if the parameter’s type should be changed to Self.)

The libs team discussed this and it wasn’t clear from this tracking issue or the implementation PR what is the motivation for this feature.

@Amanieu Could you comment on what situations this would be used in, why it should be in the standard library, and why three different methods are needed?

SimonSapin on 28 Mar 2018

My understanding is that offset_from is the "new" form which is intended to replace the old offset_to.

These methods are useful when converting between slices and raw pointers, which can happen when building data structures or with FFI. offset_to is currently used in two places in the standard library:

Amanieu on 28 Mar 2018

If this is a replacement, should offset_to be removed? It’s also unstable.

SimonSapin on 31 Mar 2018

I'll make a PR for that and to move vec&slice to offset_from. (Unless it should be deprecated for a bit first? I don't remember what the rules are for nightly things...)

scottmcm on 31 Mar 2018

PR up at https://github.com/rust-lang/rust/issues/41079

That did reinforce my feeling that isize, while good for consistency with offset, might not be the best signature, as the only places that use it in core & alloc know which pointer is higher and want usize.

scottmcm on 1 Apr 2018

I tried using the offset_from() method on a pointer today. I like the name and API; it seemed rather intuitive. The only issue I have is that the implementations of offset_from() and wrapping_offset_from() use assert!() rather than debug_assert!(). The result is that offset_from() is a bit slower than it could be when debug = false.

Is there a reason to prefer assert!()?

ghost on 20 Aug 2018

@dtrebbien The assert is on the size of the type, which is a compile-time constant, so I would expect it to always get optimized out. Are you seeing otherwise?

scottmcm on 20 Aug 2018

It looks like I am seeing a slight difference, but I am not entirely sure.

I am currently working on a piece of code where offset_from() would be used rather frequently. When I run my benchmark using:

let all_code_units_ptr = all_code_units.as_ptr();

//...

let code_unit_index = unsafe { iter.as_slice().as_ptr().offset_from(all_code_units_ptr) } as usize - 1;

//...

let end_code_unit_index = unsafe { slice.as_ptr().offset_from(all_code_units_ptr) } as usize - 1;

.. then my benchmark numbers are:

bench:   6,136,818 ns/iter (+/- 344,438) = 751 MB/s
bench:   6,156,946 ns/iter (+/- 349,123) = 748 MB/s
bench:   6,140,547 ns/iter (+/- 298,704) = 750 MB/s
bench:   6,135,250 ns/iter (+/- 248,410) = 751 MB/s
bench:   6,141,995 ns/iter (+/- 228,005) = 750 MB/s

When I use this "inlined" version instead:

let all_code_units_ptr_as_usize = all_code_units.as_ptr() as usize;

//...

let code_unit_index = unsafe { intrinsics::exact_div(iter.as_slice().as_ptr() as usize - all_code_units_ptr_as_usize, mem::size_of::<CodeUnitT>()) } - 1;

//...

let end_code_unit_index = unsafe { intrinsics::exact_div(slice.as_ptr() as usize - all_code_units_ptr_as_usize, mem::size_of::<CodeUnitT>()) } - 1;

.. then my benchmark numbers are:

bench:   6,120,998 ns/iter (+/- 205,285) = 753 MB/s
bench:   6,108,348 ns/iter (+/- 198,379) = 754 MB/s
bench:   6,109,708 ns/iter (+/- 223,981) = 754 MB/s
bench:   6,113,860 ns/iter (+/- 230,122) = 754 MB/s
bench:   6,123,206 ns/iter (+/- 221,520) = 752 MB/s

Here, CodeUnitT is u8.

ghost on 20 Aug 2018

Now that I'm not on my phone, I tried this out explicitly in playground.

pub unsafe fn demo(x: *const u8, y: *const u8) -> isize {
    x.offset_from(y)
}

; playground::demo
; Function Attrs: norecurse nounwind readnone uwtable
define i64 @_ZN10playground4demo17h3547ff97a5dad82dE(i8* %x, i8* %y) unnamed_addr #0 {
start:
  %0 = ptrtoint i8* %x to i64
  %1 = ptrtoint i8* %y to i64
  %2 = sub i64 %0, %1
  ret i64 %2
}

Which, as expected, has no sign of the assert. So while there may be some other issue here, it's more complex than just the debug_assert! vs assert!.

Do you have a bench you can share? Are you building with incremental? With LTO? With multiple codegen units? Do you see something different with a locally-built nightly without the assert!?

scottmcm on 21 Aug 2018

Thanks for looking into this. I can see that you are right; there isn't an effect of debug_assert!() vs. assert!(). There must be something else at play.

My benchmark is not public at the moment, but I am working to try to clean up everything so that I can eventually publish it.

ghost on 21 Aug 2018

Here are my thoughts on the remaining issues:

Regarding usize/isize, I think that we can defer this concern: regardless of whether we add a usize API later, having a function returning isize will always be useful, and match the behavior of pointer subtraction in C.
Regarding wrapping_offset_from, I personally feel that there isn't much point to this function. The only thing that I can think of could be to calculate the position of a &T in a slice without using any unsafe code, but then again this is sufficiently niche that I don't feel it is worth adding a function just for that.

Amanieu on 25 Oct 2018

👍1

On usize vs isize, slices can't use offset_from in part because they know that usize is ok:

https://github.com/rust-lang/rust/blob/db592f4627251cfd8571a05cf8e06a56be9470c2/src/libcore/slice/mod.rs#L2998-L3008

I can't ever see a reason I wouldn't know the relative orders of the pointers using this, so leaving perf on the table in order to return isize makes me sad. Any good ideas for a name? sub is taken...

scottmcm on 8 Jul 2019

I was just looking at C++20's midpoint function (the (2) variant for pointers) and I went looking to see if Rust had something similar on raw pointers, which brought me here. I think ptr::midpoint(a, b) would be a neat little addition to Rust to make unsafe algorithm implementations a little more readable, without much libstd maintenance burden.

ptr::midpoint(a, b) would be equivalent to (and exactly as unsafe as) a.offset(b.offset_from(a) / 2) if I'm not mistaken. Just thought I'd leave this here since it's a bit minor on its own but it has the exact same unsafety nuances as offset_from as far as I can tell.

solson on 17 Sep 2019

Before I can continue with https://github.com/rust-lang/rust/pull/63810 (making offset_from const fn), we need to figure out what the expected semantics are. From the docs I would presume that the doctests actually perform UB:

#![feature(ptr_offset_from)]

fn main() {
    let mut a = [0; 5];
    let ptr1: *mut i32 = &mut a[1];
    let ptr2: *mut i32 = &mut a[3];
    unsafe {
        assert_eq!(ptr2.offset_from(ptr1), 2);
        assert_eq!(ptr1.offset_from(ptr2), -2);
        assert_eq!(ptr1.offset(2), ptr2);
        assert_eq!(ptr2.offset(-2), ptr1);
    }
}

The second assert looks to me like it violates

The distance being in bounds cannot rely on "wrapping around" the address space.

because ptr1 is smaller than ptr2, we need to wrap around the address space in order to get the offset to ptr1

The third and fourth assert look to me like they violate

Both the starting and other pointer must be either in bounds or one
byte past the end of the same allocated object. Note that in Rust,
every (stack-allocated) variable is considered a separate allocated object.

because integers cannot be in the same address space as pointers

This also makes me wonder whether isize makes any sense, if only positive values are legal anyway.

oli-obk on 17 Sep 2019

The second assert looks to me like it violates

The distance being in bounds cannot rely on "wrapping around" the address space.

because ptr1 is smaller than ptr2, we need to wrap around the address space in order to get the offset to ptr1

I'm not sure I agree with that interpretation. Rather, I think the requirements regarding "wrapping around" mean that you can't have an "object" split in memory (at the end of the address space and wrapping around to the beginning). That's not the case here; the array (and all objects in Rust) are not split. Perhaps the requirement can be clarified, though.

The third and fourth assert look to me like they violate

Both the starting and other pointer must be either in bounds or one
byte past the end of the same allocated object. Note that in Rust,
every (stack-allocated) variable is considered a separate allocated object.

because integers cannot be in the same address space as pointers

Perhaps the docs could be clarified further, but in this case I would expect the "allocated object" to be the array, not the individual array elements. Otherwise you could never use pointer arithmetic to access array elements, and I don't think the intention was to ever prohibit that. I see no problem with the third and fourth asserts here taking the interpretation of "allocated object" to be the array.

mjbshaw on 17 Sep 2019

Perhaps the docs could be clarified further, but in this case I would expect the "allocated object" to be the array, not the individual array elements. Otherwise you could never use pointer arithmetic to access array elements, and I don't think the intention was to ever prohibit that. I see no problem with the third and fourth asserts here taking the interpretation of "allocated object" to be the array.

Oh I fully agree, my problem is that 2 is not a valid address inside the allocated object

oli-obk on 17 Sep 2019

I don't think the docs are talking about the offset value (2 and -2 in this case). They're talking about the offset function's input pointer and returned pointer.

mjbshaw on 17 Sep 2019

Both the starting and other pointer must be either in bounds or one byte past the end of the same allocated object.

I assume starting to be the argument ("from" which we want the offset) and other pointer to be the self (the address we want to know the offset relative to the from argument). I'm not sure how else to read this. The docs need significant improvements if they are interpreted differently.

oli-obk on 17 Sep 2019

The offset documentation uses slightly different wording (you quoted the offset_from documentation). I think offset's documentation is reasonably clear:

Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object.

"starting" is the self pointer and "resulting pointer" is the return value.

mjbshaw on 17 Sep 2019

offset and offset_from are very different operations, so let's disect this

c = a.offset(b)

means that a and c must be part of the same object. So you can use this to access array elements, no problem. This also means that

(&x[0] as *const i8).offset(&x[1] as *const i8)

is UB, because the resulting 1 is clearly not part of the allocation x

Now if we apply this logic to offset_from, using the same variables from above to invert the operation

b = c.offset_from(a)

then that means that

(1 as *const i8).offset_from(&x[0] as *const i8)

is UB, even if it results in the same value as &x[1] as *const i8

The legal way to use it is

(&x[1] as *const i8).offset_from(&x[0] as *const i8)

resulting in 1, but this time without UB

So for offset_from c and a must be in the same allocation, just like in offset.

oli-obk on 18 Sep 2019

c = a.offset(b)
means that a and c must be part of the same object. So you can use this to access array elements, no problem.

Agreed.

This also means that
(&x[0] as *const i8).offset(&x[1] as *const i8)
is UB,

That's a compiler error. &x[1] as *const i8 is a pointer type, which does not coerce to isize. The b expression must be an isize.

because the resulting 1 is clearly not part of the allocation x

I'm not sure what you mean by the "resulting 1".

(1 as *const i8).offset_from(&x[0] as *const i8)

is UB,

Agreed, but that expression is a very different. Where is 1 as *const i8 coming from?

even if it results in the same value as &x[1] as *const i8

No, it results in a very different value (even ignoring UB and the fact that offset_from returns an isize and not a *const _). The offset_from function is a subtraction operation, not an addition operation. Using pseudocode, &x[1] is 1 + &x[0], not 1 - &x[0].

The legal way to use it is
(&x[1] as *const i8).offset_from(&x[0] as *const i8)
resulting in 1, but this time without UB

So for offset_from c and a must be in the same allocation, just like in offset.

Agreed.

mjbshaw on 18 Sep 2019

:man_facepalming: I realized now that the tests use offset and not offset_from. Sorry that I did not listen to you and instead went off on a rant.

oli-obk on 18 Sep 2019

I'd like to see this stabilized. Looks like 18 months ago most concerns were already resolved, except maybe for the isize/usize issue that @scottmcm mentioned. Is that issue still present?

RalfJung on 21 Jun 2020

🚀4

I would also like to see this resolved.

Since the only valid application appears to be comparing two positions within a slice, a safe API is possible, if that helps.

impl<T> [T] {
    fn ptr_offset(&self, p: *const T) -> Option<usize>;
    /// Returns `None` if either pointer does not point to a valid position within this slice
    fn ptr_sub(&self, p: *const T, q: *const T) -> Option<isize>;
}

Of course, the latter is certainly not optimal. Personally I care more about making this possible (in stable Rust) than optimal.

dhardy on 11 Jul 2020

A safe API is very tricky at best: offset_from has the same problem as methods that "mend" two slices together when their one-past-the-end and first pointer are equal. Just because slice_base + len <= p does not mean it is safe to call p.offset_from(slice_base).

However, (p as usize as *const _).offset_from(slice_base) should work -- except that LLVM has some long-standing bugs in this area.

RalfJung on 11 Jul 2020

Stabilization PR is up: https://github.com/rust-lang/rust/pull/74238
I don't know what it takes to summon rfcbot, I hope someone can take over. :)

RalfJung on 11 Jul 2020

A safe API is very tricky at best: offset_from has the same problem as methods that "mend" two slices together when their one-past-the-end and first pointer are equal. Just because slice_base + len <= p does not mean it is safe to call p.offset_from(slice_base).

But it's always safe to call wrapping_offset_from – or, since that's deprecated, cast both pointers to usize and do the subtraction that way. It might spuriously report that a pointer is one-past-the-end of a slice when it's actually at the start of a different allocation, but that's not itself unsafe or UB, just surprising...

comex on 12 Jul 2020

But it's always safe to call wrapping_offset_from – or, since that's deprecated, cast both pointers to usize and do the subtraction that way. It might spuriously report that a pointer is one-past-the-end of a slice when it's actually at the start of a different allocation, but that's not itself unsafe or UB, just surprising...

True.

RalfJung on 12 Jul 2020

Was this page helpful?

0 / 5 - 0 ratings