reflect.SliceHeader
and reflect.StringHeader
are clumsy to use because their Data
fields have type uintptr
instead of unsafe.Pointer
.
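To make the clumsiness concrete, here is a minimal sketch of the status quo: building a []byte over existing memory by filling in a reflect.SliceHeader. The helper name bytesAt is made up for this illustration; the header pattern itself is the one the unsafe.Pointer rules allow.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// bytesAt (hypothetical name) returns a []byte of length and capacity n
// that aliases the memory starting at p.
func bytesAt(p *byte, n int) (res []byte) {
	// Reinterpret the result slice as its header.
	hdr := (*reflect.SliceHeader)(unsafe.Pointer(&res))
	// Data is a uintptr, so the pointer needs two conversions to get there.
	hdr.Data = uintptr(unsafe.Pointer(p))
	hdr.Len = n
	hdr.Cap = n
	return res
}

func main() {
	buf := [4]byte{1, 2, 3, 4}
	s := bytesAt(&buf[0], len(buf))
	fmt.Println(s, len(s), cap(s))
}
```

Nothing in the types stops the caller from storing an arbitrary integer in Data, which is the part the proposal wants to tighten.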
This proposal is to add types unsafe.Slice
and unsafe.String
as replacements. They would be declared just like their package reflect analogs, except with unsafe.Pointer
-typed Data
fields:
type Slice struct {
Data Pointer
Len int
Cap int
}
type String struct {
Data Pointer
Len int
}
Additionally, I suggest that for the purposes of type conversion, we treat string
and unsafe.String
as having the same underlying type, and likewise []T
and unsafe.Slice
. For example, these would be valid:
func makestring(p *byte, n int) string {
// Direct conversion of unsafe.String to string.
return string(unsafe.String{unsafe.Pointer(p), n})
}
func memslice(p *byte, n int) (res []byte) {
// Direct conversion of *[]byte to *unsafe.Slice, without using unsafe.Pointer.
s := (*unsafe.Slice)(&res)
s.Data = unsafe.Pointer(p)
s.Len = n
s.Cap = n
return
}
While the same results can be achieved using unsafe.Pointer
conversions, by using direct conversions the compiler can provide a little extra type safety.
If we do this, we should figure out a way to exempt these new types from the Go 1 compatibility guarantee, so that we can change the representation of strings and slices in the future. I'm not sure how best to do that.
@ianlancetaylor reflect.SliceHeader and reflect.StringHeader already try:
It cannot be used safely or portably and its representation may change in a later release.
but the compat doc itself gives a strong exemption for all of unsafe:
Packages that import unsafe may depend on internal properties of the Go implementation. We reserve the right to make changes to the implementation that may break such programs.
ISTM that unsafe.{Slice,String} would already be exempted sufficiently.
Go 2 seems like the time to think about this (and reflect.SliceHeader etc).
-rsc for @golang/proposal-review
This proposal seems a bit redundant with https://github.com/golang/go/issues/13656.
How much of the use-case is "create a string or slice aliasing C memory" vs. "manipulate existing strings and slices by tweaking header fields unsafely"?
I'd like to suggest renewing consideration of this proposal for Go 1.14. I think it will be useful for users trying to address issues flagged by -d=checkptr.
I'll also offer a counter-proposal that I think better addresses most end user needs in a more ergonomic manner:
package unsafe
func Slice(ptr *ArbitraryType, len, cap int) []ArbitraryType
func String(ptr *byte, len int) string
[Edit: As discussed below, I'm now in favor of combining Slice's len/cap parameter into a single parameter.]
This is a little less versatile than exposing the Header types, but I think it will minimize typing for most users, while also providing better type safety.
We could also do both this proposal and my original one, if we want to still offer the full flexibility of the Header types. In that case, I would suggest renaming the types to SliceHeader and StringHeader, and reserve the shorter Slice and String identifiers for the constructor functions.
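As a rough sketch of how the constructor-function form reads at a call site, the following uses the single-length variants, which is the shape package unsafe provides in Go 1.17 (Slice) and Go 1.20 (String). The three-argument len/cap form written above is not assumed here, and the wrapper names viewBytes and viewString are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// viewBytes (hypothetical) aliases n bytes starting at p as a slice.
func viewBytes(p *byte, n int) []byte { return unsafe.Slice(p, n) }

// viewString (hypothetical) aliases n bytes starting at p as a string.
// The caller must not modify the memory while the string is in use.
func viewString(p *byte, n int) string { return unsafe.String(p, n) }

func main() {
	buf := [5]byte{'h', 'e', 'l', 'l', 'o'}
	b := viewBytes(&buf[0], len(buf))
	s := viewString(&buf[0], len(buf))
	fmt.Println(len(b), cap(b), s)
}
```

The element type of the result comes from the pointer type, so no header fields or uintptr conversions appear in user code.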
I like that counter proposal API.
A few additional thoughts to add to my counter proposal:
We should decide what happens when len < 0
or cap < len
. I'm leaning towards panic, but maybe we should just leave it unspecified/undefined.
Edit: ptr == nil && len > 0
is another case to consider.
Edit 2: Also, len > MAXWIDTH / unsafe.Sizeof(*ptr)
.
The functions would be builtins; in particular, users can't write f := unsafe.String; f(...)
.
The cap
argument to unsafe.Slice
can be optional; if it's omitted, the len
argument is used. (Just like make([]T, n)
is shorthand for make([]T, n, n)
.)
Perhaps the int
parameters should actually follow the same goofy semantics that make([]T, n, m)
follows. (I.e., make([]T, uint64(10), int8(20))
is valid, even though uint64 and int8 aren't normally assignable to int.)
Since unsafe.String
would be a builtin, it could evaluate to an untyped string.
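The "goofy semantics" of make mentioned in the list above can be seen in a small program: the size arguments may be of any integer type, even though those types are not assignable to int. The helper name makeInts is made up for this sketch.

```go
package main

import "fmt"

// makeInts (hypothetical) passes a uint64 length and an int8 capacity
// straight to make, without converting either to int.
func makeInts(n uint64, m int8) []int {
	return make([]int, n, m)
}

func main() {
	s := makeInts(10, 20)
	fmt.Println(len(s), cap(s))

	// By contrast, an ordinary assignment needs an explicit conversion:
	// var k int = uint64(10) would not compile.
	var k int = int(uint64(10))
	fmt.Println(k)
}
```

If unsafe.Slice followed the same rule, a length held in a C.size_t could be passed without a conversion at the call site.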
That API is closer to what I had suggested in https://golang.org/issue/13656#issuecomment-303216308, and we've been using that variant within Google for a couple of years now without complaints.
If the type desired for the slice does not match the pointer that the user has (for example, if one is a cgo-generated type and the other is a native Go type), I'm assuming that the caller could do something like:
var s = unsafe.Slice((*someGoType)(unsafe.Pointer(cPtr)), len, cap)
to set the element type?
We should decide what happens when
len < 0
or cap < len
. I'm leaning towards panic, but maybe we should just leave it unspecified/undefined.
I would leave it unspecified, but panic
is a fine implementation of "unspecified".
Perhaps the
int
parameters should actually follow the same goofy semantics that make([]T, n, m)
follows.
That would certainly smooth out the call site in the (overwhelmingly common) case that len
and/or cap
is a C.size_t
.
If the type desired for the slice does not match the pointer that the user has (for example, if one is a cgo-generated type and the other is a native Go type), I'm assuming that the caller could do something like:
var s = unsafe.Slice((*someGoType)(unsafe.Pointer(cPtr)), len, cap)
to set the element type?
Yeah, that's my thought. If a user wants to convert *T
into []U
, then I think it's reasonable to require an explicit conversion there.
I would leave it unspecified, but panic is a fine implementation of "unspecified".
Ack, though my concern is if we panic by default, then users might come to rely on it panicking and not write their own checking.
It would be easy to put the panic behind -d=checkptr
though.
func Slice(ptr *ArbitraryType, len, cap int) []ArbitraryType
Can we instead do:
func Slice(ptr *ArbitraryType, len int[, cap int]) []ArbitraryType
... with an optional cap. Where omitting cap means cap == len?
@bradfitz Yeah, that's my additional thought #3 above. :)
if we panic by default, then users might come to rely on it panicking and not write their own checking.
Hmm, good point. We could make it a throw!
Or we could make it a panic
in ordinary code but a throw
under -race
or -d=checkptr
. (The important thing, I think, is to vary it just enough that it causes tests to fail in some reasonably-common configuration.)
(The important thing, I think, is to vary it just enough that it causes tests to fail in some reasonably-common configuration.)
Obviously we should just make it randomly do one or the other, like map iteration order. :>
Do you need to be able to specify len < cap? After it's constructed, it's a slice.
@jimmyfrasche, good point. Requiring only cap would simplify the API and remove a number of behavior questions.
Yeah, I like simplifying it to just:
func Slice(ptr *ArbitraryType, cap int) []ArbitraryType
It's not much more typing to write unsafe.Slice(ptr, cap)[:len]
instead of unsafe.Slice(ptr, len, cap)
, and like @bradfitz points out it's a little simpler to specify and implement.
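A minimal sketch of the cap-only form followed by a reslice, assuming the single-integer unsafe.Slice that package unsafe provides in Go 1.17 and later. The helper name sliceLenCap is hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceLenCap (hypothetical) builds a slice with the full capacity first and
// then reslices it down to the wanted length.
func sliceLenCap(p *int32, length, capacity int) []int32 {
	return unsafe.Slice(p, capacity)[:length]
}

func main() {
	backing := [8]int32{0, 1, 2, 3, 4, 5, 6, 7}
	s := sliceLenCap(&backing[0], 3, 8)
	fmt.Println(len(s), cap(s))

	// There is spare capacity, so append writes into backing[3]
	// instead of allocating a new array.
	s = append(s, 42)
	fmt.Println(backing[3])
}
```

The reslice also gets the usual bounds check for free: a length larger than the capacity panics in the slice expression rather than needing a separate rule in unsafe.Slice.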
It's not much more typing to write unsafe.Slice(ptr, cap)[:len] instead of unsafe.Slice(ptr, len, cap), and like @bradfitz points out it's a little simpler to specify and implement.
This was my argument against the 3-arg slice op: a[x:y:z] = a[x::z][:y-x]
, so we would only need the 2-arg [x:y]
and [x::y]
operators. I lost that argument :(
https://go-review.googlesource.com/c/go/+/202080 contains a prototype implementation of the counter-proposal for cmd/compile.
Edit: And CL 202082 demonstrates some usage of it within the Go runtime.
Change https://golang.org/cl/202080 mentions this issue: cmd/compile: implement unsafe.Slice and unsafe.String
Change https://golang.org/cl/202082 mentions this issue: runtime: make use of unsafe.Slice
Change https://golang.org/cl/201839 mentions this issue: cmd/compile: recognize (*[Big]T)(ptr)[:n:m] pattern for -d=checkptr
@rsc suggests adding a new, optional, parameter to make
of a slice type:
make([]byte, p, l, c)
This example would make a new slice of type []byte
with the underlying array set to p
, the length set to l
, and the capacity set to c
. This could be used with any slice type. The new p
parameter would be required to have type unsafe.Pointer
(which would permit distinguishing this new case from existing ones, in which the second parameter must have integer type or be an untyped constant). The capacity parameter c
would be optional as it is today.
I don't see how further overloading make
buys that much in terms of parsimony. And it makes it less obvious at call sites what is going on; the types of the parameters can be effectively invisible to a human reader of code, but unsafe.Slice
never will be.
I don't think make
is actually a great fit. For all other types, make
allocates some kind of backing store (on either the stack or the heap) and returns a header that refers to that backing store.
In contrast, punning an unsafe.Pointer
to a slice does not allocate any kind of backing store; it only returns a header that refers to the existing data. That feels like a different operation to me, more like a conversion than a make
.
My initial reaction is that having make
construct a []T
from unsafe.Pointer
is less type-safe than unsafe.Slice
constructing []T
from *T
, but after re-reviewing the likely use sites, it seems less of a problem than I initially suspected. In most of the cases in CL 202082, the *T
was explicitly converted from unsafe.Pointer
anyway.
I think like @josharian says, it might be tricky reading the code to understand what's going on, but I think technically it works and addresses the issue.
@bcmills I think there's some analogy to be made to C++'s placement new
operator. Normally new
allocates new memory, but placement new
uses existing memory.
So are we going to call it "placement make"?
The ergonomics of the "placement make" form seem worse compared to unsafe.Slice
. To draw an example from https://golang.org/cl/202082:
Under @mdempsky's proposal, we have:
scases := unsafe.Slice(cas0, ncases)
with the element type propagated from element type of the existing cas0
variable.
In contrast, under @rsc's alternative we would have:
scases := make([]scase, unsafe.Pointer(cas0), ncases)
Riffing on the theme of "feels more like a conversion", maybe we could allow arrays of non-constant type in conversions, provided that they are immediately sliced:
scases := (*[ncase]scase)(unsafe.Pointer(cas0))[:]
If we wait for generics unsafe.Slice is no longer magical:
package unsafe
func Slice<T>(p *T, cap int) []T
We could do unsafe.String
now, no generics required.
The reverse operation for Slice
is easy, just &s[0]
with a special condition for len==0
. Maybe we also want the reverse operation for strings?
// Caller must not modify the memory pointed to by the return value
func StringPtr(s string) *byte // or unsafe.Pointer?
or maybe
// Caller must not modify the memory pointed to by the return value
func StringAsSlice(s string) []byte
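The reverse operation for slices, including the len==0 special case, could look like the sketch below. The name firstByte is made up, and returning nil for an empty slice is one possible choice, not something the thread settles.

```go
package main

import "fmt"

// firstByte (hypothetical) returns a pointer to the first element of s.
// Indexing an empty slice would panic, so that case returns nil here.
func firstByte(s []byte) *byte {
	if len(s) == 0 {
		return nil
	}
	return &s[0]
}

func main() {
	b := []byte("abc")
	p := firstByte(b)
	*p = 'x' // writes through to b[0]
	fmt.Println(string(b))
	fmt.Println(firstByte(nil) == nil)
}
```

No equivalent exists for strings in safe code, because &s[0] on a string is not addressable; that is the gap the StringPtr and StringAsSlice suggestions above are aimed at.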
@bcmills I think that's what I was referring to about unsafe.Pointer
vs *T
. While that function is a bit less ergonomic under @rsc's proposal, other ones like:
xmhdr := unsafe.Slice((*method)(add(unsafe.Pointer(x), uintptr(x.moff))), nt)
methods := unsafe.Slice((*unsafe.Pointer)(unsafe.Pointer(&m.fun[0])), ni)
would become:
xmhdr := make([]method, add(unsafe.Pointer(x), uintptr(x.moff)), nt)
methods := make([]unsafe.Pointer, unsafe.Pointer(&m.fun[0]), ni)
which are shorter.
I think select.go isn't a great example either. That function is called by code generated by the Go compiler, which has its own separate type signatures for the function anyway, so the type safety there is superficial. We could just change the cas0 parameter's type to unsafe.Pointer.
@randall77, note that I proposed essentially the same thing using reflect
in https://github.com/golang/go/issues/13656#issuecomment-303216308.
Riffing on the theme of "feels more like a conversion", maybe we could allow arrays of non-constant type in conversions, provided that they are immediately sliced:
I think this would work, but we'd probably need to decide how go/types should handle it. In particular, what go/types.Type
would one of these non-constant array type literals have?
You could spell this:
scases := (*[ncase]scase)(unsafe.Pointer(cas0))[:]
like this instead:
scases := (*[...]scase)(unsafe.Pointer(cas0))[:ncase]
and then, following the discussion in #13656, resolve ...
in this case to be the largest possible for the type.
(Although I can't say I really like either way of spelling it.)
We could do unsafe.String now, no generics required.
I don't think we want to do unsafe.String at all. It's too unsafe. (It's too easy to create a string that then survives forever with a broken promise and changing memory underneath).
If people want a string, they can string(byteSlice)
and get a safe copy.
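A short illustration of why the conversion is the safe option: string(byteSlice) copies the bytes, so later writes to the slice cannot change the string. The helper name copyToString is hypothetical.

```go
package main

import "fmt"

// copyToString (hypothetical) uses the ordinary conversion, which copies.
func copyToString(b []byte) string {
	return string(b)
}

func main() {
	b := []byte("hello")
	s := copyToString(b)
	b[0] = 'j' // mutate the original bytes after the conversion
	fmt.Println(s, string(b))
}
```

A string that aliased b instead would have changed along with it, which is the broken promise described above.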
I can see the appeal of "placement make" from a Go language perspective (i.e., it's making a []T
value, and exposing more user control over the component element values). And I can understand why CL 202082 would make it seem like that's a better solution than unsafe.Slice
.
But I think that's because CL 202082 is focused solely on the Go runtime, where we end up having to do a lot of manual pointer arithmetic anyway.
I expect the more common case where users will have a (ptr, len) pair and want to construct a slice will be users working with C APIs. In that case, cgo provides a type-safe *T
pointer value for them, and it seems unfortunate to require them to write make([]T, unsafe.Pointer(ptr), len)
instead of unsafe.Slice(ptr, len)
.
So I think I'm still leaning towards unsafe.Slice(ptr, len)
as the least error-prone solution for users.
@mdempsky
In particular, what
go/types.Type
would one of these non-constant array type literals have?
Fortunately, the documentation for go/types
already gives us a good answer. Per https://golang.org/pkg/go/types/#NewArray:
A negative length indicates an unknown length.
The go/types.Type
for the result of the conversion would be *Pointer
. Call that type p
. Then we have:
p.Elem()
has concrete type *types.Array
.
p.Elem().Elem()
is the element type indicated in the conversion expression.
p.Elem().Len()
is negative (presumably -1), since the value of the length expression is only known at run time.
If such an expression is bound to a variable, I would decay the type of that variable to *[0]T
, analogous to how an untyped integer constant decays to int
.
@josharian
The problem with spelling it
scases := (*[...]scase)(unsafe.Pointer(cas0))[:ncase]
is that the slice expression normally only sets the length. It seems inconsistent to have that also set the capacity in this case.
In contrast, I want the resulting slice to always have both a well-defined length and capacity.
We could instead _require_ the three-argument slice form for arrays of indeterminate length:
scases := (*[...]scase)(unsafe.Pointer(cas0))[:ncase:ncase]
but that seems like unnecessary duplication for the vast majority of real use-cases.
@bcmills
Fortunately, the documentation for
go/types
already gives us a good answer.
Good point. That does seem like a good solution.
If such an expression is bound to a variable, I would decay the type of that variable to
*[0]T
, analogous to how an untyped integer constant decays to int
.
I'd suggest leaving *[nonConst]T
an error outside of a slice expression, in which case it's fine to leave the variable type as *[-1]T
instead of decaying it to something else.
(That said, I think (*[n]T)(p)[:]
has the same issue as make([]T, p, n)
that it requires T
to be explicitly typed by the user, and thus loses some type safety for common uses.)
@randall77
If we wait for generics unsafe.Slice is no longer magical:
Package unsafe is inherently magical, so I think waiting for generics to make it non-magical only makes sense if we're wanting to put unsafe.Slice
in a different package. But since it's an operation that the compiler can't guarantee is safe, package unsafe seems like the right place to me to make it stand out during code review.
Just to throw another idea out there: If the main use for this is cgo, we could push some magic into cgo. C.SliceOfT(ptr *T, cap int)
could generate code that does the conversion for type T.
It's not just cgo, though. syscall packages are full of this sort of stuff.
Regarding the objection of adding magic functions to the unsafe package, the unsafe package is already pretty magic:
https://golang.org/pkg/unsafe/#ArbitraryType
(And all its uses)
Adding a polymorphic unsafe function (as opposed to extending the polymorphic built-in make
) wouldn't make the unsafe package much more weird. There's already precedent for how to godoc it.
A case that this would help in the standard library: #37350.
Of all the proposals, in this thread and the many in the past, unsafe.Slice(p, n)
looks like it would be the easiest to write and it would be the easiest to read by human and machine alike.
These are very nice properties to have when doing something unsafe and this is a fundamentally unsafe thing to do.
I've personally only needed something like this when messing with cgo, but I don't recall any code I've written with cgo that wouldn't be simpler and more correct with this. Every case I can think of could be rewritten as just unsafe.Slice(v, int(n))
.
A place where the slice header type would have been useful for me: https://github.com/smasher164/mem/blob/11c40568d91b031ff1d4049628ff83b46bbdc4f2/map_unix.go#L22.
Retrieving the underlying address for a segment gotten from unix.Mmap
requires a conversion to a slice header. While the above munmap implementation could make use of unsafe.Slice(p, n)
, the mmap implementation cannot. In my view, the flexibility that a guaranteed struct layout provides is far outweighed by the ergonomics of using a polymorphic function.
@smasher164, assuming that the size is nonzero, you don't need a header type to obtain the address of an mmap'd slice. &(b[0])
should suffice.
A place where the slice header type would have been useful for me: https://github.com/smasher164/mem/blob/11c40568d91b031ff1d4049628ff83b46bbdc4f2/map_unix.go#L22.
Note that the safe coding pattern here is:
sl := (*reflect.SliceHeader)(unsafe.Pointer(&b))
return unsafe.Pointer(sl.Data), nil
It's not guaranteed that you can convert a pointer to a slice to a pointer to any other struct type except reflect.SliceHeader
.
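The two ways of getting a slice's data address mentioned in this exchange can be put side by side. This is a sketch with made-up helper names; addrViaIndex needs a non-empty slice, while addrViaHeader follows the reflect.SliceHeader pattern quoted above.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// addrViaIndex (hypothetical) takes the address of the first element.
// It panics if b is empty.
func addrViaIndex(b []byte) unsafe.Pointer {
	return unsafe.Pointer(&b[0])
}

// addrViaHeader (hypothetical) reads the Data field of the slice header.
// It also works for empty slices.
func addrViaHeader(b []byte) unsafe.Pointer {
	sl := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	return unsafe.Pointer(sl.Data)
}

func main() {
	b := make([]byte, 16)
	fmt.Println(addrViaIndex(b) == addrViaHeader(b))
}
```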
@bcmills Thank you for pointing that out.
It's not guaranteed that you can convert a pointer to a slice to a pointer to any other struct type except reflect.SliceHeader.
@mdempsky I see. I assumed that the conversion would be correct based on the usage in x/sys/unix/syscall_unix.go#L117.
I assumed that the conversion would be correct based on the usage in x/sys/unix/syscall_unix.go#L117.
Thanks. We should fix that. That file's not following best practice regarding unsafe.Pointer. (This is a separate issue though.)
@randall77 @ianlancetaylor @griesemer
If we wait for generics unsafe.Slice is no longer magical:
package unsafe func Slice<T>(p *T, cap int) []T
With generics, the _type signature_ is no longer magical, but I think the implementation still is: looking at the code in https://github.com/golang/go/issues/13656#issuecomment-303216308, a generic implementation would require either a compile-time-constant min
function to compute the size of the array to be sliced, or a compiler change to allow ephemeral use of types larger than the address space (so that an implausible array size can be used nonetheless).
The reason is that the size of [maxInt]T
can overflow the address space if T
is large, and the compiler currently rejects arrays of sizes that exceed the address space (https://play.golang.org/p/zyourYb_JpR), presumably in part because unsafe.Sizeof
would be unable to report the size of the type. (This problem is especially acute on 32-bit systems, but also possible on 64-bit systems.)
The workaround in https://github.com/golang/go/issues/13656#issuecomment-303216308 is to size the array to "the maximal array type that fits in the address space or [maxInt]T, whichever is smaller", but I don't see a way to compute that value at compile-time. (Perhaps someone with better bit-twiddling skills could find a constant expression that does that, but even then, good luck explaining that expression in a code comment!)
@bcmills If we had generics, including a way to compute unsafe.Sizeof(*new(T))
, then I think we could implement Slice
using reflect.SliceHeader
without any further magic functionality.
Ah, that's a good point: we could use the SliceHeader
to bypass the large-array-type conversion entirely. I wonder if there's a way to do that in the reflect
implementation too...
It does appear to be possible to implement using reflect.SliceHeader
instead of reflect.ArrayOf
.
(Reference reflect
implementation here. Bugs are likely.)
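For illustration only, a generic version built on reflect.SliceHeader might look like the sketch below. It is written in the type-parameter syntax that Go 1.18 adopted, not the draft `<T>` syntax quoted in this thread, and the name sliceOf and its panic behavior are assumptions of this sketch rather than anything decided here. It is not the reference implementation linked above.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// sliceOf (hypothetical) returns a []T of length and capacity n aliasing the
// memory at p. It sidesteps the large-array conversion by writing the slice
// header directly. An overflow check on n*sizeof(T) is omitted for brevity.
func sliceOf[T any](p *T, n int) []T {
	if n < 0 {
		panic("sliceOf: negative length")
	}
	if p == nil {
		if n > 0 {
			panic("sliceOf: nil pointer with positive length")
		}
		return nil
	}
	var s []T
	hdr := (*reflect.SliceHeader)(unsafe.Pointer(&s))
	hdr.Data = uintptr(unsafe.Pointer(p))
	hdr.Len = n
	hdr.Cap = n
	return s
}

func main() {
	arr := [3]int{1, 2, 3}
	s := sliceOf(&arr[0], len(arr))
	s[0] = 9 // writes through to arr[0]
	fmt.Println(s, arr[0])
}
```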
So, #38203 was closed as a duplicate of this, but for reference, there is a subtle difference in that proposal, which is that it doesn't propose a struct with fields you can examine. Rather, it proposes an opaque type much more like unsafe.Pointer
, only which represents "a slice", and which can be converted to any slice type directly, but doesn't allow things like overriding len/cap/pointer. So the idea would be something like b := []byte(unsafe.Slice(intSlice))
to get a byte slice which refers to the backing store of the provided intSlice, and works regardless of how many bytes there are in an int
, etcetera.
@seebs There are several different ideas described in this issue. It's gone well beyond @mdempsky's original idea.
Regardless of whether the type signature is magical, under the current generics draft, all call sites for the magical function would be valid call sites to the generic function. Based on the type signature in generic func Slice<T>(p *T, cap int) []T
, the type argument can always be inferred. Doesn't this mean that func Slice(ptr *ArbitraryType, cap int) []ArbitraryType
can be introduced before generics, and its signature can be upgraded if generics land, without violating backwards compatibility?
Doesn't this mean that
func Slice(ptr *ArbitraryType, cap int) []ArbitraryType
can be introduced before generics, and its signature can be upgraded if generics land, without violating backwards compatibility?
I was going to say that that could be a problem for assignments to variables of function type, but it appears that the builtin functions with magical signatures cannot be assigned to ordinary variables:
https://play.golang.org/p/MzMW0J-zkPp
So I believe you are correct.
Change https://golang.org/cl/231223 mentions this issue: internal/unsafeheader: consolidate stringHeader and sliceHeader declarations into an internal package
@rsc, @griesemer, @ianlancetaylor, @bradfitz, @andybons, @spf13 How does this move forward? Why hasn't the proposal meeting discussed this issue yet?
The only official proposal meeting action taken on this was @rsc's comment in 2017:
Go 2 seems like the time to think about this (and reflect.SliceHeader etc).
-rsc for @golang/proposal-review
The issue tracker lists 7 Go2 proposals that have been accepted: https://github.com/golang/go/issues?q=is%3Aissue+label%3Ago2+label%3Aproposal-accepted
@ianlancetaylor relayed an alternative suggestion from @rsc in 2019, which @bcmills and I discussed at some length, and I summarized as still favoring unsafe.Slice(ptr, len)
(and evidently supported by issue followers, based on the thumbs ups).
As a reminder, the proof-of-concept cmd/compile implementation is at https://go-review.googlesource.com/c/go/+/202080 and a quick example of its usage at https://go-review.googlesource.com/c/go/+/202082.
@mdempsky, I think most of the Go 2 changes have been blocked behind generics. We did a few small changes early on to exercise our "make language change" muscles. And Robert and Ian have been focusing on the generics design, essentially to the exclusion of other Go 2-tagged changes. We only have so much time (less so since the world went wonky).
Now that this has been moved back to the non-Go2 queue, we'll take a look at it in the regular proposal meetings.
Based on your comment just above this one, it sounds like maybe the proposal we should be evaluating is not the description in the top comment on this issue but now the shorter idea "add the function unsafe.Slice(ptr *T, len int[, cap int]) []T". Is that understanding correct? Thanks.
I think there are two separate use cases being considered here.
One is "I know I have the address of one or more T, I want to use that as a slice". The other is "I want to type-pun this existing slice of T1 into a slice of T2". The func Slice(*T, len, cap) []T
function is good for the first case, but not as good for the second. The unsafe.Slice
type which is parallel to unsafe.Pointer
is good for the second, but not for the first. My idea for the unsafe.Slice
type would have made it, not a struct, but an opaque type, similar to the way unsafe.Pointer works -- it's just a type, it's not a composite of things, you can't do anything with it but use it as an intermediate state in conversions or pass it to functions expecting one. This would imply, I think, more compiler magic to deal with it -- the compiler would be the one generating the conversion code.
So for instance, given b []byte
, you'd have things like:
int64s := ([]int64)(unsafe.SliceType(b))
The rationale for this is, in part, that it avoids the risk of someone getting the numbers wrong:
int64s := unsafe.SliceFunc((*int64)(unsafe.Pointer(&b[0])), len(b) / 4)
On the other hand, this does basically nothing for the case where you have a pointer already. So I think they both have utility, but solve different problems. I would rather have the unsafe.Slice
function than not, but I would love to have the arbitrary-slice type which can handle the type punning conversions (and, implicitly, fix up len/cap automatically, so it's impossible to get them wrong).
And yes, I'm aware that it's not necessarily safe to treat bytes as int64s, but that's why the package is unsafe
, not perfectlySafe
.
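One way to avoid getting the numbers wrong with the function form is to derive the divisor from unsafe.Sizeof instead of writing it by hand. The sketch below assumes the single-length unsafe.Slice available in Go 1.17 and later; bytesOf and asUint64s are hypothetical helper names, and the alignment check is a precaution added for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// wordSize is the size of one uint64 element in bytes.
const wordSize = int(unsafe.Sizeof(uint64(0)))

// bytesOf (hypothetical) views a []uint64 as its underlying bytes.
func bytesOf(u []uint64) []byte {
	if len(u) == 0 {
		return nil
	}
	return unsafe.Slice((*byte)(unsafe.Pointer(&u[0])), len(u)*wordSize)
}

// asUint64s (hypothetical) views a []byte as []uint64, scaling the length
// by the element size and dropping any trailing partial element.
func asUint64s(b []byte) []uint64 {
	if len(b) == 0 {
		return nil
	}
	p := unsafe.Pointer(&b[0])
	if uintptr(p)%uintptr(wordSize) != 0 {
		panic("asUint64s: data is not aligned for uint64")
	}
	return unsafe.Slice((*uint64)(p), len(b)/wordSize)
}

func main() {
	u := []uint64{10, 20, 30}
	b := bytesOf(u)
	v := asUint64s(b)
	fmt.Println(len(b), len(v), v[2])
}
```

This still leaves the scaling in user code, which is the part an opaque slice type would take over.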
@rsc
Based on your comment just above this one, it sounds like maybe the proposal we should be evaluating is not the description in the top comment on this issue but now the shorter idea "add the function unsafe.Slice(ptr *T, len int[, cap int]) []T". Is that understanding correct? Thanks.
My current proposal is to add two new builtin functions to package unsafe:
func Slice(ptr *ArbitraryType, len int) []ArbitraryType
func String(ptr *byte, len int) string
By builtin function, I mean that using them in non-call contexts (e.g., f := unsafe.String
) would be invalid, just like other Go builtin and package unsafe functions.
A few design questions are still open, listed with my best summary of the current discussion on them:
What to do about len < 0
, ptr == nil && len > 0
, or len > MAXWIDTH / Sizeof(*ptr)
?
I'm still inclined to say we should panic. I see these functions as being there to help users correctly create slices/strings from pointers, and checking these invariants for them is part of making them safer. I think if users actually want to create these invalid slice values, they need to use reflect.SliceHeader.
@bcmills recommended leaving it unspecified, and making sure the behavior actually differs in some configs (e.g., panic normally, but throw under -race) to avoid users coming to depend on it.
Include a cap
argument for Slice?
Based on my experience looking at unsafe.Pointer code, I think most call sites will omit it, and users can always write Slice(ptr, cap)[:len]
anyway. It wouldn't hurt to include for completeness though as an optional argument, like the make
builtin. We could also omit it for now, and add later if it turns out users would benefit from it.
If cap
is included, it adds more failure cases to design question 1 to consider.
Should the len
arguments be literally the int
type, or should they be flexible integer types, like make
?
@bcmills pointed out flexibility here would make it easier to interoperate with C.size_t
, which I think is compelling for one of the main use cases.
Should unsafe.String
return string
or untyped string?
It's a builtin so it could potentially return untyped string instead. But the Go spec doesn't currently make use of non-constant untyped string values, and there's no type safety issues with requiring users to explicitly convert between string types (contrast with the risk of truncation from needing to convert a C.uint64_t
to int
for the len
parameter on 32-bit CPUs), so probably no need for this.
The current prototype (CL 202080) does not do any argument validation (mostly for simplicity); does not allow a cap
argument; allows arbitrary integer-typed arguments for len
(test/unsafe.go); and returns string
.
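The first design question can be made concrete with a wrapper that checks the listed cases explicitly and reports them as errors. This is only a sketch: checkedBytes is a hypothetical name, returning an error is one option among those discussed (panic, throw, unspecified), and the size-overflow case is left out because it cannot occur for single-byte elements with an int length.

```go
package main

import (
	"errors"
	"fmt"
	"unsafe"
)

// checkedBytes (hypothetical) validates its arguments before building the
// slice, instead of leaving invalid inputs to unspecified behavior.
func checkedBytes(p *byte, n int) ([]byte, error) {
	switch {
	case n < 0:
		return nil, errors.New("checkedBytes: negative length")
	case p == nil && n > 0:
		return nil, errors.New("checkedBytes: nil pointer with positive length")
	}
	return unsafe.Slice(p, n), nil
}

func main() {
	buf := [4]byte{1, 2, 3, 4}
	s, err := checkedBytes(&buf[0], len(buf))
	fmt.Println(s, err)

	_, err = checkedBytes(nil, 1)
	fmt.Println(err)

	_, err = checkedBytes(&buf[0], -1)
	fmt.Println(err)
}
```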
@seebs Do you have any examples of those sorts of slice type punning that I can look at? In particular, are the use cases you have in mind always slices of primitive types, or do they ever involve arbitrary user-constructed element types?
If it's just between primitive types, a helper library (doesn't even have to be in the Go standard library) like:
package unsafebytes
import (
"reflect"
"unsafe"
)
func AsUint32(s []byte) []uint32 { var res []uint32; as(unsafe.Pointer(&res), unsafe.Sizeof(res[0]), s); return res }
func AsUint64(s []byte) []uint64 { var res []uint64; as(unsafe.Pointer(&res), unsafe.Sizeof(res[0]), s); return res }
// more AsXXX...
func OfUint32(s []uint32) []byte { return of(unsafe.Pointer(&s), unsafe.Sizeof(s[0])) }
func OfUint64(s []uint64) []byte { return of(unsafe.Pointer(&s), unsafe.Sizeof(s[0])) }
// more OfXXX...
func as(dst unsafe.Pointer, size uintptr, src []byte) {
dstHdr := (*reflect.SliceHeader)(dst)
srcHdr := (*reflect.SliceHeader)(unsafe.Pointer(&src))
if srcHdr.Data&(size-1) != 0 {
panic("pointer is not aligned")
}
// XXX: Check that Len and Cap are multiples of size?
dstHdr.Data = srcHdr.Data
dstHdr.Len = srcHdr.Len / int(size)
dstHdr.Cap = srcHdr.Cap / int(size)
}
func of(src unsafe.Pointer, size uintptr) (dst []byte) {
dstHdr := (*reflect.SliceHeader)(unsafe.Pointer(&dst))
srcHdr := (*reflect.SliceHeader)(src)
dstHdr.Data = srcHdr.Data
dstHdr.Len = srcHdr.Len * int(size)
dstHdr.Cap = srcHdr.Cap * int(size)
return
}
could probably do what you want. For example, to convert from []uint64
to []uint32
, you'd just call unsafebytes.AsUint32(unsafebytes.OfUint64(s))
.
I can believe slices of structs could be useful, but the Go spec doesn't currently guarantee struct field layout or order. I suppose you could use cgo to work around this though (e.g., by declaring a packed struct type in C).
Sometimes arbitrary user-constructed element types. If you look in pilosa's roaring implementation, there's a lot of type-punning where we mmap in data ([]byte), and then treat it as slices of uint64, uint16, or struct{start, end uint16}. (And yes, this breaks the unsafe rules and in some cases we do end up with unaligned pointers, this is going to get fixed in a future file format, etc.)
I was under the impression that Go did guarantee struct order, but not layout, but in practice it appears to be the case that a struct of two uint16 will always pack them in order into 4 bytes.
I think you have the * int(size)
and / int(size)
backwards, although it's been a very long week so I could actually have it backwards, but either way, this being a question that I can have at all is why I would love to have an unsafe.Slice
type which works like unsafe.Pointer, but also handles the len/cap conversions.
Sometimes arbitrary user-constructed element types. If you look in pilosa's roaring implementation, there's a lot of type-punning where we mmap in data ([]byte), and then treat it as slices of uint64, uint16, or struct{start, end uint16}.
Conversion from []byte to []uint64 and []uint16 could be handled by unsafebytes.AsUint64 and unsafebytes.AsUint16.
Conversion to struct{start, end uint16}
could be handled by unsafebytes.AsUint16, with appropriate indexing. This could even be abstracted as:
func start(s []uint16, i int) uint16 { return s[2*i] }
func end(s []uint16, i int) uint16 { return s[2*i+1] }
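A slightly fuller sketch of the same indexing idea, wrapping the flattened slice in a named type. The type name intervals and the count method are hypothetical additions for illustration; only start and end come from the comment above.

```go
package main

import "fmt"

// intervals stores (start, end) pairs flattened into a single []uint16,
// so pair i occupies elements 2*i and 2*i+1. The name is hypothetical.
type intervals []uint16

// count reports how many complete pairs the slice holds.
func (s intervals) count() int { return len(s) / 2 }

// start returns the first half of pair i.
func (s intervals) start(i int) uint16 { return s[2*i] }

// end returns the second half of pair i.
func (s intervals) end(i int) uint16 { return s[2*i+1] }

func main() {
	iv := intervals{10, 20, 30, 45}
	for i := 0; i < iv.count(); i++ {
		fmt.Println(iv.start(i), iv.end(i))
	}
}
```

This avoids relying on struct field layout at all, at the cost of doing the index arithmetic by hand.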
I was under the impression that Go did guarantee struct order, but not layout,
No, the status quo from #10014 is still that struct order is formally unspecified.
I think you have the * int(size) and / int(size) backwards
Thanks, you're right. Edited.
but either way, this being a question that I can have at all is why I would love to have an unsafe.Slice type which works like unsafe.Pointer, but also handles the len/cap conversions.
The risk of mixing it up would happen somewhere, and be easily discovered with basic testing. I posted code as a demonstration of the idea, not as production ready code.
Yeah, it's reasonably easy to fix up. I do have another case in some experimental code where I'm converting a chunk of []byte to slices of struct{uint16, uint16, uint32}. That said, the existing unsafe code is adequate for this; if it just isn't practical to provide a convenience feature for it, the existing tools are plenty suitable, while the proposed unsafe.Slice(pointer, len) function seems like it offers a very unambiguous benefit compared to messing around with slice headers.
I guess I was vaguely aware that strictly speaking the spec never guarantees layout, but the existence of _ members for padding is mentioned in passing in the spec, so it seems vanishingly unlikely to me that a change to that could happen without breaking significant amounts of code.
@mdempsky, I don't think unsafe.String is even necessary.
unsafe.Slice cannot be implemented as a library today because there is no way to write a signature parameterized on the element type.
In contrast, unsafe.String has a monomorphic return type (namely string), so it is straightforward to implement as a library (albeit a bit tricky to implement _safely_, with appropriate mutation checks). See github.com/bcmills/unsafeslice.AsString.
@seebs, it turns out that an efficient type-punning library is possible to implement under the current generics draft.
The API has a few minor warts: full type inference requires the use of an out-parameter, and defined types with slices as their _underlying_ type require an explicit pointer conversion. However, all in all it is decently usable. See https://github.com/bcmills/go2go/tree/master/unsafeslice.
@bcmills I saw some discussion in #39186 about the possibility of a compiler reordering writes. Does that have any bearing on whether or not this can be implemented as a library?
What's blocking this from getting into the next go release?
@elichai this is a proposal that hasn't had a decision yet. The next go release is due in about two weeks, and it has been feature-frozen for nearly three months as per the release cycle.
@cbandy, I don't think write-reordering is relevant here: the unsafe conversions don't need to read the underlying data, and Go doesn't have anything like C's “strict aliasing” with regards to type-punning.
@bcmills After looking at uses of reflect.StringHeader in Google's internal codebase, I'm inclined to agree with you that unsafe.String is unnecessary. It seems like the predominant use case is conversion between []byte and string, like your library provides, and unsafe.Slice is perfectly adequate for creating the []byte value.
OK, so it sounds like the proposal on the table is to add only:
unsafe.Slice(ptr *T, len anyIntegerType) []T
(Len serves as both len and cap in the returned slice.)
I have retitled the issue to note this.
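As a rough illustration of the signature above, the following sketch builds a slice over part of an array. It assumes a toolchain in which unsafe.Slice exists as proposed (Go 1.17 or later); the helper name firstN is hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// firstN returns a slice over the first n elements of arr.
// Hypothetical helper: it only wraps the proposed unsafe.Slice call.
// The caller must keep n within the bounds of the array.
func firstN(arr *[4]int, n int) []int {
	return unsafe.Slice(&arr[0], n)
}

func main() {
	backing := [4]int{10, 20, 30, 40}
	s := firstN(&backing, 3)

	// The len argument serves as both length and capacity.
	fmt.Println(len(s), cap(s))

	// The slice aliases the array rather than copying it.
	s[0] = 99
	fmt.Println(backing[0])
}
```

Because the capacity equals the length, an append to the result reallocates instead of writing past the n elements the caller vouched for.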
unsafe.Slice as in the previous comment looks pretty good.
Thinking about rollout, it might be good to bundle this change with
two other related changes that have been discussed for a long time:
some kind of slice to array conversion (#395),
and some more direct "add unsafe.Pointer + uintptr -> Pointer" operation
(can't find a number for that at the moment).
some more direct "add unsafe.Pointer + uintptr -> Pointer" operation (can't find a number for that at the moment).
There's my 2014 safe pointer arithmetic proposal: https://docs.google.com/a/dempsky.org/document/d/1yyCMzE4YPfsXvnZNjhszaYNqavxHhvbY-OWPqdzZK30/pub
Parts of it have been independently incorporated into Go since then. E.g., the pointer arithmetic rule is one of the unsafe.Pointer safety rules now, and the compiler instrumentation was implemented as checkptr.
Thanks @mdempsky. I created #40481 for unsafe.Add.
Forgot to say: Based on the comments, this seems like a likely accept.
Going to leave this in likely accept until the others are ready, just in case further discussion there leads to some kind of general solution that covers this one too.
With the "anyIntegerType" requirement, under the current generics draft, the signature looks like this:
type integer interface {
type int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, uintptr
}
func Slice[type T interface{}, I integer](p *T, len I) []T
@smasher164 Note that's a very close approximation, but not 100% precise. E.g., make([]int, 1.0) works fine with make as a builtin; but as a generic function, it would fail because the type argument would be inferred as float64 due to 1.0's default type.
(I'm not aware of any use case for passing untyped float/complex constants to make, but that's what the Go spec today allows, and how I'd define/implement unsafe.Slice.)
@smasher164 but as a generic function, it would fail because the type argument would be inferred as float64
I see. And extending the constraint to allow floats wouldn't work either, since it compares the underlying type. This would require you to panic at runtime, instead of just statically rejecting untyped constants.
Maybe that's okay? Since people don't pass floating-point literals into builtins, changing the signature in a future release would only be backwards incompatible in a draconian sense.
Given that unsafe.Slice does not yet exist, I think it would also be fine to have the compiler reject unsafe.Slice(p, 1.0) from the outset, and just have that behavior be slightly different from make([]int, 1.0) until and unless we tighten up make itself.
Still likely accept; waiting on others.
For what it's worth, I spent a while looking through the corpus for possible uses here, and I was surprised how little it would apply directly. Most of the time the construction of a slice was starting with an unsafe.Pointer and not a *T, so code would look like:
slice := (*[1000]T)(p)[:n]
and it would become
slice := unsafe.Slice((*T)(p), n)
which is a bit better but it's odd to have to write *T when you are trying to produce a []T, especially compared with something like unsafe.Slice([]T, p, n).
I was doing this survey as part of gathering data for #395, and I haven't written up the results in full yet, but I wanted to note this. It does make me wonder whether there's some other conversion we should be thinking about that would target both this issue and #395.
If memory serves, the [:n] form, right now, is likely to provoke checkptr in some cases, if the cap is too large for the space pointed to, so that's another advantage of the unsafe.Slice() form.
Vaguely reminded of the C++ism of placement-new, maybe we need make([]T, n,[ n,] ptr). ... On looking at it, I think I'm going to vote against it.
I do sort of like the explicit provision of the type as a parameter, maybe. Consider:
b := make([]byte, 64)
u = unsafe.Slice([]uint64, &b[0], 8)
Options would be (1) this is an error, because the pointer is the wrong type, or (2) this is a valid call and works like type-punning.
Okay, madness: Imagine that we have, effectively, three signatures:
unsafe.Slice(ptr, len[, cap]) => uses pointer's type
unsafe.Slice(T, ptr, len[, cap]) => uses specified type
unsafe.Slice(T, slice, len[, cap]) => uses specified type, and the pointer from slice, and verifies that provided len and cap are valid.
I would be inclined to say that T should be specified as the slice type, not the member type, for consistency with make.
So these would be exactly equivalent:
unsafe.Slice((*T)(p), n)
unsafe.Slice([]T, p, n)
except that in the second case, p could be a pointer of any type, and/or possibly a slice. Having a slice work like a pointer to its first member isn't a completely novel notion -- %p does it.
it's odd to have to write *T when you are trying to produce a []T, especially compared with something like unsafe.Slice([]T, p, n).
I could imagine an implementation of generics, not too far from the current design draft but with a couple of additions, that would allow callers to elide the slice type when it can be inferred but to specify it otherwise.
Specifically, with a convertible.To[T] constraint and an inference algorithm that can infer a default T from convertible.To[T], then
package unsafe
func [T any, P convertible.To[*T]] Slice(ptr P, n int) []T
should allow both of the calls in
type X struct { … }
var u unsafe.Pointer
s1 := unsafe.Slice[X](u, n)
var p *X
s2 := unsafe.Slice(p, n)
On the other hand, I don't think it's unreasonable to have to convert an unsafe.Pointer to a specific pointer type in order to use it, even for another unsafe API.
I would not want to infer T from convertible.To[T]. But that's OK; I don't think we need convertible at all here. It would be OK to require people to write their own conversion for the uncommon case of wanting a different type.
I don't think it's unreasonable to have to convert an unsafe.Pointer to a specific pointer type in order to use it,
This wasn't my point. My point was that it's weird to write a conversion to *T when what you are trying to produce is []T.
If you were already holding a *T then there'd be no conversion at all; but in most of the cases I found, you're holding an unsafe.Pointer, so a conversion is needed, but it's a conversion to not quite the type you want.
The conversion from *T to []T requires the caller to supply only one item of additional data: the number of elements.
The conversion from unsafe.Pointer to []T requires two: the element type and the number of elements.
unsafe.Slice as proposed provides the element type first (via the conversion to *T), and then the number of elements in the slice (via the call to unsafe.Slice). However, I could imagine some alternatives.
If array types with run-time lengths were allowed, then the conversion from unsafe.Pointer could supply both the type and the number of elements as a single conversion:
slice := (*[n]T)(p)[:]
If array types with _indeterminate_ lengths were allowed, then the conversion would look more-or-less the same, but the slice operation would have to supply an explicit length and capacity:
slice := (*[...]T)(p)[:n:n]
With generics, I could imagine spelling [...]T differently, perhaps as unsafe.Array[T]. Either way, the semantics would be “an array of T for which the size is unknown and not checked during slicing and indexing operations”.
However, with either of those approaches, we would either still need an unsafe.Slice function or equivalent, or else callers would lose substantial type-safety when they already have a *T: because they would need to convert the *T to unsafe.Pointer, dropping the information about the element type, and then convert the unsafe.Pointer back to a *[...]T or *[n]T, redundantly supplying that same information.
On the other hand, if we add some form of generics to the language it will likely be possible to implement both of those conversions as a library.
I still don't have the data I wanted to present about this, but it doesn't seem settled even so. Moving back to Active.
With unsafe.Slice(ptr *anyType, len anyInteger) and generics, you could write
func SliceOf[T any](ptr unsafe.Pointer, len int) []T {
return unsafe.Slice((*T)(ptr), len)
}
(This could probably take another parameter to accept something like anyInteger as well)
Conversely, if there were something like unsafe.Slice(T typeArg, ptr unsafe.Pointer, len anyInteger)
func SliceFrom[T any](ptr *T, len int) []T {
return unsafe.Slice(T, unsafe.Pointer(ptr), len)
}
but that would require defining the type parameter of unsafe.Slice in some way that works with generics. The former seems simpler overall even if it often requires a conversion in practice.
@rsc what's the split when you say most code would require the conversion: are we talking roughly 51% or 99% here?
My point was that it's weird to write a conversion to *T when what you are trying to produce is []T.
I don't think it's weird. If I want to read a T value from the memory location pointed to by an unsafe.Pointer, I have to convert it to *T and then dereference the result. There's no direct "load T value from unsafe.Pointer" operation.
As I see it, a []T slice is just a 3-tuple consisting of a *T pointer and two int values, length and capacity. Analogously, I don't see why we would add a direct "create []T slice from unsafe.Pointer" operation. Converting unsafe.Pointer to *T and then bundling that *T along with a length/capacity as a slice are two logically distinct operations.
I think there's a reasonable alternative view that a []T slice is actually a 3-tuple with an unsafe.Pointer instead of a *T pointer. I can see this as slightly cleaner when it comes to empty-but-non-nil slices, where the *T pointer is non-nil yet doesn't necessarily actually point to a T variable. But for this to be consistent, I think the length and capacity values should be byte counts, rather than T-element counts. unsafe.Pointer is an element-type-less pointer, and byte counts are the only element-type-less measure of memory.
I'll also remind that a direct unsafe.Pointer-to-[]T conversion operation was already previously suggested (as an extension of make, rather than a new unsafe.Slice function), and I even made the same observation that generally the *T pointers were converted from unsafe.Pointer. Yet after consideration, consensus still favored the current proposal.
Edit: I'll caveat, though, that in that "consensus" comment I argued that I expected *T pointers to be more common in user code than unsafe.Pointer due to cgo, and Russ's data reportedly refutes that expectation. However, in the unsafe.Pointer vs *T scenarios, we're weighing between these two spellings:
// ptr has type unsafe.Pointer
unsafe.Slice((*T)(ptr), len) // current proposal
unsafe.Slice([]T, ptr, len) // alternate proposal
// ptr has type *T
unsafe.Slice(ptr, len) // current proposal
unsafe.Slice([]T, unsafe.Pointer(ptr), len) // alternate proposal
For unsafe.Pointers, I think the proposals are very comparably ergonomic. But for *Ts, the current proposal is considerably simpler.
but it doesn't seem settled even so.
I'd appreciate if folks who were previously settled but now unsettled would affirmatively voice that (preferably with their concerns). There have been several comments since the "likely accept" update, but I only see one that clearly expresses withdrawn support for the unsafe.Slice proposal. The rest, as I read them, still seem to favor the proposal, and are just responding to the counter-proposal.
I'm still in favor, but I do think we should be careful to ensure that the call sites can be also expressed as some likely form of generics.
unsafe.Slice((*T)(ptr), len) has that property: I can envision a lot of possible type-inference algorithms in which that could be expressed.
However, unsafe.Slice([]T, …) does not have that property: I think it's relatively unlikely that the final form of generics will allow intermixing types and values in the run-time argument list.
I also think that unsafe.Slice[T any](*T, int) []T is the right declaration. I understand why @rsc suggests that it is odd to write *T when you want a []T. However, you only call unsafe.Slice when you have a pointer, and you want to get a slice. Programs that are operating at this level are intimately familiar with the fact that a slice is in effect a bounded pointer. It seems to me to be entirely reasonable that there is an operation that takes a pointer and a bound and returns a slice. I don't think the fact that typical uses will explicitly say *T will lead to any confusion; the pointer is type *T, and the resulting bounded pointer is type []T.
I'm afraid that involving generics in the API here will mean that this won't be added for at least the next year.
@elichai The actual implementation of this function would be entirely in the compiler, so generics are not really involved. It's just a way of picturing the declaration.
I also think that unsafe.Slice[T any](*T, int) []T is the right declaration. I understand why @rsc suggests that it is odd to write *T when you want a []T. However, you only call unsafe.Slice when you have a pointer, and you want to get a slice. Programs that are operating at this level are intimately familiar with the fact that a slice is in effect a bounded pointer. It seems to me to be entirely reasonable that there is an operation that takes a pointer and a bound and returns a slice. I don't think the fact that typical uses will explicitly say *T will lead to any confusion; the pointer is type *T, and the resulting bounded pointer is type []T.
FWIW this is how it's declared/works both in Rust and in C++:
https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html
https://en.cppreference.com/w/cpp/container/array/to_array
Based on Ian's response to my objection as well as the reactions, I retract the objection above.
Based on the discussion above, this is now a likely accept (again). Thanks for bearing with me. I just want to make sure we get this right.
No change in consensus, so accepted.
A bit late for Go 1.16 so milestoning to Go 1.17.
Change https://golang.org/cl/263800 mentions this issue: unsafe: add Slice and String headers