Crystal: UUID slice has different value when passed to a method

Created on 18 Jun 2019  路  16Comments  路  Source: crystal-lang/crystal

From the conversation https://gitter.im/crystal-lang/crystal?at=5d07a49b2313502d385b7a3c

https://play.crystal-lang.org/#/r/730d

require "uuid"

def convert(value : ::UUID)
  value.to_slice
end

uuid = UUID.new "24b408d0-5f08-43d3-9027-c0d4ec097e3a"

uuid.to_slice # => Bytes[36, 180, 8, 208, 95, 8, 67, 211, 144, 39, 192, 212, 236, 9, 126, 58]
convert uuid # => Bytes[36, 180, 8, 208, 95, 8, 67, 211, 0, 0, 0, 0, 16, 0, 0, 0]

I can't imagine this being the desired behavior; but if it is, it should probably be documented somewhere.

bug stdlib

Most helpful comment

@asterite Maybe it should be named #to_unsafe then?

All 16 comments

oof

Isolated behavior is moving a stack allocated StaticArray, causing it to point to invalid memory when accessing its data.

def convert(v : StaticArray(UInt8, 16))
  v.to_slice
end

a = StaticArray(UInt8, 16).new(0)
b = convert(a)
p!(a)
p!(b)
a # => StaticArray[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
b # => Bytes[22, 192, 119, 204, 115, 85, 0, 0, 136, 166, 125, 204, 115, 85, 0, 0]

This can be fixed by changing @bytes to Slice(UInt8)/Bytes instead of a stack allocated StaticArray, or making UUID a class. Don't know if that's appropriate here. Maybe this is something wrong with StaticArray, but I don't know its implementation details.

If this is important for you, you can just copy/paste the UUID implementation as your own (its a single file) and make either change I mentioned above.

Yup, StaticArray#to_slice can't be returned from a method... or well, it can with undefined behavior.

Some questions and thoughs:

  • Why is there such a low-level method on UUID?
  • If needed, I think UUID should be implemented with a backing read-only Slice(UInt8). Then that would be heap-allocated and it would be fine (though it wouldn't be that efficient)
  • Otherwise to_slice can just return a copy of the static array... it's less efficient, but it works and there's really no other way around it

There was some talk about efficiency during implementation. Before UUID was added to the stdlib I was creating them and mutating them but I am not sure that should be a feature in the stdlib implementation of UUID.

Slice instances must point to heap memory not stack memory or they're unsafe. So I'd say StaticArray#to_slice must copy.

It'd be useful to audit users of that method and see what they need to convert to a slice for. If it's just slice's extra API then StaticArray has an API problem.

The to_slice method is used all over the place in the internals of the stdlib: allocate a stack buffer, read to it (via that method), etc. It's just that it's incorrect to return that slice from a method.

@asterite Maybe it should be named #to_unsafe then?

Maybe.

#to_unsafe typically returns a pointer. #to_slice should copy, #to_unsafe_slice could create a slice of the stack.

Actually, let's fix UUID. Let's document to_slice and move on. This problem is not common at all. It's not common to implement a data structure banked by a static array and wanting to return a slice or pointer to it (I still don't know what's the use case for Uuid is)

Unsafe methods should be obviously unsafe. Segfaults are surprising behaviour. Compromising memory safety is to me not an option.

StaticArray is kind of low level and unsafe.

I agree with @RX14 on this, however I'm not sure about doing a copy with to_slice.
Usually when you call to_slice you sort of know it's an inexpensive call, and doing an allocation there would not be "inexpensive"..
I would prefer to force the user to be explicit about the copy with the need to call .to_unsafe_slice.dup.

@asterite the UUID type is meant to create/read a UUID of varying flavors (random, md5, sha1, ...), to query it (there are many variants and versions), all that encoded as a 16-bytes / 128-bit struct, instead of having an external 32-bytes String allocation. UUIDs can also be represented in different formats (bytes, string with or without dashes, urn, ...). Having a type that can generate, parse, validate, format and compare all these forms transparently is a great addition to the stdlib.

@ysbaddaden I understand this. I think avoiding memory allocation is good.

My thoughts: Let's remove to_slice. Let's instead introduce a method to return a StaticArray if needed. Then the user can call that method (gets a copy), call to_slice on it and then it can be passed to an IO. Then there's no problem.

We should document that to_slice on StaticArray returns a Slice that points to the underlying StaticArray and that one shouldn't return that from methods, and show an example of how to use that when reading/writing IO.

The thing is, let's not change the API every time someones stumbles upon a problem.

Specially if the problem is on our side in the std and it's easy to fix.

Was this page helpful?
0 / 5 - 0 ratings