While working on a terminfo parsing library, where I manipulate Slice of bytes (Bytes
type), I needed to convert digits to an actual number.
I realized that all the to_<num_type>
conversion methods (like to_i64
, to_u8
, to_f
, ..) are in String
, and I needed to allocate a String
just to convert the bytes to a number.
I think thoses methods should be available in Slice
too. The number conversion methods don't need anything from String
as they directly work with a pointer and a size. They can almost directly be copy/pasted to Slice
!
I have a working branch with this change, I can make a PR when it's accepted!
Note: the only problem I see here is that those methods will now exists for all kind of Slice, not only Slice(UInt8)
.. But maybe it's ok? or maybe we can add a macro check inside the to_<num_type>
methods that raise when T != UInt8
?
Another possibility would be to make Bytes
a new type (not just an alias) that inherits from Slice(UInt8)
so it's easier to add methods that works specifically on bytes.
WDYT?
Crystal strings are always UTF-8 encoded, so when it comes to number conversion they work in ASCII so to say. You cannot assume that any Slice(UInt8)
is a valid ASCII encoded sequence or that the "right" conversion between "a number" and a "byte sequence" is interpreting that byte sequence as ASCII, there are many more ways to encode a number to a byte sequence.
We should rather try to find an allocation free way to turn a Bytes
into a String
assuming the Bytes
is valid ASCII or UTF-8 already. Or we could support ASCII number conversion as part of the IO::ByteFormat
interface I guess. As a last resort I would extract the number conversion methods to static methods that work with a Bytes
, but remain on String
or a new util class whose name makes it clear that it's about string based number conversion.
I agree that String.unsafe_from_slice
is probably a better solution.
Unfortunately, it can't be implemented because strings are layed out differently from every other class. Strings don't have a pointer to their data, they have their data directly after their class in memory, so it's impossible to turn a Bytes into a String without allocating.
I think that String.to_i(slice, ...)
is the best interface thats workable.
I wouldn't mind having to_i
, etc., directly in Slice
.
Having to_i
on Slice
is only logic for Slice(UInt8)
, right @asterite ? What do you think should be done for a Slice(Person)
?
Give an error. We already have methods on slice that only work for byte slices. And byte slices are the most common slice type (in fact, but I would happily change Slice(T) to a non generic Bytes type and remove the Slice type from the std, or maybe keep the two but return Bytes instead of Slice(UInt8) in all cases).
Ok, that's what I thought too. And I was also thinking about a Bytes
type which inherits from Slice(UInt8)
and would have Bytes-specific methods like to_*
methods.
Can't inherit from struct. They would be two different types with a few similar methods.
oh right, then maybe make it as a wrapper like:
struct Bytes
def initialize(@raw : Slice(UInt8))
end
forward_missing_to @raw
def to_i(...)
# ...
end
end
A little copying is better than a little dependency. I won't get tired of saying that.
just put it on string
@asterite by "a little copying" you mean the 550 lines of Slice
?
It's bit less than that, because there's hexdump and others in Slice that should only belong to Bytes.
Maybe adding such methods to Slice but only having them compile for a slide of bytes isn't that bad, though.
I guess this could be a nice use case for extension methods as described in #6438
it's easy enough to throw a {% raise unless T == UInt8 %}
in the aforementioned methods and you'll hardly be able to tell the difference
@jhass what do you mean in this paragraph:
You cannot assume that any Slice(UInt8) is a valid ASCII encoded sequence or that the "right" conversion between "a number" and a "byte sequence" is interpreting that byte sequence as ASCII, there are many more ways to encode a number to a byte sequence.
By there are many more ways to encode a number to a byte sequence
? What other ways? (I'm probably just not seeing it)
Note that in this case to_<num-type>
methods works with raw bytes directly, interpreting each byte as an ASCII char.
@bew A random example would be BCD, the main point is that it would be confusing for to_i
to be to_i_assuming_this_is_really_ascii_here
.
Most helpful comment
Crystal strings are always UTF-8 encoded, so when it comes to number conversion they work in ASCII so to say. You cannot assume that any
Slice(UInt8)
is a valid ASCII encoded sequence or that the "right" conversion between "a number" and a "byte sequence" is interpreting that byte sequence as ASCII, there are many more ways to encode a number to a byte sequence.We should rather try to find an allocation free way to turn a
Bytes
into aString
assuming theBytes
is valid ASCII or UTF-8 already. Or we could support ASCII number conversion as part of theIO::ByteFormat
interface I guess. As a last resort I would extract the number conversion methods to static methods that work with aBytes
, but remain onString
or a new util class whose name makes it clear that it's about string based number conversion.