Since Bytes is just an alias for Slice(UInt8), it lacks a lot of methods you'd usually want for byte strings / byte arrays. One of these is concatenation. The following doesn't work:
```crystal
bytes1 = Bytes[0x43, 0x72, 0x79, 0x73]
bytes2 = Bytes[0x74, 0x61, 0x6c, 0x21]
combined = bytes1 + bytes2
```
The same of course goes for repetition with *.
Not being able to concatenate bytes makes handling binary data a nightmare (and even makes some things impossible without using .to_unsafe), so I'd say it's a good feature to implement.
This is by design. Slice represents a pointer and a size. It's very often just a window into memory that you don't own, so it's impossible to resize or append to slices. You could implement + and * because they return new slices, i.e. they copy the data onto the heap, but that really goes against the idea of slices in many ways.
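For illustration, such a copying concatenation could be written as a standalone helper. This is a sketch, not stdlib API; `concat` is a hypothetical name:

```crystal
# Hypothetical helper, not part of the stdlib: concatenating two slices
# has to allocate a fresh heap buffer and copy both operands into it.
def concat(a : Bytes, b : Bytes) : Bytes
  result = Bytes.new(a.size + b.size)
  a.copy_to(result)          # copy a into the start of result
  b.copy_to(result + a.size) # Slice#+(offset) gives a shifted sub-slice
  result
end
```

This is exactly the heap copy described above, which is why it sits uneasily with Slice's "window into memory" design.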
Can you explain your use case more? Where do you find yourself using to_unsafe? I haven't ever found myself reaching for these methods. I'm almost sure that what you actually want, instead of implementing + and * for Slice, is to use IO.
@obskyr See my extended answer on SO for an example of how to use an IO for this.
Sure! I posted a question on SO about it, but my example seems to have been simple enough as to cause confusion.
Basically, I've written a decoder for a binary data format. The data is split up into chunks of different types, and there's no way of knowing in advance how long the data will be. This means I have to read the data progressively, appending to the data I already have as I go along. What I'd like to do is something along the lines of this:
```crystal
def decode(file)
  # No idea how many bytes we're gonna end up with,
  # so I can't set this to a known size.
  data = Bytes.new(0) # ...Or just "Bytes.new" or something.
  chunk_type = file.read_byte.not_nil!
  until chunk_type == 0
    case chunk_type
    when 1
      data += read_chunk_type_1(file)
    when 2
      data += read_chunk_type_2(file)
    end
    chunk_type = file.read_byte.not_nil!
  end
end
```
```crystal
def read_chunk_type_1(file)
  length = file.read_byte.not_nil!
  data = Bytes.new(length)
  file.read(data)
  return data
end

# ...
```
However, since bytes can't be concatenated, what I've instead been doing is using Array(UInt8) and eventually calling Bytes.new(data.to_unsafe, data.size * sizeof(UInt8)) on that.
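That workaround might look like the following sketch (assuming chunks arrive as Bytes; the values here are just placeholders):

```crystal
# Sketch of the Array(UInt8) workaround described above.
data = [] of UInt8
chunk = Bytes[0x43, 0x72] # a chunk as it might come from read_chunk_type_1
chunk.each { |byte| data << byte }
# The resulting slice points into the array's internal buffer,
# so the array must not be resized after this.
bytes = Bytes.new(data.to_unsafe, data.size)
```

It works, but the to_unsafe step is exactly the kind of thing the question is about avoiding.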
Is using IO the correct way to go about this, perhaps?
Yeah, in that case I'd use IO::Memory and IO.copy to build up the data.
```crystal
def decode(io)
  data = IO::Memory.new
  loop do
    chunk_type = io.read_byte
    case chunk_type
    when 1
      read_chunk_type_1(from: io, to: data)
    when 2
      # ...
    when nil
      raise "Unexpected EOF in chunk type header"
    when 0
      break
    end
  end
end
```
```crystal
def read_chunk_type_1(from, to)
  length = from.read_byte
  raise "Unexpected EOF reading chunk type 1 length header" unless length
  copied_bytes = IO.copy(from, to, length)
  raise "Unexpected EOF in chunk type 1" unless copied_bytes == length
end
```
or similar
You can use data.to_slice to get a slice out of an IO::Memory. Also, even in your above example, you probably want to use file.read_fully.
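A minimal sketch of that pattern (illustrative values):

```crystal
# Build up data incrementally in an IO::Memory, then take a slice of it.
io = IO::Memory.new
io.write Bytes[0x43, 0x72, 0x79, 0x73]
io.write Bytes[0x74, 0x61, 0x6c, 0x21]
bytes = io.to_slice # all the accumulated data as a single Bytes
```

IO::Memory handles the growing buffer for you, which is what manual concatenation would otherwise be doing by hand.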
You're right, I should indeed be using read_fully. Based on the name I assumed it'd read the entire file, but I suppose that's not the case.
The only thing I'm missing now is being able to do things like:
```crystal
bytes_to_repeat = Bytes.new(length)
file.read(bytes_to_repeat)
return bytes_to_repeat * times_to_repeat
```
But I suppose I'll just have to do the slightly more verbose:
```crystal
bytes_to_repeat = Bytes.new(length)
file.read(bytes_to_repeat)
times_to_repeat.times do
  IO.copy(bytes_to_repeat, io)
end
```
@obskyr I've never come across wanting to send the same data multiple times, it seems pretty wasteful and an edge case so having it be a little bit more code is fine.
You probably want io.write instead of IO.copy though, since bytes_to_repeat isn't an IO.
> I've never come across wanting to send the same data multiple times
That's the point - it's a file format that uses RLE here and there, so when decoding (not encoding to send/store) it you have to repeat strings every now and then.
Ah, I see. Yeah, your best bet is to read the part which is RLE encoded into a side-buffer and write it back out however many times is needed. It's a bit more code but it's probably pretty rare. Especially considering that RLE is hardly state of the art in compression these days.
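A sketch of that side-buffer approach (hypothetical names; `output` is wherever the decoded data is being accumulated, e.g. an IO::Memory):

```crystal
# Decode one RLE run: read `length` bytes once, write them out `repeat` times.
def decode_rle_run(input : IO, output : IO, length : Int32, repeat : Int32)
  buffer = Bytes.new(length)
  input.read_fully(buffer) # raises IO::EOFError on a short read
  repeat.times { output.write(buffer) }
end
```

This combines the earlier corrections: read_fully instead of read, and output.write instead of IO.copy, since the side buffer is a Bytes rather than an IO.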
Can I close this issue?
You can't close me, I quit!
Yes, my use case for this is solved, at least. I do think the documentation could be a bit more informative about this - perhaps the docs for either IO#read or Bytes could link to IO::Memory?
I think it'd be possible to mention IO::Memory in the docs for IO. However, covering specific use cases and common workflows sounds like something for more long-form tutorials to me.
I think this might be fairly closely tied to reading binary data in general, which goes beyond a specific use case. Maybe, maybe not - the info I needed wasn't in the places I looked, at least. At least this conversation will show up on Google for "crystal read binary" now!
I think the answer is StackOverflow. Now if someone stumbles upon this problem, even using your exact same words ("concatenate bytes") they will find the answer.
I mean, whenever I have a problem or doubt I ask Google and I get an answer, usually on StackOverflow, not in language docs (well, sometimes language docs, like when I search for a specific type or method). What's good about StackOverflow is that it's like a community wiki and it can grow without having to modify the language source code. So I prefer that, and over the years it has proven to be the way to document these things and relations.
That's true. But the API needs improvements as well, obviously. =)