Crystal: Sparse arrays

Created on 16 Jun 2018  路  12Comments  路  Source: crystal-lang/crystal

I'd like to suggest adding sparse arrays into the core language. The documentation explicitely mentions that a[1_000_000]=123 will not work and is likely an error, however I believe I have a legitimate usecase for such a feature. As resize is now a private method, I'm left with diff.times { a.push Nil } which is horrible on so many levels.

Off the top of my hat I see a couple implementations. First is a kind of a Hash(Int, T) and the second is a kind of a Hash(Int, Array(T)), effectively storing continuous slices indexed by the first element.

Apologies if this was already discussed, a quick search didn't reveal results for me.

question

All 12 comments

This should be a shard, I don't think this is common enough to be put in the stdlib. What's the usecase?

I was hesitant to expand on my particular case right away because I believe this could be helpful in general, but it's no secret, I can explain it.

Basically I have to deal with a bunch of arbitrary JSON structures. So I have to be able to do something like @json.set value, "key_level1", "key_level2", index_level3, "key_level4" and I can't just raise IndexError if level3 array is not large enough. As I've mentioned previously I have to resort to manually extending the array now which is slow and can consume a huge amount of memory just to keep all those nils around.

I'm not sure I entirely understand your explanation, but it sounds like Hash is the right tool for the job here. Do you need the ordering of keys that Array provides or is it purely a K => V mapping?

Actually that's a good question. What I need is a JSON.parse result that I can modify later. As the parser parses arrays as Array what do you propose I should do? Go and manually change all arrays to hashes? Subclass JSON and overload its parser? And that's a reason for me to think sparse arrays should be in the core, not a shard -- so that other stdlib classes would make use of them as needed.

Would you serialize the modified array back to JSON with maybe hundreds of nil items?

An array is usually a list with continuous indices, if you want random indices, that's a job for a hash. You can easily use a custom JSON converter to create a Hash when reading the JSON array. If you're using JSON.mapping, take a look at the converter option.

sparse array == hash

And it doesn't need to be in the standard library. Or if you want that behaviour, you can implement your own sparse array. I think it's easier and faster to write a shard and solve your problem rather than discuss this addition to the standard library.

And yeah, pushing 1_000_000 nils will be slooow... and consume a lot of memory.

So, let's break it down.

  1. Serialization could either involve all the nils, or a hash, or even something else entirely, all depending on circumstances.
  2. "An array is usually a list with continuous indices" -- this makes sense for direct pointer access, where a[i] is just a way to note a memory address. Crystal's Array and Hash classes aren't really a way to deal with memory directly afaiu, so the underlying structure is less relevant than the public interface.
  3. Custom converters is something I didn't really explore yet, so please accept my apologies for lack of knowledge. I will consider this when I need to play with classes JSON.parse creates, thanks for pointing that out.

I want to get your attention away from JSON though. Sparse arrays are useful when you are willing to sacrifice a bit of cycles for a structure that doesn't allocate memory for nils, or even for "zero values", rather implying their presence as appropriate.

Again, yes, I have hands, I can do it for myself, but then other stdlib code will not be able to use it.

A vanilla Hash is not a substitute, because (a) is not a sub/superclass of Array, so isn't type-compatible, (b) isn't conducive to distinguish between a nil/zero value and a value that was never set at all (aside from allocating those nils/zeroes), and of course (c) isn't ordered.

but then other stdlib code will not be able to use it

but the stdlib doesn't never (and will probably never need) a sparse array like the one you are describing. For the very few cases it might need one, a Hash is enough.

That's why I'm suggesting a shard might be better.

But then, I read on IRC that you are builing a generic JSON processor/manager. And it seems you want to be able to modify it by assining any index in the range. Maybe you should allow that as long as the index fits the bounds. And then have a push operation, just like Array.

I mean, Ruby is maybe the only language that allows this "assign any index, we'll fill it will nil" craziness. We are not going to copy that. And we are not going to add a sparse array into the standard library (I never needed one, and their usage is probably very specific).

Okay then, my job is to suggest, yours it to reject, as we say in my country =)

Crystal's Array and Hash classes aren't really a way to deal with memory directly afaiu, so the underlying structure is less relevant than the public interface.

This is incorrect. Array#[i : Int32] is essentially just mapping the internal pointer plus offset i (with a few safety checks). Array#to_unsafe allows to use the underlying pointer directly. That's unsafe, but perfectly valid, as long as it is handled with care.

PS @asterite mind your grammar obsession 馃榾

doesn't never

I realize that. My point is a bit different, however I think this issue isn't the right place to discuss it. Just to tie it back in, a sparse array may override #to_unsafe (raising an exception, or even refusing to compile), keeping the rest of API compatible.

I should say I'm not even arguing in favour of my original proposal at this point, I'll go ahead and close the issue probably.

@straight-shoota touch茅 馃槄

Was this page helpful?
0 / 5 - 0 ratings