I originally posted this on the Crystal Google Groups forum, and it was suggested that I raise it as an issue.
https://groups.google.com/forum/#!topic/crystal-lang/Iiqgrgvca8M
In trying to use bit arrays (0.20.5) I've noticed a number of bugs and inconsistencies
in expected behavior compared to regular arrays which use the same methods.
a = BitArray.new(10) --> [false, false, false, false, false, false, false, false, false, false]
a[0] = true --> [true, false, false, false, false, false, false, false, false, false]
b = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b.first(5) --> [0, 1, 2, 3, 4]
a.first(5) --> [true, false, false, false, false]
b.last(5) --> [5, 6, 7, 8, 9]
a.last(5) --> Error in sozcore2test.cr:107: wrong number of arguments for 'BitArray#last' (given 1, expected 0)
Overloads are:
- Indexable(T)#last()
- Indexable(T)#last(&block)
Then there are indexing issues using bit arrays.
b[5..-1] --> [5, 6, 7, 8, 9]
a[5..-1] --> Error in sozcore2test.cr:109: no overload matches 'BitArray#[]' with type Range(Int32, Int32)
Overloads are:
- Indexable(T)#[](index : Int)
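One possible workaround for both errors above, as a sketch rather than a fix: copy the bits into a regular Array with `Enumerable#to_a` and slice that instead. The trade-off is that the copy is an `Array(Bool)` and is no longer bit-packed, so it gives up the memory savings for the duration of the operation.

```crystal
require "bit_array"

a = BitArray.new(10)
a[0] = true
a[9] = true

# Enumerable#to_a copies the bits into a regular Array(Bool),
# which supports both last(n) and range indexing.
bools = a.to_a

p bools.last(5) # => [false, false, false, false, true]
p bools[5..-1]  # => [false, false, false, false, true]
```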
According to the BitArray API docs, https://crystal-lang.org/api/0.20.5/BitArray.html,
these methods are inherited from the same modules that provide them for Arrays.
Are these problems with BitArrays just a consequence of the language being young, and planned
to be fixed, or is this intentionally different behavior compared to other Arrays?
They definitely look like bugs :).
From Wikipedia:
A bit array (also known as bitmap, bitset, bit string, or bit vector) is an array data structure that compactly stores bits. It can be used to implement a simple set data structure.
That is, the use case of a bit array is to count things, to know if a given object is inside a set. For that, you consider each position in the bit array to be a different object of interest. For example the compiler uses this to determine if all types are covered by method overloads.
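As a rough illustration of that set-style usage (the IDs and variable names here are made up for the example), each index stands for one object, and setting its bit records membership:

```crystal
require "bit_array"

# Each index stands for one object of interest (here: IDs 0 to 9).
seen = BitArray.new(10)

# Mark some IDs as members of the set; marking an ID twice is harmless.
[3, 7, 3].each { |id| seen[id] = true }

p seen[3]          # => true  (3 is in the set)
p seen[4]          # => false (4 is not)
p seen.count(true) # => 2     (number of members)
```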
With this in mind, you normally don't want to get a piece of a bit array. You don't use it like an array. In Java there's no such functionality. Neither in C#. It's not that this was overlooked, it's just that the use case of a BitArray is not that of an Array.
That said, we could implement #[] for ranges. The question is whether it should allocate new memory and copy the bits, or return something that still points to the original data.
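To make the copying option concrete, here is a minimal sketch. `copy_slice` is a hypothetical helper, not part of the standard library, built only from `BitArray.new`, `#[]` and `#[]=`; a real implementation would likely copy whole words rather than single bits.

```crystal
require "bit_array"

# Hypothetical helper: returns a new BitArray holding `count` bits
# copied from `bits`, starting at index `start`. The result owns its
# own memory, so later writes to either array do not affect the other.
def copy_slice(bits : BitArray, start : Int32, count : Int32) : BitArray
  result = BitArray.new(count)
  # bits[...] raises IndexError if start + i is out of range
  count.times { |i| result[i] = bits[start + i] }
  result
end

a = BitArray.new(10)
a[7] = true

tail = copy_slice(a, 5, 5)
a[7] = false # changes the original only

p tail[2] # => true (the copy is unaffected)
```

The alternative, a view that shares the original storage, would avoid the allocation, but writes through either object would then be visible in the other.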
@jzakiya It would be interesting to know what your use case is here and why you need this slicing functionality.
I also changed the title: there's no bug or inconsistency here, we are just missing an #[] method.
(with what I said above, maybe even BitArray shouldn't be Enumerable at all)
I think you need to see and appreciate this issue from the perspective of a user.
First, you cannot restrict the use of a resource to just what you think it will or should be used for. If you create it, people will try to use it for things you might never have considered.
A BitArray is presented as an array, which is a collection, and which according to its own documentation inherits a host of methods from other modules, including Enumerable. If so, it should work with them.
It's not logical that you can do: bitary.first(5) but not bitary.last(5).
I write a lot of numerically heavy applications. I use arrays to represent data that is boolean in nature, i.e., 1|0, true|false. I can significantly reduce memory usage by using BitArrays instead of arrays of Ints, in cases where I'm more concerned with memory reduction than with speed.
From a user's perspective, or at least MY USER PERSPECTIVE, it is a bug, and certainly inconsistent, to not be able to manipulate a BitArray like an Array, especially when the documentation states they inherit methods from the same modules. Either create accurate documentation to explain how the resource actually behaves or make the resource behavior match its documentation.
That's interesting, maybe we could rename BitArray to Bitmap to avoid these kinds of false expectations. The name could be misleading if the user doesn't know the context discussed here and in https://en.wikipedia.org/wiki/Bit_array.
So why don't we just:
Indexable, unless we provide sound implementations for all methods in that module.
If we can implement the full array interface, is there any reason why not?
Bitmap can be confusing too (images). I think I'd choose BitSet instead, and/or document how specific it is (i.e. not to be considered an Array).
This appears to have turned into a feature request, as we don't have a ranged #[] in BitArray.
At the top of src/bit_array.cr, we can see:
BitArray includes all the methods in Enumerable
So this is either a documentation issue, where we change Enumerable to Indexable, or an implementation issue, where we make sure that BitArray does indeed support all Enumerable methods.
@miketheman #[] methods aren't defined on Enumerable. Neither are ranged #[] methods defined on Indexable.
FWIW I personally think that BitSet would be a great name...