A vector-based list is a list that store elements in continuous area.
Vector can improve cache performane and have smaller constant factor.
Whether it's faster than list depends on the operation pattern and ratio. Vector is faster at accessing but slower at inserting than List.
With properly reserve, Vector can be much faster than List.
|LANG|LIB|
|----|---|
|C++|std::vector|
|Rust|Vec|
|Vector|List.chpl|
|------|----|
|Better cache performance| |
|Friendly for compiler optimization and prefetch| |
|Expensive copy penalty for appending|No copy penalty for appending|
|Fast random access|Slower random access since calling log2|
After digging into the source of List.chpl, I don't think it's a good idea to put them together because they have a very different logic.
However, an identical interface is possible.
same as List except an addtional procedure proc reserve(size: int) {}
module Vector {
record vector {
param parSafe = false;
type eltType;
var capacity: int;
var size: int;
/*
Request a change in capacity
*/
proc requestCapacity(size: int) {}
proc init=(other: list(this.type.eltType, ?p)) {}
}
}
@Rapiz1 I would suggest having methods like
back : return the last elementpop_front: removes the first element.resize: To resize the vector to given size.We can take reference from other languages like C++.
Is there an inherent reason not to re-use the current List interface for uniformity? (ultimately, I also wonder whether we simply want a single List type that takes a param saying which implementation to use under the covers, but that could be a separate refactoring step).
Is there an inherent reason not to re-use the current
Listinterface for uniformity? (ultimately, I also wonder whether we simply want a singleListtype that takes aparamsaying which implementation to use under the covers, but that could be a separate refactoring step).
@bradcray I think we can use the same interface as List. I created the issue as a start point and it's not exhausted so now it's actually a subset of List. I will later look into the source of List more carefully to find out whether we can put two things together.
init() will need a parSafe argument, and this() will need an idx argument.
@Rapiz1 I would suggest having methods like
* `back` : return the last element * `pop_front`: removes the first element. * `resize`: To resize the vector to given size.We can take reference from other languages like C++.
I agree with @bradcray it would be great if we can have a unified interface for Vector and List. I updated the issue and now the interface is the same with List with an additional procedure reserve.
back: We have last from Listpop_front: pop(0) will do that. I think it's not necessary.resize: I'm not against adding this. However, we have reserve to adjust the capacity so this is not very necessary. In terms of simplicity and unification, I think it could be omitted.Is there an inherent reason not to re-use the current
Listinterface for uniformity? (ultimately, I also wonder whether we simply want a singleListtype that takes aparamsaying which implementation to use under the covers, but that could be a separate refactoring step).
@bradcray I didn't think of a good way to put them together without messing up code. They have a very different logic.
My thoughts:
After digging into the source of List.chpl, I don't think it's a good idea to put them together because they have a very different logic.
This is not surprising and not a big deal for me.
same as List except an addtional procedure proc reserve(size: int) {}
We already have a requestCapacity that only applies to associative domains. Not to say that we are sure that's the name for that method going forward, but as long as it is there, we should use that name.
resize: I'm not against adding this. However, we have reserve to adjust the capacity so this is not very necessary. In terms of simplicity and unification, I think it could be omitted.
resize if added might have a different meaning than requestCapacity. And that is what happens if the size given is the less than the size of the vector.
I don't think we need a second method only for difference. However, it maybe worthwhile to think what happens in such a case. Do we; (1) truncate from the end or beginning, (2) do we give throw an error, (3) something else?
If we're inheriting the name from requestCapacity it wouldn't be too weird to inherit the behavior from that, too. (Again, I am not sure whether that behavior is staying for good, but no need to create inconsistency)
Also as a general thought I am wondering whether it'd make sense to have some functions on list that converts a list to a vector. There maybe scenarios where you create a list of data in a very dynamic fashion, but in a subsequent phase of the logic, that data structure remains static and it is only there for reading. Arguably, this question can apply to any other pairs of data structures. Moreover, such support can come in in a separate PR.
That also has some sub-questions: if vector is going to end up a package or mason module, it shouldn't probably poke around the internals of list, but if it is also in standard modules, it wouldn't be as bad an idea to do that if copying the data from the list to the new vector gets significantly faster.
@e-kayrakli I think it's great to inherit the name requestCapacity and also the behavior when requesting a capacity smaller than needed.
Given that we can create list from array, I think It is reasonable to convert list into vector or vice versa. We can support creating vector from list without knowing the internals of list, just iterating the elements and pushing them into vector should work.
However, if we also want to support creating list from vector, there apparently needs to be some modification to List.chpl. And I agree that this could come in as a separate PR.
I also wonder whether we simply want a single List type that takes a param saying which implementation to use under the covers, but that could be a separate refactoring step
I didn't think of a good way to put them together without messing up code. They have a very different logic.
I didn't mean that the implementation code would need to be unified in any way (and agree that the two implementations are very different), but still think from a user's perspective that it _might_ be attractive to say something like List(impl.linked) vs. List(impl.vector) vs. List(impl.multivector) rather than List vs. Vector vs. MultiVector if they all support the same operations and just result in different implementations. I.e., primarily, I want a List, and then as a secondary concern, I might want to steer its implementation in a specific direction. It seems to me that the code could be kept fairly separate via good use of nested records and where clauses.
But as I said at the outset, I think of this as a secondary (if ever) step and think that the first thing to do is to focus on getting the standalone implementation working.
I think Java has done something similar. For example, they have a list interface. This is implemented by the classes of ArrayList, LinkedList, Vector, and Stack. We can something like that and give the user the flexibility to select.
The vector concept as used by todays HPCs, i.e. the data used in X86-64m Sparc and MIPS SIMDs, and the newer ARM (X64FX?) , RISC-V and NEC SX-Aurora Tsubasa, has been around, albiet in a less struct sense, for 40+ years, Why the C++ STL came along later and used the name vector for its data structure of that name was beyond me.
Your datatype and the C++ STL's vector are far more flexible than the HPC concept of a vector and cannot really be used for that data structure as a HPC vector has lots of restrictions on it which your break. That breakage is not meant to imply any criticism of your implementation. By any data structure that can pop the first element off of itself is not a vector in my book.
It will also mean that Chapel, arguably the best HPC language available today, will not be able to use the word vector to map to a hardware level vector. You loose a massive marketing edge and Chapel needs all of such an edge that it can get at this stage of its life if it is going to attack the HPC programming spare dominated by Fortran and to some extent by C++.
It will also mean that Chapel, arguably the best HPC language available today, will not be able to use the word vector to map to a hardware level vector. You loose a massive marketing edge and Chapel needs all of such an edge that it can get at this stage of its life if it is going to attack the HPC programming spare dominated by Fortran and to some extent by C++.
I agree that the term "vector" is overloaded. However, I doubt that the word can be used in the same context between the two uses. In other words, I wouldn't expect the user to define a "vector" object that is meant to fit in a vector register and can be used as such.
An example of the word "vector" used in Chapel today is the vectorizeOnly iterator. I am not sure whether that interface is stable. It may be confusing, but I think that benefits of using a well-established name for the data structure outweighs that confusion.
Damian's point is another reason that if Vector is just another implementation of List, I'd prefer to have a single List type with multiple implementations under the hood that a user can choose between rather than a group of types that are just different names and implementations for List.
(As a bit of additional context for Damian's comment, I believe that he has been stewing on a feature request that we support an intrinsic vector datatype in the "vectorization" sense of the term. My rough understanding of the idea is that it would support the kinds of operations that are available via vector instructions on modern processors as a more readable, portable, and dependable way to write vectorizing code than to (a) drop down to assembly or (b) cross fingers hoping that the compiler would automatically vectorize things. So my understanding is that he'd prefer to reserve the name vector for that than for a list-like thing that happened to be implemented using dense/consecutive storage and I think that's a reasonable point of view).
I don't know that the name is already taken. Changing the name is easy so I would just follow Vector in my development and change it after the community made the decision.
As for the List(impl.vector) style initialization, I think the idea is good to have simple and clear way to change the implementation. However, I wonder if that will significantly add to the trouble that we come into to keep the two or more implementation synchronized? Most of the interfaces are identical but they could also have their own different procedures(e.g. requestCapacity for Vector). And even for the same interface, there could be some slight differences. For now, I'm not sure how the way of achieving multiple implementations under one class would look like and whether this could be a problem.
Another question is that if we do that,can we avoid a breaking change to the existing List interface?
There can be small variations in the interface where one implementation can support an operation whereas another doesn't. We also have similar things in the array interface. The "wrapper" list record would have the union of operations as its interface. The actual list implementations can just do a compilerError for those operations that they don't support.
At least that's what we do for the array interface.
I just want to re-re-emphasize that I'd focus on getting a working and complete version of this type done and committed first before worrying at all about trying to unify with List. And I agree with @Rapiz1's comment that we can call it whatever we want in the code in the short-term as long as we finalize that decision before it's released. So I wouldn't let either of those concerns hold up progress on the implementation. But I think they're fair things to discuss under this interface design issue.
With all the guidance, I made a wrapper and an example of List(int, impl.vector):
https://github.com/Rapiz1/gsoc2020-chapel-data-structures/blob/wrapper/Vector/example/listng.chpl
https://github.com/Rapiz1/gsoc2020-chapel-data-structures/blob/wrapper/Vector/src/ListNG.chpl
Most helpful comment
Damian's point is another reason that if
Vectoris just another implementation ofList, I'd prefer to have a singleListtype with multiple implementations under the hood that a user can choose between rather than a group of types that are just different names and implementations forList.