Many C libraries have functions like the following:
void fastadd(size_t len, const double *a, const double *b, double *c)
These functions assume that the arrays have contiguous memory layout.
Now for safety, packages like Yeppp
and VML
etc restrict the arguments to Array
. It would be very useful to generalize the fast computation functions there to arbitrary array types with contiguous array layouts (e.g. some SubArray
or array views).
A iscontiguous
function can be useful in such cases
function fastadd(a::StridedArray{Float64}, b::StridedArray{Float64}, c::StridedArray{Float64})
(iscontiguous(a) && iscontiguous(b) && iscontiguous(c)) ||
error("a, b, c must be contiguous")
# invoke the fastadd c-function
end
cc: @timholy @ViralBShah @simonster
I can make a PR if people think it is worth it.
I feel that this should already verified by any implementation of convert(Ptr, AbstractArray)
, so that the extra effort here is not an improvement over duck-typing
There are certain cases where we want to get a ptr, but doesn't require the underlying memory to be contiguous (e.g. gemv
and gemm
only requires the matrix to be strided but not necessarily contiguous).
The iscontiguous
function is used where the ccall
requires the underlying memory to be strictly contiguous.
The BLAS functions now use pointer
to obtain the pointer to the base address, which does not require the array to be contiguous. For example,
a = rand(3, 4)
b = sub(a, 1:2, :)
p = pointer(b) # this is fine, and rightly so
However, supplying b
to a function that relies on contiguous memory block is dangerous.
Hence, being able to get the pointer and testing whether an array is contiguous are two different things, and need to be addressed in different functions.
DenseArrays are necessarily contiguous, right? So the problem is effectively that there are some parameters that make for contiguous SubArrays and some that don't. I see two other possible ways to handle this:
You might actually want to dispatch on contiguity, e.g. if you are picking between an optimized algorithm from an external library that only supports contiguous arrays and pure Julia code that is slower but can handle anything, so I think I'd prefer one of the above two approaches to iscontiguous
.
DenseArray
s only need to be strided, not contiguous.
Adding ContiguousArray
is one approach, but it needs to be an abstract type not a union. Otherwise, the contiguous view
in ArrayViews
can not be incorporated here.
If we take this approach, we may need some revision of the type hierarchy of arrays. While the iscontsguous
function seems to be less elegant, it is much easier to implement.
The current type hierarchy is as follows:
-- AbstractArray
|-- DenseArray
|-- Array
|-- ArrayViews.ContiguousView
|-- ArrayViews.StridedView
|-- SubArray
Then, we have a union type StridedArray
defined as a union of certain subarray types and dense array.
Union(SubArray{T,N,A<:DenseArray{T,N},I<:Tuple{Vararg{Union(Int64,Colon,Range{Int64})}},LD},DenseArray{T,N})
Related to the discussion #10385.
Another way is to use _array traits_, as suggested by @timholy
type Contiguous end
type NotContiguous end
contiguousness(a::AbstractArray) = NotContiguous()
contiguousness(a::Array) = Contiguous()
contiguousness(a::SubArray) = # more sophisticated stuff
function fastsum(a::StridedArray)
_fastsum(a, contiguousness(a))
end
_fastsum(a::StridedArray, ::Contiguous) = # ... call the underlying C functions
iscontiguous
already exists: https://github.com/JuliaLang/julia/blob/8f69307cef149405fe2908bccdb2d721ba28cfe1/base/subarray.jl#L396-L412
Right now it's intended for internal use in stagedfunctions, but we could change it to return Val{true}
or Val{false}
so it could be used in type inference.
Great. I think you may also add
iscontiguous{T,N}(::Type{Array{T,N}}) = true
iscontiguous(::Array) = true
I think this is useful for interfacing with C libraries, as mentioned above.
See also the (slightly more general, perhaps, because it doesn't assume the dimesions are sorted in a particular order?) iscontiguous
function that I wrote for Blosc.jl.
There seem to be two conflicting requirements: something that works on types so you can use it during type inference, and something that you can evaluate at runtime and call out to C libraries. This:
A = rand(3,5)
B1 = sub(A, :, 2:3) # inference: yes, runtime: yes
B2 = sub(A, 1:2, 2:3) # inference: no, runtime: no
B3 = sub(A, 1:3, 2:3) # inference: no, runtime: yes
illustrates the conflict.
Instead of Val{true}
why not a new type as with linear indexing. Then we can make storage itself a trait with a useful hierarchy: Strided
, UnitStrided
( for BLAS matrices), Contiguous
, ...
That way storage does no longer need no longer be reflected in the AbstractArray
hierarchy itself, where it often leads to conflicts.
I think using specific types to indicate different memory layout is a good idea.
Below is an example that demonstrates how this may be used:
# memory layout traits
abstract MemLayout
type Contiguous <: MemLayout end
type UnitStrided <: MemLayout end
type Strided <: MemLayout end
memlayout(a::StridedArray) = Strided
memlayout(a::Array) = Contiguous
memlayout(a::SubArray) = ... # probably need a staged function to figure out
# suppose we have a highly optimized C functions to sum elements
# and we want to port it to Julia, naming it to `fastsum`
fastsum(a::StridedArray{Float64}) = fastsum(a, memlayout(a))
fastsum(a::StridedArray{Float64}, ::Type{Contiguous}) = # call out the C function
# it is possible to write it for general arrays,
# using matrix here to make the illustration simple
function fastsum(a::StridedMatrix{Float64}, ::Type{UnitStrided})
s = 0.0
for i = 1:size(a,2)
s += fastsum(slice(a, :, i), Contiguous)
end
return s
end
# fallbacks to the builtin implementation for other layouts
fastsum{L<:MemLayout}(a::StridedArray, ::Type{L}) = sum(a)
Sure, we could definitely introduce those types. An alternative approach would be Val{:LinearFast}
etc (or Val{:Contiguous}
in this case). I'm not actually sure which is the better approach, but I had the impression that proliferation of types is more of a problem than proliferation of parameters. Others surely know more.
(For the record, the linear indexing traits were introduced before Val
existed, if my memory serves.)
The point of using types would be that you could use them like this:
abstract MemLayout
abstract Strided <: MemLayout
abstract UnitStrided <: Strided
abstract Contiguous <: UnitStrided
(the last one can probably be a type
or immutable
). Then some methods can be defined for all strided storage (e.g. permutedims
), others only for unit strided storage (BLAS methods), and yet others only for contiguous storage.
Works for me.
Still relevant/necessary?
I think it is. It would be good to have a storage/memlayout trait, rather than trying to reflect this property in the type hierarchy, which makes it less flexible and more difficult to extend (as in the SubArray
example).
Yes — the StridedArray
typealias is still a major wart here. I'd like to see that replaced with a more trait-like interface.
For anyone who wants to jump on this, it mostly consists of replacing
qr(A::StridedArray) = ...
with
qr(A::AbstractArray) = _qr(MemLayout(A), A)
_qr(::Strided, A) = ...
once the MemLayout trait is defined (for which one can mimic the IndexStyle
trait).
The first question to answer here is if adding this is a breaking change in any way. If not, we should move this off of the 1.0 milestone. If not, we should figure out the minimal change we'd need to allow this in the future without breaking anything.
Still unclear if there's anything breaking here and no one has chimed in since I asked.
I'm not convinced that traits are needed here. If it is a DenseArray
or a subarray thereof, then it should have strides, and from strides
we can implement functions to check whether it is contiguous (and if so in what dimension order). That approach would not be breaking.
(And if traits are needed, they can be added on top of that. A trait would not be breaking either.)
If it is a
DenseArray
Well, that's the problem. SubArray
only defines strides
for some of its parameterizations and is not a DenseArray
in general. Similar issues exist for Sparse arrays and views thereof.
That said, I don't think this is breaking, but it can be hard to think through all the ramifications of changing informal interfaces that get extended by user code. This is really largely an addition of a new feature — the ability to ask an array about its storage type — and a _large_ refactoring of functions in Base to take advantage of that fact.
We can follow the pattern that we use for IndexStyle
optimizations (as Tim proposes above for qr
via the internal _qr
). For some functions, we'd be removing the provided svdfact(::StridedMatrix)
in favor of a more general svdfact(::AbstractMatrix)
, which I suppose could affect the precedence of some pathological and type-pirating user-code (e.g., defining svdfact(::Any)
).
Type-Piracy absolutely can and should be excluded from stability guarantees. Am I missing any other places where this would be breaking?
@stevengj, you want to have AbstractArray types that endow arrays with all kinds of new properties: you want to reshape them (ReshapedArray
and RowVector
), modify their indices (OffsetArray
), reinterpret/map their values (ReinterpretedArray
or MappedArray
), give their axes names (AxisArray
and similar), make their indices periodic (FFTViews
), and so on. These wrappers can often work just fine for either dense/contiguous or non-dense/non-contiguous parent arrays. Consequently, none of these "behavior-endowing" AbstractArrays can be a subtype of DenseArray
. So for most purposes, if you need to know whether data are packed contiguously, it's almost completely useless to rely on inheritance. This is a corner of the design space that simply requires traits.
@StefanKarpinski, the breaking part is that in an ideal world we'd simultaneously ditch StridedArray
from Julia 1.0.
Eh, it's a fairly harmless typealias. We can always deprecate it alongside the new feature on a minor release (even if it won't be completely removed until 2.0). The important part isn't that it goes away, the important part is that folks start using the alternative. And a deprecation alone is sufficient for that.
Most helpful comment
Eh, it's a fairly harmless typealias. We can always deprecate it alongside the new feature on a minor release (even if it won't be completely removed until 2.0). The important part isn't that it goes away, the important part is that folks start using the alternative. And a deprecation alone is sufficient for that.