Julia: Introduce iscontiguous function or trait

Created on 19 Apr 2015  Â·  29Comments  Â·  Source: JuliaLang/julia

Many C libraries have functions like the following:

void fastadd(size_t len, const double *a, const double *b, double *c)

These functions assume that the arrays have contiguous memory layout.

Now for safety, packages like Yeppp and VML etc restrict the arguments to Array. It would be very useful to generalize the fast computation functions there to arbitrary array types with contiguous array layouts (e.g. some SubArray or array views).

A iscontiguous function can be useful in such cases

function fastadd(a::StridedArray{Float64}, b::StridedArray{Float64}, c::StridedArray{Float64})
    (iscontiguous(a) && iscontiguous(b) && iscontiguous(c)) || 
        error("a, b, c must be contiguous")
    # invoke the fastadd c-function
end

cc: @timholy @ViralBShah @simonster

arrays

Most helpful comment

Eh, it's a fairly harmless typealias. We can always deprecate it alongside the new feature on a minor release (even if it won't be completely removed until 2.0). The important part isn't that it goes away, the important part is that folks start using the alternative. And a deprecation alone is sufficient for that.

All 29 comments

I can make a PR if people think it is worth it.

I feel that this should already verified by any implementation of convert(Ptr, AbstractArray), so that the extra effort here is not an improvement over duck-typing

There are certain cases where we want to get a ptr, but doesn't require the underlying memory to be contiguous (e.g. gemv and gemm only requires the matrix to be strided but not necessarily contiguous).

The iscontiguous function is used where the ccall requires the underlying memory to be strictly contiguous.

The BLAS functions now use pointer to obtain the pointer to the base address, which does not require the array to be contiguous. For example,

a = rand(3, 4)
b = sub(a, 1:2, :)
p = pointer(b)   # this is fine, and rightly so

However, supplying b to a function that relies on contiguous memory block is dangerous.

Hence, being able to get the pointer and testing whether an array is contiguous are two different things, and need to be addressed in different functions.

DenseArrays are necessarily contiguous, right? So the problem is effectively that there are some parameters that make for contiguous SubArrays and some that don't. I see two other possible ways to handle this:

  • Split contiguous SubArrays and non-contiguous SubArrays into two separate types, so that we can make contiguous SubArrays <: DenseArray and then use DenseArray as intended. I think I would prefer this, although I'm not sure I'm imagining all the complications.
  • Add a ContiguousArray (or something) union type. StridedArray is also a union type, so I'm not sure this makes things less flexible than they are now. This requires being able to define the set of SubArrays that are contiguous within the type system (although this is trivial if we were to add another parameter during SubArray construction). This is kind of ugly, because ContiguousArray doesn't really mean anything different from DenseArray.

You might actually want to dispatch on contiguity, e.g. if you are picking between an optimized algorithm from an external library that only supports contiguous arrays and pure Julia code that is slower but can handle anything, so I think I'd prefer one of the above two approaches to iscontiguous.

DenseArrays only need to be strided, not contiguous.

Adding ContiguousArray is one approach, but it needs to be an abstract type not a union. Otherwise, the contiguous view in ArrayViews can not be incorporated here.

If we take this approach, we may need some revision of the type hierarchy of arrays. While the iscontsguous function seems to be less elegant, it is much easier to implement.

The current type hierarchy is as follows:

-- AbstractArray
   |-- DenseArray
        |-- Array
        |-- ArrayViews.ContiguousView
        |-- ArrayViews.StridedView
   |-- SubArray

Then, we have a union type StridedArray defined as a union of certain subarray types and dense array.

Union(SubArray{T,N,A<:DenseArray{T,N},I<:Tuple{Vararg{Union(Int64,Colon,Range{Int64})}},LD},DenseArray{T,N})

Related to the discussion #10385.

Another way is to use _array traits_, as suggested by @timholy

type Contiguous end
type NotContiguous end

contiguousness(a::AbstractArray) = NotContiguous()
contiguousness(a::Array) = Contiguous()
contiguousness(a::SubArray) = # more sophisticated stuff

function fastsum(a::StridedArray)
    _fastsum(a, contiguousness(a))
end

_fastsum(a::StridedArray, ::Contiguous) = # ... call the underlying C functions

iscontiguous already exists: https://github.com/JuliaLang/julia/blob/8f69307cef149405fe2908bccdb2d721ba28cfe1/base/subarray.jl#L396-L412

Right now it's intended for internal use in stagedfunctions, but we could change it to return Val{true} or Val{false} so it could be used in type inference.

Great. I think you may also add

iscontiguous{T,N}(::Type{Array{T,N}}) = true
iscontiguous(::Array) = true

I think this is useful for interfacing with C libraries, as mentioned above.

See also the (slightly more general, perhaps, because it doesn't assume the dimesions are sorted in a particular order?) iscontiguous function that I wrote for Blosc.jl.

There seem to be two conflicting requirements: something that works on types so you can use it during type inference, and something that you can evaluate at runtime and call out to C libraries. This:

A = rand(3,5)
B1 = sub(A, :, 2:3)    # inference: yes, runtime: yes
B2 = sub(A, 1:2, 2:3)  # inference: no, runtime: no
B3 = sub(A, 1:3, 2:3)  # inference: no, runtime: yes

illustrates the conflict.

Instead of Val{true} why not a new type as with linear indexing. Then we can make storage itself a trait with a useful hierarchy: Strided, UnitStrided ( for BLAS matrices), Contiguous, ...

That way storage does no longer need no longer be reflected in the AbstractArray hierarchy itself, where it often leads to conflicts.

I think using specific types to indicate different memory layout is a good idea.

Below is an example that demonstrates how this may be used:

# memory layout traits

abstract MemLayout
type Contiguous <: MemLayout end
type UnitStrided <: MemLayout end
type Strided <: MemLayout end

memlayout(a::StridedArray) = Strided
memlayout(a::Array) = Contiguous
memlayout(a::SubArray) = ...   # probably need a staged function to figure out

# suppose we have a highly optimized C functions to sum elements
# and we want to port it to Julia, naming it to `fastsum`

fastsum(a::StridedArray{Float64}) = fastsum(a, memlayout(a))

fastsum(a::StridedArray{Float64}, ::Type{Contiguous}) = # call out the C function

# it is possible to write it for general arrays, 
# using matrix here to make the illustration simple
function fastsum(a::StridedMatrix{Float64}, ::Type{UnitStrided})
    s = 0.0
    for i = 1:size(a,2)
        s += fastsum(slice(a, :, i), Contiguous)
    end
    return s
end

# fallbacks to the builtin implementation for other layouts
fastsum{L<:MemLayout}(a::StridedArray, ::Type{L}) = sum(a)  

Sure, we could definitely introduce those types. An alternative approach would be Val{:LinearFast} etc (or Val{:Contiguous} in this case). I'm not actually sure which is the better approach, but I had the impression that proliferation of types is more of a problem than proliferation of parameters. Others surely know more.

(For the record, the linear indexing traits were introduced before Val existed, if my memory serves.)

The point of using types would be that you could use them like this:

abstract MemLayout
abstract Strided <: MemLayout
abstract UnitStrided <: Strided
abstract Contiguous <: UnitStrided

(the last one can probably be a type or immutable). Then some methods can be defined for all strided storage (e.g. permutedims), others only for unit strided storage (BLAS methods), and yet others only for contiguous storage.

Works for me.

Still relevant/necessary?

I think it is. It would be good to have a storage/memlayout trait, rather than trying to reflect this property in the type hierarchy, which makes it less flexible and more difficult to extend (as in the SubArray example).

Yes — the StridedArray typealias is still a major wart here. I'd like to see that replaced with a more trait-like interface.

For anyone who wants to jump on this, it mostly consists of replacing

qr(A::StridedArray) = ...

with

qr(A::AbstractArray) = _qr(MemLayout(A), A)
_qr(::Strided, A) = ...

once the MemLayout trait is defined (for which one can mimic the IndexStyle trait).

The first question to answer here is if adding this is a breaking change in any way. If not, we should move this off of the 1.0 milestone. If not, we should figure out the minimal change we'd need to allow this in the future without breaking anything.

Still unclear if there's anything breaking here and no one has chimed in since I asked.

I'm not convinced that traits are needed here. If it is a DenseArray or a subarray thereof, then it should have strides, and from strides we can implement functions to check whether it is contiguous (and if so in what dimension order). That approach would not be breaking.

(And if traits are needed, they can be added on top of that. A trait would not be breaking either.)

If it is a DenseArray

Well, that's the problem. SubArray only defines strides for some of its parameterizations and is not a DenseArray in general. Similar issues exist for Sparse arrays and views thereof.

That said, I don't think this is breaking, but it can be hard to think through all the ramifications of changing informal interfaces that get extended by user code. This is really largely an addition of a new feature — the ability to ask an array about its storage type — and a _large_ refactoring of functions in Base to take advantage of that fact.

We can follow the pattern that we use for IndexStyle optimizations (as Tim proposes above for qr via the internal _qr). For some functions, we'd be removing the provided svdfact(::StridedMatrix) in favor of a more general svdfact(::AbstractMatrix), which I suppose could affect the precedence of some pathological and type-pirating user-code (e.g., defining svdfact(::Any)).

Type-Piracy absolutely can and should be excluded from stability guarantees. Am I missing any other places where this would be breaking?

@stevengj, you want to have AbstractArray types that endow arrays with all kinds of new properties: you want to reshape them (ReshapedArray and RowVector), modify their indices (OffsetArray), reinterpret/map their values (ReinterpretedArray or MappedArray), give their axes names (AxisArray and similar), make their indices periodic (FFTViews), and so on. These wrappers can often work just fine for either dense/contiguous or non-dense/non-contiguous parent arrays. Consequently, none of these "behavior-endowing" AbstractArrays can be a subtype of DenseArray. So for most purposes, if you need to know whether data are packed contiguously, it's almost completely useless to rely on inheritance. This is a corner of the design space that simply requires traits.

@StefanKarpinski, the breaking part is that in an ideal world we'd simultaneously ditch StridedArray from Julia 1.0.

Eh, it's a fairly harmless typealias. We can always deprecate it alongside the new feature on a minor release (even if it won't be completely removed until 2.0). The important part isn't that it goes away, the important part is that folks start using the alternative. And a deprecation alone is sufficient for that.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ararslan picture ararslan  Â·  3Comments

Keno picture Keno  Â·  3Comments

omus picture omus  Â·  3Comments

musm picture musm  Â·  3Comments

StefanKarpinski picture StefanKarpinski  Â·  3Comments