julia 🚀 - deprecate (then remove) generalized linear indexing

The last index is effectively interpreted as a linear index, if the dimensionality is insufficient. (More properly, for LinearFast AbstractArrays, everything gets converted to an linear index in the end, although there is intermediate bounds-checking.) Extra 1s are also dropped. It's been this way "forever," but I won't close this issue if you're hoping for reconsideration. Suffice it to say that I've sometimes found this behavior useful, but I also understand how it could be a source of bugs.

Relevant code:
LinearFast (converts all to a linear index): https://github.com/JuliaLang/julia/blob/0c20e64ab2697302b419042b9a043e967ed3db53/base/abstractarray.jl#L508
LinearSlow (expands to full dimensionality): https://github.com/JuliaLang/julia/blob/0c20e64ab2697302b419042b9a043e967ed3db53/base/abstractarray.jl#L525-L551

timholy on 23 Jan 2016

Whether the functionality is useful is not quite the same question as whether this should be the default behaviour. There could be a function linearindexing with this behaviour, or one could create a subarray...

eschnett on 23 Jan 2016

FWIW, I tend to feel like this behavior is too odd to be the default.

johnmyleswhite on 23 Jan 2016

👍3

I would say that I'm looking for reconsideration, yes. I think its got a high surprising-ness to it, relative to its (minimal-but-nonzero) utility. I guess my main initial goal was understanding how much utility it does have, and to whether "fixing it" would have nontrivial performance applications.

In terms of options for "fixing it", after I admit relatively little consideration, it doesn't seem like anything but requiring all dimensions would be consistent. e.g. consider

julia> x = reshape(1:3^3, (3,3,3));
julia> x[3,3]
9

In some ways I find that even more surprising than my other example (but would have found it returning x[3,3,:] less so).

IainNZ on 23 Jan 2016

I agree and feel strongly that it should be an error to index an array with fewer indices than the rank of the array. It just makes it too easy to introduce hard to find errors into code when using higher rank arrays (say 4-10 dimensional). It is easy enough to use reshape to make a copy free reference if one wants to use the trailing linear indexing feature (and this leaves the intent clear in the code). And one could easily write some helper functions to make trailing linear indexing easier to use. I would also note that other languages (e.g., IDL and Fortran) make this an error so the present behavior will be surprising to many.

The case of 1-D linear indexing is perhaps OK since indexing as a one dimensional array is visually distinct (compared with, e.g., indexing a 8-D array as 7-D). However, since it is so easy to use vec I am not sure even this is necessary.

This was discussed a bit in https://github.com/JuliaLang/julia/issues/4774 but I don't know how to link to specific comments and that is a long issue.

BobPortmann on 23 Jan 2016

@BobPortmann you can link to specific comments by right clicking on the timestamp (next to the author name in the header of the comment), and selecting "copy link location" or equivalent.

rfourquet on 23 Jan 2016

+1 to reconsidering this, it just seems so much more likely that this would
be used by mistake than on purpose.

Even for linear indexing I think that we should consider if it should have
its own method instead of being overloaded onto regular indexing.

toivoh on 23 Jan 2016

I kind of love the idea of giving linear indexing a special method.

johnmyleswhite on 23 Jan 2016

What about A[i, LinearIndex(j)]?

timholy on 23 Jan 2016

I'd be ok with that.

johnmyleswhite on 23 Jan 2016

+1 to dropping this behavior and having separate syntax of some kind for linear indexing. This behavior was just taken as the default choice since it's what some other environments do. I would describe it as an over-generalization of a premature optimization.

JeffBezanson on 23 Jan 2016

I would describe it as an over-generalization of a premature optimization.

Precisely. Let's ditch this for sure.

StefanKarpinski on 23 Jan 2016

👍1

-1 for dropping the 1D version. Its very natural to loop over all elements in a multidimensional array using

for n=1:length(x)
  x[n] = ...
end

I know this can be achieve in other ways (e.g. enumerate) but the above is quite easy to remember.

tknopp on 24 Jan 2016

👍2

@JeffBezanson, i disagree with the statement about over-generalization of a premature optimization.

Linear indexing is the _natural_ indexing scheme and anything providing multi dimensional access is actually some infrastructure to make it easier for the person writing code or design an algorithm.
Taking linear indexing away completely just feels wrong to me, introducing additional syntax (which means to me, i have to hit more keys on the REPL before getting output) seems to me additional burden; a[i] is just linear, a[i,j] is dimensional; where is the problem?

Now to the interesting problem that started this: Having a a[i,j] access when a is actually defined 3D. I was not aware that this existed, but actually this is something that i'd like to have in a language.

The initial problem for me (and i think the title of the issue is missleading) is rather in the area of a compiler warning or some code checking (like lint).

lobingera on 24 Jan 2016

The way I see it, linear indexing is the natural indexing scheme in the same sense that machine code is the natural programming language: it is the fundamental one.

But Julia is in the business of providing abstractions. In fact, it provides abstractions such as subarrays, where linear indexing is _not_ the natural indexing scheme.

I don't think that linear indexing should be removed completely, especially not for the array types where it is the fundamental scheme. But seeing as subarray performance is going to be increasingly important, it might make sense to make it a bit more cumbersome to use linear indexing, in order to nudge people into using eachindex etc to get good performance across a greater number of array types.

toivoh on 24 Jan 2016

Just thinking out loudly: In addition to eachindex, there could also be an eachelement function that returns a reference to an array element, as in

for xn in eachelement(x)
    xn = ...
end

This reference could be something like a zero-dimensional subarray.

eschnett on 24 Jan 2016

Wouldn't that have to be xn[] = ...?

To just access the value, we already have that from for x in X....

timholy on 24 Jan 2016

Right, should have been xn[].

Yes, this is about avoiding having to handle indices at all if one wants to iterate over a whole array, assigning to elements.

eschnett on 24 Jan 2016

That could be useful.

toivoh on 24 Jan 2016

:+1: to deprecating this. I've called this "partial" linear indexing before — not sure where I picked up that terminology. It's been discussed in https://github.com/JuliaLang/julia/issues/13015 and https://github.com/JuliaLang/julia/issues/5396, and it's a bullet point that I listed as a possibility in #13157.

mbauman on 24 Jan 2016

That kind of iterator would be very useful for sparse matrices where accessing non-structural locations is not allowed (for example, PETSc matrices).

JaredCrean2 on 24 Jan 2016

i'm just wondering, how and why this was brought in, anyway? Someone found it helpful and i guess it was happening

Maybe i'm biased because my matlab coding includes very often linear indexing although the data is organized in 2or3D and i have a mental model to deal with it.

lobingera on 25 Jan 2016

Another -1 for dropping the 1D version for same reason as https://github.com/JuliaLang/julia/issues/14770#issuecomment-174262352.

Just keep the 1D version of linear indexing and remove the "partial" linear indexing imo.

KristofferC on 25 Jan 2016

👍2

I too would love to have the 1d linear indexing version and getting rid of the partial linear indexing. That is quite a sane solution.

ViralBShah on 25 Jan 2016

👍1

Sorry for hijacking the thread, no matter what happens to linear indexing,
it seems that almost everyone is for getting rid of partial linear
indexing. That should also be a much smaller change, since there shouldn't
be much (if any) code that depends on it.

toivoh on 25 Jan 2016

@toivoh, No, i'm not for getting rid of this partial linear indexing.

lobingera on 25 Jan 2016

That's why I said almost everyone :)
Is there anyone else who wants to keep partial linear indexing?

There is of course always a tension between catching bugs and convenience,
personally I just think that this would catch a lot more bugs than the
times the functionality would be needed. I don't even know when I would use
partial linear indexing.

toivoh on 25 Jan 2016

s/linear/partial linear/ in the last sentence.

toivoh on 25 Jan 2016

Well, although I do not want to vote in a particular direction, there are use cases and I even have uses it in some situations. Imagine you have some plotting tool that is capable of displaying slices within a 3D dataset c. So it will display

c[:,:,k]

where k is the slice number. Now I am confronted with 4D data (3D+time) and would like to reuse my existing infrastructure. Currently, when everything is properly duck typed it would be very simple to push a 4D dataset an allow k to be beyond size(c,3).

tknopp on 25 Jan 2016

I have a something like the following in some matlab code. We evaluate (per simulation time step/iteration) an A-B relation in a 2D NxN matrix and record (for postpro) this NxN matrices in a TxNxN 3D store. As the copy operation is based on find (reporting linear index), in julia i could do Tnn[time_step,find(ABrelation) >= cell_limit)] = 1; And yes, i'm aware i could write this as explicit loop and should not reason with matlab code for julia data constructs...

lobingera on 25 Jan 2016

@tknopp, valid example with extending dimensions and trying to keep infrastructure.

lobingera on 25 Jan 2016

To me, it seems that a Vector of NxN matrices would be a better data structure for that use case.

KristofferC on 25 Jan 2016

And yes, i'm aware i could write this as explicit loop and should not reason with matlab code for julia data constructs...

There is a reason, why something is called postprocessing -> at the time of recording you might not know what to process, so we choose homogenous collections to choose freely on which dimensions to operate, later.

lobingera on 25 Jan 2016

With a sub array, your code can be written as sub(Tnn, time_step)[find(ABrelation) >= cell_limit] = 1. This is almost as short, and maybe clearer. When views become the default, it will be even shorter.

Another +1 from me for allowing linear indexing only for the 1D case.

nalimilan on 25 Jan 2016

The find APIs are definitely some of the greatest forces behind keeping linear indexing, in general. They come from languages where there was no other option.

Interestingly, once you start using subarrays, you don't need find at all (assuming you actually meant find(ABrelation .>= cell_limit)):

sub(Tnn, time_step, :, :)[ABrelation .>= cell_limit] = 1

If ABrelation is also a matrix, the resulting logical matrix _already has_ the cartesian information embedded within it. Calling find destroys that (as does our current logical indexing implementation, but it needn't be like that).

As one last aside, note how relying on partial linear indexing here pushed your access pattern to be row-major. We should be aiming for APIs that encourage the optimal access patterns.

mbauman on 25 Jan 2016

Linear indexing should require a special syntax. The idea is to catch errors where one indexes a matrix accidentally as vector.

The options for such a syntax are endless, and it should be possible to find a syntax that is both short and concise.

arr[:linear, i]
arr[Linear(i)]
linind(arr, i)

In addition, reshaping the array or using an array view outside of the loop are also possible.

eschnett on 25 Jan 2016

👍2

it might make sense to have an operation that is a fused reshape + getindex to avoid allocating a reshaped wrapper on the heap

carnaval on 25 Jan 2016

Care to expand, @carnaval? I've been musing about that too, in the form of "Janus" iterators that present two (or more) notions of location. I just haven't decided yet what operations besides increment are necessary for such iterators, and worry that supporting offsets (addition/subtraction) is a minefield.

timholy on 25 Jan 2016

In #5396 I figured out for myself that the use for partial linear indexing in matlab is to be able to do let setindex! silently add dimensions to arrays. As julia does not allow that (phew!) there is this not much reason for partial linear indexing left.

mschauer on 28 Jan 2016

Still i'm wondering, why this was here in the first place (didn't have time to dig into git logs)?

I'm convinced (from the above discussion):
-1 to partial linear indexing
+1 to have linear indexing available without additional syntax (a single entry in [v] should tell the story)

lobingera on 28 Jan 2016

I'm for allowing only indexing with an index for each dimension or exactly one index.

StefanKarpinski on 28 Jan 2016

👍2

I'm also in favor.

timholy on 28 Jan 2016

@carnaval A notation for that could be derived from annotating the dimension type parameters as in A{3,3}[i,j] and maybe A{}[i] for linear indexing

mschauer on 29 Jan 2016

I assume that linear indexing would always use the range 1:length(A).

How would this work e.g. for sparse vectors? There is a priori no way to distinguish between A[regular_index] and A[linear_index]. Similarly for vectors where the lower bound is not 1. A special syntax (e.g. A[:linear, i]) might be required.

eschnett on 29 Jan 2016

What is the problem you try to solve with additional syntax for linear indexing?

lobingera on 29 Jan 2016

There are three:
(1) Given a Matrix M, accidentally writing M[i] instead of M[i,j], i.e. leaving out an index; with a special syntax for linear indexing, this would be caught.
(2) If you have a zero-based vector (doesn't exist in Base, but convenient in some cases), then regular indexing starts at 0, whereas people will expect linear indexing to always start at 1. Without a special syntax, there is no way to distinguish between these two. The same is true for any one-dimensional array that behaves differently from Vector; you can also think of a "window" view into a vector where indexing doesn't start at 1.
(3) Using a single linear index instead of the index ranges declared with the array is not something that is commonly available in many languages (without explicitly creating such a view), and will thus be confusing to beginners.

I also think that this operation is not very common in "user" code. Linear indices are probably used mostly in generic functions that handle arrays independent of their shape.

eschnett on 29 Jan 2016

(2) is pretty hypothetical. Any array M can be seen as a container and as such it can be traversed using M[1:length(M)]. This is why length exists in the first place.

(3) What languages do you have in mind that don't have that feature? Of course only languages with build-in multi-arrays count here.

tknopp on 29 Jan 2016

(1) do we really need to help every developer to write correct code?

lobingera on 29 Jan 2016

For (2), see e.g. https://github.com/eschnett/FlexibleArrays.jl. If you define a 1d-array with index range 0:9, then indexing it with M[1:length(M)] doesn't work. (Of course you still want to be able to index it this way, that's why you need a special syntax.)

For (3), my point was "without explicitly introducing a view". It doesn't matter whether arrays are built-in or defined in a library. I'm thinking e.g. of Fortran, C, C++, Mathematica, Python. They all allow linear indexing, and all require either a special syntax or construct to use linear indexing with a multi-dimensional array.

eschnett on 29 Jan 2016

(2) I see that, but in my point of view there is an interface for containers and this will always index from one. So the flexible arrays simply violate the interface and this is nothing one can take care of here.

(3) Hm ok, just saw that Python/Numpy handles this differently. C/C++ don't have multi dim arrays in the sense we use them in Julia.

For what ist worth. I think the syntax makes sense since I consider an array as a container type. And I have used linear indexing a lot in "user" code.

tknopp on 29 Jan 2016

A generalized container indexing mechanism is very useful. For example, iterating over dictionary this way would be nice. Instead, one has to use an iterator to obtain key/value pairs. Of course, iterators don't work if you want to access multiple containers (that have the same structure) simultaneously.

So, one uses integers as abstract markers for all array elements.

Regarding C and C++: double array[2][2] gives you a 2d array. To get a 1d view, write (double*)&array[0][0]: It's possible, but not implicitly.

eschnett on 29 Jan 2016

Looks like i'm not soo convincing with my idea of keeping the syntax like it is today.
Maybe there was not enough thinking in the first place and now with dedicated syntax for linear indexing we reach some higher optimum.

lobingera on 29 Jan 2016

I am not convinced that this needs special syntax. This is all extra stuff that needs to remembered.

tknopp on 29 Jan 2016

I've played with some containers like that myself… and I've come to the conclusion that it's simply too dangerous to subtype AbstractArray if indexing with integers means anything besides accessing an element inside 1:size(A,d). There are far too many abstract array methods that assume that those accesses are inbounds, and subsequently turn off bounds checking. eachindex does help some, but it doesn't get us the whole way there.

I think the best alternative I've found is to use a custom index type that augments indexing behavior with your custom functionality. As you say, there's no way to distinguish between your custom indexing behavior and "normal" indexing without a special syntax or index type. And in this case, I think the unusual behavior loses in getting the nicer of the syntaxes. That's how AxisArrays.jl works. You can use non-integer types to allow indexing by the axis labels, which can make it behave like an offset array (example; instead of leaning on SIUnits, you could also make your own Integer type).

Let's start by deprecating partial linear indexing and trailing singleton dimensions. It's fairly simple, but the best way of doing so depends upon #11242… otherwise we'd need to further (ab)use staged functions. We can easily add a LinearIndex type if the deprecation period proves to be painful, but I'd be surprised if that's necessary.

mbauman on 29 Jan 2016

For #11242, I need to find some time to debug the mysterious performance problems, but otherwise I think it's close.

timholy on 30 Jan 2016

Let's start by deprecating partial linear indexing and trailing singleton dimensions

I agree with the partial linear indexing, but I'm not so sure about the trailing singleton dimensions. There are a lot of algorithms that look like this:

function sumcolumns(A::AbstractVecOrMat)
    m, n = size(A, 1), size(A, 2)
    s = zeros(accum(eltype(A)), 1, n)
    for j = 1:n
        z = s[1,j]
        for i = 1:m
            z += A[i,j]
        end
        s[1,j] = z
    end
    s
end

which would break for Vector input if we got rid of trailing singletons.

timholy on 31 Jan 2016

Treating vectors as matrices is something that seems to come from Julia's Matlab background, and people seem to be trying to move away from it, without losing the convenience that it offers.

So it seems some people are really fond of this feature (Matlab background), while others are not used to it and find it strange (Python/C/Fortran background).

I am dealing with tensors, not just vectors and matrices, and the code I write and use is generic in the number of dimensions. This is currently, unfortunately, a bit cumbersome in Julia (requires generated functions), but still much better than in any other language I know of comparable efficiency. Treating vectors as matrices, allowing extra indices etc. doesn't help this generic code. In fact, it adds a layer of complexity that I'd like to be able to avoid.

I recently implemented SIMD vectors as datatype https://github.com/eschnett/SIMD.jl, and then had the idea to make them subtypes of DenseArray{T,1}. To my surprise, this broke the examples in my Readme -- the REPL tries output vectors as matrices (in the very same way you describe above), forcing me to support treating SIMD vectors as 2d-arrays. SIMD vectors are by their very nature not multi-dimensional; in fact, indexing them is usually quite expensive, and supporting a multi-dimensional view won't be useful for any actual user.

So -- just what is an "abstract one-dimensional array" AbstractVector in Julia? Is it a generalization of all 1d array-like types, or is it intrinsically always also a multi-dimensional array? If so, should there be another abstract type in Julia for cases where 1d arrays will always stay 1d

You can currently even treat ranges as matrices: (1:10)[4,1].

eschnett on 31 Jan 2016

This is currently, unfortunately, a bit cumbersome in Julia (requires generated functions), but still much better than in any other language I know of comparable efficiency.

That's often true, but these days I find I can do most multidimensional stuff now just using the cartesian iterators. If you're not familiar with this, AxisAlgorithms and base/reducedim.jl are good models.

As far as the SIMD types go, yes, I agree it's a good example of how some things don't really fit this model well. In specific cases you can of course override the display functions, but your broader point is still valid: it's not a good interface if you have to effectively disable or replace what would otherwise be handled by the lowest-level fallbacks. So perhaps trailing 1s should be on the chopping block.

timholy on 31 Jan 2016

I tried the Cartesian iterators. Unfortunately, they expect integer literals as arguments, whereas I have a type parameter N. By the time the type parameter is known, the macro has already been expanded -- hence a generated function that is expanded late enough.

I don't think the Cartesian iterators produce any unescaped code. It should thus be possible to reimplement them as generated functions, so that one can pass non-literal constants.

Now that I think about this, I have the feeling that this would greatly simplify my code.

eschnett on 31 Jan 2016

I think you're confusing Base.Cartesian with CartesianIndex/CartesianRange. There are no macros if you use the latter. In the code I linked, there are no macros, and you can still do amazing things :smile:.

timholy on 31 Jan 2016

Thanks for the pointer. I was not aware of the difference.

Is there a way to list in the REPL all functions that have a CartesianIndex or CartesianRange as input or output?

I think all my multi-dimensional arrays need to support Cartesian indexing.

eschnett on 31 Jan 2016

Is there a way to list in the REPL all functions that have a CartesianIndex or CartesianRange as input or output?

Is methodswith(CartesianIndex) what you seek? Best!

Sacha0 on 31 Jan 2016

D'oh. Thanks.

eschnett on 1 Feb 2016

@eschnett, you weren't the first person who seemed to be confused by that difference. It was enough to finally push me into writing a blog post:
https://github.com/JuliaLang/julialang.github.com/pull/324

timholy on 1 Feb 2016

Current thinking is that we should deprecate generalized linear indexing.

JeffBezanson on 30 Mar 2016

@JeffBezanson, Based on what?

lobingera on 31 Mar 2016

@lobingera, if you read this thread there is a pretty clear majority view. I don't know of any plans to eliminate _all_ linear indexing, this is just about the "partial" or "generalized" case. And AFAICT there seems to be no resistance to the plan in https://github.com/JuliaLang/julia/issues/14770#issuecomment-174212684, so if adopted you wouldn't lose anything.

Given that, can you elaborate on your concern?

timholy on 31 Mar 2016

Is there a plan or a proposal for linear indexing for arrays where the lower bounds are not 1? I'd like to implement this for https://github.com/eschnett/FlexibleArrays.jl.

My current idea is to define a type immutable LinearIndex i::Int end, and use it for linear indexing, assuming that the linear index i=1 should be the first array element.

eschnett on 31 Mar 2016

I think I'd try using CartesianIndex to represent indexing at the _shape_ of the array instead of memory offsets or other interpretations. In fact, this is what a LinearSlow vector will currently produce with eachindex, even though it could be a regular old range. It seems a little backwards, but it might make sense to use CartesianIndexes to represent linear indexing, too. Linear indexing explicitly depends upon a cartesian interpretation of your indices.

mbauman on 31 Mar 2016

If I have a 1D array that has an index range 0:9, how can I use a Cartesian index to represent a linear index? The obvious CartesianIndex((1,)) already represents a regular index pointing to the second element, and CartesianIndex((10,)) is already an out-of-bounds error.

eschnett on 31 Mar 2016

Just specialize getindex(A::FlexibleArray, ::CartesianIndex) to have the interpretation you want.

mbauman on 31 Mar 2016

I already specialize it, and I gave it the interpretation that it should be Cartesian (and not linear) indexing, as that's what's needed to make eachindex work. I now want to add linear indexing as well.

eschnett on 31 Mar 2016

getindex(A::FlexibleVector, i::CartesianIndex{1}) = … # interpretation of index as 1:end
getindex(A::FlexibleArray, i::CartesianIndex{1}) = A[CartesianIndex(ind2sub(size(A), i[1]))]

?

mbauman on 31 Mar 2016

I already want to interpret a Cartesian index as regular (shape-dependent) index since Cartesian indices are very convenient for index calculations.

eschnett on 31 Mar 2016

Aha, we're talking past each other. When I've been talking about cartesian indexing above, I've been meaning not only one-or-more indices, but that each index goes from 1:size(A, d). If you want to support linear indexing, you'll need to use a special type and implement special indexing rules, no matter what. If you just use CartesianIndex to universally mean 1-based indexing, then you no longer need to specialize eachindex, and code that uses explicit CartesianRanges will still work. I think it's a less painful path, personally.

mbauman on 31 Mar 2016

The special type and the special indexing rules are not a problem (already there). I'm looking for some standardization that would work for other array packages as well, and hopefully even for Base.

eschnett on 31 Mar 2016

I'm proposing an interpretation of CartesianIndex that we already assume in Base: that it's indices are always 1-based. By agreeing to that interpretation, your FlexibleArrays suddenly become a whole lot more like any other LinearSlow array… and it requires no new types or idioms. I think it'll even get better as we refine our index iteration idioms. I think that makes for a nice standardization.

mbauman on 31 Mar 2016

Having CartesianIndex always starting at 1 is not much different form requiring that all array indices should always start at 1: This is what we have in Base, but there are cases where this is quite inconvenient.

CartesianIndex is not only used for eachindex, it is a very convenient way to perform rank-independent calculations on array indices. If CartesianIndex indexing were to always start at 1, then I'd need a second FlexibleCartesianIndex (with the same implementation) that didn't have this requirement.

eschnett on 1 Apr 2016

@timholy Well...
with my recent experiences what happens in deprecation of language options (or new syntax, or even only renaming functions) i'm concerned, that any language change or change of options will create effort in porting available code. Yes, i'm aware we all should assume julia to be breaking before 1.0. But this applies to a proper definition of 'we'. There is a growing community of julia users outside the mailing list, github and events like JuliaCon. And e.g. the list in packages (904 and growing) is from their perspective something like the standard library.

I'm interested in indexing into arrays because i already think about vectorisation and simd when creating algorithms therefore i'd like to use a language that supports me there. So i want to write code in vectors and matrices and higher arrays (and avoid local explicit loops). Julia looked like this in the beginning but discussions like this thread give me the impression, that gradually we move away from this. This might be a problem in understanding on my side.

So, for breaking changes (deprecaing an option) from my perspective, there should be a good, technical reason like: simplifying internal structures, the parser OR it better matches the assumed end point of julia as a programming language.

At one of two points in the thread the reasoning for deprecation of this or that option was: Unexpected behaviour. That's for me no technical reason.

lobingera on 1 Apr 2016

@lobingera, this change is entirely deprecation safe, as far as I'm aware. I'm not really sure what this discussion has to do with vectorization and simd, it's about removing a particular syntax that's prone to confusion and makes it easy to introduce bugs.

mlubin on 1 Apr 2016

We do regularly run PkgEval on the entire package ecosystem to gauge the impact of changes. For the jb/functions change, for example, @JeffBezanson went to great lengths to make sure that most packages would be able to continue working, including making pull requests to fix them. It was, of course, it was not 100%, but this is something we're aware of and pay attention to.

As to technical reasons to get rid of partial linear indexing:

It is complex, hard to implement correctly, and even harder to test correctness of.
It is dangerous in the sense that it tends to hide programming errors.
It is not necessary very often.
It can be replaced by the same functionality with a less accident-prone syntax.
A less accident-prone syntax can be provided in a package rather than the standard library.

All of these points combined with the general consensus here that it should go, seem sufficient to deprecate partial linear indexing in 0.5 and remove it in 0.6.

StefanKarpinski on 1 Apr 2016

@StefanKarpinski, Tim asked me for concerns so i replied. I was recently entangled into making Gadfly running on a recent v0.5 again and i asked myself, if this is really efficient (it has not anything to do with this Issue, still changing something in the language stopped people from working). I still see your points 2/3/4 as rather non-technical and point 1 is strange, because obviously an implementation exists ... while i agree on missing testing.

If you (all) see this as necessary to streamline v0.5, who am i to disagree?

lobingera on 1 Apr 2016

I do appreciate your concerns and that you're taking time to express them here. Porting things forward versions can certainly be pain.

Points 2-4 may not be strictly technical, but then programming language design in general is not strictly technical – it's applied psychology. If humans were strictly technical beings, we wouldn't need usable interfaces. The issue with 1 is that any amount of complexity in the standard library is an ongoing burden to maintain and be responsible for – I don't think this feature is pulling its weight: it doesn't offer enough to justify its presence in the standard library.

StefanKarpinski on 1 Apr 2016

👍4

Current thinking is that we should deprecate generalized linear indexing.

Do we have a proposed implementation plan for how to go about doing this?

tkelman on 21 Apr 2016

With #11242 I think it should be pretty straight-forward. We may want to introduce a new sub2sub method that is akin to ind2sub but takes a partial-linear-index-tuple instead of a scalar linear index in order to allow for a simple replacement in the deprecation method.

getindex(::AbstractArray, ::Int) = # linear
getindex{T,N}(::AbstractArray{T,N}, ::Vararg{Int, N}) = # cartesian
@deprecate getindex(A::AbstractArray, I::Int...) getindex(A, sub2sub(size(A), I)...)

Note, though, that this would _also_ deprecate trailing singleton dimensions.

mbauman on 21 Apr 2016

Blocked on https://github.com/JuliaLang/julia/pull/11242.

StefanKarpinski on 28 Apr 2016

I agree that the "generalized linear indexing" is confusing and we are probably better off without it. However, in searching for usages of trailingsize, I found one package where this feature is used:

DSP.jl: https://github.com/JuliaDSP/DSP.jl/blob/a2e25f670d061296ea2fad0a613142313965bc22/src/Filters/filt.jl#L52

It would be good to think about what code would replace this.

stevengj on 28 Apr 2016

This is a _huge_ task. The nuts and bolts of it are fairly simple, even without #11242. Just throw depwarns in the appropriate cases in abstractarray.jl and make the linear fast function a generated function to do the same. Remove this line and you're done (assuming Windows bootstrap is no longer an issue).

But holy cow, there are many thousands of deprecation warnings when running the tests. I think some of it is just that testing code relies on this behavior more than normal code does… but it will take quite a bit of work to get this through. There's no way I'll be able to tackle this anytime soon.

mbauman on 1 May 2016

@mbauman do you have the beginnings of a branch or something where we could start from? If it's just a matter of resolving deprecation warnings (but a lot of them) that sounds easy to distribute, but who else would know how to start this off?

tkelman on 5 May 2016

Yes, I have that branch but I can update it to use the swanky new Vararg type without much hassle.

mbauman on 5 May 2016

Thats unexpected... Curious to see where/what they are, maybe this discussion has overlooked something

IainNZ on 5 May 2016

We might be able to handle the deprecations by adding LinearIndex(i) that is allowed only in the last slot. Naively that seems to require a Vararg in the middle of an argument list, but these days we can arrange it with either generated functions or a lispy combination of Base.front and @inline (for performance).

timholy on 6 May 2016

I think I'd rather just use reshape on the array being indexed. That way it'll be more performant for linear slow arrays, too. Maybe we need some shortcut way of expressing reshape A to have N dimensions by either adding trailing singleton dimensions or collapsing the last dimensions.

It's also a nice way of expressing the algorithm in DSP.jl that Steven referenced above.

mbauman on 6 May 2016

👍1

Where do we stand on this now with #16251 merged? Is it at the point where the core part of the change is easy and the rest is a matter of a deprecation hunt that multiple people could chip in on?

tkelman on 20 May 2016

Yes, it should be fairly straightforward now.

Move the methods I marked for deprecation, and add deprecation warnings. In my quick test above, I noticed some wonky behavior with depwarn at the repl because it relies on the stack… and getindex inlines like crazy, so there simply isn't much of a stack. So this may take some consideration.
Watch out for surprising dispatch between methods of the form getindex(A::Array, I::Int…) and getindex{T,N}(A::Array{T,N}, I::Vararg{Int, N}). You may need to add a few explicit non-deprecated functions in order to keep the behavior we want non-deprecated. I'd still put them in deprecations.jl, though, since they'll only be necessary as long as the deprecations exist. Slightly annoying, but not terribly difficult.
Slog through base and tests, adding reshapes everywhere to match the number of indices and make Julia happy: reshape(A, Val{3})[i, j, k].

mbauman on 20 May 2016

What's needed here is for someone (@tkelman hasn't had time) to get this started by making the changes @mbauman proposed, even if the tests fail. Others can then continue by fixing various tests and other usages of generalized linear indexing.

StefanKarpinski on 9 Jun 2016

Decision on the triage call is we can postpone this one until we have time to do it thoroughly.

tkelman on 16 Jun 2016

For what it's worth, the most straight-forward implementation here requires #18457 since it fixes a method sorting bug with Varargs of a bound length.

mbauman on 9 Dec 2016

Since #18457 is almost ready (can we merge it yet?), let's wait for that and then do that.

StefanKarpinski on 9 Dec 2016

An update is in order here. We've been talking about two completely different things in this thread, with _vastly_ different ramifications:

Deprecate the linearization of any dimension beyond the first. This is what I've done in #20079. Note that it still allows indexing into an N dimensional array with fewer than N indices. Once this change goes through, we will be able to simplify the lowering of A[i, end] to simply use size(A, 2) — and I think we can ditch Base.trailingsize altogether. Similarly, we'll no longer need to fuss with special cases for trailing :s beyond the single A[:] case. Notably, this only impacted test code that was specifically crafted to test this behavior.
Deprecate indexing into N dimensional arrays with anything but 1 or N indices. This is what I attempted in #20040, and it is massively disruptive because it removes the ability to index into vectors with trailing 1s. This change requires bending over backwards to ensure that we only define indexing with 1 or N dimensions. While I suppose we could theoretically deprecate indexing with 1 < n < N indices but keep trailing singleton dimensions, specifying that with dispatch would require support for a new signature: getindex{N}(A::AbstractArray{T,N} where T, I::Vararg{Int, N}, trailing_singletons::Int…).

I'm for the first change, but against the second.

mbauman on 18 Jan 2017

👍1

Re. indexing with trailing singletons: Though indexing with trailing singletons is sometimes convenient when writing code, that style is often confusing / obfuscating when reading code. Code being read more than written, the benefit of disallowing indexing with trailing singletons (clarity) seems worth the cost (minor convenience reduction).

Deprecating indexing with anything other than 1 or N dimensions received broad support in this thread. The primary obstacle to completing this deprecation seems to be the volume of work involved in removing indexing with trailing singletons from existing base code. Particularly, the linear algebra code seems to be the primary consumer of indexing with trailing singletons. (Reading the linear algebra code is what convinced me of the above clarity point :).) If that is correct, and there otherwise remains broad support for deprecating indexing with anything other than 1 or N dimensions, I would be happy to primarily shoulder the necessary linear algebra related work over the next release cycle. Best!

Sacha0 on 22 Jan 2017

❤1 👍1

I'm not as certain on the consensus since the thread is a little meandering… and I know I have been imprecise in my language when talking about these two options. It'd be great to hear updated opinions now that we have two very concrete (and implemented) options. You can try them out! CC @andreasnoack, who is conspicuously absent here.

My hesitancy about only-1-or-N-indices isn't just that it requires a lot of changes to existing code, but that it is _also_ more difficult to implement and enforce. Were this more consistently a net simplification (on either side), then I think I'd find it more attractive.

mbauman on 23 Jan 2017

@mbauman Either of your 2 options above will be a big improvement but I think having stricter rules would be better (i.e., deprecating indexing with anything other than 1 or N dimensions). It is not clear to me why this would be "more difficult to implement and enforce". Seems naively that it would be easier, not harder. In any case, in my experience this distinction only becomes an issue when working with higher-dimensional arrays (say larger that 5 or so) where being off by one index is less visually distinctive and getting yelled at by the compiler really helps in the long run. I suppose the linear-algebra folks usually use lower dimensional arrays and thus have no issue.

BobPortmann on 23 Jan 2017

I can't speak to the difficulty of implementation, but if we're prioritizing user experience, unintentionally providing the wrong number of indices is currently an unfortunate trap. We get complaints about this in JuMP: https://github.com/JuliaOpt/JuMP.jl/issues/937

mlubin on 23 Jan 2017

I'm also in in favor of 1 or N. One of the other people in my lab spent nearly a week tracking down a bug caused by this behavior.

JaredCrean2 on 23 Jan 2017

👍1

Moving milestone now that #20079 is merged.

mbauman on 25 Jan 2017

What remains to be done here? Now is probably a good time to take this out entirely and try to simplify all the array indexing infrastructure.

StefanKarpinski on 20 Jul 2017

I need to spend some time to get https://github.com/JuliaLang/julia/pull/21750 through. Then we can talk about the next steps here. I'd be in favor of one final round of bounds check tightening here, such that we only allow omitted trailing dimensions when the size of those omitted dimensions are all 1.

mbauman on 20 Jul 2017

With https://github.com/JuliaLang/julia/pull/21750 merged, I believe there are only a few small indexing cases that still need to be deprecated here.

StefanKarpinski on 31 Aug 2017

Julia: deprecate (then remove) generalized linear indexing

Most helpful comment

All 109 comments

Related issues