Julia: `size`, `reshape` not consistent

Created on 3 Jul 2017 · 33 comments · Source: JuliaLang/julia

If `size` returns a tuple of an array's dimensions, then why is the function that changes an array's dimensions not called `resize`? `size` and `reshape` are borrowed from MATLAB; NumPy uses `shape` and `reshape`. The latter pair makes a lot more sense to me, despite my being a recovering MATLAB user. Shall we consider changing? Happy to submit a PR if there is a consensus. Some discussion here.

-1, since `reshape` (to make into a different shape) is a stricter operation than `resize` (to alter the size of something):

```julia
julia> reshape(rand(2,2), 3, 1)
ERROR: DimensionMismatch("new dimensions (3,1) must be consistent with array size 4")

julia> resize!([1,2,3,4], 3)
3-element Array{Int64,1}:
 1
 2
 3
```

If we had multidimensional resizable (as in re-length-able) arrays, then a multidimensional `resize` method would indeed make some sense. But we can't change their "total" size (`length`), only their shape.
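
A minimal sketch of the distinction drawn above, assuming a current Julia 1.x session (the REPL output above comes from Julia 0.6, so printed type names differ there): `reshape` keeps the element count fixed, shares memory with a plain `Array`, and rejects a mismatched count.

```julia
# reshape only refactors the same elements into new dimensions.
A = collect(1:6)          # 6-element Vector{Int}
B = reshape(A, 2, 3)      # the same 6 elements as a 2×3 matrix
@assert length(B) == length(A) == 6
@assert size(B) == (2, 3)

# For a plain Array, the reshaped result shares memory with its parent.
B[1, 1] = 100
@assert A[1] == 100

# Asking for a different total number of elements is an error.
mismatch_rejected = try
    reshape(rand(2, 2), 3, 1)   # 4 elements cannot fill a 3×1 shape
    false
catch err
    err isa DimensionMismatch
end
@assert mismatch_rejected
```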

If we decide to go with non-length-changing arrays, then we can still support `reshape`, as a view...

I think the core suggestion here is to rename `size` to `shape`. This makes sense to me. `size` is less important anyway, given that `indices` is what really matters :smile:.

Linking to #20402 so this gets more eyes before anyone goes to great effort.

`shape(a)` sounds fine, but not so much `shape(a, 1)`...

Renaming `size` to `shape` is not just the core suggestion; it's the only suggestion. Thanks @timholy for clarifying my verbosity.

Seems to me this would be relatively straightforward: just a cut-and-paste with a deprecation, no?
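
A rough sketch of what "a cut-and-paste with a deprecation" could look like, written in a hypothetical module `ShapeDemo` rather than in Base (this is not code from the actual proposal). It uses `Base.@deprecate`, which defines the old name as a forwarding method that emits a deprecation warning; note that since Julia 1.5 those warnings are only printed when Julia is started with `--depwarn=yes`.

```julia
module ShapeDemo   # hypothetical module, not part of Base

# New name: forwards to Base.size. The call is qualified so that the local
# name `size` is not bound to Base.size before we define our own below.
shape(a::AbstractArray) = Base.size(a)

# Old name kept as a deprecated alias: this defines and exports
# ShapeDemo.size(a::AbstractArray), which warns (when depwarns are enabled)
# and then calls shape(a).
Base.@deprecate size(a::AbstractArray) shape(a)

end # module

A = zeros(2, 3)
@assert ShapeDemo.shape(A) == (2, 3)
@assert ShapeDemo.size(A) == (2, 3)   # still works during the deprecation period
```

Stefan's point further down is that a chain of such renames needs one deprecation cycle per link.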

If we think this through to the end, then `length` should be renamed to `size`, since `resize!` modifies the array size.

yuyichao has a point, though: `size(a, d)` gives us the size of the d-th dimension.
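
For reference, a small sketch (assuming Julia 1.x) of how the one- and two-argument forms of `size` relate to `length` for a 3-D array:

```julia
A = zeros(2, 3, 4)
@assert size(A) == (2, 3, 4)        # tuple of all dimension sizes
@assert size(A, 2) == 3             # size of the 2nd dimension only
@assert length(A) == 24             # total number of elements
@assert length(A) == prod(size(A))  # ... which is the product of the sizes
```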

Personally, I don't think all this is worth it. Naming becomes subjective at some point, and IMHO reaching 100% consistency is close to impossible.

If we were to move to `shape` instead of `size`, we should at the same time move to an API where the range of valid values is returned, at which point we'd probably also want a function that returns the length of each of those index collections. Maybe we could call it `size`.

@StefanKarpinski we already have `indices`, which returns the range of valid values.

I'm surprised to see so much resistance to this idea. Is it just that change is bad / a lot of work, and this syntax is too entrenched? We're not at 1.0 yet.

Would there be less opposition to changing `reshape` to `resize`? Then we'd have `size`, `resize`, and `resize!`. Methods for the latter could eventually be added to handle N-D arrays. I'd be happy with that too; anything to make the nomenclature consistent.

There are two key properties here: the number of elements, and how they are factored into dimensions. `size` is a bit ambiguous and could refer to either, but `shape` definitely only refers to the second property (do we agree `shape(a, 1)` probably doesn't make sense?). The defining feature of `reshape` is that it can only change the factoring into dimensions, not the number of elements. So I think its name should stay. There is never going to be a `reshape!`, since that would require changing the type of an array (we also want to avoid mutating dimension sizes in general).
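
To illustrate the point about types, a sketch assuming Julia 1.x: the number of dimensions is a type parameter of `Array`, so reshaping a vector into a matrix necessarily produces a value of a different type, which an in-place `reshape!` on the original object could not do.

```julia
v = collect(1:6)
M = reshape(v, 2, 3)
@assert v isa Vector{Int}       # i.e. Array{Int,1}
@assert M isa Matrix{Int}       # i.e. Array{Int,2}
@assert typeof(v) != typeof(M)  # dimensionality is part of the type
@assert ndims(v) == 1 && ndims(M) == 2
```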

One issue we're up against here is that `length` is a standard term for the number of items in an array, but "relength" is not a word. Sometimes you just have to live with things like that. Maybe we could use `length!`? Though that's a bit weird, since it's not a verb.

@JeffBezanson MATLAB uses `length` to mean the number of elements in the longest dimension. It doesn't make sense; I just wanted to point it out. [edit: perhaps the person who coined this usage was a woodworker, for whom length, width, and depth refer to the longest, 2nd-longest, and shortest dimensions of a board, respectively]
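
For comparison, a sketch of MATLAB's definition written in Julia 1.x. `matlab_length` is a hypothetical helper, not a Base function; it follows MATLAB's documented rule that `length(X)` is the size of the largest dimension, or 0 for an empty array. Julia's `length` instead counts all elements.

```julia
# Hypothetical helper (not in Base) mimicking MATLAB's length(X).
matlab_length(A::AbstractArray) = isempty(A) ? 0 : maximum(size(A))

A = zeros(2, 5, 3)
@assert matlab_length(A) == 5        # largest dimension
@assert length(A) == 30              # Julia: total number of elements
@assert matlab_length(zeros(0, 4)) == 0
@assert matlab_length([1, 2, 3]) == length([1, 2, 3]) == 3   # vectors agree
```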

To me, in plain layman's terms, vectors have lengths, matrices have areas, and 3-D arrays have volumes. I guess I tend to think in physical terms, as if they represented a space in the real world. Again, to me, a general term for a scalar quantity that includes length, area, and volume is size (not length), meaning how big it is, i.e., how many elements it has.

In Julia, `sizeof` currently means the number of bytes an array consumes, and `length` means the number of elements: two entirely different words for functions that return the same thing, but in different units. Really?? I like @tknopp's suggestion that `length` be renamed to `size`. My mother would understand terminology like that.

The C++ STL also uses `size`. The suggestion `length` → `size` and `size` → `shape` does seem to make things a little more consistent.

The question is whether it's worth the effort. On the other hand: now or never... :-)

I don't think we have enough deprecation cycles to make that change happen in 1.0.

> In Julia, `sizeof` currently means the number of bytes an array consumes, and `length` means the number of elements: two entirely different words for functions that return the same thing, but in different units. Really??

No. `length` is the abstract number of elements in anything iterable, while `sizeof` refers to the concrete memory representation. You can have an object whose size is 16 bytes but that iterates millions of elements, or whose size is 1GB but that iterates one or zero elements. These are in no way the same concept.
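
A sketch of the distinction, assuming Julia 1.x: a `UnitRange` stores only its endpoints, so its `sizeof` is tiny while its `length` is large, and for a `String`, `length` counts characters while `sizeof` counts UTF-8 bytes. Only for a plain array of a bits type such as `Float64` do the two differ by a fixed factor (the element size), which is the case raised in the quoted comment.

```julia
r = 1:1_000_000
@assert length(r) == 1_000_000
@assert sizeof(r) == 2 * sizeof(Int)               # just the start and stop fields

v = zeros(Float64, 10)
@assert length(v) == 10
@assert sizeof(v) == length(v) * sizeof(Float64)   # 80 bytes of element data

s = "héllo"
@assert length(s) == 5    # characters
@assert sizeof(s) == 6    # UTF-8 bytes; 'é' takes two
```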

I think renaming `size` to `shape` is reasonable; there is precedent for that name. But I'm not sure it's worth it. `size` is also an increasingly problematic function for, e.g., OffsetArrays. Not sure if that's necessarily related, but it helps motivate some kind of shake-up there.

A more descriptive name for `sizeof` would be nice, e.g. `memsize`. Best!

@StefanKarpinski not enough deprecation cycles? How is the change proposed here (`size` to `shape`, `length` to `size`, and, for good measure, `sizeof` to `memsize`) different in this regard from anything that's proposed in https://github.com/JuliaLang/julia/issues/20402? Your roadmap talk is not posted yet, but if it includes even just one more release (0.7) before 1.0, then making the change now and including a deprecation should suffice, no?

I really don't want to rename `length`. If we didn't allow growing collections, then I assume there'd be nothing wrong with using `length`. So maybe we should rename `resize!`? Also, `length` is not tied to arrays; it is much more general than that. It refers to the number of elements in a sequence (you might even say the "length" of a sequence), and arrays happen to be able to implement that interface.
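
A sketch of `length` as part of the iteration interface rather than an array property, assuming Julia 1.x. `Countdown` is a hypothetical type made up for illustration; defining `Base.iterate`, `Base.length`, and `Base.eltype` is enough for `length` and `collect` to work on it, just as `length` works on tuples and dictionaries.

```julia
# Hypothetical iterable (not in Base): counts down from `from` to 1.
struct Countdown
    from::Int
end

# Iteration protocol: return (item, next_state), or nothing when done.
Base.iterate(c::Countdown, state = c.from) = state < 1 ? nothing : (state, state - 1)
Base.length(c::Countdown) = max(c.from, 0)
Base.eltype(::Type{Countdown}) = Int

@assert length(Countdown(3)) == 3
@assert collect(Countdown(3)) == [3, 2, 1]

# The same generic function applies to non-array collections.
@assert length((10, 20, 30)) == 3
@assert length(Dict(:a => 1, :b => 2)) == 2
```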

@bjarthur: a chain of renames requires a deprecation cycle for each link, and we only have one left. Specifically, we can't deprecate `length` to `size` until one cycle after we've deprecated `size` to `shape`. Even just one cycle is kind of dangerous, since deprecations often don't get caught until the function is deleted entirely.

`resize!` has an `if`/`else`/`end` block with entirely different code to handle lengthening and shortening. Not surprising. What about splitting it into two functions, `lengthen!` and `shorten!`? The upside is that this terminology would then be consistent with retaining `length` (as opposed to renaming it to `size`). The downside is that the logic about which to use would then have to be hoisted to the user. For the life of me, I can't think of a verb that encompasses both lengthen and shorten and is specific to altering a dimension.

Re. renaming `sizeof` to something more descriptive: @Sacha0 suggested `memsize`, and here I put forth `footprint`. Is that too slangy?

`size` to `shape`, `resize!` to `lengthen!` and `shorten!`, and `sizeof` to either `memsize` or `footprint` could all be done at once, I believe.

> splitting it into two functions: `lengthen!` and `shorten!`?

NO. Don't do that! It's not even useful for performance since LLVM can fold that branch easily.

> `memsize`, and here I put forth `footprint`

`memsize` is OK, although ~~less~~ more ambiguous than `sizeof`, which has a well-accepted meaning from C. `footprint` is wrong, since the return value is not the memory footprint of the object; that's what `summarysize` estimates.
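
A sketch of the difference, assuming Julia 1.x, where the function is spelled `Base.summarysize` and is not exported: for a vector of vectors, `sizeof` reports only the outer array's slots (references to the inner arrays), while `Base.summarysize` follows those references. Exact byte counts vary by version and platform, so the checks below only compare against the raw `Float64` payload.

```julia
vv = [zeros(100) for _ in 1:3]          # 3 inner vectors of 100 Float64 each
payload = 3 * 100 * sizeof(Float64)     # 2400 bytes of element data in total

@assert sizeof(vv) < payload            # outer array only: 3 references
@assert Base.summarysize(vv) >= payload # includes the inner arrays' data
```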

> For the life of me, I can't think of a verb that encompasses both lengthen and shorten and is specific to altering a dimension.

"resize"?  

There are three things wrong with the following interface to arrays:

|        | query    | alter     |
|--------|----------|-----------|
| tuple  | `size`   | `reshape` |
| scalar | `length` | `resize!` |

The first is that the same root word, "size", is used both to query an array and return a tuple of dimensions, and to alter an array with a scalar input (the diagonal elements in the 2×2 table above).

The second and third are that the same root word is not shared between the query and the alteration for tuples, nor for scalars (the two columns in each row).

A single change, `size` to `shape`, would fix the first and second. We can't seem to agree on how to fix the third, but 2 out of 3 ain't bad. Can we please proceed with at least this?

What about `shape(a, 2)`? That seems awkward to me. Maybe `length(a, 2)` makes sense? For querying a single dimension, "what is the size of the dimension" seems like how I would usually say it.

I'd suggest deprecating the 2-argument method of `shape` and using `shape(a)[2]` instead. They both produce the same LLVM and native code, so there is no performance penalty.

Just to fill in your table a bit more:

|                             | query           | alter     |
|-----------------------------|-----------------|-----------|
| tuple of indices            | `indices`       | `reshape` |
| indices of dimension d      | `indices(A, d)` | -         |
| tuple of dimension sizes    | `size`          | `reshape` |
| size of dimension d         | `size(A, d)`    | -         |
| number of iterated elements | `length`        | `resize!` |

Personally, I don't find this all that dreadful. Note that `length` is more strongly tied to iteration than it is to arrays specifically, and `resize!` only works for vectors, where `size(A) == (size(A, 1),)`.
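
A quick check of that last claim, assuming Julia 1.x: `resize!` has methods for one-dimensional vectors, and calling it on a matrix fails with a `MethodError`.

```julia
v = [1, 2, 3]
@assert size(v) == (size(v, 1),)   # a vector's size tuple has one entry
resize!(v, 5)                      # grow in place; new trailing values are uninitialized
@assert size(v) == (5,)
resize!(v, 2)                      # shrink in place, keeping the first two elements
@assert v == [1, 2]

M = zeros(2, 2)
matrix_rejected = try
    resize!(M, 3)
    false
catch err
    err isa MethodError
end
@assert matrix_rejected
```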

The only refactoring that I could really see is `size(A)` → `shape(A)`, but I don't think asking for the "shape" of a given dimension makes all that much sense… so you'd probably also want `size(A, d)` → `length(A, d)`. Then we'd probably get the complaint that `shape(A)` and `length(A, d)` are inconsistent with `indices(A)` and `indices(A, d)`.

@bjarthur `size(a)[2]` and `size(a, 2)` are not redundant. Try calling both with a vector. This behavior is pretty useful.

Thinking more from the linguistic point of view, I would second @JeffBezanson and even say that `shape(a)` is not as specific as `size(a)`. For instance, the shape of a matrix could also be a property describing whether it is a diagonal or upper triangular matrix, etc.
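
To make the ambiguity concrete, a sketch using the `LinearAlgebra` standard library (Julia 1.x): structured matrix types such as `Diagonal` and `UpperTriangular` report the same `size` as a dense matrix, even though their "shape" in this structural sense differs.

```julia
using LinearAlgebra

A = [1 2; 3 4]
U = UpperTriangular(A)   # wrapper that treats entries below the diagonal as zero
D = Diagonal([1, 2])

@assert size(U) == size(D) == size(A) == (2, 2)
@assert istriu(U) && !istriu(A)
@assert isdiag(D) && !isdiag(A)
```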

I would've expected `size([1,2,3], 2)` to return 0 if it didn't throw an error. Is there an issue I can read about the design decision here?

If `[1,2,3]` conceptually had size 3×0×…×0, then it would have zero total elements (the product of the dimension sizes).
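
A sketch of the behavior discussed in the last few comments, assuming Julia 1.x: `size(v, d)` reports 1 for every dimension beyond `ndims(v)`, which keeps the product of the sizes equal to `length(v)`, while indexing the tuple returned by `size(v)` past its end throws a `BoundsError`.

```julia
v = [1, 2, 3]
@assert size(v) == (3,)
@assert size(v, 2) == 1          # trailing dimensions are reported as 1
@assert size(v, 5) == 1
@assert prod(size(v, d) for d in 1:4) == length(v)   # 3*1*1*1 == 3

tuple_index_fails = try
    size(v)[2]                   # the tuple (3,) has no 2nd entry
    false
catch err
    err isa BoundsError
end
@assert tuple_index_fails
```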

I support having a `shape` function, but as I said before, it should support arrays with arbitrary index offsets, which essentially means that it's a rename/rebrand of `indices`, not `size`. Moreover, it makes sense to allow `reshape` to change the index offsets of dimensions, not just their size. In fact, OffsetArrays does precisely this. In this context, changing `indices` to `shape` would make more sense, and that's something I already proposed above. Then `shape` and `reshape` would match.
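
A small sketch of the symmetry being proposed, assuming Julia 1.x, where `indices` has since been renamed `axes`: Base's `reshape` already accepts a tuple of `Base.OneTo` ranges, such as the result of `axes`, as its target. Offset index ranges are not handled by Base itself; per the comment above, that is what OffsetArrays adds.

```julia
template = zeros(2, 3)
B = reshape(collect(1:6), axes(template))   # target given as index ranges
@assert axes(B) == axes(template)
@assert size(B) == (2, 3)
```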

From this point of view, `size(A)` would really just be a shorthand for `map(length, shape(A))`, and `size(A, d)` would be shorthand for `length(shape(A, d))`. I could get behind changing the spelling of `size(A, d)` to `length(A, d)`, but calling `size(A)` is still a really common and useful idiom, and that can't be shoehorned into `length`, since `length(A)` means something quite different.
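
These shorthands hold in current Julia with `axes` (the Julia 0.7+ name for `indices`) standing in for the proposed `shape`; a sketch assuming Julia 1.x:

```julia
A = rand(3, 4, 2)
@assert size(A) == map(length, axes(A))                            # whole tuple
@assert all(size(A, d) == length(axes(A, d)) for d in 1:ndims(A))  # per dimension
```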

I would also point out that some of these things are just vagaries of English, which we're inheriting. To change the length of something, you "resize" it; you don't "relength" it, since that's just not a word. Moreover, even though asking for the length of an individual dimension makes sense, asking for the length of an entire array and getting back a tuple of dimension lengths does not, so swapping `length` and `size` would not be good despite the `length` vs `resize!` mismatch.

I would actually like to rename `indices` anyway, since to me it sounds like it would be a collection of the indices themselves, i.e. `for i in indices(a)` sounds like it would be similar to `for i in eachindex(a)`.
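
A sketch of the possible confusion, assuming Julia 1.x with `axes` in place of `indices`: iterating over `axes(a)` yields one index range per dimension, whereas `eachindex(a)` yields the individual indices.

```julia
a = zeros(2, 3)
ranges = [r for r in axes(a)]            # one range per dimension
@assert ranges == [Base.OneTo(2), Base.OneTo(3)]

idx = collect(eachindex(a))              # one index per element
@assert length(idx) == 6
@assert idx == 1:6                       # linear indices for a plain Array
```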

I bet this suggestion will be unpopular, but one possible way to reduce the unfortunate linguistic similarity between the conceptually distinct functions `size` and `resize!` would be to simply rename `size` to `sizes`, since it returns a tuple of dimension sizes. While `sizes(A, d)` perhaps sounds less natural than either `size(A, d)` or `length(A, d)`, I think it certainly sounds more natural than `shape(A, d)`. (Never too early to start thinking about version 2.0 ...)
