Julia: syntax: separate array concatenation from array construction

Created on 5 Jun 2014 · 145 Comments · Source: JuliaLang/julia

Much gnashing of teeth derives from the overlap between syntax for array literal construction and array concatenation in Julia – largely inherited from Matlab. Perhaps we should just use a different syntax for block matrix construction entirely. One thought would be this:

| a b
  c d |

This has the advantage of being pretty terse and lightweight. For example, the current idiom of expanding a range into an array is [1:10] which would become |1:10| while [1:10] would construct a one-element array of type UnitRange{Int}.
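For readers following along today, here is a small sketch of the distinction being asked for. It assumes a recent Julia 1.x release, where (as far as I know) `[1:10]` no longer expands the range; the variable names are only for illustration.

```julia
# Assumes a recent Julia 1.x release; Julia 0.3 and earlier behaved differently.
boxed = [1:10]             # one-element vector whose single element is the range
expanded = [1:10;]         # the semicolon asks for vertical concatenation
collected = collect(1:10)  # explicit function form of the same expansion

println(typeof(boxed))
println(expanded == collected)
```

The semicolon form and `collect` both give a plain `Vector{Int}`, while the bare brackets keep the range as an element.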

breaking speculative

Most helpful comment

Just wanted to register my strong preference that none of the array literals (including matrices) do any form of concatenation. The concatenation behavior feels like an unnecessary MATLAB-ism to me. If we're building LinAlg to be block-matrix friendly, why not let [A B; C D] make a block matrix?

Regarding syntax changes, the syntax that appeals most to me is the one mentioned by @yuyichao, with commas and semicolons and no whitespace sensitivity. I can see this generalizing to arbitrary dimensions by having other symbols like double-semicolon ;;, being whitespace invariant, allowing row matrices vs vectors, etc.

# vector
[1, 2, 3, 4]

# matrix
[1, 2;
 3, 4]

# 3D array
[1, 2;
 3, 4;;

 5, 6;
 7, 8]

The only thing is I sometimes wonder if it would be nicer to do this in storage order, not "looks like linear algebra" order. Otherwise [1, 2, 3, 4;] is a 1x4 matrix, not a 4x1 matrix; it seems odd that the last semicolon can make such a drastic change.

The final thing I'd love is a literal for 0-dimensional arrays, as I find StaticArrays.Scalar extremely useful to control broadcast, etc. The possibility here is that [a] makes a zero-D array and [a,] makes a 1D vector. I admit some users might find that obnoxious, but it has some logic to it (with , being the first-dimension separator and following the same rule as a trailing ;, etc.; plus we already have to write length-1 tuples with a trailing comma, as in (a,), so there is a precedent).
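As a point of comparison, the repeated-semicolon idea can be tried in later Julia releases. The sketch below assumes Julia 1.7 or newer, where to my knowledge `;;` and `;;;` are accepted in array literals; note that it keeps spaces for the second dimension, unlike the comma-based proposal above, and uses `fill` for the zero-dimensional case since there is no literal for it.

```julia
# Assumes Julia 1.7 or newer (repeated semicolons in array literals).
v = [1, 2, 3, 4]              # 4-element vector
m = [1 2; 3 4]                # 2x2 matrix
t = [1 2; 3 4;;; 5 6; 7 8]    # 2x2x2 array: ;;; concatenates along dimension 3
col = [1; 2; 3; 4;;]          # 4x1 matrix: trailing ;; adds a second dimension
z = fill(7)                   # zero-dimensional array holding one value

println(size(t))
println(size(col))
println(ndims(z))
```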

All 145 comments

If breaking changes are being considered for this, I'll put in a vote for using an explicit delimiter here. Comma, semicolon, whatever. Not whitespace.

Well, this would free up the ability to use an explicit delimiter instead of _requiring_ using whitespace. It is still sometimes nice to use whitespace, although I guess we could avoid that since it's a bit of a parsing nightmare.

dup of: #3737, #2488

related syntax discussion: #6960

related: #6491

Inconvenience of one extra character per element for explicit delimiters is minor compared to the parsing inconsistency IMO

It's not really a perfect dup of either of those, although it is related.

#3737 is an exact dup. Even the title is almost the same.

|...| looks a bit too much like a determinant for my comfort. In the OP, I read | a b; c d | as a*d-c*b the first time round.

As mentioned elsewhere, the {...} syntax could be made available since it's completely redundant anyway (can always use Any[...] instead). I like the thought of making curly braces actually useful!

+1 for making {...} concatenation (and making [ ] equivalent to Any[ ] instead of None[ ]).

Consolidating from #7293, I'd like to see ways of constructing both N-vectors and Nx1 matrices, since we're distinguishing them.

Reclaiming {...} may be a good solution. [...] could actually keep its concatenating behavior, and {...} would become the non-concatenating version. This would be minimally disruptive since {...} is used less often, and it doesn't concatenate at the moment. (Plus, I find it more "intuitive" than the reversed roles, not sure why...)

Plus, I find it more "intuitive" than the reversed roles, not sure why...

Perhaps it reminds you of C++11 initializer lists?

Perhaps it reminds you of C++11 initializer lists?

I guess not, I don't even know what they are! I've not updated my knowledge of C++ in the last decade...

{...} syntax could be made available since it's completely redundant anyway

FWIW, as someone who does more general-purpose work with Julia I disagree that the {...} syntax isn't useful. I can definitely understand that perspective from those who mostly write performance- (and therefore type-) sensitive code, but when you don't want to think about the type system at all it's great to have a convenient escape hatch.

(And redundant ≠ useless – all of the current array syntax is essentially a syntactical convenience, but often those are important)

I like the suggestion of using x = [] as shorthand for x = Any[], and
having to explicitly say x = None[] to create a None array. Then we can
recoup {...} for something else and there's no convenience lost.
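A quick sketch of what that shorthand looks like in practice. This assumes a recent Julia 1.x release, where, as far as I am aware, an empty `[]` does produce an `Any` vector.

```julia
# Assumes a recent Julia 1.x release.
x = []                 # empty vector with element type Any
push!(x, 1)
push!(x, "two")        # any value can be appended
y = Any[1, 2.5]        # explicit Any prefix keeps elements unconverted
z = [1, 2.5]           # without the prefix the elements are promoted

println(typeof(x))
println(typeof(y))
println(typeof(z))
```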


[] = Any[] would definitely make sense, but is somewhat separate – I often have to write array/dict literals that contain elements. Presumably, recouping {} would mean I'd have to write Any[...].

In general Julia does really well with its "never mention types when you don’t feel like it" philosophy – I really think it would be a shame to lose that.

How crazy would it be for |...| to stand for absolute value? Somehow x = | x - 1 | looks really nice to me, but on the other hand, it _might_ be a parsing nightmare.

I'm going to go with pretty crazy. Which norm would you want (plenty to choose from)? Are norms commonly needed enough to deserve their own syntax?

I feared as much, just wanted to bounce it off of sane people. Probably the sqrt(dot(x,x)) norm for 1d data structures and friends (abs(x) for scalars, det(x) for matrices?). I imagine norms are used quite a bit for vectors, although the norm is technically || x || for them.
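To make the "which norm" question concrete, here is a small sketch using the LinearAlgebra standard library. It only illustrates that several candidates exist for what |x| could mean; nothing here is a proposal for the syntax itself.

```julia
using LinearAlgebra

x = [3.0, 4.0]
n2 = norm(x)                # Euclidean norm (the default)
ndot = sqrt(dot(x, x))      # the same quantity written out
n1 = norm(x, 1)             # sum of absolute values
ninf = norm(x, Inf)         # largest absolute value
s = abs(-2.5)               # scalar case
d = det([1.0 2.0; 3.0 4.0]) # matrix case suggested above

println((n2, n1, ninf, s, d))
```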

Wouldn't standalone | or || create lots of insane parsing edge cases? Does 2 | x | in the REPL mean 2 bit-or x bit-or something else to be continued on the next line, or 2 * norm( x )? Compound brackets with directional hints such as [| x |], {| x |}, |: x :|, etc. may make a bit more sense, at the risk of creating line noise. Or we start to use unicode brackets ⟦ ⟧ ⟨ ⟩ ⟪ ⟫

That would apply to the original proposal as well, right? Of those, |: x :| looks cleanest.

So is {...} going to be the new Matrix magic, or the type signature of a Tuple?

I don't think using | | as brackets is going to happen. However it would be great to have a general approach to using more kinds of brackets --- a standard way to parse ⟦x⟧, x⟦i⟧, etc. x[] as getindex does not have an obvious generalization.

Quite right... we have the current behavior at the parsing level, before even getting to what they do

type   with prefix    without prefix
[]     ref           vcat (maybe hcat), comprehension
{}     curly         cell1d, comprehension (and in 0.4: tuple type?)
()     call          tuple, or nothing (AST abstracted it away)

(Wow the brackets wear so many hats. I'm not sure I have all of them listed.)

New brackets could follow a simplified pattern, mapping into refdoublesquare and catdoublesquare for the prefixed and non-prefixed cases. Then we decide what they do: set, norm, matrix notation, dictionary shorthand, etc.

The unicode brackets I'd be content just to parse for now. But some alternative ascii brackets like [| |] are a real possibility for use in Base.

so now [| ... |] for matrix construction is a real possibility.

julia> [[1, 0] , [1, 0] ]
4-element Array{Int64,1}:
 1
 0
 1
 0

julia> [[1, 0] for i=1:2]
2-element Array{Array{Int64,1},1}:
 [1,0]
 [1,0]

Is this fixed? We do have a clear separation now: spaces and semicolons concatenate, anything else constructs an array from elements.
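The separation described here can be checked directly. This sketch assumes a recent Julia 1.x release, so the first result differs from the 0.3-era session shown above.

```julia
# Assumes a recent Julia 1.x release.
nested = [[1, 0], [1, 0]]   # commas: a vector whose elements are vectors
flat = [[1, 0]; [1, 0]]     # semicolon: vertical concatenation
side = [[1, 0] [1, 0]]      # space: horizontal concatenation into a 2x2 matrix

println(typeof(nested))
println(flat)
println(size(side))
```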

I will always dislike whitespace as syntax for hcat. But it's likely that ship has sailed by now. The separation is certainly much nicer post-8599 at least.

I can see a case for the consistency in making all concatenation operations (hcat, vcat and hvcat) always returning at least a 2D array. See this post from Bill Hart on the users list. But that's perhaps a separate issue.

Is it possible to create a 1x1 Array?? e.g. in something like the following (silly) example:

julia> [1, 2] * [3 4]
2x2 Array{Int64,2}:
 3  4
 6  8

julia> [1, 2] * [3]
# fails

Here [3], [3 ], [3;] (all permutations I've tried) all create an Array{Int, 1}; it'd be nice if there were some way to force 1x1...

julia> Array{Int, 2}([[1 3]]) # this is cheating
1x2 Array{Int64,2}:
 1  3

julia> Array{Int, 2}([[1]])
# fails

[1]' works. reshape is another option.
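A few ways to get a 1x1 matrix through ordinary function calls, as a sketch assuming a recent Julia 1.x release. The variable names are illustrative only.

```julia
using LinearAlgebra

a = reshape([3], 1, 1)   # reinterpret a one-element vector as a 1x1 matrix
b = fill(3, 1, 1)        # allocate a 1x1 matrix directly
c = hcat(3)              # hcat of a single scalar yields a 1x1 matrix
p = [1, 2] * a           # the product from the example now has matching shapes

println(size(a))
println(size(p))
```

With a genuine 1x1 right-hand side, the multiplication that failed for `[3]` gives a 2x1 matrix.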

Is typed hcat doing something funny right now? https://groups.google.com/forum/#!topic/julia-users/E3G686bg9lE probably worth its own issue... cc @SimonDanisch

For literals with extra dimensions, would something like Int{2}[1;2;3;4] work or does it appear too much at odds with current type parameter syntax?

You could have:

Int[1;2;3;4]     #4-element Array{Int,1} (unchanged)
Int{1}[1;2;3;4]  #same
Int{2}[1;2;3;4]  #4 x 1 Array{Int,2}
Int{2}[1,2,3,4]  #error?
Int{3}[1;2;3;4]  #4 x 1 x 1 Array{Int,3}
Int{3}[1 2; 3 4] #2 x 2 x 1 Array{Int,3}
Int{1}[1 2] #error

Presumably you would require a type name as {2}[x] might look a little too weird.

Hm this does look a little weird and confusing.
I like the previously discussed [| solution, as @JeffBezanson suggested in Julia-users.
It enables us to clearly differentiate between concatenation and concatenation+flattening, which seems to be the crucial difference here.
This would justify simply handling these two cases with different syntax, making it very clear what will happen:

# All cases with optional typing, ensuring that you don't end up with Any[], 
# which I think is what the typing is really for
[vec, vec] => [vec, vec] 
[vec vec] => 
[vec 
vec] 

[| vec, vec |] => [el1, el2, el3, el4, ...]
[| vec vec |]  => 
[el el2 
el3 el4]

I don't know how I overlooked this one when I opened #10338 , cause it's the same (so I will close it in favor of this older issue). However, I don't think this should become v0.5. There is nothing difficult implementationwise, it's just a matter of decision. And since concatenation vs array construction is already undergoing changes in julia 0.4, it would be annoying to change it again in 0.5. Better make the correct well-considered decision right away and then stick to it.

I hope the following summary is useful for any remaining discussions:

How to distinguish between array construction and concatenation?

_Proposal 1: distinction via separator type_
This is what currently exists in master, i.e. pre-0.4

  • Array construction uses commas as in [a,b,c] or T[a,b,c]
  • Concatenation uses semicolons for vertical concatenation and spaces for horizontal concatenation as in [A B; C D] or T[A B; C D]
  • Remarks: there is only a pure Vector=Array{T,1} constructor, no pure Matrix=Array{T,2} constructor that does not concatenate; constructing a Matrix with a single column is hard and requires tricks using reshape or transpose which is not future proof as both these functions might change in the future (returning a special type of view instead of a pure Array).
  • Typed concatenation is not fully functional as it cannot be used to concatenate doubly nested arrays into an T=Array{...}; maybe it has to go but this means that it depends on the contents whether T[ ... ] is allowed.
  • Spaces as separators are also often criticized as hard to parse and easy to make mistakes.
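The points under Proposal 1 can be illustrated with a short sketch. It assumes a recent Julia 1.x release, which (to my understanding) follows the separator-based convention; `onecol` shows the reshape workaround mentioned in the remarks.

```julia
# Assumes a recent Julia 1.x release.
A = [1 2; 3 4]
v = Float64[1, 2, 3]               # typed construction with commas
m = Float64[A A; A A]              # typed concatenation with spaces and semicolons
vv = Vector{Int}[[1], [2, 3]]      # typed construction keeps the vectors as elements
onecol = reshape([1, 2, 3], :, 1)  # single-column matrix via reshape

println(typeof(v))
println(size(m))
println(typeof(vv))
println(size(onecol))
```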

_Proposal 2: distinction via bracket type_

  • Use one pair of brackets for pure array construction without concatenation (e.g. [ ... ]) and another pair of brackets (e.g. [| ... |]) for concatenation.
  • Minimal breakage with v0.3 would actually require the reverse convention, i.e [| ... |] for pure array construction and [ ... ] for concatenation.
  • Both vectors and matrices can easily be constructed: [a, b, c, d] and [a b; c d]. One could even use commas instead of spaces in matrix construction as [a, b; c, d] and it is easy to construct a matrix with a single column as [a; b; c; d].
  • Typed concatenation can easily be eliminated if wanted: there is one pair of brackets that can be prefixed by a type and another pair which cannot.

I personally prefer proposal 2 - especially doing away with spaces as separators and using commas instead. This would be a big breaking change, one that I feel we should have done a long time back.

I would also prefer alternative 2.

:100: for proposal 2)

I agree that supporting only commas and semicolons as separators would be much clearer. How about making [a, b, c, d] create a column vector (as currently), [a, b, c, d;] create a one-column matrix, [a; b; c; d] a one-row matrix, and [a, b; c, d] a block matrix?

Likewise in favor of proposal 2.

Sorry, hit the wrong button.

Option 2 does seem clearer.

Assuming we use [| and |] for concatenation, I think it would be nice to still allow spaces/new lines for matrix construction in this context, but disallow spaces in normal, non-concatenating array construction. I find matrices constructed like this easier to read with spaces.

The reason I don't like spaces is because the code becomes sensitive to whitespace. If you have matrix expressions that are being concatenated, you can't have spaces in them. This can lead to subtle bugs and can be difficult to debug.

I have no strong preference about the spaces. One suggestion could be that the parser allows spaces if every element is a single symbol, and returns an error if subexpressions occur in combinations with spaces. But that's probably also prone to raising many questions, and I have no idea how hard that is for the parser.

[a, b, c, d;] create a one-column matrix, [a; b; c; d] a one-row matrix

Did you mean the other way around with that? Otherwise that would be reversing the current convention and a little inconsistent with the block matrix version.

I've personally gotten in the habit of using hcat and vcat in conventional function syntax more often. Spaces are really only appropriate when every entry is a single identifier (single character even) and the value of each of those identifiers is a scalar. It might ease some pain if hcat and vcat accepted element type as an input. Maybe hvcat too but that one's a little messier to use manually.
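For reference, the function-call forms mentioned here look like this. The sketch uses only `hcat`, `vcat` and `hvcat` from Base; the tuple passed to `hvcat` gives the number of blocks in each row, which is the part that makes it messier to write by hand.

```julia
A = [1 2; 3 4]
wide = hcat(A, A)                 # same result as [A A]
tall = vcat(A, A)                 # same result as [A; A]
grid = hvcat((2, 2), A, A, A, A)  # same result as [A A; A A]

println(size(wide))
println(size(tall))
println(size(grid))
```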

@tkelman My idea was that [a, b, c, d] creates a column vector with one dimension, so [a, b, c, d;] would simply add the second dimension, keeping the organization of entries in a column. Sounds logical that way, but it indeed goes completely against how it currently works. Maybe we need another symbol...

I guess that's indeed the downside of using commas instead of spaces as the horizontal separator in the matrix constructor; in the vector constructor people would think of it as a vertical separator, so that becomes confusing. I was doubting whether I should mention that suggestion precisely because of this.

The reason I don't like spaces is because the code becomes sensitive to whitespace. If you have matrix expressions that are being concatenated, you can't have spaces in them. This can lead to subtle bugs and can be difficult to debug.

Roger that (and understand that point of view). Still prefer spaces in some circumstances. ;-)

Well before I started using Julia, I made the switch from Matlab to Python/Numpy/Scipy, and while I was happy that I could do much of what I used to do in Matlab, I remember feeling rather annoyed at how cluttered and verbose matrices became to type with commas (and Python's brackets, which aren't relevant here). For the uses I have, the matrices usually just look cleaner with spaces.

What can I say, I'm "space-sensitive". ;-)

Someone tell me whether or not this is a crazy idea or would work at all. Get rid of spaces as concatenation for normal use. But write a shorthand macro @[ that uses normal macro parsing rules for spaces and tries to act like the current array literal constructor for simple cases. I might be wrong, but I have a sneaky suspicion that a lot of the use cases where spaces are currently unambiguous and more compact are probably not also needing an element type specification.

It didn't occur to me that @[ ] is available syntax. Clever.

Fortunately I'm not _completely_ evil, and the parser just has a single "space sensitive" setting used by both macro calls and concatenation, so the rules are the same in both those cases.

Ah, convenient, they already share in implementation. So I'll vote (edit: oh wait, I already did last year - vote early, vote often, right?) for [| |] as concatenating construction requiring commas, [ ] as pure array requiring commas, and maybe for people who really like spaces, @[ ] as space-sensitive array. @[| |] as space-sensitive concatenating constructor even?

Luckily [1, 2; 3, 4] does at least parse in 0.3.6 (but not in 0.3.0? I'll have to find which backport fixed it), so at least that syntax for block matrices could be done in Compat.jl. For single-row matrices, function-syntax hcat should work on both. I don't think we have a syntax currently for creating a one-column matrix directly (not counting transposing a vector twice)?

While we're spitballing massive syntax changes here, one option now that { ... } is being made available would be {a, b, c} for array construction and [a, b, c] for array concatenation. You could use commas and semicolons for both and eliminate spaces. This actually has the advantage of being more backwards compatible with 0.3 and before since [a, b, c] would not change meaning and {a, b, c} would only change element type. The major drawbacks are that there's no way to express Any[a, b, c] and since the space-sensitive mode still exists for macros, it still doesn't entirely remove the space-sensitive parsing mode. One of the things I've got my eye on for spaces is something like f a b for f(a,b) or even generalizing a in b. Perhaps that's not incompatible with the way macros are parsed.

Let's take a look at them:

a = {1,2,3,4;
     3,2,3,4;
     7,2,3,1}

a = [1,2,3,4;
     3,2,3,4;
     7,2,3,1]

a = [|1,2,3,4;
      3,2,3,4;
      7,2,3,1|]

a = {1 2 3 4;
     3 2 3 4;
     7 2 3 1}

a = [1 2 3 4;
     3 2 3 4;
     7 2 3 1]

a = [|1 2 3 4;
      3 2 3 4;
      7 2 3 1|]

I think all of the solutions look pretty acceptable, while spaces look a little better than , and { looks a little better than [| (probably greatly depends on the font).
But I don't perceive the difference as very big, so I would decide by convenience and clarity.
For people who are not used to Matlab, , makes much more sense as it's consistent with the 1D case and it would remove spaces, so I'd vote for , instead of spaces.
I'd vote against { since I find typing the array pretty crucial. How else would you force the array to be of type Any, or a common abstract type?

In Scala I liked that you can write map sin A especially for f for f(). But it also introduces some randomness in your code, because you can choose one syntax or the other in an arbitrary fashion. Which is pretty annoying and makes part of the code hard to read.
So I think I would vote against this... I'm not entirely set on this, but I think it's a slippery slope for readability ;)

Why can't we have typed array construction if { ... } is used for array construction? It's not simply obtained by overloading getindex, but this gives problems anyway if one tries to create an array with a Tuple element type. But the parser could transform T{ ... } into a specialized call, just like there currently is typed_(h)(v)cat.

I guess the main problem is with backward compatibility, namely that T[ ... ] would either be deprecated or would become a typed version of concatenation.

@Jutho, the issue is that X{...} is the syntax for parametric types.

Ah yes, how could I not think of that. Maybe I should try to focus on one thing at a time.

I think it should work completely symmetrically; this way it's the easiest for the user to learn, and they don't run into the moment of "heyy, why doesn't the same principle apply here, I need to ask the mailing list about this odd behavior".
So I'd suggest that T[ + T[| stay the syntax for defining the resulting array element type.

T[...] is also the syntax for indexing, which doesn't seem to be a problem in practice but is a little iffy.

What I was trying to get to is that by using new brackets for array construction (e.g. [| ... |] but clearly not { ...}), we can have typed array construction, i.e. T[| ... |] could give rise to a specialized typed constructor call, that also works with T<:Tuple. Then [ ... ] would be concatenation, like it has been in v0.3 and before, and T[ ... ] could be deprecated in favor of T[| ... |]. That seems like a rather smooth transition strategy.

@SimonDanisch: there are some problems with typed concatenation if you want to use it with T<:AbstractArray. What happens if I want to call typed_vcat(Vector{T},a) where a is some object of type Vector{S}? Should one try to convert a to a Vector{T}, or should a be expanded and the individual elements of type S be converted to Vector{T}? This probably depends on what S and T are, but with more levels of nesting this becomes essentially unsolvable. That, and the problem that T[...] interferes with indexing into a tuple type.

So I think deprecating typed concatenation is fine. Either you force element type using a pure array construction and don't concatenate (or use splatting for simple concatenations), or you use the convenience syntax for concatenation and accept the fact that it has limitations.

...one option now that { ... } is being made available would be {a, b, c} for array construction and [a, b, c] for array concatenation.

This is very similar to MATLAB syntax, again--cell arrays have non-concatenating construction behavior. Of course Julia would be doing something more tightly typed here, but the structure is similar.

I see... Bummer! ;)
Although I thought the general rule would be quite simple.
For typed_vcat(::Type{T}, A...) something like:
First try to flatten any iterable and then try to convert the elements to T.
If your vector in A doesn't have Vector{T} as elements, then don't type vcat's element with Vector{T}.
With the new syntax it would also be a lot easier to actually construct the typed vcat you're talking about:

a = Vector{Int}[|[[1],[1],[1]], [[1],[1],[1]]|] # [| for concatenation
#-> [[1], [1], [1], [1], [1], [1]]
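The intended result of that example can be reproduced without new syntax by removing one level of nesting with `reduce(vcat, ...)`. This is only a sketch of the flattening step; it is not the typed `[| ... |]` form proposed above.

```julia
nested = [[[1], [1], [1]], [[1], [1], [1]]]  # vector of vectors of vectors
flat = reduce(vcat, nested)                   # removes exactly one level of nesting

println(typeof(flat))
println(length(flat))
```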

We could use {a,b,c} for array construction and [a,b,c] for concatenation and come up with a different syntax for specifying the element type. For example, maybe T@{a,b,c} or T@[a,b,c]. I believe that Int.{a,b,c} and Int.[a,b,c] are also available.

For specifying the element we could mimic Go by putting the element _after_
the brackets:

x = {a,b,c}Int
y = [a,b,c]Float64


I guess now we have an embarrassment of brackets. I've come to dislike [a,b] concatenating. It just "looks like" array construction, e.g. if you know python. One problem is that the behavior on scalars ([1,2,3]) makes you think it does array construction. [ ] are also more valuable real estate since they look better and don't require shift. I think we would see a lot of [1,2,3], but then you'd have to remember to switch to a different syntax for more general elements or to specify an element type. So being closer to matlab cell arrays is not too appealing to me.

Array construction is more general than concatenation in the following interesting way: if [a b; c d] gave a 2x2 array just referencing a, b, c, and d, you could get concatenation with one extra call, e.g. blockmat([a b; c d]). Furthermore, this is pretty efficient since the amount of extra allocation doesn't depend on the sizes of the arguments. Going the other way, starting with concatenation, is not possible. So I could live without concatenation syntax easily.
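A minimal sketch of the "one extra call" idea. `blockmat` here is a hypothetical helper named after the comment above, not a function in Base, and because `[A B; C D]` concatenates in released Julia versions, the array of blocks is built with `reshape` and `permutedims` instead.

```julia
# `blockmat` is a hypothetical helper; it is not part of Base.
function blockmat(blocks::AbstractMatrix{<:AbstractMatrix})
    # Glue each row of blocks side by side, then stack the resulting strips.
    strips = [reduce(hcat, blocks[i, :]) for i in axes(blocks, 1)]
    return reduce(vcat, strips)
end

A = [1 2; 3 4]
B = [5 6; 7 8]
C = [9 10; 11 12]
D = [13 14; 15 16]

# A 2x2 array that only references the four blocks (row-major order A B / C D).
blocks = permutedims(reshape([A, B, C, D], 2, 2))
M = blockmat(blocks)

println(size(blocks))
println(size(M))
```

The array of blocks holds four references regardless of block size, which is the efficiency point made above.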

+1
I also realized today that your earlier suggestion to have a python-like array function for constructing arbitrary multi-dimensional arrays (with possibly N>2) from nested vectors would look really ugly if array construction required braces or, even worse, the [| ... |] brackets.

Though one possible caveat is if you want to construct a block matrix such that the different blocks are unequal and thus there could be a different number of blocks on the different rows. something like
[A ; 1 2] where A has two columns.

Hmm yes, that's a good point. That's probably one of the main reasons things are the way they are now.
However it's almost an accidental feature; only the rows work that way and you can't do the same trick with columns. I wonder if there is a clever solution.

2D/nD splatting?

R = [0.5 0.5; 0.5 0.5]
t = [2.0, 3.0]

Rt = [R... t...; 0 0 1]
# Gives: Rt == [ 0.5 0.5 2.0
#                0.5 0.5 3.0 
#                0.0 0.0 1.0 ]

I could live with this.
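For comparison, the result shown in that example is what existing concatenation syntax gives when the blocks are written without splatting. The sketch assumes a recent Julia 1.x release; it relies on rows being allowed to contain different numbers of blocks, the feature discussed just above.

```julia
# Assumes a recent Julia 1.x release.
R = [0.5 0.5; 0.5 0.5]
t = [2.0, 3.0]

# First row: two blocks (R and t); second row: three scalars.
Rt = [R t; 0 0 1]

println(size(Rt))
println(Rt[:, 3])
```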

I think Stefan has an interesting idea to add dedicated syntax for specifying the element type of an array construction/concatenation.

It would decouple the rest of the syntax decisions a lot from the current considerations of which special cases collide with existing syntax.

And although I really like using eg Int[x, y, z] syntax, I think that we have seen by now that this overloading of indexing into types is not quite without its problems. This could be a chance to improve on that situation.

I would personally vote for a syntax that doesn't use @, since I don't want to get desensitised to the feeling that there is something special going on when it is involved.

I believe matlab's concatenation has the same limitations as hvcat, btw?

[ ] are also more valuable real estate since they look better and don't require shift.

That's right, they require AltGr. (Not trying to make a point for this particular discussion, just a gentle reminder that the international outlook on what characters are easy to type may vary.)

Hmm yes, that's a good point. That's probably one of the main reasons things are the way they are now. However it's almost an accidental feature; only the rows work that way and you can't do the same trick with columns. I wonder if there is a clever solution.

That is indeed true. I guess not allowing blockmat([A; 1, 2;]) and having to write blockmat([A; [1,2;]]) is not too bad. It's more symmetric, since in the column case you would also need to write blockmat([A, [1; 2];]). Note however, that I need many semicolons to prepare the input of blockmat if I don't use spaces as horizontal separators. In the first case, I want to construct a (2,1) matrix containing elements A and the (1,2) matrix [1,2;]. In the second case I want to construct a (1,2) matrix containing the elements A and the (2,1) matrix [1;2]. Keeping spaces around as horizontal separator would be helpful for such cases, i.e. then it becomes

A=randn(2,2)
blockmat([A; [1 2]]) # adding an extra row A
blockmat([A [1; 2]]) # adding an extra column to A

which looks perfectly symmetric.

Also note that we could have a blockmat custom string literal and do this kind of thing:

X = blockmat"""
    A B
    C D
"""

This could allow us to get rid of the space-sensitive parsing for concatenation.
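A rough sketch of how such a string literal could be written as a string macro. This is one possible reading of the idea, not an existing API: the macro name, and the assumption that every entry is a plain variable name separated by whitespace, are mine.

```julia
# Hypothetical string macro: each whitespace-separated word names a variable.
macro blockmat_str(s)
    # One entry per non-empty line, each split into block names.
    rows = [split(line) for line in split(s, '\n') if !isempty(strip(line))]
    counts = Tuple(length.(rows))   # number of blocks in each row
    args = [esc(Symbol(name)) for row in rows for name in row]
    # Expand to the same call the bracket syntax would produce.
    return :(hvcat($counts, $(args...)))
end

A = [1 2; 3 4]
B = [5 6; 7 8]

X = blockmat"""
    A B
    B A
"""

println(size(X))
```

Since the layout lives inside a string, the whitespace sensitivity is confined to the macro rather than the language parser.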

Another possibility is to pad with empty arrays: blockmat([A []; B C]), although that could lead to type pollution.

Blue-sky mode engaged:

Wide row:

mat"""
    A -
    B C
"""

Tall column:

mat"""
    A B
    | C
"""

Partitioned augmented matrix:

mat"""
    A 0 X
    0 B |
"""

I would personally vote for a syntax that doesn't use @, since I don't want to get desensitised to the feeling that there is something special going on when it is involved.

+1 to this.
Is there something that speaks against [...]T / {...}T ?
If we follow @JeffBezanson's blockmat proposal, {...}T could still be used for FixedSizeArrays, allowing them to follow the same principles for the creation of fixedsize matrices.
If we combine this with , instead of spaces, I feel like most of the basic demands are satisfied.
The last use cases with the string literals could then be an addition to this, probably not living in Base.

a,b,c... = 1
const T = Int # T is in all cases optional

A = [a,b,c]T
#-> Array{T, 1}
B = [a, b, c; 
     c, d, f]T 
#-> Array{T, 2}
C = blockmat([A;A]) # and why not also: blockmat([A;A], T) 
#-> Array{T, 2}

AF = {a,b,c}
#-> FixedSizeArray{T, (3,)}
BF = {a, b, c; 
     c, d, f}T
#-> FixedSizeArray{T, (2,3)}
CF = {A;A}
#-> FixedSizeArray{ Vector{T, 1}, (1,2)}
DF = blockmat(CF)
#-> FixedSizeArray{T, (2,3)}
EF = blockmat({AF;AF})
#-> FixedSizeArray{T, (3,3)}. Probably more copies used than optimal?

I'm fine with this as it covers my use cases very well, but I don't know what the matlab people will think of their precious concatenation.
Hopefully not something analogous to one of my favorite quotes:

In the beginning the Universe was created.
This has made a lot of people very angry and been widely regarded as a bad move.

As this is a very crucial change, directly influencing how Julia is perceived, we should probably treat this a little more analytically.
We could first compile a list of use cases, then another list with solutions that satisfy the use cases, and then we could run a user poll.
I might write something down later.

Use statistics over packages would be good to have.

Though there are some advantages to moving the type to the other side ([1,2,3]Int), the change seems a bit harebrained to me, like we're just messing with people.

I don't think we should pick a lucky data structure to give braces to. [ ] should construct Arrays (not concatenate), and { } can be something else entirely. Sets and Dicts are justifiable uses, but I think even those are not common enough to need special brackets instead of Set([a,b,c]).

So nobody gets {} ?
I agree, it shouldn't be an arbitrary lucky datatype, but rather be based on the usage statistics.
Also, FixedSizeArrays would be a good use case, as it needs exactly the same set of constructors as normal arrays. (Though I'd understand, if they're not used enough to justify their own syntax)
The promised lists (a little hastily put together):

Use Cases:

  • non concatenating vector creation
  • non concatenating matrix creation
  • concatenating vector creation
  • concatenating matrix creation
  • keep Julia close to matlab's syntax, making it easier to switch
  • get rid of spaces
  • concat syntax for a mix of elements and vectors
  • typing the element type of any created array
  • make everything intuitive to use and smooth looking (duh! )

There are obviously conflicting items on this list, like get rid of spaces while keeping close to matlab.

Implementation building blocks:

non concatenating vector/matrix creation
  • anything created with [ is non concatenating
  • introduce other syntax ({, |, [|, blockmat()) for concatenation
get rid of spaces
  • use , instead of spaces
  • use string literals for creation
concat syntax for a mix of elements and vectors
  • use string literals for creation
  • 2D/nD splatting (could also make concatenating obsolete)
  • use placeholder symbols
typing the element type of any created array
  • T[
  • ]T
  • T@[

Implementations

satisfies:
  • [x] non concatenating vector creation
  • [x] non concatenating matrix creation
  • [x] concatenating vector creation
  • [x] concatenating matrix creation
  • [ ] keep Julia close to [matlab's syntax]
  • [x] get rid of spaces
  • [ ] concat syntax for a mix of elements and vectors (not explicitly)
  • [x] typing the element type of any created array
  • [ ] make everything intuitive to use and smooth looking (blockmat not that nice for nested expressions)
a,b,c... = 1
const T = Int # T is in all cases optional

A = [a,b,c]T
#-> Array{T, 1}
B = [a, b, c; 
     c, d, f]T 
#-> Array{T, 2}
C = blockmat([A;A]) # and why not also: blockmat([A;A], T) 
#-> Array{T, 2}

AF = {a,b,c}
#-> FixedSizeArray{T, (3,)}
BF = {a, b, c; 
     c, d, f}T
#-> FixedSizeArray{T, (2,3)}
CF = {A;A}
#-> FixedSizeArray{Vector{T}, (1,2)}
DF = blockmat(CF)
#-> FixedSizeArray{T, (2,3)}
EF = blockmat({AF;AF})
#-> FixedSizeArray{T, (3,3)}. Probably more copies used than optimal?

No time for adding more complete implementation permutations ;)

Best,
Simon

Is this really v0.5 material? The array construction/concatenation interface is already drastically changing from v0.3 to v0.4 and then drastically changing the same aspect of the language again in v0.5 seems like it will annoy many users.

+1

Ah I guess I misread the milestone, this is still v0.4 material. Good to see and my apologies for the spam.

From @JeffBezanson's talk yesterday here at JuliaCon2015, it sounds like {} may be given to Tuples.
Would that mean that () for tuples is deprecated, and {expr} works instead of having to remember to stick a , in (expr,) (something that caused me no end of problems in my first weeks)?

it sounds like {} may be given to Tuples.
Would that mean that () for tuples is deprecated

No, {} would be tuples of types, the current Tuple{}. Tuples of values would still use ().

Ah! OK, that's fine too. Looking forward a lot to the subtyping shown yesterday.

OK, time to decide how to close out the 0.4 part of this. I see three options:

  1. Turn on the new [a,b] behavior now (_oldstyle_array_vcat_ = false).
  2. Produce a release candidate, give people 1-2 weeks to fix deprecation warnings, then flip the switch.
  3. Leave the old behavior enabled and deprecated for all of 0.4.

I vote for 1 or 2.
cc @StefanKarpinski @ViralBShah

I vote for 1 but there will probably be cries of anguish.

I have generally seen a non-zero number of package tests that are still running into this warning now. Turning this off will break their code before they get a chance to see and fix their behavior. I suspect that many package authors have been following the recommendation to wait for the release of v0.4-pre to update their code from v0.3 and will be unhappy if their code breaks instead of getting a helpful deprecation warning.

I vote for not rushing this (aka 3), since it apparently hasn't been that much of a problem for several years (since it is only just getting fixed now).

This isn't really about fixing a problem; it's a behavior change.

I would be ok with 2 as well, which is the slightly less drastic version of 1.

1 or 2 here also (this bit my 9 year old son last week, he was very happy to hear that it was going to be changed).

Won't anything but 3 mean that anyone who has been running just 0.3 will, upon upgrading to the released version of 0.4, encounter silent breakage, with no recommendation about how to fix it?

Yep. 3 is the only choice that actually follows our deprecation policy.

Policy: for those times when you need a reason not to do what everybody wants.

I agree it's very inconvenient---for the record, personally I too would rather just make this change now. But I don't think we can. There is merit in "release discipline."

Think of it as motivation to make more frequent releases :smile:.

This isn't really a deprecation, though. It's a major change in semantics. As things stand, we're in this strange middle-ground where there's a warning for syntax that both was and will be completely valid. Blocking that syntax for an entire release cycle would be frustrating… but I also understand that silent breakage would also be very frustrating.

I like option 2 — there are lots of broken packages on 0.4 right now. We'll want to spend some time in the pre- stage cleaning things up and making sure things work. May as well give folks the warning as they're fixing things, and then flip the switch on the second release candidate. (We'll also have a chance to reconsider at that point, too.)

Echoing @vtjnash's point, I'll point out that I lobbied for this rather some time ago, but someone else pointed out we couldn't change it :stuck_out_tongue:.

No strong opinion here on 1-2-3, but is it clear that the current _oldstyle_array_vcat_ = false behavior is what we want to stick with? I thought that around and below here there was somewhat of a consensus to fully separate array construction and array concatenation, not by the type of separator used but by the type of brackets used. The latter would buy several advantages over the former.

That's a fair point --- we might want a more comprehensive overhaul of concatenation syntax, in which case we could leave this deprecation in place and change everything at once in 0.5.

I'm pretty sure [a,b] should construct a 2-element vector. We might make other changes, but I don't think that should change. So it's not totally crazy to change just that first.

So far the suggestions in this thread I like the most are having [a b; c d] only construct arrays, and [| |] concatenate, and use Int.[ ] for typed arrays. [a,b,c;] to construct a 1-column matrix is kind of ok too.

I'd vote for 3. Worst case scenario for 1 or 2 is that someone is doing some kind of analysis with private code not in a package and updates 0.3 -> 0.4 only once it's released. There are situations in which one could get different, incorrect results with no warnings at all, which, while not likely in most uses, is a really scary possibility.

The fact that we might further change this syntax in the Arraymageddon is a really good point. I think that tips my vote in favor of 3 and waiting to see how all that pans out.

Ok, we'll leave the warning and hope we can fix this once and for all in Arraymageddon.

Is it a good idea to change the concatenation vs construction syntax in two steps (not talking about the deprecation period)? If people learn in v0.4 that [a, b, c] will no longer concatenate, they will switch to [a; b; c]. This syntax would then be changed again in a second cycle, to a non-concatenating one-column matrix constructor. So in v0.5 [a; b; c] would probably yield a deprecation warning with a suggestion to use a different pair of brackets for concatenation, and the final syntax would only be settled in v0.6.

I was hoping that the bracket based (instead of separator based) distinction between construction and concatenation would be completed in v0.4, though still disabled in favor of the old behavior with deprecation warning, and then become enabled in v0.5. I am willing to help with the remaining work on the julia side, but cannot offer assistance on the parser side.

Why [|...|] for concatenation? Would that be [| a, b, c |] to act like vcat(a, b, c)?
In that case, why not consider also a ++ b ++ c, and not have any syntax with [ and ] that concatenates?

Block matrices.

I'm increasingly in favor of writing vcat(a, b, c) and hcat(a, b, c) and having a string literal syntax for block matrices that looks something like this:

mat"""
 A B
 C D
"""

It can generate optimal code for constructing the new matrix from the inputs, which is an advantage over the current lowering to vcat and hcat calls.

@StefanKarpinski 's string literal idea seems rather interesting, and seems a lot more understandable than something with [| and |].

One could also support convenient things like this:

mat"""
 A 1
 0 B
"""

This would be the equivalent of what one currently expresses as:

[ A                            ones(size(A,1), size(B,2))
  zeros(size(B,1), size(A,2))  B                          ]

I've always rather wanted that.
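As a rough sketch of the explicit form, assuming current Julia 1.x and deliberately non-square blocks: the top-right filler needs size(A,1) rows and size(B,2) columns, and the bottom-left filler needs size(B,1) rows and size(A,2) columns. The sizes and fill values below are made up for illustration.

```julia
# Hypothetical blocks, chosen non-square so that a wrong filler shape would error.
A = fill(2.0, 2, 3)   # 2x3
B = fill(3.0, 4, 2)   # 4x2

# Explicit block construction with correctly shaped fillers.
M = [ A                            ones(size(A, 1), size(B, 2))
      zeros(size(B, 1), size(A, 2))  B                           ]

println(size(M))   # (6, 5)
```

With square A and B of equal size the argument order of the fillers does not matter, which is probably why it is easy to get wrong.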

It could also easily come in different flavors like:

fixedmat"""
1 1 1
0 0 0 
"""
So +1 from me =)

@StefanKarpinski, I also just came to that conclusion, and it would indeed be very nice. Though maybe with I instead of 1, being replaced with eye instead of ones (though both may be useful). Anyway, I have no particular preference for a concatenation syntax; I just wanted to make the point that it will be dreadful if in v0.4 there is a deprecation warning to use ; for concatenation, and then in v0.5 there is another deprecation warning that ; is now reserved for a non-concatenating matrix constructor and there is yet another new syntax for concatenation.

the current lowering to vcat and hcat calls

We actually call hvcat.

+1 for:

mat"""
 A I
 0 B
"""

although practically speaking it should probably be blockmat"""...""" or something similar.

We actually call hvcat.

That's what I thought but not what I saw when I tried it.
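For anyone wanting to check this themselves, here is a small sketch assuming Julia 1.x. As far as I can tell, a literal with both spaces and semicolons goes through hvcat, while a single row or a single column goes through hcat or vcat, which may explain seeing different calls depending on what one tries. The block sizes are arbitrary.

```julia
A = fill(1, 2, 2)
B = fill(2, 2, 3)
C = fill(3, 1, 2)
D = fill(4, 1, 3)

# Block literal and the equivalent explicit call: (2, 2) means two rows of two blocks each.
M  = [A B; C D]
M2 = hvcat((2, 2), A, B, C, D)

# The parser represents the literal as a :vcat of :row expressions.
ex = Meta.parse("[A B; C D]")
println(ex.head, " ", [r.head for r in ex.args])
```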

I'm on board for having some alternate block matrix or concat syntax, but putting code in strings always gives me the willies. It seems to me we could just as well do something like

@mat [
  a b
  c d
]

but putting code in strings always gives me the willies.

Yes! I always feel like representing code as strings was always a last resort. _Almost_ as bad as throwing stuff into a string and running eval. There's just too much freedom. I love rigorous, strict systems with clearly defined error behavior.

In fact, I'd go for almost anything _but_ a string.

Also:

  • IDEs have a terrible time with completion and highlighting inside strings

+1 for @mat.

+1 for @mat from me too. I also feel really uncomfortable putting code into strings.

+1 for @mat. I agree that is better/cleaner than strings.

+1 for @mat also, it's nice to move things out of special syntax built into the parser and into macros.

I also dislike putting code in strings, but the thing about @mat is that what comes after the macro needs to be valid syntax. I'd like to get rid of the fiddly whitespace sensitivity that occurs inside of array concatenation syntax. Having white-space sensitivity inside of mat"..." would be far more palatable.

@StefanKarpinski What about something like

@mat [A, B;
      C, D]

This could be made white space insensitive.

I didn't follow the discussion here closely, so sorry if this has been brought up before.

@StefanKarpinski are you saying that you would like

mat"""
A B
  C D
"""

to be translated to:

mat"""
A B 0
0 C D
"""

Would definitions like:

mat"""
longmatrixnameA B
C D
"""

be invalid because the columns don't line up vertically?

@tbreloff: no, I wasn't proposing that. I think you need explicit zeros.

I'm going back and forth on this now. I think the implementation can actually be more robust with the mat"""...""" version. It's a little more straightforward to split/parse/rebuild the string if we can assume that only symbols and numbers are allowed, especially if the syntax surrounding [] changes, or if the AST layout changes. If, however, we want arbitrary expressions to be allowed like:

@mat [
map(sin,rand(10,10)) a
b 0
]

then that might be really hard to get right with strings.

other questions...
1) If there are size discrepancies, how do you handle them? Pad with zeros? Throw an error?
2) How to handle mismatched shapes like:

a = rand(1,2)
b = rand(1,3)
c = rand(1,4)
z = @mat [
 a 0
 0 b
 c
]

# does z look like:
[ a1 a2 0 0
 0  b1 b2 b3
 c1 c2 c3 c4 ]

# or:
[ a1 a2 0  0  0
 0  0  b1 b2 b3
 c1 c2 c3 c4 0 ]

For the mat""" approach, wouldn't you just use interpolation for that case? i.e.

mat"""
      $(map(sin,rand(10,10))) a
      b 0
      """

(maybe a and b should really be $a and $b also?)

I don't think that works:

julia> macro mat_str(e); dump(e); end

julia> mat"""
       sin(rand(2,2)) a
       b 0
       """
ASCIIString "sin(rand(2,2)) a\nb 0\n"

julia> mat"""
       $(sin(rand(2,2))) a
       b 0
       """
ASCIIString "\$(sin(rand(2,2))) a\nb 0\n"
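The same behavior can be checked in current Julia with a sketch like the following. The macro name rawmat is hypothetical and exists only for this illustration; it simply hands back the text it receives, so the lack of interpolation is visible.

```julia
# Hypothetical string macro: returns its argument unchanged.
macro rawmat_str(s)
    return s
end

raw_text     = rawmat"$(1 + 1) a"   # the macro sees the characters `$(1 + 1) a`
interpolated = "$(1 + 1) a"         # an ordinary string literal interpolates first

println(raw_text)
println(interpolated)
```

So a mat"..." macro would have to find and parse any $(...) segments itself.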

I think the string macro is a great solution for constructing small matrix literals without commas, allowing construction with proper brackets to require them. It definitely should not be a generic solution for construction/concatenation though where you might want to put arbitrary code.

I'm somewhat skeptical there is much need, outside of language purity, for an Array{T,N}, N >= 2, construction syntax that doesn't flatten the elements. How often do people create matrices of vectors? Maybe the macro could be used for the pure construction behavior since it should be much less common and the current concatenating behavior can remain? I also think using different brackets is a fine solution.

OK, my bad, I'd thought string interpolation still occurred, before passing to the macro.
Of course, you _could_ do the following:

@mat """
       $(sin(rand(2,2))) a
       b 0
       """
Expr 
  head: Symbol string
  args: Array(Any,(2,))
    1: Expr 
      head: Symbol call
      args: Array(Any,(2,))
        1: Symbol sin
        2: Expr 
          head: Symbol call
          args: Array(Any,(3,))
          typ: Any
      typ: Any
    2: ASCIIString " a\nb 0\n"
  typ: Any

One complaint about DSLs using strings is the lack of syntax highlighting, but I wonder if this is something that could be overcome (in future) by defining lexer/parser for each macro. That's a whole other can of worms.

Most text editor syntax highlighting grammars support nesting grammars. In Sublime, for example, I hacked the julia.tmLanguage to highlight SQL within the sql"..." or query("...") functions/macros (I think @Keno did it for C++ too with cxx"..."). I don't think this has anything to do with Base though.

@tbreloff

We could make the syntax unambiguous by always producing the smallest possible output matrix.

a = rand(1,2)
b = rand(1,3)
c = rand(1,4)
z = @mat [
 a 0
 0 b
 c
]
# ==>
[ a1 a2 0 0
 0  b1 b2 b3
 c1 c2 c3 c4 ]

z = @mat [
 a 0
 0 b
 c 0
]
# ==>
[ a1 a2 0  0  0
 0  0  b1 b2 b3
 c1 c2 c3 c4 0 ]

I disagree that we should claim not to have space-sensitive syntax, and then actually have it within a special kind of string that we tell people to use. You still have space-sensitive syntax, plus the added problems of putting code in strings. This is an important language construct for us, so we ought to be able to parse it with just our normal parser. This makes life easier for writers of other macros or code analysis tools. If some syntax is worth having, it's worth having in the actual parser.
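To make the point about macro writers concrete, here is a minimal sketch, assuming Julia 1.x, of what a macro receives when it is given ordinary bracket syntax. The macro name rowlengths is hypothetical; it only inspects the parsed expression and never evaluates a, b or c, so they need not be defined.

```julia
# Hypothetical macro: report how many entries each row of a matrix literal has.
macro rowlengths(ex)
    (ex isa Expr && ex.head === :vcat) ||
        error("expected a multi-row array literal")
    lens = map(ex.args) do r
        # A row with several entries is a :row expression; a single entry is bare.
        (r isa Expr && r.head === :row) ? length(r.args) : 1
    end
    return Tuple(lens)
end

lens = @rowlengths [
    a 0
    0 b
    c
]
println(lens)   # (2, 2, 1)
```

The structure arrives already split into rows and entries, so a block-matrix macro could decide how to treat a short row such as the last one without doing any text parsing of its own.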

One of the things which attracted me to Julia was the cleanliness of the language - it's pretty and (mostly) easy to comprehend even for someone who is not a scientist and whose programming background is in K&R C, with little actual programming over the last couple of decades. What is not pretty and clean at all, however, is putting code into strings. For me it makes code hard to read - that's just the way my brain is wired. Likewise, using whitespace as a separator is just hideous from a readability point of view - I'd certainly prefer to see , and ; as mandatory separators. I'm also not sure it's a good idea to use the macro approach, as macros tend to be a slight turn-off for highly incompetent folks like myself, but it's certainly better from an approachability point of view than the string thing.

@Jakki42, would you think the string solution is not nice because it usually does not offer sensible syntax highlighting? In other words, would you be okay with the solution if we offer specialized syntax highlighting for it?

String macros are like the wild west. The macro could do _absolutely anything_. Which is awesome and powerful, but also can be crazy and hard to reason about.

Lots of folks have a rightfully-learned aversion to code in strings. And they're right. Code has no place in a string. It requires runtime parsing and evaluation, which is slow and a performance trap in almost all languages since there's no way for a static analyzer to reason about what the evaluation might result in.

But string macros don't necessarily return code in a string. They can implement their own parsing rules and place the resulting expression directly into the surrounding code before it's compiled. There could potentially be no difference between the text of a source file and the text within a string macro. It's all just parsed text. As a toy example, here's a string macro that simply transforms its whitespace separated contents into an addition between all of them:

julia> macro add_str(ex)
          toks = map(parse, split(ex)) # This could be more robust, but works as a simple example
          esc(Expr(:call, :+, toks...))
       end

julia> x = 2; y = 3;
       add"x y 2x*y sin(rand())"
17.775332435685616

julia> macroexpand(:(add"x y 2x*y sin(rand())"))
:(x + y + (2x) * y + sin(rand()))

This means they can implement DSLs, add rich text markup, precompile regular expressions, execute C++ code, and more. A string macro's contents can be entirely data, or it can have its own interpolation rules (with code demarcated by $ or \(…) or any rules it wants), or it can be entirely Julia code _in the same scope_ as its surrounding code, returning a parsed Julian expression.
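For readers trying the toy example on a current release: in Julia 1.x the parsing function lives at Meta.parse rather than the bare parse used in the 2015 session. A sketch of the same macro under that assumption, with the random term dropped so the result is deterministic:

```julia
# Same toy string macro as above, adjusted for Julia 1.x.
macro add_str(s)
    toks = map(Meta.parse, split(s))   # parse each whitespace-separated token
    return esc(Expr(:call, :+, toks...))
end

x = 2
y = 3
total = add"x y 2x*y"   # expands to x + y + (2x) * y

println(total)   # 17
```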

After all this, I'm still with @JeffBezanson. If this is an important enough construct to be included in the standard library, it should be a first-class parsed syntax. All the other string macros included in base currently follow the "entirely data" or "data with interpolation" semantics — adding this would cause lots of confusion, I think. The only way I could see a consistent story here is if we decide that backtick string macros (#12139) should always contain in-scope Julian code and all other string macros should be predominantly string data (which may happen to be code for another language… which is unfortunate for Cmd).

@SimonDanisch - for me it's not a syntax highlighting issue, but the messiness of the syntax itself - it's confusing and unclear to me, a deviation from how the rest of the syntax looks, as if from a different language - I cannot quickly glance through it; my old and slow brain needs plenty of extra effort to understand what's going on. Maybe it is my background, but to me [ ], ( ), { } clearly encapsulate something, make a unit or block of something, while " " and ' ' always just look like a string or character with no programmatic meaning inside. While < > also form a nice opening and closing, I'm so old that to my brain they do not equate to anything other than greater-than or less-than symbols, and extra effort is needed to understand if they were to have something meaningful inside. And whitespace - to me it's always just something not relevant to syntax, parsed out.

Anyhow, please keep in mind that I am a low end low priority user, certainly not a member of the main target audience groups of the language :-)

I find the whole notion of string macros for doing math slightly horrifying. This is terrible syntax. It is not merely ugly and messy; it is ad-hoc and inconsistent. One principle of design is that similar things should look similar, and different things should look different. String macros should typically do string-like things. For example, PyPlot uses L"..." for LaTeX strings. String-like syntax should be used for string-like objects. By the same principle, collections of things like tuples and arrays should use [ ], ( ), { }.

Another big problem with string macros, which was raised earlier, is that language syntax should be parsed by the parser. Macros force ad-hoc sub-languages inside Julia with their own alternate parsers. This is bad for tools that need to parse Julia (syntax highlighting was mentioned) and it is just bad design because macros in general are problematic. My biggest pet peeve with Julia is that @sprintf is a macro and not a function. Other languages manage to make sprintf functions, why can't Julia? So my view is that we need fewer macros in the core language, not more.

@dcarrera your question about the @sprintf macro was very nicely addressed here on stackoverflow by @StefanKarpinski.

We've addressed much of the core issue here by changing the meaning of [a,b,c] to always do array construction. The whitespace-sensitive syntax remains an issue, but not a huge one. In 1.x we can revisit this and try out new syntaxes for this, and eventually could remove the whitespace sensitive syntax in 2.0, but this isn't going to happen for 1.0.
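A short sketch of where that leaves things in Julia 1.x, for anyone landing here later: commas construct, while semicolons and spaces concatenate. The variable names are just for illustration.

```julia
a = [1, 2]
b = [3, 4]

nested = [a, b]    # construction: a 2-element vector whose elements are vectors
flat   = [a; b]    # vertical concatenation, same result as vcat(a, b)
side   = [a b]     # horizontal concatenation, same result as hcat(a, b)

one_range = [1:3]  # construction: a 1-element vector holding the range itself
expanded  = [1:3;] # concatenation: the range expanded into its elements

println(typeof(nested), " ", flat, " ", size(side))
```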

Just wanted to register my strong preference that none of the array literals (including matrices) do any form of concatenation. The concatenation behavior feels like an unnecessary MATLAB-ism to me. If we're building LinAlg to be block-matrix friendly, why not let [A B; C D] make a block matrix?

Regarding syntax changes, the syntax that appeals most to me is the one mentioned by @yuyichao, with commas and semicolons and no whitespace sensitivity. I can see this generalizing arbitrary dimensions by having other symbols like double-semicolon ;;, being whitespace invariant, allowing row matrices vs vector, etc.

# vector
[1, 2, 3, 4]

# matrix
[1, 2;
 3, 4]

# 3D array
[1, 2;
 3, 4;;

 5, 6;
 7, 8]

The only thing is I sometimes wonder if it would be nicer to do this in storage order not "looks like linear algebra" order. Otherwise [1, 2, 3, 4;] is a 1x4 matrix not a 4x1 matrix, it seems odd that the last semicolon can make such a drastic change.
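For comparison, and assuming Julia 1.7 or later, repeated semicolons did eventually become valid syntax, in a storage-order flavor: a single ; joins along the first dimension, ;; along the second, ;;; along the third. It still concatenates rather than constructs and spaces are still allowed, so it is not the same as the comma-based proposal above. A sketch:

```julia
v  = [1, 2, 3, 4]             # 4-element Vector, unchanged
m  = [1 2; 3 4]               # 2x2 Matrix in the familiar row-wise layout
m2 = [1; 3;; 2; 4]            # the same matrix written column by column
t  = [1 2; 3 4;;; 5 6; 7 8]   # 2x2x2 Array: two 2x2 slices along dimension 3

println(size(t))   # (2, 2, 2)
```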

The final thing I'd love is a literal for 0-dimensional arrays, as I find StaticArrays.Scalar extremely useful to control broadcast, etc. The possibility here is [a] makes a zero-D array and [a,] makes a 1D vector. I admit some users might find that obnoxious, but it has some logic to it (with , being the first dimensional separator and following the same rule as trailing ;, etc, plus we have to treat length-1 tuples with trailing ,, as in (a,) so there is a precedence.).

+1 for the rules proposed by @andyferris, except for the 0D array part ([a,] sounds too cumbersome).

On the other hand [a,] would be consistent with length 1 tuples (a,).
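Absent a literal, a 0-dimensional array can be spelled with fill in Julia 1.x, and Ref is the usual way to make broadcast treat a collection as a scalar. A small sketch with made-up values:

```julia
z = fill(5)                  # 0-dimensional Array{Int, 0} holding a single value

shifted = [1, 2, 3] .+ z     # broadcasts like a scalar

# Ref wraps the vector so that `in` is applied against the whole collection.
hits = in.([1, 2, 5], Ref([1, 2, 3]))

println(ndims(z), " ", z[], " ", shifted, " ", hits)
```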

Besides all the issues, whitespace sensitivity is so damn nicely lightweight :cry:

[1 2 3
 4 5 6
 7 8 9]

vs

[1, 2, 3;
 4, 5, 6;
 7, 8, 9]

Related to this issue: the construction of n x 1 matrices comes up often (eg this and this just in the past month). Until this issue is resolved, would it make sense to make a FAQ item for this?
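Until there is such a FAQ entry, two ways to get an n x 1 matrix, sketched for Julia 1.x. The second form relies on the double-semicolon syntax and so assumes Julia 1.7 or later.

```julia
v = [1, 2, 3]

col1 = reshape(v, :, 1)   # 3x1 matrix sharing memory with v
col2 = [1; 2; 3;;]        # 3x1 matrix literal (Julia 1.7+)

println(size(col1), " ", size(col2))   # (3, 1) (3, 1)
```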
