When abusing aliases in RETURN ... ORDER BY ..., it's possible to get surprising results. Users should not use aliases like this, to begin with, so this is not a critical issue. Still, I found it interesting and I think it's best if it's recorded here. I stumbled upon it while working on https://github.com/slizaa/slizaa-opencypher-xtext/issues/7.
Neo4j Version: 3.3.1
Operating System: Ubuntu 16.04
API: web UI
I created a small example dataset of 4 rows.
UNWIND [
{x: 1, y: 3},
{x: 1, y: 4},
{x: 2, y: 3},
{x: 2, y: 4}
] AS row
WITH row.x AS x, row.y AS y, row.y AS y2
RETURN x AS y, y2
ORDER BY y
Result:
โโโโโคโโโโโ
โ"y"โ"y2"โ
โโโโโชโโโโโก
โ1 โ3 โ
โโโโโผโโโโโค
โ1 โ4 โ
โโโโโผโโโโโค
โ2 โ3 โ
โโโโโผโโโโโค
โ2 โ4 โ
โโโโโดโโโโโ
We can observe that the engine orders tuples according to the value of x. However, if we repeat the value of y as y3, we get a different ordering:
UNWIND [
{x: 1, y: 3},
{x: 1, y: 4},
{x: 2, y: 3},
{x: 2, y: 4}
] AS row
WITH row.x AS x, row.y AS y, row.y AS y2
RETURN x AS y, y AS y3
ORDER BY y
(The only difference in the query is the last item of the RETURN clause, y AS y3.)
Result:
โโโโโคโโโโโ
โ"y"โ"y3"โ
โโโโโชโโโโโก
โ1 โ3 โ
โโโโโผโโโโโค
โ2 โ3 โ
โโโโโผโโโโโค
โ1 โ4 โ
โโโโโผโโโโโค
โ2 โ4 โ
โโโโโดโโโโโ
In theory, repeating variable y aliased as y3 should not interfere with the ordering, but somehow it does. I can only guess, but probably this somehow "lifts" the original variable y higher in the scope and it takes precedences over the newly introduced variable y (which is an alias for x).
Expected results: one can argue for both orderings, but the two queries should use the same ordering. In my opinion, the variable reference y should refer to the closest variable declaration, i.e. to the alias (AS y) introduced in the RETURN clause.
They are not very revealing, but I added the EXPLAIN plans for query 1 and query 2.
... 
I dove into this a bit more. I got suspicious that the root of the problem is in the AST rewriters - normalizeWithClauses and/or normalizeReturnClauses.
These are typically tested with MATCH queries. So first, I created an actual graph of 4 nodes to serve as input data:
CREATE
({x: 1, y: 3}),
({x: 1, y: 4}),
({x: 2, y: 3}),
({x: 2, y: 4})
This way, the queries become:
MATCH (n)
WITH n.x AS x, n.y AS y, n.y AS y2
RETURN x AS y, y2
ORDER BY y
vs.
MATCH (n)
WITH n.x AS x, n.y AS y, n.y AS y2
RETURN x AS y, y AS y3
ORDER BY y
Interestingly, the error does not occur when using ORDER BY on a WITH clause (instead of on a RETURN clause).
MATCH (n)
WITH n.x AS x, n.y AS y, n.y AS y2
WITH x AS y, y2
ORDER BY y
RETURN y
MATCH (n)
WITH n.x AS x, n.y AS y, n.y AS y2
WITH x AS y, y AS y3
ORDER BY y
RETURN y
These two queries return their results in the same order. This is even more interesting if we consider what the normalizeReturnClauses rewriter is supposed to do. According to its Javadoc:
This rewriter makes sure that all return items in a RETURN clauses are aliased, and moves any ORDER BY to a preceding WITH clause
Anyways, let's see what the rewriter makes of Q1 and Q2:
Query(None,SingleQuery(List(
Match(false,Pattern(List(EveryPath(NodePattern(Some(Variable(n)),List(),None)))),List(),None),
With(false,ReturnItems(false,Vector(
AliasedReturnItem(Property(Variable(n),PropertyKeyName(x)),Variable(x)),
AliasedReturnItem(Property(Variable(n),PropertyKeyName(y)),Variable(y)),
AliasedReturnItem(Property(Variable(n),PropertyKeyName(y)),Variable(y2)))
),
GraphReturnItems(true,List()),None,None,None,None),
Return(false,
ReturnItems(false,List(
AliasedReturnItem(Variable(x),Variable(y)),
UnaliasedReturnItem(Variable(y2),y2)
)),
None,Some(OrderBy(List(AscSortItem(Variable(y))))),None,None,Set())
)))
Query(None,SingleQuery(List(
Match(false,Pattern(List(EveryPath(NodePattern(Some(Variable(n)),List(),None)))),List(),None),
With(false,ReturnItems(false,Vector(
AliasedReturnItem(Property(Variable(n),PropertyKeyName(x)),Variable(x)),
AliasedReturnItem(Property(Variable(n),PropertyKeyName(y)),Variable(y)),
AliasedReturnItem(Property(Variable(n),PropertyKeyName(y)),Variable(y2)))
),
GraphReturnItems(true,List()),None,None,None,None),
Return(false,
ReturnItems(false,List(
AliasedReturnItem(Variable(x),Variable(y)),
AliasedReturnItem(Variable(y),Variable(y3))
)),
None,Some(OrderBy(List(AscSortItem(Variable(y))))),None,None,Set())
)))
The only difference is the last ReturnItem. Let's inspect the return items:
ReturnItems(false,List(
AliasedReturnItem(Variable(x),Variable(y)),
UnaliasedReturnItem(Variable(y2),y2)
)),
ReturnItems(false,List(
AliasedReturnItem(Variable(x),Variable(y)),
AliasedReturnItem(Variable(y),Variable(y3))
)),
My hypothesis is the following: the latter results in an incorrect order due to the bad selection of y. Namely, the second AliasedReturnItem takes precedence, while the first one should.
@szarnyasg This sounds like a bug, we will take a look at it.
//Louise, Neo4j Cypher Team
@szarnyasg A bug fix for this will be included in 3.3.3