Neo4j: Cumbersome aliasing and ORDER BY results in inconsistent results

Created on 2 Jan 2018  ยท  4Comments  ยท  Source: neo4j/neo4j

When abusing aliases in RETURN ... ORDER BY ..., it's possible to get surprising results. Users should not use aliases like this, to begin with, so this is not a critical issue. Still, I found it interesting and I think it's best if it's recorded here. I stumbled upon it while working on https://github.com/slizaa/slizaa-opencypher-xtext/issues/7.

Neo4j Version: 3.3.1
Operating System: Ubuntu 16.04
API: web UI

Steps to reproduce

I created a small example dataset of 4 rows.

UNWIND [
  {x: 1, y: 3},
  {x: 1, y: 4},
  {x: 2, y: 3},
  {x: 2, y: 4}
] AS row
WITH row.x AS x, row.y AS y, row.y AS y2
RETURN x AS y, y2
ORDER BY y

Result:

โ•’โ•โ•โ•โ•คโ•โ•โ•โ•โ••
โ”‚"y"โ”‚"y2"โ”‚
โ•žโ•โ•โ•โ•ชโ•โ•โ•โ•โ•ก
โ”‚1  โ”‚3   โ”‚
โ”œโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”ค
โ”‚1  โ”‚4   โ”‚
โ”œโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”ค
โ”‚2  โ”‚3   โ”‚
โ”œโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”ค
โ”‚2  โ”‚4   โ”‚
โ””โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”˜

We can observe that the engine orders tuples according to the value of x. However, if we repeat the value of y as y3, we get a different ordering:

UNWIND [
  {x: 1, y: 3},
  {x: 1, y: 4},
  {x: 2, y: 3},
  {x: 2, y: 4}
] AS row
WITH row.x AS x, row.y AS y, row.y AS y2
RETURN x AS y, y AS y3
ORDER BY y

(The only difference in the query is the last item of the RETURN clause, y AS y3.)

Result:

โ•’โ•โ•โ•โ•คโ•โ•โ•โ•โ••
โ”‚"y"โ”‚"y3"โ”‚
โ•žโ•โ•โ•โ•ชโ•โ•โ•โ•โ•ก
โ”‚1  โ”‚3   โ”‚
โ”œโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”ค
โ”‚2  โ”‚3   โ”‚
โ”œโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”ค
โ”‚1  โ”‚4   โ”‚
โ”œโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”ค
โ”‚2  โ”‚4   โ”‚
โ””โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”˜

In theory, repeating variable y aliased as y3 should not interfere with the ordering, but somehow it does. I can only guess, but probably this somehow "lifts" the original variable y higher in the scope and it takes precedences over the newly introduced variable y (which is an alias for x).

Expected results: one can argue for both orderings, but the two queries should use the same ordering. In my opinion, the variable reference y should refer to the closest variable declaration, i.e. to the alias (AS y) introduced in the RETURN clause.

3.3 bug team-cypher

All 4 comments

They are not very revealing, but I added the EXPLAIN plans for query 1 and query 2.

plan ... plan 1

I dove into this a bit more. I got suspicious that the root of the problem is in the AST rewriters - normalizeWithClauses and/or normalizeReturnClauses.

These are typically tested with MATCH queries. So first, I created an actual graph of 4 nodes to serve as input data:

CREATE
  ({x: 1, y: 3}),
  ({x: 1, y: 4}),
  ({x: 2, y: 3}),
  ({x: 2, y: 4})

This way, the queries become:

Q1

MATCH (n)
WITH n.x AS x, n.y AS y, n.y AS y2
RETURN x AS y, y2
ORDER BY y

vs.

Q2

MATCH (n)
WITH n.x AS x, n.y AS y, n.y AS y2
RETURN x AS y, y AS y3
ORDER BY y

Interestingly, the error does not occur when using ORDER BY on a WITH clause (instead of on a RETURN clause).

Q1/Return

MATCH (n)
WITH n.x AS x, n.y AS y, n.y AS y2
WITH x AS y, y2
ORDER BY y
RETURN y

Q2/Return

MATCH (n)
WITH n.x AS x, n.y AS y, n.y AS y2
WITH x AS y, y AS y3
ORDER BY y
RETURN y

These two queries return their results in the same order. This is even more interesting if we consider what the normalizeReturnClauses rewriter is supposed to do. According to its Javadoc:

This rewriter makes sure that all return items in a RETURN clauses are aliased, and moves any ORDER BY to a preceding WITH clause

Anyways, let's see what the rewriter makes of Q1 and Q2:

Q1 after running the normalizeWithClauses rewriter

Query(None,SingleQuery(List(
  Match(false,Pattern(List(EveryPath(NodePattern(Some(Variable(n)),List(),None)))),List(),None),
  With(false,ReturnItems(false,Vector(
    AliasedReturnItem(Property(Variable(n),PropertyKeyName(x)),Variable(x)),
    AliasedReturnItem(Property(Variable(n),PropertyKeyName(y)),Variable(y)),
    AliasedReturnItem(Property(Variable(n),PropertyKeyName(y)),Variable(y2)))
  ),
  GraphReturnItems(true,List()),None,None,None,None),
  Return(false,
    ReturnItems(false,List(
      AliasedReturnItem(Variable(x),Variable(y)),
      UnaliasedReturnItem(Variable(y2),y2)
    )),
    None,Some(OrderBy(List(AscSortItem(Variable(y))))),None,None,Set())
  )))

Q2 after running the normalizeWithClauses rewriter

Query(None,SingleQuery(List(
  Match(false,Pattern(List(EveryPath(NodePattern(Some(Variable(n)),List(),None)))),List(),None),
  With(false,ReturnItems(false,Vector(
    AliasedReturnItem(Property(Variable(n),PropertyKeyName(x)),Variable(x)),
    AliasedReturnItem(Property(Variable(n),PropertyKeyName(y)),Variable(y)),
    AliasedReturnItem(Property(Variable(n),PropertyKeyName(y)),Variable(y2)))
  ),
  GraphReturnItems(true,List()),None,None,None,None),
  Return(false,
    ReturnItems(false,List(
      AliasedReturnItem(Variable(x),Variable(y)),
      AliasedReturnItem(Variable(y),Variable(y3))
    )),
    None,Some(OrderBy(List(AscSortItem(Variable(y))))),None,None,Set())
  )))

The only difference is the last ReturnItem. Let's inspect the return items:

Q1

    ReturnItems(false,List(
      AliasedReturnItem(Variable(x),Variable(y)),
      UnaliasedReturnItem(Variable(y2),y2)
    )),

Q2

    ReturnItems(false,List(
      AliasedReturnItem(Variable(x),Variable(y)),
      AliasedReturnItem(Variable(y),Variable(y3))
    )),

My hypothesis is the following: the latter results in an incorrect order due to the bad selection of y. Namely, the second AliasedReturnItem takes precedence, while the first one should.

@szarnyasg This sounds like a bug, we will take a look at it.
//Louise, Neo4j Cypher Team

@szarnyasg A bug fix for this will be included in 3.3.3

Was this page helpful?
0 / 5 - 0 ratings

Related issues

webtic picture webtic  ยท  4Comments

guile2912 picture guile2912  ยท  3Comments

cybersam picture cybersam  ยท  3Comments

szarnyasg picture szarnyasg  ยท  3Comments

BrainAnnex picture BrainAnnex  ยท  4Comments