Dgraph: recursive query not expanding

Created on 5 Jul 2019  路  28Comments  路  Source: dgraph-io/dgraph

This is the same query/schema/data as https://github.com/dgraph-io/dgraph/issues/3582

I believe it is an independent issue but not sure: @martinmr

For the query:

q = `
        query withvar($hashid: string) {
            chain(func: eq(node.hashid, $hashid)) @recurse(loop:false) {
                uid
                node.owner
                user.name
                node.hashid
                node.xdata
                node.parent @facets @facets(eq(facet, "required") or eq(facet, "recommended"))
            }
        }
    `

When a node reappears deeper in, it is not expanding the node.owner. It correctly expands in earlier on.

arequerylang arequerylanrecurse kinenhancement prioritP2 statuaccepted

All 28 comments

See <------------------------ below

{
  "chain": [
    {
      "uid": "0x4e27",
      "node.owner": [
        {
          "uid": "0x4e21",
          "user.name": "johny"
        }
      ],
      "node.hashid": "jjtn4cdirv",
      "node.xdata": "{\"title\":\"The Two Towers\",\"authors\":[\"J. R. R. Tolkien\"]}",
      "node.parent": [
        {
          "uid": "0x2711", <----------------------------
          "node.owner": [ <--------------- EXPANDED
            {
              "uid": "0x1",
              "user.name": "powerofgod"
            }
          ],
          "node.hashid": "myin9aksq",    
          "node.xdata": "{\"title\":\"The Da Vinci Code\",\"authors\":[\"Dan Brown\"]}",
          "node.parent|facet": "required"
        },
        {
          "uid": "0x4e25",
          "node.owner": [
            {
              "uid": "0x1",
              "user.name": "powerofgod"
            }
          ],
          "node.hashid": "r9t9rc4ip",
          "node.xdata": "{\"title\":\"Pride and Prejudice\",\"authors\":[\"Jane Austen\"]}",
          "node.parent": [
            {
              "uid": "0x2711", <------------------------SAME NODE (no expanded node.owner)
              "node.hashid": "myin9aksq",
              "node.xdata": "{\"title\":\"The Da Vinci Code\",\"authors\":[\"Dan Brown\"]}",
              "node.parent|facet": "required"
            }
          ],
          "node.parent|facet": "recommended"
        },
        {
          "uid": "0x4e26",
          "node.owner": [
            {
              "uid": "0x4e21",
              "user.name": "johny"
            }
          ],
          "node.hashid": "35t8qc8i5",
          "node.xdata": "{\"title\":\"The Hobbit\",\"authors\":[\"J. R. R. Tolkien\"]}",
          "node.parent": [
            {
              "uid": "0x2711", <------------------------SAME NODE (no expanded node.owner)
              "node.hashid": "myin9aksq",
              "node.xdata": "{\"title\":\"The Da Vinci Code\",\"authors\":[\"Dan Brown\"]}",
              "node.parent|facet": "recommended"
            },
            {
              "uid": "0x4e25",
              "node.hashid": "r9t9rc4ip",
              "node.xdata": "{\"title\":\"Pride and Prejudice\",\"authors\":[\"Jane Austen\"]}",
              "node.parent|facet": [ <--------------- ISSUE #3582 (different issue)
                "required",
                "required"
              ]
            }
          ],
          "node.parent|facet": "recommended"
        }
      ]
    }
  ]
}

/cc @power-f-GOD

This is most likely because you reached the maximum depth allowed. Try playing with the depth parameter. I would start with 5 and see if that makes those nodes appear.

link to the docs: https://docs.dgraph.io/query-language/#recurse-query

I'll also document what the default depth is since it's currently unclear from the docs.

My use case requires no limits

If I was making a dependency tool like npm I can't have a limit

Nevermind, I checked the code. The default depth when loop is set to false is math.MaxUint64 so that's not the problem. The problem is you have set loop to false. Once a node has been seen, it won't be expanded again. If that's not the behavior you want, then you have to set loop to true and pass a depth parameter that is greater than zero.

If your use case requires no limits, then you must set loop to false, which will run the query until it reaches all the nodes but won't follow loops. Otherwise your queries will never return.

Seems like a bug to me. The query implies the node should always be expanded, even if it means dgraph keeps the previously encountered node in memory and reuses it.

This is not a bug. You specifically set loop to false.

You are looking at it from an implementation point of view. That is not what the query implies irrespective of loop setting. @manishrjain

That is exactly what the query implies. If you want to follow loops you need to set loop to true and provide a depth parameter. Otherwise Dgraph would keep on expanding the graph endlessly.

My concern is regarding why the loop setting is effecting the JSon payload being returned. I understand loop should stop expanding endlessly. But the query's "footprint" says for each node (uid) found, the JSon response should contain an expanded 'node.owner'. the response should supply it, even if it means storing encountered nodes in memory and showing it.

The footprint of each returned object is:

                node.owner
                user.name
                node.hashid
                node.xdata
                node.parent @facets @facets(eq

All returned objects should look like that as much as possible.

This is not different than a predicate not showing up when it doesn't exist. In the first appearance of the node, node.owner was expanded because it had not been traverse that. After that, that edge won't be expanded because loop is set to false. Dgraph doesn't show anything. It could set node.owner to an empty object or list but that would not be very useful. You are right in that Dgraph will try to adhere to the query format requested but it won't force it in situations where it does not make that much sense.

I'm suggesting it should be in the case in this scenario without having to "force it". Since the uid has been encountered before, DGraph has all the data required to properly create an object in the eventual json which will look like the footprint suggested in the query.

It is essentially doing this: https://github.com/thehonestscoop/lemma-chain/blob/master/find_chain.go#L242 (but the DGraph side rather than client).

It is not hard to do from a technical point of view, it doesn't violate loop limit and returns what the query intuitively expects.

Doing something like this would in fact violate the loop guarantees. From the documentation:

The loop parameter can be set to false, in which case paths which lead to a loop would be ignored while traversing.

Including the objects at the end of an already traversed edge would mean we are considering paths that lead to a loop, which contradicts what setting loop to false means.

Something like this is application specific logic and should be done in the client, not in Dgraph.

Something is very wrong. Totally counter-intuitive of what the query footprint suggests without a good enough reason to justify not expanding a node that was already encountered. @manishrjain @campoy

NB: @martinmr I'm not saying you're doing anything wrong. Maybe you implemented it as stated in docs or requirements. I think the behaviour should be considered wrong.

Hey @pjebs ,

Thanks for bringing the issue. I read through the thread and honestly, I was a bit confused about the results as well. The semantics of what not to follow when loop: false is set are not very clear.

In terms of loop iteration, the general idea is to stop once the same node is encountered. However, the way it is implemented is such that we stop when the same edge is encountered. So, the same node can continue to be expanded, as long as we find new edges. Or, that's my understanding so far.

https://github.com/dgraph-io/dgraph/blob/e5791c37ecae32879814b3ad2ebb326c6b104442/query/recurse.go#L133

I agree that we should maintain the query structure as closely as possible in the results. So, it rules out returning the same node.hashid multiple times, while not returning node.owner.

Based on my cursory look, I think the right thing to do here would be to stop expanding edges as soon as the node is seen again. But, I feel I'm missing some history here.

We'll look closer and see what changes are required here to make the semantics of loop clearer.

CC: @pawanrawal

This is a bug. The response of the query should be the same as the expanded query below but its not.

Expanded query

{
  me(func: uid(0x4e27)) {
    uid
    node.hashid
    node.xdata
    node.parent {
      uid
      node.hashid
      node.xdata
      node.owner {
        uid
        user.name
      }
      node.parent {
        uid
        node.hashid
        node.xdata
        node.owner {
          uid
          user.name
        }
        node.parent {
          uid
          node.hashid
          node.xdata
          node.owner {
            uid
            user.name
          }
        }
      }
    }
    node.owner {
      uid
      user.name
    }
  }
}

The JSON data-set that can be used to reproduce this issue is pasted below.

Dataset

{
  "set": [
    {
      "uid": "0x4e27",
      "node.owner": [
        {
          "uid": "0x4e21",
          "user.name": "johny"
        }
      ],
      "node.hashid": "jjtn4cdirv",
      "node.xdata": "{\"title\":\"The Two Towers\",\"authors\":[\"J. R. R. Tolkien\"]}",
      "node.parent": [
        {
          "uid": "0x2711",
          "node.owner": [
            {
              "uid": "0x1",
              "user.name": "powerofgod"
            }
          ],
          "node.hashid": "myin9aksq",    
          "node.xdata": "{\"title\":\"The Da Vinci Code\",\"authors\":[\"Dan Brown\"]}",
          "node.parent|facet": "required"
        },
        {
          "uid": "0x4e25",
          "node.owner": [
            {
              "uid": "0x1",
              "user.name": "powerofgod"
            }
          ],
          "node.hashid": "r9t9rc4ip",
          "node.xdata": "{\"title\":\"Pride and Prejudice\",\"authors\":[\"Jane Austen\"]}",
          "node.parent": [
            {
              "uid": "0x2711",
              "node.hashid": "myin9aksq",
              "node.xdata": "{\"title\":\"The Da Vinci Code\",\"authors\":[\"Dan Brown\"]}",
              "node.parent|facet": "required"
            }
          ],
          "node.parent|facet": "recommended"
        },
        {
          "uid": "0x4e26",
          "node.owner": [
            {
              "uid": "0x4e21",
              "user.name": "johny"
            }
          ],
          "node.hashid": "35t8qc8i5",
          "node.xdata": "{\"title\":\"The Hobbit\",\"authors\":[\"J. R. R. Tolkien\"]}",
          "node.parent": [
            {
              "uid": "0x2711",
              "node.hashid": "myin9aksq",
              "node.xdata": "{\"title\":\"The Da Vinci Code\",\"authors\":[\"Dan Brown\"]}",
              "node.parent|facet": "recommended"
            },
            {
              "uid": "0x4e25",
              "node.hashid": "r9t9rc4ip",
              "node.xdata": "{\"title\":\"Pride and Prejudice\",\"authors\":[\"Jane Austen\"]}"
            }
          ],
          "node.parent|facet": "recommended"
        }
      ]
    }
  ]
}

It happens because while considering loops, we filter the edges that we have already walked but this filtering happens at the subgraph level and not for individual sub-trees. The response should include the expanded nodes as mentioned by @pjebs as they won't lead to any loop for the given data-set.

We should ignore the nodes that we have walked but do it for each branch of the tree separately. I'll look into how to do that.

In the meantime, there is a workaround. You can get the correct results you want by setting loop to true and specifying a very large depth (the max that you expect).

My current workaround is: https://github.com/thehonestscoop/lemma-chain/blob/master/find_chain.go#L242

Is this not good enough?

Can I use Max uint64 value?

Yeah, you can use MaxUint64 value but should use a lower value if you have an idea of the max expected depth. The problem with using MaxUint64 is that if your data does have loops, you'll end up getting a lot of results (capped by query_edge_limit flag of alpha) which you don't care about.

Ok, so I do not think this is a bug at all. Let me explain.

When setting loop: false we're asking Dgraph to stop expanding after a node that has already been seen is found.

This allows us to find loops by giving us the element that we already found. This repeated element is expanded, but no further elements will be expanded.

Let's see an example!

Given this dataset:

{
    set {
        _:a <name> "a" .
        _:b <name> "b" .
        _:c <name> "c" .
        _:d <name> "d" .
        _:z <name> "z" .

        _:a <id> "1"^^<xs:int> .
        _:b <id> "2"^^<xs:int> .
        _:c <id> "3"^^<xs:int> .
        _:d <id> "4"^^<xs:int> .

        _:a <knows> _:b .
        _:b <knows> _:c .
        _:c <knows> _:a .
        _:c <knows> _:d .

        _:a <type> _:z .
        _:b <type> _:z .
        _:c <type> _:z .
        _:d <type> _:z .  
    }
}

This draws a graph as follows:

image

As you can see, a, b, and c form a loop. Also a, b, c, and d point to z with their "type" edges.

@recurse(loop: true)

Let's recurse starting from a using name, knows, id, and type with the following query:

_note_: the name predicate has an exact predicate <name>: string @index(exact) .

{
  q(func: eq(name, "a")) @recurse(loop: true, depth: 5) {
    name 
    id
    knows
    type
  }
}

The result is:

{
  "data": {
    "q": [
      {
        "name": "a",
        "id": 1,
        "knows": [
          {
            "name": "b",
            "id": 2,
            "knows": [
              {
                "name": "c",
                "id": 3,
                "knows": [
                  {
                    "name": "a",
                    "id": 1,
                    "knows": [
                      {
                        "name": "b",
                        "id": 2
                      }
                    ],
                    "type": [
                      {
                        "name": "z"
                      }
                    ]
                  },
                  {
                    "name": "d",
                    "id": 4,
                    "type": [
                      {
                        "name": "z"
                      }
                    ]
                  }
                ],
                "type": [
                  {
                    "name": "z"
                  }
                ]
              }
            ],
            "type": [
              {
                "name": "z"
              }
            ]
          }
        ],
        "type": [
          {
            "name": "z"
          }
        ]
      }
    ]
  },
  "extensions": { }
}

This contains two paths a - b - c - a - b and a - b - c - d, and for each of these nodes all the predicates (name, id, knows, type) are expanded.

Notice that while the ending node d still has likes expanded to the name z.
But the end of the path with the loop b does not expand type but it does expand name and id. Why is this?

The behavior is actually the same for name and type, but while expanding name gives a value predicate of type string, type goes to another uid for which we don't expand anything anymore since we've reached the maximum recursion depth. Since the expansion doesn't occur, the object inside of type (and knows) is empty and therefore the predicate type is dropped completely.

@recurse(loop: false)

A very similar behavior occurs when setting loop: false as the last node to be expanded is the one that was already included in the traversal. In this case, that's a.

{
  q(func: eq(name, "a")) @recurse(loop: false) {
    name 
    id
    knows
    type
  }
}

This will return:

{
  "data": {
    "q": [
      {
        "name": "a",
        "id": 1,
        "knows": [
          {
            "name": "b",
            "id": 2,
            "knows": [
              {
                "name": "c",
                "id": 3,
                "knows": [
                  {
                    "name": "a",
                    "id": 1
                  },
                  {
                    "name": "d",
                    "id": 4,
                    "type": [
                      {
                        "name": "z"
                      }
                    ]
                  }
                ],
                "type": [
                  {
                    "name": "z"
                  }
                ]
              }
            ],
            "type": [
              {
                "name": "z"
              }
            ]
          }
        ],
        "type": [
          {
            "name": "z"
          }
        ]
      }
    ]
  },
  "extensions": { }
}

Here you can see that while d is expanded and you can see the name for the node pointed by type, a is expanded but the nodes pointed by it via knows and type will not be expanded, generating an empty object, which will be removed as well as the predicates knows and type themselves. This is not different to the behavior with b when the maximum recursion depth in the case above.

Coming back to your example

The behavior above matches what you see in your example. The first time you see the node is the first element in the loop and we expand it until we find it again, in which case we stop the expansion at that node (value predicates will appear, but object predicates will be expanded but empty therefore dropped).

I hope this helps understand the behavior.

I do no think this is a bug, although it might be useful to improve the documentation regarding the behavior of @recurse when loop: false is set and a loop is found or the maximum recursion depth is reached.

What you are saying is correct. What I suggested in https://github.com/dgraph-io/dgraph/issues/3634#issuecomment-512619447 is that it may very well be working as intended - with the obvious improvement being to clarify the behaviour in documentation.

I am saying that the intended bahviour is not consistent with what a user would expect from such a query such as:

q = `
        query withvar($hashid: string) {
            chain(func: eq(node.hashid, $hashid)) @recurse(loop:false) {
                uid
                node.owner
                user.name
                node.hashid
                node.xdata
                node.parent @facets @facets(eq(facet, "required") or eq(facet, "recommended"))
            }
        }
    `

I expect each result to satisfy the "footprint" as stated in the query of uid, node.owner, user.name, node.hashid, node.xdata and most importantly node.parent if available.

Currently if the parent node has been visited, it is not displaying it. It should return that value. Currently I am doing it client side, when it can be very easily performed in server-side. @pawanrawal can probably explain it better than I.

Further to this, if the node has been visited before, then node.parent is not returned. I then have no ability to know if that node has a parent. This is obviously an error.

I wonder if it would make sense for you to perform this query in a slightly different way then.

You could have the recursion to identify what are the nodes involved, and a different query in the same block to obtain the data related to each.

So following the dataset I presented above, instead of the query right below for which you'd be missing the type information on nodes already visited:

{
  q(func: eq(name, "a")) @recurse(loop: false) {
    name 
    id
    knows
    type
  }
}

You could fetch the data in two queries as such:

{
  tree(func: eq(name, "a")) @recurse(loop: false) {
    nodes as uid
    knows
  }

  nodes(func: uid(nodes)) {
    name
    id
    type {
      name
    }
  }
}
````

This would return both the structure of your tree and the information attached to each node (only once).
The result would be a smaller response, probably.

```json
{
  "data": {
    "tree": [
      {
        "uid": "0x3",
        "knows": [
          {
            "uid": "0x4",
            "knows": [
              {
                "uid": "0x5",
                "knows": [
                  {
                    "uid": "0x1"
                  },
                  {
                    "uid": "0x3"
                  }
                ]
              }
            ]
          }
        ]
      }
    ],
    "nodes": [
      {
        "name": "d",
        "id": 4,
        "type": [
          {
            "name": "z"
          }
        ]
      },
      {
        "name": "a",
        "id": 1,
        "type": [
          {
            "name": "z"
          }
        ]
      },
      {
        "name": "b",
        "id": 2,
        "type": [
          {
            "name": "z"
          }
        ]
      },
      {
        "name": "c",
        "id": 3,
        "type": [
          {
            "name": "z"
          }
        ]
      }
    ]
  },
  "extensions": { ... }
}

We will update the documentation, but first I want to make sure this solution meets your needs.

After discussing this issue with @pawanrawal we've reached the conclusion that there's a possible enhancement to be done here which would target your concerns.

I'm going to keep this issue open but change it to be an enhancement, as the current behavior matches the specs. That said, we'll try to implement this in future releases.

The solution would be to stop recursion once a node has been seen in the current path. This is different to the current behavior where a node is not expanded once it has been seen at any point in the traversal.

Let's see this with one example:

{
    set {
        _:a <name> "a" .
        _:b <name> "b" .
        _:c <name> "c" .
        _:d <name> "d" .
        _:e <name> "e" .
        _:f <name> "f" .

        _:z <name> "z" .

        _:f <knows> _:a .
        _:f <knows> _:d .

        _:d <knows> _:e .
        _:e <knows> _:a .

        _:a <knows> _:b .
        _:a <knows> _:e .
        _:b <knows> _:c .

        _:a <type> _:z .
        _:b <type> _:z .
        _:c <type> _:z .
        _:d <type> _:z .  
        _:e <type> _:z .
        _:f <type> _:z .
    }
}

This would generate a graph like this:

image
_note_: the type predicates have been omitted for clarity's sake

If you got to start a recursive query from f:

{
    q(func: eq(name, "f")) @recurse {
        name
        knows
        type
    }
}

Currently, you would get something like this:

{
  "data": {
    "q": [
      {
        "knows": [
          {
            "knows": [
              {
                "knows": [
                  {
                    "name": "c",
                    "type": [
                      {
                        "name": "z"
                      }
                    ]
                  }
                ],
                "name": "b",
                "type": [
                  {
                    "name": "z"
                  }
                ]
              }
            ],
            "name": "a",
            "type": [
              {
                "name": "z"
              }
            ]
          },
          {
            "knows": [
              {
                "knows": [
                  {
                    "name": "a"
                  }
                ],
                "name": "e"
              }
            ],
            "name": "d",
            "type": [
              {
                "name": "z"
              }
            ]
          }
        ],
        "name": "f",
        "type": [
          {
            "name": "z"
          }
        ]
      }
    ]
  }
}

Note how once we reached a in the path starting with f -> d, the recursion stops.

We propose enhancing the loop detection to keep on going as a has not been seen in that path yet, so the output would look like the following:

{
  "data": {
    "q": [
      {
        "knows": [
          {
            "knows": [
              {
                "knows": [
                  {
                    "name": "c",
                    "type": [
                      {
                        "name": "z"
                      }
                    ]
                  }
                ],
                "name": "b",
                "type": [
                  {
                    "name": "z"
                  }
                ]
              }
            ],
            "name": "a",
            "type": [
              {
                "name": "z"
              }
            ]
          },
          {
            "knows": [
              {
                "knows": [
                  {
                    "knows": [
                      {
                        "knows": [
                          {
                            "name": "c",
                            "type": [
                              {
                                "name": "z"
                              }
                            ]
                          }
                        ],
                        "name": "b",
                        "type": [
                          {
                            "name": "z"
                          }
                        ]
                      }
                    ],
                    "name": "a",
                    "type": [
                      {
                        "name": "z"
                      }
                    ]
                  }
                ],
                "name": "e"
              }
            ],
            "name": "d",
            "type": [
              {
                "name": "z"
              }
            ]
          }
        ],
        "name": "f",
        "type": [
          {
            "name": "z"
          }
        ]
      }
    ]
  }
}

I think this would address your concerns, do you agree? @pjebs @power-f-GOD
Thanks for the report and for helping us understand your needs!

I'll have a look at it when I get back to this project. I still think that the queries return "footprint" should always be returned when possible.

Github issues have been deprecated.
This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.

drawing

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jeffkhull picture jeffkhull  路  3Comments

allen-munsch picture allen-munsch  路  4Comments

protheusfr picture protheusfr  路  4Comments

KadoBOT picture KadoBOT  路  5Comments

bytefish picture bytefish  路  4Comments