v2.2 SNAPSHOT[ ] - .11[x] .rc1[ ] .beta2[ ] .beta1[ ]
[x] Linux
I have a MATCH query with multiple traversal expressions, per the docs:
MATCH {
class: Foo,
as: foo_1
}.out('Foo_Bar') {
optional: true,
as: bar_1
}, {
class: Foo,
as: foo_1
}.out('Foo_Baz') {
where: ???,
as: baz_1
} RETURN $matches
In the ???
region, I would like to use an expression to this effect: either bar_1
did not exist (it's optional), or if it did then count >= bar_1.count
. However, I am not sure how to write this check given the docs, and the few ways I tried have failed. For example:
Attempt 1: The following causes a NullPointerException
traceable down to $matched.bar_1 IS null
:
MATCH {
class: Foo,
as: foo_1
}.out('Foo_Bar') {
optional: true,
as: bar_1
}, {
class: Foo,
as: foo_1
}.out('Foo_Baz') {
where: ($matched.bar_1 IS null OR count >= $matched.bar_1.count),
as: baz_1
} RETURN $matches
Attempt 2: The following also causes a NullPointerException
, traceable down to $matched containsKey 'bar_1'
:
MATCH {
class: Foo,
as: foo_1
}.out('Foo_Bar') {
optional: true,
as: bar_1
}, {
class: Foo,
as: foo_1
}.out('Foo_Baz') {
where: (not ($matched containsKey 'bar_1') OR count >= $matched.bar_1.count),
as: baz_1
} RETURN $matches
Attempt 3: The following allows all records through and does not filter based on count
, meaning that 'bar_1' in $matched
is always false
:
MATCH {
class: Foo,
as: foo_1
}.out('Foo_Bar') {
optional: true,
as: bar_1
}, {
class: Foo,
as: foo_1
}.out('Foo_Baz') {
where: (not ('bar_1' in $matched) OR count >= $matched.bar_1.count),
as: baz_1
} RETURN $matches
Here's the schema and data I used:
CREATE CLASS Foo EXTENDS V
CREATE PROPERTY Foo.name String
CREATE CLASS Bar EXTENDS V
CREATE PROPERTY Bar.count Integer
CREATE CLASS Foo_Bar EXTENDS E
CREATE CLASS Foo_Baz EXTENDS E
CREATE VERTEX Foo SET name = 'foo' // assigned #12:0
CREATE VERTEX Foo SET name = 'new_foo' // assigned #12:1
CREATE VERTEX Foo SET name = 'bad_foo' // assigned #12:2
CREATE VERTEX Bar SET count = 1 // assigned #13:0
CREATE VERTEX Bar SET count = 2 // assigned #13:1
// "foo" #12:0 has both edges, and the "count" condition is satisfied
CREATE EDGE Foo_Bar FROM #12:0 TO #13:0
CREATE EDGE Foo_Baz FROM #12:0 TO #13:1
// "new_foo" #12:1 has no Foo_Bar edge, so it too should be selected
CREATE EDGE Foo_Baz FROM #12:1 TO #13:1
// "bad_foo" #12:2 has both edges, but its "count" condition isn't satisfied
CREATE EDGE Foo_Bar FROM #12:2 TO #13:1
CREATE EDGE Foo_Baz FROM #12:2 TO #13:0
The expected result for the query I want is the following:
foo_1 bar_1 baz_1
-------------------------
#12:0 #13:0 #13:1
#12:1 #13:1
Of the 3 attempts I made above, 2 returned errors, and one did not filter because it always evaluated the filter to true
as described above, meaning that it returned the above two rows plus the row #12:2 #13:1 #13:0
which is incorrect.
Hi @obi1kenobi
I think the easiest and safest way to do it is to use a subquery, eg.
SELECT FROM (
MATCH {
class: Foo,
as: foo_1
}.out('Foo_Bar') {
optional: true,
as: bar_1
}, {
class: Foo,
as: foo_1
}.out('Foo_Baz') {
as: baz_1
} RETURN $matches
) WHERE bar_1 is null or baz_1.count >= bar_1.count
Anyway, I'm checking the NullPointerException that should not happen
Thanks
Luigi
Hi @luigidellaquila thanks for the suggestion. Unfortunately, the subquery is not an option, because the real queries I am using are about 40x the size of this one. It's a very bad idea to pull out all their conditionals to an outer query, because then I end up traversing millions of records and then at the very end only filter down to a few.
If you were to add support for an operator that checks whether a location is selected in $matched
, I would be finally able to move off of Gremlin and get that 100x speedup from MATCH. I hope you and @lvca are willing to reconsider -- it shouldn't be too complex of an addition but will dramatically expand the capabilities of the MATCH query.
Hi @obi1kenobi
sure, this is a quite common use case, probably fixing the null pointer will also solve the problem, but if it's not the case I'll see what's the effort to implement it.
Thanks
Luigi
The reason I need this is because I'm building (and hoping to open-source soon!) a module that gives OrientDB support for Facebook's popular GraphQL language: http://graphql.org/
It's impossible to fix the use case because the point of GraphQL is that queries can be arbitrarily complex -- so I can't lift all of the filtering into an outer SELECT statement.
I have already found a way to turn GraphQL into Gremlin, but it's really slow. I believe that this problem is one of the last few things standing between GraphQL and MATCH. If you are able to solve this problem with me, OrientDB could be one of the first databases to support GraphQL queries!
@obi1kenobi wow, that's cool! @luigidellaquila how much time does it take?
@lvca thanks, I thought you might like it!
As for the proper way to check if selections are made -- Gremlin makes its equivalent of $matched.unselected_location
be null
for any locations that aren't selected, but I don't particularly care which of the three approaches above you decide is the "most correct". Just let me know once it's fixed and update the docs, and I will be a happy camper :)
Hi @lvca
I'm still investigating it, I have to change the execution flow of the MATCH statement. Right now optional nodes are evaluated at the end, so the condition on the non-optional node cannot use a result from the optional node... The easiest and quickest way is to evaluate that condition at the end of the pattern matching, but it would be the same as doing a nested SELECT.
I'll give you a feedback in next hours
Thanks
Luigi
Hi @luigidellaquila
Just to make sure there's no misunderstanding -- I'm not asking for the "optional" keyword to be supported anywhere but at the end of one MATCH expression. All I need is this: when there are multiple MATCH expressions in the same MATCH query, the subsequent expressions should be able to tell whether a previous "optional" traversal succeeded. Note that in my example, the "optional" keyword is still the right terminal, and a new MATCH expression starts after it.
Assuming my understanding of OrientDB's code is correct, all I need is the OrientDB IN
or CONTAINSKEY
operators, when called on the $matched
special variable, to call the matchContext.matched.containsKey()
method inside OrientDB. I believe that this should not cause changes to the execution flow of MATCH.
Hi @luigidellaquila, any updates on this? Any chance it'll be available in 2.2.12?
Hi @obi1kenobi
no, unfortunately to support it in 2.2.x I have two alternatives:
Solution 2 is quite expensive, so it's hard for me to do it in 2.2.12
Solution 1 is much easier, but it would have the exact same performance of an outer SELECT, so it's not worth the effort.
I'm doing it in 3.0 in the meantime, maybe I'll find a way to backport some parts of it and make it work, but I cannot give you an ETA
Thanks
Luigi
Here's another alternative that will not require a different execution flow, but will still let me compile GraphQL to MATCH -- please let me know what you think!
Let's use this example graph:
Foo(@rid=#1:1) --SomeEdge-> Bar(@rid=#2:1) ------
\
===FinalEdge=> Baz(@rid=#3:1)
/
Foo(@rid=#1:2) --DifferentEdge-> Bar(@rid=#2:2)---
Currently, when a traversal is being evaluated, any paths that don't match a given step are eliminated from consideration. Let's call such paths _detached_. In the following example, the path that starts at Foo(@rid=#1:2)
becomes detached at bar_1
because it didn't satisfy the .out('SomeEdge')
step:
MATCH {
class: Foo,
as: foo_1
}.out('SomeEdge') {
as: bar_1
} RETURN $matches
You could add a keyword called "resume" that allows detached paths to be resumed at a later time. Here are a few examples.
MATCH {
class: Foo,
as: foo_1
}.out('SomeEdge') {
optional: true,
as: bar_1
}.in('SomeEdge') {
resume: bar_1,
as: foo_2
} RETURN $matches
The above query would return:
| foo_1 | bar_1 | foo_2 |
------------------------------
| #1:1 | #2:1 | #1:1 |
| #1:2 | null | #1:2 |
MATCH {
class: Foo,
as: foo_1
}.out('SomeEdge') {
optional: true,
as: bar_1
}.out('FinalEdge') { // note this traversal only applies to the non-detached paths
as: baz_1
}.in('FinalEdge') {
where: (@rid == $matched.bar_1.@rid),
as: bar_2
}.in('SomeEdge') {
resume: bar_1,
as: foo_2
} RETURN $matches
Since the path from Foo(@rid=#1:2)
became detached (with optional: true
) at bar_1
, it is not evaluated further until the resume: bar_1
statement, which allows it to reattach. Using resume: bar_1
implies where: (@rid == $matched.foo_1.@rid)
in addition to any other filtering -- a path can only resume from the @rid where it became detached. Here, foo_1
was the last location visited by the detached path before becoming detached at bar_1
, which is why the implicit filter checks agains $matched.foo_1.@rid
.
The query would return:
| foo_1 | bar_1 | baz_1 | bar_2 | foo_2 |
---------------------------------------------------
| #1:1 | #2:1 | #3:1 | #2:1 | #1:1 |
| #1:2 | null | null | null | #1:2 |
Adding the resume: true
option allows users to write queries where optional: true
is not the last (right-terminal) traversal step, making MATCH much more powerful. While the same effect can be achieved using subqueries, this approach is much cleaner and far more efficient, because it doesn't have complex nesting and does not require a full query evaluation before the filtering.
optional: true
behavior:MATCH {
class: Foo,
as: foo_1
}.out('SomeEdge') {
optional: true,
as: bar_1
} RETURN $matches
When optional: true
is on the last (right-terminal) traversal step, this is equivalent to having a resume: bar_1
just before outputting the results.
@luigidellaquila and @lvca, please let me know what you think! I think this is one of the last few technical obstacles before I can use GraphQL with OrientDB!
Hi @obi1kenobi
It's definitely worth evaluating this proposal. We decided to adopt a json-like syntax specifically because it's very flexible and we can add new behaviors like this, so in terms of pure syntax it's a GO. In terms of execution planning, I have to think about this a little bit and see how hard it is to implement and if it gives real advantages in terms of expressive power and performance.
Thanks
Luigi
Thanks for looking into it @luigidellaquila. Let me know if you have any questions or updates!
Hello @luigidellaquila and @lvca -- did you have a chance to consider my proposal in more detail?
This is the only missing feature blocking my progress with using GraphQL, and I really would love to make progress as soon as possible and then open-source the code :)
Hi @obi1kenobi
It's in my TODO list, I'll check it in next two days.
I'll keep you posted!
Thanks
Luigi
Ok guys, I checked it.
I have to review the order of the pattern calculation, so that in some cases optional edges are calculated in advance (now they are calculated at the end) based on conditions on $matched.
It's not a trivial fix, it will require one or two working days to make sure that nothing breaks.
I'll schedule it in next days.
Thanks
Luigi
That sounds great, thank you! Will this be part of the 2.2.13 release? It's not assigned to any milestone at the moment.
Also, which of the two approaches I proposed did you decide to go with? Just so I know in which direction to take my GraphQL work :)
I'll do the solution 2, I'll partially rewrite the execution flow of the MATCH statement.
It will be much more efficient, so it makes sense to invest one or two day on it.
I think I'll be able to do it in 2.2.13 (unless the team decides to release it at the beginning of next week, but I don't think it's the case, unless we find and fix any serious issue)
Thanks
Luigi
@obi1kenobi - in what language are you creating the GraphQL implementation?
Scott
@luigidellaquila thanks, that's great news! Looking forward to it :)
@smolinari I'm currently using pyorient
and Python, because Python has a very nice GraphQL library. However, if you can find a GraphQL library, you should be able to port my code to any language you like.
I'm interested. :) I'm looking to do the same thing with PHP. However, not with normal PHP, but rather with appserver. Normal PHP doesn't cut it for GraphQL (and why Facebook came up with HHVM).
Is the project you are working on public? Can I have a look?
Scott
Hi @obi1kenobi
I have news for you: I just pushed some changes that fix the basic flow.
For the record, the right syntax is the first one you proposed:
MATCH {
class: Foo,
as: foo_1
}.out('Foo_Bar') {
optional: true,
as: bar_1
}, {
class: Foo,
as: foo_1
}.out('Foo_Baz') {
where: ($matched.bar_1 IS null OR count >= $matched.bar_1.count),
as: baz_1
} RETURN $matches
Now this query executes correctly, but I'm still checking if there are some cases when the pattern conditions are evaluated in the wrong order, so I'm not closing this issue yet
Thanks
Luigi
Just to make sure I didn't misunderstand, $matched.xyz
now evaluates to null
if no location marked as: xyz
was visited?
Thanks!
Yes, exactly
Thanks
Luigi
@obi1kenobi - did you miss my question? :smile:
Scott
@luigidellaquila looking forward to it!
@smolinari sorry, I didn't see your edit. It's not public yet, I'm working with my company to get it open-sourced at the moment.
@luigidellaquila thank you so much for prioritizing this -- I really appreciate it! I am currently testing with the 2.2.x branch so we can uncover any problems before 2.2.13 is released, and shorten the development cycle of this feature.
Here are a couple of issues that you may want to look into.
I am not going to use the same schema as above because it isn't suitable to express these edge cases. Just imagine that all the vertices and edges are declared to fit the queries below.
Consider a query like the following:
MATCH {
class: Foo,
as: foo
}.out('Foo_Bar') {
optional: true,
as: optional_bar
}, {
class: Baz,
as: baz
}.out('Baz_Bar') {
optional: true,
as: optional_bar
} RETURN $matches
This returns no results even when such Foos, Bars and Bazs exist. Even if no Bars existed, it is my understanding that this should return the cartesian product of Foo X Bar
for all Foo and Bar vertices.
This is unfortunately a necessary feature for my GraphQL project. I need queries like the following:
MATCH {
class: Foo,
as: foo
}.out('Foo_Bar') {
optional: true,
as: optional_bar
}, {
class: Bar,
as: optional_bar
}.in('Baz_Bar') {
as: baz
} RETURN $matches
If a matching Bar exists at optional_bar
, this should include it in the traversal. If no matching Bar exists in the step that has optional: true
, then it's just as if the subsequent traversal didn't exist (because it starts at an optional location that doesn't exist) and $matched.baz
should subsequently be null for the same reason.
Thanks again, and please let me know if you'd like me to re-test an updated version of the branch!
Regarding the second part of my last post:
If a matching Bar exists at optional_bar, this should include it in the traversal.
If no matching Bar exists in the step that has optional: true, then it's just as
if the subsequent traversal didn't exist (because it starts at an optional
location that doesn't exist) and $matched.baz should subsequently
be null for the same reason.
I'm also perfectly happy to accept a solution that only triggers this behavior when a special keyword is applied to the root of the traversal starting at the optional node. For example, something like this:
MATCH {
class: Foo,
as: foo
}.out('Foo_Bar') {
optional: true,
as: optional_bar
}, {
class: Bar,
as: optional_bar,
nullWhenEmpty: true
}.in('Baz_Bar') {
as: baz
} RETURN $matches
@luigidellaquila please let me know what you think!
Hey @luigidellaquila, just double-checking if there's anything I can do to help with closing down this very complex issue! The progress so far has been great, and I'm hoping we can resolve this fully before 2.2.13 is out. A lot of people at my company are excitedly awaiting this so they can use GraphQL with OrientDB, and I'm sure open-sourcing my work will get the general OrientDB community quite excited as well :)
Please let me know how I can best help with this!
Hi @obi1kenobi
Thank you very much, unfortunately some internal refactoring is needed to support this and it's definitely not trivial. I hope I'll manage to fix these points in next very few weeks, I'll keep you posted
Thanks
Luigi
Thanks for the update, and please let me know if you'd like me to test a new build -- I set up an OrientDB development environment, so I can check out any OrientDB branch and run my GraphQL-to-MATCH compiler there.
Hi @obi1kenobi
This issue fell through the cracks, I guess it's already solved with last patches.
Can I close it?
Thanks
Luigi
Hi @luigidellaquila, this is a separate issue from the topological scheduler I contributed and shouldn't be closed. From my earlier post in this thread, reproduced here for your convenience:
If a matching Bar exists at optional_bar, this should include it in the traversal.
If no matching Bar exists in the step that has optional: true, then it's just as
if the subsequent traversal didn't exist (because it starts at an optional
location that doesn't exist) and $matched.baz should subsequently
be null for the same reason.
I'm also perfectly happy to accept a solution that only triggers this behavior when a special keyword is applied to the root of the traversal starting at the optional node. For example, something like this:
MATCH {
class: Foo,
as: foo
}.out('Foo_Bar') {
optional: true,
as: optional_bar
}, {
class: Bar,
as: optional_bar,
nullWhenEmpty: true
}.in('Baz_Bar') {
as: baz
} RETURN $matches
Hi @luigidellaquila, any luck with the above? I'd really love to have a way to traverse from optional nodes -- it would add so much flexibility!
Hi @obi1kenobi
not yet, sorry, it's in my TODO list but we are also finalizing the 3.0 M1 so we are pretty busy with that...
Thanks
Luigi
Hey @smolinari, I just realized I never updated the discussion here with the news on my GraphQL work. Long story short, it's now open-source and I wrote a blog post about it!
Let me know what you think!
Most helpful comment
Hi @luigidellaquila, this is a separate issue from the topological scheduler I contributed and shouldn't be closed. From my earlier post in this thread, reproduced here for your convenience:
I'm also perfectly happy to accept a solution that only triggers this behavior when a special keyword is applied to the root of the traversal starting at the optional node. For example, something like this: