Latest
Yes
N/A
Mutation Sample
I added "dgraph.type" just to be sure.
{
"set": [
{
"type": "node",
"dgraph.type": "Node",
"id": "0",
"labels": ["Movie"],
"properties": {
"tagline": "Welcome to the Real World",
"title": "The Matrix",
"released": 1999,
"dgraph.type": "Properties"
}
}
]
}
type Node {
properties
type
id
labels
tagline
title
released
}
type Properties {
tagline
title
released
}
{
movies(func: has(properties)) @filter(eq(type,"node")){
type
id
labels
dgraph.type
properties {
tagline
title
released
}
tagline
title
released
}
}
upsert {
query {
movies(func: has(type)) @filter(has(properties)){
v as uid
Labels as labels
properties {
Title as title
Released as released
Tagline as tagline
}
}
}
mutation {
set {
uid(v) <migrationDone> "True" .
uid(v) <title> val(Title) .
uid(v) <dgraph.type> val(Labels) .
uid(v) <released> val(Released) .
uid(v) <tagline> val(Tagline) .
}
}
}
This operation should be able to copy the values in the variables of the nested block, "properties". Thus allowing a kind of "migration" using Upsert Block. This feature is also equivalent to "merge". Where we collect the values of the nested block, we paste in the target node and delete the nested block right after.
With this "Hack" I was able to make it work
WARNING - This "hack" works mutation by mutation. Not Bulk Upsert. If you try to bulk upsert, it can get crazy results.
upsert {
query {
movies(func: has(type)) @filter(has(properties)){
v as uid
Labels as labels
properties {
Title as title
Released as released
Tagline as tagline
}
}
me() {
TitleMe as sum(val(Title))
ReleasedMe as sum(val(Released))
TaglineMe as sum(val(Tagline))
}
}
mutation {
set {
uid(v) <migrationDone> "True" .
uid(v) <title> val(TitleMe) .
uid(v) <dgraph.type> val(Labels) .
uid(v) <released> val(ReleasedMe) .
uid(v) <tagline> val(TaglineMe) .
}
}
}
Testing the PR https://github.com/dgraph-io/dgraph/pull/4767 for this issue - but now creating a new node instead of update it (which is the goal of 4692, to just update instead of creating new ones). It works like that.
pay attention that this would work only with 1 to 1 operation. It won't work in 1 to many (like bulk upsert).
upsert {
query {
movies(func: has(type), first:1) @filter(has(properties) AND not has(migrationDone)){
uid
qID as id
Labels as labels
properties {
Title as title
Released as released
Tagline as tagline
}
}
me() {
LabelsMe as sum(val(Labels))
TitleMe as sum(val(Title))
ReleasedMe as sum(val(Released))
TaglineMe as sum(val(Tagline))
qIDMe as sum(val(qID))
}
}
mutation {
set {
_:New <migrationDone> "True" .
_:New <id> val(qIDMe) .
_:New <title> val(TitleMe) .
_:New <dgraph.type> val(LabelsMe) .
_:New <released> val(ReleasedMe) .
_:New <tagline> val(TaglineMe) .
}
}
}
Further work that needs to be done:
Blocks #4767
In SQL it has no exact similarity cuz it is not a Graph DB. But in theory, it would be the same as MERGE https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver15 - Which is the ability to merge two tables in one. Which would be the case with this operation that I'm using Upsert Block.
In the above operation, I am merging two nodes. "Parent" with "Child", that is, bringing the nested data to the parent. And then I would delete the nested data.
Yes, it is possible to do with the existing functionality. But it is limited. It is necessary to create an extra block in a kind of "Hacky way" for it to work properly. However, sometimes it doesn't work as expected. Also, bulk upsert is impossible with this approach. The user needs to spend hundreds or thousands of Upsert Blocks to complete the entire process.
BTW, this issue is slightly different from #4767 the only thing that is similar is the use of an aggregation block to make this feature work.
Question:
Does anyone have an experience loading big datasets into a live cluster.
The issue we're facing is to load a large amount of vertexes and edges into the cluster, but some of these nodes may already exist and we dont want duplicates.
I was hoping to be able to write all the data as files with an array of upsert blocks, and feed it to the live loader, but that does not seem to work.
Any ideas?
What I would answer.
Upsert Block is a too new feature. It won’t work in any loader.
One thing you can do is add a unique edge with a unique value for each entity. And then, when it is totally loaded. Do upserts to merge them.
But I just remembered that this isn't possible either. Due to these issues.
I was looking around as to how upsert works in other databases. I didn't find any example where a database would allow updates using the values of variables defined in the query of the upsert.
I also noticed that databases only allow single row update or same update in all the rows that satisfy the query criteria.
I recommend that we keep the current behaviour for val as it is because it provides one to one mapping of nodes to their values which is similar to single row update that other databases provide. I also think that we should accept the change proposed in the PR https://github.com/dgraph-io/dgraph/pull/4767 to at least allow some of the use cases.
We can keep thinking how we can better solve the problems raised by this issue. I do not have a good solution that seems worth implementing.
cc @manishrjain @MichelDiz
@mangalaman93 Perhaps this is a new feature unrelated to variables? What if there was a function to return a predicate's value from a given uid? Intuitively, if I can say uid(v) in a mutation, I'd expect to also be able to say something like predicateValue(uid(v), "starttime"). I think this would solve my issues, what about you @MichelDiz ?
As a small note, dgraph does not error when you try to use vars like this in bulk contexts, which is probably a small bug given it is not currently unsupported, e.x the _:newNode <endtime> val(endVar) .
line below doesn't do anything
upsert {
query {
v AS var(func: eq("eventtype", "myevent")) @filter(lt(starttime, "2005") AND gt(endtime, "2005")) {
startVar as starttime
endVar as endtime
}
}
mutation @if(gt(len(v), 0)) {
set {
uid(v) <endtime> "2005" .
_:newNode <starttime> "2005" .
_:newNode <endtime> val(endVar) .
}
}
}
I'm using dgraph to store event information, nodes look like this { eventType, start, end }.
My api receives discrete beginEvent and finishEvent events from a message bus (which happens to not guarantee ordering)
When I ingest to dgraph, I want to do conditional upserts- for example:
If I receive a beginEvent at t2, I'd like to find all events that span that time, for example an event like {eventType: myevent, start: t0, end: t5} and using an upsert... end up with two nodes {start: t0, end:t2} and {start: t2, end:t5}. (With an approach like this, I'll have eventual consistency even if my events are delivered out of order)
The bug fix that recently went in makes this possible for a single event update at a time (using that me(){ myvar as sum()} syntax), but I'd like to be able to do batches for performance reasons.
@seanlaff This seems like a good idea. Let me think through this.
I just realized that what you are suggesting is already possible. You could do something like:
upsert {
query {
.
.
.
var(func: uid(v)) {
p as starttime
}
}
# Now you can use val(p) along with uid(v) and it'd work fine.
# something like below
mutation {
set {
uid(v) <SomePredicate> val(p) .
}
}
}
Now, the problem we are trying to solve is when different set of UIDs needs to be correlated UIDs in a variable.
@mangalaman93 Exactly, it's possible with one match but does not work with an array of matches. I was thinking perhaps a new func since the mutation doesn't exist within the closure of the block in which predicate var was defined... but on the other hand, my first intuition when writing the query was that I could use a predicate var within the mutation and the planner would "know" which uid to pull it from- so perhaps that is a better goal to strive for.
Is this issue related to the fact that properties is an edge so val(Title) could be many different titles ? no idea how Dgraph manages values underneath but i'd expect val(Title) to take the last found title in the sequence of properties nodes found, thus i don't understand why extracting Title is different from extracting Labels or v :thinking:
Imaging with a pseudo cypher query it would show up like
MATCH (movie)-[properties]->(prop)
SET movie.title = prop.title
meaning (unless i'm wrong) that movie.title is equal to the last matched property title
I created an issue requesting a loop feature. And I believe that it would solve the problem that we have in this issue here.
Here's an example.
upsert {
query {
source as movies(func: has(type), first:100)
@filter(has(properties) AND not has(migrationDone)){
qID as id
Labels as labels
properties {
Title as title
Released as released
Tagline as tagline
}
}
}
mutation @exec(foreach(in: source)) {
set {
_:New <migrationDone> "True" .
_:New <id> val(qID) .
_:New <title> val(Title) .
_:New <dgraph.type> val(Labels) .
_:New <released> val(Released) .
_:New <tagline> val(Tagline) .
}
}
}
What does that mean?
For each node found in the query, Dgraph will add a new mutation block to the transaction.
That is, if in the query finds 100 nodes, it will create 100 "mutation/set" blocks execute the mutation and finally commit the transaction. This is the idea.
I'm currently working on a rewrite of the graphql execute function so i think i understand the issue better (thanks for the ping)! i'm faced with the same problem that i plan to solve by treating exported datas as is.
single valuearraysequenceThe neo4j foreach is basically a scoped unwind #5165 and indeed would totally solve the problem
in my case i'm going to follow the concept that i find the most intuitive, meaning if i had to implement a foreach it would work on an exported array only but not on your current exemple which for me is :
source is a sequence of n moviessingle valuearrayn propertiesthe unwind would work on everything as everything is a sequence (a single value is a sequence of 1) and each operation definition (the mutation in our case) represent a template of what needs to be done on each sequence (which is most of the time a sequence of 1 unless we are in a bulk situation like here in a sequence of n source)
ps: i have written an execution process (6.) here which may help
Hey Sceat,
Going back in time a little. Sorry for missing your question.
The answer will be really BIG (sorry also for that), but I will cover everything.
Is this issue related to the fact that properties is an edge so val(Title) could be many different titles?
Not exactly. But also that.
"thus i don't understand why extracting Title is different from extracting Labels or v"
That's about levels. I gonna explain below better, as far as I know.
Continuing
Internally Dgraph treats query levels as distinct, that is, level 1 is not related to level two and so on - so I created an extra block called "me()" - to capture the values of any level and transport them for the mutation block (or any block I want). That is, with the block "me()" I brought up the nested values to level one. That is why my "hacky" query works.
The values need to be at the same level in order for a mutation to work. Mutation does not capture nested levels.
I don't know how to explain it technically because I'm not Core Dev. But it's based on my experience that I claim this. But it has nothing to do with the fact that there could be several objects on that edge. This is unexpected behavior. But in the end, several objects in an edge could lead to more issues tho.
{
"set": [
{
"name": "MyTest",
"properties":[
{"propertie1": "Test 1"},
{"propertie2": "Test 2"},
{"propertie3": "Test 3"}
]
}
]
}
You can also try Single property object
{
"set": [
{
"name": "MyTest",
"properties":[
{
"propertie1": "Test 1",
"propertie2": "Test 2",
"propertie3": "Test 3"
}
]
}
]
}
query {
movies(func: has(properties), first:1000) @filter(not has(migrationDone)) {
uid
name
properties {
propertie1
propertie2
propertie3
}
}
}
You gonna see the mutation, all normal until here.
upsert {
query {
movies(func: has(properties), first:1000) @filter(not has(migrationDone)){
NAME as name #Level 1
properties {
propertie1V as propertie1 #Level 2
propertie2V as propertie2 #Level 2
propertie3V as propertie3 #Level 2
}
}
}
mutation {
set {
_:New <migrationDone> "True" .
_:New <name> val(NAME) .
_:New <propertie1> val(propertie1V) .
_:New <propertie2> val(propertie2V) .
_:New <propertie3> val(propertie3V) .
}
}
}
query {
A1 as var(func: has(propertie1))
A2 as var(func: has(propertie2))
A3 as var(func: has(propertie3))
A4 as var(func: has(migrationDone))
A5 as var(func: has(name)) @filter(not has(properties))
q(func: uid(A1, A2, A3, A4, A5), first:1000) {
name
propertie1
propertie2
propertie3
migrationDone
properties { #This won't appear it is here only to release the conscience's weight.
propertie1
propertie2
propertie3
}
}
}
The Single property object would create one new object and mutate migrationDone in the previously created node.
{
"data": {
"q": [
{
"propertie1": "Test 1"
},
{
"propertie2": "Test 2"
},
{
"propertie3": "Test 3"
},
{
"migrationDone": "True"
}
]
}
}
As you can see, he created 4 new nodes/objects. Because it treats each object from the query independently. And that is a problem. It is not the behavior we want. And this behavior is intrinsic to the fundamental design of the Dgraph query system.
But it could get even worse. Imagine that we have 1000 nodes with properties. It would simply create something around 4000 new nodes! That's why I always limit this type of query to first: 1. And I add the extra block "me()".
If we use an existing UID instead of a blank node, the same thing will happen. It simply ignores and creates three more extra nodes. The node you used it will probably add the value that is in "migrationDone".
It is at that moment that the foreach would enter.
With foreach I can make mutations based on the root map, which in this case is the "source". That is, it will "isolate" the context by step in a map from the "source" variable.
Let's say again that we have 100 nodes with properties.
It will iterate over the "movies" in isolation. One by one collecting the values and assembling the mutation block for that movie 1 out of 100.
In your last comment, I realize that even with foreach there could still be problems. In some use cases.
For example, you expect "properties" to be an array. So far so good. But in practice, It would have to create another mechanism to expand this array within the mutation block. And that was never supported and it sounds like something even more complex to implement.
Foreach would work normally when expanding the nested blocks in the query, but not the nested block itself within a variable. In that case, you need something that expands it. And that is another issue.
BTW. It is important to emphasize that Upsert Block was never foreseen in the Dgraph's design. So these things happens, and we need to find ways to make them work. It is not easy to change the wheel with the car moving.
Cheers.
Oh okay make sense! that bring an unexpected amount of conceptual problems.. if a core dev could bring more internal details on how execution works ? technically if Dgraph could evaluate the result of query first and export a kinda flattened varmap to pass to the mutation there would be no problems i guess
Github issues have been deprecated.
This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.
