Earlier, dgraph supported _xid_ or a custom uid (I used it up to 0.8.3).
Now it seems that both of these have been removed. This may cause a lot of problems.
Most devs would want this as a core feature of dgraph.
Related slack conv: https://dgraph.slack.com/archives/C13LH03RR/p1518791448000040
What I have done to solve this problem is create a helper function getExtIdToUidMap that takes an array of ext ids and returns me the array of uids so that I can operate on them. Agreed that one database call is better than two, but at least a helper can encapsulate the query -> mutate process.
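A minimal sketch of what such a helper could look like in Go. The `lookupFn` type and `fakeLookup` are hypothetical stand-ins for the real query against the database, not part of any Dgraph client; the point is only that the query step sits behind one function that returns an ext id to uid map before the mutation runs.

```go
package main

import (
	"errors"
	"fmt"
)

// lookupFn is a hypothetical stand-in for the real database query:
// given external ids, it returns the uid for every id it knows.
type lookupFn func(extIDs []string) (map[string]string, error)

// errMissing is returned when at least one external id has no uid yet.
var errMissing = errors.New("some external ids have no uid")

// getExtIdToUidMap resolves a batch of external ids in one lookup call.
// It returns whatever was resolved, plus errMissing if any id is unknown,
// so the caller can decide whether to create those nodes first.
func getExtIdToUidMap(extIDs []string, lookup lookupFn) (map[string]string, error) {
	found, err := lookup(extIDs)
	if err != nil {
		return nil, err
	}
	for _, id := range extIDs {
		if _, ok := found[id]; !ok {
			return found, errMissing
		}
	}
	return found, nil
}

// fakeLookup simulates the database with a fixed in-memory table.
func fakeLookup(extIDs []string) (map[string]string, error) {
	table := map[string]string{"account_1": "0x1", "account_2": "0x2"}
	out := make(map[string]string)
	for _, id := range extIDs {
		if uid, ok := table[id]; ok {
			out[id] = uid
		}
	}
	return out, nil
}

func main() {
	m, err := getExtIdToUidMap([]string{"account_1", "account_2"}, fakeLookup)
	fmt.Println(m, err)
}
```

Returning the partial map together with an error lets the caller create the missing nodes and retry, which is the query -> mutate flow the helper is meant to hide.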
I think stopping support for XID and treating it like other edges allowed the Dgraph team to simplify the code base for v1.
Another interesting problem affecting external IDs is related to the Dgraph team's suggestion to have many small predicates rather than a single large predicate (like "external id"). So the best practice solution may be to have different XID predicates for different data types... like person.xid, place.xid, company.xid.
Indeed, this was giving us headaches for a bit as well. What we ended up doing was to create a central "gateway" microservice that handles database writes, with an internal cache of uid mappings, as well as a Redis layer behind that.
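Roughly, the in-process part of such a cache could look like the sketch below. Everything here is illustrative: `uidCache` and its `fetch` fallback are hypothetical names, and `fetch` stands in for whatever slower layer (Redis, a database query) sits behind the cache.

```go
package main

import (
	"fmt"
	"sync"
)

// uidCache is an in-process xid -> uid cache in front of a slower store.
type uidCache struct {
	mu      sync.RWMutex
	entries map[string]string
	// fetch is a hypothetical fallback (e.g. Redis or a database query).
	// It returns the uid and whether the xid was found.
	fetch func(xid string) (string, bool)
}

func newUIDCache(fetch func(string) (string, bool)) *uidCache {
	return &uidCache{entries: make(map[string]string), fetch: fetch}
}

// Resolve returns the uid for xid, consulting the fallback only on a miss.
func (c *uidCache) Resolve(xid string) (string, bool) {
	c.mu.RLock()
	uid, ok := c.entries[xid]
	c.mu.RUnlock()
	if ok {
		return uid, true
	}
	uid, ok = c.fetch(xid)
	if !ok {
		return "", false
	}
	c.mu.Lock()
	c.entries[xid] = uid
	c.mu.Unlock()
	return uid, true
}

func main() {
	calls := 0
	cache := newUIDCache(func(xid string) (string, bool) {
		calls++
		if xid == "person_42" {
			return "0x2a", true
		}
		return "", false
	})
	cache.Resolve("person_42")
	uid, ok := cache.Resolve("person_42") // second call is served from memory
	fmt.Println(uid, ok, calls)
}
```

Two goroutines that miss at the same time may both call `fetch`; for a read-mostly mapping that is usually acceptable, but it is a simplification of this sketch.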
Yeah @antikantian, this was the first thing I had thought of: using the inbuilt badger to maintain a map of xid to uid. But this seems like a feature that belongs in the product itself. It was already there and was then removed.
@akshaydeo, I totally agree. External id support (or, I guess re-support in this case) would make things a lot easier.
+1 for this. I'm actually using dgraph with Redis for UID mapping. An XID-to-UID map is a fundamental function for normal applications. I think one of dgraph's best features is its simplicity. But when it comes to the UID map, we need another big storage layer like Redis, which might make people look for alternatives.
@pawanrawal @manishrjain can anyone from dgraph help us on this?
I have an update. Seeing the least enthusiasm from the maintainers of the project 😞, I started digging into the source code.
- One of the main reasons for removing this support seems to be avoiding collisions of UIDs.
- In the current setup, the current leader does the UID allocation.
- The lease is shared with all the nodes for faster verification etc.
Hacky approach
- There is a function called VerifyUid which verifies the UID passed with the quad against the available lease.
Refer: https://sourcegraph.com/github.com/dgraph-io/dgraph/-/blob/query/mutation.go#L96:6
- I just returned nil from the function.
Pros
- Passing all the test cases of the project 🍾
- Passing all the test cases I have in my project 👍
- If you don't pass your own UID, the entire flow works the same.
Cons 🚑
- This approach won't handle collisions of the UID. As far as my tests go, it just overwrites which was desired behavior for my app.
- I have tried my best to check all the flows that might break and could not find any, but it's still a hacky approach.
@pawanrawal @manishrjain Could you guys let us know if there are any issues you can think of with this approach, until this is officially supported by dgraph?
Glad someone is prepared to get their hands dirty :)
Could you please expand a bit more on “This approach won't handle collisions of the UID. As far as my tests go, it just overwrites which was desired behavior for my app”? Are you saying that this is not functionality to deal with the ‘if node exists’ vs. ‘if node doesn’t exist’ case, but simply ensures that there can’t ever be 2 nodes with the same xid?
Thanks,
Vlad
Yes. So here is the example:
- I perform a mutation with the following NQuads:
{Subject: "0x1", Predicate: "Name", Object: "Jarvis"}
{Subject: "0x1", Predicate: "Color", Object: "White"}
The premise is that a node with UID 0x1 is not in the system.
Fetching the record by UID 0x1 would then return this node with predicates Name and Color with values Jarvis and White.
- If I perform one more mutation after this, as follows:
{Subject: "0x1", Predicate: "Name", Object: "Jarvis2"}
Fetching the record by UID 0x1 would return this node with predicates Name and Color with values Jarvis2 and White.
But you can already perform mutations that write to a UID and ensure that there’s no duplication. Can the UID in your example be replaced with an XID? For example, can it be something like “account_193748”?
Yes yes, but with a twist.
So in dgraph, the uid is the ultimate identifier, and it has to be a hex integer.
But there is a workaround which I have been using since I started using dgraph (0.7.x) to map between uid and xid. I do something like:
// requires: import "hash/fnv"
func uidFromxid(xid string) uint64 {
    f := fnv.New64a()    // 64-bit FNV-1a hash
    f.Write([]byte(xid)) // hash the external id
    return f.Sum64()
}
And pass this as the uid in the above example.
This way the mapping becomes programmatic.
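For anyone who wants to try this, here is a self-contained version of the same idea with the import it needs. `uidHex` is a hypothetical helper added only to print the value in the `0x...` form used for subjects earlier in the thread; the xid is the example value mentioned above.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// uidFromXid derives a uint64 from an external id with 64-bit FNV-1a.
// Different xids can still hash to the same value.
func uidFromXid(xid string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(xid)) // hash.Hash.Write never returns an error
	return h.Sum64()
}

// uidHex formats the value in the 0x... form used for subjects.
func uidHex(uid uint64) string {
	return fmt.Sprintf("0x%x", uid)
}

func main() {
	xid := "account_193748" // example xid taken from this thread
	fmt.Println(xid, "->", uidHex(uidFromXid(xid)))
}
```

The mapping is deterministic, so no lookup table is needed, but it only works if the server accepts uids it did not allocate itself.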
Ah, so all you're doing is disabling the verification of the UID. Your function to translate an XID into a UID will lead to the data being stored with UIDs that are not sequential, but all over the place. So then, as your question stated above, you need to understand the unintended consequences of this.
At least superficially, it sounds like it would fit our scenario as well. Could that validation be made optional?
I can quickly send a pull request that skips the verification based on a param passed to the dgraph server.
Would that be the right way to introduce it?
@akshaydeo I was doing this as well for a time. When you say you bypass the VerifyUid function, does that mean dgraph does no checks of the UID at all? I ask because when I tried maintaining my own set of UIDs and passing those in mutations, I eventually encountered an error to the effect of "UID value cannot be greater than lease".
That's the error verification he's bypassing, check the code that he's
linked to :)
Hey @akshaydeo
It's great to see the interest in this feature, and I appreciate the effort you've put into this. Let me first clarify why we have the VerifyUid function.
As you know, Dgraph allocates uids sequentially. If we were to allow the user to set data with an arbitrary uid, say 2500, then when the uid allocator actually gets to that uid (when 2499 new uids have already been allocated), it would append data to the existing uid instead of creating a new node (because it's hard to check whether a uid has already been used). So the node with uid 2500 would have the old data that was set directly by the user as well as the new data.
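The scenario can be reproduced with a toy model. This is a deliberately simplified sketch, not Dgraph's implementation: `toyStore`, `set`, and `allocate` are hypothetical names, and the only property carried over is that uids are handed out sequentially without checking whether one already holds data.

```go
package main

import "fmt"

// toyStore is a simplified model: nodes are predicate maps keyed by uid,
// and new uids are handed out sequentially from a counter.
type toyStore struct {
	next  uint64
	nodes map[uint64]map[string]string
}

func newToyStore() *toyStore {
	return &toyStore{next: 1, nodes: make(map[uint64]map[string]string)}
}

// set writes a predicate on a uid without checking whether it was allocated.
func (s *toyStore) set(uid uint64, pred, val string) {
	if s.nodes[uid] == nil {
		s.nodes[uid] = make(map[string]string)
	}
	s.nodes[uid][pred] = val
}

// allocate returns the next sequential uid. It does not check whether
// that uid already carries data.
func (s *toyStore) allocate() uint64 {
	uid := s.next
	s.next++
	return uid
}

func main() {
	s := newToyStore()
	s.set(2500, "name", "set directly by the user")

	var uid uint64
	for i := 0; i < 2500; i++ { // 2499 other nodes, then the clash
		uid = s.allocate()
	}
	s.set(uid, "color", "set on a supposedly new node")

	fmt.Println(uid, s.nodes[uid]) // one node now holds both writes
}
```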
Another problem is the hashing function that you use to transform the xid to a uid. Since it is a hashing function, it can lead to collisions (same uid for two different xids), which would lead to inconsistent data. The probability of collision is more for longer xids.
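To get a rough sense of scale for the collision risk, the sketch below uses the standard birthday approximation. It assumes the hash behaves like a uniformly random 64-bit function, which is an idealization and not a measured property of any particular hash; under that assumption the risk is driven by how many distinct xids are hashed. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n
// distinct inputs share a hash value, assuming a uniformly random 64-bit
// hash: p ≈ 1 - exp(-n(n-1) / 2^65).
func collisionProbability(n float64) float64 {
	exponent := n * (n - 1) / math.Exp2(65)
	return -math.Expm1(-exponent) // 1 - e^(-x), accurate for tiny x
}

func main() {
	for _, n := range []float64{1e6, 1e9, math.Exp2(32)} {
		fmt.Printf("n = %.3g  p ≈ %.3g\n", n, collisionProbability(n))
	}
}
```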
The only scenario in which this would work reliably is if no new nodes are created using Dgraph and if the xids are uint64. That is when all data is created outside of Dgraph with a uint64 identifier and it is just stored and retrieved from Dgraph. If that is your use case, then sure bypassing the check can work.
Any updates on a good way to do this? My use case needs xids in the form of URLs, and I've just now learned they're no longer supported. I would really be interested in, at the very least, some decent ways to handle this.
Is there really no other way than running a substitution on the RDF nquads and creating a mapping from xids to uids?
So, to answer the general questions here:
We could allow a way by which you could do your own UID-allocations. Dgraph would then no longer be the one responsible for handing out new UIDs, or doing conflict checking. Let me know if that'd help, @akshaydeo ?
@beta-phenylethylamine : You could assign an xid edge to the nodes, set type as string, and put hash index on them. Then, you could query via the eq function.
In general, we have no plans to natively support an external ID -- because the index based support is the best that we can do at our end, but that mechanism is already available.
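As a sketch of the index-based approach described above: the schema line declares `xid` as a string with a hash index, and the query looks a node up with `eq`. The Go wrapper, the `node` block name, and the example URL are illustrative assumptions; using `strconv.Quote` to escape the value is also an assumption of this sketch and should be checked against the escaping rules of the Dgraph version in use.

```go
package main

import (
	"fmt"
	"strconv"
)

// xidSchema declares xid as a string predicate with a hash index,
// which is what lookups via the eq function rely on.
const xidSchema = `xid: string @index(hash) .`

// uidByXidQuery builds a query that returns the uid of the node whose
// xid edge equals the given value. strconv.Quote escapes quotes and
// backslashes in the value (an assumption, see the note above).
func uidByXidQuery(xid string) string {
	return fmt.Sprintf(`{ node(func: eq(xid, %s)) { uid } }`, strconv.Quote(xid))
}

func main() {
	fmt.Println(xidSchema)
	fmt.Println(uidByXidQuery("https://example.com/people/42"))
}
```

Nothing in the index itself makes the xid unique, so the application still has to check for an existing node before creating a new one.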
@manishrjain thanks! I actually just realized that I was misinterpreting the docs on that bit myself. Still a bit of a PitA transform on outside RDF data but much more manageable than directly mapping everything imo
I don't think any action is needed here. Most users use an xid predicate to map Dgraph IDs to their own external ids. That works well already.
If you use this method, it is very likely that one xid will have multiple query results (corresponding to multiple uids). How can you solve this problem?
@manishrjain the point of this issue is related to node updates.
For updating a node, you first have to look up its uid with an eq query and then run the mutation. Another way would be to keep a uid <-> xid mapping externally.
With external ID support, you can directly perform this operation in one single query.
Correct me if I am wrong?
I would highly recommend having this feature inside dgraph for most of the use cases where dgraph will be used as a knowledge graph against the main database of the system.