Dgraph: RDF vs Property Graph

Created on 19 Sep 2017 · 12 Comments · Source: dgraph-io/dgraph

Since Dgraph uses triples, do you agree with the differences outlined in the following article?

https://neo4j.com/blog/rdf-triple-store-vs-labeled-property-graph-difference

All 12 comments

Just to clarify outright: we accept triples, but we don't really operate like a triple store. For example, we don't allow strings as IDs. All nodes have a unique integer-based ID that Dgraph allocates to them, and the way we store data is quite unique. One can add an edge to a node to represent its resource identifier URL, but there's no native support for it.
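To make the ID scheme described above concrete, here is a minimal Python sketch that maps external string IDs (blank nodes or URLs) to integer UIDs handed out incrementally. The class and method names are hypothetical, and this is an illustration of the idea rather than Dgraph's actual allocator.

```python
import itertools
from typing import Dict


class UidAllocator:
    """Hypothetical sketch: map external string IDs to integer UIDs assigned incrementally."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)  # yields start, start + 1, ...
        self._uids: Dict[str, int] = {}

    def uid_for(self, xid: str) -> int:
        """Return the UID for xid, allocating the next integer the first time xid is seen."""
        if xid not in self._uids:
            self._uids[xid] = next(self._counter)
        return self._uids[xid]


alloc = UidAllocator()
print(alloc.uid_for("_:alice"))                  # 1
print(alloc.uid_for("https://example.org/bob"))  # 2
print(alloc.uid_for("_:alice"))                  # 1 again: same external ID, same UID
```

The string survives only as a lookup key at load time; everything stored afterwards refers to the integer.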

We do allow storing properties on edges, as property graphs do. We have extended the RDF syntax to allow specifying attributes of edges (facets):
https://docs.dgraph.io/query-language/#facets-edge-attributes

About the comment on high space usage by RDF stores: that's not entirely true. As Manish said, we convert string IDs to unique integer IDs, which are assigned incrementally. We store each property of a node separately, so compared to storing all of the properties together, the extra space overhead is the space consumed by the keys. Our keys start with the predicate name and UID, and we do prefix diffing of keys in Badger, so this overhead shouldn't be significant.
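The prefix-diffing argument can be illustrated with a small Python sketch. It assumes a made-up key layout (predicate name, a zero byte, then the UID as 8 big-endian bytes) rather than Badger's or Dgraph's real format, and stores each sorted key as the length of the prefix it shares with the previous key plus the remaining suffix. Big-endian UIDs make byte order match numeric order, so neighbouring keys share long prefixes.

```python
import struct
from typing import List, Tuple


def make_key(predicate: str, uid: int) -> bytes:
    # Hypothetical layout: predicate name, a 0x00 separator, then the UID as 8 big-endian bytes.
    return predicate.encode() + b"\x00" + struct.pack(">Q", uid)


def prefix_diff(sorted_keys: List[bytes]) -> List[Tuple[int, bytes]]:
    """Encode each key as (bytes shared with the previous key, remaining suffix)."""
    encoded = []
    prev = b""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_undiff(encoded: List[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild the original keys from (shared, suffix) pairs."""
    keys = []
    prev = b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = sorted(make_key("name", uid) for uid in (1, 2, 3))
encoded = prefix_diff(keys)
print([(shared, len(suffix)) for shared, suffix in encoded])  # [(0, 13), (12, 1), (12, 1)]
```

Under these assumptions each 13-byte key after the first costs one suffix byte plus a small length field, which is the sense in which the per-property key overhead stays small.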

We store all the outgoing relations for a (subject, predicate) pair together and compress them using bit packing. For example, all friends of Alice would be stored in a single list and compressed. Since the list consists of sorted UIDs, and we generate UIDs incrementally, it compresses well.
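A rough Python sketch of why sorted, incrementally assigned UIDs compress well: store the first UID, delta-encode the rest, and bit-pack every delta at the width of the largest one. This fixed-width scheme is a stand-in chosen for illustration; Dgraph's actual encoding may differ, and the function names are hypothetical.

```python
from typing import List, Tuple


def pack_uids(uids: List[int]) -> Tuple[int, int, int, bytes]:
    """Delta-encode a sorted UID list and bit-pack the deltas at one fixed width.

    Returns (first_uid, count, bit_width, packed_bytes).
    """
    if not uids:
        return 0, 0, 0, b""
    if any(b < a for a, b in zip(uids, uids[1:])):
        raise ValueError("uids must be sorted in ascending order")
    deltas = [b - a for a, b in zip(uids, uids[1:])]
    width = max((d.bit_length() for d in deltas), default=0)
    acc = 0
    for i, d in enumerate(deltas):
        acc |= d << (i * width)  # place delta i in bits [i*width, (i+1)*width)
    nbytes = (len(deltas) * width + 7) // 8
    return uids[0], len(uids), width, acc.to_bytes(nbytes, "little")


def unpack_uids(first: int, count: int, width: int, packed: bytes) -> List[int]:
    """Invert pack_uids by reading fixed-width deltas and summing them."""
    if count == 0:
        return []
    acc = int.from_bytes(packed, "little")
    mask = (1 << width) - 1
    uids = [first]
    for i in range(count - 1):
        uids.append(uids[-1] + ((acc >> (i * width)) & mask))
    return uids


friends_of_alice = [1000, 1001, 1003, 1004, 1010]
first, count, width, packed = pack_uids(friends_of_alice)
print(width, len(packed))  # 3 2
```

Here five UIDs that would take 40 bytes as 64-bit integers shrink to the first UID plus 2 bytes of deltas, because consecutive UIDs differ by small numbers.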

So our space usage shouldn't be significantly higher than that of property graphs, contrary to what the article claims.

This article seems to be written with some bias against RDF stores. Take "RDF stores are very strongly index-based": RDF is just a data representation for input, the storage layer can be entirely different, and we don't use an index for traversing a relation as the article describes. We store relations as posting lists (https://docs.dgraph.io/design-concepts/#posting-list), so traversing a relation is just a lookup for us.
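As a hypothetical illustration of "traversing a relation is just a lookup", the Python sketch below keeps one sorted UID list per (predicate, subject) key in an in-memory dict. The class is invented for this example and ignores persistence and compression; it only shows that finding a node's neighbours over a predicate is a single key read, not an index scan.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


class PostingStore:
    """Hypothetical sketch: one sorted UID list per (predicate, subject UID) key."""

    def __init__(self) -> None:
        self._lists: Dict[Tuple[str, int], List[int]] = {}

    def add_edge(self, subject: int, predicate: str, obj: int) -> None:
        plist = self._lists.setdefault((predicate, subject), [])
        # Insert at the sorted position and skip duplicates, so the list stays sorted and unique.
        i = bisect_left(plist, obj)
        if i == len(plist) or plist[i] != obj:
            plist.insert(i, obj)

    def neighbors(self, subject: int, predicate: str) -> List[int]:
        # Traversing a relation is one key lookup; no index has to be scanned.
        return list(self._lists.get((predicate, subject), []))


store = PostingStore()
alice = 1
for friend in (7, 3, 5, 3):
    store.add_edge(alice, "friend", friend)
print(store.neighbors(alice, "friend"))  # [3, 5, 7]
```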

PS: Space usage might be higher in the beginning due to the way LSM trees work, but it should eventually decrease after compaction.

@manishrjain & @janardhan1993, I appreciate your quick responses and insights. It's great to know that Dgraph extends triples to allow facets. May I ask:

  • whether compaction decreases read/write performance, and by how much, if you have the data?
  • Dgraph creates different UIDs for the same triples. Is it the application's responsibility to ensure unique vertices (nodes)?

PS: Space usage might be higher in the beginning due to the way LSM trees work, but it should eventually decrease after compaction.

whether compaction decreases read/write performance, and by how much, if you have the data?

LSM tree compactions don't really result in space reduction. I think @janardhan1993 was thinking about value log garbage collection, which happens frequently. In general, I don't think you should be concerned about disk usage. Dgraph represents everything as posting lists and compresses them using the latest research before storing them on disk (or in RAM), so our space usage is going to be much smaller than that of other DBs.

We have data on the write performance of Badger (the underlying KV store, with the LSM tree): https://blog.dgraph.io/post/badger/. It performs really well against RocksDB, which is an industry standard, and in fact outperforms it as value sizes increase.

Dgraph creates different UIDs for the same triples. Is it the application's responsibility to ensure unique vertices (nodes)?

The Dgraph client, the Dgraph loader, and the bulk loader all take care of that for you. We also introduced an upsert operation (https://docs.dgraph.io/query-language/#upsert), which atomically checks for a node and creates it if it doesn't exist.
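The key property of the upsert described above is that "check" and "create" happen as one atomic step. The Python sketch below shows that property with a single-process lock; it is a hypothetical stand-in and does not reflect Dgraph's upsert syntax or how Dgraph implements it internally.

```python
import threading
from typing import Dict, List


class UpsertIndex:
    """Hypothetical sketch: look up a node by a unique value, creating it atomically if absent."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_value: Dict[str, int] = {}
        self._next_uid = 1
        self.created = 0  # how many nodes were actually created

    def upsert(self, value: str) -> int:
        # Holding the lock across "check" and "create" makes them a single atomic step,
        # so concurrent callers with the same value can never create two nodes.
        with self._lock:
            uid = self._by_value.get(value)
            if uid is None:
                uid = self._next_uid
                self._next_uid += 1
                self._by_value[value] = uid
                self.created += 1
            return uid


index = UpsertIndex()
results: List[int] = []


def worker() -> None:
    results.append(index.upsert("alice@example.com"))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(set(results), index.created)  # {1} 1
```

Without the lock, two threads could both see "not found" and each create a node, which is exactly the duplicate-vertex problem raised in the question.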

@manishrjain, thanks for your quick response. It's great to know that you have upsert and your loaders remove duplicates.

Please bear with me for a few more questions this week.

  • The Dgraph docs say "A query can't traverse an edge in reverse." Neo4j doesn't seem to care about edge direction. Please confirm whether this is one of the differences.
  • Does Dgraph support finding a DISTINCT result set?

  • Values (literals) don't show up in the graph in Dgraph.

  • What if a deletion fails partially? Will Dgraph roll back, or will the graph be corrupted?

The Dgraph docs say "A query can't traverse an edge in reverse."

It can now; you just need to set @reverse indexing on the predicate. Graphs are directional, so for us reverse edges are a derived data set, just like indices.
https://docs.dgraph.io/query-language/#reverse-edges
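Treating reverse edges as derived data, like an index, can be sketched in Python: forward edges are always stored, and a reverse map is maintained on every write and delete, but only for predicates that opted in (mirroring the per-predicate @reverse setting). The class is hypothetical and is not Dgraph's code.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple


class Graph:
    """Hypothetical sketch: forward edges plus a reverse index kept only for chosen predicates."""

    def __init__(self, reverse_predicates: Iterable[str]) -> None:
        self._reverse_preds = set(reverse_predicates)
        self._forward: DefaultDict[Tuple[str, int], Set[int]] = defaultdict(set)
        self._reverse: DefaultDict[Tuple[str, int], Set[int]] = defaultdict(set)

    def add_edge(self, subject: int, predicate: str, obj: int) -> None:
        self._forward[(predicate, subject)].add(obj)
        if predicate in self._reverse_preds:
            # Derived data: updated on every write, just as an index would be.
            self._reverse[(predicate, obj)].add(subject)

    def delete_edge(self, subject: int, predicate: str, obj: int) -> None:
        self._forward[(predicate, subject)].discard(obj)
        if predicate in self._reverse_preds:
            self._reverse[(predicate, obj)].discard(subject)

    def outgoing(self, subject: int, predicate: str) -> List[int]:
        return sorted(self._forward.get((predicate, subject), set()))

    def incoming(self, obj: int, predicate: str) -> List[int]:
        if predicate not in self._reverse_preds:
            raise ValueError(f"no reverse index for {predicate!r}")
        return sorted(self._reverse.get((predicate, obj), set()))


g = Graph(reverse_predicates=["friend"])
g.add_edge(1, "friend", 2)
g.add_edge(3, "friend", 2)
print(g.incoming(2, "friend"))  # [1, 3]
g.delete_edge(1, "friend", 2)
print(g.incoming(2, "friend"))  # [3]
```

The cost of this choice is extra work and space on every write to a reversed predicate, which is why it makes sense as an opt-in.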

Does Dgraph support finding a DISTINCT result set?

Not currently. We had a task to do it but didn't get around to it. If you file an issue, we can try to get to it.

Values (literals) don't show up in the graph in Dgraph.

I'm not sure what you mean. We handle values just fine. You can see the data types that we handle natively: https://docs.dgraph.io/query-language/#schema-types

What if a deletion fails partially? Will Dgraph roll back, or will the graph be corrupted?

Dgraph doesn't support transactions right now, but there's an outstanding issue for it. We're putting together a design doc on how we could introduce them.
https://github.com/dgraph-io/dgraph/issues/1445

@manishrjain, thanks for all your help.

The graph DB only displays relational data, not value data, which is fine. A few more questions:

  • Dgraph supports labels, but not in queries. When will you support labels in queries? If I'd like to specify a label for my data, is it like this?
_:michael <name> "Michael" person .
  • Is there a way to view all data on a particular server? Or is there a command to check that Dgraph does what is specified in the groups.config file, e.g. that object.name is in group1 and on server x?

  • If Dgraph could provide a drill-down child relationship feature like Neo4j's, that would be cool.

  • In the Dgraph visualizer, there is an input box for entering a regex for labels. Do you have an example of how to use it?

  • The Dgraph visualizer seems to show vertex names (labels) at random: some vertices have labels and some don't. Why is that? For example, the purple nodes don't have any text on them.

[Screenshot: Dgraph visualizer, 2017-09-21]

  • About the query below: how do I know how many expand blocks I need to use? Are they determined by the predicates? Is there any other way to example data instead of using _predicate_?
{
  expand(func: allofterms(name, "Michael")) {
    expand(_all_) {
      expand(_all_) {
        expand(_all_)
      }
    }
  }
}
  • I saw your roadmap for Dgraph 1.0. When will that be?

Hey @candysmurf, I will try to answer your questions here.

The graph DB only displays relational data, not value data

Dgraph does display values if the query asks for them. Try clicking on a node in the visualization; the values should be displayed in the last row. Image attached for reference.

[Screenshot: node values displayed in the Dgraph visualizer]

Dgraph supports labels, but not in queries. When will you support labels in queries?

I think what you are looking for can be achieved by attaching a type edge to your node and indexing it (https://docs.dgraph.io/query-language/#indexing). Then you can use that in your query:

_:michael <name> "Michael" .
_:michael <type> "Person" .

Then you can query for all persons like

{
  me(func: eq(type, "Person")) {
    name
  }
}
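For eq(type, "Person") to work, the type predicate needs an index; in the schema that would be declared with something like `type: string @index(exact) .` (the indexing docs linked above list the available tokenizers). The Python sketch below is a hypothetical in-memory illustration of what an exact-match index does: it maps a (predicate, value) pair to the UIDs holding that value, so the root function of the query becomes a single lookup. Names, UIDs, and the _:rex node are invented for the example.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Set, Tuple


class ExactIndex:
    """Hypothetical sketch of an exact-match index: (predicate, value) -> UIDs with that value."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[int, str], str] = {}
        self._index: DefaultDict[Tuple[str, str], Set[int]] = defaultdict(set)

    def set_value(self, uid: int, predicate: str, value: str) -> None:
        old = self._values.get((uid, predicate))
        if old is not None:
            # Keep the index consistent when a value is overwritten.
            self._index[(predicate, old)].discard(uid)
        self._values[(uid, predicate)] = value
        self._index[(predicate, value)].add(uid)

    def value(self, uid: int, predicate: str) -> Optional[str]:
        return self._values.get((uid, predicate))

    def eq(self, predicate: str, value: str) -> List[int]:
        # Root lookup like eq(type, "Person"): one index read, no scan over all nodes.
        return sorted(self._index.get((predicate, value), set()))


idx = ExactIndex()
michael, rex = 1, 2  # UIDs a loader might assign to _:michael and a hypothetical _:rex
idx.set_value(michael, "name", "Michael")
idx.set_value(michael, "type", "Person")
idx.set_value(rex, "name", "Rex")
idx.set_value(rex, "type", "Dog")
print([idx.value(uid, "name") for uid in idx.eq("type", "Person")])  # ['Michael']
```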

Is there a way to view all data on a particular server? Or is there a command to check that Dgraph does what is specified in the groups.config file, e.g. that object.name is in group1 and on server x?

You can query for the schema, which tells you the predicates served by a server. http://localhost:8080/debug/vars gives more detailed predicate stats under the dgraph_predicate_stats key. You could also do an export to see the data that was written to a Dgraph instance.
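Reading those stats programmatically could look like the Python sketch below. It assumes the endpoint returns a JSON object (which is how Go's expvar package serves /debug/vars); the structure of the value under dgraph_predicate_stats isn't described here, so it is returned as-is. Both function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional


def predicate_stats_from_json(payload: str) -> Optional[Any]:
    """Return whatever is stored under dgraph_predicate_stats, or None if the key is missing."""
    data = json.loads(payload)
    return data.get("dgraph_predicate_stats")


def fetch_predicate_stats(base_url: str = "http://localhost:8080", timeout: float = 5.0) -> Optional[Any]:
    """Fetch /debug/vars from a running server and extract the predicate stats."""
    with urllib.request.urlopen(base_url + "/debug/vars", timeout=timeout) as resp:
        return predicate_stats_from_json(resp.read().decode("utf-8"))
```

Calling fetch_predicate_stats() requires a server listening on the HTTP port mentioned above; keeping the parsing in a separate function lets it be checked against a saved response without a live server.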

Also, from the next release, you will be able to query which groups are served by which Dgraph nodes, and which predicates (along with their sizes) are part of each group.

If Dgraph could provide a drill-down child relationship feature like Neo4j's, that would be cool.

I'm not sure what that Neo4j feature does. We have recurse queries. Is that what you are looking for?

In the Dgraph visualizer, there is an input box for entering a regex for labels. Do you have an example of how to use it?

The Dgraph visualizer seems to show vertex names (labels) at random: some vertices have labels and some don't. Why is that? For example, the purple nodes don't have any text on them.

I will answer both of these together because they are related. The regex box is for choosing which node property to display as the label. By default, we try to find a property that matches the name regex, so the purple nodes probably don't have a property matching it. If your nodes didn't have a name but had, say, a property called alias, you could enter that in the regex box, and the alias would then be displayed as the label within the nodes.
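The label-selection behaviour described above can be approximated in Python as "show the first property whose key matches the regex, otherwise show nothing". The exact matching rules of the visualizer (search versus full match, the order in which properties are tried) are assumptions here, and pick_label is a hypothetical helper.

```python
import re
from typing import Mapping, Optional


def pick_label(properties: Mapping[str, object], pattern: str = "name") -> Optional[str]:
    """Return the value of the first property (in sorted key order) whose key matches pattern."""
    regex = re.compile(pattern)
    for key in sorted(properties):
        if regex.search(key):
            return str(properties[key])
    return None  # no matching property: the node is drawn without text


print(pick_label({"name": "Michael", "age": 39}))           # Michael
print(pick_label({"alias": "Mike", "age": 39}))             # None
print(pick_label({"alias": "Mike", "age": 39}, "alias"))    # Mike
```

The second call corresponds to the unlabeled purple nodes; changing the pattern, as in the third call, is what entering "alias" in the regex box would do.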

About the query below: how do I know how many expand blocks I need to use? Are they determined by the predicates? Is there any other way to example data instead of using _predicate_?

The number of expand blocks depends on the level of nesting you want to explore. For example, if you only want to explore up to 2nd-level friends, you need just two of them. If you want to expand indefinitely, recurse could be a better way to do it.
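Since each extra level of nesting needs one more expand(_all_) block, the query can be generated for a chosen depth. The Python helper below is a hypothetical convenience, not part of any Dgraph client; it reproduces the shape of the query quoted in the question, with the root block named me as in the type example (block name and root function are parameters).

```python
def nested_expand_query(depth: int, root_func: str = 'allofterms(name, "Michael")', block: str = "me") -> str:
    """Build a query with `depth` nested expand(_all_) levels under the root block."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    lines = ["{", f"  {block}(func: {root_func}) {{"]
    for level in range(depth):
        indent = "  " * (level + 2)
        opener = " {" if level < depth - 1 else ""  # innermost level has no children
        lines.append(f"{indent}expand(_all_){opener}")
    for level in reversed(range(depth - 1)):
        lines.append("  " * (level + 2) + "}")
    lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


print(nested_expand_query(2))  # explores up to 2nd-level neighbours
```

A fixed depth keeps the response size predictable; for unbounded traversal, recurse avoids writing the nesting by hand.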

Is there any other way to example data instead of using _predicate_?

Sorry, I didn't get what you are asking here. Could you please elaborate a bit more? _predicate_ gives you the list of predicates/edges coming out of a node.

I saw your roadmap for Dgraph 1.0. When will that be?

The roadmap for v1.0 is more than 95% complete. We will be doing a v0.9 release in a couple of months. After that, we won't be adding any more features until v1.0 and will only be doing bug fixes and improvements. So v1.0 should happen sometime early next year. We already have companies using us in production, so stability shouldn't be an issue even with the latest release.

Feel free to ask more questions.

@pawanrawal, thanks for all your answers, I'll look into them and get back to you with more questions. I'm very impressed with your support.

  • Do you have any current customers?

Are there any plans to add ACL/authentication support? Ideally, I could just set up Dgraph on a server and write client code against it without adding extra server code for authentication.

I'm going to set up the cluster during the rest of this week. Hopefully, I can recommend Dgraph by next week. Is it possible to set up a 3-node cluster on the same machine?

@candysmurf : Yeah, a bunch of folks are using us in production now. They're active on Slack.

@jzhang1 : We support encrypted password storage and checking, which can be used to build authentication. We have plans for ACL, but that'd be after v1.0 is released.

@candysmurf : Yeah, you can set up a 3-node cluster on the same machine. We do that for testing. You can use the port offset flag to give them different port numbers more easily.
