Dgraph: Benchmark Dgraph against Cayley

Created on 3 Jun 2016 · 36 Comments · Source: dgraph-io/dgraph

This is the most popular page on discuss:
https://discuss.dgraph.io/t/differences-between-dgraph-and-cayley/23

This is also the most popular question on Reddit:
https://www.reddit.com/comments/4fs7qm/why_we_built_dgraph/

And again:
https://www.reddit.com/r/golang/comments/3uyj8m/announcing_alpha_release_of_open_source/

And again on HN:
https://news.ycombinator.com/item?id=11322444

So, it's about time we do a real benchmark. @ashwin95r has written some code to benchmark Dgraph for his final year thesis. So, he will be the coordinator for this project.

We can reuse that code to throw similar queries at Cayley and see how it compares.

benchmark help wanted

manishrjain

All 36 comments

Hi Manish, I would like to pick this one up.

patilvikram on 16 Jun 2016

Hey Vikram,
Go ahead. Let me know if you want some help!

ashwin95r on 16 Jun 2016

Awesome! @ashwin95r has some benchmarking scripts that he built for his performance report. You might want to talk to him before you begin.

If you finish this, we can send some Dgraph swag your way :+1: .

manishrjain on 17 Jun 2016

Hey, I've just recently found out about Dgraph, and it sounds like an exciting project!

Regarding the benchmark, another GraphDB vendor created an open source graph/NoSQL benchmark some time ago and they accept contributions. Here's the link: https://github.com/weinberger/nosql-tests. They've already used it to compare ArangoDB, Neo4j, OrientDB, MongoDB and PostgreSQL: https://www.arangodb.com/performance/. Using it may save you some time and provide a wider comparison.

dmarcelino on 22 Jun 2016

Interesting. Not sure if we support all of the functionality required by the benchmark, but surely worth a look to determine if we can fit in their cast.

manishrjain on 22 Jun 2016

👍1

Hey @patilvikram -- Any updates about this?

manishrjain on 1 Jul 2016

I think I can give you real-world stats without too much hardship. I have an app that uses Cayley, and speed's real important for it. Does Dgraph support a fully in-memory mode?

faddat on 9 Oct 2016

Hey @faddat,

To run Dgraph fully in-memory, you'd have to run it on RAMFS or TMPFS. Put all the directories (postings, uids and mutations) under those.

manishrjain on 10 Oct 2016

Thanks for that! This is what I'll be working on next, once this kernel finishes compiling.

faddat on 10 Oct 2016

Still looking for help? If nobody has had the time to work on something for this, I could dedicate some time to make it happen...

robbert229 on 26 Nov 2016

Hey @robbert229, Yes, still looking for someone to take it up. Would be great if you could do this. We're going to do a release next week; so maybe pick that version. It's going to be v0.7.

manishrjain on 26 Nov 2016

@manishrjain did anyone pick this up? If it can be done in 2-3 weeks I can take this up.

santhukumar on 25 Dec 2016

Please do. We really need those benchmarks for an upcoming talk.

manishrjain on 25 Dec 2016

I have started writing some Python scripts to benchmark Dgraph against Cayley.
I will be using the same Stack Overflow data as suggested in #101.
The work-in-progress benchmark is here: https://github.com/ankurayadav/graphdb-benchmarks.

ankurayadav on 27 Dec 2016

That's great @ankurayadav. Let us know if you are stuck somewhere and need some help.

pawanrawal on 28 Dec 2016

I have a couple of questions. I think https://github.com/weinberger/nosql-tests is a good starting point. I can use that code base to do the benchmarking. Cayley works with LevelDB, Bolt, PostgreSQL and MongoDB, and performance also depends on the underlying database. Shall we benchmark against Cayley on all 4 databases? https://www.arangodb.com/2015/10/benchmark-postgresql-mongodb-arangodb/ benchmarks a few metrics. What are the metrics we are looking at?

santhukumar on 28 Dec 2016

The Stack Overflow archives are very big. It's taking too much time to convert them from XML to N-Triples.
Does anyone know an efficient way to convert XML to N-Triples?
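
One possible approach, sketched in Go: stream the dump with encoding/xml's token decoder so that only one element is in memory at a time, and write one triple per attribute. This assumes the dump is a flat list of `<row .../>` elements with an `Id` attribute; the example.org IRI prefixes and the function names are made up for illustration and are not taken from the scripts in this thread.

```go
package main

import (
	"bufio"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Hypothetical IRI prefixes, chosen only for this illustration.
const (
	subjectPrefix   = "http://example.org/post/"
	predicatePrefix = "http://example.org/prop/"
)

// escaper handles the characters that must be escaped inside an N-Triples literal.
var escaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`, "\r", `\r`)

// literal wraps s in double quotes after escaping it.
func literal(s string) string {
	return `"` + escaper.Replace(s) + `"`
}

// convertRows reads XML from r token by token and writes one triple per
// non-Id attribute of every <row> element to w. It returns the triple count.
// Assumption: Id values are plain numbers, so they are safe inside an IRI.
func convertRows(r io.Reader, w io.Writer) (int, error) {
	dec := xml.NewDecoder(r)
	out := bufio.NewWriter(w)
	triples := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return triples, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "row" {
			continue
		}
		// Find the row's Id first; it becomes the subject of every triple.
		id := ""
		for _, a := range start.Attr {
			if a.Name.Local == "Id" {
				id = a.Value
			}
		}
		if id == "" {
			continue // skip rows without an Id
		}
		for _, a := range start.Attr {
			if a.Name.Local == "Id" {
				continue
			}
			fmt.Fprintf(out, "<%s%s> <%s%s> %s .\n",
				subjectPrefix, id, predicatePrefix, a.Name.Local, literal(a.Value))
			triples++
		}
	}
	return triples, out.Flush()
}

// convertString is a small convenience wrapper for trying the converter on a string.
func convertString(xmlText string) (string, error) {
	var sb strings.Builder
	_, err := convertRows(strings.NewReader(xmlText), &sb)
	return sb.String(), err
}
```

For a real dump, convertRows could be called with an opened file as the reader and os.Stdout (or a gzip writer) as the writer. Every attribute value is written as a plain string literal here; turning references between posts into edges would need extra per-attribute handling.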

ankurayadav on 28 Dec 2016

@santhukumar -- You can use the Freebase film data that we have -- the 21M RDFs. And then issue various queries that we have documented in our Query Language Spec. If you run them via go test benchmarking functionality, those numbers would be pretty accurate. Focus on getting the code right, so we can also run them on our and Amazon instances.

@ankurayadav -- You can use the 21M RDFs that we have in benchmarks repository instead. They contain freebase film data.
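
A minimal sketch of what a query benchmark built on Go's testing package could look like. It assumes the database accepts a query as an HTTP POST body, and it runs against an in-process stub server so that it works without any database installed. The stub, the placeholder query and all function names are hypothetical; a real run would pass the actual Dgraph or Cayley endpoint and query text to benchQuery.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
)

// postQuery sends one query as the POST body and drains the response,
// returning the number of response bytes read.
func postQuery(c *http.Client, url, query string) (int, error) {
	resp, err := c.Post(url, "text/plain", strings.NewReader(query))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	n, err := io.Copy(io.Discard, resp.Body)
	return int(n), err
}

// benchQuery issues the same query b.N times against url.
func benchQuery(b *testing.B, url, query string) {
	c := &http.Client{}
	b.ResetTimer() // exclude setup from the measurement
	for i := 0; i < b.N; i++ {
		if _, err := postQuery(c, url, query); err != nil {
			b.Fatal(err)
		}
	}
}

// BenchmarkStubQuery benchmarks against a stub server that stands in for a
// real database endpoint (an assumption made so the sketch is self-contained).
func BenchmarkStubQuery(b *testing.B) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.Copy(io.Discard, r.Body)
		io.WriteString(w, `{"ok":true}`)
	}))
	defer srv.Close()
	benchQuery(b, srv.URL, "{ placeholder query }")
}

// runStubBenchmark runs the benchmark programmatically, outside of go test.
func runStubBenchmark() testing.BenchmarkResult {
	return testing.Benchmark(BenchmarkStubQuery)
}
```

Saved in a file ending in _test.go, the benchmark can be run with go test -bench; long-running benchmarks may also need a larger value for go test's -timeout flag.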

manishrjain on 28 Dec 2016

👍1

I have added benchmark tests for loading data into Cayley and Dgraph.
Soon, I will add benchmark tests for some queries on both Cayley and Dgraph.

https://github.com/ankurayadav/graphdb-benchmarks

ankurayadav on 1 Jan 2017

@ankurayadav -- That looks interesting. Let's do the queries as well and then we can try a couple more backends on Cayley.

I think you'll have more interesting queries with the 21M RDF data, than the 30k.

manishrjain on 1 Jan 2017

I was not able to load the full 21M RDF data into Dgraph on my MacBook Pro.

..., err: context deadline exceeded

BenchmarkImportDataToDB-4              1    365186762389 ns/op
PASS
ok      _/Users/ankuryadav/dev/benchmark/graphdb-benchmark/dgraph   365.200s

Also, for Cayley I got this error:

cs     0x7
fs     0x0
gs     0x0
*** Test killed with quit: ran too long (1h1m0s).
FAIL    _/Users/ankuryadav/dev/benchmark/graphdb-benchmark/cayley   3660.118s

ankurayadav on 2 Jan 2017

👍1

@ankurayadav How are you trying to load the RDFs into Dgraph? Are you using the dgraphloader or something else?

pawanrawal on 3 Jan 2017

@pawanrawal thanks for reaching out to help.
I have successfully loaded the 21 million RDF data into Dgraph and Cayley.
The reason for these errors was the test timeout. I had to increase the test timeout to 4h to get this working, because Cayley took around 2.5h to load the whole data.
I have learnt some basic Gremlin and have tried it out on Cayley.
I will soon come up with benchmarks for a few queries.

ankurayadav on 3 Jan 2017

That's great @ankurayadav. Looking forward to some benchmarks. How much time did it take to load up the data on Cayley vs Dgraph?

pawanrawal on 4 Jan 2017

For Cayley, it took around 2h 22m to load the full data.
For Dgraph, it took around 15m to load the full data.

ankurayadav on 5 Jan 2017

I have benchmarked a single query in both Cayley and Dgraph:
a query to find all movies, and their genres, by the director "Steven Spielberg".
Results are available here:
https://github.com/ankurayadav/graphdb-benchmarks#results-of-queries-benchmark

ankurayadav on 5 Jan 2017

Hey @ankurayadav, this looks really interesting. Can you add a few more queries? You can pick the queries from https://wiki.dgraph.io/Get_Started.

Also: Maybe it would be worth reaching out to Cayley folks to check if boltdb is their fastest persistent backend, and if the query you produced for Cayley is optimized. We want to ensure that we provide the best possible outcome for Cayley.

manishrjain on 6 Jan 2017

@manishrjain @ankurayadav We know that the current Bolt implementation is not optimal, not to say slow. We are working on an alternative implementation, which is already much faster.

Also, it's not a good idea to compare Gremlin, which runs in a full JS VM, against native query performance. We have an experimental branch with GraphQL support, so it's better to use that instead (once we rebase it on top of our new Bolt implementation).

dennwc on 6 Jan 2017

👍1

@dennwc -- When do you expect these changes to be pushed to master?

manishrjain on 6 Jan 2017

@manishrjain Two weeks at least. Maybe more. We should verify the new approach carefully.

But we can make an experimental build earlier. I still need at least a week or two for this, since I don't have many free cycles right now.

dennwc on 6 Jan 2017

I am eager to try GraphQL on Cayley.
I shall keep an eye on its release!

ankurayadav on 6 Jan 2017

@ankurayadav -- Seems like we should instead use MQL, which Cayley also supports. See Barak's comment here: https://discourse.cayley.io/t/cayley-vs-dgraph-benchmarks/562/8?u=mrjn

Note that I'd need these results in a week's time for the Go meetup in Sydney, so it would be great if you could put them together. Also, use other queries, if Cayley supports them. For example, these: https://wiki.dgraph.io/Get_Started

manishrjain on 10 Jan 2017

@manishrjain I will dig into MQL and will try to make some more benchmark queries in a few days.
I will try to complete it within a week so that you can use it at the meetup.

ankurayadav on 10 Jan 2017

@manishrjain I tried to do benchmarks with the existing MQL support in Cayley, but it looks like it doesn't support filters yet.
So, I only benchmarked the two queries that I was able to run on Cayley using MQL.
I will redo the benchmarking when Cayley fully supports MQL.
Results are available here:
https://github.com/ankurayadav/graphdb-benchmarks#results-of-queries-benchmark-1

ankurayadav on 14 Jan 2017

Is this issue closed? I'm wondering, as there has been no update for two months.
If not, what is left to investigate? I want to help.

RajawatAmit on 5 Mar 2017

This is done. @ankurayadav has the link to the results. Also, @ankurayadav: You did a write up about this test. Can you post it on discuss.dgraph.io? That way we can link it here.

manishrjain on 6 Mar 2017
