This is the most popular page on discuss:
https://discuss.dgraph.io/t/differences-between-dgraph-and-cayley/23
This is also the most popular question on Reddit:
https://www.reddit.com/comments/4fs7qm/why_we_built_dgraph/
And again:
https://www.reddit.com/r/golang/comments/3uyj8m/announcing_alpha_release_of_open_source/
And again on HN:
https://news.ycombinator.com/item?id=11322444
So, it's about time we do a real benchmark. @ashwin95r has written some code to benchmark Dgraph for his final year thesis. So, he will be the coordinator for this project.
We can reuse that code to throw similar queries at Cayley and see how it compares.
Hi Manish, I would like to pick this one up.
Hey Vikram,
Go ahead. Let me know if you want some help!
Awesome! @ashwin95r has some benchmarking scripts that he built for his performance report. You might want to talk to him before you begin.
If you finish this, we can send some Dgraph swag your way :+1:
Hey, I just recently found out about Dgraph and it sounds like an exciting project!
Regarding the benchmark, another GraphDB vendor created an open source graph/NoSQL benchmark some time ago and they accept contributions. Here's the link: https://github.com/weinberger/nosql-tests. They've already used it to compare ArangoDB, Neo4j, OrientDB, MongoDB and PostgreSQL: https://www.arangodb.com/performance/. Using it may save you some time and provide a wider comparison.
Interesting. Not sure if we support all of the functionality the benchmark requires, but it's surely worth a look to determine whether we can fit into it.
Hey @patilvikram -- Any updates about this?
I think I can give you real-world stats without too much hardship. I have an app that uses Cayley, and speed is really important for it. Does Dgraph support a fully in-memory mode?
Hey @faddat,
To run Dgraph fully in-memory, you'd have to run it on RAMFS or TMPFS. Put all the directories (postings, uids and mutations) under those.
Thanks for that! This is what I'll be working on next, once this kernel finishes compiling.
Still looking for help? If nobody has had the time to work on this, I could dedicate some time to make it happen.
Hey @robbert229, Yes, still looking for someone to take it up. Would be great if you could do this. We're going to do a release next week; so maybe pick that version. It's going to be v0.7.
@manishrjain did anyone pick this up? If it can be done in 2-3 weeks I can take this up.
Please do. We really need those benchmarks for an upcoming talk.
On Dec 25, 2016, santhukumar wrote: "@manishrjain did anyone pick this up? If it can be done in 2-3 weeks I can take this up."
I have started writing some Python scripts to benchmark Dgraph against Cayley.
I will be using the same Stack Overflow data as suggested in #101.
The work-in-progress code for this benchmark is here: https://github.com/ankurayadav/graphdb-benchmarks.
That's great @ankurayadav. Let us know if you are stuck somewhere and need some help.
I have a couple of questions. I think https://github.com/weinberger/nosql-tests is a good starting point, and I can use that code base for benchmarking. Cayley works with LevelDB, Bolt, PostgreSQL and MongoDB, and its performance also depends on the underlying database. Shall we benchmark Cayley on all four backends? https://www.arangodb.com/2015/10/benchmark-postgresql-mongodb-arangodb/ benchmarks a few metrics. Which metrics are we looking at?
The Stack Overflow archives are very big, and converting them from XML to N-Triples is taking too long.
Does anyone know an efficient way to convert XML to N-Triples?
@santhukumar -- You can use the Freebase film data that we have -- the 21M RDFs. Then issue the various queries documented in our Query Language Spec. If you run them via the go test benchmarking functionality, those numbers should be pretty accurate. Focus on getting the code right, so we can also run it on our own machines and on Amazon instances.
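A minimal sketch of the shape such a go test benchmark could take is shown below. `runQuery` and the toy `filmsByDirector` map are hypothetical stand-ins: a real benchmark would send each query to a running Dgraph or Cayley server and read the full response. `testing.Benchmark` and `testing.Init` are used only so the sketch runs as a plain program; the same `Benchmark...` function could sit in a `_test.go` file and run under `go test -bench .`.

```go
package main

import (
	"fmt"
	"testing"
)

// filmsByDirector is hypothetical toy data standing in for the real graph.
var filmsByDirector = map[string][]string{
	"director-a": {"film-1", "film-2", "film-3"},
}

// runQuery is a hypothetical placeholder: a real benchmark would send the
// query to a running Dgraph or Cayley server and read the full response.
func runQuery(director string) int {
	return len(filmsByDirector[director])
}

// BenchmarkFilmsByDirector follows the go test naming convention, so it could
// also live in a _test.go file and run with `go test -bench .`.
func BenchmarkFilmsByDirector(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if got := runQuery("director-a"); got != 3 {
			b.Fatalf("got %d films, want 3", got)
		}
	}
}

func main() {
	testing.Init() // registers testing flags; needed when calling testing.Benchmark outside `go test`
	res := testing.Benchmark(BenchmarkFilmsByDirector)
	fmt.Println(res.String()) // iteration count and ns/op, as go test reports them
}
```

The testing package picks `b.N` itself, growing it until the benchmark has run for about one second by default, so the ns/op figure is an average over many queries rather than a single timing.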
@ankurayadav -- You can use the 21M RDFs that we have in benchmarks repository instead. They contain freebase film data.
I have added benchmark tests for loading data into Cayley and Dgraph.
Soon, I will be adding benchmark tests for some queries on both Cayley and Dgraph.
@ankurayadav -- That looks interesting. Let's do the queries as well and then we can try a couple more backends on Cayley.
I think you'll have more interesting queries with the 21M RDF data than with the 30k.
I was not able to load the full 21M RDF data into Dgraph on my MacBook Pro:
. . , err: context deadline exceeded
BenchmarkImportDataToDB-4 1 365186762389 ns/op
PASS
ok _/Users/ankuryadav/dev/benchmark/graphdb-benchmark/dgraph 365.200s
For Cayley, I also got this error:
cs 0x7 fs 0x0 gs 0x0
*** Test killed with quit: ran too long (1h1m0s).
FAIL _/Users/ankuryadav/dev/benchmark/graphdb-benchmark/cayley 3660.118s
@ankurayadav How are you trying to load the RDFs into Dgraph? Are you using the dgraphloader or something else?
@pawanrawal thanks for reaching out to help.
I have successfully loaded the 21 million RDFs into Dgraph and Cayley.
These errors were caused by the test timeout. I had to increase the test timeout to 4h to get this working, because Cayley took around 2.5h to load the whole dataset.
I have learnt some basic Gremlin and have tried it out on Cayley.
I will soon come up with benchmarks for a few queries.
That's great, @ankurayadav. Looking forward to some benchmarks. How much time did it take to load the data on Cayley vs Dgraph?
For Cayley, it took around 2h 22m to load the full data.
For Dgraph, it took around 15m to load the full data.
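For a rough sense of scale, the two load times can be turned into a ratio and an average ingest rate. This is back-of-the-envelope arithmetic only: it treats "21M" as exactly 21 × 10^6 RDFs and the reported wall-clock times as exact, and it says nothing about hardware or configuration.

```latex
\begin{aligned}
t_{\text{Cayley}} &= 2\,\text{h}\,22\,\text{min} = 8520\ \text{s},
\qquad t_{\text{Dgraph}} = 15\,\text{min} = 900\ \text{s},\\
\frac{t_{\text{Cayley}}}{t_{\text{Dgraph}}} &= \frac{8520}{900} \approx 9.5,\\
r_{\text{Cayley}} &\approx \frac{21\times 10^{6}}{8520\ \text{s}} \approx 2.5\times 10^{3}\ \text{RDF/s},
\qquad r_{\text{Dgraph}} \approx \frac{21\times 10^{6}}{900\ \text{s}} \approx 2.3\times 10^{4}\ \text{RDF/s}.
\end{aligned}
```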
I have benchmarked a single query on both Cayley and Dgraph, namely:
a query to find all movies directed by "Steven Spielberg", along with their genres.
Results are available here.
https://github.com/ankurayadav/graphdb-benchmarks#results-of-queries-benchmark
Hey @ankurayadav, this looks really interesting. Can you add a few more queries? You can pick the queries from https://wiki.dgraph.io/Get_Started.
Also: Maybe it would be worth reaching out to Cayley folks to check if boltdb is their fastest persistent backend, and if the query you produced for Cayley is optimized. We want to ensure that we provide the best possible outcome for Cayley.
@manishrjain @ankurayadav We know that the current Bolt implementation is not optimal, if not outright slow. We are working on an alternative implementation which is already much faster.
Also, it's not a good idea to compare Gremlin, which runs in a full JS VM, against native query performance. We have an experimental branch with GraphQL support, so it's better to use that instead (once we rebase it on top of our new Bolt implementation).
@dennwc -- When do you expect these changes to be pushed to master?
@manishrjain Two weeks at least. Maybe more. We should verify the new approach carefully.
But we can make an experimental build earlier. It still needs at least a week or two, since I don't have many free cycles right now.
I am eager to try GraphQL on Cayley.
I shall keep an eye on its release!
@ankurayadav -- Seems like we should instead use MQL, which Cayley also supports. See Barak's comment here: https://discourse.cayley.io/t/cayley-vs-dgraph-benchmarks/562/8?u=mrjn
Note that I'd need these results in a week's time for the Go meetup in Sydney, so it would be great if you could put them together. Also, use other queries if Cayley supports them. For example, these: https://wiki.dgraph.io/Get_Started
@manishrjain I will dig into MQL and try to write some more benchmark queries in a few days.
I will try to complete it within a week so that you can use it in the meetup.
@manishrjain I tried to do benchmarks with existing support of MQL in cayley but looks like it doesn't support filters yet.
So, I just benchmarked the two queries I was able to run on Cayley using MQL.
I will redo benchmarking when cayley fully supports MQL.
Results are available here.
https://github.com/ankurayadav/graphdb-benchmarks#results-of-queries-benchmark-1
Is this issue closed? I'm wondering since there has been no update in two months.
If not, what is left to investigate? I'd like to help.
This is done. @ankurayadav has the link to the results. Also, @ankurayadav: you did a write-up about this test. Can you post it on discuss.dgraph.io? That way we can link it here.