I know that full Thrift support was not planned for GA, but I did some experimentation to see if Titan would work with the current level of functionality. I have attached a test system/titan keyspace [1] that you can use to reproduce this behavior. Titan creates its keyspace automatically when you start it up for the first time; the attached system and titan keyspaces are the result of doing this with Titan 1.0 and Cassandra 2.1.9. Scylla starts up OK after these are copied into /var/lib/scylla/data, but the first access attempts that Titan makes throw unimplemented_exceptions [2]. I also tried taking an existing Titan setup and migrating it over to Scylla, but apparently Titan still uses unsupported Thrift operations beyond the keyspace creation phase.
[1] [titanKeyspace.zip](https://github.com/scylladb/scylla/files/70279/titanKeyspace.zip)
[2] https://github.com/scylladb/scylla/blob/1991fd5ca2dc00803f515503df894aadbc0979d5/thrift/handler.cc#L47
@twilmes thanks for the initial effort.
Integrating with Titan is very interesting for ScyllaDB.
We do plan to complete Thrift after GA, which is getting closer.
Thanks @tzach. Let me know if you guys would like any help testing things out when they're a bit further along. In the meantime, I'll monitor the progress.
@twilmes thanks for testing and for keeping an eye on the Titan integration.
The Titan/Scylla combination is very interesting for us.
You can watch #399 for updates on Thrift support, and I will reach out to you with any updates. Help in testing will be appreciated!
Now Thrift is done, would it be possible to test this again?
Exactly! We tested KairosDB, which works fine, and Presto (with a tiny issue whose fix was already committed). We plan to test Titan, and you're welcome to be the first.
Cool, yeah, I just saw the other issue closed out yesterday. Great news. I'll let you all know when I get a chance, if somebody else doesn't beat me to it!
Thanks,
Ted
I've given it a quick test against the latest Scylla code on Fedora 23 from here.
One issue I found is that Scylla isn't throwing an org.apache.cassandra.thrift.InvalidRequestException when Titan attempts to connect to a keyspace that doesn't already exist. Scylla is throwing an org.apache.thrift.TApplicationException instead.
Here's the relevant part of the stack trace:
Caused by: com.thinkaurelius.titan.diskstorage.PermanentBackendException: Permanent failure in storage backend
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.ensureColumnFamilyExists(CassandraThriftStoreManager.java:528)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.ensureColumnFamilyExists(CassandraThriftStoreManager.java:500)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.openDatabase(CassandraThriftStoreManager.java:317)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.openDatabase(CassandraThriftStoreManager.java:55)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.KeyColumnValueStoreManager.openDatabase(KeyColumnValueStoreManager.java:29)
at com.thinkaurelius.titan.diskstorage.Backend.getStandaloneGlobalConfiguration(Backend.java:449)
... 40 more
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Temporary failure in storage backend
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.ensureKeyspaceExists(CassandraThriftStoreManager.java:473)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.ensureColumnFamilyExists(CassandraThriftStoreManager.java:506)
... 45 more
Caused by: org.apache.thrift.TApplicationException: Internal server error: Keyspace 'titan' does not exist
at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.cassandra.thrift.Cassandra$Client.recv_set_keyspace(Cassandra.java:608)
at org.apache.cassandra.thrift.Cassandra$Client.set_keyspace(Cassandra.java:595)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.ensureKeyspaceExists(CassandraThriftStoreManager.java:447)
... 46 more
According to the InvalidRequestException javadoc, "Invalid request could mean keyspace or column family does not exist, required parameters are missing, or a parameter is malformed."
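To make the consequence concrete, here is a minimal Python sketch of a client-side "ensure keyspace" step. It assumes, as the report above implies, that the client treats InvalidRequestException as the signal to create the missing keyspace; the two exception classes and the function name are hypothetical stand-ins, not Titan's or Thrift's actual code.

```python
# Hypothetical stand-ins for the two Thrift exception types named above.
class InvalidRequestException(Exception):
    """Stand-in for org.apache.cassandra.thrift.InvalidRequestException."""


class TApplicationException(Exception):
    """Stand-in for org.apache.thrift.TApplicationException."""


def ensure_keyspace_exists(set_keyspace, create_keyspace, name):
    """Select the keyspace, creating it only if the server reports an invalid request."""
    try:
        set_keyspace(name)
        return "existing"
    except InvalidRequestException:
        # Assumed client behaviour: a missing keyspace is recoverable.
        create_keyspace(name)
        set_keyspace(name)
        return "created"
    # Any other exception type (e.g. TApplicationException) propagates
    # to the caller, so the keyspace is never created.
```

Under that assumption, a server that answers a missing keyspace with a different exception type never reaches the creation branch, which would match the failure in the stack trace.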
I worked around this problem by manually creating a titan keyspace from cqlsh:
CREATE KEYSPACE titan WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
After that, I was able to load Graph of the Gods fine. One caveat is that I had to disable the cache.db-cache settings in $TITAN_HOME/conf/titan-cassandra.properties before the load; otherwise, traversals like g.V().count() and g.V().valueMap(true) returned duplicate results.
This is looking pretty promising!
@pluradj Thanks for the report!
@duarten can you take a look at the two issues reported above?
@tzach Will do.
@pluradj Can you give me some examples of traversals that reported duplicate values? I'm running Scylla from master (with the set_keyspace problem solved), TitanDB 1.0.0 and ES 1.5.0, with conf/titan-cassandra-es.properties and I'm getting correct results so far:
gremlin> g.V(hercules).out('battled').valueMap(true)
==>[name:[nemean], label:monster, id:4176]
==>[name:[hydra], label:monster, id:4320]
==>[name:[cerberus], label:monster, id:8272]
gremlin> g.V(hercules).out('battled').count()
==>3
I haven't updated from master. Here's the gremlin session:
gremlin> graph = TitanFactory.open('conf/titan-cassandra.properties')
==>standardtitangraph[cassandrathrift:[127.0.0.1]]
gremlin> GraphOfTheGodsFactory.load(graph, null, false)
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[standardtitangraph[cassandrathrift:[127.0.0.1]], standard]
gremlin> [ vertices: g.V().count().next(), edges: g.E().count().next() ]
17:10:39 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [()]. For better performance, use indexes
17:10:39 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>vertices=24
==>edges=33
gremlin> g.V().valueMap(true)
17:10:44 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>[label:location, name:[sea], id:4176]
==>[label:god, name:[neptune], id:4192, age:[4500]]
==>[label:monster, name:[nemean], id:4256]
==>[label:location, name:[sky], id:4272]
==>[label:god, name:[jupiter], id:8368, age:[5000]]
==>[label:god, name:[pluto], id:12464, age:[4000]]
==>[label:location, name:[tartarus], id:16560]
==>[label:demigod, name:[hercules], id:4296, age:[30]]
==>[label:titan, name:[saturn], id:4312, age:[10000]]
==>[label:human, name:[alcmene], id:8408, age:[45]]
==>[label:monster, name:[hydra], id:4328]
==>[label:monster, name:[cerberus], id:8424]
==>[label:location, name:[sea], id:4176]
==>[label:god, name:[neptune], id:4192, age:[4500]]
==>[label:monster, name:[nemean], id:4256]
==>[label:location, name:[sky], id:4272]
==>[label:god, name:[jupiter], id:8368, age:[5000]]
==>[label:god, name:[pluto], id:12464, age:[4000]]
==>[label:location, name:[tartarus], id:16560]
==>[label:demigod, name:[hercules], id:4296, age:[30]]
==>[label:titan, name:[saturn], id:4312, age:[10000]]
==>[label:human, name:[alcmene], id:8408, age:[45]]
==>[label:monster, name:[hydra], id:4328]
==>[label:location, name:[tartarus], id:16560]
gremlin> g.E().valueMap(true)
17:10:47 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>[reason:loves waves, label:lives, id:1zg-38g-9hx-380]
==>[label:brother, id:2do-38g-b2t-6gg]
==>[label:brother, id:2rw-38g-b2t-9m8]
==>[label:father, id:5xy-6gg-6c5-3bs]
==>[reason:loves fresh breezes, label:lives, id:6c6-6gg-9hx-3ao]
==>[label:brother, id:6qe-6gg-b2t-38g]
==>[label:brother, id:74m-6gg-b2t-9m8]
==>[reason:no fear of death, label:lives, id:8ba-9m8-9hx-cs0]
==>[label:pet, id:8pi-9m8-aad-6i0]
==>[label:brother, id:7x2-9m8-b2t-38g]
==>[label:brother, id:7iu-9m8-b2t-6gg]
==>[label:father, id:1zt-3bc-6c5-6gg]
==>[label:mother, id:2e1-3bc-74l-6hk]
==>[label:battled, id:2s9-3bc-7x1-3a8, time:1, place:point[38.1,23.7]]
==>[label:battled, id:36h-3bc-7x1-3c8, place:point[37.7,23.9], time:2]
==>[label:battled, id:3kp-3bc-7x1-6i0, place:point[39.0,22.0], time:12]
==>[label:lives, id:2sd-6i0-9hx-cs0]
==>[reason:loves waves, label:lives, id:1zg-38g-9hx-380]
==>[label:brother, id:2do-38g-b2t-6gg]
==>[label:brother, id:2rw-38g-b2t-9m8]
==>[label:father, id:5xy-6gg-6c5-3bs]
==>[reason:loves fresh breezes, label:lives, id:6c6-6gg-9hx-3ao]
==>[label:brother, id:6qe-6gg-b2t-38g]
==>[label:brother, id:74m-6gg-b2t-9m8]
==>[reason:no fear of death, label:lives, id:8ba-9m8-9hx-cs0]
==>[label:pet, id:8pi-9m8-aad-6i0]
==>[label:brother, id:7x2-9m8-b2t-38g]
==>[label:brother, id:7iu-9m8-b2t-6gg]
==>[label:father, id:1zt-3bc-6c5-6gg]
==>[label:mother, id:2e1-3bc-74l-6hk]
==>[label:battled, id:2s9-3bc-7x1-3a8, time:1, place:point[38.1,23.7]]
==>[label:battled, id:36h-3bc-7x1-3c8, place:point[37.7,23.9], time:2]
==>[label:battled, id:3kp-3bc-7x1-6i0, place:point[39.0,22.0], time:12]
The vertex count should be 12 and the edge count should be 17.
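A quick way to check a result listing like the one above for repeats is to count ids. This is a small Python sketch, not part of Titan; the sample is a handful of vertex ids copied from the session, where id 16560 (tartarus) is listed three times and id 8424 (cerberus) only once.

```python
from collections import Counter


def duplicated_ids(ids):
    """Return a dict mapping each id seen more than once to its number of occurrences."""
    counts = Counter(ids)
    return {i: n for i, n in counts.items() if n > 1}


# A few vertex ids taken from the valueMap output above (not the full listing).
sample = [4176, 4192, 16560, 8424, 4176, 4192, 16560, 16560]

print(len(sample), len(set(sample)))  # rows returned vs. distinct ids
print(duplicated_ids(sample))
```

Comparing the number of rows with the number of distinct ids shows the overcount directly, and the per-id counts show that the repeats are not a clean doubling.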
@pluradj Thanks! I can repro it now. Will investigate.
@pluradj Master now contains fixes for both of those problems. They still aren't in any released package, but they'll be in the next RC.
@duarten Which commit(s) in master are those? I backported 5aaf43d1bcea20be215242ec56f4ff73d9e24e75 to 1.3 branch. Do we need anything else?
Gets a +1 from me too
@penberg The other one is 2be45c4806b38b12add69f305b945ca96d7c1f32; unsure if it was backported.
It's backported. If you run
git fetch origin refs/notes/*:refs/notes/*
you will see a git note on the commit in master indicating that we backported it.
@penberg Ah, good to know! :)
I've confirmed the fixes in my environment :+1:
Now I'm trying to do some Titan OLAP with SparkGraphComputer, and this is the stack trace I'm running into:
gremlin> graph = GraphFactory.open('conf/hadoop/read-scylla.properties'); g = graph.traversal(computer(SparkGraphComputer)); g.V().count()
java.io.IOException: Could not get input splits
Display stack trace? [yN] y
java.lang.IllegalStateException: java.io.IOException: Could not get input splits
at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:82)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:147)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:218)
at org.apache.tinkerpop.gremlin.console.Console$_closure3.doCall(Console.groovy:205)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1019)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:218)
at org.codehaus.groovy.tools.shell.Groovysh.setLastResult(Groovysh.groovy:443)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:218)
at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:187)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:218)
at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:122)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:218)
at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:95)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1210)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:124)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:218)
at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:59)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1210)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:83)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:218)
at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:144)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:218)
at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:305)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Could not get input splits
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:80)
... 48 more
Caused by: java.io.IOException: Could not get input splits
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:203)
at com.thinkaurelius.titan.hadoop.formats.cassandra.CassandraBinaryInputFormat.getSplits(CassandraBinaryInputFormat.java:48)
at com.thinkaurelius.titan.hadoop.formats.util.GiraphInputFormat.getSplits(GiraphInputFormat.java:48)
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:115)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:65)
at org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:47)
at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$28(SparkGraphComputer.java:176)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: failed connecting to all endpoints
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:199)
... 19 more
Caused by: java.io.IOException: failed connecting to all endpoints
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSubSplits(AbstractColumnFamilyInputFormat.java:317)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.access$200(AbstractColumnFamilyInputFormat.java:61)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:236)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:221)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
I'm building the titan11 branch from source, running against a Spark 1.5.2 standalone cluster and Hadoop 2.7.2. Here's the read-scylla.properties.
My guess is that this call to describe_local_ring is failing.
Away from computer at the moment.
"failed connecting to all endpoints " + StringUtils.join(range.endpoints, ","));
The message in the stack trace shows no endpoints after this prefix, which is why I think the range is incorrect.
You're right. We're returning TokenRanges, since we're calling through to getSubSplits, but the rpc_endpoints do seem to be empty. I can't think of anything off the top of my head that could explain this, so I'll investigate tomorrow.
At this point in the code: splitsize=65536, range.rpc_endpoints.size()=0, range.endpoints.size()=0, keyspace=argonaut, cfName=edgestore
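The reasoning above can be reproduced with a minimal Python sketch of the "try every endpoint" pattern. The function name and the connect callback are hypothetical; only the message prefix and the comma join are taken from the line quoted earlier.

```python
def connect_to_any(endpoints, connect):
    """Try each endpoint in turn; if none works, fail with the joined endpoint list."""
    for host in endpoints:
        try:
            return connect(host)
        except OSError:
            continue  # this endpoint failed, try the next one
    # With an empty list the loop body never runs and nothing follows the prefix.
    raise OSError("failed connecting to all endpoints " + ",".join(endpoints))
```

With an empty endpoint list no connection is even attempted, and the error carries only the bare prefix, which is consistent with range.endpoints.size()=0.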
I'm also seeing miscounts on the Grateful Dead graph:
gremlin> graph = TitanFactory.open('conf/titan-scylla.properties')
==>standardtitangraph[cassandrathrift:[10.0.2.15]]
gremlin> graph.io(gryo()).readGraph('data/grateful-dead.kryo')
==>null
gremlin> graph.tx().commit()
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[standardtitangraph[cassandrathrift:[127.0.0.1]], standard]
gremlin> g.V().count()
==>508
gremlin> g.E().count()
==>5131
Expected 808 vertices, 8049 edges.
I opened #1517 and sent a fix for it; this should take care of the empty rpc_endpoints.
Now I'm going to look into the Grateful Dead graph miscounts.
It seems to be working for me:
gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
==>standardtitangraph[cassandrathrift:[127.0.0.1]]
gremlin> graph.io(gryo()).readGraph('data/grateful-dead.kryo')
==>null
gremlin> graph.tx().commit()
==>null
gremlin> g = graph.traversal()
==>graphtraversalsource[standardtitangraph[cassandrathrift:[127.0.0.1]], standard]
gremlin> g.V().count()
11:39:51 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>808
gremlin> g.E().count()
11:39:57 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>8049
gremlin>
What version of Scylla are you on?
666.development-20160729.3531dd8 on Fedora 23. Counts for Grateful Dead are off, regardless of the cache.db-cache setting in titan-cassandra.properties.
Oh, I've managed to repro it. Running scylla with --smp > 1 seems to cause the problem. I always get correct results with --smp 1, but never with --smp 4.
Quick update: I've pushed a patch addressing this pending issue, and it should be merged soon.
Fixes are merged and should be in next 1.3 RC.
Will Scylla be compatible with TitanDB in version 1.3?
@ustczen That's what we're shooting for!
@duarten I'm testing against 666.development-20160804.ad58691 on Fedora 23, single node.
OLTP Graph of the Gods and OLTP Grateful Dead are looking good. :+1:
Using OLAP SparkGraphComputer in local mode (here's the read-scylla.properties I used), I'm getting past the input splits problem previously reported, but the results are off. Changing from spark.master=local[4] to spark.master=local didn't make a difference. Getting 7 vertices, 13 edges on Gods; 363 vertices, 3563 edges on Grateful Dead.
Here's the Gremlin Console session:
gremlin> graph = GraphFactory.open('conf/hadoop-graph/read-scylla.properties'); g = graph.traversal(computer(SparkGraphComputer));
==>graphtraversalsource[hadoopgraph[cassandrainputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().count()
03:05:36 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
==>363
gremlin> g.E().count()
==>3563
@pluradj I'll investigate. Any pointers as to what changes when using SparkGraphComputer?
Found the problem, will push a fix asap.
Patches are in master and 1.3.
Things are looking very good over here with both OLTP and OLAP using 666.development-20160808.700feda on Fedora 23. When is 1.3 supposed to close?
Good to know! There's no exact date, but expect 1.3 to be out real soon.
I'm trying to load a slightly larger data set into a 3-node Scylla cluster with replication factor 1, and I'm getting some incorrect counts. I'm expecting 9962 vertices and 1012657 edges.
With a 3-node Scylla cluster and RF=3 it works OK, and it also works OK with a 1-node Scylla cluster and RF=1.
Hmm, that's strange. Does anything stand out in the nodes' logs?
It might be something with describe_splits_ex: with RF=1, data is spread around the cluster, so that verb should be called on all of the hosts. Did you also try this scenario with Cassandra?
It would also be good to know how data is spread around the hosts, maybe by querying the tables Titan creates via cqlsh.
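As a rough illustration of why RF matters here, the following Python sketch models a simplified ring with one token range per node and no vnodes, where each range is stored on its owner plus the next RF-1 nodes. The function and node names are hypothetical, and the sketch only shows data placement; it does not claim to explain the miscounts.

```python
def replicas(range_index, nodes, rf):
    """Nodes holding a token range: its owner plus the next rf-1 nodes on the ring."""
    n = len(nodes)
    return [nodes[(range_index + k) % n] for k in range(rf)]


def ranges_visible_from(node, nodes, rf):
    """Indexes of the token ranges that have a replica on the given node."""
    return [i for i in range(len(nodes)) if node in replicas(i, nodes, rf)]


nodes = ["node1", "node2", "node3"]
print(ranges_visible_from("node1", nodes, rf=1))  # only node1's own range
print(ranges_visible_from("node1", nodes, rf=3))  # every range
```

In this model a single node holds every range when RF=3, but only its own range when RF=1, so any per-node operation has to reach all hosts to cover the whole data set.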
Trying to fix this using #1616, #1636
Is this issue resolved in ScyllaDB 1.3?
Tested @pluradj's scenario (https://github.com/pluradj/titan-movielens) with Scylla 1.4.1 and verified that the problem still exists.
With RF = 1:
gremlin> graph = TitanFactory.open('conf/titan-cassandra.properties')
==>standardtitangraph[cassandrathrift:[...]]
gremlin> graph.io(gryo()).readGraph('./graphdata/movielens.gryo')
==>null
gremlin> g = graph.traversal()
gremlin> g.V().count()
==>10700
gremlin> g.E().count()
==>1014825
With RF = 3:
gremlin> graph = TitanFactory.open('conf/titan-cassandra.properties')
==>standardtitangraph[cassandrathrift:[...]]
gremlin> graph.io(gryo()).readGraph('./graphdata/movielens.gryo')
==>null
gremlin> g = graph.traversal()
gremlin> g.V().count()
==>9962
gremlin> g.E().count()
==>1012657
@duarten, how about 1.5?
@ustczen Unfortunately no, but it is fixed in 1.6.
@duarten, which commit(s) fix the issue? Can we backport them to 1.5.x?
@penberg Both 6bb875bdb74652ac844f5052e0458e244ef9a1df and 57f49108322e907e0896b5b4db67ca6db7366a86. Not sure if we want to backport them all, since some of the fixes appear in a series with bigger changes, but I can work out the minimal, least intrusive set of commits that fixes things and prepare the patches to be backported.
@duarten Oh, those patch series are definitely too big to backport to 1.5. Thanks for the information, though!