Scylla: Support Cassandra 3 sstable format

Created on 28 Dec 2016 · 16 Comments · Source: scylladb/scylla

We should support Cassandra 3's new sstable format, for two reasons:

  1. It has several advantages over the Cassandra 2 sstable format which we currently use, e.g., a smaller on-disk footprint even without compression.
  2. It will help people migrating from Cassandra 3, who could keep using their existing data.
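One technique that, to our understanding, helps the 3.0 format shrink on-disk size is variable-length integer encoding (Cassandra's `VIntCoding`). Small numbers such as lengths and timestamp deltas take one byte instead of a fixed eight, and the number of leading 1-bits in the first byte says how many extra bytes follow. The sketch below follows our reading of that scheme. The function names are hypothetical, and the bit-level details should be checked against the Cassandra source before relying on them.

```python
from typing import Tuple

MAX_VINT_SIZE = 9  # a 64-bit value needs at most 9 bytes in this scheme


def unsigned_vint_size(value: int) -> int:
    """Bytes needed for `value`: each byte adds 7 value bits, capped at 9 bytes."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value must fit in an unsigned 64-bit integer")
    bits = max(value.bit_length(), 1)
    return min((bits + 6) // 7, MAX_VINT_SIZE)


def encode_unsigned_vint(value: int) -> bytes:
    size = unsigned_vint_size(value)
    if size == 1:
        return bytes([value])  # values below 128 are stored as one byte
    extra = size - 1
    out = bytearray(value.to_bytes(size, "big"))
    # Mark the number of extra bytes with `extra` leading 1-bits in the first byte.
    out[0] |= ~(0xFF >> extra) & 0xFF
    return bytes(out)


def decode_unsigned_vint(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes_consumed) for the vint at the start of `data`."""
    first = data[0]
    extra = 0
    while extra < 8 and first & (0x80 >> extra):
        extra += 1
    if len(data) < 1 + extra:
        raise ValueError("truncated vint")
    value = first & (0xFF >> extra)  # strip the length-marker bits
    for b in data[1:1 + extra]:
        value = (value << 8) | b
    return value, 1 + extra
```

For example, 300 encodes to two bytes instead of the eight a fixed-width long would take.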

Other options are possible, like an offline migration tool from Cassandra 3 to ScyllaDB, or we could invent our own, even better, new storage format. But supporting Cassandra 3's sstable format might be both easier and more convenient.

We should probably begin by documenting the Cassandra 3 format - and its differences from the old format - in the Wiki, like we did for the Cassandra 2 sstable format.

We also need to verify whether the format continued to evolve after the release of Cassandra 3.0 and, if it did, which of the 3.* formats we want to support.

We should strive to continue supporting both formats, for both reads and writes. We can keep the existing sstable source files, and add new source files for the new format.
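Keeping both formats side by side amounts to choosing an implementation per sstable version. Here is a minimal Python sketch. It assumes the two-letter version strings Cassandra uses in sstable file names, which we take to be `ka`/`la` for the 2.x family and `ma`/`mb`/`mc` for 3.x. The reader classes are hypothetical stand-ins for the existing source files and the new ones.

```python
from typing import Dict, Type


class SSTableReader:
    """Hypothetical interface shared by both format implementations."""

    def __init__(self, version: str) -> None:
        self.version = version

    def describe(self) -> str:
        raise NotImplementedError


class LegacyFormatReader(SSTableReader):
    """Stands in for the existing Cassandra 2.x sstable code."""

    def describe(self) -> str:
        return f"2.x-format reader for '{self.version}'"


class MFormatReader(SSTableReader):
    """Stands in for new source files handling the Cassandra 3.x format."""

    def describe(self) -> str:
        return f"3.x-format reader for '{self.version}'"


# Versions each implementation accepts (assumed values); extend as formats evolve.
READERS: Dict[str, Type[SSTableReader]] = {
    "ka": LegacyFormatReader,
    "la": LegacyFormatReader,
    "ma": MFormatReader,
    "mb": MFormatReader,
    "mc": MFormatReader,
}


def reader_for(version: str) -> SSTableReader:
    """Pick the implementation for a version, failing loudly on unknown ones."""
    try:
        return READERS[version](version)
    except KeyError:
        raise ValueError(f"unsupported sstable version: {version!r}") from None
```

A table like this keeps the "which 3.* formats do we support" decision in one place. Adding a version is one entry, and an unknown version is rejected instead of being misparsed. A mirrored table of writers could cover the write path.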

cassandra 3.x compatibility high

All 16 comments

Another reason is that COPY FROM does not work correctly yet, which makes migrating from Cassandra 3.x to Scylla difficult.

For migration, having the Scylla sstableloader support the 3.* format is a valid solution.

C* also moved to CRC32 for compression in 3.0 https://issues.apache.org/jira/browse/CASSANDRA-8684

On Sun, Mar 26, 2017 at 02:32:49AM -0700, Tzach Livyatan wrote:
> C* also moved to CRC32 for compression in 3.0 https://issues.apache.org/jira/browse/CASSANDRA-8684

That should be hell of a compression.

--
Gleb.
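For context on CASSANDRA-8684: as we understand it, the change concerns the checksum stored with each compressed chunk, with Adler32 replaced by CRC32. The compression algorithm itself is not affected. The sketch below makes these assumptions:

- zlib stands in for Cassandra's compressors.
- The checksum is a 4-byte big-endian CRC32 computed over the compressed bytes and appended after each chunk.

The chunk layout and helper names are illustrative, not Cassandra's exact on-disk format.

```python
import struct
import zlib
from typing import List


class ChecksumError(Exception):
    """Raised when a stored chunk fails checksum verification."""


def write_chunks(data: bytes, chunk_size: int = 64 * 1024) -> List[bytes]:
    """Compress `data` chunk by chunk, appending a CRC32 of each compressed chunk."""
    out = []
    for start in range(0, len(data), chunk_size):
        compressed = zlib.compress(data[start:start + chunk_size])
        # Before 3.0 this would be zlib.adler32 instead (per CASSANDRA-8684, as we read it).
        checksum = zlib.crc32(compressed) & 0xFFFFFFFF
        out.append(compressed + struct.pack(">I", checksum))
    return out


def read_chunk(stored: bytes) -> bytes:
    """Verify the trailing CRC32 before decompressing the chunk."""
    compressed = stored[:-4]
    (expected,) = struct.unpack(">I", stored[-4:])
    if zlib.crc32(compressed) & 0xFFFFFFFF != expected:
        raise ChecksumError("checksum mismatch: corrupted chunk")
    return zlib.decompress(compressed)
```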

Hey there!
Is it possible to migrate from Cassa 3 to Scylla for now?

It's possible via sstableloader.

The first bytes of native support for the 3.0 format are working, but it will
take ~3 months to release them.
As Avi said, for the time being, upload your data to Scylla using our
sstableloader.

On Mon, Apr 9, 2018 at 6:10 AM, Avi Kivity <notifications@github.com> wrote:
> It's possible via sstableloader.

Hi everyone
can i use sstableloader to migrate data from cassandra 3.1 or only from 3.0?

Should work on any 3.x

Cassandra has three minor versions of the 3.0 SSTable format, labeled ma, mb and mc;
they differ only in some additional commitlog positions stored in Statistics.db.
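To check which of these versions a Cassandra node actually wrote, you can read the version field in the sstable file names. The minimal sketch below assumes two naming layouts:

- The 2.2+/3.x layout `<version>-<generation>-<format>-<component>`, for example `mc-1-big-Data.db`.
- The older 2.1 layout, which prefixes keyspace and table, for example `<keyspace>-<table>-ka-1-Data.db`.

The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Optional

# 2.2+/3.x style, e.g. "mc-1-big-Data.db"
NEW_STYLE = re.compile(r"(?P<version>[a-z]{2})-(?P<generation>\d+)-(?P<format>[a-z]+)-Data\.db")
# 2.1 style, e.g. "ks-users-ka-1-Data.db"
OLD_STYLE = re.compile(r"[^-]+-[^-]+-(?P<version>[a-z]{2})-(?P<generation>\d+)-Data\.db")


def sstable_version(filename: str) -> Optional[str]:
    """Return the two-letter format version of a *-Data.db file name, or None."""
    for pattern in (NEW_STYLE, OLD_STYLE):
        match = pattern.fullmatch(filename)
        if match:
            return match.group("version")
    return None


def count_versions(table_dir: str) -> Counter:
    """Count Data.db components per format version in one table directory."""
    versions = (sstable_version(p.name) for p in Path(table_dir).iterdir())
    return Counter(v for v in versions if v is not None)
```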

On Tue, Apr 10, 2018 at 3:11 AM, Roman <notifications@github.com> wrote:
> Hi everyone
> can i use sstableloader to migrate data from cassandra 3.1 or only from
> 3.0?

@dorlaor Ok thnx
can you help me please?
when I try to use sstableloader on the Cassandra host, I get this on ScyllaDB:

Apr 10 10:51:18 ip-10-2-23-19.ec2.internal scylla[13344]:  [shard 0] rpc - client 10.2.23.241: wrong protocol magic
Apr 10 10:51:18 ip-10-2-23-19.ec2.internal scylla[13344]:  [shard 0] rpc - client 10.2.23.241: server connection dropped: connection is closed

and this stack trace from sstableloader:

Streaming to the following hosts failed:
[/10.2.23.19]
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:98)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:48)
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
    at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88)
    at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
    at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215)
    at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191)
    at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481)
    at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:571)
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:324)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" org.apache.cassandra.tools.BulkLoadException: java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
    at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:114)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:48)
Caused by: java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:98)
    ... 1 more
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
    at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88)
    at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
    at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215)
    at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191)
    at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481)
    at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:571)
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:324)
    at java.lang.Thread.run(Thread.java:748)
ERROR 08:08:23,344 [Stream #80342a90-3d5f-11e8-8712-311248845468] Streaming error occurred on session with peer 10.2.23.19
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_141]
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_141]
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_141]
    at sun.nio.ch.IOUtil.write(IOUtil.java:51) ~[na:1.8.0_141]
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) ~[na:1.8.0_141]
    at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.doFlush(BufferedDataOutputStreamPlus.java:323) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.flush(BufferedDataOutputStreamPlus.java:331) ~[apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:409) [apache-cassandra-3.11.1.jar:3.11.1]
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:388) [apache-cassandra-3.11.1.jar:3.11.1]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_141]

I'm using the ScyllaDB 2.1.1 AMI (ami-11845b6c).

On Wed, Apr 11, 2018 at 08:12:05AM +0000, Roman wrote:
> Apr 10 10:51:18 ip-10-2-23-19.ec2.internal scylla[13344]: [shard 0] rpc - client 10.2.23.241: wrong protocol magic
> Apr 10 10:51:18 ip-10-2-23-19.ec2.internal scylla[13344]: [shard 0] rpc - client 10.2.23.241: server connection dropped: connection is closed

This means that something other than Scylla is trying to connect to port 7000.
Is it possible that this IP is still listed as a Cassandra peer in some other
cluster? We have also often seen this caused by port scanners when the node
happens to be open to the Internet.

--
Gleb.
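Gleb's diagnosis rests on how the receiving server handles new connections. It checks the first bytes of each one against a protocol-specific magic value and drops the connection on a mismatch. Below is a minimal sketch of that check. The two magic constants are assumptions for illustration: we believe Seastar's RPC layer begins with the 8-byte string `SSTARRPC` and Cassandra's internode messaging with the 4-byte value `0xCA552DFA`, but verify both against the respective sources. The function name is hypothetical.

```python
import struct

# Assumed magic values; verify against the Seastar and Cassandra sources.
SCYLLA_RPC_MAGIC = b"SSTARRPC"
CASSANDRA_PROTOCOL_MAGIC = struct.pack(">I", 0xCA552DFA)


def check_handshake(first_bytes: bytes, expected: bytes = SCYLLA_RPC_MAGIC) -> str:
    """Classify the first bytes a client sent on a new internode connection."""
    if first_bytes[:len(expected)] == expected:
        return "ok"
    if first_bytes.startswith(CASSANDRA_PROTOCOL_MAGIC):
        # A Cassandra node or tool talking to a Scylla port: the server
        # would log "wrong protocol magic" and drop the connection.
        return "wrong protocol magic (looks like a Cassandra client)"
    # Anything else, e.g. a port scanner probing port 7000.
    return "wrong protocol magic"
```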

You are running the native Cassandra loader tool. Scylla does not support
it, because Scylla uses a different protocol for streaming data.

The tool you want is the "sstableloader" in the Scylla Java tools
package.

On Wed, Apr 11, 2018 at 08:43:44AM +0000, Calle Wilund wrote:
> You are running a native cassandra loader tool. Scylla does not support
> this, as it uses a different protocol for streaming data.

Ah, if the tool connects to port 7000, that explains the rpc message too.

--
Gleb.

@elcallio thnx
when I try to use sstableloader from scylla-tools, I get this:

[root@ip-10-2-23-19 centos]# sstableloader -v -d $(hostname -i) /cassadata/data/Kyespace/Table1/
java.lang.NullPointerException
    at com.scylladb.tools.BulkLoader$CQLClient.getCFMetaData(BulkLoader.java:439)
    at com.scylladb.tools.SSTableToCQL.getCFMetaData(SSTableToCQL.java:835)
    at com.scylladb.tools.SSTableToCQL.addFile(SSTableToCQL.java:874)
    at com.scylladb.tools.SSTableToCQL.access$100(SSTableToCQL.java:112)
    at com.scylladb.tools.SSTableToCQL$1.accept(SSTableToCQL.java:852)
    at java.io.File.list(File.java:1161)
    at com.scylladb.tools.SSTableToCQL.openSSTables(SSTableToCQL.java:846)
    at com.scylladb.tools.SSTableToCQL.stream(SSTableToCQL.java:926)
    at com.scylladb.tools.BulkLoader.main(BulkLoader.java:900)

Bad error reporting, but I would guess that the table you are trying to
load is not defined on the target node.

The Scylla sstableloader requires the table to already exist on the
target node(s), so if you are importing from a different cluster, you
should first run the appropriate CREATE TABLE statements on the new
Scylla cluster.
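The loader looks the table up in the target cluster's schema, so the names it derives must match exactly. The minimal sketch below shows one way a loader can derive them. It assumes, as Cassandra's sstableloader does, that the last two components of the directory path are the keyspace and the table. It also assumes a table directory may carry a `-<32 hex digit id>` suffix that is not part of the table name. The helper name is hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Tuple

# Assumed suffix on table directories, e.g. "users-0123...cdef" (32 hex digits).
TABLE_ID_SUFFIX = re.compile(r"-[0-9a-f]{32}$")


def keyspace_and_table(sstable_dir: str) -> Tuple[str, str]:
    """Derive (keyspace, table) from .../<keyspace>/<table>[-<id>]/, preserving case."""
    parts = PurePosixPath(sstable_dir).parts
    if len(parts) < 2:
        raise ValueError(f"expected .../<keyspace>/<table>/, got {sstable_dir!r}")
    keyspace, table_dir = parts[-2], parts[-1]
    return keyspace, TABLE_ID_SUFFIX.sub("", table_dir)
```

For the path in the failing command, `keyspace_and_table("/cassadata/data/Kyespace/Table1/")` returns `("Kyespace", "Table1")`. The case is preserved, so the target cluster needs a keyspace and a table with exactly those names.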

oh, big thanx
the keyspace for this table had an uppercase name;
that was the problem
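The root cause is CQL's identifier rules. Unquoted keyspace and table names are case-insensitive and folded to lowercase, while double-quoted names keep their exact case. A keyspace created as `"Kyespace"` can therefore only be referenced with the quotes; unquoted, `Kyespace` means `kyespace`. Below is a minimal sketch of a helper that quotes a name only when needed. The function name is hypothetical, and it does not handle reserved keywords, which would also need quoting.

```python
import re

# Names that survive lowercase folding unchanged can stay unquoted.
UNQUOTED_OK = re.compile(r"[a-z][a-z0-9_]*")


def cql_identifier(name: str) -> str:
    """Return `name` as a CQL identifier that refers to exactly this case-sensitive name."""
    if UNQUOTED_OK.fullmatch(name):
        return name
    # Double-quote the name; embedded double quotes are escaped by doubling them.
    return '"' + name.replace('"', '""') + '"'
```

For example, `cql_identifier("Kyespace") + "." + cql_identifier("Table1")` yields `"Kyespace"."Table1"`, while an all-lowercase name is returned unchanged.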
