Scylla: Paging on Secondary Index does not end (NodeJS: ReadableStream never ends)

Created on 17 Jun 2019  路  16Comments  路  Source: scylladb/scylla

Problem

I'm using the npm cassandra client to make a select query using the stream method but the end event is never being triggered. This only happens in large collections where pagination is actually necessary. You won't see this behaviour for small sets.

Installation details

I'm running scylla locally in docker with

docker run --name scylla -d scylladb/scylla

Scylla version (or git commit hash): 3.0.5-0.20190414.403f66eca

Example:

DDL

CREATE KEYSPACE my_keyspace WITH replication = { 'class': 'SimpleStrategy', 'replication_factor' : 1 };

USE my_keyspace;

CREATE TABLE events
(
    eventId UUID,
    userId INT,
    payload TEXT,
    ackeddevices SET<TEXT>,
    PRIMARY KEY(eventId)
);

CREATE INDEX ON events(userId);

Code

const stream = scylladb.stream('SELECT payload, ackeddevices FROM events WHERE userId=?', [userId], { prepare: true });

stream
  .on('readable', function () {
    let data;
    while (data = this.read()) {
      // Doing something with data here
    }
  })
  .on('end', () => console.log('Ended'))
  .on('error', (err) => console.log(`Some error: ${err}`));

I've confirmed:

  • The 'Ended' is never logged when using Scylla (neither is 'Some error', which isn't an issue, just making sure some error wasn't being triggered).
  • the same code works using a Cassandra image (docker run cassandra:latest)
  • the stream.readable property is never set to false with Scylla, even seconds after the query is finished
  • the stream paging is working with both Cassandra and Scylla, it is only that the end event is not being triggered with the Scylla image

I can provide a dml.cql with a lot of inserts if it helps.

User Request bug sec-index

Most helpful comment

@thales-gaddini before being merged into master, it needs to pass the gating tests. If everything goes smoothly, then it may be a couple of hours, if not, then until I fix the bugs and resend :)

All 16 comments

@denesb please comment

@thales-gaddini please do provide the dml.cql

It will also help if you can provide a tcp dump for the interaction with cassandra and the interaction with scylla capturing port 9042

(the info for no next page should be visible on the last response - and it will help to validate that on the wire its marked correctly flag: Has_more_pages https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec#L552)

Hi @slivne , the dml is attached as a zip (just a really big file with a lot of inserts).

I've also implemented the paging with the eachRow method as describe here and noticed that the nextPage and pageState from the result are never null.

This is an example of the ResultSet returned in the eachRow callback with Cassandra when there are still pages to be retrieved:

{
  result: ResultSet {
    info: {
      queriedHost: '172.17.0.2:9042',
      triedHosts: [Object],
      speculativeExecutions: 0,
      achievedConsistency: 10,
      traceId: undefined,
      warnings: undefined,
      customPayload: undefined,
      isSchemaInAgreement: true
    },
    rows: undefined,
    rowLength: 20000,
    columns: [ [Object], [Object] ],
    pageState: '10ee2f4e7f73e043bea633a78769f5dfe300f07fffb1df00',
    nextPage: [Function: nextPage]
  }
}

This is an example of the ResultSet returned in the eachRow callback result with Cassandra when there are no more pages to be retrieved:

{
  result: ResultSet {
    info: {
      queriedHost: '172.17.0.2:9042',
      triedHosts: [Object],
      speculativeExecutions: 0,
      achievedConsistency: 10,
      traceId: undefined,
      warnings: undefined,
      customPayload: undefined,
      isSchemaInAgreement: true
    },
    rows: [],
    rowLength: 20000,
    columns: [ [Object], [Object] ],
    pageState: null,
    nextPage: undefined
  }
}

The ResultSet with Scylla never reaches the pageState: null and nextPage: undefined status, making infinite requests. It makes sense to me that the stream uses the same ResultSet under the hood and that's why it never ends. Two consecutive ResultSets when there where no more rows to be retrieved from Scylla:

{
  result: ResultSet {
    info: {
      queriedHost: '172.17.0.2:9042',
      triedHosts: [Object],
      speculativeExecutions: 0,
      achievedConsistency: 10,
      traceId: undefined,
      warnings: undefined,
      customPayload: undefined,
      isSchemaInAgreement: true
    },
    rows: [],
    rowLength: 20000,
    columns: [ [Object], [Object] ],
    pageState: '00000000a40000001000000001000000040000000000271001280000000200000008000000fffc0480a1e1a69410000000c50f3598f6304f7baa73526537dc724cdfb1ff7fee47a20ac473782b0f0c912bde39858e010000003900000001190000001400000001000000080000003aaa8898163098470101190000001400000001000000080000003aaa8898163098470101010000006d41db007affc3b5d4340de9fa2b2a940100',
    nextPage: [Function: nextPage]
  }
}
{
  result: ResultSet {
    info: {
      queriedHost: '172.17.0.2:9042',
      triedHosts: [Object],
      speculativeExecutions: 0,
      achievedConsistency: 10,
      traceId: undefined,
      warnings: undefined,
      customPayload: undefined,
      isSchemaInAgreement: true
    },
    rows: [],
    rowLength: 20000,
    columns: [ [Object], [Object] ],
    pageState: '00000000a40000001000000001000000040000000000271001280000000200000008000000fffc0480a1e1a69410000000c50f3598f6304f7baa73526537dc724cdfb1ff7fee47a20ac473782b0f0c912bde39858e010000003900000001190000001400000001000000080000003aaa8898163098470101190000001400000001000000080000003aaa8898163098470101010000006d41db007affc3b5d4340de9fa2b2a940100',
    nextPage: [Function: nextPage]
  }
}



md5-da83befbfbb41cb36266ab50f3929bb4



sudo tcpdump port 9042 -w scylla.pcap
sudo tcpdump port 10000 -w scylla.pcap
sudo tcpdump host 172.17.0.2 and port 9042 -w scylla.pcap
sudo tcpdump host 172.17.0.2 and port 10000 -w scylla.pcap

But none of them had any output. I've also tried to just tcpdump everything (without specifying a host or port), but nothing related to Scylla was logged. Should I tcpdump inside the Docker container?

tcpdumping from docker worked. Here is a zip with the pcap files from both Scylla and Cassandra.

As you said, the Has_more_pages flag is true in Scylla even though there are no more records (with a lot of the same results), as can be seen below (one of the many results )

results

scylla

Has_more_pages is false in the final result in the Cassandra dump.

cassandra

@thales-gaddini thanks (and extra points for doing the initial analysis - nice going)

There is one difference between cassandra and scylla when it comes to paging.

  • Cassandra will not return empty pages and will attempt to fill up pages to the number of rows requested
  • Scylla will cap page size by the number of rows requested or 1 MB which ever is reached first and may also return empty pages in some cases. In those cases the paging state returned should change between the pages

I am not sure the paging state is changing correctly between pages - yet will dig further after trying to reproduce this using the sample you shared.

So trying your schema + dml.cql on a clean 3.0.5 docker container with python I can't reproduce

from cassandra.cluster import Cluster

cluster = Cluster(["172.17.0.3"])
session = cluster.connect()
p = session.prepare('SELECT payload, ackeddevices from my_keyspace.events where userid=?')
rows = session.execute(p,[10000])
for user_row in rows:
    print ".",

next the dump :)

Yet a closer look it does reproduce the issue

last_page

the last page has the flag Has_next_page set to True

issue_4569.cap.gz

Is the paging state changing between different iterations?

The driver should have continued fetching. I don't understand @slivne's capture.

The paging state does not change

frame 17623 result paging state
frame_17623.paging_state.txt

frame 17624 next page request (passes the paging info)
frame_17624.e_paging_state.txt

frame 17625 result paging state - the same as 17623
frame_17625.paging_state.txt

@avikivity I don;t understand the driver as well not fetching more pages - the program is the one I provided above

so the query returns a total of 20000 rows split into 5000 per page

I changed the pagesize from being 5000 to being 4999

So scylla can return 4999 and the last page not a full page (4 results)

in this case the last page Has_more_pages is marked as FALSE

issue_4569_2_page_4999.cap.gz

issue_4569_page_4999

So it seems the issue is with the last page and the next query that returns no results returns back the paging information while it shouldn't

another oddity using the 5000 row page size , when I change the code and not use prepared statements it does not reproduce as well

on docker 3.0.7

this has the problem e.g. the last page is an empty page and has Has_more_pages : True

from cassandra.cluster import Cluster

cluster = Cluster(["172.17.0.4"])
session = cluster.connect()
p = session.prepare('SELECT payload, ackeddevices from my_keyspace.events where userid=?')
rows = session.execute(p,[10000])
for user_row in rows:
    print ".",

this does not e.g. the last page is an empty page with Has_more_pages set to False

from cassandra.cluster import Cluster

cluster = Cluster(["172.17.0.4"])
session = cluster.connect()
rows = session.execute("SELECT payload, ackeddevices from my_keyspace.events where userid=10000")
for user_row in rows:
    print ".",

Hey guys, thanks for the fix. When does it go from next to master?

@thales-gaddini before being merged into master, it needs to pass the gating tests. If everything goes smoothly, then it may be a couple of hours, if not, then until I fix the bugs and resend :)

Backported to 3.0 and 3.1, removing from backport candidates.

Was this page helpful?
0 / 5 - 0 ratings