Yugabyte-db: [YCQL] OFFSET is a reserved keyword

Created on 5 Feb 2019  路  10Comments  路  Source: yugabyte/yugabyte-db

Hi,
Any reason why we have OFFSET as a reserved keyword in Yugabyte? (Its not the case with Cassandra). Also, is it possible to change?

Some background -
I am trying to use Yugabyte's Cassandra storage for akka-persistence-cassandra and as a part of it its creating a table
CREATE TABLE IF NOT EXISTS akka.tag_write_progress( persistence_id text, tag text, sequence_nr bigint, tag_pid_sequence_nr bigint, offset timeuuid, PRIMARY KEY (persistence_id, tag));

The above create command is failing since there is a column named offset.

Thanks

communitrequest kinenhancement

Most helpful comment

Thanks @kmuthukk for explaining the performance for offset. Even I was thinking something along option "a" since that gives a more predictable performance. That would also eliminate returning any duplicate results when a record is inserted between two page fetches.

Reg SortedSet, the commands which are missing for akka-persistence-redis are
zremrangebyscore
zrevrangebyscore

Apart from that, I also see a lot of uses for zrangebylex and zrevrangebylex commands as well (Widely used for prefix matching scenarios). So if you could prioritize that too, that would be great.

Thanks,
Srikanth

All 10 comments

cc: @nocaway , @robertpang .

Hi @srikanthm-1 : OFFSET was made keyword as we planned to extend the CQL language to support the starting point of a query in addition to LIMIT. For example, the following query with (offset=10) will skip the first 10 rows and retrieve the next 20 rows. I'll discuss with the team to see if we should remove OFFSET from keyword list.

SELECT * FROM a_table OFFSET 10 LIMIT 20;

Hi @srikanthm-1 :

  • OFFSET has to be kept as keyword to support OFFSET clause in YugaByte as aforementioned.
  • A workaround would be using double quote around column name, i.e. "offset". If you're writing a program such as JAVA program, you might need escape character before the double quote.
CREATE TABLE tab (user_id int primary key, "offset" timeuuid);

Use escape character when needed.

session.execute("INSERT INTO tab(user_id, \"offset\") VALUES(1, now());");

Reopen issue so that we can revisit this in the near future.

@srikanthm-1

We'll try to see if we can relax the restriction from OFFSET being a reserved keyword to a regular keyword.

Currently, unlike Apache Cassandra, YB's YCQL supports things like .. WHERE x = 10 OFFSET 5 LIMIT 5;. So along the lines of Postgres we made it a reserved keyword.

Fixing that would require some hackery in the compiler grammar to parse the various cases without ambiguity, correctly. Cases like these where offset is a table or column name:

SELECT * from offset offset 5 limit 5;
or
SELECT * from T where 10=offset offset 5 limit 5;
need to be handled carefully.

But we realized that existing apps (or frameworks) using Apache Cassandra won't work transparently with YugaByte YCQL without this support. So will look into fixing this.

Is this a blocker for you now? Are you able to use the workaround suggested or rename the column name in the version of akka-persistence-cassandra you are using?

Thanks @kmuthukk, having full compatibility with CQL will really help with our existing apps seamlessly move to YugabyteDB.

I am not currently blocked by this, we are currently using the akka-persistence plugin for redis (We have a sentinel managed redis cluster) and was planning switch that over to Yugabyte's Redis. Unfortunately, the plugin was using a few sorted set commands which are not yet supported. Hence I decided to use Cassandra and ended up with this problem. I can keep my existing set up till the fix is rolled out.

I had a few followup questions
1) Is there a plan to support the full list of existing redis commands? Especially the SortedSet ones since they are widely used?

2) How is the performance of the select statement be affected by the offset? For eg. do we expect to see any performance difference between the following queries?
SELECT * from name offset 5 limit 5;
SELECT * from name offset 1000000 limit 5;

P.S. Great product BTW. Making redis work as a distributed KV store in CP mode seems really promising to start using redis for durable storage needs.

hi @srikanthm-1

1) Firstly, regarding <<< P.S. Great product BTW. Making redis work as a distributed KV store in CP mode seems really promising to start using redis for durable storage needs. >>>

Thank you very much for the positive feedback. That is always very motivating for everyone on the team.

2) Yes - good observation about

SELECT * from name offset 5 limit 5;
vs:
SELECT * from name offset 1000000 limit 5;

The performance of second query will be much worse. We'll have to scan that many rows and skip the unmatched rows. Indexing doesn't quite work well in this case since there might also be WHERE clauses and such to deal with. BTW, this is the case in other databases too (like Oracle or MySQL).

This comes down to different pagination techniques:

  • Ideal way to do pagination is to simply hold on to the statement in a client-session and keep fetching more rows say 500 rows at a time. Internally, the DB can track the last row or key where things were left off and continue from there. This works just fine in YCQL too.

  • Sometimes pagination state needs to be returned all the way to say the browser (like in a three tier app), and the next request may not come to the same middle-tier server/process. In this case, you have two choices:

a) use the last key from the previous set of results, and issue the next query something like:

SELECT * FROM table where k > 'last_key' LIMIT 10;

b) But in some use cases, folks prefer to do OFFSET M LIMIT 10 for simplicity, rather than performance. This is Ok as long as the OFFSETs are small, but not advisable for large offsets.

3) Regarding REDIS command support, current list is here:

https://docs.yugabyte.com/latest/api/redis/

For sorted-sets per se, here's the subset of commands:

ZADD | Add a sorted set entry
ZCARD | Get cardinality of a sorted set
ZRANGE | Retrieve sorted set entries for given index range
ZRANGEBYSCORE | Retrieve sorted set entries for a given score range
ZREM | Delete a sorted set entry
ZREVRANGE | Retrieve sorted set entries for given index range ordered from highest to lowest score

Can you let us know what other sorted set commands the akka-persistence plugin for redis was using? We can track that, and try to prioritize it best to the extent possible.

Thanks @kmuthukk for explaining the performance for offset. Even I was thinking something along option "a" since that gives a more predictable performance. That would also eliminate returning any duplicate results when a record is inserted between two page fetches.

Reg SortedSet, the commands which are missing for akka-persistence-redis are
zremrangebyscore
zrevrangebyscore

Apart from that, I also see a lot of uses for zrangebylex and zrevrangebylex commands as well (Widely used for prefix matching scenarios). So if you could prioritize that too, that would be great.

Thanks,
Srikanth

Hi @srikanthm-1,

Could you please open a separate github issue against us for this? Something like:

Support more sorted set commands in YEDIS
And in the details you could add exactly what you had mentioned above:

Reg SortedSet, the commands which are missing for akka-persistence-redis are
zremrangebyscore
zrevrangebyscore

Apart from that, I also see a lot of uses for zrangebylex and zrevrangebylex commands as well (Widely used for prefix matching scenarios). So if you could prioritize that too, that would be great.

Was this page helpful?
0 / 5 - 0 ratings