Scylla: Add support for SASI

Created on 23 Mar 2017 · 11 Comments · Source: scylladb/scylla

SSTable Attached Secondary Indexes (SASI) implement three types of indexes: PREFIX, CONTAINS, and SPARSE.

create index examples (from http://www.doanduyhai.com/blog/?p=2058#what_is_sasi)

// Full text search on albums title
CREATE CUSTOM INDEX albums_title_idx ON music.albums(title) 
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'mode': 'CONTAINS',
    'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
    'tokenization_enable_stemming': 'true',
    'tokenization_locale': 'en',
    'tokenization_skip_stop_words': 'true',
    'analyzed': 'true',
    'tokenization_normalize_lowercase': 'true'
};

// Full text search on artist name with neither Tokenization nor case sensitivity
CREATE CUSTOM INDEX albums_artist_idx ON music.albums(artist) 
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
     'mode': 'PREFIX', 
     'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
     'case_sensitive': 'false'
};
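
To make the index options above more concrete, here is a rough Python sketch of what a tokenizing analyzer and a non-tokenizing analyzer do to a column value before it is indexed. This is not the Cassandra or Scylla implementation: the function names, the stop-word list, and the regex tokenizer are assumptions made for illustration, and stemming is deliberately left out.

```python
import re

# Tiny illustrative stop-word list (an assumption; real analyzers ship larger ones).
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def analyze_tokenizing(text, lowercase=True, skip_stop_words=True):
    """Roughly mimics 'tokenization_normalize_lowercase' and
    'tokenization_skip_stop_words': split into words, then filter."""
    if lowercase:
        text = text.lower()
    tokens = re.findall(r"\w+", text)
    if skip_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens


def analyze_non_tokenizing(text, case_sensitive=False):
    """Roughly mimics a non-tokenizing analyzer with 'case_sensitive': 'false':
    the whole value stays one term, only the case is normalized."""
    return [text if case_sensitive else text.lower()]


print(analyze_tokenizing("The Dark Side of the Moon"))  # ['dark', 'side', 'moon']
print(analyze_non_tokenizing("Pink Floyd"))             # ['pink floyd']
```

The first function yields several searchable terms per title, which is what makes word-level matching on the title column possible. The second keeps the artist name as a single term, so only the whole value (or a prefix of it) can be matched.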

select examples

SELECT * FROM team WHERE name = 'TZACH';
SELECT * FROM team WHERE name LIKE 't%';
SELECT * FROM team WHERE name LIKE '%t%';
SELECT * FROM team WHERE name LIKE '%t%' ALLOW FILTERING;
SELECT * FROM team WHERE name LIKE '%T%';
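
The LIKE patterns in the select examples fall into a few shapes: no wildcard, trailing '%' (prefix), leading '%' (suffix), and '%' on both sides (contains). The Python sketch below classifies a pattern and evaluates it against a value in memory. It only illustrates the semantics being requested; the names classify_like and like_matches are hypothetical and are not part of any Scylla or Cassandra API.

```python
from enum import Enum
from typing import Tuple


class LikeKind(Enum):
    EXACT = "exact"        # 'abc'
    PREFIX = "prefix"      # 'abc%'
    SUFFIX = "suffix"      # '%abc'
    CONTAINS = "contains"  # '%abc%'


def classify_like(pattern: str) -> Tuple[LikeKind, str]:
    """Split a LIKE pattern into (kind, literal). Only a leading and/or
    trailing '%' is treated as a wildcard in this sketch."""
    starts = pattern.startswith("%")
    ends = len(pattern) > 1 and pattern.endswith("%")
    literal = pattern[(1 if starts else 0):len(pattern) - (1 if ends else 0)]
    if not literal:
        raise ValueError("LIKE pattern needs at least one literal character")
    if starts and ends:
        return LikeKind.CONTAINS, literal
    if ends:
        return LikeKind.PREFIX, literal
    if starts:
        return LikeKind.SUFFIX, literal
    return LikeKind.EXACT, literal


def like_matches(value: str, pattern: str, case_sensitive: bool = True) -> bool:
    """Evaluate the pattern against a single value."""
    kind, literal = classify_like(pattern)
    if not case_sensitive:
        value, literal = value.lower(), literal.lower()
    if kind is LikeKind.PREFIX:
        return value.startswith(literal)
    if kind is LikeKind.SUFFIX:
        return value.endswith(literal)
    if kind is LikeKind.CONTAINS:
        return literal in value
    return value == literal


print(like_matches("TZACH", "t%"))                        # False
print(like_matches("TZACH", "t%", case_sensitive=False))  # True
print(like_matches("TZACH", "%T%"))                       # True
```

With case-sensitive matching, 't%' does not match 'TZACH' while '%T%' does, which is why the examples list both the lowercase and uppercase variants.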
CQL cassandra 3.x compatibility enhancement

All 11 comments

related Apache C* tickets:
Add SASI https://issues.apache.org/jira/browse/CASSANDRA-10661
Enable SASI index for static columns https://issues.apache.org/jira/browse/CASSANDRA-11183

The problem is that there is no patch, only a GitHub issue and a matching
implementation in C*.
Once there is a patch, we'll be happy to merge it.

On Sun, Jun 24, 2018 at 12:49 AM, Adam Ning notifications@github.com
wrote:

I wonder whether there is any possibility of merging this patch? This would
be the exact feature we need.

SASI indexes are likely to become retroactively experimental.

Any updates on implementing SASI indexes into ScyllaDB?

With the current secondary indexes, there is no way to use the LIKE operator in a WHERE clause, nor is it possible to filter a timestamp index by range.
Only exact values with the = operator are allowed, which is pretty rigid!

My company is considering Scylla as an alternative to DataStax Cassandra, but this problem with indexes is holding us back.

Btw, documentation on indexes seems rather outdated (2016?) and scarce.

CREATE TABLE audit (
    action_id uuid,
    username text,
    time timestamp,
    arguments text,
    info text,
    ip text,
    roles text,
    status text,
    PRIMARY KEY ((action_id, username), time)
) WITH CLUSTERING ORDER BY (time DESC);

CREATE INDEX ON audit (username);
CREATE INDEX ON audit (time);

SELECT * FROM audit WHERE username LIKE '%admin';

SyntaxException: line 1:26 no viable alternative at input 'username'

SELECT * FROM audit WHERE time >= '2019-04-01' and time < '2019-04-17';

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
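
One way to read the error above: the query restricts the clustering column time but does not name a partition key, so the server cannot go straight to one partition. The Python sketch below models that with a plain dictionary. It is a simplified, hypothetical in-memory stand-in for the audit table (rows are kept in ascending time order for simplicity, unlike the DESC clustering order in the schema), not a description of how Scylla stores or indexes data.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical data: partition key (action_id, username) -> rows sorted by time.
partitions = {
    ("a1", "admin"): [(datetime(2019, 3, 30), "login"), (datetime(2019, 4, 5), "update")],
    ("a2", "guest"): [(datetime(2019, 4, 10), "login"), (datetime(2019, 4, 20), "logout")],
}


def range_in_partition(rows, start, end):
    """Rows with start <= time < end inside one sorted partition.
    A 1-tuple such as (start,) sorts before any (start, x) row, so
    bisect_left finds the first row whose time is >= the bound."""
    return rows[bisect_left(rows, (start,)):bisect_left(rows, (end,))]


def range_across_all(all_partitions, start, end):
    """Without a partition key, every partition has to be visited."""
    found = []
    for rows in all_partitions.values():
        found.extend(range_in_partition(rows, start, end))
    return found


start, end = datetime(2019, 4, 1), datetime(2019, 4, 17)
print(range_in_partition(partitions[("a1", "admin")], start, end))
print(range_across_all(partitions, start, end))
```

The first call touches a single partition and narrows it with two binary searches. The second has to walk every partition, which is the kind of work the server refuses to do unless ALLOW FILTERING is given.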

Like @micobarac, our team uses the secondary index of C*, integrated with Lucene, for full-text search. We plan to migrate to ScyllaDB; however, the current global secondary index is based on MV, and we expect local secondary index support, as in C* SASI. I notice that the local secondary index feature is not on the schedule.
Will someone join us to develop this feature, or could someone give advice on developing it, since I am not familiar with the ScyllaDB code? @tzach @duarten @dorlaor

Recently we added local (regular) indexes, but without any search
capabilities. SASI is very interesting, but it's still not
on the near-term roadmap (it is on the long-term one).

On Wed, Apr 17, 2019 at 10:03 PM Chris Zhang notifications@github.com
wrote:

As @micobarac https://github.com/micobarac , our team use the secondary
index of C* , by integrating lucene, for full text search. We plan to
migrate to ScyllaDB, however, the current global secondary index is based
on MV, and we expect the local secondary index supporting, as C* SASI. I
notice that the feature of local secondary index is not on the schedule.
Will someone join us to develop this feature, or could someone give advice
for developing, since I am not familiar with codes of scylladb. @tzach
https://github.com/tzach @duarten https://github.com/duarten @dorlaor
https://github.com/dorlaor

Is this actually likely to happen? My guess would be no.

@hdost we (ScyllaDB team) are not planning to work on this soon, mostly because of other more urgent features.
SASI does have a valid use case as an integration point, and we will appreciate PRs

Tzach, I suggest we'll close it, we can always reopen if/when we have a
concrete implementation schedule

On Mon, Apr 19, 2021 at 5:36 AM Tzach Livyatan
wrote:

@hdost https://github.com/hdost we (ScyllaDB team) are not planning to
work on this soon, mostly because of other more urgent features.
SASI does have a valid use case as an integration point, and we will
appreciate PRs

Tzach, I suggest we'll close it, we can always reopen if/when we have a concrete implementation schedule

I vote that we don't close this issue. This issue allows other users who might look for SASI in Scylla to find the issue, and see what we wrote here - i.e., that Scylla indeed doesn't have this feature, and that Tzach and Dor are saying that we're not planning to do it any time soon. This is useful information - I don't think it should be hidden in a closed issue.

Some translated Cassandra unit tests that test the SASI feature:

  • cassandra_tests/validation/entities/secondary_index_test.py::testPrepareStatementsWithLIKEClauses

Note that the current Cassandra documentation says that SASI indexes are "experimental and are not recommended for production use", and they are not enabled by default: a recent Cassandra user will need to enable the "enable_sasi_indexes" option to use them.
