Scylla: LCS ignores min_threshold value

Created on 7 Jun 2021 · 21 Comments · Source: scylladb/scylla

Installation details
Scylla version:
Cluster size: 6
OS: U16

Hardware details
Platform: Azure
Hardware: L16s_v2

Description
LCS ignores the min_threshold value even though compaction_enforce_min_threshold is set to true.

scylla.yaml:

```
# Configured by Ansible Scylla role

# Additional parameters can be edited right here for all-node distribution

cluster_name: "testcluster"
data_file_directories:
    - /var/lib/scylla/data
commitlog_directory: /var/lib/scylla/commitlog
num_tokens: 256
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "172.17.0.4"
listen_address: 172.17.0.9
native_transport_port: 9042
read_request_timeout_in_ms: 5000
write_request_timeout_in_ms: 2000
cas_contention_timeout_in_ms: 1000
endpoint_snitch: GossipingPropertyFileSnitch
rpc_address: 172.17.0.9
broadcast_address: 172.17.0.9
rpc_port: 9160
api_port: 10000
api_address: 127.0.0.1
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
commitlog_total_space_in_mb: -1
murmur3_partitioner_ignore_msb_bits: 12
api_ui_dir: /opt/scylladb/swagger-ui/dist/
api_doc_dir: /opt/scylladb/api/api-doc/
enable_sstables_mc_format: True
auto_snapshot: False
compaction_enforce_min_threshold: true
compaction_static_shares: 100
hinted_handoff_enabled: False
```

```
$ nodetool describecluster
Using /etc/scylla/scylla.yaml as the config file
Cluster Information:
    Name: testcluster
    Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
    DynamicEndPointSnitch: disabled
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        6e1c8e50-f696-331d-aaa6-a87a1fba995d: [172.17.0.4, 172.17.0.8, 172.17.0.5, 172.17.0.7, 172.17.0.9, 172.17.0.6]
```

schema:

```
CREATE KEYSPACE baselines WITH replication = {'class': 'NetworkTopologyStrategy', 'azure_dc': '3'} AND durable_writes = true;

CREATE TABLE baselines.pcf (
    wpid text,
    locale text,
    pcf blob,
    updatedtm timestamp,
    PRIMARY KEY ((wpid, locale))
) WITH bloom_filter_fp_chance = 0.1
    AND caching = {'keys': 'ALL', 'rows_per_partition': '1100000'}
    AND comment = ''
    AND compaction = {'class': 'LeveledCompactionStrategy', 'enabled': 'true', 'max_threshold': '32', 'min_threshold': '32'}
    AND compression = {'chunk_length_in_kb': '8', 'compression_level': '3', 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
```

And we see that Scylla compacts as if min_threshold were 2:

```
Jun 07 21:45:34 ansible-node6 scylla[32583]: [shard 2] compaction - [Compact baselines.pcf aa365450-c7d9-11eb-b0ed-000000000004] Compacted 2 sstables to [/var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-6806-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-6820-big-Data.db:level=2, ]. 339MB to 339MB (~99% of original) in 10585ms = 32MB/s. ~46080 total partitions merged to 46025.
Jun 07 21:45:34 ansible-node6 scylla[32583]: [shard 2] compaction - [Compact baselines.pcf b09aae90-c7d9-11eb-b0ed-000000000004] Compacting [/var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-3432-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-4440-big-Data.db:level=1, ]
Jun 07 21:45:46 ansible-node6 scylla[32583]: [shard 2] compaction - [Compact baselines.pcf b09aae90-c7d9-11eb-b0ed-000000000004] Compacted 2 sstables to [/var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-6834-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-6848-big-Data.db:level=2, ]. 339MB to 339MB (~99% of original) in 11797ms = 28MB/s. ~46080 total partitions merged to 46025.
Jun 07 21:45:46 ansible-node6 scylla[32583]: [shard 2] compaction - [Compact baselines.pcf b7b894d0-c7d9-11eb-b0ed-000000000004] Compacting [/var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-3474-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-3446-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-4468-big-Data.db:level=1, ]
Jun 07 21:46:04 ansible-node6 scylla[32583]: [shard 2] compaction - [Compact baselines.pcf b7b894d0-c7d9-11eb-b0ed-000000000004] Compacted 3 sstables to [/var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-6862-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-6890-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-6904-big-Data.db:level=2, ]. 509MB to 509MB (~99% of original) in 17019ms = 29MB/s. ~69120 total partitions merged to 69038.
Jun 07 21:46:04 ansible-node6 scylla[32583]: [shard 2] compaction - [Compact baselines.pcf c1f6a6d0-c7d9-11eb-b0ed-000000000004] Compacting [/var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-3488-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-4510-big-Data.db:level=1, ]
Jun 07 21:46:13 ansible-node6 scylla[32583]: [shard 2] compaction - [Compact baselines.pcf c1f6a6d0-c7d9-11eb-b0ed-000000000004] Compacted 2 sstables to [/var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-6918-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-6932-big-Data.db:level=2, ]. 339MB to 339MB (~99% of original) in 9545ms = 35MB/s. ~46080 total partitions merged to 46025.
Jun 07 21:46:13 ansible-node6 scylla[32583]: [shard 2] compaction - [Compact baselines.pcf c7b942d0-c7d9-11eb-b0ed-000000000004] Compacting [/var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-3502-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-4552-big-Data.db:level=1, ]
Jun 07 21:46:25 ansible-node6 scylla[32583]: [shard 2] compaction - [Compact baselines.pcf c7b942d0-c7d9-11eb-b0ed-000000000004] Compacted 2 sstables to [/var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-6946-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-6960-big-Data.db:level=2, ]. 339MB to 339MB (~99% of original) in 11519ms = 29MB/s. ~46080 total partitions merged to 46025.
Jun 07 21:46:25 ansible-node6 scylla[32583]: [shard 2] compaction - [Compact baselines.pcf ceb01910-c7d9-11eb-b0ed-000000000004] Compacting [/var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-3530-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-3544-big-Data.db:level=2, /var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-4580-big-Data.db:level=1, ]
```
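One way to check the claim against the excerpt above is to count the inputs of every `Compacting [...]` line. The following is a minimal Python sketch, assuming the log format shown here (one `<path>-Data.db:level=<n>` entry per input sstable); `compaction_inputs` and `below_threshold` are hypothetical helpers, not part of Scylla's tooling.

```python
import re

# Start-of-compaction lines look like "... Compacting [<sstable>, <sstable>, ]".
COMPACTING_RE = re.compile(r"Compacting \[(?P<inputs>[^\]]*)\]")
# Each input sstable is logged as "<path>-Data.db:level=<n>".
LEVEL_RE = re.compile(r"-Data\.db:level=(\d+)")


def compaction_inputs(log_lines):
    """Yield the list of input-sstable levels for each 'Compacting [...]' line.

    'Compacted N sstables to [...]' completion lines are skipped because they
    do not contain the literal text 'Compacting ['.
    """
    for line in log_lines:
        match = COMPACTING_RE.search(line)
        if match:
            yield [int(level) for level in LEVEL_RE.findall(match.group("inputs"))]


def below_threshold(log_lines, min_threshold):
    """Return the input levels of every compaction that used fewer than
    min_threshold sstables."""
    return [levels for levels in compaction_inputs(log_lines)
            if len(levels) < min_threshold]


if __name__ == "__main__":
    # One start line from the excerpt above (split only for readability).
    line = (
        "Jun 07 21:45:34 ansible-node6 scylla[32583]: [shard 2] compaction - "
        "[Compact baselines.pcf b09aae90-c7d9-11eb-b0ed-000000000004] Compacting ["
        "/var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-3432-big-Data.db:level=2, "
        "/var/lib/scylla/data/baselines/pcf-74b025a0c7d311eb9e3d000000000001/md-4440-big-Data.db:level=1, ]"
    )
    print(below_threshold([line], 32))  # [[2, 1]]
```

Applied to the excerpt, every start line has two or three inputs: exactly one `level=1` sstable plus one or two `level=2` sstables, so none of these compactions comes close to `min_threshold = 32`.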

Full scylla log from the node in question:

scylla.log.txt.tar.gz

bug compaction performance


vladzcloudius

All 21 comments

@eliransin @gnumoreno @fee-mendes @raphaelsc FYI

vladzcloudius on 7 Jun 2021


@roydahan @slivne Do we have a test for this?

vladzcloudius on 7 Jun 2021

@raphaelsc @asias we need it for a customer test in which we run streaming and want compaction on the receiving node to slow down automatically.

dorlaor on 7 Jun 2021

LCS uses min threshold in L0, where STCS is performed. On higher levels, min threshold isn't enforced. For example, if level L is full, LCS needs to promote a single sstable from L together with the overlapping ones in level L + 1.

Isn't tweaking shares enough to slow down compaction to the desired extent? I see it was set to 100.
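The L0-versus-higher-levels distinction described above can be illustrated with a toy Python model. It is a hypothetical simplification written for this discussion, not Scylla's LCS implementation: the `SSTable` type, the helper names, the token ranges and the per-level target size are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    name: str
    level: int
    first_token: int  # smallest token covered by the sstable
    last_token: int   # largest token covered by the sstable
    size_mb: int


def overlaps(a: SSTable, b: SSTable) -> bool:
    """True if the token ranges of the two sstables intersect."""
    return a.first_token <= b.last_token and b.first_token <= a.last_token


def l0_candidates(sstables, min_threshold):
    """L0 is compacted size-tiered style, so nothing is picked until at least
    min_threshold sstables are present. (Simplified: real STCS also groups
    sstables into size buckets and caps a job at max_threshold.)"""
    l0 = [s for s in sstables if s.level == 0]
    return l0 if len(l0) >= min_threshold else []


def promotion_candidates(sstables, level, target_size_mb):
    """If `level` holds more than its target size, pick ONE sstable from it
    plus every overlapping sstable in level + 1. min_threshold is not used."""
    in_level = [s for s in sstables if s.level == level]
    if not in_level or sum(s.size_mb for s in in_level) <= target_size_mb:
        return []
    victim = in_level[0]  # simplified: real LCS walks the token range round-robin
    overlapping = [s for s in sstables
                   if s.level == level + 1 and overlaps(s, victim)]
    return [victim] + overlapping


if __name__ == "__main__":
    tables = [
        SSTable("a", 1, 0, 99, 160),
        SSTable("b", 1, 100, 199, 160),
        SSTable("c", 2, 50, 150, 160),
        SSTable("d", 2, 300, 400, 160),
    ]
    print(l0_candidates(tables, min_threshold=32))  # []
    # L1 holds 320 MB against an assumed 160 MB target, so it must shed data:
    print([s.name for s in promotion_candidates(tables, 1, 160)])  # ['a', 'c']
```

In this model a promotion compacts as few as two sstables whatever `min_threshold` says, which is consistent with the 2- and 3-sstable compactions (one `level=1` input plus overlapping `level=2` inputs) in the issue's log.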

raphaelsc on 8 Jun 2021

@dorlaor @vladzcloudius which operation is running on the node where you want to slow down compaction? Bootstrap? Replace?

Offstrategy fixes this problem, but users won't benefit from it in the short term. I think we can improve this by having LCS enforce min threshold strictly during bootstrap, even on higher levels.

raphaelsc on 8 Jun 2021

From the logs, it seems bootstrap. Please confirm.

raphaelsc on 8 Jun 2021

Bootstrap operation, but all operations are relevant.


dorlaor on 8 Jun 2021

To fully solve this problem, we came up with offstrategy on RBNO.

If we really want to slow down compaction on the receiving node, setting compaction shares is the way to go. Even if we increase min threshold, thereby reducing the number of compaction rounds, the backlog will be absurdly high and compaction will act very aggressively once it kicks in.

I can cook up a patch for LCS to enforce the threshold on higher levels, but only for bootstrap and replace. Otherwise LCS's guarantees will break.

raphaelsc on 8 Jun 2021

Indeed offstrategy was born for this. Does it come into play only with RBNO or also in regular streaming?


dorlaor on 8 Jun 2021

> Indeed offstrategy was born for this, does it come into play only with RBNO or also in regular streaming?

It's only available for RBNO, as we intend to make RBNO our default choice. Also, offstrategy wouldn't be as efficient with streaming: streaming-based operations would have to wait for offstrategy to complete before returning success, which adds offstrategy to the operation time (this doesn't happen with RBNO+offstrategy). Still, streaming+offstrategy is probably better than what we have today.

raphaelsc on 8 Jun 2021

Ok, it will be good to test RBNO+offstrategy now. I agree it should be the default.


dorlaor on 8 Jun 2021


> LCS uses min threshold in L0 where STCS is performed. For higher levels min threshold isn't enforced on. For example, if level L is full, then LCS needs to promote a single sstable in L with overlapping ones in level L + 1.

Can we also delay the promotion until after bootstrap has finished?

> isn't tweaking shares enough to slow down compaction to the desired extent? i see it was set to 100.

asias on 8 Jun 2021

Not sure about delaying promotion, as things will go crazy once the operation completes, but we can certainly make LCS act less aggressively during bootstrap. I'll put more thought into this.

raphaelsc on 8 Jun 2021

> From the logs, it seems bootstrap. Please confirm.

@raphaelsc No, it's not a bootstrap. It's a population phase: c-s (cassandra-stress) write-only, a single row per partition, ~7.5KB per row.

vladzcloudius on 8 Jun 2021

> To fully solve this problem, we came up with offstrategy on RBNO.
>
> If we really want to slow down compaction on the receiving node, setting compaction shares is the way to go. Even if we increase min threshold, so reducing the amount of compaction rounds, backlog will be absurdly high and compaction will act very aggressively when it kicks in.

Again, this GH issue is not (!!) about bootstrapping.
I expected LCS to respect min_threshold across the board; however, I now understand that this is impossible given the invariant at tiers higher than L0.

So, we can close this issue.

> I can cook a patch for LCS to enforce threshold on higher levels but I can only do that for bootstrap and replace. Otherwise LCS promises will break.

Don't we already force min_threshold to 16 during bootstrapping? I know we do.

If you could have enforced it for higher tiers during bootstrapping and you haven't done it yet, @raphaelsc, then that's a bug.

Please let me know if I should open a separate GH issue for that.

vladzcloudius on 8 Jun 2021

@raphaelsc @dorlaor

> Even if we increase min threshold, so reducing the amount of compaction rounds, backlog will be absurdly high and compaction will act very aggressively when it kicks in.

There's going to be a compaction backlog anyway, so at this point what matters most for compaction efficiency is compacting the same row as few times as possible, i.e. reducing the number of compaction rounds.

As for reducing compaction shares: we know all too well that this won't help much, because compaction always wins the fight for I/O budget over CQL and memtable flushes, which introduces higher latencies.

The only thing that is really going to make a difference is https://github.com/scylladb/scylla/issues/7461
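The argument that fewer compaction rounds means fewer rewrites of the same row can be put in numbers with a back-of-the-envelope model. This Python sketch assumes idealized size-tiered merging of equal-sized sstables (the kind of merging the thread describes for L0); it is not how Scylla actually schedules compactions, and `rewrite_rounds` is a hypothetical helper.

```python
def rewrite_rounds(num_flushed: int, threshold: int) -> int:
    """Idealized size-tiered model: every round merges groups of `threshold`
    equal-sized sstables into one (a final, smaller group is merged too).
    Returns how many times a row from one flushed sstable is rewritten
    before all data sits in a single sstable."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    rounds = 0
    remaining = num_flushed
    while remaining > 1:
        remaining = -(-remaining // threshold)  # ceil(remaining / threshold)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for threshold in (2, 4, 32):
        print(threshold, rewrite_rounds(1024, threshold))
    # 2 10
    # 4 5
    # 32 2
```

Under these assumptions a row is rewritten about log_T(N) times, so going from a threshold of 2 to 32 cuts 1,024 flushes from 10 rewrites to 2. The model says nothing about LCS's higher levels, where, as discussed in this thread, min_threshold does not apply.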

vladzcloudius on 8 Jun 2021

> @raphaelsc @dorlaor
>
> > Even if we increase min threshold, so reducing the amount of compaction rounds, backlog will be absurdly high and compaction will act very aggressively when it kicks in.
>
> There's going to be a compaction backlog anyway, so at this point the critical part for a compaction to be efficient and make sure that it compacts the same row as few times as possible == reducing the amount of compaction rounds.
>
> As to reducing the amount of compaction shares - we know way too well that this is not going to help much because compaction is still always going to win the fight for I/O budget.
>
> The only thing that is really going to make a difference is #7461

The only thing that really solves this problem is offstrategy.

raphaelsc on 8 Jun 2021

> > @raphaelsc @dorlaor
> >
> > > Even if we increase min threshold, so reducing the amount of compaction rounds, backlog will be absurdly high and compaction will act very aggressively when it kicks in.
> >
> > There's going to be a compaction backlog anyway, so at this point the critical part for a compaction to be efficient and make sure that it compacts the same row as few times as possible == reducing the amount of compaction rounds.
> >
> > As to reducing the amount of compaction shares - we know way too well that this is not going to help much because compaction is still always going to win the fight for I/O budget.
> >
> > The only thing that is really going to make a difference is #7461
>
> The only thing that really solves this problem is offstrategy.

@raphaelsc No, it is not: I suspect you've missed my comment here - https://github.com/scylladb/scylla/issues/8819#issuecomment-856723306

Read the first line, please.

vladzcloudius on 8 Jun 2021

During normal writes we shouldn't enforce min threshold on higher levels, so I think we should probably close this issue.

raphaelsc on 8 Jun 2021

> If you could have enforced it for higher tiers during bootstrapping and you haven't done it yet, @raphaelsc - then that's a bug.
> Please, let me know if I should open a separate GH issue for that?

@vladzcloudius IMO this should be addressed in a separate bug. If you open such a bug, I will close this one, but not before, so that we don't lose track of the fact that we have a problem.
Just to summarize and make sure I didn't miss anything: the min_threshold value shouldn't be respected at levels higher than L0, except during the bootstrap phase.

eliransin on 8 Jun 2021

@eliransin

> Ok, it will be good to test RBNO+offstrategy now, I agree it should be the default

@dorlaor Since off-strategy is not part of any existing OSS release, it's irrelevant at this point.

vladzcloudius on 8 Jun 2021
