The problem
When we see high compaction shares today, all we know is that at least one table has a high compaction backlog; we can't tell which one.
Knowing which table is responsible for the high compaction shares would let us act on it, e.g. run a major compaction or change the compaction configuration (for instance, increase min_threshold).
The solution (proposed by @raphaelsc)
Add a REST API that would return the current compaction backlog for a requested item:
- If Keyspace + Table name are provided: for that table.
- If only Keyspace name is provided: for all tables in that KS.
- If no parameters are provided: for all tables.
(Original issue text by Vladislav Zolotarov, Tue, Mar 23, 2021: https://github.com/scylladb/scylla/issues/8351)
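The three lookup modes in the list above can be sketched in Python as a filter over a per-table backlog map. This is a minimal illustration, not Scylla's REST handler: the function name `compaction_backlog`, the in-memory `(keyspace, table) -> backlog` map, the sample numbers, and the choices to reject a table name without a keyspace and to return an empty result for an unknown table are all assumptions made for the sketch.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory view: (keyspace, table) -> current compaction backlog.
Backlogs = Dict[Tuple[str, str], float]


def compaction_backlog(backlogs: Backlogs,
                       keyspace: Optional[str] = None,
                       table: Optional[str] = None) -> Backlogs:
    """Return the backlog entries selected by the optional keyspace/table filters."""
    if table is not None and keyspace is None:
        # Assumption of this sketch: a table name is only meaningful with a keyspace.
        raise ValueError("a table name requires a keyspace name")
    if keyspace is not None and table is not None:
        # Keyspace + table: only that table (empty result if it is unknown).
        key = (keyspace, table)
        return {key: backlogs[key]} if key in backlogs else {}
    if keyspace is not None:
        # Keyspace only: every table in that keyspace.
        return {k: v for k, v in backlogs.items() if k[0] == keyspace}
    # No parameters: all tables.
    return dict(backlogs)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = {("ks1", "t1"): 10.0, ("ks1", "t2"): 250.0, ("ks2", "t3"): 5.0}
    per_ks = compaction_backlog(sample, keyspace="ks1")
    # The table with the largest backlog in ks1.
    worst = max(per_ks, key=per_ks.get)
    print(worst)  # ('ks1', 't2')
```

Taking the maximum over the returned entries, as in the usage above, is one simple way to answer the original question of which table is behind the high compaction shares.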
I think a virtual table is better, since it can also be used in a cloud deployment.
Agreed. We can start with a REST API patch set that exposes the data without the complexity of virtual tables, and then send a follow-up series that adds a corresponding virtual table.
Virtual table infrastructure is being added (we have a patch from Julius for that), so we can do it in a virtual table.
Great! Thanks for the update, @slivne.
@raphaelsc, you could also consider using VTs for compaction status in the context of https://github.com/scylladb/scylla/issues/8392
By the way, knowing which table contributes the most to the shares is important, but it's also important to understand why that's happening before any action is taken. To help with that, we can have another VT that shows the sstable layout per shard.
Example:
SHARD  KS   TABLE  LEVEL  SIZE_IN_GB  SSTABLES  MIN_TIMESTAMP  MAX_TIMESTAMP
0      FOO  BAR    0      0.1         1         20             25
0      FOO  BAR    1      1.6         10        10             15
The rows would be filled according to the compaction strategy in use, i.e. levels (LCS), tiers (STCS) or time windows (TWCS).
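A rough Python sketch of how such per-shard layout rows could be built: group sstables by shard, keyspace, table and a strategy-specific bucket, then aggregate size, sstable count and timestamp range for each group. The `SSTable` and `LayoutRow` types and their field names are hypothetical, size-tiered tiers are left out for brevity, and placing each TWCS sstable in the window that contains its max timestamp is a simplification assumed here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

GB = 1024 ** 3


@dataclass(frozen=True)
class SSTable:
    # Hypothetical per-sstable metadata; field names are illustrative only.
    shard: int
    keyspace: str
    table: str
    level: int  # LCS level; ignored for TWCS
    size_bytes: int
    min_timestamp: int
    max_timestamp: int


@dataclass
class LayoutRow:
    # One row of the proposed layout VT; "bucket" is the LCS level or the
    # TWCS window index, depending on the strategy.
    shard: int
    keyspace: str
    table: str
    bucket: int
    size_in_gb: float
    sstables: int
    min_timestamp: int
    max_timestamp: int


def bucket_of(sst: SSTable, strategy: str, window_size: int) -> int:
    if strategy == "LCS":
        return sst.level
    if strategy == "TWCS":
        # Simplification: the window containing the sstable's max timestamp
        # (window_size uses the same unit as the timestamps).
        return sst.max_timestamp // window_size
    raise ValueError(f"strategy not covered by this sketch: {strategy}")


def sstable_layout(sstables: List[SSTable], strategy: str,
                   window_size: int = 1) -> List[LayoutRow]:
    groups: Dict[Tuple[int, str, str, int], List[SSTable]] = defaultdict(list)
    for sst in sstables:
        key = (sst.shard, sst.keyspace, sst.table,
               bucket_of(sst, strategy, window_size))
        groups[key].append(sst)

    rows = []
    for key in sorted(groups):  # ordered by shard, keyspace, table, bucket
        members = groups[key]
        rows.append(LayoutRow(
            *key,
            size_in_gb=sum(s.size_bytes for s in members) / GB,
            sstables=len(members),
            min_timestamp=min(s.min_timestamp for s in members),
            max_timestamp=max(s.max_timestamp for s in members),
        ))
    return rows
```

Keeping one generic `bucket` column and deriving it per strategy mirrors the idea that the same table shape can describe levels, tiers or windows.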
With this info, we can understand why the compaction strategy is failing to reduce the backlog, for example: with LCS, a specific level is falling behind; with TWCS, sstables are accumulating in a given time window.
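One way the diagnosis described above could be automated, sketched in Python under stated assumptions. For LCS it assumes the usual sizing in which level L (L >= 1) holds at most about `fan_out ** L * sstable_size` bytes, with hypothetical defaults of fan-out 10, 160 MB sstables and an L0 limit of 4 sstables. For TWCS it assumes closed windows are normally compacted down to a single sstable. The function names and the input shape are illustrative, not part of any Scylla API.

```python
from typing import Dict, List, Tuple

# Hypothetical input for one table on one shard:
# bucket (LCS level or TWCS window index) -> (size_in_bytes, sstable_count).
Layout = Dict[int, Tuple[int, int]]

MB = 1024 ** 2


def lcs_levels_behind(levels: Layout, sstable_size: int = 160 * MB,
                      fan_out: int = 10, max_l0_sstables: int = 4) -> List[int]:
    """Levels holding more than this sketch's assumed LCS limits allow."""
    behind = []
    for level, (size, count) in sorted(levels.items()):
        if level == 0:
            # Hypothetical L0 limit: too many sstables waiting to be promoted.
            if count > max_l0_sstables:
                behind.append(level)
        elif size > fan_out ** level * sstable_size:
            # Level L is larger than its assumed target size.
            behind.append(level)
    return behind


def twcs_windows_accumulating(windows: Layout, current_window: int,
                              max_sstables: int = 1) -> List[int]:
    """Closed windows that still hold more sstables than expected."""
    return [w for w, (_, count) in sorted(windows.items())
            if w < current_window and count > max_sstables]
```

Thresholds like these would need tuning against real clusters; the point of the sketch is only that the per-bucket size and sstable count in the layout table are enough to flag the two failure modes mentioned above.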