@vladzcloudius ran a simple cassandra-stress ran against a cluster of 3 Scylla 3.0 nodes:
cassandra-stress-shard-aware write duration=30m no-warmup -node <address> -rate threads=600 fixed=150000/s -mode native cql3
The cluster successfully handled 150,000 requests per second, and after a while, Vlad created a materialized view.
From this point, the cluster is much busier, needing to both build the view on the existing data, and to write view updates for new data which comes in. But we expect flow control to slow down the client.
But the "view update backlog per shard" graph on one of the nodes looks like this:

There are two bad things here. First, we see a view backlog on just two shards. Why? Second, the view backlog is growing linearly. It should have slowed down its growth because of flow control, but doesn't. In this graph it appears as if the client is not slowing down at all (in total requests box, we see the client slow down to 95,000 requests per second immediately after the creation of the view, but it does not continue to slow down).
Is it possible that we have a bug in our view backlog reporting - where a base replica where only two shards have large backlogs, and 6 others have almost zero backlogs - report back an almost zero backlog instead of the maximum of all shards?
Cassandra-stress is what you call an interactive client, so depending on its concurrency (and potentially other factors), it will just keep sending requests to meet the requested rate.
What's the key distribution?
As far as I know, at least in its default modes, Cassandra-stress is a limited-concurrency application. You set a specific number of threads and each thread will send at most one request.
So flow control should have worked.
It isn't if you specify the rate, otherwise it would suffer from coordinated omission. This is assuming we use the c-s version that has those fixes.
@duarten really, so the "threads" option is just ignored, and it just uses the "fixed" option and bombards the server with unlimited concurrency?
@vladzcloudius can you try the same test without the "fixed" option, just let it run with a known concurrency, not trying to achieve any specific rate?
I think so, if we're running with the fix described in http://psy-lob-saw.blogspot.com/2016/07/fixing-co-in-cstress.html (which I hope we are).
I chatted with @vladzcloudius and it looks like a real bug, where we aren't applying any delay to user writes (the throttled base writes were 0). He showed me a run without the --rate option.
The difference between this test and the ones we performed before is that the shards have 8GB of memory, which allows for a much bigger backlog. Maybe that's messing with the calculations?
We should look into this with very high priority, @eliransin and @nyh. I'm knee-deep in the consensus stuff :/
@duarten @vladzcloudius I'm now trying to reproduce this issue. Just so we're on the same page, @vladzcloudius were you running this on Scylla 3.0 or master?
@nyh scylla-3.0.1
@vladzcloudius also clarified in an email to the mailing list that in this test:
Most helpful comment
We should look into this with very high priority, @eliransin and @nyh. I'm knee-deep in the consensus stuff :/