Scylla: Tracing: Slow Query Logging causes the "cached" budget component to underflow

Created on 19 Oct 2016 · 10 Comments · Source: scylladb/scylla

Scylla version (or git commit hash): 54069162f545f57c7031973a2479eb356dca1a2a
Cluster size: 3
AWS AMI: c3.8xlarge

_Description_
1) Enable Slow Query Logging.
2) Run cassandra-stress: cassandra-stress read n=10000000 -node <address> -rate threads=500
3) Check the collectd tracing statistics with scyllatop "*trac*". Note that the "cached_records" counter has a huge value.

bug

vladzcloudius

All 10 comments

The above happens because the "cached" component of the tracing budget becomes "negative": we return more than we have consumed. Since this is an unsigned value, it wraps around to a huge value.

This doesn't happen when regular tracing is enabled, so there must be a logic error in the budget handling related to Slow Query Logging.

I'll keep digging.
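
To illustrate the wrap-around described in this comment, here is a minimal sketch. It is not Scylla's tracing code: `budget_counter`, `after_round_trip` and the 64-bit width of the counter are all assumptions made for the example. It only shows that returning more records than were consumed makes an unsigned counter jump to a huge value instead of going negative.

```cpp
#include <cstdint>

// Hypothetical stand-in for one component of a tracing budget
// (for example the "cached" records counter). Not the real Scylla type.
struct budget_counter {
    uint64_t used = 0;

    // Charge n records against the budget.
    constexpr void consume(uint64_t n) { used += n; }

    // Return n records to the budget. Unsigned arithmetic is modular,
    // so if n > used the result wraps around to a value close to 2^64.
    constexpr void give_back(uint64_t n) { used -= n; }
};

// Hypothetical helper: the counter value after taking `taken` records
// and then returning `returned` records.
constexpr uint64_t after_round_trip(uint64_t taken, uint64_t returned) {
    budget_counter c;
    c.consume(taken);
    c.give_back(returned);
    return c.used;
}
```

With balanced accounting, `after_round_trip(3, 3)` is 0. Returning a single extra record, `after_round_trip(3, 4)`, yields `UINT64_MAX`, which is the kind of huge value a statistics tool would then display.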

vladzcloudius on 19 Oct 2016

The issue is caused by the trace_state migrating to another shard without using global_trace_state_ptr.

I'm now looking for the specific place in the code where this happens...
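
A toy model of why a shard migration can produce the underflow, assuming (this is not confirmed by the thread) that each shard keeps its own unsigned counter of cached records. `shard_budgets` and `run` are hypothetical names. The sketch does not model how global_trace_state_ptr works; it only shows what happens when records are charged on one shard and returned on another.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model: two shards, each with its own unsigned counter
// of cached trace records.
struct shard_budgets {
    uint64_t cached[2] = {0, 0};
};

// Charge `records` on the shard where the trace state is created and
// return them on the shard where it is released.
constexpr shard_budgets run(std::size_t create_shard,
                            std::size_t release_shard,
                            uint64_t records) {
    shard_budgets b;
    b.cached[create_shard] += records;   // consumed on the creating shard
    b.cached[release_shard] -= records;  // returned on the releasing shard
    return b;
}
```

When both indices are the same shard, every counter ends at 0. When the state is created on shard 0 and released on shard 1, shard 0 keeps 5 records charged forever and shard 1's counter wraps to `UINT64_MAX - 4` for `records = 5`.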

vladzcloudius on 19 Oct 2016

The offending trace point is:

tracing::trace(_trace_state, "Reading key {} from sstable {}", *_rp.key(), seastar::value_of([&sstable] { return sstable->get_filename(); }));

vladzcloudius on 19 Oct 2016

@duarten FYI ;)

vladzcloudius on 19 Oct 2016

Yikes! I thought that had been fixed with #1678 :/

duarten on 19 Oct 2016

Nope, I only fixed the issue in the storage_proxy code I knew about. If you know of any other place, please don't hesitate to share... ;)

vladzcloudius on 19 Oct 2016

The patch fixing THIS problem is on the list. I hope this is the last place like this... ;)

vladzcloudius on 19 Oct 2016

That seems to be the only missing one!

duarten on 19 Oct 2016

_Conclusion_
The issue affected not only Slow Query Logging but also regular Tracing, and it was only by luck that it didn't crash like in #1678.
So I'd classify this issue as critical and suggest merging the fix into the scylla-1.4 branch.

vladzcloudius on 19 Oct 2016

Looking at https://github.com/scylladb/scylla/commit/46b86ff80126c72b22a13d4245f3e11ab869c6ba, the following places in storage_proxy need the global_trace_state_ptr:

storage_proxy::query_singular_local
storage_proxy::query_mutations_locally

duarten on 20 Oct 2016
