Scylla: Read timeout should be propagated to replica side

Created on 2 Nov 2017 · 21Comments · Source: scylladb/scylla

Currently replica-side reads don't time out, reads only time out on coordinator side. If a read takes more than a timeout, and clients retry, reads may keep accumulating on replica side, making overload worse than it should be. We should propagate the timeout and expire reads, like we do for writes.

Field-Tier1 bug overload stability

Source

tgrabiec

All 21 comments

To make the scope more clear we already propagate timeout values to
replica side all is left is a hard part to actually honor the timeout
in databse::query().

--
Gleb.

gleb-cloudius on 2 Nov 2017

@tgrabiec isn't this is handled by the per-request timeout patchset ? Am I missing something ?

glommer on 28 Apr 2018

@glommer The work which has already been implemented for this will expire reads on replica side if they are blocked on sstable read concurrency limiter. They still won't expire after admitted.

tgrabiec on 29 Apr 2018

On Sun, Apr 29, 2018 at 10:26:28AM -0700, Tomasz Grabiec wrote:

@glommer The work which has already been implemented for this will expire reads on replica side if they are blocked on sstable read concurrency limiter. They still won't expire after admitted.

What expiration points do we still have that do not honor the timeout?

--
Gleb.

gleb-cloudius on 30 Apr 2018

What do you mean by "expiration point"? Actual reading from sstables doesn't respect the timeout.

tgrabiec on 30 Apr 2018

On Mon, Apr 30, 2018 at 12:32:21AM -0700, Tomasz Grabiec wrote:

What do you mean by "expiration point"? Actual reading from sstables doesn't respect the timeout.

I mean points where a read can stuck without timeout. What is "actual
reading"? A read submitted to IO queue? Actually sstable read may
require more than one IO, right? So the code that tracks how long the
overall process takes is missing?

--
Gleb.

gleb-cloudius on 30 Apr 2018

2018-04-30 9:38 GMT+02:00 Gleb Natapov notifications@github.com:

On Mon, Apr 30, 2018 at 12:32:21AM -0700, Tomasz Grabiec wrote:

What do you mean by "expiration point"? Actual reading from sstables
doesn't respect the timeout.

I mean points where a read can stuck without timeout. What is "actual
reading"? A read submitted to IO queue? Actually sstable read may
require more than one IO, right? So the code that tracks how long the
overall process takes is missing?

Yes, a read may do many IOs, on many files. Some of them in parallel, some
of them sequentually. We will not timeout at any point once the read is
admitted to start reading from sstables, past the concurrency limiting
queue.

tgrabiec on 30 Apr 2018

We also saw reads queueing inside the data_query_stage execution stage when CPU is overloaded. That queue doesn't time out.

tgrabiec on 10 Apr 2019

We could wrap file with something that times out after a certain point. But it wouldn't play well with #1865 caches.

I don't think this can help much. At any point in time, most reads in a loaded system are queued, not running, so this will affect only a small number of queries.

avikivity on 23 Apr 2020

I think a better way would be setting a timer in the same layer that is doing the querier caching, then pass down a cancellation handle all the way down to seastar I/O. The timer, when triggered, cancels the read, including all outstanding I/O requests. If the page was filled and the reader is saved we kill the timer.

denesb on 30 Apr 2020

There is some more discussion about this on the duplicate issue on #7340 (which I'm going to close as a duplicate) , the highlights of it is:

We will need I/O cancellation support in (seasrar scylladb/seastar#811)
It can create a live lock where reads accumulate to such a long queue (on the read concurrency semaphore) that all of the
reads will become stale at the moment of execution.

eliransin on 8 Oct 2020

@vladzcloudius FYI

eliransin on 8 Oct 2020

👍1

@amihayday @slivne @denesb @eliransin @eyalgutkind
Please, let's give it the highers priority: this issue is hurting us a lot all over the place.

vladzcloudius on 17 Oct 2020

@amihayday @slivne @denesb @eliransin @eyalgutkind
Please, let's give it the highers priority: this issue is hurting us a lot all over the place.

@eliransin @slivne ping

vladzcloudius on 3 Nov 2020

👍1

@bhalevy the work you and @xemul are doing will cover this - right ?

slivne on 24 Feb 2021

@slivne no, not yet.

bhalevy on 25 Feb 2021

I'll rephrase

I know that we are not done - in the sense that you still have some patches - but what I meant as following your changes - will this be closed or are we missing something in addition to that.

slivne on 25 Feb 2021

I'll rephrase

I know that we are not done - in the sense that you still have some patches - but what I meant as following your changes - will this be closed or are we missing something in addition to that.

I was under the assumption that timeouts were not conveyed to the replica side but apparently they are (https://github.com/scylladb/scylla/issues/2927#issuecomment-341376890), and all was left was to actually honor them.
Said that, Pavel's fix for https://github.com/scylladb/seastar/issues/859, with my project to close flat_mutation_reader:s will do that and it indeed seems like there's nothing more missing.

@denesb, @tgrabiec please confirm.

bhalevy on 25 Feb 2021

I didn't see the code in your close series which actually takes action when a reader is timed out. Your series lays the groundwork for doing so, yes, but is not enough on its own.
We will still need to glue all this together. Set up a timeout timer and when this timer fires we cancel all I/O and propagate up an exception terminating any in-progress operation on the reader, after which we close and destroy them.

denesb on 25 Feb 2021

Well, the timeouts on r_c_s should already work, no? As well as the timeout parameter propagated to in-progress queries?

avikivity on 25 Feb 2021

The timeout on the r_c_s does already work, but after passing the semaphore
we hav no timeout at all effectively.

On Thu, Feb 25, 2021, 18:29 Avi Kivity notifications@github.com wrote:

Well, the timeouts on r_c_s should already work, no? As well as the
timeout parameter propagated to in-progress queries?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/scylladb/scylla/issues/2927#issuecomment-786030487,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAKTFWINTWBXBJWOIYPKSUDTAZ3IBANCNFSM4EB5FNJQ
.