Scylla: Replaying hints before repair

Created on 16 Feb 2021 · 30 Comments · Source: scylladb/scylla

This issue is meant to serve as a continuation of the discussion under issue #4712, starting from here. I summed up the points made in that discussion and present them here.

Both hinted handoff and repair are anti-entropy mechanisms. They work independently of each other, which may sometimes cause us to do more work than necessary. For example, if node B is repaired, then most hints towards B become obsolete, as they would try to write data which was already fixed by repair.

One notable exception is CL=ANY writes - they do not need to be acknowledged by any replica to be considered successful; it is sufficient to write a hint for them. Dropping those hints would cause us to lose information about those writes. Although CL=ANY is pretty unreliable, repair has not caused us to lose those writes so far, so we probably shouldn't start doing that now.

Proposal

Keeping the above in mind, a better idea than dropping hints is to wait until hints are sent out before starting repair. This may speed up repair because, in some cases, hints themselves may be able to repair the data perfectly, or at least a considerable amount of it.

Considerations

While we wait for hints to be sent out, more of them can be generated

If we waited until all hints are sent out and no hints are left in each participating hints queue, we could get stuck because new hints can be generated, even if the destination node is available. We can protect from that by marking a point at the end of the hints log and waiting until all hints up to that point are sent.
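
The "mark a point and wait up to it" idea can be sketched as follows. This is a minimal toy model, not Scylla's implementation: the `HintQueue` class, its sequence numbers, and the `mark_sync_point` / `reached` names are all assumptions made for illustration.

```python
from collections import deque


class HintQueue:
    """Toy FIFO hint log for one destination node (hypothetical model)."""

    def __init__(self):
        self._pending = deque()  # (sequence number, mutation) pairs
        self._next_seq = 0       # sequence number the next hint will get
        self._sent_up_to = 0     # every hint with seq < this value was sent

    def write_hint(self, mutation):
        self._pending.append((self._next_seq, mutation))
        self._next_seq += 1

    def mark_sync_point(self):
        # Every hint written so far has a sequence number below this value.
        return self._next_seq

    def replay_one(self, send):
        """Send the oldest pending hint; return False if nothing is pending."""
        if not self._pending:
            return False
        seq, mutation = self._pending.popleft()
        send(mutation)
        self._sent_up_to = seq + 1
        return True

    def reached(self, sync_point):
        return self._sent_up_to >= sync_point


queue = HintQueue()
for m in ("w1", "w2", "w3"):
    queue.write_hint(m)

point = queue.mark_sync_point()
sent = []
while not queue.reached(point):
    queue.replay_one(sent.append)
    queue.write_hint("late")  # new hints keep arriving during replay
```

The loop ends once the three hints written before the sync point are sent, even though the queue never becomes empty. Waiting for an empty queue would not terminate in this scenario.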

Some hints in the queue A->B might not be owned by B anymore

In normal mode, such "orphaned" hints are sent with CL=ALL to all current replicas. However, some or all of the new replicas can be DOWN, which means that we won't be able to send the hint. This is problematic considering the hints caused by CL=ANY - we can't drop them. A safe solution would be to write them with a lower consistency level, or to push the hint back to the end of the queue.
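
A rough sketch of that fallback, under stated assumptions: the function name, the `is_up` / `send` / `requeue` callbacks, and the choice to combine both suggestions (write to whichever replicas are up and requeue if any were missed) are hypothetical, not taken from the Scylla code base.

```python
def replay_orphaned_hint(hint, current_replicas, is_up, send, requeue):
    """Try to deliver a hint whose original destination no longer owns the data.

    Returns "all" if every current replica received it, "partial" if only
    some did (the hint is requeued so the rest can be retried later), and
    "requeued" if no replica was reachable.
    """
    live = [r for r in current_replicas if is_up(r)]
    for replica in live:
        send(replica, hint)
    if live and len(live) == len(current_replicas):
        return "all"
    # Never drop the hint: it may be the only copy of a CL=ANY write.
    requeue(hint)
    return "partial" if live else "requeued"


delivered, back_of_queue = [], []
result = replay_orphaned_hint(
    "hint-1",
    ["B", "C"],
    is_up=lambda node: node == "B",  # C is DOWN in this example
    send=lambda node, h: delivered.append((node, h)),
    requeue=back_of_queue.append,
)
```

Here the hint reaches B, and is also pushed back to the end of the queue so that C can receive it once it comes back.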

Evaluation

It's not immediately obvious that this will be an improvement. It's possible to imagine cases in which replaying hints before repair could either prolong or speed up the whole operation. The perfect scenario happens when the stored hints themselves are able to repair the cluster completely, and there are not too many overwrites of the same key. In such a case, the data will be fully consistent and repair won't have anything to do.

However, in other scenarios it could happen that:

Some hints could not be generated - we ran out of space for hints, or the period during which we generate hints for a node has ended (3hrs after node goes down, by default),
Some hints could be outright lost - e.g. a node that had some hints crashed and could not be brought up,
A user did many overwrites - a single row will be written by hints many times, while row-level repair will process it only once.
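
The overwrite case can be illustrated with a toy count. This is only an assumption-laden illustration: it counts row writes, ignores the cost repair pays to read and compare data, and the function name is made up.

```python
def replay_vs_repair_rows(hinted_writes):
    """hinted_writes: list of row keys, one entry per hinted write."""
    replay_writes = len(hinted_writes)     # hints replay every write
    repair_rows = len(set(hinted_writes))  # repair handles each row once
    return replay_writes, repair_rows


# 100 overwrites of one row plus two single writes to other rows.
keys = ["k1"] * 100 + ["k2", "k3"]
counts = replay_vs_repair_rows(keys)  # (102, 3)
```

With few overwrites the two numbers converge, which is the case where replaying hints first is most likely to pay off.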

We should create a POC and evaluate its performance on different test cases. Depending on the results, we might also consider adding an on/off flag to the nodetool repair command and other node operations which use repair, or even drop this idea altogether if it performs poorly.

Summary

Hints and repair have conflicting goals, and may do double work unnecessarily
Dropping hints is a bad idea due to CL=ANY writes which may be stored for some time only in the form of hints
The idea is to replay hints before repair - it may cause repair to be cheaper and result in a speedup / less work done
It may turn out that it does or does not result in a speedup - we should test how it affects repair and repair-based operations in some scenarios

cc: @vladzcloudius @gleb-cloudius @asias @avikivity @slivne @kbr- @haaawk could you comment? Do you see any issues that should be resolved before moving on to implementing a POC?

enhancement hinted-handoff stability

Source

piodul

Most helpful comment

On Wed, Feb 17, 2021 at 07:24:03PM -0800, Piotr Jastrzębski wrote:

As to @haaawk's suggestion not to replay hints while repairing - why is it even a problem? We have all kinds of schedulers (sorry, "controllers") in place that should make them run nicely in parallel.

So, again, seems to be a non-issue to me.

We've got issues that cluster performance drops when hints are replayed and repair/streaming runs at the same time. Apparently they don't run nicely in parallel.

Both hint replay and repair/streaming are running in the streaming
scheduling class. They should compete with one another, but not with the main
workload. We need to investigate why this is happening, not try to
hide it.

--
Gleb.

gleb-cloudius on 18 Feb 2021


All 30 comments

Do you see any issues that should be resolved before moving on to implementing POC?

yes, we should remove CL=ANY, it should be forever forgotten

kbr- on 16 Feb 2021

The idea of replaying all hints before the repair makes a lot of sense.
Note that all the cases you mentioned in which hints can't be sent would make the repair impossible as well.

The only problem here is: how long it's going to take to replay those hints - we should limit this by some reasonable time limit and if HH replay doesn't complete - proceed to the repair regardless.
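
A bounded wait of this kind could look roughly like the sketch below. All names (`replay_then_repair`, `replay_step`, `hints_done`, `run_repair`) are hypothetical placeholders rather than Scylla functions, and the `clock` parameter exists only to make the example testable.

```python
import time


def replay_then_repair(replay_step, hints_done, run_repair, timeout_s,
                       clock=time.monotonic):
    """Replay hints until done or until the time limit, then repair anyway.

    Returns True if hint replay completed before the deadline.
    """
    deadline = clock() + timeout_s
    while not hints_done() and clock() < deadline:
        replay_step()
    completed = hints_done()
    run_repair()  # proceed to repair regardless of whether replay finished
    return completed


# Example with a fake clock that advances by one "second" per reading.
ticks = iter(range(100))
state = {"left": 10, "repaired": False}


def step():
    state["left"] -= 1


finished = replay_then_repair(
    replay_step=step,
    hints_done=lambda: state["left"] == 0,
    run_repair=lambda: state.update(repaired=True),
    timeout_s=3,
    clock=lambda: next(ticks),
)
```

In this run the deadline is hit with hints still pending, and repair starts anyway.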

vladzcloudius on 17 Feb 2021


I have the following thoughts:

It seems to be important not to replay hints and run repair at the same time. It's ok to replay hints first and then do repair, or do the repair first and then replay hints, or even to replay part of the hints first, then do the repair and then replay the remaining hints. To make sure that the cluster performance is not affected too much, we should make sure though that those two things don't run in parallel.
Whether it's worth replaying hints before repair or not depends on the use case. If there are many overwrites/updates in the use case then it might be much cheaper to just do a repair.
We can't just drop hints because of CL ANY.

Thus, I would propose the following idea for a consideration:

Let's mark hints that were created with CL ANY and let's do it in a zero cost abstraction way so that only hints with CL ANY have to pay the price. For example we could have a separate log that allows telling which hints are CL ANY hints
When repair is about to be started, let's replay only hints that were created with CL ANY and drop any other hints.
Only after all CL ANY hints are replayed, we start the repair
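
One way to picture these three steps, as a sketch only: the `HintLog` class and its side list of positions are assumptions standing in for the "separate log" mentioned above, not an existing Scylla structure.

```python
class HintLog:
    """Toy hint log with a side log that records which hints are CL=ANY."""

    def __init__(self):
        self.hints = []             # all hints, in write order
        self.cl_any_positions = []  # positions of hints created by CL=ANY writes

    def write(self, mutation, cl_any=False):
        # Only CL=ANY hints pay the extra cost of a side-log entry.
        if cl_any:
            self.cl_any_positions.append(len(self.hints))
        self.hints.append(mutation)

    def prepare_for_repair(self, send):
        """Replay only CL=ANY hints, then drop everything else."""
        for pos in self.cl_any_positions:
            send(self.hints[pos])
        self.hints.clear()
        self.cl_any_positions.clear()


log = HintLog()
log.write("a")
log.write("b", cl_any=True)
log.write("c")

replayed = []
log.prepare_for_repair(replayed.append)  # repair would start after this returns
```

Only "b" is replayed; "a" and "c" are dropped on the assumption that repair will reconcile them.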

My thinking is that we should normally have no CL ANY hints. Our MVs don't create CL ANY hints, so the user has to explicitly use CL ANY to get CL ANY hints in their cluster.

What do you think? @piodul ? @vladzcloudius ? @slivne ?

haaawk on 17 Feb 2021

On a second thought, we can replay CL ANY hints after the repair or in parallel with it because the expected number of such hints is small.

haaawk on 17 Feb 2021

I do not see why CL=ANY is any (pun intended) special. What if a hint
was written for CL=ONE but the one node that got the write died? Is it
OK to drop such a hint?

--
Gleb.

gleb-cloudius on 17 Feb 2021


I believe it is. When you write with CL=ONE you accept the risk. CL ANY is special because it may not be present on any replica.

haaawk on 17 Feb 2021

@gleb-cloudius In general it's ok to drop hints because they are just optimization to reduce the amount of work repair has to do. CL ANY is special because that's something repair can't fix.

haaawk on 17 Feb 2021

On Wed, Feb 17, 2021 at 05:56:27AM -0800, Piotr Jastrzębski wrote:

I believe it is. When you write with CL=ONE you accept the risk. CL ANY is special because it may not be present on any replica.

I do not understand that distinction. You accept the risk of using
whatever CL you chose to use. I do not see why you feel you need to try
harder for one but not the other.

--
Gleb.

gleb-cloudius on 17 Feb 2021

On Wed, Feb 17, 2021 at 05:58:16AM -0800, Piotr Jastrzębski wrote:

@gleb-cloudius In general it's ok to drop hints because they are just optimization to reduce the amount of work repair has to do. CL ANY is special because that's something repair can't fix.

The repair cannot fix CL=ONE after the only node that got the write crashed
as well.

--
Gleb.

gleb-cloudius on 17 Feb 2021

On Wed, 17 Feb 2021 at 15:04, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 05:56:27AM -0800, Piotr Jastrzębski wrote:

I believe it is. When you write with CL=ONE you accept the risk. CL ANY
is special because it may not be present on any replica.

I do not understand that distinction. You accept the risk of using
whatever CL you chose to use. I do not see why you feel you need to try
harder for one but not the other.

CL=ANY doesn't guarantee the write will end up on any replica in the end,
so here's an idea: to handle a CL=ANY request, simply return an ack to the
client without doing anything else. Then the performance of these CL=ANY
requests will be fantastic and everyone will be using CL=ANY. No data will
appear in the cluster from these writes but well, that's what CL=ANY
guarantees (i.e. does not guarantee) after all.

But seriously, we should get rid of CL=ANY altogether.

--
Gleb.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/scylladb/scylla/issues/8102#issuecomment-780577861,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ABRW6FNG2YDSNTNLVSQYQPLS7PEFXANCNFSM4XWVPPAQ
.

kbr- on 17 Feb 2021

On Wed, Feb 17, 2021 at 05:58:16AM -0800, Piotr Jastrzębski wrote: @gleb-cloudius In general it's ok to drop hints because they are just optimization to reduce the amount of work repair has to do. CL ANY is special because that's something repair can't fix.
The repair cannot fix CL=ONE after the only node that got the write crashed as well.
…
-- Gleb.

There's a difference in not being able to fix something when one node in a cluster is down vs not being able to fix something always (no matter what), don't you think @gleb-cloudius ?

haaawk on 17 Feb 2021

On Wed, Feb 17, 2021 at 06:11:50AM -0800, Kamil Braun wrote:

On Wed, 17 Feb 2021 at 15:04, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 05:56:27AM -0800, Piotr Jastrzębski wrote:

I believe it is. When you write with CL=ONE you accept the risk. CL ANY
is special because it may not be present on any replica.

I do not understand that distinction. You accept the risk of using
whatever CL you chose to use. I do not see why you feel you need to try
harder for one but not the other.

CL=ANY doesn't guarantee the write will end up on any replica in the end,

Neither does CL=ONE.

so here's an idea: to handle a CL=ANY request, simply return an ack to the
client without doing anything else. Then the performance of these CL=ANY
requests will be fantastic and everyone will be using CL=ANY. No data will
appear in the cluster from these writes but well, that's what CL=ANY
guarantees (i.e. does not guarantee) after all.

Ah, the famous old "Mongo DB Is Web Scale" joke.

But seriously, we should get rid of CL=ANY altogether.

For some SQL folks the whole NoSQL thing is a big CL=ANY, but others
find it useful for certain types of data. The same is true for different
levels of CL in Cassandra - some people may be happy with the guarantees
CL=ANY provides.

--
Gleb.

gleb-cloudius on 17 Feb 2021

On Wed, Feb 17, 2021 at 06:12:57AM -0800, Piotr Jastrzębski wrote:

There's a difference in not being able to fix something when one node in a cluster is down vs not being able to fix something always (no matter what), don't you think @gleb-cloudius ?

That is not a matter of "not being able to fix something when one node in
a cluster is down". The node is gone. You will not see it again. The
data is gone with it as well. You will not see it again either. Unless
you replay hints, which is exactly the same situation as with CL=ANY, and
for that reason I do not see any difference.

--
Gleb.

gleb-cloudius on 17 Feb 2021

On Wed, 17 Feb 2021 at 15:20, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 06:11:50AM -0800, Kamil Braun wrote:

On Wed, 17 Feb 2021 at 15:04, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 05:56:27AM -0800, Piotr Jastrzębski wrote:

I believe it is. When you write with CL=ONE you accept the risk. CL ANY
is special because it may not be present on any replica.

I do not understand that distinction. You accept the risk of using
whatever CL you chose to use. I do not see why you feel you need to try
harder for one but not the other.

CL=ANY doesn't guarantee the write will end up on any replica in the end,

Neither does CL=ONE.

CL=ONE guarantees that if the write is successful (the client gets a successful
ack), it ended up at one replica. That's a guarantee - in contrast to the
guarantee of nothing given by CL=ANY.

so here's an idea: to handle a CL=ANY request, simply return an ack to the
client without doing anything else. Then the performance of these CL=ANY
requests will be fantastic and everyone will be using CL=ANY. No data will
appear in the cluster from these writes but well, that's what CL=ANY
guarantees (i.e. does not guarantee) after all.

Ah famous old "Mongo DB Is Web Scale" joke.

But seriously, we should get rid of CL=ANY altogether.

For some SQL folks the whole NoSQL thing is a big CL=ANY, but others
find it useful for certain types of data. The same is true for different
levels of CL in Cassandra - some people may be happy with the guarantees
CL=ANY provides.

--
Gleb.


kbr- on 17 Feb 2021

Ok @gleb-cloudius I get your perspective.
I think of hints as an optimization, and your perspective also adds safety, which is fair.

Whether one should use hints to decrease the chances of data loss instead of increasing CL from ONE to QUORUM is debatable, but it is fair to consider this case. It does not seem very useful though, because it gives you only a very short window to recover the data from hints.

So just to be clear @gleb-cloudius - you're absolutely against dropping any hints because they are another replica of the data and in corner cases may save data from being lost?

haaawk on 17 Feb 2021

On Wed, Feb 17, 2021 at 06:32:34AM -0800, Kamil Braun wrote:

On Wed, 17 Feb 2021 at 15:20, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 06:11:50AM -0800, Kamil Braun wrote:

On Wed, 17 Feb 2021 at 15:04, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 05:56:27AM -0800, Piotr Jastrzębski wrote:

I believe it is. When you write with CL=ONE you accept the risk. CL ANY
is special because it may not be present on any replica.

I do not understand that distinction. You accept the risk of using
whatever CL you chose to use. I do not see why you feel you need to try
harder for one but not the other.

CL=ANY doesn't guarantee the write will end up on any replica in the end,

Neither does CL=ONE.

CL=ONE guarantees that if the write is successful (the client gets a successful
ack), it ended up at one replica. That's a guarantee - in contrast to the
guarantee of nothing given by CL=ANY.

CL=ANY guarantees that if a write succeeds, it is either written to at
least one of the replicas or a hint is written. This is not nothing. If you
do not like it, do not use it. But neither guarantees that you will
eventually be able to get your data back.

--
Gleb.

gleb-cloudius on 17 Feb 2021

On Wed, Feb 17, 2021 at 06:34:55AM -0800, Piotr Jastrzębski wrote:

So just to be clear @gleb-cloudius - you're absolutely against dropping any hints because they are another replica of data and in corner cases may save data from being lost?

I think you misunderstood my first comment. I only pointed out that
I do not see why CL=ANY should be treated differently, as the discussion
seems to be going this way.

I personally prefer not to drop hints voluntarily (they can disappear if a
node that is holding them dies, so expecting them to always be replayed
would be wishful thinking on my part), but if you are going to drop
them for the greater good (I do not see that the proposal mandates this)
you do not need to treat CL=ANY as special.

--
Gleb.

gleb-cloudius on 17 Feb 2021

On Wed, 17 Feb 2021 at 15:37, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 06:32:34AM -0800, Kamil Braun wrote:

On Wed, 17 Feb 2021 at 15:20, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 06:11:50AM -0800, Kamil Braun wrote:

On Wed, 17 Feb 2021 at 15:04, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 05:56:27AM -0800, Piotr Jastrzębski wrote:

I believe it is. When you write with CL=ONE you accept the risk. CL ANY
is special because it may not be present on any replica.

I do not understand that distinction. You accept the risk of using
whatever CL you chose to use. I do not see why you feel you need to try
harder for one but not the other.

CL=ANY doesn't guarantee the write will end up on any replica in the end,

Neither does CL=ONE.

CL=ONE guarantees that if the write is successful (the client gets a successful
ack), it ended up at one replica. That's a guarantee - in contrast to the
guarantee of nothing given by CL=ANY.

CL=ANY guarantees that if a write succeeds, it is either written to at
least one of the replicas or a hint is written. This is not nothing. If you
do not like it, do not use it. But neither guarantees that you will
eventually be able to get your data back.

And what does the fact that "hint is written" give?
A hint is something that can be discarded. On the other hand, data written
to replica cannot be discarded - unless external conditions discard it for
us (i.e. there is a serious failure of persistent storage). Hints are being
actively discarded by the system itself.
If you want to argue that hints cannot / shouldn't be discarded, then I
would agree with the point you're trying to make - but then we would
actually need to do some serious redesign around hints since currently (as
far as I understand) there are many scenarios where a hint can simply
disappear without ever being flushed to a replica.

--
Gleb.


kbr- on 17 Feb 2021

On Wed, Feb 17, 2021 at 06:47:40AM -0800, Kamil Braun wrote:

On Wed, 17 Feb 2021 at 15:37, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 06:32:34AM -0800, Kamil Braun wrote:

On Wed, 17 Feb 2021 at 15:20, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 06:11:50AM -0800, Kamil Braun wrote:

On Wed, 17 Feb 2021 at 15:04, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 05:56:27AM -0800, Piotr Jastrzębski wrote:

I believe it is. When you write with CL=ONE you accept the risk. CL ANY
is special because it may not be present on any replica.

I do not understand that distinction. You accept the risk of using
whatever CL you chose to use. I do not see why you feel you need to try
harder for one but not the other.

CL=ANY doesn't guarantee the write will end up on any replica in the end,

Neither does CL=ONE.

CL=ONE guarantees that if the write is successful (the client gets a successful
ack), it ended up at one replica. That's a guarantee - in contrast to the
guarantee of nothing given by CL=ANY.

CL=ANY guarantees that if a write succeeds, it is either written to at
least one of the replicas or a hint is written. This is not nothing. If you
do not like it, do not use it. But neither guarantees that you will
eventually be able to get your data back.

And what does the fact that "hint is written" give?

And what does the fact that one replica got the write give?

A hint is something that can be discarded. On the other hand, data written
to replica cannot be discarded - unless external conditions discard it for
us (i.e. there is a serious failure of persistent storage). Hints are being
actively discarded by the system itself.

Who told you so? A hint may not be written (but then CL=ANY will not
succeed), but if it is written it will not be discarded unless the
proposal here to discard them is implemented.

If you want to argue that hints cannot / shouldn't be discarded, then I
would agree with the point you're trying to make - but then we would
actually need to do some serious redesign around hints since currently (as
far as I understand) there are many scenarios where a hint can simply
disappear without ever being flushed to a replica.

What scenario is that except a node that holds it dies? Which is exactly
the same scenario where CL=ONE will lose data.

--
Gleb.

gleb-cloudius on 17 Feb 2021

On Wed, Feb 17, 2021 at 06:34:55AM -0800, Piotr Jastrzębski wrote: So just to be clear @gleb-cloudius - you're absolutely against dropping any hints because they are another replica of data and in corner cases may save data from being lost?
I think you misunderstood my first comment. I only pointed out that I do not see why CL=ANY should be treated differently, as the discussion seems to be going this way. I personally prefer not to drop hints voluntarily (they can disappear if a node that is holding them dies, so expecting them to always be replayed would be wishful thinking on my part), but if you are going to drop them for the greater good (I do not see that the proposal mandates this) you do not need to treat CL=ANY as special.
…
-- Gleb.

Fair enough. Let's just make sure we don't run repair and replay hints at the same time then.

haaawk on 17 Feb 2021

On Wed, 17 Feb 2021 at 15:53, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 06:47:40AM -0800, Kamil Braun wrote:

On Wed, 17 Feb 2021 at 15:37, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 06:32:34AM -0800, Kamil Braun wrote:

On Wed, 17 Feb 2021 at 15:20, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 06:11:50AM -0800, Kamil Braun wrote:

On Wed, 17 Feb 2021 at 15:04, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 05:56:27AM -0800, Piotr Jastrzębski wrote:

I believe it is. When you write with CL=ONE you accept the risk. CL ANY
is special because it may not be present on any replica.

I do not understand that distinction. You accept the risk of using
whatever CL you chose to use. I do not see why you feel you need to try
harder for one but not the other.

CL=ANY doesn't guarantee the write will end up on any replica in the end,

Neither does CL=ONE.

CL=ONE guarantees that if the write is successful (the client gets a successful
ack), it ended up at one replica. That's a guarantee - in contrast to the
guarantee of nothing given by CL=ANY.

CL=ANY guarantees that if a write succeeds, it is either written to at
least one of the replicas or a hint is written. This is not nothing. If you
do not like it, do not use it. But neither guarantees that you will
eventually be able to get your data back.

And what does the fact that "hint is written" give?

And what does the fact that one replica got the write give?

Then, as long as the replica does not fail, the data is available.

A hint is something that can be discarded. On the other hand, data written
to replica cannot be discarded - unless external conditions discard it for
us (i.e. there is a serious failure of persistent storage). Hints are being
actively discarded by the system itself.

Who told you so? A hint may not be written (but then CL=ANY will not
succeed), but if it is written it will not be discarded unless the
proposal here to discard them is implemented.

So hints are not ever discarded unless they are flushed?
If that is the case, then my argument started from the wrong assumptions.

If you want to argue that hints cannot / shouldn't be discarded, then I
would agree with the point you're trying to make - but then we would
actually need to do some serious redesign around hints since currently (as
far as I understand) there are many scenarios where a hint can simply
disappear without ever being flushed to a replica.

What scenario is that except a node that holds it dies? Which is exactly
the same scenario where CL=ONE will lose data.

I was convinced that a node can drop hints when its hint queue is full / it
is overloaded.
However, if that is not true...

Then it changes everything!
Can you confirm that the following is guaranteed:

If a CL=ANY write succeeds, then the write is persisted (on coordinator
as a hint or replica, doesn't matter); (also let's forget about commitlog
batch mode for the moment)
If a hint is persisted, then it is kept until it is persisted to at
least one replica?
Because in that case, hints DO give "strong" guarantees and CL=ANY writes
can actually be considered reliable, i.e. if nodes don't experience
failures, then eventually the write will appear.

--
Gleb.


kbr- on 17 Feb 2021

On Wed, Feb 17, 2021 at 07:01:37AM -0800, Kamil Braun wrote:

And what does the fact that "hint is written" give?
And what does the fact that one replica got the write gives?

Then, as long as the replica does not fail, the data is available.

But we are an HA database. A node may always fail.

A hint is something that can be discarded. On the other hand, data written
to replica cannot be discarded - unless external conditions discard it for
us (i.e. there is a serious failure of persistent storage). Hints are being
actively discarded by the system itself.

Who told you so? A hint may not be written (but then CL=ANY will not
succeed), but if it is written it will not be discarded unless the
proposal here to discard them is implemented.

So hints are not ever discarded unless they are flushed?

Unless a node holding it crashes. That was the case last time I looked at the code.

If that is the case, then my argument started from the wrong assumptions.

If you want to argue that hints cannot / shouldn't be discarded, then I
would agree with the point you're trying to make - but then we would
actually need to do some serious redesign around hints since currently (as
far as I understand) there are many scenarios where a hint can simply
disappear without ever being flushed to a replica.

What scenario is that except a node that holds it dies? Which is exactly
the same scenario where CL=ONE will lose data.

I was convinced that a node can drop hints when its hint queue is full / it
is overloaded.

It can! But then CL=ANY will know that a hint was not written and it
will not succeed. Basically CL=ANY treats a successfully written hint as
an ACK. If you disable hints, CL=ANY will behave exactly the same as CL=ONE.

However, if that is not true...

Then it changes everything!
Can you confirm that the following is guaranteed:

If a CL=ANY write succeeds, then the write is persisted (on coordinator
as a hint or replica, doesn't matter); (also let's forget about commitlog
batch mode for the moment)

Yes.

If a hint is persisted, then it is kept until it is persisted to at
least one replica?

This was certainly the case when I last looked. A hint can be discarded
only if a node crashes or a hints file is corrupted.

Because in that case, hints DO give "strong" guarantees and CL=ANY writes
can actually be considered reliable, i.e. if nodes don't experience
failures, then eventually the write will appear.

--
Gleb.

gleb-cloudius on 17 Feb 2021
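
The CL=ANY semantics described in the comment above ("a successfully written hint counts as an ACK") can be summed up in a few lines. This is a simplified sketch of the stated rule only; the function names and the single `replica_acks` counter are assumptions for illustration.

```python
def cl_one_write_succeeds(replica_acks):
    """CL=ONE: at least one replica must acknowledge the write."""
    return replica_acks >= 1


def cl_any_write_succeeds(replica_acks, hint_written):
    """CL=ANY: a replica ack or a successfully written hint is enough."""
    return replica_acks >= 1 or hint_written


# With hints disabled, hint_written is always False, so CL=ANY and CL=ONE
# accept exactly the same writes.
same_without_hints = all(
    cl_any_write_succeeds(acks, hint_written=False) == cl_one_write_succeeds(acks)
    for acks in range(4)
)
```

If the hint queue is full and the hint cannot be written, `hint_written` is False and the CL=ANY write fails unless some replica acknowledged it.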

On Wed, 17 Feb 2021 at 16:15, Gleb Natapov notifications@github.com wrote:

On Wed, Feb 17, 2021 at 07:01:37AM -0800, Kamil Braun wrote:

And what does the fact that "hint is written" give?

And what does the fact that one replica got the write give?

Then, as long as the replica does not fail, the data is available.

But we are an HA database. A node may always fail.

A hint is something that can be discarded. On the other hand, data written
to replica cannot be discarded - unless external conditions discard it for
us (i.e. there is a serious failure of persistent storage). Hints are being
actively discarded by the system itself.

Who told you so? A hint may not be written (but then CL=ANY will not
succeed), but if it is written it will not be discarded unless the
proposal here to discard them is implemented.

So hints are not ever discarded unless they are flushed?

Unless a node holding it crashes. That was the case last time I looked at
the code.

I hope by "crash" you mean "crash and burn" (irrecoverably) and not simply
restart.

If that is the case, then my argument started from the wrong assumptions.

If you want to argue that hints cannot / shouldn't be discarded, then I
would agree with the point you're trying to make - but then we would
actually need to do some serious redesign around hints since currently (as
far as I understand) there are many scenarios where a hint can simply
disappear without ever being flushed to a replica.

What scenario is that except a node that holds it dies? Which is exactly
the same scenario where CL=ONE will lose data.

I was convinced that a node can drop hints when its hint queue is full / it
is overloaded.

It can! But then CL=ANY will know that a hint was not written and it
will not succeed. Basically CL=ANY treats a successfully written hint as
an ACK. If you disable hints, CL=ANY will behave exactly the same as CL=ONE.

By "drop" I mean drop a hint after it was written.
So from what you're saying, it cannot drop a hint after it was written
(persisted).

In other words: a successful CL=ANY write guarantees the data is made durable
on at least one node.
More generally, for any CL, if the write is not successful, then a hint
kind of acts as an additional replica...
Well, assuming that the coordinator is not one of the replicas - because if
it is, then the hint doesn't really add anything in terms of durability.
The data is written twice to the same node.
But this could be improved by enforcing the hint to be sent to yet another
node if the coordinator is a replica (just an idea).

However, if that is not true...

Then it changes everything!
Can you confirm that the following is guaranteed:

If a CL=ANY write succeeds, then the write is persisted (on coordinator
as a hint or replica, doesn't matter); (also let's forget about commitlog
batch mode for the moment)

Yes.

If a hint is persisted, then it is kept until it is persisted to at
least one replica?

This was certainly the case when I last looked. A hint can be discarded
only if a node crashes or a hints file is corrupted.

Because in that case, hints DO give "strong" guarantees and CL=ANY writes
can actually be considered reliable, i.e. if nodes don't experience
failures, then eventually the write will appear.

--
Gleb.


kbr- on 17 Feb 2021

On Wed, Feb 17, 2021 at 07:27:33AM -0800, Kamil Braun wrote:

śr., 17 lut 2021 o 16:15 Gleb Natapov notifications@github.com napisał(a):

On Wed, Feb 17, 2021 at 07:01:37AM -0800, Kamil Braun wrote:

And what does the fact that "hint is written" give?
And what does the fact that one replica got the write gives?

Then, as long as the replica does not fail, the data is available.

But we are HA database. A node may always fail.

>

A hint is something that can be discarded. On the other hand, data written to a replica cannot be discarded - unless external conditions discard it for us (i.e. there is a serious failure of persistent storage). Hints are being actively discarded by the system itself.

Who told you so? A hint may not be written (but then CL=ANY will not succeed), but if it is written it will not be discarded unless the proposal here to discard them is implemented.

So hints are not ever discarded unless they are flushed?

Unless a node holding it crashes. That was the case last time I looked at the code.

I hope by "crash" you mean "crash and burn" (irrecoverably) and not simply
restart.

Yes. But even simple restart can lose 10 seconds of writes.

If that is the case, then my argument started from the wrong assumptions.

If you want to argue that hints cannot / shouldn't be discarded, then I would agree with the point you're trying to make - but then we would actually need to do some serious redesign around hints since currently (as far as I understand) there are many scenarios where a hint can simply disappear without ever being flushed to a replica.

What scenario is that except a node that holds it dies? Which is exactly the same scenario where CL=ONE will lose data.

I was convinced that a node can drop hints when its hint queue is full / it is overloaded.

It can! But then CL=ANY will know that a hint was not written and it will not succeed. Basically CL=ANY treats a successfully written hint as an ACK. If you disable hints, CL=ANY will behave exactly the same as CL=ONE.
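
A minimal sketch of the rule described above, for illustration only. It is a toy model and not Scylla's actual code: the names `WriteOutcome` and `write_succeeds` are hypothetical, and only CL=ANY and CL=ONE are modelled.

```python
from dataclasses import dataclass


@dataclass
class WriteOutcome:
    """Hypothetical summary of what happened to a single write."""
    replica_acks: int     # number of replicas that acknowledged the write
    hint_persisted: bool  # True if the coordinator managed to store a hint


def write_succeeds(cl: str, outcome: WriteOutcome, hints_enabled: bool = True) -> bool:
    """Decide whether a write is reported as successful to the client."""
    if cl == "ONE":
        return outcome.replica_acks >= 1
    if cl == "ANY":
        # A persisted hint counts as an ACK, but only if hints are enabled.
        # If the hint could not be written (e.g. the hint queue is full),
        # hint_persisted is False and it contributes nothing.
        hint_ack = hints_enabled and outcome.hint_persisted
        return outcome.replica_acks >= 1 or hint_ack
    raise ValueError(f"consistency level not modelled in this sketch: {cl}")
```

In this model, a CL=ANY write with no replica ACKs succeeds only when the hint was actually persisted, and with `hints_enabled=False` the result for CL=ANY always matches CL=ONE.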

By "drop" I mean drop a hint after it was written.

Why would already written hints be dropped if the hint queue is full, instead of new incoming ones being denied?

So from what you're saying, it cannot drop a hint after it was written
(persisted).

In other words: successful CL=ANY write guarantees the data is made durable
on at least one node.
More generally, for any CL, if the write is not successful, then a hint
kind of acts as an additional replica...
Well, assuming that the coordinator is not one of the replicas - because if
it is, then the hint doesn't really
add anything in terms of durability. The data is written twice to the same
node.
But this could be improved by enforcing the hint to be sent to yet another
node if the coordinator is a replica (just an idea).

I will tell you a secret: this is what batch log does (sending to two
other nodes in fact).

--
Gleb.

gleb-cloudius on 17 Feb 2021

(As a side note: GH is a TERRIBLE discussion tool to my taste!)

I agree with @gleb-cloudius: CL=ANY should not be treated any differently than any other CL.
Nor should any hints be dropped, for at least one simple reason: if HH was enabled (or not disabled) by the user, he/she MEANS for hints to be sent.

In particular, when you think about the implementation, we would have to indicate in the corresponding metrics that we have dropped those hints. The user may then have logic at the app level that looks at those metrics and acts accordingly. So it will be very confusing (if not to say alarming) to see a s...t load of hints suddenly dropped.

It's just easier not to drop anything at all.

There seems to be an assumption across the board here that hints can always be safely discarded - but what if somebody uses hints the way we use them for MVs? Namely, implements something that is "strictly" reliable (I put it in quotes since @gleb-cloudius will jump on it right away otherwise ;)) by utilizing metrics?

In this case there may be a lot of corner cases where we'd rather keep things very simple.

As to @haaawk's suggestion not to replay hints while repairing - why is it even a problem? We have all kinds of schedulers (sorry, "controllers") in place that should make them run nicely in parallel.

So, again, seems to be a non-issue to me.

vladzcloudius on 17 Feb 2021

As to @haaawk suggestion not to replay hints while repairing - why is it even a problem? We have all kinds of schedulers (sorry, "controllers") at place that should make them run nicely in parallel.

So, again, seems to be a non-issue to me.

We've got issues that cluster performance drops when hints are replayed and repair/streaming runs at the same time. Apparently they don't run nicely in parallel.

haaawk on 18 Feb 2021

On Wed, Feb 17, 2021 at 07:24:03PM -0800, Piotr Jastrzębski wrote:

As to @haaawk suggestion not to replay hints while repairing - why is it even a problem? We have all kinds of schedulers (sorry, "controllers") at place that should make them run nicely in parallel.

So, again, seems to be a non-issue to me.

We've got issues that cluster performance drops when hints are replayed and repair/streaming runs at the same time. Apparently they don't run nicely in parallel.

Both hint replay and repair/streaming are running in the streaming
scheduling class. They should compete with one another, but not with the main
workload. We need to investigate why this is happening, not try to
hide it.

--
Gleb.

gleb-cloudius on 18 Feb 2021

I agree, but then we shouldn't mark those issues as hinted handoff issues but as scheduling issues, and someone familiar with the scheduler is better suited to investigate them.

haaawk on 18 Feb 2021

I can't agree more.

vladzcloudius on 18 Feb 2021

The reason to replay hints before repair is to reduce the amount of data that needs to be repaired.

So at least in my view we should replay hints before repair.
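
To make the ordering concrete, here is a rough sketch of "replay hints, then repair" using the sync-point idea from the issue description: mark the current end of the hint log and wait only for hints up to that mark. This is a toy, single-threaded model and not Scylla's implementation. `HintQueue`, `sync_point`, `replay_then_repair` and the `send` and `repair` callbacks are all hypothetical names.

```python
from collections import deque


class HintQueue:
    """Toy model of one node's hint log towards a single destination."""

    def __init__(self):
        self._pending = deque()
        self._next_pos = 0    # position given to the next appended hint
        self._sent_up_to = 0  # every hint with position < this was sent

    def append(self, mutation):
        self._pending.append((self._next_pos, mutation))
        self._next_pos += 1

    def sync_point(self):
        """Mark the current end of the log; later hints fall beyond it."""
        return self._next_pos

    def replay_one(self, send):
        """Send the oldest pending hint, if there is one."""
        if not self._pending:
            return False
        pos, mutation = self._pending.popleft()
        send(mutation)
        self._sent_up_to = pos + 1
        return True

    def reached(self, point):
        return self._sent_up_to >= point

    def pending_count(self):
        return len(self._pending)


def replay_then_repair(queue, send, repair):
    """Replay hints up to a sync point, then start repair."""
    point = queue.sync_point()
    # Hints appended after `point` do not delay repair, so this loop
    # terminates even if new hints keep arriving during replay.
    while not queue.reached(point):
        queue.replay_one(send)
    repair()
```

Because the wait is bounded by the sync point rather than by an empty queue, hints generated during the replay are simply left for the normal hint sender and do not postpone the repair.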