This issue is meant to serve as a continuation of the discussion under issue #4712, starting from here. I have summarized the points made in that discussion and present them here.
Both hinted handoff and repair are anti-entropy mechanisms. They work independently of each other, which may sometimes cause us to do more work than necessary. For example, if node B is repaired, then most hints destined for B become obsolete, as they would try to write data that was already fixed by repair.
One notable exception is CL=ANY writes: they do not need to be written to any replica to be considered successful; it is sufficient to write a hint for them. Dropping those hints would cause us to lose those writes. Although CL=ANY is pretty unreliable, repair has not caused us to lose such writes so far, so we probably shouldn't start doing so now.
Keeping the above in mind, a better idea than dropping hints is to wait until hints are sent out before starting repair. This may speed up repair because, in some cases, hints themselves may be able to repair the data perfectly, or at least a considerable amount of it.
While we wait for hints to be sent out, more of them can be generated
If we waited until all hints are sent out and no hints are left in each participating hints queue, we could get stuck because new hints can be generated, even if the destination node is available. We can protect from that by marking a point at the end of the hints log and waiting until all hints up to that point are sent.
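The "mark a point at the end of the hints log" idea can be illustrated with a small sketch. This is only a toy model under stated assumptions, not Scylla's hints manager: `HintQueue`, `mark_sync_point`, `send_one` and `reached` are hypothetical names introduced here for illustration.

```python
from collections import deque


class HintQueue:
    """Toy model of one node's hint log for a single destination (hypothetical)."""

    def __init__(self):
        self._pending = deque()  # (position, mutation) pairs, oldest first
        self._next_pos = 0       # position the next appended hint will get
        self._sent_upto = 0      # all hints with position < _sent_upto were sent

    def append(self, mutation):
        self._pending.append((self._next_pos, mutation))
        self._next_pos += 1

    def mark_sync_point(self):
        # Remember the current end of the log; later hints fall after it.
        return self._next_pos

    def send_one(self):
        # Send the oldest pending hint, if any (the actual send is elided).
        if self._pending:
            pos, _mutation = self._pending.popleft()
            self._sent_upto = pos + 1

    def reached(self, sync_point):
        return self._sent_upto >= sync_point

    def pending_count(self):
        return len(self._pending)


queue = HintQueue()
for mutation in ("a", "b", "c"):
    queue.append(mutation)

sync_point = queue.mark_sync_point()
sent = 0
while not queue.reached(sync_point):
    queue.send_one()
    sent += 1
    queue.append("new")  # hints keep arriving while we wait
```

The loop terminates after the three hints that existed when the sync point was taken, even though the queue never becomes empty.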
Some hints in the queue A->B might not be owned by B anymore
In normal mode, such "orphaned" hints are sent with CL=ALL to all current replicas. However, some or all of the new replicas can be DOWN, which means that we won't be able to send them. This is problematic considering the hints created by CL=ANY writes - we can't drop them. A safe solution would be to write them with a lower consistency level, or to push the hint back to the end of the queue.
It's not immediately obvious that this will be an improvement. It's possible to imagine cases in which replaying hints before repair could either prolong or speed up the whole operation. The perfect scenario happens when stored hints themselves are able to repair the cluster completely, and there are not too many overwrites of the same key. In such a case, the data will be fully consistent and repair won't have anything to do.
However, in other scenarios it could happen that:
We should create a POC and evaluate its performance on different test cases. Depending on the results, we might also consider adding an on/off flag to the nodetool repair command and other node operations which use repair, or even drop this idea altogether if it performs poorly.
cc: @vladzcloudius @gleb-cloudius @asias @avikivity @slivne @kbr- @haaawk could you comment? Do you see any issues that should be resolved before moving on to implementing POC?
Do you see any issues that should be resolved before moving on to implementing POC?
yes, we should remove CL=ANY, it should be forever forgotten
The idea of replaying all hints before the repair makes a lot of sense.
Note that all the cases you mentioned in which hints can't be sent would make repair impossible as well.
The only problem here is: how long it's going to take to replay those hints - we should limit this by some reasonable time limit and if HH replay doesn't complete - proceed to the repair regardless.
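A minimal sketch of such a bounded wait, assuming a hypothetical `hints_replayed()` predicate and `run_repair()` callback (neither is an existing Scylla or nodetool API):

```python
import time


def wait_for_hints(hints_replayed, timeout_s, poll_interval_s=0.01):
    """Poll until hints_replayed() is true or the time limit expires.

    Returns True if replay finished in time, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if hints_replayed():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


def repair_after_hints(hints_replayed, run_repair, timeout_s):
    """Wait for hint replay up to timeout_s, then run repair regardless."""
    finished = wait_for_hints(hints_replayed, timeout_s)
    run_repair()
    return finished
```

The return value only reports whether replay completed in time; repair is started either way, as suggested above.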
I have the following thoughts:
It seems to be important not to replay hints and run repair at the same time. It's ok to replay hints first and then do repair, or do the repair first and then replay hints, or even to replay part of the hints first, then do the repair and then replay the remaining hints. To make sure that the cluster performance is not affected too much, though, we should make sure that those two things don't run in parallel.
Whether it's worth replaying hints before repair or not depends on the use case. If there are many overwrites/updates in the use case then it might be much cheaper to just do a repair.
We can't just drop hints because of CL ANY.
Thus, I would propose the following idea for consideration:
My thinking is that we should normally have no CL ANY hints. Our MVs don't create CL ANY hints, so the user has to explicitly use CL ANY to get CL ANY hints in their cluster.
What do you think? @piodul ? @vladzcloudius ? @slivne ?
On second thought, we can replay CL ANY hints after the repair or in parallel with it because the expected number of such hints is small.
I do not see why CL=ANY is any (pun intended) special. What if a hint
was written for CL=ONE but the one node that got the write died? Is it
OK to drop such a hint?
--
Gleb.
I believe it is. When you write with CL=ONE you accept the risk. CL ANY is special because it may not be present on any replica.
@gleb-cloudius In general it's ok to drop hints because they are just optimization to reduce the amount of work repair has to do. CL ANY is special because that's something repair can't fix.
On Wed, Feb 17, 2021 at 05:56:27AM -0800, Piotr Jastrzębski wrote:
I believe it is. When you write with CL=ONE you accept the risk. CL ANY is special because it may not be present on any replica.
I do not understand that distinction. You accept the risk of using
whatever CL you chose to use. I do not see why you feel you need to try
harder for one but not the other.
--
Gleb.
On Wed, Feb 17, 2021 at 05:58:16AM -0800, Piotr Jastrzębski wrote:
@gleb-cloudius In general it's ok to drop hints because they are just optimization to reduce the amount of work repair has to do. CL ANY is special because that's something repair can't fix.
The repair cannot fix CL=ONE after the only node that got the write crashed
as well.
--
Gleb.
On Wed, 17 Feb 2021 at 15:04, Gleb Natapov notifications@github.com wrote:
I do not understand that distinction. You accept the risk of using whatever CL you chose to use. I do not see why you feel you need to try harder for one but not the other.
CL=ANY doesn't guarantee the write will end up on any replica in the end, so here's an idea: to handle a CL=ANY request, simply return an ack to the client without doing anything else. Then the performance of these CL=ANY requests will be fantastic and everyone will be using CL=ANY. No data will appear in the cluster from these writes but well, that's what CL=ANY guarantees (i.e. does not guarantee) after all.
But seriously, we should get rid of CL=ANY altogether.
On Wed, Feb 17, 2021 at 05:58:16AM -0800, Piotr Jastrzębski wrote: @gleb-cloudius In general it's ok to drop hints because they are just optimization to reduce the amount of work repair has to do. CL ANY is special because that's something repair can't fix.
The repair cannot fix CL=ONE after the only node that got the write crashed as well.
…
-- Gleb.
There's a difference in not being able to fix something when one node in a cluster is down vs not being able to fix something always (no matter what), don't you think @gleb-cloudius ?
On Wed, Feb 17, 2021 at 06:11:50AM -0800, Kamil Braun wrote:
CL=ANY doesn't guarantee the write will end up on any replica in the end,
Neither does CL=ONE.
so here's an idea: to handle a CL=ANY request, simply return an ack to the client without doing anything else. Then the performance of these CL=ANY requests will be fantastic and everyone will be using CL=ANY. No data will appear in the cluster from these writes but well, that's what CL=ANY guarantees (i.e. does not guarantee) after all.
Ah, the famous old "Mongo DB Is Web Scale" joke.
But seriously, we should get rid of CL=ANY altogether.
For some SQL folks the whole NoSQL thing is a big CL=ANY, but others find it useful for certain types of data. The same is true for different levels of CL in Cassandra - some people may be happy with the guarantees CL=ANY provides.
--
Gleb.
On Wed, Feb 17, 2021 at 06:12:57AM -0800, Piotr Jastrzębski wrote:
There's a difference in not being able to fix something when one node in a cluster is down vs not being able to fix something always (no matter what), don't you think @gleb-cloudius ?
That is not a matter of "not being able to fix something when one node in
a cluster is down". The node is gone. You will not see it again. The
data is gone with it as well. You will not see it again either. Unless
you replay hints, which is exactly the same situation as with CL=ANY, and
for that reason I do not see any difference.
--
Gleb.
On Wed, 17 Feb 2021 at 15:20, Gleb Natapov notifications@github.com wrote:
CL=ANY doesn't guarantee the write will end up on any replica in the end,
Neither does CL=ONE.
CL=ONE guarantees that if the write is successful (the client gets a successful ack), it ended up on one replica. That's a guarantee - in contrast to the guarantee of nothing given by CL=ANY.
Ok @gleb-cloudius I get your perspective.
I think of hints as optimization and your perspective adds also safety which is fair.
Whether one should use hints to decrease the chances of a data loss instead of increasing CL from ONE to QUORUM is debatable but it is fair to consider this case. It seems not very useful though because it gives you only a very short window to recover the data from hints.
So just to be clear @gleb-cloudius - you're absolutely against dropping any hints because they are another replica of data and in corner cases may save data from being lost?
On Wed, Feb 17, 2021 at 06:32:34AM -0800, Kamil Braun wrote:
CL=ONE guarantees that if the write is successful (the client gets a successful ack), it ended up on one replica. That's a guarantee - in contrast to the guarantee of nothing given by CL=ANY.
CL=ANY guarantees that if a write succeeds, it is either written to at least one of the replicas or a hint is written. This is not nothing. If you do not like it, do not use it. But neither guarantees that you will eventually be able to get your data back.
--
Gleb.
On Wed, Feb 17, 2021 at 06:34:55AM -0800, Piotr Jastrzębski wrote:
So just to be clear @gleb-cloudius - you're absolutely against dropping any hints because they are another replica of data and in corner cases may save data from being lost?
I think you misunderstood my first comment. I only pointed out that
I do not see why CL=ANY should be treated differently, as the discussion
seems to be going this way.
I personally prefer not to drop hints voluntarily (they can disappear if a
node that is holding them dies, so expecting them to always be replayed
would be wishful thinking on my part), but if you are going to drop
them for the greater good (I do not see that the proposal mandates this)
you do not need to treat CL=ANY as special.
--
Gleb.
On Wed, 17 Feb 2021 at 15:37, Gleb Natapov notifications@github.com wrote:
CL=ANY guarantees that if a write succeeds, it is either written to at least one of the replicas or a hint is written. This is not nothing. If you do not like it, do not use it. But neither guarantees that you will eventually be able to get your data back.
And what does the fact that "hint is written" give?
A hint is something that can be discarded. On the other hand, data written to a replica cannot be discarded - unless external conditions discard it for us (i.e. there is a serious failure of persistent storage). Hints are being actively discarded by the system itself.
If you want to argue that hints cannot / shouldn't be discarded, then I would agree with the point you're trying to make - but then we would actually need to do some serious redesign around hints since currently (as far as I understand) there are many scenarios where a hint can simply disappear without ever being flushed to a replica.
On Wed, Feb 17, 2021 at 06:47:40AM -0800, Kamil Braun wrote:
And what does the fact that "hint is written" give?
And what does the fact that one replica got the write give?
A hint is something that can be discarded. On the other hand, data written to a replica cannot be discarded - unless external conditions discard it for us (i.e. there is a serious failure of persistent storage). Hints are being actively discarded by the system itself.
Who told you so? A hint may not be written (but then CL=ANY will not succeed), but if it is written it will not be discarded unless the proposal here to discard them is implemented.
If you want to argue that hints cannot / shouldn't be discarded, then I would agree with the point you're trying to make - but then we would actually need to do some serious redesign around hints since currently (as far as I understand) there are many scenarios where a hint can simply disappear without ever being flushed to a replica.
What scenario is that, except when the node that holds it dies? Which is exactly the same scenario where CL=ONE will lose data.
--
Gleb.
On Wed, Feb 17, 2021 at 06:34:55AM -0800, Piotr Jastrzębski wrote: So just to be clear @gleb-cloudius - you're absolutely against dropping any hints because they are another replica of data and in corner cases may save data from being lost?
I think you misunderstood my first comment. I only pointed out that I do not see why CL=ANY should be treated differently, as the discussion seems to be going this way. I personally prefer not to drop hints voluntarily (they can disappear if a node that is holding them dies, so expecting them to always be replayed would be wishful thinking on my part), but if you are going to drop them for the greater good (I do not see that the proposal mandates this) you do not need to treat CL=ANY as special.
…
-- Gleb.
Fair enough. Let's just make sure we don't run repair and replay hints at the same time then.
On Wed, 17 Feb 2021 at 15:53, Gleb Natapov notifications@github.com wrote:
And what does the fact that one replica got the write give?
Then, as long as the replica does not fail, the data is available.
Who told you so? A hint may not be written (but then CL=ANY will not succeed), but if it is written it will not be discarded unless the proposal here to discard them is implemented.
So hints are never discarded unless they are flushed?
If that is the case, then my argument started from the wrong assumptions.
What scenario is that, except when the node that holds it dies? Which is exactly the same scenario where CL=ONE will lose data.
I was convinced that a node can drop hints when its hint queue is full / it is overloaded.
However, if that is not true...
Then it changes everything!
Can you confirm that the following is guaranteed:
- If a CL=ANY write succeeds, then the write is persisted (on coordinator as a hint or replica, doesn't matter); (also let's forget about commitlog batch mode for the moment)
- If a hint is persisted, then it is kept until it is persisted to at least one replica?
Because in that case, hints DO give "strong" guarantees and CL=ANY writes can actually be considered reliable, i.e. if nodes don't experience failures, then eventually the write will appear.
On Wed, Feb 17, 2021 at 07:01:37AM -0800, Kamil Braun wrote:
Then, as long as the replica does not fail, the data is available.
But we are an HA database. A node may always fail.
So hints are never discarded unless they are flushed?
Unless a node holding it crashes. That was the case last time I looked at the code.
If that is the case, then my argument started from the wrong assumptions.
I was convinced that a node can drop hints when its hint queue is full / it is overloaded.
It can! But then CL=ANY will know that a hint was not written and it will not succeed. Basically CL=ANY treats a successfully written hint as an ACK. If you disable hints, CL=ANY will behave exactly the same as CL=ONE.
However, if that is not true...
Then it changes everything!
Can you confirm that the following is guaranteed:
- If a CL=ANY write succeeds, then the write is persisted (on coordinator as a hint or replica, doesn't matter); (also let's forget about commitlog batch mode for the moment)
Yes.
- If a hint is persisted, then it is kept until it is persisted to at least one replica?
This was certainly the case when I last looked. A hint can be discarded only if a node crashes or a hints file is corrupted.
Because in that case, hints DO give "strong" guarantees and CL=ANY writes can actually be considered reliable, i.e. if nodes don't experience failures, then eventually the write will appear.
--
Gleb.
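The CL=ANY acknowledgement rule described in the message above can be sketched as a toy model. This is an illustration under stated assumptions, not Scylla's write path: `write_any`, `HintStoreFull` and the callable-based replica interface are hypothetical names made up for this example.

```python
class HintStoreFull(Exception):
    """Raised by the (hypothetical) hint store when it rejects a new hint."""


def write_any(mutation, replicas, store_hint, hints_enabled=True):
    """Return True if a CL=ANY write would be acknowledged in this toy model.

    replicas:   callables taking the mutation; True means the replica stored it.
    store_hint: callable persisting a hint; raises HintStoreFull when rejected.
    """
    acked = False
    for send in replicas:
        if send(mutation):
            acked = True  # at least one replica has the write
        elif hints_enabled:
            try:
                store_hint(mutation)
                acked = True  # a persisted hint counts as an ACK
            except HintStoreFull:
                pass  # a rejected hint never acknowledges the write
    return acked


def replica_down(_mutation):
    return False


def replica_up(_mutation):
    return True


def full_hint_store(_mutation):
    raise HintStoreFull()


stored_hints = []
```

With every replica down, the write succeeds only if the hint is actually persisted; with hints disabled, the outcome depends solely on a replica accepting the write, which matches the "same as CL=ONE" remark.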
On Wed, 17 Feb 2021 at 16:15, Gleb Natapov notifications@github.com wrote:
Unless a node holding it crashes. That was the case last time I looked at the code.
I hope by "crash" you mean "crash and burn" (irrecoverably) and not simply restart.
It can! But then CL=ANY will know that a hint was not written and it will not succeed. Basically CL=ANY treats a successfully written hint as an ACK. If you disable hints, CL=ANY will behave exactly the same as CL=ONE.
By "drop" I mean drop a hint after it was written.
So from what you're saying, it cannot drop a hint after it was written (persisted).
In other words: a successful CL=ANY write guarantees the data is made durable on at least one node.
More generally, for any CL, if the write is not successful, then a hint kind of acts as an additional replica...
Well, assuming that the coordinator is not one of the replicas - because if it is, then the hint doesn't really add anything in terms of durability. The data is written twice to the same node.
But this could be improved by enforcing the hint to be sent to yet another node if the coordinator is a replica (just an idea).
On Wed, Feb 17, 2021 at 07:27:33AM -0800, Kamil Braun wrote:
I hope by "crash" you mean "crash and burn" (irrecoverably) and not simply restart.
Yes. But even a simple restart can lose 10 seconds of writes.
By "drop" I mean drop a hint after it was written.
Why would already written hints be dropped when the hint queue is full, instead of new incoming ones being denied?
So from what you're saying, it cannot drop a hint after it was written (persisted).
In other words: a successful CL=ANY write guarantees the data is made durable on at least one node.
More generally, for any CL, if the write is not successful, then a hint kind of acts as an additional replica...
Well, assuming that the coordinator is not one of the replicas - because if it is, then the hint doesn't really add anything in terms of durability. The data is written twice to the same node.
But this could be improved by enforcing the hint to be sent to yet another node if the coordinator is a replica (just an idea).
I will tell you a secret: this is what the batch log does (sending to two other nodes, in fact).
--
Gleb.
(As a side note: GH is a TERRIBLE discussion tool to my taste!)
I agree with @gleb-cloudius CL=ANY should not be treated any differently than any other CL.
Nor should any hints be dropped, for at least one simple reason: if HH was enabled (or not disabled) by the user, he/she MEANS for hints to be sent.
In particular, when you think about the implementation, we would have to indicate in the corresponding metrics that we have dropped those hints. And then the user may have logic at the app level that looks at those metrics and acts accordingly. So, it will be very confusing (if not to say alarming) to see a s...t load of hints suddenly dropped.
It's just easier not to drop anything at all.
There seems to be some assumption across the board here that hints are always something that can be always safely discarded - but what if somebody uses hints like we use them for MVs? Namely implements something that is "strictly" reliable (I quoted since @gleb-cloudius will jump on it right away otherwise ;)) by utilizing metrics?
In this case there may be a lot of corner cases where we'd rather keep things very simple.
As to @haaawk suggestion not to replay hints while repairing - why is it even a problem? We have all kinds of schedulers (sorry, "controllers") at place that should make them run nicely in parallel.
So, again, seems to be a non-issue to me.
As to @haaawk suggestion not to replay hints while repairing - why is it even a problem? We have all kinds of schedulers (sorry, "controllers") at place that should make them run nicely in parallel.
So, again, seems to be a non-issue to me.
We've got issues that cluster performance drops when hints are replayed and repair/streaming runs at the same time. Apparently they don't run nicely in parallel.
On Wed, Feb 17, 2021 at 07:24:03PM -0800, Piotr Jastrzębski wrote:
As to @haaawk suggestion not to replay hints while repairing - why is it even a problem? We have all kinds of schedulers (sorry, "controllers") at place that should make them run nicely in parallel.
So, again, seems to be a non-issue to me.
We've got issues that cluster performance drops when hints are replayed and repair/streaming runs at the same time. Apparently they don't run nicely in parallel.
Both hint replay and repair/streaming run in the streaming
scheduling class. They should compete with one another, but not with the main
workload. We need to investigate why this is happening, not try to
hide it.
--
Gleb.
On Wed, Feb 17, 2021 at 07:24:03PM -0800, Piotr Jastrzębski wrote:
We've got issues that cluster performance drops when hints are replayed and repair/streaming runs at the same time. Apparently they don't run nicely in parallel.
Both hint replay and repair/streaming run in the streaming scheduling class. They should compete with one another, but not with the main workload. We need to investigate why this is happening, not try to hide it.
…
-- Gleb.
I agree, but then we shouldn't mark those issues as hinted handoff issues but as scheduling issues, and someone familiar with the scheduler is better suited to investigate them.
I agree, but then we shouldn't mark those issues as hinted handoff issues but as scheduling issues, and someone familiar with the scheduler is better suited to investigate them.
I can't agree more.
The reason to replay hints before repair is to reduce the amount of data that needs to be repaired.
So at least in my view we should replay hints before repair.