Commit: current master (2ad09d0bf83c5bd47b93efd3021858d69a00ab6d)
cqlsh> create keyspace ks with replication = {'class': 'EverywhereStrategy'};
cqlsh> create table ks.t (pk int primary key);
Result (on the bootstrapping node):
INFO 2021-04-22 11:14:28,917 [shard 2] repair - Repair 30 out of 513 ranges, id=[id=3, uuid=c1c82429-eba5-4bbe-8421-9346132456e2], shard=2, keyspace=ks, table={t}, range=(7764568581638937715, 7776790348170289012]
scylla: ./seastar/include/seastar/core/gate.hh:101: future<> seastar::gate::close(): Assertion `!_stopped && "seastar::gate::close() cannot be called more than once"' failed.
Aborting on shard 0.
Backtrace:
0x24c6ffb
0x24c6fbc
0x2492c7d
0x24b496c
0x24b49ea
0x24b49ba
0x24b4985
0x7fcb4144ea8f
/lib64/libc.so.6+0x3c9e4
/lib64/libc.so.6+0x25894
/lib64/libc.so.6+0x25768
/lib64/libc.so.6+0x34e75
0xf36a61
0x1e0c252
0x1e0bf31
0x1e0bddd
0x1e0bae6
0x1dcc587
0x1dcb643
0x244ed26
0x2548656
decoded:
seastar::gate::close() at main.cc:?
repair_meta::stop() at row_level.cc:?
repair_meta::repair_row_level_stop(gms::inet_address, seastar::basic_sstring<char, unsigned int, 15u, true>, seastar::basic_sstring<char, unsigned int, 15u, true>, nonwrapping_interval<dht::token>) at row_level.cc:?
row_level_repair::run()::{lambda()#1}::operator()() const::{lambda(gms::inet_address const&)#1}::operator()(gms::inet_address const) const at row_level.cc:?
seastar::future<void> seastar::parallel_for_each<__gnu_cxx::__normal_iterator<gms::inet_address*, std::vector<gms::inet_address, std::allocator<gms::inet_address> > >, row_level_repair::run()::{lambda()#1}::operator()() const::{lambda(gms::inet_address const&)#1}>(__gnu_cxx::__normal_iterator<gms::inet_address*, std::vector<gms::inet_address, std::allocator<gms::inet_address> > >, seastar::future, row_level_repair::run()::{lambda()#1}::operator()() const::{lambda(gms::inet_address const&)#1}&&) at row_level.cc:?
row_level_repair::run()::{lambda()#1}::operator()() const at row_level.cc:?
seastar::async<row_level_repair::run()::{lambda()#1}>(seastar::thread_attributes, std::decay&&, (std::decay<row_level_repair::run()::{lambda()#1}>::type&&)...)::{lambda()#1}::operator()() const at row_level.cc:?
seastar::noncopyable_function<void ()>::operator()() const at future.cc:?
seastar::thread_context::main() at thread.cc:?
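The decoded trace shows repair_row_level_stop() reaching repair_meta::stop(), which closes a seastar::gate a second time. The toy C++ sketch below models one possible way to get there. It assumes that the bootstrapping node's duplicated entry in master.all_nodes() (questioned later in this thread) makes row_level_repair::run() stop the same repair_meta twice. toy_gate, toy_repair_meta and stop_all_reports_double_close are hypothetical names, not Scylla or Seastar code, and the toy gate throws where the real one asserts.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for seastar::gate. The real gate asserts (aborting the
// process) on a second close(); this toy throws so the effect can be observed.
struct toy_gate {
    bool stopped = false;
    void close() {
        if (stopped) {
            throw std::logic_error("gate::close() cannot be called more than once");
        }
        stopped = true;
    }
};

// Hypothetical stand-in for repair_meta: stop() closes its gate.
struct toy_repair_meta {
    toy_gate gate;
    void stop() { gate.close(); }
};

// Mimics the shape of row_level_repair::run(): for every entry in all_nodes,
// stop the repair_meta registered for that node. The real code uses
// parallel_for_each; a sequential loop is enough to show the effect.
// Returns true if some repair_meta was stopped more than once.
bool stop_all_reports_double_close(const std::vector<std::string>& all_nodes) {
    std::map<std::string, toy_repair_meta> metas;
    for (const auto& node : all_nodes) {
        metas.try_emplace(node);  // one repair_meta per distinct node
    }
    try {
        for (const auto& node : all_nodes) {
            metas.at(node).stop();
        }
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

For example, stop_all_reports_double_close({"127.0.0.1", "127.0.0.2", "127.0.0.1"}) returns true, while the same list without the duplicate returns false.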
@slivne @asias Could someone look at this? It is blocking the CDC fix.
@haaawk The issue is gone with PR https://github.com/scylladb/scylla/pull/8536. You can continue testing CDC + repair-based node operations on top of it. However, I see CDC fail with the PR:
INFO 2021-04-22 15:42:29,450 [shard 0] cdc - Inserting new generation data at UUID f5bf81d0-94a1-402d-aa65-ae2f86eb8883
INFO 2021-04-22 15:42:29,456 [shard 0] init - Shutting down storage service notifications
INFO 2021-04-22 15:42:29,456 [shard 0] init - Shutting down storage service notifications was successful
INFO 2021-04-22 15:42:29,456 [shard 0] init - Shutting down system distributed keyspace
INFO 2021-04-22 15:42:29,456 [shard 0] init - Shutting down system distributed keyspace was successful
INFO 2021-04-22 15:42:29,456 [shard 0] init - Shutting down gossiping
...
INFO 2021-04-22 15:42:29,669 [shard 0] init - Shutting down sighup
INFO 2021-04-22 15:42:29,669 [shard 0] init - Shutting down sighup was successful
ERROR 2021-04-22 15:42:29,669 [shard 0] init - Startup failed: exceptions::unavailable_exception (Cannot achieve consistency level for cl ALL. Requires 2, alive 0)
@asias
Is it OK that the bootstrapping node's own address appears in master.all_nodes() in row_level_repair::run?
Is it OK that it appears there twice?
Is it OK for the node's own IP address to be present in old_endpoints_in_local_dc in bootstrap_with_repair?
And from there it gets into neighbors in the everywhere_topology case.
@asias
With #8536 Everywhere tables stop working completely (RBO and without RBO).
> With #8536 Everywhere tables stop working completely (RBO and without RBO).
See https://github.com/scylladb/scylla/issues/8533.
I think #8536 exposed more problems with the Everywhere topology.
> Is it OK that the bootstrapping node's own address appears in master.all_nodes() in row_level_repair::run?
Yes.
> Is it OK that it appears there twice?
No.
> Is it OK for the node's own IP address to be present in old_endpoints_in_local_dc in bootstrap_with_repair?
No. We have bugs with the Everywhere topology; see https://github.com/scylladb/scylla/pull/8536. That's why it showed up in the old endpoints list.
> And from there it gets into neighbors in the everywhere_topology case.
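The answers above pin down the expected shape of the node lists. The bootstrapping node may appear in master.all_nodes() once, but it must not appear there twice and must not be in old_endpoints_in_local_dc. A minimal C++ sketch of building lists that respect those rules follows. It assumes plain strings for addresses and uses hypothetical helper names (neighbors_from_old_endpoints, all_nodes_for_master). It is not how Scylla's bootstrap_with_repair is written, nor a description of what PR #8536 changes.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: derive a range's neighbors from the endpoints that
// owned it before bootstrap. The bootstrapping node itself is dropped and
// duplicates are removed (the result is sorted as a side effect).
std::vector<std::string> neighbors_from_old_endpoints(
        std::vector<std::string> old_endpoints, const std::string& self) {
    old_endpoints.erase(std::remove(old_endpoints.begin(), old_endpoints.end(), self),
                        old_endpoints.end());
    std::sort(old_endpoints.begin(), old_endpoints.end());
    old_endpoints.erase(std::unique(old_endpoints.begin(), old_endpoints.end()),
                        old_endpoints.end());
    return old_endpoints;
}

// Hypothetical helper: the master's node list is itself followed by its
// neighbors, so the master appears exactly once as long as the neighbor
// list excludes it.
std::vector<std::string> all_nodes_for_master(const std::string& self,
                                              const std::vector<std::string>& neighbors) {
    std::vector<std::string> all{self};
    all.insert(all.end(), neighbors.begin(), neighbors.end());
    return all;
}
```

With old endpoints {"127.0.0.2", "127.0.0.1", "127.0.0.3", "127.0.0.2"} and self "127.0.0.1", the neighbors are {"127.0.0.2", "127.0.0.3"}, and the master's list has three entries with self listed once.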
Why are you pointing to #8533? It refers to your own branch; how can you be so sure that you didn't break something on that branch with your custom code? The issue in #8533 may be with your code, not with how Everywhere currently works on master.
I tested Everywhere tables extensively without RBO and _everything worked._
> Why are you pointing to #8533?
Because it shows the problem with the Everywhere strategy.
> It refers to your own branch; how can you be so sure that you didn't break something on that branch with your custom code? The issue in #8533 may be with your code, not with how Everywhere currently works on master.
I did not say my custom code did not break anything. I was just saying I found issues.
> I tested Everywhere tables extensively without RBO and _everything worked._
I tested the latest https://github.com/scylladb/scylla/pull/8536. The Everywhere strategy read issue is gone. Repair-based node ops + CDC are working too. @kbr- @haaawk, can you try?
@asias Thanks. I checked simple bootstrapping of a few nodes and operations with CDC; it seems to work. I'm going to try rerunning all the tests I did with my patchset rebased on top of this PR, with RBO enabled, on Monday.