Scylla: nodetool repair and repair process stuck on node for 3 days

Created on 21 Jul 2020  Â·  78Comments  Â·  Source: scylladb/scylla

Installation details
Scylla version (or git commit hash): 4.2.rc1-0.20200709.51d4efc3212 with build-id fe5f4c9282b21b22bbcc986f8c43836f6d6cf32d
Cluster size: 6
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-03396516d7bd6a5fa

Job longevity-50GB-4d had nodetool repair command and repair process on node8 stuck for 3 days.

After cluster successfully added new node and remove old one, on node 8 corruptthenrepair nemesis started.
this nemesis stopped the scylla, then remove several sstables:

2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/sh -c find /var/lib/scylla/data/keyspace_zstd/standard1-* -maxdepth 1 -type f
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session closed for user root
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/rm -f /var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12064-big-CompressionInfo.db /var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12064-big-Data.db /var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12064-big-Digest.crc32 /var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12064-big-Filter.db /var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12064-big-Index.db /var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12064-big-Scylla.db /var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12064-big-Statistics.db /var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12064-big-Summary.db
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : (command continued) /var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12064-big-TOC.txt
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session closed for user root
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/sh -c find /var/lib/scylla/data/keyspace_deflate/standard1-* -maxdepth 1 -type f
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session closed for user root
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/rm -f /var/lib/scylla/data/keyspace_deflate/standard1-47c4c7f0c76e11eaa99400000000000a/mc-9104-big-CompressionInfo.db /var/lib/scylla/data/keyspace_deflate/standard1-47c4c7f0c76e11eaa99400000000000a/mc-9104-big-Data.db /var/lib/scylla/data/keyspace_deflate/standard1-47c4c7f0c76e11eaa99400000000000a/mc-9104-big-Digest.crc32 /var/lib/scylla/data/keyspace_deflate/standard1-47c4c7f0c76e11eaa99400000000000a/mc-9104-big-Filter.db /var/lib/scylla/data/keyspace_deflate/standard1-47c4c7f0c76e11eaa99400000000000a/mc-9104-big-Index.db /var/lib/scylla/data/keyspace_deflate/standard1-47c4c7f0c76e11eaa99400000000000a/mc-9104-big-Scylla.db /var/lib/scylla/data/keyspace_deflate/standard1-47c4c7f0c76e11eaa99400000000000a/mc-9104-big-Statistics.db /var/lib/scylla/data/keyspace_deflate/standard1-47c4c7f0c76e11eaa99400000000000a/mc-9104-big-Summary.db
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : (command continued) /var/lib/scylla/data/keyspace_deflate/standard1-47c4c7f0c76e11eaa99400000000000a/mc-9104-big-TOC.txt
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session closed for user root
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/sh -c find /var/lib/scylla/data/mview/users-* -maxdepth 1 -type f
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session closed for user root
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/rm -f /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-12081-big-CompressionInfo.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-12081-big-Data.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-12081-big-Digest.crc32 /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-12081-big-Filter.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-12081-big-Index.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-12081-big-Scylla.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-12081-big-Statistics.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-12081-big-Summary.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-12081-big-TOC.txt
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session closed for user root
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/sh -c find /var/lib/scylla/data/keyspace_snappy/standard1-* -maxdepth 1 -type f
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session closed for user root
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/rm -f /var/lib/scylla/data/keyspace_snappy/standard1-240356b0c76e11eaa443000000000006/mc-12468-big-CompressionInfo.db /var/lib/scylla/data/keyspace_snappy/standard1-240356b0c76e11eaa443000000000006/mc-12468-big-Data.db /var/lib/scylla/data/keyspace_snappy/standard1-240356b0c76e11eaa443000000000006/mc-12468-big-Digest.crc32 /var/lib/scylla/data/keyspace_snappy/standard1-240356b0c76e11eaa443000000000006/mc-12468-big-Filter.db /var/lib/scylla/data/keyspace_snappy/standard1-240356b0c76e11eaa443000000000006/mc-12468-big-Index.db /var/lib/scylla/data/keyspace_snappy/standard1-240356b0c76e11eaa443000000000006/mc-12468-big-Scylla.db /var/lib/scylla/data/keyspace_snappy/standard1-240356b0c76e11eaa443000000000006/mc-12468-big-Statistics.db /var/lib/scylla/data/keyspace_snappy/standard1-240356b0c76e11eaa443000000000006/mc-12468-big-Summary.db
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : (command continued) /var/lib/scylla/data/keyspace_snappy/standard1-240356b0c76e11eaa443000000000006/mc-12468-big-TOC.txt
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session closed for user root
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/sh -c find /var/lib/scylla/data/mview/users-* -maxdepth 1 -type f
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session closed for user root
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/rm -f /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-11861-big-CompressionInfo.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-11861-big-Data.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-11861-big-Digest.crc32 /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-11861-big-Filter.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-11861-big-Index.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-11861-big-Scylla.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-11861-big-Statistics.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-11861-big-Summary.db /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-11861-big-TOC.txt
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session closed for user root
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !NOTICE  | sudo:  centos : TTY=unknown ; PWD=/home/centos ; USER=root ; COMMAND=/bin/systemctl start scylla-server.service
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | systemd: Starting Scylla Server...
2020-07-16T15:54:48+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | sshd[16095]: Accepted publickey for centos from 172.30.0.119 port 34856 ssh2: RSA SHA256:g7OmDbWC8qLUZHxjMohQe3LsQHfcOJbG1ZHqodB6smc

and then start the scylla again.

after scylla was started,
command nodetool repair was executed:

2020-07-16T15:56:21+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 9] compaction - Compacted 2 sstables to [/var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12721-big-Data.db:level=0, ]. 169MB to 165MB (~97% of original) in 37489ms = 4MB/s. ~788224 total partitions merged to 771692.
2020-07-16T15:56:21+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 9] compaction - Compacting [/var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-9123-big-Data.db:level=0, /var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12721-big-Data.db:level=0, ]
2020-07-16T15:56:22+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 8] compaction - Compacted 2 sstables to [/var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12720-big-Data.db:level=0, ]. 170MB to 166MB (~97% of original) in 29786ms = 5MB/s. ~791040 total partitions merged to 774541.
2020-07-16T15:56:22+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 8] compaction - Compacting [/var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-9206-big-Data.db:level=0, /var/lib/scylla/data/keyspace_zstd/standard1-0a2b95e0c76e11eab639000000000008/mc-12720-big-Data.db:level=0, ]
2020-07-16T15:56:25+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 0] repair - starting user-requested repair for keyspace mview, repair id 1, options {{ trace -> false}, { jobThreads -> 1}, { incremental -> false}, { parallelism -> parallel}, { primaryRange -> false}}
2020-07-16T15:56:25+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 0] repair - Repair 1 out of 781 ranges, id=1, shard=0, keyspace=mview, table={users_by_first_name, users, users_by_last_name}, range=(-inf, -9221656703687820791]
2020-07-16T15:56:25+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 0] repair - Repair 2 out of 781 ranges, id=1, shard=0, keyspace=mview, table={users_by_first_name, users, users_by_last_name}, range=(-9221656703687820791, -9221540607665831158]
2020-07-16T15:56:25+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 0] repair - Repair 3 out of 781 ranges, id=1, shard=0, keyspace=mview, table={users_by_first_name, users, users_by_last_name}, range=(-9221540607665831158, -9214664096464859708]

and the latest messages regarding process repair was logged

2020-07-16T15:57:04+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 3] repair - Repair 146 out of 781 ranges, id=1, shard=3, keyspace=mview, table={users_by_first_name, users, users_by_last_name}, range=(-6169013921077494909, -6165898005295029693]
2020-07-16T15:57:04+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 3] repair - Repair 147 out of 781 ranges, id=1, shard=3, keyspace=mview, table={users_by_first_name, users, users_by_last_name}, range=(-6165898005295029693, -6156095151061582055]
2020-07-16T15:57:04+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 3] compaction - Compacting [/var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-14465-big-Data.db:level=0, /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-14451-big-Data.db:level=0, ]
2020-07-16T15:57:04+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 3] repair - Repair 148 out of 781 ranges, id=1, shard=3, keyspace=mview, table={users_by_first_name, users, users_by_last_name}, range=(-6156095151061582055, -6141009918163534763]
2020-07-16T15:57:05+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 3] compaction - Compacted 2 sstables to [/var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-14493-big-Data.db:level=0, ]. 19kB to 14kB (~74% of original) in 161ms = 88kB/s. ~256 total partitions merged to 37.
2020-07-16T15:57:05+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 3] compaction - Compacting [/var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-14479-big-Data.db:level=0, /var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-14493-big-Data.db:level=0, ]
2020-07-16T15:57:05+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 3] compaction - Compacted 2 sstables to [/var/lib/scylla/data/mview/users-d9790090c76d11ea9bc0000000000007/mc-14521-big-Data.db:level=0, ]. 22kB to 17kB (~77% of original) in 159ms = 111kB/s. ~256 total partitions merged to 52.
2020-07-16T15:57:05+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !INFO    | scylla: [shard 3] compaction - Compacting [/var/lib/scylla/data/mview/users_by_first_name-dbd8e990c76d11ea8f06000000000005/mc-12253-big-Data.db:level=0, /var/lib/scylla/data/mview/users_by_first_name-dbd8e990c76d11ea8f06000000000005/mc-12239-big-Data.db:level=0, ]

in a day of running

2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 1] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_snappy.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 13] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_deflate.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 8] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_zstd.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 7] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace1.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 2] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace1.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 6] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_lz4.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 0] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace1.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 2] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_snappy.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 2] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_deflate.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 5] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace1.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 5] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_zstd.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 2] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_deflate.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 0] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_zstd.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 9] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace1.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 0] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_lz4.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 6] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_lz4.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 0] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_lz4.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 7] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_snappy.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)
2020-07-17T09:09:23+00:00  longevity-tls-50gb-4d-4-2-db-node-be967b52-8 !ERR     | scylla: [shard 8] storage_proxy - Exception when communicating with 172.30.0.21, to read from keyspace_snappy.standard1: seastar::named_semaphore_timed_out (Semaphore timed out: _system_read_concurrency_sem)

This messages continue to appeare for next 2 days. the nodetool repair was not termintated till the end of test.

job logs are here:

+-------------+------------------------------------------------------------------------------------------------------------------------------+
| Cluster set | Link |
+-------------+------------------------------------------------------------------------------------------------------------------------------+
| db-cluster | https://cloudius-jenkins-test.s3.amazonaws.com/be967b52-37f6-4816-a44d-59dfb5adb5a5/20200720_154041/db-cluster-be967b52.zip |
| monitor-set | https://cloudius-jenkins-test.s3.amazonaws.com/be967b52-37f6-4816-a44d-59dfb5adb5a5/20200720_154041/monitor-set-be967b52.zip |
| loader-set | https://cloudius-jenkins-test.s3.amazonaws.com/be967b52-37f6-4816-a44d-59dfb5adb5a5/20200720_154041/loader-set-be967b52.zip |
| sct-runner | https://cloudius-jenkins-test.s3.amazonaws.com/be967b52-37f6-4816-a44d-59dfb5adb5a5/20200720_154041/sct-runner-be967b52.zip |
+
Backport candidate bug high repair

All 78 comments

@denesb Could this be a manifestation of https://github.com/scylladb/scylla/issues/6875?

Either that or the repair deadlock issue https://github.com/scylladb/scylla/issues/6272.

@bhalevy / @denesb

  • is 4.2 missing any backport ?

    • is the specific 4.2 version mentioned missing a fix ?

@slivne the fix for #6272 was deemed too risky to be backported anywhere. As for the fixes for the view update generator deadlock, those that were merged already (#6707 and #6774) were backported to both 4.1 and 4.2. The remaining one (#6603) is not merged yet, and as it depends on the fix for #6272 it probably won't be backported either.

I see this issue also on master,

Installation details
Scylla version (or git commit hash):Scylla version 666.development-0.20200720.5371be71e90 with build-id f9c63d98faa71525d9ad186790a5a78df2fb4d61
Cluster size: 4
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-057bac86e0f131898

Job longevity-lwt-3h-test had nodetool repair command and repair was stuck,
last repair message I see is "Repair 791 out of 791 ranges" and the command is not exiting

| 20190101_010101 | prometheus  | https://cloudius-jenkins-test.s3.amazonaws.com/8f57099d-24d6-44b0-abe6-3b38ea611b4b/prometheus_snapshot_20200720_175916.tar.gz                                                                                               |
| 20200720_175026 | grafana     | https://cloudius-jenkins-test.s3.amazonaws.com/8f57099d-24d6-44b0-abe6-3b38ea611b4b/20200720_175026/grafana-screenshot-overview-20200720_175026-longevity-lwt-3h-master-monitor-node-8f57099d-1.png                          |
| 20200720_175026 | grafana     | https://cloudius-jenkins-test.s3.amazonaws.com/8f57099d-24d6-44b0-abe6-3b38ea611b4b/20200720_175026/grafana-screenshot-scylla-per-server-metrics-nemesis-20200720_175401-longevity-lwt-3h-master-monitor-node-8f57099d-1.png |
| 20200720_175931 | grafana     | https://cloudius-jenkins-test.s3.amazonaws.com/8f57099d-24d6-44b0-abe6-3b38ea611b4b/20200720_175931/grafana-screenshot-overview-20200720_175931-longevity-lwt-3h-master-monitor-node-8f57099d-1.png                          |
| 20200720_175931 | grafana     | https://cloudius-jenkins-test.s3.amazonaws.com/8f57099d-24d6-44b0-abe6-3b38ea611b4b/20200720_175931/grafana-screenshot-scylla-per-server-metrics-nemesis-20200720_180238-longevity-lwt-3h-master-monitor-node-8f57099d-1.png |
| 20200720_180818 | db-cluster  | https://cloudius-jenkins-test.s3.amazonaws.com/8f57099d-24d6-44b0-abe6-3b38ea611b4b/20200720_180818/db-cluster-8f57099d.zip                                                                                                  |
| 20200720_180818 | loader-set  | https://cloudius-jenkins-test.s3.amazonaws.com/8f57099d-24d6-44b0-abe6-3b38ea611b4b/20200720_180818/loader-set-8f57099d.zip                                                                                                  |
| 20200720_180818 | monitor-set | https://cloudius-jenkins-test.s3.amazonaws.com/8f57099d-24d6-44b0-abe6-3b38ea611b4b/20200720_180818/monitor-set-8f57099d.zip                                                                                                 |
| 20200720_180818 | sct-runner  | https://cloudius-jenkins-test.s3.amazonaws.com/8f57099d-24d6-44b0-abe6-3b38ea611b4b/20200720_180818/sct-runner-8f57099d.zip                                                                                                  |

The issue occurred at the 200gb longevity in master:
Scylla version (or git commit hash): 666.development-0.20200716.09d3a3543 with build-id cfbf8d38ae5f73d91c6528cc755b40a7651a4c53
Cluster size: 4
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0b9c2aca9ce37eaa1

In the test, after we decommissioned a node, we added a new node to replace it. In that node, a repair has started:

2020-07-20T21:07:01+00:00  longevity-200gb-48h-verify-limited--db-node-f91ca30f-7 !INFO    | scylla: [shard 0] storage_service - JOINING: Starting to bootstrap...
2020-07-20T21:07:01+00:00  longevity-200gb-48h-verify-limited--db-node-f91ca30f-7 !INFO    | scylla: [shard 0] repair - bootstrap_with_repair: started with keyspaces={ks_truncate, system_auth, system_traces, keyspace1, system_distributed}
2020-07-20T21:07:01+00:00  longevity-200gb-48h-verify-limited--db-node-f91ca30f-7 !INFO    | scylla: [shard 0] repair - bootstrap_with_repair: started with keyspace=ks_truncate, nr_ranges=756

But after a few minutes, the repair just abruptly ends:

2020-07-20T21:18:27+00:00  longevity-200gb-48h-verify-limited--db-node-f91ca30f-7 !INFO    | scylla: [shard 8] repair - Repair 232 out of 756 ranges, id=[id=5, uuid=215969b6-621e-4141-a429-73b2527625f8], 
shard=8, keyspace=system_distributed, table={cdc_generations, view_build_status, cdc_streams}, range=(3874549276583660680, 3931910205549554549]
2020-07-20T21:18:27+00:00  longevity-200gb-48h-verify-limited--db-node-f91ca30f-7 !INFO    | scylla: [shard 10] repair - Repair 208 out of 756 ranges, id=[id=5, uuid=215969b6-621e-4141-a429-73b2527625f8],
 shard=10, keyspace=system_distributed, table={cdc_generations, view_build_status, cdc_streams}, range=(4578487690810104135, 4579314905837149700]
2020-07-20T21:18:27+00:00  longevity-200gb-48h-verify-limited--db-node-f91ca30f-7 !INFO    | scylla: [shard 8] repair - Repair 233 out of 756 ranges, id=[id=5, uuid=215969b6-621e-4141-a429-73b2527625f8], 
shard=8, keyspace=system_distributed, table={cdc_generations, view_build_status, cdc_streams}, range=(3870896775600473007, 3874549276583660680]

The node, as a result, never finished it's initialization as well.

Logs:

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                 Log links for testrun with test id f91ca30f-2781-4796-ad89-e399a5f08b2d                                                                                                  |
+-----------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Date            | Log type    | Link                                                                                                                                                                                                                                     |
+-----------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 20190101_010101 | prometheus  | https://cloudius-jenkins-test.s3.amazonaws.com/f91ca30f-2781-4796-ad89-e399a5f08b2d/prometheus_snapshot_20200721_161730.tar.gz                                                                                                           |
| 20200721_160515 | grafana     | https://cloudius-jenkins-test.s3.amazonaws.com/f91ca30f-2781-4796-ad89-e399a5f08b2d/20200721_160515/grafana-screenshot-overview-20200721_160515-longevity-200gb-48h-verify-limited--monitor-node-f91ca30f-1.png                          |
| 20200721_160515 | grafana     | https://cloudius-jenkins-test.s3.amazonaws.com/f91ca30f-2781-4796-ad89-e399a5f08b2d/20200721_160515/grafana-screenshot-scylla-per-server-metrics-nemesis-20200721_160851-longevity-200gb-48h-verify-limited--monitor-node-f91ca30f-1.png |
| 20200721_164011 | db-cluster  | https://cloudius-jenkins-test.s3.amazonaws.com/f91ca30f-2781-4796-ad89-e399a5f08b2d/20200721_164011/db-cluster-f91ca30f.zip                                                                                                              |
| 20200721_164011 | loader-set  | https://cloudius-jenkins-test.s3.amazonaws.com/f91ca30f-2781-4796-ad89-e399a5f08b2d/20200721_164011/loader-set-f91ca30f.zip                                                                                                              |
| 20200721_164011 | monitor-set | https://cloudius-jenkins-test.s3.amazonaws.com/f91ca30f-2781-4796-ad89-e399a5f08b2d/20200721_164011/monitor-set-f91ca30f.zip                                                                                                             |
| 20200721_164011 | sct-runner  | https://cloudius-jenkins-test.s3.amazonaws.com/f91ca30f-2781-4796-ad89-e399a5f08b2d/20200721_164011/sct-runner-f91ca30f.zip                                                                                                              |
+-----------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

@denesb

  • So #6272 , Avi is ok with backporting to 4.2 if its needed, according to QA they did not hit this in 4.1 - does this add up - do we need this backported to 4.1 as well ?
  • With regards to #6603 are you working on that or someone else - I see some referencing commits - but am not sure if its solved or not.

@denesb

* [#6892 (comment)](https://github.com/scylladb/scylla/issues/6892#issuecomment-661849832)

* So #6272 , Avi is ok with backporting to 4.2 if its needed, according to QA they did not hit this in 4.1 - does this add up - do we need this backported to 4.1 as well ?

In theory all releases using row-level repair are affected, this includes 4.1. Releases that also have repair-based node operations might be more affected.

* With regards to #6603 are you working on that or someone else - I see some referencing commits - but am not sure if its solved or not.

The fix is on the mailing list: [PATCH v5 00/10] db/view: view_update_generator: make staging reader evictable

@nyh please expedite review

Does f488eaebafe4fdb4355c8ef36770f5807934e3fa close this issue?

@bhalevy its the last repair deadlock related issue that was waiting to be merged so it should.

Issue reproduced for Scylla version 4.2.rc2-0.20200803.c5ed14bff with build-id 377a88e9234ddae99025e9a5e5f5eeda0cda88cb.

Job is still running (https://jenkins.scylladb.com/view/scylla-4.2/job/scylla-4.2/job/longevity/job/longevity-50gb-4days-test/10/). Node 5 was restarted and nodetool repair executed after node was started.
the process nodetool repair is running for 1d15h for that moment and continue to run

[centos@longevity-tls-50gb-4d-4-2-db-node-d47cc9a7-5 ~]$ ps aux | grep repair
centos     8128  0.0  0.0 113300  3112 ?        Ss   авг04   0:00 /bin/sh /opt/scylladb/share/cassandra/bin/nodetool -u cassandra -pw cassandra repair
centos     8238  0.2  0.1 1848884 169936 ?      Sl   авг04   4:56 java -javaagent:/opt/scylladb/share/cassandra/lib/jamm-0.3.2.jar -ea -cp /tmp/tmp.eCisRWP4gl:/opt/scylladb/share/cassandra/lib/airline-0.6.jar:/opt/scylladb/share/cassandra/lib/antlr-runtime-3.5.2.jar:/opt/scylladb/share/cassandra/lib/apache-cassandra-4.2.rc2-20200803.712738dfda.jar:/opt/scylladb/share/cassandra/lib/apache-cassandra.jar:/opt/scylladb/share/cassandra/lib/apache-cassandra-thrift-4.2.rc2-20200803.712738dfda.jar:/opt/scylladb/share/cassandra/lib/asm-6.2.jar:/opt/scylladb/share/cassandra/lib/caffeine-2.2.6.jar:/opt/scylladb/share/cassandra/lib/chronicle-bytes-1.16.3.jar:/opt/scylladb/share/cassandra/lib/chronicle-core-1.16.3-SNAPSHOT.jar:/opt/scylladb/share/cassandra/lib/chronicle-queue-4.16.3.jar:/opt/scylladb/share/cassandra/lib/chronicle-threads-1.16.0.jar:/opt/scylladb/share/cassandra/lib/chronicle-wire-1.16.1.jar:/opt/scylladb/share/cassandra/lib/commons-cli-1.1.jar:/opt/scylladb/share/cassandra/lib/commons-codec-1.9.jar:/opt/scylladb/share/cassandra/lib/commons-lang3-3.1.jar:/opt/scylladb/share/cassandra/lib/commons-math3-3.2.jar:/opt/scylladb/share/cassandra/lib/compress-lzf-0.8.4.jar:/opt/scylladb/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/opt/scylladb/share/cassandra/lib/concurrent-trees-2.4.0.jar:/opt/scylladb/share/cassandra/lib/disruptor-3.0.1.jar:/opt/scylladb/share/cassandra/lib/ecj-4.6.1.jar:/opt/scylladb/share/cassandra/lib/guava-18.0.jar:/opt/scylladb/share/cassandra/lib/HdrHistogram-2.1.9.jar:/opt/scylladb/share/cassandra/lib/high-scale-lib-1.0.6.jar:/opt/scylladb/share/cassandra/lib/hppc-0.5.4.jar:/opt/scylladb/share/cassandra/lib/jackson-core-asl-1.9.13.jar:/opt/scylladb/share/cassandra/lib/jackson-mapper-asl-1.9.13.jar:/opt/scylladb/share/cassandra/lib/jamm-0.3.2.jar:/opt/scylladb/share/cassandra/lib/javax.inject.jar:/opt/scylladb/share/cassandra/lib/jbcrypt-0.3m.jar:/opt/scylladb/share/cassandra/lib/jcl-over-slf4j-1.7.7.jar:/opt/scylladb/share/cassandra/lib/jctools-core-1.2.1.jar:/opt/scylladb/share/cassandra/lib/jflex-1.6.0.jar:/opt/scylladb/share/cassandra/lib/jna-4.2.2.jar:/opt/scylladb/share/cassandra/lib/joda-time-2.4.jar:/opt/scylladb/share/cassandra/lib/json-simple-1.1.jar:/opt/scylladb/share/cassandra/lib/jstackjunit-0.0.1.jar:/opt/scylladb/share/cassandra/lib/libthrift-0.9.2.jar:/opt/scylladb/share/cassandra/lib/log4j-over-slf4j-1.7.7.jar:/opt/scylladb/share/cassandra/lib/logback-classic-1.1.3.jar:/opt/scylladb/share/cassandra/lib/logback-core-1.1.3.jar:/opt/scylladb/share/cassandra/lib/lz4-1.3.0.jar:/opt/scylladb/share/cassandra/lib/metrics-core-3.1.5.jar:/opt/scylladb/share/cassandra/lib/metrics-jvm-3.1.5.jar:/opt/scylladb/share/cassandra/lib/metrics-logback-3.1.5.jar:/opt/scylladb/share/cassandra/lib/netty-all-4.0.56.Final.jar:/opt/scylladb/share/cassandra/lib/ohc-core-0.5.1.jar:/opt/scylladb/share/cassandra/lib/ohc-core-j8-0.5.1.jar:/opt/scylladb/share/cassandra/lib/reporter-config3-3.0.3.jar:/opt/scylladb/share/cassandra/lib/reporter-config-base-3.0.3.jar:/opt/scylladb/share/cassandra/lib/scylla-driver-core-3.7.1-scylla-2-shaded.jar:/opt/scylladb/share/cassandra/lib/scylla-tools-4.2.rc2-20200803.712738dfda.jar:/opt/scylladb/share/cassandra/lib/sigar-1.6.4.jar:/opt/scylladb/share/cassandra/lib/slf4j-api-1.7.7.jar:/opt/scylladb/share/cassandra/lib/snakeyaml-1.11.jar:/opt/scylladb/share/cassandra/lib/snappy-java-1.1.1.7.jar:/opt/scylladb/share/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:/opt/scylladb/share/cassandra/lib/ST4-4.0.8.jar:/opt/scylladb/share/cassandra/lib/stream-2.5.2.jar:/opt/scylladb/share/cassandra/lib/stress.jar:/opt/scylladb/share/cassandra/lib/thrift-server-0.3.7.jar:/opt/scylladb/share/cassandra/*.jar: -Xmx128m -XX:ParallelGCThreads=1 -Dcassandra.storagedir= -Dlogback.configurationFile=logback-tools.xml org.apache.cassandra.tools.NodeTool -p 7199 -u cassandra -pw cassandra repair
centos    28234  0.0  0.0 112832  2520 pts/0    S+   07:25   0:00 grep --color=auto repair

Node is available by ssh [email protected]

Monitoring stack: http://34.251.178.230:3000/d/Kh2Hae4Mk/scylla-per-server-metrics-nemesis-master?orgId=1&from=now-2d&to=now&var-by=instance&var-cluster=&var-dc=All&var-node=All&var-shard=All&var-sct_tags=DisruptionEvent
node log:
node5.log.tar.gz

@bhalevy @denesb Could you take a look on running node?

@aleksbykov based on the metrics this is something else. I don't see any reads queued on the streaming semaphore, there doesn't appear to be a reader deadlock.

@Asias He asias@scylladb.com ?

On Thu, Aug 6, 2020 at 11:21 AM Botond Dénes notifications@github.com
wrote:

@aleksbykov https://github.com/aleksbykov based on the metrics this is
something else. I don't see any reads queued on the streaming semaphore,
there doesn't appear to be a reader deadlock.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/scylladb/scylla/issues/6892#issuecomment-669786252,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AA2OCCGINK46TLK6WTLD64LR7JRW7ANCNFSM4PDPC26Q
.

I see active reads and paused reader eviction in the streaming semaphore between 15:30-16:45 GMT and no activity for the remainder of the test.

We lost some lines of log in https://github.com/scylladb/scylla/files/5033235/node5.log.tar.gz. Starting message of repair job id 11 was not even shown. It was repair with mv table again:

repair - Repair 719 out of 775 ranges, id=11, shard=0, keyspace=mview, table={users, users_by_first_name, users_by_last_name}, range=(8079827648274913491, 8094329519117371080]

@asias this log was copied from node5 /var/log/messages.

The version used contains all:

commit f488eaebafe4fdb4355c8ef36770f5807934e3fa
Merge: fe127a215 929cdd3a1
Author: Nadav Har'El <[email protected]>
Date:   Sun Jul 26 14:11:41 2020 +0300

    merge: db/view: view_update_generator: make staging reader evictable


commit 5ebe2c28d145cc55d8f8fc86837cef5f5936aa7c
Author: Botond Dénes <[email protected]>
Date:   Mon Jul 6 16:51:08 2020 +0300

    db/view: view_update_generator: re-balance wait/signal on the register semaphore



commit a9c7a1a86c426a235f20d8a742cfdeba56930672
Merge: c2939c67b fbbc86e18
Author: Avi Kivity <[email protected]>
Date:   Wed Jun 24 12:54:58 2020 +0300

    Merge "repair: row_level: prevent deadlocks when repairing homogenous nodes" from Botond

@asias this log was copied from node5 /var/log/messages.

Thanks. @avikivity we see lost of logging lines in SCT test multiple times in different tests. Any ideas? The log is copied so it excludes SCT lost some of the logs.

What kind of loss? syslog squashing duplicate message, or just disappearing lines?

Are we using rsyslog? Maybe it's UDP packet loss.

What kind of loss? syslog squashing duplicate message, or just disappearing lines?

Some of the lines are disappearing.

Are we using rsyslog? Maybe it's UDP packet loss.

I am not sure. @aleksbykov can you check?

We use to get logs on sct runner: rsyslog
action(type="omfwd" Target="10.0.246.104" Port="33136" Protocol="tcp")

What does that line mean?

Additionally forward messages to remote rsyslog server over tcp, which is running on jenkins builder or sct-runner.

Well, maybe check logs on the rsyslog server. Maybe it is dropping messages.

It looks some messages could be lost:

Aug 04 16:24:08 longevity-tls-50gb-4d-4-2-db-node-d47cc9a7-5 systemd[1]: Started System Logging Service.
-- Subject: Unit rsyslog.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit rsyslog.service has finished starting up.
-- 
-- The start-up result is done.
Aug 04 16:30:23 longevity-tls-50gb-4d-4-2-db-node-d47cc9a7-5 rsyslogd[1351]: imjournal: begin to drop messages due to rate-limiting
Aug 04 16:34:09 longevity-tls-50gb-4d-4-2-db-node-d47cc9a7-5 rsyslogd[1351]: imjournal: 19354 messages lost due to rate-limiting
Aug 04 16:38:07 longevity-tls-50gb-4d-4-2-db-node-d47cc9a7-5 rsyslogd[1351]: imjournal: begin to drop messages due to rate-limiting
Aug 04 16:44:10 longevity-tls-50gb-4d-4-2-db-node-d47cc9a7-5 rsyslogd[1351]: imjournal: 10922 messages lost due to rate-limiting
Aug 04 17:45:18 longevity-tls-50gb-4d-4-2-db-node-d47cc9a7-5 rsyslogd[1351]: imjournal: journal reloaded... [v8.24.0-52.el7_8.2 try http://www.rsyslog.com/e/0 ]
Aug 05 04:20:50 longevity-tls-50gb-4d-4-2-db-node-d47cc9a7-5 rsyslogd[1351]: imjournal: journal reloaded... [v8.24.0-52.el7_8.2 try http://www.rsyslog.com/e/0 ]
Aug 05 14:46:35 longevity-tls-50gb-4d-4-2-db-node-d47cc9a7-5 rsyslogd[1351]: imjournal: journal reloaded... [v8.24.0-52.el7_8.2 try http://www.rsyslog.com/e/0 ]
Aug 06 01:10:06 longevity-tls-50gb-4d-4-2-db-node-d47cc9a7-5 rsyslogd[1351]: imjournal: journal reloaded... [v8.24.0-52.el7_8.2 try http://www.rsyslog.com/e/0 ]

So disable rate limiting.

Could you take a look on running node. The repair process is still running. Job could be set as failed. But while cluster are alive, you can find what is the problem?

Node is available by ssh [email protected]

@asias please note you can access the node (hopefully its still there - and check what you need)

I tried to login with no luck. Looks like the node is destroyed.

@aleksbykov Will SCT generate a core when we suspect a hang (It will be helpful if SCT support this. It can be manual trigger)?

@asias yes, we have such functionality, but this command doesn't run with timeout, because it is difficult to determine exact time of command run for different jobs (from 4h to 7d)
will try to reproduce the issue with coredump

@asias Issue again reprdouced. Can you check the node4 : 18.202.57.193

i didn't yet trigger coredump, thought may be you want to get a look on running instances

@asias Issue again reprdouced. Can you check the node4 : 18.202.57.193

I can not login with ssh.

do you have ssh key?

@asias Sorry, wrong IP. Use this one: 34.253.205.35

do you have ssh key?

Of course, I have a ssh key. It is probably the qa firewall blocked me. I need to update my ip address there. You can send me an instruction how to login privately over email. I guessed multiple qa keys and failed.

Due to issues like https://github.com/scylladb/scylla/issues/7012 https://github.com/scylladb/scylla/issues/7016, I can not get the repair_meta objects.

This is what I can get from the _streaming_concurrency_sem. All looks idle.

(gdb) scylla databases;
    0 (database*)0x60d009436010
    1 (database*)0x601009436010
    2 (database*)0x602009436010
    3 (database*)0x603009436010
    4 (database*)0x604009436010
    5 (database*)0x605009436010
    6 (database*)0x606009436010
    7 (database*)0x607009436010
    8 (database*)0x608009436010
    9 (database*)0x609009436010
   10 (database*)0x60a009436010
   11 (database*)0x60b009436010
   12 (database*)0x60c009436010
   13 (database*)0x60d009436010

(gdb) p (*(database*)0x60d009436010)._streaming_concurrency_sem._wait_list._size
$128 = 0
(gdb) p (*(database*)0x60d009436010)._streaming_concurrency_sem._resources
$129 = {
  count = 10, 
  memory = 171001774
}
(gdb) p (*(database*)0x601009436010)._streaming_concurrency_sem._wait_list._size
$130 = 0
(gdb) p (*(database*)0x601009436010)._streaming_concurrency_sem._resources
$131 = {
  count = 10, 
  memory = 171001774
}
(gdb) p (*(database*)0x602009436010)._streaming_concurrency_sem._wait_list._size
$132 = 0
(gdb) p (*(database*)0x602009436010)._streaming_concurrency_sem._resources
$133 = {
  count = 10, 
  memory = 171001774
}
(gdb) p (*(database*)0x603009436010)._streaming_concurrency_sem._wait_list._size
$134 = 0
(gdb) p (*(database*)0x603009436010)._streaming_concurrency_sem._resources
$135 = {
  count = 10, 
  memory = 171001774
}
(gdb) p (*(database*)0x605009436010)._streaming_concurrency_sem._wait_list._size
$136 = 0
(gdb) p (*(database*)0x605009436010)._streaming_concurrency_sem._resources
$137 = {
  count = 10, 
  memory = 171001774
}
(gdb) p (*(database*)0x606009436010)._streaming_concurrency_sem._wait_list._size
$138 = 0
(gdb) p (*(database*)0x606009436010)._streaming_concurrency_sem._resources
$139 = {
  count = 10, 
  memory = 171001774
}
(gdb) p (*(database*)0x607009436010)._streaming_concurrency_sem._wait_list._size
$140 = 0
(gdb) p (*(database*)0x607009436010)._streaming_concurrency_sem._resources
$141 = {
  count = 10, 
  memory = 171001774
}
(gdb) p (*(database*)0x608009436010)._streaming_concurrency_sem._wait_list._size
$142 = 0
(gdb) p (*(database*)0x608009436010)._streaming_concurrency_sem._resources
$143 = {
  count = 10, 
  memory = 171001774
}
(gdb) p (*(database*)0x609009436010)._streaming_concurrency_sem._wait_list._size
$144 = 0
(gdb) p (*(database*)0x609009436010)._streaming_concurrency_sem._resources
$145 = {
  count = 10, 
  memory = 171001774
}
(gdb) p (*(database*)0x60a009436010)._streaming_concurrency_sem._wait_list._size
$146 = 0
(gdb) p (*(database*)0x60a009436010)._streaming_concurrency_sem._resources
$147 = {
  count = 10, 
  memory = 171001774
}
(gdb) p (*(database*)0x60b009436010)._streaming_concurrency_sem._wait_list._size
$148 = 0
(gdb) p (*(database*)0x60b009436010)._streaming_concurrency_sem._resources
$149 = {
  count = 10, 
  memory = 171001774
}
(gdb) p (*(database*)0x60c009436010)._streaming_concurrency_sem._wait_list._size
$150 = 0
(gdb) p (*(database*)0x60c009436010)._streaming_concurrency_sem._resources
$151 = {
  count = 10, 
  memory = 171001774
}
(gdb) p (*(database*)0x60d009436010)._streaming_concurrency_sem._wait_list._size
$152 = 0
(gdb) p (*(database*)0x60d009436010)._streaming_concurrency_sem._resources
$153 = {
  count = 10, 
  memory = 171001774
}

I saved the core on node 34.253.205.35.

/var/lib/scylla/coredump/my1947.core

@aleksbykov I want to have a look, leave the machine alive for some time. Will let you know when I'm done with it.

@denesb Ok, will mark them to keep-alive

@asias / @aleksbykov lets upload the core to make sure we will not loose it - was the machine on-demand or spot.

@slivne
Instances are on_demand

I think it was automatically detected and uploaded.

2020-08-11 11:17:59.000: (CoreDumpEvent Severity.ERROR): node=Node longevity-tls-50gb-4d-reproduc-db-node-1097f011-4 [34.253.205.35 | 10.0.2.132] (seed: False)
corefile_url=
https://storage.cloud.google.com/upload.scylladb.com/core.gdb.0.6166c7bc45c541b696576204f8b1ebd0.47911.1597144679000000/core.gdb.0.6166c7bc45c541b696576204f8b1ebd0.47911.1597144679000000.gz
backtrace=           PID: 47911 (gdb)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Tue 2020-08-11 11:17:59 UTC (30s ago)
  Command Line: gdb
    Executable: /opt/rh/devtoolset-7/root/usr/bin/gdb
 Control Group: /user.slice/user-1000.slice/session-10848.scope
          Unit: session-10848.scope
         Slice: user-1000.slice
       Session: 10848
     Owner UID: 1000 (centos)
       Boot ID: 6166c7bc45c541b696576204f8b1ebd0
    Machine ID: df877a200226bc47d06f26dae0736ec9
      Hostname: longevity-tls-50gb-4d-reproduc-db-node-1097f011-4
      Coredump: /var/lib/systemd/coredump/core.gdb.0.6166c7bc45c541b696576204f8b1ebd0.47911.1597144679000000
       Message: Process 47911 (gdb) of user 0 dumped core.

                Stack trace of thread 47911:
                #0  0x00007f5cc8471387 raise (libc.so.6)
                #1  0x00007f5cc8472a78 abort (libc.so.6)
                #2  0x00007f5cc8a7f7d5 _ZN9__gnu_cxx27__verbose_terminate_handlerEv (libstdc++.so.6)
                #3  0x00007f5cc8a7d746 n/a (libstdc++.so.6)
                #4  0x00007f5cc8a7c6f9 n/a (libstdc++.so.6)
                #5  0x00007f5cc8a7d364 __gxx_personality_v0 (libstdc++.so.6)
                #6  0x00007f5cc88188a3 n/a (libgcc_s.so.1)
                #7  0x00007f5cc8818dd7 _Unwind_Resume (libgcc_s.so.1)
                #8  0x00000000005d619c _Z18location_completerP16cmd_list_elementPKcS2_ (gdb)
                #9  0x00000000005d62b8 _Z20expression_completerP16cmd_list_elementPKcS2_ (gdb)
                #10 0x00000000005d4fba _ZL22complete_line_internalPKcS0_i29complete_line_internal_reason (gdb)
                #11 0x00000000005d6584 _Z13complete_linePKcS0_i (gdb)
                #12 0x00000000005d67e4 _Z33readline_line_completion_functionPKci (gdb)
                #13 0x00007f5cca723641 rl_completion_matches (libreadline.so.6)
                #14 0x00007f5cca723874 rl_complete_internal (libreadline.so.6)
                #15 0x00007f5cca71b6de _rl_dispatch_subseq (libreadline.so.6)
                #16 0x00007f5cca71ba00 readline_internal_char (libreadline.so.6)
                #17 0x00007f5cca730f9d rl_callback_read_char (libreadline.so.6)
                #18 0x000000000062803e _ZL42gdb_rl_callback_read_char_wrapper_noexceptv (gdb)
                #19 0x00000000006280b9 _ZL33gdb_rl_callback_read_char_wrapperPv (gdb)
                #20 0x0000000000628640 _Z19stdin_event_handleriPv (gdb)
                #21 0x00000000006274c5 _ZL18gdb_wait_for_eventi (gdb)
                #22 0x000000000062770a _Z16gdb_do_one_eventv (gdb)
                #23 0x0000000000627755 _Z16start_event_loopv (gdb)
                #24 0x000000000067dcc8 _ZL21captured_command_loopPv (gdb)
                #25 0x00000000006297d3 _Z12catch_errorsPFiPvES_PKc11return_mask (gdb)
                #26 0x000000000067f1d6 _Z8gdb_mainP18captured_main_args (gdb)
                #27 0x000000000040eee1 main (gdb)
                #28 0x00007f5cc845d555 __libc_start_main (libc.so.6)
                #29 0x000000000040fa38 _start (gdb)
                #29 0x000000000040fa38 _start (gdb)

download_instructions=
gsutil cp gs://upload.scylladb.com/core.gdb.0.6166c7bc45c541b696576204f8b1ebd0.47911.1597144679000000/core.gdb.0.6166c7bc45c541b696576204f8b1ebd0.47911.1597144679000000.gz .
gunzip /var/lib/systemd/coredump/core.gdb.0.6166c7bc45c541b696576204f8b1ebd0.47911.1597144679000000.gz

I think it was automatically detected and uploaded.

2020-08-11 11:17:59.000: (CoreDumpEvent Severity.ERROR): node=Node longevity-tls-50gb-4d-reproduc-db-node-1097f011-4 [34.253.205.35 | 10.0.2.132] (seed: False)
corefile_url=
https://storage.cloud.google.com/upload.scylladb.com/core.gdb.0.6166c7bc45c541b696576204f8b1ebd0.47911.1597144679000000/core.gdb.0.6166c7bc45c541b696576204f8b1ebd0.47911.1597144679000000.gz
backtrace=           PID: 47911 (gdb)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Tue 2020-08-11 11:17:59 UTC (30s ago)
  Command Line: gdb
    Executable: /opt/rh/devtoolset-7/root/usr/bin/gdb
 Control Group: /user.slice/user-1000.slice/session-10848.scope
          Unit: session-10848.scope
         Slice: user-1000.slice
       Session: 10848
     Owner UID: 1000 (centos)
       Boot ID: 6166c7bc45c541b696576204f8b1ebd0
    Machine ID: df877a200226bc47d06f26dae0736ec9
      Hostname: longevity-tls-50gb-4d-reproduc-db-node-1097f011-4
      Coredump: /var/lib/systemd/coredump/core.gdb.0.6166c7bc45c541b696576204f8b1ebd0.47911.1597144679000000
       Message: Process 47911 (gdb) of user 0 dumped core.

                Stack trace of thread 47911:
                #0  0x00007f5cc8471387 raise (libc.so.6)
                #1  0x00007f5cc8472a78 abort (libc.so.6)
                #2  0x00007f5cc8a7f7d5 _ZN9__gnu_cxx27__verbose_terminate_handlerEv (libstdc++.so.6)
                #3  0x00007f5cc8a7d746 n/a (libstdc++.so.6)
                #4  0x00007f5cc8a7c6f9 n/a (libstdc++.so.6)
                #5  0x00007f5cc8a7d364 __gxx_personality_v0 (libstdc++.so.6)
                #6  0x00007f5cc88188a3 n/a (libgcc_s.so.1)
                #7  0x00007f5cc8818dd7 _Unwind_Resume (libgcc_s.so.1)
                #8  0x00000000005d619c _Z18location_completerP16cmd_list_elementPKcS2_ (gdb)
                #9  0x00000000005d62b8 _Z20expression_completerP16cmd_list_elementPKcS2_ (gdb)
                #10 0x00000000005d4fba _ZL22complete_line_internalPKcS0_i29complete_line_internal_reason (gdb)
                #11 0x00000000005d6584 _Z13complete_linePKcS0_i (gdb)
                #12 0x00000000005d67e4 _Z33readline_line_completion_functionPKci (gdb)
                #13 0x00007f5cca723641 rl_completion_matches (libreadline.so.6)
                #14 0x00007f5cca723874 rl_complete_internal (libreadline.so.6)
                #15 0x00007f5cca71b6de _rl_dispatch_subseq (libreadline.so.6)
                #16 0x00007f5cca71ba00 readline_internal_char (libreadline.so.6)
                #17 0x00007f5cca730f9d rl_callback_read_char (libreadline.so.6)
                #18 0x000000000062803e _ZL42gdb_rl_callback_read_char_wrapper_noexceptv (gdb)
                #19 0x00000000006280b9 _ZL33gdb_rl_callback_read_char_wrapperPv (gdb)
                #20 0x0000000000628640 _Z19stdin_event_handleriPv (gdb)
                #21 0x00000000006274c5 _ZL18gdb_wait_for_eventi (gdb)
                #22 0x000000000062770a _Z16gdb_do_one_eventv (gdb)
                #23 0x0000000000627755 _Z16start_event_loopv (gdb)
                #24 0x000000000067dcc8 _ZL21captured_command_loopPv (gdb)
                #25 0x00000000006297d3 _Z12catch_errorsPFiPvES_PKc11return_mask (gdb)
                #26 0x000000000067f1d6 _Z8gdb_mainP18captured_main_args (gdb)
                #27 0x000000000040eee1 main (gdb)
                #28 0x00007f5cc845d555 __libc_start_main (libc.so.6)
                #29 0x000000000040fa38 _start (gdb)
                #29 0x000000000040fa38 _start (gdb)

download_instructions=
gsutil cp gs://upload.scylladb.com/core.gdb.0.6166c7bc45c541b696576204f8b1ebd0.47911.1597144679000000/core.gdb.0.6166c7bc45c541b696576204f8b1ebd0.47911.1597144679000000.gz .
gunzip /var/lib/systemd/coredump/core.gdb.0.6166c7bc45c541b696576204f8b1ebd0.47911.1597144679000000.gz

I am not sure who generated the core dump. It looks like it is not the one I generated and stored /var/lib/scylla/coredump/my1947.core.

@asias yes, it looks like the core of gdb process not scylla

@asias yes, it looks like the core of gdb process not scylla

Can you help storing /var/lib/scylla/coredump/my1947.core somewhere after Botond handle over the machine to you.

@asias i uploaded the core. download link : https://storage.cloud.google.com/upload.scylladb.com/6892/my1947.core.gz it is compressed with pigz.

Will recheck and reupload.

Will recheck and reupload.

Just for the record the latest version this was reproduced with is 4.2.rc2-0.20200809.0ce379918, build-id: 9d98e70c2e1b2804e3c5c97cbd90db1fb31df2cd.

It is crazy how fragile thread debugging is. On the server, with gdb 8.0 (devtoolset-7-gdb) only the following sequence of commands worked:

$ gdb
(gdb) set auto-load safe-path /
(gdb) core-file ./my1947.core
(gdb) exec-file /opt/scylladb/libexec/scylla

This same command sequence is not working anymore with gdb 8.3 (devtoolset-9-gdb).

In dbuild, only this works:

$ gdb --core=./my1947.core /opt/scylladb/libexec/scylla

The only way to figure out what works is through endless trial-and-error.

(gdb) pi
>>> v = std_vector(gdb.parse_and_eval('_the_tracker->_range_parallelism_semaphores'))
>>> v
<__main__.std_vector object at 0x7f2e9d7a2fd0>
>>> for i, s in enumerate(v):
...     gdb.write('[{}] {}\n'.format(i, str(s['_count'])))
... 
[0] 25
[1] 25
[2] 1
[3] 25
[4] 25
[5] 25
[6] 25
[7] 25
[8] 25
[9] 25
[10] 25
[11] 24
[12] 25
[13] 25

It seeems two shards have repair activity.

@aleksbykov I have already downloaded it locally. If @asias is also done with the machine it can be shut down.

@aleksbykov I have already downloaded it locally. If @asias is also done with the machine it can be shut down.

Yes, we can destroy it now.

@asias @denesb Thanks. I am going to destroy cluster.

(gdb) p $downcast_vptr(0x60201bc62d80)
$1 = (seastar::thread_context *) 0x60201bc62d80
(gdb) python switch('*$1')
switch to (seastar::thread_context*) 0x000060201bc62d80
fiber select 0x000060200359f5a0 0x000060200359ff80 0x000060200359f680 0x000000000056d328 0x000060200359f7b0 0x0000602008578bd8 0x000060200359f4a0 0x000000000312a7a9
(gdb) bt
#0  0x000000000312a7a9 in seastar::jmp_buf_link::switch_out (this=<optimized out>) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/src/core/thread.cc:137
#1  0x000000000312a309 in seastar::thread_context::switch_out (this=<optimized out>) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/src/core/thread.cc:246
#2  0x0000000002e0f1c3 in seastar::internal::future_base::do_wait (this=this@entry=0x60200359f5a0) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/src/core/future.cc:223
#3  0x00000000025c2863 in seastar::future<>::wait() (this=0x60200359f5a0) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/include/seastar/core/future.hh:1424
#4  seastar::future<>::wait() (this=0x60200359f5a0) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/include/seastar/core/future.hh:1416
#5  seastar::future<>::get() (this=0x60200359f5a0) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/include/seastar/core/future.hh:1390
#6  row_level_repair::run()::{lambda()#1}::operator()() const (__closure=<optimized out>) at repair/row_level.cc:2569
#7  0x00000000025c38a4 in std::__invoke_impl<void, row_level_repair::run()::{lambda()#1}>(std::__invoke_other, row_level_repair::run()::{lambda()#1}&&) (__f=...) at /usr/include/c++/10/bits/invoke.h:60
#8  std::__invoke<row_level_repair::run()::{lambda()#1}>(std::__invoke_result&&, (row_level_repair::run()::{lambda()#1}&&)...) (__fn=...) at /usr/include/c++/10/bits/invoke.h:95
#9  std::__apply_impl<row_level_repair::run()::{lambda()#1}, std::tuple<>>(row_level_repair::run()::{lambda()#1}&&, std::tuple<>&&, std::integer_sequence<unsigned long>) (__t=..., __f=...) at /usr/include/c++/10/tuple:1723
#10 std::apply<row_level_repair::run()::{lambda()#1}, std::tuple<> >(row_level_repair::run()::{lambda()#1}&&, std::tuple<>&&) (__t=..., __f=...) at /usr/include/c++/10/tuple:1734
#11 seastar::futurize<void>::apply<row_level_repair::run()::{lambda()#1}>(row_level_repair::run()::{lambda()#1}&&, std::tuple<>&&) (args=..., func=...) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/include/seastar/core/future.hh:1976
#12 seastar::async<row_level_repair::run()::{lambda()#1}>(seastar::thread_attributes, std::decay&&, (std::decay<row_level_repair::run()::{lambda()#1}>::type&&)...)::{lambda()#1}::operator()() const (

Repair is blocked here:
https://github.com/scylladb/scylla/blob/b052f3f5cea30a8b518981310870930742d5addd/repair/row_level.cc#L2567-L2569

@asias

(gdb) p &master
$3 = (repair_meta *) 0x60200359f870

EDIT: on shard 2

I inspected another repair instance, same backtrace:

(gdb) p &master
$2 = (repair_meta *) 0x60200313f870

node 5 is using the "bad" 5.8 kernel:

2020-08-04T16:23:56+00:00 longevity-tls-50gb-4d-4-2-db-node-d47cc9a7-5 !NOTICE | kernel: Linux version 5.8.0-1.el7.elrepo.x86_64 (mockbuild@Build64R7) (gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), GNU ld version 2.32-16.el7 ) #1 SMP Sun Aug 2 18:18:16 EDT 2020

In frame 3:

(gdb) p this  
$13 = (seastar::future<> * const) 0x60200313f5a0  
(gdb) p this->_promise._task
$15 = (seastar::task *) 0x60200313f4b0
(gdb) p *$15
$16 = {                                                                                                                        
  _vptr.task = 0x73a060 <vtable for seastar::(anonymous namespace)::thread_wake_task+16>,
  _sg = {                                                                                                                      
    _id = 2
  } 
}        
(gdb) scylla find --resolve 0x60200313f4b0
thread 3, small (size <= 96), live (0x602000c92720 +64) 0x000000000073abe0 vtable for seastar::parallel_for_each_state + 16 
(gdb) p $downcast_vptr(0x602000c92720)      
$17 = (seastar::parallel_for_each_state *) 0x602000c92720
(gdb) p *$17                         
$18 = {                                                                                                                        
  <seastar::continuation_base<>> = {      
    <seastar::task> = {                                                                                                        
      _vptr.task = 0x73abe0 <vtable for seastar::parallel_for_each_state+16>,
      _sg = {                             
        _id = 2                            
      }                                                                                                                        
    },                   
    members of seastar::continuation_base<>:        
    _state = {                                  
      <seastar::future_state_base> = {          
        _u = {                                  
          st = seastar::future_state_base::state::future,
          ex = {                                
            _M_exception_object = 0x1           
          }                                     
        }                                       
      },            
      <seastar::internal::uninitialized_wrapper<std::tuple<> >> = {
        <seastar::internal::uninitialized_wrapper_base<std::tuple<>, true>> = {
          <std::tuple<>> = {<No data fields>}, <No data fields>}, <No data fields>}, 
    }                         
  },   
  members of seastar::parallel_for_each_state:
  _incomplete = size 0 = [],
  _result = {                
    <seastar::internal::promise_base_with_type<>> = {
      <seastar::internal::promise_base> = {
        _future = 0x60200313f5a0,                  
        _state = 0x60200313f5a8,
        _task = 0x60200313f4b0 
      }, 
    }, 
    members of seastar::promise<>:
    _local_state = {
      <seastar::future_state_base> = {
        _u = {
          st = seastar::future_state_base::state::invalid,
          ex = {
            _M_exception_object = 0x0
          }
        }
      }, 
      <seastar::internal::uninitialized_wrapper<std::tuple<> >> = {
        <seastar::internal::uninitialized_wrapper_base<std::tuple<>, true>> = {
          <std::tuple<>> = {<No data fields>}, <No data fields>}, <No data fields>}, 
    }
  },
  _ex = {
    _M_exception_object = 0x0
  }
}
(gdb) scylla task-queues 
   id name                             shares  tasks
   00 "main"                           1000.00 0
   01 "atexit"                         1000.00 0
   02 "streaming"                       200.00 0
   03 "compaction"                       67.82 0
   04 "mem_compaction"                 1000.00 0
   05 "statement"                      1000.00 0
   06 "memtable"                          9.77 0
   07 "memtable_to_cache"               200.00 0
(gdb) p *$17->_result->_state
$23 = {
  _u = {
    st = seastar::future_state_base::state::future,
    ex = {
      _M_exception_object = 0x1
    }
  }
}

The parallel_for_each state has no incomplete futures, yet the future of the parallel for each state is unavailable.

This parallel for each turned out to be a red herring. There is one future it is still waiting on:

(gdb) scylla fiber 0x6020163a0ac0
Stopping because loop is detected: task 0x0000602014659980 was seen before.
Starting task: (task*) 0x00006020163a0ac0 0x0000000000408058 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIJEEEZNS_6futureIJSt5tupleIJEEEE14discard_resultEvEUlDpOT_E_ZZNS7_14then_impl_nrvoISB_NS4_IJEEEEET0_OT_ENKUlvE_clEvEUlRS3_RSB_ONS_12future_stateIJS6_EEEE_JS6_EEE + 16 
#0  (task*) 0x0000602000c92840 0x000000000073abe0 vtable for seastar::parallel_for_each_state + 16 
#1  (task*) 0x00006020187ace38 0x000000000060e998 vtable for seastar::continuation<seastar::internal::promise_base_with_type<>, seastar::future<>::handle_exception<repair_writer::wait_for_writer_done()::{lambda(std::__exception_ptr::exception_ptr)#2}>(repair_writer::wait_for_writer_done()::{lambda(std::__exception_ptr::exception_ptr)#2}&&)::{lambda(auto:1)#1}, seastar::future<>::then_wrapped_nrvo<seastar::future<>, repair_writer::wait_for_writer_done()::{lambda(std::__exception_ptr::exception_ptr)#2}&&>(seastar::future<>&&)::{lambda()#1}::operator()() const::{lambda(seastar::internal::promise_base_with_type<>&, auto:1&, seastar::future_state<>&&)#1}> + 16 
#2  (task*) 0x000060200c5020c0 0x000000000060ecd8 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIJEEEZNS_6futureIJEE12finally_bodyIZN11repair_meta4stopEvEUlvE_Lb1EEclEOS5_EUlOT_E_ZZNS5_17then_wrapped_nrvoIS5_SD_EENS_8futurizeISB_E4typeEOT0_ENKUlvE_clEvEUlRS3_RSD_ONS_12future_stateIJEEEE_JEEE + 16 
#3  (task*) 0x0000602000c92720 0x000000000073abe0 vtable for seastar::parallel_for_each_state + 16 
#4  (task*) 0x000060200313f4b0 0x000000000073a060 vtable for seastar::(anonymous namespace)::thread_wake_task + 16 
#5  (task*) 0x0000602014659980 0x0000000000799e48 vtable for seastar::thread_context + 16 

Found no further pointers to task objects.
If you think there should be more, run `scylla fiber 0x0000602014659980 --verbose` to learn more.

The last future is blocked on:

(gdb) scylla find --resolve 0x6020163a0ac0
thread 3, small (size <= 192), live (0x602000ae6540 +152) 0x00000000003a6308 vtable for seastar::internal::when_all_state<seastar::internal::extract_values_from_futures_tuple<seastar::future<>, seastar::future<> >, seastar::future<>, seastar::future<> > 
+ 16 

(gdb) p $downcast_vptr(0x602000ae6540)      
$39 = (seastar::internal::when_all_state<seastar::internal::extract_values_from_futures_tuple<seastar::future<>, seastar::future<> >, seastar::future<>, seastar::future<> > *) 0x602000ae6540
(gdb) p $39->tuple
$41 = {
  <std::_Tuple_impl<0, seastar::future<>, seastar::future<> >> = {
    <std::_Tuple_impl<1, seastar::future<> >> = {
      <std::_Head_base<1, seastar::future<>, false>> = {
        _M_head_impl = {
          <seastar::internal::future_base> = {
            _promise = 0x0
          }, 
          <seastar::internal::warn_variadic_future<false>> = {<No data fields>}, 
          members of seastar::future<>:
          _state = {
            <seastar::future_state_base> = {
              _u = {
                st = seastar::future_state_base::state::future,
                ex = {
                  _M_exception_object = 0x1
                }
              }
            }, 
            <seastar::internal::uninitialized_wrapper<std::tuple<> >> = {
              <seastar::internal::uninitialized_wrapper_base<std::tuple<>, true>> = {
                <std::tuple<>> = {<No data fields>}, <No data fields>}, <No data fields>}, 
          }
        }
      }, <No data fields>}, 
    <std::_Head_base<0, seastar::future<>, false>> = {
      _M_head_impl = {
        <seastar::internal::future_base> = {
          _promise = 0x0
        }, 
        <seastar::internal::warn_variadic_future<false>> = {<No data fields>}, 
        members of seastar::future<>:
        _state = {
          <seastar::future_state_base> = {
            _u = {
              st = seastar::future_state_base::state::result,
              ex = {
                _M_exception_object = 0x3
              }
            }
          }, 
          <seastar::internal::uninitialized_wrapper<std::tuple<> >> = {
            <seastar::internal::uninitialized_wrapper_base<std::tuple<>, true>> = {
              <std::tuple<>> = {<No data fields>}, <No data fields>}, <No data fields>}, 
        }
      }
    }, <No data fields>}, <No data fields>}

The first future is unavailable, this is: write_end_of_stream(node_idx).

Full fiber:

(gdb) scylla fiber 0x60200ca0eb28
Stopping because loop is detected: task 0x0000602014659980 was seen before.
Starting task: (task*) 0x000060200ca0eb28 0x0000000000628068 vtable for seastar::continuation<seastar::internal::promise_base_with_type<>, seastar::future<>::finally_body<seastar::smp::submit_to<mutation_writer::multishard_writer::consume(unsigned int)::{lambda()#1}>(unsigned int, seastar::smp_submit_to_options, std::result_of&&)::{lambda()#1}, false>, seastar::future<>::then_wrapped_nrvo<seastar::future<>, {lambda()#1}>({lambda()#1}&&)::{lambda()#1}::operator()() const::{lambda(seastar::internal::promise_base_with_type<>&, {lambda()#1}&, seastar::future_state<>&&)#1}> + 16 
#0  (task*) 0x00006020008b8e38 0x0000000000628008 vtable for seastar::continuation<seastar::internal::promise_base_with_type<>, seastar::future<>::handle_exception<mutation_writer::multishard_writer::consume(unsigned int)::{lambda(std::__exception_ptr::exception_ptr)#2}>(mutation_writer::multishard_writer::consume(unsigned int)::{lambda(std::__exception_ptr::exception_ptr)#2}&&)::{lambda(auto:1)#1}, seastar::future<>::then_wrapped_nrvo<seastar::future<>, mutation_writer::multishard_writer::consume(unsigned int)::{lambda(std::__exception_ptr::exception_ptr)#2}&&>(seastar::future<>&&)::{lambda()#1}::operator()() const::{lambda(seastar::internal::promise_base_with_type<>&, auto:1&, seastar::future_state<>&&)#1}> + 16 
#1  (task*) 0x0000602002415c30 0x000000000051a950 vtable for seastar::continuation<seastar::internal::promise_base_with_type<>, seastar::internal::complete_when_all<seastar::internal::extract_values_from_futures_vector<seastar::future<> >, seastar::future<> >(std::vector<seastar::future<>, std::allocator<seastar::future<> > >&&, std::vector<seastar::future<>, std::allocator<seastar::future<> > >::iterator)::{lambda(auto:1)#1}, seastar::future<>::then_wrapped_nrvo<seastar::future<>, {lambda(auto:1)#1}>({lambda(auto:1)#1}&&)::{lambda()#1}::operator()() const::{lambda(seastar::internal::promise_base_with_type<>&, {lambda(auto:1)#1}&, seastar::future_state<>&&)#1}> + 16 
#2  (task*) 0x00006020163a0480 0x0000000000628088 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIJEEEZNS_6futureIJEE12finally_bodyIZN15mutation_writer17multishard_writerclEvEUlvE_Lb1EEclEOS5_EUlOT_E_ZZNS5_17then_wrapped_nrvoIS5_SE_EENS_8futurizeISC_E4typeEOT0_ENKUlvE_clEvEUlRS3_RSE_ONS_12future_stateIJEEEE_JEEE + 16 
#3  (task*) 0x0000602012e95dc8 0x0000000000627fe8 vtable for seastar::continuation<seastar::internal::promise_base_with_type<unsigned long>, mutation_writer::multishard_writer::operator()()::{lambda()#2}, seastar::future<>::then_impl_nrvo<{lambda()#2}, mutation_writer::multishard_writer::operator()()::{lambda()#2}<unsigned long> >({lambda()#2}&&)::{lambda()#1}::operator()() const::{lambda(seastar::internal::promise_base_with_type<unsigned long>&, {lambda()#2}&, seastar::future_state<>&&)#1}> + 16 
#4  (task*) 0x00006020152c4a80 0x00000000006283c8 vtable for seastar::internal::do_with_state<std::tuple<mutation_writer::multishard_writer, utils::phased_barrier::operation>, seastar::future<unsigned long> > + 16 
#5  (task*) 0x000060201c8378c0 0x000000000060dc58 vtable for seastar::continuation<seastar::internal::promise_base_with_type<>, repair_writer::create_writer(seastar::sharded<database>&, unsigned int)::{lambda(unsigned long)#2}, seastar::future<unsigned long>::then_impl_nrvo<{lambda(unsigned long)#2}, repair_writer::create_writer(seastar::sharded<database>&, unsigned int)::{lambda(unsigned long)#2}<> >({lambda(unsigned long)#2}&&)::{lambda()#1}::operator()() const::{lambda(seastar::internal::promise_base_with_type<>&, {lambda(unsigned long)#2}&, seastar::future_state<unsigned long>&&)#1}, unsigned long> + 16 
#6  (task*) 0x0000602001f42f00 0x000000000060e958 vtable for seastar::continuation<seastar::internal::promise_base_with_type<>, seastar::future<>::handle_exception<repair_writer::create_writer(seastar::sharded<database>&, unsigned int)::{lambda(std::__exception_ptr::exception_ptr)#3}>(repair_writer::create_writer(seastar::sharded<database>&, unsigned int)::{lambda(std::__exception_ptr::exception_ptr)#3}&&)::{lambda(auto:1)#1}, seastar::future<>::then_wrapped_nrvo<seastar::future<>, repair_writer::create_writer(seastar::sharded<database>&, unsigned int)::{lambda(std::__exception_ptr::exception_ptr)#3}&&>(seastar::future<>&&)::{lambda()#1}::operator()() const::{lambda(seastar::internal::promise_base_with_type<>&, auto:1&, seastar::future_state<>&&)#1}> + 16 
#7  (task*) 0x0000602000ae6580 0x00000000003a62e8 vtable for seastar::internal::when_all_state_component<seastar::future<> > + 16 
#8  (task*) 0x00006020163a0ac0 0x0000000000408058 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIJEEEZNS_6futureIJSt5tupleIJEEEE14discard_resultEvEUlDpOT_E_ZZNS7_14then_impl_nrvoISB_NS4_IJEEEEET0_OT_ENKUlvE_clEvEUlRS3_RSB_ONS_12future_stateIJS6_EEEE_JS6_EEE + 16 
#9  (task*) 0x0000602000c92840 0x000000000073abe0 vtable for seastar::parallel_for_each_state + 16 
#10 (task*) 0x00006020187ace38 0x000000000060e998 vtable for seastar::continuation<seastar::internal::promise_base_with_type<>, seastar::future<>::handle_exception<repair_writer::wait_for_writer_done()::{lambda(std::__exception_ptr::exception_ptr)#2}>(repair_writer::wait_for_writer_done()::{lambda(std::__exception_ptr::exception_ptr)#2}&&)::{lambda(auto:1)#1}, seastar::future<>::then_wrapped_nrvo<seastar::future<>, repair_writer::wait_for_writer_done()::{lambda(std::__exception_ptr::exception_ptr)#2}&&>(seastar::future<>&&)::{lambda()#1}::operator()() const::{lambda(seastar::internal::promise_base_with_type<>&, auto:1&, seastar::future_state<>&&)#1}> + 16 
#11 (task*) 0x000060200c5020c0 0x000000000060ecd8 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIJEEEZNS_6futureIJEE12finally_bodyIZN11repair_meta4stopEvEUlvE_Lb1EEclEOS5_EUlOT_E_ZZNS5_17then_wrapped_nrvoIS5_SD_EENS_8futurizeISB_E4typeEOT0_ENKUlvE_clEvEUlRS3_RSD_ONS_12future_stateIJEEEE_JEEE + 16 
#12 (task*) 0x0000602000c92720 0x000000000073abe0 vtable for seastar::parallel_for_each_state + 16 
#13 (task*) 0x000060200313f4b0 0x000000000073a060 vtable for seastar::(anonymous namespace)::thread_wake_task + 16 
#14 (task*) 0x0000602014659980 0x0000000000799e48 vtable for seastar::thread_context + 16 

One of the repair instance hangs on repair_row_level_stop below

2569             parallel_for_each(nodes_to_stop, [&] (const gms::inet_address& node) {
2570                 return master.repair_row_level_stop(node, _ri.keyspace, _cf_name, _range);
2571             }).get();

which in turn calls

 783     future<> stop() {
 784         auto gate_future = _gate.close();
 785         auto f1 = _sink_source_for_get_full_row_hashes.close();
 786         auto f2 = _sink_source_for_get_row_diff.close();
 787         auto f3 = _sink_source_for_put_row_diff.close();
 788         return when_all_succeed(std::move(gate_future), std::move(f1), std::move(f2), std::move(f3)).discard_result().finally([this] {
 789             return _repair_writer.wait_for_writer_done();
 790         });
 791     }

The core shows it was waiting for _repair_writer.wait_for_writer_done, and the flat mutation reader associated with _mq[node_idx] has had empty buffer and it was in the status of EOS. This suggests the consumer of this reader (sstable consumer which reads data from the reader and writes to disk) has read the data out.

 611     future<> wait_for_writer_done() {
 612         return parallel_for_each(boost::irange(unsigned(0), unsigned(_nr_peer_nodes)), [this] (unsigned node_idx) {
 613             return when_all_succeed(write_end_of_stream(node_idx), do_wait_for_writer_done(node_idx)).discard_result();
 614         }).handle_exception([this] (std::exception_ptr ep) {
 615             rlogger.warn("repair_writer: keyspace={}, table={}, wait_for_writer_done failed: {}",
 616                     _schema->ks_name(), _schema->cf_name(), ep);
 617             return make_exception_future<>(std::move(ep));
 618         });
 619     }

The sstable consumer:

86 future<> shard_writer::consume() {
 87     return _reader.peek(db::no_timeout).then([this] (mutation_fragment* mf_ptr) {
 88         if (mf_ptr) {
 89             return _consumer(std::move(_reader));
 90         }
 91         return make_ready_future<>();
 92     });
 529                 auto consumer = cs.make_interposer_consumer(metadata,
 530                         [t = std::move(t), use_view_update_path, adjusted_estimated_partitions] (flat_mutation_reader reader) {
 531                     sstables::shared_sstable sst = use_view_update_path ? t->make_streaming_staging_sstable() : t->make_streaming_sstable_for_write();
 532                     schema_ptr s = reader.schema();
 533                     auto& pc = service::get_local_streaming_priority();
 534                     return sst->write_components(std::move(reader), std::max(1ul, adjusted_estimated_partitions), s,
 535                                                  t->get_sstables_manager().configure_writer(),
 536                                                  encoding_stats{}, pc).then([sst] {
 537                         return sst->open_data();
 538                     }).then([t, sst] {
 539                         return t->add_sstable_and_update_cache(sst);
 540                     }).then([t, s, sst, use_view_update_path]() mutable -> future<> {
 541                         if (!use_view_update_path) {
 542                             return make_ready_future<>();
 543                         }
 544                         return _view_update_generator->local().register_staging_sstable(sst, std::move(t));
 545                     });
 546                 });

In other words, repair had sent end of stream to the reader above at line 534, but the sst->write_components or other future in the chain did not return, so shard_writer::consume did not return.

(gdb) p _view_update_generator                                                                                                                                                                                                                                $40 = (seastar::distributed *) 0x3409b30 <main::{lambda()#1}::operator()() const::{lambda()#2}::operator()() const::view_update_generator>                                                                                                                    (gdb) p $sharded_local($40)                                                                                                                                                                                                                                   
$41 = (db::view::view_update_generator *) 0x602000579e10                                                                                                                                                                                                      (gdb) p *$41 
(gdb) p $41->_registration_sem
$43 = {
  <seastar::named_semaphore_exception_factory> = {
    name = "view update generator"
  }, 
  members of seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>:
  _count = 0,
  _ex = {
    _M_exception_object = 0x0
  },
  _wait_list = {
    _front = {
      _M_t = {
        <std::__uniq_ptr_impl<seastar::expiring_fifo<seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::entry, seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::expiry_handler, std::chrono::_V2::steady_clock>::entry, std::default_delete<seastar::expiring_fifo<seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::entry, seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::expiry_handler, std::chrono::_V2::steady_clock>::entry> >> = {
          _M_t = {
            <std::_Tuple_impl<0, seastar::expiring_fifo<seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::entry, seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::expiry_handler, std::chrono::_V2::steady_clock>::entry*, std::default_delete<seastar::expiring_fifo<seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::entry, seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::expiry_handler, std::chrono::_V2::steady_clock>::entry> >> = {
              <std::_Tuple_impl<1, std::default_delete<seastar::expiring_fifo<seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::entry, seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::expiry_handler, std::chrono::_V2::steady_clock>::entry> >> = {
                <std::_Head_base<1, std::default_delete<seastar::expiring_fifo<seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::entry, seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::expiry_handler, std::chrono::_V2::steady_clock>::entry>, true>> = {
                  <std::default_delete<seastar::expiring_fifo<seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::entry, seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::expiry_handler, std::chrono::_V2::steady_clock>::entry>> = {<No data fields>}, <No data fields>}, <No data fields>}, 
              <std::_Head_base<0, seastar::expiring_fifo<seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::entry, seastar::basic_semaphore<seastar::named_semaphore_exception_factory>::expiry_handler, std::chrono::_V2::steady_clock>::entry*, false>> = {
                _M_head_impl = 0x0
              }, <No data fields>}, <No data fields>}
        }, <No data fields>}
    },
    _list = {
      _front_chunk = 0x602019a28000,
      _back_chunk = 0x602019a28000,
      _nchunks = 1,
      _free_chunks = 0x0,
      _nfree_chunks = 0
    },
    _on_expiry = {
      <seastar::named_semaphore_exception_factory> = {
        name = "view update generator"
      }, <No data fields>},
    _size = 24
  }
}

There are no available units and 24 waiters ... and 24 stuck repairs. It seems repairs are blocked on view generation.

(gdb) p &$41->_started
$46 = (seastar::future<> *) 0x602000579e30
(gdb) scylla ptr 0x602000579e30
thread 3, small (size <= 384), live (0x602000579e00 +48)
(gdb) scylla find --resolve 0x602000579e30
thread 3, small (size <= 56), live (0x60200dbf3d58 +24) 0x000000000051b058 vtable for seastar::continuation<seastar::internal::promise_base_with_type<>, seastar::future<>::finally_body<seastar::async<db::view::view_update_generator::start()::{lambda()#1}>(seastar::thread_attributes, std::decay&&, (std::decay<db::view::view_update_generator::start()::{lambda()#1}>::type&&)...)::{lambda()#3}, false>, seastar::future<>::then_wrapped_nrvo<seastar::future<>, {lambda()#3}>({lambda()#3}&&)::{lambda()#1}::operator()() const::{lambda(seastar::internal::promise_base_with_type<>&, {lambda()#3}&, seastar::future_state<>&&)#1}> + 16 
(gdb) scylla find --resolve 0x60200dbf3d58
thread 3, small (size <= 64), live (0x602020839580 +40) 0x000000000051b038 _ZTVN7seastar12continuationINS_8internal22promise_base_with_typeIJEEEZNS_5asyncIZN2db4view21view_update_generator5startEvEUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayIT_E4typeEDpNSB_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSC_DpOSF_EUlvE0_ZZNS_6futureIJEE14then_impl_nrvoISS_SU_EET0_SP_ENKUlvE_clEvEUlRS3_RSS_ONS_12future_stateIJEEEE_JEEE + 16 
(gdb) scylla find --resolve 0x602020839580
thread 3, small (size <= 384), live (0x602009bf8c00 +304) 0x0000000000799e48 vtable for seastar::thread_context + 16 
(gdb) p (seastar::thread_context*)0x602009bf8c00
$47 = (seastar::thread_context *) 0x602009bf8c00
(gdb) python switch('*$47')
switch to (seastar::thread_context*) 0x0000602009bf8c00
fiber select 0x000060201679fc20 0x000060201679ff80 0x7ffffffffffffff8 0x0000000000000000 0x000060201679feb0 0x00006020135c0d20 0x000060201679fae0 0x000000000312a7a9
(gdb) bt 7
#0  0x000000000312a7a9 in seastar::jmp_buf_link::switch_out (this=<optimized out>) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/src/core/thread.cc:137
#1  0x000000000312a309 in seastar::thread_context::switch_out (this=<optimized out>) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/src/core/thread.cc:246
#2  0x0000000002e0f1c3 in seastar::internal::future_base::do_wait (this=this@entry=0x60201679fc20) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/src/core/future.cc:223
#3  0x0000000001e9196e in seastar::future<>::wait() (this=0x60201679fc20) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/include/seastar/core/future.hh:1424
#4  seastar::future<>::wait() (this=<optimized out>) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/include/seastar/core/future.hh:1416
#5  seastar::future<>::get() (this=<optimized out>) at /jenkins/workspace/scylla-4.2/next/scylla/seastar/include/seastar/core/future.hh:1390
#6  operator() (__closure=<optimized out>) at db/view/view_update_generator.cc:42
(More stack frames follow...)
  2         while (!_as.abort_requested()) {
  1             if (_sstables_with_tables.empty()) {
 42                  _pending_sstables.wait().get();
  1             }

This looks very wrong. The view generation loop is idle (waiting for more sstables), yet the repair instances that registered sstables are still blocked.

I double checked and 4.2.rc2 has fea83f6ae, which is supposed to fix the register semaphore. But maybe there is another bug I overlooked, or the fix itself introduced one.

I think the bug is in this loop:

 50             for (auto table_it = _sstables_with_tables.begin(); table_it != _sstables_with_tables.end(); table_it = _sstables_with_tables.erase(table_it)) {

This will roughly translate to:

auto table_it = _sstables_with_tables.begin();
label: LOOP_START;
//loop body
table_it = _sstables_with_tables.erase(table_it);
if (table_it != _sstables_with_tables.end()) {
    goto LOOP_START;
}

But while loop body is running, new sstables can be added to _sstables_wit_tables for the processed table. These will be discarded and hence the associated count units will be leaked.

I think the bug is in this loop:

 50             for (auto table_it = _sstables_with_tables.begin(); table_it != _sstables_with_tables.end(); table_it = _sstables_with_tables.erase(table_it)) {

This will roughly translate to:

auto table_it = _sstables_with_tables.begin();
label: LOOP_START;
//loop body
table_it = _sstables_with_tables.erase(table_it);
if (table_it != _sstables_with_tables.end()) {
    goto LOOP_START;
}

But while loop body is running, new sstables can be added to _sstables_wit_tables for the processed table. These will be discarded and hence the associated count units will be leaked.

Actually for loops test their condition before running the loop body. Still I think the bug will be here somewhere. I will write exploratory unit tests to confirm.

On Mon, Aug 17, 2020 at 11:06 AM Botond Dénes notifications@github.com
wrote:

I think the bug is in this loop:

50 for (auto table_it = _sstables_with_tables.begin(); table_it != _sstables_with_tables.end(); table_it = _sstables_with_tables.erase(table_it)) {

This will roughly translate to:

auto table_it = _sstables_with_tables.begin();
label: LOOP_START;//loop body
table_it = _sstables_with_tables.erase(table_it);if (table_it != _sstables_with_tables.end()) {
goto LOOP_START;
}

But while loop body is running, new sstables can be added to
_sstables_wit_tables for the processed table. These will be discarded and
hence the associated count units will be leaked.

Actually for loops test their condition before running the loop body.
Still I think the bug will be here somewhere. I will write exploratory unit
tests to confirm.

I think register_staging_sstable() now potentially can race with the
iteration introduced here
https://github.com/scylladb/scylla/commit/fea83f6ae0ebc01bf339cea1b3afec7fed1ad614
.
For example, as we iterate through _sstables_with_tables,
register_staging_sstable() can introduce a new table or re-add a table
previously removed. Prior to regression, we'd copy all tables in advance,
then iterate through them.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/scylladb/scylla/issues/6892#issuecomment-674902625,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAKYA4YQUA24SYLQTWVIQRLSBE2ORANCNFSM4PDPC26Q
.

Reproduced with a unit test.

Patch on the list: [PATCH v1] view_update_generator: fix race between registering and processing sstables

Was this page helpful?
0 / 5 - 0 ratings