scylla-server.service start operation timed out: stuck after Completed migration of legacy schema tables

Created on 15 Jan 2019  路  17Comments  路  Source: scylladb/scylla

Installation details
Scylla version (or git commit hash): 3.0.0-0.20190114.d427a23d4
Cluster size: 1
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-043a1330dde5d121f

[centos@ip-172-30-0-22 ~]$ uname -a
Linux ip-172-30-0-22.ec2.internal 4.20.2-1.el7.elrepo.x86_64 #1 SMP Sun Jan 13 09:57:55 EST 2019 x86_64 x86_64 x86_64 GNU/Linux

  • Job log: http://jenkins.cloudius-systems.com:8080/job/scylla-3.0/job/3.0-test/job/scylla-ami-test/label=muninn/62/artifact/job-2019-01-14T23.52-c5e759d/job.log
    The repeated run 63 of this job (with same AMI-id) are all PASS.

    The artifact test scenario:

  • wait service to be up on AMI (PASS)

  • test_after_install, cs write & mixed workloads (PASS)
  • test_after_stop_start, cs write & mixed workloads (FAILED)
  • test_after_restart, cs write & mixed workloads (PASS)
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]:  [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal systemd[1]: scylla-server.service start operation timed out. Terminating.
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]:  [shard 0] compaction_manager - Asked to stop
Jan 14 22:16:10 ip-172-30-0-72.ec2.internal systemd[1]: scylla-server.service stop-final-sigterm timed out. Killing.
Jan 14 22:16:10 ip-172-30-0-72.ec2.internal systemd[1]: scylla-server.service: main process exited, code=killed, status=9/KILL
Jan 14 22:16:10 ip-172-30-0-72.ec2.internal systemd[1]: Failed to start Scylla Server.

```
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: Scylla version 3.0.0-0.20190114.d427a23d4 starting ...
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] snitch_logger - Ec2Snitch using region: us-east, zone: 1b.
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] init - Scylla API server listening on 127.0.0.1:10000 ...
.....
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 14] database - Row: max_vector_size: 32, internal_count: 5
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Populating Keyspace system_schema
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system_schema: Reading CF view_virtual_columns id=08843b63-45dc-3be2-9798-a0418295cfaa version=ac302a39-2fb0-3def-bf61-d0f348567f1a
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system_schema: Reading CF aggregates id=924c5587-2e3a-345b-b10c-12f37c1ba895 version=9556aaa6-f8df-381c-997e-fc7aedc43e05
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system_schema: Reading CF functions id=96489b79-80be-3e14-a701-66a0b9159450 version=76a4e4f7-0adf-381b-9da4-1066ada12806
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system_schema: Reading CF tables id=afddfb9d-bc1e-3068-8056-eed6c302ba09 version=c179c1d7-9503-3f66-a5b3-70e72af3392a
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system_schema: Reading CF dropped_columns id=5e7583b5-f3f4-3af1-9a39-b7e1d6f5f11f version=d4e2a352-3e28-3297-b746-461236df0a1e
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system_schema: Reading CF views id=9786ac1c-dd58-3201-a7cd-ad556410c985 version=6e894071-2cfc-3ce6-96a2-c5c1d7a86996
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system_schema: Reading CF triggers id=4df70b66-6b05-3251-95a1-32b54005fd48 version=72ceb49d-cfce-330b-8dbf-613aa8b16cd9
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system_schema: Reading CF types id=5a8b1ca8-6602-3f77-a045-9273d308917a version=4a083362-11d8-3c56-bf3b-071c1996ed2e
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system_schema: Reading CF keyspaces id=abac5682-dea6-31c5-b535-b3d6cffd0fb6 version=546e75bd-58af-3eb4-8017-5f7c54bad49c
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system_schema: Reading CF indexes id=0feb57ac-311f-382f-ba6d-9024d305702f version=a7860a8c-d89c-30b0-98ba-279ae67fae1e
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system_schema: Reading CF scylla_tables id=5d912ff1-f759-3665-b2c8-8042ab5103dd version=d15e4b38-4294-3cf3-b7e2-5485bd5c9367
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system_schema: Reading CF columns id=24101c25-a2ae-3af7-87c1-b40ee1aca33f version=a4fe505b-053c-3486-9d6a-bd283d568c10
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Populating Keyspace system
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF schema_functions id=d1b675fe-2b50-3ca4-8e49-c0f81989dcad version=e460c9b9-f16f-3fdb-b8c7-9726bf8bbd39
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF schema_usertypes id=3aa75225-4f82-350b-8d5c-430fa221fa0a version=e13d0c09-4360-3fdb-82e3-ce15a5d16646
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF schema_triggers id=0359bc71-7123-3ee1-9a4a-b9dfb11fc125 version=599a9774-a0ce-3c13-805c-4c41d6042e8b
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF schema_columns id=296e9c04-9bec-3085-827d-c17d3df2122a version=7841736d-7d32-31e8-9e25-3e9d47b9fc25
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF schema_keyspaces id=b0f22357-4458-3cdb-9631-c43e59ce3676 version=d18195e9-634f-31a6-87f8-5691d87376fc
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF peers id=37f71aca-7dc2-383b-a706-72528af04d4f version=739aff7c-afc5-30e3-bb9a-aa0b340feedb
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF local id=7ad54392-bcdd-35a6-8417-4e047860b377 version=3e8e18ea-3208-3205-8b39-e62193f76ba8
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF built_views id=4b3c50a9-ea87-3d76-9101-6dbc9c38494a version=7cb318b7-eab5-3637-9ad9-b9d99af3ad37
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF paxos id=b7b7f0c2-fd0a-3410-8c05-3ef614bb7c2d version=8174219f-df55-3f99-9185-c4b1cab65807
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF batchlog id=0290003c-977e-397c-ac3e-fdfdc01d626b version=6bcb22aa-5faa-3b0f-bcfe-8b93fec16d40
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF size_estimates id=618f817b-005f-3678-b8a4-53f3930b8e86 version=9d8e68fd-b530-3e22-ad57-8921880ef2f4
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF IndexInfo id=9f5c6374-d485-3229-9a0a-5094af9ad1e3 version=6d26f9cd-5481-3415-bd86-d5fc853eaade
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF large_partitions id=8a7fe624-96b0-34b1-b90e-f71bddcdd2d3 version=bbae8dd5-345f-3da6-83b1-eb9b26df351e
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF schema_columnfamilies id=45f5b360-24bc-3f83-a363-1034ea4fa697 version=b57bc185-37ec-3429-a019-38c2e3fa61cf
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF views_builds_in_progress id=b7f2c108-78cd-3c80-9cd5-d609b2bd149c version=b3f8840c-04f3-3120-ae35-3c74d35052d1
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF range_xfers id=55d76438-4e55-3f8b-9f6e-676d4af3976d version=c28c4e27-fc89-3655-be9b-83549e67e569
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF compactions_in_progress id=55080ab0-5d9c-3886-90a4-acb25fe1f77b version=4f0ae080-a13e-334e-a6b4-ab9ad38db42b
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF schema_aggregates id=a5fc57fc-9d6c-3bfd-a3fc-01ad54686fea version=2b664525-a49c-3c59-bb0a-76508543e624
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF scylla_views_builds_in_progress id=a04c7bfd-1e13-36c9-a44d-f22da352281d version=a763e6ee-f0ac-39f6-8297-f5d9f82c9f4c
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF hints id=2666e205-73ef-38b3-90fe-fecf96e8f0c7 version=22906784-f0f7-39a1-8376-9b21c8112dee
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF compaction_history id=b4dbb7b4-dc49-3fb5-b3bf-ce6e434832ca version=b7ecaf8a-49da-3462-8d54-510a017926b7
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF peer_events id=59dfeaea-8db2-3341-91ef-109974d81484 version=41ae6768-c385-316c-acc9-b5b749b3d85e
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] database - Keyspace system: Reading CF sstable_activity id=5a1ff267-ace0-3f12-8563-cfae6103c65e version=d69820df-9d03-3cd0-91b0-c078c030b708
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] legacy_schema_migrator - Moving 0 keyspaces from legacy schema tables to the new schema keyspace (system_schema)
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] legacy_schema_migrator - Dropping legacy schema tables
Jan 14 21:59:41 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal systemd[1]: scylla-server.service start operation timed out. Terminating.
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 0] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 1] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 3] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 5] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 2] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 4] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 6] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 14] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 9] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 13] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 27] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 10] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 28] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 25] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 15] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 19] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 23] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 11] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 22] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 17] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 30] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 39] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 12] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 7] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 8] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 18] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 47] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 20] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 16] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 31] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 29] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 56] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 42] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 60] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 38] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 52] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 41] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 45] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 44] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 32] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 33] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 35] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 34] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 37] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 36] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 21] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 46] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 26] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 49] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 48] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 50] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 51] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 57] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 55] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 24] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 53] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 59] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 58] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 61] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 43] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 54] compaction_manager - Asked to stop
Jan 14 22:14:40 ip-172-30-0-72.ec2.internal scylla[12113]: [shard 40] compaction_manager - Asked to stop
Jan 14 22:16:10 ip-172-30-0-72.ec2.internal systemd[1]: scylla-server.service stop-final-sigterm timed out. Killing.
Jan 14 22:16:10 ip-172-30-0-72.ec2.internal systemd[1]: scylla-server.service: main process exited, code=killed, status=9/KILL
Jan 14 22:16:10 ip-172-30-0-72.ec2.internal systemd[1]: Failed to start Scylla Server.
Jan 14 22:16:10 ip-172-30-0-72.ec2.internal systemd[1]: Dependency failed for Scylla JMX.
Jan 14 22:16:10 ip-172-30-0-72.ec2.internal systemd[1]: Job scylla-jmx.service/start failed with result 'dependency'.
Jan 14 22:16:10 ip-172-30-0-72.ec2.internal systemd[1]: Unit scylla-server.service entered failed state.
Jan 14 22:16:10 ip-172-30-0-72.ec2.internal systemd[1]: scylla-server.service failed.
Jan 14 22:16:11 ip-172-30-0-72.ec2.internal systemd[1]: scylla-server.service holdoff time over, scheduling restart.
Jan 14 22:16:11 ip-172-30-0-72.ec2.internal systemd[1]: Stopped Scylla Server.
Jan 14 22:16:11 ip-172-30-0-72.ec2.internal systemd[1]: Starting Scylla Server...
Jan 14 22:16:13 ip-172-30-0-72.ec2.internal scylla_prepare[25738]: File /etc/sysconfig/irqbalance.scylla.orig already exists - not overwriting.
Jan 14 22:16:13 ip-172-30-0-72.ec2.internal scylla_prepare[25738]: Restarting irqbalance: going to ban the following IRQ numbers: 435, 436, 437, 438, 439, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 475, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 597, 598, 599, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 696, 695, 697, 694, 693, 691, 692, 698 ...
Jan 14 22:16:13 ip-172-30-0-72.ec2.internal scylla_prepare[25738]: Restarting irqbalance via systemctl...
````

bug

All 17 comments

In the failed subtest, scylla service failed to start once, and it's started again.

Two series workloads for each subtests:

  • /bin/cassandra-stress write n=10000 -mode cql3 native -pop seq=1..10000
  • /bin/cassandra-stress mixed n=10000 -mode cql3 native -pop seq=1..10000

The failed subtest is the second subtest, so there were two (write + mixed) workloads were executed.

Reproduced again on http://jenkins.cloudius-systems.com:8080/view/scylla-3.0/job/scylla-3.0/job/3.0-test/job/scylla-ami-test/label=muninn/65/artifact/job-2019-01-16T04.02-634507a/job.log

http://jenkins.cloudius-systems.com:8080/job/scylla-3.0/job/3.0-test/job/scylla-ami-test/label=muninn/65/console
failed due to this issue (Roy asked to write this).

Shlomi asked me to reproduce this manually, I will update the result later after try.

two failed subtest were all using i3.16xlarge instance type, other instance types are PASS.

Paste a normal log (no stuck) here:

....
[shard 0] database - Keyspace system: Reading CF sstable_activity id=5a1ff267-ace0-3f12-8563-cfae6103c65e version=d69820df-9d03-3cd0-91b0-c078c030b708
[shard 0] legacy_schema_migrator - Moving 0 keyspaces from legacy schema tables to the new schema keyspace (system_schema)
[shard 0] legacy_schema_migrator - Dropping legacy schema tables
[shard 0] legacy_schema_migrator - Completed migration of legacy schema tables

<stuck point>

[shard 0] database - Populating Keyspace system_distributed
[shard 0] database - Keyspace system_distributed: Reading CF view_build_status id=5582b59f-8e4e-35e1-b913-3acada51eb04 version=e0b6e628-60e0-39a2-ada0-9b6c13bdd9bc
[shard 0] database - Populating Keyspace system_auth
[shard 0] database - Keyspace system_auth: Reading CF roles id=5bc52802-de25-35ed-aeab-188eecebb090 version=2eb99e1c-78f3-3e16-9153-45ca49453f56
[shard 0] database - Keyspace system_auth: Reading CF role_members id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d version=85dc0934-cff3-373f-b223-e128f41f498c
[shard 0] database - Populating Keyspace system_traces
[shard 0] database - Keyspace system_traces: Reading CF sessions id=c5e99f16-8677-3914-b17e-960613512345 version=5a8ea5ad-6053-3266-97f2-30f0fce8e5ae
[shard 0] database - Keyspace system_traces: Reading CF node_slow_log_time_idx id=f9706768-aa1e-3d87-9e5c-51a3927c2870 version=61066486-af58-3e1f-b4f9-0c32dc5748dc
[shard 0] database - Keyspace system_traces: Reading CF events id=8826e8e9-e16a-3728-8753-3bc1fc713c25 version=de62ac22-9c9a-3191-9ea4-9cc2b81748df
[shard 0] database - Keyspace system_traces: Reading CF sessions_time_idx id=0ebf001c-c1d1-3693-9a63-c3d96ac53318 version=ec055d8f-cd2b-3ecb-8bcc-01fa296d16bc
[shard 0] database - Keyspace system_traces: Reading CF node_slow_log id=bfcc4e62-5b63-3aa1-a1c3-6f5e47f3325c version=63800e21-4db6-3ec6-83aa-85ff749d8377
[shard 0] database - Populating Keyspace keyspace1
[shard 0] database - Keyspace keyspace1: Reading CF standard1 id=630fa390-1978-11e9-9f58-000000000005 version=38f67ce5-e413-31e2-9ad5-364da8d72a62
[shard 22] compaction - Compacting [/var/lib/scylla/data/system/peers-37f71aca7dc2383ba70672528af04d4f/la-146-big-Data.db:level=0, /var/lib/scylla/data/system/peers-37f71aca7dc23
[shard 21] compaction - Compacting [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/la-765-big-Data.db:level=0, /var/lib/scylla/data/system/local-7ad54392bcdd3
[shard 22] compaction - Compacted 2 sstables to [/var/lib/scylla/data/system/peers-37f71aca7dc2383ba70672528af04d4f/system-peers-ka-208-Data.db:level=0, ]. 9626 bytes to 5141 (~5
[shard 21] compaction - Compacted 31 sstables to [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/system-local-ka-2625-Data.db:level=0, ]. 162763 bytes to 1664
systemd[1]: Started Scylla Server.
[shard 0] storage_service - Loading persisted ring state
[shard 0] storage_service - load_peer_features: peer_features size=0
[shard 0] storage_service - Checking remote features skipped, since this node is a new seed node which knows nothing about the cluster
[shard 0] storage_service - Starting up server gossip
[shard 21] compaction - Compacting [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/system-local-ka-2687-Data.db:level=0, /var/lib/scylla/data/system/local-7ad
[shard 0] messaging_service - Starting Messaging Service on port 7000
[shard 21] compaction - Compacted 2 sstables to [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/system-local-ka-2749-Data.db:level=0, ]. 22109 bytes to 11899 
[shard 0] database - Schema version changed to 6c79c7fd-5c53-3ae5-a27c-92b9da9bc712
[shard 0] gossip - Feature RANGE_TOMBSTONES is enabled
[shard 0] gossip - Feature LARGE_PARTITIONS is enabled
[shard 0] gossip - Feature COUNTERS is enabled
[shard 0] gossip - Feature DIGEST_MULTIPARTITION_READ is enabled
[shard 0] gossip - Feature CORRECT_COUNTER_ORDER is enabled
[shard 0] gossip - Feature SCHEMA_TABLES_V3 is enabled
[shard 0] gossip - Feature CORRECT_NON_COMPOUND_RANGE_TOMBSTONES is enabled
[shard 0] gossip - Feature WRITE_FAILURE_REPLY is enabled
[shard 0] gossip - Feature XXHASH is enabled
[shard 0] gossip - Feature ROLES is enabled
[shard 0] gossip - Feature LA_SSTABLE_FORMAT is enabled
[shard 0] gossip - Feature STREAM_WITH_RPC_STREAM is enabled
[shard 0] gossip - Feature MATERIALIZED_VIEWS is enabled
[shard 0] gossip - Feature INDEXES is enabled
[shard 0] storage_service - Using saved tokens {998954100053477882, 8982938624862447225, 8634782436875040141, 8216618535133216496, 8212948169093760316, 8101927364982902128, 78505
[shard 21] compaction - Compacting [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/la-2811-big-Data.db:level=0, /var/lib/scylla/data/system/local-7ad54392bcdd
[shard 21] compaction - Compacted 2 sstables to [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/la-2873-big-Data.db:level=0, ]. 16783 bytes to 11892 (~70% of 
[shard 21] compaction - Compacting [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/la-2935-big-Data.db:level=0, /var/lib/scylla/data/system/local-7ad54392bcdd
[shard 0] storage_service - Node 172.30.0.22 state jump to normal
[shard 21] compaction - Compacted 2 sstables to [/var/lib/scylla/data/system/local-7ad54392bcdd35a684174e047860b377/la-2997-big-Data.db:level=0, ]. 22375 bytes to 11888 (~53% of 
[shard 22] compaction - Compacting [/var/lib/scylla/data/system/peers-37f71aca7dc2383ba70672528af04d4f/la-270-big-Data.db:level=0, /var/lib/scylla/data/system/peers-37f71aca7dc23
[shard 0] storage_service - NORMAL: node is now in normal status
[shard 0] storage_service - SSTable data integrity checker is disabled.
[shard 0] gossip - Waiting for gossip to settle before accepting client requests...
[shard 22] compaction - Compacted 2 sstables to [/var/lib/scylla/data/system/peers-37f71aca7dc2383ba70672528af04d4f/la-332-big-Data.db:level=0, ]. 9954 bytes to 5141 (~51% of ori
[shard 0] gossip - No gossip backlog; proceeding
[shard 0] storage_service - Starting listening for CQL clients on 0.0.1.0:9042 (unencrypted)
[shard 0] storage_service - Thrift server listening on 0.0.0.0:9160 ...

How long does it take to pass the problem point in a passed test?

one second

Jan 16 10:23:51 ip-172-30-0-22.ec2.internal scylla[10431]:  [shard 0] database - Keyspace system: Reading CF sstable_activity id=5a1ff267-ace0-3f12-8563-c
Jan 16 10:23:51 ip-172-30-0-22.ec2.internal scylla[10431]:  [shard 0] legacy_schema_migrator - Moving 0 keyspaces from legacy schema tables to the new sch
Jan 16 10:23:51 ip-172-30-0-22.ec2.internal scylla[10431]:  [shard 0] legacy_schema_migrator - Dropping legacy schema tables
Jan 16 10:23:51 ip-172-30-0-22.ec2.internal scylla[10431]:  [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables
Jan 16 10:23:52 ip-172-30-0-22.ec2.internal scylla[10431]:  [shard 0] database - Populating Keyspace system_distributed
Jan 16 10:23:52 ip-172-30-0-22.ec2.internal scylla[10431]:  [shard 0] database - Keyspace system_distributed: Reading CF view_build_status id=5582b59f-8e4
Jan 16 10:23:52 ip-172-30-0-22.ec2.internal scylla[10431]:  [shard 0] database - Populating Keyspace system_auth

Reproduced:

ssh -i ~/.ssh/scylla-qa-ec2.pem -l centos 3.88.188.168

It stuck 15 mins

Jan 16 11:22:55 ip-172-30-0-22.ec2.internal scylla[27998]:  [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables
Jan 16 11:37:55 ip-172-30-0-22.ec2.internal systemd[1]: scylla-server.service start operation timed out. Terminating.
Jan 16 11:37:55 ip-172-30-0-22.ec2.internal scylla[27998]:  [shard 0] compaction_manager - Asked to stop
Jan 16 11:37:55 ip-172-30-0-22.ec2.internal scylla[27998]:  [shard 1] compaction_manager - Asked to stop
top - 11:27:13 up  1:08,  5 users,  load average: 11.21, 11.90, 8.35
Tasks: 576 total,   1 running, 256 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.5 us,  0.4 sy,  0.0 ni, 94.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 50359558+total, 49576096+free,  7017516 used,   817124 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 49398358+avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                               
 27998 scylla    20   0   16.0t   6.2g  51096 S 386.7  1.3  17:21.86 scylla                                                                                                
 28243 centos    20   0  162456   4940   3724 R   0.7  0.0   0:00.05 top     

reproducer script:

echo 'clean data/env'

 sudo systemctl stop scylla-server
 sudo rm -rf /var/lib/scylla/data/*
 sudo rm -rf /var/lib/scylla/commitlog/*
 sudo systemctl start scylla-server

#wait cqlsh is available

sleep 20

/bin/cassandra-stress write n=10000 -mode cql3 native -pop seq=1..10000
/bin/cassandra-stress mixed n=10000 -mode cql3 native -pop seq=1..10000

date >> d

sudo systemctl stop scylla-server
sudo systemctl start scylla-server
# check db log by another terminal 

another reproducer-v2.sh, then the stuck scylla won't be killed by systemd

echo 'clean data/env'

 sudo systemctl stop scylla-server
 sudo rm -rf /var/lib/scylla/data/* 
 sudo rm -rf /var/lib/scylla/commitlog/* 
 sudo systemctl start scylla-server

#wait cqlsh is available

sleep 20

/bin/cassandra-stress write n=10000 -mode cql3 native -pop seq=1..10000
/bin/cassandra-stress mixed n=10000 -mode cql3 native -pop seq=1..10000

date >> d

sudo systemctl stop scylla-server
#sudo systemctl start scylla-server

cd /var/lib/scylla/
source /etc/sysconfig/scylla-server
sudo /usr/bin/scylla --log-to-syslog 1 --default-log-level info --network-stack posix --io-properties-file=/etc/scylla.d/io_properties.yaml --developer-mode=1 --cpuset 1-31,33-63

Shlomi asked Tomasz to check on the machine.

Couldn't find any trace for this issue in recent longevity tests.

Couldn't find this issue in any of the failures of 2.3 ami-test.

I also never saw this issue in the past, only in recent two artifact ami-test (i3.16xlarge).

is it effected by instance type? we donot use i3.16xlarge for longevity?

@amoskong it can happen for smaller instances too, it's just a lot less likely

Was this page helpful?
0 / 5 - 0 ratings