Installation details
Scylla version (or git commit hash): 4.1.10-0.20201119.01c822301f
Cluster size: 4 nodes (i3.2xlarge)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0e2cbb2c9e1e42eea
longevity-lwt-3h runs more time then planned.
Last ("pre-regression") test was run with ami-0b36d79ff69153396 (Scylla version: 4.1.9-0.20201109.fb14fae79b)
From https://jenkins.scylladb.com/job/scylla-4.1/job/longevity/job/longevity-lwt-3h/76 (test was finished in 4h15m) found that prepare (prefill data by c-s inserts) part duration was 30 min.
Prepare started at 20:46 and finished 21:15
C-s statistics
From cassandra-stress stats:
'op rate': '5960',
'partition rate': '5960',
'row rate': '6748',
'latency mean': '5.0',
'latency median': '3.7',
'latency 95th percentile': '10.0',
'latency 99th percentile': '32.7',
'total partitions': '10000000'
Get monitor for "pre-regression" run: hydra investigate show-monitor 4f2613dc-9f5a-481c-88ba-5b282cd5a844
Logs:
db-cluster | https://cloudius-jenkins-test.s3.amazonaws.com/4f2613dc-9f5a-481c-88ba-5b282cd5a844/20201111_003708/db-cluster-4f2613dc.zip |
monitor-set | https://cloudius-jenkins-test.s3.amazonaws.com/4f2613dc-9f5a-481c-88ba-5b282cd5a844/20201111_003708/monitor-set-4f2613dc.zip |
loader-set | https://cloudius-jenkins-test.s3.amazonaws.com/4f2613dc-9f5a-481c-88ba-5b282cd5a844/20201111_003708/loader-set-4f2613dc.zip |
sct-runner | https://cloudius-jenkins-test.s3.amazonaws.com/4f2613dc-9f5a-481c-88ba-5b282cd5a844/20201111_003708/sct-runner-4f2613dc.zip |
"Regression" run with ami-0e2cbb2c9e1e42eea(Scylla version 4.1.10-0.20201119.01c822301f, test_id=d79dd949-53b5-49b9-9eef-2a2e8a675f58)
From https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/yulia/job/staging-4.1-longevity-lwt-3h/1/console (test was aborted after 5 hours) found that prepare part duration was about 2 hours.
Prepare started at 07:05 and finished 09:03
C-s statistics
From cassandra-stress stats:
'op rate': '1430',
'partition rate': '1430',
'row rate': '1620',
'latency mean': '20.9',
'latency median': '21.5',
'latency 95th percentile': '24.7',
'latency 99th percentile': '44.3',
'total partitions': '10000000'
"Regression" run live monitor
Get monitor for "regression" run: hydra investigate show-monitor d79dd949-53b5-49b9-9eef-2a2e8a675f58
All logs:
https://cloudius-jenkins-test.s3.amazonaws.com/d79dd949-53b5-49b9-9eef-2a2e8a675f58/20201125_152600/20201125-064419-816674.tar.gz
No error, reactor stalls found in the log
First time slowdown received with Scylla version 4.1.9-0.20201111.7325996510d (ami-0c71cce348923c8b9 ).
Jenkins job
Test id: 610ec4a5-ead5-4241-9d19-5e5260c6fce4
@juliayakovlev is that a regression? what is the last version that was good?
@juliayakovlev is that a regression? what is the last version that was good?
last version that was good: 4.1.9-0.20201109.fb14fae79b (ami-0b36d79ff69153396)
@slivne @avikivity
Do you need I'll keep the monitor or can terminate it?
@gleb-cloudius can you please scan the diffs in 4.1.9...4.1.10 in scylla / seastar and see if you have a suspect
As we all know it may be a kernel issue - we need to extract the kernels from the AMIs and see if they are the same or not - if they aren't (and gleb doesn't have a suspect) it may be kernel related
this is blocking 4.1.10 from going out
On Thu, Nov 26, 2020 at 02:07:37AM -0800, Shlomi Livne wrote:
@gleb-cloudius can you please scan the diffs in 4.1.9...4.1.10 in scylla / seastar and see if you have a suspect
Why scan doff between those two and not fb14fae79b and 01c822301f which
are two versions that are compared. The diff between those is:
Botond Dénes (2):
types: validate(): linearize values lazily
mutation_reader: queue_reader: don't set EOS flag on abort
Hagit Segev (1):
release: prepare for 4.1.10
Piotr Dulikowski (1):
hints: don't read hint files when it's not allowed to send
Raphael S. Carvalho (1):
compaction: Make sure a partition is filtered out only by producer
So it is unlikely Scylla related.
--
Gleb.
4.1.10 ami-0e2cbb2c9e1e42eea - 5.9.9-1.el7.elrepo.x86_64 #1 SMP Tue Nov 17 19:12:43 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
4.1.9 ami-0b36d79ff69153396 5.9.8-1.el7.elrepo.x86_64 #1 SMP Wed Nov 11 09:49:09 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
4.1.9 rpms http://downloads.scylladb.com/rpm/unstable/centos/branch-4.1/2020-11-09T16:32:49Z/scylla/x86_64/
4.1.10 rpms http://downloads.scylladb.com/rpm/unstable/centos/branch-4.1/2020-11-19T18%3A17%3A45Z/scylla/x86_64/
@slivne
I am going to terminate the monitor. Any objections?
I am not able to reproduce this with a single server and running cassadnra-stress
even when attempting to use the same profile
cassandra-stress user profile=a.yaml n=10000000 ops'(insert=1)' -rate threads=30 -node ...
the prepare part doesn;t use any LWT
this is the profile
# LWT test: create and update data using LWT.
### DML ###
# Keyspace Name
keyspace: cqlstress_lwt_example
# The CQL for creating a keyspace (optional if it already exists)
keyspace_definition: |
CREATE KEYSPACE IF NOT EXISTS cqlstress_lwt_example WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
# Table name
table: blogposts
# The CQL for creating a table you wish to stress (optional if it already exists)
table_definition: |
CREATE TABLE IF NOT EXISTS blogposts (
domain int,
published_date int,
lwt_indicator int,
url text,
author text,
title text,
PRIMARY KEY(domain, published_date)
) WITH compaction = { 'class':'LeveledCompactionStrategy' }
AND comment='A table to hold blog posts'
extra_definitions:
- create MATERIALIZED VIEW blogposts_update_one_column_lwt_indicator as select domain, lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator >=-150000 and lwt_indicator < 0 PRIMARY KEY(lwt_indicator, domain, published_date);
- create MATERIALIZED VIEW blogposts_update_one_column_lwt_indicator_after_update as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator = 30000000 PRIMARY KEY(lwt_indicator, domain, published_date);
- create MATERIALIZED VIEW blogposts_update_2_columns_lwt_indicator as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator > 0 and lwt_indicator <= 150000 PRIMARY KEY(lwt_indicator, domain, published_date);
- create MATERIALIZED VIEW blogposts_update_2_columns_lwt_indicator_after_update as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator = 20000000 PRIMARY KEY(lwt_indicator, domain, published_date);
- create MATERIALIZED VIEW blogposts_not_updated_lwt_indicator as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator > 1000000 and lwt_indicator < 20000000 PRIMARY KEY(lwt_indicator, domain, published_date);
### Column Distribution Specifications ###
columnspec:
- name: domain
population: seq(1..10000000) #10M possible domains to pick from
- name: published_date
cluster: uniform(1..1000) #under each domain we will have max 1000 posts
- name: lwt_indicator
population: seq(1..1001000)
- name: url
size: uniform(30..30)
- name: title #titles shouldn't go beyond 200 chars
size: gaussian(10..20)
- name: author
size: uniform(5..20) #author names should be short
### Batch Ratio Distribution Specifications ###
insert:
partitions: fixed(1) # Our partition key is the domain so only insert one per batch
select: fixed(1)/1000 # We have 1000 posts per domain so 1/1000 will allow 1 post per batch
batchtype: UNLOGGED # Unlogged batches
#
# A list of queries you wish to run against the schema
#
queries:
insert_query:
cql: insert into blogposts (domain, published_date, lwt_indicator, url, author, title) values (?,?,?,?,?,?) if not exists
fields: samerow
select:
cql: select * from blogposts where domain = ? LIMIT 1
fields: samerow
lwt_update_one_column:
cql: update blogposts set lwt_indicator = 30000000 where domain = ? and published_date = ? if lwt_indicator >=-150000 and lwt_indicator < 0
fields: samerow
lwt_update_two_columns:
cql: update blogposts set lwt_indicator = 20000000, author = 'text' where domain = ? and published_date = ? if lwt_indicator > 0 and lwt_indicator <= 150000 and author != 'text'
fields: samerow
and the prepare stage does
cassandra-stress user profile=/tmp/c-s_lwt_basic.yaml n=10000000 ops'(insert=1)' cl=QUORUM -mode native cql3 -rate threads=30 -node ... -errors skip-unsupported-columns
******************** Stress Settings ********************
Command:
Type: user
Count: 10,000,000
No Warmup: false
Consistency Level: LOCAL_ONE
Target Uncertainty: not applicable
Command Ratios: {insert=1.0}
Command Clustering Distribution: clustering=gaussian(1..10)
Profile File: a.yaml
Rate:
Auto: false
Thread Count: 100
OpsPer Sec: 0
Population:
Sequence: 1..10000000
Order: ARBITRARY
Wrap: true
Insert:
Revisits: Uniform: min=1,max=1000000
Visits: Fixed: key=1
Row Population Ratio: Ratio: divisor=1.000000;delegate=Fixed: key=1
Batch Type: not batching
Errors:
Ignore: false
Tries: 10
Log:
No Summary: false
No Settings: false
File: null
Interval Millis: 1000
Level: NORMAL
Mode:
API: JAVA_DRIVER_NATIVE
Connection Style: CQL_PREPARED
CQL Version: CQL3
Protocol Version: V4
Username: null
Password: null
Auth Provide Class: null
Max Pending Per Connection: 128
Connections Per Host: 8
Compression: NONE
Node:
Nodes: [172.31.9.202]
Is White List: false
Datacenter: null
Schema:
Keyspace: cqlstress_lwt_example
Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Strategy Pptions: {replication_factor=1}
Table Compression: null
Table Compaction Strategy: null
Table Compaction Strategy Options: {}
Transport:
factory=org.apache.cassandra.thrift.TFramedTransportFactory; truststore=null; truststore-password=null; keystore=null; keystore-password=null; ssl-protocol=TLS; ssl-alg=SunX509; store-type=JKS; ssl-ciphers=TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA;
Port:
Native Port: 9042
Thrift Port: 9160
JMX Port: 7199
Send To Daemon:
*not set*
Graph:
File: null
Revision: unknown
Title: null
Operation: USER
TokenRange:
Wrap: false
Split Factor: 1
******************** Profile ********************
Keyspace Name: cqlstress_lwt_example
Keyspace CQL:
***
CREATE KEYSPACE IF NOT EXISTS cqlstress_lwt_example WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
***
Table Name: blogposts
Table CQL:
***
CREATE TABLE IF NOT EXISTS blogposts (
domain int,
published_date int,
lwt_indicator int,
url text,
author text,
title text,
PRIMARY KEY(domain, published_date)
) WITH compaction = { 'class':'LeveledCompactionStrategy' }
AND comment='A table to hold blog posts'
***
Extra Schema Definitions: [create MATERIALIZED VIEW blogposts_update_one_column_lwt_indicator as select domain, lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator >=-150000 and lwt_indicator < 0 PRIMARY KEY(lwt_indicator, domain, published_date);, create MATERIALIZED VIEW blogposts_update_one_column_lwt_indicator_after_update as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator = 30000000 PRIMARY KEY(lwt_indicator, domain, published_date);, create MATERIALIZED VIEW blogposts_update_2_columns_lwt_indicator as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator > 0 and lwt_indicator <= 150000 PRIMARY KEY(lwt_indicator, domain, published_date);, create MATERIALIZED VIEW blogposts_update_2_columns_lwt_indicator_after_update as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator = 20000000 PRIMARY KEY(lwt_indicator, domain, published_date);, create MATERIALIZED VIEW blogposts_not_updated_lwt_indicator as select lwt_indicator, author from blogposts where domain is not null and published_date is not null and lwt_indicator > 1000000 and lwt_indicator < 20000000 PRIMARY KEY(lwt_indicator, domain, published_date);]
Generator Configs:
lwt_indicator: Identity: Sequence: start=1,end=1001000;
author: Size: Uniform: min=5,max=20;
domain: Identity: Sequence: start=1,end=10000000;
title: Size: Gaussian: min=10,max=20,mean=15.000000,stdev=1.666667;
published_date: Clustering: Uniform: min=1,max=1000;
url: Size: Fixed: key=30;
Query Definitions:
insert_query: CQL:insert into blogposts (domain, published_date, lwt_indicator, url, author, title) values (?,?,?,?,?,?) if not exists;Fields:samerow;
select: CQL:select * from blogposts where domain = ? LIMIT 1;Fields:samerow;
lwt_update_one_column: CQL:update blogposts set lwt_indicator = 30000000 where domain = ? and published_date = ? if lwt_indicator >=-150000 and lwt_indicator < 0;Fields:samerow;
lwt_update_two_columns: CQL:update blogposts set lwt_indicator = 20000000, author = 'text' where domain = ? and published_date = ? if lwt_indicator > 0 and lwt_indicator <= 150000 and author != 'text';Fields:samerow;
Token Range Queries:
Insert Settings:
partitions: fixed(1)
select: fixed(1)/1000
batchtype: UNLOGGED
===== Using optimized driver!!! =====
Connected to cluster: scylladb-cluster-1606775542, max pending requests per connection 128, max connections per host 8
Datatacenter: eu-west; Host: /172.31.9.202; Rack: 1a
Created schema. Sleeping 1s for propagation.
Created extra schema. Sleeping 1s for propagation.
Generating batches with [1..1] partitions and [0..1] rows (of [1..1000] total rows in the partitions)
So the prepare phase is doing inserts into a keyspace rf=3 with a simple table with 5 MVs.
There is no LWT related code
On Mon, Nov 30, 2020 at 10:52:58PM -0800, Shlomi Livne wrote:
the prepare part doesn;t use any LWT
I think you are using incorrect one. The one in the branch-4.1 has:
insert:
partitions: fixed(1) # Our partition key is the domain so only insert one per batch
select: fixed(1)/1000 # We have 1000 posts per domain so 1/1000 will allow 1 post per batch
batchtype: UNLOGGED # Unlogged batches
condition: IF title = NULL # LWT: Do not override
--
Gleb.
using the following ami for loader (ami-0bf19545bc7bc9e1f) scylla-qa-loader-ami-lwt-v5 on a 4 nodecluster with LWT in inserts I am able to reproduce the issue
Starts from the good 4.1.9 AMI ami-0b36d79ff69153396 did an upgrade to 4.1.10
scylla --version 4.1.10-0.20201119.01c822301f
And the performance is ok
Results:
Op rate : 6,732 op/s [insert: 6,778 op/s]
Partition rate : 6,732 pk/s [insert: 6,778 pk/s]
Row rate : 7,623 row/s [insert: 7,675 row/s]
Latency mean : 4.4 ms [insert: 4.4 ms]
Latency median : 4.0 ms [insert: 4.0 ms]
Latency 95th percentile : 7.5 ms [insert: 7.5 ms]
Latency 99th percentile : 10.3 ms [insert: 10.3 ms]
Latency 99.9th percentile : 17.1 ms [insert: 17.1 ms]
Latency max : 46.9 ms [insert: 46.9 ms]
Total partitions : 1,000,000 [insert: 1,000,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:02:28
So the issue is not in scylla code
which means the issue is (again) in kernel (or ami) somehow.
start from bad ami: ami-0e2cbb2c9e1e42eea
verified bad performance
downgraded to 4.1.9 and started to check which version is cuasing the issue
scylla version: 4.1.9-0.20201026.b80dab6d58
Results:
Op rate : 6,308 op/s [insert: 6,308 op/s]
Partition rate : 6,308 pk/s [insert: 6,308 pk/s]
Row rate : 7,137 row/s [insert: 7,137 row/s]
Latency mean : 4.5 ms [insert: 4.5 ms]
Latency median : 3.6 ms [insert: 3.6 ms]
Latency 95th percentile : 7.7 ms [insert: 7.7 ms]
Latency 99th percentile : 30.4 ms [insert: 30.4 ms]
Latency 99.9th percentile : 44.8 ms [insert: 44.8 ms]
Latency max : 53.0 ms [insert: 53.0 ms]
Total partitions : 100,000 [insert: 100,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:00:15
END
version: 4.1.9-0.20201106.db6303dba0d
Results:
Op rate : 6,412 op/s [insert: 6,412 op/s]
Partition rate : 6,412 pk/s [insert: 6,412 pk/s]
Row rate : 7,255 row/s [insert: 7,255 row/s]
Latency mean : 4.5 ms [insert: 4.5 ms]
Latency median : 3.6 ms [insert: 3.6 ms]
Latency 95th percentile : 7.5 ms [insert: 7.5 ms]
Latency 99th percentile : 31.2 ms [insert: 31.2 ms]
Latency 99.9th percentile : 44.6 ms [insert: 44.6 ms]
Latency max : 58.4 ms [insert: 58.4 ms]
Total partitions : 100,000 [insert: 100,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:00:15
END
scylla version:
repo: http:////downloads.scylladb.com.s3.amazonaws.com/rpm/unstable/centos/branch-4.1/2020-11-09T16%3A32%3A49Z/scylla.repo
Results:
Op rate : 6,272 op/s [insert: 6,272 op/s]
Partition rate : 6,272 pk/s [insert: 6,272 pk/s]
Row rate : 7,095 row/s [insert: 7,095 row/s]
Latency mean : 4.5 ms [insert: 4.5 ms]
Latency median : 3.6 ms [insert: 3.6 ms]
Latency 95th percentile : 7.3 ms [insert: 7.3 ms]
Latency 99th percentile : 30.5 ms [insert: 30.5 ms]
Latency 99.9th percentile : 44.9 ms [insert: 44.9 ms]
Latency max : 61.8 ms [insert: 61.8 ms]
Total partitions : 100,000 [insert: 100,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:00:15
END
scylla version: 4.1.9-0.20201111.7325996510d
repo //downloads.scylladb.com.s3.amazonaws.com/rpm/unstable/centos/branch-4.1/2020-11-11T18%3A43%3A35Z/scylla.repo
Results:
Op rate : 6,492 op/s [insert: 6,492 op/s]
Partition rate : 6,492 pk/s [insert: 6,492 pk/s]
Row rate : 7,333 row/s [insert: 7,333 row/s]
Latency mean : 4.6 ms [insert: 4.6 ms]
Latency median : 3.5 ms [insert: 3.5 ms]
Latency 95th percentile : 8.0 ms [insert: 8.0 ms]
Latency 99th percentile : 32.4 ms [insert: 32.4 ms]
Latency 99.9th percentile : 45.4 ms [insert: 45.4 ms]
Latency max : 52.8 ms [insert: 52.8 ms]
Total partitions : 100,000 [insert: 100,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:00:15
scylla version: 4.1.9-0.20201115.b144b93cd8
repo: //downloads.scylladb.com.s3.amazonaws.com/rpm/unstable/centos/branch-4.1/2020-11-15T11%3A20%3A00Z/scylla.repo
Results:
Op rate : 6,276 op/s [insert: 6,276 op/s]
Partition rate : 6,276 pk/s [insert: 6,276 pk/s]
Row rate : 7,112 row/s [insert: 7,112 row/s]
Latency mean : 4.7 ms [insert: 4.7 ms]
Latency median : 3.5 ms [insert: 3.5 ms]
Latency 95th percentile : 9.0 ms [insert: 9.0 ms]
Latency 99th percentile : 33.7 ms [insert: 33.7 ms]
Latency 99.9th percentile : 45.0 ms [insert: 45.0 ms]
Latency max : 50.8 ms [insert: 50.8 ms]
Total partitions : 100,000 [insert: 100,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:00:15
scylla version: 4.1.10-0.20201119.01c822301f
repo: //downloads.scylladb.com.s3.amazonaws.com/rpm/unstable/centos/branch-4.1/2020-11-19T18%3A17%3A45Z/scylla.repo
Results:
Op rate : 6,129 op/s [insert: 6,129 op/s]
Partition rate : 6,129 pk/s [insert: 6,129 pk/s]
Row rate : 6,935 row/s [insert: 6,935 row/s]
Latency mean : 4.6 ms [insert: 4.6 ms]
Latency median : 3.6 ms [insert: 3.6 ms]
Latency 95th percentile : 8.2 ms [insert: 8.2 ms]
Latency 99th percentile : 30.7 ms [insert: 30.7 ms]
Latency 99.9th percentile : 44.2 ms [insert: 44.2 ms]
Latency max : 50.7 ms [insert: 50.7 ms]
Total partitions : 100,000 [insert: 100,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:00:16
scylla version: 4.1.10-0.20201122.01c822301f
repo: //downloads.scylladb.com.s3.amazonaws.com/rpm/unstable/centos/branch-4.1/2020-11-22T13%3A32%3A10Z/scylla.repo
Results:
Op rate : 6,448 op/s [insert: 6,448 op/s]
Partition rate : 6,448 pk/s [insert: 6,448 pk/s]
Row rate : 7,299 row/s [insert: 7,299 row/s]
Latency mean : 4.5 ms [insert: 4.5 ms]
Latency median : 3.6 ms [insert: 3.6 ms]
Latency 95th percentile : 7.3 ms [insert: 7.3 ms]
Latency 99th percentile : 29.8 ms [insert: 29.8 ms]
Latency 99.9th percentile : 44.8 ms [insert: 44.8 ms]
Latency max : 52.8 ms [insert: 52.8 ms]
Total partitions : 100,000 [insert: 100,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:00:15
We are back in 4.1.10 and the performance is good ?
this is the upgrade code - lets see if its causing this to work well.
rm scylla.repo*; wget http://downloads.scylladb.com.s3.amazonaws.com/rpm/unstable/centos/branch-4.1/2020-11-22T13%3A32%3A10Z/scylla.repo ; sudo cp scylla.repo /etc/yum.repos.d/
sudo yum upgrade scylla*4.1.10; sudo service scylla-server stop; sudo rm -Rf /var/lib/scylla/data/* /var/lib/scylla/commitlog/*
The bottom line I suspect if we restart / upgrade - he issue will be gone - need to check whats the smallest number of steps that will cause performance to be good
May be affinity setup is broken on the ami and upgrade fixes it?
Or maybe its an issue that scylla starts to early when the node starts we will know soon.
Restarting scylla on all servers did not work ....
Results:
Op rate : 1,487 op/s [insert: 1,487 op/s]
Partition rate : 1,487 pk/s [insert: 1,487 pk/s]
Row rate : 1,683 row/s [insert: 1,683 row/s]
Latency mean : 20.1 ms [insert: 20.1 ms]
Latency median : 20.9 ms [insert: 20.9 ms]
Latency 95th percentile : 24.3 ms [insert: 24.3 ms]
Latency 99th percentile : 40.2 ms [insert: 40.2 ms]
Latency 99.9th percentile : 55.2 ms [insert: 55.2 ms]
Latency max : 79.7 ms [insert: 79.7 ms]
Total partitions : 100,000 [insert: 100,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:01:07
Trying this did not help as well
sudo service scylla-server stop; sudo rm -Rf /var/lib/scylla/data/* /var/lib/scylla/commitlog/* ; sudo service scylla-server start
Next reinstall
sudo yum reinstall scylla*
sudo service scylla-server stop; sudo rm -Rf /var/lib/scylla/data/* /var/lib/scylla/commitlog/* ; sudo service scylla-server start
this fixes the issue.
So @gleb-cloudius - you are correct - reinstall fixed it - yet what did it do - and how do we find whats broken
So @gleb-cloudius - you are correct - reinstall fixed it - yet what did it do - and how do we find whats broken
Runs scylla-setup may be?
--
Gleb.
4.1.10 has the revert the code of the unpinning of the kernel @penberg can you please check the patch in scylla-machine-image for the unpinning
@avikivity as discussed the regression is visible between
4.1.10 ami-0e2cbb2c9e1e42eea - 5.9.9-1.el7.elrepo.x86_64 #1 SMP Tue Nov 17 19:12:43 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
4.1.9 ami-0b36d79ff69153396 5.9.8-1.el7.elrepo.x86_64 #1 SMP Wed Nov 11 09:49:09 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
reinstall of scylla "fixes" the issue.
I'd like to see monitoring from a bad run.
On Sun, Dec 06, 2020 at 04:46:04AM -0800, Avi Kivity wrote:
How reinstalling Scylla on the same kernel would have helped?
--
Gleb.
@avikivity
Monitor of "regression" run:
http://52.215.0.102:3000/d/yW_A6WJGz/scylla-per-server-metrics-nemesis-master?orgId=1&from=1606285776157&to=1606307608372&var-by=instance&var-cluster=&var-dc=All&var-node=All&var-shard=All
@juliayakovlev can I have a monitor for a good run? the bad run actually looks good, so I'd like to compare to a good run.
@juliayakovlev can I have a monitor for a good run? the bad run actually looks good, so I'd like to compare to a good run.
@avikivity
Do you need both of them or just for good run?
I need both (now looking at the bad run).
Some rates (per shard), bad run:
coordinator ops: ~28-64/s
replica reads: ~390-450/s
replica writes: ~650-740/s
cache reads: ~520-590/s
view local update: 14-19/s
view remote update: 4-6/s
lwt writes: 42-47/s
lwt average write latency: 19-27ms
lwt condition not met: rises to 7-9/s
lwt prune: 42-49/s
commitlog iops: 300/s - 370/s
commitlog bandwidth: ~1.3-3.1 MB/s
Average commitlog write is 8k.
I need both (now looking at the bad run).
@avikivity
"Good" run monitor: http://52.16.7.182:3000/d/fQUrzMJMk/scylla-per-server-metrics-nemesis-master?orgId=1&from=1605040696854&to=1605054283344&var-by=instance&var-cluster=&var-dc=All&var-node=All&var-shard=All
For the bad run, c-s latency 95% was ~25ms. For the good run, ~10ms.
I've seen this many times when system.paxos needs compaction, so its memtable is empty, every round has to do a disk read.
The question would be though why is this a regression?
Some rates (per shard), good run:
The question would be though why is this a regression?
4.1.10 results are ~5X worse than 4.1.9.
I've seen this many times when system.paxos needs compaction, so its memtable is empty, every round has to do a disk read.
I don't see disk reads during the populate phase.
Commitlog I/O queue delay is low in both runs.
Somehow the latency is much higher for the bad run, although I/O is fast for both.
@gleb-cloudius any idea?
On Tue, Dec 08, 2020 at 06:41:58AM -0800, Avi Kivity wrote:
@gleb-cloudius any idea?
Not really. I already looked at bad results back when the bug was posted
and also did not see anything strange. Cpu utilization was low as well.
Look at Shlomi's findings. Reinstalling the package seams to help, so
may be there are problems with cpu affinity or something bad runs on the
host and steals cpu. If we have host metrics we can compare context
switches.
--
Gleb.
We do have host metrics. But look at reactor load, it is always low.
In the bad run, we take 7%-10% in statement group. In the good run, 38%-69%.
I did reinstall 3 times and it "fixed" the issue
I'll provide instructions on how to reproduce - @penberg / @avi - will take a look.
reproducing the setup
please note there are params you will need to specify
setup cluster <your-ec2-ireland-key>
aws cloudformation create-stack --stack-name sl-t --template-url http://downloads.scylladb.com.s3.amazonaws.com/downloads/scylla/rpm/centos/scylladb-4.1/scylla_cluster_4.1.9.yaml --parameters ParameterKey=AvailabilityZone,ParameterValue=eu-west-1b ParameterKey=ScyllaAmi,ParameterValue=ami-0e2cbb2c9e1e42eea ParameterKey=ClusterName,ParameterValue=sl-t-4-1-10 ParameterKey=InstanceType,ParameterValue=i3.2xlarge ParameterKey=InstanceCount,ParameterValue=4 ParameterKey=KeyName,ParameterValue=<your-ec2-ireland-key>
setup loader (need an instance from the cluster) <subnet-id-from-one-instance>,<security-group-ids-from-one-instance>,<your-ec2-ireland-key>
aws ec2 run-instances --image=ami-0bf19545bc7bc9e1f --instance-type c5.4xlarge --subnet-id <subnet-id-from-one-instance> --security-group-ids <security-group-ids-from-one-instance> --key-name <your-ec2-ireland-key>
copy the cassandra-stress profile need <loader-ip>
scylla-cluster-tests
git checkout branch-4.1
scp -i <your-ec2-ireland-key> centos@<loader-ip>:/home/centos/
on loader-ip run load
cassandra-stress user profile=./c-s_lwt_basic.yaml n=10000000 ops'(insert=1)' cl=QUORUM -mode native cql3 -rate threads=30 -errors skip-unsupported-columns -node 172.31.0.1
```
@slivne those setup 4.1.9, no? need to install 4.1.10 on top, and then reinstall?
@avikivity the template is 4.1.9 as its the last released version the ami is 4.1.10 (0e2cbb2c9e1e42eea)
ah!
reinstallling just the scylla-server subpackage and restarting did not help.
reinstalling scylla-kernel-conf helped immediately, even without restart.
Here's a machine I did not yet upgrade:
[centos@ip-172-31-3-11 ~]$ cat /proc/sys/kernel/sched_wakeup_granularity_ns
15000000
and an upgraded machine:
$ cat /proc/sys/kernel/sched_wakeup_granularity_ns
450000
Rebooting a reinstalled node restores the bad value.
So maybe we are not persisting the setting and only changing it for the
runtime?
On Wed, Dec 9, 2020 at 6:20 PM Avi Kivity notifications@github.com wrote:
Rebooting a reinstalled node restores the bad value.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/scylladb/scylla/issues/7696#issuecomment-741881531,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AA2OCCAU66EUTPWRQKNGXO3ST6PTXANCNFSM4UCLJ6XQ
.
I straced the systemd-sysctl.service invocation, which is responsible for loading the value at boot:
open("/proc/sys/kernel/sched_wakeup_granularity_ns", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa8d473c000
write(4, "450000\n", 7) = 7
close(4)
So something is overwriting it afterwards.
/usr/lib/tuned/throughput-performance/tuned.conf:kernel.sched_wakeup_granularity_ns = 15000000
Name : tuned
Version : 2.11.0
Release : 10.el7
Architecture: noarch
Install Date: 2020-11-20T03:23:48 UTC
Group : Unspecified
Size : 815813
License : GPLv2+
Signature : RSA/SHA256, 2020-11-17T14:19:32 UTC, Key ID 24c6a8a7f4a80eb5
Source RPM : tuned-2.11.0-10.el7.src.rpm
Build Date : 2020-11-16T16:46:25 UTC
Build Host : x86-02.bsys.centos.org
Relocations : (not relocatable)
Packager : CentOS BuildSystem <http://bugs.centos.org>
Vendor : CentOS
URL : http://www.tuned-project.org/
Summary : A dynamic adaptive system tuning daemon
Description :
The tuned package contains a daemon that tunes system settings dynamically.
It does so by monitoring the usage of several system components periodically.
Based on that information components will then be put into lower or higher
power saving modes to adapt to the current usage. Currently only ethernet
network and ATA harddisk devices are implemented.
* Wed Aug 12 2020 Jaroslav Å karvada <[email protected]> - 2.11.0-10
- realtime-virtual-host: remove lapic advancement calculation and
related qemu-kvm-tools-rhev requirement
Resolves: rhbz#1852253
* Thu Mar 05 2020 Jaroslav Å karvada <[email protected]> - 2.11.0-9
- Fixed SIGHUP handling
Resolves: rhbz#1702724
- Tune irqbalance service
Resolves: rhbz#1720042
- Added netcat requirement
Resolves: rhbz#1746436
- sysctl: made reapply_sysctl ignore configs from /usr
Resolves: rhbz#1776149
- profiles: define variables before use
Resolves: rhbz#1781664
"- sysctl: made reapply_sysctl ignore configs from /usr"
So we have two options:
Looking at /usr/lib/tuned/throughput-performance/tuned.conf, we are better off disabling tuned.
Perhaps we should add Obsoletes: tuned to scylla-kernel-conf.
@slivne bottom line, it is not a regression in scylla, but a change in tuned behavior that we got by running yum update somehere.
from yum.log:
Nov 20 03:23:48 Updated: tuned-2.11.0-10.el7.noarch
@avikivity
Do you need the monitors?
@juliayakovlev no, we found the cause and a fix is on the way. Please kill the monitors.
Backported to 4.1, 4.2, 4.3.
Most helpful comment
Here's a machine I did not yet upgrade:
[centos@ip-172-31-3-11 ~]$ cat /proc/sys/kernel/sched_wakeup_granularity_ns
15000000
and an upgraded machine:
$ cat /proc/sys/kernel/sched_wakeup_granularity_ns
450000