Cockroach: CockroachDB v1.0 write performance is half as fast as postgreSQL?

Created on 21 Aug 2017  路  15Comments  路  Source: cockroachdb/cockroach

I'd like to congratulate the people working on this project (It's awesome) - i was looking for a distributed SQL db with strong consistency and cockroachDB seems like the best proposal out there.

I'm not a DB guru by any stretch but I really wanted to see the baseline performance difference between PostgreSQL and Cockroach on a single node just out of curiosity. I made a makeshift NodeJS benchmark script that inserts 2000 records into a table it creates (which has 3 indexes), then tries to benchmark a few selects using one of the indexes.

Here's the schema i used:

CREATE TABLE customers_bench (
    id SERIAL NOT NULL,
    first_name varchar(250) DEFAULT NULL,
    last_name varchar(250) DEFAULT NULL,
    email varchar(250) DEFAULT NULL,
    age INT DEFAULT NULL,
    PRIMARY KEY (id)
  );
CREATE INDEX ON customers_bench (email);
CREATE INDEX ON customers_bench (age);

It looked like Postgresql inserts 2k records in 3.8s while cockroach does it in 7.3s.

Postgresql:

Cockroach DB:

Selecting by an index takes exactly the same time between Postgersql & Cockroach (~2ms to select by an index).

How to replicate:

I've used a t2.small for both the cockroach EC2 server and the postgresql RDS server on AWS. I have done 0 extra configuration/tweaks on both.

How to benchmark:

Get my Node js script from here: https://gist.github.com/Spazer9/a6073cdc58d103bf5ac055be28589114

Make sure you got Node 7.6 and up. install the pg package. then run:

node sql_bench.js --host=XXXXX --port=XXXX --user=XXXX --pw=XXX --db=XXXX

This will work on both postgresql and cockroach dbs.

C-question O-community

Most helpful comment

Dear @Spazer9 we'd like to thank you again for your analysis with CockroachDB 1.0.
Your work and that of many other CockroachDB users have pushed us to make major investments into performance in CockroachDB v1.1 and subsequently v2.0 which is to be released within a couple of months.
We have worked hard to bridge the gap you have found. I think that the gist of the message carried by this github issue is not current any more. I will close this issue for now, and we will welcome any new studies onto the performance of CockroachDB v2.0.

All 15 comments

PostgreSQL is usually faster than single-node CockroachDB at this time. We'll narrow that gap over time, but the real advantage to Cockroach is the easy horizontal scalability without having to resort to sharding.

Looking at your code, one thing to be aware of is that Cockroach is more sensitive to roundtrips to the server than a traditional single-node database. You want to minimize those roundtrips. Inserting 1 row at a time is sometimes needed for OLTP workloads, but if you're trying to bulk insert data, see the FAQ. Batching hundreds or thousands of rows into a single INSERT statement will be orders of magnitude faster for both PostgreSQL and CockroachDB.

I wish I could achive your results... When testing with sysbench 1.1 against Galera, both single-nodes, I get:

oltp_write_only
cockroach
transactions: 34.19 per sec.
queries: 205.15 per sec.

galera
transactions: 870.68 per sec.
queries: 5224.08 per sec.

oltp_read_only
cockroach
transactions: 106.54 per sec.
queries: 1704.58 per sec.

galera
transactions: 449.72 per sec.
queries: 7195.46 per sec.


Clients (threads): 1
Tables: 4
Table rows: 250k
RAM: 4G, 25% for both caches
CPUs: 4

If this is of interest, I can open a separate issue with all details.

@amq Can you provide the exact sysbench command line you ran against Cockroach? I haven't done any significant investigation into sysbench, but the numbers you posted seem low.

@petermattis

sysbench --db-driver=pgsql --pgsql-host=cockroachdb-01 --pgsql-port=26257 --pgsql-user=root oltp_read_only --tables=4 --table-size=250000 prepare
sysbench --db-driver=pgsql --pgsql-host=cockroachdb-01 --pgsql-port=26257 --pgsql-user=root oltp_read_only --tables=4 --time=60 --threads=1 run

Some other tests:

oltp_write_only
oltp_read_write
oltp_update_non_index

Make sure to run the master (1.1) of sysbench if you want to try more threads than one. See https://github.com/akopytov/sysbench/commit/0169d42681ab699a05533a1e9294e98ab6494535

My docker-compose.yml:

version: "3.1"

services:
  cockroachdb-01:
    hostname: cockroachdb-01
    image: cockroachdb/cockroach:${COCKROACHDB_VERSION:-v1.1.2}
    command: |-
      start
      --join=cockroachdb-01:26257
      --cache=25%
      --max-sql-memory=25%
      --logtostderr
      --insecure
    ports:
      - "8080:8080"
      - "26257:26257"
    volumes:
      - cockroachdb-01:/cockroach/cockroach-data
    networks:
      - mynet
    deploy:
      placement:
        constraints:
          - node.hostname == ${BASE_HOSTNAME:-node}-01

volumes:
  cockroachdb-01:

networks:
  mynet:
    external: true

@amq Thanks for the details. I'm not going to be able to look at this until next week due to the holiday.

@petermattis I've noticed that CockroachDB performs best with --rand-type=zipfian distribution (good!), while Galera is best with the sysbench's default --rand-type=special. Both CockroachDB and Galera benefit heavily from multiple threads, even for writes, so these results are with --threads=4

sysbench --db-driver=pgsql --pgsql-host=cockroachdb-01 --pgsql-port=26257 --pgsql-user=root oltp_write_only --time=60 --tables=4 --threads=4 --rand-type=zipfian run

| read_only TPS | special | pareto | zipfian |
|---------------|---------|--------|---------|
| cockroach | 379 | 493 | 492 |
| galera | 1486 | 1136 | 1050 |

| write_only TPS | special | pareto | zipfian |
|----------------|---------|--------|---------|
| cockroach | 71 | 57 | 116 |
| galera | 2190 | 2057 | 2363 |

More on different distributions in sysbench here: http://kaamos.me/blog/2017/09/12/zipfian-distribution-in-sysbench.html

I started taking a look at this issue. The first step is understanding what sysbench is doing. For the oltp tests, it creates N tables with the schema:

CREATE TABLE sbtest%d (
  id SERIAL PRIMARY KEY,
  k INTEGER DEFAULT 0 NOT NULL,
  c STRING(120) DEFAULT '' NOT NULL,
  pad STRING(60) DEFAULT '' NOT NULL,
  INDEX (k ASC)
)

For oltp_read_only, transactions look like:

BEGIN TRANSACTION
SELECT c FROM sbtest2 WHERE id = $1
SELECT c FROM sbtest2 WHERE id = $1
SELECT c FROM sbtest2 WHERE id = $1
SELECT c FROM sbtest2 WHERE id = $1
SELECT c FROM sbtest2 WHERE id = $1
SELECT c FROM sbtest2 WHERE id = $1
SELECT c FROM sbtest2 WHERE id = $1
SELECT c FROM sbtest2 WHERE id = $1
SELECT c FROM sbtest2 WHERE id = $1
SELECT c FROM sbtest2 WHERE id = $1
SELECT c FROM sbtest2 WHERE id BETWEEN $1 AND $2
SELECT sum(k) FROM sbtest4 WHERE id BETWEEN $1 AND $2
SELECT c FROM sbtest1 WHERE id BETWEEN $1 AND $2 ORDER BY c
SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN $1 AND $2 ORDER BY c
COMMIT TRANSACTION

For oltp_write_only, transactions look like:

BEGIN TRANSACTION
UPDATE sbtest1 SET k = k + 1 WHERE id = $1
UPDATE sbtest2 SET c = $1 WHERE id = $2
DELETE FROM sbtest1 WHERE id = $1
INSERT INTO sbtest1(id, k, c, pad) VALUES ($1, $2, $3, $4)
COMMIT TRANSACTION

The individual statements in the transactions are sent serially.

Just found out that it's possible to create a custom test with parallel execution: http://kaamos.me/talks/pl2017/#/slide-orgheadline23. Generally, those slides are a gold mine for getting to know sysbench.

upd: author of sysbench gave a hint how to implement parallel execution: https://github.com/akopytov/sysbench/issues/73

upd2: appending RETURNING NOTHING to all non-SELECT statements brought ~8% of improvement at oltp_write_only and ~11% improvement at oltp_read_write.

@amq The numbers I'm seeing are quite different from what you're reporting. This could be due to machine differences or something else. It is probably worthwhile to remove docker as a variable (I'm running cockroach directly). The numbers I'm reporting below are from Cockroach master (0cd176ae8a552ed9284a159092279dd43d41afad) while you're using 1.1.2, but it is very unlikely that could account for the magnitude of difference in numbers here.

My machine has 8 cores, so I use --threads=8. Cockroach gets better concurrency when there are multiple ranges. Your usage of --tables=4 ensures 4 ranges because the size of each table is less than the range size of 64MB. I've used --tables=8 below.

sysbench --db-driver=pgsql --pgsql-host=localhost --pgsql-port=26257 --pgsql-user=root oltp_read_only --tables=8 --time=60 --threads=8 run
...
    transactions:                        86561  (1442.59 per sec.)
    queries:                             1384976 (23081.49 per sec.)

We can focus on point-selects by disabling range queries with --range-selects=off:

sysbench --db-driver=pgsql --pgsql-host=localhost --pgsql-port=26257 --pgsql-user=root oltp_read_only --tables=8 --time=60 --threads=8 --range-selects=off run
...
    transactions:                        140695 (2344.81 per sec.)
    queries:                             1688340 (28137.77 per sec.)

Default oltp_write_only shows:

sysbench --db-driver=pgsql --pgsql-host=localhost --pgsql-port=26257 --pgsql-user=root oltp_write_only --tables=8 --time=60 --threads=8 run
...
    transactions:                        43824  (730.27 per sec.)
    queries:                             263766 (4395.33 per sec.)

Focusing in on the individual write operations. index-updates:

sysbench --db-driver=pgsql --pgsql-host=localhost --pgsql-port=26257 --pgsql-user=root oltp_write_only --tables=8 --time=60 --threads=8 --index-updates=1 --non-index-updates=0 --delete-inserts=0 run
...
    transactions:                        107938 (1798.84 per sec.)
    queries:                             323814 (5396.52 per sec.)

non-index-updates:

sysbench --db-driver=pgsql --pgsql-host=localhost --pgsql-port=26257 --pgsql-user=root oltp_write_only --tables=8 --time=60 --threads=8 --index-updates=0 --non-index-updates=1 --delete-inserts=0 run
...
    transactions:                        178064 (2967.64 per sec.)
    queries:                             534192 (8902.93 per sec.)

delete-inserts:

sysbench --db-driver=pgsql --pgsql-host=localhost --pgsql-port=26257 --pgsql-user=root oltp_write_only --tables=8 --time=60 --threads=8 --index-updates=0 --non-index-updates=0 --delete-inserts=1 run
...
    transactions:                        115332 (1922.09 per sec.)
    queries:                             461328 (7688.37 per sec.)

We can also remove the machine as a variable by using AWS with the exact same instance type, storage and AMI. It won't be 100% perfect because of a noisy environment, but even fluctuations of 20% won't make a difference in the whole picture.

The next tests were performed on:

| | |
|--------------|--------------|
| type | r4.xlarge |
| virtualization | hvm |
| storage | gp2 |
| storage size | 500 |
| ami | xenial server release 20171116 ebs-ssd amd64 |

I provisioned the instance and docker using docker-machine 0.13.0

docker-machine create \
--engine-storage-driver overlay2 \
--driver amazonec2 \
--amazonec2-region eu-central-1 \
--amazonec2-zone a \
--amazonec2-instance-type r4.xlarge \
--amazonec2-volume-type gp2 \
--amazonec2-root-size 500 \
--amazonec2-ami ami-3d2bab52 \
node-01

sysbench prepare

sysbench --db-driver=pgsql --pgsql-host=cockroachdb-01 --pgsql-port=26257 --pgsql-user=root oltp_read_only --tables=4 --table-size=250000 prepare

sysbench run

for threads in 1 2 4 8 16 32; do sysbench --db-driver=pgsql --pgsql-host=cockroachdb-01 --pgsql-port=26257 --pgsql-user=root oltp_read_only --tables=4 --table-size=250000 --warmup-time=5 --time=60 --report-interval=5 --threads=$threads run; sleep 5; done

Cockroach 1.1.2 in docker [[config](https://github.com/amq/cockroach-galera-benchmark/blob/53c0f1dedacd889eb1395b955333bea5fd18788a/cockroachdb-cluster/docker-compose-1.yml)]

oltp_read_only

| | 1 | 2 | 4 | 8 | 16 | 32 |
|-------------------------|---|---|---|---|----|----|
| transactions | 157 | 272 | 444 | 437 | 445| 452 |
| queries | 2518 | 4355 | 7115| 6996 | 7126 | 7234 |
| 95th percentile latency | 6 | 7 | 14 | 36 | 71 | 180 |

oltp_write_only

| | 1 | 2 | 4 | 8 | 16 | 32 |
|-------------------------|---|---|---|---|----|----|
| transactions | 54 | 91 | 164 | 208 | 286 | 347 |
| queries | 327 | 546 | 987 | 1248 | 1715 | 2082 |
| 95th percentile latency | 24 | 29 | 35 | 56 | 109 | 200 |

Cockroach 1.1.2 native

oltp_read_only

| | 1 | 2 | 4 | 8 | 16 | 32 |
|-------------------------|---|---|---|---|----|----|
| transactions | 184 | 308 | 492 | 520 | 527 | 524 |
| queries | 2945 | 4940 | 7867 | 8314 | 8431 | 8334 |
| 95th percentile latency | 6 | 7 | 16 | 33 | 73 | 183 |

oltp_write_only

| | 1 | 2 | 4 | 8 | 16 | 32 |
|-------------------------|---|---|---|---|----|----|
| transactions | 56 | 92 | 175 | 213 | 300 | 363 |
| queries | 334 | 533 | 1052 | 1276 | 1799 | 2182 |
| 95th percentile latency | 24 | 30 | 34 | 55 | 111 | 200 |

Conclusion regarding docker: it brings up to 15% of overhead. Note that sysbench was also running natively in this case.

Bottleneck in native

oltp_read_only bottleneck: CPU with --threads >= 8

top - 12:30:56 up  1:49,  2 users,  load average: 1.06, 1.18, 1.00
Tasks: 142 total,   1 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu(s): 90.3 us,  6.6 sy,  0.0 ni,  0.5 id,  0.0 wa,  0.0 hi,  2.6 si,  0.1 st
KiB Mem : 31398288 total, 26447948 free,  1234728 used,  3715612 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 29721996 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1115 root      20   0 1347060 1.025g  37952 S 375.1  3.4  12:00.23 cockroach
 2535 root      20   0 2163732  16660   7688 S  22.3  0.1   0:04.62 sysbench

oltp_write_only bottleneck: 1500 IOPS provided by the storage with --threads >= 2

iostat 1
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda           1664.00         0.00     14328.00          0      14328

Galera in docker as a baseline [[config](https://github.com/amq/cockroach-galera-benchmark/blob/53c0f1dedacd889eb1395b955333bea5fd18788a/galera-cluster/docker-compose-1.yml)]

oltp_read_only

| | 1 | 2 | 4 | 8 | 16 | 32 |
|-------------------------|---|---|---|---|----|----|
| transactions | 451 | 820 | 1537 | 1519 | 1537 | 1509 |
| queries | 7217 | 13112 | 24586 | 24301 | 24590 | 24141 |
| 95th percentile latency | 2 | 2 | 3 | 6 | 11 | 24 |

oltp_write_only

| | 1 | 2 | 4 | 8 | 16 | 32 |
|-------------------------|---|---|---|---|----|----|
| transactions | 1200 | 1894 | 2619 | 3241 | 3162 | 2808 |
| queries | 7204 | 11364 | 15716 | 19447 | 18974 | 16844 |
| 95th percentile latency | 1 | 1 | 2 | 3 | 8 | 15 |

Disclaimer: I am a student and testing CockroachDB is a part of my assignment. I am not affiliated/invested in any party involved and I have introduced a comparison with Galera only to better understand the results.

... just ran the same single-node tests on the new EC2 instance types. In short, they seem to be significantly better than r4, while for the final testing I'm leaning towards m5 due to the double RAM. It would allow me to run both datastores on the same instances simultaneously comfortably, even with more data than 250k rows * 4 tables. There seems to be one unresolved issue which degrades IO on m5 though, so I will need to consider this.

I think so far my results are consistent, and it seems like there are no glaring issues with the setup. If anybody sees any issues (e.g. unoptimal / unfair configuration), I would be extremely grateful for a hint.

What wonders me the most regarding Cockroach is the high latency.

| | |
|--------------|--------------|
| type | c5.xlarge |
| virtualization | hvm |
| storage | gp2 |
| storage size | 500 |
| storage driver | overlay2 |
| region | eu-west-1 |
| ami | xenial server release 20171116 ebs-ssd amd64 |

Cockroach in docker

oltp_read_only

| threads | tps | qps | 95th lat |
|---------|--------|----------|----------|
| 1 | 230.45 | 3687.23 | 4.41 |
| 2 | 399.35 | 6389.31 | 5.00 |
| 4 | 659.62 | 10553.40 | 15.55 |
| 8 | 669.14 | 10705.38 | 30.81 |
| 16 | 666.31 | 10658.61 | 59.99 |
| 32 | 664.80 | 10632.54 | 144.97 |

oltp_write_only

| threads | tps | qps | 95th lat |
|---------|--------|---------|----------|
| 1 | 112.43 | 674.54 | 13.70 |
| 2 | 127.73 | 766.25 | 23.10 |
| 4 | 179.14 | 1074.95 | 33.12 |
| 8 | 215.65 | 1293.65 | 55.82 |
| 16 | 346.29 | 2078.59 | 94.10 |
| 32 | 456.14 | 2740.71 | 161.51 |

Galera in docker

oltp_read_only

| threads | tps | qps | 95th lat |
|---------|---------|----------|----------|
| 1 | 770.78 | 12332.34 | 1.39 |
| 2 | 1460.85 | 23373.42 | 1.42 |
| 4 | 2521.69 | 40346.55 | 2.18 |
| 8 | 2619.01 | 41903.16 | 3.19 |
| 16 | 2593.42 | 41492.12 | 6.79 |
| 32 | 2557.90 | 40920.41 | 14.21 |

oltp_write_only

| threads | tps | qps | 95th lat |
|---------|---------|----------|----------|
| 1 | 1986.45 | 11918.62 | 0.54 |
| 2 | 3016.31 | 18097.96 | 0.80 |
| 4 | 3994.63 | 23968.62 | 1.47 |
| 8 | 4687.19 | 28125.03 | 2.57 |
| 16 | 4497.43 | 26986.80 | 5.47 |
| 32 | 4374.43 | 26246.22 | 10.09 |

#

| | |
|--------------|--------------|
| type | m5.xlarge |
| virtualization | hvm |
| storage | gp2 |
| storage size | 500 |
| storage driver | overlay2 |
| region | eu-west-1 |
| ami | xenial server release 20171116 ebs-ssd amd64 |

Cockroach in docker

oltp_read_only

| threads | tps | qps | 95th lat |
|---------|--------|----------|----------|
| 1 | 240.52 | 3848.19 | 4.33 |
| 2 | 386.43 | 6182.49 | 5.28 |
| 4 | 659.68 | 10554.19 | 8.13 |
| 8 | 702.03 | 11231.24 | 24.83 |
| 16 | 661.74 | 10586.15 | 58.92 |
| 32 | 692.15 | 11070.55 | 134.90 |

oltp_write_only

| threads | tps | qps | 95th lat |
|---------|--------|---------|----------|
| 1 | 113.39 | 680.28 | 13.70 |
| 2 | 126.66 | 759.93 | 23.95 |
| 4 | 168.23 | 1009.22 | 34.95 |
| 8 | 215.89 | 1295.66 | 55.82 |
| 16 | 343.30 | 2060.57 | 82.96 |
| 32 | 441.22 | 2651.65 | 176.73 |

Galera in docker

oltp_read_only

| threads | tps | qps | 95th lat |
|---------|---------|----------|----------|
| 1 | 723.09 | 11569.43 | 1.50 |
| 2 | 1377.73 | 22043.50 | 1.50 |
| 4 | 2486.05 | 39776.33 | 1.67 |
| 8 | 2416.96 | 38669.93 | 3.49 |
| 16 | 2415.51 | 38645.50 | 7.04 |
| 32 | 2351.84 | 37623.83 | 14.73 |

oltp_write_only

| threads | tps | qps | 95th lat |
|---------|---------|----------|----------|
| 1 | 1921.44 | 11528.55 | 0.56 |
| 2 | 2779.13 | 16675.02 | 0.87 |
| 4 | 3772.58 | 22636.28 | 1.58 |
| 8 | 4362.11 | 26174.03 | 2.76 |
| 16 | 4166.74 | 25002.94 | 5.99 |
| 32 | 3973.48 | 23840.49 | 11.04 |

I also ran the tests with 500k and 1m rows, and there wasn't any significant difference.

Here is a handy parsing script which I used to speed up the process:

cat sysbench-1.log | egrep "threads:|transactions:|queries:|min:|avg:|max:|percentile:" | tr -d "\n" | sed 's/Number of threads: /\n/g' | sed 's/[A-Za-z\/]\{1,\}://g' | sed -e 's/95th//g' -e 's/per sec.)//g' -e 's/ms//g' -e 's/(//g' | sed 's/ \{1,\}/,/g'

Columns:

threads,transactions total, transactions per second,queries total,queries per second,min latency, avg latency, max latency,95th percentile latency

Dear @Spazer9 we'd like to thank you again for your analysis with CockroachDB 1.0.
Your work and that of many other CockroachDB users have pushed us to make major investments into performance in CockroachDB v1.1 and subsequently v2.0 which is to be released within a couple of months.
We have worked hard to bridge the gap you have found. I think that the gist of the message carried by this github issue is not current any more. I will close this issue for now, and we will welcome any new studies onto the performance of CockroachDB v2.0.

@knz is there any estimated launch date of v2?

April 2018.

Originally setup first comment said single node needs nearly double of time to insert same number if records.
Interested so see the results of same setup but new Cockroach version.
Also would like to know if parallel execution of Inserts can reach the same duration as single node ow setup.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

mjibson picture mjibson  路  3Comments

HeikoOnnebrink picture HeikoOnnebrink  路  4Comments

richardanaya picture richardanaya  路  3Comments

ajwerner picture ajwerner  路  4Comments

petermattis picture petermattis  路  4Comments