Cockroach: storage: diagnose performance issues with tpc-c when not using partitioning

Created on 24 May 2018 · 19 Comments · Source: cockroachdb/cockroach

A-kv-distribution C-performance


All 19 comments

After some initial investigation today, there's one problem so big that it pretty much obscures whatever problems may exist beneath it -- bad replica balance. For the most part, the replicas for any given table mostly fall on the same 3 nodes. The worst offender here is the tpcc.warehouse table, which gets the most load of any table as far as I'm aware (please correct that if I'm wrong). Almost all of the replicas for its 1000 ranges ended up on the same 3 nodes, meaning that the best leaseholder rebalancing in the world couldn't make for good performance. Those 3 nodes are indeed the 3 most loaded along dimensions such as number of BatchRequests processed, CPU usage, and keys written per second, with 1 of them in particular really standing out in goroutines (20k vs 2k for the next highest).

This could be fixed in a number of ways, such as:

  • Load-based rebalancing
  • Trying to diversify the nodes a table's ranges are on when rebalancing
  • Implementing a version of ALTER TABLE ... SCATTER that scatters replicas in addition to leases

The last one should get the quickest results and we've talked about wanting it anyway, so I'll work on that tomorrow and see where it leaves things.


I do have a couple questions about oddities I've noticed, though:

  1. r1 has a lot of qps. Way more qps than any other range (460 qps to it vs 50 to the next hottest range in my latest full measurement, and it appears to be growing over time; it's now over 1000 qps). That's way higher than the cluster's rate of NotLeaseholderErrors. Judging by the fact that its keys written per second is basically 0, they're all reads. According to the UI, there is no rebalancing activity going on, neither replicas nor leases. I increased the cluster setting for the number of range cache entries from 1M to 10M and it's had no effect. I enabled slow query tracing to try and see which requests were hitting it, and I'm seeing it repeatedly involved in QueryTxn attempts to resolve intents. It looks pretty fishy to me, but I haven't checked against the code yet. Here's an example trace: https://gist.github.com/a-robinson/17ebf561bf450b40e7e89edd967962ca. Is this expected?
  2. The tpcc.warehouse ranges on n11 (the one with the massive number of goroutines and most qps) are getting significantly more qps than those whose leases are on the other two nodes (https://gist.github.com/a-robinson/0aebc9aea3fb042255d3572e52ce63ac). Under normal operation, I'd expect each warehouse to get the same number of qps. Is that a correct understanding of tpc-c? And given that n11 seems to be the worst bottleneck, I'd expect each range on it to have fewer qps, not more, due to the slowdown. Any idea what might be going on here?

Finally, for future reference (cc @piyush-singh), things that would have been nice to have for the sake of debugging the performance issue:

  • Breakdown of which ranges are on each node, i.e. a breakdown of how many are from each table
  • Breakdown of where the replicas for each table are located, i.e. their distribution across the nodes
  • Breakdown of where the leaseholders for each table are located, i.e. their distribution across the nodes
  • List of the ranges receiving the most traffic (QPS and/or keys written per second), where their leaseholders are, where their replicas are, which tables they're from

I ended up manually constructing most of this data with the raft debug endpoint + a few python scripts as well as SHOW TESTING_RANGES + some bash.

And to be clear, the point of the first paragraph above is that leaseholder rebalancing can't fix the problem by itself if all the warehouse range replicas are on the same 3 nodes.
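The per-table breakdowns described in the wishlist above can be approximated with a short script over range metadata. A minimal sketch, assuming a simplified list-of-dicts input rather than the actual shape of the /_status/raft endpoint's JSON:

```python
from collections import Counter

def replica_breakdown(ranges):
    """Count replicas per (table, node) pair.

    `ranges` is a list of dicts like
    {"table": "tpcc.warehouse", "node_ids": [5, 10, 6]} --
    a simplified stand-in for the raft debug endpoint's output.
    """
    per_node = Counter()
    for r in ranges:
        for node in r["node_ids"]:
            per_node[(r["table"], node)] += 1
    return per_node
```

A heavily skewed distribution, like most of a table's replicas landing on 3 nodes, shows up immediately in the resulting counts.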

r1 has a lot of qps. Way more qps than any other range (460 qps to it vs 50 to the next hottest range in my latest full measurement, and it appears to be growing over time; it's now over 1000 qps). That's way higher than the cluster's rate of NotLeaseholderErrors. Judging by the fact that its keys written per second is basically 0, they're all reads. According to the UI, there is no rebalancing activity going on, neither replicas nor leases. I increased the cluster setting for the number of range cache entries from 1M to 10M and it's had no effect. I enabled slow query tracing to try and see which requests were hitting it, and I'm seeing it repeatedly involved in QueryTxn attempts to resolve intents. It looks pretty fishy to me, but I haven't checked against the code yet. Here's an example trace: https://gist.github.com/a-robinson/17ebf561bf450b40e7e89edd967962ca. Is this expected?

i'd expect r1 to be mostly idle in the steady state. @nvanbenschoten, @tschottdorf any theories?

The tpcc.warehouse ranges on n11 (the one with the massive number of goroutines and most qps) are getting significantly more qps than those whose leases are on the other two nodes (https://gist.github.com/a-robinson/0aebc9aea3fb042255d3572e52ce63ac). Under normal operation, I'd expect each warehouse to get the same number of qps. Is that a correct understanding of tpc-c? And given that n11 seems to be the worst bottleneck, I'd expect each range on it to have fewer qps, not more, due to the slowdown. Any idea what might be going on here?

This is not expected. Each warehouse should receive approximately the same qps. No idea what is going on, but it is worth pulling on this thread. It might be indicative of a bug in the load generator.

Finally, for future reference (cc @piyush-singh), things that would have been nice to have for the sake of debugging the performance issue:

Heh, I came up with a similar list when debugging tpcc performance issues. I believe there are issues filed for many of these. @piyush-singh please double check.

This is not expected. Each warehouse should receive approximately the same qps. No idea what is going on, but it is worth pulling on this thread. It might be indicative of a bug in the load generator.

I can't confirm it right now since I tore the cluster down when I was done last night, but I think it could conceivably be due to errors from that node. I recall there being a fairly large (albeit spiky) amount of "Replica Errors" in the UI's graph of RPC errors. It's possible that those were mostly coming from n11's replicas, and that retries were then bumping up the count of batches sent to n11's replicas.

Implementing a version of ALTER TABLE ... SCATTER that scatters replicas in addition to ranges

What does "scatters replicas in addition to ranges" mean?

r1 has a lot of qps.

That's surprising. This is probably due to overly-broad cache invalidation when the DistSender gets a NotLeaseHolder error (or a few other things). It sounds like we might be invalidating both meta1 and meta2 when we should only be invalidating meta2.

We will be showing location of replicas in a debug page shortly (#24855), and @vilterp and I have discussed using this layout to similarly show leaseholder counts.

@petermattis mentioned similar requests for information about ranges in #23379 (nodes to which they were replicated, size, load). I'll make a note to think through how best to show ranges in the UI for our last milestone since we'll be making some layout changes anyways.

Thanks for the write up and detailed requests @a-robinson

r1 has a lot of qps. Way more qps than any other range (460 qps to it vs 50 to the next hottest range in my latest full measurement, and it appears to be growing over time; it's now over 1000 qps). That's way higher than the cluster's rate of NotLeaseholderErrors. Judging by the fact that its keys written per second is basically 0, they're all reads. According to the UI, there is no rebalancing activity going on, neither replicas nor leases. I increased the cluster setting for the number of range cache entries from 1M to 10M and it's had no effect. I enabled slow query tracing to try and see which requests were hitting it, and I'm seeing it repeatedly involved in QueryTxn attempts to resolve intents. It looks pretty fishy to me, but I haven't checked against the code yet. Here's an example trace: https://gist.github.com/a-robinson/17ebf561bf450b40e7e89edd967962ca. Is this expected?

The trace is pretty suspicious for this issue. As you mentioned, we see a large number of QueryTxn requests in the trace. A lot of these look something like:

event:[n11,s11,r55510/2:/Table/55/1/9{00-10}] no pusher found; backing off
 30575.736ms    390.389ms                                operation:dist sender
 30575.767ms      0.030ms                                event:[s11,r55510/2:/Table/55/1/9{00-10},n11] querying next range at /Min
 30575.806ms      0.040ms                                event:[s11,r55510/2:/Table/55/1/9{00-10},n11] r1: sending batch 1 QueryTxn to (n20,s19):8
 30575.827ms      0.021ms                                event:[s11,r55510/2:/Table/55/1/9{00-10},n11] sending request to alex-tpcc-0002:26257
 30575.848ms      0.021ms                                    operation:/cockroach.roachpb.Internal/Batch component:gRPC sb:1 span.kind:client txnid:779a7ccb-5b5d-4a8c-8c02-33a5937f9510
 30576.756ms      0.908ms                                        operation:/cockroach.roachpb.Internal/Batch component:gRPC sb:1 span.kind:server txnid:779a7ccb-5b5d-4a8c-8c02-33a5937f9510
 30576.776ms      0.019ms                                        event:[n20] 1 QueryTxn
 30576.781ms      0.005ms                                        event:[n20,s19] executing 1 requests
 30626.903ms     50.122ms                                        event:[n20,s19,r1/8:/{Min-System/}] read-only path
 30626.914ms      0.011ms                                        event:[n20,s19,r1/8:/{Min-System/}] command queue
 30626.918ms      0.004ms                                        event:[n20,s19,r1/8:/{Min-System/}] waiting for read lock
 30626.970ms      0.052ms                                        event:[n20,s19,r1/8:/{Min-System/}] read completed

Note that they're addressed to /Min. This trace is coming from https://github.com/cockroachdb/cockroach/blob/f8a7889c598150b45c933c96ee19dc522eb2f8d3/pkg/storage/txnwait/queue.go#L682-L684 and I suspect that we're querying a transaction that does not have a record yet (i.e. pusher.TxnMeta.Key == nil). For what it's worth, @spencerkimball has been dealing with issues like this lately with some of his contentionQueue changes.

things that would have been nice to have for the sake of debugging the performance issue:

@vilterp's replica matrix (https://github.com/cockroachdb/cockroach/pull/24855) would address at least the first two of these needs. Might be worth trying it out here.

What does "scatters replicas in addition to ranges" mean?

Sorry, I guess I can't type late at night. I meant "scatters replicas in addition to leases".

Yes, we would be seeing spurious QueryTxn requests to r1 in the current code because the txn wait queue isn't checking whether there's a key available for the pusher before querying it for dependencies. The lack of a key addresses the QueryTxn request to r1. This is a one line fix as part of my current PR, but I'll break it out and send it for review.
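The fix described above is, conceptually, just a nil-key guard before issuing the QueryTxn. An illustrative sketch in Python (the real change is a one-liner in the Go txn wait queue, not this code):

```python
def maybe_query_waiting_txn(pusher_key, send_query_txn):
    """Only query a pusher's transaction if it has a record.

    A pusher whose transaction has no anchor key (TxnMeta.Key is nil in
    the Go code, modeled as None here) has no record to query yet;
    addressing a QueryTxn by a nil key would route it to /Min, i.e. r1.
    """
    if pusher_key is None:
        return None  # no record yet; skip the spurious QueryTxn
    return send_query_txn(pusher_key)
```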

Before scattering replicas, running tpc-c 5k on 15 nodes:

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  300.3s    29544.7  45.9%  26271.4  12348.0  85899.3 103079.2 103079.2 103079.2

After scattering replicas (and leases), running tpc-c 5k on 15 nodes:

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  300.4s    61047.7  94.9%   1106.1    117.4    369.1   9663.7  19327.4  28991.0

So if our goal is to have a usable workaround, scattering replicas basically gets us there. And if we want to remove the manual aspect of it, it looks like it'll be more impactful to work on stats-based replica rebalancing than lease rebalancing.
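The efc column in these results follows directly from the TPC-C spec's throughput ceiling of 12.86 new-order transactions per minute per warehouse:

```python
def tpcc_efficiency(tpmc, warehouses):
    """TPC-C efficiency: measured tpmC relative to the spec's ceiling
    of 12.86 new-order transactions per minute per warehouse."""
    return tpmc / (12.86 * warehouses)
```

For the 5k-warehouse runs above, 61047.7 tpmC works out to roughly 94.9% and 29544.7 tpmC to roughly 45.9%, matching the reported efc values.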

@a-robinson Nice improvement. The p99 and pMax latency looks higher than what I remember for tpcc-5k. Would be interesting to see how this compares to tpcc-5k with partitioning.

Running with --ramp=30s to ramp up the traffic more smoothly seems to help a lot with the tail latencies:

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  300.4s    60544.4  94.2%    143.7    134.2    218.1    260.0    453.0   4831.8

Of course, the sample sizes here aren't exactly large, so maybe the difference is just noise.

And to follow up on my questions from the other day:

  1. r1 is no longer by far the hottest range, though its qps hasn't dropped to 0 either. The slow query log hasn't been nearly as illuminating this time, just showing a handful of normal range-lookup events. It's weird that it hasn't dropped closer to 0, but it's not causing any problems anymore.
  2. All the warehouse ranges are now down around 2 qps, with no outliers. So the outliers the other day probably were caused by error retries.

The hottest ranges in the cluster at this point are all from the tpcc.new_order table, followed by the tpcc.item table. Top 50 ranges in case you're curious, with the rangeIDs in the mid-hundreds being from new_order and the 30-thousands from item:

idx rangeID    QPS                nodeIDs     leaseholder node
  1:   496      217.292003       [5, 6, 4]      6
  2:   793      216.760419      [5, 10, 6]      5
  3:   653      216.523500      [5, 10, 6]      10
  4:   448      216.106647      [5, 10, 6]      10
  5:   557      215.943809      [15, 3, 1]      15
  6:   458      215.132572      [15, 12, 1]     15
  7:   533      193.794032      [5, 15, 4]      5
  8:   730      188.153795       [5, 6, 3]      6
  9:   293      159.987734      [10, 6, 9]      9
 10:   413      154.168710      [5, 15, 4]      5
 11:   613      152.213182      [15, 6, 1]      15
 12:   267      151.527005       [5, 6, 4]      6
 13:   797      151.262274       [5, 6, 3]      3
 14:   611      150.982270       [6, 4, 3]      6
 15:   457      150.963159       [5, 6, 3]      5
 16:   724      150.941294       [5, 3, 1]      3
 17:   615      150.589744      [5, 12, 4]      12
 18: 33644      123.715932       [8, 6, 7]      8
 19: 33197      112.783797       [8, 6, 7]      6
 20:   388      112.059562      [5, 10, 4]      4
 21:   540       97.112390       [5, 6, 4]      4
 22: 34061       94.627812       [8, 6, 7]      7
 23: 32384       92.206044       [8, 6, 7]      8
 24: 32110       86.193180       [8, 6, 7]      6
 25: 32675       82.701674       [8, 6, 7]      6
 26: 34390       73.451197       [8, 6, 7]      6
 27: 34878       70.419347       [8, 6, 7]      7
 28:   266       69.683754      [5, 15, 4]      15
 29: 31457       67.308302       [8, 6, 7]      6
 30: 33275       67.023898       [8, 6, 7]      7
 31:   976       65.729066      [5, 10, 6]      5
 32: 36672       63.460209      [5, 15, 4]      15
 33: 34217       60.512212       [8, 6, 7]      6
 34:   493       59.469083      [5, 6, 12]      6
 35: 34104       54.001622       [8, 6, 7]      7
 36: 32033       53.894249       [8, 6, 7]      6
 37: 31546       52.105440       [8, 6, 7]      6
 38: 32199       52.043978       [8, 6, 7]      7
 39: 34701       50.376507       [8, 6, 7]      8
 40: 32436       49.500585       [8, 6, 7]      8
 41: 32335       47.352396       [8, 6, 7]      6
 42: 32210       46.446430       [8, 6, 7]      8
 43: 34580       45.592824       [8, 6, 7]      8
 44: 32663       44.654265       [8, 6, 7]      8
 45: 33395       44.197098       [8, 6, 7]      8
 46: 33678       44.139657       [8, 6, 7]      7
 47: 31786       43.393267       [8, 6, 7]      8
 48:     1       42.440460      [2, 14, 12]     2
 49: 34153       41.976303       [8, 6, 7]      7
 50: 33196       40.899344       [8, 6, 7]      8

Hey @a-robinson, we can discuss this in a separate issue, but how did you generate the hot ranges table above? Would love to get it into an endpoint so we can show it in the UI.

The new_order table has warehouse ID as the first component of its primary key. I'm surprised to see such a delta in QPS across those new_order ranges. And I'm guessing this is only a fraction of the new_order ranges.

Hey @a-robinson, we can discuss this in a separate issue, but how did you generate the hot ranges table above? Would love to get it into an endpoint so we can show it in the UI.

hottest_ranges.py from https://gist.github.com/a-robinson/54fbaa6628ae9f1ad9c6185ecd28edb9 run on the output of the /_status/raft endpoint.

Disclaimer: the qps numbers are averages over the last 30 minutes, not instantaneous measurements. Also, that raft debug endpoint is very slow (>1 minute) on a large cluster.
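The core of such a script is just a sort over per-range stats. A minimal sketch, where the field names are assumptions about the raft endpoint's JSON shape rather than a documented API:

```python
def hottest_ranges(ranges, top_n=50):
    """Rank ranges by QPS, descending.

    `ranges` is a list of dicts with assumed fields "range_id", "qps",
    "node_ids", and "leaseholder", standing in for the parsed output of
    the /_status/raft debug endpoint.
    """
    ranked = sorted(ranges, key=lambda r: r["qps"], reverse=True)
    return [(r["range_id"], r["qps"], r["node_ids"], r["leaseholder"])
            for r in ranked[:top_n]]
```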

And I'm guessing this is only a fraction of the new_order ranges.

No, they're all represented there except for one outlier at the very end of the table that's responsible for fewer warehouses. Although that's ignoring the bunch of ranges in the table that are responsible for effectively none of the keyspace (e.g. /1764/3/2642-/1764/3/2642/0), which makes it look like we're choosing split points incorrectly somewhere (reminiscent of https://github.com/cockroachdb/cockroach/pull/24896):

root@localhost:26257/defaultdb> show testing_ranges from table tpcc.new_order;
+-----------------+-----------------+----------+-------------+--------------+
|    Start Key    |     End Key     | Range ID |  Replicas   | Lease Holder |
+-----------------+-----------------+----------+-------------+--------------+
| NULL            | /0              |      192 | {3,4,12}    |           12 |
| /0              | /0/1/2101/0     |    36721 | {3,4,12}    |           12 |
| /0/1/2101/0     | /138/9/2506     |      266 | {4,5,15}    |           15 |
| /138/9/2506     | /272/10/2564    |    36672 | {4,5,15}    |           15 |
| /272/10/2564    | /272/10/2564/0  |      337 | {4,5,15}    |           15 |
| /272/10/2564/0  | /521/6/2127     |      413 | {4,5,15}    |           15 |
| /521/6/2127     | /521/6/2127/0   |      460 | {5,10,15}   |           15 |
| /521/6/2127/0   | /770/1/2590     |      533 | {4,5,15}    |           15 |
| /770/1/2590     | /770/1/2590/0   |      198 | {3,6,15}    |           15 |
| /770/1/2590/0   | /1000           |      293 | {6,9,10}    |            6 |
| /1000           | /1018/7/2153    |    36711 | {6,9,10}    |            6 |
| /1018/7/2153    | /1018/7/2153/0  |      412 | {4,9,14}    |            9 |
| /1018/7/2153/0  | /1267/2/2616    |      458 | {1,12,15}   |            1 |
| /1267/2/2616    | /1267/2/2616/0  |      552 | {1,4,15}    |            1 |
| /1267/2/2616/0  | /1515/8/2179    |      557 | {1,3,15}    |            1 |
| /1515/8/2179    | /1515/8/2179/0  |      215 | {4,5,12}    |           12 |
| /1515/8/2179/0  | /1764/3/2642    |      267 | {2,4,6}     |            6 |
| /1764/3/2642    | /1764/3/2642/0  |      339 | {4,5,11}    |           11 |
| /1764/3/2642/0  | /1948/8/2638    |      388 | {4,5,10}    |            4 |
| /1948/8/2638    | /1948/8/2638/0  |      492 | {3,6,14}    |           14 |
| /1948/8/2638/0  | /2000           |      611 | {3,4,6}     |            3 |
| /2000           | /2197/4/2201    |    36697 | {3,4,6}     |            3 |
| /2197/4/2201    | /2197/4/2201/0  |      262 | {2,4,6,15}  |            2 |
| /2197/4/2201/0  | /2445/9/2664    |      457 | {5,6,13}    |            6 |
| /2445/9/2664    | /2445/9/2664/0  |      536 | {3,6,12}    |           12 |
| /2445/9/2664/0  | /2605/5/2267    |      540 | {2,6,8,15}  |            8 |
| /2605/5/2267    | /2605/5/2267/0  |      721 | {3,4,6}     |            3 |
| /2605/5/2267/0  | /2853/10/2730   |      730 | {3,5,6}     |            3 |
| /2853/10/2730   | /2853/10/2730/0 |      331 | {4,6,15}    |           15 |
| /2853/10/2730/0 | /2921/10/2590   |      493 | {5,6,12}    |           12 |
| /2921/10/2590   | /2921/10/2590/0 |      612 | {1,2,3}     |            1 |
| /2921/10/2590/0 | /3000           |      613 | {1,6,15}    |            1 |
| /3000           | /3170/6/2153    |    36685 | {1,6,15}    |            1 |
| /3170/6/2153    | /3170/6/2153/0  |      614 | {1,6,15}    |            1 |
| /3170/6/2153/0  | /3419/1/2616    |      615 | {4,5,12}    |           12 |
| /3419/1/2616    | /3419/1/2616/0  |      384 | {4,5,12}    |            4 |
| /3419/1/2616/0  | /3667/7/2179    |      496 | {4,5,6}     |            6 |
| /3667/7/2179    | /3667/7/2179/0  |      551 | {3,6,7}     |            3 |
| /3667/7/2179/0  | /3916/2/2642    |      724 | {1,3,5}     |            5 |
| /3916/2/2642    | /3916/2/2642/0  |      794 | {3,5,15}    |           15 |
| /3916/2/2642/0  | /4000           |      797 | {3,5,6}     |            3 |
| /4000           | /4164/8/2205    |    36692 | {3,5,6}     |            3 |
| /4164/8/2205    | /4164/8/2205/0  |      417 | {4,6,11,13} |            6 |
| /4164/8/2205/0  | /4413/3/2668    |      448 | {5,6,10}    |           10 |
| /4413/3/2668    | /4413/3/2668/0  |      535 | {3,5,6}     |            3 |
| /4413/3/2668/0  | /4661/9/2231    |      653 | {5,6,10}    |           10 |
| /4661/9/2231    | /4661/9/2231/0  |      726 | {4,5,13}    |           13 |
| /4661/9/2231/0  | /4910/4/2694    |      793 | {5,9,10}    |            5 |
| /4910/4/2694    | /4910/4/2694/0  |      845 | {5,10,13}   |           10 |
| /4910/4/2694/0  | /4999/10/3001   |      976 | {4,5,10}    |           10 |
| /4999/10/3001   | /5000           |     1091 | {4,6,11,13} |            6 |
| /5000           | NULL            |    36731 | {4,6,13}    |            4 |
+-----------------+-----------------+----------+-------------+--------------+
(52 rows)

No, they're all represented there except for one outlier at the very end of the table that's responsible for fewer warehouses. Although that's ignoring the bunch of ranges in the table that are responsible for effectively none of the keyspace (e.g. /1764/3/2642-/1764/3/2642/0), which makes it look like we're choosing split points incorrectly somewhere (reminiscent of #24896):

We don't manually split the new_order table. I'm not sure why not. Perhaps there are enough ranges already to not require splitting. The strange nearly empty ranges might indicate badness (or a bug) in the automatic splitting heuristics.
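The degenerate ranges are easy to flag mechanically from the SHOW TESTING_RANGES output: each one's end key is just its start key with a trailing "/0" column-family suffix. A rough sketch over (start, end, range_id) tuples:

```python
def degenerate_ranges(rows):
    """Flag ranges that cover effectively no keyspace.

    A range like /1764/3/2642 - /1764/3/2642/0 spans only the suffix of
    a single row's primary key, so its end key is the start key plus
    "/0". `rows` is a list of (start_key, end_key, range_id) tuples.
    """
    return [rid for start, end, rid in rows
            if start is not None and end == start + "/0"]
```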

Using partitioning gets basically the same results as with replica scattering:

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  300.4s    60837.6  94.6%     84.7     71.3    130.0    201.3    352.3   1006.6

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  300.4s    60587.1  94.2%     88.9     79.7    130.0    184.5    369.1    906.0

The strange nearly empty ranges might indicate badness (or a bug) in the automatic splitting heuristics.

It looks to me like the tpc-c 5k fixture I'm RESTOREing has the bad splits already in it. It's as if the cluster that created the backup had partitioned the table and was running a version before https://github.com/cockroachdb/cockroach/pull/24896. Either that, or the RESTORE process itself creates the bad splits somehow. The bad splits are consistently there as soon as the data has been restored.

It looks to me like the tpc-c 5k fixture I'm RESTOREing has the bad splits already in it. It's as if the cluster that created the backup had partitioned the table and was running a version before #24896. Either that, or the RESTORE process itself creates the bad splits somehow. The bad splits are consistently there as soon as the data has been restored.

If this is reproducible, can you file a separate issue about this? Seems like something for the bulkio folks to investigate.
