SHA: https://github.com/cockroachdb/cockroach/commits/c59f5347d5424edb90575fb0fd50bad677953752
Parameters:
To repro, try:
# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=kv/contention/nodes=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1195732&tab=buildLog
The test failed on release-19.1:
ts_util.go:89,kv.go:186,cluster.go:1605,errgroup.go:57: spent 5.263158% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1626,kv.go:189,test.go:1214: Goexit() was called
spent 5.263158% of time below target of ... 5.000000%. We're close enough to the target that this is probably just a low-probability flake. I'll leave this as is and see if we fail over the next few days. If not, I'll bump the target threshold slightly, as it was chosen arbitrarily.
SHA: https://github.com/cockroachdb/cockroach/commits/a1e6e9decc9dec15a32bbb6d30efc67ca45a532a
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1204585&tab=buildLog
The test failed on release-19.1:
ts_util.go:89,kv.go:186,cluster.go:1605,errgroup.go:57: spent 5.263158% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1626,kv.go:189,test.go:1216: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/83de585d331b05a4aa02a65b353bed6bf829b696
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1247383&tab=buildLog
The test failed on master:
ts_util.go:89,kv.go:186,cluster.go:1667,errgroup.go:57: spent 14.035088% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1688,kv.go:189,test.go:1237: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/ec4728ae986b46d4f57009233b86971198b275ed
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1255121&tab=buildLog
The test failed on master:
ts_util.go:89,kv.go:186,cluster.go:1667,errgroup.go:57: spent 7.017544% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1688,kv.go:189,test.go:1237: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/99306ec3e9fcbba01c05431cbf496e8b5b8954b4
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1260033&tab=buildLog
The test failed on master:
ts_util.go:89,kv.go:186,cluster.go:1761,errgroup.go:57: spent 7.017544% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1782,kv.go:189,test.go:1245: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/dff4132a80e62c6c5ad603ff6c608b09419d4e3e
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1264632&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1814,errgroup.go:57: spent 10.344828% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1835,kv.go:189,test.go:1253: Goexit() was called
The increased flakiness looks somewhat correlated with https://github.com/cockroachdb/cockroach/pull/36748. I wonder if there are cases where we were relying on liveness expiration to get transactions unstuck. I'm going to run this with a huge txn liveness expiration and see if things get even worse.
That did the trick. After bumping TxnLivenessHeartbeatMultiplier to 30, I can clearly see transactions waiting on other transactions for 30+ seconds. I think it's likely that a transaction is failing to roll back its transaction record.
I see the transactions that are failing to roll back their transaction records hitting errors like
TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_ABORTED_RECORD_FOUND): "sql txn" id=c4215f04 key=/Table/53/1/-7656810924711545548/0 rw=true pri=0.01158742 stat=PENDING epo=0 ts=1556562081.872413329,1 orig=1556562051.258792452,0 max=1556562051.260084221,0 wto=true seq=13
So actually that doesn't say very much.
On the /debug/requests page I saw pushers getting stuck in the contentionQueue and repeatedly hitting TransactionAbortedError(ABORT_REASON_PUSHER_ABORTED) errors every 100ms (after waiting for cycle detection to be ready). This makes sense because contentionQueue.add doesn't return an error, so transactions can fall into loops where they hit an intent, try to push, fail to push due to an error, and repeat. I added a few lines to plumb the error back up through the IntentResolver and things immediately got a lot better. I no longer see any long stalls in the workload even with the 30s TxnLivenessThreshold.
I'm going to make the following changes:
1. Propagate errors out of the contentionQueue instead of swallowing them.
2. Move TxnLivenessHeartbeatMultiplier to an environment variable and set it to 60s in this roachtest. We should never need to wait on a transaction's liveness to expire when all nodes are live, so the low timeout was only serving to hide bugs like this.
Another option is to have a cluster setting/env var that disables transaction expiration entirely so that a failure to roll a transaction back causes everyone else to get stuck. The benefit of this is that it would be very loud.
A compromise would be to have a knob that allows us to set TxnLivenessThreshold to some high value and another to instruct PushTxn to fatal if we ever stumble upon an expired PENDING transaction. The nice part about this is that it would be equally as loud in roachtests as the first option but wouldn't cause the tests to hang forever. @andreimatei any opinion on this?
SHA: https://github.com/cockroachdb/cockroach/commits/efb45869b242137e5c178b10c646c3ed025fff36
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1266041&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1814,errgroup.go:57: spent 12.280702% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1835,kv.go:189,test.go:1253: Goexit() was called
A compromise would be to have a knob that allows us to set TxnLivenessThreshold to some high value and another to instruct PushTxn to fatal if we ever stumble upon an expired PENDING transaction. The nice part about this is that it would be equally as loud in roachtests as the first option but wouldn't cause the tests to hang forever. @andreimatei any opinion on this?
I like it.
SHA: https://github.com/cockroachdb/cockroach/commits/24feca7a4106f08c73534e16ebb79d949a479f35
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1268176&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1814,errgroup.go:57: spent 12.280702% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1835,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/ff969dd6cb0e087327f0b210f728f588b2f5aeb0
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1270738&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1814,errgroup.go:57: spent 22.807018% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1835,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/856ba9108f112f85d406bbe88d2208651859336e
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1274175&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1812,errgroup.go:57: spent 15.789474% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1833,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/23155799e92e54915ae66259d06a630e981afbeb
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1277061&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1812,errgroup.go:57: spent 7.017544% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1833,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/8abb47a1c9795c1463183bc44e776b054bece682
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1279683&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1812,errgroup.go:57: spent 8.771930% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1833,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/048bdc163fcb470d4e749fcad482cf2671c29fb1
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1281656&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1812,errgroup.go:57: spent 12.280702% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1833,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/45e15e05abff25e099ca59f4c5cb40a6cf695e6d
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1285294&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1812,errgroup.go:57: spent 7.017544% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1833,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/c25518b4e9a723d8de0dba30a95ce0ade7963aed
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1286188&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1812,errgroup.go:57: spent 7.017544% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1833,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/ba5c092a726134b73e789c2047f7ec151be7c1a1
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1288263&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1812,errgroup.go:57: spent 6.896552% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1833,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/7b2651400b2003d0a381cba9dbfc0b7bc0dfee00
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1293898&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1812,errgroup.go:57: spent 8.771930% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1833,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/58c567a325056033b326cb9c4ed9ba490e8956da
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1296592&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1812,errgroup.go:57: spent 5.263158% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1833,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/9671342fead0509bec0913bae4ae1f244660788e
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1298500&tab=buildLog
The test failed on branch=release-19.1, cloud=gce:
cluster.go:1474,kv.go:179,cluster.go:1812,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1298500-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
I190521 10:37:49.444190 1 workload/workload.go:562 starting 4 splits
Error: ALTER TABLE kv SPLIT AT VALUES (-3689348814741909504): pq: splits would be immediately discarded by merge queue; disable the merge queue first by running 'SET CLUSTER SETTING kv.range_merge.queue_enabled = false'
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_35.231.83.24_2019-05-21T10:37:48Z: exit status 1
: exit status 1
cluster.go:1833,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/7009f8750d5c3af32d5c43011869048ea7a311ae
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1300930&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:89,kv.go:186,cluster.go:1848,errgroup.go:57: spent 5.263158% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1869,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/c9301cf71ea69da451fe5e5ba2c3074a4fe53831
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1303699&tab=buildLog
The test failed on branch=master, cloud=gce:
cluster.go:1516,kv.go:179,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1303699-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
7.9 17179.9 31138.5 66572.0 write
10m57s 0 163.1 94.5 6.0 16643.0 18253.6 32212.3 write
10m58s 0 0.0 94.4 0.0 0.0 0.0 0.0 write
10m59s 0 0.0 94.2 0.0 0.0 0.0 0.0 write
11m0s 0 0.0 94.1 0.0 0.0 0.0 0.0 write
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
11m1s 0 84.0 94.1 7.1 4831.8 22548.6 94489.3 write
Error: ERROR: TransactionStatusError: programming error: found COMMITTED record for prevented implicit commit: id=add05534 key=/Table/53/1/-3313874327987903458/0 rw=true pri=0.13424172 stat=COMMITTED epo=1 ts=1558615331.690889202,1 orig=1558615326.639046519,2 max=0.000000000,0 wto=false seq=13 int=14 (REASON_UNKNOWN) (SQLSTATE XXUUU)
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_35.237.56.114_2019-05-23T12:31:09Z: exit status 1
: exit status 1
cluster.go:1875,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/630a6e9cb3771912cd138f9aa3bea1f0ca9fa7c9
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1306250&tab=buildLog
The test failed on branch=master, cloud=gce:
cluster.go:1516,kv.go:179,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1306250-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
I190524 11:43:27.549556 1 workload/workload.go:578 starting 4 splits
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
1s 0 396.2 604.8 113.2 570.4 805.3 906.0 write
2s 0 534.1 569.4 79.7 805.3 1140.9 1811.9 write
3s 0 572.9 570.6 56.6 939.5 1610.6 2415.9 write
4s 0 534.1 561.5 32.5 973.1 1744.8 3221.2 write
Error: ERROR: duplicate key value (v)=('\x5c') violates unique constraint "kv_v_idx" (SQLSTATE 23505)
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_34.73.250.48_2019-05-24T11:43:26Z: exit status 1
: exit status 1
cluster.go:1875,kv.go:189,test.go:1251: Goexit() was called
found COMMITTED record for prevented implicit commit
Is fixed by https://github.com/cockroachdb/cockroach/pull/37784.
duplicate key value (v)=('\x5c') violates unique constraint "kv_v_idx"
I'm not sure what to make of this. I need to dig in.
duplicate key value (v)=('\x5c') violates unique constraint "kv_v_idx"
I can reproduce this fairly easily, even with parallel commits disabled, so it's not that.
I can reproduce it at least as far back as 7009f8750d5c3af32d5c43011869048ea7a311ae. Did something in the workload change?
I'm able to repro this immediately by running ./workload run kv --init --secondary-index --cycle-length=10 --batch=5 against a one-node cluster.
@jordanlewis could I assign you for triage? It's not clear that this is an issue in Core and I'm out for the next week so I want to make sure I get to the issues that definitely are.
Sure.
SHA: https://github.com/cockroachdb/cockroach/commits/fc7e48295cd05f94fd2883498d96d91ad538e559
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1308263&tab=buildLog
The test failed on branch=master, cloud=gce:
cluster.go:1516,kv.go:179,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1308263-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
12s 0 554.0 529.5 19.9 570.4 3892.3 9126.8 write
13s 0 510.1 528.0 21.0 486.5 2818.6 11811.2 write
14s 0 186.0 503.6 17.8 486.5 637.5 6710.9 write
15s 0 8.0 470.5 5.5 3087.0 3087.0 3087.0 write
16s 0 8.0 441.6 7.1 10737.4 10737.4 10737.4 write
17s 0 3.0 415.8 8589.9 12348.0 12348.0 12348.0 write
18s 0 13.0 393.4 6.0 9126.8 11811.2 11811.2 write
19s 0 167.9 381.6 9.4 5368.7 9663.7 13421.8 write
20s 0 89.0 366.9 8.1 6710.9 13958.6 15032.4 write
Error: ERROR: duplicate key value (v)=('\xf0') violates unique constraint "kv_v_idx" (SQLSTATE 23505)
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_34.73.34.56_2019-05-25T11:59:00Z: exit status 1
: exit status 1
cluster.go:1875,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/db9c1217a6967fcac2d135cf0f24a4265dc76d77
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1310296&tab=buildLog
The test failed on branch=master, cloud=gce:
cluster.go:1516,kv.go:179,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1310296-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
ite
18s 0 44.0 425.8 10.0 4563.4 11274.3 11274.3 write
19s 0 239.9 416.0 13.1 1946.2 10200.5 13958.6 write
20s 0 396.1 415.0 12.1 1744.8 3623.9 15032.4 write
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
21s 0 68.0 398.5 12.6 637.5 1744.8 2415.9 write
22s 0 27.0 381.6 12.1 9126.8 15032.4 15032.4 write
23s 0 24.0 366.0 13.1 2952.8 6979.3 6979.3 write
24s 0 0.0 350.8 0.0 0.0 0.0 0.0 write
25s 0 65.0 339.4 11.0 4563.4 18253.6 19327.4 write
Error: ERROR: duplicate key value (v)=('\xf8') violates unique constraint "kv_v_idx" (SQLSTATE 23505)
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_35.243.249.205_2019-05-26T10:48:17Z: exit status 1
: exit status 1
cluster.go:1875,kv.go:189,test.go:1251: Goexit() was called
The problem here is that the workload is producing statements with multiple of the same key in a single UPSERT statement, seen here:
I190528 04:11:03.358978 632563 sql/exec_log.go:162 [n1,client=[::1]:59428,user=root] 182 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8), ($9, $10)" {$1:"1464797858451214926", $2:"'\\xd2'", $3:"-5499256992642714345", $4:"'\\xc2'", $5:"8989193896465373993", $6:"'\\x7b'", $7:"-8472975028287664194", $8:"'\\xeb'", $9:"1464797858451214926", $10:"'\\x96'"} 2.006 0 "duplicate key value (k)=(1464797858451214926) violates unique constraint \"primary\""
Note that the key 1464797858451214926 appears twice in the statement. I filed #37880 to investigate this behavior change in the CBO.
It's still unknown why the workload produces an error message about v rather than k, but I suspect it has the same root cause.
SHA: https://github.com/cockroachdb/cockroach/commits/c280de40c2bcab93c41fe82bef8353a5ecd95ac4
Parameters:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1311970&tab=buildLog
The test failed on branch=master, cloud=gce:
cluster.go:1516,kv.go:179,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1311970-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
rite
2m54s 0 728.9 118.9 8.9 1073.7 3892.3 60129.5 write
2m55s 0 104.1 118.8 5.5 1040.2 2281.7 13421.8 write
2m56s 0 287.9 119.8 7.3 2013.3 3758.1 33286.0 write
2m57s 0 0.0 119.1 0.0 0.0 0.0 0.0 write
2m58s 0 0.0 118.4 0.0 0.0 0.0 0.0 write
2m59s 0 12.0 117.8 5.2 3623.9 3892.3 3892.3 write
3m0s 0 0.0 117.2 0.0 0.0 0.0 0.0 write
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
3m1s 0 84.0 117.0 6.8 6442.5 7516.2 60129.5 write
Error: ERROR: duplicate key value (v)=('\xef') violates unique constraint "kv_v_idx" (SQLSTATE 23505)
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_35.196.93.213_2019-05-28T13:08:00Z: exit status 1
: exit status 1
cluster.go:1875,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/8f42e0d9948256af8b3e1994d514314ba1718c48
Parameters:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1315162&tab=buildLog
The test failed on branch=master, cloud=gce:
cluster.go:1516,kv.go:179,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1315162-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
6m31s 0 108.0 83.8 7.6 12884.9 14495.5 77309.4 write
6m32s 0 0.0 83.6 0.0 0.0 0.0 0.0 write
6m33s 0 218.9 83.9 7.1 9663.7 16643.0 73014.4 write
6m34s 0 273.0 84.4 7.1 3087.0 10200.5 10200.5 write
6m35s 0 0.0 84.2 0.0 0.0 0.0 0.0 write
6m36s 0 0.0 84.0 0.0 0.0 0.0 0.0 write
6m37s 0 0.0 83.8 0.0 0.0 0.0 0.0 write
6m38s 0 20.0 83.6 7.6 15032.4 20401.1 20401.1 write
6m39s 0 0.0 83.4 0.0 0.0 0.0 0.0 write
Error: ERROR: duplicate key value (v)=('\x49') violates unique constraint "kv_v_idx" (SQLSTATE 23505)
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_35.227.108.232_2019-05-30T10:47:59Z: exit status 1
: exit status 1
cluster.go:1875,kv.go:189,test.go:1251: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/af1bf3b0b16ec1913cfa945cb2ce76cc45f2d66a
Parameters:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1318685&tab=buildLog
The test failed on branch=master, cloud=gce:
cluster.go:1516,kv.go:179,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1318685-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
I190601 12:42:51.194842 1 workload/workload.go:578 starting 4 splits
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
1s 0 448.8 646.9 104.9 486.5 738.2 872.4 write
Error: ERROR: duplicate key value (v)=('\xdb') violates unique constraint "kv_v_idx" (SQLSTATE 23505)
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_34.74.214.58_2019-06-01T12:42:49Z: exit status 1
: exit status 1
cluster.go:1875,kv.go:189,test.go:1251: Goexit() was called
@tbg, are you aware of this failure mode?
Error: ERROR: TransactionStatusError: programming error: found COMMITTED record for prevented implicit commit: id=add05534 key=/Table/53/1/-3313874327987903458/0 rw=true pri=0.13424172 stat=COMMITTED epo=1 ts=1558615331.690889202,1 orig=1558615326.639046519,2 max=0.000000000,0 wto=false seq=13 int=14 (REASON_UNKNOWN) (SQLSTATE XXUUU)
The fix for the latest problem is to change the kv workload to not include duplicate keys in a single UPSERT. The problem I pasted above looks more serious.
I'm actually no longer sure the duplicate key thing is the problem - that's just a problem in one of Nathan's repro cases. Still looking into this latest failure.
@jordanlewis see https://github.com/cockroachdb/cockroach/pull/37784 -- sorry should've pointed that out.
I've captured an example instance of an UPSERT that fails spuriously, and verified that it has no duplicate keys:
I190604 22:48:47.917258 2759 sql/exec_log.go:162 [n1,client=[::1]:50288,user=root] 25282 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-8194475532289442352", $2:"'\\x1a'", $3:"866343863770607923", $4:"'\\xac'", $5:"8481621241235572951", $6:"'\\x3c'", $7:"-4578397311671209003", $8:"'\\x52'"} 33021.310 0 "duplicate key value (v)=('\\xac') violates unique constraint \"kv_v_idx\""
Here's the stack trace of where the "unique constraint violation" (in quotes because in reality it is not a genuine unique constraint violation) is coming from:
goroutine 2759 [running]:
runtime/debug.Stack(0xc00a44ef80, 0x4b47fd1, 0x76701e0)
/usr/local/Cellar/go/1.12.4/libexec/src/runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
/usr/local/Cellar/go/1.12.4/libexec/src/runtime/debug/stack.go:16 +0x22
github.com/cockroachdb/cockroach/pkg/sql/row.NewUniquenessConstraintViolationError(0x76df9e0, 0xc0078bcfc0, 0xc00b43a400, 0xc0061d4f00, 0x12, 0x20, 0xc0078142a0, 0xc00a44f3a0, 0x4af9112)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/errors.go:138 +0x65e
github.com/cockroachdb/cockroach/pkg/sql/row.ConvertBatchError(0x76df9e0, 0xc0078bcfc0, 0xc00b43a400, 0xc00e6bb180, 0x766ef60, 0xc013a3bf90)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/errors.go:66 +0x1e2
github.com/cockroachdb/cockroach/pkg/sql.(*tableWriterBase).finalize(0xc0009de000, 0x76df9e0, 0xc0078bcfc0, 0xc00b43a400, 0xc00fec06c0, 0xc010208b60)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/tablewriter.go:168 +0x94
github.com/cockroachdb/cockroach/pkg/sql.(*tableUpserterBase).finalize(0xc0009de000, 0x76df9e0, 0xc0078bcfc0, 0xc00fec0600, 0x0, 0x0, 0x0)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/tablewriter_upsert.go:168 +0x49
github.com/cockroachdb/cockroach/pkg/sql.(*upsertNode).BatchedNext(0xc011712d20, 0x76df9e0, 0xc0078bcfc0, 0xc00fec06c0, 0xc00b5cb950, 0x6aa1500, 0xc000584360, 0xc00a44f5a0)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/upsert.go:347 +0x29e
github.com/cockroachdb/cockroach/pkg/sql.(*rowCountNode).startExec(0xc0086cc160, 0x76df9e0, 0xc0078bcfc0, 0xc00fec06c0, 0xc00b5cb950, 0x0, 0xc00a44f5e8)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan_batch.go:173 +0xcd
github.com/cockroachdb/cockroach/pkg/sql.startExec.func2(0x6ef5a80, 0x5, 0x76e16e0, 0xc0086cc160, 0xc00a2f57a0, 0xc00a44f5f8)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan.go:488 +0x55
github.com/cockroachdb/cockroach/pkg/sql.(*planVisitor).visitInternal.func1(0xc009026300, 0x6ef5a80, 0x5, 0x76e16e0, 0xc0086cc160)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/walk.go:146 +0x5d
github.com/cockroachdb/cockroach/pkg/sql.(*planVisitor).visitInternal(0xc009026300, 0x76e16e0, 0xc0086cc160, 0x6ef5a80, 0x5)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/walk.go:628 +0x2a8
github.com/cockroachdb/cockroach/pkg/sql.(*planVisitor).visit(0xc009026300, 0x76e16e0, 0xc0086cc160, 0x0, 0x60)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/walk.go:113 +0x8d
github.com/cockroachdb/cockroach/pkg/sql.walkPlan(...)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/walk.go:77
github.com/cockroachdb/cockroach/pkg/sql.startExec(0x76df9e0, 0xc0078bcfc0, 0xc00fec06c0, 0xc00b5cb950, 0x76e16e0, 0xc0086cc160, 0x1, 0x970de60)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan.go:491 +0x286
github.com/cockroachdb/cockroach/pkg/sql.(*planNodeToRowSource).Start(0xc010274a00, 0x76df9e0, 0xc0078bcfc0, 0x970de60, 0x69edec0)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan_node_to_row_source.go:121 +0xd6
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*ProcessorBase).Run(0xc010274a00, 0x76df9e0, 0xc0078bcfc0)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/processors.go:795 +0x52
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*Flow).Run(0xc0144041e0, 0x76df9e0, 0xc0078bcfc0, 0x706b7e0, 0x0, 0x0)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/flow.go:626 +0x1ec
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run(0xc00011de00, 0xc00919fce0, 0xc00ff045a0, 0xc00a450868, 0xc013930840, 0xc00b5cba28, 0x0)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:253 +0x8fe
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRun(0xc00011de00, 0x76df9e0, 0xc0078bc810, 0xc00b5cba28, 0xc00919fce0, 0xc00ff045a0, 0x76e16e0, 0xc0086cc160, 0xc013930840)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:841 +0x1fb
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execWithDistSQLEngine(0xc00b5cb600, 0x76df9e0, 0xc0078bc810, 0xc00b5cb950, 0x2, 0xeb7a3f0, 0xc0089e3d90, 0xc00a450a00, 0x0, 0x0)
/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:903 +0x284
Immediately preceding the failure are several writes of the same keys, but different values:
I190604 22:48:47.786118 3167 sql/exec_log.go:162 [n1,client=[::1]:50325,user=root] 25274 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-8194475532289442352", $2:"'\\x58'", $3:"866343863770607923", $4:"'\\xe3'", $5:"8481621241235572951", $6:"'\\x1e'", $7:"-4578397311671209003", $8:"'\\x36'"} 129506.128 0 "TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_WRITE_TOO_OLD): \"sql txn\" id=44d71960 key=/Table/3840/1/-8194475532289442352/0 rw=true pri=0.13950700 stat=PENDING epo=0 ts=1559688522.860037000,6 orig=1559688521.088864000,0 max=1559688521.089188000,0 wto=true seq=13"
I190604 22:48:47.788088 3167 sql/exec_log.go:162 [n1,client=[::1]:50325,user=root] 25275 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-8194475532289442352", $2:"'\\x58'", $3:"866343863770607923", $4:"'\\xe3'", $5:"8481621241235572951", $6:"'\\x1e'", $7:"-4578397311671209003", $8:"'\\x36'"} 129508.100 0 "TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_TIMESTAMP_CACHE_REJECTED_POSSIBLE_REPLAY): \"sql txn\" id=44d71960 key=/Table/3840/1/-8194475532289442352/0 rw=true pri=0.13950700 stat=PENDING epo=1 ts=1559688523.071164000,1 orig=1559688522.860037000,6 max=1559688521.089188000,0 wto=false seq=13"
I190604 22:48:47.795567 2692 sql/exec_log.go:162 [n1,client=[::1]:50271,user=root] 25276 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-8194475532289442352", $2:"'\\x26'", $3:"866343863770607923", $4:"'\\xc3'", $5:"8481621241235572951", $6:"'\\xec'", $7:"-4578397311671209003", $8:"'\\xcc'"} 161694.515 0 "TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_WRITE_TOO_OLD): \"sql txn\" id=a8b3c642 key=/Table/3840/1/-8194475532289442352/0 rw=true pri=0.13950695 stat=PENDING epo=0 ts=1559688522.860037000,6 orig=1559688521.189769000,0 max=1559688521.190152000,0 wto=true seq=13"
I190604 22:48:47.804952 2692 sql/exec_log.go:162 [n1,client=[::1]:50271,user=root] 25277 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-8194475532289442352", $2:"'\\x26'", $3:"866343863770607923", $4:"'\\xc3'", $5:"8481621241235572951", $6:"'\\xec'", $7:"-4578397311671209003", $8:"'\\xcc'"} 161703.902 4 ""
I190604 22:48:47.885090 3431 sql/exec_log.go:162 [n1,client=[::1]:50344,user=root] 25278 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-7656810924711545548", $2:"'\\x09'", $3:"855793586072960013", $4:"'\\x09'", $5:"1017287173965809355", $6:"'\\xa1'", $7:"-6232585356493585104", $8:"'\\xaf'"} 70823.755 0 "TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_WRITE_TOO_OLD): \"sql txn\" id=2e2849a8 key=/Table/3840/1/-7656810924711545548/0 rw=true pri=0.11333670 stat=PENDING epo=0 ts=1559688507.625218000,1 orig=1559688507.333902000,0 max=1559688507.334513000,1 wto=true seq=13"
I190604 22:48:47.906688 3299 sql/exec_log.go:162 [n1,client=[::1]:50311,user=root] 25279 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-8194475532289442352", $2:"'\\x54'", $3:"866343863770607923", $4:"'\\xd3'", $5:"8481621241235572951", $6:"'\\xd0'", $7:"-4578397311671209003", $8:"'\\xf2'"} 123737.444 0 "TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_WRITE_TOO_OLD): \"sql txn\" id=0ce80ca2 key=/Table/3840/1/-8194475532289442352/0 rw=true pri=0.13950691 stat=PENDING epo=0 ts=1559688522.860037000,6 orig=1559688521.197242000,0 max=1559688521.197667000,0 wto=true seq=13"
I190604 22:48:47.908636 3299 sql/exec_log.go:162 [n1,client=[::1]:50311,user=root] 25280 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-8194475532289442352", $2:"'\\x54'", $3:"866343863770607923", $4:"'\\xd3'", $5:"8481621241235572951", $6:"'\\xd0'", $7:"-4578397311671209003", $8:"'\\xf2'"} 123739.403 0 "TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_ABORTED_RECORD_FOUND): \"sql txn\" id=0ce80ca2 key=/Table/3840/1/-8194475532289442352/0 rw=true pri=0.13950691 stat=PENDING epo=1 ts=1559688527.805226000,1 orig=1559688522.860037000,6 max=1559688521.197667000,0 wto=true seq=13"
I190604 22:48:47.915600 2759 sql/exec_log.go:162 [n1,client=[::1]:50288,user=root] 25281 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-8194475532289442352", $2:"'\\x1a'", $3:"866343863770607923", $4:"'\\xac'", $5:"8481621241235572951", $6:"'\\x3c'", $7:"-4578397311671209003", $8:"'\\x52'"} 33019.643 0 "TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_WRITE_TOO_OLD): \"sql txn\" id=2036fa06 key=/Table/3840/1/-8194475532289442352/0 rw=true pri=0.13950695 stat=PENDING epo=0 ts=1559688522.860037000,6 orig=1559688521.226186000,0 max=1559688521.226484000,0 wto=true seq=13"
(separately, it seems unlikely that the workload is doing what we want here - upon encountering an error, it should either retry the same statement or do a new one - not modify the values of the one it just tried)
Still digging in more.
Here's the trace of KV operations performed for one of these UPSERT operations:
2019-06-04 23:55:27.56581+00:00 | 00:00:00.001373 | fetched: /kv/primary/-6249445251639448483/v -> /'\x0834' | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565816+00:00 | 00:00:00.001379 | fetched: /kv/primary/-2520054918142495237/v -> /'\x02' | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565821+00:00 | 00:00:00.001384 | fetched: /kv/primary/7230858377498478967/v -> /'\x2130' | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565824+00:00 | 00:00:00.001387 | fetched: /kv/primary/2375599334895466311/v -> /'\x2431' | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565867+00:00 | 00:00:00.00143 | Put /Table/3842/1/-6249445251639448483/0 -> /TUPLE/2:2:Bytes/84 | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565887+00:00 | 00:00:00.00145 | Del /Table/3842/2/"\b4"/-6249445251639448483/0 | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565896+00:00 | 00:00:00.001459 | CPut /Table/3842/2/"84"/-6249445251639448483/0 -> /BYTES/ (expecting does not exist) | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565924+00:00 | 00:00:00.001487 | Put /Table/3842/1/-2520054918142495237/0 -> /TUPLE/2:2:Bytes/02 | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565933+00:00 | 00:00:00.001496 | Del /Table/3842/2/"\x02"/-2520054918142495237/0 | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565936+00:00 | 00:00:00.001499 | CPut /Table/3842/2/"02"/-2520054918142495237/0 -> /BYTES/ (expecting does not exist) | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565958+00:00 | 00:00:00.001521 | Put /Table/3842/1/7230858377498478967/0 -> /TUPLE/2:2:Bytes/10 | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565961+00:00 | 00:00:00.001524 | Del /Table/3842/2/"!0"/7230858377498478967/0 | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565967+00:00 | 00:00:00.00153 | CPut /Table/3842/2/"10"/7230858377498478967/0 -> /BYTES/ (expecting does not exist) | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565977+00:00 | 00:00:00.00154 | Put /Table/3842/1/2375599334895466311/0 -> /TUPLE/2:2:Bytes/41 | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.56598+00:00 | 00:00:00.001543 | Del /Table/3842/2/"$1"/2375599334895466311/0 | [n1,client=[::1]:61224,user=root] | | flow | 6
2019-06-04 23:55:27.565982+00:00 | 00:00:00.001545 | CPut /Table/3842/2/"41"/2375599334895466311/0 -> /BYTES/ (expecting does not exist) | [n1,client=[::1]:61224,user=root] | | flow | 6
The "uniqueness constraint violation" we're experiencing is a result of one of the CPut operations failing because something did in fact exist.
I added logging to print out the failing request and response, and discovered that the failure is caused by the CPut finding, instead of the expected empty value at the key it's writing to, the very value that it wants to write:
Request: header:<key:"\367\017\010\212\022\r\000\001\200\305\330S\237&\323|X\210" sequence:9 > value:<raw_bytes:"\270G\264Z\003" timestamp:<> > allow_if_does_not_exist:true
Result: unexpected value: raw_bytes:"\270G\264Z\003" timestamp:<wall_time:1559693454441408000 logical:1 >
This shouldn't ever happen from SQL's perspective. At this point, the value in the secondary index for this key should be empty, if all went well, or something different, if SQL had a bug and didn't delete an old index entry after the value being indexed changed.
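To make the failure mode concrete, here is a toy model of ConditionalPut semantics. It is purely illustrative (the real logic lives in the MVCC layer, not in a Go map): a nil expectation means "the key must not exist", and finding anything there, even the exact bytes we were about to write, fails the condition.

```go
package main

import (
	"bytes"
	"fmt"
)

// conditionFailedError mimics the error a CPut raises when the current
// value doesn't match the expectation, carrying the actual value found.
type conditionFailedError struct{ actual []byte }

func (e *conditionFailedError) Error() string {
	return fmt.Sprintf("unexpected value: %q", e.actual)
}

// conditionalPut writes value at key only if the current value matches
// expValue, where a nil expValue means "expect the key to not exist".
func conditionalPut(kv map[string][]byte, key string, value, expValue []byte) error {
	actual, exists := kv[key]
	switch {
	case expValue == nil && exists:
		// SQL expected the index slot to be empty; anything found here,
		// even the value we were about to write, fails the condition.
		return &conditionFailedError{actual: actual}
	case expValue != nil && (!exists || !bytes.Equal(actual, expValue)):
		return &conditionFailedError{actual: actual}
	}
	kv[key] = value
	return nil
}

func main() {
	// The surprising case from the logs: the expected-empty secondary
	// index slot already holds the very bytes the CPut wants to write.
	kv := map[string][]byte{"idx/0d/-4190507510251946920": []byte("\x0d")}
	err := conditionalPut(kv, "idx/0d/-4190507510251946920", []byte("\x0d"), nil)
	fmt.Println(err)
}
```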
Note further that this request itself got retried several times. Here are all of the retries from the log:
I190605 00:10:53.016837 652041 sql/exec_log.go:162 [n1,client=[::1]:56020,user=root] 106590 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-4221407084484719214", $2:"'\\xcd'", $3:"-1252498265032407347", $4:"'\\xce'", $5:"-4190507510251946920", $6:"'\\x0d'", $7:"6722673737277716797", $8:"'\\x0f'"} 22659.224 0 "TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_WRITE_TOO_OLD): \"sql txn\" id=b21d5eeb key=/Table/3848/1/-4221407084484719214/0 rw=true pri=0.06546453 stat=PENDING epo=0 ts=1559693452.049776000,1 orig=1559693450.528623000,0 max=1559693450.529240000,0 wto=true seq=13"
I190605 00:10:53.017908 652041 sql/exec_log.go:162 [n1,client=[::1]:56020,user=root] 106591 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-4221407084484719214", $2:"'\\xcd'", $3:"-1252498265032407347", $4:"'\\xce'", $5:"-4190507510251946920", $6:"'\\x0d'", $7:"6722673737277716797", $8:"'\\x0f'"} 22660.297 0 "TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_ABORT_SPAN): \"sql txn\" id=b21d5eeb key=/Table/3848/1/-4221407084484719214/0 rw=true pri=0.06546453 stat=ABORTED epo=1 ts=1559693452.049776000,1 orig=1559693452.049776000,1 max=1559693450.529240000,0 wto=false seq=0"
I190605 00:10:54.329344 652041 sql/exec_log.go:162 [n1,client=[::1]:56020,user=root] 106630 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-4221407084484719214", $2:"'\\xcd'", $3:"-1252498265032407347", $4:"'\\xce'", $5:"-4190507510251946920", $6:"'\\x0d'", $7:"6722673737277716797", $8:"'\\x0f'"} 23971.746 0 "TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_WRITE_TOO_OLD): \"sql txn\" id=6b70e947 key=/Table/3848/1/-4221407084484719214/0 rw=true pri=0.06546453 stat=PENDING epo=0 ts=1559693453.879009000,1 orig=1559693453.017784000,0 max=1559693453.018321000,0 wto=true seq=13"
I190605 00:10:54.329869 652041 sql/exec_log.go:162 [n1,client=[::1]:56020,user=root] 106631 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-4221407084484719214", $2:"'\\xcd'", $3:"-1252498265032407347", $4:"'\\xce'", $5:"-4190507510251946920", $6:"'\\x0d'", $7:"6722673737277716797", $8:"'\\x0f'"} 23972.271 0 "TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_ABORT_SPAN): \"sql txn\" id=6b70e947 key=/Table/3848/1/-4221407084484719214/0 rw=true pri=0.11085267 stat=ABORTED epo=1 ts=1559693453.879009000,1 orig=1559693453.879009000,1 max=1559693453.018321000,0 wto=false seq=0"
I190605 00:10:54.856972 652041 sql/exec_log.go:162 [n1,client=[::1]:56020,user=root] 106765 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-4221407084484719214", $2:"'\\xcd'", $3:"-1252498265032407347", $4:"'\\xce'", $5:"-4190507510251946920", $6:"'\\x0d'", $7:"6722673737277716797", $8:"'\\x0f'"} 24499.373 0 "TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_WRITE_TOO_OLD): \"sql txn\" id=4cfeb7d5 key=/Table/3848/1/-4221407084484719214/0 rw=true pri=0.11085267 stat=PENDING epo=0 ts=1559693454.541409000,1 orig=1559693454.329810000,0 max=1559693454.330092000,0 wto=true seq=13"
I190605 00:10:54.879232 652041 sql/exec_log.go:162 [n1,client=[::1]:56020,user=root] 106773 exec "" {} "UPSERT INTO kv(k, v) VALUES ($1, $2), ($3, $4), ($5, $6), ($7, $8)" {$1:"-4221407084484719214", $2:"'\\xcd'", $3:"-1252498265032407347", $4:"'\\xce'", $5:"-4190507510251946920", $6:"'\\x0d'", $7:"6722673737277716797", $8:"'\\x0f'"} 24521.636 0 "duplicate key value (v)=('\\x0d') violates unique constraint \"kv_v_idx\""
Could this mean that a CPut from a transaction that got retried didn't get properly cleaned up from the perspective of SQL?
SHA: https://github.com/cockroachdb/cockroach/commits/8e7ef35a8e4169ec63dc5a4df963d8b31a3d5b61
Parameters:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1324151&tab=buildLog
The test failed on branch=master, cloud=gce:
cluster.go:1513,kv.go:176,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1324151-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
7s 0 310.9 583.4 11.5 2550.1 5100.3 5905.6 write
8s 0 560.0 580.5 14.7 1208.0 5100.3 6174.0 write
9s 0 557.0 577.9 11.5 1208.0 4831.8 7784.6 write
10s 0 262.1 546.3 10.5 1140.9 5368.7 7247.8 write
11s 0 370.0 530.3 10.0 2013.3 5368.7 6979.3 write
12s 0 539.0 531.0 13.6 1476.4 4563.4 10200.5 write
13s 0 678.0 542.3 14.2 771.8 2281.7 8589.9 write
14s 0 372.0 530.1 13.6 637.5 1073.7 11811.2 write
15s 0 316.1 515.9 9.4 1140.9 2080.4 2550.1 write
Error: ERROR: duplicate key value (v)=('\x4c') violates unique constraint "kv_v_idx" (SQLSTATE 23505)
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_34.74.90.185_2019-06-05T11:00:51Z: exit status 1
: exit status 1
cluster.go:1872,kv.go:186,test.go:1248: Goexit() was called
I think I finally have a handle on what's going on here. The TL;DR is that we're dealing with a bug in MVCC that's being tickled by a bug in DistSender that's being tickled by a small change in behavior due to parallel commits. Fixing the bug in DistSender is straightforward but insufficient, as I still don't think we can actually enforce the property that MVCC is relying on for correctness - so we'll want to fix both issues.
What @jordanlewis and I were able to see repeatedly is that a transaction performed a Scan on the primary key of a table during an UPSERT, and then issued the following three requests in a batch:
1. a Put to update the primary key
2. a Delete to delete the old secondary index key
3. a CPut to create the new secondary index key

The CPut expected an existing key-value of nil for the secondary index key because it observed a different secondary index value in the primary key. Instead, it observed a non-nil value at the secondary index key. This resulted in a ConditionFailedError and eventually a "unique constraint violation". The fact that the CPut observed a non-nil value in the secondary index is surprising. We weren't seeing the transaction refresh, so the read portion of the CPut was reading at the same timestamp as the initial Scan. So why weren't the primary and secondary indexes lining up? Were we not seeing a consistent snapshot when reading at the same timestamp?
Well... actually no we weren't. I added logging around ScanRequest evaluation and ConditionalPutRequest evaluation and was able to see that for the transaction that hit the issue, the result from the ScanRequest's MVCC scan did not line up with the result of the ConditionalPutRequest scan. The results also didn't have the same timestamp, which we would expect if both had been written by the same transaction. To back this up, I could see the intents for the corresponding values being committed by different transactions. During this time I also noticed that every transaction that hit this issue was in its second epoch. In other words, it had restarted once. This will come up later.
Finally, a big clue jumped out. The result of the ScanRequest had a higher timestamp than the timestamp that the transaction was reading at. This was bizarre. A transaction should only ever be able to see a value above its read timestamp if that value is its own intent. But this wasn't the case. Not only could I see the value being resolved as part of another transaction, but the value also had a higher timestamp than our transaction's own write timestamp. Additionally, the transaction hadn't even written any intents in the current epoch, and intents from previous epochs should be ignored. Digging into MVCCScan led to the first bug:
When performing an MVCC scan, a transaction should read values from the perspective of its read timestamp. The only case where this is not true is when a transaction observes its own intent, at which point it can read the intent regardless of its timestamp. This allows a transaction to continue reading its own writes even as its write timestamp drifts away from its read timestamp. This logic is all implemented in mvccScanner::getAndAdvance. There are a few interesting cases to handle in that method:
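The interesting cases read roughly like this - a hedged Go reconstruction for illustration only (the scanner was C++ at the time, and the names, fields, and case structure here are paraphrased, not the real code):

```go
package main

import "fmt"

// hlc is a stand-in for a hybrid logical clock timestamp.
type hlc struct{ wall int64 }

// action describes what the scanner does when it lands on an intent.
type action int

const (
	handleForeignIntent action = iota // defer to intent resolution
	readIntentValue                   // read our own provisional value
	seekPastIntent                    // ignore intent, seek past its version
)

// onIntent paraphrases the decision mvccScanner::getAndAdvance makes
// when the scan encounters an intent for the current key.
func onIntent(ownIntent bool, intentEpoch, txnEpoch int32, intentTS, readTS hlc) action {
	switch {
	case !ownIntent:
		// Another transaction's intent: surfaced as a WriteIntentError
		// and handled by the intent-resolution machinery.
		return handleForeignIntent
	case intentEpoch == txnEpoch:
		// Our own intent at the current epoch: readable regardless of
		// its timestamp, so we keep seeing our own writes even as the
		// write timestamp drifts above the read timestamp.
		return readIntentValue
	default:
		// Our own intent from a PREVIOUS epoch: ignore it and seek to
		// the MVCC version directly after the intent's timestamp. This
		// silently assumes intentTS is not above the current epoch's
		// readTS; when a pushed old-epoch intent violates that, the
		// version we land on can also be above readTS.
		return seekPastIntent
	}
}

func main() {
	// A pushed intent from epoch 0 at ts=10, seen by epoch 1 reading at ts=5.
	fmt.Println(onIntent(true, 0, 1, hlc{10}, hlc{5}))
}
```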
Anyone see a problem there?
...
It turns out that the last condition is making a subtle assumption that a transaction at epoch E will never write an intent at a timestamp higher than the read timestamp at epoch E+1. If this is ever untrue, then "scanning to the MVCC value directly after the intent" is incorrect, because that committed value might still have too high a timestamp. In that case the function will return a committed value with a timestamp above the read timestamp of the reader.
Through some dumb luck, it seems like this will very rarely cause issues for us right now due to a combination of only supporting a single intent-per-key and always starting new transaction epochs with a read timestamp equal to the best-known write timestamp of the previous epoch. Intent pushes allow the timestamp of an intent at epoch E to go above the read timestamp of epoch E+1, but luckily the single intent-per-key restriction protects us there because no new MVCC values will be able to slip in between the old and new intent timestamp.
Still, I think there are cases where this is broken and could cause issues. For instance, a Put could be replayed from an earlier epoch after another transaction came in and committed a value. The Put would be pushed above the committed value, resulting in a violation of the assumption and allowing the transaction at a higher epoch to come along, ignore the Put from the earlier epoch, and end up reading the committed value even if it is at a higher timestamp than its read timestamp.
The solution here is to replace this seek with a seek to:
min(PrevTimestamp(ToDBTimestamp(meta_.timestamp())), timestamp_)
If the transaction doesn't want to see anything above timestamp_ and is deliberately ignoring intents from old epochs, we shouldn't seek to versions above timestamp_. The same change will be needed here.
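In Go-flavored pseudocode, the clamped seek target could look like the sketch below. The types and helper names are illustrative, not the actual C++ implementation; the point is only that the seek never goes above the scanner's read timestamp.

```go
package main

import "fmt"

// hlc is a stand-in for a hybrid logical clock timestamp.
type hlc struct {
	wall    int64
	logical int32
}

func (t hlc) less(o hlc) bool {
	return t.wall < o.wall || (t.wall == o.wall && t.logical < o.logical)
}

// prev returns the largest representable timestamp strictly less than t.
func prev(t hlc) hlc {
	if t.logical > 0 {
		return hlc{t.wall, t.logical - 1}
	}
	return hlc{t.wall - 1, 1<<31 - 1}
}

// seekTargetTS is the version to seek to when skipping an ignored
// old-epoch intent: directly before the intent's timestamp, clamped so
// we never seek to versions above the read timestamp.
func seekTargetTS(intentTS, readTS hlc) hlc {
	if p := prev(intentTS); p.less(readTS) {
		return p
	}
	return readTS
}

func main() {
	// Pushed old-epoch intent at ts=10 vs a read timestamp of 5: seek
	// to 5, not 9, so no committed value above readTS can be returned.
	fmt.Println(seekTargetTS(hlc{10, 0}, hlc{5, 0}))
}
```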
With that change, the issue goes away. But what caused this? That question leads to the second bug.
It still seemed strange to me that the transaction was being restarted with a read timestamp below its previous write timestamp. I added instrumentation in evaluateBatch and could see requests sent by the previous epoch of the transaction using the write timestamp of the problematic intent. These write timestamps were being incremented because requests were running into committed values at higher timestamps and hitting WriteTooOldErrors. Remember that a transaction will continue to write even if it hits a WriteTooOld error. Instead of returning the error immediately, it will swallow it and set the WriteTooOld flag on its proto instead.
So I could see writes at the higher timestamp. However, instrumentation inside of PrepareTransactionForRetry showed that the transaction proto being used to create the new proto at a higher epoch after a TransactionRetryError: retry txn (RETRY_WRITE_TOO_OLD) error had a lower write timestamp than I was seeing in evaluateBatch. So the information about writes at the higher timestamp was being lost somewhere.
I tracked this down to DistSender's divideAndSendBatchToRanges method. The method's purpose is to fan requests out to ranges and then combine their responses. It should be the one combining the information about write timestamps that may have been incremented during the evaluation of requests on different ranges. This is supposed to happen around here. In this case, one of the requests was returning a TransactionRetryError while the others were simply returning transaction protos with updated write timestamps. Combining all of this information should be handled here, but there's a subtle bug in this code. We stop updating the BatchResponse's transaction proto when collecting responses if we see an error. So if the TransactionRetryError returns before the other responses, the updates from the other responses will be lost.
The fix here is to stop losing information when collecting responses. The easiest way to do that is with a change like this (though the actual fix will look somewhat different):
diff --git a/pkg/kv/dist_sender.go b/pkg/kv/dist_sender.go
index 548e7ec557..83f0ae10ac 100644
--- a/pkg/kv/dist_sender.go
+++ b/pkg/kv/dist_sender.go
@@ -1161,6 +1161,8 @@ func (ds *DistSender) divideAndSendBatchToRanges(
 			if err := br.Combine(resp.reply, resp.positions); err != nil {
 				pErr = roachpb.NewError(err)
 			}
+		} else {
+			pErr.UpdateTxn(resp.reply.Txn)
 		}
This also fixes the issue, even without the MVCC fix.
So what caused all of this? These two bugs have been lingering for a while, but something changed recently to cause them to combine in such a disastrous way.
The answer to that is parallel commits, but not by any fault of its own. Parallel commits allows an EndTransactionRequest to end up in a partial batch with one of the writes in the transaction's first epoch. Where before the writes in the first epoch would be issued before the EndTransaction and would all hit WriteTooOld errors and swallow them before being combined at the client, now one of the writes hits a WriteTooOld error that is immediately converted to a TransactionRetryError: retry txn (RETRY_WRITE_TOO_OLD) during the EndTransaction evaluation, before being returned to the client. This allowed the second bug to lose the information about the increased write timestamps carried by the other partial batches issued concurrently with the batch containing the EndTransaction request. This, in turn, led to a transaction restart where the new epoch's read timestamp was below the timestamp of a write from the previous epoch. That allowed the first bug to return the incorrect value during the UPSERT's Scan, which eventually led to the inconsistent snapshot and, in turn, the duplicate key value error we've been seeing.
Great work! And I don't mean only the sleuthing, I also mean the very digestible presentation.
SHA: https://github.com/cockroachdb/cockroach/commits/b83798cadfee6447d565688b58657843741f8a45
Parameters:
To repro, try:
# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=kv/contention/nodes=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1328389&tab=buildLog
The test failed on branch=master, cloud=gce:
cluster.go:1513,kv.go:176,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1328389-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
I190607 10:20:36.009121 1 workload/workload.go:576 starting 4 splits
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
1s 0 522.6 744.8 92.3 419.4 604.0 771.8 write
2s 0 702.9 723.9 71.3 637.5 1040.2 1946.2 write
3s 0 651.0 699.6 50.3 906.0 1610.6 2281.7 write
4s 0 630.1 682.2 33.6 704.6 1409.3 2684.4 write
5s 0 466.0 639.0 18.9 805.3 2147.5 3355.4 write
Error: ERROR: duplicate key value (v)=('\x1d') violates unique constraint "kv_v_idx" (SQLSTATE 23505)
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_35.237.56.114_2019-06-07T10:20:34Z: exit status 1
: exit status 1
cluster.go:1872,kv.go:186,test.go:1248: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/33f9a3c2c5a0d2b756a8827daa74285b1aece401
Parameters:
To repro, try:
# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=kv/contention/nodes=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1330334&tab=buildLog
The test failed on branch=master, cloud=gce:
cluster.go:1513,kv.go:176,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1330334-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
ite
20s 0 12.0 321.7 7.3 8053.1 16643.0 16643.0 write
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
21s 0 215.9 316.7 8.4 3221.2 12884.9 13421.8 write
22s 0 427.9 321.8 8.4 2952.8 3623.9 20401.1 write
23s 0 112.0 312.6 8.9 973.1 3892.3 20401.1 write
24s 0 128.0 305.0 9.4 1946.2 3087.0 16643.0 write
25s 0 0.0 292.8 0.0 0.0 0.0 0.0 write
26s 0 271.9 292.0 9.4 4563.4 5100.3 18253.6 write
27s 0 112.0 285.3 8.9 5100.3 21474.8 24696.1 write
Error: ERROR: duplicate key value (v)=('\x79') violates unique constraint "kv_v_idx" (SQLSTATE 23505)
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_35.231.210.155_2019-06-08T10:31:34Z: exit status 1
: exit status 1
cluster.go:1872,kv.go:186,test.go:1248: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/de90b5d84c9e5d2ef562ebfa3d4a1bdc0e834e10
Parameters:
To repro, try:
# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=kv/contention/nodes=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1332319&tab=buildLog
The test failed on branch=master, cloud=gce:
cluster.go:1513,kv.go:176,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1332319-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
rite
2m54s 0 0.0 151.0 0.0 0.0 0.0 0.0 write
2m55s 0 12.0 150.2 5.8 6442.5 42949.7 42949.7 write
2m56s 0 0.0 149.4 0.0 0.0 0.0 0.0 write
2m57s 0 0.0 148.5 0.0 0.0 0.0 0.0 write
2m58s 0 0.0 147.7 0.0 0.0 0.0 0.0 write
2m59s 0 0.0 146.9 0.0 0.0 0.0 0.0 write
3m0s 0 0.0 146.1 0.0 0.0 0.0 0.0 write
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
3m1s 0 8.0 145.3 11.0 49392.1 49392.1 49392.1 write
Error: ERROR: duplicate key value (v)=('\xe5') violates unique constraint "kv_v_idx" (SQLSTATE 23505)
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_35.243.180.44_2019-06-09T10:40:28Z: exit status 1
: exit status 1
cluster.go:1872,kv.go:186,test.go:1248: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/90841a6559df9d9a4724e1d30490951bbdb811b4
Parameters:
To repro, try:
# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=kv/contention/nodes=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1364443&tab=buildLog
The test failed on branch=provisional_201906271846_v19.2.0-alpha.20190701, cloud=gce:
cluster.go:1511,kv.go:188,cluster.go:1849,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1364443-kv-contention-nodes-4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
0.0 write
52m56s 0 0.0 37.1 0.0 0.0 0.0 0.0 write
52m57s 0 0.0 37.1 0.0 0.0 0.0 0.0 write
52m58s 0 0.0 37.1 0.0 0.0 0.0 0.0 write
52m59s 0 0.0 37.1 0.0 0.0 0.0 0.0 write
53m0s 0 0.0 37.1 0.0 0.0 0.0 0.0 write
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
53m1s 0 0.0 37.1 0.0 0.0 0.0 0.0 write
53m2s 0 0.0 37.1 0.0 0.0 0.0 0.0 write
53m3s 0 116.0 37.1 6.0 32212.3 42949.7 103079.2 write
53m4s 0 0.0 37.1 0.0 0.0 0.0 0.0 write
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_35.224.186.87_2019-06-28T05:47:07Z: exit status 255
: exit status 1
cluster.go:1870,kv.go:198,test.go:1249: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/537767ac9daa52b0026bb957d7010e3b88b61071
Parameters:
To repro, try:
# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=kv/contention/nodes=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1364821&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:84,kv.go:195,cluster.go:1849,errgroup.go:57: spent 26.315789% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1870,kv.go:198,test.go:1249: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/ca1ef4d4f8296b213c0b2b140f16e4a97931e6e7
Parameters:
To repro, try:
# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=kv/contention/nodes=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1368144&tab=buildLog
The test failed on branch=master, cloud=gce:
ts_util.go:84,kv.go:195,cluster.go:1849,errgroup.go:57: spent 24.561404% of time below target of 10.000000 txn/s, wanted no more than 5.000000%
cluster.go:1870,kv.go:198,test.go:1249: Goexit() was called
SHA: https://github.com/cockroachdb/cockroach/commits/4878cb3e960dc26f0946e527e1836f27b3304c97
Parameters:
To repro, try:
# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=kv/contention/nodes=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1373334&tab=buildLog
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190704-1373334/kv/contention/nodes=4/run_1
cluster.go:1724,kv.go:188,cluster.go:2067,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1562220659-21-n5cpu4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
rite
14m56s 0 0.0 67.6 0.0 0.0 0.0 0.0 write
14m57s 0 0.0 67.5 0.0 0.0 0.0 0.0 write
14m58s 0 0.0 67.4 0.0 0.0 0.0 0.0 write
14m59s 0 0.0 67.4 0.0 0.0 0.0 0.0 write
15m0s 0 0.0 67.3 0.0 0.0 0.0 0.0 write
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
15m1s 0 0.0 67.2 0.0 0.0 0.0 0.0 write
15m2s 0 0.0 67.1 0.0 0.0 0.0 0.0 write
15m3s 0 0.0 67.1 0.0 0.0 0.0 0.0 write
15m4s 0 103.0 67.1 7.9 36507.2 40802.2 85899.3 write
Error: EOF
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_35.229.126.184_2019-07-04T10:54:08Z: exit status 1
: exit status 1
cluster.go:2088,kv.go:198,test_runner.go:681: unexpected node event: 4: dead
SHA: https://github.com/cockroachdb/cockroach/commits/36b74f0373b040aa873d4c23b415d94deba88378
Parameters:
To repro, try:
# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=kv/contention/nodes=4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1376442&tab=buildLog
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190708-1376442/kv/contention/nodes=4/run_1
cluster.go:1724,kv.go:188,cluster.go:2067,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1562566386-19-n5cpu4:5 -- ./workload run kv --init --secondary-index --duration=1h0m0s --cycle-length=512 --concurrency=128 --batch=4 --splits=4 {pgurl:1-4} returned:
stderr:
stdout:
56m30s 0 224.0 75.4 7.9 7516.2 13958.6 90194.3 write
56m31s 0 160.0 75.5 10.0 2147.5 9126.8 45097.2 write
56m32s 0 92.0 75.5 9.4 4160.7 10200.5 53687.1 write
56m33s 0 12.0 75.5 4.7 12.1 12.6 12.6 write
56m34s 0 0.0 75.4 0.0 0.0 0.0 0.0 write
56m35s 0 0.0 75.4 0.0 0.0 0.0 0.0 write
56m36s 0 0.0 75.4 0.0 0.0 0.0 0.0 write
56m37s 0 0.0 75.4 0.0 0.0 0.0 0.0 write
56m38s 0 24.0 75.4 8.4 9663.7 60129.5 60129.5 write
56m39s 0 208.0 75.4 8.4 9126.8 17179.9 98784.2 write
Error: EOF
Error: ssh verbose log retained in /root/.roachprod/debug/ssh_34.74.167.143_2019-07-08T10:56:45Z: exit status 1
: exit status 1
cluster.go:2088,kv.go:198,test_runner.go:681: unexpected node event: 2: dead