I encountered a concerning error while trying to restore a registration cluster backup to a roachprod cluster:
root@localhost:26257/defaultdb> RESTORE TABLE registration.* FROM 's3://cockroach-reg-backups/2019-09-01?AWS_ACCESS_KEY_ID=<redacted>&AWS_SECRET_ACCESS_KEY=<redacted>';
pq: importing 12095 ranges: importing span /Table/78/1/"\r{=R巍\xadEj\x88iM\x03g+N\x0e"/4/1920-09-16T09:02:26.852719999Z/"SELECT _, _, _ FROM _ AS OF SYSTEM TIME _ WHERE _ = _"/1/0/"$ internal-read orphaned table leases"-k7~k\xd4N\xea\xa7\t\xa9\xe4\xa6\\\x01\x8a"/1/1920-10-04T04:48:11.013350999Z/"SELECT _, _ FROM _ WHERE (_ IN ($1, $2, __more1__)) AND (_ < $4) ORDER BY _ LIMIT _"/1/0/"$ internal-gc-jobs"}: adding to batch: /Table/71/1/"\ri\x91\x815}H\xf6\x9a\xdbDP2\x1e\xa2t"/1/1920-07-14T04:41:27.182670999Z/"UPDATE _ SET _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = -_, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _ WHERE _ = _"/0/0/"40f09bee"/0/1561058311.817775579,4 -> /TUPLE/4:4:Bytes/v2.0.2/1:5:Int/356418/1:6:False/false/5:11:Int/81/1:12:Int/0/1:13:Int/81/2:15:Float/0.5/1:16:Float/40.5/1:17:Float/0.0002460487839506174/1:18:Float/1.2843239990119444e-05/1:19:Float/0.00011604993209876539/1:20:Float/4.570659468070251e-06/1:21:Float/0.0009339686666666668/1:22:Float/0.00017130743693108405/1:23:Float/0.003720578672839506/1:24:Float/0.0030473801431714874/1:25:Float/0.005016646055555556/1:26:Float/0.005024995901326392: computing stats for SST [/Table/71/1/"\ri\x91\x815}H\xf6\x9a\xdbDP2\x1e\xa2t"/1/1920-05-05T15:38:08.254344999Z/"SELECT _, _, _, _, _, _, _, _, _ FROM _ WHERE _ IN (_, _)"/0/0/"40f09bee"/0, /Table/71/1/"\ri\x91\x815}H\xf6\x9a\xdbDP2\x1e\xa2t"/1/1920-07-14T04:41:27.178937999Z/"UPDATE _ SET _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = -_, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _ WHERE _ = _"/0/0/"40f09bee"/0/NULL): 
/Table/71/1/"\ri\x91\x815}H\xf6\x9a\xdbDP2\x1e\xa2t"/1/1920-05-05T21:38:06.091400999Z/"UPDATE _ SET _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ =
_, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _, _ = _ WHERE _ = _"/0/0/"40f09bee"/0: invalid header size: 4
I tried the same restore on a few different cockroach versions: the error reproduces on v19.2.0-beta.20190826 and later, but not on v19.2.0-alpha.20190805 and earlier.
Repro steps:
```shell
CLUSTER=$USER-secure
roachprod create $CLUSTER -n 3 --clouds=aws --aws-machine-type-ssd=c5d.4xlarge
roachprod stage $CLUSTER:1-3 cockroach
roachprod start $CLUSTER:1-3 --secure
roachprod sql $CLUSTER:1 --secure
```
Then run the following statements, filling in sensitive info as necessary:
```sql
SET CLUSTER SETTING cluster.organization = '<redacted>';
SET CLUSTER SETTING enterprise.license = '<redacted>';
CREATE DATABASE registration;
RESTORE TABLE registration.* FROM 's3://cockroach-reg-backups/2019-09-01?AWS_ACCESS_KEY_ID=<redacted>&AWS_SECRET_ACCESS_KEY=<redacted>';
```
The error should appear within 30 seconds.
@pbardea @solongordon is this a release blocker?
Yes, @lucy-zhang added it to the list this morning.
@pbardea has bisected this issue to a commit which bumped the Pebble version. So far the reg cluster backups are the only known example of the error.
cc @petermattis
Through experimentation, I found that the issue seems to be related to the introduction of two-level index blocks in pebble. Commenting out this line: https://github.com/cockroachdb/pebble/blob/master/sstable/writer.go#L434 seems to allow the import to progress. It's not clear to me yet how this relates to the error.
It also seems that after the import succeeds without that line, we are able to read the data from the restore. I'm not sure if that's expected, since I think this means the topLevelIndex is empty (I assume it may just scan all the data blocks in the SST in that case?).
Huh, commenting out the line you indicated seems really problematic as we would create invalid sstables (the top-level index would be broken). Can you instead try setting Options.IndexBlockSize = math.MaxInt32? That is the "correct" way to disable two-level indexes.
It looks like that also resolves the issue. (The RESTORE is not yet complete, but usually errors out quite quickly -- will update when the RESTORE completes).
In this case, does it look like this is a Pebble issue? (I haven't found anything above this in the stack that looks amiss otherwise.) If so, I can file an issue and set the index block size as described above as a temporary workaround until the two-level index issue is resolved.
(For posterity: Yesterday I also noticed that the issue disappeared when I toggled https://github.com/cockroachdb/pebble/blob/master/sstable/writer.go#L364 to be w.twoLevelIndex = false, which I believe would also force the usage of a single index block.)
> In this case does it look like this is a Pebble issue? (I haven't found anything above this in the stack that looks amiss otherwise.) If so, I can file an issue and set the index block size as described above as a temporary workaround until the two-level index issue is resolved?
Yes. Two-level indexes were only recently added to pebble. We don't actually enable them for RocksDB. Totally fine to disable them.
It will be useful for the issue you file to have reproduction instructions. Please include the SHA of cockroachdb you were running.
> (For posterity: Yesterday I also noticed that the issue disappeared when I toggled https://github.com/cockroachdb/pebble/blob/master/sstable/writer.go#L364 to be w.twoLevelIndex = false, which I believe would also force the usage of a single index block.)
Right. That's the brute force way to disable two-level indexes.
If I understand correctly, Pebble is being used to write the sstables which are then ingested into RocksDB, right? It is possible RocksDB has a bug in handling two-level indexes.
Correcting my misunderstanding above: Pebble is being used to write the sstables, and then golang/leveldb/table is being used to iterate over them in order to compute range stats. golang/leveldb/table doesn't understand two-level indexes. We should really change that code to use pebble/sstable instead, though there is also a bug in Pebble here: pebble/sstable.Writer should create LevelDB-compatible tables when asked to do so (and it wasn't).
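For context on why a single-level reader chokes, here is a toy model of a two-level index (illustrative only; pebble's real structures encode block handles, not block numbers): the top-level block maps separator keys to second-level index blocks, each of which maps separator keys to data blocks. A reader that only understands a single-level index, like golang/leveldb/table, would treat the top-level entries as if they pointed directly at data blocks, which is consistent with failures like the "invalid header size" above.

```go
package main

import (
	"fmt"
	"sort"
)

// indexBlock is a toy second-level index: each entry has a separator key
// (>= every key in the block it covers) and a data block number.
type indexBlock struct {
	seps   []string
	blocks []int
}

// twoLevelIndex adds a top level: each separator covers one second-level
// index block.
type twoLevelIndex struct {
	seps  []string
	index []indexBlock
}

// lookup descends both levels to find the data block that may contain key,
// returning -1 if the key is past the end of the table.
func (t *twoLevelIndex) lookup(key string) int {
	i := sort.SearchStrings(t.seps, key) // first top-level entry covering key
	if i == len(t.seps) {
		return -1
	}
	inner := t.index[i]
	j := sort.SearchStrings(inner.seps, key) // first data block covering key
	if j == len(inner.seps) {
		return -1
	}
	return inner.blocks[j]
}

func main() {
	idx := twoLevelIndex{
		seps: []string{"m", "z"},
		index: []indexBlock{
			{seps: []string{"f", "m"}, blocks: []int{0, 1}},
			{seps: []string{"t", "z"}, blocks: []int{2, 3}},
		},
	}
	fmt.Println(idx.lookup("apple")) // 0
	fmt.Println(idx.lookup("pear"))  // 2
}
```

A single-level reader, by contrast, only performs the second search; handed the top-level block, it would try to decode an index block as if it were a data block.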