We have a TeamCity build that backs up a production cluster and restores it into a new database (_full), then takes an incremental backup from the first and uses both backups to restore into a second new database (_inc). Finally, it drops the two databases _full and _inc. In a recent run the backups each took about 1m, the restores took 10m, and the first DROP DATABASE took 62 minutes. (As of this writing, the second one hadn't finished.) The amount of data being dropped was about 2.8GB.
https://teamcity.cockroachdb.com/viewLog.html?buildId=188347
We should look at using RocksDB's DB::DeleteRange for faster deletion of both tables and indexes.
I don't think that's right. Deleting tables and indexes needs to go through our MVCC layer, we can't just obliterate the data with RocksDB's DeleteRange.
Well, going through the MVCC layer, as we do today, is what lets us support ongoing queries (or new time-travel queries) across DROP TABLE boundaries. But in many cases that's not a requirement, and it might be nice to at least have the ability to opt in to a faster DROP TABLE that did not go through the MVCC layer (except to fix up the MVCC stats after the DeleteRange). This would both improve the performance of the drop itself and reduce the time before the disk space is freed (instead of waiting for a 24h GC cycle).
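To illustrate why the range tombstone is so much cheaper, here's a toy model (not CockroachDB's real storage code, and the key layout is made up): the MVCC path pays one tombstone write per key in the dropped table, while a RocksDB-style DeleteRange writes a single range tombstone no matter how large the span is.

```go
package main

import (
	"fmt"
	"strings"
)

// toyEngine is a simplified stand-in for the storage engine, tracking
// how many tombstone writes each deletion strategy issues.
type toyEngine struct {
	keys   map[string]string
	writes int // tombstone writes issued
}

// mvccDrop models the current path: one MVCC deletion tombstone per
// live key, so cost scales with the size of the table being dropped.
func (e *toyEngine) mvccDrop(prefix string) {
	for k := range e.keys {
		if strings.HasPrefix(k, prefix) {
			e.writes++ // one tombstone per key
			delete(e.keys, k)
		}
	}
}

// deleteRange models DB::DeleteRange: a single range tombstone hides
// the whole span, independent of how many keys fall inside it.
func (e *toyEngine) deleteRange(prefix string) {
	e.writes++ // one range tombstone, O(1) writes
	for k := range e.keys {
		if strings.HasPrefix(k, prefix) {
			delete(e.keys, k) // space reclaimed later by compaction
		}
	}
}

func main() {
	e := &toyEngine{keys: map[string]string{
		"/table/51/row1": "a", "/table/51/row2": "b", "/table/52/row1": "c",
	}}
	e.mvccDrop("/table/51/")
	fmt.Println("per-key tombstones written:", e.writes) // scales with table size
}
```

The MVCC stats fixup mentioned above would still be a separate step after the range tombstone is written.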
That's fine, but I'm not sure that's what this issue is about. It seems that the DROP TABLE operation is much slower than the restore, suggesting that there are lower-hanging fruit to be picked here.
I'm sure there's a lot of room for improvement while still using MVCC, but restore is as fast as it is because it bypasses the MVCC layer.
While we can work on the actual performance of DROP TABLE, a user-preferred solution would be to declare the drop done as soon as the name is available for reuse, and run the actual drop in the background. This can be implemented by making the schema changer aware of whether it is associated with a session: run only the "release name" part of the code from the session, and the rest of the DROP TABLE code from the async schema changer.
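Sketching that split (hypothetical names; the real schema changer is far more involved), the session-side work releases the name and returns immediately, while the data deletion runs asynchronously:

```go
package main

import (
	"fmt"
	"sync"
)

// cluster is a hypothetical stand-in for the descriptor and data state;
// it is not CockroachDB's real schema-change machinery.
type cluster struct {
	mu        sync.Mutex
	nameToID  map[string]int
	tableData map[int][]string
	wg        sync.WaitGroup
}

// dropTable is what the session runs: release the name so it can be
// reused right away, then hand the slow deletion to a background worker
// (standing in for the async schema changer).
func (c *cluster) dropTable(name string) {
	c.mu.Lock()
	id := c.nameToID[name]
	delete(c.nameToID, name) // "release name": DROP returns to the client here
	c.mu.Unlock()

	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		c.mu.Lock()
		delete(c.tableData, id) // the slow part runs in the background
		c.mu.Unlock()
	}()
}

func main() {
	c := &cluster{
		nameToID:  map[string]int{"kv": 51},
		tableData: map[int][]string{51: {"r1", "r2"}},
	}
	c.dropTable("kv")
	_, nameTaken := c.nameToID["kv"]
	fmt.Println("name still taken after DROP returns:", nameTaken)
	c.wg.Wait() // background drop finishes eventually
	fmt.Println("tables with data remaining:", len(c.tableData))
}
```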
Perhaps it was a mistake to make DROP TABLE work like TRUNCATE TABLE. I agree with @vivekmenezes that we should start by removing the table name -> table ID mapping in the schema and let clients continue. But instead of doing the MVCC deletion in the background, it seems considerably more efficient to have a different path for the actual deletion of the table, which would schedule DeleteRange calls to delete the underlying data according to the zone config TTL.