We have a TeamCity build that backs up a production cluster and restores it into a new database (_full), then takes an incremental backup from the first and uses both backups to restore into a second new database (_inc). Finally, it drops the two databases _full and _inc. In a recent run the backups each took about 1m, the restores took 10m, and the first DROP DATABASE took 62 minutes. (As of this writing, the second one hadn't finished.) The amount of data being dropped was about 2.8GB.
https://teamcity.cockroachdb.com/viewLog.html?buildId=188347
We should look at using RocksDB's DB::DeleteRange for faster deletion of both tables and indexes.
I don't think that's right. Deleting tables and indexes needs to go through our MVCC layer, we can't just obliterate the data with RocksDB's DeleteRange.
Well, going through the MVCC layer, as we do today, is what lets us support ongoing queries (or new time-travel queries) across DROP TABLE boundaries. But in many cases that's not a requirement, and it might be nice to at least have the ability to opt in to a faster DROP TABLE that did not go through the MVCC layer (except to fix up the MVCC stats after the DeleteRange). This would both improve the performance of the drop itself and reduce the time before the disk space is freed (instead of waiting for a 24h GC cycle).
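To illustrate why the range tombstone is so much cheaper, here's a toy model (not CockroachDB's real storage code, and the key layout is made up): the MVCC path pays one tombstone write per key in the dropped table, while a RocksDB-style DeleteRange writes a single range tombstone no matter how large the span is.

```go
package main

import (
	"fmt"
	"strings"
)

// toyEngine is a simplified stand-in for the storage engine, tracking
// how many tombstone writes each deletion strategy issues.
type toyEngine struct {
	keys   map[string]string
	writes int // tombstone writes issued
}

// mvccDrop models the current path: one MVCC deletion tombstone per
// live key, so cost scales with the size of the table being dropped.
func (e *toyEngine) mvccDrop(prefix string) {
	for k := range e.keys {
		if strings.HasPrefix(k, prefix) {
			e.writes++ // one tombstone per key
			delete(e.keys, k)
		}
	}
}

// deleteRange models DB::DeleteRange: a single range tombstone hides
// the whole span, independent of how many keys fall inside it.
func (e *toyEngine) deleteRange(prefix string) {
	e.writes++ // one range tombstone, O(1) writes
	for k := range e.keys {
		if strings.HasPrefix(k, prefix) {
			delete(e.keys, k) // space reclaimed later by compaction
		}
	}
}

func main() {
	e := &toyEngine{keys: map[string]string{
		"/table/51/row1": "a", "/table/51/row2": "b", "/table/52/row1": "c",
	}}
	e.mvccDrop("/table/51/")
	fmt.Println("per-key tombstones written:", e.writes) // scales with table size
}
```

The MVCC stats fixup mentioned above would still be a separate step after the range tombstone is written.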
That's fine, but I'm not sure that's what this issue is about. It seems that the DROP TABLE operation is much slower than the restore, suggesting that there are lower-hanging fruit to be picked here.
I'm sure there's a lot of room for improvement while still using MVCC, but restore is as fast as it is because it bypasses the MVCC layer.
While we can work on the actual performance of DROP TABLE, a user-preferred solution would be to declare the drop done as soon as the name is available for reuse, and run the actual drop in the background. This can be implemented by making the schema changer aware of whether it is associated with a session: run only the "release name" part of the code from the session, and the rest of the DROP TABLE code from the async schema changer.
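Sketching that split (hypothetical names; the real schema changer is far more involved), the session-side work releases the name and returns immediately, while the data deletion runs asynchronously:

```go
package main

import (
	"fmt"
	"sync"
)

// cluster is a hypothetical stand-in for the descriptor and data state;
// it is not CockroachDB's real schema-change machinery.
type cluster struct {
	mu        sync.Mutex
	nameToID  map[string]int
	tableData map[int][]string
	wg        sync.WaitGroup
}

// dropTable is what the session runs: release the name so it can be
// reused right away, then hand the slow deletion to a background worker
// (standing in for the async schema changer).
func (c *cluster) dropTable(name string) {
	c.mu.Lock()
	id := c.nameToID[name]
	delete(c.nameToID, name) // "release name": DROP returns to the client here
	c.mu.Unlock()

	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		c.mu.Lock()
		delete(c.tableData, id) // the slow part runs in the background
		c.mu.Unlock()
	}()
}

func main() {
	c := &cluster{
		nameToID:  map[string]int{"kv": 51},
		tableData: map[int][]string{51: {"r1", "r2"}},
	}
	c.dropTable("kv")
	_, nameTaken := c.nameToID["kv"]
	fmt.Println("name still taken after DROP returns:", nameTaken)
	c.wg.Wait() // background drop finishes eventually
	fmt.Println("tables with data remaining:", len(c.tableData))
}
```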
Perhaps it was a mistake to make DROP TABLE work like TRUNCATE TABLE. I agree with @vivekmenezes that we should start by removing the table name -> table ID mapping in the schema and let clients continue. But instead of doing the MVCC deletion in the background, it seems considerably more efficient to have a different path for the actual deletion of the table, which would schedule DeleteRange calls to delete the underlying data according to the zone config TTL.