I created a 2-column table with 2.7M rows and tried to CTAS from the primary table into a new table, and hit the following issue.
root@:26257/test> select count(1) from accounts;
+---------+
|  count  |
+---------+
| 2790683 |
+---------+
(1 row)
root@:26257/test> insert into accounts2 select * from accounts;
pq: command is too large: 116025971 bytes (max: 67108864)
root@:26257/test> drop table accounts2;
DROP TABLE
Time: 30.228115ms
root@:26257/test> create table accounts2 as select * from accounts;
pq: command is too large: 345539008 bytes (max: 67108864)
root@:26257/test> select * from crdb_internal.node_statement_statistics;
+---------+------------------+-------+--------------------------------------
The root cause of this issue is the limit on transaction size. INSERT INTO ... SELECT is naturally affected by this, because it has to preserve transactionality.
CREATE TABLE AS, on the other hand, does not have to be transactional. We should be able to support large CREATE TABLE AS statements by breaking them up into multiple transactions that use a consistent transaction timestamp, I think.
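The "consistent transaction timestamp" idea can be sketched at the SQL level. This is purely an illustration of the concept, not the internal implementation: the column names and batch bounds are hypothetical, and whether `AS OF SYSTEM TIME` is permitted inside a mutation varies by CockroachDB version.

```sql
-- Hypothetical sketch: copy in batches, each reading the source at the
-- same fixed timestamp, so the copy is a consistent snapshot of the
-- source even though it spans multiple transactions.
CREATE TABLE accounts2 (id INT PRIMARY KEY, balance INT);

-- Pick one timestamp up front and reuse it for every batch.
-- The id ranges below are illustrative; a real implementation would
-- paginate on the table's actual primary key.
INSERT INTO accounts2
  SELECT id, balance FROM accounts AS OF SYSTEM TIME '2019-07-01 00:00:00'
  WHERE id >= 0 AND id < 500000;

INSERT INTO accounts2
  SELECT id, balance FROM accounts AS OF SYSTEM TIME '2019-07-01 00:00:00'
  WHERE id >= 500000 AND id < 1000000;
-- ...and so on until the key space is covered.
```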
cc @tschottdorf @danhhz - perhaps there's a fancier way to do this with SSTable rewriting?
CREATE TABLE AS does need to be transactional - the resulting table should come into existence all at once from the perspective of any outside observer.
Note that the transaction size limit has been significantly raised in 2.0, but CREATE TABLE AS runs into the (much smaller) command size limit. A transactional CREATE TABLE AS that splits its work into multiple KV-level commands would be able to handle larger tables (although it would still be slower than using the export/import machinery).
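For very large tables today, the export/import machinery mentioned above can be used directly instead of a single CTAS. A hedged sketch (the `nodelocal` paths, table name, and schema are illustrative, and `IMPORT TABLE ... CSV DATA` syntax differs across versions):

```sql
-- Illustrative only: bulk-copy a large table via EXPORT/IMPORT rather
-- than one transactional CTAS. Storage paths are hypothetical.
EXPORT INTO CSV 'nodelocal://1/accounts-dump' FROM TABLE accounts;

IMPORT TABLE accounts2 (id INT PRIMARY KEY, balance INT)
  CSV DATA ('nodelocal://1/accounts-dump/export*.csv');
```

The tradeoff is that the new table does not appear atomically from the client's perspective the way a transactional CTAS would, but the bulk machinery sidesteps both the transaction and command size limits.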
Adjusted the title to make this easier to search for.
Please confirm that CTAS, CTAS with a 1=2 predicate, and INSERT ... SELECT all hit the same issue. I'm fairly sure of it, just want to confirm.
@awoods187 @jordanlewis @rolandcrosby
After discussion with @dt, we considered breaking the change into two stages:
The migration to a job would imply that the table created by CTAS would not be usable within the same txn, as jobs are only processed following a commit. Just wanted to run this by the Execution team before any dev work is started.
With #38374 now merged, I ran a benchmark on a 4 node default configuration cluster to compare CTAS performance in the 19.1.2 release, and on master.
The previous implementation would error out with pq: command is too large for source tables of more than roughly 300k rows, so I used a bank table with 300k rows (100 MiB) in my testing.
Average over 5 runs:
v19.1.2 - 7.88s
master - 1.02s
So we observe an ~87% speedup on smaller tables.
I then tested the new implementation with source tables of varying sizes, up to a 25 GiB, 50-million-row table, which it completed in 18m14s.
This is awesome @adityamaru27! I am super psyched about it.
Very nice work @adityamaru27.