On S3, Presto can insert into a Hive table/partition without moving files around. #5818 introduces support for a transaction-ish delete followed by an insert. I want to understand 2 things:
```sql
BEGIN TRANSACTION
DELETE FROM t WHERE pk='value'
INSERT INTO t ....
COMMIT
```

The only thing I can think of that will work on S3 without introducing file moves:
`xxx/<hash of query id and table name>/pk1=x/pk2=y`. This way, Presto can implement inserts without moving files. But this means any tool that assumes conventional directory names and doesn't talk to the metastore will not be able to read tables/partitions generated by Presto.

Is this going to be useful? Do you have better proposals?
@haozhun thanks for putting this together. Here are my 2 cents.
> How does Hive do INSERT INTO or INSERT OVERWRITE on S3?
We currently have EMR Hive 1.0, and an INSERT OVERWRITE first deletes the S3 path at the beginning of the query; then a job writes the data to HDFS in a temp dir, and after that another job writes that data directly to S3 (no S3 temp files are used). With INSERT INTO, a job writes the data to HDFS in a temp dir and then a subsequent job reads that and writes it directly to S3.
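As a concrete sketch of the flow described above (table and partition names here are made up for illustration):

```sql
-- Hypothetical example. On EMR Hive 1.0, an INSERT OVERWRITE like this:
--   1. deletes the target S3 path at the start of the query,
--   2. runs a job that writes the results to a temp dir on HDFS,
--   3. runs a second job that writes that data directly to S3
--      (no S3-side temp files are used).
INSERT OVERWRITE TABLE events PARTITION (dateint=20160506)
SELECT * FROM staging_events WHERE dateint = 20160506;
```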
Regarding the use of temp files: in general we don't like S3 temp files being used during writes, as S3 temp directories are a mess to manage (especially on versioned buckets) and they incur additional costs, so we prefer direct writes to S3.
With some of the tools we have, similar to your proposal, whenever new data is written we create a new batchid directory on S3, write the data to that directory, and update the metastore to point to this new batchid (batchid is a Unix timestamp). This solves the temp file problem and also avoids the potential eventual-consistency issues that can arise (for a discussion of some of these issues, please take a look at this; there is a section on "s3 insert overwrite"). All the tools that we use, like Hive, Pig, Presto, Spark, etc., support talking to the Hive metastore to get the location information, so it's working well for us so far.
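The "repoint the metastore" step for a non-partitioned table can be done with standard Hive DDL; a minimal sketch, assuming a made-up table name and the timestamp-style batchid described above:

```sql
-- Hypothetical sketch of the batchid scheme: new data is first written
-- into a fresh batchid directory on S3, then the table's location in the
-- metastore is atomically repointed at it. Readers that resolve the
-- location through the metastore pick up the new data; the old batchid
-- directory can be cleaned up later.
ALTER TABLE t SET LOCATION 's3n://hive/database.db/t/batchid=1470789481';
```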
@nezihyigitbasi I have a few questions on your response.
> batchid subpartition. That is not exactly the same as creating a new directory with batchid in it, and then having the metastore point the partition there. Can you show me some examples of the full paths of files on S3?

A batchid directory gets created and the table location points to it. There are other tables that have a batchid partition. An example for a non-partitioned table: `s3n://hive/database.db/table/batchid=1470789481`; for a partitioned table, the partition `(dateint=20160506, hour=9, batchid='merged_20160506T101705_1')` points to `s3n://hive/database.db/table/dateint=20160506/hour=9/batchid=merged_20160506T101705_1`.
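For the partitioned case, a single partition can be repointed the same way; a sketch using the example paths from the partitioned table above (statement shape is standard Hive DDL, but the table name is assumed):

```sql
-- Hypothetical sketch: repointing one partition (including the batchid
-- partition column) at its directory on S3, so metastore-aware readers
-- resolve the correct batchid location.
ALTER TABLE table
  PARTITION (dateint=20160506, hour=9, batchid='merged_20160506T101705_1')
  SET LOCATION 's3n://hive/database.db/table/dateint=20160506/hour=9/batchid=merged_20160506T101705_1';
```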