Presto: How to Delete a partition file in Amazon S3 using a Presto script?

Created on 8 Nov 2016 · 4 Comments · Source: prestodb/presto

On S3, Presto can insert into and delete from a Hive table, but after deleting through Presto, I see that the partition files on Amazon S3 are not deleted. Can you explain why?

presto:mp_catalog> delete from cat_item where itemid = 3;

S3 : category/itemid=3/20161108_100300_00145_243y7_f526527b-7e8b-401d-b6d1-f172f989a86f.gz

All 4 comments

The Hive metastore is responsible for physically deleting the data when the partition is dropped. If you drop the partition using the Hive CLI, is the directory deleted?

Yes, when the partition is dropped in Hive, the directory for the partition is deleted. Dropping the partition from Presto only removes the partition from the Hive metastore; the data still exists in S3. And since Presto does not support overwrite, you have to delete the data manually before running the query again.
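For anyone who needs to script the manual cleanup described above, here is a minimal sketch in Python. It is not something Presto provides. It assumes a boto3-style S3 client is passed in, and the function name `delete_partition_objects` and the bucket name in the usage comment are hypothetical. It lists every object under the partition's key prefix and deletes them one page at a time.

```python
def delete_partition_objects(s3_client, bucket, partition_prefix):
    """Delete every object under partition_prefix; return the number of keys deleted.

    s3_client is assumed to behave like boto3.client("s3").
    """
    # A trailing slash keeps "itemid=3" from also matching "itemid=30/...".
    if not partition_prefix.endswith("/"):
        partition_prefix += "/"

    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=partition_prefix):
        # Each page holds at most 1000 keys, which is also the per-request
        # limit of delete_objects, so one call per page is enough.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue
        s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
        deleted += len(keys)
    return deleted


# Example usage (needs boto3 and AWS credentials; "my-bucket" is a placeholder):
# import boto3
# delete_partition_objects(boto3.client("s3"), "my-bucket", "category/itemid=3")
```

Passing the client in as an argument keeps the function easy to try against a stub before pointing it at a real bucket, which seems prudent for something that deletes data.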

I'm experiencing the same issue. I ran DELETE FROM example WHERE date='2019-05-09'; where date is the column the data is partitioned by. But the underlying data is still there in S3. I am unable to insert into that partition again unless I manually delete it from S3.

Hey @electrum, any resolution for this? I suspected that deleting the S3 data of a managed table should be a Hive metastore operation, so I dropped a table from the Hive CLI and the S3 data was indeed gone. But when I drop the table from Presto, the S3 data still exists.
