Aws-sdk-java: File metadata is lost during multipart S3 copy

Created on 27 Feb 2015 · 5 Comments · Source: aws/aws-sdk-java

Original metadata is always dropped during copy for files larger than 5GB (where multipart copy is required). For smaller files the behavior is correct.
CopyCallable.initiateMultipartUpload always sets NewObjectMetadata on the CopyObjectRequest, so the original metadata is discarded.
In my particular case I have a Content-Disposition header that does not get copied.

service-api

gribbet

Most helpful comment

The API should probably not handle metadata differently depending on file size. It is confusing behavior. This issue was quite difficult to track down.
Note that AWS SDKs in other languages (e.g. Python) don't have this issue.
Perhaps the newObjectMetadata on line 255 should be set to the existing metadata rather than creating a new one? I don't understand why "the SDK cannot determine what metadata needs to be copied from source". How about all of it except for encryption headers?

Yes, a reasonable workaround that we have already implemented is to query and explicitly set the existing metadata for large files.

gribbet on 28 Feb 2015

👍3

All 5 comments

@gribbet

For all objects under the threshold limit, we use a single call (the PUT Object Copy API) that by default copies the metadata from source to destination, ignoring a few specific headers.

However, for multipart copies the metadata needs to be set in the InitiateMultipart request, and the SDK cannot determine what metadata needs to be copied from the source. There are encryption-related headers that cannot be copied and need to be explicitly specified by the user in the request.

Is it feasible for you to explicitly set the metadata in the request? If not, can you describe the use case?

manikandanrs on 27 Feb 2015

Yes, a reasonable workaround that we have already implemented is to query and explicitly set the existing metadata for large files.

gribbet on 28 Feb 2015

👍3
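
The workaround described in the comment above (read the source object's metadata, then set it explicitly on the copy request for large files) involves deciding which source headers to carry over. A minimal sketch of that filtering step follows, under stated assumptions: MetadataCarryOver and headersToCarryOver are hypothetical names, headers are modeled as a plain map rather than the SDK's ObjectMetadata, and the exclusion list is illustrative only (encryption and storage class, the examples named in this thread). It is not the list S3 itself applies.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative helper (hypothetical name); not part of the AWS SDK. */
final class MetadataCarryOver {

    // Assumption: an illustrative exclusion list covering server-side
    // encryption and storage class headers. The list S3 applies on a
    // single-request copy is maintained by the service and may differ.
    private static final String SSE_PREFIX = "x-amz-server-side-encryption";
    private static final String STORAGE_CLASS = "x-amz-storage-class";

    private MetadataCarryOver() {}

    /**
     * Returns a copy of the source headers without the excluded ones.
     * Header names are compared case-insensitively; the original
     * spelling and order of the retained headers are preserved.
     */
    static Map<String, String> headersToCarryOver(Map<String, String> sourceHeaders) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : sourceHeaders.entrySet()) {
            String name = entry.getKey().toLowerCase(Locale.ROOT);
            if (name.startsWith(SSE_PREFIX) || name.equals(STORAGE_CLASS)) {
                continue; // must be specified explicitly by the caller instead
            }
            result.put(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

With the v1 Java SDK, the source headers would come from AmazonS3.getObjectMetadata(bucket, key), and the filtered result would be applied through CopyObjectRequest.setNewObjectMetadata(...) before handing the request to the copy. The sketch leaves those calls out so it stays independent of the SDK; whether any filtering is needed at all depends on the objects being copied.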

@gribbet apologies for the extended delay in getting back to you on this issue. Unfortunately, as @manikandanrs said, this is an issue with the S3 service rather than the Java SDK. The Python SDK actually has a similar issue (see https://github.com/aws/aws-cli/issues/1145).

The handling of metadata on a single copy request is actually done by S3 itself (via the x-amz-metadata-directive header); see the S3 Copy docs for more info. S3 handles this because certain metadata is not intended to be persisted across a copy (e.g. storage class / server-side encryption). This "blacklist" of metadata is maintained by S3 and is subject to change, so it doesn't really make sense for us to do this filtering in the SDK itself.

Unfortunately, S3 does not support the x-amz-metadata-directive header on InitiateMultipartUpload or CopyPart requests. I've raised this with the service team and will come back to this issue when I hear from them.

kiiadi on 11 Aug 2016

@gribbet I contacted the S3 service team and they are aware of the inconsistency - it's possible that they'll fix it in a future version of the service. However, given that there is a workaround, there are higher-priority issues to resolve. Given that this is not a Java SDK-specific problem, I'm going to close this issue. I will communicate back when the service team resolves this inconsistency in multipart copy.

kiiadi on 31 Aug 2016

Since this seems as though it will never get fixed, why not write your own S3 sync function that preserves metadata? 🙄

Here's a really ugly one in Node 8.x; hopefully this helps someone:
https://gist.github.com/akotranza/51f452f975469e1fa78c2748dd115c87