I'm looking to take advantage of the aws s3 sync command, but provide per-object metadata (i.e. metadata that can change per object) rather than global metadata with --metadata.
Right now, I have basically a couple of options:
sync command.
What would be nice is if I could somehow indicate to the CLI that I want to map each object to a set of metadata, and then upload each object with that metadata. A couple of solutions come to mind:
{
"path/to/object1": {"key1": "value1", "key2": "value2"},
...etc...
}
$ aws s3 sync /some/dir s3://somebucket --metadata-mapping /path/to/meta/mapping.json
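Until something like the proposed --metadata-mapping flag exists, the mapping-file idea can be approximated outside the CLI. Below is a minimal sketch, assuming a boto3 S3 client is passed in and that the mapping keys are paths relative to the synced directory. The function names load_mapping and upload_with_mapping are hypothetical, not part of the CLI or boto3. It uploads every file unconditionally, so it does not reproduce sync's skip-unchanged behavior.

```python
import json
from pathlib import Path


def load_mapping(mapping_path):
    """Read a JSON file shaped like {"relative/key": {"k": "v"}, ...}."""
    with open(mapping_path, encoding="utf-8") as fh:
        return json.load(fh)


def upload_with_mapping(client, root, bucket, mapping):
    """Upload each file under root with the metadata listed for its key.

    client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    Files missing from the mapping are uploaded with empty metadata.
    """
    root = Path(root)
    uploaded = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = path.relative_to(root).as_posix()
        metadata = mapping.get(key, {})
        # S3 user metadata values must be strings.
        client.upload_file(
            str(path), bucket, key, ExtraArgs={"Metadata": metadata}
        )
        uploaded.append(key)
    return uploaded
```

Usage would look roughly like upload_with_mapping(boto3.client("s3"), "/some/dir", "somebucket", load_mapping("/path/to/meta/mapping.json")).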
sync command read the metadata for each object prior to uploading. For example, I could have a local directory:
$ ls /path/to/local/files
file1
file1.meta
$ cat file1.meta
{
"key1": "value1",
"key2": "value2"
}
$ aws s3 sync /path/to/local/files s3://somebucket --object-metadata '$filename.meta'
(So when this is run, the $filename.meta files would just be read for metadata, and would not be transferred.)
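The sidecar-file convention can be sketched in a few lines of Python. This is only an illustration of the pairing logic, assuming each sidecar is named by appending ".meta" to the data file's name and contains a flat JSON object. The name iter_uploads is hypothetical, and nothing here touches S3.

```python
import json
from pathlib import Path


def iter_uploads(root, suffix=".meta"):
    """Yield (path, key, metadata) for each data file under root.

    Sidecar files (names ending in suffix) are read for metadata only
    and are never yielded as upload candidates.
    """
    root = Path(root)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.name.endswith(suffix):
            continue  # a sidecar, not an object to transfer
        sidecar = path.with_name(path.name + suffix)
        metadata = {}
        if sidecar.is_file():
            metadata = json.loads(sidecar.read_text(encoding="utf-8"))
        yield path, path.relative_to(root).as_posix(), metadata
```

Each yielded tuple could then be handed to whatever does the actual transfer, with the metadata attached to that one object.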
$ ls /path/to/local/files
file1
$ lookup-metadata.py /path/to/local/files/file1
{
"key1": "value1"
}
$ aws s3 sync /path/to/local/files s3://somebucket --metadata-callback lookup-metadata.py
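For the callback variant, the per-object lookup could be a thin wrapper around the external script. A minimal sketch, assuming the script is a Python file that takes the object's path as its only argument and prints a JSON object on stdout, as in the lookup-metadata.py example above. The function name lookup_metadata is hypothetical.

```python
import json
import subprocess
import sys


def lookup_metadata(script, path):
    """Run script with path as its argument and parse its stdout as JSON.

    Raises subprocess.CalledProcessError if the script exits non-zero.
    """
    result = subprocess.run(
        [sys.executable, script, str(path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Spawning one process per object is slow for large trees, which is one reason a precomputed mapping file may be the more practical option.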
Alternatively, what would be really great is if the syncing functionality were available independently of the CLI from within Python (without requiring me to figure out the internals of how to properly initialize the CLI environment, etc.), so that I could subclass and customize the process. I started going down this route somewhat, but am worried that this API is not for public consumption and would break in the future.
Any thoughts?
I'm -1 on adding that. I don't think providing that kind of mapping is a very good experience. At that point you're effectively setting everything manually anyway, so it would take just as much time to perform all those requests.
As far as using our code, we don't guarantee we won't break internals. However, it is MIT licensed so feel free to vendor or copy it.
In my use case, the metadata is precomputed against the objects I'm trying to store and placed in a storage backend (details not important, but e.g. MongoDB). All I'm doing is retrieving the data from that backend and storing it with the objects. If I do this object-by-object, then I need to recreate threaded uploads, multipart handling, sync strategies, etc. (all of the things that the sync command normally would do for me). I then need to hook in my logic to make sure the correct metadata is stored with each object. If the CLI supports a mapping or callback, then I just need to translate the data into the correct format (which I can stage ahead of time), and then run the sync command.
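To give a sense of what the object-by-object route involves, here is a rough sketch of concurrent uploads with per-object metadata. It assumes a boto3 S3 client, whose upload_file method switches to multipart for large files on its own, and a caller-supplied get_metadata(key) function (for example, one backed by the MongoDB lookup). The name upload_tree is hypothetical. The sketch deliberately omits the sync strategies (size and timestamp comparison, deletes), which is the part that is hardest to recreate.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def upload_tree(client, root, bucket, get_metadata, max_workers=8):
    """Upload every file under root in parallel, one metadata dict per key.

    client is assumed to be a boto3 S3 client shared across threads.
    get_metadata(key) must return a dict of string keys and values.
    """
    root = Path(root)
    files = sorted(p for p in root.rglob("*") if p.is_file())

    def upload_one(path):
        key = path.relative_to(root).as_posix()
        client.upload_file(
            str(path), bucket, key,
            ExtraArgs={"Metadata": get_metadata(key)},
        )
        return key

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order and re-raises upload errors.
        return list(pool.map(upload_one, files))
```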
+1 for me. I think this is a reasonable request. Out of all the proposed solutions, I like the metadata JSON file the best. I'm inclined to mark this as a feature request.
@jcmcken One other thing worth considering is the work @kyleknap's been doing for s3transfer. It's still under active development so I wouldn't recommend it for general use just yet, but the idea is to create a good python API for the functionality that's currently exposed in the AWS CLI.
@jamesls Since s3transfer is still very much in active development, do you have a recommendation for syncing with per-file metadata?
node-s3-client is the most promising library I've come across, but the project seems to be having problems with the underlying AWS SDK, see https://github.com/andrewrk/node-s3-client/issues/129
Good Morning!
We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI.
This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports.
As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions.
We've imported existing feature requests from GitHub - Search for this issue there!
And don't worry, this issue will still exist on GitHub for posterity's sake. As it's a text-only import of the original post into UserVoice, we'll still be keeping in mind the comments and discussion that already exist here on the GitHub issue.
GitHub will remain the channel for reporting bugs.
Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface
-The AWS SDKs & Tools Team
Based on community feedback, we have decided to return feature requests to GitHub issues.