Google-cloud-python: Unable to set blob metadata

Created on 15 Oct 2015  路  22Comments  路  Source: googleapis/google-cloud-python

I am trying to set custom metadata on a blob. However, this does not work. I'm using the following script:

#!/usr/bin/env python3
import os

from gcloud import storage

client = storage.Client('super secret id')
bucket = client.get_bucket('super secret bucket')

blob = bucket.blob('kirby.png')
blob.metadata = dict(Color='Pink')
with open(os.path.expanduser('~/Pictures/kirby.png'), 'rb') as img_data:
    blob.upload_from_file(img_data)

When retrieving the blob or inspecing it in the developer console, the metadata is still unset.

storage p2

Most helpful comment

I think this is a duplicate of #754. The use of _metadata_ is a bit ambiguous as _metadata_ refers to both the _metadata_ of an object (such as _Cache-Control_) and the metadata property, meant for custom _metadata_.

Anyway, to the issue. Any update? The problem with patch is that the object is online briefly with wrong _metadata_, or less briefly in case of a connection failure between two requests. And, it increases metageneration.

>>> text = bucket.blob('file.txt')
>>> text.cache_control = 'no-cache'
>>> text.upload_from_string('text')
>>> text.reload()
>>> text.cache_control
>>> text.metageneration
1
>>> text.cache_control = 'no-cache'
>>> text.patch()
>>> text.reload()
>>> text.cache_control
u'no-cache'
>>> text.metageneration
2

All 22 comments

Setting the metadata locally does not cause the instance to make the API call to patch it on the server, and we don't send metadata along with the upload_from_file() call. You need to add a call to blob.patch() to persist the changes to blob.metadata.

I modified the script:

#!/usr/bin/env python3
import os

from gcloud import storage

client = storage.Client('dc-app-api')
bucket = client.get_bucket('appsemble-dev')

blob = bucket.blob('kirby.png')
with open(os.path.expanduser('~/Pictures/kirby.png'), 'rb') as img_data:
    blob.upload_from_file(img_data)
blob.metadata = {'Color': 'Pink', 'Spam': 'Cheese'}
blob.metadata['Foo'] = 'Bar'
blob.patch()

This sets the Color and Spam metadata fields, but not the Foo field. There is no warning about this whatsoever.

Also there is no documentation on the Blob class on how to update metadata.

Thanks for reporting. Looking into it now.

Ahhh. This is because the Blob.metadata getter returns a copy.deepcopy of the dictionary stored, so doing

blob.metadata['Foo'] = 'Bar'

updates the copy, not the dictionary actually stored on the Blob. We should fix this.

@tseaver WDYT?

To allow mutation of the metadata in-place (as opposed to assigning to it wholesale), we would need a way to detect those mutations, in order to mark it as changed (for a subsequent call to blob.patch() to pick up).

Right. Custom dict subclass?

Right, overriding:

  • __delitem__
  • __setitem__
  • clear
  • update
  • setdefault
  • pop
  • popitem

A custom dict class could work for the examples I provided. This would still not make a dict behave as expected tough. Since dicts are mutable, I would expect the following to work:

metadata = {'Holy': 'Grail'}
blob.metadata = metadata
metadata['Ministry'] = 'Silly Walks'
blob.patch()

If the setter of blob.metadata would return this new class, this would no longer refer to the original dict that has been set.

I think this subclass could work if no setter is defined. An empty metadata object could be set in the Blob constructor. This exposes the metadata as if it is a regular dict. This would break with the current behaviour, where setting metadata _is_ allowed.

The fact that the getter returns a copy is not set in stone. We did it to "protect" the integrity of the actual data own by the class, but didn't think through situations like this.

We don't need a getter, but do need a setter to track changes on the object. Can't have a setter without a getter, so that's just how it's gotta be. (Also, all properties are stored on a non-public attribute called _properties, another reason for a getter.)

I suppose it could look something like this:

class Metadata(dict):
    def __init__(self, blob):
        self._blob = blob

    def __setitem__(self, key, value):
        super(self, Metadata).__setitem__(key, value)
        self._blob._patch_property('metadata', self)

    # XXX Repeat pattern for other modifier functions


class Blob(_PropertyMixin):
    ...

    def __init__(self, ...)
        ...
        self._properties['metadata'] = Metadata(self)

    @property
    def metadata(self):
        return self._properties['metadata']

This marks the metadata as modified if it gets edited and ensures the metadata can't be overridden using a setter.

A side effect is that the metadata property is never None. I personally like that better anyway.

blob.upload_from_file()
blob.metadata = {'luke': 'skywalker'}
blob.patch()

worked for me. I was able to upload a file and apply the meta data after the file upload.

@saxenanurag I'm in the middle of reviewing how we are using the ex-apitools code (now in gcloud.streaming), and can see how we _should_ be able to send the patched metadata along with the upload request.

:+1: What he said :grinning:

Thanks.

I think this is a duplicate of #754. The use of _metadata_ is a bit ambiguous as _metadata_ refers to both the _metadata_ of an object (such as _Cache-Control_) and the metadata property, meant for custom _metadata_.

Anyway, to the issue. Any update? The problem with patch is that the object is online briefly with wrong _metadata_, or less briefly in case of a connection failure between two requests. And, it increases metageneration.

>>> text = bucket.blob('file.txt')
>>> text.cache_control = 'no-cache'
>>> text.upload_from_string('text')
>>> text.reload()
>>> text.cache_control
>>> text.metageneration
1
>>> text.cache_control = 'no-cache'
>>> text.patch()
>>> text.reload()
>>> text.cache_control
u'no-cache'
>>> text.metageneration
2

I've updated a previous patch, hack perhaps, which makes it work. It's not tested extensively.

--- a/blob.py
+++ b/blob.py
@@ -494,6 +494,11 @@

         # Configure the upload request parameters.
         request = Request(upload_url, 'POST', headers)
+
+        if self._properties:
+            headers['content-type'] = 'application/json'
+            request.body = json.dumps(self._properties)
+
         upload.configure_request(upload_config, request, url_builder)

         # Configure final URL

The file google/cloud/storage/blob.py is located in your Python site-packages directory.

>>> text = bucket.blob('file.txt')
>>> text.cache_control = 'no-cache'
>>> text.metadata = {'custom': 'metadata'}
>>> text.upload_from_string('text')
>>> text.reload()
>>> text.cache_control
u'no-cache'
>>> text.metadata
{u'custom': u'metadata'}
>>> text.metageneration
1
>>> text.download_as_string()
'text'

3362 passes along all non-ACL blob properties with the (initial or only) upload request.

You can work around this bug by creating a completely new dictionary, assigning it to blob.metadata, and doing blob.patch():

newdict={}
for x in blob.metadata:
    newdict[x] = blob.metadata[x]
    newdict['mynewproperty'] = 'awesome'
blob.metadata=newdict
blob.patch()

Your blob will now have the {'mynewproperty':'awesome'} metadata property in its stored metadata list on GCS.

With patch() the metadata of a file is getting updated properly. But this has two operations,
1) First upload a file with no metadata
2) Then updating the metadata using patch()

Is there any option to upload the file along with its metadata? this should be a single step process

@Prabakaran1410 You don't need to call patch on the blob after upload: since PR #3362, all properties assigned to the blob before upload are passed through.

I had some trouble getting this to work because the metadata was always None (even though I manually added values via the Storage UI). Later I realized I was using bucket.blob("name") rather than bucket.get_blob("name") so the metadata hadn't been loaded. Might help someone else.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

VikramTiwari picture VikramTiwari  路  4Comments

tweeter0830 picture tweeter0830  路  4Comments

LMeinhardt picture LMeinhardt  路  4Comments

stevenschlansker picture stevenschlansker  路  3Comments

mgoddard-pivotal picture mgoddard-pivotal  路  3Comments