I have google-cloud-storage-1.15.0 and want to use Bucket.list_blobs for many objects, so I want to limit the fields returned. However, from the docs it's not clear to me:
Do I have to specify nextPageToken in the fields param as well, or is it obsolete like the page_token param?
How to specify multiple fields, and what's the proper form: items(name,size) or items/name,items/size?
Please fill in the docs, thanks.
The fields query parameter is no longer documented for objects.list. Our support for it was always mostly broken (we forcibly convert the returned resource dicts to Blob instances). See #4216.
At this point I would call the fields parameter effectively deprecated.
Thanks, then please document, that it's deprecated :-)
@xmedeko, to the best of my knowledge fields is not deprecated; it is just not documented in the Cloud documentation.
For how to use fields, I recommend taking a look at https://cloud.google.com/storage/docs/json_api/v1/how-tos/performance#partial-response. It adds additional context not found in the library documentation.
Do I have to specify nextPageToken in the fields param as well, or is it obsolete like the page_token param?
You need to add nextPageToken to the fields parameter; otherwise it will not be in the response.
How to specify multiple fields and what's the proper form: items(name,size) or items/name,items/size?
The two are equivalent based on the document above, but I think the short form, items(name,size), is clearer.
The final value would be items(name,size),nextPageToken.
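As a rough illustration of how such a value could be assembled, here is a minimal sketch. The helper name build_list_fields is hypothetical and not part of google-cloud-storage; it only builds the string in the short form and appends nextPageToken so that paging keeps working.

```python
def build_list_fields(item_fields, include_page_token=True):
    """Build a partial-response `fields` value for an objects.list request.

    Hypothetical helper, not part of google-cloud-storage.
    """
    if not item_fields:
        raise ValueError("at least one item field is required")

    # Short form: items(name,size) rather than items/name,items/size.
    parts = ["items({})".format(",".join(item_fields))]

    # nextPageToken has to be requested explicitly, or it is left out
    # of the response.
    if include_page_token:
        parts.append("nextPageToken")

    return ",".join(parts)


fields = build_list_fields(["name", "size"])
print(fields)  # items(name,size),nextPageToken
```

The resulting string could then be passed as bucket.list_blobs(fields=fields).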
I'm following up internally (INTERNAL BUG: 132247394).
@frankyn Whether or not the back-end docs for the fields query parameter reappear, we still cannot provide "correct" support for it here: Bucket.list_blobs returns Blob instances, not dictionaries, and we cannot construct them correctly from arbitrarily-restricted sets of fields. That is my rationale for deprecating the argument in #7897.
To support @xmedeko's use case, we would need to add a new method, something like Bucket.list_blob_info, which returned dictionaries instead of instances.
Apologies, I misunderstood.
It exposes fields, which is probably not optimal for a method which is supposed to return populated Blob instances.
I would expect Blob instances to still be listed, but when fields are selected the Blob instance would only have the selected field values. If a user sets this, it reduces the amount of information returned in a response.
The return value doesn't have to be a dictionary in this case.
(update) Response from Technical Writer, paraphrased: because the fields query parameter is one of the generic query parameters, it is listed here: https://cloud.google.com/storage/docs/json_api/v1/parameters#fields.
The TW will add a link to available query parameters in API Reference documentation. The fields query parameter isn't deprecated.
@frankyn If the user doesn't request at least items(name), we can't make a Blob instance from the result at all. If they do supply items(name,...), then it works (I thought it didn't, but must have typoed something):
>>> from google.cloud.storage import Client
>>> client = Client()
>>> bucket = client.get_bucket('gcp-7875')
>>> blobs = list(bucket.list_blobs(fields="items(name)"))
>>> blobs
[<Blob: gcp-7875, bar>, <Blob: gcp-7875, baz>, <Blob: gcp-7875, foo>]
>>> blobs = list(bucket.list_blobs(fields="items(name,contentType)"))
>>> for blob in blobs:
...     print(blob.name, blob.content_type)
...
bar text/plain
baz text/plain
foo text/plain
Nice, I was trying to figure out why it wasn't working, and you beat me to it, thank you!
So it's possible to select fields using Python. The limitation is that the name of the object is required.
The limitation could be helped by instantiating the Blob instance without a name. WDYT?
Cool, it works. Sometimes I need to get just the prefixes, so fields = 'prefixes' seems to work well, too. (I have not checked what the blobs look like in the response, since I do not need them.)
The code is:
biter = bucket.list_blobs(prefix=prefix, delimiter='/', fields='prefixes')
for _ in biter:
    pass  # just iterate
prefixes = biter.prefixes
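The same idea can be wrapped in a small function. This is only a sketch: list_prefixes is a hypothetical name, and it assumes that the iterator returned by Bucket.list_blobs collects prefixes as pages are consumed, as in the snippet above. It also requests nextPageToken, on the assumption that results spanning several pages need it.

```python
def list_prefixes(bucket, prefix):
    """Return the sorted prefixes found directly under `prefix`.

    Hypothetical helper; `bucket` is expected to behave like a
    google.cloud.storage Bucket.
    """
    iterator = bucket.list_blobs(
        prefix=prefix,
        delimiter="/",
        # Only prefixes are wanted; nextPageToken keeps paging working.
        fields="prefixes,nextPageToken",
    )

    # The prefixes are only filled in while pages are being consumed,
    # so the iterator has to be exhausted first.
    for _ in iterator:
        pass

    return sorted(iterator.prefixes)
```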
The limitation could be helped by instantiating the Blob instance without a name. WDYT?
Without a name, the Blob is pretty useless (no further API requests can be made, for instance, and there is nothing to relate any of the other fields to).
How about documenting that name must be populated in the selected fields and not deprecate the method list_blobs()?
And please, do not forget my original doc request:
Do I have to specify nextPageToken in the fields param as well, or is it obsolete like the page_token param?
How to specify multiple fields, and what's the proper form: items(name,size) or items/name,items/size?
Update: according to my experiment, the nextPageToken is necessary. That's unpleasant: list_blobs tries to hide the paging, but it leaks into the fields parameter. IMO list_blobs should add it automatically. Something like:
if fields:
    fields += ',nextPageToken'
Thanks for resolving this @tseaver.