gcloud auth activate-service-account --key-file "service-account.json"
gsutil ls gs://some-bucket ### this works fine!
Via Ruby, it doesn't work using the same service-account.json! Why?!
storage = Google::Cloud::Storage.new(
project: 'someproject', keyfile: 'service-account.json')
storage.bucket('some-bucket').files ...
...
forbidden: Caller does not have storage.buckets.get access to bucket 'some-bucket'
I don't need to list buckets! I just want to list all files in a bucket that service account has access to!
Unfortunately, it is not possible to list files without access to the bucket that contains them. This is due to the current design of the library, which requires that the bucket is loaded before listing its files.
why?!
I believe one of the founding goals of this project was a "clean, OOP-inspired design", which comes at some cost to flexibility. For more flexibility, the google-api-client/Google/Apis/StorageV1/StorageService offers a "flatter" API, although authentication is a bit more involved.
I've managed to list bucket files/dirs (using the same service account) using the code below:
storage.service.list_files( ...
But this approach seems to be a hack: it returns an instance of the class Google::Apis::StorageV1::Objects.
Please note that calling storage.service.list_files( ... as suggested above is NOT RECOMMENDED as it uses undocumented access to storage.service to use the underlying google-api-client implementation. This implementation is subject to change at any time without warning, and has in fact been changed in many of the other packages in google-cloud-ruby. Should it change in the future for google-cloud-storage, the above code will no longer work.
I can think of two possible solutions for this issue:
1. Expose Objects.list as a top-level method (Project#files) that accepts the bucket name.
2. Add a mode (option) to Project#bucket to return a stub containing the given bucket name without retrieving the bucket metadata. This stub could then be used to call Bucket#files without permissions to the bucket.
@remi This issue raises a tension that I think is shared by #1596: the tension between granular access control and the hierarchical access of an OOP-inspired design. We would love to get your input on this issue as well.
Add a mode (option) to Project#bucket to return a stub containing the given bucket name without retrieving the bucket metadata
There's an existing idiom in the client libraries for doing this, e.g.:
require "google/cloud/pubsub"
pubsub = Google::Cloud::Pubsub.new
topic = pubsub.topic "my-topic", skip_lookup: true
The Pub/Sub skip_lookup option is documented as such:
Optionally create a Topic object without verifying the topic resource exists on the Pub/Sub service. Calls made on this object will raise errors if the topic resource does not exist. Default is false.
This issue is one of the trade-offs/cons of the approach that these Ruby Google Cloud libraries take. One of the benefits is that my code raises an exception when I call storage.bucket "bucket-that-doesn't-exist" rather than raising an exception later on, which may be more challenging to debug.
For this specific case... storage.bucket "bucket", skip_lookup: true would follow an existing library idiom with precedent and fix the issue.
@quartzmo ^ this has the benefits of being an additive change to the library (which is GA). A more significant change, such as storage.bucket (and other such actions) not kicking off an API request would be a substantially bigger conversation.
Note: this has come up a few times for performance because a user may only want to download a file (1 API request needed) but storage.bucket("dogs").file("dog.png").download "dog.png" kicks off 3 API requests. This is out of scope for this issue, just FYI.
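To make the request-count and error-timing trade-off concrete, here is a minimal toy model. It does not use google-cloud-storage at all: FakeService, ToyStorage, ToyBucket, ToyFile and NotFoundError are hypothetical names invented for illustration, and the split of the 3 requests into "look up bucket, look up file, download" is an assumption about where they come from, not something confirmed in this thread.

```ruby
# Toy model only. None of these classes exist in google-cloud-storage;
# they are hypothetical stand-ins used to count requests.
class NotFoundError < StandardError; end

# Stands in for the remote API and counts every request it receives.
class FakeService
  attr_reader :requests

  def initialize(objects)
    @objects = objects # { "bucket name" => { "file path" => "contents" } }
    @requests = 0
  end

  def get_bucket(name)
    @requests += 1
    @objects.fetch(name) { raise NotFoundError, "bucket #{name} not found" }
    name
  end

  def get_file(bucket, path)
    @requests += 1
    read(bucket, path)
    path
  end

  def download(bucket, path)
    @requests += 1
    read(bucket, path)
  end

  private

  def read(bucket, path)
    files = @objects.fetch(bucket) { raise NotFoundError, "bucket #{bucket} not found" }
    files.fetch(path) { raise NotFoundError, "file #{path} not found" }
  end
end

ToyFile = Struct.new(:service, :bucket_name, :path) do
  def download
    service.download(bucket_name, path)
  end
end

ToyBucket = Struct.new(:service, :name) do
  # With skip_lookup: true, no request is made until the file is used.
  def file(path, skip_lookup: false)
    service.get_file(name, path) unless skip_lookup
    ToyFile.new(service, name, path)
  end
end

class ToyStorage
  def initialize(service)
    @service = service
  end

  # With skip_lookup: true, the bucket is just a named stub.
  def bucket(name, skip_lookup: false)
    @service.get_bucket(name) unless skip_lookup
    ToyBucket.new(@service, name)
  end
end

# Eager lookups: bucket lookup + file lookup + download.
eager = FakeService.new({ "dogs" => { "dog.png" => "woof" } })
ToyStorage.new(eager).bucket("dogs").file("dog.png").download
eager_requests = eager.requests

# Skipping lookups: only the download itself hits the service.
lazy = FakeService.new({ "dogs" => { "dog.png" => "woof" } })
lazy_storage = ToyStorage.new(lazy)
lazy_storage.bucket("dogs", skip_lookup: true).file("dog.png", skip_lookup: true).download
lazy_requests = lazy.requests

# The cost of laziness: a missing bucket is only noticed at download time.
stub = lazy_storage.bucket("cats", skip_lookup: true) # no error raised here
error_raised_late = begin
  stub.file("cat.png", skip_lookup: true).download
  false
rescue NotFoundError
  true
end

puts "eager: #{eager_requests} requests, lazy: #{lazy_requests} request"
puts "missing bucket detected only at download: #{error_raised_late}"
```

In this sketch the eager path fails fast on a missing bucket, while the stub path saves requests (and the permissions they require) at the price of a later error.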
Thank you @remi.
@blowmage What is your opinion on adding skip_lookup behavior similar to that in Pub/Sub to all relevant methods in Storage? Do you have any reservations about having taken this approach with Pub/Sub?
I worry about unforeseen complications from adding skip_lookup to libraries not originally designed for it. I sat on this question overnight, and while I still have my concerns, I don't think they should stop progress or keep us from granting users additional control over the API calls made (or not made).
Ping: @blowmage I'm working on a sample where this feature would be very useful. Do you have an ETA? Thank you!
@frankyn Hoping to have something up for review by the end of the week.
Awesome! No rush, I'm excited for this feature. Thank you!
Hey @blowmage (follow-up).
Adding more rationale for this enhancement.
It turns out that this enhancement is necessary for downloading files with requester pays using the client library. For a user to download a file using requester pays with the Ruby client, the permissions storage.buckets.get, storage.objects.list, and storage.objects.get must be granted to allUsers (public bucket/objects for this example). This is only possible by assigning the role roles/storage.objectViewer together with either roles/storage.legacyBucketReader or roles/storage.admin. roles/storage.admin is not an appropriate way to give a user storage.buckets.get, so the legacy role roles/storage.legacyBucketReader has to be used.
This enhancement will allow users to not depend on a legacy role when using the Ruby client to access files from a public bucket.
Thanks!
We have added the ability to create Bucket and File objects without first accessing the Storage API by using skip_lookup in the 1.4.0 release.
This means that you should be able to accomplish this using the following code:
require "google/cloud/storage"
storage = Google::Cloud::Storage.new
bucket = storage.bucket "some-bucket", skip_lookup: true
files = bucket.files
Does OOP say everything has to be instantiated and self-inspected? I don't think so. It is about how we write code, not how it is being executed.
It is laziness, and Ruby is fine with it.
One of the benefits is that my code raises an exception when I call storage.bucket "bucket-that-doesn't-exist" rather than raising an exception later on, which may be more challenging to debug.
I wouldn't say it's challenging. Just put it in the documentation that this stuff is lazy and that, if you really want to fail immediately, you should do some intermediate check. I would love skip_lookup to be true by default, with setting it to false being that check.