google-cloud-python: BigQuery: add ability to restrict Client class to a particular dataset

Created on 21 Sep 2018 · 3 Comments · Source: googleapis/google-cloud-python

We have a multi-tenant approach and want to be able to create multiple "binds" to different "databases", much as you can with a traditional database like Postgres.

This started as a discussion in the BigQuery SQLAlchemy dialect project:
https://github.com/mxmzdlv/pybigquery/issues/24
https://github.com/mxmzdlv/pybigquery/pull/25

Especially in this comment:
https://github.com/mxmzdlv/pybigquery/pull/25#issuecomment-423078408

Adding the ability to restrict the Client class to a specific dataset would elegantly solve all of our problems.
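For illustration only, here is a sketch of the kind of API we have in mind. The `dataset` parameter is hypothetical and does not exist on `bigquery.Client` today; it just shows the intended behavior:

```python
from google.cloud import bigquery

# Hypothetical sketch: `dataset` is NOT a real Client parameter.
# The idea is that a client bound to one dataset resolves unqualified
# table names against it and refuses to touch other datasets.
client = bigquery.Client(project="my-project", dataset="tenant_a")

# Would read from my-project.tenant_a.events:
client.query("SELECT COUNT(*) FROM events")
```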

This might be related to the following existing issue, but dataset restriction doesn't seem to be its priority:
https://github.com/GoogleCloudPlatform/google-cloud-python/issues/5183

We would be happy to put together a pull request for this if it's a welcome change.

Thanks!

Labels: feature request, question, bigquery

All 3 comments

I should mention that if there's another, similarly ergonomic way to achieve this kind of safe multi-tenancy, we're very open to it. The only other way I could think of was to create many Google Cloud "projects", but that seems like it would carry substantial maintenance overhead.

Yes, part of this issue is a duplicate of https://github.com/GoogleCloudPlatform/google-cloud-python/issues/5183 in which we propose allowing defaults for many of the job config classes, including the ability to set a default dataset for query jobs.

We are thinking of having a default QueryJobConfig that you can attach to a client. When you insert the query job with client.query(), any properties that aren't set on the per-query job config but are set on the default job config will be filled in from the default.
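As a sketch of how that proposal might look in practice (the `default_query_job_config` constructor argument and the `QueryJobConfig.default_dataset` property shown here are illustrative of the proposal, not an API that shipped at the time of this discussion):

```python
from google.cloud import bigquery

# Build a default job config whose default_dataset qualifies any
# unqualified table names in queries run through this client.
default_config = bigquery.QueryJobConfig()
default_config.default_dataset = "my-project.tenant_a"

client = bigquery.Client(
    project="my-project",
    default_query_job_config=default_config,
)

# Resolves `users` as my-project.tenant_a.users, unless a per-call
# job config sets its own default_dataset.
rows = client.query("SELECT name FROM users LIMIT 10").result()
for row in rows:
    print(row.name)
```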

#5183 does not address the _safely_ piece. For that, you need to use different credentials for each client you create. If you are using service account keys, you'll need a different service account for each dataset, and you'll need to share each dataset with its service account.

Since pybigquery supports a credentials path (and this library supports passing arbitrary google-auth credentials to the Client constructor), I think that part is covered with some automation on your part for creating service accounts, downloading keys, and granting each service account access to its dataset. I don't think this is something we'd do at the client-library layer, since it covers several APIs and storing keys is application-specific.
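A minimal sketch of that pattern, assuming one service-account key file per tenant (the key paths and project ID are placeholders):

```python
from google.cloud import bigquery
from google.oauth2 import service_account

def client_for_tenant(key_path, project):
    """Build a client from a per-tenant service account key file.

    Each service account is assumed to have been granted access to
    exactly one tenant's dataset via IAM.
    """
    credentials = service_account.Credentials.from_service_account_file(key_path)
    return bigquery.Client(project=project, credentials=credentials)

tenant_a = client_for_tenant("/secrets/tenant-a.json", "my-project")

# Runs as tenant A's service account; querying another tenant's dataset
# fails with a 403 because the account has no access to it.
tenant_a.query("SELECT 1").result()
```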

Closing as a duplicate of #5183
