Google-cloud-python: set correct project when using from_service_account_json

Created on 21 Jun 2016 · 17 comments · Source: googleapis/google-cloud-python

When instantiating with bigquery.Client.from_service_account_json('...') the project field in the client should be set to the project_id defined in the service account JSON.

Right now it is still the one from env.

feature request core auth p2

All 17 comments

Thanks for reporting @kelvinabrokwa!

I was able to reproduce the issue with

$ export GCLOUD_PROJECT=testing-project
>>> from gcloud import bigquery
>>> import os
>>> os.getenv('GCLOUD_PROJECT')
'testing-project'
>>> bc = bigquery.Client.from_service_account_json('creds.json')
>>> bc.project
u'testing-project'

I'll try and figure out what's up.

@daspecster thanks for the quick response!

The credentials check and project check are completely decoupled. The from_service_account_* helpers pass an explicit credentials instance, hence they bypass the check. We may want to also pass an explicit project.

@dhermes, what do you think about just adding/calling a helper function to extract the project ID from the file at this point?

https://github.com/GoogleCloudPlatform/gcloud-python/blob/master/gcloud/client.py#L60

This is actually tricky because a service account can have access to multiple different projects (i.e., you can go into another project and add that service account to it, so the account has access over there).

If we were to automagically pick up the project ID from the key file (JSON), how would people specify "I'm using a service account from project A to talk to resources in project B" ?

True @jgeewax, similar issue in #1827 right?

Yea I'd say these are related.

@jgeewax good point. I actually just ran into that today.

The way I'm looking at it now, the project_id in the JSON key file is only an indicator of where the service account lives, not necessarily which project it can talk to.

With that in mind, I can definitely imagine a scenario in which someone sets their env var purposefully to the project they want and then runs into a bug where the library is using the one from the key file.

It's more a question of developer experience at this point.

I could be wrong, but this might be one of those special situations where we should try to separate a client from credentials, and likewise separate loading a client from a service account from loading credentials from a service account.

For example, when I load a _Client_ from a service account, I'm expecting that all data in the service account is used to create a client. I think you're right in saying this should pull a project ID if that's there.

On the other hand, if I were to create a client and set the credentials to ones from a service account, I'm expecting that it only reads in the _credential_, and pulls the project based on the default patterns.

from gcloud import bigquery

# Get all defaults: credentials, project ID, scope, etc.
client = bigquery.Client()

# Get everything we possibly can from the service account JSON file
client = bigquery.Client.from_service_account_json('key.json')

# Get credentials from the service account JSON file, and defaults for everything else
client = bigquery.Client(credentials=Credentials.from_service_account_json('key.json'))

# Use key.json for credentials, and 'my-project' as the project ID
client = bigquery.Client(project='my-project',
                         credentials=Credentials.from_service_account_json('key.json'))

@tseaver , @daspecster : Is this something that would make sense across the project?

That makes sense to me at first glance.

In the last example, that would override the GCLOUD_PROJECT env var, correct?

For the second example,

# Get everything we possibly can from the service account JSON file
client = bigquery.Client.from_service_account_json('key.json')

If the project ID isn't in key.json, would it then run through the default search order?
There's a chance that a user might expect the project ID to be in key.json when it isn't, and then we would want to be pretty clear about what's happening.

In an application where multiple projects are used, I would assume that I would have to specify each one as I wanted to access it.

If the project ID isn't in the JSON file, then they manually deleted it... I don't think that's a common thing...

I ran into this issue when attempting to switch from the more manual googleapiclient to this library.

Using googleapiclient:

from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient import discovery

scopes = ['https://www.googleapis.com/auth/devstorage.full_control']
credentials = ServiceAccountCredentials.from_json_keyfile_name('service_acct_key.json', scopes)
# No need to specify project ID:
svc = discovery.build('storage', 'v1', credentials=credentials)
# svc.objects().insert() et al does not require project ID

Using google.cloud:

from google.cloud import storage

# this fails unless default project ID can be derived from env
client = storage.Client.from_service_account_json('service_acct_key.json')

As pointed out in this thread, if we manually specify the project argument to Client in the latter example, it works. I'm not sure how relevant this comparison is, but the difference in default behaviour took me a while to figure out.

@jonparrott @lukesneeringer I have thought about this but written no code. The issue is that some clients never use a project while others never use HTTP. I want to just combine _ClientFactoryMixin and _ClientProjectMixin, but since not all clients use a project, we would only pass project optionally along with the other args.

| Package | HTTP | Project |
|:----------------:|:----:|:-------:|
| BigQuery | YES | YES |
| Bigtable | | YES |
| Datastore | YES | YES |
| DNS | YES | YES |
| Error Reporting | YES | YES |
| Language | YES | |
| Logging | YES | YES |
| Monitoring | YES | YES |
| Pub / Sub | YES | YES |
| Resource Manager | YES | |
| Runtime Config | YES | YES |
| Spanner | | YES |
| Speech | YES | |
| Storage | YES | YES |
| Translate | YES | |
| Vision | YES | YES |

WDYT?

Optional project on every client seems fine with me.

Also, google-auth doesn't choke on ADC if it can't determine the project.

Fine with me as long as it can be done without a breaking change. I am not keen to have a 0.23 to 0.24 path that is as painful as this one.

In some sense it can't be done without breaking some behaviors. It would add project as a keyword (unless 'project' is already in kwargs). So this would magically change the behavior of some people's code, but it's the right change for us.

@lukesneeringer This is still a bug
