The default scopes for the BigQuery Client do not include Google Drive, which makes it difficult to access federated data sources in Google Sheets.
This can be worked around by manually constructing service account credentials with the appropriate scopes but that feels cumbersome. Am I not seeing a hook that allows scope modification when instantiating bigquery.Client()?
I don't believe BigQuery directly supports importing data from Google Drive.
This is the list of supported sources that I'm aware of.
https://cloud.google.com/bigquery/loading-data
@tswast Can you comment?
Google Drive is in fact a supported federated query source. https://cloud.google.com/bigquery/external-data-sources#creating_an_external_table_using_google_drive
I don't think it should be in the default scopes, but we definitely need a way to amend the scopes used in the client. This will be an issue for any of the external data sources described in https://github.com/GoogleCloudPlatform/google-cloud-python/issues/2372 For example, Cloud Storage will need its scope to query files there.
@tswast I agree it shouldn't be in the default set, but it would be great to be able to flag it from the default Client. That way it could also be passed as a kwarg to bigquery.Client.from_service_account_json() without modification.
@daspecster I think the reason it isn't listed there is because unlike GCS files, GDrive files (including Google Sheets) can only be external -- not loaded as native tables.
Ran into this today and couldn't find a workaround (even overriding the scopes didn't work!). In the meantime I'm manually re-materializing the Google Sheets table.
I had the same issue as well -- created a client subclass that added the Drive scope but still couldn't get it to work.
In the meantime you should be able to create a credentials object using google.auth.default(scopes=...)
https://google-auth.readthedocs.io/en/latest/reference/google.auth.html#google.auth.default
or from a service account file
You can pass the credentials object to the client constructor.
https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery-client.html
I wonder if the helper methods such as from_service_account_json() should be removed in favor of using Google Auth credentials directly?
Update: see my comment below.
@tswast cool, thanks! I was able to get
import google.auth
from google.cloud import bigquery

SCOPE = ('https://www.googleapis.com/auth/bigquery',
         'https://www.googleapis.com/auth/cloud-platform',
         'https://www.googleapis.com/auth/drive')

credentials, _ = google.auth.default(scopes=SCOPE)
client = bigquery.Client(credentials=credentials)
to work. Don't know why my subclass didn't work before, but this'll do.
@tswast I forgot that I used the same workaround as @clifton and materialized the sheet into a real table. ☹️ What I have above doesn't actually work, so it's likely not a scoping issue.
Another update after taking a crack at this again. I was able to get the code below to work, but only with the GOOGLE_APPLICATION_CREDENTIALS method of Application Default Credentials (via google.auth.default), not the Google Cloud SDK method. Maybe the scoping interacts weirdly with the SDK method; I'm uncertain.
import google.auth
from google.cloud import bigquery

SCOPE = ('https://www.googleapis.com/auth/bigquery',
         'https://www.googleapis.com/auth/cloud-platform',
         'https://www.googleapis.com/auth/drive')

credentials, _ = google.auth.default(scopes=SCOPE)
client = bigquery.Client(credentials=credentials)
@cdchan That (i.e. using google.auth.default with scopes) is precisely what I would've suggested.
I'm not sure what the "Google Cloud SDK method" you refer to is, but would be happy to clear it up if I understood better.
@dhermes from what I understand, google.auth.default supplies Application Default Credentials. Those are generated in a variety of ways, one of which pulls from the environment variable GOOGLE_APPLICATION_CREDENTIALS (which worked for me). Another is to authenticate as a user via the Google Cloud SDK (as explained in the link). The Google Cloud SDK method for getting credentials works for queries in general, but I couldn't get it to work with Google Sheets tables.
I did notice that gcloud auth application-default login (the Google Cloud SDK method) does have a --scopes flag and played around with that but still wasn't able to get it to work.
Ah gotcha. Yeah it just depends on what account was being used. I'd guess it was your personal account, which maybe didn't have the right permissions? When you say "couldn't get it to work", this means you got some kind of error?
@dhermes it had the right permissions (my work account) -- I was able to query the regular tables without issues and I had access to the Google Sheet itself.
The error I got was
400 The job encountered an internal error during execution and was unable to complete successfully.
once I scoped the SDK method like so
$ gcloud auth application-default login \
> --scopes=https://www.googleapis.com/auth/userinfo.email, https://www.googleapis.com/auth/bigquery, https://www.googleapis.com/auth/drive, https://www.googleapis.com/auth/spreadsheets
I didn't pursue this much further since that error message was pretty much a dead end. I can provide job ids if that helps.
Thanks for the info! The error message isn't clear to me though maybe @tswast recognizes it?
Not a super helpful error message, but it is true that Google Cloud SDK credentials can't be used for all APIs. The reason is that it is using your user credentials, but associated with the Google Cloud SDK project. My guess is that the Drive and Sheets APIs aren't enabled on the Cloud SDK project.
With service account credentials (GOOGLE_APPLICATION_CREDENTIALS), it uses the project associated with that service account.
Ah good point @tswast. @cdchan You can "verify" this by hitting the tokeninfo endpoint or just check out
>>> credentials.client_id
u'{SOMEID}-BLAHBLAH.apps.googleusercontent.com'
where {SOMEID} is a stand-in for a numerical ID. Then you can navigate to
https://console.cloud.google.com/home/dashboard?project={SOMEID}
to see the settings for that project (or to realize you don't have access to that project).
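For the tokeninfo route mentioned above, here is a minimal sketch of querying Google's public tokeninfo endpoint from Python (it assumes you already have a refreshed access token to hand; the endpoint and its `aud`/`azp` fields are real, but the helper name is ours):

```python
def tokeninfo(access_token):
    """Ask Google's tokeninfo endpoint which OAuth client and scopes a token carries."""
    import json
    import urllib.parse
    import urllib.request

    url = ('https://www.googleapis.com/oauth2/v3/tokeninfo?'
           + urllib.parse.urlencode({'access_token': access_token}))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# The 'aud'/'azp' fields in the response name the OAuth client; the numeric
# prefix of that client ID is the project number to look up in the Cloud
# Console dashboard URL above.
```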
A question on this --
When I do:
import google.auth
from google.cloud import bigquery

SCOPE = ('https://www.googleapis.com/auth/bigquery',
         'https://www.googleapis.com/auth/cloud-platform',
         'https://www.googleapis.com/auth/drive')

credentials, _ = google.auth.default(scopes=SCOPE)
client = bigquery.Client(credentials=credentials)
The scopes on the credentials don't seem to be set?
credentials.scopes
gives nothing?
/cc @jonparrott
For thoughts on allowing extra scopes in this client library.
I don't want to add any more options to the constructor of clients and scopes are firmly in the purview of the auth layer. In the context of Cloud APIs and service accounts, scopes are essentially becoming meaningless.
If anything, we should just document how to obtain credentials with different scopes.
Is there a better way to do this then? I have a google sheet external data source BQ table, and I'm trying to read it using the python bigquery module. But I get an error about needing drive scope:
"Forbidden: 403 Access Denied: BigQuery BigQuery: No OAuth token with Google Drive scope was found."
What is the correct way to deal with permissions here?
Hmm,
import google.auth
from google.cloud import bigquery

scopes = (
    'https://www.googleapis.com/auth/bigquery',
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
)

credentials, _ = google.auth.default(scopes=scopes)
client = bigquery.Client(credentials=credentials)
appears to be correct. Are you using a service account? If you are using the credentials from the gcloud command (Cloud SDK), the Sheets API is not enabled for that project. Better to use a service account as described at https://cloud.google.com/docs/authentication/getting-started
OK, I am just starting out with all of this, BUT I have been searching far and wide for a solution to the federated table issue and I haven't been able to find much. My code is below. I tried adding in the solution above, but it won't work. I still receive "403 Access Denied: BigQuery BigQuery: No OAuth token with Google Drive scope was found."
I am using a service account. Any ideas on where I am going wrong?
import json
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials
from apiclient.discovery import build

scopes = (
    'https://www.googleapis.com/auth/bigquery',
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
)

credentials, _ = google.auth.default(scopes=scopes)
client = bigquery.Client(credentials=credentials)

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    'client_key_dev.json', scopes)

import uuid
from google.cloud import bigquery

def query_myproject():
    client = bigquery.Client(project='MYPROJECTID')
    query_job = client.run_async_query(str(uuid.uuid4()), """
        #standardSQL
        SELECT *
        FROM test.test123
        LIMIT 10""")
    query_job.begin()
    query_job.result()  # Wait for job to complete.
    destination_table = query_job.destination
    destination_table.reload()
    for row in destination_table.fetch_data():
        print(row)

if __name__ == '__main__':
    query_myproject()
@jcommaroto The from oauth2client.service_account import ServiceAccountCredentials is not the correct import for this library. The correct class to use is google.oauth2.service_account.Credentials
Second, I see you create a Client at module level, but then discard it and create a new one inside the query_myproject() function. You should build the scoped credentials first and pass them into the client you actually use for the query.
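To make the two fixes concrete, here is a minimal sketch combining them ('client_key_dev.json' and 'MYPROJECTID' are the placeholders from the code above, not real resources; the helper name is ours):

```python
SCOPES = (
    'https://www.googleapis.com/auth/bigquery',
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
)

def make_scoped_client(key_path='client_key_dev.json', project='MYPROJECTID'):
    """Return a BigQuery client whose credentials carry the Drive scope."""
    # Imports live inside the helper so the scope tuple can be reused even
    # where the google-cloud libraries are not installed.
    from google.oauth2 import service_account
    from google.cloud import bigquery

    credentials = service_account.Credentials.from_service_account_file(
        key_path, scopes=SCOPES)
    return bigquery.Client(project=project, credentials=credentials)
```

Queries run through the returned client should then be able to read Drive-backed tables, assuming the service account has been granted access to the sheet itself.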
@jonparrott I think you mentioned we should close this issue, as we want to recommend people pass in credentials objects explicitly to the client constructor rather than add more options to the client for auth stuff?
I had some issues with this as well. Solved it like this:
from google.cloud import bigquery
from google.oauth2.service_account import Credentials
scopes = (
'https://www.googleapis.com/auth/bigquery',
'https://www.googleapis.com/auth/cloud-platform',
'https://www.googleapis.com/auth/drive'
)
credentials = Credentials.from_service_account_file('bq_auth.json')
credentials = credentials.with_scopes(scopes)
client = bigquery.Client(credentials=credentials)
Perhaps usable for someone.
Is it correct that the following is not expected to work (for querying spreadsheet tables) by design, or is it just a bug which will be fixed at some point:
$ gcloud auth application-default login \
> --scopes=https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/bigquery
(I understand that I could work around with service accounts if really necessary)
Btw, here is the error message that I'm getting. The project id mentioned in the error message is not a project in our organisation and the link doesn't lead anywhere.
Forbidden: 403 Access Denied: BigQuery BigQuery: Access Not Configured. Drive API has not been used in project 764086051850 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/drive.googleapis.com/overview?project=764086051850 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.
Here is the python code I'm running:
from google.cloud import bigquery
client = bigquery.Client()
query = "SELECT * FROM tablename LIMIT 5"
query_job = client.query(query)
rows = query_job.result()
for row in rows:
print(row.name)
And I have Drive API definitely enabled. I can query the same table from the console via:
$ bq query_string
My current configuration (and we have tried it on 4 different macbooks)
Google Cloud SDK 187.0.0
Python 2.7
google-cloud 0.27.0
google-cloud-bigquery 0.29.0
cc @dhermes @jonparrott
@dmasin That's correct.
Using
$ gcloud auth application-default login \
> --scopes=https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/bigquery
is not expected to work, as that uses the credentials from the Cloud SDK. Instead it is recommended that you either use service account credentials or user authentication with client secrets from a project you have created.
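The service-account route is shown earlier in the thread; the user-authentication route could look roughly like this sketch (it assumes a client_secrets.json downloaded from your own project's credentials page and the google-auth-oauthlib package; the helper name is ours, and run_local_server opens a browser for consent):

```python
SCOPES = [
    'https://www.googleapis.com/auth/bigquery',
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
]

def user_client(client_secrets_path='client_secrets.json'):
    """Obtain user credentials tied to *your* project, then build a client."""
    from google_auth_oauthlib.flow import InstalledAppFlow
    from google.cloud import bigquery

    flow = InstalledAppFlow.from_client_secrets_file(
        client_secrets_path, scopes=SCOPES)
    credentials = flow.run_local_server(port=0)  # interactive browser consent
    return bigquery.Client(credentials=credentials)
```

Because the OAuth client here belongs to your own project, the Drive API can be enabled there, avoiding the "Access Not Configured" error tied to the Cloud SDK's project.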
thanks for the confirmation @tswast.
I've tried the service account route and it works fine.
If we have 100 analysts querying BigQuery (and spreadsheets) locally, is my understanding correct that we would need to share every one of those spreadsheets with all of the analysts' service accounts, because you can't create groups of service accounts? This doesn't sound very scalable and is not usable in practice. I'm probably just missing something. Would you have any advice on how to share >100 spreadsheets with hundreds of service accounts?
@dmasin I am able to add a service account to a Google Group on https://groups.google.com, but I wasn't able to on my GSuite account due to administrator restrictions.