Google-cloud-python: BigQuery DataTransfer: Error in scheduling runs

Created on 4 Jun 2018 · 6 comments · Source: googleapis/google-cloud-python

I'm trying to use the Python client for the BigQuery Data Transfer API, but I'm getting RPC errors. I'm not sure whether this is a problem with the API or a general configuration problem in my script.

google-cloud-python version:

Name: google-cloud
Version: 0.32.0
Summary: API Client library for Google Cloud
Home-page: https://github.com/GoogleCloudPlatform/google-cloud-python
Author: Google Cloud Platform
Author-email: [email protected]
License: Apache 2.0
Location: /home/ubuntu/python3_virtualenv/python3_env/lib/python3.6/site-packages
Requires: google-cloud-resource-manager, google-cloud-language, google-cloud-storage, google-cloud-trace, google-cloud-datastore, google-cloud-pubsub, google-cloud-core, google-cloud-speech, google-cloud-spanner, google-cloud-translate, google-cloud-vision, google-cloud-videointelligence, google-cloud-error-reporting, google-cloud-bigquery, google-cloud-firestore, google-cloud-bigquery-datatransfer, google-cloud-dns, google-cloud-bigtable, google-cloud-container, google-cloud-monitoring, google-cloud-logging, google-api-core, google-cloud-runtimeconfig

Example Code:

from google.cloud import bigquery_datatransfer
from google.protobuf import timestamp_pb2

client = bigquery_datatransfer.DataTransferServiceClient()

# Backfill window, in Unix epoch seconds.
timestamp_start = timestamp_pb2.Timestamp()
timestamp_start.FromSeconds(1524022447)

timestamp_end = timestamp_pb2.Timestamp()
timestamp_end.FromSeconds(1524133447)

client.schedule_transfer_runs(
    client.get_transfer_config(
        "projects/<PROJECT_ID>/locations/us/transferConfigs/<TRANSFER_ID>").name,
    start_time=timestamp_start,
    end_time=timestamp_end)

Example Error:

---------------------------------------------------------------------------
_Rendezvous                               Traceback (most recent call last)
~/python3_virtualenv/python3_env/lib/python3.6/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     53         try:
---> 54             return callable_(*args, **kwargs)
     55         except grpc.RpcError as exc:

~/python3_virtualenv/python3_env/lib/python3.6/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials)
    499         state, call, = self._blocking(request, timeout, metadata, credentials)
--> 500         return _end_unary_response_blocking(state, call, False, None)
    501 

~/python3_virtualenv/python3_env/lib/python3.6/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    433     else:
--> 434         raise _Rendezvous(state, None, None, deadline)
    435 

_Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.INVALID_ARGUMENT, Request contains an invalid argument.)>

The above exception was the direct cause of the following exception:

InvalidArgument                           Traceback (most recent call last)
<ipython-input-40-13d74c5cc600> in <module>()
      7 client.schedule_transfer_runs(client.get_transfer_config("projects/967176960612/locations/us/transferConfigs/5aa1c6a1-0000-252e-b3b7-f403043605f4").name,
      8                               start_time=timestamp_start,
----> 9                               end_time=timestamp_end)

~/python3_virtualenv/python3_env/lib/python3.6/site-packages/google/cloud/bigquery_datatransfer_v1/gapic/data_transfer_service_client.py in schedule_transfer_runs(self, parent, start_time, end_time, retry, timeout)
    711             parent=parent, start_time=start_time, end_time=end_time)
    712         return self._schedule_transfer_runs(
--> 713             request, retry=retry, timeout=timeout)
    714 
    715     def get_transfer_run(self,

~/python3_virtualenv/python3_env/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py in __call__(self, *args, **kwargs)
    137             kwargs['metadata'] = metadata
    138 
--> 139         return wrapped_func(*args, **kwargs)
    140 
    141 

~/python3_virtualenv/python3_env/lib/python3.6/site-packages/google/api_core/retry.py in retry_wrapped_func(*args, **kwargs)
    258                 sleep_generator,
    259                 self._deadline,
--> 260                 on_error=on_error,
    261             )
    262 

~/python3_virtualenv/python3_env/lib/python3.6/site-packages/google/api_core/retry.py in retry_target(target, predicate, sleep_generator, deadline, on_error)
    175     for sleep in sleep_generator:
    176         try:
--> 177             return target()
    178 
    179         # pylint: disable=broad-except

~/python3_virtualenv/python3_env/lib/python3.6/site-packages/google/api_core/timeout.py in func_with_timeout(*args, **kwargs)
    204             """Wrapped function that adds timeout."""
    205             kwargs['timeout'] = next(timeouts)
--> 206             return func(*args, **kwargs)
    207 
    208         return func_with_timeout

~/python3_virtualenv/python3_env/lib/python3.6/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     54             return callable_(*args, **kwargs)
     55         except grpc.RpcError as exc:
---> 56             six.raise_from(exceptions.from_grpc_error(exc), exc)
     57 
     58     return error_remapped_callable

~/python3_virtualenv/python3_env/lib/python3.6/site-packages/six.py in raise_from(value, from_value)

InvalidArgument: 400 Request contains an invalid argument.
Labels: question, backend, bigquerydatatransfer
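Note that google.api_core remaps the gRPC _Rendezvous into an InvalidArgument exception, so the call can be wrapped in a try/except to at least log the (unfortunately generic) server message. A minimal sketch reusing the names from the snippet above (config_name stands in for the transfer config's resource name):

from google.api_core import exceptions

try:
    client.schedule_transfer_runs(
        config_name,  # hypothetical: the transfer config's resource name
        start_time=timestamp_start,
        end_time=timestamp_end)
except exceptions.InvalidArgument as exc:
    # The server returns only "Request contains an invalid argument." here,
    # with no hint as to which argument is wrong.
    print(exc.code, exc.message)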


All 6 comments

@tswast ISTM that error message should at least contain the name (if not the value) of the invalid argument. I'm not sure who the back-end PoC is for BQDT: can you loop the right person in?

@mtai Can you forward this feedback to the right folks? The error messages for this BQ-DTS error don't contain enough info to debug.

Are you able to create a transfer in the UI?

@tswast I was able to create the transfer in the UI.

But I actually figured out that it was not an API issue; it was an authentication issue.

I'll post my solution here in case anyone hits the same problem with data transfer.

  1. I created a service account for all my Google APIs.
  2. The service account does not have permission to access AdWords: to gain access, it would need at least read-only access on the AdWords account.
  3. But you cannot grant the service account that access, because adding an account to AdWords makes AdWords send a confirmation email to that account's address, and a service account has no inbox you can read.
  4. The permissions problem is not reported as such by Google's servers; instead it comes back as INVALID_ARGUMENT, which is misleading. If you authenticate as the service account (e.g. over gcloud alpha cloud-shell ssh) and try the equivalent bq command: bq mk --transfer_config --project_id=PROJECT_ID --target_dataset=temp --display_name=temp_tr --params='{"customer_id": "XXX-XXX-XXXX", "exclude_removed_items": "false"}' --data_source=adwords, it complains that adwords is an unknown data source. (A way to check which data sources a set of credentials can see is sketched right after this list.)
  5. But if you run the same command under a user account that has access to both BigQuery and the AdWords account, it succeeds.
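One hedged way to confirm the diagnosis in step 4 is to list the data sources visible to whichever credentials the client is using; if adwords is missing from the output, those credentials cannot use that connector. A minimal sketch (PROJECT_ID is a placeholder), assuming the v1 client's list_data_sources method:

from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()
parent = client.project_path("PROJECT_ID")  # placeholder project ID

# A service account without AdWords access should not see "adwords" here.
for source in client.list_data_sources(parent):
    print(source.data_source_id)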

The workaround I used was to authenticate as a user (as opposed to a service account) using:

https://console.developers.google.com/apis/credentials/oauthclient
https://developers.google.com/oauthplayground/

In the API credentials console, create an OAuth2 client ID and secret, and enter them into the OAuth playground. The playground returns a refresh token, which you can exchange with Google's authentication server for an access token.

To get an access token (they expire hourly):

import requests

# client_id, client_secret, and refresh_token come from the OAuth playground step.
response = requests.post(
    "https://www.googleapis.com/oauth2/v4/token",
    data={"client_id": client_id, "client_secret": client_secret,
          "refresh_token": refresh_token, "grant_type": "refresh_token"})
access_token = response.json()["access_token"]

Then create the BigQuery Data Transfer client using the access token:

from google.cloud import bigquery_datatransfer
from google.oauth2.credentials import Credentials

# A bare token expires after ~1 hour; a refresh-capable variant is sketched below.
client = bigquery_datatransfer.DataTransferServiceClient(
    credentials=Credentials(token=access_token))
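As an aside, google-auth can also refresh the token for you if you construct the credentials from the refresh token and client secrets directly, so you don't have to repeat the token exchange every hour. A minimal sketch, assuming the same client_id, client_secret, and refresh_token as above:

from google.cloud import bigquery_datatransfer
from google.oauth2.credentials import Credentials

# With a refresh token, token URI, and client secrets, google-auth refreshes
# the access token automatically whenever it expires.
credentials = Credentials(
    token=None,
    refresh_token=refresh_token,
    token_uri="https://www.googleapis.com/oauth2/v4/token",
    client_id=client_id,
    client_secret=client_secret)
client = bigquery_datatransfer.DataTransferServiceClient(credentials=credentials)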

Then you can successfully create the transfer config:

project_id = "blabla-land-18304"
parent = client.project_path(project_id)
config = bigquery_datatransfer.types.TransferConfig()
config.destination_dataset_id = "temp"
config.display_name = "temp display"
config.data_source_id = "adwords"
config.schedule = "every 24 hours"
config.data_refresh_window_days = 7
config.disabled = False
config.params["customer_id"] = "XXX-XXX-XXXX"
config.params["exclude_removed_items"] = False
config.params["exclude_inactive_accounts"] = False
# Keep the returned config; its .name is needed to schedule runs (see below).
transfer_config = client.create_transfer_config(parent=parent, transfer_config=config)
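With working credentials, the schedule_transfer_runs call from the original report should then go through against the newly created config; a sketch reusing the timestamps from the top of the issue:

from google.protobuf import timestamp_pb2

# Same backfill window as the original snippet.
start = timestamp_pb2.Timestamp()
start.FromSeconds(1524022447)
end = timestamp_pb2.Timestamp()
end.FromSeconds(1524133447)

response = client.schedule_transfer_runs(
    transfer_config.name, start_time=start, end_time=end)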

Thanks for sharing! Sounds like it'll be important to have user-authentication samples for BQ-DTS.
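One possible shape for such a sample is google-auth-oauthlib's installed-app flow, which yields user credentials without the OAuth-playground detour. A sketch, not an official sample; the client-secrets filename and scope are assumptions:

from google.cloud import bigquery_datatransfer
from google_auth_oauthlib.flow import InstalledAppFlow

# Hypothetical client-secrets file downloaded from the credentials console;
# the bigquery scope covers the Data Transfer Service.
flow = InstalledAppFlow.from_client_secrets_file(
    "client_secret.json",
    scopes=["https://www.googleapis.com/auth/bigquery"])
credentials = flow.run_console()  # prints an auth URL, asks for the code
client = bigquery_datatransfer.DataTransferServiceClient(credentials=credentials)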
