Google-cloud-python: [QUESTION][BigQuery] would it be possible to have extract_table_to_storage support with an optional query

Created on 20 Sep 2016 · 3 comments · Source: googleapis/google-cloud-python

Hi

It would be great to have a query as an optional argument to the extract_table_to_storage method in google.cloud.bigquery.client, so that it is easy to extract data from a table with a custom query and write it to Google Cloud Storage. Or is there some other way to write out the results of a custom query directly after creating the job?

import uuid
from google.cloud import bigquery

client = bigquery.Client(project='test-sample')
query = """\
SELECT * FROM test_database.activity
WHERE activity_date = DATE_ADD(CURRENT_DATE(), -1, 'DAY')
ORDER BY activity_date"""
dataset = client.dataset('test_database')
job_name = 'test_job_name_' + str(uuid.uuid4())
destination = 'gs://test-sample-bucket/activity-*'  # placeholder GCS URI

# Desired usage: pass a query job instead of a table:
job = client.extract_table_to_storage(
    client.run_async_query(job_name, query), destination)

Or is there another workaround? I have some partitioned tables and some non-partitioned tables, and I sometimes need to export data from both, for a given date, for analysis with other tools.

Thanks

Labels: question, bigquery

All 3 comments

@vrcs The back-end extract operation doesn't support such an option. To achieve this goal, you could:

  • Run an asynchronous QueryJob, passing a temporary table as its destination.
  • Run an ExtractJob on that temporary table.

Untested sketch:

import time
from google.cloud.bigquery import Client

client = Client()
dataset = client.dataset('my_dataset')
temp_table = dataset.table('query_results')  # table IDs may not contain hyphens
assert not temp_table.exists()
query = 'SELECT * FROM my_dataset.widgets WHERE frob_count >= 100'

query_job = client.run_async_query('query-frobbed-widgets', query)
query_job.destination = temp_table
query_job.begin()

# Poll until the query job finishes writing the temporary table.
while query_job.state.lower() != 'done':
    query_job.reload()
    time.sleep(1)

extract_job = client.extract_table_to_storage(
    'extract-query-results', temp_table,
    'gs://my-bucket-name/frobbed-widgets*')
extract_job.destination_format = 'CSV'
extract_job.begin()

# Poll until the extract job finishes writing to Cloud Storage.
while extract_job.state.lower() != 'done':
    extract_job.reload()
    time.sleep(1)
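
For the partitioned tables mentioned in the question, the same pattern should work if the query reads a single partition via the legacy-SQL partition decorator (again untested; the table name and date here are only placeholders):

import datetime

# A '$YYYYMMDD' decorator selects a single partition of a
# day-partitioned table (legacy SQL bracket syntax).
day = (datetime.date.today() - datetime.timedelta(days=1)).strftime('%Y%m%d')
query = 'SELECT * FROM [my_dataset.activity$%s]' % day

# ... then run the QueryJob / ExtractJob exactly as above.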

Please feel free to re-open the issue if that example doesn't help.

Got it working. I needed to set legacy SQL to true so that this expression for yesterday's date would parse, because of the quotes in it: DATE_ADD(CURRENT_DATE(), -1, 'DAY').
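
In case it helps anyone else, a minimal sketch of what worked (assuming the use_legacy_sql property on QueryJob; the job name is a placeholder):

query = ("SELECT * FROM test_database.activity "
         "WHERE activity_date = DATE_ADD(CURRENT_DATE(), -1, 'DAY')")
query_job = client.run_async_query('yesterday-activity-query', query)
query_job.use_legacy_sql = True  # the quoted 'DAY' unit parses only as legacy SQL
query_job.destination = temp_table
query_job.begin()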

Thank you very much @tseaver

@vrcs I'm glad you're up and running. We are working toward full support for the "Standard SQL" dialect: #2229, #2354. Please add another issue if those don't cover your case.
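
For reference, the Standard SQL equivalent of that date expression spells the interval unit as a bare keyword rather than a quoted string, which avoids the quoting problem entirely (a sketch for comparison only):

# Legacy SQL: the interval unit is a quoted string.
legacy = "WHERE activity_date = DATE_ADD(CURRENT_DATE(), -1, 'DAY')"
# Standard SQL: the unit is the unquoted keyword DAY.
standard = "WHERE activity_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)"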
