Google-cloud-python: Making parametrized queries in BigQuery should be easy

Created on 5 Nov 2018  路  4Comments  路  Source: googleapis/google-cloud-python

If you want developers to not make SQL injection vulnerabilities, you should make their job easier.

Look at sqlite3 API design:
c.execute('SELECT * FROM stocks WHERE symbol=?', t).

Compare this to BigQuery:

query = """
    SELECT word, word_count
    FROM `bigquery-public-data.samples.shakespeare`
    WHERE corpus = @corpus
    AND word_count >= @min_word_count
    ORDER BY word_count DESC;
"""
query_params = [
    bigquery.ScalarQueryParameter('corpus', 'STRING', 'romeoandjuliet'),
    bigquery.ScalarQueryParameter('min_word_count', 'INT64', 250)
]
job_config = bigquery.QueryJobConfig()
job_config.query_parameters = query_params
query_job = client.query(
    query,
    # Location must match that of the dataset(s) referenced in the query.
    location='US',
    job_config=job_config)  # API request - starts the query

Why I need to put query params into job config?
Why I need to mutate the job config?

Query params should be a query function parameter. Or at least part of QueryJobConfig constructor, so I can do:

query_job = client.query(
    query,
    # Location must match that of the dataset(s) referenced in the query.
    location='US',
    job_config=QueryJobConfig(query_pameters=query_params))  # API request - starts the query
feature request bigquery

Most helpful comment

I like the idea of adding the properties to the constructor as optional arguments. I'll look into this now and see if I can find a way to do that without creating a big maintenance burden.

All 4 comments

@tswast, @alixhami RFC

@usamec You can do exactly the same code as in SQLite with the google-cloud-bigquery API. See the DB-API module reference: https://googleapis.github.io/google-cloud-python/latest/bigquery/dbapi.html

That said, I agree that we could make it less verbose to create the query parameter configuration by using the same helper code we do in the DB-API module.

Why I need to put query params into job config?

I chose to have very few parameters to the query() method because there are many options in the API's query job resource, and it's not at all clear where to draw the line. Also, having a QueryJobConfig object matches much more closely to that of the underlying REST API.

Why I need to mutate the job config?

Regarding adding constructor options for each parameter, it would make the handwritten config class code harder to understand and make updates to. It means we'd need to map from constructor arguments to properties. It's also not at all clear where to draw the line for properties. There are many properties of the QueryJobConfig class and I think it'd be way to many arguments to have one for each property.

I like the idea of adding the properties to the constructor as optional arguments. I'll look into this now and see if I can find a way to do that without creating a big maintenance burden.

Thanks for the suggestion @usamec! Job config properties can now be specified in the constructor. The update made it into the release today - version 1.7.0 (changelog)

Was this page helpful?
0 / 5 - 0 ratings