Google-cloud-python: BigQuery: time partitioning of tables and load / query job destination tables

Created on 24 Dec 2017 · 7 comments · Source: googleapis/google-cloud-python

Load jobs have attributes to specify partitioning of the table as mentioned in the below link -
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs

It doesn't look like the current BigQuery Python client supports the "configuration.load.timePartitioning.field" attribute on either load jobs or tables. These are important attributes for configuring a load job. Let me know if they are already implemented and I missed them, but after browsing the code I don't think they are. If not, I believe this is an important feature for this API to be useful.

Edit by @tswast, to better track sub-tasks for this issue

This feature adds the ability to partition a table by a timestamp or datetime column. Changes required:

  • [x] New google.cloud.bigquery.[table].TimePartitioning class to describe time partitioning definition.

    • [x] TimePartitioning.type_ property (defaults to 'DAY' in TimePartitioning constructor)

    • [x] TimePartitioning.field property (string)

    • [x] TimePartitioning.expiration_ms property (int, but stored in the API representation as a string)

    • [x] TimePartitioning.require_partition_filter property (bool)

    • [x] TimePartitioning.to_api_repr()

    • [x] TimePartitioning.from_api_repr()

  • [x] New QueryJobConfig.time_partitioning property on query configuration to set time partitioning for destination tables.
  • [x] New LoadJobConfig.time_partitioning property on load job configuration to set time partitioning for destination tables.
  • [x] New Table.time_partitioning property to describe time partitioning definition.
  • [x] New TableListItem.time_partitioning property to describe time partitioning definition.
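A minimal sketch of the class the checklist above describes, using only the names listed there (the actual class shipped in google.cloud.bigquery may differ in details such as validation and extra properties):

```python
# Hypothetical sketch of the TimePartitioning class from the checklist.
# Property names and the 'DAY' default come from the checklist; the rest
# is illustrative, not the library's actual implementation.

class TimePartitioning:
    """Describes a time-based partitioning definition for a table."""

    def __init__(self, type_='DAY', field=None, expiration_ms=None,
                 require_partition_filter=None):
        self.type_ = type_  # defaults to 'DAY', per the checklist
        self.field = field  # timestamp/datetime column to partition by
        self.expiration_ms = expiration_ms  # partition TTL, int
        self.require_partition_filter = require_partition_filter  # bool

    def to_api_repr(self):
        """Build the REST representation (the timePartitioning resource)."""
        resource = {'type': self.type_}
        if self.field is not None:
            resource['field'] = self.field
        if self.expiration_ms is not None:
            # the API representation stores expirationMs as a string
            resource['expirationMs'] = str(self.expiration_ms)
        if self.require_partition_filter is not None:
            resource['requirePartitionFilter'] = self.require_partition_filter
        return resource

    @classmethod
    def from_api_repr(cls, resource):
        """Rebuild a TimePartitioning from its REST representation."""
        expiration = resource.get('expirationMs')
        return cls(
            type_=resource.get('type', 'DAY'),
            field=resource.get('field'),
            expiration_ms=int(expiration) if expiration is not None else None,
            require_partition_filter=resource.get('requirePartitionFilter'),
        )
```

An instance would then be assigned to the time_partitioning property of a Table, LoadJobConfig, or QueryJobConfig, as the remaining checklist items describe.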
Labels: feature request, bigquery, p2

Most helpful comment

@adityagupta104 In the meantime, you can use

job_config._properties['timePartitioning'] = {'type': 'DAY', 'field': 'yourfield'}

All 7 comments

@tswast can you comment?

I wonder if this might be another case where the back-end has added a feature without bumping the API version.

@tseaver Yeah, this is a relatively new feature. The API docs were just refreshed last week, so they probably appeared then. We're tracking this feature request internally on bug 72959426.

Edit: I've copied the summary of required changes to the issue description.

@adityagupta104 In the meantime, you can use

job_config._properties['timePartitioning'] = {'type': 'DAY', 'field': 'yourfield'}

The workaround in the latest version (0.32.0) is to use

job_config._properties['load']['timePartitioning'] = {'type': 'DAY', 'field': 'yourfield'}
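To see what these two workarounds actually write into the job configuration, here is a plain-dict sketch of the resulting REST payloads (no client library needed; the nesting follows the REST API's configuration.load resource):

```python
# In google-cloud-bigquery before 0.32.0, LoadJobConfig._properties held
# the load options at the top level, so the workaround wrote there:
old_style = {}
old_style['timePartitioning'] = {'type': 'DAY', 'field': 'yourfield'}

# In 0.32.0 the options are nested under 'load', mirroring the REST
# API's configuration.load shape, so the path gains one level:
new_style = {'load': {}}
new_style['load']['timePartitioning'] = {'type': 'DAY', 'field': 'yourfield'}
```

Both produce the same timePartitioning object in the job's REST configuration; only its position inside `_properties` changed between versions. Note that `_properties` is a private attribute, so this workaround may break in later releases.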

I just remembered Table has some partitioning properties already:

  • partition_expiration
  • partitioning_type

We could add a partition_field property to table instead of a new class.

Then again, time partitioning now has a requirePartitionFilter property in the API, so a new class would be better than adding four new properties to load jobs and query jobs. I think we can take the class-based approach with Table too, and point people at it from the existing partitioning properties.

+1 for a class instead.

Actually it would be good to group them as in https://cloud.google.com/bigquery/docs/data-definition-language#specifying_table_partitioning_options for consistency; that is, partition_expiration would live not under a partition class but under a (new) options one, along with friendlyName, description, and labels.

@yiga2 I'd love it if the API matched the structure of DDL, but currently this library is modeled after the REST API.
