Load jobs have attributes to specify the partitioning of the table, as described in the link below:
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs
It doesn't look like the current BigQuery Python client supports the "configuration.load.timePartitioning.field" attribute on either a load job or a table. These are important attributes for configuring a load job. Let me know if they are already implemented and I missed them, but after browsing the code, I don't think they are. If they are not implemented, I believe this is an important feature for this API to be useful.
Edit by @tswast, to better track sub-tasks for this issue
This feature adds the ability to partition a table by a timestamp or datetime column. Changes required:
- google.cloud.bigquery.[table].TimePartitioning class to describe time partitioning definition.
  - TimePartitioning.type_ property (defaults to 'DAY' in TimePartitioning constructor)
  - TimePartitioning.field property (string)
  - TimePartitioning.expiration_ms (int, but stored in API representation as a string)
  - TimePartitioning.require_partition_filter property (bool)
  - TimePartitioning.to_api_repr()
  - TimePartitioning.from_api_repr()
- QueryJobConfig.time_partitioning property on query configuration to set time partitioning for destination tables.
- LoadJobConfig.time_partitioning property on load job configuration to set time partitioning for destination tables.
- Table.time_partitioning property to describe time partitioning definition.
- TableListItem.time_partitioning property to describe time partitioning definition.

@tswast can you comment?
I wonder if this might be another case where the back-end has added a feature without bumping the API version.
@tseaver Yeah, this is a relatively new feature. The API docs were just refreshed last week, so they probably appeared then. We're tracking this feature request internally on bug 72959426.
Edit: I've copied the summary of required changes to the issue description.
@adityagupta104 In the meantime, you can use
job_config._properties['timePartitioning'] = {'type': 'DAY', 'field': 'yourfield'}
The workaround in the latest version (0.32.0) is to use
job_config._properties['load']['timePartitioning'] = {'type': 'DAY', 'field': 'yourfield'}
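The two workarounds above differ only in where the key sits inside the private `_properties` dict. Here is a minimal sketch of that difference, using plain dicts as stand-ins for `job_config._properties` so it runs without the client library. The helper name `set_time_partitioning` and its `nested` flag are hypothetical, and `_properties` is a private attribute whose layout may change again in later releases.

```python
def set_time_partitioning(properties, field, type_="DAY", nested=True):
    """Write a timePartitioning resource into a job-config properties dict.

    `properties` stands in for `job_config._properties` (a private attribute).
    nested=True  -> layout reported for 0.32.0: properties['load']['timePartitioning']
    nested=False -> earlier layout: properties['timePartitioning']
    """
    resource = {"type": type_, "field": field}
    # Create the 'load' sub-dict on demand when using the nested layout.
    target = properties.setdefault("load", {}) if nested else properties
    target["timePartitioning"] = resource
    return properties


# Plain dicts are used here instead of a real LoadJobConfig.
new_layout = set_time_partitioning({}, "yourfield")
old_layout = set_time_partitioning({}, "yourfield", nested=False)
```

With a real job config, the same idea would presumably be applied as `set_time_partitioning(job_config._properties, 'yourfield')`, picking `nested` according to the installed version.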
I just remembered Table has some partitioning properties already:
- partition_expiration
- partitioning_type

We could add a partition_field property to Table instead of a new class.
Then again, time partitioning now has a requirePartitionFilter property in the API, so a new class would be better than adding 4 new properties to load jobs and query jobs. I think we can do the class-based method with Table too and point people at it from the existing partitioning properties.
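For illustration, a rough sketch of what the class-based approach from the issue description could look like. This is not the library's implementation: the attribute names follow the list in the issue description, the REST keys (`type`, `field`, `expirationMs`, `requirePartitionFilter`) are assumed from the timePartitioning resource, and optional values that are unset are simply omitted from the output.

```python
class TimePartitioning(object):
    """Sketch of the proposed class; names and behavior are assumptions."""

    def __init__(self, type_="DAY", field=None, expiration_ms=None,
                 require_partition_filter=None):
        self.type_ = type_
        self.field = field
        self.expiration_ms = expiration_ms
        self.require_partition_filter = require_partition_filter

    def to_api_repr(self):
        """Build the dict form of the timePartitioning resource."""
        resource = {"type": self.type_}
        if self.field is not None:
            resource["field"] = self.field
        if self.expiration_ms is not None:
            # Kept as an int in Python, but sent to the API as a string.
            resource["expirationMs"] = str(self.expiration_ms)
        if self.require_partition_filter is not None:
            resource["requirePartitionFilter"] = self.require_partition_filter
        return resource

    @classmethod
    def from_api_repr(cls, resource):
        """Rebuild an instance from the dict form, converting the string back to int."""
        expiration = resource.get("expirationMs")
        return cls(
            type_=resource.get("type", "DAY"),
            field=resource.get("field"),
            expiration_ms=int(expiration) if expiration is not None else None,
            require_partition_filter=resource.get("requirePartitionFilter"),
        )


partitioning = TimePartitioning(field="yourfield", expiration_ms=86400000)
resource = partitioning.to_api_repr()
```

A single object like this could then back `LoadJobConfig.time_partitioning`, `QueryJobConfig.time_partitioning` and `Table.time_partitioning` alike, which is the argument for a class over four separate properties per config.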
+1 for a class instead.
Actually, it would be good to group these as in https://cloud.google.com/bigquery/docs/data-definition-language#specifying_table_partitioning_options for consistency. That is, partition_expiration would not sit under a partition class but under a (new) options class, along with friendlyName, description and labels.
@yiga2 I'd love it if the API matched the structure of DDL, but currently this library is modeled after the REST API.