Prefect: Add BigQueryLoadJob

Created on 27 Apr 2020 · 3 Comments · Source: PrefectHQ/prefect

Current behavior

The gcp.bigquery task library doesn't have an implementation of client.load_table_from_file yet. Such a task would be very similar to the existing task BigQueryLoadGoogleCloudStorage, which uses client.load_table_from_uri.

Proposed behavior

Add `gcp.bigquery.BigQueryLoadFile` with behavior similar to `gcp.bigquery.BigQueryLoadGoogleCloudStorage`.

Example

See proposed docstring below.

    """
    Task for inserting records from a file into a Google BigQuery table via a [load job](https://cloud.google.com/bigquery/docs/loading-data).
    Note that all of these settings can optionally be provided or overwritten at runtime.
    Args:
        - file (str, optional): string or path-like object to load data from
        - dataset_id (str, optional): the id of a destination dataset to write the
            records to
        - table (str, optional): the name of a destination table to write the
            records to
        - project (str, optional): the project to initialize the BigQuery Client with; if not provided,
            will default to the one inferred from your credentials
        - schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
        - location (str, optional): location of the dataset that will be written to; defaults to "US"
        - credentials_secret (str, optional, DEPRECATED): the name of the Prefect Secret
            containing a JSON representation of your Google Application credentials
        - **kwargs (optional): additional kwargs to pass to the `Task` constructor
    """
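
For illustration, here is a rough sketch of the core logic such a task's `run` method might wrap. It is not the final implementation: the function name `load_file_to_bigquery` and its signature are hypothetical, and the client is passed in so the sketch does not depend on how credentials are resolved. It assumes `client` is a `google.cloud.bigquery.Client` and relies only on `client.load_table_from_file`, which expects a file object opened in binary mode.

```python
from pathlib import Path
from typing import Any, Optional, Union


def load_file_to_bigquery(
    client: Any,
    file: Union[str, Path],
    dataset_id: str,
    table: str,
    job_config: Optional[Any] = None,
    location: str = "US",
) -> Any:
    """Hypothetical helper: load a local file into a BigQuery table.

    `client` is assumed to be a google.cloud.bigquery.Client and
    `job_config` an optional google.cloud.bigquery.LoadJobConfig.
    """
    path = Path(file)
    if not path.is_file():
        raise ValueError(f"File {path} does not exist or is not a file")

    # A "project.dataset.table" string is accepted as the destination.
    destination = f"{client.project}.{dataset_id}.{table}"

    # load_table_from_file requires the file to be opened in binary mode.
    with path.open("rb") as file_obj:
        load_job = client.load_table_from_file(
            file_obj,
            destination,
            location=location,
            job_config=job_config,
        )

    # Block until the load job finishes; raises if the job failed.
    load_job.result()
    return load_job
```

With the real library, this could be called with `bigquery.Client()` and, for a CSV file, something like `bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.CSV, autodetect=True)`.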

Questions

What is preferred:

  • Create separate task as described above?
  • Modify gcp.bigquery.BigQueryLoadGoogleCloudStorage into a more generic gcp.bigquery.BigQueryLoad which accepts either a GCS path or a local file as input?
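
To make the second option concrete, a minimal sketch of how a single generic loader could dispatch on the input type. The function name `load_into_bigquery` and the rule "anything starting with `gs://` is a GCS URI, everything else is a local path" are assumptions for illustration only; `client` is assumed to be a `google.cloud.bigquery.Client`.

```python
from typing import Any, Optional


def load_into_bigquery(
    client: Any,
    source: str,
    destination: str,
    job_config: Optional[Any] = None,
) -> Any:
    """Hypothetical generic loader: accepts a GCS URI or a local file path."""
    if source.startswith("gs://"):
        # Remote source: BigQuery reads directly from Google Cloud Storage.
        load_job = client.load_table_from_uri(
            source, destination, job_config=job_config
        )
    else:
        # Local source: upload the file contents (binary mode is required).
        with open(source, "rb") as file_obj:
            load_job = client.load_table_from_file(
                file_obj, destination, job_config=job_config
            )

    # Wait for the load job to complete; raises if the job failed.
    load_job.result()
    return load_job
```

The trade-off is visible here: one entry point, but two code paths whose inputs and failure modes differ, which is the main argument for keeping them as separate tasks.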

Please let me know what your thoughts are, am happy to write a PR in the coming week.

enhancement task library

All 3 comments

@dkapitan I'm more in favor of a separate task as opposed to one that uses two different input types! No real strong feeling though if anyone else has a preference 😄

@joshmeek thx for your input. Since it is probably also easier (at least for me) to write a separate task, I'll take that approach for now.

Closed with #2463
