Google-cloud-python: BigQuery `insert_data()` fails with record type columns (nested objects)

Created on 20 Jan 2017 · 11 comments · Source: googleapis/google-cloud-python

BigQuery tabledata.insertAll supports inserting rows with columns that take record types (nested objects).

A customer reports:

My table structure has nested schemas. The [google-cloud-python] docs don't say how to handle nested structures. All we get is, "rows (list of tuples) – Row data to be inserted. Each tuple should contain data for each schema field on the current table and in the same order as the schema fields." A list of tuples. That's it. If I have nested data do I make sub-tuples? Is it simply not supported at this time?

I dug into the BigQuery client code to find out what was going on. The client does

    zip(table._schema, rows)

which looks like it won't support nested inserts (or could -- with sub-tuples, but who knows).
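For illustration, here is a minimal sketch of what that positional pairing does with a nested value. The `SchemaField` here is a stand-in namedtuple, not the real `bigquery.table.SchemaField`:

```python
from collections import namedtuple

# Stand-in for bigquery.table.SchemaField; only the attributes used here.
SchemaField = namedtuple("SchemaField", ["name", "field_type"])

schema = [
    SchemaField("string_col", "STRING"),
    SchemaField("record_col", "RECORD"),
]
row = ("Some value", {"nested_string": "another string value"})

# zip pairs each schema field with the row value at the same position;
# a dict in that position simply rides along as the RECORD's value.
pairs = list(zip(schema, row))
```

Nothing in the pairing itself rejects a dict, which is why the question is really about how the client serializes each paired value.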

Example in the API Explorer, with the following table schema:

[screenshot: table schema]

Try it out in the API Explorer for bigquery.tabledata.insertAll. After creating a table with the same schema, switch the editor to free-form mode and enter the following request body:

{
  "rows": 
  [
    {
      "insertId": "12345",
      "json": 
      {
        "string_col": "Some value",
        "record_col": {
          "nested_string": "another string value",
          "nested_repeated": ["0", "1", "2"],
          "nested_record": {
            "nested_nested_string": "some deep insight"
          }
        }
      }
    }
  ]
}
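The same request body can be assembled in Python and serialized with the standard library; this is plain JSON handling, independent of the client library:

```python
import json

# The tabledata.insertAll request body: nested records are just nested objects.
body = {
    "rows": [
        {
            "insertId": "12345",
            "json": {
                "string_col": "Some value",
                "record_col": {
                    "nested_string": "another string value",
                    "nested_repeated": ["0", "1", "2"],
                    "nested_record": {"nested_nested_string": "some deep insight"},
                },
            },
        }
    ]
}

# Serialize for an HTTP POST to the tabledata.insertAll endpoint.
payload = json.dumps(body)
```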

I can verify that the insert worked with a SQL query:

    #standardSQL
    SELECT * FROM `swast-scratch.hello_world.nested` LIMIT 1000

[screenshot: query results]

It is not clear from the docs how to do a similar call from our client libraries (and it likely isn't possible).

Labels: bug, bigquery, release blocking, p0


All 11 comments

I'm having the same issue with the BigQuery library. It would be much easier if we could use an actual dictionary.

I tried using sub-tuples (with parentheses), brackets, etc., but it keeps returning _invalid_argument: 'Array specified for non-repeated field.'_

I was able to insert nested data by wrapping the RECORD (nested) field in a dict() object.

For example, if I have the following schema:

    SCHEMA_NAME = [
        bigquery.table.SchemaField('some_filename', 'STRING', mode='REQUIRED'),
        bigquery.table.SchemaField('some_timestamp', 'TIMESTAMP', mode='REQUIRED'),
        bigquery.table.SchemaField('some_file_info', 'RECORD', mode='REQUIRED', fields=[
            bigquery.table.SchemaField('some_number', 'INTEGER', mode='REQUIRED'),
            bigquery.table.SchemaField('some_other_number', 'INTEGER', mode='NULLABLE'),
            bigquery.table.SchemaField('some_float', 'FLOAT', mode='NULLABLE'),
        ]),
    ]

The following data structure works:

    ROWS_TO_INSERT = [
        ("this_is_my_filename", datetime.datetime.utcnow(),
         dict(some_number=455, some_other_number=6, some_float=5.11)),
    ]
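To sketch what that tuple-plus-dict shape implies on the wire, the row maps onto the insertAll JSON by pairing each top-level schema field with its value. The `SchemaField` and `row_to_json` below are illustrative stand-ins, not the library's real class or serialization code:

```python
import datetime
from collections import namedtuple

# Minimal stand-in for bigquery.table.SchemaField.
SchemaField = namedtuple("SchemaField", ["name", "field_type", "mode"])

SCHEMA = [
    SchemaField("some_filename", "STRING", "REQUIRED"),
    SchemaField("some_timestamp", "TIMESTAMP", "REQUIRED"),
    SchemaField("some_file_info", "RECORD", "REQUIRED"),
]

def row_to_json(schema, row):
    # Pair each top-level field with its value; dicts pass through
    # unchanged as the value of a RECORD column.
    return {field.name: value for field, value in zip(schema, row)}

row = (
    "this_is_my_filename",
    datetime.datetime(2017, 1, 20, 12, 0, 0),
    dict(some_number=455, some_other_number=6, some_float=5.11),
)
payload = row_to_json(SCHEMA, row)
```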

Thanks @Ngg971 for following up. I think there may still be a bug here: timestamp fields within the nested records might not be sent over the wire correctly when passed in a dict like this, but I might be wrong.

We need a system test to verify.
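One way to sidestep the timestamp concern, as a hedged sketch: convert datetimes to RFC 3339 strings yourself before building the nested dict, since the string form is what insertAll ultimately accepts. This `to_json_value` helper is hypothetical, not part of the library:

```python
import datetime

def to_json_value(value):
    """Hypothetical helper: recursively render datetimes as RFC 3339 strings."""
    if isinstance(value, datetime.datetime):
        # Assume naive datetimes are UTC, as datetime.utcnow() produces.
        return value.isoformat() + ("Z" if value.tzinfo is None else "")
    if isinstance(value, dict):
        return {key: to_json_value(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_value(item) for item in value]
    return value

record = {"created": datetime.datetime(2017, 1, 20, 8, 30, 0), "count": 3}
converted = to_json_value(record)
```

A system test exercising a TIMESTAMP inside a RECORD would settle whether this conversion is actually needed.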

@tswast Like the back-end API, we do not flatten nested records into tuples, but preserve them as dicts. PR #3171 adds a system test showing how to insert your original example.

I also see this problem. I'm confused about the status of this bug: this one is marked as a dupe of #3695, and #3695 as a dupe of this one. Should one of them be open?

@melissachang It seems like the other one was closed because it was a duplicate. This one seemed to have been fixed with PR #3171. Could you try following the example given in that PR and see if you still see the problem?

That PR only adds tests, so it couldn't have fixed the problem.

Sorry, I meant that the PR shows an example which solved the OP's problem which may also fix yours. Please do keep us posted. Thanks.

@melissachang Hello, were you able to resolve your issue following the example?

Hi, I don't remember where I saw this problem and how I worked around it, sorry.

No problem, glad it worked out.
