I want to get the schema of any table in BigQuery in JSON format.
However, I cannot get the table schema of a table using the gcloud api. It just returns an empty list.
dataset = client.dataset(my_dataset)
table = dataset.table(my_table)
print table.schema
[]
@tuanavu, you may need to create the dataset first with dataset.create().
See: https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery-usage.html#dataset-operations
You could also call datasets, token = client.list_datasets() # API request to see your available datasets.
Hi @daspecster,
The dataset and table already exist in BigQuery. I can print out the table name and description. Only the table schema returns an empty list, which is weird.
print table.dataset_name
print table.name
print table.schema
Output
my_table_name
[]
Tuan
I am also having the same issue on 0.21.0, anaconda python 3.5.2
I'm grabbing the table object by iterating through client -> list_datasets() -> list_tables(), and table.schema returns an empty list.
num_bytes, num_rows, created, and many other properties also return None for tables that I can see exist and have rows (I can also successfully get the table.name and dataset name properties back)
-- EDIT / "FIX" --
I was able to "fix" my issue by calling table.reload() before using the table object, which feels like a bug? As a library user, I'd like for table objects to reliably have their data without having to reload() them before using them. Thanks for your hard work!
Thanks @ahogit,
I was able to get the schema after running table.reload(). It feels like a bug to me.
@tuanavu Creating a Table instance on the client side does not automagically fetch any server-side data, even if the table already exists on the server. The call to reload() is exactly how you are supposed to fetch that information.
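A minimal sketch of that pattern, tied back to the original goal of getting the schema as JSON. It assumes the table object exposes reload() and a schema list whose items carry name, field_type, mode, description and fields attributes. The names schema_to_json, field_to_dict, Field and StubTable are hypothetical; the stand-ins exist only so the snippet runs without credentials.

```python
import json
from collections import namedtuple

# Stand-in for the library's schema field type, so the sketch runs offline (assumption).
Field = namedtuple("Field", "name field_type mode description fields")


def field_to_dict(field):
    """Convert one schema field (and any nested fields) to a plain dict."""
    result = {"name": field.name, "type": field.field_type, "mode": field.mode}
    if field.description is not None:
        result["description"] = field.description
    if field.fields:  # RECORD columns carry nested fields
        result["fields"] = [field_to_dict(f) for f in field.fields]
    return result


def schema_to_json(table):
    """Fetch server-side metadata, then serialize the schema as JSON."""
    table.reload()  # without this call, table.schema is still []
    return json.dumps([field_to_dict(f) for f in table.schema], indent=2)


class StubTable:
    """Hypothetical stand-in: schema stays empty until reload() is called."""

    def __init__(self, server_schema):
        self._server_schema = server_schema
        self.schema = []

    def reload(self):
        self.schema = list(self._server_schema)


table = StubTable([Field("id", "INTEGER", "REQUIRED", None, None)])
schema_json = schema_to_json(table)
```

With a real client, the object returned by dataset.table(my_table) would take the place of StubTable, provided the dataset and table already exist on the server.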
@tseaver
The Table objects that I get back from list_tables() also have an empty schema list and other properties. I need to reload() the returned object before I can use it, is that also expected? list_tables() should be returning server side data, right?
@tseaver I've also just run into this problem. The documentation says 'API call: refresh table properties via a GET request'. To me, 'refresh' means that the information has already been fetched and needs refreshing because something else has updated it. A call that needs server-side information should go to the server and get it without an additional call (which, if needed for some reason, should be called something like 'get_serverside_data()').
@ahogit Unfortunately, the items returned by the tables/list API do not contain the full Table resource: they have only the barest identifying fields for the table. The call to reload() is thus necessary.
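As a rough illustration of what that means for code that iterates over listed tables, the sketch below reloads each item before reading its properties. It assumes only that every item has a reload() method that fills in the missing fields; reload_all and ListedTable are hypothetical names, and ListedTable is a stand-in so the snippet runs without credentials.

```python
class ListedTable:
    """Hypothetical stand-in for a partially populated item from a table listing."""

    def __init__(self, name, server_rows):
        self.name = name              # identifying field, available immediately
        self.num_rows = None          # stays None until reload() is called
        self._server_rows = server_rows

    def reload(self):
        # In the real library this is where the GET request would happen.
        self.num_rows = self._server_rows


def reload_all(tables):
    """Call reload() on each listed table and return the populated objects."""
    loaded = []
    for table in tables:
        table.reload()  # one request per table
        loaded.append(table)
    return loaded


tables = reload_all([ListedTable("events", 10), ListedTable("users", 3)])
row_counts = {t.name: t.num_rows for t in tables}
```

Because each reload() is a separate request, it may be worth reloading only the tables whose full metadata is actually needed.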
@mdmiller53 I'm sorry that the docs aren't as helpful as we would like. I think we could tweak the docstring to clarify, but we're unlikely to rename the reload() method at this point.
@tseaver thanks! update to docs would be great, i wouldn't have thought of trying the reload() method