I _thought_ I'd discussed this with @tswast, and he'd even generously added a PR; but after ten mins of searching GH I've come up empty on both issues and PRs, so posting here.
@tswast lmk if my memory is correct and we have discussed this!
I appreciate that API design is difficult and that there are tradeoffs between verbosity and explicitness, and between consistency within a product and language-specific adjustments. The bar should be high for people offering criticism when they can only see a subset of the relevant information.
But I do frequently use the BQ python API and find it fairly awkward and un-pythonic, as though I were writing Java.
Here's an example, straight from the docs:
from google.cloud import bigquery
client = bigquery.Client()
dataset_id = 'my_dataset' # replace with your dataset ID
table_id = 'my_table' # replace with your table ID
table_ref = client.dataset(dataset_id).table(table_id)
table = client.get_table(table_ref) # API request
I would love to be able to write this, and receive a table object.
client = bigquery.Client()
client.table(dataset='ds', table='tbl')
There are some points a level down (the .table object doesn't return a Table object while a dataset object does return a Dataset object / do we need a TableReference class in place of a string, etc), but they all center around the ergonomics of the API, particularly in respect to Python.
Thank you as ever for a wonderful product, and appreciate any thoughts on whether I'm making mistakes here.
In our previous conversation, I added a from_string() method to the DatasetReference and TableReference classes. https://github.com/GoogleCloudPlatform/google-cloud-python/pull/5255
You're right that the BigQuery client does still feel overly verbose. I find it a problem that from_string() can't account for the default project on the client.
The reason we have the *Reference classes is that the BigQuery API has such resources in the REST responses. I wanted to be super clear when the API only returns a pointer to a table rather than a full table resource.
It's a historical artifact that client.dataset() and dataset.table() return a reference and not an actual Dataset or Table. In retrospect, I probably should have renamed or removed those methods in the 1.0 redesign project.
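The distinction between a reference and a full resource can be pictured roughly as below. This is only a sketch with hypothetical classes (`TableRef`, `TableResource`) and a hypothetical `fake_get_table` helper, none of which are the library's own. The point is just that building a reference is purely local, while a full resource only comes back from an API call.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class TableRef:
    """Hypothetical pointer: just enough to name a table. No request is made."""
    project: str
    dataset_id: str
    table_id: str


@dataclass
class TableResource:
    """Hypothetical full resource: the pointer plus metadata held by the service."""
    reference: TableRef
    schema: List[str] = field(default_factory=list)
    num_rows: int = 0


def fake_get_table(ref: TableRef) -> TableResource:
    """Stand-in for the API request; the metadata is made up for illustration."""
    return TableResource(reference=ref, schema=["id", "name"], num_rows=2)


ref = TableRef("my-project", "my_dataset", "my_table")  # local, no API call
table = fake_get_table(ref)                             # would be the API call
```

Keeping the two types separate makes it explicit in a signature whether a caller holds only a name or the fetched metadata.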
An idea:
Wherever Client methods accept a *Reference, also allow a string. This gets a little muddy in the library code, but I think so long as it's documented well it'd be okay.
PRs welcome!
cc @shollyman @alixhami
Hi Tim!
Thanks for your thoughtful reply, as ever! I had the sense there might be some influence from the REST design.
> Wherever Client methods accept a *Reference, also allow a string. This gets a little muddy in the library code, but I think so long as it's documented well it'd be okay.
That makes a lot of sense. Particularly if this could be project[:.]dataset.table _or_ dataset.table and fall back to the default project, then that could surmount the issue you point out with .from_string. I think that would accomplish most of the goals - it would be easy & intuitive to get tables, and users wouldn't need to learn the ref objects for 80% of cases. A step further would be to attempt to coerce strings / Tableref objects to Tables, though maybe that's a step too far.
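The string handling discussed above could look roughly like this. It is only a sketch: `TableRef` and `parse_table_id` are hypothetical names, not part of google-cloud-bigquery, and it assumes that a colon may appear only between the project and the dataset and that project IDs contain no dots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical stand-in for a table pointer (not the library's class)."""
    project: str
    dataset_id: str
    table_id: str


def parse_table_id(table_id: str, default_project: str) -> TableRef:
    """Parse 'project.dataset.table', 'project:dataset.table' or 'dataset.table'."""
    project, sep, rest = table_id.partition(":")
    if sep:
        # Colon form: the project is explicit, the rest must be dataset.table.
        parts = [project, *rest.split(".")]
    else:
        parts = table_id.split(".")
        if len(parts) == 2:
            # No project given: fall back to the client's default project.
            parts = [default_project, *parts]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected [project.]dataset.table, got {table_id!r}")
    return TableRef(*parts)


short = parse_table_id("ds.tbl", default_project="my-project")
full = parse_table_id("other-project:ds.tbl", default_project="my-project")
```

Here `short` picks up `my-project` from the default, while `full` keeps the project named in the string.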
I'm trying to focus on xarray & pandas-gbq re OSS work (and a touch of rust!) so while I'd love to tinker here, please don't wait for me on this one. I hope the comments are still productive even from the peanut gallery.
Why not go a step further and introduce a default_dataset at the Client() level? It could propagate to similar properties, e.g. in QueryJobConfig(), or act as the default wherever a dataset is required, including in .from_string if only a table is passed...
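To make the suggestion concrete, here is a rough sketch of how a client-level default could resolve bare table names. `SketchClient`, its `default_dataset` argument and `resolve` are hypothetical names for illustration only; nothing here is the real `bigquery.Client` API.

```python
from typing import Optional, Tuple


class SketchClient:
    """Hypothetical client that only holds defaults (not bigquery.Client)."""

    def __init__(self, project: str, default_dataset: Optional[str] = None):
        self.project = project
        self.default_dataset = default_dataset

    def resolve(self, name: str) -> Tuple[str, str, str]:
        """Return (project, dataset, table), filling missing parts from defaults."""
        parts = name.split(".")
        if len(parts) == 1:
            if self.default_dataset is None:
                raise ValueError(f"{name!r} has no dataset and no default is set")
            # Bare table name: prepend the default dataset.
            parts = [self.default_dataset, *parts]
        if len(parts) == 2:
            # dataset.table: prepend the client's project.
            parts = [self.project, *parts]
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected [[project.]dataset.]table, got {name!r}")
        return parts[0], parts[1], parts[2]


client = SketchClient(project="my-project", default_dataset="my_dataset")
resolved = client.resolve("my_table")
```

In this sketch every method that takes a table would need to route through `resolve`, which is where the number of call sites becomes the practical cost of the idea.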
@yiga2 I like the idea of a default dataset in principle, but there are so many places where a dataset might creep in that I fear it'd be hard to catch all uses, especially as the BigQuery API evolves and there may be new APIs that take datasets in their requests.
Instead, we are starting to offer a slightly lower-level API where default resources can be provided. See: https://github.com/googleapis/google-cloud-python/pull/6088