Google-cloud-python: BigQuery silently ignores unsupported configuration

Created on 23 Nov 2016 · 7 comments · Source: googleapis/google-cloud-python

temikus λ ipython
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
...
In [1]: import csv
   ...:
   ...: from gcloud import bigquery
   ...: from gcloud.bigquery import SchemaField
   ...:

In [2]: bgq = bigquery.Client()

In [3]: qry_str = """SELECT CAST(source_year AS string) AS year,
   ...: COUNT(is_male) AS birth_count
   ...: FROM [publicdata:samples.natality]
   ...: GROUP BY year
   ...: ORDER BY year
   ...: DESC
   ...: LIMIT 15"""

In [4]: qry = bgq.run_sync_query(qry_str)

In [5]: qry.maximum_billing_tier = 3

In [6]: qry.run()

...

However, there is no actual maximum_billing_tier parameter in the code, so the assignment above is silently ignored:

class _SyncQueryConfiguration(object):
    """User-settable configuration options for synchronous query jobs.
    Values which are ``None`` -> server defaults.
    """
    _default_dataset = None
    _dry_run = None
    _max_results = None
    _timeout_ms = None
    _preserve_nulls = None
    _use_query_cache = None
    _use_legacy_sql = None

This is bad and leads to user confusion; it needs to be fixed ASAP. I'll create a separate bug for supporting maximum_billing_tier.

λ pip show gcloud
Name: gcloud
Version: 0.18.3
Summary: API Client library for Google Cloud
Home-page: https://github.com/GoogleCloudPlatform/gcloud-python
Author: Google Cloud Platform
Author-email: [email protected]
License: Apache 2.0
Location: /Users/temikus/Code/python/lib/python2.7/site-packages
Requires: httplib2, grpc-google-pubsub-v1, google-gax, six, gax-google-logging-v2, protobuf, grpcio, oauth2client, googleapis-common-protos, grpc-google-logging-v2, gax-google-pubsub-v1

All 7 comments

Thanks for reporting @Temikus!
I can see how this would be frustrating.

It appears that maximum_billing_tier is supported for async query jobs but not sync.

@dhermes @tseaver should some manipulation to __slots__ or __setattr__ happen here to make it so you can't assign new properties?

@Temikus Python's "normal" object semantics allow assigning arbitrary attributes. Trying to lock the QueryResults instance down to prevent such assignment seems like the wrong choice here.

@tseaver For a typical object I would agree, but in this case, it may be the "safe" path forward. As I see it we have a few options:

  • __slots__
  • Custom __setattr__ (ew, never)
  • Look for "wrong" elements in self.__dict__ when creating a payload to send to the server
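
Of these options, __slots__ is the most direct way to make a stray assignment fail fast. A minimal sketch (illustrative only, not the fix actually adopted): declaring __slots__ on a class with no __dict__ makes Python reject any attribute not listed.

```python
class StrictConfig(object):
    """Config whose settable attributes are fixed at class-definition time."""
    __slots__ = ('max_results', 'timeout_ms')

    def __init__(self):
        self.max_results = None
        self.timeout_ms = None


cfg = StrictConfig()
cfg.timeout_ms = 1000              # fine: declared in __slots__

try:
    cfg.maximum_billing_tier = 3   # unsupported option: raises immediately
except AttributeError as exc:
    print('rejected:', exc)
```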

The only other "acceptable" fix is to have full coverage of all properties, so they can't get dropped silently, since we'll have setters for them.

Note that the OP is trying to set maximum_billing_tier on a synchronous query, which (as I point out in #2766) is not a supported property.

OK, I admit I'm not a Pythonist, so I may be in the wrong here. However, why do you pass only specific options to the API then? Wouldn't it be more proper to just pass arbitrary attributes through to the API? It ignores them anyway. This should both simplify your code and free you from tracking upstream changes for every attribute.

Or is it already doing that?

@Temikus We don't attempt to restrict applications "scribbling" other attributes onto our client-side objects: Python's development culture follows a "consenting adults" model, rather than the "bondage-and-discipline" model of Java and other statically typed languages.

The properties we do define map "Pythonic" attribute_names_with_underscores to the "Javiotic" namesWithCamelCase versions required by the back end: in some cases, we also work around other issues (e.g., the API defines names which are Python keywords / builtins). Using __slots__ is normally a performance / memory optimization, rather than a "type safety" option.
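
The mapping @tseaver describes can be sketched as a property pair (hypothetical names; the real library's implementation differs): the Pythonic snake_case attribute is the public surface, and the setter writes the camelCase key the back end expects, validating along the way.

```python
class QueryConfig(object):
    """Maps a Pythonic attribute onto the camelCase key the API expects."""

    def __init__(self):
        self._properties = {}

    @property
    def use_query_cache(self):
        return self._properties.get('useQueryCache')

    @use_query_cache.setter
    def use_query_cache(self, value):
        if not isinstance(value, bool):
            raise ValueError('use_query_cache must be a bool')
        self._properties['useQueryCache'] = value


cfg = QueryConfig()
cfg.use_query_cache = True
print(cfg._properties)   # {'useQueryCache': True}
```

Note that this pattern only guards attributes that have setters defined; it does not, by itself, stop a caller from assigning a brand-new attribute, which is the gap the __slots__ discussion above is about.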

@tseaver Understood, thanks for explaining!
