Google-cloud-python: PROPOSAL: Storage API consistency fix

Created on 13 Feb 2015  ·  11 comments  ·  Source: googleapis/google-cloud-python

Guiding principles:

  • [x] Getters and setters should never make HTTP requests. Lazy loading is OK,
    but only when it involves instance creation or other local (non-network-bound) behavior. For example, `Bucket.acl` already does this:

```python
@property
def acl(self):
    """Create our ACL on demand."""
    if self._acl is None:
        self._acl = BucketACL(self)
    return self._acl
```

  • [ ] More generally HTTP requests should be limited to explicit method calls. This also rules out constructors loading data.
  • [ ] Blob, Bucket, and *ACL (the only nouns) instances should have `load()`, `exists()`, `create()`, `delete()`, and `update()` methods. This design gives rise to code like

```python
blob = Blob('/remote/path.txt', bucket=bucket, properties=properties)
try:
    blob.load()  # Just metadata
except NotFound:
    blob.upload_from_file(filename)  # Sends metadata from properties
```

(This maybe screams for a `get_or_create()`; we'll feel it out as we develop.) It's unclear whether it's worth distinguishing `storage.NOUN.update` <--> PUT from `storage.NOUN.patch` <--> PATCH. (As of right now, we don't implement PUT / `update` anywhere.)
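For illustration, a hypothetical `get_or_create()` along these lines might look like the sketch below. `Blob`, `NotFound`, and the in-memory `_server` store are stand-ins for the real classes and the service, not the library's actual API:

```python
class NotFound(Exception):
    """Stand-in for the library's 404 exception."""

class Blob:
    """Toy stand-in for storage.Blob, holding metadata locally."""

    _server = {}  # simulated remote store: path -> properties

    def __init__(self, path, properties=None):
        self.path = path
        self.properties = properties or {}

    def load(self):
        """Fetch metadata explicitly; raise NotFound if the blob is absent."""
        if self.path not in self._server:
            raise NotFound(self.path)
        self.properties = dict(self._server[self.path])

    def create(self):
        """Create the blob remotely, sending the local metadata."""
        self._server[self.path] = dict(self.properties)

    def get_or_create(self):
        """Load if present, otherwise create -- one explicit request either way."""
        try:
            self.load()
            return self, False
        except NotFound:
            self.create()
            return self, True

blob, created = Blob('/remote/path.txt', {'contentType': 'text/plain'}).get_or_create()
print(created)  # True on first call
```

Each network interaction here is an explicit method call, matching the guiding principle above.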

  • [x] exists() should use fields in the requests to minimize the payload.
  • [ ] A Connection should not be required to be bound to any object (one of the nouns Bucket, ACL, or Blob) but should be an optional argument to methods which actually talk to the API.
  • [ ] We should strongly consider adding a last_updated field to classes to indicate the last time the values were updated (from the server).
  • [ ] For list methods: List all buckets top-level, e.g.

```python
storage.get_all_buckets(connection=optional_connection)
```

and then `bucket.get_all_objects()`. It's unclear how the other three nouns (objectAccessControls, bucketAccessControls, and defaultObjectAccessControls) will handle this. Right now they are handled via `ObjectACL.reload()` and `BucketACL.reload()` (BucketACL being a superclass of DefaultObjectACL).
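A top-level listing helper in this style might look like the following sketch, where `ListingConnection` and its canned response are stand-ins for the real connection and API, not the actual implementation:

```python
class ListingConnection:
    """Fake connection returning a canned bucket listing."""

    def api_request(self, path):
        # A real connection would issue an HTTP GET here.
        return {'items': [{'name': 'logs'}, {'name': 'assets'}]}

class Bucket:
    def __init__(self, name):
        self.name = name

def get_all_buckets(connection=None):
    """Top-level listing; the connection is an explicit, optional argument."""
    if connection is None:
        connection = ListingConnection()  # stand-in for an implicit default
    response = connection.api_request('/b')
    return [Bucket(item['name']) for item in response.get('items', [])]

print([b.name for b in get_all_buckets()])  # ['logs', 'assets']
```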

  • [x] Implicit behavior (default project, default bucket, and/or default connection) should be used wherever possible (and documented).
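As a sketch of how the optional-connection and minimal-payload `exists()` bullets could combine: the `_require_connection` helper, the request path, and the `fields` handling below are illustrative assumptions, not the library's real implementation:

```python
_DEFAULTS = {'connection': None}  # simulated implicit default connection

def _require_connection(connection=None):
    """Fall back to an implicitly configured default connection."""
    if connection is None:
        connection = _DEFAULTS['connection']
    if connection is None:
        raise ValueError('A connection is required but none was configured.')
    return connection

class Connection:
    """Fake connection that records the fields restriction it sends."""

    def __init__(self, known):
        self.known = set(known)
        self.last_params = None

    def api_request(self, path, query_params=None):
        self.last_params = query_params
        if path not in self.known:
            raise LookupError(path)
        return {'name': path}

class Bucket:
    def __init__(self, name):
        self.name = name

    def exists(self, connection=None):
        """Existence check: no bound connection, minimal response payload."""
        connection = _require_connection(connection)
        try:
            connection.api_request('/b/%s' % self.name,
                                   query_params={'fields': 'name'})
            return True
        except LookupError:
            return False

conn = Connection(['/b/my-bucket'])
print(Bucket('my-bucket').exists(connection=conn))  # True
print(conn.last_params)                             # {'fields': 'name'}
print(Bucket('missing').exists(connection=conn))    # False
```

Restricting the response to `fields=name` keeps the `exists()` payload to the minimum the check needs.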

@tseaver Please weigh in. This was inspired by our discussion at the end of #604.

/cc @thobrla I'd like to close #545 and focus on this (possibly broken down into sub-bugs). Does that sound OK?

storage

All 11 comments

Closing https://github.com/GoogleCloudPlatform/gcloud-python/issues/545 and focusing on this+sub-bugs sounds good to me; this captures the concerns I had there.

Thanks!

Aside: I made a list of all 34 API methods and the corresponding code paths.

@tseaver I'd like to get moving on this soon. WDYT of the proposals?

@dhermes I'm in "quick hit" mode today, but can review in more depth tomorrow.

I'm in Portland mostly AFK, so next few days is fine.

@tseaver Can we move forward on this?

I'm not sure debating this here is the right thing: this is a _large_ set of changes, and I'm not sure what the goals are. We should talk about the goals first, before we try to plan out an implementation.

The goal is really just making the API more usable.

The current behavior of "sync"-ing whenever a property is set outside a batch shouldn't be the default (see #545). Most of the changes above are about making network interaction explicit and predictable.

Hi. Is there any progress on the part of this issue that says `blob.upload_from_file(filename)  # Sends metadata from properties` (or is there a separate issue for it that I haven't found)?

Having to use `patch()` after `upload_...()` unfortunately has several drawbacks of varying severity:

  1. It requires the client to have credentials allowing PATCH requests, preventing immutable append-only semantics.
  2. It is not atomic, so if the PATCH operation fails, or the client fails between the calls, the data in the storage service might be left in an inconsistent state.
  3. It lacks consistency. There will be a window when the blob data has been uploaded but the metadata is still incorrect or missing, leading to potential race conditions.
  4. The metageneration will never be 1.

These are currently blocking me from using gcloud-python.

User @pdknsk seems to have provided a patch in #536. I think it would make sense to implement that even before/without the "load(), exists(), create(), delete(), and update()" interface described in this issue.

I'd be happy to provide a pull request if that helps.
