Google-cloud-python: Datastore treats str as a blob?

Created on 9 Jun 2016  ·  8 Comments  ·  Source: googleapis/google-cloud-python

https://github.com/GoogleCloudPlatform/gcloud-python/blob/c3331eb80b97d5f69c243dfd421266ed22fede2c/gcloud/datastore/helpers.py#L315

This leads to us saving string values as base64 encoded blobs... What's up with that? Is that on purpose?

datastore

Looks like we went from basestring (unicode + str) to just unicode... and str is now treated as a blob? https://github.com/GoogleCloudPlatform/gcloud-python/commit/ad90b7b3f233ff3045fd2f4605ab871c9b8c05f7

Yeah, looks like we did this on purpose... which seems kind of crazy: saving a regular string (i.e., {'name': 'Timmy'}) ends up as a b64-encoded blob... @tseaver?

This has been true for almost 2 whole years. See the note:

Values which are "text" ('unicode' in Python2, 'str' in Python3) map to 'string_value' in the datastore; values which are "bytes" ('str' in Python2, 'bytes' in Python3) map to 'blob_value'.
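The mapping in that note can be sketched roughly as below. This is only an illustration of the rule as quoted: `value_field_for` and `TEXT_TYPE` are hypothetical names, not the library's actual code, and only text and bytes values are handled.

```python
# Hypothetical helper for illustration; not the library's actual code.
TEXT_TYPE = type(u"")  # 'unicode' on Python 2, 'str' on Python 3


def value_field_for(value):
    """Return the Datastore value field a text or bytes value would map to."""
    if isinstance(value, TEXT_TYPE):
        return "string_value"
    if isinstance(value, bytes):  # 'bytes' is an alias of 'str' on Python 2
        return "blob_value"
    raise TypeError("expected text or bytes, got %r" % type(value))


print(value_field_for(u"Timmy"))  # string_value
print(value_field_for(b"Timmy"))  # blob_value
```

On Python 2, a bare 'Timmy' literal is a bytes value, so under this rule it lands in the second branch.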

Also, blobs don't get base64 encoded, they are just displayed that way in the cloud console. They get stored as byte strings, which is exactly what they are.
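To make the display point concrete, here is a small sketch using only the standard library. It assumes the console shows the standard base64 alphabet; the stored bytes themselves are unchanged.

```python
import base64

raw = b"Timmy"  # the bytes as they would be stored

# What a base64 rendering of those bytes looks like.
displayed = base64.b64encode(raw).decode("ascii")
print(displayed)  # VGltbXk=

# Decoding the displayed form gives back the original bytes.
assert base64.b64decode(displayed) == raw
```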

Hm. That's scary. Why is it that str in Python2 should convert to blob?

Not scary. In Python2, str == bytes. There is no other way for us to detect whether a user wants bytes or unicode. Storing a bytestring is much more space-efficient than storing unicode, so it's certainly something we want to enable (and not all data is strings anyhow, so we always need to enable bytestrings).

$ python2.7 -c 'print(bytes)'
<type 'str'>
$ python3.4 -c 'print(bytes)'
<class 'bytes'>
$ python2.7 -c 'print(repr(b"abc"))'
'abc'
$ python3.4 -c 'print(repr(b"abc"))'
b'abc'
$ python2.7 -c 'print(repr(u"abc"))'
u'abc'
$ python3.4 -c 'print(repr(u"abc"))'
'abc'

OK, but doesn't this mean that I can't query for these results? Blobs aren't indexed, are they?

It feels like if someone wants to store a blob, they should have to do something extra... And if they store a str (which, yes, happens to be == bytes), we should try to store it as a string_value...

In other words, saving {'name': 'Timmy'} to datastore as a Blob value seems completely backwards. If I wanted that as a blob, I should have to do something like {'name_blob': datastore.Blob('Timmy')}...

Right -- after a conversation with @dhermes it's clear that I am wrong!

Here's the deal:

  • In Python 2, str == bytes
  • If you just send off a string, we don't know the encoding, so we store it as "a string of bytes" (which happens to be called blob_value, when it might have been better named bytes_value)
  • blob_values are indexed, which means you can query as you always did without issue
  • In the Cloud Console, you'll see unencoded strings (bytes) as base64-encoded strings. This is simply because the console itself doesn't know the encoding... They could try to guess, but that's... messy.

Long story short: use u'Timmy' if you're clear on what the encoding should be (in this case, string_value stores UTF-8 values). 'Timmy' doesn't tell Datastore what the string encoding is, and therefore Datastore just treats it as some bytes...
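If the data arrives as bytes and you do know the encoding, one option is to decode it before building the entity's properties. A minimal sketch, assuming UTF-8 input; `as_text` is a hypothetical helper, and the plain dict stands in for whatever properties you would pass to the library:

```python
def as_text(value, encoding="utf-8"):
    """Decode bytes to text using a known encoding; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# The value is now text, so it would map to string_value rather than blob_value.
properties = {"name": as_text(b"Timmy")}
print(properties["name"] == u"Timmy")  # True
```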
