This leads to us saving string values as base64 encoded blobs... What's up with that? Is that on purpose?
Looks like we went from basestring (unicode + str) to just unicode, and str is now treated as a blob: https://github.com/GoogleCloudPlatform/gcloud-python/commit/ad90b7b3f233ff3045fd2f4605ab871c9b8c05f7 ?
Yeah, looks like we did this on purpose, but it seems kind of crazy that saving a regular string (i.e., {'name': 'Timmy'}) ends up as a b64-encoded blob. @tseaver ?
This has been true for almost two whole years. See the note:
Values which are "text" ('unicode' in Python2, 'str' in Python3) map to 'string_value' in the datastore; values which are "bytes" ('str' in Python2, 'bytes' in Python3) map to 'blob_value'.
Also, blobs don't get base64 encoded, they are just displayed that way in the cloud console. They get stored as byte strings, which is exactly what they are.
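To make the display-versus-storage distinction concrete, here is a minimal sketch using only the standard library. It does not talk to Datastore; the variable names are made up for illustration, and it assumes the console simply shows a base64 rendering of the stored bytes.

```python
import base64

# The bytes as they would be stored (unchanged).
raw = b'Timmy'

# A base64 rendering, as a UI might choose to display binary data.
displayed = base64.b64encode(raw)

# Decoding the rendering gives back exactly the original bytes.
restored = base64.b64decode(displayed)

print(displayed)
print(restored == raw)
```

The base64 form is only a printable view of the same bytes; nothing about the stored value changes.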
Hm. That's scary. Why is it that str in Python2 should convert to blob?
Not scary. In Python2, str == bytes. There is no other way for us to detect that a user wants bytes vs. unicode. Storing a bytestring is much more space-efficient than storing unicode, so it's certainly something we want to enable (and not all data is strings anyhow, so we always need to enable bytestrings)
$ python2.7 -c 'print(bytes)'
<type 'str'>
$ python3.4 -c 'print(bytes)'
<class 'bytes'>
$ python2.7 -c 'print(repr(b"abc"))'
'abc'
$ python3.4 -c 'print(repr(b"abc"))'
b'abc'
$ python2.7 -c 'print(repr(u"abc"))'
u'abc'
$ python3.4 -c 'print(repr(u"abc"))'
'abc'
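The mapping described above can be sketched as a small type check. This is not the library's actual code: `value_kind` and `TEXT_TYPE` are hypothetical names, and only the field names `string_value` and `blob_value` come from the note quoted earlier.

```python
# Hypothetical helper, for illustration only.
# type(u'') is `unicode` on Python 2 and `str` on Python 3.
TEXT_TYPE = type(u'')


def value_kind(value):
    """Return the Datastore field a text or bytes value would map to."""
    if isinstance(value, TEXT_TYPE):
        return 'string_value'
    # On Python 2, `bytes` is an alias for `str`, so a plain 'Timmy'
    # literal lands here; on Python 3 only b'...' values do.
    if isinstance(value, bytes):
        return 'blob_value'
    raise TypeError('expected text or bytes, got %r' % type(value))


print(value_kind(u'Timmy'))
print(value_kind(b'Timmy'))
```

The check itself is identical on both interpreters; what differs is which type an unprefixed 'Timmy' literal has.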
OK, but doesn't this mean that I can't query for these results? Blobs aren't indexed are they?
It feels like if someone wants to store a blob, they should have to do something extra.... And if they store a str (which yes, happens to be == bytes), we try to store it as a string_value...
In other words, saving {'name': 'Timmy'} to datastore as a Blob value seems completely backwards. If I wanted that as a blob, I should have to do something like {'name_blob': datastore.Blob('Timmy')}...
Right -- after conversation w @dhermes it's clear that I am wrong!
Here's the deal:
- In Python 2, str == bytes.
- Bytes are stored as blob_value (which might have been better named bytes_value).
- blob_values are indexed, which means you can query as you always did without issue.
Long story short: use u'Timmy' if you're clear on what the encoding should be (in this case, string_value stores UTF-8 values). 'Timmy' doesn't tell Datastore what the string encoding is, so Datastore just treats it as some bytes.
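If existing code already builds property dicts with plain byte strings, one way to follow this advice is to decode the fields you know are text before saving. The sketch below assumes UTF-8 and a caller-supplied list of text field names; `decode_text_fields` is a hypothetical helper, not part of the library.

```python
def decode_text_fields(props, text_fields, encoding='utf-8'):
    """Return a copy of props with the named bytes fields decoded to text.

    Fields not listed in text_fields are left untouched, so real binary
    data is still passed through as bytes.
    """
    result = dict(props)
    for name in text_fields:
        value = result.get(name)
        if isinstance(value, bytes):
            result[name] = value.decode(encoding)
    return result


props = {'name': b'Timmy', 'avatar': b'\x89PNG'}
cleaned = decode_text_fields(props, ['name'])
print(cleaned)
```

Being explicit about which fields are text avoids guessing an encoding for data that really is binary.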