In GAE standard, ndb has the concept of the urlsafe string version of a Key.
e.g. url_string = sandy_key.urlsafe() produces a result like agVoZWxsb3IPCxIHQWNjb3VudBiZiwIM.
However, I cannot find a way to use this format with the google.cloud.datastore library for code running outside GAE standard. Please add a method to the Key class to create urlsafe strings and a method to generate Keys from urlsafe strings. Alternatively, provide some documentation on how to decode/encode them with the existing library.
NB, the console support page says "The encoding method is available in any Cloud Datastore client library."
Hi @ndenny,
Thanks for reporting. We will look into this.
For reference, urlsafe just takes the key, turns it into raw bytes as a serialized protobuf, then urlsafe-base64 encodes those raw bytes. However, the protobuf used by App Engine is different than the one used by the Cloud Datastore API (I think).
@jonparrott Can you weigh in?
If you'd like a "band-aid" for now:
>>> import base64
>>> from google.cloud import datastore
>>>
>>> client = datastore.Client()
>>> key = client.key('Foo', 'bar')
>>> key_pb_bytes = key.to_protobuf().SerializeToString()
>>> urlsafe = base64.urlsafe_b64encode(key_pb_bytes).rstrip(b'=')
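Going the other direction, here is a rough sketch for turning such a string back into a Key. key_from_urlsafe is a hypothetical helper (not a library method), and it only reverses strings produced by the snippet above; it will NOT parse ndb's urlsafe() output:
import base64

from google.cloud import datastore
from google.cloud.datastore import helpers


def key_from_urlsafe(client, urlsafe):
    """Hypothetical helper: reverse of the band-aid encoding above."""
    if isinstance(urlsafe, str):
        urlsafe = urlsafe.encode('ascii')
    # Restore any '=' padding that was stripped by the encoder.
    raw_bytes = base64.urlsafe_b64decode(urlsafe + b'=' * (-len(urlsafe) % 4))
    # Borrow the generated protobuf class from an existing key so we don't
    # have to hard-code the _pb2 module path, which has moved between releases.
    key_pb = type(client.key('Dummy', 1).to_protobuf())()
    key_pb.ParseFromString(raw_bytes)
    return helpers.key_from_protobuf(key_pb)


# Continuing the session above:
#   key_from_urlsafe(client, urlsafe)  ->  equivalent of client.key('Foo', 'bar')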
There are two separate requests here: Key.urlsafe() and Key.from_urlsafe() that operate with our current proto format.
Go to the source and you will find this:
def _DecodeUrlSafe(urlsafe):
  """Decode a url-safe base64-encoded string.

  This returns the decoded string.
  """
  if not isinstance(urlsafe, basestring):
    raise TypeError('urlsafe must be a string; received %r' % urlsafe)
  if isinstance(urlsafe, unicode):
    urlsafe = urlsafe.encode('utf8')
  mod = len(urlsafe) % 4
  if mod:
    urlsafe += '=' * (4 - mod)
  # This is 3-4x faster than urlsafe_b64decode()
  return base64.b64decode(urlsafe.replace('-', '+').replace('_', '/'))
I just copied and pasted this snippet in my code and it works now. Hope this helps.
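For the reverse direction, ndb's Key.urlsafe() does roughly the following with the serialized Reference bytes (a sketch of the idea, not a verbatim copy of the ndb source; the _EncodeUrlSafe name is just a mirror of the snippet above):
import base64


def _EncodeUrlSafe(raw_bytes):
  # Standard base64, made URL-safe by swapping the two special characters,
  # with the trailing '=' padding stripped (_DecodeUrlSafe above adds it back).
  urlsafe = base64.b64encode(raw_bytes)
  return urlsafe.rstrip('=').replace('+', '-').replace('/', '_')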
@michaelenglo Thanks for providing the snippet; however, the issue is that the raw bytes are different for an entity in App Engine and one using the Google Cloud Datastore API.
@dhermes That is true; in fact, when I try to print it to the console, it gives me non-textual characters. I prematurely said it worked because, for the first time, I didn't get an exception when running it, lol (I've been stuck on this for 2 hours now).
By any chance, have we found any solution to this already?
@michaelenglo Feel free to provide some code. The issue though is that the two different APIs (the native RPC API in App Engine and the external Cloud Datastore API) use different protobuf definitions for entities, keys, etc.
I'm pretty sure the GAE protos are in google.appengine.datastore.entity_pb (I'm verifying right now) and the Cloud Datastore protos are here.
Here is a confirmation of the difference
>>> import base64
>>> from google.appengine.datastore import entity_pb
>>> from google.appengine.ext import ndb
>>>
>>> key = ndb.Key('MyModel', 5910974510923776)
>>> urlsafe = key.urlsafe()
>>> urlsafe
'ahNkZXZ-c3R1ZHlidWRkeXgtaHJkchQLEgdNeU1vZGVsGICAgICAgMAKDA'
>>> urlsafe += '==' # Needs padding
>>> raw_bytes = base64.urlsafe_b64decode(urlsafe)
>>> raw_bytes
'j\x13dev~studybuddyx-hrdr\x14\x0b\x12\x07MyModel\x18\x80\x80\x80\x80\x80\x80\xc0\n\x0c'
>>>
>>> ref = entity_pb.Reference(raw_bytes)
>>> print(ref)
app: "dev~studybuddyx-hrd"
path <
Element {
type: "MyModel"
id: 0x15000000000000
}
>
vs.
>>> from google.cloud import datastore
>>> client = datastore.Client(project='studybuddyx-hrd')
>>>
>>> key = client.key('MyModel', 5910974510923776)
>>>
>>> key_pb = key.to_protobuf()
>>> key_pb
partition_id {
project_id: "studybuddyx-hrd"
}
path {
kind: "MyModel"
id: 5910974510923776
}
>>>
>>> key_pb_bytes = key_pb.SerializeToString()
>>> key_pb_bytes
'\n\x11\x12\x0fstudybuddyx-hrd\x12\x12\n\x07MyModel\x10\x80\x80\x80\x80\x80\x80\xc0\n'
>>> print(ref)
app: "dev~studybuddyx-hrd"
path <
Element {
type: "MyModel"
id: 0x15000000000000
}
>
What kind of data format is this? Is this something that has a publicly available parser?
My only goal here is just to extract the ID from the key. And by the way, I don't understand why the data format has to be different between the App Engine Standard Environment API and the external Google Cloud client library.
What kind of data format is this?
This is just a pretty-printed representation of an entity_pb.Reference object, which is a Python wrapper for a protobuf.
Is this something that has a publicly available parser?
The ref instance is already parsed. I recommend firing up the dev_appserver and then using the Interactive Console to play around.
My only goal here is just to extract the ID from the key.
The rules for deserializing a protobuf are public (i.e. not a Google secret), though I'm not sure what they are.
And by the way, I don't understand why the data format has to be different between the App Engine Standard Environment API and the external Google Cloud client library.
I can't help with that. Our library is just a wrapper for the existing cloud APIs. As a historical note, the Datastore within App Engine has existed since 2008, but the Cloud Datastore API came along in 2012 (and the current revision came along in 2015 or 2016).
Okay, so I get that. Now the question is how to actually convert the raw binary data after it has been decoded.
When I call _DecodeUrlSafe(urlsafe) with urlsafe="agxzfm9lLXNlcnZpY2VyGwsSDkRyYXdpbmdWZXJzaW9uGICAgID4woQKDA" and print the returned value, I get this:
js~oe-servicerDrawingVersionÇÇÇǰ┬ä
How to convert this value into a protobuf? Is there an API call for this in the Client Library?
If this is possible, then I can call google.cloud.datastore.helpers.entity_from_protobuf(pb) to get the entity.
You'll want to print the repr of the value. That already is a protobuf (as raw bytes). See my snippet above:
from google.appengine.datastore import entity_pb
ref = entity_pb.Reference(raw_bytes)
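Putting the two pieces together (a sketch; the entity_pb import only works where the App Engine SDK is available, e.g. in the dev_appserver's Interactive Console):
import base64

from google.appengine.datastore import entity_pb

urlsafe = 'agxzfm9lLXNlcnZpY2VyGwsSDkRyYXdpbmdWZXJzaW9uGICAgID4woQKDA'
# Restore the stripped '=' padding before decoding.
raw_bytes = base64.urlsafe_b64decode(urlsafe + '=' * (-len(urlsafe) % 4))
# The proto1-style wrapper parses the raw bytes in its constructor.
ref = entity_pb.Reference(raw_bytes)
print(repr(raw_bytes))  # the raw serialized protobuf
print(ref)              # the pretty-printed app, kind ("type") and id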
So I did some more digging and it turns out the Reference protobuf in App Engine is proto1 (i.e. first generation of protobuf), so the modern proto2/proto3 tooling can't convert it. I got very close to converting using standard tools, but some small differences in the protocols prevent it.
You can definitely include the ProtocolBuffer module in your application (in any location):
google-cloud-sdk/platform/google_appengine/google/net/proto/ProtocolBuffer.py
and then just define your own local Reference and Path classes from
google-cloud-sdk/platform/google_appengine/google/appengine/datastore/entity_pb.py
I.e. you can use these files outside of App Engine. I have done so in the gist I linked to and it works just fine.
Scratch that, I just had the wrong proto2 definition. Now you can convert with onestore_v3_pb2.py
See the README in that gist for an example.
I am going to send a PR ASAP.
It works now! I imported onestore_v3_pb2.py and used that to construct a Reference from the binary string. I get the id by calling reference.path.Element.id, which returns the id BUT still of object type Property.
document_key_decoded = self.DecodeUrlSafe(urlsafe)
document_reference = onestore_v3_pb2.Reference()
document_reference.ParseFromString(document_key_decoded)
So, I just cast the id to string, put it on the key constructor argument and voila!
document_key = datastore_client.key('DrawingVersion', str(document_reference.path.Element.id))
document_ds_entity = datastore_client.get(document_key)
In this case, I know my kind already so I don't need to get the kind value, but in case if you do need to get it from the Reference, just be aware that kind in protobuf1 is referred to as type. You have to use reference.path.Element.type to get the kind value.
Thanks for the help @dhermes ! At least now it's working for my application. Your help is greatly appreciated!
Sure thing, and due to you and @ndenny this will likely become a feature (see #3491)
Hey guys,
There was a small but fatal mistake in my solution. document_reference.path.Element.id DOES NOT return the id. In fact, it does not return anything meaningful (when I print the value, I get <property object at 0x00000000057B6E08>). If you want to access the id or the type, you must use document_reference.path.element[0].id or document_reference.path.element[0].type. I traced through the attributes using dir() and can finally extract the proper id and kind from the Reference (the confirmation is that I get the id as a long, instead of a string).
Sorry for the mistake! I hope I have not misled too many people.
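To pull the corrected pieces of this thread into one place, here is a sketch. onestore_v3_pb2 is the proto2 module from the gist linked above, the key_from_legacy_urlsafe name is just illustrative, and the id-or-name handling plus the single-element-path assumption (no ancestors) are my own additions:
import base64

from google.cloud import datastore

import onestore_v3_pb2  # proto2 definitions from the gist linked above


def key_from_legacy_urlsafe(client, urlsafe):
    # Restore the '=' padding that ndb strips before decoding.
    padded = urlsafe + '=' * (-len(urlsafe) % 4)
    reference = onestore_v3_pb2.Reference()
    reference.ParseFromString(base64.urlsafe_b64decode(padded))

    # Assumes a single-element path (i.e. the key has no ancestors).
    element = reference.path.element[0]
    kind = element.type                      # "kind" is called "type" in the old proto
    id_or_name = element.id or element.name  # integer id, or string name
    return client.key(kind, id_or_name)


client = datastore.Client()
key = key_from_legacy_urlsafe(
    client, 'agxzfm9lLXNlcnZpY2VyGwsSDkRyYXdpbmdWZXJzaW9uGICAgID4woQKDA')
entity = client.get(key)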
No worries, thanks for clarifying. I recommend checking out the fix merged in with #3491 for "full-featured" support.
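For anyone who finds this thread later: once you are on a release that includes the fix, the round trip looks roughly like this (a sketch; check the datastore docs for the exact surface):
from google.cloud import datastore

client = datastore.Client()

key = client.key('DrawingVersion', 5629499534213120)
urlsafe = key.to_legacy_urlsafe()  # ndb-compatible, urlsafe-base64-encoded bytes

round_tripped = datastore.Key.from_legacy_urlsafe(urlsafe)
assert round_tripped.kind == 'DrawingVersion'
assert round_tripped.id == 5629499534213120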
I tried running pip install --upgrade google-cloud and testing the method, but it says 'Key' object has no attribute 'to_legacy_urlsafe'. Is the PyPI version not always necessarily the same as the GitHub version? Or should I do pip install --upgrade google-cloud-datastore instead? What's the difference between pip install --upgrade google-cloud-datastore and just pip install --upgrade google-cloud?
@michaelenglo
Prefer pip install --upgrade google-cloud-datastore over the "umbrella" package google-cloud. If you want to install from source, you can do:
$ git clone https://github.com/GoogleCloudPlatform/google-cloud-python
$ pip install google-cloud-python/datastore/
@dhermes I might be wrong, but it seems that your to_legacy_urlsafe method returns a slightly different urlsafe string from the usual urlsafe that Datastore generates. It should not have any "=", should it? I tried decoding a urlsafe string, generating a key, and reversing it back using the method. The returned value is:
agxzfm9lLXNlcnZpY2VyGwsSDkRyYXdpbmdWZXJzaW9uGICAgICAgIAKDA= (re-encoded)
from the former:
agxzfm9lLXNlcnZpY2VyGwsSDkRyYXdpbmdWZXJzaW9uGICAgICAgIAKDA (original)
Right now, while waiting for the changes to be released on pip, I created my own from/to urlsafe methods by copying the related code from your source. It works fine for now, but I will switch over once the library changes have been published to PyPI.
Another weird part about the code I copied from from_legacy_urlsafe is that when I generate the reference from the real Datastore (the same urlsafe as in my comment above), there is always an 's~' string in front of the app name:
app: "s~oe-service"
path {
Element {
type: "DrawingVersion"
id: 5629499534213120
}
}
I don't know if this means "service" (since in my case, the app is just a web service, not the main app) or anything else, but in order to get the real app name to work with the Key constructor, I have to strip the 's~' out:
def make_key_from_urlsafe(self, urlsafe):
    reference_bytes = self.DecodeUrlSafe(urlsafe)
    # Wrap the raw protobuf bytes with the Reference class
    document_reference = onestore_v3_pb2.Reference()
    document_reference.ParseFromString(reference_bytes)
    kind = document_reference.path.element[0].type
    id_ = document_reference.path.element[0].id
    # For some reason there is an "s~" prefix in front of the app name;
    # it needs to be removed for the project argument to be correct
    project = document_reference.app
    if project.startswith('s~'):
        project = project[len('s~'):]
    key = datastore.key.Key(kind, id_, project=project)
    return key
What is this 's~'?
The same thing happens with to_legacy_urlsafe: it does not generate the 's~', so when I compare its output with the urlsafe generated from the real Datastore, they are not the same.
s~ indicates the app (and its datastore) are located in the US.
We made a decision not to add the s~ in to_legacy_urlsafe, I'll let @jonparrott explain why (I can't entirely remember).
As for the extra = at the end of the output from to_legacy_urlsafe, @jonparrott @lukesneeringer should we strip it? I hadn't realized it was stripped in ndb but it is and it is added back.
As for the extra = at the end of the output from to_legacy_urlsafe, @jonparrott @lukesneeringer should we strip it? I hadn't realized it was stripped in ndb but it is and it is added back.
It doesn't matter. ndb's base64 decode can handle the absence or presence of the padding character.
Good point :grinning:
Just my two cents: for the sake of consistency with App Engine and Datastore, the padding characters should be stripped, imo. In my case, for example, both the main app and the web service app talk to the same Datastore and Cloud Storage. Since every Drawing entity always has exactly one GCS blob, both apps agree to name the GCS objects by their entity's urlsafe key (for uniqueness). But if the main app's (App Engine standard) ndb.Key.urlsafe() returns a different urlsafe string than the client's Key.to_legacy_urlsafe() (as in, one has padding and the other does not), the two apps will name/retrieve objects differently.
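Until that is settled, one minimal workaround is to have both apps normalize the padding before comparing or naming blobs; normalize_urlsafe below is just a hypothetical helper:
def normalize_urlsafe(urlsafe):
    # Strip the trailing '=' padding so that ndb.Key.urlsafe() output and
    # Key.to_legacy_urlsafe() output compare equal either way.
    if isinstance(urlsafe, bytes):
        return urlsafe.rstrip(b'=')
    return urlsafe.rstrip('=')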
I think that's a good point @michaelenglo. Would you like to send a PR? (You should get some credit since you helped @ndenny push this feature into existence.)
@dhermes Sure! I will notify you when I am done. I am just adding .strip("=") after the returned base64.urlsafe_b64encode(raw_bytes). Also, have you tested from_legacy_urlsafe? Did it work? The other day, I tried using base64.urlsafe_b64decode(urlsafe) in the Python interactive shell to decode a urlsafe string I copied straight from Datastore, but it raised TypeError because of the incorrect padding (that is, I think, the reason the padding is added back to the urlsafe string in the ndb version).
Yes, both from_legacy_urlsafe and to_legacy_urlsafe should be updated with snippets similar to the ones I linked to above.
Hi @dhermes, I have tried to make some changes to from_legacy_urlsafe and to_legacy_urlsafe. I also tried to follow the instructions in the Contribute Docs, but I am stuck when running setup.py. When I run python setup.py install, this is what I get:
No local packages or working download links found for google-cloud-vision<0.26dev,>=0.25.0
error: Could not find suitable distribution for Requirement.parse('google-cloud-vision<0.26dev,>=0.25.0')
Because of this, I cannot set up the environment to run the tox tests (and I am not familiar with tox myself). Can you guide me on the necessary steps to solve this?
Sorry for not mentioning this earlier, but the CONTRIBUTING doc is woefully out of date. You should install nox into your environment:
[sudo] pip install --upgrade nox-automation
and then just run
nox -s unit_tests cover
from the datastore/ directory (I don't recommend running setup.py install; installation and environment management will be handled by the nox tool).
Hi @dhermes. I am very sorry that I have not been able to send my PR for the last two weeks, as I have been a little bit busy with my project at work... But I promise to send a PR by tomorrow! Now, I have some questions regarding testing. I am having trouble setting up the Python 3.4 interpreter environment, so I cannot fully complete the coverage test. Here is the error message:
File "c:\python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Python27\Scripts\nox.exe\__main__.py", line 9, in <module>
File "c:\python27\lib\site-packages\nox\main.py", line 289, in main
success = run(global_config)
File "c:\python27\lib\site-packages\nox\main.py", line 227, in run
result = session.execute()
File "c:\python27\lib\site-packages\nox\sessions.py", line 275, in execute
self._create_venv()
File "c:\python27\lib\site-packages\nox\sessions.py", line 239, in _create_venv
self._should_install_deps = self.venv.create()
File "c:\python27\lib\site-packages\nox\virtualenv.py", line 129, in create
cmd.extend(['-p', self._resolved_interpreter])
File "c:\python27\lib\site-packages\nox\virtualenv.py", line 108, in _resolved_interpreter
self.interpreter,
RuntimeError: Unable to locate Python interpreter "python3.4".
I attempted installing Python 3.6 and adding it to the PATH, but that does not solve the problem. How do I set up the test environment for all the Python versions?
Also, my unit tests under Python 2.7 only reached 97% coverage. I want to improve that to 100%, but I don't know where to find references on how to do so.
Can you please guide me?
You can ignore the tests for interpreters you don't have; 3.6 and 2.7 will be sufficient. To target a specific interpreter, you can just run
nox -s "unit_tests(python_version=3.6)"
If you'd like to set up "alternate" Python versions, I recommend looking at pyenv
As for the code coverage, just send the PR and we can discuss there. (We have a 97% target for "unit_tests", but "cover" enforces 100%, so you may just be misreading the output.)