Google-cloud-python: Provide a method to convert ndb Key's urlsafe format to/from datastore.Key

Created on 12 Apr 2017 · 35 comments · Source: googleapis/google-cloud-python

In GAE standard, ndb has the concept of the urlsafe string version of a Key.
e.g. url_string = sandy_key.urlsafe() produces a result like agVoZWxsb3IPCxIHQWNjb3VudBiZiwIM.

However, I cannot find a way to use this format with the google.cloud.datastore library for code running outside GAE standard. Please add a method to the Key class to create urlsafe strings and a method to generate Keys from urlsafe strings. Alternatively, provide some documentation on how to decode/encode them with the existing library.

NB, the console support page says "The encoding method is available in any Cloud Datastore client library."

feature request datastore p2


All 35 comments

Hi @ndenny,
Thanks for reporting. We will look into this.

For reference, urlsafe just takes the key, turns it into raw bytes as a serialized protobuf, then urlsafe-base64 encodes those raw bytes. However, the protobuf used by App Engine is different than the one used by the Cloud Datastore API (I think).

@jonparrott Can you weigh in?


If you'd like a "band-aid" for now:

>>> import base64
>>> from google.cloud import datastore
>>>
>>> client = datastore.Client()
>>> key = client.key('Foo', 'bar')
>>> key_pb_bytes = key.to_protobuf().SerializeToString()
>>> urlsafe = base64.urlsafe_b64encode(key_pb_bytes).rstrip(b'=')
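And a rough sketch of the reverse direction for this Cloud Datastore proto format (not the App Engine legacy format discussed below). This is not an official API, and the protobuf import path below is the one used around the time of this thread; it may differ in newer releases:

import base64

from google.cloud.datastore import helpers
from google.cloud.proto.datastore.v1 import entity_pb2


def key_from_urlsafe(urlsafe):
    """Rough inverse of the band-aid above: urlsafe string -> datastore.Key."""
    if not isinstance(urlsafe, bytes):
        urlsafe = urlsafe.encode('ascii')
    # Restore the padding stripped during encoding.
    urlsafe += b'=' * (-len(urlsafe) % 4)
    key_pb = entity_pb2.Key()
    key_pb.ParseFromString(base64.urlsafe_b64decode(urlsafe))
    return helpers.key_from_protobuf(key_pb)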

There are two separate requests here:

  1. We need to add Key.urlsafe() and Key.from_urlsafe() methods that operate on our current proto format.
  2. We should make sure the first method accepts App Engine standard's protobuf format as well.

Go to the source and you will find this:

def _DecodeUrlSafe(urlsafe):
  """Decode a url-safe base64-encoded string.
  This returns the decoded string.
  """
  if not isinstance(urlsafe, basestring):
    raise TypeError('urlsafe must be a string; received %r' % urlsafe)
  if isinstance(urlsafe, unicode):
    urlsafe = urlsafe.encode('utf8')
  mod = len(urlsafe) % 4
  if mod:
    urlsafe += '=' * (4 - mod)
  #This is 3-4x faster than urlsafe_b64decode()
  return base64.b64decode(urlsafe.replace('-', '+').replace('_', '/'))

I just copied and pasted this snippet in my code and it works now. Hope this helps.
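As an aside, a rough Python 3 equivalent of that helper (basestring and unicode no longer exist there) might look like this:

import base64


def decode_urlsafe(urlsafe):
    """Python 3 sketch of the ndb helper above: urlsafe base64 -> raw bytes."""
    if isinstance(urlsafe, str):
        urlsafe = urlsafe.encode('utf-8')
    # Re-add the '=' padding that ndb strips from its urlsafe output.
    urlsafe += b'=' * (-len(urlsafe) % 4)
    return base64.urlsafe_b64decode(urlsafe)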

@michaelenglo Thanks for providing the snippet; however, the issue is that the raw bytes are different for an entity in App Engine and one using the Google Cloud Datastore API.

@dhermes That is true. In fact, when I try to print it to the console, it gives me non-textual characters. I prematurely said it worked because it was the first time I didn't get an exception when running it, lol (been stuck on this for 2 hours now).

By any chance, have we found any solution to this already?

@michaelenglo Feel free to provide some code. The issue though is that the two different APIs (the native RPC API in App Engine and the external Cloud Datastore API) use different protobuf definitions for entities, keys, etc.

I'm pretty sure the GAE protos are in google.appengine.datastore.entity_pb (I'm verifying right now) and the Cloud Datastore protos are here.

Here is a confirmation of the difference:

>>> import base64
>>> from google.appengine.datastore import entity_pb
>>> from google.appengine.ext import ndb
>>>
>>> key = ndb.Key('MyModel', 5910974510923776)
>>> urlsafe = key.urlsafe()
>>> urlsafe
'ahNkZXZ-c3R1ZHlidWRkeXgtaHJkchQLEgdNeU1vZGVsGICAgICAgMAKDA'
>>> urlsafe += '=='  # Needs padding
>>> raw_bytes = base64.urlsafe_b64decode(urlsafe)
>>> raw_bytes
'j\x13dev~studybuddyx-hrdr\x14\x0b\x12\x07MyModel\x18\x80\x80\x80\x80\x80\x80\xc0\n\x0c'
>>>
>>> ref = entity_pb.Reference(raw_bytes)
>>> print(ref)
app: "dev~studybuddyx-hrd"
path <
  Element {
    type: "MyModel"
    id: 0x15000000000000
  }
>

vs.

>>> from google.cloud import datastore
>>> client = datastore.Client(project='studybuddyx-hrd')
>>>
>>> key = client.key('MyModel', 5910974510923776)
>>>
>>> key_pb = key.to_protobuf()
>>> key_pb
partition_id {
  project_id: "studybuddyx-hrd"
}
path {
  kind: "MyModel"
  id: 5910974510923776
}
>>>
>>> key_pb_bytes = key_pb.SerializeToString()
>>> key_pb_bytes
'\n\x11\x12\x0fstudybuddyx-hrd\x12\x12\n\x07MyModel\x10\x80\x80\x80\x80\x80\x80\xc0\n'
>>> print(ref)
app: "dev~studybuddyx-hrd"
path <
  Element {
    type: "MyModel"
    id: 0x15000000000000
  }
>

What kind of data format is this? Is this something that has a publicly available parser?

My only goal here is just to extract the ID from the key. And by the way, I don't understand why the data format must be different between the AE Standard Env. API and the external GC client library.

What kind of data format is this?

This is just a pretty-printed representation of an entity_pb.Reference object, which is a Python wrapper for a protobuf.

Is this something that has a publicly available parser?

The ref instance is already parsed. I recommend firing up the dev_appserver and then using the Interactive Console to play around.

My only goal here is just to extract the ID from the key.

The rules for deserializing a protobuf are public (i.e. not a Google secret), though I'm not sure what they are.

And by the way, I don't understand why the data format must be different between the AE Standard Env. API and the external GC client library.

I can't help with that. Our library is just a wrapper for the existing cloud APIs. As a historical note, the Datastore within App Engine has existed since 2008, but the Cloud Datastore API came along in 2012 (and the current revision came along in 2015 or 2016).

Okay, so I get that. Now the question will be how to actually convert the raw binary data after it has been decoded?

When I called _DecodeUrlSafe(urlsafe) with urlsafe="agxzfm9lLXNlcnZpY2VyGwsSDkRyYXdpbmdWZXJzaW9uGICAgID4woQKDA" and printed the returned value, I got this:

j s~oe-servicer DrawingVersionÇÇÇǰ┬ä

How do I convert this value into a protobuf? Is there an API call for this in the client library?

If this is possible, then I can call google.cloud.datastore.helpers.entity_from_protobuf(pb) to get the entity.

You'll want to print the repr of the value. That already is a protobuf (as raw bytes). See my snippet above:

from google.appengine.datastore import entity_pb
ref = entity_pb.Reference(raw_bytes)

So I did some more digging and it turns out the Reference protobuf in App Engine is proto1 (i.e. first generation of protobuf), so the modern proto2/proto3 tooling can't convert it. I got very close to converting using standard tools, but some small differences in the protocols prevent it.

You can definitely include the ProtocolBuffer module in your application (in any location):

google-cloud-sdk/platform/google_appengine/google/net/proto/ProtocolBuffer.py

and then just define your own local Reference and Path classes from

google-cloud-sdk/platform/google_appengine/google/appengine/datastore/entity_pb.py

I.e. you can use these files outside of App Engine. I have done so in the gist I linked to and it works just fine.

Scratch that, I just had the wrong proto2 definition. Now you can convert with onestore_v3_pb2.py

See the README in that gist for an example.

I am going to send a PR ASAP.

It works now! I imported onestore_v3_pb2.py and used that to construct a Reference from the binary string. I get the id by calling reference.path.Element.id, which returns the id BUT still of object type Property.

document_key_decoded = self.DecodeUrlSafe(urlsafe)
document_reference = onestore_v3_pb2.Reference()
document_reference.ParseFromString(document_key_decoded)

So, I just cast the id to string, put it on the key constructor argument and voila!

document_key = datastore_client.key('DrawingVersion', str(document_reference.path.Element.id))
document_ds_entity = datastore_client.get(document_key)

In this case, I know my kind already, so I don't need to get the kind value, but in case you do need to get it from the Reference, just be aware that kind in protobuf1 is referred to as type. You have to use reference.path.Element.type to get the kind value.

Thanks for the help @dhermes ! At least now it's working for my application. Your help is greatly appreciated!

Sure thing, and due to you and @ndenny this will likely become a feature (see #3491)

Hey guys,

There was a small but fatal mistake in my solution. document_reference.path.Element.id DOES NOT return the id. In fact, it does not return anything meaningful (when I print the value, I get <property object at 0x00000000057B6E08>). If you want to access the id or the type, you must use document_reference.path.element[0].id or document_reference.path.element[0].type. I traced through the object using dir() and finally managed to extract the proper id and kind from the Reference (the confirmation being that I get the id as a long, instead of a string).

Sorry for the mistake! I hope I have not misled too many people.
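For reference, here is a corrected sketch of the extraction. It assumes the onestore_v3_pb2 module from the gist linked above, the _DecodeUrlSafe helper pasted earlier in this thread, and an already-constructed datastore_client:

# Hypothetical illustration, not an official API.
reference = onestore_v3_pb2.Reference()
reference.ParseFromString(_DecodeUrlSafe(urlsafe))

# path.element is a repeated field, so index it; going through path.Element
# only reaches the class-level property object.
kind = reference.path.element[0].type  # "kind" is called "type" in the old proto
id_ = reference.path.element[0].id     # an integer id, no str() cast needed

# This assumes a key with a single path element (no ancestors).
key = datastore_client.key(kind, id_)
entity = datastore_client.get(key)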

No worries, thanks for clarifying. I recommend checking out the fix merged in with #3491 for "full-featured" support.
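Once a release including #3491 is out, usage should look roughly like this (a sketch based on the method names discussed in this thread, not final documentation):

from google.cloud import datastore

client = datastore.Client()
key = client.key('DrawingVersion', 5629499534213120)

# Encode to the App Engine / ndb-compatible ("legacy") urlsafe format.
urlsafe = key.to_legacy_urlsafe()

# Decode an ndb-style urlsafe string back into a datastore.Key.
key_again = datastore.Key.from_legacy_urlsafe(urlsafe)
print(key_again.kind, key_again.id)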

I tried running pip install --upgrade google-cloud and testing the method, but it says 'Key' object has no attribute 'to_legacy_urlsafe'. Is the PyPI version not necessarily the same as the GitHub version? Or should I do pip install --upgrade google-cloud-datastore instead? What's the difference between pip install --upgrade google-cloud-datastore and just pip install --upgrade google-cloud?

@michaelenglo

  • I haven't pushed a release yet, so that wouldn't work
  • I recommend using pip install --upgrade google-cloud-datastore over the "umbrella" package google-cloud

If you want to install from source, you can do:

$ git clone https://github.com/GoogleCloudPlatform/google-cloud-python
$ pip install google-cloud-python/datastore/

@dhermes I might be wrong, but it seems that your to_legacy_urlsafe method returns a slightly different urlsafe string from the one Datastore usually generates. It should not have any "=", should it? I tried decoding a urlsafe string, generating a key, and reversing it back using the method. The returned value is:

agxzfm9lLXNlcnZpY2VyGwsSDkRyYXdpbmdWZXJzaW9uGICAgICAgIAKDA= (re-encoded)
from the former:
agxzfm9lLXNlcnZpY2VyGwsSDkRyYXdpbmdWZXJzaW9uGICAgICAgIAKDA (original)

Right now, while waiting for the changes to be released on pip, I created my own from/to urlsafe methods by copying the related code from your source. It works fine for now, but I will switch over once the library changes have been pushed to PyPI.

Another weird part about the code I copied from from_legacy_urlsafe: when I generate the reference from the real datastore (the same urlsafe as in my comment above), there is always an 's~' string in front of the app name:

app: "s~oe-service"
path {
  Element {
    type: "DrawingVersion"
    id: 5629499534213120
  }
}

I don't know if this means "service" (since in my case, the app is just a web service, not the main app) or anything, but in order to get the real app name to work with the Key constructor, I have to strip the 's~' out:

    def make_key_from_urlsafe(self, urlsafe):
        reference_bytes = self.DecodeUrlSafe(urlsafe)
        # Wrap the raw protobuf bytes with the Reference class.
        document_reference = onestore_v3_pb2.Reference()
        document_reference.ParseFromString(reference_bytes)
        kind = document_reference.path.element[0].type
        id_ = document_reference.path.element[0].id
        # For some reason there is an "s~" in front of the project string.
        # It needs to be removed for the project argument to be correct.
        # (Remove it as a prefix; strip('s~') would also eat leading/trailing
        # 's' and '~' characters from the project name itself.)
        project = document_reference.app
        if project.startswith('s~'):
            project = project[len('s~'):]
        key = datastore.key.Key(kind, id_, project=project)
        return key

What is this 's~'?

The same thing happens with to_legacy_urlsafe: it does not generate the 's~', so when I compare its output with the urlsafe generated by the real datastore, they are not the same.

s~ indicates the app (and its datastore) are located in the US.

We made a decision not to add the s~ in to_legacy_urlsafe; I'll let @jonparrott explain why (I can't entirely remember).

As for the extra = at the end of the output from to_legacy_urlsafe, @jonparrott @lukesneeringer should we strip it? I hadn't realized it was stripped in ndb, but it is, and it is added back.

As for the extra = at the end of the output from to_legacy_urlsafe, @jonparrott @lukesneeringer should we strip it? I hadn't realized it was stripped in ndb, but it is, and it is added back.

It doesn't matter. ndb's base64 decode can handle the absence or presence of the padding character.

Good point :grinning:

Just my two cents: for the sake of consistency with App Engine and Datastore, the stripping of the padding characters should be added, imo. In my case, for example, both the main app and the web service app talk to the same Datastore and Cloud Storage. Since every Drawing entity always has one GCS blob, both apps agree to name the GCS objects by their key's urlsafe string (for uniqueness). But if the main app's (standard environment App Engine) ndb.Key.urlsafe() returns a different urlsafe string than the client's Key.to_legacy_urlsafe() (as in, one has padding and the other does not), the two apps will name and retrieve objects differently.
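Until the two libraries agree on padding, a hypothetical workaround is to normalize the string on both sides before using it as a GCS object name, for example:

def normalized_urlsafe(urlsafe):
    """Strip base64 padding so ndb and google-cloud-datastore output match."""
    if isinstance(urlsafe, bytes):
        urlsafe = urlsafe.decode('ascii')
    return urlsafe.rstrip('=')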

I think that's a good point @michaelenglo. Would you like to send a PR? (You should get some credit since you helped @ndenny push this feature into existence.)

@dhermes Sure! Will notify you when I am done. I am just adding .strip("=") after the returned base64.urlsafe_b64encode(raw_bytes). Also, have you tested from_legacy_urlsafe? Did it work? The other day, I tried using base64.urlsafe_b64decode(urlsafe) in Python interactive mode to decode a urlsafe string I copied straight from Datastore, but it raised a TypeError because of the incorrect padding (which is the reason why, I think, the padding is added back in the ndb version).

Yes, both from_legacy_urlsafe and to_legacy_urlsafe should be updated with snippets similar to the ones I linked to above.

Hi @dhermes, I have tried to make some changes to from_legacy_urlsafe and to_legacy_urlsafe. I also tried to follow the instructions in the Contribute Docs, but I got stuck when running setup.py. When I run python setup.py install, this is what I get:

No local packages or working download links found for google-cloud-vision<0.26dev,>=0.25.0
error: Could not find suitable distribution for Requirement.parse('google-cloud-vision<0.26dev,>=0.25.0')

Because of this, I cannot set up the environment to run the tox tests (and I am not familiar with tox myself). Can you guide me through the necessary steps to solve this?

Sorry for not mentioning it, but the CONTRIBUTING doc is woefully out of date. You should install nox into your environment:

[sudo] pip install --upgrade nox-automation

and then just run

nox -s unit_tests cover

from the datastore/ directory. (I don't recommend running setup.py install; installation and environment management will be handled by the nox tool.)

Hi @dhermes. I am very sorry that I have not been able to send my PR for the last two weeks, as I have been a little busy with my project at work... but I promise to send a PR by tomorrow! Now, I have some questions regarding testing. I am having trouble setting up the Python 3.4 interpreter environment, so I cannot fully complete the coverage test. Here is the error message:

File "c:\python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\Scripts\nox.exe\__main__.py", line 9, in <module>
  File "c:\python27\lib\site-packages\nox\main.py", line 289, in main
    success = run(global_config)
  File "c:\python27\lib\site-packages\nox\main.py", line 227, in run
    result = session.execute()
  File "c:\python27\lib\site-packages\nox\sessions.py", line 275, in execute
    self._create_venv()
  File "c:\python27\lib\site-packages\nox\sessions.py", line 239, in _create_venv
    self._should_install_deps = self.venv.create()
  File "c:\python27\lib\site-packages\nox\virtualenv.py", line 129, in create
    cmd.extend(['-p', self._resolved_interpreter])
  File "c:\python27\lib\site-packages\nox\virtualenv.py", line 108, in _resolved_interpreter
    self.interpreter,
RuntimeError: Unable to locate Python interpreter "python3.4".

I attempted installing Python 3.6 and adding it to the path, but it did not solve the problem. How do I set up the test environment for all the Python versions?

Also, my unit tests on Python 2.7 only reached 97% coverage. I want to improve it to 100%, but I don't know where to find references on how to do so.

Can you please guide?

You can ignore the tests for interpreters you don't have; 3.6 and 2.7 will be sufficient. To target a specific interpreter, you can just run

nox -s "unit_tests(python_version=3.6)"

If you'd like to set up "alternate" Python versions, I recommend looking at pyenv.

As for the code coverage, just send the PR and we can discuss there. (We have a 97% target for "unit_tests", but "cover" enforces 100%, so you may just be misreading the output.)

