Google-cloud-python: PubSub: AttributeError: 'Client' object has no attribute 'pull'

Created on 3 Nov 2017  ·  20 Comments  ·  Source: googleapis/google-cloud-python

  1. Specify the API at the beginning of the title (for example, "BigQuery: ...")
    General, Core, and Other are also allowed as types
    Pub/Sub

  2. OS type and version
    Windows 10

  3. Python version and virtual environment information python --version
    Python 2.7.13

  4. google-cloud-python version pip show google-cloud, pip show google-<service> or pip freeze

Name: google-cloud-pubsub
Version: 0.29.0
Summary: Python Client for Google Cloud Pub/Sub
Home-page: https://github.com/GoogleCloudPlatform/google-cloud-python
Author: Google Cloud Platform
Author-email: [email protected]
License: Apache 2.0
Location: c:\git\repo\venv\lib\site-packages
Requires: google-api-core, google-gax, psutil, grpc-google-iam-v1, google-cloud-core
  5. Stacktrace if available
AttributeError: 'Client' object has no attribute 'pull'
  6. Steps to reproduce
import mypubsub
mypubsub.create_topic("test-topic")
mypubsub.create_subscription("test-sub", "test-topic")
messages = mypubsub.pull_messages("test-sub")

Note: mypubsub is a helper library that wraps pubsub. Check code below

  7. Code example
import logging
import os

from google.cloud import pubsub


def pull_messages(subscription_name, max_messages=999, return_immediately=True):
    """Query messages from subscription."""
    try:
        if not subscription_name:
            return None

        client = pubsub.SubscriberClient()
        subscription_path = client.subscription_path(
            os.getenv('GOOGLE_CLOUD_PROJECT'), subscription_name)

        # On 0.29.0 this attribute lookup raises the AttributeError reported
        # above, which the handler below catches and logs.
        return client.pull(subscription_path, max_messages, return_immediately)
    except Exception:
        logging.exception("Google pub/sub pull messages")

    return None
question pubsub

All 20 comments

The subscriber API is now async / callback-based. Your code will need to adjust to the new API.
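
For readers unsure what "callback-based" means in practice, here is a minimal sketch. It assumes the pattern documented for the 0.28/0.29 releases, where subscribe() returns a subscription object and open() starts delivering messages to a callback. The names start_listening and handle_message are hypothetical, and later releases may take the callback differently, so check the documentation for the installed version.

```python
def handle_message(message):
    """Process one delivered message, then acknowledge it."""
    print('Received: {!r}'.format(message.data))
    message.ack()


def start_listening(subscriber, project, subscription_name,
                    callback=handle_message):
    """Begin background delivery of messages to ``callback``.

    ``subscriber`` is expected to be a pubsub_v1.SubscriberClient. It is
    passed in so the function can be exercised with a stand-in object.
    """
    subscription_path = subscriber.subscription_path(project, subscription_name)
    # Assumption: 0.28/0.29-style API, i.e. subscribe() followed by open().
    subscription = subscriber.subscribe(subscription_path)
    return subscription.open(callback)
```

A call would look like start_listening(pubsub_v1.SubscriberClient(), os.getenv('GOOGLE_CLOUD_PROJECT'), 'test-sub'), with the main thread kept alive while messages arrive in the background.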

@tseaver I am adjusting to the new API but since the documentation is not so nicely put together, I'm following the source code, https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/pubsub/google/cloud/gapic/pubsub/v1/subscriber_client.py and pull is still a defined method

A question and an answer.

First, the answer: While the underlying SubscriberClient object you found has a pull method, it is not present on the object that is exported as google.cloud.pubsub_v1.SubscriberClient. This is an intentional decision made at the behest of the Pub/Sub team, who believes that using the polling pull method is generally user error. If you really want to use pull, you can do so by means of a slightly longer import:

from google.cloud.pubsub_v1.gapic.subscriber_client import SubscriberClient
client = SubscriberClient()  # This will have a .pull() method.

Second, the question: I am sorry that the documentation was ineffective in teaching you how to use the library. How could I have put together the documentation better?

@lukesneeringer Re: pull, I have no issue switching to the async approach. I was just shooting in the dark while attempting to migrate my library from pubsub 0.27.x to 0.28+. Thank you for the workaround, I'll try not to use it.

Re: documentation. I'm not an expert on how to organize documentation, but as a consumer of it I find it hard to locate the right information. Since I don't have a specific solution to recommend right now, let me share some of my struggles; hopefully, combined with other developers' feedback, they will give you a better idea for a solution.

  1. There are lots of different locations for the same or similar documentation without a clear ability to know which version of the library it applies to. Here are some examples:

Related documentation: https://grpc.io/blog/pubsub

  2. Some of the code snippets and instructions are incomplete. Also, some code snippets won't even run because the variable names are wrong. Example:
topic = 'projects/{project_id}/topics/{topic}'.format(project_id=os.getenv('GOOGLE_CLOUD_PROJECT'), topic='MY_TOPIC_NAME')

Other places do it like this:

topic = subscriber.topic_path(project, topic_name)

While they're both valid approaches, that's not how library documentation should explain the supported types, formats and utility libraries.

  3. Some documentation leads to 404 pages, and there are a bunch of GitHub issues and threads about compilation errors or moved documentation.
    Example:
    Here's a 404 documentation link: https://googlecloudplatform.github.io/google-cloud-python/latest/pubsub/usage.html
    and it's referenced from the official GitHub repo: https://github.com/GoogleCloudPlatform/google-cloud-python/tree/master/pubsub

I hope this info helps

@lukesneeringer from google.cloud.pubsub_v1.gapic.subscriber_client import SubscriberClient doesn't exist on pubsub 0.29.0. It should be from google.cloud.gapic.pubsub.v1.subscriber_client import SubscriberClient

@lukesneeringer why was pull removed? We can see that the code is still there but specifically blacklisted: @_gapic.add_methods(subscriber_client.SubscriberClient, blacklist=('pull', 'streaming_pull'))

We have a lot of code that's based on synchronous pulling, which would be hard to convert.
This also means we're stuck on 0.27.0 and can't upgrade any other Google dependency because of the dependencies that all the libraries share.
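
To illustrate what a decorator like the one quoted above does, here is a simplified sketch of a class decorator that forwards public methods from a generated client to a wrapper class while skipping a blacklist. It is not the library's actual _gapic.add_methods implementation; GeneratedClient, Wrapper, and the body of add_methods are hypothetical stand-ins.

```python
import functools


def add_methods(source_class, blacklist=()):
    """Class decorator: expose source_class's public methods via ``self.api``."""

    def wrap(name):
        # Build a forwarding method that delegates to the wrapped instance.
        source_method = getattr(source_class, name)

        @functools.wraps(source_method)
        def forward(self, *args, **kwargs):
            return getattr(self.api, name)(*args, **kwargs)

        return forward

    def decorator(cls):
        for name in dir(source_class):
            if name.startswith('_') or name in blacklist:
                continue  # private, or deliberately hidden
            if callable(getattr(source_class, name)) and name not in cls.__dict__:
                setattr(cls, name, wrap(name))
        return cls

    return decorator


class GeneratedClient(object):
    """Hypothetical stand-in for the auto-generated client."""

    def pull(self, subscription):
        return 'pulled from ' + subscription

    def acknowledge(self, subscription, ack_ids):
        return len(ack_ids)


@add_methods(GeneratedClient, blacklist=('pull', 'streaming_pull'))
class Wrapper(object):
    def __init__(self):
        self.api = GeneratedClient()
```

With this arrangement Wrapper().pull raises AttributeError, the same symptom as in the report, while Wrapper().api.pull(...) still works because the underlying object is untouched.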

@lukesneeringer In 0.27, pull returned a batch of messages, which was nice because we then batch wrote into BigQuery and ack'ed the entire batch (batch still seems to exist in google-cloud-0.30 for publishing a batch of messages, but not for subscribing).
Will the pull available via from google.cloud.gapic.pubsub.v1.subscriber_client import SubscriberClient continue to be supported down the road? Are there any other ways to get a batch of messages with subscribe?

Also, is there a difference between google.cloud.pubsub and google.cloud.pubsub_v1?
This tutorial uses pubsub_v1: https://cloud.google.com/pubsub/docs/pull
While this one uses pubsub: https://google-cloud-python.readthedocs.io/en/latest/pubsub/subscriber/index.html

Thanks!
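
One way to approximate the batching described above on top of a callback-based subscriber is to accumulate messages inside the callback and flush them in groups. The following is a sketch under assumptions, not a library feature: BatchingCallback and flush are hypothetical names, and messages are assumed to expose data and ack() as in the subscriber callbacks.

```python
import threading


class BatchingCallback(object):
    """Collect messages and hand them to ``flush`` in groups of ``batch_size``."""

    def __init__(self, batch_size, flush):
        self._batch_size = batch_size
        self._flush = flush
        self._lock = threading.Lock()
        self._pending = []

    def __call__(self, message):
        # Callbacks may run on several threads, so guard the shared list.
        with self._lock:
            self._pending.append(message)
            if len(self._pending) < self._batch_size:
                return
            batch, self._pending = self._pending, []
        self._process(batch)

    def _process(self, batch):
        # Ack only after the whole batch was written. If flush raises,
        # nothing is acked, so the messages remain eligible for redelivery.
        self._flush([m.data for m in batch])
        for m in batch:
            m.ack()
```

An instance would be passed wherever the subscriber expects a callback, for example BatchingCallback(500, write_rows_to_bigquery), where write_rows_to_bigquery is a hypothetical function. A real version would also need a timer to flush a partial batch, and any flow-control limit on outstanding messages would have to be at least batch_size, otherwise the batch never fills.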

@lukesneeringer Hi - I'm late to the conversation, and I don't know if this is the right place to point this out, but the pull method is still being used in this demo code:
https://github.com/GoogleCloudPlatform/pubsub-media-processing/blob/dd145e007ed5d463d57709740f91e702836dd37d/worker.py#L66

I have a media processing task and am trying to use this sample as a starting point but, being new to the platform, am not quite sure what/how to fix it. If it makes more sense I can open an issue in that repo?

I'd also like to be able to have a synchronous pull method to get (ideally) one message at a time. I'll describe my use case, and maybe someone can suggest an alternative:

I'm setting up a system that does extensive post-processing on orders from an e-commerce system. The e-commerce system calls a web hook, which passes a message on to a pubsub topic. That pushes to another web hook which does the actual processing, since it involves a number of external API calls and data massaging. If that fails for some reason, I'd like to publish that original message to a "processing-failed" topic, so those orders can then be examined and possibly manually processed by an engineer on support rotation. However, I don't want to asynchronously pull messages, because I'm not automating something.

Maybe there is a different service that might suit this better? I suppose I could put failed order payloads in Cloud Datastore, but that's not currently described as "HIPAA-compliant" and that's a requirement for us (PubSub is compliant, however).
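
The "processing-failed" idea described above could be sketched roughly as follows. This assumes a publisher object with the publish(topic, data, **attributes) method of PublisherClient, which returns a future; process_order, process, and failed_topic_path are hypothetical names.

```python
def process_order(payload, process, publisher, failed_topic_path):
    """Run ``process`` on the payload; on failure, park it on a failure topic.

    Returns True when processing succeeded and False when the payload was
    republished for manual follow-up.
    """
    try:
        process(payload)
        return True
    except Exception as exc:
        # Keep the original payload and record the reason as an attribute.
        future = publisher.publish(failed_topic_path, payload, error=str(exc))
        future.result()  # block until the failure copy has been accepted
        return False
```

The engineer on rotation would still need some way to read the parked messages one at a time, which is exactly the synchronous pull this comment asks about.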

It definitely makes more sense to store those things in a "permanent" data store of some type rather than depend on them being in pub/sub for long periods of time.

@jonparrott The upside of PubSub is that I can set up alerting using a trigger and a Cloud Function, and once it's handled I can just ack the message to remove it. I don't expect they would be in pubsub for "long periods", but can you suggest a more "permanent" store that is HIPAA-compliant?

It seems Cloud SQL is covered: https://cloud.google.com/security/compliance/hipaa/#covered-products

I am going to close this out since the conversation has stalled.

As far as the feature request, "add back a synchronous pull() to the Subscriber client", that is a decision of the API team (i.e. not really with the maintainers of google-cloud-python / google-cloud-pubsub).

/cc @kir-titievsky

Late to the party here. Thanks for the input!
@chlela Thank you for the detailed writeup. I'll work through the recommendations you've made.
@ekampf I'd love to hear from you at [email protected] about what you are doing with Pub/Sub. Perhaps we could coordinate a conversation to see if we can make things easier for you with the current client API or help me make a case for extending the client API.
@gylu If your end goal is to write message data into Dataflow efficiently, I believe you can do this with the latest Apache Beam on Dataflow (streaming in -> batch out). Also thanks for pointing out the documentation issues.

Kir
Product Manager
Google Cloud Pub/Sub

@dhermes I don't understand your answer. The "pull" method is there and exposed by the API team, and the Python library explicitly hides it. Why?
(See code here: https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/pubsub/google/cloud/pubsub_v1/subscriber/client.py#L34)

@kir-titievsky It's not that we're doing anything special that requires pull and can't be done any other way. The problem is that we already have code that depends on pull, and the different Google python libraries are one big mess of dependency hell.
Since they all depend on google-core and some other common libraries, I can't upgrade or use new Google libraries without having to rewrite my code so that I could upgrade pubsub too (pubsub and a newer google-core do not play together).
Even if "pull" is going away eventually, it would have been nice to have some grace period.
When it was removed in 0.28 you basically said: you now can't upgrade or use any new Google library until you rewrite your code. That is not really a nice thing to do without some grace period to let us prepare for such a rewrite.
I'm also sure it's not what you guys intended to say...

@ekampf The pull method is explicitly hidden at the request of the backend team, i.e. we do it because they asked us to. See @lukesneeringer answer above:

This is an intentional decision made at the behest of the Pub/Sub team, who believes that using the polling pull method is generally user error

You're free to use subscriber.api.streaming_pull or subscriber.api.pull as is done by the internal callers.
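
A minimal sketch of that route, assuming subscriber.api is the generated client and that its pull and acknowledge methods take the positional arguments shown in the original report (subscription, max_messages, return_immediately) and (subscription, ack_ids). The helper names pull_batch and ack_batch are hypothetical.

```python
def pull_batch(subscriber, subscription_path, max_messages=10):
    """Synchronously pull up to ``max_messages``; return (payloads, ack_ids)."""
    # Assumption: subscriber.api is the generated client mentioned above.
    response = subscriber.api.pull(
        subscription_path, max_messages, return_immediately=True)
    received = list(response.received_messages)
    payloads = [item.message.data for item in received]
    ack_ids = [item.ack_id for item in received]
    return payloads, ack_ids


def ack_batch(subscriber, subscription_path, ack_ids):
    """Acknowledge a previously pulled batch; skip the call when it is empty."""
    if ack_ids:
        subscriber.api.acknowledge(subscription_path, ack_ids)
```

Keeping the ack separate from the pull lets the caller finish its own work, such as a batch write, before acknowledging anything.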


the different Google python libraries are one big mess of dependency hell.

This is somewhat true, but somewhat unfair. Feel free to file a specific issue that you're having and we're happy to help sort it out.

There are certainly conflicts with versions of google-cloud-core, but most of the changes between versions are new features added (i.e. not breaking changes). So at worst, you can "fool" pip to get the versions to work together. To try to fix this "update hell" we have frozen development on google-cloud-core and agreed on a stable google-api-core.

Also, I highly recommend against the umbrella-package / uber-package google-cloud. It will give you way too much.

@dhermes I didn't mean to be unfair. It's totally understandable that some libraries depend on common code in google-cloud-core and that development on both the core and the different APIs continues. That's not my complaint. APIs change, code evolves...

The problem is that all the 0.27 libraries use one core, and all the 0.28 libraries use a different core.
So once pull was removed in 0.28, we got stuck on 0.27 for all the other libraries too.
And it just happened one day, by surprise.
If, for example, 0.28 had been released with a deprecated pull and a warning "pull is going away in X months and you'll have to upgrade", it would have helped us plan the code migration better.
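
The kind of grace period described above is commonly implemented with Python's standard warnings module. A minimal sketch, using a hypothetical LegacySubscriber class rather than the real client:

```python
import warnings


class LegacySubscriber(object):
    """Hypothetical client used only to illustrate a deprecation period."""

    def pull(self, subscription, max_messages):
        warnings.warn(
            'pull() is deprecated and will be removed in a future release; '
            'use the callback-based subscribe() instead.',
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return self._pull(subscription, max_messages)

    def _pull(self, subscription, max_messages):
        # Placeholder for the real work; returns an empty batch here.
        return []
```

Existing callers keep working during the deprecation window, and running their test suite with python -W error::DeprecationWarning turns every remaining use into a failure they can track down.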

Totally agree with you. ISTM you can mitigate by doing the following:

  • Stop depending directly on the google-cloud package
  • Install each package individually. If you try all at once, pip might get angry. Though I just tried
    python -m pip install 'google-cloud-bigtable==0.28.1' 'google-cloud-datastore==1.0.0'
    where the first requires google-cloud-core==0.28.x and the second google-cloud-core==0.24.x
  • After all packages are installed, run python -m pip install --upgrade google-cloud-core to make sure you actually have the latest one (pip may continue to overwrite it as different packages depend on it)

We never use google-cloud. We only use independent libraries.
The problem is that these libraries usually collide on google-cloud-core or some other shared common libraries.
The problem isn't with pip install pairing but with google-cloud-core itself not always being backward compatible.
Specifically, pubsub 0.27 crashes on import if run with some newer google-cloud-core. I don't have the exact details as we already upgraded everything.

Anyway, it would be helpful if future breaking changes came with some deprecation period.

Thanks for the replies!

Good deal. We (hope) to be in a state of stability in google-cloud-core and google-api-core.
