Google-cloud-node: pubsub.subscribe fails if called twice with the same subscription name

Created on 22 Apr 2016  路  29Comments  路  Source: googleapis/google-cloud-node

The sample code for pubsub will fail if it's run twice.

The "subscribe" method:

topic.subscribe('new-subscription'...)

Will fail if "new-subscription" already exists. I see the following error:

_Resource already exists in the project (resource=new-subscription)._

question pubsub

Most helpful comment

Maybe a simply solution for now is to just add { reuseExisting: true } to the README so users don't get tripped up in their first 5 minutes

All 29 comments

All subscriptions must be given unique names. That line is saying: "Create a new subscription named new-subscription", so it is expected to fail the second time.

Do you think it would be more clear if we changed it to:

// Subscribe to the topic.
topic.subscribe('new-subscription-name', function(err, subscription) {

Sure, but presumably we don't want to be creating thousands/millions of subscriptions? (one per code invocation). What I _really_ want to do is get back to the same subscription, but it seems there are a bunch of hoops I need to jump through to do that.

Are we saying that the "recommended" approach is to first call getSubscriptions to see if the one you're looking for is there, and if it's not call subscribe?

Seems to me that it would be simpler for the developer if subscribe were a "get-or-create"

You can use reuseExisting (https://googlecloudplatform.github.io/gcloud-node/#/docs/v0.31.0/pubsub/topic?method=subscribe):

topic.subscribe('maybe-subscription-name', { reuseExisting: true }, function(err, subscription) {
  // subscription was "get-or-create"-ed
});

Another way to do it (this is a pattern throughout our whole API):

var subscription = topic.subscription('maybe-subscription-name');
subscription.get({ autoCreate: true }, function(err, subscription) {
  // subscription was "get-or-create"-ed
});

Hm - seems like that should be the default, no ?


Adding more context...

If we default to {reuseExisting: true}, we're effectively saying "this is the worker model" in the sense that you're creating more instances that will pull from the same subscription (ie, 1 message == 1 item of work)

If we default to {reuseExisting: false} (as we do today), we're saying "this is the broadcast model" in that your each instance will get all the messages (ie, 1 message = N items, where N is the number of nodes).

For the worker model, names matter, and should be required. For the broadcast model, names don't really matter so much (the _topic_ matters a lot, but the name of the subscription doesn't seem that useful, right?)

Is this ringing true with everyone? or am I crazy? @tmatsuo @stephenplusplus

reuseExisting should be true by default? I don't think so, because if it exists, you would just use:

var subscription = topic.subscription('subscription-name');
subscription.on('message', function(message) {});

If you're unsure if it exists, that's why we have both of those examples from my last post.

I'm not sure I understand the Pub/Sub use cases where users are uncertain if a topic exists (related: #696), but if you both think that defaulting reuseExisting is the right way to go, I'll gladly send a PR. Is it worth considering removing the reuseExisting option altogether?

// @tmatsuo since we're talking Pub/Sub stuff.

RE: worker/broadcast model concepts, I haven't looked at it like that before, but I think that makes sense. If I'm following, are you saying we should generate subscription names for the user? (Related: https://github.com/GoogleCloudPlatform/gcloud-ruby/issues/193)

So making this into code:

I'm a worker! I might not see _every_ message that gets published!

var topic = pubsub.topic('tweets');
pubsub.subscribe(topic, 'parse-language', { reuseExisting: true }, function(err, sub) {
  sub.on('message', function(msg)  {
    // Detect the language of this tweet!
    // Then ack!
  });
});

I'm just a listener to everything! Bring it on!

var topic = pubsub.topic('tweets');
pubsub.subscribe(topic, 'random-listening-subscription-id', {autoAck: true}, function(err, sub) {
  sub.on('message', function(msg) {
    console.log('Got a tweet! Awesome!');
  });
});

Though the latter case might be better as...

var topic = pubsub.topic('tweets');
pubsub.subscribe(topic, function(err, sub) {
  sub.on('message', function(msg) {
    console.log('Got a tweet! Awesome!');
  });
});

Maybe a simply solution for now is to just add { reuseExisting: true } to the README so users don't get tripped up in their first 5 minutes

Thanks for the code examples, it always helps! So is this right?:

If I give a name to subscribe, I expect it to be created. Remove the reuseExisting option. If it can't be created because it already exists, don't return an error and just use the subscription.

If I don't give a name to subscribe, just create one for me named something random.

I'd want @tmatsuo to chime in here if possible... but that's my thought... Sometimes I want to just listen on a topic, and I don't ever intend to share the listening responsibilities with anyone else...

Ping @tmatsuo for thoughts!

Sorry guys for the late reply.

Currently subscription name automatic assignment doesn't work as expected, so there's no reliable way of auto assigning a subscription name. However, the use cases that @jgeewax explained make sense to me.

@jgeewax should we wait until that feature is supported to change the model here or implement now and do our own name generation, e.g. {projectId}-{uuid}?

If you auto-generate a name, isn't there a risk that the user will end up back in the world of thousands of zombie subscriptions? (does that matter?)

Would it make sense to just have a "default" subscription name, such that the absence of a name argument in subscribe simply means using a default name? (like, I dunno: "DEFAULT"). The implication being that any client who subscribes in this way will be using the same subscription instance

@jgeewax @tmatsuo any thoughts?

It's just my opinion. I would not implement subscription names auto generation on the client side. There are many corner cases and the implementation will be complex. I would wait for the server side to implement it.

I don't think zombie subscription is a big problem. Subscriptions which are inactive for 30 days will be automatically get turned down. Also messages delivered to the subscriptions are only retained for 30 days.

@eschapira
WDYT?

I think I'm going to suggest the opposite of Takashi.

The autogeneration is simple; on the server side this is the code (in java)

  // Compose subscription name of topic + a random long number in hex format.
  String subscriptionName =
      topicName + "-" + String.format("%x", Math.abs(Random.nextLong()));

It would be fine to do this on the client.

Regarding the default subscribe() behavior, I would prefer if it doesn't create unintended random subscriptions, it will pollute the client resources and possibly put unnecessary load on the server if this pattern repeats too much. Users should explicitly say autoCreateSubscription: true or something like that.

What happens if the topic name has a length of 255?

It's also tricky if multiple subscribers are auto creating subscriptions at the same time. because then they are unintentionally using the same subscription?

Although I'm not strongly against implementing it on the client side. If the implementation is robust, I'm fine with it.

Thank you @tmatsuo and @eschapira for the information.

@jgeewax how shall we proceed?

Can you summarize the options here? I realize that the thread started with "don't fail when subscribing twice", but it progressed to a few different use-cases on top of that, right?

Yes, this is mostly a discussion of an API shift you suggested here, allowing for unnamed subscriptions to be created automatically. I think the best thing to do would be going over the thread starting from there, as we had many opinions weigh in, so it'd be unfair of their expertise if I left any of their points out in a summary.

@jgeewax @omaray

@stephenplusplus I think API could be slightly expanded and simplified imho, when I first looked into pubsub (having used common alternatives in the past), I was expecting something like so:

/*
 - if topic exists, reuse otherwise auto create 
 - publish()/subscribe() topic arg can be a string
 - wildcard support for topics, then subjects (aka subscription ids) make more sense
 - subjects are optional, default to *
 - simple publish API pubsub.publish(topic, subject[optional], msg), 
   handle all logic internally
*/

var pubsub = gcloud.pubsub(conf);

// BROADCAST - all subscribers receive the message
pubsub.subscribe(topic, function(msg, subject) {
  console.log('received message', msg, 'on subject', subject);
});

// WORKERS/QUEUE - message delivered to only one subscriber (ack built in)
pubsub.subscribe(topic, {queue:'myjob.workers'}, function(msg) {
  console.log('starting job');
})

// REQUEST/REPLY - unsubscribe after X responses are received (1 in below example)
pubsub.subscribe(topic, function(msg, replyTo) {
  var response = {foo:'bar'};
  pubsub.publish(replyTo, response);
});

pubsub.request(topic, msg, {autoUnsubscribe:1}, function(response) {  
  console.log('receive response', response);
});

I'm going to attempt to wrap the current API see if I can accomplish the above

Thanks! I also like the idea of creating a subscription without requiring a name, although I think keeping the event emitter works well with the Pub/Sub model:

topic.subscribe(function(err, subscription) {
  if (err) // API error, subscription couldn't be created
  subscription.on('message', function(message) {})
});

If we wanted to add a shortcut, maybe something like:

topic.onMessage(function(err, message) {
  if (err) // API error, subscription couldn't be created
  // subscription created & registered for message events
});

I think that extra functionality you drew up would fit well in a library that wraps ours.

Here's where we stand:

  • remove opts.reuseExisting from topic.subscribe('name', opts)
  • automatically generate a subscription name for the user if they don't provide one

@jgeewax explained the worker vs broadcast model:

A worker takes an item from the message queue, works, acks. The name of the subscription is important to the worker, so the same work isn't performed on the same message multiple times.

The broadcast model is for a listener who just wants to tap into the incoming messages. The name of the subscription doesn't matter in this case.

Before

topic.subscribe('my-subscription', { reuseExisting: true }, function(err, subscription) {
  // subscription was either created or re-used
})

After

topic.subscribe('my-subscription', function(err, subscription) {
  // subscription was either created or re-used
})

To differentiate between the worker and broadcast models, we will use the presence of a subscription name in place of opts.reuseExisting. So, if I just want to receive all of the messages:

topic.subscribe(function(err, subscription) {
  // subscription with a random name was created
})

Any corrections or thoughts before I move forward with a PR?

PR sent in #1799

Was this page helpful?
0 / 5 - 0 ratings

Related issues

charly37 picture charly37  路  3Comments

nicolasgarnier picture nicolasgarnier  路  4Comments

hvolschenk picture hvolschenk  路  4Comments

jmdobry picture jmdobry  路  3Comments

pputhran picture pputhran  路  4Comments