Google-cloud-go: how do I get the most performance out of the subscribe API?

Created on 12 Dec 2017  路  18Comments  路  Source: googleapis/google-cloud-go

What do I need to make this code pull and ack messages faster?

    sub := client.Subscription(*subscription)
    // monkey with sub.ReceiveSettings.*
    err = sub.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
        msg.Ack()
    }()

Right now, once instance of my app with:
sub.ReceiveSettings.NumGoroutines = 10 * 6 * runtime.NumCPU()
can pull about 5K / minute, but if I take the same program and run 6 copies of it, I get > 6X performance from pubsub.

Any suggestions? I don't want to have to go back to the old API, but will if I have to.

pubsub feature request

Most helpful comment

Use negative values for both to turn off throttling. I would start there.

All 18 comments

What about MaxOutstandingMessages and MaxOutstandingBytes? They will throttle your throughput no matter how many goroutines you have.

I have PLENTY of memory and bandwidth. What values should I try for MaxOutstandingMessages & MaxOutstandingBytes?

Use negative values for both to turn off throttling. I would start there.

Holy moly. That worked. THANK YOU.
I was banging my head hard about that.

It's completely different performance. Utterly blazing. Seems like the documentation needs a clear section on tuning aspects of the ReceiveSettings for different scenarios.

Sweet! You had us worried there. We designed Receive for high throughput, and our internal testing told us that it delivered.

I will convert this to a feature request to improve the docs.

I know originally this library defaulted to something like this: sub.ReceiveSettings.NumGoroutines = 10 * runtime.NumCPU() but, due to concerns about too much concurrency, it was dropped to a value of 1. What is the recommended amount for high-performance?

I would have said 10x, but you've got it up to 60x. So I don't have a good answer. With throttling off, that is your only remaining tunable parameter, so play with it.

@pongad, any suggestions?

I'm not sure I have a better suggestion; I think some experimentation is necessary.

If your computer runs infinitely quickly, it makes sense to make NumGoroutines large so there are more goroutines pulling messages. Real computers don't actually run that quickly, in which case setting the number too high will make the performance worse. The messages are stuck waiting for CPU on one machine and pubsub server can't send them to another machine.

I suppose it's possible for us to implement some kind of feedback mechanism to create just enough goroutines to fill flow control. I experimented with the idea but it's quite complicated and I never actually got it right.

Based on this discussion (and past code defaults) and to summarize, I'm going to stick with this:

    sub.ReceiveSettings.NumGoroutines = 10 * runtime.NumCPU()
    sub.ReceiveSettings.MaxOutstandingMessages = -1
    sub.ReceiveSettings.MaxOutstandingBytes = -1

as my defaults for high performance situations. Thank you all very much for the quick and direct feedback to make this work.

@willie Just a fair warning, the NumGoroutines is the number of goroutines pulling the messages, not processing them. We keep spawning goroutines until we're capped by either MaxOutstandingMessages or MaxOutstandingBytes.

If you ack messages so quickly, this is probably not a big deal. If you ack slow though, you might run out of memory. -1 is OK for trying things out but might not be for production systems.

@pongad Thanks for the extra information. I will try the NumGoroutines default value on my next run, as I do ack immediately (I rate limit and resource control downstream from the pubsub receive). (I guess I was messing with the wrong knob all this time.)

Good news. I was able to achieve the performance I needed without modifying NumGoroutines from the library defaults and passing -1 to MaxOutstandingMessages & MaxOutstandingBytes. I understand the caveats of that firehose, but it resolves the specific issue I was having.

I would like to understand, however, as to why MaxOutstandingMessages & MaxOutstandingBytes at their default values were slowing throttling me over time, when I was immediately msg.Ack() as the first line of my lightweight receive func.

@jba Thinking about this a little more, I'm also confused. I just ran a test. Leaving everything at default, I was able to pull 20K messages per second, orders of magnitude more than what @willie reported.

@willie If you want to dig more into this, can you let us know how many goroutines you have running with default settings? If you can share stack trace of some of them, that'd be really useful as well (http/pprof should help here).

@willie do you ack early in your callback, but then keep doing stuff? Our current implementation calls the flow controller's release method after the callback returns, not when the message is acked.

@jba Yes, I ack immediately (because I thought it would speed things up), then I parse a JSON message from the data, 4 string manipulations, spawn a goroutine, and then return from the callback.

@pongad That was not my experience. Well, sort of, it was fast at the beginning (not as fast as you describe, but that have been my network path (I'm not consuming from GCP at the moment)), then it trailed off to a much slower rate like a shallow curve.

@jba I recently talked to pubsub team. They recommend us releasing flow-control after the message is acked/nacked, not when the function returns. This isn't a top-priority; I can pick this up if you'd like?

@willie In general, we recommend calling ack/nack after you're "finished" with the message. In that way, if your machine dies while processing, the message is redelivered.

@pongad I agree with the general recommendation.

My particular use case is OK the way it is because the publisher is going to publish again and again until the situation that the subscriber is resolving is resolved. (Resolution involves uploads (and I have coalescing request logic in place), so I'd rather ack and process, rather than wait to ack after process.

Closing, since we've fixed #870.

Was this page helpful?
0 / 5 - 0 ratings