Google-cloud-go: Concurrency model?

Created on 2 Jun 2017 · 14 Comments · Source: googleapis/google-cloud-go

I'm new to Go, so maybe there is an obvious answer to this, but I can't seem to find any references to channels in the documentation. Methods don't seem to take channels, nor return them.

How exactly is one supposed to write a concurrent program that does multiple things with a Google Cloud client? Are we intended to create a client in each separate goroutine? It seems like pooling, authorization and streaming in results would be so natural to express with Go's concurrency model.

question

kevhill

Most helpful comment

My final thought is that returning channels or an async api also significantly increases the complexity for synchronous use. It's generally easier to add concurrency to a synchronous api than to make an async api synchronous.

derekperkins on 23 Jun 2017

👍2

All 14 comments

You create your client beforehand and share it among your many goroutines. Each goroutine might be doing a part of a copy to GCS, for instance, but all using the same client.

adams-sarah on 3 Jun 2017
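
For illustration (this sketch is not from the thread), sharing one client among goroutines might look like the following. `Client` and `Upload` are hypothetical stand-ins rather than the real storage API; the assumption is only that the client holds no state that calls mutate.

```go
package main

import (
	"fmt"
	"sync"
)

// Client is a hypothetical stand-in for a cloud client. It holds no state
// that its methods mutate, so sharing it between goroutines is safe.
type Client struct {
	project string
}

// Upload pretends to copy one object and reports where it went.
func (c *Client) Upload(name string) string {
	return c.project + "/" + name
}

// uploadAll shares a single client among one goroutine per object name.
func uploadAll(c *Client, names []string) []string {
	results := make([]string, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			results[i] = c.Upload(name) // each goroutine writes only its own slot
		}(i, name)
	}
	wg.Wait() // block until every upload has finished
	return results
}

func main() {
	c := &Client{project: "demo"}
	fmt.Println(uploadAll(c, []string{"a.txt", "b.txt", "c.txt"}))
}
```

The client is created once, and the caller decides how much concurrency to add around the sequential `Upload` call.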

@kevhill, There's a bit of a paradox in the Go world. Go has all these great concurrency features, but you'll find that most (well-designed) Go APIs don't expose them. Instead they are boring sequential APIs. The reason is that it is easy to add concurrency to a sequential API, but relatively hard to make a concurrent API sequential. Also, APIs that expose things like channels tend to have machinery under the hood that must be managed, or at least understood.

Take one thing you mentioned, a method that returns a channel. If it returns a channel, there's probably a background goroutine feeding that channel. Which means there needs to be a way to shut down that goroutine, or the program may have a leak. That implies another channel or method somewhere. It turns out to be simpler to let the caller add concurrency if they want it.

None of this was obvious in the beginning. But it became clearer as the Go community gained more experience with the language.

I'm going to close this issue, but feel free to re-open if you have more questions.

jba on 6 Jun 2017
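
To make the leak concern above concrete, here is a minimal sketch of a channel-returning API. `streamNumbers` and `firstN` are hypothetical names, and using a context as the shutdown mechanism is an assumption of this example, not something the library does.

```go
package main

import (
	"context"
	"fmt"
)

// streamNumbers is a hypothetical channel-returning API. A background
// goroutine feeds the channel, so the caller needs a way to stop it (here the
// context) and a way to know it has stopped (here the done channel).
func streamNumbers(ctx context.Context, done chan<- struct{}) <-chan int {
	out := make(chan int)
	go func() {
		defer close(done) // runs last: signals that the goroutine has exited
		defer close(out)
		for i := 0; ; i++ {
			select {
			case out <- i:
			case <-ctx.Done():
				return // without this case the goroutine would block forever
			}
		}
	}()
	return out
}

// firstN reads n values and then shuts the producer down.
func firstN(n int) []int {
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	ch := streamNumbers(ctx, done)
	var got []int
	for i := 0; i < n; i++ {
		got = append(got, <-ch)
	}
	cancel()
	<-done // wait until the background goroutine has really stopped
	return got
}

func main() {
	fmt.Println(firstN(3))
}
```

A caller who forgets `cancel()` leaves the producer goroutine blocked on its send, which is the extra machinery a channel-returning API asks its users to understand.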

OK, I can see why exposing channels might be confusing, but as someone new to Go and this API, it is super confusing what is or isn't goroutine-safe.

It sounds like Clients are, but the docs don't help you with that at all (not even telling you how to specify a client's type for a function declaration that you'd need to create your goroutines) but what about Buckets? or GCS Objects? Can those be shared between goroutines safely?

I guess I could just try everything and see what breaks, but often concurrency issues are tricky to debug. And it feels like I'd be learning the API by accident rather than having something clear and concise that I can reason about.

Maybe I'll take issue with the notion that

it is easy to add concurrency to a sequential API, but relatively hard to make a concurrent API sequential

In a good concurrency framework it should be pretty simple to do something like x := <- thing.IOFunc() to make it sequential. Maybe channels are a bad thing to expose, but I feel like you can't get away from having some idiomatic concurrency object that is returned by libraries (eg in node this would be a Promise). If that is hard to do without leaks etc, then concurrency in go is hard, which seems like a big knock against the language.

kevhill on 20 Jun 2017

It sounds like Clients are, but the docs don't help you with that at all (not even telling you how to specify a client's type for a function declaration that you'd need to create your goroutines) but what about Buckets? or GCS Objects? Can those be shared between goroutines safely?

I think this is actually a fair point.

It's hard for users to see that BucketHandle (e.g.) doesn't hold any mutable state (or rather, any state that would be mutated), and thus it may not be obvious that it can be safely used concurrently.
I think that storage is also a great example of one of our libraries that many folks would want to use concurrently: uploading and downloading large amounts of data.

Maybe we can provide some examples. Or even the words "can be used concurrently" on these types in the godoc.

adams-sarah on 20 Jun 2017

We need to improve the docs about what is goroutine-safe.

In a good concurrency framework it should be pretty simple to do something like x := <- thing.IOFunc() to make it sequential.

It is. The hard part is, what if the user just does thing.IOFunc(), i.e. they drop the channel on the floor? Then a goroutine may have been leaked.

I feel like you can't get away from having some idiomatic concurrency object that is returned by libraries (eg in node this would be a Promise).

Why do you feel that we can't? It looks like we have, by writing simple sequential APIs. We just have to document them better. Can you give some examples of what you're looking for? (Assume for now that anything you want to be goroutine safe, is.)

You may like the Pub/Sub client. It has a lot of internal concurrency, and it does have something like a promise. The rationale for building concurrency into the client was that high performance, which involves asynchrony and pools of goroutines, should be easy to write out of the box—which I believe is your point. But that point of view isn't appropriate for every client.

jba on 22 Jun 2017
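
As a rough sketch of what "something like a promise" can look like in Go: the `Result` type, `startAsync`, and `fetchGreeting` below are hypothetical and are not the Pub/Sub client's actual types. The design choice being illustrated is that the background goroutine closes a channel instead of sending on one, so it cannot block if the caller drops the result on the floor.

```go
package main

import (
	"context"
	"fmt"
)

// Result is a hypothetical promise-like value, not a type from any real client.
type Result struct {
	ready chan struct{} // closed once value and err are set
	value string
	err   error
}

// Get blocks until the work finishes or ctx is done, whichever comes first.
func (r *Result) Get(ctx context.Context) (string, error) {
	select {
	case <-r.ready:
		return r.value, r.err
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// startAsync runs work in the background. The goroutine signals completion by
// closing a channel rather than sending on one, so it never blocks and exits
// even if the caller ignores the returned Result.
func startAsync(work func() (string, error)) *Result {
	r := &Result{ready: make(chan struct{})}
	go func() {
		r.value, r.err = work()
		close(r.ready)
	}()
	return r
}

// fetchGreeting shows the sequential use: start the work, then wait for it.
func fetchGreeting() (string, error) {
	r := startAsync(func() (string, error) { return "hello", nil })
	return r.Get(context.Background())
}

func main() {
	v, err := fetchGreeting()
	fmt.Println(v, err)
}
```

Even this small type has to decide how cancellation and errors are reported, which hints at why one shape does not suit every client.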

@kevhill, take a look at #677. There you have two _extremely_ experienced Go devs trying to figure out how to use an asynchronous API (our logging client). Maybe the API sucks, but part of the problem is just that it is asynchronous.

jba on 22 Jun 2017

@jba I wouldn't say it sucks, but I think you are caught in a paradox.

If it is hard to do concurrency right, then leaving it up for every developer makes for a bad API. If best practices can be set, then not exposing tools to help enforce those best practices makes for a bad API.

There will always be edge cases. Again I think node and Promises offer a good comparison, as that is probably the most widely used concurrency framework today. Here's a great article about how to break node's core concurrency model. The good news is that for 99% of developers, they are already using node in a way that is just fine.

So, I'll go back to my earlier statement. If a good API can't expose things that help developers think and reason about concurrent processes, then something is wrong. You guys would certainly know a hell of a lot more than me about how to make it right. Maybe it isn't a single channel, but some idiomatic combination of multiple channels? I guess it might be true that async programming and strongly-typed languages are in conflict (as by definition you don't know what to expect until later) but my guess is that go as a community just needs to come up with their version of something like a Promise that gives all developers an easy way to interact with concurrent processes.

kevhill on 22 Jun 2017

Sorry, I read your comments out of order.

Docs certainly help, but function signatures and return types are much better. Under the current circumstances, at best I end up with a bunch of boilerplate code, something like:

c := make(chan string)

go func() {
    c <- thing.IOFunc() // run the blocking call in the background
}()

...
x := <-c // wait for the result

I guess that works? But again, if you are saying that spawning channels like that is dangerous, I'm not sure why it is better for me as an end developer to do it than for the library developers to do it.

kevhill on 22 Jun 2017

@kevhill Node and promises are a much different way to approach concurrency, with pros and cons like you've mentioned. Spawning channels and running your own goroutines is a core part of the language and isn't inherently "dangerous". What he's referring to is using a channel outside of package boundaries. When you create your own channels and goroutines, you have control of both ends, so you are fully aware and responsible for the concurrency. When a channel is created inside another package, that split responsibility is what makes it easier to leak goroutines. Most of the time you want to do more complex things than your example, and there is no one right way to expose that api.

If you're looking for a simpler way to manage concurrency, you can look at https://golang.org/pkg/sync/#WaitGroup. You don't have to use channels if you find them cumbersome.

In general, Go isn't looking for the most concise way to write an api, it's looking for the most readable way to look at your code. It may take a few more lines, but the lack of magic makes almost every library surprisingly readable.

derekperkins on 22 Jun 2017

If you try and push concurrency into every client package that you use, there are also configuration options to consider. How much concurrency do you want? 2 at a time, 10 at a time? Do you want it to return after a single error or do you want to try all the things, no matter how many fail? The return value has to become a struct, so it isn't chan string, it's chan StringAndError. Every package you use would support a different set of options, expressed in a number of different ways.

With a synchronous api, all of a sudden things become much simpler inside your project. We use https://github.com/nozzle/throttler extensively (we wrote it) to reduce the boilerplate like you mentioned, and it has the options we need. Now whenever we encounter someone else's code in our own app, we immediately know what the concurrency model is, without having to research the sub-package.

There's a little more complexity than promises for sure, but I wouldn't trade the added flexibility of Go's concurrency for the slightly more succinct, but limited concurrency support of promises.

derekperkins on 22 Jun 2017
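
A minimal sketch of the choices listed above, written on the caller's side of a synchronous API. `fetch`, `fetchAll`, and the `result` struct are hypothetical names; this version caps the number of calls in flight and tries every key regardless of failures, which is only one of the policies a caller might want.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// result pairs a value with its error, the "StringAndError" shape.
type result struct {
	value string
	err   error
}

// fetch is a hypothetical synchronous call standing in for any blocking I/O.
func fetch(key string) (string, error) {
	if key == "" {
		return "", errors.New("empty key")
	}
	return "value-of-" + key, nil
}

// fetchAll calls fetch for every key with at most limit calls in flight.
// limit must be at least 1. It tries every key and collects all outcomes;
// a caller wanting fail-fast behaviour would write a different loop.
func fetchAll(keys []string, limit int) []result {
	results := make([]result, len(keys))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, key := range keys {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit calls are already running
		go func(i int, key string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this call returns
			v, err := fetch(key)
			results[i] = result{value: v, err: err}
		}(i, key)
	}
	wg.Wait()
	return results
}

func main() {
	for _, r := range fetchAll([]string{"a", "", "c"}, 2) {
		fmt.Println(r.value, r.err)
	}
}
```

The limit and the error policy live in the caller's code, so they are the same for every synchronous library the project uses.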

@derekperkins Not to beat a dead horse, but if the problem is

When a channel is created inside another package, that split responsibility is what makes it easier to leak goroutines.

Then that should be solved by designing some type of object(s) that packages can return in a relatively safe fashion that allows you to prevent leakage in 99% of cases.

I'm not saying 'nothing you guys are doing works' or 'there's no reason for it' but just trying to give you guys a good outsider's perspective. When I mentioned to another developer friend that Google's own gcloud go library didn't expose channels, he was also super surprised. It isn't something that is intuitive and means that you have to learn a lot of design patterns in the language before you can start using the packages effectively.

I'm also not saying that you should implement promises, but that you should implement something that helps people reason and manage concurrent processes. For example, that boilerplate I put above just achieves the very first step. There'd need to be more boilerplate every time you wanted to fire off multiple io functions and wait until they were all finished, and more boilerplate to include things like timeouts, etc. To someone who just wants to start using go it doesn't really live up to golang's promise of "makes it easy to build simple, reliable, and efficient software." And even for experienced developers, it isn't very DRY.

A couple of quick retorts:

The return value has to become a struct, so it isn't chan string, it's chan StringAndError. Every package you use would support a different set of options, expressed in a number of different ways.

This is exactly what standards and conventions within a programming community prevent. By not taking a stand at the level of a package you basically ensure that every go developer will come up with their own method.

With a synchronous api, all of a sudden things become much simpler inside your project.

Absolutely not. There is real and true complexity in managing concurrent processes. If you don't put that complexity in the package, then it will be present in the project.

By bringing up Throttler, you are basically acknowledging everything I've said above. It looks pretty good. Why couldn't that, or something similar, be the standard way of interacting with most go packages that do IO?

kevhill on 23 Jun 2017
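
For reference, the timeout boilerplate mentioned above might look roughly like this. `callWithTimeout` and `errTimeout` are hypothetical names, and the sketch assumes the wrapped call cannot be interrupted; where a call accepts a `context.Context`, a deadline on the context is usually the better tool because it can stop the call itself.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when the call does not finish in time.
var errTimeout = errors.New("timed out")

// callWithTimeout runs a blocking call in a goroutine and gives up after d.
// The call itself keeps running after a timeout; only the wait is abandoned.
func callWithTimeout(call func() string, d time.Duration) (string, error) {
	c := make(chan string, 1) // buffered so the goroutine can exit after a timeout
	go func() { c <- call() }()
	select {
	case v := <-c:
		return v, nil
	case <-time.After(d):
		return "", errTimeout
	}
}

func main() {
	v, err := callWithTimeout(func() string { return "fast" }, time.Second)
	fmt.Println(v, err)
}
```

The buffer of one is the detail that avoids a leaked goroutine: with an unbuffered channel, a late result would have no receiver and the send would block forever.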

There is also the errgroup package.

The reason neither that nor throttler can be the standard is that there are too many other variations and patterns. The design space of concurrent programs is much larger than that of sequential ones.

I feel like we could have a better discussion if there were a specific problem or set of problems you needed to solve.

jba on 23 Jun 2017

👍1
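
To show one of those variations, here is a standard-library-only sketch of a fail-fast group, in the spirit of what errgroup packages up but not its implementation. `Task`, `runAll`, and the three sample tasks are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Task is a unit of work that should stop early when ctx is cancelled.
type Task func(ctx context.Context) error

var errBoom = errors.New("boom")

func succeeding(ctx context.Context) error { return nil }
func failing(ctx context.Context) error    { return errBoom }

// waiting blocks until it is told to stop, then reports why.
func waiting(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// runAll runs every task concurrently, cancels the shared context on the
// first failure, and returns that first error (nil if all tasks succeed).
func runAll(tasks []Task) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, task := range tasks {
		wg.Add(1)
		go func(task Task) {
			defer wg.Done()
			if err := task(ctx); err != nil {
				once.Do(func() {
					firstErr = err
					cancel() // tell the remaining tasks to stop
				})
			}
		}(task)
	}
	wg.Wait()
	return firstErr
}

func main() {
	fmt.Println(runAll([]Task{succeeding, waiting, failing}))
}
```

This policy (stop everything on the first error) differs from a collect-everything loop, and both are reasonable depending on the problem.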

@kevhill I'm not a Googler nor a developer on this package, just a user. From my perspective, I would not want this package to return channels, as that would unnecessarily complicate things. Once you've been using Go for a while, goroutines and concurrency stop being a mental hurdle and become second nature.

By bringing up Throttler, you are basically acknowledging everything I've said above. It looks pretty good. Why couldn't that, or something similar, be the standard way of interacting with most go packages that do IO?

I don't disagree that a standard could potentially be nice, but like @jba said, there are so many variables that play into concurrent patterns that it would be near impossible to hit all those use cases with a single model. Even if you could, it would require a non-trivial amount of configuration options and levers, which would invariably slow down high volume processing, which is one of the core value props for concurrency.

At the core, this is just a difference in ideology between Go and Node. Go doesn't strive to be completely DRY and avoid boilerplate. That has its own cons that you have brought up and which are valid. For some devs, the extra flexibility that offers is worth the tradeoff, and it isn't for others. For us, we've abstracted our usage into throttler so that adding concurrency anywhere we need it barely adds any mental or code overhead.

derekperkins on 23 Jun 2017

👍2
