Grpc-go: ClientConn is inflexible for client-side LB

Created on 7 Jul 2015 · 51 Comments · Source: grpc/grpc-go

Client side LB was being discussed here:
https://groups.google.com/forum/#!searchin/grpc-io/loadbalancing/grpc-io/yqB8sNNHeoo/0Mfu4b2cdaUJ

We've been considering using GRPC for our new MicroService stack. We are using Etcd+SkyDNS for DNS SRV based service discovery and would like to leverage that for RR-like RPC load balancing between backends.

However, it seems that the current ClientConn is fairly "single-homed". I thought about implementing an LbClientConn that would aggregate multiple ClientConn instances, but all the auto-generated code takes the concrete ClientConn struct rather than a swappable interface.

Are you planning on doing client-side LB anytime soon? Or maybe ideas or hints on how to make an early stab at it?

All 51 comments

Yes, we are actively working on client-side LB in gRPC as a whole and have been expecting to eventually support etcd and DNS SRV (among others). Much of how it will function cross-language has been defined at this point, but due to "reasons," like "time," hasn't been made public. I don't have an ETA on when the work will be started in Go, other than "soon," in part because several people are on vacation for the July 4th holidays and I can't ask them :).

That sounds great!

Can we use this bug for tracking client-side LB work in grpc-go or do you track it somewhere else?

We can leave this open for tracking, for now at least. The "main" tracking issue is grpc/grpc#1790, which is... quiet. It is the issue which would make the design public. Java is tracking it at grpc/grpc-java#428 and has more information regarding some plans, but not quite enough for you to go off of.

Now I am back from vacation. I will update the issue once I finalize the design of LB.

I hope you enjoyed your vacation!

Great to hear this picking up steam, we'd be really interested in seeing a DNS SRV ReplicatedClientChannel :)

etcd and DNS SRV seem awesome!

Any news here? We could lend a hand implementing DNS SRV, @miekg :)

I am working on it actively. It is major surgery on the existing code and needs a couple of rounds of design discussion. So please hold.

Just FYI, our decision is that we won't make grpc.ClientConn an interface, which is too coarse. We will make it happen in the underlying implementation, but we do allow people to implement their own load balancer.

but we do allow people to implement their own load balancer

Awesome!

If you can use them, we have at least one available gopher to help out.

@iamqizhao
I know you guys are working on this as high priority. Any ETAs on when the Picker and Naming APIs will be considered stable?

You can do this now; it just takes some more code:

func (r *RPC) Dialer(addr string, timeout time.Duration) (net.Conn, error) {
        // Custom code that populates the SRV records from DNS, or from environment variables when testing.
        r.RegenerateSRV()
        for _, s := range r.SRV.Addrs {
                host := net.JoinHostPort(s.Target, strconv.Itoa(int(s.Port)))
                glog.V(2).Infof("Connecting to tcp://%s", host)
                conn, err := net.DialTimeout("tcp", host, timeout)
                if err == nil {
                        return conn, nil
                }
        }
        return nil, errors.New("dialing failed") // do something better 
}

func (r *RPC) Dial() error {
        var opts []grpc.DialOption
        if dictator.GetString("agent.server.cert") == "" {
                opts = append(opts, grpc.WithInsecure())
        }
        opts = append(opts, grpc.WithBlock())
        opts = append(opts, grpc.WithDialer(r.Dialer))

        // "filler" is a required target string that never gets used, because the custom dialer ignores addr.
        conn, err := grpc.Dial("filler", opts...)
        if err != nil {
                return err
        }
        r.Conn = conn
        glog.V(2).Info("Connection success")
        return nil
}

@iamqizhao Will there be any changes here? The current ClientConn, Conn and Picker approach is _extremely_ confusing. A few questions specific to this:

  1. What is the role of Conn versus ClientConn?
  2. How would I make a ClientConn but delay actual dialing to when a method is actually called?
  3. Will we be able to make atomic changes for connection routing? Imagine being able to do an atomic database failover that can route all new requests to another connection.

  1. Conn is an abstraction connecting to a single destination (i.e., at most 1 underlying transport at any time); a ClientConn may consist of 1 or more Conn;
  2. This is not the default behavior and you need to make your own picker impl that defers the NewConn call;
  3. Without the concrete requirements I cannot give the perfect answer. But I think this also can be achieved using a custom picker impl.
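The deferred-dialing idea in answer 2 can be illustrated with a rough sketch. It does not use the actual Picker API (which was experimental at the time); conn, lazyConn and the dial hook are hypothetical stand-ins, and the sketch only shows the pattern of postponing connection setup until the first call.

```go
package main

import (
	"fmt"
	"sync"
)

// conn stands in for a single-destination connection, the role the
// thread assigns to Conn. It is a hypothetical type for this sketch.
type conn struct{ addr string }

// lazyConn defers dialing until the first call to get.
type lazyConn struct {
	dial func() (*conn, error) // hypothetical dial hook supplied by the caller
	mu   sync.Mutex
	c    *conn
}

// get returns the cached connection, dialing on first use.
// A failed dial is not cached, so the next call retries.
func (l *lazyConn) get() (*conn, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.c != nil {
		return l.c, nil
	}
	c, err := l.dial()
	if err != nil {
		return nil, err
	}
	l.c = c
	return c, nil
}

func main() {
	dials := 0
	l := &lazyConn{dial: func() (*conn, error) {
		dials++
		return &conn{addr: "backend.example.internal:8080"}, nil
	}}
	fmt.Println("dials before first call:", dials)
	c, _ := l.get()
	_, _ = l.get()
	fmt.Println("connected to", c.addr, "after", dials, "dial(s)")
}
```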

Note that Picker is still in experimental stage and may be subject to some revisions.

@iamqizhao Thanks for your response!

Conn is an abstraction connecting to a single destination (i.e., at most 1 underlying transport at any time); a ClientConn may consist of 1 or more Conn;

The current API doesn't really suggest this model. For example, I would expect ClientConn to work like a pool, providing Conn instances, or something to that effect. Perhaps the dialer would produce Conn instances that would be part of the pool controlled by ClientConn. The most confusing aspect of this setup is that Dial doesn't actually perform a dial. The dial action is actually taken when creating Conn, but the functionality seems to be hidden away in the interaction between ClientConn and Conn private fields.

Will there be any efforts to clear up this naming and behavior? The current situation makes adoption of GRPC very hard, since there is overhead to get over the poor naming.

This is not the default behavior and you need to make your own picker impl by deferring NewConn call;

I got this working. However, it still requires an initial address provided as the target argument to Dial, which is basically a throwaway. Effectively, it requires one to query the target set before creating the connection or incur a timeout cycle, rather than deferring immediately to the picker.

It was also confusing to implement since we don't have the ability to create the very first connection. One must instantiate the picker and ClientConn with the same arguments, then join them up in Init to ensure it is associated with the correct address. We then lose the ability to create Conn instances, since the code around the PickAddr call just uses it. At this point, we've lost control over the lifecycle of the Conn instance that is created as a result.

My immediate feeling is that Picker is way over built but the problems come down to the organization around ClientConn and Conn, not the design of Picker itself.

Without the concrete requirements I cannot give the perfect answer. But I think this also can be achieved using a custom picker impl.

The canonical example would be having a leader/follower set. Let's say that we have a ClientConn that must always be homed to the current leader. Any in-flight connections could be cancelled and all new RPCs would be sent to the current leader. Obviously, there would need to be some support in the target service for this to work correctly.
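A rough sketch of the leader switchover described above, using only the standard library. leaderRouter and its methods are hypothetical names, not grpc-go API. For brevity the sketch does not merge in a caller-supplied context, which a real implementation would need to do.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// leaderRouter routes every new call to the current leader and cancels
// calls that were started against the previous one. Hypothetical type.
type leaderRouter struct {
	mu     sync.Mutex
	leader string
	ctx    context.Context
	cancel context.CancelFunc
}

func newLeaderRouter(leader string) *leaderRouter {
	r := &leaderRouter{}
	r.SwitchLeader(leader)
	return r
}

// SwitchLeader replaces the leader under the lock, so callers see either
// the old or the new leader, and then cancels calls bound to the old one.
func (r *leaderRouter) SwitchLeader(addr string) {
	ctx, cancel := context.WithCancel(context.Background())
	r.mu.Lock()
	old := r.cancel
	r.leader, r.ctx, r.cancel = addr, ctx, cancel
	r.mu.Unlock()
	if old != nil {
		old()
	}
}

// Begin returns the current leader and a context that is cancelled
// if the leader changes before the call finishes.
func (r *leaderRouter) Begin() (string, context.Context) {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.leader, r.ctx
}

func main() {
	r := newLeaderRouter("db-1.example.internal:5000")
	addr, ctx := r.Begin()
	r.SwitchLeader("db-2.example.internal:5000")
	fmt.Println("old call to", addr, "cancelled:", ctx.Err() != nil)
	addr, _ = r.Begin()
	fmt.Println("new calls go to", addr)
}
```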

Note that Picker is still in experimental stage and may be subject to some revisions.

What are the plans here?

Any progress? On Jan 19 you mentioned on the mailing list that there will be a load balanced picker. For us that need the functionality, though, that doesn't help much.

The code tells us to "not implement your own Picker for now". There's no Go spec for the next steps and, absent that, I doubt any PRs from users would be accepted, either. So we can't use a built-in solution, we have little guidance on how to help build one and are also discouraged from writing our own. And with vague/changing timelines, it's hard to make a decision on whether to go for an alternative. This issue has been open for 2/3 of a year now. It looks like grpc-go is understaffed.

Anyway, upstream posted this a month ago: https://github.com/grpc/grpc/blob/master/doc/load-balancing.md

which smells a lot like GSLB. Is that the direction for grpc-go, too?

https://github.com/grpc/grpc/blob/master/doc/load-balancing.md is the direction for all gRPC implementations. This work is kind of stuck right now because we found some security concerns in that design, which call for a major revision of the original design. Therefore, I stopped working on that part and will wait until the new design is out. We are working on that and hope to get the new design done by the end of this month. Sorry about the delay.

Thanks for the update! We'll just use https://github.com/benschw/srv-lb and a custom Picker for now.

We have been working on a draft implementation of the load-balancing proposal - feedback is welcome: https://github.com/bsm/grpclb

@iamqizhao @dim @therc While I see the benefit of the load balancing proposals, this particular issue seems to be more about problems with the design of ClientConn and Picker. With a sane Go API model there, the rest of the discussion around proper load balancing implementations becomes academic. The current Go API design makes simple behavior challenging to implement.

@dim I think everything in the lb proposal above lines up with the use cases we have in mind, except that one may want to "push" server lists, rather than poll for them. Other than that, it looks like a very sensible API.

To be clear, here are the use cases for resolving a client connection (if I missed any, please chime in):

  1. Load balancing to an arbitrary host for every request. These are services that are either stateless or have shared state between all instances.
  2. Load balancing to an individual host after resolution. These are services that may have a session state associated with a single instance.
  3. Directing requests to a particular host based on request content. An example might be a queuing service where the Picker (or something) resolves a location for a particular queue. It could also be a key value store where the connection is directed to a particular server where the data is located.
  4. Atomic switchover for a particular "class" of service. Imagine a leader/follower service instructed by a watch in etcd. When there is a switch between leader/follower, the current requests must be cancelled and a new connection must be made to the right service. It is acceptable to rely on service behavior to support this. For example, a leader that is no longer a leader may throw errors that could be intercepted by the picker.

For the most part, 1 and even 2 are very straightforward. They can be implemented with very little code but impose a lot of requirements on the backend service (especially use case 1). Use cases 3 and 4 are more interesting and can make grpc useful for very complex services, such as databases or queueing servers. Solving these is hard and it would be great to have GRPC do some of the heavy lifting. The behavior doesn't need to be provided by GRPC, but it should provide solid hooks, such as connection picking and interruption, to implement such behavior in the Go API.
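Use case 3 (routing by request content) can be sketched in a few lines, assuming the request carries a routing key such as a queue name. pickByKey is a hypothetical helper, not a grpc-go function. It uses plain modulo hashing, which remaps most keys whenever the backend set changes; a consistent-hashing scheme would reduce that churn.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickByKey maps a request key (for example a queue name) to one backend,
// so that all requests for the same key reach the same host.
// It reports false when no backends are available.
func pickByKey(key string, backends []string) (string, bool) {
	if len(backends) == 0 {
		return "", false
	}
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return backends[int(h.Sum32()%uint32(len(backends)))], true
}

func main() {
	// Assumed example backends.
	backends := []string{"queue-1:9000", "queue-2:9000", "queue-3:9000"}
	for _, key := range []string{"orders", "payments", "orders"} {
		addr, _ := pickByKey(key, backends)
		fmt.Printf("%s -> %s\n", key, addr)
	}
}
```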

I agree that the API as implemented today is half baked and unintentionally broken for several purposes. It would have been superseded already if the new one hadn't taken so long.

Anyway: @stevvooe, why push instead of pull? The latter can happen on the client's own schedule, not the load balancer's. The push model does have the advantage of being able to forward right away updates about servers that just joined or left the pool. I suspect pull is going to stay, though, at least for historical reasons (and those might have had a rationale like mine or additional ones, such as efficiency).

I also agree about the need for flexible hooks for building more complex machinery on top, in an efficient manner.

Anyway: @stevvooe, why push instead of pull? The latter can happen on the client's own schedule, not the load balancer's. The push model does have the advantage of being able to forward right away updates about servers that just joined or left the pool. I suspect pull is going to stay, though, at least for historical reasons (and those might have had a rationale like mine or additional ones, such as efficiency).

This may be one of those "academic" discussions. The chief reason I bring up push is to have synchronized switchover. There is a resource cost to this, such as an active connection and server-side session.

Whatever the advantages of push versus pull actually are, it should be a decision up to the application developer. The actual requirements for the GRPC Go API are similar in either case and should be able to support both.

I'll try to come up with a proposal if this issue is still stalling.

Sorry for the late response on this. Let's try to reach an agreement on the solution.

I am kind of lost in the discussion. Concretely, what is the problem of the current Picker and ClientConn?

I think the central part of @stevvooe's comment https://github.com/grpc/grpc-go/issues/239#issuecomment-190875540 sums up the main issues (puzzling dummy argument passed to Dial, loss of control over Conn objects).

I treat "Dial" as a function call to dial a communication channel (i.e., ClientConn) for users so that users can put rpcs onto it. I think it is not necessary for Dial to create a real TCP connection or something similar, which is implementation details and should be transparent to users.

It is not clear to me what "loss of control over Conn objects" means. Actually what @stevvooe described in his first paragraph is the current model. ClientConn uses a picker to provide Conn instances.

I treat "Dial" as a function call to dial a communication channel (i.e., ClientConn) for users so that users can put rpcs onto it. I think it is not necessary for Dial to create a real TCP connection or something similar, which is implementation details and should be transparent to users.

The term Dial, in every other Go project, means to create some sort of connection. When this is not the case, other terms, such as New or Open (see database/sql), are used. It is confusing for it to do anything else, and I've seen several developers stumble on this distinction. Furthermore, most of the "dial options" actually affect stream behavior and have nothing to do with dialing.

It seems like the model that is followed here is that of database/sql or net/http.Transport. Both of these pool connections behind the scenes but can be composed to provide more sophisticated behavior. For example, one can replace the entire transport implementation for an HTTP client to get interesting behavior. Yes, there are problems with this model, but it should be used to inform the API of this package.

To resolve this, I would start with the following (breaking) changes:

  1. Remove Conn and ClientConn. They expose a lot of internal detail and have confusing interfaces.
  2. Define an interface called Invoker. What does it do? It invokes!

type Invoker interface {
        Invoke(ctx context.Context, args, reply interface{}, opts ...CallOption) error
}

  3. Define a new type Client to take care of the connection pooling and dispatch, similar to what Conn and ClientConn manage today. The Client implements Invoker. It also manages connection selection and lifecycle. Its behavior can be augmented by a set of ClientOption, allowing one to influence dialing, picking, stream instantiation and pooling.

func NewClient(endpoint string, opts ...ClientOption) (*Client, error)

Client may expose a few methods to monitor or bounce connections.

  4. Generated code wouldn't have to change much other than having the type Invoker, rather than *ClientConn. Generated code is no longer tightly coupled to the GRPC implementation. We get functions like the following if we have service Foo:

func NewFooClient(invoker grpc.Invoker) FooClient

One can now write tests of a client without having to mock a service, bind a port and respond over http2.

  5. I would replace the Server type with just a Handler type. Most of the time, we already have a full net/http server running and just want to add GRPC as a handler.

While there are details that need to be worked out, the above would provide a starting point for a more idiomatic package. A better package structure would lead to fewer bugs and more productivity when using GRPC.

The term Dial, in every other Go project, means to create some sort of connection. When this is not the case, other terms, such as New or Open (see database/sql), are used. It is confusing for it to do anything else, and I've seen several developers stumble on this distinction. Furthermore, most of the "dial options" actually affect stream behavior and have nothing to do with dialing.

In grpc-go, I would say "Dial" creates ClientConn to perform rpcs. I want to emphasize that the connection could be an abstraction instead of a real tcp connection. And I think Dial still fits well here.

Some stream behavior options are there to provide a connection-scoped config, so that users do not need to configure every rpc on that connection.

It seems like the model that is followed here is that of database/sql or net/http.Transport. Both of these pool connections behind the scenes but can be composed to provide more sophisticated behavior. For example, one can replace the entire transport implementation for an HTTP client to get interesting behavior. Yes, there are problems with this model, but it should be used to inform the API of this package.

We are talking about different abstractions here. For gRPC-Go, we have
i) ClientConn: contains all the control plane logic (e.g., create and pick Conn) to manage multiple Conn;
ii) Conn: contains all the control plane logic (e.g., reconnect) to manage a single Transport;
iii) Transport: data plane for real data flowing. And yes, you can replace the entire transport impl at that level. One of gRPC's goals is to support various transports (e.g., HTTP2, QUIC, etc.)

Picker (part of ClientConn) is working on top of Conn layer (instead of Transport layer) so that we still have per-transport control logic.

In grpc-go, I would say "Dial" creates ClientConn to perform rpcs. I want to emphasize that the connection could be an abstraction instead of a real tcp connection. And I think Dial still fits well here.

@iamqizhao Respectfully, I think you're completely missing the point here. There is a well-established convention that Dial does a certain thing. The behavior here has caught out _every_ experienced Go programmer I've worked with on GRPC related code.

We are talking about different abstractions here.

We aren't at all. I implore you to go back and thoroughly read my commentary. There are a few tweaks here that can expose the right abstractions. Many of these problems can be alleviated by leveraging the well-thought patterns from the standard library and extending them where needed.

@iamqizhao Respectfully, I think you're completely missing the point here. There is a well-established convention that Dial does a certain thing. The behavior here has caught out every experienced Go programmer I've worked with on GRPC related code.

ok, I would buy this. I probably can tune the code so that once Dial is returned at least 1 network connection is started (or setup if grpc.WithBlock() is set.).

We aren't at all. I implore you to go back and thoroughly read my commentary. There are a few tweaks here that can expose the right abstractions. Many of these problems can be alleviated by leveraging the well-thought patterns from the standard library and extending them where needed.

Thank you very much for your thoughts and contributions here. They are very helpful in finalizing the design. Keep in mind that we have not even finalized the design of the load balancing story for gRPC yet (new issues keep popping up). So this is really a big moving piece right now.

Your proposal seems to be missing some requirements. For example, how can a user add his custom load balancing scheme in your proposal (e.g., he wants to do weighted round-robin on the list of addresses returned by the name resolver)? In my understanding, he needs to create his own invoker implementation (in his own package) but lacks the building blocks because there are no ClientConn and Conn or something similar.
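For reference, the weighted round-robin selection mentioned here is small on its own; the open question in the thread is where it would plug in. The sketch below uses the "smooth" weighted round-robin variant. weightedRR and its methods are hypothetical names, the addresses and weights are assumed example values standing in for name-resolver output, and the type is not safe for concurrent use.

```go
package main

import "fmt"

// backend is one address with a static weight and a running score.
type backend struct {
	addr    string
	weight  int // static weight, assumed positive
	current int // running score used by the algorithm
}

// weightedRR implements smooth weighted round-robin. Hypothetical type.
type weightedRR struct {
	backends []*backend
	total    int
}

// Add registers an address with the given weight.
func (w *weightedRR) Add(addr string, weight int) {
	w.backends = append(w.backends, &backend{addr: addr, weight: weight})
	w.total += weight
}

// Pick raises every score by its weight, selects the highest score,
// then lowers the winner by the total weight. It returns "" if empty.
func (w *weightedRR) Pick() string {
	var best *backend
	for _, b := range w.backends {
		b.current += b.weight
		if best == nil || b.current > best.current {
			best = b
		}
	}
	if best == nil {
		return ""
	}
	best.current -= w.total
	return best.addr
}

func main() {
	w := &weightedRR{}
	w.Add("10.0.0.1:8080", 2)
	w.Add("10.0.0.2:8080", 1)
	for i := 0; i < 6; i++ {
		fmt.Println(w.Pick())
	}
}
```

Over any window whose length equals the total weight, each address is picked as many times as its weight, and the picks are interleaved rather than bunched.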

More importantly, let me clarify some hard requirements here before we proceed with any design proposals: i) we must NOT break any existing user-facing API except the experimental ones; ii) ClientConn or its counterpart cannot be an interface because we do not want to mislead users into feeling they can make their own custom impl. The current Picker was designed under these hard restrictions. To be honest, I am not satisfied with it either and do plan to improve/fix it in the short term.

It seems a GitHub issue is not a good place to discuss design issues like this. I will try to draft my ideas into a google doc and have the discussion there early next week.

It seems a GitHub issue is not a good place to discuss design issues like this. I will try to draft my ideas into a google doc and have the discussion there early next week.

I agree with your sentiment, but let's make sure these issues are discussed in a transparent manner. I do apologize for slightly hijacking this issue, but I believe the problems with load balancing are rooted in problems with the Go API design.

For example, how can a user add his custom load balancing scheme in your proposal (e.g., he wants to do weighted round-robin on the list of addresses returned by the name resolver)? In my understanding, he needs to create his own invoker implementation (in his own package) but lacks the building blocks because there are no ClientConn and Conn or something similar.

The overall approach is to define a clear role between Client, Transport and net.Conn while allowing one to augment dialing (net.Dialer), invocation (grpc.Invoker) and transport instantiation (grpc.Picker or something similar). The augmentation can be achieved by configuring the client, which has a long lifecycle in an application.

My proposal is really the first set of steps that need to be taken to make this package usable for the average Go user. The role of ClientConn falls to Client in the above proposal. A Picker in this proposal ends up sitting between the Client and the Dialer to set up and manage transport state. A Picker ends up looking a lot like some sort of transport allocator, which is kind of what it is doing now, but we make this role explicit.

i) we must NOT break any existing user-facing API except the experimental ones;

I would really hope you would reconsider this or someone may fork. The package, in its current state, has a number of problems and doesn't warrant this level of backwards compatibility guarantee. I'm sorry but the quality just isn't there. I'd be more than willing to update existing code. The cost is much less than having to shim non-idiomatic API usage and instruct others in using unfamiliar APIs. I've wasted days and a thinning reputation for choosing GRPC after running into these problems.

That said, there is most definitely a way to maintain compatibility and bring these changes. ClientConn can implement Invoker and still be an argument for grpc.Invoke. They will just be deprecated. If that is insufficient, a simple rewriter can be implemented that can automatically update code. The touch points are really just the NewXXXClient functions and their generated method implementations. For most code bases I'm working with, the updates would be mostly around connection instantiation and use of the Picker, which is experimental anyways. For most users, I'd expect the impact of API breakages to be minimal if careful changes are made.

ii) ClientConn or its counterpart cannot be an interface because we do not want to mislead users to let them feel they can make their own custom impl.

What is your goal in limiting users' ability to make a custom implementation? Right now, the generated code is only calling grpc.Invoke, which is tightly integrated with the ClientConn and Conn implementations. This should just be an interface so that complexity can be hidden in the actual client. The current behavior makes adding cross-cutting behavior, such as custom logging and metrics, a chore. We either have to add it everywhere, write a new plugin that adds them or fork GRPC.

ok, I would buy this. I probably can tune the code so that once Dial is returned at least 1 network connection is started (or setup if grpc.WithBlock() is set.).

Ultimately, it's probably better not to touch it if you're adamant about not changing it. Dial means to dial a single connection that can be used as such and ClientConn simply can't be used that way. More documentation about the role of ClientConn would probably be the right approach. Right now, it just says it makes a connection, which it doesn't actually do. This should probably be augmented with information about the ClientConn lifecycle and best practices.

I do not like the idea to fuse rpc invoking and connection management together and do not think it is necessary. In my understanding, the core issue bothering you is that the ClientConn, Conn and Picker do not have desirable and clear abstraction and interface. I am going to try to have a proposal to address/improve it. Since the very similar thing works very well inside Google, I do believe this is addressable without ruining the existing user-facing API.

I do not like the idea to fuse rpc invoking and connection management together and do not think it is necessary.

I apologize for not making a very clear point, as my proposal is to decouple them. By declaring an interface, invocation and connection become separate components.

In general, the current design is very tightly fused. A *ClientConn _must_ be passed to a NewXXXClient function. Then, grpc.Invoke accesses private fields on grpc.ClientConn and grpc.Conn.

Since the very similar thing works very well inside Google, I do believe this is addressable without ruining the existing user-facing API.

Google has a lot of internal infrastructure, such as machine-local load balancing, that can help to make this particular abstraction work very well. In the outside world, the environments are not nearly as homogeneous. The interfaces and abstractions must be much more flexible to work in the myriad environments in which GRPC may now find itself. For example, Google may find it acceptable to deploy a separate load balancing process that can manipulate IP tables to route RPC requests, but this may be impossible in another environment.

@iamqizhao just FYI, I'm going to be unfortunately on vacation for our fortnightly sync-up, but just a quick reminder that I pulled out an "invoker" interface without too much trouble here: https://docs.google.com/document/d/1weUMpVfXO2isThsbHU8_AWTjUetHdoFe6ziW0n5ukVg

I apologize for not making a very clear point, as my proposal is to decouple them. By declaring an interface, invocation and connection become separate components.

I meant you put both of them into a single "Client" struct.

Google has a lot of internal infrastructure, such as machine-local load balancing, that can help to make this particular abstraction work very well. In the outside world, the environments are not nearly as homogeneous. The interfaces and abstractions must be much more flexible to work in the myriad environments in which GRPC may now find itself. For example, Google may find it acceptable to deploy a separate load balancing process that can manipulate IP tables to route RPC requests, but this may be impossible in another environment.

Google has remote load balancing too. In our load balancing design, we do not introduce any new models beyond what we have seen inside Google. We are happy to know if there are outliers it does not cover. So far we have not found any.

For example, Google may find it acceptable to deploy a separate load balancing process that can manipulate IP tables to route RPC requests, but this may be impossible in another environment.

Kubernetes has iptables-based balancing, but Borg never did. You'd use organization-wide balancers, managed balancers, private balancers or a client-level balancer.

@iamqizhao Here are the main load balancing models I enumerated above:

  1. Load balancing to an arbitrary host for every request. These are services that are either stateless or have shared state between all instances.
  2. Load balancing to an individual host after resolution. These are services that may have a session state associated with a single instance.
  3. Directing requests to a particular host based on request content. An example might be a queuing service where the Picker (or something) resolves a location for a particular queue. It could also be a key value store where the connection is directed to a particular server where the data is located.
  4. Atomic switchover for a particular "class" of service. Imagine a leader/follower service instructed by a watch in an etcd. When there is a switch between leader/follower, the current requests must be cancelled and a new connection must be made to the right service. It is acceptable to rely on service behavior to support this. For example, a leader that is no longer a leader may throw errors that could be intercepted by the picker.
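
As a rough illustration of how the first three models could sit behind one abstraction, here is a minimal Go sketch. The `Picker` interface and the `roundRobin`, `sticky`, and `byKey` types are hypothetical names made up for this example, not grpc-go API, and the addresses are placeholders. Model 4 (atomic switchover) is left out because it also needs request cancellation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
	"sync/atomic"
)

// Picker is a hypothetical interface (not grpc-go API): given a routing key
// taken from the request, it returns the address of the backend to use.
// All implementations below assume at least one address is configured.
type Picker interface {
	Pick(key string) string
}

// roundRobin covers model 1: any backend may serve any request.
type roundRobin struct {
	addrs []string
	next  atomic.Uint64
}

func (p *roundRobin) Pick(string) string {
	n := p.next.Add(1) - 1
	return p.addrs[n%uint64(len(p.addrs))]
}

// sticky covers model 2: pick once, then stay on that backend.
type sticky struct {
	inner Picker
	once  sync.Once
	addr  string
}

func (p *sticky) Pick(key string) string {
	p.once.Do(func() { p.addr = p.inner.Pick(key) })
	return p.addr
}

// byKey covers model 3: the same key always maps to the same backend.
type byKey struct{ addrs []string }

func (p *byKey) Pick(key string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return p.addrs[h.Sum32()%uint32(len(p.addrs))]
}

func main() {
	addrs := []string{"10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"}
	var rr Picker = &roundRobin{addrs: addrs}
	var st Picker = &sticky{inner: &roundRobin{addrs: addrs}}
	var bk Picker = &byKey{addrs: addrs}
	for i := 0; i < 3; i++ {
		// rr rotates, st repeats its first pick, bk depends only on the key.
		fmt.Println(rr.Pick(""), st.Pick(""), bk.Pick("queue-42"))
	}
}
```

The modulo hashing in `byKey` is the simplest possible choice; it remaps most keys when the address list changes, so a real deployment would probably want consistent hashing or an explicit lookup.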

I assume these are all represented within Google, by your declaration.

I would also think that, in addition to remote load balancing, we'd want to support in-process load balancing, which is the issue at hand. Note that in each one of those classes of load balancing, there may be services that implement it at the application level, which is problematic in the current Go API.

I meant you put both of them into a single "Client" struct.

I don't think I ever proposed struct contents. Client acts like an integration plane in this model. The implementation may immediately dispatch to the Picker or Transport but the user mostly only has to interact with a configured *Client or Invoker.
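
To make the decoupling concrete, here is a small Go sketch of a stub that depends on an interface rather than on a concrete connection type. Everything in it is an assumption for illustration: the `Invoker` interface, the `Failover` composite, `GreeterClient`, and the fake backends are invented names, not grpc-go API and not the exact shape proposed in this thread.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Invoker is a hypothetical interface: the one operation a generated stub
// needs from whatever sits underneath it.
type Invoker interface {
	Invoke(ctx context.Context, method string, req, resp interface{}) error
}

// InvokerFunc adapts a plain function to the Invoker interface.
type InvokerFunc func(ctx context.Context, method string, req, resp interface{}) error

func (f InvokerFunc) Invoke(ctx context.Context, method string, req, resp interface{}) error {
	return f(ctx, method, req, resp)
}

// Failover tries each Invoker in order and returns on the first success.
// It is itself an Invoker, so stubs cannot tell it from a single connection.
type Failover []Invoker

func (f Failover) Invoke(ctx context.Context, method string, req, resp interface{}) error {
	err := errors.New("failover: no backends configured")
	for _, inv := range f {
		if err = inv.Invoke(ctx, method, req, resp); err == nil {
			return nil
		}
	}
	return err
}

// GreeterClient stands in for a generated stub (hypothetical service).
type GreeterClient struct{ inv Invoker }

func (c GreeterClient) SayHello(ctx context.Context, name string) (string, error) {
	var reply string
	err := c.inv.Invoke(ctx, "/demo.Greeter/SayHello", name, &reply)
	return reply, err
}

// fakeBackend simulates a connection that is either up or down.
func fakeBackend(id string, up bool) Invoker {
	return InvokerFunc(func(_ context.Context, _ string, req, resp interface{}) error {
		if !up {
			return errors.New(id + ": unavailable")
		}
		*resp.(*string) = fmt.Sprintf("hello %v from %s", req, id)
		return nil
	})
}

// demo wires a stub to two fake backends, the first of which is down.
func demo() (string, error) {
	client := GreeterClient{inv: Failover{fakeBackend("a", false), fakeBackend("b", true)}}
	return client.SayHello(context.Background(), "gopher")
}

func main() {
	fmt.Println(demo())
}
```

Because `Failover` satisfies the same interface as a single backend, a balancer, a picker, or a proxy could be swapped in without touching the stub.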

FWIW we're working with the nghttp2 developers @tatsuhiro-t to get per-request load balancing working (which is the main use case for client-side load balancing as far as I can tell; we're less concerned about security because everything is firewalled off internally). Having the proxying and load balancing done on the load balancer seems more sane in that it keeps both client code and server code simple.

https://github.com/nghttp2/nghttp2/issues/566
https://github.com/nghttp2/nghttp2/issues/562

@proteneer While per-request load balancing is of interest for client-side load balancing, the more compelling features are security and scalability. From a security perspective, we'd prefer not to have certificates on the load balancers in addition to the endpoints. Operational overhead and key distribution issues arise in this model. Client-side load balancing, while more complex and expensive for each client, also doesn't incur the cost of transiting a load balancer process.

Digressing, the debate on client-side versus out-of-process load balancing is not one I'm looking to have. Both have their merits. I'm hoping that GRPC's Go API can support this position and work in both scenarios based on the requirements of the application.

@iamqizhao An abstraction over picker would be amazing. Would help me move https://github.com/mwitkow/grpc-proxy (which we use in prod) onto a non-hacked version of gRPC :)

@stevvooe Can you send your email address to me ([email protected]) so that I can share some doc with u?

@iamqizhao Sent.

We've been working with tatsuhiro on nghttp2, to the point where it finally has good support for layer 7 round-robin load-balancing (https://github.com/nghttp2/nghttp2/issues/562) on a per-request level!

@iamqizhao can you please send the current amended LB proposal my way as well? :)

doing final polishing and it will be out this week.

@iamqizhao sorry, what is the status of this issue? Is there any code/documentation I can read?

Same here, 😕

@flyingmutant @pires fixed in PR #690

@iamqizhao I tried to dig through this thread and PR #690, but it's not clear what the status of this is. I read the points from @stevvooe and I very much agree with all of them. I was just digging through the godoc and I'm a bit surprised at the design of this; it doesn't seem to fit with the rest of GRPC.

My experience is as follows:
I have two GRPC servers at an endpoint, and I simply want to load balance round-robin between the two of them. If one doesn't respond, remove it from rotation for a backoff period. I search the page for "load balance"; cool, I find Balancer. I look at the interface and say, ugh, whoa, this must be a low-level primitive, and there must be common factories in the Opts() style used by the rest of the package for common configuration elsewhere. I figure there must be a way to create a balancer using one or more already-configured clients, or a []string of targets that share a group of Opts. I look for things that return a balancer, and I see RoundRobin. NICE! Then I see it takes a naming.Resolver that is in a separate package. I understand the purpose of it after looking at the source; it seems a bit overkill for my use case, but okay.

I head back over to take a deeper look at Balancer to see if I can shortcut implementing the resolver, and I'm just met with more confusion. None of the API resembles the GRPC client API. I don't see any correlation to the client API I am familiar with, Dial(target string, opts ...DialOption) (*ClientConn, error), other than Start(target string, config BalancerConfig) error having a target string. It has a different signature than Dial, as it takes a BalancerConfig, which seems like a stripped-down version of the options, providing credentials for the entire balancer? This design constraint prevents using various GRPC backends that may have slightly different connection options or even authentication means: cross-data-center deployments, different creds, client cert auth versus API key, and so on. Maybe you are migrating to a company-wide authentication standard but some data centers have fewer resources than others. These are real-world scenarios that people unfortunately end up with, regardless of arguments about how "correct" they are, and the last person to blame is the guy stuck trying to implement it (we have all been there).

So, as someone who just wants to balance requests to 2-3 basic GRPC servers using DNS, with some fault tolerance if one goes down, I feel like I would be implementing all of the logic for load balancing under the implementation constraints of another author's design in multiple areas. This tax is going to be paid by everyone who is just trying to experiment or use GRPC in a basic POC. My best option to move on is to create a small pool of clients and do my own round robin in a 20 line struct with client := Obtain() defer Recycle(client).
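
For reference, the small pool described above might look roughly like the following Go sketch. It is only an illustration under stated assumptions: `Client` is a stand-in for a dialed connection such as a `*grpc.ClientConn`, and `Recycle` takes an extra `error` argument (not in the one-liner above) so that a failing client can be taken out of rotation for a backoff period.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Client is a stand-in for a dialed connection; the pool only needs
// something to hand out and a place to record when it may be retried.
type Client struct {
	Target    string
	downUntil time.Time
}

// ErrNoClients is returned when every client is currently backing off.
var ErrNoClients = errors.New("pool: every client is backing off")

// Pool hands out clients round-robin and skips any that are backing off.
type Pool struct {
	mu      sync.Mutex
	clients []*Client
	next    int
	backoff time.Duration
}

// NewPool builds a pool with one Client per target (no real dialing here).
func NewPool(backoff time.Duration, targets ...string) *Pool {
	p := &Pool{backoff: backoff}
	for _, t := range targets {
		p.clients = append(p.clients, &Client{Target: t})
	}
	return p
}

// Obtain returns the next client in rotation that is not backing off.
func (p *Pool) Obtain() (*Client, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	now := time.Now()
	for i := 0; i < len(p.clients); i++ {
		c := p.clients[p.next]
		p.next = (p.next + 1) % len(p.clients)
		if !now.Before(c.downUntil) {
			return c, nil
		}
	}
	return nil, ErrNoClients
}

// Recycle reports how the call went; a non-nil err removes the client
// from rotation until the backoff period has elapsed.
func (p *Pool) Recycle(c *Client, err error) {
	if err == nil {
		return
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	c.downUntil = time.Now().Add(p.backoff)
}

func main() {
	pool := NewPool(5*time.Second, "10.0.0.1:50051", "10.0.0.2:50051")
	for i := 0; i < 4; i++ {
		c, err := pool.Obtain()
		if err != nil {
			fmt.Println(err)
			return
		}
		var callErr error
		if i == 0 {
			callErr = errors.New("simulated timeout") // pretend the first call failed
		}
		fmt.Println("using", c.Target, "error:", callErr)
		pool.Recycle(c, callErr)
	}
}
```

When using `defer`, wrap the call in a closure, as in `defer func() { pool.Recycle(c, err) }()`, so that the error is read when the function returns rather than when the `defer` statement runs.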

Doing this, though, I miss all the value of your efforts on the nitty-gritty and edge cases. Let me take a moment to say it's fantastic that the API is so robust and has been carefully engineered. I'm not saying the design is wrong; I am not familiar enough with the code base to make that statement. I'm just saying it's too intimidating to users, people like me.

All I really want to do is:

primaryConn, err := Dial(...)
secondaryConn, err := Dial(...)
rrConn, err = Balance(WithConn(primaryConn), WithConn(secondaryConn))

// WithDial for cohesion with API
rrConn, err = Balance(WithDial(target, WithCredentials()...), ..)

// POC is running, I want to have a separate DC as a lower weight
dc1Conn, err = Balance(WithConn(primaryConnDC1), .. WithWeight(n))
dc2Conn, err = Balance(WithConn(primaryConnDC2), .., WithWeight(y))
rrConn, err = Balance(WithBalancer(dc1Conn), WithBalancer(dc2Conn))

// POC is running, I want to know when endpoints are down for monitoring
notifyFunc := func(... notify info ...) { ... }
rrConn, err = Balance(WithConns(conns), WithNotify(notifyFunc))

Basically, that is what I expected given the fluidity of the rest of GRPC; it covers the basics for 99% of small teams, quick POCs, etc. Good work overall on GRPC though. Please don't take this as negative feedback; I'm just trying to provide a perspective that is difficult for a maintainer to have: that of a dumb user.
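
A runnable approximation of the wished-for API above, written as a Go sketch without any gRPC dependency, could look like this. All of it is hypothetical: `Balance`, `WithConn`, `WithBalancer`, and `WithWeight` do not exist in grpc-go, `Conn` stands in for `*grpc.ClientConn`, and the weighting is a naive "repeat the child N times" scheme chosen for brevity. `WithDial` and `WithNotify` are omitted.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Conn is a stand-in for *grpc.ClientConn so the sketch runs on its own.
type Conn struct{ Target string }

// node is anything a balancer can rotate over: a Conn or a nested Balancer.
type node interface{ pick() *Conn }

func (c *Conn) pick() *Conn { return c }

// Balancer rotates over its slots; weight matters only when it is nested.
type Balancer struct {
	mu     sync.Mutex
	slots  []node
	next   int
	weight int
}

// Option configures a Balancer, mirroring the DialOption style.
type Option func(*Balancer)

// WithConn adds a single connection to the rotation.
func WithConn(c *Conn) Option {
	return func(b *Balancer) { b.slots = append(b.slots, c) }
}

// WithWeight sets how many slots this balancer takes in a parent balancer.
func WithWeight(n int) Option {
	return func(b *Balancer) { b.weight = n }
}

// WithBalancer nests child, repeating it once per unit of its weight.
func WithBalancer(child *Balancer) Option {
	return func(b *Balancer) {
		for i := 0; i < child.weight; i++ {
			b.slots = append(b.slots, child)
		}
	}
}

// Balance builds a round-robin balancer from the given options.
func Balance(opts ...Option) (*Balancer, error) {
	b := &Balancer{weight: 1}
	for _, opt := range opts {
		opt(b)
	}
	if len(b.slots) == 0 {
		return nil, errors.New("balance: no connections configured")
	}
	if b.weight < 1 {
		return nil, errors.New("balance: weight must be at least 1")
	}
	return b, nil
}

func (b *Balancer) pick() *Conn {
	b.mu.Lock()
	n := b.slots[b.next]
	b.next = (b.next + 1) % len(b.slots)
	b.mu.Unlock()
	return n.pick() // may descend into a nested balancer
}

// Pick returns the connection to use for the next request.
func (b *Balancer) Pick() *Conn { return b.pick() }

func main() {
	dc1, _ := Balance(WithConn(&Conn{Target: "dc1-a:50051"}), WithConn(&Conn{Target: "dc1-b:50051"}), WithWeight(2))
	dc2, _ := Balance(WithConn(&Conn{Target: "dc2-a:50051"}))
	rr, err := Balance(WithBalancer(dc1), WithBalancer(dc2))
	if err != nil {
		panic(err)
	}
	for i := 0; i < 6; i++ {
		fmt.Println(rr.Pick().Target)
	}
}
```

With these weights, two of every three picks go to the first data center. The sketch has no health tracking, so it does not remove a failed endpoint from rotation.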

Hi All,

I'm trying to use the name resolution and LB features of gRPC for client-side load balancing. However, it is not working properly.
Below is the construction of the name resolver factory.

public Factory getNameResolverFactory() {

final Attributes NAME_RESOLVER_PARAMS = Attributes.newBuilder()
.set(GrpcNameResolutionLBConstant.RESOLUTION_ATTR, "yeah")
.build();
Attributes attrs = Attributes.newBuilder()
.set(GrpcNameResolutionLBConstant.ATTR_LB_ADDR_AUTHORITY, Constant.HOST + ":" + Constant.PORT)
.build();
final ArrayList EAG = new ArrayList();
SocketAddress addr = new InetSocketAddress(Constant.HOST, Constant.PORT);
EquivalentAddressGroup addrgrp = new EquivalentAddressGroup(addr, attrs);
EAG.add(addrgrp);

final NameResolver.Listener nrlistener = null;

Factory nameResolverFactory = new NameResolver.Factory() {
    @Override
    public NameResolver newNameResolver(URI targetUri, Attributes params) {
        try {
            targetUri = URI.create(Constant.HOST + ":" + Constant.PORT);
            params = NAME_RESOLVER_PARAMS;
        } catch (Exception e) {
            logger.log(Level.SEVERE, "Error: " + e);
        }
        NameResolver nrslvr = new NameResolver() {
            @Override
            public String getServiceAuthority() {
                return Constant.HOST + ":" + Constant.PORT;
            }

            @Override
            public void start(NameResolver.Listener listener) {
                listener = new NameResolver.Listener() {
                    public void onUpdate(List<ResolvedServerInfoGroup> servers, Attributes attributes) {
                        throw new UnsupportedOperationException("Not supported yet.");
                    }

                    public void onAddresses(List<EquivalentAddressGroup> servers, Attributes attributes) {
                        servers = EAG;
                        attributes = NAME_RESOLVER_PARAMS;
                    }

                    public void onError(Status error) {
                        logger.log(Level.SEVERE, "onError called: " + error);
                    }
                };
                listener.onAddresses(EAG, NAME_RESOLVER_PARAMS);
            }

            @Override
            public void shutdown() {
                throw new UnsupportedOperationException("Not supported yet."); //To change body of generated methods, choose Tools | Templates.
            }
        };
        nrslvr.start(nrlistener);
        return nrslvr;
    }

    @Override
    public String getDefaultScheme() {
        return "defaultscheme";
    }
};
return nameResolverFactory;

}

Along with name resolution, I'm using the round-robin load balancer:
RoundRobinLoadBalancerFactory.getInstance()

Things are working fine when I exclude nameResolverFactory.

Can someone help me?

P.S. Using NettyChannelBuilder.

@pvox, this is the grpc-go repo, but it looks like this is about java.

If you're looking for help, you may want to start by asking your question in https://groups.google.com/forum/#!forum/grpc-io. The gRPC team monitors that list.

Thanks @dfawley
