Etcd: Setting cfg.DialTimeout makes clientv3.New blocking

Created on 10 Jun 2018 · 6 Comments · Source: etcd-io/etcd

Constructing a client with clientv3.New() without cfg.DialTimeout set is a non-blocking call if the etcd server is not available.

https://github.com/coreos/etcd/blob/bb744f6d2b2715ec4aac696e0de51c75955334a8/clientv3/client.go#L427-L448

It is very counter-intuitive that setting DialTimeout makes New() a blocking call (and requires the server to be available).

Is there a reason that clientv3.New() must behave differently when DialTimeout is set? I would expect the dial timeout to take effect whenever dialing, but not to force dialing (and block until failure) in New(). The documentation for that config option does not mention this behavior change either:

https://github.com/coreos/etcd/blob/bb744f6d2b2715ec4aac696e0de51c75955334a8/clientv3/config.go#L33-L34

We found this unexpected behavior in an application which did the following:

  • set up clients
  • started embedded etcd
  • passed the constructed clients to workers

In this scenario, without DialTimeout set, things worked fine. With DialTimeout set, the application failed because the blocking New() client construction prevented the embedded server from starting.

Obviously, we can reorder to start the embedded server first, but making New() blocking still slows down startup, since we construct several clients that previously did their work in the background. Now they all block until their balancer is ready, which happens serially.

area/clientv3 area/doc

All 6 comments

@liggitt

> Is there a reason that clientv3.New() must behave differently when DialTimeout is set?

We wanted to simulate grpc.WithBlock in the etcd layer. The current behavior is: 1) with no dial timeout, block forever until the connection is up, and 2) with a dial timeout, wait up to a fixed amount of time until the connection is up.

> The documentation for that config option does not mention this behavior change either:

Agreed that this should be documented better. We are almost done with the client v3 balancer rewrite and will make sure to document this when it gets merged.

> The current behavior is: 1) with no dial timeout, block forever until the connection is up, and 2) with a dial timeout, wait up to a fixed amount of time until the connection is up.

The current behavior of clientv3.New is "no dial timeout means New doesn't block and connection attempt happens in the background".

> "no dial timeout means New doesn't block and connection attempt happens in the background"

Yes, this is accurate. Will improve our docs. Thanks!

The initial issue described that New() with a DialTimeout led to a blocking call, while New() without a DialTimeout led to no blocking.

I'm currently experiencing a problem where I do use a DialTimeout, but the behaviour is completely different from the description:

  1. The code doesn't block at the New() call, but instead in the following cli.Status() call
  2. The code blocks indefinitely, so the DialTimeout of 2 seconds doesn't seem to have any effect

This is all while I have no etcd instance running. As soon as I start one, everything works fine. But I need to be able to handle the case that my etcd instance is down, preferably without having to manually use a goroutine and timer to do this.

Versions:

git log first entry in "src\go.etcd.io\etcd": Commit ae25c5e1320f731a2ffaafbf756aca8b0a94dfab

Code to reproduce:

package main

import (
    "context"
    "fmt"
    "time"

    "go.etcd.io/etcd/clientv3"
)

func main() {
    config := clientv3.Config{
        Endpoints:   []string{"localhost:2379"},
        DialTimeout: 2 * time.Second,
    }
    cli, err := clientv3.New(config)
    if err != nil {
        fmt.Println("connection error 1")
        return // cli is nil here, so do not continue
    }
    defer cli.Close()
    fmt.Println("got the client")
    statusRes, err := cli.Status(context.Background(), "localhost:2379") // Waits here indefinitely
    if err != nil || statusRes == nil {
        fmt.Println("connection error 2")
        return
    }
    fmt.Println("connection ok")
}

Note: I'm a Go beginner

In case others have the same problem, here's a workaround that calls the etcd client code in a goroutine, returning the client via channel, and utilizes a timeout in case the etcd client code blocks forever:

package main

import (
    "context"
    "errors"
    "time"

    "go.etcd.io/etcd/clientv3"
)

func main() {
    cli, err := NewClient()
    if err != nil {
        panic(err)
    }
    _, err = cli.Put(context.Background(), "foo", "bar")
    if err != nil {
        panic(err)
    }
}

// NewClient creates a new etcd client and takes care of a timeout
func NewClient() (*clientv3.Client, error) {
    // The behaviour for New() seems to be inconsistent.
    // It should block at most for the specified time in DialTimeout.
    // In our case though New() doesn't block, but instead the following call does.
    // Maybe it's just the specific version we're using.
    // See https://github.com/etcd-io/etcd/issues/9829#issuecomment-438434795.
    // Use own timeout as workaround.
    // TODO: Remove workaround after etcd behaviour has been fixed or clarified.

    config := clientv3.Config{
        Endpoints:   []string{"localhost:2379"},
        DialTimeout: 2 * time.Second,
    }

    errChan := make(chan error, 1)
    cliChan := make(chan *clientv3.Client, 1)

    go func() {
        cli, err := clientv3.New(config)
        if err != nil {
            errChan <- err
            return
        }
        statusRes, err := cli.Status(context.Background(), config.Endpoints[0])
        if err != nil {
            cli.Close() // release the client's resources before reporting the failure
            errChan <- err
            return
        }
        if statusRes == nil {
            cli.Close()
            errChan <- errors.New("the status response from etcd was nil")
            return
        }
        cliChan <- cli
    }()

    select {
    case err := <-errChan:
        return nil, err
    case cli := <-cliChan:
        return cli, nil
    case <-time.After(3 * time.Second):
        return nil, errors.New("timed out while trying to connect to the etcd server")
    }
}