Go: net/http: new HTTP client package

Created on 5 Feb 2018  ·  34 Comments  ·  Source: golang/go

Tracking bug for higher-level more usable HTTP client functionality.

  • take contexts
  • more timeout control per-request, rather than per-transport/client
  • ...Options pattern?
  • decode into JSON/etc easily?
  • treat non-2xx as error (by default?).
  • slurp entire response from server (by default?)
  • control over max response size (some sane default limit when slurping is enabled)

I thought we had a bug for this, but maybe not.

NeedsInvestigation Thinking


All 34 comments

This is a pattern in a test client I wrote:

// or Post, PostForm
r, err := client.Get(addr)
if err != nil {
    panic(err)
}
// do some parsing here then discard the rest
_, err = io.Copy(ioutil.Discard, r.Body)
if err != nil {
    panic(err)
}
err = r.Body.Close()
if err != nil {
    panic(err)
}
if r.StatusCode != http.StatusOK {
    panic(r)
}

Are you looking at adding new identifiers?

type BodyParser func(io.Reader) error

// http.ErrBadStatus for non-2xx, discards unread response body, and closes the connection.
func (c *Client) ClosingGet(url string, resp BodyParser) error

// repeated for the other methods
var t MyStruct
err := c.ClosingGet(addr, http.JSONBodyParser(&t))

Are you looking at adding new identifiers?

There would be new API for this, yes. I don't have a concrete proposal yet.

We wrote a wrapper for the http client internally that uses the ...Options pattern and achieves a lot of these goals:

https://github.com/AlphaFlow/gohttp

GET with parameters:

cli := http.NewClient()

var ret map[string]interface{}
err := cli.Get(context.Background(),
    "http://example.com",
    http.WithParam("debug", "1"),
    http.WithJSONResponse(&ret))

POST with body:

ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
defer cancel()

body := map[string]string{
   "hello": "world",
}
err := cli.Post(ctx, "http://example.com",
    http.WithJSONBody(body),
    http.WithHeader("Authorization", "Bearer my-token"))
if bse, ok := err.(*http.BadStatusError); ok && bse.Code == 404 {
    fmt.Println(bse)
}

Mocking the HTTP request:

    return http.NewMockClient(func(ctx context.Context, req *http.Request) error {
        if e != nil {
            return e
        }
        if len(body) == 0 {
            return nil
        }
        if req.Output != nil {
            _, err := req.Output.Write(body)
            return err
        } else if req.JSONOutput != nil {
            return json.Unmarshal(body, req.JSONOutput)
        } else {
            panic("no place to write response")
        }
    })

This has greatly simplified our error handling/json marshaling.

Quirks

  • Mocking the client using this interface is a bit weird -- the mock needs to know an awful lot about how the code works. This could be simplified a bit, maybe.

  • We inherit the standard Go http client's follow-redirects behavior, when it would be nice for that to be an option.

I've started working on this.

See:

I'll move a lot of that to golang.org/x/exp/http* at some point here.

Any problems with or changes to tracing to consider? Is https://golang.org/pkg/net/http/httptrace/ still the desired interface for tracing if there's a chance for change?

@danp, thanks. I'll include a section about that. It's mostly okay as-is AFAIK and should ideally continue to work but if you have issues with it, do share.

Maybe this would be a good opportunity to fix one thing that really bugs me. Right now, if a request fails due to the context being cancelled, you get a url.Error that wraps the context.Canceled error. That's very annoying when the error is deep inside the call stack, wrapped multiple times with something like github.com/pkg/errors. I don't know what the convention is here, but it looks to me like context.Canceled should be returned directly, without any wrappers, from every call that accepts a context.

And in general, it's weird that the http package uses url.Error, and that the latter is even in the url package. The package description says "Package url parses URLs and implements query escaping", and everything in it apart from url.Error does relate to URL parsing. Maybe the http client should provide its own error type, even if it's similar to the existing url.Error.

Just a while ago I made a small facade w/ sane defaults whose varargs design may be of interest: https://github.com/function61/gokit/tree/master/ezhttp

edit: my idea seems to be quite similar to what @awreece already posted

Example:

res := MyResponseType{}
_, err := ezhttp.Get(
    ctx,
    "https://example.com/api/getstuff",
    ezhttp.AuthBearer("myToken"), // this and below are varargs "config mutators"
    ezhttp.RespondsJson(&res),
    ezhttp.TolerateNon2xxResponse)

More examples in test file

Mutators my current ezhttp implements:

  • Header(key, val)
  • Cookie(cookie)
  • AuthBasic()
  • AuthBearer() // this is common enough use case to justify presence in HTTP package IMO
  • SendJson(ref interface{})
  • SendBody(body io.Reader, contentType string) // for raw data
  • RespondsJson(ref interface{}, allowUnknownFields bool)
  • TolerateNon2xxResponse // by default errors if response not 2xx

I feel like Context as a mechanism for cancelling, and Context as a way to pass data along with a request, probably ought to be different things, not the same object/parameter. I'm not sure whether that's what is meant by noting that Request.WithContext is slow?

@lpar, redesigning context is out of scope for this bug. In practice, you end up always needing both in big systems (for tracing), and passing two context-like things around everywhere is even grosser than passing one. So one it is. That doesn't affect the performance. WithContext is slow because of the defensive garbage it makes. If you could run a request with a context to begin with (Go 1 didn't accept one), then we wouldn't need WithContext.

While I see some examples for using options when sending the request, I often find that in my programs I want to configure those once at startup directly on the client.

That's why we wrote https://github.com/classmarkets/cmhttp.

Example:

client := cmhttp.Decorate(
    http.DefaultClient,
    cmhttp.Scoped("https://api.example.com"),
    cmhttp.Typed("application/json"), // or cmhttp.JSON()
)

req, err := http.NewRequest("GET", "/v1/places/de.berlin", nil)
if err != nil {
    panic(err)
}

resp, err := client.Do(req)

Like the examples of @awreece and @joonas-fi, this uses varargs to let the user pick the options they want, but instead of doing this on a per-request basis it decorates the client. Just thought I might share that here as more inspiration for your redesign :)

Are we open to consider respecting proxy env by default?

Are we open to consider respecting proxy env by default?

Seems reasonable.

The magic-method interfaces (Flusher, Hijacker, Pusher, etc.) are pretty unpleasant: they make the documentation a lot harder to navigate and understand, and make semantically-correct wrappers nearly impossible (you have to write O(2ⁿ) wrapper types for n extension interfaces).

(See also #21904 and #16474.)

Ideally we should not only fold in the existing magic-method interfaces, but also have a plan to avoid the need for them going forward. (Perhaps use structs of functions instead?)

I share many points enumerated in the _Problems_ document and would like to drill a bit more into the point _Client vs. Transport distinction_:

The current separation between _Request_ and _Client_ regularly leads to modifying per-request properties (e.g. query parameters, headers, body, ...), where a builder pattern with late/lazy error handling would be nice, but also to modifying client-scope properties (e.g. base URL for single-host API endpoints, user-agent, timeouts, outgoing-request and incoming-response logging, request-id or tracing-span tracking, API token and refresh logic, retry logic, ...). All of these client modifications have to be done in a nested way, similar to a server's middleware chaining, using the _Transport_, while being careful not to mix up the _Transport_ struct and the _RoundTripper_ interface.

Is the idea of the _Handler_ to solve these client modifications, or maybe even to combine request and client modifications?

As a real-world example (_Experience Reports_?), we use the following pattern for client/_Transport_ middleware:

func NewClient(logger *log.Logger, opts ...TransportOption) *http.Client {
    c := DefaultClient()

    c.Transport = NewTransport(opts...)
    c.Transport = NewLoggingRoundTripper(c.Transport, logger)
    c.Transport = NewSpanIDRoundTripper(c.Transport)

    return c
}

// DefaultClient returns a new http.Client with similar default values to
// http.Client, but with a non-shared Transport, idle connections disabled, and
// keepalives disabled.
func DefaultClient() *http.Client {
    return &http.Client{
        Transport: DefaultTransport(),
    }
}

And the following pattern for _Request_ building:

func NewClient(httpClient *http.Client, opts ...RequestOption) *Client {
    return &Client{
        HTTPClient:  httpClient,
        RequestOpts: opts,
    }
}

func (c *Client) WithOptions(opts ...RequestOption) *Client {
    c.RequestOpts = append(c.RequestOpts, opts...)
    return c
}

func (c *Client) WithContext(ctx context.Context) *Client {
    c.RequestOpts = append(c.RequestOpts, WithContext(ctx))
    return c
}

func (c *Client) WithBaseURL(u string) *Client {
    c.RequestOpts = append(c.RequestOpts, WithBaseURL(u))
    return c
}

func (c *Client) WithHeader(header, value string) *Client {
    c.RequestOpts = append(c.RequestOpts, WithHeader(header, value))
    return c
}

func (c *Client) Do(req *Request, opts ...RequestOption) (*http.Response, error) {
    if c.err != nil { // abort if already in error state
        return nil, c.err
    }

    oo := c.RequestOpts
    oo = append(oo, req.Opts...)
    oo = append(oo, opts...)

    for _, o := range oo {
        if req.err != nil { // abort if already in error state
            return nil, req.err
        }

        o(req)
    }

    if req.err != nil { // abort if already in error state
        return nil, req.err
    }

    return c.HTTPClient.Do(req.Request)
}

We also make heavy use of Transport customisation, much like @djui; we use it to implement caching and client instrumentation, for example. I agree that better control over a single request is needed, especially for timeouts, but I think a way to configure the client (I am not necessarily attached to RoundTripper, though) should be kept in the final design.

@bcmills, I agree, but this is a bug about the HTTP client, not server. I've pushed back on optional interfaces for some time now in all packages, though, and don't plan on using them here.

Dear @bradfitz my wishlist:

1) the current internal net and net/http should expose an API for building your own connection pool and connection handling, which can be plugged into a new client/server as configuration, similar to the current Transport. Use case: replacing the request queue with a stack, for example.
2) exported error types, instead of return fmt.Errorf(..)
3) document the net error Temporary() and Timeout() methods, so that you can reliably distinguish a connection error that happened during connection establishment from one that happened after you wrote HTTP data to the socket. This is quite important for retries in proxy servers.
4) a working h2c implementation that can be easily enabled, because internally you often don't need the complexity of a rotating TLS infrastructure, which is a beast in itself. I tried to use the internal/ directory to hack around this, but it's quite complex to do, if even possible.

I know some of these might not be httpclient related, but it would be really nice if you could fix or enable some of them while changing the httpclient.

A feature of Python's requests and of some Ruby HTTP clients I have seen is setting up authentication once and specifying a base URL to combine with paths (e.g. for different host endpoints for stage vs. prod). I don't think these should be direct features of this library, but it should be designed so that it's easy to wrap the client with something that modifies requests before sending them. With that, a lot of API client libraries out there could be even thinner wrappers than they are now, just providing a set of JSON structs and a function that takes authentication keys and returns a working httpclient.

What I don't understand about the http client is why each request must have a full URL with a domain name?

Why not,

cl, err := http.NewClient("http://example.com")

rawsock, err := cl.Connect(addr)
res, err := cl.Get("/api", qparams)
res, err = cl.Post("/api/post?k=v", nil, body)

// And so on...

?

@shuLhan Your example seems to hint at a single-host client, which I would consider a specialization, to which optimizations can be done, especially regarding connection pooling. The provided HTTP client should be generic and stateless between requests, imho.

@djui True, but it could also work on the return value of Connect() and hint at some general optimizations that could be done in protocol-aware connections, for example keep-alive in the case of HTTP/1.1. Not sure if that makes sense or if I'm missing something here.

Difficulty with the magic methods and the lack of h2c are big thorns in the existing implementation. Cleaner interfaces and support for server handler middleware would also be good.

Please provide a way to write to the request body. One example of where this is useful is writing a multipart request body from large files stored on disk. It's possible to get a writer on the request body using io.Pipe and a goroutine, but it would be nice if the client handled these details, or if the client plumbed a writer more directly to the network.

Here's a proposed API for the bradfitz/exp-httpclient:

// BodyFunc specifies the request body using a function.
// The HTTP client calls the function to write the request
// body to an io.Writer. The function may be called zero, one,
// or multiple times (for example, if a request is replayed).
func (r *Request) BodyFunc(fn func(io.Writer) error) *Request {
    panic("TODO")
}

better multipart support out of the box

@shuLhan Your example seems to hint at a single-host client, which I would consider a specialization, to which optimizations can be done, especially regarding connection pooling.

Please forgive my ignorance, but isn't that what _a_ client should do? Unless we are writing a browser, I can't think of any use case where one client can or should handle connections to several hosts (except redirects).

From my point of view (and I am not familiar with the underlying HTTP client code), there are two layers of abstraction that the current HTTP client is trying to handle: the client layer and the connection pooling/management layer; and IMO they should be separated. Case in point: for databases, there is an API to create a single client connection and an API to create a connection pool. Which one the user wants depends on the use case.

The provided HTTP client should be generic and stateless between requests, imho.

_Generic_. There have been eight HTTP methods since RFC 7231 (not counting PATCH), and I don't think that will change in the next ten years. The only things that are dynamic in a client are the request query parameters, headers, and body; but, as bradfitz already mentioned, the Request object contains fields required by both client and server, which costs allocated struct size when working only on the client side.

_Stateless between requests_. I am not sure what that means, since HTTP is built on top of TCP and the current HTTP client is synchronous.

@shuLhan I guess I was thinking of other use cases such as scrapers or simpler programs/scripts.

Not sure if the httpclient is the right place to address this, but if you just spawn a bunch of goroutines to do a bunch of HTTP requests, you can easily eat through your file descriptor limit and crash your program. Would a concurrency limit be appropriate in the new httpclient? If not, how could the client make it easy to add a semaphore wrapper?

@carlmjohnson doesn't the existing http client already give you all the necessary dials? You can limit keep-alives and concurrent connections to prevent file descriptor exhaustion.

You can set the Transport's MaxConnsPerHost, but AFAIK there's no way to set an overall limit if you're scraping multiple hosts. I could be wrong.

@shuLhan Your example seems to hint at a single-host client, which I would consider a specialization, to which optimizations can be done, especially regarding connection pooling.

Please forgive my ignorance, but isn't that what _a_ client should do? Unless we are writing a browser, I can't think of any use case where one client can or should handle connections to several hosts (except redirects).

@shuLhan
Think about a reverse proxy that handles multiple backends.
Of course you don't want a client for each backend, and even if that made sense, you would either need to fix https://github.com/golang/go/issues/23427 or give each http client its own goroutine to re-do DNS requests after a while. I am also not sure about the memory footprint of an http client, which could grow when running a reverse proxy in Go.

I just had another idea for the http client usage.
What about a usage like this:

response, err := http.Config({}).Client({}).Request({}).Do({})
cfg := http.Config({Version: http.Version11, Transport: {}, ConnectionPool: {},...})
// or use a default and override
cfg2 := http.DefaultConfig()
cfg2.Transport = ...
cfg2.Client({})

http.Client({ResponseTimeout: 3*time.Second, ...})
cli := http.DefaultClient()
cli.ResponseTimeout = 5*time.Second

Do(
    OnData: func() {},
    // ...handlers...
) (Response, error)

Maybe handlers to help pipeline large response bodies, which would be memory-efficient. Think of a list of Pods where you need only a fraction of the data for your program to process. We could easily enable users to use json Encode/Decode, rather than reading the full body and using Marshal/Unmarshal.

Error handling would need to be done by returning nil, checking for nil on the receiver itself, and doing a no-op if it's nil:

// http.Config({}).Client({}).Request({}).Do({})
func Config(cfg *Cfg) *Cfg {
    if cfg == nil {
        return &DefaultConfig
    }
    return cfg
}

func (cfg *Cfg) Client(cli *Client) *Client {
    if cfg == nil { // this would be done in all following func receivers to do "error checking"
        return nil
    }
    // ... default cli ...
}

...

One more thing I realized today: I would like to have the client connection pool data accessible for observability. For example: how many idle connections do we have per host, and in total?

@bradfitz any update or timeline on this?
