Crystal: Redesign HTTP::Client to implement missing features

Created on 25 Apr 2018 · 15 comments · Source: crystal-lang/crystal

HTTP::Client is missing some quite essential features, and while there are individual issues about some of them, it is better to discuss the necessary design changes in a central place.

Some missing features:

  • Proxy support #2963
  • Connecting to Unix socket #2735
  • Transparently re-use connections to different servers
  • HTTP/2 #2125
  • Mock connections (for testing)
  • Following redirects #2721
  • Session handling #3081
  • Integration of custom middlewares (for example authentication)
  • Timeout
  • Stream debugging #6335

The first group is about adding more flexibility in how connections are established and used. This requires separating the connection-establishment logic from HTTP::Client itself.

#6001 is an incomplete experiment for that.

The concept of using a configurable transport is employed by many other implementations of HTTP clients:

  • Go: RoundTripper (interface request -> response) and Transport (implementation, including proxy, reusing connections, HTTP/2 and a lot of basic HTTP workflow like redirects).
  • Java: com.google.api.client.http.HttpTransport: builds a request (method, url -> request) and request.execute() gets the response.
  • Ruby: Faraday implements a middleware interface similar to HTTP::Handler.
  • Ruby: Hurley has a connection property which is a lambda (request -> response).

It makes great sense for HTTP::Client to be a high-level interface that delegates the heavy lifting to a replaceable low-level API. Users should be able to configure HTTP::Client to use a specific transport.

The default transport should allow connecting to any TCP server and maintain a connection pool to re-use idle sockets, even beyond the life span of a single client instance. Other transports could connect through a proxy, to a Unix socket, or just mock a connection for testing purposes.
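To illustrate, here is a rough sketch of how that could look from the user's perspective. This is only a possible shape of the API: the transport argument and the transport class names below are hypothetical and do not exist in the stdlib.

# Hypothetical usage: the client API stays the same, only the transport changes.
client = HTTP::Client.new("example.com")  # default TCP transport with pooling
client = HTTP::Client.new("example.com", transport: ProxyTransport.new("proxy.local", 8080))
client = HTTP::Client.new("example.com", transport: UnixSocketTransport.new("/var/run/docker.sock"))
client = HTTP::Client.new("example.com", transport: MockTransport.new)  # for tests, no real socket
response = client.get("/")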

It is an open question how the interface between HTTP::Client and the transport should be designed. In the experimental PR #6001 the transport returns an IO for a specific request, but the client still writes the request and reads the response.
I tend more towards making the interface as simple as Request -> Response. This would obviously move most of the implementation of HTTP::Client to the transport, but I don't think moving the internal heavy lifting to the lower level is a bad thing. The interface is also more versatile because it allows the transport to manipulate the request before writing and the response after reading. This is necessary for proxy requests, for example, because they need the full URL as the resource. An additional benefit is that requests and responses don't necessarily need to be sent through an IO. A transport could, for example, implement an in-memory client cache or directly create responses for non-HTTP protocols such as file://.
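A minimal sketch of what such a Request -> Response interface could look like, together with a mock transport for testing. All names here are hypothetical, not existing stdlib API.

require "http"

# Hypothetical abstraction: a transport is anything that turns a request into a response.
abstract class HTTP::Transport
  abstract def call(request : HTTP::Request) : HTTP::Client::Response
end

# A mock transport: no socket involved, responses are created directly in memory.
class MockTransport < HTTP::Transport
  def call(request : HTTP::Request) : HTTP::Client::Response
    HTTP::Client::Response.new(200, "stubbed response for #{request.resource}")
  end
end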

Such a simple interface would also make it easy to chain transports for more flexibility, and might be a good place to inject middleware.
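For example, following redirects could be implemented as a transport that wraps another transport. This builds on the hypothetical Transport sketch above and is only meant to show the chaining idea.

# Hypothetical: a chained transport that follows redirects by delegating to an inner transport.
class RedirectTransport < HTTP::Transport
  def initialize(@inner : HTTP::Transport, @max_redirects = 5)
  end

  def call(request : HTTP::Request) : HTTP::Client::Response
    response = @inner.call(request)
    @max_redirects.times do
      break unless response.status.redirection?
      location = response.headers["Location"]?
      break unless location
      request = HTTP::Request.new(request.method, location, request.headers)
      response = @inner.call(request)
    end
    response
  end
end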

This brings us to the second group of missing, more high-level features. Some HTTP clients such as Faraday allow injecting custom middleware into the request-response chain, which can implement, for example, authentication or following redirects.

I am convinced that essential features such as following redirects, basic auth etc. should not require a middleware handler, but should be implemented directly in either the low- or high-level API. They need to be optional and configurable, but I wouldn't like to see an HTTP::Client setup using a herd of middleware handlers just to provide basic HTTP. Middleware should only be used for things built on top (for example OAuth) that can't reasonably be included in the default client implementation.

Related: #1330 (socket factory as transport)

Labels: feature, discussion, stdlib

Most helpful comment

Not yet, unfortunately. But I'd like to move on with this. Possibly for 0.29.0.

All 15 comments

Yes, that proxy support would be amazing!

@straight-shoota Just a suggestion: can you use checkboxes instead of bullet points in your OP? It will make it easier to follow the progress on individual points, instead of having to check each item manually.

BTW, really love this list of proposed improvements. Proxy and HTTP/2 support are the biggest ones IMO. For basic auth, I am more used to the idea of using middleware, à la Node, but I will be fine with handling it in a separate layer.

@rishavs We're currently talking about the rough design. The implementation of individual features is still a long way off.

But I'd like to see this discussion moving forward.

In my opinion, the exposed API of HTTP::Client needs to be dead simple but built on a complex backend. It should be as easy as running curl or accessing a website in a browser. In almost all cases, the application doesn't care about how an HTTP response is acquired. Does it create a new TCP connection or re-use an existing one? Does it speak plain HTTP/1.1 or multiplex over HTTP/2, or at some point even HTTP/3? HTTP::Client should simply use the best available method supported by the server.
This is obviously a very ambitious goal and will take a long time to reach, most likely until after 1.0.

Maybe this complex implementation doesn't even need to be the default, or even be included in the stdlib at all. But the stdlib HTTP::Client API should be designed in such a way that the transport adapter is easily exchangeable. Its interface should be simple (Request -> Response) and shouldn't assume anything about connection details. This makes it easy to improve the implementation step by step. The starting point would be a more or less simple, one-shot connection implementation for TCP/IP. It might even keep the connection alive and reuse it if the next request is for the same host; otherwise a new connection would be established. Proxy and Unix socket support are further, relatively easy improvements. A bigger step would later be a reusable connection pool and HTTP/2, and eventually even HTTP/3.
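As a rough sketch of that first step, based on the hypothetical Transport interface from the sketches above and deliberately simplified (plain HTTP on port 80, no TLS, no error handling, keyed only by the Host header):

require "socket"

# Hypothetical: keep one socket open and reuse it only when the next request
# targets the same host; otherwise close it and reconnect.
class SimpleTransport < HTTP::Transport
  @socket : TCPSocket?
  @current_host : String?

  def call(request : HTTP::Request) : HTTP::Client::Response
    host = request.headers["Host"]? || raise "request needs a Host header"
    socket = @socket
    if socket.nil? || socket.closed? || @current_host != host
      socket.try &.close
      socket = TCPSocket.new(host, 80)
      @socket = socket
      @current_host = host
    end
    request.to_io(socket)
    socket.flush
    HTTP::Client::Response.from_io(socket)
  end
end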

Has there been any movement recently on UNIX socket support?

Not yet, unfortunately. But I'd like to move on with this. Possibly for 0.29.0.

But I'd like to see this discussion moving forward.

+1 from me (a happy crystal user who started missing persistent HTTP connections today 😞).

It should be as easy as running curl

Perhaps it could make sense to rebuild HTTP::Client on top of libcurl?

I don't know how the core team would feel about introducing an external dependency for something as central as HTTP, but in my experience I end up reaching for the curl binding in most languages most of the time anyway (e.g. typhoeus or PycURL), because the stdlib client usually lacks features and/or performance.

@m-o-e I think this has been brought up in the discussion before, and it probably won't work to integrate libcurl into Crystal's event loop.

@straight-shoota Ah okay, understood. Sorry for the duplicate!

It would be great to have outgoing IP binding in HTTP::Client, which TCPSocket already supports. It's really useful when working with APIs that have req/hour limits based on IP.

Are any of these, part of the 1.0 plan?

@rishavs no

In almost all cases, the application doesn't care about how a HTTP response is acquired. Does it create a new TCP connection or re-use an existing one?

For this specific point (reusing connections) I think the application should be aware of what is happening. Sockets are a limited resource and apps should control how they use them. Coming from Python (where the most used HTTP lib is requests), I think a concept of a session would be good:

session = HTTP::Session.new
session.get("https://crystal-lang.org")

Inside a session, calls reuse connections (if possible), but simple calls with HTTP::Client should not.

Most apps really don't need to care about that. It's good to have a low-level API for managing connections manually. But for typical use cases you just want to send HTTP requests from different parts of an application without having to drag a session object around. HTTP connection management can easily be abstracted away.
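For illustration, here is one way such an abstraction could look from the outside: a process-wide registry of clients keyed by host, so convenience calls from anywhere in the app transparently reuse keep-alive connections. This is a hypothetical sketch (keyed by host only, no locking, no eviction), not a proposal for the actual implementation.

require "http/client"

module HTTP::Pooled
  @@clients = {} of String => HTTP::Client

  def self.get(url : String) : HTTP::Client::Response
    uri = URI.parse(url)
    # Reuse an existing client (and its keep-alive socket) for this host, or create one.
    client = @@clients[uri.host.not_nil!] ||= HTTP::Client.new(uri)
    client.get(uri.request_target)
  end
end

HTTP::Pooled.get("https://crystal-lang.org/")
HTTP::Pooled.get("https://crystal-lang.org/docs")  # reuses the same socket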

But if we do that, I am worried about an app leaking sockets without having control over them…

I went to look at how others do it: Ruby doesn't do it by default, Python doesn't either, and Go does it by default (you must instantiate a client to change it).

So OK, why not: the default behavior would be to persist connections, and if you want to change it you would need to instantiate a client object and change its parameters.
