Hyper: Hyper client is permanently broken after "Too many open files" error

Created on 21 Jan 2018 · 11 comments · Source: hyperium/hyper

If you make many parallel requests with a Hyper client, you can run into "Too many open files" operating system errors. Once such an error occurs, the Hyper client is "tainted" and cannot make a successful request anymore, even after enough file descriptors are available again.

Steps to reproduce:

1. Have a server such as Apache running on localhost port 80.
2. Limit the number of allowed open file descriptors with ulimit -n 50.
3. Run the following program:

extern crate futures;
extern crate hyper;
extern crate tokio_core;

use hyper::{Client, Uri};
use futures::future::{join_all, loop_fn, Future, Loop};
use tokio_core::reactor::Core;

fn main() {
    let mut core = Core::new().unwrap();
    let client = Client::new(&core.handle());

    let url: Uri = ("http://localhost/").parse().unwrap();

    let nr_requests = 30_000;
    let concurrency = 1000;

    let mut parallel = Vec::new();
    for _i in 0..concurrency {
        let requests_til_done = loop_fn(0, |counter| {
            client
                .get(url.clone())
                .then(move |_| -> Result<_, hyper::Error> {
                    if counter < (nr_requests / concurrency) {
                        Ok(Loop::Continue(counter + 1))
                    } else {
                        Ok(Loop::Break(counter))
                    }
                })
        });
        parallel.push(requests_til_done);
    }

    let work = join_all(parallel);
    core.run(work).unwrap();

    let work = client.get(url.clone()).map(|res| {
        println!("Response: {}", res.status());
    });
    core.run(work).unwrap();
}

Although the flood of parallel requests has finished once the first core.run() returns, the second core.run() panics with the error Io(Os { code: 24, kind: Other, message: "Too many open files" }). It should not fail, because enough file descriptors are available by then.

This seems to be a sister problem to #1358, where the same happens when a Hyper server runs out of available file descriptors.

I think this is an underlying Tokio problem, but I could not track it down yet. Any tips on how to use a Hyper client robustly to avoid this? My use case is a proxy server, where I don't want to spawn new client Tokio event loops all the time just because I ran out of file descriptors at some point.


klausi


Most helpful comment

I can now reproduce the problem with the hello.rs Hyper server. The client works fine if you run the program from the OP with the URL http://127.0.0.1:3000/, but it fails as described when the URL is http://localhost:3000/. So it seems to me the DNS lookup code in the Hyper client might be doing something wrong.

At least I'm relieved that this is not an Apache specific problem, sorry for the confusion.

klausi on 28 Jan 2018


All 11 comments

What do you mean the hyper client becomes tainted? Are you sure the sockets had been closed before trying to open a new socket?

You mention that the second call to core.run() panics, but does it panic inside, or is it the unwrap() you have right there? I believe the future from client.get should just return that IO error to you, so you can handle the situation yourself.

seanmonstar on 22 Jan 2018
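Following the suggestion to handle the IO error instead of unwrapping it, here is a minimal std-only sketch (not code from the thread) of telling "Too many open files" apart from other failures. The printed Io(Os { code: 24, ... }) suggests the underlying value is a std::io::Error; the sketch assumes you can get at that value, and that errno 24 means EMFILE, as it does on Linux and macOS. The function names are hypothetical.

```rust
use std::io;

// EMFILE ("Too many open files") is errno 24 on Linux and macOS.
// Assumption: we match on the raw OS error code rather than an ErrorKind.
const EMFILE: i32 = 24;

/// Returns true if the error means the process ran out of file descriptors.
fn is_too_many_open_files(err: &io::Error) -> bool {
    err.raw_os_error() == Some(EMFILE)
}

/// Picks a reaction to a failed request instead of panicking via unwrap().
fn describe(err: &io::Error) -> &'static str {
    if is_too_many_open_files(err) {
        "out of file descriptors: back off and retry later"
    } else {
        "other I/O error: report it"
    }
}
```

The point is only that EMFILE can be treated as a transient condition (back off, retry) rather than a fatal one; what "retry" means for a hyper client is exactly what the rest of this thread is about.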

By tainted I mean that the client is not usable anymore. Performing requests on the tainted client always yields IO errors, even though there should not be any.

Yes, I think the sockets are closed because if I run the same example with a second fresh client then the IO error does not occur:

extern crate futures;
extern crate hyper;
extern crate tokio_core;

use hyper::{Client, Uri};
use futures::future::{join_all, loop_fn, Future, Loop};
use tokio_core::reactor::Core;

fn main() {
    let mut core = Core::new().unwrap();
    let client = Client::new(&core.handle());

    let url: Uri = ("http://localhost/").parse().unwrap();

    let nr_requests = 30_000;
    let concurrency = 1000;

    let mut parallel = Vec::new();
    for _i in 0..concurrency {
        let requests_til_done = loop_fn(0, |counter| {
            client
                .get(url.clone())
                .then(move |_| -> Result<_, hyper::Error> {
                    if counter < (nr_requests / concurrency) {
                        Ok(Loop::Continue(counter + 1))
                    } else {
                        Ok(Loop::Break(counter))
                    }
                })
        });
        parallel.push(requests_til_done);
    }

    let work = join_all(parallel);
    core.run(work).unwrap();

    let mut core2 = Core::new().unwrap();
    let client2 = Client::new(&core2.handle());

    let work = client2.get(url.clone()).map(|res| {
        println!("Response: {}", res.status());
    });
    core2.run(work).unwrap();
}

Instantiating a new core2 and client2 works, there are no IO errors when performing the request.

About the panic: sorry, the first program above panics because of the unwrap(), of course, since I get an IO error that should not be there.

So a primitive workaround is to catch IO errors on Hyper clients, throw the Tokio core and the Hyper client away, create new instances of both, and then perform the requests.

klausi on 26 Jan 2018
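The "throw it away and rebuild" workaround can be sketched generically. This is a hypothetical std-only wrapper, not hyper API: C stands in for whatever bundle you rebuild (here, a tokio_core Core plus a hyper Client), the factory closure builds a fresh one, and an operation that fails with EMFILE (errno 24, assumed Linux/macOS) triggers one rebuild and one retry.

```rust
use std::io;

// EMFILE ("Too many open files") on Linux and macOS.
const EMFILE: i32 = 24;

/// Holds a client-like value and the factory that can recreate it.
/// Hypothetical: in the thread, C would be a Core plus a Client.
struct Rebuildable<C, F: FnMut() -> C> {
    make: F,
    client: C,
    rebuilds: usize,
}

impl<C, F: FnMut() -> C> Rebuildable<C, F> {
    fn new(mut make: F) -> Self {
        let client = make();
        Rebuildable { make, client, rebuilds: 0 }
    }

    /// Runs `op` against the current client. On EMFILE the old client is
    /// dropped, a fresh one is built, and `op` is retried exactly once.
    /// Any other result (success or a different error) is returned as-is.
    fn call<T>(&mut self, mut op: impl FnMut(&mut C) -> io::Result<T>) -> io::Result<T> {
        match op(&mut self.client) {
            Err(ref e) if e.raw_os_error() == Some(EMFILE) => {
                self.client = (self.make)();
                self.rebuilds += 1;
                op(&mut self.client)
            }
            other => other,
        }
    }
}
```

Retrying only once keeps a genuinely exhausted process from looping; a real proxy would probably add a delay before the retry.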

What are you using as a server? I just tried this against the hello world server in hyper (the server did actually fall over from too many open files, but I added a little code to protect it) and didn't see any error...

I do notice that in the loop_fn you use then, which will be given a hyper::Result<Response>, and then drop it. I wonder if that result includes the error as well...

seanmonstar on 27 Jan 2018

I'm using the default Apache installation on Ubuntu 16.04, which listens on localhost port 80 and just delivers a static HTML file from /var/www/html/index.html.

I tried to reproduce this with the hello.rs example from Hyper as well, but in that case the client works as expected. That could mean that Apache does something differently, maybe keeping TCP connections to the client open or similar?

In the loop_fn: Yes, during the request flood the same IO error "Too many open files" starts to appear, I just ignore it there. I know that during the flood this error can happen. The interesting part is that once the flood is over and I send a single request with the same client to Apache it still errors.

klausi on 28 Jan 2018


Thanks to knowing it was DNS related, I've done a bunch of digging and determined that the EMFILE seems to be remembered by subsequent address lookups on the same thread. I don't yet know if this is some cached info in getaddrinfo, or related to the libc::res_init call when the resolution fails. Sharing the same single-thread CpuPool, even with a new client, triggers the error, but creating a new pool for the second client avoids it.

seanmonstar on 29 Jan 2018
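Since a fresh CpuPool thread did not see the stale error, one illustrative (and expensive) mitigation is to run each lookup on a brand-new OS thread. This is a std-only sketch, not something hyper does; it assumes the remembered EMFILE lives in per-thread resolver state, and the function name is hypothetical.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};
use std::thread;

/// Resolves `host:port` on a newly spawned thread.
/// Assumption: the stale EMFILE is per-thread state, so a thread that never
/// saw the failure starts clean. One thread per lookup is costly; this only
/// illustrates the idea.
fn resolve_on_fresh_thread(host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
    let host = host.to_owned();
    // Builder::spawn returns an io::Result instead of panicking when the
    // OS refuses to create a thread (which can also happen under EMFILE).
    let handle = thread::Builder::new()
        .name("dns-lookup".into())
        .spawn(move || {
            (host.as_str(), port)
                .to_socket_addrs()
                .map(|addrs| addrs.collect::<Vec<_>>())
        })?;
    handle
        .join()
        .unwrap_or_else(|_| Err(io::Error::new(io::ErrorKind::Other, "resolver thread panicked")))
}
```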

I'll see if I can reproduce this with just std (unless someone would like to beat me to it), and if so, I'll file an issue on the Rust repo.

seanmonstar on 1 Feb 2018
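A std-only reproduction along the lines Sean describes could look roughly like the sketch below. It is an illustration, not the program filed upstream: it assumes a Unix system (it opens /dev/null to use up descriptors), errno 24 for EMFILE, and made-up helper names. For the real experiment, run it under ulimit -n 50 and pass a hostname such as localhost with a large cap; on an affected libc the second lookup would reportedly still fail.

```rust
use std::fs::File;
use std::io;
use std::net::ToSocketAddrs;

// EMFILE ("Too many open files") is errno 24 on Linux and macOS.
const EMFILE: i32 = 24;

/// Opens `path` repeatedly until the process hits EMFILE or `cap` handles
/// are open. Returns the handles (dropping them frees the descriptors) and
/// whether the EMFILE limit was actually reached.
fn exhaust_fds(path: &str, cap: usize) -> (Vec<File>, bool) {
    let mut files = Vec::new();
    while files.len() < cap {
        match File::open(path) {
            Ok(f) => files.push(f),
            Err(e) if e.raw_os_error() == Some(EMFILE) => return (files, true),
            Err(_) => break, // some other failure: stop without claiming EMFILE
        }
    }
    (files, false)
}

/// Resolves `host` on port 80 while descriptors are exhausted, then again
/// after releasing them, returning the number of addresses each time.
fn lookup_during_and_after(host: &str, cap: usize) -> (io::Result<usize>, io::Result<usize>) {
    let count = |h: &str| (h, 80u16).to_socket_addrs().map(|addrs| addrs.count());
    let (files, _hit_limit) = exhaust_fds("/dev/null", cap);
    let during = count(host);
    drop(files); // release every descriptor opened above
    let after = count(host);
    (during, after)
}
```

With an IP literal such as 127.0.0.1, std parses the address directly instead of calling the system resolver, so both lookups succeed regardless of descriptor pressure.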

Filed at https://github.com/rust-lang/rust/issues/47955

seanmonstar on 2 Feb 2018

Thanks a lot Sean! My workaround for my proxy use case is to hard-code 127.0.0.1 instead of host names for now. That way I avoid dead Hyper clients caused by stale DNS errors.

klausi on 4 Feb 2018
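The hard-coded 127.0.0.1 workaround can be generalized slightly: skip the system resolver whenever the host is already an IP literal or is localhost. A minimal std-only sketch; the function name is hypothetical, and mapping localhost to 127.0.0.1 is an assumption (some systems resolve it to ::1 first).

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Builds a socket address without calling the system resolver when the
/// host is an IP literal or the name "localhost". Returns None when a real
/// DNS lookup would be required.
fn resolve_without_dns(host: &str, port: u16) -> Option<SocketAddr> {
    if let Ok(ip) = host.parse::<IpAddr>() {
        return Some(SocketAddr::new(ip, port));
    }
    // Assumption: localhost means the IPv4 loopback on this machine.
    if host.eq_ignore_ascii_case("localhost") {
        return Some(SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), port));
    }
    None
}
```

For the reproduction program, this amounts to requesting http://127.0.0.1/ instead of http://localhost/, which matches the observation that only the localhost URL failed.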

According to some more info in the upstream bug, it looks to be a bug in some versions of libc. As such, I'm going to close as there's not much more we can do here.

seanmonstar on 11 Feb 2019

Thank you, I just now realized this bugfix even happened on my birthday!!! "Happy birthday to me!" :)
Thank you Thank you Thank you