Rust: No way to refresh DNS information leading to indefinite network failures

Created on 27 Apr 2017 · 10 Comments · Source: rust-lang/rust

Consider the following simple network client:

fn main() {
    use std::thread;
    use std::time::Duration;
    use std::net::TcpStream;

    loop {
        match TcpStream::connect("google.com:80") {
            Ok(_) => {
                println!("connected");
                break;
            }
            Err(e) => {
                println!("failed: {:?}", e);
            }
        }
        thread::sleep(Duration::from_secs(1));
    }
}

This works fine if you run it while your internet connection is up and running. However, if you kill your network connection, it (obviously) does not. What is interesting is if you launch the program while your internet is offline (and crucially, while /etc/resolv.conf does not contain any nameservers), and then connect to the internet again. I would expect the program to eventually say "connected", however this is not the case.

This had me puzzled for a while, until I stumbled on this old issue on the Pidgin bug tracker. It turns out that the set of nameservers available when the program is started is cached, and is never automatically re-read. Instead, res_init must be called manually to refresh the nameserver list. Unfortunately, as far as I can tell, there is no way in Rust to call res_init, and thus the above program simply cannot be made to work in the presence of network failures.

It's not entirely clear what the "right" fix here is: we could simply provide a way to call res_init, or we could do something fancier, like a special connect_uncached that does it for you. Regardless, this seems like a fairly unfortunate shortcoming.

All 10 comments

Seems like a lot of big programs have gone through the pain of re-discovering this issue. Here's Mozilla Firefox from 14 years ago. And more recently, Chef (and Ruby).

An interesting decision from that Mozilla bug report is:

it calls res_init if gethostbyname (or getaddrinfo) fails

That seems pretty reasonable, and maybe something that Rust could do too? Specifically, we should probably do this in lookup_host in sys_common/net.rs, or alternatively in the resolve_socket_addr used in the impl of ToSocketAddr for str. We'd need res_init to be exposed by libc though...
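The "call res_init only when the lookup fails" idea can be sketched independently of any real resolver. This is a minimal illustration, not the code that went into the standard library: lookup_with_reinit is a hypothetical name, and the two closures are stand-ins for a getaddrinfo-based lookup and for a call to res_init (which, as noted above, would first need to be exposed by libc).

```rust
use std::io;

/// Runs `lookup`; if it fails, calls `reinit` once and tries `lookup` again.
///
/// Both parameters are hypothetical hooks supplied by the caller:
/// `lookup` stands in for a getaddrinfo-based resolver and `reinit`
/// stands in for a call to res_init.
fn lookup_with_reinit<T, L, R>(mut lookup: L, reinit: R) -> io::Result<T>
where
    L: FnMut() -> io::Result<T>,
    R: FnOnce(),
{
    match lookup() {
        Ok(addrs) => Ok(addrs),
        Err(_first_error) => {
            // The cached nameserver list may be stale, so refresh the
            // resolver configuration and retry exactly once.
            reinit();
            lookup()
        }
    }
}
```

The successful path pays nothing extra, and a lookup that fails for an unrelated reason costs one additional attempt. The error from the second attempt is the one returned to the caller.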

Opened a PR to libc over at https://github.com/rust-lang/libc/pull/585

Sounds like a reasonable solution to me! (calling res_init on failure)

Thanks for looking into this @jonhoo!

Do you think it'd be better to add this behavior into lookup_host, or in the higher-level resolve_socket_addr?

Nah I think throwing it into lookup_host is fine, that's already a mega "convenience" api

Does anybody have a link for the upstream bug?

I ask because programs, or even the Rust runtime, are definitely not supposed to do this. res_init() is specific to the GNU libc implementation (OK, shared with BSD libc, but not standardized), while getaddrinfo() is POSIX. So use of getaddrinfo() can't depend on the user fiddling with res_init(). And the specification definitely does not say that lookups are expected to stop working if the network connection changes after the program has started.

So either:

  • User is never supposed to change /etc/resolv.conf at runtime and all programs that do that should provide a Name Service Switch module, or DNS proxy, to take care of this, so it is a bug in DHCP-client and Network-Manager, or
  • Changing /etc/resolv.conf is supposed to happen and then it is a bug in GNU LibC not being able to notice it.

@jan-hudec see https://github.com/rust-lang/rust/pull/41582 for some further discussion. This is a bug in glibc (other libc implementations do not have this problem, as they either do not cache or they flush the cache when the set of nameservers changes). It is reported upstream at https://sourceware.org/bugzilla/show_bug.cgi?id=984, but it seems unlikely that a fix will land any time soon.

I would argue strongly against your first point above (further indicating that this is a bug): /etc/resolv.conf can change for many reasons, many of which are not related to the user's actions. For example, the Arch Linux netctl network manager, and many other network managers, modify /etc/resolv.conf in response to network state changes through resolvconf. Yet they have no way of indicating this change to every running application. It is also not feasible to tell everyone to start using NSS, or to run their own DNS proxy (I run neither on my machine, and would not like to).
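For illustration, here is one way a program could notice on its own that /etc/resolv.conf has changed, as a rough sketch of the "flush the cache when the set of nameservers changes" behavior mentioned above. It is not how any particular libc implements it: ConfigWatcher and fingerprint are hypothetical names, and comparing modification time and length is an assumed heuristic that can miss a rewrite which preserves both.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Remembers a cheap fingerprint (modification time and length) of a file
/// such as /etc/resolv.conf, so a caller can tell when it has changed.
/// Hypothetical helper, for illustration only.
struct ConfigWatcher {
    path: PathBuf,
    last_seen: Option<(SystemTime, u64)>,
}

impl ConfigWatcher {
    /// Records the file's current fingerprint as the baseline.
    fn new(path: impl Into<PathBuf>) -> Self {
        let path = path.into();
        let last_seen = fingerprint(&path);
        ConfigWatcher { path, last_seen }
    }

    /// Returns true if the fingerprint differs from the last one seen,
    /// then records the new fingerprint as the baseline.
    fn changed(&mut self) -> bool {
        let current = fingerprint(&self.path);
        let changed = current != self.last_seen;
        self.last_seen = current;
        changed
    }
}

/// Returns None when the file is missing or its metadata is unreadable.
fn fingerprint(path: &Path) -> Option<(SystemTime, u64)> {
    let meta = fs::metadata(path).ok()?;
    Some((meta.modified().ok()?, meta.len()))
}
```

A caller would check changed() before each lookup and refresh the resolver state only when it returns true. A file that appears or disappears also counts as a change, which covers the case of starting with no nameservers configured.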

Oh, that's why I haven't seen the issue for ages: Debian carries a fix for it.

For future reference, this was finally fixed in a recent glibc release. Though this workaround will probably need to be in place for a while longer.
