actix-web 🚀 - Connection not closed correctly

I was wrong with that
With .keep_alive(server::KeepAlive::Timeout(5)) -> All connections are closed correctly

I have some connection that are open for more than 1 day

neowutran on 8 Aug 2018

Master should fix this probables

fafhrd91 on 8 Aug 2018

Ok, trying

neowutran on 9 Aug 2018

do you still see opened connections?

fafhrd91 on 11 Aug 2018

Yes. connections are not closed correctly yet

Source code I used to test: https://git.neowutran.ovh:8080/neowutran/Website/commit/6e1a7ec41c4d521d42a8df81b01c469fe77b4a9c

Script I use to check if connection are closed correctly

[root@xxxxxx]# cat ./sockets_uptime.sh
#!/bin/bash
function suptime() {
    local addr=${1:?Specify the remote IPv4 address}
    local port=${2:?Specify the remote port number}
    local min_time=${3:?Specify the min uptime in seconds for socket}
    local min_time=${min_time%.*}
    # convert the provided address to hex format
    local hex_addr=$(python -c "import socket, struct; print(hex(struct.unpack('<L', socket.inet_aton('$addr'))[0])[2:10].upper().zfill(8))")
    local hex_port=$(python -c "print(hex($port)[2:].upper().zfill(4))")
    # get the PID of the owner process
    local pid=$(netstat -ntp 2>/dev/null | awk '$6 == "ESTABLISHED" && $5 == "'$addr:$port'"{sub("/.*", "", $7); print $7}')
    #local pid=$(netstat -ntp 2>/dev/null | awk '$5 == "'$addr:$port'"{sub("/.*", "", $7); print $7}')
    [ -z "$pid" ] && {
#           echo 'Address does not match' 2>&1;
            return 1;
    }
    # get the inode of the socket
    local inode=$(awk '$4 == "01" && $3 == "'$hex_addr:$hex_port'" {print $10}' /proc/net/tcp)
    [ -z "$inode" ] && { echo 'Cannot lookup the socket' 2>&1; return 1; }
    # query the inode status change time
    local timestamp=$(find /proc/$pid/fd -lname "socket:\[$inode\]" -printf %T@)
    [ -z "$timestamp" ] && { echo 'Cannot fetch the timestamp' 2>&1; return 1; }

    local diff=$(bc <<< "$(date +%s.%N) - $timestamp")
    # truncate to int
    local diff=${diff%.*}
    if (( $diff > $min_time )); then
        # compute the time difference
        LANG=C printf '%s (%.2fs ago)\n' "$(date -d @$timestamp)" $diff
        echo $addr:$port
    fi
}

min_time=${1:?Specify the min uptime in seconds for socket}
ports=${2:?Specify the ports list, like ':8083|:443'}
#ports=':8083|:443'
echo "min uptime in seconds for sockets: $min_time"
netstat -ano | grep -E $ports | sed -e "s/[[:space:]]\+/\t/g" | cut -f5 | while read line
do
        ip=$(echo "$line" | cut -d ':' -f1 )
        port=$(echo "$line" | cut -d ':' -f2)
        suptime $ip $port $min_time

done

(based on https://superuser.com/questions/565991/how-to-determine-the-socket-connection-up-time-on-linux )
usage ./sockets_uptime.sh 3600 ':443' 2> /dev/null
Show socket connected to port 443 that where created more than 1 hour ago.

After ~2 hours, I have 23 sockets not closed correctly
But there is an interesting thing about them:
15 sockets where created exactly at 18:07:13
2 sockets where created exactly at 18:15:32
6 sockets where created exactly at 18:26:28
The process was started at 18:05:33

Some "burst" of sockets not closed.

The bad thing: I am unable to reproduce this issue without a heavy load ( port 443, the service is mainly used to download big files like this one https://github.com/actix/actix-web/issues/454#issue-348463796 ).

Process is still running, will update the stats in few hours

neowutran on 11 Aug 2018

Now I am at 32 sockets not closed correctly.
15 sockets where created exactly at 18:07:13
2 sockets where created exactly at 18:15:32
6 sockets where created exactly at 18:26:28
4 socket where created exactly at 18:55:45
The others socket where created at different time, but nothing really interesting with them.

Don't have any unclosed socket created between 22h and 06h ( not much traffic in these hours )
Process is still running.
I expect a small traffic spike today at around 17h/18h, so I think a lot of socket won't be closed at around 17h

neowutran on 12 Aug 2018

Now: ~50 sockets not closed correctly
As expected, most of the new sockets not closed correctly have been created around 17/18h

More than 2 sockets unclosed where created exactly at:

18:15:32
18:26:28
18:13:16
18:07:13
18:55:44
06:20:20

So definitly Some "burst" of sockets not closed.

Killing the process

neowutran on 13 Aug 2018

@fafhrd91
@DoumanAsh

Do you have an idea ?

I can easily reproduce this case on my server when it is heavily loaded.

Do you know a part of the code ( / external crate ) handling the sockets that could be sensitive to timestamp / race condition ?
I can add logs / trace where you want if you have an idea.

But quite a complexe issue

neowutran on 15 Aug 2018

I am a bit busy atm, I’ll get back to this bug this weekend

fafhrd91 on 15 Aug 2018

@neowutran I might have fixed this. Try this pull-request https://github.com/actix/actix-web/pull/483

Neopallium on 26 Aug 2018

@neowutran i merged pr, could you try master

fafhrd91 on 26 Aug 2018

Updated everything

Tested both with http1 only & http1/htt2
There are still unclosed sockets.
Last logs from the service with http1 only:

[root@xxxxxxxx ~]# ./sockets_uptime.sh 3600 443
min uptime in seconds for sockets: 3600
Mon Aug 27 12:31:08 CEST 2018 (101857.00s ago)
xxx.xxx.xxx.xxx:50289
Mon Aug 27 11:37:33 CEST 2018 (105073.00s ago)
xxx.xxx.xxx.xxx:55272
Mon Aug 27 11:23:44 CEST 2018 (105902.00s ago)
xxx.xxx.xxx.xxx:9257
Mon Aug 27 11:08:56 CEST 2018 (106790.00s ago)
xxx.xxx.xxx.xxx:53753
Mon Aug 27 12:31:08 CEST 2018 (101858.00s ago)
xxx.xxx.xxx.xxx:54859
Mon Aug 27 11:37:32 CEST 2018 (105075.00s ago)
xxx.xxx.xxx.xxx:6722
Mon Aug 27 11:37:30 CEST 2018 (105077.00s ago)
xxx.xxx.xxx.xxx:49868
Mon Aug 27 11:29:07 CEST 2018 (105580.00s ago)
xxx.xxx.xxx.xxx:8918
Mon Aug 27 11:37:30 CEST 2018 (105077.00s ago)
xxx.xxx.xxx.xxx:49867
Mon Aug 27 11:37:33 CEST 2018 (105075.00s ago)
xxx.xxx.xxx.xxx:53231
Mon Aug 27 11:23:43 CEST 2018 (105905.00s ago)
xxx.xxx.xxx.xxx:47600
Mon Aug 27 11:29:07 CEST 2018 (105581.00s ago)
xxx.xxx.xxx.xxx:8919
Mon Aug 27 11:37:33 CEST 2018 (105076.00s ago)
xxx.xxx.xxx.xxx:11313
Mon Aug 27 11:37:30 CEST 2018 (105079.00s ago)
xxx.xxx.xxx.xxx:55125
Mon Aug 27 11:37:30 CEST 2018 (105079.00s ago)
xxx.xxx.xxx.xxx:55894
Mon Aug 27 11:09:01 CEST 2018 (106789.00s ago)
xxx.xxx.xxx.xxx:9254
Mon Aug 27 11:08:56 CEST 2018 (106795.00s ago)
xxx.xxx.xxx.xxx:9424
Mon Aug 27 12:31:08 CEST 2018 (101863.00s ago)
xxx.xxx.xxx.xxx:5936
Mon Aug 27 11:37:30 CEST 2018 (105081.00s ago)
xxx.xxx.xxx.xxx:52727
Mon Aug 27 11:37:30 CEST 2018 (105081.00s ago)
xxx.xxx.xxx.xxx:25074
Mon Aug 27 11:37:30 CEST 2018 (105082.00s ago)
xxx.xxx.xxx.xxx:52718
Mon Aug 27 12:31:08 CEST 2018 (101864.00s ago)
xxx.xxx.xxx.xxx:11314
Mon Aug 27 12:31:08 CEST 2018 (101864.00s ago)
xxx.xxx.xxx.xxx:29972
Mon Aug 27 12:31:08 CEST 2018 (101864.00s ago)
xxx.xxx.xxx.xxx:37362
Mon Aug 27 12:31:08 CEST 2018 (101865.00s ago)
xxx.xxx.xxx.xxx:54365
Mon Aug 27 11:23:43 CEST 2018 (105910.00s ago)
xxx.xxx.xxx.xxx:60080

The service was launched on my port 443, I didn't used any stress tool on it
There is still this "burst" pattern, with many socket created at the exact same time and that didn't get closed

I am unable to reproduce this behavior on localhost. Will see if I can consistently reproduce that on a remote server with a http stress tool

neowutran on 28 Aug 2018

@neowutran is this with Os keep-alive or timeout?

DoumanAsh on 28 Aug 2018

For this test I just used the default settings
The source code for this test is: https://git.neowutran.ovh:8080/neowutran/Website/src/31e38fe5e3dabead906b3b2ff52a47e17a321295/src/main.rs

( & I used the latest version on master https://git.neowutran.ovh:8080/neowutran/Website/src/31e38fe5e3dabead906b3b2ff52a47e17a321295/Cargo.lock#L32 )

neowutran on 28 Aug 2018

@neowutran could you test with keep alive timeout

fafhrd91 on 28 Aug 2018

Ok, will relaunch the service with this modification:
https://git.neowutran.ovh:8080/neowutran/Website/commit/5da6a499bd87d3ca4a91442fe3cfb6e2ebbef9e8#diff-04d0cca2a951158b1cd945c941bf6c630572e52R222

Seem ok ?

neowutran on 28 Aug 2018

Looks good

fafhrd91 on 28 Aug 2018

Yup, same thing

[root@vps193339 ~]# ./sockets_uptime.sh 3600 443 2> /dev/null
min uptime in seconds for sockets: 3600
Tue Aug 28 19:44:57 CEST 2018 (76319.00s ago)
xxx.xxx.xxx.xxx:55605
Tue Aug 28 19:44:57 CEST 2018 (76320.00s ago)
xxx.xxx.xxx.xxx:49684
Tue Aug 28 19:44:57 CEST 2018 (76320.00s ago)
xxx.xxx.xxx.xxx:1379

Same condition:
The service was launched on my port 443, I didn't used any stress tool on it

neowutran on 29 Aug 2018

another thing related: after a stress test, CPU consumption stay at 100% for a very long time

neowutran on 8 Sep 2018

Do you see not closed connections?

fafhrd91 on 8 Sep 2018

Only ~50

neowutran on 8 Sep 2018

@neowutran I have seen an issue with connections not being closed.

I have just setup a test environment (two computers on gigabit LAN) for stress testing. I can cause actix-web to have connections that are not closed (ESTABLISHED state on the server, gone on the client). I haven't seen this problem with localhost stress testing, even with 200,000 concurrent connections.

I have been able to cause it some times with:

wrk -c 4000 -d 10 -t 20 http://server:8080/

I happens more often with higher concurrent connection counts. Using 20 threads seems to help cause the problem, even though the client only has 2 cores + hyper-threading.

Neopallium on 8 Sep 2018

The hung connections that I was seeing were caused by dropped RST/FIN packets. The kernel will never closing these connections, until a new RST packet is sent from the client. That might never happen.

@neowutran Do you still have keepalive enabled? That should have closed those connection if they were idle. Maybe use tcpdump to check if the connection is idle for longer then the keepalive timeout.

@fafhrd91 I think the default KeepAlive::Os is not a good default. Connections can hang around for a very long time if the RST/FIN packets are lost.

Neopallium on 8 Sep 2018

@fafhrd91 Even with KeepAlive::Timeout(30) some connections are not closed.

Neopallium on 8 Sep 2018

@Neopallium when you stop server does actix reports not closed connections?

fafhrd91 on 8 Sep 2018

@fafhrd91 Yes. Each worker thread reported about 350 connections when actix-web was shutdown with Ctrl-C. actix-web = 0.7.6 release build.

The leaked connections doesn't always happen. High concurrent connection count (200,000) seems to be the easiest way to cause it.

I have been testing with:

wrk -c 200000 -d 60 -t 20 -s scripts/ports.lua http://server:8080/

scripts/ports.lua file changes the connecting port (8080 -> 8099) for each thread.

local next_port = nil
function setup(thread)
    local port = next_port or wrk.port or 80
    next_port = port + 1

    local addrs = wrk.lookup(wrk.host, port)
    thread.addr = addrs[1]
end

function init(args)
    print(("thread addr: %s"):format(wrk.thread.addr))
end

The server is just the hello-world example changed to listen on ports 8080-8099. This make it easier to do high concurrent connection testing with just one client computer.

I will do some more testing tomorrow, to see if I can narrow down the problem.

Neopallium on 8 Sep 2018

@fafhrd91 It looks like the leaked connections are closed before sending a HTTP request and the RST/FIN packet is lost. The keep-alive timer isn't created until the connections protocol has been detected (http1/http2).

I think Actix-web should have a timeout for receiving the first request (nginx has a few timeouts like client_header_timeout, client_body_timeout). We can't rely on the OS always detecting a closed connection (unless TCP keep-alive is enabled).

Neopallium on 15 Sep 2018

@Neopallium do you think actix process does not get HUP event from kernel?

fafhrd91 on 15 Sep 2018

@Neopallium could test master again, i added slow request timer

fafhrd91 on 15 Sep 2018

@fafhrd91 based on tcpdumps (client and server side), I know that the server is not receiving the RST/FIN packet. So the socket on the server machine stays in ESTABLISHED state. Without some activity from the server side, the socket will just stay in the ESTABLISHED state, until the client tries to connect again with the same source port.

I did some testing with master. No leaked connections now.

@neowutran can you try master to see if it fixes the problem you are seeing.

Neopallium on 15 Sep 2018

Ok, trying master
restarted service, consuming 0.1% of server ram for the moment

waiting few hours

neowutran on 16 Sep 2018

It randomly segfault now

Attaching valgrind to it

neowutran on 16 Sep 2018

1 time it segfaulted after 10 seconds,
another time it segfaulter after 2 hours

neowutran on 16 Sep 2018

New crash after few minutes
https://neowutran.ovh:8080/valgrind_segfault.err

neowutran on 16 Sep 2018

No idea how, maybe related to the patch + rustls, but idk
trying to get a real core file

neowutran on 16 Sep 2018

@neowutran Can you recompile in debug mode. It would provide easier to read stack dumps.

After reviewing that valgrind output, it looks like maybe the HttpChannel is being polled again right after it was freed.

Also looks like it was serving a static file. Maybe the keep-alive timeout expired while it was still sending the response? Not sure.

Neopallium on 16 Sep 2018

Source code: https://git.neowutran.ovh:8080/neowutran/Website/commit/0500025bc5288198c1caba29e0b9139b62e0319d

Binary:
https://neowutran.ovh:8080/others/debug/neowutran

Stderr of the binary:
https://neowutran.ovh:8080/others/debug/log.err

Core files of the binary:
https://neowutran.ovh:8080/others/debug/

After that, tryied with valgrind on the debug binary:
https://neowutran.ovh:8080/valgrind_debug.err

neowutran on 16 Sep 2018

I'm sorry for interrupting (I know this was recognized as a bug).

I'm using 0.7.6. Running lsof -p <pid> prints a list of ESTABLISHED connections. In code I keep a counter in a simple middleware implementation, it's just 1-2 requests per second.

ghost on 17 Sep 2018

@anh-ba-khia You can try master. We are currently testing a fix for this issue.

Neopallium on 17 Sep 2018

@neowutran could you try again. but crash is really strange. i'll try to reproduce

fafhrd91 on 17 Sep 2018

@neowutran I am not able to recreate the crash. Is it only crashing when exposed to public traffic? or can you cause it to crash with a stress test?

Neopallium on 17 Sep 2018

@fafhrd91 @Neopallium
Ya, it segfault after some time ( when exposed to public traffic ), it can be seconds, minutes or hours

If you have some idea,
or if I need to setup some packet capture tools, to record everything & keep like the last minute before the segfault

neowutran on 17 Sep 2018

I rebuilt with revision 03e318f44649aed6c00de19bdb93cc6b0377b1fd, and the program is currently serving request. I have a simple middle implementation to count live requests:

impl<S> Middleware<S> for LiveConnectionCounter {

    fn start(&self, _: &HttpRequest<S>) -> actix_web::error::Result<middleware::Started> {
        // This is a `static AtomicUsize`
        LIVE_CONNECTION_COUNT.fetch_add(1, atomic::Ordering::Relaxed);
        Ok(middleware::Started::Done)
    }

    fn response(&self, _: &HttpRequest<S>, response: HttpResponse) -> actix_web::error::Result<middleware::Response> {
        Ok(middleware::Response::Done(response))
    }

    fn finish(&self, _: &HttpRequest<S>, _: &HttpResponse) -> middleware::Finished {
        LIVE_CONNECTION_COUNT.fetch_sub(1, atomic::Ordering::Relaxed);
        middleware::Finished::Done
    }

}

May I ask if you could verify that it is a correct way to count live connections? If it is, then I think the bug is still there. Because (again, if above is a correct way to count live connections), then while it's just 1-2, but at a same time, lsof is showing over 100 ESTABLISHED TCP connections :-(

ghost on 17 Sep 2018

@neowutran could you try again, i forgot to push my changes

fafhrd91 on 17 Sep 2018

@anh-ba-khia your code counts requests, you need connections

fafhrd91 on 17 Sep 2018

Different, but still memory corruption / segfault
https://neowutran.ovh:8080/valgrind_debug2.err

Crashed in few seconds with live traffic

neowutran on 17 Sep 2018

Oh, something is really broken. Could you try OpenSSL instead of rustls

fafhrd91 on 17 Sep 2018

same, with openssl it still segfault in few seconds

neowutran on 17 Sep 2018

@anh-ba-khia Your counter is for live requests. Do you have keep_alive enabled? The ESTABLISHED connections you are seeing might still be active or haven't timed out yet.

@neowutran A packet capture might help narrow down what type of request causes the crash.

Maybe a large POST request? I haven't tried stress testing with slow POST requests.

Neopallium on 17 Sep 2018

@Neopallium
Worst case scenario, I could just log everything with TCPDump until the segfault, but idk if it will help with the amount of data exchanged on the server. Unless we know what are we looking for

The only "unusual" thing the service is doing is serving large files ( ~40mo )

neowutran on 17 Sep 2018

@neowutran wait, I’ll rewrite how actix manages active connections

fafhrd91 on 17 Sep 2018

@Neopallium Thank you. At first I didn't enable keep-alive. But I discovered thousands of established connections which made the program hang. Then I searched and found this issue. After reading some comments, I thought adding keep-alive would help closing those connections. Then I added a small value of 10 seconds. I'll remove keep-alive, try again and will send you update later.

I don't have crashes like Yukikoo's issue. Maybe it's because my program is very simple, it just serves json requests. (An amateur web service made by me, :-D)

ghost on 17 Sep 2018

@fafhrd91 node.remove() doesn't update the prev pointer in the next node.

The next node will still be pointing to the deleted channel.

Neopallium on 17 Sep 2018

@anh-ba-khia The default keep_alive is now 5 seconds. If you have too many open connections, then either set it to 1 second, or KeepAlive::Disabled

Neopallium on 17 Sep 2018

🤦‍♂️

fafhrd91 on 17 Sep 2018

@neowutran could you try master again

fafhrd91 on 17 Sep 2018

@fafhrd91 I was able to recreate the memory access error. It seems to still happen:

https://gist.github.com/Neopallium/ba133e7430d9398e0c7e3fccc3302000

To recreate the issue I run 3 instances of wrk, then kill the second one. This will close the connections in the middle of the linked list.

I don't understand how the invalid write can still be happening.

Neopallium on 17 Sep 2018

Last commit should fix it. Do you still see access error?

fafhrd91 on 17 Sep 2018

might have had a stale Cargo.lock file. rebuilding now.

Neopallium on 17 Sep 2018

the problem is still there. tested with commit 76410356

Neopallium on 17 Sep 2018

I have updated the gist with the Cargo.toml and src/main.rs that I used for testing.
https://gist.github.com/Neopallium/ba133e7430d9398e0c7e3fccc3302000

to recreate issue:

run server: valgrind ./target/debug/hello-world
run 3 times in parallel wrk -t 1 -c 10 -d 90 http://localhost:8080/
kill the second instance.

Neopallium on 17 Sep 2018

@Neopallium thanks, that helped. last commit fixes access error

fafhrd91 on 17 Sep 2018

I don't know if this is related, but I was trying to do the websocket tutorial/examples and implement a chat, and found that my server code wasn't cleaning up clients when they disconnected. Even when running the websocket example, the StreamHandler's finished() method doesn't ever get called. That is, if I add a finished() method to examples/websocket/src/main.rs, similar to the client one, and run:

cd examples/websocket
cargo run --bin server

and also

cd examples/websocket
cargo run --bin client

and then hit CTRL-C on the client, then the server doesn't notice. Whereas if I hit CTRL-C on the server, the client does get a "Server disconnected" message within a few seconds, meaning finished() was called.

Also I notice now that the server is using WebsocketContext, while the client is using a regular Context. (Shouldn't the client also use WebsocketContext?)

This also happens with current actix-web/master / 0.7.8 and actix 0.7.0. I also tried actix master, but then the example doesn't compile anymore.

Should I open a separate issue for this?

edit: As for connections, netstat shows that the client closes its half of the socket for writing / sends a FIN (FIN_WAIT2 state) while the server is keeping its half open (CLOSE_WAIT state).

sapir on 18 Sep 2018

@sapir let’s use separate issue

fafhrd91 on 18 Sep 2018

ok thanks

sapir on 18 Sep 2018

@neowutran can you try the latest release 0.7.8.

Neopallium on 18 Sep 2018

@Neopallium My server uses 0.7.8. It ran for about 4 hours. I called lsof and grab 284 ESTABLISHED TCP connections. There were about total 28k+ requests (I take from logs). (I'm sorry for interrupting.)

ghost on 18 Sep 2018

I trust my logs that there are just 1-2 requests/second. So I can afford restarts if necessary. I'll keep my server run for some more hours, and post results here. Normally it can run for several days until it hangs up :-D

ghost on 18 Sep 2018

@anh-ba-khia What keep_alive setting are you using? If you are using KeepAlive::Disabled, then try again with a 1 second keep_alive.

@fafhrd91 If keep_alive is disabled, then the slow request timeout would also be disabled? I think it would be better to have a separate setting for the slow request timeout. Some people might want to disable keep_alive, but the slow request timeout should still be used.

Neopallium on 18 Sep 2018

@Neopallium makes sense

fafhrd91 on 18 Sep 2018

@Neopallium I'm not touching keep-alive using startup, so it uses default 5 seconds. I'll try again with 1 second.

I have some new results:

Runtime: 11 hours.
Requests: 95k+
Established TCP connection found via lsof -p <pid>: 932
Running cat /proc/<pid>/status, FDSize field is 2048.

FDSize is _"Number of file descriptor slots currently allocated."_ I got that description from man 5 proc.

I wish I could help more. So if you want any information, please ask me.

ghost on 18 Sep 2018

@anh-ba-khia can you do a tcpdump for one of the long lived connections? Run lsof -n -P -p <pid> twice about a minute apart to find an old socket. Then run tcpdump -n -i eth0 port <client port> and host <client ip>. This would help to see if the socket is actively making requests to keep from being closed. If the client-side is idle or had already closed the socket, then there shouldn't be any output from tcpdump.

I have been thinking of adding support for dumping the status of all active http connections in actix-web, it would help with debugging hang connections. I already have some code written, just need to clean it up and send a PR.

Neopallium on 18 Sep 2018

After 10h:
No segfault for the moment
No socket unclosed for the moment
Most memory leak seems to be fixed ( there are still memory leak ( aka: memory consumption increase over time) , but less than before )

Don't close the ticket yet, the service is still running, I want to let it run for few days before saying that the connection issue is fixed

And only tested with HTTP1, didn't tested HTTP2 yet

But after I will probably open a new ticket about memory with a new valgrind output

neowutran on 18 Sep 2018

@Neopallium Thank you. I followed your instruction and ran that command on 5 different clients. The first lines are exactly the same:

listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

For all 5 results, I've waited for about 1 minute and pressed Ctrl-C to stop the command. They all have 0 packets captured and 0 packets dropped by kernel. Some of them have 2, 3 or 0 packets received by filter.

ghost on 18 Sep 2018

@anh-ba-khia Then those are stuck connections.

The KeepAlive timer is stopped after receiving the request headers and only resumed when the request is finished.

Some possible reason for the stuck connections:

The request handler is stuck/blocked and doesn't finish sending the response. Might be useful to add a debug message for long request handling (make it a compile time feature for development).
Client disconnect during request body upload (should have a timeout for this too). The disconnect packets (FIN/RST) from the client would have to have been lost/drop before getting to your server.

Neopallium on 19 Sep 2018

Thanks @Neopallium , that helps a lot. Hopefully I'll resolve my issue soon. I think my issue is not related to this one. And since Yukikoo confirmed that new version has been running well, I guess this issue is solved.

Thanks again for your help :-)

ghost on 19 Sep 2018

@anh-ba-khia You could adapt your LiveConnectionCounter middleware to detect stuck requests. This might be useful for other people too. I had been thinking about adding the request timer to the actix-web core, but I think a middleware that can be inserted as needed would be better.

Neopallium on 19 Sep 2018

sadly, still have some unclosed connections

Wed Sep 19 06:27:45 CEST 2018 (85094.00s ago)
yyy.yyy.yyy.yyy:59976
Wed Sep 19 06:27:47 CEST 2018 (85093.00s ago)
zzz.zzz.zzz.zzz:58338
Wed Sep 19 06:27:45 CEST 2018 (85095.00s ago)
zzz.zzz.zzz.zzz:58244
Wed Sep 19 06:27:45 CEST 2018 (85096.00s ago)
yyy.yyy.yyy.yyy:59975
Wed Sep 19 06:27:45 CEST 2018 (85097.00s ago)
xxx.xxx.xxx.xxx:29109

( only 3 different IP, xxx, zzz and yyy )

Idk if it can be usefull, but I only got unclosed connection on the service used for download (port 443)
For the moment don't have any unclosed connection on the service used for upload (port 8083)

On the memory consumption side:
Both services started at 0.1% of server ram (~10mo)
Now the service on port 443 is at 0.2% of ram (~20mo, ~x2 on memory consumption since the beginning)
The service on port 8083 is at 5.9% of ram (~1.5Go, ~x59 on memory consumption since the beginning)

neowutran on 20 Sep 2018

Calling lsof takes very long time. So I coded a small program to count established TCP connections, along with oldest and newest connection timestamps.

It works via:

Parsing /proc/<pid>/net/tcp to find established TCP inodes.
Finding file descriptors in /proc/<pid>/fd/ which are sockets _and_ have inodes belonged to above list.
Printing out count and timestamps.

Currently it finds established TCP inodes in first step. You can change its behaviour if you want. I learned some constants from https://oroboro.com/finding-state-tcpip-sockets-linux/

Usage

It accepts systemd service name, or process ID. For example with my Firefox:

$ socket-counter 6497
"6497" -> 3 sockets
Oldest opened socket: 0 days 08:13:28
Newest opened socket: 0 days 08:13:28

Or via service name:

$ socket-counter some-service.service

I hope the program can help: https://github.com/anh-ba-khia/socket-counter

ghost on 20 Sep 2018

[root@vps193339 Website]# /root/sockets_uptime.sh 3600 443
min uptime in seconds for sockets: 3600
Thu Sep 20 06:05:42 CEST 2018 (268084.00s ago)
XXX.XXX.XXX.XXX:57134
Wed Sep 19 06:27:47 CEST 2018 (353160.00s ago)
XXX.XXX.XXX.XXX:58338
Wed Sep 19 06:27:45 CEST 2018 (353162.00s ago)
XXX.XXX.XXX.XXX:24112
Thu Sep 20 06:06:12 CEST 2018 (268055.00s ago)
XXX.XXX.XXX.XXX:5532
Wed Sep 19 06:27:45 CEST 2018 (353162.00s ago)
XXX.XXX.XXX.XXX:58244
Wed Sep 19 06:27:45 CEST 2018 (353163.00s ago)
XXX.XXX.XXX.XXX:50719
Wed Sep 19 06:27:45 CEST 2018 (353163.00s ago)
XXX.XXX.XXX.XXX:29109
Wed Sep 19 06:27:45 CEST 2018 (353164.00s ago)
XXX.XXX.XXX.XXX:53451
Thu Sep 20 06:06:00 CEST 2018 (268070.00s ago)
XXX.XXX.XXX.XXX:5533
Thu Sep 20 06:05:42 CEST 2018 (268087.00s ago)
XXX.XXX.XXX.XXX:24113

neowutran on 23 Sep 2018

@neowutran still only leaked connections on port 443 and not 8083?

Also would it be possible to enable access logging on that port to narrow down what kind of requests are happening on the leaked connections?

I am traveling for the next two weeks. So I will not be able to help much with debugging.

Neopallium on 24 Sep 2018

I think this is the same problem. We have to start slow request timer before ssl negotiation, which is not the case at the moment

fafhrd91 on 24 Sep 2018

Another memory leak, probably related to this issue
https://neowutran.ovh/upload_leak.log

Live traffic, mostly upload ( files upload / json upload )

neowutran on 27 Sep 2018

I have some updates about connections not being closed correctly.

I made some simple protocol over tcp, via tokio + futures crates. That means I replaced actix-web and rolled my own (simple) protocol. I couldn't test with wrk, so I ran my own tests. There were just about 500 requests/second. I know my implementation is poorly coded. (Last time I ran wrk on my implementation with actix-web, it handled up to 20k requests/second.)

However the thing is, no matter how many tests I ran, there were absolutely no unclosed connections when the tests finished. I ran tests again, again and again, and again... There were no unclosed connections.

Then, I used that for production code. When time goes, they appear and go up (the unclosed connections).

I just can guess that there might be some bugs in tokio or futures crates.

ghost on 30 Sep 2018

Not sure if this is useful, but I think I have a similar issue. Repro in this repo https://github.com/dnagir/rust-memomry-leak

But as others pointed out - it might be reproducible with a much smaller example.

dnagir on 30 Sep 2018

@dnagir thats for test. i tried to test with system allocator, i dont see leak with stress-alpine. Could you check?

#![feature(alloc_system, allocator_api)]
extern crate alloc_system;
use alloc_system::System;
#[global_allocator]
static A: System = System;

fafhrd91 on 30 Sep 2018

@neowutran could you try master but with system allocator?

fafhrd91 on 30 Sep 2018

@fafhrd91 already what I am using, unless you mean that you need the "allocator_api"
@Neopallium For the socket not correctly closed that I have, nothing have been registered by the actix middleware. So it never reached it.
Maybe I can activate logs on the firewall level to see what kind of tcp packet is causing the issue

neowutran on 1 Oct 2018

@neowutran yes, "allocator_api". master contains extra timer for ssl handshake, similar to slow request timeout in 0.7.8, I merged it last Friday.

fafhrd91 on 1 Oct 2018

@fafhrd91
Update this morning,
sadly, a lot of new leaks ( from 0.1% of server ram to 12% in ~5 hours)
and at least 1 new bug -> files download cannot complete, it goes to 99% then get stuck indefinitly

https://neowutran.ovh:8080/valgrind_debug5.err

trying again with lastest changes on master

neowutran on 2 Oct 2018

@neowutran could you update from master again.

fafhrd91 on 2 Oct 2018

@fafhrd91
It leak way less memory, and the download bug is fixed
However, after few days with live traffic, I have few hundred unclosed connections - 200+ - ( and memory still leak, but slower )

...
Thu Oct  4 17:57:14 CEST 2018 (303206.00s ago)
XXX.XXX.XX.XXX:41003
Thu Oct  4 17:47:21 CEST 2018 (303800.00s ago)
XXX.XXX.XX.XXX:53984
Thu Oct  4 17:47:21 CEST 2018 (303800.00s ago)
XXX.XXX.XX.XXX:59036
Thu Oct  4 17:47:21 CEST 2018 (303800.00s ago)
XXX.XXX.XX.XXX:57494
Thu Oct  4 17:47:21 CEST 2018 (303801.00s ago)
XXX.XXX.XX.XXX:3255
Thu Oct  4 17:47:21 CEST 2018 (303801.00s ago)
XXX.XXX.XX.XXX:63601
Thu Oct  4 17:47:21 CEST 2018 (303801.00s ago)
XXX.XXX.XX.XXX:57461
Thu Oct  4 17:47:21 CEST 2018 (303801.00s ago)
XXX.XXX.XX.XXX:57460
Thu Oct  4 17:47:21 CEST 2018 (303801.00s ago)
XXX.XXX.XX.XXX:51997
Thu Oct  4 17:50:01 CEST 2018 (303641.00s ago)
XXX.XXX.XX.XXX:60789
Thu Oct  4 17:47:21 CEST 2018 (303802.00s ago)
XXX.XXX.XX.XXX:51863
Thu Oct  4 17:47:21 CEST 2018 (303804.00s ago)
XXX.XXX.XX.XXX:1488
Thu Oct  4 17:47:21 CEST 2018 (303804.00s ago)
XXX.XXX.XX.XXX:58210
Thu Oct  4 17:47:21 CEST 2018 (303804.00s ago)
XXX.XXX.XX.XXX:24316
Thu Oct  4 17:47:21 CEST 2018 (303804.00s ago)
XXX.XXX.XX.XXX:53985
Thu Oct  4 17:47:21 CEST 2018 (303805.00s ago)
XXX.XXX.XX.XXX:49394
Thu Oct  4 17:47:21 CEST 2018 (303805.00s ago)
XXX.XXX.XX.XXX:61586
Thu Oct  4 17:47:21 CEST 2018 (303805.00s ago)
XXX.XXX.XX.XXX:1027
Thu Oct  4 17:47:21 CEST 2018 (303805.00s ago)
XXX.XXX.XX.XXX:2530
Thu Oct  4 17:47:21 CEST 2018 (303806.00s ago)
XXX.XXX.XX.XXX:9384
Thu Oct  4 17:47:21 CEST 2018 (303806.00s ago)
XXX.XXX.XX.XXX:5169
Thu Oct  4 17:47:21 CEST 2018 (303806.00s ago)
XXX.XXX.XX.XXX:49586
Thu Oct  4 17:47:21 CEST 2018 (303807.00s ago)
XXX.XXX.XX.XXX:56157
Thu Oct  4 17:47:21 CEST 2018 (303807.00s ago)
XXX.XXX.XX.XXX:50240
Thu Oct  4 17:57:10 CEST 2018 (303218.00s ago)
XXX.XXX.XX.XXX:59708
Thu Oct  4 17:47:21 CEST 2018 (303808.00s ago)
XXX.XXX.XX.XXX:26615
Thu Oct  4 17:47:21 CEST 2018 (303808.00s ago)
XXX.XXX.XX.XXX:50059

October 4, 17h match with a very high number of users connected to the server. The 200+ unclosed connections are at 17h47 +- few minutes
Same pattern: Multiple unclosed connection at the exact same time

neowutran on 8 Oct 2018

Reached ~500 unclosed connection with live traffic in ~5 days.

some log retrieved with valgrind:
https://neowutran.ovh/valgrind_debug7.err

I needed to do a kill -9 on the process, it was completly stuck

~~ no idea how to help testing this issue

neowutran on 9 Oct 2018

try latest master

fafhrd91 on 9 Oct 2018

Tried latest master, same issue
With live traffic I reached around 300 unclosed connections in 2 days, same "burst" pattern ( a lot of unclosed connection where created at the exact same time )
Valgrind output: https://neowutran.ovh/valgrind_debug8.err

neowutran on 13 Oct 2018

Using a very, very basic implementation with the RUST_LOG environment variable set to debug on the latest version of Actix throws this error on every request, even on a simple plaintext route.

jcoleman-techempower on 25 Oct 2018

actix_web::server::KeepAlive::Timeout(5) does not send TCP FIN in 0.7.13

code

extern crate actix_web;
extern crate env_logger;
extern crate futures;
#[macro_use]
extern crate log;
extern crate failure;
extern crate tokio_timer;
use actix_web::middleware::{Logger};
use futures::prelude::*;
use actix_web::http::{StatusCode, Method};
use actix_web::{HttpRequest, HttpResponse};
use failure::Error;

fn main(){
    ::env_logger::init();

    let srv = ::actix_web::server::new(move ||{
        ::actix_web::App::new()
            .middleware(Logger::default())
            .resource("/", |r|{ r.method(Method::GET).with_async(get_index); })
    })
        .keep_alive(::actix_web::server::KeepAlive::Timeout(5)) // change here
        .bind("0.0.0.0:8081")
        .unwrap();
    srv.run();

}

/// GET /
pub fn get_index(_: HttpRequest) -> impl Future<Item=HttpResponse, Error=Error> {
    ::tokio_timer::sleep(::std::time::Duration::from_secs(0)).map(|()|{
        HttpResponse::Ok()
            .header("content-type", "text/plain; charset=utf-8")
            .body("🐲")
    }).map_err(Into::into)
}

telnet results, tcpdumps and trace logs

actix_web::server::KeepAlive::Timeout(5)

$ telnet localhost 8081
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1

HTTP/1.1 200 OK
content-length: 4
content-type: text/plain; charset=utf-8
date: Wed, 31 Oct 2018 12:50:47 GMT

🐲GET / HTTP/1.1
Connection closed by foreign host.

TRACE 2018-10-31T12:50:30Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:50:30Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:50:30Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:50:30Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:50:30Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:50:30Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:50:30Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:50:30Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
 INFO 2018-10-31T12:50:47Z: actix_web::middleware::logger: 127.0.0.1:47654 "GET / HTTP/1.1" 200 4 "-" "-" 0.002296
TRACE 2018-10-31T12:50:58Z: actix_web::server::h1: Keep-alive timeout, close connection

$ sudo tcpdump -vvv -s 0 -nX -i lo -r lo-actix-keepalive-timeout-5 port 8081
reading from file lo-actix-keepalive-timeout-5, link-type EN10MB (Ethernet)
21:50:46.369544 IP (tos 0x10, ttl 64, id 35464, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.47654 > 127.0.0.1.8081: Flags [S], cksum 0xfe30 (incorrect -> 0x4601), seq 3217823845, win 43690, options [mss 65495,sackOK,TS val 726696436 ecr 0,nop,wscale 7], length 0
    0x0000:  4510 003c 8a88 4000 4006 b221 7f00 0001  E..<..@.@..!....
    0x0010:  7f00 0001 ba26 1f91 bfcc 1865 0000 0000  .....&.....e....
    0x0020:  a002 aaaa fe30 0000 0204 ffd7 0402 080a  .....0..........
    0x0030:  2b50 81f4 0000 0000 0103 0307            +P..........
21:50:46.369576 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.8081 > 127.0.0.1.47654: Flags [S.], cksum 0xfe30 (incorrect -> 0xb220), seq 1341560468, ack 3217823846, win 43690, options [mss 65495,sackOK,TS val 726696436 ecr 726696436,nop,wscale 7], length 0
    0x0000:  4500 003c 0000 4000 4006 3cba 7f00 0001  E..<..@.@.<.....
    0x0010:  7f00 0001 1f91 ba26 4ff6 9694 bfcc 1866  .......&O......f
    0x0020:  a012 aaaa fe30 0000 0204 ffd7 0402 080a  .....0..........
    0x0030:  2b50 81f4 2b50 81f4 0103 0307            +P..+P......
21:50:46.369606 IP (tos 0x10, ttl 64, id 35465, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.47654 > 127.0.0.1.8081: Flags [.], cksum 0xfe28 (incorrect -> 0x8465), seq 1, ack 1, win 342, options [nop,nop,TS val 726696436 ecr 726696436], length 0
    0x0000:  4510 0034 8a89 4000 4006 b228 7f00 0001  E..4..@.@..(....
    0x0010:  7f00 0001 ba26 1f91 bfcc 1866 4ff6 9695  .....&.....fO...
    0x0020:  8010 0156 fe28 0000 0101 080a 2b50 81f4  ...V.(......+P..
    0x0030:  2b50 81f4                                +P..
21:50:47.156359 IP (tos 0x10, ttl 64, id 35466, offset 0, flags [DF], proto TCP (6), length 68)
    127.0.0.1.47654 > 127.0.0.1.8081: Flags [P.], cksum 0xfe38 (incorrect -> 0xafa3), seq 1:17, ack 1, win 342, options [nop,nop,TS val 726697223 ecr 726696436], length 16
    0x0000:  4510 0044 8a8a 4000 4006 b217 7f00 0001  E..D..@.@.......
    0x0010:  7f00 0001 ba26 1f91 bfcc 1866 4ff6 9695  .....&.....fO...
    0x0020:  8018 0156 fe38 0000 0101 080a 2b50 8507  ...V.8......+P..
    0x0030:  2b50 81f4 4745 5420 2f20 4854 5450 2f31  +P..GET./.HTTP/1
    0x0040:  2e31 0d0a                                .1..
21:50:47.156414 IP (tos 0x0, ttl 64, id 1144, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47654: Flags [.], cksum 0xfe28 (incorrect -> 0x7e2f), seq 1, ack 17, win 342, options [nop,nop,TS val 726697223 ecr 726697223], length 0
    0x0000:  4500 0034 0478 4000 4006 384a 7f00 0001  E..4.x@[email protected]....
    0x0010:  7f00 0001 1f91 ba26 4ff6 9695 bfcc 1876  .......&O......v
    0x0020:  8010 0156 fe28 0000 0101 080a 2b50 8507  ...V.(......+P..
    0x0030:  2b50 8507                                +P..
21:50:47.951633 IP (tos 0x10, ttl 64, id 35467, offset 0, flags [DF], proto TCP (6), length 54)
    127.0.0.1.47654 > 127.0.0.1.8081: Flags [P.], cksum 0xfe2a (incorrect -> 0x6e00), seq 17:19, ack 1, win 342, options [nop,nop,TS val 726698018 ecr 726697223], length 2
    0x0000:  4510 0036 8a8b 4000 4006 b224 7f00 0001  E..6..@.@..$....
    0x0010:  7f00 0001 ba26 1f91 bfcc 1876 4ff6 9695  .....&.....vO...
    0x0020:  8018 0156 fe2a 0000 0101 080a 2b50 8822  ...V.*......+P."
    0x0030:  2b50 8507 0d0a                           +P....
21:50:47.951675 IP (tos 0x0, ttl 64, id 1145, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47654: Flags [.], cksum 0xfe28 (incorrect -> 0x77f7), seq 1, ack 19, win 342, options [nop,nop,TS val 726698018 ecr 726698018], length 0
    0x0000:  4500 0034 0479 4000 4006 3849 7f00 0001  E..4.y@[email protected]....
    0x0010:  7f00 0001 1f91 ba26 4ff6 9695 bfcc 1878  .......&O......x
    0x0020:  8010 0156 fe28 0000 0101 080a 2b50 8822  ...V.(......+P."
    0x0030:  2b50 8822                                +P."
21:50:47.954831 IP (tos 0x0, ttl 64, id 1146, offset 0, flags [DF], proto TCP (6), length 172)
    127.0.0.1.8081 > 127.0.0.1.47654: Flags [P.], cksum 0xfea0 (incorrect -> 0x8214), seq 1:121, ack 19, win 342, options [nop,nop,TS val 726698021 ecr 726698018], length 120
    0x0000:  4500 00ac 047a 4000 4006 37d0 7f00 0001  E....z@[email protected].....
    0x0010:  7f00 0001 1f91 ba26 4ff6 9695 bfcc 1878  .......&O......x
    0x0020:  8018 0156 fea0 0000 0101 080a 2b50 8825  ...V........+P.%
    0x0030:  2b50 8822 4854 5450 2f31 2e31 2032 3030  +P."HTTP/1.1.200
    0x0040:  204f 4b0d 0a63 6f6e 7465 6e74 2d6c 656e  .OK..content-len
    0x0050:  6774 683a 2034 0d0a 636f 6e74 656e 742d  gth:.4..content-
    0x0060:  7479 7065 3a20 7465 7874 2f70 6c61 696e  type:.text/plain
    0x0070:  3b20 6368 6172 7365 743d 7574 662d 380d  ;.charset=utf-8.
    0x0080:  0a64 6174 653a 2057 6564 2c20 3331 204f  .date:.Wed,.31.O
    0x0090:  6374 2032 3031 3820 3132 3a35 303a 3437  ct.2018.12:50:47
    0x00a0:  2047 4d54 0d0a 0d0a f09f 90b2            .GMT........
21:50:47.954858 IP (tos 0x10, ttl 64, id 35468, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.47654 > 127.0.0.1.8081: Flags [.], cksum 0xfe28 (incorrect -> 0x7779), seq 19, ack 121, win 342, options [nop,nop,TS val 726698021 ecr 726698021], length 0
    0x0000:  4510 0034 8a8c 4000 4006 b225 7f00 0001  E..4..@.@..%....
    0x0010:  7f00 0001 ba26 1f91 bfcc 1878 4ff6 970d  .....&.....xO...
    0x0020:  8010 0156 fe28 0000 0101 080a 2b50 8825  ...V.(......+P.%
    0x0030:  2b50 8825                                +P.%
21:50:58.624055 IP (tos 0x10, ttl 64, id 35469, offset 0, flags [DF], proto TCP (6), length 68)
    127.0.0.1.47654 > 127.0.0.1.8081: Flags [P.], cksum 0xfe38 (incorrect -> 0x7c1d), seq 19:35, ack 121, win 342, options [nop,nop,TS val 726708690 ecr 726698021], length 16
    0x0000:  4510 0044 8a8d 4000 4006 b214 7f00 0001  E..D..@.@.......
    0x0010:  7f00 0001 ba26 1f91 bfcc 1878 4ff6 970d  .....&.....xO...
    0x0020:  8018 0156 fe38 0000 0101 080a 2b50 b1d2  ...V.8......+P..
    0x0030:  2b50 8825 4745 5420 2f20 4854 5450 2f31  +P.%GET./.HTTP/1
    0x0040:  2e31 0d0a                                .1..
21:50:58.624692 IP (tos 0x0, ttl 64, id 1147, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47654: Flags [R.], cksum 0xfe28 (incorrect -> 0x240a), seq 121, ack 35, win 342, options [nop,nop,TS val 726708691 ecr 726708690], length 0
    0x0000:  4500 0034 047b 4000 4006 3847 7f00 0001  E..4.{@[email protected]....
    0x0010:  7f00 0001 1f91 ba26 4ff6 970d bfcc 1888  .......&O.......
    0x0020:  8014 0156 fe28 0000 0101 080a 2b50 b1d3  ...V.(......+P..
    0x0030:  2b50 b1d2                                +P..

actix_web::server::KeepAlive::Os

$ telnet localhost 8081
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1

HTTP/1.1 200 OK
content-length: 4
content-type: text/plain; charset=utf-8
date: Wed, 31 Oct 2018 12:51:38 GMT

🐲Connection closed by foreign host.

TRACE 2018-10-31T12:51:34Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:51:34Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:51:34Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:51:34Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:51:34Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:51:34Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:51:34Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:51:34Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
 INFO 2018-10-31T12:51:38Z: actix_web::middleware::logger: 127.0.0.1:47666 "GET / HTTP/1.1" 200 4 "-" "-" 0.002420
TRACE 2018-10-31T12:51:41Z: actix_web::server::h1: Keep-alive timeout, close connection

$ sudo tcpdump -vvv -s 0 -nX -i lo -r lo-actix-keepalive-os port 8081     
reading from file lo-actix-keepalive-os, link-type EN10MB (Ethernet)
21:51:36.052822 IP (tos 0x10, ttl 64, id 46588, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.47666 > 127.0.0.1.8081: Flags [S], cksum 0xfe30 (incorrect -> 0x88c7), seq 1555920526, win 43690, options [mss 65495,sackOK,TS val 726746119 ecr 0,nop,wscale 7], length 0
    0x0000:  4510 003c b5fc 4000 4006 86ad 7f00 0001  E..<..@.@.......
    0x0010:  7f00 0001 ba32 1f91 5cbd 768e 0000 0000  .....2..\.v.....
    0x0020:  a002 aaaa fe30 0000 0204 ffd7 0402 080a  .....0..........
    0x0030:  2b51 4407 0000 0000 0103 0307            +QD.........
21:51:36.052854 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.8081 > 127.0.0.1.47666: Flags [S.], cksum 0xfe30 (incorrect -> 0x4f6b), seq 978751388, ack 1555920527, win 43690, options [mss 65495,sackOK,TS val 726746119 ecr 726746119,nop,wscale 7], length 0
    0x0000:  4500 003c 0000 4000 4006 3cba 7f00 0001  E..<..@.@.<.....
    0x0010:  7f00 0001 1f91 ba32 3a56 8f9c 5cbd 768f  .......2:V..\.v.
    0x0020:  a012 aaaa fe30 0000 0204 ffd7 0402 080a  .....0..........
    0x0030:  2b51 4407 2b51 4407 0103 0307            +QD.+QD.....
21:51:36.052883 IP (tos 0x10, ttl 64, id 46589, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.47666 > 127.0.0.1.8081: Flags [.], cksum 0xfe28 (incorrect -> 0x21b0), seq 1, ack 1, win 342, options [nop,nop,TS val 726746119 ecr 726746119], length 0
    0x0000:  4510 0034 b5fd 4000 4006 86b4 7f00 0001  E..4..@.@.......
    0x0010:  7f00 0001 ba32 1f91 5cbd 768f 3a56 8f9d  .....2..\.v.:V..
    0x0020:  8010 0156 fe28 0000 0101 080a 2b51 4407  ...V.(......+QD.
    0x0030:  2b51 4407                                +QD.
21:51:36.719542 IP (tos 0x10, ttl 64, id 46590, offset 0, flags [DF], proto TCP (6), length 68)
    127.0.0.1.47666 > 127.0.0.1.8081: Flags [P.], cksum 0xfe38 (incorrect -> 0x4d66), seq 1:17, ack 1, win 342, options [nop,nop,TS val 726746786 ecr 726746119], length 16
    0x0000:  4510 0044 b5fe 4000 4006 86a3 7f00 0001  E..D..@.@.......
    0x0010:  7f00 0001 ba32 1f91 5cbd 768f 3a56 8f9d  .....2..\.v.:V..
    0x0020:  8018 0156 fe38 0000 0101 080a 2b51 46a2  ...V.8......+QF.
    0x0030:  2b51 4407 4745 5420 2f20 4854 5450 2f31  +QD.GET./.HTTP/1
    0x0040:  2e31 0d0a                                .1..
21:51:36.719593 IP (tos 0x0, ttl 64, id 19695, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47666: Flags [.], cksum 0xfe28 (incorrect -> 0x1c6a), seq 1, ack 17, win 342, options [nop,nop,TS val 726746786 ecr 726746786], length 0
    0x0000:  4500 0034 4cef 4000 4006 efd2 7f00 0001  E..4L.@.@.......
    0x0010:  7f00 0001 1f91 ba32 3a56 8f9d 5cbd 769f  .......2:V..\.v.
    0x0020:  8010 0156 fe28 0000 0101 080a 2b51 46a2  ...V.(......+QF.
    0x0030:  2b51 46a2                                +QF.
21:51:38.218838 IP (tos 0x10, ttl 64, id 46591, offset 0, flags [DF], proto TCP (6), length 54)
    127.0.0.1.47666 > 127.0.0.1.8081: Flags [P.], cksum 0xfe2a (incorrect -> 0x097b), seq 17:19, ack 1, win 342, options [nop,nop,TS val 726748285 ecr 726746786], length 2
    0x0000:  4510 0036 b5ff 4000 4006 86b0 7f00 0001  E..6..@.@.......
    0x0010:  7f00 0001 ba32 1f91 5cbd 769f 3a56 8f9d  .....2..\.v.:V..
    0x0020:  8018 0156 fe2a 0000 0101 080a 2b51 4c7d  ...V.*......+QL}
    0x0030:  2b51 46a2 0d0a                           +QF...
21:51:38.218870 IP (tos 0x0, ttl 64, id 19696, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47666: Flags [.], cksum 0xfe28 (incorrect -> 0x10b2), seq 1, ack 19, win 342, options [nop,nop,TS val 726748285 ecr 726748285], length 0
    0x0000:  4500 0034 4cf0 4000 4006 efd1 7f00 0001  E..4L.@.@.......
    0x0010:  7f00 0001 1f91 ba32 3a56 8f9d 5cbd 76a1  .......2:V..\.v.
    0x0020:  8010 0156 fe28 0000 0101 080a 2b51 4c7d  ...V.(......+QL}
    0x0030:  2b51 4c7d                                +QL}
21:51:38.222134 IP (tos 0x0, ttl 64, id 19697, offset 0, flags [DF], proto TCP (6), length 172)
    127.0.0.1.8081 > 127.0.0.1.47666: Flags [P.], cksum 0xfea0 (incorrect -> 0x1ace), seq 1:121, ack 19, win 342, options [nop,nop,TS val 726748288 ecr 726748285], length 120
    0x0000:  4500 00ac 4cf1 4000 4006 ef58 7f00 0001  E...L.@[email protected]....
    0x0010:  7f00 0001 1f91 ba32 3a56 8f9d 5cbd 76a1  .......2:V..\.v.
    0x0020:  8018 0156 fea0 0000 0101 080a 2b51 4c80  ...V........+QL.
    0x0030:  2b51 4c7d 4854 5450 2f31 2e31 2032 3030  +QL}HTTP/1.1.200
    0x0040:  204f 4b0d 0a63 6f6e 7465 6e74 2d6c 656e  .OK..content-len
    0x0050:  6774 683a 2034 0d0a 636f 6e74 656e 742d  gth:.4..content-
    0x0060:  7479 7065 3a20 7465 7874 2f70 6c61 696e  type:.text/plain
    0x0070:  3b20 6368 6172 7365 743d 7574 662d 380d  ;.charset=utf-8.
    0x0080:  0a64 6174 653a 2057 6564 2c20 3331 204f  .date:.Wed,.31.O
    0x0090:  6374 2032 3031 3820 3132 3a35 313a 3338  ct.2018.12:51:38
    0x00a0:  2047 4d54 0d0a 0d0a f09f 90b2            .GMT........
21:51:38.222169 IP (tos 0x10, ttl 64, id 46592, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.47666 > 127.0.0.1.8081: Flags [.], cksum 0xfe28 (incorrect -> 0x1034), seq 19, ack 121, win 342, options [nop,nop,TS val 726748288 ecr 726748288], length 0
    0x0000:  4510 0034 b600 4000 4006 86b1 7f00 0001  E..4..@.@.......
    0x0010:  7f00 0001 ba32 1f91 5cbd 76a1 3a56 9015  .....2..\.v.:V..
    0x0020:  8010 0156 fe28 0000 0101 080a 2b51 4c80  ...V.(......+QL.
    0x0030:  2b51 4c80                                +QL.
21:51:41.055305 IP (tos 0x0, ttl 64, id 19698, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47666: Flags [F.], cksum 0xfe28 (incorrect -> 0x0521), seq 121, ack 19, win 342, options [nop,nop,TS val 726751122 ecr 726748288], length 0
    0x0000:  4500 0034 4cf2 4000 4006 efcf 7f00 0001  E..4L.@.@.......
    0x0010:  7f00 0001 1f91 ba32 3a56 9015 5cbd 76a1  .......2:V..\.v.
    0x0020:  8011 0156 fe28 0000 0101 080a 2b51 5792  ...V.(......+QW.
    0x0030:  2b51 4c80                                +QL.
21:51:41.055549 IP (tos 0x10, ttl 64, id 46593, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.47666 > 127.0.0.1.8081: Flags [F.], cksum 0xfe28 (incorrect -> 0xfa0d), seq 19, ack 122, win 342, options [nop,nop,TS val 726751122 ecr 726751122], length 0
    0x0000:  4510 0034 b601 4000 4006 86b0 7f00 0001  E..4..@.@.......
    0x0010:  7f00 0001 ba32 1f91 5cbd 76a1 3a56 9016  .....2..\.v.:V..
    0x0020:  8011 0156 fe28 0000 0101 080a 2b51 5792  ...V.(......+QW.
    0x0030:  2b51 5792                                +QW.
21:51:41.055601 IP (tos 0x0, ttl 64, id 19699, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47666: Flags [.], cksum 0xfe28 (incorrect -> 0xfa0d), seq 122, ack 20, win 342, options [nop,nop,TS val 726751122 ecr 726751122], length 0
    0x0000:  4500 0034 4cf3 4000 4006 efce 7f00 0001  E..4L.@.@.......
    0x0010:  7f00 0001 1f91 ba32 3a56 9016 5cbd 76a2  .......2:V..\.v.
    0x0020:  8010 0156 fe28 0000 0101 080a 2b51 5792  ...V.(......+QW.
    0x0030:  2b51 5792                                +QW.

actix_web::server::KeepAlive::Disabled (without Conection: close)

$ telnet localhost 8081
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1

HTTP/1.1 200 OK
content-length: 4
content-type: text/plain; charset=utf-8
date: Wed, 31 Oct 2018 12:52:45 GMT

🐲Connection closed by foreign host.

TRACE 2018-10-31T12:52:40Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:52:40Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:52:40Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:52:40Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:52:40Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:52:40Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:52:40Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:52:40Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
 INFO 2018-10-31T12:52:45Z: actix_web::middleware::logger: 127.0.0.1:47682 "GET / HTTP/1.1" 200 4 "-" "-" 0.001000

$ sudo tcpdump -vvv -s 0 -nX -i lo -r lo-actix-keepalive-disabled port 8081
reading from file lo-actix-keepalive-disabled, link-type EN10MB (Ethernet)
21:52:43.979630 IP (tos 0x10, ttl 64, id 16549, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.47682 > 127.0.0.1.8081: Flags [S], cksum 0xfe30 (incorrect -> 0xe2ba), seq 713246061, win 43690, options [mss 65495,sackOK,TS val 726814046 ecr 0,nop,wscale 7], length 0
    0x0000:  4510 003c 40a5 4000 4006 fc04 7f00 0001  E..<@.@.@.......
    0x0010:  7f00 0001 ba42 1f91 2a83 456d 0000 0000  .....B..*.Em....
    0x0020:  a002 aaaa fe30 0000 0204 ffd7 0402 080a  .....0..........
    0x0030:  2b52 4d5e 0000 0000 0103 0307            +RM^........
21:52:43.979671 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.8081 > 127.0.0.1.47682: Flags [S.], cksum 0xfe30 (incorrect -> 0xf16a), seq 3296441362, ack 713246062, win 43690, options [mss 65495,sackOK,TS val 726814046 ecr 726814046,nop,wscale 7], length 0
    0x0000:  4500 003c 0000 4000 4006 3cba 7f00 0001  E..<..@.@.<.....
    0x0010:  7f00 0001 1f91 ba42 c47b b412 2a83 456e  .......B.{..*.En
    0x0020:  a012 aaaa fe30 0000 0204 ffd7 0402 080a  .....0..........
    0x0030:  2b52 4d5e 2b52 4d5e 0103 0307            +RM^+RM^....
21:52:43.979701 IP (tos 0x10, ttl 64, id 16550, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.47682 > 127.0.0.1.8081: Flags [.], cksum 0xfe28 (incorrect -> 0xc3af), seq 1, ack 1, win 342, options [nop,nop,TS val 726814046 ecr 726814046], length 0
    0x0000:  4510 0034 40a6 4000 4006 fc0b 7f00 0001  E..4@.@.@.......
    0x0010:  7f00 0001 ba42 1f91 2a83 456e c47b b413  .....B..*.En.{..
    0x0020:  8010 0156 fe28 0000 0101 080a 2b52 4d5e  ...V.(......+RM^
    0x0030:  2b52 4d5e                                +RM^
21:52:44.841175 IP (tos 0x10, ttl 64, id 16551, offset 0, flags [DF], proto TCP (6), length 68)
    127.0.0.1.47682 > 127.0.0.1.8081: Flags [P.], cksum 0xfe38 (incorrect -> 0xeea3), seq 1:17, ack 1, win 342, options [nop,nop,TS val 726814907 ecr 726814046], length 16
    0x0000:  4510 0044 40a7 4000 4006 fbfa 7f00 0001  E..D@.@.@.......
    0x0010:  7f00 0001 ba42 1f91 2a83 456e c47b b413  .....B..*.En.{..
    0x0020:  8018 0156 fe38 0000 0101 080a 2b52 50bb  ...V.8......+RP.
    0x0030:  2b52 4d5e 4745 5420 2f20 4854 5450 2f31  +RM^GET./.HTTP/1
    0x0040:  2e31 0d0a                                .1..
21:52:44.841216 IP (tos 0x0, ttl 64, id 45531, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47682: Flags [.], cksum 0xfe28 (incorrect -> 0xbce5), seq 1, ack 17, win 342, options [nop,nop,TS val 726814907 ecr 726814907], length 0
    0x0000:  4500 0034 b1db 4000 4006 8ae6 7f00 0001  E..4..@.@.......
    0x0010:  7f00 0001 1f91 ba42 c47b b413 2a83 457e  .......B.{..*.E~
    0x0020:  8010 0156 fe28 0000 0101 080a 2b52 50bb  ...V.(......+RP.
    0x0030:  2b52 50bb                                +RP.
21:52:45.537356 IP (tos 0x10, ttl 64, id 16552, offset 0, flags [DF], proto TCP (6), length 54)
    127.0.0.1.47682 > 127.0.0.1.8081: Flags [P.], cksum 0xfe2a (incorrect -> 0xad18), seq 17:19, ack 1, win 342, options [nop,nop,TS val 726815604 ecr 726814907], length 2
    0x0000:  4510 0036 40a8 4000 4006 fc07 7f00 0001  E..6@.@.@.......
    0x0010:  7f00 0001 ba42 1f91 2a83 457e c47b b413  .....B..*.E~.{..
    0x0020:  8018 0156 fe2a 0000 0101 080a 2b52 5374  ...V.*......+RSt
    0x0030:  2b52 50bb 0d0a                           +RP...
21:52:45.537396 IP (tos 0x0, ttl 64, id 45532, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47682: Flags [.], cksum 0xfe28 (incorrect -> 0xb771), seq 1, ack 19, win 342, options [nop,nop,TS val 726815604 ecr 726815604], length 0
    0x0000:  4500 0034 b1dc 4000 4006 8ae5 7f00 0001  E..4..@.@.......
    0x0010:  7f00 0001 1f91 ba42 c47b b413 2a83 4580  .......B.{..*.E.
    0x0020:  8010 0156 fe28 0000 0101 080a 2b52 5374  ...V.(......+RSt
    0x0030:  2b52 5374                                +RSt
21:52:45.539239 IP (tos 0x0, ttl 64, id 45533, offset 0, flags [DF], proto TCP (6), length 172)
    127.0.0.1.8081 > 127.0.0.1.47682: Flags [P.], cksum 0xfea0 (incorrect -> 0xbf92), seq 1:121, ack 19, win 342, options [nop,nop,TS val 726815605 ecr 726815604], length 120
    0x0000:  4500 00ac b1dd 4000 4006 8a6c 7f00 0001  E.....@[email protected]....
    0x0010:  7f00 0001 1f91 ba42 c47b b413 2a83 4580  .......B.{..*.E.
    0x0020:  8018 0156 fea0 0000 0101 080a 2b52 5375  ...V........+RSu
    0x0030:  2b52 5374 4854 5450 2f31 2e31 2032 3030  +RStHTTP/1.1.200
    0x0040:  204f 4b0d 0a63 6f6e 7465 6e74 2d6c 656e  .OK..content-len
    0x0050:  6774 683a 2034 0d0a 636f 6e74 656e 742d  gth:.4..content-
    0x0060:  7479 7065 3a20 7465 7874 2f70 6c61 696e  type:.text/plain
    0x0070:  3b20 6368 6172 7365 743d 7574 662d 380d  ;.charset=utf-8.
    0x0080:  0a64 6174 653a 2057 6564 2c20 3331 204f  .date:.Wed,.31.O
    0x0090:  6374 2032 3031 3820 3132 3a35 323a 3435  ct.2018.12:52:45
    0x00a0:  2047 4d54 0d0a 0d0a f09f 90b2            .GMT........
21:52:45.539267 IP (tos 0x10, ttl 64, id 16553, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.47682 > 127.0.0.1.8081: Flags [.], cksum 0xfe28 (incorrect -> 0xb6f6), seq 19, ack 121, win 342, options [nop,nop,TS val 726815606 ecr 726815605], length 0
    0x0000:  4510 0034 40a9 4000 4006 fc08 7f00 0001  E..4@.@.@.......
    0x0010:  7f00 0001 ba42 1f91 2a83 4580 c47b b48b  .....B..*.E..{..
    0x0020:  8010 0156 fe28 0000 0101 080a 2b52 5376  ...V.(......+RSv
    0x0030:  2b52 5375                                +RSu
21:52:45.539433 IP (tos 0x0, ttl 64, id 45534, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47682: Flags [F.], cksum 0xfe28 (incorrect -> 0xb6f4), seq 121, ack 19, win 342, options [nop,nop,TS val 726815606 ecr 726815606], length 0
    0x0000:  4500 0034 b1de 4000 4006 8ae3 7f00 0001  E..4..@.@.......
    0x0010:  7f00 0001 1f91 ba42 c47b b48b 2a83 4580  .......B.{..*.E.
    0x0020:  8011 0156 fe28 0000 0101 080a 2b52 5376  ...V.(......+RSv
    0x0030:  2b52 5376                                +RSv
21:52:45.539609 IP (tos 0x10, ttl 64, id 16554, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.47682 > 127.0.0.1.8081: Flags [F.], cksum 0xfe28 (incorrect -> 0xb6f3), seq 19, ack 122, win 342, options [nop,nop,TS val 726815606 ecr 726815606], length 0
    0x0000:  4510 0034 40aa 4000 4006 fc07 7f00 0001  E..4@.@.@.......
    0x0010:  7f00 0001 ba42 1f91 2a83 4580 c47b b48c  .....B..*.E..{..
    0x0020:  8011 0156 fe28 0000 0101 080a 2b52 5376  ...V.(......+RSv
    0x0030:  2b52 5376                                +RSv
21:52:45.539652 IP (tos 0x0, ttl 64, id 45535, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47682: Flags [.], cksum 0xfe28 (incorrect -> 0xb6f3), seq 122, ack 20, win 342, options [nop,nop,TS val 726815606 ecr 726815606], length 0
    0x0000:  4500 0034 b1df 4000 4006 8ae2 7f00 0001  E..4..@.@.......
    0x0010:  7f00 0001 1f91 ba42 c47b b48c 2a83 4581  .......B.{..*.E.
    0x0020:  8010 0156 fe28 0000 0101 080a 2b52 5376  ...V.(......+RSv
    0x0030:  2b52 5376                                +RSv

actix_web::server::KeepAlive::Tcp(5) (same as OS)

$ telnet localhost 8081
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1

HTTP/1.1 200 OK
content-length: 4
content-type: text/plain; charset=utf-8
date: Wed, 31 Oct 2018 12:59:47 GMT

🐲Connection closed by foreign host.

TRACE 2018-10-31T12:59:43Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:59:43Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:59:43Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:59:43Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:59:43Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:59:43Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:59:43Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
TRACE 2018-10-31T12:59:43Z: actix_web::middleware::logger: Access log format: %a "%r" %s %b "%{Referer}i" "%{User-Agent}i" %T
 INFO 2018-10-31T12:59:47Z: actix_web::middleware::logger: 127.0.0.1:47778 "GET / HTTP/1.1" 200 4 "-" "-" 0.002062
TRACE 2018-10-31T12:59:50Z: actix_web::server::h1: Keep-alive timeout, close connection

$ sudo tcpdump -vvv -s 0 -nX -i lo -r lo-actix-keepalive-tcp-5 port 8081
reading from file lo-actix-keepalive-tcp-5, link-type EN10MB (Ethernet)
21:59:45.000307 IP (tos 0x10, ttl 64, id 51301, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.47778 > 127.0.0.1.8081: Flags [S], cksum 0xfe30 (incorrect -> 0x1584), seq 1738107018, win 43690, options [mss 65495,sackOK,TS val 727235067 ecr 0,nop,wscale 7], length 0
    0x0000:  4510 003c c865 4000 4006 7444 7f00 0001  E..<.e@[email protected]....
    0x0010:  7f00 0001 baa2 1f91 6799 688a 0000 0000  ........g.h.....
    0x0020:  a002 aaaa fe30 0000 0204 ffd7 0402 080a  .....0..........
    0x0030:  2b58 b9fb 0000 0000 0103 0307            +X..........
21:59:45.000339 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    127.0.0.1.8081 > 127.0.0.1.47778: Flags [S.], cksum 0xfe30 (incorrect -> 0x2ee8), seq 45284996, ack 1738107019, win 43690, options [mss 65495,sackOK,TS val 727235067 ecr 727235067,nop,wscale 7], length 0
    0x0000:  4500 003c 0000 4000 4006 3cba 7f00 0001  E..<..@.@.<.....
    0x0010:  7f00 0001 1f91 baa2 02b2 fe84 6799 688b  ............g.h.
    0x0020:  a012 aaaa fe30 0000 0204 ffd7 0402 080a  .....0..........
    0x0030:  2b58 b9fb 2b58 b9fb 0103 0307            +X..+X......
21:59:45.000369 IP (tos 0x10, ttl 64, id 51302, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.47778 > 127.0.0.1.8081: Flags [.], cksum 0xfe28 (incorrect -> 0x012d), seq 1, ack 1, win 342, options [nop,nop,TS val 727235067 ecr 727235067], length 0
    0x0000:  4510 0034 c866 4000 4006 744b 7f00 0001  E..4.f@[email protected]....
    0x0010:  7f00 0001 baa2 1f91 6799 688b 02b2 fe85  ........g.h.....
    0x0020:  8010 0156 fe28 0000 0101 080a 2b58 b9fb  ...V.(......+X..
    0x0030:  2b58 b9fb                                +X..
21:59:46.326087 IP (tos 0x10, ttl 64, id 51303, offset 0, flags [DF], proto TCP (6), length 68)
    127.0.0.1.47778 > 127.0.0.1.8081: Flags [P.], cksum 0xfe38 (incorrect -> 0x2a51), seq 1:17, ack 1, win 342, options [nop,nop,TS val 727236392 ecr 727235067], length 16
    0x0000:  4510 0044 c867 4000 4006 743a 7f00 0001  E..D.g@[email protected]:....
    0x0010:  7f00 0001 baa2 1f91 6799 688b 02b2 fe85  ........g.h.....
    0x0020:  8018 0156 fe38 0000 0101 080a 2b58 bf28  ...V.8......+X.(
    0x0030:  2b58 b9fb 4745 5420 2f20 4854 5450 2f31  +X..GET./.HTTP/1
    0x0040:  2e31 0d0a                                .1..
21:59:46.326125 IP (tos 0x0, ttl 64, id 24170, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47778: Flags [.], cksum 0xfe28 (incorrect -> 0xf6c1), seq 1, ack 17, win 342, options [nop,nop,TS val 727236393 ecr 727236392], length 0
    0x0000:  4500 0034 5e6a 4000 4006 de57 7f00 0001  E..4^j@[email protected]....
    0x0010:  7f00 0001 1f91 baa2 02b2 fe85 6799 689b  ............g.h.
    0x0020:  8010 0156 fe28 0000 0101 080a 2b58 bf29  ...V.(......+X.)
    0x0030:  2b58 bf28                                +X.(
21:59:47.128546 IP (tos 0x10, ttl 64, id 51304, offset 0, flags [DF], proto TCP (6), length 54)
    127.0.0.1.47778 > 127.0.0.1.8081: Flags [P.], cksum 0xfe2a (incorrect -> 0xe68a), seq 17:19, ack 1, win 342, options [nop,nop,TS val 727237195 ecr 727236393], length 2
    0x0000:  4510 0036 c868 4000 4006 7447 7f00 0001  E..6.h@[email protected]....
    0x0010:  7f00 0001 baa2 1f91 6799 689b 02b2 fe85  ........g.h.....
    0x0020:  8018 0156 fe2a 0000 0101 080a 2b58 c24b  ...V.*......+X.K
    0x0030:  2b58 bf29 0d0a                           +X.)..
21:59:47.128591 IP (tos 0x0, ttl 64, id 24171, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47778: Flags [.], cksum 0xfe28 (incorrect -> 0xf07a), seq 1, ack 19, win 342, options [nop,nop,TS val 727237195 ecr 727237195], length 0
    0x0000:  4500 0034 5e6b 4000 4006 de56 7f00 0001  E..4^k@[email protected]....
    0x0010:  7f00 0001 1f91 baa2 02b2 fe85 6799 689d  ............g.h.
    0x0020:  8010 0156 fe28 0000 0101 080a 2b58 c24b  ...V.(......+X.K
    0x0030:  2b58 c24b                                +X.K
21:59:47.131402 IP (tos 0x0, ttl 64, id 24172, offset 0, flags [DF], proto TCP (6), length 172)
    127.0.0.1.8081 > 127.0.0.1.47778: Flags [P.], cksum 0xfea0 (incorrect -> 0xf197), seq 1:121, ack 19, win 342, options [nop,nop,TS val 727237198 ecr 727237195], length 120
    0x0000:  4500 00ac 5e6c 4000 4006 dddd 7f00 0001  E...^l@.@.......
    0x0010:  7f00 0001 1f91 baa2 02b2 fe85 6799 689d  ............g.h.
    0x0020:  8018 0156 fea0 0000 0101 080a 2b58 c24e  ...V........+X.N
    0x0030:  2b58 c24b 4854 5450 2f31 2e31 2032 3030  +X.KHTTP/1.1.200
    0x0040:  204f 4b0d 0a63 6f6e 7465 6e74 2d6c 656e  .OK..content-len
    0x0050:  6774 683a 2034 0d0a 636f 6e74 656e 742d  gth:.4..content-
    0x0060:  7479 7065 3a20 7465 7874 2f70 6c61 696e  type:.text/plain
    0x0070:  3b20 6368 6172 7365 743d 7574 662d 380d  ;.charset=utf-8.
    0x0080:  0a64 6174 653a 2057 6564 2c20 3331 204f  .date:.Wed,.31.O
    0x0090:  6374 2032 3031 3820 3132 3a35 393a 3437  ct.2018.12:59:47
    0x00a0:  2047 4d54 0d0a 0d0a f09f 90b2            .GMT........
21:59:47.131449 IP (tos 0x10, ttl 64, id 51305, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.47778 > 127.0.0.1.8081: Flags [.], cksum 0xfe28 (incorrect -> 0xeffc), seq 19, ack 121, win 342, options [nop,nop,TS val 727237198 ecr 727237198], length 0
    0x0000:  4510 0034 c869 4000 4006 7448 7f00 0001  E..4.i@[email protected]....
    0x0010:  7f00 0001 baa2 1f91 6799 689d 02b2 fefd  ........g.h.....
    0x0020:  8010 0156 fe28 0000 0101 080a 2b58 c24e  ...V.(......+X.N
    0x0030:  2b58 c24e                                +X.N
21:59:50.001987 IP (tos 0x0, ttl 64, id 24173, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47778: Flags [F.], cksum 0xfe28 (incorrect -> 0xe4c5), seq 121, ack 19, win 342, options [nop,nop,TS val 727240068 ecr 727237198], length 0
    0x0000:  4500 0034 5e6d 4000 4006 de54 7f00 0001  E..4^m@[email protected]....
    0x0010:  7f00 0001 1f91 baa2 02b2 fefd 6799 689d  ............g.h.
    0x0020:  8011 0156 fe28 0000 0101 080a 2b58 cd84  ...V.(......+X..
    0x0030:  2b58 c24e                                +X.N
21:59:50.002168 IP (tos 0x10, ttl 64, id 51306, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.47778 > 127.0.0.1.8081: Flags [F.], cksum 0xfe28 (incorrect -> 0xd98d), seq 19, ack 122, win 342, options [nop,nop,TS val 727240069 ecr 727240068], length 0
    0x0000:  4510 0034 c86a 4000 4006 7447 7f00 0001  E..4.j@[email protected]....
    0x0010:  7f00 0001 baa2 1f91 6799 689d 02b2 fefe  ........g.h.....
    0x0020:  8011 0156 fe28 0000 0101 080a 2b58 cd85  ...V.(......+X..
    0x0030:  2b58 cd84                                +X..
21:59:50.002225 IP (tos 0x0, ttl 64, id 24174, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.8081 > 127.0.0.1.47778: Flags [.], cksum 0xfe28 (incorrect -> 0xd98c), seq 122, ack 20, win 342, options [nop,nop,TS val 727240069 ecr 727240069], length 0
    0x0000:  4500 0034 5e6e 4000 4006 de53 7f00 0001  E..4^n@[email protected]....
    0x0010:  7f00 0001 1f91 baa2 02b2 fefe 6799 689e  ............g.h.
    0x0020:  8010 0156 fe28 0000 0101 080a 2b58 cd85  ...V.(......+X..
    0x0030:  2b58 cd85                                +X..

legokichi on 31 Oct 2018

https://github.com/actix/actix-web/blob/cfd9a56ff74dd6cb6ad38b6a67b40ed1763d6071/src/server/h1writer.rs#L131-L135

This code does not always give connection close header.

Why dont you use self.settings.keep_alive_enabled() ?

if msg.keep_alive().unwrap_or_else(|| self.settings.keep_alive_enabled()) {

legokichi on 1 Nov 2018

@legokichi I think in this case it is supposed to be set in either request or response already.
Though I need to re-check code to be sure

UPD: Settings are used here to determine whether we should start keep-alive timer itself https://github.com/actix/actix-web/blob/cfd9a56ff74dd6cb6ad38b6a67b40ed1763d6071/src/server/h1.rs#L93-L123

Though looking at how we set headers I wonder if we're actually doing it wrong as on HTTP1.1 we should set keep-alive to connection when keep-alive is on, and Close otherwise, while for HTTP1.0 it is always by default should be close and we don't even need to set it

@fafhrd91 What do you think?

UPD: though maybe we need to set keep-alive for HTTP1.0, but I think we should set it for HTTP1.1 too

DoumanAsh on 1 Nov 2018

❤1

Relating to https://github.com/actix/actix-web/issues/439#issuecomment-434714357 ,

I confirmed that function poll_flush(true) will never be called when ::actix_web::server::KeepAlive::Timeout(5)
Http1Dispatcher::poll will not be polled after 5 seconds.
No timer will fire.

I checked on master branch today.

legokichi on 1 Nov 2018

Thanks! It is strange because we should set shutdown flag on expiration of timer. I'll try to check code there again when I'll have time

DoumanAsh on 1 Nov 2018

❤1

@DoumanAsh i'll handle this

fafhrd91 on 1 Nov 2018

❤1

Probably this is necessary.

diff --git a/src/server/h1.rs b/src/server/h1.rs
index a2ffc055..3003a4ac 100644
--- a/src/server/h1.rs
+++ b/src/server/h1.rs
@@ -192,6 +192,7 @@ where

     #[inline]
     pub fn poll(&mut self) -> Poll<(), HttpDispatchError> {
+        ::futures::task::current().notify();
         // check connection keep-alive
         self.poll_keepalive()?;

legokichi on 5 Nov 2018

notify forces tokio to poll future again, which should only be ever necessary when we return NotReady and no IO is scheduled

DoumanAsh on 5 Nov 2018

@legokichi should be fixed in master

@neowutran could you test master again?

fafhrd91 on 8 Nov 2018

👍1

Testing with live traffic,
service is running for 3 days, no unclosed connection for the moment : )
I will let it run for few more days
( however, there are still some memory leak, from 0.1% to 8% in 3 days )

neowutran on 12 Nov 2018

seems good.

legokichi on 13 Nov 2018

Didn't saw any unclosed connection yet, but a lot of memory leak. Crashed once by out of memory, relaunched, and now needed to stop the service as I released a new version of my software, so much more traffic on live

valgrind log when I stopped the service:
https://neowutran.ovh/valgrind_debug9999.err

used the "/shutdown" to close it gracefully

neowutran on 13 Nov 2018

@neowutran Thanks for log, I'll take a look at it during this weekend, sorry for long response

UPD: So far I cannot make sense out of valgrind output

UPD2: Most likely valgrind itself will not be useful here, but it might point where we have some buffer that gets constantly increased over time

DoumanAsh on 23 Nov 2018

@neowutran Just curious, are errors seens through the whole time while you're running your service, or only from time to time?

DoumanAsh on 24 Nov 2018

I wonder why this bug has no attention for so long time (since Nov 24 2018). If server fails due memory leaks - how you run your web sites on it?

e-oz on 19 Feb 2019

👍4

Well it is not easy to understand where the problem yet, and considering upcoming refactoring and change with new major version, it might become irrelevant bug

DoumanAsh on 19 Feb 2019

this should be fixed in 0.7.19

fafhrd91 on 30 Mar 2019

@fafhrd91, @DoumanAsh Hi! Has this issue actually been properly solved? I've been using 1.x version of actix-web for months, had to restart my app every now and then (sometimes after minutes, sometimes after days) since there are a lot of ESTABLISHED connections left there hanging, eventually causing too many open files error. I'm using my server with keep-alive disabled, the rest of the settings are the defaults. I have since tried to upgrade to 2.0.0 to see if it solves the problem, but it's the same thing.

The service itself gets around 500-1000 requests per second in production currently.

orangesoup on 11 Jan 2020

@orangesoup Original problem was 0.7 and was solved. Crate new issue

fafhrd91 on 11 Jan 2020

Alright, will do, thanks!

orangesoup on 11 Jan 2020

Actix-web: Connection not closed correctly

Most helpful comment

All 116 comments

Related issues