Akka-http: Responding with a stream sometimes results in "Connection reset by peer" error

Created on 28 Jun 2018 · 9 Comments · Source: akka/akka-http

In our application we have routes that are streaming JSON documents. Here is an example:

/** GET api/1/tenant/(tenantId)/ads/ */
def getAllAdsByOwner(advertiserId: AdvertiserId): Route =
  get {
    httpRequiredSession { username =>
      getAllTenantAds(username, advertiserId) { (adSource: Source[AdView, Any]) =>
        complete(adSource)
      }
    }
  }

Most of the time it works as expected, but sometimes, especially when there are many simultaneous requests, the server resets the connection right after the headers have been sent.
I tested with a script that requests this route with curl in a loop and aborts when a request fails. It ran for about 2 minutes before stopping. The trace of a failed request is the following:

<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 54 bytes (0x36)
0000: Access-Control-Allow-Origin: https://<...>
<= Recv header, 135 bytes (0x87)
0000: Access-Control-Expose-Headers: Content-Type, Authorization, Refr
0040: esh-Token, Set-Authorization, Set-Refresh-Token, asset-content-l
0080: ength
<= Recv header, 40 bytes (0x28)
0000: Access-Control-Allow-Credentials: true
<= Recv header, 24 bytes (0x18)
0000: Content-Encoding: gzip
<= Recv header, 23 bytes (0x17)
0000: X-Frame-Options: DENY
<= Recv header, 33 bytes (0x21)
0000: X-Content-Type-Options: nosniff
<= Recv header, 26 bytes (0x1a)
0000: Content-Security-Policy: .
<= Recv header, 20 bytes (0x14)
0000: default-src 'self';.
<= Recv header, 63 bytes (0x3f)
0000: style-src 'self' 'unsafe-inline' https://fonts.googleapis.com;.
<= Recv header, 59 bytes (0x3b)
0000: font-src 'self' 'unsafe-inline' https://fonts.gstatic.com;.
<= Recv header, 99 bytes (0x63)
0000: script-src 'self' 'unsafe-inline' 'unsafe-eval' https://*.google
0040: apis.com https://maps.gstatic.com;.
<= Recv header, 69 bytes (0x45)
0000: img-src 'self' data: https://*.googleapis.com https://*.gstatic.
0040: com;.
<= Recv header, 8 bytes (0x8)
0000:
<= Recv header, 26 bytes (0x1a)
0000: Server: akka-http/10.1.2
<= Recv header, 37 bytes (0x25)
0000: Date: Wed, 27 Jun 2018 15:20:24 GMT
<= Recv header, 28 bytes (0x1c)
0000: Transfer-Encoding: chunked
<= Recv header, 32 bytes (0x20)
0000: Content-Type: application/json
<= Recv header, 2 bytes (0x2)
0000:
== Info: Recv failure: Connection reset by peer
== Info: stopped the pause stream!
== Info: Closing connection 0
curl: (56) Recv failure: Connection reset by peer

The same request inspected in Wireshark:

[Screenshot: Wireshark capture, 2018-06-27 7:32:08 PM]

The logs gave no hint about the probable source of the problem. The response is logged as successful:

[27-06-2018 19:44:52.837][INFO] access: 'GET /api/1/tenant/ca764a91-8616-409c-8f08-c64a40d3fc07/ads' 200 596ms

There are some old issues which could be related, though not sure:
https://github.com/akka/akka/issues/17854
https://github.com/akka/akka/issues/22177

Versions of used software:
Scala: 2.11.11
Akka: 2.5.13
AkkaHttp: 10.1.2

Configuration files of Akka:
akka.conf.txt
akka-http-core.conf.txt
logback.xml.txt

UPDATED
akka-http updated to version 10.1.3
Loglevels changed to more verbose

Labels: bug, 1 - triaged

All 9 comments

Though I think it's unlikely to be related to this particular problem, it's highly recommended to update to akka-http 10.1.3 (https://akka.io/blog/news/2018/06/15/akka-http-10.1.3-released)

Do you find anything interesting in the logging? Perhaps after widening the loglevels?

@raboof Unfortunately there are no hints in the log pointing to the problem. Updating akka-http to the latest version didn't seem to change anything either.

We see this behavior when completing with a Source and an error happens inside the source itself, e.g. while fetching some elements of the source from the DB, on an idle timeout, etc.

You will see 200 logged but the connection reset if you complete with HttpResponse(status = OK, entity = HttpEntity(json, Source.failed)).
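The reproduction described above could be sketched roughly as follows, assuming akka-http 10.1.x on Akka 2.5. The object name, the `failing` path, the port, and the exception message are illustrative assumptions, not taken from the thread.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{ContentTypes, HttpEntity, HttpResponse, StatusCodes}
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Source
import akka.util.ByteString

// Hypothetical reproduction; names, path, and port are assumptions.
object FailedEntityRepro {

  // A 200 response whose chunked entity fails as soon as it is materialized.
  def failingResponse: HttpResponse =
    HttpResponse(
      status = StatusCodes.OK,
      entity = HttpEntity(
        ContentTypes.`application/json`,
        Source.failed[ByteString](new RuntimeException("simulated failure inside the entity stream"))
      )
    )

  val route: Route =
    path("failing") {
      get {
        complete(failingResponse)
      }
    }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("repro")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    // The status line and headers are already committed when the entity
    // stream fails, so the server cannot change the status afterwards.
    Http().bindAndHandle(route, "localhost", 8080)
  }
}
```

Requesting `http://localhost:8080/failing` with `curl -v` should then show the behavior this comment describes: the headers arrive with status 200 and the connection is dropped without a body.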

We have the same issue. The Source that is streamed to the client always finishes successfully (when monitored using watchTermination), but sometimes the client gets the connection closed with

requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

We don't have that problem when using mapAsync(1) in the Source, but it happens when using mapAsyncUnordered(4).

This usually happens when the stream has failed and the exception is swallowed as a result. For debugging purposes, recover the failing stream and re-throw the exception.
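One way to surface such a swallowed failure is sketched below. Note that it uses `mapError` instead of `recover` plus a re-throw: the stream still fails with the same exception, but the failure is handed to a callback first. `StreamDebugging`, `reportFailures`, and the `report` callback are hypothetical names introduced for illustration. The helper only sees failures that originate upstream of where it is attached.

```scala
import akka.stream.scaladsl.Source

// Hypothetical debugging helper, not part of Akka.
object StreamDebugging {

  /** Passes every upstream failure to `report`, then fails the stream with the same exception. */
  def reportFailures[T, M](source: Source[T, M])(report: Throwable => Unit): Source[T, M] =
    source.mapError {
      case e: Throwable =>
        report(e) // e.g. write the exception to the application log
        e         // propagate the original failure unchanged
    }
}
```

In the route from the original report this might be applied as `complete(StreamDebugging.reportFailures(adSource)(e => println(s"ads stream failed: $e")))`, with `println` replaced by whatever logger the application uses.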

We've seen this kind of error too. After digging into it I have a few conclusions:

  1. In our case this happened when the system was under extreme load. I'd say it somehow relates to @lustefaniak's comment about mapAsync(1) vs mapAsyncUnordered(4).
  2. We were able to mitigate this by increasing the idle timeout (akka.http.server.idle-timeout).
  3. Re. @kstokoz's comment about a failing entity stream: I've debugged such a scenario, and it seems some of the layers in akka-http and akka-tcp aren't using the Akka 2.6 support for propagating errors upstream (onDownstreamFinish(cause: Throwable)), hence the exception gets swallowed along the way and the stream completes successfully.
  4. watchTermination() doesn't help either, since the downstream failure never makes it up to the user's flow.
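The mitigation in point 2 amounts to overriding one setting in the application's configuration. A minimal sketch, assuming the override goes into `application.conf`; the value of 120 s is purely illustrative and not a figure from this thread (the default shipped with akka-http is 60 s):

```hocon
akka.http.server {
  # Time after which an idle connection is closed by the server.
  # Illustrative value; choose one that fits your slowest expected stream.
  idle-timeout = 120 s
}
```

Raising the timeout only makes the reset less likely under load; it does not address failures being turned into regular completion.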

Thanks for the additional info. That indeed looks like the dreaded race condition that can convert failures to regular completion. We will fix that in 10.2.0. A preliminary version is in https://github.com/akka/akka-http/pull/2678 but I haven't finished it yet.

akka-http 10.1.x still depends on Akka 2.5, which doesn't support cancellation cause propagation, so we couldn't make use of that feature so far. We will drop 2.5 support in 10.2.0, so that might help fix the problem there.

I consider this closed by #3022. If it reappears please open another ticket.
