NGINX Ingress controller version:
helm chart -- stable/nginx-ingress:1.22.1
controller.image.tag -- 0.26.1
Kubernetes version (use kubectl version):
v1.12.9-gke.15
Environment:
What happened:
This is not a curl problem; it is definitely a problem with nginx-ingress and HTTP/2.
When I try to curl my endpoint:
```
* Using HTTP2, server supports multi-use
> GET /api-console/api-model.json HTTP/2
> Host: bs-tangerine-s2.tom.takeoff.com
> User-Agent: curl/7.54.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
< HTTP/2 200
< server: nginx/1.15.9
< date: Tue, 01 Oct 2019 11:26:53 GMT
< content-type: application/json
< content-length: 596322
< vary: Accept-Encoding
< last-modified: Fri, 27 Sep 2019 15:00:12 GMT
< strict-transport-security: max-age=15724800; includeSubDomains
<
{ [3907 bytes data]
 85  582k   85  499k    0     0   186k      0  0:00:03  0:00:02  0:00:01  186k
* HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)
 99  582k   99  579k    0     0   195k      0  0:00:02  0:00:02 --:--:--  195k
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, Client hello (1):
} [2 bytes data]
curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)
```
What you expected to happen:
Download the JSON. It works fine when HTTP/2 is disabled and HTTP/1.1 is used.
How to reproduce it (as minimally and precisely as possible):
Create an app and try to download a big JSON file.
Anything else we need to know:
As a workaround, HTTP/2 has to be disabled.
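In ingress-nginx, HTTP/2 is a global controller setting, so disabling it means setting `use-http2: "false"` in the controller ConfigMap. A minimal sketch, assuming the namespace and ConfigMap name created by a typical stable/nginx-ingress release (adjust both to your install):

```sh
# Sketch: disable HTTP/2 for the whole controller via its ConfigMap.
# Namespace and ConfigMap name are assumptions, not taken from this issue.
kubectl -n ingress-nginx patch configmap nginx-ingress-controller \
  --type merge -p '{"data":{"use-http2":"false"}}'
```

The controller watches its ConfigMap, so it should pick the change up and reload nginx without a pod restart.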
@shmykyta please post the ingress controller pod log
@aledbf there is not a lot of info, just
- - [01/Oct/2019:11:27:36 +0000] "GET /api-console/api-model.json HTTP/1.1" 200 596338 "-" "curl/7.54.0" 119 1.005 [XXX] [] 172.16.66.72:8080 596338 1.005 200 676e9aaff910d937c63b67d642d9c6ae
And after that, a 60-second timeout.
@shmykyta ok, maybe you need to adjust the read timeouts
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-timeouts
nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
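For example, both the read and send timeouts can be set on an existing Ingress with kubectl annotate; the Ingress name below is a placeholder:

```sh
# Sketch: raise the proxy read/send timeouts for one Ingress resource.
kubectl annotate ingress my-ingress --overwrite \
  nginx.ingress.kubernetes.io/proxy-read-timeout="300" \
  nginx.ingress.kubernetes.io/proxy-send-timeout="300"
```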
1.005
That means the response was returned in 1 second.
@aledbf
So, I got the response in 1 second, with an error and a broken JSON:
curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)
As you can see, I get around 200k of the 600k and then the transfer stops:
99 582k 99 579k 0 0 195k 0 0:00:02 0:00:02 --:--:-- 195k
In this case the timeouts are OK.
I suggest you use kubectl exec in the ingress controller pod to execute the curl command (using localhost and -H for Host header)
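Something along these lines; the namespace, pod label selector, and host are placeholders, not values from this issue:

```sh
# Sketch: repeat the request from inside the ingress controller pod, so the
# external load balancer is taken out of the picture.
POD=$(kubectl -n ingress-nginx get pod -l app=nginx-ingress,component=controller \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n ingress-nginx exec "$POD" -- \
  curl -v -k -o /dev/null -H "Host: bs-tangerine-s2.tom.takeoff.com" \
  https://localhost/api-console/api-model.json
```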
@aledbf
```
* Expire in 0 ms for 6 (transfer 0x562f7ca34d30)
* Expire in 1 ms for 1 (transfer 0x562f7ca34d30)
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Expire in 10 ms for 1 (transfer 0x562f7ca34d30)
* Expire in 10 ms for 1 (transfer 0x562f7ca34d30)
* Trying xxx.xxx.xxx.xxx...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x562f7ca34d30)
* Connected to XXX-HOST (xxx.xxx.xxx.xxx) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [106 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [2908 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
* <<< CERTS HERE >>
* start date: Sep 25 00:00:00 2019 GMT
* expire date: Sep 29 12:00:00 2021 GMT
* <<< CERTS HERE >>
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x562f7ca34d30)
} [5 bytes data]
> GET /api-console/api-model.json HTTP/2
> Host: XXX-HOST
> User-Agent: curl/7.64.0
> Accept: */*
>
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
} [5 bytes data]
< HTTP/2 200
< server: nginx/1.15.9
< date: Wed, 16 Oct 2019 10:02:52 GMT
< content-type: application/json
< content-length: 596322
< vary: Accept-Encoding
< last-modified: Fri, 27 Sep 2019 15:00:12 GMT
< strict-transport-security: max-age=15724800; includeSubDomains
<
{ [3907 bytes data]
* HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
* stopped the pause stream!
99 582k 99 579k 0 0 7070k 0 --:--:-- --:--:-- --:--:-- 7070k
* Connection #0 to host XXX-HOST left intact
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
```
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
(Not trying to keep issue alive but) I have the same issue with an old Symfony2 PHP app when trying to download specifically PDFs. The PDFs aren't big, only ~13kb. The issue is occurring between the client and ingress. If I bypass the ingress or send the request over http/1.1 it works fine.
But the long-term workaround is seemingly to disable HTTP/2. Any solution, @shmykyta?
Having the same issue with downloads over HTTP/2. I'd love to provide some additional logging, but AFAIK there is none.
It would be great if someone could provide a self-contained example reproducing the issue.
After some research this weekend (read: pulling my hair out), I'm not yet fully convinced it actually has anything to do with the nginx-ingress controller or nginx in general. I'm suspecting an incorrect (too short) Content-Length header that previously went unnoticed because of how forgiving HTTP/1.1 is.
The endpoint where we experience this issue is bound to our monolith, so providing a self-contained example would mean shipping half our software stack; that's going to be very hard if not impossible. I've done some stracing in debug mode, and so far I don't see anything in the strace output indicating that the nginx ingress controller experiences the network error the client sees, other than it closing the client connection after the data has been sent. I'll look into it some more, but for now, as far as I can tell, this is an unrelated error caused by how the different HTTP protocols respond to protocol errors.
(Edit: typos)
I am having the same issue. Whenever I run a POST request with curl with a large JSON body (300 KB), the request seems to time out and NGINX writes a 408 response status in the access log. Requests with a smaller JSON body (10 KB) against the same target work completely fine. This is the verbose curl output:
```
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" -H "Authorization: Bearer $api_key" -d @d.json https://redacted-domain.com/api/dashboards/db -v
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 23.151.xxx.xxx...
* TCP_NODELAY set
* Connected to redacted-domain.com (23.151.xxx.xxx) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/cert.pem
CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=redacted-domain.com
* start date: Feb 18 08:31:58 2020 GMT
* expire date: May 18 08:31:58 2020 GMT
* subjectAltName: host "redacted-domain.com" matched cert's "redacted-domain.com"
* issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fd5bd807400)
> POST /api/dashboards/db HTTP/2
> Host: redacted-domain.com
> User-Agent: curl/7.64.1
> Content-Type: application/json
> Accept: application/json
> Authorization: Bearer redacted
> Content-Length: 278482
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
* We are completely uploaded and fine
* HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
* stopped the pause stream!
* Connection #0 to host redacted-domain.com left intact
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
```
After `* We are completely uploaded and fine` is printed, it takes 60 s (the default client request timeout in the nginx ingress) until the rest of the message is printed. Do you have any ideas I could try to resolve the issue? Any config options?
Additional context:
I tried port-forwarding the upstream pod (the Grafana pod) and sent the curl request directly to that pod, and the command works completely fine (success response within a few ms).
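For reference, this kind of bypass test looks roughly like the following; the namespace, pod name, and port are placeholders:

```sh
# Sketch: bypass the ingress entirely and talk to the upstream pod directly.
kubectl -n monitoring port-forward pod/grafana-0 3000:3000 &
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" \
     -H "Authorization: Bearer $api_key" -d @d.json \
     http://localhost:3000/api/dashboards/db -v
```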
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Anyone made progress on the cause here?
For me, the issue was in the load balancer upstream, which closed the connection once the idle timeout was reached, but it can be in several parts of your stack.
If you suspect the ingress controller (which for me wasn't the culprit), try turning the debug logging up very loud; you'd definitely find your answer there :)
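Two ways to make the controller loud, as a rough sketch; the namespace, ConfigMap, and Deployment names are assumptions, adjust to your install:

```sh
# Raise nginx's own error log level via the controller ConfigMap...
kubectl -n ingress-nginx patch configmap nginx-ingress-controller \
  --type merge -p '{"data":{"error-log-level":"debug"}}'

# ...and/or raise the controller's klog verbosity (this restarts the pods).
kubectl -n ingress-nginx patch deployment nginx-ingress-controller --type json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--v=5"}]'
```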
For the record, in my case it was a client bug in curl limited to macOS builds.
With curl built from https://github.com/curl/curl/commit/34a5400de113f2badfaa338b00cc361b4ab8355a the issue no longer occurs.
curl never sent the whole request body but kept hanging, consuming CPU time, likely looping.
This caused 408 and 499 (timeout / Client Closed Request) on the NGINX side.
After NGINX's proxy timeout was reached, I got `curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)`.
See https://github.com/curl/curl/issues/1410