Icinga2: API breaks JSON on large result

Created on 31 Jan 2017  Â·  8Comments  Â·  Source: Icinga/icinga2

To reproduce this please:

  • create a large amount of dummy hosts (I did 100,000)
  • fetch them through the API

Expected result: valid JSON
Actual result: corrupt (truncated) JSON, varying length

Ran into this while using code based on the Icinga Director, but reproduced it with curl on command line:

curl -k -u user:pass https://icinga2:5665/v1/objects/host

CURL request runs about 15-20 seconds and randomly terminates after fetching 7-15MB of JSON, stating:

curl: (18) transfer closed with outstanding read data remaining

Icinga starts sending data about 4 seconds after the request has been sent and then ships data with about 6-700kB/s. Icinga 2 log shows:

information/HttpServerConnection: No messages for Http connection have been received in the last 10 seconds.

Did some more tests after figuring out this. It really looks like Icinga is terminating the request 10 seconds after it effectively starts sending out data. However, those 10 seconds are not precise: sometimes the transfer keeps running nearly twice that time.

Receiving end shouldn't be the issue, it's plain curl on a loopback interface (different containers, same host). Just to be on the safe side, I did the curl run in the very same container. And as an alternative test, I fetched a few Gigabytes in a way involving three containers multiple times. Did so via HTTPS through a local proxy sitting before another local webserver. Gives me more than 3 GBit/s throughput all the time. Should be obvious, but I just wanted to rule out any other local issue eventually influencing network performance in this setup.

Cheers,
Thomas

areapi bug

Most helpful comment

PRs thankfully sent in by @Stefar77 are on our review list.

All 8 comments

From a quick look I'd guess that there's a race condition between m_PendingRequests being decremented to 0 in httpserverconnect.cpp:226 and the amount of data sent back to the client. That causes HttpServerConnection::CheckLiveness() to disconnect the client.

Discussed so far

This behavior is clearly caused by CheckLiveness(). (@dnsmichi is right)
The problem is that that method doesn't check whether all the request processing is "actually done".
Checking this is kinda hard – the easiest way would be dropping this "cronjob".
This would not hurt as

  1. the connections are cleaned up anyway and
  2. this isn't an effective DOS protection.

@Thomas-Gelf Try it with #5419 added some locks that should prevent a race-condition.

We also ran into this on our soon to be production Cluster. Would be great if this could be fixed soon!

Edit: We ran into this issue by querying all services on our setup. This should be around 44000 services.

PRs thankfully sent in by @Stefar77 are on our review list.

@dnsmichi @gunnarbeutner @Crunsher Pls unassing me

This still happens on our 2.8.1 setup :(

I'm playing around with a small patch which keeps the connection open until the stream is fully flushed. This leaves the connection open for 6 minutes for example (with a debug build). I am not sure if we want to have such a behaviour, neither with a dynamic thread pool for connections in 2.10 nor without - poke @lippserd :)

This needs #6655 and #6661 for better analysis (and basically the git master for 2.10), tests are on my macbook with 6k services, random triggering 7k notifications.

curl -H -k -s -u root:icinga 'https://localhost:5665/v1/objects/services'
[2018-10-09 14:47:43 +0200] information/ApiListener: New client connection from [::1]:53915 (no client certificate)
[2018-10-09 14:47:43 +0200] information/HttpServerConnection: Request: GET /v1/objects/services (from [::1]:53915, user: root)
[2018-10-09 14:53:26 +0200] information/HttpServerConnection: HTTP client disconnected (from [::1]:53915)

Second test, with more resources assigned to other apps.

[2018-10-09 15:06:30 +0200] information/ApiListener: New client connection from [::1]:54094 (no client certificate)
[2018-10-09 15:06:30 +0200] information/HttpServerConnection: Request: GET /v1/objects/services (from [::1]:54094, user: root)
[2018-10-09 15:19:27 +0200] information/HttpServerConnection: HTTP client disconnected (from [::1]:54094)
Was this page helpful?
0 / 5 - 0 ratings