Icinga2: [dev.icinga.com #10638] Regenerate the _api/active-stage, _api/active.conf and _api/include.conf files when they're deleted

Created on 16 Nov 2015 · 26 Comments · Source: Icinga/icinga2

This issue has been migrated from Redmine: https://dev.icinga.com/issues/10638

Created by gbeutner on 2015-11-16 06:50:00 +00:00

Assignee: _mfriedrich_
Status: _Assigned_
Target Version: _Backlog_
Last Update: _2017-01-09 15:44:03 +00:00 (in Redmine)_

Icinga Version: 2.4.0
Backport?: Not yet backported
Include in Changelog: 1


Relations:

Labels: area/api, bug, queue/important


Updated by mfriedrich on 2016-03-18 16:14:10 +00:00

  • Category set to _API_
  • Priority changed from _Normal_ to _Low_

Updated by mfriedrich on 2016-04-01 11:38:29 +00:00

  • Relates set to _11499_

Updated by mfriedrich on 2016-04-01 11:39:50 +00:00

  • Subject changed from _Regenerate the _api/active.conf and _api/include.conf files when they're deleted_ to _Regenerate the _api/active-stage, _api/active.conf and _api/include.conf files when they're deleted_
    mbmif /usr/local/icinga2/etc/icinga2/tests (master) # ls -la /usr/local/icinga2/var/lib/icinga2/api/packages/_api/
    total 24
    drwx------  6 icinga  staff  204 Sep 15  2015 .
    drwx------  4 icinga  staff  136 Dec 10 15:55 ..
    -rw-r--r--  1 icinga  staff   33 Sep 15  2015 active-stage
    -rw-r--r--  1 icinga  staff  450 Sep 15  2015 active.conf
    -rw-r--r--  1 icinga  staff   25 Sep 15  2015 include.conf
    drwx------  5 icinga  staff  170 Sep 15  2015 mbmif.int.netways.de-1442309540-1
    mbmif /usr/local/icinga2/etc/icinga2/tests (master) # ls -la /usr/local/icinga2/var/lib/icinga2/api/packages/_api/mbmif.int.netways.de-1442309540-1/
    total 8
    drwx------  5 icinga  staff  170 Sep 15  2015 .
    drwx------  6 icinga  staff  204 Sep 15  2015 ..
    drwx------  7 icinga  staff  238 Mar 22 21:22 conf.d
    -rw-r--r--  1 icinga  staff  157 Sep 15  2015 include.conf
    drwx------  2 icinga  staff   68 Sep 15  2015 zones.d

Updated by mfriedrich on 2016-04-01 11:42:44 +00:00

  • Priority changed from _Low_ to _Normal_
  • Target Version set to _Backlog_
  • Parent Id set to _11415_

We should implement this for runtime-created objects, which use the API packages internally. It's not the highest priority, but it would probably help with support.

Updated by gbeutner on 2016-08-25 16:11:28 +00:00

  • Relates set to _12551_

Updated by mfriedrich on 2016-11-09 14:54:30 +00:00

  • Parent Id deleted 11415

Updated by mfriedrich on 2016-12-07 17:17:58 +00:00

  • Status changed from _New_ to _Assigned_
  • Assigned to set to _mfriedrich_

Updated by mfriedrich on 2017-01-09 15:43:08 +00:00

  • Relates set to _13725_

Updated by mfriedrich on 2017-01-09 15:44:03 +00:00

  • Priority changed from _Normal_ to _High_

Updated by mfriedrich on 2017-01-09 15:44:39 +00:00

  • Relates set to _11012_

FYI, I had this problem on my 2.5.4 standalone server. The _api/ folder had somehow gotten corrupted; it was missing a bunch of files such as active.conf, include.conf, etc. I was able to fix it by blowing away /var/lib/icinga2/api/packages/_api and restarting Icinga, which recreated the missing files (active.conf, etc.) automatically. Downtimes are now working correctly.

At some point the stageName is empty, thus creating such a mess. It is on my TODO list to find out why.

At some point I tested a master-satellite setup which didn't work out for me, so I reverted all configuration files back to the standalone configuration. I think the problems started after that, though I can't tell for sure.

Workaround for manually re-creating the package:

  • Move the existing directories in ./_api/stagename/conf.d/ to a safe place
  • Remove the "_api" package directory (rm -r)
  • create a dummy comment via REST API and immediately delete it again (this restores the _api package without a restart)
  • move the backup config into ./_api/stagename/conf.d/ again
  • restart Icinga 2
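The steps above could be scripted roughly as follows. This is a minimal sketch, not an official tool: the function names are hypothetical, the API URL and root:icinga credentials are assumptions borrowed from the test script in this thread, and the dummy comment is attached to an existing host you pass in. It also assumes that the recreated package gets a new stage whose name is written to active-stage, so the backup is restored into that stage.

```shell
#!/bin/sh
# Sketch of the manual workaround. Function names are hypothetical; the
# API defaults below are assumptions taken from the test script in this thread.
API_URL=${API_URL:-https://localhost:5665}
API_USER=${API_USER:-root}
API_PASS=${API_PASS:-icinga}

# Step 1: back up <package>/<stage>/conf.d/ (copy, preserving owner/mode).
backup_stage_confd() {
    pkg_dir=$1 stage=$2 backup_dir=$3
    if [ -z "$stage" ] || [ ! -d "$pkg_dir/$stage/conf.d" ]; then
        echo "error: no conf.d/ found for stage '$stage'" >&2
        return 1
    fi
    mkdir -p "$backup_dir" || return 1
    cp -Rp "$pkg_dir/$stage/conf.d/." "$backup_dir/"
}

# Step 2 (rm -r the package directory) and step 5 (restart) are left to you.

# Step 3: add a dummy comment to an existing host and delete it again;
# this makes Icinga 2 recreate the _api package without a restart.
recreate_via_dummy_comment() {
    host=$1
    curl -k -s -u "$API_USER:$API_PASS" -H 'Accept: application/json' -X POST \
        "$API_URL/v1/actions/add-comment" \
        -d "{\"type\": \"Host\", \"filter\": \"host.name==\\\"$host\\\"\", \"author\": \"api-workaround\", \"comment\": \"dummy\"}" || return 1
    # Remove only comments written by the dummy author used above.
    curl -k -s -u "$API_USER:$API_PASS" -H 'Accept: application/json' -X POST \
        "$API_URL/v1/actions/remove-comment" \
        -d '{"type": "Comment", "filter": "comment.author==\"api-workaround\""}'
}

# Step 4: copy the backup into conf.d/ of the stage now named in active-stage.
restore_stage_confd() {
    backup_dir=$1 pkg_dir=$2
    stage=$(cat "$pkg_dir/active-stage" 2>/dev/null)
    if [ -z "$stage" ]; then
        echo "error: active-stage is missing or empty" >&2
        return 1
    fi
    mkdir -p "$pkg_dir/$stage/conf.d" || return 1
    cp -Rp "$backup_dir/." "$pkg_dir/$stage/conf.d/"
}
```

Order of use: `backup_stage_confd <package dir> <old stage> <backup dir>`, `rm -r` the package directory, `recreate_via_dummy_comment <host>`, `restore_stage_confd <backup dir> <package dir>`, then restart Icinga 2. Filtering the removal on a dedicated author keeps other comments untouched; please double-check the remove-comment filter against the API documentation of your version.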

You can of course try it in different ways, but this one avoids additional restarts.

If you're planning to restore the files manually, their structure is defined in

  • ConfigPackageUtility::WritePackageConfig()
  • ConfigPackageUtility::WriteStageConfig()
  • ConfigPackageUtility::ActivateStage()

Example: My stage name is mbmif.int.netways.de-1442309540-1

mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # ls -lah
total 24
drwx------  6 icinga  icinga   204B Apr  1  2016 .
drwx------  4 icinga  icinga   136B Dec 10  2015 ..
-rw-r--r--  1 icinga  icinga    33B Sep 15  2015 active-stage
-rw-r--r--  1 icinga  icinga   450B Sep 15  2015 active.conf
-rw-r--r--  1 icinga  icinga    25B Sep 15  2015 include.conf
drwx------  5 icinga  icinga   170B Nov 21 15:24 mbmif.int.netways.de-1442309540-1
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # cat active-stage
mbmif.int.netways.de-1442309540-1
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # cat active.conf
if (!globals.contains("ActiveStages")) {
  globals.ActiveStages = {}
}

if (globals.contains("ActiveStageOverride")) {
  var arr = ActiveStageOverride.split(":")
  if (arr[0] == "_api") {
    if (arr.len() < 2) {
      log(LogCritical, "Config", "Invalid value for ActiveStageOverride")
    } else {
      ActiveStages["_api"] = arr[1]
    }
  }
}

if (!ActiveStages.contains("_api")) {
  ActiveStages["_api"] = "mbmif.int.netways.de-1442309540-1"
}
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # cat include.conf
include "*/include.conf"
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # ls -lah  mbmif.int.netways.de-1442309540-1/
total 8
drwx------  5 icinga  icinga   170B Nov 21 15:24 .
drwx------  6 icinga  icinga   204B Apr  1  2016 ..
drwx------  9 icinga  icinga   306B May 10  2016 conf.d
-rw-r--r--  1 icinga  icinga   157B Sep 15  2015 include.conf
drwx------  2 icinga  icinga    68B Sep 15  2015 zones.d
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # cat mbmif.int.netways.de-1442309540-1/include.conf
include "../active.conf"
if (ActiveStages["_api"] == "mbmif.int.netways.de-1442309540-1") {
  include_recursive "conf.d"
  include_zones "_api", "zones.d"
}

This should allow you to reconstruct the files manually; just look at where the stage name is used.
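If you'd rather not type the files by hand, the contents shown above can be written by a small shell function. A minimal sketch, assuming you already know the stage name (the directory inside _api/); `regenerate_api_package` is a hypothetical helper, not part of Icinga 2. It mirrors the four files shown above, only creates files that are missing, and refuses to run with an empty stage name.

```shell
#!/bin/sh
# Hypothetical helper: recreate the bookkeeping files of the "_api" config
# package for a known stage name. Contents mirror the listing above.
# Usage: regenerate_api_package <package-dir> <stage-name>
regenerate_api_package() {
    pkg_dir=$1
    stage=$2

    # Never write anything with an empty stage name (the root of the mess
    # discussed in this issue).
    if [ -z "$stage" ]; then
        echo "error: empty stage name, refusing to write files" >&2
        return 1
    fi

    mkdir -p "$pkg_dir/$stage/conf.d" "$pkg_dir/$stage/zones.d" || return 1

    # active-stage: the bare stage name, without a trailing newline
    if [ ! -f "$pkg_dir/active-stage" ]; then
        printf '%s' "$stage" > "$pkg_dir/active-stage"
    fi

    # include.conf: pulls in the include.conf of every stage directory
    if [ ! -f "$pkg_dir/include.conf" ]; then
        echo 'include "*/include.conf"' > "$pkg_dir/include.conf"
    fi

    # active.conf: registers the stage in the global ActiveStages dictionary
    if [ ! -f "$pkg_dir/active.conf" ]; then
        cat > "$pkg_dir/active.conf" <<EOF
if (!globals.contains("ActiveStages")) {
  globals.ActiveStages = {}
}

if (globals.contains("ActiveStageOverride")) {
  var arr = ActiveStageOverride.split(":")
  if (arr[0] == "_api") {
    if (arr.len() < 2) {
      log(LogCritical, "Config", "Invalid value for ActiveStageOverride")
    } else {
      ActiveStages["_api"] = arr[1]
    }
  }
}

if (!ActiveStages.contains("_api")) {
  ActiveStages["_api"] = "$stage"
}
EOF
    fi

    # <stage>/include.conf: loads conf.d and zones.d only for the active stage
    if [ ! -f "$pkg_dir/$stage/include.conf" ]; then
        cat > "$pkg_dir/$stage/include.conf" <<EOF
include "../active.conf"
if (ActiveStages["_api"] == "$stage") {
  include_recursive "conf.d"
  include_zones "_api", "zones.d"
}
EOF
    fi
}
```

For example `regenerate_api_package /var/lib/icinga2/api/packages/_api mbmif.int.netways.de-1442309540-1`, then make sure the files belong to the Icinga 2 user (icinga in the listing above; this differs between distributions) and restart Icinga 2. Since existing files are never overwritten, a broken file has to be moved away first.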

If there is a conf.d/ directory at the top level of the "_api" package directory, safely move its contents to stagename/conf.d and verify that all include.conf files are properly initialized.
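Moving a stray top-level conf.d/ into the active stage could look like this. Again a sketch under assumptions: `move_stray_confd` is a hypothetical helper, the target stage is taken from active-stage, and the function aborts instead of overwriting when a file with the same relative path already exists in the stage. The include.conf files still need to be checked by hand afterwards.

```shell
#!/bin/sh
# Hypothetical helper: merge a stray top-level conf.d/ of the "_api" package
# into conf.d/ of the stage named in active-stage, then remove it.
# Usage: move_stray_confd <package-dir>
move_stray_confd() {
    [ -d "$1/conf.d" ] || return 0            # nothing to do
    pkg_dir=$(cd "$1" && pwd) || return 1     # absolute path, used after cd below

    stage=$(cat "$pkg_dir/active-stage" 2>/dev/null)
    if [ -z "$stage" ] || [ ! -d "$pkg_dir/$stage" ]; then
        echo "error: active-stage is empty or names a missing stage" >&2
        return 1
    fi

    # Collect files that would overwrite something in the stage; abort if any.
    conflicts=$(cd "$pkg_dir/conf.d" && find . -type f | while IFS= read -r f; do
        if [ -e "$pkg_dir/$stage/conf.d/$f" ]; then echo "$f"; fi
    done)
    if [ -n "$conflicts" ]; then
        echo "error: these files already exist in $stage/conf.d:" >&2
        echo "$conflicts" >&2
        return 1
    fi

    mkdir -p "$pkg_dir/$stage/conf.d" || return 1
    # -p keeps owner and mode when run as root
    cp -Rp "$pkg_dir/conf.d/." "$pkg_dir/$stage/conf.d/" && rm -r "$pkg_dir/conf.d"
}
```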

If you happen to have such a case, I'd appreciate a copy of it as a tarball (remove sensitive host details beforehand).

Thanks. This is very useful information.

On Sun, Feb 26, 2017 at 3:20 AM, Michael Friedrich wrote:

> Workaround for manually re-creating such:

Michael Martinez
http://www.michael--martinez.com

I was not able to reproduce this in a problematic way. All I managed to get were two stages for one node; this happens because we happily perform surgery on files in parallel, which could easily be the cause of the other problems.

The only solution @gunnarbeutner and I could come up with right now is using a mutex whenever we write, read and activate stages.

At some point the stageDir string is empty. We should at least log an error and bail out when this happens, to protect the integrity of existing files.
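The same idea of refusing to work with an empty stage name can be applied from the outside, before anyone touches the package. A minimal read-only sketch; `check_api_package` is a hypothetical helper, and the checks only cover the files discussed in this issue.

```shell
#!/bin/sh
# Hypothetical read-only check of an "_api" config package.
# Prints one line per problem and returns non-zero if any were found.
# Usage: check_api_package <package-dir>
check_api_package() {
    pkg_dir=$1
    problems=0

    stage=$(cat "$pkg_dir/active-stage" 2>/dev/null)
    if [ -z "$stage" ]; then
        echo "active-stage is missing or empty"
        problems=$((problems + 1))
    elif [ ! -d "$pkg_dir/$stage" ]; then
        echo "active-stage points to a missing stage directory: $stage"
        problems=$((problems + 1))
    elif [ ! -f "$pkg_dir/$stage/include.conf" ]; then
        echo "missing $stage/include.conf"
        problems=$((problems + 1))
    fi

    for f in active.conf include.conf; do
        if [ ! -f "$pkg_dir/$f" ]; then
            echo "missing $f"
            problems=$((problems + 1))
        fi
    done

    # A top-level conf.d/ should not exist; its content belongs in <stage>/conf.d.
    if [ -d "$pkg_dir/conf.d" ]; then
        echo "unexpected top-level conf.d/"
        problems=$((problems + 1))
    fi

    [ "$problems" -eq 0 ]
}
```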

Next steps:

  • Test with parallel requests
  • Add log messages in case names that must not be empty are empty

The tests worked (script below), but no issues like the ones described occurred. I also removed the log message about the missing active-stage, because in some of the places where it is called it does not matter whether it is empty, and we hold the lock wherever race conditions may occur.

About the missing files: thanks to the locks they should no longer be overwritten, and if the user deletes them, they are regenerated at startup. How should we proceed with this?

Script I used for testing:

# create 20 config packages in parallel (background jobs)
for i in $(seq 1 20); do
    curl -k -s -u root:icinga -H 'Accept: application/json' -X POST "https://localhost:5665/v1/config/packages/example-cmdb${i}" &
done
# upload one stage with a single host object into each package
for i in $(seq 1 20); do
    echo "{\"files\": {\"conf.d/test.conf\": \"object Host \\\"cmdb-host${i}\\\" { check_command = \\\"flatter\\\" }\"}}" | \
        curl -k -s -u root:icinga -H 'Accept: application/json' -X POST \
        -d @- "https://localhost:5665/v1/config/stages/example-cmdb${i}"
done

@Crunsher do you mean that include.conf files modified by the user should be re-created on each request? I would strongly advise against that for performance reasons. Users must not edit the _api package, and the daemon must be able to rely on being the owner of these files. If the daemon writes garbage, that is the bug being fixed here. But I would not care if the package remains broken because of a manual change by a user.

I've created a PR out of the fix branch, so it is not forgotten for reviews.

@dnsmichi Gods, no! Currently we re-create it at startup if it does not exist (which covers the initial creation). So I guess the locks, which make these operations atomic, fix this bug.

OK, thanks. Then your PR should be merged, and we ask anyone who can reliably reproduce the issue to test the snapshot packages.

The bug is not fixed; we still see it in v2.8.
I opened a forum thread about it: https://monitoring-portal.org/t/host-is-not-visible-via-api/2142

I've run into the same issue. How can I fix it? I've tested it on v2.6 and v2.9.

  • add host via API
  • restart icinga service
  • remove host via API
  • add host via API
  • issue: the newly added host does not show up in the web interface.
/var/lib/icinga2/api/packages/_api/
├── active.conf
├── active-stage
├── include.conf
└── host-name.example.com-1535636549-1
    ├── conf.d
    │   ├── downtimes
    │   └── hosts
    │       └── test-host.example.conf
    ├── include.conf
    └── zones.d

# cat /var/lib/icinga2/api/packages/_api/active.conf
if (!globals.contains("ActiveStages")) {
  globals.ActiveStages = {}
}

if (globals.contains("ActiveStageOverride")) {
  var arr = ActiveStageOverride.split(":")
  if (arr[0] == "_api") {
    if (arr.len() < 2) {
      log(LogCritical, "Config", "Invalid value for ActiveStageOverride")
    } else {
      ActiveStages["_api"] = arr[1]
    }
  }
}

if (!ActiveStages.contains("_api")) {
  ActiveStages["_api"] = "host-name.example.com-1535636549-1"
}

# cat /var/lib/icinga2/api/packages/_api/active-stage
host-name.example.com-1535636549-1

# cat /var/lib/icinga2/api/packages/_api/include.conf
include "*/include.conf"

# cat /var/lib/icinga2/api/packages/_api/host-name.example.com-1535636549-1/conf.d/hosts/test-host.example.com.conf 
object Host "test-host.example.com" {
    import "P2-host"

    address = "test-host.example.com"
    display_name = "test-host.example.com"
    notes = "my notes"
    notes_url = "http://test-host.example.com"
    vars["args"] = {
        services = {
            check_snmp_mem = {
                arg1 = "someone"
                arg2 = "90,0"
                arg3 = "100,30"
                name = "MEMORY"
            }
            ftp = {
                arg1 = 20.000000
                arg2 = 10.000000
                name = "FTP"
            }
        }
    }
    vars["facts"] = {
        nrpe = [ "check_disk", "check_file_exist" ]
        services = [ "ssh", "ftp" ]
        services_p3 = [ "load", "check_snmp_mem" ]
    }
    version = 1535637140.067982
    zone = "some-zone"
}

# cat /var/lib/icinga2/api/packages/_api/host-name.example.com-1535636549-1/include.conf 
include "../active.conf"
if (ActiveStages["_api"] == "host-name.example.com-1535636549-1") {
  include_recursive "conf.d"
  include_zones "_api", "zones.d"
}
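The reproduction steps reported above can be scripted against the REST API. A sketch under assumptions: the URL and credential defaults, the `hostalive` check command, the systemd unit name `icinga2` and the `RESTART_WAIT` delay are placeholders to adjust, and `api` and `reproduce` are hypothetical helper names. The final GET only shows what the API knows about the host, not what the web interface displays.

```shell
#!/bin/sh
# Sketch of the reported reproduction steps via the Icinga 2 REST API.
# Defaults, the "hostalive" check command, the systemd unit name and
# RESTART_WAIT are assumptions; api() and reproduce() are hypothetical names.
API_URL=${API_URL:-https://localhost:5665}
API_USER=${API_USER:-root}
API_PASS=${API_PASS:-icinga}

# api <method> <path> [json-body]: one authenticated request.
# curl -s exits 0 even on HTTP errors, so read the JSON it prints.
api() {
    method=$1 path=$2 body=$3
    if [ -n "$body" ]; then
        curl -k -s -u "$API_USER:$API_PASS" -H 'Accept: application/json' \
            -X "$method" "$API_URL$path" -d "$body"
    else
        curl -k -s -u "$API_USER:$API_PASS" -H 'Accept: application/json' \
            -X "$method" "$API_URL$path"
    fi
}

reproduce() {
    host=$1
    body="{\"attrs\": {\"address\": \"$host\", \"check_command\": \"hostalive\"}}"
    api PUT "/v1/objects/hosts/$host" "$body" || return 1       # 1. add host
    systemctl restart icinga2 || return 1                       # 2. restart
    sleep "${RESTART_WAIT:-10}"                                 # wait for the API to come back
    api DELETE "/v1/objects/hosts/$host?cascade=1" || return 1  # 3. remove host
    api PUT "/v1/objects/hosts/$host" "$body" || return 1       # 4. add it again
    api GET "/v1/objects/hosts/$host"                           # 5. is the host known again?
}
```

For example `reproduce test-host.example.com`; afterwards compare the contents of the _api package with the listing above.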