This issue has been migrated from Redmine: https://dev.icinga.com/issues/10638
Created by gbeutner on 2015-11-16 06:50:00 +00:00
Assignee: _mfriedrich_
Status: _Assigned_
Target Version: _Backlog_
Last Update: _2017-01-09 15:44:03 +00:00 (in Redmine)_
Icinga Version: 2.4.0
Backport?: Not yet backported
Include in Changelog: 1
Relations:
Updated by mfriedrich on 2016-03-18 16:14:10 +00:00
Updated by mfriedrich on 2016-04-01 11:38:29 +00:00
Updated by mfriedrich on 2016-04-01 11:39:50 +00:00
mbmif /usr/local/icinga2/etc/icinga2/tests (master) # ls -la /usr/local/icinga2/var/lib/icinga2/api/packages/_api/
total 24
drwx------ 6 icinga staff 204 Sep 15 2015 .
drwx------ 4 icinga staff 136 Dec 10 15:55 ..
-rw-r--r-- 1 icinga staff 33 Sep 15 2015 active-stage
-rw-r--r-- 1 icinga staff 450 Sep 15 2015 active.conf
-rw-r--r-- 1 icinga staff 25 Sep 15 2015 include.conf
drwx------ 5 icinga staff 170 Sep 15 2015 mbmif.int.netways.de-1442309540-1
mbmif /usr/local/icinga2/etc/icinga2/tests (master) # ls -la /usr/local/icinga2/var/lib/icinga2/api/packages/_api/mbmif.int.netways.de-1442309540-1/
total 8
drwx------ 5 icinga staff 170 Sep 15 2015 .
drwx------ 6 icinga staff 204 Sep 15 2015 ..
drwx------ 7 icinga staff 238 Mar 22 21:22 conf.d
-rw-r--r-- 1 icinga staff 157 Sep 15 2015 include.conf
drwx------ 2 icinga staff 68 Sep 15 2015 zones.d
Updated by mfriedrich on 2016-04-01 11:42:44 +00:00
We should implement this for runtime-created objects, which use the API packages internally. It's not the highest priority, but it would probably help with support.
Updated by gbeutner on 2016-08-25 16:11:28 +00:00
Updated by mfriedrich on 2016-11-09 14:54:30 +00:00
Updated by mfriedrich on 2016-12-07 17:17:58 +00:00
Updated by mfriedrich on 2017-01-09 15:43:08 +00:00
Updated by mfriedrich on 2017-01-09 15:44:03 +00:00
Updated by mfriedrich on 2017-01-09 15:44:39 +00:00
FYI, I had this problem on my 2.5.4 standalone server. The _api/ folder had somehow become corrupted; it was missing a bunch of files such as active.conf, include.conf, etc. I was able to fix it by blowing away /var/lib/icinga2/api/packages/_api and restarting Icinga. This caused the missing files (active.conf, etc.) to be recreated automatically. Downtimes are now working correctly.
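A slightly more cautious variant of this workaround, sketched as a POSIX shell function: move the package aside instead of deleting it, so that objects created at runtime (for example downtimes) can still be inspected afterwards. The helper name `reset_api_package` and the backup location are assumptions, not anything Icinga 2 provides. Stop Icinga 2 before running it and start it again afterwards.

```shell
# Hypothetical helper (not part of Icinga 2): move the _api package out of the
# packages directory instead of deleting it. The backup must live outside
# PACKAGES_DIR, otherwise it could be picked up as another package.
# Usage: reset_api_package PACKAGES_DIR BACKUP_DIR
reset_api_package() {
  api_dir="$1/_api"
  backup_dir=$2

  if [ ! -d "$api_dir" ]; then
    echo "error: $api_dir does not exist" >&2
    return 1
  fi
  if [ -z "$backup_dir" ]; then
    echo "error: no backup directory given" >&2
    return 1
  fi

  # Timestamped target, e.g. BACKUP_DIR/_api.20170226032000
  mkdir -p "$backup_dir" || return 1
  dest="$backup_dir/_api.$(date +%Y%m%d%H%M%S)"
  mv "$api_dir" "$dest" && echo "moved $api_dir to $dest"
}
```

For example, `reset_api_package /var/lib/icinga2/api/packages /root/icinga2-api-backup` between `systemctl stop icinga2` and `systemctl start icinga2` on a systemd-based host.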
At some point stageName is empty, which creates this mess. Finding out why is on my TODO list.
At some point I tested a master-satellite setup that didn't work out for me, so I reverted all configuration files to the standalone configuration. I think the problems started after that, though I can't tell for sure...
Workaround for manually re-creating these files:
You can of course try other ways, but this one saves you additional restarts.
If you're planning to manually restore the files, their structure is described inside
Example: My stage name is mbmif.int.netways.de-1442309540-1
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # ls -lah
total 24
drwx------ 6 icinga icinga 204B Apr 1 2016 .
drwx------ 4 icinga icinga 136B Dec 10 2015 ..
-rw-r--r-- 1 icinga icinga 33B Sep 15 2015 active-stage
-rw-r--r-- 1 icinga icinga 450B Sep 15 2015 active.conf
-rw-r--r-- 1 icinga icinga 25B Sep 15 2015 include.conf
drwx------ 5 icinga icinga 170B Nov 21 15:24 mbmif.int.netways.de-1442309540-1
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # cat active-stage
mbmif.int.netways.de-1442309540-1
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # cat active.conf
if (!globals.contains("ActiveStages")) {
  globals.ActiveStages = {}
}
if (globals.contains("ActiveStageOverride")) {
  var arr = ActiveStageOverride.split(":")
  if (arr[0] == "_api") {
    if (arr.len() < 2) {
      log(LogCritical, "Config", "Invalid value for ActiveStageOverride")
    } else {
      ActiveStages["_api"] = arr[1]
    }
  }
}
if (!ActiveStages.contains("_api")) {
  ActiveStages["_api"] = "mbmif.int.netways.de-1442309540-1"
}
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # cat include.conf
include "*/include.conf"
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # ls -lah mbmif.int.netways.de-1442309540-1/
total 8
drwx------ 5 icinga icinga 170B Nov 21 15:24 .
drwx------ 6 icinga icinga 204B Apr 1 2016 ..
drwx------ 9 icinga icinga 306B May 10 2016 conf.d
-rw-r--r-- 1 icinga icinga 157B Sep 15 2015 include.conf
drwx------ 2 icinga icinga 68B Sep 15 2015 zones.d
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # cat mbmif.int.netways.de-1442309540-1/include.conf
include "../active.conf"
if (ActiveStages["_api"] == "mbmif.int.netways.de-1442309540-1") {
  include_recursive "conf.d"
  include_zones "_api", "zones.d"
}
This should allow you to reconstruct the files manually; just look at where the stage name is used.
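The same reconstruction can be scripted. Below is a minimal POSIX shell sketch; the function name `regenerate_api_package` is hypothetical, and the file contents are copied from the listing above (indentation added, blank lines omitted), so they may differ cosmetically from what your Icinga 2 version writes. It assumes the stage directory itself still exists and only rebuilds the bookkeeping files around it.

```shell
# Hypothetical helper (not part of Icinga 2): rebuild the bookkeeping files of
# the "_api" package for a known stage name, using the contents shown above.
# Usage: regenerate_api_package PACKAGE_DIR STAGE_NAME
regenerate_api_package() {
  pkg_dir=$1
  stage=$2

  # An empty stage name is what produces the broken layout in this issue,
  # so refuse to write anything in that case.
  if [ -z "$stage" ]; then
    echo "error: empty stage name" >&2
    return 1
  fi
  if [ ! -d "$pkg_dir/$stage" ]; then
    echo "error: stage directory $pkg_dir/$stage not found" >&2
    return 1
  fi
  mkdir -p "$pkg_dir/$stage/conf.d" "$pkg_dir/$stage/zones.d" || return 1

  # active-stage holds only the stage name.
  printf '%s\n' "$stage" > "$pkg_dir/active-stage"

  # The top-level include.conf pulls in every stage's include.conf.
  printf '%s\n' 'include "*/include.conf"' > "$pkg_dir/include.conf"

  # active.conf registers the stage in the global ActiveStages dictionary.
  cat > "$pkg_dir/active.conf" <<EOF
if (!globals.contains("ActiveStages")) {
  globals.ActiveStages = {}
}
if (globals.contains("ActiveStageOverride")) {
  var arr = ActiveStageOverride.split(":")
  if (arr[0] == "_api") {
    if (arr.len() < 2) {
      log(LogCritical, "Config", "Invalid value for ActiveStageOverride")
    } else {
      ActiveStages["_api"] = arr[1]
    }
  }
}
if (!ActiveStages.contains("_api")) {
  ActiveStages["_api"] = "$stage"
}
EOF

  # The stage's include.conf loads its objects only while it is the active stage.
  cat > "$pkg_dir/$stage/include.conf" <<EOF
include "../active.conf"
if (ActiveStages["_api"] == "$stage") {
  include_recursive "conf.d"
  include_zones "_api", "zones.d"
}
EOF
}
```

For example, `regenerate_api_package /usr/local/icinga2/var/lib/icinga2/api/packages/_api mbmif.int.netways.de-1442309540-1`, run as the user owning the files (icinga in the listing above), followed by `icinga2 daemon -C` to validate the configuration before restarting.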
If you run into the problem that there's a conf.d/ directory at the top level of the "_api" package directory, safely move its contents to stagename/conf.d and verify that all include.conf files are properly initialized.
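Moving a stray top-level conf.d/ can be scripted as well. This sketch (a hypothetical `fix_stray_confd` helper in POSIX shell) reads the stage name from active-stage, refuses to overwrite files that already exist in the stage, and deletes the stray directory only after a successful copy. It assumes active-stage is already correct; stop Icinga 2 first, and check the include.conf files afterwards as described above.

```shell
# Hypothetical helper (not part of Icinga 2): move a stray top-level conf.d/
# of a package into the active stage's conf.d/.
# Usage: fix_stray_confd PACKAGE_DIR
fix_stray_confd() {
  pkg_dir=$1
  stage=$(cat "$pkg_dir/active-stage" 2>/dev/null || true)

  # Without a valid stage name there is no safe target directory.
  if [ -z "$stage" ] || [ ! -d "$pkg_dir/$stage" ]; then
    echo "error: active-stage is missing, empty or points nowhere" >&2
    return 1
  fi
  if [ ! -d "$pkg_dir/conf.d" ]; then
    echo "nothing to do: no top-level conf.d"
    return 0
  fi

  # Refuse to overwrite objects that already exist in the stage.
  conflicts=$(find "$pkg_dir/conf.d" -type f | while IFS= read -r f; do
    rel=${f#"$pkg_dir/conf.d/"}
    if [ -e "$pkg_dir/$stage/conf.d/$rel" ]; then printf '%s\n' "$rel"; fi
  done)
  if [ -n "$conflicts" ]; then
    echo "error: already present in $stage/conf.d:" >&2
    printf '%s\n' "$conflicts" >&2
    return 1
  fi

  # Copy (preserving modes), then remove the stray directory only on success.
  mkdir -p "$pkg_dir/$stage/conf.d" &&
    cp -Rp "$pkg_dir/conf.d/." "$pkg_dir/$stage/conf.d/" &&
    rm -rf "$pkg_dir/conf.d"
}
```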
If you happen to have such a case, I'd appreciate a copy of it as a tarball (remove sensitive host details beforehand).
Thanks. This is very useful information.
On Sun, Feb 26, 2017 at 3:20 AM, Michael Friedrich wrote:
Michael Martinez
http://www.michael--martinez.com
I was not able to reproduce this in a problematic way. All I managed to get were two stages for one node. This happens because we happily perform surgery on files in parallel, which could easily be the cause of the other problems.
The only solution @gunnarbeutner and I could come up with right now is using a mutex whenever we write, read and activate stages.
At some point the stageDir string is empty. We should at least log/break when this happens to ensure data integrity of existing files.
Next steps:
Tests worked (script below), but there were no issues like the ones described. I also removed the log message about the missing active-stage: in some of the places where it gets called, it does not matter whether it is empty, and in the cases where race conditions may happen we now hold the lock.
About the missing files:
Thanks to the locks, they should no longer be overwritten, and if the user deletes them, they are regenerated at startup. How should we proceed with this?
Script I used for testing:
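# Create 20 config packages in parallel (each curl runs in the background)
for i in $(seq 1 20); do
  curl -k -s -u root:icinga -H 'Accept: application/json' -X POST "https://localhost:5665/v1/config/packages/example-cmdb${i}" &
done
# No 'wait' here, so stage uploads may overlap with pending package creations
# Upload one stage per package, each containing a single Host object
for i in $(seq 1 20); do
  echo "{\"files\": {\"conf.d/test.conf\": \"object Host \\\"cmdb-host${i}\\\" { check_command = \\\"flatter\\\" }\"}}" | \
    curl -k -s -u root:icinga -H 'Accept: application/json' -X POST \
    -d @- "https://localhost:5665/v1/config/stages/example-cmdb${i}"
done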
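To spot the symptoms discussed in this issue after a run like the one above, the package directories can be checked from the shell. This is a hypothetical helper, not an Icinga 2 tool: `check_packages` reports packages whose include.conf or active.conf is missing, whose active-stage is empty, or whose active stage has no include.conf. It assumes every package has had at least one stage activated; a package that was only just created may legitimately lack some of these files.

```shell
# Hypothetical check (not part of Icinga 2): report packages below
# PACKAGES_DIR whose bookkeeping files are missing or inconsistent.
# Usage: check_packages PACKAGES_DIR
check_packages() {
  packages_dir=$1
  rc=0
  for pkg in "$packages_dir"/*/; do
    [ -d "$pkg" ] || continue            # the glob matched nothing
    name=$(basename "$pkg")

    for f in include.conf active.conf; do
      if [ ! -f "$pkg/$f" ]; then
        echo "$name: missing $f"
        rc=1
      fi
    done

    # active-stage must name an existing stage with its own include.conf.
    stage=$(cat "$pkg/active-stage" 2>/dev/null || true)
    if [ -z "$stage" ]; then
      echo "$name: active-stage missing or empty"
      rc=1
    elif [ ! -f "$pkg/$stage/include.conf" ]; then
      echo "$name: active stage '$stage' has no include.conf"
      rc=1
    fi
  done
  return $rc
}
```

For example, `check_packages /var/lib/icinga2/api/packages`; it prints one line per problem and returns non-zero if anything was reported.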
@Crunsher do you mean that include.conf files modified by the user should be re-created on each request? I would strongly advise against that for performance reasons. Users must not edit the _api package, and the daemon must be able to rely on being the owner of these files. If the daemon writes out garbage, that's the bug being fixed here. But I would not care if the package remains broken because of a manual change by the user.
I've created a PR out of the fix branch, so it is not forgotten for reviews.
@dnsmichi Gods no! Currently we re-create it at startup if it does not exist (this covers initial creation). So I guess the locks / atomic writes fix this bug then.
OK, thanks. Then your PR should be merged, and we ask anyone who can reliably reproduce the issue to test the snapshot packages.
The bug is not fixed; we still see it in v2.8.
I opened a forum thread regarding this bug https://monitoring-portal.org/t/host-is-not-visible-via-api/2142
I've run into the same issue. How can I fix it? I've tested it on v2.6 and on v2.9.
/var/lib/icinga2/api/packages/_api/
├── active.conf
├── active-stage
├── include.conf
└── host-name.example.com-1535636549-1
    ├── conf.d
    │   ├── downtimes
    │   └── hosts
    │       └── test-host.example.conf
    ├── include.conf
    └── zones.d
# cat /var/lib/icinga2/api/packages/_api/active.conf
if (!globals.contains("ActiveStages")) {
  globals.ActiveStages = {}
}
if (globals.contains("ActiveStageOverride")) {
  var arr = ActiveStageOverride.split(":")
  if (arr[0] == "_api") {
    if (arr.len() < 2) {
      log(LogCritical, "Config", "Invalid value for ActiveStageOverride")
    } else {
      ActiveStages["_api"] = arr[1]
    }
  }
}
if (!ActiveStages.contains("_api")) {
  ActiveStages["_api"] = "host-name.example.com-1535636549-1"
}
# cat /var/lib/icinga2/api/packages/_api/active-stage
host-name.example.com-1535636549-1
# cat /var/lib/icinga2/api/packages/_api/include.conf
include "*/include.conf"
# cat /var/lib/icinga2/api/packages/_api/host-name.example.com-1535636549-1/conf.d/hosts/test-host.example.com.conf
object Host "test-host.example.com" {
  import "P2-host"
  address = "test-host.example.com"
  display_name = "test-host.example.com"
  notes = "my notes"
  notes_url = "http://test-host.example.com"
  vars["args"] = {
    services = {
      check_snmp_mem = {
        arg1 = "someone"
        arg2 = "90,0"
        arg3 = "100,30"
        name = "MEMORY"
      }
      ftp = {
        arg1 = 20.000000
        arg2 = 10.000000
        name = "FTP"
      }
    }
  }
  vars["facts"] = {
    nrpe = [ "check_disk", "check_file_exist" ]
    services = [ "ssh", "ftp" ]
    services_p3 = [ "load", "check_snmp_mem" ]
  }
  version = 1535637140.067982
  zone = "some-zone"
}
# cat /var/lib/icinga2/api/packages/_api/host-name.example.com-1535636549-1/include.conf
include "../active.conf"
if (ActiveStages["_api"] == "host-name.example.com-1535636549-1") {
  include_recursive "conf.d"
  include_zones "_api", "zones.d"
}