Scheduled for CW38, 19.9.2019
Live feed: https://twitter.com/icinga/status/1174311275234504704
@mcktr @Al2Klimov @Crunsher @htriem @lippserd @bobapple The master is now fully frozen. Do not merge anything except for small typo fixes. All remaining PRs are on hold.
Waiting for RC/snapshot customer and user feedback.
Just to let you know: We got the first replies from customers who started test runs of 2.11 RC on Monday. I'll let you know ASAP when I get feedback on how the tests went.
ref/NC/627739
The "no shared cipher" problem with Windows agents was successfully mitigated and fixed with one of our customers.
Next up is #7382 with a possible upgrade & config sync loop.
The sync loop was caused by binary files, which we don't support. Adding detection is hard, and not reasonable for the 99.9% of users who already use the config sync just for config files. Therefore a doc fix only.
Two new issues: a missing check command (in the feedback loop) and a missing debuglog on Windows.
@dnsmichi: couldn't we be strict and refuse to work with anything but 100% valid UTF-8?
@Thomas-Gelf We already auto-sanitize JSON I/O (as otherwise our new JSON lib would complain).
Auto-sanitation has its place. It is required to deal with unclean plugin output and (eventually) configuration "from /etc". I would not apply it to data from "trusted" sources. Read: invalid (non-UTF-8) data in "/var" should lead to an error log message followed by an immediate process shutdown. Invalid data via Netstring should lead to a terminated connection. In these contexts auto-sanitation doesn't help and instead becomes part of the problem.
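To make the "strict for trusted sources" policy concrete, here is a minimal shell sketch. It is not how Icinga 2 handles this internally. The helper names `is_valid_utf8` and `load_trusted_file` are hypothetical, and the sketch assumes an `iconv` that exits non-zero on an invalid input sequence (GNU iconv does).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file is entirely valid UTF-8.
# Assumption: iconv exits non-zero on an illegal input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical strict loader for "trusted" data:
# log an error and refuse, instead of silently sanitizing.
load_trusted_file() {
    if ! is_valid_utf8 "$1"; then
        echo "critical: invalid UTF-8 in $1, refusing to continue" >&2
        return 1
    fi
    cat "$1"
}
```

A caller would treat a non-zero return from `load_trusted_file` as fatal, whereas plugin output would go through a sanitizing path instead.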
Please move this discussion into #7391 - I've been working on this offline with Tom's help already.
This issue is solely for tracking the tasks left open for 2.11, to keep @lippserd & @bobapple updated.
Cluster config sync is done. The missing PowerShell command turned out to be a permissions problem rather than a real bug. The Windows debuglog issue remains non-reproducible.
New to the party is the systemd logging which is part of this week's fixing.
is solved. The Powershell module is being used, which doesn't support icinga2 feature list and variants. It also collides with our graphical setup wizard using the default configuration layout instead of a single icinga2.conf file.
TL;DR - don't use the Powershell module for RC tests.
A reload with failed config validation under systemd (#7394) is now logged correctly. Alex and I also decided to add an additional log line to point users to running icinga2 daemon -C afterwards.
Aug 07 11:53:43 icinga2-centos7-dev.vagrant.demo.icinga.com icinga2[22031]: [2019-08-07 11:53:43 +0200] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.
While testing the Windows agent I was looking at something in the docs and decided to restructure the agents chapter. That's following the updates for the distributed monitoring chapter. Done.
The Windows permission problem in #7387 turned into a problem with the Powershell module, and @LordHepipud pointed me to a Director issue. https://github.com/Icinga/icingaweb2-module-director/issues/1297 - was released with 1.6.1 already 🕺
The network stack with Boost Asio may create fifo pipes visible with lsof. If there are too many, fork errors with "too many open files" may occur. Under investigation at a customer, raising the nofiles limits as a first shot.
It is not the network stack, it has to do with the check process execution. While mitigating the issue, we've raised the number of open files.
systemctl edit icinga2
[Service]
LimitNOFILE=50000
LimitNPROC=50000
TasksMax=infinity
vim /etc/default/icinga2
ICINGA2_RLIMIT_FILES=50000
systemctl daemon-reload
systemctl restart icinga2
for p in $(pidof icinga2); do echo -e "$p\n" && ps -ef | grep $p && echo && cat /proc/$p/limits | grep 'open files' && echo; done
for p in $(pidof icinga2); do echo -e "$p\n" && ps -ef | grep $p && echo && lsof -p $p && echo; done
This increased the number of pipes in the main process and fork errors are now gone. Still under investigation why check execution rate may drop - 1000/s vs current_concurrent_checks=10k.
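As a lighter alternative to reading full lsof output, the pipe count and the effective limit can be compared per process. This is only a sketch under assumptions: it requires Linux `/proc`, and the function names `count_pipe_fds` and `soft_nofile_limit` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the descriptors of a PID that point at a pipe.
# Linux only; reading another user's /proc/<pid>/fd needs root.
count_pipe_fds() {
    n=0
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            pipe:*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Hypothetical helper: soft "Max open files" limit as the kernel reports it.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' /proc/"$1"/limits
}
```

It could be used like `for p in $(pidof icinga2); do echo "$p: $(count_pipe_fds "$p") pipes, limit $(soft_nofile_limit "$p")"; done` to watch how close a process gets to its limit.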
MaxConcurrentChecks is under investigation in our cleanup sprint week, same as the downtime loop. Team @Al2Klimov @bobapple @dnsmichi.
Small version parse fix incoming for the icinga check.
Fork errors are resolved by raising the number of open files, as described in the troubleshooting docs. The general performance is being analysed and tested once more.
Coming late to the party, the downtime create/delete loop in HA clusters has been fixed this week with #7198. A nearly 4 year old problem.
The Windows scan crashes in #7431 only exist in RC1 and git master, and come from the network stack rewrite. Interestingly enough, this only causes problems on Windows, but not Linux/Unix. It is timing related and must be fixed prior to the release.
The Windows thingy is still going on. Already cost too much time, nerves and sleep.
The fixes for Nessus on Windows are done and already tested. A small fix that was part of them has been reverted because it caused trouble with the HTTP API. The snapshot packages from today are fully working.
Release is scheduled for CW38.
adds another compatible cipher suite for el6.
armhf/raspbian & the branch/tag thing on GitLab cost me some time, therefore the build will be running for the next 3 hours and follows after the announcement: https://git.icinga.com/packaging/raspbian-icinga2/pipelines/5458
Everything else is done thus far, coordinating the announcement with Julia.