It seems icinga2 waits until CheckerComponent has finished all the checks before it restarts the daemon, which sometimes takes 100+ secs to do so. Is this normal? How can I set a timeout here or force a restart, because it doesn’t make sense to wait for those checks to complete if I am restarting it anyway?
This seems to be a new behavior, it didn’t occur in v2.10.
I must say it is a new install from scratch, since it needed a newer debian because of the needed libs and I didn’t want to do a dist-upgrade. The config was then cloned from our git.
excerpt from debug log:
... debug/CheckerComponent: Waiting for running checks (3) to finish. Waited for x seconds now.
... debug/CheckerComponent: Waiting for running checks (2) to finish. Waited for x seconds now.
[...]
[2019-12-03 09:10:02 +0100] debug/CheckerComponent: Waiting for running checks (1) to finish. Waited for 175.1 seconds now.
[2019-12-03 09:10:02 +0100] debug/CheckerComponent: Waiting for running checks (1) to finish. Waited for 175.2 seconds now.
[2019-12-03 09:10:02 +0100] notice/Process: PID 67332 ('/usr/lib/nagios/plugins/check_by_ssh' '-C' ''/usr/local/nagios/bin/plugins/check_timemachine.sh' '-c' '57600' '-w' '43200'' '-H' '10.159.0.55' '-l' 'foo' '-o' 'PasswordAuthentication=no' '-p' '22' '-t' '120') terminated with exit code 0
[2019-12-03 09:10:02 +0100] notice/Dependency: Dependency 'roley!host-roley-de-nue-sw-01-dev' passed: Parent host 'de-nue-sw-01-dev' matches state filter.
[2019-12-03 09:10:02 +0100] notice/Dependency: Dependency 'roley!host-roley-de-nue-sw-01-dev' passed: Parent host 'de-nue-sw-01-dev' matches state filter.
[2019-12-03 09:10:02 +0100] notice/Dependency: Dependency 'roley!SSH!disable-host-service-checks' passed: Parent host 'roley' matches state filter.
[2019-12-03 09:10:02 +0100] notice/Dependency: Dependency 'roley!Timemachine backup daily!disable-host-service-checks' passed: Parent host 'roley' matches state filter.
[2019-12-03 09:10:02 +0100] notice/Dependency: Dependency 'roley!Timemachine backup daily!by_ssh-on-ssh-failures' passed: Parent service 'roley!SSH' matches state filter.
[2019-12-03 09:10:02 +0100] notice/Dependency: Dependency 'roley!host-roley-de-nue-sw-01-dev' passed: Parent host 'de-nue-sw-01-dev' matches state filter.
[2019-12-03 09:10:02 +0100] notice/Dependency: Dependency 'roley!host-roley-de-nue-sw-01-dev' passed: Parent host 'de-nue-sw-01-dev' matches state filter.
[2019-12-03 09:10:02 +0100] notice/Dependency: Dependency 'roley!SSH!disable-host-service-checks' passed: Parent host 'roley' matches state filter.
[2019-12-03 09:10:02 +0100] notice/Dependency: Dependency 'roley!Timemachine backup daily!disable-host-service-checks' passed: Parent host 'roley' matches state filter.
[2019-12-03 09:10:02 +0100] notice/Dependency: Dependency 'roley!Timemachine backup daily!by_ssh-on-ssh-failures' passed: Parent service 'roley!SSH' matches state filter.
[2019-12-03 09:10:02 +0100] debug/Checkable: Update checkable 'roley!Timemachine backup daily' with check interval '600' from last check time at 2019-12-03 09:10:02 +0100 (1.57536e+09) to next check time at 2019-12-03 09:19:49 +0100 (1.57536e+09).
[2019-12-03 09:10:02 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2019-12-03 09:10:02 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2019-12-03 09:10:02 +0100] debug/IdoPgsqlConnection: Query: UPDATE icinga_servicestatus SET acknowledgement_type = E'0', active_checks_enabled = E'1', check_command = E'by_ssh', check_source = E'de-nue-monitoring-staging', check_type = E'0', current_check_attempt = E'1', current_notification_number = E'0', current_state = E'0', endpoint_object_id = 21220, event_handler_enabled = E'1', execution_time = E'799.652998', flap_detection_enabled = E'1', has_been_checked = E'1', instance_id = 1, is_flapping = E'0', is_reachable = E'1', last_check = TO_TIMESTAMP(1575360602) AT TIME ZONE 'UTC', last_hard_state = E'0', last_hard_state_change = TO_TIMESTAMP(1575343159) AT TIME ZONE 'UTC', last_state_change = TO_TIMESTAMP(1575343159) AT TIME ZONE 'UTC', last_time_ok = TO_TIMESTAMP(1575360602) AT TIME ZONE 'UTC', last_time_unknown = TO_TIMESTAMP(1575343035) AT TIME ZONE 'UTC', latency = E'0.000874', long_output = E'', max_check_attempts = E'3', next_check = TO_TIMESTAMP(1575361189) AT TIME ZONE 'UTC', next_notification = TO_TIMESTAMP(1575358988) AT TIME ZONE 'UTC', normal_check_interval = E'10', notifications_enabled = E'1', original_attributes = E'null', output = E'OK: last backup ran on Tue Dec 3 08:10:14 CET 2019, 59 minutes ago.', passive_checks_enabled = E'1', percent_state_change = E'0', perfdata = E'', problem_has_been_acknowledged = E'0', process_performance_data = E'1', retry_check_interval = E'2', scheduled_downtime_depth = E'0', service_object_id = 18511, should_be_scheduled = E'1', state_type = E'1', status_update_time = TO_TIMESTAMP(1575360602) AT TIME ZONE 'UTC' WHERE service_object_id = 18511
[2019-12-03 09:10:02 +0100] information/CheckerComponent: 'checker' stopped.
[2019-12-03 09:10:02 +0100] information/LivestatusListener: 'livestatus' stopped.
OS: Debian GNU/Linux 10 (buster)
Package: icinga2, Version: 2.11.2-1.buster
just one (staging) zone. The HA master cluster w/ 2 nodes will also be upgraded, after more testing.