I had this problem after I converted a normal single-node master into an HA cluster. The manual says I need to copy /var/lib/icinga2/icinga2.state from the master to the second one, but every time I tried this it did not work. So I decided to remove the state file on the second master and clean the api folder.
And since last night I have discovered a further problem; I am not sure whether it is related to the first one.
I got the alert "Exception occurred while checking 'master1': Error: Function call 'pipe2' failed with error code 24, 'Too many open files' (0) Executing check for object 'master1'" in Icinga Web 2, and my service crashed:
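Error code 24 is EMFILE, i.e. the process hit its open-file-descriptor limit. A minimal sketch for watching how close a process is to that limit; I use the current shell's PID ($$) as a stand-in here, substituting the icinga2 main PID is an assumption you would make on a real system:

```shell
# Stand-in PID: the current shell. On a real master, use the
# Main PID shown by `systemctl status icinga2` instead.
pid=$$
# Soft limit on open files for this process
soft_limit=$(ulimit -n)
# Number of file descriptors currently open
open_fds=$(ls "/proc/$pid/fd" | wc -l)
echo "open: $open_fds / limit: $soft_limit"
```

When open_fds approaches soft_limit, calls like pipe2() start failing with exactly the error above.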
● icinga2.service - Icinga host/service/network monitoring system
Loaded: loaded (/lib/systemd/system/icinga2.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/icinga2.service.d
└─limits.conf
Active: failed (Result: exit-code) since Sun 2019-11-19 00:01:01 CET; 0 day 12h ago
Process: 84816 ExecStartPre=/usr/lib/icinga2/prepare-dirs /etc/default/icinga2 (code=exited, status=0/SUCCESS)
Process: 84823 ExecStart=/usr/sbin/icinga2 daemon --close-stdio -e ${ICINGA2_ERROR_LOG} (code=exited, status=1/FAIL
Main PID: 84823 (code=exited, status=1/FAILURE)
Tasks: 0
Memory: 402.1M
CGroup: /system.slice/icinga2.service
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/ConfigItem: Instantiat
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/ConfigItem: Instantiat
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/ConfigItem: Instantiat
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/ConfigItem: Instantiat
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/ConfigItem: Instantiat
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/ScriptGlobal: Dumping
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/cli: Closing console l
Nov 18 14:50:35 master1 systemd[1]: Started Icinga host/service/network monitoring system.
Nov 19 00:01:01 master1 systemd[1]: icinga2.service: Main process exited, code=exited, status=1/FAILUR
Nov 19 00:01:01 master1 systemd[1]: icinga2.service: Failed with result 'exit-code'.
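The Drop-In: limits.conf shown in the status output is where the file descriptor limit for the unit can be raised. A minimal sketch of such a drop-in; the value 65536 is an assumption, pick whatever fits your host count:

```shell
# Create/overwrite the drop-in the status output refers to
# (assumed limit of 65536, adjust to your environment):
sudo tee /etc/systemd/system/icinga2.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=65536
EOF
# Make systemd pick up the drop-in and restart the daemon
sudo systemctl daemon-reload
sudo systemctl restart icinga2
```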
Everything is really strange… maybe it depends on the huge number of imported hosts which are not connected to the new Icinga 2 cluster right now; I have deactivated them for now.
Back to the state files: does anyone know if there is a way to generate them completely fresh on both masters?
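As far as I understand it, the state file does not have to be generated explicitly: Icinga 2 rewrites /var/lib/icinga2/icinga2.state on its own (it dumps program state periodically and on shutdown). So a sketch of what should work, run on each master in turn; this is my assumption, not something I have verified against the docs:

```shell
# Stop the daemon, drop the old state file, start again.
# Icinga 2 writes a fresh icinga2.state by itself afterwards.
sudo systemctl stop icinga2
sudo rm -f /var/lib/icinga2/icinga2.state
sudo systemctl start icinga2
```

The trade-off is that volatile state (acknowledgements, downtimes, comments not stored elsewhere) kept only in that file would be lost.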
I tried this again, but I couldn't find the point where the state file is created, only the point where an old log file is removed; there is nothing about creating a new one:
[2019-11-20 12:02:10 +0100] notice/ApiListener: Removing old log file: /var/lib/icinga2/api/log/1574247559
[2019-11-20 12:02:10 +0100] notice/ApiListener: Current zone master: master1
[2019-11-20 12:02:10 +0100] information/ApiListener: New client connection for identity 'master2' to [xxx.xxx.xxx.xxx]:5665
[2019-11-20 12:02:10 +0100] notice/JsonRpcConnection: Received 'config::Update' message from identity 'master2'.
[2019-11-20 12:02:10 +0100] information/ApiListener: Applying config update from endpoint 'master2' of zone 'master'.
[2019-11-20 12:02:10 +0100] notice/ConfigCompiler: Registered authoritative config directories for zone 'director-global': /etc/icinga2/zones.d/director-global and /var/lib/icinga2/api/packages/director/b8437779-e99b-463b-a790-95a1ad2af673/zones.d/director-global
[2019-11-20 12:02:10 +0100] information/ApiListener: Ignoring config update from endpoint 'master2' for zone 'director-global' because we have an authoritative version of the zone's config.
[2019-11-20 12:02:10 +0100] notice/ConfigCompiler: Registered authoritative config directories for zone 'master': /etc/icinga2/zones.d/master and /var/lib/icinga2/api/packages/director/b8437779-e99b-463b-a790-95a1ad2af673/zones.d/master
[2019-11-20 12:02:10 +0100] information/ApiListener: Ignoring config update from endpoint 'master2' for zone 'master' because we have an authoritative version of the zone's config.
[2019-11-20 12:02:10 +0100] information/ApiListener: Received configuration updates (0) from endpoint 'master2' do not qualify for production, not triggering reload.
[2019-11-20 12:02:13 +0100] notice/JsonRpcConnection: Received 'log::SetLogPosition' message from identity 'master2'.
[2019-11-20 12:02:14 +0100] notice/JsonRpcConnection: Received 'event::SetNextCheck' message from identity 'master2'.
[2019-11-20 12:02:14 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2019-11-20 12:02:15 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 25; Checks/s: 0
[2019-11-20 12:02:15 +0100] notice/ApiListener: Setting log position for identity 'master2': 2019/11/20 12:02:14
[2019-11-20 12:02:18 +0100] notice/JsonRpcConnection: Received 'event::Heartbeat' message from identity 'server'.
[2019-11-20 12:02:18 +0100] notice/JsonRpcConnection: Received 'log::SetLogPosition' message from identity 'master2'.
[2019-11-20 12:02:18 +0100] notice/JsonRpcConnection: Received 'event::SetNextCheck' message from identity 'master2'.
[2019-11-20 12:02:18 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2019-11-20 12:02:18 +0100] notice/JsonRpcConnection: Received 'event::CheckResult' message from identity 'master2
I looked again into the debug.log on master1 and found something that seems a little strange to me: there are no deleted lines between these entries:
[2019-11-21 11:18:00 +0100] notice/JsonRpcConnection: Received 'config::Update' message from identity 'master2.domain.com'.
[2019-11-21 11:18:00 +0100] information/ApiListener: Applying config update from endpoint 'master2.domain.com' of zone 'master'.
[2019-11-21 11:18:00 +0100] information/ApiListener: Received configuration updates (0) from endpoint 'master2.domain.com' do not qualify for production, not triggering reload.
How did you solve it? I have the same problems. The only working solution I found is to do the following on each Windows client:
net stop icinga2
del C:\ProgramData\icinga2\var\lib\icinga2\icinga2.state
del C:\ProgramData\icinga2\var\lib\icinga2\modified-attributes.conf
del /s C:\ProgramData\icinga2\var\lib\icinga2\api\zones
del /s C:\ProgramData\icinga2\var\lib\icinga2\api\zones-stages
net start icinga2
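For anyone on Linux agents or masters, the equivalent of the Windows cleanup above would presumably be the following (default package paths assumed; untested on my side):

```shell
# Same cleanup as the Windows steps above, using the
# default Linux paths from the distribution packages:
sudo systemctl stop icinga2
sudo rm -f /var/lib/icinga2/icinga2.state
sudo rm -f /var/lib/icinga2/modified-attributes.conf
sudo rm -rf /var/lib/icinga2/api/zones
sudo rm -rf /var/lib/icinga2/api/zones-stages
sudo systemctl start icinga2
```

After the restart the node should re-sync its zone configuration from its config master.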
I haven't fixed it yet. My problem is not on an agent; it is on one of the masters. There is a possible bug-fix update on GitHub, but I didn't have the time or a testing zone with the same issue, so I couldn't test it until now.