No, there is nothing with the name of the old zone.
But I found another difference between the two configurations: conf.d/api-users.conf is only included on master1, so there is no api-users configuration on master2. This is the extra line in the output of icinga2 daemon -C on master1. Could this be a problem?
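For comparison, this is roughly how the include looks in the default /etc/icinga2/icinga2.conf shipped with 2.11 (an assumption based on my own setup, so please check your file); grepping for it on both masters should show whether master2 is really missing it:
# grep -n api-users /etc/icinga2/icinga2.conf
include "conf.d/api-users.conf"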
Is there a way to run the zone sync stage validation manually, like the icinga check does?
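If I remember the 2.11 troubleshooting docs correctly, you can run the same validation the cluster config sync does by pointing the config check at the staged zones directory; the System.ZonesStageVarDir constant and the path below are how I recall them from the docs, so please double-check them against your version:
# icinga2 daemon -C --define System.ZonesStageVarDir=/var/lib/icinga2/api/zones-stages/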
It confuses me that the date stays the same; it first showed up on 2019-11-13 18:05:41.
Icinga 2 has been running for 8 hours, 11 minutes and 30 seconds. Version: r2.11.2-1; Last zone sync stage validation failed at 2019-11-13 18:05:41 +0100
It is also confusing that there is no startup.log anymore.
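As far as I know, 2.11 keeps the startup.log and status files of a failed stage validation inside the staging directory rather than in the deployed package, so it might be worth looking there (path as I remember it, please verify):
# ls -la /var/lib/icinga2/api/zones-stages/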
Changing a custom variable or anything else is no problem; the deployment succeeds and the change takes effect.
I checked the output of icinga2 daemon -C again, this time with notice-level logging and grepping for "ignor", and found this on master1:
# icinga2 daemon -C -x notice | grep -i ignor
notice/config: Ignoring explicit load request for library "db_ido_mysql".
notice/config: Ignoring non local config include for zone 'director-global': We already have an authoritative copy included.
notice/config: Ignoring non local config include for zone 'master': We already have an authoritative copy included.
Icinga 2 has been running for 8 hours, 42 minutes and 28 seconds. Version: r2.11.2-1; Last zone sync stage validation failed at 2019-11-13 18:05:41 +0100
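As far as I understand the "authoritative copy" notices above, they only mean that master1 already loads those zones from an authoritative source (zones.d and/or the Director package), so the copy received via the cluster sync is skipped; that should be normal on the config master. Listing the involved directories makes it easier to see which copies exist (standard paths, the Director package UUID will differ on your system):
# ls /etc/icinga2/zones.d/
# ls /var/lib/icinga2/api/packages/director/*/zones.d/
# ls /var/lib/icinga2/api/zones/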
It seems like icinga2.service was never reloaded. If I restart the service manually, the service uptime changes, but the date of the failed stage validation does not.
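To rule out the reload itself, I would trigger one explicitly and check the journal afterwards; just a quick sanity check on my part, nothing official:
# systemctl reload icinga2
# journalctl -u icinga2 --since "10 minutes ago" | grep -iE 'reload|stage|validation'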
That sentence referred to what icinga2 should ... so the core itself clears that. It is weird that it does not do that; the entry should be removed from the icinga2.state file.
I had such a problem after I created an HA cluster from a normal single-node master. The manual says I need to copy /var/lib/icinga2/icinga2.state from the first master to the second one, but every time I tried that it did not work. So I decided to remove the state file on the second master and clean up the api folder.
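These are roughly the steps I used on the second master (from memory, so treat them as a sketch and adjust the paths to your installation; note that removing icinga2.state also throws away runtime state such as comments, downtimes and acknowledgements):
# systemctl stop icinga2
# rm /var/lib/icinga2/icinga2.state
# rm -rf /var/lib/icinga2/api/zones/* /var/lib/icinga2/api/zones-stages/*
# systemctl start icinga2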
And since last night I have a further problem; I am not sure if it is related to the first one.
I got an alert in Icingaweb2 saying Exception occurred while checking 'master1': Error: Function call 'pipe2' failed with error code 24, 'Too many open files' (0) Executing check for object 'master1', and my service crashed:
● icinga2.service - Icinga host/service/network monitoring system
   Loaded: loaded (/lib/systemd/system/icinga2.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/icinga2.service.d
           └─limits.conf
   Active: failed (Result: exit-code) since Sun 2019-11-19 00:01:01 CET; 0 day 12h ago
  Process: 84816 ExecStartPre=/usr/lib/icinga2/prepare-dirs /etc/default/icinga2 (code=exited, status=0/SUCCESS)
  Process: 84823 ExecStart=/usr/sbin/icinga2 daemon --close-stdio -e ${ICINGA2_ERROR_LOG} (code=exited, status=1/FAIL
 Main PID: 84823 (code=exited, status=1/FAILURE)
    Tasks: 0
   Memory: 402.1M
   CGroup: /system.slice/icinga2.service
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/ConfigItem: Instantiat
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/ConfigItem: Instantiat
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/ConfigItem: Instantiat
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/ConfigItem: Instantiat
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/ConfigItem: Instantiat
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/ScriptGlobal: Dumping
Nov 18 14:50:35 master1 icinga2[84823]: [2019-11-18 14:50:35 +0100] information/cli: Closing console l
Nov 18 14:50:35 master1 systemd[1]: Started Icinga host/service/network monitoring system.
Nov 19 00:01:01 master1 systemd[1]: icinga2.service: Main process exited, code=exited, status=1/FAILUR
Nov 19 00:01:01 master1 systemd[1]: icinga2.service: Failed with result 'exit-code'.
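Error code 24 is EMFILE, so the daemon ran into its open file descriptor limit. Since the unit already has a limits.conf drop-in, raising the limit there is one option; the value below is only an example, not a recommendation for your environment:
[Service]
LimitNOFILE=65536
After changing the drop-in, a systemctl daemon-reload and a restart of icinga2 are needed for the new limit to take effect.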
Everything is really strange… maybe it is related to the huge amount of imported hosts which are not connected to the new Icinga2 cluster right now; I have deactivated them for now.
Back to the state files: does anyone know if there is a way to regenerate them completely for both masters?
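As far as I know the core writes icinga2.state itself when it dumps its program state, so regenerating it should only be a matter of moving the old file out of the way while the daemon is stopped (this is my assumption, not something from the docs, and again runtime state like downtimes and acknowledgements is lost):
# systemctl stop icinga2
# mv /var/lib/icinga2/icinga2.state /var/lib/icinga2/icinga2.state.bak
# systemctl start icinga2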
I tried this again, but I could not find the point where the state file is created; I only found the point where an old log file is removed, nothing about creating a new one:
[2019-11-20 12:02:10 +0100] notice/ApiListener: Removing old log file: /var/lib/icinga2/api/log/1574247559
[2019-11-20 12:02:10 +0100] notice/ApiListener: Current zone master: master1
[2019-11-20 12:02:10 +0100] information/ApiListener: New client connection for identity 'master2' to [xxx.xxx.xxx.xxx]:5665
[2019-11-20 12:02:10 +0100] notice/JsonRpcConnection: Received 'config::Update' message from identity 'master2'.
[2019-11-20 12:02:10 +0100] information/ApiListener: Applying config update from endpoint 'master2' of zone 'master'.
[2019-11-20 12:02:10 +0100] notice/ConfigCompiler: Registered authoritative config directories for zone 'director-global': /etc/icinga2/zones.d/director-global and /var/lib/icinga2/api/packages/director/b8437779-e99b-463b-a790-95a1ad2af673/zones.d/director-global
[2019-11-20 12:02:10 +0100] information/ApiListener: Ignoring config update from endpoint 'master2' for zone 'director-global' because we have an authoritative version of the zone's config.
[2019-11-20 12:02:10 +0100] notice/ConfigCompiler: Registered authoritative config directories for zone 'master': /etc/icinga2/zones.d/master and /var/lib/icinga2/api/packages/director/b8437779-e99b-463b-a790-95a1ad2af673/zones.d/master
[2019-11-20 12:02:10 +0100] information/ApiListener: Ignoring config update from endpoint 'master2' for zone 'master' because we have an authoritative version of the zone's config.
[2019-11-20 12:02:10 +0100] information/ApiListener: Received configuration updates (0) from endpoint 'master2' do not qualify for production, not triggering reload.
[2019-11-20 12:02:13 +0100] notice/JsonRpcConnection: Received 'log::SetLogPosition' message from identity 'master2'.
[2019-11-20 12:02:14 +0100] notice/JsonRpcConnection: Received 'event::SetNextCheck' message from identity 'master2'.
[2019-11-20 12:02:14 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2019-11-20 12:02:15 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 25; Checks/s: 0
[2019-11-20 12:02:15 +0100] notice/ApiListener: Setting log position for identity 'master2': 2019/11/20 12:02:14
[2019-11-20 12:02:18 +0100] notice/JsonRpcConnection: Received 'event::Heartbeat' message from identity 'server'.
[2019-11-20 12:02:18 +0100] notice/JsonRpcConnection: Received 'log::SetLogPosition' message from identity 'master2'.
[2019-11-20 12:02:18 +0100] notice/JsonRpcConnection: Received 'event::SetNextCheck' message from identity 'master2'.
[2019-11-20 12:02:18 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2019-11-20 12:02:18 +0100] notice/JsonRpcConnection: Received 'event::CheckResult' message from identity 'master2
I looked into the debug.log on master1 again and found something that seems a bit strange to me. No lines were removed between these entries, this is exactly how they appear in the log:
[2019-11-21 11:18:00 +0100] notice/JsonRpcConnection: Received 'config::Update' message from identity 'master2.domain.com'.
[2019-11-21 11:18:00 +0100] information/ApiListener: Applying config update from endpoint 'master2.domain.com' of zone 'master'.
[2019-11-21 11:18:00 +0100] information/ApiListener: Received configuration updates (0) from endpoint 'master2.domain.com' do not qualify for production, not triggering reload.
How did you solve it? I have the same problem. The only solution I found that works is to do the following on each (Windows) client:
net stop icinga2
del C:\ProgramData\icinga2\var\lib\icinga2\icinga2.state
del C:\ProgramData\icinga2\var\lib\icinga2\modified-attributes.conf
del /s C:\ProgramData\icinga2\var\lib\icinga2\api\zones
del /s C:\ProgramData\icinga2\var\lib\icinga2\api\zones-stages
net start icinga2
I have not fixed it yet. My problem is not on an agent, it is on one of the masters. There is a possible bug fix on GitHub, but I have not had the time or a testing zone with the same issue, so I could not try it yet.