New Deployment not happening - Pending State

Hello All

Everything was fine until yesterday evening. The last things that were changed are listed below:

  1. Created a generic host/service template (blank, with just a few custom variables) and imported it into all service and host templates.
  2. Created a Service Set for the first time and deployed it on a host template.
  3. Changed the host template's generic check method from ping to dummy, as the servers under that host template are not pingable.

The first two changes went through without much fuss and are working. The third change is not going through: the servers are still being pinged and are shown as down.

All the changes are made through Director. Since then, whatever change we make is affected: a newly created service stays in the pending state, and a modification to an existing service is not reflected, even though Director says the deployment succeeded.

Sometimes we get a curl error while deploying a change. It is very rare, and after a few seconds it goes away on its own and we get the green check for a successful deployment. And yes, telnet from the icingaweb2 server to the master on port 5665 works.

Note that other service checks deployed earlier on these servers are working fine.

I changed the disk thresholds of an existing service from 10% / 20% to 20% / 30%.

On the inspect page, the change is not reflected.

But if I scroll down to the bottom of the same inspect page, the vars are updated.

The logs show no obvious errors.

Hi,

Say hello to Klaus :slight_smile: The variables shown behind the inspect links are updated once the service check has run again. As far as I can see, the checks should run in a zone other than the master, so I would check whether the satellites/agents have received the newest configuration and can load it without any error (`icinga2 daemon -C`).

Regards,
Carsten

Hello Klaus

Thank you for your response. If I click Check Now, I can see the check being force-executed on the satellite. But I don't think the satellite is getting the new service configuration or the changes. I have no idea how I broke it.

Any direction to start troubleshooting will help

Hi,

I am not Klaus. You should greet him (seeing corpintra.net makes me think you work for a customer I know).
Steps you can take:

  1. Run `icinga2 daemon -C` on the affected satellites and see whether the configuration validates without any error.
  2. Compare the output of `icinga2 object list --type Service --name '<HOSTNAME>!<SERVICENAME>'` on the master and on the satellite. If it differs, the current configuration has not been synced to or loaded by the satellite.
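
The comparison in step 2 can be done by eye or with a small script. Below is a minimal Python sketch, not an Icinga tool, that assumes the `icinga2 object list` output for the same service has been saved to a text file on each node and copied to one machine. The function name `diff_object_dumps` is made up for this example.

```python
import difflib
from typing import List


def diff_object_dumps(master_dump: str, satellite_dump: str) -> List[str]:
    """Return unified-diff lines between two saved `icinga2 object list` outputs.

    An empty list means both nodes printed exactly the same text.
    """
    return list(
        difflib.unified_diff(
            master_dump.splitlines(),
            satellite_dump.splitlines(),
            fromfile="master",
            tofile="satellite",
            lineterm="",  # input lines carry no newline, so headers should not either
        )
    )
```

To use it, read the two files (hypothetical names `master.txt` and `satellite.txt`) and print each line returned by `diff_object_dumps(master_text, satellite_text)`. Lines starting with `-` appear only in the master's output and lines starting with `+` only in the satellite's.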

Regards,
Carsten

Ha ha, sorry, but there are many Klauses here, so I am not sure which one you are referring to. But hi to you from all the Klauses here :slight_smile:


I see this, but the error is fine, as I don't expect them to match on any of the servers under this satellite.

We have 8 pairs of satellites.

The master is not able to sync its configuration to any of the satellites. At least, I have tested 4 of them by making changes to services specific to the respective satellite. I don't expect the other 4 to be in sync either.

I don't know what I screwed up. All I did was create services.

Please check the log files on the satellites to see whether they give a reason why the new configuration is not loaded.
Did you compare the service on the masters/satellites with icinga2 object list ..?

Regards,
Carsten

P.S.: Klaus J. should know me :slight_smile:

[2020-04-01 09:43:30 +0200] information/ApiListener: Finished syncing endpoint 'xxxxxxx.corpintra.net' in zone 'master'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Applying config update from endpoint 'xxxxxxx.corpintra.net' of zone 'master'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Received configuration for zone 'xxx_Satellite_INF-xxx' from endpoint 'xxxxxxxxxx.corpintra.net'. Comparing the timestamp and checksums.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/xxx_Satellite_INF-xxx//director/agent_endpoints.conf' for zone 'xxx_Satellite_INF-xxx'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/xxx_Satellite_INF-xxxx//director/agent_zones.conf' for zone 'xxx_Satellite_INF-xxxx'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/xxx_Satellite_INF-xxxx//director/hosts.conf' for zone 'xxx_Satellite_INF-xxxx'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/xxxx_Satellite_INF-xxxx' (16469 Bytes).
[2020-04-01 09:43:30 +0200] information/ApiListener: Received configuration for zone 'director-global' from endpoint 'xxxxxxxxxxx.corpintra.net'. Comparing the timestamp and checksums.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/001-director-basics.conf' for zone 'director-global'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/commands.conf' for zone 'director-global'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/host_templates.conf' for zone 'director-global'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/hostgroups.conf' for zone 'director-global'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/service_apply.conf' for zone 'director-global'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/service_templates.conf' for zone 'director-global'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/servicesets.conf' for zone 'director-global'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/timeperiods.conf' for zone 'director-global'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/director-global//director/user_templates.conf' for zone 'director-global'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/director-global' (35785 Bytes).
[2020-04-01 09:43:30 +0200] information/ApiListener: Received configuration for zone 'global-templates' from endpoint 'xxxxxxxxxxxx.corpintra.net'. Comparing the timestamp and checksums.
[2020-04-01 09:43:30 +0200] information/ApiListener: Our production configuration is more recent than the received configuration update. Ignoring configuration file update for path '/var/lib/icinga2/api/zones-stage/global-templates'. Current timestamp '2020-03-31 08:55:52 +0200' (1585637752.646405) >= received timestamp '2020-03-31 08:55:52 +0200' (1585637752.646405).
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//director/host_templates.conf' for zone 'global-templates'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/global-templates//director/service_apply.conf' for zone 'global-templates'.
[2020-04-01 09:43:30 +0200] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/global-templates' (2138 Bytes).
[2020-04-01 09:43:30 +0200] information/ApiListener: Received configuration updates (3) from endpoint 'xxxxxxxxx.corpintra.net' are different to production, triggering validation and reload.
[2020-04-01 09:43:30 +0200] **critical/ApiListener: Config validation failed for staged cluster config sync in '/var/lib/icinga2/api/zones-stage/'. Aborting. Logs: '/var/lib/icinga2/api/zones-stage//startup.log'**

cat /var/lib/icinga2/api/zones-stage//startup.log

[2020-04-01 09:43:30 +0200] information/cli: Icinga application loader (version: 2.11.2-1)
[2020-04-01 09:43:30 +0200] information/cli: Loading configuration file(s).
[2020-04-01 09:43:30 +0200] critical/config: Error: Object 'xxx-tmplHost-AG-LZ' of type 'Host' re-defined: in /var/lib/icinga2/api/zones-stage//global-templates/director/host_templates.conf: 1:0-1:34; previous definition: in /var/lib/icinga2/api/zones-stage//director-global/director/host_templates.conf: 57:1-57:35
Location: in /var/lib/icinga2/api/zones-stage//global-templates/director/host_templates.conf: 1:0-1:34
/var/lib/icinga2/api/zones-stage//global-templates/director/host_templates.conf(1): template Host "xxx-tmplHost-AG-LZ" {
                                                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/var/lib/icinga2/api/zones-stage//global-templates/director/host_templates.conf(2):     check_command = "hostalive"
/var/lib/icinga2/api/zones-stage//global-templates/director/host_templates.conf(3):     max_check_attempts = "3"

[2020-04-01 09:43:30 +0200] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.
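
For anyone hitting the same `re-defined` error, here is a minimal Python sketch, not part of Icinga, that pulls the object name and the two conflicting zones out of a startup.log like the one above. The regular expression is modelled only on the single error line shown in this thread, so other Icinga 2 versions may word the message differently. `find_redefinitions` and `zone_of` are made-up helper names.

```python
import re
from typing import List, Tuple

# Pattern modelled on the one "re-defined" line quoted in this thread (assumption:
# other Icinga 2 versions may format the message differently).
REDEFINED = re.compile(
    r"Object '(?P<name>[^']+)' of type '(?P<type>[^']+)' re-defined: "
    r"in (?P<new>\S+): [\d:-]+; previous definition: in (?P<old>\S+): [\d:-]+"
)


def zone_of(path: str) -> str:
    """Return the path component that follows 'zones-stage', or '?' if absent."""
    parts = [p for p in path.split("/") if p]  # drops the empty part from '//'
    if "zones-stage" in parts[:-1]:
        return parts[parts.index("zones-stage") + 1]
    return "?"


def find_redefinitions(log_text: str) -> List[Tuple[str, str, str, str]]:
    """Return (type, name, zone of new definition, zone of previous definition)."""
    return [
        (m.group("type"), m.group("name"), zone_of(m.group("new")), zone_of(m.group("old")))
        for m in REDEFINED.finditer(log_text)
    ]
```

Feeding it the contents of the startup.log gives one tuple per conflict, which makes it easier to see that one template is being shipped in two zones at once.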

Strangely, `icinga2 daemon -C` does not catch this.

This is how the template looked yesterday

and I screwed it up like this, thinking it was an incorrect configuration :frowning:

I did a restore and it started working OK.

But the open question is that I still need to change the setting from global-templates to director-global.

Deleting and moving a template at the same time in Director is always tricky and will lead to the problem you ran into.
All you can do now is to empty all configuration from the satellites and let them sync a fresh configuration from the masters. You may have to do the same on the agents too.

Hi !

As Carsten already cleared up: if the config isn't deployed correctly, most of the time it doesn't actually reach the satellite where the checks can/should be performed. In this case it seemed that the “global-templates” zone was gone.

For next time, I suggest comparing the zones defined in /etc/icinga2/zones.conf with the zones that are actually filled with config via the API under /var/lib/icinga2/api/… and so forth. It helps in finding out why config doesn't reach the endpoint it was assigned to.
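
That comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: zones are declared as `object Zone "name"` with a quoted name (declarations that use a constant such as `ZoneName`, or that sit inside `/* ... */` comments, are not handled), and the directory holding one sub-directory per synced zone is passed in by the caller rather than hard-coded. `compare_zones` and the other names are made up for this example.

```python
import os
import re
from typing import List, Set, Tuple

# Matches lines such as:  object Zone "master" {
# Lines starting with // or # are skipped because the pattern is anchored to line start.
ZONE_DECL = re.compile(r'^\s*object\s+Zone\s+"([^"]+)"', re.MULTILINE)


def declared_zones(zones_conf_text: str) -> Set[str]:
    """Zone names declared with a quoted name in the given zones.conf text."""
    return set(ZONE_DECL.findall(zones_conf_text))


def synced_zones(zones_dir: str) -> Set[str]:
    """Names of the sub-directories found in the synced-config directory."""
    return {
        entry
        for entry in os.listdir(zones_dir)
        if os.path.isdir(os.path.join(zones_dir, entry))
    }


def compare_zones(zones_conf_text: str, zones_dir: str) -> Tuple[List[str], List[str]]:
    """Return (declared zones without synced config, synced zones not declared)."""
    declared = declared_zones(zones_conf_text)
    synced = synced_zones(zones_dir)
    return sorted(declared - synced), sorted(synced - declared)
```

A zone in the first list is declared but has received no config; a zone in the second list has config on disk but no matching declaration in the file that was read. Either case is a hint about where the sync stops, not proof of a fault.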

Regards & Thanks Carsten …

David

Thank you, guys. I still wonder why the error did not show up in `icinga2 daemon -C`, and why Director reported the deployment as successful even though it was not. But anyway, the problem is solved and all is well.

As suggested, I am not going to change from global-templates to director-global. global-templates was selected and deployed by mistake, and all the problems started when we tried to correct that.

Thank You guys
