Director Scheduled Downtime Apply Rules doesn't assign to all matches

Hi Everyone,

I noticed some weird behaviour (or maybe it is my mistake somewhere) with Scheduled Downtime. I have something like this configured in Icinga Director → ScheduledDowntime Apply Rules:

zones.d/tmg_master/scheduled_downtime_apply.conf
apply ScheduledDowntime "process-trade-container-server" to Service {
    author = "xxx"
    comment = "Scheduled Downtime for process trade_container_server"
    fixed = true
    assign where match("process trade_container_server*", service.name)
    ranges = {
        "friday"	= "16:00-16:35"
        "monday"	= "16:00-16:35"
        "saturday"	= "16:00-16:35"
        "sunday"	= "16:00-16:35"
        "thursday"	= "16:00-16:35"
        "tuesday"	= "16:00-16:35"
        "wednesday"	= "16:00-16:35"
    }
}

I have 27 services called process trade_container_server*:

but the ScheduledDowntime Apply Rule is assigned to only 14 of them:

As you can see, some of these services are named exactly process trade_container_server, while others are called process trade_container_server xyz.
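
Both name variants should satisfy the pattern; a quick check in a standalone icinga2 console (the service names below are typed by hand as examples, not pulled from our monitoring) agrees:

$ icinga2 console
<1> => match("process trade_container_server*", "process trade_container_server")
true
<2> => match("process trade_container_server*", "process trade_container_server xyz")
true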

The assignment of the Scheduled Downtime seems to be pretty random. All of these services were deployed before the ScheduledDowntime Apply Rule was created.

Does anyone have an idea why it is not assigned to all of these services, some of which even share the same name (but on different hosts)?

This is the configuration we are using:

  • Director version: 1.10.2
  • Icinga Web 2 version: 2.11.4
    • modules: director, grafana, icingadb, incubator
  • Icinga 2 version: r2.13.7-1
  • Operating System and version: Debian 11

I found a workaround for this, but an ugly one. We have two HA masters configured according to this manual - How to set up High-Availability Masters.
So we have the API set up on both masters, but they seem to be conflicting with each other.
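
For reference, that setup boils down to both masters sharing a single “master” zone, roughly like this (a sketch only; the hostnames are placeholders, not our real ones):

// zones.conf on both masters (sketch; endpoint hosts are placeholders)
object Endpoint "master1" {
  host = "master1.example.com"
}

object Endpoint "master2" {
  host = "master2.example.com"
}

object Zone "master" {
  endpoints = [ "master1", "master2" ]
}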

When I stop the second master (master2) and configure Scheduled Downtimes, they are assigned correctly. When both masters are running and I apply some Scheduled Downtimes, master1 complains in the logs about “not accepting api config”:

[2023-09-08 06:30:06 -0500] warning/ApiListener: Ignoring config update from 'master2' (endpoint: 'master2', zone: 'master') for object 'some-server!process beam.smp!5beda8ca-270d-45ef-bf2d-f050e889a407' of type 'Downtime'. 'api' does not accept config.
[2023-09-08 06:30:06 -0500] warning/ApiListener: Ignoring config update from 'master2' (endpoint: 'master2', zone: 'master') for object 'somesecond-server!load' of type 'Service'. 'api' does not accept config.

And I finally found the solution (or part of it), as there was a misconfiguration between /etc/icinga2/features-enabled/api.conf and /etc/icinga2/constants.conf on the two masters. Currently, api.conf looks the same on both:

object ApiListener "api" {
  accept_config = true
  accept_commands = true
  ticket_salt = TicketSalt
}

and TicketSalt is defined in /etc/icinga2/constants.conf.
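
It amounts to a single constant, something along these lines on both masters (the value shown here is just a placeholder):

/* /etc/icinga2/constants.conf -- identical on both masters; the value is a placeholder */
const TicketSalt = "some-random-secret"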

I’m no longer getting the “does not accept config” message, but:

Ignoring config update from endpoint 'master2' for zone 'master' because we have an authoritative version of the zone's config.

For some objects, I see duplicated scheduled downtimes, but at different times, as our masters are in different time zones. So it seems the API setup doesn’t like having two masters in two different time zones.

Interesting. AFAIK Icinga uses UTC internally, so the time zone should not be an issue.
Maybe @Al2Klimov can share some insights on whether this could be a bug?!

As for the message:
As there can only be one config master (generally the first one to be set up), this is (partly) ok, I’d say.
Make sure there is no configuration active under /etc/icinga2/conf.d on the second master. The log message should tell you which config was ignored, IIRC.

I wish you were right, but we have even had problems with DST:

https://github.com/Icinga/icinga2/issues?q=is%3Aissue+DST+is%3Aclosed

Thank you for your reply.

As for the message:
As there can only be one config master (generally the first one to be set up), this is (partly) ok, I’d say.
Make sure there is no configuration active under /etc/icinga2/conf.d on the second master. The log message should tell you which config was ignored, IIRC.

Yes, in general we have commented out the entry // include_recursive "conf.d" in icinga2.conf.
The only things I’m including on the second master are:

include "conf.d/api-users.conf"
include "conf.d/generic-user.conf"
include "conf.d/service-apply-services.conf"

because without them, I was noticing some issues / weird behaviour, but that is a different story, I suppose.
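
For context, the relevant part of icinga2.conf on that second master looks roughly like this (a sketch, not a verbatim copy):

/* /etc/icinga2/icinga2.conf on master2 (sketch, not verbatim) */
include "constants.conf"
include "zones.conf"
include <itl>
include <plugins>
include "features-enabled/*.conf"

// include_recursive "conf.d"            // disabled for the HA setup
include "conf.d/api-users.conf"
include "conf.d/generic-user.conf"
include "conf.d/service-apply-services.conf"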

So I’m not worried about the message saying that one master has the authoritative version of the config, as that sounds correct.

@Al2Klimov, do you think I should report this as a bug, or do you have any other solution in mind?

I was thinking of changing our Icinga 2 infrastructure config and having just one master for now.

Please specify that.

OK, I will try to explain it as best I can:

1.
include "conf.d/generic-user.conf"

object User "generic-user" {

  display_name = "generic-user"
}

It was some time ago, but if I recall correctly, it was an issue with the Opsgenie feature. It has this declaration:

object User "opsgenie" {
  import "generic-user"
  display_name = "OpsGenie Contact"
}

in /etc/icinga2/features-enabled/opsgenie.conf.
And it was throwing errors that the config is invalid, as generic-user doesn’t exist.

2.
include "conf.d/service-apply-services.conf"
I’ve added this because Director has its limitations and wasn’t able to handle a more advanced key-value pair setup with “Apply Service For” rules. As in general we have all our config done through the Director, I thought we needed to have config files like that on both masters.
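
To give an idea of what service-apply-services.conf contains, the rules are along these lines (a simplified sketch; the vhosts dictionary and the check are made-up examples, not our real services):

/* conf.d/service-apply-services.conf -- simplified sketch, not our actual rules */
apply Service "http-" for (vhost => config in host.vars.http_vhosts) {
  check_command = "http"
  check_interval = 1m
  retry_interval = 30s

  // merge the per-vhost settings from the host's dictionary into the service
  vars += config
}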

3.
include "conf.d/api-users.conf"

object ApiUser "root" {
  password = "hidden_password"
  permissions = [ "*" ]
}

object ApiUser "svc-icinga2" {
  client_cn = "svc-icinga2"
  password = "hidden_password"
  permissions = [ "*" ]
}

A similar case to the second point. I thought we needed to have the API users config, as we have the api feature enabled on both masters.
I’ve read somewhere that it is advised to have the same features enabled on both masters.

Even if this isn’t causing any trouble right now (not to mention your current problem), it is the perfect source of trouble one “nice” day.

My recommendation as an Icinga dev:

Look at the existing config structure. Admittedly, not everything is under zones.d/, not even everything that should be, IMO. But that is another topic.

  • Features are in features-available/, ok. Not every Icinga instance uses the API, fair enough.
  • The API user is under conf.d/api-users.conf so that it doesn’t break during node setup. Fine.
  • Zones are in one central config file (zones.conf) as a best practice. They can’t be under zones.d/ for obvious reasons.

But! (And now the recommendation.) All other config object types (if I didn’t forget anything, feel free to ask if unsure) which can be under zones.d/, and even are under zones.d/ under normal circumstances, should be under zones.d/: Host, Service, User, ScheduledDowntime, … On only one master, so that it gets auto-synced to the other, but under zones.d/. Even if Director is involved. Icinga is smart enough to merge the so-called packages “director” (IIRC), “_etc” (everything in /etc/icinga2/zones.d) and “_api”.
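
For instance (just a sketch; the zone directory name is assumed), the generic-user object from above could then live on the config master like this:

/* /etc/icinga2/zones.d/master/users.conf -- on the config master only;
   it gets synced to the second master automatically */
object User "generic-user" {
  display_name = "generic-user"
}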

The above should enable you to remove all hand-crafted includes, and the end result may even fix your problem.

Thank you for your recommendation.

I tried to make some changes, if I understood your advice correctly.

So, the first issue I noticed when I moved generic-user.conf from conf.d to zones.d:

[2023-09-12 09:20:54 +0200] critical/config: Error: Import references unknown template: 'generic-user'
Location: in /etc/icinga2/features-enabled/opsgenie.conf: 88:3-88:23
/etc/icinga2/features-enabled/opsgenie.conf(86): 
/etc/icinga2/features-enabled/opsgenie.conf(87): object User "opsgenie" {
/etc/icinga2/features-enabled/opsgenie.conf(88):   import "generic-user"
                                                   ^^^^^^^^^^^^^^^^^^^^^
/etc/icinga2/features-enabled/opsgenie.conf(89):   display_name = "OpsGenie Contact"
/etc/icinga2/features-enabled/opsgenie.conf(90): }
[2023-09-12 09:20:54 +0200] critical/config: 1 error
[2023-09-12 09:20:54 +0200] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.

This is why I have the include "conf.d/generic-user.conf" in the first place.

When I moved service-apply-services.conf from conf.d to zones.d, it didn’t work either. The services just disappeared from Icinga.

Please let me know if I’m thinking about this wrong, or if it really doesn’t work this way.
When using Director, you are correct, all configurations are stored in zones.d, but those files are rendered by the Director.

How am I supposed to get includes like include "conf.d/service-apply-services.conf" into zones.d? As far as I know, I can’t edit those rendered files, as that would possibly break even more things.

To emphasize, we don’t have any other hand-crafted configs than the ones I mentioned in this topic.

The above should enable you to remove all hand-crafted includes, and the end result may even fix your problem.

To be honest, I have no idea how it could help solve the issue with duplicated Scheduled Downtimes, which is caused (apparently?) by two HA masters in two different time zones using the API.

Well, after many different attempts, we’ve ended up changing the time zone on our second master.
I tried including configs in various locations and adjusting the scheduled downtime ranges to the time zones. Every time, the downtimes were messed up.
If anyone knows or finds a solution for how to properly manage Scheduled Downtimes with Director and two masters in different time zones, I would really appreciate the help.

features-enabled/ is an even worse place for anything except features. The User objects should be trivial; why not manage them via Director, too?

I guess you’ve put the config directly under zones.d/. However, as per /etc/icinga2/zones.d/README, you have to

  1. Decide on the desired zone (global-templates/, master/, …)
  2. Create the directory /etc/icinga2/zones.d/ZONE_NAME on exactly one of the masters
  3. Move this config to the new directory:
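
A minimal sketch of those steps for a zone named master (the zone name is an assumption; run on the config master only):

# on exactly one of the masters (the config master)
mkdir -p /etc/icinga2/zones.d/master
mv /etc/icinga2/conf.d/generic-user.conf /etc/icinga2/zones.d/master/
icinga2 daemon -C            # validate the config
systemctl reload icinga2     # deploy and sync the zone config to the other master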

It was like that by default, so I just haven’t touched it. This is part of the default Opsgenie feature, and it is there after installing it. Inside the default Opsgenie feature config file, among many object definitions, is:

object User "opsgenie" {
  import "generic-user"
  display_name = "OpsGenie Contact"
}

Yes, you are correct. I will try what you are suggesting, and I will let you know.

@Al2Klimov
I appreciate your time and suggestions. Unfortunately, none of them solves my issue with the “messed up” Scheduled Downtimes.

Just as a reminder, we have two HA masters and two satellites. Masters and satellites are spread out between two time zones, CDT and CEST.

We are trying to apply Scheduled Downtimes based on service_name for multiple hosts.

There is some common behaviour across the config approaches we tried:

  1. Using Icinga Director or zones.d/ZONE_NAME/scheduled_downtime.conf
    In this case, all declared scheduled downtimes are added, but some of them (roughly half and half) are adjusted to CDT and some to CEST. All hosts are located in CDT.
  2. Using conf.d/scheduled_downtime.conf on the first master (CDT)
    Scheduled Downtimes are added to only roughly half of the hosts they should be added to.
  3. Using conf.d/scheduled_downtime.conf on both masters (CDT and CEST)
    Scheduled Downtimes are added to all necessary hosts, but, similar to the first point, they are messed up between the two time zones.
    There is no issue when the two configs look different, i.e. one has the times adjusted to CDT and the other to CEST. But this is horrible to maintain, and it also won’t work when the Summertime/Daylight Saving switch happens on different dates in the two time zones.

I’m really thinking about shrinking our setup to just one master. This is the only solution I’ve found that doesn’t mess up the Scheduled Downtimes.