Icinga Agent: Top Down Config Sync

rsx · January 31, 2019, 11:17am

I’m trying to find out whether Top Down Config Sync would bring advantages, e.g. checks continuing to run if the parent is not reachable. However, I have not been able to find a valid configuration yet.

First, it looks like it’s incompatible with Director, isn’t it? (The host object would not be part of /etc/icinga2/zones.d, hence it is not synchronized to the clients.)

I’ve configured a host object, e.g. /etc/icinga2/zones.d/testvm/testvm.conf, and one service object in /etc/icinga2/zones.d/windows, where windows is a global zone. It basically works, but with strange effects:

If the host object is a member of its own zone:
- hostalive is configured for the wrong check source, hence it cannot be used to identify whether the host is down.
- the service of the disconnected host stays in status OK while the network is disconnected, although it is not returning any results.
If the host is a member of its parent zone:
- hostalive works as expected
- the parent tries to send commands to the client and fails due to accept_commands = false
In both cases perfdata isn’t handled as expected

BTW: “Accepts config” in Director looks like it’s not working. And without Director: which config option is needed on a host object to prevent Icinga from sending commands?

Does anybody have a valid config? Or is this type of configuration (even) deprecated?

dnsmichi · January 31, 2019, 2:00pm

Hi,

We don’t really support Windows agents as full blown satellite instances with their own local scheduler. Mainly for the reason that performance handling and troubleshooting on Windows is hard compared to Linux. At some point in the future, we will make this default and drop support for the local scheduler on an agent. This is to be discussed though.

The preferred method for agents is the command endpoint execution bridge, and this also is used inside the Icinga Director when making a host an agent.

In terms of your questions:

1.) When the host is in the wrong zone, you can use the zone attribute trick to tell the master that it should be authoritative for the host object. The master then executes the check itself, while the agent does nothing. When the service doesn’t return any check results, there’s no state change and no other visible problem on the master. That is because the check result history has to stay intact for when a reconnect happens and the replay log sends the cached results from the remote endpoint.

This is a typical scenario for a satellite though.
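For illustration, the zone attribute trick from point 1 might look like the following sketch. It assumes the host object lives in the agent’s zone directory (the testvm path from the first post), that the parent zone is named "master", and that the address is a placeholder; none of these values are confirmed in the thread.

```
// /etc/icinga2/zones.d/testvm/testvm.conf (path taken from the first post)
object Host "testvm" {
  check_command = "hostalive"
  address = "192.0.2.10"   // placeholder address, adjust to your environment

  // The object is still synced to the agent's zone, but the zone named here
  // ("master" is an assumed zone name) stays authoritative and executes the
  // host check itself.
  zone = "master"
}
```

With this in place, the check source for hostalive should be the master rather than the agent, so a disconnected agent shows up as a failing host check.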

2.) accept_commands should be enabled for agents. It is disabled by default as a security measure, so that an agent does not trust anything from above unless told to.

3.) What do you mean by “perfdata isn’t handled as expected”?

4.) accept_config in Director - please share a screenshot or configuration snippet to explain this.

Anyhow, I would strongly recommend using agents only with command endpoints. This requires accept_commands = true on the agent’s api feature, and accept_config for syncing checkcommands via a global zone (if not available in the ITL on the agent).
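A minimal sketch of such a command endpoint setup could look like this. The endpoint and zone name testvm and the global zone windows come from the first post; the zone name "master", the custom variable vars.agent_endpoint, the placeholder address, and the choice of the ITL command disk-windows are assumptions made for the example.

```
// On the agent: api feature configuration
object ApiListener "api" {
  accept_commands = true   // let the parent run checks via command_endpoint
  accept_config = true     // receive synced config, e.g. CheckCommands from a global zone
}
```

```
// On the master: zones.conf
object Endpoint "testvm" {
  host = "192.0.2.10"      // placeholder address
}

object Zone "testvm" {
  endpoints = [ "testvm" ]
  parent = "master"        // assumed name of the parent zone
}

object Zone "windows" {
  global = true            // global zone mentioned in the first post
}

// On the master: e.g. zones.d/master/testvm.conf
object Host "testvm" {
  check_command = "hostalive"     // executed by the master
  address = "192.0.2.10"
  vars.agent_endpoint = name      // hypothetical custom variable marking agent hosts
}

apply Service "disk" {
  check_command = "disk-windows"                  // ITL command, assumed to fit this host
  command_endpoint = host.vars.agent_endpoint     // run the check on the agent
  assign where host.vars.agent_endpoint
}
```

The host and service objects stay in the master zone, so scheduling and check history remain on the master, and the agent only executes the command it is handed.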

Cheers,
Michael

rsx · February 1, 2019, 12:52pm

Hi,

OK, so the main advantage of the Windows agent does not exist, and I’ll stop investigating this configuration setup (and we’ll use the agents as recommended). For Linux we’ll start with by_ssh.

Regarding perfdata: In setup #1 Icinga 2 created empty files in /var/spool/icinga2/perfdata, and in both scenarios no perfdata was received after the network came back online. However, as we don’t want to pursue Top Down Config Sync anymore, I’d close this thread (unless you want me to provide more information or run tests).

Cheers,
Roland

dnsmichi · February 1, 2019, 1:02pm

Hi,

In my experience, users and Windows admins don’t like it when a monitoring service takes 100% cpu for scheduling checks on a frequent basis. That sometimes happened with the Windows client including synced local objects, and in combination with the replay log filling up the C: drive, this does not turn out very well for a self-contained system.

Regarding check_by_ssh, the configuration with its argument parsing and passing is not super sexy imho. I tend to avoid it wherever I can and have an Icinga 2 agent at hand. Then the focus stays on the same configuration and handling for both worlds, Windows and Linux. Having this abstracted e.g. in the Director with specific templates and service sets, you normally just say “this is an agent”, and it starts to work magically.
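To show what the argument passing with by_ssh involves, here is a rough sketch using the ITL by_ssh command. The host name linuxvm, the plugin path, and the by_ssh_disk_* variable names and thresholds are made up for the example.

```
object Service "disk" {
  host_name = "linuxvm"            // hypothetical Linux host
  check_command = "by_ssh"

  // The remote command line is assembled by hand, argument by argument;
  // the plugin path depends on the distribution.
  vars.by_ssh_command = [ "/usr/lib/nagios/plugins/check_disk", "-w", "$by_ssh_disk_warn$", "-c", "$by_ssh_disk_crit$" ]
  vars.by_ssh_disk_warn = "75%"    // hypothetical custom variables resolved as macros above
  vars.by_ssh_disk_crit = "90%"
}
```

Each remote plugin needs its own hand-built argument list like this, whereas with an agent the regular CheckCommand definition and its arguments are reused via command_endpoint.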

We don’t need to close the thread, maybe others have different ideas or want to share their experience.

Cheers,
Michael