Check is executed on incorrect server and treated as successful

Hi,

I’ve found some services whose checks are displayed and treated as healthy and “OK” even though they shouldn’t be, because the check is executed on the wrong server/node.

Icinga2 version : 2.12.0-1
Environment: Debian Buster

See my case:

I am using top down config sync. From master to satellite and from satellite to agents.

zones.conf on the satellite:

object Endpoint "master" {
// Connect to us
}

object Endpoint "a_server" {
// That's us
}

object Endpoint "b_server" {
    host = "b_server"
// agent
}


object Zone "master" {
    endpoints = [ "master_server" ]
}

object Zone "a" {
    endpoints = [ "a_server" ]
    parent = "master"
}

object Zone "b" {
  endpoints = [ "b_server" ]
  parent = "a"
}

object Zone "global-templates" {
	global = true
}

object Zone "director-global" {
	global = true
}

The zones.conf on the agent is identical except for the host attributes; only the satellite connects to the agent, not vice versa.
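For reference, the agent's zones.conf would then look roughly like this (a sketch based on that description; no host attributes, because the agent does not initiate any connection):

object Endpoint "master" {
  // the top-level master
}

object Endpoint "a_server" {
  // our parent, the satellite connects to us
}

object Endpoint "b_server" {
  // that's us
}

object Zone "master" {
  endpoints = [ "master" ]
}

object Zone "a" {
  endpoints = [ "a_server" ]
  parent = "master"
}

object Zone "b" {
  endpoints = [ "b_server" ]
  parent = "a"
}

// plus the same global zones as on the satellite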

service.conf

object Host "b_server" {
  import "generic-host"

  address = "b_server"
  vars.os = "Linux"
  vars.distro = "Debian"
  vars.agent_type = "Icinga"

}

object Service "disk" {
  import "generic-service"

  host_name = "b_server"
  check_command = "disk"

}

Here is the problem. I’ve got a scenario where (for whatever reason) the service.conf file hasn’t been synced onto the agent but has been synced onto the satellite.

In the web UI the disk service is “OK” and has some data, but that data belongs to the satellite, not to the agent at all. So, at first glance you won’t notice anything wrong and everything seems fine…

So, my question is: how do I configure a Service object for one and only one specific server?

In my opinion this is very dangerous; the check should fail because it is not executed on the correct server, and I thought the host_name attribute would be the deciding one.

Thanks in advance.

To execute a check on an agent, you need to add

command_endpoint = host.name

to your service definition.
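For example, with an apply rule (where host.name is available when the rule is evaluated) this could look like the following sketch; it reuses the vars.agent_type flag from the Host object in the first post, and the assign condition is just an assumption:

apply Service "disk" {
  import "generic-service"

  check_command = "disk"

  // execute the check on the endpoint whose name matches the host name
  command_endpoint = host.name

  assign where host.vars.agent_type == "Icinga"
}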

Thank you for the reply.

I understood this directive is only necessary in the case of Top Down Command Endpoint.

Anyway, I’ve added the directive as command_endpoint = host.name (the service file is still not synchronized, which is fine because I’d like the check to fail, as that would be the correct behavior) and it didn’t help. I guess host.name is substituted with the hostname of the server that has the file, not with the hostname of the final destination server.

Anyway, I’ve exchanged host.name for the real host name b_server, and yes, that works: the check is in the UNKNOWN state, which is correct.

So, I am going to solve the syncing problem and hopefully the check will be executed correctly.

Syncing depends on where you place a conf file. For service objects, good practice is to use a global zone that exists on all systems, e.g. global_zone. If you don’t want to have Linux service objects on Windows machines and vice versa, you could define e.g. two global zones.
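For example, two such global zones could be declared in zones.conf on every node (the zone names here are just placeholders):

object Zone "linux-services" {
  global = true
}

object Zone "windows-services" {
  global = true
}

Files placed under zones.d/linux-services/ on the config master are then synced to every node that declares that zone.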

Hmm… Icinga just came up with a notification that the server doesn’t accept commands :smiley:

So, it seems the command_endpoint directive works just for remote execution.

My goal is to let the Icinga agent on the server execute the checks and only send the results.

I guess I am confused.

I know; I use a global zone for templates and services, and zones for specific configurations.

I’ve got servers separated by geo zones. The sync from the master to the satellite works; from the satellite to the agent it does not…

On your agents you need to enable accept_commands; details can be found here.
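That flag lives in the api feature on the agent, typically /etc/icinga2/features-enabled/api.conf; a minimal sketch:

object ApiListener "api" {
  accept_commands = true   // allow the parent to send check execution events
  accept_config = true     // allow the parent to sync zone configuration
}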

Windows is not supported for executing checks by itself; this is always scheduled by its parent.

I am not sure we understand each other :slight_smile:

I’d like to have a setup where some servers are checked remotely by the master or satellite via command_endpoint, and those shall accept commands.

For some servers I need them to act independently (because I need data about them even during a connection loss); they should check themselves and send the results back to the master directly or via the satellite. Those servers will only accept config files; they don’t need to accept commands.
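For those config-only agents the api feature would then roughly look like this (a sketch, mirroring the accept_* flags mentioned above):

object ApiListener "api" {
  accept_config = true     // receive host/service objects via config sync
  accept_commands = false  // refuse remote command execution from the parent
}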

I believe this scenario is in line with the documentation.

According to the link you sent me.

The last point states:

“An agent node will either run its own configured checks or receive command execution events from the parent node.”

So I understand the agent can check itself without remote execution from the parent, e.g. the satellite.

As I’ve written, running independently is not supported for Windows.

I see. I don’t have any Windows machines, so it should be configurable.

You then need all objects (host, service and command) to exist on your agents. This can be done manually or with this trick.
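One way to do that with the normal config sync (just a sketch on my side, not necessarily the trick linked above) is to place the objects in the agent's own zone directory on the config master, so they are synced all the way down:

// on the config master: /etc/icinga2/zones.d/b/agent.conf
// (zone "b" is the agent's zone, so these objects end up on the agent)

object Host "b_server" {
  import "generic-host"
  address = "b_server"
}

object Service "disk" {
  import "generic-service"
  host_name = "b_server"
  check_command = "disk"   // "disk" ships with the ITL, so it exists on every node
}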

Thank you, I am not sure I understand the point. Do you think it is better to use a different approach than zones for conf file distribution?

Because the issue I have is a problem with false positive checks in case the synchronization wasn’t successful.

But maybe it’s a good point to rather focus on proper config distribution.

Zones somehow work in most cases, but sometimes they don’t, and there is no native mechanism to ensure the files were distributed successfully.

Zones and conf files are the most challenging part of Icinga in my opinion. Icinga is designed to have host objects belonging to a master or satellite zone. This automatically defines where a check is executed, unless command_endpoint is defined. In both cases all checks are scheduled by an endpoint of the corresponding host zone. For this setup conf distribution works perfectly.
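To illustrate that standard pattern with the names from this thread (a sketch; the file would live in the satellite zone’s directory, e.g. zones.d/a/, on the config master):

// the host object belongs to the satellite zone "a", so zone "a" schedules its checks

object Host "b_server" {
  import "generic-host"
  address = "b_server"
}

object Service "disk" {
  host_name = "b_server"
  check_command = "disk"

  // without this line the satellite executes the check itself;
  // with it, the execution event is sent to the agent endpoint
  command_endpoint = "b_server"
}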

You want something different; hence, you need some tricky workarounds.

In any case, Icinga does not update a check status if it is not reported. Is this what you call a false positive?

By false positive, I mean the case described in my first post.

Scenario:

Three-level cluster (master-satellite-agent) where the agent shall execute checks locally and send the results back to the master (via the satellite).

So, I’ve created a configuration file for the agent with a Service object for a disk check (for the sake of example). The configuration file has been synced onto the satellite but hasn’t been synced onto the agent.

In the web UI, the agent reports a new service, the disk, and the service is in the “OK” state and reports information (space, inodes), but it is a false positive because the config file is not present on the agent node and the data reported by the service (e.g. disk space) belongs to the satellite.

So, it seems it can happen that the cluster will not notice the problem with the synchronization (maybe it’s just a wrong configuration, so from Icinga’s point of view everything is correct) and the object is executed on the node where it is present, even though it should be bound to a specific node, or at least I thought the object would belong to only one node.

Hi there.

Just a small note:
Maybe I am getting you wrong, but the agent-nodes do not need to get the service definitions synced. Agent nodes only need to know about the CheckCommands.


Greetings.

Hi,

thank you for your reply.

Unfortunately, your statement is valid only for Top Down Command Endpoint.

My problem is related to Top Down Config Sync.