Remote agent querying services but not host check_command

The system is composed of a master and an agent.

I have some hosts in the master’s /etc/icinga2/zones.d/<remote-agent.hostname.com>/hosts.conf

Services attached to those hosts are polling nicely from the agent and reporting to the master.

Hosts are not being pinged from the agent.

The agent has the monitoring plugins installed; in particular, /usr/lib/nagios/plugins/check_ping exists and works, the ITL is included, and PluginDir is defined (I get redefinition errors/warnings if I try to set these again on the remote agent).

In each host object in /etc/icinga2/zones.d/<remote-agent.hostname.com>/hosts.conf:

zone = "master" 
agent_endpoint = <remote-agent.hostname.com>
check_command = "hostalive4"

If I click Check now on a host in the UI, the time refreshes as if a check was scheduled, but tcpdump and the host status indicate that nothing happened from the agent; tcpdump shows the ping coming from the master.

System config still the same as in Duplicate host entry points to self when adding agent - #4 by dgoetz

How do I move the hostalive4/ping to the agent? I tried changing the zone value in the host object to the agent’s zone, but then nothing happens at all. Other community discussions indicate that since the check source is the master, the zone has to be the master. I was hoping that putting the hosts in the agent’s directory under zones.d would accomplish this move.

Still having this problem.

I don’t see any examples in Distributed Monitoring - Icinga 2 of the Agent pinging hosts for host-alive status using a check_command, just the agent executing Services or the master doing the ping. I don’t even see an example of host objects in the Agent’s directory under zones.d. Are these not capabilities of the Agent?

I have a

const ZoneName = "HQ"

and also:

object Zone "master" {
  endpoints = [ "icinga.ourhq.com" ] //array with endpoint names
}

object Zone "remote-agent.hostname.com" {
  endpoints = [ "remote-agent.hostname.com" ]

  parent = "master" //establish zone hierarchy
}

Yet the zone object identifies itself as “HQ”, not “master”. Is that a problem? Should they match? If I dump the object, both the object name and __name are “HQ”, and the endpoint is the same hostname as in the assignment above.

As an experiment, I changed NodeName to “master”.

No matter how many times I restart and remove the satellite’s icinga2.state file, it still remembers the zone object “HQ”.

However, on the master, the icinga2 object list --type zone command yields:

Object 'master' of type 'Zone':

I had to hack the config files on the Agent to get it to say the same thing.

No change in behavior.

Curiously, when I’m on the Host in the UI and click Check now, the Next check field starts counting backwards, seemingly forever.

Hi @ken,

What happens if you remove the following lines?

zone = "master"
agent_endpoint = <remote-agent.hostname.com>

Hi ritzgu,

Thanks for the question.

If I remove agent_endpoint, services move to polling from the master. The services are written to pick up that value from the host.
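
A minimal sketch of that idea (assuming agent_endpoint is a host custom variable, vars.agent_endpoint, and using http as a stand-in check command):

apply Service "https" {
  check_command = "http"
  vars.http_ssl = true
  // execute the check from whichever endpoint the host names as its agent
  command_endpoint = host.vars.agent_endpoint
  assign where host.vars.agent_endpoint
}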

If I remove the zone, nothing changes. No ping from either master or agent.

Icinga2 in Agent mode does not seem fit for purpose. For some reason I just cannot get the master to command the Agent to use the host alive check.

I guess I could upgrade it to a satellite, or give the hosts a dummy check command and add a ping service check in its place.
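
As a rough sketch of that second option (names and address are placeholders; it assumes the agent_endpoint custom variable mentioned earlier):

object Host "monitored-device.example.com" {
  address = "192.0.2.10"
  check_command = "dummy"   // the host object itself is never actively pinged
  vars.agent_endpoint = "remote-agent.hostname.com"
}

apply Service "ping4" {
  check_command = "ping4"
  // run the ping from the agent instead of the master
  command_endpoint = host.vars.agent_endpoint
  assign where host.vars.agent_endpoint
}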

But honestly, if this functionality is disabled by design, the documentation should say so, not just hint at it.

I apologise if I’ve come into this thread somewhat late, and perhaps missed
the underlying purpose of what you’re trying to do, but I don’t understand the
concept of an Agent performing a host alive check.

An Agent is performing checks on itself, not any other machines (otherwise it
would be a Master or a Satellite).

“host alive” is a check to find out whether a machine is reachable.

Suppose the machine is turned off. Then it cannot perform the check, or return
the result, but another machine (a Satellite or a Master) could do so.

Suppose the machine is turned on and operating but has an upstream network
connectivity problem. Then the host alive check, locally on the Agent,
decides that it is contactable, but then cannot return the OK status to the
Master or the Satellite because of the network connectivity problem.

Either way, you end up with a machine which can tell you “I can see myself”
when this is true, but can’t inform you of anything else when there’s a
problem. This strikes me as not useful.

So, what use case do you have for wanting an Agent to perform a host alive
check on itself?

Antony.

Antony,

Thank you for checking in.

"So, what use case do you have for wanting an Agent to perform a host alive
check on itself?"

I don’t have that use case, which, as you say, is not an understandable concept. I’m trying to use the Agent to perform a host alive check on other hosts.

"An Agent is performing checks on itself, not any other machines (otherwise it
would be a Master or a Satellite)."

Well, I do have it performing checks on other hosts’ services successfully. But perhaps that was not what was intended; from what you say, I need a Satellite.

From the documentation at:
https://icinga.com/docs/icinga-2/latest/doc/06-distributed-monitoring/#roles-master-satellites-and-agents

It says:
An agent node only has a parent node.

  • An agent node will either run its own configured checks or receive command execution events from the parent node.

I can’t find anything in there that says “on itself”, just where the checks are configured: on the parent node, or as its own configured checks.

I think this is a documentation bug.

“I don’t have that use case, which, as you say, is not an understandable
concept. I’m trying to use the Agent to perform a host alive check on other
hosts.”

I think for me the question then is “how do you want those checks results to
be reported in Icingaweb2?”

If you’re happy for the host alive checks to be reported against the Agent
which is performing the checks, then yes, use the ping service check or
similar.

If, however, you want the host alive check results to be displayed against the
host which is being checked, then it needs to be defined as a Host in Icinga,
which means it is either a Master, a Satellite or an Agent. The obvious
choice is an Agent, which then turns the machine checking it into a Satellite.

You only get a row (or a column, depending on your screen layout) for a
machine’s check results in Icingaweb2 if it is defined as a Host.

Otherwise it’s just the result of a check belonging to the machine doing the
check, and I would find it strange to have “Machine X is down” reported as
part of the results for Machine Q. I want Machine X to be there in its own
right.

Antony.

Antony,

Thanks for the questions, I hope I can put my goals into perspective.

"I think for me the question then is “how do you want those checks results to
be reported in Icingaweb2?”

I want the Agent/Satellite monitored Hosts and their Services displayed in IcingaWeb2 the same as any other system: Host with correct hostname, with actual IP address, and each Host with its associated Services.

"If, however, you want the host alive check results to be displayed against the
host which is being checked, then it needs to be defined as a Host in Icinga,
which means it is either a Master, a Satellite or an Agent. The obvious
choice is an Agent, which then turns the machine checking it into a Satellite."

I’m sorry I don’t understand this. I monitor many Hosts from the master, without turning them into Agents. If I have a fully fledged Satellite, why should I need to change each Host it monitors into an Agent?

And

“then turns the machine checking it into a Satellite”

… in my case I can’t prove this as true. My Agent won’t ping, or return Host status results to the master. I just turned on debugging, and it appears it never receives the remote execute command for a Host status. It does receive (and execute, and report) for any Service on any of the several Hosts it monitors.

Antony,

Thanks for the questions, I hope I can put my goals into perspective.

"I think for me the question then is “how do you want those checks

results to be reported in Icingaweb2?”

I want the Agent/Satellite monitored Hosts and their Services displayed in
IcingaWeb2 the same as any other system: Host with correct hostname, with
actual IP address, and each Host with its associated Services.

That, for me, means you have to have these machines defined as Hosts in Icinga.

I’d be happy for someone else to step in and clarify if this expectation /
understanding on my part is incorrect.

"If, however, you want the host alive check results to be displayed

against the host which is being checked, then it needs to be defined as a
Host in Icinga, which means it is either a Master, a Satellite or an
Agent. The obvious choice is an Agent, which then turns the machine
checking it into a Satellite."

I’m sorry I don’t understand this. I monitor many Hosts from the master,
without turning them into Agents. If I have a fully fledged Satellite, why
should I need to change each Host it monitors into an Agent?

How are you connecting to such Hosts, then? Perhaps by SSH, in which case
you’re right, they don’t need to be Agents running Icinga. If you don’t need
service checks, then just the Host Alive check is sufficient, but you said “I
want the Agent/Satellite monitored Hosts and their Services displayed in
IcingaWeb2” so you clearly do want to include Service Checks.

And

“then turns the machine checking it into a Satellite”

… in my case I can’t prove this as true. My Agent won’t ping, or return
Host status results to the master. I just turned on debugging, and it
appears it never receives the remote execute command for a Host status. It
does receive (and execute, and report) for any Service on any of the
several Hosts it monitors.

Ah, sorry - bad phrasing on my part - I didn’t mean that it automatically
converts the machine into a Satellite - I meant that it conceptually converts
it into a Satellite according to the documentation, because it is then sitting
in between a Master and the monitored Agents.

https://icinga.com/docs/icinga-2/latest/doc/06-distributed-monitoring/

Antony.

Hi,
sorry to jump in like this. Do you know if it’s possible to just upgrade the Agent to a Satellite by running the icinga2 node wizard again?

aaron,

I’ll find out soon, but I think running the node wizard again is fine, as long as you don’t have duplicate or conflicting settings in parts of the config it doesn’t modify.

Antony,

Thanks for helping me sort out this Agent vs Satellite thing.

That, for me, means you have to have these machines defined as Hosts in Icinga.

Yes, from my first post:

In each host object in /etc/icinga2/zones.d/<remote-agent.hostname.com>/hosts.conf:

How are you connecting to such Hosts, then?

Services connect using a variety of protocols such as HTTPS, port-open checks, some custom REST calls, etc.

I manually converted the relationship to a satellite and found a missing template reported in the log file /var/lib/icinga2/api/zones-stage//startup.log. After fixing that, pinging hosts started working from the now-satellite host. I’m not sure that log file existed when it was an agent, or whether that was even the problem as an agent, since it didn’t have config errors before and it was getting all execution commands from the master, which had a view of all the templates needed.
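
For anyone following along, a sketch of the layout as I understand it now (simplified; host name and address are placeholders, and the zone = "master" override from my first post is dropped so the satellite owns the check):

// /etc/icinga2/zones.d/remote-agent.hostname.com/hosts.conf on the master
object Host "monitored-device.example.com" {
  address = "192.0.2.10"
  check_command = "hostalive4"
  // no zone or command_endpoint override: the host belongs to the
  // remote-agent.hostname.com zone, so the satellite schedules and runs the
  // hostalive check itself and reports the result up to the master
}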

To answer a question I had previously: the NodeName values must match on both sides of the configuration, which is documented in the Conventions section:

https://icinga.com/docs/icinga-2/latest/doc/06-distributed-monitoring/#conventions
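
For reference, a sketch of how the naming conventions line up (hostnames taken from the examples above; simplified):

// constants.conf on the master
const NodeName = "icinga.ourhq.com"            // this node's FQDN / certificate CN
const ZoneName = "master"                      // the zone this node belongs to

// constants.conf on the remote node
const NodeName = "remote-agent.hostname.com"
const ZoneName = "remote-agent.hostname.com"

// zones.conf, with identical Endpoint/Zone names on both nodes
object Endpoint "icinga.ourhq.com" {
}

object Zone "master" {
  endpoints = [ "icinga.ourhq.com" ]
}

object Endpoint "remote-agent.hostname.com" {
}

object Zone "remote-agent.hostname.com" {
  endpoints = [ "remote-agent.hostname.com" ]
  parent = "master"
}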