Problem with agent and satellite

Good morning,

I’m having a problem with the connection between my agent and my satellite.
My master connects to my satellite via a bastion, so the checks from the master to the satellite are done with check_by_ssh (because of the bastion). That works very well, except that the machine is always Down because the ping is not done; the other checks work fine.
The problem is between the satellite and the agents: the check is done via the Icinga agent but always remains in “Pending…”

If you can help me :slight_smile:

Thanks

Regards,
Kevin.

This is my master zones.conf:
zone.conf-master

Satellite zones.conf:
zone.conf-satellite

Agent zones.conf:
zone.conf-agent

When the “parent” (i.e. the Satellite) is not healthy, the checks that depend on it will not execute and report back to the master.

A simple fix for that is to change the host check for the satellite to either a tcp check on port 22 or just the ssh check.

Since you know those work, the satellite will get a healthy status, and then the system will schedule the checks on the agents linked to the satellite.

Thanks for the reply.
I see…
And how can I do that?
Do I have to add “port = 22” to my satellite Endpoint object in my Master zones.conf? And/or must I delete all checks done via the Icinga agent?

Regards,

No, you have to change the check that’s used for checking whether the host is up or not. Normally you use hostalive but in your case you would replace it with tcp or ssh.
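
To make that concrete, here is a minimal sketch of a Host object whose host check uses the ITL “tcp” command instead of “hostalive”. The host name and address are hypothetical placeholders, and it assumes the master can open a TCP connection to port 22 of the satellite; if that only works through the bastion, this check may fail as well.

```icinga2
// Hypothetical host object for the satellite, defined on the master.
object Host "satellite-01" {
  // Placeholder address (documentation range), replace with the real one.
  address = "192.0.2.20"

  // Host check: probe a TCP port instead of pinging (hostalive).
  check_command = "tcp"
  vars.tcp_port = 22

  // Alternative: check_command = "ssh" also tests the SSH service itself.
}
```

Nothing changes in the Endpoint object for this; the port there is the Icinga cluster port, not the port of the host check.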

But besides that it would work way better if you had a satellite behind or on your bastion host. This is what satellites are for.

So let me get this straight:
If the check_by_ssh check between the Master and the Satellite is healthy, the state of the Satellite machine should be UP and the agents would work fine?

The check between the Satellite and the Agent is done via the Icinga agent (behind the bastion).

I changed the check_command for the Satellite. I replaced “hostalive” with “tcp” and then with “ssh”, but it still doesn’t work :confused:

Sorry, my reply was confusing because I mixed up several answers into one.

If you can ping your satellite from the master, please stay with hostalive. If not, you’ll need something else to check whether the satellite is reachable. This might be the cluster-zone check, which gives you several benefits: it shows whether the satellite is reachable, and it shows whether both are connected via the Icinga cluster interconnect. You can use it to check whether the agent is connected to the satellite, too.

https://icinga.com/docs/icinga2/latest/doc/19-technical-concepts/#cluster-zone
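
A rough sketch of such a health check, using the ITL “cluster-zone” command. The host name “satellite-01” and the zone name “satellite” are assumptions for illustration; use the names from your own zones.conf.

```icinga2
// Defined in the master zone, so the check is executed by the master
// (no command_endpoint) and reports on the connection to the child zone.
object Service "cluster-health" {
  host_name = "satellite-01"        // hypothetical host object name
  check_command = "cluster-zone"

  // Name of the Zone object to test; assumed to be "satellite" here.
  vars.cluster_zone = "satellite"
}
```

The same pattern can be used for an agent by pointing vars.cluster_zone at the agent’s zone and defining the service where the agent’s parent (the satellite) executes it.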

You don’t need check_by_ssh when running checks on the satellite. As long as it is connected to the master, the master can tell it to run checks or execute plugins there. check_by_ssh is only used for hosts where you can’t install Icinga.

Does the master know about the agent? Do you get a response when you run the following on the master:

icinga2 object list --type endpoint --name "agent-01"
icinga2 object list --type zone --name "agent-01"

Don’t forget that you need all three for the checks to run: endpoint, zone and host object.
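
As an illustration, the three objects for one agent could look roughly like this. The name “agent-01” is taken from the commands above; the address and the parent zone name “satellite” are assumptions.

```icinga2
// Endpoint and Zone usually live in zones.conf.
object Endpoint "agent-01" {
}

object Zone "agent-01" {
  endpoints = [ "agent-01" ]
  parent = "satellite"              // assumed name of the satellite zone
}

// The Host object belongs to the satellite zone, e.g. on the master under
// zones.d/satellite/, so that the satellite schedules its checks.
object Host "agent-01" {
  check_command = "hostalive"
  address = "192.0.2.30"            // placeholder address

  // Convention from the Icinga docs: services can then use
  // command_endpoint = host.vars.agent_endpoint
  vars.agent_endpoint = name
}
```

If any of the three is missing on a node that needs it, checks for that agent can stay in Pending.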

OK, I’ll stay with “hostalive” for the Satellite.

Yes, but I will check this later (once all services are connected ^^).

Yes, I get output! The Master knows the Agent, but the Agent is always Pending…

Really? I used “check_by_ssh” precisely because of the Bastion… Or can I use the Icinga agent normally? Last time I tested that, the checks didn’t work… that’s why I currently use check_by_ssh.

To summarize, I have two problems:

  • Checks between the Master and the Satellite (behind the bastion) work fine, but the Satellite state is always Down
  • Checks between the Satellite and the Agent are Pending…

Thanks
Regards,

I access the Satellite via “check_by_ssh” because I use a proxy, and ssh can bypass this with ProxyCommand.

You need to have an active connection between master and satellite via the Icinga cluster interconnect (normally port 5665/tcp) for the satellite to work. check_by_ssh is really just used when you can’t install Icinga or can’t establish a link via the cluster interconnect. In that case you don’t need to install Icinga there at all, because you can’t control Icinga via check_by_ssh.
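
For reference, a sketch of how the connection side is expressed in zones.conf. Endpoint names and the address are placeholders; whichever node has the “host” attribute set for its peer is the one that tries to open the connection, so with a bastion in between it may have to be the satellite that connects to the master rather than the other way round.

```icinga2
// zones.conf on the master (sketch, names are assumptions)
object Endpoint "master-01" {
}

object Endpoint "satellite-01" {
  host = "192.0.2.20"   // placeholder; setting host makes the master connect
  port = 5665           // cluster interconnect port (this is the default)
}
```

Only one side should initiate the connection; leave “host” unset on the side that is supposed to wait.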

The reason I’m encouraging you to use cluster-zone is that it helps with debugging during the initial connect. You should always add it anyway, and if you add it right away instead of waiting until everything works, it can help you get there.

Hey!
Sorry for the late answer. I’ve been searching around on Icinga and I was able to fix the problem at the Master and Satellite level! :smiley:
I even managed to connect the agent to the Satellite via the Icinga agent (I don’t know by what miracle). On the other hand, I added a second Agent to the Satellite via the Icinga agent, and this one still remains in Pending…

I really don’t understand how it’s possible but the two agents have exactly the same configuration. One of them works and the other one doesn’t… very strange.

Regards,

Could you give a bit more detail on what you changed? Maybe we can help you with the other node, or at least it would help other users with a similar problem.

Frankly, I don’t really know… I changed the check_command to “ping4” and then put it back to normal.
But I think the biggest change was setting the IP address of the Master in the zones.conf of the Satellite, and likewise the IP address of the Satellite in the zones.conf of the Agents.

I tried to do the same thing for the new Agent but nothing changes… There’s too much mystery

If someone has an explanation please… I don’t know where else to look. :cry:

Looks a bit like you have issues with firewalls / NAT so your hosts can’t reach each other. Can you check if port 5665 is reachable from the other hosts?

Yes, port 5665 is reachable from the other hosts. And between Agent1, Agent2 and the Satellite, ping works fine :confused:

You could search the logs on all the hosts for signs of connection problems.

Other things to try:

  • Restart Icinga on all the nodes. Maybe you changed the configuration and just forgot to restart / reload.
  • Use icinga2 object list to review whether all corresponding endpoint, zone and host objects are recognised on the master, the satellite and the agent. Remember: the master needs to know about all objects. The satellite needs to know about the master, itself and all agents that connect to it. The agent needs to know about itself and about the satellite.
  • Review the logs once again for hints of connections not working, e.g. both ends are waiting for a connection but neither of them is actively connecting.
  • Use the cluster-zone check to make sure that the connection is finally made.
  • Check if the agent has a valid certificate.
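
As a sketch of the second point, this is roughly what the satellite would need in its zones.conf for two agents. All names and the address are hypothetical; it assumes the satellite connects to the master and the agents connect to the satellite, as described earlier in the thread.

```icinga2
// zones.conf on the satellite (sketch)
object Endpoint "master-01" {
  host = "192.0.2.1"    // placeholder; the satellite connects to the master
}
object Endpoint "satellite-01" {
}
// No host attribute here: the agents open the connection themselves.
object Endpoint "agent-01" {
}
object Endpoint "agent-02" {
}

object Zone "master" {
  endpoints = [ "master-01" ]
}
object Zone "satellite" {
  endpoints = [ "satellite-01" ]
  parent = "master"
}
object Zone "agent-01" {
  endpoints = [ "agent-01" ]
  parent = "satellite"
}
object Zone "agent-02" {
  endpoints = [ "agent-02" ]
  parent = "satellite"
}
```

Comparing this against the output of icinga2 object list on the satellite shows quickly whether a newly added agent is missing there.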

Oh great, thank you very much! :smiley:
I used the “icinga2 object list” command everywhere and could see there was an error on the satellite. In fact, the satellite had kept my configuration from yesterday, which was working, and refused to update to the new configuration.
Once the error was corrected, the new configuration was applied and everything works :smiley:

Thanks again, my friend!

Regards,


You’re very welcome. Have fun with Icinga and this community. :slight_smile:
