Distributed monitoring in compute cluster

This is not really about a problem; it's more that I'm seeking some advice before I get going, so I don't create a mess.

I have a working Icinga 2 setup with a number of nodes reporting in via what used to be called passive checks; this works fine. I am now building a compute cluster with Slurm as the workload manager; each node boots via PXE into Debian 12.

When I set up a new node, I use icinga2 node wizard, and it asks for the common name, which will be the FQDN of the node. However, the compute nodes are diskless servers, so the agent has to be configured in the shared boot image, and the individual node's name isn't known at the time the image is built. The question then is: is this possible at all? Can I just use the same common name for all the nodes, or will that cause a conflict somewhere?
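For reference, this is roughly what the wizard ends up writing on one of my existing agents (hostnames are just examples); the certificates under /var/lib/icinga2/certs/ are also named after that CN, which is why I suspect a shared CN would make all compute nodes look like the same endpoint to the master:

```
# /etc/icinga2/constants.conf (excerpt, example hostname)
const NodeName = "node01.example.com"
const ZoneName = "node01.example.com"

# /etc/icinga2/zones.conf (excerpt)
object Endpoint "master.example.com" {
    host = "master.example.com"
    port = "5665"
}

object Zone "master" {
    endpoints = [ "master.example.com" ]
}

object Endpoint "node01.example.com" {
}

object Zone "node01.example.com" {
    endpoints = [ "node01.example.com" ]
    parent = "master"
}
```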

Agent nodes also have their own unique zone. By convention you must use the FQDN for the zone name. Details can be found here.
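On the master side that means one Endpoint/Zone pair per agent in zones.conf, named after the agent's FQDN, along these lines (placeholder hostname):

```
object Endpoint "node01.example.com" {
    // no "host" attribute needed if the agent connects to the master
}

object Zone "node01.example.com" {
    endpoints = [ "node01.example.com" ]
    parent = "master"
}
```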

But it doesn’t answer the question of how to monitor diskless nodes in a netboot cluster with a read-only image. We’re struggling with this too, so is there any update on this case?

  • Use the agent, but auto-configure it on every boot via the Director self-service API (a rough non-Director sketch of the boot-time setup follows after this list)
  • Use the Director to generate the hosts in Icinga 2 and then use check_by_ssh or SNMP
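For the first option, the same idea also works without the Director: pre-create a ticket for each node on the CA master (icinga2 pki ticket --cn <fqdn>) or use on-demand CSR signing, and run a non-interactive node setup from a boot script. A rough sketch only, assuming the hostnames, paths and ticket handling below, and that /etc/icinga2 and /var/lib/icinga2 are writable on the diskless node (e.g. tmpfs/overlay):

```sh
#!/bin/sh
# Boot-time Icinga 2 agent setup on a diskless node.
# Hostnames, paths and the ticket source are examples, not a fixed recipe.

FQDN="$(hostname -f)"                  # node name assigned via DHCP/DNS at boot
MASTER="icinga-master.example.com"     # config/CA master
TICKET="$(cat /run/icinga2.ticket)"    # ticket delivered to the node beforehand

mkdir -p /var/lib/icinga2/certs

# Fetch the master's certificate so the agent can trust it
icinga2 pki save-cert \
    --host "$MASTER" --port 5665 \
    --trustedcert /var/lib/icinga2/certs/trusted-parent.crt

# Non-interactive equivalent of "icinga2 node wizard"
icinga2 node setup \
    --cn "$FQDN" \
    --zone "$FQDN" \
    --endpoint "$MASTER" \
    --parent_host "$MASTER" \
    --parent_zone master \
    --ticket "$TICKET" \
    --trustedcert /var/lib/icinga2/certs/trusted-parent.crt \
    --accept-commands --accept-config \
    --disable-confd

systemctl restart icinga2
```

The master still needs a matching Endpoint/Zone object (and a Host object) per node, so the node names have to be known on the master in any case.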

I don’t want to use the Director, so could I use wildcard certificates instead?
Another option for me would be to pre-generate all the certificates and mount the proper one on each node via NFS. What do you think?
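To be concrete about the NFS idea, this is roughly what I had in mind for pre-generating the certificates on the CA master (node names, count and export path are just examples); each diskless node would then mount its own directory at boot and copy the three files into /var/lib/icinga2/certs/:

```sh
#!/bin/sh
# On the CA master: pre-generate one signed certificate per compute node.
# Node names and the NFS-exported path are examples.

for i in $(seq -w 1 64); do
    fqdn="node${i}.cluster.example.com"
    dir="/srv/icinga2-certs/${fqdn}"   # exported via NFS, one directory per node
    mkdir -p "$dir"

    # Create the node's key and CSR, then sign the CSR with the Icinga CA
    icinga2 pki new-cert --cn "$fqdn" \
        --key "$dir/$fqdn.key" --csr "$dir/$fqdn.csr"
    icinga2 pki sign-csr --csr "$dir/$fqdn.csr" --cert "$dir/$fqdn.crt"

    # The node also needs the CA certificate to verify the master
    cp /var/lib/icinga2/ca/ca.crt "$dir/ca.crt"
done
```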