Conditional Service Check in a Cluster

Hi Team,

We need some advice from you.

Suppose Service A is installed on both server1 and server2, but it shouldn’t run on both servers at the same time.

server1 and server2 are in a cluster. When we make server1 primary, Service A shouldn’t run on server2; when we make server2 primary, Service A shouldn’t run on server1.

We are getting a critical (false positive) alert for the server that isn’t running Service A. Is there any way to handle this kind of situation with a conditional check?

Apologies, and let me know if my question is not clear.

Presumably the cluster has a floating IP address which is on only one of the
two machines at any given time, and that is also the machine which should be
running the service.

If that is the case, create a host object in Icinga with the floating IP
address, and apply the service check to that machine and not to either of the
two real machines.

Just out of interest, what cluster management / failover software are you
using which works well with two machines in a cluster (rather than three), and
what does it do if the two machines lose connectivity with each other (split
brain)?

Regards,

Antony.

Antony,

We have been using Failover Cluster Manager. #####.CLSTR.***** is the cluster endpoint, and SQL-01 and SQL-02 are the servers under CLSTR. When I ping CLSTR I get a different IP, and when I RDP into CLSTR I land on either SQL-01 or SQL-02. We need to generate an alarm only if the services are down on both servers simultaneously.

In Icinga 2 I have added another host file with the CLSTR IP, name, etc. I can see the host in Icinga 2, but none of the service checks are able to connect to CLSTR. Here is the screenshot…

Based on the screenshots below, can you please advise?

The Business Process module is your friend. Create two services that check whether the service runs on each server. Then create a BP over them and attach a service check to it.


Many thanks for your help. I have been trying the Business Process module and am struggling a little to build the business process for the conditional service check. I am following the documentation. Apart from the documentation, can you please point me to a source or an example for this? Many thanks in advance.

I think the way with the Cluster-IP is the better solution here.
With the BP module you would still have to work out something that knows which is the active cluster node, and thus which service check must be OK and which may be critical. But maybe I’m not thinking straight, it’s still early ;).

With the cluster IP and the agent, my guess is that you will run into problems with the host certificate names, because the cluster IP always points to one of the nodes, and those already have the Icinga agent installed “under their own name”.
Here is a github issue for this, where the problem is discussed:

My solution path would be to use either SNMP or the NSClient for this. Install one of them on both servers and then monitor the Windows service via the cluster IP.
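A minimal sketch of the NSClient variant, assuming NSClient++ is installed on both nodes with its check_nt-compatible server enabled. It uses the "nscp" CheckCommand from the Icinga Template Library, which wraps check_nt. The host name, the address, the password and the Windows service name are all placeholders, not values from this thread.

```
// Sketch only: host name, address, password and service name are placeholders.
object Host "cluster-vip" {
  check_command = "hostalive"
  address = "192.0.2.10" // the floating cluster IP (documentation address used here)
}

object Service "ServiceA via cluster IP" {
  host_name = "cluster-vip"

  check_command = "nscp" // ITL command wrapping check_nt
  vars.nscp_variable = "SERVICESTATE"
  vars.nscp_params = "ServiceA" // Windows service name to query
  vars.nscp_password = "changeme" // must match the password configured in NSClient++
}
```

Because the check goes to whichever node currently holds the cluster IP, it only reports on the active node and does not depend on the agent certificate names.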

Hi,

I don’t have a good link, but I can give you an easy example.
Let’s say you have hostA and hostB with a service ClusterService that runs on either hostA or hostB.
Configure the service ClusterService on both hosts like this:

object Service "ClusterService" {
  host_name = "hostA"

  check_command = "whatever" // placeholder: the check that tests the clustered service
  enable_notifications = false // only the BP check should notify

  check_interval = 60s
  retry_interval = 15s
}
object Service "ClusterService" {
  host_name = "hostB"

  check_command = "whatever" // same check as on hostA
  enable_notifications = false

  check_interval = 60s
  retry_interval = 15s
}

These services will not send out any notifications, because we only want notifications from the cluster check provided by the Business Process module.

Now create a BP. I can only give you the configuration file (I don’t like creating them in the GUI).

### Business Process Config File ###
#
# Title           : ClusterService
# AddToMenu       : no
# Statetype       : hard
#
###################################
ClusterService = hostA;ClusterService | hostB;ClusterService
display 1;ClusterService;ClusterService

If you have a cluster with three or more nodes, you can change the ClusterService line to

ClusterService = 1 of: hostA;ClusterService + hostB;ClusterService + hostC;ClusterService

Now create a dummy host object for the cluster services:

object Host "ClusterHost" {
  check_command = "dummy"
  check_interval = 24h
  retry_interval = 24h
  enable_notifications = false

  vars.dummy_text = "Cluster dummy host"
  vars.dummy_state = 0
}

Now add a BP service for ClusterService to the host:

object Service "BP ClusterService" {
  host_name = "ClusterHost"

  check_command = "icingacli-businessprocess"
  enable_notifications = true

  check_interval = 1m
  retry_interval = 30s

  command_endpoint = "YourIcingaweb2Server" // BP is an Icinga Web 2 module, so the command must run where Icinga Web 2 is installed.

  vars.icingacli_businessprocess_config = "ClusterService" // File name of your BP config, without the extension.
  vars.icingacli_businessprocess_process = "ClusterService" // Name of the BP node to check.
}

That’s it. As long as the service runs on at least one of the two nodes (or one of X nodes), the BP service will be in state OK. If it is down on both, the BP service will turn critical and send out a notification.

Regards,
Carsten


Hi Carsten,

this is how I would do it as well, but only if the service needs to run on one of the nodes, regardless of which node it is.
But if I understood the opening post correctly, the service must run on the active node of the cluster and must not run on the stand-by node.

And this isn’t covered by a simple business process, because it does not know which of the nodes is the active one. At the very least, there needs to be another check that changes its state based on the node status, e.g. check the primary node for “active” and the stand-by node for “stand-by”.

@log1c

This was the initial question, and it can be covered with a BP. A check for whether the service runs on both servers at the same time can be done with check_multi.


Ah, I missed that :smiley:
Then your solution will work and is the easiest to implement :+1:

I was only focused on this part

That can also be done with a simple BP and negate :slight_smile:
Add these lines to the BP config:

SHOULDNOTHAPPEN = hostA;ClusterService & hostB;ClusterService
display 0;SHOULDNOTHAPPEN;SHOULDNOTHAPPEN

And add a service like this to the dummy host:

object Service "BP ShouldNotHappen" {
  host_name = "ClusterHost"

  check_command = "negate"
  enable_notifications = true

  check_interval = 1m
  retry_interval = 30s

  command_endpoint = "YourIcingaweb2Server" // BP is an Icinga Web 2 module, so the command must run where Icinga Web 2 is installed.

  vars.negate_ok = "CRITICAL" // AND node OK (service runs on both nodes) -> alert
  vars.negate_critical = "OK" // AND node CRITICAL (service not running on both nodes) -> fine
  vars.negate_substitute = true
  vars.negate_command = "/usr/bin/icingacli"
  vars.negate_arguments = [ "businessprocess", "process", "check", "--config", "ClusterService", "SHOULDNOTHAPPEN" ]
}

It is way too early :man_facepalming: :man_shrugging: :laughing:


@anon66228339 I am impressed. I will try the solution you posted today. Hopefully this will work. Many thanks for your help.

Many thanks all…


@anon66228339 - Apologies if I am doing something dumb.

I have been testing with the AmazonSSMAgent service on two separate Windows servers. This is a new service that we have never added to Icinga.

Step 1:

In the service configuration file, I added the following service blocks:

object Service "ClusterAmazonSSMAgent" {
  host_name = "*****-01.#####CO.AWS"
  check_command = "service-windows"

  enable_notifications = false

  check_interval = 60s
  retry_interval = 15s
}
object Service "ClusterAmazonSSMAgent" {
  host_name = "*****-02.#####CO.AWS"
  check_command = "service-windows"

  enable_notifications = false

  check_interval = 60s
  retry_interval = 15s
}

Step 2:

On the Icinga 2 server, under /etc/icingaweb2/modules/businessprocess/processes, I have added a ClusterAmazonSSMAgent.conf file with the following content:

ClusterAmazonSSMAgent = *****-01.######ANDCO.AWS;ClusterAmazonSSMAgent | *****-02.######ANDCO.AWS;ClusterAmazonSSMAgent
display 1;ClusterAmazonSSMAgent;ClusterAmazonSSMAgent

Step 3:

I have added a host file ******-cluster.######.AWS.conf with the content below (I am not sure which command I should use for check_command):

object Host "*****-cluster.#####CO.AWS" {
  check_command = "dummy"
  check_interval = 24h
  retry_interval = 24h
  enable_notifications = false

  vars.dummy_text = "Cluster dummy host"
  vars.dummy_state = 0
}

Step 4:

In the same file where I added the service blocks in step 1, I have added the following service block for the business process:

object Service "BP CluserService" {
  host_name = "*****-cluster.#####CO.AWS"

  check_command = "icingacli-businessprocess"
  enable_notifications = true

  check_interval = 1s
  retry_interval = 30s

  command_endpoint = "********.#####.int" // The Icinga 2 server name where Icinga Web 2 is installed.

  vars.icingacli_businessprocess_config = "ClusterAmazonSSMAgent" // File name of the BP config, without the extension.
  vars.icingacli_businessprocess_process = "ClusterAmazonSSMAgent"
}

I checked the Icinga 2 config and restarted Icinga 2.

I see an UNKNOWN state in Icinga 2…

[screenshot]

Can you please correct me if I am doing anything wrong?

I think your two normal services are missing the variable with the name of the Windows service they should check.

Carsten - I have tried that as well, but somehow the plugin is not working here. Actually, I have never used object Service; I always use apply Service when defining service blocks.
Here are the updates that I made:

object Service "ClusterAmazonSSMAgent" {
  host_name = "*****-01.#######ANDCO.AWS"
  check_command = "service-windows"
  vars.service_win_service = "AmazonSSMAgent"

  enable_notifications = false
  command_endpoint = host.vars.client_endpoint

  check_interval = 60s
  retry_interval = 15s
}
object Service "ClusterAmazonSSMAgent" {
  host_name = "*****-02.#######ANDCO.AWS"
  check_command = "service-windows"
  vars.service_win_service = "AmazonSSMAgent"

  enable_notifications = false
  command_endpoint = host.vars.client_endpoint

  check_interval = 60s
  retry_interval = 15s
}

I am getting the error below…
[screenshot]

You can also use an apply rule; mine was only an example.
The only important thing in the BP config is that you put in the HOSTNAME;SERVICENAME pairs.
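For illustration, an apply rule version of the per-node service might look roughly like this. It is a sketch under assumptions: vars.cluster_member is a hypothetical custom variable that would have to be set on each cluster node's Host object, and host.vars.client_endpoint is taken from the config posted above. The service name must still match the SERVICENAME used in the BP config.

```
// Sketch only: vars.cluster_member is a hypothetical custom variable
// that you would set to true on each cluster node's Host object.
apply Service "ClusterService" {
  check_command = "service-windows"
  vars.service_win_service = "AmazonSSMAgent" // Windows service to check

  command_endpoint = host.vars.client_endpoint // run the check on the agent
  enable_notifications = false // only the BP check should notify

  check_interval = 60s
  retry_interval = 15s

  assign where host.vars.cluster_member == true
}
```

With this rule every matching host gets a service with the same name, so the BP config can keep referencing one HOSTNAME;ClusterService pair per node.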

You are using a check plugin of the Windows agent, but trying to execute it on the monitoring server itself (Linux).
If the two hosts “*****-01.#####CO.AWS” & “*****-02.#####CO.AWS” have the Icinga Agent installed, add a command_endpoint = ... to the service config, so that the checks for the running service are executed on the hosts themselves.

Thanks for your response. I have added the same service from the 01 and 02 servers to Icinga 2, and I can see the service in Icinga 2. I updated the BP config as per your recommendation, and I can now see the services in the business process.

I am testing the scenario where I shut down the service on both servers, which should send a notification. Unfortunately, this is not happening. Here is the screenshot of the business process for the service checks.

Apologies for bothering you so much with this issue. I got this request from my boss and am struggling to resolve it; we are doing this for a bunch of SQL servers in my environment :neutral_face: Based on the screenshot, do you see anything weird?

If you have now just moved the checks to the Icinga 2 server, this won’t work either.
Now you are checking for a Windows service on the Icinga 2 master Linux machine…

Please show the config of the services.

To play around with just the BP, you can click on “Unlock Editing”, then click the “Wand” icon and simulate states for the checks.