Condition Service Check in Cluster

Here are all my configs. Can you please give me some advice?

---------------------Adding the appropriate service to Icinga2----------------

Step1:

Create ClusterAmazonSSMAgent service block for AmazonSSMAgent service

apply Service "ClusterAmazonSSMAgent" {
  check_command = "service-windows"
  enable_notifications = false
  vars.service_win_service = "AmazonSSMAgent" // access the host custom vars
  command_endpoint = host.vars.client_endpoint

  assign where host.vars.SSMAgent == "AmazonSSMAgent" && host.vars.client_endpoint
}

Step2:

Updated both host config files with var attributes

object Host "******-01.#######ANDCO.AWS" {
  check_command = "hostalive"
  address = "########"
  vars.SSMAgent = "AmazonSSMAgent"
}

object Host "******-02.#######ANDCO.AWS" {
  check_command = "hostalive"
  address = "########"
  vars.SSMAgent = "AmazonSSMAgent"
}

With this, I can see the AmazonSSMAgent service up and running in Icinga 2, with notifications disabled.

----------------------Adding BP---------------------

Step3:

I added the BP service block to the same service config file as in Step 1.

apply Service "BP CluserService" {
  check_command = "icingacli-businessprocess"
  enable_notifications = true

  check_interval = 1s
  retry_interval = 30s

  command_endpoint = "<Icinga2 Master(Linux box)>" // BP is an Icinga Web 2 module, so the command must run where Icinga Web 2 is installed.

  vars.icingacli_businessprocess_config = "ClusterAmazonSSMAgent" // The filename of your BP config, without the extension.
  vars.icingacli_businessprocess_process = "ClusterAmazonSSMAgent"

  assign where host.vars.bpssm == "AmazonSSMAgent"
}

Step4: Added another dummy host file with a different name, as shown below

object Host "*****-cluster.#######ANDCO.AWS" {
  check_command = "dummy"
  check_interval = 24h
  retry_interval = 24h
  enable_notifications = true
  vars.bpssm = "AmazonSSMAgent"
  vars.dummy_text = "Cluster dummy host"
  vars.dummy_state = 0
}

Step5: Added BP config file on Icinga2 Master

/etc/icingaweb2/modules/businessprocess/processes# cat ClusterAmazonSSMAgent.conf

ClusterAmazonSSMAgent = ******-01.#####ANDCO.AWS;ClusterAmazonSSMAgent | ******-02.#######ANDCO.AWS;ClusterAmazonSSMAgent
display 1;ClusterAmazonSSMAgent;ClusterAmazonSSMAgent

I restarted icinga2 and tested the scenarios, such as shutting down both services, and I have not seen any email notification.

My expectation is that a notification should only be sent when both services are shut down. If only one of the services is shut down, we should not get any email notification.

Here are the BP screenshots.

Do you have questions about my service config? Can you please advise?

Please show where the services are executed now aka “Check source”.
You can find this in Icinga Web 2 under “Check execution”.
My guess: They are still executed on the master/linux server.
You have no variable named client_endpoint on your hosts. Is the Icinga Agent even installed on the two Windows servers?

Also, please format your code blocks using Markdown. Check here: Create topics and master Markdown formatting

Many thanks for your response…
We have agents installed on both Windows servers. I see the check execution happening on the remote server; it is not happening on the master/Linux server.
Here is the screenshot.

In the host object definition I have defined the variable vars.SSMAgent = "AmazonSSMAgent". I’m really not sure how I can make this work :neutral_face:

Then I suggest you compare the service applyrule/template of the checks that are executed on the client with the SSMAgent ones.

The command_endpoint = host.vars.client_endpoint line does not seem to work, because you either need that variable on the host, filled with the name of the host (if it is the same as the endpoint name), or you put command_endpoint = host.name in the service template directly.
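
A minimal sketch of the first option, assuming the endpoint name is identical to the host object name. The host name and address below are placeholders, not your real objects:

```
// Hypothetical host; replace the name and address with your own.
object Host "agent-host.example.com" {
  check_command = "hostalive"
  address = "192.0.2.10"
  vars.SSMAgent = "AmazonSSMAgent"
  vars.client_endpoint = name // only valid if host name == endpoint name
}

apply Service "ClusterAmazonSSMAgent" {
  check_command = "service-windows"
  enable_notifications = false
  vars.service_win_service = "AmazonSSMAgent"
  command_endpoint = host.vars.client_endpoint

  // The rule only matches hosts that actually define client_endpoint.
  assign where host.vars.SSMAgent == "AmazonSSMAgent" && host.vars.client_endpoint
}
```

For the second option you would drop vars.client_endpoint from the host and the assign expression, and set command_endpoint = host.name inside the apply rule instead.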

Here is an example from the docs: Distributed Monitoring - Icinga 2

The variable vars.SSMAgent has nothing to do with check execution. You just use it to bind the service apply rule to the hosts that have this variable.

I am still wondering: the Business Process module should be able to do the conditional Windows service check, but it is still not working… :neutral_face:

A business process’ condition can only work if the checks inside the BP are working correctly.
Are they?

That might be right… the condition inside the BP might not be working. I just want to restate my question here:

Suppose the same service is running on two servers. I should only get a notification if the service is down on both servers; if it is still running on either server, no notification should be sent. It looks like the business process is not working. Apart from the business process, is there any other way to do this conditional service check across both servers?

I’m confused now.
At first, the service must not run on both servers at the same time.
Now it can run on both servers, but must not be down on both simultaneously.

Both cases can be checked with a business process.
For the first one, create a BP with both services connected by an AND.
For the second, connect them with an OR.
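
As a rough sketch of the two variants in the BP config file format: `&` joins the children with AND, `|` joins them with OR. The host names and node names here are placeholders, not your real objects:

```
# AND: the node is only OK while both services are OK
ssm_and = server-a.example.com;ClusterAmazonSSMAgent & server-b.example.com;ClusterAmazonSSMAgent
display 1;ssm_and;Cluster SSM agent (AND)

# OR: the node stays OK as long as at least one service is OK
ssm_or = server-a.example.com;ClusterAmazonSSMAgent | server-b.example.com;ClusterAmazonSSMAgent
display 1;ssm_or;Cluster SSM agent (OR)
```

With the OR variant the node should only turn critical once both services are critical, which is the "notify only when both are down" case.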

Sorry for the confusion, and apologies for bothering you with this… I have been struggling with it for days and weeks now.

For example: we have the ClusterAmazonSSMAgent Windows service running on both servers A and B (note: A and B are clustered). If the ClusterAmazonSSMAgent service is down on both servers, we should get a notification. If the service is down on only one of the servers, we shouldn’t get a notification. That is my requirement.

To test the notification part, I manually shut down both services, and I see critical in the business process. I have tried with AND and OR, but I am still not receiving any notification.
Here are the screenshots.

After you have your BP set up, you need to create a service that checks the BP.
See the ITL for the command. The check has to run on the webserver with the BP-Module installed.

Cheers,
Carsten

You are talking about this, right?

object Service "ClusterAmazonSSMAgent" {
  host_name = "*********01*********ANDCO.AWS"
  check_command = "service-windows"
  vars.service_win_service = "AmazonSSMAgent"

  enable_notifications = false

  check_interval = 60s
  retry_interval = 15s
  command_endpoint = "*******01***********ANDCO.AWS"
}



object Service "ClusterAmazonSSMAgent" {
  host_name = "*********02*********ANDCO.AWS"
  check_command = "service-windows"
  vars.service_win_service = "AmazonSSMAgent"

  enable_notifications = false

  check_interval = 60s
  retry_interval = 15s
  command_endpoint = "*********02*********ANDCO.AWS"
}


object Service "BP CluserService" {
  host_name = "******-cluster.******andco.aws"

  check_command = "icingacli-businessprocess"
  enable_notifications = true

  check_interval = 1s
  retry_interval = 30s

  command_endpoint = "******.*****.int" // BP is an Icinga Web 2 module, so the command must run where Icinga Web 2 is installed.

  vars.icingacli_businessprocess_config = "ClusterAmazonSSMAgent" // Put in here the filename of your BP config without the extension.
  vars.icingacli_businessprocess_process = "ClusterAmazonSSMAgent"
}

/etc/icingaweb2/modules/businessprocess/processes# cat ClusterAmazonSSMAgent.conf
### Business Process Config File ###
#
# Title           : CluserSSMService
# AddToMenu       : no
# Statetype       : hard
#
###################################

ClusterAmazonSSMAgent = aws-dpro-01.NICHOLASANDCO.AWS;ClusterAmazonSSMAgent | aws-dpro-02.NICHOLASANDCO.AWS;ClusterAmazonSSMAgent
display 1;ClusterAmazonSSMAgent;ClusterAmazonSSMAgent

Yes, this looks good, though I would not use a check interval of 1s.

Now you only need a notification template and a notification apply rule as well as a user to get notifications.
https://icinga.com/docs/icinga2/latest/doc/04-configuration/#notificationsconf
https://icinga.com/docs/icinga2/latest/doc/03-monitoring-basics/#using-apply-notifications

This is really important. There are in fact very few reasons to have a very short check interval. And there is a thing called “death by monitoring”. Don’t overdo it. Icinga is availability monitoring after all, and there’s no need to know about a problem within seconds when your admins take their time to react anyway.

Yeah, I have changed the check interval to 60s for BP_ClusterService.

As you said, I have to configure a notification template and apply it to the service check, right?

Here I should apply the notification to Service, right?

apply Notification "mail-noc" to Service {
  import "mail-service-notification"

  user_groups = [ "noc" ]

  assign where host.vars.notification.mail
}

I have noticed the apply rule shown above. I am just thinking through how I can apply it so that the notification is sent to the user groups only if this service is down on both hosts. Sorry for asking this. Can you show me an example, if you don’t mind? Many thanks in advance.

Please read through the two links I posted above concerning the notifications.
They should give a good introduction to the Topic.

It does not help to simply copy parts from the documentation; you also have to understand what you are doing.
The configuration has to reflect your monitoring system.

So if your hosts do not have a variable “notification.mail”, it is useless to assign something using that variable.
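
One possible sketch, assuming you want mails only for the BP check: bind the notification to the BP service by name instead of relying on a host variable. The user, its email address, and the notification name are made up for illustration; "mail-service-notification" is the template imported in the rule quoted above and is assumed to exist in your config:

```
// Hypothetical user and group; adjust to your own contacts.
object UserGroup "noc" {
  display_name = "NOC"
}

object User "noc-oncall" {
  email = "noc@example.com"
  groups = [ "noc" ]
}

apply Notification "mail-bp-cluster" to Service {
  import "mail-service-notification"

  user_groups = [ "noc" ]

  // Match only the service that checks the business process.
  assign where service.name == "BP CluserService"
}
```

Because the two ClusterAmazonSSMAgent services have enable_notifications = false, only the BP service would send mails, and with an OR node it only becomes critical once both member services are down.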

Here are some HowTo links to get you going: