How to use the result of one service to assign another

I have one relatively simple service here:

apply Service "ccp_is_in_recovery_status" {
  import "pg_service_template" 

  vars.pg_action   = "ccp_is_in_recovery_status"

}

It returns a value of 1 if a PostgreSQL database is a replica and a value of 2 if it is a primary.

I have another service that monitors database replica delay:

apply Service "ccp_replication_lag_size_bytes" {
  import "pg_service_template"

  vars.pg_action   = "ccp_replication_lag_size_bytes"

  if (vars.ccp_replication_lag_size_bytes_alert == "true") {
    vars.pg_compare = "1"

    // Default thresholds (512 MiB / 1 GiB), overridable per host.
    vars.pg_warning = "536870912"
    if (host.vars.ccp_replication_lag_size_bytes_warning != "") {
      vars.pg_warning = host.vars.ccp_replication_lag_size_bytes_warning
    }
    vars.pg_critical = "1073741824"
    if (host.vars.ccp_replication_lag_size_bytes_critical != "") {
      vars.pg_critical = host.vars.ccp_replication_lag_size_bytes_critical
    }
  }

  assign where host.vars.pg_primary_with_streaming_replica == "true" 
}

I currently assign the replication lag service by checking a host variable, pg_primary_with_streaming_replica. However, I would like to automate this further so we don't have to update the Icinga host configuration every time there is a database failover. I just want the primary to have the replication monitoring, and have the service switch between systems depending on the recovery status.

Is it possible to retrieve the result of ccp_is_in_recovery_status and use it as an "assign where" criterion? I tried retrieving it with get_service(), but there doesn't seem to be a simple numerical result field that just contains the value. The closest I've found is

last_check_result -> output = "ccp_is_in_recovery_status OK: ccp_is_in_recovery_status: 1 "

Moving the service around is, in my eyes, a bad design. What I typically do is have the same service on both nodes: on the active one it runs the check, and on the passive one it just returns a text saying that active checks run on the other node, perhaps with some (empty) perfdata so a graph is still drawn. If needed, you can use something like Advanced Topics - Icinga 2 to make the configuration more dynamic.

I'd be happy to do something like this, but how do you determine which one is active/passive? Is that something you set in the hosts file and have to update on a change? That's what I'm trying to avoid.

I'll look at your doc reference to see if I can figure it out from there, but I figured I'd ask as well since it wasn't clear how after my initial reading.

How to determine active/passive depends on the service, so for Postgres an SQL statement querying pg_stat_replication or pg_is_in_recovery should do this (I'm not very familiar with Postgres). So write a wrapper around the normal plugin that first determines the state with such a statement and then, depending on the result, either runs the actual check or only produces some output.
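As a rough sketch of that state query (not tested against a real cluster): pg_is_in_recovery() returns a boolean, which psql prints as t or f when run with -At. The function names here are made up for illustration, and the code assumes psql is on the PATH and can connect without extra options, which may well not hold in your environment.

```python
import subprocess


def parse_recovery_flag(psql_output: str) -> bool:
    """Return True if the node is in recovery (a replica), False if primary.

    Expects the unaligned, tuples-only output of
    SELECT pg_is_in_recovery(), i.e. "t" or "f".
    """
    value = psql_output.strip()
    if value == "t":
        return True
    if value == "f":
        return False
    raise ValueError(f"unexpected pg_is_in_recovery() output: {value!r}")


def node_is_replica(psql_cmd=("psql", "-At", "-c", "SELECT pg_is_in_recovery()")):
    """Ask the local server for its role.

    Assumption: the default psql_cmd can connect as-is; add host, user,
    or database options as your setup requires.
    """
    result = subprocess.run(
        list(psql_cmd), capture_output=True, text=True, check=True
    )
    return parse_recovery_flag(result.stdout)
```

Keeping the parsing separate from the psql call makes it easy to check the logic without a database.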

Yes, I know how to determine that in PostgreSQL and I have that Service in my original post (ccp_is_in_recovery_status).

How are you determining in Icinga itself whether to do the active check or the passive check on the host, based on it being a primary or a replica?

Icinga cannot do this, but a plugin can do whatever you want, so simply write a wrapper plugin around the existing one.

Pseudo code:

Take all arguments the real check needs
Run some code to determine if the node is the active one
If the node is the active one
    Run the real check with all parameters
else
    Print "Node is the passive one, check is not executed | perfdata=0" and exit 0
end
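
The pseudo code above could look roughly like this in Python. Treat it as a sketch under assumptions rather than a finished plugin: run_wrapper, main, and the perfdata label replication_lag_bytes are names invented for this example, and the role check is passed in as a function so you can plug in whatever query fits your setup. The exit codes follow the usual plugin convention (0 = OK, 3 = UNKNOWN).

```python
import subprocess
import sys

OK = 0
UNKNOWN = 3


def run_wrapper(is_active, real_check_cmd):
    """Return (exit_code, output) for Icinga.

    is_active:      callable returning True on the active (primary) node.
    real_check_cmd: argv list of the real plugin, e.g. ["/path/to/check", "--arg"].
    """
    if not real_check_cmd:
        return UNKNOWN, "UNKNOWN: no check command given"
    try:
        active = is_active()
    except Exception as exc:
        return UNKNOWN, f"UNKNOWN: could not determine node role: {exc}"

    if not active:
        # Passive node: skip the real check but keep perfdata so graphs continue.
        # The label "replication_lag_bytes" is a placeholder, not the real plugin's.
        return OK, "OK: node is the passive one, check is not executed | replication_lag_bytes=0"

    try:
        result = subprocess.run(real_check_cmd, capture_output=True, text=True)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN: could not run real check: {exc}"
    # Pass the real plugin's exit code and output through unchanged.
    return result.returncode, (result.stdout or result.stderr).strip()


def main(argv, is_active):
    """Print the plugin output and return the exit code."""
    code, output = run_wrapper(is_active, argv)
    print(output)
    return code
```

A real plugin script would end with something like sys.exit(main(sys.argv[1:], my_role_check)), where my_role_check is your own function that returns True on the primary. The service is then applied to both nodes, and after a failover the output switches by itself without touching the host configuration.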