Create a dependent service that only runs if its sister service is Critical/Warning

Hi there fellow Icingans,

I’ve got a pair of services that share a common purpose. One checks the overall health and load of an application server (let’s call it check_load to keep it simple, because it’s practically the same). The other, check_running, gathers information about running processes and recent activity. It exists to give non-sysadmins an overview of the data and isn’t really a check, so it’s always OK.

I’ve got them both running on a frequent schedule, but ideally check_running wouldn’t run at all under normal conditions: while the server is healthy, it should just report OK without the satellite gathering any data. When check_load goes Warning/Critical, I want check_running to switch to Warning and return the data.

I’ve looked through the docs and the forums at dependency objects and dummy checks, but I’m not sure whether either can do this. In the best case, I’d like a service that doesn’t even call the satellite while check_load is OK. Alternatively, is there a way for check_running to “know” the runtime state of check_load, so that it can pass it to the script and the script can exit early with OK?

Thanks,
connrs

I think I’ve had some success working through this myself, so I thought I’d share it here. I’m very open to suggestions if there are better ways to handle this, but here goes:

First, I defined a service for the main Load check:

apply Service "Load" {
  import "critical-service"
  check_command = "Load"
  command_endpoint = host.vars.agent_endpoint
  assign where host.vars.agent_endpoint
  // More config here...
}

Then, I defined a summary service that runs the more expensive check to gather information about the load. In principle, think of it as ps aux for the application server:

apply Service "HiddenSummary" {
  import "critical-service"
  check_command = "Running"  
  command_endpoint = host.vars.agent_endpoint
  assign where host.vars.agent_endpoint
  // More config here...
}

The final service is a dummy service that reports HiddenSummary’s last result. Assuming I haven’t misread the docs, it only goes Warning/Critical when Load is Warning/Critical and HiddenSummary has run its own check since Load’s last state change, so the most recent summary information is already available:

apply Service "LoadSummary" {
  import "critical-service"
  check_command = "dummy"
  vars.dummy_state = {{
    var load_service = get_service(macro("$host.name$"), "Load")
    // Look the summary up by its service name ("HiddenSummary"), not its check command ("Running")
    var running_service = get_service(macro("$host.name$"), "HiddenSummary")
    var running_result = running_service.last_check_result
    if (load_service.state != 0 && (!running_result || running_result.execution_start < load_service.last_state_change)) {
      // HiddenSummary hasn't run (or has no result yet) since Load changed state, so stay OK for now
      return 0
    }
    return load_service.state
  }}

  vars.dummy_text = {{
    var load_service = get_service(macro("$host.name$"), "Load")
    var running_service = get_service(macro("$host.name$"), "HiddenSummary")
    var running_result = running_service.last_check_result
    if (load_service.state == 0) {
      return "OK"
    }
    if (!running_result || running_result.execution_start < load_service.last_state_change) {
      // HiddenSummary hasn't updated since Load went Warning/Critical, so show a placeholder message
      return "Gathering process data"
    }
    return running_result.output
  }}
  assign where host.vars.agent_endpoint && host.vars.running_mysql_defaults_file
}
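Both lambdas repeat the same lookups and freshness test. One possible tidy-up, offered as an untested sketch, moves that test into a top-level Icinga 2 function (the name load_summary_is_fresh is hypothetical, not part of the original config) that also returns false while HiddenSummary has no check result yet. It would replace the LoadSummary block above rather than sit alongside it, since two apply rules with the same name would define the same service twice:

```
// Hypothetical helper, not part of the original config: true once HiddenSummary
// has a check result that started after Load's most recent state change.
function load_summary_is_fresh(host_name) {
  var load_service = get_service(host_name, "Load")
  var running_service = get_service(host_name, "HiddenSummary")
  if (!load_service || !running_service || !running_service.last_check_result) {
    return false
  }
  return running_service.last_check_result.execution_start >= load_service.last_state_change
}

apply Service "LoadSummary" {
  import "critical-service"
  check_command = "dummy"
  vars.dummy_state = {{
    var host_name = macro("$host.name$")
    var load_service = get_service(host_name, "Load")
    // Stay OK until the summary has caught up with Load's latest state change
    if (load_service.state != 0 && !load_summary_is_fresh(host_name)) {
      return 0
    }
    return load_service.state
  }}
  vars.dummy_text = {{
    var host_name = macro("$host.name$")
    var load_service = get_service(host_name, "Load")
    if (load_service.state == 0) {
      return "OK"
    }
    if (!load_summary_is_fresh(host_name)) {
      return "Gathering process data"
    }
    return get_service(host_name, "HiddenSummary").last_check_result.output
  }}
  assign where host.vars.agent_endpoint && host.vars.running_mysql_defaults_file
}
```

The behaviour should match the inline version; the only design change is that the "has HiddenSummary caught up?" rule lives in one place.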

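Finally, I created a dependency that disables checks on HiddenSummary while Load is OK. In a Dependency, states lists the parent states in which the dependency counts as satisfied; while Load is in any other state, disable_checks stops HiddenSummary from being checked. As soon as Load switches to Warning or Critical, HiddenSummary’s checks resume.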

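apply Dependency "disable-running-checks" to Service {
  parent_service_name = "Load"
  // Stop checking HiddenSummary whenever this dependency fails...
  disable_checks = true
  // ...i.e. the dependency is only satisfied while Load is in one of these states
  states = [ Critical, Warning ]
  assign where host.vars.agent_endpoint && service.name == "HiddenSummary"
}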

As a result, the summary information isn’t gathered until load becomes a problem. Because of the dummy service, that information is then emailed to the relevant people without them needing any access to the server internals or a login to Icingaweb2.
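The post doesn’t show how LoadSummary’s alerts reach those people (presumably the critical-service template takes care of it). For anyone reproducing the setup, here is a minimal sketch assuming the mail-service-notification Notification template from Icinga 2’s sample conf.d/templates.conf is loaded; the user group, user, and email address are hypothetical placeholders:

```
// Hypothetical group of people who should receive the process summary
object UserGroup "app-owners" {
  display_name = "Application owners"
}

// Hypothetical recipient; the email address is a placeholder
object User "app-owner" {
  display_name = "Application owner"
  email = "app-owner@example.com"
  groups = [ "app-owners" ]
}

// Mail LoadSummary problems and recoveries to the group above
apply Notification "mail-load-summary" to Service {
  import "mail-service-notification"
  user_groups = [ "app-owners" ]
  states = [ OK, Warning, Critical ]
  types = [ Problem, Recovery ]
  assign where service.name == "LoadSummary"
}
```

Because LoadSummary only leaves OK once HiddenSummary has a fresh result, the first Problem mail should already carry the process summary as its service output, assuming the mail script in use includes the plugin output (the sample script does).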