Create dependent service that only runs if its sister service is critical/warning

I think I’ve had some success working through this myself, so I thought I would share it here. I’d be very open to suggestions if there are better ways to handle this, but here goes:

First, I defined a service for the main Load check:

apply Service "Load" {
  import "critical-service"
  check_command = "Load"
  command_endpoint = host.vars.agent_endpoint
  assign where host.vars.agent_endpoint
  // More config here...
}
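For reference, these apply rules only match hosts that define vars.agent_endpoint (the summary service further down additionally needs vars.running_mysql_defaults_file). A minimal sketch of such a host, assuming the agent's Endpoint object shares the host's name as in the Icinga 2 distributed monitoring docs. The host name, address and file path are placeholders:

object Host "app01.example.com" {
  check_command = "hostalive"
  address = "192.0.2.10"
  // Assumes the agent Endpoint object has the same name as this host
  vars.agent_endpoint = name
  // Placeholder path; only hosts with this variable get the LoadSummary service
  vars.running_mysql_defaults_file = "/etc/mysql/debian.cnf"
}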

Then, I defined a summary service that runs the more expensive check to gather information about what is causing the load. In principle, think of it as ps aux for the application server:

apply Service "HiddenSummary" {
  import "critical-service"
  check_command = "Running"
  command_endpoint = host.vars.agent_endpoint
  assign where host.vars.agent_endpoint
  // More config here...
}
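The Running check command itself isn't shown here. As a purely hypothetical stand-in, something along these lines would produce a ps-style process list. Because the service uses command_endpoint, the CheckCommand definition also has to exist on the agent, usually by placing it in a global zone such as global-templates:

object CheckCommand "Running" {
  // Hypothetical stand-in: ps header line plus the 14 processes using the most CPU.
  // The exit code is that of head (normally 0), so the service reports OK.
  command = [ "/bin/sh", "-c", "ps aux --sort=-pcpu | head -n 15" ]
}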

The final service is a dummy service that uses HiddenSummary. Assuming I’ve not misread the docs, it will only go warning/critical if Load is warning/critical and HiddenSummary has run its own check since then, so that the most recent summary information is available.

apply Service "LoadSummary" {
  import "critical-service"
  check_command = "dummy"

  vars.dummy_state = {{
    var load_service = get_service(macro("$host.name$"), "Load")
    // Look the service up by its object name "HiddenSummary"; "Running" is only its check command
    var running_service = get_service(macro("$host.name$"), "HiddenSummary")
    // Stale if HiddenSummary has never run, or last ran before Load changed state
    var summary_stale = !running_service || !running_service.last_check_result || running_service.last_check_result.execution_start < load_service.last_state_change
    if (load_service.state != 0 && summary_stale) {
      // If the HiddenSummary service isn't yet updated, just keep the status as 0
      return 0
    }
    return load_service.state
  }}

  vars.dummy_text = {{
    var load_service = get_service(macro("$host.name$"), "Load")
    var running_service = get_service(macro("$host.name$"), "HiddenSummary")
    if (load_service.state == 0) {
      return "OK"
    }
    if (!running_service || !running_service.last_check_result || running_service.last_check_result.execution_start < load_service.last_state_change) {
      // If the HiddenSummary service hasn't yet updated since Load went crit/warning then
      // set a basic message
      return "Gathering process data"
    }
    return running_service.last_check_result.output
  }}

  assign where host.vars.agent_endpoint && host.vars.running_mysql_defaults_file
}

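Finally, I created a dependency that disables checks on the HiddenSummary service while Load is OK (or Unknown). The states list names the parent states in which the dependency is satisfied, so when Load switches to a warning or critical state, the dependency is met and HiddenSummary's checks are enabled again.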

apply Dependency "disable-running-checks" to Service {
  parent_service_name = "Load"
  disable_checks = true
  states = [ Critical, Warning ]
  assign where host.vars.agent_endpoint && service.name == "HiddenSummary"
}
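To see the values the dummy service's lambdas compare, the same attributes can be queried from icinga2 console connected to the REST API, for example icinga2 console --connect 'https://root:icinga@localhost:5665/' with your own API user and password. The host name below is a placeholder:

get_service("app01.example.com", "Load").state
get_service("app01.example.com", "Load").last_state_change
get_service("app01.example.com", "HiddenSummary").last_check_result.execution_start

If the last value is older than last_state_change while Load is warning/critical, LoadSummary will still report "Gathering process data" with state 0.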

As a result, the summary information isn’t gathered until load becomes a problem and, thanks to the dummy service, it is emailed out to the relevant parties without them needing any access to the server internals or a login to Icingaweb2.
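The mailing part isn't shown above. One possible approach, assuming the mail-service-notification template from the default conf.d/templates.conf and a hypothetical user group app-admins, is a notification rule scoped to LoadSummary only. The default mail script includes the service output, so the process list should end up in the email:

apply Notification "mail-load-summary" to Service {
  // Template shipped in the default conf.d/templates.conf
  import "mail-service-notification"
  // Hypothetical recipients; the UserGroup must exist in your config
  user_groups = [ "app-admins" ]
  // Problems and recoveries only; OK must be listed for recovery notifications
  types = [ Problem, Recovery ]
  states = [ OK, Warning, Critical ]
  assign where service.name == "LoadSummary"
}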