Sending state updates to related objects

Hi! Here’s something I’m trying to do and haven’t figured out after reading the docs and searching around. Perhaps someone with more experience can point me in the right direction.
I’m monitoring several thousand Access Points.
These access points (APs) are spread across hundreds of different physical locations that I’ll call networks.
Using my checks, I already have the status of each AP.
I created dummy host objects for each network, so I can decide the status of the network based on how many APs are failing, because I’ll take action when the network is in a certain state, but not when individual APs fail.
These dummy checks are calculated at runtime by a function that evaluates the APs of the network (each AP object has a networkId property, of course).
The thing is that the network check runs at a fixed interval (let’s say 10m), so if an AP fails, it can take up to 10m before the network reflects that change. Of course, I could set the check interval to 1s or 10s, but that feels like killing a fly with a shotgun: most of the time nothing will have changed, and I’d be running that function over and over again.
What I thought is that it would be more efficient to act on a state change in an AP and, in that action, trigger the dummy check of the host object that represents its network.
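A rough, untested sketch of that idea, assuming an event command is a suitable hook. The names `recheck-network` and `recheck-network-on-change` and the script path are hypothetical; the script is not part of Icinga 2 and would have to ask Icinga to reschedule the check of the matching network host, for example through the `reschedule-check` API action.

```
// Hypothetical event command, executed on state changes of the
// service it is attached to.
object EventCommand "recheck-network" {
  // Hypothetical script (assumption, not shipped with Icinga 2).
  // It would reschedule the check of the network host that has the
  // given networkId, e.g. via /v1/actions/reschedule-check.
  command = [ "/usr/local/bin/recheck-network.sh" ]

  arguments = {
    "--network-id" = "$host.vars.networkId$"
    "--state" = "$service.state$"
    "--state-type" = "$service.state_type$"
  }
}

// Hypothetical template: the service that checks each AP would import it.
template Service "recheck-network-on-change" {
  event_command = "recheck-network"
}
```

Event commands also run while a service is in a soft state, so the script may want to act only when the state type is HARD.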
Perhaps I’m overthinking this; please let me know if somebody has a better solution.
Thanks!
M.

Can you share the configuration bits you’ve done thus far? I’m trying to imagine how it is done.

Cheers,
Michael

Yeah, sure! I’ll do my best to keep this example simple. I’d be glad to share the whole config, but that would probably be confusing.
Before the actual code, a brief summary of what you’ll see.
Below are the Host object definitions for what I’ll call devices and networks.

  • devices are the Access Points.
  • networks are the logical and physical grouping of these access points.
  • A device belongs to one and only one network.
  • network status is defined by the number of devices in warning, critical, or unknown state inside a network.
  • The service named deviceStatus uses a custom command to get the status of a particular device from the Meraki API.

The templates and inheritance are set up this way because I’m trying to manage different device models and run different checks based on the model. I’m not including other models here, to keep this question simple.

Here’s the actual code:
Device>

object Host "ABCD-1234-ABCD" {
  import "MR33"
  vars.serial = "ABCD-1234-ABCD"
  vars.mac = "0c:c0:0c:c0:0c:c0"
  vars.networkId = "X_123456789123456789"
  vars.organization = "123123"
}

template Host "MR33" {
  import "generic-meraki-device"
  vars.model = "MR33"
  vars.deviceType = "AP"
}

template Host "generic-meraki-device" {
  import "generic-meraki"
  vars.type = "device"
  vars.dummy_text = {{
    var output = ""
    var networkName = ""
    for (s in get_objects(Service)) {
      if (s.host_name == host.name && s.name == "deviceStatus") {
        output += "<div class=\"preformatted\"><div>" + s.name + " output: " + s.state + "</div>"
        break
      }
    }
    for (n in get_objects(Host)) {
      if (n.vars.networkId == host.vars.networkId && n.vars.type == "network") {
        networkName = n.name
        output += "<div>Network Data: </div>"
        for (var p => var value in n.vars){
          if (typeof(value) == String){
            if (p == "name"){
              output += "<div>  " + p + ": <a href=\"/icingaweb2/monitoring/host/show?host=" + value + "-network\">" + value + "</a></div>"
            }else{
              output += "<div>  " + p + ": " + value + "</div>"
            }
          }
        }
        break
      }
    }
    output += "</div>"
    return output
  }}

}

template Host "generic-meraki" {
  max_check_attempts = 3
  check_interval = 1h
  retry_interval = 10m
  check_command = "dummy"

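  vars.dummy_state = {{
    // Mirror the state of this host's deviceStatus service.
    var myState = 0
    for (s in get_objects(Service)) {
      if (s.host_name == host.name && s.name == "deviceStatus") {
        myState = s.state
        return myState
      }
    }
    // No deviceStatus service found: fall back to the default state.
    return myState
  }}
  vars.dummy_text = {{
    for (s in get_objects(Service)) {
      if (s.host_name == host.name && s.name == "deviceStatus") {
        return s.name + " OUTPUT: " + s.state
      }
    }
    // No deviceStatus service found: still return some text.
    return "deviceStatus service not found"
  }}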

}
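With several thousand hosts, scanning every Service object on each check may add up. A minimal sketch of the same lookup using `get_service()`, which fetches one service by host name and service name; the template name `generic-meraki-lookup` is hypothetical and the fallback values are assumptions.

```
// Hypothetical variant of "generic-meraki" with a direct lookup.
template Host "generic-meraki-lookup" {
  check_command = "dummy"

  vars.dummy_state = {{
    // Fetch this host's deviceStatus service directly.
    var s = get_service(host.name, "deviceStatus")
    if (s) {
      return s.state
    }
    // Assumption: report 0 when the service does not exist.
    return 0
  }}

  vars.dummy_text = {{
    var s = get_service(host.name, "deviceStatus")
    if (s) {
      return s.name + " OUTPUT: " + s.state
    }
    return "deviceStatus service not found"
  }}
}
```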

Network>

object Host "606941-network" {
  import "meraki-network"
  vars.networkId = "X_123456789123456789"
  vars.name = "333111"
  vars.tags = " mytag "
  vars.organization = "123123"
}

template Host "meraki-network" {
  import "generic-meraki"
  max_check_attempts = 3
  check_interval = 5m
  retry_interval = 1m
  vars.type = "network"
  vars.dummy_state = {{
    var myState = 0
    var ok = 0
    var warning = 0
    var critical = 0
    var unknown = 0
    for (h in get_objects(Host)) {
      //search for hosts that have this networkId and type "device"
      if (h.vars.networkId == host.vars.networkId && h.vars.type == "device") {
        //count the status
        if (h.state == 0){
          ok += 1
        }
        if (h.state == 1){
          warning += 1
        }
        if (h.state == 2){
          critical += 1
        }
        if (h.state == 3){
          unknown += 1
        }
      }
    }
    //return the state of this network according to the number of hosts in each state
    //TODO: this threshold should be a variable, perhaps in a template.
    if (unknown > 3){
      myState = 3
    }
    if (warning > 3){
      myState = 1
    }
    if (critical > 3){
      myState = 2
    }
    return myState
  }}

  vars.dummy_text = {{
    var ok = 0
    var warning = 0
    var critical = 0
    var unknown = 0
    var devices = []
    for (h in get_objects(Host)) {
      //same search as for the state
      if (h.vars.networkId == host.vars.networkId && h.vars.type == "device") {
        //this time we'll keep the devices that match the search
        devices.add(h)
        if (h.state == 0){
          ok += 1
        }
        if (h.state == 1){
          warning += 1
        }
        if (h.state == 2){
          critical += 1
        }
        if (h.state == 3){
          unknown += 1
        }
      }
    }
    var devicesstring = []
    //here we'll use the devices to add useful data to the output
    for (d in devices) {
      devicesstring.add("<div class=\"preformatted\"><div>device: <a href=\"/icingaweb2/monitoring/host/show?host=" + d.name + "\">" + d.name + "</a></div><div>  status: " + d.state + "</div><div>  mac: " + d.vars.mac + "</div><div>  model: " + d.vars.model + "</div></div>")
    }
    //lookup organization
    var organizationName = ""
    for (o in get_objects(Host)) {
      if (o.vars.organization == host.vars.organization && o.vars.type == "organization"){
        organizationName = o.name
      }
    }
    return "<div class=\"preformatted\"><div>OK: " + ok + "</div><div>Warning: " + warning + "</div><div>Critical: " + critical + "</div><div>Unknown: " + unknown + "</div><div>Organization: <a href=\"/icingaweb2/monitoring/host/show?host=" + organizationName + "\">" + organizationName  + "</a></div></div><div class=\"preformatted\">Devices: </div>" + devicesstring.join("<br/>")
  }}

}
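Regarding the TODO about the thresholds, one possible sketch moves them into custom variables. The template name, the `max_*` variable names and the function name `meraki_network_state` are all hypothetical; the precedence matches the inline code, where critical overrides warning and warning overrides unknown.

```
// Hypothetical template: thresholds as custom variables that a
// network host can import and override.
template Host "meraki-network-thresholds" {
  vars.max_warning = 3
  vars.max_critical = 3
  vars.max_unknown = 3
}

// Hypothetical global helper: maps the counters to a check state.
// "limits" is a dictionary with the max_* keys, e.g. host.vars.
globals.meraki_network_state = function(warning, critical, unknown, limits) {
  if (critical > limits.max_critical) {
    return 2
  }
  if (warning > limits.max_warning) {
    return 1
  }
  if (unknown > limits.max_unknown) {
    return 3
  }
  return 0
}
```

The dummy_state function could then end with `return meraki_network_state(warning, critical, unknown, host.vars)` instead of the three hard-coded comparisons.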

Services>

// this uses the parseData custom command to parse data from the meraki api.
apply Service "deviceStatus" {
  import "meraki-device-service"
  check_command = "parseData"
  vars.data = "merakiOrganizationStatus" + host.vars.organization
  vars.filter = ".serial == \"" + host.vars.serial + "\""
  vars.property = "status"
  vars.wthresh = "alerting"
  vars.cthresh = "offline"
  vars.okthresh = "online"
  vars.dangerline = "mregxwco"
  assign where host.vars.type == "device"
}

template Service "meraki-device-service" {
  import "generic-service"
  check_interval = 10m
  retry_interval = 5m
  vars.data = "merakiOrganizationStatus" + host.vars.organization
}


// not much use to this service as it gives almost the same info as the host check
apply Service "meraki-network-devices" {
  import "generic-service"
  check_command = "dummy"
  vars.dummy_state = {{
    var myState = 0
    var ok = 0
    var warning = 0
    var critical = 0
    var unknown = 0
    for (h in get_objects(Host)) {
      //search for hosts that have this networkId and type "device"
      if (h.vars.networkId == host.vars.networkId && h.vars.type == "device") {
        //count the status
        if (h.state == 0){
          ok += 1
        }
        if (h.state == 1){
          warning += 1
        }
        if (h.state == 2){
          critical += 1
        }
        if (h.state == 3){
          unknown += 1
        }
      }
    }
    //return the state of this service according to the number of hosts in each state
    //TODO: this threshold should be a variable, perhaps in a template.
    if (unknown > 3){
      myState = 3
    }
    if (warning > 3){
      myState = 1
    }
    if (critical > 3){
      myState = 2
    }
    return myState
  }}

  vars.dummy_text = {{
    var ok = 0
    var warning = 0
    var critical = 0
    var unknown = 0
    var devices = []
    for (h in get_objects(Host)) {
      //same search as for the state
      if (h.vars.networkId == host.vars.networkId && h.vars.type == "device") {
        //this time we'll keep the devices that match the search
        devices.add(h)
        if (h.state == 0){
          ok += 1
        }
        if (h.state == 1){
          warning += 1
        }
        if (h.state == 2){
          critical += 1
        }
        if (h.state == 3){
          unknown += 1
        }
      }
    }
    var devicesstring = []
    //here we'll use the devices to add useful data to the output
    for (d in devices) {
      devicesstring.add("device: " + d.name + "\n  status: " + d.state + "\n  mac: " + d.vars.mac + "\n  model: " + d.vars.model)
    }
    return "OK: " + ok + "\nWarning: " + warning + "\nCritical: " + critical + "\nUnknown: " + unknown + "\nDevices: \n" + devicesstring.join("\n")
  }}
  assign where host.vars.type == "network"

}
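The counting loop appears four times above, so it could probably live in one shared function. A sketch under some assumptions: the function name `meraki_count_device_states` is hypothetical, and it counts the state of each device's `deviceStatus` service rather than the host state, because Host objects report `state` only as 0 (UP) or 1 (DOWN) while services use the full 0 to 3 range.

```
// Hypothetical global helper: counts deviceStatus service states for
// all devices of one network.
// Returns [ ok, warning, critical, unknown ].
globals.meraki_count_device_states = function(networkId) {
  var ok = 0
  var warning = 0
  var critical = 0
  var unknown = 0
  for (h in get_objects(Host)) {
    if (h.vars.networkId == networkId && h.vars.type == "device") {
      var s = get_service(h.name, "deviceStatus")
      if (!s) {
        // Device without a deviceStatus service: skip it.
        continue
      }
      if (s.state == 0) {
        ok += 1
      } else if (s.state == 1) {
        warning += 1
      } else if (s.state == 2) {
        critical += 1
      } else {
        unknown += 1
      }
    }
  }
  return [ ok, warning, critical, unknown ]
}
```

Inside dummy_state or dummy_text this would be used as `var counts = meraki_count_device_states(host.vars.networkId)`, with `counts[0]` to `counts[3]` holding the four counters.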

Please let me know if more info would be helpful.
Also, I’d be glad to share how I’m parsing data from the Meraki API, or anything else, but that’s not part of this question and I’m trying to keep it simple to understand.
Thanks a lot!

M.