Thank you for the excellent documentation here:
- DSL: Get host objects in hostgroup with get_objects() and Array#filter (deep-dive into lambda expressions, functions and closures)
- Get_host_group function
These work flawlessly for all of our on-premises hosts (statically defined in the config). Our trouble is that the same method returns no results for hosts that were added via the API, and I was hoping someone could show me what I’m overlooking.
Summary
- Running on: RHEL 6.7
- Version: icinga2 2.10.3-1 (via yum)
- All methods described here work for statically defined hosts.
- Our AWS hosts are added via the Icinga 2 API (from an AWS Lambda). Their templates and host groups are still statically defined in the Icinga config.
Console test
If I connect to the icinga2 console and run this code, it works perfectly, even for the AWS hosts:
host_group = "aws_test01"
filter_function = function(node) use(host_group) { host_group in node.groups }
get_objects(Host).filter(filter_function)
But when the same filter is used inside the function and the result is viewed in Icingaweb2, I get a zero-length array back.
Method 1
Cluster host definition:
object Host "Cluster: App: Healthcheck URL: aws_test01" {
  import "aws_test01_cluster"
  check_command = "dummy"
  vars += {
    dummy_state = get_dummy("state", "aws_test01", "App: Healthcheck URL")
    dummy_text = get_dummy("text", "aws_test01", "App: Healthcheck URL")
  }
}
get_dummy function (work in progress):
globals.get_dummy = function(t, host_group, service_name) {
  return function() use (t, host_group, service_name) {
    filter_function = function(node) use(host_group) { host_group in node.groups }
    cluster_nodes = get_objects(Host).filter(filter_function)
    threshold = macro("$host.vars.threshold$")
    if (t == "state") {
      down_count = 0
      for (node in cluster_nodes) {
        health_state = get_service(node, service_name).last_check_result.state
        if (health_state > 0) {
          down_count += 1
        }
      }
      # If no nodes were detected, exit UNKNOWN.
      if (len(cluster_nodes) == 0) {
        return 3
      }
      if (down_count >= threshold) {
        return 2
      } else {
        return 0
      }
    } else if (t == "text") {
      down_count = 0
      host_state = { "CRITICAL" = {}, "OK" = {} }
      for (node in cluster_nodes) {
        health_state = get_service(node, service_name).last_check_result.state
        health_output = get_service(node, service_name).last_check_result.output
        if (health_state == 0) {
          host_state["OK"][node.name] = health_output
        } else {
          host_state["CRITICAL"][node.name] = health_output
          down_count += 1
        }
      }
      up_count = len(cluster_nodes) - down_count
      if (len(cluster_nodes) == 0) {
        output = "[UNKNOWN] "
      } else if (down_count >= threshold) {
        output = "[CRITICAL] "
      } else {
        output = "[OK] "
      }
      output += "Cluster: " + up_count + "/" + len(cluster_nodes)
      output += " nodes up. (" + down_count + " down.) "
      output += "Failure threshold: " + threshold + "\n"
      for (stat => href in host_state) {
        for (hname => check_output in href) {
          output += "[" + stat + "] " + hname + ": " + check_output + "\n"
        }
      }
      output += "\nDebug:\n"
      output += "t: '" + t + "'\n"
      output += "host_group: '" + host_group + "'\n"
      output += "service_name: '" + service_name + "'\n"
      output += "len(cluster_nodes): '" + len(cluster_nodes) + "\n\n"
      for (node in cluster_nodes) {
        output += "node: '" + node.name + "'\n"
      }
      return output
    }
  }
}
And the output in Icinga for an AWS cluster:
[UNKNOWN] Cluster: 0/0 nodes up. (0 down.) Failure threshold: 2
Debug:
host_group: 'aws_test01'
service_name: 'App: Healthcheck URL'
len(cluster_nodes): '0
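A side note on the loops in get_dummy, separate from the empty result: get_service() returns null when the host has no service with that name, and last_check_result is null until a first check result exists, so chaining .last_check_result.state is expected to raise an error for such nodes (for example, a host that was only just created via the API). A minimal sketch of a guarded lookup follows. The helper name get_node_state is hypothetical and not part of the original function, and mapping a missing service or result to UNKNOWN (3) is an assumption:

```
// Hypothetical helper (an assumption, not part of the original get_dummy):
// returns the service state for a node, or 3 (UNKNOWN) if it cannot be read.
globals.get_node_state = function(node, service_name) {
  var service = get_service(node, service_name)
  // No service with that name exists on this host.
  if (service == null) {
    return 3
  }
  var cr = service.last_check_result
  // The service exists but has not produced a check result yet.
  if (cr == null) {
    return 3
  }
  return cr.state
}
```

Inside the loops, health_state = get_node_state(node, service_name) would then replace the chained access.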
Method 2 - define in lambda
dummy_text = {{
  host_group = "aws_test01"
  service_name = "App: Healthcheck URL"
  filter_function = function(node) use(host_group) { host_group in node.groups }
  cluster_nodes = get_objects(Host).filter(filter_function)
  threshold = macro("$host.vars.threshold$")
  down_count = 0
  host_state = { "CRITICAL" = {}, "OK" = {} }
  for (node in cluster_nodes) {
    health_state = get_service(node, service_name).last_check_result.state
    health_output = get_service(node, service_name).last_check_result.output
    if (health_state == 0) {
      host_state["OK"][node.name] = health_output
    } else {
      host_state["CRITICAL"][node.name] = health_output
      down_count += 1
    }
  }
  up_count = len(cluster_nodes) - down_count
  if (len(cluster_nodes) == 0) {
    output = "[UNKNOWN] "
  } else if (down_count >= threshold) {
    output = "[CRITICAL] "
  } else {
    output = "[OK] "
  }
  output += "Cluster: " + up_count + "/" + len(cluster_nodes)
  output += " nodes up. (" + down_count + " down.) "
  output += "Failure threshold: " + threshold + "\n"
  for (stat => href in host_state) {
    for (hname => check_output in href) {
      output += "[" + stat + "] " + hname + ": " + check_output + "\n"
    }
  }
  output += "\nDebug:\n"
  output += "host_group: '" + host_group + "'\n"
  output += "service_name: '" + service_name + "'\n"
  output += "len(cluster_nodes): '" + len(cluster_nodes) + "\n\n"
  for (node in cluster_nodes) {
    output += "node: '" + node.name + "'\n"
  }
  return output
}}
The output in Icingaweb2 is the same.
Method 3 - 1-line filter w/ lambda
dummy_text = {{
  host_group = "xhaws_test01_cpe"
  service_name = "App: Healthcheck URL"
  cluster_nodes = get_objects(Host).filter(h => "xhaws_test01_cpe" in h.groups)
  threshold = macro("$host.vars.threshold$")
  down_count = 0
  host_state = { "CRITICAL" = {}, "OK" = {} }
  for (node in cluster_nodes) {
    health_state = get_service(node, service_name).last_check_result.state
    health_output = get_service(node, service_name).last_check_result.output
    if (health_state == 0) {
      host_state["OK"][node.name] = health_output
    } else {
      host_state["CRITICAL"][node.name] = health_output
      down_count += 1
    }
  }
  up_count = len(cluster_nodes) - down_count
  if (len(cluster_nodes) == 0) {
    output = "[UNKNOWN] "
  } else if (down_count >= threshold) {
    output = "[CRITICAL] "
  } else {
    output = "[OK] "
  }
  output += "Cluster: " + up_count + "/" + len(cluster_nodes)
  output += " nodes up. (" + down_count + " down.) "
  output += "Failure threshold: " + threshold + "\n"
  for (stat => href in host_state) {
    for (hname => check_output in href) {
      output += "[" + stat + "] " + hname + ": " + check_output + "\n"
    }
  }
  output += "\nDebug:\n"
  output += "host_group: '" + host_group + "'\n"
  output += "service_name: '" + service_name + "'\n"
  output += "len(cluster_nodes): '" + len(cluster_nodes) + "\n\n"
  for (node in cluster_nodes) {
    output += "node: '" + node.name + "'\n"
  }
  return output
}}
Again, no change.
Final thoughts
I also tested changing the host group of the AWS hosts to that of one of our on-premises clusters. As expected, the HostGroup view in Icingaweb2 shows them all together, and in the console test all of these hosts show up as well. But in the check output shown in Icingaweb2, only the statically defined hosts are returned. I’d love to hear any feedback you might have.
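To narrow down where the API-created hosts go missing, a diagnostic variant of the lambda could skip the group filter entirely and report what get_objects(Host) sees at check time. This is only a sketch under assumptions: it is assigned to vars.dummy_text on a throwaway host using the "dummy" check command, and it relies on the package attribute reporting "_etc" for hosts from config files and "_api" for hosts created via the API, which I have not confirmed on 2.10.3.

```
// Hypothetical diagnostic (an assumption, not one of the methods above).
dummy_text = {{
  var all_hosts = get_objects(Host)
  var output = "Total hosts visible: " + len(all_hosts) + "\n"
  for (h in all_hosts) {
    // Assumption: 'package' is "_etc" for statically configured hosts,
    // so this lists only hosts that came from somewhere else.
    if (h.package != "_etc") {
      output += h.name + " package=" + h.package
      output += " groups=" + Json.encode(h.groups) + "\n"
    }
  }
  return output
}}
```

If the AWS hosts appear here with the expected groups, the filter is the place to look; if they do not appear at all, the function is not seeing the same set of objects as the console.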
Thanks!