Using array containing dictionaries to apply services

Hi all,

We are currently stuck debugging a function we wrote in the Icinga DSL. We want to use the array it returns in an apply rule.

The function looks like this and is located on the config master in /etc/icinga2/zones.d/global-satellites/functions.conf:

# Get all hostnames with a http service check
globals.get_hosts_with_http_service = function() {
  var http_services = get_objects(Service).filter(service => match("HTTPS_*", service.name))
  var http_hosts = []

  for (var http_service in http_services) {
      if (http_service.vars.get("http_vhost")) {
          if (http_service.vars.http_vhost) {
              var http_service_object = {
                "vhost" = http_service.vars.http_vhost
                "hostname" = http_service.host_name
                "service_owner" = http_service.vars.service_owner
              }
              http_hosts.add(http_service_object)
          }
      }
  }

  # Array with dictionaries
  return http_hosts
}

The returned array http_hosts should look like this (based on a test in the console):

var hosts_with_httpservice = [ {
       hostname = "abc-repo01"
       service_owner = "null"
       vhost = "repo-private.xyz.dev"
}, {
       hostname = "monitoring-intern.xyz.dev"
       service_owner = "null"
       vhost = "monitoring-intern.xyz.dev"
}, {
       hostname = "abc-pwm01"
       service_owner = "someone"
       vhost = "pwm.ad.xyz.dev"
}, {
       hostname = "abc-repo01"
       service_owner = "null"
       vhost = "repo-public.xyz.dev"
} ]

Now we are trying to use the return value (an array of dictionaries) to apply services defined in /etc/icinga2/zones.d/global-satellites/services.conf:

# Cert Checks for HTTPS Checks
var hosts_with_httpservice = get_hosts_with_http_service()
# debugger
for (http_obj in hosts_with_httpservice) {
  apply Service "CERT_" + http_obj.vhost use (http_obj) {
    import "http"

    vars.http_certificate = "14,7"
    vars.http_ssl = true
    vars.jira_ticket_priority = "3"
    vars.notification_period = "8x5"
    vars.service_owner = http_obj.service_owner

    assign where host.name == http_obj.hostname
  }
}

With this config we don’t get any services applied to the host objects, so we started to debug with icinga2 daemon -C -X.
Strangely, the variable hosts_with_httpservice is not populated when the debugger stops at the breakpoint:

[root@abc-ic-ma01 zones.d]# icinga2 daemon -CX
[2021-08-11 11:10:38 +0200] information/cli: Icinga application loader (version: 2.13.0-1)
[2021-08-11 11:10:38 +0200] information/cli: Loading configuration file(s).
Breakpoint encountered.
Location: in /etc/icinga2/zones.d/global-satellites/services.conf: 129:1-129:8
/etc/icinga2/zones.d/global-satellites/services.conf(127): 
/etc/icinga2/zones.d/global-satellites/services.conf(128): var hosts_with_httpservice = get_hosts_with_http_service()
/etc/icinga2/zones.d/global-satellites/services.conf(129): debugger
                                                           ^^^^^^^^
/etc/icinga2/zones.d/global-satellites/services.conf(130): for (http_obj in hosts_with_httpservice) {
/etc/icinga2/zones.d/global-satellites/services.conf(131):   apply Service "CERT_" + http_obj.vhost use (http_obj) {
You can inspect expressions (such as variables) by entering them at the prompt.
To leave the debugger and continue the program use "$continue".
For further commands see "$help".
<1> => get_hosts_with_http_service()
[ ]
<2> =>

Interestingly, another function we use, which is defined in /etc/icinga2/zones.d/master/functions.conf, shows the same behavior in the debugger, but the services based on mgmt_zones still get applied.

Function:

globals.get_satellite_zones = function() {
        var mgmt_zones = []
        var all_zones = get_objects(Zone)

        for (sat_zone in all_zones) {
                if (match("master", sat_zone.all_parents)){
                        mgmt_zones.add(sat_zone.name)
                }
        }

        return mgmt_zones
}

Debugger:

[root@abc-ic-ma01 zones.d]# icinga2 daemon -CX
[2021-08-11 10:51:03 +0200] information/cli: Icinga application loader (version: 2.13.0-1)
[2021-08-11 10:51:03 +0200] information/cli: Loading configuration file(s).
Breakpoint encountered.
Location: in /etc/icinga2/zones.d/master/services.conf: 20:1-20:8
/etc/icinga2/zones.d/master/services.conf(18):  assign where regex(".*-o365-tenant", host.name)
/etc/icinga2/zones.d/master/services.conf(19): }
/etc/icinga2/zones.d/master/services.conf(20): debugger
                                               ^^^^^^^^
/etc/icinga2/zones.d/master/services.conf(21): apply Service "cluster-zone-" for (mgmt_zone in get_satellite_zones()){
/etc/icinga2/zones.d/master/services.conf(22):  import "icinga-cluster-zone-status"
You can inspect expressions (such as variables) by entering them at the prompt.
To leave the debugger and continue the program use "$continue".
For further commands see "$help".
<1> => get_satellite_zones()
[ ]

Service apply:

apply Service "cluster-zone-" for (mgmt_zone in get_satellite_zones()){
        import "icinga-cluster-zone-status"

        vars.cluster_zone = mgmt_zone

        assign where regex("abc-ic-ma0*", host.name)
        ignore where regex("^q1.*", mgmt_zone)
}

So now our questions are:

  1. Why do both functions return nothing inside the debugger?
  2. Why does the service apply using the first function (get_hosts_with_http_service) not work, while the service apply using the second function (get_satellite_zones) does?

Any help/hints are much appreciated :slight_smile:

Cheers.

You’re debugging the whole (kinda complex) function at once. I’d suggest putting the breakpoint inside the function and analyzing step by step what’s (not) happening.

Thanks for your feedback and the suggestion.

We tested this by adding the debugger statement here:

# Get all hostnames with a http service check
globals.get_hosts_with_http_service = function() {
  var http_services = get_objects(Service).filter(service => match("HTTPS_*", service.name))
  var http_hosts = []
  debugger
  for (var http_service in http_services) {
      if (http_service.vars.get("http_vhost")) {
          if (http_service.vars.http_vhost) {
... 

When the config checker stops at the breakpoint, the var http_services is empty.

When we run the same call on the CLI, it returns values:

Icinga 2 (version: 2.13.0-1)
Type $help to view available commands.
<1> => var http_services = get_objects(Service).filter(service => match("HTTPS_*", service.name))
null
<2> => http_services
[ {
        __name = "monitoring-intern.xyz.dev!HTTPS_monitoring-intern.xyz.dev"
        acknowledgement = 0.000000
        acknowledgement_expiry = 0.000000
        acknowledgement_last_change = 0.000000
        action_url = ""
        active = true
        check_attempt = 1.000000
        check_command = "http"
        check_interval = 300.000000
        check_period = ""
        check_timeout = null
        command_endpoint = ""
        display_name = "HTTPS_monitoring-intern.xyz.dev"
        downtime_depth = 0.000000
        enable_active_checks = true
        enable_event_handler = true
        enable_flapping = false
        enable_notifications = true
        enable_passive_checks = false
        enable_perfdata = true
        event_command = ""
        executions = null
        extensions = {
                DbObject = {
                        type = "Object"
                }
        }
        flapping = false
        flapping_buffer = 0.000000
        flapping_current = 0.000000
        flapping_ignore_states = null
        flapping_ignore_states_filter_real = -1.000000
        flapping_index = 11.000000
        flapping_last_change = 0.000000
        flapping_last_state = 0.000000
        flapping_threshold = 0.000000
        flapping_threshold_high = 30.000000
        flapping_threshold_low = 25.000000
        force_next_check = false
        force_next_notification = false
        groups = [ ]
        ha_mode = 0.000000
        handled = false
        host = {
                __name = "monitoring-intern.xyz.dev"
                acknowledgement = 0.000000
                acknowledgement_expiry = 0.000000
                acknowledgement_last_change = 0.000000
                action_url = ""
                active = true
                address = ""
                address6 = ""
                check_attempt = 1.000000
                check_command = "dummy"
                check_interval = 300.000000
                check_period = ""
                check_timeout = null
                command_endpoint = ""
                display_name = "monitoring-intern.xyz.dev"
                downtime_depth = 0.000000
                enable_active_checks = true
                enable_event_handler = true
                enable_flapping = false
                enable_notifications = true
                enable_passive_checks = false
                enable_perfdata = true
                event_command = ""
                executions = null
                extensions = {
                        DbObject = {
                                type = "Object"
                        }
                }
                flapping = false
                flapping_buffer = 0.000000
                flapping_current = 0.000000
                flapping_ignore_states = null
                flapping_ignore_states_filter_real = -1.000000
                flapping_index = 7.000000
                flapping_last_change = 0.000000
                flapping_last_state = 0.000000
                flapping_threshold = 0.000000
                flapping_threshold_high = 30.000000
                flapping_threshold_low = 25.000000
                force_next_check = false
                force_next_notification = false
                groups = [ ]
                ha_mode = 0.000000
                handled = false
                icon_image = ""
                icon_image_alt = ""
                last_check = 1629118577.828568
                last_check_result = {
                        active = true
                        check_source = "abc-ic-sl01"
                        command = "dummy"
                        execution_end = 1629118577.828564
                        execution_start = 1629118577.828564
                        exit_status = 0.000000
                        output = "Check was successful."
                        performance_data = [ ]
                        schedule_end = 1629118577.828568
                        schedule_start = 1629118577.828228
                        scheduling_source = "abc-ic-sl01"
                        state = 0.000000
                        ttl = 0.000000
                        type = "CheckResult"
                        vars_after = {
                                attempt = 1.000000
                                reachable = true
                                state = 0.000000
                                state_type = 1.000000
                        }
                        vars_before = {
                                attempt = 1.000000
                                reachable = true
                                state = 0.000000
                                state_type = 1.000000
                        }
                }
                last_check_started = 1629118577.828478
                last_hard_state = 0.000000
                last_hard_state_change = 1621954286.747852
                last_hard_state_raw = 0.000000
                last_hard_states_raw = 3.000000
                last_reachable = true
                last_soft_states_raw = 99.000000
                last_state = 0.000000
                last_state_change = 1621954286.747852
                last_state_down = 0.000000
                last_state_raw = 0.000000
                last_state_type = 1.000000
                last_state_unreachable = 0.000000
                last_state_up = 1629118577.828564
                max_check_attempts = 3.000000
                name = "monitoring-intern.xyz.dev"
                next_check = 1629118869.824033
                next_update = 1629119169.824705
                notes = ""
                notes_url = ""
                original_attributes = null
                package = "director"
                pause_called = false
                paused = false
                pending_executions = null
                previous_state_change = 1621954286.747852
                problem = false
                resume_called = true
                retry_interval = 60.000000
                severity = 0.000000
                source_location = {
                        first_column = 0.000000
                        first_line = 1.000000
                        last_column = 44.000000
                        last_line = 1.000000
                        path = "/var/lib/icinga2/api/packages/director/5c01c3d3-0e01-40ad-beb0-bda918487a23/zones.d/abc-azure/hosts.conf"
                }
                start_called = true
                state = 0.000000
                state_loaded = true
                state_raw = 0.000000
                state_type = 1.000000
                stop_called = false
                suppressed_notifications = 0.000000
                templates = [ "monitoring-intern.xyz.dev", "abc-azure-host-template", "generic-hosts", "dummy_hosts" ]
                type = "Host"
                vars = {
                        jira_issuetype = "Incident"
                        jira_project = "XYZ"
                        notification_alerting = true
                        notification_period = "24x7"
                }
                version = 0.000000
                volatile = false
                zone = "abc-azure"
        }
        host_name = "monitoring-intern.xyz.dev"
        icon_image = ""
        icon_image_alt = ""
        last_check = 1629118676.612957
        last_check_result = {
                active = true
                check_source = "abc-ic-sl01"
                command = [ "/usr/lib64/nagios/plugins/check_http", "--sni", "-H", "monitoring-intern.xyz.dev", "-I", "monitoring-intern.xyz.dev", "-S" ]
                execution_end = 1629118676.612922
                execution_start = 1629118676.575291
                exit_status = 0.000000
                output = "HTTP OK: HTTP/1.1 302 Found - 553 bytes in 0.033 second response time "
                performance_data = [ "time=0.033050s;;;0.000000", "size=553B;;;0" ]
                schedule_end = 1629118676.612957
                schedule_start = 1629118676.574536
                scheduling_source = "abc-ic-sl01"
                state = 0.000000
                ttl = 0.000000
                type = "CheckResult"
                vars_after = {
                        attempt = 1.000000
                        reachable = true
                        state = 0.000000
                        state_type = 1.000000
                }
                vars_before = {
                        attempt = 1.000000
                        reachable = true
                        state = 0.000000
                        state_type = 1.000000
                }
        }
        last_check_started = 1629118676.574847
        last_hard_state = 0.000000
        last_hard_state_change = 1628545442.050737
        last_hard_state_raw = 0.000000
        last_hard_states_raw = 0.000000
        last_reachable = true
        last_soft_states_raw = 3.000000
        last_state = 0.000000
        last_state_change = 1628545442.050737
        last_state_critical = 1625686367.520210
        last_state_ok = 1629118676.612922
        last_state_raw = 0.000000
        last_state_type = 1.000000
        last_state_unknown = 1628545322.996843
        last_state_unreachable = 1621954194.497015
        last_state_warning = 0.000000
        max_check_attempts = 3.000000
        name = "HTTPS_monitoring-intern.xyz.dev"
        next_check = 1629118967.977217
        next_update = 1629119268.053989
        notes = ""
        notes_url = ""
        original_attributes = null
        package = "director"
        pause_called = false
        paused = true
        pending_executions = null
        previous_state_change = 1628545442.050737
        problem = false
        resume_called = false
        retry_interval = 120.000000
        severity = 0.000000
        source_location = {
                first_column = 1.000000
                first_line = 50.000000
                last_column = 53.000000
                last_line = 50.000000
                path = "/var/lib/icinga2/api/packages/director/5c01c3d3-0e01-40ad-beb0-bda918487a23/zones.d/director-global/service_apply.conf"
        }
        start_called = true
        state = 0.000000
        state_loaded = true
        state_raw = 0.000000
        state_type = 1.000000
        stop_called = false
        suppressed_notifications = 0.000000
        templates = [ "HTTPS_monitoring-intern.xyz.dev", "http", "generic-service-template", "host var overrides (Director)" ]
        type = "Service"
        vars = {
                grafana_graph_disable = true
                http_address = "monitoring-intern.xyz.dev"
                http_sni = true
                http_ssl = true
                http_vhost = "monitoring-intern.xyz.dev"
                jira_issuetype = "Incident"
                jira_ticket_priority = "2"
                notification_period = "24x7"
        }
        version = 0.000000
        volatile = false
        zone = "abc-azure"
}, 
...

So the question is: why is the var empty when the same code runs as part of the config?
Are the Service objects not present at that stage of the configuration/deployment?

To answer this question you should split get_objects() and Array#filter().
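
Splitting the chain could look like the sketch below. It only uses get_objects(), Array#filter(), Array#len() and the built-in log() function; the variable name all_services and the "step"/"count" keys are illustrative and not part of the original config.

```
# Step 1: fetch every Service object that exists at this point of the config load.
var all_services = get_objects(Service)
log(LogInformation, "config", { step = "get_objects", count = all_services.len() })

# Step 2: filter separately, so an empty result can be attributed to one step.
var http_services = all_services.filter(service => match("HTTPS_*", service.name))
log(LogInformation, "config", { step = "filter", count = http_services.len() })
```

If the first count is already 0, the filter is not the culprit and the objects simply are not there yet when this code runs.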

We already tried that, but it did not change the outcome.

What we found out is:

  • Accessing Icinga objects (hosts or services) seems to be possible only inside an apply rule (any apply rule).

We tested this as follows:

  1. The config looks like this:
apply Service "commvault-backup" {
    import "commvault-backup_status"

    debugger

    vars.commvault_server = "bla"
    vars.commvault_user = "bla"

    command_endpoint = host_name

    assign where (host.vars.os_family == "linux" || host.vars.os_family == "windows") && !(host.vars.tags.noBackup == "yes") && !(host.vars.tags.noBackup == "true")
}

# Cert Checks for HTTPS Checks
var http_services = get_objects(Service).filter(service => match("HTTPS_*", service.name))
var hosts_with_httpservice = get_hosts_with_http_service(http_services)

debugger

for (http_obj in hosts_with_httpservice) {
  apply Service "CERT_" + http_obj.vhost use (http_obj) {
    import "http"

    vars.http_certificate = "14,7"
    vars.http_ssl = true
    vars.jira_ticket_priority = "3"
    vars.notification_period = "8x5"
    vars.service_owner = http_obj.service_owner

    #assign where host.name == http_obj.hostname
    assign where host.name == "msd-rp01"
  }
}
  2. Run icinga2 daemon -CX
  3. The debugger stops first at the second debugger statement, after the two var definitions, but both vars are empty (no matter whether http_services is assigned the complete statement or just get_objects(Service)):
[2021-08-17 13:24:16 +0200] information/cli: Loading configuration file(s).
Breakpoint encountered.
Location: in /etc/icinga2/zones.d/global-satellites/services.conf: 102:1-102:8
/etc/icinga2/zones.d/global-satellites/services.conf(100): 
/etc/icinga2/zones.d/global-satellites/services.conf(101): # Cert Checks for HTTPS Checks
/etc/icinga2/zones.d/global-satellites/services.conf(102): debugger
                                                           ^^^^^^^^
/etc/icinga2/zones.d/global-satellites/services.conf(103): var http_services = get_objects(Service).filter(service => match("HTTPS_*", service.name))
/etc/icinga2/zones.d/global-satellites/services.conf(104): var hosts_with_httpservice = get_hosts_with_http_service(http_services)
You can inspect expressions (such as variables) by entering them at the prompt.
To leave the debugger and continue the program use "$continue".
For further commands see "$help".
<1> => var test = get_objects(Service)
null
<2> => test
[ ]
<3> =>
  4. After this, the debugger stops at the breakpoint inside the apply rule, and there the function can be used.

The question is: can we delay populating our vars (that is, calling the function) until the objects exist?
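
One difference between the two rules in this thread is where the function call sits: the working cluster-zone rule calls its function inside "apply ... for (...)", while the CERT rule calls it at the top level of the file. The untested sketch below moves the call into the apply-for expression, assuming that this expression is evaluated when the rule is applied rather than when the file is loaded. The function get_cert_targets and the names targets, svc, vhost and target are hypothetical. It returns a dictionary keyed by vhost so that each generated service gets a distinct name. Whether the HTTPS_* services already exist at that moment is not something this thread establishes.

```
# Hypothetical helper: maps each vhost to the host and owner of its HTTPS_* service.
globals.get_cert_targets = function() {
  var targets = {}

  for (svc in get_objects(Service)) {
    # Guard against services without custom vars before reading http_vhost.
    if (match("HTTPS_*", svc.name) && svc.vars && svc.vars.http_vhost) {
      targets[svc.vars.http_vhost] = {
        hostname = svc.host_name
        service_owner = svc.vars.service_owner
      }
    }
  }

  return targets
}

# The function call now sits in the apply-for expression instead of at file level.
apply Service "CERT_" for (vhost => target in get_cert_targets()) {
  import "http"  # same template as in the original CERT rule

  vars.http_certificate = "14,7"
  vars.http_ssl = true
  vars.jira_ticket_priority = "3"
  vars.notification_period = "8x5"
  vars.service_owner = target.service_owner

  assign where host.name == target.hostname
}
```

The loop variable is used in "assign where" here in the same way the cluster-zone rule uses mgmt_zone in its "ignore where" expression.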

Can’t you just create the cert services based on the same custom vars as the HTTP services?

We don’t have a custom var for the HTTPS checks. We create them manually when needed and want the cert checks to be created automatically whenever a new HTTPS check is added to any host.

As some of the checks require a redirect follow or a different expect regex, we can’t just use a simple var that creates the HTTPS checks when it is true.

I recommend you to have a dictionary custom var on each host with SERVICE_NAME=>SERVICE_VARS. Then you can do:

apply Service "" for (name => opts in host.vars.http) {
  vars += opts
}
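
Spelled out a little further, that suggestion might look like the sketch below. The host definition, its address and the per-vhost options are made up for illustration; http_vhost, http_ssl, http_certificate, http_onredirect and http_expect_body_regex are custom vars of the ITL "http" CheckCommand. The second rule derives a certificate check from the same dictionary.

```
# Hypothetical host carrying one dictionary entry per vhost.
object Host "abc-repo01" {
  check_command = "hostalive"
  address = "192.0.2.10"  # placeholder address

  vars.http["repo-private.xyz.dev"] = {
    http_vhost = "repo-private.xyz.dev"
    http_onredirect = "follow"
  }
  vars.http["repo-public.xyz.dev"] = {
    http_vhost = "repo-public.xyz.dev"
    http_expect_body_regex = "Index of"
  }
}

# One HTTPS check per dictionary entry, with that entry's individual options.
apply Service "HTTPS_" for (name => opts in host.vars.http) {
  check_command = "http"
  vars.http_ssl = true
  vars += opts
}

# One certificate check per dictionary entry, created from the same data.
apply Service "CERT_" for (name => opts in host.vars.http) {
  check_command = "http"
  vars.http_vhost = opts.http_vhost
  vars.http_ssl = true
  vars.http_certificate = "14,7"
}
```

With this layout, adding a vhost entry to a host yields both the HTTPS check and the matching cert check without any lookup of existing Service objects.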

As said in my previous post, this is a solution we don’t want to/can’t use.
As the hosts are configured with the Director, a nested dictionary var (which we would need in order to set different check options for different vhosts) is also not possible, afaik.

We were rather hoping for answers to our other questions regarding the configuration:

Edit: Well, it seems that what we are trying to do is impossible. At least we are out of ideas about what else to try.
If, in the future, someone stumbles upon this and has ideas (or a solution :wink: ), we are happy to hear them.

For now we will deploy the checks manually unless we get an epiphany on how to automate it :smiley: