API returns 500 Internal Server Error on process-check-result when updating a service in a remote zone

Hi,
I have a zoned setup with one agent that is reachable from all addresses and another agent operating from the management network. When I try to push a service status update via the icinga2 API from a monitored host to the public agent, the request fails with "HTTP/1.1 500 Internal Server Error". The host itself is monitored by the agent in the management network.

Is this simply not a supported feature? Are there better designs? The firewall does not allow traffic from the monitored host to the management node.

Reproduce:
I am trying to update a service state on the public agent (icinga-external). The host is monitored by the agent in the management network (icinga-satellite); icinga-satellite initiates the connection to icinga-external.

# curl -k -s -S -i -u user:password -H 'Accept: application/json' -X POST 'https://localhost:5665/v1/actions/process-check-result' -d '{ "type": "Service", "filter": "host.name==\"somehost0\" && service.name==\"someservice\"", "exit_status": 1, "plugin_output": "TEST CRITICAL - Packet loss = 100%", "pretty": true }'
HTTP/1.1 500 Internal Server Error
Server: Icinga/2.13.1-1
Content-Type: application/json
Content-Length: 22

{
    "results": []
}

If I try the same on a host being monitored by the public agent (icinga-external):

# curl -k -s -S -i -u user:password -H 'Accept: application/json' -X POST 'https://localhost:5665/v1/actions/process-check-result' -d '{ "type": "Service", "filter": "host.name==\"testhost-external\" && service.name==\"testservice\"", "exit_status": 1, "plugin_output": "TEST CRITICAL - Packet loss = 100%", "pretty": true }'
HTTP/1.1 200 OK
Server: Icinga/2.13.1-1
Content-Type: application/json
Content-Length: 176

{
    "results": [
        {
            "code": 200,
            "status": "Successfully processed check result for object 'testhost-external!testservice'."
        }
    ]
}
  • Version used (icinga2 --version)
    2.13.1-1

  • Operating System and version
    CentOS 7

  • Enabled features (icinga2 feature list)
    On the public and management agents: api checker mainlog
    On the master node: api checker ido-pgsql influxdb mainlog notification

  • Icinga Web 2 version and modules (System - About)

  • Config validation (icinga2 daemon -C)
    valid

  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes

Object 'icinga-satellite' of type 'Endpoint':
  % declared in '/etc/icinga2/zones.conf', lines 10:1-10:48
  * __name = "icinga-satellite"
  * host = "10.0.17.171"
    % = modified in '/etc/icinga2/zones.conf', lines 13:9-13:30
  * log_duration = 86400
  * name = "icinga-satellite"
  * package = "_etc"
  * port = "5665"
  * source_location
    * first_column = 1
    * first_line = 10
    * last_column = 48
    * last_line = 10
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "icinga-satellite" ]
    % = modified in '/etc/icinga2/zones.conf', lines 10:1-10:48
  * type = "Endpoint"
  * zone = ""

Object 'icinga-external' of type 'Endpoint':
  % declared in '/etc/icinga2/zones.conf', lines 22:1-22:47
  * __name = "icinga-external"
  * host = ""
  * log_duration = 86400
  * name = "icinga-external"
  * package = "_etc"
  * port = "5665"
  * source_location
    * first_column = 1
    * first_line = 22
    * last_column = 47
    * last_line = 22
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "icinga-external" ]
    % = modified in '/etc/icinga2/zones.conf', lines 22:1-22:47
  * type = "Endpoint"
  * zone = ""

Object 'icinga-master' of type 'Endpoint':
  % declared in '/etc/icinga2/zones.conf', lines 6:1-6:45
  * __name = "icinga-master"
  * host = ""
  * log_duration = 86400
  * name = "icinga-master"
  * package = "_etc"
  * port = "5665"
  * source_location
    * first_column = 1
    * first_line = 6
    * last_column = 45
    * last_line = 6
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "icinga-master" ]
    % = modified in '/etc/icinga2/zones.conf', lines 6:1-6:45
  * type = "Endpoint"
  * zone = ""

Object 'zone-satellite' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 31:1-31:32
  * __name = "zone-satellite"
  * endpoints = [ "icinga-satellite" ]
    % = modified in '/etc/icinga2/zones.conf', lines 32:9-32:56
  * global = false
  * name = "zone-satellite"
  * package = "_etc"
  * parent = "zone-master"
    % = modified in '/etc/icinga2/zones.conf', lines 33:9-33:30
  * source_location
    * first_column = 1
    * first_line = 31
    * last_column = 32
    * last_line = 31
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "zone-satellite" ]
    % = modified in '/etc/icinga2/zones.conf', lines 31:1-31:32
  * type = "Zone"
  * zone = ""

Object 'zone-master' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 27:1-27:25
  * __name = "zone-master"
  * endpoints = [ "icinga-master" ]
    % = modified in '/etc/icinga2/zones.conf', lines 28:2-28:46
  * global = false
  * name = "zone-master"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 1
    * first_line = 27
    * last_column = 25
    * last_line = 27
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "zone-master" ]
    % = modified in '/etc/icinga2/zones.conf', lines 27:1-27:25
  * type = "Zone"
  * zone = ""

Object 'global-templates' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 46:1-46:30
  * __name = "global-templates"
  * endpoints = null
  * global = true
    % = modified in '/etc/icinga2/zones.conf', lines 47:2-47:14
  * name = "global-templates"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 1
    * first_line = 46
    * last_column = 30
    * last_line = 46
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "global-templates" ]
    % = modified in '/etc/icinga2/zones.conf', lines 46:1-46:30
  * type = "Zone"
  * zone = ""

Object 'zone-external' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 41:1-41:31
  * __name = "zone-external"
  * endpoints = [ "icinga-external" ]
    % = modified in '/etc/icinga2/zones.conf', lines 42:9-42:55
  * global = false
  * name = "zone-external"
  * package = "_etc"
  * parent = "zone-satellite"
    % = modified in '/etc/icinga2/zones.conf', lines 43:9-43:37
  * source_location
    * first_column = 1
    * first_line = 41
    * last_column = 31
    * last_line = 41
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "zone-external" ]
    % = modified in '/etc/icinga2/zones.conf', lines 41:1-41:31
  * type = "Zone"
  * zone = ""

A very similar problem is discussed here: Submit passive result via remote agent API? - #13 by Bebef

It seems that this behaviour could be "as designed". But how can I actively monitor a host from the management network while still letting it update the status of certain services via the API (passive checks)?


We submit passive check results for thousands of services without issue, regardless of zone; we only ever submit to the master (never to an agent).

We use the Director module for import and try to place these services in the master zone (I can't say that this holds 100% of the time).

The way the API works is a bit odd too: if you have two masters, in my experience only the master with the active IDO will accept these calls.
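
For illustration, a minimal sketch of such a submission aimed at the master's API rather than an agent; it reuses the request body from the reproduce steps above, and the hostname and credentials are placeholders:

# sketch only -- "icinga-master" and the credentials are placeholders
curl -k -s -S -i -u user:password -H 'Accept: application/json' \
  -X POST 'https://icinga-master:5665/v1/actions/process-check-result' \
  -d '{ "type": "Service", "filter": "host.name==\"somehost0\" && service.name==\"someservice\"", "exit_status": 1, "plugin_output": "TEST CRITICAL - Packet loss = 100%", "pretty": true }'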

Hi Ben,
Good to know that sending updates to the master always works, but in our setup the master is in a protected zone, so we wanted to send updates to a satellite in a public zone. Is this possible?


Run the job from the master, or another node in the protected zone.

I’m also assuming that your protected zone can go out and reach whatever provides your status updates (API, snmp, DB call, etc…)

Other than making the call to the master, I'm not sure, but someone else might have an idea. In my experience, only the master (and the active master at that) accepts the calls and does anything with them.

We had a similar problem a while back, when we wanted to create downtimes/acknowledgements via the API from clients that can only reach the satellite in their zone but not the master.
The satellite will accept the API request and forward it to the master, but the master will not accept the update because it has authority over the satellite zones and therefore does not accept config updates from child zones.

What we ended up with is a simple HAProxy instance on the satellite, listening on port 5666 and forwarding incoming requests to the master.
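
Roughly, the forwarding looks like this (a minimal sketch, assuming a plain TCP passthrough so TLS stays end-to-end between the client and the master; the hostnames and addresses are placeholders):

# /etc/haproxy/haproxy.cfg on the satellite (sketch, placeholder names/addresses)
frontend icinga_api_forward
    bind *:5666                  # clients in the public zone connect here
    mode tcp                     # plain TCP passthrough, TLS terminates at the master
    default_backend icinga_master_api

backend icinga_master_api
    mode tcp
    server master1 icinga-master.example.com:5665 check   # real Icinga API port on the master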

See here

This could be a solution for your problem as well, if I understood it correctly.


Just for information: in a two-master scenario, I also get the 500 error when hitting the API endpoint 'v1/actions/process-check-result' on the slave master (the peer without the active IDO), when sending from any host.

Why is this not documented for Icinga?
Why are the logs not more verbose?
Why does the slave master's API not relay the request to the IDO master instead of failing with 500 and no further details?

Scenario description:

I have two masters running icinga2-2.13.2-1.el7.icinga.x86_64:

  • master-2, which is the configuration master (/etc/icinga2/zones.d/<all .conf files> + running the IDO connection), and
  • master-1, which is the peer (configuration replicated via /var/lib/icinga2/api/).

When I send the API request to process a check result (for a passive service) to master-1, it fails with 500 and no further details.
Client logs:

2022-07-29 03:06:47,136 - [32664] - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): master-1:5665
2022-07-29 03:06:47,580 - [32664] - urllib3.connectionpool - DEBUG - https://master-1:5665 “POST /v1/actions/process-check-result HTTP/1.1” 500 14
2022-07-29 03:06:47,584 - [32664] - common.icinga2 - CRITICAL - Failed api call with code HTTP: 500
2022-07-29 03:06:47,585 - [32664] - common.icinga2 - CRITICAL - Internal Server Error
2022-07-29 03:06:47,589 - [32664] - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): master-1:5665
2022-07-29 03:06:48,023 - [32664] - urllib3.connectionpool - DEBUG - https://master-1:5665 “POST /v1/actions/process-check-result HTTP/1.1” 500 14
2022-07-29 03:06:48,027 - [32664] - common.icinga2 - CRITICAL - Failed api call with code HTTP: 500
2022-07-29 03:06:48,027 - [32664] - common.icinga2 - CRITICAL - Internal Server Error
2022-07-29 03:06:48,032 - [32664] - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): master-1:5665
2022-07-29 03:06:48,479 - [32664] - urllib3.connectionpool - DEBUG - https://master-1:5665 “POST /v1/actions/process-check-result HTTP/1.1” 500 14
2022-07-29 03:06:48,484 - [32664] - common.icinga2 - CRITICAL - Failed api call with code HTTP: 500

master-1 debug logs:

[2022-07-29 02:54:05 -0400] information/ApiListener: New client connection from [::ffff:10.103.69.51]:59092 (no client certificate)
[2022-07-29 02:54:05 -0400] notice/ApiListener: Relaying ‘event::SetLastCheckStarted’ message
[2022-07-29 02:54:05 -0400] notice/ApiListener: Relaying ‘event::SetNextCheck’ message
[2022-07-29 02:54:05 -0400] notice/ApiListener: Relaying ‘event::SetLastCheckStarted’ message
[2022-07-29 02:54:05 -0400] notice/ApiListener: Relaying ‘event::SetNextCheck’ message
[2022-07-29 02:54:05 -0400] notice/ApiListener: Relaying ‘event::SetNextCheck’ message
[2022-07-29 02:54:05 -0400] notice/ApiListener: Relaying ‘event::SetNextCheck’ message
[2022-07-29 02:54:05 -0400] notice/ApiListener: Relaying ‘event::SetNextCheck’ message
[2022-07-29 02:54:05 -0400] notice/ApiListener: Relaying ‘event::CheckResult’ message
[2022-07-29 02:54:05 -0400] debug/HttpUtility: Request body: '{"type": "Service", "filter": "host.name==\"monitoring-influxdb2-100.internal\" && service.name==\"database-backup-custom_events-daily\"", "exit_status": 0, "plugin_output": "Backup Successful"}'
[2022-07-29 02:54:05 -0400] notice/ApiListener: Relaying ‘event::SetNextCheck’ message
[2022-07-29 02:54:05 -0400] notice/ApiActionHandler: Running action process-check-result
[2022-07-29 02:54:05 -0400] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [::ffff:10.103.69.51]:59092), user: externalchecker, agent: python-requests/2.27.1, status: Internal Server Error).

Using the same client, but sending to master-2 instead, I get the passive check result processed:

2022-07-29 03:25:07,203 - [3296] - main - INFO - starting icinga notification
2022-07-29 03:25:07,204 - [3296] - main - DEBUG - Icinga service: database-backup-custom_events-daily
2022-07-29 03:25:07,204 - [3296] - root - DEBUG - Calculated icinga2 filter host.name==“monitoring-influxdb2-100.internal” && service.name==“database-backup-custom_events-daily”
2022-07-29 03:25:07,212 - [3296] - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): master-2:5665
2022-07-29 03:25:07,648 - [3296] - urllib3.connectionpool - DEBUG - https://master-2:5665 “POST /v1/actions/process-check-result HTTP/1.1” 200 172
2022-07-29 03:25:07,652 - [3296] - common.icinga2 - DEBUG - posted to API HTTP 200

So, in my case, I'm forced to send to master-2 (config master + current IDO role) for the passive checks to be accepted, which is far from optimal.
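
For reference, the call that succeeds against master-2 looks roughly like this (a sketch reconstructed from the request body in the debug log above; the credentials are placeholders):

# sketch -- same body as shown in the master-1 debug log, sent to the config/IDO master instead
curl -k -s -S -i -u externalchecker:password -H 'Accept: application/json' \
  -X POST 'https://master-2:5665/v1/actions/process-check-result' \
  -d '{ "type": "Service", "filter": "host.name==\"monitoring-influxdb2-100.internal\" && service.name==\"database-backup-custom_events-daily\"", "exit_status": 0, "plugin_output": "Backup Successful" }'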

For documentation: my IDO configuration has HA enabled, and icinga2 assigns this role to master-2 (except during outages, when it automatically fails over).

object IdoMysqlConnection "ido-mysql" {
  host = "mysql-1.internal"
  port = 3306
  user = "someuser"
  password = "whatever"
  database = "icinga"
  table_prefix = "icinga_"
  enable_ha = true
  cleanup = {
    acknowledgements_age = 20d
    commenthistory_age = 10d
    contactnotifications_age = 3d
    flappinghistory_age = 3d
    notifications_age = 3d
    statehistory_age = 10d
  }
  enable_ssl = false
}