Phantom notifications from deleted host

Hello folks,

I have an odd issue I’m hoping someone else has had previously (although I’m not banking on it).

Our systems are on a secure network so unfortunately I’m not able to put up full configs. We’re running Icinga2 version 2.12.3 on a pair of HA masters, which connect to a pair of satellites on another network. All of these servers run CentOS 7.

tl;dr
I deleted a host from Icinga as it was alerting despite being acknowledged, downtimed and disabled in the web frontend, and it is now continuing to send notifications from beyond the grave.

Long story time-

We were monitoring a switch with SNMP, and it was continually alerting. We have a Telegram channel for notifications, and every 30 minutes we get an alert from this host. The host itself has been decommissioned and no longer exists.

Initially, the network team disabled the host in Icinga, but continued to receive notifications. I took over on-call duties, and noticed the host was alerting, so I ack’d the alert only to receive another notification 30 minutes later. I added a fixed downtime to the host of 1 week, and then received a notification 30 minutes after the downtime took effect.

I confirmed with the team that this host was no longer being monitored, and removed it from Icinga by logging in to the web frontend, going to Icinga Director > Hosts > Hosts, searching for the hostname and deleting it, then deployed my changes from the Notifications area.

30 minutes later I received a notification for it.

I restored the host by going back into Icinga Director > Activity log > restore, set the Response SLA to none and deleted the host again. Still receiving notifications.

At that point I gave up for the night. The next day, I logged into the masters and restarted the Icinga2 service on each server. Still receiving notifications.

I logged in to the satellites and restarted the Icinga2 service on each of them. Still receiving notifications.

I had a look at the satellites, and I could see the host referenced in a couple of config files when I grepped for the hostname over /var/lib/icinga2. I tried stopping the icinga2 service on each satellite, then removed everything in the config like so-

rm -rf /var/lib/icinga2/api/{packages,zones,zones-stage}/*

and started the service again. This removed every remaining reference to that host on the satellites, but notifications are still being sent.

I restarted the services on the masters again, no luck.

I’ve also tried, in Icinga web frontend, going to
Icinga Director > Icinga Infrastructure > Kickstart Wizard

and clicking the “Run import” button, however this doesn’t seem to have done anything.

I tried restoring the host again to at least get it back in the list of hosts in Icinga web, however when I click on my delete action in the Activity log I get this error-

#0 /usr/share/icingaweb2/modules/director/library/Director/Data/Db/DbObject.php(1173): Icinga\Module\Director\Data\Db\DbObject->loadFromDb()
#1 /usr/share/icingaweb2/modules/director/library/Director/Objects/IcingaObject.php(2589): Icinga\Module\Director\Data\Db\DbObject::load(String, Object(Icinga\Module\Director\Db))
#2 /usr/share/icingaweb2/modules/director/library/Director/Web/Controller/ObjectController.php(458): Icinga\Module\Director\Objects\IcingaObject::loadByType(String, String, Object(Icinga\Module\Director\Db))
#3 /usr/share/icingaweb2/modules/director/library/Director/Web/Controller/ObjectController.php(441): Icinga\Module\Director\Web\Controller\ObjectController->loadObject()
#4 /usr/share/icingaweb2/modules/director/library/Director/Web/Controller/ObjectController.php(75): Icinga\Module\Director\Web\Controller\ObjectController->eventuallyLoadObject()
#5 /usr/share/php/Icinga/Web/Controller/ActionController.php(155): Icinga\Module\Director\Web\Controller\ObjectController->init()
#6 /usr/share/php/Icinga/Web/Controller/Dispatcher.php(59): Icinga\Web\Controller\ActionController->__construct(Object(Icinga\Web\Request), Object(Icinga\Web\Response), Array)
#7 /usr/share/icingaweb2/library/vendor/Zend/Controller/Front.php(937): Icinga\Web\Controller\Dispatcher->dispatch(Object(Icinga\Web\Request), Object(Icinga\Web\Response))
#8 /usr/share/php/Icinga/Application/Web.php(300): Zend_Controller_Front->dispatch(Object(Icinga\Web\Request), Object(Icinga\Web\Response))
#9 /usr/share/php/Icinga/Application/webrouter.php(99): Icinga\Application\Web->dispatch()
#10 /usr/share/icingaweb2/public/index.php(4): require_once(String)
#11 {main}

Notifications that come through from this deleted host aren’t showing in the Icinga web frontend in History > Notifications, however you can see the notification getting generated by checking
tail -f /var/log/icinga2/icinga2.log
on one of the masters.

[2021-12-31 14:33:42 +1100] information/Notification: Sending reminder 'Problem' notification 'SERVER-I-DELETED!notify-oncall-telegram-hosts_24x7' for user 'Dan Mackie'
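
To see how often the phantom notification fires and which notification object produces it, log lines like the one above can be filtered by host and split on the `!` that separates the host name from the notification name. This is a minimal sketch that assumes the log format is exactly as shown above; `reminders_for_host` is a made-up helper name, not an Icinga command.

```shell
#!/bin/sh
# Print "timestamp|notification-name" for every notification log line
# that belongs to the given host. Log lines are read from stdin.
# reminders_for_host is a hypothetical helper, not part of Icinga.
reminders_for_host() {
    host=$1
    # -F: treat the host name as a fixed string, not a regex.
    grep -F "notification '${host}!" |
        sed -n "s#^\[\([^]]*\)\] information/Notification: .* notification '[^!]*!\([^']*\)'.*#\1|\2#p"
}
```

For example: `reminders_for_host 'SERVER-I-DELETED' < /var/log/icinga2/icinga2.log`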

There’s nothing recent in the debug, errbot or error logs on either master.

However, in Icinga web, under System > Application log, I see this error-

Icinga\Exception\NotFoundError in /usr/share/icingaweb2/modules/director/library/Director/Data/Db/DbObject.php:642 with message: Failed to load icinga_host "SERVER-I-DELETED"
#0 /usr/share/icingaweb2/modules/director/library/Director/Data/Db/DbObject.php(1173): Icinga\Module\Director\Data\Db\DbObject->loadFromDb()
#1 /usr/share/icingaweb2/modules/director/library/Director/Objects/IcingaObject.php(2589): Icinga\Module\Director\Data\Db\DbObject::load(String, Object(Icinga\Module\Director\Db))
#2 /usr/share/icingaweb2/modules/director/library/Director/Web/Controller/ObjectController.php(458): Icinga\Module\Director\Objects\IcingaObject::loadByType(String, String, Object(Icinga\Module\Director\Db))
#3 /usr/share/icingaweb2/modules/director/library/Director/Web/Controller/ObjectController.php(441): Icinga\Module\Director\Web\Controller\ObjectController->loadObject()
#4 /usr/share/icingaweb2/modules/director/library/Director/Web/Controller/ObjectController.php(75): Icinga\Module\Director\Web\Controller\ObjectController->eventuallyLoadObject()
#5 /usr/share/php/Icinga/Web/Controller/ActionController.php(155): Icinga\Module\Director\Web\Controller\ObjectController->init()
#6 /usr/share/php/Icinga/Web/Controller/Dispatcher.php(59): Icinga\Web\Controller\ActionController->__construct(Object(Icinga\Web\Request), Object(Icinga\Web\Response), Array)
#7 /usr/share/icingaweb2/library/vendor/Zend/Controller/Front.php(937): Icinga\Web\Controller\Dispatcher->dispatch(Object(Icinga\Web\Request), Object(Icinga\Web\Response))
#8 /usr/share/php/Icinga/Application/Web.php(300): Zend_Controller_Front->dispatch(Object(Icinga\Web\Request), Object(Icinga\Web\Response))
#9 /usr/share/php/Icinga/Application/webrouter.php(99): Icinga\Application\Web->dispatch()
#10 /usr/share/icingaweb2/public/index.php(4): require_once(String)
#11 {main}

I ran
icinga2 object list | grep "SERVER-I-DELETED"
on one of the masters and got no results.

If you’ve made it this far through my tale of woe, any help is appreciated. It seems to me that something is out of sync somewhere in the pipeline, I just can’t figure out where (presumably between the masters and the web frontend?). I know details about our setup are sparse (which I apologise for), my main questions are-

  1. Is there a way to clear a notification out of the queue if the host doesn’t exist any more?
  2. Is there anything really obvious I’m missing in my troubleshooting as far as logs / Director goes?

Thanks all,
Dan

Hi there :slight_smile:
Have you checked on both masters (and the satellites just to be sure) that there is nothing (about this host) configured in the /etc/icinga2 directory?
Maybe just grep for the host name in any .conf file there is^^

You could also run rm -rf /var/lib/icinga2/api/{packages,zones,zones-stage}/* on the masters as a last resort, but this has drawbacks: the state file will also be deleted, so every problem check will notify again once it has gone through soft and then hard state again. Not sure about acked or downtimed checks though.

Thanks @log1c

I had a grep over the /var/lib/icinga2 and /etc/icinga2 directories on the satellites and the masters and can’t find any instances of that deleted host in there.
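
For reference, a search along those lines might look like the sketch below. It matches the host name as a fixed string, ignores case, and prints only file names, so files that are not plain text are listed too. `find_host_refs` is a hypothetical helper name, and the directories are passed as arguments rather than assumed.

```shell
#!/bin/sh
# List every file under the given directories that mentions the host.
# find_host_refs is a hypothetical helper, not an Icinga command.
# Usage: find_host_refs HOSTNAME DIR...
find_host_refs() {
    host=$1
    shift
    # -r recurse, -i ignore case, -l print file names only,
    # -F fixed string (host names often contain dots), -- ends options.
    grep -rilF -- "$host" "$@" 2>/dev/null
}
```

For example: `find_host_refs 'SERVER-I-DELETED' /etc/icinga2 /var/lib/icinga2`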

I also tried the rm -rf /var/lib/icinga2/api/{packages,zones,zones-stage}/* trick on the masters, but the notification is continuing to come through.

Additionally, I logged in to each satellite, stopped the services on both, then stopped the services on the masters, and ran rm -rf /var/lib/icinga2/api/{packages,zones,zones-stage}/* across all four servers simultaneously before bringing the servers back up, but the alert is continuing to come through.

Thanks
Dan

Actually, I just told a lie- there were instances of the deleted host in files under /var/lib.

I stopped all services on the master and the satellites, then killed off any files I found that matched on the grep pattern. Then nuked /var/lib/icinga2/api/{packages,zones,zones-stage}/* again just to be sure.

I’ve now started all services again, and so far haven’t seen that host pop up. I will continue to keep an eye on the notifications though, I should see it in the next half hour or so if it is still occurring.

Thanks
Dan

If the host isn’t present in the director, all /var/lib/icinga2/api/{packages,zones,zones-stage}/* directories have been cleaned and the host somehow is still notifying I would call one of those three :wink:

(three images)


Worth checking the Icinga timestamps on the notifications. We’ve had problems with mail relays in the past where the relay that one of our Icinga instances used was causing mail to be delayed a couple of hours.
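
One way to check for a delayed delivery is to compare the timestamp Icinga logged for the notification with the time the message actually arrived. This is a rough sketch that assumes GNU `date` (for `-d`) and the bracketed timestamp format seen in `icinga2.log`; `delivery_delay` is a made-up helper name.

```shell
#!/bin/sh
# Print the number of seconds between the time Icinga logged a
# notification and the time the message was received.
# delivery_delay is a hypothetical helper; it needs GNU date for -d.
# Usage: delivery_delay 'LOG LINE' 'RECEIVED TIMESTAMP'
delivery_delay() {
    # Pull the text between the leading [ and ] out of the log line.
    logged=$(printf '%s\n' "$1" | sed -n 's/^\[\([^]]*\)\].*/\1/p')
    [ -n "$logged" ] || return 1
    sent=$(date -d "$logged" +%s) || return 1
    received=$(date -d "$2" +%s) || return 1
    echo $((received - sent))
}
```

A result close to zero suggests the notification really is being generated at that moment; a large value points at a delay somewhere between Icinga and the recipient.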

Thanks all for your help, here is what finally cleared the notifications for me-

  1. Stop the icinga2 service on the satellites and masters, so they are all down at the same time
    systemctl stop icinga2
  2. Delete the contents of the api subdirectories
    rm -rf /var/lib/icinga2/api/{packages,zones,zones-stage}/*
  3. grep over /var/lib for the hostname
    grep -irl hostname /var/lib
  4. rm any files that contain that hostname. In my case, these seemed to be /var/lib/icinga2/icinga2.state and any other icinga2.state files that were open (e.g. icinga2.state.ac5Eso)
  5. Start the icinga2 service again on all servers
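
The grep and rm steps can be sketched as a small helper that lists the matching files first and only deletes them when explicitly asked to. This is an illustration under assumptions, not a tested procedure: `purge_host_refs` and its `--delete` argument are made-up names, the search root is passed as an argument, and it should only be run while icinga2 is stopped on every node.

```shell
#!/bin/sh
# List, and optionally delete, files under ROOT that mention HOSTNAME.
# purge_host_refs and --delete are hypothetical names for this sketch.
# Run only while icinga2 is stopped on all masters and satellites.
# Usage: purge_host_refs HOSTNAME ROOT            (dry run, list only)
#        purge_host_refs HOSTNAME ROOT --delete   (remove the matches)
purge_host_refs() {
    host=$1
    root=$2
    mode=${3:-list}
    # Same search as the grep step: recursive, case-insensitive, names only.
    grep -rilF -- "$host" "$root" 2>/dev/null |
    while IFS= read -r file; do
        if [ "$mode" = "--delete" ]; then
            rm -f -- "$file" && echo "removed: $file"
        else
            echo "would remove: $file"
        fi
    done
}
```

Running it without `--delete` first makes it possible to review the list, since /var/lib may contain matches that have nothing to do with Icinga.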

Case closed!

PS. As @log1c mentioned, clearing the state files did indeed re-alert on every host that had been acknowledged previously, but there wasn’t too much cleanup work I had to do.
