Hello folks,
I have an odd issue I’m hoping someone else has had previously (although I’m not banking on it).
Our systems are on a secure network, so unfortunately I’m not able to put up full configs. We’re running Icinga 2 version 2.12.3 on a pair of HA masters, which connect to a pair of satellites on another network. All of these servers run CentOS 7.
tl;dr
I deleted a host from Icinga because it kept alerting despite being acknowledged, downtimed and disabled in the web frontend, and it is now continuing to send notifications from beyond the grave.
Long story time-
We were monitoring a switch with SNMP, and it was continually alerting. We have a Telegram channel for notifications, and every 30 minutes we get an alert from this host. The host itself has been decommissioned and no longer exists.
Initially, the network team disabled the host in Icinga but continued to receive notifications. I took over on-call duties and noticed the host was alerting, so I ack’d the alert, only to receive another notification 30 minutes later. I then added a fixed one-week downtime to the host and received a notification 30 minutes after the downtime took effect.
I confirmed with the team that this host was no longer being monitored, and removed it from Icinga by logging in to the web frontend, going to Icinga Director > Hosts > Hosts, searching for the hostname and deleting it, then deployed my changes from the Notifications area.
30 minutes later I received a notification for it.
I restored the host by going back into Icinga Director > Activity log > restore, set the Response SLA to none and deleted the host again. Still receiving notifications.
At that point I gave up for the night. The next day, I logged into the masters and restarted the Icinga2 service on each server. Still receiving notifications.
I logged in to the satellites and restarted the Icinga2 service on each of them. Still receiving notifications.
I had a look at the satellites, and I could see the host referenced in a couple of config files when I grepped for the hostname over /var/lib/icinga2. I tried stopping the icinga2 service on each satellite, then removed everything in the config like so-
rm -rf /var/lib/icinga2/api/{packages,zones,zones-stage}/*
and started up the service. This removed every remaining reference to that server on the satellites, but notifications are still being sent.
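The hostname search described above can be sketched as a small shell helper, which makes it easy to repeat on the masters as well as the satellites. The function name `find_host_refs` is hypothetical and the default directory is simply the one mentioned above. It only lists matching files and changes nothing.

```shell
# List every file under a directory that mentions the given hostname.
# find_host_refs is a hypothetical helper name; the default directory
# is the one searched in the post.
find_host_refs() {
    host="$1"
    dir="${2:-/var/lib/icinga2}"
    # -r: recurse, -l: print file names only, -F: match the hostname literally
    grep -rlF -- "$host" "$dir" 2>/dev/null
}

# Example: find_host_refs "SERVER-I-DELETED"
```

An empty result means no file under that directory contains the hostname as a literal string.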
I restarted the services on the masters again, no luck.
I’ve also tried, in the Icinga web frontend, going to
Icinga Director > Icinga Infrastructure > Kickstart Wizard
and clicking the “Run import” button, however this doesn’t seem to have done anything.
I tried restoring the host again to at least get it back in the list of hosts in Icinga web, however when I click on my delete action in the Activity log I get this error-
#0 /usr/share/icingaweb2/modules/director/library/Director/Data/Db/DbObject.php(1173): Icinga\Module\Director\Data\Db\DbObject->loadFromDb()
#1 /usr/share/icingaweb2/modules/director/library/Director/Objects/IcingaObject.php(2589): Icinga\Module\Director\Data\Db\DbObject::load(String, Object(Icinga\Module\Director\Db))
#2 /usr/share/icingaweb2/modules/director/library/Director/Web/Controller/ObjectController.php(458): Icinga\Module\Director\Objects\IcingaObject::loadByType(String, String, Object(Icinga\Module\Director\Db))
#3 /usr/share/icingaweb2/modules/director/library/Director/Web/Controller/ObjectController.php(441): Icinga\Module\Director\Web\Controller\ObjectController->loadObject()
#4 /usr/share/icingaweb2/modules/director/library/Director/Web/Controller/ObjectController.php(75): Icinga\Module\Director\Web\Controller\ObjectController->eventuallyLoadObject()
#5 /usr/share/php/Icinga/Web/Controller/ActionController.php(155): Icinga\Module\Director\Web\Controller\ObjectController->init()
#6 /usr/share/php/Icinga/Web/Controller/Dispatcher.php(59): Icinga\Web\Controller\ActionController->__construct(Object(Icinga\Web\Request), Object(Icinga\Web\Response), Array)
#7 /usr/share/icingaweb2/library/vendor/Zend/Controller/Front.php(937): Icinga\Web\Controller\Dispatcher->dispatch(Object(Icinga\Web\Request), Object(Icinga\Web\Response))
#8 /usr/share/php/Icinga/Application/Web.php(300): Zend_Controller_Front->dispatch(Object(Icinga\Web\Request), Object(Icinga\Web\Response))
#9 /usr/share/php/Icinga/Application/webrouter.php(99): Icinga\Application\Web->dispatch()
#10 /usr/share/icingaweb2/public/index.php(4): require_once(String)
#11 {main}
Notifications that come through from this deleted host aren’t showing in the Icinga web frontend in History > Notifications, however you can see the notification getting generated by checking
tail -f /var/log/icinga2/icinga2.log
on one of the masters.
[2021-12-31 14:33:42 +1100] information/Notification: Sending reminder 'Problem' notification 'SERVER-I-DELETED!notify-oncall-telegram-hosts_24x7' for user 'Dan Mackie'
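To pull only the notification lines for one host out of the log, rather than watching `tail -f`, something like the following sketch may help. The function name `host_notifications` is hypothetical, and it assumes the log lines keep the format shown above, with the notification object quoted as 'HOST!notification-name'.

```shell
# Print the notification log lines that mention a given host.
# host_notifications is a hypothetical helper name; the default path is
# the log file tailed in the post.
host_notifications() {
    host="$1"
    log="${2:-/var/log/icinga2/icinga2.log}"
    # Notification object names appear as 'HOST!notification-name'.
    # The '!' is single-quoted to avoid history expansion in interactive shells.
    grep -F "information/Notification" "$log" | grep -F -- "'${host}"'!'
}

# Example: host_notifications "SERVER-I-DELETED"
```

The timestamps on the matching lines show whether the reminders still arrive on the 30-minute interval.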
There’s nothing recent in the debug, errbot or error logs on either master.
However, in Icinga web, under System > Application log, I see this error-
Icinga\Exception\NotFoundError in /usr/share/icingaweb2/modules/director/library/Director/Data/Db/DbObject.php:642 with message: Failed to load icinga_host "SERVER-I-DELETED"
#0 /usr/share/icingaweb2/modules/director/library/Director/Data/Db/DbObject.php(1173): Icinga\Module\Director\Data\Db\DbObject->loadFromDb()
#1 /usr/share/icingaweb2/modules/director/library/Director/Objects/IcingaObject.php(2589): Icinga\Module\Director\Data\Db\DbObject::load(String, Object(Icinga\Module\Director\Db))
#2 /usr/share/icingaweb2/modules/director/library/Director/Web/Controller/ObjectController.php(458): Icinga\Module\Director\Objects\IcingaObject::loadByType(String, String, Object(Icinga\Module\Director\Db))
#3 /usr/share/icingaweb2/modules/director/library/Director/Web/Controller/ObjectController.php(441): Icinga\Module\Director\Web\Controller\ObjectController->loadObject()
#4 /usr/share/icingaweb2/modules/director/library/Director/Web/Controller/ObjectController.php(75): Icinga\Module\Director\Web\Controller\ObjectController->eventuallyLoadObject()
#5 /usr/share/php/Icinga/Web/Controller/ActionController.php(155): Icinga\Module\Director\Web\Controller\ObjectController->init()
#6 /usr/share/php/Icinga/Web/Controller/Dispatcher.php(59): Icinga\Web\Controller\ActionController->__construct(Object(Icinga\Web\Request), Object(Icinga\Web\Response), Array)
#7 /usr/share/icingaweb2/library/vendor/Zend/Controller/Front.php(937): Icinga\Web\Controller\Dispatcher->dispatch(Object(Icinga\Web\Request), Object(Icinga\Web\Response))
#8 /usr/share/php/Icinga/Application/Web.php(300): Zend_Controller_Front->dispatch(Object(Icinga\Web\Request), Object(Icinga\Web\Response))
#9 /usr/share/php/Icinga/Application/webrouter.php(99): Icinga\Application\Web->dispatch()
#10 /usr/share/icingaweb2/public/index.php(4): require_once(String)
#11 {main}
I ran
icinga2 object list | grep "SERVER-I-DELETED"
on one of the masters and got no results.
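A narrower variant of the same check is possible, since `icinga2 object list` accepts `--type` and `--name` filters. The sketch below targets the Notification object named in the log line instead of grepping all output. The wrapper name `list_host_notifications` is hypothetical, and it assumes the `--name` filter matches wildcards against the full 'HOST!notification-name' object name.

```shell
# List Notification objects attached to a given host.
# list_host_notifications is a hypothetical wrapper name.
list_host_notifications() {
    host="$1"
    # Assumption: --name accepts a wildcard pattern; '!*' is single-quoted
    # so the shell passes it through unchanged.
    icinga2 object list --type Notification --name "${host}"'!*'
}

# Example: list_host_notifications "SERVER-I-DELETED"
```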
If you’ve made it this far through my tale of woe, any help is appreciated. It seems to me that something is out of sync somewhere in the pipeline, but I can’t figure out where (presumably between the masters and the web frontend?). I know details about our setup are sparse, for which I apologise. My main questions are-
- Is there a way to clear a notification out of the queue if the host doesn’t exist any more?
- Is there anything really obvious I’m missing in my troubleshooting as far as logs / Director goes?
Thanks all,
Dan