I’m sending notifications for hosts on [ Problem, Recovery ], and I was wondering if there is a field (or a set of fields) I can use in my Recovery notification to uniquely refer to the Problem it is recovering from?
Worst case I can just assume that it is always recovering from the “last received” problem, but I was wondering if there is a cleaner way to do that?
To give you some background, I’m test driving an alert aggregation tool (https://www.keephq.dev/) for which I had to create a custom NotificationCommand. This tool has features to link alerts to each other, the most basic being to link a recovery to the original problem; however, I need to tell it which fields to use and, obviously, include those fields in both notifications.
It is great news that a Recovery notification already refers to a specific Problem; it saves me some hacking around (I was already thinking about some dirty workarounds haha). Do you know which field(s) it uses to do so? I cannot find this information in the documentation.
I see. Well, inside Icinga 2 the unique identifiers are the host_name (for Host problems) and the host_name plus service_name (for Service problems, usually combined as Host!Service). Since a single object (Host or Service) is either in a Problem state or not, this is usually sufficient to identify a “Problem”.
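As a rough illustration, the receiving side could build that identifier like this. This is only a sketch: the function name `object_key` is made up here, and it assumes your NotificationCommand passes the host name and (for services) the service name along to the script.

```python
from typing import Optional


def object_key(host_name: str, service_name: Optional[str] = None) -> str:
    """Build a correlation key for a notification (hypothetical helper).

    Hosts are identified by host_name alone, services by the
    "host!service" combination described above.
    """
    if service_name:
        return f"{host_name}!{service_name}"
    return host_name
```

If both the Problem and the Recovery notification carry this key, the aggregation tool can be told to match on that single field.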
I hope this helps
Clear, thanks! This is already what I am doing now, but I was curious if there was a way to relate individual (problem, recovery) pairs to facilitate SLA tracking. For instance, some monitoring tools associate an ID with a problem and refer to that ID during recovery.
In the end it is more a business requirement than a technical consideration. I’ll look further; maybe using some of the last_xxx timestamp attributes in combination with the host/service names can do the trick.
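Another option would be to do the pairing on the receiving side: since an object is either in a Problem state or not, the receiver can hand out its own ID when a Problem arrives and return the same ID on the next Recovery for that object. Below is a minimal sketch of that idea. The class name `ProblemTracker`, the in-memory dict, and the type strings "PROBLEM" and "RECOVERY" are assumptions for illustration, not something Icinga 2 provides.

```python
import uuid
from typing import Dict, Optional


class ProblemTracker:
    """Pair each Recovery with the open Problem of the same object.

    Hypothetical helper: IDs are generated here, not by Icinga 2,
    and are kept in memory only.
    """

    def __init__(self) -> None:
        # Maps an object key ("host" or "host!service") to its open problem ID.
        self._open: Dict[str, str] = {}

    def handle(self, key: str, notification_type: str) -> Optional[str]:
        kind = notification_type.upper()
        if kind == "PROBLEM":
            # A repeated Problem for the same object reuses the open ID.
            return self._open.setdefault(key, uuid.uuid4().hex)
        if kind == "RECOVERY":
            # Close the problem; returns None if no Problem was seen before.
            return self._open.pop(key, None)
        # Other notification types are ignored in this sketch.
        return None
```

A Problem followed by a Recovery for "web01!http" would yield the same ID twice, and the next Problem for that object would get a fresh one. A real setup would need to persist the mapping so that a restart does not lose open problems.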
Icinga 2 does not create tickets[1] for status problems. If a HARD state change happens and a Notification object is configured, it will be used to send out a notification.
In theory, you could access the Host’s or Service’s last_state_change runtime attribute, but this will not spark joy, e.g., think of flapping checks.
Grouping notifications into incidents is out of Icinga 2’s scope. There are people forwarding notifications to ticket systems or other issue tracking platforms to do this job. The service you linked seems to be one of those, I guess. Within the Icinga universe, there is the new Icinga Notifications project tackling this issue.