Hello everyone,
I have concerns about two aspects of my current setup that I think could be improved.
- User Notifications: Right now, I set a Custom Variable for every host that accepts entries from a list of names. For each name in that list, I’ve created a user and apply rules for both hosts and services corresponding to these users. While this method works, it’s cumbersome to add new users. Does anyone have suggestions for a better configuration?
- Scheduled Operations: We run replications and backups from 10 pm to 6 am. During this period, there’s an increase in critical errors like high CPU usage. I’ve set notifications to only send between 6 am and 9 pm to avoid false alerts, but this means we won’t be notified of genuine issues occurring between 10 pm and 6 am. Any solutions for this?
I’d appreciate any advice. Thank you!
- Not sure we mean the same thing. I am talking about notifications, not about user accounts.
- This would mean I get twice as many services, which is quite a lot.
Thanks for your suggestion. I hope you can explain the first one in more detail.
What @rivad means is that you put the users in user groups. I would go even further and put the hosts and services in host groups and service groups.
Then you can apply the notification based on user groups:
apply Notification "TEST" to Host {
  import "host-notification"
  assign where "Network Router" in host.groups
  user_groups = [ "one usergroup", "another usergroup" ]
  users = [ "one user", "another user" ]
}
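If the contacts already live in a per-host custom variable, a single apply rule can probably read the user list straight from it, so adding a user only means creating the User object and listing the name on the host. A rough sketch, assuming a hypothetical custom variable `vars.notify_users` that holds an array of user names, and assuming templates named "generic-host" and "service-notification" exist alongside the "host-notification" one used above:

```
// Hypothetical host; vars.notify_users is an assumed variable name, not from this thread.
object Host "router-01" {
  import "generic-host"              // assumed template that sets check_command
  address = "192.0.2.10"             // placeholder address
  vars.notify_users = [ "one user", "another user" ]
}

// One rule for all hosts: the users come from the host's own variable.
apply Notification "host-by-var" to Host {
  import "host-notification"
  users = host.vars.notify_users
  assign where host.vars.notify_users
}

// Same idea for services, reusing the list of the host they run on.
apply Notification "service-by-var" to Service {
  import "service-notification"      // assumed template
  users = host.vars.notify_users
  assign where host.vars.notify_users
}
```

With this layout there is one rule per object type instead of one rule per user, and hosts without the variable simply get no notification object from these rules.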
For your second problem:
I wouldn't restrict the notification, but create a downtime for the CPU/memory services.
The downtime prevents Icinga from sending notifications. In the example I matched on the check command, but you can also use host names, service names, or other vars to include or exclude hosts/services that should be put in a downtime.
See the docs for more information:
Downtimes can be scheduled for planned server maintenance or any other targeted service outage you are aware of in advance.
Advanced Topics - Icinga 2
apply ScheduledDowntime "TEST" to Service {
  author = "nicolas"
  comment = "service dt"
  fixed = true
  # a service has only one check_command, so match either name with ||
  assign where service.check_command == "checkmem" || service.check_command == "checkcpu"
  ranges = {
    "monday" = "22:00-24:00"
    "tuesday" = "00:00-06:00"
    #and so on...
  }
}
Duplicated services with adapted thresholds (that's what @rivad suggested) are only meant for the mem/CPU checks that are really important to you.
Example:
MEM service1 standard / Critical 80% / Warning 70%
You know that the mem usage hits 90% during sync processes
MEM service2 info / Critical 98% / Warning 95%
But that is only for the case where you really need to distinguish between these two monitoring events.
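As a sketch of what the two memory services could look like in config: the check command name "checkmem" is the one used earlier in this thread, while the "generic-service" template, the threshold variables `vars.mem_warning` / `vars.mem_critical`, and the `vars.mem_important` flag are assumptions for illustration. Your own CheckCommand may read its thresholds from differently named variables.

```
// Standard memory check with the usual thresholds.
apply Service "mem-standard" {
  import "generic-service"           // assumed template
  check_command = "checkmem"
  vars.mem_warning = 70              // hypothetical threshold variables
  vars.mem_critical = 80
  assign where host.address
}

// Second check with relaxed thresholds, only on hosts flagged as important.
apply Service "mem-info" {
  import "generic-service"           // assumed template
  check_command = "checkmem"
  vars.mem_warning = 95
  vars.mem_critical = 98
  assign where host.vars.mem_important   // hypothetical flag set on selected hosts
}
```

The nightly downtime could then be limited to the standard service, for example with `assign where service.name == "mem-standard"`, so the relaxed one still notifies during the sync window.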
1.) We don't really have user groups. Our notifications are pretty fine-grained, so I assign the contacts to every host using a variable. So I don't think user groups will work.
2.) That sounds good, and I will give feedback on it.