Icinga-Powershell-Plugins: Questions about Invoke-IcingaCheckEventlog

stevie-sy · October 6, 2020, 5:48am

Hello all,

During our implementation and tests of the Icinga PowerShell Plugins (installed according to the instructions), we were asked to monitor the Windows EventLog as well. So we configured the check “Invoke-IcingaCheckEventlog”.
We ran into the following problems and would like to hear about your experience, because we may be misunderstanding something.

After installing the plugins, the check doesn’t work unless we use the switch “DisableTimeCache”. Icinga throws a permission error for the cache directory every time. With this switch everything works fine. Maybe this is a result of our server setup and the permissions set by our server admins.
What we also realized while using the switch “DisableTimeCache” is that using it doesn’t really make sense. Let’s assume we have a check interval of 5 minutes. During this interval a program writes a message into the eventlog and the check goes to warning/critical. After the next check interval the check is OK again, because no new event was written into the log during that interval.
So checking the whole eventlog also makes no sense, because then we would have to know how many log entries are “normal”. Or our colleagues would have to delete the event after fixing the problem, which makes the log obsolete.
Using the “After” argument improves the situation a little. With the Icinga DSL (using var dt = DateTime() - 24 * 60 * 60; return dt.to_string() ) we can create a timestamp like now - 24h. But our office is closed on weekends and public holidays. So if an event was written then, Icinga doesn’t tell us on the next working day that something happened. We would first have to check the logs on all servers again or expand the time range.
The next idea would be that Icinga stops checking the eventlog once the check goes critical. After fixing the problem we would have to set the check to OK manually. But the problem here is what happens if another event is written in the meantime.
Another possibility would be for every server to dump its full eventlog to our ELK stack. However, we see problems similar to those described above when checking that log.
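
To make the difference between the two windows described above concrete, here is a small toy model in Python. It only illustrates the idea and is not how the plugin is implemented; the function name matching_events and the sample event at 10:02 are made up for the illustration.

```python
from datetime import datetime, timedelta


def matching_events(event_times, window_start, window_end):
    """Return the events whose timestamp lies in (window_start, window_end]."""
    return [t for t in event_times if window_start < t <= window_end]


# One hypothetical error event written at 10:02.
events = [datetime(2020, 10, 6, 10, 2)]
interval = timedelta(minutes=5)
checks = [datetime(2020, 10, 6, 10, 5), datetime(2020, 10, 6, 10, 10)]

for now in checks:
    # Window that only covers the time since the previous check run.
    since_last = matching_events(events, now - interval, now)
    # Fixed lookback of 24 hours, like "now - 24h".
    last_24h = matching_events(events, now - timedelta(hours=24), now)
    print(now.time(), "since last check:", len(since_last), "| last 24h:", len(last_24h))
```

In this model the event is counted at 10:05 in both windows, but at 10:10 only the 24h window still contains it. That is the "OK again after one interval" effect, and it also shows why a fixed window keeps reporting an event until it ages out.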

So what do you think? What is your experience?

Thank you

stevie-sy · December 2, 2020, 6:47am

@cstein What about this one? I’m asking because you helped in another thread. Do you have any ideas, solutions, or workarounds here as well? That would be very nice. Thank you

radioactive9 · December 2, 2020, 7:02am

We are using the definition below:

apply Service "XXX-P_MS_SQLEvtLogID" {
    import "XXX-tmplService-MS-PS_EventLog"

    assign where "XXX-tmplHost-MS" in host.templates
    vars.IcingaCheckEventlog_Array_IncludeEventId = [
        "1101",
        "3201"
    ]
    vars.IcingaCheckEventlog_Int32_Verbosity = "2"
    vars.IcingaCheckEventlog_Object_Warning = "~:0"
    vars.IcingaCheckEventlog_String_LogName = "Application"
    vars.IcingaCheckEventlog_Switchparameter_DisableTimeCache = false

    import DirectorOverrideTemplate
}

The only issue we see is that we may be missing some alerts when the same event ID is written several times at the same moment. But we are not sure, as we can’t reproduce it.

Let’s say 1101 and 3201 are written in the same sample: we have seen both event IDs mentioned in the alert.

What we don’t have is a test case where 1101 is written in one sample and alerted, and 3201 follows in the next sample. Whether the new event will overwrite the description of 1101 and change it to 3201, we don’t know.

stevie-sy · December 2, 2020, 7:11am

Thanks for your example. I will add the parameters from it that are missing in my service definition. Let’s see what happens.

cstein · March 4, 2021, 4:01pm

Hello,

sorry for the late response! Sadly, I’m not getting any notifications when I’m mentioned in an issue.

Basically, the way to monitor event logs with the PowerShell plugins has changed since the first release.

What we did so far:

If you add -Verbosity 1 now, you will see the EventLog messages for warning/critical packages. If you go up to -Verbosity 2, you will receive a list of all messages included for an EventId.
The last release, Icinga for Windows v1.4.0, added support for values like “5h” in the -After and -Before arguments. This allows you to check the last 5 hours, for example, regardless of the last check execution.
For the permission problem, you can simply run Test-IcingaAgent. You will then receive the results of certain tests for your installation. In case the cache file is not writable, you will receive a command to execute that fixes the permission error for the Icinga Agent service user.
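
For the weekend gap raised earlier in the thread, the relative -After value could be widened on the first working day. Below is a minimal Python sketch of that calculation. It assumes that -After accepts other hour values in the same form as “5h” (for example “72h”), that working days are Monday to Friday, and it ignores public holidays. The helper names lookback_hours and after_argument are hypothetical and not part of Icinga for Windows.

```python
from datetime import datetime, timedelta


def lookback_hours(now, base_hours=24):
    """Extend the lookback in 24h steps until the window starts on a working day.

    Assumption: Saturday and Sunday are non-working days; holidays are not handled.
    """
    hours = base_hours
    # weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    while (now - timedelta(hours=hours)).weekday() >= 5:
        hours += 24
    return hours


def after_argument(now):
    """Format the lookback as a relative value such as '72h' (assumed syntax)."""
    return f"{lookback_hours(now)}h"


print(after_argument(datetime(2020, 10, 5, 8, 0)))  # Monday morning  -> 72h
print(after_argument(datetime(2020, 10, 6, 8, 0)))  # Tuesday morning -> 24h
```

On a Monday morning the window then reaches back to Friday morning, so events written over the weekend are still reported on the next working day. Public holidays would need an extra list of dates that is checked in the same loop.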

I hope this helps. If you have ideas for different approaches or improvements, please feel free to share them with us.

stevie-sy · March 5, 2021, 6:37am

Thank you @cstein. I will test this as well and see if it helps us.