A modest attempt to monitor Windows services (services.msc)

Hello

I am new to the Icinga/Nagios world. I found that Linux is very rich in plugins, but scripts for Windows are hard to come by. I have done monitoring with different tools for many years, and I have seen how dynamic the requirements for monitoring Windows services can be: on one server you need service X, on another service Y, and on yet another server all services.

I have always had good results with the following formula:

Status != running and Startup = "auto" and exit code != 0

A formula like this monitors, by default, every service whose startup type is automatic.
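For illustration, the formula can be written as a PowerShell predicate over service objects. This is a minimal sketch that assumes objects with the State, StartMode and ExitCode properties that Win32_Service instances expose. Test-ServiceAlert and the sample services are hypothetical:

```powershell
# Hypothetical helper: $true when a service matches the alert formula
#   State != Running and StartMode = Auto and ExitCode != 0
function Test-ServiceAlert {
    param(
        [Parameter(Mandatory)] $Service
    )
    # -eq / -ne on strings are case-insensitive, like WQL comparisons
    return (($Service.State -ne 'Running') -and
            ($Service.StartMode -eq 'Auto') -and
            ($Service.ExitCode -ne 0))
}

# Made-up sample objects shaped like Win32_Service instances
$samples = @(
    [pscustomobject]@{ Name = 'SvcA'; State = 'Running'; StartMode = 'Auto';   ExitCode = 0 }
    [pscustomobject]@{ Name = 'SvcB'; State = 'Stopped'; StartMode = 'Auto';   ExitCode = 0 }
    [pscustomobject]@{ Name = 'SvcC'; State = 'Stopped'; StartMode = 'Auto';   ExitCode = 1067 }
    [pscustomobject]@{ Name = 'SvcD'; State = 'Stopped'; StartMode = 'Manual'; ExitCode = 1 }
)

$alerting = $samples | Where-Object { Test-ServiceAlert -Service $_ }
$alerting.Name   # SvcC
```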

Then there is usually a requirement to exclude a few services that should not raise an alert when they are down.

There is also a requirement to monitor only specific services.
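Both requirements can also be applied to the returned service objects instead of inside the query. This is a minimal sketch that assumes comma-separated name lists like the script's -includeService and -excludeService arguments. Select-MonitoredService and the sample objects are hypothetical:

```powershell
# Hypothetical helper: narrows service objects by an include list OR an exclude list
# (comma-separated, matched case-insensitively against the service Name).
function Select-MonitoredService {
    param(
        [object[]] $Services = @(),
        [string]   $Include = '',
        [string]   $Exclude = ''
    )
    if ($Include -and $Exclude) {
        throw 'Ambiguous request: use either Include or Exclude, not both'
    }
    # Split on commas, trim spaces around each name, drop empty entries
    $split = { param($list) $list -split ',' | ForEach-Object { $_.Trim() } | Where-Object { $_ } }
    $includeNames = @(& $split $Include)
    $excludeNames = @(& $split $Exclude)

    if ($includeNames.Count -gt 0) {
        return @($Services | Where-Object { $includeNames -contains $_.Name })
    }
    return @($Services | Where-Object { $excludeNames -notcontains $_.Name })
}

# Made-up sample objects
$all = @(
    [pscustomobject]@{ Name = 'Spooler' }
    [pscustomobject]@{ Name = 'wuauserv' }
    [pscustomobject]@{ Name = 'W32Time' }
)
(Select-MonitoredService -Services $all -Exclude 'wuauserv, W32Time').Name   # Spooler
```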

To cover all of that, I came up with the PowerShell script below. I do not actually know how to use it in Icinga 2 and need help with that. I would also welcome feedback and bug reports from the experts here. Please bear in mind that this is my first PowerShell script, so don't be too hard on me.
What I am trying to build is a single plugin that can cover every Windows service monitoring requirement. If a plugin like this already exists, please share it so that I can use it as well. Please share your feedback and help with using this in Icinga 2.
Run the script in PowerShell like this:

#C:\Users\Documents\icinga2\Plugins\Windows\TestService.ps1 -crit|-warn 1 -excludeService "A,B,C,D" -startMode "auto|manual" -exitCode 0|1
#C:\Users\Documents\icinga2\Plugins\Windows\TestService.ps1 -crit|-warn 1 -includeService "A,B,C,D" -exitCode 0|1

# Checks either the group of services given in the includeService argument or all services. By default it checks all services.
# You cannot monitor all services and a group of services at the same time: either leave allService blank (or 0) and fill in includeService, OR set allService to 1 and leave includeService blank.
# By default it checks all services whose startup type is automatic.
# excludeService: comma-separated list of services to skip when monitoring all services.
# The arguments below are passed in from Icinga 2.
# exitCode: blank is treated as 0.

param([string]$servername, [string]$excludeService, [string]$includeService, [string]$startMode, [string]$state, [int]$exitCode, [int]$allService, [int]$warn, [int]$crit)
    
$singlequote = "'"  # Wraps the -Filter string, because mixing double and single quotes in the CIM command gets confusing
if (!$state){
    # Defaulting state to running for all services
    $state = "running"  
}
if (!$startMode){
    # Default start mode to auto so that, when monitoring all services, only automatic services are checked
    $startMode = "auto"
}
if ($exitCode){
    # Validate exit code: only blank, 0 or 1 are allowed
    if ($exitCode -lt 0 -or $exitCode -ge 2){
        write-host "Unknown: The exitCode can be blank or 0 or 1"
        exit 3;
    }
}
if (!$includeService -and !$allService){
    # Neither includeService nor allService given: default to all services
    $allService = 1
}

if (!$warn -and !$crit){
    # Neither warn nor crit given: default to WARNING
    $warn = 1
}

if ($allService){
    # Validate allService: only blank, 0 or 1 are allowed
    if ($allService -lt 0 -or $allService -ge 2){
        write-host "Unknown: The allService can be blank or 0 or 1"
        exit 3;
    }
}

if ($excludeService){
    # We expect comma-separated services; spaces around each name are trimmed
    $arr = $excludeService -split ','
    foreach ($xelement in $arr){
        $excludeServiceString = $excludeServiceString + ' and name != "' + $xelement.Trim() + '"'
    }
}

if ($includeService){
    # We expect comma-separated services. The name conditions are joined with "or" and
    # wrapped in parentheses; without them the or-list would combine with the preceding
    # conditions, e.g. 'state != "running" and name = "A" or name = "B"' could match B in any state.
    # (No echo here: any extra output line would become the plugin output in Icinga 2.)
    $names = $includeService -split ',' | ForEach-Object { 'name = "' + $_.Trim() + '"' }
    $includeServiceString = ' and (' + ($names -join ' or ') + ')'
}

if ($allService -eq 1 -and $includeService){
    # Cannot monitor all services and a specific list at the same time
    write-host "Unknown: Ambiguous request. Either monitor all services or only the services given in includeService"
    exit 3;
}

if ($includeService -and $excludeService){
    # includeService and excludeService are mutually exclusive
    write-host "Unknown: Ambiguous request. Either include services or exclude services"
    exit 3;
}

if ($allService){
    # We want to include all services
    if ($startMode){
        # We want to filter on startMode
        if ($state){
            # We want to filter on state (ideally: not equal to running)
            if ($exitCode -eq 0){
                # Exit code is 0
                $query = '(Get-CimInstance win32_service -Filter '+$singlequote+'startmode = "'+$startMode+'" and state != "'+$state+'" and exitcode ='+$exitCode+' '+$excludeServiceString+''+$singlequote+').displayname'
            }
            else{
                $query = '(Get-CimInstance win32_service -Filter '+$singlequote+'startmode = "'+$startMode+'" and state != "'+$state+'" and exitcode !=0 '+$excludeServiceString+''+$singlequote+').displayname'
            }
        }
        else{
            write-host "Unknown: The state is messed up. It should be set as 'running'"
            exit 3
        }
    }
    else {
        # No start mode given
        if ($exitCode -eq 0){
            $query = '(Get-CimInstance win32_service -Filter '+$singlequote+'state != "'+$state+'" and exitcode ='+$exitCode+' '+$excludeServiceString+''+$singlequote+').displayname'
        }
        else{
            $query = '(Get-CimInstance win32_service -Filter '+$singlequote+'state != "'+$state+'" and exitcode !=0' +$excludeServiceString+''+$singlequote+').displayname'
        }
    }
}
else {
    #-----------------TO DO ---------------------------#
    # Looks like we want to monitor only the specific services in includeService
    if ($exitCode -eq 0){
        $query = '(Get-CimInstance win32_service -Filter '+$singlequote+'state != "'+$state+'" and exitcode ='+$exitCode+' '+$includeServiceString+' '+$excludeServiceString+''+$singlequote+').displayname'
    }
    else{
        $query = '(Get-CimInstance win32_service -Filter '+$singlequote+'state != "'+$state+'" and exitcode !=0' +$excludeServiceString+'' +$includeServiceString+ ''+$singlequote+').displayname'
    }
}

    

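try{
    # Make CIM errors (e.g. an invalid filter) terminating so that the catch block runs
    $ErrorActionPreference = 'Stop'
    #echo $query
    $serviceStatus = Invoke-Expression $query
}
catch{
    Write-Host "UNKNOWN: $_"
    exit 3;
}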
if (!$serviceStatus){
    # No service matched the query, so everything is fine
    Write-Host "OK: All monitored services are running"
    exit 0;
}
elseif ($warn){
    Write-Host "WARNING: Service(s) not running: $($serviceStatus -join ', ')"
    exit 1;
}
else{
    Write-Host "CRITICAL: Service(s) not running: $($serviceStatus -join ', ')"
    exit 2;
}
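As an alternative to assembling a whole command string for Invoke-Expression, you can build just the filter and pass it to Get-CimInstance -Filter. When names are included, group them in parentheses so that the or-list cannot combine with the preceding conditions (as in SQL, AND is evaluated before OR). This is a minimal sketch. New-ServiceFilter is hypothetical, and the escaping assumes WQL's backslash escape for double quotes and backslashes:

```powershell
# Hypothetical helper: builds a WQL filter for Win32_Service from options similar
# to the script's arguments. String literals use double quotes; embedded quotes and
# backslashes are escaped with a backslash.
function New-ServiceFilter {
    param(
        [string]   $State = 'Running',
        [string]   $StartMode = '',     # empty = do not filter on start mode
        [int]      $ExitCode = 0,       # 0 -> exitcode = 0, otherwise exitcode != 0
        [string[]] $Include = @(),
        [string[]] $Exclude = @()
    )
    $quote = { param($s) '"' + ($s -replace '(["\\])', '\$1') + '"' }

    $parts = @('state != ' + (& $quote $State))
    if ($StartMode) { $parts += 'startmode = ' + (& $quote $StartMode) }
    $parts += $(if ($ExitCode -eq 0) { 'exitcode = 0' } else { 'exitcode != 0' })

    if ($Include.Count -gt 0) {
        # Parentheses keep the OR list separate from the AND conditions
        $names = $Include | ForEach-Object { $n = $_.Trim(); 'name = ' + (& $quote $n) }
        $parts += '(' + ($names -join ' or ') + ')'
    }
    foreach ($name in $Exclude) {
        $n = $name.Trim()
        $parts += 'name != ' + (& $quote $n)
    }
    return $parts -join ' and '
}

New-ServiceFilter -StartMode 'Auto' -ExitCode 1 -Exclude 'A','B'
# state != "Running" and startmode = "Auto" and exitcode != 0 and name != "A" and name != "B"
New-ServiceFilter -Include 'A','B'
# state != "Running" and exitcode = 0 and (name = "A" or name = "B")
```

On a Windows host the result could then be used as `Get-CimInstance -ClassName Win32_Service -Filter (New-ServiceFilter -StartMode 'Auto' -ExitCode 1) -ErrorAction Stop`. With -ErrorAction Stop, a surrounding try/catch can report UNKNOWN when the filter is invalid.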

Hello and welcome to the community, and thanks for sharing your script.

For Windows, a new solution has been available as a release candidate for a week now:
PowerShell Framework:

PowerShell Plugins:

PowerShell Service:

To get started easily, you can use the Kickstart:

The documentation within the PowerShell Framework should give you a pretty good start:

We would be happy if you could test it and share your feedback and improvements, and, if you like, contribute to these modules.

Just out of curiosity: the script itself looks fine, but you are going to check whether every service set to auto is actually running, is that correct? In my experience, this mostly causes false positives, because some services shut themselves down once their work is completed.

Best regards
Lord Hepipud


Thank you. I am excited to test these.

Regarding your question on monitoring services: when a service stops gracefully, its exit code is set to 0; otherwise the exit code is greater than 0. So ideally you set -exitCode 1 in the arguments, and you will then monitor the services that are set to auto and have stopped ungracefully. Because the script also lets you filter on state and startup type, or pick just a few services (-includeService) out of the whole set, it can be used in a variety of ways. Even if you set exitCode = 0, you can use the excludeService argument to exclude the services that give false positives.
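The exit-code distinction can be illustrated on plain objects. This is a minimal sketch that assumes Win32_Service-shaped objects. Split-StoppedAutoService is hypothetical, and the sample services and exit codes are made up:

```powershell
# Illustration: split stopped automatic services by how they stopped.
# ExitCode 0 = stopped cleanly; any other value = stopped with an error.
function Split-StoppedAutoService {
    param(
        [object[]] $Services = @(),
        [string[]] $Exclude = @()      # names to ignore, e.g. known false positives
    )
    $stopped = @($Services | Where-Object {
        ($_.StartMode -eq 'Auto') -and ($_.State -ne 'Running') -and ($Exclude -notcontains $_.Name)
    })
    [pscustomobject]@{
        Graceful   = @($stopped | Where-Object { $_.ExitCode -eq 0 } | ForEach-Object Name)
        Ungraceful = @($stopped | Where-Object { $_.ExitCode -ne 0 } | ForEach-Object Name)
    }
}

# Made-up sample objects
$sample = @(
    [pscustomobject]@{ Name = 'CleanStopSvc'; State = 'Stopped'; StartMode = 'Auto'; ExitCode = 0 }
    [pscustomobject]@{ Name = 'CrashedSvc';   State = 'Stopped'; StartMode = 'Auto'; ExitCode = 1067 }
    [pscustomobject]@{ Name = 'NoisySvc';     State = 'Stopped'; StartMode = 'Auto'; ExitCode = 1 }
)
$result = Split-StoppedAutoService -Services $sample -Exclude 'NoisySvc'
$result.Graceful     # CleanStopSvc
$result.Ungraceful   # CrashedSvc
```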

Ah, I see, thanks for the clarification!
I never considered checking the exit status of a service.

Thanks @cstein. I would be interested to hear whether you were able to run the script, and in your feedback on it.

Hello @cstein

Do you have a way to use a proxy for installing the module? My Windows servers do not have open internet access.

Another option is to install Icinga on the Windows machines as part of a distributed monitoring setup. If you install the NSClient++ plugin alongside the Icinga for Windows installation, you can then leverage the checks available there and also write custom perfmon counter checks.

https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#agent-setup-on-windows
https://icinga.com/docs/icinga2/latest/doc/10-icinga-template-library/#windows-plugins-for-icinga-2

I have never used NSClient++. I have installed agents on all Windows servers and ticked the option to install the default NSClient++ while configuring the agent. I don't know whether the requirements below can be solved with the combination of agent and NSClient++.

My ask is to check:
1. Event logs (not just event IDs, but a combination of event ID and a regex on the description)
2. Service status
3. MSSQL health: I want to run the MSSQL_Health_Check plugin via the agent on the Windows server where MSSQL is installed. I do not want to send commands to the port with a clear-text password over the network from a Linux machine.

Hello

There is no proxy installation available yet. You can, however, download the Kickstart script including the required zip packages and use them as a local "repository" for your components.
Then you will not need an internet connection. This handling is built into all of the module's installation mechanisms.


Hello

I was trying my hand at the event log check and I am not able to find the right arguments. It is also risky: if I don't write the filter correctly, I will never get the alert.

check_nscp_api.exe --password secret -H localhost -q check_eventlog -a "filter=provider = 'Kerberos' and id=4 and ???"

I'm not sure the argument is going in the right direction. Please help.

What I want to achieve is:
If Event Source = Kerberos and Event ID = 4 and LogName = System, raise an alarm if such an event was found in the last 30 minutes; otherwise OK.
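The decision logic for this requirement can be sketched independently of how the events are fetched. This is a minimal sketch that assumes event objects with the LogName, ProviderName, Id and TimeCreated properties that Get-WinEvent returns. Get-EventCheckState is hypothetical. The provider name 'Kerberos' is taken from the post and may differ from the provider name recorded in the event itself, so check the event's details in Event Viewer:

```powershell
# Hypothetical helper: decides the check state from already-fetched events.
# An event matches on LogName, provider (event source) and ID within a look-back window.
function Get-EventCheckState {
    param(
        [object[]] $Events = @(),
        [string]   $LogName = 'System',
        [string]   $Source = 'Kerberos',    # assumption: source name as given in the post
        [int]      $EventId = 4,
        [int]      $WindowMinutes = 30,
        [datetime] $Now = (Get-Date)
    )
    $since = $Now.AddMinutes(-$WindowMinutes)
    $hits = @($Events | Where-Object {
        (($_.LogName -eq $LogName) -and ($_.ProviderName -eq $Source) -and
         ($_.Id -eq $EventId) -and ($_.TimeCreated -ge $since))
    })
    if ($hits.Count -gt 0) {
        # Plugin convention: exit code 2 = CRITICAL
        return [pscustomobject]@{ ExitCode = 2; Message = "CRITICAL: $($hits.Count) matching event(s) since $since" }
    }
    return [pscustomobject]@{ ExitCode = 0; Message = "OK: no matching events in the last $WindowMinutes minutes" }
}

# On a Windows host the events could be fetched with, for example:
# $events = Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 4; StartTime = (Get-Date).AddMinutes(-30) } -ErrorAction SilentlyContinue
# $result = Get-EventCheckState -Events $events
# Write-Host $result.Message; exit $result.ExitCode
```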

This monitors AD trust issues with the domain.

Hello

I am still struggling with building the filter. For some reason I am not able to filter for any Information-level events.

I am trying to run :

check_nscp_api.exe --password Secret -H localhost -q check_eventlog -a scan-range=-5h file=Application filter="source = 'MYEVENTSOURCE' and level='information'"

Output is:

check_eventlog OK: Event log seems fine | 'problem_count'=0;0;0

But it should return the event shown in the screenshot, which was logged only a few minutes earlier.
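When filtering with Get-WinEvent instead, the level is a number rather than a name: 1 Critical, 2 Error, 3 Warning, 4 Information, 5 Verbose. Some events shown as Information carry level 0 (LogAlways). This is a minimal sketch of that mapping. Get-EventLevelFilter is hypothetical, and the commented Get-WinEvent call reuses MYEVENTSOURCE from the command above:

```powershell
# Illustration: map level names to the numeric Level values used by Get-WinEvent.
# Level 0 (LogAlways) is included for 'Information' because some providers use it.
$levelMap = @{
    'Critical'    = @(1)
    'Error'       = @(2)
    'Warning'     = @(3)
    'Information' = @(0, 4)
    'Verbose'     = @(5)
}

# Hypothetical helper; PowerShell hashtable keys are case-insensitive
function Get-EventLevelFilter {
    param([Parameter(Mandatory)][string] $LevelName)
    if (-not $levelMap.ContainsKey($LevelName)) {
        throw "Unknown level '$LevelName'"
    }
    return $levelMap[$LevelName]
}

# On a Windows host, for example:
# Get-WinEvent -FilterHashtable @{ LogName = 'Application'; ProviderName = 'MYEVENTSOURCE';
#     Level = (Get-EventLevelFilter 'Information'); StartTime = (Get-Date).AddHours(-5) }
```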

Hello,

Sorry for the late response. Somehow I'm not getting any notifications, sadly.

For your Filter, you could try this:

Invoke-IcingaCheckEventlog -LogName Application -IncludeEventId 4 -IncludeEntryType 'Information';

While checking the code I realised there is a small issue with Before and After: for the moment they expect a DateTime integer, whereas they should accept a value such as 5h. I will test this and fix it shortly.
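A relative range such as 5h can be turned into the DateTime that such a parameter currently expects. This is a minimal sketch. ConvertFrom-RelativeTime and the supported suffixes (s, m, h, d) are assumptions and not part of the framework:

```powershell
# Hypothetical helper: turns a relative range such as '5h', '-30m' or '2d'
# into an absolute DateTime in the past, relative to $Now.
function ConvertFrom-RelativeTime {
    param(
        [Parameter(Mandatory)][string] $Value,
        [datetime] $Now = (Get-Date)
    )
    # A leading '-' is accepted and ignored: the result always lies in the past
    if (-not ($Value -match '^-?(\d+)([smhd])$')) {
        throw "Unsupported time range '$Value' (expected e.g. 30m, 5h, 2d)"
    }
    $amount = [int]$Matches[1]
    $span = switch ($Matches[2]) {
        's' { [timespan]::FromSeconds($amount) }
        'm' { [timespan]::FromMinutes($amount) }
        'h' { [timespan]::FromHours($amount) }
        'd' { [timespan]::FromDays($amount) }
    }
    return $Now - $span
}

# Example: five hours before the current time
ConvertFrom-RelativeTime -Value '5h'
```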

Right now it is not possible to filter for specific sources, but this could be integrated if it is required.

Edit: just one addition. By default, the EventLog check plugin writes a cache file recording when the last check was executed. This means that if you run the check every 5 minutes, for example, it only reads the full event log on the first run; afterwards it only fetches the delta between the last run and now.
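The delta behaviour can be sketched with a small cache file that stores the time of the previous run. This is an illustration only, not the plugin's actual implementation. Get-LastRunWindow, the cache path and the file format (a round-trip 'o' timestamp) are hypothetical:

```powershell
# Illustration of "only fetch the delta since the last run": the previous run time is
# kept in a cache file; on the first run [datetime]::MinValue means "read everything".
function Get-LastRunWindow {
    param(
        [Parameter(Mandatory)][string] $CachePath,
        [datetime] $Now = (Get-Date)
    )
    $since = [datetime]::MinValue
    if (Test-Path -LiteralPath $CachePath) {
        $raw = (Get-Content -LiteralPath $CachePath -Raw).Trim()
        $parsed = [datetime]::MinValue
        if ([datetime]::TryParse($raw, [Globalization.CultureInfo]::InvariantCulture,
                [Globalization.DateTimeStyles]::RoundtripKind, [ref]$parsed)) {
            $since = $parsed
        }
    }
    # Remember this run for the next check (round-trip format)
    Set-Content -LiteralPath $CachePath -Value $Now.ToString('o')
    return $since
}

# The returned value could then serve as a StartTime filter when reading events.
```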