Adhoc client VPN Service Level Monitoring via Icinga2

prime69 · April 23, 2024, 11:37pm

Scenario: Work from home

Requirements:

Monitor the service level of VPN connections from the client end of the VPN to the firewall's VPN interface, and raise alerts when VPN sessions go offline.

Logic:

The easiest solution would be to have someone work remotely from home 7 days a week, 365 days a year, and have Icinga2 ping the client node through the VPN tunnel that is set up when the client VPN connects back to the firewall.

That would make sure there is no idle time on the firewall for that specific IP address, and that the client's VPN IP address is pingable through the tunnel from the satellite/master Icinga2.

So if the VPN IP address of the remote client is unreachable, Icinga2 can alert an engineer to check all VPN sessions on the firewall in case the VPN service went offline.

Looking for a better solution from the Icinga community, if possible.

Thanks

moreamazingnick · April 24, 2024, 6:52am

This would also mean contacting your engineer whenever the user's internet does not work…
Or when the Windows network settings change from something else to public…

I would write a check to query the current sessions from the firewall and then ping them.
Now you can calculate the percentage of reachableHosts/Hosts and decide where to put your threshold for a critical and a warning.
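
A minimal sketch of that ratio check in Python, assuming the session IPs have already been fetched from the firewall and are passed in as arguments (how to query the firewall depends on the vendor and is not shown). The names `is_reachable`, `evaluate`, and `main`, the 80%/50% thresholds, and the Linux `ping -c/-W` flags are all assumptions for illustration:

```python
import subprocess

# Standard monitoring plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def is_reachable(ip, timeout_s=1):
    """Send one ICMP echo request via the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def evaluate(reachable, total, warn_pct=80.0, crit_pct=50.0):
    """Map reachableHosts/Hosts to a plugin state; thresholds are example values."""
    if total == 0:
        return UNKNOWN, "UNKNOWN - no VPN sessions reported"
    pct = 100.0 * reachable / total
    if pct < crit_pct:
        state, label = CRITICAL, "CRITICAL"
    elif pct < warn_pct:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    message = (
        f"{label} - {reachable}/{total} VPN clients reachable "
        f"({pct:.1f}%) | reachable_pct={pct:.1f}%"
    )
    return state, message


def main(argv):
    """argv[1:] is the list of session IPs (hypothetical input format)."""
    ips = argv[1:]
    reachable = sum(1 for ip in ips if is_reachable(ip))
    state, message = evaluate(reachable, len(ips))
    print(message)
    return state
```

To use it as a plugin, call `sys.exit(main(sys.argv))` under a `__main__` guard so Icinga picks up the exit code. Whether zero sessions should be UNKNOWN or OK (nobody is working) is a judgment call for your environment.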

Or

Have a remote device connect to the firewall via VPN on a regular schedule,
cron-based, e.g. every 10 minutes.
If it connects, it sends a passive check result to the Icinga master.
If no result arrives in time, Icinga will change the state of the check and you get an alert.
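
As a rough sketch, the remote device could submit the passive result through the Icinga 2 REST API action `/v1/actions/process-check-result`. The host and service names, the API user, and the URL below are hypothetical, and the helper names are made up for this example. It also assumes the service on the Icinga side is configured to go non-OK when no fresh result arrives:

```python
import base64
import json
import urllib.request


def build_passive_result_request(base_url, user, password, host, service,
                                 exit_status, output, ttl_s=None):
    """Build the POST request for Icinga 2's process-check-result action."""
    payload = {
        "type": "Service",
        "filter": f'host.name=="{host}" && service.name=="{service}"',
        "exit_status": exit_status,   # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
        "plugin_output": output,
    }
    if ttl_s is not None:
        payload["ttl"] = ttl_s        # seconds this result is considered fresh
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/actions/process-check-result",
        data=json.dumps(payload).encode(),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send(request, ssl_context=None, timeout_s=10):
    """Submit the request; pass an ssl context that trusts the Icinga CA."""
    with urllib.request.urlopen(request, context=ssl_context,
                                timeout=timeout_s) as response:
        return json.loads(response.read())
```

The cron job would call `send(build_passive_result_request(...))` right after the VPN connection succeeds. With a 10 minute schedule, a `ttl_s` somewhat above 600 leaves room for one slow run before the check goes stale.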

rivad · April 24, 2024, 7:38am

We do end-to-end testing with Robot Framework, using RCC as the runner, ReportPortal for logging, a share for distribution and archiving, and Ansible to deploy and manage it all.
But because of the complexity, also mentioned by @moreamazingnick, a passing test tells you with 100% confidence that it works, while a failing test does NOT prove that the thing you intend to measure failed, as the rest of the stack (robot, OS, network, hardware) can also introduce failure.

I would set up 2 machines, each connecting in a different way.

Sorry for the rough state:

I also wanted to add an optional Icinga2 API call to read the thresholds but haven’t gotten around to it.

I then use the Business Process module with the min1 operator to ignore flakiness and attach a check to alert on the aggregate.