Icinga2 integration with PagerDuty (go-pdagent)

Hi,

Sorry, I replied too fast, I did not read the pages referenced here:

Now that I have read them, I understand your post a bit better, even if I think you are still mixing up the Python and the bash scripts. Everything works from my point of view with the Python script. The only abnormal thing I see is the string “Integration KEY HERE” within the pdagent logs, but this may be due to the bash script.

Could you please share the content of:

  • /etc/icinga2/icinga2.conf
  • /etc/icinga2/conf.d/pagerduty-icinga2.conf

Thank you,

Jean

PS: I will delete the previous post (the hasty reply)

Thanks,
I am now using only the Python script.
The “integration key here” string stands for the actual 32-character integration key, which I removed because, when I pasted it earlier, the entire post got deleted by the Icinga forum.

  • /etc/icinga2/icinga2.conf
// managed by puppet

/**
 * Icinga 2 configuration file
 * - this is where you define settings for the Icinga application including
 * which hosts/services to check.
 *
 * For an overview of all available configuration options please refer
 * to the documentation that is distributed as part of Icinga 2.
 */

/**
 * The constants.conf defines global constants.
 */
include "constants.conf"

/**
 * The zones.conf defines zones for a cluster setup.
 * Not required for single instance setups.
 */
include "zones.conf"

/**
 * The Icinga Template Library (ITL) provides a number of useful templates
 * and command definitions.
 * Common monitoring plugin command definitions are included separately.
 */
include <itl>
include <nscp>
include <plugins>
include <plugins-contrib>
include <windows-plugins>

/**
 * The features-available directory contains a number of configuration
 * files for features which can be enabled and disabled using the
 * icinga2 feature enable / icinga2 feature disable CLI commands.
 * These commands work by creating and removing symbolic links in
 * the features-enabled directory.
 */
include "features-enabled/*.conf"

/**
 * Although in theory you could define all your objects in this file
 * the preferred way is to create separate directories and files in the conf.d
 * directory. Each of these files must have the file extension ".conf".
 */
include_recursive "conf.d"
  • /etc/icinga2/conf.d/pagerduty-icinga2.conf
object UserGroup "ops" {
  display_name = "Ops Team"
}

object User "pdagent" {
  pager = "e6e4973984bb4606c0a2262bac1c760b"
  groups = [ "ops" ]
  display_name = "PagerDuty Notification User"
  states = [ OK, Warning, Critical, Unknown, Up, Down ]
  types = [ Problem, Acknowledgement, Recovery ]
}

object NotificationCommand "notify-service-by-pagerduty" {
  import "plugin-notification-command"
  command = [ "/usr/bin/pdagent-integrations-1.6.2/bin/pd-nagios" ]
  arguments = {
    "-n" = {
      order = 0
      value = "service"
    }
    "-k" = {
      order = 1
      value = "$user.pager$"
    }
    "-t" = {
      order = 2
      value = "$notification.type$"
    }
    "-f" = {
      order = 3
      repeat_key = true
      value = "$f_args$"
    }
  }
  
  vars.f_args = [
    "SERVICEDESC=$service.name$",
    "SERVICEDISPLAYNAME=$service.display_name$",
    "HOSTNAME=$host.name$",
    "HOSTSTATE=$host.state$",
    "HOSTDISPLAYNAME=$host.display_name$",
    "SERVICESTATE=$service.state$",
    "SERVICEPROBLEMID=$service.state_id$",
    "SERVICEOUTPUT=$service.output$" 
  ]
}

object NotificationCommand "notify-host-by-pagerduty" {
  import "plugin-notification-command"
  command = [ "/usr/bin/pdagent-integrations-1.6.2/bin/pd-nagios" ]
  arguments = {
    "-n" = {
      order = 0
      value = "host"
    }
    "-k" = {
      order = 1
      value = "$user.pager$"
    }
    "-t" = {
      order = 2
      value = "$notification.type$"
    }
    "-f" = {
      order = 3
      repeat_key = true
      value = "$f_args$"
    }
  }

  vars.f_args = [
    "HOSTNAME=$host.name$",
    "HOSTSTATE=$host.state$",
    "HOSTPROBLEMID=$host.state_id$",
    "HOSTOUTPUT=$host.output$"
  ]
}

apply Notification "pagerduty-service" to Service {
  command = "notify-service-by-pagerduty"
  states = [ OK, Warning, Critical, Unknown ]
  types = [ Problem, Acknowledgement, Recovery ]
#  period = "24x7"
  users = [ "pdagent" ]
  assign where service.vars.enable_pagerduty == true
}

apply Notification "pagerduty-host" to Host {
  command = "notify-host-by-pagerduty"
  states = [ Up, Down ]
  types = [ Problem, Acknowledgement, Recovery ]
#  period = "24x7" 
  users = [ "pdagent" ]
  assign where host.vars.enable_pagerduty == true
}

Hi again,

Are you sure this is the correct integration key?

In a previous post (deleted since), I saw a reference to b30b453a31e44f0ed06a260060616483.

I suppose the value for pager must match the value defined in /etc/pdagent/config.yaml. Could you again share the full content of /etc/pdagent/config.yaml?

Other than this potential key mismatch, I see no issue with your implementation. The log files tell us that the Icinga notification process gets triggered and the interaction with pdagent is working.

Perhaps debug log files will show the issue. Could you change the log level in /etc/pdagent/config.yaml to debug (here I see only info level logs)? You may need to restart the pdagent in order for the changes to take effect.

Best regards,

Jean

Hi,
The key is correct. I generated a new one in the PagerDuty web interface, for the integration listed there as Icinga.
/etc/pdagent/config.yaml:

address: 127.0.0.1:49463
database: /var/db/pdagent/pdagent.db
pidfile: /var/run/pdagent/pidfile
region: us
secret: e6e4973984bb4606c0a2262bac1c760b
log_level: debug 

The logs still show info level,
and nothing has changed since my last message.

Hi,

Could you show the output of ps -ef | grep pdagent? I would like to check that the config file in use is really /etc/pdagent/config.yaml.

Another way to check is to rename that file, restart pdagent, and check for any error.

Thank you,

Jean

ps -ef | grep pdagent
pdagent 1130 1 0 13:43 ? 00:00:00 /usr/local/bin/pdagent server

I also tried renaming config.yaml to pdagent.yaml. A test alert from the command line then failed with the error shown in the first log below. After I renamed the file back to config.yaml, the test alert was sent successfully again. The command was:

sudo -u nagios pdagent enqueue -k "e6e4973984bb4606c0a2262bac1c760b"   -t trigger   -d "This is only a test"   -u "https://events.pagerduty.com/integration/e6e4973984bb4606c0a2262bac1c760b/enqueue"   -e "error"   -f HOSTDISPLAYNAME=icinga01-mel.trellian.com

Output with the config file renamed:

{"level":"info","ts":1714967241.031592,"logger":"Heartbeat","caller":"server/heartbeat.go:58","msg":"Starting heartbeat."}
{"level":"info","ts":1714967934.4702215,"logger":"Server","caller":"server/middleware.go:12","msg":"Handling request: /send"}
{"level":"info","ts":1714967934.4711452,"logger":"Server","caller":"server/middleware.go:30","msg":"Authorization failure with client auth header: token yup7rmzgfp00ifq4nb3apnfzq9nlfnh2"}

Output with the file named config.yaml again:

{"level":"info","ts":1714968284.2885237,"logger":"Heartbeat","caller":"server/heartbeat.go:58","msg":"Starting heartbeat."}
{"level":"info","ts":1714968318.0720694,"logger":"Server","caller":"server/middleware.go:12","msg":"Handling request: /send"}
{"level":"info","ts":1714968318.0725937,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:31","msg":"Enqueuing to e6e4973984bb4606c0a2262bac1c760b with key kdfzgqli3z3u5psa9vj6v2ujfpuzvx1v."}
{"level":"info","ts":1714968318.0768669,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:37","msg":"Event enqueued with key kdfzgqli3z3u5psa9vj6v2ujfpuzvx1v, ID 2."}
{"level":"info","ts":1714968318.0768962,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:50","msg":"Enqueuing kdfzgqli3z3u5psa9vj6v2ujfpuzvx1v with EventQueue."}
{"level":"info","ts":1714968318.0770943,"logger":"EventQueue.e6e4973984bb4606c0a2262bac1c760b","caller":"eventqueue/eventqueue.go:138","msg":"Worker started."}
{"level":"info","ts":1714968318.0771167,"logger":"EventQueue.e6e4973984bb4606c0a2262bac1c760b","caller":"eventqueue/eventqueue.go:140","msg":"Job started, 0 pending."}
{"level":"info","ts":1714968319.1277857,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:63","msg":"EventQueue returned success for kdfzgqli3z3u5psa9vj6v2ujfpuzvx1v. "}
{"level":"info","ts":1714968319.1342375,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:70","msg":"Set status of kdfzgqli3z3u5psa9vj6v2ujfpuzvx1v to success."}

Hi,

When the config file is named config.yaml, the output looks right. Do you get the alert in pdagent?

Best regards,

Jean

From the command line, yes: the pdagent logs are as shared above, and I also get the alert in the web interface.
But when an incident is triggered from Icinga, nothing appears in the pdagent logs or in the web interface.

OK thanks, great. Now we are indeed at step 3 :wink:

Do you see anything in icinga2.log ?

When sending from the command line, obviously nothing, but when a test incident is triggered, yes.
The Icinga logs for the triggered incident are below. pdagent has no log entries for the triggered incident, only for the command-line test mentioned earlier:

[2024-05-06 20:48:21 +1000] information/Checkable: Checkable 'icinga01-mel.trellian.com!ssh' has 1 notification(s). Checking filters for type 'Problem', sends will be logged.
[2024-05-06 20:48:21 +1000] information/Notification: Sending 'Problem' notification 'icinga01-mel.trellian.com!ssh!pagerduty-service' for user 'pdagent'
[2024-05-06 20:48:21 +1000] information/Notification: Completed sending 'Problem' notification 'icinga01-mel.trellian.com!ssh!pagerduty-service' for checkable 'icinga01-mel.trellian.com!ssh' and user 'pdagent' using command 'notify-service-by-pagerduty'.

OK thank you!

At this stage, I would create a backup of the Python script, so I can restore it after debugging.

As a debugging action, I would change the Python script and insert some statements that trace the execution into a log file.

I am not a Python expert, but a quick Google search provided two options:

Hope this will reveal where the problem lies.

Best regards,

Jean

Hi Jean,
I received some information from PagerDuty that might be helpful:

I reviewed the logs and I can see we received the request but dropped the event as it did not have a valid request. According to the logs we received a GET request when it should be a POST. This only suggests that there is a config issue when an incident in Icinga is triggered, most likely with the payload it is sending to PagerDuty.

I also tried adding debug statements to the script. While the script still had errors (indentation, etc.), those errors showed up in icinga2.log, but nothing shows up in the log file we set in the script.

DEBUG:root:Starting event processing.
DEBUG:root:Starting event processing.
DEBUG:root:Starting event processing.
DEBUG:root:Starting event processing.

Hi usmanpasha26,

I propose several debug actions:

  1. Run “/usr/local/bin/pdagent --help” and paste the output here. Perhaps we are missing a parameter (but I don’t think so).

  2. Share the content of /etc/systemd/system/pdagent.service

  3. Add debug statements to /usr/bin/pdagent-integrations-1.6.2/bin/pd-nagios :

Just before line:

    subprocess.run(command + arguments)

Add (indentation is important):

    import logging
    logger = logging.getLogger(__name__)
    logging.basicConfig(
        filename='/tmp/pd4icinga.log',
        format='%(asctime)s - %(name)s - %(levelname)-8s - %(message)s',
        level=logging.DEBUG)
    logger.debug('Just before subprocess.run')
    logger.debug("Value of command: "+"".join(command)+".")
    logger.debug("Value of arguments: "+"".join(arguments)+".")

Just after that same line:

    subprocess.run(command + arguments)

Add:

    logger.debug('Just after subprocess.run')
  4. Run this modified Python script as previously, from the command line, and trigger it from Icinga. Please share the corresponding log entries here. You will find them in /tmp/pd4icinga.log
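
The two debug lines around subprocess.run only show that the call happened, and joining the arguments with an empty string runs them together in the log. A small sketch that also records the exit code and the output of the called program may make the log easier to read. The names `run_and_log` and `pd4icinga` are made up for illustration, and the sketch assumes Python 3.5 or later, where subprocess.run exists.

```python
import logging
import shlex
import subprocess

# Hypothetical logger name; configure it with logging.basicConfig(...) as usual.
logger = logging.getLogger("pd4icinga")


def run_and_log(argv):
    """Run argv and log the exact command line, exit code and output."""
    # Quote each part so that argument boundaries stay visible in the log.
    printable = " ".join(shlex.quote(part) for part in argv)
    logger.debug("Running: %s", printable)
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    logger.debug("Exit code: %d", result.returncode)
    logger.debug("stdout: %s", result.stdout.strip())
    logger.debug("stderr: %s", result.stderr.strip())
    return result
```

In pd-nagios, the call subprocess.run(command + arguments) would then become run_and_log(command + arguments), placed after the same logging.basicConfig(...) call as in step 3.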

Thank you,

Jean

PS: I am no Python expert; here is where I found background on Python logging:
Logging HOWTO — Python 3.5.9 documentation
how to solve a must be string not list error in python? - Stack Overflow

/usr/local/bin/pdagent --help
A PagerDuty Agent and corresponding Command Line Interface.

	The agent acts as a local server between your own infrastructure and PagerDuty,
	providing command line tools to send PagerDuty events while ensuring event
	ordering and mitigating backpressure.

	On first run it's recommended you run "init" to generate a default
	configuration, then run "server" to start the agent itself.

Usage:
  pdagent [command]

Available Commands:
  enqueue     Queue up a trigger, acknowledge, or resolve v2 event to PagerDuty
  help        Help about any command
  init        Generate a new initial configuration file.
  nagios      Access the Nagios integration command(s).
  queue       Access the daemon's event queue.
  send        Queue up a trigger, acknowledge, or resolve a V1 event to PagerDuty
  sensu       Access the Sensu integration command(s).
  server      Start the server daemon.
  version     Version and build information.
  zabbix      Access the Zabbix integration command(s).

Flags:
  -a, --address string   address to run and access the agent server on. (default "127.0.0.1:49463")
      --config string    config file (default is $HOME/.go-pdagent.yaml)
  -h, --help             help for pdagent
      --pidfile string   pidfile for the currently running pdagent instance, if any. (default "/root/.pdagent/pidfile")
  -s, --secret string    secret used to authorize agent access. (default "wz22raz0ztxbkelwpro0ueg9t6xahcu2")
      --version          version for pdagent
cat /etc/systemd/system/multi-user.target.wants/pdagent.service
[Unit]
Description=PagerDuty Agent
After=network.target

[Service]
Type=simple
Environment=APP_ENV=production
ExecStart=/usr/local/bin/pdagent server
ExecStop=/usr/local/bin/pdagent server stop
KillMode=process
TimeoutStopSec=30
RestartSec=15
User=pdagent
Group=pdagent
PermissionsStartOnly=true

[Install]
WantedBy=multi-user.target

When the incident is triggered from Icinga, the debug log shows:

cat /tmp//pd4icinga.log 
2024-05-08 14:26:16,182 - __main__ - DEBUG    - Just before subprocess.run
2024-05-08 14:26:16,183 - __main__ - DEBUG    - Value of command: /usr/local/bin/pdagent.
2024-05-08 14:26:16,183 - __main__ - DEBUG    - Value of arguments: -nservice-k$user.pager$-t$notification.type$-fSERVICEDESC=$service.name$-fSERVICEDISPLAYNAME=$service.display_name$-fHOSTNAME=$host.name$-fHOSTSTATE=$host.state$-fHOSTDISPLAYNAME=$host.display_name$-fSERVICESTATE=$service.state$-fSERVICEPROBLEMID=$service.state_id$-fSERVICEOUTPUT=$service.output$.
2024-05-08 14:26:16,190 - __main__ - DEBUG    - Just after subprocess.run

From the command line:

sudo -u nagios pdagent enqueue -k "e6e4973984bb4606c0a2262bac1c760b"   -t trigger   -d "This is only a test"   -u "https://events.pagerduty.com/integration/e6e4973984bb4606c0a2262bac1c760b/enqueue"   -e "error"   -f HOSTDISPLAYNAME=icinga01-mel.trellian.com
{"level":"info","ts":1715145828.157112,"logger":"Server","caller":"server/middleware.go:12","msg":"Handling request: /send"}
{"level":"info","ts":1715145828.1576862,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:31","msg":"Enqueuing to e6e4973984bb4606c0a2262bac1c760b with key 7fcylvzqq8c80iky5gbhuwi4lzi484ua."}
{"level":"info","ts":1715145828.1649346,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:37","msg":"Event enqueued with key 7fcylvzqq8c80iky5gbhuwi4lzi484ua, ID 3."}
{"level":"info","ts":1715145828.1649747,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:50","msg":"Enqueuing 7fcylvzqq8c80iky5gbhuwi4lzi484ua with EventQueue."}
{"level":"info","ts":1715145828.1680493,"logger":"EventQueue.e6e4973984bb4606c0a2262bac1c760b","caller":"eventqueue/eventqueue.go:138","msg":"Worker started."}
{"level":"info","ts":1715145828.168147,"logger":"EventQueue.e6e4973984bb4606c0a2262bac1c760b","caller":"eventqueue/eventqueue.go:140","msg":"Job started, 0 pending."}
{"level":"info","ts":1715145829.0431778,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:63","msg":"EventQueue returned success for 7fcylvzqq8c80iky5gbhuwi4lzi484ua. "}
{"level":"info","ts":1715145829.046886,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:70","msg":"Set status of 7fcylvzqq8c80iky5gbhuwi4lzi484ua to success."}

Hello,

Thank you for the feedback.

I realise there may be some misunderstanding on terminology.

  • “command”: this is the term I use to refer to a pre-existing command or utility, like “sudo” or “pdagent”
  • “script”: this is the term I use to refer to the scripts wrapping the commands, like the bash script “icinga.sh” (not sure of the name), or the Python script “pd-nagios”

I see above you have triggered the script from Icinga, and the script failed to decode the arguments and/or environment variables it was given, as we see “-k $user.pager$ -t $notification.type$ etc.”

But you did not trigger the script from the command line. Could you please do that? Once the script generates a pdagent alert when triggered from the command line, we can tune Icinga to mimic the command line invocation of the script.

Best regards,

Jean

Hi,
I ran that earlier as well, and I have now run it again. If I comment out -n, it then gives an error on -k, and so on.

 /usr/bin/pdagent-integrations-1.6.2/bin/pd-nagios -k "e6e4973984bb4606c0a2262bac1c760b"   -t trigger   -d "This is only a test"   -u "https://events.pagerduty.com/integration/e6e4973984bb4606c0a2262bac1c760b/enqueue"   -e "error"   -f HOSTDISPLAYNAME=icinga01-mel.trellian.com
output:
unknown shorthand flag: 'n' in -n

Debug logs:

2024-05-08 20:28:57,709 - __main__ - DEBUG    - Just before subprocess.run
2024-05-08 20:28:57,709 - __main__ - DEBUG    - Value of command: /usr/local/bin/pdagent.
2024-05-08 20:28:57,709 - __main__ - DEBUG    - Value of arguments: -nservice-k$user.pager$-t$notification.type$-fSERVICEDESC=$service.name$-fSERVICEDISPLAYNAME=$service.display_name$-fHOSTNAME=$host.name$-fHOSTSTATE=$host.state$-fHOSTDISPLAYNAME=$host.display_name$-fSERVICESTATE=$service.state$-fSERVICEPROBLEMID=$service.state_id$-fSERVICEOUTPUT=$service.output$.
2024-05-08 20:28:57,716 - __main__ - DEBUG    - Just after subprocess.run

Thank you for this.

Now you should debug that Python script so it works from the command line.

I am afraid I cannot help you with this, as it is purely a pdagent issue.

Alternatively, did you have a bash script that was working from the command line? If you did, we could work on that and make that one work from Icinga.

Best regards,

Jean

Hi Jean,
I can now send the notification from the command line using the default command:

/usr/local/bin/pdagent enqueue -k e6e4973984bb4606c0a2262bac1c760b  service -t trigger -d "this is test "

pdagent logs :

{"level":"info","ts":1715764484.5323665,"logger":"Server","caller":"server/middleware.go:12","msg":"Handling request: /send"}
{"level":"info","ts":1715764484.5328326,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:31","msg":"Enqueuing to e6e4973984bb4606c0a2262bac1c760b with key gtot356wu1db4xzw1g7uy1qdrp1wrfd4."}
{"level":"info","ts":1715764484.5354397,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:37","msg":"Event enqueued with key gtot356wu1db4xzw1g7uy1qdrp1wrfd4, ID 7."}
{"level":"info","ts":1715764484.5355127,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:50","msg":"Enqueuing gtot356wu1db4xzw1g7uy1qdrp1wrfd4 with EventQueue."}
{"level":"info","ts":1715764484.5378575,"logger":"EventQueue.e6e4973984bb4606c0a2262bac1c760b","caller":"eventqueue/eventqueue.go:140","msg":"Job started, 0 pending."}
{"level":"info","ts":1715764486.0072567,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:63","msg":"EventQueue returned success for gtot356wu1db4xzw1g7uy1qdrp1wrfd4. "}
{"level":"info","ts":1715764486.0121377,"logger":"PersistentQueue","caller":"persistentqueue/enqueue.go:70","msg":"Set status of gtot356wu1db4xzw1g7uy1qdrp1wrfd4 to success."}

However, if I trigger the alert through Icinga, it also gets recorded in the pdagent logs, but with an error:

{"level":"info","ts":1715856006.5544124,"logger":"Server","caller":"server/middleware.go:12","msg":"Handling request: /send"}
{"level":"info","ts":1715856006.5566208,"logger":"Server","caller":"server/middleware.go:30","msg":"Authorization failure with client auth header: token 540hzt9w2ssiurqtfeuwsvzrtbfvzana"}
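
The "Authorization failure" line suggests that the client presented a token the server does not accept; the token in the log is not the secret from /etc/pdagent/config.yaml. One way to narrow this down is to compare the secret in the config file the server reads with the one in the config file read by the user that runs the Icinga notification. A minimal sketch, assuming both files use a plain top-level `secret:` line as in the config shown earlier. The helper names are made up, and which file the Icinga user actually reads is an assumption (the help text only says the default is $HOME/.go-pdagent.yaml).

```python
from pathlib import Path


def read_secret(config_path):
    """Return the value of a top-level `secret:` line, or None if not found."""
    path = Path(config_path)
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip() == "secret":
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("\"'")
    return None


def secrets_match(server_config, client_config):
    """True only if both files define the same, non-missing secret."""
    server_secret = read_secret(server_config)
    client_secret = read_secret(client_config)
    return server_secret is not None and server_secret == client_secret
```

For example, secrets_match("/etc/pdagent/config.yaml", client_path) could be called with client_path set to the .go-pdagent.yaml in the home directory of the user that runs the notification. A result of False, or a missing file, would be consistent with the authorization failure above.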

The error is triggered from this part of the code:

Hi,

From this command line, you need to make a bash script, and you can reuse the one you had above:

#!/bin/bash
# Script to check service status using go-pdagent and Icinga 2 plugin format

# Define exit codes
OK=0
WARNING=1
CRITICAL=2
UNKNOWN=3

# Get all parameter values - TO BE WRITTEN CORRECTLY
AUTH_KEY=$1

/usr/local/bin/pdagent enqueue -k "$AUTH_KEY" service -t trigger -d "this is test "

Perhaps there are other parameters that you will want to pass from Icinga to the script, like “service” or “trigger”, or “This is a test”.
Adapt the script to define and read in all parameters.
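
The parameter handling can also be sketched in Python rather than bash, if that is easier to debug alongside pd-nagios. This is only a sketch under assumptions: the wrapper's option names are hypothetical, the mapping from Icinga notification types to trigger/acknowledge/resolve is an assumption based on the pdagent help text, and the extra word "service" from the working command line is left out. The -k, -t, -d and -f flags are the ones used with pdagent enqueue earlier in this thread.

```python
import argparse
import subprocess

PDAGENT = "/usr/local/bin/pdagent"  # binary path as shown in this thread

# Assumed mapping from Icinga notification types to pdagent event types.
EVENT_TYPES = {
    "problem": "trigger",
    "acknowledgement": "acknowledge",
    "recovery": "resolve",
}


def build_enqueue_command(key, notification_type, description, fields=()):
    """Build the argument list for `pdagent enqueue` without running it."""
    # Unknown types are passed through unchanged, so "trigger" still works.
    event_type = EVENT_TYPES.get(notification_type.lower(), notification_type)
    command = [PDAGENT, "enqueue", "-k", key, "-t", event_type, "-d", description]
    for field in fields:  # each field is a NAME=value string
        command += ["-f", field]
    return command


def main(argv):
    """Parse the wrapper options (hypothetical names) and call pdagent."""
    parser = argparse.ArgumentParser(description="Send an Icinga notification through pdagent")
    parser.add_argument("-k", dest="key", required=True, help="integration key")
    parser.add_argument("-t", dest="notification_type", default="trigger")
    parser.add_argument("-d", dest="description", required=True)
    parser.add_argument("-f", dest="fields", action="append", default=[])
    args = parser.parse_args(argv)
    command = build_enqueue_command(
        args.key, args.notification_type, args.description, args.fields
    )
    return subprocess.run(command).returncode
```

To use it as a script, a last line such as sys.exit(main(sys.argv[1:])) (with import sys) would be needed. Building the argument list in a separate function makes it possible to print or log the command before anything is sent.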

Then please paste here the full script and the results of a run.

Best regards,

Jean