Given there isn't a PowerDNS Recursor plugin yet (I can see one waiting for PR approval though), I wrote a script to collect the data from rec_control and format it as the InfluxDB line protocol, which I'd read in with Telegraf (script below).
Because rec_control requires root to read/write to the control socket and to /run, I'm using a sudoers entry to allow telegraf to execute /usr/bin/rec_control without a password.
In test, this works fine. It also works fine when testing with the method described in this comment on a similar issue.
[root@nameserver0-a ~]# sudo -H -u telegraf -s
bash-4.2$ cd $HOME
bash-4.2$ pwd
/etc/telegraf
bash-4.2$ telegraf --config=/etc/telegraf/telegraf.conf --test --input-filter=exec
> powerdns_recursor_acmecustom,host=nameserver0-a.srvlist.acme.net all-outqueries=277939708,answers-slow=13612430,....snip.....
bash-4.2$ /usr/local/bin/recursor_telegraf.sh
powerdns_recursor,host=nameserver0-a.srvlist.acme.net all-outqueries=277943153,answers-slow=13612620,.....snip......
However, when Telegraf is running as a service, I'm getting the following error:
Aug 16 11:27:40 nameserver0-a.srvlist.acme.net telegraf[19910]: 2018-08-16T10:27:40Z E! Error in plugin [inputs.exec]: metric parse error: expected field at offset 63: "powerdns_recursor,host=nameserver0-a.srvlist.acme.net \n"
Given that unexpected newline, I can only assume there's a discrepancy between the way I'm running the test and the environment that Telegraf is executing in, which is preventing sudo from running the command.
[[inputs.exec]]
commands = ["/usr/local/bin/recursor_telegraf.sh"]
data_format = "influx"
name_suffix = "_acmecustom"
[root@nameserver0-a ~]# cat /usr/local/bin/recursor_telegraf.sh
#!/bin/bash
echo "powerdns_recursor,host=$(hostname) $(/usr/bin/sudo /usr/bin/rec_control --timeout=2 get-all | tr -s '\t' '=' | paste -sd ',' -)"
Telegraf should either execute the script successfully with the embedded sudo command, OR output a meaningful error so further diagnostics can take place. Ideally this would also be reflected when running under --test.
It passes the --test but fails when running as a service.
You can write to stderr in your script and Telegraf should log the error.
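One thing worth noting about the script as posted: its exit status is that of the final echo, so it exits 0 even when sudo fails inside the command substitution. Below is a minimal sketch of a variant that checks the result, writes a diagnostic to stderr, and exits non-zero instead of printing a line with no fields. The function names emit_metric and format_stats and the "recursor_telegraf:" message prefix are made up for illustration, and how Telegraf reports the non-zero exit and the stderr text is an assumption I haven't verified.

```shell
#!/bin/bash
# Hypothetical variant of /usr/local/bin/recursor_telegraf.sh (names are illustrative).

# format_stats: turn "name<TAB>value" lines on stdin into "name=value,name=value".
format_stats() {
    tr -s '\t' '=' | paste -sd ',' -
}

# emit_metric: run the stats command given as arguments. On success, print one
# line of line protocol. On failure or empty output, print a diagnostic to
# stderr and return non-zero so the caller can tell something went wrong.
emit_metric() {
    local stats rc
    stats="$("$@")" || {
        rc=$?
        echo "recursor_telegraf: '$*' exited with status ${rc}" >&2
        return "${rc}"
    }
    if [ -z "${stats}" ]; then
        echo "recursor_telegraf: '$*' returned no data" >&2
        return 1
    fi
    echo "powerdns_recursor,host=$(hostname) $(printf '%s\n' "${stats}" | format_stats)"
}

# Intended invocation, left commented out so the file can be sourced safely:
# emit_metric /usr/bin/sudo /usr/bin/rec_control --timeout=2 get-all
```

Anything sudo itself prints to stderr is not captured by the command substitution, so it passes straight through to the script's stderr alongside the diagnostic.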
Okay, gave this another look this morning. The requiretty setting is enabled which was preventing Telegraf from using sudo. Resolved this with the following in sudoers:
Defaults:telegraf !requiretty, !syslog
telegraf ALL = NOPASSWD: /usr/bin/rec_control
In theory the error from sudo should have already been going into stderr, so not sure why Telegraf wasn't picking this up.
It works! Thank you very much!