Beats: feature request: metricbeat: support for ntp

Created on 2 Nov 2016 · 15 Comments · Source: elastic/beats

Most services need NTP, and some even require it, so it is an important thing to monitor. Most other monitoring tools support NTP, but Beats does not...

I tried to set up some NTP logging via Filebeat, but the "Modified Julian Day" date format used by ntpd, together with the lack of support for this format in Logstash, makes it almost impossible to do anything useful.

Now that we have Metricbeat, we could use it to get these stats, probably via ntpq, and send them to Logstash/Elasticsearch.

Metricbeat Integrations Services enhancement module

All 15 comments

I assume the data you would be interested in is what you get from ntpq -p? It would be nice if we didn't have to execute a binary to get this data, but I'm not sure whether it is available from somewhere else.

Yes, that's it...

You can use a connection to localhost:123/udp to get the data.
I found these, which might be useful:
collectd usage of ntpd: https://github.com/collectd/collectd/blob/master/src/ntpd.c
Go code for an NTP client:
https://github.com/beevik/ntp
https://github.com/bt51/ntpclient

Or enable the peerstats log in the ntpd config and check the log.

@danielmotaleite Thanks for sharing. It is great that the data is also available over localhost, so there is no need to execute a binary. I think it would definitely be interesting to add an ntp module. Perhaps you would be interested in contributing one? :-)

I still need to learn Go for that :D

@danielmotaleite Have a look at the existing metricsets. You can probably copy and paste quite a bit of code. You can also run make create-metricset, which creates all the boilerplate code for you. Happy to assist.

Hey, it doesn't look like this feature request has gone far, but I thought I'd add some insight.

PTP is now pretty much a requirement for servers running most types of transactions. Regulations (MiFID, CAT, FINRA, etc) are demanding accuracy depending on the type of transactions being processed. How likely is this feature request to be worked on?

The type of data I'm interested in from a PTP perspective is similar to the NTP data: offset, source, one-way or round-trip delay, protocol (PTP or NTP), and source details if there are any.

Is it feasible to look at expanding an NTP module into something that could cater for PTP as well?

The ability to collect this data and represent it in an easily searchable format would be brilliant.

@duncaninnes Thanks for sharing some more insights here. As I don't have any experience with monitoring PTP I wonder what the standard way is to fetch this information? How similar is it to what we discussed with NTP above?

We are more than happy to accept community contributions for this.

Looking at the options briefly, it seems that what metricbeat wants is the ability to query the /proc filesystem somehow. Does that sound right? If so, it's possible that the best method might be some kind of filebeat monitoring.
Whilst these are definitely what I would describe as system-level metrics, I don't believe there is any current method of listing the metrics in /proc (happy to be corrected on this). It would take some time to get these metrics into /proc, so it might be better to look to Filebeat at the moment.
Unless Metricbeat is OK with probing via executing commands. The downside is that there are several possibilities (NTP, PTP, Chrony, proprietary products), and they all report back with subtle differences when their relevant 'status' commands are run.
Not sure what the best way forward here is. Might submit a bug report to start the ball rolling with some kind of support in /proc. But that WILL take a while I reckon. Especially as it will require changes to each of the timekeeping applications to write to any new /proc file structure.

I'll have a think. But any suggestions from here also welcome.

Having the data in /proc would be great, as we already have some libraries fetching data from /proc. But if the info is not there yet in the different distros, this will be quite an uphill battle (not saying it's not worth it).

How do the commands that can be executed fetch the data? Can you share some of the commands you are using with the output they provide?

True. Currently there would be a range of commands to extract time sync offset data depending on what is doing the sync. These are the commands that I use regularly (others may have quicker, better methods - happy to learn):

  • ntpd: ntpq -p output is then parsed to pull out the line starting with an * (if there is one). The offset is in column 9.
  • chrony: chronyc -c sources produces similar data to the above, but as CSV output. Pull out the line with * in column 2. The offset is in column 9.
  • chrony: chronyc -c tracking (alternative). I usually like to parse other data from the sources command, but if you just want to pull the current offset, chronyc tracking lists only the data from the currently synched source. Column 6 provides the last offset.
  • ptp: pmc script. I haven't been using this recently, but I think pmc -u -b 0 'GET CURRENT_DATA_SET' would show offsetFromMaster in the output.
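
As a rough illustration of the ntpd step above, the following Go sketch scans ntpq -p style text for the line starting with * and reads the ninth whitespace-separated column as the offset (ntpq reports it in milliseconds). The sample text and the function name selectedPeerOffset are made up for the example; a real collector would feed in the command's actual output.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sampleNtpq is illustrative `ntpq -p` style output, not captured from a real host.
const sampleNtpq = `     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.1       .GPS.            1 u   33   64  377   12.345   -0.512   0.204
+192.0.2.2       192.0.2.1        2 u   12   64  377   20.100    1.250   0.310
`

// selectedPeerOffset returns the offset (column 9) of the peer whose line
// starts with '*'. The boolean is false if no such line parses cleanly.
func selectedPeerOffset(output string) (float64, bool) {
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "*") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 9 {
			return 0, false
		}
		offset, err := strconv.ParseFloat(fields[8], 64)
		if err != nil {
			return 0, false
		}
		return offset, true
	}
	return 0, false
}

func main() {
	if offset, ok := selectedPeerOffset(sampleNtpq); ok {
		fmt.Printf("selected peer offset: %.3f ms\n", offset)
	} else {
		fmt.Println("no selected peer found")
	}
}
```

Splitting on whitespace assumes every column is populated, so an empty refid would shift the columns; that is one of the fragilities of parsing command output that the thread is concerned about.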

On the proprietary front, I've recently been using a product called TimeKeeper, which essentially uses PTP protocols. The command to extract data from there is tkstatus. I'm only giving this as an example for now to demonstrate that it's not a simple matter to extract the offset data. Depending on which tool is keeping your system clock synched, the query command and/or logging format will be different. That's why I wondered about a change to /proc to have a standard format for these different tools to be queried (for the essential information at least).

It sounds like we should have a few metricsets for monitoring time. Ideally some common fields should be established so that both metricsets report some common data. I think we should create a time module and start with these metricsets:

  • ntpq - This can be used to send queries to a host to get NTP status information.
  • chrony - chronyd has an interface that we can use to get information such as sources or tracking. The interface can be exposed via IP or unix socket.
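
One way to picture the common fields is a shared struct that each metricset fills in. The Go sketch below is purely hypothetical: the TimeSyncEvent type and its field names are invented for illustration and are not an agreed Beats schema. It populates the struct from chronyc -c sources style CSV, following the column positions described earlier in the thread (* in column 2, offset in column 9), with made-up sample data.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// TimeSyncEvent is a hypothetical set of fields shared by all time metricsets.
type TimeSyncEvent struct {
	Daemon        string  // which tool produced the data, e.g. "ntpd" or "chronyd"
	Source        string  // currently selected time source
	OffsetSeconds float64 // last measured offset, normalised to seconds
}

// sampleChrony is illustrative `chronyc -c sources` style CSV, not real output.
const sampleChrony = `^,+,192.0.2.10,2,6,377,21,0.000101,0.000120,0.015
^,*,192.0.2.20,1,6,377,18,-0.000042,-0.000051,0.009
`

// fromChronySources picks the record with "*" in column 2 and maps it to the
// common event. The boolean is false if no selected source is found.
func fromChronySources(output string) (TimeSyncEvent, bool) {
	r := csv.NewReader(strings.NewReader(output))
	r.FieldsPerRecord = -1 // tolerate rows with differing column counts
	records, err := r.ReadAll()
	if err != nil {
		return TimeSyncEvent{}, false
	}
	for _, rec := range records {
		if len(rec) < 9 || rec[1] != "*" {
			continue
		}
		offset, err := strconv.ParseFloat(rec[8], 64)
		if err != nil {
			return TimeSyncEvent{}, false
		}
		return TimeSyncEvent{Daemon: "chronyd", Source: rec[2], OffsetSeconds: offset}, true
	}
	return TimeSyncEvent{}, false
}

func main() {
	if ev, ok := fromChronySources(sampleChrony); ok {
		fmt.Printf("%+v\n", ev)
	} else {
		fmt.Println("no selected source found")
	}
}
```

An ntpq-based metricset would fill the same struct after converting its millisecond offset to seconds, which is the kind of normalisation that common fields would require.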

I'm also looking for some NTP support in Beats, but was wondering if this sort of test/check might be better served as part of Heartbeat instead of a Metricbeat module?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Partly to keep this topic open, I would suggest the most complete method of time offset monitoring would be a Metricbeat module. This allows the history to be logged and extracted at a later date. Not interested in the history? Fine, delete it after a day/week. I think there is enough scope for compliance teams in financial businesses to warrant doing this in a way which can provide historical reports on time accuracy for any server in compliance scope.

The tools which already provide these services charge a LOT of money for this compliance reporting. It would be a feather in the cap of the Elastic Stack to handle these issues.

Pinging @elastic/integrations-services (Team:Services)
