Datadog-agent: There was an error querying the ntp host

Created on 27 Mar 2018 · 21 Comments · Source: DataDog/datadog-agent

Describe what happened:
Launched the agent and the logs look like:

[ AGENT ] 2018-03-27 18:42:06 UTC | INFO | (transaction.go:129 in Process) | Successfully posted payload to "https://6-1-0-app.agent.datadoghq.com/intake/?api_key=*************************370fe"
[ AGENT ] 2018-03-27 18:42:12 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:49888->45.76.244.202:123: i/o timeout
[ AGENT ] 2018-03-27 18:42:27 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:58747->198.137.202.56:123: i/o timeout
[ AGENT ] 2018-03-27 18:42:37 UTC | INFO | (serializer.go:196 in SendJSONToV1Intake) | Sent processes metadata payload, size: 410 bytes.
[ AGENT ] 2018-03-27 18:42:42 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:44762->38.229.71.1:123: i/o timeout
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:246 in work) | Running check cpu
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:302 in work) | Done running check cpu
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:246 in work) | Running check disk
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:302 in work) | Done running check disk
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:246 in work) | Running check docker
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:302 in work) | Done running check docker
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:246 in work) | Running check file_handle
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:302 in work) | Done running check file_handle
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:246 in work) | Running check io
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:302 in work) | Done running check io
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:246 in work) | Running check load
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:302 in work) | Done running check load
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:246 in work) | Running check memory
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:302 in work) | Done running check memory
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:246 in work) | Running check network
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:302 in work) | Done running check network
[ AGENT ] 2018-03-27 18:42:52 UTC | INFO | (runner.go:246 in work) | Running check ntp
[ AGENT ] 2018-03-27 18:42:57 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:56134->162.210.111.4:123: i/o timeout
[ AGENT ] 2018-03-27 18:42:57 UTC | INFO | (runner.go:302 in work) | Done running check ntp
[ AGENT ] 2018-03-27 18:42:57 UTC | INFO | (runner.go:246 in work) | Running check uptime
[ AGENT ] 2018-03-27 18:42:57 UTC | INFO | (runner.go:302 in work) | Done running check uptime
[ AGENT ] 2018-03-27 18:43:12 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:43058->162.210.111.4:123: i/o timeout
[ AGENT ] 2018-03-27 18:43:27 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:46164->107.161.29.207:123: i/o timeout
[ AGENT ] 2018-03-27 18:43:42 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:39099->162.210.111.4:123: i/o timeout
[ AGENT ] 2018-03-27 18:43:57 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:52643->171.66.97.126:123: i/o timeout
[ AGENT ] 2018-03-27 18:44:12 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:34427->162.210.111.4:123: i/o timeout
[ AGENT ] 2018-03-27 18:44:21 UTC | INFO | (transaction.go:129 in Process) | Successfully posted payload to "https://6-1-0-app.agent.datadoghq.com/api/v1/check_run?api_key=*************************370fe"
[ AGENT ] 2018-03-27 18:44:27 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:53556->204.11.201.10:123: i/o timeout
[ AGENT ] 2018-03-27 18:44:42 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:59500->172.98.193.44:123: i/o timeout
[ AGENT ] 2018-03-27 18:44:57 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:33902->96.244.96.19:123: i/o timeout
[ AGENT ] 2018-03-27 18:45:12 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 172.17.0.5:36223->198.137.202.56:123: i/o timeout

Additional environment details (Operating System, Cloud provider, etc):
Docker, AWS, Amazon Linux

All 21 comments

I am also facing the same issue. Any update on this? Is any Datadog engineer working on it? Please prioritize this.

2018-04-14 06:17:25 UTC | INFO | (runner.go:246 in work) | Running check ntp
2018-04-14 06:17:30 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 25.128.37.38:37261->216.218.220.101:123: i/o timeout
2018-04-14 06:17:30 UTC | INFO | (runner.go:302 in work) | Done running check ntp
2018-04-14 06:17:30 UTC | INFO | (runner.go:246 in work) | Running check uptime
2018-04-14 06:17:30 UTC | INFO | (runner.go:302 in work) | Done running check uptime
2018-04-14 06:17:45 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 25.128.37.38:49068->162.210.111.4:123: i/o timeout
2018-04-14 06:18:00 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 25.128.37.38:57120->216.218.220.101:123: i/o timeout

I am using the Datadog Docker image datadog/agent:6.1.2-jmx as a sidecar container in K8s.

It seems that because of this time sync issue, I am not seeing the JMX metrics in Datadog dashboards.
https://github.com/DataDog/datadog-agent/blob/68f10761cbcc3b541f645fc4f5cefc65036c3794/pkg/collector/corechecks/network/ntp.go#L122

This is what I get when I run the Agent's ntp check from the sidecar:

root@something*********-hj2hx:/opt/datadog-agent/bin/agent# ./agent check ntp -r -l INFO
%!s(int=442840760) | INFO | (tagger.go:78 in Init) | starting the tagging system
%!s(int=442840760) | INFO | (runner.go:92 in NewRunner) | Runner started with 1 workers.
%!s(int=442840760) | INFO | (collector.go:51 in NewCollector) | Embedding Python 2.7.14 (default, Apr  4 2018, 16:58:02) [GCC 4.7.2]
%!s(int=442840760) | INFO | (file.go:69 in Collect) | File Configuration Provider: searching for configuration files at: /etc/datadog-agent/conf.d
%!s(int=442840760) | INFO | (file.go:69 in Collect) | File Configuration Provider: searching for configuration files at: /opt/datadog-agent/bin/agent/dist/conf.d
%!s(int=442840760) | WARN | (file.go:73 in Collect) | Skipping, open /opt/datadog-agent/bin/agent/dist/conf.d: no such file or directory
%!s(int=442840760) | WARN | (check.go:243 in Configure) | could not get a check instance with the new api: __init__() takes at least 4 arguments (4 given)
%!s(int=442840760) | WARN | (check.go:244 in Configure) | trying to instantiate the check with the old api, passing agentConfig to the constructor
%!s(int=442840760) | WARN | (check.go:269 in Configure) | passing `agentConfig` to the constructor is deprecated, please use the `get_config` function from the 'datadog_agent' package (disk).
%!s(int=442845760) | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 25.128.37.38:55411->162.210.111.4:123: i/o timeout
%!s(int=442850760) | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp 25.128.37.38:43615->192.155.90.13:123: i/o timeout
=== Service Checks ===
[
  {
    "check": "ntp.in_sync",
    "host_name": "something**********hj2hx",
    "timestamp": 1523688165,
    "status": 3,
    "message": "",
    "tags": null
  },
  {
    "check": "ntp.in_sync",
    "host_name": "something*************-7b64fd57fb-hj2hx",
    "timestamp": 1523688170,
    "status": 3,
    "message": "",
    "tags": null
  }
]
%!s(int=442850760) | ERROR | (host.go:168 in getCPUInfo) | failed to retrieve cpu info at init time
=========
Collector
=========

  Running Checks
  ==============
    ntp
    ---
      Total Runs: 2
      Metrics: 0, Total Metrics: 0
      Events: 0, Total Events: 0
      Service Checks: 1, Total Service Checks: 2

@adamgotterer @shine17 check that your firewalls/security groups allow outgoing UDP traffic from high source ports to target port 123 (not only traffic where both the source and target port are 123).
Regular NTP daemons typically send from port 123 to port 123, but the Datadog NTP check sends its queries from high ports (example in the log above: 25.128.37.38:55411->162.210.111.4:123). I had the same error logged when my firewall was restricting high ports and only allowing 123<->123.

@pbudzon I'm on AWS with DD running on an ECS cluster. The machines running those containers have egress rules for TCP open on ports 0-65535, so I don't think it's a security group issue.

@adamgotterer what about your Network ACL, though?

I just double-checked the network and the ACL, and they allow all traffic on all protocols outbound.

Remember that network ACLs are stateless, so you need to allow traffic in as well as out.
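A toy model of why statelessness matters here, in Go. The `rule` type and port ranges are hypothetical simplifications (real network ACLs also match protocol, source CIDR and rule order); the point is that the reply from port 123 comes back to the client's ephemeral port, so an inbound rule that only admits port 123 drops it even when all outbound traffic is allowed.

```go
package main

import "fmt"

// rule is a hypothetical, simplified stateless ACL entry: it allows UDP
// packets whose destination port falls in [fromPort, toPort].
type rule struct {
	fromPort, toPort int
}

// allowed reports whether any rule admits a packet to dstPort.
func allowed(rules []rule, dstPort int) bool {
	for _, r := range rules {
		if dstPort >= r.fromPort && dstPort <= r.toPort {
			return true
		}
	}
	return false
}

// ntpRoundTripAllowed checks each direction independently, as a stateless ACL
// does: the request goes out to port 123, and the reply comes back to the
// client's ephemeral source port.
func ntpRoundTripAllowed(outbound, inbound []rule, ephemeralPort int) bool {
	return allowed(outbound, 123) && allowed(inbound, ephemeralPort)
}

func main() {
	outbound := []rule{{0, 65535}}            // "allow all outbound"
	inboundOnly123 := []rule{{123, 123}}      // replies to high ports are dropped
	inboundEphemeral := []rule{{1024, 65535}} // replies to high ports are allowed

	port := 49888 // ephemeral source port taken from the first log excerpt
	fmt.Println(ntpRoundTripAllowed(outbound, inboundOnly123, port))   // false
	fmt.Println(ntpRoundTripAllowed(outbound, inboundEphemeral, port)) // true
}
```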

Why doesn't Datadog just trust the host's time?

This doesn't have much to do with trust. It's one of the default checks (like collecting CPU and memory utilization), and it validates that your system's time didn't drift off: you can see the time difference (between the system's time and the NTP-reported time) in Datadog metrics, just like you can see CPU, memory and a bunch of other stuff out of the box.
If you have ntpd or a similar service enabled and working on your server, then this check is usually one you can do without, but it's still nice to have, especially if you're doing anything time-sensitive on the server, like crypto or some kinds of authentication.

I'm getting the same error. Dedicated server. Ports opened.

2018-05-15 20:55:10 UTC | INFO | (ntp.go:122 in Run) | There was an error querying the ntp host: read udp x.x.x.x:60440->64.113.44.55:123: i/o timeout

Any update on this?

On a fresh installation:

agent.log INFO UTC | INFO | (ntp.go:123 in Run) | There was an error querying the ntp host: read udp x.x.x.x:52870->138.96.64.10:123: i/o timeout

Still broken.

We're experiencing this as well. This seems to cause the DD Agent to stop reporting momentarily, and as a result our monitors fire No Data alerts. Any ETA on the fix?

Could we look at supporting a runtime variable that defines the NTP host for the Docker container within AWS, so we could use Amazon's internal endpoint (169.254.169.123)?

A number of organisations aren't always keen to allow NTP in/out of their VPC/NAT instances.
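A sketch of what the requested runtime override could look like, in Go. The environment variable `NTP_SERVERS_OVERRIDE` and the function `ntpServers` are hypothetical, not existing Agent options, and the default pool hostnames are an assumption; the idea is simply to fall back to public servers unless a comma-separated list (such as 169.254.169.123) is supplied.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultServers are ASSUMED public pool hostnames used when no override is set.
var defaultServers = []string{
	"0.datadog.pool.ntp.org",
	"1.datadog.pool.ntp.org",
	"2.datadog.pool.ntp.org",
	"3.datadog.pool.ntp.org",
}

// ntpServers returns the servers to query. NTP_SERVERS_OVERRIDE is a
// HYPOTHETICAL variable name illustrating the feature request.
func ntpServers(getenv func(string) string) []string {
	var servers []string
	for _, s := range strings.Split(getenv("NTP_SERVERS_OVERRIDE"), ",") {
		if s = strings.TrimSpace(s); s != "" {
			servers = append(servers, s)
		}
	}
	if len(servers) == 0 {
		return defaultServers // nothing usable supplied: keep the public defaults
	}
	return servers
}

func main() {
	// e.g. NTP_SERVERS_OVERRIDE=169.254.169.123 for the Amazon Time Sync Service
	fmt.Println(ntpServers(os.Getenv))
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the logic testable without touching the real environment.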

Isn't there an option to just turn this off?

We're also having this issue, and it appears to be intermittent. The agent will suddenly recover and start posting metrics briefly, then later fail again for a while.

It looks as though there are configuration options for NTP (https://docs.datadoghq.com/integrations/ntp/#configuration).

The page reads: "The Agent enables the NTP check by default, but if you want to configure the check yourself, edit the file ntp.d/conf.yaml in the conf.d/ folder at the root of your Agent's configuration directory."

There doesn't seem to be an option to disable the NTP check (although saying that the check is enabled by default does imply there is a way to turn it off). However, for organizations that want to keep NTP internal, the NTP server locations are configurable.

Hi all,

Apologies for the late reply. As @micahsmith mentioned, you can configure the Agent's NTP check to query your own or your service provider's NTP servers by following the instructions at https://docs.datadoghq.com/integrations/ntp/#configuration.

Also, as with any other Agent check that's enabled by default, you can disable the check completely by removing the file at <agent_conf_dir>/conf.d/ntp.d/conf.yaml.default and restarting the Agent (on a standard Linux install, <agent_conf_dir> is /etc/datadog-agent).

That said, we don't recommend disabling the NTP check entirely as:
1) it allows you to know when your host's clock is skewed
2) an Agent running on a host on which the system clock is significantly skewed (approx. >60s) may report data so much in the past or future that it'll affect your graphs and monitors on Datadog.

We've implemented significant improvements to the NTP check in v6.4.0 and v6.5.0. If you're having issues with the NTP check, please upgrade to Agent >= 6.5.0. If you still run into issues with the NTP check after upgrading, please reach out to our support team and send them your Agent's logs.

Having this issue on a K8s Deployment on AWS, Agent version 6.10.2.

@ahharu Thanks for the heads up, could you send us a note via [email protected] with a flare and some details about your configuration? We could then assess the situation.

We have merged a couple PRs for the upcoming 6.14 release that should make this better. Please let us know if this is still an issue for you after upgrading.
