Datadog-agent: Panic inside latest datadog-agent:6 container running on ECS

Created on 4 Feb 2020  路  10Comments  路  Source: DataDog/datadog-agent

Starting a little over an hour ago our new tasks in ECS (which have the datadog agent container included in them as an essential container) started exiting right after startup. After enabling logs in the datadog agent container, I can see that there's a panic in the datadog agent code.

Looking on docker hub, I see that the containers were updated at about the same time, so it must be related to the latest :6 container image

Logs from a recent run:

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 01-check-apikey.sh: executing... 
[cont-init.d] 01-check-apikey.sh: exited 0.
[cont-init.d] 50-cri.sh: executing... 
[cont-init.d] 50-cri.sh: exited 0.
[cont-init.d] 50-ecs.sh: executing... 
[cont-init.d] 50-ecs.sh: exited 0.
[cont-init.d] 50-eks.sh: executing... 
[cont-init.d] 50-eks.sh: exited 0.
[cont-init.d] 50-kubernetes.sh: executing... 
[cont-init.d] 50-kubernetes.sh: exited 0.
[cont-init.d] 50-mesos.sh: executing... 
[cont-init.d] 50-mesos.sh: exited 0.
[cont-init.d] 51-docker.sh: executing... 
[cont-init.d] 51-docker.sh: exited 0.
[cont-init.d] 59-defaults.sh: executing... 
[cont-init.d] 59-defaults.sh: exited 0.
[cont-init.d] 60-network-check.sh: executing... 
[cont-init.d] 60-network-check.sh: exited 0.
[cont-init.d] 89-copy-customfiles.sh: executing... 
[cont-init.d] 89-copy-customfiles.sh: exited 0.
[cont-init.d] done.
[services.d] starting services
starting process-agent
starting agent
starting system-probe
starting trace-agent
[services.d] done.
2020-02-04 17:50:14 UTC | SYS-PROBE | INFO | (pkg/process/config/config.go:272 in loadConfigIfExists) | no config exists at /etc/datadog-agent/system-probe.yaml, ignoring...
2020-02-04 17:50:14 UTC | SYS-PROBE | INFO | (pkg/process/config/config.go:422 in loadEnvVariables) | overriding API key from env DD_API_KEY value
2020-02-04 17:50:14 UTC | SYS-PROBE | INFO | (cmd/system-probe/main.go:86 in main) | system probe not enabled. exiting.
2020-02-04 17:50:14 UTC | PROCESS | INFO | (pkg/util/containers/collectors/detector.go:122 in retryCandidates) | Collector ecs_fargate successfully detected
2020-02-04 17:50:14 UTC | PROCESS | INFO | (pkg/util/containers/collectors/detector.go:84 in GetPreferred) | Using collector ecs_fargate
2020-02-04 17:50:14 UTC | PROCESS | INFO | (pkg/process/config/config.go:422 in loadEnvVariables) | overriding API key from env DD_API_KEY value
2020-02-04 17:50:14 UTC | PROCESS | INFO | (pkg/process/config/config.go:272 in loadConfigIfExists) | no config exists at /etc/datadog-agent/system-probe.yaml, ignoring...
2020-02-04 17:50:14 UTC | PROCESS | INFO | (pkg/process/config/config.go:422 in loadEnvVariables) | overriding API key from env DD_API_KEY value
2020-02-04 17:50:14 UTC | CORE | INFO | (cmd/agent/app/run.go:181 in StartAgent) | Starting Datadog Agent v6.17.0
2020-02-04 17:50:14 UTC | CORE | INFO | (cmd/agent/app/run.go:212 in StartAgent) | Hostname is: 
2020-02-04 17:50:14 UTC | CORE | INFO | (pkg/api/security/security.go:135 in FetchAuthToken) | Saved a new authentication token to /etc/datadog-agent/auth_token
2020-02-04 17:50:14 UTC | TRACE | INFO | (pkg/util/log/log.go:482 in func1) | Loaded configuration: /etc/datadog-agent/datadog.yaml
2020-02-04 17:50:14 UTC | TRACE | INFO | (pkg/tagger/tagger.go:152 in tryCollectors) | ecs_fargate tag collector successfully started
2020-02-04 17:50:14 UTC | TRACE | INFO | (pkg/trace/agent/run.go:118 in Run) | Trace agent running on host ip-10-0-31-197.ec2.internal
2020-02-04 17:50:14 UTC | TRACE | INFO | (pkg/trace/api/api.go:153 in Start) | Listening for traces at http://0.0.0.0:8126
2020-02-04 17:50:14 UTC | PROCESS | INFO | (main_common.go:106 in runAgent) | running on platform: linux-4.14.152-127.182.amzn2.x86_64-x86_64-with-debian-bullseye-sid
2020-02-04 17:50:14 UTC | PROCESS | INFO | (main_common.go:109 in runAgent) | running version: Version: 6.17.0, Git hash: 6eb25b4, Git branch: HEAD, Build date: 2020-02-03T22:42:00, Go Version: 1.10.1, 
2020-02-04 17:50:14 UTC | PROCESS | INFO | (pkg/tagger/tagger.go:152 in tryCollectors) | ecs_fargate tag collector successfully started
2020-02-04 17:50:14 UTC | PROCESS | INFO | (pkg/process/checks/container.go:45 in Init) | no network ID detected: could not detect network ID
2020-02-04 17:50:14 UTC | PROCESS | INFO | (collector.go:121 in run) | Starting process-agent for host=fargate_task:arn:aws:ecs:<redacted>, endpoints=[https://process.datadoghq.com], orchestrator endpoints=[https://orchestrator.datadoghq.com], enabled checks=[container rtcontainer]
2020-02-04 17:50:14 UTC | PROCESS | INFO | (collector.go:102 in runCheck) | Finished container check #1 in 3.985301ms
2020-02-04 17:50:14 UTC | CORE | INFO | (cmd/agent/app/run.go:239 in StartAgent) | GUI server port -1 specified: not starting the GUI.
2020-02-04 17:50:14 UTC | CORE | INFO | (pkg/forwarder/forwarder.go:171 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: "https://6-17-0-app.agent.datadoghq.com" (1 api key(s))
2020-02-04 17:50:14 UTC | CORE | INFO | (pkg/dogstatsd/listeners/udp.go:94 in Listen) | dogstatsd-udp: starting to listen on 127.0.0.1:8125
2020-02-04 17:50:14 UTC | CORE | INFO | (pkg/logs/input/container/launcher.go:52 in NewLauncher) | Could not setup the docker launcher: permanent failure in docker Launcher setup: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2020-02-04 17:50:14 UTC | CORE | INFO | (pkg/logs/input/container/launcher.go:59 in NewLauncher) | Could not setup the kubernetes launcher: /var/log/pods not found
2020-02-04 17:50:14 UTC | CORE | INFO | (pkg/logs/input/container/launcher.go:68 in NewLauncher) | Container logs won't be collected unless a docker daemon is eventually started
2020-02-04 17:50:14 UTC | CORE | INFO | (pkg/logs/logs.go:73 in Start) | Starting logs-agent...
2020-02-04 17:50:14 UTC | CORE | ERROR | (pkg/logs/auditor/auditor.go:159 in recoverRegistry) | open /opt/datadog-agent/run/registry.json: no such file or directory
2020-02-04 17:50:14 UTC | CORE | INFO | (pkg/logs/logs.go:76 in Start) | logs-agent started
2020-02-04 17:50:14 UTC | CORE | INFO | (pkg/tagger/tagger.go:152 in tryCollectors) | ecs_fargate tag collector successfully started
2020-02-04 17:50:14 UTC | CORE | INFO | (pkg/collector/runner/runner.go:92 in NewRunner) | Runner started with 4 workers.
2020-02-04 17:50:14 UTC | CORE | INFO | (pkg/collector/python/init.go:284 in Initialize) | Initializing rtloader with python2 /opt/datadog-agent/embedded
2020-02-04 17:50:15 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:120 in LogMessage) | - | (ddyaml.py:123) | monkey patching yaml.load...
2020-02-04 17:50:15 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:120 in LogMessage) | - | (ddyaml.py:127) | monkey patching yaml.load_all...
2020-02-04 17:50:15 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:120 in LogMessage) | - | (ddyaml.py:131) | monkey patching yaml.dump_all... (affects all yaml dump operations)
2020-02-04 17:50:15 UTC | CORE | INFO | (pkg/collector/collector.go:57 in NewCollector) | Embedding Python 2.7.17 (default, Feb  3 2020, 22:38:47) [GCC 4.7.2]
2020-02-04 17:50:15 UTC | CORE | INFO | (cmd/agent/common/autoconfig.go:71 in SetupAutoConfig) | Registering ecs config provider polled every 10s
2020-02-04 17:50:15 UTC | CORE | INFO | (pkg/autodiscovery/autoconfig.go:351 in initListenerCandidates) | ecs listener successfully started
2020-02-04 17:50:15 UTC | CORE | INFO | (pkg/autodiscovery/providers/file.go:74 in Collect) | file: searching for configuration files at: /etc/datadog-agent/conf.d
2020-02-04 17:50:15 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/ecs.go:180 in createService) | Error getting check names from docker labels on container faabb61daa2eccc16d2a5437e41eb4742bcb738f0d9ef7361f4334123024a965: Cannot parse check names: unexpected end of JSON input
2020-02-04 17:50:15 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/ecs.go:180 in createService) | Error getting check names from docker labels on container 30acf2d647d85f75e06f48b89fbc727ecd96525e957f8d855f8c155659eb53f7: Cannot parse check names: unexpected end of JSON input
2020-02-04 17:50:15 UTC | CORE | INFO | (pkg/logs/scheduler/scheduler.go:71 in Schedule) | Received a new service: docker://faabb61daa2eccc16d2a5437e41eb4742bcb738f0d9ef7361f4334123024a965
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x60 pc=0x1a7453c]
goroutine 158 [running]:
github.com/DataDog/datadog-agent/vendor/github.com/docker/docker/client.(*Client).ContainerList(0x0, 0x3b05100, 0xc0000c4010, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0004a70e0, ...)
    /.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/vendor/github.com/docker/docker/client/container_list.go:38 +0x38c
github.com/DataDog/datadog-agent/pkg/logs/input/docker.GetContainer(0x0, 0xc0002f2d29, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/logs/input/docker/client.go:72 +0x19f
github.com/DataDog/datadog-agent/pkg/logs/input/docker.(*Launcher).run(0xc0004905a0)
    /.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/logs/input/docker/launcher.go:150 +0x738
created by github.com/DataDog/datadog-agent/pkg/logs/input/docker.(*Launcher).Start
    /.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/logs/input/docker/launcher.go:127 +0x3f
AGENT EXITED WITH CODE 2, SIGNAL 0, KILLING CONTAINER
2020-02-04 17:50:15 UTC | PROCESS | CRITICAL | (pkg/process/util/signal_nowindows.go:21 in HandleSignals) | Caught signal 'terminated'; terminating.
2020-02-04 17:50:15 UTC | TRACE | INFO | (main.go:23 in handleSignal) | received signal 15 (terminated)
2020-02-04 17:50:15 UTC | TRACE | INFO | (pkg/trace/agent/agent.go:127 in loop) | Exiting...
2020-02-04 17:50:15 UTC | TRACE | INFO | (pkg/trace/stats/concentrator.go:108 in Run) | Exiting concentrator, computing remaining stats
process-agent exited with code 0, disabling
system-probe exited with code 256, signal 15, restarting in 2 seconds
trace-agent exited with code 0, disabling

[deprecated] tealogs componenautodiscovery componenlogs kinbug teacontainers

Most helpful comment

@magichair we're working on pointing the latest/7/7-jmx tags to 7.16.1/7.16.1-jmx, and 6/6-jmx to 6.16.1/6.16.1-jmx, will post an update here once it's done

All 10 comments

Confirmed that pinning to datadog/agent:6.16.1 resolves the issue. FWIW here's the logs of the startup of the 6.16.1 agent for comparison:

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 01-check-apikey.sh: executing... 
[cont-init.d] 01-check-apikey.sh: exited 0.
[cont-init.d] 50-cri.sh: executing... 
[cont-init.d] 50-cri.sh: exited 0.
[cont-init.d] 50-ecs.sh: executing... 
[cont-init.d] 50-ecs.sh: exited 0.
[cont-init.d] 50-kubernetes.sh: executing... 
[cont-init.d] 50-kubernetes.sh: exited 0.
[cont-init.d] 50-mesos.sh: executing... 
[cont-init.d] 50-mesos.sh: exited 0.
[cont-init.d] 51-docker.sh: executing... 
[cont-init.d] 51-docker.sh: exited 0.
[cont-init.d] 59-defaults.sh: executing... 
[cont-init.d] 59-defaults.sh: exited 0.
[cont-init.d] 60-network-check.sh: executing... 
[cont-init.d] 60-network-check.sh: exited 0.
[cont-init.d] 89-copy-customfiles.sh: executing... 
[cont-init.d] 89-copy-customfiles.sh: exited 0.
[cont-init.d] done.
[services.d] starting services
starting system-probe
starting process-agent
starting agent
starting trace-agent
[services.d] done.
2020-02-04 19:19:11 UTC | SYS-PROBE | INFO | (pkg/process/config/config.go:237 in loadConfigIfExists) | no config exists at /etc/datadog-agent/system-probe.yaml, ignoring...
2020-02-04 19:19:11 UTC | SYS-PROBE | INFO | (pkg/process/config/config.go:381 in loadEnvVariables) | overriding API key from env DD_API_KEY value
2020-02-04 19:19:11 UTC | SYS-PROBE | INFO | (cmd/system-probe/main.go:86 in main) | system probe not enabled. exiting.
2020-02-04 19:19:11 UTC | PROCESS | INFO | (pkg/util/containers/collectors/detector.go:122 in retryCandidates) | Collector ecs_fargate successfully detected
2020-02-04 19:19:11 UTC | PROCESS | INFO | (pkg/util/containers/collectors/detector.go:84 in GetPreferred) | Using collector ecs_fargate
2020-02-04 19:19:11 UTC | PROCESS | INFO | (pkg/process/config/config.go:381 in loadEnvVariables) | overriding API key from env DD_API_KEY value
2020-02-04 19:19:11 UTC | PROCESS | INFO | (pkg/process/config/config.go:237 in loadConfigIfExists) | no config exists at /etc/datadog-agent/system-probe.yaml, ignoring...
2020-02-04 19:19:11 UTC | PROCESS | INFO | (pkg/process/config/config.go:381 in loadEnvVariables) | overriding API key from env DD_API_KEY value
2020-02-04 19:19:11 UTC | CORE | INFO | (cmd/agent/app/run.go:178 in StartAgent) | Starting Datadog Agent v6.16.1
2020-02-04 19:19:11 UTC | CORE | INFO | (cmd/agent/app/run.go:206 in StartAgent) | Hostname is: 
2020-02-04 19:19:11 UTC | CORE | INFO | (pkg/api/security/security.go:135 in FetchAuthToken) | Saved a new authentication token to /etc/datadog-agent/auth_token
2020-02-04 19:19:11 UTC | TRACE | INFO | (pkg/util/log/log.go:482 in func1) | Loaded configuration: /etc/datadog-agent/datadog.yaml
2020-02-04 19:19:11 UTC | PROCESS | INFO | (main_common.go:106 in runAgent) | running on platform: linux-4.14.152-127.182.amzn2.x86_64-x86_64-with-debian-10.2
2020-02-04 19:19:11 UTC | PROCESS | INFO | (main_common.go:109 in runAgent) | running version: Version: 6.16.1, Git hash: 02e0969, Git branch: HEAD, Build date: 2019-12-31T19:05:18, Go Version: 1.10.1, 
2020-02-04 19:19:11 UTC | PROCESS | INFO | (pkg/tagger/tagger.go:152 in tryCollectors) | ecs_fargate tag collector successfully started
2020-02-04 19:19:12 UTC | CORE | INFO | (cmd/agent/app/run.go:233 in StartAgent) | GUI server port -1 specified: not starting the GUI.
2020-02-04 19:19:12 UTC | CORE | INFO | (pkg/forwarder/forwarder.go:156 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: "https://6-16-1-app.agent.datadoghq.com" (1 api key(s))
2020-02-04 19:19:12 UTC | CORE | INFO | (pkg/dogstatsd/listeners/udp.go:79 in Listen) | dogstatsd-udp: starting to listen on 127.0.0.1:8125
2020-02-04 19:19:12 UTC | CORE | INFO | (pkg/logs/input/container/launcher.go:51 in NewLauncher) | Could not setup the docker launcher: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2020-02-04 19:19:12 UTC | CORE | INFO | (pkg/logs/input/container/launcher.go:58 in NewLauncher) | Could not setup the kubernetes launcher: /var/log/pods not found
2020-02-04 19:19:12 UTC | CORE | INFO | (pkg/logs/input/container/launcher.go:61 in NewLauncher) | Container logs won't be collected
2020-02-04 19:19:12 UTC | CORE | INFO | (pkg/logs/logs.go:73 in Start) | Starting logs-agent...
2020-02-04 19:19:12 UTC | CORE | ERROR | (pkg/logs/auditor/auditor.go:159 in recoverRegistry) | open /opt/datadog-agent/run/registry.json: no such file or directory
2020-02-04 19:19:12 UTC | CORE | INFO | (pkg/logs/logs.go:76 in Start) | logs-agent started
2020-02-04 19:19:12 UTC | CORE | INFO | (pkg/tagger/tagger.go:152 in tryCollectors) | ecs_fargate tag collector successfully started
2020-02-04 19:19:12 UTC | TRACE | INFO | (pkg/tagger/tagger.go:152 in tryCollectors) | ecs_fargate tag collector successfully started
2020-02-04 19:19:12 UTC | TRACE | INFO | (pkg/trace/agent/run.go:109 in Run) | Trace agent running on host ip-10-0-33-93.ec2.internal
2020-02-04 19:19:12 UTC | TRACE | INFO | (pkg/trace/api/api.go:151 in Start) | Listening for traces at http://0.0.0.0:8126
2020-02-04 19:19:12 UTC | PROCESS | INFO | (pkg/process/checks/container.go:45 in Init) | no network ID detected: could not detect network ID
2020-02-04 19:19:12 UTC | PROCESS | INFO | (collector.go:116 in run) | Starting process-agent for host=fargate_task:arn:aws:ecs:<redacted>, endpoints=[https://process.datadoghq.com], enabled checks=[container rtcontainer]
2020-02-04 19:19:12 UTC | PROCESS | INFO | (collector.go:101 in runCheck) | Finished container check #1 in 1.867671ms
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/util/cloudprovider.go:41 in DetectCloudProvider) | Cloud provider AWS detected
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/collector/runner/runner.go:92 in NewRunner) | Runner started with 4 workers.
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/collector/python/init.go:284 in Initialize) | Initializing rtloader with python2 /opt/datadog-agent/embedded
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:120 in LogMessage) | - | (ddyaml.py:123) | monkey patching yaml.load...
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:120 in LogMessage) | - | (ddyaml.py:127) | monkey patching yaml.load_all...
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:120 in LogMessage) | - | (ddyaml.py:131) | monkey patching yaml.dump_all... (affects all yaml dump operations)
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/collector/collector.go:57 in NewCollector) | Embedding Python 2.7.17 (default, Dec 31 2019, 19:02:15) [GCC 4.7.2]
2020-02-04 19:19:13 UTC | CORE | INFO | (cmd/agent/common/autoconfig.go:71 in SetupAutoConfig) | Registering ecs config provider polled every 10s
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/autodiscovery/autoconfig.go:351 in initListenerCandidates) | ecs listener successfully started
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/autodiscovery/providers/file.go:74 in Collect) | file: searching for configuration files at: /etc/datadog-agent/conf.d
2020-02-04 19:19:13 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/ecs.go:179 in createService) | Error getting check names from docker labels on container 932a63e3be23d900613b5af4586f8ee94d87a7ab8d6de24e116d48ef836fbfe7: Cannot parse check names: unexpected end of JSON input
2020-02-04 19:19:13 UTC | CORE | ERROR | (pkg/autodiscovery/listeners/ecs.go:179 in createService) | Error getting check names from docker labels on container 46a2c3c31f7621d533edcb41824532c6c6ea65c9745a51178ae295d0be0cda32: Cannot parse check names: unexpected end of JSON input
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/logs/scheduler/scheduler.go:71 in Schedule) | Received a new service: docker://932a63e3be23d900613b5af4586f8ee94d87a7ab8d6de24e116d48ef836fbfe7
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/logs/scheduler/scheduler.go:71 in Schedule) | Received a new service: docker://46a2c3c31f7621d533edcb41824532c6c6ea65c9745a51178ae295d0be0cda32
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/autodiscovery/providers/file.go:74 in Collect) | file: searching for configuration files at: /opt/datadog-agent/bin/agent/dist/conf.d
2020-02-04 19:19:13 UTC | CORE | WARN | (pkg/autodiscovery/providers/file.go:78 in Collect) | Skipping, open /opt/datadog-agent/bin/agent/dist/conf.d: no such file or directory
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/autodiscovery/providers/file.go:74 in Collect) | file: searching for configuration files at: 
2020-02-04 19:19:13 UTC | CORE | WARN | (pkg/autodiscovery/providers/file.go:78 in Collect) | Skipping, open : no such file or directory
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:77 in Enter) | Scheduling check ecs_fargate with an interval of 15s
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/metadata/host/host.go:161 in getNetworkMeta) | could not get network metadata: could not detect network ID
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/serializer/serializer.go:283 in SendMetadata) | Sent metadata payload, size (raw/compressed): 2586/1150 bytes.
2020-02-04 19:19:13 UTC | CORE | INFO | (pkg/forwarder/transaction.go:193 in Process) | Successfully posted payload to "https://6-16-1-app.agent.datadoghq.com/intake/?api_key=*************************<redacted>", the agent will only log transaction success every 500 transactions
2020-02-04 19:19:14 UTC | CORE | INFO | (pkg/collector/runner/runner.go:261 in work) | Running check ecs_fargate
2020-02-04 19:19:14 UTC | CORE | INFO | (pkg/collector/runner/runner.go:327 in work) | Done running check ecs_fargate
system-probe exited with code 0, disabling
2020-02-04 19:19:22 UTC | TRACE | INFO | (pkg/trace/info/stats.go:108 in LogStats) | [lang:go lang_version:1.12.7 interpreter:gc-amd64-linux tracer_version:v1.20.1] -> traces received: 14, traces filtered: 0, traces amount: 95866 bytes, events extracted: 0, events sampled: 0

Thanks @ryanschneider for reporting the issue and confirming pinning 6.16.1 resolves it.

For anyone else reading this, please pin the datadog/agent image to 6.16.1 (if you're using tag 6) or 7.16.1 (if you're using tag 7 or latest) while we investigate and fix this.

I believe the issue is introduced in this commit: https://github.com/DataDog/datadog-agent/commit/ae3bb2291f7a0223bbc7c6ec87f914f8c976d956#diff-2325533c5d014118fdb0b16567c0b316

I don't see any guards in place around l.cli which might still be nil if setup hasn't completed yet.

That looks correct @ryanschneider, good find, a guard on l.cli here: https://github.com/DataDog/datadog-agent/blob/1c76b8381a195a0b0f629011a6225e936fe1d37a/pkg/logs/input/docker/launcher.go#L150

should solve the nil dereference panic. With only a guard, I believe #4536 would not completely meet its intended purpose though, as it would miss the logs of all the containers that start before the agent is able to connect to the docker socket.

The team that maintains this part of the code is based in Europe, they'll be able to follow up on this tomorrow. Thanks again for looking into this.

The team that maintains this part of the code is based in Europe, they'll be able to follow up on this tomorrow. Thanks again for looking into this.

@olivielpeau Is there any way in the meantime to push a tag change to the latest and other 7/6/7-jmx/6-jmx tags to point back to "last known good" 7.16.1/6.16.1 versions?

@magichair we're working on pointing the latest/7/7-jmx tags to 7.16.1/7.16.1-jmx, and 6/6-jmx to 6.16.1/6.16.1-jmx, will post an update here once it's done

@magichair The tag update is done:

  • latest/7/latest-jmx/7-jmx tags point to 7.16.1/7.16.1-jmx again
  • latest-py2/6/latest-py2-jmx/6-jmx tags point to 6.16.1/6.16.1-jmx again

The "pinned" tags of 7.17.0/6.17.0 are still available and can be safely used for users who are not affected by this issue.

We'll follow up with an actual fix to the bug, sorry for the trouble and thanks for chiming in!

@ryanschneider : Are you using ECS on EC2 or ECS on Fargate ?
Also, can you share the task definition that contains the agent ? (after having concealed the API key and other secrets ?)
I would need to reproduce this issue to check if there are other problems behind the one spotted in the above stacktrace.

@L3n41c we are using ECS Fargate. Here's a copy of the task JSON that was failing, with <redacted> replacing anything potentially sensitive. Some things to note:

  • As noted, task is running in ECS Fargate. As such, we run both the app container and the datadog-agent container in the same task.
  • We use AWS Parameter Store for the datadog API key via ECS secrets injection
  • The DD_AGENT_HOST = 0.0.0.0 param was added during debugging, but the issue was occurring both with and without that environment variable.

Task JSON

{
  "ipcMode": null,
  "executionRoleArn": "arn:aws:iam::<redacted>",
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "secretOptions": null,
        "options": {
          "awslogs-group": "<redacted>",
          "awslogs-region": "<redacted>",
          "awslogs-stream-prefix": "<redacted>"
        }
      },
      "entryPoint": [
        "/<redacted>"
      ],
      "portMappings": [
        {
          "hostPort": 8080,
          "protocol": "tcp",
          "containerPort": 8080
        },
        {
          "hostPort": 6060,
          "protocol": "tcp",
          "containerPort": 6060
        },
        {
          "hostPort": 9090,
          "protocol": "tcp",
          "containerPort": 9090
        }
      ],
      "command": [
        "<redacted>", "<redacted>"
      ],
      "linuxParameters": null,
      "cpu": 0,
      "environment": [
        {
          "name": "DD_AGENT_HOST",
          "value": "0.0.0.0"
        }
      ],
      "resourceRequirements": null,
      "ulimits": [
        {
          "name": "nofile",
          "softLimit": 131072,
          "hardLimit": 131072
        }
      ],
      "dnsServers": null,
      "mountPoints": [],
      "workingDirectory": null,
      "secrets": [
        {
          "valueFrom": "arn:aws:ssm:<redacted>",
          "name": "DD_API_KEY"
        }
      ],
      "dockerSecurityOptions": null,
      "memory": null,
      "memoryReservation": null,
      "volumesFrom": [],
      "stopTimeout": null,
      "image": "<redacted>.dkr.ecr.us-east-1.amazonaws.com/<redacted>/<redacted>:<redacted>",
      "startTimeout": null,
      "firelensConfiguration": null,
      "dependsOn": null,
      "disableNetworking": null,
      "interactive": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "pseudoTerminal": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": {
        "PROMETHEUS_EXPORTER_PORT": "9090"
      },
      "systemControls": null,
      "privileged": null,
      "name": "<redacted>"
    },
    {
      "dnsSearchDomains": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "secretOptions": null,
        "options": {
          "awslogs-group": "<redacted>",
          "awslogs-region": "<redacted>",
          "awslogs-stream-prefix": "<redacted>"
        }
      },
      "entryPoint": null,
      "portMappings": [],
      "command": null,
      "linuxParameters": null,
      "cpu": 0,
      "environment": [
        {
          "name": "DD_APM_ENABLED",
          "value": "true"
        },
        {
          "name": "DD_APM_ENV",
          "value": "dev"
        },
        {
          "name": "DD_APM_NON_LOCAL_TRAFFIC",
          "value": "true"
        },
        {
          "name": "DD_CMD_PORT",
          "value": "6001"
        },
        {
          "name": "DD_LOGS_ENABLED",
          "value": "true"
        },
        {
          "name": "ECS_FARGATE",
          "value": "true"
        }
      ],
      "resourceRequirements": null,
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [],
      "workingDirectory": null,
      "secrets": [
        {
          "valueFrom": "arn:aws:ssm:us-east-1:<redacted>",
          "name": "DD_API_KEY"
        }
      ],
      "dockerSecurityOptions": null,
      "memory": null,
      "memoryReservation": 256,
      "volumesFrom": [],
      "stopTimeout": null,
      "image": "datadog/agent:6",
      "startTimeout": null,
      "firelensConfiguration": null,
      "dependsOn": null,
      "disableNetworking": null,
      "interactive": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "pseudoTerminal": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": null,
      "systemControls": null,
      "privileged": null,
      "name": "datadog-agent"
    }
  ],
  "placementConstraints": [],
  "memory": "4096",
  "taskRoleArn": "arn:aws:iam::<redacted>:role/<redacted>",
  "compatibilities": [
    "EC2",
    "FARGATE"
  ],
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:<redacted>:task-definition/<redacted>:<redacted>",
  "family": "<redacted>",
  "requiresAttributes": [
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.execution-role-awslogs"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.ecr-auth"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.21"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.task-iam-role"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.execution-role-ecr-pull"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.secrets.ssm.environment-variables"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.task-eni"
    }
  ],
  "pidMode": null,
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "cpu": "2048",
  "revision": 24,
  "status": "ACTIVE",
  "inferenceAccelerators": null,
  "proxyConfiguration": null,
  "volumes": []
}

Also, in case it's relevant, the app is using the Golang APM SDK gopkg.in/DataDog/dd-trace-go.v1 v1.20.1 to send traces, and we are currently _not_ collecting logs on DataDog for this task.

Was this page helpful?
0 / 5 - 0 ratings