Hello,
Is it possible to disable auto-detection of the following integrations:
1) Azure
2) Kubernetes
3) EC2/AWS
Alternatively, is there any DigitalOcean- or Hetzner-specific config that I can enable?
Thanks in advance.
=== Running ECS Metadata availability diagnosis ===
[ERROR] diagnoseECS: could not detect ECS agent, tried URLs: [http://localhost:51678/] - 1519993513987227441
===> FAIL
=== Running ECS Fargate Metadata availability diagnosis ===
[ERROR] diagnoseFargate: Get http://169.254.170.2/v2/metadata: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) - 1519993514488407297
===> FAIL
=== Running EC2 Metadata availability diagnosis ===
[ERROR] diagnose: unable to fetch EC2 API, status code 404 trying to fetch http://169.254.169.254/latest/meta-data/hostname - 1519993514494255403
===> FAIL
=== Running GCE Metadata availability diagnosis ===
[ERROR] diagnose: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname - 1519993514499312741
===> FAIL
=== Running Kubelet availability diagnosis ===
[DEBUG] init: Cannot connect: Get https://hostname:10250/: dial tcp 127.0.1.1:10250: getsockopt: connection refused, trying trough http - 1519993514693030344
[DEBUG] init: Cannot connect: Get http://hostname:10255/: dial tcp 127.0.1.1:10255: getsockopt: connection refused - 1519993514693273206
[DEBUG] GetKubeUtil: Init error: temporary failure in kubeutil, will retry later: cannot connect: https: "Get https://hostname:10250/: dial tcp 127.0.1.1:10250: getsockopt: connection refused", http: "Get http://hostname:10255/: dial tcp 127.0.1.1:10255: getsockopt: connection refused" - 1519993514693317355
[ERROR] diagnose: temporary failure in kubeutil, will retry later: cannot connect: https: "Get https://hostname:10250/: dial tcp 127.0.1.1:10250: getsockopt: connection refused", http: "Get http://hostname:10255/: dial tcp 127.0.1.1:10255: getsockopt: connection refused" - 1519993514693356683
===> FAIL
=== Running Kubernetes API Server availability diagnosis ===
[DEBUG] connect: using autoconfiguration - 1519993514693398380
[DEBUG] GetAPIClient: init error: temporary failure in apiserver, will retry later: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined - 1519993514693427645
[ERROR] diagnose: temporary failure in apiserver, will retry later: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined - 1519993514693439616
===> FAIL
=== Running Azure Metadata availability diagnosis ===
[ERROR] diagnose: Azure HostAliases: unable to query metadata endpoint: status code 404 trying to GET http://169.254.169.254/metadata/instance/compute/vmId?api-version=2017-04-02&format=text - 1519993514699412954
===> FAIL
Hi @n0mer,
Are you facing an issue related to Azure or AWS? The agent doesn't have any Azure or AWS integration (integration in the sense of checks: https://github.com/DataDog/integrations-core ). The only logic related to these platforms is hostname detection, and if you're not running on them it shouldn't cause any issue. If it does, I'd be interested in reading your agent logs (please open a support ticket so you can send them to us securely).
Disabling the Kubernetes logic is as simple as not passing the `KUBERNETES` environment variable to the agent's container.
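As an illustration of this kind of opt-in switch, here is a minimal sketch assuming the Kubernetes logic is only turned on when `KUBERNETES` is set to a non-empty value. The function name `kubernetes_enabled` is hypothetical, and this is not the agent's actual code.

```python
import os
from typing import Mapping, Optional


def kubernetes_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical gate: Kubernetes logic runs only if KUBERNETES is set.

    An unset or empty variable counts as "not passed" (an assumption of
    this sketch), so the Kubernetes code paths stay off by default.
    """
    if env is None:
        env = os.environ
    return bool(env.get("KUBERNETES"))
```

Passing a plain dict instead of `os.environ` makes the gate easy to check without touching the real environment.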
If you are facing a specific issue, please describe it here along with how you deploy the agent, and we'll help you with it.
Oh, my bad, I hadn't refreshed the page and missed your previous message. This output is from the `diagnose` command, which checks connectivity with the various components that the agent can interact with.
These failures are expected if you don't run on all of these platforms, and can be safely ignored.
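The way `diagnose` reports one FAIL per component without stopping at the first failure can be sketched as follows. This is a minimal sketch assuming each probe is a callable that raises on failure; `run_diagnoses` is a hypothetical name, and the `===> PASS` line for successful checks is an assumption, since only FAIL lines appear in this thread.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical convention: a check returns normally on success, raises on failure.
Check = Callable[[], None]


def run_diagnoses(checks: Dict[str, Check]) -> List[Tuple[str, bool]]:
    """Run every check in order, print PASS/FAIL, and never abort early."""
    results: List[Tuple[str, bool]] = []
    for name, check in checks.items():
        print(f"=== Running {name} diagnosis ===")
        try:
            check()
            ok = True
        except Exception as err:  # a failing probe is reported, not fatal
            print(f"[ERROR] {name}: {err}")
            ok = False
        print("===> PASS" if ok else "===> FAIL")
        results.append((name, ok))
    return results
```

Because each failure is caught and reported individually, a host that is not on EC2, GCE or Azure simply gets one FAIL per missing platform.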
@hkaj Thanks for the explanation. I assumed these were error messages, so I thought I had to act on them somehow.
Anyway, is it possible to tell the agent explicitly that I'm NOT in an Azure, GCE, Kubernetes, EC2 or ECS environment?
By default it knows it's not on Kubernetes or ECS, unless you explicitly pass options to enable them. As for cloud providers (Azure, GCE, AWS), it queries their local metadata API, and once that fails it stops trying to guess the cloud provider.
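A minimal sketch of that "probe once, then stop guessing" behavior, assuming an injected `fetch(url)` function that returns an HTTP status code (so it runs without network access). The endpoint URLs come from the diagnose output in this thread; `CloudDetector` and `fetch` are hypothetical, and real metadata services may require provider-specific request headers, which the sketch leaves to `fetch`.

```python
from typing import Callable, Dict, Optional

# Endpoints as they appear in the diagnose output of this thread.
METADATA_URLS: Dict[str, str] = {
    "aws": "http://169.254.169.254/latest/meta-data/hostname",
    "gce": "http://169.254.169.254/computeMetadata/v1/instance/hostname",
    "azure": "http://169.254.169.254/metadata/instance/compute/vmId"
             "?api-version=2017-04-02&format=text",
}


class CloudDetector:
    """Hypothetical sketch: guess the cloud provider once, remember the answer."""

    def __init__(self, fetch: Callable[[str], int]) -> None:
        self._fetch = fetch  # fetch(url) -> HTTP status code
        self._done = False
        self._provider: Optional[str] = None

    def provider(self) -> Optional[str]:
        if not self._done:
            for name, url in METADATA_URLS.items():
                try:
                    if self._fetch(url) == 200:
                        self._provider = name
                        break
                except OSError:
                    continue  # endpoint unreachable: try the next provider
            self._done = True  # never probe again, even if nothing matched
        return self._provider
```

When every endpoint answers 404 (or is unreachable), `provider()` returns `None`, and later calls reuse that answer instead of probing the metadata IP again.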
These failures only show up when you run `./agent diagnose`, right? This command exists purely for debugging, to troubleshoot connectivity issues. Failures there are fine.
@hkaj They also appear in agent.log, so I'm wondering whether something is broken :-/
Thanks (again) for the explanation, and sorry for bothering you; this agent v6 is a tough nut to crack.
Oh, this is a problem. These logs shouldn't be in agent.log. How do you install the agent? Do you have any custom log options, and did you run the `flare` command at any point?
I updated the agent from v5 to v6:
https://github.com/DataDog/datadog-agent/issues/1381#issuecomment-369896507
Here is a snippet from agent.log (with debug logging turned on):
2018-03-02 15:19:20 CET | DEBUG | (kubelet.go:439 in init) | Cannot connect: Get https://hostname:10250/: dial tcp 127.0.1.1:10250: getsockopt: connection refused, trying trough http
2018-03-02 15:19:20 CET | DEBUG | (kubelet.go:445 in init) | Cannot connect: Get http://hostname:10255/: dial tcp 127.0.1.1:10255: getsockopt: connection refused
2018-03-02 15:19:20 CET | DEBUG | (kubelet.go:87 in GetKubeUtil) | Init error: temporary failure in kubeutil, will retry later: cannot connect: https: "Get https://hostname:10250/: dial tcp 127.0.1.1:10250: getsockopt: connection refused", http: "Get http://hostname:10255/: dial tcp 127.0.1.1:10255: getsockopt: connection refused"
2018-03-02 15:19:20 CET | DEBUG | (tagger.go:145 in tryCollectors) | will retry kubelet later: temporary failure in kubeutil, will retry later: cannot connect: https: "Get https://hostname:10250/: dial tcp 127.0.1.1:10250: getsockopt: connection refused", http: "Get http://hostname:10255/: dial tcp 127.0.1.1:10255: getsockopt: connection refused"
2018-03-02 15:19:20 CET | DEBUG | (kubelet.go:87 in GetKubeUtil) | Init error: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2018-03-02 15:19:20 CET | DEBUG | (tagger.go:145 in tryCollectors) | will retry kube-service-collector later: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2018-03-02 15:19:20 CET | INFO | (tagger.go:151 in tryCollectors) | docker tag collector successfully started
2018-03-02 15:19:20 CET | DEBUG | (tagger.go:149 in tryCollectors) | ecs_fargate tag collector cannot start: Failed to connect to task metadata API, ECS tagging will not work
2018-03-02 15:19:20 CET | DEBUG | (tagger.go:149 in tryCollectors) | ecs tag collector cannot start: cannot find ECS agent
2018-03-02 15:24:48 CET | DEBUG | (kubelet.go:87 in GetKubeUtil) | Init error: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2018-03-02 15:24:48 CET | DEBUG | (tagger.go:145 in tryCollectors) | will retry kubelet later: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2018-03-02 15:24:48 CET | DEBUG | (kubelet.go:87 in GetKubeUtil) | Init error: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2018-03-02 15:24:48 CET | DEBUG | (tagger.go:145 in tryCollectors) | will retry kube-service-collector later: temporary failure in kubeutil, will retry later: try delay not elapsed yet
Relevant options from datadog.yaml (output of `cat datadog.yaml | grep log`):
```yaml
disable_file_logging: false
log_format_json: false
log_level: debug
log_payloads: false
log_to_console: true
log_to_syslog: false
logging_frequency: 20
logs_config:
dd_url: intake.logs.datadoghq.com
logs_enabled: true
logset: ""
syslog_pem: ""
syslog_rfc: false
syslog_tls: false
syslog_uri: ""
```
This is for metadata collection, and it will stop after a few retries. These are debug logs, so you can ignore them.
@hkaj So this metadata collection cannot be disabled, right?
It disables itself after a few retries, but yes, you can't disable it manually.
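The pattern visible in the debug log (a temporary failure, "try delay not elapsed yet" on the following calls, then giving up after a few attempts) is a common retry-with-delay scheme. Below is a minimal sketch of it; the class name `RetryingInit`, the 30-second delay and the three-attempt limit are hypothetical values, not taken from the agent's source.

```python
import time
from typing import Any, Callable, Optional


class RetryingInit:
    """Retry a failing initialization with a minimum delay, then disable it.

    Hypothetical sketch: the delay and attempt limit are made-up defaults.
    """

    def __init__(self, init: Callable[[], Any], delay: float = 30.0,
                 max_attempts: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._init = init
        self._delay = delay
        self._max_attempts = max_attempts
        self._clock = clock  # injectable so tests can fake the time
        self._attempts = 0
        self._last_try: Optional[float] = None
        self.result: Any = None
        self.disabled = False

    def get(self) -> Any:
        """Return the initialized object, or None if it is not available."""
        if self.result is not None or self.disabled:
            return self.result
        now = self._clock()
        if self._last_try is not None and now - self._last_try < self._delay:
            return None  # equivalent of "try delay not elapsed yet"
        self._last_try = now
        self._attempts += 1
        try:
            self.result = self._init()
        except OSError:  # e.g. connection refused on the kubelet ports
            if self._attempts >= self._max_attempts:
                self.disabled = True  # stop retrying permanently
        return self.result
```

Each call from a collector loop either reuses the cached object, returns `None` while the delay has not elapsed, or makes one more attempt; after `max_attempts` failures it stays disabled and stops producing new connection attempts.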