Onpremise: relay unable to connect to web:9000

Created on 14 Dec 2020 · 16 Comments · Source: getsentry/onpremise

Version Information

Sentry 20.11.1
Docker version 20.10.0, build 7287ab3
docker-compose version 1.27.4, build 40524192
5.4.0-56-generic Ubuntu 20.04

Description

Since upgrading to Docker 20.10, I can run Sentry but no issues come in, because the relay container cannot connect to the web container; see the logs below.

Possibly useful context:
I run Traefik as a reverse proxy in front of the relay and web containers. Both are attached to my proxy network and to the default network so they can reach the other containers, like this:

.....
  web:
    << : *sentry_defaults
    networks:
      - default
      - proxy
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.sentry.rule=Host(`domain`)"
      - "traefik.http.routers.sentry.middlewares=redirect-to-https"
      - "traefik.http.routers.sentry.service=sentry"
      - "traefik.http.services.sentry.loadbalancer.server.port=9000"
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
.....
  relay:
    .....
    networks:
      - default
      - proxy
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.sentry-relay.rule=Host(`domain`) && PathPrefix(`/api/store/`, `/api/{id:[1-9]\\d*/}`)"
      - "traefik.http.routers.sentry-relay.middlewares=redirect-to-https"
      - "traefik.http.routers.sentry-relay.service=sentry-relay"
      - "traefik.http.services.sentry-relay.loadbalancer.server.port=3000"
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
.....

Steps to Reproduce

  1. Update Docker to the latest version
  2. Clone this repository
  3. Either set 20.11.1 in .env or use nightly (no other changes; neither works)
  4. Run install.sh
  5. Run docker-compose up -d
  6. The relay container is broken

Logs

2020-12-14T19:37:01Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
2020-12-14T19:37:01Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Failed resolving hostname: no record found for name: web type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web type: AAAA class: IN
2020-12-14T19:37:01Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 60 seconds
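
For anyone comparing logs, here is a minimal Python sketch that pulls the failing hostname and record type out of relay output like the above. The regular expression is derived only from the log lines quoted in this thread, and `failed_lookups` is a hypothetical helper name, not part of relay.

```python
import re

# Pattern derived from the relay log lines quoted in this thread (an assumption,
# not an official log format).
RESOLVE_FAILURE = re.compile(
    r"no record found for name: (?P<name>\S+) type: (?P<rtype>\S+) class: (?P<rclass>\S+)"
)


def failed_lookups(log_text):
    """Return the distinct (hostname, record type) pairs that failed to resolve."""
    seen = []
    for match in RESOLVE_FAILURE.finditer(log_text):
        item = (match["name"], match["rtype"])
        if item not in seen:
            seen.append(item)
    return seen


SAMPLE = """\
2020-12-14T19:37:01Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Failed resolving hostname: no record found for name: web type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web type: AAAA class: IN
"""

print(failed_lookups(SAMPLE))
```

If only AAAA shows up for a name that has an IPv4 address, that points at the IPv6 lookup path rather than at the name being unknown.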

Things I also tried:
Disabling Traefik -> no change
Installing the ping package inside the relay container and pinging web -> works
Renaming all other containers named "web" -> no change

docker inspect on the relay container shows no IPv6 address, as I have not set up any custom Docker networking (I did not enable or disable IPv6/IPv4; these are the standard network settings, just with larger address pools).

This is my daemon.json:

{
  "features": { "buildkit": true },
  "registry-mirrors": ["https://private.docker.mirror"],
  "default-address-pools":[
    {"base":"172.20.0.0/16","size":24},
    {"base":"172.30.0.0/16","size":24},
    {"base":"172.40.0.0/16","size":24}
  ]
}
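
As a side check on the pool configuration above, the following Python sketch (standard-library `ipaddress` only; `describe_pools` is a hypothetical helper) lists how many subnets each pool yields and whether its base lies in private address space. It is probably unrelated to the DNS error, but it does show that 172.40.0.0/16 falls outside the private 172.16.0.0/12 range.

```python
import ipaddress

# Pools copied from the daemon.json quoted above.
POOLS = [
    {"base": "172.20.0.0/16", "size": 24},
    {"base": "172.30.0.0/16", "size": 24},
    {"base": "172.40.0.0/16", "size": 24},
]


def describe_pools(pools):
    """Return (base, number of subnets, is_private) for each address pool."""
    report = []
    for pool in pools:
        base = ipaddress.ip_network(pool["base"])
        # Each pool is carved into subnets of prefix length `size`.
        subnet_count = 2 ** (pool["size"] - base.prefixlen)
        report.append((pool["base"], subnet_count, base.is_private))
    return report


for base, count, private in describe_pools(POOLS):
    print(f"{base}: {count} subnets, private={private}")
```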

I hope someone can help; I have no idea what else to test.

Let me know if I can add anything that would help track this down further.

Thanks in advance.

Docs In Progress help wanted

Most helpful comment

Downgrading resolved issue.
Host system: Ubuntu 20.04.1 LTS amd64
Docker version: 20.10.1, build 831ebea
apt-get install -y --allow-downgrades docker-ce=5:19.03.14~3-0~ubuntu-focal docker-ce-cli=5:19.03.14~3-0~ubuntu-focal

All 16 comments

Same problem on Ubuntu 20.04 / Docker 20.
Downgrading Docker to 19.03.14 fixes the problem.
Following this issue :)

Encountering the same issue. I just set up a new server and installed Docker and Sentry as per the docs. After docker-compose up -d I see the same errors in the logs: the relay can't resolve "web" via IPv6 (AAAA record). No issues are ingested, and no other errors are visible anywhere else, so the issues are silently swallowed.

I can also confirm that downgrading docker-ce to 19.03.14 as @sogos suggested fixes the problem.

I had the same problem, and downgrading Docker fixed it too, but I'm trying to investigate why this is happening and it seems very strange.

First, if I install dig in the relay container, dig web returns the correct IP address.

Second, I cannot figure out why this happens in some containers and not others: relay has this problem (in Rust, but not through dig), and one of my own applications has the same problem (in Node.js, also works with dig), but the rest of the Sentry containers such as nginx connecting to web and web connecting to the database (and my own application's containers) seem to be resolving the names with no trouble even in the newer Docker.


Here's a one-liner for testing using a container that hosts a Node.js app (or has the node executable for any other reason):

docker-compose run my-node-app node -pe 'require("dns").lookup("postgres",function(){console.dir(arguments)})'
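
For containers that have Python instead of Node.js, a rough equivalent is sketched below. It asks the system resolver (`socket.getaddrinfo`) for IPv4 and IPv6 results separately, so a failure that only affects AAAA lookups is visible on its own. `probe` is a hypothetical helper name, and `web` in the usage note is just the service name from this thread. Since relay uses its own resolver, a clean result here does not prove relay will succeed.

```python
import socket


def probe(hostname, getaddrinfo=socket.getaddrinfo):
    """Resolve hostname separately over IPv4 and IPv6 and report each outcome.

    The getaddrinfo parameter exists only so the lookup can be swapped out
    (for example in tests); by default the system resolver is used.
    """
    results = {}
    for label, family in (("A/IPv4", socket.AF_INET), ("AAAA/IPv6", socket.AF_INET6)):
        try:
            infos = getaddrinfo(hostname, None, family, socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string.
            results[label] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            results[label] = f"lookup failed: {exc}"
    return results
```

Inside a container this could be called as `print(probe("web"))`; on an IPv4-only compose network, a failed AAAA entry next to a working A entry is expected and harmless for clients that fall back to IPv4.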

It seems other projects are hitting this problem as well, for example Mailcow: https://github.com/mailcow/mailcow-dockerized/issues/3910

Maybe this is related too (the hostname setting in compose)? -> https://github.com/mailcow/mailcow-dockerized/commit/1311066089d523957e5906387b852fc12242b2b9

@durnerj I don't think that https://github.com/mailcow/mailcow-dockerized/commit/1311066089d523957e5906387b852fc12242b2b9 is the same issue, since the comments there seem to be about the case where the Docker host machine has the same hostname as one of the internal containers (though that is a Docker 20.10 issue I ran into with my GitLab instance).

However, the comments on https://github.com/mailcow/mailcow-dockerized/issues/3893 led me to https://github.com/moby/moby/pull/39204, which might be related to this problem.

I am seeing this issue with Sentry 20.12.1 on Ubuntu 20.04.1 with both Docker 19.03.14 and 20.10.1, so the issue may actually be in the relay. I haven't changed anything after running ./install.sh; I just tried pushing an issue to it as a test, nothing happened, and so I started digging deeper. I also double-checked that dig and ping could find the web container after adding those utilities to the relay container. The VPS it runs on is a base Ubuntu install where I only followed https://docs.docker.com/engine/install/ubuntu/ to add Docker.

Relay logs:

relay_1                                     | 2020-12-23T06:59:31Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                                     |   caused by: could not send request using actix-web client
relay_1                                     |   caused by: Failed to connect to host: Failed resolving hostname: no record found for name: web type: AAAA class: IN
relay_1                                     |   caused by: Failed resolving hostname: no record found for name: web type: AAAA class: IN
relay_1                                     |   caused by: Failed resolving hostname: no record found for name: web type: AAAA class: IN
relay_1                                     | 2020-12-23T06:59:44Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 57.665039062s

For the sake of testing, I changed the upstream to the actual public URL of my instance, and that failed with:
2020-12-23T07:12:33Z [actix::actors::resolver] WARN: Can not create system dns resolver: io error

Downgrading resolved the issue as noted above. I ran these two commands so nobody else needs to go hunting for them:
apt-get remove -y docker-ce docker-ce-cli
apt-get install -y --allow-downgrades docker-ce=5:19.03.14~3-0~ubuntu-bionic docker-ce-cli=5:19.03.14~3-0~ubuntu-bionic

Downgrading resolved issue.
Host system: Ubuntu 20.04.1 LTS amd64
Docker version: 20.10.1, build 831ebea
apt-get install -y --allow-downgrades docker-ce=5:19.03.14~3-0~ubuntu-focal docker-ce-cli=5:19.03.14~3-0~ubuntu-focal

@brandinarsenault @agrevtcev so you both downgraded to Docker 19.03 to get this fixed? What was your earlier version?

@BYK What further info is needed? This is somewhat of a blocker for my work right now.
In short: Docker 19.03 -> working, Docker >= 20.10 -> not working

Same here. Downgrading Docker from 20.10 to 19.03 on Ubuntu 20.04.1 LTS resolved the issue for me.

The problem still exists on Docker 20.10.2 on Ubuntu Server 20.04.

The downgrade works on my server too. Now on Docker 19.03.14, the errors in the relay container are gone.

May I suggest adding a note to README.md that Docker 20.10.x is not supported right now?

@durnerj - sorry, somehow I missed all the Docker 20.10 references above on my first read.

> This is somewhat of a blocker for my work right now.

I think the new network code in Docker 20.10 needs to mature a bit more based on what I'm seeing on their issue list. I strongly recommend reporting this issue there instead of piling under here as the most we can do is to warn/block when we see Docker 20.10 for now.
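
A warn-on-20.10 check of the kind mentioned here could look roughly like the Python sketch below. The threshold `(20, 10)` is an assumption taken from the reports in this thread (one commenter also saw the error on 19.03.14), `parse_docker_version` and `is_affected` are hypothetical names, and the real install.sh is a shell script, so this only illustrates the logic.

```python
import re

# Assumption from this thread: problems were reported from Docker 20.10.0 onward.
AFFECTED_MIN = (20, 10)


def parse_docker_version(text):
    """Extract (major, minor, patch) from `docker --version` style output."""
    match = re.search(r"Docker version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(text):
    """Return True when the reported Docker version is 20.10 or newer."""
    return parse_docker_version(text)[:2] >= AFFECTED_MIN


for line in (
    "Docker version 20.10.0, build 7287ab3",
    "Docker version 19.03.14, build 5eb3275",
):
    status = "WARNING: relay DNS issues reported" if is_affected(line) else "ok"
    print(f"{line} -> {status}")
```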

We're having the same issue on Ubuntu 20.04.1 with Docker 20.10.

relay_1                                     | 2021-01-10T04:30:32Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                                     |   caused by: could not send request using actix-web client
relay_1                                     |   caused by: Failed to connect to host: Failed resolving hostname: no record found for name: web type: AAAA class: IN
relay_1                                     |   caused by: Failed resolving hostname: no record found for name: web type: AAAA class: IN
relay_1                                     |   caused by: Failed resolving hostname: no record found for name: web type: AAAA class: IN
relay_1                                     | 2021-01-10T04:30:46Z [relay_server::actors::upstream] WARN: Network outage, scheduling another check in 60s

Downgrading docker version fixed the issue!

I increased the debug level for the relay in relay/config.yml and found the following:

relay_1                        | 2021-01-12T17:24:40Z [actix::actors::resolver] WARN: Can not create system dns resolver: io error

I think this may help to further debug this issue.

The problem seems to be related to the newest Docker version. Reading through the release notes, the move to cgroups v2 in the runtime looks like a major change that may influence the behaviour of the Actix resolver. My suggestion would be to investigate whether Actix needs a special capability in the container to run its own DNS resolver. Alternatively, maybe Actix can be configured to use the default DNS mechanisms of the underlying OS?
