Onpremise: Relay cannot authenticate

Created on 10 May 2020 · 28 Comments · Source: getsentry/onpremise

After running an install with the latest commits to this branch, I'm getting the following error:

web_1                      | 192.168.32.20 - - [10/May/2020:09:52:05 +0000] "POST /api/0/relays/register/challenge/ HTTP/1.1" 200 663 "-" "actix-web/0.7.19"
relay_1                    | 2020-05-10T09:52:05Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: IO error: Connection reset by peer (os error 104)
relay_1                    |   caused by: IO error: Connection reset by peer (os error 104)
relay_1                    |   caused by: IO error: Connection reset by peer (os error 104)
relay_1                    |   caused by: Connection reset by peer (os error 104)

No events are currently being processed. The relay/credentials.json file exists, and the correct key is whitelisted in the sentry/sentry.conf.py file.

I have tried deleting all the images and containers and rebuilding :/

Bug

Most helpful comment

I have been trying to figure this out the whole day, but I guess it's "simple" to fix.

 ---
 relay:
-  upstream: "http://web:9000/"
+  upstream: "http://nginx:80/"
   host: 0.0.0.0
   port: 3000
 logging:
  level: WARN
 processing:
   enabled: true
   kafka_config:
     - {name: "bootstrap.servers", value: "kafka:9092"}
   redis: redis://redis:6379

I guess the web container does not speak HTTP properly, or at least not properly enough for relay to accept it. I can only guess why it sometimes works: possibly the request or response is formatted a certain way, or sent fast enough, for the authentication to go through, and after that it no longer seems to be an issue. Either way, the change above fixed it for me.

I would be happy to leave it like this if there are no downsides to running it this way, but I assume there is a reason why relay talks to the web container directly rather than with nginx in between.
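For anyone who wants to script the change shown in the diff above, here is a minimal Python sketch. It assumes the relay config is a plain text file (relay/config.yml in a default checkout, which is an assumption) with a single `upstream:` line; the function name `set_relay_upstream` is made up for illustration. It edits the text with a regular expression rather than a YAML parser, so it would not cope with unusual formatting.

```python
import re


def set_relay_upstream(config_text: str, new_upstream: str) -> str:
    """Return config_text with the first `upstream:` value replaced.

    Hypothetical helper: it only rewrites the text of the first line whose
    key is `upstream`, keeping the original indentation.
    """
    pattern = re.compile(r"^([ \t]*upstream:[ \t]*).*$", re.MULTILINE)
    new_text, count = pattern.subn(
        lambda match: f'{match.group(1)}"{new_upstream}"',
        config_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no 'upstream:' key found in config")
    return new_text
```

A possible use is to read the file, call `set_relay_upstream(text, "http://nginx:80/")`, write the result back, and then restart the relay container.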

All 28 comments

Another case of the same error being reported https://forum.sentry.io/t/new-errors-stuck-in-relay-server/9660

Found the debug level in relay and set it to debug

relay_1                    | 2020-05-11T06:23:39Z [relay_server::actors::upstream] DEBUG: got register challenge (token = I80YKA6ST8-SLCE2hB6mKrOsmqqso5xgjdwNveOMqgeyI9MnPG-jIWYEzfDO6jMK2i7OPhKldBodRhOF9rQhjw)
relay_1                    | 2020-05-11T06:23:39Z [relay_server::actors::upstream] DEBUG: sending register challenge response
relay_1                    | 2020-05-11T06:23:39Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: IO error: Connection reset by peer (os error 104)
relay_1                    |   caused by: IO error: Connection reset by peer (os error 104)
relay_1                    |   caused by: IO error: Connection reset by peer (os error 104)
relay_1                    |   caused by: Connection reset by peer (os error 104)

@BYK Sorry for the direct poke but any chance you could help out?

@macnibblet no need to ping, we regularly monitor here. It was Sunday yesterday so I was resting a bit :)

Seems like relay cannot connect to Sentry for some reason. The Sentry instance takes a while (a few minutes, sometimes more) to start accepting connections. How long have you waited for this to recover?
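To separate "web is not up yet" from a real authentication problem, a small probe like the following can help. This is only a sketch: the function name `wait_for_port` and its defaults are assumptions, and a successful TCP connect only shows that something is listening, not that relay registration will succeed.

```python
import socket
import time


def wait_for_port(host: str, port: int, deadline_s: float = 300.0,
                  interval_s: float = 2.0) -> bool:
    """Poll until host:port accepts a TCP connection or the deadline passes.

    Returns True as soon as a connection succeeds, False if the deadline
    expires first.
    """
    end = time.monotonic() + deadline_s
    while True:
        try:
            # Open and immediately close a connection; success means
            # something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= end:
                return False
            time.sleep(interval_s)
```

Run from a container on the same Docker network, `wait_for_port("web", 9000)` would tell you whether the "Connection refused (os error 111)" phase is over.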

Sorry about that, but we had a number of crashes on a mobile app that I couldn't see. So I was a bit stressed.

Probably 10 minutes at most, but I don't understand why Sentry would terminate the connection, because the challenge was properly responded to. I tried to debug the sentry-relay code, and although my Rust is rather limited, nothing I could see would cause the connection to drop. And the Python code makes sense too.

but I don't understand why sentry would terminate the connection because the challenge was properly responded to

Ah, my bad, I didn't read the logs carefully enough. Would you mind sharing your logs from docker-compose logs -f web here so we can try to debug? Unless relay is talking to Sentry via a proxy or something, this doesn't make much sense.

It might be related to this change I made earlier: https://github.com/getsentry/onpremise/commit/74c0d4c257d66dc33198deefbfb84f1fb8eccf01#diff-66f7c6c9e9f860724fb428123754dbaaL156-L161

Can you also try reverting that change in the config (essentially adding back those removed lines) and see if it helps?

@BYK I don't see how any of those changes would cause a connection dropped by peer.

2020-05-11T06:06:20Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#192.168.192.10:9092 failed: Connection refused (after 1ms in state CONNECT)
2020-05-11T06:06:20Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
2020-05-11T06:06:20Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#192.168.192.10:9092 failed: Connection refused (after 0ms in state CONNECT)
2020-05-11T06:06:20Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
2020-05-11T06:06:20Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Failed resolving hostname: no record found for name: web.online.net. type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web.online.net. type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web.online.net. type: AAAA class: IN
2020-05-11T06:06:20Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Failed resolving hostname: no record found for name: web.online.net. type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web.online.net. type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web.online.net. type: AAAA class: IN
2020-05-11T06:06:21Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Failed resolving hostname: no record found for name: web.online.net. type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web.online.net. type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web.online.net. type: AAAA class: IN
2020-05-11T06:06:23Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Failed resolving hostname: no record found for name: web.online.net. type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web.online.net. type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web.online.net. type: AAAA class: IN
2020-05-11T06:06:26Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-05-11T06:06:30Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-05-11T06:06:35Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
2020-05-11T06:06:43Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-05-11T06:06:43Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: Connection reset by peer (os error 104)
2020-05-11T06:06:44Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-05-11T06:06:45Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-05-11T06:06:48Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-05-11T06:06:51Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-05-11T06:06:55Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: Connection reset by peer (os error 104)
2020-05-11T06:06:56Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-05-11T06:07:04Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-05-11T06:07:12Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: Connection reset by peer (os error 104)

But ironically it's been running for a few hours now and it's actually working. And I finally see some events coming in.

The relay container just keeps restarting all the time.

web_1 | 03:06:51 [WARNING] sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
web_1 | 03:07:31 [INFO] sentry.plugins.github: apps-not-configured
web_1 | *** Starting uWSGI 2.0.18 (64bit) on [Tue May 12 03:07:32 2020] ***
web_1 | compiled with version: 8.3.0 on 09 May 2020 01:05:54
web_1 | os: Linux-4.19.76-linuxkit #1 SMP Thu Oct 17 19:31:58 UTC 2019
web_1 | nodename: 061453ba7faa
web_1 | machine: x86_64
web_1 | clock source: unix
web_1 | detected number of CPU cores: 2
web_1 | current working directory: /
web_1 | detected binary path: /usr/local/bin/uwsgi
web_1 | !!! no internal routing support, rebuild with pcre support !!!
web_1 | your memory page size is 4096 bytes
web_1 | detected max file descriptor number: 1048576
web_1 | lock engine: pthread robust mutexes
web_1 | thunder lock: enabled
web_1 | uwsgi socket 0 bound to TCP address 0.0.0.0:9000 fd 3
web_1 | Python version: 2.7.16 (default, Oct 17 2019, 07:39:30) [GCC 8.3.0]
web_1 | Set PythonHome to /usr/local
web_1 | Python main interpreter initialized at 0x5575510dddc0
web_1 | python threads support enabled
web_1 | your server socket listen backlog is limited to 100 connections
web_1 | your mercy for graceful operations on workers is 60 seconds
web_1 | setting request body buffering size to 65536 bytes
web_1 | mapped 543520 bytes (530 KB) for 2 cores
web_1 | *** Operational MODE: threaded ***
web_1 | spawned uWSGI master process (pid: 16)
web_1 | spawned uWSGI worker 1 (pid: 20, cores: 2)
web_1 | 03:07:34 [WARNING] sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
web_1 | 03:07:40 [INFO] sentry.plugins.github: apps-not-configured
web_1 | WSGI app 0 (mountpoint='') ready in 8 seconds on interpreter 0x5575510dddc0 pid: 20 (default app)

Anyone could help, please?

@yanbinkwan That's an invalid JSON structure. Delete relay/credentials.json and run install.sh again. Then check that sentry/sentry.conf.py contains only a single entry for the WHITELIST_PK at the bottom of the file, and that the key matches the new public key in relay/credentials.json.
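The checks described above can be sketched in Python. This is an illustration only: the function name `whitelist_is_consistent` is made up, the setting name `SENTRY_RELAY_WHITELIST_PK` is taken from a later comment in this thread, and the `public_key` field in credentials.json is assumed from the `public_key` that relay sends in its register request. It does a plain text scan of sentry.conf.py rather than executing it.

```python
import json


def whitelist_is_consistent(credentials_json: str, sentry_conf: str) -> bool:
    """Check that the relay public key is whitelisted exactly once.

    credentials_json: contents of relay/credentials.json (assumed layout).
    sentry_conf: contents of sentry/sentry.conf.py.
    Raises ValueError (json.JSONDecodeError) if the credentials are not
    valid JSON, and KeyError if there is no "public_key" field.
    """
    public_key = json.loads(credentials_json)["public_key"]
    # Ignore commented-out lines so stale entries do not count.
    active_lines = [
        line for line in sentry_conf.splitlines()
        if not line.lstrip().startswith("#")
    ]
    entries = sum("SENTRY_RELAY_WHITELIST_PK" in line for line in active_lines)
    key_listed = any(public_key in line for line in active_lines)
    return entries == 1 and key_listed
```

If this returns False, or raises because the JSON is invalid, regenerating the credentials as described above is the next step.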

@macnibblet Thank you for your answer, but something is still wrong after I did what you described above.
The ./relay/credentials.json file now contains:
error: could not open config file (file /work/C:/Users/C/AppData/Local/Temp/config.yml)
caused by: No such file or directory (os error 2)

I don't understand why it uses this strange "/work/C:/Users/UEC/" path to find config.yml.

I'm looking forward to your answer 🙂 Thank you.

I have the same issues.

Logs from tcpdump in relay:

POST /api/0/relays/register/challenge/ HTTP/1.1
host: web
x-sentry-relay-id: 435456b8-a7c5-4a87-857b-979db960b3a5
x-sentry-relay-signature: Ld9kw0lCvoGRqVLBe0W3MJ2DNOfDxxrGVRcEa0dxhlgKnKmKhgRtRQzWuIuG2XE3ciCeXrDRpKVNmR42U__ODw.eyJ0IjoiMjAyMC0wNS0xNFQxMjoyMjoyMy40ODkwNDE0MDJaIn0
content-type: application/json
accept-encoding: gzip, deflate
user-agent: actix-web/0.7.19
content-length: 110
date: Thu, 14 May 2020 12:22:23 GMT

{"relay_id":"435456b8-a7c5-4a87-857b-979db960b3a5","public_key":"OlT3r7Af9n8R_Xr10-gXC1fx9FP_OT_5_t69m2H67MU"}
12:22:23.490315 IP sentry_onpremise_web_1.sentry_onpremise_default.9000 > 41f6260395ad.32786: Flags [.], ack 548, win 235, options [nop,nop,TS val 3349079587 ecr 3951920017], length 0
E..4{.@[email protected]........ #(..adHd........Xz.....
...#....
12:22:23.500503 IP sentry_onpremise_web_1.sentry_onpremise_default.9000 > 41f6260395ad.32786: Flags [P.], seq 1:664, ack 548, win 235, options [nop,nop,TS val 3349079597 ecr 3951920017], length 663
E...{.@[email protected]........ #(..adHd........[......
...-....HTTP/1.1 200 OK
Content-Length: 148
Access-Control-Allow-Headers: X-Sentry-Auth, X-Requested-With, Origin, Accept, Content-Type, Authentication, Authorization, Content-Encoding
X-Content-Type-Options: nosniff
Content-Language: en
Access-Control-Expose-Headers: X-Sentry-Error, Retry-After
Vary: Accept-Language, Cookie
X-XSS-Protection: 1; mode=block
Allow: POST, OPTIONS
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, OPTIONS
Content-Type: application/json
X-Frame-Options: deny

{"token":"1nqYweXdwm3UyRdNH2btzXT_EsV4vM4CzI_fQhAKZYyYl6GWdxY1xH7limcVHt7uoZXus3v5aC5yqsDlPUBGFw","relay_id":"435456b8-a7c5-4a87-857b-979db960b3a5"}
12:22:23.500524 IP 41f6260395ad.32786 > sentry_onpremise_web_1.sentry_onpremise_default.9000: Flags [.], ack 664, win 239, options [nop,nop,TS val 3951920027 ecr 3349079597], length 0
E..4.q@.@...... ......#(....adJ.....Xz.....
.......-
12:22:23.500772 IP 41f6260395ad.32786 > sentry_onpremise_web_1.sentry_onpremise_default.9000: Flags [P.], seq 548:1132, ack 664, win 239, options [nop,nop,TS val 3951920027 ecr 3349079597], length 584
E..|.r@.@...... ......#(....adJ.....Z......
.......-POST /api/0/relays/register/response/ HTTP/1.1
host: web
x-sentry-relay-id: 435456b8-a7c5-4a87-857b-979db960b3a5
x-sentry-relay-signature: ogyLcl5BGaruud4GJzulj6pB9_ZpMRzWKDcOotIGuoeJL0LNLGL1CyN2AMUchT9cAIHbTPOK4sVgsNOycPN8CA.eyJ0IjoiMjAyMC0wNS0xNFQxMjoyMjoyMy41MDA2NDU0OTZaIn0
content-type: application/json
accept-encoding: gzip, deflate
user-agent: actix-web/0.7.19
content-length: 148
date: Thu, 14 May 2020 12:22:23 GMT

{"relay_id":"435456b8-a7c5-4a87-857b-979db960b3a5","token":"1nqYweXdwm3UyRdNH2btzXT_EsV4vM4CzI_fQhAKZYyYl6GWdxY1xH7limcVHt7uoZXus3v5aC5yqsDlPUBGFw"}
12:22:23.500953 IP sentry_onpremise_web_1.sentry_onpremise_default.9000 > 41f6260395ad.32786: Flags [R.], seq 664, ack 1132, win 244, options [nop,nop,TS val 3349079597 ecr 3951920027], length 0
E..4{.@[email protected]........ #(..adJ....>....Xz.....
...-....
12:22:26.877730 IP 41f6260395ad.32830 > sentry_onpremise_web_1.sentry_onpremise_default.9000: Flags [S], seq 1283991805, win 29200, options [mss 1460,sackOK,TS val 3951923404 ecr 0,nop,wscale 7], length 0
E..<..@[email protected]... .....>#(L.(.......r.X..........
............
12:22:26.877862 IP sentry_onpremise_web_1.sentry_onpremise_default.9000 > 41f6260395ad.32830: Flags [S.], seq 2408570509, ack 1283991806, win 28960, options [mss 1460,sackOK,TS val 3349082974 ecr 3951923404,nop,wscale 7], length 0
E..<..@[email protected]....... #(.>....L.(...q X..........
...^........
12:22:26.877892 IP 41f6260395ad.32830 > sentry_onpremise_web_1.sentry_onpremise_default.9000: Flags [.], ack 1, win 229, options [nop,nop,TS val 3951923404 ecr 3349082974], length 0
E..4..@[email protected]... .....>#(L.(.........Xz.....
.......^
12:22:26.878192 IP 41f6260395ad.32830 > sentry_onpremise_web_1.sentry_onpremise_default.9000: Flags [P.], seq 1:548, ack 1, win 229, options [nop,nop,TS val 3951923405 ecr 3349082974], length 547
E..W..@[email protected]... .....>#(L.(.........Z......

It looks like the requests are sent to web too quickly.
In the relay settings I changed the relay upstream to "http://nginx/", and now relay registers correctly with web.
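In the capture above, the packet with `Flags [R.]` is the reset: it comes from the web container right after relay sends the second POST on the same connection. A minimal Python sketch for picking such packets out of tcpdump text output follows. The function name `find_resets` is made up, and the regular expression only targets the `HH:MM:SS.ffffff IP src > dst: Flags [...]` line shape seen in this capture.

```python
import re

# Matches tcpdump summary lines such as:
# 12:22:23.500953 IP src.port > dst.port: Flags [R.], seq 664, ...
PACKET_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d+) IP (\S+) > (\S+): Flags \[([^\]]+)\]"
)


def find_resets(capture_text: str) -> list:
    """Return (timestamp, source, destination) for every packet with RST set."""
    resets = []
    for line in capture_text.splitlines():
        match = PACKET_LINE.match(line)
        if match and "R" in match.group(4):
            timestamp, source, destination, _flags = match.groups()
            resets.append((timestamp, source, destination))
    return resets
```

Knowing which side sends the reset narrows things down: here it is the web side, not relay, that drops the connection.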

Same here, but a slightly different error:

relay_1                    | 2020-05-15T12:21:28Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.19.0.6:9092 failed: Connection refused (after 20ms in state CONNECT)
relay_1                    | 2020-05-15T12:21:28Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
relay_1                    | 2020-05-15T12:21:28Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.19.0.6:9092 failed: Connection refused (after 2ms in state CONNECT)
relay_1                    | 2020-05-15T12:21:28Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
relay_1                    | 2020-05-15T12:21:28Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    | 2020-05-15T12:21:28Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)

The SENTRY_RELAY_WHITELIST_PK key contains the public key of the Relay credentials.json.

EDIT: False alert. In debug mode, you can see that everything works
DEBUG: relay successfully registered with upstream
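Since startup noise can hide a later successful registration, a rough log summary can help decide whether there is a real problem. The Python sketch below is an assumption-laden illustration: the function names are made up, and it matches only the message strings that appear in the logs pasted in this thread. It counts one cause per "authentication encountered error" entry, because each entry repeats its "caused by" line several times.

```python
from collections import Counter


def classify_cause(line: str) -> str:
    """Map a 'caused by' log line to a short label."""
    if "os error 104" in line:
        return "connection reset"
    if "os error 111" in line:
        return "connection refused"
    if "Failed resolving hostname" in line:
        return "dns lookup failed"
    if "Timeout while waiting for response" in line:
        return "timeout"
    return "other"


def summarize_relay_log(log_text: str) -> dict:
    """Count authentication error causes and detect a successful registration."""
    causes = Counter()
    registered = False
    awaiting_cause = False
    for line in log_text.splitlines():
        if "relay successfully registered with upstream" in line:
            registered = True
        if "authentication encountered error" in line:
            awaiting_cause = True
            continue
        # Only the first 'caused by' line after an error is counted.
        if awaiting_cause and "caused by:" in line:
            causes[classify_cause(line)] += 1
            awaiting_cause = False
    return {"registered": registered, "auth_errors": dict(causes)}
```

Feeding it the output of docker-compose logs relay, with the relay log level set to debug so the registration message is emitted, gives a quick picture of which failure dominates.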

I have the same issues too

I have also encountered this problem. Have you solved it?

@liangdiyuan that's a different problem, please file a new issue.

@macnibblet does the proposed PR fix this for you?

@BYK I'll give it a test spin and see.

@BYK Still seeing the same errors, takes a while for it to handle the authentication properly.

@macnibblet

takes a while for it to handle the authentication properly.

This is expected. What I mean is whether you keep having connection reset errors or not.

Still seeing this 4 hours later, if that's what you are asking.

2020-05-26T09:05:13Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: Connection reset by peer (os error 104)

Same problem for me after fresh install


relay_1 | [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
relay_1 | [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
relay_1 |   caused by: failed to fetch project state from upstream
relay_1 | relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1 |   caused by: IO error: Connection reset by peer (os error 104)
relay_1 |   caused by: IO error: Connection reset by peer (os error 104)
relay_1 |   caused by: IO error: Connection reset by peer (os error 104)
relay_1 |   caused by: Connection reset by peer (os error 104)
relay_1 | [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated

It's been like this for hours, which should be enough to finish authentication.
I checked /etc/resolv.conf too; that's all normal, and containers from other projects can communicate without problems.

After about a day of waiting, it started to work out of the blue 👍
I hope it doesn't break anymore 😋

I am still seeing the same errors. It sometimes fixes itself in a few minutes, but most of the time it doesn't, and it requires me to babysit and restart it until it magically comes up... I am not sure what is even happening.

Commit of onpremise at the time of running ./install.sh: 01bec9999612d82328d789e64325d1369aeac4c2, Docker 19.03.10 on Ubuntu 16.04.

Output from docker-compose logs relay

xxx@xxx:~/onpremise# docker-compose restart relay
Restarting sentry_onpremise_relay_1 ... done
xxx@xxx:~/onpremise# docker-compose logs -f relay
Attaching to sentry_onpremise_relay_1
relay_1                    | 2020-06-01T12:00:27Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.20.0.10:9092 failed: Connection refused (after 1ms in state CONNECT)
relay_1                    | 2020-06-01T12:00:27Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
relay_1                    | 2020-06-01T12:00:27Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.20.0.10:9092 failed: Connection refused (after 0ms in state CONNECT)
relay_1                    | 2020-06-01T12:00:27Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
relay_1                    | 2020-06-01T12:00:27Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    | 2020-06-01T12:00:27Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    | 2020-06-01T12:00:28Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    | 2020-06-01T12:00:29Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    | 2020-06-01T12:00:32Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    |   caused by: Connection refused (os error 111)
relay_1                    | 2020-06-01T12:00:33Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
relay_1                    | 2020-06-01T12:00:34Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
relay_1                    | 2020-06-01T12:00:35Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
relay_1                    | 2020-06-01T12:00:38Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
relay_1                    | 2020-06-01T12:00:38Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: IO error: Connection reset by peer (os error 104)
relay_1                    |   caused by: IO error: Connection reset by peer (os error 104)
relay_1                    |   caused by: IO error: Connection reset by peer (os error 104)
relay_1                    |   caused by: Connection reset by peer (os error 104)

And here is the relevant output with DEBUG as the log level:

relay_1                    | 2020-06-01T20:09:18Z [relay_server::actors::upstream] INFO: registering with upstream (http://web:9000/)
relay_1                    | 2020-06-01T20:09:18Z [relay_server::actors::upstream] DEBUG: got register challenge (token = <redacted>)
relay_1                    | 2020-06-01T20:09:18Z [relay_server::actors::upstream] DEBUG: sending register challenge response
relay_1                    | 2020-06-01T20:09:18Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                    |   caused by: IO error: Connection reset by peer (os error 104)
relay_1                    |   caused by: IO error: Connection reset by peer (os error 104)
relay_1                    |   caused by: IO error: Connection reset by peer (os error 104)
relay_1                    |   caused by: Connection reset by peer (os error 104)
relay_1                    | 2020-06-01T20:09:18Z [relay_server::actors::upstream] DEBUG: scheduling authentication retry in 11 seconds
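The logs in this thread mix several distinct failures (connection refused, connection reset, failed hostname resolution), and they likely have different causes. As a rough aid for telling them apart, here is a minimal sketch that counts each kind in captured `relay` output. The patterns are copied from the excerpts in this thread; the label names and the function name `classify_relay_errors` are made up for illustration and are not part of Relay.

```python
import re
from collections import Counter

# Patterns copied from the log excerpts in this thread.
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS = {
    "connection_refused": re.compile(r"Connection refused \(os error 111\)"),
    "connection_reset": re.compile(r"Connection reset by peer \(os error 104\)"),
    "dns_failure": re.compile(r"Failed resolving hostname"),
    "not_authenticated": re.compile(r"not yet authenticated"),
}


def classify_relay_errors(log_text: str) -> Counter:
    """Count log lines per failure kind (at most one label per line)."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break  # first matching pattern wins for this line
    return counts


if __name__ == "__main__":
    import sys

    # Example: docker-compose logs relay | python classify.py
    for label, count in classify_relay_errors(sys.stdin.read()).most_common():
        print(f"{label}: {count}")
```

Because every error is followed by several `caused by` lines, the counts are per log line, not per failed attempt. Mostly refused connections suggest the upstream is not listening yet or the port is wrong, while resets after a successful challenge match the behaviour described in this comment.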

I have been trying to figure this out the whole day, but I guess it's "simple" to fix.

 ---
 relay:
-  upstream: "http://web:9000/"
+  upstream: "http://nginx:80/"
   host: 0.0.0.0
   port: 3000
 logging:
   level: WARN
 processing:
   enabled: true
   kafka_config:
     - {name: "bootstrap.servers", value: "kafka:9092"}
   redis: redis://redis:6379

My guess is that the web container does not speak HTTP properly, or at least not properly enough for relay to accept it. I can only speculate about why it sometimes does work: possibly the request or response happens to be formatted a certain way, or is sent fast enough, for the authentication to go through, and after that it doesn't seem to be an issue. Either way, the change above fixed it for me.

I would be happy to leave it like this if there are no downsides to running it this way, but I assume there is a reason why relay talks to the web container directly instead of going through nginx.

I encountered the same problem, because I modified the default 9000 port of the site.
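If the port or hostname in `upstream` no longer matches what the web service listens on, relay should get the "Connection refused" or hostname resolution errors shown elsewhere in this thread. A quick way to check is a plain TCP probe against the configured URL. The sketch below is only an illustration: `probe_upstream` is a hypothetical helper, and it assumes you can run Python from a container on the same Docker network as relay. It only tests the TCP connect, so it will not catch the "Connection reset by peer" case, which happens after the connection is established.

```python
import socket
from urllib.parse import urlsplit


def probe_upstream(url: str, timeout: float = 3.0) -> str:
    """Try a TCP connect to the host and port of `url` and report the result."""
    parts = urlsplit(url)
    host = parts.hostname
    # Fall back to the scheme's default port when the URL has none.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        # The hostname could not be resolved.
        return "dns_failure"
    except ConnectionRefusedError:
        # Nothing is listening on that port (os error 111 on Linux).
        return "refused"
    except OSError as exc:
        # Timeouts and any other socket-level failure.
        return f"error: {exc}"


if __name__ == "__main__":
    # The two upstream values discussed in this thread.
    for candidate in ("http://web:9000/", "http://nginx:80/"):
        print(candidate, "->", probe_upstream(candidate))
```

If the probe reports `refused` for the configured upstream while the web container is up, compare the port in `upstream` with the port the web service was changed to.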

Still not fixed but will be fixed by #578 hopefully.

This still occurs in the latest dev version 20.x.x.

relay_1                        | 2020-09-01T00:54:43Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                        |   caused by: Failed to connect to host: Failed resolving hostname: no record found for name: web.uhlhost.net. type: AAAA class: IN
relay_1                        |   caused by: Failed resolving hostname: no record found for name: web.uhlhost.net. type: AAAA class: IN
relay_1                        |   caused by: Failed resolving hostname: no record found for name: web.uhlhost.net. type: AAAA class: IN
relay_1                        | 2020-09-01T00:54:43Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                        |   caused by: Failed to connect to host: Failed resolving hostname: no record found for name: web.uhlhost.net. type: AAAA class: IN
relay_1                        |   caused by: Failed resolving hostname: no record found for name: web.uhlhost.net. type: AAAA class: IN
relay_1                        |   caused by: Failed resolving hostname: no record found for name: web.uhlhost.net. type: AAAA class: IN
relay_1                        | 2020-09-01T00:54:45Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                        |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                        |   caused by: Connection refused (os error 111)
relay_1                        |   caused by: Connection refused (os error 111)
relay_1                        | 2020-09-01T00:54:47Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                        |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                        |   caused by: Connection refused (os error 111)
relay_1                        |   caused by: Connection refused (os error 111)
relay_1                        | 2020-09-01T00:54:49Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
relay_1                        |   caused by: Failed to connect to host: Connection refused (os error 111)
relay_1                        |   caused by: Connection refused (os error 111)
relay_1                        |   caused by: Connection refused (os error 111)

I am getting the same issues as above with the latest commits from onpremise master.

sentry.exceptions.InvalidConfiguration: Error 111 connecting to 127.0.0.1:6379. Connection refused.
An error occurred, caught SIGERR on line 312
%3|1599345511.957|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.5:9092 failed: Connection refused (after 1ms in state CONNECT)
%3|1599345512.956|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-09-05 22:38:32,956 Connection to Kafka failed (attempt 0)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 58, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
%3|1599345513.958|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.5:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1599345514.958|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-09-05 22:38:34,959 Connection to Kafka failed (attempt 1)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 58, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
%3|1599345515.961|FAIL|rdkafka#producer-3| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.5:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1599345516.961|FAIL|rdkafka#producer-3| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-09-05 22:38:36,962 Connection to Kafka failed (attempt 2)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 58, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
%3|1599345517.964|FAIL|rdkafka#producer-4| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.5:9092 failed: Connection refused (after 0ms in state CONNECT)

Any help would be appreciated, thank you.

@phobos-dthorga this seems like a different issue, related to Kafka being down or unreachable by Snuba. Please use the forum to make sure your Kafka instance works properly before filing a new issue.
