Onpremise: snuba.clickhouse.errors.ClickhouseError while upgrading

Created on 5 Mar 2020 · 19 Comments · Source: getsentry/onpremise

...
...
...

Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 98, in bootstrap
    clickhouse_rw.execute("SELECT 1")
  File "/usr/src/snuba/snuba/clickhouse/native.py", line 72, in execute
    raise ClickhouseError(e.code, e.message) from e
snuba.clickhouse.errors.ClickhouseError: [210] Name or service not known (clickhouse:9000)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 226, in connect
    self.socket = self._create_socket()
  File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 194, in _create_socket
    for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
  File "/usr/local/lib/python3.7/socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/snuba/snuba/clickhouse/native.py", line 65, in execute
    result = conn.execute(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/client.py", line 196, in execute
    self.connection.force_connect()
  File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 171, in force_connect
    self.connect()
  File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 251, in connect
    '{} ({})'.format(e.strerror, self.get_description())
clickhouse_driver.errors.NetworkError: Code: 210. Name or service not known (clickhouse:9000)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/snuba", line 11, in <module>
    load_entry_point('snuba', 'console_scripts', 'snuba')()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 98, in bootstrap
    clickhouse_rw.execute("SELECT 1")
  File "/usr/src/snuba/snuba/clickhouse/native.py", line 72, in execute
    raise ClickhouseError(e.code, e.message) from e
snuba.clickhouse.errors.ClickhouseError: [210] Name or service not known (clickhouse:9000)
Cleaning up...

Full log: sentry_install_log-2020-03-05_18-19-32.txt

Needs More Information

All 19 comments

Any suggestions, guys?

Seems like you are having a network issue with your Docker setup as all cross-service communications seem to be failing.
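One way to confirm a name-resolution failure like the one in the traceback is to repeat the same `socket.getaddrinfo` call by hand. This is a minimal sketch, not part of Snuba or the install script: `can_resolve` is a hypothetical helper, and the check only says something useful when it runs inside a container attached to the same Docker network as the `clickhouse` service.

```python
import socket


def can_resolve(host, port):
    """Return True if `host` resolves, using the same call shape as the traceback."""
    try:
        socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        # This is the "[Errno -2] Name or service not known" case from the log.
        return False
    return True


if __name__ == "__main__":
    # Host name and port are taken from the error message: (clickhouse:9000).
    print(can_resolve("clickhouse", 9000))
```

A `False` result here means the name does not resolve from where the script runs. It does not say why; a service container that is not running would be one possible cause.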

I ran sudo ./install.sh as always. I don't know where the problem could be.

Now when I run sudo docker-compose up -d, all containers come up as usual, except that the Clickhouse container is stuck in the Restarting state.

@maximal why are you using sudo for this? That might be the issue. I'd try deleting and recreating the sentry-clickhouse volume if you don't have any data to lose. Otherwise it might be permission issues on the volume. Hard to tell without seeing the logs from Clickhouse.

@BYK, I run docker as usual.

Without sudo it doesn't work (at least not without first setting up users and groups):

$ ./install.sh 
Checking minimum requirements...
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/version: dial unix /var/run/docker.sock: connect: permission denied
Cleaning up...


$ docker-compose up -d
ERROR: Couldn't connect to Docker daemon at http+docker://localhost - is it running?

If it's at a non-standard location, specify the URL with the DOCKER_HOST environment variable.

Clickhouse logs:

$ sudo docker-compose logs -f
...
clickhouse_1              | Include not found: clickhouse_remote_servers
clickhouse_1              | Include not found: clickhouse_compression
clickhouse_1              | Logging trace to /var/log/clickhouse-server/clickhouse-server.log
clickhouse_1              | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
clickhouse_1              | Include not found: clickhouse_remote_servers
clickhouse_1              | Include not found: clickhouse_compression
clickhouse_1              | Logging trace to /var/log/clickhouse-server/clickhouse-server.log
clickhouse_1              | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
clickhouse_1              | Include not found: clickhouse_remote_servers
clickhouse_1              | Include not found: clickhouse_compression
clickhouse_1              | Logging trace to /var/log/clickhouse-server/clickhouse-server.log
clickhouse_1              | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
...

I'm quite sure there's something wrong with your Docker setup and your Clickhouse image, so I'm closing this issue. Newer versions of Docker should set up the necessary permissions automatically, so you should not need sudo to run any of these commands.

Okay I take it back. I still think you should not be needing sudo but a quick Google search reveals that when you have IPv6 enabled, Clickhouse may throw weird errors like this. Do you have IPv6 enabled by any chance?

Also, the Clickhouse issue list suggests these warnings are harmless: https://github.com/ClickHouse/ClickHouse/issues/12

Yes, IPv6 is enabled. And I have the latest Docker installed. I have always run Docker with sudo and there were no problems.

Anyway, it seems like the Clickhouse volume or container is corrupted. What kind of data will I lose if I delete them and rerun ./install.sh?

You would lose all your event data so I wouldn't recommend deleting it if you have anything important inside.

How did you determine the volume was corrupted? You can test this by changing the docker-compose.yml file to map clickhouse to a different volume and see if it helps without deleting your old volume.

I installed Docker following the official documentation on the latest Debian. It requires sudo to run.

With a clean volume, the installation script and bringing the containers up work just fine (apart from the event data being completely empty).

It seems we won't find out why the volume got corrupted, as there are no clues in the logs at all.
For now, I'm just staying with a clean new volume.

I suggest closing the issue. Thanks for your time!

@BYK, is it certain that event information is not stored anywhere except Clickhouse?

After I created an empty Clickhouse volume, my instance no longer accepts any new issues, and there are many similar errors in the log:

postgres_1                | ERROR:  duplicate key value violates unique constraint "sentry_eventuser_project_id_377a63c0_uniq"
postgres_1                | DETAIL:  Key (project_id, hash)=(2, d58d80d19131986ebac5ecd5d4842140) already exists.
postgres_1                | STATEMENT:  INSERT INTO "sentry_eventuser" ("project_id", "hash", "ident", "email", "username", "name", "ip_address", "date_added") VALUES (2, 'd58d80d19131986ebac5ecd5d4842140', NULL, NULL, NULL, NULL, '66.66.66.66'::inet, '2020-03-19T13:21:58.458521+00:00'::timestamptz) RETURNING "sentry_eventuser"."id

Also, continuously repeating errors:

worker_1                  | 13:29:58 [INFO] sentry.tasks.update_user_reports: update_user_reports.records_updated (reports_with_event=0 updated_reports=0 reports_to_update=0)

snuba-cleanup_1           | 2020-03-19 13:30:03,502 Dropped 0 partitions on clickhouse

snuba-consumer_1          | + set -- snuba consumer --auto-offset-reset=latest --max-batch-time-ms 750
snuba-consumer_1          | + set gosu snuba snuba consumer --auto-offset-reset=latest --max-batch-time-ms 750
snuba-consumer_1          | + exec gosu snuba snuba consumer --auto-offset-reset=latest --max-batch-time-ms 750
snuba-consumer_1          | 2020-03-19 13:30:20,244 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 306898}
snuba-consumer_1          | Traceback (most recent call last):
snuba-consumer_1          |   File "/usr/local/bin/snuba", line 11, in <module>
snuba-consumer_1          |     load_entry_point('snuba', 'console_scripts', 'snuba')()
snuba-consumer_1          |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
snuba-consumer_1          |     return self.main(*args, **kwargs)
snuba-consumer_1          |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
snuba-consumer_1          |     rv = self.invoke(ctx)
snuba-consumer_1          |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
snuba-consumer_1          |     return _process_result(sub_ctx.command.invoke(sub_ctx))
snuba-consumer_1          |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
snuba-consumer_1          |     return ctx.invoke(self.callback, **ctx.params)
snuba-consumer_1          |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
snuba-consumer_1          |     return callback(*args, **kwargs)
snuba-consumer_1          |   File "/usr/src/snuba/snuba/cli/consumer.py", line 156, in consumer
snuba-consumer_1          |     consumer.run()
snuba-consumer_1          |   File "/usr/src/snuba/snuba/utils/streams/batching.py", line 137, in run
snuba-consumer_1          |     self._run_once()
snuba-consumer_1          |   File "/usr/src/snuba/snuba/utils/streams/batching.py", line 145, in _run_once
snuba-consumer_1          |     msg = self.consumer.poll(timeout=1.0)
snuba-consumer_1          |   File "/usr/src/snuba/snuba/utils/streams/kafka.py", line 688, in poll
snuba-consumer_1          |     return super().poll(timeout)
snuba-consumer_1          |   File "/usr/src/snuba/snuba/utils/streams/kafka.py", line 402, in poll
snuba-consumer_1          |     raise ConsumerError(str(error))
snuba-consumer_1          | snuba.utils.streams.consumer.ConsumerError: KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}

sentry_onpremise_snuba-consumer_1 exited with code 1

@maximal I'd try clearing Kafka and Zookeeper volumes too.

@BYK, okay, I'll try to recreate those. Where are projects and their settings stored then? In the postgres volume, I guess?

Yes, that's correct. So they should stay.

Removing only the Kafka and Zookeeper volumes didn't help. I had to remove these:

docker volume rm sentry-clickhouse sentry-data sentry-kafka sentry-redis sentry-zookeeper sentry_onpremise_sentry-kafka-log sentry_onpremise_sentry-smtp-log sentry_onpremise_sentry-zookeeper-log

And then it went fine.

Wow, that's a lot of volumes. Not sure what went so wrong but glad that you got it working.
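Since the thread establishes that projects and their settings live in the Postgres volume, a small guard can help avoid removing it by accident when repeating the cleanup above. The sketch below only builds and prints the `docker volume rm` command; it does not run it. The volume list is copied from the command in this thread. The name `sentry-postgres` is an assumption (check `docker volume ls` for the actual name), and `build_volume_rm_command` is a hypothetical helper.

```python
import shlex

# Volumes the poster removed, copied from the command in this thread.
REMOVED_VOLUMES = [
    "sentry-clickhouse",
    "sentry-data",
    "sentry-kafka",
    "sentry-redis",
    "sentry-zookeeper",
    "sentry_onpremise_sentry-kafka-log",
    "sentry_onpremise_sentry-smtp-log",
    "sentry_onpremise_sentry-zookeeper-log",
]

# Assumption: the Postgres volume is named "sentry-postgres".
PROTECTED_VOLUMES = {"sentry-postgres"}


def build_volume_rm_command(volumes, protected=PROTECTED_VOLUMES):
    """Return the argv for `docker volume rm`, refusing protected volumes."""
    blocked = sorted(set(volumes) & set(protected))
    if blocked:
        raise ValueError(f"refusing to remove protected volumes: {blocked}")
    return ["docker", "volume", "rm", *volumes]


if __name__ == "__main__":
    # Print only; review the command before running it by hand.
    print(shlex.join(build_volume_rm_command(REMOVED_VOLUMES)))
```

Removing these volumes discards event data, as noted earlier in the thread, so this is only appropriate when that loss is acceptable.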
