Onpremise: install.sh fails

Created on 18 Nov 2020  ·  11 Comments  ·  Source: getsentry/onpremise

Version Information

Previous Version: 20.10.1
New Version: 20.11.14468076

Description

After trying to upgrade to the latest nightly, I got some errors while running install.sh.
Compose is currently able to start, but Sentry shows "There was an error loading data."
The Postgres and worker logs show that at least the sentry_release.status column is missing.

I don't know whether Kafka is the problem that kept the migrations from running, or whether the migrations themselves failed.

Steps to Reproduce

  1. Execute install.sh

Logs

Install.sh log:

...

Docker images built.
Creating network "sentry_onpremise_default" with the default driver
Bootstrapping and migrating Snuba...
Creating sentry_onpremise_clickhouse_1 ... 
Creating sentry_onpremise_redis_1      ... 
Creating sentry_onpremise_zookeeper_1  ... 
Creating sentry_onpremise_zookeeper_1  ... done
Creating sentry_onpremise_kafka_1      ... 
Creating sentry_onpremise_redis_1      ... done
Creating sentry_onpremise_clickhouse_1 ... done
Creating sentry_onpremise_kafka_1      ... done
+ '[' b = - ']'
+ snuba bootstrap --help
+ set -- snuba bootstrap --force
+ set gosu snuba snuba bootstrap --force
+ exec gosu snuba snuba bootstrap --force
%3|1605706319.866|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.29.0.5:9092 failed: Connection refused (after 2ms in state CONNECT)
%3|1605706320.864|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.29.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-11-18 13:32:00,864 Connection to Kafka failed (attempt 0)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 55, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
%3|1605706321.868|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.29.0.5:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1605706322.868|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.29.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-11-18 13:32:02,869 Connection to Kafka failed (attempt 1)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 55, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
%3|1605706323.872|FAIL|rdkafka#producer-3| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.29.0.5:9092 failed: Connection refused (after 1ms in state CONNECT)
%3|1605706324.872|FAIL|rdkafka#producer-3| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.29.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-11-18 13:32:04,873 Connection to Kafka failed (attempt 2)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 55, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
%3|1605706325.880|FAIL|rdkafka#producer-4| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.29.0.5:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1605706326.875|FAIL|rdkafka#producer-4| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.29.0.5:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2020-11-18 13:32:06,882 Connection to Kafka failed (attempt 3)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 55, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
%3|1605706327.885|FAIL|rdkafka#producer-5| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.29.0.5:9092 failed: Connection refused (after 1ms in state CONNECT)
2020-11-18 13:32:08,885 Connection to Kafka failed (attempt 4)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 55, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
2020-11-18 13:32:10,443 Failed to create topic outcomes
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'outcomes' already exists."}
2020-11-18 13:32:10,445 Failed to create topic events
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'events' already exists."}
2020-11-18 13:32:10,445 Failed to create topic errors-replacements
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'errors-replacements' already exists."}
2020-11-18 13:32:10,446 Failed to create topic event-replacements
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'event-replacements' already exists."}
2020-11-18 13:32:10,446 Failed to create topic snuba-commit-log
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'snuba-commit-log' already exists."}
2020-11-18 13:32:10,447 Failed to create topic cdc
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'cdc' already exists."}
2020-11-18 13:32:10,447 Failed to create topic ingest-sessions
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'ingest-sessions' already exists."}
Traceback (most recent call last):
  File "/usr/local/bin/snuba", line 33, in <module>
    sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 98, in bootstrap
    Runner().run_all(force=True)
  File "/usr/src/snuba/snuba/migrations/runner.py", line 132, in run_all
    pending_migrations = self._get_pending_migrations()
  File "/usr/src/snuba/snuba/migrations/runner.py", line 257, in _get_pending_migrations
    raise MigrationInProgress(migration_key)
snuba.migrations.errors.MigrationInProgress: transactions: 0008_transactions_add_timestamp_index
Cleaning up...

docker-compose logs Traceback:

worker_1                   | Traceback (most recent call last):
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/celery/app/trace.py", line 412, in trace_task
worker_1                   |     R = retval = fun(*args, **kwargs)
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/celery/app/trace.py", line 704, in __protected_call__
worker_1                   |     return self.run(*args, **kwargs)
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/sentry_sdk/integrations/celery.py", line 186, in _inner
worker_1                   |     reraise(*exc_info)
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/sentry_sdk/integrations/celery.py", line 181, in _inner
worker_1                   |     return f(*args, **kwargs)
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/sentry/tasks/base.py", line 48, in _wrapped
worker_1                   |     result = func(*args, **kwargs)
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/sentry/tasks/store.py", line 870, in save_event
worker_1                   |     _do_save_event(cache_key, data, start_time, event_id, project_id, **kwargs)
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/sentry/tasks/store.py", line 784, in _do_save_event
worker_1                   |     project_id, assume_normalized=True, start_time=start_time, cache_key=cache_key
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/sentry/utils/metrics.py", line 193, in inner
worker_1                   |     return f(*args, **kwargs)
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/sentry/event_manager.py", line 317, in save
worker_1                   |     _get_or_create_release_many(jobs, projects)
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/sentry/utils/metrics.py", line 193, in inner
worker_1                   |     return f(*args, **kwargs)
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/sentry/event_manager.py", line 579, in _get_or_create_release_many
worker_1                   |     date_added=release_date_added[(project_id, version)],
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/sentry/models/release.py", line 204, in get_or_create
worker_1                   |     return cls._get_or_create_impl(project, version, date_added, metric_tags)
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/sentry/models/release.py", line 225, in _get_or_create_impl
worker_1                   |     projects=project,
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/django/db/models/query.py", line 250, in __iter__
worker_1                   |     self._fetch_all()
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/django/db/models/query.py", line 1121, in _fetch_all
worker_1                   |     self._result_cache = list(self._iterable_class(self))
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/django/db/models/query.py", line 53, in __iter__
worker_1                   |     results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
worker_1                   |   File "/usr/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
worker_1                   |     raise original_exception
worker_1                   | ProgrammingError: UndefinedColumn('column sentry_release.status does not exist\nLINE 1: ...elease"."id", "sentry_release"."organization_id", "sentry_re...\n                                                             ^\n',)
Question

All 11 comments

Seems like your upgrade stalled and you now skipped Sentry migrations, which explains the issues. I'll ping @lynnagara to look into the Snuba migration issue.

Hi @LuckyType, can you confirm that the version of ClickHouse you are using is 20.3.9.70, as defined here? https://github.com/getsentry/onpremise/blob/a717c11a2554474c7ba8637ebba89750061c2a2f/docker-compose.yml#L106
The migration that stalled uses a feature that was not turned on by default in some prior versions of ClickHouse.
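
For readers who want to check this themselves, here is a minimal sketch of a version check. It assumes the Compose service is named `clickhouse` and is running, and that `sort -V` (GNU coreutils) is available. The helper names are hypothetical, and treating 20.3.9.70 as a minimum is an assumption drawn from the linked docker-compose.yml, not something the Snuba documentation states.

```shell
#!/usr/bin/env bash
# Version pinned in the docker-compose.yml linked above; treated here as a minimum (assumption).
EXPECTED_CLICKHOUSE="20.3.9.70"

# version_ge A B: succeeds when version A >= version B.
# Relies on `sort -V` (GNU coreutils version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: asks the running server for its version.
# Assumes the Compose service is named "clickhouse" and is up.
clickhouse_version() {
  docker-compose exec -T clickhouse clickhouse-client --query "SELECT version()"
}

# check_clickhouse VERSION: report whether VERSION is new enough.
check_clickhouse() {
  if version_ge "$1" "$EXPECTED_CLICKHOUSE"; then
    echo "ClickHouse $1 is new enough"
  else
    echo "ClickHouse $1 is older than $EXPECTED_CLICKHOUSE; upgrade the image first"
    return 1
  fi
}
```

With the stack running, `check_clickhouse "$(clickhouse_version)"` would print the verdict for the live server.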

I've got the same issue here; how do we recover from this?

Our clickhouse image:

 image: 'yandex/clickhouse-server:19.17'

how do we recover from this?

Use this repo, which already has the correct ClickHouse version set:

https://github.com/getsentry/onpremise/blob/bd284d0b7f764f03b5d447405743a88e4c196b03/docker-compose.yml#L106

@BYK, thanks for the quick response. A couple of questions which are really unclear to me:

  1. Our ClickHouse database is currently in a failed state, and install.sh does not attempt to recover from it. How do we resolve that without losing data?
  2. How do we move to that repo and keep our settings?

@mcdurdin since you are not using a setup that we know, it is very hard to assist you with those questions.

If you were using this repo, I'd say just back up your data and config volumes and try stuff.

@BYK, we're using the standard onpremise setup, following your instructions on this repository, from this version of https://github.com/getsentry/onpremise/blob/fb125a1e4c40701b32f974f6eb2c46a05ca2cd78/docker-compose.yml with the following changes:

--- a/BASE/docker-compose.yml
+++ b/HEAD/docker-compose.yml
@@ -170,18 +170,18 @@ services:
     << : *sentry_defaults
     # Increase `--commit-batch-size 1` below to deal with high-load environments.
     command: run post-process-forwarder --commit-batch-size 1
-  sentry-cleanup:
-    << : *sentry_defaults
-    image: sentry-cleanup-onpremise-local
-    build:
-      context: ./cron
-      args:
-        BASE_IMAGE: 'sentry-onpremise-local'
-    command: '"0 0 * * * gosu sentry sentry cleanup --days $SENTRY_EVENT_RETENTION_DAYS"'
+#  sentry-cleanup:
+#    << : *sentry_defaults
+#    image: sentry-cleanup-onpremise-local
+#    build:
+#      context: ./cron
+#      args:
+#        BASE_IMAGE: 'sentry-onpremise-local'
+#    command: '"0 0 * * * gosu sentry sentry cleanup --days $SENTRY_EVENT_RETENTION_DAYS"'
   nginx:
     << : *restart_policy
     ports:
-      - '9000:80/tcp'
+      - '80:80/tcp'
     image: 'nginx:1.16'
     volumes:
       - type: bind

Does that provide the information you need?

@mcdurdin then just upgrade to the latest one and run ./install.sh?

You should probably keep the cleanup job, by the way. You can now control the port to be bound via the SENTRY_BIND environment variable, so your changes should no longer be needed.
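
As a rough illustration of the SENTRY_BIND suggestion, the port change from the diff above could be expressed as a single variable instead of an edit to docker-compose.yml. This sketch assumes the variable is picked up from the `.env` file next to docker-compose.yml, which is where Compose normally reads substitution values from; check your copy of the repo before relying on it.

```shell
# .env in the onpremise checkout (assumed location)
# Publish nginx on host port 80, matching the '80:80/tcp' change in the diff above.
SENTRY_BIND=80
```

The stack would need to be recreated afterwards for the new port mapping to take effect.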

You need to manually reset that migration by running this on the snuba container first:
snuba migrations reverse --group transactions --migration-id 0008_transactions_add_timestamp_index --force

Then upgrade ClickHouse and retry.

@lynnagara, that resolved the issue for me, thank you. For other readers, I ran the following command:

docker-compose run --rm snuba-api migrations reverse --group transactions --migration-id 0008_transactions_add_timestamp_index --force

Thanks for the quick response.

@lynnagara's suggestion worked flawlessly for me; thanks to @mcdurdin for the run command.
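
Putting the thread together, the recovery sequence might be sketched as follows. This is not an official procedure: the `run` and `recover_snuba_migration` helpers and the `DRY_RUN` switch are hypothetical, and only the `migrations reverse` command and `./install.sh` come from the comments above. Back up your data volumes before trying it.

```shell
#!/usr/bin/env bash
# Sketch of the recovery sequence described in this thread.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recover_snuba_migration() {
  # 1. Reverse the migration that is stuck "in progress" (command from the thread).
  run docker-compose run --rm snuba-api migrations reverse \
    --group transactions \
    --migration-id 0008_transactions_add_timestamp_index \
    --force || return 1
  # 2. Upgrade ClickHouse: edit docker-compose.yml by hand so the clickhouse
  #    image matches the version in the upstream file (20.3.9.70 in this thread).
  # 3. Re-run the installer so the remaining migrations are applied.
  run ./install.sh
}
```

Calling `recover_snuba_migration` as-is only prints the two commands. Step 2 is left as a manual edit because the thread does not describe an automated way to change the image version.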
