Onpremise: Kafka starts with 50 partitions bogging down system resources

Created on 19 May 2020 · 14 comments · Source: getsentry/onpremise

For some reason Kafka starts with 50 partitions, which consumes system resources unnecessarily.

kafka-topics --describe events --zookeeper zookeeper:2181

Topic:__consumer_offsets    PartitionCount:50    ReplicationFactor:1    Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets    Partition: 0    Leader: 1001    Replicas: 1001    Isr: 1001
    Topic: __consumer_offsets    Partition: 1    Leader: 1001    Replicas: 1001    Isr: 1001
...
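To check the partition count per topic without scrolling through the full listing, something like the following could help. It is only a sketch: `partition_counts` is a hypothetical helper name, and it assumes the `--describe` output looks like the sample above (newer Kafka versions print `PartitionCount: 50` with a space, which the helper also tolerates).

```shell
#!/bin/bash
# Hypothetical helper: reads `kafka-topics --describe` output on stdin and
# prints "<topic> <partition count>" for every topic header line.
partition_counts() {
  awk '/PartitionCount:/ {
    topic = ""; count = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^Topic:/) {
        topic = $i; sub(/^Topic:/, "", topic)
        if (topic == "") topic = $(i + 1)    # "Topic: name" form
      }
      if ($i ~ /^PartitionCount:/) {
        count = $i; sub(/^PartitionCount:/, "", count)
        if (count == "") count = $(i + 1)    # "PartitionCount: 50" form
      }
    }
    print topic, count
  }'
}

# Possible usage (assumes the Kafka service is named "kafka" in docker-compose.yml):
#   docker-compose exec -T kafka kafka-topics --describe --zookeeper zookeeper:2181 | partition_counts
```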

50 partitions is excessive. Some systems allocate 1 partition per 1k req/sec by default, which is a reasonable ratio. I doubt anyone receives 50k error requests every second and runs Sentry with docker-compose.

Since with docker-compose all partitions run on the same host, running so many partitions on a single host is just a waste of system resources.

It would be nice to scale the default number of partitions down to something smaller. Or is there an exposed setting to start with fewer of them? (I didn't find one anywhere in the docs.)

More reading: https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster/

_Running it on an EC2 instance with double the minimum requirements, receiving < 7k errors per day (evenly spread), which works out to about 0.08 req/sec, yet 100% of RAM and CPU is constantly in use, which makes the server unresponsive and causes constant crashes._
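For anyone who wants to redo the arithmetic with their own numbers, here is a rough sketch. `partitions_needed` is a made-up helper name, and the 1 partition per 1k req/sec ratio is only the rule of thumb quoted above, not something Kafka guarantees.

```shell
#!/bin/bash
# Hypothetical helper: converts errors/day into req/sec and applies the
# "1 partition per 1k req/sec" rule of thumb (second argument overrides the ratio).
partitions_needed() {
  awk -v per_day="$1" -v per_partition="${2:-1000}" 'BEGIN {
    rate = per_day / 86400                  # seconds in a day
    n = int(rate / per_partition)
    if (n * per_partition < rate) n++       # round up
    if (n < 1) n = 1                        # a topic needs at least one partition
    printf "%.3f req/sec -> %d partition(s)\n", rate, n
  }'
}

partitions_needed 7000   # prints: 0.081 req/sec -> 1 partition(s)
```

By the same rule of thumb, 50 partitions would only be justified at around 50k req/sec.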

Enhancement


All 14 comments

Oh wow, this would probably explain the huge disk usage (except for the 7-day retention that people reported). Thanks for bringing this up.

Do you have a number in mind for an average user, and would you be willing to submit a PR since this was your idea anyway (so I don't take credit for it in commits 🙂)?

/cc @fpacifici @mattrobenolt

I'm not familiar enough with Sentry to pick it up. I'd be more than happy if someone other than me could contribute this :slightly_smiling_face:

My wild guess is that 1-6 partitions should be more than enough for most use cases. But I'm not too familiar with how events are produced and consumed in Sentry.

Since Kafka only supports increasing the partition count (decreasing it requires dirty hacks), I'd suggest starting with a low number, like 1 or 2.
I hope most applications are stable enough so that they don't generate hundreds or thousands of errors per sec :wink:

Yeah, just 1 is going to be fine for most on-premise installs. If not all.

I hope most applications are stable enough so that they don't generate hundreds or thousands of errors per sec 😉

You’d uhh, be surprised. But this rate would be difficult to maintain with onpremise without a lot of work. Kafka needing more partitions won’t be the first issue.

Any tips on how users can experiment with tuning this? I also didn't see anything obvious in the config. Is it hard-coded somewhere?

is there anything we can do to fix it?

I've created this script (which would run every night) as a workaround for this issue:

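#!/bin/bash
# Workaround: wipe Kafka and Zookeeper state, then reinstall.
# WARNING: this deletes all data stored in the sentry-kafka and sentry-zookeeper volumes.
# Run from the repository root (the directory containing install.sh).

docker-compose down --volumes
docker volume rm sentry-kafka sentry-zookeeper
./install.sh   # recreates the external volumes
docker-compose up -d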

You can easily configure the default number of partitions for the internal topic __consumer_offsets by setting an environment variable, for example KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: '2'.

You can change the log retention the same way: KAFKA_LOG_RETENTION_HOURS: '24'.
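One way to try these two variables without editing docker-compose.yml itself is a Compose override file, which docker-compose picks up automatically when it sits next to docker-compose.yml. This is only a sketch: the service name `kafka` and the file format version `3.4` are assumptions and need to match your own docker-compose.yml.

```shell
#!/bin/bash
# Sketch: write a Compose override in the repository root.
# Assumptions: the Kafka service is called "kafka" and the main file uses version '3.4'.
cat > docker-compose.override.yml <<'EOF'
version: '3.4'
services:
  kafka:
    environment:
      KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: '2'
      KAFKA_LOG_RETENTION_HOURS: '24'
EOF

# Afterwards, recreate the container so the new environment is applied:
#   docker-compose up -d kafka
```

As far as I know, the partition setting is only used when __consumer_offsets is first created, so an installation that already has the topic with 50 partitions will most likely keep them until the Kafka data is reset.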

@BYK any chance you can give us some tips on how to correctly optimise the Kafka configuration with Sentry usage in mind?

I've created this script (which would run every night) as a workaround for this issue:

#!/bin/bash

docker-compose down --volumes
docker volume rm sentry-kafka sentry-zookeeper
./install.sh
docker-compose up -d

i'm getting

ERROR: Volume sentry-kafka declared as external, but could not be found. Please create the volume manually using docker volume create --name=sentry-kafka and try again.

after i run your script

@sckoh as @NullIsNot0 mentioned, the install.sh script creates the volume. I forgot to mention that the script should be run from this repo's root directory (the one that contains the install.sh script).

Based on @s7anley's comments, would it make sense to add an env var (KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: '2') to the kafka container in docker-compose.yml?
