For some reason Kafka starts with 50 partitions, which consumes system resources unnecessarily.
kafka-topics --describe events --zookeeper zookeeper:2181
Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
Topic: __consumer_offsets Partition: 0 Leader: 1001 Replicas: 1001 Isr: 1001
Topic: __consumer_offsets Partition: 1 Leader: 1001 Replicas: 1001 Isr: 1001
...
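To check the count without reading the whole listing, the PartitionCount field can be pulled out of the summary line. A small sketch that parses the sample line above (Configs shortened for the example); in practice the text would come from the kafka-topics command itself:

```shell
#!/bin/bash
# Summary line copied from the output above (Configs shortened for the example).
summary='Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:1 Configs:cleanup.policy=compact'
# Extract the digits that follow "PartitionCount:".
partition_count=$(printf '%s\n' "$summary" | sed -n 's/.*PartitionCount:\([0-9][0-9]*\).*/\1/p')
echo "partitions: ${partition_count}"
```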
50 partitions is far too many. Some systems allocate 1 partition per 1k req/sec by default, which is a reasonable ratio. I doubt anyone receives 50k error requests every second while running Sentry with docker-compose.
Since with docker-compose all partitions run on the same host, it's just a waste of system resources to run so many partitions on a single host.
It would be nice to scale the default number of partitions down to something smaller. Or is there an exposed setting to start with fewer of them? (I didn't find one anywhere in the docs.)
More reading: https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster/
_Running it on an EC2 instance with double the minimum requirements, receiving < 7k errors per day (evenly spread), which is about 0.08 req/sec, yet 100% of RAM and CPU is constantly in use, which makes the server unresponsive and causes constant crashes._
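For reference, the per-second figure follows directly from the daily count. A quick sketch of the arithmetic, using 7000 as the upper bound quoted above and assuming the errors are spread evenly over the day:

```shell
#!/bin/bash
# Convert a daily error count into an average per-second rate.
errors_per_day=7000                    # upper bound quoted above
seconds_per_day=$((24 * 60 * 60))      # 86400
rate=$(awk -v n="$errors_per_day" -v s="$seconds_per_day" 'BEGIN { printf "%.3f", n / s }')
echo "average rate: ${rate} req/sec"
```

That is several orders of magnitude below the 1k req/sec per partition rule of thumb mentioned earlier.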
Oh wow, this would probably explain the huge disk usage (except for the 7-day retention that people reported). Thanks for bringing this up.
Do you have a number in mind for an average user, and would you be willing to submit a PR since this was your idea anyway (so I don't take credit for it in commits 🙂)?
/cc @fpacifici @mattrobenolt
I'm not familiar enough with Sentry to pick it up. I'd be more than happy if someone other than me could contribute this :slightly_smiling_face:
My wild guess is that 1-6 partitions should be more than enough for most use cases. But I'm not too familiar with how events are produced and consumed in Sentry.
Since Kafka only supports increasing the partition count (downscaling requires dirty hacks), I'd suggest starting with a low number, like 1 or 2.
I hope most applications are stable enough so that they don't generate hundreds or thousands of errors per sec :wink:
Yeah, just 1 is going to be fine for most on-premise installs. If not all.
I hope most applications are stable enough so that they don't generate hundreds or thousands of errors per sec 😉
You'd uhh, be surprised. But this rate would be difficult to maintain with onpremise without a lot of work. Kafka needing more partitions won't be the first issue.
Any tips on how users can experiment with tuning this? I also didn't see anything obvious in the config. Is it hard-coded somewhere?
is there anything we can do to fix it?
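I've created this script (which would run every night) as a workaround for this issue:
#!/bin/bash
# Stop the stack and remove its non-external volumes.
docker-compose down --volumes
# Delete the external Kafka and ZooKeeper volumes (this wipes their data).
docker volume rm sentry-kafka sentry-zookeeper
# Recreate the volumes and bring everything back up.
./install.sh
docker-compose up -d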
You can easily configure the default number of partitions used for the internal topic __consumer_offsets. Set an environment variable, for example KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: '2'.
You can change the log retention the same way, for example KAFKA_LOG_RETENTION_HOURS: '24'.
@BYK any chance you can give us some tips on how to correctly optimise the Kafka configuration with Sentry usage in mind?
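One way to try these settings without editing docker-compose.yml itself is a docker-compose.override.yml, which docker-compose merges automatically when it sits next to the main file. A minimal sketch, assuming the service is named kafka and that the image maps KAFKA_* variables to broker settings (as the Confluent images do); the version value is an assumption and has to match your docker-compose.yml:

```shell
#!/bin/bash
# Sketch: write an override file that passes the two settings to the kafka service.
# Assumptions: service name "kafka", compose file version '3.4'.
cat > docker-compose.override.yml <<'EOF'
version: '3.4'
services:
  kafka:
    environment:
      KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: '2'
      KAFKA_LOG_RETENTION_HOURS: '24'
EOF
# Show what was written; apply it afterwards with: docker-compose up -d
cat docker-compose.override.yml
```

Note that the broker only reads offsets.topic.num.partitions when __consumer_offsets is first created, so an existing install keeps its 50 partitions until the Kafka data is recreated.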
i'm getting
ERROR: Volume sentry-kafka declared as external, but could not be found. Please create the volume manually using docker volume create --name=sentry-kafka and try again.
after i run your script
@sckoh, install script should create it again.
This line: https://github.com/getsentry/onpremise/blob/5d00d613fab7cf1a4ab7903b466e871c5a9a9015/install.sh#L101
@sckoh as @NullIsNot0 mentioned, the install.sh script creates the volume. I forgot to mention that the script should be run from this repo's root directory (the one that contains install.sh).
Based on @s7anley's comments, would it make sense to add an env var (KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS: '2') to the kafka container in docker-compose.yml?