Thanos, Prometheus and Golang version used
Prometheus version: 2.11.1
thanos, version 0.6.0 (branch: master, revision: 579671460611d561fcf991182b3f60e91ea04843)
build user: root@packages-build-1127f72e-908b-40dc-b9e1-f95dcacaaa62-sck6t
build date: 20190806-16:13:48
go version: go1.12.7
What happened
When running the compactor we get the following output (sanitized to remove internal hostnames):
2019-08-13T15:13:54-07:00 observability-compactor-123456 user:notice thanos-compact-observability-036[20057]: level=error ts=2019-08-13T22:13:54.054751118Z caller=main.go:199 msg="running command failed" err="compaction failed: compaction failed for group 0@{host=\"observability-prom-A98765\"}: compact blocks [thanos-compact-tDlEs/compact/0@{host=\"observability-prom-A98765\"}/01DHRV2ADD0DGW2V0WB84D472P thanos-compact-tDlEs/compact/0@{host=\"observability-prom-B98765\"}/01DHTN0B5GK4M9K1YF344C599J thanos-compact-tDlEs/compact/0@{host=\"observability-prom-B98765\"}/01DHTYSE4G3CPN6PT73JD4NFPC thanos-compact-tDlEs/compact/0@{host=\"observability-prom-B98765\"}/01DHV863B7P32AR6CM46ZJCJDR thanos-compact-tDlEs/compact/0@{host=\"observability-prom-B98765\"}/01DHY92FF48P4CPQMGPVMEVJE8 thanos-compact-tDlEs/compact/0@{host=\"observability-prom-B98765\"}/01DHYJM1E1F58STJERMF6PWY0M]: 2 errors: write compaction: write postings: write postings: exceeding max size of 64GiB; exceeding max size of 64GiB"
What you expected to happen
Compaction would succeed, or these blocks would be skipped without error, since they cannot be compacted.
How to reproduce it (as minimally and precisely as possible):
Try to compact blocks that cause the postings section of the resulting index to exceed 64GiB.
Full logs to relevant components
See log lines above. Let me know if anything else would be useful, but I think that's all we have that is relevant.
Anything else we need to know
These blocks are from a pair of prometheus hosts we have scraping all targets in one of our data centers. Here is more information on the blocks themselves:
Each of these blocks represents 8 hours of data.
This appears to be related to https://github.com/prometheus/prometheus/issues/5868. I'm not sure if the limit is going to change in the TSDB library, but in the meantime it would be nice if this were treated as a warning, or if we could specify a threshold to prevent the compactor from trying to compact these blocks, since they're already fairly large.
@jojohappy I'm not sure question is the right tag for this. I believe this is a bug.
@claytono Sorry for the delay. You are right here. I wonder how we can check for that error? It may be hard to work around. I'm sorry, I'm not very familiar with tsdb.
@jojohappy I think this is a tricky one unless the issue is fixed in the TSDB code. I suspect this doesn't crop up with Prometheus itself since it's less aggressive about compaction. Right now my best guess at how Thanos might handle this would be to have a maximum block size: if a block is already that big, it won't be compacted any further. This might also help with some scenarios using time-based partitioning.
It looks like this is being addressed upstream: https://github.com/prometheus/prometheus/pull/5884
Sorry for the delay! @claytono
IMO, I agree with you, but also have some questions:
how thanos might handle this would be to have a maximum block size
How can we set the maximum block size? In your case, the size of each index file is less than 30GB, but together they exceeded the max size of 64GB during compaction. Before compaction finishes, we don't know whether the new index will exceed 64GB, right? I think it is difficult to find the right size to check against. Maybe I'm wrong.
This might also help with some scenarios using time based partitioning
Sorry, I can't catch your point. Could you explain it?
It looks like this is being addressed upstream: prometheus/prometheus#5884
Yes, if it is fixed, that will be great.
How can we set the maximum block size? In your case, the size of each index file is less than 30GB, but together they exceeded the max size of 64GB during compaction. Before compaction finishes, we don't know whether the new index will exceed 64GB, right? I think it is difficult to find the right size to check against. Maybe I'm wrong.
I don't think we really can. I think the best we could do would be to have a flag that lets us specify a maximum size an index is allowed to be; past that, we don't try to compact it any more.
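As an illustration of that idea only, a planner filter could drop blocks whose index is already over a configured cap before grouping them for compaction. This is a sketch with made-up types (`block`, `planCompaction`), not Thanos' actual planner or flag:

```go
package main

import "fmt"

// block is a minimal stand-in for block metadata; the field names are
// hypothetical, not Thanos' real types.
type block struct {
	ULID      string
	IndexSize int64 // bytes
}

// planCompaction keeps blocks below the cap for further compaction and
// skips blocks whose index already meets or exceeds it.
func planCompaction(blocks []block, maxIndexSize int64) (toCompact, skipped []block) {
	for _, b := range blocks {
		if b.IndexSize >= maxIndexSize {
			skipped = append(skipped, b)
			continue
		}
		toCompact = append(toCompact, b)
	}
	return toCompact, skipped
}

func main() {
	blocks := []block{
		{ULID: "block-a", IndexSize: 9 << 30},  // 9 GiB index
		{ULID: "block-b", IndexSize: 33 << 30}, // 33 GiB index
	}
	compact, skipped := planCompaction(blocks, 30<<30) // hypothetical 30 GiB cap
	fmt.Printf("compact=%d skipped=%d\n", len(compact), len(skipped))
}
```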
This might also help with some scenarios using time based partitioning
Sorry, I can't catch your point. Could you explain it?
We're planning to use the new time-based partitioning. As part of this we expect we'll want to rebalance the partitions occasionally as compaction and downsampling occur, to keep memory usage roughly equivalent between thanos-store instances. The limiting factor for thanos-store capacity for us is memory usage. That's roughly proportional to index size, so we'd like to ensure we don't end up with thanos stores serving blocks so big that each store holds only a few of them. If a thanos-store process is serving 2 blocks and that consumes all the memory on the host, then our options for rebalancing aren't as good as if we limit index size and the thanos-store process is serving a dozen blocks. Right now I think the best option we have is to limit max compaction, but that's a fairly coarse tool. Ideally we'd like to just tell the compactor something like "Don't compact index files that are already x GB, they're big enough".
That said, this isn't really related to the issue I opened, it would just be a nice side effect if this needed to be fixed or worked around in Thanos.
Awesome! Thank you for your detailed case.
we don't try to compact it any more.
What about downsampling? If we skip compaction, we may not be able to downsample, because the blocks lack enough series. Further information is here.
Don't compact index files that are already x GB
What's your idea for the x? I think it is difficult to find the right x.
If a thanos-store process is serving 2 blocks and that consumes all the memory on the host, then our options for rebalancing aren't as good as if we limit index size and the thanos-store process is serving a dozen blocks
I don't think that is the goal of compaction. Did you compare the query time cost for long time ranges between those two approaches?
@bwplotka is now working on issue #1471 about reducing store memory usage; you can follow that as well if you would like.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm currently running into this and am curious if there is something I can do to help mitigate the issue. Currently compact errors out with this issue and restarts every few hours.
This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.
We just hit this issue as well. This limit is due to the use of 32-bit integers within Prometheus' TSDB code. See here for a good explanation. There is an open bug report as well as a PR that will hopefully resolve the issue.
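For reference, my understanding of where the 64GiB figure comes from (based on the linked explanation; treat the details as an assumption): the TSDB index stores series references as 32-bit values, and series entries are 16-byte aligned, so the largest addressable offset is 2^32 references times 16 bytes:

```go
package main

import "fmt"

func main() {
	const refBits = 32   // series references in the index are 32-bit
	const alignment = 16 // series entries are aligned to 16 bytes

	// Largest addressable offset: 2^32 references * 16 bytes each = 64 GiB.
	maxBytes := (uint64(1) << refBits) * alignment
	fmt.Printf("max addressable index size: %d GiB\n", maxBytes>>30)
}
```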
Could this be re-opened until this is resolved?
Yea, let's solve this.
I have another suggestion: rather than waiting for a fix here, we could set an upper limit on block size and start splitting blocks based on it. This is tracked by https://github.com/thanos-io/thanos/issues/2340
Especially on large setups, and with some deduplication (vertical or offline) enabled and planned, we will have a problem with huge TB-size blocks pretty soon. At some point indexing does not scale well with the number of series, so we might want to split it...
cc @brancz @pracucci @squat
That sounds good to me and was another fear of mine as well. Sure, we could increase the index limit, but we would reach it again some day.
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.
Closing for now as promised, let us know if you need this to be reopened! 🤗
This is still an issue.
Yes, not done. How have you worked around this issue so far?
We have just stopped compacting :(
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.
Still an issue. If anyone knows of a workaround I would be appreciative. Right now we just have the impacted instance not compacting. This will start hurting performance soon.
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.
_Still an issue_
In the meantime, while waiting for https://github.com/prometheus/prometheus/pull/5884:
Option 2 is nicer but requires much more thinking. Option 1 is not the easiest either, as we need different block planning, but it can be done without upstream Prometheus changes.
Help wanted! We need to limit the size of blocks anyway at some point so starting this: https://github.com/thanos-io/thanos/issues/3068
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.
This is still a big issue for us. We now have several compactor instances turned off due to this limit.
We're also now running into this (https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11757)
On it this sprint.
Kind Regards,
Bartek Płotka (@bwplotka)
As a workaround, would it be possible to add a json tag to the meta.json so that the compactor can be told to skip it?
Something like this:
{
  "thanos": {
    "no_compaction": true
  }
}
The main problem with this approach is that this block might be one portion of a bigger block that has to be created in order to reach the 2w compaction level. I need to think more about how to make this work, but something like this could be introduced ASAP, yes.
I am working on some block size capped splitting, so compaction will result in two blocks, not one etc. Having blocks capped at XXX GB is not only nice for this issue but also caps how much you need to download for manual interactions, potential deletions, downsampling, and other operations.
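A rough sketch of what size-capped splitting could look like (the `series` type and per-series byte estimates are hypothetical; a real implementation would have to respect the index format and block boundaries):

```go
package main

import "fmt"

// series is a minimal stand-in; IndexBytes is an estimate of the index
// space a series contributes. Both names are made up for this sketch.
type series struct {
	Name       string
	IndexBytes int64
}

// splitBySize illustrates size-capped splitting: instead of producing one
// output block, compaction starts a new block whenever the accumulated
// index estimate would exceed the cap.
func splitBySize(all []series, capBytes int64) [][]series {
	var out [][]series
	var cur []series
	var curSize int64
	for _, s := range all {
		if curSize+s.IndexBytes > capBytes && len(cur) > 0 {
			out = append(out, cur)
			cur, curSize = nil, 0
		}
		cur = append(cur, s)
		curSize += s.IndexBytes
	}
	if len(cur) > 0 {
		out = append(out, cur)
	}
	return out
}

func main() {
	ss := []series{{"a", 20}, {"b", 30}, {"c", 25}, {"d", 10}}
	groups := splitBySize(ss, 64) // toy 64-byte cap for illustration
	fmt.Printf("output blocks: %d\n", len(groups))
}
```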
Just curious: @SuperQ, can you tell us what size the blocks are right now? (Mostly the index sizes are what I am interested in.) Probably more than one is needed in order to get 64GB+ postings.
I should have some workaround today.
Here's some metadata on what's failing:
8.96 GiB 2020-09-20T17:41:23Z gs://gitlab-gprd-prometheus/01EJP50AZ3P1EN2NWMFPMC5BA7/index
3.78 GiB 2020-09-21T05:56:53Z gs://gitlab-gprd-prometheus/01EJQHMW5K9VP7FWW7VD3M9FQB/index
8.19 GiB 2020-09-23T06:57:05Z gs://gitlab-gprd-prometheus/01EJWQJQ0M2F33JV47CEFKFK5N/index
14.08 GiB 2020-09-25T08:00:39Z gs://gitlab-gprd-prometheus/01EK1XNFBHZ0MB86HBEZ9NFSVV/index
12.72 GiB 2020-09-27T07:21:45Z gs://gitlab-gprd-prometheus/01EK711SSHZHPDH5V274M8ESAY/index
11.35 GiB 2020-09-29T07:32:43Z gs://gitlab-gprd-prometheus/01EKC6VXQ4A3ZANP0E30KY5AWM/index
13.31 GiB 2020-10-23T10:03:34Z gs://gitlab-gprd-prometheus/01ENA7V7YW31BVM1QFRJ57TDVX/index
And an example meta.json
{
  "ulid": "01EKC6VXQ4A3ZANP0E30KY5AWM",
  "minTime": 1601164800000,
  "maxTime": 1601337600000,
  "stats": {
    "numSamples": 24454977719,
    "numSeries": 49668912,
    "numChunks": 229130151
  },
  "compaction": {
    "level": 3,
    "sources": [
      "01EK6R1EW9Y76QPSHE2S2GZ1SC",
      "01EK6YX646B7RPN4FFFCDR62KP",
      "01EK75RXC6HZBKGRB4N8APP638",
      "01EK7CMMM611CJ4TJD04BBG6MM",
      "01EK7KGBWNC515RFKBDMX4HJSP",
      "01EK7TC3463VCS3Z4CXZMEET0X",
      "01EK817TC6KVQ59R73D48BRAC9",
      "01EK883HM7X7V1SFCH8NZY6FW4",
      "01EK8EZ8W5N2JK5ZZY2V4058YG",
      "01EK8NV048MXK605DPJ710DMKS",
      "01EK8WPQC7ZY51T32WHFMBNPKS",
      "01EK93JEN36YBXK58KMTWMPJCK",
      "01EK9AE5X3WR35X0FGP6Y1A0PR",
      "01EK9H9X586N62EGE972PWPT95",
      "01EK9R5MD2B11YRBPE62ETQJX8",
      "01EK9Z1BN8DFE5H765E8JTDEJC",
      "01EKA5X2XE7959NND514559190",
      "01EKACRT59F6KDA9WJ9GP3NZSW",
      "01EKAKMHD7VA5WXK10S8EH4274",
      "01EKATG8N9Q6EYRR3MYHSE3H10",
      "01EKB1BZX6SZRVKMMG83R1SZ2J",
      "01EKB87Q5DHDGVK0MGS5BE8PFH",
      "01EKBF3ED853GQ0TBWT5YTCTGS",
      "01EKBNZ5NNE8GACZDW32RTBWR0"
    ],
    "parents": [
      {
        "ulid": "01EK89TVSX9ZZAET1FYXC1WHH6",
        "minTime": 1601164800000,
        "maxTime": 1601193600000
      },
      {
        "ulid": "01EK8HESS4GD6YS55ASCCXGA2N",
        "minTime": 1601193600000,
        "maxTime": 1601222400000
      },
      {
        "ulid": "01EK9CXXF25T12TBV0XRGRXD63",
        "minTime": 1601222400000,
        "maxTime": 1601251200000
      },
      {
        "ulid": "01EKA8B4MF78WYY3FM115Q4C6T",
        "minTime": 1601251200000,
        "maxTime": 1601280000000
      },
      {
        "ulid": "01EKB3XJWV3FMJXSMQ493JV6WN",
        "minTime": 1601280000000,
        "maxTime": 1601308800000
      },
      {
        "ulid": "01EKBZBX8MCBM2T05Y739KDRGV",
        "minTime": 1601308800000,
        "maxTime": 1601337600000
      }
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster": "gprd-gitlab-gke",
      "env": "gprd",
      "environment": "gprd",
      "monitor": "default",
      "prometheus": "monitoring/gitlab-monitoring-promethe-prometheus",
      "prometheus_replica": "prometheus-gitlab-monitoring-promethe-prometheus-0",
      "provider": "gcp",
      "region": "us-east1"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "compactor"
  }
}
Due to the interaction of the operator and Helm, we have an excess of overly long labels.
Update. The code is done and in review to fix this issue.
You can manually exclude blocks from compaction, but the PR for the automatic flow is still in review: https://github.com/thanos-io/thanos/pull/3410 will close this issue once merged.