Thanos, Prometheus and Golang version used
Prometheus version: 2.11.1
thanos, version 0.6.0 (branch: master, revision: 579671460611d561fcf991182b3f60e91ea04843)
build user: root@packages-build-1127f72e-908b-40dc-b9e1-f95dcacaaa62-sck6t
build date: 20190806-16:13:48
go version: go1.12.7
What happened
When running the compactor we get the following output (sanitized to remove internal hostnames):
2019-08-13T15:13:54-07:00 observability-compactor-123456 user:notice thanos-compact-observability-036[20057]: level=error ts=2019-08-13T22:13:54.054751118Z caller=main.go:199 msg="running command failed" err="compaction failed: compaction failed for group 0@{host=\"observability-prom-A98765\"}: compact blocks [thanos-compact-tDlEs/compact/0@{host=\"observability-prom-A98765\"}/01DHRV2ADD0DGW2V0WB84D472P thanos-compact-tDlEs/compact/0@{host=\"observability-prom-B98765\"}/01DHTN0B5GK4M9K1YF344C599J thanos-compact-tDlEs/compact/0@{host=\"observability-prom-B98765\"}/01DHTYSE4G3CPN6PT73JD4NFPC thanos-compact-tDlEs/compact/0@{host=\"observability-prom-B98765\"}/01DHV863B7P32AR6CM46ZJCJDR thanos-compact-tDlEs/compact/0@{host=\"observability-prom-B98765\"}/01DHY92FF48P4CPQMGPVMEVJE8 thanos-compact-tDlEs/compact/0@{host=\"observability-prom-B98765\"}/01DHYJM1E1F58STJERMF6PWY0M]: 2 errors: write compaction: write postings: write postings: exceeding max size of 64GiB; exceeding max size of 64GiB"
What you expected to happen
Compaction would succeed, or these blocks would be skipped without error, since they cannot be compacted.
How to reproduce it (as minimally and precisely as possible):
Try to compact blocks that cause the postings section of the resulting index to exceed 64GiB.
Full logs to relevant components
See log lines above. Let me know if anything else would be useful, but I think that's all we have that is relevant.
Anything else we need to know
These blocks are from a pair of prometheus hosts we have scraping all targets in one of our data centers. Here is more information on the blocks themselves:
Each of these blocks represents 8 hours of data.
This appears to be related to https://github.com/prometheus/prometheus/issues/5868. I'm not sure if the limit is going to change in the TSDB library, but in the meantime it would be nice if this were treated as a warning, or if we could specify a threshold to prevent the compactor from trying to compact these blocks, since they're already fairly large.
@jojohappy I'm not sure question is the right tag for this. I believe this is a bug.
@claytono Sorry for the delay. You are right here. I wonder how we can check for that error? It may be hard to work around. I'm sorry, I'm not very familiar with tsdb.
@jojohappy I think this is a tricky one unless the issue is fixed in the TSDB code. I suspect this doesn't crop up with Prometheus itself since it's less aggressive about compaction. Right now my best guess at how Thanos might handle this would be to have a maximum block size: if a block is already that big, it won't be compacted any further. This might also help with some scenarios using time-based partitioning.
It looks like this is being addressed upstream: https://github.com/prometheus/prometheus/pull/5884
Sorry for the delay! @claytono
IMO, I agree with you, but also have some questions:
how thanos might handle this would be to have a maximum block size
How can we set the maximum block size? In your case, the size of each index file is less than 30GB, but together they exceeded the max size of 64GB during compaction. Before compaction finishes, we don't know whether the new index will exceed 64GB, right? I think it is difficult to find the right size to check against. Maybe I'm wrong.
This might also help with some scenarios using time based partitioning
Sorry, I can't catch your point. Could you explain it?
It looks like this is being addressed upstream: prometheus/prometheus#5884
Yes, if it is fixed, that will be great.
How can we set the maximum block size? In your case, the size of each index file is less than 30GB, but together they exceeded the max size of 64GB during compaction. Before compaction finishes, we don't know whether the new index will exceed 64GB, right? I think it is difficult to find the right size to check against. Maybe I'm wrong.
I don't think we really can. I think the best we could do would be to have a flag that lets us specify a maximum size an index is allowed to be; past that, we don't try to compact it any more.
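As an illustration of that idea only, a planner filter could drop blocks whose index is already over a configured cap before grouping them for compaction. This is a sketch with made-up types (`block`, `planCompaction`), not Thanos' actual planner or flag:

```go
package main

import "fmt"

// block is a minimal stand-in for block metadata; the field names are
// hypothetical, not Thanos' real types.
type block struct {
	ULID      string
	IndexSize int64 // bytes
}

// planCompaction keeps blocks below the cap for further compaction and
// skips blocks whose index already meets or exceeds it.
func planCompaction(blocks []block, maxIndexSize int64) (toCompact, skipped []block) {
	for _, b := range blocks {
		if b.IndexSize >= maxIndexSize {
			skipped = append(skipped, b)
			continue
		}
		toCompact = append(toCompact, b)
	}
	return toCompact, skipped
}

func main() {
	blocks := []block{
		{ULID: "block-a", IndexSize: 9 << 30},  // 9 GiB index
		{ULID: "block-b", IndexSize: 33 << 30}, // 33 GiB index
	}
	compact, skipped := planCompaction(blocks, 30<<30) // hypothetical 30 GiB cap
	fmt.Printf("compact=%d skipped=%d\n", len(compact), len(skipped))
}
```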
This might also help with some scenarios using time based partitioning
Sorry, I can't catch your point. Could you explain it?
We're planning to use the new time-based partitioning. As part of this we expect we'll want to rebalance the partitions occasionally as compaction and downsampling occur, to keep memory usage roughly equivalent between thanos-store instances. The limiting factor for thanos-store capacity for us is memory usage. That's roughly proportional to index size, so we'd like to ensure we don't end up with thanos stores serving blocks so big that each store holds only a few of them. If a thanos-store process is serving 2 blocks and that consumes all the memory on the host, then our options for rebalancing aren't as good as if we limit index size and the thanos-store process is serving a dozen blocks. Right now I think the best option we have is to limit max compaction, but that's a fairly coarse tool. Ideally we'd like to just tell the compactor something like "Don't compact index files that are already x GB, they're big enough".
That said, this isn't really related to the issue I opened, it would just be a nice side effect if this needed to be fixed or worked around in Thanos.
Awesome! Thank you for your detailed case.
we don't try to compact it any more.
What about downsampling? If we skip compaction, we may not be able to downsample, because the blocks lack enough series. Further information is here.
Don't compact index files that are already x GB
What's your idea for the x? I think it is difficult to find the right x.
If a thanos-store process is serving 2 blocks and that consumes all the memory on the host, then our options for rebalancing aren't as good as if we limit index size and the thanos-store process is serving a dozen blocks
I don't think that is the goal of compaction. Did you compare the query time cost for long time ranges between those two approaches?
@bwplotka is now working on issue #1471 about reducing store memory usage; you can follow that as well if you would like.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm currently running into this and am curious if there is something I can do to help mitigate the issue. Currently compact errors out with this issue and restarts every few hours.
This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.
We just hit this issue as well. This limit is due to the use of 32-bit integers within Prometheus' TSDB code. See here for a good explanation. There is an open bug report as well as a PR that will hopefully resolve the issue.
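For reference, my understanding of where the 64GiB figure comes from (based on the linked explanation; treat the details as an assumption): the TSDB index stores series references as 32-bit values, and series entries are 16-byte aligned, so the largest addressable offset is 2^32 references times 16 bytes:

```go
package main

import "fmt"

func main() {
	const refBits = 32   // series references in the index are 32-bit
	const alignment = 16 // series entries are aligned to 16 bytes

	// Largest addressable offset: 2^32 references * 16 bytes each = 64 GiB.
	maxBytes := (uint64(1) << refBits) * alignment
	fmt.Printf("max addressable index size: %d GiB\n", maxBytes>>30)
}
```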
Could this be re-opened until this is resolved?
Yea, let's solve this.
I have another suggestion: rather than waiting for a fix here, we could set an upper limit on block size and start splitting blocks based on it. This is tracked by https://github.com/thanos-io/thanos/issues/2340
Especially on large setups, and with some deduplication (vertical or offline) enabled and planned, we will have a problem with huge TB-size blocks pretty soon. At some point indexing does not scale well with the number of series, so we might want to split it...
cc @brancz @pracucci @squat
That sounds good to me and was another fear of mine as well. Sure, we could increase the index limit, but we would reach it again some day.
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.
Closing for now as promised, let us know if you need this to be reopened! 🤗
This is still an issue.
Yes, not done. How have you worked around this issue so far?
We have just stopped compacting :(
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.
Still an issue. If anyone knows of a workaround I would be appreciative. Right now we just have the impacted instance not compacting. This will start hurting performance soon.
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.
_Still an issue_
In the meantime, while waiting for https://github.com/prometheus/prometheus/pull/5884:
Option 2 is nicer but requires much more thinking. Option 1 is not the easiest either, as we need different block planning, but it can be done without upstream Prometheus changes.
Help wanted! We need to limit the size of blocks anyway at some point so starting this: https://github.com/thanos-io/thanos/issues/3068
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.
This is still a big issue for us. We now have several compactor instances turned off due to this limit.
We're also now running into this (https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11757)
On it this sprint.
Kind Regards,
Bartek Płotka (@bwplotka)
As a workaround, would it be possible to add a json tag to the meta.json so that the compactor can be told to skip it?
Something like this:
{
  "thanos": {
    "no_compaction": true
  }
}
The main problem with this approach is that this block might be one portion of a bigger block that has to be created in order to reach the 2w compaction level. I need to think more about how to make this work, but something like this could be introduced ASAP, yes.
I am working on some block size capped splitting, so compaction will result in two blocks, not one etc. Having blocks capped at XXX GB is not only nice for this issue but also caps how much you need to download for manual interactions, potential deletions, downsampling, and other operations.
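A rough sketch of what size-capped splitting could look like (the `series` type and per-series byte estimates are hypothetical; a real implementation would have to respect the index format and block boundaries):

```go
package main

import "fmt"

// series is a minimal stand-in; IndexBytes is an estimate of the index
// space a series contributes. Both names are made up for this sketch.
type series struct {
	Name       string
	IndexBytes int64
}

// splitBySize illustrates size-capped splitting: instead of producing one
// output block, compaction starts a new block whenever the accumulated
// index estimate would exceed the cap.
func splitBySize(all []series, capBytes int64) [][]series {
	var out [][]series
	var cur []series
	var curSize int64
	for _, s := range all {
		if curSize+s.IndexBytes > capBytes && len(cur) > 0 {
			out = append(out, cur)
			cur, curSize = nil, 0
		}
		cur = append(cur, s)
		curSize += s.IndexBytes
	}
	if len(cur) > 0 {
		out = append(out, cur)
	}
	return out
}

func main() {
	ss := []series{{"a", 20}, {"b", 30}, {"c", 25}, {"d", 10}}
	groups := splitBySize(ss, 64) // toy 64-byte cap for illustration
	fmt.Printf("output blocks: %d\n", len(groups))
}
```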
Just curious: @SuperQ, can you tell us what size the blocks are right now? (Mostly the index sizes are what I am interested in.) Probably more than one is needed in order to get 64GB+ postings.
I should have some workaround today.
Here's some metadata on what's failing:
8.96 GiB 2020-09-20T17:41:23Z gs://gitlab-gprd-prometheus/01EJP50AZ3P1EN2NWMFPMC5BA7/index
3.78 GiB 2020-09-21T05:56:53Z gs://gitlab-gprd-prometheus/01EJQHMW5K9VP7FWW7VD3M9FQB/index
8.19 GiB 2020-09-23T06:57:05Z gs://gitlab-gprd-prometheus/01EJWQJQ0M2F33JV47CEFKFK5N/index
14.08 GiB 2020-09-25T08:00:39Z gs://gitlab-gprd-prometheus/01EK1XNFBHZ0MB86HBEZ9NFSVV/index
12.72 GiB 2020-09-27T07:21:45Z gs://gitlab-gprd-prometheus/01EK711SSHZHPDH5V274M8ESAY/index
11.35 GiB 2020-09-29T07:32:43Z gs://gitlab-gprd-prometheus/01EKC6VXQ4A3ZANP0E30KY5AWM/index
13.31 GiB 2020-10-23T10:03:34Z gs://gitlab-gprd-prometheus/01ENA7V7YW31BVM1QFRJ57TDVX/index
And an example meta.json
{
  "ulid": "01EKC6VXQ4A3ZANP0E30KY5AWM",
  "minTime": 1601164800000,
  "maxTime": 1601337600000,
  "stats": {
    "numSamples": 24454977719,
    "numSeries": 49668912,
    "numChunks": 229130151
  },
  "compaction": {
    "level": 3,
    "sources": [
      "01EK6R1EW9Y76QPSHE2S2GZ1SC",
      "01EK6YX646B7RPN4FFFCDR62KP",
      "01EK75RXC6HZBKGRB4N8APP638",
      "01EK7CMMM611CJ4TJD04BBG6MM",
      "01EK7KGBWNC515RFKBDMX4HJSP",
      "01EK7TC3463VCS3Z4CXZMEET0X",
      "01EK817TC6KVQ59R73D48BRAC9",
      "01EK883HM7X7V1SFCH8NZY6FW4",
      "01EK8EZ8W5N2JK5ZZY2V4058YG",
      "01EK8NV048MXK605DPJ710DMKS",
      "01EK8WPQC7ZY51T32WHFMBNPKS",
      "01EK93JEN36YBXK58KMTWMPJCK",
      "01EK9AE5X3WR35X0FGP6Y1A0PR",
      "01EK9H9X586N62EGE972PWPT95",
      "01EK9R5MD2B11YRBPE62ETQJX8",
      "01EK9Z1BN8DFE5H765E8JTDEJC",
      "01EKA5X2XE7959NND514559190",
      "01EKACRT59F6KDA9WJ9GP3NZSW",
      "01EKAKMHD7VA5WXK10S8EH4274",
      "01EKATG8N9Q6EYRR3MYHSE3H10",
      "01EKB1BZX6SZRVKMMG83R1SZ2J",
      "01EKB87Q5DHDGVK0MGS5BE8PFH",
      "01EKBF3ED853GQ0TBWT5YTCTGS",
      "01EKBNZ5NNE8GACZDW32RTBWR0"
    ],
    "parents": [
      {
        "ulid": "01EK89TVSX9ZZAET1FYXC1WHH6",
        "minTime": 1601164800000,
        "maxTime": 1601193600000
      },
      {
        "ulid": "01EK8HESS4GD6YS55ASCCXGA2N",
        "minTime": 1601193600000,
        "maxTime": 1601222400000
      },
      {
        "ulid": "01EK9CXXF25T12TBV0XRGRXD63",
        "minTime": 1601222400000,
        "maxTime": 1601251200000
      },
      {
        "ulid": "01EKA8B4MF78WYY3FM115Q4C6T",
        "minTime": 1601251200000,
        "maxTime": 1601280000000
      },
      {
        "ulid": "01EKB3XJWV3FMJXSMQ493JV6WN",
        "minTime": 1601280000000,
        "maxTime": 1601308800000
      },
      {
        "ulid": "01EKBZBX8MCBM2T05Y739KDRGV",
        "minTime": 1601308800000,
        "maxTime": 1601337600000
      }
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster": "gprd-gitlab-gke",
      "env": "gprd",
      "environment": "gprd",
      "monitor": "default",
      "prometheus": "monitoring/gitlab-monitoring-promethe-prometheus",
      "prometheus_replica": "prometheus-gitlab-monitoring-promethe-prometheus-0",
      "provider": "gcp",
      "region": "us-east1"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "compactor"
  }
}
Due to the interaction of the operator and Helm, we have an excess of overly long labels.
Update. The code is done and in review to fix this issue.
You can manually exclude blocks from compaction, but the PR for the automatic flow is still in review: https://github.com/thanos-io/thanos/pull/3410 will close this issue once merged.