I'm not sure if this is possible ATM, but it would be a nice thing to have: Thanos Compact could check whether there is enough space on disk to perform compaction before downloading everything (those blocks can be around ~100 GiB) and attempting to compact.
Investigate this and, if possible, implement.
Large blocks are a thing. This is from a Prometheus VM doing 2.65 million metrics in the head block and about 135k samples/s -- which is not completely scary. I'm looking for a way to forecast how much scratch space I need, or for compact to behave better when it knows that it cannot further compact a set of blocks due to scratch space limitations.
Are you thinking storage scheduling or why are you labeling this with difficulty: hard?
I think it's hard because it's hard to estimate the resulting size in theory. However, checking the sizes of the source blocks would already give you a fairly precise estimate of the overall space needed IMO, so I would say the size will most likely be:
Max: sum of all source block sizes × 2
Min: sum of all source block sizes + size of the biggest source block
We could easily check whether at least the min size fits, and otherwise halt/crash/skip/continue with another compaction/increment a metric. This simple check is difficulty: easy (:
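The min/max heuristic above is easy to sketch. A minimal example in Go (the function name and the hypothetical input sizes are my own, not Thanos code):

```go
package main

import "fmt"

// scratchSpaceBounds applies the heuristic from the discussion above:
//   min ≈ sum of all source block sizes + size of the biggest source block
//   max ≈ sum of all source block sizes × 2
// Sizes would come from the block metadata already known to the compactor.
func scratchSpaceBounds(sizes []int64) (min, max int64) {
	var total, biggest int64
	for _, s := range sizes {
		total += s
		if s > biggest {
			biggest = s
		}
	}
	return total + biggest, total * 2
}

func main() {
	// Hypothetical compaction group: 100 GiB, 40 GiB, and 10 GiB blocks.
	sizes := []int64{100 << 30, 40 << 30, 10 << 30}
	min, max := scratchSpaceBounds(sizes)
	fmt.Printf("need at least %d GiB, at most %d GiB of scratch space\n",
		min>>30, max>>30)
}
```

If the available scratch space is below `min`, the compactor could skip this group (and bump a metric) instead of downloading anything.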
Indeed, we should check whether the tsdb package provides functionality to calculate the exact size of the resulting blocks, since it depends on the data itself. If that is not possible, I agree: it should be completely fine to add those simple checks as a heuristic. Still better than downloading hundreds of gigabytes of data only to find out later that there will not be enough space to perform the needed actions 😄
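The other half of the check is measuring the free space on the scratch directory's filesystem. On Linux this can be done with statfs(2); a minimal sketch (the `freeBytes` helper is hypothetical, not Thanos code):

```go
package main

import (
	"fmt"
	"syscall"
)

// freeBytes returns the number of bytes available to unprivileged
// processes on the filesystem containing path (Linux, statfs(2)).
func freeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	// Bavail counts blocks available to non-root users; Bsize is the
	// filesystem block size.
	return st.Bavail * uint64(st.Bsize), nil
}

func main() {
	free, err := freeBytes(".")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d GiB free\n", free>>30)
}
```

Comparing this value against the min-size heuristic before downloading would avoid the wasted transfer.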
Exactly my thought, just wanted to make sure I’m not totally missing something.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I believe this is in progress, no? @GiedriusS
Yes :+1: waiting for a review on https://github.com/thanos-io/thanos/pull/1550 which should solve this issue.
This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.
Still in progress, just didn't have much time.
Still want to finish this soon.
Will probably get back to this in the next couple of weeks because I have more free time now.
Awesome!
Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.
I started something but haven't finished. I think this is still relevant.
Closing for now as promised, let us know if you need this to be reopened! 🤗