Describe the bug
The current documentation of the chunk format (pkg/chunkenc/README.md) does not appear to match what Loki actually writes, and I'm uncomfortable putting my data in a system where there's no independent way of getting it out again.
I would also like to understand the cardinality of the labels associated with chunks/files: for example, does each file only include messages relating to one particular label set?
Finally, I would like to know how the filename relates to its content. I can see it is base64-encoded and includes the tenant name plus some other hex values.
To Reproduce
Steps to reproduce the behavior:
Look at the files written to /tmp/loki/chunks/.
Expected behavior
Files should be readable according to a documented format.
What I actually see is shown in the terminal output below.
Oddly, pkg/chunkenc/interface.go defines EncGZIP, EncDumb and EncNone, but not Snappy, even though the file contains a Snappy stream identifier (sNaPpY).
Perhaps some additional header has been prepended to the chunk format.
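Reading the hexdump below byte by byte suggests that the file is an outer wrapper around the Loki chunk rather than a bare chunk: a 4-byte big-endian length (0xd8 = 216, which counts the 4 length bytes themselves), a Snappy framing-format stream (`ff 06 00 00 sNaPpY`) holding a single uncompressed frame (`01 c6 00 00`, then a 4-byte CRC) of JSON metadata, then another 4-byte length (0x137d = 4989, not counting itself) followed by the chunk data. 216 + 4 + 4989 = 5209 = 0x1459, the total file length. If that reading is right, Snappy is only used for the metadata, which would explain why it is not a chunk encoding. Below is a minimal Python sketch that splits a file along those lines. The layout is inferred from this one file, not from documentation, and the function names are hypothetical.

```python
import json
import struct


def read_snappy_frames(stream: bytes) -> bytes:
    """Minimal reader for the Snappy framing format.

    Handles only the stream identifier (0xff), uncompressed data (0x01)
    and skippable frames. Compressed frames (0x00) would need a real Snappy
    decoder, so they raise here. CRCs are NOT verified.
    """
    out = bytearray()
    pos = 0
    while pos < len(stream):
        ctype = stream[pos]
        length = int.from_bytes(stream[pos + 1 : pos + 4], "little")
        body = stream[pos + 4 : pos + 4 + length]
        if len(body) != length:
            raise ValueError("truncated snappy frame")
        if ctype == 0xFF:
            if body != b"sNaPpY":
                raise ValueError("bad snappy stream identifier")
        elif ctype == 0x01:
            out += body[4:]  # skip the 4-byte masked CRC-32C
        elif ctype == 0x00:
            raise NotImplementedError("compressed snappy frame")
        elif not 0x80 <= ctype <= 0xFE:  # 0x80-0xfe are skippable
            raise ValueError(f"unskippable frame type {ctype:#x}")
        pos += 4 + length
    return bytes(out)


def split_chunk_file(buf: bytes):
    """Split an on-disk chunk into (metadata dict, chunk data bytes).

    Assumed layout (inferred from the hexdump):
      uint32 BE  metadata length, including these 4 bytes
      ...        Snappy-framed JSON metadata
      uint32 BE  data length, excluding these 4 bytes
      ...        Loki chunk data
    """
    (meta_len,) = struct.unpack_from(">I", buf, 0)
    meta = json.loads(read_snappy_frames(buf[4:meta_len]))
    (data_len,) = struct.unpack_from(">I", buf, meta_len)
    data = buf[meta_len + 4 : meta_len + 4 + data_len]
    if len(data) != data_len:
        raise ValueError("truncated chunk data")
    return meta, data
```

Calling `split_chunk_file` on the bytes of the file below should return the JSON visible in the dump and 4989 bytes of data starting with `01 2e e5 6a`. The metadata carries exactly one `fingerprint` and one `metric` label set, which suggests that each file holds a single stream.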
Environment:
Screenshots, promtail config, or terminal output
root@loki:~# hexdump -C /tmp/loki/chunks/ZmFrZS83YTI1N2M5ZWI2YTYyMDkwOjE2OWYzMTZiODE2OjE2OWYzMTZiODYyOjdkNzMyNGQy
00000000 00 00 00 d8 ff 06 00 00 73 4e 61 50 70 59 01 c6 |........sNaPpY..|
00000010 00 00 56 ba 38 1a 7b 22 66 69 6e 67 65 72 70 72 |..V.8.{"fingerpr|
00000020 69 6e 74 22 3a 38 38 30 31 35 37 38 30 36 37 38 |int":88015780678|
00000030 37 36 35 32 30 30 38 30 2c 22 75 73 65 72 49 44 |76520080,"userID|
00000040 22 3a 22 66 61 6b 65 22 2c 22 66 72 6f 6d 22 3a |":"fake","from":|
00000050 31 35 35 34 35 36 31 35 34 36 2e 32 36 32 2c 22 |1554561546.262,"|
00000060 74 68 72 6f 75 67 68 22 3a 31 35 35 34 35 36 31 |through":1554561|
00000070 35 34 36 2e 33 33 38 2c 22 6d 65 74 72 69 63 22 |546.338,"metric"|
00000080 3a 7b 22 5f 5f 66 69 6c 65 6e 61 6d 65 5f 5f 22 |:{"__filename__"|
00000090 3a 22 2f 76 61 72 2f 6c 6f 67 2f 61 75 74 68 2e |:"/var/log/auth.|
000000a0 6c 6f 67 22 2c 22 6a 6f 62 22 3a 22 76 61 72 6c |log","job":"varl|
000000b0 6f 67 73 22 2c 22 5f 5f 6e 61 6d 65 5f 5f 22 3a |ogs","__name__":|
000000c0 22 6c 6f 67 73 22 7d 2c 22 65 6e 63 6f 64 69 6e |"logs"},"encodin|
000000d0 67 22 3a 31 32 38 7d 0a 00 00 13 7d 01 2e e5 6a |g":128}....}...j|
000000e0 01 1f 8b 08 00 00 00 00 00 00 ff ac 9d fb 53 55 |..............SU|
000000f0 57 96 c7 cb ff e4 fe d8 5d 53 63 9f 7d de e7 54 |W.......]Sc.}..T|
00000100 4d 75 3b 49 a6 26 3f 74 cf 8c 26 f6 d4 a4 52 dd |Mu;I.&?t..&...R.|
00000110 0a 97 34 1d 05 0b 4c a6 fd 4d f1 85 4a 1b 45 07 |..4...L..M..J.E.|
00000120 14 11 89 e1 25 41 10 04 84 0b 02 49 10 8d 2f 88 |....%A.....I../.|
00000130 12 21 06 15 f1 45 2b 2a 2a a8 f8 9c ba 5c a2 67 |.!...E+**....\.g|
...
00001440 2b ba c4 e6 ef ad df f4 92 2b 05 d0 26 fc 36 87 |+........+..&.6.|
00001450 dc 00 00 00 00 00 00 13 59 |........Y|
00001459
root@loki:~# echo "ZmFrZS83YTI1N2M5ZWI2YTYyMDkwOjE2OWYzMTZiODE2OjE2OWYzMTZiODYyOjdkNzMyNGQy" | base64 -d
fake/7a257c9eb6a62090:169f316b816:169f316b862:7d7324d2
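The decoded name appears to be `<tenant>/<fingerprint>:<from>:<through>:<checksum>` with every number in hex. 0x7a257c9eb6a62090 is 8801578067876520080, the `fingerprint` in the JSON metadata, and 0x169f316b816 and 0x169f316b862 are 1554561546262 and 1554561546338, i.e. the `from` and `through` values in milliseconds. The meaning of the last field is an assumption (presumably a checksum of the encoded chunk); it is not verified here. A minimal Python sketch of the decoding, with hypothetical names:

```python
import base64
from dataclasses import dataclass


@dataclass
class ChunkKey:
    tenant: str
    fingerprint: int  # series fingerprint, same value as in the JSON metadata
    from_ms: int      # Unix time in milliseconds
    through_ms: int   # Unix time in milliseconds
    checksum: int     # assumption: a checksum of the encoded chunk


def parse_chunk_filename(name: str) -> ChunkKey:
    """Decode a chunk filename, assuming the layout
    <tenant>/<fingerprint>:<from>:<through>:<checksum> with hex numbers."""
    # Restore any stripped '=' padding. urlsafe_b64decode also accepts the
    # standard '+' and '/' characters, so either base64 alphabet works.
    padded = name + "=" * (-len(name) % 4)
    key = base64.urlsafe_b64decode(padded).decode("utf-8")
    tenant, rest = key.split("/", 1)
    fingerprint, from_hex, through_hex, checksum = rest.split(":")
    return ChunkKey(
        tenant=tenant,
        fingerprint=int(fingerprint, 16),
        from_ms=int(from_hex, 16),
        through_ms=int(through_hex, 16),
        checksum=int(checksum, 16),
    )


key = parse_chunk_filename(
    "ZmFrZS83YTI1N2M5ZWI2YTYyMDkwOjE2OWYzMTZiODE2OjE2OWYzMTZiODYyOjdkNzMyNGQy"
)
print(key.tenant, key.fingerprint, key.from_ms, key.through_ms)
```

For the file above this gives tenant `fake`, fingerprint 8801578067876520080 and a time range of 1554561546262 to 1554561546338 ms, matching the metadata inside the file, so the filename alone seems to identify the stream and time range of a chunk.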
Why do you want to push to Loki? That is what Promtail does.
I didn't say I wanted to write Loki files. I want documentation on how to read them.
There are several reasons for this:
(1) If Loki blows up, I want to be able to extract the data manually if necessary, rather than being left with useless blobs. (The same applies to other tools I use, e.g. backup tools.)
(2) For data retention, I want to understand which files can safely be deleted.
(3) I want to understand Loki's data model, in particular how it handles high-cardinality series.
For example, if 1 million distinct label sets make Loki write 1 million separate chunk files, that has an impact on I/O and filesystem fragmentation, and it will affect how I decide to label my series.
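One rough way to measure this on an existing filesystem store is to group chunk files by stream using only their names. This is a minimal Python sketch that assumes the decoded filename has the form `<tenant>/<fingerprint>:<from>:<through>:<checksum>` (inferred from the decoded name in the terminal output above, not documented); `chunks_per_stream` is a hypothetical helper. It only describes what is on disk and says nothing about when Loki decides to cut a new chunk.

```python
import base64
from collections import Counter
from typing import Iterable


def chunks_per_stream(names: Iterable[str]) -> Counter:
    """Count chunk files per (tenant, fingerprint).

    Assumes each name is a base64-encoded key of the form
    <tenant>/<fingerprint>:<from>:<through>:<checksum>; names that do not
    decode that way are skipped.
    """
    counts: Counter = Counter()
    for name in names:
        try:
            padded = name + "=" * (-len(name) % 4)
            key = base64.urlsafe_b64decode(padded).decode("utf-8")
            tenant, rest = key.split("/", 1)
            fingerprint, _from, _through, _checksum = rest.split(":")
        except ValueError:  # bad base64, bad UTF-8 or unexpected shape
            continue
        counts[(tenant, fingerprint)] += 1
    return counts
```

For example, with `counts = chunks_per_stream(os.listdir("/tmp/loki/chunks"))`, `len(counts)` is the number of distinct streams and `sum(counts.values())` the number of chunk files, so the ratio shows how many files each label set produces.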
Oh, I get it. Sorry, but Loki does not delete data at the moment.
I think reading from the Grafana UI is the more common use case.
I still think that having valid documentation of the chunk format is a good thing.
Prometheus documents its on-disk format. It can change over time as the product develops, of course, but at least I don't have to reverse-engineer it from the code.
Sorry, but that commit doesn't close this issue.
I have read the pkg/chunkenc/README.md file, and I linked to it twice in my original post above.
The problem is that the files Loki actually writes don't appear to match the format described in README.md at all. I attempted to reverse-engineer the file shown in my original post, and it is very different.
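For what it's worth, the part that could correspond to the README seems to start only after the outer wrapper. At offset 0xdc the hexdump shows `01 2e e5 6a` (which looks like a magic number), a version byte `01`, and then a gzip stream (`1f 8b 08`), and the file ends with `00 00 00 00 00 00 13 59`, which would fit an 8-byte big-endian offset to a block-metadata section. The minimal Python sketch below checks those fields under exactly those assumptions; `inspect_loki_chunk` is a hypothetical name, and the encoding of the entries inside a block is not decoded or verified here.

```python
import struct
import zlib

MAGIC = 0x012EE56A  # assumed magic: the 4 bytes seen at the start of the chunk data


def inspect_loki_chunk(data: bytes) -> dict:
    """Inspect the chunk data that follows the outer length-prefixed wrapper.

    Assumptions (inferred from the hexdump, not from documentation):
      * bytes 0-3: magic number 0x012EE56A, big-endian
      * byte 4: format version
      * the first block starts at byte 5 and is a gzip stream
      * the last 8 bytes: big-endian offset of the block-metadata section
    """
    (magic,) = struct.unpack_from(">I", data, 0)
    if magic != MAGIC:
        raise ValueError(f"unexpected magic {magic:#010x}")
    version = data[4]
    # A gzip decompressor stops at the end of the first gzip member, so the
    # block length is not needed up front; anything after it (checksum,
    # further blocks, metadata) ends up in d.unused_data.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    first_block = d.decompress(data[5:])
    if not d.eof:
        raise ValueError("first block is not a complete gzip stream")
    (metas_offset,) = struct.unpack_from(">Q", data, len(data) - 8)
    return {
        "version": version,
        "first_block": first_block,  # raw entry bytes, not decoded here
        "metas_offset": metas_offset,
    }
```

On the file in the original post, the trailing bytes would give a metadata offset of 0x1359 = 4953, which lies inside the 4989 bytes of chunk data, consistent with the README's description of block metadata at the end of the chunk.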
You are right.
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
@candlerb do you think that this has been addressed to your satisfaction now? Would love to hear what you think now, a year later.
This is still on my plate. Not really happy with the current documentation. Sorry !