AdGuardHome: Compress stats on hard drive

Created on 30 May 2020  ·  7 comments  ·  Source: AdguardTeam/AdGuardHome

Prerequisites

  • [✔️] I am running the latest version
  • [✔️] I checked the documentation and found no answer
  • [✔️] I checked to make sure that this issue has not already been filed

Problem Description

With the query log enabled, my hard disk fills up. If simple compression were enabled, the size could easily be reduced to something far more acceptable.

Proposed Solution

Compress the file through gzip or something similar before writing it to disk. It might even be an option not to write JSON at all but instead use a database like SQLite, which handles storage efficiently.
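A rough Python sketch of the gzip variant, just to show the mechanics (the file name and record fields are invented for the example; appended gzip members concatenate into one valid stream, so the log can still be written incrementally):

import gzip, json

def append_entries(path, entries):
    # "ab" appends a new gzip member; readers see the concatenated
    # members as a single stream, so incremental writes keep working.
    with gzip.open(path, "ab") as f:
        for entry in entries:
            f.write((json.dumps(entry) + "\n").encode("utf-8"))

# Hypothetical record layout, for the demo only:
append_entries("querylog.json.gz", [
    {"IP": "192.0.2.1", "T": "2020-05-28T18:14:00Z", "QH": "example.org",
     "QT": "A", "QC": "IN", "Elapsed": 123456},
])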

Additional Information

Just a short snapshot of my "real world data", which I compressed with gzip -k and brotli -6:

$ ls -lh querylog.*
-rw-r--r-- 1 me me 13G  May 28 18:14 querylog.json
-rw-r--r-- 1 me me 25M  May 28 18:14 querylog.json.br
-rw-r--r-- 1 me me 149M May 28 18:14 querylog.json.gz

I think the difference between "13G" and even the "149M" in the gzip case is self-explanatory 😉

Medium feature request


All 7 comments

@szolin I remember you were experimenting with this and the results were pretty good when compression was enabled. Do you remember the exact numbers?

With a compressed query log it will be difficult to provide the fast search and paging that we currently have.
We'd have to store compressed data in separate chunks and modify the search algorithm accordingly, which is not a simple task.
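For a concrete picture of the trade-off, here is an untested Python sketch (chunk size and all names invented): records are compressed in fixed-size batches, and a small index of (byte offset, first timestamp) per chunk allows paging and time-bounded search without decompressing the whole file:

import gzip, json

CHUNK_SIZE = 1000  # records per compressed chunk; a tuning knob

def write_chunked(path, records):
    # Each chunk is an independent gzip member; the index remembers where
    # it starts and the timestamp of its first record.
    index = []
    with open(path, "wb") as f:
        for i in range(0, len(records), CHUNK_SIZE):
            chunk = records[i:i + CHUNK_SIZE]
            index.append((f.tell(), chunk[0]["T"]))
            data = "".join(json.dumps(r) + "\n" for r in chunk)
            f.write(gzip.compress(data.encode("utf-8")))
    return index

def read_chunk(path, index, n):
    # Decompress only chunk n, e.g. for one page of results.
    start = index[n][0]
    end = index[n + 1][0] if n + 1 < len(index) else None
    with open(path, "rb") as f:
        f.seek(start)
        raw = f.read() if end is None else f.read(end - start)
    return [json.loads(l) for l in gzip.decompress(raw).splitlines()]

Search would then binary-search the timestamps in the index and touch only the matching chunks, which is the part that would need new code compared to today's plain-file seeking.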

a database like SQLite, which handles storage efficiently

I don't know if SQLite can compress data on the fly, but I doubt it's a good solution for storing logs.

Today I was a little bit bored, so I experimented a bit, and you are right: SQLite is not appropriate for this. I wrote two things. First, a Python script to migrate querylog.json to SQLite:

import json, sqlite3

conn = sqlite3.connect("querylog.db")

cursor = conn.cursor()
cursor.execute("DROP TABLE IF EXISTS entries")
cursor.execute("CREATE TABLE entries (IP text, T datetime, QH text, QT text, QC text, ANSWER text, Result int, Elapsed int)")

cursor.execute("DROP TABLE IF EXISTS result")
cursor.execute("CREATE TABLE result (id integer primary key autoincrement, isFiltered boolean, Reason int, Rule text, FilterID int)")

conn.commit()
with open("../querylog.json") as fp:
    insert_entry_pat = "INSERT INTO entries (IP, T, QH, QT, QC, ANSWER, Result, Elapsed) VALUES(?, ?, ?, ?, ?, ?, ?, ?)"
    insert_result_pat = "INSERT INTO result (isFiltered, Reason, Rule, FilterID) VALUES(?, ?, ?, ?)"
    for line in fp:
        content = json.loads(line)
        # Insert the filtering result first so its row id can be linked
        # from the entries table.
        result_id = None
        if content.get("Result"):
            r = content["Result"]
            cursor.execute(insert_result_pat, (r.get("IsFiltered", 0), r["Reason"], r.get("Rule", ""), r.get("FilterID", 0)))
            result_id = cursor.lastrowid

        cursor.execute(insert_entry_pat, (content["IP"], content["T"], content["QH"], content["QT"], content["QC"], content["Answer"], result_id, content["Elapsed"]))

conn.commit()
conn.close()


And second, a Go tool to compress entries line by line with brotli:

package main

import (
        "bufio"
        "io"
        "log"
        "os"

        "github.com/itchio/go-brotli/enc"
)

func main() {
        file, err := os.Open("querylog.json")
        if err != nil {
                log.Fatal(err)
        }
        defer file.Close()

        out, err := os.Create("querylog.br")
        if err != nil {
                log.Fatal(err)
        }
        defer out.Close()

        reader := bufio.NewReader(file)
        for {
                line, readErr := reader.ReadBytes('\n')
                if len(line) > 0 {
                        // Compress each record on its own so individual lines
                        // can still be located and decompressed independently.
                        small, err := enc.CompressBuffer(line, nil)
                        if err != nil {
                                log.Fatal(err)
                        }
                        out.Write(small)
                        out.WriteString("\n")
                }
                if readErr != nil {
                        if readErr != io.EOF {
                                log.Fatal(readErr)
                        }
                        return
                }
        }
}

These are my results:

-rw-r--r-- 1 me me 1.6G May 28 18:14 querylog.br
-rw-r--r-- 1 me me  14G May 28 18:14 querylog.db
-rw-r--r-- 1 me me  13G May 28 18:14 querylog.json

SQLite even increased the size, but when compressing line by line (and therefore being able to reuse the seeking algorithm already implemented by AdGuard) the storage usage is only 12.48% of the "raw data".

So it might be an interesting way to go ;)

SQLite even increased the size

you could try having two additional tables, qt and qc, that are filled with every type/class encountered, and using foreign keys in entries; that would probably save quite some space…
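An untested sketch of that suggestion on top of the Python migration script above (the helper name is made up): qt and qc become small lookup tables with unique names, and entries would store their integer ids instead of repeating the strings:

import sqlite3

conn = sqlite3.connect("querylog.db")
cursor = conn.cursor()
for table in ("qt", "qc"):
    cursor.execute(f"CREATE TABLE IF NOT EXISTS {table} "
                   "(id integer primary key autoincrement, name text unique)")

def lookup_id(table, name):
    # Insert the type/class on first sight, then return its id.
    cursor.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    return cursor.execute(f"SELECT id FROM {table} WHERE name = ?",
                          (name,)).fetchone()[0]

# In the migration loop, entries would then store integers, e.g.:
#   lookup_id("qt", content["QT"]), lookup_id("qc", content["QC"])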

I imagine querying an sqlite DB would be a lot faster than scrubbing through the brotlified json with handwritten logic?

I imagine querying an sqlite DB would be a lot faster than scrubbing through the brotlified json with handwritten logic?

It'd definitely be much, much easier.

However, we're trying to stick to pure Go; this makes it easier to build AGH and keeps it portable.

That is actually a very good reason against SQLite :/

Having no compression can also be seen as a feature: AGH is a DNS server, not a SIEM whose goal is to stockpile as much data as possible. Keeping a _huge_ amount of logs is pretty bad privacy-wise.
