Vector: Support FrameStream / dnstap

Created on 30 Apr 2020 · 8 Comments · Source: timberio/vector

Continuing from #2041, but thought this made sense as a separate issue.

We're sending DNS data through Vector, and are looking to send dnstap directly to Vector.

Some background information:
dnstap is a binary log format for DNS that uses protobuf, and it can be output by several DNS servers (e.g. BIND, CoreDNS, Unbound). The server sends the data to a Unix socket over a lightweight framing protocol called FrameStreams (there is documentation, a C implementation on GitHub, and a Go version).

What I envision is:

[ ] FrameStream source
- for our use case, will be from Unix sockets, but could conceivably be from UDP/TCP
- based off existing Unix source, but I think it's complex enough to make a new source
- Length delimited frames
- Some basic control frames (e.g. ready (with content type), accept, start, stop, finished)
- Needs to be able to respond on the socket
- Send full data message as Vector Event
[ ] dnstap transform
- dnstap uses protobuf (dnstap.proto) so when the WASM transform is finished I imagine this will use that. Until it's done I'll probably use a custom transform on a fork -- I doubt y'all want random protobuf transforms if the end goal is to have them in WASM?
- part of dnstap protobuf is just bytes containing the original query
- parse that (e.g. with trust-dns) and extract information into Vector Event
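
To make the framing part of the list above concrete, here is a rough sketch of the length-delimited data frames and the control frames, using only the standard library. It follows the published Frame Streams wire format as I understand it (big-endian 32-bit lengths, a zero length as the escape that introduces a control frame, content type carried as field type 1). The names `Frame`, `decode_frame`, `encode_control` and `encode_data` are made up for illustration and are not Vector or fstrm APIs.

```rust
// Control frame types from the Frame Streams protocol.
const CONTROL_ACCEPT: u32 = 1;
const CONTROL_START: u32 = 2;
const CONTROL_STOP: u32 = 3;
const CONTROL_READY: u32 = 4;
const CONTROL_FINISH: u32 = 5;
// Control frame field that carries a content type string.
const FIELD_CONTENT_TYPE: u32 = 1;

// Hypothetical in-memory representation of one decoded frame.
#[derive(Debug, PartialEq)]
enum Frame {
    Data(Vec<u8>),
    Control { kind: u32, content_types: Vec<Vec<u8>> },
}

// Read a big-endian u32 at `pos`, or None if the buffer is too short.
fn read_u32(buf: &[u8], pos: usize) -> Option<u32> {
    let bytes = buf.get(pos..pos.checked_add(4)?)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(bytes);
    Some(u32::from_be_bytes(raw))
}

// Decode one frame from the front of `buf`.
// Returns the frame and the number of bytes consumed, or None if the
// buffer does not yet hold a complete, well-formed frame.
fn decode_frame(buf: &[u8]) -> Option<(Frame, usize)> {
    let len = read_u32(buf, 0)? as usize;
    if len != 0 {
        // Data frame: 4-byte length followed by the payload.
        let payload = buf.get(4..4 + len)?;
        return Some((Frame::Data(payload.to_vec()), 4 + len));
    }
    // Zero length is the escape: a control frame follows.
    let ctrl_len = read_u32(buf, 4)? as usize;
    let body = buf.get(8..8 + ctrl_len)?;
    let kind = read_u32(body, 0)?;
    let mut content_types = Vec::new();
    let mut pos = 4;
    while pos < body.len() {
        let field = read_u32(body, pos)?;
        let field_len = read_u32(body, pos + 4)? as usize;
        let value = body.get(pos + 8..pos + 8 + field_len)?;
        if field == FIELD_CONTENT_TYPE {
            content_types.push(value.to_vec());
        }
        pos += 8 + field_len;
    }
    Some((Frame::Control { kind, content_types }, 8 + ctrl_len))
}

// Encode a control frame, e.g. the ACCEPT or FINISH reply on the socket.
fn encode_control(kind: u32, content_type: Option<&[u8]>) -> Vec<u8> {
    let mut body = kind.to_be_bytes().to_vec();
    if let Some(ct) = content_type {
        body.extend_from_slice(&FIELD_CONTENT_TYPE.to_be_bytes());
        body.extend_from_slice(&(ct.len() as u32).to_be_bytes());
        body.extend_from_slice(ct);
    }
    let mut out = 0u32.to_be_bytes().to_vec(); // escape
    out.extend_from_slice(&(body.len() as u32).to_be_bytes());
    out.extend_from_slice(&body);
    out
}

// Encode a data frame: length prefix plus payload.
fn encode_data(payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u32).to_be_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}
```

In the bidirectional case the sender opens with READY (listing its content types, `protobuf:dnstap.Dnstap` for dnstap), the receiver answers with ACCEPT, the sender follows with START and then data frames, and a STOP from the sender is answered with FINISH. That reply path is why the source needs write access to the socket. A real source would also cap the accepted frame length rather than trusting the prefix.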

I'm working on bringing the FrameStream code into Rust, and will do that source and the dnstap transform -- but I wanted to ask for thoughts on how this should be done, and see if these changes would be accepted.

logs sources idea approval more demand feature

bill-bateman

All 8 comments

We don't need to wait for wasm for this! Since we know the proto beforehand we can build support in already, see https://github.com/timberio/vector/blob/master/proto/event.proto :)

Hoverbear on 1 May 2020

🎉2 👍2

I think you might actually want to start with a dnstap source and decode the proto right off the wire, should be faster and simpler. We're investigating the idea of first class codecs in #2414 so that might be a good place to fit in FrameStream

Hoverbear on 2 May 2020

OK so if I understand this correctly -- the dnstap source should handle all of the FrameStream and proto and dns parsing -- and the idea with the first class codecs is to allow a config setting to determine like FrameStream vs other codec?

bill-bateman on 4 May 2020

Hello -- had a question on this (and yes we are working on it!)

For our use case we want to be able to filter based on whether the sourceAddress is in a given CIDR (i.e. if the source IP is in a given range, do not create events).

The filter transform doesn't have a good way of dealing with CIDR -- wondering to what extent is it ok to add filtering to a source directly? Or is it a better idea to make a new transform / edit the filter transform?

The actual Rust code of filtering it is pretty straightforward (cidr_utils crate), I guess it's more a question of should sources be allowed to do filtering or is that something only transforms should do?
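
For illustration, the check itself could look roughly like the sketch below. It uses only the standard library rather than the cidr_utils crate (so it makes no claims about that crate's API), handles IPv4 only, and assumes the parsed sourceAddress arrives as a text field. `Ipv4Cidr` and `should_drop` are hypothetical names, and keeping events whose address does not parse is just one possible choice.

```rust
use std::net::Ipv4Addr;

// Hypothetical helper: an IPv4 network such as 10.0.0.0/8.
struct Ipv4Cidr {
    network: u32,
    mask: u32,
}

impl Ipv4Cidr {
    // Parse "a.b.c.d/prefix"; returns None on malformed input.
    fn parse(s: &str) -> Option<Self> {
        let (addr, prefix) = s.split_once('/')?;
        let addr: Ipv4Addr = addr.parse().ok()?;
        let prefix: u32 = prefix.parse().ok()?;
        if prefix > 32 {
            return None;
        }
        // A shift by 32 would overflow, so /0 is handled separately.
        let mask = if prefix == 0 { 0 } else { u32::MAX << (32 - prefix) };
        Some(Self { network: u32::from(addr) & mask, mask })
    }

    // True if `ip` falls inside this network.
    fn contains(&self, ip: Ipv4Addr) -> bool {
        (u32::from(ip) & self.mask) == self.network
    }
}

// Decide whether an event should be dropped, given its sourceAddress
// as text. Assumption: addresses that fail to parse are kept.
fn should_drop(source_address: &str, excluded: &[Ipv4Cidr]) -> bool {
    match source_address.parse::<Ipv4Addr>() {
        Ok(ip) => excluded.iter().any(|net| net.contains(ip)),
        Err(_) => false,
    }
}
```

The excluded ranges would be parsed once from config, and `should_drop` would then run per event wherever the filtering ends up living.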

bill-bateman on 3 Jun 2020

Since the original source IP address (as opposed to its text representation) that would be used for CIDR-based filtering is only available in the source, it would make some sense to do the filtering in the source instead of re-parsing the text. Having said that, this is a very general kind of filter, so it should be available to any source that accepts connections. I think the best approach would be to do it at the source level, but in such a way that it applies easily to all sources.

bruceg on 3 Jun 2020

Sorry -- by source IP I don't mean the IP address of the connection, but the client IP from the DNS query that's being logged.

As in we want to parse the dnstap message then filter based on the contents.

bill-bateman on 3 Jun 2020

Ah, OK. If this client IP is "parsed" directly into a text field, then I'd say the filtering would be best done in a transform. The simplest would almost certainly be to add it to src/conditions/check_fields.rs, which is used in the filter and swimlanes transforms and also allows for negation.

bruceg on 3 Jun 2020

👍1

OK thank you! I'll look into that.

bill-bateman on 3 Jun 2020
