Continuing from #2041, but thought this made sense as a separate issue.
We're sending DNS data through Vector, and are looking to send dnstap directly to Vector.
Some background information:
dnstap is a binary log format for DNS using protobuf, and can be output by several DNS servers (e.g. BIND, CoreDNS, Unbound). It does this by sending the data to a Unix socket over a lightweight framing protocol called FrameStreams (there is documentation, a C implementation on GitHub, and a Go version).
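To make the framing concrete, here is a minimal std-only sketch of reading one frame, based on my understanding of the Frame Streams wire format: each data frame is prefixed with a 32-bit big-endian length, and a length of zero is an escape announcing a control frame (its own 32-bit length, then a 32-bit control type, then optional fields). The names `Frame`, `read_frame` and `MAX_FRAME_LEN` are hypothetical, the size cap is an arbitrary assumption, and this is not the code being ported.

```rust
use std::io::{self, Read};

// Control frame types as I understand them from the Frame Streams protocol.
const CONTROL_START: u32 = 0x02;
const CONTROL_STOP: u32 = 0x03;

// Hypothetical safety cap so a corrupt length cannot trigger a huge allocation.
const MAX_FRAME_LEN: usize = 1024 * 1024;

#[derive(Debug, PartialEq)]
enum Frame {
    /// Opaque payload; for dnstap this would be a protobuf-encoded message.
    Data(Vec<u8>),
    /// Control frame: its type plus any raw, still-undecoded fields.
    Control { control_type: u32, fields: Vec<u8> },
}

fn read_u32_be<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(u32::from_be_bytes(buf))
}

fn invalid(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Reads a single frame; returns an error on EOF or malformed input.
fn read_frame<R: Read>(reader: &mut R) -> io::Result<Frame> {
    let len = read_u32_be(reader)? as usize;
    if len != 0 {
        // Non-zero length: a data frame of exactly `len` bytes follows.
        if len > MAX_FRAME_LEN {
            return Err(invalid("data frame too large"));
        }
        let mut payload = vec![0u8; len];
        reader.read_exact(&mut payload)?;
        return Ok(Frame::Data(payload));
    }

    // Zero length is the escape: a control frame follows.
    let control_len = read_u32_be(reader)? as usize;
    if control_len < 4 || control_len > MAX_FRAME_LEN {
        return Err(invalid("bad control frame length"));
    }
    let control_type = read_u32_be(reader)?;
    let mut fields = vec![0u8; control_len - 4];
    reader.read_exact(&mut fields)?;
    Ok(Frame::Control { control_type, fields })
}
```

A source would call `read_frame` in a loop on the Unix socket, start decoding after a START control frame, and hand each `Frame::Data` payload to the protobuf decoder until STOP arrives. The handshake used by bidirectional streams (READY/ACCEPT/FINISH) is left out here.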
What I envision is:
bytes containing the original query
I'm working on bringing the FrameStream code into Rust, and will implement that source and the dnstap transform, but I wanted to ask for thoughts on how this should be done, and see if these changes would be accepted.
We don't need to wait for wasm for this! Since we know the proto beforehand we can build support in already, see https://github.com/timberio/vector/blob/master/proto/event.proto :)
I think you might actually want to start with a dnstap source and decode the proto right off the wire, should be faster and simpler. We're investigating the idea of first class codecs in #2414 so that might be a good place to fit in FrameStream
OK so if I understand this correctly -- the dnstap source should handle all of the FrameStream and proto and dns parsing -- and the idea with the first class codecs is to allow a config setting to determine like FrameStream vs other codec?
Hello -- had a question on this (and yes we are working on it!)
For our use case we want to be able to filter based on whether the sourceAddress is in a given CIDR (i.e. if the source IP is in a given range, do not create events).
The filter transform doesn't have a good way of dealing with CIDR -- wondering to what extent is it ok to add filtering to a source directly? Or is it a better idea to make a new transform / edit the filter transform?
The actual Rust code of filtering it is pretty straightforward (cidr_utils crate), I guess it's more a question of should sources be allowed to do filtering or is that something only transforms should do?
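For illustration, the check itself can be sketched with only the standard library. This is IPv4-only and deliberately does not use the cidr_utils API; `Ipv4Cidr` and `should_drop` are hypothetical names, not anything that exists in Vector.

```rust
use std::net::Ipv4Addr;

/// An IPv4 network stored as a masked address plus its mask.
#[derive(Debug, Clone, Copy)]
struct Ipv4Cidr {
    network: u32,
    mask: u32,
}

impl Ipv4Cidr {
    /// Parses "a.b.c.d/len"; returns None on any malformed input.
    fn parse(s: &str) -> Option<Self> {
        let (addr, prefix) = s.split_once('/')?;
        let addr: Ipv4Addr = addr.parse().ok()?;
        let prefix: u32 = prefix.parse().ok()?;
        if prefix > 32 {
            return None;
        }
        // Shifting a u32 by 32 bits would overflow, so /0 is handled separately.
        let mask = if prefix == 0 { 0 } else { u32::MAX << (32 - prefix) };
        Some(Self { network: u32::from(addr) & mask, mask })
    }

    /// True if `ip` falls inside this network.
    fn contains(&self, ip: Ipv4Addr) -> bool {
        (u32::from(ip) & self.mask) == self.network
    }
}

/// True if the DNS client IP is in any excluded range, i.e. no event should be created.
fn should_drop(client_ip: Ipv4Addr, excluded: &[Ipv4Cidr]) -> bool {
    excluded.iter().any(|cidr| cidr.contains(client_ip))
}
```

With `10.0.0.0/8` in the excluded list, a query from `10.1.2.3` would be dropped and one from `192.168.1.1` would pass through. Whether this lives in the source or in a transform, the comparison is the same; only where the parsed address comes from changes.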
Since the original source IP address (as opposed to its text representation) that would be used for CIDR-based filtering is only available in the source, it would make some sense to do the filtering in the source instead of re-parsing the text. Having said that, this is a very general kind of filter, and so it should be available to any source that accepts connections. I think the best approach would be to do it at the source level, but in such a way that it applies easily to all sources.
Sorry -- by source IP I don't mean the IP address of the connection, but the client IP from the DNS query that's being logged.
As in we want to parse the dnstap message then filter based on the contents.
Ah, ok. If this client IP is "parsed" directly into a text field, then I'd say the filtering would be best done in a transform. The simplest would almost certainly be to add it to src/conditions/check_fields.rs, which is used in the filter and swimlanes transforms and also allows for negation.
OK thank you! I'll look into that.