Vector: Sending distributions to DataDog

Created on 20 Jul 2020 · 9 comments · Source: timberio/vector

We would like for Vector to support sending DataDog's distribution data type to their servers. At the moment, it's unclear how we should go about supporting this.

As far as I can tell, the rough lifecycle of a distribution in the normal DataDog stack of tools is as follows:

  1. The application invokes dog.distribution(...) from its DogStatsD library
  2. The DogStatsD library submits that datapoint as a statsd packet with the d type identifier (see the sketch after this list)
  3. The DataDog agent collects those points into a sketch map
  4. Those sketches are then serialized and sent to what appears to be a beta API endpoint
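For reference, here is a minimal sketch of what step 2 looks like on the wire, assuming the standard DogStatsD text format where d is the distribution type identifier; the metric name, value, and tag below are invented for illustration:

```rust
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    // Bind an ephemeral local port; DogStatsD listens on UDP 8125 by default.
    let socket = UdpSocket::bind("0.0.0.0:0")?;

    // DogStatsD distribution packet shape: <metric name>:<value>|d|#<tags>
    // "request.latency" and "env:prod" are illustrative, not from this issue.
    let packet = format!("{}:{}|d|#{}", "request.latency", 3.2, "env:prod");

    socket.send_to(packet.as_bytes(), "127.0.0.1:8125")?;
    Ok(())
}
```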

The sketch implementation itself is relatively well known, so the challenge would mostly be to determine how those sketches should be serialized and sent to the correct API endpoint. Some further code-diving suggests that there are both protobuf and JSON representations, with protobuf seemingly preferred.

On the other hand, it seems that some DataDog client libraries may send distribution datapoints directly. The Python lib references a distribution_points API resource that appears accessible via a simple POST of JSON data. This could be a much more lightweight option for Vector if it's viable, allowing us to sidestep (at least temporarily) the complexity of aggregation.

To summarize, we should try to answer the following questions:

  1. Is it supported for third-party tools to send distributions to the DataDog API?
  2. If yes, should we be sending aggregated sketches or collections of samples?
  3. Which is the best-supported API endpoint and format for doing so?
Labels: data model, metrics, outside help, requirements, datadog, datadog_metrics, enhancement

All 9 comments

@jamtur01 assigning this to you since we just need to unblock this work. I've reached out to our contact at Datadog and have not received a response. We should try to answer these questions this sprint, if possible.

DataDog supports third-party tools sending distribution data. The format of the JSON sketch isn't well documented at this point, but the distribution_points endpoint is simple and supported. It is nearly identical to the metric submission format, except that instead of a point being a timestamp and a single float value, a "point" is a timestamp and a list of values. This endpoint does the sketch conversion immediately on intake, server-side. They intend to allow collection and submission of sketches in addition to raw points, to support use cases where collecting and serializing a large volume of samples is not feasible, but they don't have a supported endpoint for that at this time.

So the recommendation is to go ahead and use the distribution_points API endpoint, sending points as timestamp-and-list-of-values tuples.
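To make the recommended format concrete, here is a minimal sketch of a distribution_points request body as described above, built with serde_json. The metric name, timestamp, sample values, and tags are invented, and the exact field set (for example whether a type field is required) is an assumption rather than something confirmed in this thread:

```rust
use serde_json::json;

fn main() {
    // Each "point" is a timestamp paired with a list of raw sample values,
    // rather than the single float used by the regular series endpoint.
    let body = json!({
        "series": [{
            "metric": "request.latency",               // illustrative name
            "points": [[1595232000, [0.3, 1.2, 4.5]]], // (timestamp, [values])
            "tags": ["env:prod"],
            "type": "distribution"                     // assumed field, not confirmed above
        }]
    });

    // The sink would POST this JSON to the distribution_points endpoint
    // along with the API key; printing it here stands in for that request.
    println!("{}", body);
}
```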

@lukesteensen It's unblocked. Let's chat about where it could fit.

Sounds good! I think this should be a relatively straightforward expansion of where we ended up in #2913 (/cc @ktff). The representation that we send will be the same as our existing distribution; we'll just need to serialize it and route it to the correct DataDog endpoint. It looks like the Python library has some code we could follow along with.

So, the datadog_metrics sink currently transforms a batch of events into a single HTTP request, but we need to support two requests per batch, since a batch can contain both distribution and non-distribution metrics, which go to different endpoints and therefore different URIs. Some ways to do this are:

  1. Extend the build_request method in the HttpSink trait to return Vec<Request>, and update all of its call sites, and so on.
  2. Extend the datadog_metrics sink to contain two sinks internally, one for each endpoint, and split the events between them.
  3. Remove batching.

The first seems like the better option if we expect more cases like this. Otherwise, option 2 is a more local change and there shouldn't be much duplicated code, but there will be an issue with the Acker, since we would need to synchronize those two sinks somehow (a rough sketch of the split follows this comment). Option 3 is the easiest, but we would lose a feature.
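A rough sketch of the split behind option 2 (and the partitioning discussed below), assuming simplified stand-ins for Vector's metric types and hypothetical endpoint paths; the real sink would feed these groups into the existing batch/partition machinery rather than hand-rolling the routing:

```rust
// Simplified stand-ins for Vector's internal metric types.
enum MetricValue {
    Counter { value: f64 },
    Distribution { values: Vec<f64> },
}

struct Metric {
    name: String,
    value: MetricValue,
}

// Hypothetical paths, used only to illustrate the per-endpoint routing.
const SERIES_URI: &str = "/api/v1/series";
const DISTRIBUTION_URI: &str = "/api/v1/distribution_points";

/// Split one batch into two groups so each group becomes its own HTTP request.
fn partition_by_endpoint(batch: Vec<Metric>) -> (Vec<Metric>, Vec<Metric>) {
    batch
        .into_iter()
        .partition(|m| !matches!(m.value, MetricValue::Distribution { .. }))
}

fn main() {
    let batch = vec![
        Metric { name: "requests".into(), value: MetricValue::Counter { value: 1.0 } },
        Metric { name: "latency".into(), value: MetricValue::Distribution { values: vec![0.3, 1.2] } },
    ];

    let (series, distributions) = partition_by_endpoint(batch);
    // Each non-empty group would then be serialized and sent to its own URI.
    println!("{} -> {} metric(s)", SERIES_URI, series.len());
    println!("{} -> {} metric(s)", DISTRIBUTION_URI, distributions.len());
}
```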

cc. @lukesteensen

This sounds reminiscent of the partitioning we're doing in the aws_cloudwatch_logs sink.

Yes, that's exactly what's needed.

I think the best approach is probably to split into two sinks internally and partition events across them. Hopefully we can handle acks the same way we do for other partitioned sinks.

Yep. Luckily our sink utils mesh together quite nicely, so I was able to reuse the partitioning logic, although adding some high-level documentation of the whole service/batch/buffer/partition stack is something we should consider.
