Influxdb: Create binary protocol API

Created on 16 Dec 2013 · 13 Comments · Source: influxdata/influxdb

Should be a new thing under src/api that listens on a different port than anything currently running. Look into supporting the MySQL binary protocol as in the discussion here:

https://groups.google.com/forum/#!topic/influxdb/L9oVTwEizC4

Don't worry about insert into queries for now. All inserts should be through the prepared statement style that the thread talks about. Keep the statement ids 1-20 as reserved so we can support different binary encoding formats (like Protobuf, BSON, MessagePack).
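As a rough illustration of the reserved-ID idea, here is a minimal sketch of a statement registry. Everything in it is hypothetical: the class name, the `prepare`/`lookup` methods, and the specific ID-to-encoding assignments are assumptions made for the example, and the issue itself only says that IDs 1-20 are reserved.

```python
# Hypothetical sketch: IDs 1-20 are reserved for wire encodings,
# so user-prepared statements are numbered from 21 upward.
RESERVED_MAX_ID = 20

# Illustrative assignments only; the issue does not fix these numbers.
RESERVED_ENCODINGS = {1: "protobuf", 2: "bson", 3: "messagepack"}


class StatementRegistry:
    """Hands out statement IDs while keeping 1-20 untouched."""

    def __init__(self):
        self._next_id = RESERVED_MAX_ID + 1
        self._statements = {}

    def prepare(self, series, columns):
        """Register an insert statement and return its ID (always > 20)."""
        stmt_id = self._next_id
        self._next_id += 1
        self._statements[stmt_id] = (series, tuple(columns))
        return stmt_id

    def lookup(self, stmt_id):
        """Resolve an ID to a reserved encoding slot or a prepared statement."""
        if 1 <= stmt_id <= RESERVED_MAX_ID:
            # Reserved slot: the encoding name, or None if not yet assigned.
            return ("encoding", RESERVED_ENCODINGS.get(stmt_id))
        return ("statement", self._statements[stmt_id])
```

With this layout a client could send an ID in the reserved range to select an encoding and any higher ID to execute a previously prepared insert.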

pauldix

Most helpful comment

I know this is a very old bug but I have a use-case that isn't well satisfied by the line protocol + HTTP: Collecting metrics from low-power embedded devices.

HTTP and the text protocol are quite burdensome in resource-constrained environments like this. For example I'm using an ESP32 to push data from some attached sensors into InfluxDB and even just adding an <sstream> include costs 100KB, which was enough to push my binary over the size limit. Yes, the line-based format can be built other ways (like a ton of sprintf) but they become annoyingly complicated pretty quickly.

There's also the size issue. Every byte counts on a low power device because every byte means having the modem powered on just that little bit longer. The text-based format drastically increases the amount of data that has to be sent, and so the amount of time/power that has to be spent on transmission. A 4-byte float easily doubles in size when converted to text.

In my ideal world, there'd be a way to define a schema for what ends up basically being a plain struct. That way an embedded device can essentially just memcpy it over the network into InfluxDB.
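To make the size argument concrete, here is a small sketch comparing a fixed binary layout with the equivalent line-protocol text. The struct layout, field names, and the `env` measurement are assumptions invented for the example; InfluxDB does not accept this binary format.

```python
import struct

# Hypothetical fixed layout, little-endian with no padding:
#   uint32 timestamp (seconds), uint16 sensor id,
#   float32 temperature, float32 humidity
READING = struct.Struct("<IHff")


def pack_reading(ts, sensor_id, temp, humidity):
    """Binary form: what a device could send straight from a packed C struct."""
    return READING.pack(ts, sensor_id, temp, humidity)


def unpack_reading(payload):
    """Server side: recover the fields from the fixed layout."""
    return READING.unpack(payload)


def as_line_protocol(ts, sensor_id, temp, humidity):
    """Text form of the same point (second-precision timestamp assumed)."""
    return f"env,sensor={sensor_id} temp={temp},humidity={humidity} {ts}\n".encode()


binary = pack_reading(1566691200, 7, 21.5, 40.25)
text = as_line_protocol(1566691200, 7, 21.5, 40.25)
print(len(binary), len(text))  # the binary record is 14 bytes
```

A real design would also have to agree on byte order, padding, and schema versioning, which is where the "define a schema" part of the request comes in.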

I know this might not be a use-case you have any plan to support directly (there's always the MQTT feature built into Telegraf) but thought I'd raise it here anyway since it wasn't mentioned.

bobobo1618 on 25 Aug 2019

👍5

All 13 comments

For ingesting data at least, the Riemann wire protocol, which is based on Google Protocol Buffers (https://github.com/aphyr/riemann-java-client/blob/master/src/main/proto/riemann/proto.proto), is really nice. It also has a large number of existing clients available (http://riemann.io/clients.html), and I'd love to see it become a de facto standard in the open-source monitoring / reporting space.

The main attraction is the variety of data types included:

metric values
tags (great for tracing a given request through from end user, front end, proxy, database ... and back)
state (e.g. up/down/degraded, small freetext field)
description (perfect for annotations to events/metrics)
arbitrary key/value pairs (extensibility)

While many other protocols are also common (graphite, for example) they aren't flexible and encourage horrible work-arounds when any extensibility is required.

The protocol is available over UDP or TCP, and obviously supports batching as well. Riemann itself, like InfluxDB, also supports receiving these via HTTP/JSON, but for throughput and avoiding serialisation overhead, the UDP and TCP protobuf implementations use significantly less CPU and network traffic.

I'll start a thread on the mailing list to gather a few more thoughts.

NB Riemann also supports a query protocol, but for the moment I'd be rapt to see an improved ingress one to start with.

dch on 20 May 2014

Keep it simple and use Redis Serialization Protocol (RESP)? (http://redis.io/topics/protocol). I've had success with it in a variety of commercial projects.

Supports batching
Supports efficient buffer management
Supports typed buffers
Binary-safe
Well-defined encodings for strings (UTF8).
Arbitrary structures
Nested structures
Extensible types
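For readers unfamiliar with RESP, here is a minimal sketch of how a request is framed as an array of bulk strings. The `WRITE` command and its arguments are hypothetical, chosen only to show the framing; they are not a real Redis or InfluxDB command.

```python
def resp_bulk(value: bytes) -> bytes:
    """Encode one bulk string: $<length>\\r\\n<bytes>\\r\\n (binary-safe)."""
    return b"$" + str(len(value)).encode() + b"\r\n" + value + b"\r\n"


def resp_array(items) -> bytes:
    """Encode a request as an array of bulk strings: *<count>\\r\\n<items...>."""
    out = [b"*" + str(len(items)).encode() + b"\r\n"]
    for item in items:
        out.append(resp_bulk(item))
    return b"".join(out)


# Hypothetical "WRITE" command, used only to illustrate the wire format.
frame = resp_array([b"WRITE", b"cpu", b"host=a", b"0.64"])

# Batching is plain concatenation (pipelining) of independent frames.
batch = frame + resp_array([b"WRITE", b"cpu", b"host=b", b"0.71"])
```

Because every element carries its own length prefix, a receiver can read payloads without scanning for delimiters inside them.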

dzrw on 22 Jul 2014

Why not protobufs?

ghost on 19 Aug 2014

Here's a +1 for the Riemann protocol. collectd comes with support via the write_riemann plugin: https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_write_riemann .

davidblewett on 27 Aug 2014

@meganuke19, Riemann's binary protocol uses protobufs.

mfournier on 27 Aug 2014

Related to the binary protocol, I think I should mention here that a nice thing to have would be a high-performance, low-latency way to stream results to the client, i.e. don't encode the entire result set first, then transfer it, then decode the entire thing before the client can start processing results. The larger the result set, the bigger the payoff. (A possible future optimisation, submitting data while the result set is still being generated, could be useful to keep in mind too.)
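One way to picture this kind of streaming is a decoder that yields each record as soon as its bytes have arrived. The sketch below assumes a made-up framing (a 4-byte big-endian length before each record); it is not an existing InfluxDB wire format.

```python
import struct


def encode_record(payload: bytes) -> bytes:
    """Frame one record with a 4-byte big-endian length prefix (assumed framing)."""
    return struct.pack(">I", len(payload)) + payload


def iter_records(chunks):
    """Yield complete records as soon as they are available.

    `chunks` is any iterable of byte strings, e.g. successive socket reads.
    Records may be split across chunk boundaries.
    """
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= 4:
            (size,) = struct.unpack_from(">I", buf, 0)
            if len(buf) < 4 + size:
                break  # record not complete yet, wait for more bytes
            yield bytes(buf[4:4 + size])
            del buf[:4 + size]
```

The client can start processing the first row while later rows are still in flight, and the server can use the same framing to emit rows before the full result set has been computed.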

Dieterbe on 14 Sep 2014

Protobufs are not well supported in dynamic languages. MessagePack provides great features: it is simple, efficient, and has great support in almost every environment.
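To give a feel for how small the format is, here is a hand-written encoder for a subset of MessagePack (nil, booleans, integers, floats, short strings, small maps). It is a sketch for illustration, not a replacement for a real MessagePack library, and the function name `mp_pack` is made up for this example.

```python
import struct


def mp_pack(obj) -> bytes:
    """Encode a small subset of MessagePack types."""
    if obj is None:
        return b"\xc0"                                # nil
    if obj is True:
        return b"\xc3"                                # true
    if obj is False:
        return b"\xc2"                                # false
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                       # positive fixint
        # Valid but not the most compact choice: always fall back to int 64.
        return b"\xd3" + struct.pack(">q", obj)
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)       # float 64
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("sketch only handles fixstr (up to 31 bytes)")
        return bytes([0xA0 | len(raw)]) + raw         # fixstr
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("sketch only handles fixmap (up to 15 pairs)")
        out = bytes([0x80 | len(obj)])                # fixmap
        for key, value in obj.items():
            out += mp_pack(key) + mp_pack(value)
        return out
    raise TypeError(f"unsupported type: {type(obj).__name__}")
```

For example, `mp_pack({"compact": True, "schema": 0})` produces 18 bytes, compared with 27 bytes for the same object as compact JSON.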

robert-zaremba on 23 Sep 2014

@Dieterbe great point wrt streaming results.

@robert-zaremba which languages are you referring to? Riemann has nodejs, perl, clojure, python clients already, and protobufs are well supported across the board in general. When sending very large volumes of metrics, parsing and network traffic become significant. At lower volumes this of course doesn't matter so much, but this is the main reason why I am interested in a non-HTTP+JSON based mechanism. I believe that PB are also designed to handle streaming; I am not sure whether that's also the case for other formats. Practically, having a wide set of metrics tools (collectd & many others) already set up for sending metrics (due to riemann) is a huge win for influxdb in avoiding additional development effort.

dch on 23 Sep 2014

@Dieterbe @dch you can stream data using the http chunked response which is supported today.

jvshahid on 23 Sep 2014

@jvshahid hmm, I should test that, but it's not available in the Go client yet, right?

Dieterbe on 23 Sep 2014

@dch - Sorry, I didn't express my thought well. I mean that the standard protobufs implementations in Python, Ruby and Perl are slow, and implementing protobufs is hard. MessagePack is a binary serialization format which is simple and widely supported. It's not a protocol, but you can build a protocol on top of it. And basically you can do streaming on almost any binary format by sending chunks of records.

robert-zaremba on 26 Sep 2014

I'm closing this out for now. With the new line protocol (#2696) this is much less necessary. HTTP + Gzip and using the line protocol can already saturate what our storage engine can currently do.

Will open this up again at a much later date if there seems to be a big win from it.
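For reference, the HTTP + gzip path mentioned above amounts to building a batch of line-protocol text and compressing it before upload. The sketch below only builds the request body; the helper name `to_line` is made up, escaping of special characters is omitted, and all field values are assumed to be floats.

```python
import gzip


def to_line(measurement, tags, fields, ts_ns):
    """Format one point as line protocol (no escaping, float fields only)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {ts_ns}"


# Build a batch of 1000 points, one per line.
lines = [
    to_line("cpu", {"host": "a"}, {"value": 0.64}, 1433376000000000000 + i)
    for i in range(1000)
]
raw = "\n".join(lines).encode()

# Compress the whole batch; a client would send this body with a
# "Content-Encoding: gzip" header on the write request.
body = gzip.compress(raw)
print(len(raw), len(body))
```

Repetitive metric batches like this one compress well, which is part of why the text protocol is less costly on the wire than its raw size suggests.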

pauldix on 4 Jun 2015

