This should be a new component under src/api that listens on a different port from anything currently running. Look into supporting the MySQL binary protocol, as in the discussion here:
https://groups.google.com/forum/#!topic/influxdb/L9oVTwEizC4
Don't worry about INSERT INTO queries for now. All inserts should go through the prepared-statement style that the thread talks about. Keep statement ids 1-20 reserved so we can support different binary encoding formats (like Protobuf, BSON, MessagePack).
For ingesting data at least, the Riemann wire protocol, based on Google Protocol Buffers (https://github.com/aphyr/riemann-java-client/blob/master/src/main/proto/riemann/proto.proto), is really nice. It also has a large number of existing clients available (http://riemann.io/clients.html), and I'd love to see it become a de facto standard in the open source monitoring / reporting space.
The main attraction is the variety of data types included:
While many other protocols are also common (Graphite, for example), they aren't flexible and encourage horrible workarounds when any extensibility is required.
The protocol is available over UDP or TCP, and supports batching as well. Riemann itself, like InfluxDB, also accepts these events via HTTP/JSON, but the UDP and TCP protobuf implementations avoid much of the serialisation overhead, so at high throughput they use significantly less CPU and network traffic.
I'll start a thread on the mailing list to gather a few more thoughts.
NB: Riemann also supports a query protocol, but for the moment I'd be rapt to see an improved ingress one to start with.
Keep it simple and use Redis Serialization Protocol (RESP)? (http://redis.io/topics/protocol). I've had success with it in a variety of commercial projects.
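For reference, RESP framing is small enough to sketch in a few lines. The snippet below encodes a request as a RESP array of bulk strings, which is how Redis clients send commands. The WRITE command name and its argument layout are purely hypothetical; nothing in InfluxDB defines them.

```python
def encode_resp_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    # Array header: '*' followed by the number of elements.
    out = [b"*%d\r\n" % len(parts)]
    for part in parts:
        data = part.encode("utf-8")
        # Each bulk string: '$' + byte length, then the bytes, each CRLF-terminated.
        out.append(b"$%d\r\n" % len(data))
        out.append(data + b"\r\n")
    return b"".join(out)


# Hypothetical command and arguments, for illustration only.
frame = encode_resp_command("WRITE", "cpu_load", "host=server01", "value=0.64")
```

Because every element carries its byte length up front, a server can parse a frame like this without scanning for delimiters inside the payload.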
why not protobufs?
Here's a +1 for the Riemann protocol. collectd comes with support via the write_riemann plugin: https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_write_riemann .
@meganuke19, Riemann's binary protocol uses protobufs.
Related to the binary protocol, I think I should mention that a nice thing to have would be a high-performance, low-latency way to stream results to the client, i.e. don't encode the entire result set first, then transfer it, then decode the entire thing before the client can start processing results. The larger the result set, the bigger the payoff. (A possible future optimisation, submitting data while the result set is still being generated, could be useful to keep in mind too.)
Protobufs are not well supported in dynamic languages. MessagePack provides great features: it is simple, efficient, and has great support in almost every environment.
@Dieterbe great point wrt streaming results.
@robert-zaremba which languages are you referring to? Riemann already has Node.js, Perl, Clojure and Python clients, and protobufs are well supported across the board in general. When sending very large volumes of metrics, parsing and network traffic become significant. At lower volumes this of course doesn't matter so much, but it is the main reason why I am interested in a non-HTTP+JSON mechanism. I believe that protobufs are also designed to handle streaming; I am not sure whether that's also the case for other formats. Practically, having a wide set of metrics tools (collectd and many others) already set up for sending metrics (thanks to Riemann) is a huge win for InfluxDB in avoiding additional development effort.
@Dieterbe @dch you can stream data using HTTP chunked responses, which are supported today.
@jvshahid hmm, I should test that, but it's not available in the Go client yet, right?
@dch - Sorry, I didn't express my thought well. I mean that the standard protobufs implementations in Python, Ruby and Perl are slow, and implementing protobufs is hard. MessagePack is a binary serialization format which is simple and widely supported. It's not a protocol, but you can build a protocol on top of it. And basically you can do streaming with almost any binary format by sending chunks of records.
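To illustrate the "chunks of records" point, here is a minimal sketch of length-prefixed framing, which works regardless of how each record is serialized. The 4-byte big-endian length prefix and the function names are assumptions made for this example, not part of MessagePack or any InfluxDB API.

```python
import io
import struct


def write_record(stream, payload: bytes) -> None:
    """Write one record as a 4-byte big-endian length followed by the payload."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield records one at a time, so a consumer can start before the stream ends."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise EOFError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated record")
        yield payload


# In-memory round trip; a real socket would need a loop that reads exactly N bytes.
buf = io.BytesIO()
for record in (b"cpu 0.64", b"mem 0.31"):
    write_record(buf, record)
buf.seek(0)
decoded = list(read_records(buf))
```

The payload bytes here are opaque, so each record could just as well be a MessagePack or protobuf message.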
I'm closing this out for now. With the new line protocol (#2696) this is much less necessary. HTTP + Gzip and using the line protocol can already saturate what our storage engine can currently do.
Will open this up again at a much later date if there seems to be a big win from it.
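As a rough illustration of the line protocol + Gzip approach mentioned above, the sketch below builds a batch of points and compresses it. The to_line helper is hypothetical and deliberately simplified: it assumes float field values only and does no escaping of spaces, commas or quotes.

```python
import gzip


def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point as a line protocol line (float fields only, no escaping)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Build a batch of 1000 points with made-up values and nanosecond timestamps.
lines = [
    to_line("cpu_load", {"host": "server01"}, {"value": 0.64}, 1434055562000000000 + i)
    for i in range(1000)
]
body = "\n".join(lines).encode("utf-8")

# Gzip the whole batch before sending it as a request body.
compressed = gzip.compress(body)
```

A batch like this is highly repetitive, so it compresses well; the compressed bytes would typically be sent as an HTTP request body with a Content-Encoding: gzip header.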
I know this is a very old bug but I have a use-case that isn't well satisfied by the line protocol + HTTP: Collecting metrics from low-power embedded devices.
HTTP and the text protocol are quite burdensome in resource-constrained environments like this. For example I'm using an ESP32 to push data from some attached sensors into InfluxDB and even just adding an <sstream> include costs 100KB, which was enough to push my binary over the size limit. Yes, the line-based format can be built other ways (like a ton of sprintf) but they become annoyingly complicated pretty quickly.
There's also the size issue. Every byte counts on a low power device because every byte means having the modem powered on just that little bit longer. The text-based format drastically increases the amount of data that has to be sent, and so the amount of time and power that has to be spent on transmission. A 4-byte float easily doubles in size when converted to text.
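The size difference is easy to check. This small sketch compares a 32-bit float packed as binary with the same reading written as decimal text; the reading value is made up for illustration.

```python
import struct

reading = 23.456789  # made-up sensor value

# IEEE 754 single precision, little-endian: always 4 bytes.
binary = struct.pack("<f", reading)

# The same value as decimal text, as it would appear in a text-based format.
text = repr(reading).encode("ascii")

binary_size = len(binary)  # 4
text_size = len(text)      # 9 for this particular value
```

The text size depends on how many digits are printed, and a full line also carries the measurement name, field key and timestamp on top of the value itself.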
In my ideal world, there'd be a way to define a schema for what ends up basically being a plain struct. That way an embedded device can essentially just memcpy it over the network into InfluxDB.
I know this might not be a use-case you have any plan to support directly (there's always the MQTT feature built into Telegraf) but thought I'd raise it here anyway since it wasn't mentioned.
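The plain-struct idea above could look something like the following sketch, shown from the receiving side. The schema (field names, types and order) is entirely hypothetical; it stands in for whatever layout the device and the server would agree on, and InfluxDB has no such feature today.

```python
import struct
from collections import namedtuple

# Hypothetical schema agreed on by device and server:
# uint32 unix timestamp (seconds), uint16 sensor id, float32 temperature, float32 humidity.
# "<" means little-endian with no padding, like a packed C struct.
SENSOR_FORMAT = "<IHff"
SensorReading = namedtuple("SensorReading", "timestamp sensor_id temperature humidity")


def pack_reading(reading: SensorReading) -> bytes:
    """Serialize a reading into the fixed-size binary layout."""
    return struct.pack(SENSOR_FORMAT, *reading)


def unpack_reading(data: bytes) -> SensorReading:
    """Decode the fixed-size binary layout back into named fields."""
    return SensorReading._make(struct.unpack(SENSOR_FORMAT, data))


sample = SensorReading(1700000000, 7, 21.5, 40.25)
wire = pack_reading(sample)
decoded = unpack_reading(wire)
record_size = struct.calcsize(SENSOR_FORMAT)  # 14 bytes per reading
```

On the device, the equivalent would be a packed struct with the same field order and byte order, sent as-is. The cost of this design is that both sides must share the schema out of band, and any change to it needs a versioning story.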