Jaeger: Additional storage backends

Created on 8 Jan 2018 · 19 Comments · Source: jaegertracing/jaeger

Opening this issue to keep track of other related issues.

[x] Badger KV (memory/disk) as built-in option https://github.com/jaegertracing/jaeger/pull/760
[x] ScyllaDB https://github.com/jaegertracing/jaeger/issues/197 (available)
[x] InfluxDB via grpc-plugin https://github.com/jaegertracing/jaeger/issues/272 (available)
[x] Logz.io https://github.com/logzio/jaeger-logzio (available)
[x] Couchbase via grpc-plugin https://github.com/jaegertracing/jaeger/issues/1575 (available)
[ ] Netflix Dynomite https://github.com/jaegertracing/jaeger/issues/331
[ ] Amazon DynamoDB https://github.com/jaegertracing/jaeger/issues/421
[ ] Amazon S3 https://github.com/jaegertracing/jaeger/issues/2633
[ ] BigTable https://github.com/jaegertracing/jaeger/issues/1208
[ ] CosmosDB https://github.com/jaegertracing/jaeger/issues/1667
[ ] PostgreSQL https://github.com/jaegertracing/jaeger/issues/1895
[ ] MySQL https://github.com/jaegertracing/jaeger/issues/1944
[ ] Elasticsearch APM data format https://github.com/jaegertracing/jaeger/issues/1365
[ ] ClickHouse https://github.com/jaegertracing/jaeger/issues/1438

Relevant issue: plugin support #422 (done).

enhancement storage

yurishkuro

🚀9

All 19 comments

Did you remove the Elasticsearch flags from jaeger-collector? I'm running a test with the Docker image, whose version is:

{"gitCommit":"dbd5db721fc59431b1e64874cc7d6265d89ec917","GitVersion":"v1.1.0","BuildDate":"2018-01-08T21:56:21Z"}

and I cannot see the Elasticsearch flags.

nbettiol on 9 Jan 2018

It looks like you're using latest instead of 1.1. We recently moved some of the flags around so that we can support plugins better: https://github.com/jaegertracing/jaeger/pull/625. With latest, you have to set the env variable SPAN_STORAGE=elasticsearch to get the elasticsearch flags. I'd recommend that you use 1.1, since this change will be part of 1.2 and will be documented at that time.
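As an illustration only, the selection mechanism described above could be sketched as follows. This is not Jaeger's actual factory code: the helper name `storageTypeFromEnv` and the `cassandra` fallback are assumptions made for the sketch, and the variable name `SPAN_STORAGE` is taken as written in this comment.

```go
package main

import (
	"fmt"
	"os"
)

// defaultStorage is an assumed fallback for this sketch, not a documented default.
const defaultStorage = "cassandra"

// storageTypeFromEnv is a hypothetical helper. It returns the backend named in
// the SPAN_STORAGE environment variable, or defaultStorage when it is unset.
// The lookup function is injected so the logic can be exercised without
// touching the real process environment.
func storageTypeFromEnv(getenv func(string) string) string {
	if v := getenv("SPAN_STORAGE"); v != "" {
		return v
	}
	return defaultStorage
}

func main() {
	// With SPAN_STORAGE=elasticsearch exported, this reports "elasticsearch".
	fmt.Println("span storage:", storageTypeFromEnv(os.Getenv))
}
```

Only after the backend is chosen this way would it make sense to register that backend's own command-line flags, which matches the behaviour reported here: the elasticsearch flags are hidden until the variable selects that backend.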

black-adder on 9 Jan 2018

Thanks for the reply. Yes, I was using the latest version. I will use 1.1.

nbettiol on 9 Jan 2018

I would love to see a SQL option (whichever ANSI SQL flavor gives the least vendor lock-in).
Setting up Cassandra / Elasticsearch might be too ambitious for projects that want distributed tracing but honestly don't have the TPS to warrant a distributed datastore.

fzakaria on 16 Jan 2018

👍4

Since I work with PostgreSQL, I sure wouldn't complain. But honestly I'm not sure a SQL db is an optimal store for largely free-form metrics of this nature. PostgreSQL at least offers the jsonb type for indexable free-form data. If you're trying to do this in a vendor neutral way you'll land up with your own json blobs, or doing EAV, and both of those are terrible. ANSI SQL is a poor fit for variable-structured or key/value form data and you'll need some vendor extensions to get usable performance.

But you inevitably land up with someone putting an ORM on top to "abstract" the DB. Then the ORM performs terribly, gobbles memory and everyone says "the SQL backend is slow, use instead".
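To make the trade-off above concrete, here is a minimal sketch of the two vendor-neutral encodings mentioned: one row per tag (EAV) versus a single JSON document per span. The `eavRow` type, the helper names, and the sample tags are all hypothetical; no real schema from Jaeger or any SQL backend is implied.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// eavRow is one row of a hypothetical entity-attribute-value table.
type eavRow struct {
	SpanID string
	Key    string
	Value  string
}

// toEAV flattens tags into one row per tag, sorted by key for stable output.
// Finding a span by two tags would then need a self-join on this table.
func toEAV(spanID string, tags map[string]string) []eavRow {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	rows := make([]eavRow, 0, len(keys))
	for _, k := range keys {
		rows = append(rows, eavRow{SpanID: spanID, Key: k, Value: tags[k]})
	}
	return rows
}

// toJSONBlob serializes all tags into a single document, as it would be
// stored in one text or JSON column. Without a vendor extension such as
// PostgreSQL's jsonb, the database cannot index inside this blob.
func toJSONBlob(tags map[string]string) (string, error) {
	b, err := json.Marshal(tags)
	return string(b), err
}

func main() {
	tags := map[string]string{"http.method": "GET", "http.status_code": "200"}
	fmt.Println(toEAV("span-1", tags))
	blob, err := toJSONBlob(tags)
	if err != nil {
		panic(err)
	}
	fmt.Println(blob)
}
```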

ringerc on 3 Feb 2018

👍1

Related issue to this one is https://github.com/jaegertracing/jaeger/issues/551. Upvote if you are interested in it.

pavolloffay on 5 Feb 2018

New related issue:
Files - https://github.com/jaegertracing/jaeger/issues/894

SwarnimRaj on 29 Jun 2018

We are looking at using BigQuery as a storage layer. Presumably this could work with a SQL storage option. SQL can be a generic way to deal with columnar data stores. I would complain about a BigQuery specific solution, but I think there is a place for a generic SQL interface beyond RDBs.

wy100101 on 1 Aug 2018

I assume that even if some database can be treated as SQL and accessed via the standard database/sql API, we still need to statically import the actual driver. Granted, this may be less maintenance than a dedicated SpanStorage implementation. However, now that the protobuf model has been merged, nothing is blocking us from moving on the storage plugin dev, e.g. using something like the HashiCorp gRPC plugin framework.
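The static-import point can be seen with the standard library alone. In the sketch below no driver package is imported, so `sql.Open` has nothing registered under the requested name and returns an error. The driver name `postgres` and the DSN are placeholders chosen for illustration, and `driverRegistered` and `openError` are hypothetical helpers.

```go
package main

import (
	"database/sql"
	"fmt"
)

// driverRegistered reports whether a driver with the given name has been
// compiled into this binary and registered with database/sql.
func driverRegistered(name string) bool {
	for _, d := range sql.Drivers() {
		if d == name {
			return true
		}
	}
	return false
}

// openError returns the error from sql.Open for the given driver name.
// sql.Open fails immediately when the driver name is unknown.
func openError(name string) error {
	_, err := sql.Open(name, "example-dsn") // placeholder DSN, never dialed
	return err
}

func main() {
	// No driver package is blank-imported in this file, so the lookup fails.
	fmt.Println("registered:", driverRegistered("postgres"))
	fmt.Println("open error:", openError("postgres"))
}
```

Adding a blank import of a driver package would register it at init time, which is why each supported SQL database would still have to be compiled into the binary.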

yurishkuro on 1 Aug 2018

~~Our model is sufficiently simple to warrant looking into using an ORM to support a large number of backends. I'll take a look at what's available.~~ Reread above and understand what @yurishkuro means.

isaachier on 1 Aug 2018

Giving my two cents: ANSI SQL could work for small workloads, so it may be useful for lower-throughput applications that still want to benefit from this tool.

I will also throw out there that Timescale (a Postgres extension) may be a good fit for the required high write throughput.

bruth on 7 Aug 2018

👍12

ClickHouse is a high-performance SQL store that is very efficient for log and trace storage, and it would be a perfect alternative to the original Cassandra backend. It is a true column database: distributed and compressed.

Its query language is close to CQL (an SQL-like query language); it uses an SQL-like language too.

https://clickhouse.yandex/

mcarbonneaux on 27 May 2019

👍3

I just thought that I'd drop something here to say that there is also support for using Couchbase as a storage backend (via the grpc plugin), currently at https://github.com/chvck/couchbase-jaeger-storage-plugin. Will likely move to the couchbase-labs organisation in time.