Jaeger: Additional storage backends

Created on 8 Jan 2018 · 19 Comments · Source: jaegertracing/jaeger

enhancement storage

Most helpful comment

My two cents: an ANSI SQL backend could work for small workloads, so it may be useful for lower-throughput applications that still want to benefit from this tool.

I will also throw out there that Timescale (a Postgres extension) may be a good fit for the required high write throughput.

All 19 comments

Did you remove the flags for elasticsearch in jaeger-collector? I'm running a test with the Docker image, whose version is:

{"gitCommit":"dbd5db721fc59431b1e64874cc7d6265d89ec917","GitVersion":"v1.1.0","BuildDate":"2018-01-08T21:56:21Z"}

and I cannot see the elasticsearch flags.

It looks like you're using latest instead of 1.1. We recently moved some of the flags around so that we can better support plugins: https://github.com/jaegertracing/jaeger/pull/625. With latest, you have to set the env variable SPAN_STORAGE=elasticsearch to get the elasticsearch flags. I'd recommend that you use 1.1, since this change will be a part of 1.2 and will be documented at that time.
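To illustrate the idea in the reply above, here is a minimal Go sketch of a binary that registers only the flags of the backend selected through an environment variable. This is not Jaeger's actual code: the function name newFlagSet, the flag names, their defaults, and the choice to treat an empty value as cassandra are all assumptions made for illustration. Only the variable name SPAN_STORAGE is taken from the comment.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// storageEnvVar is the variable name quoted in the comment above.
const storageEnvVar = "SPAN_STORAGE"

// newFlagSet registers only the flags that belong to the selected backend.
// Flag names and defaults are illustrative assumptions, and so is the
// decision to fall back to cassandra when the variable is unset.
func newFlagSet(storage string) (*flag.FlagSet, error) {
	fs := flag.NewFlagSet("collector-sketch", flag.ContinueOnError)
	switch storage {
	case "", "cassandra":
		fs.String("cassandra.servers", "127.0.0.1", "illustrative flag")
	case "elasticsearch":
		fs.String("es.server-urls", "http://127.0.0.1:9200", "illustrative flag")
	default:
		return nil, fmt.Errorf("unknown storage type %q", storage)
	}
	return fs, nil
}

func main() {
	fs, err := newFlagSet(os.Getenv(storageEnvVar))
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	// List the flags that are visible for the selected backend.
	fs.VisitAll(func(f *flag.Flag) { fmt.Println("--" + f.Name) })
}
```

In a layout like this, the elasticsearch flags simply do not exist unless the variable is set, which would match the symptom reported in the question.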

Thanks for the reply. Yes, I was using the latest version. I will use 1.1.

I would love to see a SQL option (whichever ANSI SQL flavor brings the least vendor lock-in).
Setting up Cassandra / ElasticSearch might be too ambitious for projects that want distributed tracing but honestly don't have the TPS to warrant a distributed datastore.

Since I work with PostgreSQL, I sure wouldn't complain. But honestly I'm not sure a SQL db is an optimal store for largely free-form metrics of this nature. PostgreSQL at least offers the jsonb type for indexable free-form data. If you're trying to do this in a vendor neutral way you'll land up with your own json blobs, or doing EAV, and both of those are terrible. ANSI SQL is a poor fit for variable-structured or key/value form data and you'll need some vendor extensions to get usable performance.
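A rough Go sketch of the two vendor-neutral shapes mentioned above, assuming a simplified span with string-only tags. The Span and TagRow types and both function names are hypothetical and are not part of Jaeger's model.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Span is a simplified, hypothetical span with free-form tags.
type Span struct {
	TraceID string
	SpanID  string
	Tags    map[string]string
}

// TagRow is one row of a hypothetical EAV table: (entity, attribute, value).
type TagRow struct {
	SpanID string
	Key    string
	Value  string
}

// toEAV flattens the tags into one row per tag, sorted by key so the
// result is deterministic.
func toEAV(s Span) []TagRow {
	keys := make([]string, 0, len(s.Tags))
	for k := range s.Tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	rows := make([]TagRow, 0, len(keys))
	for _, k := range keys {
		rows = append(rows, TagRow{SpanID: s.SpanID, Key: k, Value: s.Tags[k]})
	}
	return rows
}

// toJSONBlob serializes all tags into a single document, the shape one
// would store in a text column (or in a jsonb column on PostgreSQL).
func toJSONBlob(s Span) (string, error) {
	b, err := json.Marshal(s.Tags)
	return string(b), err
}

func main() {
	s := Span{TraceID: "t1", SpanID: "s1", Tags: map[string]string{
		"http.method":      "GET",
		"http.status_code": "500",
		"error":            "true",
	}}
	fmt.Println(len(toEAV(s)), "EAV rows for one span")
	blob, _ := toJSONBlob(s)
	fmt.Println(blob)
}
```

With EAV, every tag becomes its own row and finding spans by several tags means one self-join per tag. With a plain JSON blob, the database cannot index individual keys without a vendor extension such as jsonb.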

But you inevitably land up with someone putting an ORM on top to "abstract" the DB. Then the ORM performs terribly, gobbles memory and everyone says "the SQL backend is slow, use instead".

Related issue to this one is https://github.com/jaegertracing/jaeger/issues/551. Upvote if you are interested in it.

We are looking at using BigQuery as a storage layer. Presumably this could work with a SQL storage option. SQL can be a generic way to deal with columnar data stores. I would complain about a BigQuery-specific solution, but I think there is a place for a generic SQL interface beyond RDBs.

I assume that even if some database can be treated as SQL and accessed via the standard database/sql API, we still need to statically import the actual driver. Granted, this may be less maintenance than a dedicated SpanStorage implementation. However, now that the protobuf model has been merged, nothing is blocking us from moving on with the storage plugin development, e.g. using something like the hashicorp grpc plugin framework.
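The static-import point can be sketched with the standard library alone. Here stubDriver is a hypothetical stand-in for a real driver package; a real driver calls sql.Register from its own init function, which is why it has to be compiled into the binary, usually through a blank import. The names stubDriver, "stubdb", "notlinked", and openStore are assumptions for illustration only.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// stubDriver is a hypothetical stand-in for a real database driver.
type stubDriver struct{}

// Open is the only method database/sql requires of a driver.
func (stubDriver) Open(name string) (driver.Conn, error) {
	return nil, errors.New("stubdb: connections are not implemented in this sketch")
}

func init() {
	// A real driver package does this in its own init, which runs only
	// if the package is imported into the binary.
	sql.Register("stubdb", stubDriver{})
}

// openStore fails if no driver with the given name is compiled in.
// sql.Open does not connect yet, so success only proves registration.
func openStore(driverName, dsn string) (*sql.DB, error) {
	return sql.Open(driverName, dsn)
}

func main() {
	db, err := openStore("stubdb", "")
	if err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	defer db.Close()
	fmt.Println("stubdb: driver is registered")

	// No package registered this name, so database/sql rejects it.
	if _, err := openStore("notlinked", ""); err != nil {
		fmt.Println("notlinked:", err)
	}
}
```

So a generic SQL backend would still ship with a fixed set of drivers chosen at build time, which is the limitation an out-of-process plugin approach avoids.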

Our model is sufficiently simple to warrant looking into using an ORM to support a large number of backends. I'll take a look at what's available. Reread above and understand what @yurishkuro means.

ClickHouse is a high-performance SQL store that is very efficient for log and trace storage, and it would be a perfect alternative to the original Cassandra backend. It is a true column DB: distributed, compressed...

It is also close to CQL (the SQL-like query language), since it uses an SQL-like language too...

https://clickhouse.yandex/

I just thought that I'd drop something here to say that there is also support for using Couchbase as a storage backend (via the grpc plugin), currently at https://github.com/chvck/couchbase-jaeger-storage-plugin. Will likely move to the couchbase-labs organisation in time.

Has someone started to work on Azure CosmosDB integration? It has support for Cassandra API, but I couldn't manage to make it work...

I just created an issue proposing Chronowave as storage backend. https://github.com/jaegertracing/jaeger/issues/2534

What about ClickHouse? Clickhouse is very cool

It's already linked in the issue's description, but here's the tracking issue for it: #1438

What about Apache Solr?

With the recent changes to ElasticSearch licensing, this just became SUPER important.
