Jaeger: Support archiving traces with ES storage

Created on 15 May 2018 · 18 Comments · Source: jaegertracing/jaeger

Requirement - what kind of business use case are you trying to solve?

We use the ES backend, and would like to be able to archive traces

Problem - what in Jaeger blocks you from solving the requirement?

Archiving is only supported by the Cassandra storage plugin

Proposal - what do you suggest to solve the problem or improve the existing situation?

I briefly looked at the implementation of archiving. It looks like most of the logic is built on things that already existed, and that it would be fairly straightforward to add an ES implementation. Two options come to mind:
1) Allow configuring a second ES cluster for archiving (the same way we do for Cassandra)
2) Use a separate index for archive traces. The esCleaner.py script would need to ignore this index.
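A minimal sketch of what "ignore this index" could look like in a cleanup script for option (2). This is not the actual esCleaner.py logic: the function name, the archive index name `jaeger-span-archive`, and the assumption that daily indices end in `YYYY-MM-DD` are all illustrative.

```python
from datetime import date, timedelta

ARCHIVE_INDEX = "jaeger-span-archive"  # hypothetical archive index name


def indices_to_delete(index_names, today, keep_days, archive_index=ARCHIVE_INDEX):
    """Return dated indices older than keep_days, never touching the archive index."""
    cutoff = today - timedelta(days=keep_days)
    doomed = []
    for name in index_names:
        if name == archive_index or name.startswith(archive_index + "-"):
            continue  # archive data is exempt from cleanup
        # Assumption: daily indices end in YYYY-MM-DD, e.g. jaeger-span-2018-05-15.
        try:
            day = date.fromisoformat(name[-10:])
        except ValueError:
            continue  # not a dated index, leave it alone
        if day < cutoff:
            doomed.append(name)
    return doomed
```

The cleaner would then delete only the returned names, so an archive index survives regardless of its age.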

Any open questions to address

It seems like we'd need a different way of dividing up the indexes to support this if we go with option (1) above. Currently we create an index for each day, which probably doesn't make sense for archiving. Option (2) above would solve this inherently.

enhancement, help wanted, storage/elasticsearch

nziebart

👍1

All 18 comments

I think this needs both 1 and 2 simultaneously. Probably requires #799 and #628 to be implemented first (per cluster), and then add ability to specify archiving cluster.

yurishkuro on 15 May 2018

Is there a specific motivation for wanting a separate cluster for archiving, rather than just a separate keyspace/index?

nziebart on 16 May 2018

It doesn't have to be a separate cluster, but it doesn't have to be the same cluster either. The configuration is such that archive storage inherits most settings from primary storage, and you can override some things.
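The inheritance described here can be pictured as a simple merge of settings. The sketch below is only an illustration of that idea; the setting keys (`server_urls`, `username`, `index_prefix`) are hypothetical and are not Jaeger's actual option names.

```python
def archive_config(primary, overrides):
    """Start from the primary storage settings and apply explicit archive overrides."""
    merged = dict(primary)  # copy, so the primary settings are left untouched
    for key, value in overrides.items():
        if value is not None:  # an unset override falls back to the primary value
            merged[key] = value
    return merged


# Hypothetical settings: same credentials, different cluster for the archive.
primary = {"server_urls": "http://es-main:9200", "username": "jaeger", "index_prefix": ""}
archive = archive_config(primary, {"server_urls": "http://es-archive:9200", "username": None})
```

With no overrides at all, the archive storage would simply point at the same cluster as the primary storage.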

yurishkuro on 16 May 2018

I am looking into this

pavolloffay on 15 Nov 2018

@yurishkuro Are the archived traces shown on the /search page? It seems that an archived trace can only be accessed directly via the /trace/id endpoint.

pavolloffay on 15 Nov 2018

If the above is true, we don't want to create an index for service names, so we will have to make changes to the span writer and reader. I also assume that we will create only one archive index per deployment.

To support multiple tenants, the index prefix will simply be put in front of the archive index name, e.g. tenant:jaeger-span-archive or tenant:jaeger-span-1970-01-01 :)
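A small sketch of the naming scheme from the examples above. The function name and signature are hypothetical; the `:` separator and the `jaeger-span-` base name are taken from the example index names in this comment.

```python
from datetime import date


def span_index_name(day, prefix="", archive=False):
    """Build a span index name, optionally tenant-prefixed, for daily or archive storage."""
    suffix = "archive" if archive else day.isoformat()  # archive index carries no date
    name = f"jaeger-span-{suffix}"
    return f"{prefix}:{name}" if prefix else name
```

For example, `span_index_name(date(1970, 1, 1), prefix="tenant")` yields `tenant:jaeger-span-1970-01-01`, and passing `archive=True` yields `tenant:jaeger-span-archive`.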

pavolloffay on 15 Nov 2018

> @yurishkuro Are the archived traces shown on the /search page? It seems that an archived trace can only be accessed directly via the /trace/id endpoint.

Yes, it only works for direct lookups by ID. It's primarily built to support long-lived hyperlinks that people can put in tickets, postmortem docs, etc.

yurishkuro on 16 Nov 2018

👍1

We should also think about retention for the archive index. We could do it per time period as proposed in #628 (day, month, year), or just allow using a different index name, e.g. `jaeger-span-archive-2`. It might also be doable with a prefix.

Note that deleting data from within an index is not an expected usage pattern in ES.

pavolloffay on 16 Nov 2018

^^ cc @jaegertracing/elasticsearch

pavolloffay on 16 Nov 2018

Just as a counter-point, we would not use the native Jaeger archive at all.

Instead, we currently rely on our own elasticsearch-curator configuration to route indices from "hot nodes" to "warm nodes". Specifically in our Kube+AWS deployment, this means moving from Elasticsearch nodes sized as r4.4xlarge with gp2 EBS volumes to r4.xlarge nodes with st1 EBS volumes.

This might be a better recommendation for the Jaeger project, even though it is more complexity in the Elasticsearch deployment and surrounding tooling itself.

An older blog post, though still largely relevant in Elasticsearch 6.x, which further explains this architecture: "Hot-Warm" Architecture in Elasticsearch 5.x
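For readers unfamiliar with the hot-warm setup: the move between node tiers is usually expressed as an index-level shard allocation setting that curator (or any client) applies to older indices via `PUT /<index>/_settings`. A minimal sketch of that settings body follows; the node attribute name `box_type` and the value `warm` are assumptions and must match whatever custom attribute the Elasticsearch nodes were started with.

```python
def warm_allocation_settings(attribute="box_type", value="warm"):
    """Settings body that requires an index's shards to live on nodes tagged attribute=value."""
    # index.routing.allocation.require.<attr> is Elasticsearch's allocation filtering setting;
    # the attribute name and value themselves are deployment-specific assumptions.
    return {f"index.routing.allocation.require.{attribute}": value}
```

Applying this body to an index that has aged out of the "hot" window causes Elasticsearch to relocate its shards to the matching nodes.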

I guess, specifically, this feels like possibly the wrong way to fix a performance limitation/regression in the Elasticsearch storage backend. If given time-bounding on the query, it will "automatically" optimize which indexes need to be scanned instead of walking the entire available data-set of spans.
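To illustrate the time-bounding point: with one index per day, a query's time range maps directly to a small set of index names. The sketch below assumes the daily naming pattern `jaeger-span-YYYY-MM-DD` seen earlier in the thread; the function name is hypothetical.

```python
from datetime import date, timedelta


def daily_indices(start, end, prefix="jaeger-span-"):
    """List the daily index names covering the inclusive range start..end."""
    days = (end - start).days
    # An inverted range (end before start) produces an empty list.
    return [prefix + (start + timedelta(days=i)).isoformat() for i in range(days + 1)]
```

A query bounded to two days would touch only two indices, no matter how much older data is retained.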

masteinhauser on 20 Nov 2018

This is all new to me, so I will have to experiment and do some reading. But it seems we could have one archive index (perhaps as an alias). This alias would point to one write index (archive-3) and several read indices (archive-1, archive-2). The write index would be rolled over (based on conditions such as shards or time?) and moved to the read indices. I am not sure if the rolled-over index can be automatically assigned to another alias.

pavolloffay on 20 Nov 2018

After playing with the rollover API, here is my proposal for how we could go forward and use it for the archive index. If it works well, we could start experimenting with it for the main indices.

First, a brief explanation of the rollover API:

- It rolls over to a new index if the old one matches any of the given conditions: age, number of documents, or size.
- The API has to be called explicitly; rollover does not happen automatically once it is set up.
- The API returns the name of the new index.
- The name of the new index can be specified, e.g. name-%counter or name-%date-%counter.

The great news is that ES >= 6.4.4 supports is_write_index (https://www.elastic.co/guide/en/elasticsearch/reference/6.4/indices-aliases.html#aliases-write-index, https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html#indices-rollover-is-write-index), which allows using an alias for writes even when it points to multiple indices. The write index in the alias has is_write_index:true. With rollover, the old index stays in the alias as read-only. This simplifies readers and external tooling, since the old index no longer has to be added to a separate read alias.
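A sketch of the two request bodies involved, built as plain dictionaries so no client library is assumed. The helper names and the index/alias names are illustrative; the body shapes follow the `POST /_aliases` and `POST /<alias>/_rollover` APIs linked above.

```python
def add_to_alias_action(index, alias, is_write_index=None):
    """Body for POST /_aliases that adds one index to an alias."""
    add = {"index": index, "alias": alias}
    if is_write_index is not None:
        add["is_write_index"] = is_write_index  # only sent when explicitly requested
    return {"actions": [{"add": add}]}


def rollover_request(alias, max_age=None, max_docs=None, max_size=None):
    """Path and body for POST /<alias>/_rollover; the index rolls if any condition matches."""
    conditions = {"max_age": max_age, "max_docs": max_docs, "max_size": max_size}
    body = {"conditions": {k: v for k, v in conditions.items() if v is not None}}
    return f"/{alias}/_rollover", body


# Hypothetical names: bootstrap the first archive index as the write index of the alias.
bootstrap = add_to_alias_action("jaeger-span-archive-000001", "jaeger-span-archive", True)
path, body = rollover_request("jaeger-span-archive", max_age="7d", max_docs=1000)
```

Because rollover is not automatic, something external (a cronjob, for example) still has to send the rollover request periodically.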

My proposal is to allow using two archive indices: one for writes and one for reads. By default, the read index would be the same as the write index. This satisfies simple deployments with no extra configuration, while more complex deployments would have to create the archive aliases (or a single alias if ES6 is used) before deploying and use a cronjob with rollover.

cc @jaegertracing/elasticsearch any feedback is welcome.

The last thing we have to figure out is how to call rollover. We could use curator with a cronjob. Curator would call rollover, parse the response, and (on ES5) put the old index into the read alias (I am not sure whether the rollover operation in curator returns an object with index info).
{"old_index":"jaeger-span-archive-000001","new_index":"jaeger-span-archive-000002","rolled_over":true,"dry_run":false,"acknowledged":true,"shards_acknowledged":true,"conditions":{"[max_age: 1s]":true,"[max_docs: 1]":false}}
I am linking an issue regarding automatic rollover API https://github.com/elastic/elasticsearch/issues/26092.
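A sketch of the "parse the response and put the old index into the read alias" step, using the rollover response shown above. It only builds the `POST /_aliases` body; the function name and the read alias name `jaeger-span-archive-read` are hypothetical, and whether curator exposes this response is the open question mentioned in the comment.

```python
import json


def read_alias_actions(rollover_response, read_alias):
    """Return a POST /_aliases body adding the rolled-over index to the read alias, or None."""
    resp = json.loads(rollover_response)
    if not resp.get("rolled_over"):
        return None  # no condition matched, so there is no old index to move
    return {"actions": [{"add": {"index": resp["old_index"], "alias": read_alias}}]}


response = (
    '{"old_index":"jaeger-span-archive-000001","new_index":"jaeger-span-archive-000002",'
    '"rolled_over":true,"dry_run":false,"acknowledged":true,"shards_acknowledged":true,'
    '"conditions":{"[max_age: 1s]":true,"[max_docs: 1]":false}}'
)
actions = read_alias_actions(response, "jaeger-span-archive-read")
```

On ES versions with is_write_index, this step would be unnecessary because the old index stays in the same alias.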

pavolloffay on 26 Nov 2018

👍1

NB: are you only thinking of using this for archives? It seems useful for the main storage as well, since we're currently issuing queries over multiple indices, rather than a single alias, which would simplify the code & configuration.

yurishkuro on 26 Nov 2018

My plan is to start with the archive index and then add option for the main storage.

pavolloffay on 26 Nov 2018

that's fair, although in Cassandra there is no difference between main/archive storage implementations, just a configuration, so you would still need to make changes to the ES storage impl - are you thinking a fork or a feature flag?

yurishkuro on 26 Nov 2018

A good question. I think a feature flag, since there will be a lot of similarities. Only the functions that get the index names should be different; maybe we could resolve that function in the constructor.
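A sketch of the "resolve that function in the constructor" idea. Jaeger's storage code is written in Go; Python is used here only to keep the illustration short, and the class, flag, and function names are hypothetical.

```python
from datetime import date, timedelta


def daily_span_indices(start, end):
    """Main storage: one index per day in the inclusive range start..end."""
    days = (end - start).days
    return ["jaeger-span-" + (start + timedelta(days=i)).isoformat() for i in range(days + 1)]


def archive_span_indices(start, end):
    """Archive storage: a single index (or alias), so the time range is ignored."""
    return ["jaeger-span-archive"]


class SpanReader:
    def __init__(self, archive=False):
        # The feature flag is consulted once; the rest of the reader is shared.
        self.span_indices = archive_span_indices if archive else daily_span_indices

    def indices_for(self, start, end):
        return self.span_indices(start, end)
```

The same reader code then serves both storages, and only the index-name resolution differs.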

My main blocker here is how to get the old index name after rollover and put it into the read alias. I will have to play with curator.

pavolloffay on 26 Nov 2018

@pavolloffay Would archiving traces only be supported with ES >= 6.4.4?