Thanos: ruler: how about implementing the Prometheus remote write feature in Thanos ruler?

Created on 6 Nov 2019  ·  10 Comments  ·  Source: thanos-io/thanos

I want to use both the Thanos receive and Thanos ruler components. However, because Thanos ruler only stores rule results locally and ships the data to remote storage such as S3, the ruler node is a single node and therefore an HA risk.

In my opinion, if Thanos ruler implemented the remote write feature, we could write the ruler data to Thanos receive with a delay of a few seconds, and Thanos ruler would then be a stateless component (provided we can accept a few minutes of data loss, which could perhaps be reduced by rule failover).


All 10 comments

The state of the ruler wouldn't disappear. The TSDB would still need to act as a local persistent buffer. Unfortunately, I don't think we would actually gain anything from this.

If you are looking for HA, you should probably run Thanos ruler as an identical pair.

@brancz yes, the state wouldn't disappear, but this would help make the ruler a more lightweight component, right? I think the situation is similar to the relationship between Prometheus and Thanos receive: Prometheus does not act as a fully stateless component, but we can treat it as stateless because of Thanos receive.

@FUSAKLA yes, I thought about this before, but I think there are some disadvantages:

  1. Evaluating the rules in an identical pair, twice or more, increases the system load.
  2. It may add fan-out load on the Thanos query side: the data is stored on the ruler component, so we have to add Thanos ruler to Thanos query.
  3. It is not simple enough for Thanos ruler. I think Thanos ruler should focus on query -> exec -> write within the whole system, while Thanos receive and Thanos store act as the query components. That sounds clearer, doesn't it?

Yes, HA always comes at a price. The rules will be evaluated twice, but that is similar to Prometheus HA, where you do the scraping and evaluation twice as well. I wouldn't say adding the ruler StoreAPI to the cluster is that much of a burden, and even less so if you use service discovery. Still, remote write won't help you with HA.

But I don't see a reason not to have remote write in the ruler eventually. I agree it can make the component more lightweight, in the same way as for a Prometheus with a sidecar. But the benefits are quite small for now, IMHO.
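
To illustrate the HA-pair approach discussed above: when two identical rulers evaluate the same rules, each can tag its output with a replica label, and the query layer can then treat series that differ only in that label as one series (Thanos query exposes this through its `--query.replica-label` flag). The sketch below is a simplified model of that idea, not Thanos's actual deduplication algorithm. The label name `rule_replica`, the helper names, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical replica label name; each ruler in the pair would set a different value.
REPLICA_LABEL = "rule_replica"


def dedup_key(labels, replica_label=REPLICA_LABEL):
    """Return a hashable series identity with the replica label dropped."""
    return tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))


def merge_replicas(series_list, replica_label=REPLICA_LABEL):
    """Merge series that differ only in the replica label.

    series_list is a list of (labels, samples) pairs, where samples is a list
    of (timestamp, value) tuples. For each timestamp the first sample seen
    wins, so a gap in one replica is filled by the other.
    """
    merged = defaultdict(dict)
    for labels, samples in series_list:
        key = dedup_key(labels, replica_label)
        for ts, value in samples:
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


# Made-up data: replica "a" missed the evaluation at t=30, replica "b" missed t=10.
replica_a = ({"__name__": "job:up:sum", "rule_replica": "a"}, [(10, 3.0), (20, 3.0)])
replica_b = ({"__name__": "job:up:sum", "rule_replica": "b"}, [(20, 3.0), (30, 4.0)])

result = merge_replicas([replica_a, replica_b])
```

Under these assumptions the two replicas collapse into a single series covering all three timestamps, which is why the pair tolerates the loss of one ruler at the cost of evaluating every rule twice.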

I think the requirements are quite similar to the reasons we created the Thanos receive component.
Thanos already had the sidecar, query and store components. These components work well, and they already seemed to give Prometheus distributed capabilities. Yet we have now created the receive component. I think it conflicts a little with the sidecar and query components, but we created it anyway, and the concept looks clearer and more meaningful.

In my opinion, we can separate the components into several parts within the whole architecture, as below:

  1. generate: Prometheus and ruler
  2. store: remote storage and compact
  3. read: receive and store

Each part should focus on the main function it provides. Meanwhile, the same function should stay unified so that we can improve it continuously. For example, writing to remote storage should be done by the receive component, raw data (both data collected from exporters and data calculated from rules) should be delivered to the receive component as soon as possible, and hot data queries should be served by receive and not by any other part.

This is why I want the ruler to implement the remote write feature. I think it would make the system overview clearer and keep each part simple.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Running and scaling receive is somewhat involved. I would hate for that kind of involvement to be necessary to use the component when people just use the sidecar approach and otherwise don't need the receive component.

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

For now I'm going to close this. We might rethink this at a later point, but at the moment we don't feel this is the right strategy for the project. Thanks a lot for starting the discussion! :)
