I want to use both the Thanos receive and Thanos ruler components, but because Thanos ruler only stores rule results locally and ships the data to remote storage such as S3, the ruler node is a single node with an HA risk.
In my opinion, if Thanos ruler implemented a remote write feature, we could write the ruler data to Thanos receive with a delay of a few seconds, and Thanos ruler would then be a stateless component (provided we can accept losing a few minutes of data, which rule failover might reduce).
The state of the ruler wouldn't disappear. The tsdb would continue to need to be a local persistent buffer. I don't think we would actually gain anything from this unfortunately.
If you are looking for HA, you should probably run the Thanos ruler twice, as an identical pair.
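To make the "identical pair" suggestion concrete, here is a minimal sketch of how results from two ruler replicas could be merged at query time. It is a deliberately simplified illustration, not the deduplication algorithm Thanos itself uses: the replica label name `rule_replica`, the `deduplicate` helper, and the sample data are all assumptions made up for this example.

```python
from typing import Dict, List, Tuple

Labels = Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs, usable as a dict key
Sample = Tuple[int, float]            # (timestamp_ms, value)
Series = Tuple[Dict[str, str], List[Sample]]


def strip_replica(labels: Dict[str, str], replica_label: str) -> Labels:
    """Drop the replica label so both replicas map to the same series key."""
    return tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))


def deduplicate(series: List[Series],
                replica_label: str = "rule_replica") -> Dict[Labels, List[Sample]]:
    """Merge series that differ only in the (hypothetical) replica label.

    Simplification: for each timestamp, the first replica seen wins.
    """
    merged: Dict[Labels, Dict[int, float]] = {}
    for labels, samples in series:
        bucket = merged.setdefault(strip_replica(labels, replica_label), {})
        for ts, value in samples:
            bucket.setdefault(ts, value)
    return {key: sorted(bucket.items()) for key, bucket in merged.items()}


# Replica "a" missed the evaluation at t=3000, replica "b" missed t=1000.
replica_a = ({"__name__": "job:up:sum", "job": "api", "rule_replica": "a"},
             [(1000, 3.0), (2000, 3.0)])
replica_b = ({"__name__": "job:up:sum", "job": "api", "rule_replica": "b"},
             [(2000, 3.0), (3000, 4.0)])

result = deduplicate([replica_a, replica_b])
```

Each replica still evaluates every rule, so the work is done twice; the merge only hides the gaps that one replica leaves when it is down.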
@brancz yes, the state wouldn't disappear, but this would help make the ruler a more lightweight component, right? I think this situation is similar to the relationship between Prometheus and Thanos receive: Prometheus does not act as a fully stateless component, but we can treat it as stateless because of Thanos receive.
@FUSAKLA yes, I thought of this before, but I think there are some disadvantages:
query -> exec -> write in the whole system, with Thanos receive and Thanos store acting as the query components. That sounds clearer, doesn't it?
Yes, HA always comes at some price. The rules will be evaluated twice, but that is similar to Prometheus HA, where you do the scraping and evaluation twice as well. But I wouldn't say adding the ruler StoreAPI to the cluster is that much of a burden, and even less so if you use service discovery. Still, remote write won't help you with HA.
But I don't see a reason not to have remote write in the ruler eventually. I agree it can make the component more lightweight, the same way as for a Prometheus with a sidecar. But the benefits are quite small right now, IMHO.
I think the requirements should be quite similar to why we created the Thanos receive component.
Thanos had the sidecar, query and store components before. These components work well and already seem to give Prometheus distributed capabilities. But we have now created the receive component. I think it conflicts a little with the sidecar and query components; however, we created it, and the concept looks clearer and more meaningful.
In my opinion, we can separate the components into several parts in the whole architecture as below:
- Prometheus and ruler
- compact
- receive and store
Each part should focus on the main function it provides; meanwhile, the same function should be kept unified so that we can improve it continuously. For example, writing to remote storage should be done by the receive component, raw data (both collected from exporters and calculated from rules) should be delivered to the receive component as soon as possible, and hot data queries should be served by receive and not by any other part.
So this is why I want the ruler to implement the remote write feature. I think it would make the system overview clearer and keep each part simple.
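The proposed flow (evaluate a rule, then push the result to receive) and the earlier objection that a local buffer is still needed can both be seen in a small sketch. Everything here is hypothetical: `BufferedForwarder` and `send` are stand-ins invented for illustration, not Thanos or Prometheus APIs, and the buffer is kept in memory only to keep the example short.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Sample = Tuple[str, int, float]  # (series name, timestamp_ms, value)


class BufferedForwarder:
    """Hypothetical stand-in for a ruler that forwards rule results."""

    def __init__(self, send: Callable[[List[Sample]], bool]) -> None:
        self._send = send                      # returns True once the receiver acknowledged
        self._pending: Deque[Sample] = deque()

    def record(self, sample: Sample) -> None:
        """Queue a freshly evaluated rule result."""
        self._pending.append(sample)

    def flush(self) -> int:
        """Try to deliver everything queued; keep it all if the send fails."""
        if not self._pending:
            return 0
        batch = list(self._pending)
        if not self._send(batch):
            return 0
        self._pending.clear()
        return len(batch)

    @property
    def pending(self) -> int:
        return len(self._pending)


received: List[Sample] = []
state = {"up": False}  # simulated availability of the receiving side


def send(batch: List[Sample]) -> bool:
    if not state["up"]:
        return False
    received.extend(batch)
    return True


fwd = BufferedForwarder(send)
fwd.record(("job:up:sum", 1000, 3.0))
first = fwd.flush()          # receiver is down, nothing is delivered
held = fwd.pending           # the sample is still queued locally

state["up"] = True
fwd.record(("job:up:sum", 2000, 3.0))
second = fwd.flush()         # both samples are delivered in one batch
```

Whatever sits in the queue is lost if the process dies before the receiver acknowledges it, which is why the buffer would have to be persistent in practice and why the ruler does not become fully stateless.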
Running and scaling receive is somewhat involved. I would hate for that kind of involvement to be necessary to use the component when people just use the sidecar approach and otherwise don't need the receive component.
For now I'm going to close this. We might rethink this at a later point, but at the moment we don't feel this is the right strategy for the project. Thanks a lot for starting the discussion! :)