Thanos: Is there a way I can do a Global Scale Thanos Deployment?

Created on 1 Aug 2018 · 9 comments · Source: thanos-io/thanos

I want to be able to use an ingress in --cluster.peers to discover all peers inside another cluster. Is this possible? Or do I need to create an ingress for each Prometheus and Thanos store in my cluster?

I know I can use the service like --cluster.peers=thanos-peers.monitoring.svc.cluster.local:10900

I'm referencing slide 80 of your SlideShare presentation.

Also, an example of how you do this would be sweet!


All 9 comments

Sure, we can try to explain that (:

Global gossip:

  • You need direct access to the pods on IP level, so it is hard without VPC peering or VPN.

Using static --store option:

  • You can use some proxy (we are doing this via Envoy) to set up the connection between query and sidecars.

Using static --store + gossip option:

  • You can use some proxy to set up the connection between query and another query in separate, isolated cluster that uses gossip to connect to local sidecars.

So the answer to your title is: yes and we are doing that right now.

[Thanos query -> (static) -> Thanos query per each environment] -> (static + proxy) -> [ Thanos-query -> (gossip) local Thanos sidecars] 

Where things inside [ ] are in the same network.

--cluster.peers=thanos-peers.monitoring.svc.cluster.local:10900 works only if you can resolve this DNS entry and resolved IPs can be accessed.
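For the static --store approach, a minimal sketch of the global query layer's flags might look like this (the hostnames are hypothetical; each --store target is a per-cluster Thanos query gRPC endpoint made reachable through the proxy):

```yaml
# Hypothetical args for the global-layer query; each --store target
# is a per-cluster Thanos query gRPC endpoint exposed via a TCP proxy.
args:
- "query"
- "--query.replica-label=replica"
- "--store=thanos-query.cluster-a.example.org:10901"
- "--store=thanos-query.cluster-b.example.org:10901"
```

The global query then fans requests out to each cluster's local query, which in turn reaches its local sidecars over gossip.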

@Bplotka Thanks so much. So this is my setup...

  • I have 2 query pods and 2 store nodes per cluster.
  • I have an ingress controller for my peers service and my query service.
  • I tell my query pods about the other cluster's peers ingresses.

Is this right? What would you change? I'm using different buckets for each cluster, but was thinking I could just use the same one. I'm not able to resolve datapoints cross-cluster, but I can get the series names.

values.yaml

ingress:
  host: k8clusterA or k8clusterB or k8clusterC # this is a --set-value
query:
  peers:
  - "cluster.monitoring.k8clusterA:80"
  - "cluster.monitoring.k8clusterB:80"
  - "cluster.monitoring.k8clusterC:80"
  • Peers/Cluster
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: thanos-cluster
spec:
  rules:
  - host: cluster.monitoring.{{ .Values.ingress.host }}
    http:
      paths:
      - path: /
        backend:
          serviceName: thanos-peers
          servicePort: 10900
---
apiVersion: v1
kind: Service
metadata:
  name: thanos-peers
  namespace: monitoring
  labels:
    app: {{ template "thanos.name" . }}
    chart: {{ template "thanos.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: cluster
    port: 10900
    targetPort: cluster
  selector:
    thanos-peer: "true"
  • Query
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: thanos-query
spec:
  rules:
  - host: thanos.monitoring.{{ .Values.ingress.host }}
    http:
      paths:
      - path: /
        backend:
          serviceName: thanos-query
          servicePort: 9090
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-query
    chart: {{ template "thanos.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
  name: thanos-query
spec:
  externalTrafficPolicy: Cluster
  ports:
  - port: 9090
    protocol: TCP
    targetPort: http
    name: http-query
  - port: 10900
    targetPort: cluster
    name: cluster
  selector:
    component: thanos-query
  sessionAffinity: None
  type: NodePort
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-query
  labels:
    chart: {{ template "thanos.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  serviceName: "thanos-query"
  replicas: 2
  selector:
    matchLabels:
      app: thanos
      component: thanos-query
      thanos-peer: "true"
  template:
    metadata:
      labels:
        app: thanos
        component: thanos-query
        thanos-peer: "true"
    spec:
      containers:
      - name: thanos-query
        image: improbable/thanos:v0.1.0-rc.2
        args:
        - "query"
        - "--log.level=debug"
        - "--cluster.peers=thanos-peers.monitoring.svc.cluster.local:10900"
        - "--query.replica-label=replica"
        {{- range .Values.query.peers }}
        - "--cluster.peers={{ . }}"
        {{- end }}
        ports:
        - name: http
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        - name: cluster
          containerPort: 10900
  • Store
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  labels:
    chart: {{ template "thanos.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  serviceName: "thanos-store"
  replicas: 2
  selector:
    matchLabels:
      component: thanos-store
      thanos-peer: "true"
  template:
    metadata:
      labels:
        app: thanos
        component: thanos-store
        thanos-peer: "true"
    spec:
      containers:
      - name: thanos-store
        image: improbable/thanos:v0.1.0-rc.2
        env:
        - name: S3_SECRET_KEY
          value: <key>
        args:
        - "store"
        - "--log.level=debug"
        - "--tsdb.path=/prometheus"
        - "--cluster.peers=thanos-peers.monitoring.svc.cluster.local:10900"
        - "--s3.endpoint=s3.us-west-2.amazonaws.com"
        - "--s3.bucket=<bucket>"
        - "--s3.access-key=<key>"
        ports:
        - name: http
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        - name: cluster
          containerPort: 10900
        volumeMounts:
        - name: data
          mountPath: /prometheus
      volumes:
      - name: data
        emptyDir: {}

So I've been messing with this the last few days, and the major blocker I'm running into is that --cluster.advertise-address relies on an ip:port combination rather than host:port. Unless you can set a static IP to advertise so the other components can find their way to the peer, this looks impossible.

In your example you're using an ingress to the gossip port, but the nginx ingress is by default an HTTP layer, not TCP. You can piggyback on your ingress controller using the following method:
https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/exposing-tcp-udp-services.md
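For reference, exposing the gossip port through ingress-nginx's TCP support looks roughly like this (namespace and service names assumed from the manifests above):

```yaml
# tcp-services ConfigMap consumed by the nginx ingress controller;
# each key is the exposed port, each value is "<namespace>/<service>:<port>".
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  "10900": "monitoring/thanos-peers:10900"
```

Note that the controller must also be started with --tcp-services-configmap pointing at this ConfigMap, and the port has to be opened on the controller's own Service.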

@Bplotka How is your proxy configured? Is (static + proxy) just a store that can resolve IPs on both clusters, and those are its peers?

[Thanos query -> (static) -> Thanos query per each environment] -> (static + proxy) -> [ Thanos-query -> (gossip) local Thanos sidecars]

So basically it's networked like this?

Cluster A :

  1. Query
  2. Sidecars
  3. Store

Proxy:

  1. Store

Cluster B:

  1. Query
  2. Sidecars
  3. Store

Peers Configs:

  1. Cluster A Store Peers: CA-Store, CA-Sidecars, Proxy-Store
  2. Cluster B Store Peers: CB-Store, CB-Sidecars, Proxy-Store
  3. Proxy Peers: Proxy-Store, CA-Store, CB-Store

Then Query in Cluster A and B can point at their cluster's store using --store?

So right now I've dropped the store component. I'm on AWS and the work isn't finalized for S3; if my next hunch is right, this can be added.

For initial testing I've been removing all Kubernetes-based service discovery and going through my ingress controller or external services, in hopes it will scale to my multi-cluster environment.

I've tried configuring a TCP redirect on my ingress for 30900 -> the thanos-peers service. This would get every component in the cluster matching the label selector, and it looked like it worked in a single-peer setup. The problem is that the sidecars and the query API register the pod IP; I'm using Calico on AWS, and that IP is not accessible. That's probably not a problem with GKE or kubenet, where traffic can be routed.

Then I tried using an AWS Network Load Balancer, which advertises a static IP. My problem, though, is that as far as configuration goes this is a nightmare: I would need to provision it, then check the IP, then go back and update the app with the IPs. It really isn't ideal. This way might work if followed through, but I don't like it.

Currently I'm looking at the hostPort option for the cluster port. I'll have external-dns post records with each component's node IP and bind to the port. Now I control the node, and DNS is provisioned on resource creation. The final piece is telling the sidecar and query their advertised address, which I think can be done via an env var that maps its value from status.hostIP.

In the end it will look like this:
Cluster 1:
Node 1 (192.168.1.10)
Prom / Sidecar
Binding to Node's 10900
--cluster.peers="thanos-peers.cluster1.example.org"
--cluster.peers="thanos-peers.cluster2.example.org"
--cluster.advertise-address=$NODE_IP:10900
env:
- name: NODE_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
DNS record thanos-peers.cluster1.example.org -> 192.168.1.10

Node 2 (192.168.1.20)
Query 1
Binding to Node's 10900
--cluster.peers="thanos-peers.cluster1.example.org"
--cluster.peers="thanos-peers.cluster2.example.org"
--cluster.advertise-address=$NODE_IP:10900
env:
- name: NODE_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
DNS record thanos-peers.cluster1.example.org -> 192.168.1.20

Node 3 (192.168.1.30)
Query 2
Binding to Node's 10900
--cluster.peers="thanos-peers.cluster1.example.org"
--cluster.peers="thanos-peers.cluster2.example.org"
--cluster.advertise-address=$NODE_IP:10900
env:
- name: NODE_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
DNS record thanos-peers.cluster1.example.org -> 192.168.1.30

Cluster 2:
Node 1 (192.168.2.10)
Prom / Sidecar
Binding to Node's 10900
--cluster.peers="thanos-peers.cluster1.example.org"
--cluster.peers="thanos-peers.cluster2.example.org"
--cluster.advertise-address=$NODE_IP:10900
env:
- name: NODE_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
DNS record thanos-peers.cluster2.example.org -> 192.168.2.10
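The per-node wiring above can be sketched as a single container spec (a sketch under the assumptions above; hostPort binds the node's port and the downward API supplies the advertised address):

```yaml
# Sketch of one sidecar container using hostPort + the downward API
# (service/DNS names assumed from the setup described above).
containers:
- name: thanos-sidecar
  image: improbable/thanos:v0.1.0-rc.2
  args:
  - "sidecar"
  - "--cluster.peers=thanos-peers.cluster1.example.org:10900"
  - "--cluster.peers=thanos-peers.cluster2.example.org:10900"
  - "--cluster.advertise-address=$(NODE_IP):10900"
  env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  ports:
  - name: cluster
    containerPort: 10900
    hostPort: 10900  # binds the node's 10900 so the DNS record resolves to a live peer
```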

I realize this is SUPER verbose and doesn't answer your question exactly. I'm also not finished testing this method to confirm it will work, but it's the last idea I've come up with for my stack. I hope it provides some direction for you.

Regarding the mention of a proxy above, I'm not entirely sure how that resolves anything. I'm rolling out Istio currently and I'm not certain how it would bridge the gap, unless their Envoys are communicating cross-cluster. That's cool for them, but as of now our pod network is isolated per cluster. I still stand by the biggest pain in the ass regarding Thanos + Kubernetes being its reliance on a static IP, which in the cloud is often... difficult.

I'll update tomorrow when I've had some sleep and a chance to test these assumptions.

Best of luck

So my final working configuration is Thanos query running in one cluster. I set my sidecars to point to my ingress controller and map them to my node port. Then when I run my sidecars they advertise their host's port with the above-mentioned configuration. Technically all node ports will route traffic, but by using NODE_IP you avoid extra hops.

- args:
  - query
  - --log.level=debug
  - --query.replica-label=ENVIRONMENT
  - --cluster.advertise-address=$(NODE_IP):10900
  - --grpc-advertise-address=$(NODE_IP):10901
  env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.hostIP

So the sequence I see is: sidecar gossips to the query API -> the API tries to connect via gRPC. I'm not 100% sure whether the sidecar will initiate a connection to the query API via gRPC. This should also work with the store API; I'm going to look at adding it next week.

Currently I am deploying it; I found a solution in k8s.

1. Advertise the nodePort:

        args:
        - "sidecar"
        - "--log.level=debug"
        - "--tsdb.path=/var/prometheus"
        - "--prometheus.url=http://127.0.0.1:9090"
        - "--cluster.peers=$(PEERS)"
        - "--reloader.config-file=/etc/prometheus/prometheus.yml.tmpl"
        - "--reloader.config-envsubst-file=/etc/prometheus-shared/prometheus.yml"
        - "--cluster.advertise-address=$(NODE_IP):30900"
        - "--grpc-advertise-address=$(NODE_IP):30901"
        env:
        - name: NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP

2. Use the service with externalTrafficPolicy: Local:

apiVersion: v1
kind: Service
metadata:
  name: thanos-peers-in-cluster
  namespace: thanos
spec:
  type: NodePort
  externalTrafficPolicy: Local
  ports:
  - name: cluster
    port: 10900
    targetPort: cluster
    nodePort: 30900
  - name: grpc
    port: 10901
    targetPort: grpc
    nodePort: 30901
  selector:
    # Useful endpoint for gathering all thanos components for common gossip cluster.
    thanos-peer: "true"

externalTrafficPolicy: Local doc

So, when a sidecar advertises itself as HOST_IP:30900:

other components request -> HOST_IP:30900 
-> HOST_IP Node of another k8s cluster, Port: 30900
-> service 30900 -> 10900
-> externalTrafficPolicy: Local
-> so it will route to the 10900 on the same Node
-> sidecar

It works, but thanos-query complains:

level=debug ts=2018-10-02T01:39:48.240360133Z caller=stdlib.go:89 component=cluster caller="peers2018/10/02 01:39" msg=":48 [DEBUG] memberlist: Failed ping: 01CRS6VEFNA8NGYGSZCR8PA97B (timeout reached)"
level=debug ts=2018-10-02T01:39:48.740308321Z caller=stdlib.go:89 component=cluster caller="peers2018/10/02 01:39" msg=":48 [WARN] memberlist: Was able to connect to 01CRS6VEFNA8NGYGSZCR8PA97B but other probes failed, network may be misconfigured"

I am trying to figure out why. Are there any clues?


The problem then is that you should use a different selector and NodePort for each Thanos component if they are deployed in the same cluster.
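A hedged sketch of that split, assuming two components share a cluster (names and port numbers are illustrative, not from the thread):

```yaml
# Hypothetical per-component split: separate NodePort Services with
# distinct selectors and nodePorts so gossip ports don't collide.
apiVersion: v1
kind: Service
metadata:
  name: thanos-sidecar-peers
  namespace: thanos
spec:
  type: NodePort
  externalTrafficPolicy: Local
  ports:
  - name: cluster
    port: 10900
    targetPort: cluster
    nodePort: 30900
  selector:
    component: thanos-sidecar   # only sidecar pods
---
apiVersion: v1
kind: Service
metadata:
  name: thanos-query-peers
  namespace: thanos
spec:
  type: NodePort
  externalTrafficPolicy: Local
  ports:
  - name: cluster
    port: 10900
    targetPort: cluster
    nodePort: 30902             # different nodePort than the sidecars
  selector:
    component: thanos-query     # only query pods
```

With externalTrafficPolicy: Local, each nodePort then routes only to the matching component on that same node, which is what the advertise-address trick relies on.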

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
