Loki: Separate Loki Component?

Created on 3 Jun 2019 · 10 Comments · Source: grafana/loki

https://github.com/grafana/loki/blob/master/docs/operations.md#scalability mentions that the ingester, distributor, and querier can run in different Loki processes with their respective roles. (BTW, I am not familiar with libsonnet; could anyone show three Loki config examples, one per role?)

However, for the sake of performance, I was wondering whether the ingester, distributor, and querier can run on different nodes (separate VMs or pods, for example).
The reason is that if they share one node, their memory and CPU usage affect each other. (I could not find any CPU/memory isolation between the roles.) When I ran a query, it used up all the memory on the node; the whole Loki process was then restarted and the ingester broke.

Please ask questions you have in the mailing list: https://groups.google.com/forum/#!forum/lokiproject

Or join our #loki slack channel at http://slack.raintank.io/


mizeng

Most helpful comment

@sh0rez I really appreciate your help! To save other users' time, I would like to write a new doc on how to separate these components, aimed at beginners who have not read any Cortex code and do not know jsonnet.

mizeng on 5 Jun 2019

👍5

All 10 comments

@daixiang0 could you help take a look? I tried starting multiple processes locally with different roles, but it does not work.

mizeng on 4 Jun 2019

I changed some code to start the table-manager, distributor, and ingester in one process, listening on HTTP port 3100 and gRPC port 9095.
I then started the querier in another process, listening on HTTP port 3101 and gRPC port 9096.

Then the problem appears: the querier cannot find the ingester in the ring, so it cannot return query results. If I let the querier listen on gRPC port 9095 instead, it fails with "error initialising module: server: listen tcp :9095: bind: address already in use".

So how can I achieve "ingester, distributor, and querier running in different Loki processes with their respective roles"?

mizeng on 4 Jun 2019

The mentioned production setup consists of the following:

Main components:

distributor.yml

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: distributor
spec:
  minReadySeconds: 10
  replicas: 3
  revisionHistoryLimit: 10
  template:
    metadata:
      annotations:
        config_hash: 969f2db731fb134caee8745da52c2f12
      labels:
        name: distributor
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                name: distributor
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -config.file=/etc/loki/config.yaml
        - -target=distributor
        image: grafana/loki:v0.1.0
        imagePullPolicy: IfNotPresent
        name: distributor
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        resources:
          limits:
            cpu: "1"
            memory: 200Mi
          requests:
            cpu: 500m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/loki
          name: loki
      volumes:
      - configMap:
          name: loki
        name: loki

ingester.yml

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: ingester
spec:
  minReadySeconds: 60
  replicas: 3
  revisionHistoryLimit: 10
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  template:
    metadata:
      annotations:
        config_hash: 969f2db731fb134caee8745da52c2f12
      labels:
        name: ingester
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                name: ingester
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -config.file=/etc/loki/config.yaml
        - -target=ingester
        image: grafana/loki:v0.1.0
        imagePullPolicy: IfNotPresent
        name: ingester
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        readinessProbe:
          httpGet:
            path: /ready
            port: 80
          initialDelaySeconds: 15
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "2"
            memory: 10Gi
          requests:
            cpu: "1"
            memory: 5Gi
        volumeMounts:
        - mountPath: /etc/loki
          name: loki
      terminationGracePeriodSeconds: 4800
      volumes:
      - configMap:
          name: loki
        name: loki

querier.yml

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: querier
spec:
  minReadySeconds: 10
  replicas: 3
  revisionHistoryLimit: 10
  template:
    metadata:
      annotations:
        config_hash: 969f2db731fb134caee8745da52c2f12
      labels:
        name: querier
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                name: querier
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -config.file=/etc/loki/config.yaml
        - -target=querier
        image: grafana/loki:v0.1.0
        imagePullPolicy: IfNotPresent
        name: querier
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        volumeMounts:
        - mountPath: /etc/loki
          name: loki
      volumes:
      - configMap:
          name: loki
        name: loki

table-manager.yml

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: table-manager
spec:
  minReadySeconds: 10
  replicas: 1
  revisionHistoryLimit: 10
  template:
    metadata:
      annotations:
        config_hash: 969f2db731fb134caee8745da52c2f12
      labels:
        name: table-manager
    spec:
      containers:
      - args:
        - -config.file=/etc/loki/config.yaml
        - -target=table-manager
        image: grafana/loki:v0.1.0
        imagePullPolicy: IfNotPresent
        name: table-manager
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/loki
          name: loki
      volumes:
      - configMap:
          name: loki
        name: loki

All of these share the same config:

config.yml

---
apiVersion: v1
data:
  config.yaml: |
    chunk_store_config:
      chunk_cache_config:
        memcached:
          batch_size: 100
          parallelism: 100
        memcached_client:
          host: memcached.loki.svc.cluster.local
          service: memcached-client
      max_look_back_period: 0
      write_dedupe_cache_config:
        memcached:
          batch_size: 100
          parallelism: 100
        memcached_client:
          host: memcached-index-writes.loki.svc.cluster.local
          service: memcached-client
    ingester:
      chunk_block_size: 262144
      chunk_idle_period: 15m
      lifecycler:
        claim_on_rollout: false
        heartbeat_period: 5s
        interface_names:
        - eth0
        join_after: 10s
        num_tokens: 512
        ring:
          heartbeat_timeout: 1m
          kvstore:
            consul:
              consistentreads: true
              host: consul.loki.svc.cluster.local:8500
              httpclienttimeout: 20s
              prefix: ""
            store: consul
          replication_factor: 3
    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 168h
    schema_config:
      configs:
      - from: 2018-04-15
        index:
          period: 168h
          prefix: loki_index_
        object_store: gcs
        schema: v9
        store: bigtable
    server:
      graceful_shutdown_timeout: 5s
      grpc_server_max_recv_msg_size: 67108864
      http_server_idle_timeout: 120s
    storage_config:
      bigtable:
        instance: ""
        project: ""
      gcs:
        bucket_name: ""
      index_queries_cache_config:
        memcached:
          batch_size: 100
          parallelism: 100
        memcached_client:
          host: memcached-index-queries.loki.svc.cluster.local
          service: memcached-client
    table_manager:
      chunk_tables_provisioning:
        inactive_read_throughput: 0
        inactive_write_throughput: 0
        provisioned_read_throughput: 0
        provisioned_write_throughput: 0
      index_tables_provisioning:
        inactive_read_throughput: 0
        inactive_write_throughput: 0
        provisioned_read_throughput: 0
        provisioned_write_throughput: 0
      retention_deletes_enabled: false
      retention_period: 0
kind: ConfigMap
metadata:
  name: loki

These individual components run in separate Docker containers (actually Kubernetes pods) and have resource limits in place, to prevent a single service from impacting the others.
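Note that the querier manifest shown above carries no resources block, while the original question was about a query exhausting a node's memory. A minimal sketch of how a limit could be added to the querier container follows; the CPU and memory values are hypothetical placeholders, not figures from the production setup, and would need tuning for a real workload.

```yaml
# Fragment of the querier pod spec (goes under spec.template.spec).
# All numbers below are illustrative assumptions, not production values.
containers:
- name: querier
  resources:
    limits:
      cpu: "2"       # hypothetical ceiling; the container is throttled above it
      memory: 4Gi    # hypothetical ceiling; exceeding it gets this container OOM-killed
    requests:
      cpu: "1"       # hypothetical amount reserved by the scheduler
      memory: 2Gi
```

With a memory limit like this, a heavy query should at worst get the querier container killed and restarted, rather than starving an ingester on the same node.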

Furthermore, a gateway (nginx) and memcached are running in front of it.

I hope this helps, maybe take a look at the Kubernetes manifests in the <details> above.

sh0rez on 4 Jun 2019

👍5

@sh0rez Thanks a lot for the reply!
I checked config.yml and there seems to be no "server"-type config. So I assume the individual components above use the default HTTP and gRPC ports, right?
Then how do they communicate with each other through the ring?

In my trial on a local machine, the querier (in one process, gRPC port 9096) cannot find the ingester (in another process, gRPC port 9095) to get logs. Am I missing something?

mizeng on 4 Jun 2019

"Do I miss something?"

No. Sorry, I did not provide the full manifests because I thought they were unnecessary. Since this is deployed on Kubernetes, all components run on their default ports (see the pod specs), and each has a matching Service. In config.yml they are configured to talk to each other using these Services.
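As an illustration of what a "matching service" could look like, here is a sketch of a Service for the ingester. It is not taken from the production manifests, which were not posted in full; the name and selector are assumptions that simply mirror the `name: ingester` label and the two container ports in the Deployment above.

```yaml
---
# Hypothetical Service fronting the ingester pods; mirrors the labels and
# ports of the ingester Deployment shown earlier in this thread.
apiVersion: v1
kind: Service
metadata:
  name: ingester
spec:
  selector:
    name: ingester        # matches the pod label set by the Deployment
  ports:
  - name: http-metrics
    port: 80
    targetPort: 80
  - name: grpc
    port: 9095
    targetPort: 9095
```

The other components would get analogous Services, each selecting on its own `name` label.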

For the ring, however, HashiCorp Consul is used, which is deployed to the cluster as well.
None of this is Kubernetes-specific; you could also implement it using, for example, Docker networks and named containers, or multiple VMs with hostnames.

At the moment, you probably need to use Consul for the ring when running in distributed mode. (Refer to the Cortex docs, from which this functionality of Loki is taken.) Maybe @tomwilkie can tell more about this?
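For the local two-process trial described earlier in the thread, a sketch of the two config files might look like the following. It assumes a Consul agent reachable at localhost:8500 (an assumption, not something from the thread) and reuses the ring layout from config.yml above. The `---` separates two separate files, one per process. A likely reason the querier could not see the ingester is that the two processes did not share a ring store, although the thread does not confirm this.

```yaml
# File 1 (hypothetical name: loki-write.yaml)
# Process running table-manager, distributor, and ingester.
server:
  http_listen_port: 3100
  grpc_listen_port: 9095
ingester:
  lifecycler:
    ring:
      kvstore:
        store: consul
        consul:
          host: localhost:8500   # assumption: local Consul agent
      replication_factor: 1      # assumption: single local ingester
---
# File 2 (hypothetical name: loki-querier.yaml)
# Querier process: different listen ports, same ring store.
server:
  http_listen_port: 3101
  grpc_listen_port: 9096
ingester:
  lifecycler:
    ring:
      kvstore:
        store: consul
        consul:
          host: localhost:8500   # must point at the same Consul as file 1
      replication_factor: 1
```

Each process would then be started with its own file via -config.file and the matching -target, as in the container args above. Storage and schema sections are omitted here and would have to be identical in both files.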

Does that help?

sh0rez on 4 Jun 2019

That definitely helps a lot, thanks! I will read the docs you pointed to and come back if I still have questions.

mizeng on 5 Jun 2019

@sh0rez BTW, I cannot find "consul/consul.libsonnet" or "ksonnet-util/kausal.libsonnet" in the Loki repository.

mizeng on 5 Jun 2019

Hi, according to the Jsonnetfile, these are external dependencies, located in grafana/jsonnet-libs.

https://github.com/grafana/loki/blob/922f1daf5d1780c3b96b0cb7b150f1fe65379262/production/ksonnet/loki/jsonnetfile.json#L4-L19

consul/consul.libsonnet provides manifests to install consul (worth a look)
ksonnet-util/kausal.libsonnet on the other hand is a helper to create Kubernetes objects using the mixin-style of jsonnet.

sh0rez on 5 Jun 2019

mizeng on 5 Jun 2019

👍5


Are the manifests quoted above right or wrong?