https://github.com/grafana/loki/blob/master/docs/operations.md#scalability mentions that the ingester, distributor, and querier can run in different Loki processes with their respective roles. (By the way, I am not familiar with libsonnet; could anyone show three Loki config examples, one for each role?)
However, for the sake of performance, I am wondering whether the ingester, distributor, and querier can also run on different nodes (for example, separate VMs or pods).
The reason is that if they all sit on the same node, they compete for its memory and CPU. (I cannot find any CPU/memory isolation between the roles.) When I ran a query, it used up all the memory of the node; the whole Loki process was then restarted and the ingester broke.
Please ask questions you have in the mailing list: https://groups.google.com/forum/#!forum/lokiproject
Or join our #loki slack channel at http://slack.raintank.io/
@daixiang0 could you help take a look? I tried to start multiple processes with different roles on my local machine, but it does not work.
I changed some code to start the table-manager, distributor, and ingester in one process, listening on HTTP port 3100 and gRPC port 9095.
I start the querier in another process, listening on HTTP port 3101 and gRPC port 9096.
Then the problem appears: the querier cannot find the ingester in the ring, so it cannot return query results. If I let the querier listen on gRPC port 9095 instead, it fails with: error initialising module: server: listen tcp :9095: bind: address already in use.
So how can I achieve "ingester, distributor, and querier can run in different Loki processes with their respective roles"?
The mentioned production setup consists of the following:
Main components:
distributor.yml
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: distributor
spec:
  minReadySeconds: 10
  replicas: 3
  revisionHistoryLimit: 10
  template:
    metadata:
      annotations:
        config_hash: 969f2db731fb134caee8745da52c2f12
      labels:
        name: distributor
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                name: distributor
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -config.file=/etc/loki/config.yaml
        - -target=distributor
        image: grafana/loki:v0.1.0
        imagePullPolicy: IfNotPresent
        name: distributor
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        resources:
          limits:
            cpu: "1"
            memory: 200Mi
          requests:
            cpu: 500m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/loki
          name: loki
      volumes:
      - configMap:
          name: loki
        name: loki
ingester.yml
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: ingester
spec:
  minReadySeconds: 60
  replicas: 3
  revisionHistoryLimit: 10
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  template:
    metadata:
      annotations:
        config_hash: 969f2db731fb134caee8745da52c2f12
      labels:
        name: ingester
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                name: ingester
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -config.file=/etc/loki/config.yaml
        - -target=ingester
        image: grafana/loki:v0.1.0
        imagePullPolicy: IfNotPresent
        name: ingester
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        readinessProbe:
          httpGet:
            path: /ready
            port: 80
          initialDelaySeconds: 15
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "2"
            memory: 10Gi
          requests:
            cpu: "1"
            memory: 5Gi
        volumeMounts:
        - mountPath: /etc/loki
          name: loki
      terminationGracePeriodSeconds: 4800
      volumes:
      - configMap:
          name: loki
        name: loki
querier.yml
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: querier
spec:
  minReadySeconds: 10
  replicas: 3
  revisionHistoryLimit: 10
  template:
    metadata:
      annotations:
        config_hash: 969f2db731fb134caee8745da52c2f12
      labels:
        name: querier
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                name: querier
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -config.file=/etc/loki/config.yaml
        - -target=querier
        image: grafana/loki:v0.1.0
        imagePullPolicy: IfNotPresent
        name: querier
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        volumeMounts:
        - mountPath: /etc/loki
          name: loki
      volumes:
      - configMap:
          name: loki
        name: loki
table-manager.yml
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: table-manager
spec:
  minReadySeconds: 10
  replicas: 1
  revisionHistoryLimit: 10
  template:
    metadata:
      annotations:
        config_hash: 969f2db731fb134caee8745da52c2f12
      labels:
        name: table-manager
    spec:
      containers:
      - args:
        - -config.file=/etc/loki/config.yaml
        - -target=table-manager
        image: grafana/loki:v0.1.0
        imagePullPolicy: IfNotPresent
        name: table-manager
        ports:
        - containerPort: 80
          name: http-metrics
        - containerPort: 9095
          name: grpc
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/loki
          name: loki
      volumes:
      - configMap:
          name: loki
        name: loki
All of these share the same config:
config.yml
---
apiVersion: v1
data:
  config.yaml: |
    chunk_store_config:
      chunk_cache_config:
        memcached:
          batch_size: 100
          parallelism: 100
        memcached_client:
          host: memcached.loki.svc.cluster.local
          service: memcached-client
      max_look_back_period: 0
      write_dedupe_cache_config:
        memcached:
          batch_size: 100
          parallelism: 100
        memcached_client:
          host: memcached-index-writes.loki.svc.cluster.local
          service: memcached-client
    ingester:
      chunk_block_size: 262144
      chunk_idle_period: 15m
      lifecycler:
        claim_on_rollout: false
        heartbeat_period: 5s
        interface_names:
        - eth0
        join_after: 10s
        num_tokens: 512
        ring:
          heartbeat_timeout: 1m
          kvstore:
            consul:
              consistentreads: true
              host: consul.loki.svc.cluster.local:8500
              httpclienttimeout: 20s
              prefix: ""
            store: consul
          replication_factor: 3
    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 168h
    schema_config:
      configs:
      - from: 2018-04-15
        index:
          period: 168h
          prefix: loki_index_
        object_store: gcs
        schema: v9
        store: bigtable
    server:
      graceful_shutdown_timeout: 5s
      grpc_server_max_recv_msg_size: 67108864
      http_server_idle_timeout: 120s
    storage_config:
      bigtable:
        instance: ""
        project: ""
      gcs:
        bucket_name: ""
      index_queries_cache_config:
        memcached:
          batch_size: 100
          parallelism: 100
        memcached_client:
          host: memcached-index-queries.loki.svc.cluster.local
          service: memcached-client
    table_manager:
      chunk_tables_provisioning:
        inactive_read_throughput: 0
        inactive_write_throughput: 0
        provisioned_read_throughput: 0
        provisioned_write_throughput: 0
      index_tables_provisioning:
        inactive_read_throughput: 0
        inactive_write_throughput: 0
        provisioned_read_throughput: 0
        provisioned_write_throughput: 0
      retention_deletes_enabled: false
      retention_period: 0
kind: ConfigMap
metadata:
  name: loki
These individual components run in separate Docker containers (actually Kubernetes pods) and have resource limits in place to prevent a single service from impacting the others.
Furthermore, a gateway (nginx) and memcached are running in front of it.
I hope this helps; maybe take a look at the Kubernetes manifests in the <details> above.
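One detail worth noting: the querier manifest as posted carries no resources stanza, unlike the distributor, ingester, and table-manager. As a minimal sketch, a container-level block like the following would cap it in the same way. The numbers are placeholders chosen purely for illustration, not values from the production setup, and would need to be sized from the observed query load.

```yaml
# Hypothetical fragment for the querier container in querier.yml.
# The cpu/memory values are illustrative placeholders, not taken from the setup above.
resources:
  limits:
    cpu: "1"
    memory: 2Gi
  requests:
    cpu: 500m
    memory: 1Gi
```

With a memory limit in place, a query that exceeds it should get only the querier container OOM-killed, rather than starving ingesters that happen to share the node.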
@sh0rez Thanks a lot for the reply!
I checked config.yml, and it does not seem to set any listen ports in its server section. So I assume the individual components above use the default HTTP and gRPC ports, right?
Then how do they communicate with each other through the ring?
In my trial on a local machine, the querier (in one process, gRPC port 9096) cannot find the ingester (in another process, gRPC port 9095) to get logs. Am I missing something?
> Am I missing something?
No, sorry, I did not provide the full manifests because I thought they were unnecessary. As this is deployed on Kubernetes, all components run on their default ports (see the pod specs) and each has a matching Service. In config.yml they are configured to talk to each other using these services.
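Since the Service objects were not posted, here is a rough sketch of what a "matching service" for the distributor could look like. It is an assumption reconstructed from the pod labels and port names in distributor.yml, not the manifest actually used in that cluster.

```yaml
# Hypothetical Service for the distributor pods; not part of the manifests posted above.
apiVersion: v1
kind: Service
metadata:
  name: distributor
spec:
  selector:
    name: distributor        # matches the pod label set in distributor.yml
  ports:
  - name: http-metrics
    port: 80
    targetPort: 80
  - name: grpc
    port: 9095
    targetPort: 9095
```

The same pattern, with the name and selector swapped, would apply to the ingester, querier, and table-manager.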
For the ring, however, HashiCorp Consul is used, which is deployed to the cluster as well.
These behaviors are not Kubernetes-specific; you could also implement this using, for example, Docker networks and named containers, or multiple VMs with hostnames.
At the moment, you probably need to use Consul for the ring when running in distributed mode. (Refer to the Cortex docs, which this functionality of Loki is taken from. Maybe @tomwilkie can tell more about this?)
Does that help?
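For the local two-process experiment described earlier in the thread, the same idea might look like the fragment below: every process loads one config file whose ring points at a shared Consul agent, and only the listen ports differ per process. This is a sketch under assumptions, namely a Consul agent reachable at localhost:8500, a replication factor lowered to 1 for a single ingester, and the port flag names shown in the comments, which should be checked against `loki -help` for your version.

```yaml
# Hypothetical local.yaml shared by every process; only the ring section is shown.
# The remaining sections (schema_config, storage_config, ...) stay as in your own config.
ingester:
  lifecycler:
    ring:
      kvstore:
        store: consul              # a shared store, so every process sees the same ring
        consul:
          host: localhost:8500     # assumed address of a local Consul agent
      replication_factor: 1        # assumption: a single local ingester

# Example invocations (port flag names are assumptions to verify with `loki -help`):
#   loki -config.file=local.yaml -target=ingester -server.http-listen-port=3100 -server.grpc-listen-port=9095
#   loki -config.file=local.yaml -target=querier  -server.http-listen-port=3101 -server.grpc-listen-port=9096
```

As far as I understand, each ingester registers its own address and gRPC port in the ring, so the querier should not need to listen on 9095 itself; it only needs to read the same ring from the shared store.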
That definitely helps a lot, thanks! I will read the docs you provided and come back if I still have questions.
@sh0rez By the way, I cannot find "consul/consul.libsonnet" and "ksonnet-util/kausal.libsonnet" in the Loki repository.
Hi, according to the Jsonnetfile, these are external dependencies, located in grafana/jsonnet-libs.
consul/consul.libsonnet provides manifests to install Consul (worth a look).
ksonnet-util/kausal.libsonnet, on the other hand, is a helper to create Kubernetes objects using the mixin style of jsonnet.
@sh0rez Much appreciated, thanks for your help! To save other users' time, I would like to write a new doc focused on how to separate these components, aimed at beginners who have not read any Cortex code and do not know anything about jsonnet.

Are these right or wrong?