Longhorn: [Question] run longhorn workloads on dedicated nodes only (taints, tolerations, affinity, nodeSelector)

Created on 23 Jul 2020 · 23 comments · Source: longhorn/longhorn

Hi!

We run Longhorn, installed via the Rancher catalog, on our cluster. We would like to run the Longhorn workloads (instance manager, engines, etc.) only on dedicated nodes, not on every node. The reason is that we might add more nodes to the cluster and then remove them, or run experiments on them, which would result in Longhorn failures (the engine gets stuck in the deploying phase if we shut down a node that used to run Longhorn). So far I can't find a way to restrict Longhorn to certain nodes.

Please advise,
Anton

question


aandrushchenko



Longhorn needs to be running on a node to provide storage on that node, so normally it's not recommended to limit the deployment to a subset of the nodes.

However, can you elaborate on the issue where the engine gets stuck in the deploying phase? Longhorn is designed to scale well with the cluster, so if there is a problem with that part, we want to understand why it happened and potentially fix it.

yasker on 23 Jul 2020

So the scenario was (not exact, but I will be running more tests soon and will likely have better steps to reproduce):

1. Have Longhorn installed via Rancher. Everything is stable.
2. Add node(s) to the cluster; Longhorn gets installed on them.
3. Taint the nodes to isolate them for heavy jobs (these jobs don't require Longhorn, but it's already installed).
4. The nodes randomly get rebooted/killed due to resource exhaustion or kernel bugs.
5. The Longhorn engine image gets stuck in 'deploying' status, and we have intermittent issues on other nodes.
6. Longhorn is stabilised by reinstalling the deployment.

So our idea is to be able to tell Longhorn which nodes it should run on, so that when we spin up nodes that don't need it, we don't have to reinstall it just to respect taints that were set after Longhorn was already installed.

aandrushchenko on 24 Jul 2020

Using taints with effect NoExecute will evict the existing Longhorn workloads from the node immediately. Then rebooting/shutting down the tainted nodes won't affect Longhorn.
Rebooting/killing the nodes on which Longhorn workloads are running does put the Longhorn engine image into 'deploying' status. In this case, waiting for the Longhorn workloads to come back is enough, and you don't need to reinstall Longhorn. If the Longhorn workload recovery somehow gets stuck, it may be a bug; you can file an issue for that.

shuo-wu on 27 Jul 2020
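
To make the NoExecute behavior described above concrete, here is a minimal Python sketch of the Kubernetes taint/toleration matching rules. It is simplified: it ignores tolerationSeconds and how the scheduler treats NoSchedule/PreferNoSchedule. The taint key "dedicated" and value "heavy-jobs" are hypothetical, and the helper names are illustrative, not part of any Kubernetes or Longhorn API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key (only valid with Exists) tolerates every key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect tolerates every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified Kubernetes rule: effect, key and (for Equal) value must match."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


def evicted_by_no_execute(tolerations: Iterable[Toleration], taints: Iterable[Taint]) -> bool:
    """True if the node carries a NoExecute taint that the pod does not tolerate."""
    tols = list(tolerations)
    return any(
        taint.effect == "NoExecute" and not any(tolerates(t, taint) for t in tols)
        for taint in taints
    )


# Hypothetical taint used to isolate nodes for heavy jobs.
heavy = Taint(key="dedicated", value="heavy-jobs", effect="NoExecute")
print(evicted_by_no_execute([], [heavy]))  # pod without tolerations: True
print(evicted_by_no_execute([Toleration("dedicated", "Equal", "heavy-jobs", "NoExecute")], [heavy]))  # False
```

A Longhorn pod without a matching toleration is evicted as soon as the NoExecute taint lands, which is why shutting down such a node afterwards no longer affects Longhorn.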

Another use case for supporting taints and tolerations is a cluster with Windows and Linux nodes where you want to use Longhorn only on the Linux nodes.

bashofmann on 31 Jul 2020


It doesn't work with Rancher when Windows nodes are enabled. I bet the people on Longhorn don't talk to the people on the Rancher team, lol.

maxisam on 24 Sep 2020

I just discovered there is a setting called "taintToleration".
If you set it like taintToleration: "cattle.io/os=linux:NoSchedule", it works if you deploy from the Helm chart directly, but it doesn't work from the Rancher App catalog.
I also taint the Windows nodes. (I really think Rancher tainting Linux nodes is very annoying.)

maxisam on 25 Sep 2020
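
A sketch of how a value like the one above could be turned into Kubernetes tolerations. It assumes the format described in the Longhorn settings reference (entries of the form key=value:effect or key:effect, separated by semicolons); the parser itself is hypothetical and is not Longhorn's implementation.

```python
from typing import Dict, List

VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def parse_taint_toleration(setting: str) -> List[Dict[str, str]]:
    """Turn 'key1=value1:effect; key2:effect' into Kubernetes toleration dicts."""
    tolerations: List[Dict[str, str]] = []
    for item in setting.split(";"):
        item = item.strip()
        if not item:
            continue  # allow an empty setting or a trailing ';'
        # Split on the last ':' so keys like "cattle.io/os" stay intact.
        key_value, sep, effect = item.rpartition(":")
        if not sep or effect not in VALID_EFFECTS:
            raise ValueError(f"expected '<key>[=<value>]:<effect>', got {item!r}")
        key, has_value, value = key_value.partition("=")
        if not key:
            raise ValueError(f"empty key in {item!r}")  # simplification for this sketch
        if has_value:
            tolerations.append({"key": key, "operator": "Equal", "value": value, "effect": effect})
        else:
            tolerations.append({"key": key, "operator": "Exists", "effect": effect})
    return tolerations


print(parse_taint_toleration("cattle.io/os=linux:NoSchedule"))
# [{'key': 'cattle.io/os', 'operator': 'Equal', 'value': 'linux', 'effect': 'NoSchedule'}]
```

A key without "=value" becomes an Exists toleration, which tolerates the taint whatever its value is.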

@maxisam The taint toleration should work with the Rancher App too, once you set the default setting option?

yasker on 25 Sep 2020

@yasker I did, but somehow it doesn't work. And I can't find a way to make it select only Linux nodes, so I have to taint the Windows nodes. In the end, every node is tainted, lol. I hope Rancher provides an option to taint Linux or Windows nodes.

maxisam on 26 Sep 2020

There is a known issue with the Longhorn toleration setting: it is applied only after the Longhorn workloads are up in the cluster. In other words, if all nodes are tainted before Longhorn launches, the toleration setting doesn't take effect and the Longhorn workloads cannot be deployed.

shuo-wu on 28 Sep 2020

@shuo-wu I don't think that's it. I believe what you describe is what I did the first time. I didn't taint the Windows nodes, and I tried to fix it by setting toleration and affinity after deployment. But the toleration setting didn't stick. For a moment it seemed to work, but it didn't in the end. I remember the UI part did work. I really think it should be an easy fix on Rancher's side: just don't taint it.

maxisam on 28 Sep 2020

@maxisam

1. Longhorn doesn't support Windows now.
2. The default setting doesn't apply to a Longhorn upgrade; it is a default for a Longhorn system that is about to be deployed. Hence modifying the Longhorn toleration setting via the Rancher App won't stick when there is already an (old) Longhorn system in the cluster.

shuo-wu on 28 Sep 2020

@shuo-wu I'm totally aware Longhorn doesn't support Windows. I just need it to run on Linux nodes in a hybrid (Windows/Linux) cluster.

About 2: I think I didn't make it clear enough. I changed the settings on the deployment/daemonsets, scaled them down to 0 and scaled them back up. I made 2 changes, affinity and toleration. The affinity part worked but the toleration part didn't.

maxisam on 28 Sep 2020


yasker on 29 Sep 2020


Got it. Sorry for misunderstanding your comment. The 2nd issue will be tracked here: https://github.com/longhorn/longhorn/issues/1833

shuo-wu on 29 Sep 2020


Longhorn needs to be running on a node to provide storage on that node, so normally it's not recommended to limit the deployment to a subset of the nodes.

However, can you elaborate on the issue where the engine gets stuck in the deploying phase? Longhorn is designed to scale well with the cluster, so if there is a problem with that part, we want to understand why it happened and potentially fix it.

Hi folks! I need more info about @yasker's answer above.
I was thinking of putting Longhorn on 3 dedicated nodes (they would exclusively handle volumes and replicas, the dashboard, etc.) and all _other worker nodes_ could have their pods use PVs hosted on these 3 Longhorn nodes. Going by @yasker's answer, isn't that possible?

I have a cluster with Longhorn on all nodes (3), but I also have a badly behaved application that creates massive CPU spikes. When this happens every process suffers from CPU starvation, and Longhorn in particular 'gets lost': it stops scheduling, then starts rebuilding replicas (lots of disk pressure), and worst of all, many pods get their PVCs in read-only mode. Longhorn recovers by itself, but I have to restart all the affected workloads.
That's why I was thinking about isolating Longhorn on dedicated nodes.

Sorry about this lengthy question. Any thoughts are appreciated.
Regards,
Fabio Carvalho

FCarvMobil on 1 Oct 2020

@FCarvMobil You need to set resource limitation on your "bad apps".

maxisam on 1 Oct 2020


Hi @maxisam. Those pods already have k8s resources set. The issue is that there is some parallelism in them, and I can't set the resources too low or they will take too long to complete.
Thanks for replying.

FCarvMobil on 1 Oct 2020

@FCarvMobil

You can have three dedicated nodes to provide storage, meaning they run the replicas and so on. But Longhorn needs to run on every node to provide connectivity to the volumes. So if any workload on a node needs access to a Longhorn volume, Longhorn needs to run there.

Based on what you described, you need to shield Longhorn from this situation. You can try setting a higher GuaranteedEngineCPU to see if it helps; it translates into a CPU request for the key Longhorn pods. Note that changing the value will restart all the volumes, so scale down the workloads and detach the volumes first.

yasker on 2 Oct 2020

Hi @yasker, thanks for replying.

Yes, the bottom line is to protect Longhorn from this situation. I'm already using the GuaranteedEngineCPU setting at the default suggested value (according to the docs it should fit), but I'll try increasing it (bearing in mind that I need to bring the workloads down first).

Sorry if I missed something about the architecture in the docs. If I should run Longhorn on all nodes to provide connectivity to volumes, does this imply having replicas scheduled _in each node_? If not, how can I control which nodes allow/forbid replica scheduling?
[UPDATE] I went through the docs again (this time with a refreshed mind, I think): is it this? https://longhorn.io/docs/1.0.0/references/settings/#kubernetes-taint-toleration

Again, thanks!

FCarvMobil on 2 Oct 2020

The default value is a bit conservative, since if we put too high a value, the instance manager may fail to start in a user's environment.

We're also working on #1691, which should help us come up with some guidelines on how to set GuaranteedEngineCPU according to how many volumes will be used on the node.

Also, the Longhorn manager needs to run on every node, but you can choose which nodes provide the storage (a.k.a. have replicas created). Any node without a disk set in Longhorn won't be used for replica scheduling. You can configure that on the node page of the UI, or you can set an annotation on the Kubernetes node object to customize the default disk for the node. See https://longhorn.io/docs/1.0.2/advanced-resources/default-disk-and-node-config/

Taint toleration is indeed for dedicating nodes to Longhorn storage, but it's not required. See https://longhorn.io/docs/1.0.2/advanced-resources/deploy/taint-toleration/

yasker on 2 Oct 2020
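
The point about choosing storage nodes can be sketched as follows, assuming the "Create Default Disk on Labeled Nodes" setting and the node.longhorn.io/create-default-disk=true label described in the default-disk-and-node-config page linked above. This is a hypothetical model: it only checks for the value "true" and ignores disks added by hand in the UI. Nodes that are left out still run the Longhorn manager so their pods can attach volumes; they just don't hold replicas.

```python
from typing import Dict, List

# Label assumed from the Longhorn default-disk docs; verify against your Longhorn version.
CREATE_DEFAULT_DISK_LABEL = "node.longhorn.io/create-default-disk"


def nodes_with_default_disk(nodes: Dict[str, Dict[str, str]], only_labeled: bool) -> List[str]:
    """Nodes that would get a default disk, and so be eligible for replicas."""
    if not only_labeled:
        return sorted(nodes)  # setting disabled: every node gets a default disk
    return sorted(
        name for name, labels in nodes.items()
        if labels.get(CREATE_DEFAULT_DISK_LABEL) == "true"
    )


# Hypothetical cluster: three storage nodes plus one compute-only worker.
cluster = {
    "storage-1": {CREATE_DEFAULT_DISK_LABEL: "true"},
    "storage-2": {CREATE_DEFAULT_DISK_LABEL: "true"},
    "storage-3": {CREATE_DEFAULT_DISK_LABEL: "true"},
    "worker-1": {},
}
print(nodes_with_default_disk(cluster, only_labeled=True))
# ['storage-1', 'storage-2', 'storage-3']
```

Labeling is what makes the "add a node with the right label and it's done" workflow possible: no per-node clicking in the UI.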

I was thinking of putting Longhorn on 3 dedicated nodes (they would exclusively handle volumes and replicas, the dashboard, etc.) and all other worker nodes could have their pods use PVs hosted on these 3 Longhorn nodes.

This would be great. But taints and tolerations seem not to be the right way for me. IMHO node-selector or affinity would be the right way for that.

rdxmb on 16 Oct 2020

Wait, it seems that https://github.com/longhorn/longhorn/issues/583 has fixed this by using labels.

rdxmb on 16 Oct 2020
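
For the node-selector approach, here is a minimal sketch of Kubernetes nodeSelector matching (every key/value pair in the selector must be present on the node), applied to the Windows/Linux case from earlier in the thread via the well-known kubernetes.io/os node label. The uncovered_nodes helper is hypothetical; it encodes the earlier caveat that Longhorn must run on every node whose pods use Longhorn volumes. How #583 exposes the selector in Longhorn's own settings is not modeled here, nor is node affinity.

```python
from typing import Dict, List

NodeLabels = Dict[str, str]


def matches_node_selector(selector: Dict[str, str], labels: NodeLabels) -> bool:
    """nodeSelector semantics: every key/value pair must be present on the node."""
    return all(labels.get(key) == value for key, value in selector.items())


def uncovered_nodes(selector: Dict[str, str], nodes: Dict[str, NodeLabels],
                    volume_nodes: List[str]) -> List[str]:
    """Nodes running pods that use Longhorn volumes but excluded by the selector."""
    return sorted(n for n in volume_nodes if not matches_node_selector(selector, nodes[n]))


# Hypothetical hybrid cluster.
nodes = {
    "linux-1": {"kubernetes.io/os": "linux"},
    "linux-2": {"kubernetes.io/os": "linux"},
    "win-1": {"kubernetes.io/os": "windows"},
}
linux_only = {"kubernetes.io/os": "linux"}
print([n for n in sorted(nodes) if matches_node_selector(linux_only, nodes[n])])  # ['linux-1', 'linux-2']
print(uncovered_nodes(linux_only, nodes, ["linux-2", "win-1"]))  # ['win-1']
```

A non-empty result from uncovered_nodes flags a selector that would leave some volume consumer without a local Longhorn instance.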

Hi guys! Thank you all for replying.

I've seen that I can disable replica scheduling in the Dashboard, but the nodeSelector approach is preferable, so I don't have to manually configure anything in the Dashboard: just add a new node with the appropriate label and it's done. #583 is a good reference!

I'll come back here with my findings; I'm still waiting for internal approval to build a test scenario in our cloud provider account.
Again, thanks for replying.

FCarvMobil on 19 Oct 2020
