Salt: Master calling find_job every 10 seconds, Windows takes 10 seconds to execute it

Created on 13 Feb 2019 · 18 Comments · Source: saltstack/salt

Description of Issue/Question

When you run salt win something, the salt master calls find_job about every 10 seconds to see how the job running on the minion is going. On Windows each find_job call starts a new thread, and starting a new thread seems to take about 10 seconds.

This just causes a lot of thrashing on the Windows minion. A workaround is salt -t 60 ...., which just gives find_job longer to respond.

Steps to Reproduce Issue

Add some additional logging around the point where the minion first gets the request (before a thread is started) and the point where find_job starts. On Unix/Linux the gap is a matter of milliseconds; on Windows it is about 10-15 seconds.
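The kind of measurement described above can be sketched outside of Salt. This is a minimal standalone example, not Salt code: the names measure_start_delay and _record_start are hypothetical, and it only times how long a new thread or a new spawned process takes to reach its target function. It assumes the "spawn" start method, which is the default for multiprocessing on Windows.

```python
import multiprocessing
import queue
import threading
import time


def _record_start(received_at, result):
    # Runs inside the new thread/process: report how long start-up took.
    result.put(time.time() - received_at)


def measure_start_delay(use_process):
    """Return seconds between 'request received' and the target starting."""
    # "spawn" is the default start method on Windows.
    ctx = multiprocessing.get_context("spawn")
    result = ctx.Queue() if use_process else queue.Queue()
    worker_cls = ctx.Process if use_process else threading.Thread

    received_at = time.time()  # stands in for "minion first gets the request"
    worker = worker_cls(target=_record_start, args=(received_at, result))
    worker.start()
    delay = result.get(timeout=120)
    worker.join()
    return delay


if __name__ == "__main__":
    print("thread start delay:  %.3f s" % measure_start_delay(False))
    print("process start delay: %.3f s" % measure_start_delay(True))
```

Running the script on a Windows minion and on a Linux minion gives a rough comparison of the start-up cost alone, without any Salt loader or grains work included.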

Suggested Fix

Handle find_job directly in minion.py so that it no longer forks or starts a thread and therefore responds in under a second, removing the need to use the salt -t <secs> option.
Maybe a change here https://github.com/saltstack/salt/blob/81eb15264380d82267ffc3c1930410baf1f3fbf1/salt/minion.py#L1510 to check for find_job and stop the fork/thread?
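A minimal sketch of the idea, assuming a simplified dispatcher. This is not the code in minion.py: dispatch, _run, INLINE_FUNCTIONS, and the payload layout are hypothetical and only illustrate answering find_job inline while still starting a thread for everything else.

```python
import threading

# Hypothetical: functions considered cheap enough to answer without a thread.
INLINE_FUNCTIONS = {"saltutil.find_job"}


def _run(payload, handler, results):
    # Execute the handler and store its return value under the job id.
    results[payload["jid"]] = handler(*payload.get("arg", ()))


def dispatch(payload, handlers, results):
    """Run cheap functions inline; hand everything else to a new thread."""
    handler = handlers[payload["fun"]]
    if payload["fun"] in INLINE_FUNCTIONS:
        # No fork/thread: the answer is available as soon as this returns.
        _run(payload, handler, results)
        return None
    worker = threading.Thread(target=_run, args=(payload, handler, results))
    worker.start()
    return worker
```

The trade-off is that an inline handler blocks the dispatcher while it runs, so this only makes sense for functions that return quickly.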

Examples

https://github.com/saltstack/salt/issues/48882#issuecomment-441038972 Slow response from Windows minions on 2018.3

30007 return delay of a find_job task

Bug severity-medium team-windows

Most helpful comment

As requested by @dwoz

All 18 comments

Hi @Ch3LL, this seems more like a P1. This issue makes the Windows experience a bad one in comparison to Linux/Unix, bad enough for people to walk away from Salt. And it is just a simple fix for someone who understands the minion.py thread code.

I've just started learning to use Salt. I have a Server 2016 minion. I found it impossible to get a response from the minion when executing win_wua.list from the master.
When I tried salt-call it would run fine.
Is this issue the reason why I am seeing no response from minion unless I add "-t 120" to my execution commands?

Yes

@Ch3LL I have tried to fix this myself, but do not understand the code well enough. This would be a fantastic performance improvement for Windows. Most importantly, it would stop all the timeout errors (Minion did not return. [No response]) and remove the need to run salt -t 60. I am sure someone who knows the code well could make the change in a day or less.

Looks like it's assigned to @dwoz currently and is in his backlog.

@Ch3LL and @dwoz I tried to embed a copy of the find_job() code in minion.py. But the place I found suitable to call it from still started a thread, even though on my reading of the code it looked like it shouldn't. It's just (I hope) a simple fix for a person who knows the threading code in minion.py to stop a thread being created when calling a copy of the code for find_job(). Thanks. A fork/thread (10 secs on Windows) is a lot for something which opens a file to read input in less than 10 ms and is part of the infrastructure for running a job from the salt master.

Fantastic, it's on the plan.

@dwoz @twangboy @sagetherage I would not be surprised if, after fixing this, the test suite that runs as part of a PR ran a lot faster.

Looks like Windows is also authenticating (i.e. salt/auth) every time anything is returned to the master, whereas Linux/Unix does not.

Please find a collection of how Windows performance deteriorated with 2017 and Python3 in https://github.com/saltstack/salt/issues/53890.

Could "find_job" be made optional? I have no use for it.

@damon-atkins we cannot commit to the work for Sodium, but we are looking at @cmcmarrow to attempt it. If it doesn't make it, then we will plan it and I will push to get a commit for the Magnesium release.

@sagetherage Thanks. The workaround I use is to change several settings on the salt master that affect find_job (how often it runs and its timeouts) to give find_job() time to finish. The impact is an even slower response compared with Linux, but the response is stable.

@cmcmarrow For someone new looking at using Salt for Windows, getting Minion did not return. [No response] is enough for them to conclude the product does not work and move on to something else.

@damon-atkins how do you set how often find_job runs, please?

I cannot find that setting.

Thank you

There has been some work to skip grains which take a while to run on Windows. I suspect the grains are being processed every time a fork/thread is started, including when find_job is called, whereas on Linux/Unix grains are processed once at startup.

I'm not seeing any release notes for this in Magnesium. Can you please fix this?

Like most of your customers I'm sure, we have workarounds in place. But we shouldn't have to do this.

This is still an open issue, so it has not been released yet and isn't in any release notes. I need to check with the team to see when it can be worked on and then released.
