Cylc-flow: Local jobs inherit scheduler environment

Created on 27 Jul 2020 · 13 comments · Source: cylc/cylc-flow

Back when we used a fixed multi-processing pool for job submission etc.:

  • local background jobs were entirely divorced from the scheduler environment, just like remote and batch system jobs.

Now we just manage sub-processes (up to some max size), and:

  • local background jobs inherit the scheduler environment just like normal shell sub-processes, unlike batch system jobs.

This is inconsistent and potentially dangerous, because:

  • it could cause jobs to behave differently depending on choice of job launcher (local background vs all other)
  • for Cylc 8 it also causes local background jobs, uniquely, to bypass the central cylc wrapper, because they inherit the Python or conda virtual environment from the scheduler.

Bypassing the wrapper is not in itself bad, but it is (again) inconsistent, and it makes it harder to get the wrapper right because most testing uses local background jobs, and users might reasonably assume that the wrapper is used by all jobs.
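The inheritance described above is simply the default behaviour of child processes. A minimal sketch using only the Python standard library (not Cylc code): a variable set in the parent is visible to a child started with `subprocess.run()` when no `env` argument is given. The name `SCHEDULER_ONLY_VAR` is hypothetical, standing in for anything in the scheduler's environment.

```python
import os
import subprocess
import sys

# Hypothetical variable standing in for something in the scheduler's environment.
os.environ["SCHEDULER_ONLY_VAR"] = "from-scheduler"

# No env argument: the child inherits the parent's entire environment.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('SCHEDULER_ONLY_VAR'))"],
    capture_output=True,
    text=True,
    check=True,
)
child_sees = result.stdout.strip()
print(child_sees)  # from-scheduler
```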

@cylc/core - should we try to fix this?

Thoughts on how to fix it.

To submit jobs, the scheduler calls cylc jobs-submit in a subprocess, which, for local background jobs, executes the job script via nohup in a further subprocess (see cylc/flow/batch_sys_handlers/background.py).

By default, i.e. when its env argument is not given (or is None), subprocess.Popen() passes the entire parent environment to the child process. If env is set to a dict of essential environment variables instead, only those are passed to the child. So we should be able to give background jobs a minimal, clean environment.
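A minimal sketch of that idea, assuming an allow-list of variables to carry over. The names in `ESSENTIAL_VARS`, the fixed `PATH`, and the helper `minimal_env()` are all illustrative assumptions, not what Cylc actually does. Passing `env=` replaces the inherited environment entirely, so anything not in the dict is absent in the child.

```python
import os
import subprocess
import sys

# Illustrative allow-list only; Cylc does not necessarily use these names.
ESSENTIAL_VARS = ("HOME", "USER", "LOGNAME", "SHELL", "LANG")


def minimal_env(parent_env, extra=None):
    """Hypothetical helper: keep only allow-listed parent variables, plus extras."""
    env = {name: parent_env[name] for name in ESSENTIAL_VARS if name in parent_env}
    # Assumed basic PATH so the child can still find standard system commands.
    env["PATH"] = "/usr/bin:/bin"
    env.update(extra or {})
    return env


os.environ["SCHEDULER_ONLY_VAR"] = "from-scheduler"
env = minimal_env(os.environ, {"JOB_VAR": "explicit"})

result = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('SCHEDULER_ONLY_VAR'), os.environ.get('JOB_VAR'))"],
    env=env,  # replaces, rather than extends, the inherited environment
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # None explicit
```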

More inconsistency: if users get access to a cylc wrapper (for Cylc 7, say) by prepending its location to PATH in their .bashrc, the wrapper will not be bypassed by local background jobs from a Cylc 8 virtual env. (Only wrappers installed in a default location, e.g. /usr/local/bin, will be bypassed.)
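The precedence at work here is ordinary PATH lookup: the first directory containing an executable `cylc` wins. A small sketch with `shutil.which()` and two temporary directories, standing in (hypothetically) for a virtual environment's `bin/` and for a default location such as /usr/local/bin, shows why an inherited virtual-env PATH shadows the wrapper, while a location prepended by .bashrc does not. POSIX only, since it relies on the executable bit.

```python
import os
import shutil
import stat
import tempfile


def make_fake_cylc(directory):
    """Create an executable placeholder named 'cylc' in the given directory."""
    path = os.path.join(directory, "cylc")
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as venv_bin, \
        tempfile.TemporaryDirectory() as wrapper_bin:
    venv_cylc = make_fake_cylc(venv_bin)
    wrapper_cylc = make_fake_cylc(wrapper_bin)

    # Inherited from the scheduler: virtual env first, default location after.
    inherited = os.pathsep.join([venv_bin, wrapper_bin])
    # After .bashrc prepends the wrapper's location.
    bashrc = os.pathsep.join([wrapper_bin, inherited])

    found_inherited = shutil.which("cylc", path=inherited)
    found_bashrc = shutil.which("cylc", path=bashrc)

print(found_inherited == venv_cylc, found_bashrc == wrapper_cylc)  # True True
```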

Chipping in.

Back when we used a fixed multi-processing pool for job submission etc.:
local background jobs were entirely divorced from the scheduler environment, just like remote and batch system jobs.

Unless you are talking about something before my time, my understanding is that background jobs have always inherited the scheduler's environment.

Your proposal (https://github.com/cylc/cylc-flow/issues/3710#issuecomment-664048563) is most correct, as long as the job script runs a login shell again to regain an environment. 👍 (The equivalent command line is env -i ..., which can be equally effective.)
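For comparison, a sketch of the `env -i` form driven from Python: `env -i` starts the command with an empty environment apart from any `NAME=value` arguments given to it. `JOB_VAR` is a hypothetical name, and this assumes a POSIX `env` on the default PATH.

```python
import subprocess

# "env -i" discards the inherited environment; only JOB_VAR=explicit is passed on.
result = subprocess.run(
    ["env", "-i", "JOB_VAR=explicit", "env"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # JOB_VAR=explicit
```

In the proposal above, the job script would then start a login shell to rebuild a normal user environment on top of this empty one.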

Bonus consideration: What about event handler scripts?

This issue is not limited to background jobs - all submission methods are affected. In my tests, background, at and slurm jobs all inherit the environment (with Slurm you can use the directive --export=none to prevent this). See also #3487.

@matthewrmshin -

Unless you are talking about something before my time, my understanding is that background jobs have always inherited the scheduler's environment.

I thought I recalled that the old multiprocessing pool did not carry through the scheduler environment, but I could be wrong about that.

@dpmatthews -

This issue is not limited to background jobs

I'm not sure if the default behaviour is the same for all batch systems (PBS, Slurm etc.) - do they all default to passing the submission environment through to the job, or not? Either way, they don't have to pass it, and (IMO) for remote jobs at least it doesn't seem advisable to pass the entire submission environment to the job. (And if that is the case, we should treat local background jobs the same way, for consistency.)

Surely the safest thing to do is to give jobs only the environment that is explicitly set for them?

(Previous comment edited 3 or 4 times to correct typos!)

(Bringing back my Cylc brain.)

You are partially right there, as the sequence of events is a bit like this:

  1. Scheduler starts (with environment of CLI that started it).
  2. Multiprocessing pool starts.
  3. Scheduler reads suite, exports more environment variables.

So yes, the jobs won't have the environment variables in the suite, but they will have the environment from the CLI that started the suite.
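A toy sketch (not Cylc code) of the ordering described above: a worker process forked before the parent exports a variable does not see that variable, because the child's environment is a copy taken at fork time. It uses the `fork` start method explicitly, so it assumes a POSIX platform; `SUITE_VAR` is a hypothetical name.

```python
import multiprocessing as mp
import os


def report(name, exported, results):
    """Worker: wait until the parent has exported `name`, then report what it sees."""
    exported.wait()
    results.put(os.environ.get(name))


os.environ.pop("SUITE_VAR", None)  # make sure the variable starts out unset
ctx = mp.get_context("fork")
exported = ctx.Event()
results = ctx.Queue()

# Step 2: the worker is created (forked) with the environment as it is now.
worker = ctx.Process(target=report, args=("SUITE_VAR", exported, results))
worker.start()

# Step 3: the parent exports a new variable after the worker already exists.
os.environ["SUITE_VAR"] = "exported-later"
exported.set()

seen_by_worker = results.get(timeout=30)  # read before join to avoid blocking
worker.join()
print(seen_by_worker)  # None
```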

@hjoliver

I'm not sure if the default behaviour is the same for all batch systems

I think we should ensure the same environment is used when submitting a job, whichever method is used. Then it doesn't really matter which methods are affected.

Surely the safest thing to do is to give jobs only the environment that is explicitly set for them?

Agreed - I was just pointing out this is a wider issue than just background.

So yes, the jobs won't have the environment variables in the suite, but they will have the environment from the CLI that started the suite.

Yes, by "scheduler environment" I meant the environment that the scheduler is running in, which is the environment in which cylc run is executed. (Slightly complicated by the fact that the scheduler exports a few variables to its own environment on start-up, I think).

the environment in which cylc run is executed

Actually that's only true if cylc run runs the suite on localhost (which is another reason why it would be good to make the behaviour consistent).

(Well, cylc run also runs the suite on the target host, even if that is not the original CLI host, but yes, we need to distinguish between those hosts.)

I assume CYLC_* variables will still be made available to jobs? Both those exported in the job script and those in the suite environment (the former may depend on the latter?).
I have a few use cases for them, but I suppose we can add more to the wrapper or job script if needed.

I assume CYLC_* variables will still be made available to jobs?

Yes, this is only about the general environment (such as $PATH etc.) that the scheduler program has access to, and whether or not jobs should have implicit access to that as well (they really shouldn't, but local background jobs do by the normal subshell inheritance mechanism). All variables written to the job script are fine as that is a deliberate Cylc thing.
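A sketch of that distinction, assuming a POSIX `/bin/sh`: the child below starts with an almost empty environment, so a scheduler-only variable is absent, yet a variable exported by the script itself is still available. `CYLC_TASK_NAME` is used only as an example of a job-script export; the script body is illustrative, not a real Cylc job script, and `SCHEDULER_ONLY_VAR` is hypothetical.

```python
import os
import subprocess

os.environ["SCHEDULER_ONLY_VAR"] = "from-scheduler"

# Stand-in for a job script: it exports its own variables explicitly.
job_script = (
    'export CYLC_TASK_NAME="foo"\n'
    'echo "$CYLC_TASK_NAME ${SCHEDULER_ONLY_VAR:-unset}"\n'
)

result = subprocess.run(
    ["/bin/sh", "-c", job_script],
    env={"PATH": "/usr/bin:/bin"},  # nothing else inherited from the parent
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # foo unset
```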
