Flux-core: flux-mini: controlling environment variables included

Created on 14 Aug 2020  ·  16 Comments  ·  Source: flux-framework/flux-core

AFAICT, we don't have a way to control which environment variables get captured by flux mini into the jobspec and passed onto the job.

Slurm uses the --export=<[ALL,]environment variables|ALL|NONE> argument. Do we want to adopt something similar? Any ideas for a better interface?

All 16 comments

It feels like we could do better.

@trws added something like the following to flux-run

It would be nice to support functionality similar to srun's --export=NONE, even if it's named differently or has different semantics.

Agreed. I added something, let me know what you think. This one works like this:

  1. Create the initial dictionary: if nothing is specified about the environment, or --env-all is specified, initialize it to the current env; otherwise initialize it to an empty dict.
  2. Add environment entries from the command line, with repeatable --env=key[=val] or -e key[=val] arguments.

Also considering having an --env-file argument that would be between step 1 and step 2, but did this for now.
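
For illustration, the two steps described above could be sketched in Python roughly as follows. This is not the code added to flux-run; `build_env`, its parameters, and the choice to silently skip a bare key that is missing from the current environment are all assumptions made for the sketch.

```python
import os


def build_env(env_args, env_all=False, base=None):
    """Hypothetical helper: build a job environment dict in two steps."""
    base = dict(os.environ) if base is None else dict(base)
    # Step 1: start from the current environment if nothing was specified
    # about the environment or --env-all was given; otherwise start empty.
    env = dict(base) if (env_all or not env_args) else {}
    # Step 2: add entries from repeatable --env=key[=val] arguments.
    for arg in env_args:
        key, sep, val = arg.partition("=")
        if sep:
            env[key] = val          # key=val: set the value explicitly
        elif key in base:
            env[key] = base[key]    # key: copy from the current environment
        # Assumption: a bare key missing from the environment is skipped.
    return env
```

With this sketch, `build_env(["PATH"])` would yield a dict holding only the current PATH, while `build_env(["DEBUG=1"], env_all=True)` would yield the full current environment plus DEBUG.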

This seems preferable to Slurm's approach. For sophisticated users, perhaps an option could be used to supply a Python function that could act as a filter or would be called in place of the default method to fetch the environment?

One question though: what is the use case for no environment (i.e. --export=NONE)? Can anything run with an empty environment?

This seems preferable to Slurm's approach.

Yeah. I like that suggestion!

One question though: what is the use case for no environment (i.e. --export=NONE)?

My personal use case was to use it in combination with --dry-run. It makes the json output much more concise and readable (especially when using Spack). Something similar can be achieved with flux mini run --dry-run ... | jq '.attributes.system.environment = {}', but --export=NONE (or similar) is more convenient. Not sure how frequently users will reach for --dry-run, but if we expect them to, I suspect they will want to optionally drop the env too.

Can anything run with an empty environment?

That's a great question. I don't know. What is the minimal set of env vars that you need? Would anyone ever try flux mini run --export=NONE --env=PATH --env=HOME ...? That's probably a less common potential use case.

According to the srun docs, it looks like --export=NONE is incompatible with specifying individual env vars, but it is "particularly important for jobs that are submitted on one cluster and execute on a different cluster (e.g. with different paths)." That could be useful when using something like flux proxy.

For sophisticated users, perhaps an option could be used to supply a Python function that could act as a filter

Maybe a good starting point is supporting regex? Strawman: flux mini run --env-filter-regex='OMPI_.*' .... If we went that route, we wouldn't need an explicit --export=NONE since --env-filter-regex='.*' would cover it.

One question though: what is the use case for no environment (i.e. --export=NONE)? Can anything run with an empty environment?

I actually do this, or really the moral equivalent env -i, relatively frequently when I’m trying to debug crazy compute environments at HPC centers. There tends to be a lot of crud in the environment, so I pretty frequently want to clear all of that out and start a shell or job with a script that sets only what I want. Part of the key there is setting what I want, which is easier when you can add things on the same command, and using something that can run my controlled init scripts without having to deal with system files. It’s not something I think most users would use a lot, but being able to clear all current state and have a job kinda “sealed” with only what it specifies can be useful when you want to be sure you can reproduce things.

Thanks @SteVwonder and @trws - that makes total sense.

I like the idea of --env-filter-regex as a starting point. For a simpler interface, a glob might work as well:

 $ flux mini run --env-filter="*" ...
 $ flux mini run --env-filter="OMPI_*" ...
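
A glob filter like this is cheap to express in Python with the standard `fnmatch` module. The sketch below assumes "filter" means "remove matching variables" (as implied by `.*` standing in for --export=NONE); `env_filter` is a hypothetical name, not an existing flux function.

```python
from fnmatch import fnmatchcase


def env_filter(env, pattern):
    """Return a copy of env without the variables whose names match pattern."""
    # fnmatchcase() is used so that matching is case-sensitive on all platforms.
    return {name: val for name, val in env.items()
            if not fnmatchcase(name, pattern)}
```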

It’s not something I think most users would use a lot, but being able to clear all current state and have a job kinda “sealed” with only what it specifies can be useful when you want to be sure you can reproduce things.

Great point!

In Slurm we added a use-env plugin which allowed flexible control over environment via "named" configurations (e.g. you could run a job with srun --use-env=normal ... or srun --use-env=testing to set up a "normal" vs "testing" environment).

It occurs to me that we could do something even more powerful by optionally passing the formed jobspec object to a plugin or set of plugins after it is created in flux-mini. Not only could these plugins set, unset, or modify the jobspec environment, but they could modify anything in the jobspec. Perhaps this could be part of a solution for #3143.

It occurs to me that we could do something even more powerful by optionally passing the formed jobspec object to a plugin or set of plugins after it is created in flux-mini

Oh duh, @SteVwonder already brought this one up in #2875

How about this?

 --env=RULE

Where RULE is a generic environment modification RULE with syntax like:

  • -<pattern>: filter out environment variables matching pattern. pattern is shell glob syntax for simplicity, unless it is prefixed with /, in which case it is a regex (with optional trailing '/')
  • VAR: set environment variable VAR from the current environment
  • VAR=VAL: set env var VAR explicitly to VAL
  • VAR+=VAL: prepend VAL to colon-separated env var VAR (e.g. PATH)
  • VAR=+VAL: append VAL to colon-separated env var VAR (e.g. PATH)

As a convenience, the --env-filter=PATTERN option can still be offered, but is the same as --env=-PATTERN

e.g. to clear environment

$ flux mini run --env="-*" --dry-run hostname | jq .attributes.system.environment
{}

To only propagate the current PATH:

$ flux mini run --env="-*" --env=PATH --dry-run hostname | jq .attributes.system.environment
{
  "PATH": "/home/grondo/git/flux-core.git/src/cmd:/usr/bin:/bin"
}

To propagate PATH, appending or prepending a path element:

$ flux mini run --env="-*" --env="PATH=+/foo" --dry-run hostname | jq .attributes.system.environment
{
  "PATH": "/home/grondo/git/flux-core.git/src/cmd:/usr/bin:/bin:/foo"
}
$ flux mini run --env="-*" --env="PATH+=/foo" --dry-run hostname | jq .attributes.system.environment
{
  "PATH": "/foo:/home/grondo/git/flux-core.git/src/cmd:/usr/bin:/bin"
}

The env RULE prefix could later be expanded, for example a ^file or |program to read env vars from a file or external program.

Just throwing this approach out there. I have a working prototype.
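
For readers following along, here is a rough Python sketch of how the RULE syntax proposed in this comment might be interpreted. It is not the prototype mentioned above: `apply_rules` and `_matcher` are hypothetical names, and details such as anchoring the regex at the start of the name, or falling back to the current environment when prepending or appending to a variable that was filtered out, are assumptions chosen to reproduce the example outputs shown.

```python
import re
from fnmatch import fnmatchcase


def _matcher(pattern):
    """Return a predicate for a glob pattern, or a regex if prefixed with '/'."""
    if pattern.startswith("/"):
        body = pattern[1:]
        if body.endswith("/"):
            body = body[:-1]  # optional trailing '/'
        regex = re.compile(body)
        # Assumption: the regex is matched from the start of the name.
        return lambda name: regex.match(name) is not None
    return lambda name: fnmatchcase(name, pattern)


def apply_rules(rules, current):
    """Apply rules in the order given, starting from a copy of `current`."""
    env = dict(current)
    for rule in rules:
        if rule.startswith("-"):
            # -<pattern>: filter out matching variables
            matches = _matcher(rule[1:])
            for name in [n for n in env if matches(n)]:
                del env[name]
            continue
        name, sep, value = rule.partition("=")
        if not sep:
            # VAR: copy from the current environment, if present
            if name in current:
                env[name] = current[name]
        elif name.endswith("+"):
            # VAR+=VAL: prepend VAL to a colon-separated variable
            name = name[:-1]
            old = env.get(name, current.get(name))
            env[name] = f"{value}:{old}" if old else value
        elif value.startswith("+"):
            # VAR=+VAL: append VAL to a colon-separated variable
            value = value[1:]
            old = env.get(name, current.get(name))
            env[name] = f"{old}:{value}" if old else value
        else:
            # VAR=VAL: set explicitly
            env[name] = value
    return env
```

Under these assumptions, `apply_rules(["-*", "PATH=+/foo"], os.environ)` would give a dict containing only PATH with `/foo` appended, in line with the --dry-run output above.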

I like that a lot!

The one question I have is w.r.t. ordering of the filters. Do the filters get applied in the order they are provided on the command line? If so, presumably this would produce an empty environment: flux mini run --env=PATH --env="-*" hostname? I think that is fine, I just want to make sure I have the semantics straight in my head.

One alternate semantic would be to sort the modifications and then apply them in the order you have above:

  • Removing with glob/regex
  • Copying from current environment
  • Explicitly setting
  • Prepending
  • Appending
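
To make the sorted alternative concrete, here is a small Python sketch that ranks rule strings by the category order in the list above and relies on a stable sort. `rule_rank` and `sort_rules` are hypothetical names, and the rule syntax is the one proposed earlier in this thread.

```python
def rule_rank(rule):
    """Rank a rule string by the category order listed above."""
    if rule.startswith("-"):
        return 0  # removing with glob/regex
    name, sep, value = rule.partition("=")
    if not sep:
        return 1  # copying from current environment
    if name.endswith("+"):
        return 3  # prepending (VAR+=VAL)
    if value.startswith("+"):
        return 4  # appending (VAR=+VAL)
    return 2      # explicitly setting (VAR=VAL)


def sort_rules(rules):
    # sorted() is stable, so rules within one category keep their
    # command-line order relative to each other.
    return sorted(rules, key=rule_rank)
```

With this ordering, `--env=PATH --env="-*"` would be reordered so the filter runs first, and PATH would survive rather than being removed.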

The one question I have is w.r.t. ordering of the filters. Do the filters get applied in the order they are provided on the command line? If so, presumably this would produce an empty environment: flux mini run --env=PATH --env="-*" hostname? I think that is fine, I just want to make sure I have the semantics straight in my head.

Yeah, the filters would be applied in the order they are specified on the command line. This not only might be slightly more predictable, but would be much easier to implement. However, as you note above this does allow you to undo something you've just done.

The alternate semantics you proposed above seem like they would work as well. I can't think of a use case they would not handle. Obviously if you added features to get environment from files and/or filter or fetch environment with a program you'd have to be explicit about the order those features are processed in and keep that documented. It might be easier in the long run to just state once that "--env and --env-filter options are processed in the order they are given on the command line".

It might be easier in the long run to just state once that "--env and --env-filter options are processed in the order they are given on the command line".

Sounds good to me!

I'd probably go with in order FWIW. If there needs to be some further ordering applied, perhaps the way gcc handles order of arguments would help, where it takes each type of argument in order but does not interleave them. All -I arguments are processed in order, but apply to all files specified for example. All -L are applied before -l. So we could say "all filters, then all env" or something if we had to but just having it be in order seems simplest and probably least surprise for now.

As to the rules, I like the concept of each of those, but I'm not sure I'm on board with making it implicit based on a prefix, or perhaps especially the =+ variant. Environment variables in shells and classic utilities are relatively kind things, but the rules are a great deal looser on non-standard utilities' use of them. From The Open Group:

"These strings have the form name=value; names shall not contain the character '='. For values to be portable across systems conforming to IEEE Std 1003.1-2001, the value shall be composed of characters from the portable character set (except NUL and as indicated below)."

So... yeah, this is valid:

~/build master*
127 ❯ env -i - '--meh=bah' bash -c 'env'
--meh=bah
PWD=/Users/scogland1/build
SHLVL=1
_=/usr/bin/env

As is this:

~/build master*
❯ env -i - 'meh=+bah' bash -c 'env'
meh=+bah
PWD=/Users/scogland1/build
SHLVL=1
_=/usr/bin/env

And meh+=bah parses as variable meh+ being assigned to bah.
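
The ambiguity can be seen by splitting each entry at the first '=' the way the name=value form quoted above implies. A minimal Python sketch (`split_env_entry` is a hypothetical helper, not part of any existing tool):

```python
def split_env_entry(entry):
    """Split a 'name=value' string at the first '='; names cannot contain '='."""
    name, sep, value = entry.partition("=")
    if not sep:
        raise ValueError(f"not a name=value entry: {entry!r}")
    return name, value
```

Here `split_env_entry("meh+=bah")` gives the name `meh+`, and `split_env_entry("meh=+bah")` gives the value `+bah`, so a prefix-based rule syntax cannot tell these apart from its own prepend and append operators.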

Most shells don't let you use environment variables like this, so we might get away with just saying "you shall not pass!" these environment variables through this interface, but at least the chance we'll get someone who wants to set an environment variable to something starting with a + seems relatively likely.

Good points. I will back off the append/prepend variants.
Especially since a user could do:

 --env=PATH=$PATH:/foo

Sorry got carried away there.

FWIW I really like the idea of having easy access to that functionality, especially since it makes it composable multiple times on the same command which you can't really do otherwise, just perhaps expressed with separate flags?

Yeah, the only reason I was using the admittedly naive syntax was to be able to append all rules to the same internal list, then allow a special "read from file" syntax which allowed the same "rules" to be listed in a file, e.g.:

PATH+=/foo
DEBUG=t
TERM

This can't be accomplished if you require a separate option for each environment manipulation rule.

However, there's probably a different, but more sophisticated way to handle env-files.

Hm, I just discovered os.path.expandvars() which could be applied to an environment file, which allows:

PATH=/foo:$PATH

:shrug:
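
As a quick illustration of that idea, `os.path.expandvars()` can be applied to the value part of each env-file line. This is only a sketch: `expand_env_line` is a hypothetical helper, and expanding only the value (not the name) is an assumption.

```python
import os


def expand_env_line(line):
    """Split 'NAME=VALUE' and expand $VAR / ${VAR} references in VALUE."""
    name, _, value = line.strip().partition("=")
    # expandvars() substitutes from os.environ and leaves references to
    # undefined variables unchanged.
    return name, os.path.expandvars(value)
```

With PATH set to /usr/bin in the calling environment, a line of `PATH=/foo:$PATH` would expand to the value `/foo:/usr/bin`, which covers the prepend case without any special operator.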

For now we can leave the append/prepend off and a specific option can be added later as @trws suggests.

Ok, here's where I ended up with prototype for now:

      --env=RULE            Control how environment variables are exported. If
                            RULE starts with '-' apply rest of RULE as a
                            filter (see --env-filter), if '^' then read rules
                            from a file (see --env-file). Otherwise, set an
                            environment variable from the current environment
                            (--env=VAR) or set a value explicitly
                            (--env=VAR=VALUE). Rules are applied in the order
                            they are used
      --env-filter=PATTERN  Filter environment variables matching PATTERN. If
                            PATTERN starts with a '/', then it is matched as a
                            regular expression, otherwise PATTERN is a shell
                            glob expression. (multiple use OK)
      --env-file=FILE       Read a set of environment rules from FILE

Allowing "rules" like -PATTERN and ^FILE allows a user to have a file like:

$ cat envfile 
-FLUX_URI
-LS_COLORS
-DBUS*
PATH=$PATH:/foo

Then

$ flux mini run --env-file=envfile hostname

Prepend/append on the command line would have to be handled separately since they don't have a "rule" syntax in this scheme.

This was just meant as an experiment. I'm willing to go a different direction if there is a perceived need for users to set environment variables with leading ^ and - characters on the command line.
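
To show why a single rule list makes env files easy, here is a hedged Python sketch of expanding `^FILE` rules in place so that everything ends up in one ordered list. It is not the prototype itself; `expand_rules` is a hypothetical name, and skipping blank lines and allowing nested `^FILE` rules inside a file are assumptions.

```python
def expand_rules(rules):
    """Yield rules in order, replacing each '^FILE' rule by the rules in FILE."""
    for rule in rules:
        if rule.startswith("^"):
            with open(rule[1:]) as f:
                lines = [line.strip() for line in f]
            # Assumption: blank lines are skipped; a file may itself
            # contain '^FILE' rules, which are expanded recursively.
            yield from expand_rules([line for line in lines if line])
        else:
            yield rule
```

In this scheme --env-filter=PATTERN would just append `-PATTERN` and --env-file=FILE would append `^FILE` to the same list, which is then applied in order.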

This should have been closed by #3150
