Implement stdio modes beyond simply capturing stdout + stderr to the KVS on job exit.
Options can be passed via jobspec attributes.system.shell.iomode or similar.
Some options from wreck for consideration in the new system are:
--label-io::
-l::
Prepend stdout and stderr output lines with the task id to which
the output belongs.
--output='FILENAME'::
-O 'FILENAME'::
Duplicate stdout and stderr from tasks to a file or files. 'FILENAME'
is optionally a mustache template which expands the keys 'id', 'cmd'
and 'taskid'. (e.g. '--output=flux-{{id}}.out')
--error='FILENAME'::
-E 'FILENAME'::
Send stderr to a different location than stdout.
--input='HOW'::
-i 'HOW'::
Indicate how to deal with stdin for tasks. 'HOW' is a list of 'src:dst'
pairs where 'src' is a source 'FILENAME' which may optionally be a
mustache template as in `--output`, or the special term `stdin` to
indicate the stdin from a front end program, and 'dst' is a list of
taskids or `*` or `all` to indicate all tasks. The default is `stdin:*`.
If only one of 'src' or 'dst' is specified, a heuristic is used to
determine whether a list of tasks or an input file was meant. (e.g.
`--input=file.in` will use `file.in` as input for all tasks, and
`--input=0` will send stdin to task 0 only.)
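The single-value heuristic described above could be sketched as follows. The wreck implementation is not shown in this thread, so the rule used here (a value that looks like a task list is a 'dst', anything else is a 'src') and the name `parse_input_pair` are assumptions. Only one 'src:dst' pair is handled, because the list separator is not specified above.

```python
import re

# Assumption: a task list is "*", "all", or comma-separated ids/ranges ("0,2-4").
TASK_LIST = re.compile(r"^(\*|all|\d+(-\d+)?(,\d+(-\d+)?)*)$")


def parse_input_pair(how="stdin:*"):
    """Return (src, dst) for one HOW value; hypothetical helper, not wreck code."""
    if ":" in how:
        src, dst = how.split(":", 1)
        # An empty side falls back to the documented default "stdin:*".
        return (src or "stdin", dst or "*")
    if TASK_LIST.match(how):
        # Looks like a task list: read from the front end's stdin.
        return ("stdin", how)
    # Otherwise treat the value as an input file for all tasks.
    return (how, "*")
```

Under this rule a file literally named `0` cannot be selected without writing the full pair (`0:*`), which is the kind of ambiguity any such heuristic has to accept.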
I think the mustache templates were thought to be pretty nice!
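As a rough illustration of the template expansion, here is a minimal sketch that handles only simple `{{key}}` tags (no sections or other mustache features). The function name `expand_template` and the choice to raise on an unknown key are assumptions for illustration, not wreck's behavior.

```python
import re

# Matches a simple mustache variable tag such as {{id}} or {{ taskid }}.
TAG = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def expand_template(template, **keys):
    """Replace each {{key}} in template with str(keys[key])."""

    def substitute(match):
        name = match.group(1)
        if name not in keys:
            # Assumption: an unknown key is an error rather than left as-is.
            raise KeyError(f"unknown template key: {name}")
        return str(keys[name])

    return TAG.sub(substitute, template)


# Keys named in the option description above: id, cmd, taskid.
print(expand_template("flux-{{id}}.out", id=1234))
```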
Brainstormed thoughts on how to approach some of these modes.
First, I assumed that we want to keep the basic premise of adding an output service (proposed in #2201, WIP PR #2208) in the job-shell.
If we want to add additional output modes (output to files after the job is done, output to local files without permanent storage, output stored in the KVS, all output dropped, etc.), adding them to the output service would be a good way to abstract the ultimate output destination away from the job-shell.
If we want a mechanism to read past or future output, it would also be convenient to go through the output service. flux job attach could connect to the service directly to get all output from the job so far, or to follow future output as it arrives. Selecting just stdout, just stderr, or just the output of a specific task would be easy to implement as well.
Issues that we need to determine:
Since the output service will use a service name based on the jobid, it's easy to connect to that service name.
But how would the front end tool (flux job attach) know which broker the service is on to communicate with? This currently isn't known. Perhaps the information could be stored in the job eventlog? So if this information isn't in the job eventlog, we know the job hasn't started, so flux job attach can wait for the job to start (or error, could be either).
How would the front end tool know if there is any output to be read at all? For example, if the user chose to drop all output in the task, how could flux job attach know this and appropriately wait, error, or otherwise handle the situation? Would the information be in the jobspec (obtained via KVS lookup)? Or would it be in the job eventlog? Or would we simply have the output service return an appropriate errno indicating the error?
Assuming we write some information to the job eventlog, we need to determine which job eventlog to write this information to: the main job eventlog or the guest one. The attach tool would also have to determine which eventlog to look at, the main job eventlog or the guest one.
Thinking about these issues, I think the most important task is to determine how flux job attach would communicate with the service, by eventlog or otherwise. That may require us to solve #2105 first. Once that is done, we can probably do a minimal libiobuf service (#2201/#2208) that replicates what is currently there, before moving on to the more advanced I/O modes we care about. Perhaps --label-io is the trivial first one we do.
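To show why labeling looks like the trivial first mode, here is a minimal sketch of prefixing each output line with its task id. The `taskid: ` prefix format and the helper name `label_lines` are assumptions; the thread does not fix either.

```python
def label_lines(taskid, data):
    """Prefix every line of data with 'taskid: ' (prefix format is an assumption)."""
    # keepends=True preserves the original newlines, including a missing final one.
    return "".join(f"{taskid}: {line}" for line in data.splitlines(keepends=True))


print(label_lines(3, "hello\nworld\n"), end="")
```

A real implementation would also need to buffer partial lines per task, since a chunk of output may end in the middle of a line.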
> First, I assumed that we do want to keep basic premise of adding an output service (proposed in #2201, WIP PR #2208) in the job-shell
The _service_ may be too specialized to the shell to abstract in a library economically. For example, the shell can log to stderr and exit, while a library needs interfaces to let the caller make such decisions. The corner cases you mentioned would likely be more straightforward to handle in the shell rather than in a library. Given how simple shell/io.c turned out to be, it feels to me like we need to rethink the library approach.
That said, front end commands and the shell should probably share some code, for example the encoding/decoding of I/O (flux-framework/rfc#192).
There is a design card on the exec project board due 8/16. I suggest the flux-core team convene sometime soon at coffee time to discuss the important questions you are raising here and get a sketch of the design agreed upon.
Anyway, thanks for raising these questions as I think they are pertinent to the design either way!
Let's refocus this issue on _output_ modes, since that's somewhat separable from input modes. We can cover input in #2257.
In order for the shell to support several of these output modes, job submission information has to be passed from the user to the shell. The best place for this appears to be the jobspec.
As an initial implementation, let's pass such options in attributes.system.exec.shell.options.
It's also not clear to me how the average user will specify these options at the moment (via flux-srun? via flux-jobspec?). Perhaps via the eventual newer flux-srun (#2150) where the user can pass options through onto the shell?
We have to decide if we want full command line compatibility with Slurm srun for flux jobspec srun. If we do, then we'll need a flux jobspec srun -o, --output option which supports the Slurm IO redirection "filename pattern" (see the IO Redirection section of the srun(1) manpage).
My guess is that we don't want to support the full Slurm filename patterns, but do we want to save the -o, --output option for later, in case at some point flux jobspec srun does have to emulate srun?
If we're willing to break compatibility, one idea would be to add an -O, --output option which takes some sort of output filename template (it wouldn't have to be mustache).
As part of supporting shell MPI and other plugins, we'll need a way to set other generic options in the shell.options object, and I was hoping to have a -o, --option=SPEC option available to do that, but that does break the srun interface (-o is --output in srun.)
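A generic option setter of this kind might look like the sketch below. The SPEC syntax is not defined in this thread, so the `KEY[=VALUE]` form with dotted keys, the treatment of a bare key as `True`, and the name `set_option` are all assumptions for illustration.

```python
def set_option(options, spec):
    """Set a (possibly dotted) KEY from 'KEY[=VALUE]' in the nested dict options."""
    key, sep, value = spec.partition("=")
    if not key:
        raise ValueError(f"missing option name in {spec!r}")
    parts = key.split(".")
    node = options
    for part in parts[:-1]:
        # Create intermediate objects as needed.
        node = node.setdefault(part, {})
    # Assumption: a bare KEY with no '=' is a boolean flag; values stay strings.
    node[parts[-1]] = value if sep else True
    return options


shell_options = {}
set_option(shell_options, "stdout.type=file")
set_option(shell_options, "stdout.path=flux-{{id}}.out")
print(shell_options)
```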
About to add output options to flux mini. The shell options proposed in #2395 and #2396 are:
attributes.system.shell.options.output.<stream>.type
attributes.system.shell.options.output.<stream>.path
attributes.system.shell.options.output.<stream>.label
Values for _type_: (string) "kvs", "file", or "per-task"
Values for _path_: (string) a UNIX path, w/ optional embedded "{{id}}" mustache
Values for _label_: (boolean) true or false
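For reference, the three keys and their allowed values can be sketched as a small validating builder. The stream names `stdout` and `stderr`, the requirement that non-"kvs" types carry a path, and the helper names are assumptions; only the key layout and value types come from the proposal above.

```python
import json

VALID_TYPES = {"kvs", "file", "per-task"}


def stream_options(type="kvs", path=None, label=False):
    """Build the options object for one output stream, checking value types."""
    if type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    if not isinstance(label, bool):
        raise TypeError("label must be a boolean")
    if type != "kvs" and path is None:
        # Assumption: file-backed types need a path.
        raise ValueError(f"type {type!r} requires a path")
    opts = {"type": type, "label": label}
    if path is not None:
        opts["path"] = path
    return opts


def output_attributes(**streams):
    """Nest per-stream options under attributes.system.shell.options.output."""
    output = {name: stream_options(**kw) for name, kw in streams.items()}
    return {"attributes": {"system": {"shell": {"options": {"output": output}}}}}


attrs = output_attributes(stdout={"type": "file", "path": "flux-{{id}}.out"})
print(json.dumps(attrs, indent=2))
```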
The plan was to create --output and --error options similar to wreckrun's. It's straightforward to set _type_="file" and _path_ based on these options.
How should the user select between _type_="file" and _type_="per-task"? Do we need --output-mode and --error-mode options?
For _label_, I assume that --label would set _label_=true for streams with _type_="file".
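The mapping from command line options to shell options could then be sketched as below. This is not the flux mini implementation: the function name is hypothetical, and sending stderr to the --output path when --error is absent is an assumption carried over from the wreck option descriptions quoted at the top of this issue.

```python
def output_options_from_cli(output=None, error=None, label=False):
    """Translate --output, --error and --label values into per-stream options."""
    # Assumption: without --error, stderr follows stdout to the --output path.
    paths = {"stdout": output, "stderr": error if error is not None else output}
    options = {}
    for stream, path in paths.items():
        if path is None:
            continue  # leave the stream on the shell's default destination
        options[stream] = {"type": "file", "path": path}
        if label:
            # --label only applies to streams redirected to a file.
            options[stream]["label"] = True
    return options


print(output_options_from_cli(output="flux-{{id}}.out", label=True))
```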
I think the per-task "type" is going to go away once we have real templating. The shell will have to figure out if the output file is per-task, per-shell, or per-job based on the template. Therefore, my opinion is that we shouldn't add any new options to flux mini just yet so we won't have to change them later.
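One way the shell could infer the scope from the template alone is sketched below. It only distinguishes per-task from per-job, using the `taskid` key named earlier in this issue; no template key for per-shell output has been named in the thread, so that case is left out rather than invented.

```python
import re

# Matches a simple mustache variable tag such as {{id}} or {{taskid}}.
TAG = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def output_scope(template):
    """Return 'per-task' if the template varies by task, else 'per-job'."""
    keys = set(TAG.findall(template))
    return "per-task" if "taskid" in keys else "per-job"


print(output_scope("flux-{{id}}.out"))
print(output_scope("flux-{{id}}-{{taskid}}.out"))
```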
> Values for type: (string) "kvs", "file", or "per-task"
Yup
> For label, I assume that --label would set label=true for streams with type="file".
Yup
Edit (I messed up cut & paste):
> Values for path: (string) a UNIX path, w/ optional embedded "{{id}}" mustache
> Values for label: (boolean) true or false
Both yup
> we shouldn't add any new options to flux mini just yet so we won't have to change them later
Are we ok with -o stdout.type=per-task being the only way to activate per-task I/O for now, or would it be better to have the output plugin support "{{taskid}}" now and drop the _type_ shell options?
It would be better if we could support arbitrary templates now and drop type=per-task. For the current milestone, could we delay per-task support and just support per-job output files, and rework the current per-task PR with a version that supports full mustache templates?
> -o stdout.type=per-task
Does this work now? That's cool!!
> For the current milestone, could we delay per-task support and just support per job output files, and rework the current per-task PR with a version that supports full mustache templates?
I was working under this understanding: that we were only trying to get single-file stdout/stderr redirection working.
I peeled off #2422 for per-task output so we can close this one.