Flux-core: Design/prototype new run interface

Created on 27 Jun 2019 · 13 Comments · Source: flux-framework/flux-core

This is a design/discussion issue for the command line arguments and syntax for a jobspec-oriented run/submit interface.

The main idea is that there is a "slot shape" and a target entity for task scheduling. I'm not 100% sold on my own terminology here, so please feel free to propose alternatives. Essentially, the --target parameter, which defaults to slot, determines whether tasks are placed per-slot or per-resource, and on which resource. The number of tasks can be either per-target or a total count, and the shape is specified with a restricted version of the original short-form jobspec I proposed. Here's a sketch of the interface:

flux run

  • --file: read jobspec from a file, TODO determine override behavior, for now mutually exclusive with all else
    OR
  • --target: either slot or a specific resource specified in the request, default slot
  • --slot-shape: short-form resource shape, default: node. Current format idea: <resource-type>[ "[" <min>[:<max>] "]" ][ ><resource> | ,<resource at same level> ], where quoted brackets are literal and unquoted brackets mark optional parts. This is basically what we discussed long ago, but limited in what can be specified for now. It is still set up to parse as YAML, so you could also put actual YAML/JSON here if you were sufficiently motivated
  • --shape-file: read shape as a resource-set from a file
  • --nslots: number of slots to request, default: 1, also accepts a range to populate count
  • --tasks-per-target: number of tasks to run per target, either slot or resource, default 1
  • --total-tasks: total number of tasks to run in some arrangement across resources, mutually exclusive with --tasks-per-target
  • --time: walltime, given as a flux duration
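The option list above can be prototyped quickly with Python's argparse. This is only a sketch under stated assumptions, not the real flux CLI: the function names `parse_count` and `parse_args` are hypothetical, defaults are applied after parsing so that the "--file is mutually exclusive with all else" rule can be checked, and nothing is assumed about how --slot-shape and --shape-file interact.

```python
import argparse

# Defaults from the option list; applied after parsing so we can tell
# whether the user actually passed an option alongside --file.
DEFAULTS = {"target": "slot", "slot_shape": "node", "nslots": (1, None)}


def parse_count(text):
    """Parse '4' or '3:30' into (min, max); max is None for a single value."""
    low, sep, high = text.partition(":")
    return (int(low), int(high) if sep else None)


def parse_args(argv):
    """Hypothetical prototype of the proposed 'flux run' options."""
    parser = argparse.ArgumentParser(prog="flux run")
    parser.add_argument("--file")
    parser.add_argument("--target")
    parser.add_argument("--slot-shape")
    parser.add_argument("--shape-file")
    parser.add_argument("--nslots", type=parse_count)
    # --total-tasks and --tasks-per-target are mutually exclusive.
    tasks = parser.add_mutually_exclusive_group()
    tasks.add_argument("--tasks-per-target", type=int)
    tasks.add_argument("--total-tasks", type=int)
    parser.add_argument("--time")
    parser.add_argument("command", nargs=argparse.REMAINDER)
    args = parser.parse_args(argv)

    # For now, --file excludes every other option.
    others = [key for key, value in vars(args).items()
              if key not in ("file", "command") and value is not None]
    if args.file:
        if others:
            parser.error("--file is mutually exclusive with: " + ", ".join(others))
        return args

    for key, value in DEFAULTS.items():
        if getattr(args, key) is None:
            setattr(args, key, value)
    if args.tasks_per_target is None and args.total_tasks is None:
        args.tasks_per_target = 1
    return args
```

With this sketch, `parse_args(["--nslots", "3:30", "hostname"])` yields `nslots == (3, 30)` and `command == ["hostname"]`, while combining --file with any other option exits with a usage error.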

Use cases, drawn from RFC 14:

  1. Request 4 nodes: flux run --nslots 4
  2. Request between 3 and 30 nodes: flux run --nslots 3:30
  3. Request 4 tasks (sic; this was "nodes", but that would be the same as the next case) with at least 2 sockets each and 4 cores per socket (not planning to support sockets yet): flux run --nslots 4 --shape socket[2]>core[4]
  4. Request an exclusive allocation of 4 nodes that have at least two sockets and 4 cores per socket: flux run --nslots 4 --shape node>socket[2]>core[4]

Skipping the complex examples as we don't plan to support them yet, and for now the recommended mechanism would be writing the jobspec.
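To make the short-form shapes in the use cases above concrete, here is a minimal sketch of a parser for the ">"-nested subset only, for example socket[2]>core[4]. It is an illustration, not the flux implementation: the function name `parse_shape` is hypothetical, the output layout (type, count, with) is only loosely modeled on jobspec resource entries, and sibling lists, parentheses, and unit suffixes such as memory[2g] are deliberately rejected.

```python
import re

# One level of the shape: a lowercase type name with an optional
# "[min]" or "[min:max]" count.
_LEVEL = re.compile(r"^(?P<type>[a-z]+)(?:\[(?P<min>\d+)(?::(?P<max>\d+))?\])?$")


def parse_shape(shape):
    """Turn 'node>socket[2]>core[4]' into nested dicts, outermost first."""
    root = None
    parent = None
    for part in shape.split(">"):
        match = _LEVEL.match(part.strip())
        if match is None:
            raise ValueError(f"cannot parse shape level: {part!r}")
        # A missing count defaults to 1; a range becomes a min/max dict.
        count = int(match["min"]) if match["min"] else 1
        if match["max"]:
            count = {"min": count, "max": int(match["max"])}
        vertex = {"type": match["type"], "count": count}
        if parent is None:
            root = vertex
        else:
            parent["with"] = [vertex]
        parent = vertex
    return root
```

For instance, `parse_shape("socket[2]>core[4]")` returns a socket entry with count 2 that contains a core entry with count 4.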

Use-case set 2:

  1. Run hostname 20 times on 4 nodes, 5 per node

    1. flux run --nslots 4 --total-tasks 20 hostname

    2. flux run --nslots 4 --tasks-per-slot 5 hostname

    3. flux run --slot-shape node[4] --tasks-per-resource node:5 hostname

  2. Run 5 copies of hostname across 4 nodes, default distribution: flux run --nslots 4 --total-tasks 5 hostname
  3. Run 10 copies of myapp, require 2 cores per copy, for a total of 20 cores: flux run --nslots 10 --shape core[2] myapp
  4. Multiple binaries is not necessarily on tap yet, but I'm thinking of allowing you to have multiple of these on the same command line with a separator, probably get to the same place.
  5. Run 10 copies of app across 10 cores with at least 2GB per core: flux run --shape (core,memory[2g]) app (possibly amounts we may need to revisit)
  6. Run 10 copies of app across 2 nodes with at least 4GB per node: flux run --shape node>memory[4g] --total-tasks 10 app
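The relationship between a total task count and a per-slot count in the cases above can be sketched as follows. The issue does not define what "default distribution" means, so this assumes a simple block distribution in which any remainder goes to the first slots; the function name `tasks_per_slot` is hypothetical.

```python
def tasks_per_slot(nslots, total_tasks=None, per_slot=None):
    """Return a list with the number of tasks assigned to each slot.

    Exactly one of total_tasks or per_slot must be given, mirroring the
    mutual exclusion between --total-tasks and the per-slot option.
    """
    if (total_tasks is None) == (per_slot is None):
        raise ValueError("give exactly one of total_tasks or per_slot")
    if per_slot is not None:
        return [per_slot] * nslots
    # Assumed default: spread evenly, extra tasks go to the first slots.
    base, extra = divmod(total_tasks, nslots)
    return [base + 1 if index < extra else base for index in range(nslots)]
```

Under that assumption, 20 tasks over 4 slots and 5 tasks per slot both give `[5, 5, 5, 5]`, and 5 tasks over 4 slots gives `[2, 1, 1, 1]`.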

One possible issue here is that several of our use-cases require the slot to be outside the node for them to be easily expressible. Opening another issue for discussion of jobspec-V1 and ordering shortly.

All 13 comments

I really like this interface. I think it is very concise yet powerful and flexible.

--time walltime, TODO need a format here, prefer a standard

Flux Standard Duration? Unfortunately not super user friendly if you want to specify a fractional amount like 4 hours and 30 minutes (i.e., 270m in FSD).

--tasks-per-target: number of tasks to run per target, either slot or resource, default 1

After showing this to a fellow HPDC attendee (@dchapp), the target name was initially very confusing. Once I got him thinking purely in terms of slots (and basically ignoring the target idea to start), the interface was very natural to him. I think the 90% use-case will be with slots, and only the more advanced users will end up using the --target option (I may be wrong on that though). If that is the case, I think it would be helpful to have a --tasks-per-slot option, which is essentially an alias for --tasks-per-target and (potentially) mutually exclusive with --target. In any user documentation, the --target option should probably be presented towards the end, if not in its own section, to avoid confusing users with simpler resource specification needs.

I am also leaning towards changing --shape to --slot-shape, that way it is clear when reading the final command line:
flux run --shape Core[2] --nslots 10 myapp
vs
flux run --slot-shape Core[2] --nslots 10 myapp
In the former, it is ambiguous as to whether the shape means the shape of the entire allocation or the shape of the slot.

One other point brought up by @dchapp was: in the case of --shape (Core,Memory), what does Memory default to? If it defaults to 1 byte, that's not overly useful and could be harmful if the job is actually contained to that exact amount. I wonder if we should make an exception for that specific resource (i.e., the default is 1g).

Unfortunately not super user friendly if you want to specify a fractional amount like 4 hours and 30 minutes (i.e., 270m in FSD).

4.5h would be valid FSD as well since the number is defined as floating point.
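As a quick check of the 4.5h versus 270m equivalence, here is a minimal sketch of a duration parser. It is not the libutil implementation: the name `parse_fsd` is hypothetical, the suffix set (s, m, h, d, with seconds when no suffix is given) is an assumption, and the regex is intentionally strict, accepting only plain decimal numbers with an optional decimal point.

```python
import re

# Assumed unit suffixes and their length in seconds.
_UNITS = {"s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}

# A plain decimal number (decimal point optional) plus an optional suffix.
_FSD = re.compile(r"^(\d+(?:\.\d*)?|\.\d+)([smhd]?)$")


def parse_fsd(text):
    """Convert a duration string such as '4.5h' into seconds."""
    match = _FSD.match(text)
    if match is None:
        raise ValueError(f"not a valid duration: {text!r}")
    number, suffix = match.groups()
    return float(number) * _UNITS[suffix or "s"]
```

With these assumptions, `parse_fsd("4.5h")` and `parse_fsd("270m")` both evaluate to 16200.0 seconds.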

Would it make sense to add a new subcommand to flux-jobspec that implements the proposed command line as the first prototype step? That makes it easy to play with...

Good feedback, thanks! I'll update it to slot-shape, it's certainly easier to tell what that means.

What would you think of getting rid of --target as it is, having --tasks-per-slot and --tasks-per-resource where the latter would take an argument of the form <resource>:<num> or similar? I keep circling around how to deal with that split, and I like the idea of having the two be mutually exclusive, so maybe make both the per-resource options one?
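A --tasks-per-resource argument of the form <resource>:<num> would be simple to parse. The sketch below assumes that exact form and that the count must be a positive integer; the function name `parse_tasks_per_resource` is hypothetical.

```python
def parse_tasks_per_resource(text):
    """Split 'node:5' into ('node', 5)."""
    resource, sep, num = text.rpartition(":")
    if not sep or not resource:
        raise ValueError(f"expected <resource>:<num>, got {text!r}")
    count = int(num)  # raises ValueError if <num> is not an integer
    if count < 1:
        raise ValueError("task count must be at least 1")
    return resource, count
```

For example, `parse_tasks_per_resource("node:5")` returns `("node", 5)`, matching the node:5 form used in the use cases.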

As to memory, I think we had a concept of a default unit at some point; when it comes to memory or storage, having a default unit of gigabyte seems pretty reasonable. Will have to think about that. The first version won't support memory as an option since V1 jobspec doesn't, but when we get there I would think we'd do something like that.

On flux-jobspec, sure I don't see why not. It would probably make some of the refactoring easier to build the second part in there anyway now that I think about it.

Great start!

Yes I agree that dovetailing this with flux-jobspec will make later testing easier.

In terms of the resource type names, I prefer using the actual names used in other places, like the schedulers. Right now we are using lowercase names like socket instead of Socket.

Same comment on how to specify the time as @SteVwonder.

Good point @dongahn, I had forgotten the implementation used all-lower, updated the description to match, and added the flux duration up there. Do we have an existing library for parsing the durations?

Do we have an existing library for parsing the durations?

Good question. @SteVwonder or @garlick? I need to modify libjobspec as well.

Do we have an existing library for parsing the durations?

That’s a good question. Turns out, we do. In libutil.
https://github.com/flux-framework/flux-core/blob/4e01f7517c6ab3e3b5b56a67282a9205125a1498/src/common/libutil/fsd.h

What would you think of getting rid of --target as it is, having --tasks-per-slot and --tasks-per-resource where the latter would take an argument of the form <resource>:<num> or similar?

That sounds like a good plan to me.

Cool. Will definitely use, although I'm not sure we want to continue allowing NaN and infinite to be valid durations...

Agreed that NaN shouldn't be valid. Infinite though, that could be useful for more cloud-like persistent workloads. 🤷‍♂

Maybe the tiny FSD RFC should be updated with 1) NaN not allowed, and 2) decimal point optional.

Jim

Maybe just state positive normal FP with decimal optional.

There are quiet NaNs, signaling NaNs, overflow, underflow, etc., as defined by IEEE 754. They are considered not normal, and we shouldn't allow any of these. If users want inf, they can just specify a very large value?

I’ll propose an update and a patch, it should be pretty straightforward. My preference would be (!(zero||nan||inf)). I don’t see a strong reason to explicitly disallow subnormals, they really shouldn’t come up, but if they do I don’t know why we’d choke. Did you have an issue in mind Dong?
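The proposed check, (!(zero||nan||inf)), can be written out as a small predicate. This is a sketch of that expression only, not the eventual patch: the function name `is_valid_duration` is hypothetical, and because the expression as quoted says nothing about sign, negative values are not rejected here.

```python
import math


def is_valid_duration(seconds):
    """Accept any float that is not zero, NaN, or infinite."""
    return not (seconds == 0 or math.isnan(seconds) or math.isinf(seconds))
```

Subnormal values pass this check, in line with the preference above not to disallow them explicitly: `is_valid_duration(5e-324)` is True, while 0.0, NaN, and infinity are all rejected.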

I have no issue. Wanted to reduce the wording. Fine with your proposal.

Dong
