Flux-core: [spectrum mpi] need to suppress OMPI_, JSM_, and PMIX_ environment

Created on 10 Aug 2018 · 7 Comments · Source: flux-framework/flux-core

During the hackathon, @trws and I tracked down that (on Sierra) we now need to strip out environment variables starting with OMPI_, JSM_, and PMIX_. It may be that only one or a few of these variables cause the problem, but going the thermonuclear route (i.e., stripping out all of them) does not seem to do any harm and fixes the current release (and hopefully future-proofs us against further Spectrum updates).
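For illustration, here is a minimal Python sketch of the prefix-based scrubbing described above. It is not the Flux implementation; the function name `scrub_environment` and the constant `BLOCKED_PREFIXES` are hypothetical, and only the three prefixes come from this issue.

```python
import os

# Prefixes named in this issue. Everything under them is removed
# (the "strip out all of them" approach), not a hand-picked subset.
BLOCKED_PREFIXES = ("OMPI_", "JSM_", "PMIX_")


def scrub_environment(env, prefixes=BLOCKED_PREFIXES):
    """Return a copy of env without variables whose names start with a blocked prefix."""
    # str.startswith accepts a tuple, so one call checks every prefix.
    return {name: value for name, value in env.items()
            if not name.startswith(prefixes)}


if __name__ == "__main__":
    clean = scrub_environment(os.environ)
    removed = sorted(set(os.environ) - set(clean))
    print("variables that would be stripped:", removed)
```

The returned dictionary could be handed to something like `subprocess.run(cmd, env=clean)` to launch a child process without the jsrun-provided variables.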

All 7 comments

@dongahn: the instructions on that page actually work if followed verbatim because they use flux proxy. If you proxy into a flux instance, your environment doesn't have any of the variables set by jsrun. So everything works as expected. The problem only occurs when you try to do things like jsrun flux start flux wreckrun myApp. So we should probably add a note at the end.

The long-term fix is probably updating the Spectrum MPI personality in wreck, maybe tagging a minor release (0.10.1), and then rebuilding the newest version on Sierra. Then users don't have to specify any of the environment variables and can instead run flux wreckrun -o mpi=spectrum (or whatever the parallel is for the submit RPC).

Related: I just learned the following from the IBM JSM team.

To make a long story short, it appears that we should also be able to run high-speed Spectrum MPI based on libpami with Flux if we provide PMIx natively from within Flux.

(Mentioning @garlick, since he wanted to know about this.)

This is a design feature, not a bug. PAMI will dlopen/dlsym all symbols that it needs from PMIx. This way we eliminate any compile-time dependency between the two libraries. PAMI interacts with PMIx as follows:
1 - PAMI dlopens the PMIx convenience (client) library and dlsyms the symbols it needs.
2 - PAMI then calls PMIx symbols, starting with PMIx_Init, to get job-related info.
3 - At PAMI init, PAMI uses PMIx to communicate all IB-related info (and more) among its tasks.
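As a rough illustration of the dlopen/dlsym pattern in the steps above, the following Python sketch loads a shared library at run time and reports which of a list of symbols it exports. The helper name `probe_symbols` is hypothetical, and the library name `libpmix.so` in the comment is an assumption about the local install. This only shows what a given library provides; it says nothing about which symbols PAMI actually looks up.

```python
import ctypes


def probe_symbols(library, names):
    """Map each symbol name to True if the loaded library exports it."""
    # ctypes.CDLL performs the dlopen(); passing None opens the running
    # process itself, which is handy for trying the helper out.
    lib = ctypes.CDLL(library)
    # Attribute lookup performs the dlsym(); a missing symbol raises
    # AttributeError, which hasattr() turns into False.
    return {name: hasattr(lib, name) for name in names}


if __name__ == "__main__":
    # Assumed usage against a PMIx client library (the path is a guess):
    #   probe_symbols("libpmix.so", ["PMIx_Init", "PMIx_Get", "PMIx_Finalize"])
    print(probe_symbols(None, ["getenv", "no_such_symbol_for_demo"]))
```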

PAMI will dlopen/dlsym all symbols that it needs from PMIx.

Does PAMI require all of the PMIx spec to be present (i.e., all the symbols)? Or is there a minimal set that we could start with that would be sufficient for Spectrum MPI?

The info isn't available but maybe we can trace it using a tool like ltrace.

I opened a new issue, #1620, for the PMIx topic. Let's keep this one focused on the environment changes needed in the Flux Spectrum MPI personality.

I believe this can be closed. The environment variables are scrubbed in the spectrum.lua job shell initrc script.
