Vscode-cpptools: [feature request] very basic support for MPI debugging multiple processes at the same time

Created on 22 Mar 2018 · 21 Comments · Source: microsoft/vscode-cpptools

It would be good to have very basic support for debugging MPI programs.

Outside of VSCode I can attach a debugger to each terminal via

mpirun -np 2 xterm -e lldb -f my_exec

and then do the standard serial debugging in each terminal window.

An alternative is to hack into the source a piece of code that prints the process ID and waits, so that a user can attach a debugger to some or all processes; see https://www.open-mpi.org/faq/?category=debugging#serial-debuggers .
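A minimal sketch of such a print-and-wait hook, in the spirit of the Open MPI FAQ entry linked above. The function name `wait_for_debugger` and the environment variable `MPI_DEBUG_WAIT` are made up for this example; nothing in MPI or cpptools defines them. The idea is that the hook only blocks when the variable is set, so the same binary can run normally otherwise.

```cpp
#include <unistd.h>   // getpid, gethostname, sleep

#include <cstdio>
#include <cstdlib>

// Blocks until a debugger attaches and sets `keep_waiting` to 0.
// MPI_DEBUG_WAIT is a hypothetical environment variable chosen for this
// sketch; when it is unset the function returns immediately.
// Returns true if it waited, false if the wait was skipped.
bool wait_for_debugger()
{
    if (std::getenv("MPI_DEBUG_WAIT") == nullptr)
        return false;

    // volatile, so the flag is re-read on every iteration after the
    // debugger has modified it.
    volatile int keep_waiting = 1;

    char hostname[256] = "unknown";
    gethostname(hostname, sizeof(hostname) - 1);
    std::printf("PID %d on %s ready for attach\n",
                static_cast<int>(getpid()), hostname);
    std::fflush(stdout);

    // After attaching, select this frame and run
    //   gdb:  set var keep_waiting = 0
    //   lldb: expr keep_waiting = 0
    // then continue.
    while (keep_waiting != 0)
        sleep(5);

    return true;
}
```

One would call `wait_for_debugger()` near the top of `main()` (for example right after `MPI_Init`), start the job with the variable set, and then attach to the printed PIDs.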

With vscode-cpptools one can already adopt the latter via launch.json and its "processId": "${command:pickProcess}" option.

It would be great if vscode-cpptools could go one step further by extending the launch config with

    {
      "num_mpi_processes" : "4",
      "mpi_exec": "<path-to-mpirun>"
    },

and in the background simply attach a gdb/lldb debugger to each MPI process, perhaps appending a number to the "name": "(lldb) Attach" so that one can use the available GUI.

One would, of course, need to manually switch between debuggers for each process in the GUI but this should be fine for up to 4-8 processes, which is often enough to find bugs in MPI programs.

P.S. AFAIK the only other non-commercial project with an MPI debugger is Eclipse PTP, but last time I checked it was limited to Linux.

Feature Request debugger

Most helpful comment

Thanks @davydden. We played around with this a bit and I _think_ I can see how someone could make this work.

Below is a basic outline in case anyone wants to try and take this on. In the interest of full disclosure, this is enough work that we would need to see significant interest in using this from the MPI community before we would take on this work.

At any rate, here is how we think it could work:

  1. A VS Code extension that will do the following:

    1. It opens a unix domain socket (or whatever IPC you want) to listen to messages from a ‘launch helper’ program (see item 2 below).

    2. It kicks off an MPI session by executing something like mpirun -n <num_nodes> /example/path/to/launch/helper <path-to-app>

    3. When the launch helper program reports process IDs back, it builds up a JSON object with the content that would normally be in a launch.json configuration, telling VS Code to debug using GDB/LLDB and attach to the specified process ID

    4. Send this launch command to VS Code using vscode.debug.startDebugging(undefined, DebugConfiguration) // see https://code.visualstudio.com/docs/extensionAPI/vscode-api#_debug

    5. In startDebugging.then, tell the launch helper program to run the target app

  2. We need a little ‘launch helper’ program that is written in C/C++. I will call this 'MPIDebugLauncher'. It does the following:

    1. When it starts up, it sends a message to the extension with its process ID

    2. It waits for the extension to tell the helper that launch is ready (step 1.5)

    3. execv the target app

A few other notes:

  1. It might be possible to have GDB launch the process instead of having it attach, if GDB is unhappy with attaching past the execv. But I am not sure if it is just environment variables that need to be propagated from MPI to the target app, or if things will only work if the parent/child process relationship is preserved.
  2. When we implemented MPI debugging in full VS many years ago, there were many scalability issues and race conditions that we needed to deal with. So while this seems like it should work, it's hard to say for sure.
  3. As an alternative, if the target app processes are all spawned as child processes of mpirun/exec, it might be possible to implement this as just general child process debugging.
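To make item 2 of the outline a bit more concrete, here is a rough sketch of what the helper side could look like. Everything specific in it is an assumption made for illustration: the `MPI_DEBUG_SOCKET` environment variable used to pass the socket path, the one-line `pid <n>` wire format, the "any single byte means go" reply, and the function names. The entry point is written as `run_launch_helper` so that a real `main` would simply forward its arguments to it.

```cpp
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Wire format is an assumption of this sketch: one text line per process.
std::string format_pid_message(long pid)
{
    return "pid " + std::to_string(pid) + "\n";
}

// Connects to the extension's Unix domain socket; returns the fd or -1.
int connect_to_extension(const char* socket_path)
{
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    if (std::strlen(socket_path) >= sizeof(addr.sun_path))
        return -1;
    std::strcpy(addr.sun_path, socket_path);

    const int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// argv[1] is the target app; argv[1..] become the target's own argv.
int run_launch_helper(int argc, char** argv)
{
    // MPI_DEBUG_SOCKET is a hypothetical variable set by the extension.
    const char* socket_path = std::getenv("MPI_DEBUG_SOCKET");
    if (argc < 2 || socket_path == nullptr) {
        std::fprintf(stderr, "usage: MPI_DEBUG_SOCKET=<path> helper <app> [args]\n");
        return 2;
    }

    const int fd = connect_to_extension(socket_path);
    if (fd < 0) {
        std::perror("connect");
        return 1;
    }

    // Step 2.1: report our process ID to the extension.
    const std::string msg = format_pid_message(static_cast<long>(getpid()));
    if (write(fd, msg.data(), msg.size()) != static_cast<ssize_t>(msg.size())) {
        std::perror("write");
        close(fd);
        return 1;
    }

    // Step 2.2: block until the extension sends any single byte.
    char go = 0;
    if (read(fd, &go, 1) != 1) {
        std::perror("read");
        close(fd);
        return 1;
    }
    close(fd);

    // Step 2.3: replace this process with the target app (same PID).
    execv(argv[1], &argv[1]);
    std::perror("execv");  // only reached if execv failed
    return 1;
}
```

Because `execv` keeps the PID, the debugger that attached to the helper ends up attached to the target app, which is exactly the situation note 1 above is unsure about. A Unix domain socket also limits this sketch to ranks running on the same host as VS Code.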

All 21 comments

One would, of course, need to manually switch between debuggers for each process in the GUI but this should be fine for up to 4-8 processes, which is often enough to find bugs in MPI programs.

@davydden I don't think the UI would support what you are proposing. Unfortunately, VS Code will only allow one debug instance, so even if we attach to 4 separate processes (which we can't because VS Code won't spawn the debug adapter 4 times) there isn't a UI to allow this change. With the way the debugAdapter is authored, it is a one to one relationship with the debuggee. Your feature request would need to start with VS Code for them to allow simultaneous multi-process debugging.

@pieandcakes
That's unfortunate, but thanks for the prompt and detailed comment 👍

EDIT: I guess the only way to hack this is with separate _attach_ configs:

    {
      "name": "(lldb) Attach MPI 0",
      "type": "cppdbg",
      "request": "attach",
      "program": "${workspaceFolder}/some.debug",
      "processId": "${command:pickProcess}",
      "MIMode": "lldb"
    },
    {
      "name": "(lldb) Attach MPI 1",
      "type": "cppdbg",
      "request": "attach",
      "program": "${workspaceFolder}/some.debug",
      "processId": "${command:pickProcess}",
      "MIMode": "lldb"
    },
    {
      "name": "(lldb) Attach MPI 2",
      "type": "cppdbg",
      "request": "attach",
      "program": "${workspaceFolder}/some.debug",
      "processId": "${command:pickProcess}",
      "MIMode": "lldb"
    },

@davydden Even then, you can only attach one process at a time so it won't get you too far. Once you have started debugging, you can't really debug again.

Once you have started debugging, you can't really debug again.

I see, thanks for clarifying.

@pieandcakes I was told upstream that support for multiple processes is already there:

The VS Code debugger supports multiple processes. See https://code.visualstudio.com/updates/v1_20#_node-debugging.
The animated gif shows VS Code debugging a master process and 5 child processes.

do you think it's possible for cpptools to use this functionality and enable debugging with multiple processes?

Based on the discussion in the VS Code issue, I would say one needs to do two things:

  1. from cpptools debugger configuration, run a given executable with mpirun -np X my_executable and somehow pause until debuggers are attached (X is number of processes).
  2. look for processes with the name my_executable (there should be as many as X) and attach lldb debugger to each one

Personally, I have no idea how one would pause[1] processes to wait until debugger is attached, but apart from this technical issue, this should be doable and everything is there in VSCode to support it.

[1] without injecting any code like

while (0 == i) 
  sleep(5);

@davydden we can take that as a suggested feature request. Can you provide a project with a repro? I don't have experience with mpirun.

@pieandcakes sure, here's a simple example

// compile with:
//   mpic++ -std=c++11 mpi_example.cc -o mpi_example
// run with:
//   mpirun -np 2 mpi_example
// debug manually with different terminals for different processes:
//   mpirun -np 2 xterm -e lldb mpi_example

#include <mpi.h>
#include <stdio.h>
#include <iostream>
#include <unistd.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    for (int p = 0; p < world_size; ++p)
      {
        if (p == world_rank)
          std::cout << "process rank " << p << " out of " << world_size << std::endl;

        // uncomment this if you want all processes to wait and check PID and alike
        /*
        while (p==0)
          sleep(5);
        */

        MPI_Barrier(MPI_COMM_WORLD);
      }

    // Finalize the MPI environment.
    MPI_Finalize();
}

On macOS you can use Homebrew or Spack to build any MPI provider, e.g. OpenMPI (do not confuse it with OpenMP; those are different things). On Linux you can get MPI from the standard repositories (e.g. sudo apt-get install openmpi-bin openmpi-common libopenmpi-dev should do). I have no experience with building MPI on Windows, though.

Let me know if I could further help somehow, though I have no experience whatsoever with TypeScript / JavaScript used in VSCode.

p.s. this is of course a personal opinion, but I think VS Code could attract a fair amount of computational scientists with this feature as not all universities have licenses for dedicated commercial debuggers, that cost upwards from around 700$. It's probably also not something an individual would buy himself. A good summary of debugger options for MPI is in this stackoverflow post and also in deal.II FAQ wiki.

@WardenGnaw thanks for entertaining the idea, appreciate it.

this is enough work that we would need to see significant interest in using this from the MPI community before we would take on this work.

That's perfectly understandable. I will ping a few folks in case they want to add something on the prospect of debugging MPI with a GUI in Visual Studio Code @alalazo @bangerth @tjhei @BarrySmith @tgamblin @goxberry @jthies .

Thanks for the ping @davydden!

I agree with both of the statements @davydden mentioned re: market:

VS Code could attract a fair amount of computational scientists with this feature as not all universities have licenses for dedicated commercial debuggers, that cost upwards from around 700$. It's probably also not something an individual would buy himself.

and scoping a "parallel debugging" feature:

up to 4-8 processes, which is often enough to find bugs in MPI programs

The two most important things that make dedicated commercial debuggers (e.g., TotalView) easier to use are:

  • a single window can toggle over each MPI process
  • the current active process number is prominently labeled

It's possible to do many of the same sorts of things that these debuggers do by spawning multiple debugger instances in separate windows, but it's a kludgy setup that tends to reduce productivity because a user has to spend time figuring out which context (i.e., which window) they want to operate on.

If VS Code were able to toggle among MPI processes in a single window and display the appropriate debugging info, it would be a killer feature for developers who don't have access to dedicated commercial debuggers, and it would make it easier to demo and explain to beginners how to debug parallel programs that use MPI.

VS Code could attract a fair amount of computational scientists with this feature as not all universities have licenses for dedicated commercial debuggers, that cost upwards from around 700$. It's probably also not something an individual would buy himself.

I agree with @davydden. We pay a lot of money for site licenses of parallel debuggers. Often it's sufficient for many users to debug with a few MPI processes. If a parallel debugging feature becomes available, I (and my colleagues) will definitely give it a try!

That will be a very practical feature for daily use before jumping to more sophisticated debuggers.

Have you talked to the Microsoft HPC/MPI developers? I thought they developed exactly this kind of functionality (on the MPI side) several years ago. I could be wrong -- I don't follow the Microsoft side of HPC very much. Best to check with them to see exactly what they did.

Additionally, be aware that the MPI community defined a mechanism for the MPI implementation and a tool (e.g., a debugger) to attach to each other -- it's called the MPIR process acquisition interface. Both Open MPI and MPICH have supported MPIR for... er... I don't know offhand, but it's measured in decades. It's the same mechanism that is used by DDT, TotalView, ...etc. If you're going to make new tools that attach to MPI processes, you might as well use the pre-existing infrastructure for it.

That being said, Open MPI is deprecating its MPIR interface and is (literally) in the middle of replacing it with a more modern/flexible PMIx interface for tool attachment. MPIR is fine, but it's showing its age; PMIx is the New Hotness. That might be worth investigating, too.

Just my $0.02.

Also, debugging a parallel application is quite different from debugging several independent processes.

Actions (continue, step, next, ...) and breakpoints/watchpoints/tracepoints should be set on all the tasks, a subset of tasks (e.g. communicator/group) or an individual task.

Automatically attaching to several independent processes is obviously better than nothing, especially if it is free (as in beer), but I am afraid it is a bit too rusty to be effective.

Obviously, a parallel debugger is not a trivial development.

This would be extremely useful too. For people who can't purchase something that's about $1000 to use on research projects or at home, and there are quite a few that I know of personally, this would be an amazing feature.
In fact, I am interested enough that depending on what's needed I would be willing to contribute some too.

@themeh I had listed a description on what needs to be done in this comment.

Instructions for creating a VSCode extension can be found here.

@WardenGnaw
I seem to have gotten very close to getting this to work (a 'launch' configuration, not 'attach') by just changing a bunch of gdb settings in the launch.json file. At the moment, there appears to be one or two bugs getting in the way, but it seems to work in principle:

My Application:
There's a single parent process responsible for launching all of the child processes using fork() and execv().
System Control Task (parent)
-- InitTask (child)
-- ConfigTask (child)
-- StorageManager (child)
-- SBServer (child)
-- etc...

Normal behavior without gdb (for reference):

  1. Launch the Control Task
  2. The C.T. launches the InitTask to completion (waits for it to exit)
  3. The C.T. launches the rest of the child processes in turn, and they _stay_ running
  4. All of the log messages from the child processes are fed to the terminal where the C.T. was run.
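For anyone who wants a small test program with the same shape as the setup described above (a parent that launches children with fork() and execv()), something like the following could serve as a stand-in. It is only a sketch: the helper names `launch_child` and `wait_for_exit` are invented here, and the real application is certainly more involved.

```cpp
#include <sys/wait.h>
#include <unistd.h>

#include <cstdio>

// Forks, then execv()s argv[0] in the child. Returns the child's PID in
// the parent, or -1 if fork failed. `argv` must be null-terminated.
pid_t launch_child(char* const argv[])
{
    const pid_t pid = fork();
    if (pid == 0) {
        execv(argv[0], argv);
        std::perror("execv");  // only reached if execv failed
        _exit(127);
    }
    return pid;
}

// Blocks until `pid` exits. Returns its exit status, or -1 on error or
// if the child was terminated by a signal.
int wait_for_exit(pid_t pid)
{
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A stand-in Control Task would call `launch_child` followed by `wait_for_exit` for the init-style task (step 2), then call `launch_child` for each long-running child without waiting (step 3). Since the children inherit the parent's stdout and stderr, their log messages end up in the same terminal (step 4).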

What _works_ when launched with gdb in vscode:

  1. The C.T. starts using a simple 'launch' style debugging configuration.
  2. The C.T. forks itself, and you get 2 call stacks and 2 identical "inferiors" listed in the debug console.
  3. The execv() is run, and the second inferior becomes the InitTask.
  4. Note that the child task appears to have automatically been selected when it hit its own breakpoints, despite "follow-fork-mode" being set to "parent" (see the config below). This _only_ works if you have the "schedule-multiple" and "non-stop" settings set to "on". Without them, the focus stays on Inferior 1 (the parent), and the application hangs.

[Screenshots omitted: default wrong behavior when 'non-stop' and 'schedule-multiple' aren't turned on.]

Observed Bugs:

1) The _entire_ debugging instance is torn down once the InitTask closes itself (despite the 3rd task starting to launch successfully). The only way I've avoided this is to detach the process manually, before it exits:
1.a) insert a breakpoint at the very end of the InitTask main()
1.b) use the debug console to switch to a different active inferior.
1.c) detach the InitTask inferior (which is trying to close) and remove it. That way, it'll continue running, and close itself on its own.
When I enabled engine logging, it appeared as if the extension was sending an "exit" command when it detected the child process close (while still attached).
[Engine log screenshots omitted; between them I skipped a bunch of "paragraph-sized" messages related to unloading/re-loading libraries, breakpoints, etc.]
Below is a stripped down version of the log output:

[2019-09-02 07:09:39.881] I SCT >>> SCT starting on PID 30603 <<<
[2019-09-02 07:09:40.306] I SCT starting 'Init' (AppControl/debug/bin/PrimeInit) on PID 30611
[2019-09-02 07:10:05.903] D INIT no previous network/interfaces file found - skipping network configuration
[2019-09-02 07:10:19.567] I SCT 'Init' (AppControl/debug/bin/PrimeInit) exited - status 0
[2019-09-02 07:10:19.969] I SCT starting 'Config' (Configuration/debug/bin/ConfigTask) on PID 30615

2) Another observed bug was that breakpoints are misplaced in gdb if processes have conflicting names for source files (e.g. if both have a main.cpp). I just posted about this on this ticket: https://github.com/microsoft/vscode-cpptools/issues/3268

This is my current entry for "setupCommands" in my launch.json file. The rest was just a typical launch config (define an executable, working directory, etc).
NOTE: The farther down this list you go, the less sure I am about the behavior (and/or "correctness") of each setting.

    "setupCommands": [
        {
            "description": "Enable pretty-printing for gdb",
            "text": "-enable-pretty-printing",
            "ignoreFailures": false
        },
        {
            "text": "-gdb-set follow-fork-mode parent", //parent or child both seem to work if 'schedule-multiple' is enabled.
            "ignoreFailures": false
        },
        {
            "description": "On a fork, keep gdb attached to both processes.",
            "text": "-gdb-set detach-on-fork off", //set to off
            "ignoreFailures": false
        },
        {
            "description": "PID stays the same after renaming forked process",
            "text": "-gdb-set follow-exec-mode same", //'same' or 'new' works
            "ignoreFailures": false
        },
        {
            "text": "-gdb-set schedule-multiple on", //must be set to 'on' if follow-fork-mode is 'parent', or child will never run.
            "ignoreFailures": false
        },
        {
            "description": "Let other processes continue if one hits a break point",
            "text": "-gdb-set non-stop on", //this is probably optional, depending on desired behavior
            "ignoreFailures": false
        },
        {
            "description": "needs to be off for 'non-stop' mode?",
            "text": "-gdb-set pagination off", //unsure if this is needed
            "ignoreFailures": false
        },
        {
            "description": "Need this 'on' to ensure 'non-stop on' works",
            "text": "-gdb-set target-async on",
            "ignoreFailures": false
        }
    ]

Any update on this? This is still the top result on Google for "debug mpi in vscode", so it seems there is significant interest. As a stop-gap solution, has anyone been able to successfully create a launch.json file to launch and automatically attach to e.g. the first MPI rank? Naively, it seems to me that this should be significantly more doable, and still extremely useful...

Debugging would be fantastic, but I can't figure out how to include the "mpiexec -n " prefix in launch.json. Surely there is some way to do this, yes? I tried adding it as a prelaunch task, but the system requires that I include it as a prefix to running a Python script. Any help would be greatly appreciated.

Tangentially related: have a look at tmpi. It's not as cool as debugging MPI executables in VSCode could be, but it's a major improvement over mpirun -np $n xterm -e gdb --args ./my_executable since keyboard input is multiplexed to each process (by default). Note that it only works for OpenMPI (not MPICH) for the moment though.
