Version: 20.1.0
What was I trying to do:
Run fMRIPrep on nodes on our cluster with 64GB ram and 12 physical cores hyperthreaded to 24. The dataset includes single echo task fmri and multi-echo rest data and I'm asking for aroma to be run.
What did I expect to happen:
See fMRIPrep complete without errors.
What actually happened:
fMRIPrep completed, but reports an error in AROMA. The report contains AROMA output for sub-23613_ses-01_task-rest_run-1and it appears that all of the expected output files are created and of about the correct size.
I've attached the error log.
20200528-165544_3854c182-41ad-4620-b4c1-abf281cb0956.zip
As well as the complete standard out and standard error with debug turned on:
sub-23613_output.zip
Here is the contents of the func directory for a subject that did not produce the error:
-rw-r--r-- 1 nielsond MBDU 68 May 29 09:06 sub-23927_ses-01_task-mid_run-1_AROMAnoiseICs.csv
-rw-r--r-- 1 nielsond MBDU 201K May 29 09:07 sub-23927_ses-01_task-mid_run-1_desc-confounds_regressors.json
-rw-r--r-- 1 nielsond MBDU 1.5M May 29 09:06 sub-23927_ses-01_task-mid_run-1_desc-confounds_regressors.tsv
-rw-r--r-- 1 nielsond MBDU 620K May 29 09:06 sub-23927_ses-01_task-mid_run-1_desc-MELODIC_mixing.tsv
-rw-r--r-- 1 nielsond MBDU 228M May 29 02:30 sub-23927_ses-01_task-mid_run-1_space-fsaverage_hemi-L_bold.func.gii
-rw-r--r-- 1 nielsond MBDU 228M May 29 02:30 sub-23927_ses-01_task-mid_run-1_space-fsaverage_hemi-R_bold.func.gii
-rw-r--r-- 1 nielsond MBDU 1.2M May 29 05:03 sub-23927_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_boldref.nii.gz
-rw-r--r-- 1 nielsond MBDU 100K May 29 05:05 sub-23927_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_desc-aparcaseg_dseg.nii.gz
-rw-r--r-- 1 nielsond MBDU 88K May 29 05:05 sub-23927_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_desc-aseg_dseg.nii.gz
-rw-r--r-- 1 nielsond MBDU 97 May 29 05:03 sub-23927_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_desc-brain_mask.json
-rw-r--r-- 1 nielsond MBDU 27K May 29 05:03 sub-23927_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_desc-brain_mask.nii.gz
-rw-r--r-- 1 nielsond MBDU 93 May 29 05:05 sub-23927_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.json
-rw-r--r-- 1 nielsond MBDU 524M May 29 05:05 sub-23927_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz
-rw-r--r-- 1 nielsond MBDU 337M May 29 09:08 sub-23927_ses-01_task-mid_run-1_space-MNI152NLin6Asym_desc-smoothAROMAnonaggr_bold.nii.gz
-rw-r--r-- 1 nielsond MBDU 33 May 29 09:06 sub-23927_ses-01_task-rest_run-1_AROMAnoiseICs.csv
-rw-r--r-- 1 nielsond MBDU 113K May 29 09:07 sub-23927_ses-01_task-rest_run-1_desc-confounds_regressors.json
-rw-r--r-- 1 nielsond MBDU 269K May 29 09:06 sub-23927_ses-01_task-rest_run-1_desc-confounds_regressors.tsv
-rw-r--r-- 1 nielsond MBDU 161K May 29 09:06 sub-23927_ses-01_task-rest_run-1_desc-MELODIC_mixing.tsv
-rw-r--r-- 1 nielsond MBDU 123M May 29 02:28 sub-23927_ses-01_task-rest_run-1_space-fsaverage_hemi-L_bold.func.gii
-rw-r--r-- 1 nielsond MBDU 122M May 29 02:28 sub-23927_ses-01_task-rest_run-1_space-fsaverage_hemi-R_bold.func.gii
-rw-r--r-- 1 nielsond MBDU 1.5M May 29 04:46 sub-23927_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_boldref.nii.gz
-rw-r--r-- 1 nielsond MBDU 100K May 29 04:46 sub-23927_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_desc-aparcaseg_dseg.nii.gz
-rw-r--r-- 1 nielsond MBDU 88K May 29 04:46 sub-23927_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_desc-aseg_dseg.nii.gz
-rw-r--r-- 1 nielsond MBDU 98 May 29 04:46 sub-23927_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_desc-brain_mask.json
-rw-r--r-- 1 nielsond MBDU 28K May 29 04:46 sub-23927_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_desc-brain_mask.nii.gz
-rw-r--r-- 1 nielsond MBDU 75 May 29 05:02 sub-23927_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.json
-rw-r--r-- 1 nielsond MBDU 382M May 29 05:02 sub-23927_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz
-rw-r--r-- 1 nielsond MBDU 507M May 29 09:30 sub-23927_ses-01_task-rest_run-1_space-MNI152NLin6Asym_desc-smoothAROMAnonaggr_bold.nii.gz
And here is one that did produce the error:
-rw-r--r-- 1 nielsond MBDU 112 May 29 04:09 sub-23613_ses-01_task-mid_run-1_AROMAnoiseICs.csv
-rw-r--r-- 1 nielsond MBDU 199K May 29 04:09 sub-23613_ses-01_task-mid_run-1_desc-confounds_regressors.json
-rw-r--r-- 1 nielsond MBDU 1.4M May 29 04:09 sub-23613_ses-01_task-mid_run-1_desc-confounds_regressors.tsv
-rw-r--r-- 1 nielsond MBDU 720K May 29 04:09 sub-23613_ses-01_task-mid_run-1_desc-MELODIC_mixing.tsv
-rw-r--r-- 1 nielsond MBDU 229M May 29 00:32 sub-23613_ses-01_task-mid_run-1_space-fsaverage_hemi-L_bold.func.gii
-rw-r--r-- 1 nielsond MBDU 230M May 29 00:32 sub-23613_ses-01_task-mid_run-1_space-fsaverage_hemi-R_bold.func.gii
-rw-r--r-- 1 nielsond MBDU 1.2M May 29 02:39 sub-23613_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_boldref.nii.gz
-rw-r--r-- 1 nielsond MBDU 101K May 29 02:40 sub-23613_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_desc-aparcaseg_dseg.nii.gz
-rw-r--r-- 1 nielsond MBDU 88K May 29 02:40 sub-23613_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_desc-aseg_dseg.nii.gz
-rw-r--r-- 1 nielsond MBDU 97 May 29 02:39 sub-23613_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_desc-brain_mask.json
-rw-r--r-- 1 nielsond MBDU 29K May 29 02:39 sub-23613_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_desc-brain_mask.nii.gz
-rw-r--r-- 1 nielsond MBDU 93 May 29 02:40 sub-23613_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.json
-rw-r--r-- 1 nielsond MBDU 541M May 29 02:40 sub-23613_ses-01_task-mid_run-1_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz
-rw-r--r-- 1 nielsond MBDU 350M May 29 04:11 sub-23613_ses-01_task-mid_run-1_space-MNI152NLin6Asym_desc-smoothAROMAnonaggr_bold.nii.gz
-rw-r--r-- 1 nielsond MBDU 57 May 29 03:43 sub-23613_ses-01_task-rest_run-1_AROMAnoiseICs.csv
-rw-r--r-- 1 nielsond MBDU 119K May 29 03:43 sub-23613_ses-01_task-rest_run-1_desc-confounds_regressors.json
-rw-r--r-- 1 nielsond MBDU 423K May 29 03:43 sub-23613_ses-01_task-rest_run-1_desc-confounds_regressors.tsv
-rw-r--r-- 1 nielsond MBDU 269K May 29 03:43 sub-23613_ses-01_task-rest_run-1_desc-MELODIC_mixing.tsv
-rw-r--r-- 1 nielsond MBDU 124M May 29 00:30 sub-23613_ses-01_task-rest_run-1_space-fsaverage_hemi-L_bold.func.gii
-rw-r--r-- 1 nielsond MBDU 125M May 29 00:30 sub-23613_ses-01_task-rest_run-1_space-fsaverage_hemi-R_bold.func.gii
-rw-r--r-- 1 nielsond MBDU 1.4M May 29 02:23 sub-23613_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_boldref.nii.gz
-rw-r--r-- 1 nielsond MBDU 101K May 29 02:23 sub-23613_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_desc-aparcaseg_dseg.nii.gz
-rw-r--r-- 1 nielsond MBDU 88K May 29 02:23 sub-23613_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_desc-aseg_dseg.nii.gz
-rw-r--r-- 1 nielsond MBDU 98 May 29 02:23 sub-23613_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_desc-brain_mask.json
-rw-r--r-- 1 nielsond MBDU 29K May 29 02:23 sub-23613_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_desc-brain_mask.nii.gz
-rw-r--r-- 1 nielsond MBDU 75 May 29 02:38 sub-23613_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.json
-rw-r--r-- 1 nielsond MBDU 369M May 29 02:38 sub-23613_ses-01_task-rest_run-1_space-MNI152NLin2009cAsym_res-2_desc-preproc_bold.nii.gz
-rw-r--r-- 1 nielsond MBDU 471M May 29 04:02 sub-23613_ses-01_task-rest_run-1_space-MNI152NLin6Asym_desc-smoothAROMAnonaggr_bold.nii.gz
I'm at a bit of a loss about how to debug this further, but I've got the working directory for one of the subjects that produced the error if that's needed.
In terms of replicability, I've got one subject where I ran it twice and once produced the error and once did not. It looks like the number of CPUs differs, so this may be another manifestation of the nthreads issue #2151. I'll parse through the runs that produced errors and see if they're consistently on 72 CPU nodes.
i'm almost positive this is a result of the memory failure - can you rerun using the --nprocs flag and see if you still run into this?
Yep, I've kicked off a new batch of runs. Will let you know tomorrow if that's resolved it.
I'm not 100% certain it will though. I grouped runs by completion status, number of CPUs on the compute node, and the label of the nipype node that generated the first error message and found all 5 jobs on 32 CPU nodes failed, as did a bunch on the 56 CPU nodes. There are several cases where it ran out of memory and did not write any Error messages to standard out. You can see the first failure was always in the aroma workflow for the first rest echo from the first rest run.

In case it's relevant to the error, all of the rest scans and field maps are LAS+
@mgxd I could be wrong, but I don't think nprocs is going to fix it. Here's the command I ran:
singularity run --cleanenv /data/MBDU/singularity_images/fmriprep_20.1.0.simg /data/MBDU/midla/data/bids $TMPDIR/out participant --participant_label 20900 -w $TMPDIR/wrk --nprocs $SLURM_CPUS_PER_TASK --mem_mb $SLURM_MEM_PER_NODE --fs-license-file /data/MBDU/singularity_images/license.txt --output-spaces MNI152NLin2009cAsym:res-2 fsaverage --dummy-scans 4 --bids-filter-file /data/MBDU/midla/data/derivatives/fmriprep/fmriprepv20.1.0_20200528_2mm_nprocs/bids_filter --longitudinal --use-aroma --error-on-aroma-warnings -vvv
And nprocs is set:
[nipype]
crashfile_format = "txt"
get_linked_libs = false
memory_gb = 65.536
nprocs = 24
omp_nthreads = 8
plugin = "MultiProc"
resource_monitor = false
stop_on_first_crash = false
However, it looks like it's still using 56 threads:
200529-16:14:59,986 nipype.workflow INFO:
[MultiProc] Running 0 tasks, and 41 jobs ready. Free memory (GB): 65.54/65.54, Free processors: 56/56.
200529-16:15:00,350 nipype.workflow DEBUG:
Allocating fmriprep_wf.single_subject_20900_wf.func_preproc_ses_01_task_mid_run_1_wf.func_derivatives_wf.spacesource ID=0 (0.20GB, 1 threads). Free: 65.34GB, 55 threads.
I can let these continue running if you really think it's made a difference, but as far as I can tell, --nprocs is not a solution.
In the mean time I'm also testing a small batch of jobs specifying the 56 core nodes and requesting 247 GB of ram.
nope - you are totally right, it won't affect anything... we're not setting the correct nipype config value 馃槄
Alright, I'm using singularity build to build from dockerhub images in order to run things on our HPC. Easiest way for me to test it would be if I could pull an image with that fix from dockerhub. Is there a relatively painless way for that to happen tonight so that I can check results tomorrow?
your best bet would probably be to:
https://github.com/mgxd/fmriprep/archive/fix/cli-parser.zip locally and unzipfmriprep/ source folder to fmriprep's location within the container: /usr/local/miniconda/lib/python3.7/site-packages/fmriprepThanks for the prompt response late on a Friday! That seems to be running with the bindpath set:
( export TMPDIR=/lscratch/$SLURM_JOB_ID && export SINGULARITY_BINDPATH="/gs4,/gs5,/gs6,/gs7,/gs8,/gs9,/gs10,/gs11,/spin1,/scratch,/fdb,/data,/lscratch" && mkdir
-p $TMPDIR/out && mkdir -p $TMPDIR/wrk && singularity run --cleanenv --bind /data/MBDU/nielsond/fmriprep-fix-cli-parser/fmriprep/:/usr/local/miniconda/lib/pytho
n3.7/site-packages/fmriprep /data/MBDU/singularity_images/fmriprep_20.1.0.simg /data/MBDU/midla/data/bids $TMPDIR/out participant --participant_label 20900
-w $TMPDIR/wrk --nprocs $SLURM_CPUS_PER_TASK --mem_mb $SLURM_MEM_PER_NODE --fs-license-file /data/MBDU/singularity_images/license.txt --output-spaces MNI
152NLin2009cAsym:res-2 fsaverage --dummy-scans 4 --bids-filter-file /data/MBDU/midla/data/derivatives/fmriprep/fmriprepv20.1.0_20200528_2mm_clifix_test/bids_filter -
-longitudinal --use-aroma --error-on-aroma-warnings -vvv ;
I'll check it in 3 hours and see if the number of threads is being constrained properly.
Number of threads looks good now, running the rest of my subjects.
Looks like it still had errors with aroma on many of the subjects.
Looks like it still had errors with aroma on many of the subjects.
Same errors as above?
No memory errors this time, but still failures in Aroma.

Since it wasn't getting memory errors, it looks like the more sessions were run, which produced more errors for each subject. I think for the analysis I'm using this data for at OHBM, I'm going to just have to forgo AROMA. Processing without aroma, only the DVARs error persisted. I'm happy to add some of the failing sessions to this OpenNeuro dataset (https://openneuro.org/datasets/ds002156/versions/1.0.0) if that would be helpful. I can also pull out more log files or error messages if needed.
I can also pull out more log files or error messages if needed.
That'd be helpful if you want to give aroma another opportunity.
I'm happy to add some of the failing sessions to this OpenNeuro dataset (https://openneuro.org/datasets/ds002156/versions/1.0.0) if that would be helpful.
This would be very helpful, but I can't promise anything immediate. Particularly useful to test whether upgrading aroma will help (which I believe it will)
Alright, Here are the full logs and error files for one example of each of the two types of errors:
ica_aroma_confound_extraction.zip
ica_aroma.zip
Additionally, I've added sessions that failed Aroma to the OpenNeuro dataset I have for testing: https://openneuro.org/datasets/ds002156/versions/2.0.0
Here is the log and crash output from this subject:
23638.zip
I'm also running that subject again with the dataset as it appears on OpenNeuro to make sure that the error replicates.
I've picked one thing that probably means nothing - the version string looks as though the image was custom built - how did you get this image?
Regarding AROMA there's little I can tell right away. If anything, that ICA-AROMA is not a great idea on ME-EPI (@emdupre, @tsalo please correct me if I'm suggesting something notoriously wrong). Because of that, I'm actually pretty unclear about how fMRIPrep implements ICA-AROMA for multiecho.
I believe that ICA-AROMA is run on optimally combined data when ME-EPI data are provided, since optimal combination (bold_t2s_wf) is done before the bold-to-standard transform, and ICA-AROMA (ica_aroma_wf) is done after.
I don't think there's a problem running ICA-AROMA on optimally combined data though. We've even talked about incorporating ICA-AROMA into tedana (see https://github.com/ME-ICA/tedana/issues/217) or running both multi-echo denoising and ICA-AROMA using the same decomposition in fmriprep (see https://github.com/poldracklab/fmriprep/issues/1010#issuecomment-619209589).
Based on the error file, it looks like the mixing matrix is one-dimensional. I'm not very familiar with MELODIC but is it possible that it generates 1D mixing matrices in cases of convergence failure?
EDIT: Oh, or, if not convergence failure, could it be that only one component was detected?
@oesteban this container had @mgxd's fix for the the CLI parsing added into it as suggested earlier in the comment thread.
Here's a report for the the subject above that failed with the error in ICA_aroma:
example_report.zip
It's strange because the Aroma report is still being generated, it's got a bunch of components, and all of the outputs are present and of approximately the expected size. Is it possible that Aroma is inappropriately getting called on echo 1 after the SDC?
@Shotgunosine how many echoes you have? this stinks as if we were running ICA-AROMA as many times as echoes are found...
Indeed, the crashfiles are named:
20200529-221226_71675025-57a3-49d6-9d5b-a694c40a3a86/crash-20200530-094228-nielsond-ica_aroma.a2-02a95bc1-4169-456a-92f5-093936ad77ef.txt
20200529-221226_71675025-57a3-49d6-9d5b-a694c40a3a86/crash-20200530-094200-nielsond-ica_aroma.a3-79745c2f-f59f-4b3d-b404-5c3d1a9253b0.txt
so, there must be a node ica_aroma.a1 which did not fail and generated the components that are then rendered. @mgxd could you check this theory?
Looking at the paths of the ica-aroma command line, it seems clear that we are attempting to run ica-aroma on each echo.
MELODIC seems fine, and run only once. We should check that we run it on the optimally combined image, though.
@oesteban Thanks for looking into this. This data has 4 echos.
ha! so one echo was lucky enough to go without failure (??)
@Shotgunosine was kind enough to send me the relevant part of the working directory for one of these failure cases and I noticed some patterns.
Firstly, the masks definitely vary by echo. This makes sense, given what I found in #2175. For three of the four echoes, the mask looks fairly good, but for the last one there are only a handful of voxels. This is a little surprising to me. I've overlaid the masks from the four echoes, as used in the ica_aroma_wf, and you can see the differences:

The last mask is on top and in red, way down at the bottom. In case you can't see it, here's a screenshot with the crosshairs on the mask:

That mask is so tiny that ICA run on it produces only one component that doesn't seem to actually vary, resulting in a 1D array. That's the cause of the error "not enough dimensions" error.
However, I _also_ noticed that the mean_image used by MELODIC differs across echoes as well.

This confuses me. ica_aroma_wf accepts its bold_std, bold_mask_std, and spatial_reference inputs from bold_std_trans_wf. We know that bold_std_trans_wf is run in parallel across echoes, but the actual BOLD data fed to bold_std_trans_wf is the optimally combined data, so the only things that I can think of that would vary are the brain masks (we already know those vary from the first part of this comment) and maybe the SDC warps? Would that explain the obvious ghosting?
Actually, it looks like the "ghosting" in the mean images is due to weird shifts that happen partway through the time series for the third and fourth echoes (see below). Echo 4 is on the left and echo 3 is on the right. The volume the figure is on is the first one exhibiting the problem in the time series. You can see that the problem starts with volume 145 for echo 4 and volume 166 for echo 3. The shift happens earlier in the fourth echo than in the third echo, which makes the "ghosting" in the mean images appear to increase with echo.

This splitting isn't in the raw data, so my guess is that it's introduced in the bold_std_trans_wf and is caused by either echo-wise parallelization of the masks _or_ echo-wise parallelization of one of the other inputs to bold_std_trans_wf with the exception of the actual BOLD data, which is definitely the same, optimally-combined data.
I reran 5 subjects that were failing with this error with #2181 and didn't get the errors and the outputs look good. I'm happy to rerun again once #2181 is put on the head of #1803 if someone pings me when there a branch somewhere for me to test.
@tsalo, just for clarity:
Is this correct?
Yes, that seems correct to me.
This should be fixed as of #2181. @Shotgunosine would you be willing to try again with unstable once the build finishes on Docker Hub?
Sorry, just saw this, will attempt to get this running today.
Pinging @Shotgunosine - we'd like to release 20.1.2 (and also 20.2.0!) as soon as possible. Can we confirm whether #2181 fixed this?
Actually, I'm going to close this and the 20.1.2 milestone with it. Please reopen if this is not fixed.
Most helpful comment
I reran 5 subjects that were failing with this error with #2181 and didn't get the errors and the outputs look good. I'm happy to rerun again once #2181 is put on the head of #1803 if someone pings me when there a branch somewhere for me to test.