Mne-python: DOC: more ICA doc improvements

Created on 20 Jul 2019 · 11Comments · Source: mne-tools/mne-python

I'm working on merging the two ICA tutorials into one. Here they are: one two. I want to expand the narrative a lot, explaining how ICA in MNE-Python works and how the user chooses the right parameter settings. Some of the parameters have confusing names like n_components, max_pca_components, n_pca_components. After reading the docstrings, I could not understand what is going on, so I read the source code for a couple hours. Then I read through #4856, #4882, #5054, and #5276. Now I want to make sure I understand everything before I try to (1) improve the docstrings more, and (2) improve the ICA tutorials. Here is my current understanding:

1. pre-whitening: each channel type is standardized to zero-mean, unit-variance. That way, channels with big artifacts are still big relative to other channels of the same type, but across-channel-type signal magnitudes are comparable. If the user supplied noise_cov, that will be used instead to do the pre-whitening. We tacitly assume that individual channels are already zero-mean, but don't check or enforce it.
2. PCA: the pre-whitened data go into PCA. Only max_pca_components are retained. (max_pca_components must be integer or None)
3. ICA fitting: the PCs are fed into the ICA algorithm. How many of the PCs go in (and how many ICs you get back) is determined by n_components (integer, float, or None).
4. reconstruction: after possibly marking some ICs to be excluded, ica.apply() will reconstruct the sensor signals from the (unexcluded) ICs and possibly some of the PCs as well:
   - If n_components == n_pca_components then reconstruction is based only on the ICs.
   - If n_components < n_pca_components then reconstruction is based on the ICs plus PCs[n_components:n_pca_components]. This has a special case where n_pca_components == max_pca_components (or n_pca_components is None), in which case all of the retained PCs are used in the reconstruction (but the first n_components of them are replaced by the ICs).
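
The four steps above can be sketched in plain NumPy. This is only an illustration of the bookkeeping, not MNE-Python's implementation: the function name `ica_roundtrip_sketch` and the channel-type labels are hypothetical, the "ICA" is a random orthogonal rotation standing in for a fitted unmixing matrix, and any extra scaling of the PCs before ICA is omitted. It assumes n_components <= n_pca_components <= max_pca_components.

```python
import numpy as np


def ica_roundtrip_sketch(data, ch_types, max_pca_components, n_components,
                         n_pca_components, exclude, seed=0):
    """Mimic pre-whitening -> PCA -> "ICA" -> reconstruction.

    data has shape (n_channels, n_times); all names here are illustrative.
    """
    # 1. Pre-whitening without noise_cov: one scale factor per channel type.
    ch_types = np.asarray(ch_types)
    pre_whitener = np.empty((data.shape[0], 1))
    for ch_type in np.unique(ch_types):
        mask = ch_types == ch_type
        pre_whitener[mask] = data[mask].std()
    whitened = data / pre_whitener

    # 2. PCA via SVD; keep only the first max_pca_components directions.
    u, _, _ = np.linalg.svd(whitened, full_matrices=False)
    pca_components = u.T[:max_pca_components]  # (n_pcs, n_channels)
    pcs = pca_components @ whitened            # (n_pcs, n_times)

    # 3. Stand-in for ICA: a random orthogonal "unmixing" matrix applied
    #    to the first n_components PCs (a real ICA would estimate this).
    rng = np.random.default_rng(seed)
    unmixing, _ = np.linalg.qr(
        rng.standard_normal((n_components, n_components)))
    sources = unmixing @ pcs[:n_components]

    # 4. Reconstruction: zero the excluded ICs, map the rest back to PC
    #    space, and keep the untouched PCs[n_components:n_pca_components].
    sources[list(exclude)] = 0.0
    pcs_clean = pcs[:n_pca_components].copy()
    pcs_clean[:n_components] = unmixing.T @ sources
    return (pca_components[:n_pca_components].T @ pcs_clean) * pre_whitener


# Synthetic two-type recording with very different signal magnitudes.
rng = np.random.default_rng(42)
data = rng.standard_normal((8, 500))
data[4:] *= 50.0
ch_types = ["type_a"] * 4 + ["type_b"] * 4

# All retained PCs used and nothing excluded: the signal comes back intact.
full = ica_roundtrip_sketch(data, ch_types, 8, 5, 8, exclude=[])
# Excluding one "IC" changes the reconstructed signal.
cleaned = ica_roundtrip_sketch(data, ch_types, 8, 5, 8, exclude=[0])
# n_components == n_pca_components: reconstruction uses the ICs only.
ics_only = ica_roundtrip_sketch(data, ch_types, 8, 5, 5, exclude=[])
```

In this sketch, `full` matches the input because the rotation and the PCA projection are both invertible when nothing is dropped, while `ics_only` is confined to a 5-dimensional subspace.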

questions:

1. Is any part of that summary incorrect?
2. Are there any details missing that you think are important enough to go in the tutorial?
3. Are there any details included that you think are too much information to be in the tutorial?

@cbrnr @mmagnuski @dengemann @agramfort

drammock

All 11 comments

Great that you are taking this on @drammock! I think it is a very good idea to combine and expand the two tutorials.

re: approving your summary of the arguments to ICA, I'll let the more experienced folks do the job, because I am still confused about some of them :-) It will be very worthwhile to improve the docstrings.

I'll be happy to review the PR and contribute to your questions 2 and 3 as we go. Currently your proposal sounds great.

sappelhoff on 20 Jul 2019

> I'll let the more experienced folks do the job, because I am still confused about some of them :-)

@sappelhoff in that case, a different question -- do you find the description above understandable? @drammock could put it in the ICA notes section (alongside a few param description updates probably) in a quick PR.

larsoner on 20 Jul 2019

Yes, I find the descriptions good. Let me list the points that are still unclear to me and that might find a good place in the tutorials and/or docstrings:

I am organizing my questions according to the input arguments to mne.preprocessing.ICA.

It has been some time, so it could be that some of these questions are already sufficiently answered and I just forgot ... but I think this can still be useful for @drammock as a quick check, whether the points are addressed.

`noise_cov`

> Noise covariance used for pre-whitening. If None (default), channels are scaled to unit variance (“z-standardized”) prior to the whitening by PCA.

What if I don't want to do PCA at all?
What does "whitening by PCA" mean? Is there a whitening step in MNE's PCA implementation, or is PCA itself the whitening?
How would I create a noise_cov if I didn't want to use the default "z-standardization"? And what would be benefits / drawbacks of using my own noise_cov?

`max_pca_components` and `n_components`

Is there no option not to do a PCA? Why not?
It would be good to see a "flow" of processing like below
- data --> whitening --> pca --> ica --> ... do processing here, kick out IC components ... --> ica.apply --> pca.apply --> ??? --> cleaned data
Why would I use max_pca_components to reduce my dimensionality say from 64 EEG electrodes to 60 components ... and then use n_components to pass only 50 of those components to ICA? ... why not set max_pca_components to 50 in the first step?

`random_state` (minor question)

when I set a random_state, will the ICA always yield the same results?

`method` and `fit_params`

What are the differences between methods? Is there a quick summary?
What are pros and cons of different methods?
What options can I pass in fit_params beyond the boolean extended mentioned in the docstring?
- there seems to be some talk about tolerance in the notes section below ... ?

`max_iter`

How would I set this parameter sensibly? What am I trading off with low versus high values of max_iter?

Bonus questions

Can I run my ICA on epoched data? What are potential caveats? What are recommendations?
How do I decide on the magnitude of the decim parameter in ica.fit()? Perhaps one could mention here that the temporal dimension is irrelevant to ICA?
Can I take raw data, apply filters to it (like the often-recommended 1 Hz high-pass filter) ... and let my ICA run on this ... just to finally apply the ICA results to the unfiltered raw data? ... Why?

sappelhoff on 20 Jul 2019

@sappelhoff thanks for the notes about what is not clear. here are some quick answers:

Section 5 of this PDF gives a short and fairly clear explanation of why centering and whitening are almost always done prior to ICA.
You create your own noise_cov using mne.compute_covariance on a relevant part of your data (i.e., not the evoked part; either baseline period, empty room, etc.).
One reason for using a large max_pca_components and a small n_components is that ICA will run much faster, and including more of the PCA dimensions will often not make much difference in the ICA result (because more and more PCA dimensions capture less and less variance).
Don't bother adjusting max_iter unless you get a convergence warning (in which case increase it, but also stop to think about whether your data has some problem preventing convergence, like multiple noisy channels that maybe should have been excluded).
The acceptable decim value will probably depend on the original sfreq; I don't have a lot of practical experience with this, but I would reason that your decim should not reduce your effective sample rate below the Nyquist rate (twice the highest frequency you care about).
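
The remark above that later PCA dimensions capture less and less variance can be made concrete with a small sketch. The helper `n_components_for_variance` and the variance values are hypothetical, chosen only for illustration; the rule shown (smallest number of leading PCs whose cumulative variance ratio reaches the target) is one plausible reading of a float n_components, and MNE-Python's exact boundary handling may differ.

```python
from itertools import accumulate


def n_components_for_variance(explained_variance, target):
    """Return the smallest number of leading PCs whose cumulative
    explained-variance ratio is at least `target` (between 0 and 1)."""
    total = sum(explained_variance)
    for k, cumulative in enumerate(accumulate(explained_variance), start=1):
        if cumulative / total >= target:
            return k
    return len(explained_variance)


# Hypothetical, rapidly decaying PCA variances (illustrative numbers only).
variances = [50.0, 25.0, 12.0, 6.0, 3.0, 2.0, 1.0, 1.0]

for target in (0.90, 0.95, 1.0):
    print(target, n_components_for_variance(variances, target))
```

With variances that decay this quickly, half of the dimensions already account for over 90% of the variance, which is why passing fewer PCs to ICA often changes the result very little.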

drammock on 22 Jul 2019

to complement these great answers:

If you have max_pca_components > n_components you remove far fewer dimensions from your data. You can certainly get the EOG + ECG components with PCA+ICA using n_components < 30, but if you can avoid then working with fewer than 30 dimensions, it's much better.
Regarding decim, you can actually go below Nyquist, but you need to try it on your data.
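
As a rough aid for the decim discussion above, here is a minimal sketch of the conservative rule (keep the effective sampling rate at or above twice the highest frequency of interest). The helper `max_decim` is hypothetical and not part of MNE-Python; as noted above, larger values may still work in practice and should be checked on the data.

```python
def max_decim(sfreq, highest_freq_of_interest):
    """Largest integer decim such that sfreq / decim stays at or above
    twice the highest frequency of interest (conservative rule of thumb)."""
    if sfreq <= 0 or highest_freq_of_interest <= 0:
        raise ValueError("sfreq and highest_freq_of_interest must be positive")
    return max(1, int(sfreq // (2.0 * highest_freq_of_interest)))


# Example: data sampled at 1000 Hz, content of interest up to 40 Hz.
decim = max_decim(1000.0, 40.0)
effective_sfreq = 1000.0 / decim
print(decim, effective_sfreq)
```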

agramfort on 23 Jul 2019

Thanks @drammock for trying to clarify this topic! This is very much needed indeed. On a quick skim, your description is correct.

Regarding what else should go into the tutorial, many questions @sappelhoff brought up pertain to why specific things are the way they are in MNE. I think it is important to include such practical explanations in the tutorial. For example, what if I don't want to do PCA is a legitimate question, and we could address it by stating that as long as all PCA components are retained this is basically equivalent to not performing PCA at all.
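
The claim that retaining all PCA components is basically equivalent to skipping PCA can be checked with a small NumPy sketch on synthetic data. The helper `project_and_restore` is hypothetical and only illustrates the linear algebra; it is not how MNE-Python is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((6, 400))       # (n_channels, n_times)
data -= data.mean(axis=1, keepdims=True)   # zero-mean channels

# PCA directions from the SVD of the data; rows of `components` are PCs.
u, _, _ = np.linalg.svd(data, full_matrices=False)
components = u.T


def project_and_restore(data, components, n_keep):
    """Project onto the first n_keep PCs and map back to channel space."""
    kept = components[:n_keep]
    return kept.T @ (kept @ data)


lossless = project_and_restore(data, components, 6)  # keep every PC
reduced = project_and_restore(data, components, 4)   # drop two PCs
```

Keeping every PC is just an orthogonal rotation, so `lossless` matches the input, whereas `reduced` has lost the variance carried by the two dropped dimensions.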

Another important piece of information is to include use cases for supplying different values for n_components, max_pca_components, and n_pca_components. For example, a very basic use case could be to compute ICs for 64-channel EEG data with no dimensionality reduction with the aim to identify artifact components (e.g. EOG) - which parameters should be set?

cbrnr on 5 Aug 2019

> Another important piece of information is to include use cases for supplying different values for n_components, max_pca_components, and n_pca_components. For example, a very basic use case could be to compute ICs for 64-channel EEG data with no dimensionality reduction with the aim to identify artifact components (e.g. EOG) - which parameters should be set?

What would you do here? I don't think that no dimensionality reduction is a requirement here. WDYT?

agramfort on 5 Aug 2019

It's not a requirement, but I'd compute all ICs without dim reduction by default because most ICA algorithms (especially PICARD) are fast enough that dimensionality reduction is not necessary.

cbrnr on 5 Aug 2019

@cbrnr Here's the latest rendering:

https://14566-1301584-gh.circle-artifacts.com/0/dev/auto_tutorials/preprocessing/plot_40_artifact_correction_ica.html

Is the sidebar just above the diagram good enough? I hesitate to add too many specific "case studies" since the tutorial is very long already. If you think this isn't enough, maybe we could add a shorter EEG-specific walkthrough as a separate example?

drammock on 6 Aug 2019

@drammock the sidebar is perfect!

cbrnr on 6 Aug 2019

> It's not a requirement, but I'd compute all ICs without dim reduction by default because most ICA algorithms (especially PICARD) are fast enough that this is not necessary.

OK, fair enough. Maybe the debate is more relevant for MEG data, where you can have > 300 channels, or even HD EEG.

agramfort on 6 Aug 2019
