I'm working on merging the two ICA tutorials into one. Here they are: one two. I want to expand the narrative a lot, explaining how ICA in MNE-Python works and how the user chooses the right parameter settings. Some of the parameters have confusing names like n_components, max_pca_components, n_pca_components. After reading the docstrings, I could not understand what is going on, so I read the source code for a couple hours. Then I read through #4856, #4882, #5054, and #5276. Now I want to make sure I understand everything before I try to (1) improve the docstrings more, and (2) improve the ICA tutorials. Here is my current understanding:
- noise_cov, that will be used instead to do the pre-whitening. We tacitly assume that individual channels are already zero-mean, but don't check or enforce it.
- max_pca_components are retained. (max_pca_components must be an integer or None)
- n_components (integer, float, or None).
- excluded, ica.apply() will reconstruct the sensor signals from the (unexcluded) ICs and possibly some of the PCs as well:
  - If n_components == n_pca_components, then reconstruction is based only on the ICs.
  - If n_components < n_pca_components, then reconstruction is based on the ICs plus PCs[n_components:n_pca_components]. This has a special case where n_pca_components == max_pca_components (or n_pca_components is None), in which case all of the retained PCs are used in the reconstruction (but the first n_components of them are replaced by the ICs).

Questions:
@cbrnr @mmagnuski @dengemann @agramfort
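The bookkeeping summarized above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not MNE's implementation: fit_sketch, apply_sketch, and numerical_rank are hypothetical helpers, the data are assumed to be zero-mean per channel, float n_components is not handled, and a random invertible matrix stands in for the real ICA unmixing, because only the dimension bookkeeping is illustrated. It covers both pre-whitening options (z-standardization or a supplied noise covariance), keeping max_pca_components PCs, running the "ICA" on the first n_components of them, and reconstructing from the non-excluded ICs plus PCs[n_components:n_pca_components].

```python
import numpy as np


def fit_sketch(data, max_pca_components, n_components, noise_cov=None, seed=0):
    """Hypothetical helper: pre-whitening + PCA + placeholder ICA.

    data has shape (n_channels, n_times) and is assumed zero-mean per channel.
    A random invertible matrix stands in for the real ICA unmixing.
    """
    if noise_cov is None:
        # default: scale every channel to unit variance ("z-standardize")
        pre_whitener = np.diag(1.0 / data.std(axis=1))
    else:
        # whiten with the inverse matrix square root of the noise covariance
        evals, evecs = np.linalg.eigh(noise_cov)
        pre_whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # PCA: eigenvectors of the pre-whitened covariance, largest variance first
    evals, evecs = np.linalg.eigh(np.cov(pre_whitener @ data))
    order = np.argsort(evals)[::-1][:max_pca_components]
    pca_components = evecs[:, order].T  # (max_pca_components, n_channels)
    # placeholder "ICA": unmixing acts on the first n_components PCs only
    rng = np.random.default_rng(seed)
    unmixing = rng.standard_normal((n_components, n_components))
    return pre_whitener, pca_components, unmixing


def apply_sketch(data, pre_whitener, pca_components, unmixing,
                 n_pca_components, exclude=()):
    """Reconstruct sensor data from non-excluded ICs plus extra PCs."""
    n_components = unmixing.shape[0]
    assert n_components <= n_pca_components <= len(pca_components)
    kept = pca_components[:n_pca_components]
    pcs = kept @ (pre_whitener @ data)          # PC time courses
    sources = unmixing @ pcs[:n_components]     # IC time courses
    sources[np.asarray(exclude, dtype=int)] = 0.0  # drop excluded ICs
    mixing = np.linalg.inv(unmixing)
    # the ICs replace the first n_components PCs;
    # PCs[n_components:n_pca_components] are passed through unchanged
    pcs_clean = np.vstack([mixing @ sources, pcs[n_components:]])
    # back to sensor space, then undo the pre-whitening
    return np.linalg.inv(pre_whitener) @ (kept.T @ pcs_clean)


def numerical_rank(x, rtol=1e-8):
    """Number of singular values above rtol times the largest one."""
    s = np.linalg.svd(x, compute_uv=False)
    return int((s > rtol * s[0]).sum())


rng = np.random.default_rng(42)
data = rng.standard_normal((8, 1000))
data -= data.mean(axis=1, keepdims=True)
fit = fit_sketch(data, max_pca_components=8, n_components=5)
# exclude IC 0, keeping all retained PCs vs. keeping only the ICs
clean_all = apply_sketch(data, *fit, n_pca_components=8, exclude=[0])
clean_ics = apply_sketch(data, *fit, n_pca_components=5, exclude=[0])
print(numerical_rank(clean_all), numerical_rank(clean_ics))
```

In this sketch, excluding one IC with n_pca_components=8 leaves 7 dimensions in the cleaned 8-channel data, but only 4 with n_pca_components=5, because the PCs beyond n_pca_components are discarded too. With nothing excluded and all PCs kept, the original data come back unchanged.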
Great that you are taking this on @drammock! I think it is a very good idea to combine and expand the two tutorials.
Re: approving your summary of the arguments to ICA, I'll let the more experienced folks do the job, because I am still confused about some of them :-) It will be very worthwhile to improve the docstrings.
I'll be happy to review the PR and contribute to your questions 2 and 3 as we go. Currently your proposal sounds great.
> I'll let the more experienced folks do the job, because I am still confused about some of them :-)
@sappelhoff in that case, a different question -- do you find the description above understandable? @drammock could put it in the ICA notes section (probably alongside a few parameter description updates) in a quick PR.
Yes, I find the descriptions good. Let me list the points that are still unclear to me and that might find a good place in the tutorials and/or docstrings:
I am organizing my questions according to the input arguments to mne.preprocessing.ICA.
It has been some time, so it could be that some of these questions are already sufficiently answered and I just forgot ... but I think this can still be useful for @drammock as a quick check, whether the points are addressed.
noise_cov
> Noise covariance used for pre-whitening. If None (default), channels are scaled to unit variance (“z-standardized”) prior to the whitening by PCA.
- noise_cov if I didn't want to use the default "z-standardization"? And what would be the benefits / drawbacks of using my own noise_cov?

max_pca_components and n_components
- max_pca_components to reduce my dimensionality, say from 64 EEG electrodes to 60 components ... and then use n_components to pass only 50 of those components to ICA? ... Why not set max_pca_components to 50 in the first step?

random_state (minor question)
- random_state, will the ICA always yield the same results?

method and fit_params
- fit_params beyond the boolean extended mentioned in the docstring?
- tolerance in the notes section below ... ?

max_iter
- max_iter?

- decim parameter in ica.fit()? Perhaps one could mention here that the temporal dimension is irrelevant to ICA?

@sappelhoff thanks for the notes about what is not clear. Here are some quick answers:
- noise_cov using mne.compute_covariance on a relevant part of your data (i.e., not the evoked part; e.g., the baseline period, an empty-room recording, etc.)
- max_pca_components and small n_components is that ICA will run much faster, and including more of the PCA dimensions will often not make much difference in the ICA result (because each successive PCA dimension captures less variance than the one before)
- max_iter unless you get a convergence warning (in which case increase it, but also stop to think about whether your data have some problem preventing convergence, like multiple noisy channels that maybe should have been excluded)
- decim value will probably depend on the original sfreq; I don't have a lot of practical experience with this, but I would reason that your decim should not reduce your effective sample rate below the Nyquist rate (twice the highest frequency you care about).

To complement these great answers:

If you have max_pca_components > n_components, you remove far fewer
dimensions from your data. You can certainly get the EOG + ECG components
with PCA+ICA using n_components < 30, but if you can avoid then working
with fewer than 30 dimensions, it's much better.

Regarding decim, you can actually go below Nyquist, but you need to try it
on your data.
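A small NumPy sketch of why the temporal dimension does not matter to the statistics PCA and ICA work with, and why decimating below Nyquist can still be acceptable. The simulated Laplace sources, the random 4 x 4 mixing matrix, and the helpers excess_kurtosis and rel_diff are assumptions for illustration. The covariance (used for whitening/PCA) and a simple non-Gaussianity measure (excess kurtosis) are averages over samples, so shuffling the time axis leaves them unchanged. Keeping every 10th sample without an anti-aliasing filter (assumed here to be roughly what decim in ica.fit() does) only adds sampling noise to these estimates. Real recordings have temporal structure, so how far you can decimate still has to be checked on your data.

```python
import numpy as np


def excess_kurtosis(x):
    """Per-row excess kurtosis, a simple measure of non-Gaussianity."""
    x = x - x.mean(axis=1, keepdims=True)
    return (x ** 4).mean(axis=1) / (x ** 2).mean(axis=1) ** 2 - 3.0


def rel_diff(a, b):
    """Relative Frobenius-norm difference between two arrays."""
    return np.linalg.norm(a - b) / np.linalg.norm(b)


rng = np.random.default_rng(0)
n_channels, n_times = 4, 20_000
sources = rng.laplace(size=(n_channels, n_times))  # super-Gaussian sources
mixing = rng.standard_normal((n_channels, n_channels))
data = mixing @ sources                             # simulated sensor data

# 1) shuffling the time axis: sample-averaged statistics are unchanged
shuffled = data[:, rng.permutation(n_times)]
print(rel_diff(np.cov(shuffled), np.cov(data)))
print(rel_diff(excess_kurtosis(shuffled), excess_kurtosis(data)))

# 2) keeping every 10th sample (no anti-alias filter): similar statistics
decimated = data[:, ::10]
print(rel_diff(np.cov(decimated), np.cov(data)))
```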
Thanks @drammock for trying to clarify this topic! This is very much needed indeed. On a quick skim, your description is correct.
Regarding what else should go into the tutorial, many questions @sappelhoff brought up pertain to why specific things are the way they are in MNE. I think it is important to include such practical explanations in the tutorial. For example, "what if I don't want to do PCA?" is a legitimate question, and we could address it by stating that as long as all PCA components are retained, this is basically equivalent to not performing PCA at all.
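A quick NumPy check of this equivalence: when all components are retained, PCA is just an orthogonal rotation, so projecting onto the PCs and back reproduces the data exactly, and an invertible ICA unmixing applied afterwards is still an invertible unmixing of the original channels. The simulated 6-channel data (3 latent signals plus a little noise) and the helpers pca_roundtrip and explained_variance_ratio are hypothetical. The same sketch shows that successive PCs explain less and less variance, which is why dropping the weakest ones often changes little.

```python
import numpy as np


def pca_roundtrip(data, n_keep):
    """Project zero-mean data onto its first n_keep PCs and back."""
    evals, evecs = np.linalg.eigh(np.cov(data))
    order = np.argsort(evals)[::-1][:n_keep]  # largest variance first
    components = evecs[:, order].T            # orthonormal rows
    return components.T @ (components @ data)


def explained_variance_ratio(data):
    """Fraction of total variance captured by each PC, in descending order."""
    evals = np.linalg.eigvalsh(np.cov(data))[::-1]  # eigvalsh is ascending
    return evals / evals.sum()


rng = np.random.default_rng(1)
# 6 channels driven by 3 latent signals plus a little sensor noise
data = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5000))
data += 0.01 * rng.standard_normal((6, 5000))
data -= data.mean(axis=1, keepdims=True)

full = pca_roundtrip(data, n_keep=6)     # all PCs retained: lossless
reduced = pca_roundtrip(data, n_keep=3)  # only the noise-dominated PCs dropped
print(np.abs(full - data).max())
print(np.linalg.norm(reduced - data) / np.linalg.norm(data))
print(np.cumsum(explained_variance_ratio(data)))
```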
Another important piece of information is to include use cases for supplying different values for n_components, max_pca_components, and n_pca_components. For example, a very basic use case could be to compute ICs for 64-channel EEG data with no dimensionality reduction with the aim to identify artifact components (e.g. EOG) - which parameters should be set?
> For example, a very basic use case could be to compute ICs for 64-channel EEG data with no dimensionality reduction with the aim to identify artifact components (e.g. EOG) - which parameters should be set?
What would you do here? I don't think that avoiding dimensionality reduction
is a requirement here. WDYT?
It's not a requirement, but I'd compute all ICs without dimensionality reduction by default, because most ICA algorithms (especially PICARD) are fast enough that reducing the dimensionality is not necessary.
@cbrnr Here's the latest rendering:
Is the sidebar just above the diagram good enough? I hesitate to add too many specific "case studies" since the tutorial is very long already. If you think this isn't enough, maybe we could add a shorter EEG-specific walkthrough as a separate example?
@drammock the sidebar is perfect!
> It's not a requirement, but I'd compute all ICs without dim reduction by
> default because most ICA algorithms (especially PICARD) are fast enough that
> this is not necessary.
OK, fair enough. Maybe the debate is more relevant for MEG data, where you can
have > 300 channels, or even for HD EEG.