Mne-python: Proceeding with ICA works

Created on 3 Oct 2020  Â·  41Comments  Â·  Source: mne-tools/mne-python

This is just a rough sketchpad to help organize and keep track a little of the ICA construction site, because the big picture is fading quickly for me…

  • [ ] We now support floats in max_pca_components, and we will keep this parameter
  • [ ] We will drop ICA.n_pca_components, as typically, users can achieve similar results by controlling max_pca_components and n_components. This will close #8327
  • [x] Ensure that projectors are handled better and that n_pca_components=None work (#8343)
  • [x] Selection of number of components via n_components as a float should follow the "greater than" approach, not "less or equal than" (#8336, #8326)
  • [x] Even in cases where we don't warn, display the estimated rank of the and potentially applied projectors, as suggested by @cbrnr (#8328)
  • [ ] Next PR:

    • Drop ICA.max_pca_components via deprecation

    • Keep ICA.n_components and ICA.n_pca_components, keep apply(..., n_pca_components) to select a sub-set of all PCs when reconstructing the signal

    • Warn if n_pca_components n_components is likely to operate on rank-deficient data (produce ~zero-valued principal components) by checking the variance division values used when computing mixing_matrix_ from the unmixing_matrix_ (#8328)

    • Make sure n_pca_components is not changed during fit (should become n_pca_components_; #8327)

  • [ ] Next PR: make n_pca_components and apply_baseline keyword-only by fixing decorator
    How would we handle the deprecation of max_pca_components? Will we even need a deprecation cycle, since while we're changing the API, we're not removing functionality, and we could emit a fat helpful error message if sb tries to use this param?
ICA

Most helpful comment

But this will only fix things in cases where we know about projectors, right?

No, the average ref (not projector, applied) works fine in master already in the noop test IIRC, and should be fine. The critical thing is just that the data you fit should have the same rank deficiency / linear operators applied to it as the data you apply to. So in the case of projections we can just force this to be the case. People with custom refs or rank deficiencies from elsewhere will just have to be careful about it.

But why not? It wouldn't hurt – or would it? I cannot see any drawbacks here, neither with EEG nor with MEG data. Am I missing something?

Any time you have to numerically estimate rank you have to trade off the risk of blowing up some meaningless/numerically zero component against accidentally zeroing out some meaningful component. So we should only add it if we need to.

Although, in this case, we'd fit and apply on the same data, i.e., same level of rank deficiency… still. Would that work?

Yes as above

All 41 comments

@larsoner @agramfort Let's move ICA discussions here :)

sounds like a plan !

Although it might be slightly cleaner to do:

__init__(self, n_components)
def apply(self, n_pca_components)

I can live with the API:

__init__(self, n_components, n_pca_components)
def apply(self, n_pca_components)

it makes it clear that the value only matters during reconstruction, not during fitting. But if we consider the constructor n_pca_components a sort of "default value" that can be overridden it makes sense, especially since it will be an attribute of the object and survive a save-load round-trip.

One thing I think we should no longer do is remove any components from the PCA decomposition -- there is no speed gain so no advantage to doing so beyond a negligible memory and I/O space savings. This means this would now be permitted:

ica = ICA(n_components=8, n_pca_components=9)
ica.fit(raw)
ica.apply(raw, n_pca_components=10)

In other words, the n_pca_components in the constructor provide the default number to reconstruct with, but you can always choose up to n_channels to use.

We will warn if n_pca_components is likely to operate on rank-deficient data

-1 on this. I actually think it won't matter or break anything -- you'll only need to be careful about rank deficiency for the n_components step (to keep the self.mixing_matrix_ = pinv(self.unmixing_matrix_) stable because we divide the latter by the exp var). If you do the lines I pasted above on 10-channel EEG data with an average projector I think it will reconstruct just fine. There will be a noise component reconstructed but it will be 1e-14 (or maybe 1e-7) or so down so should be fine.

From a practical standpoint, I think we should merge #8326, then a follow-up PR using the noop test there should help assert the behaviors above about not needing to warn about rank-deficiency of n_pca_components, just n_components.

... and then for defaults we can just have:

n_components=0.9999, n_pca_components=None

and everything should be safe.

Ok so let's go ahead and merge #8326 then, and then move on from there.

8326 has been merged. I want to test this with some of our previously problematic data before we proceed. But out for a walk now – will test sometime tonight.

... and then for defaults we can just have:

n_components=0.9999, n_pca_components=None

Why not set n_components to a bearable value by default, like, "70" or "100" or so… And warn and use fewer components if there are not as many in the data

@larsoner Sorry to break this to you but #8326 is not doing the trick. The ICA decomposition seems to be working fine, but the signal reconstruction step with n_pca_components = max_pca_components = None (i.e., all channels) screws up the results (check out the "cleaned" magnetometer data in the last plot; gradiometers are not affected. Magnetometers, on the other hand, have projectors applied, i.e. their data is rank-deficient, and removal of the ECG-related ICs doesn't seem to really do much of anything.)

Setting n_pca_components = 0.9999, however, yields a sensible reconstruction, with the ECG artifact being removed from both, gradiometers and magnetometers.

This is exactly the same behavior @agramfort and I witnessed before.

# %%
import os
import mne

sample_data_folder = mne.datasets.sample.data_path()
sample_data_raw_file = os.path.join(sample_data_folder, 'MEG', 'sample',
                                    'sample_audvis_raw.fif')

# Load & filter raw data.
raw = (mne.io.read_raw_fif(sample_data_raw_file, preload=True)
       .pick_types(meg=True, stim=True, eog=True))
raw.filter(l_freq=1, h_freq=None)

# Create Epochs.
events = mne.find_events(raw, stim_channel='STI 014')
event_dict = {'auditory/left': 1, 'auditory/right': 2, 'visual/left': 3,
              'visual/right': 4, 'face': 5, 'buttonpress': 32}
tmin, tmax = -0.2, 0.5
baseline = (None, 0)
reject = dict(mag=3000e-15,
              grad=3000e-13)
epochs = mne.Epochs(raw, events, tmin=tmin, tmax=tmax, event_id=event_dict,
                    reject=reject, baseline=baseline, preload=True)

# Create Evoked.
evoked = epochs.average()

# Run ICA.
n_components = 0.8
n_pca_components = None
max_pca_components = None
method = 'picard'
fit_params = dict(fastica_it=5)
max_iter = 500
random_state = 42
ica = mne.preprocessing.ICA(method=method, random_state=random_state,
                            fit_params=fit_params,
                            max_iter=max_iter,
                            n_components=n_components,
                            n_pca_components=n_pca_components,
                            max_pca_components=max_pca_components)
ica.fit(epochs)

# Detect ECG artifacts.
ecg_epochs = mne.preprocessing.create_ecg_epochs(raw, tmin=-0.5, tmax=0.5,
                                                 baseline=(None, -0.2),
                                                 reject=None)
ecg_evoked = ecg_epochs.average()
ecg_inds, scores = ica.find_bads_ecg(
    ecg_epochs, method='ctps',
    threshold=0.1)
ica.exclude = ecg_inds

print(f'Detected {len(ecg_inds)} ECG-related ICs in '
      f'{len(ecg_epochs)} ECG epochs.')

# Plot ICA source, original, and cleaned data. (plot_overlay somehow doesn't work on my machine?!)
ica.plot_sources(ecg_evoked)
ecg_evoked.plot()
ica.apply(ecg_evoked.copy()).plot()

Why not set n_components to a bearable value by default, like, "70" or "100" or so… And warn and use fewer components if there are not as many in the data

Because this depends on the number of channels which is a bit wacky. It seems like going by the variance explained and just fitting all the components by default is better.

Setting n_pca_components = 0.9999, however, yields a sensible reconstruction, with the ECG artifact being removed from both, gradiometers and magnetometers.

Okay I'm not sure why this would be the case -- I'll take a look tomorrow and see why it breaks

If it turns out there is some bug / rank deficiency is problematic, then I still think max_pca_components can go away and we can just have n_components=None, n_pca_components=0.9999 or something as defaults.

If it turns out there is some bug / rank deficiency is problematic, then I still think max_pca_components can go away and we can just have n_components=None, n_pca_components=0.9999 or something as defaults.

+1 on that.

Thanks for taking the time to look into this, I really appreciate it.

It seems like going by the variance explained and just fitting all the components by default is better.

But if you have hundreds of sensors like in MEG it may seem ICA never finishes. I guess what I'm getting at is: could we maybe add a progress bar somehow?

could we maybe add a progress bar somehow?

For infomax maybe because we write the code for it. For FastICA and Picard you'll have to look into seeing if those libraries provide some interface for a callback to call at the end of each loop. Then we could use that plus the max_iter to get it into a ProgressBar (which might terminate early because of convergence criteria, but this might be okay).

I could also just add a progress bar upstream in picard? We do pass max_iter to it, after all

… but ok, callback seems like a much cleaner idea.

I lost track of all the ICA changes in various PRs, so I wanted to confirm that the previous behavior won't change.

That is, by default I would like to fit all components (corresponding to the number of available channels). If I understood the issues correctly, fitting on rank-deficient data will now issue a warning, correct? However, I don't think that even in this case MNE should automatically drop a principal component. Same thing for applying ICA (which means going back to channel space): by default all components should be used to reconstruct the channels. In essence, fitting and applying ICA should result in the original channel signals.

Is this still the case with the ongoing ICA PRs?

so I wanted to confirm that the previous behavior won't change.

Until now, that's still true.

That is, by default I would like to fit _all_ components (corresponding to the number of available channels).

This is what @larsoner is trying to make work reliably. I'm running into issues with this when using rank-deficient data.

If I understood the issues correctly, fitting on rank-deficient data will now issue a warning, correct?

No, not yet; the respective PR has not yet been merged.

However, I don't think that even in this case MNE should automatically drop a principal component. Same thing for applying ICA (which means going back to channel space): by default all components should be used to reconstruct the channels. In essence, fitting and applying ICA should result in the original channel signals.

If we cannot figure out why rank-deficient data screws up signal reconstruction, I think the much better default behavior would be to drop all principal components not explaining ~any variance. This gives stable (and sane) results.

Thanks @hoechenberger for the update.

If we cannot figure out why rank-deficient data screws up signal reconstruction, I think the much better default behavior would be to drop all principal components not explaining ~any variance. This gives stable (and sane) results.

As long as MNE is very clear on what it is doing (i.e. log to info or even warning) this might be fine. However, users should always be able to override the default behavior (which I think we all agree on).

Which step is problematic? Fitting or applying? I assume it's fitting as @larsoner mentioned, because I wouldn't know why rank matters in applying (but maybe someone can explain).

Which step is problematic? Fitting or applying? I assume it's fitting as @larsoner mentioned, because I wouldn't know why rank matters in applying (but maybe someone can explain).

It seems that applying is the culprit. Check out the MWE I posted earlier:
https://github.com/mne-tools/mne-python/issues/8337#issuecomment-703266620

We set n_components = 0.8, i.e. only fit on a subset of PCA components.
If we don't combine this with n_pca_components < 1 or max_pca_components < 1, something goes wrong: the ECG artifact wil ~not be removed from the magnetometers. If we set n_pca_components or max_pca_components to e.g. 0.9999, it works nicely.

@cbrnr I've reworked the MWE, this one is better and does all you need to see in one go.

# %%
import os
import mne

sample_data_folder = mne.datasets.sample.data_path()
sample_data_raw_file = os.path.join(sample_data_folder, 'MEG', 'sample',
                                    'sample_audvis_raw.fif')

# Load & filter raw data.
raw = (mne.io.read_raw_fif(sample_data_raw_file, preload=True)
       .pick_types(meg=True, stim=True, eog=True))
raw.filter(l_freq=1, h_freq=None)

# Create Epochs.
events = mne.find_events(raw, stim_channel='STI 014')
event_dict = {'auditory/left': 1, 'auditory/right': 2, 'visual/left': 3,
              'visual/right': 4, 'face': 5, 'buttonpress': 32}
tmin, tmax = -0.2, 0.5
baseline = (None, 0)
reject = dict(mag=3000e-15,
              grad=3000e-13)
epochs = mne.Epochs(raw, events, tmin=tmin, tmax=tmax, event_id=event_dict,
                    reject=reject, baseline=baseline, preload=True)

# Create Evoked.
evoked = epochs.average()

# Run ICA.
n_components = 0.8
n_pca_components = None
max_pca_components = None
method = 'picard'
fit_params = dict(fastica_it=5)
max_iter = 500
random_state = 42
ica = mne.preprocessing.ICA(method=method, random_state=random_state,
                            fit_params=fit_params,
                            max_iter=max_iter,
                            n_components=n_components,
                            n_pca_components=n_pca_components,
                            max_pca_components=max_pca_components)
ica.fit(epochs)

# Detect ECG artifacts.
ecg_epochs = mne.preprocessing.create_ecg_epochs(raw, tmin=-0.5, tmax=0.5,
                                                 baseline=(None, -0.2),
                                                 reject=None)
ecg_evoked = ecg_epochs.average()
ecg_inds, scores = ica.find_bads_ecg(
    ecg_epochs, method='ctps',
    threshold=0.1)
ica.exclude = ecg_inds

print(f'Detected {len(ecg_inds)} ECG-related ICs in '
      f'{len(ecg_epochs)} ECG epochs.')

# Plot ICA sources.
ica.plot_sources(ecg_evoked)

# Apply ICA.
ica.apply(ecg_evoked.copy()).plot()
ica.apply(ecg_evoked.copy(), n_pca_components=0.9999).plot()

See how the plots differ, when the only thing we changed is n_pca_components.

(when testing, beware of #8327, which also seems to happen during apply)

Interesting. So the initial n_components=0.8 is necessary because otherwise the ECG detection algorithm detects 305/305 ICs? Isn't that also a problem? I've never seen such a behavior when using ICA to zero out ocular components (I manually identify between 1-3 ICs that are clearly related to eye movement, not sure if/how an automatic EOG detection algorithm would work).

I was thinking that fitting on a reduced set of principal components is related to reconstructing the signals. If that is the case, wouldn't setting n_pca_components=0.8 make sense because that's what we used during fitting with n_components=0.8?

Interesting. So the initial n_components=0.8 is necessary because otherwise the ECG detection algorithm detects 305/305 ICs?

The ICA by default fits one IC per passed PC; by default, that's all PCs, yes. It has nothing to do directly with the ECG detection algorithm, though! I'm just using the ECG IC removal to demonstrate the problem here.

Isn't that also a problem?

Well it's simply taking way too long, yes ;) And yeah probably passing a PC with ~zero value violates some ICA assumptions, but I'm not entirely sure about this.

I manually identify between 1-3 ICs that are clearly related to eye movement, not sure if/how an automatic EOG detection algorithm would work

Pretty much the same. It looks at the ECG/EOG signal in ECG/EOG epochs and then tries to find ICs that correspond to this signal.

I was thinking that fitting on a reduced set of principal components is related to reconstructing the signals.

Yes, it is. If you only fit ICA on a subset of your PCs, during signal reconstruction we will use all ICs + N PCs, where N = n_pca_components - n_components. In EEG, we'd typically use n_components = n_pca_components, because we'd want to fit one IC per sensor (if the data is full rank); for MEG, there's just too many sensors.

If that is the case, wouldn't setting n_pca_components=0.8 make sense because that's what we used during fitting with n_components=0.8?

Yes so if you passed n_components=None, n_pca_components=0.8, you'd reconstruct the signal entirely from the ICs, as we typically do in EEG. In MEG, people seem to think differently, though :)

(@cbrnr I've edited my comment above)

If that is the case, wouldn't setting n_pca_components=0.8 make sense because that's what we used during fitting with n_components=0.8?

Yes so if you passed n_components=None, n_pca_components=0.8, you'd reconstruct the signal entirely from the ICs, as we typically do in EEG. In MEG, people seem to think differently, though :)

The additional PCs are included to "denoise" the data (or something like that), if I've followed the previous conversations correctly.

OK - but you're not concerned that the ECG detection doesn't work if you specify n_components=None? It doesn't just take ages, but it finds that all ICs contain ECG (as opposed to 2 found when reducing components to 80% of explained variance)?

OK - but you're not concerned that the ECG detection doesn't work if you specify n_components=None? It doesn't just take ages, but it finds that all ICs contain ECG (as opposed to 2 found when reducing components to 80% of explained variance)?

I had never tried that, and it does concern me ;)

@cbrnr mind re-running the ECG detection with n_components=None, n_pca_components=0.9999?

mind re-running the ECG detection with n_components=None, n_pca_components=0.9999?

This doesn't change anything, because all 0.9999 means all components:

Fitting ICA to data using 305 channels (please be patient, this may take a while)
Inferring max_pca_components from picks
Selecting all PCA components: 305 components
/usr/local/lib/python3.8/site-packages/picard/solver.py:213: UserWarning: Picard did not converge. Final gradient norm : 5.328e-05. Requested tolerance : 1e-07. Consider increasing the number of iterations or the tolerance.
  warnings.warn('Picard did not converge. Final gradient norm : %.4g.'
Fitting ICA took 966.5s.

Which means that the ECG detection algorithm still identifies all components as ECG-related:

Detected 305 ECG-related ICs in 283 ECG epochs.

This doesn't change anything, because all 0.9999 means all components:

When you say "means all components", are you referring to this part in the output:

Selecting all PCA components: 305 components

?

Because that only tells you how many components were selected based on n_components, which is "all" for n_components=None

Still, n_pca_components=0.9999 should not keep 305, but 302 PCs. Are you absolutely sure you're using the latest master to run this code?

I was using 0.21, will retry with the latest master...

I was using 0.21, will retry with the latest master...

Oh!!! Yes, sorry, I assumed you're on master! You're missing the fixes @larsoner has added since the release.

It looks like the real problem with this example is that the data are fit on Epochs that have been projected (rank 302) and then apply is run on Epochs that have not been projected (rank 305).

In your example if you:

  1. add proj=False to the Epochs constructor, or
  2. ica.fit(raw) instead of epochs, or
  3. add ecg_evoked.apply_proj()

It works. Here is a faster version you can more quickly test with:

import os
import mne

sample_data_folder = mne.datasets.sample.data_path()
sample_data_raw_file = os.path.join(sample_data_folder, 'MEG', 'sample',
                                    'sample_audvis_raw.fif')

# Load & filter raw data.
raw = (mne.io.read_raw_fif(sample_data_raw_file)
       .pick_types(meg=True, stim=True, eog=True)).crop(0, 60)

# Create Epochs.
events = mne.find_events(raw, stim_channel='STI 014')
event_dict = {'auditory/left': 1, 'auditory/right': 2, 'visual/left': 3,
              'visual/right': 4, 'face': 5, 'buttonpress': 32}
tmin, tmax = -0.2, 0.5
baseline = (None, 0)
reject = dict(mag=3000e-15,
              grad=3000e-13)
epochs = mne.Epochs(raw, events, tmin=tmin, tmax=tmax, event_id=event_dict,
                    reject=reject, baseline=baseline, preload=True, proj=False)
epochs.apply_proj()  # 1. XXX comment this out and it works, or

# Create Evoked.
evoked = epochs.average()

# Run ICA.
n_components = 0.8
n_pca_components = None
max_pca_components = None
method = 'picard'
fit_params = dict(fastica_it=5)
max_iter = 5
random_state = 42
ica = mne.preprocessing.ICA(method=method, random_state=random_state,
                            fit_params=fit_params,
                            max_iter=max_iter,
                            n_components=n_components,
                            n_pca_components=n_pca_components,
                            max_pca_components=max_pca_components)
ica.fit(epochs)
# ica.fit(raw)  # 2. XXX comment the line above and uncomment this one, or

# Detect ECG artifacts.
ecg_epochs = mne.preprocessing.create_ecg_epochs(raw, tmin=-0.5, tmax=0.5,
                                                 baseline=(None, -0.2),
                                                 reject=None)
ecg_evoked = ecg_epochs.average()
# ecg_evoked.apply_proj()  # 3. XXX uncomment this line.
ecg_inds, scores = ica.find_bads_ecg(
    ecg_epochs, method='ctps',
    threshold=0.1)
ica.exclude = ecg_inds

print(f'Detected {len(ecg_inds)} ECG-related ICs in '
      f'{len(ecg_epochs)} ECG epochs.')

# Apply ICA.
ica.apply(ecg_evoked.copy(), verbose=True).plot()
ica.apply(ecg_evoked.copy(), n_pca_components=0.9999, verbose=True).plot()

To me the fix is to make sure that whatever projections are applied (present and active) during the fit step are present and active during the apply step / on that instance, or maybe better, just apply them directly using the ica.info instance since they are stored there.

Working on a fix now

So does this mean that the initial problem is also solved? Was it caused by computing ICA on different data than applying it to?

Was it caused by computing ICA on different data than applying it to?

It was caused by fitting on data that had projections applied but applying it to data that didn't have them applied (and not applying them in the process)

OK then, I'm glad we don't have to use n_pca_components=0.9999 then!

But this will only fix things in cases where we know about projectors, right? What about e.g. #7727, where we might deal with EEGLAB-preprocessed data that has been average-referenced, hence is now rank-deficient, but we don't know this beforehand? (I know #7727 was originally about covariance estimation, but I'd expect to run into similar issues with ICA).

OK then, I'm glad we don't have to use n_pca_components=0.9999 then!

But why not? It wouldn't hurt – or would it? I cannot see any drawbacks here, neither with EEG nor with MEG data. Am I missing something?

But this will only fix things in cases where we know about projectors, right? What about e.g. #7727, where we might deal with EEGLAB-preprocessed data that has been average-referenced, hence is now rank-deficient, but we don't know this beforehand? (I know #7727 was originally about covariance estimation, but I'd expect to run into similar issues with ICA).

Although, in this case, we'd fit and apply on the same data, i.e., same level of rank deficiency… still. Would that work?

But this will only fix things in cases where we know about projectors, right?

No, the average ref (not projector, applied) works fine in master already in the noop test IIRC, and should be fine. The critical thing is just that the data you fit should have the same rank deficiency / linear operators applied to it as the data you apply to. So in the case of projections we can just force this to be the case. People with custom refs or rank deficiencies from elsewhere will just have to be careful about it.

But why not? It wouldn't hurt – or would it? I cannot see any drawbacks here, neither with EEG nor with MEG data. Am I missing something?

Any time you have to numerically estimate rank you have to trade off the risk of blowing up some meaningless/numerically zero component against accidentally zeroing out some meaningful component. So we should only add it if we need to.

Although, in this case, we'd fit and apply on the same data, i.e., same level of rank deficiency… still. Would that work?

Yes as above

How would we handle the deprecation of max_pca_components? Will we even need a deprecation cycle, since while we're changing the API, we're not removing functionality, and we could emit a fat helpful error message if sb tries to use this param?

We should just emit a warning saying it will be removed and they should use n_pca_components instead. Then in our .fit we set n_pca_components_ using the smaller of the n_pca_components value and max_pca_components value (for example if one is int and one is float). And we have to be careful that when reading an ICA object we properly handle old ICA objects in a similar way.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

rob-luke picture rob-luke  Â·  51Comments

kingjr picture kingjr  Â·  66Comments

thht picture thht  Â·  38Comments

DominiqueMakowski picture DominiqueMakowski  Â·  32Comments

kingjr picture kingjr  Â·  36Comments