Mne-python: Organizing supervised methods

Created on 18 Jul 2016  Â·  35Comments  Â·  Source: mne-tools/mne-python

@kaichogami @dengemann @alexandrebarachant @agramfort @jona-sassenhagen @choldgraf

As you know, the aim of the GSoC is to clean up the decoding module. However, I'm not entirely clear on how to guide @kaichogami in terms of module and objects location.

I think we have two options:

  • either we keep the transformers in their 'natural' locations e.g. mne.preprocessing.ICA, mne.preprocessing.Xdawn
  • or we re-unite all the sklearn transformers and estimators in mne.decoding and mne.encoding (or mne.supervised) module.

Opinions?

All 35 comments

That is a tough question - do we have any intuition for how MNE users would
prefer things? E.g., I tend to think about everything in terms of
supervised learning etc so to me it seems natural to have things in a
machine-learning module of some sort. However that's just my perspective,
and my guess is that many people don't think about their analyses in that
way.

I like to think about it like this: if we took a random sample of MNE users
and asked them to find an ICA function, where would they look? In a module
called "preprocessing", or a module called "supervised" or "learn"? I
suppose this is biased towards the status quo in terms of how people think
about neuroscience analysis, but maybe it's a good start for discussion.

On Sun, Jul 17, 2016 at 7:42 PM, Jean-Rémi KING [email protected]
wrote:

@kaichogami https://github.com/kaichogami @dengemann
https://github.com/dengemann @alexandrebarachant
https://github.com/alexandrebarachant @agramfort
https://github.com/agramfort @jona-sassenhagen
https://github.com/jona-sassenhagen @choldgraf
https://github.com/choldgraf

As you know, the aim of the GSoC is to clean up the decoding module.
However, I'm not entirely clear on how to guide @kaichogami
https://github.com/kaichogami in terms of module and objects location.

I think we have two options:

  • either we keep the transformers in their 'natural' locations e.g.
    mne.preprocessing.ICA, mne.preprocessing.Xdawn
  • or we re-unite all the sklearn transformers and estimators in
    mne.decoding and mne.encoding (or mne.supervised) module.

Opinions?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/3428, or mute the thread
https://github.com/notifications/unsubscribe-auth/ABwSHax-jk0T4EmOBTbG2XYaF675OCHSks5qWswCgaJpZM4JOWeE
.

A difficulty to keep in mind is that we'll probably end up with a redundancy between the current MNE functions (band_pass_filter, single_trial_power, apply_baseline, ica etc) and what will be used in the sklearn pipeline (Filter, TimeFreqDecomposer, Baseliner, ICA etc).

One possible solution would be to adopt an analogous design to the viz module: i.e. write the function inside their expected directory, and have a generic 'supervise' module to wraps these functions into sklearn-like objects.

Another risk is that we'll end up dumping everything in 'preprocessing', which isn't very informative.

To drop some more thoughts, I think we can already plan on separating:

  • supervised objects: e.g. Xdawn, CSP which store coefficients as a function of both X and y
  • non-supervised but fitting objects: e.g. Baseliner, SpatialFilter(PCA()), etc which store coefficients only as a function of X
  • non-supervised and data independent object: e.g. TimeFrequencyDecomposition, SpatialFilter('average_reference') etc which do not store any coefficient and can thus be applied outside the CV.

That is a tough question - do we have any intuition for how MNE users would
prefer things? E.g., I tend to think about everything in terms of
supervised learning etc so to me it seems natural to have things in a
machine-learning module of some sort. However that's just my perspective,
and my guess is that many people don't think about their analyses in that
way.

I like to think about it like this: if we took a random sample of MNE users
and asked them to find an ICA function, where would they look? In a module
called "preprocessing", or a module called "supervised" or "learn"? I
suppose this is biased towards the status quo in terms of how people think
about neuroscience analysis, but maybe it's a good start for discussion.

Usually I would think stuff should be put where it should be, not where people expect it to be, but in this case, I think I am with Chris.

To some extend, everything that is before the classifier can be considered as preprocessing, and i'm actually not sure the distinction with supervised and unsupervised is very relevant.

One issue is that some methods are not pure decoding or pure preprocessing. The spatial filtering methods (ICA, PCA, CSP, Xdawn, etc) are both used in decoding context but also in denoising or pattern visualization. It's a bit off-topic, but I would actually advocate for an unification of all the spatial filtering methods under the same API (fit, transform, inverse_transform, apply, plot_pattern, etc)

Yeah it would be nice to have one and the same API and back-end for plotting ICs/SSPs and classifier coefs/patterns. All in all, that would probably simplify the code, testing etc.

To some extend, everything that is before the classifier can be considered as preprocessing

+1

The spatial_filter could indeed be an alternative. However, I'm not sure how easy it will be to unify everything: in the GSoC, we've made the decision to keep the object as simple as possible (use of numpy arrays, explicit params etc), whereas the current ICA takes mne objects and do a lot of fancy but useful stuff wrt plotting.

I agree with some of the comments. Just my 2 quick cents:

we should not put code in modules based on their API. ICA is used mostly
for denoising / pattern viz and therefore fits in preprocessing. Classical
MEG/EEG
users, non BCI folks, should use Xdawn for denoising, hence the
preprocessing
module.

the decoding module is meant (only?) for supervised tasks where the sklearn
API is really powerful. We limit the number of classes in MNE as much as
possible. But in the decoding module I have have more classes if the API
strictly follows sklearn (fit, predict, transform etc.)

ICA is used mostly for denoising / pattern viz and therefore fits in preprocessing.

FWIW I've used ICs instead of sensors a few times for GAT decoding and it makes stuff a bit faster and sometimes gives you an extra percentage point on the scores. though I wouldn't _fit_ the ICs at that stage, just apply.

ICA is used mostly for denoising / pattern viz and therefore fits in
preprocessing.

FWIW I've used ICs instead of sensors a few times for GAT decoding and it
makes stuff a bit faster and sometimes gives you an extra percentage point
on the scores. though I wouldn't _fit_ the ICs at that stage, just apply

ok but then it's denoising not a pipeline step in the sklearn sense

FWIW I've used ICs instead of sensors a few times for GAT decoding and it makes stuff a bit faster and sometimes gives you an extra percentage point on the scores. though I wouldn't fit the ICs at that stage, just apply.

@jona-sassenhagen yes it's faster with dimensionality reduction although doing it with an xdawn is probably a better idea. It was one of my best evoked preproc for this.

we should not put code in modules based on their API. ICA

ok

the decoding module is meant (only?) for supervised tasks where the sklearn API is really powerful. We limit the number of classes in MNE as much as possible. But in the decoding module I have have more classes if the API strictly follows sklearn (fit, predict, transform etc.)

So... TimeFreq, Filter and other kind of non supervised sklearn-like objects in their respective module?

ok but then it's denoising not a pipeline step in the sklearn sense

@agramfort I'm note sure what you mean. Both @alexandrebarachant and I use these kind of steps in sklearn pipelines: e.g. https://github.com/kingjr/decoding_challenge_cortana_2016_3rd, https://github.com/alexandrebarachant/decoding-brain-challenge-2016.

So... TimeFreq, Filter and other kind of non supervised sklearn-like
objects in their respective module?

if they are meant to be used in decoding pipelines then in decoding.

I don't to teach the use of these objects outside of the decoding use case.

if they are meant to be used in decoding pipelines then in decoding.

@agramfort so... based on API then ;)

I agree that we follow my logic ICA with transform should be in decoding
but that's not the primary use case of ICA.

yes for new objects but keep me ICA and Xdawn classes in preprocessing

at least the Xdawn we have now with overlaps. overlaps are not ok with
cross-val hence for decodign

Would it be too confusing to have functions live in their current places (e.g., preprocessing etc) but have objects meant for sklearn-style fitting built on top of those functions and living in some kind of learn or supervised module (e.g., with a pointer in their docstring to say where their underlying function lives)?

Xdawn is a method designed for decoding, but it turns out to be so effective to pull up the ERP on a subset of components that one of its usage could be to do denoising or artifact removal (for artifact that are time locked). So i added the inverse_transform and apply method, but it's not the typical use case of this method.

The inverse is true for CSP, which is used in decoding but is also effective for denoising or artifact removal (when you want to remove class related induced activity).

Taking a step back, this is true for any spatial filtering method.

what is missing in Xdawn right now is the plot_filters and plot_patterns, and what is missing in CSP is the apply and inverse_transform. We will probably have the same discussion when i will push SPoC :)

My 2 cents

missing plot features related to discussion in #3424

@alexandrebarachant before you push the SPoC, don't we need to steal your Covariance estimator?

Regarding the apply() method, I'm not entirely clear. Conceptually, is this identical to transform(X).inverse_transform(X), or do we consider that inverse_transform only applies to injective functions? Note that apply seems to exist in sklearn for Forest and GradientBoosting estimators. Either way, we could put both transform() and apply() in a _SpatialFilterMixin for all spatial filters.

Regarding the plotting, the current MNE spatial filters seem to always have a filters_ of shape (n_channels,) or (n_channels, n_times), but in pyRiemann the filters_ can be of shape (n_channels, n_chans) (e.g CSP). @alexandrebarachant do you expect that this could be a blocker for a mixing class?

I suggest to adopt the following organization. @agramfort @Eric89GXL @dengemann you have as more experience in package organization. LMKWYT.

1) Each transformer is paired with a set of (potentially private) functions similarly to the to-be-merged EMS and XDawn classes: e.g.

ems = EMS()
ems.fit(X, y)
ems.transform(X)

class EMS(BaseEstimator):
     def fit(self, X, y):
         self.filters_ = ems_fit(X, y)
     def transform(self, X):
        return ems_transform(self.filters_, X, y)
  • This will facilitate the integration of the functions already existing in MNE to their sklearn-API counter parts.

2) For data independent transformations (e.g. wavelet convolution, psd etc), I think we have two options:

  • either create a class per Transformer e.g.TimeFreqDecomposer().fit_transform(X) as in sklearn, or
  • create a generic class that passes the already existing functions with kwargs arguments e.g. Generic(single_trial_power, sfreq=1000, frequencies=range(100), n_cyles=5...). The benefit is that we create less classes, but the user-side API will look a bit nested.

3) Make all new transformers private for now, and until we stabilize the supervised module organization, naming, etc. And only write the decoding examples once the new transformers are public.

@kingjr you don't need to steal the Covariance estimation module. Like for the CSP, the covariance estimation will be done within the fit(). for pyRiemann i chose to externalize the covariance estimation because it gave you more flexibility and to be directly compatible with other tools of the toolbox. IMO, the same thing is out of the scope of MNE, and it will be less confusing for the user to keep it that way.

Regarding the apply, i think the main difference is that you can select which components to exclude or keep. so it is not strictly equivalent to pipelining transform and inverse_transform.

Regarding the filter shapes, i believe MNE's CSP also have filters in shape of (n_channel, n_filters) with n_filters <= n_channels. i think it's the same for all spatial filtering.

I'm not familiar enough with how all these bits work together to comment meaningfully right now :(

.apply()

Thanks for the clarification. This API seems to deviate from the sklearn API, which in the past has proved difficult to support, but perhaps it could be interesting to generalize it to all spatial filters.

Covariances

I thought it would be beneficial to give the user a decoding example to do make_pipeline(Xdawn(), Covariance(), TangentSpace(), LogisticRegression()), no? Plus this will be an additional explicit reference to your papers ;)

Spatial filters.

The GAT can also be thought as a series of spatial filters with of shape n_channels, n_times, where n_times > n_channels The redundancy of the spatial filters can be overcome by averaging across subjects.

Either way, the filters shape doesn't seem to be a blocker indeed, and we can think of making a generic plotting method.

yes apply is a shortcut for transform().inverse_transform() that works with
MNE objects and that is way more efficient than doing
transform().inverse_transform(). No need to compute sources array to
denoise / remove some of them.

No need to compute sources array to denoise / remove some of them.

WDYM? Shall we, or shall we not have a n_component param in apply to vary the number of components to use for denoising? If so, that means we potentially need to fit all possible components at fitting stage, which isn't the sklearn way.

No need to compute sources array to denoise / remove some of them.

WDYM?

X_denoise = A_clean * W * X_noisy

but :

X_denoise = (A_clean * W) * X_noisy

is faster and less memory demanding than:

X_denoise = A_clean * (W * X_noisy)

ok?

Shall we, or shall we not have a n_component param in apply to vary the
number of components to use for denoising?

no strong opinion. I'd rather have n_components in init only though.

If so, that means we potentially need to fit all possible components at
fitting stage, which isn't the sklearn way.

then no

For some application (or for ICA), you want to be able to pick / nullify a specific set of component. So having just a fixed number of components, ranked by the algorithm, is not enough IMO.

Having exlude and include as parameter of the apply() method, like it is for ICA, is a good idea. I'm not fond of having this in init as it will force you to call fit every time you want to change this parameter.

@agramfort OK

@alexandrebarachant We could potentially pass either an int or a list to apply() to select the components. But overall, I think this kind of method is to be applied in more general data processing way, and is not suited for the sklearn-decoding / encoding pipeline that we're trying to achieve.

So in conclusion, I think we should primarily focus on the pipelinable features (methods, objects). We will move towards plotting, and re-use of sklearn-like transformers for non-decoding purposes in a second stage.

@kingjr agreed.

ok but don't deprecate what we have for now, just refactor to avoid code
dupes and convince use with cool examples with beautiful python code :)

I'll just focus on enhancements from now on. If you want to keep the current functions, I'll let someone else take care of the corresponding refactoring.

sounds like a plan.

After discussing with @agramfort and @dengemann we decided to

  • put all sklearn transformers in mne.decoding
  • make sure all transformer call a function (e.g. fit_xdawn, transform_xdawn, compute_tfr)
  • put the function in the relevant directory (e.g. mne.time_frequency.compute_tfr)

Sounds good to me thanks for taking the time to hash it out

Was this page helpful?
0 / 5 - 0 ratings

Related issues

kingjr picture kingjr  Â·  66Comments

hoechenberger picture hoechenberger  Â·  43Comments

rob-luke picture rob-luke  Â·  51Comments

cbrnr picture cbrnr  Â·  64Comments

cbrnr picture cbrnr  Â·  80Comments