Mne-python: Organizing supervised methods

Created on 18 Jul 2016 · 35Comments · Source: mne-tools/mne-python

@kaichogami @dengemann @alexandrebarachant @agramfort @jona-sassenhagen @choldgraf

As you know, the aim of the GSoC is to clean up the decoding module. However, I'm not entirely clear on how to guide @kaichogami in terms of module and objects location.

I think we have two options:

either we keep the transformers in their 'natural' locations e.g. mne.preprocessing.ICA, mne.preprocessing.Xdawn
or we re-unite all the sklearn transformers and estimators in mne.decoding and mne.encoding (or mne.supervised) module.

Opinions?

Source

kingjr

All 35 comments

That is a tough question - do we have any intuition for how MNE users would
prefer things? E.g., I tend to think about everything in terms of
supervised learning etc so to me it seems natural to have things in a
machine-learning module of some sort. However that's just my perspective,
and my guess is that many people don't think about their analyses in that
way.

I like to think about it like this: if we took a random sample of MNE users
and asked them to find an ICA function, where would they look? In a module
called "preprocessing", or a module called "supervised" or "learn"? I
suppose this is biased towards the status quo in terms of how people think
about neuroscience analysis, but maybe it's a good start for discussion.

On Sun, Jul 17, 2016 at 7:42 PM, Jean-Rémi KING [email protected]
wrote:

@kaichogami https://github.com/kaichogami @dengemann
https://github.com/dengemann @alexandrebarachant
https://github.com/alexandrebarachant @agramfort
https://github.com/agramfort @jona-sassenhagen
https://github.com/jona-sassenhagen @choldgraf
https://github.com/choldgraf

As you know, the aim of the GSoC is to clean up the decoding module.
However, I'm not entirely clear on how to guide @kaichogami
https://github.com/kaichogami in terms of module and objects location.

I think we have two options:

either we keep the transformers in their 'natural' locations e.g.
mne.preprocessing.ICA, mne.preprocessing.Xdawn

or we re-unite all the sklearn transformers and estimators in
mne.decoding and mne.encoding (or mne.supervised) module.

Opinions?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/3428, or mute the thread
https://github.com/notifications/unsubscribe-auth/ABwSHax-jk0T4EmOBTbG2XYaF675OCHSks5qWswCgaJpZM4JOWeE
.

choldgraf on 18 Jul 2016

A difficulty to keep in mind is that we'll probably end up with a redundancy between the current MNE functions (band_pass_filter, single_trial_power, apply_baseline, ica etc) and what will be used in the sklearn pipeline (Filter, TimeFreqDecomposer, Baseliner, ICA etc).

One possible solution would be to adopt an analogous design to the viz module: i.e. write the function inside their expected directory, and have a generic 'supervise' module to wraps these functions into sklearn-like objects.

Another risk is that we'll end up dumping everything in 'preprocessing', which isn't very informative.

To drop some more thoughts, I think we can already plan on separating:

supervised objects: e.g. Xdawn, CSP which store coefficients as a function of both X and y
non-supervised but fitting objects: e.g. Baseliner, SpatialFilter(PCA()), etc which store coefficients only as a function of X
non-supervised and data independent object: e.g. TimeFrequencyDecomposition, SpatialFilter('average_reference') etc which do not store any coefficient and can thus be applied outside the CV.

kingjr on 18 Jul 2016

That is a tough question - do we have any intuition for how MNE users would
prefer things? E.g., I tend to think about everything in terms of
supervised learning etc so to me it seems natural to have things in a
machine-learning module of some sort. However that's just my perspective,
and my guess is that many people don't think about their analyses in that
way.

I like to think about it like this: if we took a random sample of MNE users
and asked them to find an ICA function, where would they look? In a module
called "preprocessing", or a module called "supervised" or "learn"? I
suppose this is biased towards the status quo in terms of how people think
about neuroscience analysis, but maybe it's a good start for discussion.

Usually I would think stuff should be put where it should be, not where people expect it to be, but in this case, I think I am with Chris.

jona-sassenhagen on 18 Jul 2016

To some extend, everything that is before the classifier can be considered as preprocessing, and i'm actually not sure the distinction with supervised and unsupervised is very relevant.

One issue is that some methods are not pure decoding or pure preprocessing. The spatial filtering methods (ICA, PCA, CSP, Xdawn, etc) are both used in decoding context but also in denoising or pattern visualization. It's a bit off-topic, but I would actually advocate for an unification of all the spatial filtering methods under the same API (fit, transform, inverse_transform, apply, plot_pattern, etc)

alexandrebarachant on 18 Jul 2016

Yeah it would be nice to have one and the same API and back-end for plotting ICs/SSPs and classifier coefs/patterns. All in all, that would probably simplify the code, testing etc.

jona-sassenhagen on 18 Jul 2016

To some extend, everything that is before the classifier can be considered as preprocessing

The spatial_filter could indeed be an alternative. However, I'm not sure how easy it will be to unify everything: in the GSoC, we've made the decision to keep the object as simple as possible (use of numpy arrays, explicit params etc), whereas the current ICA takes mne objects and do a lot of fancy but useful stuff wrt plotting.

kingjr on 18 Jul 2016

I agree with some of the comments. Just my 2 quick cents:

we should not put code in modules based on their API. ICA is used mostly
for denoising / pattern viz and therefore fits in preprocessing. Classical
MEG/EEG
users, non BCI folks, should use Xdawn for denoising, hence the
preprocessing
module.

the decoding module is meant (only?) for supervised tasks where the sklearn
API is really powerful. We limit the number of classes in MNE as much as
possible. But in the decoding module I have have more classes if the API
strictly follows sklearn (fit, predict, transform etc.)

agramfort on 18 Jul 2016

ICA is used mostly for denoising / pattern viz and therefore fits in preprocessing.

FWIW I've used ICs instead of sensors a few times for GAT decoding and it makes stuff a bit faster and sometimes gives you an extra percentage point on the scores. though I wouldn't _fit_ the ICs at that stage, just apply.

jona-sassenhagen on 18 Jul 2016

ICA is used mostly for denoising / pattern viz and therefore fits in
preprocessing.

FWIW I've used ICs instead of sensors a few times for GAT decoding and it
makes stuff a bit faster and sometimes gives you an extra percentage point
on the scores. though I wouldn't _fit_ the ICs at that stage, just apply

ok but then it's denoising not a pipeline step in the sklearn sense

agramfort on 18 Jul 2016

FWIW I've used ICs instead of sensors a few times for GAT decoding and it makes stuff a bit faster and sometimes gives you an extra percentage point on the scores. though I wouldn't fit the ICs at that stage, just apply.

@jona-sassenhagen yes it's faster with dimensionality reduction although doing it with an xdawn is probably a better idea. It was one of my best evoked preproc for this.

we should not put code in modules based on their API. ICA

the decoding module is meant (only?) for supervised tasks where the sklearn API is really powerful. We limit the number of classes in MNE as much as possible. But in the decoding module I have have more classes if the API strictly follows sklearn (fit, predict, transform etc.)

So... TimeFreq, Filter and other kind of non supervised sklearn-like objects in their respective module?

kingjr on 18 Jul 2016

ok but then it's denoising not a pipeline step in the sklearn sense

@agramfort I'm note sure what you mean. Both @alexandrebarachant and I use these kind of steps in sklearn pipelines: e.g. https://github.com/kingjr/decoding_challenge_cortana_2016_3rd, https://github.com/alexandrebarachant/decoding-brain-challenge-2016.

kingjr on 19 Jul 2016

So... TimeFreq, Filter and other kind of non supervised sklearn-like
objects in their respective module?

if they are meant to be used in decoding pipelines then in decoding.

I don't to teach the use of these objects outside of the decoding use case.

agramfort on 19 Jul 2016

if they are meant to be used in decoding pipelines then in decoding.

@agramfort so... based on API then ;)

kingjr on 19 Jul 2016

I agree that we follow my logic ICA with transform should be in decoding
but that's not the primary use case of ICA.

agramfort on 19 Jul 2016

yes for new objects but keep me ICA and Xdawn classes in preprocessing

agramfort on 19 Jul 2016

👍1

at least the Xdawn we have now with overlaps. overlaps are not ok with
cross-val hence for decodign

agramfort on 19 Jul 2016

👍1

Would it be too confusing to have functions live in their current places (e.g., preprocessing etc) but have objects meant for sklearn-style fitting built on top of those functions and living in some kind of learn or supervised module (e.g., with a pointer in their docstring to say where their underlying function lives)?

choldgraf on 19 Jul 2016

👍1

Xdawn is a method designed for decoding, but it turns out to be so effective to pull up the ERP on a subset of components that one of its usage could be to do denoising or artifact removal (for artifact that are time locked). So i added the inverse_transform and apply method, but it's not the typical use case of this method.

The inverse is true for CSP, which is used in decoding but is also effective for denoising or artifact removal (when you want to remove class related induced activity).

Taking a step back, this is true for any spatial filtering method.

what is missing in Xdawn right now is the plot_filters and plot_patterns, and what is missing in CSP is the apply and inverse_transform. We will probably have the same discussion when i will push SPoC :)

My 2 cents

alexandrebarachant on 19 Jul 2016

missing plot features related to discussion in #3424

teonbrooks on 19 Jul 2016

@alexandrebarachant before you push the SPoC, don't we need to steal your Covariance estimator?

Regarding the apply() method, I'm not entirely clear. Conceptually, is this identical to transform(X).inverse_transform(X), or do we consider that inverse_transform only applies to injective functions? Note that apply seems to exist in sklearn for Forest and GradientBoosting estimators. Either way, we could put both transform() and apply() in a _SpatialFilterMixin for all spatial filters.

Regarding the plotting, the current MNE spatial filters seem to always have a filters_ of shape (n_channels,) or (n_channels, n_times), but in pyRiemann the filters_ can be of shape (n_channels, n_chans) (e.g CSP). @alexandrebarachant do you expect that this could be a blocker for a mixing class?

kingjr on 21 Jul 2016

I suggest to adopt the following organization. @agramfort @Eric89GXL @dengemann you have as more experience in package organization. LMKWYT.

1) Each transformer is paired with a set of (potentially private) functions similarly to the to-be-merged EMS and XDawn classes: e.g.

ems = EMS()
ems.fit(X, y)
ems.transform(X)

class EMS(BaseEstimator):
     def fit(self, X, y):
         self.filters_ = ems_fit(X, y)
     def transform(self, X):
        return ems_transform(self.filters_, X, y)

This will facilitate the integration of the functions already existing in MNE to their sklearn-API counter parts.

2) For data independent transformations (e.g. wavelet convolution, psd etc), I think we have two options:

either create a class per Transformer e.g.TimeFreqDecomposer().fit_transform(X) as in sklearn, or
create a generic class that passes the already existing functions with kwargs arguments e.g. Generic(single_trial_power, sfreq=1000, frequencies=range(100), n_cyles=5...). The benefit is that we create less classes, but the user-side API will look a bit nested.

3) Make all new transformers private for now, and until we stabilize the supervised module organization, naming, etc. And only write the decoding examples once the new transformers are public.

kingjr on 21 Jul 2016

@kingjr you don't need to steal the Covariance estimation module. Like for the CSP, the covariance estimation will be done within the fit(). for pyRiemann i chose to externalize the covariance estimation because it gave you more flexibility and to be directly compatible with other tools of the toolbox. IMO, the same thing is out of the scope of MNE, and it will be less confusing for the user to keep it that way.

Regarding the apply, i think the main difference is that you can select which components to exclude or keep. so it is not strictly equivalent to pipelining transform and inverse_transform.

Regarding the filter shapes, i believe MNE's CSP also have filters in shape of (n_channel, n_filters) with n_filters <= n_channels. i think it's the same for all spatial filtering.

alexandrebarachant on 21 Jul 2016

I'm not familiar enough with how all these bits work together to comment meaningfully right now :(

larsoner on 21 Jul 2016

.apply()

Thanks for the clarification. This API seems to deviate from the sklearn API, which in the past has proved difficult to support, but perhaps it could be interesting to generalize it to all spatial filters.

Covariances

I thought it would be beneficial to give the user a decoding example to do make_pipeline(Xdawn(), Covariance(), TangentSpace(), LogisticRegression()), no? Plus this will be an additional explicit reference to your papers ;)

Spatial filters.

The GAT can also be thought as a series of spatial filters with of shape n_channels, n_times, where n_times > n_channels The redundancy of the spatial filters can be overcome by averaging across subjects.

Either way, the filters shape doesn't seem to be a blocker indeed, and we can think of making a generic plotting method.

kingjr on 21 Jul 2016

yes apply is a shortcut for transform().inverse_transform() that works with
MNE objects and that is way more efficient than doing
transform().inverse_transform(). No need to compute sources array to
denoise / remove some of them.

agramfort on 22 Jul 2016

No need to compute sources array to denoise / remove some of them.

WDYM? Shall we, or shall we not have a n_component param in apply to vary the number of components to use for denoising? If so, that means we potentially need to fit all possible components at fitting stage, which isn't the sklearn way.

kingjr on 22 Jul 2016

No need to compute sources array to denoise / remove some of them.

WDYM?

X_denoise = A_clean * W * X_noisy

but :

X_denoise = (A_clean * W) * X_noisy

is faster and less memory demanding than:

X_denoise = A_clean * (W * X_noisy)

ok?

Shall we, or shall we not have a n_component param in apply to vary the
number of components to use for denoising?

no strong opinion. I'd rather have n_components in init only though.

If so, that means we potentially need to fit all possible components at
fitting stage, which isn't the sklearn way.

then no

agramfort on 22 Jul 2016

For some application (or for ICA), you want to be able to pick / nullify a specific set of component. So having just a fixed number of components, ranked by the algorithm, is not enough IMO.

Having exlude and include as parameter of the apply() method, like it is for ICA, is a good idea. I'm not fond of having this in init as it will force you to call fit every time you want to change this parameter.

alexandrebarachant on 22 Jul 2016

@agramfort OK

@alexandrebarachant We could potentially pass either an int or a list to apply() to select the components. But overall, I think this kind of method is to be applied in more general data processing way, and is not suited for the sklearn-decoding / encoding pipeline that we're trying to achieve.

So in conclusion, I think we should primarily focus on the pipelinable features (methods, objects). We will move towards plotting, and re-use of sklearn-like transformers for non-decoding purposes in a second stage.

kingjr on 22 Jul 2016

@kingjr agreed.

alexandrebarachant on 22 Jul 2016

ok but don't deprecate what we have for now, just refactor to avoid code
dupes and convince use with cool examples with beautiful python code :)

agramfort on 22 Jul 2016

I'll just focus on enhancements from now on. If you want to keep the current functions, I'll let someone else take care of the corresponding refactoring.

kingjr on 22 Jul 2016

sounds like a plan.

agramfort on 22 Jul 2016

After discussing with @agramfort and @dengemann we decided to

put all sklearn transformers in mne.decoding
make sure all transformer call a function (e.g. fit_xdawn, transform_xdawn, compute_tfr)
put the function in the relevant directory (e.g. mne.time_frequency.compute_tfr)

kingjr on 29 Jul 2016

Sounds good to me thanks for taking the time to hash it out

larsoner on 29 Jul 2016

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

API: picks as list of string

kingjr · 66Comments

Working with rank-deficient EEGLAB data

hoechenberger · 43Comments

ENH: Agenda for fNIRS processing

rob-luke · 51Comments

Are projections applied when fitting ICA?

cbrnr · 64Comments

ICA arguments confusion

cbrnr · 80Comments