The ICA object takes three arguments that are somehow related to the number of components:
n_componentsmax_pca_componentsn_pca_componentsEvery time I use ICA in MNE I get confused. Why are there three seemingly related arguments? The only thing I ever needed to do is to reduce the number of PCA components prior to ICA (e.g. for average-referenced data the number of PCA components should be set to the number of channels minus one). Now since PCA is the first step in any ICA algorithm, I thought that n_components would set this value. But from the docs, it looks like this is really done by max_pca_components - or is it? And when do I need n_pca_components? I'm curious which use cases require these three arguments.
BTW, both EEGLAB's runica and scikit-learn's FastICA functions/objects only have n_components arguments.
Related to this question: what does the noise_cov argument do? Is the purpose to whiten the data if noise_cov is some covariance matrix or not whiten (i.e. only decorrelate) if noise_cov=None? If so, wouldn't it make more sense to introduce a boolean whiten argument instead? Here, it seems like MNE's ICA does not whiten by default, whereas EEGLAB and scikit-learn do whiten by default.
There was once an issue about that but I forgot to fix the docs. I made some notes about these args at that time however. I found these notes on my computer about yesterday so your timing is excellent :) Below I copy-paste the notes:
pca_components_ - pca_components x channels matrix
unmixing_matrix_ - ica_components x pca_components
mixing_matrix_ - pca_components x ica_components (inverse of unmixing matrix)
max_pca_components - the number of components used in PCA
n_components - the number of pca components that are further used in ICA,
can be a float - then selects first N components that
explain ...
This is also the number of ICA components that you will get
n_pca_components - the number of PCA components used when reconstructing the
data from ICA and PCA
max_pca_components >= n_pca_components >= n_components
if max_pca_components is set then n_components must be set too (but this is not checked for)
if n_components is None all components are used.
picks is used to infer max_pca_components if it is None
picks are on the other hand set with _pick_data_channels if None
n_components can be float - selecting PCA components used in ICA by exaplained variance, but n_components_ is always int indicating the number of selected PCA components
I also found following remark in these notes (not sure whether it was true at the time of writing that is about 1 yr ago or if it was true whether it still is the case):
it seems to me that there is a little too much freedom in the three components arguments. One can set max_pca_components < n_channels or set n_components < n_channels and get the same results. Similar situation is with setting max_pca_components and n_pca_components
Generally when going from data to ICs:
data --[ PCA and max_pca_components]--> PCs --[ICA and n_components] --> ICs
n_pca_components is used when going in the reverse order (IIRC to drop some PCs - ignore them in reconstructing the data?)
This is confusing. Can we combine max_pca_components and n_components into one argument (called n_components)? It seems like both arguments affect the number of ICA components I get back. Internally, this is achieved by setting the number of PCA components prior to ICA. However, as a user I'm usually only interested in setting the number of ICs I get back. This is what the n_components argument in sklearn.decomposition.FastICA or the pca argument in runica do. In my opinion, if you want to do something else than specify the number of ICs you want, you could run PCA explicitly before ICA (e.g. if you want to decompose a PCA subspace that accounts for 99% of the variance). Or we let n_components accept both int and float.
I think part of the confusion comes from the fact that in MNE, the unmixing matrix is ica_components x pca_components, whereas EEGLAB calls this weights and defines the unmixing matrix as weights @ sphere. And I still don't see the need for a n_pca_components argument - if someone really needs to drop PCA components in the reverse case, one could always do this manually with pca_components_ and unmixing_matrix_.
This is a very old API on which lots of user code relies and I see no
reason why it is broken or needs to be changed.
It is an important feature that our API allows us to precisely control the
relationship between ICA and PCA components used and discarded. I'd just
suggest to improve the documentation if needed.
On Thu, Jan 11, 2018 at 10:18 AM Clemens Brunner notifications@github.com
wrote:
This is confusing. Can we combine max_pca_components and n_components
into one argument (called n_components)? It seems like both arguments
affect the number of ICA components I get back. Internally, this is
achieved by setting the number of PCA components prior to ICA. However, as
a user I'm usually only interested in setting the number of ICs I get back.
This is what the n_components argument in sklearn.decomposition.FastICA
or the pca argument in runica do. In my opinion, if you want to do
something else than specify the number of ICs you want, you could run PCA
explicitly before ICA (e.g. if you want to decompose a PCA subspace that
accounts for 99% of the variance). Or we let n_components accept both int
and float.I think part of the confusion comes from the fact that in MNE, the
unmixing matrix is ica_components x pca_components, whereas EEGLAB calls
this weights and defines the unmixing matrix as weights @ sphere. And I
still don't see the need for a n_pca_components argument - if someone
really needs to drop PCA components in the reverse case, one could always
do this manually with pca_components_ and unmixing_matrix_.โ
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-356873729,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fiqwuK-peWm_9N1t32qsw8ZgQOTU3ks5tJdHNgaJpZM4RX9Zb
.
Where is this feature (controlling ICA and PCA components separately) actually used?
We used this a lot historically in several research projects and
developmentally it helped us to come up with the final API here that allows
you to run ICA on few PCA components while not throwing away anything when
reconstructing the signal. Another potential use case maybe additional
denoising by discarding PCA.
On Thu, Jan 11, 2018 at 10:22 AM Clemens Brunner notifications@github.com
wrote:
Where is this feature (controlling ICA and PCA components separately)
actually used?โ
You are receiving this because you commented.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-356874921,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fiiM9Q_-MGDqFPtV-Mh8WUSq3pJ7kks5tJdLlgaJpZM4RX9Zb
.
Yes, this may all be true, but still I find the usage rather confusing, especially for people used to most other ICA APIs.
Anyway, since an API change seems to be out of the question, I'll add a paragraph to the documentation to clarify these arguments.
That would be great. The important is to make the examples clear-cut and
just not advertise the more expert arguments. Additional reasons why I am
reluctant to change this API is that we discussed it in several papers and
it really makes explicit the relationship between ICA and PCA, which is
educational.
On Thu, Jan 11, 2018 at 11:29 AM Clemens Brunner notifications@github.com
wrote:
Yes, this may all be true, but still I find the usage rather confusing,
especially for people used to most other ICA APIs.Anyway, since an API change seems to be out of the question, I'll add a
paragraph to the documentation to clarify these arguments.โ
You are receiving this because you commented.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-356892944,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fimXKnPL4sDFFX1F9p3H7hq5xidPMks5tJeKkgaJpZM4RX9Zb
.
I'm also in favor of improving the docs and not changing the API.
Or we let n_components accept both int and float.
n_components already allows that - this is actually clearly stated in the docs. I mosly use only n_components - you don't need the other params for basic use cases.
n_componentsalready allows that - this is actually clearly stated in the docs. I mosly use onlyn_components- you don't need the other params for basic use cases.
Yes, I saw that, but I wasn't sure if it does the right thing (given that the percentage really applies to the number of PCA components it made more sense to me to have this in the max_pca_components argument).
So are you saying that if I want to compute ICA on a PCA subspace (say I have 64 EEG channels but due to average reference I want to compute ICA only on 63 components), n_components and max_pca_components actually do the same thing? If yes, this would be great in a way, because everyone else is using the n_components argument exactly for this purpose. Then we only need to make it clear in the docs that the other two arguments are for more specific use cases.
@cbrnr Yes, you will get the same number of ICs in ICA object, but different number of PCs IIRC (some will be thrown away). Then during the reconstruction of the signal this could lead to differences (because you will have different number of PCs, notice that max_pca_components >= n_pca_components).
I played around with the arguments a bit. The most common use case (which is implemented in EEGLAB and sklearn) is to compute ICs only on a PCA subspace, i.e. only a subset of PCA components is used to compute the ICs.
Here's an example. Let's assume we have an EEG data set consisting of 64 channels, i.e. the shape of x is (64, N). Let's further assume we want to compute only 60 ICs.
In EEGLAB, I'd specify pca=60 for this purpose. The results I'd get are of the following shapes:
icaweights: (60, 64)icasphere: (64, 64)icawinv: (64, 60)This works out, because the unmixing matrix can be computed by unmixing = icaweights @ icasphere and is of shape (60, 64). This unmixing matrix can then be applied to the data to get the sources as unmixing @ x, which results in the expected shape (60, N).
How do we achieve the same thing with MNE? First I thought that n_components=60 does the trick, but it results in the following matrices:
unmixing_matrix_: (60, 60)pca_components_: (64, 64)mixing_matrix_: (60, 60)In contrast to what I'd expect, I can't compute the sources by multiplying the unmixing matrix with the data: unmixing_matrix_ @ x has incompatible dimensions. Apparently, I have to do unmixing_matrix_ @ pca_components_[:60] @ x. This is completely unsatisfiable from a user perspective (why do I have to slice the PCA components?).
It looks like max_pca_components=60 actually does what I want:
unmixing_matrix_: (60, 60)pca_components_: (60, 64)mixing_matrix_: (60, 60)unmixing_matrix_ @ pca_components_ @ x. In fact, this is identical to what EEGLAB does, but EEGLAB correctly calls the matrices weights and sphere (and their product is unmixing).Finally, I have still no idea why n_pca_components is needed.
All this is to say that I still don't find calculating ICA in MNE intuitive. What exactly can not be done when there is only one argument (as in EEGLAB)? PCA and ICA are intrinsically linked, so if someone wants to to some special PCA preprocessing, I suggest to do this outside of ICA.
But if it's not possible to change the API, please confirm that max_pca_components is the counterpart to EEGLAB's n_components (and scikit-learn's n_components).
Oh, and also, how do I turn on whitening? I think whitening is disabled by default, whereas it is enabled in EEGLAB and scikit-learn.
I'll reply later, I'm in a bit of a hurry now. But you can safely just use n_components (n_components=60) and it will work just like in eeglab - using a subset of PCs to use in ICA.
Not quite. If I just use get_sources, then yes, but what if I want the (true) unmixing matrix? I'd have to compute it via unmixing_matrix_ @ pca_components_[:60]. Setting max_pca_components=60 seems to be more what I expect, because then the unmixing matrix is unmixing_matrix_ @ pca_components_.
That would make you reduce dimensionality which is not what you want. You
will use volume / amplitude. You need to use get sources to inverse
transform as you see correctly. And I am convinced that is good as it
allows us to get the behavior right for most use cases. Purists and ICA
researchers can use scikit-learn directly. Spherical pre-whitening is on by
default and makes sure your sensors are on the same scale. You can pass a
noise covariance as additional whitener.
On Fri 12 Jan 2018 at 14:39, Clemens Brunner notifications@github.com
wrote:
Not quite. If I just use get_sources, then yes, but what if I want the
(true) unmixing matrix? I'd have to compute it via unmixing_matrix_ @
pca_components_[:60]. Setting max_pca_components=60 seems to be more what
I expect, because then the unmixing matrix is unmixing_matrix_ @
pca_components_.โ
You are receiving this because you commented.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-357240596,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fipOUaEiDO6oTOjQDHAQLi7tz8fz2ks5tJ2CJgaJpZM4RX9Zb
.
Yes, reducing dimensionality is exactly what I want if I want fewer components than there are channels. So if I want only 60 components from my 64-channel data, ICA uses PCA as the initialization step to reduce dimensionality to 60. Then ICA finds 60 ICs given the 60 PCA components. The unmixing_matrix_ should be (60, 64), because s = U x with corresponding shapes (60, N) = (60, 64) x (64, N). MNE uses completely different definitions and names, and has three arguments that influence dimensionality of the resulting ICA decomposition. This is confusing, especially for newcomers, but it also continues to confuse me since I haven't figured out the exact meaning of the arguments yet. I think the main part of this confusion comes from the fact that MNE (historically) views ICA as a two-step process: first PCA, then ICA. But this view isn't compatible with the standard ICA equations and notations.
Since you are reluctant to change the API (and maybe for a good reason - I just haven't heard a convincing argument), I need to be able to understand what's going on in order to clarify the docs. As far as I can see, n_components and max_pca_components result in identical ICs and differ only in their pca_components_ property (the former retains all components, whereas the latter contains only the requested components). Other than that, these should be identical (actually they are not quite identical since the order of the components seems to differ even though I set an identical random_state) - if not, please let me know what the difference is.
Regarding whitening, the docstring of the ICA class describes the noise_cov argument as follows: "Noise covariance used for whitening. If None, channels are just z-scored." What does "just z-scored" mean? I guess this means that whitening is performed, but this is too technical and should be clarified. Furthermore, how could I turn off whitening? It would be useful to have a boolean whitening argument.
If None, channels are just z-scored." What does "just z-scored" mean?
Typically this means subtract mean and divide by standard deviation for each channel, i.e. each channel gets transformed into units "number of standard deviations from its mean". Given no better priors it is a reasonable way to get channels to be on roughly the same scale. But it would be good to check what the code actually does (for example, if the data are assumed zero mean, then that step could be omitted).
As for the other stuff I haven't worked with ICA much so can't help.
let me try to help here.
let's say you have 300 MEG channels and you want to remove EOG.
you fit ICA eg with 50 components after doing PCA reduction then pick EOG
components.
then you have 2 options:
the current API allows you to do this.
HTH
@cbrnr
Yes, reducing dimensionality is exactly what I want
As @dengemann and @agramfort nicely explain, this may or may not be what you want in the mixing step (ie when you go from sources back to sensors).
Given this discussion it is evident that this issue is not clear enough in the docs and should be elucidated. The argument names are very similar to each other, which makes them open to confusion, but before thinking about changing arg names it would be good to clarify the docs (and maybe adding also a note to ICA docs and tutorial).
Typically this means subtract mean and divide by standard deviation for each channel, i.e. each channel gets transformed into units "number of standard deviations from its mean". Given no better priors it is a reasonable way to get channels to be on roughly the same scale. But it would be good to check what the code actually does (for example, if the data are assumed zero mean, then that step could be omitted).
@larsoner so does the noise_cov argument have anything to do with whitening then? Standardizing channels before PCA is not whitening, so is it correct that whitening is always performed and that this cannot be changed with any arguments?
I'll respond to the other comments soon.
@agramfort why would you fit ICA only on 50 principal components? I would fit ICA on all 300 MEG channels, then remove the EOG ICs (let's assume you find only 1, although practically you will almost always find 2), and then back-project to channel space, which results in 300 MEG channels with eye artifacts removed. The PCA step should only reduce dimensionality if there are numerical problems (rank deficiency) so that ICA cannot run (e.g. in the case of average referenced data channels). Removing e.g. 250 PCs before ICA will remove brain activity, which is definitely not what you want.
It is much faster and in the case of SSSed data which are low rank the
right thing to do, e.g. with 64 components. The other thing is that many
early MEG papers did this, often with 25 components, which is still done in
some research groups. So people including myself want these options.
On Tue 16 Jan 2018 at 11:04, Clemens Brunner notifications@github.com
wrote:
@agramfort https://github.com/agramfort why would you fit ICA only on
50 components? I would fit ICA on all 300 MEG channels, then remove the EOG
ICs (let's assume you find only 1, although practically you will almost
always find 2), and then back-project to channel space, which results in
300 MEG channels with eye artifacts removed. The PCA step should only
reduce dimensionality if there are numerical problems (rank deficiency) so
that ICA cannot run. Removing e.g. 250 PCs before ICA will remove brain
activity, which is definitely not what you want.โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-357911585,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fio2P-_YQklJJbkoR79MCoDRDhWycks5tLHQfgaJpZM4RX9Zb
.
@dengemann OK, fair enough, but this is not common practice in the EEG community. I'd even argue that the speedup is not worth the potential loss in quality, because you will get a better decomposition if you run it on all available channels.
In any case, I am aware that these options are worth keeping, but I'd like to clearly document how to perform ICA EEGLAB-style with MNE. Currently, it seems like setting max_pca_components is the way to go, but I need you to confirm this. Same thing for the whitening option (see my comment above).
>
@dengemann https://github.com/dengemann OK, fair enough, but this is
not common practice in the EEG community. I'd even argue that the speedup
is not worth the potential loss in quality, because you will get a better
decomposition if you run it on all available channels.even on high density EEG with 200 electrodes?
how do you judge if it's better?
In any case, I am aware that these options are worth keeping, but I'd like
to clearly document how to perform ICA EEGLAB-style with MNE. Currently, it
seems like setting max_pca_components is the way to go, but I need you to
confirm this. Same thing for the whitening option (see my comment above).
yes I agree we should improve API/DOC...
even on high density EEG with 200 electrodes?
Of course. The more channels, the better the decomposition. Scott Makeig's lab routinely use 256 channels, otherwise it would not be possible to get clearly separated muscle components.
how do you judge if it's better?
That's a good question, but the common argument is that if you only keep the first N PCA components (ranked by variance), you will lose some amount of brain activity (sources of brain activity are not orthogonal). I agree that if ICA is purely used to remove some ocular components, this might not be such a big issue, but even in that case I'd rather compute a bit longer on all channels and get potentially better results.
yes I agree we should improve API/DOC...
I thought changing the API was already ruled out. I'm fine with this, but let's clearly document how to use MNE ICA to perform EEGLAB-style ICA.
I think it would be useful to create a table explaining options mapping
from MNE to EEGLAB.
โIf you want to run EEGLAB style ICA choose x,y,z ..โ
On Tue 16 Jan 2018 at 12:22, Clemens Brunner notifications@github.com
wrote:
even on high density EEG with 200 electrodes?
Of course. The more channels, the better the decomposition. Scott Makeig's
lab routinely use 256 channels, otherwise it would not be possible to get
clearly separated muscle components.how do you judge if it's better?
That's a good question, but the common argument is that if you only keep
the first N PCA components (ranked by variance), you will lose some amount
of brain activity (sources of brain activity are not orthogonal). I agree
that if ICA is purely used to remove some ocular components, this might not
be such a big issue, but even in that case I'd rather compute a bit longer
on all channels and get potentially better results.yes I agree we should improve API/DOC...
I thought changing the API was already ruled out. I'm fine with this, but
let's clearly document how to use MNE ICA to perform EEGLAB-style ICA.โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-357931025,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0finxESvjGogdAPums1JSVJlv-krdWks5tLIaSgaJpZM4RX9Zb
.
>
Of course. The more channels, the better the decomposition. Scott Makeig's
lab routinely use 256 channels, otherwise it would not be possible to get
clearly separated muscle components.well I would argue that with many sources the optimization and estimation
is more difficult. If you have less true sources
than the number of EEG channels then you break the ICA model and algorithms
like FastICA may not converge and Infomax
can be a pain to tune and use.
that's lessons learnt with @pierreablin in this work
https://arxiv.org/abs/1706.08171
see figure 2 and 3.
how do you judge if it's better?
That's a good question, but the common argument is that if you only keep
the first N PCA components (ranked by variance), you will lose some amount
of brain activity (sources of brain activity are not orthogonal). I agree
that if ICA is purely used to remove some ocular components, this might not
be such a big issue, but even in that case I'd rather compute a bit longer
on all channels and get potentially better results.
I am playing the devil here on purpose as the notion of "better" in ICA is
hard to define. If there is a notion of
better this would simplify our work of evaluation/comparison of ICA
algorithms.
If I may hijack the conversation, what would you like to see to be
convinced that an ICA algorithm is the
best for EEG/MEG data?
yes I agree we should improve API/DOC...
I thought changing the API was already ruled out. I'm fine with this, but
let's clearly document how to use MNE ICA to perform EEGLAB-style ICA.I am not against changing the API although I would love to avoid it.
what is the proposition?
well I would argue that with many sources the optimization and estimation
is more difficult. If you have less true sources
than the number of EEG channels then you break the ICA model and algorithms
like FastICA may not converge and Infomax
can be a pain to tune and use.that's lessons learnt with @pierreablin in this work
https://arxiv.org/abs/1706.08171
see figure 2 and 3.
Nice paper! Yes, you are right, it is naive to assume that increasing the number of channels solves everything, but reducing dimensionality with PCA doesn't seem to be a very good solution.
Slightly OT: without having read the paper (just quickly browsed it), you mention that AMICA already uses Newton (in contrast to Infomax), which seems to yield much better results in terms of convergence. Did you actually implement AMICA in Python?
I am playing the devil here on purpose as the notion of "better" in ICA is
hard to define. If there is a notion of better this would simplify our work of evaluation/comparison of ICA algorithms.
I agree.
If I may hijack the conversation, what would you like to see to be
convinced that an ICA algorithm is the
best for EEG/MEG data?
This also depends on how you use ICA. The EEGLAB people have used a measure of dipolarity and a reduction in mutual information as a quality measure, which is OK if you want physiological components. If you use ICA just to remove ocular artifacts, this might or might not be suitable - I haven't seen a study comparing different ICA algorithms for the purpose of removing eye activity. For this reason, I'd love to see AMICA in MNE (but maybe there are even better algorithms around, I might be biased towards EEGLAB methodology). On the other hand, it seems like Extended Infomax is good enough, but especially for artifact correction it is generally not advised to perform PCA dimensionality reduction first (see several discussion on the EEGLAB mailing list, e.g. here).
I am not against changing the API although I would love to avoid it.
what is the proposition?
I like @dengemann's suggestion to provide MNE analogies to EEGLAB settings in the docs (docstring).
>
Nice paper! Yes, you are right, it is naive to assume that increasing the
number of channels solves everything, but reducing dimensionality with PCA
doesn't seem to be a very good solution.yes I agree. PCA is not ideal for sure.
Slightly OT: without having read the paper (just quickly browsed it), you
mention that AMICA already uses Newton (in contrast to Infomax), which
seems to yield much better results in terms of convergence. Did you
actually implement AMICA in Python?AMICA adds something to our algo PICARD which is the estimation of the
"score function" that comes from the probability density of each source.
if you fix the score / density and you use m=0 in our code then we do the
same as AMICA :-)
@pierreablin has some code but it's not public (yet?). His code so far does
not implement
the same family of densities.
I am playing the devil here on purpose as the notion of "better" in ICA is
hard to define. If there is a notion of better this would simplify our
work of evaluation/comparison of ICA algorithms.I agree.
If I may hijack the conversation, what would you like to see to be
convinced that an ICA algorithm is the
best for EEG/MEG data?This also depends on how you use ICA. The EEGLAB people have used a
measure of dipolarity and a reduction in mutual information as a quality
measure, which is OK if you want physiological components. If you use ICA
just to remove ocular artifacts, this might or might not be suitable - I
haven't seen a study comparing different ICA algorithms for the purpose of
removing eye activity. For this reason, I'd love to see AMICA in MNE (but
maybe there are even better algorithms around, I might be biased towards
EEGLAB methodology). On the other hand, it seems like Extended Infomax is
good enough, but especially for artifact correction it is generally not
advised to perform PCA dimensionality reduction first (see several
discussion on the EEGLAB mailing list, e.g. here
https://sccn.ucsd.edu/pipermail/eeglablist/2013/006101.html).my experience is that dipolarity can be misleading. See figure1 in :
https://hal.archives-ouvertes.fr/hal-01451432/document
and MIR gives you very minor differences between the different algos. it's
a metric quite far of
the user experience.
I am not against changing the API although I would love to avoid it.
what is the proposition?
I like @dengemann https://github.com/dengemann's suggestion to provide
MNE analogies to EEGLAB settings in the docs (docstring).
+1
>
>
AMICA adds something to our algo PICARD which is the estimation of the "score function" that comes from the probability density of each source.
if you fix the score / density and you use m=0 in our code then we do the same as AMICA :-)
@pierreablin has some code but it's not public (yet?). His code so far does not implement the same family of densities.
Cool, so there is hope to have something similar to AMICA in MNE soon!
my experience is that dipolarity can be misleading. See figure1 in : https://hal.archives-ouvertes.fr/hal-01451432/document
Thanks for the link - another interesting paper to read! Now it strikes me as odd that Figure 4 in Delorme et al., 2012 does not include sphering in panels B and C, which compare the different algorithms with respect to the dipolarity measure. Interestingly, sphering is included in panels A and D, where dipolarity is not discussed.
The fact that sphering alone produces dipolar components does not automatically rule out dipolarity as a quality measure for source separation algorithms. Since sphering is used to kickstart Infomax, I assume that Infomax improves on the dipolarity measure. But I guess you discuss this in your paper, which I haven't read yet.
and MIR gives you very minor differences between the different algos. it's a metric quite far of the user experience.
Then which measure would you suggest? Which ICA algorithm would you prefer? Maybe a new algorithm comparison is in order (since the one from Delorme et al. is already 6 years old).
Anyway, we should probably discuss these things not in this thread - I'd be happy to talk about this further, maybe there's even a possibility for collaboration.
Regarding MNE's ICA implementation, I'll prepare a doc update.
Sounds terrific!
On Wed 17 Jan 2018 at 08:56, Clemens Brunner notifications@github.com
wrote:
AMICA adds something to our algo PICARD which is the estimation of the
"score function" that comes from the probability density of each source.if you fix the score / density and you use m=0 in our code then we do the
same as AMICA :-)@pierreablin https://github.com/pierreablin has some code but it's not
public (yet?). His code so far does not implement the same family of
densities.Cool, so there is hope to have something similar to AMICA in MNE soon!
my experience is that dipolarity can be misleading. See figure1 in :
https://hal.archives-ouvertes.fr/hal-01451432/documentThanks for the link - another interesting paper to read! Now it strikes me
as odd that Figure 4 in Delorme et al., 2012
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030135
does not include sphering in panels B and C, which compare the
different algorithms with respect to the dipolarity measure. Interestingly,
sphering is included in panels A and D, where dipolarity is not discussed.The fact that sphering alone produces dipolar components does not
automatically rule out dipolarity as a quality measure for source
separation algorithms. Since sphering is used to kickstart Infomax, I
assume that Infomax improves on the dipolarity measure. But I guess you
discuss this in your paper, which I haven't read yet.and MIR gives you very minor differences between the different algos. it's
a metric quite far of the user experience.Then which measure would you suggest? Which ICA algorithm would you
prefer? Maybe a new algorithm comparison is in order (since the one from
Delorme et al. is already 6 years old).Anyway, we should probably discuss these things not in this thread - I'd
be happy to talk about this further, maybe there's even a possibility for
collaboration.Regarding MNE's ICA implementation, I'll prepare a doc update.
โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358225611,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fir4IzfpziP4oqVfXgEWsRod_lkqMks5tLafBgaJpZM4RX9Zb
.
OK, for the docs I need your opinion/expertise. Let's take our 64-channel EEG example, which we want to reduce to 63 dimensions via PCA (because it is average referenced).
n_components=63 returns unmixing_matrix_ of shape 63, 63 and pca_components_ of shape 64, 64. How do I get the unmixing matrix (which unmixes the original 64-channel data into 63 ICs, i.e. shape 63, 64)? unmixing_matrix_ @ pca_components_[:63] or unmixing_matrix_ @ pca_components_[1:]?max_pca_components=63 returns unmixing_matrix_ of shape 63, 63 and pca_components_ of shape 63, 64. Therefore, the unmixing matrix is unmixing_matrix_ @ pca_components_ (shape 63, 64).Do these two approaches yield the same decomposition? I assume yes, but I couldn't confirm with toy data because the component order differs.
The get_sources code should document the different possibilities. We just
need to verbalize what it does :)
On Wed 17 Jan 2018 at 15:14, Clemens Brunner notifications@github.com
wrote:
OK, for the docs I need your opinion/expertise. Let's take our 64-channel
EEG example, which we want to reduce to 63 dimensions via PCA (because it
is average referenced).
- n_components=63 returns unmixing_matrix_ of shape 63, 63 and
pca_components_ of shape 64, 64. How do I get the unmixing matrix
(which unmixes the original 64-channel data into 63 ICs, i.e. shape 63,
64)? unmixing_matrix_ @ pca_components_[:63] or unmixing_matrix_ @
pca_components_[1:]?- max_pca_components=63 returns unmixing_matrix_ of shape 63, 63 and
pca_components_ of shape 63, 64. Therefore, the unmixing matrix is unmixing_matrix_
@ pca_components_ (shape 63, 64).Do these two approaches yield the same decomposition? I assume yes, but I
couldn't confirm with toy data because the component order differs.โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358316530,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fijNykpE8zeYsTTRs9QwPJ8_UU4tpks5tLgAzgaJpZM4RX9Zb
.
>
Cool, so there is hope to have something similar to AMICA in MNE soon!
yes
my experience is that dipolarity can be misleading. See figure1 in :
https://hal.archives-ouvertes.fr/hal-01451432/documentThanks for the link - another interesting paper to read! Now it strikes me
as odd that Figure 4 in Delorme et al., 2012
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030135
does not include sphering in panels B and C, which compare the
different algorithms with respect to the dipolarity measure. Interestingly,
sphering is included in panels A and D, where dipolarity is not discussed.
indeed :)
The fact that sphering alone produces dipolar components does not
automatically rule out dipolarity as a quality measure for source
separation algorithms.yes and especially for MEG.
Since sphering is used to kickstart Infomax, I assume that Infomax
improves on the dipolarity measure. But I guess you discuss this in your
paper, which I haven't read yet.yes we discuss this.
and MIR gives you very minor differences between the different algos. it's
a metric quite far of the user experience.Then which measure would you suggest? Which ICA algorithm would you
prefer? Maybe a new algorithm comparison is in order (since the one from
Delorme et al. is already 6 years old).that's why I ask the question :)
Anyway, we should probably discuss these things not in this thread - I'd
be happy to talk about this further, maybe there's even a possibility for
collaboration.Regarding MNE's ICA implementation, I'll prepare a doc update.
great
According to https://github.com/cbrnr/mne-python/blob/dc9d4da05cd18dcd0ce43ddfc6b8cb40bc8dea8d/mne/preprocessing/ica.py#L636, the first 63 components are selected, which answers the first question.
I think the main problem now is this: if someone wants the unmixing matrix, it is not ica.unmixing_matrix_ but rather what I mentioned before. Maybe it is sufficient to clearly document this, because getting the sources has its own method. But this still does not clarify the 3 different args.
We could think of adding a method to get the matrices ...
On Wed 17 Jan 2018 at 16:49, Clemens Brunner notifications@github.com
wrote:
I think the main problem now is this: if someone wants the unmixing
matrix, it is not ica.unmixing_matrix_ but rather what I mentioned
before. Maybe it is sufficient to clearly document this, because getting
the sources has its own method. But this still clarifies the 3 different
args.โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358346642,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fimbrIDrCc3RjjARVNg8pAfM0l7ooks5tLhZ3gaJpZM4RX9Zb
.
@dengemann I'd rename ica.unmixing_matrix_ to e.g. ica.weights_ and put the true unmixing matrix into ica.unmixing_matrix_. Otherwise, people will be confused if there are two methods/properties related to getting the unmixing matrix.
careful because renaming requires deprecation cycle
Why not .get_unmixing_matrix ? No depracation needed.
On Thu 18 Jan 2018 at 10:49, Alexandre Gramfort notifications@github.com
wrote:
careful because renaming requires deprecation cycle
โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358593857,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fimyrt7vywpZtSdIaxBoWgMBBn_c7ks5tLxOSgaJpZM4RX9Zb
.
Sure, .get_unmixing_matrix makes life easier for us devs, but then the users sees .get_unmixing_matrix and .unmixing_matrix_.
Well. They should use the function in case of doubt :)
On Thu 18 Jan 2018 at 10:52, Clemens Brunner notifications@github.com
wrote:
Sure, .get_unmixing_matrix makes life easier for us devs, but then the
users sees .get_unmixing_matrix and .unmixing_matrix_.โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358594837,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fig9A1zGgHU7D1sAisIxofkD3dLVSks5tLxRXgaJpZM4RX9Zb
.
ahem you know what happens if you assume that users should do something...
Oh yeah. You want to tell me that more people will open issues like this
one and keep us busy :)
Let me think a bit if it may be the case to deprecate the public attribute
and add a function that returns whatever is desired via arguments.
On Thu 18 Jan 2018 at 10:54, Clemens Brunner notifications@github.com
wrote:
ahem you know what happens if you assume that users should do
something...โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358595443,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0firvfr0tnfZdFUbGq9DB_J6v2Nn65ks5tLxTWgaJpZM4RX9Zb
.
So I dug into the ICA code and found out that n_components=63 and max_pca_components=63 should yield identical ICA results. The data argument for the ICA algorithm is identical, but the random_state isn't (because of the way the data is PCA-transformed), so due to a different random state the results are not identical. I'm not sure if this is a bug.
In any case, the only difference in these two arguments is that for n_components=63, all PCA components are returned and then only the first 63 are used for ICA. In the case of max_pca_components=63, the PCA algorithm returns only the first 63 components, which is directly used for ICA. So the input data for ICA are identical, but (as I've said before) (1) the random states are different, and (2) ica.pca_components_ are also different (in the latter, the last component is missing, but otherwise these arrays are also identical).
This again leads me to believe that in fact these two arguments are actually doing the same thing, and only one of these arguments is needed.
(I have several questions on the algorithm, but let's focus on the n_components and max_pca_components arguments in this issue and I'll open a new issue for the new questions.)
so due to a different random state the results are not identical. I'm not sure if this is a bug.
It shouldn't be, without the same random state ICA results will be slightly different from run to run.
This again leads me to believe that in fact these two arguments are actually doing the same thing, and only one of these arguments is needed.
They are not doing the same when you are applying ICA (.apply()). With max_pca_components=63 you will remove the ignored PCA components from the data.
It shouldn't be, without the same random state ICA results will be slightly different from run to run.
The problem is that the random_state object is passed to the PCA fitting function, which apparently uses the generator a different number of times when you specify a different n_components number.
They are not doing the same when you are applying ICA (.apply()). With max_pca_components=63 you will remove the ignored PCA components from the data.
I don't understand. I get 63 ICs in both cases. This is achieved by dropping 1 PCA component before ICA. It doesn't matter which argument I use. In both cases, I will end up with 63 ICs.
I don't understand. I get 63 ICs in both cases. This is achieved by dropping 1 PCA component before ICA. It doesn't matter which argument I use. In both cases, I will end up with 63 ICs.
Yes, the difference would be in the reconstructed data. When reconstructing the data from your original data and ICs you either use the remainging PCs (the signal from these components) or not (throw it away), take a look at the "diagram" I posted at the beginning of this discussion. max_pca_components influences the reconstruction of your data when you do ica.apply().
can you try without using a randomized PCA? it's slower but should be more
deterministic
@cbrnr Sorry, I might have not been clear enough. The same could be said about n_pca_components (max_pca_componets has the same effect in reconstruction). So overall I agree that the interdependence between the args results in an API with three *_components parameters where two would be enough. However I still think changing the docs is more pressing as some parts are confusing. A short example:
n_components : int | float | None
The number of components used for ICA decomposition. If int, it must be
**smaller** then max_pca_components. If None, **all PCA components will be
used**. If float between 0 and 1 components will be selected by the
cumulative percentage of explained variance.
(emphasis mine)
n_components actually has to be <= than max_pca_componentsdata --> PCA --> ICAand now, one I've read the notes I pasted at the beginning more carefully I find the same conclusion there: "it seems to me that there is a little too much freedom in the three components arguments. One can set max_pca_components < n_channels or set n_components < n_channels and get the same results. Similar situation is with setting max_pca_components and n_pca_components" :)
@mmagnuski I think one argument is sufficient for ICA calculation, so why not keep only n_components in __init__ and use max_pca_components in the apply method? Interestingly, this method already has an argument n_pca_components which might already do what we want.
More ICA-related questions (not related to API, but to the algorithm) in #4882.
@agramfort yes, using regular PCA does yield identical random_state objects after PCA (and thus identical ICA results for both n_components=63 and max_pca_components=63. Should we consider replacing randomized PCA with regular PCA?
Maybe we should validate this against the background of our latest exchange
@cbrunner. Seems like we might want to make it the default.
On Fri 19 Jan 2018 at 10:37, Clemens Brunner notifications@github.com
wrote:
@agramfort https://github.com/agramfort yes, using regular PCA does
yield identical random_state objects after PCA (and thus identical ICA
results for both n_components=63 and max_pca_components=63. Should we
consider replacing randomized PCA with regular PCA?โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358914252,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fip1BifmzYZ9bWtEzChDAIBRUKf2gks5tMGJrgaJpZM4RX9Zb
.
@dengemann I don't think this is the source of randomness you are seeing across different machines. This only yields different results if you use either n_components or max_pca_components, but results are consistent across multiple runs if you keep using the same argument.
Would have been too clear-cut and satisfying to find a silly bug that
explains it.
On Fri 19 Jan 2018 at 10:51, Clemens Brunner notifications@github.com
wrote:
@dengemann https://github.com/dengemann I don't think this is the
source of randomness you are seeing across different machines. This only
yields different results if you use either n_components or
max_pca_components, but results are consistent across multiple runs if
you keep using the same argument.โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358917827,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fins1aMuSulzaoO8Ei7YjVAKFB5nVks5tMGW5gaJpZM4RX9Zb
.
randomized pca with n_comps == 30 will not give you the same 15 comps than
if you run directly it with 15 comps. It's how the algo works. It is
however guaranteed by non-randomized methods. It's a trade-off with speed
vs determinism.
Right. Good to recap.
On Fri 19 Jan 2018 at 11:01, Alexandre Gramfort notifications@github.com
wrote:
randomized pca with n_comps == 30 will not give you the same 15 comps than
if you run directly it with 15 comps. It's how the algo works. It is
however guaranteed by non-randomized methods. It's a trade-off with speed
vs determinism.โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358920275,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fisggOlr5M-Jm2D8cV5rkJqxdm4zCks5tMGf8gaJpZM4RX9Zb
.
@agramfort has anyone ever tested this? It seems like the randomized solver is not optimal for the typical MEEG use case. PCA(solver='full') takes 3.7s and PCA(solver='randomized') takes 13.2s on my machine on the same data...
well I "think" we did with @dengemann but it may depend on your data
size.... anyway I feel also that the randomize PCA may not be what we want
here... it's negligible compared to ICA time anyway...
Exactly, even if it were faster it doesn't matter because ICA takes orders of magnitude longer. I'll prepare a PR.
There is also quite some history in this. 2013 things looked different also
in scikit-learn. The randomized solver used to be faster. I think
especially on large data.
On Fri 19 Jan 2018 at 11:14, Alexandre Gramfort notifications@github.com
wrote:
well I "think" we did with @dengemann but it may depend on your data
size.... anyway I feel also that the randomize PCA may not be what we want
here... it's negligible compared to ICA time anyway...โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358923460,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fig4clpEiMA1qJHa7ZNjkVxZ5xgCfks5tMGr8gaJpZM4RX9Zb
.
First validate on a 2GB raw file.
On Fri 19 Jan 2018 at 11:15, Denis-Alexander Engemann <
[email protected]> wrote:
There is also quite some history in this. 2013 things looked different
also in scikit-learn. The randomized solver used to be faster. I think
especially on large data.
On Fri 19 Jan 2018 at 11:14, Alexandre Gramfort notifications@github.com
wrote:well I "think" we did with @dengemann but it may depend on your data
size.... anyway I feel also that the randomize PCA may not be what we want
here... it's negligible compared to ICA time anyway...โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358923460,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fig4clpEiMA1qJHa7ZNjkVxZ5xgCfks5tMGr8gaJpZM4RX9Zb
.
Do you have such a large file for me to test? Is there something in the sample data?
Maybe just add some noise and tile the data ... creating a long noisy copy.
On Fri 19 Jan 2018 at 11:18, Clemens Brunner notifications@github.com
wrote:
Do you have such a large file for me to test? Is there something in the
sample data?โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358924569,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fio04s570_9wXfIDqb5vX-evYuxCeks5tMGwNgaJpZM4RX9Zb
.
It also depends on input dimensionality.
On Fri 19 Jan 2018 at 11:20, Denis-Alexander Engemann <
[email protected]> wrote:
Maybe just add some noise and tile the data ... creating a long noisy copy.
On Fri 19 Jan 2018 at 11:18, Clemens Brunner notifications@github.com
wrote:Do you have such a large file for me to test? Is there something in the
sample data?โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358924569,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fio04s570_9wXfIDqb5vX-evYuxCeks5tMGwNgaJpZM4RX9Zb
.
Alternatively: propose PR and I check before/after on a few files
On Fri 19 Jan 2018 at 11:20, Denis-Alexander Engemann <
[email protected]> wrote:
It also depends on input dimensionality.
On Fri 19 Jan 2018 at 11:20, Denis-Alexander Engemann <
[email protected]> wrote:Maybe just add some noise and tile the data ... creating a long noisy
copy.On Fri 19 Jan 2018 at 11:18, Clemens Brunner notifications@github.com
wrote:Do you have such a large file for me to test? Is there something in the
sample data?โ
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-358924569,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fio04s570_9wXfIDqb5vX-evYuxCeks5tMGwNgaJpZM4RX9Zb
.
I tiled my EEG data to create a 2.4GB (4935680, 64) array and compared full PCA vs. randomized PCA:
Keep all components: randomized: 193.0s, full: 48.8s.
Keep only 10 components: randomized: 41.8s, full: 48.5s.
I guess we can safely move to full PCA, but feel free to test this with real data.
Migrating from #5054 ...
Like @cbrnr, I have issues with the ICA API and would appreciate a quick check on my intuition about it:
My current understanding is:
Xmax_pca_components and project X to X_reducedn_components is doing something ... but what?X_reduced, we compute an ICA and have X_reduced_icaX_reduced_ica, we retain/reject componentsX_reduced_ica using the ICA weights and have X_reduced'' indicating that some components were rejected in the ICA ... but the dimensions are the same)X_reduced' can now be back-projected to X' using the PCA weights of dimension max_pca_componentsX' instead of the original X because of two reasons: (a) some ICA components were missing (rejected) to get the same X_reduced and (b) X_reduced does not contain all pca components so we cannot recover all the data) n_pca_components comes into play by using more than just max_pca_components to construct X' from X_reduced'Furthermore, given the discussion about PCA before ICA, there is a recent paper that warns about performing dimension reduction prior to ICA
Maybe we should include a warning that performing dimensionality reduction (even only by 1% variance) before ICA can lead to unreliable results? Just as @cbrnr has pointed out in #5054 we can mention the case that dimension reduction is necessary in the case of e.g., the average reference. That information might fit well next to the warning note about filters, wouldn't it? :-)
Hi,
I have just made some comments to this #5114 docstring commit, after having dived through the ICA code the whole day in order to get a more profound understanding of what is going on with the different component-settings. What actually provoked this undertaking were two mistakes in the specification of the pca_components_-shape in the doc.
@cbrnr, maybe you could review and consider it? See here: https://github.com/mne-tools/mne-python/commit/1a09859c6e5a7749b25a6c78403c0f02b225f0f5
(I am not sure, if this is the right way to contribute anything to this issue; if not, please let me know the most proper way.)
@jeythekey can you make a PR to fix the problems you've identified?
OK, done so (my first one ;): #5620
Closing as (optimistically!) fixed, @sappelhoff or others if it's not, feel free to reopen
That's indeed optimistic ๐ - when I have more time I'll come back to this.
That's indeed optimistic smile - when I have more time I'll come back to this.
+1, same here :-)
... at the latest the next time I want to use it. Fair enough to close it for now.
I don't think that I'm the person you guys are intending to copy on this
thread.
On Tue, Nov 13, 2018 at 2:39 AM Stefan Appelhoff notifications@github.com
wrote:
That's indeed optimistic smile - when I have more time I'll come back to
this.+1, same here :-)
... at the latest the next time I want to use it. Fair enough to close it
for now.โ
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4856#issuecomment-438166669,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAaDyd978ie6iNejhlZ_nlE50N0H_DHmks5uundAgaJpZM4RX9Zb
.
Ahh yes sorry for the noise @cbrunner, we meant to tag @cbrnr -- you should be able to follow the "mute this thread" link in your email to disable notifications (or do so via the GitHub web interface)
Most helpful comment
OK, done so (my first one ;): #5620