Mne-python: GSoC, ENH: Improve Time Frequency Analysis in Source Space

Created on 9 May 2019  路  23Comments  路  Source: mne-tools/mne-python

Hey everyone, I'm going to work on improving Time Frequency Analysis in Source Space as my Google Summer of Code project this year!
I opened this issue, so we can discuss different aspects of the project, e.g. what you would like to have implemented and what not. To get an idea of the project, you can take a look at this google doc, which is a copy of my GSoC proposal. Feel free to comment on the google doc, anyhow, it would probably be better if we keep the general discussion within the thread of this issue.

Problem:

For TFR Analysis in source space, it would be nice to be able to apply MNE's standard repertoire of TFR functions from mne.time_frequency on SourceEstimate data, instead of being restricted to use the source_induced_power and source_band_induced_power functions.

Solution/ alternatives:

There are some possible solutions listed in the proposal.
The basic idea was to enable mne.time_frequency functions to take VectorSourceEstimates and VolVectorSourceEstimates as data input and return one or more SourceTFR classes, which contain time-frequency transformed source level data.
Anyhow, feel free to express your opinion on any solution or alternative (that's what this issue was created for).

Most helpful comment

If we're going to get rid of the Vector/non-vector and time/freq class-based distinctions, we might as well get rid of the surface/volume/mixed class-based distinctions, too. We don't have great support for mixed source spaces and trying to have a general class could help fix this problem.

I propose we come up with a new class that's for any type of source-level data we want to work with. I'd call it SourceData because even the idea of it being an "estimate" is a misnomer (e.g. when simulating you're actually giving ground truth). Here is a first pass at a draft spec for what a SourceData instance sd has and can do:

  • sd.dims tuple of str saying what the data dims correspond to, in order (with optionality; n_orientations always 1 or 3.):
    (n_vertices, n_orientations[, n_epochs][, n_frequencies][, n_times])
  • sd.src for the source space corresponding to the data/vertices (a stripped-down version, as src for surfaces with distances can be hundreds of MB)
  • sd.subject for the subject
  • sd.times and sd.frequencies as ndarray or None
  • sd.units for the data: e.g. 'nAm' (MNE), 'F' (dSPM), or 'AU' (for now; maybe we also can figure out the units for time-freq data if it's properly normalized somehow)
  • sd.orientations either ('normal',) or ('X', 'Y', 'Z')

For methods:

  • sd.get_data() to get an ndarray containing the data for a subset of vertices/epochs/times/freqs. This, rather than always creating a .data attribute, is useful for efficiency. In cases where we have a linear imaging kernel, a lot of stuff could be simplified e.g. iterating over epochs, doing TF transformation, apply kernel, take magnitude, etc. In other words: the default behavior of e.g. apply_inverse is to not have a full sd.data stored in memory.
  • sd.load_data() sd.apply_kernel() to create and populate the sd.data array if people want the full array available and modifiable in sd.data
  • sd.plot_3d() -- hopefully there is some convergence here with @GuillaumeFavelier's work that this will use surface plots for the surface source spaces and some 3D volume rendering for the volume source spaces
  • sd.plot_orthoview() -- plot in 3D volumetric space like our volume plotters currently do
  • sd.save() to HDF5

I think this could be made to work for all variants of *SourceEstimate we currently have, and all of their currently implemented methods.

All 23 comments

I asked @DiGyt to start a conversation here to get more opinions.

We have already a few SourceEstimate classes such as VolSourceEstimate, MixedSourceEstimate, VectorSourceEstimate, VolVectorSourceEstimate and i have doubts on adding even more of these to have time frequency content in source space.

does anyone has some opinion on this?
thoughts on how to limit the number of classes?
maybe we can first simplify before adding more complexity?

just to instigate the discussion a bit, I have some questions

can you post some usage example? what is the "kernel trick" in the proposal of the google doc? Is it like a LinearOperator? Do you foresee the apply_inverse method also accepting TFR objects?

just to instigate the discussion a bit, I have some questions

can you post some usage example? what is the "kernel trick" in the proposal of the google doc? Is it like a LinearOperator? Do you foresee the apply_inverse method also accepting TFR objects?

Hey @jasmainak . Thanks for the reply!
Some easy example would be to be able to do something like:

import numpy as np
import mne
from mne.datasets import sample
from mne.time_frequency import tfr_multitaper, tfr_stockwell
from mne.minimum_norm import read_inverse_operator, apply_inverse


data_path = sample.data_path()
ave_fname = data_path + '/MEG/sample/sample_audvis-ave.fif'
inv_fname = data_path +\
            '/MEG/sample/sample_audvis-meg-eeg-oct-6-meg-eeg-inv.fif'


evoked = mne.read_evokeds(ave_fname, baseline=(None, 0), condition=0)

inv = read_inverse_operator(inv_fname)
stc = apply_inverse(evoked, inv,pick_ori="vector")


freqs = np.arange(5., 100., 3.)
n_cycles = freqs / 2.
t_bandwidth = 8

src_multi = tfr_multitaper(stc, freqs=freqs, n_cycles=n_cycles,
                         time_bandwidth=t_bandwidth)

src_stockw = tfr_stockwell(stc, fmin=5, fmax=100)

and then go on to do stuff like stc.plot() or stc.get_peak(fmin=8, fmax=12).

As said, the idea is just making everything from mne.time_frequency available to be used with stcs.
I think the most intuitive way for users would be to read stcs into the mne.time_frequency functions just like it can read evoked or epochs data.
Accordingly, feeding TFR data to apply_inverse was not within my scope, i thought to rather go the opposite way.

Reading 'kernel trick' data into time_frequency functions is meant to be a measure to prevent SourceTFR objects from growing too large. This 'trick' is already implemented in mne.SourceEstimate, it basically works by keeping a large matrix of (n_vertices, n_times) within a tuple of smaller matrices ((n_vertices, n_sensors), (n_sensors, n_times)) which can the be transform into the real data by taking the dot product. As i see it, this trick isn't used very often, but being able to do so when creating SourceTFR objects might be pretty helpful to keep down memory usage.

hmm okay ... wouldn't it be prohibitively time consuming to do TFR on the sources? Could you do it in sensor space and then do apply_inverse? what about epochs? how will you handle them? I know they are really slow to compute even on sensor space.

also how is the 5D visualization work? Is it going to live as a method of some class? or just a function? will it work both for surface and volume source estimates?

hmm okay ... wouldn't it be prohibitively time consuming to do TFR on the sources? Could you do it in sensor space and then do apply_inverse? what about epochs? how will you handle them? I know they are really slow to compute even on sensor space.

Well yes, although I'd say that (in my environment) it's not prohibitively time consuming, but it's surely the slower solution.
I think this may be the central decision here, whether we put stcs into time_frequency functions or put TFR data into apply_inverse. From the initial discussions I thought that the first may be the more intuitive solution, although it certainly has its downsides too. I'd say this is a question I'd rather relay to you core developers, since you have a better oversight of what's more practical and all the possible implications.

Handling of epochs is another part in question. Since multiple stcs are also stored as lists of stcs, and TFR data multiplies the size of these stcs, i would say that giving out 'epochs' of SourceTFR as lists of SourceTFR would be the primary solution, while creating an EpochsSourceTFR will probably need such things as the kernel trick to keep the size down.

also how is the 5D visualization work? Is it going to live as a method of some class? or just a function? will it work both for surface and volume source estimates?

I'd say the best thing for surface data is certainly to go the same way as in plot_source_estimates i.e. construct a TimeFrequencyViewer from the existing TimeViewer, which can ultimately be accesed from a method within the SourceTFR objects. Maybe a first solution for VolumeSourceEstimates would be to plot single frequencies?

Maybe a simpler thing to start with would be to change a bit how stc's are used to store and display frequency-resolved source-space representation (that is before adressing time-frequency). Currently it is a bit hacky as times are used to store frequecies. Do we have some ideas on how to deal with that and forthcoming source-space TFR without a class proliferation that @agramfort mentioned?
One option is to store data dimension types within a single SourceEstimate (for example stc.datadims = ['vert', 'freq']) and make .times and .freqs to work depending on these dimensions. This could even be done as storing an empty standard mne object (like Epochs or TFR or PSD/Spectra [hopefully coming soon]) so one could check isinstance(stc.datadims, TFR) instead of creating multiple sub-classes. I don't have problem with that but some may see it as ugly/hacky.

One option is to store data dimension types within a single SourceEstimate (for example stc.datadims = ['vert', 'freq']) and make .times and .freqs to work depending on these dimensions.

This sounds like we might end up recreating Xarray :)

@larsoner
Yes, I know :) but this would be only to check the sub-class ("what dimensions are available") and not so much to do slicing, dicing and computations. I don't think @agramfort would approve to depend on xarray. :) (although its more mature nowadays and actually a simple function in mne like TFR.to_xarray() would be useful - but that's another story).

I considered this option too. Only two SourceEstimates | VolSourceEstimates
object and a dims attribute which is a tuple that can be 'vert', 'freq',
'time', 'orientation'

>

If we're going to get rid of the Vector/non-vector and time/freq class-based distinctions, we might as well get rid of the surface/volume/mixed class-based distinctions, too. We don't have great support for mixed source spaces and trying to have a general class could help fix this problem.

I propose we come up with a new class that's for any type of source-level data we want to work with. I'd call it SourceData because even the idea of it being an "estimate" is a misnomer (e.g. when simulating you're actually giving ground truth). Here is a first pass at a draft spec for what a SourceData instance sd has and can do:

  • sd.dims tuple of str saying what the data dims correspond to, in order (with optionality; n_orientations always 1 or 3.):
    (n_vertices, n_orientations[, n_epochs][, n_frequencies][, n_times])
  • sd.src for the source space corresponding to the data/vertices (a stripped-down version, as src for surfaces with distances can be hundreds of MB)
  • sd.subject for the subject
  • sd.times and sd.frequencies as ndarray or None
  • sd.units for the data: e.g. 'nAm' (MNE), 'F' (dSPM), or 'AU' (for now; maybe we also can figure out the units for time-freq data if it's properly normalized somehow)
  • sd.orientations either ('normal',) or ('X', 'Y', 'Z')

For methods:

  • sd.get_data() to get an ndarray containing the data for a subset of vertices/epochs/times/freqs. This, rather than always creating a .data attribute, is useful for efficiency. In cases where we have a linear imaging kernel, a lot of stuff could be simplified e.g. iterating over epochs, doing TF transformation, apply kernel, take magnitude, etc. In other words: the default behavior of e.g. apply_inverse is to not have a full sd.data stored in memory.
  • sd.load_data() sd.apply_kernel() to create and populate the sd.data array if people want the full array available and modifiable in sd.data
  • sd.plot_3d() -- hopefully there is some convergence here with @GuillaumeFavelier's work that this will use surface plots for the surface source spaces and some 3D volume rendering for the volume source spaces
  • sd.plot_orthoview() -- plot in 3D volumetric space like our volume plotters currently do
  • sd.save() to HDF5

I think this could be made to work for all variants of *SourceEstimate we currently have, and all of their currently implemented methods.

Incidentally, this get_data + kernel-default business would fix some long-standing issues, such as people wanting to iterate over STC epochs and/or vertices and/or times, depending on the application. The get_data can (hopefully) easily and efficiently calculate only what it needs to in order to return the requested data via slicing properly before the dot/einsum so these things become easier from a user perspective, too.

I like that!

How much backward compatible is this proposal going to be? Isn't SourceEstimate a very old part of MNE? Is this going to break people's scripts?

Isn't SourceEstimate a very old part of MNE?

We'll have to decide how backward compatible we want it to be. The way I would start is by looking at how far we can get by supporting "upsampling" our five existing classes to SourceData. Hopefully at the end of the day our five classes can just be thin wrappers that contain little to no code like:

class SourceEstimate(object):
    ...
    def plot(self):
        return self.to_SourceData(...).plot_3d()  # or maybe plot_pysurfer for the old interface

Maybe we deprecate them with a long cycle, maybe not. But I'm optimistic it's a solvable problem.

Based on the API I outlined above I think it's worth the necessary work to get this more compact-yet-complete interface working. It would simplify a lot of code I've locally written to handle the various SourceEstimate classes, as well as things internally.

okay I see, thin wrappers sounds like a good idea and +1 to a long deprecation cycle. I have neuroscientists complaining to me that the API changes too fast in Python / MNE and things break when they update versions.

This looks like a great solution to me. So this would mean time_frequency functions could run on sourceData at the same speed as on sensor level data, since the apply_inverse part would be performed afterwards, right?
Like this, we could also do without time_frequency functions taking lists as input, since the input could be handled as proper epoch objects...

Yes exactly. Preserve and utilize linearity as long as possible. We have functions like compute_source_epochs_psd that take advantage of linearity for speed purposes under the hood. This class would standardize that practice and keep it in one place.

One other thing to consider is whether we can extend this behavior to things like label time course extraction, too. That is sometimes one more linear step. But even if it's nonlinear, it reduces the dimensions. One maybe easy way to do this is to move away from "vertex" as the first dimension and let it be "location". A "location" could be a single vertex (analogous to SourceEstimate) or more than one vertex (analogous to what you get after extract_label_time_course).

So in the end, get_data() could give a standard source estimate 8196 vertices by 1000 time point array, or something as complicated as the 72 labels by 20 epochs by 50 freqs and 40 time point array (internally stored efficiently as a list of sensor data epoch ffts, imaging kernel, label vertex mapping, and averaging mode).

This would I think "only" in addition to the above require:

  1. Making a list vertices per location. (For SourceEstimate-like SourceData instances, each entry would be a single element.)
  2. Adding an attribute that says how the vertices were (or will be, in a delayed scenario) pooled to give the location activation. For example, 'mean_flip'.

This can be a follow-up PR, but it's good to think of such an additional transformation being tacked on.

We could think about where morphing could fit in, too. A SourceMorph class could be an attribute to take the data (eventually) to a different subject's space.

I think that would cover most of the source space transformations we tend to do anyway.

Or maybe what we want instead of a single monolithic class is a set of modular transformations that can be chained...

Maybe I'll see if I can come up with proposals for the monolithic and modular designs with advantages and disadvantages.

I think we'll have to think on this SourceData API a bit. Let's not aim for doing it before the GSoC work.

Maybe the first thing to start with for GSoC is assume for now that you have some SourceTFR object with dimensions [n_epochs, ]n_sources, n_freqs, n_times. Maybe build such an object.

I think we'll have to think on this SourceData API a bit. Let's not aim for doing it before the GSoC work.

Maybe the first thing to start with for GSoC is assume for now that you have some SourceTFR object with dimensions [n_epochs, ]n_sources, n_freqs, n_times. Maybe build such an object.

Hmm. but there's alot of stuff from the project that would depend on this (e.g. not needing to read lists in time_frequency functions and not having the trouble with dealing with vectorized SourceEstimates).
If i already could start out with a kernel type class, i could spend more time of the project developing this class instead of make time_frequency functions handle SourceEstimates.
If i start out with passing SourceEstimates to time_frequency functions now, a big part of the project might have to be reverted or changed later, when the SourceData is introduced.
But of course I fully agree with you that such a huge API change should be planned carefully over a longer period.

Is there maybe some kind of middle way? @larsoner @agramfort
Maybe i could start out with reading only SourceEstimates that already use the kernel trick until we know exactly what we want to do?

I agree with @larsoner

maybe one option is to draft a SouceData-like class for the time-frequency estiamtes in source space and make the plotting code work (with and without the kernel trick).

once this object is agreed upon in terms of API we can see how to make it possibly a first class citizen in mne-python.

maybe one option is to draft a SouceData-like class for the time-frequency estiamtes in source space and make the plotting code work (with and without the kernel trick).

Sounds good! I'll do that.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

kingjr picture kingjr  路  3Comments

Sirabhop picture Sirabhop  路  6Comments

bloyl picture bloyl  路  6Comments

larsoner picture larsoner  路  6Comments

hoechenberger picture hoechenberger  路  6Comments