Mne-python: ENH: change _segment_raw to make_fixed_length_epochs

Created on 3 Jul 2017  Â·  31Comments  Â·  Source: mne-tools/mne-python

from mne.datasets import testing
import os.path as op
from mne.io import read_raw_fif
from mne import Epochs, make_fixed_length_events
from numpy.testing import assert_equal

elekta_base_dir = op.join(testing.data_path(download=False), 'misc')
fname_raw_elekta = op.join(elekta_base_dir, 'test_elekta_3ch_raw.fif')

def test_shape():
    raw = read_raw_fif(fname_raw_elekta, preload=True)
    sfreq = raw.info['sfreq']
    events = make_fixed_length_events(raw, id=1, duration=20.)
    epochs = Epochs(raw, events, tmin=0., tmax=20.)
    epochs_ok = Epochs(raw, events, tmin=0, tmax=20. - 1./sfreq)
    data = epochs.get_data()
    data_ok = epochs_ok.get_data()
    assert_equal(data.shape, (4, 4, 20000))
    assert_equal(data_ok.shape, (4, 4, 20000))

if __name__ == '__main__':

    test_shape()

The following test fails because of the first assert_equal.
Here, the raw object contains data with 4 channels and 80 000 samples (sfreq = 1000Hz).
I want to epoch the raw data and have epochs which are all 20s long. For that, I used the make_fixed_length_events function. Intuitively, I expected the shape of the epoched data to be (4, 4, 20000). However, it is actually (3, 4, 20001). It seems that there is one redundant sample. This issue (if it is really one), is easily corrected by replacing tmax=20. with tmax=20. - 1./sfreq in Epochs.

Is this happening because I made a mistake somewhere ?

EASY

Most helpful comment

ok I slept over it and I think I can live with:

make_fixed_length_epochs

which would be a good public name for _segment_raw

and we keep the rest untouched ok?

All 31 comments

Epoching is inclusive, so the sample number 0 is also included. I think it's stated somewhere in the docs, if not, it should.

Its a bug of make_fixed_length_events for me

Alex

I agree it looks like a bug - everybody so far has been using make_fixed_length_events the same way as @jbschiratti above (to segment raw data, without subtracting one sample from epoch length) so they all have been duplicating some samples. It is performed the same way in _segment_raw for example.

On the other hand - make_fixed_length_events does what it advertises (creates events with fixed spacing) and the bug is actually in the interaction of these events with mne.Epochs inclusiveness. So it only occurs when someone wants to use the evets from make_fixed_length_events in epoching; if one uses them for slicing data matrix for example - everything is fine.
Therefore another way of resolving this problem would be to fix _segment_raw, make it public and advertise to use it for purposes of segmentation instead of make_fixed_length_events (and add a note in make_fixed_length_events about that). :smiling_imp:

how about just removing one sample in make_fixed_length_events ?

@agramfort
That would not cause mne.Epochs to stop being inclusive, so samples would still be duplicated (even more actually) and epoch length would still be surprising - as long as the same duration is passed to make_fixed_length_events and mne.Epochs. Adding a sample in make_fixed_length_events would not make duplication happen but would still lead to epochs longer than expected (due to mne.Epochs inclusiveness).
Removing (or adding) a sample in make_fixed_length_events could also lead to other problems - if someone is using make_fixed_length_events to do something else than epoching (like taking slices of data - in which case one would get too short segments). And also - the difference between event times would not be equal to the requested duration.
I think that fixing _segment_raw and making it public is really the easiest solution.

I don't get why it would not fix the problem to fix make_fixed_length_events

we should fix people's code rather than teach them to use something
different.

there should be only one obvious way of doing things.

(Here is what I tried) In the code I provided, because raw.first_samp is 3000 and raw.last_samp is 82999, the events array (as given by make_fixed_length_events) is :

[[ 3000     0     1]
 [23000     0     1]
 [43000     0     1]
 [63000     0     1]]

Changing it to

[[ 3000     0     1]
 [22999     0     1]
 [42998     0     1]
 [62997     0     1]]

does not seem to completely solve the problem because data.shape changes from (3, 4, 20001) to (4, 4, 20001). However, we still get a "redundant sample" in the end.

@agramfort
Maybe I misunderstood what you were suggesting.
From my point of view there are two problems here: one is duplication of samples (ending sample of epoch 1 is the starting sample of epoch 2) another is unexpected length (duration + 1 sample). If you remove one sample in make_fixed_length_events then the shape will be ok, but the duplication will still be there.

Let's take a signal that has sfreq=2 and so the times of the first 6 samples are:
0, 0.5, 1, 1.5, 2, 2.5
When one asks make_fixed_length_events for events separated by one second (duration=1.) they will get events at times:
0, 1, 2
This is ok, but using these events in Epochs along with tmax the same as duration will produce following segmantation:
[0, 0.5, 1], [1, 1.5, 2]
The segment length is longer than 1 second, sample 1 is used twice and sample 2.5 is excluded. This is because mne.Epochs does not perform pythonic slice but includes the last sample too - at least to my understanding).

Now, if make_fixed_length_events subtracted one sample from duration the events returned in this example would be:
0, 0.5, 1, 1.5, 2, 2.5
and segmentation, given tmax is set as the duration (most common case) will be then:
[0, 0.5, 1], [0.5, 1, 1.5], [1, 1.5, 2]
now epochs are still longer than expected and more samples are duplicated.

The most common scenario is that people pass the same duration to make_fixed_length_events and Epochs, so we wont be able to fix all problems by just changing make_fixed_length_events.

If you remove one sample in make_fixed_length_events then the shape will be ok, but the duplication will still be there.

should be: "will still not be ok and the duplication will still be there".

>

If you remove one sample in make_fixed_length_events then the shape will
be ok, but the duplication will still be there.

how about:

raw = read_raw_fif(fname_raw_elekta, preload=True)
events = make_fixed_length_events(raw, id=1, duration=20.)
events[:, 0] += np.arange(len(events))
epochs = Epochs(raw, events, tmin=0., tmax=20., baseline=None)
raw_data = raw[:, :][0]
epochs_data = np.concatenate(epochs.get_data(), axis=1)
assert_array_equal(raw_data[:, :60000], epochs_data[:, :60000])

is that good enough?

@agramfort:
adding one sample works for overlapping samples, but the segment length should still be 1 sample longer than duration, right?

it would not change the segment length but I think that in practice +1 or
-1 samples on an epoch is no big deal. I am more afraid of people doing

epochs_data = np.concatenate(epochs.get_data(), axis=1)

on epochs to recover the raw data.

I meant that for sfreq=250 and tmin=0., tmax=1. the epoch length is 251 samples. I don't think it is big deal in general, but in case of segmentation people would probably prefer to have it more precisely match duration if it is possible. See the example in the first post in this issue:

I expected the shape of the epoched data to be (4, 4, 20000). However, it is actually (3, 4, 20001). It seems that there is one redundant sample.

Because each epoch was one sample longer, whole 20 000 sample segment is dropped.
If you are opposed to making segment_raw public we could modify examples to preach something like:

duration = 1.
sfreq = raw.info['sfreq']
events = make_fixed_length_events(raw, id=1, duration=duration)
if int(sfreq * duration) == sfreq * duration: # if whole number of samples in given duration
    duration -= 1 / sfreq * 0.001 # make sure we do not include the tmax sample
epochs = Epochs(raw, events, tmin=0., tmax=duration, baseline=None)

this should avoid overlap and give adequate segment length, but this is getting complex and such implementation details should rather be hidden from the user than required him to remember or check online each time.

>

I meant that for sfreq=250 and tmin=0., tmax=1. the epoch length is 251
samples. I don't think it is big deal in general, but in case of
segmentation people would probably prefer to have it more precisely match
duration if it is possible.

that's a problem we cannot fix without changing Epochs. You already have
this problem with Epochs. What I am suggesting is just to fix by 1 sample
the output of make_fixed_length_events so you are less likely to be bitten
but using simple code. At least that's my opinion. If you make segment_raw
public we need to duplicate a lot of API code.

@dengemann any thoughts on this?

that's a problem we cannot fix without changing Epochs. You already have
this problem with Epochs

I don't think it is a problem in general - just in the case of segmentation. segment_raw is already present in the codebase and can be made fit to the special case of segmentation, I am not sure what is duplicated when making it public.

In general - I understand your prespective, but if make_fixed_length_events always added one sample to the duration then it may be surprising in other cases (using the event samples for slicing for example). Then - the problem with sample overlap is not present when the duration contains for example 125.8 samples (right?), so always adding a sample may lead to other inconsistencies like loosing samples.

ok I slept over it and I think I can live with:

make_fixed_length_epochs

which would be a good public name for _segment_raw

and we keep the rest untouched ok?

Ok, I can start a PR soon that fixes _segment_raw and adds make_fixed_length_epochs

sounds like a plan !

@mmagnuski I just debriefed with @agramfort and I feel I also want to sleep over it. Let me see if we can find a clean way to avoid creating a monster function that takes the union of make_fixed_length_events and Epochs ...

@dengemann Sure, in the meantime I can fix _segment_raw (which is pretty small , I wouldn't call it monster function :) ).

it's not a monster as it has **kwargs which we never use for public
functions. We will need to duplicate all Epochs parameters...

So it's a monster :)
On Wed, 5 Jul 2017 at 21:34, Alexandre Gramfort notifications@github.com
wrote:

it's not a monster as it has **kwargs which we never use for public
functions. We will need to duplicate all Epochs parameters...

—
You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
https://github.com/mne-tools/mne-python/issues/4367#issuecomment-313203999,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB0fig51uEK0PcUBvKoiVIqw8BDX8WxVks5sK-U4gaJpZM4OMJ3b
.

Ah, ok, that's the monstrocity. :)

conclusion is to expose _segment_raw as make_fixed_length_epochs

@mmagnuski volunteer?

Sure, glad we all agree. :)

I would however reconsider the function name because all epochs in mne have "fixed length" - here we are talking about adjacent segments/epochs.

Did someone make a PR for this issue ?

no AFAIK
it remains to be done...

Sorry, I forgot about it. I'll take a stab at it during the weekend.

Hello, the code above runs fine. Is there still any issue to work on?
The _segment_raw function already uses the make_fixed_length_event as shown: _segment_raw

yes given how often this is requested I think that doing a make_fixed_length_epochs that does not expose most of the epochs param (just bare minimum) can be useful to users.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Sirabhop picture Sirabhop  Â·  6Comments

Mehrabkhazraei picture Mehrabkhazraei  Â·  7Comments

kingjr picture kingjr  Â·  3Comments

timonmerk picture timonmerk  Â·  5Comments

hoechenberger picture hoechenberger  Â·  6Comments