Is there a way to pull out the annotations of an EDF file without having to read in the whole file?
Right now what I am doing is running the following commands:
import mne
foo = mne.io.read_raw_edf(file, stim_channel='auto', preload=True)
temp = mne.io.get_edf_events(foo)
Given the way the annotations are constructed, the file must be loaded in order to pull off the annotations.
I agree that it should not be necessary to preload. PR welcome
As @teonbrooks has already commented, the whole file must be read in order to parse the annotations. These are stored in a special channel, not in an event table at the beginning or end of the file as in GDF files, for example. Therefore, this issue should be closed because it cannot be implemented.
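For context, in EDF+ the special channel holds time-stamped annotation lists (TALs): an onset, an optional duration after a 0x15 byte, then one or more texts separated by 0x14, with each TAL ending in 0x14 0x00. A minimal sketch of decoding such a buffer follows. `parse_tals` is a hypothetical helper written for illustration, not MNE's implementation, and it assumes the annotation-channel bytes have already been extracted.

```python
def parse_tals(buf):
    """Decode EDF+ annotation-channel bytes into (onset, duration, text) tuples.

    Hypothetical helper for illustration only. Assumes well-formed TALs;
    a missing duration is reported as 0.0.
    """
    events = []
    # Every TAL ends with the byte pair 0x14 0x00.
    for tal in buf.split(b"\x14\x00"):
        tal = tal.strip(b"\x00")  # drop zero padding at the end of a record
        if not tal:
            continue
        # The first 0x14-separated field is "onset[0x15 duration]".
        head, *texts = tal.split(b"\x14")
        onset, _, duration = head.partition(b"\x15")
        for text in texts:
            if text:  # an empty text is the per-record time-keeping entry
                events.append((
                    float(onset.decode("ascii")),
                    float(duration.decode("ascii")) if duration else 0.0,
                    text.decode("utf-8"),
                ))
    return events


record = b"+0\x14\x14\x00+30\x1530\x14Sleep stage W\x14\x00" + b"\x00" * 4
print(parse_tals(record))  # [(30.0, 30.0, 'Sleep stage W')]
```

Because the TALs are interleaved with the signal data in every data record, the file has to be scanned from start to end, but only the annotation bytes need to be decoded.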
> the whole file must be read in order to parse the annotations.
The single channel from the whole file must be loaded, yes. However, it is sometimes faster and almost always more memory efficient to read a single channel of data in MNE than it is to read the entire thing. So this still seems like a reasonable request. If preload is False then we can still do something internally similar to raw[annot_pick][0] to get the channel with lower overhead than preload=True.
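To illustrate why reading one channel can be much cheaper than preloading, here is a rough standard-library sketch that seeks past the other signals in each data record and reads only the annotation bytes. It is not how MNE does it internally. `read_annotation_bytes` is a hypothetical name, and the sketch assumes a standard EDF+ header, a signal labelled "EDF Annotations", and a valid record count in the header (not -1).

```python
def read_annotation_bytes(path):
    """Return the raw 'EDF Annotations' bytes of every data record, concatenated.

    Hypothetical helper for illustration; assumes a well-formed EDF+ file.
    """
    with open(path, "rb") as f:
        fixed = f.read(256)  # fixed-size part of the header
        header_bytes = int(fixed[184:192].decode("ascii"))
        n_records = int(fixed[236:244].decode("ascii"))
        n_signals = int(fixed[252:256].decode("ascii"))

        # Per-signal fields are stored field by field: all labels first.
        labels = [f.read(16).decode("ascii").strip() for _ in range(n_signals)]
        # 216 bytes of per-signal fields precede "samples per record".
        f.seek(256 + n_signals * 216)
        n_samples = [int(f.read(8).decode("ascii")) for _ in range(n_signals)]

        idx = labels.index("EDF Annotations")
        record_size = 2 * sum(n_samples)    # samples are 2-byte integers
        offset = 2 * sum(n_samples[:idx])   # bytes before the annotations
        length = 2 * n_samples[idx]

        chunks = []
        for rec in range(n_records):
            # Jump straight to the annotation signal inside this record.
            f.seek(header_bytes + rec * record_size + offset)
            chunks.append(f.read(length))
    return b"".join(chunks)
```

Memory use then scales with the size of the annotation channel rather than with the whole recording, which is the kind of saving the lower-overhead path would aim for.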
Can't we set the exclude argument accordingly to only load the stim channel?
I suppose so. I'd consider that more of a workaround than a permanent solution, though. Forcing someone to have two copies of their raw instance, one of which they had to construct with an exclude=np.setdiff1d(np.arange(len(raw.ch_names)), raw.ch_names.index('annotation_ch')) or so, is a pretty clunky solution to what we should be able to solve internally.
True, but I'm still not sure if this is really necessary in the first place unless you throw some convincing numbers at me (e.g. a decent reduction in absolute memory usage and/or a speed increase). I haven't come across a large EDF file where memory usage or load times were an issue.
I assumed that this issue was opened because it was problematic. Can you explain why preload=True is bad for you, @sbang002?
@larsoner Sbang002 was an intern working with me but he finished up, hence the lack of reply.
Just to get back to you: our lab works with sleep records, and loading 8+ hours of EEG just to read off the sleep stages (encoded as EDF annotations) is time-consuming and memory-intensive. Loading only the annotations would speed up our analysis by a large margin. However, I understand if this is a low priority fix.
We have code for this, but we need to clean it up. It's part of the Annotations refactoring plan by @massich.
We used this code extensively in this work:
https://ieeexplore.ieee.org/document/8307462/
If you can share the files you want to read more efficiently, we could use them as test cases.
> I understand if this is a low priority fix
@bdyetton I don't think this is a low priority fix. It is actually at the top of my list, and I'll be happy to help. If you can share some data and we can make an example out of it, that would be awesome.
Plus, we will need code reviews when we tackle it.
Great, I'll gather up a few edf+ and edf++ files for you.
Here ya go:
https://drive.google.com/open?id=186eklftvEB2FeLLxHPHUjZxe2iPvJ1f7
This is a selection of EDF files from a bunch of open source sleep datasets; some are edf, others are edf+ and edf++.
Thanks heaps!
We'll improve the IO support for annotations in the EDF reader, and then we'll take a stab at this.
@bdyetton have you checked #5718 and #5699, and what this test is doing?
At this point, mne-python/master should be able to do what you need.
I'm closing this issue. If there's anything that does not work, feel free to reopen it or open a new one.