Mne-python: Add generic reader function?

Created on 12 May 2020 · 35Comments · Source: mne-tools/mne-python

At some point we've already discussed if adding a generic reader function (mne.io.read_raw) would be useful. I can't find the relevant discussion, but I think that we decided that users should stick to the separate readers.

I've written such a wrapper function for MNELAB (https://github.com/cbrnr/mnelab/pull/148), and I think this might still be a good idea to include in MNE. It makes things easier for users - the file extension already determines the file type, so why do they have to explicitly pick the right function if this could be done automatically?

Of course this is not meant to replace the individual readers - it's just a convenience wrapper that should work for the most common file types.

mne.io.read_raw("test.vhdr")
mne.io.read_raw("test.edf")
mne.io.read_raw("test.bdf")
mne.io.read_raw("test.set")
# ... you get the point

I'm fine if people decide (again) not to include this ("There should be one-- and preferably only one --obvious way to do it"). But in that case, I'd like to ask if anyone has a good idea how to include the true MNE reader function calls in MNELAB's history without creating too big of a mess. Currently, I stick to just using mnelab.io.read_raw for history entries because it keeps the code so much simpler. Suggestions are very much appreciated (either here or at https://github.com/cbrnr/mnelab/pull/148)!

Source

cbrnr

Most helpful comment

would you expose at least the verbose and preload option to read_raw
without kwargs?
these are supported by all readers.

agramfort on 19 May 2020

👍2

All 35 comments

my pb is the heterogeneity of the read_raw_xxx function signatures:

https://mne.tools/stable/python_reference.html#module-mne.io

having one read_raw function would mean having a long list of parameters
and most of them being useless for each user working with one type of format

my 2c

agramfort on 12 May 2020

My wrapper function just passes them on as *args and **kwargs, so this is not really a problem.

cbrnr on 12 May 2020

documenting args and *kwargs is hard. So you would keep both the read_raw
and the read_raw_xxx functions?

agramfort on 12 May 2020

Yes, I would keep them both. That way, we can refer to the documentation in the specific functions, but also have a wrapper that's easier to use for many use cases (this is based on feedback I got from a few people who find it cumbersome to look up the name of the reader when all they want is just to load the data).

cbrnr on 12 May 2020

documenting args and *kwargs is hard.

I agree, but I'm open to this idea if a suitable docstring could be concocted. @cbrnr do you have ideas for what the docstring of read_raw() would look like?

drammock on 12 May 2020

I have to think about it, but first I would explicitly mention that read_raw is a wrapper for all those other functions that do the actual work. I'd also include all supported file formats that can be read and most important parameters that most/all read_raw_xxx functions have in common (e.g. preload). Then I'd refer people to the corresponding read_raw_xxx functions for more specific parameters. Sure you can argue that if we refer people to specific readers we could just as well continue to have them use these readers, but I think read_raw will work out of the box for many use cases.

cbrnr on 12 May 2020

❤1

I would very much welcome this feature!

hoechenberger on 17 May 2020

I like this idea very much. And as @cbrnr says, keeping both is not really a problem. Actually the documentation of the current function could be linked somehow in the doc string of a general function.

dengemann on 17 May 2020

👍1

My wrapper function just passes them on as *args and **kwargs, so this is not really a problem.

Editor autocompletion hates this trick 😅💩 This is my only concern here. It's what makes Matplotlib a pain to use for me.

Could we add the union of all args of all reader functions as kwargs, and document them accordingly?

hoechenberger on 18 May 2020

Could we add the union of all args of all reader functions as kwargs, and document them accordingly?

I'd rather not, it makes it so that in the docs we have to say which formats each arg applies to, anytime we change a reader we have to change it in two places, we need to ensure that all and only the correct args/kwargs actually get used / set by the user (whereas passing *args, **kwargs directly does this for us). If people want nice introspection and to understand all of their options easily, I think they should use the dedicated reader.

larsoner on 18 May 2020

I'd rather not, it makes it so that in the docs we have to say which formats each arg applies to, anytime we change a reader we have to change it in two places, we need to ensure that all and only the correct args/kwargs actually get used / set by the user (whereas passing *args, **kwargs directly does this for us). If people want nice introspection and to understand all of their options easily, I think they should use the dedicated reader.

I do see your concern, but I'd simply pass all accepted args to the respective reader, and silently drop all others.

Also – do reader function APIs change _this often_ really? :)

hoechenberger on 18 May 2020

you know my position on this. But I am just one person. I will not veto
anything if there is a consensus.
Is it really possible that a user does not (want to) know the format he
uses?

agramfort on 18 May 2020

Is it really possible that a user does not (want to) know the format he uses?

This will hopefully only be needed during a transition period until everybody has switched all their data to BIDS: mne-bids picks the correct reader automagically!

(Hey, just allow me to _dream_ for a bit, would you? ☺️)

hoechenberger on 18 May 2020

Is it really possible that a user does not (want to) know the format he
uses?

It's a bit beyond this. Sometimes I'm testing some code, and I'd rather use CTF than FIF data. Instead of just switching a filename, I need to switch the reading call, too. If you want to build generic code like some small demo pipelie, it's more things to change than what could otherwise be just a single path (e.g., for resting-state analyses).

larsoner on 18 May 2020

Yes, it's mainly a matter of practicality. Of course we can all live with using dedicated readers, but for many users it will be a bit less effort to use the generic reader. For example, to read .mff files people have to use the read_raw_egi function - not entirely clear at first glance. So although not essential, a generic reader is a nice bonus that can potentially be used almost all the time. And if it doesn't work, there are still the dedicated reader functions.

cbrnr on 18 May 2020

👍1

again if you are all convinced ...

agramfort on 18 May 2020

I'm not convinced until I see a concrete proposal for what the docstring will actually say. @cbrnr described it in general terms here but I think there's value in someone actually writing it out: to make sure we didn't overlook something, to give ourselves a chance to foresee problems with docstring-related CIs, etc.

drammock on 18 May 2020

I'll come up with something concrete soon. I'm not sure what you mean by docstring-related CI problems though - I wouldn't use any templates/placeholders/some other sphinx-related shortcuts, just plain old text describing the function.

cbrnr on 18 May 2020

👍1

I'm not sure what you mean by docstring-related CI problems

I'm mostly thinking about the CI that checks that the list of Parameters is in the same order as the parameters given in the function / method signature. I don't know how it behaves with *args, **kwargs-type params.

We also have a test that checks maximum docstring length, but that can be overridden when needed.

drammock on 18 May 2020

I don't know how it behaves with args, *kwargs-type params.

FWIW I don't think we've settled on whether this approach (my preference) or the include-all-arguments-from-all-readers approach should be taken (@hoechenberger's preference)

larsoner on 18 May 2020

args, *kwargs is fine with me if document things nicely :)

hoechenberger on 18 May 2020

args, *kwargs is fine with me if document things nicely :)

To me the docstring should be something like:

fname : str
    The filename.
*args : list
    Arguments to pass to the underlying reader.
    For details, see the arguments of the reader for your
    underlying file format.
**kwargs : dict
    Keyword arguments to pass to the underlying reader.

Returns
-------
...

See Also
--------
... <list of all readers > ...

If it's more than this, then there is not much point in having *args, **kwargs as opposed to actual completed arguments. So really I think the documentation and presentation of *args, **kwargs are / should be tied together.

larsoner on 18 May 2020

👍 for @larsoner's approach and suggested docstring. fname really seems to be the only parameter that needs to be described explicitly.

cbrnr on 18 May 2020

agreed; I would also add an inital section before Params explaining that this is a convenience wrapper for the filetype-specific readers.

drammock on 18 May 2020

+1 to all of that.

hoechenberger on 18 May 2020

would you expose at least the verbose and preload option to read_raw
without kwargs?
these are supported by all readers.

agramfort on 19 May 2020

👍2

You're right, we should explicitly mention fname, preload, and verbose.

cbrnr on 19 May 2020

👍1

if we specify read_raw(fname, preload, verbose, *args, **kwargs) we might be setting up expectations about order of the function signature that is different than is actually found in the individual readers (i.e., verbose is usually at/near the end of the parameter list). I guess that's probably OK? If anyone needs to switch from using the generic reader to using a specific one, presumably they'll look at its docstring first and either reorder their unnamed args, or turn them into kwargs instead.

drammock on 19 May 2020

What about read_raw(fname, *args, preload=False, verbose=None, **kwargs)? Or use kw-only args (everything except fname)?

cbrnr on 19 May 2020

Or just single out fname, preload, and verbose from the list of possible parameters contained in *args or **kwargs? That is, by correctly phrasing the docstring we might be able to prevent setting up wrong expectations.

cbrnr on 19 May 2020

@cbrnr yeah, one of those options should suffice. I think I lean toward read_raw(fname, preload=False, verbose=None, **kwargs) even though it could still imply different order from the specific readers. But as I said, that's probably fine; people can read docstrings (indeed, we're basically forcing them to read the docstrings of the more specific readers if they want to do anything fancy).

drammock on 19 May 2020

I think I lean toward read_raw(fname, preload=False, verbose=None, **kwargs)

ditto

hoechenberger on 19 May 2020

Yep, I think we can expect people to be able to read docstrings (same argument I made in #7800).

cbrnr on 19 May 2020

Where should we include the read_raw function? mne/io/__init__.py, mne/io/base.py, a new module (e.g. mne/io/read_raw.py) or somewhere else?

cbrnr on 21 May 2020

Probably a new mne/io/_read_raw.py -- it will need to import all other readers at the top, so a separate module will help avoid circular imports

larsoner on 21 May 2020

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Create canonical template channel locations

cbrnr · 43Comments

Trouble installing Spyder into MNE conda environment

hoechenberger · 56Comments

ENH: Power line noise filtering - spectrum interpolation and zapline

DominiqueMakowski · 32Comments

Are projections applied when fitting ICA?

cbrnr · 64Comments

Improve documentation installation info

choldgraf · 42Comments