At some point we've already discussed if adding a generic reader function (mne.io.read_raw) would be useful. I can't find the relevant discussion, but I think that we decided that users should stick to the separate readers.
I've written such a wrapper function for MNELAB (https://github.com/cbrnr/mnelab/pull/148), and I think this might still be a good idea to include in MNE. It makes things easier for users - the file extension already determines the file type, so why do they have to explicitly pick the right function if this could be done automatically?
Of course this is not meant to replace the individual readers - it's just a convenience wrapper that should work for the most common file types.
mne.io.read_raw("test.vhdr")
mne.io.read_raw("test.edf")
mne.io.read_raw("test.bdf")
mne.io.read_raw("test.set")
# ... you get the point
I'm fine if people decide (again) not to include this ("There should be one-- and preferably only one --obvious way to do it"). But in that case, I'd like to ask if anyone has a good idea how to include the true MNE reader function calls in MNELAB's history without creating too big of a mess. Currently, I stick to just using mnelab.io.read_raw for history entries because it keeps the code so much simpler. Suggestions are very much appreciated (either here or at https://github.com/cbrnr/mnelab/pull/148)!
my pb is the heterogeneity of the read_raw_xxx function signatures:
https://mne.tools/stable/python_reference.html#module-mne.io
having one read_raw function would mean having a long list of parameters
and most of them being useless for each user working with one type of format
my 2c
>
My wrapper function just passes them on as *args and **kwargs, so this is not really a problem.
documenting args and *kwargs is hard. So you would keep both the read_raw
and the read_raw_xxx functions?
>
Yes, I would keep them both. That way, we can refer to the documentation in the specific functions, but also have a wrapper that's easier to use for many use cases (this is based on feedback I got from a few people who find it cumbersome to look up the name of the reader when all they want is just to load the data).
documenting args and *kwargs is hard.
I agree, but I'm open to this idea if a suitable docstring could be concocted. @cbrnr do you have ideas for what the docstring of read_raw() would look like?
I have to think about it, but first I would explicitly mention that read_raw is a wrapper for all those other functions that do the actual work. I'd also include all supported file formats that can be read and most important parameters that most/all read_raw_xxx functions have in common (e.g. preload). Then I'd refer people to the corresponding read_raw_xxx functions for more specific parameters. Sure you can argue that if we refer people to specific readers we could just as well continue to have them use these readers, but I think read_raw will work out of the box for many use cases.
I would very much welcome this feature!
I like this idea very much. And as @cbrnr says, keeping both is not really a problem. Actually the documentation of the current function could be linked somehow in the doc string of a general function.
My wrapper function just passes them on as
*argsand**kwargs, so this is not really a problem.
Editor autocompletion hates this trick 😅💩 This is my only concern here. It's what makes Matplotlib a pain to use for me.
Could we add the union of all args of all reader functions as kwargs, and document them accordingly?
Could we add the union of all args of all reader functions as kwargs, and document them accordingly?
I'd rather not, it makes it so that in the docs we have to say which formats each arg applies to, anytime we change a reader we have to change it in two places, we need to ensure that all and only the correct args/kwargs actually get used / set by the user (whereas passing *args, **kwargs directly does this for us). If people want nice introspection and to understand all of their options easily, I think they should use the dedicated reader.
I'd rather not, it makes it so that in the docs we have to say which formats each arg applies to, anytime we change a reader we have to change it in two places, we need to ensure that all and only the correct
args/kwargsactually get used / set by the user (whereas passing*args, **kwargsdirectly does this for us). If people want nice introspection and to understand all of their options easily, I think they should use the dedicated reader.
I do see your concern, but I'd simply pass all accepted args to the respective reader, and silently drop all others.
Also – do reader function APIs change _this often_ really? :)
you know my position on this. But I am just one person. I will not veto
anything if there is a consensus.
Is it really possible that a user does not (want to) know the format he
uses?
>
Is it really possible that a user does not (want to) know the format he uses?
This will hopefully only be needed during a transition period until everybody has switched all their data to BIDS: mne-bids picks the correct reader automagically!
(Hey, just allow me to _dream_ for a bit, would you? ☺️)
Is it really possible that a user does not (want to) know the format he
uses?
It's a bit beyond this. Sometimes I'm testing some code, and I'd rather use CTF than FIF data. Instead of just switching a filename, I need to switch the reading call, too. If you want to build generic code like some small demo pipelie, it's more things to change than what could otherwise be just a single path (e.g., for resting-state analyses).
Yes, it's mainly a matter of practicality. Of course we can all live with using dedicated readers, but for many users it will be a bit less effort to use the generic reader. For example, to read .mff files people have to use the read_raw_egi function - not entirely clear at first glance. So although not essential, a generic reader is a nice bonus that can potentially be used almost all the time. And if it doesn't work, there are still the dedicated reader functions.
again if you are all convinced ...
I'm not convinced until I see a concrete proposal for what the docstring will actually say. @cbrnr described it in general terms here but I think there's value in someone actually writing it out: to make sure we didn't overlook something, to give ourselves a chance to foresee problems with docstring-related CIs, etc.
I'll come up with something concrete soon. I'm not sure what you mean by docstring-related CI problems though - I wouldn't use any templates/placeholders/some other sphinx-related shortcuts, just plain old text describing the function.
I'm not sure what you mean by docstring-related CI problems
I'm mostly thinking about the CI that checks that the list of Parameters is in the same order as the parameters given in the function / method signature. I don't know how it behaves with *args, **kwargs-type params.
We also have a test that checks maximum docstring length, but that can be overridden when needed.
I don't know how it behaves with args, *kwargs-type params.
FWIW I don't think we've settled on whether this approach (my preference) or the include-all-arguments-from-all-readers approach should be taken (@hoechenberger's preference)
args, *kwargs is fine with me if document things nicely :)
args, *kwargs is fine with me if document things nicely :)
To me the docstring should be something like:
fname : str
The filename.
*args : list
Arguments to pass to the underlying reader.
For details, see the arguments of the reader for your
underlying file format.
**kwargs : dict
Keyword arguments to pass to the underlying reader.
Returns
-------
...
See Also
--------
... <list of all readers > ...
If it's more than this, then there is not much point in having *args, **kwargs as opposed to actual completed arguments. So really I think the documentation and presentation of *args, **kwargs are / should be tied together.
👍 for @larsoner's approach and suggested docstring. fname really seems to be the only parameter that needs to be described explicitly.
agreed; I would also add an inital section before Params explaining that this is a convenience wrapper for the filetype-specific readers.
+1 to all of that.
would you expose at least the verbose and preload option to read_raw
without kwargs?
these are supported by all readers.
You're right, we should explicitly mention fname, preload, and verbose.
if we specify read_raw(fname, preload, verbose, *args, **kwargs) we might be setting up expectations about order of the function signature that is different than is actually found in the individual readers (i.e., verbose is usually at/near the end of the parameter list). I guess that's probably OK? If anyone needs to switch from using the generic reader to using a specific one, presumably they'll look at its docstring first and either reorder their unnamed args, or turn them into kwargs instead.
What about read_raw(fname, *args, preload=False, verbose=None, **kwargs)? Or use kw-only args (everything except fname)?
Or just single out fname, preload, and verbose from the list of possible parameters contained in *args or **kwargs? That is, by correctly phrasing the docstring we might be able to prevent setting up wrong expectations.
@cbrnr yeah, one of those options should suffice. I think I lean toward read_raw(fname, preload=False, verbose=None, **kwargs) even though it could still imply different order from the specific readers. But as I said, that's probably fine; people can read docstrings (indeed, we're basically forcing them to read the docstrings of the more specific readers if they want to do anything fancy).
I think I lean toward
read_raw(fname, preload=False, verbose=None, **kwargs)
ditto
Yep, I think we can expect people to be able to read docstrings (same argument I made in #7800).
Where should we include the read_raw function? mne/io/__init__.py, mne/io/base.py, a new module (e.g. mne/io/read_raw.py) or somewhere else?
Probably a new mne/io/_read_raw.py -- it will need to import all other readers at the top, so a separate module will help avoid circular imports
Most helpful comment
would you expose at least the verbose and preload option to read_raw
without kwargs?
these are supported by all readers.