Scikit-image: Consider flattening skimage namespace

Created on 14 Jul 2015 · 33 Comments · Source: scikit-image/scikit-image

Lately I've been finding the API rather cumbersome. Is canny in features or filters? Is watershed in segmentation or morphology? Where is relabel_sequential?

I suggest importing everything to the root skimage namespace, coupled perhaps with an aliasing convention, e.g. import skimage as ski or import skimage as im. (Yes, im is very common and I'm not advocating for it... But you gotta admit edges = im.sobel(image) would be pretty nifty. =)

needs decision discussion

All 33 comments

I'm against it. The inconvenience you speak of is very minor and would take only a minute of the programmer's time to overcome.
On the other hand, consider these two snippets:

from skimage import filters
filters.sobel(img)

import skimage as ski
ski.sobel(img)

The person reading the first snippet would know about the filters package and would find other filtering methods inside it. He might try a few of those to get better results.

The person reading the second piece has to spend more time hunting for filters.

My main concern is an increasingly crowded namespace, which will only become less intuitive to explore. User's thought process: _Is loading an image load? I'm looking under "L" - nothing - perhaps read - no, nothing in "R" - maybe open - still no. Here it is, weird, it's prefixed with "im"..._

However, if the import time was reasonable, I would be fine with importing all packages by default but leaving them at the various levels. Then someone introspecting skimage. has a better sense of what is available. My sense was that the import time was an issue, though.

Hopefully @stefanv can chime in with the original design goals.

We could add an indexed search function at the top level that will greedily import and check docstrings for a search string, or optionally just search function names if desired.
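A minimal sketch of what such a search helper might look like, assuming it only needs to scan the public callables of a package's immediate submodules. The `search` name and its parameters are hypothetical, not an existing scikit-image API, and the demo runs on the standard-library `json` package so it works without scikit-image installed:

```python
import importlib
import inspect
import pkgutil


def search(package_name, term, names_only=False):
    """Return sorted dotted names of public callables matching ``term``.

    Hypothetical helper: matches against the name only, or against the
    name plus docstring when ``names_only`` is False.
    """
    package = importlib.import_module(package_name)
    modules = [package]
    # Greedily import every public, immediate submodule of the package.
    for info in pkgutil.iter_modules(getattr(package, "__path__", [])):
        if not info.name.startswith("_"):
            modules.append(importlib.import_module(f"{package_name}.{info.name}"))

    term = term.lower()
    hits = set()
    for module in modules:
        for name, obj in list(vars(module).items()):
            if name.startswith("_") or not callable(obj):
                continue
            text = name if names_only else f"{name}\n{inspect.getdoc(obj) or ''}"
            if term in text.lower():
                hits.add(f"{module.__name__}.{name}")
    return sorted(hits)


# Demo on ``json``; "skimage" could be passed instead if it is installed.
print(search("json", "dump", names_only=True))
print(search("json", "deserialize"))
```

Because every submodule is imported up front, the first call pays the full import cost; a real implementation would probably want to cache the index.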

I like @blink1073's idea. My point is that the search is very inefficient when it requires switching back to a browser (different workspace, for me), waiting for search results, scrolling to the right part of the API doc... If I could quickly search within IPython I would be happy.

Two other options for consideration:

  • skimage.directory.foo would contain a string with the full package path to the foo function. Then it's just a matter of typing skimage.directory.rela<tab><enter> to find out that relabel_sequential is in segmentation (for some reason) (yes I'm aware I put it there. =P).
  • skimage.all.func could be an _optional_ namespace containing all of skimage.
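The directory idea could be sketched roughly as follows, assuming each public submodule either declares `__all__` or exposes its public callables as non-underscore names. `build_directory` is a hypothetical name, and the demo uses the standard-library `json` package rather than skimage so that it runs anywhere:

```python
import importlib
import pkgutil
from types import SimpleNamespace


def build_directory(package_name):
    """Map each public callable's name to the submodule that exposes it."""
    package = importlib.import_module(package_name)
    directory = {}
    for info in pkgutil.iter_modules(package.__path__):
        if info.name.startswith("_"):
            continue  # skip private submodules
        module_name = f"{package_name}.{info.name}"
        module = importlib.import_module(module_name)
        # Prefer the module's declared public API when it has one.
        names = getattr(module, "__all__", None)
        if names is None:
            names = [n for n in vars(module) if not n.startswith("_")]
        for name in names:
            if callable(getattr(module, name, None)):
                # First submodule wins if several expose the same name.
                directory.setdefault(name, module_name)
    # A SimpleNamespace lists its attributes in dir(), so names tab-complete.
    return SimpleNamespace(**directory)


directory = build_directory("json")
print(directory.JSONDecoder)
print(directory.make_scanner)
```

Storing the callables themselves instead of path strings would give the flat `all`-style namespace from the second bullet, at the cost of importing everything.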

skimage.all / skimage.directory sounds like a good idea to me. Maybe with
the option to output the actual package path.


I'm also not in favor of flattening the API. The use pattern that I like most is from skimage import filters. It's true that I go to the API doc page several times a day because I'm not sure of the location of a function.

Should we instead consider merging the most problematic submodules? Somehow I'm always hesitating between filters and restoration for denoising filters, and I'm always searching for remove_small_objects in segmentation whereas it's in morphology. Other submodules are really straightforward to deal with (e.g. io, color, data), in my opinion.

A possible solution would be to import the same function in several submodules. It makes sense to have canny both in filters and in features. Would that be a problem?
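As a rough illustration of what a redundant import amounts to, the sketch below builds two stand-in modules and exposes one function through both. The `feature` and `filters` objects here are throwaway `types.ModuleType` instances and `canny` is a placeholder, not the real scikit-image code:

```python
import types


def canny(image):
    """Placeholder standing in for an edge detector (hypothetical body)."""
    return image


# Stand-in modules; in a real package these would be subpackage __init__ files.
feature = types.ModuleType("feature")
filters = types.ModuleType("filters")

feature.canny = canny
# Re-exporting is just a second name bound to the same function object,
# which is what a `from ..feature import canny` line would do.
filters.canny = feature.canny

print(filters.canny is feature.canny)
```

Since both names point at one object, there is no duplicated code to maintain; the cost is mainly that docs and search results would list the function twice.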

Functions are searched better in a web browser. It does all that and better. @jni and @blink1073 are trying to do too much behind the scene IMHO.

I'm talking from a pure REPL perspective.

(Especially as someone who has worked in a lab with no internet access).

If you type "segmentation" in the API search box you get everything related to segmentation.

@vighneshbirodkar, that is assuming you have built the docs locally or have an internet connection.

@blink1073, you can still search the API offline if you browse the docs. What you and @jni are trying to do would be better accomplished with a new module, something that can explore any package and aid its exploration.

@emmanuelle I think there is still room for confusion even with a few merges, e.g. relabel_sequential and remove_small_objects could both be in util.

@vighneshbirodkar there is an argument for convenience. Neither directory nor all are cumbersome to make, and could be automated. As to making a more general package, I think it would be very tricky to cover all different package structures that are out there. They also help with tab-completion, which would be difficult to do with a new package.

I've certainly opened a big-ass can of worms here! =P

@jni and how about redundant imports?

@jni But are we willing to sacrifice readability? Reading code will be like reading a book without chapter names. Grouping functions into modules makes the programmer want to explore the module he is using. A person reading skimage.all.BRIEF will have to put in more effort to find out about other features like DAISY or CENSURE.

I think reading is more important than writing. That's also one of the philosophies behind Python. I may be wrong, but I think attracting new users is important. And a newbie spends a lot more time reading code, unlike everyone on this thread.


I'd like to recount one of my own experiences. Years ago my friends and I had just started with image processing. We were using MATLAB and somewhere in our flow we were doing thresholding. We used im2bw and struggled a lot, until after much searching we found graythresh. If MATLAB supported namespaces, we would have saved a lot of critical time.


There's a balance to be struck here. Simply saying "it's online, look it up" forces a context switch which is very costly in terms of productivity and can be quite frustrating. One of the most frustrating tutorials I've ever attended at a SciPy conference was for a package which had poor docstrings but good HTML docs.

However, a flat namespace gets troublesome fast. I still think - efficiency aside - the best approach, if we change anything, would be to _import everything intact in the current structure_. So you look in skimage and find the various submodules, which can be themselves inspected, etc.

For any marvel fans out here
captain america civil war

Ah, Juan, good one--you almost had me there.

@stefanv ROTFL nice. Do you also veto all of the other mooted options? =) To recap: (1) skimage.directory, (2) skimage.all, (3) redundant imports, and (4) a whole new third package to search for functions within Python packages.

Let's tie with docrepr and have it launch in a web browser: https://github.com/spyder-ide/docrepr. That is in scope for scikit-image, right?

How about we borrow numpy's lookfor functionality?

We may have to make a PR to numpy to support using our own cache. Until then, we can maintain a copy.

Example usage:

In [2]: np.lookfor('interpolate')

Search results for 'interpolate'
--------------------------------
numpy.interp
    One-dimensional linear interpolation.
numpy.polyfit
    Least squares polynomial fit.
numpy.histogram2d
    Compute the bi-dimensional histogram of two data samples.
numpy.ma.polyfit
    Least squares polynomial fit.
numpy.polynomial.Hermite._fit
    Least squares fit of Hermite series to data.
numpy.polynomial.HermiteE._fit
    Least squares fit of Hermite series to data.
numpy.polynomial.Laguerre._fit
    Least squares fit of Laguerre series to data.
numpy.polynomial.Legendre._fit
    Least squares fit of Legendre series to data.
numpy.polynomial.Chebyshev._fit
    Least squares fit of Chebyshev series to data.
numpy.polynomial.Polynomial._fit
    Least-squares fit of a polynomial to data.

+1 for lookfor. "Polluting" the global namespace is a bad idea imo.

Oh, huh, np.lookfor('rgb', 'skimage') works as expected. We could just create a version that adds skimage by default for the module.

Awesome. Didn't know about this function at all!

In addition to including our own lookfor (maybe), I would still support the creation of a flat directory so that function names could be tab-completed.

As long as that namespace does not get imported by default--because that
would ramp up our default loading time. (Alternatively, find a
workaround for that problem.)
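One possible workaround for the loading-time concern is to defer each submodule import until its first attribute access. The sketch below is only an illustration of that idea: `LazyNamespace` is a hypothetical class, and it wraps the standard-library `json` package instead of skimage so it can run without extra dependencies:

```python
import importlib
import types


class LazyNamespace(types.ModuleType):
    """Module-like object that imports listed submodules on first access."""

    def __init__(self, package_name, submodules):
        super().__init__(package_name)
        self._lazy_submodules = frozenset(submodules)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name in self.__dict__.get("_lazy_submodules", ()):
            module = importlib.import_module(f"{self.__name__}.{name}")
            setattr(self, name, module)  # cache so the import runs once
            return module
        raise AttributeError(
            f"module {self.__name__!r} has no attribute {name!r}"
        )

    def __dir__(self):
        # Advertise lazy names so tab completion can offer them.
        return sorted(set(super().__dir__()) | self._lazy_submodules)


lazy = LazyNamespace("json", ["decoder", "encoder", "scanner"])
print("scanner" in dir(lazy))   # offered for completion
print("scanner" in vars(lazy))  # not yet loaded through this namespace
print(lazy.scanner.__name__)    # first access triggers the import
print("scanner" in vars(lazy))  # now cached on the namespace
```

On Python 3.7 and later, a module-level `__getattr__` (PEP 562) in the package's `__init__.py` could achieve a similar effect without a custom class.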

@blink1073 that looks pretty neat! If I'm reading it right, it's lazy but allows tab-completion _including in subpackages_ which would allow us to maintain the current submodule groupings.

Edit: Bonus points for providing the ability to use it without even adding a dependency.

That is my impression as well.

http://www.pyimagesearch.com/2015/08/31/how-to-find-functions-by-name-in-opencv/ we're not the only package where it's difficult to find functions :-)

apipkg is dead, and np.lookfor seems to get us almost all the way there. I'm +1 on adding skimage.lookfor, let's discuss on #2426
