Distributed: Include NumPy BLAS/LAPACK info in client.get_versions()

Created on 9 Mar 2018  ·  13 Comments  ·  Source: dask/distributed

At the risk of overloading client.get_versions() with info, it would be handy to be able to check the NumPy BLAS/LAPACK linkage in here. This can be really helpful when debugging a slow computation or a very strange segfault that might be BLAS or LAPACK related. One way to get at this info is numpy.__config__.show(), but that might be too heavy for client.get_versions(). Open to other ways to include this info if there are suggestions.

Good First Issue

All 13 comments

I think it's reasonable to include more things. It's fairly cheap. We might also keep get_versions as it is, but make a larger get_info function that has a wider scope.

Since the task is largely about gathering information from workers, I was wondering if the right approach might be to modify Client.run() to be able to return values (or futures). Deciding what information to harvest from the workers would then be in the control of the clients, not reliant on changes to distributed.

Yes, that's doable today from user-space and a fine solution.
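
A minimal sketch of that user-space approach, assuming NumPy is importable on the workers. Client.run() calls a function on every worker and returns a dict keyed by worker address. The helper names numpy_build_info and collect_numpy_build_info are made up for illustration and are not part of distributed.

```python
import contextlib
import io


def numpy_build_info():
    """Return the text printed by numpy.show_config(), or None without NumPy."""
    try:
        import numpy
    except ImportError:
        return None
    buffer = io.StringIO()
    # show_config() prints to stdout, so capture the text instead of printing it.
    with contextlib.redirect_stdout(buffer):
        numpy.show_config()
    return buffer.getvalue()


def collect_numpy_build_info(client):
    """Run the helper on every worker; `client` is assumed to be a distributed.Client."""
    # Client.run returns a dict of {worker_address: return_value}.
    return client.run(numpy_build_info)
```

With a running cluster this would be called as collect_numpy_build_info(client). Note that it relies on sending a pickled function to the workers.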

One reason why get_versions doesn't take this approach (it used to) is that it also gathers information from the scheduler, where we try to avoid depending on pickle. I suspect that in the future, using pickle may be turned off by default in the scheduler.

Hi, this is my first time contributing to open source. Can I work on it?

@lalitparate - yes, dask is a community-driven open-source project. As such, anyone is welcome to work on anything. Let us know if you need help.

Are we still aiming to show this worker linkage info in client.get_versions() ?

I think it's reasonable to include more things. It's fairly cheap. We might also keep get_versions as it is, but make a larger get_info function that has a wider scope
…

Or should I build get_info() by wrapping client.run() ?

As others have commented, adding to get_versions seems to be a supported idea. You might want to look at https://github.com/dask/distributed/pull/3567 as it has some updates to get_versions as well as tests

Sure, thanks.

May I ask what exactly we want to show in get_versions()?
Since NumPy has many possible BLAS/LAPACK linking options (seven currently), I'm not sure that showing all the build info printed by numpy.show_config() is the best idea.

Another question is how we should fit the various library linkage info into client.get_versions().
It seems to me that the current output layout is meant to show version info alone, not a list of sublists about a package, so packing something like this into get_versions() for every worker seems suboptimal:

blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
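
One way to keep such output small is to parse the text and drop the sections marked NOT AVAILABLE. The following is only a sketch: parse_show_config and available_sections are hypothetical helpers, and they assume the exact text layout shown above, which may differ between NumPy versions.

```python
import ast


def parse_show_config(text):
    """Parse the section/key layout shown above into a dict of dicts.

    Sections reported as NOT AVAILABLE map to None.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):
            # An unindented line such as "openblas_info:" starts a new section.
            current = line.strip().rstrip(":")
            sections[current] = {}
            continue
        if current is None:
            continue
        stripped = line.strip()
        if stripped == "NOT AVAILABLE":
            sections[current] = None
        elif " = " in stripped and sections[current] is not None:
            key, value = stripped.split(" = ", 1)
            try:
                # Lists and tuples are printed as Python literals.
                sections[current][key] = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                # Bare words such as "c" are kept as plain strings.
                sections[current][key] = value
    return sections


def available_sections(sections):
    """Keep only the sections that were actually found at build time."""
    return {name: info for name, info in sections.items() if info is not None}
```

Applied to the output above, available_sections would keep openblas_info, blas_opt_info, openblas_lapack_info and lapack_opt_info, and drop the three NOT AVAILABLE entries.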

@HsuanTingLu do you have thoughts on a more optimal layout ?

No, I don't have one, so I'll probably add it anywhere you guys see fit.

Back to the second question, how should the info be fitted into client.get_versions()?
I'm thinking about adding a numpy-config sublist under host, or maybe somewhere under package::numpy?

I am +1 on package::numpy. I understand this to mean something like:

 'packages': {'numpy': {'blas_opt_info': {}}}

Is that right ?

Yeah something like this
'packages': {'numpy': '1.18.2', 'blas_opt_info': {}, 'lapack_opt_info': {}}
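
To make that layout concrete, here is a rough sketch of merging linkage info into one worker's entry. The helper with_numpy_linkage is hypothetical and is not how distributed implements get_versions(); the sample values are taken from the show_config output earlier in this thread.

```python
import copy


def with_numpy_linkage(versions, linkage,
                       keys=("blas_opt_info", "lapack_opt_info")):
    """Return a copy of `versions` with the selected linkage sections
    added under 'packages', next to the plain version strings."""
    result = copy.deepcopy(versions)
    packages = result.setdefault("packages", {})
    for key in keys:
        # A missing section becomes an empty dict, as in the layout above.
        packages[key] = linkage.get(key, {})
    return result


worker_versions = {"packages": {"numpy": "1.18.2"}}
linkage = {
    "blas_opt_info": {
        "libraries": ["openblas", "openblas"],
        "library_dirs": ["/usr/local/lib"],
    },
}
merged = with_numpy_linkage(worker_versions, linkage)
```

Here merged['packages'] keeps 'numpy': '1.18.2' and gains the two linkage keys, while worker_versions itself is left unchanged.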
