hwloc recently had a new major release, just found out we're incompatible with it.
I was reading that hwloc now places NUMA nodes as siblings of the cores attached to them instead of parents. Is this the problem?
Does the incompatibility show up at configure time, compile time or runtime, and is it a problem in flux-core or flux-sched?
Thanks!
The first thing I hit was a missing define: HWLOC_WHOLE_SYSTEM_IO or
similar is now gone. There are probably other issues, but that's
the one I ran into on a random spack update.
On 22 Mar 2018, at 10:49, Mark Grondona wrote:
> I was reading that hwloc now places NUMA nodes as siblings of the
> cores attached to them instead of parents. Is this the problem?
>
> Does the incompatibility show up at configure time, compile time or
> runtime, and is it a problem in flux-core or flux-sched?
>
> Thanks!
Bump on this issue now that Ubuntu 20.04 no longer has hwloc v1 available:
affinity.c: In function ‘topology_restrict’:
affinity.c:35:17: error: ‘HWLOC_RESTRICT_FLAG_ADAPT_DISTANCES’ undeclared (first use in this function); did you mean ‘HWLOC_RESTRICT_FLAG_ADAPT_MISC’?
35 | int flags = HWLOC_RESTRICT_FLAG_ADAPT_DISTANCES |
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| HWLOC_RESTRICT_FLAG_ADAPT_MISC
affinity.c:35:17: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [Makefile:1287: affinity.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/data/hd1/Repositories/flux-framework/flux-core/src/shell'
make[1]: *** [Makefile:508: all-recursive] Error 1
make[1]: Leaving directory '/data/hd1/Repositories/flux-framework/flux-core/src'
make: *** [Makefile:580: all-recursive] Error 1
(2) 49s sherbein ~/Repositories/flux-framework/flux-core (upmaster ?S)
❯ apt search hwloc 18:50:53 ()
Sorting... Done
Full Text Search... Done
hwloc/focal 2.1.0+dfsg-4 amd64
Hierarchical view of the machine - utilities
hwloc-nox/focal,now 2.1.0+dfsg-4 amd64 [installed,automatic]
Hierarchical view of the machine - non-X version of utilities
libhwloc-common/focal,focal 2.1.0+dfsg-4 all
Hierarchical view of the machine - common files
libhwloc-contrib-plugins/focal 2.1.0+dfsg-3 amd64
Hierarchical view of the machine - contrib plugins
libhwloc-dev/focal,now 2.1.0+dfsg-4 amd64 [installed]
Hierarchical view of the machine - static libs and headers
libhwloc-doc/focal,focal 2.1.0+dfsg-4 all
Hierarchical view of the machine - documentation
libhwloc-plugins/focal,now 2.1.0+dfsg-4 amd64 [installed,automatic]
Hierarchical view of the machine - plugins
libhwloc15/focal,now 2.1.0+dfsg-4 amd64 [installed,automatic]
Hierarchical view of the machine - shared libs
Is that just one of many errors? What happens if you try something obvious like:
```diff
diff --git a/src/shell/affinity.c b/src/shell/affinity.c
index 9f3b86f66..5d9829a22 100644
--- a/src/shell/affinity.c
+++ b/src/shell/affinity.c
@@ -32,7 +32,10 @@ struct shell_affinity {
  */
 static int topology_restrict (hwloc_topology_t topo, hwloc_cpuset_t set)
 {
-    int flags = HWLOC_RESTRICT_FLAG_ADAPT_DISTANCES |
+    int flags =
+#ifdef HWLOC_RESTRICT_FLAG_ADAPT_DISTANCES
+                HWLOC_RESTRICT_FLAG_ADAPT_DISTANCES |
+#endif
                 HWLOC_RESTRICT_FLAG_ADAPT_MISC |
                 HWLOC_RESTRICT_FLAG_ADAPT_IO;
     flags = 0;
```
Honestly I don't even think the affinity plugin is using the distances (though maybe this affects hwloc_distribute internally), so perhaps that flag can just be removed.
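One caveat with the #ifdef approach: as far as I can tell, the hwloc restrict flags are enum constants rather than preprocessor macros, so an #ifdef on HWLOC_RESTRICT_FLAG_ADAPT_DISTANCES would evaluate false on hwloc 1.x as well, and the flag would be dropped everywhere. Gating on HWLOC_API_VERSION (a macro that hwloc.h defines, 0x00020000 for 2.0) may be more reliable. Below is a minimal sketch of that version check. The `demo_*` names and bit values are stand-ins invented for this example so that it builds without hwloc installed; real code would more likely wrap the actual flag in `#if HWLOC_API_VERSION < 0x00020000`.

```c
#include <assert.h>

/* Stand-ins for the hwloc restrict flags (hypothetical names and values,
 * used only so this sketch builds without <hwloc.h>). */
enum demo_restrict_flags {
    DEMO_RESTRICT_ADAPT_DISTANCES = 1 << 0, /* only meaningful on hwloc 1.x */
    DEMO_RESTRICT_ADAPT_MISC      = 1 << 1,
    DEMO_RESTRICT_ADAPT_IO        = 1 << 2
};

/* First API version of the hwloc 2.x series, in HWLOC_API_VERSION encoding. */
#define DEMO_HWLOC_V2 0x00020000u

/* Pick the restrict flags for a given API version: the distances flag is
 * added only for versions older than 2.0. */
unsigned long demo_pick_restrict_flags (unsigned int api_version)
{
    unsigned long flags = DEMO_RESTRICT_ADAPT_MISC | DEMO_RESTRICT_ADAPT_IO;

    assert (api_version != 0);
    if (api_version < DEMO_HWLOC_V2)
        flags |= DEMO_RESTRICT_ADAPT_DISTANCES;
    return flags;
}
```

Calling `demo_pick_restrict_flags (0x00010b00u)` (the 1.11 encoding) keeps all three flags, while `0x00020100u` (hwloc 2.1, the version shipped in Ubuntu 20.04) leaves the distances flag out.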
That is a great question. It was the only one that I saw. I applied your patch and got a bunch of additional errors:
CC builtin/hwloc.o
builtin/hwloc.c: In function ‘global_hwloc_create’:
builtin/hwloc.c:140:12: error: implicit declaration of function ‘hwloc_topology_set_custom’; did you mean ‘hwloc_topology_get_support’? [-Werror=implicit-function-declaration]
140 | || hwloc_topology_set_custom (global) < 0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
| hwloc_topology_get_support
builtin/hwloc.c:151:13: error: implicit declaration of function ‘hwloc_custom_insert_topology’ [-Werror=implicit-function-declaration]
151 | if (hwloc_custom_insert_topology (global,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
builtin/hwloc.c: In function ‘flux_hwloc_global_xml’:
builtin/hwloc.c:202:9: error: too few arguments to function ‘hwloc_topology_export_xmlbuffer’
202 | if (hwloc_topology_export_xmlbuffer (global, &buf, &buflen) < 0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/hwloc.h:2377,
from builtin/hwloc.c:23:
/usr/include/hwloc/export.h:105:20: note: declared here
105 | HWLOC_DECLSPEC int hwloc_topology_export_xmlbuffer(hwloc_topology_t topology, char **xmlbuffer, int *buflen, unsigned long flags);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
builtin/hwloc.c: In function ‘topo_init_common’:
builtin/hwloc.c:225:40: error: ‘HWLOC_TOPOLOGY_FLAG_IO_DEVICES’ undeclared (first use in this function); did you mean ‘HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM’?
225 | if (hwloc_topology_set_flags (*tp, HWLOC_TOPOLOGY_FLAG_IO_DEVICES) < 0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM
builtin/hwloc.c:225:40: note: each undeclared identifier is reported only once for each function it appears in
builtin/hwloc.c:227:9: error: implicit declaration of function ‘hwloc_topology_ignore_type’; did you mean ‘hwloc_topology_export_xml’? [-Werror=implicit-function-declaration]
227 | if (hwloc_topology_ignore_type (*tp, HWLOC_OBJ_CACHE) < 0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
| hwloc_topology_export_xml
builtin/hwloc.c:227:42: error: ‘HWLOC_OBJ_CACHE’ undeclared (first use in this function); did you mean ‘HWLOC_OBJ_L5CACHE’?
227 | if (hwloc_topology_ignore_type (*tp, HWLOC_OBJ_CACHE) < 0)
| ^~~~~~~~~~~~~~~
| HWLOC_OBJ_L5CACHE
builtin/hwloc.c: In function ‘flux_hwloc_local_xml’:
builtin/hwloc.c:268:9: error: too few arguments to function ‘hwloc_topology_export_xmlbuffer’
268 | if (hwloc_topology_export_xmlbuffer (topo, &buf, &buflen) < 0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/hwloc.h:2377,
from builtin/hwloc.c:23:
/usr/include/hwloc/export.h:105:20: note: declared here
105 | HWLOC_DECLSPEC int hwloc_topology_export_xmlbuffer(hwloc_topology_t topology, char **xmlbuffer, int *buflen, unsigned long flags);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
builtin/hwloc.c: In function ‘kvs_txn_put_xml_file’:
builtin/hwloc.c:512:9: error: too few arguments to function ‘hwloc_topology_export_xmlbuffer’
512 | if (hwloc_topology_export_xmlbuffer (topo, &xml, &len) < 0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/hwloc.h:2377,
from builtin/hwloc.c:23:
/usr/include/hwloc/export.h:105:20: note: declared here
105 | HWLOC_DECLSPEC int hwloc_topology_export_xmlbuffer(hwloc_topology_t topology, char **xmlbuffer, int *buflen, unsigned long flags);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Summary here: https://www.open-mpi.org/projects/hwloc/doc/v2.0.0/a00327.php
The big ones are:
Thanks @trws for that pointer. After reading the linked page, my impression is that the only breaking changes that we have to worry about right now are the API changes (e.g., hwloc_topology_set_custom). I don't believe anything leverages the location of the memory currently, and it looks like we can set up the library to dump 1.0-compatible XML so that readers (like fluxion-resource) don't need to be updated (assuming the 1.0-compatibility does not strip out NVMe devices and their sizes - https://github.com/flux-framework/flux-sched/issues/651).
I think that’s right yeah. The memory location thing might become an
issue if we picked something specific for job spec, but the way it is
right now is probably better for us overall than the way it was
previously so I’m not too worried about it.
I have all of the "easy" transitions done:
- hwloc_topology_export_* functions
- hwloc_topology_set_flags and hwloc_topology_ignore_type calls with hwloc_topology_set_*_types_filter

I went to replace the hwloc_topology_set_custom, and the conclusion that I've come to after a couple of hours of searching is that hwloc no longer supports multi-node topologies. For that, we will need to reach for netloc (which is shipped with hwloc currently, but is not yet at a stable 1.0 release). A few reasons why I came to this conclusion:
I suppose we can try and replicate that last option in Flux, but I don't think there is a way to do it with the public hwloc API. I think it would require mucking around with internal data structures.
At the end of the day, the lack of this hwloc feature only affects three sub-commands of Flux: flux hwloc info, flux hwloc topology, and flux hwloc lstopo. For flux hwloc info, it would be pretty easy to iterate over all the individual rank xmls and sum up the number of nodes, cores, PUs from there. For the topology command, we can probably "fudge" the output to have a root "system" object and multiple child nodes. For the lstopo command, we can try and fake valid v1.X XML/topology output, but I'm not sure how fragile that will be with the various versions of the lstopo command. It's lame, but I'm inclined to just disable the flux hwloc lstopo command when configured against hwloc 2.0+.
Thoughts?
EDIT: just posted on the hwloc-users mailing list to confirm if multi-node topology functionality has been relegated to netloc.
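The per-rank summation suggested above for flux hwloc info could look roughly like the sketch below. The `rank_counts` struct and `sum_rank_counts` function are hypothetical names for illustration, not existing Flux code. It assumes each rank's counts have already been extracted from that rank's XML (with hwloc, something like hwloc_get_nbobjs_by_type on the loaded topology would presumably supply them).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-rank summary. In practice these counts would be read
 * from each rank's hwloc XML; here they are plain inputs. */
struct rank_counts {
    unsigned int nodes;
    unsigned int cores;
    unsigned int pus;
};

/* Sum node, core, and PU counts over all ranks without building a
 * combined multi-node topology. */
struct rank_counts sum_rank_counts (const struct rank_counts *ranks, size_t n)
{
    struct rank_counts total = { 0, 0, 0 };

    assert (ranks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total.nodes += ranks[i].nodes;
        total.cores += ranks[i].cores;
        total.pus += ranks[i].pus;
    }
    return total;
}
```

Because only totals are needed, this avoids any dependency on the removed "custom" topology support.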
> Thoughts?
Nice sleuthing @SteVwonder! I don't think we have use cases for flux hwloc topology nor flux hwloc lstopo. I'd personally be fine disabling both of these subcommands.
The fact that hwloc has shied away from multi-node topologies probably cements the fact that we should transition away from using it for our "first class" resource representation for flux instances.
Yeah, those are kinda just helpers. I'm not sure users actually make much use of them; if not, having them go wouldn't be a huge loss. If it's something people like, I'd almost go for something more like "unless rank is specified, run lstopo on each rank's xml" to keep us from maintaining anything more abstract.
It might be a fun toy to provide a way to run "netloc" without some of the nastiness that usually entails, it could actually be quite useful for populating the network portion of the resource graph, but I see little reason to do that until someone runs into a need for it.
Reply from the mailing list:
> There's no equivalent in hwloc 2.x unfortunately, even with netloc. "custom" caused too many issues for core maintenance (mostly because of cpusets being different between machines) while use cases were very rare.
> The fact that hwloc has shied away from multi-node topologies probably cements the fact that we should transition away from using it for our "first class" resource representation for flux instances.
Agreed!
> I'd almost go for something more like "unless rank is specified, run lstopo on each rank's xml" to keep us from maintaining anything more abstract.
Yeah, it should be easy enough to tweak those commands to dump each rank individually, or optionally take a single rank as an argument and dump that.
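A rough sketch of that "all ranks unless one is specified" behavior follows. Everything here is hypothetical: `for_selected_ranks`, the `rank_dump_f` callback type, and the convention that a negative rank means "all ranks" are assumptions made for illustration, and the callback stands in for whatever would actually fetch a rank's XML and hand it to lstopo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback that dumps one rank's topology; returns < 0 on error. */
typedef int (*rank_dump_f) (int rank, void *arg);

/* Dump a single rank if rank >= 0, otherwise every rank in [0, size).
 * Returns the number of ranks dumped, or -1 on error or invalid input. */
int for_selected_ranks (int size, int rank, rank_dump_f dump, void *arg)
{
    int count = 0;

    assert (dump != NULL);
    if (size <= 0 || rank >= size)
        return -1;
    if (rank >= 0)
        return dump (rank, arg) < 0 ? -1 : 1;
    for (int r = 0; r < size; r++) {
        if (dump (r, arg) < 0)
            return -1;
        count++;
    }
    return count;
}

/* Example callback for experimentation: only counts how often it is called. */
int count_rank (int rank, void *arg)
{
    (void) rank;
    *(int *) arg += 1;
    return 0;
}
```

Keeping the selection logic separate from the dump callback means the same loop could serve both the topology and lstopo subcommands.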
> It might be a fun toy to provide a way to run "netloc" without some of the nastiness that usually entails, it could actually be quite useful for populating the network portion of the resource graph, but I see little reason to do that until someone runs into a need for it.
That would be cool!