I think what this is trying to say is that I don't have enough memory to run my network? Or I've configured my device manager with some incorrect amount of memory? It's hard to know what to do with this string because "logical devices" isn't a terribly intuitive concept.
I ran into this too while running my HostManager benchmark on the Interpreter backend.
Cc @gcatron
It could be reworded. The assumption is that the partitioner makes logical partitions based on available memory, so if there are more logical devices than physical devices, there is insufficient memory. I personally like keeping the logical > physical wording, but we could make it: "Insufficient memory to run network: Provisioner found more logical devices than physical devices."
@gcatron I don't think "logical device" makes sense as a concept here. It makes it sound like we're virtualizing devices. It'd be better to be explicit about what actually happened:
Network partitioning failed: insufficient device memory to run network
Available device memory: xx MB
Required network memory: yy MB
And yes, I realize the above message doesn't capture the subtlety of partitioning (e.g. fragmentation due to varying table sizes), but at least it isn't totally opaque, and it'll make some things very obvious, like when the available device memory is bogus.
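For illustration, here is a minimal sketch of how a message like the one above could be assembled from actual values. `formatInsufficientMemoryError` is a hypothetical helper, not an existing Glow function, and it assumes both quantities are tracked in bytes and reported in whole megabytes.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Glow): builds the proposed error text
// from byte counts, reporting each value truncated to whole megabytes.
std::string formatInsufficientMemoryError(uint64_t availableBytes,
                                          uint64_t requiredBytes) {
  constexpr uint64_t kBytesPerMB = 1024 * 1024;
  std::ostringstream os;
  os << "Network partitioning failed: insufficient device memory to run "
        "network\n"
     << "Available device memory: " << availableBytes / kBytesPerMB << " MB\n"
     << "Required network memory: " << requiredBytes / kBytesPerMB << " MB";
  return os.str();
}
```

With both numbers in the message, a misconfigured device memory value (for example, 0 MB available) would show up immediately.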
Yes, what @bertmaher said is good.
As a sidebar, I think we should add strFormat directly to RETURN_ERR_IF_NOT, making it variadic, to encourage adding actual values to error messages.
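A rough sketch of what a variadic version might look like. Everything here is a stand-in: `SimpleError`, `formatMessage`, and `RETURN_ERR_IF_NOT_FMT` are hypothetical names used so the example is self-contained, and they are not Glow's actual error type, `strFormat`, or macro.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Stand-in for an error type: failed == false means success.
struct SimpleError {
  bool failed;
  std::string message;
};

// Stand-in for a strFormat-like helper: printf-style formatting into a
// std::string. The first snprintf call only measures the required length.
template <typename... Args>
std::string formatMessage(const char *fmt, Args... args) {
  int size = std::snprintf(nullptr, 0, fmt, args...);
  if (size <= 0) {
    return std::string();
  }
  std::string out(static_cast<std::size_t>(size), '\0');
  std::snprintf(&out[0], out.size() + 1, fmt, args...);
  return out;
}

// Hypothetical variadic macro: returns a formatted error from the enclosing
// function when the condition does not hold.
#define RETURN_ERR_IF_NOT_FMT(cond, ...)                                       \
  do {                                                                         \
    if (!(cond)) {                                                             \
      return SimpleError{true, formatMessage(__VA_ARGS__)};                    \
    }                                                                          \
  } while (0)

// Example caller: the values that caused the failure end up in the message.
SimpleError checkDeviceMemory(unsigned long long availableMB,
                              unsigned long long requiredMB) {
  RETURN_ERR_IF_NOT_FMT(
      requiredMB <= availableMB,
      "Insufficient device memory: available %llu MB, required %llu MB",
      availableMB, requiredMB);
  return SimpleError{false, ""};
}
```

The point of folding the formatting into the macro is that the call site stays one statement, so there is less temptation to fall back to a fixed string.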
I like adding the memory info, but I think we should still keep the logical-physical device mismatch as part of the error. If we end up debugging this, that is useful information we would be throwing away. It tells us the issue is that the partitioner did not output a valid partitioning scheme. (Perhaps the message could be worded more like that.)
People may not know what a "logical device" is vs. a "physical device". Is there a better term, or maybe a link to some documentation?
Yes, as @jackm321 said, this is the root of my problem. I'm still not sure what a logical device is :-). Is it a synonym for "network partitions"? Also, how could we ever create more partitions than available memory? Shouldn't the partitioner raise an error saying that the graph couldn't be partitioned into the available memory?
That's basically what this check is. We could modify the partitioner to return the error instead. A logical device is a grouping of partitions all meant to be on one physical device. @beicy What are your thoughts on the partitioner returning an error here instead of the Provisioner?
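To make the "logical device" idea concrete, here is a simplified sketch of the check as it might look on the partitioner side. The `Partition` struct and both functions are hypothetical illustrations, not Glow's actual types or API. It assumes each partition carries a logical device ID and that partitions sharing an ID are meant to land on the same physical device.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified partition record (not Glow's actual type).
struct Partition {
  std::string name;
  unsigned logicalDeviceID; // Partitions sharing an ID go on one device.
};

// Counts the distinct logical devices referenced by the partitions.
std::size_t countLogicalDevices(const std::vector<Partition> &partitions) {
  std::set<unsigned> ids;
  for (const auto &p : partitions) {
    ids.insert(p.logicalDeviceID);
  }
  return ids.size();
}

// Returns an empty string on success, or an error message when the
// partitioning needs more devices than are physically available.
std::string validatePartitioning(const std::vector<Partition> &partitions,
                                 std::size_t numPhysicalDevices) {
  const std::size_t numLogical = countLogicalDevices(partitions);
  if (numLogical > numPhysicalDevices) {
    return "Network partitioning failed: partitions require " +
           std::to_string(numLogical) + " devices but only " +
           std::to_string(numPhysicalDevices) + " are available";
  }
  return "";
}
```

For example, three partitions with logical device IDs 0, 0, and 1 need two devices, so validation would pass with two physical devices and fail with one.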
Sure. It is reasonable. I will make a PR for it.
Added string formatting to GlowErrs, as promised above: #2671