Use case:
My development team continually constructs new MultibodyPlant objects. Those plants reflect our robot's world model, meaning that the number of objects within them grows as new objects are detected. We need a way to uniquely identify bodies shared between the plants as these MBPs change.
At present, the way that one can uniquely ID bodies is through a combination of two std::string objects. This is less than ideal for our real-time application because it means that one needs to be exceedingly careful to create all string objects (which includes any objects that use string objects, perhaps buried way down) during a "warm up" phase. In practice, that's very hard to do and leads to lots of unintended heap allocations that must be chased down and arduously eliminated.
The standard solution for these kinds of problems in real-time-world is to allocate everything on the stack. In general, we can do that. For example, Eigen allows us to create max-sized matrices, vectors, and linear algebra "objects" (think: SVD) that will be allocated on the stack. Drake is good at letting us allocate everything we need on the heap in a warmup pass.
Potential solution:
One possibility for us to eliminate the aforementioned heap allocation problem is to use character arrays to hold the strings that MBP uses for its unique IDs. An experimental branch that makes use of this change by adding API functions to MBT and changes the underlying maps that MBP uses can be seen here: https://github.com/RobotLocomotion/drake/compare/master...edrumwri:mbp_const_char_queries?expand=1
A better solution would be for MBP to create unique hash IDs that can be used to refer to bodies (as well as joints, actuators, and model instances). That would allow for much faster lookups since strings don't have to be compared and would allow users like myself to not have to store strings or other bulky objects. But I don't have a sense of how much harder the implementation effort would be for this approach. For one, some thought would be required to implement a zero-collision hash routine.
Or perhaps I'm missing another way to look at the problem that we need to solve.
@edrumwri: when you are looking for corresponding bodies in two different MBPs, is it always the case that both the body name and the ModelInstance name are the same in both MBPs? But presumably the ModelInstanceIndex and BodyIndex are both different?
@edrumwri: when you are looking for corresponding bodies in two different MBPs, is it always the case that both the body name and the ModelInstance name are the same in both MBPs? But presumably the ModelInstanceIndex and BodyIndex are both different?
Yes. Put simply we can ensure that when we construct the MBP- which happens in non-real-time code- we give a unique body name and model instance pair. But because the number of bodies changes and because we don't have precise control over ModelInstanceIndex and BodyIndex creation (unless we create empty model instances and dummy bodies, I suppose, but bleh), we can't make the model and body indices match.
On the issue of managing allocations with strings, if you are willing to lose on human readability of item names within MBP, and already have a task-specific means of creating unique identifiers for a given item, you could use those identifiers directly (or via base64 to ensure valid characters) as short fixed-length string names that would fit within the short-string-optimization bounds of the compiler+library you're using. You should be able to fit any hash coming out of std::hash or equivalent into the SSO space on either GCC or Clang (more space) and thus avoid any heap allocations in the strings themselves.
In terms of string comparison costs, do you have a sense from profiling about how expensive these actually are in your system?
I don't think it's a good idea to lose the item names within MBP, if for no other reason than it would require serious API changes (and not for the better IMO).
I'm sure that the string comparison costs are not high, but that's not the problem that we really need to address; removing those costs would just be an ancillary benefit.
To be clear, with that approach you wouldn't be losing item names within MBP, you'd be losing human readable names in exceptions thrown by MBP. In other cases of name lookup you're presumably doing something to generate a scoped name (ex model_instance:name) anyways, and those are the only places you'd need to unmangle names.
Would changing, e.g., GetBodyByName(const std::string&) to be GetBodyByName(const std::string_view&) sufficiently address the allocation problem? I think that would allow callers to pass in either a std::string or a char*, without allocations in the latter?
@edrumwri If I understand your problem correctly, the issue is more the std::strings stored in the MBP than the strings used to query it?
Would changing, e.g.,
GetBodyByName(const std::string&)to beGetBodyByName(const std::string_view&)sufficiently address the allocation problem? I think that would allow callers to pass in either astd::stringor achar*, without allocations in the latter?
I don't think so, since the string_view can't be stored in the map, so a string has to be created to query the map, right?
@edrumwri If I understand your problem correctly, the issue is more the
std::strings stored in the MBP than the strings used to query it?
No- the issue is that to query MBP, I have to either have a std::string stored somewhere or create it.
To be clear, with that approach you wouldn't be losing item names within MBP, you'd be losing human readable names in exceptions thrown by MBP. In other cases of name lookup you're presumably doing something to generate a scoped name (ex
model_instance:name) anyways, and those are the only places you'd need to unmangle names.
I guess I don't fully understand your proposal. But I think you're suggesting that we still do string comparisons?
Would changing, e.g.,
GetBodyByName(const std::string&)to beGetBodyByName(const std::string_view&)sufficiently address the allocation problem? I think that would allow callers to pass in either astd::stringor achar*, without allocations in the latter?I don't think so, since the string_view can't be stored in the map, so a string has to be created to query the map, right?
Ah. I should have also said that we'd change the implementation of MbP not allocate within the implementation of Get... (so, change the type of the body_name_to_frame_id_ member). Assuming we could implement a getter which did not allocate, and which took a string_view argument, would that be sufficient?
I ask because changing the signature of the getter seems much easier than designing and adding even more opaque identifiers into the plant's documented semantics, and/or trying to retcon those identifiers into the SBO strings meant for useful names.
It would, assuming that you are able to figure out how to initialize the same underlying maps using string_view. It's not clear to me how C++ would allow that to work and, in fact, when I attempt to compile:
#include <unordered_map>
#include <string_view>
int main() {
std::unordered_map<std::basic_string_view, int> w;
}
I get this compilation error:
test.cc: In function ‘int main()’:
test.cc:5:49: error: type/value mismatch at argument 1 in template parameter list for ‘template<class _Key, class _Tp, class _Hash, class _Pred, class _Alloc> class std::unordered_map’
std::unordered_map<std::basic_string_view, int> w;
^
test.cc:5:49: note: expected a type, got ‘basic_string_view’
test.cc:5:49: error: template argument 3 is invalid
test.cc:5:49: error: template argument 4 is invalid
test.cc:5:49: error: template argument 5 is invalid
I guess I don't fully understand your proposal. But I think you're suggesting that we still do string comparisons?
It's predicated on the assumption that comparisons of SSO'd strings are cheap enough, and that by using truly unique string names the number of collisions and comparisons would be reduced.
The proposal is you don't change anything in MBP directly, when creating bodies/joints/etc you create them with unique names that are small enough to fit in the SSO space which avoids heap allocations for string data and likely makes all the other operations on them faster. If/when you want a human-legible version of the unique name, you have some means of unmangling them into something useful.
seems much easier than designing and adding even more opaque identifiers into the plant's documented semantics, and/or trying to retcon those identifiers into the SBO strings meant for useful names.
I agree that using the existing names as unique keys is pretty hacky, but I don't see how changing the getters alone would solve the problem of lots of std::string-caused heap allocations.
It would, assuming that you are able to figure out how to initialize the same underlying maps using string_view.
The https://stackoverflow.com/a/53530846/13032709 answer (and its sibling answers nearby) has suggestions.
N.B. basic_string_view is a template type. You want std::string_view to specialize it for char.
It would, assuming that you are able to figure out how to initialize the same underlying maps using string_view.
The https://stackoverflow.com/a/53530846/13032709 answer (and its sibling answers nearby) has suggestions.
N.B.
basic_string_viewis a template type. You wantstd::string_viewto specialize it forchar.
This sounds like the best solution.
On general note, it's been on my mind (since we hit gcc >= 7 as minimum) that pretty much _all_ places in drake that take an input argument of type const std::string& should change to take a string_view instead -- not just the MbP. (Places that take a string by-value and then move from it would remain unchanged).
Switching to std::string_view seems like the perfect solution if it works! Can you try that Evan to see if it meets your needs?
Here is a possible implementation trick:
https://gist.github.com/jwnimmer-tri/3390d218fdad2fa9fc24dec076e53916
(... with a TODO that we can replace it with is_transparent as of C++20.)
I love this discussion and the usage of string_view. However I still have questions about the problem.
Quoting @edrumwri:
This is less than ideal for our real-time application because it means that one needs to be exceedingly careful to create all string objects
sry if slow catching up.
* My question is, if you did have all names stored in say, a vector. Wouldn't GetFoo() methods incur in no heap allocation?
Yes, after the names were allocated. But now imagine that I need to pass one of those names to another function. And that I don't have global access to that vector. Or that I don't want to have to maintain that vector in non-real-time code. It's a major PITA and just requires much, much less thought to use static storage.
* Follow up question, why is it that in your application you cannot have all these strings created at a warm up phase? (cycle through bodies, get their names+model_ids, push the pair into a set)
If I know all the body IDs ahead of time, I could conceivably do that. But still a PITA because I have to maintain that globally accessible vector/set.
Here is a possible implementation trick:
https://gist.github.com/jwnimmer-tri/3390d218fdad2fa9fc24dec076e53916
(... with a TODO that we can replace it withis_transparentas of C++20.)
I'll update my branch in progress using this. Thanks @jwnimmer-tri!
Most helpful comment
On general note, it's been on my mind (since we hit gcc >= 7 as minimum) that pretty much _all_ places in drake that take an input argument of type
const std::string&should change to take astring_viewinstead -- not just the MbP. (Places that take a string by-value and then move from it would remain unchanged).