Drake: Need a good solution to aligned allocation of Eigen object-containing classes

Created on 13 Mar 2016  Â·  46Comments  Â·  Source: RobotLocomotion/drake

Any Drake class that contains aligned, fixed-size Eigen objects needs to have a custom operator new that aligns the whole class properly. STL containers and std::make_shared use the wrong one. Eigen has macros for dealing with this, but our current usage violates basic encapsulation because the class user must know its implementation in order to use it. A later addition of an aligned Eigen type as a private member of a previously Eigen-free class could break a lot of working user code.

A lesser issue is two heap allocations for shared_ptr(new Blah) vs. one for{make,allocate}_shared<Blah>().

I know this has come up before and that there are workarounds, but I wanted to have an issue where we can discuss a permanent solution.

low kitware design

All 46 comments

How about let's add the following rule to the style guide?

Do not add an Eigen type that requires memory alignment as a member variable of a Drake class. Instead, store it in a managed pointer. Then initialize the managed pointer by using Eigen's custom aligned allocator.

For example, do NOT do the following:

class Foo {
 public:
  Foo() {
  }
  EIGEN_MAKE_ALIGNED_OPERATOR_NEW
  Eigen::Quaterniond bar;
}

Instead, do:

class Foo {
 public:
  Foo() {
    bar = std::allocate_shared<Eigen::Quaterniond>(Eigen::aligned_allocator<Eigen::Quaterniond>());
  }
  std::shared_ptr<Eigen::Quaterniond> bar;
}

Note: I tested the above on 32-bit Windows 10 Pro and it seems to work in terms of avoiding the memory misalignment assertion. I have not tested whether another class that declares this class as a member variable needs a custom aligned allocator. Hopefully not.

@tkoolen — can you weigh in?

This could be a workable solution but the extra heap allocation worries me. The whole point of aligning these small data structures is to permit use of SIMD instructions to accelerate operations on them. I _think_ heap allocation is likely about 100X slower that a single SIMD add (because of the out-of-cache thread lock). That has to be amortized over a lot of operations.

I think we can design Drake to only perform heap allocation during an initialization phase and not during simulation or within the control servo loop. For example, we can add an init () method to every class in Drake that does all of the heap allocation after object construction but before the start of the simulation and controller.

One thing I am not sure about is what is the overhead of accessing an object on the heap via a pointer vs. by value on the stack. Might need to perform some microbenchmarks and study the assembly output.

Do not add an Eigen type that requires memory alignment as a member variable of a Drake class

I don't think that should be a hard and fast rule.

the extra heap allocation worries me

Agree.

I think we can design Drake to only perform heap allocation during an initialization phase

Yes, that should be the case. However it will be very annoying to have to preallocate all of the temporary variables that every single algorithm uses, while it's possible to circumvent that in a lot of cases by using stack-allocated variables. An additional minor issue would be that it's annoying to have to dereference these pointers everywhere in the algorithms.

So my opinion is that this needs to continue to be decided on a case by case basis. For something elementary like a future Wrench or Twist class, which will be constructed and destructed a lot in algorithms, I would allow having a stack-allocated fixed size vectorizable member. For something like a System, which should live for the duration of the program, I would disallow it.

Yes perhaps we should adopt a more flexible criteria for deciding whether we should go out of our way ensuring a class is memory aligned regardless of how it is instantiated. I recommend partitioning Drake into a "foundation" and "non-foundation" layers.

Foundation classes like rigid body tree, rigid body system, drake joint, etc., must have strict memory alignment transparency semantics.

For non-foundation classes, we can decide based on our estimate of how many different classes will want to include it as a member variable in the future.

At least we should always document the alignment requirement for any class that has specific requirements.

Here's a candidate warning:

WARNING: This class must be instantiated using Eigen's custom 16-bit aligned allocator called Eigen::aligned_allocator. This is due to member variables of particular Eigen types that require 16-bit memory alignment for expedited computation. When instantiating this class, care must be taken to ensure this custom allocator is used and not, for example, C++'s default allocator. Sure-fire ways to ensure the correct allocator is used include using the new keyword to instantiate this class and using std::allocate_shared with Eigen's Eigen::aligned_allocator as a parameter. When including this class as a member variable of another class, you might be able to use EIGEN_MAKE_ALIGNED_OPERATOR_NEW in the containing class, though this is not 100% reliable because even if the containing class is 16-bit memory aligned, this particular member variable within that class may not be.

That's good. I do think that if the warning referred to http://eigen.tuxfamily.org/dox/group__TopicStructHavingEigenMembers.html instead (or to a self-maintained wiki page) it could be a bit shorter.

Another possible way to increase safety is to require that non-templated classes with fixed-size-vectorizable Eigen members specialize make_shared etc.. Unfortunately you can't partially specialize template functions, so that approach won't work for templated classes.

"16 byte aligned" is too much detail. That is just an artifact of Eigen's support for SSE2 instructions. Eigen 3.3 supports AVX instructions and changes the alignment to 32 bytes.

As worded I don't think the comment is very helpful. We need to invent a discipline that works and tell people to follow it. I'm not sure I know what that is yet.

@tkoolen and @sherm1: Thanks for the feedback. One thing I don't think is very clear on Eigen's official page is the fact that an aligned allocator is needed not only if a class has a fixed-size-vectorable Eigen member variable, but if the _transitive closure_ if all member variables within the class' member variables includes a fixed-size-vectorable Eigen type. Thus, if class Foo contains a member variable of type Bar, which contains a member variable of type Eigen::Vector2d, Foo needs the EIGEN_MAKE_ALIGNED_OPERATOR_NEW macro too, not just Bar.

Furthermore, Foo must never be instantiated using make_shared and instead must use allocate_shared.

We need to invent a discipline that works and tell people to follow it. I'm not sure I know what that is yet.

Maybe we need a tool similar to cpplint that's able to analyze every instantiation of a fixed-size-vectorable Eigen variable, trace their instantiation graphs, and flag instantiation sequences that may result in an unaligned instance. The discipline would then be to run this tool with each commit.

P.S. That's "16/32 _byte_ aligned", not "bit"!

Not well thought through: could we use inheritance or a compile time Concept (an alignment trait or some such) to automate or enforce this?

I'm not sure that there is any mechanism within C++ itself that can be used to resolve this, shy of the comittee's work to support alignas properly for over-aligned types.

http://eigen.tuxfamily.org/bz/show_bug.cgi?id=965
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0035r2.html

In the meantime, I suspect you're stuck with some after-the-fact analysis to verify that correct procedures were used, either through cpplint, llvm, or similar.

@sherm1: Yeah, I realized it was _byte_ alignment, not _bit_ alignment after closing my laptop last night. Sorry about that. Good catch!

Regarding the idea of using inheritance: I can see how that could potentially allow subclasses to avoid needing to use EIGEN_MAKE_ALIGNED_OPERATOR_NEW, but I don't think that will address a situation like make_shared deciding to use the default allocator anyway. @tkoolen mentioned the possibility of requiring non-templated classes with fixed-size-vectorizable Eigen members to specialize make_shared, though I'm not sure how to force this specialized make_shared to be used other than to call it a different name.

Regarding the use of a compile-time concept: I doubt it will allow us to avoid the make_shared issue since compile-time concepts can only modify stuff inside a class whereas the decision on which allocator to use is done outside of the class (i.e., the decision is made even before touching the class).

@jpieper-tri: Thanks for the input. I am beginning to suspect there's no 100% satisfying solution until C++ itself is updated.

Here's an updated warning:

WARNING: This class and all classes that have this class as a member variable _must_ be instantiated using Eigen's custom aligned allocator called Eigen::aligned_allocator. For more information see Eigen's official documentation about this problem. 100% reliable methods to satisfy this requirement include:

  1. Using the new keyword to instantiate this class.
  2. Using std::allocate_shared with Eigen's Eigen::aligned_allocator as a parameter.
  3. Using std::move to initialize a std::unique_ptr to this class.

To further reduce the possibility of alignment problems, the macro EIGEN_MAKE_ALIGNED_OPERATOR_NEW should be included in all classes that have a variable of this class type within its _transitive closure_ of all member variables.

Proposal: For any class Foo that uses EIGEN_MAKE_ALIGNED_OPERATOR_NEW:

  • mark the constructor(s) private; and
  • add a public factory method that returns a std::unique_ptr<Foo>, properly aligned; and
  • everyone has to use the factory method.

For cases where the unique_ptr complexity is undesirable for the caller, you can either:

  • use Eigen::DontAlign for your members, so your constructor(s) can remain public; or
  • keep the constructor(s) private, but declare a friendship relationship to allow opt-in access to the shotgun-at-foot constructor.

I guess, having read this entire thread once again, perhaps my proposal is not actually solving the problem. It may be the a better answer than documentation for larger classes like RigidBody, but probably doesn't cover the smaller, transient classes like Twist or whatever.

@jwnimmer-tri: +1, the "larger classes" idea is basically the same as the "foundation layer" I mentioned above.

I would like to coin a new phrase: "Eigen-Safe" -- A class is _Eigen Safe_ if there will not be any memory alignment problem regardless of how the class is instantiated.

We should make as much of Drake Eigen Safe as practically possible, and label all other classes with large warning signs like the one I posted above in the hopes that users will not fall into the memory alignment trap when using Drake's classes.

but probably doesn't cover the smaller, transient classes like Twist or whatever.

I would argue that all transient objects should be allocated on the stack rather than heap and thus not need to worry about the class member variable memory alignment issue.

Thus, I think my original rule proposal holds assuming 1. it works and 2. accessing an object on the heap after it is already allocated does not incur excessive overhead relative to the stack.

+1 for "Eigen safe". I presume the opposite is "Eigen unsafe". Maybe to honor Eigen's etymological roots we should make those "Eigensafe" and "Eigenunsafe" ("Neigensafe") :smile:.

I would argue that all transient objects should be allocated on the stack

Is that enough? Is there still a problem if these are data members of a stack-allocated struct?

I would argue that all transient objects should be allocated on the stack
Is that enough? Is there still a problem if these are data members of a stack-allocated struct?

I believe it's enough because I believe the correct allocator will always be used.

// The following class is eigensafe because it will always have correct memory alignment
// regardless of how it is instantiated.
class Foo {
  void MethodBar() {
    // The following variable instantiation is eigensafe because the correct allocator will
    // always be used regardless of how Foo is instantiated.
    Eigen::Quaterniond bar;
  }
}

To address @tkoolen's concern, we could relax the requirement to say that at some level of Drake's object hierarchy, we decouple the allocation of the parent object (which is eigensafe) from the allocation of an neighenunsafe class:

// This class is obviously neigenunsafe since it will not work on 32-bit windows when
// instantiated using `make_shared`.
class NeigenUnsafeClass {
 public:
  EIGEN_MAKE_ALIGNED_OPERATOR_NEW
  Eigen::Quaterniond bar;
}

// This class is also neigenunsafe because it includes a neigenunsafe member variable.
class NeigenUnsafeClassContainer {
 public:
  EIGEN_MAKE_ALIGNED_OPERATOR_NEW
  NeigenUnsafeClass Foo;
}

// This class is eigensafe because it will always use the correct allocator to
// instantiate its neigenunsafe member variable.
class EigenSafeClass {
  EigenSafeClass() {
    bar = std::allocate_shared<NeigenUnafeSafeClassContainer>(
      Eigen::aligned_allocator<NeigenUnafeSafeClassContainer>());
  }
  std::shared_ptr<NeigenUnafeSafeClassContainer> bar;
}

Thanks, Liang. I suggest a terminology refinement: use a double negative "neigenunsafe". In reading the above I found "eigensafe" and "neigensafe" too close in mental space.

@sherm1: I've updated the terminology to use "NeigenUnsafe" and "EigenSafe".

I have another proposal and I know @sherm1 won't like it but I am growing fond of it specially after the large discussion we had on smart pointers. A PIMPL idiom I think would solve this problem. The implementation would be "EigenUnsafe" but since it is hidden to the user we can make sure it is always allocated using Eigen::aligned_allocator. The interface class would therefore not have any of these problems ("EigenSafe") and the user would never know about them. Of course internally we have to always make sure we use Eigen::aligned_allocator when using STL's containers but there is no way around this anyways.
Let me add that a PIMPL idiom would have the benefit of the users never having to deal with pointers (or smart pointers).

I'm a big fan of PIMPL for making the most beautiful APIs, but I think it will be too difficult to use in practice in Drake. Also, it isn't clear to me that it is a solution for dealing with small, inline objects like Eigen's aligned Vector4d, Isometry3d, etc.

This is what I was trying to communicate above. Hopefully this helps to clarify :-)

// This is how our current Drake classes are implemented.
// But with PIMPL we would move them to the implementation class.
class RibidBody_impl {
  RigidBody_impl() { }
public: // The user never sees this so I don't see why not to make members public (or just use a struct).
  // Notice I can have as many Eigen objects as I please and I do not need to separately call an aligned allocation on each of them. 
  // Future extensions can keep adding Eigen members without worrying about alignment.  
  Eigen::Vector3d com_;
  Eigen::Matrix<double, TWIST_SIZE, TWIST_SIZE> I_;
public:
  EIGEN_MAKE_ALIGNED_OPERATOR_NEW
};

// The interface class, the one the user instantiates and plays around with.
// Notice that there's no need for EIGEN_MAKE_ALIGNED_OPERATOR_NEW non-sense here. 
class RigidBody {
  RigidBody() {
     impl_.reset(new RigidBody_impl()); // The appropriate (aligned) allocator is called here since we used EIGEN_MAKE_ALIGNED_OPERATOR_NEW in the implementation class.
  }
public: 
  Vector3d com() { return impl_.com;}
private:
  std::unique_ptr<RigidBody_impl> impl_;
};

// Usage example:
// This class does not need to be aligned and therefore the user can place it safely in any STL container. 
std::vector<RigidBody> bodies_collection; // Notice I don't even need a pointer (or smart pointer) at all!! super clean.

// Then the user can cleanly query actions on those objects as if they were allocated on the stack:
Vector3d x = bodies_collection[body_index].com();

// When going out of scope all the objects get destructed cleanly automatically.
// No need for smart pointer non-sense!!!


I forgot to mention that SIMD instructions will still work so no performance lost! You might argue that dynamic allocation is costly but keep in mind that we always do so anyways. When we instantiate our RigidBodyTree from an SDF file there is no way for the compiler to know what there will be in that SDF. All objects then get created at run-time by some factory logic dynamically. But once these are created the SIMD instructions still work and after the initialization stage (with all the dynamic allocations) everything should run at full speed.

I really like that API. Simbody uses PIMPL extensively to provide a pointer-free interface. Here is an example from Simbody's README file. As I've discussed with @amcastro-tri, it was a huge amount of work to provide PIMPL handle/implementation pairs for every object, which I think would be excessive for Drake. However, killing the API and alignment birds with one stone does seem appealing. Maybe we can design a smart pointer of some kind that can serve as a generic handle so we don't have to work so hard?

Oh wow! that's so beautiful! would it be possible to have the process of exposing the implementation members and public methods through the interface automatically? maybe somebody could volunteer to write that parser (not me hehe).

I am also a fan of the pimpl pattern, mostly to remove the necessity to expose all internal state and private members in the public headers.

While automatically forwarding all methods, as you noted, would require an outside the language solution, there are techniques to make the code overhead lower, especially if your class is not templated:

  1. Use something similar to the techniques in the experimental boost::pimpl library to automatically forward constructor arguments and automate the declaration of the pimpl class. I don't think you'd want to use it directly, but it could be a good reference: http://yet-another-user.github.io/pimpl/doc/html/index.html
  2. If you have the public methods just do their work in the public member definitions, but refer to implementation variables through the impl pointer, you don't have to duplicate method definitions. The only methods in the impl would be those that otherwise would be private.
  3. Minor nitpick, if you are going to initialize a unique_ptr in a constructor, do so through the initializer list, rather than calling reset in the constructor.

Hmm, isn't @amcastro-tri's PIMPL design basically the same as my EigenSafeClass above except it restricts the EigenSafeClass and NeigenUnsafeClassContainer to have a 1:1 relationship?

@liangfok: Notice that in your design if you have members bar1, bar2, bar3, etc. you'd need to call std::allocate_shared on each of them separately making future extensions more difficult. Not to mention that if bar1 is an Eigen::Matrix3d or the like you'd need to dereference it every time time you need to use it. The implementation class in the PIMPL idiom is essentially the one class we have within Drake right now. The interface class (holding a pointer to the implementation) hides the awful particulars of the implementation from the user.

@jpieper-tri: I don't really understand point 2 but I'll be in Alewife later today so we can discuss (though you might want to clarify your point here for everyone else).

Having done a lot of PIMPLing, I understand Josh's point 2 very well. If in the handle (public) API you have a method DoSomething(), the implementation could be DoSomething() {impl->DoSomething()} meaning every method is implemented twice. Josh is suggesting instead to do the implementation in the handle class by making repeated use of the impl pointer, leaving the Impl class holding only data and private members.

I have done it both ways. In practice it can depend on how the Impl class is used internally. It can be nice to have all the functionality in Impl so that you don't have to have one of the handle-class objects around when working internally.

I'm not seeing the difference since EigenSafeClass == RigidBody and NeigenUnsafeClassContainer == RibidBody_impl. I would think that not enforcing a 1:1 relationship between the interface (e.g., RigidBodySystem) and the numerous implementations (e.g., RigidBodyTree and RigidBodySensor) would be desirable. Am I missing something obvious?

Liang, not sure if this answers your question: the PIMPL design is inherently 1:1. There can't be multiple implementations of the user-visible handle class (which must be entirely concrete, i.e. no virtual methods). There is a single implementation class for each type of handle, but that implementation can be abstract. Any NeigenUnsafe class C would be PIMPLed so that its handle class C is EigenSafe and C_impl is NeigenUnsafe. No one but class C would every see C_impl or know it exists. Because the (C,C_impl) pair is essentially a single object, later changes that make C_impl safe or unsafe are fine; only the implemenation of C needs to know and it is written by the same person at the same time.

@sherm1: Thanks for the explanation. I agree having a 1:1 relationship between an encapsulating EigenSafe class and its NeigenUnsafe member variable will result in maximum flexibility / safety in terms of allowing users to use any class in Drake without needing to worry about Eigen Alignment. I view @amcastro-tri's proposal as a special case of my proposal that enforces a 1:1 relationship.

An additional minor issue would be that it's annoying to have to dereference these pointers everywhere in the algorithms.

One option to avoid excessive dereferencing is to alias a pointer using a reference. For example:

  auto tree = std::make_shared<RigidBodyTree>("foo.urdf");

  RigidBodyTree& my_tree = *(tree.get());
  std::cout << "Number of model instances in tree: "
            << std::to_string(my_tree.get_number_of_model_instances())
            << std::endl;

Another relevant question is how to test that code is correct. I realized today that we might be able to replace memory allocation to always return unaligned base pointers, so that if Eigen ever fails to do the alignment masking the code would definitely trap, rather than only failing on MSVC 32-bit or whatever.

It looks like P0035R4 has landed in Clang 4 and GCC 7, which as I understand it means that heap allocation for over-aligned types works easily and inductively. Eigen has a ticket tracking the feature request at http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1409.

On 64-bit Ubuntu, without AVX or higher turned on, Eigen works out of the box without any special annotations. That's the route we should take, not trying to use fancy custom alignment. If we need larger aligned types (AVX support), we should just get 1409 fixed instead.

Fascinating. Trying to fix this very same (super-annoying) issue in GTSAM here. The last comment, @jwnimmer-tri , seems like you're giving up on trying to wrestle with it? Curious what the eventual policy was you came up with though.

The rough policy is "we don't support 32-bit linux" and then "don't turn on vector operations with greater than https://en.cppreference.com/w/cpp/types/max_align_t alignment". In practice that means we can do -msse4.2 just fine, but not -march=native nor -mavx. For us, the potential performance improvements of using AVX widths instead of SSE widths was not worth the huge API burden.

If you need -march=native to work, it's possible that there are Eigen preprocessor defines that you could set which downgraded to use only SSE operations, even if AVX was enabled in the compiler. That might require a global rebuild-from-source though, which may not always be possible for all packaging mechanisms.

If Drake decided that we needed to use AVX, I would choose to develop and submit a fix for Eigen 1409 to get the right annotations into the upstream Eigen::DenseBase types, rather than annotate _all_ of my own classes with work-arounds. Requiring clang >= 4 and gcc >= 7 is a compromise I'd be willing to take, if I need to use AVX.

Thanks, that's helpful!!

Was this page helpful?
0 / 5 - 0 ratings