Drake: Explicit template instantiation

Created on 10 Oct 2016 · 39Comments · Source: RobotLocomotion/drake

It's probably worth revisiting the Drake style guide for handling explicit template instantiation. The current guidelines balance two issues:

Efficient compilation
General application

The two-file approach emphasizes efficient compilation by putting implementation and explicit instantiation in the .cc file. It assumes the ability to exhaustively enumerate desired instantiation parameters for the templated class and precludes additional instantiations.

The three-file approach allows for general instantiation, but comes at an increased compilation cost. Furthermore, to explicitly instantiate a templated class required including two header files (in the appropriate order). A bit fragile.

We can have the best of both worlds by using extern template instantiations.

Example:

_header.h_

template <typename T,int  N>
class Sum {
 public:
  int total() {
    int sum = N;
    Sum<T, N-1> next;
    return sum + next.total();
  }
};
template< typename T> class Sum<T,1>{
 public:
  int total() {
    return 1;
  }
};
#ifdef WITH_EXTERN
extern template class Sum<int, 256>;
#endif

main.cc

#include "header.h"
#include <iostream>

int main() {
  Sum<int, 256> values;
  std::cout << "Sum: " << values.total() << "\n";
  return 0;
}

Invoking the following commands:
clang++-3.7 -c main.cc -o clang_no_extern.o
clang++-3.7 -c -DWITH_EXTERN main.cc -o clang_extern.o
g++-4.9 main.cc -c -o g++_no_extern.o
g++-4.9 main.cc -c -DWITH_EXTERN -o g++_extern.o

Produces the following results

   2720 Oct 10 08:14 clang_extern.o
 111992 Oct 10 08:14 clang_no_extern.o
   2696 Oct 10 08:14 g++_extern.o
 118288 Oct 10 08:14 g++_no_extern.o

The use of extern clearly communicates to the compiler that an explicit instantiation has already been compiled and it defers the problem of finding it to linking (as witnessed by the significantly smaller object files.) Incidentally, g++ allows greater recursion depths in template definitions. Larger template values (e.g., N = 900) lead to differences in compilation times that even a human can detect. And that's with a very small, simple program.

Thus, we can have all the advantages of both compilation efficiency and general instantiation by simply declaring our explicitly instantiated members as extern. The full template definition can be made available by referencing a single header file.

(Special thanks to @sherm1 for the idea of using the recursive template as a way to cheaply create huge amounts of template code generation.)

dynamics style

Source

SeanCurtis-TRI

Most helpful comment

I've been rethinking through this ticket a bit today.

I think I would like to see extern template class Foo<double>; used pervasively throughout our templated headers. It would be a sanity check on two kinds of correctness -- (1) for method definitions, by making sure that we explicitly instantiate them to catch compilation-error bugs early, and (2) for linker rules, by making sure that all of the correct dependencies have been identified in the link step. (Bazel already does (2) for header visibility, but this is a nice step along the way without it.)

Also, I would be curious to see a relative performance measurement in our tree across such a change. As we add more method bodies to headers in the framework classes, it's possible that compilation and objcode bloat are going to be a real problem. I would be curious to see a good measurement!

jwnimmer-tri on 19 Nov 2016

👍3

All 39 comments

The use of extern template instantiations sounds good to me. I believe the benefits are clear. Is the main drawback the fact that it uses a non-standard compiler feature? Are there any other drawbacks?

liangfok on 10 Oct 2016

My understanding is that it is a standard compiler feature.

https://gcc.gnu.org/onlinedocs/gcc/Template-Instantiation.html

With particular emphasis on the line:

This [using extern with template instantiation] syntax is defined in the C++ 2011 standard, but has been supported by G++ and other compilers since well before 2011.

SeanCurtis-TRI on 10 Oct 2016

Oh nice. The fact that it's a standard is great. Are there _any_ drawbacks then?

liangfok on 10 Oct 2016

At first glance, it appears not. So, I figured I'd throw it up to the brain trust to see if _someone_ could poke a hole in it. But in the absence of any counter-arguments, I'm sold (as is implied by my original advocacy.)

SeanCurtis-TRI on 10 Oct 2016

The three-file approach allows for general instantiation, but comes at an increased compilation cost. Furthermore, to explicitly instantiate a templated class required including two header files (in the appropriate order).

Both of these statements are incorrect.

jwnimmer-tri on 10 Oct 2016

@jwnimmer-tri Can you elaborate? I've re-read the spec and agree that the statement about including both files to be incorrect (the -inl.h file includes the declaration header). But I still don't see how the compiler can avoid compiling an explicit instantiation when it has no specific knowledge that an instantiation exists in another object file.

SeanCurtis-TRI on 10 Oct 2016

With the three-file approach, its code that uses standard values for T (such as double) will do exactly what we want. I think we all agree on that?

The possible hazard with the three-file approach is that code that needs to include the -inl.h file in order to use T = MyWeirdThing may redundantly compile specializations for standard values of T, such as if it wants both T = double and T = MyWeirdThing at the same time. Is that the problem with "redundant compilation" that you are getting at? And if so, why is it important to solve that problem now? It doesn't seem like a pressing need.

I consider leaving the domain of T open-ended to be highly irregular and undesirable. We should provide standard, well-tested types for T across all of our systems code. Letting downstream code specify its own non-standard T is a burden that we should take on only with great care.

That being said, if the extra compilation in my second paragraph above is an issue, the easiest fix is to add the extern statements, but within the existing three-file pattern.

jwnimmer-tri on 10 Oct 2016

Thanks. I agree leaving T open-ended is dangerous.

liangfok on 10 Oct 2016

Thanks for your comments. Generally, I agree. In retrospect the only real value over our current state of the use of extern is that the end user would only have to consider a single header file (rather than one of two depending on their intent.) And the value of that scales with how much we would want to leave the door open for arbitrary instantiation.

SeanCurtis-TRI on 10 Oct 2016

I don't see why leaving T open ended is bad -- I think it allows for creative use of the Drake "glass box" property, which is a unique and valuable Drake feature. Why do you feel it should be closed, @jwnimmer-tri? In any case we aren't yet in a position where we can enumerate all the necessary T's, and I don't think it is 100% certain that we would want to pre-instantiate all of the more obscure ones.

I like the three-file solution for code cleanliness, but now that we understand how extern template works we have the opportunity to hide that from API users (which I think is a good thing). The pattern would be:

thing.h:
  template <class T> class Thing { ... };  // declaration
  extern template class Thing<double>;     // all available pre-instantiations 
  extern template class Thing<AutoDiffXd>;
  // Compiler ignores this for any type T mentioned in an extern:
  #include "thing-inl.h"

thing-inl.h:
  // all the definitions for class Thing

thing.cc:
  // all the instantiations

user_code.cc:
#include "thing.h"  // no need to know about -inl.h

Is there a downside to using that approach?

sherm1 on 10 Oct 2016

I don't see why leaving T open ended is bad ...

Untested code is broken code. Any combination of MyClass<T> crossed with T = SomeT has the possibility of exposing defects, even though both MyClass and SomeT are entirely correct on their own. If we are going to allow it, we either need to (1) ask users to write tests for MyClass<SomeT> when they decide to use it, or (2) make all of our code fail-fast when a SomeT value behaves incorrectly in the context of MyClass.

Option (1) is unlikely to work, because (1a) they will be using dozens of our classes with SomeT and won't easily be able to know which classes they've happened to apply SomeT to, and (1b) users are fundamentally lazy and won't read the docs that mandate those tests, or will risk it anyway.

Option (2) is realistic. The STL does it. But the amount of engineering that goes into STL implementations is huge per function, and I don't want Drake to take on that kind of burden if we can possibly help it.

In short, I think we should leave the domain of T closed, because we can't easily ensure or reason about the correctness of specialized T values.

I eagerly await _anyone, anywhere, ever_ posting a sample application where we care to have an open-ended T. I would be happy to help meet the need! But right now, we are opening ourselves up for abuse, with no end-game value in sight.

In any case we aren't yet in a position where we can enumerate all the necessary T's ...

So we are uncertain as to the domain of T, but we are still going to write generic code that takes in any T and produces a correct, well-tested, or fail-fast result? That sounds preposterous to me.

I don't think it is 100% certain that we would want to pre-instantiate all of the more obscure ones ...

Why not? Either its used and tested, or its not. All else being equal, having a common set of Ts that work throughout Drake will be easier for users to understand. Compilation is cheap.

I agree we should come to consensus on this point soon though, because we're writing a lot of Systems code, and we should agree on Ts and hygiene for how to write generic code, so we can pay off that debt as we incur it, instead of delaying it to much later.

I like the three-file solution for code cleanliness ... Is there a downside to using that approach?

One downside is the "untested code is broken code" above.

Another downside is that it hides from users whether they are using explicit or implicit instantiation. Some consider that a feature; I consider that a defect. It makes users (and us) unaware of the cost of the types they choose. I would prefer that we have to explicitly either opt-in to new standard T values across the board, or else explicitly opt-out and #include an -inl.h file. Having it be automatic means that we may end up with a whole lot of implicit compilation for a type that should be standardized, but isn't, because we never considered it.

I really do think that the set of valid types can be enumerated up-front, and amended as we find new ones. I think we should do that, at least until we have a use case that demands otherwise.

jwnimmer-tri on 11 Oct 2016

@jwnimmer-tri, I don't agree that we have any obligation to test for every possible template parameter any more than Eigen does. By not locking down their scalar types we were able to make creative use of Eigen in ways they likely did not anticipate. However, you clearly feel much more strongly about this than I do -- I think we should do it your way!

But ... I'm not sure exactly what that means now, in practice. Is it a defect to include -inl.h anywhere other than the explicit instantiation .cc? Is it a defect to include definitions in a .h file? If that's OK should those be accompanied by extern template declarations for any pre-instantiated types so the compiler won't waste time expanding the templates? Why are we documenting the existence of -inl.h files in user documentation (per cxx_inl.html) if unanticipated instantiations are to be prohibited?

sherm1 on 11 Oct 2016

I don't agree that we have any obligation to test for every possible template parameter any more than Eigen does.

I think Eigen has way more tests for interesting ScalarType values than we do. If we achieve that level of testing for things that are not double, then perhaps we _should_ open the door for more numeric-like Ts also. It will take a lot of work to get there!

I'm not sure exactly what that means now, in practice. Is it a defect to ...

I am not proposing any additional kinds of defects.

[should header files that contain code] be accompanied by extern template declarations for any pre-instantiated types so the compiler won't waste time expanding the templates?

I'm fine if someone wants to write up this rule. (I personally don't care yet; our new code is not the part of the build that is slow.)

Why are we documenting the existence of -inl.h files in user documentation (per cxx_inl.html) if unanticipated instantiations are to be prohibited?

At the time, it seemed important to people to have leeway for what Ts got used, which was going to mean putting all code in an .h file forevermore, just like System1 (and its associated nightmarish build times, even for the tiny functionality it already had). It seemed better to document the three-file inl pattern than to put all code in header files. I kind of regret doing it at this point, though; I think it was premature.

For myself, I am happy to leave the rules as-written, to defect code that uses the three-file pattern without just cause (the cxx_inl says to prefer the two-file approach), and to defect code that uses three-file pattern without the associated bump in unit testing for a variety of T values.

jwnimmer-tri on 11 Oct 2016

Yes, it all comes down to a matter of control. The more control we have over how our software is used, the more confident we can be that our software is "correct". A key challenge is how to maintain high levels of control and correctness without stifling creativity.

I wonder whether the fact that Drake is open source has any implications on our calculations for how much control we want to (or can) assert. Since anyone can change the source code by, for example, adding new explicit template instantiations with bizarre types, have we already given up control of what types T can take on?

liangfok on 11 Oct 2016

There are some interesting points here. @jwnimmer-tri's points vis a vis the principle of correctness certainly appeals to me -- there's a certain "prevent the user from using my code to hurt themselves while blaming me" aspect that I generally subscribe to. But it does represent the ideal.

And @liangfok makes an excellent point that this is open-source code. We are essentially passing out rope to people and saying, "Do with it what you will." This significantly undercuts our ability to make sure that users only pursue tested code paths.

Generally, the arguments in favor of the untested implies incorrect school of thought sway me the most. In that case, it seems to me that the templated code we have is strictly an _internal_ facility. It's useful for us to re-use the code. However, that puts us in the position of exposing code (i.e., the header files for these classes) that suggest explicit instantiation that we don't actually support. A bit of a mixed message.

SeanCurtis-TRI on 11 Oct 2016

That is a great write-up, and it sounds like we are heading toward the same page.

However, I don't quite follow this part:

However, that puts us in the position of exposing code (i.e., the header files for these classes) that suggest explicit instantiation that we don't actually support. A bit of a mixed message.

Would doing [header files that contain code are] accompanied by extern template declarations for any pre-instantiated types make it better? Or even adding the extern statements for the types we support, instead of relying on doxygen for that? Or did you mean something else?

jwnimmer-tri on 11 Oct 2016

Something like that. The idea is that if I'm using code that exposes a templated class/function to me, I assume that as long as things compile with the template parameter of my choice, I'm good. I believe, that is the very mindset you're trying to avoid. But for me, it is implicit in the presence of the template.

It may well be that our final resort is documentation. We tell the user that these classes are intended to be used with a specific set of types. The use of extern template instantiation in the header file can be tied to the documentation as an explicit list of "this gets the stamp of approval" -- this has a very self-documenting feel rather than referring them to another file or maintaining a list of comments that must correlate with code in another file. And the final clause is one of _caveat emptor_; instantiate whatever you want above and beyond this set, but there are no guarantees.

SeanCurtis-TRI on 11 Oct 2016

Hm, maybe not so much on the same page. My opening bid would be to put all of the code in the .cc file (with perhaps still extern template in the header as documentation).

jwnimmer-tri on 11 Oct 2016

That would offer compiler enforcement of them not creating novel instantiations. And that's the ultimate question; do we want to completely close the door. If so, I think your bid is the best way to do so. If we crack the door open, things get slippery.

SeanCurtis-TRI on 11 Oct 2016

Sorry to be only casually following the conversation, but one use case which we were concerned about in RigidBodyTree was the fact that the dimension of the state vector was not known until runtime (e.g. you dynamically load a robot from urdf). In that case, the AutoDiff type is not known until runtime. Allowing a user to compile something with a maximum size that matched their robot could make a dramatic speed difference vs choosing a larger dynamically sized allocation. @tkoolen had a way around that which came up a month or two ago, which I never fully parsed. But it might be a case where allowing a user to explicitly instantiate is important.

RussTedrake on 11 Oct 2016

As I understand @tkoolen's approach it is to "chunk" the derivatives into fixed-size pieces. There will presumably be a diminishing return on ever-larger AutoDiff sizes, so that perhaps a pair of AutoDiff<64>s might run almost as fast as a single AutoDiff<128>. If that's right then we could pre-instantiate an assortment of AutoDiff sizes up to the point where it doesn't matter any more, and work in chunks after that. Unresolved are (a) how big would we have to go, and (b) is it reasonable to pre-instantiate all the useful sizes prophylactically?

sherm1 on 11 Oct 2016

I believe the idea of "chunking" is described in #2619.

liangfok on 11 Oct 2016

One important detail: the chunking approach I used allows specification of a fixed maximum size of the derivative vector.

Example: if you want the Jacobian of a function f that takes a length-17 vector as an input and you use a chunk size of 10, then f will be called with AutoDiffScalar<FixedMaximumSizeVectorUpTo<10>> inputs twice; the first chunk argument will have a runtime length of 10, and the second chunk argument will have a runtime length of 7.

So a single instantiation (e.g. for AutoDiffScalar with fixed max derivative size of 10) would already be able to handle gradients/Jacobians/hessians with any number of derivatives. Which instantiation(s) to choose is still up for grabs, but it's just a matter of optimizing performance.

tkoolen on 11 Oct 2016

👍1

So far it sounds to me like "put all of the code in the .cc file (with perhaps still extern template in the header as documentation)" meets our needs, in terms of code-style for scalartypes? (We still have separate work to do on defining our T domain, and making sure we test it for all Systems code.)

jwnimmer-tri on 13 Oct 2016

"put all of the code in the .cc file (with perhaps still extern template in the header as documentation)" meets our needs?

I think so. But @SeanCurtis-TRI is chasing down a problem that came up with a mix of in-header and in-.cc definitions, in particular default constructors and assignment. Those can only be declared = default in the header.

sherm1 on 15 Oct 2016

So, in trying to use this, I encountered an issue (that I've only been intermittently been able to reproduce.) However, the error is related to known gcc bug:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57728

In simple terms, the error is this: using the extern template declaration can interfere the creation of default methods (i.e., copy constructors and assignment operators). Clang and gcc have different behaviors and the assertion in the bug discussion is that gcc is wrong. So, I'd recommend the whole thing be put on hold until we move to a pair of compilers that give the same results.

SeanCurtis-TRI on 21 Oct 2016

👍1

This was such an interesting discussion. Sad to hear gcc has this bug, I actually liked the extern approach, if it worked.
So it seems that for now we'd still be using the -inl.h pattern then?

amcastro-tri on 21 Oct 2016

The style guide currently says prefer two-file if you can, or else fall back to three-file if you have some weird requirement. That will be true until a PR changes the rules.

jwnimmer-tri on 21 Oct 2016

http://stackoverflow.com/questions/14256726/default-constructor-destructor-outside-the-class appears to successfully work around the GCC bug.

jwnimmer-tri on 19 Nov 2016

👍1

Yay! Great find, Jeremy. Check it out @SeanCurtis-TRI.

sherm1 on 19 Nov 2016

I've been rethinking through this ticket a bit today.

jwnimmer-tri on 19 Nov 2016

👍3

Can I ask if we should go ahead and incorporate this into the C++ *-inl.h files documentation?
grep'ing through our source base reveals that we tend to not be using this (only in externals).

I can see that this is relatively low-priority, but an easy start would be adding this as a suggestion in the above page.

EricCousineau-TRI on 18 Apr 2017

Hi @EricCousineau-TRI. Might be a good idea to submit a PR adding some of these extern template declarations to existing classes before making this a documented requirement. Assuming it doesn't cause problems, then upgrade to a requirement in the doc?

sherm1 on 18 Apr 2017

@sherm1 Sounds good! Will do so in the next few days.
I may also have a PR coming in leveraging extern template, but within the scope of explicitly instantiated methods, rather than classes.

EricCousineau-TRI on 18 Apr 2017

Added in a draft PR to my personal repo (to avoid CI for the time being): EricCousineau-TRI/drake#1

Benefits:

Explicit header-file visibility of explicit instantiations.

Drawbacks:

Not necessary (I think). An implicit instantiation of a *-inl.h class should not really have any meat, since everything is already defined external.
- @jwnimmer-tri Your comment was from a few months ago, and may have been made before the definitions were moved into *-inl.h - may I ask if it still holds? I ask because, as things stand, I don't think there will be too much issue of duplicate object code (if all templates method definitions are made in the *-inl.h file.
You must explicitly expose all upstream types for extern template declarations.
- In this case, I had to move #include "drake/common/eigen_autodiff_types.h" from kinematics_cache.cc to kinematics_cache.h.

I think there should be a mention of extern template in our doc, if anything to at least mention that they are only necessary if you have definitions in your header file (such as ostream helper or what not).

EricCousineau-TRI on 19 Apr 2017

Since I posted my comment most recent comment here many months ago, more people have used the three-file pattern without just cause, which is now confusing the issue because that pattern seems predominant. We should stop doing that! I could write up a whole paragraph here with gripes but suffice to say I don't think we're doing a very good job as platform reviewers pushing back on the three-file pattern. It's supposed to be for exceptional cases only.

But yes, I still agree with my most recent upthread comment. If you consider my comment with respect to things like drake/systems/framework/*.h it makes more sense, but you'll might be confused if you instead consider it with respect to the three-file pattern. In systems/framework, approximately all of the class template methods are implemented directly in the *.h files at their point of declaration; in that case, extern template class _for double only_ would provide the benefits I listed.

If the class signature is already aware of other standard scalartypes (e.g., System already is, because of transmogrification), we could imagine adding extern template class for AutoDiffXd and symbolic::Expression if we wanted, but I would be content to start with double across the board first.

jwnimmer-tri on 21 Apr 2017

In the years since I wrote (much higher upthread) my objections to having an open set of scalartypes allowed, I have changed my mind. I think for core libraries such as framework, analysis, controllers, we do get some benefit to allowing the user to choose their own scalar. For example, the user could choose an exact fized-size for AutoDiff. The user might have their own autodiff-like type that would be useful to try. We should offer some good defaults (common/default_scalars.h should probably add a few more), but leave the set of types open for users to extend.

I think @sammy-tri is going to experiment with adding a macro DRAKE_DECLARE_CLASS_TEMPLATE_INSTANTIATIONS_ON_DEFAULT_SCALARS for use in a header file, to match the DRAKE_DEFINE_CLASS_TEMPLATE_INSTANTIATIONS_ON_DEFAULT_SCALARS currently used in *.cc files. That might be the best of both worlds.

jwnimmer-tri on 7 Jan 2019

Opened PR for the implementation described above: #10350

sammy-tri on 9 Jan 2019

Calling this one resolved by #10350

sammy-tri on 9 Jan 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

RigidBody::body_index_ is not a good unique identifier in RBT

amcastro-tri · 4Comments

Ensure that Scalar's used within Eigen::Matrix<> do not cause undefined behavior when the matrix is resized

EricCousineau-TRI · 5Comments

Consistency among ==, <=, >=, <, >, != operators for symbolic expressions/formulas ....

mitiguy · 4Comments

use variable names instead of long text to access elements of the context

RussTedrake · 4Comments

Implement PID controller

amcastro-tri · 5Comments