As we are switching from char * to std::string in the JitBuilder interface, lifetime has become an issue worth a discussion. A lot of the JitBuilder APIs take in char * parameters and use / store them well beyond the lifetime of that method call. This means that we are relying on the JitBuilder client to allocate the strings for long enough such that they don't result in dangling pointers.
A lot of the JitBuilder example code that we have today are using string literals, and therefore, this may or may not have got much attention. However, with the std::string change in place, we need to decide how do we go about managing lifetime, since the string reference passed into a JitBuilder API may be invalidated as early as the termination of the method call, and so would the underlying character array.
But, given that we still need access to the character array beyond the lifetime of the method call would mean that we make an internal copy of the string and store it within that JitBuilder class (possibly inside a std::vector). However, this comes with the assumption that the character array would not be required beyond the lifetime of that particular JitBuilder object (i.e. beyond compilation) - which I doubt
would be a valid one to make for some cases.
Looking for some insight / suggestions.
@mstoodle @mgaudet @Leonardo2718 Please share your take on this.
It's slightly more complicated, in that any allocations that happen during a compilation need to be tracked (and allocated) via compilation-level allocators. And that includes the backing allocations for characters; they cannot just be malloc'd or ::new'd.
Anything allocated during the execution of a MethodBuilder's ::buildIL() needs to be allocated like that, unless the object needs to have a lifetime extending past the compilation. Things that are allocated in a MethodBuilder's constructor (outside it's ::buildIL() method, anyway) will need a longer lifetime.
I think my general preference would be that, if JitBuilder needs a string to survive longer than the call that provided it, then JitBuilder should really take over the responsibility to manage that string (you're right that using const char * helped to defer even if it didn't really address this problem). I don't feel that it is fair to impose a requirement on the client to reason about how long JitBuilder will need to refer to those std::string objects. So: currently we're being unfair. We should make that better. The switch to std::string just puts the problem more in clients' faces.
Tying the lifetime of these std::string objects to the lifetime of the JitBuilder object (IlBuilder, BytecodeBuilder, MethodBuilder, TypeDictionary, etc.) seems appropriate. Right now, all IlBuilder objects created during IL generation survive for the duration of the compilation because of the way they're allocated (hm, and then become dangling pointers off the internal fields of the MethodBuilder object :( sigh ), but we shouldn't make that a requirement, I don't think. For the character arrays that get stuffed into other structures (like TR::ResolvedMethods), we should really make a copy in the appropriate "heap" allocation region so that they are cleaned up when the compilation ends. That's why you'll see a couple of explicit allocations for char arrays in the existing code base.
Did I miss anything or am I not looking at anything the right way?
I think JITBuilder owning string lifetimes is the right idea. I want to make it clear, though, that
any allocations that happen during a compilation need to be tracked (and allocated) via compilation-level allocators
precludes the use of std::string. We can use std::basic_string< char, std::char_traits< char >, TR::typed_allocator< char, [TR::Allocator or TR::Region &] > > as an equivalent but we won't be able to use basic_stringstream until we support at least C++11 (and even then we will likely have a problem because the constructor provides no direct way of specifying an Allocator instance)
Thanks @mstoodle . This did answer my question (although, I had a few other concerns, which we already discussed).
Given what @lmaisons mentioned, for the core compiler classes such as TR::ResolvedMethod, which would be the preferred approach:
std::basic_string copy with custom allocator as mentioned above, or,std::string contents which get passed over through method calls from JitBuilder.My natural instinct says it would be the former, just because of the _ugliness_ involved with fixed size arrays. However, I want to check what the consolidated take is on this.
== EDIT ==
Also, if we are to use custom allocators, what would be the appropriate memory region to allocate std::basic_strings inside TR::ResolvedMethods?
Most helpful comment
It's slightly more complicated, in that any allocations that happen during a compilation need to be tracked (and allocated) via compilation-level allocators. And that includes the backing allocations for characters; they cannot just be malloc'd or ::new'd.
Anything allocated during the execution of a MethodBuilder's ::buildIL() needs to be allocated like that, unless the object needs to have a lifetime extending past the compilation. Things that are allocated in a MethodBuilder's constructor (outside it's ::buildIL() method, anyway) will need a longer lifetime.
I think my general preference would be that, if JitBuilder needs a string to survive longer than the call that provided it, then JitBuilder should really take over the responsibility to manage that string (you're right that using const char * helped to defer even if it didn't really address this problem). I don't feel that it is fair to impose a requirement on the client to reason about how long JitBuilder will need to refer to those std::string objects. So: currently we're being unfair. We should make that better. The switch to std::string just puts the problem more in clients' faces.
Tying the lifetime of these std::string objects to the lifetime of the JitBuilder object (IlBuilder, BytecodeBuilder, MethodBuilder, TypeDictionary, etc.) seems appropriate. Right now, all IlBuilder objects created during IL generation survive for the duration of the compilation because of the way they're allocated (hm, and then become dangling pointers off the internal fields of the MethodBuilder object :( sigh ), but we shouldn't make that a requirement, I don't think. For the character arrays that get stuffed into other structures (like TR::ResolvedMethods), we should really make a copy in the appropriate "heap" allocation region so that they are cleaned up when the compilation ends. That's why you'll see a couple of explicit allocations for char arrays in the existing code base.
Did I miss anything or am I not looking at anything the right way?