Runtime: Serializing exceptions in CoreCLR

Created on 18 Jan 2016  路  37Comments  路  Source: dotnet/runtime

It has recently come to my attention that CoreCLR has eliminated ISerializable. While I believe that this is a move in the right direction, it seems to have also removed the only way possible to correctly serialize exceptions in .NET. Serialization of exceptions is tricky, because they are objects which encapsulate two types of data:

  1. Data defined at construction time, such as Message, InnerException or any other fields defined by the particular exception implementation.
  2. Data appended by the runtime, such as StackTrace. Importantly, this is data that cannot be modified by the user without resorting to reflection.

A correct exception serialization should carry along both types of data. This is very important in distributed frameworks such as akka.net, mbrace and I would imagine Orleans too. So my question here is, what will CoreCLR be offering as a replacement scheme for serializing exceptions?

enhancement

Most helpful comment

@mikedn This is in the context of cloud-scale distributed programming systems such as AkkaDotNet and MBrace (http://mbrace.io). Please familiarize yourself with these systems, check out this video for example. https://channel9.msdn.com/Events/dotnetConf/2015/The-F-Path-to-Data-Scripting-Nirvana

In this kind of system, .NET code gets distributed and run over clusters of hundreds or thousands of machines. These represent a very important use of JVM and .NET-like systems and are some of the most promising applications of .NET in the cloud. They are not the kind of system where you connect your debugger to each of 10,000 .NET instances, something which obviously won't scale.

In this context, it is entirely normal to distribute code to other machines and to marshal results (including exceptions) back to host processes. This is normal on the JVM and normal for .NET Framework. So yes, the use case is valid.

All 37 comments

Why would you need to pass around all the exception details (the stack trace in particular) in a distributed framework?

@mikedn same reason why you would want it in a non-distributed setting. An exception without an accompanying stacktrace is useless in many cases.

same reason why you would want it in a non-distributed setting

That's the problem. In a non distributed setting the stacktrace is useful only in 2 cases - debugging and logging. Neither of them seem to require serialization (actually deserialization since that is the actual issue).

It might the case that you want to re-raise an exception with its original stacktrace intact. This is where ExceptionDispatchInfo is being used.

Well, yes, sometimes you want to preserve the stracktrace but that's only for logging and debugging purposes. But that's usually the case only inside the same process. Is there an actual need to preserve such exception details across processes?

@mikedn This is in the context of cloud-scale distributed programming systems such as AkkaDotNet and MBrace (http://mbrace.io). Please familiarize yourself with these systems, check out this video for example. https://channel9.msdn.com/Events/dotnetConf/2015/The-F-Path-to-Data-Scripting-Nirvana

In this kind of system, .NET code gets distributed and run over clusters of hundreds or thousands of machines. These represent a very important use of JVM and .NET-like systems and are some of the most promising applications of .NET in the cloud. They are not the kind of system where you connect your debugger to each of 10,000 .NET instances, something which obviously won't scale.

In this context, it is entirely normal to distribute code to other machines and to marshal results (including exceptions) back to host processes. This is normal on the JVM and normal for .NET Framework. So yes, the use case is valid.

This is in the context of cloud-scale distributed programming systems such as AkkaDotNet and MBrace ...

Yes, I know that. But that doesn't automatically imply that exception details such as the stack trace must flow across the system. The functionality of such a system does not depend in any way on such details.

They are not the kind of system where you connect your debugger to each of 10,000 .NET instances, something which obviously won't scale.

And that's why it's more likely to rely on exception logging (at the source of exception) instead of attempting to serialize the whole thing in the hope that someone will log/debug at destination.

Mbrace reraises the exception automatically on the client, right?

@forki not only that, depending on the workflow an uncaught exception raised in a remote job may bubble up on its parent job inside the cluster. This is not really mbrace-specific, it is perfectly common for actor implementations to pass exceptions as messages to a remote actor for further processing.

Akka.NET allows for an actor to supervise another actor on another machine. Parent actor on machine A supervises Child actor on machine B.

if Child actor fails, the entire exception is propagated to the Parent actor to be analyzed in its supervisor strategy.. it might be logged, it might be part of logic to decide what to do with the failed actor.

So allowing as much details about an exception to flow across systems as possible is important for us too.

I think that the original issue is slightly misleading because it focuses on things like stack trace that are potentially questionable.

What's more interesting is the custom data added not by the runtime but by the exception source itself. For example, SqlException has a bunch of readonly properties and some of them might actually be needed at the destination - SqlException.Number for example. There's no mechanism that allows deserializing such an exception (except reflection...). It's not even possible to create such an exception object because its constructor is internal.

There are reasons why WCF uses fault contracts. One of those reasons is that exceptions are simply not designed to work well cross process in the absence of ISerializable.

MBrace certainly benefits from the stack trace. If I write

let f1 () = ... raise exception ...

let f2 () = ... f1 () ...

cloud { f2() } |> cluster.Run

Then we expect an exception raised on the client and a stacktrace with f1 and f2. To be honest it's probably the most important part of the exception details we need :)

An easy solution would be to include stacktrace modification utilities inside the System.Runtime.ExceptionServices namespace. Roughly,

public static void SetStacktrace(Exception e, string stacktrace);
public static void AppendStacktraceEntry(Exception e, string stacktrace);

These could be utilised by serialization libraries for correct exception serialization without resorting to reflection. The latter method is also useful in slightly unrelated applications, for instance symbolic stacktraces in asynchronous methods [1,2]. With a bit of design effort, these could be incorporate to extend the (somewhat limited) ExceptionDispatchInfo class.

@gkhanna79 what does the milestone "Future" mean exactly?

@forki It means that the fix will be part of an update to the runtime later in the year. It is key for this since we need to be intentional about how the feature is going to be designed.

These could be utilised by serialization libraries for correct exception serialization without resorting to reflection.

You dont want to just stick with the stacktrace, especially if there would be other exception details you are interested in. It sounds like, for the case here, it maybe useful to determine how to pass ExceptionDispatchInfo across actor/process boundaries.

it is entirely normal to distribute code to other machines and to marshal results (including exceptions) back to host processes.

With the above said, I am wondering why is this unique to exceptions? Granted that exceptions is one of the artifacts required to returned/marshalled/serialized from one process to another but similar argument can be applied to bunch of other (potentially custom) types too. It sounds like this is about general purpose mechanism to flow the information from one process/actor to another and passing exception data (StackTrace in this discussion) is one of the use-cases for this general purpose mechanism. @weshaggard @stephentoub What are your thoughts around serializing data across process/component bundaries?

With the above said, I am wondering why is this unique to exceptions?

What makes exceptions stand out is that they share a common base type whose data is not user-accessible. Arbitrary types are usually standalone and their implementer can specify custom serialization rules, if indeed he wants them to be serializable.

Adding some serialization folks @zhenlan @SGuyGe @khdang

@shmao do you think if surrogate will be an option here?

In Orleans, automatic propagation of exceptions across distributed calls and machine boundaries is one of the most important features. A caller to a grain (actor) A receives an exception thrown in the call chain of, say, A calling grain B which in turn calls grain C regardless of where in the call chain the exception was thrown, even if A, B, and C run on different physical/virtual nodes.

This gives developers building apps/services on Orleans a moral equivalent of distributed, asynchronous try/catch semantics that just magically works across machine boundaries.

I also +1 @dsyme that for distributed, often high scale, applications logs are the only insight into root causes of failures, often times byzantine. Losing _any_ relevant information may cost extra days and weeks of hair pulling investigative work.

Exceptions in general and call stacks in particular are so important for us and our users that we go an extra mile to marshal information even when a non-serializable exception is throw - by extracting vital data from it and passing it as a string.

To lose serialization of exception details would be a drastic step back for Orleans and its users. I don't even understand the rational for that, to be honest. Performance can't be an argument, I think, as this is for already exceptional situation.

If one wanted to pass exceptions across processes in .Net Core, using ISerializationSurrogateProvider would be an option.

There was a discussion on how .Net Core would support serialization of types like exceptions. Here is the link to the meeting notes. The notes indicate,

Serialization team asked whether we should add ISerializable into System.Runtime. Our stance on this topic is that we don't want to make serialization a technology that the core has to know about. In particular, we don't want to add types that require core framework types, such as exceptions and collections, having to implement serialization logic.

Of course, we want the .NET platform to be able to provide the right building blocks for serialization, which includes reflection and design time code generation. We believe the right model for serialization is make it so that consumers can register handlers for serialization/deserialization of types. ... This model is referred to as surrogates and it's already supported by some serializers, such as data contract serializer

@shmao ISerializationSurrogateProvider provides one pattern for defining serialization logics externally, but it does not on its own constitute a way rebuilding exceptions that were raised remotely. The problem that motivated the creation of this thread is the removal of this constructor in particular. While you are right to say that this is code tied to a very particular serialization pattern, it also provided the only way in which we could safely and correctly reconstruct exceptions in a runtime agnostic manner (i.e. one that works both with .NET and mono).

Surely, the suggestion that we should move back to reflection serialization is misguided. The implementation of the exception base class is strongly tied to the particular runtime implementation and contains inherently nonserializable fields including pointers. Field-based serialization for exceptions is not version-tolerant nor cross-platform. The fact that exceptions can be mutated by the runtime even makes this a potentially dangerous ordeal.

What we need here is a successor scheme that allows us to safely reconstruct/reraise exceptions in a remote process. Decoupling this mechanism from ISerializable is a good step forward, however removing it altogether is a case of throwing the baby out with the bathwater.

@shmao having a surrogate provider is simply not enough.
Serializers needs to be both able to read the relevant data from each exception type, and then know how to reconstruct the object again.
Thats fine for POCO objects that are used as data carriers only, but types that are not designed for serialization in the first place, are really hard to deal with.
That is a problem in the current stack for types like the Bcl.Immutable collections.
The serializer need to make assumptions on what fields are relevant, and hardcode what constructor or object recreation strategy to use.. it is a huge pain.

If exceptions arent serializable out of the box, we will have the exact same problem as we have today with immutable collections.
With the big difference that the immutable collections are limited in number and we can hard code strategies for those.
Exceptions come in all shapes and forms, we can not rely on black magic for those.

Also, from a strategy point;
.NET have had a notoriously hard time to position itself as a platform for anything else than webpages and desktop apps.
Distributed and scalable systems are mostly a domain for Java, Scala and Erlang right now.

Now that we have a few attempts to change that with frameworks like Mbrace, Orleans and Akka.NET, I think it would be a strange move to make serialization even harder than it is today.

@RogerAlsing MS is a cloud company now. I assume supporting the major cloud programming frameworks is no. 1 priority. Since all maintainers of the big .NET cloud frameworks already expressed big interest here they are surely already working on this.

What if we simply exposed the setter for StackTrace property? Would it unblock the scenarios discussed here?

@krytarowski No, because the serializers could never know what constructor to call as it is no longer possible to create empty objects using FormatterServices.GetUninitializedObject

We now how to guess/estimate which constructor we should call to create the instance.
That further makes serializers incompatible between platforms.
e.g. even Json.NET handles things this way for full and portable framework.
It uses FormatterService for .NET full, and guesstimate a ctor to use for PCL.
Which could end up materializing objects with different state.
Serializers _must_ be able to handle data the same way on both ends if one end is .NET full and the other is using a lighter framework.

All exceptions have (or should have) the default ctor + ctor taking string message. So it seems like it would work for exceptions. Wouldn't it?

And no matter what we do, to make a type serializable, the designer of the type needs to either implement an abstraction or follow some convention. It seems to me like the following convention would work: expose ctor with parameter names corresponding to all get-only properties + a set of set-set properties.

@KrzysztofCwalina in principle it should improve things, though personally I would not want to expose the StackTrace property as settable to the average user. I'd rather have this achieved through helper methods hidden inside the System.Runtime namespace, so that it should only be used by developers of serialization libraries.

I commented in https://github.com/dotnet/corefx/issues/7938 that

FormatterServices.GetUninitializedObject() is absolutely necessary for Orleans-generated serializers. Without it we'll be entirely broken.

Pardon my ignorance, but why are people focusing on the ISerializable abstraction (etc) when what's actually wanted here, very specifically, is a way to:

  1. Convert an Exception (and sub-classes thereof) into a streamable form (e.g. string or byte[]), and
  2. Create an Exception from that streamable form, potentially on another system.

Exceptions are such a specialized type within .NET that providing some way of working with them like this seems eminently sensible, especially in the context of distributed systems, and it shouldn't be blocked just because ISerializable is a horrible abstraction in other contexts. And an Exception-specific solution would probably offer better performance, wouldn't it?

Relating to other threads, the same might be true for lambda...

@markrendle Without ISerializable, how would re-creating custom Exceptions work? Currently they just subclass and add whatever fields or properties they like. How would the serializer know enough to pull all the relevant fields out and then re-create it on the other side?

Maybe instead of re-creating a full-fledged exception, we can re-create a sort of ExpressionInfo type that holds all the detail of an arbitrarily sub-classed Exception? Say for instance, that this type can be created even if we don't have that exact type in the receiving system?

Going a step further, what if, when re-throwing, we could pass the ExpressionInfo into the inner exception parameter of the new Exception?

I thought of this when thinking about how Type and TypeInfo work now and thought maybe that thinking could apply to a remote Exception.

We now have BinaryFormatter back in 2.0, and some exceptions are serializable with it. We are considering increasing the set. We do in general advise against taking new dependencies on [Serializable]/BinaryFormatter based serialization.

@danmosemsft does this mean you've brought back ISerializable for exceptions or that exceptions are only serializable via BinaryFormatter? The two are not the same thing.

Hi @eiriktsarpalis . For your first question: we have brought back BinaryFomatter and marked Exception and AggregateException [Serializable] (we have had requests to add other exceptions, but right now have not). It happens those implement ISerializable in order to help the binary formatter so we brought back that too. Those exceptions that we did not mark [Serializable] and historically implemented ISerializable, we made ISerializable throw PlatformNotSupportedException. ISerializable should not be used on types that are not marked [Serializable] -- that is documented.

For your second question, you are free to serialize any type by any means you wish. Some options are listed in this document. Note that document refers to 1.x but is still accurate in this regard.

Does that help?

Can I ask what the motivation for excluding certain exception types is? For distributed frameworks such as mbrace, akka.net and orleans (whose key people have commented in this very thread) this is is crippling.

The framework exception types transitively depend on a lot of other types that are not serializable in .NET Core. Making all exception types binary serialiazable would bring back https://github.com/dotnet/corefx/issues/19119.

I see.

I think there's a tendency of conflating ISerializable with BinaryFormatter. Even though BinaryFormatter does utilize ISerializable, there are other (non-binary format, non-legacy) serialization libraries that make use of it. It's an essential abstraction (read: the only available abstraction) for extracting useful exception metadata without resorting to reflection.

No point in keeping this issue open then.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

noahfalk picture noahfalk  路  3Comments

GitAntoinee picture GitAntoinee  路  3Comments

Timovzl picture Timovzl  路  3Comments

EgorBo picture EgorBo  路  3Comments

matty-hall picture matty-hall  路  3Comments