Roslyn-analyzers: Design for "Dispose ownership transfer" in dispose rules

Created on 26 Feb 2018 · 11Comments · Source: dotnet/roslyn-analyzers

We currently have following related dispose rules that ensure that disposable objects are disposed by the creating method and/or containing type:

1) CA2000 (Dispose objects before losing scope): A local object of a IDisposable type is created but the object is not disposed before all references to the object are out of scope.
2) CA2213 (Disposable fields should be disposed): A type that implements System.IDisposable declares fields that are of types that also implement IDisposable. The Dispose method of the field is not called by the Dispose method of the declaring type.

So, a disposable object created in a method body, which is not saved into an instance field and/or escaped via either calling into an external library or returned by the method, must be disposed in a method. If assigned to a field of a disposable object, then the Dispose method (or call graph rooted at Dispose) of the containing type must dispose the field. If escaped via returning the disposable object, then the caller takes over the ownership to dispose the object.

These rules seem appropriate, except for the cases where a _dispose ownership transfer_ takes place from method that creates a disposable object to a wrapper object that holds onto this disposable object into a field and disposes the field when the wrapper is disposed. For example, see ModuleMetadata. According to above dispose design, the callers of the ModuleMetadata constructor have dispose ownership of the passed in dispsoable PEReader argument, and hence are all flagged. In practice, ModuleMetadata is wrapper that creates a PEModule object that wraps the PEReader object and disposes it when the PE module is disposed. Dispose ownership is transferred from creators of ModuleMetadata, to the ModuleMetadata object to the PEModule object. We need a complete inter-procedural dataflow analysis to figure out this dispose ownership transfer and ensure that either PEModule.Dispose disposes the PE reader (CA2213) or the methods creating PE reader disposes it (CA2000). Legacy FxCop as well as our current implementation do not perform inter-procedural analysis, and generates false negative CA2000 instances for these cases.

Design choices:

Keep reporting CA2000 false positives when dispose ownership transfer takes place. End users must explicitly suppress violations in source when this takes place. However, note that CA2213 will not flag such cases even if the actual owner of the disposable object after this ownership transfer does not dispose the wrapped disposable object - CA2213 fires only if a disposable field was assigned to a locally created disposable object in one of the containing type's methods. This causes us to miss reporting cases where there is an actual dispose violation.
Add support for end users to explicit specify dispose ownership transfer method/parameter pairs in an additional file that the dispose rules can use. Essentially implement https://github.com/dotnet/roslyn-analyzers/issues/1587. Dispose rules can correctly detect such ownership transfers and we avoid CA2000 false positives and don't miss CA2213 violations in eventual owner.
Stay with option 1 for now and fix it when and if we eventually add inter-procedural analysis. Note that the analyses will still be more computationally expensive than approach 2, but doesn't need user annotations/configuration as in approach 2.

I believe this issue captures one of the core design issues around dataflow analysis -

perform less expensive and precise analysis that generates known false reports that can be explicitly suppressed in source OR
perform more powerful but computationally expensive inter-procedural analysis OR
take a hybrid approach of user annotations to aid analyses to produce more precise results with not so expensive analysis

I prefer 3 as it also provides an explicit (pseudo) documentation/contracts and allows user configuration for cases that analyses just cannot detect. Primary downside would be if such annotations or user configuration just grow too much and get out of sync with actual user code.

DataFlow Discussion

Source

mavasani

👍1

Most helpful comment

https://github.com/dotnet/corefx/issues/37873

JohanLarsson on 4 Dec 2019

👍2

All 11 comments

Putting some more thought, I think approach 3., i.e. interprocedural analysis, to track how the disposable instance moves around shouldn't be super expensive as the instance generally does not get passed around multiple times (given it is a disposable object!), and we can enhance CA2000 and CA2213 to do a lot better job at reducing false positives and also enhance CA2213 (currently it misses out some true leaks due to its design: CA2213 fires only if a disposable field was assigned to a locally created disposable object in one of the containing type's methods. This causes us to miss reporting cases where there is an actual dispose violation.)

Tagging @dotnet/roslyn-analysis

mavasani on 2 Mar 2018

I would be interested in seeing some improvements to the heuristics involved with intra-procedural analysis.

Single transfer

A warning should be reported if code flow results in ownership transfer of an object not owned by the current method/instance. This covers multiple transfers of the same object as well as unintentional transfer of ownership e.g. via an argument. This rule is intended to expose false negatives.

Object construction

The first such rule would involve transfer of ownership via object construction. This pattern occurs when an argument is passed to a constructor under the following conditions:

The declaring type of the constructor is disposable
The declared type of the parameter corresponding to the argument is disposable

Caller analysis

The caller is assumed to transfer ownership of an object when passing an argument to the constructor with the properties given above.

Callee analysis

A constructor is assumed to have gained ownership of an object when passing an argument under the conditions above.

sharwell on 2 Mar 2018

The first such rule would involve transfer of ownership via object construction

I am strongly opposed to adding any heuristic for dispose ownership transfer. There are multiple reasons:

Dispose rules are much special then any rules - lots of false positives are fine, but we should not mask _true_ leaks. This heuristic guarantees that we are masking cases where passing an object to a constructor does not lead to eventual dispose of that object.
Legacy FXCop implementation did not have such a heuristic for a very specific reason - it was designed to be more aggressive because customers asked for such a behavior. When we have tried to cut down implementation of dispose analyzers citing high level of noise as the underlying reason, there has been an extremely strong opposition from the community. In fact it has been explicitly stated multiple times that noise level is not an issue with dispose as it allows users to understand and/or improve their own code structure, and if it indeed turns out be a false report, suppressing is no big deal. See examples here: https://github.com/dotnet/roslyn-analyzers/issues/695#issuecomment-182399857, https://github.com/dotnet/roslyn-analyzers/issues/695#issuecomment-224253449 and https://github.com/dotnet/roslyn-analyzers/issues/695#issuecomment-232643339.
If the false positive rate is indeed concerning we have approach 2 and 3 available to us - A quick prototype can help estimate the improvements from 3, with any related performance implications (which I believe should not be too much as disposable objects do not get down around 100 successive method calls or such). If inter-procedural is not what we want to go, we should take approach 2 and allow users to explicitly define dispose ownership contracts to avoid in-source suppressions.

Overall, I think the least preferred approach for dispose ownership rule is to add any heuristic to mask real failures. Current state gives us parity with FXCop and hence we can wait on implementing 2. or 3 for v2 of DFA rule, i.e. once we have ported all DFA rules and also moved to compiler's CFG. @jinujoseph to confirm, but I think we have had parity as our v1 benchmark since starting work in this area.

mavasani on 2 Mar 2018

@sharwell I also just read your comment here: https://github.com/dotnet/roslyn/pull/25179/files#r171921736

We can avoid masking by pairing the transfer heuristics - each case where signatures treat the code as a transfer of ownership, both the caller and callee agree this is the case

This is not possible without inter-procedural analysis. To identify who owns or disposes an object that moves around methods needs inter-prodedural analysis. Otherwise, we are going to have false positives either at the caller or callee. Legacy FXCop implementation decided to retain the false positives in CA2000 (i.e. the caller) and your recommendation will move to them to the callee (CA2213). I don't think we gain anything by moving away from original FXCop design as any design without inter-procedural analysis OR annotations will lead to false positives. In fact we introduce a big risk of introducing a barrage of new false positives for CA2213. Legacy FXCop customers now end up with needing to suppress them at caller (done in past for CA2000 false positives) and now at callee (your proposed heuristic).

mavasani on 2 Mar 2018

don't we need some kind of annotation at the end, anyway? we don't need annotation for parity but to be better than FxCop, I think we said we need annotation before? since .net or language itself doesn't have concept (or a way to expresses explicitly) of ownership transfer ?

heejaechang on 2 Mar 2018

we don't need annotation for parity but to be better than FxCop, I think we said we need annotation before

We should surely add something here to improve over FXCop - no arguments to this. I think we want to at least prototype or design an inter-procedural approach that needs least user work and understand the performance and implementation costs. If these are indeed reasonable, then we want to try it out. Otherwise, we can always implement the annotation strategy, whose cost is already pretty well understood.

mavasani on 2 Mar 2018

I am strongly opposed to adding any heuristic for dispose ownership transfer ...

I don't think we gain anything by moving away from original FXCop design as any design without inter-procedural analysis OR annotations will lead to false positives.

The rule already operates on heuristics, and already produces a non-trivial number of false positives. Furthermore, it is trivial to demonstrate that some intra-procedural approaches produce more false positives than others. To demonstrate this, we simply remove the treatment of return values as a transfer of ownership. Given that we have not tried any alternatives to these rules in practice, it is not unreasonable to think that other approaches could produce more accurate results without resorting to to inter-procedural analysis or annotations not already in place.

An ideal solution would have the following characteristics:

A set of rules can be defined which, when followed, produce code which does not contain disposable object resource leaks (or non-deterministic release).
The set of rules can be understood and applied correctly by a mid-level engineer.
The set of rules can be fully evaluated by an analyzer using intra-procedural analysis only, producing no false positives when the rules are followed.
The set of rules can be followed by adhering to a subset of established .NET conventions without the introduction or use of attributes dedicated to this purpose.
The conventions used in the previous step align with conventions commonly applied to new .NET code.

The inter-procedural approach may produce better overall results in the end, but relying on it too early will limit our ability to meet the third goal, or worse, the second goal. Annotations (attributes or otherwise) have a similar impact on the fourth goal.

sharwell on 3 Mar 2018

I prefer 3 as it also provides an explicit (pseudo) documentation/contracts

Yes, I agree. Don't forget that it is not only the analyzer that needs to understand the code, but users who reads the code needs to understand how to interact with it. I would prefer not only have to rely on tooling for that.

Liander on 3 Dec 2019

Primary downside would be if such annotations or user configuration just grow too much and get out of sync with actual user code.

Analyzers will be able to check that annotations are in sync in many places.

JohanLarsson on 4 Dec 2019

https://github.com/dotnet/corefx/issues/37873

JohanLarsson on 4 Dec 2019

👍2

users who reads the code needs to understand how to interact with it.

That's actually the case right now. Users have to read the docs to know whether a return value must be disposed or not.

We have had the same discussion with @JohanLarsson on https://github.com/DotNetAnalyzers/IDisposableAnalyzers/issues/126 (and on the IDisposableAnalyzers' gitter), and we came to the conclusion that annotations were needed to make it correct, but the adoption would be limited by the fact that the Annotations package would need to be implemented in every project.

We then ended to the conclusion that something needed to be done in .NET core directly, and this is why I proposed https://github.com/dotnet/corefx/issues/37873 . Opinions welcomed there 😃