Roslyn: C# Pure Function Keyword to Mark No Side Effects Or External Dependencies

Created on 17 Dec 2015 · 32 Comments · Source: dotnet/roslyn

Be able to mark a method as having no side effects or external dependencies, i.e., it does not change any state outside its inputs or outputs. Any code that attempts to do this would throw an exception. My thought was that the keyword could be "functional", "pure" (as in the "pure functions" mentioned in some MSDN documentation), "purefunction", or even "nosideffects".

See https://msdn.microsoft.com/en-us/library/bb669139.aspx for some current naming conventions and reasons for this feature.

Area-Language Design Discussion


All 32 comments

:question: Considering we already have [Pure], which is very short and doesn't require new keywords be added, what additional benefit(s) would this proposal bring?

:memo: Note that I'm not necessarily against this proposal. I'm just trying to get some more context. :smile:

If the compiler were to start enforcing method purity through the addition of new warnings or errors, then a new keyword might be necessary in order to avoid breaking changes.

private int x;

public void Unpure() {
    x++;
}

[Pure]
public void Pure1() {
    Unpure(); // legal, no change to existing code
}

public pure void Pure2() {
    x++; // compiler error, side effects
    Unpure(); // compiler error, method not marked as pure
    Pure1(); // legal, method marked as pure (even if it might not be)
}

An analyzer that could issue warnings based on PureAttribute would probably be a good start, though.

Doesn't pure imply "always evaluating the same result value given the same argument value"? I think C++ const would be more familiar, e.g. void M() const { }, or whatever #159 would use.

I like this idea, but how do we verify that a function has no side-effects, recursively? Does memoizing count as a side-effect? If not, how can that be verified? Even if #49 is implemented so we can encapsulate that ConcurrentDictionary instance inside the function, we cannot mark GetOrAdd() as [Pure], because it isn't.

@orthoxerox I think it needs an immutable map, and yet map itself is a mutating state, probably needs a state monad or something? Then recursion is off the table, I guess.

@alrz One option would be to add Memoize as a new parameter to [Pure]. If the original function was verifiably pure, this new parameter would force the compiler to rewrite it into something like the second block below.

``` c#
[Pure(Memoize=true)]
modifiers return_type name(args)
{
    body;
}
```

``` c#
[Pure(SkipVerification=true)]
modifiers return_type name(args)
{
    return_type mangled_name(args) {
        body;
    }
    static let memos = new ConcurrentDictionary<ValueTuple<args_types>, return_type>();
    static let locks = new ConcurrentDictionary<ValueTuple<args_types>, object>();

    return_type result;
    let a = ValueTuple.Create(args);

    if (!memos.TryGetValue(a, out result)) {
        var l = locks.GetOrAdd(a, new object());
        lock (l) {
            result = memos.GetOrAdd(a, mangled_name);
        }
        locks.TryRemove(a, out l);
    }
    return result;
}
```
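
The rewrite above is a template rather than compilable C#. As a rough illustration of the same idea in library form, here is a minimal sketch that caches results in a ConcurrentDictionary of Lazy values, so the body runs at most once per argument without a separate lock table. The names Memo, Memoize, MemoDemo, Square and Calls are hypothetical, and nothing here verifies purity: the wrapped function is simply assumed to be pure.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class Memo
{
    // Wraps a function that is ASSUMED (not verified) to be pure in a thread-safe cache.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> f)
        where TArg : notnull
    {
        var cache = new ConcurrentDictionary<TArg, Lazy<TResult>>();

        // GetOrAdd may build more than one Lazy under contention, but only the
        // stored one is ever evaluated, so f runs at most once per distinct key.
        return arg => cache.GetOrAdd(
            arg,
            key => new Lazy<TResult>(() => f(key), LazyThreadSafetyMode.ExecutionAndPublication)
        ).Value;
    }
}

public static class MemoDemo
{
    // Call counter kept only to make the caching observable in this demo.
    public static int Calls;

    public static int Square(int x)
    {
        Interlocked.Increment(ref Calls);
        return x * x;
    }
}
```

Calling `Memo.Memoize<int, int>(MemoDemo.Square)` and invoking the result twice with the same argument runs the body only once. Note that the counter itself is exactly the kind of side effect a purity check would have to reject, which is the verification problem raised earlier in the thread.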

Just a comment.

If a method returning void is marked as pure, the compiler should be able to remove it as it has no side-effects.

@leppie Common cases where you may want code that doesn't affect program state but don't need to return a value:

  • Benchmarking
  • Unit tests - here _exceptions_ could be considered a meaningful return value, which means even pure methods marked as void can have a return value of sorts

Removing the following would probably not be desirable, yet it's arguably a pure method:

public class Requires
{
  [Pure]
  public static void NotNull<T>(T value, string paramName)
    where T : class
  {
    if (value == null)
      throw new ArgumentNullException(paramName);
  }
}

@sharwell The presence of a possible throw hardly makes it 'pure' :) But I get what you're saying. Perhaps pure is not the best word here.

@orthoxerox Memoization wouldn't make it to the language (already proposed in #5205).

The existing [Pure] attribute is not particularly helpful, as it doesn't even produce a squiggle if you write to fields. I'm all in favor of a better way to mark methods that shouldn't have side effects. Right now I am often forced to try to use static methods for this purpose, but that only goes so far because sometimes you need static fields and there's nothing to stop you from writing to them.

> The existing [Pure] attribute is not particularly helpful, as it doesn't even produce a squiggle if you write to fields.

This could be implemented as an analyzer. However, it's a bit complicated.

  • Writing to non-static fields of instances directly or indirectly created by the pure method would probably be allowed. This means a pure method can in some cases call a non-pure method, without removing its effective purity.
  • Writing to instance fields of a struct which is passed as a parameter would probably be allowed, unless the parameter uses ref. Writing to a struct parameter with ref can be fine as long as the reference points to a stack-allocated struct in a caller.
  • Creating instances is generally OK, as long as the constructor is also pure. Unlike other methods, pure constructors _can_ write to their own instance fields.

    • Constructors of types which have user-defined finalizers (including in a base type) cannot be pure unless the finalizer is also pure.

Does a so-called "pure function" as a sole feature really help in a non-immutable type? C++ allowed this and instead disallowed it for static methods. Makes sense that way, but with _immutable types_ I doubt _pure functions_ as a distinct entity would make the world any better. I mean, having partially-immutable types might be confusing, yet the C++ way of "purity" might be a better approach: purity at the function level and immutability at the variable declaration (type usage) level, instead of at the type declaration level. This would allow declaring variables like "immutable arrays", e.g. int immutable[] arr = {3,4};, which I think even #159 couldn't address very well (via immutable unsafe).

Concept of "pure" does not have a single clear definition between languages, so it might be better to use some alternative terminology.

E.g. when I researched this last time, here's what I ended with:

  1. In D the only limitation is non-mutation of global state. "Pure" functions can mutate their arguments.
  2. In GCC there are two types of "pure": pure (no side effects, but can read global state) and const (strictly pure as per the Wikipedia definition).
  3. In C#, [Pure] is defined as "does not make any visible state changes" (whatever that is).
  4. Haskell follows the Wikipedia definition (deterministic + no side effects).

http://cs.stackexchange.com/questions/24406/is-there-a-canonical-definition-of-pure-function

That's not even starting on how exceptions should behave.

I think each limitation we could apply to "pure" has its own uses, e.g. determinism excludes reading mutable state -- good for concurrency. So maybe some more complex state attribute(s) are needed.


And if we look just at side effects, there is another question -- is this pure?

public string M() {
    var builder = new StringBuilder();
    builder.Append("Hello world!");
    return builder.ToString();
}

It can only be verifiably pure if StringBuilder.Append is marked with some variant of mutability attribute that specs self-mutation but not outside-mutation. Which again brings the need for more complex mutability descriptions.

@ashmind How about isolated for StringBuilder.Append or the whole StringBuilder class?

Local mutation within a method whose variables are not captured (free) would not be impure to me.

@alrz
I think at least the following qualities are needed (I'm not suggesting the keywords, just trying to see the whole picture).

| Function quality | Description |
| --- | --- |
| CanChangeExternalState | Non pure, default behavior |
| CanChangeArguments (including this) | Non pure, but can be used as pure if the arguments don't come from external state. E.g. combination of new StringBuilder and any number of StringBuilder.Append is side-effect-free and deterministic. |
| CanReadExternalState | Pure by some definitions, but might not be safe for concurrency |

That also raises a question of ownership -- let's say we have a class X that has internal StringBuilder in a field. If we can demonstrate that this StringBuilder is _owned_ by the class, then we can prove that changing it is changing this and not external state. So some kind of [Owns] annotation would be useful.

@ashmind I didn't understand, having said isolated or "CanChangeArguments" methods (only able to change _internal_ state) what is the need for ownership qualifiers? by "internal" we mean "not leaking outside of the class", so they must be private right? I mean a private state doesn't imply it _belongs_ to the enclosing type? and can you please elaborate on "CanReadExternalState" what are its use cases?

> I didn't understand, having said isolated or "CanChangeArguments" methods (only able to change internal states) what is the need for ownership qualifiers? by internal we mean not leaking outside of the class, so they must be private right? I mean a private state doesn't imply it _belongs_ to the enclosing type?

Example:

public class Changer {
    private readonly Changeable _inner;

    public Changer(Changeable inner) {
        _inner = inner;
    }

    public void Change() {
        _inner.X = "Y";
    }
}

Is this class changing external state or only state it owns? It's uncertain and depends on whether inner is owned by this instance, or whether it might be shared. One example where this is already important is disposal -- e.g. Autofac has Owned<T> references that specify that the instance is owned and will be disposed by the owner.
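
To make the ambiguity concrete, here is a small sketch of the two situations. Changeable is not defined in the thread, so the version below (a class with a single public string field X) is an assumption, as is the OwnershipDemo helper.

```csharp
using System;

// Hypothetical stand-in for the Changeable type used in the example above.
public class Changeable
{
    public string X = "initial";
}

public class Changer
{
    private readonly Changeable _inner;

    public Changer(Changeable inner)
    {
        _inner = inner;
    }

    public void Change()
    {
        _inner.X = "Y";
    }
}

public static class OwnershipDemo
{
    // Shared case: the caller keeps its own reference, so the mutation made by
    // Change() is visible outside the Changer and counts as external state.
    public static string ChangeShared()
    {
        var shared = new Changeable();
        var changer = new Changer(shared);
        changer.Change();
        return shared.X;
    }

    // Owned case: nothing else holds a reference to the Changeable, so the
    // mutation cannot be observed from outside the Changer.
    public static void ChangeOwned()
    {
        var changer = new Changer(new Changeable());
        changer.Change();
    }
}
```

Both cases use the same Changer code, so the class alone cannot tell an analyzer which one applies. That is the gap an ownership annotation would have to fill.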

> and can you please elaborate on "CanReadExternalState" what are its use cases?

Reading external state makes a function potentially unsafe for threading, and unsafe for permanent memoization. On the other hand, it would mean that the function does not change external state, and so it is safe to call automatically in debugging, for example.

@ashmind (1) ok, assuming that _inner belongs to the Changer class, how would you know that argument passed to the constructor is not shared? (2) I'm thinking in #7626, so "CanReadExternalState" doesn't provide anything useful for immutability enforcement, right?

PS: I think the answer to the number (1) is in #160. Perhaps, a type qualifier would be better than move I guess.

Considering that PureAttribute already exists and it has been applied to some percentage of the BCL, _assuming_ (and this is a big assumption) that this has been done using a somewhat consistent set of rules, I think that any direct support for pure functions in the C# compiler should adhere to those same rules.

If that's not the case I think that the C# compiler should pick a set of rules and run with it. Trying to adopt many flavors of pure from many different languages seems like a recipe for disaster. However, I could see value in offering that level of information within the syntax tree available to analyzers.

@HaloFour Not from different languages; these are just concepts tied to immutability. If you want a safe environment to code in, I think it's good to take advantage of these concepts. It encourages you to actually think about isolation and immutability in your design and prevents you from shooting yourself in the foot.

@alrz What other languages consider "pure" methods was mentioned by @ashmind. I understand that there are different concepts around immutability, but it doesn't make sense to try to take one concept like "pure" functions and to attempt to accomplish all of them when they differ in design. My point is that the CLR already has "pure" functions, as decorated by the existing attribute, and it makes the most sense for C# to adhere to the same conventions already applied rather than trying to forge its own path, or worse, trying to define some crazy matrix of degrees-of-purity.

@HaloFour There are two paths that can be taken for immutability enforcement in a language. F# does this by making mutability a special case, e.g. the mutable keyword, but for C# this is not applicable because everything is mutable by default. Deep immutability (#7626) on the other hand, as Eric said, "has strict requirements that may be more than we need, or more than is practical to achieve." Two scenarios in which this becomes a problem are initialization (like #229) and iteration. I can imagine that "isolation" would be helpful in these situations, while it doesn't affect "purity" as far as immutability is concerned.
For example, if you want to use property initializers or iterate over an immutable list, I think it makes sense to annotate property setters like const int P { get; isolated set; }. Also, to be able to use foreach, GetEnumerator should be annotated as such, because MoveNext is not pure by definition.

There's another benefit to having purity enforced by the compiler (or, at least, to have the compiler reasonably confident about purity) -- some of the artificial constraints around covariance would be lifted. For example:

If we define a very simple ISet interface

interface ISet<T> : IEnumerable<T>
{
    bool Contains(T item);
}

Unfortunately we can't declare our Set interface as ISet<out T> because the Contains method uses T in an input position; something the language disallows to prevent inserting a Banana into a list of Fruit that is _actually_ a list of Apples.

But! In a set you _should_ be able to safely check whether it contains a given item even though the collection is covariant. Why? Because the contains function is _pure_. So the following could be allowed by the compiler:

interface ISet<out T> : IEnumerable<T>
{
    [Pure] bool Contains(T item);
}
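
The hazard behind the variance rule, and why a read-only check sidesteps it, can be reproduced today with C# arrays, which are covariant for reference types and enforce safety at run time instead. The sketch below is only an illustration: CovarianceDemo and its Fruit, Apple and Banana types are hypothetical names, and it does not model the proposed [Pure] relaxation itself.

```csharp
using System;

public static class CovarianceDemo
{
    public class Fruit { }
    public class Apple : Fruit { }
    public class Banana : Fruit { }

    // Writing through a covariant view is the unsafe direction: the runtime
    // has to reject a Banana being stored into what is really an Apple[].
    public static bool WriteIsRejected()
    {
        Fruit[] fruits = new Apple[1]; // legal: reference-type arrays are covariant
        try
        {
            fruits[0] = new Banana();
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    // A membership check only reads elements, so taking the item as an input
    // cannot corrupt the underlying Apple[] no matter which Fruit is passed.
    public static bool Contains(Fruit[] fruits, Fruit item)
    {
        foreach (var fruit in fruits)
        {
            if (ReferenceEquals(fruit, item))
            {
                return true;
            }
        }
        return false;
    }
}
```

Here Contains uses T in an input position yet stays safe on a covariant view, because it never stores the argument anywhere.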

Being pure means:

  • Not calling any unpure methods (that includes properties)
  • No assigning to fields, properties etc (basically anything that isn't a local variable)
  • Not accessing any fields unless they are declared readonly

That should cover most of the basics. In theory, if you can't access any impure data (e.g. via properties or methods) then your function kind of has to be deterministic as well...
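
As a rough sketch of how those three rules would classify ordinary members, consider the class below. Circle and its members are hypothetical, and the comments describe what the rules as stated would say, not what any existing compiler or analyzer enforces.

```csharp
using System;

public sealed class Circle
{
    private readonly double _radius; // readonly: reading it is allowed by the third rule
    private int _areaCalls;          // mutable: off limits to a pure member

    public Circle(double radius)
    {
        _radius = radius;
    }

    // Would qualify as pure: reads only a readonly field, a constant and a
    // local, assigns nothing but the local, and calls no other members.
    public double Area()
    {
        double r = _radius;
        return Math.PI * r * r;
    }

    // Would be rejected: it assigns to a field (second rule).
    public double CountedArea()
    {
        _areaCalls++;
        return Area();
    }

    // Would be rejected: it reads a non-readonly field (third rule), so its
    // result can differ between calls on the same instance.
    public int AreaCalls => _areaCalls;
}
```

One caveat the rules do not cover: readonly only protects the reference, so a readonly field that points at a mutable object would still let changing data leak into a "pure" member.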

I was just getting ready to propose this exact feature. My assumption was that pure members could only call other pure members. This would be an improvement in cases where I've created static methods just to narrow the reach available to the statements within the method. Could having such a pure keyword assist the compiler in optimizations as well? The compiler should obviously be able to infer pure methods in order to make those optimizations, so I suppose the use of the keyword would be more about making the contract (for lack of a better word) more explicit.

I see this being useful for code readability and developer experience in an IDE or code editor. For example, hovering over a method call in a body of code could indicate that it is pure, which gives me an immediate assurance that the method isn't modifying any state. Another option would be to subtly change the syntax colorization to make pure calls distinct.

A possible extension of this could be to allow for a sort of local purity scoped by a class where only fields on the local instance or other members also marked as _local pure_ could be invoked. This would allow class implementations that could guarantee that it doesn't reach out to any global singletons or anything like that. The keyword that would make the most sense here to me would be isolated. Both the proposed _pure_ keyword and a hypothetical _isolated_ keyword seem to be sort of inversely related to the normal access modifiers (public, private, protected, ...). I think it would be crucial to make sure that if introduced that they have an obvious separation in the syntax.

public:pure int add(int x, int y);
public class Person
{
    int Weight { get; private set; }

    public:isolated void eat(Food item)
    {
        this.Weight += item.Weight;
    }

    public void shout(string phrase)
    {
        Console.WriteLine(phrase.ToUpper());
    }
}

In the example above a class could be marked isolated which would enforce that a class could only contain members that are themselves isolated.

public:isolated class Person
{
    public void shout(string phrase)   // Compiler error, shout is not isolated
    {
        Console.WriteLine(phrase.ToUpper());
    }
}

Perhaps the same may make sense for the _pure_ keyword in that it could be used at the class level to ensure that all members are pure members.

So... should we use a F*-style effect marker that forms a lattice?
We have Total at the lowermost position (purely functional, guaranteed return), then we have Divergence, State and Exception. And the effect marker of "general" C# functions is CSharp...

cc. @nikswamy @catalin-hritcu @simonpj

Issue moved to dotnet/csharplang #776 via ZenHub

Moved the issue over to csharplang repo to continue the discussion there. Thanks

> Doesn't pure imply "always evaluating the same result value given the same argument value"?

Actually, Unreal Engine 4 (their visual scripting, not C++) uses pure functions to denote functions that "promise" not to have side effects. I put promise in quotes because you can still modify things, but the nature of these functions and of the visual scripting meant that they basically had to be used as the equivalent of an argument to another function, i.e., only for the return value.

If it wasn't used for that, it had no way of being called. So, pure is very familiar to me and I think it makes sense. const in C++ made me think it meant it dealt with function pointers and const functions couldn't be set or something else.

@AustinBryan, coincidentally perhaps, it looks like Rust has opted for the const keyword to implement something very similar to what is being asked for here.

https://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-2018.html#const-fn

Surely benchmarking and unit tests have side effects: they produce reports.
