Rfcs: Pure functions/effect system

Created on 26 May 2016  Â·  53Comments  Â·  Source: rust-lang/rfcs

Can we get pure fn which would be a way to indicate a function has no side-effects/can only call other pure fns?

It'd prohibit mutation of statics, pointers and references, and heap allocations. Traits would be able to demand pure fns, which would be useful for getters.

T-lang

Most helpful comment

I believe the general solution for this which is not completely cumbersome, and can support various predicates on the behavior (another useful one is "can panic/abort") is an effect system, but I can't find open issues on anything like that.

All 53 comments

Would unsafe would allow mutation or calls to non-pure functions inside a pure fn? I suppose say taking a reference to an Arc<T> is not pure since it must update the reference count?

@burdges Taking a reference to Arc<T> is free, .clone() does a refcount increment.

I believe the general solution for this which is not completely cumbersome, and can support various predicates on the behavior (another useful one is "can panic/abort") is an effect system, but I can't find open issues on anything like that.

@burdges I guess unsafe _could_ be used to say "this doesn't strictly follow pure rules, but we can guarantee it's still pure ourselves"...

I guess unsafe blocks inside pure impls would be useful for RPC and stuff.

I'd agree with @eddyb that one should work out how a (non-)effects system might fit into Rust.

A priori, I'd wager this requires first working out how higher-kinded types fit in, if only because effects are monads in Idris. Ain't clear if a non-effect like pure can or should be viewed a monad, comonad, etc., but I doubt you drop abstraction with the non.

There might be interesting papers on using HTKs to do non-effects like pure in non-pure functional languages. Ask @edwinb if he's seen anything similar. :)

Rust's affine types would in fact already enable having a really strong and flexible notion of purity that's well-integrated with the existing type system, without having to add any major new features or annotations on top of it. If you chase links from this recent discussion I had on reddit you can read more of my thoughts about it. TL;DR there'd be a zero-sized IO type (a capability token) that's not Copy or Clone, I/O and shared mutable state like Cell and RefCell would require it, and any fn and &Fn which don't receive IO as an argument could _therefore_ not be anything other than pure. Further refinements possible.

But it's not something that could be easily retrofitted onto Rust 1.x.

I am against this because of the lack of generality. pure is just one of many effects systems, which could be useful. Ideally, these should be expressed through the type system (i.e., as traits implemented by function pointers and closures), but I am unsure how it can be "restricted" (in the sense that pure code can't call impure code, non-panic code can't call panicable code, and so on). In particular, there are two ways of breaking such an restriction. The first one being by propagating it, namely letting callers carry the same bound. The second way is local and non-propagating way, which might not always be allowed.

Either way, such an addition should be general and most importantly _expressive_, covering lots of possible use cases (an example is the insecure, denoting standard non-UB security problems, which is frequently asked for).

Finally, I could see such a feature being abused in one way or another, especially since it might not be consistent across libraries. One way that could be solved is to provide a medium sized collection of common "effect systems" (in the lack of a better term), but uniformity is still not guaranteed: each library might have its own definition of "secure" and "not secure", and these might or might not play well together.

pure should be synonym for "mathematical function", and unsafe inside pure should let you use I/O for RPC.

Design by contract may be a possible path for this by allowing a contract to flag all side effects a function may generate. With Spark14 (maybe Ada12 as well), you specify which globals/statics you may read from, which you may read & write, and which parameters you may write. Params passed in are assumed read access. Then would need the compiler to propagate those contracts and enforce that if B calls A, then B's contract must contain the same level of side effects as A in it's contract. For backwards compatibility, no listed contracts would have to be assumed as "anything goes".

I would not agree with allowing unsafe or impure blocks inside of a pure. It breaks the contract and purpose of saying the function is pure. I use layers where the outer-most layer handles IO/user input/anything external to the app, calls an internal pure-layer to do processing, then that returns something to return back out of the app.

I think "unsafe pure" is "hey compiler, you can't check the purity of this but I'm a smart developer so trust me on this".

Allowing some RPCs to be pure would be nice, and "unsafe pure" does exactly that. In theory you'd need to use unsafe { unsafe { stuff here } } to be able to do unsafe things, in practice that'd be a pain for the compiler and you'd instead either have impure {} or just use a plain unsafe for both.

The compiler can't guarantee a RPC is pure because 1. it knows nothing about RPCs 2. it's implemented over I/O which's impure, but that doesn't mean a RPC isn't pure.

Take, for example, getting the title of a Guake tab. This must be implemented as a RPC, but it's still pure as it mutates nothing.

Huh, I may be crazy, but: How much of the pure fn stuff is already available by using const fn?

@killercup I think there are quite a lot of limitations in the current RFC for const fn per se (see here, for example), and it's also quite likely that any existing implementation for that is even more limited.

That was my expectations as well. There are (unofficial) plans to use Miri
in the future for const eval, though.

Ilyas Gasanov [email protected] schrieb am Mo. 15. Aug. 2016 um
18:00:

@killercup https://github.com/killercup I think there are quite a lot
of limitations in the current RFC for const fn per se (see here
https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md#detailed-design,
for example), and it's also quite likely that any existing implementation
for that is even more limited.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rfcs/issues/1631#issuecomment-239844155,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AABOXwBaMWh268pLgzMbKhV31dIbqU58ks5qgI0fgaJpZM4Inbba
.

Allowing some RPCs to be pure would be nice, and "unsafe pure" does exactly that. In theory you'd need to use unsafe { unsafe { stuff here } } to be able to do unsafe things, in practice that'd be a pain for the compiler and you'd instead either have impure {} or just use a plain unsafe for both.

@SoniEx2 In my opinion, plain unsafe should be the way to go for a number of reasons. One of the most important ones is that, for example, if we use the pure constraint on deref() in the Deref trait as a guarantee that each consequent call to it should yield the same value (see the above reference link on conversion traits), and we have some code which makes an assertion that this is always the case (safe bindings to FFI code being a major example of this, see e.g. here) - in this case using impure code can be as unsafe as one generally marked as unsafe.

TL;DR there'd be a zero-sized IO type (a capability token) that's not Copy or Clone, I/O and shared mutable state like Cell and RefCell would require it, and any fn and &Fn which don't receive IO as an argument could _therefore_ not be anything other than pure. Further refinements possible.

But it's not something that could be easily retrofitted onto Rust 1.x.

@glaebhoerl If traits can be implemented for fn types, a possible way of doing this would be having Io as a marker bound which would be implicitly implemented for all fn declarations, barring const fn ones, and then the pure keyword as a syntactic sugar for marking the type of fn as !Io. This would be similar to how Sized is implicit for type arguments in generics.

Either that, or we could use Pure trait with semantics inverse to that, with pure keyword being basically synonymous to marking the function type as #[derive(Pure)]. I think Sync works in a similar way.

Of course this implies that the types of a pure fn and an "impure" fn would be different, hence their signatures would not match even if otherwise similar. However I don't think that working around it would be hard; allowing linking an impure consumer with a pure producer (but not the other way around) should be a matter of little more than adding just another (generic) coercion rule.

This has been much discussed in the past, and may even have existed in very old Rust. It is highly unlikely that we'd accept an RFC for this. The best way to make progress here (for anyone interested) would be to design and implement a system in a fork of the compiler and use that to prove utility. That would be huge, but very interesting task.

@nrc Rust used to have an effect system (pure fn).

More fine grained effect types were discussed, and proposed, but never agreed upon or implemented. I have no idea how we could retrofit it now :(

Speaking of effect types in general, apart from side effects there is another useful concept - the one of predictable timing/performance. It would be nice to be able to guarantee that any call to some fn won't take more time - or number of instructions - than a certain fixed maximum (and in the case of instructions that maximum can be determined statically after compilation). This would disallow stuff like arbitrary loops (unless compiler can reason that a loop is finite and has a maximum number of possible iterations), heap/resource allocations, calls to other fns which aren't marked as predictable, possibly some types of dereferencing too. As a bonus, some certain metadata would be written for every such fn after compilation, depending on maximum numbers of different instruction types &c in code path, which can then be used by profiling tools.

This is similar to certain goals that language subsets like MISRA C are designed to achieve. Common use cases for this are going to be interrupt and signal handlers, and generally whatever code constrained to real-time conditions.

How do you guys feel about having effect system similar to Koka's? (see http://www.rise4fun.com/koka/tutorialcontent/guide or https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/algeff-tr-2016-v3.pdf).
Would that fit into Rust? Certainly nice thing about such design is that purity is just a special case then.

Adding purity to D language, despite several successive refinements, was a big failure. If you want to add something like that to Rust you need at least to do it in a much more principled way. Like row (set) effects:
http://www.purescript.org/learn/eff/

I wouldn't call pure functions in D "a big failure." Pure functions in D are very useful and interesting, and the rules for how they work are very logical. There are ways of implementing pure functions in programming languages, and you have to decide what it is you want to really implement.

As const fns are pure (== deterministic) functions, I'm closing this issue as resolved.

Isn't const a compile-time only function? If so, that's not the same thing. A pure function is any function which can be executed at runtime, which cannot produce side-effects, to some defined degree of success. ("Safe" code is never really safe, as you must at some point do an unsafe thing in safe code, and that's okay.)

@w0rp No, const fns are not limited to compile time execution and they can be given arguments that are evaluated at runtime. But they can't do things that require run-time execution. This happens to coincide with purity.

"Pure functions in D are very useful": what they are actually used for and useful for in D? (I've tried to use them for years and I've seen people try to use them for years). In D standard library even in the few spots where a purity annotation is useful, it's not used.

@Centril Okay, that could be considered some definition of pure, then.

@leonardo-m You don't _need_ them, like how you don't _need_ memory safety, but they are nice to have. I'm not sure I'd be able to convince you that they are useful, but I like them. I find them useful for documenting which functions may or may not have side effects, with compile time checking for that promise of no side effects. They are just difficult to use some times because too much code has side effects, even functions as simple as sin.

Things that I would consider pure (but not const):

  • Getting an &-ptr from a box (i.e. Deref).
  • RPC calls to pure functions.
  • Interning/memoization of pure functions.

Things that I would consider const (but not pure):

  • Anything that involves &mut. Just because you have a &mut, doesn't mean it's not deterministic.

The more persuasive argument against a separate purity effect/annotation is that Rust already tried that before 1.0. I wasn't around back then, but I've heard that we ended up removing it because it wasn't that useful, no one could agree what it meant, and had composability issues that led to a de facto standard practice of marking everything as if it was impure. Unfortunately I can't find a source to cite for that right now.

I guess we mostly have purity through immutable references already. Altho some ppl (diesel.rs >.>) don't use them correctly.

Altho some ppl (diesel.rs >.>) don't use them correctly.

I very much doubt that. This sounds more like there's a misunderstanding. Rust doesn't have immutable references. Rust only has shared references, and you can very much use shared references to mutate state.

I'd expect diesel to take &mut Connections but (apparently?) it takes &Connections.

I do consider "immutable" references to be a form of purity system. While you can use interior mutability for everything, which is like marking everything as pure and overriding the purity system (see https://github.com/rust-lang/rfcs/issues/1631#issuecomment-384614268 which says RPC can be pure), I'd rather things used & for immutable-ish/pure and encapsulation (see eventbus crate), and &mut for mutable-ish.

RPC calls to pure functions.

I think that is a problematic semantics of purity and might be the cause of confusion in the future.

The problem is, I can only find one real academic definiton of purity, which is in the realm of functional programming languages (lambda calculus) and I'm not sure how useful those semantics are: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.27.7800&rep=rep1&type=pdf

Comparing with existing non-academic definitions of imperative languages:

The loosest definition is probably the gcc one (which is slightly different to const functions, because it allows reading global variables), but even that assumes no side-effects.

What's wrong with the ability to have pure RPC functions?

What's wrong with the ability to have pure RPC functions?

Because they are inherently side-effect full (possible network calls and whatnot, even if they are hidden behind an abstraction). I haven't seen a definition of "purity" yet that allows side-effects. I'm not convinced rust is a good experiment to introduce such a foggy definition.

network state doesn't affect data state (unless you're Wireshark)

network state doesn't affect data state (unless you're Wireshark)

Even if that was true, it's irrelevant to the definition of purity. Networking is a side-effect and introduces additional sources of failure that you cannot easily reason about.

The function call may fail, regardless of input. How is that pure?

You're introducing a totally new definition that no language (or academical source) has previously deployed.

The function call may fail, regardless of input. How is that pure?

To be clear; it is not the failure itself that is the cause of impurity; Failure can easily be modelled in a partiality monad and can be handled purely.

What is the cause of impurity is that if you call net_fun(x) twice you may get two different results (including the server failing, or simply returning something else cause of changed state on the server side) despite x being the same in both calls.
That is, networking can easily violate the requirement of a pure function f : A -> B that:
forall x : A, y : A. x == y -> f(x) == f(y).

If you somehow manage to define a linear function f : UniqueToken x A -o B that consumes the UniqueToken and for which the same UniqueToken can truly never be created twice, then f should be a pure function. Of course, to generate a truly unique token, you need impurity, so you have moved out the impurity from f to somewhere else.

To be clear; it is not the failure itself that is the cause of impurity; Failure can easily be modelled in a partiality monad and can be handled purely.

What is the cause of impurity is that if you call net_fun(x) twice you may get two different results

Failure that doesn't depend on input (or initial program state) automatically violates referential transparency, no matter if you model it as a monadic value or not. So yes, I think we kind of mean the same.

Expanding on that:

Purity is most often defined in terms of evaluation. That means: if evaluation doesn't cause execution (e.g. of network calls), then the function is still pure. In practice, that means: if you make effects explicit via e.g. an IO type, then you can still have a pure function that does network stuff, given that your evaluation model doesn't cause execution of that IO action. That would allow the compiler to recognize this and have correct optimisation on pure functions in either way.

Since we don't have an IO type (anymore) and evaluation and execution are not strictly separated, we cannot allow network calls.

Failure that doesn't depend on input (or initial program state) automatically violates referential transparency, no matter if you model it as a monadic value or not.

We are mostly in agreement, but I have a nit: This doesn't have to be the case. We can encode the following function:

alwaysError :: b -> Maybe a
alwaysError _ = Nothing

The use of alwaysError whatever is referentially transparent, but it does not depend on the value of b. However, it is true that the call by name/need and the call by value semantics differ as alwaysError undefined == Nothing for the former two and alwaysError undefined == _|_ for the latter.

I agree with your notes on practice, however, I think I showed that there can be pure networked functions given that they take an argument for which the value can never be repeated. I think this is a bizarre theoretical result which is boring and uninteresting, but well...

EDIT: In this bizarre and unpractical world, networkedFun ut is a pure expression,
but do ut <- mkUniqueToken; pure $ networkedFun ut is not pure.

We are mostly in agreement, but I have a nit: This doesn't have to be the case. We can encode the following function:

You are right, my informal definition was maybe misleading. My point was rather that whether a function is pure or not doesn't depend on how you model failure (e.g. monadic). Which I think we still agree on.

However, it is true that the call by name/need and the call by value semantics differ

Kind of, but it still meets the formal definition of purity, since it's defined as the "(weak) equivalence of call-by-name, call-by-value and call-by-need" (weak equivalence = considering undefined to be equivalent).

I think I showed that there can be pure networked functions given that they take an argument for which the value can never be repeated.

Whether a value can be repeated in practice is irrelevant for the theory of functional purity. A function doesn't suddenly turn pure or impure depending on what exact instance the value has in the range of the type.
Even if you insist on this model (which I argue is flawed, because it pulls practical computability into a theoretical equivalence model), then you would have to say "I cannot say whether the function is pure or not, because I cannot repeat the function call with the exact values".

As such, every function that causes networking during evaluation is always impure. Do we agree on that?

let's say you have a networked FFT function. that is, you call it with some set of values, and it calculates the FFT on a computing cluster.

is it pure? yes, because it's an FFT. same input, same output, always.

let's say you have a networked FFT function. that is, you call it with some set of values, and it calculates the FFT on a computing cluster.

is it pure? yes, because it's an FFT. same input, same output, always.

Absolutely not. The syscall may fail, your computer may be offline, the cluster may be down, corrupted, ...

As such, calling this function twice may totally yield different results (including unexpected failure).

oh yeah I can also point a heatgun to my RAM while running a pure local FFT function.

those failures don't exist in the abstract machine. they're failures of the physical system.

oh yeah I can also point a heatgun to my RAM while running a pure local FFT function.

Execution is always impure. I'm not sure you listened to the conversation.

networked FFT is pure, because if you consider the network as a whole as an abstract machine, it's no different from calling a local function.

because if you consider the network as a whole as an abstract machine

It isn't. That's not how you design a language. That's how you design your own architecture. You are confusing totally different abstraction levels. Please.

the function's implementation should not affect its purity. neither should side-channels. (e.g. checking if your FFT is running locally or on another machine based on CPU usage)

@SoniEx2
The language has absolutely no way to know if the function is pure or not if it's remote. Also, the odds of a network failure are quite high compared to a local failure, so the odds of the assumption of purity being violated become unacceptably high.

Using pure means the language has to guarantee or provide contracts that there would be no side-effects. Like @Serentty said if the language have no way to figure out the purity of function, its pointless to be able to mark a function as pure.

I think this thread has outlived it's usefulness.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

burdges picture burdges  Â·  3Comments

3442853561 picture 3442853561  Â·  4Comments

marinintim picture marinintim  Â·  3Comments

torkleyy picture torkleyy  Â·  3Comments

rudolfschmidt picture rudolfschmidt  Â·  3Comments