V: Require `mut` in signatures of impure functions

Created on 15 Dec 2019 · 28 Comments · Source: vlang/v

Right now, I can write a function whose signature looks pure but that still performs, for instance, file IO. Such a side effect is just as powerful as anything that could be achieved with global variables. Similarly, there are functions in the standard library that look pure judging by their signature but that actually give different answers for the same inputs, such as now.

If a function is assumed pure by its signature, but is actually using such functions as now and read_file under the hood, the callers may be in for nasty surprises.
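
To make the surprise concrete, here is a minimal sketch in Python (used as a stand-in, since the point is not specific to V). The counter-based `now` is an assumption that replaces a real system clock so the example stays deterministic; `stamp` and `shout` are hypothetical names.

```python
import itertools

# Stand-in for a system clock: every call yields a new value.
_ticks = itertools.count()


def now():
    """Impure: returns a different value on every call."""
    return next(_ticks)


def stamp(label):
    """Looks pure from its signature (str -> str), but depends on now()."""
    return f"{label}@{now()}"


def shout(label):
    """Genuinely pure: the result depends only on the argument."""
    return label.upper() + "!"
```

Two calls to `stamp("x")` return different strings, while `shout("x")` always returns the same one, yet nothing in either signature tells the caller which is which.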

There should be a way of designating a function as impure, so that it is impossible for the function to be used in pure functions. This might imply type signatures that look like this:

fn mut print()
fn mut now() Time
fn sin(theta f64) f64
fn pure_i_swear(x f64, y f64) f64

Pure functions require no mut decorators and impure ones do. print then couldn't be run from pure_i_swear.

At the programmer's discretion, mut could be lifted using unsafe_pure(). Example: loading read-only program assets from disk.

fn load_asset(config Config, name string) string {
    return unsafe_pure(read_file)(config.asset_dir + name)
}

As long as you can be reasonably assured the assets won't change, this will work fine. And in case anything does go wrong, you can grep unsafe_pure for possible explanations. It shouldn't have to be used very often. (If it is, you have other problems lol)
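
The intended rules can be sketched as a runtime check in Python. This is only an approximation under stated assumptions: the proposal is for a compile-time check in V, whereas this sketch raises an error when the rule is broken at run time. The decorators `pure` and `impure`, the `PurityError` class, and the fake `read_file` are all hypothetical; only the names `unsafe_pure`, `load_asset`, and `pure_i_swear` come from the text above.

```python
import functools


class PurityError(RuntimeError):
    """Raised when an impure function is called from inside a pure one."""


_pure_depth = 0  # greater than zero while a function declared pure is running


def pure(fn):
    """Mark fn as pure: impure calls made while it runs are rejected."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        global _pure_depth
        _pure_depth += 1
        try:
            return fn(*args, **kwargs)
        finally:
            _pure_depth -= 1
    return wrapper


def impure(fn):
    """Mark fn as impure, the analogue of `fn mut` in the proposal."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if _pure_depth > 0:
            raise PurityError(f"{fn.__name__} is impure but was called from a pure function")
        return fn(*args, **kwargs)
    wrapper._impure_body = fn  # kept so unsafe_pure can bypass the check
    return wrapper


def unsafe_pure(fn):
    """The programmer's promise: treat fn as pure and skip the check."""
    body = getattr(fn, "_impure_body", fn)

    @functools.wraps(body)
    def wrapper(*args, **kwargs):
        global _pure_depth
        saved, _pure_depth = _pure_depth, 0  # suspend checking for this call
        try:
            return body(*args, **kwargs)
        finally:
            _pure_depth = saved
    return wrapper


@impure
def read_file(path):
    # Stand-in for real file IO so the sketch stays self-contained.
    return f"<contents of {path}>"


@pure
def load_asset(asset_dir, name):
    return unsafe_pure(read_file)(asset_dir + name)


@pure
def pure_i_swear(x, y):
    read_file("log.txt")  # rejected: impure call inside a pure function
    return x + y
```

Calling `load_asset("assets/", "logo.png")` succeeds because of the explicit cast, while `pure_i_swear(1.0, 2.0)` raises `PurityError`. Every escape hatch is a literal `unsafe_pure` in the source, which is what makes it greppable.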

The optimiser should also be able to reap benefits from mut hints. Lack of such a specifier guarantees a function only needs to be run once in a for loop, for instance, just as long as the inputs are constant throughout. It might not make much of a difference if the optimiser can see inside the inner function, but for functions hidden behind shared libraries, this could be very useful. For instance, LAPACK is full of pure functions that the optimiser doesn't know are actually pure.

Finally, seeing as this is a major selling point of this language, I would consider your mission incomplete and your statement of "pure functions by default" misleading at best and false at worst until something like this exists.

Feature Request Discussion

ewtoombs

👍2

All 28 comments

In practice functions are always impure by this standard, because after all, they consume time and space when they run...

spytheman on 15 Dec 2019

👍1

If you want to execute functions just once, you can just call them only once...

spytheman on 15 Dec 2019

The ability to trace program execution by putting println (or the equivalent printf) in ANY function is VERY useful... https://en.wikipedia.org/wiki/Printf_debugging

spytheman on 15 Dec 2019

I guess I should change the wording on the home page, because V functions are not fully pure from a FP perspective.

They can't change arguments or global variables, but they can print or access system time etc.

medvednikov on 15 Dec 2019

👍4

You can provisionally make println pure during debugging, but all such printlns shouldn't be in pure functions in prod anyway. (Though in this specific case, perhaps a debug() function that is always pure would be more appropriate. Such a function could also be made non-blocking. Before moving to prod, you can just have a git hook grep out all of the debug()s you may have forgotten about.)

This is meant to be a practical tool, not the dogmatic pursuit of purity for its own sake (which doesn't even sound like programming out of context, lol). If you know that a function's side effects will be negligible, like println or some kind of log to file, then yeah, mark it pure. If you know that a certain set of assets on disk will never change during the program's runtime, then yeah, make a load_asset function and mark it pure. This will require an unsafe cast of functions inside it to pure functions, like read_file(), but such unsafe casts will be few, and highly greppable if anything goes wrong. Actually, I think I'll mention the unsafe pure cast in the original post.

println is useful in debugging, but so are functions that are guaranteed not to have side effects, especially in foreign functions whose implementations aren't well-known. It eliminates huge classes of problems automatically.

ewtoombs on 15 Dec 2019

👍2

As for calling functions only once, it makes code longer and harder to read than it needs to be. Having to export every constant in a mathematical expression to outside the loop gets really annoying really fast. There usually aren't good names to call such constants other than sin_n_x or something. This kind of manual optimisation was a recurring headache when trying to do quantum mechanics simulations in numpy.

ewtoombs on 15 Dec 2019

As for time and space consumed during run time, these are performance concerns, not correctness concerns. I am only concerned with side effects related to correctness, i.e. functions that cause IO and alter memory, and functions whose sole purpose is to change their own running time, like sleep(). This would be useful enough by itself. This is really what pure means, from the FP perspective.

ewtoombs on 15 Dec 2019

👍1

> As for calling functions only once, it makes code longer and harder to read than it needs to be. Having to export every constant in a mathematical expression to outside the loop gets really annoying really fast. There usually aren't good names to call such constants other than sin_n_x or something. This kind of manual optimisation was a recurring headache when trying to do quantum mechanics simulations in numpy.

Any examples we could wrap our heads around? I'm sure there is a better solution that Python as well as V can provide in this regard (without changing/extending the current meaning of mut).

dumblob on 16 Dec 2019

Possible optimisations are really just a side effect (lol) of adding this feature. The main point is to aid debugging, code safety, and code understanding, precisely the same goals that mut already has, as applied to variables and function arguments. But if you want a concrete example, making one is easy enough.

With mut hints, optimisation is automatic:

for thing in things {
        thing.modify(ctx.funk(), ctx.awesomeness, ctx.umami(), ctx.asset_dir())
}

Manual optimisation:

ctx_funk := ctx.funk()
ctx_umami := ctx.umami()
ctx_asset_dir := ctx.asset_dir()
for thing in things {
        thing.modify(ctx_funk, ctx.awesomeness, ctx_umami, ctx_asset_dir)
}

funk, umami, and asset_dir are all properties derived from ctx's data in some way. The extent of the manual optimisation is directly proportional to how complicated the loop is, so it only gets worse. But I can't emphasise enough, this is not the primary motivation.

ewtoombs on 16 Dec 2019

👍1
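
Why hoisting is only safe for pure functions can be checked with a small Python sketch. The names `funk`, `ticking`, `unhoisted`, and `hoisted` are hypothetical, and `ticking` uses a counter as a stand-in for any hidden state such as a clock.

```python
import itertools


def funk(ctx):
    """Pure: the result depends only on ctx."""
    return ctx["base"] * 2


_counter = itertools.count()


def ticking(ctx):
    """Impure: reads hidden state that changes on every call."""
    return ctx["base"] + next(_counter)


def unhoisted(things, ctx, f):
    # f(ctx) is evaluated on every iteration, as the loop is written.
    return [thing + f(ctx) for thing in things]


def hoisted(things, ctx, f):
    # f(ctx) is evaluated once, before the loop.
    value = f(ctx)
    return [thing + value for thing in things]
```

With `funk`, `unhoisted` and `hoisted` return the same list, so a compiler that knows `funk` is pure could rewrite one into the other. With `ticking`, the two lists differ, which is why the rewrite must not be applied to a function that lacks the guarantee.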

As for the word mut, it is not a modification or extension of the pre-existing meaning of mut. It is a new and different meaning when used in a different context (i.e. when applied to a function), exactly like how in has different meanings in different contexts. in means one thing in for i in things and a different thing in if i in things. And there is no confusion between the two meanings because the context is clear. But if you still think it's confusing, then by all means call it something else.

ewtoombs on 16 Dec 2019

About that code:

for thing in things { 
    thing.modify(ctx.funk(), ctx.awesomeness, ctx.umami(), ctx.asset_dir()) 
}

Looking at it, I expect it to call the function ctx.funk(), then ctx.umami(), then ctx.asset_dir(), then use their results as parameters to thing.modify(). I do NOT expect that code to be automatically converted to the second variant. In fact, I would be confused if the compiler decided to do that because of some optimization.

If the functions are marked as pure and this optimization is on, reading the code will require the programmer to know the function signatures first, in order to deduce what will happen.

Also, what happens if I put a println or debug or log function marked pure inside the loop?
I would expect it to be called on each iteration, but if the optimization is on, it would be moved outside the loop too, just like ctx.funk(), wouldn't it?

spytheman on 17 Dec 2019

If the functions are pure, there's no difference between the optimised and unoptimised versions.

Now, if you're actually putting a debug() function in the loop and its argument doesn't change with each iteration, which is possible, then yeah that gets more complicated. You could just switch off the optimisation for the debug() function, which is not a big deal. But yeah you would need a way to do that. Like a dont_optimise hint or some crap, idk. So, maybe optimisation requires more thought, though it would be valuable if it could be done. The optimisation is still a secondary objective, though.

Or just turn off optimisation while debugging, which is usually done anyway.

ewtoombs on 17 Dec 2019

👍1

Anyway, could any of you at least agree that this feature is valuable, or are you just going to keep pointing out minor problems without bothering to try to fix them?

ewtoombs on 17 Dec 2019

👍1

I am against that feature. It has value, but it would complicate things needlessly in my opinion, while making code harder to read and follow.

spytheman on 18 Dec 2019

👎1

The problem of not being able to use (or making much harder to use) a major debugging/tracing technique is anything but minor in my opinion.

spytheman on 18 Dec 2019

I think the problem raised by @spytheman about debuggability is not a real issue.

As in Go, for debug purposes one usually uses println calls here and there. But when doing some actual logging / printing, it is considered best practice to rely on a fmt-family function like fmt.Printf.
This goes beyond the scope of this feature request but I would consider doing the same in V (since Go is its main source of inspiration).

Builtin println and related print functions should be equivalent to the debug function mentioned earlier, which would settle the question of its specific usage. Solution: modularize impure IO functions, and provide builtin pure functions to use for debug / dirty printing.

Back to the initial topic. I fully support this initiative, as it would keep people from abusing function names to hide side effects, and would provide more developer-friendly APIs.
Regarding optimization and the example mentioned above, I would not expect the compiler to optimize the functions containing println calls.

Spriithy on 2 Jan 2020

👍1

Thanks, @Spriithy, that's encouraging.

I thought about it more, and concluded that using V's println or any other blocking IO call for debugging is actually a super awful idea, because stdout could block. If output is being piped to less, it will block. If there is too much output for whatever stdout is connected to, again, it will block. A dedicated debug logging function is a much better idea. It can queue log entries to stdout, and if the queue is growing too fast, it can start dropping log entries and say how many it has dropped. Though this is a little off topic, lol.

ewtoombs on 2 Jan 2020

You usually don't want to drop logs, especially in a non-deterministic manner. What if you drop the logs you actually expected to see? But yeah, it's way off topic.

Spriithy on 2 Jan 2020

You don't want to drop logs, but you also don't want to block time-sensitive code, thus changing its behaviour. If you're testing a game, for instance, it is usually more important that the game keep running. If debug messages are actually getting in the way of that, there are probably too many to read anyway, at least in real time. If you really need every message, you could just tell debug to keep everything and be patient. If something has really gone wrong, the debug message queue might use up all your memory waiting for stdout to be available though. So even if you want to keep everything, it is still advisable to put a limit on the message queue, if only for this reason.

ewtoombs on 2 Jan 2020
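
The bounded, non-blocking debug queue discussed in the last few comments could look roughly like this in Python. It is a single-threaded sketch under assumptions: the class name `DebugLog`, its `capacity` parameter, and the `flush(write)` interface are all hypothetical, and a real implementation would need a writer thread or non-blocking IO to decide when the sink is ready.

```python
from collections import deque


class DebugLog:
    """Queue debug messages without ever blocking the caller."""

    def __init__(self, capacity):
        self.capacity = capacity  # upper bound on queued messages
        self.pending = deque()
        self.dropped = 0

    def debug(self, message):
        """Queue the message, or count it as dropped when the queue is full."""
        if len(self.pending) >= self.capacity:
            self.dropped += 1
        else:
            self.pending.append(message)

    def flush(self, write):
        """Drain the queue into `write`, then report how many were dropped."""
        while self.pending:
            write(self.pending.popleft())
        if self.dropped:
            write(f"[{self.dropped} debug messages dropped]")
            self.dropped = 0
```

Setting `capacity` very high approximates "keep everything and be patient", while still capping memory use if stdout never becomes available.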

So just sticking with fn mut and unsafe_pure for now, first of all, are the semantics clear, or are there cases I haven't thought of where behaviour is ambiguous? And if it's clear, how hard would it actually be to add this feature?

ewtoombs on 2 Jan 2020

I think your proposal is sound. I'm just unsure about the unsafe_pure "macro"-ish syntax.

Spriithy on 2 Jan 2020

That syntax was inspired by the way conversions are done. unsafe_pure(f) is supposed to be analogous to an expression like string(cstring). So, basically convert this function to a pure function. I'm definitely open to suggestions there.

ewtoombs on 2 Jan 2020

True function purity (no observable logic side effects) is very useful as it makes function calls easier to reason about. It is similar to how explicit mut makes code easier to understand and can help offer safe efficient concurrent code. Truly pure functions can run concurrently with no data races.

I think using mut to mark a function as not pure is not very clear; instead we could use [impure]. Best not to overload the meaning of mut.

A very useful concept is 'weak purity' where a truly pure function can call a function that mutates local state, but that state is not visible outside the truly pure function. This is done in D and makes writing truly pure functions much easier as mutation can be safely used internally.

> They can't change arguments or global variables

Actually that's not true, V does use mutable global variables sometimes. Mutable global variables are useful sometimes, but they should only be accessed from a function marked as impure.

ntrel on 10 Aug 2020

👍1
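
The 'weak purity' idea can be illustrated with a short Python sketch. Python does not enforce any of this, so the labels in the docstrings are assumptions about what a checker would accept; `append_scaled` and `scaled` are hypothetical names.

```python
def append_scaled(out, value, factor):
    """Weakly pure: mutates only the list it is handed, nothing global."""
    out.append(value * factor)


def scaled(values, factor):
    """Pure from the outside: the mutated list is created here and is not
    visible to anyone else until it is returned."""
    out = []
    for value in values:
        append_scaled(out, value, factor)
    return out
```

The caller's `values` list is never modified and repeated calls with the same arguments give equal results, even though the body is written with ordinary mutation.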

> About that code:
>
> for thing in things {
>     thing.modify(ctx.funk(), ctx.awesomeness, ctx.umami(), ctx.asset_dir())
> }
>
> Looking at it, I expect it to call the function ctx.funk(), then ctx.umami(), then ctx.asset_dir(), then use their results as parameters to thing.modify(). I do NOT expect that code to be automatically converted to the second variant. In fact, I would be confused if the compiler decided to do that because of some optimization.

Changes such as this are actually very common compiler optimizations, and have been for many, many years. The term I have heard is "code hoisting". Basically, move things that are only done once outside the loop, instead of doing them over and over every time through the loop.

Of course, function calls are trickier, since you may not know all the side-effects... but the compiler might, and if it does, hoisting is a good optimization.

JalonSolov on 10 Aug 2020

👍1

Because purity is infectious (pure can't call impure aside from local mutability), it might be more practical to make functions impure by default.
Also function purity can potentially be inferred by the compiler, so you don't have to remember to write e.g. [pure]. D does inference of pure.
We could allow the pure attribute on a whole file basis, e.g.:

[pure]
module mymod
// All functions are verified for purity

ntrel on 10 Aug 2020
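
Both points, that impurity is infectious and that purity can be inferred, follow from one simple propagation over the call graph. The Python sketch below assumes the compiler already has a map from each function to the functions it calls plus a set of known-impure primitives; `infer_impure` and the function names in the usage note are hypothetical, and `unsafe_pure` casts and local mutability are ignored.

```python
def infer_impure(calls, known_impure):
    """Return every function that is impure, directly or transitively.

    calls: dict mapping a function name to the set of names it calls.
    known_impure: names that are impure by definition (e.g. IO primitives).
    """
    impure = set(known_impure)
    changed = True
    while changed:  # repeat until no new impure function is found
        changed = False
        for fn, callees in calls.items():
            if fn not in impure and callees & impure:
                impure.add(fn)
                changed = True
    return impure
```

For example, if `log_time` calls `now` and `print`, and `report` calls `log_time`, then both are inferred impure, while an `area` function that only calls `sin` stays pure. Anything not in the returned set could be labelled pure automatically, and an explicit annotation would then serve as a checked API promise rather than the only source of truth.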

[pure], [impure], whatever. It's all good. And though the compiler could infer purity, it is very useful as part of an API spec. Like reading the docs and seeing oh this function is pure. Good. Moving on.

Perhaps, impure by default might be more practical for languages like C, but V already has good mutation-free vibes, you know? I think it could be done with a majority of pure functions.

There's a psychological angle too. It could easily be the case that if this were implemented with impure default, most functions would be impure, and if it were implemented with pure default, most functions would be pure.

ewtoombs on 13 Aug 2020