Deterministic destruction
Deterministic destruction is a guarantee that specified resources will be released when exiting the enclosing block, even if we exit the block by means of an exception.
Example
In Python, deterministic destruction is done using the `with` statement.

```python
with open("myfile", "w") as f:
    f.write("Hello")
```
Considering the code above, we don't have to worry about closing the file explicitly. After the `with` block, the file will be closed. Even when something throws an exception inside the `with` block, the resources handled by the `with` statement will be released (in this case, closed).
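As a rough sketch of that guarantee (the `Resource` class below is illustrative, not from any library): `with` behaves like a `try`/`finally` wrapped around the block, so cleanup runs even when the body throws.

```python
# Illustrative context manager: __exit__ plays the role of the finalizer.
class Resource:
    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True  # runs on normal exit *and* when an exception unwinds
        return False        # don't swallow the exception

r = Resource()
try:
    with r:
        raise RuntimeError("boom")  # simulate a failure inside the block
except RuntimeError:
    pass

print(r.closed)  # -> True: released despite the exception
```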
Other languages
C# has the `using` statement for this same purpose.
Julia?
It is my firm belief that Julia also needs a feature supporting deterministic destruction. We have countless cases when a resource needs to be closed as soon as possible: serial ports, database connections, some external API handles, files, etc. In cases like these, deterministic destruction would mean cleaner code, no explicit `close()`/`release()` calls, and no sandwiching the code in `try .. catch .. finally` blocks.
The `do` syntax doesn't handle this?
You can do the same things with Julia's `do`-notation anonymous function syntax and with macros. `try`/`finally` also lends itself to deterministic destruction.
I brought this up briefly in #4664. I think `with` and `finalize` would be good.
I know it can be done in a custom way by hacking some macros, but it would be much more beneficial if we had this feature in combination with support from the standard library, as it is done in Python. Having a standard way of doing it would make new libraries conform to it. Then users don't have to worry about the method name of the finalizer (is it `close()`, `release()`, `free()`?).
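For comparison, this is what Python's context-manager protocol standardizes: the library author registers the right finalizer once, and every caller uses the same `with` syntax regardless of whether the cleanup is named `close()`, `release()`, or `free()`. A minimal sketch (the `Port`/`open_port` names here are made up for illustration):

```python
from contextlib import contextmanager

# Hypothetical resource whose finalizer happens to be called `release`.
class Port:
    def __init__(self):
        self.released = False

    def release(self):
        self.released = True

@contextmanager
def open_port():
    p = Port()
    try:
        yield p
    finally:
        p.release()  # the library picks the right finalizer; callers never see its name

with open_port() as p:
    pass  # use the port

print(p.released)  # -> True
```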
@klaufir, it's not hacking some macros. In Julia your example is

```julia
open("myfile", "w") do file
    write(file, "Hello")
end
```

and the file is automatically closed for you. See? No macros :smile:.
But there are places where one might want to use finalizers in most circumstances yet be able to force them to run in specific situations. So there might be room for some new features, but it's not like there isn't already a way to do this kind of thing.
Oh, I see there is a standard way already. Sorry for the noise.
That is the standard idiom, but writing code to ensure that some code is always called upon exit is still manual. I've often wanted something like Go's `defer` or D's scope exit clauses. Note that these are more general in a way, because you can ensure that any expression is executed on scope exit.
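For reference, Python's closest analogue to those constructs is `contextlib.ExitStack`, which lets you register arbitrary callbacks that run in reverse order when the scope exits; a minimal sketch:

```python
from contextlib import ExitStack

log = []
with ExitStack() as stack:
    # Callbacks run LIFO on scope exit, like Go's defer or D's scope(exit).
    stack.callback(lambda: log.append("first registered"))
    stack.callback(lambda: log.append("second registered"))
    log.append("body")

print(log)  # -> ['body', 'second registered', 'first registered']
```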
I think `with` would be better than what we do now, since different finalizable types wouldn't need to re-implement the pattern used by `open`.
I agree that implementing that pattern usually requires a trip to the manual, and occasionally interferes with some other desirable call syntax. Ran into the latter with CUDArt.
Just checking: the general feeling is that "do" doesn't do it, and "with" is still desirable?
I'm convinced at this point that `do` syntax isn't sufficient, but I also think that Python's `with` syntax doesn't quite cut it either. What seems to be needed is automatic insertion of a `finalize` on the result of a function call when the appropriate scope exits. One problem with both `do` and `with` is that they require nesting/indentation. It's common to do a bunch of setup, then do the main work, then do all the corresponding teardown. Even using the `with` construct, we'd have to write something like this:
```julia
with open("input") as r
    with open("output", "w") as w
        # do actual work
    end
end
```
Another problem with both syntaxes is that they cannot be used inline, making them unhelpful for, e.g., the common problem of wanting to open a file only to pass it to a function and then close it when that call returns (see here for example).
I noticed that the syntax `f(x...)!` is available, so I'm going to throw this out there as syntax for doing `y = f(x...)` and inserting `finalize(y)` at the point where the returned value `y` goes out of scope. This would allow us to write the above examples as:

```julia
r = open("input")!
w = open("output", "w")!
# do actual work
```
Calls to `finalize(w)` and `finalize(r)` are inserted automatically when `r` and `w` go out of scope. Actually, it's more than that, since the `finalize` calls are guaranteed no matter how the stack unwinds. You can also use this effectively without binding to a local variable, just passing it to a function call:

```julia
write(open("file", "w")!, data)
```
Since the result of the `open` call goes out of scope after this line, this becomes something like this:

```julia
f = open("file", "w")
try
    write(f, data)
finally
    finalize(f)
end
```
So that addresses https://github.com/JuliaLang/julia/pull/14546 in a systematic way without adding any new methods and can eliminate many of the functions littering https://github.com/JuliaLang/julia/issues/14608.
Wow, I kind of like that idea. It would be _excellent_ if resource-backed objects only needed to implement `close` and one form of `open`, and we hardly ever used GC finalizers (or eliminated them entirely).
Just for a second, can we imagine inverting the syntax, so `v = ...!` covers the unusual case where I _do_ want to wait for GC to finalize `v`? How much existing code depends on that? Which bugs are worse: referencing prematurely-finalized objects, or leaking resources? The former are pretty easy to detect, at least at the cost of the finalizer explicitly putting the object into an invalid state.
See also #11207
@awf you don't want to translate every function call into a try/catch with a corresponding finalize call.
Sounds somewhat like Go's `defer` (also mentioned by @quinnj in #11207), except that `defer` is more general (and more specific about what exactly is going to happen).
I think we can have `defer` and have `!` be syntax for `defer close(x)`, since that's what you need 99.9% of the time. The verbosity reduction is huge:

```julia
x = open(file); defer close(x)
write(x, data)
```

vs

```julia
write(open(file)!, data)
```
Nice. +1
I like the idea, but not the syntax. Sigils for resource management start getting into Rust levels of "what is going on" for newcomers unfamiliar with what a trailing `!` might mean. Can this be done with a macro or function/type wrapper? If we want deterministic destruction, the question of when it occurs should be easy to answer at a glance. It can be hard enough as is to explain how Julia's scoping works sometimes.
@JeffBezanson: shouldn't we be having this call `finalize(x)` rather than `close(x)`? Or do you feel that `close` is general enough as a concept to be considered the generic name for "finalize me"? It feels kind of I/O-specific to me. Keep in mind that we can just define `finalize(io::IO) = close(io)` in Base, and then all you ever need to define for any IO object is `open` and `close`.
Next step is an `RC{T}` class that wraps objects of type `T`, adding reference counting. This would be quite useful if the resource might be stored in a data structure or might be passed to another thread, but we still want early finalization. I'm thinking e.g. of large arrays or matrices for numerical calculations.
Although the implementation will be quite different, I hope that the syntax to use it can be made similar to the one discussed here. A macro might work:

```julia
dothework(@fin open("file"))
dothework(@rc open("file"))
```
@tkelman: Given the potential ubiquity of this, a very lightweight syntax is essential. Since you don't like the syntax, please propose alternatives. This cannot be done with a function since it interacts with the surrounding syntax; if we had `defer`, then it could be done with a macro generating `defer`. I very much like that `f(x)!` _almost_ looks like `f(x)`, since that's what you would write if you just let GC finalize `x`. Inserting a macro call would make this a lot less pleasant.
I kind of like the idea of combining `finalize` and `close` into one function, but it's no big deal.

> Next step is an `RC{T}` class that wraps objects of type `T`, adding reference counting
-100. I would be amazed if there is any reasonable way to make that work. The right way to handle this case is to have the compiler insert speculative early-free calls.
I don't mind the indentation of the do block form. I think readable and intuitive syntax should trump saving keystrokes especially for subtle sources of bugs like resource management. Defer with a macro would be easier to explain the rules for than a sigil handled at the parser level.
I've watched a lot of people write code like this over and over:

```julia
f = open("file")
# do work
close(f)
```
I cannot get them to use the do-block form, even though I've explained many times that it's the right way to express this, since it prevents unclosed file handles in case of exceptions. Of course, the file handles do eventually get closed on gc, so it's not dire, but in general, if people want to do something one way, it's better to make the thing they want to do work right rather than lecturing them about how some other way to do it is better. I doubt we'd have more luck getting people to use the `with` form than the `do` block form. But I'm pretty optimistic that I could get the same people to just write `f = open("file")!` and omit the `close(f)` entirely. In fact, I think people would love this (it's way easier and requires less code), and I don't think that explaining what the `!` means would be hard at all.
Longer term, this syntax would entirely eliminate the need for having do-block versions of all functions to do cleanup. That's a big win. But the real question is whether it's important enough to have its own syntax. I would argue that the pattern of doing setup, then doing work, then doing some cleanup (regardless of how the stack unwinds) is ubiquitous. It's also annoying to get right without syntactic support, and as above, people usually just don't bother. So in my view, having syntactic support for doing this pattern correctly is a no-brainer. Whether the `f(x)!` syntax is the best one or not is another question, but I can't think of anything better. It makes some mnemonic sense too:

- `f!(x)` means that the caller is passing a value to the callee, which will do some work with it and return it to the caller.
- `f(x)!` means that the callee is returning a value to the caller, which will do some work with it and return it to the callee.

Makes sense to me. I suspect it will make sense to other people too and not be terribly hard to explain.
`f!(x)` isn't syntax, it's a naming convention. I'd really prefer a named macro for this (`@cleanup` maybe?), otherwise I'm seeing myself having a hard time explaining to people who aren't familiar with manual resource management why exclamation points sometimes mean "modifies an input" and sometimes mean "cleans up when value goes out of scope." A single character is going to be easy to overlook when quickly reading or trying to debug library code.
Regardless of the syntax I do think it's worth prototyping the pieces of what the implementation would require.
edit: and now a prefix unary `!` will start being used more frequently for logical / function negation, giving yet another meaning exclamation points might have

> `f!(x)` isn't syntax, it's a naming convention.
Is it? Somehow I'd missed that.
We can go ahead with the general `defer` part of this and then play with syntax for auto-finalization.
A nice usage of `with` can be found in the Python Yattag library: http://www.yattag.org/
A similar Julia library would be a great use case.
see https://groups.google.com/forum/#!topic/julia-users/leNMURKreZo
I think that use can already be handled as well or better with the existing `do` block syntax.
Why doesn't the following work?

```julia
redirect_stdout() do r
    show("Hello")
    # close(r[2])
    hello = readavailable(r[1])
    # close(r[1])
end
```
Because (a) there's nothing "available" on the socket until you close it, and (b) you can't close it until you've read what's available on it.
You'll probably also get a deadlock on `show` sometimes too, since the operating system also can't write to the socket until you've started reading from it.
Please use the Discourse forum to ask questions, rather than hijacking issue threads on GitHub.
I went looking for the community's current position on finalizers and resource cleanup, and I found this.
The state seems to be that there are many options, and none of them are great. Package authors seem to solve the problem in many ways. I've seen:
A future language feature, `with`, might provide for finalizers to be called sooner. `defer` may also be added to allow the user to schedule finalizers at resource construction time. I don't see PRs for either of these solutions.
While this issue is a minor nuisance, it may reduce the quality of the code that is available in the package repository.
Here's my strawman proposal: https://github.com/adambrewster/Defer.jl/blob/master/src/Defer.jl.
Assistance from the compiler to generate the scopes automatically would be nice, but this gets somewhat close. It's also a similar syntax to what would be used when `with` or `defer` are implemented.
Thoughts?
I think we should introduce this feature in 1.0 – getting it in 0.6 would have been nice, but there's only so much time. There's still a bit of controversy about the exact syntax, but we'll get there.
A good example using my proposed syntax here is `eachline`:

```julia
for line in eachline("file.txt")!
    # do stuff with line
end  # io object closed reliably on exit of for loop
```
Without the `!` this works, but the file remains open until it is GC'd. Using a do block or defer with a name is much more awkward:

```julia
open("file.txt") do io
    for line in eachline(io)!
        # do stuff with line
    end
end
```

```julia
io = open("file.txt"); defer close(io)
for line in eachline(io)!
    # do stuff with line
end
# io closed when the enclosing scope is left
```
Happy to see this likely to make it into 1.0.
This version works without defining `close(::EachLine)`:

```julia
@scope for line in eachline(@! open("file.txt"))
    # do stuff
end
```

Of course this might, too, depending on when the `!` decides to close things.

```julia
for line in eachline(open("file.txt")!)
    # do stuff
end
```
Going back to https://github.com/JuliaLang/julia/issues/7721#issuecomment-171345256, I have played around with the `do` block pattern, and it does get clunky very fast, see e.g. [1] or [2].
Also, it makes it difficult to step through the code at the REPL in an interactive session. But it does make it clear when the `destroy()`/`close()` methods will get called, which is useful in avoiding gotchas when handling complex interactions with remote resources.
`defer` statements improve the situation, but when there are lots of methods in the mix, it can be annoying to keep track of the specific methods to be deferred, and it would be nice for a library to register them in advance for users.
I think that having broad enough scopes and doing destruction in guaranteed reverse order should be sufficient. By broad enough scopes, I mean that we choose where a defer's scope ends carefully, so that it will be uncommon to want an object to outlive that scope. The biggest consideration should be looping behavior, which is the main reason you need to make sure a resource is finalized: if you open a file every time through a for or while loop, you will want to close it before the next loop iteration, because if there are a lot of iterations, you'll exhaust those resources. Of course, any resources should also be finalized before a function returns – partly because that just seems sensible, but also because the function may be called in a loop or recursively.
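The looping argument above can be sketched in Python (the `FakeHandle` class is purely illustrative): scoping cleanup to each iteration keeps at most one handle live at a time, no matter how many iterations run.

```python
# Illustrative handle that counts how many instances are currently open.
class FakeHandle:
    open_count = 0
    peak = 0

    def __enter__(self):
        FakeHandle.open_count += 1
        FakeHandle.peak = max(FakeHandle.peak, FakeHandle.open_count)
        return self

    def __exit__(self, *exc):
        FakeHandle.open_count -= 1  # released before the next iteration starts
        return False

for _ in range(1000):
    with FakeHandle():
        pass  # per-iteration work

print(FakeHandle.peak, FakeHandle.open_count)  # -> 1 0
```

With GC-driven finalization instead, the peak count would depend on when collection happens to run, which is exactly the non-determinism this thread is about.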
This one looks very neat, so I'm curious why it's not tagged for 1.0 or even 0.7 in order to keep track of it?
(I'm pretty sure even those great coders among you can't remember every issue they've ever read) ;)
Because it is a feature and thus not a release blocker. The milestones are not for arbitrary tagging of stuff to remember them.
As @KristofferC said, this is a non-breaking change and can be implemented in any 1.x release.
Hello, I wonder what the status of the review of the `open() do ... end` alternatives is.
I am OK with the current use, but I often run into cases where it would be nice to have a different solution:
- The `do` block is actually a function body, so the variable scope is local, and often we want to be reading stuff from the `open`ed file. This can be done using a construction like

  ```julia
  a, b = open(file) do fd
      ## read and compute a and b
      return a, b
  end
  ```

  but I think this defeats the purpose of readability.
- Combining the `open() do` construct with things like `Channel() do`, the hierarchical locality can make it really hard for the two constructs to deal with each other's results (but maybe this requires some rethinking on my behalf).

It would be great to have this functionality soon if anyone feels like taking a crack at implementing it.
I have updated my attempt at https://github.com/adambrewster/Defer.jl to be compatible with Julia v1.0.
It's not ideal to do this with a package, but it does present an opportunity to iterate a bit before committing to a language feature.
Regarding the slightly off-topic reference counting of e.g. large arrays with deterministic disposal, `RC{T}`, mentioned by @eschnett: this need also came up in our development recently, in the context of multiple cooperating tasks with one large-array-producing task and multiple consumers. We wrapped up the reference counting in a (quite basic) package:
https://github.com/IHPSystems/ResourcePools.jl
Transferred from https://github.com/JuliaLang/julia/issues/35815:
For performance-critical and real-time applications such as Control Systems, Robotics, Automotive, Audio VST, etc, having a deterministic memory management approach is necessary.
I want to propose an optional memory management feature that allows using Julia without garbage collection. Users can disable the garbage collection and just use this system. Otherwise, GC can help in all of these cases. This should be an optional feature that is added on top of the current behavior and should be fully backward compatible.
_This is just an initial idea, so let me know your suggestions._
`Var{scope_name}(definition)` and `Scope{scope_name}(code)`:
The variables defined using `Var{scope_name}(definition)` should only exist in that scope.
For example,
```julia
function fun()
    Scope{:Foo} begin
        # this keeps x in the memory only inside the :Foo scope:
        x = Var{:Foo}( rand(3) )
    end
    # x will be deleted once the :Foo scope is finished
    return nothing
end
fun()
```
The scope name allows us to let a variable escape other scopes with different names.
```julia
function fun2()
    Scope{:Foo} begin
        Scope{:Bar} begin
            x = Var{:Foo}( rand(3) )
            y = Var{:Bar}( rand(4) )
        end
        # `x` will escape this scope
        # `y` will be removed
    end
    # `x` will be removed here
    return nothing
end
fun2()
```
One can use `Scope{Any}`, which means everything inside it will be removed regardless of the variables' scope names.
```julia
function fun3()
    Scope{Any} begin
        x = Var{:Foo}( rand(3) )
        y = Var{:Bar}( rand(4) )
    end
    # everything should be removed.
end
```
`Scope{:Foo}` and `Var{:Foo}` should be inside the same module. This means all the scoping information is gone outside a module. But this allows deferring the removal of a variable until the scope is called (either from the top level of the module or from inside another function of the module).
```julia
module ModuleA

function fun()
    x = Var{:Foo}( rand(3) )
    return x
end

Scope{:Foo} begin
    xout = fun()
end
# xout will be removed here

end # module
```
When the scope is not specified, it should be considered as local, and the variable should be removed once the program exits that local scope (unless returned). Such a variable should escape all the named scopes.
This syntax can be simplified by removing the need for using `Var`, since this is obvious and we don't need an extra `Var`.
```julia
function fun4(z)
    # This keeps x and y in the memory only inside the `fun` function
    # the simplified version
    y = rand(3)
    # or more explicit
    x = Var( rand(3) )
    # x and y will be removed here.
    # z is from outside, so it should not be removed.
    return z
end
```
```julia
function fun4()
    Scope{:Foo} begin
        y = rand(3)
    end
    # y will escape because it does not have a named scope
    return y
end
```
You can think of the local scope of a function as `Scope{:fun_local}`.
So:
- Each scope has some entry points and some exit points.
- Each scope will be treated independently.
- Things coming from the outside of the scope (entry points), should not be removed inside the scope.
# Generic programming
Because all of the scoping information is gone outside that module, the caller in another module should take responsibility for the memory management of the returns. So they should specify the scope (or delete things manually).
```julia
using SomePackage # exports fun

Scope{:Foo} begin
    # we specify the scope of the return of `fun()`
    myx = Var{:Foo}( fun() )
end
```
Variables that are defined using `global` should be deleted manually by the user. This is the only place where we need to call `delete` directly.
```julia
function fun()
    global x = rand(3)
    return x
end
xout = fun()
delete xout
```
We can decide between two options. _With propagation_ addresses the issue when references are passed to other objects.
1) _With propagation_: scope propagates through operations unless it is explicitly overwritten by the user. The variable returned from a function applied to arguments with different scopes will have the outer scope. When one object is stored in another object with a different scope, the outer scope should be chosen for both. If one is from outside, the other one should live as long as the outer one lives.
```julia
function fun5()
    Scope{:Foo} begin
        a = Var{:Foo}( rand(3) )
        b = a      # b is also a Var{:Foo} now.
        c = a + 1  # c is a Var{:fun5_local} now (because 1 was local)
        Scope{:Bar} begin
            # two different scopes
            d = Var{:Bar}( rand(3) )
            e = d .+ a  # e is considered Var{:Foo}
        end
        # d and e are removed here
    end
    # a, b are removed here
    return nothing
    # c is removed here
end
```
2) _No propagation_: scope does not propagate and should be explicitly specified for each new variable. The exception can be the raw assignment (`d = a`). Based on people's feedback, this option may not work in some situations.
```julia
function fun5()
    Scope{:Foo} begin
        a = Var{:Foo}( rand(3) )
        b = Var{:Foo}( a + 1 )  # b is a Var{:Foo}.
        c = a + 1               # c is not a Var{:Foo} (it is local).
        # assignment exception
        d = a                   # d is a Var{:Foo}
    end
    # a, b, d are removed here
    # c escapes
    return c
end
```
We can use a macro-like syntax instead, though I prefer the one above (it seems cleaner): `@var(scope_name, definition)` for definitions and `@scope(scope_name, code)` for scopes:

```julia
@scope :Foo begin
    @var :Foo rand(3)
end
```
Inspired by https://en.cppreference.com/w/cpp/language/raii, https://doc.rust-lang.org/rust-by-example/scope/raii.html, and https://www.beeflang.org/docs/language-guide/memory/
This is interesting! How does it handle higher-order functions and recursions?
```julia
function f!(y, n = 3)
    n == 0 && return y
    Scope{:Foo} begin
        x = Var{:Foo}(rand(3))
        y .+= f!(x, n - 1)
    end
    return y # is `y` alive here?
end
```
> This is interesting! How does it handle higher-order functions and recursions?
My general idea is that all of this should be decided at compile time (that is why I prefer _no propagation_). So the compiler should be able to detect the scopes and the variables that should be freed. These two cases are detectable by the rules that are set for the compiler.
```julia
function f!(y, n = 3)
    n == 0 && return y
    # y is returned (also coming from outside), so it is not freed.
    # n is from outside, so it is not freed.
    Scope{:Foo} begin
        x = Var{:Foo}(rand(3))
        y = y .+ f!(x, n - 1)  # the resulting y is local (no propagation)
    end
    # x is removed
    # y escapes
    # if we did not return y, it should have been freed
    return y
end
```
I was looking at the "Named Scope" section (rather than "Scope Propagation"):

> ```julia
> function fun()
>     x = Var{:Foo}( rand(3) )
>     return x
> end
> Scope{:Foo} begin
>     xout = fun()
> end
> # xout will be removed here
> ```
From this example, it looks like the caller can specify `Scope{:Foo}`? Then what happens when the caller and callee are the same function (= recursion)?
Maybe I better ask:

```julia
Scope{:Foo} begin
    Scope{:Foo} begin
        ...
    end # is this `end` a no-op?
end
```
> From this example, it looks like the caller can specify `Scope{:Foo}`? Then what happens when the caller and callee are the same function (= recursion)?
I don't think of recursion as a special case. For recursion (similar to all the other situations), each function call should be processed separately.
More generally:
_Each scope has some entry points and some exit points. Each scope will be treated independently._
> Maybe I better ask:
>
> ```julia
> Scope{:Foo} begin
>     Scope{:Foo} begin
>         ...
>     end # is this `end` a no-op?
> end
> ```
If someone writes this directly:
```julia
Scope{:Foo} begin
    Scope{:Foo} begin
        ...
    end
    # anything with :Foo scope is removed here
end
# nothing happens
```
If you want to unwrap the recursion, you should also include each function's local scope, `Scope{:fun_local}`:

```julia
Scope{:fun_local} begin
    Scope{:Foo} begin
        ...
    end
end
```
I've been thinking about resource management a lot lately (in trying to implement consistent data lifecycle handling in DataSets.jl).
I'm using the `do` block form to ensure watertight resource handling, but it's really an API pain point. Some kind of `defer`-like thing seems really desirable.
However, I've noticed that returning an object from `open()` and then `close()`ing it later has a few downsides compared to the `do` block form:

- The resource object becomes a state machine (an `IOStream` can be open or closed). State machines are cool and all, but I'd really rather the compiler generate them from "normal code".
- Any resources acquired inside `open()` must outlive the function call.

These are not problems in the `do` form. To demonstrate these points with one ugly example:
```julia
function open(user_func, config::ResourceConfig)
    r1 = Resource1(config)
    r2 = Resource2(config)  # (1) - stateful resources
    x = foo(r1, r2)
    @sync begin  # (2) structured concurrency
        @async background_work(r1)
        try
            user_func(x)  # (3) no need to create a wrapper to hold (r1, r2) and present it with the interface of x
        finally
            close(r1)  # (1) no need to write cleanup in a separate function `close()`
            close(r2)
        end
    end
end
```
I'm not sure exactly what to do with these observations! But it makes me wish for a compiler transformation which allowed resource authors to write their resource handling in the `do` block callback-passing style, but for users to see beautiful syntax like `open(path)!`.
I'd personally be interested in seeing that become a facet of `@sync` handling, since I think having it manage any resource fits in nicely with it being the nursery for the tasks launched inside it. Where 'that' means that any resource object can be attached to the sync block, and then that block ensures that they are `close`d (or finalized?) before leaving the `end`. For `Task`, that would continue to be a call to `wait`; for IO and Channel, that would be a call to `close`; for `addprocs`, it could be `rmprocs(timeout=Inf)`, etc.
Though `do` blocks currently are used in some places for similar effect (particularly for something like `lock()`), as are try/catch blocks (e.g. `@lock`). And in nearly all cases, `finalize` is also appropriate to use in tandem, for when the user doesn't or can't explicitly specify the lifetime.
EDIT: translating the above example, with an added hypothetical `@manage` macro:

```julia
function open(user_func, config::ResourceConfig)
    @sync begin
        r1 = @manage Resource1(config)
        r2 = Resource2(config)
        @manage r2
        x = foo(r1, r2)
        @async background_work(r1)
        user_func(x)
        @unmanage r2 # explicitly leak and return it
    end
end
```
Right, having something like `@manage` makes sense :+1:
But I think this is orthogonal to the point I'm trying to make here. It's the following dichotomy which is bothering me:

- `open()` is flexible and convenient
- `user_func(open(config)!)` is desirable compared to `open(user_func, config)`
Hah, with the transformation `user_func(open(config)!)` => `open(user_func, config)` in mind, this is reminding me strikingly of delimited continuations. Is `shift`/`reset` an alternative way to think about the desired functioning of the `!` syntax?