Currently, when defining inner constructors, we can decide to incompletely initialize a structure simply by passing to new(...)
fewer values than the number of fields:
struct Incomplete
x1
x2
x3
...
x719
Incomplete(x1, x2, x3, x4) = new(x1, x2, x3, x4) # x5, ..., x719 are uninitialized
end
Since the call to new(...)
accepts only positional arguments, this has the obvious limitation that we can only initialize a prefix of all the fields and leave the remaining ones uninitialized. It is impossible to initialize only x1
, x42
, x470
and x666
.
My proposal is to allow new(...)
to accept keyword arguments, so that the following code becomes legal:
struct Incomplete
x1
x2
x3
...
x719
Incomplete(a, b, c, d) = new(x1=a, x42=b, x470=c, x666=d) # the other fields are uninitialized
end
It is just useful sometimes to be able to do this. See for instance this thread where I asked on the forum if there was a way to achieve this with the current state of the language. There are plenty of workarounds, with more or less significant downsides (complexity-wise and performance-wise), but no direct solution.
This should be :100:.
I honestly don't see any.
design, feature, keyword arguments, types and dispatch
Orthogonally to incomplete initialization, just having a form of new
with keyword arguments would be kind of nice for composite types with a nontrivial amount of fields.
Downsides
I honestly don't see any.
- Complicates the parser and lowering passes of the front-end
How does it complicate the _parser_? The parser is already capable of parsing keyword arguments elsewhere. Is the new
function special-cased in this regard? I'll try to dig into the code as soon as I find the time. I'm not familiar at all with the internals, but it seems strange to me that this proposal would require a big change to the parser.
- Compiler needs to be able to handle arbitrary subsets of fields being uninitialized.
Does this mean that currently the compiler relies on the fact that only a suffix of the fields can be uninitialized? I don't see why the compiler should care about this. The implementation should be pretty straightforward:
0
, NULL
or whatever the fields which are references, or skip initialization altogether if possible.Then the logic for accessing fields doesn't need to change at all. We don't have to keep track of which fields are initialized or not.
How does the compiler currently "handle" which fields are initialized? I think it just emits the asm to read them whenever they are accessed, right? There shouldn't be anything new to do w.r.t. the current implementation.
It's probably not a huge complication, but it will certainly complicate some of the front-end-code. The new
constructor isn't really a function, so it's handled a bit specially.
Regarding the compiler, it needs to defensively emit code that checks any ref fields that can be uninitialized to make sure they're not NULL pointers before accessing them. When accessing fields that are always initialized, this doesn't need to be done. Currently, which fields need to be checked and which don't can be tracked with a single number; if it becomes an arbitrary subset, a bitfield will be required instead.
I'm not arguing against this feature, just pointing out that it isn't without implementation and maintenance costs.
I do think this feature makes sense. It's easy to implement for mutable structs, since we can just lower it to a series of setfield!
s. For immutable we would probably need the "mutating immutables" feature (#11902) before we could do this.
@StefanKarpinski
Regarding the compiler, it needs to defensively emit code that checks any ref fields that can be uninitialized to make sure they're not NULL pointers before accessing them. When accessing fields that are always initialized, this doesn't need to be done. Currently, which fields need to be checked and which don't can be tracked with a single number; if it becomes an arbitrary subset, a bitfield will be required instead.
Now I completely get what you mean. Sorry :) Of course I agree with what you say.
I believe the counter you mention is ninitialized
in julia.h
typedef struct _jl_datatype_t {
/* ... */
int32_t ninitialized;
/* ... */
} jl_datatype_t;
right?
@JeffBezanson
For immutable we would probably need the "mutating immutables" feature (#11902) before we could do this.
I also thought about having temporary mutability for immutable structs inside inner constructors, but that seemed somewhat a bigger change to the language, therefore I limited myself to this proposal, which could potentially be implemented in an easier way than the much bigger rework of immutability.
It's easy to implement for mutable structs, since we can just lower it to a series of
setfield!
s.
That's also the case that I consider less interesting, because I could just do it myself by using an emptynew()
in the inner constructor and then explicitly setting the fields that I want. The only downside is that those fields would probably not be marked as "always initialized", so accessing them requires a NULL
check. Of course having the support for this feature would still be handy, but not as indispensable as it is for immutable structs, for which there is no possible workaround in userland.
FWIW, I've always questioned the need for partially initialized immutable structs. I guess that it can be useful in some situations, but since you can't finish initializing them, it seems like a bit of a dicey feature to me.
Well, maybe you don't need to "finish initializing" them; they are already finished as they are. They just carry redundant fields for homogeneity with other instances of the same type.
I'll reproduce here the example I gave on the forum. Imagine that you want to represent bounded/unbounded open/closed intervals with endpoints of type T
. In C
you would do
typedef struct _interval_t {
T left;
T right;
uint8_t kind;
} interval_t;
where kind
is a bitmask that indicates the kind of endpoints: there are 3 possibilities (closed, open, unbounded) for each endpoint, so you can use 2 bits for each.
When you need to represent the interval [42, ∞)
, you just don't assign to the member right
(and of course never read it afterwards!). The produced struct
has an uninitialized member, but it is in no way "unfinished". It's like this just to blend in with her friends (-∞, 13]
and [0, 1]
.
In that case, if T
is a reference type you're better off setting the unused field to some value anyway, so we don't have to check fields on access.
True, but it isn't straightforward to produce "some value" of type T
in a generic fashion. One would need to force library users to define a
default(::Type{T}) = ...?...
for their types if they want to use them in the structure I provide them.
Also, either you always return exactly the same instance, or you'll have some allocations.
default(::Type{BigInt}) = BigInt(0)
would be bad. One has to do
const BIGINT_DEFAULT = BigInt(0)
default(::Type{BigInt}) = BIGINT_DEFAULT
This could be seen as a possible tradeoff once the two approaches are available:
NULL
checks;NULL
checks.@FedericoStra: as I suggested on the forum, I think that Union
types are the perfect solution for your original problem.
Generally, incomplete initialization may be a vestigial feature of Julia since Union
became efficient. I don't think that relying on it to effectively emulate NULL
in other languages should be encouraged, since we have Union
types which was developed for precisely this.
@tpapp: You know what? I think if you keep telling me you'll probably convince me in the end! I might be a bit stubborn, but not to the point of being completely unreasonable :stuck_out_tongue_closed_eyes:
So here is my idea: I plan to have a look at builtins.c
, cgutils.c
, codegen.cpp
, datatype.c
, and jltypes.c
, the places where ninitialized
is mainly used. This is an interesting study for me, so even if I don't conclude anything, it'll be an instructive learning experience. I'll start doing this on monday, and I'll try to keep posting here some notes documenting my findings and what needs to be changed. I wish you all a good weekend!
The main difference between using a union with nothing
and undef fields is that accessing an undef field is an immediate error whereas nothing
is a first class value that you can access, pass to functions, define methods for, etc. Using a union with nothing
is also more generic since not all types of fields can be undef
, so if you're relying on the "error on access" feature of undef fields then you have to be careful of what type of field you have—it must be a reference type. The union approach seems generally more useful and usable to me, but that is of course a judgement call.
The downside of the Union
approach is that every field keeps track of its own state of definedness, so every access is always checked. I feel like a benefit of the undef approach is when the field is "plain data" (hence no NULL
check) and the state of definedness is kept implicitly by other fields, or it is even determined outside of the struct by its usage. Imagine having
struct P
x::Int
y::Int
end
and then knowing that in a collection Vector{P}
every element has both x
and y
, except for the last which has only y
. This is a knowledge that is kept outside of the struct, it is given by the problem at hand, and the undef approach allows you to avoid any checks upon access.
I realize that this is a quite specific usage, so the Union
approach is definitely generally more useful, but this necessity can come up and the undef alternative allows some (possibly tiny) performance improvement.
I would never suggest relying on the "error on access" feature of ref fields. I'm more for the "look before you leap" than the "it’s easier to ask for forgiveness than permission" ideology (when I'm outside of Python).
I plan to have a look at
builtins.c
,cgutils.c
,codegen.cpp
,datatype.c
, andjltypes.c
, the places whereninitialized
is mainly used.
Of course do so if you're interested, but I don't think ninitialized
is the limiting factor here. We could very well implement this without changing it, and still just set it to whatever prefix of fields is always initialized. In particular, this won't cause regressions in any existing code, so generalizing ninitialized
would not be urgent.
@FedericoStra:
downside of the Union approach is that every field keeps track of its own state of definedness
[...]
knowing that in a collection Vector{P} every element has both x and y, except for the last which has only y.
What about
struct P
xy::Union{Tuple{Int,Int},Tuple{Int,Nothing}}
end
with a convenient interface via Base.getproperty
?
Generally, the more I think about incomplete initialization in the post-Union
-optimized era of Julia, the less the use case makes sense, especially in a style which favors immutables and generic code. The example in the manual about a recursive struct is cute, but I am not sure that this pattern occurs in the wild so much that it would be missed. Maybe in 2.0 the whole feature could be removed, or at least the language should enforce that no uninitialized fields can escape the constructor.
If there are examples of incomplete initialization used in the wild, I would appreciate some links.
Please don't remove incomplete initialization for no reason. There are a lot of applications outside of number crunching, where immutable types are not an option and incomplete initialization and delayed computation of fields is important. Also, Unions
are not a solution to every problem, in particular in view of the recent changes towards reducing compiler latency.
There are a lot of applications
Just to clarify: I am not (yet) suggesting that incomplete initialization is removed (in any case, this is a breaking change so the earliest this could happen is 2.0), but I think that real-world use cases would still be very interesting.
The example in the manual about a recursive struct is cute, but I am not sure that this pattern occurs in the wild so much that it would be missed
Trees and linked-lists? Nodes have to refer to other nodes, but for a list the head and tail don't (yet) refer to anything. Likewise for the root and child nodes of trees. You might think you could circumvent this by using mutable struct
s and have the head/tail refer to themselves (so all fields are initialized), but you still have to temporarily leave a field uninitialized:
# A demo where the user never sees an object with an uninitialized field, but you still need to be able to construct one transiently
"""
Node{T}(data)
Construct the head of the list.
"""
function Node{T}(data) where T
head = new{T}(data)
head.preceeding = head
head.succeeding = head
return head
end
"""
Node(data, preceeding)
Add a new link after `preceeding`
"""
function Node(data, preceeding::Node{T}) where T
node = new{T}(data, preceeding)
preceeding.succeeding = node
return node
end
These are pretty fundamental data structures, and need to be supported.
@JeffBezanson
Of course do so if you're interested, but I don't think ninitialized is the limiting factor here. We could very well implement this without changing it, and still just set it to whatever prefix of fields is always initialized. In particular, this won't cause regressions in any existing code, so generalizing ninitialized would not be urgent.
Sorry, maybe what I wrote was unclear. My intention is of course to retain ninitialized
with exactly the same meaning that it currently has: the length of the longest prefix of fields that are always initialized by every inner constructor. What I meant is that I would probably add a member
typedef struct _jl_datatype_t {
/* ... */
jl_svec_t *types;
jl_svec_t *names;
jl_svec_t *alwaysinit; /* this is the new member */
int32_t ninitialized;
/* ... */
} jl_datatype_t;
which is a simple vector of jl_bool_type
s keeping track of whether each individual field is always initialized by every inner constructor.
This way ninitialized
would still be available and nothing should break. What I meant is that I was looking for all the call sites of jl_new_datatype
, which need to be updated to pass also alwaysinit
.
@timholy: thanks for the example. I agree that mutable struct
is indeed a relevant use case. Note that for this, incomplete initialization of tail fields works fine, even though it has no dedicated syntax:
julia> mutable struct Incomplete
a
b
Incomplete(b) = (result = new(); result.b = b; result)
end
julia> Incomplete(2)
Incomplete(#undef, 2)
If a keyword-based syntax is desired, a macro simple should be able to take care of this just fine.
As for immutable struct
s, I still think that leaving fields uninitialized is a vestigial feature after Union
optimizations, and there is no compelling reason for it to be supported (this, of course, is a question for 2.0). I may be mistaken of course, in which case I would appreciate examples.
@tpapp I believe I've already given elsewhere some compelling examples where immutable structs with uninitialized fields can be of use. I'll try to repeat here the main advantage they can provide from my point of view.
They allow to work with homogeneous collections of objects with efficient access (no isa Nothing
check) of "plain data" fields in situations where the state of initialization is not necessarily kept by some internal flags (checking which would probably be analogous to isa Nothing
) but rather by some external invariants (for instance, being the first or last element in a list).
In the fullness of time (i.e. for Julia 2.0), I think we should consider eliminating uninitialized fields entirely. This could be done with a combination of union types and field defaults. For example, we could only allow fields to not be explicitly given a value if they have a default and then double down on the convention of using nothing
as the default for fields that may need to be uninitialized for a bit. This need not change the memory representation of values at all since it's strightforward to represent a field of type Union{List, Nothing}
as a single pointer with NULL
indicating that the value is the singleton nothing
rather than actually storing a pointer to a nothing
value, although, that's a constant as well, so maybe there's no real point.
The only hesitation I have about that is that it forces the type of a field to be a union even if the field never escapes the constructor uninitialized. So an intermediate solution would be to somehow enforce that undef fields get defined by the time an object is fully constructed. Of course, that leaves the question of when the object is considered fully constructed. One potential definition is when it is returned from the method in which new
is called, so we would enforce that it have all fields assigned by the time it is returned from that function body and then not have to worry about uninitialized fields in any other code.
In any case, I think this is mostly tangential to this issue. None that clashes with this feature that I can see.
Another avenue to explore is to have an explicit unconstructable token. Union{List, Never}
can thus be used to represent a field that is set to either List or not set (as the Never
type would be known to have no instances). This makes Never similar to Union{}
in the (un)constructable sense, without also defining it to be a subtype of all other types.
Most helpful comment
Trees and linked-lists? Nodes have to refer to other nodes, but for a list the head and tail don't (yet) refer to anything. Likewise for the root and child nodes of trees. You might think you could circumvent this by using
mutable struct
s and have the head/tail refer to themselves (so all fields are initialized), but you still have to temporarily leave a field uninitialized:These are pretty fundamental data structures, and need to be supported.