The behavior between Dict
and an Array
is inconsistent when it comes to when copying is made:
julia> a = [1,2,3];
julia> b = typeof(a)(a);
julia> b[1] = 3
2
julia> a
3-element Array{Int64,1}:
3
2
3
julia> A = Dict(1=>"a", 2 =>"b", 3=>"c");
julia> B = typeof(A)(A);
julia> B[1] => "c"
"b"=>"c"
julia> A
Dict{Int64,String} with 3 entries:
2 => "b"
3 => "c"
1 => "a" # <---- not "c"
In my opinion, Dict(::Dict)
should not make a copy if it can just return the dict.
Reported originally at: https://discourse.julialang.org/t/why-is-d-dict-dict-d/5197
This might be a convert
vs. construct difference --- convert
should not make a copy if the value is already of the right type, but arguably a constructor should always make a new object (the word "constructor" seems to imply it anyway). I'd vote for the other change, and make Array
copying.
What about the convert
fulfills the construct
contract (or however it phrased)?
-(::Type{T})(arg) where {T} = convert(T, arg)::T
+(::Type{T})(arg::T) where {T} = copy(arg)
+(::Type{T})(arg) where {T} = convert(T, arg)::T
???
Here's a survey of what happens calling mutable structs' own constructors on themselves :
julia> tas = [
(Array, [1,2,3]),
(BitArray, 2),
(SharedArray, [1,2,3]),
(Dict, 2 => "a"),
(ObjectIdDict, 2 => "a"),
(Set, "a"),
(Task, () -> rand(9)),
(Channel, 2),
(WeakRef, "a"),
(Ref, "a"),
(Regex, "^a"),
(IOStream, "a"),
(IOBuffer, "a")
];
julia> for (t, a) in tas
print(t, "\n ")
x = t(a)
y = t(x)
println(x === y)
end
Array
true
BitArray
true
SharedArray
true
Dict
false
ObjectIdDict
false
Set
false
Task
false
Channel
true
WeakRef
false
Ref
true
Regex
true
IOStream
true
Base.AbstractIOBuffer{Array{UInt8,1}}
true
Yikes, that's a bit haphazard.
Very occasionally, the overlap between conversion and construction is annoying. While the default method for construction calling convert
may be to blame here, there is still the issue that convert
itself has no guarantees about aliasing.
My point being that given that generic code may be dependent on aliasing behaviour, generally it would be great if each function had documented aliasing guarantees. I'm not sure the current set of functions is built around this idea (for constructors, convert, and many others too)...
Thinking aloud, perhaps a trait or something could describe the aliasing relationship between two types through convert
so at least generic code could introspect and deal with this (at compile time)?
@JeffBezanson While, yes, constructors may tend to allocate new objects, many are immutable wrappers (I'm thinking RowVector
as an example here) which are designed around the idea of not performing copies; OTOH it is aliasing behaviour (often of elements or fields) which is important for programmers to reason about, not just whether the outermost object is semantically a "new" instance (of the outermost type)...
...or constructors are only allowed not to copy their input if it is wrapped in a Move
object. :grin: Next up: attaching &&
to types...
@martinholters see #15120... The fact that the fallback constructor doesn't call copy
is just one way to look at the problem; convert
might not even be defined, so we're basically just guessing at what might work to construct an instance of a type.
Thanks for the analysis @garborg . I wouldn't necessarily apply this issue to non-collections like IOStream
though. And Task(::Task)
should perhaps be an error.
Whatever else is decided here, it's very clear to me that convert
cannot be required to make copies. convert(Any, x)
should be a no-op, and therefore so should conversion to any type that already includes x.
I agree that some types like SubArray
and RowVector
act as "views" of data. They will have to have non-copying constructors; one size doesn't fit all. For an unknown type T
, it's probably impossible and inadvisable to reason about whether T(x)
aliases x
. But I think we can very reasonably be concerned about this case:
function dostuff(pairs)
d = Dict(pairs)
d[newkey] = 0
...
end
Here I'm explicitly constructing a specific type of mutable collection. My function expects an iterator of pairs. It seems to work. But one day somebody passes an argument that's already a Dict
, and the function clobbers their data. Hopefully we can agree that would be a bad situation.
I should point out that Array
has kind of always been a special case here, since we don't have an Array(iterator)
constructor like we do for Set and Dict. collect
is basically this Array
constructor.
This would also be fixed by #15120, since Array(array)
would give a method error instead of calling convert
.
we don't have an
Array(iterator)
constructor like we do for Set and Dict.collect
is basically thisArray
constructor
little old, but xref #16633, #16029
To me, it would seem logical if collect
became more generic and Array(iterator)
did what you'd expect... For v1.0 with inferable keyword arguments (which is going to be kick-ass for API design), could we use something like Array(size = (3,4))
and Vector(length = 3)
?
We could do that, but I'm more worried about the idea of deprecating Array{T}((3,4))
. Would we actually go ahead with that?
In the extreme case we would have to since a tuple of ints is a valid iterable that people might want to collect in an array.
I'm not loving that aspect but the keyword arg version is at least very easy to read (though much less concise!). Would be an extremely annoying deprecation if we did this so not sure if it's truly worth the effort or not...
Most helpful comment
This might be a
convert
vs. construct difference ---convert
should not make a copy if the value is already of the right type, but arguably a constructor should always make a new object (the word "constructor" seems to imply it anyway). I'd vote for the other change, and makeArray
copying.