Julia: `Array(...)` (and constructors for all collection types) should always make copies

Created on 3 Aug 2017  路  13Comments  路  Source: JuliaLang/julia

The behavior between Dict and an Array is inconsistent when it comes to when copying is made:

julia> a = [1,2,3];

julia> b = typeof(a)(a);

julia> b[1] = 3
2

julia> a
3-element Array{Int64,1}:
 3
 2
 3

julia> A = Dict(1=>"a", 2 =>"b", 3=>"c");

julia> B = typeof(A)(A);

julia> B[1] => "c"
"b"=>"c"

julia> A
Dict{Int64,String} with 3 entries:
  2 => "b"
  3 => "c"
  1 => "a" # <---- not "c"

In my opinion, Dict(::Dict) should not make a copy if it can just return the dict.

Reported originally at: https://discourse.julialang.org/t/why-is-d-dict-dict-d/5197

arrays breaking collections

Most helpful comment

This might be a convert vs. construct difference --- convert should not make a copy if the value is already of the right type, but arguably a constructor should always make a new object (the word "constructor" seems to imply it anyway). I'd vote for the other change, and make Array copying.

All 13 comments

This might be a convert vs. construct difference --- convert should not make a copy if the value is already of the right type, but arguably a constructor should always make a new object (the word "constructor" seems to imply it anyway). I'd vote for the other change, and make Array copying.

What about the convert fulfills the construct contract (or however it phrased)?

-(::Type{T})(arg) where {T} = convert(T, arg)::T
+(::Type{T})(arg::T) where {T} = copy(arg)
+(::Type{T})(arg) where {T} = convert(T, arg)::T

???

Here's a survey of what happens calling mutable structs' own constructors on themselves :

julia> tas = [
           (Array, [1,2,3]),
           (BitArray, 2),
           (SharedArray, [1,2,3]),
           (Dict, 2 => "a"),
           (ObjectIdDict, 2 => "a"),
           (Set, "a"),
           (Task, () -> rand(9)),
           (Channel, 2),
           (WeakRef, "a"),
           (Ref, "a"),
           (Regex, "^a"),
           (IOStream, "a"),
           (IOBuffer, "a")
       ];

julia> for (t, a) in tas
           print(t, "\n     ")
           x = t(a)
           y = t(x)
           println(x === y)
       end
Array
     true
BitArray
     true
SharedArray
     true
Dict
     false
ObjectIdDict
     false
Set
     false
Task
     false
Channel
     true
WeakRef
     false
Ref
     true
Regex
     true
IOStream
     true
Base.AbstractIOBuffer{Array{UInt8,1}}
     true

Yikes, that's a bit haphazard.

Very occasionally, the overlap between conversion and construction is annoying. While the default method for construction calling convert may be to blame here, there is still the issue that convert itself has no guarantees about aliasing.

My point being that given that generic code may be dependent on aliasing behaviour, generally it would be great if each function had documented aliasing guarantees. I'm not sure the current set of functions is built around this idea (for constructors, convert, and many others too)...

Thinking aloud, perhaps a trait or something could describe the aliasing relationship between two types through convert so at least generic code could introspect and deal with this (at compile time)?

@JeffBezanson While, yes, constructors may tend to allocate new objects, many are immutable wrappers (I'm thinking RowVector as an example here) which are designed around the idea of not performing copies; OTOH it is aliasing behaviour (often of elements or fields) which is important for programmers to reason about, not just whether the outermost object is semantically a "new" instance (of the outermost type)...

...or constructors are only allowed not to copy their input if it is wrapped in a Move object. :grin: Next up: attaching && to types...

@martinholters see #15120... The fact that the fallback constructor doesn't call copy is just one way to look at the problem; convert might not even be defined, so we're basically just guessing at what might work to construct an instance of a type.

Thanks for the analysis @garborg . I wouldn't necessarily apply this issue to non-collections like IOStream though. And Task(::Task) should perhaps be an error.

Whatever else is decided here, it's very clear to me that convert cannot be required to make copies. convert(Any, x) should be a no-op, and therefore so should conversion to any type that already includes x.

I agree that some types like SubArray and RowVector act as "views" of data. They will have to have non-copying constructors; one size doesn't fit all. For an unknown type T, it's probably impossible and inadvisable to reason about whether T(x) aliases x. But I think we can very reasonably be concerned about this case:

function dostuff(pairs)
    d = Dict(pairs)
    d[newkey] = 0
    ...
end

Here I'm explicitly constructing a specific type of mutable collection. My function expects an iterator of pairs. It seems to work. But one day somebody passes an argument that's already a Dict, and the function clobbers their data. Hopefully we can agree that would be a bad situation.

I should point out that Array has kind of always been a special case here, since we don't have an Array(iterator) constructor like we do for Set and Dict. collect is basically this Array constructor.

This would also be fixed by #15120, since Array(array) would give a method error instead of calling convert.

we don't have an Array(iterator) constructor like we do for Set and Dict. collect is basically this Array constructor

little old, but xref #16633, #16029

To me, it would seem logical if collect became more generic and Array(iterator) did what you'd expect... For v1.0 with inferable keyword arguments (which is going to be kick-ass for API design), could we use something like Array(size = (3,4)) and Vector(length = 3)?

We could do that, but I'm more worried about the idea of deprecating Array{T}((3,4)). Would we actually go ahead with that?

In the extreme case we would have to since a tuple of ints is a valid iterable that people might want to collect in an array.

I'm not loving that aspect but the keyword arg version is at least very easy to read (though much less concise!). Would be an extremely annoying deprecation if we did this so not sure if it's truly worth the effort or not...

Was this page helpful?
0 / 5 - 0 ratings