The default implementation of dup and clone do exactly the opposite of what they are documented to do, they don't return a copy at all.
Shouldn't the default impl at least try to implement something useable?
I think maybe the best thing to do is to remove those methods from Object and implement them on each class in the standard library (well, not on all of them, maybe basic types such as primitives, Array, Hash, Set, etc.). That way if you want to dup/clone something you need to define what the behaviour should be.
Maybe the default implementation for dup can allocate a new object and copy all instance vars from self to the new object. For clone it would be the same, but using clone. I managed to do it with the current language (maybe we could add obj.@var = ...
to not need a separate method, right now we have obj.@var
for reading but not for writing).
class Reference
macro def dup
{{@type}}.allocate.tap &.initialize_dup(self)
end
protected macro def initialize_dup(other)
{% for ivar in @type.instance_vars %}
@{{ivar.id}} = other.@{{ivar.id}}
{% end %}
end
macro def clone
{{@type}}.allocate.tap &.initialize_clone(self)
end
protected macro def initialize_clone(other)
{% for ivar in @type.instance_vars %}
@{{ivar.id}} = other.@{{ivar.id}}.clone
{% end %}
end
end
class Foo
getter x
getter array
def initialize
@x = 1
@array = [1, 2, 3]
end
end
foo = Foo.new
pp foo
pp foo.array.object_id
foo_dup = foo.dup
pp foo_dup
pp foo_dup.array.object_id
foo_clone = foo.clone
pp foo_clone
pp foo_clone.array.object_id
But I'm not sure. Some types are immutable and dup
and clone
should just return self
. Others are mutable but are not meant to be cloned, though in generic code you might invoke clone
on a series of objects and you'd want that to work.
The current way is: dup
and clone
return self
, if you want a more specific behaviour override them. That's what the docs say, so for me that's not broken (the docs don't lie). But it's true that it's not very intuitive and it can lead to buggy code, or unexpected behaviour.
So another alternative is to not provide a default implementation but provide a macro to define dup
and clone
, like above. I think I like this solution more, as it's more explicit.
Other thoughts?
I think the current implementation is buggy, or at least highly unexpected, thus leading to bugs. Either the methods really dup/clone the object, or they should be left undefined on Object, and implemented on each type.
Since Crystal is mutable by default, then it's the job of immutable types to return self, not the other way around. Same for types that aren't meant to be cloned: maybe it should error out, unless this is taken care of (eg. return self if it makes sense).
The current impl fails PoLS since they do _exactly_ the opposite of what the method name suggests.
@mperham What do you mean by "exactly the opposite"? The base implementation just returns self and requires subclasses to override it for specific meaning. Then for example Array
implements dup
by creating a new array and copying the elements as-is to the new array (so it's shallow). It then implements clone
by doing the same, but invoking clone
on each element (so it's deep).
class Foo
end
describe "foo" do
it "should dup" do
f = Foo.new
f.should_not eq(f.dup)
end
end
@mperham I see.
I've been thinking about this and we can probably do this:
dup
and clone
from Objectdup
and clone
for primitive types (Int
, Float
, Nil
, Bool
, Char
) as returning self
, because they are immutabledup
for Reference by using the code belowdup
for Struct by returning self
(because that returns a copy)clone
for Reference
or Struct
: that's logic that is specific to each subclass. We of course implement it for some types in the standard library, like Array
, Hash
, Set
, etc. (well, that's already done)Reference#dup
can be implemented like this:
class Reference
macro def dup
{% begin %}
dup = {{@type}}.allocate
dup.as(Void*).copy_from(self.as(Void*), instance_sizeof({{@type}}))
dup
{% end %}
end
end
It basically allocates an instance of the same type and then copies the contents from self
into the new instance by copying the raw bytes behind the instance. This is probably more efficient and more generic than copying each instance var. It seems to work.
What do you think? I'll probably send a PR with this soon, but it'll target the next version as it's a breaking change (but probably a good breaking change).
isn't in this implementation of Reference#dup
, object_id also copied?
@kostya No, the type_id
is copied (first element behind the pointer), but object_id
is the pointer's address. Basically it's dup.as(Void*).address
: https://play.crystal-lang.org/#/r/z5a
I agree that the current behaviour is _bug-prone_, and "buggy" in relation to identifier-semantics and _expectation_.
Having them "always" work (and making special implementations for efficiency/safety when needed) would indeed be the nicest thing.
As to the "using-pointer-based-copying" version, I think this might be a bad move in a broader perspective. It makes low-level assumptions that makes it harder to integrate possibly automated locking/atomic-assigns generation in dup/clone later on when the model for concurrency, and especially the parallelism part of it, has been decided; depending of course on how openly and sharing it's defined to be/_allow_ (for instance all objs are thread-local except those inheriting Nomad/Shared/Locakable/SyncRef/Whatever and requires special attention and can be passed via channels, or whatever).
But I guess perhaps I'm a bit too pre-emptive in my criticism since none of that is on the table yet, and well, it can simply be rewritten then.
Just airing my thoughts :-)
Did String
get missed here? I've found a few times when I want to use #clone
on a String so I can get a copy of the memory and pass it into a C function that modifies that underlying memory, but I just found out that String#clone
and String#dup
are just returning self
.
@watzon strings are immutable and passing a String
to anything which modifies the underlying buffer is undefined behaviour (String#size
will return the wrong value if you do this)
Please make a copy of the buffer, instead of the String
.
Most helpful comment
@mperham I see.
I've been thinking about this and we can probably do this:
dup
andclone
from Objectdup
andclone
for primitive types (Int
,Float
,Nil
,Bool
,Char
) as returningself
, because they are immutabledup
for Reference by using the code belowdup
for Struct by returningself
(because that returns a copy)clone
forReference
orStruct
: that's logic that is specific to each subclass. We of course implement it for some types in the standard library, likeArray
,Hash
,Set
, etc. (well, that's already done)Reference#dup
can be implemented like this:It basically allocates an instance of the same type and then copies the contents from
self
into the new instance by copying the raw bytes behind the instance. This is probably more efficient and more generic than copying each instance var. It seems to work.What do you think? I'll probably send a PR with this soon, but it'll target the next version as it's a breaking change (but probably a good breaking change).