Crystal: Object#clone is broken

Created on 20 May 2016 · 11Comments · Source: crystal-lang/crystal

The default implementation of dup and clone do exactly the opposite of what they are documented to do, they don't return a copy at all.

https://github.com/crystal-lang/crystal/blob/bbf7c350dcb11e8906631fb62ac87fb965b23f4d/src/object.cr#L162

Shouldn't the default impl at least try to implement something useable?

bug stdlib

Source

mperham

👀1

Most helpful comment

@mperham I see.

I've been thinking about this and we can probably do this:

Remove dup and clone from Object
Implement dup and clone for primitive types (Int, Float, Nil, Bool, Char) as returning self, because they are immutable
Implement dup for Reference by using the code below
Implement dup for Struct by returning self (because that returns a copy)
We don't implement clone for Reference or Struct: that's logic that is specific to each subclass. We of course implement it for some types in the standard library, like Array, Hash, Set, etc. (well, that's already done)

Reference#dup can be implemented like this:

class Reference
  macro def dup
    {% begin %}
      dup = {{@type}}.allocate
      dup.as(Void*).copy_from(self.as(Void*), instance_sizeof({{@type}}))
      dup
    {% end %}
  end
end

It basically allocates an instance of the same type and then copies the contents from self into the new instance by copying the raw bytes behind the instance. This is probably more efficient and more generic than copying each instance var. It seems to work.

What do you think? I'll probably send a PR with this soon, but it'll target the next version as it's a breaking change (but probably a good breaking change).

asterite on 21 May 2016

👍5 ❤2

All 11 comments

I think maybe the best thing to do is to remove those methods from Object and implement them on each class in the standard library (well, not on all of them, maybe basic types such as primitives, Array, Hash, Set, etc.). That way if you want to dup/clone something you need to define what the behaviour should be.

Maybe the default implementation for dup can allocate a new object and copy all instance vars from self to the new object. For clone it would be the same, but using clone. I managed to do it with the current language (maybe we could add obj.@var = ... to not need a separate method, right now we have obj.@var for reading but not for writing).

class Reference
  macro def dup
    {{@type}}.allocate.tap &.initialize_dup(self)
  end

  protected macro def initialize_dup(other)
    {% for ivar in @type.instance_vars %}
      @{{ivar.id}} = other.@{{ivar.id}}
    {% end %}
  end

  macro def clone
    {{@type}}.allocate.tap &.initialize_clone(self)
  end

  protected macro def initialize_clone(other)
    {% for ivar in @type.instance_vars %}
      @{{ivar.id}} = other.@{{ivar.id}}.clone
    {% end %}
  end
end

class Foo
  getter x
  getter array

  def initialize
    @x = 1
    @array = [1, 2, 3]
  end
end

foo = Foo.new
pp foo
pp foo.array.object_id

foo_dup = foo.dup
pp foo_dup
pp foo_dup.array.object_id

foo_clone = foo.clone
pp foo_clone
pp foo_clone.array.object_id

In play

But I'm not sure. Some types are immutable and dup and clone should just return self. Others are mutable but are not meant to be cloned, though in generic code you might invoke clone on a series of objects and you'd want that to work.

The current way is: dup and clone return self, if you want a more specific behaviour override them. That's what the docs say, so for me that's not broken (the docs don't lie). But it's true that it's not very intuitive and it can lead to buggy code, or unexpected behaviour.

So another alternative is to not provide a default implementation but provide a macro to define dup and clone, like above. I think I like this solution more, as it's more explicit.

Other thoughts?

asterite on 21 May 2016

I think the current implementation is buggy, or at least highly unexpected, thus leading to bugs. Either the methods really dup/clone the object, or they should be left undefined on Object, and implemented on each type.

Since Crystal is mutable by default, then it's the job of immutable types to return self, not the other way around. Same for types that aren't meant to be cloned: maybe it should error out, unless this is taken care of (eg. return self if it makes sense).

ysbaddaden on 21 May 2016

The current impl fails PoLS since they do _exactly_ the opposite of what the method name suggests.

mperham on 21 May 2016

@mperham What do you mean by "exactly the opposite"? The base implementation just returns self and requires subclasses to override it for specific meaning. Then for example Array implements dup by creating a new array and copying the elements as-is to the new array (so it's shallow). It then implements clone by doing the same, but invoking clone on each element (so it's deep).

asterite on 21 May 2016

class Foo
end

describe "foo" do
  it "should dup" do
    f = Foo.new
    f.should_not eq(f.dup)
  end
end

mperham on 21 May 2016

@mperham I see.

I've been thinking about this and we can probably do this:

Remove dup and clone from Object
Implement dup and clone for primitive types (Int, Float, Nil, Bool, Char) as returning self, because they are immutable
Implement dup for Reference by using the code below
Implement dup for Struct by returning self (because that returns a copy)
We don't implement clone for Reference or Struct: that's logic that is specific to each subclass. We of course implement it for some types in the standard library, like Array, Hash, Set, etc. (well, that's already done)

Reference#dup can be implemented like this:

class Reference
  macro def dup
    {% begin %}
      dup = {{@type}}.allocate
      dup.as(Void*).copy_from(self.as(Void*), instance_sizeof({{@type}}))
      dup
    {% end %}
  end
end

What do you think? I'll probably send a PR with this soon, but it'll target the next version as it's a breaking change (but probably a good breaking change).

asterite on 21 May 2016

👍5 ❤2

isn't in this implementation of Reference#dup, object_id also copied?

kostya on 22 May 2016

@kostya No, the type_id is copied (first element behind the pointer), but object_id is the pointer's address. Basically it's dup.as(Void*).address: https://play.crystal-lang.org/#/r/z5a

asterite on 22 May 2016

I agree that the current behaviour is _bug-prone_, and "buggy" in relation to identifier-semantics and _expectation_.

Having them "always" work (and making special implementations for efficiency/safety when needed) would indeed be the nicest thing.

As to the "using-pointer-based-copying" version, I think this might be a bad move in a broader perspective. It makes low-level assumptions that makes it harder to integrate possibly automated locking/atomic-assigns generation in dup/clone later on when the model for concurrency, and especially the parallelism part of it, has been decided; depending of course on how openly and sharing it's defined to be/_allow_ (for instance all objs are thread-local except those inheriting Nomad/Shared/Locakable/SyncRef/Whatever and requires special attention and can be passed via channels, or whatever).

But I guess perhaps I'm a bit too pre-emptive in my criticism since none of that is on the table yet, and well, it can simply be rewritten then.

Just airing my thoughts :-)

ozra on 22 May 2016

Did String get missed here? I've found a few times when I want to use #clone on a String so I can get a copy of the memory and pass it into a C function that modifies that underlying memory, but I just found out that String#clone and String#dup are just returning self.

watzon on 9 Dec 2019

👀1

@watzon strings are immutable and passing a String to anything which modifies the underlying buffer is undefined behaviour (String#size will return the wrong value if you do this)

Please make a copy of the buffer, instead of the String.