Crystal: pointerof an instance variable in an initialiser does not return the final pointer

Created on 4 Oct 2015 · 10Comments · Source: crystal-lang/crystal

struct Test
    def initialize()
      @test = StaticArray(UInt32, 4)
      p pointerof(@test)
    end

    def array_ptr()
      p pointerof(@test)
    end
end

test = Test.new
test.array_ptr

results in

Pointer(StaticArray(UInt32, 4):Class)@0xbfa6b750
Pointer(StaticArray(UInt32, 4):Class)@0xbfa6b8c8

From what I can see from the LL dump, this is a result of the struct being constructed in one memory location, and then relocated elsewhere. My current workaround for this is calling a function after the initialiser, which works as expected.

Fixing this may require changes in how variables are constructed, and thus may not be feasible. I suggest that a warning be emitted for the use of pointerof on an instance variable in a constructor; this would expose any latent issues.

feature compiler

Source

Philpax

👍2

Most helpful comment

@sdogruyol Issue is still valid, see code below:

struct Test
  def initialize
    @test = StaticArray(UInt32, 4).new(10_u32)
    p pointerof(@test)
  end

  def array_ptr
    p pointerof(@test)
  end
end

test = Test.new
test.array_ptr

Sija on 13 Apr 2018

👍2

All 10 comments

This also happens with self (just got bit by this bug):

class Foo
  def initialize
    self2 = self
    puts "pointerof(self) = #{pointerof(self2)}"
  end
end

foo = Foo.new
puts "pointerof(foo)  = #{pointerof(foo)}"

prints:

pointerof(self) = Pointer(Foo)@0x7fff5d3114c8
pointerof(foo)  = Pointer(Foo)@0x7fff5d311748

…14c8 ≠ …1748.

ddfreyne on 4 Oct 2015

The first might be a bug, but it would be interesting to know your use case. Since a struct is passed by value, I wouldn't rely on pointers to it, because they can quickly become invalid.

The second case is not a bug: self and self2 are two different variables, their memory address is different.

asterite on 4 Oct 2015

My actual code looks something like

struct Test
    def initialize()
      @test = StaticArray(UInt32, 4)
      external_fn_taking_pointer(@test.to_unsafe)
    end
end

struct Test2
    def initialize()
      @test = Test.new
    end
end

test2 = Test2.new

which I would expect to produce a consistent memory address (as the array is a member variable of another member variable)

For comparison, the equivalent C++ works as expected:

#include <cstdio>
#include <cstdint>

struct Test
{
  uint32_t test[4];
  Test()
  {
    printf("%x\n", test);
  }
};

struct Test2
{
  Test test;
};

int main()
{
  Test2 test;
  printf("%x\n", test.test.test);
  return 0;
}

results in

bfab0e40

Philpax on 4 Oct 2015

Sounds a bit strange that structures of any kind should be "inited in one place" and then copied - if that happens for every type instantiation it would be a performance drag.

ozra on 4 Oct 2015

As of 0.24.2 this use-case is not valid. We now enforce the type of the instance variable when initializing.

Error in test.cr:12: instantiating 'Test:Class#new()'

test = Test.new
            ^~~

in test.cr:3: Can't infer the type of instance variable '@test' of Test

The type of a instance variable, if not declared explicitly with
`@test : Type`, is inferred from assignments to it across
the whole program.

The assignments must look like this:

  1. `@test = 1` (or other literals), inferred to the literal's type
  2. `@test = Type.new`, type is inferred to be Type
  3. `@test = Type.method`, where `method` has a return type
     annotation, type is inferred from it
  4. `@test = arg`, with 'arg' being a method argument with a
     type restriction 'Type', type is inferred to be Type
  5. `@test = arg`, with 'arg' being a method argument with a
     default value, type is inferred using rules 1, 2 and 3 from it
  6. `@test = uninitialized Type`, type is inferred to be Type
  7. `@test = LibSome.func`, and `LibSome` is a `lib`, type
     is inferred from that fun.
  8. `LibSome.func(out @test)`, and `LibSome` is a `lib`, type
     is inferred from that fun argument.

Other assignments have no effect on its type.

Can't infer the type of instance variable '@test' of Test

      @test = StaticArray(UInt32, 4)
      ^~~~~

sdogruyol on 13 Apr 2018

@sdogruyol Issue is still valid, see code below:

struct Test
  def initialize
    @test = StaticArray(UInt32, 4).new(10_u32)
    p pointerof(@test)
  end

  def array_ptr
    p pointerof(@test)
  end
end

test = Test.new
test.array_ptr

Sija on 13 Apr 2018

👍2

I would change this from bug to feature.

asterite on 20 Jul 2018

😕2

agreed, this isn't entirely unexpected given struct semantics

RX14 on 21 Jul 2018

I don't understand why copy is required at returning from #new. I think it should be able to handle #new like following on generating code because #initialize guarantees that all instance variables are set in some way.

struct A
  @foo : Int32
  @bar : Pointer(Int32)

  def initialize(@foo)
    @bar = pointerof(@foo)
  end
end

a = uninitialized A
a.initialize(100)

Which extra process is there before or after calling #initialize?
Which kind of #initialize process does need a copy?
Is it just because there is =?

I'm worrying that a struct is (implicitly) copied.

I would like to have an option to disallow any copy operation to a defined struct, like C++ can do this with putting the specific constructor declaration into private.

I (and some people) expect that struct is fixed location and fixed duration of lifetime. This semantic is useful to generate efficient code and sometimes required to call underlying C library, without loosing the experience of the user of wrapped library. For example, considering that the author of C library suggests to their function foo to pass local resource:

void user_func(...)
{
  struct X data;
  Y *ptr;
  ptr = foo(&data, ...);

  /* play with `ptr`, but actually read or write to `data` */
  bar(ptr, ...);
  baz(ptr, ...);
}

Although the content of data is public, no C programmer wants to copy data because it does not make sense.

To implement this in Crystal we need to write:

lib LibFoo
  struct X
     # omit
  end
  alias Y = Void

  fun foo(X*, ...) : Y*
  fun bar(Y*, ...) : Void
  fun baz(Y*, ...) : Void
end

def user_func(...)
  data = uninitialized LibFoo::X
  ptr = LibFoo.foo(pointerof(data), ...)

  # play with `ptr`
  LibFoo.bar(ptr, ...)
  LibFoo.baz(ptr, ...)

  # Is it guaranteed to `data` exists until here?
end

Developers who use Crystal consider this is unsafe and unwant to do. So I have to put this safe without loosing flexibility on calling bar, baz and so on.

About preferable usage of `pointerof` to use a C library

Before putting data into a struct, considering this:

struct Y
  @ptr : Pointer(LibFoo::Y)

  def initialize(@ptr)
  end

  def bar(...)
    LibFoo.bar(@ptr, ...)
  end

  def baz(...)
    LibFoo.baz(@ptr, ...)
  end
end

def bind_to_Y(..., &)
  data = uninitialized LibFoo::X
  ptr = Y.new(LibFoo.foo(pointerof(data), ...)
  ret = yield ptr
  data
  ret
end

# with this, a user can:

def user_func(...)
   bind_to_Y(...) do |y|
     # play with `y`.
     y.bar(...)
     y.baz(...)
   end
end

This looks safe but actually not safe.
|Operation|Safety|
|-------|------|
|Passing to an argument of another method|Safe if the calling method does not do any of followings|
|Copying to another local variable in the block|Safe|
|Passing to nested block|Unsafe unless executed inside this "scope"|
|Copying to a captured variable|Can be Unsafe|
|Returning y from block|Can be Unsafe|
|Passing to raise|Unsafe (because one can implement this with longjmp)|
|...|...|

Moreover, we have to nest to use multiple objects at once. This is inconvenient.

def user_func(...)
  bind_to_Y(...) do |y|
    bind_to_Y(...) do |z|
      # play with `y` and `z`, `y` gets copied (right?).
      y.bar(...)
      z.baz(...)
      y.baz(...)
    end
  end
end

Putting `data` to an instance variable of struct to manage in pair

Because data and ptr is paired together, I may put it to a single struct:

struct Y
  @data : LibFoo::X
  @ptr : Pointer(LibFoo::Y)
  @refdata : ...

  def initialize(@refdata)
    @data = uninitialized LibFoo::X
    @ptr = LibFoo.foo(pointerof(@data), @refdata)
  end

  def bar(...)
    LibFoo.bar(@ptr, ...)
  end

  def baz(...)
    LibFoo.baz(@ptr, ...)
  end
end

# with this, a user can:

def user_func(...)
   y = Y.new(...)
   # play with `y`.
   y.bar(...)
   y.baz(...)
end

This is not work as expected because it is copied when return from #new.

This is the best semantic of providing the experience of LibFoo, however, it is still unsafe:
|Operation|Safety|
|-------|------|
|Passing to an argument of another method|Safe but the passed value still points the original variable|
|Copying to another local variable in the block|Safe but it still points the original variable|
|Passing to nested block|Unsafe unless executed inside this "scope", also it still points the original variable|
|Copying to a captured variable|Can be Unsafe, also it still points the original variable|
|Returning y from method|Unsafe|
|Passing to raise|Unsafe (because one can implement this with longjmp)|
|...|...|

Workaround of above one

Since above does not work, we can make it to do delayed initialization:

struct Y
  @data : LibFoo::X
  @ptr : Pointer(LibFoo::Y)
  @refdata : ...

  def initialize(@refdata)
    @data = uninitialized LibFoo::X
    @ptr = Pointer(LibFoo::Y).null
  end

  def bar(...)
    LibFoo.bar(self, ...)
  end

  def baz(...)
    LibFoo.baz(self, ...)
  end

  def to_unsafe
    if @ptr.null?
      @ptr = LibFoo.foo(pointerof(@data), @refdata)
    end
    @ptr
  end
end

# with this, a user can:

def user_func(...)
   y = Y.new(...)
   # play with `y`.
   y.bar(...)
   y.baz(...)
end

In this case, LibFoo.foo will be called multiple times, if it has been copied before the calling one of these method.

  y = Y.new(...)
  z = y
  z.bar(...) # LibFoo.foo will be called for `z`
  y.baz(...) # LibFoo.foo will be called for `y`

Any copying before calling its methods is safe, but copying after calling any of its methods is possibly unsafe in same condition described above. This is too complicated.

Conclusion

We need a temporary object to bridge the user of binding library and underlying C library, but we don't want to fill the sea of heap memory with the garbage of temporary object, which can be used many many times. However, currently we can not handle a pointer from C library with a struct gracefully. Because it is freely and implicitly copied one to another. Disallowing copy can let the user of binding library to use the struct only in safe way, without loosing efficiency.

If you want a real world use case, here it is.

I have ideas of the methods to disallow copying:

Having protected or private #dup
- Same to do with Reference. (see below)
- Value#dup becomes an "special method" which makes a copy
- a = b where b is a Value should be interpreted as a = b.dup
- a = A.new where A is a Struct class can be interpreted as a = A.new.dup
Having protected or private #initialize(another : self)
- Same as C++.
- a = b where b is a Value should be interpreted as a = b.class.new(b)
- a = A.new where A is a Struct class never involves copy.
Special annotation to a struct
- The user of an annotated struct can never "monkey patch" to reenable copy operation of the struct.

N.B.

To protect Reference from copying is easy. Because it requires #dup to make a copy, so just put #dup to protected or private:

class A
  protected def dup
    self
  end
end

a = A.new
b = a.dup

Showing last frame. Use --error-trace for full trace.

error in line 8
Error: protected method 'dup' called for A

@asterite suggested to disallow to use pointerof to instance variable of struct in #9700. The language reference says using pointerof is (always) unsafe. If so, just disallowing it does not make sense, I think. Any invalid usage of pointerof should not be responsible to Crystal's developers, however you should provide the detailed specification such as "when returned pointer goes invalid". This should be guaranteed by language in some way.

(This table is not true. This depends how the language guarantees (i.e., decided by Crystal's developer))

lugia-kun on 29 Aug 2020

It works like that because of how new is defined, and how structs are passed by value.

@waj was working on improving this situation but it's probably something hard to change

asterite on 29 Aug 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Allow accessing named tuples with strings

costajob · 3Comments

Crystal's REPL

ArthurZ · 3Comments

how do i pass a function type to a function

lgphp · 3Comments

Allow type annotation + initialization `@var : Type = value`

oprypin · 3Comments

Rewrite sigfault.c in Crystal

asterite · 3Comments