Chapel: Support returning array of strings from Chapel to Python

Created on 21 Nov 2019 · 18 Comments · Source: chapel-lang/chapel

I would like to be able to return an array of strings from Chapel when compiling as a Python module:

export proc foobar(): [] string {
  var A = ['foo', 'bar'];
  return A;
}

Here is the result as of chpl version 1.21.0 pre-release (a5c89b0c85):

> chpl retStringArr.chpl --library-python

Error compiling Cython file:
------------------------------------------------------------
...
                self.cleanup()


def foobar():
        cdef chpl_external_array ret_arr = chpl_foobar()
        cdef numpy.ndarray [string, ndim=1] ret = numpy.zeros(shape = ret_arr.num_elts, dtype = string)
                          ^
------------------------------------------------------------

retStringArr.pyx:37:27: Invalid type.
Traceback (most recent call last):
  File "retStringArr.py", line 12, in <module>
    libraries=["retStringArr"] + chpl_libraries + ["retStringArr"])))
  File "/Users/balbrecht/.local/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1096, in cythonize
    cythonize_one(*args)
  File "/Users/balbrecht/.local/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1219, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: retStringArr.pyx
Compiler Tools Feature Request

All 18 comments

So there's a bit of an open design question here.

For arrays today, we appear to return a numpy.ndarray of elements, which works when the element type is a primitive like int64, but things get trickier when the element type is a string.

We currently map Chapel string to Python str. If we wanted to convert an array of strings to a numpy array we'd need to map the Chapel string to something like the numpy unicode_ or string_.

As I see it, we have a few options here:

a) We can convert to a Python list of Python str.
b) We can convert to a numpy ndarray of numpy string_ (or unicode_).
c) We can create a numpy ndarray of Python object, which can store any Python object (in this case a bunch of strings)

I was thinking that it might be appealing to convert arrays of primitive types to numpy arrays and arrays of any other type to Python lists of the converted datatype (i.e., looking into the future when we support exporting records).

What would be the expected behavior for you as a user, @ben-albrecht ?

For the beginner, what is the difference / relationship between Python str and NumPy string_?

The Python str is the builtin Python string type, which is a kind of Python object. The numpy string_ is a fixed width byte string where the size is specified at numpy array creation (because numpy arrays are a contiguous block of memory).

When we create a numpy array, we have to use a numpy datatype.
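To make the fixed-width behavior concrete, here is a minimal NumPy sketch. It is not taken from the Chapel-generated wrapper; it assumes a recent NumPy, where `numpy.bytes_` is the current spelling of the type called `string_` in this thread, and the array contents are made up for illustration.

```python
import numpy as np

# Building an array from bytes objects selects a fixed-width bytes dtype.
# The width is the length of the longest element at creation time.
names = np.array([b"foo", b"bar"])
print(names.dtype)        # fixed-width bytes dtype, 3 bytes per element
print(names.itemsize)     # bytes reserved for each element

# Assigning a longer value does not grow the slot; NumPy truncates it.
names[0] = b"foobar"
print(names[0])

# Elements come back as numpy.bytes_, not as the builtin Python str.
print(type(names[0]) is np.bytes_)
```

The silent truncation on assignment is the practical consequence of the contiguous, fixed-size layout described above.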

I would be happy with option (a) for my use-cases, and think it makes sense as the default behavior. It would be nice to have the ability to opt into option (b) especially if you were using multi-dimensional string arrays. I am not sure how that would be specified in the Chapel code though.

What if the Chapel array of strings were a 2D array of strings? What if a List of Chapel strings became a Python list of Python strings and an array of Chapel strings became a NumPy array of NumPy strings?

> What if a List of Chapel strings became a Python list of Python strings and an array of Chapel strings became a NumPy array of NumPy strings?

This could work nicely and would extend naturally to other types as well (a list(int) could become a python list instead of numpy array).

How are Python lists stored in memory? Consecutively? Using linked list nodes? Using chunks of some sort?

What about when we return arrays of primitive types such as int64? Should those continue to be mapped to numpy ndarray? Basically a question of consistency.

It sounds like we're advocating a case where the Chapel string type itself can potentially map to multiple different Python types - the Python str on its own and the numpy string_ when exported as an array element.

I don't know how good I feel about that. In addition to being somewhat unintuitive, numpy string_ has the constraint that every element reserves as much memory as the largest string in the collection, which makes it kind of inflexible.
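As a rough illustration of that memory constraint, the sketch below builds a fixed-width unicode array in which one long element dictates the slot size for all of them. The word list is invented for the example, and the comparison figure assumes NumPy's 4 bytes per code point for unicode dtypes.

```python
import numpy as np

# One long string forces every slot in a fixed-width array to be as wide.
words = ["a", "bb", "c" * 100]
fixed = np.array(words)            # unicode dtype sized for the longest element

print(fixed.dtype.itemsize)        # bytes per slot, even for "a"
print(fixed.nbytes)                # total = number of slots * bytes per slot

# Payload if each string only paid for its own characters at the same
# per-character width (4 bytes per code point for NumPy unicode dtypes).
needed = sum(len(w) for w in words) * 4
print(needed)
```

Here the array reserves roughly three times what the characters themselves require, and the gap widens as the length distribution becomes more skewed.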

Also pinging @lydia-duncan for input here.

> What if a List of Chapel strings became a Python list of Python strings and an array of Chapel strings became a NumPy array of NumPy strings?

I was also thinking such a thing myself, that we could introduce first class support for exporting the list type.

> How are Python lists stored in memory? Consecutively? Using linked list nodes? Using chunks of some sort?

I believe Python lists are stored as a contiguous array of pointers to objects. Appending results in resizing.
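A small standard-library sketch can make this observable. It assumes CPython, where `sys.getsizeof` on a list reports the list header plus its pointer array; other interpreters may behave differently or not support the call.

```python
import sys

# A list stores references, so the element is the very same object.
word = "foo"
items = [word]
same_object = items[0] is word

# Track the list's own footprint while appending. It grows in steps rather
# than on every append, because CPython over-allocates the pointer array.
sizes = []
for i in range(64):
    items.append(str(i))
    sizes.append(sys.getsizeof(items))

distinct_sizes = len(set(sizes))
print(same_object, distinct_sizes, len(sizes))
```

Far fewer distinct sizes than appends indicates that resizing happens only occasionally, with spare capacity reserved in between.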

> What about when we return arrays of primitive types such as int64? Should those continue to be mapped to numpy ndarray?

I would think so, following the pattern that a Chapel array translates to a NumPy array and a Chapel list translates to a Python list.

I updated my above comment after refreshing my browser.

I would feel more comfortable with perhaps exporting an array of Chapel strings as a ndarray(dt=object), where the datatype is any Python object (in this case a Python str).

This way we aren't actually converting Chapel string to different Python types depending on the context.
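For reference, this is roughly what such an array looks like from the Python side. It is a sketch written directly against NumPy, not output from the Chapel wrapper, and the element values are placeholders.

```python
import numpy as np

# dtype=object stores references to arbitrary Python objects.
arr = np.array(["foo", "bar"], dtype=object)

print(arr.dtype)                 # object
print(type(arr[0]) is str)       # elements are the builtin Python str

# No fixed width: a longer string fits without truncation.
arr[0] = "a much longer string"
print(arr[0])
print(arr.shape)
```

The array keeps its ndarray shape and indexing, while each element remains an ordinary Python str of whatever length it needs.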

I've been following the conversation but I didn't feel I had much to add. I prefer mapping Chapel lists to Python lists and continuing to map Chapel arrays to NumPy arrays. I should point out that we do accept Python lists for Chapel array arguments at the moment (following the Python type permissiveness stance as much as possible) - if we take the stance of returning Python lists for Chapel list returns, should we be more restrictive about what types we accept for Chapel arguments? Or should Chapel list arguments be able to take and convert NumPy arrays in addition to Python lists?

Based on BenA's description of how Python implements its list, it sounds like a better match for the proposed "vector-style list" we've discussed, which asks for a 1D contiguous list-based implementation.

So to clarify, the proposal is:

//
// Currently supported
//

// exports a numpy.ndarray(shape=(2,), dtype=int)
export proc foobar(): [] int {
  var A = [1, 2];
  return A;
}

//
// New features
//

// exports a numpy.ndarray(shape=(2,), dtype=str)
export proc foobar(): [] string {
  var A = ['foo', 'bar'];
  return A;
}

// exports a python list(str)
export proc foobar(): list(string) {
  var A: list(string);
  A.append('foo');
  A.append('bar');
  return A;
}

// exports a python list(int)
export proc foobar(): list(int) {
  var A: list(int);
  A.append(1);
  A.append(2);
  return A;
}

// and so on for other supported types...

Does this sound right @dlongnecke-cray ?

That sounds right to me.

However, I think it's probably best to hold off on supporting any more complex/container types until after we've thoroughly explored a more long-term solution for exporting records (and methods on records), which we've rightfully set aside for most of this (last?) release.

So that would mean _yes_ to supporting returning a numpy array of Python str, and _let's hold off_ on adding support for more types like list until after https://github.com/chapel-lang/chapel/issues/14420.

> So that would mean yes to supporting returning a numpy array of Python str, and let's hold off on adding support for more types like list until after #14420.

Agreed. Let's consider the goal of this issue to support just this case:

// exports a numpy.ndarray(shape=(2,), dtype=str)
export proc foobar(): [] string {
  var A = ['foo', 'bar'];
  return A;
}

This would be immediately helpful to our code in crayai.

I have created a separate issue to track the export list proposal: https://github.com/chapel-lang/chapel/issues/15423

Super close to having something suitable for a PR, just dealing with a nil dereference bug caused by me improperly claiming memory when shuffling stuff around. After that just a bit of code cleanup, and then I'll post it.


Oh crap, a bit more work if you want to _pass_ arrays of strings and bytes as well, @ben-albrecht.

Nvm, reread above.

Wooo! Turns out the nil deref was actually a memory error because I forgot to call chpl_library_init in my python code!
