Chapel: ZMQ uses incompatible cast in send

Created on 21 Jun 2018  路  17Comments  路  Source: chapel-lang/chapel

When compiling a ZMQ program with CCE (which is more strict on its type checking than gcc),
we encounter a failure on line 771 of the ZMQ module
zmq_msg_init_data(msg, copy.c_str():c_void_ptr, copy.length:size_t, c_ptrTo(free_helper), c_nil) with the c_ptrTo(free_helper) portion being "incompatible" with c_fn_ptr.

Libraries / Modules Bug Portability

Most helpful comment

On my first reading, I missed that we were already assuming the POSIX rule that void * would work as a generic function pointer. Since CCE only runs on POSIX anyway, squelching this warning when transferring to and from void * would be a reasonable feature to request of CCE.

So, we have several possible angles of attack, with varying time frames to implement them.

All 17 comments

@nspark - heads up. I'm not quite sure what to do about this.

What are the underlying C types for c_fn_ptr and c_ptrTo(free_helper)?

Oh, well...

https://github.com/chapel-lang/chapel/blob/46af930ad0a5f619ce95bd59a3fe57d5485e573b/runtime/include/chpltypes.h#L49

I would think this is more than just a ZMQ problem.

C11 J.5.7 ("Common extensions: Function pointer casts") says...

  1. A pointer to a function may be cast to a pointer to an object or to void, allowing a function to be inspected or modified (for example, by a debugger) (6.5.4).

So, it seems Chapel was already assuming certain C language extensions that affect portability.

C11 §6.3.2.3-8 says:

A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the referenced type, the behavior is undefined.

I haven't thought through it extensively, but it would seem to be more portable (i.e., we can avoid the lie) to use something like:

// C
typedef void (c_fn)(void);

// Chapel
type c_fn_ptr = c_ptr(c_fn);

The problem here is that most compilers will warn on incompatible pointer types in assignments or arguments without casting the c_fn_ptr object to a function pointer type compatible with the assignment target or argument. ZMQ provides a typedef for the expected function type, so I think that ZMQ could perform such a cast with:

extern type zmq_free_fn;
zmq_msg_init_data(..., c_ptrTo(free_helper):c_ptr(zmq_free_fn), ...)

I'm not sure what issues such a change would cause other Chapel modules. That said, I'm still surprised this only tripped on CCE with the ZMQ module. This cast of pointer-to-function to pointer-to-void (or vice versa) seems like it should have shown up with other compilers or in other modules. FWIW, GCC warns on this cast with -Wpedantic, while Clang does not.

Any suggestions, @dmk42?

I am also surprised that we have not tripped over this before.

If we were to limit ourselves to POSIX systems, we could use void * as a generic function pointer and possibly avoid the warnings, because in POSIX, void * is required to be able to hold a function pointer. However, I wouldn't rule out the possibility of running into a compiler that would issue a warning anyway.

Maybe a good way to go would be to establish a type that the Chapel compiler recognizes internally as a "generic function pointer." Then it could automatically generate casts whenever it found itself transferring values in and out of that type.

While we work this out, a temporary fix that should be able to get things working for now would be to disable that particular message by adding a -hnomessage=<message-number> flag to CRAYPE_GEN_CFLAGS in $CHPL_HOME/make/compiler/Makefile.cray-prgenv-cray.

As another, potential alternative, if ZMQ really is the only place tripping up on this issue, I think the explicit cast to c_ptr(zmq_free_fn) might work, provided the extern type zmq_free_fn is added too. (I haven't tested this, though.)

On my first reading, I missed that we were already assuming the POSIX rule that void * would work as a generic function pointer. Since CCE only runs on POSIX anyway, squelching this warning when transferring to and from void * would be a reasonable feature to request of CCE.

So, we have several possible angles of attack, with varying time frames to implement them.

I suspect that the temporary fix may be the right way to go at the moment. We will likely run into some known issues with c_fn_ptrs if we push too much on that front (see #9411).

It is strange we didn't run into this before - I suspect that we just don't throw pedantic ever with gcc, as the same program worked just fine when I did a module swap (and has been fine when I've experimented with it locally with clang). Perhaps we don't run the ZMQ directory with CCE?

Right, our nightly testing with CCE covers release/examples and some of the other packages in library/packages but not ZMQ. Once this issue is resolved, it might be worth adding library/packages/ZMQ to the CCE nightly testing.

@lydia-duncan, @dmk42, @mppf: If I've understood, it seems like this issue could be closed in favor of the more general issue #9411. Does that seem correct?

Yes I'm pretty sure this is a duplicate. Note that there are other c_ptr (not c_fn_ptr issues) that are related (in particular #7983). To date, our approach to solve these has been to throw C compiler flags to avoid the warning (e.g. PR #2243). I personally think the c_fn_ptr issue is easier to solve than the c_ptr one, but maybe a similar strategy could work for both.

I'm fine with closing it but we should do the temporary fix at the same time, since this means a basic part of the ZMQ functionality does not work with CCE

I don't know who should logically own doing that, but I'm feeling kinda overwhelmed this sprint cycle (likely self-inflicted) so would appreciate someone else grabbing it.

I don't see why this needs to be addressed this sprint cycle. Am I missing something about its urgency?

Not so much urgent as "if we close this without doing it right away, we are going to forget"

I think the idea is that the temporary fix becomes urgent when this issue is closed. Delaying both to the next sprint would be OK. I agree with that point of view.

Was this page helpful?
0 / 5 - 0 ratings