Minimal Working Example
class A {
  proc fn() : c_ptr(int) {
    return c_nil : c_ptr(int);
  }
}
proc main() {
  var a = new A();
  on Locales[0] {
    forall 1..1 {
      var ptr = a.fn();
    }
  }
}
Note that removing the explicit cast c_nil : c_ptr(int) shows this error...
type mismatch in assignment from c_void_ptr to c_ptr(int(64))
[Update] To expose the issue, compile with --no-local. 'a' can also be an int or another type, and any function taking 'a' as an argument can be used instead of a.fn(). Upcoming test:
test/parallel/forall/vass/flatten-with-ovar.chpl
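For illustration, the variation described in the update (an int in place of the class instance, a free function in place of the method) might look like the sketch below. The names takesA and r are hypothetical, and this is not the content of the test file named above. Going by the update, it would be expected to hit the same internal error when compiled with --no-local on affected versions, but that has not been verified here.

```chapel
// Hypothetical variant of the reproducer: 'a' is an int, and a free
// function (takesA, a made-up name) stands in for the method a.fn().
proc takesA(x: int) : int {
  return x;
}

proc main() {
  var a = 1;
  on Locales[0] {
    forall 1..1 {
      // 'a' is an outer variable referenced inside the forall body.
      var r = takesA(a);
    }
  }
}
```

As with the original example, this would be compiled with chpl --no-local to exercise the multi-locale code path.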
With the explicit cast in place, inspecting the compiler under gdb shows some information that may help in debugging this issue...
{
  <Expr> = {
    <BaseAST> = {
      _vptr.BaseAST = 0x7a5140 <vtable for SymExpr+16>,
      astTag = E_SymExpr,
      id = 1125140,
      astloc = {
        filename = 0xb4d350 "TupleMWE.chpl",
        lineno = 11
      },
      static tabText = <optimized out>
    },
    parentSymbol = 0xa240090,
    parentExpr = 0x0,
    list = 0x0,
    prev = 0x0,
    next = 0x0
  },
  var = 0x292a7b0,
  symbolSymExprsPrev = 0xa590cc0,
  symbolSymExprsNext = 0x0
}
Note that parentExpr is nil, which @bradcray said should never happen. Also, parentSymbol, which is presumably the variable declaration, has the name a...
{
  <BaseAST> = {
    _vptr.BaseAST = 0x7acc70 <vtable for ShadowVarSymbol+16>,
    astTag = E_ShadowVarSymbol,
    id = 1125141,
    astloc = {
      filename = 0xb4d350 "TupleMWE.chpl",
      lineno = 11
    },
    static tabText = <optimized out>
  },
  qual = QUAL_CONST_VAL,
  type = 0x29299e0,
  flags = {
    <std::_Base_bitset<4>> = {
      _M_w = {
        103079215104,
        0,
        0,
        0
      }
    },
    <No data fields>
  },
  fieldQualifiers = 0x0,
  name = 0xb736d0 "a",
  cname = 0xb736d0 "a",
  defPoint = 0xa2402c0,
  symExprsHead = 0xa240870,
  symExprsTail = 0x34fa030
}
I'll leave it at that for now, but hopefully this can assist in a timely resolution of this problem. This code will compile on TIO which is running release 1.17, but it will crash on the tag release/1.17 and master.
This is necessary for resolving issue #9727 by creating an aggregation library/abstraction that uses a buffer pool to minimize downtime. I use c_ptr because I need to be able to create, recycle, and destroy buffers on the fly, and I need to return them from a function because first-class functions are currently in a poor state, as per Brad's words: https://github.com/chapel-lang/chapel/issues/9871#issuecomment-402516617. Aggregation Source Code (I can't say that it is fully stable yet, as I can't run it yet)
I'm not able to reproduce this with my chpl version 1.18.0 pre-release (88d25b66de) on Mac or Linux, including running under valgrind. Are you still seeing it? If so, can you specify what version of the compiler you're using, what platform you're using, etc.?
Output of chpl --version
chpl version 1.18.0 pre-release (887bdaf752)
Copyright (c) 2004-2018, Cray Inc. (See LICENSE file for more details)
Output of $CHPL_HOME/util/printchplenv --anonymize
CHPL_TARGET_PLATFORM: cray-xc
CHPL_TARGET_COMPILER: cray-prgenv-gnu
CHPL_TARGET_ARCH: broadwell
CHPL_LOCALE_MODEL: flat
CHPL_COMM: ugni *
CHPL_TASKS: qthreads
CHPL_LAUNCHER: pbs-aprun *
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: intrinsics
CHPL_NETWORK_ATOMICS: ugni
CHPL_GMP: system
CHPL_HWLOC: hwloc
CHPL_REGEXP: re2
CHPL_AUX_FILESYS: none
Output of gcc --version
gcc (GCC) 7.3.0 20180125 (Cray Inc.)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Output of module list
Currently Loaded Modulefiles:
1) modules/3.2.10.6 11) nodehealth/5.6.3-6.0.6.0_19.1__g74feeca.ari 21) craype-hugepages16M
2) alps/6.6.0-6.0.6.0_35.25__gd0a1ab9.ari 12) system-config/3.5.2698-6.0.6.0_17.1__g7ce17b5.ari 22) craype-network-aries
3) nodestat/2.3.78-6.0.6.0_9.30__gbe57af8.ari 13) Base-opts/2.4.131-6.0.6.0_14.1__gd955f55.ari 23) craype/2.5.14
4) sdb/3.3.760-6.0.6.0_24.58__g2c7a3c4.ari 14) totalview-support/1.2.0.21 24) cray-mpich/7.7.0
5) udreg/2.3.2-6.0.6.0_15.18__g5196236.ari 15) totalview/2018.0.5 25) cray-libsci/18.04.1
6) ugni/6.0.14-6.0.6.0_18.12__g777707d.ari 16) moab/9.0.2-1469837953_f87b286-sles12 26) pmi/5.0.13
7) gni-headers/5.0.12-6.0.6.0_3.26__g527b6e1.ari 17) torque/6.0.2.h4 27) atp/2.1.1
8) dmapp/7.1.1-6.0.6.0_51.37__g5a674e0.ari 18) chapel/1.17.1 28) rca/2.2.18-6.0.6.0_19.14__g2aa4f39.ari
9) xpmem/2.2.14-6.0.6.0_10.1__g34333c9.ari 19) gcc/7.3.0 29) perftools-base/7.0.1
10) llm/21.3.522-6.0.6.0_34.1__ga1a54af.ari 20) craype-broadwell 30) PrgEnv-gnu/6.0.4
Hi @LouisJenkinsCS — That SHA (887bdaf752) doesn't seem to correspond to any on the public repository. Are you working from a private branch by any chance?
Things seem to work for me on a copy of 1.17.1 that I've built from source on Mac. Using the 1.17.1 module on a Cray, I'm seeing internal error: FLA0287 chpl version 1.17.1. Is that the same error you're getting? I'm working on a build from source now to see if I can debug it there.
Meanwhile, do you see the same error using a build of Chapel on a desktop/laptop rather than a Cray? Thanks.
I ran make clobber and updated/rebuilt everything, so I am on the current version of master...
chpl version 1.18.0 pre-release (b550fa8746)
Copyright (c) 2004-2018, Cray Inc. (See LICENSE file for more details)
Here is the commit I am on according to git log
commit b550fa8746f12fc240347bf248cadb059eeb481e
Merge: da6b1008f7 1590d7a40e
Author: David Iten <[email protected]>
Date: Wed Jul 11 16:42:06 2018 -0500
Merge pull request #10278 from daviditen/hdf5-comment-cleanup
Clean up some comments in the HDF5 module
[comment updates, not reviewed]
Improve comments for HDF5_Chapel routines. For example, don't talk about
"output arrays" in comments about a function reading from disk into arrays.
Also, updated a couple variable names to correspond to the comment changes.
If that helps at all. Also, yes, I believe that is the error I am getting (I modified the compiler locally to output an error when I was debugging the issue with you on gitter, but that's the only change I made).
I also get the same issue on a desktop/laptop; here is the output of printchplenv --anonymize
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: gnu
CHPL_TARGET_ARCH: unknown
CHPL_LOCALE_MODEL: flat
CHPL_COMM: gasnet *
CHPL_COMM_SUBSTRATE: udp
CHPL_GASNET_SEGMENT: everything
CHPL_TASKS: qthreads
CHPL_LAUNCHER: amudprun
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: intrinsics
CHPL_NETWORK_ATOMICS: none
CHPL_GMP: gmp
CHPL_HWLOC: hwloc
CHPL_REGEXP: re2
CHPL_AUX_FILESYS: none
OK, thanks. Your new SHA does correspond to one on the master branch.
I can reproduce on a Cray building 1.17.1 from source, with debugging / optimizations on or off. I still haven't been able to reproduce on a laptop / desktop. What version of g++ are you building the compiler with on the linux desktop?
Also, valgrind isn't turning up any errors for the 1.17.1 version that's built from source on the Cray, which is both good (that there are no errors) and too bad (in that an error would make the difference between the platforms easier to track down).
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Oh, I see what I was doing wrong in my desktop repro. The internal error only shows up when compiling --no-local (the default when CHPL_COMM!=none as on Crays), and I hadn't picked up that difference between your desktop configuration and mine until now. I'll dig a bit deeper tomorrow.
Thank you for putting in the time and effort here, Brad. I know you're extremely busy, and I appreciate it!
Also, let me know if there is any way I can assist further.
@vasslitvinov : Would you take a look at this? What I'm seeing is that, when compiled with --no-local the minimal working example at the top of the issue is hitting an assertion error in flattenFunctions.cpp replaceVarUses() line 287 due to a SymExpr that you create in this line of implementForallIntents.cpp not having a parentExpr. My impression is that all Exprs are supposed to have a parentExpr? (though surprisingly to me, running with --verify didn't flag a problem).
(Meanwhile, I'm looking into a workaround for the issue)
@LouisJenkinsCS: PR #10294 seems to work around the problem for your minimal working example, though it doesn't seem obviously directly related to the failure mode that I asked Vass to look into just above (i.e., I think Vass should still put effort into trying to understand what's going wrong there). I'll be curious whether that PR improves the behavior for your original motivating case for this issue as well or not.
Thank you Brad, that was quick! I super appreciate it. I'll check out the PR as soon as I can.
I still get the assertion error in my actual code, even when I check out and run your current PR. I tried multiple things, but in the end I still hit the same thing. I guess the MWE doesn't show enough of the problem.
Okay so after removing the explicit casts, I get the following...
DestinationBuffers.chpl:45: In function 'getBuffer':
DestinationBuffers.chpl:61: error: illegal expression to return by ref
DestinationBuffers.chpl:188: In function 'aggregate':
DestinationBuffers.chpl:219: error: illegal expression to return by ref
Basically, wherever we return a tuple.
"I still get the assertion error in my actual code"
OK, as planned then, we'll have Vass look at what's going on for your simple example without my PR in order to understand what's causing the assertion error and hopefully fix it.
Just ran into Vass who confirmed that this is in his court (due to code he owns) and that he's looking for a fix.
"PR #10294 seems to work around the problem for your minimal working example"
@vasslitvinov pointed out that while this PR permits the cast in the MWE to be removed, it doesn't fix the core issue; I had accidentally gone back to forgetting to throw --no-local (d'oh!). On the plus side, that's a little reassuring, since I couldn't figure out why my fix would change the behavior of the code related to the assertion failure.
Cool, it works! Thanks @bradcray and @vasslitvinov