PIConGPU: MPI tag needs range check

Created on 30 Jul 2015 · 24 comments · Source: ComputationalRadiationPhysics/picongpu

Tags used for MPI communication need a check that they are not out of range.

MPI_TAG_UB defines the upper limit for tags.

see discussion in #958

bug PMacc

Most helpful comment

I tried the patch on Juwels. The code completes successfully now.

All 24 comments

Because of this bug, execution aborts on Juwels when using IntelMPI/2019. Any chance we can get this fixed?

Abort(873046788) on node 243 (rank 243 in comm 0): Fatal error in PMPI_Irecv: Invalid tag, error stack:
PMPI_Irecv(162): MPI_Irecv(buf=0x362aa40, count=262144, MPI_CHAR, src=643, tag=1049959, comm=0x84000006, request=0x1f9cdee0) failed
PMPI_Irecv(96).: Invalid tag, value is 1049959

MPI_TAG_UB of MPI_COMM_WORLD in this configuration is 1048575

Thanks for reporting @jprotze !

Just to get some idea about it, does it happen only on large simulations or always?

To whoever investigates this,

I think we can add a check for tags, and report/throw, in these two routines, which seem to cover all cases like this.

I can confirm such tags are generated even for standard LWFA on 1 rank. So it's a general bug in PIConGPU, not some quirk of a particular setup.

Strange, I thought we always stay under the guaranteed max tag id (<32000).

I think I found the issue, and it's not with the originally generated tags; just double-checking.

No @psychocoderHPC , I've checked that. Will post in a couple of mins

Sorry, I misinterpreted my print debug output, removed the message. But I think I'm on the right track.

So the actual problem is this. With input communicationTag = 32811 it produces uniqCommunicationTag = 1049954 inside, which is not a valid tag.

Sorry for confusion @psychocoderHPC , my last message should be the actual issue.

So the actual problem is this. With input communicationTag = 32811 it produces uniqCommunicationTag = 1049954 inside, which is not a valid tag.

But the communication tag should never be 32811; how can this happen? I know we shift the tag by 5 to the left, but the tag should always be very small.

That happens here (the first place I suspected). Input: communicationTag == 43 and communicationTag | (1u << (20 - 5)) used as the next tag is 32811.

That also explains why the problem always occurs even for small simulations: that bitwise operation means the new tag is always at least 2^15, and it is later shifted left by 5 more bits, so roughly a million at least.

Well spotted. When we fix it, we need to take care that the tags do not collide with the tags generated in gridbuffer.

There seem to be runtime checks (and throws) that ensure the tags are unique. So at least if they collide, we will know.

@psychocoderHPC do you think it's possible to get rid of this manual-workaroundish scheme altogether and move fully to pmacc::traits::getNextId()? I think this kind of initialization in its current form is always done sequentially between the tasks, and so it should be no issue?

Discussed it in a VC, I will provide a PR with a fix soon.

CC-ing: @jkelling This is a very important topic for the SPEC HPC benchmark! MPI only guarantees up to 32k tags; anything more is optional and depends on how the MPI implementation is compiled.

closed with #3558

So it's fixed in the dev branch now. I am not sure which version you are using @jprotze ; there should be no issues in principle with applying the same fix to any version, but we could help if needed.

@jprotze was using the SPEC HPG suite. @jkelling, #3558 should be patched for SPEC too.

I tried the patch on Juwels. The code completes successfully now.

Thanks for the feedback!
