MFEM: Difference between serial and parallel uniform refinement in a parallel code

Created on 27 Apr 2019 · 2 Comments · Source: mfem/mfem

What is the difference between these two refinements? What is the right way to set the two levels based on the number of processors? I could not find this described anywhere.

For instance, I have a working parallel code (without AMR).
This works well:
mpirun -n 4 exMHDp -rs 2 -rp 2
But this fails immediately:
mpirun -n 4 exMHDp -rs 4 -rp 0

I thought they would be more or less the same, except maybe partitioning.

Thanks,
Qi

general question

Most helpful comment

Hi @tangqi,
The final number of elements will be the same, but the element ordering will likely differ. That is not the most important distinction, though. The "-rs" flag means "uniform refinement in serial": every processor starts with the same mesh and refines it in precisely the same way. The refined mesh is then used to build a partitioned parallel mesh, which essentially means that each processor discards most of the refined mesh and keeps only the portion assigned to its own rank. So if memory is limited and you request many serial refinement levels on a large mesh, this step can be computationally expensive and memory intensive, because every processor holds the entire refined mesh until it is partitioned.
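To put a rough number on the serial cost, here is a minimal sketch that estimates how many elements every rank holds before partitioning. The helper name is hypothetical and not part of MFEM, and it assumes each uniform refinement splits an element into 2^dim children (as for triangles, quadrilaterals, tetrahedra, and hexahedra).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an MFEM function.
// Estimates the element count after `levels` uniform refinements,
// ASSUMING every element splits into 2^dim children per level.
std::int64_t elements_after_refinement(std::int64_t coarse_elements,
                                       int dim, int levels)
{
    assert(coarse_elements >= 0 && dim >= 1 && dim <= 3 && levels >= 0);
    const std::int64_t children = std::int64_t{1} << dim; // 2, 4, or 8
    std::int64_t n = coarse_elements;
    for (int l = 0; l < levels; ++l)
    {
        n *= children;
    }
    return n;
}
```

For a hypothetical 2D coarse mesh of 100 elements, `elements_after_refinement(100, 2, 2)` gives 1600 and `elements_after_refinement(100, 2, 4)` gives 25600. With "-rs", that is the size of the mesh each rank builds on its own, regardless of how many ranks there are.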

The "-rp" flag performs uniform refinement in parallel. In this case the mesh has already been partitioned among the processors, and each processor refines only its local portion, which is more efficient in both memory and computation. One possible drawback: if you start with a coarse mesh and partition it among many processors, the resulting mesh can be poorly load balanced. For example, if you partition a mesh of 800 elements across 1000 processors, at least 200 processors won't contain any elements at all. Even if the mesh were partitioned across 500 processors, many processors would have twice as many elements as others.
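The load-balance arithmetic in this example can be checked with a small sketch. It assumes an idealized partitioner that spreads elements as evenly as possible; a real partitioner may do worse. The struct and function names are hypothetical and not part of MFEM.

```cpp
#include <cassert>
#include <cstdint>

// Best-case distribution of elements over ranks (hypothetical type).
struct Balance
{
    std::int64_t min_per_rank; // fewest elements on any rank
    std::int64_t max_per_rank; // most elements on any rank
    std::int64_t empty_ranks;  // ranks that receive no elements
};

// Hypothetical helper: ASSUMES a perfectly even split, where the
// remainder is handed out one element at a time.
Balance ideal_balance(std::int64_t elements, std::int64_t ranks)
{
    assert(elements >= 0 && ranks > 0);
    const std::int64_t base = elements / ranks;  // every rank gets this many
    const std::int64_t extra = elements % ranks; // this many ranks get one more
    Balance b;
    b.min_per_rank = base;
    b.max_per_rank = base + (extra > 0 ? 1 : 0);
    b.empty_ranks = (base == 0) ? ranks - extra : 0;
    return b;
}
```

`ideal_balance(800, 1000)` reports 200 empty ranks, and `ideal_balance(800, 500)` reports a minimum of 1 and a maximum of 2 elements per rank, which is the factor of two mentioned above.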

The best strategy is to refine the mesh in serial until it has enough elements to be distributed more or less evenly among the available processors, and then continue refining in parallel if needed.
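One way to turn this strategy into a number is sketched below: pick the smallest serial level that gives every rank at least some minimum number of elements, and put the remaining levels into "-rp". The function name and the `min_per_rank` threshold are hypothetical, not MFEM options, and the sketch again assumes 2^dim children per element per refinement.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an MFEM function.
// Returns the smallest number of serial uniform refinements such that
// the mesh has at least ranks * min_per_rank elements, ASSUMING each
// refinement multiplies the element count by 2^dim.
int serial_levels_needed(std::int64_t coarse_elements, int dim,
                         std::int64_t ranks, std::int64_t min_per_rank)
{
    assert(coarse_elements > 0 && dim >= 1 && dim <= 3);
    assert(ranks > 0 && min_per_rank > 0);
    const std::int64_t children = std::int64_t{1} << dim;
    const std::int64_t target = ranks * min_per_rank;
    std::int64_t n = coarse_elements;
    int levels = 0;
    while (n < target)
    {
        n *= children;
        ++levels;
    }
    return levels;
}
```

For the 800-element mesh on 1000 ranks, asking for at least 4 elements per rank gives one serial level in 3D and two in 2D. The threshold here is only illustrative; in practice you would likely want far more elements per rank.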

Best wishes,
Mark

P.S. I'm not sure why your second case would fail unless you're starting with a very large mesh.

Thanks, Mark @mlstowell. Your explanation is very clear and very helpful. It is close to what I originally thought, but my code gave me some strange results on my Mac, which made me wonder if I was missing something.

After reading your comments, I did more tests on my local machine and cluster. My parallel code is more robust on clusters and generates correct solutions in most cases. But it is possible that I have a subtle bug somewhere. I will use more tools to debug. Thanks again! --Qi
