Code Change Guide
The example in this section shows one way to adapt a legacy
program to take advantage of the MPI_THREAD_SPLIT
threading model.
In the original code (thread_split.cpp),
the functions work_portion_1(), work_portion_2(),
and work_portion_3() represent a CPU load
that modifies the content of the memory pointed to by the in
and out pointers. In this particular example,
these functions perform correctness checking of the MPI_Allreduce()
function.
Changes Required to Use the OpenMP* Threading Model
- To run MPI functions in a multithreaded environment, MPI_Init_thread()
with the argument equal to MPI_THREAD_MULTIPLE
must be called instead of MPI_Init().
- According to the MPI_THREAD_SPLIT model,
in each thread you must execute MPI operations over the communicator
specific to this thread only. So, in this example, the MPI_COMM_WORLD
communicator must be duplicated several times so that each thread
has its own copy of MPI_COMM_WORLD.
Note
The limitation is that communicators must be used in such a
way that the thread with thread_id
n on one node communicates only with the thread with thread_id
n on the other. Communications between different threads (thread_id n on one node, thread_id
m on the other) are not supported.
- The data to transfer must be split so that each thread handles
its own portion of the input and output data.
- The barrier becomes a two-stage one: the barriers on the MPI level
and the OpenMP level must be combined.
- Check that the runtime sets up a reasonable affinity for OpenMP
threads. Typically, the OpenMP runtime does this out of the box, but
sometimes, setting up the OMP_PLACES=cores
environment variable might be necessary for optimal multi-threaded
MPI performance.
Changes Required to Use the POSIX Threading Model
- To run MPI functions in a multithreaded environment, MPI_Init_thread()
with the argument equal to MPI_THREAD_MULTIPLE
must be called instead of MPI_Init().
- In each thread, you must execute MPI collective operations over a
communicator specific to that thread. So MPI_COMM_WORLD
must be duplicated to create a dedicated communicator for each thread.
- The info key thread_id must be properly
set for each of the duplicated communicators.
Note
The limitation is that communicators must be used in such a
way that the thread with thread_id
n on one node communicates only with the thread with thread_id n on the other. Communications
between different threads (thread_id
n on one node, thread_id m
on the other) are not supported.
- The data to transfer must be split so that each thread handles
its own portion of the input and output data.
- The barrier becomes a two-stage one: the barriers on the MPI level
and the POSIX level must be combined.
- The affinity of POSIX threads can be set explicitly to achieve
optimal multithreaded MPI performance.
See Also
thread_split.cpp