# MPI Usage
About employing MPI as a distributed runtime backend.
The DAPHNE runtime is designed to support multiple distributed backends that rely on different technologies, e.g., MPI and RPC.
This document shows how a DAPHNE user can execute DaphneDSL scripts in a distributed computing environment using the MPI backend of the DAPHNE runtime.
This document assumes that DAPHNE was built with the `--mpi` option, i.e., by `./build.sh --mpi`.
DAPHNE's build script uses Open MPI.
It does not configure the Open MPI installation with Slurm support.
Users who want to add Slurm support should consult the Open MPI documentation (adding `--with-slurm` to the build command of the Open MPI library) and edit the DAPHNE build script accordingly.
Likewise, users who want to use another MPI implementation, e.g., Intel MPI, may edit the corresponding part of the DAPHNE build script.
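For illustration, a Slurm-enabled Open MPI build is typically configured along the following lines; the exact download and configure steps live in DAPHNE's build script, and the `--prefix` path below is an assumption:

```bash
# Hypothetical sketch: build Open MPI with Slurm support enabled.
# --with-slurm is the Open MPI configure flag mentioned above;
# the installation prefix is a placeholder.
./configure --prefix=<DAPHNE_INSTALLATION>/thirdparty/installed --with-slurm
make -j$(nproc)
make install
```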
## When DAPHNE is Installed Natively (w/o Container)
- Ensure that your system knows about the installed MPI: the `PATH` and `LD_LIBRARY_PATH` environment variables have to be updated as follows:

    ```bash
    export PATH=$PATH:<DAPHNE_INSTALLATION>/thirdparty/installed/bin/
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<DAPHNE_INSTALLATION>/thirdparty/installed/lib/
    ```

    Please do not forget to replace `<DAPHNE_INSTALLATION>` with the actual path.

- Run the basic example `scripts/examples/matrix_addition_for_mpi.daph` as follows:
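    ```bash
    mpirun -np 10 ./bin/daphne --distributed --dist_backend=MPI scripts/examples/matrix_addition_for_mpi.daph
    ```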
The command above executes 10 processes locally on one machine.
In order to run on a distributed system, you need to provide the machine names or a file that contains the machine names.
For instance, assuming that `my_hostfile` is a text file that contains machine names, execute the following command:
```bash
mpirun -np 10 --hostfile my_hostfile ./bin/daphne --distributed --dist_backend=MPI scripts/examples/matrix_addition_for_mpi.daph
```
The command above starts 10 processes distributed over the hosts specified in the file `my_hostfile`.
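For illustration, a minimal Open MPI hostfile might look like this; the host names and slot counts below are placeholders:

```text
# One host per line; the optional 'slots' value limits how many
# processes Open MPI places on that host.
node01 slots=4
node02 slots=4
node03 slots=2
```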
For more options, please see the Open MPI documentation.
From a DAPHNE runtime point of view, the `--distributed` option tells the DAPHNE runtime to utilize the distributed backend, while the option `--dist_backend=MPI` indicates the type of the backend implementation.
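For reference, a script like `matrix_addition_for_mpi.daph` boils down to a few lines of DaphneDSL. The following is a minimal sketch of such a matrix addition, not necessarily the exact contents of the shipped script:

```
// Create two random 1000 x 1000 matrices:
// rand(numRows, numCols, min, max, sparsity, seed).
X = rand(1000, 1000, 0.0, 1.0, 1.0, 42);
Y = rand(1000, 1000, 0.0, 1.0, 1.0, 43);
// The element-wise addition is what gets executed on the distributed backend.
Z = X + Y;
// Print an aggregate so the output stays compact.
print(sum(Z));
```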
## When DAPHNE is Installed with Containers (e.g., Singularity)
The main difference is that the `mpirun` command is called at the level of the container as follows:
```bash
mpirun -np 10 singularity exec <singularity-image> daphne/bin/daphne --distributed --dist_backend=MPI --vec --num-threads=2 daphne/scripts/examples/matrix_addition_for_mpi.daph
```
Please do not forget to replace `<singularity-image>` with the actual Singularity image.
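The hostfile mechanism from the native setup carries over unchanged. For instance, combining the two commands above (again assuming `my_hostfile` is a text file containing the machine names):

```bash
mpirun -np 10 --hostfile my_hostfile singularity exec <singularity-image> daphne/bin/daphne --distributed --dist_backend=MPI daphne/scripts/examples/matrix_addition_for_mpi.daph
```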