DOLFINx in Parallel with MPI

Authors: Jack S. Hale, Corrado Maurini.

In scripts using DOLFINx you will have seen the use of code like MPI.COMM_WORLD and x.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD).

This notebook aims to explain when and why you need to use them in your own scripts.

Because notebooks do not support MPI elegantly, we will execute the scripts from within the Jupyter Notebook using the shell magic !.

DOLFINx uses MPI-based parallelism

DOLFINx uses the Message Passing Interface (MPI) model to execute your code in parallel.

Simply put, MPI allows messages to be communicated very quickly between processes running on the same or even different computers (e.g. in a high-performance computing cluster).

A very simplified description of MPI is as follows.

  1. \(N\) processes are started within a communicator.

  2. The communicator containing all processes is called the world communicator. You will usually use the world communicator, although it is possible to split it into sub-communicators (a short sketch follows this list).

  3. Each process is assigned by the communicator a unique identifier called its rank.
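Point 2 above mentions that the world communicator can be split. As a small aside (this uses the mpi4py package introduced in the next sections, and is not needed for the rest of this notebook), a minimal sketch of Comm.Split:

from mpi4py import MPI

world = MPI.COMM_WORLD

# Group even-numbered world ranks together and odd-numbered ranks together.
# Each group becomes its own sub-communicator with ranks 0..size-1.
color = world.rank % 2
sub = world.Split(color=color, key=world.rank)

print(f"World rank {world.rank} of {world.size} -> "
      f"group {color}, rank {sub.rank} of {sub.size}")

sub.Free()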

Hello World

!cat 01-hello-world.py
print("Hello world")
!mpirun -n 1 python3 01-hello-world.py
Hello world
!mpirun -n 2 python3 01-hello-world.py
Hello world
Hello world

Two totally separate processes printed Hello world to the screen. Not very exciting!

Hello World with MPI

Python supports MPI through the optional mpi4py package (https://mpi4py.readthedocs.io/en/stable/index.html).

!cat 02-hello-world-mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"Hello world from rank {comm.rank} of {comm.size}")
!mpirun -n 1 python3 02-hello-world-mpi.py
Hello world from rank 0 of 1
!mpirun -n 2 python3 02-hello-world-mpi.py
Hello world from rank 0 of 2
Hello world from rank 1 of 2

What happened? Two totally separate processes printed their rank (their unique identifier within the communicator) to the screen.

Some basic communication

!cat 03-communicate.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
assert comm.size == 2

if comm.rank == 0:
    b = 3
    c = 5
    a = b + c
    comm.send(a, dest=1, tag=20)
    print(f"Rank {comm.rank} a: {a}")
elif comm.rank == 1:
    a = comm.recv(source=0, tag=20)
    print(f"Rank {comm.rank} a: {a}")
!mpirun -n 2 python3 03-communicate.py
Rank 0 a: 8
Rank 1 a: 8

MPI can do a lot more than this (https://mpi4py.readthedocs.io/en/stable/tutorial.html). The key points are:

  • \(N\) identical versions of your program run, one on each process (rank). Each process takes different paths through the program depending on its rank.

  • Processes (and hence their memory) are totally independent.

  • Communication between processes must be performed explicitly (manually) by the programmer.
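Beyond point-to-point send and recv, mpi4py also exposes MPI's collective operations. A minimal sketch (in the spirit of the mpi4py tutorial linked above) of a broadcast and an all-reduce:

from mpi4py import MPI

comm = MPI.COMM_WORLD

# Broadcast: rank 0 chooses some data and every rank receives a copy.
data = {"num_steps": 10, "dt": 0.01} if comm.rank == 0 else None
data = comm.bcast(data, root=0)

# Allreduce: combine a value from every rank (here a sum) and give the
# result to every rank.
total = comm.allreduce(comm.rank, op=MPI.SUM)

print(f"Rank {comm.rank}: data={data}, sum of all ranks={total}")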

MPI and DOLFINx

DOLFINx abstracts most of the difficult aspects of distributing your problem across the MPI communicator away from the user.

!cat 04-mpi-dolfinx.py
from mpi4py import MPI
import dolfinx
import dolfinx.io

# DOLFINx uses mpi4py communicators.
comm = MPI.COMM_WORLD

def mpi_print(s):
    print(f"Rank {comm.rank}: {s}")

# When you construct a mesh you must pass an MPI communicator.
# The mesh will automatically be *distributed* over the ranks of the MPI communicator.
# Important: In this script we use dolfinx.cpp.mesh.GhostMode.none.
# This is *not* the default (dolfinx.cpp.mesh.GhostMode.shared_facet).
# We will discuss the effects of the ghost_mode parameter in the next section.
mesh = dolfinx.UnitSquareMesh(comm, 1, 1, diagonal="right", ghost_mode=dolfinx.cpp.mesh.GhostMode.none)
mesh.topology.create_connectivity_all()

mpi_print(f"Number of local cells: {mesh.topology.index_map(2).size_local}")
mpi_print(f"Number of global cells: {mesh.topology.index_map(2).size_global}")
mpi_print(f"Number of local vertices: {mesh.topology.index_map(0).size_local}")
mpi_print("Cell (dim = 2) to vertex (dim = 0) connectivity")
mpi_print(mesh.topology.connectivity(2, 0))
!mpirun -n 1 python3 04-mpi-dolfinx.py
Rank 0: Number of local cells: 2
Rank 0: Number of global cells: 2
Rank 0: Number of local vertices: 4
Rank 0: Cell (dim = 2) to vertex (dim = 0) connectivity
Rank 0: <AdjacencyList> with 2 nodes
  0: [0 1 3 ]
  1: [0 2 3 ]
!mpirun -n 2 python3 04-mpi-dolfinx.py
Rank 0: Number of local cells: 1
Rank 0: Number of global cells: 2
Rank 0: Number of local vertices: 3
Rank 0: Cell (dim = 2) to vertex (dim = 0) connectivity
Rank 0: <AdjacencyList> with 1 nodes
  0: [1 0 2 ]
Rank 1: Number of local cells: 1
Rank 1: Number of global cells: 2
Rank 1: Number of local vertices: 1
Rank 1: Cell (dim = 2) to vertex (dim = 0) connectivity
Rank 1: <AdjacencyList> with 1 nodes
  0: [2 0 1 ]

Now we will run a similar script but with ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet passed to the mesh constructor. It also prints a bit more output to help us understand what is going on.

!cat 05-mpi-dolfinx-ghosts.py
from mpi4py import MPI
import dolfinx
import dolfinx.io

# DOLFINx uses mpi4py communicators.
comm = MPI.COMM_WORLD

def mpi_print(s):
    print(f"Rank {comm.rank}: {s}")

# When you construct a mesh you must pass an MPI communicator.
# The mesh will automatically be *distributed* over the ranks of the MPI communicator.
# Important: In this script we use dolfinx.cpp.mesh.GhostMode.shared_facet,
# which is the default, unlike the previous script which used
# dolfinx.cpp.mesh.GhostMode.none.
mesh = dolfinx.UnitSquareMesh(comm, 1, 1, diagonal="right", ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)
mesh.topology.create_connectivity_all()

mpi_print(f"Number of local cells: {mesh.topology.index_map(2).size_local}")
mpi_print(f"Number of global cells: {mesh.topology.index_map(2).size_global}")
mpi_print(f"Number of local vertices: {mesh.topology.index_map(0).size_local}")
mpi_print(f"Number of global vertices: {mesh.topology.index_map(0).size_global}")
mpi_print("Cell (dim = 2) to vertex (dim = 0) connectivity")
mpi_print(mesh.topology.connectivity(2, 0))

if comm.size == 1:
    mpi_print("Cell (dim = 2) to facet (dim = 0) connectivity")
    mpi_print(mesh.topology.connectivity(2, 1))
    
if comm.size == 2:
    mpi_print(f"Ghost cells (global numbering): {mesh.topology.index_map(2).ghosts}")
    mpi_print(f"Ghost owner rank: {mesh.topology.index_map(2).ghost_owner_rank()}")
!mpirun -n 1 python3 05-mpi-dolfinx-ghosts.py
Rank 0: Number of local cells: 2
Rank 0: Number of global cells: 2
Rank 0: Number of local vertices: 4
Rank 0: Number of global vertices: 4
Rank 0: Cell (dim = 2) to vertex (dim = 0) connectivity
Rank 0: <AdjacencyList> with 2 nodes
  0: [0 1 3 ]
  1: [0 2 3 ]

Rank 0: Cell (dim = 2) to facet (dim = 1) connectivity
Rank 0: <AdjacencyList> with 2 nodes
  0: [3 2 0 ]
  1: [4 2 1 ]

There is no difference in the output when running on an MPI communicator with a single rank.

However, when we run with two ranks we see something quite different.

With the shared facet ghost mode enabled, each process will also store information about some cells owned by the neighbouring process. These cells are called ghost cells.

In shared facet mode the logic of which cells are ghost cells is as follows:

  • All cells in the mesh share a common facet with one or more other cells.

  • The cells are partitioned between \(N\) MPI ranks. The set of cells associated with each MPI rank is said to be local to or owned by the rank.

  • If two cells are connected by a shared facet but are owned by different MPI ranks, then the topological and geometric information about the cell owned by the other rank is duplicated on the non-owning rank. This duplicated set of cells is called the ghost cells.

!mpirun -n 2 python3 05-mpi-dolfinx-ghosts.py
Rank 0: Number of local cells: 1
Rank 0: Number of global cells: 2
Rank 0: Number of local vertices: 3
Rank 0: Number of global vertices: 4
Rank 0: Cell (dim = 2) to vertex (dim = 0) connectivity
Rank 0: <AdjacencyList> with 2 nodes
  0: [1 0 2 ]
  1: [1 3 2 ]

Rank 0: Ghost cells (global numbering): [1]
Rank 0: Ghost owner rank: [1]
Rank 1: Number of local cells: 1
Rank 1: Number of global cells: 2
Rank 1: Number of local vertices: 1
Rank 1: Number of global vertices: 4
Rank 1: Cell (dim = 2) to vertex (dim = 0) connectivity
Rank 1: <AdjacencyList> with 2 nodes
  0: [2 0 1 ]
  1: [2 3 1 ]

Rank 1: Ghost cells (global numbering): [0]
Rank 1: Ghost owner rank: [0]
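As a quick sanity check of the ghost-cell logic described above, each rank can assert that its ghost cells are owned by other ranks. This is only a sketch, reusing the index_map calls from 05-mpi-dolfinx-ghosts.py:

from mpi4py import MPI
import dolfinx

comm = MPI.COMM_WORLD

mesh = dolfinx.UnitSquareMesh(comm, 1, 1, diagonal="right", ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)

cell_map = mesh.topology.index_map(2)
owners = cell_map.ghost_owner_rank()

# Every ghost cell stored on this rank must be owned by a different rank.
assert all(int(owner) != comm.rank for owner in owners)

print(f"Rank {comm.rank}: {len(cell_map.ghosts)} ghost cell(s), owned by ranks {list(owners)}")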

FunctionSpace

We will now look at how a ghosted Mesh creates a ghosted FunctionSpace.

Consider a continuous first-order Lagrange space on the mesh. The degrees of freedom of this space are associated with the vertices of the mesh.

!cat 06-mpi-dolfinx-function-space.py
from mpi4py import MPI
import dolfinx
import dolfinx.io

comm = MPI.COMM_WORLD

def mpi_print(s):
    print(f"Rank {comm.rank}: {s}")

mesh = dolfinx.UnitSquareMesh(comm, 1, 1, diagonal="right", ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)

V = dolfinx.FunctionSpace(mesh, ("CG", 1))

mpi_print(f"Global size: {V.dofmap.index_map.size_global}")
mpi_print(f"Local size: {V.dofmap.index_map.size_local}")
mpi_print(f"Ghosts (global numbering): {V.dofmap.index_map.ghosts}")
!mpirun -n 1 python3 06-mpi-dolfinx-function-space.py
Rank 0: Global size: 4
Rank 0: Local size: 4
Rank 0: Ghosts (global numbering): []
!mpirun -n 2 python3 06-mpi-dolfinx-function-space.py
Rank 0: Global size: 4
Rank 0: Local size: 3
Rank 0: Ghosts (global numbering): [3]
Rank 1: Global size: 4
Rank 1: Local size: 1
Rank 1: Ghosts (global numbering): [2 0 1]
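A quick way to check the relationship between these numbers is to sum the local (owned) sizes over all ranks: the total must equal the global size, and ghost dofs are not counted. A short sketch using the same index map as above:

from mpi4py import MPI
import dolfinx

comm = MPI.COMM_WORLD

mesh = dolfinx.UnitSquareMesh(comm, 1, 1, diagonal="right", ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)
V = dolfinx.FunctionSpace(mesh, ("CG", 1))

index_map = V.dofmap.index_map

# The owned dofs partition the global numbering: summing size_local over all
# ranks gives size_global. Ghosts are stored in addition to the owned dofs.
owned_total = comm.allreduce(index_map.size_local, op=MPI.SUM)
assert owned_total == index_map.size_global

print(f"Rank {comm.rank}: owns {index_map.size_local} dofs, "
      f"ghosts {len(index_map.ghosts)}, global size {index_map.size_global}")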

Functions

A Function is built from a FunctionSpace. It contains memory (an array) in which the expansion coefficients (\(u_i\)) of the finite element basis (\(\phi_i\)) can be stored.

\[u_h = \sum_{i = 1}^4 \phi_i u_i\]

A Function built from a ghosted FunctionSpace has memory to store the expansion coefficients of the local degrees of freedom and the ghost degrees of freedom.

!cat 07-mpi-dolfinx-function.py
from mpi4py import MPI
import dolfinx
import dolfinx.io

comm = MPI.COMM_WORLD

def mpi_print(s):
    print(f"Rank {comm.rank}: {s}")

mesh = dolfinx.UnitSquareMesh(comm, 1, 1, diagonal="right", ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)

V = dolfinx.FunctionSpace(mesh, ("CG", 1))

u = dolfinx.Function(V)
vector = u.vector

mpi_print(f"Local size of vector: {vector.getLocalSize()}")

# .localForm() allows us to access the local array with space for both owned and ghost degrees of freedom.
with vector.localForm() as v_local:
    mpi_print(f"Local + Ghost size of vector: {v_local.getLocalSize()}")
    
vector.ghostUpdate()
!mpirun -n 1 python3 07-mpi-dolfinx-function.py
Rank 0: Local size of vector: 4
Rank 0: Local + Ghost size of vector: 4
!mpirun -n 2 python3 07-mpi-dolfinx-function.py
Rank 1: Local size of vector: 1
Rank 1: Local + Ghost size of vector: 4
Rank 0: Local size of vector: 3
Rank 0: Local + Ghost size of vector: 4

Simple scattering

Let’s say we want to change the expansion coefficient \(u_0\) (local numbering) on both processes so that it has a value equal to the MPI rank + 1 of the owning process. For consistency we need this change to be reflected in two places:

  1. In the memory of the process that owns the degree of freedom.

  2. In the memory of the process (if any) that has the degree of freedom as a ghost.

There are two ways to do this:

  1. Insert the values on both processes (i.e. four local set operations, some involving owned and some involving ghost dofs); a rough sketch of this approach is given after the script below.

  2. Insert the values on the owning processes (i.e. two local set operations) and then scatter/communicate the values to the ghost dofs of the other process.

!cat 08-mpi-dolfinx-simple-scatter.py
from mpi4py import MPI
from petsc4py import PETSc
import dolfinx
import dolfinx.io

comm = MPI.COMM_WORLD

def mpi_print(s):
    print(f"Rank {comm.rank}: {s}")

mesh = dolfinx.UnitSquareMesh(comm, 1, 1, diagonal="right", ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)

V = dolfinx.FunctionSpace(mesh, ("CG", 1))

u = dolfinx.Function(V)
vector = u.vector

# Set the value locally. No communication is performed.
u.vector.setValueLocal(0, comm.rank + 1)

# Print the local and ghosted memory to screen. Notice that the memory on each process is inconsistent.
mpi_print("Before communication")
with vector.localForm() as v_local:
    mpi_print(v_local.array)
    
vector.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)

mpi_print("After communication")
with vector.localForm() as v_local:
    mpi_print(v_local.array)
!mpirun -n 2 python3 08-mpi-dolfinx-simple-scatter.py
Rank 1: Before communication
Rank 1: [2. 0. 0. 0.]
Rank 1: After communication
Rank 1: [2. 0. 1. 0.]
Rank 0: Before communication
Rank 0: [1. 0. 0. 0.]
Rank 0: After communication
Rank 0: [1. 0. 0. 2.]
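For completeness, the first option (inserting the value on every rank that owns or ghosts the degree of freedom, with no communication afterwards) might look roughly like the sketch below. It assumes that the index map exposes local_range in addition to the size_local and ghosts attributes used above, and the global indices 0 and 3 are chosen to match the partition printed by 06-mpi-dolfinx-function-space.py.

import numpy as np
from mpi4py import MPI
import dolfinx

comm = MPI.COMM_WORLD

mesh = dolfinx.UnitSquareMesh(comm, 1, 1, diagonal="right", ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)
V = dolfinx.FunctionSpace(mesh, ("CG", 1))
u = dolfinx.Function(V)

def set_global(vector, index_map, global_index, value):
    # Write value into the local array of every rank that owns or ghosts the
    # dof with the given global index. No communication is performed.
    # Assumption: index_map.local_range is the half-open range of global
    # indices owned by this rank.
    start, end = index_map.local_range
    with vector.localForm() as v_local:
        if start <= global_index < end:
            # Owned on this rank: offset into the owned block.
            v_local[global_index - start] = value
        else:
            # Possibly a ghost on this rank: ghosts are stored after the owned dofs.
            position = np.where(index_map.ghosts == global_index)[0]
            if len(position) > 0:
                v_local[index_map.size_local + int(position[0])] = value

# Every rank performs the same inserts, so the owned values and their ghost
# copies agree without any ghostUpdate.
set_global(u.vector, V.dofmap.index_map, 0, 1.0)
set_global(u.vector, V.dofmap.index_map, 3, 2.0)

with u.vector.localForm() as v_local:
    print(f"Rank {comm.rank}: {v_local.array}")

With two ranks this should leave each process with the same local array as the final state printed by 08-mpi-dolfinx-simple-scatter.py.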

Assembling vectors

Now we want to assemble a linear form \(L(v)\) into a vector \(b\) with

\[L(v) = \int_{\Omega} v \; \mathrm{d}x\]

When we call dolfinx.fem.assemble_vector(L) the following happens:

  1. Each process calculates the cell tensors \(b_T\) for cells that it owns.

  2. It assembles (adds) the result into its local array.

At this point no parallel communication has taken place! The vector is in an inconsistent state. It should not be used.

First, we need to take the values in the ghost regions and accumulate them into the corresponding owned values on the owning processes.

b.ghostUpdate(addv=PETSc.InsertMode.ADD, mode=PETSc.ScatterMode.REVERSE)

It is important to note that the ghosted part of the vector is still in an inconsistent state even after this call. However, it can be safely used for e.g. matrix-vector products (i.e. solving).

To update the ghost values with the values from the owning process, we call:

b.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)

After this call all owned and ghosted values on all processes are in a consistent state.

!cat 09-mpi-dolfinx-assemble-vector.py
from mpi4py import MPI
from petsc4py import PETSc
import dolfinx
import dolfinx.io
import ufl
import os

comm = MPI.COMM_WORLD

def mpi_print(s):
    print(f"Rank {comm.rank}: {s}")

mesh = dolfinx.UnitSquareMesh(comm, 1, 1, diagonal="right", ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)

V = dolfinx.FunctionSpace(mesh, ("CG", 1))

u = dolfinx.Function(V)
v = ufl.TestFunction(V)

L = ufl.inner(1.0, v)*ufl.dx

b = dolfinx.fem.assemble_vector(L)

mpi_print("Before communication")
with b.localForm() as b_local:
    mpi_print(b_local.array)
    
print("\n")

# This call takes the values from the ghost regions and accumulates (adds) them to the owning process.
b.ghostUpdate(addv=PETSc.InsertMode.ADD, mode=PETSc.ScatterMode.REVERSE)

mpi_print("After ADD/REVERSE update")
with b.localForm() as b_local:
    mpi_print(b_local.array)
    
print("\n")

# Important point: The ghosts are still inconsistent!
# This call takes the values from the owning processes and updates the ghosts.
b.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)

mpi_print("After INSERT/FORWARD update")
with b.localForm() as b_local:
    mpi_print(b_local.array)
!mpirun -n 1 python3 09-mpi-dolfinx-assemble-vector.py
Rank 0: Before communication
Rank 0: [0.17 0.33 0.33 0.17]


Rank 0: After ADD/REVERSE update
Rank 0: [0.17 0.33 0.33 0.17]


Rank 0: After INSERT/FORWARD update
Rank 0: [0.17 0.33 0.33 0.17]
!mpirun -n 2 python3 09-mpi-dolfinx-assemble-vector.py
Rank 0: Before communication
Rank 0: [0.16666667 0.16666667 0.16666667 0.        ]


Rank 0: After ADD/REVERSE update
Rank 0: [0.33333333 0.16666667 0.33333333 0.        ]


Rank 0: After INSERT/FORWARD update
Rank 0: [0.33333333 0.16666667 0.33333333 0.16666667]
Rank 1: Before communication
Rank 1: [0.16666667 0.16666667 0.16666667 0.        ]


Rank 1: After ADD/REVERSE update
Rank 1: [0.16666667 0.16666667 0.16666667 0.        ]


Rank 1: After INSERT/FORWARD update
Rank 1: [0.16666667 0.33333333 0.33333333 0.16666667]

Matrix-Vector products

In a parallel matrix-vector product \(y = A x\), PETSc distributes the matrix \(A\) by rows: each rank stores the rows associated with its owned degrees of freedom, together with the owned entries of \(x\) and \(y\). To compute its rows of \(y\), a rank also needs some entries of \(x\) owned by other ranks; PETSc communicates these internally during the product.

Because only the owned entries of \(x\) are read (the off-process values are fetched by PETSc itself), the ghost values stored on each rank do not need to be consistent for the matrix-vector product to be correct. This is why the assembled vector \(b\) can safely be used for solving directly after the ADD/REVERSE update, before its ghosts have been refreshed.
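The following sketch (not one of the notebook's scripts, and assuming dolfinx.fem.assemble_matrix, the matrix counterpart of the assemble_vector call used above) assembles a mass matrix and applies it to a vector. PETSc performs the required communication inside A.mult and never reads the ghost region of x.

from mpi4py import MPI
import dolfinx
import ufl

comm = MPI.COMM_WORLD

mesh = dolfinx.UnitSquareMesh(comm, 1, 1, diagonal="right", ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)
V = dolfinx.FunctionSpace(mesh, ("CG", 1))

u, v = ufl.TrialFunction(V), ufl.TestFunction(V)
a = ufl.inner(u, v)*ufl.dx

# Assemble the mass matrix. Each rank assembles and stores its own rows.
A = dolfinx.fem.assemble_matrix(a)
A.assemble()

# A vector of ones, set through the local form (owned and ghost entries).
x = dolfinx.Function(V)
with x.vector.localForm() as x_local:
    x_local.set(1.0)

# y = A x. PETSc fetches the off-process entries of x that it needs; the
# ghost values stored alongside x are not used.
y = A.createVecLeft()
A.mult(x.vector, y)

print(f"Rank {comm.rank}: local entries of y = {y.array}")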