Working With MPI

MPI, short for Message Passing Interface, is a message-passing standard for parallel applications that is widely used in high-performance computing (HPC). This tutorial is intended for administrators who want to enable MPI support on their endpoint, and for users who want to submit MPI jobs to such an endpoint.

Note

Compute’s MPI system is a wrapper over Parsl’s MPI system. For more details on how the latter works, see the Parsl documentation.

Configure an Endpoint

If you are starting from scratch, you will need to initialize an endpoint with the configure subcommand:

$ globus-compute-endpoint configure my-ep

Modify the Configuration Template

Note

For more details on MPI-related configuration options, see Configuring for MPI.

Start with a clean user_config_template.yaml.j2:

engine:
    type: GlobusComputeEngine
    max_workers_per_node: 1

    provider:
        type: LocalProvider

        min_blocks: 0
        max_blocks: 1
        init_blocks: 1

Update the config to use GlobusMPIEngine and SimpleLauncher:

engine:
    type: GlobusMPIEngine
    max_workers_per_node: 1

    provider:
        type: LocalProvider

        launcher:
            type: SimpleLauncher

        min_blocks: 0
        max_blocks: 1
        init_blocks: 1

Depending on the target system, set the correct provider and mpi_launcher. For this tutorial, we’ll use Slurm; check Example Configurations for examples based on other schedulers.

engine:
    type: GlobusMPIEngine
    mpi_launcher: srun
    max_workers_per_node: 1

    provider:
        type: SlurmProvider

        launcher:
            type: SimpleLauncher

        min_blocks: 0
        max_blocks: 1
        init_blocks: 1

Finally, configure the shape of the resources available to MPI tasks. Set nodes_per_block to configure the size of the block Parsl will reserve for MPI tasks, and set max_workers_per_block to limit how many MPI tasks can be run per block.

engine:
    type: GlobusMPIEngine
    mpi_launcher: srun

    provider:
        type: SlurmProvider

        launcher:
            type: SimpleLauncher

        nodes_per_block: 8

    # engine-level option, not a provider option
    max_workers_per_block: 4

In this example, each block has 8 nodes, and up to 4 MPI tasks can run at once on a single block.
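To make the arithmetic concrete, here is a rough sketch of how many equally sized tasks could run at once under these settings. It is a simplified model, not Parsl's actual scheduler: the helper name `max_concurrent_tasks` is hypothetical, and it assumes every task requests the same number of nodes and holds them exclusively.

```python
def max_concurrent_tasks(nodes_per_block: int,
                         max_workers_per_block: int,
                         nodes_per_task: int) -> int:
    """Estimate how many equally sized MPI tasks fit in one block at once.

    Hypothetical helper for illustration only; it ignores queueing and
    assumes each task holds its nodes exclusively.
    """
    if nodes_per_task < 1:
        raise ValueError("nodes_per_task must be at least 1")
    if nodes_per_task > nodes_per_block:
        return 0  # the task can never fit in this block
    # Limited both by free nodes and by the number of worker slots.
    return min(nodes_per_block // nodes_per_task, max_workers_per_block)


# Values from the configuration above: 8 nodes, up to 4 concurrent tasks.
for size in (1, 2, 4, 8):
    print(size, max_concurrent_tasks(8, 4, size))
```

Under this model, small tasks are capped by the worker limit, while larger tasks are capped by the number of nodes in the block.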

Start the Endpoint

Once the endpoint is configured, we can start it up.

$ globus-compute-endpoint start my-ep

Take note of the endpoint ID emitted to the console; we will use it later in the tutorial.

Submit Tasks from the SDK

Note

For more details on MPI support in the SDK, see Submitting MPI Tasks.

We’ll use the Executor to submit our MPI tasks, but first, we need to define our MPI function. In this case, we’ll just run hostname on every MPI node:

from globus_compute_sdk import MPIFunction
mpi_func = MPIFunction("hostname")

An MPIFunction can be submitted like any other Python function. When submitted, it runs bash commands on the endpoint, with the appropriate MPI executable and arguments handled by Parsl.

In order to run MPI tasks we need to give our Executor a resource_specification, which tells the endpoint how to distribute the nodes amongst MPI workers:

from globus_compute_sdk import Executor

ep_id = "..."  # Endpoint ID from before
with Executor(endpoint_id=ep_id) as ex:
    ex.resource_specification = {
        "num_ranks": 1,  # launch 1 MPI rank (process) in total
        "num_nodes": 8,  # and give it 8 nodes
    }
    f = ex.submit(mpi_func)
    mpi_result = f.result()  # blocks until the task finishes
    print(mpi_result.stdout)

MPIFunction submissions return ShellResult objects, hence the .stdout.
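Because the result wraps a shell command, it can be worth checking for failure before reading the output. The sketch below assumes the result object also exposes `returncode` and `stderr` attributes alongside `stdout`; the helper `stdout_or_raise` is a hypothetical name, and a stand-in object is used so the snippet runs without an endpoint.

```python
from types import SimpleNamespace


def stdout_or_raise(result):
    """Return the command's stdout, or raise if it exited non-zero.

    Assumes `result` has `returncode`, `stdout`, and `stderr` attributes.
    """
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {result.returncode}: {result.stderr}"
        )
    return result.stdout


# Stand-in for a real result, for illustration only.
fake_result = SimpleNamespace(returncode=0, stdout="my-node-1\n", stderr="")
print(stdout_or_raise(fake_result))
```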

Finally, each task can have its own resource_specification:

with Executor(endpoint_id=ep_id) as ex:
    for ranks in range(1, 4):  # reminder: (1, 2, 3). 4 not included
        # launch "ranks" MPI ranks on each node
        ex.resource_specification = {
            "ranks_per_node": ranks,
            "num_nodes": 2
        }
        f = ex.submit(mpi_func)
        mpi_result = f.result()
        print(mpi_result.stdout)

This should result in output that looks something like the following:

# 2 nodes, 1 rank per node
my-node-1
my-node-2

# 2 nodes, 2 ranks per node
my-node-2
my-node-1
my-node-1
my-node-2

# 2 nodes, 3 ranks per node
my-node-1
my-node-2
my-node-1
my-node-2
my-node-2
my-node-1
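
The number of lines in each group follows from the resource specification: `hostname` prints one line per rank, so each task produces `num_nodes * ranks_per_node` lines. A small sketch of that arithmetic, where the helper `expected_lines` is hypothetical and not part of the SDK:

```python
def expected_lines(spec: dict) -> int:
    """Number of hostname lines expected for a resource specification.

    Hypothetical helper: one line is printed per MPI rank.
    """
    if "num_ranks" in spec:
        return spec["num_ranks"]
    return spec["num_nodes"] * spec["ranks_per_node"]


# Same specifications as the loop above.
for ranks in range(1, 4):
    spec = {"ranks_per_node": ranks, "num_nodes": 2}
    print(ranks, expected_lines(spec))
```

The order of hostnames within a group may vary between runs, since the ranks write their output independently.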