Example Configurations¶
While Globus Compute is in use on various systems around the world, arriving at a working configuration that matches the underlying system constraints and the site administrator's requirements often takes trial and error. Below are known-working example configurations for several well-known systems; they serve as a reference for getting started.
If you would like to add your system to this list, please contact the Globus Compute Team via Slack. (The #help channel is a good place to start.)
Note
All configuration examples below must be customized for the user’s allocation, Python environment, file system, etc.
Anvil (RCAC, Purdue)¶

The following snippet shows an example configuration for executing remotely on Anvil, a
supercomputer at Purdue University’s Rosen Center for Advanced Computing (RCAC). The
configuration assumes the user is running on a login node, uses the SlurmProvider to
interface with the scheduler, and uses the SrunLauncher to launch workers.
amqp_port: 443
display_name: Anvil CPU
engine:
  type: GlobusComputeEngine
  max_workers_per_node: 2
  address:
    type: address_by_interface
    ifname: ib0
  provider:
    type: SlurmProvider
    partition: debug
    account: {{ ACCOUNT }}
    launcher:
      type: SrunLauncher
    # string to prepend to #SBATCH blocks in the submit
    # script to the scheduler
    # e.g., "#SBATCH --constraint=knl,quad,cache"
    scheduler_options: {{ OPTIONS }}
    # Command to be run before starting a worker
    # e.g., "module load anaconda; source activate gce_env"
    worker_init: {{ COMMAND }}
    init_blocks: 1
    max_blocks: 1
    min_blocks: 0
    walltime: 00:05:00
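Once an endpoint has been configured and started with a snippet like the one above, functions can be submitted to it from anywhere using the Globus Compute SDK. The following is a minimal sketch, not part of the Anvil documentation; the endpoint UUID is a placeholder for the ID reported by globus-compute-endpoint list, and the function is purely illustrative.
# Minimal sketch of submitting work to a configured endpoint; the UUID below is a
# placeholder for the endpoint's actual ID.
from globus_compute_sdk import Executor

def double(x):
    return x * 2

with Executor(endpoint_id="YOUR-ENDPOINT-UUID") as ex:
    future = ex.submit(double, 21)
    print(future.result())  # prints 42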
Delta (NCSA)¶

The following snippet shows an example configuration for executing remotely on Delta, a
supercomputer at the National Center for Supercomputing Applications. The configuration
assumes the user is running on a login node, uses the SlurmProvider to interface with
the scheduler, and uses the SrunLauncher to launch workers.
amqp_port: 443
display_name: NCSA Delta 2 CPU
engine:
  type: GlobusComputeEngine
  max_workers_per_node: 2
  address:
    type: address_by_interface
    ifname: eth6.560
  provider:
    type: SlurmProvider
    partition: cpu
    account: {{ ACCOUNT NAME }}
    launcher:
      type: SrunLauncher
    # Command to be run before starting a worker
    # e.g., "module load anaconda3; source activate gce_env"
    worker_init: {{ COMMAND }}
    init_blocks: 1
    min_blocks: 0
    max_blocks: 1
    walltime: 00:30:00
Expanse (SDSC)¶

The following snippet shows an example configuration for executing remotely on Expanse,
a supercomputer at the San Diego Supercomputer Center. The configuration assumes the
user is running on a login node, uses the SlurmProvider to interface with the
scheduler, and uses the SrunLauncher to launch workers.
display_name: Expanse@SDSC
engine:
  type: GlobusComputeEngine
  max_workers_per_node: 2
  worker_debug: False
  address:
    type: address_by_interface
    ifname: ib0
  provider:
    type: SlurmProvider
    partition: compute
    account: {{ ACCOUNT }}
    launcher:
      type: SrunLauncher
    # string to prepend to #SBATCH blocks in the submit
    # script to the scheduler
    # e.g., "#SBATCH --constraint=knl,quad,cache"
    scheduler_options: {{ OPTIONS }}
    # Command to be run before starting a worker
    # e.g., "module load anaconda3; source activate gce_env"
    worker_init: {{ COMMAND }}
    init_blocks: 0
    min_blocks: 0
    max_blocks: 1
    walltime: 00:05:00
The GlobusMPIEngine adds support for running MPI applications. The following snippet
shows an example configuration for Expanse that uses the SlurmProvider to provision
batch jobs, each with 4 nodes, which can be dynamically partitioned to launch MPI
functions with srun.
display_name: ExpanseMPI@SDSC
engine:
  type: GlobusMPIEngine
  mpi_launcher: srun
  address:
    type: address_by_interface
    ifname: ib0
  provider:
    type: SlurmProvider
    partition: compute
    account: {{ ACCOUNT }}
    launcher:
      type: SimpleLauncher
    # string to prepend to #SBATCH blocks in the submit
    # script to the scheduler
    # e.g., "#SBATCH --constraint=knl,quad,cache"
    scheduler_options: {{ OPTIONS }}
    # Command to be run before starting a worker
    # e.g., "module load anaconda3; source activate gce_env"
    worker_init: {{ COMMAND }}
    nodes_per_block: 4
    init_blocks: 0
    min_blocks: 0
    max_blocks: 1
    walltime: 00:05:00
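As a rough sketch of how such an endpoint is used (not part of the Expanse documentation), each MPI function is submitted together with a resource specification describing the nodes and ranks it needs. The specification keys and the PARSL_MPI_PREFIX environment variable below follow the Parsl MPI support that GlobusMPIEngine builds on; the endpoint UUID and the mpi_hello binary are illustrative placeholders.
# Sketch: submit an MPI function to a GlobusMPIEngine endpoint. The resource
# specification keys ("num_nodes", "ranks_per_node") and PARSL_MPI_PREFIX come
# from the underlying Parsl MPI support; the UUID and binary are placeholders.
from globus_compute_sdk import Executor

def run_mpi_app():
    import os
    import subprocess
    # PARSL_MPI_PREFIX is expected to hold the srun prefix for the nodes
    # assigned to this invocation
    prefix = os.environ["PARSL_MPI_PREFIX"]
    result = subprocess.run(f"{prefix} ./mpi_hello", shell=True, capture_output=True, text=True)
    return result.stdout

with Executor(endpoint_id="YOUR-ENDPOINT-UUID") as ex:
    ex.resource_specification = {"num_nodes": 2, "ranks_per_node": 4}
    print(ex.submit(run_mpi_app).result())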
UChicago AI Cluster¶

The following snippet shows an example configuration for the University of Chicago’s AI
Cluster. The configuration assumes the user is running on a login node and uses the
SlurmProvider to interface with the scheduler and launch onto the GPUs. Link to docs.
display_name: AI Cluster CS@UChicago
engine:
  type: GlobusComputeEngine
  label: fe.cs.uchicago
  worker_debug: False
  address:
    type: address_by_interface
    ifname: ens2f1
  provider:
    type: SlurmProvider
    partition: general
    # This is a hack. We use hostname ; to terminate the srun command, and
    # start our own.
    launcher:
      type: SrunLauncher
      overrides: >
        hostname; srun --ntasks={{ TOTAL_WORKERS }}
        --ntasks-per-node={{ WORKERS_PER_NODE }}
        --gpus-per-task=rtx2080ti:{{ GPUS_PER_WORKER }}
        --gpu-bind=map_gpu:{{ GPU_MAP }}
    # To request a single gpu, use the following overrides instead:
    # hostname; srun --ntasks=1
    # --ntasks-per-node=1
    # --gres=gpu:1
    # Scale between 0-1 blocks with 1 node per block
    nodes_per_block: 1
    init_blocks: 0
    min_blocks: 0
    max_blocks: 1
    # Hold blocks for 30 minutes
    walltime: 00:30:00
Here is some Python that demonstrates how to compute the variables in the YAML example above:
# Launch 4 managers per node, each bound to 1 GPU
# Modify before use
NODES_PER_JOB = 2
GPUS_PER_NODE = 4
GPUS_PER_WORKER = 2
# DO NOT MODIFY
TOTAL_WORKERS = int((NODES_PER_JOB * GPUS_PER_NODE) / GPUS_PER_WORKER)
WORKERS_PER_NODE = int(GPUS_PER_NODE / GPUS_PER_WORKER)
GPU_MAP = ",".join([str(x) for x in range(1, TOTAL_WORKERS + 1)])
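With the values shown, TOTAL_WORKERS works out to 4, WORKERS_PER_NODE to 2, and GPU_MAP to the string "1,2,3,4"; these are the values substituted for the corresponding placeholders in the overrides string above.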
Midway (RCC, UChicago)¶

The Midway cluster is a campus cluster hosted by the Research Computing Center at the
University of Chicago. The snippet below shows an example configuration for executing
remotely on Midway. The configuration assumes the user is running on a login node,
uses the SlurmProvider to interface with the scheduler, and uses the SrunLauncher to
launch workers.
engine:
  type: GlobusComputeEngine
  label: Midway@RCC.UChicago
  max_workers_per_node: 2
  address:
    type: address_by_interface
    ifname: bond0
  provider:
    type: SlurmProvider
    launcher:
      type: SrunLauncher
    # e.g., pi-compute
    account: {{ ACCOUNT }}
    # e.g., caslake
    partition: {{ PARTITION }}
    # string to prepend to #SBATCH blocks in the submit
    # script to the scheduler
    # e.g., "#SBATCH --gres=gpu:4"
    scheduler_options: {{ OPTIONS }}
    # Command to be run before starting a worker
    # e.g., "module load Anaconda; source activate compute-env"
    worker_init: {{ COMMAND }}
    # Scale between 0-1 blocks with 2 nodes per block
    nodes_per_block: 2
    init_blocks: 0
    min_blocks: 0
    max_blocks: 1
    # Hold blocks for 30 minutes
    walltime: 00:30:00
The following configuration example uses an Apptainer (formerly Singularity) container on Midway.
engine:
  type: GlobusComputeEngine
  label: Midway@RCC.UChicago
  max_workers_per_node: 10
  address:
    type: address_by_interface
    ifname: bond0
  container_type: apptainer
  container_cmd_options: -H /home/$USER
  container_uri: {{ CONTAINER_URI }}
  provider:
    type: SlurmProvider
    launcher:
      type: SrunLauncher
    # e.g., pi-compute
    account: {{ ACCOUNT }}
    # e.g., caslake
    partition: {{ PARTITION }}
    # string to prepend to #SBATCH blocks in the submit
    # script to the scheduler
    # e.g., "#SBATCH --gres=gpu:4"
    scheduler_options: {{ OPTIONS }}
    # Command to be run before starting a worker
    # e.g., "module load Anaconda; source activate compute-env"
    worker_init: {{ COMMAND }}
    # Scale between 0-1 blocks with 2 nodes per block
    nodes_per_block: 2
    init_blocks: 0
    min_blocks: 0
    max_blocks: 1
    # Hold blocks for 30 minutes
    walltime: 00:30:00
Kubernetes Clusters¶

Kubernetes is an open-source system for container management, such as automating
deployment and scaling of containers. The snippet below shows an example configuration
for deploying pods as workers on a Kubernetes cluster. The KubernetesProvider uses the
Python Kubernetes API, which assumes that a kube config file is available at
~/.kube/config.
heartbeat_period: 15
heartbeat_threshold: 200
engine:
  type: GlobusComputeEngine
  max_workers_per_node: 1
  # Encryption is not currently supported for KubernetesProvider
  encrypted: false
  address:
    type: address_by_route
  provider:
    type: KubernetesProvider
    init_blocks: 0
    min_blocks: 0
    max_blocks: 2
    init_cpu: 1
    max_cpu: 4
    init_mem: 1024Mi
    max_mem: 4096Mi
    # e.g., default
    namespace: {{ NAMESPACE }}
    # e.g., python:3.12-bookworm
    image: {{ IMAGE }}
    # The secret key to download the image
    secret: {{ SECRET }}
    # e.g., "pip install --force-reinstall globus-compute-endpoint"
    worker_init: {{ COMMAND }}
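Before starting the endpoint, it can be useful to confirm that the Python Kubernetes client can load the kube config and reach the cluster. The snippet below is an illustrative check, not part of the configuration; the "default" namespace is an example.
# Sanity check (illustrative): load ~/.kube/config and list pods in a namespace.
from kubernetes import client, config

config.load_kube_config()  # reads ~/.kube/config by default
pods = client.CoreV1Api().list_namespaced_pod(namespace="default")
print(f"Cluster reachable; {len(pods.items)} pod(s) in 'default'")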
Polaris (ALCF)¶

The following snippet shows an example configuration for executing on Argonne Leadership
Computing Facility’s Polaris cluster. This example uses the GlobusComputeEngine and
connects to Polaris’s PBS scheduler using the PBSProProvider. This configuration assumes
that the script is being executed on the login node of Polaris.
display_name: Polaris@ALCF
engine:
  type: GlobusComputeEngine
  max_workers_per_node: 4
  # Un-comment to give each worker exclusive access to a single GPU
  # available_accelerators: 4
  address:
    type: address_by_interface
    ifname: hsn0
  provider:
    type: PBSProProvider
    launcher:
      type: MpiExecLauncher
      # Ensures 1 manager per node, work on all 64 cores
      bind_cmd: --cpu-bind
      overrides: --depth=64 --ppn 1
    account: {{ YOUR_POLARIS_ACCOUNT }}
    queue: debug-scaling
    cpus_per_node: 32
    select_options: ngpus=4
    # e.g., "#PBS -l filesystems=home:grand:eagle\n#PBS -k doe"
    scheduler_options: "#PBS -l filesystems=home:grand:eagle"
    # Node setup: activate necessary conda environment and such
    worker_init: {{ COMMAND }}
    walltime: 01:00:00
    nodes_per_block: 1
    init_blocks: 0
    min_blocks: 0
    max_blocks: 2
Perlmutter (NERSC)¶

The following snippet shows an example configuration for accessing NERSC’s Perlmutter
supercomputer. This example uses the GlobusComputeEngine and connects to Perlmutter’s
Slurm scheduler. It is configured to request 2 nodes with 1 TaskBlock per node.
Finally, it includes override information to request a particular node type (GPU) and
to configure a specific Python environment on the worker nodes using Anaconda.
display_name: Perlmutter@NERSC
engine:
  type: GlobusComputeEngine
  worker_debug: False
  address:
    type: address_by_interface
    ifname: hsn0
  provider:
    type: SlurmProvider
    partition: debug
    # We request all hyperthreads on a node.
    # GPU nodes have 128 threads, CPU nodes have 256 threads
    launcher:
      type: SrunLauncher
      overrides: -c 128
    # string to prepend to #SBATCH blocks in the submit
    # script to the scheduler
    # For GPUs in the debug qos, e.g.: "#SBATCH --constraint=gpu\n#SBATCH --gpus-per-node=4"
    scheduler_options: {{ OPTIONS }}
    # Your NERSC account, e.g., "m0000"
    account: {{ NERSC_ACCOUNT }}
    # Command to be run before starting a worker
    # e.g., "module load Anaconda; source activate parsl_env"
    worker_init: {{ COMMAND }}
    # increase the command timeouts
    cmd_timeout: 120
    # Scale between 0-1 blocks with 2 nodes per block
    nodes_per_block: 2
    init_blocks: 0
    min_blocks: 0
    max_blocks: 1
    # Hold blocks for 10 minutes
    walltime: 00:10:00
Frontera (TACC)¶

The following snippet shows an example configuration for accessing the Frontera system
at TACC. The configuration below assumes that the user is running on a login node, uses
the SlurmProvider to interface with the scheduler, and uses the SrunLauncher to launch
workers.
display_name: Frontera@TACC
engine:
  type: GlobusComputeEngine
  max_workers_per_node: 2
  worker_debug: False
  address:
    type: address_by_interface
    ifname: ib0
  provider:
    type: SlurmProvider
    # e.g., EAR22001
    account: {{ YOUR_FRONTERA_ACCOUNT }}
    # e.g., development
    partition: {{ PARTITION }}
    launcher:
      type: SrunLauncher
    # Enter scheduler_options if needed
    scheduler_options: {{ OPTIONS }}
    # Command to be run before starting a worker
    # e.g., "module load Anaconda; source activate parsl_env"
    worker_init: {{ COMMAND }}
    # Add extra time for slow scheduler responses
    cmd_timeout: 60
    # Scale between 0-1 blocks with 2 nodes per block
    nodes_per_block: 2
    init_blocks: 0
    min_blocks: 0
    max_blocks: 1
    # Hold blocks for 30 minutes
    walltime: 00:30:00
Bebop (LCRC, ANL)¶

The following snippet shows an example configuration for accessing the Bebop system at
Argonne’s LCRC. The configuration below assumes that the user is running on a login
node, uses the SlurmProvider to interface with the scheduler, and uses the SrunLauncher
to launch workers.
display_name: Bebop@ANL
engine:
  type: GlobusComputeEngine
  max_workers_per_node: 2
  worker_debug: False
  address:
    type: address_by_interface
    ifname: ib0
  provider:
    type: SlurmProvider
    partition: {{ PARTITION }}  # e.g., bdws
    launcher:
      type: SrunLauncher
    # Command to be run before starting a worker
    # e.g., "module load anaconda; source activate gce_env"
    worker_init: {{ COMMAND }}
    nodes_per_block: 1
    init_blocks: 0
    min_blocks: 0
    max_blocks: 1
    walltime: 00:30:00
Bridges-2 (PSC)¶

The following snippet shows an example configuration for accessing the Bridges-2 system
at PSC. The configuration below assumes that the user is running on a login node, uses
the SlurmProvider to interface with the scheduler, and uses the SrunLauncher to launch
workers.
engine:
  type: GlobusComputeEngine
  max_workers_per_node: 2
  worker_debug: False
  address:
    type: address_by_interface
    ifname: ens3f0
  provider:
    type: SlurmProvider
    partition: RM-small
    launcher:
      type: SrunLauncher
    # string to prepend to #SBATCH blocks in the submit
    # script to the scheduler
    # e.g., "#SBATCH --constraint=knl,quad,cache"
    scheduler_options: {{ OPTIONS }}
    # Command to be run before starting a worker
    # e.g., module load Anaconda; source activate parsl_env
    worker_init: {{ COMMAND }}
    # Scale between 0-1 blocks with 2 nodes per block
    nodes_per_block: 2
    init_blocks: 0
    min_blocks: 0
    max_blocks: 1
    # Hold blocks for 30 minutes
    walltime: 00:30:00
FASTER (TAMU)¶
The following snippet shows an example configuration for accessing the FASTER system at
Texas A&M University (TAMU). The configuration below assumes that the user is running
on a login node, uses the SlurmProvider to interface with the scheduler, and uses the
SrunLauncher to launch workers.
amqp_port: 443
display_name: Access Tamu Faster
engine:
  type: GlobusComputeEngine
  worker_debug: False
  strategy:
    type: SimpleStrategy
    max_idletime: 90
  address:
    type: address_by_interface
    ifname: eno8303
  provider:
    type: SlurmProvider
    partition: cpu
    mem_per_node: 128
    launcher:
      type: SrunLauncher
    scheduler_options: {{ OPTIONS }}
    worker_init: {{ COMMAND }}
    # increase the command timeouts
    cmd_timeout: 120
    # Scale between 0-1 blocks with 1 node per block
    nodes_per_block: 1
    init_blocks: 0
    min_blocks: 0
    max_blocks: 1
    # Hold blocks for 10 minutes
    walltime: 00:10:00
Open Science Pool¶
The Open Science Pool is a pool of opportunistic computing resources operated for all
US-associated open science by the OSG consortium. Unlike traditional HPC clusters,
these computational resources are contributed by loosely connected campus and research
clusters. The configuration below uses the CondorProvider to interface with the
scheduler, and uses apptainer to distribute the computational environment to the
workers.
Warning
GlobusComputeEngine relies on a shared filesystem to distribute the keys used to
encrypt communication between the endpoint and workers. Since OSPool does not provide
a writable shared filesystem, encryption is disabled in the configuration below.
display_name: OSPool
engine:
  type: GlobusComputeEngine
  max_workers_per_node: 1
  # This config uses apptainer containerization to ensure a consistent
  # python environment on the worker side. Since apptainer limits writable
  # directory paths, set the working directory paths used by the worker to /tmp
  # P.S.: These filepaths remain private to the container and will not be
  # accessible on the host system
  worker_logdir_root: /tmp/logs
  working_dir: /tmp/tasks_dir
  # GlobusComputeEngine relies on a shared filesystem to distribute keys used
  # for encrypting communication between the endpoint and workers.
  # Since OSPool does not support a writable shared filesystem,
  # encryption is disabled in this configuration.
  encrypted: False
  provider:
    type: CondorProvider
    init_blocks: 1
    max_blocks: 1
    min_blocks: 0
    # Specify ProjectName and Apptainer image
    scheduler_options: >
      +ProjectName = {{ PROJECT_NAME }}
      # To use apptainer on OSPool, build an apptainer image, copy it to
      # OSDF, and specify the full apptainer image path, e.g.:
      # "osdf:///ospool/ap20/data/USERNAME/globus_compute_py3.11.v1.sif"
      +SingularityImage = {{ APPTAINER_IMAGE_PATH }}
      # Add a condor requirement to guarantee that worker nodes support apptainer
      Requirements = HAS_SINGULARITY == True && OSG_HOST_KERNEL_VERSION >= 31000
Stampede3 (TACC)¶

Stampede3 is a Dell Technologies and Intel based supercomputer at the Texas Advanced
Computing Center (TACC). The following snippet shows an example configuration that uses
the SlurmProvider to interface with the batch scheduler, and uses the SrunLauncher to
launch workers across nodes.
display_name: Stampede3@TACC
engine:
  type: GlobusComputeEngine
  max_workers_per_node: 2
  address:
    type: address_by_interface
    ifname: ibp10s0
  provider:
    type: SlurmProvider
    # e.g., EAR22001
    account: {{ YOUR_TACC_ALLOCATION }}
    # e.g., skx-dev
    partition: {{ PARTITION }}
    launcher:
      type: SrunLauncher
    # Enter scheduler_options if needed
    scheduler_options: {{ OPTIONS }}
    # Command to be run before starting a worker
    # e.g., "module load Anaconda; source activate parsl_env"
    worker_init: {{ COMMAND }}
    # Add extra time for slow scheduler responses
    cmd_timeout: 60
    # Scale between 0-1 blocks with 1 node per block
    nodes_per_block: 1
    init_blocks: 0
    min_blocks: 0
    max_blocks: 1
    # Hold blocks for 30 minutes
    walltime: 00:30:00
Pinning Workers to devices¶
Many modern clusters provide multiple accelerators per compute node, yet many
applications are best suited to using a single accelerator per task. Globus Compute
supports pinning each worker to a different accelerator using the
available_accelerators option of the GlobusComputeEngine. Provide either the number of
accelerators (Globus Compute will assume they are named with integers starting from
zero) or a list of the names of the accelerators available on the node. Each Globus
Compute worker will have the following environment variables set to its assigned
worker-specific identity: CUDA_VISIBLE_DEVICES, ROCR_VISIBLE_DEVICES, and
SYCL_DEVICE_FILTER.
engine:
  type: GlobusComputeEngine
  max_workers_per_node: 4
  # `available_accelerators` may be a natural number or a list of strings.
  # If an integer, then each worker launched will have an automatically
  # generated environment variable. In this case, one of 0, 1, 2, or 3.
  # Alternatively, specific strings may be utilized.
  available_accelerators: 4
  # available_accelerators: ["opencl:gpu:1", "opencl:gpu:2"]  # alternative
  provider:
    type: LocalProvider
    init_blocks: 1
    min_blocks: 0
    max_blocks: 1
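As a rough sketch of what pinning looks like from the task's point of view (not part of the configuration above), a function can simply read the environment variables the worker sets; the endpoint UUID below is a placeholder and assumes the integer form of available_accelerators.
# Sketch: each submitted invocation reports the accelerator its worker is pinned to.
from globus_compute_sdk import Executor

def which_accelerator():
    import os
    # The worker sets CUDA_VISIBLE_DEVICES (and ROCR_VISIBLE_DEVICES,
    # SYCL_DEVICE_FILTER) to its assigned device, e.g. "0", "1", ...
    return os.environ.get("CUDA_VISIBLE_DEVICES")

with Executor(endpoint_id="YOUR-ENDPOINT-UUID") as ex:
    futures = [ex.submit(which_accelerator) for _ in range(8)]
    print(sorted({f.result() for f in futures}))  # at most 4 distinct devices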