Configuration Reference¶

Globus Compute endpoints require two configuration files:

config.yaml for the manager endpoint process
user_config_template.yaml.j2 for user endpoint processes

These two YAML files serve as convenience interfaces to the Python configuration classes used internally by Compute endpoints. Anything specified here is conveyed during the endpoint startup to those backing classes; consequently, for a complete list of options, please see the internal class documentation at the bottom of this page. Meanwhile, the YAML interface is generally much easier to understand, easier to diff, and, with less required boilerplate code, easier to maintain.

User Endpoint Configuration¶

The user_config_template.yaml.j2 file is a Jinja template used to generate YAML configurations for user endpoint processes that execute tasks. Under the hood, all configuration options are used to create an instance of the UserEndpointConfig class.

For information on template capabilities and peculiarities, see Working with Templates.

Idle Timeout¶

User endpoint processes automatically shut down after a configurable idle timeout to conserve resources:

idle_heartbeats_soft: if there are no outstanding tasks still processing, and the user endpoint process has been idle for this many heartbeats, shut it down
idle_heartbeats_hard: if the user endpoint process is apparently idle (e.g., there are outstanding tasks, but they have not moved) for this many heartbeats, then shut down anyway

By default, a heartbeat occurs every 30s. If idle_heartbeats_hard is set to 7, and no tasks or results move (i.e., tasks received from the web service or results received from workers), then the user endpoint process will shut down after 3m30s (7 × 30s).
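
The arithmetic above can be sketched in a few lines of Python. This is an illustration only, not part of the endpoint code; the helper names `idle_shutdown_delay` and `format_delay` are hypothetical, and the 30s default mirrors the heartbeat period stated above.

```python
def idle_shutdown_delay(idle_heartbeats: int, heartbeat_period_s: float = 30.0) -> float:
    """Seconds of inactivity that correspond to a given number of heartbeats."""
    if idle_heartbeats <= 0:
        raise ValueError("idle_heartbeats must be positive for a timeout to apply")
    return idle_heartbeats * heartbeat_period_s


def format_delay(seconds: float) -> str:
    """Render a duration in seconds as, e.g., '3m30s'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m{secs:02d}s"


# idle_heartbeats_hard = 7 with the default 30s heartbeat period
print(format_delay(idle_shutdown_delay(7)))  # 3m30s
```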

Engine¶

The only required configuration item is engine, with three available types: ThreadPoolEngine, ProcessPoolEngine, and GlobusComputeEngine. The first two are Compute endpoint wrappers of Python’s concurrent.futures.ThreadPoolExecutor and concurrent.futures.ProcessPoolExecutor, respectively. These engines are appropriate for single‑host installations (e.g., a personal workstation). For scheduler‑based clusters, GlobusComputeEngine, as a wrapper over Parsl’s HighThroughputExecutor, enables access to multiple computation nodes. The default configuration specifies GlobusComputeEngine.

The simplest configuration would use the ThreadPoolEngine:

~/.globus_compute/simple_threadpool/user_config_template.yaml.j2¶

engine:
  type: ThreadPoolEngine

Per Python’s default of max_workers=None, this configuration will create min(32, cores + 4) threads, where cores is the host’s processor count (the ThreadPoolExecutor default since Python 3.8). Any argument to fine-tune the underlying executor’s behavior must be placed inside the engine stanza. For example, to limit the pool to 3 threads:

~/.globus_compute/three_threads/user_config_template.yaml.j2¶

engine:
  type: ThreadPoolEngine
  max_workers: 3
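
For reference, the following sketch shows the same setting on the underlying standard-library executor. It is plain Python, not Globus Compute code. The helper `default_thread_count` is a hypothetical name that restates the default documented for Python 3.8+ (processor count plus 4, capped at 32).

```python
import os
from concurrent.futures import ThreadPoolExecutor


def default_thread_count(cpu_count: int) -> int:
    """Documented ThreadPoolExecutor default for Python 3.8+ when max_workers=None."""
    return min(32, cpu_count + 4)


# An explicit max_workers, as in the YAML above, overrides that default.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda x: x * x, range(5)))

print(results)
print(default_thread_count(os.cpu_count() or 1))
```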

Similarly, if using the ProcessPoolEngine, one might implement a policy of workers only running 100 tasks before workers are respawned:

~/.globus_compute/four_workers_100_tasks/user_config_template.yaml.j2¶

engine:
  type: ProcessPoolEngine
  max_tasks_per_child: 100
  max_workers: 4

Given the above two endpoint configurations, the ps utility on the host console can verify the setup. The process pool has 4 worker processes (per the max_workers configuration), while the thread pool endpoint has the same concept in threads:

ps of different engine configurations on the host; output edited for clarity¶

$ ps w --forest | grep "Globus Compute Endpoint"
25713 ... \_ Globus Compute Endpoint (..., four_workers_100_tasks)
25726 ...     \_ Globus Compute Endpoint (..., four_workers_100_tasks)
25727 ...     \_ Globus Compute Endpoint (..., four_workers_100_tasks)
25728 ...     \_ Globus Compute Endpoint (..., four_workers_100_tasks)
25729 ...     \_ Globus Compute Endpoint (..., four_workers_100_tasks)
26339 ... \_ Globus Compute Endpoint (..., three_threads)

$ ps wm --forest | grep -A2 three_threads
26339 ... Globus Compute Endpoint (..., three_threads)
    - ... -
    - ... -

Per usual Python semantics, the ThreadPoolEngine (ThreadPoolExecutor under the hood) is typically best for I/O oriented workflows, while the ProcessPoolEngine (ProcessPoolExecutor under the hood) will be a better fit for CPU-intensive tasks. But both engines will only run tasks on the endpoint host machine. If the endpoint is strictly limited to a single host (e.g., a home desktop, an idle workstation), then these engines may be the simplest option.

For running in a multi-node setup (e.g., clusters, with scheduling software like PBS or Slurm), the GlobusComputeEngine enables much more concurrency. This engine has more options and is correspondingly more complicated to configure. A rough equivalent to the ProcessPoolEngine example would be:

~/.globus_compute/my_first_cluster_setup/user_config_template.yaml.j2¶

engine:
  type: GlobusComputeEngine
  provider:
    type: LocalProvider
    max_blocks: 4

Retries¶

Functions submitted to the GlobusComputeEngine can fail due to infrastructure failures. For example, the worker executing the task might terminate due to it running out of memory, or all workers under a batch job could fail due to the batch job exiting as it reaches the walltime limit. GlobusComputeEngine can be configured to automatically retry these tasks by setting max_retries_on_system_failure=N, where N is the number of retries allowed. The default config sets retries to 0 since functions can be computationally expensive, not idempotent, or leave side effects that affect subsequent retries.

Example config snippet:

user_config_template.yaml.j2¶

engine:
    type: GlobusComputeEngine
    max_retries_on_system_failure: 2  # Default=0
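
To make the retry count concrete: N retries allow up to N + 1 attempts in total. The sketch below illustrates that semantics in plain Python. It is not the engine's implementation; `SystemFailure`, `run_with_retries`, and `make_flaky_task` are hypothetical names introduced for this example, and only the simulated infrastructure failure is retried.

```python
class SystemFailure(Exception):
    """Hypothetical stand-in for an infrastructure failure (e.g., a lost worker)."""


def run_with_retries(task, max_retries_on_system_failure: int = 0):
    """Run task(); retry only on SystemFailure, up to the given number of retries."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except SystemFailure:
            if attempts > max_retries_on_system_failure:
                raise  # retries exhausted; surface the failure


def make_flaky_task(failures: int):
    """Build a task that fails `failures` times and then succeeds."""
    state = {"remaining": failures}

    def task():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise SystemFailure("worker lost")
        return "ok"

    return task


result, attempts = run_with_retries(make_flaky_task(2), max_retries_on_system_failure=2)
print(result, attempts)  # ok 3
```

With the default of 0 retries, the first failure in this sketch is raised immediately.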

Auto-Scaling¶

GlobusComputeEngine by default automatically scales workers in response to workload.

Strategy configuration is limited to two options:

max_idletime: Maximum duration in seconds that workers are allowed to idle before they are marked for termination
strategy_period: Number of seconds between the strategy’s attempts at auto-scaling events

The bounds for scaling are determined by the options to the Provider (init_blocks, min_blocks, max_blocks). Please refer to the Parsl docs for more info.

Here’s an example configuration:

user_config_template.yaml.j2¶

engine:
    type: GlobusComputeEngine
    job_status_kwargs:
        max_idletime: 60.0      # Default = 120s
        strategy_period: 120.0  # Default = 5s
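
As a rough mental model only, the interaction between the provider bounds and max_idletime can be sketched as below. This is a simplified illustration and not Parsl's actual scaling algorithm; `target_blocks`, `idle_too_long`, and the `workers_per_block` parameter are assumptions made for the example.

```python
import math


def target_blocks(pending_tasks: int, workers_per_block: int,
                  min_blocks: int, max_blocks: int) -> int:
    """Blocks wanted for the pending work, clamped to the provider's bounds."""
    if workers_per_block <= 0:
        raise ValueError("workers_per_block must be positive")
    wanted = math.ceil(pending_tasks / workers_per_block)
    return max(min_blocks, min(max_blocks, wanted))


def idle_too_long(idle_seconds: float, max_idletime: float = 120.0) -> bool:
    """True once idle workers have exceeded the allowed idle duration."""
    return idle_seconds >= max_idletime


print(target_blocks(10, 2, min_blocks=0, max_blocks=4))  # 4
print(idle_too_long(90.0, max_idletime=60.0))            # True
```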

Provider¶

Whereas the ThreadPoolEngine and ProcessPoolEngine wrappers have an implicit approach to managing the compute resources (the process model), the GlobusComputeEngine requires explicit knowledge of the local topology. The provider stanza chooses the Parsl mechanism by which to communicate with the local site. If the workers will be on the same host as the endpoint, LocalProvider is appropriate. But if workers will be on cluster nodes, those resources will be accessed via the site-specific scheduler, and communication between the endpoint and the workers will occur via a site-specific network interface.

Parsl implements a number of providers, so we will not describe them here. Instead, we present a single example, and then suggest reading through the rest of the endpoint configuration examples and Parsl’s documentation while keeping your project needs in mind.

The University of Chicago’s Midway cluster uses Slurm as the batch scheduler, so the following example configuration chooses Parsl’s SlurmProvider. Slurm requires an allocation to debit for each job submission, specified for SlurmProvider via account. On each acquired cluster node, worker_init is a set of shell-script lines that run prior to starting the worker; this example loads the site-specific module named Anaconda and then activates the compute-env environment. When sending jobs to the batch scheduler, this configuration requests only a single node at a time (nodes_per_block) from the partition of nodes named caslake (partition), keeps no more than 1 active or pending job (max_blocks), and has the scheduler limit each job to no more than 5 minutes (walltime).

For communication between the endpoint and the worker nodes, tell the endpoint to open up communication ports on the internal interface, named bond0.

Example user_config_template.yaml.j2 of an endpoint on UChicago RCC’s Midway¶

engine:
    type: GlobusComputeEngine
    max_workers_per_node: 2
    provider:
        type: SlurmProvider
        account: {{ account }}
        partition: caslake
        worker_init: "module load Anaconda; source activate compute-env"
        nodes_per_block: 1
        max_blocks: 1
        walltime: 00:05:00
    address:
        type: address_by_interface
        ifname: bond0

Again, this is only a basic example. For more inspiration, please consult the list of examples and peruse Parsl’s documentation on both the HighThroughputExecutor and the available providers.

Note

How does one determine the appropriate interface (ifname) to use for each system?

There is no one answer to this. One route is to simply ask a more knowledgeable person (e.g., a colleague or system administrator). Another route might be a combination of educated guesses and trial and error.

To see what interfaces the host machine has configured, one can use the ip utility to look for the UP interfaces. From a hypothetical machine:

$ ip addr  # (typically 'ip' is in /usr/sbin/)
...
2: enp2s0f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 88:a4:c2:12:a8:6d brd ff:ff:ff:ff:ff:ff
...
4: enp2s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 10:c5:95:49:b0:a1 brd ff:ff:ff:ff:ff:ff
    inet 10.110.23.47/24 brd 10.110.23.255 scope global dynamic noprefixroute enp2s0f1
       valid_lft 2669sec preferred_lft 2669sec
    inet6 fe80::6a05:caff:fee0:320a/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
...
7: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 50:2f:9b:12:3e:bb brd ff:ff:ff:ff:ff:ff
    inet 192.168.167.219/24 brd 192.168.167.255 scope global dynamic noprefixroute wlp3s0
       valid_lft 6007sec preferred_lft 6007sec
    inet6 fe80::1d48:c378:42d3:d031/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
...

(If you like color, you can use ip -c addr to get the DOWN and UP interfaces in red and green output.)

From the above, enp2s0f1 and wlp3s0 would be likely candidates (cf. the state UP in the first line of those records; enp2s0f0 lists UP among its flags but reports state DOWN). The next step is to recognize that most setups attach their cluster nodes to internal networks. Put differently, a ping destined for the outside world but placed on the internal interface should fail. Programmatically, then, one can look for a failing ping invocation as an indication of the inward-facing interface. Using the interface names from the above output, we can pass the -I argument to ping:

$ ping -c 1 -I wlp3s0 google.com 1>/dev/null 2>&1; echo $?
0

Zero. That means “Successfully pinged google.com and got a response.” But since we are looking for ping to fail, that is likely not the correct interface for the endpoint to utilize.

$ ping -c 1 -I enp2s0f1 google.com 1>/dev/null 2>&1; echo $?
1

Nonzero – “Failed to communicate with google.com”. Therefore, that is likely an internal interface, and a good bet for the interface that is plumbed up to talk with the cluster’s internal nodes.

At which point, try it “and see.”
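
The same trial-and-error probe could be scripted. The sketch below is a minimal Python version of the ping test, assuming a Linux-style ping that accepts -c and -I. The function names are hypothetical, and a failing ping is only a hint that an interface is internal, not proof.

```python
import socket
import subprocess
from typing import List


def list_interface_names() -> List[str]:
    """Names of the network interfaces known to the operating system."""
    return [name for _index, name in socket.if_nameindex()]


def ping_command(ifname: str, host: str = "google.com") -> List[str]:
    """The ping invocation used above: one packet, sent via the given interface."""
    return ["ping", "-c", "1", "-I", ifname, host]


def looks_internal(ping_exit_code: int) -> bool:
    """A nonzero exit code suggests the interface cannot reach the outside world."""
    return ping_exit_code != 0


def probe(ifname: str, host: str = "google.com", timeout_s: float = 5.0) -> bool:
    """Return True if ifname is a likely internal interface."""
    try:
        completed = subprocess.run(
            ping_command(ifname, host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return True  # no answer at all also counts as a failed ping
    return looks_internal(completed.returncode)


if __name__ == "__main__":
    for name in list_interface_names():
        print(name, "internal candidate" if probe(name) else "reaches outside")
```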

Manager Endpoint Configuration¶

The config.yaml file contains the YAML configuration for the manager endpoint process, which manages user endpoint processes. Under the hood, all configuration options in this file are used to create an instance of the ManagerEndpointConfig class.

public

A boolean value, dictating whether other users can discover this endpoint in the Globus Compute web API and Globus Web UI. It defaults to false.

Warning

This field does not indicate access/usage of the endpoint. It determines only whether this endpoint is easily discoverable via the Globus web portal — access is controlled via the admins field (management of endpoint) and the identity mapping configuration (Compute usage), both described below.
config.yaml – example public multi-user endpoint¶
```
public: true
```
identity_mapping_config_path

A path to an identity mapping configuration, per the Globus Connect Server Identity Mapping Guide. The configuration file must be a JSON-list of identity mapping configurations. The multi-user endpoint documentation discusses the content of this file in detail.
Example config.yaml with an identity mapping path¶
```
identity_mapping_config_path: /path/to/idmap_config.json
```
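
Because the file must be a JSON list, a quick sanity check before starting the endpoint can catch simple mistakes. The following is a minimal sketch, not the endpoint's own validation. The function name is hypothetical, and the example entry (its DATA_TYPE and mapping fields) is an assumption modeled on the Globus Connect Server guide that should be checked against that guide.

```python
import json


def load_identity_mappings(text: str) -> list:
    """Parse identity mapping JSON and confirm it is a list of objects."""
    parsed = json.loads(text)
    if not isinstance(parsed, list):
        raise ValueError("identity mapping configuration must be a JSON list")
    for i, entry in enumerate(parsed):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not a JSON object")
    return parsed


# Assumed example content; verify field names against the Identity Mapping Guide.
EXAMPLE = r"""
[
  {
    "DATA_TYPE": "expression_identity_mapping#1.0.0",
    "mappings": [
      {"source": "{username}", "match": "(.*)@example\\.org", "output": "{0}"}
    ]
  }
]
"""

mappings = load_identity_mappings(EXAMPLE)
print(len(mappings))  # 1
```
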
user_config_template_path

The path to the user endpoint configuration Jinja2 template YAML file. If not specified, the default template path will be used: ~/.globus_compute/my-ep/user_config_template.yaml.j2.

See user_config_template.yaml.j2 for more information.
Example config.yaml with a custom user config template path¶
```
user_config_template_path: /path/to/my_template.yaml.j2
```
user_config_schema_path

The path to the user endpoint configuration JSON schema file. If not specified, the default schema path will be used: ~/.globus_compute/my-mep/user_config_schema.json.

See user_config_schema.json for more information.
Example config.yaml with a custom user config schema path¶
```
user_config_schema_path: /path/to/my_schema.json
```

admins

A list of Globus Auth identity IDs that have administrative access to the endpoint, in addition to the owner.

Important

This field requires an active Globus subscription (i.e., subscription_id).

config.yaml – specifying endpoint administrators¶

subscription_id: 600ba9ac-ef16-4387-30ad-60c6cc3a6853
admins:
  # Peter Gibbons (software engineer)
  - 10afcf74-b041-4439-8e0d-eab371767440
  # Samir Nagheenanajar (sysadmin, HPC services)
  - a6a7b9ee-be04-4e45-9832-d3737c2fafa2

display_name

The name under which the endpoint appears in the Web UI. If not specified, the endpoint will show up as the given local name (in other words, the same name as used to create the endpoint with the configure subcommand, and as used for the directory name inside of ~/.globus_compute/). This field is free-form (accepting space characters, for example).
config.yaml – naming a public endpoint¶
```
display_name: Debug queue, 10m max job time (RCC, Midway, UChicago)
public: true
```
allowed_functions

This field specifies an allow-list of functions that may be run by a user endpoint process. As this list is available at endpoint registration time, not only do the user endpoint processes verify that each task requests a valid function, but the web-service enforces the allowed functions list at task submission as well. For more information, see Function Allow Listing.
config.yaml – only allow certain functions¶
```
allowed_functions:
  - 00911703-e76b-4d0b-7b98-6f2e25ab9943
  - e552e7f2-c007-4671-6ca4-3a4fd84f3805
```
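
Conceptually, the check amounts to a membership test on function identifiers. The sketch below illustrates that idea only and is not the code used by the endpoint or the web service. The name `is_function_allowed` is hypothetical, and treating an unset list as "no restriction" is an assumption.

```python
import uuid
from typing import Iterable, Optional


def is_function_allowed(function_id: str,
                        allowed_functions: Optional[Iterable[str]]) -> bool:
    """True if function_id is in the allow-list (or no allow-list is set)."""
    if allowed_functions is None:
        return True  # assumption: no list configured means no restriction
    wanted = uuid.UUID(function_id)  # normalizes case and formatting
    return any(uuid.UUID(allowed) == wanted for allowed in allowed_functions)


ALLOWED = [
    "00911703-e76b-4d0b-7b98-6f2e25ab9943",
    "e552e7f2-c007-4671-6ca4-3a4fd84f3805",
]

print(is_function_allowed("00911703-E76B-4D0B-7B98-6F2E25AB9943", ALLOWED))  # True
print(is_function_allowed("10afcf74-b041-4439-8e0d-eab371767440", ALLOWED))  # False
```
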
authentication_policy

Use a Globus Authentication Policy to restrict who can use a multi-user endpoint at the web service. See Authentication Policies for more information.
config.yaml – allowing only valid identities¶
```
authentication_policy: 498c7327-9c6a-4847-c954-1eafa923da8e
subscription_id: 600ba9ac-ef16-4387-30ad-60c6cc3a6853
```
pam

Use Pluggable Authentication Modules (PAM) for site-specific authorization requirements. The value is a structure with enable and service_name options; PAM is disabled by default, and service_name defaults to globus-compute-endpoint. See Multi-User § PAM for more information.
config.yaml – enabling PAM¶
```
pam:
  enable: true
```

Python Class Documentation¶

The YAML configurations discussed above are facades over the following Python classes. Though the vast majority of users will only use the YAML configurations, we present the following class documentation to show all of the available options.

class globus_compute_endpoint.endpoint.config.config.UserEndpointConfig(*, engine: GlobusComputeEngineBase | None = None, heartbeat_threshold: int = 120, idle_heartbeats_soft: int = 0, idle_heartbeats_hard: int = 5760, endpoint_setup: str | None = None, endpoint_teardown: str | None = None, log_dir: str | None = None, stdout: str = './endpoint.log', stderr: str = './endpoint.log', **kwargs)¶

Bases: BaseConfig

Holds the configuration items for a task-processing endpoint.

Typically, one does not instantiate this configuration directly, but specifies the relevant options in the endpoint’s config.yaml file. For example, to specify an endpoint that runs a single block of workers on the endpoint host only, the configuration might look like:

config.yaml¶

display_name: My single-block, host-only EP at site ABC
engine:
  type: GlobusComputeEngine
  provider:
    type: LocalProvider
    max_blocks: 1

Please see the BaseConfig class for a list of options that both ManagerEndpointConfig and UserEndpointConfig classes share.

Parameters:

engine – The GlobusComputeEngine for this endpoint to execute functions. The currently known engines are GlobusComputeEngine, ProcessPoolEngine, and ThreadPoolEngine. See User Endpoint Configuration for more information.
heartbeat_threshold – Seconds since the last heartbeat message from the Globus Compute web service after which the connection is assumed to be disconnected.
idle_heartbeats_soft – Number of heartbeats after an endpoint is idle (no outstanding tasks or results, and at least 1 task or result has been forwarded) before the endpoint shuts down. If 0, then the endpoint must be manually triggered to shut down (e.g., SIGINT [Ctrl+C] or SIGTERM).
idle_heartbeats_hard – Number of heartbeats after no task or result has moved before the endpoint shuts down. Unlike idle_heartbeats_soft, this idle timer does not require that there are no outstanding tasks or results. If no task or result has moved in this many heartbeats, then the endpoint will shut down. In particular, this is intended to catch the error condition that a worker has gone missing and will thus never return the task it was sent. Note that this setting is only enabled if idle_heartbeats_soft is a value greater than 0. Suggested value: the number of heartbeats equivalent to two days. For example, if heartbeat_period is 30s, then suggest 5760 (5760 × 30s = 48h).
endpoint_setup – Command(s) to be run during the endpoint initialization process
endpoint_teardown – Command(s) to be run during the endpoint shutdown process
log_dir – path to the top-level directory where logs should be written
stdout – Path where the endpoint’s stdout should be written
stderr – Path where the endpoint’s stderr should be written
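
The interplay of the two idle settings described above can be summarized as a small decision function. This is a sketch of the documented behavior, not the endpoint's implementation; the function and its bookkeeping arguments (`outstanding`, `forwarded_any`, `beats_idle`, `beats_without_movement`) are hypothetical.

```python
def should_shut_down(
    *,
    outstanding: int,
    forwarded_any: bool,
    beats_idle: int,
    beats_without_movement: int,
    idle_heartbeats_soft: int = 0,
    idle_heartbeats_hard: int = 5760,
) -> bool:
    """Decide whether the idle timers call for a shutdown."""
    if idle_heartbeats_soft <= 0:
        # Idle shutdown is off; the hard timer only applies when soft > 0.
        return False
    soft_hit = (
        outstanding == 0
        and forwarded_any
        and beats_idle >= idle_heartbeats_soft
    )
    # The hard timer ignores outstanding work: nothing has moved for too long.
    hard_hit = beats_without_movement >= idle_heartbeats_hard
    return soft_hit or hard_hit


print(should_shut_down(outstanding=2, forwarded_any=True, beats_idle=0,
                       beats_without_movement=7,
                       idle_heartbeats_soft=5, idle_heartbeats_hard=7))  # True
```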

class ManagerEndpointConfig¶

Bases: BaseConfig

Holds the configuration items for an endpoint manager.

Typically, one does not instantiate this configuration directly, but specifies the relevant options in the endpoint’s config.yaml file. In fact, Compute internally determines which *EndpointConfig class to instantiate by whether there is an engine block in the config.yaml file:

config.yaml for Managers¶

# this file left empty; with no engine block, Compute interprets this
# as a manager endpoint

config.yaml for task-processing endpoints (i.e., non-Managers)¶

engine:
  ...

# with an engine block, Compute will instantiate a task-processing endpoint
# (i.e., a user-endpoint process, or UEP)
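
The selection rule can be pictured as a single key lookup on the parsed YAML. The sketch below works on an already-parsed mapping (an empty file parses to None) and is an illustration only; `endpoint_kind` is a hypothetical name, not a Compute function.

```python
from typing import Optional


def endpoint_kind(parsed_config: Optional[dict]) -> str:
    """Classify a parsed config.yaml by the presence of an engine block."""
    if parsed_config and "engine" in parsed_config:
        return "user endpoint"
    return "manager endpoint"


print(endpoint_kind(None))                                        # manager endpoint
print(endpoint_kind({"engine": {"type": "GlobusComputeEngine"}}))  # user endpoint
```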

Note that for manager endpoints that will not be run with privileges, identity mapping is disabled (hence not specified above). Conversely, if the process will have elevated privileges (e.g., run by root user or has setuid(2) privileges) then identity mapping is required:

config.yaml (for a root-owned process)¶

display_name: Debug queue, 1-block max
identity_mapping_config_path: /path/to/this/idmap_conf.json

Please see the BaseConfig class for a list of options that both ManagerEndpointConfig and UserEndpointConfig classes share.

Parameters:

public –
Whether all users can discover this endpoint via the Globus Compute web user interface and API.

Warning

Do not use this flag as a means of security. It controls visibility in the web user interface. It does not control access to the endpoint.
user_config_template_path – Path to the user configuration template file for this endpoint. If not specified, the default template path will be used.
user_config_schema_path – Path to the user configuration schema file for this endpoint. If not specified, the default schema path will be used.
identity_mapping_config_path – Path to the identity mapping configuration for this endpoint. If the process is not privileged, a warning will be emitted to the logs and this item will be ignored; conversely, if privileged, this configuration item is required, and a ValueError will be raised if the path does not exist.
audit_log_path – Path to the audit log. If specified, and the endpoint is marked as High-Assurance (HA), then the MEP will write auditing records here. An auditing record is a single line of text, received from child endpoints at “interesting” points in a task lifetime. For example, when the UEP interchange first receives a task, it will emit an auditing record of RECEIVED for that task. Similarly, the UEP will emit EXEC_START when the executor registers the task, RUNNING if the executor shares that the task is running, and EXEC_END when the task is complete. If the path is created, it will be created with user-secure permission (umask=0o077), but it will not be checked for permission conformance thereafter. It is up to the administrator to ensure this path is generally secured.
pam – Whether to enable authorization of user-endpoints via PAM routines, and optionally specify the PAM service name. See PamConfiguration. If not specified, PAM authorization defaults to disabled.
mu_child_ep_grace_period_s – The web services send a start-user-endpoint request to the endpoint manager ahead of tasks for the target user endpoint. If the user endpoint is already running, these requests are ignored. To account for the inherent race condition of receiving a start request just before the user endpoint shuts down, the endpoint manager will hold on to the most recent start request for the user endpoint for this grace period.

class BaseConfig¶

Parameters:

display_name – The display name for the endpoint. If None, defaults to the endpoint name (i.e., the directory name in ~/.globus_compute/)
allowed_functions – List of identifiers of functions that are allowed to be run on the endpoint
authentication_policy – Endpoint users are evaluated against this Globus authentication policy
subscription_id – Subscription ID associated with this endpoint
amqp_port – Port to use for AMQP connections. Note that only 5671, 5672, and 443 are supported by the Compute web services. If None, the port is assigned by the services (which default to 443).
heartbeat_period – The interval (in seconds) at which heartbeat messages are sent from the endpoint to the Globus Compute web service
environment – Environment the endpoint should connect to. If not specified, the endpoint connects to production. (Listed here for completeness, but only used internally by the dev team.)
local_compute_services – Point the endpoint to a local instance of the Compute services. (Listed here for completeness, but only used internally by the dev team.)
debug – If set, emit debug-level log messages. This is a configuration implementation of the CLI’s --debug flag. Note that if this value is explicitly False, then the CLI flag, if utilized, will still put the EP into “debug mode.” The CLI wins.
admins – A list of Globus Auth identity IDs that have administrative access to the endpoint, in addition to the owner. This field requires an active Globus subscription (i.e., subscription_id).
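
Two of the rules above, the supported AMQP ports and the precedence of the CLI's --debug flag, are simple enough to express directly. The following sketch restates them in Python; the helper names are hypothetical and do not correspond to functions in the Compute code base.

```python
from typing import Optional

# Ports the documentation lists as supported by the Compute web services
SUPPORTED_AMQP_PORTS = (443, 5671, 5672)


def check_amqp_port(port: Optional[int]) -> Optional[int]:
    """Return the port if supported; None means the services assign one."""
    if port is None:
        return None
    if port not in SUPPORTED_AMQP_PORTS:
        raise ValueError(f"unsupported AMQP port: {port}")
    return port


def effective_debug(config_debug: Optional[bool], cli_debug: bool) -> bool:
    """Debug mode is on if either source enables it; the CLI flag always wins."""
    return bool(config_debug) or cli_debug


print(check_amqp_port(5671))          # 5671
print(effective_debug(False, True))   # True
```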

class globus_compute_endpoint.endpoint.config.pam.PamConfiguration(enable: bool = True, service_name: str = 'globus-compute-endpoint')¶

Parameters:

enable – Whether to initiate a PAM session for each UEP start request.
service_name –
The PAM service name with which to initialize the PAM session. If a particular MEP has different requirements, define those PAM requirements in /etc/pam.d/, and specify the service name with this field.

See MEP § PAM for more information