Configuration Reference¶
Globus Compute endpoints require two configuration files:
- config.yaml for the manager endpoint process
- user_config_template.yaml.j2 for user endpoint processes
These two YAML files serve as convenience interfaces to the Python configuration
classes used internally by Compute endpoints. Anything specified here is
conveyed during the endpoint startup to those backing classes; consequently, for
a complete list of options, please see the internal class documentation at the bottom of this page. Meanwhile, the YAML interface
is generally much easier to understand, easier to diff, and, with less
required boilerplate code, easier to maintain.
User Endpoint Configuration¶
The user_config_template.yaml.j2 file is a Jinja template used to generate
YAML configurations for user endpoint processes that execute tasks. Under the
hood, all configuration options are used to create an instance of the
UserEndpointConfig class.
For information on template capabilities and peculiarities, see Working with Templates.
Idle Timeout¶
User endpoint processes automatically shut down after a configurable idle timeout to conserve resources:
- idle_heartbeats_soft: if there are no outstanding tasks still processing, and the user endpoint process has been idle for this many heartbeats, shut it down
- idle_heartbeats_hard: if the user endpoint process is apparently idle (e.g., there are outstanding tasks, but they have not moved) for this many heartbeats, then shut down anyway
By default, a heartbeat occurs every 30s. If idle_heartbeats_hard is set to
7, and no tasks or results move (i.e., tasks received from the web service or
results received from workers), then the user endpoint process will shut down
after 3m30s (7 × 30s).
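The heartbeat arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper names idle_timeout and heartbeats_for are hypothetical and not part of Globus Compute, and the 30-second period is the default heartbeat period stated above.

```python
from datetime import timedelta

# Default heartbeat period, per the text above (seconds).
DEFAULT_HEARTBEAT_PERIOD_S = 30.0


def idle_timeout(idle_heartbeats: int,
                 heartbeat_period_s: float = DEFAULT_HEARTBEAT_PERIOD_S) -> timedelta:
    """Wall-clock time that passes before an idle shutdown is triggered."""
    return timedelta(seconds=idle_heartbeats * heartbeat_period_s)


def heartbeats_for(duration: timedelta,
                   heartbeat_period_s: float = DEFAULT_HEARTBEAT_PERIOD_S) -> int:
    """Number of whole heartbeats that fit in the given duration."""
    return int(duration.total_seconds() // heartbeat_period_s)


print(idle_timeout(7))                    # 0:03:30
print(heartbeats_for(timedelta(days=2)))  # 5760
```

The second call shows the reverse calculation: choosing a heartbeat count for a desired timeout, here two days at the default period.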
Engine¶
The only required configuration item is engine, with three available types:
ThreadPoolEngine, ProcessPoolEngine, and GlobusComputeEngine. The
first two are Compute endpoint wrappers of Python’s concurrent.futures.ThreadPoolExecutor and
concurrent.futures.ProcessPoolExecutor, respectively. These engines are appropriate for
single‑host installations (e.g., a personal workstation). For scheduler‑based
clusters, GlobusComputeEngine, as a wrapper over Parsl’s
HighThroughputExecutor, enables access to multiple computation nodes. The
default configuration specifies GlobusComputeEngine.
The simplest configuration would use the ThreadPoolEngine:
~/.globus_compute/simple_threadpool/user_config_template.yaml.j2¶
engine:
  type: ThreadPoolEngine
Per Python’s default of max_workers=None, this configuration will create up to
min(32, cores + 4) threads, where cores is the host's processor count (per
Python 3.8+). Any argument to fine-tune the underlying executor’s behavior must
be placed inside the engine stanza. For example, to limit the worker to 3 threads:
~/.globus_compute/three_threads/user_config_template.yaml.j2¶
engine:
  type: ThreadPoolEngine
  max_workers: 3
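For intuition about what max_workers bounds, here is a small standalone sketch using Python's concurrent.futures.ThreadPoolExecutor directly, outside of any endpoint. The helper name worker_thread_names is hypothetical. It records which threads actually ran the tasks, and the count never exceeds max_workers.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def worker_thread_names(max_workers: int, n_tasks: int) -> set:
    """Run n_tasks trivial tasks and collect the names of the threads used."""

    def task(_):
        return threading.current_thread().name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return set(pool.map(task, range(n_tasks)))


names = worker_thread_names(max_workers=3, n_tasks=50)
# At most 3 distinct threads, however many tasks are submitted.
print(len(names) <= 3)
```

The executor may use fewer threads than the limit when tasks finish quickly, so the sketch checks only the upper bound.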
Similarly, if using the ProcessPoolEngine, one might implement a policy in
which each worker runs only 100 tasks before it is respawned:
~/.globus_compute/four_workers_100_tasks/user_config_template.yaml.j2¶
engine:
  type: ProcessPoolEngine
  max_tasks_per_child: 100
  max_workers: 4
Given the above two endpoint configurations, the ps utility on the host
console can verify the setup. The process pool has 4 worker processes (per the
max_workers configuration), while the thread pool endpoint applies the same
concept with threads:
ps of different engine configurations on the host; output edited for clarity¶
$ ps w --forest | grep "Globus Compute Endpoint"
25713 ...  \_ Globus Compute Endpoint (..., four_workers_100_tasks)
25726 ...      \_ Globus Compute Endpoint (..., four_workers_100_tasks)
25727 ...      \_ Globus Compute Endpoint (..., four_workers_100_tasks)
25728 ...      \_ Globus Compute Endpoint (..., four_workers_100_tasks)
25729 ...      \_ Globus Compute Endpoint (..., four_workers_100_tasks)
26339 ...  \_ Globus Compute Endpoint (..., three_threads)
$ ps wm --forest | grep -A2 three_threads
26339 ...  Globus Compute Endpoint (..., three_threads)
    - ...  -
    - ...  -
Per usual Python semantics, the ThreadPoolEngine (ThreadPoolExecutor
under the hood) is typically best for I/O-oriented workflows, while the
ProcessPoolEngine (ProcessPoolExecutor under the hood) will be a better
fit for CPU-intensive tasks. But both engines will only run tasks on the
endpoint host machine. If the endpoint is strictly limited to a single host
(e.g., a home desktop, an idle workstation), then these engines may be the
simplest option.
For running in a multi-node setup (e.g., clusters, with scheduling software like
PBS or Slurm), the GlobusComputeEngine enables much more concurrency.
This engine has more options and is correspondingly more complicated to
configure. A rough equivalent to the ProcessPoolEngine example would be:
~/.globus_compute/my_first_cluster_setup/user_config_template.yaml.j2¶
engine:
  type: GlobusComputeEngine
  provider:
    type: LocalProvider
    max_blocks: 4
Retries¶
Functions submitted to the GlobusComputeEngine can fail due to infrastructure
failures. For example, the worker executing the task might terminate because it
ran out of memory, or all workers under a batch job could fail because the
batch job exited on reaching the walltime limit. GlobusComputeEngine can
be configured to automatically retry these tasks by setting
max_retries_on_system_failure=N, where N is the number of retries allowed.
The default config sets retries to 0 since functions can be computationally
expensive, not idempotent, or leave side effects that affect subsequent retries.
Example config snippet:
user_config_template.yaml.j2¶
engine:
  type: GlobusComputeEngine
  max_retries_on_system_failure: 2  # Default=0
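To make the retry count concrete, the following is a conceptual sketch only, not the engine's actual implementation. SystemFailure and run_with_retries are hypothetical names standing in for an infrastructure failure and for the retry behavior described above. With N retries allowed, a task is attempted at most N + 1 times.

```python
class SystemFailure(Exception):
    """Hypothetical stand-in for an infrastructure failure (e.g., a lost worker)."""


def run_with_retries(task, max_retries_on_system_failure: int = 0):
    """Call task(); on SystemFailure, retry up to the configured number of times.

    Returns a (result, attempts) pair so the number of attempts is visible.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except SystemFailure:
            # attempts - 1 retries have been used so far; stop once exhausted.
            if attempts > max_retries_on_system_failure:
                raise
```

The sketch also shows why the default of 0 is conservative: every retry re-runs the function from the start, so any side effects happen again.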
Auto-Scaling¶
GlobusComputeEngine by default automatically scales workers in response to
workload.
Strategy configuration is limited to two options:
- max_idletime: Maximum duration in seconds that workers are allowed to idle before they are marked for termination
- strategy_period: Number of seconds between the strategy's attempts at auto-scaling events
The bounds for scaling are determined by the options to the Provider
(init_blocks, min_blocks, max_blocks). Please refer to the Parsl docs for
more info.
Here’s an example configuration:
user_config_template.yaml.j2¶
engine:
  type: GlobusComputeEngine
  job_status_kwargs:
    max_idletime: 60.0  # Default = 120s
    strategy_period: 120.0  # Default = 5s
Provider¶
Whereas the ThreadPoolEngine and ProcessPoolEngine wrappers have an
implicit approach to managing the compute resources (the process model), the
GlobusComputeEngine requires explicit knowledge of the local topology. The
provider stanza chooses the Parsl mechanism by which to communicate with the
local site. If the workers will be on the same host as the endpoint,
LocalProvider is appropriate. But if workers will be on cluster nodes, those
resources will be accessed via the site-specific scheduler, and communication
between the endpoint and the workers will occur via a site-specific network
interface.
Parsl implements a number of providers, so we will not describe them here. Instead, we present a single example, and then suggest reading through the rest of the endpoint configuration examples and Parsl’s documentation while keeping your project needs in mind.
The University of Chicago’s Midway cluster uses Slurm as the batch scheduler,
so the following example configuration chooses Parsl’s SlurmProvider. The
Slurm batch scheduler requires an allocation to debit for each job submission,
specified for SlurmProvider via account. On each acquired cluster node,
the worker_init is a set of shell-script lines that will be run prior to
starting the worker; this example loads the site-specific module named
Anaconda, and then activates the compute-env environment. When sending
jobs to the batch scheduler, this configuration requests only a single node at
a time (nodes_per_block) from the partition of nodes named caslake
(partition), keeps no more than 1 job active or pending (max_blocks), and has
the scheduler limit each job to no more than 5 minutes (walltime).
For communication between the endpoint and the worker nodes, tell the endpoint
to open up communication ports on the internal interface, named bond0.
user_config_template.yaml.j2 of an endpoint on UChicago RCC’s Midway¶
engine:
  type: GlobusComputeEngine
  max_workers_per_node: 2
  provider:
    type: SlurmProvider
    account: {{ account }}
    partition: caslake
    worker_init: "module load Anaconda; source activate compute-env"
    nodes_per_block: 1
    max_blocks: 1
    walltime: 00:05:00
  address:
    type: address_by_interface
    ifname: bond0
Again, this is only a basic example. For more inspiration, please consult the
list of examples and peruse Parsl’s documentation on
both the HighThroughputExecutor and the available providers.
Note
How does one determine the appropriate interface (ifname) to use for each system?
There is no one answer to this. One route is to simply ask a more knowledgeable person (e.g., a colleague or system administrator). Another route might be a combination of educated guesses and trial and error.
To see what interfaces the host machine has configured, one can use the
ip utility to look for the UP interfaces. From a hypothetical machine:
$ ip addr  # (typically 'ip' is in /usr/sbin/)
...
2: enp2s0f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 88:a4:c2:12:a8:6d brd ff:ff:ff:ff:ff:ff
...
4: enp2s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 10:C5:95:49:b0:a1 brd ff:ff:ff:ff:ff:ff
    inet 10.110.23.47/24 brd 10.110.23.255 scope global dynamic noprefixroute enp2s0f1
       valid_lft 2669sec preferred_lft 2669sec
    inet6 fe80::6a05:caff:fee0:320a/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
...
7: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 50:2f:9b:12:3e:bb brd ff:ff:ff:ff:ff:ff
    inet 192.168.167.219/24 brd 192.168.167.255 scope global dynamic noprefixroute wlp3s0
       valid_lft 6007sec preferred_lft 6007sec
    inet6 fe80::1d48:c378:42d3:d031/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
...
(If you like color, use ip -c addr to get the DOWN and UP interfaces
in red and green output.)
From the above, enp2s0f1 and wlp3s0 would be likely candidates (note
state UP in the first line of those records). The next step is
to recognize that most setups attach their cluster nodes to internal
networks. Put differently, a ping destined for the outside world but placed
on the internal interface should fail. Programmatically, then, one can look
for a failing ping invocation as an indication of the inward-facing
interface. From the above output, we can use the -I argument to ping:
$ ping -c 1 -I wlp3s0 google.com 1>/dev/null 2>&1; echo $?
0
Zero. That means “Successfully pinged google.com and got a response.”
But since we are looking for ping to fail, that is likely not the correct
interface for the endpoint to utilize.
$ ping -c 1 -I enp2s0f1 google.com 1>/dev/null 2>&1; echo $?
1
Nonzero – “Failed to communicate with google.com”. Therefore, that is
likely an internal interface, and a good bet for the interface that is
plumbed up to talk with the cluster’s internal nodes.
At that point, try it and see.
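The same ping heuristic can be scripted. The sketch below is a hedged illustration: the function names ping_fails and likely_internal_interfaces are hypothetical, and the run parameter exists only so the command runner can be swapped out for testing. It invokes ping with the same -c 1 -I arguments shown above and reports the interfaces for which the ping fails.

```python
import subprocess
from typing import Any, Callable, Iterable, List


def ping_fails(ifname: str, host: str = "google.com",
               run: Callable[..., Any] = subprocess.run) -> bool:
    """Return True if a single ping to host via ifname exits nonzero."""
    result = run(
        ["ping", "-c", "1", "-I", ifname, host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode != 0


def likely_internal_interfaces(ifnames: Iterable[str], host: str = "google.com",
                               run: Callable[..., Any] = subprocess.run) -> List[str]:
    """Interfaces that cannot reach the outside world: candidates for ifname."""
    return [name for name in ifnames if ping_fails(name, host, run)]


# Example (performs real pings when called):
#   likely_internal_interfaces(["enp2s0f1", "wlp3s0"])
```

A failing ping is only a hint, as above: an interface can also fail because it is misconfigured, so the result is a shortlist to try rather than a definitive answer.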
Manager Endpoint Configuration¶
The config.yaml file contains the YAML configuration for the manager
endpoint process, which manages user endpoint processes. Under the hood, all
configuration options in this file are used to create an instance of the
ManagerEndpointConfig class.
public
A boolean value, dictating whether other users can discover this endpoint in the Globus Compute web API and Globus Web UI. It defaults to false.
Warning
This field does not indicate access/usage of the endpoint. It determines only whether this endpoint is easily discoverable via the Globus web portal; access is controlled via the admins field (management of endpoint) and the identity mapping configuration (Compute usage), both described below.
config.yaml – example public multi-user endpoint¶
public: true
identity_mapping_config_path
A path to an identity mapping configuration, per the Globus Connect Server Identity Mapping Guide. The configuration file must be a JSON list of identity mapping configurations. The multi-user endpoint documentation discusses the content of this file in detail.
Example config.yaml with an identity mapping path¶
identity_mapping_config_path: /path/to/idmap_config.json
user_config_template_path
The path to the user endpoint configuration Jinja2 template YAML file. If not specified, the default template path will be used: ~/.globus_compute/my-ep/user_config_template.yaml.j2.
See user_config_template.yaml.j2 for more information.
Example config.yaml with a custom user config template path¶
user_config_template_path: /path/to/my_template.yaml.j2
user_config_schema_path
The path to the user endpoint configuration JSON schema file. If not specified, the default schema path will be used: ~/.globus_compute/my-mep/user_config_schema.json.
See user_config_schema.json for more information.
Example config.yaml with a custom user config schema path¶
user_config_schema_path: /path/to/my_schema.json
admins
A list of Globus Auth identity IDs that have administrative access to the endpoint, in addition to the owner.
Important
This field requires an active Globus subscription (i.e., subscription_id).
config.yaml – specifying endpoint administrators¶
subscription_id: 600ba9ac-ef16-4387-30ad-60c6cc3a6853
admins:
  # Peter Gibbons (software engineer)
  - 10afcf74-b041-4439-8e0d-eab371767440
  # Samir Nagheenanajar (sysadmin, HPC services)
  - a6a7b9ee-be04-4e45-9832-d3737c2fafa2
display_name
If not specified, the endpoint will show up in the Web UI as the given local name. (In other words, the same name as used to create the endpoint with the configure subcommand, and as used for the directory name inside of ~/.globus_compute/.) This field is free-form (accepting space characters, for example).
config.yaml – naming a public endpoint¶
display_name: Debug queue, 10m max job time (RCC, Midway, UChicago)
public: true
allowed_functions
This field specifies an allow-list of functions that may be run by a user endpoint process. As this list is available at endpoint registration time, not only do the user endpoint processes verify that each task requests a valid function, but the web service enforces the allowed functions list at task submission as well. For more information, see Function Allow Listing.
config.yaml – only allow certain functions¶
allowed_functions:
  - 00911703-e76b-4d0b-7b98-6f2e25ab9943
  - e552e7f2-c007-4671-6ca4-3a4fd84f3805
authentication_policy
Use a Globus Authentication Policy to restrict who can use a multi-user endpoint at the web service. See Authentication Policies for more information.
config.yaml – allowing only valid identities¶
authentication_policy: 498c7327-9c6a-4847-c954-1eafa923da8e
subscription_id: 600ba9ac-ef16-4387-30ad-60c6cc3a6853
pam
Use Pluggable Authentication Modules (PAM) for site-specific authorization requirements. A structure with enable and service_name options. Defaults to disabled and globus-compute-endpoint. See Multi-User § PAM for more information.
config.yaml – enabling PAM¶
pam:
  enable: true
Python Class Documentation¶
The YAML configurations discussed above are facades over the following Python classes. Though the vast majority of users will only use the YAML configurations, we present the following class documentation to show all of the available options.
- class globus_compute_endpoint.endpoint.config.config.UserEndpointConfig(*, engine: GlobusComputeEngineBase | None = None, heartbeat_threshold: int = 120, idle_heartbeats_soft: int = 0, idle_heartbeats_hard: int = 5760, endpoint_setup: str | None = None, endpoint_teardown: str | None = None, log_dir: str | None = None, stdout: str = './endpoint.log', stderr: str = './endpoint.log', **kwargs)¶
Bases: BaseConfig
Holds the configuration items for a task-processing endpoint.
Typically, one does not instantiate this configuration directly, but specifies the relevant options in the endpoint’s config.yaml file. For example, to specify an endpoint that only uses threads on the endpoint host, the configuration might look like:
config.yaml¶
display_name: My single-block, host-only EP at site ABC
engine:
  type: GlobusComputeEngine
  provider:
    type: LocalProvider
    max_blocks: 1
Please see the BaseConfig class for a list of options that both ManagerEndpointConfig and UserEndpointConfig classes share.
- Parameters:
engine – The GlobusComputeEngine for this endpoint to execute functions. The currently known engines are GlobusComputeEngine, ProcessPoolEngine, and ThreadPoolEngine. See User Endpoint Configuration for more information.
heartbeat_threshold – Seconds since the last heartbeat message from the Globus Compute web service after which the connection is assumed to be disconnected.
idle_heartbeats_soft – Number of heartbeats after an endpoint is idle (no outstanding tasks or results, and at least 1 task or result has been forwarded) before the endpoint shuts down. If 0, then the endpoint must be manually triggered to shut down (e.g., SIGINT [Ctrl+C] or SIGTERM).
idle_heartbeats_hard – Number of heartbeats after no task or result has moved before the endpoint shuts down. Unlike idle_heartbeats_soft, this idle timer does not require that there are no outstanding tasks or results. If no task or result has moved in this many heartbeats, then the endpoint will shut down. In particular, this is intended to catch the error condition that a worker has gone missing and will thus never return the task it was sent. Note that this setting is only enabled if the idle_heartbeats_soft is a value greater than 0. Suggested value: a multiplier of heartbeat_period equivalent to two days. For example, if heartbeat_period is 30s, then suggest 5760.
endpoint_setup – Command(s) to be run during the endpoint initialization process
endpoint_teardown – Command(s) to be run during the endpoint shutdown process
log_dir – path to the top-level directory where logs should be written
stdout – Path where the endpoint’s stdout should be written
stderr – Path where the endpoint’s stderr should be written
- class globus_compute_endpoint.endpoint.config.config.ManagerEndpointConfig(*, public: bool = False, user_config_template_path: PathLike | str | None = None, user_config_schema_path: PathLike | str | None = None, identity_mapping_config_path: PathLike | str | None = None, audit_log_path: PathLike | str | None = None, pam: PamConfiguration | None = None, mu_child_ep_grace_period_s: float = 30.0, **kwargs)¶
Bases: BaseConfig
Holds the configuration items for an endpoint manager.
Typically, one does not instantiate this configuration directly, but specifies the relevant options in the endpoint’s config.yaml file. In fact, Compute internally determines which *EndpointConfig class to instantiate by whether there is an engine block in the config.yaml file:
config.yaml for Managers¶
# this file left empty; with no engine block, Compute interprets this
# as a manager endpoint
config.yaml for task-processing endpoints (i.e., non-Managers)¶
engine:
  ...
# with an engine block, Compute will instantiate a task-processing endpoint
# (i.e., a user-endpoint process, or UEP)
Note that for manager endpoints that will not be run with privileges, identity mapping is disabled (hence not specified above). Conversely, if the process will have elevated privileges (e.g., run by the root user or has setuid(2) privileges) then identity mapping is required:
config.yaml (for a root-owned process)¶
display_name: Debug queue, 1-block max
identity_mapping_config_path: /path/to/this/idmap_conf.json
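The engine-block rule described above can be sketched as follows. This is an illustration under stated assumptions, not the library's actual code: config_kind is a hypothetical helper, and the configuration is assumed to have already been parsed into a dictionary (an empty YAML file parses to None).

```python
from typing import Any, Mapping, Optional


def config_kind(parsed_config: Optional[Mapping[str, Any]]) -> str:
    """Classify a parsed config.yaml as describing a manager or user endpoint.

    Hypothetical helper: presence of an "engine" key means a task-processing
    (user) endpoint; anything else, including an empty file, means a manager.
    """
    if parsed_config and "engine" in parsed_config:
        return "user"
    return "manager"


print(config_kind(None))                                      # manager
print(config_kind({"engine": {"type": "ThreadPoolEngine"}}))  # user
```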
Please see the BaseConfig class for a list of options that both ManagerEndpointConfig and UserEndpointConfig classes share.
- Parameters:
public – Whether all users can discover this endpoint via the Globus Compute web user interface and API.
Warning
Do not use this flag as a means of security. It controls visibility in the web user interface. It does not control access to the endpoint.
user_config_template_path – Path to the user configuration template file for this endpoint. If not specified, the default template path will be used.
user_config_schema_path – Path to the user configuration schema file for this endpoint. If not specified, the default schema path will be used.
identity_mapping_config_path – Path to the identity mapping configuration for this endpoint. If the process is not privileged, a warning will be emitted to the logs and this item will be ignored; conversely, if privileged, this configuration item is required, and a ValueError will be raised if the path does not exist.
audit_log_path – Path to the audit log. If specified, and the endpoint is marked as High-Assurance (HA), then the MEP will write auditing records here. An auditing record is a single line of text, received from child endpoints at “interesting” points in a task lifetime. For example, when the UEP interchange first receives a task, it will emit an auditing record of RECEIVED for that task. Similarly, the UEP will emit EXEC_START when the executor registers the task, RUNNING if the executor shares that the task is running, and EXEC_END when the task is complete. If the path is created, it will be created with user-secure permission (umask=0o077), but it will not be checked for permission conformance thereafter. It is up to the administrator to ensure this path is generally secured.
pam – Whether to enable authorization of user-endpoints via PAM routines, and optionally specify the PAM service name. See PamConfiguration. If not specified, PAM authorization defaults to disabled.
mu_child_ep_grace_period_s – The web services send a start-user-endpoint request to the endpoint manager ahead of tasks for the target user endpoint. If the user-endpoint is already running, these requests are ignored. To account for the inherent race condition of receiving a start request just before the user-endpoint shuts down, the endpoint manager will hold on to the most recent start request for the user-endpoint for this grace period.
- class globus_compute_endpoint.endpoint.config.config.BaseConfig(*, high_assurance: bool = False, display_name: str | None = None, allowed_functions: Iterable[UUID | str] | None = None, authentication_policy: UUID | str | None = None, subscription_id: UUID | str | None = None, amqp_port: int | None = None, heartbeat_period: float | int = 30, environment: str | None = None, local_compute_services: bool = False, debug: bool = False, admins: Iterable[UUID | str] | None = None)¶
- Parameters:
display_name – The display name for the endpoint. If None, defaults to the endpoint name (i.e., the directory name in ~/.globus_compute/)
allowed_functions – List of identifiers of functions that are allowed to be run on the endpoint
authentication_policy – Endpoint users are evaluated against this Globus authentication policy
subscription_id – Subscription ID associated with this endpoint
amqp_port – Port to use for AMQP connections. Note that only 5671, 5672, and 443 are supported by the Compute web services. If None, the port is assigned by the services (which default to 443).
heartbeat_period – The interval (in seconds) at which heartbeat messages are sent from the endpoint to the Globus Compute web service
environment – Environment the endpoint should connect to. If not specified, the endpoint connects to production. (Listed here for completeness, but only used internally by the dev team.)
local_compute_services – Point the endpoint to a local instance of the Compute services. (Listed here for completeness, but only used internally by the dev team.)
debug – If set, emit debug-level log messages. This is a configuration implementation of the CLI’s --debug flag. Note that if this value is explicitly False, then the CLI flag, if utilized, will still put the EP into “debug mode.” The CLI wins.
admins – A list of Globus Auth identity IDs that have administrative access to the endpoint, in addition to the owner. This field requires an active Globus subscription (i.e., subscription_id).
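Two of the BaseConfig rules above are simple enough to express as a sketch. Both helpers, check_amqp_port and effective_debug, are hypothetical names for illustration and are not part of the Globus Compute API. The port set comes from the amqp_port description, and the debug rule follows the "CLI wins" note.

```python
from typing import Optional

# Ports the text above lists as supported by the Compute web services.
SUPPORTED_AMQP_PORTS = frozenset({443, 5671, 5672})


def check_amqp_port(port: Optional[int]) -> Optional[int]:
    """Accept None (let the services assign the port) or a supported port."""
    if port is not None and port not in SUPPORTED_AMQP_PORTS:
        raise ValueError(
            f"amqp_port must be one of {sorted(SUPPORTED_AMQP_PORTS)} or None; got {port}"
        )
    return port


def effective_debug(config_debug: bool, cli_debug: bool) -> bool:
    """Debug mode is on if either source requests it; a False in the
    configuration cannot switch off the CLI's --debug flag."""
    return cli_debug or config_debug
```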
- class globus_compute_endpoint.endpoint.config.pam.PamConfiguration(enable: bool = True, service_name: str = 'globus-compute-endpoint')¶
- Parameters:
enable – Whether to initiate a PAM session for each UEP start request.
service_name – The PAM service name with which to initialize the PAM session. If a particular MEP has different requirements, define those PAM requirements in /etc/pam.d/, and specify the service name with this field.
See MEP § PAM for more information.