Globus Compute Tutorial¶
Globus Compute is a Function-as-a-Service (FaaS) platform that enables fire-and-forget execution of Python functions on one or more remote Globus Compute endpoints.
This tutorial is configured to use a tutorial endpoint hosted by the
Globus Compute team. You can set up your own endpoint on resources to
which you have access by following the Globus Compute
documentation.
Globus Compute endpoints can be deployed on many cloud platforms,
clusters with batch schedulers (e.g., Slurm, PBS), Kubernetes, or on a
local PC. After configuring an endpoint, you can use it in this tutorial
by setting the endpoint_id below.
Note that although the tutorial endpoint has been made public by the Globus Compute team, endpoints created by users cannot be shared publicly.
Globus Compute Python SDK¶
The Globus Compute Python SDK provides programming abstractions for interacting with the Globus Compute service. Before running this tutorial you should first install the Globus Compute SDK as follows:
$ pip install globus-compute-sdk
The Globus Compute SDK exposes a Client and Executor for
interacting with the Globus Compute service. In order to use Globus
Compute, you must first authenticate using one of hundreds of supported
identity providers (e.g., your institution, ORCID, or Google). As part of
the authentication process you must grant permission for Globus Compute
to access your identity information (to retrieve your email address) and
Globus Groups management access (for sharing functions).
from globus_compute_sdk import Executor
Note: Here we use the public Globus Compute tutorial endpoint.
You can use this endpoint to run the tutorial (the endpoint is shared
with all Globus Compute users). You can also change the endpoint_id
to the UUID of any endpoint for which you have
permission to execute functions.
tutorial_endpoint = '4b116d3c-1703-4f8f-9f6f-39921e5864df' # Public tutorial endpoint
gce = Executor(endpoint_id = tutorial_endpoint)
print("Executor : ", gce)
Globus Compute 101¶
The following example demonstrates how you can execute a function with
the Executor interface.
Submitting a function¶
To execute a function, call submit and pass a reference
to the function. Optionally, you may also specify any input arguments to
the function.
# Define the function for remote execution
def hello_world():
    return "Hello World!"
future = gce.submit(hello_world)
print("Submit returned: ", future)
Getting results¶
When you submit a function for execution (called a task), the
executor will return an instance of ComputeFuture in lieu of the
result from the function. Futures are a common way to reference
asynchronous tasks, enabling you to interrogate the future to find the
status, results, exceptions, etc. without blocking to wait for results.
ComputeFuture instances returned from the Executor can be used in the
following ways:
* future.done() is a non-blocking call that returns a boolean that indicates whether the task is finished.
* future.result() is a blocking call that returns the result from the task execution or raises an exception if task execution failed.
# Returns a boolean that indicates task completion
future.done()
# Waits for the function to complete and returns the task result or exception on failure
future.result()
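Because this pattern follows the standard Python futures model, it can be tried out locally before pointing it at an endpoint. The sketch below uses `concurrent.futures.ThreadPoolExecutor` as a local stand-in for the Globus Compute Executor (an assumption made so the example runs without an endpoint). The function `slow_hello`, the 0.2-second delay, and the `timeout=5` argument are illustrative and not part of the tutorial.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_hello():
    # Import inside the body, mirroring how remote functions are written.
    import time
    time.sleep(0.2)  # Simulate a task that takes a moment to finish.
    return "Hello World!"


# ThreadPoolExecutor is only a local stand-in for the tutorial's executor.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_hello)

    # Poll without blocking until the task reports completion.
    polls = 0
    while not future.done():
        polls += 1
        time.sleep(0.05)

    # The task is finished, so result() returns immediately; the timeout
    # is a safeguard that bounds the wait in the general case.
    message = future.result(timeout=5)

print(f"Finished after {polls} polls: {message}")
```

Polling with done() suits code that has other work to do in the meantime; if nothing else needs to happen, calling result() directly is simpler.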
Catching exceptions¶
When a task fails and you try to get its result, the future will
raise an exception. In the following example, the ZeroDivisionError
exception is raised when future.result() is called.
def division_by_zero():
    return 42 / 0 # This will raise a ZeroDivisionError
future = gce.submit(division_by_zero)
try:
    future.result()
except Exception as exc:
    print("Globus Compute returned an exception: ", exc)
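Standard Python futures also let you inspect a failure without a try/except block, via `exception()`. The following is a minimal local sketch, again assuming `concurrent.futures.ThreadPoolExecutor` as a stand-in for the Globus Compute Executor. Whether a remote endpoint hands back the original exception type or a wrapper around it is not confirmed here, so treat the type check as illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def division_by_zero():
    return 42 / 0  # Raises ZeroDivisionError inside the worker.


# Local stand-in for the tutorial's executor (assumption).
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(division_by_zero)
    # exception() waits for the task and returns the exception object
    # (or None on success) instead of raising it.
    error = future.exception(timeout=5)

if error is not None:
    outcome = f"task failed with {type(error).__name__}: {error}"
else:
    outcome = f"task succeeded: {future.result()}"

print(outcome)
```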
Functions with arguments¶
Globus Compute supports registration and execution of functions with
arbitrary arguments and return values. Globus Compute will
serialize any *args and **kwargs when executing a function, and
it will serialize any return values or exceptions.
Note: Globus Compute uses standard Python serialization libraries (i.e., Dill). It also limits the size of input arguments and return values to 10 MB. For larger input or output data we suggest using Globus.
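To get a rough feel for the size limit before submitting a task, you can serialize the arguments locally and measure them. The sketch below is an approximation under two stated assumptions: it uses the standard library `pickle` module as a stand-in for Dill, and it interprets "10 MB" as 10 × 1024 × 1024 bytes. The helper name `fits_payload_limit` is hypothetical, and the size Globus Compute actually measures may differ.

```python
import pickle

# Assumption: "10 MB" is treated as 10 MiB; the service may count differently.
MAX_PAYLOAD_BYTES = 10 * 1024 * 1024


def fits_payload_limit(*args, **kwargs):
    """Return True if the pickled arguments stay within the assumed limit."""
    # pickle stands in for Dill here, so the byte count is only an estimate.
    payload = pickle.dumps((args, kwargs))
    return len(payload) <= MAX_PAYLOAD_BYTES


small = fits_payload_limit(40, 2)                      # Two integers.
large = fits_payload_limit(bytes(11 * 1024 * 1024))    # An 11 MiB buffer.

print(f"small arguments fit: {small}")
print(f"large arguments fit: {large}")
```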
The following example shows a function that computes the sum of its two input arguments.
def get_sum(a, b):
    return a + b
future = gce.submit(get_sum, 40, 2)
print(f"40 + 2 = {future.result()}")
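Keyword arguments are passed to submit in the same way as positional ones. The sketch below illustrates this with `concurrent.futures.ThreadPoolExecutor` as a local stand-in for the Globus Compute Executor (an assumption so that the example runs without an endpoint). The function `scale_and_shift` and its parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_and_shift(value, scale=1, shift=0):
    return value * scale + shift


# Local stand-in for the tutorial's executor (assumption).
with ThreadPoolExecutor(max_workers=1) as pool:
    # Positional arguments follow the function; keyword arguments are
    # forwarded to the function unchanged.
    future = pool.submit(scale_and_shift, 20, scale=2, shift=2)
    answer = future.result()

print(f"20 * 2 + 2 = {answer}")
```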
Functions with dependencies¶
In order to execute a function on a remote endpoint, Globus Compute requires that functions explicitly state all dependencies within the function body. It also requires that any dependencies (e.g., libraries, modules) are available on the endpoint on which the function will execute. For example, in the following function, we explicitly import the datetime module.
def get_date():
    from datetime import date
    return date.today()
future = gce.submit(get_date)
print("Date fetched from endpoint: ", future.result())
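One way to find out whether an endpoint has the libraries a function needs is to submit a small probe function first. The sketch below defines such a probe and, so that it runs anywhere, calls it locally. The function name `missing_modules` and the module names in the call are hypothetical.

```python
def missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    # The import lives inside the body so the function is self-contained
    # when it is shipped to another machine.
    from importlib.util import find_spec

    # find_spec returns None for a top-level module that is not installed.
    # Dotted names (e.g. "pkg.sub") are not handled by this sketch.
    return [name for name in module_names if find_spec(name) is None]


# Called locally here; the result reflects the local environment.
missing = missing_modules(["datetime", "json", "module_that_does_not_exist_xyz"])
print("Missing modules:", missing)
```

To probe an endpoint instead, the same function could be passed to the executor from this tutorial, e.g. `gce.submit(missing_modules, ["numpy"])`, and the returned list would then describe the endpoint's environment.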
Calling external applications¶
While Globus Compute is designed to execute Python functions, you can
easily invoke external applications that are accessible on the remote
endpoint. For example, the following function calls the Linux echo command.
def echo(name):
    import os
    return os.popen("echo Hello {} from $HOSTNAME".format(name)).read()
future = gce.submit(echo, "World")
print("Echo output: ", future.result())
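As an alternative to `os.popen`, the standard library `subprocess.run` passes arguments as a list, so user-supplied text is not interpreted by a shell. This is a sketch of that variant and not the tutorial's own method. It assumes an `echo` executable is on the PATH and uses `socket.gethostname()` in place of the `$HOSTNAME` shell variable. It is called locally here so that it runs without an endpoint.

```python
def echo(name):
    # Imports inside the body keep the function self-contained.
    import socket
    import subprocess

    completed = subprocess.run(
        ["echo", f"Hello {name} from {socket.gethostname()}"],
        capture_output=True,  # Collect stdout and stderr.
        text=True,            # Decode the output to str.
        check=True,           # Raise CalledProcessError on a non-zero exit.
    )
    return completed.stdout


output = echo("World")
print("Echo output:", output)
```

With `check=True`, a failing command surfaces as an exception from the function, which the caller would then see when asking the future for its result.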
Running Parsl workflows¶
Globus Compute enables remote execution of Parsl workflows, which utilize Parsl Apps.
The recommended setup is to run your Globus Compute endpoint with default configuration on a login node, then allow Parsl to handle provider configuration, etc.
Below is a simple example. Note that we are returning a result, not
a Future. The latter will cause serialization issues.
def workflow(n1, n2):
    import parsl
    from parsl.app.app import python_app, join_app
    # First call clear() to avoid conflicts with existing
    # global config variables
    parsl.clear()
    parsl.load()
    @python_app
    def add(n1, n2):
        return n1 + n2
    @python_app
    def double(n):
        return n*2
    @join_app
    def calc(n1, n2):
        return double(add(n1, n2))
    # Returning a Future will cause serialization issues
    return calc(n1, n2).result()
f = gce.submit(workflow, 1, 2)
Running functions many times¶
One of the strengths of Globus Compute is the ease by which you can run functions many times, perhaps with different input arguments. The following example shows how you can use the Monte Carlo method to estimate pi.
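Specifically, if a circle with radius \(r\) is inscribed inside a square with side length \(2r\), the area of the circle is \(\pi r^2\) and the area of the square is \((2r)^2 = 4r^2\), so the circle covers a fraction \(\pi/4\) of the square. Thus, if \(N\) uniformly distributed points are dropped at random locations within the square, approximately \(N\pi/4\) will fall inside the circle. Counting the \(N_{\text{in}}\) points that land inside therefore gives the estimate \(\pi \approx 4N_{\text{in}}/N\). The code below samples only the quadrant where both coordinates lie in \([0, 1)\), which preserves the same ratio.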
import time
# Function that estimates pi by placing points in a box
def pi(num_points):
    from random import random
    inside = 0
    for i in range(num_points):
        x, y = random(), random()  # Drop a point randomly within the box.
        if x**2 + y**2 < 1:        # Count points within the circle.
            inside += 1
    return (inside*4 / num_points)
# Execute the function 3 times
estimates = []
for i in range(3):
    estimates.append(gce.submit(pi, 10**5))
# Wait for the tasks and collect their results
results = [future.result() for future in estimates]
# Print the individual estimates and their average
print("Estimates: {}".format(results))
print("Average: {:.5f}".format(sum(results)/len(results)))
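When tasks take different amounts of time, results can be handled as they finish instead of in submission order. The sketch below uses `concurrent.futures.as_completed` with `ThreadPoolExecutor` as a local stand-in for the Globus Compute Executor. It assumes that the futures returned by the real executor behave like standard Python futures in this respect, and the sample sizes are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def pi(num_points):
    from random import random
    inside = 0
    for _ in range(num_points):
        x, y = random(), random()  # Drop a point in the unit square.
        if x**2 + y**2 < 1:        # Count points inside the quarter circle.
            inside += 1
    return inside * 4 / num_points


sample_sizes = [10**3, 10**4, 10**5]

# Local stand-in for the tutorial's executor (assumption).
with ThreadPoolExecutor(max_workers=3) as pool:
    # Map each future back to the input that produced it.
    futures = {pool.submit(pi, n): n for n in sample_sizes}

    results = {}
    for future in as_completed(futures):
        # Futures are yielded in completion order, not submission order.
        results[futures[future]] = future.result()

for n in sample_sizes:
    print(f"{n:>7} points: pi is approximately {results[n]:.5f}")
```

Keeping a dictionary from future to input is a common way to recover which argument produced which result once the ordering is no longer guaranteed.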
Endpoint operations¶
You can retrieve information about endpoints including status and information about how the endpoint is configured.
from globus_compute_sdk import Client
gcc = Client()
gcc.get_endpoint_status(tutorial_endpoint)