Globus Compute for AWS Lambda Developers¶
This guide is intended for prospective Globus Compute endpoint administrators and users who are familiar with AWS Lambda but not with Compute, and aims to explain core concepts in Compute in terms of similar concepts in Lambda.
Major Differences¶
AWS Lambda |
Globus Compute |
|
|---|---|---|
Pricing Model |
Usage-based billing |
Free; optional subscription |
Deployment |
Everything in the cloud |
Hybrid cloud and on-premises |
Execution Environment |
Managed by AWS |
Managed by endpoint administrators |
Function Definition |
Packaged code with handler function |
Plain Python functions |
Dependencies |
Deployment package, layers |
Managed by endpoint administrators |
Functions & Endpoints¶
In Compute, functions are simply Python or Bash source code that users register with the service. Unlike Lambda, they are not tied to any particular execution context. Instead, when calling a function, users choose an endpoint they can access to execute their code. Like any Lambda runtime, a Compute endpoint is what is ultimately responsible for executing a function’s code.
To manage Compute endpoints, an endpoint administrator installs the package on a Linux-based machine they want to share with others,
then uses the globus-compute-endpoint executable to create endpoints, configure
them, and register them with the hosted Compute services. During the registration
process, the Compute web service assigns the endpoint a UUID identifier that users can
use to submit to that endpoint. Generally any function can be run on any endpoint that
a user has access to. For security reasons
endpoints are only accessible to their owners by default, but they can easily be opened
up as needed.
Defining and Registering Functions¶
Because Compute functions are standard Python functions, they are frequently defined, registered, and invoked all in the same Python context. They can also be defined and registered separately, similar to Lambda.
Consider a simple AWS Lambda function that computes statistics on an array:
lambda.py: simple Lambda function definition¶import numpy as np
def lambda_handler(event, context):
array = np.array(event.get("array", []))
mean = np.mean(array)
std = np.std(array)
return {
"statusCode": 200,
"body": {
"mean": mean,
"std": std
}
}
lambda.py to Lambda¶zip function.zip lambda.py
aws lambda create-function \
--function-name arrayStats \
--runtime python3.11 \
--handler lambda.lambda_handler \
--zip-file fileb://function.zip \
--role <some IAM role with lambda execution permissions> \
--layers <some layer with numpy>
Note
This example assumes there is a pre-configured AWS Lambda layer that includes numpy.
Compare with its Compute analog:
from globus_compute_sdk import Executor
# define
def array_stats(array):
import numpy as np
array = np.array(array)
mean = np.mean(array)
std = np.std(array)
return { "mean": mean, "std": std }
# register
function_id = Executor().register_function(array_stats)
# function_id is a UUID, eg eca16448-bfe2-445a-900c-cd46b1426522
Important
In order for this to run without errors, numpy must be present on the executing
side. See Dependency Management for details.
Unlike Lambda handler functions, which must follow a specific calling convention,
Compute functions can have arbitrary signatures (ie def function(what, *ever,
**args)) and return arbitrary objects (ie return anything). And while Lambda
functions are registered by shipping entire modules to AWS, Compute functions are
registered by shipping standalone functions to the Compute web service. Generally this
is achieved by serializing the in-memory representation of the function, but there are
various alternative strategies such as
packaging the function’s source code.
One side-effect of Compute’s approach, and a notable difference from Lambda, is that
any code written outside of the function body is not executed as a matter of course.
This is why, in the Compute example, import numpy as np is inside the function body
instead of at the top of the file.
While this style covers the vast majority of use cases, Compute also supports a “module with a handler function” approach like Lambda:
module.py¶import numpy as np
def array_stats(array):
array = np.array(array)
mean = np.mean(array)
std = np.std(array)
return { "mean": mean, "std": std }
module.py¶from pathlib import Path
from globus_compute_sdk import Executor
other_module_code = Path("module.py").read_text()
function_id = Executor().register_source_code(
other_module_code, function_name="array_stats"
)
# function_id is a UUID, eg a8cfa586-d8ff-489f-bf77-e62f3d0ffa1a
Calling Functions¶
While Compute provides a public REST API that can be used to call functions, Compute endpoints generally expect arguments to be serialized a certain Pythonic way that the Compute SDK handles automatically.
Continuing the example from the previous sections, here’s what a function call might look like in Lambda:
import boto3
import json
client = boto3.client('lambda')
payload = {
"array": [1, 2, 3, 4, 5]
}
response = client.invoke(
FunctionName="arrayStats",
Payload=json.dumps(payload),
)
result = json.loads(response['Payload'].read())
print(result)
And this is what a function call might look like in Compute:
from globus_compute_sdk import Executor
endpoint_id = "<some endpoint UUID>"
array_stats_function_id = "<the function ID from the previous example>"
array = [1, 2, 3, 4, 5]
with Executor(endpoint_id) as gcx:
future = gcx.submit_to_registered_function(
array_stats_function_id, array
)
print(future.result())
This example may look synchronous, but the Compute SDK has constructed a future
that resolves asynchronously with the result of the computation. Notice also the use of
the context manager (with) — an Executor
does a lot of work in the background to track tasks and futures, so it’s good practice
to encapsulate its usage in a with-statement to clean it up even if other
components break unexpectedly.
Note
See the Executor User Guide for more details on
how Compute futures and the Executor work together.
So far these examples have shown how to register and call functions as separate steps, like in Lambda. Compute also supports a mode where function registration and submission happen transparently in the same call:
Executor.submit¶from globus_compute_sdk import Executor
def string_stats(string):
return {
"length": len(string),
"uppercase": string.upper(),
"lowercase": string.lower()
}
string = "Hello, Compute!"
with Executor(endpoint_id) as gcx:
future = gcx.submit(string_stats, string)
print(future.result())
While there is no direct analog to Lambda’s event-source mappings, Compute functions can be called in response to events using Globus Flows. See also the documentation on the Compute Action Provider for Flows.
Dependency Management¶
Since Compute functions contain no dependency information beside import statements,
endpoint administrators must take care to ensure any needed dependencies are present on
the endpoint side. For static workflows a manual approach is sufficient — ie, administrators install packages as needed to the Python environment that the
endpoint is running in — but there are mechanisms that allow for more
dynamic situations.
These lean on Compute’s configuration template capabilities, which don’t have an analog in Lambda. In short, Compute endpoints spawn child processes that actually handle the execution of the task, and the configuration of these child processes is determined by a combination of the endpoint’s configuration template and the details of the task at hand.
One approach allows users to specify a container with their task. With the following configuration options included:
user_config_template.yaml.j2 with dynamic containers¶display_name: My Containerized Endpoint
engine:
type: GlobusComputeEngine
container_type: {{ container_type }}
container_uri: {{ container_uri }}
container_cmd_options: {{ container_cmd_options|default() }}
provider:
init_blocks: 1
max_blocks: 1
min_blocks: 0
type: LocalProvider
Users can then leverage the user_endpoint_config feature of the executor to choose
a container:
from globus_compute_sdk import Executor
ep_id = "..."
config = {
"container_type": "docker", # or podman, apptainer...
"container_uri": "<my docker container>",
}
with Executor(
endpoint_id=ep_id,
user_endpoint_config=config,
) as gcx:
gcx.submit(...)
# or:
with Executor(endpoint_id=ep_id) as gcx:
gcx.user_endpoint_config = config
gcx.submit(...)
Important
The container being used must have the globus-compute-endpoint package installed.
For static workflows where users always invoke the same function, administrators can simply install dependencies directly into the endpoint’s Python environment. In these cases administrators might want to ensure that only this known function is called, which can be achieved with a function allow list:
allowed_functions:
- eca16448-bfe2-445a-900c-cd46b1426522