Parsl - a Python parallel scripting library
Parsl is a parallel programming library for Python. Parsl augments Python with simple, scalable, and flexible constructs for encoding parallelism. Developers annotate Python functions to specify opportunities for concurrent execution.
Tutorial
Introduction
- Three types of apps: `@python_app`, `@bash_app`, and `@join_app`.
- Two types of futures: `AppFuture` and `DataFuture`.
- `outputs: list` and `inputs: list` to specify I/O with the `File` abstraction.
A sample local HighThroughputExecutor pool:
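A minimal sketch of such a pool, assuming Parsl is installed. The label `local_htex` and the block counts are illustrative choices, and worker-count options are left at their defaults because their names differ between Parsl releases.

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

# One HighThroughputExecutor whose workers run on the local machine.
local_htex_config = Config(
    executors=[
        HighThroughputExecutor(
            label="local_htex",      # name used to target this pool from a decorator
            provider=LocalProvider(
                init_blocks=1,       # start with one block of workers
                max_blocks=1,        # never scale beyond one block
            ),
        )
    ]
)
```

The configuration takes effect once it is passed to `parsl.load(local_htex_config)`.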
Advanced features
- Multiple sites
  - Define multiple executor pools.
  - Combine each executor's `label` with the `executors=[]` parameter in the decorator.
- Elasticity
  - Specify the resource constraints in the config.
  - Tweak the `parallelism` parameter to favor either reusing resources or growing elastically.
- Fault tolerance and caching
  - Caching stores results from completed apps; the cached result is returned when the same {function name, arguments, function body} is invoked again.
  - Incremental checkpointing saves only the state changes since the previous checkpoint.
- Globus data management
- Monitoring
  - Optional module needed: `pip3 install parsl[monitoring]`
  - Monitoring info is stored in a SQLite database.
- Visualization of data and workflow
  - Optional module needed:
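The caching rule in the list above (same function name, arguments, and function body) can be illustrated with a small standard-library model. This is a simplified sketch of the idea, not Parsl's own memoizer: `cache_key`, `cached_call`, and the in-memory `_results` dictionary are hypothetical names, and the function body is approximated by its compiled bytecode. In Parsl itself, caching is switched on per app with `cache=True` on the decorator.

```python
import hashlib
import pickle

_results = {}  # hypothetical in-memory cache: key -> result


def cache_key(func, args, kwargs):
    """Hash the function name, its compiled body, and its arguments."""
    code = func.__code__
    parts = (
        func.__name__.encode(),
        code.co_code,                                     # bytecode of the body
        repr((code.co_consts, code.co_names)).encode(),   # literals and names used
        pickle.dumps((args, sorted(kwargs.items()))),     # call arguments
    )
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part)
    return digest.hexdigest()


def cached_call(func, *args, **kwargs):
    """Run func only if this exact (name, body, arguments) was not seen before."""
    key = cache_key(func, args, kwargs)
    if key not in _results:
        _results[key] = func(*args, **kwargs)
    return _results[key]


calls = []  # records every time the body really executes


def square(x):
    calls.append(x)
    return x * x


first = cached_call(square, 4)   # cache miss: the body runs
second = cached_call(square, 4)  # same name, body, arguments: cache hit
third = cached_call(square, 5)   # different argument: cache miss
```

Because the body is part of the key, editing a function invalidates its cached results, so stale values are not returned after a code change.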
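A sample multi-site configuration:

    from parsl.channels import LocalChannel
    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor, ThreadPoolExecutor
    from parsl.providers import LocalProvider

    multi_site_config = Config(
        executors=[
            # Pool 1: threads inside the submitting process.
            ThreadPoolExecutor(
                max_threads=8,
                label='local_threads'
            ),
            # Pool 2: a separate worker pool on the local machine.
            HighThroughputExecutor(
                label="local_htex",
                worker_debug=True,
                # Newer Parsl releases name this option max_workers_per_node.
                max_workers=1,
                provider=LocalProvider(
                    # Newer Parsl releases no longer have channels; drop this line there.
                    channel=LocalChannel(),
                    init_blocks=1,
                    max_blocks=1,
                ),
            )
        ]
    )

Configure decorators to use the pools:

    @bash_app(executors=["local_threads"])

Issues faced in development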
- ❗ Big problem: how to ensure type checking?