Parsl - a Python parallel scripting library

Parsl/parsl

Parsl is a parallel programming library for Python. Parsl augments Python with simple, scalable, and flexible constructs for encoding parallelism. Developers annotate Python functions to specify opportunities for concurrent execution.

Tutorial

Parsl documentation

Parsl/parsl-tutorial

Introduction

  • Three types of apps: @python_app, @bash_app, and @join_app.
  • Two types of futures: AppFutures and DataFutures.
  • The inputs: list and outputs: list keyword arguments specify file I/O through the File abstraction.

A sample local HighThroughputExecutor pool:
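
A minimal sketch of such a pool, assuming Parsl is installed. The name local_htex_config is hypothetical, and only parameters that also appear in the multi-site sample on this page are set; everything else is left at its default.

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

# A single-block pool on the local machine; "local_htex" is just a label.
local_htex_config = Config(
    executors=[
        HighThroughputExecutor(
            label="local_htex",
            provider=LocalProvider(
                init_blocks=1,  # blocks to start when the config is loaded
                max_blocks=1,   # upper bound, so the pool never grows
            ),
        )
    ]
)
```

Passing this object to parsl.load() would start the pool.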

Advanced features

  • Multiple Sites
    • Define multiple executor pools.
    • Give each executor a label and reference it through the executors=[] parameter of the app decorator.
  • Elasticity
    • Specify the resource constraints in the config.
    • Tune the parallelism parameter to favor either reusing existing resources or growing elastically.
  • Fault tolerance and caching
    • Results of completed apps are cached; the cached result is returned when an app with the same {function name, arguments, function body} is invoked again.
    • Incremental checkpointing saves only the state changes since the previous checkpoint.
  • Globus data management
  • Monitoring
    • Requires an optional module: pip3 install 'parsl[monitoring]'
    • Monitoring info is stored in a SQLite database.
    • Visualization of data and workflow
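
The elasticity and caching items above could be configured roughly as follows. This is a sketch under stated assumptions rather than a recommended setup: the names elastic_config, elastic_htex, and slow_square are hypothetical, and the block counts and parallelism value are arbitrary illustrations.

```python
from parsl import python_app
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

elastic_config = Config(
    executors=[
        HighThroughputExecutor(
            label="elastic_htex",
            provider=LocalProvider(
                min_blocks=0,     # allow the pool to shrink to nothing when idle
                init_blocks=1,    # blocks started when the config is loaded
                max_blocks=4,     # hard upper bound on resources
                parallelism=0.5,  # near 0 favors reuse, near 1 favors scaling out
            ),
        )
    ],
    # Write a checkpoint as each cached task finishes.
    checkpoint_mode="task_exit",
)


# Caching is requested per app; checkpointing applies to cached apps.
@python_app(cache=True)
def slow_square(x):
    return x * x
```

With cache=True, a repeated call to slow_square with the same argument can be served from the cache instead of being executed again.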

A sample multi-site configuration:

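from parsl.channels import LocalChannel
from parsl.config import Config
from parsl.executors import HighThroughputExecutor, ThreadPoolExecutor
from parsl.providers import LocalProvider

# Note: LocalChannel and max_workers follow the Parsl release this page was
# written against; newer releases may have renamed or removed them.
multi_site_config = Config(
    executors=[
        # Pool 1: lightweight threads inside the submitting process.
        ThreadPoolExecutor(
            max_threads=8,
            label='local_threads'
        ),
        # Pool 2: a single-worker HighThroughputExecutor on the local machine.
        HighThroughputExecutor(
            label="local_htex",
            worker_debug=True,
            max_workers=1,
            provider=LocalProvider(
                channel=LocalChannel(),
                init_blocks=1,
                max_blocks=1,
            ),
        )
    ]
)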

Configure decorators to use the pools:

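from parsl import bash_app

@bash_app(executors=["local_threads"])
def echo_hello():  # hypothetical app, shown only to complete the example
    return 'echo "hello"'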

Issues faced in development

  • ❗ Big problem: how to ensure type checking?

Detailed Discussion