Concurrency and Async Programming in Python

Threading

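  • With the threading library, real OS-level threads are created, but the GIL (in CPython) prevents more than one thread from executing Python bytecode at a time.
  • The interpreter asks the running thread to release the GIL after a switch interval (5 ms by default, see sys.getswitchinterval()) so that other waiting threads get a turn.
  • Synchronous (blocking) work can be delegated to a thread with asyncio.to_thread() (Python 3.9+).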
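
As a minimal sketch of the last point, the following runs two blocking calls concurrently in worker threads. The function blocking_io is hypothetical, and time.sleep() stands in for real blocking I/O.

```python
import asyncio
import threading
import time


def blocking_io(delay: float) -> str:
    """Hypothetical blocking call; time.sleep() stands in for real I/O."""
    time.sleep(delay)  # blocks only this worker thread and releases the GIL
    return threading.current_thread().name


async def main() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Each call is run in a thread of the loop's default executor,
    # so the event loop itself stays responsive while they sleep.
    names = await asyncio.gather(
        asyncio.to_thread(blocking_io, 0.2),
        asyncio.to_thread(blocking_io, 0.2),
    )
    elapsed = time.perf_counter() - start
    return list(names), elapsed


if __name__ == "__main__":
    thread_names, seconds = asyncio.run(main())
    print(thread_names, f"{seconds:.2f}s")
```

Because the two sleeps overlap, the elapsed time should be close to one delay rather than the sum of both, though the exact figure depends on the machine.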

asyncio

  • coroutine
    • Functions defined with async def (usually with await inside) are coroutine functions; calling one returns a coroutine object without running its body.
    • A coroutine is an object that encapsulates the ability to resume (via await) an underlying function that was suspended before completion.
    • A coroutine can either be awaited or be run by handing it to the event loop: asyncio.run(), loop.run_until_complete(), or asyncio.create_task() (which returns a Task).
    • Inside a coroutine, asyncio.sleep() should be used instead of time.sleep(), so that the single main thread is not blocked.
    • A well-designed coroutine should be nonblocking, i.e. it yields control back to the event loop (at an await) whenever it has to wait, instead of blocking the thread.
  • Task vs Future
    • Task is a subclass of Future. Future is closer to a Promise in JavaScript and is mostly used by framework and library code.
    • Future has methods such as set_result(), set_exception(), done(), and cancel().
    • asyncio.Future and concurrent.futures.Future are mostly the same, but the latter is not awaitable.
  • asyncio.get_event_loop() (globally) and asyncio.get_running_loop() (inside a coroutine) return the loop, but with asyncio.run() manually managing the loop is rarely needed.
  • To cancel a task, use Task.cancel(); CancelledError is then raised inside the task at the await expression where it is suspended. On normal exit, the coroutine internally raises StopIteration carrying its return value.
  • await asyncio.sleep(0) can be used as a gap to return control to the event loop so that other tasks can proceed. However, CPU-intensive operations should still be delegated to processes.
  • For blocking functions:
    • Run them in a (thread or process) executor with loop.run_in_executor()
    • Or delegate to another thread with asyncio.to_thread()
  • A plain object() instance cannot be used as a sentinel when items are serialized (e.g. passed between processes), since it loses its identity after serialization and deserialization. None is not suitable if it can occur in the data stream.
  • concurrent.futures.as_completed() returns an iterator that yields futures as they complete, which can be combined with tqdm to show a progress bar.
  • Future.result() returns the value, or re-raises the exception that was raised while the call was running.
  • Other async syntax
    • Asynchronous context managers, with __aenter__ and __aexit__, used via async with.
    • Async generators: async def functions that contain yield, consumed with async for.
    • Async iterators, with def __aiter__ and async def __anext__.
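
The notes above on task creation, cancellation, and the other async syntax can be sketched together as follows. All names here (worker, countdown, Resource) are hypothetical and exist only for illustration; this is one possible arrangement, not a canonical pattern.

```python
import asyncio


async def worker(log: list[str]) -> None:
    """Hypothetical long-running coroutine that ends up cancelled."""
    try:
        await asyncio.sleep(3600)  # CancelledError is raised at this await
    except asyncio.CancelledError:
        log.append("cancelled at await")
        raise  # re-raise so the task is actually marked as cancelled


async def countdown(n: int):
    """Async generator: an async def function that contains yield."""
    while n > 0:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield n
        n -= 1


class Resource:
    """Hypothetical async context manager (__aenter__ / __aexit__)."""

    def __init__(self, log: list[str]) -> None:
        self.log = log

    async def __aenter__(self) -> "Resource":
        self.log.append("enter")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self.log.append("exit")
        return False  # do not suppress exceptions


async def main() -> tuple[list[str], list[int]]:
    log: list[str] = []
    task = asyncio.create_task(worker(log))  # schedule the coroutine as a Task
    await asyncio.sleep(0)  # let the worker start and reach its await
    task.cancel()  # request cancellation
    try:
        await task
    except asyncio.CancelledError:
        log.append(f"task.cancelled() is {task.cancelled()}")

    async with Resource(log):
        numbers = [n async for n in countdown(3)]
    return log, numbers


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Re-raising CancelledError inside worker is a deliberate choice: swallowing it would leave the task looking as if it had finished normally.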

Libraries

  • greenlet can be used for cooperative multitasking without special syntax; it’s used by SQLAlchemy internally. gevent, built on greenlet, can patch Python’s socket library to make it nonblocking.
  • The lelo library allows running a function in parallel (in another process) simply by applying its @parallel decorator.
  • Unlike JavaScript, Python supports third-party async runtimes such as Curio and Trio, which may provide a more sensible API. This is possible because the async/await syntax is not tied to asyncio; asyncio itself also allows other implementations of the event loop.