Pythonic

Python is “simple and correct”!

  • Walrus operator :=, a := 10 will perform assignment and also return 10.
  • case [str(name), _, _, (float(lat), float(lon))]:
  • == is usually what you want. is is mostly used for comparison with None or other sentinels, it can’t be overloaded so it’s slightly faster than __eq__
  • Weak references | Fluent Python, several classes can be used to reference objects without actually being counted, useful for caches.
  • Prefer EAFP (easy to ask forgiveness than permission) over LBYL (look before you leap), with try-except blocks. This improves safety in multi-threaded programs.

String

  • Single quote by default.
  • '%s/%s' % ('hello', 'world), the % formating operator understands tuple.
  • '\N{NAME OF UNICODE CHAR}'
  • str.maketrans and str.translate
  • locale.strxform for transforming Unicode text to locale-aware form for comparison. Alternative, use pyuca and the more powerful pyicu.
  • unicodedata module for metadata about unicode characters.
  • RegEx built from str and bytes are different
  • Use reprlib.repr() to get concise and readable representation for objects

Data Models

  • l[i, ...] could mean l[i, :, :, :]
  • __missing__ method may or may not be called, depending the class we’re subclassing
  • dict is now ordered by default, but OrderedDict is still optimized for ordering
  • types.MutableMappingProxy
  • Consider extending from UserDict instead of dict due to their different implementations
  • dict.setdefault is helpful for updating mutable mappings
  • dict.keys and dict.vales return a view, which is similar to a set
  • memoryview creates a shared view without copying bytes
  • __post_init__ method for data classes
  • Inherit from TypedDict to have annotations for fields in a dictionary. Unlike named tuple or data classes, it has no runtime effect, can’t set defaults.

Object Oriented

  • Mixin classes can be used to provide additional functionalities to a class when inherited.
  • Class attributes are stored in a shared dictionary, hence adding attributes after instantiation compromises performance.
  • In data classes, fields without type hints are considered class attributes, unless it’s typed as ClassVar or InitVar
  • Use __match_args__ method to implement objects matching with positional attributes
  • Name mangling: __attr becomes _ClassName__attr
  • A method calls super() will activate the classes in __mro__, the corresponding method may need to also call super() to cooperate
  • Extend from UserDict, UserList, UserStr, since the built-in types are implemented in C and they don’t call magical methods in the expected way.
  • Try to avoid inheritance unless the benefit is clear. Static duck typing is better.
  • Operator overloading
    • For a + b, a.__add__ is invoked first, if doesn’t work, then b.__radd__
    • Return NotImplemented if not supported
  • When accessing attributes from an instance, it first searches in obj.__class__ for properties, if it doesn’t exist, falls back to a key in obj.__ditc. then to obj.__getattr__. This can be bypassed by directly accessing obj.__dict__.
  • Properties are class attributes. Can be created with @properties, with corresponding decorators for getters and setters. Can also be created with property factory functions.
  • property(fget=None, fset=None, fdel=None, doc=None)
  • __init__ is initializer, __new__ is the real constructor.
  • Using descriptors as getter/setter to validate properties. __set__, __get__ methods.
  • Every function is a non-overriding descriptor. Cls.func.__get__(obj) returns a method bound to obj, Cls.func.__get__(None, Cls) returns a function without binding.

Metaprogramming

  • type(name, bases, dict) can be used to construct classes in run time. type is a meta class that build other classes.
  • __init_subclass__ can be used to customize some behaviors of subclasses when they’re created. Note that __slots__ can’t be tweaked here since the classes have been constructed by __new__ — meta class is needed for this.
  • type is an instance of type, type is a subclass of object, object is an instance of type.
  • Every class is an instance of type (maybe indirectly), but only meta classes are subclass of type. Meta classes like abc.ABCMeta inherits from type the power to construct classes.
  • MetaKlass.__new__ constructs new classes.

Utils

  • sorted is based on __lt__ method
  • func.__code__ to view the code, and func.__code__.co_varnames and func.__code__.co_freevars to view the var names. func.__closure__ to view the closure. Content can be viewed with func.__closure__[0].cell_contents
  • Given a str, we can have str.fmt to use it as a formatting string.
  • inspection lib, to help inspect objects.
  • local.getpreferredencoding
  • list(zip(*a)) can be used to transpose a 2D list
  • itertools.chain to chain generators together

Functions

  • Position-only argument, tuple-captured arguments, keyword-only arguments… e.g. func(a, b, /, *d, e, **f)
  • operator library, which contains operators that can be used together with reduce, or to be used to avoid creating lambda functions. itemgetter, attrgetter, and methodcaller can be used to create a function that accesses certain item/attr, or call a certain method. e.g. key=itemgetter(2).
  • Use functool.partial and partialmethod to create a callable from a callable with arguments bound to certain values.
  • Optional arg, Optional[str] is a shorthand of Union[str, None], or str | None
  • tuple[int, ...] for unlimited tuple
  • Use abc.Mapping instead of dict in param type hints. Parameters should be typed with generics, returns should be with concretes.
  • @functools.wraps(func) decorator.
  • @cache can take up all memory, @lru_cache with maxsize is safer. maxsize should be a power of 2. typed defaults to False
@singledispatch
def htmlize(obj: object) -> str:
    content = html.escape(repr(obj))
    return f'<pre>{content}</pre>'
 
@htmlize.register
def _(text: str) -> str:
    content = html.escape(text).replace('\n', '<br/>\n')
    return f'<p>{content}</p>'
 
# stack and pass types if needed
@htmlize.register(decimal.Decimal)
@htmlize.register(float)
def _(x) -> str:
    ...
  • locals() can be used to access all local vars.

Types

Duck Typing

Prefer duck typing when possible, then goose typing, then nominal typing (isinstance) if really needed.

  • Type x is consistent with y, if x supports all operations of y. Any is a special case.
  • TypeVar
    • T = TypeVar('T', int, float), or T = TypeVar('T', bound=Hashable), other options are also supported
    • TypeVar can be replaced with type keyword in 3.12. But for generic classes they still implicitly inherit from Generic
    • Use variance argument to specify
# define a protocol
from typing import Protocol, Any
class SupportsLessThan(Protocol):
    def __lt__(self, other: Any) -> bool: ...
LT = TypeVar('LT', bound=SupportsLessThan)
  • typing.TYPE_CHECKING is a bool that is False in runtime and True in checking
  • __annotations__ can be used to read type hints from data classes. The annotations are read as string, and evaluated when called with inspect.get_annotations()
  • Goose typing by sub-classing the ABC, or define classes first, then register them as virtual subclass of an ABC.
  • Duck typing by EAFP (easier to ask for forgiveness than for permission), e.g. to assert o can is compatible with list, simply call list(o).
  • __subclasshook__ can be used to modify the behavior of subclass/instance. For example, any object with __len__ will be considered a Sequence, even without subclassing or registering.
  • However, runtime checks don’t always work well, static protocol typing is still helpful.
  • @runtime_checkable to make protocols checkable at runtime.
  • It’s better to make protocols “single method”, as in SupportsComplex.
  • Typing for numbers are tricky, since number.Number has no methods in ABC
  • Python interpreter goes a long way for iterators and sequences to work, even when methods are partially implemented.
  • Use @overload to define multiple function signatures.

Decorators

  • pytest.mark.parametrize for testing. e.g. @mark.parametrize('a, b', [...])
  • @classmethod receives class itself as first arg, @staticmethod receives nothing
  • @typing.final - cannot be overridden, @typing.override - explicitly tell type checker that this function overrides super class

Iterators and Generators

  • re.finditer performs lazy evaluation (unlike findall), useful for building lazy evaluated iterators.
  • An object is Iterable as long as it have __getitem__ (doesn’t have to have __next__)
  • To build iterator, also try using yield, yield from, and generator expressions. Raise StopIteration exception if done.
  • Generator and Coroutine types are functionally equivalent, but classic coroutine with generator function should use Generator
  • Iterator produces items, generator consumes items.
  • Pattern: send a sentinel value to generator to terminate it; raise StopIteration inside generator to terminate it.
  • When gen.close() is invoked, a StopIteration exception will be raised in the yield statement.

Control Flow

Context Manager

  • ctx.__enter__ and ctx.__exit__ will be called. ctx.__exit__ will also be called with exceptions for handling.
  • gen.__exit__ returns a truthy value to indicate that exceptions have been handled, otherwise they will be propagated.
  • Multiple contexts since Python 3.10: with ( Ctx1() as ctx1, Ctx2() as ctx2 ):
  • @contextlib.contextmanager to make a function a context. Anything before yield will be __enter__, the yielded object will be given to as, anything after will be __exit__. It’s assumed that the context handles the exceptions.
  • Functions decorated with @contextmanager can be used to decorate other functions.

else Block

  • else can be used with for and while, which will be executed if the loop is terminated by break.
  • else can be used with except, which will be executed if the try block didn’t raise any exceptions.

Modules and Packages

  • Modules are also first-class objects, can be found in sys.modules.
  • Avoid doing things in the module’s body except binding attributes.
  • Special methods like __getattr__ at module body will work.
  • imoprt builtins or the __builtins__ attribute can be used to monkey patch builtins.
  • If the first statement is a string literal, it’s bound to __doc__
  • from mod import * binds all attributes except those starting with _. Module can define a __all__ for a list of names that should be bound.
  • Importing is implemented in builtin.__import__, it first searches in sys.modules
  • .pth file for Python path config. When importing, sys.path is looked up, the final path for module M would be M/__init__.py
  • Use python -m testname to see if a name exists in stdlib.
  • importlib.reload(M) can reload a module, but existing bindings to attributes won’t be reloaded (hence import is preferred over from when importing a module)
  • Hooks can be added for module importing.
  • P/__init__.py, P’s module body, must be present for P to be recognized as a package (except for namespace packages). If P.__all__ is not set, from P import * will only import names in __init__.py
  • Relative import: from . import X will look for X in current package, more dots can go up in hierarchy.