NumPy: Numerical Python

  • Good C API, making NumPy both efficient and ideal for wrapping C, C++, and legacy Fortran code.
  • NumPy internally stores data in contiguous blocks of memory. This takes much less memory than built-in Python sequences, and operations on it avoid the per-element overhead of regular interpreted Python code.
  • NumPy is designed to work with very large arrays, so basic slicing returns a view instead of a copy. (Boolean and fancy indexing create a copy, though.)
  • Numba
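The view-versus-copy behavior noted above can be checked directly. This is a minimal sketch; the array contents and variable names are illustrative only.

```python
import numpy as np

arr = np.arange(10)

# Basic slicing returns a view: writing through it changes the original.
view = arr[2:5]
view[:] = -1

# Boolean indexing returns a copy: writing to it leaves the original alone.
mask = arr > 5
picked = arr[mask]
picked[:] = 0

print(arr)                             # [ 0  1 -1 -1 -1  5  6  7  8  9]
print(np.shares_memory(arr, view))     # True
print(np.shares_memory(arr, picked))   # False
```

np.shares_memory is a convenient way to tell whether two arrays are backed by the same buffer.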

Usage

  • “Fancy indexing” with several index arrays selects individual elements by coordinate tuples, not a rectangular region, e.g. arr[[1, 2, 3, 4], [5, 6, 7, 8]] returns arr[1, 5], arr[2, 6], ... instead of the rectangular selection arr[[1, 2, 3, 4]][:, [5, 6, 7, 8]].
  • NumPy universal functions (ufuncs) are functions that operate element-wise on whole arrays.
  • Many NumPy operations (basic slicing, reshape, transpose) return a view instead of a copy whenever possible.
  • Reshaping
    • reshape method.
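    • ravel and flatten methods: ravel returns a view when possible, while flatten always returns a copy.
    • order='C' or 'F' can be passed for C (row major: traverse higher dimensions first, i.e. the last axis varies fastest) or Fortran (column major: traverse higher dimensions last, i.e. the first axis varies fastest) order.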
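A short sketch of the reshaping notes above, assuming a small 2x3 example array (the names are illustrative):

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# C order walks row by row, Fortran order walks column by column.
c_flat = arr.ravel(order='C')      # [0, 1, 2, 3, 4, 5]
f_flat = arr.ravel(order='F')      # [0, 3, 1, 4, 2, 5]

# ravel returns a view when it can; flatten always returns a copy.
raveled = arr.ravel()
flattened = arr.flatten()
print(np.shares_memory(arr, raveled))     # True
print(np.shares_memory(arr, flattened))   # False
```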
  • vstack and hstack are good shorthands for concatenate for 2D arrays. np.c_ and np.r_ are even more concise.
  • repeat and tile: repeat replicates each element, while tile stacks copies of the whole array, and its argument specifies the “layout” of the tiling.
  • Broadcasting
    • Vectorization and broadcasting: performing operations on whole arrays without writing loops, e.g. with xs, ys = np.meshgrid(...) or np.where(cond, xarr, yarr).
    • Broadcasting rule: two arrays are compatible if, for each trailing dimension (starting from the last), the axis lengths match or either of them is 1. Broadcasting is then performed over the missing or length-1 dimensions.
    • np.newaxis can be used to easily insert a new axis: arr = arr[:, np.newaxis, :].
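The broadcasting rule can be illustrated by demeaning a 2D array along each axis. This is a sketch with made-up data, not a canonical recipe.

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(4, 3)

# (4, 3) minus (3,): the trailing dimensions match, so the column means
# are broadcast across the 4 rows.
col_demeaned = arr - arr.mean(axis=0)

# (4, 3) minus (4,) would fail because 3 != 4. Inserting a length-1 axis
# gives shape (4, 1), which broadcasts across the 3 columns.
row_means = arr.mean(axis=1)
row_demeaned = arr - row_means[:, np.newaxis]

# np.where is a vectorized "x if cond else y".
clipped = np.where(arr > 5, 5.0, arr)
```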
  • “Local reduce” reduceat is similar to grouping: np.add.reduceat(arr, [0, 5, 8]) aggregates arr[0:5] and arr[5:8] and arr[8:].
  • Structured array
    • Can be used to hold somewhat heterogeneous data.
    • dtype = [('x', np.float64), ('y', np.int64)]
    • A field can also have a shape: ('x', np.int64, 3). In this case arr['x'] returns an array with one extra dimension of length 3.
    • Dtypes can be nested further.
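A minimal sketch of the structured dtypes described above; the field values and the names sarr and narr are illustrative assumptions.

```python
import numpy as np

dtype = [('x', np.float64), ('y', np.int64)]
sarr = np.array([(1.5, 6), (np.pi, -2)], dtype=dtype)

xs = sarr['x']           # field access: a float64 array of shape (2,)
first_y = sarr[0]['y']   # a single record, then one of its fields

# A field with its own shape: each record holds 3 integers in 'x'.
nested_dtype = [('x', np.int64, 3), ('y', np.int32)]
narr = np.zeros(4, dtype=nested_dtype)
print(narr['x'].shape)   # (4, 3)
```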
  • Sorting
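    • argsort and np.lexsort((field1, field2)) both return indexers. With lexsort, the last key passed is the primary sort key.
    • kind='stable' (or 'mergesort') requests a stable sort; the other kinds ('quicksort', 'heapsort') are not guaranteed to be stable.
    • np.partition(arr, 3) places the 3 smallest elements at the beginning (in no particular order), followed by the 4th smallest at index 3. argpartition is similar but returns an indexer.
    • arr.searchsorted(v) performs a binary search on sorted data and returns the position where v would be inserted to keep it sorted.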
    • labels = bins.searchsorted(data) can be used to categorize data.
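The sorting helpers above can be sketched as follows. The sample data is made up for illustration.

```python
import numpy as np

values = np.array([5, 0, 1, 3, 2])
order = values.argsort()            # indexer: [1, 2, 4, 3, 0]
sorted_values = values[order]       # [0, 1, 2, 3, 5]

# lexsort sorts by the LAST key first: here by last name, then first name.
last = np.array(['Jones', 'Arnold', 'Arnold', 'Jones', 'Walters'])
first = np.array(['Bob', 'Jane', 'Steve', 'Bill', 'Barbara'])
idx = np.lexsort((first, last))     # [1, 2, 3, 0, 4]

# The 3 smallest elements come first, in no guaranteed order.
smallest3 = np.partition(values, 3)[:3]

# Categorize data into bins with a binary search.
data = np.array([0.5, 12, 250, 999, 7000])
bins = np.array([0, 100, 1000, 5000, 10000])
labels = bins.searchsorted(data)    # [1, 1, 2, 2, 4]
```

A label of i means the value falls in the interval (bins[i-1], bins[i]].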
  • Use np.memmap() to open a memory-mapped file. Any changes are buffered in memory until the flush method is invoked.
  • Memory contiguity is important for performance
    • arr.flags has C_CONTIGUOUS and F_CONTIGUOUS fields.
    • When an array is C_CONTIGUOUS, aggregating along the rows (e.g. arr.sum(1)) is generally faster, since each row is contiguous in memory.
    • arr.copy('F') can create a copy in Fortran order.
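The contiguity flags can be inspected as below. This sketch assumes a 100x100 float64 array (8 bytes per element); actual timings depend on the machine, so none are shown.

```python
import numpy as np

arr_c = np.ones((100, 100), order='C')
arr_f = arr_c.copy('F')               # same values, Fortran memory layout

print(arr_c.flags.c_contiguous)       # True
print(arr_f.flags.f_contiguous)       # True
print(arr_f.flags.c_contiguous)       # False

# Strides show how many bytes to step along each axis.
print(arr_c.strides)                  # (800, 8): rows are contiguous
print(arr_f.strides)                  # (8, 800): columns are contiguous

# Convert back to C order when row-wise access dominates.
back_to_c = np.ascontiguousarray(arr_f)
```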