NumPy: Numerical Python

NumPy has a good C API, which makes it both efficient and well suited to wrapping C, C++, and legacy Fortran code.
NumPy stores data internally in contiguous blocks of memory. This takes much less memory than built-in Python sequences, and operations on the data avoid the overhead of regular interpreted Python code.
NumPy is designed to work with very large arrays, so basic slicing returns a view instead of a copy. (Boolean and fancy indexing create a copy, though.)
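The view-versus-copy distinction can be checked directly. This is a minimal sketch with made-up data; np.shares_memory is used here only as a convenient way to test whether two arrays overlap in memory.

```python
import numpy as np

arr = np.arange(10)

# Basic slicing returns a view: writing through it changes the original.
view = arr[2:5]
view[:] = 0
shares_slice = np.shares_memory(arr, view)    # True

# Boolean indexing returns a copy: writing to it leaves the original alone.
mask = arr > 6
picked = arr[mask]
picked[:] = -1
shares_mask = np.shares_memory(arr, picked)   # False
```

After this runs, arr[2:5] is zeroed but the elements greater than 6 are untouched.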
Numba

Usage

“Fancy indexing” selects elements by “coordinate tuples”, not rectangular regions: e.g. arr[[1, 2, 3, 4], [5, 6, 7, 8]] actually returns arr[1, 5], arr[2, 6], ... instead of arr[[1, 2, 3, 4]][:, [5, 6, 7, 8]].
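A small sketch of the difference, using a 10×10 array built so that arr[i, j] == 10 * i + j. The np.ix_ helper is not mentioned in the notes above; it is included as one common way to get the rectangular selection in a single indexing step.

```python
import numpy as np

arr = np.arange(100).reshape(10, 10)   # arr[i, j] == 10 * i + j

rows = [1, 2, 3, 4]
cols = [5, 6, 7, 8]

# Paired coordinates: arr[1, 5], arr[2, 6], arr[3, 7], arr[4, 8]
paired = arr[rows, cols]               # shape (4,)

# Rectangular region: every selected row crossed with every selected column
block = arr[rows][:, cols]             # shape (4, 4)
block_ix = arr[np.ix_(rows, cols)]     # same region, one indexing step
```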
NumPy universal functions (ufuncs) are functions that operate element-wise on whole arrays.
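For illustration, a few ufuncs applied to small made-up arrays (the inputs are arbitrary):

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([2.0, 3.0, 10.0])

roots = np.sqrt(x)          # unary ufunc, applied to every element
larger = np.maximum(x, y)   # binary ufunc, element-by-element maximum
total = np.add(x, y)        # the ufunc behind x + y
```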
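NumPy returns a view instead of a copy whenever it can (basic slicing, transposing, and most reshapes); fancy and boolean indexing are the main exceptions and return copies.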
Reshaping
- reshape method.
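- ravel and flatten methods: ravel returns a view when possible, while flatten always returns a copy!
- order='C' or 'F' can be passed for C order (row-major: traverse higher dimensions first, so the last axis varies fastest) or Fortran order (column-major: traverse higher dimensions last, so the first axis varies fastest).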
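The reshaping points above can be sketched on a six-element array (the data is arbitrary):

```python
import numpy as np

arr = np.arange(6)

c_order = arr.reshape((2, 3))              # default order='C': fill row by row
f_order = arr.reshape((2, 3), order='F')   # order='F': fill column by column

raveled = c_order.ravel()      # a view here, because c_order is contiguous
flattened = c_order.flatten()  # always a fresh copy

ravel_is_view = np.shares_memory(raveled, arr)      # True
flatten_is_view = np.shares_memory(flattened, arr)  # False
```

Here c_order is [[0, 1, 2], [3, 4, 5]] while f_order is [[0, 2, 4], [1, 3, 5]].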
vstack and hstack are good shorthands for concatenate for 2D arrays. np.c_ and np.r_ are even more concise.
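A quick comparison of the stacking helpers on two small 2D arrays (example data only):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows_stacked = np.vstack((a, b))   # same as np.concatenate((a, b), axis=0)
cols_stacked = np.hstack((a, b))   # same as np.concatenate((a, b), axis=1)

rows_short = np.r_[a, b]           # shape (4, 2)
cols_short = np.c_[a, b]           # shape (2, 4)
```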
repeat and tile: repeat repeats each element in place, while tile stacks copies of the whole array. With tile, the argument specifies the “layout” of the tiling.
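A short sketch of the difference between the two, on an arbitrary three-element array:

```python
import numpy as np

arr = np.array([1, 2, 3])

repeated = arr.repeat(2)       # each element twice: [1, 1, 2, 2, 3, 3]
tiled = np.tile(arr, 2)        # whole array twice:  [1, 2, 3, 1, 2, 3]
grid = np.tile(arr, (2, 2))    # 2 x 2 layout of copies, shape (2, 6)
```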
Broadcasting
- Vectorization and broadcasting: performing operations on arrays without writing loops, e.g. with xs, ys = np.meshgrid(...) or np.where(cond, xarr, yarr).
- Broadcasting rule: two arrays are compatible if, for each trailing dimension (aligning from the end), the axis lengths match or either one is 1. Broadcasting is then performed over the missing or length-1 dimensions.
- np.newaxis can be used to easily create new axis: arr = arr[:, np.newaxis, :].
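The broadcasting rule and np.newaxis can be sketched by demeaning a small array along each axis. The data and variable names are illustrative only.

```python
import numpy as np

arr = np.arange(12.0).reshape(4, 3)

# (4, 3) - (3,): trailing dimensions match, so the means broadcast down the rows.
col_means = arr.mean(axis=0)
demeaned_cols = arr - col_means

# (4, 3) - (4,) would not broadcast; adding a length-1 axis gives (4, 1), which does.
row_means = arr.mean(axis=1)
demeaned_rows = arr - row_means[:, np.newaxis]

# Loop-free conditional selection: keep values above 5, zero out the rest.
picked = np.where(arr > 5, arr, 0.0)

# meshgrid builds coordinate grids, so a function of (x, y) needs no loops.
points = np.arange(3)
xs, ys = np.meshgrid(points, points)
dist = np.sqrt(xs ** 2 + ys ** 2)      # shape (3, 3)
```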
“Local reduce” reduceat is similar to grouping: np.add.reduceat(arr, [0, 5, 8]) aggregates arr[0:5] and arr[5:8] and arr[8:].
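The reduceat call above, run on a concrete ten-element array so the slices can be checked by hand:

```python
import numpy as np

arr = np.arange(10)

# Sums of arr[0:5], arr[5:8], and arr[8:]
sums = np.add.reduceat(arr, [0, 5, 8])   # [10, 18, 17]
```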
Structured array
- Can be used to hold somewhat heterogeneous data.
- dtype = [('x', np.float64), ('y', np.int64)]
- Can also add a shape: ('x', np.int64, 3); in this case arr['x'] returns an array with an extra trailing dimension of length 3.
- This can be further nested
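A minimal sketch of the three cases above: a plain structured dtype, a field with a shape, and a nested dtype. The field names and values are made up for illustration.

```python
import numpy as np

# Plain structured dtype with two fields
dtype = [('x', np.float64), ('y', np.int64)]
points = np.array([(1.5, 6), (np.pi, -2)], dtype=dtype)
xs = points['x']                 # ordinary float64 array of the 'x' field

# A field with a shape: each record holds 3 integers under 'x'
vec_dtype = [('x', np.int64, 3), ('y', np.int32)]
vecs = np.zeros(4, dtype=vec_dtype)
vec_shape = vecs['x'].shape      # (4, 3)

# Nested dtype: the 'pos' field is itself a structured type
nested_dtype = [('pos', [('a', np.float64), ('b', np.float32)]), ('id', np.int32)]
data = np.array([((1, 2), 5), ((3, 4), 6)], dtype=nested_dtype)
inner = data['pos']['a']         # array([1., 3.])
```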
Sorting
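- argsort and np.lexsort((field1, field2)) both return indexers. lexsort uses the last key passed as the primary sort key.
- kind='stable' (or the older name 'mergesort') requests a stable sort; the default quicksort is not guaranteed to be stable.
- np.partition(arr, 3) moves the 3 smallest elements to the front (in no particular order) and puts the element of sorted rank 3 at index 3; argpartition is similar but returns an indexer.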
- arr.searchsorted() performs binary search on sorted data.
- labels = bins.searchsorted(data) can be used to categorize data.
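The sorting helpers above, sketched on small made-up inputs. The names and bin edges are illustrative only.

```python
import numpy as np

# argsort returns the indices that would sort the array.
values = np.array([5, 0, 1, 3, 2])
order = values.argsort()               # [1, 2, 4, 3, 0]
sorted_values = values[order]

# lexsort: the LAST key (last_name) is the primary key, first_name breaks ties.
last_name = np.array(['Jones', 'Arnold', 'Arnold', 'Jones', 'Walters'])
first_name = np.array(['Bob', 'Jane', 'Steve', 'Bill', 'Barbara'])
by_name = np.lexsort((first_name, last_name))   # [1, 2, 3, 0, 4]

# partition: the 3 smallest values come first, in no guaranteed order.
data = np.array([9, 4, 7, 1, 8, 2, 6])
part = np.partition(data, 3)
smallest_three = sorted(part[:3].tolist())      # [1, 2, 4]
idx = np.argpartition(data, 3)                  # indexer version

# searchsorted as a binning tool: index of the bin edge each sample falls under.
bins = np.array([0, 100, 1000, 5000, 10000])
samples = np.array([9940, 67, 498, 7228])
labels = bins.searchsorted(samples)             # [4, 1, 2, 4]
```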
Use np.memmap() to open a memory-mapped file. Any changes are buffered in memory until the flush method is invoked.
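A minimal sketch of writing and re-reading a memory-mapped array. The file name and the temporary directory are hypothetical choices for the example; dtype and shape have to be given again on reopening because the raw file stores no metadata.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'demo.dat')

# mode='w+' creates (or overwrites) the file with the requested shape
mm = np.memmap(path, dtype='float64', mode='w+', shape=(100, 10))
mm[:5] = 1.0     # assign to the first five rows
mm.flush()       # push buffered changes to disk
del mm

# Reopen read-only; dtype and shape must match what was written
reopened = np.memmap(path, dtype='float64', mode='r', shape=(100, 10))
total = float(reopened.sum())   # 5 rows * 10 columns * 1.0
```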
Memory contiguity is important for performance
- arr.flags has C_CONTIGUOUS and F_CONTIGUOUS fields.
- When an array is C_CONTIGUOUS, aggregation along rows (e.g. arr.sum(1)) is generally faster than along columns, because each row occupies adjacent memory.
- arr.copy('F') can create a copy in Fortran order.
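The flags and the effect of arr.copy('F') can be inspected on a small array. This sketch only checks the memory layout; it makes no timing claims, since actual speedups depend on array size and hardware.

```python
import numpy as np

arr_c = np.arange(12.0).reshape(3, 4)   # C order by default
arr_f = arr_c.copy('F')                 # same values, Fortran (column-major) layout

c_flags = (arr_c.flags.c_contiguous, arr_c.flags.f_contiguous)   # (True, False)
f_flags = (arr_f.flags.c_contiguous, arr_f.flags.f_contiguous)   # (False, True)

# Strides show bytes to step along each axis (float64 is 8 bytes per element).
c_strides = arr_c.strides   # (32, 8): moving along a row is the small step
f_strides = arr_f.strides   # (8, 24): moving down a column is the small step
```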
