NumPy: Numerical Python
Good C API, making NumPy both efficient and ideal for wrapping C ,
C++ , and legacy Fortran code.
NumPy internally stores data in contiguous blocks of memory, this takes much
less memory, and operation on it don’t have overhead with regular interpreted
Python code.
NumPy is designed to work with very large arrays, so all slicing returns a
view instead of a copy. (Boolean indexing creates a copy, thought.)
Numba
Usage
“Fancy indexing” selects with “coordinates tuple”, not rectangular regions,
e.g. arr[[1, , 3, 4], [5, 6, 7, 8]] actually returns
arr[1, 5], arr[2, 4]... instead of arr[[1, 2, 3, 4]][:, [5, 6, 7, 8]].
NumPy universal functions ufunc are functions that operate element wise on
the whole array.
NumPy almost always return a view instead of copy.
Reshaping
reshape method.
ravel and flatten method, the latter returns a copy !
order='C' or 'F' can be passed for C (traverse higher dimensions
first ) or Fortran (traverse higher dimensions last ) style
order.
vstack and hstack are good shorthands for concatenate for 2D arrays.
np.c_ and np.r_ are even more concise.
repeat and tile, with tile, the arg specifies the “layout” of the
tiling.
Broadcasting
Vectorization and broadcast : performing operations on array without
writing loops, e.g. with xs, ys = np.meshgrid(...) or
np.where(cond, xarr, yarr)
Broadcasting rule: for each trailing dimension, the axis length match or
either is 1. The broadcast is made over the missing or or length 1
dimension.
np.newaxis can be used to easily create new axis:
arr = arr[:, np.newaxis, :].
“Local reduce” reduceat is similar to grouping:
np.add.reduceat(arr, [0, 5, 8]) aggregates arr[0:5] and arr[5:8] and
arr[8:].
Structured array
Can be used to hold somewhat heterogenous data.
dtype = [('x', np.float64), ('y', np.int64)]
Can also add shape: ('x', np.int64, 3), in this case arr['x'] returns an
array.
This can be further nested
Sorting
argsort and np.lexsort((field1, field2)) both returns indexers.
kind='mergesort' is the only available stable sorting.
np.partition(arr, 3) will populate the least 3 elements in the beginning,
argpartition is similar but returns an indexer.
arr.searchsorted() performs binary search on sorted data.
labels = bins.searchsorted(data) can be used to categorize data.
Use np.memmap() to load a memmap file. Any changes will be buffered in mem
until flush method is invoked.
Memory contiguity is important for performance
arr.flags has C_CONTINUOUS and F_CONTIGUOUS fields.
When an array is C_CONTIGUOUS, aggregation on the rows are much faster.
arr.copy('F') can create a copy in Fortran order.