-
NumPy is esssential for numerical operations and handling arrays
- in a high-level language
History
Python extended for indexing syntax to make array computing easier
-
Array indexing refers to the use of the square brackets to index array values
######### # Single Element Indexing ##### >>> import numpy as np >>> x = np.arange(0b11) >>> x[1] 1 >>> x[0] 0
Extending Python for Numerical Computation
-
Python includes numerical extensions implementing a matrix package
-
with influences from Array Programming Languages
- rather than retrofit an existing numerical language to support the wealth of features found in a general-purpose programming language, it makes more sense to attack the problem from the other end instead
-
with influences from Array Programming Languages
-
All of the basic numeric operators are supported element-wise for matrices
- basic array comparison and logical operators are provided as methods
- The matrix object is designed to support array of arbitrary dimensions
-
Mapping semantics are used to support multi-dimensional product form indexing for matrix objects
-
multi-dimension indices are a sequence of python objects
-
the first object corresponds to the first dimension of the matrix, and so on
-
the following objects are possible for each dimension
- a single integer indicating one element from that dimension and a reduction in the number of dimensions in the returned matrix
- a sequence of integers not neccesarily unique, selecting an arbitrary collection of elements along that dimension
- a slice object which selects a range of elements along the dimension with the optional start index, stop index, and step size
-
the following objects are possible for each dimension
-
the first object corresponds to the first dimension of the matrix, and so on
-
multi-dimension indices are a sequence of python objects
-
Every aritimetic operator is implemented as a special ofunc object with python
-
these functions are to be called with matrix or scalar arguments to return the normal result
-
these functions can be subscripted to indicate that they should be applied at specific rank
- and they can be modified to indicate that instead of a direct product, they should be applied as a generalized outer, or inner product, or as a reduction or accumulation.
-
these functions can be subscripted to indicate that they should be applied at specific rank
-
these functions are to be called with matrix or scalar arguments to return the normal result
-
Full range of primitive data types
- Matrix object supports arrays of chars, bytes, shorts, ints, longs, floats, doubles, complex floats, complex doubles, and raw python objects
-
No changes to the Python Core
-
two new types designed
- one for matrices and one for the functions on matrices
- and the module
-
two new types designed
Numarray and Numpy
-
A new package called Numarray, which had faster operations for large arrays but slower than Numeric on small ones
-
NumPy was created to merge both Numeric and Numarray array objects in python
- Numarray features were ported to Numeric
What is an Array Programming Language?
https://web.archive.org/web/20110227013846/http://www.vector.org.uk/archive/v223/smill222.htm
-
lets use a simple metaphor for classifying programming languages and follow this with examples of programming a simple example in both a hypothetical and representative machine language and in BASIC
- which we will consider representative of a conventional programming language
-
Suppose we have a box of apples from which we wish to select all of the good apples
-
the instructions should conventionally be something like
Reserve a place for the good apples. Then select an apple from the box, and if it is good put it in a reserved place. Select a second apple, and if it is good put it too in the reserved place. … Continue in this manner examining each apple in turn until all of the good apples have been selected.
-
the instructions corresponding to an array language would be
Select all of the good apples from the box
-
-
Based on that analogy we classify programming languages as follows:
- one apple at a time Fortran, BASIC, Pascal, C++, Java
- all the apples at once APL, J, MATLAB, R, JULIA
Reading
The NumPy array: a structure for efficient numerical computing (2011)
-
The NumPy array: a structure for efficient numerical computation
-
NumPy arrays are the standard representation for numerical data in the Pythonic sphere of paradigms.
Three techniques are applied to improve performance
- vectorizing calculations
- avoiding copying data in-memory
- minimizing operation costs
-
-
A NumPy array is a multidimensional, uniform collection of elements
- An array is characterized by the type of elements it contains and by it shape
-
A matrix may be represented as an array of shape M x N
-
NumPy arrays can have any dimensionality
-
they contain other kinds and combinations of elements
- booleans, datetime
-
they contain other kinds and combinations of elements
-
NumPy arrays can have any dimensionality
structure of a NumPy array: a view on memory
-
NumPy array called ndarray describes memory using:
-
Data Pointer
- memory address of first byte in the array
-
Data Type Description
- the kind of elements contained in the array
-
Shape
- the shape of the array
-
Strides
- the number of bytes to skip in memory to proceed to the next element
-
Flags
-
defines whether we are allowed to modify the array
-
whether memory layout is C or Fortran contiguous, etc.
-
In C memory is laid out in "row-major" order
- rows are stored on after another in-memory
- In Fortran columns are stored successivelyx``
-
In C memory is laid out in "row-major" order
-
whether memory layout is C or Fortran contiguous, etc.
-
defines whether we are allowed to modify the array
-
Data Pointer
-
the strided memory model provides a way of viewing the same memory
-
in different ways without copying data
x = np.arange(6).reshape((2,3)) x.strides # (8, 4) # default integer is 4 bytes in-memory # skip 2 integers to get to the next row
-
by modifying strides
-
array can be transposed or reshaped at zero cost
- no memory needs to be copied
-
array can be transposed or reshaped at zero cost
-
the strides, shape, and dtype attributes of an array
- can be specified manually by the user
-
##################
# In each of these cases
# the resulting array points to the same memory
##################
>>> x = np.arange(60).reshape((6,10))
>>> xT = x.T
>>> xT
array([[ 0, 10, 20, 30, 40, 50],
[ 1, 11, 21, 31, 41, 51],
[ 2, 12, 22, 32, 42, 52],
[ 3, 13, 23, 33, 43, 53],
[ 4, 14, 24, 34, 44, 54],
[ 5, 15, 25, 35, 45, 55],
[ 6, 16, 26, 36, 46, 56],
[ 7, 17, 27, 37, 47, 57],
[ 8, 18, 28, 38, 48, 58],
[ 9, 19, 29, 39, 49, 59]])
>>> xT.strides
(4, 40)
>>> z = x.reshape((6,10))
>>> z
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])
>>> z.strides
(40, 4)
>>> z.dtype = np.dtype('uint8')
>>> z
array([[ 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0,
4, 0, 0, 0, 5, 0, 0, 0, 6, 0, 0, 0, 7, 0, 0, 0,
8, 0, 0, 0, 9, 0, 0, 0],
[10, 0, 0, 0, 11, 0, 0, 0, 12, 0, 0, 0, 13, 0, 0, 0,
14, 0, 0, 0, 15, 0, 0, 0, 16, 0, 0, 0, 17, 0, 0, 0,
18, 0, 0, 0, 19, 0, 0, 0],
[20, 0, 0, 0, 21, 0, 0, 0, 22, 0, 0, 0, 23, 0, 0, 0,
24, 0, 0, 0, 25, 0, 0, 0, 26, 0, 0, 0, 27, 0, 0, 0,
28, 0, 0, 0, 29, 0, 0, 0],
[30, 0, 0, 0, 31, 0, 0, 0, 32, 0, 0, 0, 33, 0, 0, 0,
34, 0, 0, 0, 35, 0, 0, 0, 36, 0, 0, 0, 37, 0, 0, 0,
38, 0, 0, 0, 39, 0, 0, 0],
[40, 0, 0, 0, 41, 0, 0, 0, 42, 0, 0, 0, 43, 0, 0, 0,
44, 0, 0, 0, 45, 0, 0, 0, 46, 0, 0, 0, 47, 0, 0, 0,
48, 0, 0, 0, 49, 0, 0, 0],
[50, 0, 0, 0, 51, 0, 0, 0, 52, 0, 0, 0, 53, 0, 0, 0,
54, 0, 0, 0, 55, 0, 0, 0, 56, 0, 0, 0, 57, 0, 0, 0,
58, 0, 0, 0, 59, 0, 0, 0]], dtype=uint8)
>>> z.shape
(6, 40)
>>> z.strides
(40, 1)
numerical operations on arrays: vectorization
- Grouping element-wise operations together as a process is vectorization
a = [1, 3, 5]
b = [3*x for x in a]
# VS.
# scalars and arrays
a = np.array([1, 3, 5])
b = 3 * a
#########
# fast element-wise operation
########
b - a
#########
# operations are broadcasted across the array
#########
m = np.arange(60).reshape((6,10))
b + m
####
# broadcasted arrays are never physically constructed
####
broadcasting rules
-
NumPy verifies all dimensions are matched before broadcasting two arrays
-
the dimension of the output array is expanded to the larger of the two
- therefore arrays are compatible to yield and output new shape
-
the dimension of the output array is expanded to the larger of the two
vectorization and broadcasting
-
evaluating functions
def f(x): return sin(x/2) ** 2 + cos(x) + cos(x) sin(x/2)**2 x = np.arange(1e5) y = [f(i) for i in x]
-
most array-based systems don't provide a convention for circumventing creation of temporary arrays
-
the user may choose to perform operations in-place
- by not allocating new memory and keeping all results in the current array
-
the user may choose to perform operations in-place
-
most array-based systems don't provide a convention for circumventing creation of temporary arrays
-
finite differencing
-
the derivative on a discrete sequence is usually computed using finite differencing
-
slicing makes this operation trivial
x = np.arange(0, 10, 2) y = x**2 dy_dx = (y[1:] - y[:-1]) / (x[1:] - x[:-1]) # y[1:] - y[:-1] has the effect of subtracting, # from each element in the array, # the element directly preceding it # calculating the central divided difference dy_dx_c = (y[2:]-y[:-2])/(x[2:]-x[:-2]) # using NumPy is 100x faster than Python
-
- performs the same differencing on the x array and dividing the two resulting arrays yields the forward divided difference
-
the derivative on a discrete sequence is usually computed using finite differencing
-
Creating a grid using broadcasting
-
suppose we want to produce a three-dimensional grid of distances
-
most vectorized programming languages require forming three intermediate MxN arrays, i, j, and k
########### # mgrid object # produces a meshgrid when sliced ############ i, j, k = np.mgrid[-100:100, -100:100, -100:100]
-
Broadcasting does not require large intermediate arrays
- construct coordinate vectors
-
Broadcasting does not require large intermediate arrays
-
# Construct the row vector: from-100 to +100 i = np.arange(-100, 100).reshape(200, 1, 1) # Construct the column vector j = np.reshape(i, (1, 200, 1)) # Construct the depth vector k = np.reshape(i, (1, 1, 200)) ### # OR JUST USE SHORTHAND ### i, j, k = np.ogrid[-100:100,-100:100,-100:100]
-
suppose we want to produce a three-dimensional grid of distances
-
Computer Vision
n x 3 array of 3D points 3 x 3 camera matrix
pts = np.random.random((100000, 3)) camera = np.array([[600., 0., 1800.], [...], [...]]) vecs = camera.dot(pts.T).T ###### # numpy's array operations with element-by-element # division and the broadcasting machinery ##### pixel_coords = vecs/vecs[:, 2, np.newaxis]
-
we want to transform 3D coordinates into 2D pixel locations on an image
-
this operation involves taking the matrix dot product of each point with the camera matrix
- then dividing the resulting vector by its third component
-
this operation involves taking the matrix dot product of each point with the camera matrix
-
When repeated operations take place on very LARGE chunks of memory
-
it may be better to use an outer for-loop
-
combined with a vectorized inner loop
- to make optimal use of system cache
-
combined with a vectorized inner loop
-
it may be better to use an outer for-loop
-
we want to transform 3D coordinates into 2D pixel locations on an image
Sharing Data
Efficient I/O with memory mapping
-
an array storedon disk may be addressed directly without copying it to memory using memory mapping
-
its useful for addressing only a small portion of a very large array
a = np.memmap('/tmp/myarray.memmap', mode='write', shape=(300,300), dtype=np.int) a.flat = np.arange(300* 300) # when the flush is called its data is written to disk a.flush() # the array can now be loaded # and parts of it can be manipulated b = np.memmap('/tmp/myarray.memmap', mode='r+', shape=(300, 300), dtype=np.int) b[100, :] *= 2 b.flush()
-
the array interface for foreign blocks of memory
-
NumPy defines an array interface
-
that specifies how a given objects
- exposes a block of memory
-
that specifies how a given objects
-
NumPy knows how to view any object
-
with a valid arrayinterface dictionary attribute as an array
-
important values include
-
data, address of data in-memory
-
shape
-
typestr
import ctypes class MutableString(object): def __init__(self, s): # Allocate string memory self._s = ctypes.create_string_buffer(s) self.__array_interface__ = { # Shape of the array ’shape’: (len(s),), # Address of data, # the memory is not read-only ’data’: (ctypes.addressof(self._s), False), # Stores 1-byte unsigned integers. # "|" indicates that Endianess is # irrelevant for this data-type. ’typestr’: ’|u1’, } def __str__(self): #"Convert to a string for printing." return str(buffer(self._s))
-
-
important values include
-
with a valid arrayinterface dictionary attribute as an array
structured data-types to expose complex data
-
NumPy arrays are homogeneous
- each element of the array has the same type
-
arrays that store compound elements
- combinations of data-types
- are strucutred arrays
dt = np.dtype([('time', np.uint64),
('pos', [('x', float),
('y', float)])])
x = np.array([(1, (0,0.5)),
(2, (0, 10.3)),
(3, (5.5, 1.1))],
dtype=dt)
x['time']
times = (x['time'] >= 2)
x[times]['pos']['x']
################
# reading complex binary files
##################
data = np.fromfile("foo.dat", dtype=dt)
Basics
Getting Started
Creating NumPy Arrays
import numpy as np
# 1D Array creation
arr_1d = np.array([1,2,3])
print(f"1D array: {arr_1d}")
# 2D Array
arr_2d = np.array([1,2,3],[4,5,6])
print(f"2D array: {arr_2d}")
# create an array of zeros
zeros = np.zeros((3,3)) # 3x3 matrix
print(f"array of zeros: {zeros}")
# create an array of ones
ones = np.ones((2,4))
print(f"array of ones: {ones}")
# create an array with a range of values
range_arr = np.arange(0, 10, 2)
print(f"range array: {range_arr}")
Basic Array Operations
# Array Addition
arr_sum = arr_1d + 10
# Array Multiplication
arr_product = arr_1d * 2
# Element-wise multiplication of two arrays
arr_2d_product = arr_2d * np.array([1, 2, 3])
Reshaping Arrays
# 1D to 2D array
arr_reshaped = np.arange(12).reshape((3,4)) # 3 rows 4 cols
Mathematical Functions on Arrays
sqrt_array = np.sqrt(arr_reshaped)
# add 1 to avoid log(0)
log_array = np.log1p(arr_reshaped)
Statistical Operations
arr = np.array([1, 2, 3, 4, 5, 6, 7, 9, 10])
mean_v = np.mean(arr)
median_v = np.median(arr)
std_v = np.std(arr)
Random Data Generation
random_coords = np.random.uniform(low=-90, high=90, size=(5, 2))
Indexing
**NumPy targets the CPython reference implementation of Python NumPy uses C-Order Indexing
-
that means the last index usually represents the most rapidly changing memory location
- Fortran and IDL use the first index to represent the most rapidly changing location in memory
>>> x = np.arange(0b110)
>>> x
array([0, 1, 2, 3, 4, 5])
>>> x.shape = (0b1, 0b10, 0b11)
>>> x
array([[[0, 1, 2],
[3, 4, 5]]])
>>> x[0,1]
array([3, 4, 5])
>>> x[0][1]
array([3, 4, 5])
>>> x[:0]
array([], shape=(0, 2, 3), dtype=int32)
###############
# slicing and string arrays
###############
>>> x = np.arange(0b1111)
>>> x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
>>> x.shape = (0b011, 0b101,)
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[2:5:12]
array([[10, 11, 12, 13, 14]])
>>> x[2::12]
array([[10, 11, 12, 13, 14]])
>>> x[2:12]
array([[10, 11, 12, 13, 14]])
>>> x[4:12]
array([], shape=(0, 5), dtype=int32)
>>> x[1:12]
array([[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> x[0:12]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> arr = np.arange(12).reshape((2,6))
>>> arr
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11]])
>>> condition = arr > 5
>>> arr[condition]
array([ 6, 7, 8, 9, 10, 11])