NumPy is esssential for numerical operations and handling arrays
- in a high-level language

History

Python extended for indexing syntax to make array computing easier

Array indexing refers to the use of the square brackets to index array values

#########
# Single Element Indexing
#####
>>> import numpy as np
>>> x = np.arange(0b11)
>>> x[1]
1
>>> x[0]
0

Extending Python for Numerical Computation

Python includes numerical extensions implementing a matrix package
- with influences from Array Programming Languages
  - rather than retrofit an existing numerical language to support the wealth of features found in a general-purpose programming language, it makes more sense to attack the problem from the other end instead
All of the basic numeric operators are supported element-wise for matrices
- basic array comparison and logical operators are provided as methods
The matrix object is designed to support array of arbitrary dimensions
Mapping semantics are used to support multi-dimensional product form indexing for matrix objects
- multi-dimension indices are a sequence of python objects
  - the first object corresponds to the first dimension of the matrix, and so on
    - the following objects are possible for each dimension
      - a single integer indicating one element from that dimension and a reduction in the number of dimensions in the returned matrix
      - a sequence of integers not neccesarily unique, selecting an arbitrary collection of elements along that dimension
      - a slice object which selects a range of elements along the dimension with the optional start index, stop index, and step size
Every aritimetic operator is implemented as a special ofunc object with python
- these functions are to be called with matrix or scalar arguments to return the normal result
  - these functions can be subscripted to indicate that they should be applied at specific rank
    - and they can be modified to indicate that instead of a direct product, they should be applied as a generalized outer, or inner product, or as a reduction or accumulation.
Full range of primitive data types
- Matrix object supports arrays of chars, bytes, shorts, ints, longs, floats, doubles, complex floats, complex doubles, and raw python objects
No changes to the Python Core
- two new types designed
  - one for matrices and one for the functions on matrices
- and the module

Numarray and Numpy

A new package called Numarray, which had faster operations for large arrays but slower than Numeric on small ones
NumPy was created to merge both Numeric and Numarray array objects in python
- Numarray features were ported to Numeric

What is an Array Programming Language?

https://web.archive.org/web/20110227013846/http://www.vector.org.uk/archive/v223/smill222.htm

lets use a simple metaphor for classifying programming languages and follow this with examples of programming a simple example in both a hypothetical and representative machine language and in BASIC
- which we will consider representative of a conventional programming language
Suppose we have a box of apples from which we wish to select all of the good apples
- the instructions should conventionally be something like
  
  Reserve a place for the good apples. Then select an apple from the box, and if it is good put it in a reserved place. Select a second apple, and if it is good put it too in the reserved place. … Continue in this manner examining each apple in turn until all of the good apples have been selected.
- the instructions corresponding to an array language would be
  
  Select all of the good apples from the box
Based on that analogy we classify programming languages as follows:
- one apple at a time Fortran, BASIC, Pascal, C++, Java
- all the apples at once APL, J, MATLAB, R, JULIA

Reading

The NumPy array: a structure for efficient numerical computing (2011)

The NumPy array: a structure for efficient numerical computation
- NumPy arrays are the standard representation for numerical data in the Pythonic sphere of paradigms.
  
  Three techniques are applied to improve performance
  - vectorizing calculations
  - avoiding copying data in-memory
  - minimizing operation costs
A NumPy array is a multidimensional, uniform collection of elements
- An array is characterized by the type of elements it contains and by it shape
- A matrix may be represented as an array of shape M x N
  - NumPy arrays can have any dimensionality
    - they contain other kinds and combinations of elements
      - booleans, datetime
NumPy is a convenient way of describing one or more blocks of computer memory, so that numbers represented may be easily manipulated

structure of a NumPy array: a view on memory

NumPy array called ndarray describes memory using:
- Data Pointer
  - memory address of first byte in the array
- Data Type Description
  - the kind of elements contained in the array
- Shape
  - the shape of the array
- Strides
  - the number of bytes to skip in memory to proceed to the next element
- Flags
  - defines whether we are allowed to modify the array
    - whether memory layout is C or Fortran contiguous, etc.
      - In C memory is laid out in "row-major" order
        
        rows are stored on after another in-memory
      - In Fortran columns are stored successivelyx``
the strided memory model provides a way of viewing the same memory
- in different ways without copying data
```
x = np.arange(6).reshape((2,3))

x.strides
# (8, 4) # default integer is 4 bytes in-memory
            # skip 2 integers to get to the next row
```
- by modifying strides
  - array can be transposed or reshaped at zero cost
    - no memory needs to be copied
- the strides, shape, and dtype attributes of an array
  - can be specified manually by the user

##################
# In each of these cases
# the resulting array points to the same memory
##################
>>> x = np.arange(60).reshape((6,10))
>>> xT = x.T
>>> xT
array([[ 0, 10, 20, 30, 40, 50],
       [ 1, 11, 21, 31, 41, 51],
       [ 2, 12, 22, 32, 42, 52],
       [ 3, 13, 23, 33, 43, 53],
       [ 4, 14, 24, 34, 44, 54],
       [ 5, 15, 25, 35, 45, 55],
       [ 6, 16, 26, 36, 46, 56],
       [ 7, 17, 27, 37, 47, 57],
       [ 8, 18, 28, 38, 48, 58],
       [ 9, 19, 29, 39, 49, 59]])
>>> xT.strides
(4, 40)
>>> z = x.reshape((6,10))
>>> z
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])
>>> z.strides
(40, 4)
>>> z.dtype = np.dtype('uint8')
>>> z
array([[ 0,  0,  0,  0,  1,  0,  0,  0,  2,  0,  0,  0,  3,  0,  0,  0,
         4,  0,  0,  0,  5,  0,  0,  0,  6,  0,  0,  0,  7,  0,  0,  0,
         8,  0,  0,  0,  9,  0,  0,  0],
       [10,  0,  0,  0, 11,  0,  0,  0, 12,  0,  0,  0, 13,  0,  0,  0,
        14,  0,  0,  0, 15,  0,  0,  0, 16,  0,  0,  0, 17,  0,  0,  0,
        18,  0,  0,  0, 19,  0,  0,  0],
       [20,  0,  0,  0, 21,  0,  0,  0, 22,  0,  0,  0, 23,  0,  0,  0,
        24,  0,  0,  0, 25,  0,  0,  0, 26,  0,  0,  0, 27,  0,  0,  0,
        28,  0,  0,  0, 29,  0,  0,  0],
       [30,  0,  0,  0, 31,  0,  0,  0, 32,  0,  0,  0, 33,  0,  0,  0,
        34,  0,  0,  0, 35,  0,  0,  0, 36,  0,  0,  0, 37,  0,  0,  0,
        38,  0,  0,  0, 39,  0,  0,  0],
       [40,  0,  0,  0, 41,  0,  0,  0, 42,  0,  0,  0, 43,  0,  0,  0,
        44,  0,  0,  0, 45,  0,  0,  0, 46,  0,  0,  0, 47,  0,  0,  0,
        48,  0,  0,  0, 49,  0,  0,  0],
       [50,  0,  0,  0, 51,  0,  0,  0, 52,  0,  0,  0, 53,  0,  0,  0,
        54,  0,  0,  0, 55,  0,  0,  0, 56,  0,  0,  0, 57,  0,  0,  0,
        58,  0,  0,  0, 59,  0,  0,  0]], dtype=uint8)
>>> z.shape
(6, 40)
 >>> z.strides
(40, 1)

numerical operations on arrays: vectorization

Grouping element-wise operations together as a process is vectorization

a = [1, 3, 5]

b = [3*x for x in a]

# VS.

# scalars and arrays

a = np.array([1, 3, 5])

b = 3 * a


#########
# fast element-wise operation
########
b - a

#########
# operations are broadcasted across the array
#########
m = np.arange(60).reshape((6,10))

b + m

####
# broadcasted arrays are never physically constructed
####

broadcasting rules

NumPy verifies all dimensions are matched before broadcasting two arrays
- the dimension of the output array is expanded to the larger of the two
  - therefore arrays are compatible to yield and output new shape

vectorization and broadcasting

evaluating functions
```
def f(x):
    return sin(x/2) ** 2 + cos(x) + cos(x) sin(x/2)**2

x = np.arange(1e5)

y = [f(i) for i in x]
```
- most array-based systems don't provide a convention for circumventing creation of temporary arrays
  - the user may choose to perform operations in-place
    - by not allocating new memory and keeping all results in the current array

finite differencing

the derivative on a discrete sequence is usually computed using finite differencing

slicing makes this operation trivial

x = np.arange(0, 10, 2)

y = x**2

dy_dx = (y[1:] - y[:-1]) / (x[1:] - x[:-1])
# y[1:] - y[:-1] has the effect of subtracting,
# from each element in the array,
# the element directly preceding it

# calculating the central divided difference
dy_dx_c = (y[2:]-y[:-2])/(x[2:]-x[:-2])
# using NumPy is 100x faster than Python

performs the same differencing on the x array and dividing the two resulting arrays yields the forward divided difference

Creating a grid using broadcasting

suppose we want to produce a three-dimensional grid of distances
- most vectorized programming languages require forming three intermediate MxN arrays, i, j, and k
```
###########
# mgrid object
# produces a meshgrid when sliced
############
i, j, k = np.mgrid[-100:100, -100:100, -100:100]
```
  - Broadcasting does not require large intermediate arrays
    - construct coordinate vectors

# Construct the row vector: from-100 to +100
i = np.arange(-100, 100).reshape(200, 1, 1)
# Construct the column vector
j = np.reshape(i, (1, 200, 1))
# Construct the depth vector
k = np.reshape(i, (1, 1, 200))

###
# OR JUST USE SHORTHAND
###
i, j, k = np.ogrid[-100:100,-100:100,-100:100]

Computer Vision

n x 3 array of 3D points 3 x 3 camera matrix
```
pts  = np.random.random((100000, 3))
camera = np.array([[600., 0., 1800.],
                  [...],
                  [...]])

vecs = camera.dot(pts.T).T

######
# numpy's array operations with element-by-element
# division and the broadcasting machinery
#####
pixel_coords = vecs/vecs[:, 2, np.newaxis]
```
- we want to transform 3D coordinates into 2D pixel locations on an image
  - this operation involves taking the matrix dot product of each point with the camera matrix
    - then dividing the resulting vector by its third component
- When repeated operations take place on very LARGE chunks of memory
  - it may be better to use an outer for-loop
    - combined with a vectorized inner loop
      - to make optimal use of system cache

Efficient I/O with memory mapping

an array storedon disk may be addressed directly without copying it to memory using memory mapping

its useful for addressing only a small portion of a very large array

a = np.memmap('/tmp/myarray.memmap',
              mode='write', shape=(300,300),
              dtype=np.int)
a.flat = np.arange(300* 300)

# when the flush is called its data is written to disk
a.flush()
# the array can now be loaded
# and parts of it can be manipulated
b = np.memmap('/tmp/myarray.memmap',
              mode='r+', shape=(300, 300),
              dtype=np.int)
b[100, :] *= 2
b.flush()

the array interface for foreign blocks of memory

NumPy defines an array interface
- that specifies how a given objects
  - exposes a block of memory

NumPy knows how to view any object

with a valid array_interface dictionary attribute as an array

important values include

data, address of data in-memory
shape

typestr

import ctypes
class MutableString(object):
    def __init__(self, s):
        # Allocate string memory
        self._s = ctypes.create_string_buffer(s)
        self.__array_interface__ = {
            # Shape of the array
            ’shape’: (len(s),),
            # Address of data,
            # the memory is not read-only
            ’data’: (ctypes.addressof(self._s), False),
            # Stores 1-byte unsigned integers.
            # "|" indicates that Endianess is
            # irrelevant for this data-type.
            ’typestr’: ’|u1’,
        }
     def __str__(self):
         #"Convert to a string for printing."
         return str(buffer(self._s))

structured data-types to expose complex data

NumPy arrays are homogeneous
- each element of the array has the same type
- arrays that store compound elements
  - combinations of data-types
  - are strucutred arrays

dt = np.dtype([('time', np.uint64),
               ('pos', [('x', float),
                        ('y', float)])])
x = np.array([(1, (0,0.5)),
              (2, (0, 10.3)),
              (3, (5.5, 1.1))],
             dtype=dt)

x['time']

times = (x['time'] >= 2)

x[times]['pos']['x']

################
# reading complex binary files
##################
data = np.fromfile("foo.dat", dtype=dt)

Basics

Getting Started

Creating NumPy Arrays

import numpy as np
# 1D Array creation
arr_1d = np.array([1,2,3])
print(f"1D array: {arr_1d}")
# 2D Array
arr_2d = np.array([1,2,3],[4,5,6])
print(f"2D array: {arr_2d}")

# create an array of zeros
zeros = np.zeros((3,3)) # 3x3 matrix
print(f"array of zeros: {zeros}")
# create an array of ones
ones = np.ones((2,4))
print(f"array of ones: {ones}")
# create an array with a range of values
range_arr = np.arange(0, 10, 2)
print(f"range array: {range_arr}")

Basic Array Operations


# Array Addition
arr_sum = arr_1d + 10

# Array Multiplication
arr_product = arr_1d * 2

# Element-wise multiplication of two arrays
arr_2d_product = arr_2d * np.array([1, 2, 3])

Reshaping Arrays

# 1D to 2D array
arr_reshaped = np.arange(12).reshape((3,4)) # 3 rows 4 cols

Mathematical Functions on Arrays

sqrt_array = np.sqrt(arr_reshaped)

# add 1 to avoid log(0)
log_array = np.log1p(arr_reshaped)

Statistical Operations

arr = np.array([1, 2, 3, 4, 5, 6, 7, 9, 10])
mean_v = np.mean(arr)
median_v = np.median(arr)
std_v = np.std(arr)

Random Data Generation

random_coords = np.random.uniform(low=-90, high=90, size=(5, 2))

Indexing

**NumPy targets the CPython reference implementation of Python NumPy uses C-Order Indexing

that means the last index usually represents the most rapidly changing memory location
- Fortran and IDL use the first index to represent the most rapidly changing location in memory

>>> x = np.arange(0b110)
>>> x
array([0, 1, 2, 3, 4, 5])

>>> x.shape = (0b1, 0b10, 0b11)
>>> x
array([[[0, 1, 2],
        [3, 4, 5]]])
>>> x[0,1]
array([3, 4, 5])
>>> x[0][1]
array([3, 4, 5])
>>> x[:0]
array([], shape=(0, 2, 3), dtype=int32)
###############
# slicing and string arrays
###############
>>> x = np.arange(0b1111)
>>> x
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
>>> x.shape = (0b011, 0b101,)
>>> x
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> x[2:5:12]
array([[10, 11, 12, 13, 14]])
>>> x[2::12]
array([[10, 11, 12, 13, 14]])
>>> x[2:12]
array([[10, 11, 12, 13, 14]])
>>> x[4:12]
array([], shape=(0, 5), dtype=int32)
>>> x[1:12]
array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> x[0:12]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> arr = np.arange(12).reshape((2,6))
>>> arr
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
>>> condition = arr > 5
>>> arr[condition]
array([ 6,  7,  8,  9, 10, 11])

Num Py