Num Py

  • NumPy is esssential for numerical operations and handling arrays
    • in a high-level language

History

Python extended for indexing syntax to make array computing easier

  • Array indexing refers to the use of the square brackets to index array values

    #########
    # Single Element Indexing
    #####
    >>> import numpy as np
    >>> x = np.arange(0b11)
    >>> x[1]
    1
    >>> x[0]
    0
    

Extending Python for Numerical Computation

  • Python includes numerical extensions implementing a matrix package
    • with influences from Array Programming Languages
      • rather than retrofit an existing numerical language to support the wealth of features found in a general-purpose programming language, it makes more sense to attack the problem from the other end instead
  • All of the basic numeric operators are supported element-wise for matrices
    • basic array comparison and logical operators are provided as methods
  • The matrix object is designed to support array of arbitrary dimensions
  • Mapping semantics are used to support multi-dimensional product form indexing for matrix objects
    • multi-dimension indices are a sequence of python objects
      • the first object corresponds to the first dimension of the matrix, and so on
        • the following objects are possible for each dimension
          • a single integer indicating one element from that dimension and a reduction in the number of dimensions in the returned matrix
          • a sequence of integers not neccesarily unique, selecting an arbitrary collection of elements along that dimension
          • a slice object which selects a range of elements along the dimension with the optional start index, stop index, and step size
  • Every aritimetic operator is implemented as a special ofunc object with python
    • these functions are to be called with matrix or scalar arguments to return the normal result
      • these functions can be subscripted to indicate that they should be applied at specific rank
        • and they can be modified to indicate that instead of a direct product, they should be applied as a generalized outer, or inner product, or as a reduction or accumulation.
  • Full range of primitive data types
    • Matrix object supports arrays of chars, bytes, shorts, ints, longs, floats, doubles, complex floats, complex doubles, and raw python objects
  • No changes to the Python Core
    • two new types designed
      • one for matrices and one for the functions on matrices
    • and the module

Numarray and Numpy

  • A new package called Numarray, which had faster operations for large arrays but slower than Numeric on small ones

  • NumPy was created to merge both Numeric and Numarray array objects in python

    • Numarray features were ported to Numeric

What is an Array Programming Language?

https://web.archive.org/web/20110227013846/http://www.vector.org.uk/archive/v223/smill222.htm

  • lets use a simple metaphor for classifying programming languages and follow this with examples of programming a simple example in both a hypothetical and representative machine language and in BASIC
    • which we will consider representative of a conventional programming language
  • Suppose we have a box of apples from which we wish to select all of the good apples
    • the instructions should conventionally be something like

      Reserve a place for the good apples. Then select an apple from the box, and if it is good put it in a reserved place. Select a second apple, and if it is good put it too in the reserved place. … Continue in this manner examining each apple in turn until all of the good apples have been selected.

    • the instructions corresponding to an array language would be

      Select all of the good apples from the box

  • Based on that analogy we classify programming languages as follows:
    • one apple at a time Fortran, BASIC, Pascal, C++, Java
    • all the apples at once APL, J, MATLAB, R, JULIA

Reading

The NumPy array: a structure for efficient numerical computing (2011)

  • The NumPy array: a structure for efficient numerical computation
    • NumPy arrays are the standard representation for numerical data in the Pythonic sphere of paradigms.

      Three techniques are applied to improve performance

      • vectorizing calculations
      • avoiding copying data in-memory
      • minimizing operation costs
  • A NumPy array is a multidimensional, uniform collection of elements
    • An array is characterized by the type of elements it contains and by it shape
    • A matrix may be represented as an array of shape M x N
      • NumPy arrays can have any dimensionality
        • they contain other kinds and combinations of elements
          • booleans, datetime
    NumPy is a convenient way of describing one or more blocks of computer memory, so that numbers represented may be easily manipulated

structure of a NumPy array: a view on memory

  • NumPy array called ndarray describes memory using:
    • Data Pointer
      • memory address of first byte in the array
    • Data Type Description
      • the kind of elements contained in the array
    • Shape
      • the shape of the array
    • Strides
      • the number of bytes to skip in memory to proceed to the next element
    • Flags
      • defines whether we are allowed to modify the array
        • whether memory layout is C or Fortran contiguous, etc.
          • In C memory is laid out in "row-major" order
            • rows are stored on after another in-memory
          • In Fortran columns are stored successivelyx``
  • the strided memory model provides a way of viewing the same memory
    • in different ways without copying data

      x = np.arange(6).reshape((2,3))
      
      x.strides
      # (8, 4) # default integer is 4 bytes in-memory
                  # skip 2 integers to get to the next row
      
    • by modifying strides

      • array can be transposed or reshaped at zero cost
        • no memory needs to be copied
    • the strides, shape, and dtype attributes of an array

      • can be specified manually by the user
##################
# In each of these cases
# the resulting array points to the same memory
##################
>>> x = np.arange(60).reshape((6,10))
>>> xT = x.T
>>> xT
array([[ 0, 10, 20, 30, 40, 50],
       [ 1, 11, 21, 31, 41, 51],
       [ 2, 12, 22, 32, 42, 52],
       [ 3, 13, 23, 33, 43, 53],
       [ 4, 14, 24, 34, 44, 54],
       [ 5, 15, 25, 35, 45, 55],
       [ 6, 16, 26, 36, 46, 56],
       [ 7, 17, 27, 37, 47, 57],
       [ 8, 18, 28, 38, 48, 58],
       [ 9, 19, 29, 39, 49, 59]])
>>> xT.strides
(4, 40)
>>> z = x.reshape((6,10))
>>> z
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])
>>> z.strides
(40, 4)
>>> z.dtype = np.dtype('uint8')
>>> z
array([[ 0,  0,  0,  0,  1,  0,  0,  0,  2,  0,  0,  0,  3,  0,  0,  0,
         4,  0,  0,  0,  5,  0,  0,  0,  6,  0,  0,  0,  7,  0,  0,  0,
         8,  0,  0,  0,  9,  0,  0,  0],
       [10,  0,  0,  0, 11,  0,  0,  0, 12,  0,  0,  0, 13,  0,  0,  0,
        14,  0,  0,  0, 15,  0,  0,  0, 16,  0,  0,  0, 17,  0,  0,  0,
        18,  0,  0,  0, 19,  0,  0,  0],
       [20,  0,  0,  0, 21,  0,  0,  0, 22,  0,  0,  0, 23,  0,  0,  0,
        24,  0,  0,  0, 25,  0,  0,  0, 26,  0,  0,  0, 27,  0,  0,  0,
        28,  0,  0,  0, 29,  0,  0,  0],
       [30,  0,  0,  0, 31,  0,  0,  0, 32,  0,  0,  0, 33,  0,  0,  0,
        34,  0,  0,  0, 35,  0,  0,  0, 36,  0,  0,  0, 37,  0,  0,  0,
        38,  0,  0,  0, 39,  0,  0,  0],
       [40,  0,  0,  0, 41,  0,  0,  0, 42,  0,  0,  0, 43,  0,  0,  0,
        44,  0,  0,  0, 45,  0,  0,  0, 46,  0,  0,  0, 47,  0,  0,  0,
        48,  0,  0,  0, 49,  0,  0,  0],
       [50,  0,  0,  0, 51,  0,  0,  0, 52,  0,  0,  0, 53,  0,  0,  0,
        54,  0,  0,  0, 55,  0,  0,  0, 56,  0,  0,  0, 57,  0,  0,  0,
        58,  0,  0,  0, 59,  0,  0,  0]], dtype=uint8)
>>> z.shape
(6, 40)
 >>> z.strides
(40, 1)

numerical operations on arrays: vectorization

  • Grouping element-wise operations together as a process is vectorization
a = [1, 3, 5]

b = [3*x for x in a]

# VS.

# scalars and arrays

a = np.array([1, 3, 5])

b = 3 * a


#########
# fast element-wise operation
########
b - a

#########
# operations are broadcasted across the array
#########
m = np.arange(60).reshape((6,10))

b + m

####
# broadcasted arrays are never physically constructed
####

broadcasting rules

  • NumPy verifies all dimensions are matched before broadcasting two arrays
    • the dimension of the output array is expanded to the larger of the two
      • therefore arrays are compatible to yield and output new shape

vectorization and broadcasting

  • evaluating functions

    def f(x):
        return sin(x/2) ** 2 + cos(x) + cos(x) sin(x/2)**2
    
    x = np.arange(1e5)
    
    y = [f(i) for i in x]
    
    • most array-based systems don't provide a convention for circumventing creation of temporary arrays
      • the user may choose to perform operations in-place
        • by not allocating new memory and keeping all results in the current array
  • finite differencing

    • the derivative on a discrete sequence is usually computed using finite differencing
      • slicing makes this operation trivial

        x = np.arange(0, 10, 2)
        
        y = x**2
        
        dy_dx = (y[1:] - y[:-1]) / (x[1:] - x[:-1])
        # y[1:] - y[:-1] has the effect of subtracting,
        # from each element in the array,
        # the element directly preceding it
        
        # calculating the central divided difference
        dy_dx_c = (y[2:]-y[:-2])/(x[2:]-x[:-2])
        # using NumPy is 100x faster than Python
        
    • performs the same differencing on the x array and dividing the two resulting arrays yields the forward divided difference
  • Creating a grid using broadcasting

    • suppose we want to produce a three-dimensional grid of distances
      • most vectorized programming languages require forming three intermediate MxN arrays, i, j, and k

        ###########
        # mgrid object
        # produces a meshgrid when sliced
        ############
        i, j, k = np.mgrid[-100:100, -100:100, -100:100]
        
        • Broadcasting does not require large intermediate arrays
          • construct coordinate vectors
    # Construct the row vector: from-100 to +100
    i = np.arange(-100, 100).reshape(200, 1, 1)
    # Construct the column vector
    j = np.reshape(i, (1, 200, 1))
    # Construct the depth vector
    k = np.reshape(i, (1, 1, 200))
    
    ###
    # OR JUST USE SHORTHAND
    ###
    i, j, k = np.ogrid[-100:100,-100:100,-100:100]
    
  • Computer Vision

    n x 3 array of 3D points 3 x 3 camera matrix

    pts  = np.random.random((100000, 3))
    camera = np.array([[600., 0., 1800.],
                      [...],
                      [...]])
    
    vecs = camera.dot(pts.T).T
    
    ######
    # numpy's array operations with element-by-element
    # division and the broadcasting machinery
    #####
    pixel_coords = vecs/vecs[:, 2, np.newaxis]
    
    • we want to transform 3D coordinates into 2D pixel locations on an image
      • this operation involves taking the matrix dot product of each point with the camera matrix
        • then dividing the resulting vector by its third component
    • When repeated operations take place on very LARGE chunks of memory
      • it may be better to use an outer for-loop
        • combined with a vectorized inner loop
          • to make optimal use of system cache

Sharing Data

Efficient I/O with memory mapping

  • an array storedon disk may be addressed directly without copying it to memory using memory mapping
    • its useful for addressing only a small portion of a very large array

      a = np.memmap('/tmp/myarray.memmap',
                    mode='write', shape=(300,300),
                    dtype=np.int)
      a.flat = np.arange(300* 300)
      
      # when the flush is called its data is written to disk
      a.flush()
      # the array can now be loaded
      # and parts of it can be manipulated
      b = np.memmap('/tmp/myarray.memmap',
                    mode='r+', shape=(300, 300),
                    dtype=np.int)
      b[100, :] *= 2
      b.flush()
      

the array interface for foreign blocks of memory

  • NumPy defines an array interface
    • that specifies how a given objects
      • exposes a block of memory
  • NumPy knows how to view any object
    • with a valid arrayinterface dictionary attribute as an array
      • important values include
        • data, address of data in-memory

        • shape

        • typestr

          import ctypes
          class MutableString(object):
              def __init__(self, s):
                  # Allocate string memory
                  self._s = ctypes.create_string_buffer(s)
                  self.__array_interface__ = {
                      # Shape of the array
                      ’shape’: (len(s),),
                      # Address of data,
                      # the memory is not read-only
                      ’data’: (ctypes.addressof(self._s), False),
                      # Stores 1-byte unsigned integers.
                      # "|" indicates that Endianess is
                      # irrelevant for this data-type.
                      ’typestr’: ’|u1’,
                  }
               def __str__(self):
                   #"Convert to a string for printing."
                   return str(buffer(self._s))
          

structured data-types to expose complex data

  • NumPy arrays are homogeneous
    • each element of the array has the same type
    • arrays that store compound elements
      • combinations of data-types
      • are strucutred arrays
dt = np.dtype([('time', np.uint64),
               ('pos', [('x', float),
                        ('y', float)])])
x = np.array([(1, (0,0.5)),
              (2, (0, 10.3)),
              (3, (5.5, 1.1))],
             dtype=dt)

x['time']

times = (x['time'] >= 2)

x[times]['pos']['x']

################
# reading complex binary files
##################
data = np.fromfile("foo.dat", dtype=dt)

Basics

Getting Started

Creating NumPy Arrays

import numpy as np
# 1D Array creation
arr_1d = np.array([1,2,3])
print(f"1D array: {arr_1d}")
# 2D Array
arr_2d = np.array([1,2,3],[4,5,6])
print(f"2D array: {arr_2d}")

# create an array of zeros
zeros = np.zeros((3,3)) # 3x3 matrix
print(f"array of zeros: {zeros}")
# create an array of ones
ones = np.ones((2,4))
print(f"array of ones: {ones}")
# create an array with a range of values
range_arr = np.arange(0, 10, 2)
print(f"range array: {range_arr}")

Basic Array Operations


# Array Addition
arr_sum = arr_1d + 10

# Array Multiplication
arr_product = arr_1d * 2

# Element-wise multiplication of two arrays
arr_2d_product = arr_2d * np.array([1, 2, 3])

Reshaping Arrays

# 1D to 2D array
arr_reshaped = np.arange(12).reshape((3,4)) # 3 rows 4 cols

Mathematical Functions on Arrays

sqrt_array = np.sqrt(arr_reshaped)

# add 1 to avoid log(0)
log_array = np.log1p(arr_reshaped)

Statistical Operations

arr = np.array([1, 2, 3, 4, 5, 6, 7, 9, 10])
mean_v = np.mean(arr)
median_v = np.median(arr)
std_v = np.std(arr)

Random Data Generation

random_coords = np.random.uniform(low=-90, high=90, size=(5, 2))

Indexing

**NumPy targets the CPython reference implementation of Python NumPy uses C-Order Indexing

  • that means the last index usually represents the most rapidly changing memory location
    • Fortran and IDL use the first index to represent the most rapidly changing location in memory
>>> x = np.arange(0b110)
>>> x
array([0, 1, 2, 3, 4, 5])

>>> x.shape = (0b1, 0b10, 0b11)
>>> x
array([[[0, 1, 2],
        [3, 4, 5]]])
>>> x[0,1]
array([3, 4, 5])
>>> x[0][1]
array([3, 4, 5])
>>> x[:0]
array([], shape=(0, 2, 3), dtype=int32)
###############
# slicing and string arrays
###############
>>> x = np.arange(0b1111)
>>> x
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
>>> x.shape = (0b011, 0b101,)
>>> x
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> x[2:5:12]
array([[10, 11, 12, 13, 14]])
>>> x[2::12]
array([[10, 11, 12, 13, 14]])
>>> x[2:12]
array([[10, 11, 12, 13, 14]])
>>> x[4:12]
array([], shape=(0, 5), dtype=int32)
>>> x[1:12]
array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> x[0:12]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> arr = np.arange(12).reshape((2,6))
>>> arr
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
>>> condition = arr > 5
>>> arr[condition]
array([ 6,  7,  8,  9, 10, 11])