Autocorrelation Function (ACF)

Overview

what is a random process?

  • a random process is a collection of random variables indexed by time

the autocorrelation function of a random process \(X(t)\) is \(R_X(t_1,t_2) = \mathbb{E}[X(t_1)X(t_2)]\)

  • takes two time instances \(t_1\) and \(t_2\)
    • since \(X(t_1)\) and \(X(t_2)\) are random variables

\(R_X(t_1,t_2) = \mathbb{E}[X(t_1)X(t_2)]\)

  • it measures the correlation of these random variables

autocorrelation means correlation with itself

\(\mathrm{ACF} \approx \mathrm{cor}(y, y_{\text{lagged}})\)
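The "correlation with a lagged copy of itself" idea can be sketched numerically. This is an illustrative sketch (the AR(1) series and helper name `sample_acf` are my own, not from the notes): the sample ACF at lag \(k\) is just the sample correlation between \(y_t\) and \(y_{t-k}\).

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation at lags 1..max_lag:
    correlation of the series with its lagged copy."""
    y = np.asarray(y, dtype=float)
    return [np.corrcoef(y[:-k], y[k:])[0, 1] for k in range(1, max_lag + 1)]

# Simulated AR(1) series with positive dependence (illustrative choice)
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()

acf = sample_acf(y, 3)
# lag-1 autocorrelation should be near 0.8 and decay with the lag
```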

Correlation Coefficient

Let \(X\) be a random variable, and \(\{x_1, x_2,..., x_T\}\) a random sample

\(E(X) = \mu\) is the population mean, and \(\bar{x}=\sum_{i=1}^T x_i/T\) is the sample mean

\(\sigma^2 = E((X - \mu)^2)\) is the population variance, and \(S^2 = \sum_{i=1}^T (x_i - \bar{x})^2/(T-1)\) is the sample variance

Let \(X\) and \(Y\) be random variables; then the theoretical correlation coefficient

  • between \(X\) and \(Y\) is \(\rho = \frac{E((X-\mu_x)(Y-\mu_y))}{\sigma_x \sigma_y}\)

Let \((x_1,y_1),(x_2,y_2),...,(x_T,y_T)\) be the observations (random sample) of \((X,Y)\) then the sample correlation coefficient between \(X\) and \(Y\) is…

\(r = \mathrm{cor}(x,y) = \frac{\mathrm{cov}(x,y)}{s_x s_y} = \frac{\sum_{i=1}^T(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^T(x_i-\bar{x})^2\sum_{i=1}^T(y_i-\bar{y})^2}}\) when sample size \(T\) is large,

\(\mu \approx \bar{x}\)

\(\sigma^2 \approx S^2\)

\(\rho \approx r\)
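These large-sample approximations can be checked by simulation. A minimal sketch, assuming a large normal sample with known population values (the distribution parameters here are my own illustrative choices): the sample mean, sample variance, and sample correlation should land close to \(\mu\), \(\sigma^2\), and \(\rho\).

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100_000  # large sample size

# Population: X ~ Normal(mu=2, sigma=3); Y built so that cor(X, Y) = 0.5
mu, sigma, rho = 2.0, 3.0, 0.5
x = mu + sigma * rng.standard_normal(T)
y = rho * (x - mu) / sigma + np.sqrt(1 - rho**2) * rng.standard_normal(T)

xbar = x.sum() / T                       # sample mean, approximates mu
S2 = ((x - xbar) ** 2).sum() / (T - 1)   # sample variance, approximates sigma^2 = 9
r = np.corrcoef(x, y)[0, 1]              # sample correlation, approximates rho
```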

Stationarity around a linear trend

stationarity means the statistical properties of the process do not depend on where in time you start

suppose the variable \(Y\) evolves according to \(Y_t = a \cdot t + b + e_t\)

  • where \(t\) is time and \(e_t\) is the error term
    • hypothesized to be white noise
      • or more generally to have been generated by any stationary process

then one can use linear regression

  • to obtain an estimate \(\hat{a}\)
    • of the true underlying trend slope \(a\)
      • and an estimate \(\hat{b}\)
        • of the underlying intercept term \(b\)
  • if the estimate \(\hat{a}\) is significantly different from zero
    • this is sufficient to show with high confidence that the variable \(Y\) is non-stationary
  • the residuals from this regression are given by… \(\hat{e_t} = Y_t - \hat{a} \cdot t - \hat{b}\)
    • if these estimated residuals can be statistically shown to be stationary

more precisely, if one can reject the hypothesis that the true underlying errors are non-stationary

  • then the residuals are referred to as the detrended data
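The detrending steps above can be sketched as follows. This is an illustrative simulation (the true slope, intercept, and series length are my own choices): fit \(Y_t\) on \(t\) by ordinary least squares, then form the residuals \(\hat{e}_t = Y_t - \hat{a}\,t - \hat{b}\).

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
t = np.arange(T)

a, b = 0.5, 3.0                   # true trend slope and intercept (assumed)
e = rng.standard_normal(T)        # white-noise error term e_t
Y = a * t + b + e                 # Y_t = a*t + b + e_t

# OLS fit of Y on t; polyfit returns [a_hat, b_hat] (highest degree first)
a_hat, b_hat = np.polyfit(t, Y, 1)

residuals = Y - a_hat * t - b_hat  # the detrended data e_hat_t
```

With an intercept in the regression, the residuals average to zero by construction, and here they inherit the stationarity of \(e_t\).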

ACF Practice

e.g. Let \(A \sim \text{Uniform}[0,1]\), \(X(t)=A \cos(2\pi t)\)

  • find \(R_X(t_1,t_2)\)

soln: \(R_X(t_1,t_2) = \mathbb{E}[A \cos(2\pi t_1)\, A \cos(2\pi t_2)]\)

\(R_X(t_1,t_2) = \mathbb{E}[A^2] \cos(2\pi t_1) \cos(2\pi t_2)\)

since \(\mathbb{E}[A^2] = \int_0^1 a^2\,da = \frac{1}{3}\),

\(R_X(t_1,t_2) = \frac{1}{3} \cos(2\pi t_1) \cos(2\pi t_2)\)
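The result can be sanity-checked by Monte Carlo. A minimal sketch (the particular time points \(t_1, t_2\) are my own illustrative choices): draw many samples of \(A\), average \(X(t_1)X(t_2)\), and compare against \(\frac{1}{3}\cos(2\pi t_1)\cos(2\pi t_2)\).

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000_000
A = rng.uniform(0.0, 1.0, N)      # A ~ Uniform[0, 1]

t1, t2 = 0.1, 0.3                 # arbitrary time instances (assumed)

# Monte Carlo estimate of E[X(t1) X(t2)] = E[A^2] cos(2 pi t1) cos(2 pi t2)
empirical = np.mean((A * np.cos(2 * np.pi * t1)) * (A * np.cos(2 * np.pi * t2)))
theoretical = (1 / 3) * np.cos(2 * np.pi * t1) * np.cos(2 * np.pi * t2)
```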