Overview
what is a random process?
- a random process is a collection of random variables indexed by time; equivalently, a function of time whose value at each instant is random
the autocorrelation function of a random process \(X(t)\) is \(R_X(t_1,t_2) = \mathbb{E}[X(t_1)X(t_2)]\)
- it takes two time instances \(t_1\) and \(t_2\)
- since \(X(t_1)\) and \(X(t_2)\) are random variables, \(R_X(t_1,t_2)\) is a deterministic function of \(t_1\) and \(t_2\)
- it measures the correlation of these random variables
autocorrelation means correlation with itself
\(\mathrm{ACF} \approx \mathrm{cor}(y, y_{\text{lagged}})\)
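The "correlation with a lagged copy of itself" idea can be sketched directly with numpy. This is a minimal illustration (the white-noise series, seed, and `sample_acf` helper are made up for the demo), matching the note's \(\mathrm{cor}(y, y_{\text{lagged}})\) description rather than any particular library's ACF estimator:

```python
import numpy as np

def sample_acf(y, max_lag=10):
    """Sample autocorrelation: correlation of the series with its lag-k copy."""
    y = np.asarray(y, dtype=float)
    acf = [1.0]  # at lag 0 a series is perfectly correlated with itself
    for k in range(1, max_lag + 1):
        # np.corrcoef returns the 2x2 correlation matrix of the two slices
        acf.append(np.corrcoef(y[:-k], y[k:])[0, 1])
    return np.array(acf)

rng = np.random.default_rng(0)
y = rng.normal(size=500)          # white noise: no serial correlation
acf = sample_acf(y, max_lag=5)
print(acf)                        # lag 0 is 1; later lags are near 0
```

For white noise, all lags beyond 0 should be close to zero, which is what the printed values show.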
Correlation Coefficient
Let \(X\) be a random variable, and \(\{x_1, x_2,..., x_T\}\) a random sample
\(E(X) = \mu\) is the population mean, and \(\bar{x}=\sum_{i=1}^T x_i/T\) is the sample mean
\(\sigma^2 = E((X - \mu)^2)\) is the population variance, and \(S^2 = \sum_{i=1}^T (x_i - \bar{x})^2/(T-1)\) is the sample variance
Let \(X\) and \(Y\) be random variables, then theoretical correlation coefficient
- between \(X\) and \(Y\) is \(\rho = \frac{E((X-\mu_x)(Y-\mu_y))}{\sigma_x \sigma_y}\)
Let \((x_1,y_1),(x_2,y_2),...,(x_T,y_T)\) be the observations (random sample) of \((X,Y)\) then the sample correlation coefficient between \(X\) and \(Y\) is…
\(r = cor(x,y) = \frac{cov(x,y)}{s_x s_y} = \frac{\sum_{i=1}^T(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^T(x_i-\bar{x})^2\sum_{i=1}^T(y_i-\bar{y})^2}}\); when sample size \(T\) is large,
\(\mu \approx \bar{x}\)
\(\sigma^2 \approx S^2\)
\(\rho \approx r\)
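These large-sample approximations can be checked numerically. A minimal sketch, assuming a made-up population \(X \sim N(2, 3^2)\) and \(Y = 0.5X + \varepsilon\) (so \(\mu = 2\), \(\sigma^2 = 9\), and \(\rho \approx 0.832\) are demo values, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 100_000                           # large sample size
x = rng.normal(loc=2.0, scale=3.0, size=T)
y = 0.5 * x + rng.normal(size=T)

xbar = x.mean()                       # sample mean  ~ mu = 2
s2 = x.var(ddof=1)                    # sample variance ~ sigma^2 = 9
r = np.corrcoef(x, y)[0, 1]           # sample correlation ~ theoretical rho

print(xbar, s2, r)
```

With \(T = 100{,}000\) each sample statistic lands within a small neighborhood of its population counterpart, illustrating \(\mu \approx \bar{x}\), \(\sigma^2 \approx S^2\), and \(\rho \approx r\).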
Stationarity around a linear trend
- a trend-stationary series fluctuates around a deterministic trend, so its behavior doesn't depend on where in time you start
suppose the variable \(Y\) evolves according to \(Y_t = a \cdot t + b + e_t\)
- where \(t\) is time and \(e_t\) is an error term
- hypothesized to be white noise
- or, more generally, to have been generated by any stationary process
then one can use linear regression
- to obtain an estimate \(\hat{a}\) of the true underlying trend slope \(a\)
- and an estimate \(\hat{b}\) of the true underlying intercept term \(b\)
if the estimate \(\hat{a}\) is significantly different from zero
- this is sufficient to show with high confidence that the variable \(Y\) is non-stationary
the residuals from this regression are given by \(\hat{e}_t = Y_t - \hat{a} \cdot t - \hat{b}\)
- if these estimated residuals can be statistically shown to be stationary
- more precisely, if one can reject the hypothesis that the true underlying errors are non-stationary
- then the residuals are referred to as the detrended data
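The detrending recipe above can be sketched with `np.polyfit`. The synthetic series below (slope \(a = 0.3\), intercept \(b = 5\), white-noise errors, and the seed) is an assumption chosen for the demo:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
t = np.arange(T)
a_true, b_true = 0.3, 5.0
y = a_true * t + b_true + rng.normal(size=T)   # Y_t = a*t + b + e_t

# least-squares fit of a degree-1 polynomial: returns (slope, intercept)
a_hat, b_hat = np.polyfit(t, y, deg=1)

# detrended data: e_hat_t = Y_t - a_hat*t - b_hat
residuals = y - (a_hat * t + b_hat)

print(a_hat, b_hat, residuals.mean())
```

The estimates \(\hat{a}, \hat{b}\) recover the true slope and intercept closely, and the residuals average to (numerically) zero by construction of least squares.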
ACF Practice
e.g. let \(A \sim \mathrm{Uniform}[0,1]\), \(X(t) = A\cos(2\pi t)\)
- find \(R_X(t_1,t_2)\)
soln: \(R_X(t_1,t_2) = \mathbb{E}[A\cos(2\pi t_1) \cdot A\cos(2\pi t_2)]\)
\(R_X(t_1,t_2) = \mathbb{E}[A^2]\cos(2\pi t_1)\cos(2\pi t_2)\)
\(R_X(t_1,t_2) = \frac{1}{3}\cos(2\pi t_1)\cos(2\pi t_2)\), since \(\mathbb{E}[A^2] = \int_0^1 a^2\,da = \frac{1}{3}\)
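A quick Monte Carlo check of this answer (the particular time instances \(t_1 = 0.1\), \(t_2 = 0.35\), sample count, and seed are arbitrary choices for the demo):

```python
import numpy as np

# A ~ Uniform[0,1]; X(t) = A*cos(2*pi*t)
rng = np.random.default_rng(7)
A = rng.uniform(0.0, 1.0, size=1_000_000)

t1, t2 = 0.1, 0.35                   # two arbitrary time instances
x1 = A * np.cos(2 * np.pi * t1)
x2 = A * np.cos(2 * np.pi * t2)

empirical = np.mean(x1 * x2)         # Monte Carlo estimate of E[X(t1) X(t2)]
theoretical = (1 / 3) * np.cos(2 * np.pi * t1) * np.cos(2 * np.pi * t2)
print(empirical, theoretical)
```

The empirical average matches \(\frac{1}{3}\cos(2\pi t_1)\cos(2\pi t_2)\) to within sampling error, confirming \(\mathbb{E}[A^2] = \frac{1}{3}\).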