\(\mathrm{IC} = f(\mathrm{SSE}) + \mathrm{penalty}\)
-
when fitting models, it is possible to increase the maximized likelihood
-
by adding parameters, but doing so may result in overfitting
-
BIC resolves this problem by introducing a penalty term
- for the number of parameters in the model
-
BIC (also called SBC, SIC, or SBIC, for the Schwarz information criterion) is a criterion for model selection
- among a finite set of models
the BIC is an increasing function of the error variance \(\sigma_e^2\)
- and an increasing function of \(k\)
BIC is merely a heuristic and not a transformed Bayes factor
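The point can be stated precisely (a standard large-sample relationship, added here for context): for two models, the BIC difference only approximates the log Bayes factor,

\[
\ln B_{12} \approx -\tfrac{1}{2}\left(\mathrm{BIC}_1 - \mathrm{BIC}_2\right),
\]

and the approximation error does not vanish for arbitrary fixed priors, which is why BIC values should be read as a heuristic rather than as exact transformed Bayes factors.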
-
BIC is formally defined as \(\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L})\)
Where…
- \(\hat{L}\) = the maximized value of the likelihood function of the model \(M\)
-
\(\hat{L} = p(x \mid \hat{\theta}, M)\), where \(\hat{\theta}\) are the parameter values
- that maximize the likelihood function and \(x\) is the observed data
-
\(n\) = the number of data points in \(x\)
- the number of observations, or equivalently, the sample size
- \(k\) = the number of parameters estimated by the model
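As a minimal sketch (the data here are invented for illustration), the definition can be computed directly for an i.i.d. Gaussian model whose two parameters, the mean and variance, are fit by maximum likelihood:

```python
import math

def bic(log_likelihood, k, n):
    """BIC = k*ln(n) - 2*ln(L-hat); lower values indicate a better trade-off."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical observed data x
data = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1]
n = len(data)
mu = sum(v for v in data) / n                # ML estimate of the mean
var = sum((v - mu) ** 2 for v in data) / n   # ML estimate of the variance
# Maximized Gaussian log-likelihood ln(L-hat) at (mu, var)
log_lik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
print(bic(log_lik, k=2, n=n))                # k = 2 estimated parameters
```

Comparing candidate models then amounts to computing this value for each and preferring the smallest.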
BIC SUFFERS FROM TWO MAIN LIMITATIONS
- the above approximation is only valid for sample size \(n\) much larger than the number \(k\) of parameters in the model
- the BIC cannot handle complex collections of models, as in the variable selection or feature selection problem in high dimensions
Estimating the dimension of a model
how to choose the appropriate dimensionality of a model
-
that will fit a given set of observations
e.g. the choice of degree for a polynomial regression
- or the choice of order for a multi-step Markov chain
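As a concrete sketch of the polynomial-degree case (the data-generating curve, noise level, and variable names are hypothetical): for Gaussian errors the maximized log-likelihood depends on the data only through the residual sum of squares, so, up to a constant shared by all degrees, the BIC reduces to \(k\ln n + n\ln(\mathrm{SSE}/n)\):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 60)
# Hypothetical data from a quadratic with small Gaussian noise
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.1, size=x.size)

def bic_for_degree(d):
    coeffs = np.polyfit(x, y, d)            # least-squares fit of degree d
    sse = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    n = x.size
    k = d + 2                               # d+1 coefficients plus the noise variance
    return k * np.log(n) + n * np.log(sse / n)  # BIC up to a shared constant

best = min(range(1, 8), key=bic_for_degree)
print(best)
```

Raising the degree past the true one keeps lowering the SSE slightly, but the \(k\ln n\) penalty dominates, so the criterion stops increasing the dimension.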
-
the maximum likelihood principle invariably leads to choosing the highest possible dimension
Akaike suggests…
-
for the problem of choosing among different models
- with different numbers of parameters
-
the suggestion amounts to maximizing the likelihood function separately
-
for each model \(j\), obtaining say \(M_{j}(X_1,...,X_n)\)
-
and then choosing the model for which \(\log M_j(X_1,...,X_n)-k_j\)
- is the largest, where \(k_j\) is the dimension of the model
Alternative
-
look for an appropriate modification of maximum likelihood for our case
-
by studying asymptotic behavior of Bayes estimators
-
under a special class of priors
-
these priors are not absolutely continuous
-
since they put positive probability on some lower-dimensional subspaces of the parameter space
- subspaces that correspond to the competing models
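Such a prior can be pictured (an illustrative formalization, not spelled out in the notes) as a mixture

\[
\pi = \sum_j \alpha_j \pi_j, \qquad \alpha_j > 0,\quad \sum_j \alpha_j = 1,
\]

where each \(\pi_j\) is a distribution concentrated on the lower-dimensional subspace corresponding to model \(j\); because those subspaces have Lebesgue measure zero, \(\pi\) is not absolutely continuous.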
-
in the large sample limit
-
the leading term of the Bayes estimator turns out to be the maximum likelihood estimator
-
the leading term depends on the prior only through its support
- while the second-order term does reflect singularities of the a priori distribution
Choose the model for which \(\log M_j(X_1,...,X_n)-\frac{1}{2}k_j\log n\) is the largest.
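Taking \(M_j(X_1,...,X_n)\) to denote the maximized likelihood \(\hat{L}_j\) of model \(j\), with natural logarithms throughout, this criterion is exactly \(-\tfrac{1}{2}\,\mathrm{BIC}_j\):

\[
\log M_j(X_1,...,X_n) - \tfrac{1}{2}\,k_j\log n
= -\tfrac{1}{2}\bigl(k_j \log n - 2\log \hat{L}_j\bigr)
= -\tfrac{1}{2}\,\mathrm{BIC}_j ,
\]

so maximizing Schwarz's criterion over \(j\) is equivalent to minimizing the BIC defined earlier.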
-
in the general parameter space there is no intrinsic linear structure
-
observations come from a Koopman-Darmois family
-
relative to some fixed measure on the sample space
- they possess a density of the form \(f(x,\theta)=\exp(\theta \cdot y(x) - b(\theta))\), where \(\theta\) ranges over the natural parameter space \(\Theta\), a convex subset of \(K\)-dimensional Euclidean space, and \(y\) is the \(K\)-dimensional sufficient statistic
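A standard concrete instance (not from the notes above, included for illustration): the unit-variance normal family with mean \(\theta\) is Koopman-Darmois, since

\[
\frac{1}{\sqrt{2\pi}}\,e^{-(x-\theta)^2/2}
= \exp\!\bigl(\theta x - \tfrac{1}{2}\theta^{2}\bigr)\cdot\frac{e^{-x^{2}/2}}{\sqrt{2\pi}},
\]

which has the required form with \(y(x)=x\), \(b(\theta)=\theta^{2}/2\), relative to the fixed measure \((2\pi)^{-1/2}e^{-x^{2}/2}\,dx\).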
-
the competing models are given by sets of the form \(m_j \cap \Theta\)
- where each \(m_j\) is a \(k_j\)-dimensional linear submanifold of the \(K\)-dimensional space