odds ratio \(= \frac{n_{11} \cdot n_{22}}{n_{21} \cdot n_{12}}\)
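A minimal Python sketch of the formula, assuming \(n_{ij}\) is the count in row \(i\), column \(j\) of a 2×2 table. The function name `odds_ratio` and the example counts are illustrative only.

```python
def odds_ratio(n11: float, n12: float, n21: float, n22: float) -> float:
    """Odds ratio of a 2x2 table whose cell n_ij sits in row i, column j."""
    # Odds of column 1 versus column 2 in row 1, divided by the same odds in row 2.
    return (n11 * n22) / (n21 * n12)


# Row 1 odds = 30/10 = 3, row 2 odds = 20/40 = 0.5, so the ratio is 3 / 0.5 = 6.
print(odds_ratio(30, 10, 20, 40))  # 6.0
```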
concordant - in agreement
a concordant pair is a pair of observations, each on two variables, \((X_1,Y_1)\) and \((X_2,Y_2)\), having the property that
\({\displaystyle \operatorname {sgn} (X_{2}-X_{1})\ =\operatorname {sgn} (Y_{2}-Y_{1}),}\)
where \(\operatorname{sgn}\) refers to whether a number is positive, zero, or negative (its sign)
- the signum function is defined as: \({\displaystyle \operatorname {sgn} x={\begin{cases}-1,&x<0\\0,&x=0\\1,&x>0\end{cases}}}\)
a discordant pair is a pair of observations, each on two variables, \((X_1,Y_1)\) and \((X_2,Y_2)\), such that
\({\displaystyle \operatorname {sgn} (X_{2}-X_{1})\ =-\operatorname {sgn} (Y_{2}-Y_{1})}\)
- if one observation has the higher value of \(X\), then the other observation has the higher value of \(Y\)
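The two definitions can be applied to every pair in a small dataset, as sketched below. Pairs tied on \(X\) or on \(Y\) are labeled "tied" and counted as neither, which is the usual convention for gamma and Kendall's tau (read literally, the sgn condition would call a pair tied on both variables concordant). The helper names `sgn`, `classify_pair`, and `count_pairs` and the data are hypothetical.

```python
from itertools import combinations


def sgn(x: float) -> int:
    """Signum function: -1 for x < 0, 0 for x == 0, 1 for x > 0."""
    return (x > 0) - (x < 0)


def classify_pair(first: tuple, second: tuple) -> str:
    """Label a pair of (X, Y) observations as concordant, discordant, or tied."""
    (x1, y1), (x2, y2) = first, second
    sx, sy = sgn(x2 - x1), sgn(y2 - y1)
    if sx == 0 or sy == 0:
        return "tied"  # tied on X or on Y: counted as neither
    return "concordant" if sx == sy else "discordant"


def count_pairs(data: list) -> dict:
    """Classify every unordered pair of observations in data."""
    counts = {"concordant": 0, "discordant": 0, "tied": 0}
    for first, second in combinations(data, 2):
        counts[classify_pair(first, second)] += 1
    return counts


data = [(1, 2), (2, 3), (3, 1), (4, 4), (4, 5)]
print(count_pairs(data))  # {'concordant': 7, 'discordant': 2, 'tied': 1}
```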
Somers' D
Goodman and Kruskal's gamma
P: number of concordant pairs, Q: number of discordant pairs
\(G = (P-Q)/(P+Q)\)
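A short sketch of \(G\) computed directly from raw \((x, y)\) observations. It uses the sign of \((x_2-x_1)(y_2-y_1)\), which is positive exactly when the two sgn values agree and negative when they are opposite; tied pairs drop out of both \(P\) and \(Q\). The function name and data are hypothetical.

```python
from itertools import combinations


def goodman_kruskal_gamma(data: list) -> float:
    """G = (P - Q) / (P + Q) over all pairs of (x, y) observations."""
    p = q = 0
    for (x1, y1), (x2, y2) in combinations(data, 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:
            p += 1  # concordant: X and Y move in the same direction
        elif product < 0:
            q += 1  # discordant: X and Y move in opposite directions
        # product == 0 means a tie on X or Y; gamma ignores such pairs
    if p + q == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (p - q) / (p + q)


data = [(1, 2), (2, 3), (3, 1), (4, 4), (4, 5)]
print(goodman_kruskal_gamma(data))  # P = 7, Q = 2, so G = 5/9
```

A perfectly increasing relationship gives \(G = 1\) and a perfectly decreasing one gives \(G = -1\).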
Kendall tau-b
Stuart's tau-c
Pearson Chi-square test
Mantel-Haenszel chi-square test
test the significance of an association
- cannot assess direction at all
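Both test statistics can be computed by hand for an R×C table of counts. This is an illustrative sketch, assuming the Mantel-Haenszel version for ordinal rows and columns written as \((n-1)r^2\), with \(r\) the Pearson correlation between equally spaced row scores 1, 2, 3, and so on and column scores of the same kind; function names and counts are hypothetical. Both statistics are non-negative (a sum of squares and a multiple of \(r^2\)), which is one way to see why neither carries the direction of the association.

```python
import math


def pearson_chi_square(table: list) -> float:
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for row, r_tot in zip(table, row_totals):
        for observed, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def mantel_haenszel_chi_square(table: list) -> float:
    """(n - 1) * r^2, with r the Pearson correlation between row scores
    1..R and column scores 1..C, weighted by the cell counts."""
    n = sum(map(sum, table))
    cells = [(i + 1, j + 1, count)
             for i, row in enumerate(table)
             for j, count in enumerate(row)]
    mean_u = sum(u * c for u, _, c in cells) / n
    mean_v = sum(v * c for _, v, c in cells) / n
    cov = sum(c * (u - mean_u) * (v - mean_v) for u, v, c in cells)
    var_u = sum(c * (u - mean_u) ** 2 for u, _, c in cells)
    var_v = sum(c * (v - mean_v) ** 2 for _, v, c in cells)
    r = cov / math.sqrt(var_u * var_v)
    return (n - 1) * r ** 2


table = [[30, 10], [20, 40]]
print(pearson_chi_square(table))          # 50/3, about 16.67
print(mantel_haenszel_chi_square(table))  # 99 * (1/6) = 16.5
```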
Spearman Correlation Statistic
Cramér's V
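Cramér's V rescales the Pearson chi-square so that it lies between 0 and 1. A minimal sketch using \(V = \sqrt{\chi^2 / (n \cdot (\min(R, C) - 1))}\); the function name is hypothetical, and the example uses the 2×2 table [[30, 10], [20, 40]], for which \(n = 100\) and the Pearson chi-square is 50/3.

```python
import math


def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V = sqrt(chi2 / (n * (min(R, C) - 1)))."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# 2x2 table [[30, 10], [20, 40]]: n = 100 and Pearson chi-square = 50/3.
print(cramers_v(50 / 3, 100, 2, 2))  # sqrt(1/6), about 0.408
```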
Practice
when the sample size decreases (for the same observed association)
- the p-value increases
- and the interval between the confidence limits (CL) for the odds ratio widens
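The effect can be checked numerically. This sketch assumes the common Wald interval on the log scale, \(\log(\mathrm{OR}) \pm 1.96\,\mathrm{SE}\) with \(\mathrm{SE} = \sqrt{1/n_{11} + 1/n_{12} + 1/n_{21} + 1/n_{22}}\), and a Wald test of OR = 1; the function name and counts are hypothetical. Scaling every cell down by the same factor keeps the odds ratio at 4 but inflates the standard error.

```python
import math


def odds_ratio_wald(n11, n12, n21, n22, z=1.959964):
    """Odds ratio, Wald confidence limits, and two-sided Wald p-value (H0: OR = 1)."""
    log_or = math.log((n11 * n22) / (n21 * n12))
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)  # standard error of log(OR)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))  # 2 * (1 - Phi(|log_or / se|))
    return math.exp(log_or), lower, upper, p_value


# The same odds ratio (4) observed in ever smaller samples.
for cells in [(40, 20, 20, 40), (20, 10, 10, 20), (10, 5, 5, 10)]:
    or_, lower, upper, p = odds_ratio_wald(*cells)
    print(f"n={sum(cells):3d}  OR={or_:.2f}  CL=({lower:.2f}, {upper:.2f})  p={p:.4f}")
```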
a logit has no upper or lower bound: as \(p\) ranges over \((0,1)\), the logit ranges over \((-\infty,\infty)\)
\(\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) = x\beta\)
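A small sketch of why the logit suits a linear predictor \(x\beta\): the logit maps a probability to any real number, and its inverse maps any real number back to a probability. The function names are illustrative; the branch in `inverse_logit` only avoids overflow for very negative inputs.

```python
import math


def logit(p: float) -> float:
    """log(p / (1 - p)) for 0 < p < 1; the result can be any real number."""
    return math.log(p / (1 - p))


def inverse_logit(x: float) -> float:
    """Map any real x (for example the linear predictor x * beta) to a probability."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for very negative x
    return z / (1 + z)


for p in (1e-9, 0.2, 0.5, 0.8, 1 - 1e-9):
    print(p, logit(p))  # grows without bound as p approaches 0 or 1
```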
Greenacre's method
a greedy, stepwise procedure, loosely like gradient descent in that each step takes the locally best move (it does not maximize a likelihood)
- at each step, merges the two levels whose merger gives the smallest reduction in the chi-square statistic
- clusters the levels hierarchically
- collapses the levels of a contingency table
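The merging rule can be sketched as below, in the spirit of Greenacre's method rather than as any particular software's implementation. It assumes rows are the levels of a categorical input and columns are outcome counts, considers every pair of levels at each step, and merges all the way down to one level; a real application would stop earlier, and no stopping rule is included here. All names and counts are hypothetical.

```python
from itertools import combinations


def chi_square(table: list) -> float:
    """Pearson chi-square of a table of counts (rows = levels, columns = outcomes)."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return sum(
        (obs - r_tot * c_tot / n) ** 2 / (r_tot * c_tot / n)
        for row, r_tot in zip(table, row_totals)
        for obs, c_tot in zip(row, col_totals)
    )


def merge_rows(rows: list, i: int, j: int, merged) -> list:
    """Drop entries i and j and append their merged replacement."""
    return [r for k, r in enumerate(rows) if k not in (i, j)] + [merged]


def collapse_levels(levels: list, table: list) -> list:
    """Greedy hierarchical merging of the rows (levels) of a contingency table.

    Each step merges the pair of levels whose merger leaves the largest
    chi-square, i.e. gives the smallest reduction in chi-square.
    Returns (merged_levels, chi_square_after_merge) for every step.
    """
    groups = [(level,) for level in levels]
    rows = [list(row) for row in table]
    history = []
    while len(rows) > 1:
        best = None
        for i, j in combinations(range(len(rows)), 2):
            merged = [a + b for a, b in zip(rows[i], rows[j])]
            chi2 = chi_square(merge_rows(rows, i, j, merged))
            if best is None or chi2 > best[0]:
                best = (chi2, i, j, merged)
        chi2, i, j, merged = best
        new_group = groups[i] + groups[j]
        rows = merge_rows(rows, i, j, merged)
        groups = merge_rows(groups, i, j, new_group)
        history.append((new_group, chi2))
    return history


# Hypothetical (event, non-event) counts for four levels of one input.
levels = ["A", "B", "C", "D"]
table = [[10, 90], [12, 88], [40, 60], [45, 55]]
for group, chi2 in collapse_levels(levels, table):
    print(group, round(chi2, 2))
```

With these counts, A and B (event rates 10% and 12%) merge first, then C and D, because those merges cost the least chi-square.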
Variable Clustering
the complexity of a dataset increases rapidly with increasing dimensionality
- this affects computation time, model exploration, model scoring, and redundancy in the dataset
Eigenvalue: the variance explained by each principal component (PC) across all the variables
- equals the column total of squared loadings for that PC
if the second eigenvalue of a cluster (that of its second PC) is greater than a specified threshold
- then the cluster is split
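To make the split rule concrete, here is a small sketch, assuming the cluster's variables are summarized by their correlation matrix and using a threshold of 1 (an assumption here: a PC with eigenvalue above 1 explains more variance than one standardized variable). Eigenvalues come from a hand-written cyclic Jacobi routine rather than a library; `jacobi_eigenvalues` and `should_split` are hypothetical names, and this is not SAS PROC VARCLUS itself. The eigenvalues of a correlation matrix sum to the number of variables, so each eigenvalue divided by that number is the share of total variance its PC explains.

```python
import math


def jacobi_eigenvalues(matrix: list, tol: float = 1e-20, max_sweeps: int = 50) -> list:
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def should_split(corr: list, threshold: float = 1.0) -> bool:
    """Split the cluster when its second eigenvalue exceeds the threshold."""
    eigenvalues = jacobi_eigenvalues(corr)
    return len(eigenvalues) > 1 and eigenvalues[1] > threshold


block = [[1.0, 0.9, 0.0, 0.0],
         [0.9, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.9],
         [0.0, 0.0, 0.9, 1.0]]
tight = [[1.0, 0.8, 0.8],
         [0.8, 1.0, 0.8],
         [0.8, 0.8, 1.0]]
for name, corr in (("two groups", block), ("one group", tight)):
    eig = jacobi_eigenvalues(corr)
    shares = [e / len(corr) for e in eig]  # proportion of total variance per PC
    print(name, [round(e, 3) for e in eig], [round(s, 3) for s in shares],
          "split:", should_split(corr))
```

The two-group matrix has eigenvalues 1.9, 1.9, 0.1, 0.1, so its second eigenvalue exceeds 1 and the cluster is split; the one-group matrix has 2.6, 0.2, 0.2 and is kept as one cluster.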