
Adam J Sullivan
Assistant Professor of Biostatistics
Brown University
We use \[\eta_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \cdots + \beta_px_{ip} = \sum_{j=1}^p \beta_jx_{ij}\]
where:
Some common link functions are
Random Component | Link Function | Outcome | Explanatory | Model |
---|---|---|---|---|
Normal | Identity | Continuous | Factor | ANOVA |
Normal | Identity | Continuous | Continuous | Regression |
Binomial | Logit | Binary | Mixed | Logistic Regression |
Multinomial | Generalized logit | Binary | Mixed | Multinomial Regression |
Poisson | Log | Count | Mixed | Poisson Regression |
\[ Y = \begin{cases} 1 & \text{if sucess}\\ 0 & \text{if failure} \end{cases} \]
\[E(Y) = np\] \[Var(Y)= np(1-p)\]
\[ \begin{aligned} p_i &= \dfrac{\exp\left(\beta_0 + \beta_1x_i\right)}{1+\exp\left(\beta_0 + \beta_1x_i\right)}\\ p_i\left(1+\exp\left(\beta_0 + \beta_1x_i\right)\right)&=\exp\left(\beta_0 + \beta_1x_i\right)\\ p_i &= \exp\left(\beta_0 + \beta_1x_i\right)\left(1-p_i\right)\\ \log\left(\dfrac{p_i}{1-p_i}\right) &= \beta_0 + \beta_1x_i\\ logit\left(p_i\right) &= \beta_0 + \beta_1x_i \end{aligned} \]
Then if we consider the logit:
\[ \begin{aligned} \text{If } p= 0 & \text{then } \log\left(\dfrac{p}{1-p}\right)=-\infty\\ \text{If } p= \tfrac{1}{2} & \text{then } \log\left(\dfrac{p}{1-p}\right)=0\\ \text{If } p= 1 & \text{then } \log\left(\dfrac{p}{1-p}\right)=\infty \end{aligned} \]
\[ \begin{aligned} \eta &= \beta_0 + \beta_1x_{i1} + \cdots + \beta_px_{ip} \\ g(E(y_i)) &= \beta_0 + \beta_1x_{i1} + \cdots + \beta_px_{ip} \\ g(p_i) &= logit\left(p_i\right) \end{aligned} \]
\[ \begin{aligned} \Pr(Y_i=0|x_i) &= 1- \Pr(Y_i=1|x_i)\\ &= 1 - \dfrac{\exp\left(\beta_0 + \beta_1x_i\right)}{1+ \exp\left(\beta_0+\beta_1x_i\right)}\\ &= \dfrac{1}{1+ \exp\left(\beta_0+\beta_1x_i\right)} \end{aligned} \]
library(haven)
wcgs <- read_dta("wcgs2.dta")
wcgs <- wcgs[,-16]
Name | Description |
---|---|
id | Subject identification number |
age | Age in years |
height | Height in inches |
weight | Weight in lbs. |
sbp | Systolic blood pressure in mm |
dbp | Diastolic blood pressure in mm Hg |
chol | Fasting serum cholesterol in mm |
Name | Description |
---|---|
behpat | Behavior |
1 = A1 | |
2 = A2 | |
3 = B3 | |
4 = B4 | |
ncigs | Cigarettes per day |
dibpat | Behavior |
1 = type A | |
2 = type B |
Name | Description |
---|---|
chd69 | Coronary heart disease |
1 = Yes | |
0 = no | |
typechd | Type of CHD |
1 = myocardial infarction or death | |
2 = silent myocardial infarction | |
3 = angina perctoris | |
time169 | Time of CHD event or end of follow-up |
Name | Description |
---|---|
arcus | Arcus senilis |
0 = absent | |
1 = present | |
bmi | Body Mass Index |
library(broom)
fit.cont <- glm(chd69 ~ age, data=wcgs, family=binomial(link="logit"))
tidy(fit.cont, conf.int=TRUE)[,-c(3:4)]
## # A tibble: 2 x 5
## term estimate p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -5.94 3.00e-27 -7.02 -4.87
## 2 age 0.0744 4.56e-11 0.0523 0.0966