Final Exam Extra Questions

Here are the questions I have received. Given their nature, I will answer them on this page instead of in a video.

Hazard vs Survival

The hazard is the negative of the derivative of the natural logarithm of survival:

\[h(t) = -\dfrac{d}{dt}\log\left[S(t)\right]\]

They do not have the same meaning. Survival is the probability of surviving past a certain time \(t\), or \(\Pr(T>t)\). The hazard, however, is the instantaneous rate of failure at time \(t\), given survival up to that time.
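
A short derivation shows where this identity comes from. It uses the density \(f(t)\) of the failure time \(T\), which is notation assumed here rather than taken from the slides:

\[h(t) = \lim_{\Delta t \to 0}\dfrac{\Pr(t \le T < t+\Delta t \mid T \ge t)}{\Delta t} = \dfrac{f(t)}{S(t)} = \dfrac{-\dfrac{d}{dt}S(t)}{S(t)} = -\dfrac{d}{dt}\log\left[S(t)\right]\]

Integrating both sides gives the reverse relationship, \(S(t) = \exp\left(-\int_0^t h(u)\,du\right)\), so either function determines the other even though they answer different questions.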

This means survival measures the proportion of individuals who have survived past some time \(t\), while the hazard is the rate at which we expect subjects who are still at risk to fail at time \(t\).
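
The difference can be checked numerically. The sketch below is only an illustration: it assumes a Weibull survival function with made-up shape and scale values (not anything from the course data), computes the hazard as the negative derivative of log survival, and compares it with the known Weibull hazard.

```python
import math

# Assumed, illustrative Weibull parameters (not from the course data).
SHAPE = 1.5
SCALE = 2.0


def survival(t):
    """Weibull survival: S(t) = Pr(T > t) = exp(-(t / SCALE) ** SHAPE)."""
    return math.exp(-((t / SCALE) ** SHAPE))


def hazard_analytic(t):
    """Known Weibull hazard: (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


def hazard_numeric(t, eps=1e-6):
    """Hazard as -d/dt log S(t), approximated by a central difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)


for t in (0.5, 1.0, 2.0):
    print(
        f"t={t}: S(t)={survival(t):.4f}, "
        f"h(t) numeric={hazard_numeric(t):.4f}, "
        f"h(t) analytic={hazard_analytic(t):.4f}"
    )
```

Survival is a probability, so it stays between 0 and 1 and can only decrease over time. The hazard is a rate, so it is not bounded by 1 and, in this example, it increases with \(t\).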

Different concepts have different interpretations. See Lecture 13, slides 8-13, for the definitions of survival and hazard, and Lecture 14, slides 13-22, for interpretations.

Stratified Cox PH Models

When we stratify a Cox Proportional Hazards model, we are stratifying the baseline hazard function, which we do not interpret. Consider the example from Lecture 14.

We began by fitting a model for the hazard of going back to prison within 1 year of being released. We found that age did not pass the proportional hazards test, so we placed age into categories to stratify over.

Then we fit the model with stratified age:

term     estimate   p.value     conf.low    conf.high
finyes   0.711198   0.0731668   0.4898793   1.032505
prio     1.098636   0.0004973   1.0419799   1.158373

We can plot the model to see the effects of stratification:

[Figure: survival curves for each age stratum from the fitted model]

We can see that the strata do give us different survival curves. This is due to the different baseline hazards. However, the model output is the same regardless of which age category an individual is in:

term     estimate   p.value     conf.low    conf.high
finyes   0.711198   0.0731668   0.4898793   1.032505
prio     1.098636   0.0004973   1.0419799   1.158373
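
One way to write the stratified model, with \(k\) indexing the age category, is shown here. The symbols \(h_{0k}\), \(\beta_1\), and \(\beta_2\) are notation assumed for this sketch:

\[h_k(t \mid \text{fin}, \text{prio}) = h_{0k}(t)\exp\left(\beta_1\,\text{fin} + \beta_2\,\text{prio}\right)\]

Each age category gets its own baseline hazard \(h_{0k}(t)\), but \(\beta_1\) and \(\beta_2\) are shared across categories. For two people in the same category \(k\) with the same prio, the baseline cancels in the ratio. Taking the table's estimate for finyes as the hazard ratio:

\[\dfrac{h_k(t \mid \text{fin}=1, \text{prio})}{h_k(t \mid \text{fin}=0, \text{prio})} = \exp(\beta_1) \approx 0.711\]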

We do, however, account for age in the interpretation. For example, to interpret fin we compare two people in the same age category with the same number of prior convictions: the one who receives financial support has a 28.9% lower hazard of returning to prison than the one who does not.
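
The arithmetic behind that percentage can be sketched as follows. The hazard ratios are copied from the table above; the helper name `percent_change` is made up for this illustration.

```python
# Hazard ratios copied from the stratified Cox model output above.
hazard_ratios = {"finyes": 0.711198, "prio": 1.098636}


def percent_change(hazard_ratio):
    """Percent change in hazard implied by a hazard ratio (negative = decrease)."""
    return (hazard_ratio - 1.0) * 100.0


for term, hr in hazard_ratios.items():
    change = percent_change(hr)
    direction = "decrease" if change < 0 else "increase"
    print(f"{term}: {abs(change):.1f}% {direction} in hazard")

# prints:
# finyes: 28.9% decrease in hazard
# prio: 9.9% increase in hazard
```

The same calculation applies to prio: each additional prior conviction is associated with roughly a 9.9% higher hazard, comparing people in the same age category with the same fin status.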

Exam Material

As I mentioned in class, I am not going to give complete lists of everything on the exam. However the main concepts being tested on the exam will be:

You can find the notes on this in the calendar section of the website. I have also provided a practice exam.

Model Comparisons

Let’s consider the Recidivism data from Lecture 14. We wish to model whether or not a released prisoner will return to prison. If we fit our model with logistic regression we get:

term              estimate    p.value
(Intercept)       0.8302458   0.5398357
finyes            0.6592192   0.0697542
prio              1.1089695   0.0057502
age.cat(19,25]    0.3258687   0.0001494
age.cat(25,30]    0.3873753   0.0129659
age.cat(30,Inf]   0.1621535   0.0000677

If we are interested in whether or not financial assistance helps, we look at the fin variable in the model and see that it is not significant, with a p-value of 0.07.

This model assumes that we follow every subject for the same length of time. Consider the density of time within each age group.

We can see that even within the individual age groups there is quite a bit of variation in time, which may violate the assumption of the same follow-up time for each individual. We may then wish to consider Poisson regression instead:

term              estimate    p.value
(Intercept)       0.4394671   0.0002833
finyes            0.7613478   0.1516880
prio              1.0602218   0.0247865
age.cat(19,25]    0.5051420   0.0020580
age.cat(25,30]    0.5656266   0.0586176
age.cat(30,Inf]   0.2815092   0.0014321
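
To see why unequal follow-up matters when moving from a proportion-based model to a rate-based one, here is a small sketch with made-up numbers (two hypothetical groups, not the course data). It also makes the simplifying assumption that every subject contributes their full follow-up time.

```python
# Hypothetical groups with made-up numbers (not the recidivism data).
# Simplifying assumption: every subject contributes the full follow-up time.
groups = {
    "A": {"events": 10, "subjects": 50, "weeks_each": 52},
    "B": {"events": 10, "subjects": 50, "weeks_each": 26},
}


def proportion(group):
    """Share of subjects with the event; ignores how long they were followed."""
    return group["events"] / group["subjects"]


def rate_per_person_year(group):
    """Events divided by total person-years of follow-up."""
    person_years = group["subjects"] * group["weeks_each"] / 52
    return group["events"] / person_years


for name, group in groups.items():
    print(
        f"group {name}: proportion={proportion(group):.2f}, "
        f"rate={rate_per_person_year(group):.2f} per person-year"
    )

# prints:
# group A: proportion=0.20, rate=0.20 per person-year
# group B: proportion=0.20, rate=0.40 per person-year
```

Both groups have the same proportion of events, but group B accumulated them in half the follow-up time, so its rate is twice as high. A model of the proportion alone cannot see that difference.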

In the Poisson model, fin is also not significant, with a p-value of 0.15. With this model we are assuming that the rate at which a subject goes back to prison remains constant over time. This may be violated, since we might expect that the longer a subject is out, the less likely they are to return to prison. This suggests that neither Poisson nor logistic regression is appropriate, which leaves us with a survival model, Cox Proportional Hazards:

term     estimate   p.value     conf.low    conf.high
finyes   0.711198   0.0731668   0.4898793   1.032505
prio     1.098636   0.0004973   1.0419799   1.158373

We also find in this model that fin is not significant, the same result as in all previous models, but the data are better modeled by Cox PH in this case.
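
For reference, the three models compared above can be sketched side by side. The notation is assumed for this summary (\(x\) for the covariates, \(p\) for the probability of returning to prison, \(\lambda\) for the rate of return), and the estimates in the tables appear to be reported on the exponentiated scale:

\[\text{Logistic: } \log\left(\dfrac{p}{1-p}\right) = \beta_0 + x^\top\beta\]

\[\text{Poisson: } \log(\lambda) = \beta_0 + x^\top\beta\]

\[\text{Cox PH: } h(t \mid x) = h_0(t)\exp\left(x^\top\beta\right)\]

The logistic model has no time component, the Poisson model holds the rate constant over time, and the Cox model lets the baseline hazard \(h_0(t)\) vary freely with time.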