"Consider the problem of estimating the causal effect of skipping classes on final exam score. In a simple regression framework, we have
score = β0 + β1 skipped + u, (15.8)
where score is the final exam score and skipped is the total number of lectures missed during the semester. We certainly might be worried that skipped is correlated with other factors in u: more able, highly motivated students might miss fewer classes. Thus, a simple regression of score on skipped may not give us a good estimate of the causal effect of missing classes.
What might be a good IV for skipped? We need something that has no direct effect on score and is not correlated with student ability and motivation. At the same time, the IV must be correlated with skipped. One option is to use distance between living quarters and campus. Some students at a large university will commute to campus, which may increase the likelihood of missing lectures (due to bad weather, oversleeping, and so on). Thus, skipped may be positively correlated with distance; this can be checked by regressing skipped on distance and doing a t test, as described earlier.
Is distance uncorrelated with u? In the simple regression model (15.8), some factors in u may be correlated with distance. For example, students from low-income families may live off campus; if income affects student performance, this could cause distance to be correlated with u. Section 15.2 shows how to use IV in the context of multiple regression, so that other factors affecting score can be included directly in the model. Then, distance might be a good IV for skipped. An IV approach may not be necessary at all if a good proxy exists for student ability, such as cumulative GPA prior to the semester."
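To make the quoted argument concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the data-generating process, the coefficient values, and the variable names are invented, and distance is constructed to be independent of ability, which is exactly the condition the passage says may fail in practice. It runs the first-stage check (regress skipped on distance and look at the t statistic), then compares the simple OLS slope with the simple IV estimate, Cov(distance, score) / Cov(distance, skipped).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process; all numbers are made up.
ability = rng.normal(size=n)                    # unobserved, so it ends up in u
distance = rng.exponential(scale=5.0, size=n)   # instrument, independent of ability by construction
# Stylized "lectures missed": rises with distance, falls with ability
# (treated as continuous here, so it is not a literal count).
skipped = 5.0 + 0.4 * distance - 2.0 * ability + rng.normal(size=n)
true_beta1 = -1.0                               # assumed causal effect of one missed lecture
score = 80.0 + true_beta1 * skipped + 5.0 * ability + rng.normal(scale=3.0, size=n)


def slope_and_t(x, y):
    """Simple-regression slope of y on x and its t statistic."""
    x_c = x - x.mean()
    sxx = x_c @ x_c
    slope = (x_c @ (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    se = np.sqrt((resid @ resid) / (len(x) - 2) / sxx)
    return slope, slope / se


# Relevance: is skipped correlated with distance?
first_stage_slope, first_stage_t = slope_and_t(distance, skipped)

# OLS of score on skipped is biased because ability sits in the error term.
beta_ols, _ = slope_and_t(skipped, score)

# Simple IV estimator: sample Cov(z, y) / Cov(z, x).
beta_iv = np.cov(distance, score)[0, 1] / np.cov(distance, skipped)[0, 1]

print(f"first-stage slope = {first_stage_slope:.3f}, t = {first_stage_t:.1f}")
print(f"OLS estimate of beta1 = {beta_ols:.3f}")
print(f"IV  estimate of beta1 = {beta_iv:.3f} (true value {true_beta1})")
```

In this setup OLS overstates the harm from skipping, because low-ability students both skip more and score lower, while the IV estimate lands near the assumed true effect. If distance were also correlated with something in u, such as family income, the IV estimate would be inconsistent as well, which is the caveat raised in the last paragraph of the quote.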
Whoever implemented the NUS attendance policy because studies found a correlation between bad grades and skipping lessons would do well to find proper instrumental variables.