Mastering 'Metrics
The Path from Cause to Effect
By Joshua D. Angrist, Jörn-Steffen Pischke PRINCETON UNIVERSITY PRESS
Copyright © 2015 Princeton University Press
All rights reserved.
ISBN: 978-1-4008-5238-3
CHAPTER 1
Randomized Trials
Kwai Chang Caine: What happens in a man's life is already written. A man must move through life as his destiny wills.
Old man: Yet each is free to live as he chooses. Though they seem opposite, both are true.
Kung Fu, Pilot
Our Path
Our path begins with experimental random assignment, both as a framework for causal questions and a benchmark by which the results from other methods are judged. We illustrate the awesome power of random assignment through two randomized evaluations of the effects of health insurance. The appendix to this chapter also uses the experimental framework to review the concepts and methods of statistical inference.
1.1 In Sickness and in Health (Insurance)
The Affordable Care Act (ACA) has proven to be one of the most controversial and interesting policy innovations we've seen. The ACA requires Americans to buy health insurance, with a tax penalty for those who don't voluntarily buy in. The question of the proper role of government in the market for health care has many angles. One is the causal effect of health insurance on health. The United States spends more of its GDP on health care than do other developed nations, yet Americans are surprisingly unhealthy. For example, Americans are more likely to be overweight and die sooner than their Canadian cousins, who spend only about two-thirds as much on care. America is also unusual among developed countries in having no universal health insurance scheme. Perhaps there's a causal connection here.
Elderly Americans are covered by a federal program called Medicare, while some poor Americans (including most single mothers, their children, and many other poor children) are covered by Medicaid. Many of the working, prime-age poor, however, have long been uninsured. In fact, many uninsured Americans have chosen not to participate in an employer-provided insurance plan. These workers, perhaps correctly, count on hospital emergency departments, which cannot turn them away, to address their health-care needs. But the emergency department might not be the best place to treat, say, the flu, or to manage chronic conditions like diabetes and hypertension that are so pervasive among poor Americans. The emergency department is not required to provide long-term care. It therefore stands to reason that government-mandated health insurance might yield a health dividend. The push for subsidized universal health insurance stems in part from the belief that it does.
The ceteris paribus question in this context contrasts the health of someone with insurance coverage to the health of the same person were they without insurance (other than an emergency department backstop). This contrast highlights a fundamental empirical conundrum: people are either insured or not. We don't get to see them both ways, at least not at the same time in exactly the same circumstances.
In his celebrated poem, "The Road Not Taken," Robert Frost used the metaphor of a crossroads to describe the causal effects of personal choice:
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Frost's traveler concludes:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
The traveler claims his choice has mattered, but, being only one person, he can't be sure. A later trip or a report by other travelers won't nail it down for him, either. Our narrator might be older and wiser the second time around, while other travelers might have different experiences on the same road. So it is with any choice, including those related to health insurance: would uninsured men with heart disease be disease-free if they had insurance? In the novel Light Years, James Salter's irresolute narrator observes: "Acts demolish their alternatives, that is the paradox." We can't know what lies at the end of the road not taken.
We can't know, but evidence can be brought to bear on the question. This chapter takes you through some of the evidence related to paths involving health insurance. The starting point is the National Health Interview Survey (NHIS), an annual survey of the U.S. population with detailed information on health and health insurance. Among many other things, the NHIS asks: "Would you say your health in general is excellent, very good, good, fair, or poor?" We used this question to code an index that assigns 5 to excellent health and 1 to poor health in a sample of married 2009 NHIS respondents who may or may not be insured. This index is our outcome: a measure we're interested in studying. The causal relation of interest here is determined by a variable that indicates coverage by private health insurance. We call this variable the treatment, borrowing from the literature on medical trials, although the treatments we're interested in need not be medical treatments like drugs or surgery. In this context, those with insurance can be thought of as the treatment group; those without insurance make up the comparison or control group. A good control group reveals the fate of the treated in a counterfactual world where they are not treated.
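The coding of the health index described above can be sketched in a few lines of Python. This is only an illustration: the text fixes the endpoints (5 for excellent, 1 for poor), and the sketch assumes the intermediate answers are scored 4, 3, and 2 in order. The function name and the sample answers are hypothetical and are not NHIS data.

```python
# Map the five NHIS answer categories to the 1-5 health index.
# The endpoints (excellent = 5, poor = 1) come from the text; the
# evenly spaced values in between are an assumption of this sketch.
HEALTH_INDEX = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}


def code_health(response: str) -> int:
    """Return the index for one survey answer, ignoring case and outer spaces."""
    return HEALTH_INDEX[response.strip().lower()]


# Hypothetical answers, invented purely for illustration (not NHIS data).
responses = ["Excellent", "good", "Very good", "fair"]
index_values = [code_health(r) for r in responses]
print(index_values)
```

Averaging such index values separately for the insured and the uninsured gives the kind of comparison reported in the first row of Table 1.1.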
The first row of Table 1.1 compares the average health index of insured and uninsured Americans, with statistics tabulated separately for husbands and wives. Those with health insurance are indeed healthier than those without, a gap of about .3 in the index for men and .4 in the index for women. These are large differences when measured against the standard deviation of the health index, which is about 1. (Standard deviations, reported in square brackets in Table 1.1, measure variability in data. The chapter appendix reviews the relevant formula.) These large gaps might be the health dividend we're looking for.
Fruitless and Fruitful Comparisons
Simple comparisons, such as those at the top of Table 1.1, are often cited as evidence of causal effects. More often than not, however, such comparisons are misleading. Once again the problem is other things equal, or lack thereof. Comparisons of people with and without health insurance are not apples to apples; such contrasts are apples to oranges, or worse.
Among other differences, those with health insurance are better educated, have higher income, and are more likely to be working than the uninsured. This can be seen in panel B of Table 1.1, which reports the average characteristics of NHIS respondents who do and don't have health insurance. Many of the differences in the table are large (for example, a nearly 3-year schooling gap); most are statistically precise enough to rule out the hypothesis that these discrepancies are merely chance findings (see the chapter appendix for a refresher on statistical significance). It won't surprise you to learn that most variables tabulated here are highly correlated with health as well as with health insurance status. More-educated people, for example, tend to be healthier as well as being overrepresented in the insured group. This may be because more-educated people exercise more, smoke less, and are more likely to wear seat belts. It stands to reason that the difference in health between insured and uninsured NHIS respondents at least partly reflects the extra schooling of the insured.
Our effort to understand the causal connection between insurance and health is aided by fleshing out Frost's two-roads metaphor. We use the letter Y as shorthand for health, the outcome variable of interest. To make it clear when we're talking about specific people, we use subscripts as a stand-in for names: Yi is the health of individual i. The outcome Yi is recorded in our data. But, facing the choice of whether to pay for health insurance, person i has two potential outcomes, only one of which is observed. To distinguish one potential outcome from another, we add a second subscript: The road taken without health insurance leads to Y0i (read this as "y-zero-i") for person i, while the road with health insurance leads to Y1i (read this as "y-one-i") for person i. Potential outcomes lie at the end of each road one might take. The causal effect of insurance on health is the difference between them, written Y1i - Y0i.
To nail this down further, consider the story of visiting Massachusetts Institute of Technology (MIT) student Khuzdar Khalat, recently arrived from Kazakhstan. Kazakhstan has a national health insurance system that covers all its citizens automatically (though you wouldn't go there just for the health insurance). Arriving in Cambridge, Massachusetts, Khuzdar is surprised to learn that MIT students must decide whether to opt in to the university's health insurance plan, for which MIT levies a hefty fee. Upon reflection, Khuzdar judges the MIT insurance worth paying for, since he fears upper respiratory infections in chilly New England. Let's say that Y0i = 3 and Y1i = 4 for i = Khuzdar. For him, the causal effect of insurance is one step up on the NHIS scale:
Y1,Khuzdar - Y0,Khuzdar = 1.
Table 1.2 summarizes this information.
It's worth emphasizing that Table 1.2 is an imaginary table: some of the information it describes must remain hidden. Khuzdar will either buy insurance, revealing his value of Y1i, or he won't, in which case his Y0i is revealed. Khuzdar has walked many a long and dusty road in Kazakhstan, but even he cannot be sure what lies at the end of those not taken.
Maria Moreño is also coming to MIT this year; she hails from Chile's Andean highlands. Little concerned by Boston winters, hearty Maria is not the type to fall sick easily. She therefore passes up the MIT insurance, planning to use her money for travel instead. Because Maria has Y0,Maria = Y1,Maria = 5, the causal effect of insurance on her health is
Y1,Maria - Y0,Maria = 0.
Maria's numbers likewise appear in Table 1.2.
Since Khuzdar and Maria make different insurance choices, they offer an interesting comparison. Khuzdar's health is YKhuzdar = Y1,Khuzdar = 4, while Maria's is YMaria = Y0,Maria = 5. The difference between them is
YKhuzdar - YMaria = -1.
Taken at face value, this quantity—which we observe—suggests Khuzdar's decision to buy insurance is counterproductive. His MIT insurance coverage notwithstanding, insured Khuzdar's health is worse than uninsured Maria's.
In fact, the comparison between frail Khuzdar and hearty Maria tells us little about the causal effects of their choices. This can be seen by linking observed and potential outcomes as follows:
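\[
\begin{aligned}
Y_{\text{Khuzdar}} - Y_{\text{Maria}} &= Y_{1,\text{Khuzdar}} - Y_{0,\text{Maria}} \\
&= \underbrace{Y_{1,\text{Khuzdar}} - Y_{0,\text{Khuzdar}}}_{1} + \underbrace{Y_{0,\text{Khuzdar}} - Y_{0,\text{Maria}}}_{-2}.
\end{aligned}
\]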
The second line in this equation is derived by adding and subtracting Y0,Khuzdar, thereby generating two hidden comparisons that determine the one we see. The first comparison, Y1,Khuzdar - Y0,Khuzdar, is the causal effect of health insurance on Khuzdar, which is equal to 1. The second, Y0,Khuzdar - Y0,Maria, is the difference between the two students' health status were both to decide against insurance. This term, equal to -2, reflects Khuzdar's relative frailty. In the context of our effort to uncover causal effects, the lack of comparability captured by the second term is called selection bias.
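The arithmetic behind this decomposition can be checked with a minimal Python sketch. The potential outcomes are the values given in the text for Khuzdar and Maria; the dictionary layout and the helper name `observed` are assumptions made for illustration.

```python
# Potential outcomes from the text: y0 = health without insurance,
# y1 = health with insurance. "insured" records the road each student took.
khuzdar = {"y0": 3, "y1": 4, "insured": True}
maria = {"y0": 5, "y1": 5, "insured": False}


def observed(person: dict) -> int:
    """Only the potential outcome on the chosen road is ever seen."""
    return person["y1"] if person["insured"] else person["y0"]


observed_gap = observed(khuzdar) - observed(maria)  # the comparison we see
causal_effect = khuzdar["y1"] - khuzdar["y0"]       # hidden: effect on Khuzdar
selection_bias = khuzdar["y0"] - maria["y0"]        # hidden: gap with no insurance

# The observed gap is the sum of the two hidden comparisons.
assert observed_gap == causal_effect + selection_bias
print(observed_gap, causal_effect, selection_bias)  # -1 1 -2
```

A positive causal effect of 1 is swamped by a selection bias of -2, which is why the observed comparison points the wrong way.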
You might think that selection bias has something to do with our focus on particular individuals instead of on groups, where, perhaps, extraneous differences can be expected to "average out." But the difficult problem of selection bias carries over to comparisons of groups, though, instead of individual causal effects, our attention shifts to average causal effects. In a group of n people, average causal effects are written Avgn[Y1i - Y0i], where averaging is done in the usual way (that is, we sum individual outcomes and divide by n):
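\[
\begin{aligned}
\mathrm{Avg}_n[Y_{1i} - Y_{0i}] &= \frac{1}{n}\sum_{i=1}^{n}\,[Y_{1i} - Y_{0i}] \\
&= \frac{1}{n}\sum_{i=1}^{n} Y_{1i} - \frac{1}{n}\sum_{i=1}^{n} Y_{0i}. \qquad (1.1)
\end{aligned}
\]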
The symbol $\sum_{i=1}^{n}$ indicates a sum over everyone from i = 1 to n, where n is the size of the group over which we are averaging. Note that both summations in equation (1.1) are taken over everybody in the group of interest. The average causal effect of health insurance compares average health in hypothetical scenarios where everybody in the group does and does not have health insurance. As a computational matter, this is the average of individual causal effects like Y1,Khuzdar - Y0,Khuzdar and Y1,Maria - Y0,Maria for each student in our data.
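As a small numerical check of equation (1.1), the sketch below treats Khuzdar and Maria as a group of n = 2 and computes the average causal effect two ways: as the average of individual effects, and as the difference between average Y1i and average Y0i. The potential outcomes are the values given in the text; the variable names are illustrative only.

```python
from statistics import mean

# Potential outcomes from the text for a two-person group (n = 2).
y0 = {"Khuzdar": 3, "Maria": 5}  # health without insurance
y1 = {"Khuzdar": 4, "Maria": 5}  # health with insurance

# First form of (1.1): average the individual causal effects.
individual_effects = [y1[name] - y0[name] for name in y0]
avg_of_effects = mean(individual_effects)

# Second form of (1.1): average Y1 over everyone minus average Y0 over everyone.
diff_of_avgs = mean(list(y1.values())) - mean(list(y0.values()))

print(avg_of_effects, diff_of_avgs)
```

Both forms need every student's Y1i and every student's Y0i, including the outcomes on roads not taken, so neither can be computed from observed data alone.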
An investigation of the average causal effect of insurance naturally begins by comparing the average health of groups of insured and uninsured people, as in Table 1.1. This comparison is facilitated by the construction of a dummy variable, Di, which takes on the values 0 and 1 to indicate insurance status:
\[
D_i =
\begin{cases}
1 & \text{if individual } i \text{ is insured,} \\
0 & \text{otherwise.}
\end{cases}
\]
We can now write Avgn[Yi|Di = 1] for the average among the insured and Avgn[Yi|Di = 0] for the average among the uninsured. These quantities are averages conditional on insurance status.
The average Yi for the insured is necessarily an average of outcome Y1i, but contains no information about Y0i. Likewise, the average Yi among the uninsured is an average of outcome Y0i, but this average is devoid of information about the corresponding Y1i. In other words, the road taken by those with insurance ends with Y1i, while the road taken by those without insurance leads to Y0i. This in turn leads to a simple but important conclusion about the difference in average health by insurance status:
Difference in group means
= Avgn[Yi|Di = 1] - Avgn[Yi|Di = 0]
= Avgn[Y1i|Di = 1] - Avgn[Y0i|Di = 0], (1.2)
an expression highlighting the fact that the comparisons in Table 1.1 tell us something about potential outcomes, though not necessarily what we want to know. We're after Avgn[Y1i - Y0i], an average causal effect involving everyone's Y1i and everyone's Y0i, but we see average Y1i only for the insured and average Y0i only for the uninsured.
To sharpen our understanding of equation (1.2), it helps to imagine that health insurance makes everyone healthier by a constant amount, κ. As is the custom among our people, we use Greek letters to label such parameters, so as to distinguish them from variables or data; this one is the letter "kappa." The constant-effects assumption allows us to write:
Y1i = Y0i + κ, (1.3)
or, equivalently, Y1i - Y0i = κ. In other words, κ is both the individual and average causal effect of insurance on health. The question at hand is how comparisons such as those at the top of Table 1.1 relate to κ.
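One way to see how the two are connected, offered here as a sketch that uses only equations (1.2) and (1.3), is to substitute Y1i = Y0i + κ into the average for the insured, so that Avgn[Y1i|Di = 1] = κ + Avgn[Y0i|Di = 1]:

```latex
\begin{aligned}
\mathrm{Avg}_n[Y_i \mid D_i = 1] - \mathrm{Avg}_n[Y_i \mid D_i = 0]
  &= \mathrm{Avg}_n[Y_{1i} \mid D_i = 1] - \mathrm{Avg}_n[Y_{0i} \mid D_i = 0] \\
  &= \kappa + \mathrm{Avg}_n[Y_{0i} \mid D_i = 1] - \mathrm{Avg}_n[Y_{0i} \mid D_i = 0].
\end{aligned}
```

Under the constant-effects assumption, the difference in group means equals κ plus the difference in average Y0i between the insured and the uninsured. That second piece is the group counterpart of the selection bias term in the comparison of Khuzdar and Maria.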
Excerpted from Mastering 'Metrics by Joshua D. Angrist, Jörn-Steffen Pischke. Copyright © 2015 Princeton University Press. Excerpted by permission of PRINCETON UNIVERSITY PRESS.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.