Results matching “angrist”

Matthew Bogard writes:

Regarding the book Mostly Harmless Econometrics, you state:
A casual reader of the book might be left with the unfortunate impression that matching is a competitor to regression rather than a tool for making regression more effective.
But in fact isn't that what they are arguing: that, in a 'mostly harmless' way, regression is in fact a matching estimator itself? "Our view is that regression can be motivated as a particular sort of weighted matching estimator, and therefore the differences between regression and matching estimates are unlikely to be of major empirical importance" (Chapter 3, p. 70). They seem to be distinguishing regression (without prior matching) from all other types of matching techniques, and therefore implying that regression can be a 'mostly harmless' substitute or competitor to matching. My previous understanding, before starting this book, was, as you say, that matching is a tool that makes regression more effective. I have not finished their book, and have been working at it for a while, but if they do not mean to propose OLS itself as a matching estimator, then I agree that they definitely need some clarification. I actually found your particular post while searching for an article that discussed this more formally, as I found my interpretation (misinterpretation) difficult to accept. What say you?

My reply:

I don't know what Angrist and Pischke actually do in their applied analysis. I'm sorry to report that many users of matching do seem to think of it as a pure substitute for regression: once they decide to use matching, they try to do it perfectly and they often don't realize they can use regression on the matched data to do even better. In my book with Jennifer, we try to clarify that the primary role of matching is to correct for lack of complete overlap between control and treatment groups.

But I think in the comment you quoted above, Angrist and Pischke are just giving a conceptual perspective rather than detailed methodological advice. They're saying that regression, like matching, is a way of comparing like with like in estimating a comparison. This point seems commonplace from a statistical standpoint but may be news to some economists who might think that regression relies on the linear model being true.

Gary King and I discuss this general idea in our 1990 paper on estimating incumbency advantage. Basically, a regression model works if either of two assumptions is satisfied: if the linear model is true, or if the two groups are balanced so that you're getting an average treatment effect. More recently this idea (of there being two bases for an inference) has been given the name "double robustness"; in any case, it's a fundamental aspect of regression modeling, and I think that, by equating regression with matching, Angrist and Pischke are just trying to emphasize that these are two different ways of ensuring balance in a comparison.

In many examples, neither regression nor matching works perfectly, which is why it can be better to do both (as Don Rubin discussed in his Ph.D. thesis in 1970 and subsequently in some published articles with his advisor, William Cochran).
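To make the "do both" idea concrete, here's a minimal simulated sketch (a toy example of my own, not anything from Rubin's or Angrist and Pischke's analyses): nearest-neighbor matching on a single confounder to restrict the comparison to the region of overlap, followed by regression adjustment on the matched sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: treatment assignment depends on a confounder x,
# and the outcome depends on both x and the treatment.
n = 2000
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-1.5 * x)))   # treated units tend to have higher x
y = 2.0 * t + 3.0 * x + rng.normal(size=n)         # true treatment effect = 2

# Step 1: one-to-one nearest-neighbor matching on x (with replacement),
# to compare like with like and restrict to the region of overlap.
treated = np.where(t == 1)[0]
controls = np.where(t == 0)[0]
matched_controls = np.array(
    [controls[np.argmin(np.abs(x[controls] - x[i]))] for i in treated]
)
idx = np.concatenate([treated, matched_controls])

# Step 2: regression on the matched data (not just a difference in means),
# which cleans up the residual imbalance in x.
X = np.column_stack([np.ones(len(idx)), t[idx], x[idx]])
beta = np.linalg.lstsq(X, y[idx], rcond=None)[0]

print("raw difference in means:      ", y[t == 1].mean() - y[t == 0].mean())
print("regression on matched sample: ", beta[1])   # should be near the true effect of 2
```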

Alex Tabarrok quotes Randall Morck and Bernard Yeung on difficulties with instrumental variables. This reminded me of some related things I've written.

In the official story the causal question comes first and then the clever researcher comes up with an IV. I suspect that often it's the other way around: you find a natural experiment and look at the consequences that flow from it. And maybe that's not such a bad thing. See section 4 of this article.

More generally, I think economists and political scientists are currently a bit overinvested in identification strategies. I agree with Heckman's point (as I understand it) that ultimately we should be building models that work for us rather than always thinking we can get causal inference on the cheap, as it were, by some trick or another. (This is a point I briefly discuss in a couple of places here and also in my recent paper for the causality volume that Don Green and others are involved with.)

I recently had this discussion with someone else regarding regression discontinuity (the current flavor of the month; IV is soooo 90's), but I think the point holds more generally, that experiments and natural experiments are great when you have them, and they're great to aspire to and to focus one's thinking, but in practice these inferences are sometimes a bit of a stretch, and sometimes the appeal of an apparently clean identification strategy masks some serious difficulty mapping the identified parameter to underlying quantities of interest.

P.S. How I think about instrumental variables.

Classics of statistics

Christian Robert is planning a graduate seminar in which students read 15 classic articles of statistics. (See here for more details and a slightly different list.)

Actually, he just writes "classics," but based on his list, I assume he only wants articles, not books. If he wanted to include classic books, I'd nominate the following, just for starters:
- Fisher's Statistical Methods for Research Workers
- Snedecor and Cochran's Statistical Methods
- Kish's Survey Sampling
- Box, Hunter, and Hunter's Statistics for Experimenters
- Tukey's Exploratory Data Analysis
- Cleveland's The Elements of Graphing Data
- Mosteller and Wallace's book on the Federalist Papers.
Probably Cox and Hinkley, too. That's a book that I don't think has aged well, but it seems to have had a big influence.

I think there's a lot more good and accessible material in these classic books than in the equivalent volume of classic articles. Journal articles can be difficult to read and are typically filled with irrelevant theoretical material, the kind of stuff you need to include to impress the referees. I find books to be more focused and thoughtful.

Accepting Christian's decision to choose articles rather than books, what would be my own nominations for "classics of statistics"? To start with, there must be some tradeoff between quality and historical importance.

One thing that struck me about the list supplied by Christian is how many of these articles I would definitely not include in such a course. For example, the paper by Durbin and Watson (1950) does not seem at all interesting to me. Yes, it's been influential, in that a lot of people use that statistical test, but as an article, it hardly seems classic. Similarly, I can't see the point of including the paper by Hastings (1970). Sure, the method is important, but Christian's students will already know it, and I don't see much to be gained by reading the paper. I'd recommend Metropolis et al. (1953) instead. And Casella and Strawderman (1981)? A paper about minimax estimation of a normal mean? What's that doing on the list??? The paper is fine--I'd be proud to have written it, in fact I'd gladly admit that it's better than 90% of anything I've ever published--but it seems more of a minor note than a "classic." Or maybe there's some influence here of which I'm unaware. And I don't see how Dempster, Laird, and Rubin (1977) belongs on this list. It's a fine article and the EM algorithm has been tremendously useful, but, still, I think it's more about computation than statistics. As to Berger and Sellke (1987), well, yes, this paper has had an immense influence, at least among theoretical statisticians--but I think the paper is basically wrong! I don't want to label a paper as a classic if it's sent much of the field in the wrong direction.

For other papers on Christian's list, I can see the virtue of including them in a seminar. For example, Hartigan and Wong (1979), "Algorithm AS 136: A K-Means Clustering Algorithm." The algorithm is no big deal, and the idea of k-means clustering is nothing special. But it's cool to see how people thought about such things back then.

And Christian also does include some true classics, such as Neyman and Pearson's 1933 paper on hypothesis testing, Plackett and Burman's 1946 paper on experimental design, Pitman's 1939 paper on inference (I don't know if that's the best Pitman paper to include, but that's a minor issue), Cox's hugely influential 1972 paper on hazard regression, Efron's bootstrap paper, and classics by Whittle and Yates. Others I don't really feel so competent to judge (for example, Huber (1985) on projection pursuit), but it seems reasonable enough to include them on the list.

OK, what papers would I add? I'll list them in order of time of publication. (Christian used alphabetical order, which, as we all know, violates principles of statistical graphics.)

Neyman (1935). Statistical problems in agricultural experimentation (with discussion). JRSS. This one's hard to read, but it's certainly a classic, especially when paired with Fisher's comments in the lively discussion.

Tukey (1972). Some graphic and semigraphic displays. This article, which appears in a volume of papers dedicated to George Snedecor, is a lot of fun (even if in many ways unsound).

Akaike (1973). Information theory and an extension of the maximum likelihood principle. From a conference proceedings. I prefer this slightly to Mallows's paper on Cp, written at about the same time (but I like the Mallows paper too).

Lindley and Smith (1972). Bayes estimates for the linear model (with discussion). JRSS-B. The methods in the paper are mostly out of date, but it's worth it for the discussion (especially the (inadvertently) hilarious contribution of Kempthorne).

Rubin (1976). Inference and missing data. Biometrika. "Missing at random" and all the rest.

Wahba (1978). Improper priors, spline smoothing and the problem of guarding against model errors in regression. JRSS-B. This stuff all looks pretty straightforward now, but maybe not so much so back in 1978, when people were still talking about silly ideas such as "ridge regression." And it's good to have all these concepts in one place.

Rubin (1980). Using empirical Bayes techniques in the law school validity studies (with discussion). JASA. Great, great stuff, also many interesting points come up in the discussion. If you only want to include one Rubin article, keep this one and leave "Inference and missing data" for students to discover on their own.

Hmm . . . why are so many of these from the 1970s? I'm probably showing my age. Perhaps there's some general principle that papers published in year X have the most influence on graduate students in year X+15. Anything earlier seems simply out of date (that's how I feel about Stein's classic papers, for example; sure, they're fine, but I don't see their relevance to anything I'm doing today, in contrast to the above-noted works by Tukey, Akaike, etc., which speak to my current problems), whereas anything much more recent doesn't feel like such a "classic" at all.

OK, so here's a more recent classic:

Imbens and Angrist (1994). Identification and estimation of local average treatment effects. Econometrica.

Finally, there are some famous papers that I'm glad Christian didn't consider. I'm thinking of influential papers by Wilcoxon, Box and Cox, and the zillions of papers that introduced particular hypothesis tests (the sort with names they tell you in a biostatistics class). Individually, these papers are fine, but I don't see that students would get much out of reading them. If I were going to pick any paper of that genre, I'd pick Deming and Stephan's 1940 article on iterative proportional fitting. I also like a bunch of my own articles, but there's no point in mentioning them here!

Any other classics you'd like to nominate (or places where you disagree with me)?

Causal inference in economics

Aaron Edlin points me to this issue of the Journal of Economic Perspectives that focuses on statistical methods for causal inference in economics. (Michael Bishop's page provides some links.)

To quickly summarize my reactions to Angrist and Pischke's book: I pretty much agree with them that the potential-outcomes or natural-experiment approach is the most useful way to think about causality in economics and related fields. My main amendments to Angrist and Pischke would be to recognize that:

1. Modeling is important, especially modeling of interactions. It's unfortunate to see a debate between experimentalists and modelers. Some experimenters (not Angrist and Pischke) make the mistake of avoiding models: Once they have their experimental data, they check their brains at the door and do nothing but simple differences, not realizing how much more can be learned. Conversely, some modelers are unduly dismissive of experiments and formal observational studies, forgetting that (as discussed in Chapter 7 of Bayesian Data Analysis) a good design can make model-based inference more robust.

2. In the case of a "natural experiment" or "instrumental variable," inference flows forward from the instrument, not backwards from the causal question. Estimates based on instrumental variables, regression discontinuity, and the like are often presented with the researcher having a causal question and then finding an instrument or natural experiment to get identification. I think it's more helpful, though, to go forward from the intervention and look at all its effects. Your final IV estimate or whatever won't necessarily change, but I think my approach is a healthier way to get a grip on what you can actually learn from your study.

Now on to the articles:

Causality and Statistical Learning

[The following is a review essay invited by the American Journal of Sociology. Details and acknowledgments appear at the end.]

In social science we are sometimes in the position of studying descriptive questions (for example: In what places do working-class whites vote for Republicans? In what eras has social mobility been higher in the United States than in Europe? In what social settings are different sorts of people more likely to act strategically?). Answering descriptive questions is not easy and involves issues of data collection, data analysis, and measurement (how should one define concepts such as "working-class whites," "social mobility," and "strategic"?), but is uncontroversial from a statistical standpoint.

All becomes more difficult when we shift our focus from What to What-if and Why.

Thinking about causal inference

Consider two broad classes of inferential questions:

1. Forward causal inference. What might happen if we do X? What are the effects of smoking on health, the effects of schooling on knowledge, the effects of campaigns on election outcomes, and so forth?

2. Reverse causal inference. What causes Y? Why do more attractive people earn more money, why do many poor people vote for Republicans and rich people vote for Democrats, why did the economy collapse?

In forward reasoning, the potential treatments under study are chosen ahead of time, whereas, in reverse reasoning, the research goal is to find and assess the importance of the causes. The distinction between forward and reverse reasoning (also called "the effects of causes" and the "causes of effects") was made by Mill (1843). Forward causation is a pretty clearly defined problem, and there is a consensus that it can be modeled using the counterfactual or potential-outcome notation associated with Neyman (1923) and Rubin (1974) and expressed using graphical models by Pearl (2009): the causal effect of a treatment T on an outcome Y for an individual person (say) is a comparison between the value of Y that would've been observed had the person followed the treatment and the value that would've been observed under the control; in many contexts, the treatment effect for person i is defined as the difference, Yi(T=1) - Yi(T=0). Many common techniques, such as difference in differences, linear regression, and instrumental variables, can be viewed as estimating average causal effects under this definition.
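As a concrete illustration of this notation (a toy simulation of my own, not part of the original essay): each unit carries both potential outcomes Yi(T=1) and Yi(T=0), only one of them is ever observed, and under randomization the difference in observed group means estimates the average of Yi(T=1) - Yi(T=0).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Each unit has two potential outcomes; only one will ever be observed.
y0 = rng.normal(loc=0.0, scale=1.0, size=n)        # outcome under control, Yi(T=0)
y1 = y0 + rng.normal(loc=2.0, scale=0.5, size=n)   # outcome under treatment, Yi(T=1)

true_ate = np.mean(y1 - y0)                        # average of the unit-level effects

# Randomized assignment determines which potential outcome we get to see.
t = rng.binomial(1, 0.5, size=n)
y_obs = np.where(t == 1, y1, y0)

estimate = y_obs[t == 1].mean() - y_obs[t == 0].mean()
print(true_ate, estimate)   # the two should be close because of randomization
```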

In the social sciences, where it is generally not possible to try more than one treatment on the same unit (and, even when this is possible, there is the possibility of contamination from past exposure and changes in the unit or the treatment over time), questions of forward causation are most directly studied using randomization or so-called natural experiments (see Angrist and Pischke, 2008, for discussion and many examples). In some settings, crossover designs can be used to estimate individual causal effects, if one accepts certain assumptions about treatment effects being bounded in time. Heckman (2006), pointing to the difficulty of generalizing from experimental to real-world settings, argues that randomization is not any sort of "gold standard" of causal inference, but this is a minority position: I believe that most social scientists and policy analysts would be thrilled to have randomized experiments for their forward-causal questions, even while recognizing that subject-matter models are needed to make useful inferences from any experimental or observational study.

Reverse causal inference is another story. As has long been realized, the effects of action X flow naturally forward in time, while the causes of outcome Y cannot be so clearly traced backward. Did the North Vietnamese win the American War because of the Tet Offensive, or because of American public opinion, or because of the skills of General Giap, or because of the political skills of Ho Chi Minh, or because of the conflicted motivations of Henry Kissinger, or because of Vietnam's rough terrain, or . . .? To ask such a question is to reveal the impossibility of answering it. On the other hand, questions such as "Why do whites do better than blacks in school?", while difficult, do not seem inherently unanswerable or meaningless.

We can have an idea of going backward in the causal chain, accounting for more and more factors until the difference under study disappears--that is, is "explained" by the causal predictors. Such an activity can be tricky--hence the motivation for statistical procedures for studying causal paths--and ultimately is often formulated in terms of forward causal questions: causal effects that add up to explaining the Why question that was ultimately asked. Reverse causal questions are often more interesting and motivate much, perhaps most, social science research; forward causal research is more limited and less generalizable but is more doable. So we all end up going back and forth on this.

We see three difficult problems in causal inference:

Dumpin' the data in raw

Benjamin Kay writes:

I just finished the Stata Journal article you wrote. In it I found the following quote: "On the other hand, I think there is a big gap in practice when there is no discussion of how to set up the model, an implicit assumption that variables are just dumped raw into the regression."

I saw James Heckman (famous econometrician and labor economist) speak on Friday, and he mentioned that using test scores in many kinds of regressions is problematic, because the assignment of a score is somewhat arbitrary even if the ordering is not. He suggested that positive, monotonic transformations of the scores contain the same information and yet lead to different standard errors if, in your words, one just "dumped them into the regression." It was somewhat of a throwaway remark, but considering it longer, I imagine he means that a given difference in test scores need not correspond to a constant effect. The remedy he suggested was to recalibrate exam scores so that they have some objective meaning. For example, on a mechanics exam scored between one and a hundred, one can pass (65) only by successfully rebuilding the engine in the time allotted, while better scores indicate higher quality or faster speed. In this example one might change the score to a binary variable for passing or not, an objective test of a set of competencies. However, doing that clearly throws away information.

Do you or the readers of the Statistical Modeling, Causal Inference, and Social Science blog have any advice here? The transformation of the variable is problematic, and the critique of just using it raw seems a serious one, but narrowly mapping it onto a set of objective discrete skills seems to destroy lots of information. Percentile ranks on exams might be a substitute for the raw scores in many cases, but they introduce other problems, for example in comparisons between groups.

My reply: Heckman's suggestion sounds like it would be good in some cases, but it wouldn't work for something like the SAT, which is essentially a continuous measure. In other cases, such as estimated ideal-point measures for congressmembers, it can make sense to break a single continuous ideal-point measure into two variables: political party (a binary variable: Dem or Rep) and the ideology score. This gives you the benefits of discretization without the loss of information.

In chapter 4 of ARM we give a bunch of examples of transformations, sometimes on single variables, sometimes combining variables, sometimes breaking up a variable into parts. A lot of information is coded in how you represent a regression function, and it's criminal to just take the data as they appear in the Stata file and dump them in raw. But I have the horrible feeling that many people either feel that it's cheating to transform the variables, or that it doesn't really matter what you do to the variables, because regression (or matching, or difference-in-differences, or whatever) is a theorem-certified bit of magic.
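Here's a minimal sketch of the kind of pre-regression work I have in mind, with made-up data and variable names (nothing from the examples in ARM): log-transforming a skewed input, centering and scaling by two standard deviations, and splitting a continuous ideal-point score into a party indicator plus a within-party score.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Hypothetical raw inputs, the way they might sit in a Stata file.
income = rng.lognormal(mean=10, sigma=1, size=n)   # heavily skewed
age = rng.uniform(18, 90, size=n)
ideal_point = rng.normal(0, 1.5, size=n)           # continuous ideology score
y = rng.normal(size=n)                             # some outcome, just for illustration

# Instead of dumping the raw columns into the regression, transform first:
log_income = np.log(income)                         # multiplicative scale -> additive
z_age = (age - age.mean()) / (2 * age.std())        # center, scale by 2 sd (ARM chapter 4)
party = (ideal_point > 0).astype(float)             # binary party-like indicator
within_party = ideal_point - np.where(
    party == 1,
    ideal_point[ideal_point > 0].mean(),            # ideology relative to own party's mean
    ideal_point[ideal_point <= 0].mean(),
)

X = np.column_stack([np.ones(n), log_income, z_age, party, within_party])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta)
```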

Econometrics reaches The Economist

Hal Varian pointed me to this article in The Economist:

Instrumental variables help to isolate causal relationships. But they can be taken too far

"Like elaborately plumed birds...we preen and strut and display our t-values." That was Edward Leamer's uncharitable description of his profession in 1983. Mr Leamer, an economist at the University of California in Los Angeles, was frustrated by empirical economists' emphasis on measures of correlation over underlying questions of cause and effect, such as whether people who spend more years in school go on to earn more in later life. Hardly anyone, he wrote gloomily, "takes anyone else's data analyses seriously". To make his point, Mr Leamer showed how different (but apparently reasonable) choices about which variables to include in an analysis of the effect of capital punishment on murder rates could lead to the conclusion that the death penalty led to more murders, fewer murders, or had no effect at all.

In the years since, economists have focused much more explicitly on improving the analysis of cause and effect, giving rise to what Guido Imbens of Harvard University calls "the causal literature". The techniques at the heart of this literature--in particular, the use of so-called "instrumental variables"--have yielded insights into everything from the link between abortion and crime to the economic return from education. But these methods are themselves now coming under attack.

Avi Feller and Chris Holmes sent me a new article on estimating varying treatment effects. Their article begins:

Randomized experiments have become increasingly important for political scientists and campaign professionals. With few exceptions, these experiments have addressed the overall causal effect of an intervention across the entire population, known as the average treatment effect (ATE). A much broader set of questions can often be addressed by allowing for heterogeneous treatment effects. We discuss methods for estimating such effects developed in other disciplines and introduce key concepts, especially the conditional average treatment effect (CATE), to the analysis of randomized experiments in political science. We expand on this literature by proposing an application of generalized additive models to estimate nonlinear heterogeneous treatment effects. We demonstrate the practical importance of these techniques by reanalyzing a major experimental study on voter mobilization and social pressure and a recent randomized experiment on voter registration and text messaging from the 2008 US election.
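To give a concrete (if crude) sense of what a conditional average treatment effect estimate looks like, here is a toy simulation of my own; it uses a lowess smoother from statsmodels as a stand-in for the generalized additive models the authors propose, fitting the outcome against a covariate separately in the treatment and control arms and differencing the two curves.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(3)
n = 5000

# Simulated randomized experiment with a nonlinear, heterogeneous effect:
# the treatment helps older people more than younger ones.
age = rng.uniform(18, 80, size=n)
t = rng.binomial(1, 0.5, size=n)
effect = 0.05 * (age - 18)                       # true CATE as a function of age
y = 0.2 * np.sin(age / 10) + t * effect + rng.normal(scale=0.5, size=n)

# Smooth y against age separately in each arm, then difference the curves.
grid = np.linspace(20, 78, 60)
fits = {}
for arm in (0, 1):
    sm = lowess(y[t == arm], age[t == arm], frac=0.3)   # columns: sorted age, fitted y
    fits[arm] = np.interp(grid, sm[:, 0], sm[:, 1])

cate_hat = fits[1] - fits[0]
print(np.column_stack([grid, cate_hat])[:5])     # estimated effect rises with age
```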

This is a cool paper--they reanalyze data from some well-known experiments and find important interactions. I just have a few comments to add:

After six entries and 91 comments on the connections between Judea Pearl and Don Rubin's frameworks for causal inference, I thought it would be good to draw the discussion to a (temporary) close. I'll first present a summary from Pearl, then briefly give my thoughts.

Pearl writes:

A correspondent writes:

I've recently started skimming your blog (perhaps steered there by Brad deLong or Mark Thoma) but despite having waded through such enduring classics as Feller Vol II, Henri Theil's "Econometrics", James Hamilton's "Time Series Analysis", and T.W. Anderson's "Multivariate Analysis", I'm finding some of the discussions such as Pearl/Rubin a bit impenetrable. I don't have a stats degree so I am thinking there is some chunk of the core curriculum on modeling and causality that I am missing. Is there a book (likely one of yours - e.g. Bayesian Data Analysis) that you would recommend to help fill in my background?

1. I recommend the new book, "Mostly Harmless Econometrics," by Angrist and Pischke (see my review here).

2. After that, I'd read the following chapters from my book with Jennifer:

Chapter 9: Causal inference using regression on the treatment variable

Chapter 10: Causal inference using more advanced models

Here are some pretty pictures, from the low-birth-weight example:

[Figure 10.3: low-birth-weight example]

and from the Electric Company example:

[Figure 23.1: Electric Company example]

3. Beyond this, you could read the books by Morgan and Winship and Pearl, but both of these are a bit more technical and less applied than the two books linked to above.

The commenters may have other suggestions.

In the most recent round of our recent discussion, Judea Pearl wrote:

There is nothing in his theory of potential-outcome that forces one to "condition on all information" . . . Indiscriminate conditioning is a culturally-induced ritual that has survived, like the monarchy, only because it was erroneously supposed to do no harm.

I agree with the first part of Pearl's statement but not the second part (except to the extent that everything we do, from Bayesian data analysis to typing in English, is a "culturally induced ritual"). And I think I've spotted a key point of confusion.

To put it simply, Donald Rubin's approach to statistics has three parts:

1. The potential-outcomes model for causal inference: the so-called Neyman-Rubin model in which the observed data are viewed as a sample from a hypothetical population that, in the simplest case of a binary treatment, includes y_i^1 and y_i^2 for each unit i.

2. Bayesian data analysis: the mode of statistical inference in which you set up a joint probability distribution for everything in your model, then condition on all observed information to get inferences, then evaluate the model by comparing predictive inferences to observed data and other information.

3. Questions of taste: the preference for models supplied from the outside rather than models inspired by data, a preference for models with relatively few parameters (for example, trends rather than splines), a general lack of interest in exploratory data analysis, a preference for writing models analytically rather than graphically, an interest in causal rather than descriptive estimands.

As that last list indicates, my own taste in statistical modeling differs in some ways from Rubin's. But what I want to focus on here is the distinction between item 1 (the potential outcomes notation) and item 2 (Bayesian data analysis).

The potential outcome notation and Bayesian data analysis are logically distinct concepts!

Items 1 and 2 above can occur together or separately. All four combinations (yes/yes, yes/no, no/yes, no/no) are possible:

- Rubin uses Bayesian inference to fit models in the potential outcome framework.

- Rosenbaum (and, in a different way, Greenland and Robins) use the potential outcome framework but estimate using non-Bayesian methods.

- Most of the time I use Bayesian methods but am not particularly thinking about causal questions.

- And, of course, there's lots of statistics and econometrics that's non-Bayesian and does not use potential outcomes.

Bayesian inference and conditioning

In Bayesian inference, you set up a model and then you condition on everything that's been observed. Pearl writes, "Indiscriminate conditioning is a culturally-induced ritual." Culturally-induced it may be, but it's just straight Bayes. I'm not saying that Pearl has to use Bayesian inference--lots of statisticians have done just fine without ever cracking open a prior distribution--but Bayes is certainly a well-recognized approach. As I think I wrote the other day, I use Bayesian inference not because I'm under the spell of a centuries-gone clergyman; I do it because I've seen it work, for me and for others.

Pearl's mistake here, I think, is to confuse "conditioning" with "including on the right-hand side of a regression equation." Conditioning depends on how the model is set up. For example, in their 1996 article, Angrist, Imbens, and Rubin showed how, under certain assumptions, conditioning on an intermediate outcome leads to an inference that is similar to an instrumental variables estimate. They don't suggest including an intermediate variable as a regression predictor or as a predictor in a propensity score matching routine, and they don't suggest including an instrument as a predictor in a propensity score model.

If a variable is "an intermediate outcome" or "an instrument," this is information that must be encoded in the model, perhaps using words or algebra (as in econometrics or in Rubin's notation) or perhaps using graphs (as in Pearl's notation). I agree with Steve Morgan in his comment that Rubin's notation and graphs can both be useful ways of formulating such models. To return to the discussion with Pearl: Rubin is using Bayesian inference and conditioning on all information, but "conditioning" is relative to a model and does not at all imply that all variables are put in as predictors in a regression.

Another example of Bayesian inference is the poststratification which I spoke of yesterday (see item 3 here). But, as I noted then, this really has nothing to do with causality; it's just manipulation of probability distributions in a useful way that allows us to include multiple sources of information.
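For readers who haven't seen poststratification, the manipulation is just a weighted average of within-cell estimates, with weights given by known population cell sizes; a tiny sketch with made-up numbers:

```python
import numpy as np

# Within-cell estimates from the sample (say, mean support in four age groups)
# and the corresponding population counts from the census.
cell_estimates = np.array([0.42, 0.48, 0.55, 0.61])              # made-up survey estimates
population_counts = np.array([30_000, 45_000, 40_000, 25_000])   # made-up census counts

poststratified = np.sum(cell_estimates * population_counts) / population_counts.sum()
print(poststratified)   # population estimate combining the two sources of information
```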

P.S. We're lucky to be living now rather than 500 years ago, or we'd probably all be sitting around in a village arguing about obscure passages from the Bible.

To follow up on yesterday's discussion, I wanted to go through a bunch of different issues involving graphical modeling and causal inference.

Contents:
- A practical issue: poststratification
- 3 kinds of graphs
- Minimal Pearl and Minimal Rubin
- Getting the most out of Minimal Pearl and Minimal Rubin
- Conceptual differences between Pearl's and Rubin's models
- Controlling for intermediate outcomes
- Statistical models are based on assumptions
- In defense of taste
- Argument from authority?
- How could these issues be resolved?
- Holes everywhere
- What I can contribute

Book reviews in academic journals

I thought that economists might be interested in my thoughts on the new book by Angrist and Pischke and, more generally, on the different perspectives that statisticians and economists have on causal inference. So I wrote them up as a short document and asked an econometrician friend where to send it. He said that the Journal of Economic Literature does book reviews, so I sent it there. They returned it to me with kind words on my review but with the note: "The JEL has avoided reviewing textbooks, focusing instead on research monographs. The review makes fine points about the coverage in this textbook, but neither the book nor the review are attempting to advance the state of the art."

Fair enough. So where to send the review? I asked some colleagues and they all agreed that JEL is the only economics journal that reviews books. So I guess econ textbooks just don't get reviewed!

This surprised me, given that book reviews appear in several top statistical journals, including the Journal of the American Statistical Association, the American Statistician, Biometrics, the Journal of the Royal Statistical Society, Statistics in Medicine, and Technometrics. There are also lots of places that review books in political science.

I'm surprised that there's only one place for book reviews for economists.

From Jessica, I saw a review by "Econjeff" of my review of Joshua Angrist and Jorn-Steffen Pischke's new book, "Mostly Harmless Econometrics: An Empiricist's Companion."

Econjeff pretty much agrees with what I wrote, but with one comment:

I [Econjeff] am a bit surprised by Gelman's call for more on hierarchical models; I think economists are right to treat these as a combination of a useful pedagogical tool for education research design and an unnecessarily functional-form-dependent way to get the standard errors right when the unit of treatment differs from the units available in the data.

I think this is a common perception of multilevel (hierarchical) models among economists. Regular readers of this blog will not be surprised to hear that I disagree completely! The purpose of a multilevel model is not to "get the standard errors right" but rather to model structure in the data.

An analogy that might help here for economists is time series analysis. If you have data with time series structure and you ignore it, you can get over-optimistic standard errors. But that's not the main reason people do time series modeling. The main reason is that the time series structure is interesting and important in its own right. We are interested in individual and contextual effects and unexplained variation at the individual and group levels, just as we are interested in autocorrelation, periodicity, long-range dependence, and so forth.
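As a small illustration of what modeling the group-level structure gives you (a toy version of the classical normal-normal partial-pooling calculation, with variance components assumed known; not any particular example from ARM):

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated grouped data: a few groups with very different sample sizes.
group_sizes = [3, 10, 50, 200]
true_group_means = rng.normal(loc=0.0, scale=1.0, size=len(group_sizes))
sigma_y = 2.0                                  # within-group standard deviation

ybars, ns = [], []
for n_j, mu_j in zip(group_sizes, true_group_means):
    ybars.append(rng.normal(mu_j, sigma_y, size=n_j).mean())
    ns.append(n_j)
ybars, ns = np.array(ybars), np.array(ns)

# Partial pooling: each group estimate is a precision-weighted average of its
# own mean and the grand mean; small groups get pulled in more.
tau = 1.0                                      # between-group sd (assumed known here)
grand_mean = np.average(ybars, weights=ns)
w = (ns / sigma_y**2) / (ns / sigma_y**2 + 1 / tau**2)
partially_pooled = w * ybars + (1 - w) * grand_mean

print(np.column_stack([ns, ybars, partially_pooled]))
```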

See chapters 1 and 11 of ARM for more discussion of motivations for multilevel modeling.

We were discussing the Angrist and Pischke book with Paul Rosenbaum and I mentioned my struggle with instrumental variables: where do they come from, and doesn't it seem awkward when you see someone studying a causal question and looking around for an instrument?

And Paul said: No, it goes the other way. What Angrist and his colleagues do is to find the instrument first, and then they go from there. They might see something in the newspaper or hear something on the radio and think: Hey--there's a natural experiment--it could make a good instrument! And then they go from there.

This sounded funny at first, but on reflection I actually prefer it to the usual presentation of instrumental variables. The "find the IV first" approach is cleaner: in this story, all causation flows from the IV, which has various consequences. So if you have a few key researchers such as Angrist keeping their ears open, hearing of IVs, then you'll learn some things. This approach also fits in with my fail-safe method of understanding IVs when I get stuck with the usual interpretation.

Sometimes the "lead with the natural experiment" approach can lead to missteps, as illustrated by Angrist and Pischke's overinterpretation of David Lee's work on incumbency in elections. (See here for my summary of Lee's research along with a discussion of why he's estimating the "incumbent party advantage" rather than the advantage of individual incumbency.) But generally it seems like the way to go, much better than the standard approach of starting with a causal goal of interest and then looking around for an IV.

In this spirit, let me again mention my own pet idea for a natural experiment:

The Flynn effect and the related occasional re-norming of IQ scores cause jumps in the number of people classified as mentally retarded (conventionally, an IQ of 70, which is two standard deviations below the mean if the mean is scaled at 100). When they rescale the tests, the proportion of people labeled "retarded" jumps up. Seems like a natural experiment that might be a good opportunity to study the effects of classifying people in this way on the margin. If the renorming is done differently in different states or countries, this would provide more opportunity for identifying treatment effects.

I think it would be so cool if someone could take this idea and run with it.

Now that we're on the topic of econometrics . . . somebody recommended to me a book by Deirdre McCloskey. I can't remember who gave me this recommendation, but the name did ring a bell, and then I remembered I wrote some other things about her work a couple years ago. See here.

And, because not everyone likes to click through, here it all is again:

Mostly Harmless Econometrics

I just read the new book, "Mostly Harmless Econometrics: An Empiricist's Companion," by Joshua Angrist and Jorn-Steffen Pischke. It's an excellent book and, I think, well worth your $35. I recommend that all of you buy it.

I also have a few comments.

Scott Cunningham writes,

Today I was rereading Deirdre McCloskey and Ziliak's JEL paper on statistical significance, and then reading for the first time their detailed response to a critic who challenged their original paper. I was wondering what opinion you had about this debate. Are statistical significance and Fisher tests of significance as maligned and problematic as McCloskey and Ziliak claim? In your professional opinion, what is the proper use of seeking to scientifically prove that a result is valid and important?