I enjoy reading the Freakonomics blog, but as I've noted previously, I remain puzzled by the presence of two appealing but, to my mind, incompatible forms of reasoning that seem to be used more generally in the world of "freakonomics"...
Chris Blattman reports on a study by Seema Jayachandran and Ilyana Kuziemko that makes the following argument: Medical research indicates that breastfeeding suppresses post-natal fertility. We [Jayachandran and Kuziemko] model the implications for breastfeeding decisions and test the model's predictions...
Gregor Gorjanc writes:...
Paul Cross writes: In reading your book and papers on multilevel modeling I've noticed that you do not do much explicit modeling of spatial or temporal effects. I'm wondering if this is philosophically driven, perhaps because you prefer to get...
Denis Cote writes: I am reviewing a paper using logistic regression and I am uncertain about the way they coded their inputs. They have different ordinal variables coming from self-report questions. For example, "self-perceived health" with its answer choices: excellent,...
Hal Varian pointed me to this article in The Economist: Instrumental variables help to isolate causal relationships. But they can be taken too far "Like elaborately plumed birds...we preen and strut and display our t-values." That was Edward Leamer's uncharitable...
Nils Hjort, Chris Holmes, Peter Muller, and Stephen Walker have come out with a new book on Bayesian Nonparametrics. It's great stuff, makes me realize how ignorant I am of this important area of statistics. Here are the chapters: 0....
Avi Feller and Chris Holmes sent me a new article on estimating varying treatment effects. Their article begins: Randomized experiments have become increasingly important for political scientists and campaign professionals. With few exceptions, these experiments have addressed the overall causal...
Daniel Egan sent me a link to an article, "Standardized or simple effect size: What should be reported?" by Thom Baguley, that recently appeared in the British Journal of Psychology. Here's the abstract: It is regarded as best practice for...
Among other things, while on sabbatical in Paris next year I'll be working with my longtime collaborator Frederic Bois, a toxicologist who uses hierarchical Bayes models extensively. We have a project in toxicology that necessarily also involves research in Bayesian...
It goes like this: there's something you want to estimate and you have some data. Maybe, to take my favorite recent example, you want to break down support for school vouchers by religion, ethnicity, income, and state (or maybe you'd...
To follow up on yesterday's discussion, I wanted to go through a bunch of different issues involving graphical modeling and causal inference. Contents: - A practical issue: poststratification - 3 kinds of graphs - Minimal Pearl and Minimal Rubin -...
1. Coalitions, voting power, and political instability. Thurs 4 Jun, 3:30pm, Kane Hall 210 at the University of Washington. Part of the Math Across Campus series. We shall consider two topics involving coalitions and voting. Each topic involves open questions...
Andrew J. Oswald and Nattavudh Powdthavee write: In remarkable research, the sociologist Rebecca Warner and the economist Ebonya Washington have shown that the gender of a person's children seems to influence the attitudes and actions of the parent. Warner (1991)...
Andrew Grogan-Kaylor writes:...
Christian Robert, Nicolas Chopin, and Judith Rousseau wrote this article that will appear in Statistical Science with various discussions, including mine. I hope those of you who are interested in the foundations of statistics will read this. Sometimes I feel...
The political website Talking Points Memo is featuring a discussion of Red State, Blue State this week. The discussants so far have included software developer / political activist Aaron Swartz, historian Eric Rauchway, political scientist Nolan McCarty, journalist Steve Sailer,...
We all know to look at main effects first and then look for interactions. But a former student pointed me to some disturbing advice from some statistics textbooks. I'll give his quotes and then my reactions:...
Aaron Strauss spoke today on his work with Kosuke Imai on estimating the optimal order of priority and the optimal approach for contacting voters in a political campaign. They use inferences from field experiments on voter turnout and persuasion and...
Now that we're on the topic of econometrics . . . somebody recommended to me a book by Deirdre McCloskey. I can't remember who gave me this recommendation, but the name did ring a bell, and then I remembered I...
I just read the new book, "Mostly Harmless Econometrics: An Empiricist's Companion," by Joshua Angrist and Jorn-Steffen Pischke. It's an excellent book and, I think, well worth your $35. I recommend that all of you buy it. I also have...
By Aleks, Grazia, Yu-Sung and myself. Here's the article, and here's the abstract: We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5,...
Nate Silver and Greg Mankiw have an interesting exchange about the use of exogenous instruments to estimate causal effects. Unfortunately, the subject is macroeconomics, a topic on which I know next to nothing beyond what I learned in Mr. Cutlip's...
This semester I'm teaching my "how to teach" class: The Teaching of Statistics at the University Level. (Stat 6600, for those of you here at Columbia.) I'll post more on that in a bit. Here I want to talk about...
A couple weeks ago I posted an analysis of rich and poor voters in rich and poor states from exit polls in 2008, and a commenter ("Audacious Epigone") picked up on Larry Bartels's observation that, among whites, the Republican advantage...
AT writes: A Facebook friend pointed me to this Gladwell piece discussing how you can('t) predict whether a teacher will be successful, but more importantly, on the range of advancement of a class depending on a teacher's ability. The claim...
Chris Chatham writes: I am using multilevel logistic regression to model individuals' abilities to 'stop' a planned motor movement (my binary outcome), based on the delay between the beginning of the trial and the occurrence of the stop signal (my...
Rajeev sends a link to this paper on hierarchical modeling for evaluating multi-site interventions: This article discusses the evaluation of programs implemented at multiple sites. Two frequently used methods are pooling the data or using fixed effects (an extreme version...
Bruce McCullough writes: Don't know if you're aware of this, but if you need more evidence for the primacy of interaction effects, data mining is a great place to look. My degree is in economics. I was taught to use...
If you're in D.C., you should stop by. . . . I'm speaking in the statistics department at George Washington University on the topic of interactions. Here's the powerpoint and here's the abstract: As statisticians and practitioners, we all know...
Mohammed Mohammed points me to this article by John Nichols, which begins: Observational studies comparing groups or populations to evaluate services or interventions usually require case-mix adjustment to account for imbalances between the groups being compared. Simulation studies have, however,...
See here for Jeremy's comments to my comments. I agree with what he writes. The whole discussion reminds me of a comment made to me once by a statistician who generally works with engineers. He said that when he talks...
John Kastellec sent me this attractive paper: We [Kastellec et al.] study the relationship between state-level public opinion and the roll call votes of senators on Supreme Court nominees. Applying recent advances in multilevel modeling, we use national polls on...
I have mixed feelings about this picture and accompanying note of Jeremy Freese, who writes: Key findings in quantitative social science are often interaction effects in which the estimated “effect” of a continuous variable on an outcome for one group...
John Kastellec writes: Let's say you wanted to estimate a multilevel model with an interaction in the individual-level model, say: Pr(y=1) = logit^-1(B0 + B1*X + B2*Z + B3*X*Z) and you wanted to allow the interaction effect to vary by...
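A minimal sketch of the individual-level model in that question, to make the role of the interaction term concrete. The function names and the coefficient values in the usage note are hypothetical, chosen only for illustration; nothing here reproduces the varying-coefficient (multilevel) part of the question.

```python
import math


def inv_logit(x):
    """Inverse logit: maps a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def prob_y1(b0, b1, b2, b3, x, z):
    """Pr(y=1) = logit^-1(B0 + B1*X + B2*Z + B3*X*Z)."""
    return inv_logit(b0 + b1 * x + b2 * z + b3 * x * z)


def slope_of_x_on_logit_scale(b1, b3, z):
    """With an interaction, the log-odds slope for X depends on Z: B1 + B3*Z."""
    return b1 + b3 * z
```

For example, with the made-up values B1 = 1.0 and B3 = 0.5, the log-odds slope for X is 1.0 when Z = 0 and 2.0 when Z = 2, which is why B1 alone cannot be read as "the effect of X" once the interaction is in the model.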
Here are my thoughts, to appear in the American Statistician: 1. Introduction 2. Teaching Bayesian statistics to social scientists, including a discussion of what is Bayesian about making graphs to get a better understanding of the deterministic part of a...
Robert Rohrschneider writes: I [Rohrschneider] am trying to gain an understanding of the pitfalls of multi-level analyses in my work which typically requires that I merge country data with surveys of individuals, usually in Europe. I wonder whether you could...
Longhai Li did a really cool Ph.D. thesis (under the supervision of Radford Neal) on computing for models with deep interactions. The website with everything about this software, including the R packages, documentation, and references, is here and here....
I'm speaking Monday 14 April at 4:30 on weakly informative prior distributions and models with interactions. I'll try to make things accessible to a general audience of people who might not know much about statistics in general or Bayesian methods...
Jowei Chen sent along this paper: In the aftermath of the summer 2004 Florida hurricane season, the Federal Emergency Management Agency (FEMA) distributed $1.2 billion in disaster aid to Florida residents. This research presents two empirical findings that collectively suggest...
Boliang writes,...
Robin Hanson suggested here an experimental design in which patients, instead of being randomly assigned to particular treatments, are randomly given restrictions (so that each patient would have only n-1 options to consider, with the one option removed at random). I...
David Afshartous writes,...
I had various course titles floating around: my course at Columbia this spring is officially called Applied Statistics, and I had promised people that it would cover Bayesian statistics. At Harvard they asked me to teach Statistical Computing, but I...
Lingzhou Michael Xue writes in with two questions:...
Manuel Spínola writes,...
Jeff pointed me to this interesting paper by David Primo, Matthew Jacobsmeier, and Jeffrey Milyo comparing multilevel models and clustered standard errors as tools for estimating regression models with two-level data....
Dan Schrage writes with a question about how to model group-level variation: I [Dan] am trying to better understand the recommendation in your new book to always use random effects (pg. 246) in modeling. (I'm following your definition #5 here...
I can't say I have much of an explanation for this, but it's interesting: Church attendance is a strong predictor of how high-income people vote, not such a good predictor for low-income voters. There's lots of talk about religion...
Mike Alvarez, Delia Bailey, and Jonathan Katz just completed this paper: Since the passage of the “Help America Vote Act” in 2002, nearly half of the states have adopted a variety of new identification requirements for voter registration and participation...
Mike Larsen asks,...
I was reminded of the varieties of Bayesians after reading this article by Robin Hanson: [I]n our standard framework systems out there have many possible states and our minds can have many possible belief states, and interactions between minds and...
The political science talk: Culture wars, voting, and polarization: divisions and unities in modern American politics. (Here's the higher-resolution powerpoint version.) Here's the handout that goes with the talk. The statistics talk: Interactions are important....
Scott Cunningham writes, Today I was rereading Deirdre McCloskey and Ziliak's JEL paper on statistical significance, and then reading for the first time their detailed response to a critic who challenged their original paper. I was wondering what opinion you...
We've been trying to figure out how to set up a weekly lab meeting--something where people take turns giving updates on their research, along with a "Hill Street Blues" sort of summary of progress on ongoing projects. My impression is...
Someone writes in with a question that I can't answer but which reminds me of a general point about interactions between statisticians and others....
Seth Wayland writes, In Chapter 14.1 of your new book, the example uses only predictors for which you have census data at the state level. In the poststratification step, you just plug the values of those covariates into the model,...
How do you summarize logistic regressions and other nonlinear models? The coefficients are only interpretable on a transformed scale. One quick approach is to divide logistic regression coefficients by 4 to convert them to the probability scale--that works for probabilities...
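A minimal sketch of where the divide-by-4 shortcut comes from. The slope of the logistic curve on the probability scale is beta * p * (1 - p), which peaks at p = 0.5, where it equals beta / 4. The function names and the coefficient 0.8 used below are illustrative assumptions, not taken from the post.

```python
import math


def inv_logit(x):
    """Inverse logit: maps a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def slope_on_probability_scale(beta, linear_predictor):
    """Derivative of Pr(y=1) with respect to x at a given linear predictor."""
    p = inv_logit(linear_predictor)
    return beta * p * (1.0 - p)


def divide_by_4(beta):
    """Quick approximation: the steepest possible slope, reached at p = 0.5."""
    return beta / 4.0


# At the midpoint of the curve (linear predictor 0, so p = 0.5) the two agree.
midpoint_slope = slope_on_probability_scale(0.8, 0.0)
# Far from the midpoint the true slope is smaller than beta / 4.
tail_slope = slope_on_probability_scale(0.8, 3.0)
```

So beta / 4 is best read as an upper bound on the change in probability per unit change in the predictor, and it is close to the actual change only when the predicted probabilities are not far from 1/2.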
This (by Aleks, Grazia, Yu-Sung, and myself) is really cool. Here's the abstract: We propose a new prior distribution for classical (non-hierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5,...
Song Qian sent me this paper, to appear in the journal Ecology, on ecological applications of multilevel analysis of variance. Here's the abstract: A Bayesian representation of the analysis of variance by Gelman (2005) is introduced with ecological examples. These...
Robin Hanson points out that biological systems that have a useful function are not necessarily optimal when put in new environments. This reminds me of an interesting article by Witold Rybczynski where I learned that the structural engineer Ove...
How to talk so kids will listen and listen so kids will talk, by Adele Faber and Elaine Mazlish. I read this book long before I had kids--it's incredibly helpful for interactions with adults as well. It's definitely #1 on...
Jarrett Byrnes writes, A group of us are working through your Multilevel book, and a question has come up regarding models incorporating multiple predictors. We were working some of the chapters on using simulation to draw inference, but have been...
Wil Wilkinson points to an interesting article by Nicholas Eberstadt (and adds some comments of his own) on the topic of the high birth rates in the United States compared to Europe. Wilkinson attributes the difference to Americans' higher average...
Yu-Sung and Jeff pointed me to a study by Joseph Price and Justin Wolfers on racial discrimination among NBA referees. Basically, black refs call more fouls on white players and vice-versa. Here's a news article (by Alan Schwarz), here's the...
Statisticians often talk about a bias-variance tradeoff, comparing a simple unbiased estimator (for example, a difference in differences) to something more efficient but possibly biased (for example, a regression). There's commonly the attitude that the unbiased estimate is a better...
Michael Kubovy writes, Can you point me to a model report of empirical research (preferably of a designed experiment) using mixed models? As you know, the pattern in psychology is to have a stultifying paragraph listing which effects and interactions...
I received the following email:...
Most academic research employs basic variables that are then correlated with, or used as predictors of, outcomes of interest. These basic variables are, for example, income, state, and the like. Using such variables we can claim that, on average, urban dwellers vote for...
I came across this talk by David Donoho (see also here for more detail) from 2000. I was disappointed to see that he scooped me on the phrase "blessing of dimensionality" but I guess this is not such an obscure...
Michael Weiksner writes, I [Weiksner] do research on deliberation, where the treatment itself is defined as the interaction with other people (who are inevitably also randomly assigned to the treatment group). Because all the treated individuals interact, I know that...
I've become increasingly convinced of the importance of treatment interactions--that is, models (or analyses) in which a treatment effect is measurably different for different units. Here's a quick example (from my 1994 paper with Gary King): But there are lots...
Regression coefficients are not very pleasant to look at when listed in a table. Moreover, the value of the coefficient is not what really matters. What matters is the value of the coefficient multiplied by the value of the corresponding variable: this is the actual "effect" that contributes to the value of the outcome or, with logistic regression, to the log-odds. With this approach, it is no longer necessary to scale variables prior to regression. A nomogram is the visualization method based on this idea.
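A minimal sketch of the quantity a nomogram displays: each predictor's contribution is its coefficient times its value, and for logistic regression the contributions add up on the log-odds scale. The coefficient names and numbers in the usage note are made up for illustration, and the sketch does no plotting.

```python
import math


def contributions(coefs, values):
    """Per-predictor contribution to the linear predictor: coefficient * value."""
    return {name: coefs[name] * values[name] for name in coefs}


def predicted_probability(intercept, coefs, values):
    """Sum the contributions on the log-odds scale, then map to a probability."""
    log_odds = intercept + sum(contributions(coefs, values).values())
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With the hypothetical coefficients {"age": 0.02, "smoker": 0.7} and the case {"age": 50, "smoker": 1}, the contributions are about 1.0 and 0.7. The small coefficient on age still produces the larger contribution because age takes large values, which is the point of looking at coefficient times value rather than at the coefficient alone.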
Holly writes, I am interested in where children live when their parent is incarcerated. It turns out that there is a major gender difference in that when the father is incarcerated the child tends to live with the other parent,...
I just finished reading Dick Berk's book, "Regression analysis: a constructive critique" (2004). It was a pleasure to read, and I'm glad to be able to refer to it in our forthcoming book. Berk's book has a conversational format and...
Andrew Sutter writes,...
Many scientists of the "selfish gene" persuasion get bothered by instances of altruistic behavior by humans and other animals. For example, Damon Centola forwarded these links: Human beings routinely help others to achieve their goals, even when the helper receives...
What makes an observation interesting? Through the example of devious quizzes that ask you to distinguish ape art from modern art, we will investigate the fundamental idea of support vector machines: an SVM is a classifier specified in terms of...
My New Year's resolutions:...
Recent studies by police departments and researchers confirm that police stop racial and ethnic minority citizens more often than whites, relative to their proportions in the population. However, it has been argued that stop rates more accurately reflect rates of...
Encouraged by Carrie's plug, I read Leslie Savan's book, "Slam Dunks and No Brainers": It's an entertaining and thought-provoking look at "pop language," which is a particular kind of enjoyable and powerful cliche that we use in speech (and sometimes...
In a recent article in the New York Review of Books (see also here), Freeman Dyson writes, Great scientists come in two varieties, which Isaiah Berlin, quoting the seventh-century-BC poet Archilochus, called foxes and hedgehogs. Foxes know many tricks, hedgehogs...
Jim Hodges, Yue Cui, Daniel Sargent, and Brad Carlin completed their paper on "smoothed Anova". The abstract begins: "We present an approach to smoothing balanced, single-term analysis of variance (ANOVA) that emphasizes smoothing interactions, the premise being that for a...
I'm trying to integrate class-participation activities into the Applied Regression and Multilevel Modeling course I'm teaching this semester. We have a whole bunch of these activities for introductory statistics (in my intro class I have at least one demo and...
Scott de Marchi writes, regarding the "blessing of dimensionality": One of my students forwarded your blog, and I think you've got it wrong on this topic. More data does not always help and this has been shown in numerous applications...
Here's the talk I gave last week on interactions in multilevel models (work in collaboration with Samantha Cook and Shouhao Zhou). The short version: (1) interactions are important, (2) more work is needed on how to reasonably model complex structures...
I was at the UCLA statistics preprint site, which is full of interesting papers--we should do something like that here at Columbia--and came across this paper by Richard Berk on randomized experiments. From the abstract to Berk's paper:...
Jeff Fagan forwarded this article on gun violence by Jeffrey Bingenheimer, Robert Brennan, and Felton Earls. The research looks at children in Chicago who were exposed to gun violence, and uses propensity score matching to find a similar group who...
In reference to the recent entry on misperception of minorities, John Sides sent me the following data on the estimated, and actual, percentage of foreign-born residents in each of 20 European countries: The estimates are average survey responses in each...
Someone sent me a question about whether it makes sense to use multilevel modeling in a study of polls from many countries. I'll give the question and my response. The topic has been on my mind because I just wrote...
On June 20, we had a miniconference on causal inference at the Columbia University Statistics Department. The conference consisted of six talks and lots of discussion. One topic of discussion was the use of propensity scores in causal inference, specifically,...
Zhiqiang Tan (Biostatistics, Johns Hopkins) writes, regarding my blog entry on regression and matching. I wrote: I'm imagining a unification of matching and regression methods, following the Cochran and Rubin approach: (1) matching, (2) keeping the treated and control units...
I noticed the blog of Kevin Brancato. I've been enjoying reading the blog entries, especially since Kevin is a former student of ours at Columbia! His paper on macroeconomic statistics is also interesting (and relevant to some of my work)....
Here are the slides of the talk I gave at the CDC last week. And here's the abstract: Multilevel (hierarchical) models are increasingly popular for data with hierarchical, longitudinal, and cross-classified structures. We consider several questions that arise in the...
Fully Bayesian analyses of hierarchical linear models have been considered for at least forty years. A persistent challenge has been choosing a prior distribution for the hierarchical variance parameters. Proposed models include uniform distributions (on various scales), inverse-gamma distributions, and...
We would like to incorporate matching methods into a Bayesian regression framework for causal inference, with the ultimate goal of being able to do more effective inference using hierarchical modeling. The founding works here are papers by Cochran and Rubin...
Postdoctoral research opportunity: Columbia University, Departments of Epidemiology and Statistics Supervisors: Ezra Susser (epidemiology) and Andrew Gelman (statistics) We have a NIH-funded postdoctoral position (1 or 2 years) available for what is essentially statistical research as applied to some important...
Tim Halpin-Healy (Physics, Barnard College) spoke today at the Collective Dynamics Group on "The Dynamics of Conformity and Dissent". Unfortunately I wasn't able to attend his talk--it looked interesting--but I have to say, speaking curmudgeonly and parochially as a political...
Christopher Avery, Mark Glickman, Caroline Hoxby, and Andrew Metrick wrote a paper recently ranking colleges and universities based on the "revealed preferences" of the students making decisions about where to attend. They apply, to data on 3000 high-school students, statistical...
Juan Robalino and Alex Pfaff have written a paper on estimating the factors that influence the decision of Costa Rican farmers to clear forest land. This is an important question because, as they note in the article, Rural areas of...
Daniel Ho, Kosuke Imai, Gary King, and Liz Stuart recently wrote a paper on matching, followed by regression, as a tool for causal inference. They apply the methods developed by Don Rubin in 1970 and 1973 to some political science...
This is the reference to the work of Chipman on including interactions: Chipman, H. (1996), "Bayesian Variable Selection with Related Predictors", Canadian Journal of Statistics, 24, 17-36....
In a multi-way analysis of variance setting, the number of possible predictors can be huge. For example, consider a 10x19x50 array of continuous measurements: there is a grand mean, 10+19+50 main effects, 10x19+19x50+10x50 two-way interactions, and 10x19x50 three-way interactions. Multilevel...
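To make the size of that count concrete, here is a small sketch that tallies the terms for the 10x19x50 example. The function name is a hypothetical helper, and the sketch counts terms only; it fits no model.

```python
from itertools import combinations
from math import prod


def count_anova_terms(dims):
    """Number of terms at each interaction order for a full factorial layout.

    Order 0 is the grand mean, order 1 the main effects, order 2 the
    two-way interactions, and so on.
    """
    return {
        order: sum(prod(combo) for combo in combinations(dims, order))
        for order in range(len(dims) + 1)
    }


counts = count_anova_terms([10, 19, 50])
total = sum(counts.values())
```

For the 10x19x50 array this gives 1 grand mean, 79 main effects, 1640 two-way interactions, and 9500 three-way interactions, for 11220 terms in all.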