<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Statistical Modeling, Causal Inference, and Social Science</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/mlm/" />
    <link rel="self" type="application/atom+xml" href="http://www.stat.columbia.edu/~cook/movabletype/mlm/atom.xml" />
    <id>tag:www.stat.columbia.edu,2008-11-24:/~cook/movabletype/mlm/1</id>
    <updated>2009-11-22T17:32:16Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.31-en</generator>

<entry>
    <title>Does the Senate Finance Committee version of the health-care bill threaten to cripple evidence-based medicine?</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/does_the_senate.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2891</id>

    <published>2009-11-22T17:31:28Z</published>
    <updated>2009-11-22T17:32:16Z</updated>

    <summary>Harry Selker and Alastair Wood say yes....</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Decision Theory" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Public Health" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Harry Selker and Alastair Wood say <a href="http://scienceblogs.com/appliedstatistics/2009/11/does_the_senate_finance_commit.php">yes</a>.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Some sort of update to ggplot2</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/some_sort_of_up.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2890</id>

    <published>2009-11-22T16:55:45Z</published>
    <updated>2009-11-23T04:02:21Z</updated>

    <summary>Jeroen Ooms writes: Here&apos;s a first version of a new web application for exploratory graphical analysis. It attempts to implement the layered graphics from the R package ggplot2 in a user-friendly way. This two-minute demo video demonstrates a quick how-to....</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Statistical graphics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Jeroen Ooms writes:</p>

<blockquote><a href="http://yeroon.net/ggplot2">Here's</a> a first version of a new web application for exploratory graphical analysis. It attempts to implement the layered graphics from the R package ggplot2 in a user-friendly way. <a href="http://www.youtube.com/watch?v=_haIgb4nFFY&hd=1">This</a> two-minute demo video demonstrates a quick how-to.</blockquote>

<p>He asks for feedback, so if you have any, feel free to comment.  I don't know ggplot2 but my impression is that I should really be using it.  Maybe Yu-Sung and Daniel should consider using it for mrp.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Everybody&apos;s a critic</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/everybodys_a_cr.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2889</id>

    <published>2009-11-22T16:54:17Z</published>
    <updated>2009-11-22T16:55:17Z</updated>

    <summary>Christopher Nelson tries his hand at being a graphics curmudgeon....</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Statistical graphics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Christopher Nelson <a href="http://scienceblogs.com/appliedstatistics/2009/11/everybodys_a_critic.php">tries his hand</a> at being a graphics curmudgeon.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Is Gallup &quot;upping the sample to black Americans&quot;?</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/is_gallup_uppin_1.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2893</id>

    <published>2009-11-22T14:19:25Z</published>
    <updated>2009-11-22T21:07:07Z</updated>

    <summary>Mark Blumenthal links to Rush Limbaugh accusing Gallup of &quot;upping the sample to black Americans to keep [Obama] up at 50%&quot; in the polls. (For the context, see the last paragraph of the transcript.) Frank Newport of Gallup responds here....</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Mark Blumenthal <a href="http://www.pollster.com/blogs/a_big_fat_outlier.php">links</a> to Rush Limbaugh accusing Gallup of "upping the sample to black Americans to keep [Obama] up at 50%" in the polls.  (For the context, see the last paragraph of <a href="http://www.rushlimbaugh.com/home/daily/site_111909/content/01125106.guest.html">the transcript</a>.)</p>

<p>Frank Newport of Gallup responds <a href="http://pollingmatters.gallup.com/2009/11/response-to-rush-limbaughs-claim.html">here</a>.  Newport denies it all, but he would, wouldn't he?</p>

<p>Seriously, though, it's hard to believe that Limbaugh really believes that Gallup is fudging the numbers.  As a big-time radio host, he's gotta know all about marketing surveys, right?  I'm just assuming he said that "upping the sample" bit as more of a joke or an off-the-wall speculation.  It did raise two interesting questions in my mind, though:</p>

<p>1.  The assumption behind Limbaugh's argument--as with many arguments about polls--is that the published poll results have an effect of their own, beyond the president's underlying popularity.  For example, maybe some senator would vote for the health care bill if he read that Obama's approval rating was 51% but would vote no if he read that Obama only had 49% approval.  This might very well be true--it makes sense--I just don't really know.</p>

<p>2.  What if you were a pollster and really did want to cheat and overrepresent Democrats?  How would you do it?  Contra Limbaugh's suggestion, I don't think you'd oversample blacks.  I'm assuming Gallup does telephone surveys, and it's not like there's a separate telephone directory for blacks.  Also, as several commenters to Newport noted, the percentage of blacks among the survey respondents is easy enough to check.  And, for that matter, many survey organizations (possibly including Gallup) do post-sampling weighting adjustments for race, anyway, in which case oversampling blacks won't do anything for you at all.</p>

<p>If you're doing a telephone poll and want to oversample Democrats, you can just call states and area codes where more Democrats live.  Call New York, LA, Chicago, etc.  You can even call people in Democratic-leaning white areas if you want to mix things up a bit.  That'll do the trick.  Bury it deep enough in the sampling algorithm and maybe nobody will notice!</p>
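
<p>To make the weighting point concrete, here is a minimal sketch in Python.  All the numbers are made up for illustration (they are not Gallup's), and the function name <code>estimates</code> is hypothetical.  It assumes a simple two-group poststratification in which each respondent is weighted by population share divided by sample share; a real survey organization's weighting scheme is likely more elaborate.</p>

```python
# Hypothetical numbers for illustration only; none of them come from Gallup.
population_share = {"black": 0.12, "nonblack": 0.88}  # assumed population shares
approval = {"black": 0.90, "nonblack": 0.45}          # assumed approval rate within each group

def estimates(sample_counts):
    """Return (raw, weighted) approval estimates for a sample with the given group counts."""
    n = sum(sample_counts.values())
    # Expected number of approvers in each group of the sample.
    approvers = {g: sample_counts[g] * approval[g] for g in sample_counts}
    # Raw estimate: every respondent counts equally.
    raw = sum(approvers.values()) / n
    # Poststratification weight per respondent: population share / sample share.
    weight = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}
    weighted = (sum(weight[g] * approvers[g] for g in sample_counts)
                / sum(weight[g] * sample_counts[g] for g in sample_counts))
    return raw, weighted

proportional = {"black": 120, "nonblack": 880}  # sample matches the population shares
oversampled = {"black": 300, "nonblack": 700}   # sample deliberately overrepresents one group

raw_prop, weighted_prop = estimates(proportional)
raw_over, weighted_over = estimates(oversampled)
print(raw_prop, weighted_prop)
print(raw_over, weighted_over)
```

<p>Under these assumptions the raw estimate moves when the sample is skewed, but the weighted estimate comes back to the same value as in the proportional sample, which is the sense in which oversampling "won't do anything for you" once race is adjusted for.</p>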

<p>P.S.  I looked at Gallup's home page and was surprised not to see any link to a description of their sampling methods.  Or maybe it's somewhere and I didn't see it.</p>

<p>P.P.S.  Blumenthal sent me <a href="http://www.gallup.com/poll/110380/How-does-Gallup-Daily-tracking-work.aspx">this helpful link</a>.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Type M errors are all over the place</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/type_m_errors_a.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2888</id>

    <published>2009-11-21T20:09:57Z</published>
    <updated>2009-11-21T20:27:28Z</updated>

    <summary>Jimmy points me to this article, &quot;Why most discovered true associations are inflated,&quot; by J. P. Ioannidis. As Jimmy pointed out, this is exactly what we call type M (for magnitude) errors. I completely agree with Ioannidis&apos;s point, which he...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Multilevel Modeling" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Jimmy points me to <a href="http://www.ncbi.nlm.nih.gov/pubmed/18633328">this article</a>, "Why most discovered true associations are inflated," by J. P. Ioannidis.  As Jimmy pointed out, this is exactly what we call type M (for magnitude) errors.  I completely agree with Ioannidis's point, which he seems to be making more systematically than David Weakliem and I did in <a href="http://www.stat.columbia.edu/~gelman/research/published/power4r.pdf">our recent article</a> on the topic.</p>

<p>My only suggestion beyond what Ioannidis wrote has to do with potential solutions to the problem.  His ideas include:  "being cautious about newly discovered effect sizes, considering some rational down-adjustment, using analytical methods that correct for the anticipated inflation, ignoring the magnitude of the effect (if not necessary), conducting large studies in the discovery phase, using strict protocols for analyses, pursuing complete and transparent reporting of all results, placing emphasis on replication, and being fair with interpretation of results."</p>

<p>These are all good ideas.  Here are two more suggestions:</p>

<p>1.  Retrospective power calculations.  See page 312 of <a href="http://www.stat.columbia.edu/~gelman/research/published/power4r.pdf">our article</a> for the classical version or page 313 for the Bayesian version.  I think these can be considered as implementations of Ioannidis's ideas of caution, adjustment, and correction.</p>

<p>2.  Hierarchical modeling, which partially pools estimated effects and reduces Type M errors as well as handling many multiple comparisons issues.  <a href="http://www.stat.columbia.edu/~gelman/research/unpublished/multiple2f.pdf">Fuller discussion here</a> (or <a href="http://www.stat.columbia.edu/~martin/Workshop/statistics_neuro_data_931_speaker_04.mov">see here</a> for the soon-to-go-viral video version).</p>
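
<p>For readers who want to see the inflation directly, here is a rough simulation sketch in Python.  It is not the calculation from pages 312-313 of the article; it simply assumes that an estimate is normally distributed around a true effect with a known standard error, keeps only the estimates that reach conventional statistical significance, and asks how much those overstate the truth.  The function name <code>type_m_simulation</code> and the example values (true effect 1, standard error 2) are hypothetical.</p>

```python
import random
import statistics

def type_m_simulation(true_effect, se, n_sims=100_000, z_crit=1.96, seed=0):
    """Simulate estimates ~ Normal(true_effect, se) and summarize the significant ones.

    Returns (power, exaggeration), where exaggeration is the mean absolute
    significant estimate divided by the absolute true effect.
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)
        # Keep the estimate only if it would be declared statistically significant.
        if abs(estimate) > z_crit * se:
            significant.append(estimate)
    power = len(significant) / n_sims
    exaggeration = statistics.mean([abs(e) for e in significant]) / abs(true_effect)
    return power, exaggeration

# Assumed example: a small true effect measured with a large standard error.
power, exaggeration = type_m_simulation(true_effect=1.0, se=2.0)
print(power, exaggeration)
```

<p>In this setup any significant estimate must exceed 1.96 standard errors in absolute value, so when the true effect is small relative to the standard error, every "discovery" is necessarily a large overestimate.</p>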

<p>P.S.  Here's <a href="http://www.stat.columbia.edu/~gelman/research/published/francis8.pdf">the first</a> mention of Type M errors that I know of.  The problem is important enough, though, that I suspect there are articles on the topic going back to the 1950s or earlier in the psychometric literature.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Postdoc openings here in fall, 2010 !!!</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/postdoc_opening_1.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2883</id>

    <published>2009-11-21T11:52:46Z</published>
    <updated>2009-11-21T16:44:49Z</updated>

    <summary>Postdoc opportunities working with Prof. Andrew Gelman in the Department of Statistics on problems related to hierarchical modeling and statistical computing, with projects including high-dimensional modeling, missing-data imputation, and parallel computing. Application areas include public opinion and voting, social networks,...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Postdoc opportunities working with Prof. Andrew Gelman in the Department of Statistics on problems related to hierarchical modeling and statistical computing, with projects including high-dimensional modeling, missing-data imputation, and parallel computing. Application areas include public opinion and voting, social networks, international development, dendrochronology, and models of cancer and drug abuse. Applicants should have experience with Bayesian methods, a willingness to program, and an interest in learning. Applications will be considered as they arrive. The application, consisting of a cover letter, CV, and a selection of published or unpublished articles, should be emailed to asc.coordinator@stat.columbia.edu. Please also arrange for three letters of recommendation to be sent to the same email address. This is an exciting place to work: our research group involves several faculty, postdocs, graduate students, and undergraduates working on a wide range of interesting applied problems. We also have strong links to the Earth Institute, the Center for Computational Learning Systems, and the Columbia Population Research Center, as well as to Statistics, Political Science, and other academic departments at Columbia. As a postdoc here, you will have an opportunity to work on collaborative projects on theory, application, computation, and graphics. You can talk to our current and former postdocs if you want to hear how great it is to work here. Positions are usually for two years. Columbia University is an Equal Opportunity/Affirmative Action employer.</p>

<p>Also, if you're finishing up your Ph.D. in statistics, have interest in public health and international development, and would like to work with me, please contact me regarding the <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/10/time_to_apply_f.html">Earth Institute postdoc</a>.  Application deadline is 1 Dec, so time to get moving on this!</p>]]>
        
    </content>
</entry>

<entry>
    <title>&quot;Science revolves around the discovery of new cause-effect relationships but the entire statistics literature says almost nothing about how to do this.&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/science_revolve.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2885</id>

    <published>2009-11-19T16:13:32Z</published>
    <updated>2009-11-19T14:06:36Z</updated>

    <summary>Seth writes: Is this a fair statement, do you think? Science revolves around the discovery of new cause-effect relationships but the entire statistics literature says almost nothing about how to do this. It&apos;s part of an abstract for a talk...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Seth writes:</p>

<blockquote>Is this a fair statement, do you think?

<blockquote>Science revolves around the discovery of new cause-effect relationships but the entire statistics literature says almost nothing about how to do this.</blockquote>

<p>It's part of an abstract for a talk I [Seth] will give at the ASA conference next July. Haven't submitted the abstract yet so can revise it or leave it out.</blockquote></p>

<p>My reply:  This seems reasonable to me.</p>

<p>You could clarify that the EDA literature is all about discovery of new relationships but with nothing about causality, while the identification literature is all about causality but nothing about the discovery of something new.</p>]]>
        <![CDATA[<p>What literature there is on discovery of new causal relationships comes from structural equation modeling (also called graphical modeling), but this work is not particularly exploratory (it's all about discovering relationships in a pre-specified set of variables) and there's a lot of debate about how causal it is.  (The proponents of structural equation modeling and graphical modeling think these tools can be used to discover causality from just about any observational data, but others are skeptical of these claims--rightfully skeptical, in my opinion.)</p>

<p>My point here is not to bash structural equation modeling, just to say that, unless you happen to be a strong believer in that family of methods, Seth's statement is pretty much correct.  And an interesting point it is.  It might very well be that statistics just isn't suited to such questions--I think we make a lot of useful progress in descriptive analysis of the Red State, Blue State variety (or of the identifying-genes-associated-with-diseases variety) but it's worth at least occasionally thinking about the deeper questions.</p>]]>
    </content>
</entry>

<entry>
    <title>Senators and health care; also a discussion of pretty statistical graphics</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/senators_and_he.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2887</id>

    <published>2009-11-19T13:16:01Z</published>
    <updated>2009-11-19T14:07:05Z</updated>

    <summary>Nate, Daniel, and I have an op-ed in the Times today, about senators&apos; positions and state-level opinion on health care. We write: Lawmakers&apos; support for or opposition to reform generally has less to do with the views of their constituents...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Statistical graphics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Nate, Daniel, and I have an op-ed in the Times today, about senators' positions and state-level opinion on health care.  <a href="http://www.nytimes.com/2009/11/19/opinion/19silver.html">We write</a>:</p>

<blockquote>Lawmakers' support for or opposition to reform generally has less to do with the views of their constituents and more to do with the issue of presidential popularity. . . .

<p>For instance, Senator Blanche Lincoln, a Democrat who has been a less-than-strong supporter of the present health care bill, recently told The Times, "I am responsible to the people of Arkansas, and that is where I will take my direction." But where does she look for her cue? Hers is a poor state whose voters support health care subsidies six percentage points more than the national average. On the other hand, Mr. Obama got just 40 percent of the vote there.</p>

<p>Likewise, in Louisiana, where the Annenberg surveys showed health care reform to be popular but where Mr. Obama is not, the Democrats are not assured of Mary Landrieu's vote. . . .</blockquote></p>

<p>Here's our graph that makes this point:</p>]]>
        <![CDATA[<p><img alt="senators.long-reduced.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/senators.long-reduced.png" width="450" height="450" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></p>

<p>In putting together the op-ed, the art dept at the Times made some changes (with our guidance and approval).  Here's what they made:</p>

<p><img alt="senatorsnyt.jpg" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/senatorsnyt.jpg" width="650" height="855" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></p>

<p>Much nicer than our original, I have to say!</p>

<p>We also look at public opinion within states:</p>

<blockquote>Using a statistical method called multilevel regression and post-stratification, we also mapped opinion on health care, breaking down voters by age, family income and state. We're used to thinking about red states and blue states, but the geographic variation is dwarfed by the demographic patterns: younger, lower-income Americans strongly support increased government spending on health care, while elderly and well-off Americans are much less supportive of the idea.</blockquote>

<p>And here are the maps that tell the story:</p>

<p><a href="http://www.stat.columbia.edu/~cook/movabletype/mlm/healthcare2004-StateAgeIncome.png"><img alt="healthcare2004-StateAgeIncome.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/healthcare2004-StateAgeIncome.png" width="500" height="400" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></a></p>

<p>Again, the Times improved it (saving space slightly by combining the two highest income categories):</p>

<p><img alt="mapsnyt.jpg" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/mapsnyt.jpg" width="650" height="474" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></p>

<p>The Times version is not just more attractive; it's also easier to read, I think, in the sense of being more self-contained.  (I still prefer our color scheme, though.)</p>

<p><strong>Summary on the politics</strong></p>

<p>Swing senators' positions on health care are often presented in terms of worries about voter attitudes in the senators' home states.  Overall I don't think this fits the data.  Attitudes on health care vary more consistently by age and income than by state (<a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/10/who_supports_go.html">compare to</a> our graphs of ideology and partisanship), and constituents' views on health care are not a strong predictor of senators' stances.</p>

<p>Public opinion is certainly relevant to the health care debate, but not in the direct senator-follows-the-state way that it is sometimes imagined.</p>

<p><strong>Summary on the graphics</strong></p>

<p>I liked our graphs, but the Times versions are better.  Our graphs took months of effort, but the Times versions were not immediate either.  We had to go back and forth several times to get the clarity we all wanted.  I'd like to think, though, that our effort was not wasted:  by being able to make a bunch of graphs that were informative for us, we were able to home in on the story.  At that point, the graphics professionals helped us to do better.</p>

<p>It's tougher to make graphs for a newspaper than for a book, scholarly journal, or even a blog, I think.  Even beyond the different audiences, a newspaper graph really has to be self-contained.  In a book or article I can accompany the graph with a caption, and I make full use of captions to make each graph reasonably self-contained (to the benefit of people such as myself who jump from graph to graph when reading), and in a blog I can put whatever I want right below the graph.  But in the newspaper, the graph really has to stand alone and with minimal captioning.</p>]]>
    </content>
</entry>

<entry>
    <title>Statfight!</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/statfight.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2886</id>

    <published>2009-11-19T09:38:00Z</published>
    <updated>2009-11-19T14:07:57Z</updated>

    <summary>Fun stuff....</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Sports" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p><a href="http://gladwell.typepad.com/gladwellcom/2009/11/pinker-on-what-the-dog-saw.html">Fun stuff.</a></p>]]>
        
    </content>
</entry>

<entry>
    <title>They call me Dear Abby, or, This might at first seem like a pointless tautological exercise, but actually I think it can lead you forward</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/they_call_me_de.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2884</id>

    <published>2009-11-18T20:50:30Z</published>
    <updated>2009-11-18T22:12:22Z</updated>

    <summary>Daniel Corsi writes: I am a PhD student in epidemiology at McMaster University and I am interested in exploring how characteristics of communities are related to child health in developing countries. I have been using multilevel models to relate physical...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Multilevel Modeling" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Sociology" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Daniel Corsi writes:</p>

<blockquote>I am a PhD student in epidemiology at McMaster University and I am interested in exploring how characteristics of communities are related to child health in developing countries.

<p>I have been using multilevel models to relate physical characteristics of communities such as the number of schools, health clinics, sanitation facilities etc to child height for age and weight for age using observational/survey data.</p>

<p>I have several questions with regard to the group-level (community-level) predictors in these models.</blockquote></p>]]>
        <![CDATA[<blockquote>1.    My first question is about interpretation of the group-level coefficients.  I have found some modest coefficients around the order of .13 (se .05) on several community-level variables (i.e. number of schools) predicting child height for age in standard deviation units.  I know from your ARM book that we should be interpreting these coefficients cautiously especially in observational studies.  My question is does this apply to the interpretation of all variables or just variables created by aggregating an individually-measured variable to the group level? 

<p>2.    The second and related point is do you have any suggestions on combining several predictors together at the group level?  I am wondering if it is more useful to look at the effects for several variables related to schools, health clinics, and other services separately or to combine these variables into some form of an index to include in one model.   These variables are typically highly correlated and therefore it doesn't seem to make sense to me to include several individual variables in the same model without combining them into some form of a 'total' community facility index - but I haven't found much in the literature about this point.</p>

<p>3.    And the last question I have for you is - Is it even reasonable to be looking at group level influences on child health in this way? And would you suggest controlling for other individual-level predictors of child health for instance household socioeconomic status or mother education? As community-facilities are likely related to these intermediating variables which are stronger predictors of child health, any potential effect of the community environment could be masked by for instance the household SES.  It is also likely that it is the high-SES areas that will have access to better facilities so I am having a difficulty with this issue. </p>

<p>I am not necessarily looking for causal effects, although it is helpful to think this way.  What I am really interested in is what can be learned about community-level characteristics and their influence on child health parameters by using multilevel models, and is there a way to try and understand this that doesn't require causal interpretation of the group-level coefficients?</p>

<p>Thank you for your help with this. I realize that it is potentially a complex issue, but I haven't found many references on this point. If you have any advice or references, that would be very helpful.</blockquote></p>

<p>My reply:</p>

<p>1.  It's always a good idea to be careful.  When I'm stuck on causal interpretations, I go back to descriptive language, for example:  Comparing two kids of the same race, birth order, socioeconomic status, etc., but one kid lives in neighborhood X (which is 1 sd above the mean on #schools but at the mean level on all other neighborhood-level characteristics) and the other kid lives in neighborhood Y (which is 1 sd below the mean on #schools but at the mean level on all other neighborhood-level characteristics).  Based on the model, you'd expect the kid in neighborhood Y to differ by ** much from the kid in neighborhood X.</p>

<p>This might at first seem like a pointless tautological exercise, but actually I think it can lead you forward.  First, it gets you thinking about what it means for a neighborhood to be 1 sd above or 1 sd below the mean on a given characteristic.  Second, it pushes you to think about the individual neighborhoods, to give you a sense of what these statistical results are really saying.  Third, it gets you thinking about correlations between the predictors.  Does it really make sense to compare two neighborhoods that differ in #schools but are identical in all other ways, or would it be better to compare neighborhoods that, more realistically, differ in many dimensions?</p>

<p>2.  When combining predictors, I'm a big fan of simple averages, as discussed in chapter 4 of ARM.  The same reasoning goes for group-level predictors.  You just want to think a bit about the scaling.</p>

<p>3.  I don't have any great answers here, but if you're thinking causally--and you should be, I'm sure--it helps to visualize some potential interventions and think about how they'd trickle down through the predictors in your models.  Also think of some hypothetical experiments or observational studies you'd ideally like to do, then see if you can do some modeling to do your best job to fill in the gaps needed to make the inferences you're interested in.</p>
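
<p>As a small illustration of the simple-average idea in point 2, here is a sketch in Python.  The community data are invented, the function names are hypothetical, and the choice of scaling (putting each predictor on a z-score scale before averaging) is one assumption about what "thinking about the scaling" could look like in practice, not a prescription.</p>

```python
import statistics

# Hypothetical community-level counts for five communities; the values are made up.
communities = {
    "schools": [1, 3, 2, 6, 4],
    "clinics": [0, 2, 1, 3, 2],
    "sanitation": [2, 5, 3, 9, 6],
}

def standardize(values):
    """Rescale a predictor to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def facility_index(predictors):
    """Average the standardized predictors, giving one index value per community."""
    z_scores = [standardize(column) for column in predictors.values()]
    return [statistics.mean(row) for row in zip(*z_scores)]

index = facility_index(communities)
print(index)
```

<p>The resulting index can then enter the multilevel model as a single group-level predictor in place of the three correlated ones.</p>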

<p>P.S.  Hey, maybe that should be my statistical motto:  "This might at first seem like a pointless tautological exercise, but actually I think it can lead you forward"!</p>]]>
    </content>
</entry>

<entry>
    <title>Clearing up some misconceptions about Bayesian statistics</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/clearing_up_som.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2855</id>

    <published>2009-11-18T12:03:10Z</published>
    <updated>2009-11-18T12:25:21Z</updated>

    <summary>I was checking out the comments at my bloggingheads conversation with Eliezer Yudkowsky, and I noticed the following, from commenter bbbeard: My sense is that there is a fundamental sickness at the heart of Bayesianism. Bayes&apos; theorem is an uncontroversial...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Bayesian Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>I was checking out the <a href="http://bloggingheads.tv/diavlogs/23065">comments</a> at my  bloggingheads conversation with Eliezer Yudkowsky, and I noticed the following, from commenter bbbeard:</p>

<blockquote>My sense is that there is a fundamental sickness at the heart of Bayesianism. Bayes' theorem is an uncontroversial proposition in both frequentist and Bayesian camps, since it can be formulated precisely in terms of event ensembles. However, the fundamental belief of the Bayesian interpretation, that all probabilities are subjective, is problematic -- for its lack of rigor. . . .</blockquote>]]>
        <![CDATA[<blockquote>One of the features of frequentist statistics is the ease of testability. Consider a binomial variable, like the flip of a fair coin. I can calculate that the probability of getting seven heads in ten flips is 11.71875%. I can check this, first of all, with a computer program that generates random numbers uniformly in [0,1) in groups of ten, and keeping tabs on what fraction of samples have exactly seven numbers less than 0.5. Obviously I can do this for any (m,n). I can also take a coin and flip it many times and get an empirical approximation to 11.71875%. At some point a departure from the predicted value may appear, and frequentist statistics give objective confidence intervals that can precisely quantify the degree to which the coin departs from fairness. . . . What is unclear to me is how a Bayesian would map out an experiment, either numerical or empirical, to demonstrate the posterior distribution in the unknown unfair coin experiment. That's why I ask, "what does the posterior distribution mean"? . . . The Bayesian interpretation is certainly not what we use in physics. Suppose we lived at a time before the speed of light was measured accurately. You could poll a bunch of people, even "experts", and get a range of guesses about the value of the speed of light. A Bayesian would construct a prior from this information. But what happens when you go do the experiment? . . . </blockquote>
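
<p>The numerical check that bbbeard describes can be sketched in a few lines.  This is only an illustration and not part of the original comment: the function names, the fixed seed, and the number of simulated groups are all assumptions chosen for the example.</p>

```python
import math
import random


def exact_prob(m, n):
    """Exact probability of exactly m heads in n flips of a fair coin."""
    return math.comb(n, m) / 2 ** n


def simulated_prob(m, n, groups, seed=0):
    """Fraction of groups of n uniform draws with exactly m values below 0.5."""
    rng = random.Random(seed)  # fixed seed (an assumption) so reruns agree
    hits = 0
    for _ in range(groups):
        heads = sum(1 for _ in range(n) if 0.5 > rng.random())
        if heads == m:
            hits += 1
    return hits / groups


exact = exact_prob(7, 10)  # 120 / 1024 = 0.1171875
approx = simulated_prob(7, 10, groups=100_000)
print(f"exact = {exact:.7f}, simulated = {approx:.4f}")
```

<p>The same two functions work for any (m, n), which is the "obviously I can do this for any (m,n)" part of the comment.</p>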

<p>I don't know that any readers of this blog will need an answer to these questions, but just quickly:</p>

<p>1.  No, Bayesian probabilities don't have to be subjective.  See chapter 1 of Bayesian Data Analysis for discussion and examples.</p>

<p>2.  Bayesian models can indeed be tested.  See chapter 6 of Bayesian Data Analysis.</p>

<p>3.  Probability distributions in physics are not so clear as you might think.  See the bottom half of page 7 in my <a href="http://www.stat.columbia.edu/~gelman/research/published/badbayesresponsemain.pdf">Bayesian Analysis discussion</a> here.</p>
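
<p>For readers who want something they can run, here is one hedged way to give a posterior distribution the kind of operational meaning asked about in the comment.  It assumes a uniform prior on the coin's bias, which is an assumption of this sketch and not something from the comment or from the book chapters cited above.  Draw a bias from the prior, flip that coin ten times, and keep only the biases that produced seven heads.  Under that assumption the kept biases should follow the Beta(8, 4) posterior, whose mean is 8/12.</p>

```python
import random


def posterior_draws(heads_seen, flips, trials, seed=1):
    """Keep the simulated coin biases that reproduced the observed head count."""
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        theta = rng.random()  # bias drawn from a uniform prior (an assumption)
        heads = sum(1 for _ in range(flips) if theta > rng.random())
        if heads == heads_seen:
            kept.append(theta)
    return kept


draws = posterior_draws(heads_seen=7, flips=10, trials=200_000)
simulated_mean = sum(draws) / len(draws)
analytic_mean = (7 + 1) / (10 + 2)  # mean of the Beta(8, 4) posterior
print(f"kept {len(draws)} draws, mean {simulated_mean:.3f} vs {analytic_mean:.3f}")
```

<p>This checks only the posterior mean; comparing quantiles of the kept draws against the Beta(8, 4) quantiles would be the natural next step.</p>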

<p>OK, I think that just about covers it.</p>

<p>P.S.  These definitions (from pages 1-2 of <a href="http://www.stat.columbia.edu/~gelman/research/published/badbayesmain.pdf">this article</a>) may also be of help:</p>

<blockquote>"Bayesian inference" represents statistical estimation as the conditional distribution of parameters and unobserved data, given observed data. "Bayesian statisticians" are those who would apply Bayesian methods to all problems. (Everyone would apply Bayesian inference in situations where prior distributions have a physical basis or a plausible scientific model, as in genetics.) "Anti-Bayesians" are those who avoid Bayesian methods themselves and object to their use by others.</blockquote>]]>
    </content>
</entry>

<entry>
    <title>&quot;What&apos;s a statistician?  An accountant without the laughs.&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/whats_a_statist.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2882</id>

    <published>2009-11-17T21:11:59Z</published>
    <updated>2009-11-17T21:18:10Z</updated>

    <summary>Andrew Roberts writes: I teach political science at Northwestern. I have a book coming out with U of Chicago Press called &quot;The Thinking Student&apos;s Guide to College&quot; and I wanted to ask you a question about one part. I have...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Andrew Roberts writes:</p>

<blockquote>I teach political science at Northwestern. I have a book coming out with U of Chicago Press called "The Thinking Student's Guide to College" and I wanted to ask you a question about one part.

<p>I have a section where I advocate a few "neglected majors". One of them is statistics. I wrote the following (see below) about statistics, but it seems a little dull to me. I'd be curious if you would add anything that would make the major seem more attractive. (FYI, the other neglected majors are linguistics, regional studies, and sociology).</p>

<blockquote>To fully understand just about any phenomenon in the world, from atoms to people to countries, you need a grasp of statistics. Statistics teaches you how to measure quantities, collect data, and then draw inferences from that information. Though this might sound boring, these tasks are necessary to explain most of the forces affecting our lives, whether the workings of markets, the movement of public opinion, or the spread of disease. Not only does a statistics major give you the skills to answer these questions, it is also extremely marketable. There is hardly a firm which could not benefit from a trained statistician, and statisticians are just as desirable for public interest groups hoping to help the disadvantaged. And if you worry that you are not the math type, statistics is considerably less demanding than a pure math major and does more to help you understand the real world in all its complexities.</blockquote></blockquote>

<p>"Considerably less demanding than a pure math major," huh?  OK, OK . . .</p>

<p>My main suggestion would be to be less apologetic.  No need to say "Though this might sound boring"!</p>

<p>Perhaps some of you have specific suggestions for Andrew Roberts for his book?</p>]]>
        
    </content>
</entry>

<entry>
    <title>Adaptively scaling the Metropolis algorithm using expected squared jumped distance</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/adaptively_scal.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2866</id>

    <published>2009-11-17T17:01:20Z</published>
    <updated>2009-11-17T16:21:26Z</updated>

    <summary>In the spirit of Christian Robert, I&apos;d like to link to my own adaptive Metropolis paper (with Cristian Pasarica): A good choice of the proposal distribution is crucial for the rapid convergence of the Metropolis algorithm. In this paper, given...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Bayesian Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Statistical computing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>In the spirit of <a href="http://xianblog.wordpress.com/2009/11/07/adaptive-metropolis/">Christian Robert</a>, I'd like to link to <a href="http://www.stat.columbia.edu/~gelman/research/published/A06-109-new_version.pdf">my own adaptive Metropolis paper</a> (with Cristian Pasarica):</p>

<blockquote>A good choice of the proposal distribution is crucial for the rapid convergence of the Metropolis algorithm. In this paper, given a family of parametric Markovian kernels, we develop an adaptive algorithm for selecting the best kernel that maximizes the expected squared jumped distance, an objective function that characterizes the Markov chain. We demonstrate the effectiveness of our method in several examples.</blockquote>

<p>The key idea is to use an importance-weighted calculation to home in on a jumping kernel that maximizes expected squared jumped distance (and thus minimizes first-order correlations).  We have a bunch of examples to show how it works and to show how it outperforms the more traditional approach of tuning the acceptance rate:</p>

<p><img alt="jumpingplot.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/jumpingplot.png" width="637" height="836" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></p>

<p>Regarding the adaptivity issue, our tack is to recognize that the adaptation will be done in stages, along with convergence monitoring.  We stop adapting once approximate convergence has been reached and consider the earlier iterations as burn-in.  Given what is standard practice here anyway, I don't think we're really losing anything in efficiency by doing things this way.</p>

<p>Completely adaptive algorithms are cool too, but you can do a lot of useful adaptation in this semi-static way, adapting every 100 iterations or so and then stopping the adaptation when you've reached a stable point.</p>
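
<p>To make the objective concrete, here is a minimal sketch.  It is not the importance-weighted algorithm from the paper: it simply runs a one-dimensional random-walk Metropolis sampler on a standard normal target (an assumed toy target) at a few candidate jump scales and reports the average squared jumped distance next to the acceptance rate.  The function name, the grid of scales, and the chain length are hypothetical choices for illustration.</p>

```python
import math
import random


def metropolis_esjd(scale, n_iter=20_000, seed=0):
    """Random-walk Metropolis on N(0, 1); returns (mean squared jump, acceptance rate)."""
    rng = random.Random(seed)
    x = 0.0
    total_sq_jump = 0.0
    accepted = 0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, scale)
        # Log density ratio target(proposal) / target(x) for a standard normal
        log_ratio = 0.5 * (x * x - proposal * proposal)
        # 1 - random() lies in (0, 1], so the log is always defined
        if log_ratio >= math.log(1.0 - rng.random()):
            total_sq_jump += (proposal - x) ** 2  # rejected moves add zero
            x = proposal
            accepted += 1
    return total_sq_jump / n_iter, accepted / n_iter


scales = [0.1, 0.5, 1.0, 2.4, 5.0, 20.0]  # candidate jump scales (assumed grid)
results = {s: metropolis_esjd(s) for s in scales}
best_scale = max(scales, key=lambda s: results[s][0])
for s in scales:
    esjd, rate = results[s]
    print(f"scale {s:5.1f}: ESJD {esjd:.3f}, acceptance {rate:.2f}")
print("scale with largest ESJD on this grid:", best_scale)
```

<p>A brute-force grid like this wastes a full run per candidate scale, which is the cost the importance-weighted calculation is meant to avoid.  In the semi-static scheme described above, one would re-estimate the best scale every 100 iterations or so and freeze it once it stops moving.</p>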

<p>The article will appear in Statistica Sinica.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Alan Abramowitz on politicians and ideological conformity</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/alan_abramowitz.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2881</id>

    <published>2009-11-17T13:33:49Z</published>
    <updated>2009-11-16T21:38:06Z</updated>

    <summary>In response to my note on the limited ideological constraints faced by legislators running for reelection, Alan Abramowitz writes: I [Abramowitz] agree--although they probably have less leeway now than in the past due to growing pressure toward ideological conformity within...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>In response to <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/politicians_hav.html">my note</a> on the limited ideological constraints faced by legislators running for reelection, Alan Abramowitz writes:</p>

<blockquote>I [Abramowitz] agree--although they probably have less leeway now than in the past due to growing pressure toward ideological conformity within parties, especially GOP.  But one thing that struck me as very interesting in your graph is that it looks like the advantage of a moderate voting record is considerably smaller now than it used to be, down from over 4 percentage points in the 1980s to maybe 1.5 points on average now.  It suggests to me that the electorate has become increasingly partisan and that fewer voters are going to defect to an incumbent from the opposing party regardless of voting record.  This could reflect more concern among voters with party control of Congress itself.  Along these lines, one thing I've found in the NES data is a growing correlation between presidential job evaluations and voting for both House and Senate candidates over time. </blockquote>

<p>My reply:  Yes, that makes sense.  The trend is suggestive although (as you can see from the error bars) not statistically significant.  Recently I have not had my thoughts organized enough to write any articles on this stuff, but it feels good to at least post these fragments for others to chew on. </p>]]>
        
    </content>
</entry>

<entry>
    <title>More on risk aversion etc etc etc</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/more_on_risk_av.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2880</id>

    <published>2009-11-16T21:17:27Z</published>
    <updated>2009-11-16T21:33:23Z</updated>

    <summary>A correspondent writes: You may be interested in this article by Matthew Rabin which makes the point that you make in your article: if you are an expected utility maximizer then turning down small actuarially unfair bets (e.g. 50% win...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Decision Theory" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Economics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>A correspondent writes:</p>

<blockquote>You may be interested in <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.43.530&rep=rep1&type=pdf">this article</a> by Matthew Rabin which makes the point that you make in your article: if you are an expected utility maximizer then turning down small actuarially unfair bets (e.g. 50% win $120; 50% lose $100) implies that you would never accept a bet where you could lose $1000 (even if you might win an infinite amount of money).  (But proved in more generality).

<p>This was taught to me in the first year of my econ phd program (which I'm currently in!) as why you probably don't want to extrapolate from decisions over small bets to risk aversion in general, not as why we should throw out risk aversion and expected utility maximization completely.   Of course, decision theorists do all kinds of things to try to "fix" this problem.</p></blockquote>
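
<p>The arithmetic behind that claim can be sketched with a crude bound.  The sketch assumes a concave, differentiable utility function u and assumes the 50/50 lose-$100/win-$120 bet is turned down at every wealth level; both are assumptions of this illustration, and the function name is hypothetical.  Turning down the bet at wealth x+100 means 120 u'(x+220) is at most 100 u'(x), so marginal utility falls by a factor of at least 100/120 for every $220 of added wealth.  Summing that geometric decline caps the value of any win, however large, while the mirror-image growth below current wealth puts a floor on the cost of a $1000 loss.</p>

```python
def worst_case_bounds(small_loss, small_gain, big_loss):
    """Bounds in units of u'(w), the marginal utility at current wealth w.

    Returns (cap on the utility of an unlimited win,
             floor on the utility cost of losing big_loss).
    """
    step = small_loss + small_gain   # marginal utility shrinks once per step
    ratio = small_loss / small_gain  # ...by at least this factor (below 1)
    # Gain side: step * (1 + ratio + ratio**2 + ...) as a closed-form sum
    max_gain_value = step / (1.0 - ratio)
    # Loss side: marginal utility grows by 1 / ratio per step below w
    min_loss_cost = 0.0
    remaining = big_loss
    growth = 1.0
    while remaining > 0:
        width = min(step, remaining)
        min_loss_cost += width * growth
        remaining -= width
        growth /= ratio
    return max_gain_value, min_loss_cost


max_gain_value, min_loss_cost = worst_case_bounds(100, 120, 1000)
print(f"unlimited win is worth at most {max_gain_value:.1f} u'(w)")
print(f"losing $1000 costs at least    {min_loss_cost:.1f} u'(w)")
print("50/50 bet rejected:", min_loss_cost > max_gain_value)
```

<p>Because the floor on the loss exceeds the cap on the gain, the 50/50 bet is rejected no matter how large the prize.  The published result is tighter and more general than this bound.</p>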

<p>My reply:  Yitzhak (as we called him in high school) wrote his paper after mine had appeared; unfortunately my article was in a statistics journal and he had not heard about it.  (This was before I could publicize everything on the blog.  And, even now, I think a few papers of mine manage to get out there without being noticed.)</p>

<p>I'm glad they teach this stuff in grad schools now--although, in a way, this still proves my point, in that the nonlinear-utility-function-for-money model is still considered such a standard that they feel the need to debunk it.</p>

<p>My correspondent replied:  "I wouldn't call it a debunking....we still go on to use it as the workhorse model in everything we do...."</p>

<p>I think there are good and bad things about this "workhorse model":</p>]]>
        <![CDATA[<p>I think utility theory is great, both in theory and even in practice (which is why I devoted a chapter of Bayesian Data Analysis to it). And I have no problem with the study of risk aversion--that is, of the psychological/economic phenomenon of aversion to risk. I also think it's a good idea to study aversion to loss (not the same thing as risk; for example, people don't seem to like losing even $10, but that's hardly a "risk" in the usual sense of the word) and aversion to uncertainty (as in the $20/30/40 example). All three of these phenomena seem interesting to me, and important enough that it's worth keeping them as three separate concepts. Heck, I even like the game <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/08/correcting_for.html">Risk</a>.</p>

<p>But . . . I think that equating risk aversion to the declining utility of money is a mistake that doesn't help anybody. Given the well-known phenomenon of uncertainty aversion (even apart from loss aversion or risk aversion), I don't think it makes sense to use people's preferences over gambles, at whatever scale, to try to assess their utility functions.</p>

<p>I'm sure that there are lots of useful tools that people have for addressing these problems in applied economic analysis; as noted above, my frustration comes from always having to clear the air about risk aversion, uncertainty aversion, etc. I really think the term "risk aversion" does more harm than good, by leading people to think that there's one single concept that handles all these different psychological/economic phenomena.</p>]]>
    </content>
</entry>

</feed>
