<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Statistical Modeling, Causal Inference, and Social Science</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/mlm/" />
    <link rel="self" type="application/atom+xml" href="http://www.stat.columbia.edu/~cook/movabletype/mlm/atom.xml" />
    <id>tag:www.stat.columbia.edu,2008-11-24:/~cook/movabletype/mlm/1</id>
    <updated>2009-11-06T21:20:03Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.31-en</generator>

<entry>
    <title>In the Applied Statistics Blog this week</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/in_the_applied.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2856</id>

    <published>2009-11-06T21:15:29Z</published>
    <updated>2009-11-06T21:20:03Z</updated>

    <summary>1. Understanding the &apos;Russian Mortality Paradox&apos; in Central Asia: Evidence from Kyrgyzstan Short answer: alcohol and suicide. 2. Lumberjacks as a counterexample to the idea of a &quot;risk premium&quot; They take lots of risks and don&apos;t get paid well for...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Literature" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Sociology" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Statistical graphics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>1.  <a href="http://scienceblogs.com/appliedstatistics/2009/11/understanding_the_russian_mort.php">Understanding the 'Russian Mortality Paradox' in Central Asia: Evidence from Kyrgyzstan</a></p>

<p>Short answer:  alcohol and suicide.</p>

<p>2.  <a href="http://scienceblogs.com/appliedstatistics/2009/11/lumberjacks_as_a_counterexampl.php">Lumberjacks as a counterexample to the idea of a "risk premium"</a></p>

<p>They take lots of risks and don't get paid well for it.</p>

<p>3.  <a href="http://scienceblogs.com/appliedstatistics/2009/11/cell_size_and_scale.php">Cell size and scale</a></p>

<p>This is a visualization you won't want to miss.</p>

<p>4.  <a href="http://scienceblogs.com/appliedstatistics/2009/11/ok_so_this_is_how_i_ended_up_w.php">Three guys named Matt</a></p>

<p>5.  <a href="http://scienceblogs.com/appliedstatistics/2009/10/the_political_philosophy_of_th.php">The political philosophy of the private eye</a></p>

<p>A genre that was rendered obsolete in 1961 (but nobody realizes it).</p>]]>
        
    </content>
</entry>

<entry>
    <title>The two blogs</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/the_two_blogs.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2847</id>

    <published>2009-11-06T14:44:24Z</published>
    <updated>2009-11-06T15:58:35Z</updated>

    <summary>Tyler Cowen writes: Andrew Gelman will have a second blog. I don&apos;t yet understand the forthcoming principle of individuation across the two blogs....</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Sociology" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Tyler Cowen <a href="http://www.marginalrevolution.com/marginalrevolution/2009/11/assorted-links.html">writes</a>:</p>

<blockquote>Andrew Gelman will have <a href="http://scienceblogs.com/appliedstatistics/">a second blog</a>.  I don't yet understand the forthcoming principle of individuation across the two blogs.</blockquote>]]>
        <![CDATA[<p>I have to admit I haven't thought this through at any level of detail.  When the Science Blogs people asked me if I wanted to blog there, I canvassed my co-bloggers, and most of them thought it was a good idea.  The Science Blog would reach a new audience, but I didn't want to abandon the blog here.  (This blog is an extension of my research persona, which seems about right to me.  On Science Blogs, I'm just one of seventy bloggers, which is fine--I think a bunch of those blogs get a lot more readers than I do--but it didn't seem right as a blogging home for me.)</p>

<p>So, how does the content differ?  My plan was--is--for the Science Blog to be optional and a portal onto this main blog.  Thus, I'll post links here to the Science Blogs content so you can click over and take a look at it if you'd like.  I don't really have any plans to separate content, except of course that I won't be putting the technical statistical stuff over there.</p>

<p>I might just start crossposting everything from there to here--that's what I do when I post on New Majority, 538, and the Monkey Cage--but for now I'll try to have some material that's only over there.  It seems only fair, since they gave me a blog there, that I post some unique content.</p>

<p>I'm still not sure if this new blog makes sense.  New Majority is fine--I send them material on occasion and they decide whether to post it--and 538 is great--again, most of my stuff doesn't really fit there, but every week or so I have something that's highly relevant to current political events, and then I post there.  The Monkey Cage is no big deal because I can crosspost for them whenever--the blog has its own existence without needing too much from me.  But this new blog . . . well, we'll see how it goes.  It's a bid to spread the statistical gospel to a wider audience.</p>]]>
    </content>
</entry>

<entry>
    <title>Slipperiness of the term &quot;risk aversion&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/slipperiness_of.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2835</id>

    <published>2009-11-05T19:57:46Z</published>
    <updated>2009-11-05T20:49:06Z</updated>

    <summary>I don&apos;t like the term &quot;risk aversion&quot; (see here and here). For a long time I&apos;ve been meaning to write something longer and more systematic on the topic, but every once in a while I see something that reminds me of...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Decision Theory" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Economics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>I don't like the term "risk aversion" (see <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/12/risk_aversion_a.html">here</a> and <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2006/11/loss_aversion_i.html">here</a>).  For a long time I've been meaning to write something longer and more systematic on the topic, but every once in a while I see something that reminds me of the slipperiness of the term.</p>

<p>For example, Alex Tabarrok <a href="http://www.marginalrevolution.com/marginalrevolution/2009/10/why-are-americans-more-risk-averse-about-medicine-than-europeans.html">asks</a>, "Why are Americans more risk averse about medicine than Europeans?"  It's a good question, and it's something I've wondered about myself.  But I don't know what he's talking about when he says that "the stereotype is that Americans are more risk-loving" than Europeans.  Huh?  Americans are notorious for worrying about risks, with car seats, bike helmets, high railings on any possible place where someone could fall, Purell bottles everywhere, etc., etc.  The commenters on Alex's blog are all talking about drug company regulations, but it seems like a broader cultural thing to me.</p>

<p>But I'm bothered by the term "risk aversion."  Why exactly is it appropriate to refer to strict rules on drug approvals as "risk averse"?  In a general English-language use of the words, I understand it, but it gets slippery when you try to express it more formally.</p>]]>
        <![CDATA[<p>I understand what Alex is saying--people are afraid of the risk of an adverse drug reaction, with this fear being "risk averse" rather than simple rational prudence if the cost of the risk aversion outweighs, in expectation, the risk being avoided.  (After all, we don't call it "risk averse" to avoid going down Niagara Falls in a barrel.  The idea of "aversion" is that one is evaluating a tradeoff using a rule that is more stringent than the calculation of expected values.)</p>
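<p>To make the expected-value idea concrete, here is a minimal Python sketch of a single drug decision with made-up numbers. It compares the expected harm of approving (the risk of an adverse reaction) with the expected harm of rejecting (the risk of an otherwise-preventable condition). The <code>adverse_weight</code> multiplier is a hypothetical device for representing a rule "more stringent than the calculation of expected values"; it is not a standard definition of risk aversion.</p>

```python
# All numbers are hypothetical, chosen only for illustration.
P_ADVERSE = 0.01       # chance of an adverse reaction if the drug is approved
HARM_ADVERSE = 100.0   # harm from that reaction (arbitrary units)
P_UNTREATED = 0.05     # chance of a preventable bad outcome if the drug is withheld
HARM_UNTREATED = 30.0  # harm from that outcome (arbitrary units)


def expected_harms(adverse_weight=1.0):
    """Return (expected harm if approved, expected harm if rejected).

    adverse_weight > 1 inflates the adverse-reaction harm: a hypothetical
    way to encode a rule more stringent than plain expected values.
    """
    approve = adverse_weight * P_ADVERSE * HARM_ADVERSE
    reject = P_UNTREATED * HARM_UNTREATED
    return approve, reject


def decide(adverse_weight=1.0):
    """Pick the option with the smaller (possibly reweighted) expected harm."""
    approve, reject = expected_harms(adverse_weight)
    return "approve" if approve < reject else "reject"


print(decide())                    # plain expected values -> approve
print(decide(adverse_weight=2.0))  # adverse reactions counted double -> reject
```

<p>Putting the extra weight on the untreated outcome instead would give a rule that is just as stringent in the opposite direction, which is one way to see why the label is slippery.</p>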

<p>Still, it's tricky to refer to this as "risk aversion" in a general sense.  In the drug-approval context, there are two risks--the risks from an adverse drug reaction, and, on the other side, the risk of something bad happening that could've been prevented by taking the drug.  It's risk vs. risk.  What if someone said we should approve just about every drug, so as to avoid the risk of some otherwise-preventable condition?  That would be risk-averse in another way, right?</p>

<p>This stance might seem fanciful, but I actually think it's pretty common, if you shift the context just slightly.  Having done some (academic) work on pest control, I've learned that the most effective method of reducing home roach infestation is to clean the place, put poison in the cracks in the walls, and seal the cracks.  "Bombing" the apartment doesn't really do the trick. It kills some roaches but then the others come back.  And this is beyond whatever poisoning you might get from the pesticide that's sprayed all over.</p>

<p>Nonetheless, people just love, love that bombing.  Every month in our building they put up a list asking who wants their apartment bombed, and lots of people sign up.  (And, beyond these individual choices, there's an institutional choice to bomb people's apartments for free.  Nobody's offering to clean and seal our apartments for free.)  Every month they do it, so I'm pretty sure the roaches are coming back.</p>

<p>To get back to the main point of discussion, this behavior can be viewed as risk-seeking or risk-averse.  Risk-seeking because people are taking on a risk of being exposed to poison and basically getting nothing out of it.  Or, risk-averse because people are willing to do something pretty extreme to avoid the risk of roach exposure.  In general, the "take a pill for it" or "bomb it" attitude can be seen as risk-averse.  Or not, depending on how you look at it.</p>

<p>I guess what I'm trying to say is that the original question--different attitudes toward drug approval and risky behavior among people in different places--is fascinating.  I just don't think "risk aversion" is a useful way of framing it.  As I noted above, I'd like to write something more general on this topic, once I can think of the right way of putting it.</p>]]>
    </content>
</entry>

<entry>
    <title>Computing power, n, and multilevel models</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/computing_power.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2722</id>

    <published>2009-11-05T14:37:37Z</published>
    <updated>2009-11-01T09:29:08Z</updated>

    <summary>Asa writes: I took your class on multilevel models last year and have since found myself applying them in several different contexts. I am about to start a new project with a dataset in the tens of millions of observations....</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Multilevel Modeling" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Statistical computing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Asa writes:</p>

<blockquote>I took your class on multilevel models last year and have since found myself applying them in several different contexts.  I am about to start a new project with a dataset in the tens of millions of observations.  In my experience, multilevel modeling has been most important when the number of observations in at least one subgroup of interest is small.  Getting started on this project, I have two questions:

<p>1) Do multilevel models still have the potential to add much accuracy to predictions when n is very large in all subgroups of interest?</p>

<p>2) Do you find SAS, STATA, or R to be more efficient at handling multilevel/"mixed effects" models with such a large dataset (won't be needing any logit/poisson/glm models)? </blockquote></p>

<p>My reply:</p>

<p>Regarding software, I'm not sure, but my guess is that Stata might be best with large datasets.  Stata also has an active user community that can help with such questions.</p>

<p>As for your first question, if n is large in all subgroups, then multilevel modeling is typically not needed:  you can simply fit a separate model in each group.  That is equivalent to a full-interaction model.  At that point you might become interested in details within subgroups, and then you might want a multilevel model.</p>

<p>Asa then wrote:</p>

<blockquote>Yes, a "full interaction" model was the alternative I was thinking of. And yes, I can imagine the results from that model raising further questions about what's going on within groups as well.

<p>My previous guess was that SAS would be the most efficient for multilevel modeling with big data. But I just completely wrecked my (albeit early-2000s-era) laptop looping proc mixed a bunch of times with a much smaller dataset.</blockquote></p>

<p>I don't really know on the SAS vs. Stata issue.  In general, I have warmer feelings toward Stata than SAS, but, on any particular problem, who knows?  I'm pretty sure that R would choke on any of these problems.</p>

<p>On the other hand, if you end up breaking the problem into smaller pieces anyway, maybe the slowness of R wouldn't be so much of a problem.  R does have the advantage of flexibility.</p>]]>
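<![CDATA[<p>The point that multilevel modeling adds little when n is large in every subgroup can be seen in the partial-pooling formula for a normal model with known variances: each group's estimate is a precision-weighted average of its own sample mean and the overall mean, with weight (n_j/sigma^2) / (n_j/sigma^2 + 1/tau^2) on the group's own data. Here is a minimal Python sketch under those simplifying assumptions, with sigma, tau, and mu fixed at hypothetical values and simulated data. It also fits each group separately, one piece at a time, which is the separate-model (full-interaction) alternative.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

SIGMA = 10.0  # hypothetical, assumed-known within-group standard deviation
TAU = 2.0     # hypothetical, assumed-known between-group standard deviation
MU = 50.0     # hypothetical, assumed-known population mean


def partial_pool(ybar, n, sigma=SIGMA, tau=TAU, mu=MU):
    """Partial-pooling (multilevel) estimate of one group's mean.

    Returns the estimate and the weight placed on the group's own sample mean.
    """
    w = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)
    return w * ybar + (1 - w) * mu, w


# Fit each group on its own, one piece at a time: the group's sample mean is
# the separate-model (no pooling) estimate, compared with the pooled estimate.
for n in [5, 50, 5_000, 500_000]:
    theta = rng.normal(MU, TAU)           # this group's true mean
    y = rng.normal(theta, SIGMA, size=n)  # simulated data for this group only
    ybar = y.mean()
    pooled, w = partial_pool(ybar, n)
    print(f"n={n:>7}  weight on own data={w:.4f}  "
          f"separate={ybar:.2f}  multilevel={pooled:.2f}")
```

<p>With these settings the weight on each group's own data is about 0.17 at n = 5 but about 0.99995 at n = 500,000, so the separate and multilevel estimates essentially coincide for large groups. In a real analysis sigma, tau, and mu would be estimated from the data rather than fixed.</p>]]>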
        
    </content>
</entry>

<entry>
    <title>Jewish Marriage Tied to Israel Trip </title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/jewish_marriage.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2851</id>

    <published>2009-11-05T10:06:42Z</published>
    <updated>2009-11-05T10:09:41Z</updated>

    <summary>Aleks sends along this amusing news article by Jennifer Levitz: A new study found that rates of marriage outside the faith were sharply curbed among young Jews who have taken &quot;birthright&quot; trips to Israel . . . Over the past...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Sociology" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Aleks sends along <a href="http://online.wsj.com/article_email/SB125652745959507567-lMyQjAxMDI5NTM2MTUzMjE3Wj.html">this amusing news article</a> by Jennifer Levitz:</p>

<blockquote>A new study found that rates of marriage outside the faith were sharply curbed among young Jews who have taken "birthright" trips to Israel . . . Over the past decade, Taglit-Birthright Israel, a U.S. nonprofit founded by Jewish businessmen, has sponsored nearly 225,000 young Jewish adults for free 10-day educational tours of Israel as a way to foster Jewish identity. . . .

<p>A study [by Brandeis University researcher Leonard Saxe and partly funded by Taglit-Birthright] showed that 72% of those who went on the trip married within the faith, compared with 46% of people who applied for the trip but weren't selected in a lottery. . . . The Brandeis study looked at 1,500 non-Orthodox Jewish adults who took Taglit trips or applied for one between 2001 and 2004.</blockquote></p>

<p>The article also said that 10,000 people participated in these trips last summer, which suggests that the 1,500 people in the research study represent a very small fraction of the participants from 2001-2004.  I have no idea if this is a random sample, or what.  Also I wonder about the people who participated in the lottery, were selected, but didn't go on the trip.  Excluding these people (if there are many of them) could bias the results.  The news article unfortunately doesn't link to any research report.</p>]]>
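<![CDATA[<p>Here is a minimal simulation of the worry about lottery winners who did not go on the trip. Everything in it is hypothetical: the trip is given no effect at all, and each applicant has a latent "attachment" that raises both the chance of going if selected and the chance of marrying within the faith. Comparing trip-goers with lottery losers then shows a spurious gap, while comparing everyone selected with everyone not selected (an intention-to-treat comparison, which is our illustrative alternative, not something the article describes) does not.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000  # hypothetical number of applicants

# Hypothetical latent attachment, driving both outcomes below.
attachment = rng.normal(size=N)
p_inmarry = 1 / (1 + np.exp(-attachment))   # chance of marrying within the faith
inmarry = rng.random(N) < p_inmarry         # the trip itself has NO effect here

selected = rng.random(N) < 0.5              # the lottery
p_go = 1 / (1 + np.exp(-(attachment + 1)))  # more attached winners are likelier to go
went = selected & (rng.random(N) < p_go)

# Trip-goers vs. lottery losers (winners who stayed home are dropped).
naive_gap = inmarry[went].mean() - inmarry[~selected].mean()
# Everyone selected vs. everyone not selected.
itt_gap = inmarry[selected].mean() - inmarry[~selected].mean()
print(f"went vs. not selected:     {naive_gap:+.3f}")
print(f"selected vs. not selected: {itt_gap:+.3f}")
```

<p>Under these assumptions the first comparison comes out clearly positive even though the trip does nothing, while the second is close to zero; how large the bias is in the real study would depend on how many winners stayed home and why.</p>]]>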
        <![CDATA[<p>P.S.  The article also says:</p>

<blockquote>Most estimates of America's Jewish population place it a little higher than six million, although some demographers have argued it is higher. The U.S. census doesn't track people by faith.</blockquote>

<p>I've seen estimates that are closer to 5 million.  More to the point, I don't think that "tracking people by faith" is the right way to think about it.  For this sort of counting exercise, Judaism is as much an ethnicity or a nationality as a "faith."  For example:</p>

<blockquote>Taglit's founders and funders include Charles Bronfman, heir to the Seagram liquor empire, and Michael Steinhardt, a former hedge-fund manager. . . . Mr. Steinhardt, who describes himself as an atheist, has said he supports Taglit because he wants to pass along Judaism's humanistic values.</blockquote>]]>
    </content>
</entry>

<entry>
    <title>Null and Vetoed: &quot;Chance Coincidence&quot;?</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/null_and_vetoed.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2848</id>

    <published>2009-11-04T20:39:17Z</published>
    <updated>2009-11-04T22:02:27Z</updated>

    <summary>Philip Stark sent along this set of calculations on the probability that the hidden message in Gov. Schwarzenegger&apos;s message could&apos;ve occurred by chance. The message, if you haven&apos;t heard, is:...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Philip Stark sent along <a href="http://statistics.berkeley.edu/~stark/Preprints/acrosticVeto09.htm">this set of calculations</a> on the probability that the hidden message in Gov. Schwarzenegger's message could've occurred by chance.  The message, if you haven't heard, is:</p>]]>
        <![CDATA[<blockquote>To the Members of the California State Assembly:

<p>I am returning Assembly Bill 1176 without my signature.</p>

<p>For some time now I have lamented the fact that major issues are overlooked while many <br />
unnecessary bills come to me for consideration. Water reform, prison reform, and health <br />
care are major issues my Administration has brought to the table, but the Legislature just <br />
kicks the can down the alley.</p>

<p>Yet another legislative year has come and gone without the major reforms Californians <br />
overwhelmingly deserve. In light of this, and after careful consideration, I believe it is <br />
unnecessary to sign this measure at this time.</blockquote></p>

<p>Philip concludes:</p>

<blockquote>The null hypothesis for testing "coincidences" matters. In this example, it is easy to get a wide spectrum of values for the "probability" of a coincidence. In the six calculations here, the probability ranges from about one in a couple of thousand to one in 487 billion: a factor of nearly 200 million--more than 8 orders of magnitude. News consumers should be wary of calculations of the "chance" of a coincidence, regardless of the context.</blockquote>

<p>Amusing--but I don't think the 1 in 2520 model makes a lot of sense!  As Philip writes, "a better 'null model' would pull full sentences at random from Governor Schwarzenegger's other vetoes, string them together, and see where the linebreaks fell."  I expect the model of random words from the Gutenberg corpus would come pretty close to that.  Then you have to multiply by some factor to correct for multiplicity, but considering you're starting with a probability of about 1 in a trillion, it seems unlikely that any correction for multiplicity could make this sort of thing very likely.</p>
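<p>The arithmetic behind the quoted range, and the reason a multiplicity correction is unlikely to rescue the chance explanation, can be checked in a few lines of Python. The 1-in-2,520 and 1-in-487-billion figures come from Philip's calculations; the count of alternative "hidden messages" is a deliberately generous, made-up number, and the correction used is a simple union (Bonferroni-style) bound.</p>

```python
import math

p_low = 1 / 2520     # least extreme of the six calculations
p_high = 1 / 487e9   # most extreme of the six calculations

ratio = p_low / p_high
print(f"ratio: {ratio:.3g}  (about 10^{math.log10(ratio):.2f})")
# ratio: 1.93e+08  (about 10^8.29)

# Union bound: P(any of K possible messages appears) <= K * p.
p_trillion = 1e-12   # "about 1 in a trillion"
K = 1_000_000        # hypothetical, generous count of alternative messages
print(f"corrected bound: {min(1.0, K * p_trillion):.0e}")
# corrected bound: 1e-06
```

<p>Even allowing for a million possible messages leaves a bound of about one in a million, which is the sense in which no reasonable correction for multiplicity makes this coincidence likely.</p>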

<p>A good classroom or homework example, in any case.</p>]]>
    </content>
</entry>

<entry>
    <title>Med School Interview Questions</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/med_school_inte.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2810</id>

    <published>2009-11-04T17:35:32Z</published>
    <updated>2009-11-01T09:32:55Z</updated>

    <summary>The questions are no big deal, but what I find interesting is that medical schools do personal interviews at all. No place where I&apos;ve ever worked has interviewed grad school applicants. It&apos;s hard for me to see what you get...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Decision Theory" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Economics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p><a href="http://www.blog.sethroberts.net/2009/10/20/med-school-interview-questions/">The questions</a> are no big deal, but what I find interesting is that medical schools do personal interviews at all.  No place where I've ever worked has interviewed grad school applicants.  It's hard for me to see what you get from it that would be worth the cost.  I guess there must be quite a bit of psychology literature on this question.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Constructing informative priors</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/constructing_in.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2840</id>

    <published>2009-11-04T07:57:06Z</published>
    <updated>2009-11-01T09:34:31Z</updated>

    <summary>Christiaan de Leeuw writes: I write to you with a question about the construction of informative priors in Bayesian analysis. Since most Bayesians at the statistics department here are more of the &apos;Objective&apos; Bayes persuasion, I wanted some outside opinions...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Bayesian Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Christiaan de Leeuw writes:</p>

<blockquote>I write to you with a question about the construction of informative priors in Bayesian analysis. Since most Bayesians at the statistics department here are more of the 'Objective' Bayes persuasion, I wanted some outside opinions as well.</blockquote>]]>
        <![CDATA[<blockquote>I am now working on my master's thesis project. My interest is in Bayesian statistics using informative priors, and the goal of my thesis project is to develop (the basis for) a method for constructing such priors using published results of earlier studies (in this case, specifically for linear regression models, where only reported results and not the data sets are available). This seemed like an obvious source of prior information to me, but when I searched through the statistical literature I found virtually nothing on the subject. Though I found numerous instances of researchers using existing literature in some way when specifying priors, this was almost always in a seemingly rather informal and ad hoc way. I could not find any attempts to systematically combine results from several existing studies to obtain informative priors. Searching through the literature on (Bayesian) meta-analysis similarly yielded very little of relevance to this issue.

<p>My question is therefore the following: why is there so little literature on systematically combining existing results from earlier studies (to obtain priors, or more generally as a meta-analysis)? I did quite extensive searches, so it seems unlikely that it is there and I just missed it. It also doesn't seem like a trivial problem to me. Even if the informed consensus is that it is for some reason not worth the effort, I still would have expected to find some papers on the subject. Consequently I am thoroughly puzzled as to why I can't find anything, and what I am overlooking.</p></blockquote>

<p>My reply:  I know what you mean.  For <a href="http://www.stat.columbia.edu/~gelman/research/published/bois2.pdf">our 1996 article</a>, Frederic Bois and I constructed prior distributions based on the medical literature, but as you put it, we did it in an ad hoc way.  My main advice on this point is that a good parameterization can help:  if the parameters mean something, and if their meaning transfers well across people, then it's a more reasonable task to try to put together a prior distribution.</p>

<p>More generally, yes, there is a literature on meta-analysis--we even have a couple of examples in Bayesian Data Analysis.  The general idea, as formulated by Rubin and others a few decades ago, is to set up a group-level regression model.  I don't think that there's any consensus that it's "not worth the effort" (as you put it).  Maybe you're not looking in the right places.</p>]]>
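<![CDATA[<p>As a minimal sketch of the group-level idea in the setting of the question: suppose each earlier study reports an estimate and standard error for the same regression coefficient. A normal hierarchical model treats the studies' true coefficients as draws from N(mu, tau^2), and the predictive distribution for a new study's coefficient can then serve as an informative prior. For brevity, this sketch estimates mu and tau with the DerSimonian-Laird method-of-moments estimator, a swapped-in shortcut rather than the full Bayesian treatment in Bayesian Data Analysis, and the published numbers below are made up.</p>

```python
import numpy as np

# Hypothetical published results for one coefficient: estimates and standard errors.
est = np.array([0.30, 0.45, 0.20, 0.38, 0.52])
se = np.array([0.10, 0.15, 0.08, 0.12, 0.20])


def dersimonian_laird(y, s):
    """Random-effects meta-analysis; returns (mu_hat, se_of_mu_hat, tau_hat)."""
    w = 1 / s**2
    mu_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fixed) ** 2)           # Cochran's Q statistic
    df = len(y) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_star = 1 / (s**2 + tau2)                    # random-effects weights
    mu = np.sum(w_star * y) / np.sum(w_star)
    return mu, np.sqrt(1 / np.sum(w_star)), np.sqrt(tau2)


mu, se_mu, tau = dersimonian_laird(est, se)
# Predictive prior for the coefficient in a new study: N(mu, tau^2 + se_mu^2).
prior_sd = np.sqrt(tau**2 + se_mu**2)
print(f"prior: normal(mean={mu:.3f}, sd={prior_sd:.3f})")
```

<p>The resulting normal distribution could be used as the prior for that coefficient in a new regression. The main caveat, in line with the point about parameterization above, is that the coefficient has to mean the same thing across studies (same predictors, scaling, and outcome) for this pooling to make sense.</p>]]>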
    </content>
</entry>

<entry>
    <title>Can pseudo-R-squareds from logistic regressions be compared and used as a measure of fit?</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/can_pseudo-r-sq.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2642</id>

    <published>2009-11-03T14:18:24Z</published>
    <updated>2009-11-01T09:28:16Z</updated>

    <summary>Jay Kaufman writes:...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Jay Kaufman writes:</p>]]>
        <![CDATA[<blockquote><a href="http://www.nytimes.com/2009/08/17/health/research/17hepatitis.html?_r=1&scp=1&sq=Hepatitis%20C&st=cse">This article</a> by Nicholas Wade, "Genes Tied to Gap in Treatment of Hepatitis C," notes that 55% of European Americans respond favorably to standard hepatitis C treatment, but only 25% of African-Americans.  The authors of the <a href="http://www.nature.com/nature/journal/vaop/ncurrent/full/nature08309.html">new article</a> in Nature assert that this is due in part to an allele they discovered, which is more common in European Americans (based on a survey of Duke University students!).  The authors state that 58% of the ethnic difference is due to the differential distribution of this one allele.   An interesting part of this story from the point of view of statisticians would be the basis for this 58% number.  In the <a href="http://www.nature.com/nature/journal/vaop/ncurrent/extref/nature08309-s1.pdf">supplementary material</a> available with the article, the authors explain their statistical analysis as follows:

<blockquote>Logistic regression does not have a direct equivalent to the R2 that is found in ordinary least squares (OLS) regression that represents the proportion of variance explained by the predictors. However, it is possible to use an analog, so-called a pseudo-R2, to mimic the OLS-R2 in evaluating the goodness-of-fit and the variability explained, which is the approach we used (ref 14). Using this approach we estimated that rs12979860 could account for 58% of the ethnicity-explained variability by estimating the difference between the expected variability if the IL28B SNP does not account for the variability explained by ethnicity at all, and the observed variability explained by both ethnicity and rs12979860.</blockquote>

<p>While these details are somewhat vague, I trust that you will join me in finding this very suspicious. </p>

<p>The pseudo-R2 simply compares the log-likelihood from the null model (only an intercept) to the log-likelihood from the full model (all covariates included).  I would call neither the R2 nor the pseudo-R2 a measure of "goodness of fit", but at least the R2 in a linear model does mean something straightforward.  The pseudo-R2 in a logistic model, however, seems to me to have no straightforward interpretation at all, and I was under the impression that no serious statistician uses this statistic.</p>

<p>The authors wrote that they used this statistic to ascertain that the exposure (allele) "could account for 58% of the ethnicity-explained variability" and yet the pseudo-R2 does not measure variability at all.  They claim to have come up with the 58% number by "estimating the difference between the expected variability if the IL28B SNP does not account for the variability explained by ethnicity at all, and the observed variability explained by both ethnicity and rs12979860."  I am not sure I quite follow that, but it sounds like they are computing the pseudo-R2 statistic twice, once for a model that contains ethnicity and the allele of interest, and once for the model that contains only ethnicity.  Perhaps one of the resulting numbers is 58% as big as the other, or something like that.  If this is indeed what they did, I see no logical connection between this analysis and their claim that 58% of the ethnic disparity is due to the differential distribution of the allele.  I have to assume that Nature has careful statistical review, but this doesn't make a lot of sense to me, based on what I can glean from this description.</blockquote></p>

<p>My reply:  I've never used pseudo-R-squared myself, but I can't speak for the general population of "serious statisticians" here.  I know that a lot of statisticians don't like regular R-squared, but I find it helpful sometimes (see graphs on page 42 of ARM) and even wrote <a href="http://www.stat.columbia.edu/~gelman/research/published/rsquared.pdf">a research article</a> (with Iain Pardoe) on the topic.  So I'd be wary about slamming pseudo-R-squared in general terms.</p>

<p>I also don't know enough about genetics to try to interpret the 58%.  But, yes, my guess is that this difference isn't really 58% of something.  Maybe it makes sense as some approximation, though.  My recommendation in this sort of situation is to forget about log-likelihoods and R-squares and just attack the comparison more directly, perhaps through an ROC curve or some similar approach.</p>]]>
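<![CDATA[<p>For readers who want to see the quantities under discussion, here is a minimal numpy sketch on simulated data. It fits two logistic regressions by Newton-Raphson (ethnicity only, and ethnicity plus allele), computes McFadden's pseudo-R-squared, 1 - log L(model) / log L(intercept only), for each, and then makes the more direct comparison suggested above using the area under the ROC curve. The data-generating numbers are invented, and McFadden's version is an assumption: it matches the description in the letter, but the supplementary material does not say which pseudo-R-squared was used.</p>

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical data: a binary group indicator, an allele more common in one
# group, and a treatment response that depends on the allele only.
group = rng.random(n) < 0.5
allele = rng.random(n) < np.where(group, 0.7, 0.3)
logit = -0.5 + 1.5 * allele
response = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)


def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def log_lik(X, y, beta):
    """Bernoulli log-likelihood at coefficients beta."""
    p = 1 / (1 + np.exp(-X @ beta))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))


def auc(scores, y):
    """Area under the ROC curve via the Mann-Whitney rank-sum statistic,
    using average ranks for tied scores."""
    _, inv, counts = np.unique(scores, return_inverse=True, return_counts=True)
    upper = np.cumsum(counts)             # highest rank in each tie group
    avg_rank = upper - (counts - 1) / 2   # average rank within each tie group
    ranks = avg_rank[inv]
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


ones = np.ones(n)
X_null = ones[:, None]
X_eth = np.column_stack([ones, group])
X_both = np.column_stack([ones, group, allele])

ll_null = log_lik(X_null, response, fit_logistic(X_null, response))
for name, X in [("ethnicity", X_eth), ("ethnicity + allele", X_both)]:
    beta = fit_logistic(X, response)
    r2 = 1 - log_lik(X, response, beta) / ll_null  # McFadden pseudo-R^2
    print(f"{name:20s} pseudo-R2={r2:.3f}  AUC={auc(X @ beta, response):.3f}")
```

<p>The pseudo-R-squared values do not translate directly into a share of the group difference. The AUC at least has a concrete interpretation: the probability that a randomly chosen responder gets a higher predicted score than a randomly chosen non-responder.</p>]]>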
    </content>
</entry>

<entry>
    <title>Your chance to help some people make money (maybe) and improve research (maybe)</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/your_chance_to.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2709</id>

    <published>2009-11-02T14:00:22Z</published>
    <updated>2009-11-01T09:22:23Z</updated>

    <summary>I received the following email: Hello, my name is Lauren Schmidt, and I recently graduated from the Brain &amp; Cognitive Sciences graduate program at MIT, where I spent a lot of time doing online research using human subjects. I also...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Miscellaneous Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>I received the following email:</p>

<blockquote>Hello, my name is Lauren Schmidt, and I recently graduated from the Brain & Cognitive Sciences graduate program at MIT, where I spent a lot of time doing online research using human subjects.  I also spent a lot of time being frustrated with the limitations of various existing online research tools.  So now I am co-founding a start-up, HeadLamp Research, with the goal of making online experimental design and data collection as fast, easy, powerful, and painless as can be.  But we need your help to come up with an online research tool that is as useful as possible!

<p>We have <a href="http://www.surveygizmo.com/s/170914/headlamp">a short survey</a> (5-10 min) on your research practices and needs, and we would really appreciate your input if you are interested in online data collection.</blockquote></p>

<p>I imagine they're planning to make money off this start-up, so I think it would be only fair for them to pay their survey participants.  Perhaps they could give them a share of the profits, if there are any?</p>]]>
        
    </content>
</entry>

<entry>
    <title>Reminder:  my talks in London today and tomorrow</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/reminder_my_tal.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2846</id>

    <published>2009-11-02T08:22:37Z</published>
    <updated>2009-11-01T09:25:24Z</updated>

    <summary>Why we (usually) don&apos;t worry about multiple comparisons Culture wars, voting and polarization: divisions and unities in modern American politics...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p><a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/10/multiple_compar_1.html">Why we (usually) don't worry about multiple comparisons</a></p>

<p><a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/10/culture_wars_vo.html">Culture wars, voting and polarization: divisions and unities in modern American politics</a></p>]]>
        
    </content>
</entry>

<entry>
    <title>Just to disillusion you about the reproducibility of textbook analyses</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/11/just_to_disillu.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2730</id>

    <published>2009-11-01T07:07:27Z</published>
    <updated>2009-11-01T09:21:02Z</updated>

    <summary>Guilherme Rocha writes:...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Bayesian Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Teaching" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Guilherme Rocha writes:  </p>]]>
        <![CDATA[<blockquote>I am using the 2nd edition of your "Bayesian Data Analysis" book in <a href="http://mypage.iu.edu/~gvrocha/teaching/S426_S626/">a Bayesian data analysis course</a> at Indiana University.

<p>I am preparing the "kidney cancer" data in section 2.8 for class.</p>

<p>I have a comment and a few questions.</p>

<p>1)	First, the comment.  I have noticed that, in the gd85to89.txt file, the state of IDAHO is spelled as IDADO. This may cause some difficulty if one is using the maps library in R (at first I thought Idaho was missing from this file).</p>

<p>2)	After fixing that, I tried to reproduce figures 2.7 and 2.8 but couldn't. I am wondering if I misunderstood what the data are in the raw data files.</p>

<p>I am guessing gd80to84.txt and gd85to89.txt are data regarding kidney cancer occurrence by county in 1980-1984 and 1985-1989 respectively. I tried to get the raw cancer rate as 10^5*(dc in gd80to84 + dc in gd85to89)/(pop in gd80to84 + pop in gd85to89).  I get the same pattern but not the same counties.</p>

<p>Here are the questions:</p>

<p>2a)	How are the cancer rates leading to figures 2.7 and 2.8 computed?</p>

<p>2b)	What is dcC? It seems to be approximately 10^5*(dc/pop) in each file. I am wondering if this is the age-corrected rate. If so, how is it computed?</p>

<p>2c)	What is aadc?</blockquote></p>
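<p>The pooled raw rate the correspondent describes, 10^5*(dc in gd80to84 + dc in gd85to89)/(pop in gd80to84 + pop in gd85to89), together with the IDADO-to-IDAHO fix, can be sketched in Python as follows. This is a minimal sketch, not the code used for the book: the column names dc and pop come from the email, but the dict layout keyed by (state, county) and the helper names are hypothetical, and the numbers are invented. As the reply notes, the book's analysis used adjusted data, so a raw calculation like this one should not be expected to reproduce figures 2.7 and 2.8 exactly.</p>

```python
# Sketch only: the record layout and helper names are hypothetical.
# Column names dc (death count) and pop follow the email; each period is a dict
# mapping (state, county) -> {"dc": ..., "pop": ...}.

STATE_FIXES = {"IDADO": "IDAHO"}  # misspelling reported in gd85to89.txt


def fix_state(name):
    """Return the corrected state name (unchanged if no fix is known)."""
    return STATE_FIXES.get(name, name)


def normalize(period):
    """Re-key a period's records using corrected state names."""
    return {(fix_state(state), county): row for (state, county), row in period.items()}


def pooled_rate(period_a, period_b):
    """Raw rate per 100,000, pooling counts and populations across both periods:
    1e5 * (dc_a + dc_b) / (pop_a + pop_b), computed for each county."""
    a, b = normalize(period_a), normalize(period_b)
    rates = {}
    for key in a.keys() & b.keys():  # only counties present in both files
        rates[key] = 1e5 * (a[key]["dc"] + b[key]["dc"]) / (a[key]["pop"] + b[key]["pop"])
    return rates


# Invented numbers, purely illustrative.
gd80to84 = {("IDAHO", "EXAMPLE"): {"dc": 12, "pop": 150_000}}
gd85to89 = {("IDADO", "EXAMPLE"): {"dc": 15, "pop": 160_000}}

rates = pooled_rate(gd80to84, gd85to89)
print(rates)
```

<p>Running both files through the state-name fix before matching matters: without it, the Idaho counties in gd85to89.txt would fail to match and be silently dropped. Pooling counts and populations, rather than averaging the two five-year rates, weights each period by its population.</p>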

<p>My reply:</p>

<p>This reminds me that I should document the data better.  The quick story is that I was analyzing adjusted data as if they were raw counts.  I made various reasonable adjustments that I have since forgotten.  When I have more time I will go back and clarify this.  Somewhere I have the computer files (probably S-plus code), so I should be able to do this!</p>]]>
    </content>
</entry>

<entry>
    <title>The new blog</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/10/the_new_blog.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2845</id>

    <published>2009-10-31T21:48:09Z</published>
    <updated>2009-10-31T21:48:03Z</updated>

    <summary>Here. Official opening is Monday, but you all get to see it earlier....</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Literature" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Teaching" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p><a href="http://scienceblogs.com/appliedstatistics/">Here</a>.  Official opening is Monday, but you all get to see it earlier.</p>]]>
        
    </content>
</entry>

<entry>
    <title>An undergraduate econ student asks about how to learn Bayesian statistics</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/10/an_undergraduat.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2827</id>

    <published>2009-10-31T19:08:42Z</published>
    <updated>2009-10-31T21:46:02Z</updated>

    <summary>Matt Stephenson writes:...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Bayesian Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Matt Stephenson writes:</p>]]>
        <![CDATA[<blockquote>I am currently an undergraduate student in economics . . . In order to better facilitate graduate study in Bayesianism next year, I'm arranging an independent study on Bayesian econometrics with a (frequentist) professor, and I am expected to set up the course.  Would you mind if I asked you a few questions about directions to take?

<p>1.a. If my likely use of Bayesian stats is econometrics, do you think it would be better to use an "Intro to Bayesian Econometrics" textbook (like Lancaster's) or a general introductory textbook like your "Bayesian Data Analysis"?</p>

<p>1.b. If you think the latter (and I'm inclined to think a general introduction would be better) would you recommend supplementing your textbook with any other book, perhaps one on BUGS, or another general textbook?  I'm not trying to put you in the difficult spot here of either listing the shortcomings of the textbook or sounding over-confident.  My question really comes from the fact that I won't have a teacher to bolster my understanding.</p>

<p>2. You've mentioned a few times that "to do statistical research [now]... you have to be a computer programmer."  Are there any programming languages I could supplement my "course" with to better prepare me? </p>

<p>3.  As a side note, I recently read your older review of Axelrod and the misapplication of the "prisoner's dilemma" model.  Is it a coincidence that such a critique came from a Bayesian?  It seems indeed that thinking hard about the model with which one is working, and said model's applicability to the subject, is one of the great strengths of applied Bayesianism.</blockquote></p>

<p>My reply:</p>

<p>1.  I like Tony Lancaster's book, and, of course, I like my own.  I think either book would be fine, if your advisor is comfortable with it.  If you want stuff on Bugs, I'd recommend looking into my book with Hill.</p>

<p>2.  R or Matlab for statistics, but Stata is what's popular in economics.  And then, of course, there are C and Python.  I think the best way to learn a language is to have to learn it; that is, to have a problem that you need to program to solve.  Fortunately (or unfortunately), there are a lot of problems like that.  Just about any applied statistics problem requires programming if you want to do it right.</p>

<p>3.  I did the prisoner's dilemma stuff as a senior in college, before I knew much about Bayesian statistics.  I certainly didn't perceive any connection at the time.  (And, back then, people didn't casually throw around the term "Bayesian" as a synonym for "rationality" the way they do today.)  But perhaps the same seriousness about models that inspired me to criticize Axelrod also made me sympathetic to Donald Rubin's approach to Bayesian statistics.</p>]]>
    </content>
</entry>

<entry>
    <title>Culture wars, voting and polarization:  my talk at the London School of Economics on Tuesday</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/10/culture_wars_vo.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2843</id>

    <published>2009-10-31T10:25:37Z</published>
    <updated>2009-10-31T10:35:49Z</updated>

    <summary>Tuesday 3 Nov, 4-5:30pm in Room R505, Department of Government, LSE. Culture wars, voting and polarization: divisions and unities in modern American politics On the night of the 2000 presidential election, Americans sat riveted in front of their televisions as...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p><a href="http://www2.lse.ac.uk/government/PSPE/ResearchSeminar.aspx">Tuesday 3 Nov, 4-5:30pm</a> in Room R505, Department of Government, LSE.</p>

<blockquote><a href="http://www.stat.columbia.edu/~gelman/presentations/redbluetalkubc.pdf">Culture wars, voting and polarization:  divisions and unities in modern American politics</a>
<p>
<p>
On the night of the 2000 presidential election, Americans sat riveted in front of their televisions as polling results divided the nation's map into red and blue states. Since then the color divide has become a symbol of a culture war that thrives on stereotypes--pickup-driving red-state Republicans who vote based on God, guns, and gays; and elitist, latte-sipping blue-state Democrats who are woefully out of touch with heartland values.  But how does this fit into other ideas about America being divided between the haves and the have-nots?  Is political polarization real, or is the real concern the perception of polarization?

<p>This work is joint with David Park, Boris Shor, Joseph Bafumi, Jeronimo Cortina, and Delia Baldassarri.<br />
</blockquote></p>

<p>(Here's <a href="http://www.youtube.com/watch?v=5JYiJwDob1w">a video version</a> of the talk, from when I gave it at Google.)</p>

<p>I'll be interested to see if people can explain to me the relevance (or lack thereof) of this work to politics in Britain and other countries.</p>

<p>P.S.  I'm speaking at LSE on Monday also (<a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/10/multiple_compar_1.html">on a different topic</a>).</p>

<p>P.P.S.  I'll be speaking in London a couple more times later in the academic year, each time on a different topic.</p>]]>
        
    </content>
</entry>

</feed>
