<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Statistical Modeling, Causal Inference, and Social Science</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/mlm/" />
    <link rel="self" type="application/atom+xml" href="http://www.stat.columbia.edu/~cook/movabletype/mlm/atom.xml" />
    <id>tag:www.stat.columbia.edu,2008-11-24:/~cook/movabletype/mlm/1</id>
    <updated>2009-07-03T22:28:39Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.25</generator>

<entry>
    <title>Confusing reliability with validity</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/this_note_by_st.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2546</id>

    <published>2009-07-03T21:11:55Z</published>
    <updated>2009-07-03T22:28:39Z</updated>

    <summary>This note by Steve Hsu on the history of the Wranglers (winners of a mathematics competition held each year from 1753-1909 at Cambridge University) reminded me of my experience in the U.S. math olympiad training program in high school. At...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Teaching" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p><a href="http://infoproc.blogspot.com/2009/07/wranglers.html">This note</a> by Steve Hsu on the history of the Wranglers (winners of a mathematics competition held each year from 1753 to 1909 at Cambridge University) reminded me of my experience in the U.S. math olympiad training program in high school.  At the time, it seemed obvious that we were clearly ordered by ability (with my position somewhere between 15th and 20th out of 24!).  In retrospect, I think there are a lot of tricks to solving and writing up solutions to "Olympiad problems," and I didn't know a lot of these tricks.</p>

<p>It was the usual paradox of measurement:  I was confusing reliability with validity, as they say in the psychometric literature.</p>]]>
        <![CDATA[<p>I<a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/12/could_we_publis.html">n retrospect</a>, it worked out well for me to learn (even if falsely) that there were 15 or 20 kids my age better than me in math.  This made me realize that a career as a "mathematician" (to the extent I understood what this meant, based on my experiences up to the age of 16) was not for me.  Given what I know now, I think I would've wanted to be a statistician even if I'd been the #1 kid at the Olympiad.  Luckily this didn't happen to me.</p>

<p><strong>And now for the most important part</strong></p>

<p>One of the tricks that I didn't know about in the math olympiad training program is to plod along without giving up.  Sometimes the direct approach works, solving a problem by eliminating all alternatives.  That's a "trick" that's useful in a lot of areas of academic life.  For years I've been trying to get this message out to students:  If you get stuck right away, don't just stare at your desk and give up.  Instead, work actively.  This is a point I made in chapter 19 of the ARM book (in particular, the graph on page 416).  My theory is that students at top universities have succeeded pretty well by being able to solve problems quickly; they haven't really needed to develop the tools to solve problems systematically by brute force, the way I like to do it.</p>

<p>I think they did try to explain this principle to us at the olympiad program (How to Solve It, and all that), but I didn't ever get the point, partly I think because the problems were so artificial that there only seemed to be a point to solving them if it could be done easily or through some clever trick.</p>]]>
    </content>
</entry>

<entry>
    <title>How does statistical analysis differ when analyzing the entire population rather than a sample?</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/how_does_statis.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2442</id>

    <published>2009-07-03T11:46:10Z</published>
    <updated>2009-07-03T11:50:13Z</updated>

    <summary>Daljit Dhadwal writes: On the Ask Metafilter site, someone asked the following: How does statistical analysis differ when analyzing the entire population rather than a sample? I need to do some statistical analysis on legal cases. I happen to have...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Daljit Dhadwal writes:</p>

<blockquote>On <a href="http://ask.metafilter.com/122797/Statistics-on-the-entire-population">the Ask Metafilter site</a>, someone asked the following:

<p>How does statistical analysis differ when analyzing the entire population rather than a sample? I need to do some statistical analysis on legal cases. I happen to have the entire population rather than a sample. I'm basically interested in the relationship between case outcomes and certain features (e.g., time, the appearance of certain words or phrases in the opinion, the presence or absence of certain issues). Should I do anything different than I would if I were using a sample? For example, is a p-value meaningful in this kind of case?</p></blockquote>

<p>My reply:</p>

<p>This is a question that comes up a lot.  For example, what if you're running a regression on the 50 states?  These aren't a sample from a larger number of states; they're the whole population.</p>

<p>To get back to the question at hand, it might be that you're thinking of these cases as a sample from a larger population that includes future cases as well.  Or, to put it another way, maybe you're interested in making predictions about future cases, in which case the relevant uncertainty comes from the year-to-year variation.  That's what we did when estimating the seats-votes curve:  we set up a hierarchical model with year-to-year variation estimated from a separate analysis.  (Original model is <a href="http://www.stat.columbia.edu/~gelman/research/published/electoral2.pdf">here</a>, later version is <a href="http://www.stat.columbia.edu/~gelman/research/published/unified2.pdf">here</a>.)</p>

<p>So, one way of framing the problem is to think of your "entire population" as a sample from a larger population, potentially including future cases.  Another frame is to think of there being an underlying probability model.  If you're trying to understand the factors that predict case outcomes, then the implicit full model includes unobserved factors (related to the notorious "error term") that contribute to the outcome.  If you set up a model including a probability distribution for these unobserved factors, standard errors will emerge.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Statistics on hiring statisticians</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/statistics_on_h.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2545</id>

    <published>2009-07-02T21:30:01Z</published>
    <updated>2009-07-02T21:36:55Z</updated>

    <summary>Via Business Insider:...</summary>
    <author>
        <name>Aleks Jakulin</name>
        <uri>http://stat.columbia.edu/~jakulin</uri>
    </author>
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Via <a href="http://www.businessinsider.com/where-businesses-are-hiring-for-tech-jobs-2009-7#statisticians-13">Business Insider</a>:</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="statisticians.jpg" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/statisticians.jpg" width="400" height="300" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span></p>]]>
        
    </content>
</entry>

<entry>
    <title>Arthur Jensen:  &quot;the possible indicators of g are of unlimited diversity . . .&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/the_arthur_jens.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2540</id>

    <published>2009-07-02T13:05:21Z</published>
    <updated>2009-07-02T14:06:56Z</updated>

    <summary>After finding the Howard Wainer interview, I looked up the entire series of Profiles in Research published by the Journal of Educational and Behavioral Statistics. I don&apos;t have much to say about most of these interviews: some of these people...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Miscellaneous Science" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Sociology" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>After finding the <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/he_shared_an_of.html">Howard Wainer interview</a>, I looked up the entire series of <a href="https://www.aera.net/publications/Default.aspx?menu_id=40&id=6198">Profiles in Research</a> published by the Journal of Educational and Behavioral Statistics.  I don't have much to say about most of these interviews:  some of these people I'd never heard of, and I don't really have much research overlap with the others.  Probably I have the most overlap with R. D. Bock, who's done a lot of work on multilevel modeling, but, for whatever reason, his stories didn't grab my interest.</p>

<p>But I was curious about the interview with Arthur Jensen.  I've never met him--he gave a talk at the Berkeley statistics department once when I was there, but for some reason I wasn't able to attend the talk.  But I've heard of him.  As the interviewers (Daniel Robinson and Howard Wainer) state:</p>]]>
        <![CDATA[<blockquote>Dr. Jensen has authored over 435 articles, books, and book chapters and is perhaps best known for his controversial 123-page article that appeared in the Harvard Educational Review in 1969. In the article, Dr. Jensen concluded that the differences between Whites and Blacks on IQ tests were attributable to inherent intellectual differences between the two races. In 1980, his Bias in Mental Testing book concluded that intelligence tests were not biased against Blacks, resulting in even more controversy.</blockquote>

<p><a href="http://jeb.sagepub.com/cgi/reprint/31/3/327.pdf?ijkey=jrQgD7VSKj3RU&keytype=ref&siteid=spjeb">The interview</a> had some interesting bits.  First, something on education and individual differences:</p>

<blockquote>The problems of schooling illustrate the first and second laws of individual differences. I call them laws because they are demonstrated without exception both in the psychological laboratory and in "real life." Unfortunately, they happen to contradict the popular faith in education as the "great leveler." The first law is that individual differences in learning and performance increase as task complexity increases. The second law is that individual differences in performance increase with continuing practice and experience, unless the particular task imposes an artificially low ceiling on proficiency.

<p>One notable consequence of these laws is that successful attempts to raise performance by improving methods and amounts of instruction raises the overall mean of the treated group but at the same time widens the distribution of individual differences. The very same effect also applies to group differences. A benefit of raising the overall educational level of the whole population is that it moves a greater proportion of the population above the threshold levels of knowledge and skill required for gainful employment. The downside is the resulting increase in individual and group differences.</p></blockquote>

<p>Then there are his ruminations on "g":</p>

<blockquote>The educated public today knows of Newton's law of gravitation, Darwin's natural selection, and Einstein's equivalence of mass and energy. They should also know about Spearman's g. Discovered in 1904, g is an essential concept for understanding variation in human abilities. . . . At the top of the factor hierarchy is g, the most general factor. Every cognitive ability that shows individual differences is loaded on the g factor. Tests differ in their g loadings, but their g loadings are not related to any particular knowledge or skills assessed by the various tests. So the possible indicators of g are of unlimited diversity. . . . 

<p>It is also important to understand what g is not. It is not a mixture or average of a number of diverse tests representing many different abilities. Rather, it is a distillate, representing the single factor that all different manifestations of cognition have in common. In fact, g is not really an ability at all. It does not reflect the tests' contents per se, or any particular kind of performance. <em>It defies description in psychological terms.</em> [italics added] Actually, it reflects some properties of the brain that cause diverse forms of cognitive activity to be positively correlated, not only in psychometric tests but in all of life's mental demands. IQ scores are an attempt to estimate g. But because IQ is just a vehicle for g, it inevitably reflects other broad factors as well, such as verbal, numerical, and spatial abilities, and the specific properties of the particular IQ test. Yet, g is the sine qua non of all IQ tests.</p></blockquote>

<p>A bit over the top, no?  I mean, I'm a political scientist and I think party id and ideology are important, and I even talk about our conceptual model in which each person has a position on a left-right scale and can get shifted by valence issues etc etc--but we know not to take that stuff too seriously!</p>

<p>The most interesting part of the interview, from a historical perspective, was Jensen's discussion of the reaction to his papers.  He tells a story in which he was giving a professional lecture in Chicago that was disrupted by 100 protesters who had infiltrated themselves into the crowd--and then he was rescued from the demonstration by a group of 10 police officers who had infiltrated themselves into the infiltrators.</p>]]>
    </content>
</entry>

<entry>
    <title>More on the median voter</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/more_on_the_med.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2542</id>

    <published>2009-07-01T20:03:43Z</published>
    <updated>2009-07-01T19:58:59Z</updated>

    <summary>A correspondent read my recent note on the limited influence of the median voter and writes: My understanding of median voter theorem is that each election has its own median voter, and that the median voter&apos;s influence is limited to...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>A correspondent read my recent note on <a href="http://www.fivethirtyeight.com/2009/06/limited-influence-of-median-voter.html">the limited influence of the median voter</a> and writes:</p>

<blockquote>My understanding of median voter theorem is that each election has its own median voter, and that the median voter's influence is limited to the outcome of that election only. I don't understand, then, why the graph in your post is evidence that the median voter has little influence. It seems to me that there are two elections being considered in that graph, with two different median voters. The graph appears to consider "moderation" to be having a moderate voting record in Congress, but it seems to me that the median voter in Congress is likely quite different from the median voter in any particular Congressional district. The power of the median voter in Congress, it seems to me, is to affect the outcome of Congressional votes, not to improve his own chances for re-election, which are determined by his proximity to the median voter in his district. Thus, I'm not sure why we would expect moderation, as measured by the median Congressional voter, to translate into electoral success, which we would expect to be determined by the median district voter.</blockquote>

<p>My reply:</p>]]>
        <![CDATA[<p>Yes, there are two medians:  the median congressmember (or maybe the 60th-most-liberal senator), and the median voter in any congressional district or state.</p>

<p>I definitely agree with your point about the median congressmember.  As I wrote in the blog entry you cited, "Certainly the median congressmember is important: by definition, it's that marginal vote you need to get a majority. But where do the median congressmember's positions come from?"</p>

<p>What our graph showed was that it's not as important as you might think for a congressmember to be near the median voter in his or her congressional district.  This was the point that I was focusing on, because this was the point being made by various pundits:  Ben Nelson can't be too liberal because he's representing the people of Nebraska; or, Many Democrats in Congress represent moderate-to-conservative districts, so therefore they can't be too liberal; or, There's no way Olympia Snowe can get away with voting against Obama all the time, given that Maine is a strongly Democratic state; etc.  These arguments have some force--Ben Nelson, Olympia Snowe, etc., certainly could lose their seats--but the evidence shows that the benefits from moderation aren't huge.</p>]]>
    </content>
</entry>

<entry>
    <title>Should Mark Sanford resign?</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/should_mark_san.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2543</id>

    <published>2009-07-01T03:20:14Z</published>
    <updated>2009-07-01T03:36:31Z</updated>

    <summary>At our sister blog, Tom Schaller says no: Is Sanford a cad for bolting his family on Father&apos;s Day weekend? Of course, but that is a private, moral failing, rather than a failure of public duty. . . . I...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Statistical graphics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>At our sister blog, Tom Schaller says <a href="http://www.fivethirtyeight.com/2009/06/should-sanford-resign.html">no</a>:</p>

<blockquote>Is Sanford a cad for bolting his family on Father's Day weekend? Of course, but that is a private, moral failing, rather than a failure of public duty. . . .
<p>I [Schaller] oppose most of what Mr. Sanford stands for politically. His showy rejection of federal stimulus money targeted for his state was a crass publicity stunt designed to garner national attention for Mr. Sanford at the expense of his constituents, many of whom are struggling economically. . . . Should Mr. Sanford's ambitions founder on the shoals of a personal scandal, however, yet another opportunity will be lost to establish the long-overdue separation between private comportment and public service. So here's hoping he doesn't resign or, if he does, it is a matter of personal choice rather than him bowing to political pressure.</p></blockquote>

<p>I see where Schaller is coming from.  Lots of people have complicated personal lives, and it's not clear at all that these difficulties have much if anything to do with governing.  But I don't know if I agree with him on the wall of separation between private comportment and public service.</p>

<p>Consider the Sanford case.  Schaller's a Democrat, so he can evaluate Sanford on his policies.  But if Schaller were a Republican, he might very well want Sanford out of there because he tarnishes the brand, makes the party a laughingstock, etc.  Also makes it harder for Sanford to convincingly follow a "family values" agenda which Schaller (if he were a Republican) might want.  These are legitimate concerns for a Republican to have.  Even if you don't think Sanford's personal indiscretions are important, you might want him gone and replaced by a more effective Republican.  Just as, from the other direction, a Democrat would've preferred a zipped-fly version of Bill Clinton.</p>]]>
        <![CDATA[<p>But the first thing I noticed in Schaller's otherwise excellent post was the ugly pie charts.  Boy are they ugly.  Damn!  Some quick points:<br />
- The wedges aren't labeled directly.  Instead, the reader has to go back and forth, back and forth, between the chart and the legend.<br />
- The color schemes are a mess.  The top graph goes from blue to purple to yellow to green??<br />
- The responses are ordered, and the pie obscures this by being circular.  For example, in the top graph, the natural order is More, Same, Less (with Don't Know as a separate category); in the second graph, Yes, Not Sure, No.<br />
- The goofy orientation of the second graph makes it hard to see that the blue area ("Yes") is larger than the red area ("No").<br />
- On the plus side, the charts are reasonably sized (not too large, not too small), have clear titles, are unambiguously labeled, and are not tilted or 3-D (thus, areas actually do represent proportions).</p>

<p>These aren't hard-and-fast rules.  The real point is that it's hard for me to just look at the pie charts and see what's going on.  There are too many colors, legends, numbers, etc., floating around.  When all is said and done, I guess the charts aren't horrible, but they're the graphical equivalent of meandering, hard-to-follow paragraphs.</p>]]>
    </content>
</entry>

<entry>
    <title>Visualizing correlations circularly</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/visualizing_tab.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2372</id>

    <published>2009-06-30T14:29:34Z</published>
    <updated>2009-06-30T14:37:26Z</updated>

    <summary>Some time ago FlowingData had an article on visualizing tables - which really is about visualizing spreadsheets in terms of correlations between columns. While Circos generates very colorful displays: Today I was impressed by a much cleaner and Tuftier variant...</summary>
    <author>
        <name>Aleks Jakulin</name>
        <uri>http://stat.columbia.edu/~jakulin</uri>
    </author>
    
        <category term="Statistical graphics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Some time ago FlowingData had an article on <a href="http://flowingdata.com/2009/04/21/visual-representation-of-tabular-information-how-to-fix-the-uncommunicative-table/">visualizing tables</a> - which really is about visualizing spreadsheets in terms of correlations between columns. While <a href="http://srs.bcgsc.bc.ca/circos/">Circos</a> generates very colorful displays:</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="circos.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/circos.png" width="379" height="368" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span></p>

<p>Today I was impressed by a much cleaner and Tuftier variant on the theme by Mike Bostock, called <a href="http://cs.stanford.edu/people/mbostock/iv/dependency-tree.html">Dependency Tree</a>:</p>

<p><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><a href="http://cs.stanford.edu/people/mbostock/iv/dependency-tree.html"><img alt="dependency-tree.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/dependency-tree.png" width="450" height="343" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></a></span></p>

<p>Click on the link; it's interactive. <a href="http://jheer.org/">Jeff Heer</a> and Bostock also have a new JavaScript visualization toolkit out, <a href="http://vis.stanford.edu/protovis/">ProtoVis</a>, which simplifies the creation of such stuff. The computer scientist in me finds this development very cool. But I still like my <a href="http://www.stat.columbia.edu/~jakulin/Politics/matrix.png">correlation matrices</a>.</p>]]>
        
    </content>
</entry>

<entry>
    <title>&quot;A paved United States in our day&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/good_roads_ever_1.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2538</id>

    <published>2009-06-30T12:07:38Z</published>
    <updated>2009-06-30T12:17:56Z</updated>

    <summary>Sometimes you hear discussion of how the red states get more from the government than they pay in taxes while the blue states get less and pay more. This is slightly misleading because the blue states are richer and rich...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Sometimes you hear discussion of how the red states get more from the government than they pay in taxes while the blue states get less and pay more.  This is slightly misleading because the blue states are richer and rich people pay a higher rate of income tax, but it does raise the interesting question of the regionally distributive effects of national taxing and spending policies.</p>

<p><a href="http://www.stat.columbia.edu/~cook/movabletype/mlm/6a00d8341c6d6753ef010536249841970b-800wi.jpg"><span class="mt-enclosure mt-enclosure-image" style="display: inline;"><img alt="minimap.jpg" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/minimap.jpg" width="400" height="274" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /></span></a></p>

<p>For some perspective on where this is coming from:  In our office is a map from 1924 titled "Good Roads Everywhere" that shows a proposed system of highways spanning the country, "to be built and forever maintained by the United States Government." The map, made by the National Highways Association, also includes the following explanation for the proposed funding system:  "Such a system of National Highways will be paid for out of general taxation.  The 9 rich densely populated northeastern States will pay over 50 per cent of the cost. They can afford to, as they will gain the most.  Over 40 per cent will be paid for by the great wealthy cities of the Nation. . . . The farming regions of the West, Mississippi Valley, Southwest and South will pay less than 10 per cent of the cost and get 90 per cent of the mileage." Beyond its quaint slogans ("A paved United States in our day") and ideas that time has passed by ("Highway airports"), the map gives a sense of the potential for federal taxing and spending to transfer money between states and regions.</p>

<p>P.S.  Yes, I posted this last year, but without the pretty map image (click on it for higher resolution, which unfortunately still isn't quite good enough to make out the text).</p>]]>
        
    </content>
</entry>

<entry>
    <title>He shared an office with an assortment of mops and brooms</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/he_shared_an_of.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2539</id>

    <published>2009-06-29T20:19:32Z</published>
    <updated>2009-06-29T20:27:30Z</updated>

    <summary>The Howard Wainer story. One of the fun parts is this story from his days as an assistant professor:...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Teaching" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p><a href="http://www.stat.columbia.edu/~gelman/stuff_for_blog/wainer.pdf">The Howard Wainer story</a>.</p>

<p>One of the fun parts is this story from his days as an assistant professor:</p>]]>
        <![CDATA[<blockquote>Soon after I [Wainer] arrived, a parade of students and faculty came to my door asking for help. . . . It wasn't long before every spare minute was used up doing analyses for others. I felt useful, but a bit overwhelmed. About mid-year I was back in Princeton having lunch with Harold [Gulliksen], and when he asked about my research, I grimaced and told him that there was no time. He asked what was taking it all up, and I explained. His advice was sage and practical. He told me that I should remember that my goal was not to help the students get their projects done, but rather help them learn something. He suggested a 4-step solution:
<p>1. Ask all who come for a consultation to prepare first a one-paragraph description of their problem and give it to me a day or two in advance, so I might be able to think about it (this alone cut back on the line by 30-50%).</p>

<p>2. Prepare an annotated bibliography.</p>

<p>3. Check off the appropriate reading on the bibliography and give that to the student.</p>

<p>4. Only if I didn't know an appropriate reading should I meet face-to-face with the student.</p>

<p>I [Wainer] followed this advice and found that, once students realized that they would have to do something themselves, the torrent of help-seekers shrank to a trickle.</p></blockquote>

<p>But the "prepare an annotated bibliography" step seems like a lot of work!  How did he find the time to do that?</p>]]>
    </content>
</entry>

<entry>
    <title>Casey Mulligan is consistent</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/casey_mulligan.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2537</id>

    <published>2009-06-29T04:12:23Z</published>
    <updated>2009-06-29T04:12:54Z</updated>

    <summary>Back in April, in an article about partisan perceptions of the economy, John Sides and I wrote:...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Economics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Back in April, in an article about partisan perceptions of the economy, John Sides and I <a href="http://www.fivethirtyeight.com/2009/04/red-and-blue-economies.html">wrote</a>:</p>]]>
        <![CDATA[<blockquote>Democrats feel better about the economy when Democrats are in power, and Republicans feel better when their party rules. What's striking, though, is how quickly these perceptions can change.

<p>For example, in mid-September, John McCain notoriously said, "The fundamentals of our economy are still strong." But then in early March, he said that the American people "want to know how we got into this ditch--the worst economic crisis since the great Depression." Based on these two statements, the slide into the ditch apparently occurred sometime between September 16 and March 3.</p>

<p>Similarly, University of Chicago economist Casey Mulligan spent the end of 2008 arguing that the economy is just not that bad, but then changed course in March, writing that "the crash of 2008 did not bother me" but "the crash of 2009 is more worrisome . . . So far productivity has been good in this recession, but 2009's stock market could well see that changing."</p>

<p>It's no surprise that John McCain and Casey Mulligan's views on the economy differ from those of Rahm Emanuel and Paul Krugman, or for that matter Barack Obama, who just last week was beginning to see "glimmers of hope" in the economy. . . .</blockquote></p>

<p>I don't know about John McCain and Barack Obama, but I recently checked on Casey Mulligan, and I'm pleased to report that he does <em>not</em> seem to have shown a partisan tack in his statements about the economy.  For example, some recent posts:</p>

<blockquote><a href="http://caseymulligan.blogspot.com/2009/06/real-disposable-personal-income-per_26.html">Real Disposable Personal Income per Capita Higher than Ever</a>
The BEA reported that real disposable personal income was $2478 per person in May 2009. The only month in U.S. history higher than that was May 2008 ($2499). Based on the recent trends, I [Mulligan] expect that June 2009 (which is almost over) will have the highest real disposable personal income ever.</blockquote>

<blockquote><a href="http://caseymulligan.blogspot.com/2009/06/592-per-person.html">$592 per person</a>
Compared to a world in which real GDP remained at the (thusfar) all time high (achieved in 2008 Q2), the BEA's report this morning shows that through 2009 Q1 the U.S. economy had lost $181 billion (measured at 2008 Q4 prices).  $181 billion is equivalent to:
<blockquote>
    - $592 per person, which is equivalent to

<p>    - 4.6 days of GDP (that is, we are producing like we took 3 weeks of vacation per year, instead of two)</blockquote></blockquote></p>
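
<p>As a rough consistency check on the arithmetic in that last quote, here is a minimal Python sketch. It uses only the three figures Mulligan gives ($181 billion, $592 per person, 4.6 days of GDP) and backs out the population and annual GDP they imply. The variable names and the 365-day year are assumptions made for illustration, not anything taken from the BEA report.</p>

```python
# Back out what the three quoted figures imply about population and GDP.
# Nothing here is read from BEA data; the inputs are the numbers in the quote.
LOSS_DOLLARS = 181e9       # cumulative shortfall quoted in the post
LOSS_PER_PERSON = 592.0    # dollars per person, as quoted
LOSS_IN_GDP_DAYS = 4.6     # days of GDP, as quoted
DAYS_PER_YEAR = 365        # assumption: GDP spread evenly over calendar days

implied_population = LOSS_DOLLARS / LOSS_PER_PERSON
implied_daily_gdp = LOSS_DOLLARS / LOSS_IN_GDP_DAYS
implied_annual_gdp = implied_daily_gdp * DAYS_PER_YEAR

print(f"implied population: {implied_population / 1e6:.0f} million")
print(f"implied annual GDP: {implied_annual_gdp / 1e12:.1f} trillion dollars")
```

<p>The implied values (roughly 306 million people and a bit over $14 trillion a year) are in the right range for the U.S. in 2009, so the three figures are at least consistent with one another. The 4.6 days is also close to one five-day work week, which is where the "3 weeks of vacation instead of two" comparison comes from.</p>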

<p>Mulligan clearly has a partisan perspective, but his take on the economy--that things aren't going so badly--has been broadly consistent since the fall of 2008.  He does not seem to have changed this view or applied any partisan filters in response to the change in power in Washington.</p>

<p>I know nothing about macroeconomics--even less than I know about the EM algorithm (that's an inside joke; Xiao-Li can explain it to you)--and I am not trying in any way to agree or disagree with Mulligan's analyses (not that my position on this matter would mean anything, anyway).  I just wanted to follow up on my earlier offhand remark that had implied that Mulligan had changed course following the change in administration.</p>]]>
    </content>
</entry>

<entry>
    <title>A scary thought</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/a_scary_thought.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2388</id>

    <published>2009-06-29T03:51:54Z</published>
    <updated>2009-06-28T23:10:29Z</updated>

    <summary>A colleague and I were talking the other day about how much we pay our research assistants. It turns out that she pays much more. In fact, sometimes I don&apos;t get around to paying my research assistants at all, but...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Economics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>A colleague and I were talking the other day about how much we pay our research assistants.  It turns out that she pays much more.  In fact, sometimes I don't get around to paying my research assistants at all, but she pays hers a decent amount.</p>

<p>My colleague, who's an untenured professor, said that was understandable because she makes less money than I do, so she can better relate to the students' lifestyles.  That's a pretty scary thought--it should really go the other way, right?  I get paid more so I should be able to afford to be more generous.  But maybe she's right; if so, it's a sobering insight.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Some NIH-funded projects are less than earthshaking</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/some_nih-funded.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2535</id>

    <published>2009-06-27T21:59:17Z</published>
    <updated>2009-06-27T22:30:24Z</updated>

    <summary>One major impediment, scientists agree, is the grant system itself. It has become a sort of jobs program, a way to keep research laboratories going year after year . . . I was on an NIH panel a couple of...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Sociology" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p><a href="http://www.nytimes.com/2009/06/28/health/research/28cancer.html?_r=1&partner=rss&emc=rss">One major impediment, scientists agree, is the grant system itself. It has become a sort of jobs program, a way to keep research laboratories going year after year . . .</a></p>

<p>I was on an NIH panel a couple of years ago with about 25 other scientists, reviewing something like 90 grants.  It was pointless.  25 people is just too many to make a decision.  What happened was that there were 3 or 4 people who were experienced in the process, who ended up guiding the entire discussion.</p>

<p>The highlight--or, I should say, lowlight--was when we were reviewing a proposal involving the study of the carcinogenic effects of hookah (water pipe) smoking.  I asked if this was really such a big deal, and one of the panel members told me that smoking tobacco through a hookah is something like 10 times worse than smoking a cigarette.  If so, the public health consequences could be pretty serious, even if not so many people did it.  I said this sounded like a reasonable point to me.  Then this guy across the table from me spoke up and said that he knew somebody who was 80 years old, had been smoking with a hookah all his life, and was none the worse for it.  At this point, I blew up.  I couldn't believe that the "my elderly aunt smokes and she didn't get cancer" argument could be brought up at an NIH panel!</p>]]>
        
    </content>
</entry>

<entry>
    <title>Statistical Tests and Election Fraud</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/statistical_tes.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2534</id>

    <published>2009-06-27T21:00:38Z</published>
    <updated>2009-06-27T21:01:17Z</updated>

    <summary>My final thoughts on those Iran vote analyses:...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>My final thoughts on those Iran vote analyses:</p>]]>
        <![CDATA[<p>From Florida in 2000 and Ohio in 2004 to Mexico City in 2006 to Iran in 2009, tightly contested elections are often accompanied by claims of fraud or serious error: that is, that the election outcome does not match the intentions of the people who voted. Sometimes there is direct evidence of fraud: people voting multiple times, tampering with ballot boxes, etc., and often there is evidence of mistakes, including overvotes (such as a person choosing a candidate and also writing in his name), lost ballots, and technical problems with voting machines.</p>

<p>But the usual sort of evidence for major problems is a discrepancy between the overall election outcome and what was expected from polls or from extrapolation from other elections. A notorious example is Patrick Buchanan's votes on the "butterfly ballot" in Palm Beach in 2000, which were inconsistent with patterns in other Florida counties in that year.</p>

<p>For the Iran election, the natural step is to compare to previous election returns and look for large changes, as was done by political scientist Walter Mebane. Striking patterns found in such a comparison do not prove fraud but can be useful in giving people a sense of where to focus attention if they want to look further.</p>

<p>Scacco and Beber's analysis is based on the idea that, if there is election fraud, the cheaters are probably acting in a hurry and with various constraints on what numbers they can actually manipulate. As a result, the fake numbers might show some patterns that would be highly unlikely to be seen in tallies of real votes. Again, it is hard for such circumstantial evidence to be entirely convincing on its own, but the patterns they find can support particular theories of how the vote totals came to be.</p>
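
<p>To make the idea of "patterns that would be highly unlikely in real tallies" concrete, here is a minimal Python sketch of one such check: a chi-square test of whether the last digits of reported vote counts look uniform. This is an illustration under stated assumptions, not Scacco and Beber's code or data. The function name <code>last_digit_chi2</code> and both sets of tallies are hypothetical, and the premise that last digits of honest counts are roughly uniform is itself an assumption that holds best when the counts are large.</p>

```python
from collections import Counter

# 95th percentile of the chi-square distribution with 9 degrees of freedom.
CHI2_CRIT_DF9_05 = 16.919


def last_digit_chi2(vote_counts):
    """Chi-square statistic for uniformity of the last digits of vote_counts."""
    if not vote_counts:
        raise ValueError("need at least one vote count")
    digits = Counter(abs(int(v)) % 10 for v in vote_counts)
    expected = len(vote_counts) / 10  # uniform: each digit equally likely
    return sum((digits.get(d, 0) - expected) ** 2 / expected for d in range(10))


# Hypothetical tallies, invented for illustration only.
even_digits = list(range(1000, 1100))            # each last digit appears 10 times
heaped = [n * 10 + 7 for n in range(100, 200)]   # every count ends in 7

for name, counts in [("even_digits", even_digits), ("heaped", heaped)]:
    stat = last_digit_chi2(counts)
    flagged = stat > CHI2_CRIT_DF9_05
    print(f"{name}: chi2 = {stat:.1f}, flagged = {flagged}")
```

<p>A statistic above the critical value only flags a set of tallies for a closer look. It is circumstantial evidence of the kind described above and does not by itself show fraud.</p>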

<p>Another way to calibrate our understanding of such statistical tests is to apply them to a large number of actual elections to see where apparent anomalies appear. Are anomalies happening pretty much at random, as might be expected if one were simply trawling through the data looking for patterns, or do they actually coincide with elections known to be suspicious or fraudulent on other grounds?</p>

<p>Even if statistical tests cannot prove fraud, they can help the news media and observers on the ground to focus their inquiries. </p>]]>
    </content>
</entry>

<entry>
    <title>Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/multiple_imputa_4.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2533</id>

    <published>2009-06-27T02:51:03Z</published>
    <updated>2009-06-27T02:58:18Z</updated>

    <summary>Our article (by Yu-Sung, Jennifer, Masanao, and myself, and based also on work with Kobi, Grazia, and Peter Messeri) will be appearing in the Journal of Statistical Software, in a special issue on missing-data imputation. Here&apos;s the abstract: Our mi...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Statistical computing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p><a href="http://www.stat.columbia.edu/~gelman/research/published/mipaper.rev04.pdf">Our article</a> (by Yu-Sung, Jennifer, Masanao, and myself, and based also on work with Kobi, Grazia, and Peter Messeri) will be appearing in the Journal of Statistical Software, in a special issue on missing-data imputation.  Here's the abstract:</p>

<blockquote>Our mi package in R has several features that allow the user to get inside the imputation process and evaluate the reasonableness of the resulting models and imputations. These features include: flexible choice of predictors, models, and transformations for chained imputation models; binned residual plots for checking the fit of the conditional distributions used for imputation; and plots for comparing the distributions of observed and imputed data in one and two dimensions. In addition, we use Bayesian models and weakly informative prior distributions to construct more stable estimates of imputation models. Our goal is to have a demonstration package that (a) avoids many of the practical problems that arise with existing multivariate imputation programs, and (b) demonstrates state-of-the-art diagnostics that can be applied more generally and can be incorporated into the software of others.</blockquote>

<p>We've made lots of improvements since listing the package last year (<a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/12/new_multiple_im.html">here</a>).  There's still a lot more work to do, in many different directions (including multilevel models, nonignorable models, the self-cleaning oven, and making the program run faster in all sorts of ways), and we keep improving it.  But it's good to have something out there.</p>

<p>To actually get the R package, just open your R window, click on Packages, Install packages, and grab mi.</p>]]>
        
    </content>
</entry>

<entry>
    <title>&quot;These stories make the case far better than any statistics ever could&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/these_stories_m.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2532</id>

    <published>2009-06-26T20:43:20Z</published>
    <updated>2009-06-26T20:50:34Z</updated>

    <summary>Pinchas Lev writes:...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Decision Theory" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        <![CDATA[<p>Pinchas Lev writes:</p>]]>
        <![CDATA[<blockquote>After reading your blog and Krugman's article, and noticing the interest you have in the ongoing debate surrounding healthcare reform, I figured you might find the email I received from the President's website interesting.

<p>In particular, I [Pinchas] would like to highlight the sentence, six lines into the email, in which Biden says that "[these] stories make the case far better than any statistics ever could."  I find it troubling that policymakers who are advocating "evidence based medicine," trump anecdotal evidence over that which is anchored in statistics. </blockquote></p>

<p>Here's the email:</p>

<blockquote>---------- Forwarded message ----------
Date: Thu, 25 Jun 2009 20:06:43 -0400
From: Vice President Joe Biden &lt;info@barackobama.com&gt;
Subject: You've got to read these

<p>A few weeks ago, President Obama asked you to share your personal story about how the health care crisis has affected you and the ones you love. Hundreds of thousands of stories poured in from every corner of the country. The President and I have read through many of them ourselves -- and now I'm encouraging you to do so as well.</p>

<p>Read these powerful, personal stories from people in your area and around the country:</p>

<p>http://healthcare.barackobama.com/stories</p>

<p>And after you do, please forward this note on to as many people as you can.</p>

<p>For folks who don't yet understand why health care reform is such an urgent priority, these stories make the case far better than any statistics ever could.<br />
. . .</blockquote></p>

<p>The funny thing is, Biden's gotta be right, that for most people, the stories do make the case "better than any statistics ever could."  It's still a little disturbing to me to see this, though.</p>]]>
    </content>
</entry>

</feed>
