<?xml version="1.0" encoding="iso-8859-1"?> <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/"> 
<channel>
<title>Statistical Modeling, Causal Inference, and Social Science</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/mlm/</link>
<description></description>
<dc:language>en-us</dc:language>
<dc:creator>gelman@stat.columbia.edu</dc:creator>
<dc:rights>Copyright 2008</dc:rights>
<dc:date>2008-11-23T18:51:18-05:00</dc:date>
<admin:generatorAgent rdf:resource="http://www.movabletype.org/?v=3.34" />
<admin:errorReportsTo rdf:resource="mailto:gelman@stat.columbia.edu"/>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>

<item>
<title>Visualizing election polls</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/visualizing_ele.html</link>
<description><![CDATA[<p>A colleague points me to <a href="http://www.unews.utah.edu/p/?r=092908-3">these supremely ugly pie-like graphs</a> by Richard Riesenfeld and Geoff Draper.  On the other hand, who am I to say they're ugly?  I'm sympathetic to the goal of "exposing complex relationships that are not obvious by usual methods of statistical analysis."  And it's hard to argue with "Eighty-eight percent said they enjoyed using the software and 71 percent completed all the tasks without errors."  I've certainly never performed such an evaluation of my own graphical methods, instead relying, Tufte-like, on my introspective judgment.</p>]]></description>
<guid isPermaLink="false">2066@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p>A colleague points me to <a href="http://www.unews.utah.edu/p/?r=092908-3">these supremely ugly pie-like graphs</a> by Richard Riesenfeld and Geoff Draper.  On the other hand, who am I to say they're ugly?  I'm sympathetic to the goal of "exposing complex relationships that are not obvious by usual methods of statistical analysis."  And it's hard to argue with "Eighty-eight percent said they enjoyed using the software and 71 percent completed all the tasks without errors."  I've certainly never performed such an evaluation of my own graphical methods, instead relying, Tufte-like, on my introspective judgment.</p>]]></content:encoded>
<dc:subject>Statistical graphics</dc:subject>
<dc:date>2008-11-23T18:51:18-05:00</dc:date>
</item>
<item>
<title>The score</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/the_score.html</link>
<description><![CDATA[<p>Occasionally I post comments here on other people's books or articles, and sometimes I email the authors to get their feedback.  Here's the score:</p>

<p>Responded:</p>

<p>John Clute<br />
Richard Florida<br />
Malcolm Gladwell<br />
Sander Greenland<br />
Daniel Gross<br />
Mickey Kaus<br />
Paul Krugman<br />
Andrew Leonard<br />
John Lott<br />
Jay Nordlinger<br />
Andrew Oswald<br />
Ed Park<br />
Steve Sailer<br />
John Seabrook<br />
Nassim Taleb<br />
Josh Tenenbaum</p>

<p>Did not respond:</p>

<p><a href="http://redbluerichpoor.com/blog/?p=187">Robert Frank</a><br />
Satoshi Kanazawa<br />
<a href="http://redbluerichpoor.com/blog/?p=116">George Packer</a><br />
<a href="http://redbluerichpoor.com/blog/?p=187">Russ Alan Price</a><br />
<a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/06/but_viewed_in_r.html">David Runciman</a></p>

<p>I think I've missed a few here (in both categories).  Also, some people I'm still waiting to hear from, and some respond but not in a useful way.</p>

<p>P.S.  I just noticed:  all these people are male (and most are white)!  I'll have to diversify a bit!</p>]]></description>
<guid isPermaLink="false">2063@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p>Occasionally I post comments here on other people's books or articles, and sometimes I email the authors to get their feedback.  Here's the score:</p>

<p>Responded:</p>

<p>John Clute<br />
Richard Florida<br />
Malcolm Gladwell<br />
Sander Greenland<br />
Daniel Gross<br />
Mickey Kaus<br />
Paul Krugman<br />
Andrew Leonard<br />
John Lott<br />
Jay Nordlinger<br />
Andrew Oswald<br />
Ed Park<br />
Steve Sailer<br />
John Seabrook<br />
Nassim Taleb<br />
Josh Tenenbaum</p>

<p>Did not respond:</p>

<p><a href="http://redbluerichpoor.com/blog/?p=187">Robert Frank</a><br />
Satoshi Kanazawa<br />
<a href="http://redbluerichpoor.com/blog/?p=116">George Packer</a><br />
<a href="http://redbluerichpoor.com/blog/?p=187">Russ Alan Price</a><br />
<a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/06/but_viewed_in_r.html">David Runciman</a></p>

<p>I think I've missed a few here (in both categories).  Also, some people I'm still waiting to hear from, and some respond but not in a useful way.</p>

<p>P.S.  I just noticed:  all these people are male (and most are white)!  I'll have to diversify a bit!</p>]]></content:encoded>
<dc:subject>Sociology</dc:subject>
<dc:date>2008-11-23T18:24:21-05:00</dc:date>
</item>
<item>
<title>Political engagement on the web</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/political_engag.html</link>
<description><![CDATA[<p>The <a href="http://blog.compete.com">Compete Blog</a> (which posts a wealth of interesting data charts mined from monitoring web surfers) posted statistics about the <a href="http://blog.compete.com/2008/10/20/election-obama-mccain-state-engagement-colorado/">proportion of web surfers who visit political websites</a>:</p>

<p><img alt="webpolitics.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/webpolitics.png" width="506" height="348" /></p>

<p>Colorado, Connecticut and New Jersey are at the top. Colorado was a battleground state.</p>]]></description>
<guid isPermaLink="false">2062@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p>The <a href="http://blog.compete.com">Compete Blog</a> (which posts a wealth of interesting data charts mined from monitoring web surfers) posted statistics about the <a href="http://blog.compete.com/2008/10/20/election-obama-mccain-state-engagement-colorado/">proportion of web surfers who visit political websites</a>:</p>

<p><img alt="webpolitics.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/webpolitics.png" width="506" height="348" /></p>

<p>Colorado, Connecticut and New Jersey are at the top. Colorado was a battleground state.</p>]]></content:encoded>
<dc:subject>Political Science</dc:subject>
<dc:date>2008-11-23T17:59:36-05:00</dc:date>
</item>
<item>
<title>A question about the youth vote</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/a_question_abou_6.html</link>
<description><![CDATA[<p>Shivaji Sondhi writes:</p>

<blockquote>I had a question for you about the youth vote. What are its ethnic and red/blue composition? The reason I ask is that I was trying to integrate the apparently growing Democratic dominance in this segment with various other beliefs I have seen expressed, e.g

<p>a) that red states have larger fertility (affordable family formation or whatever)</p>

<p>b) that families have an impact on the political beliefs of children (more than educators, as educators  insist - at least at the college level, I haven't really seen a discussion of school teachers) which would then provide a mechanism for (a) to affect voting share to the right of the spectrum</p>

<p>c) that the minorities form a growing share of the young which would tilt the playing field to the left.</p></blockquote>

<p>My reply:</p>

<p>1.  I don't yet have raw survey data.  The <a href="http://www.cnn.com/ELECTION/2008/results/polls/#USP00p1">exit polls on the web</a> do break down the vote by age and race.  Among blacks, Obama won about the same among all age groups.  Among Hispanics, Obama did 8% better among the young than the old, and among whites, Obama did 14% better among the young than the old.</p>

<p>But . . . if you believe the exit polls (which I don't, completely), there was an interaction between age and race:  many more of the young voters were ethnic minorities.  Among blacks and Hispanics, there were three times as many under-30's as over-65's.  (By comparison, among whites, there were more old voters than young voters.)</p>

<p>So the age effect partly arose from lots of young ethnic minorities coming out to vote.</p>

<p>2.  People do tend to vote like their parents--children of Republicans are, on average, more likely to vote Republican--but cohort effects go on top of this.  The recent economy and George W. Bush's approval ratings aren't likely to make the Republican Party popular with young people--especially those who are ethnic minorities.  Any differences in birth rates between states are small compared to these big political swings, which are not just about Obama; see <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2006/06/party_id_and_pa.html">this graph</a> from 2006:</p>

<p><img alt="27-4.gif" src="http://www.stat.columbia.edu/~cook/movabletype/archives/27-4.gif" width="291" height="272" /></p>]]></description>
<guid isPermaLink="false">2064@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p>Shivaji Sondhi writes:</p>

<blockquote>I had a question for you about the youth vote. What are its ethnic and red/blue composition? The reason I ask is that I was trying to integrate the apparently growing Democratic dominance in this segment with various other beliefs I have seen expressed, e.g

<p>a) that red states have larger fertility (affordable family formation or whatever)</p>

<p>b) that families have an impact on the political beliefs of children (more than educators, as educators  insist - at least at the college level, I haven't really seen a discussion of school teachers) which would then provide a mechanism for (a) to affect voting share to the right of the spectrum</p>

<p>c) that the minorities form a growing share of the young which would tilt the playing field to the left.</p></blockquote>

<p>My reply:</p>

<p>1.  I don't yet have raw survey data.  The <a href="http://www.cnn.com/ELECTION/2008/results/polls/#USP00p1">exit polls on the web</a> do break down the vote by age and race.  Among blacks, Obama won about the same among all age groups.  Among Hispanics, Obama did 8% better among the young than the old, and among whites, Obama did 14% better among the young than the old.</p>

<p>But . . . if you believe the exit polls (which I don't, completely), there was an interaction between age and race:  many more of the young voters were ethnic minorities.  Among blacks and Hispanics, there were three times as many under-30's as over-65's.  (By comparison, among whites, there were more old voters than young voters.)</p>

<p>So the age effect partly arose from lots of young ethnic minorities coming out to vote.</p>

<p>2.  People do tend to vote like their parents--children of Republicans are, on average, more likely to vote Republican--but cohort effects go on top of this.  The recent economy and George W. Bush's approval ratings aren't likely to make the Republican Party popular with young people--especially those who are ethnic minorities.  Any differences in birth rates between states are small compared to these big political swings, which are not just about Obama; see <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2006/06/party_id_and_pa.html">this graph</a> from 2006:</p>

<p><img alt="27-4.gif" src="http://www.stat.columbia.edu/~cook/movabletype/archives/27-4.gif" width="291" height="272" /></p>]]></content:encoded>
<dc:subject>Political Science</dc:subject>
<dc:date>2008-11-22T17:59:23-05:00</dc:date>
</item>
<item>
<title>The Denominator, or, Is it an advantage to have a humble background?</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/the_denominator.html</link>
<description><![CDATA[<p><a href="http://www.newyorker.com/reporting/2008/11/10/081110fa_fact_gladwell">Malcolm Gladwell recounts</a> the story of Sidney Weinberg, a kid who grew up in the slums of Brooklyn around 1900 and rose to become the head of Goldman Sachs and well-connected rich guy extraordinaire.  Gladwell conjectures that Weinberg's success came not in spite of but because of his impoverished background:</p>

<blockquote>Why did [his] strategy work . . . it's hard to escape the conclusion that . . . there are times when being an outsider is precisely what makes you a good insider.</blockquote>

<p>Later, he continues:</p>

<blockquote>It’s one thing to argue that being an outsider can be strategically useful. But Andrew Carnegie went farther. He believed that poverty provided a better preparation for success than wealth did; that, at root, compensating for disadvantage was more useful, developmentally, than capitalizing on advantage.</blockquote>

<p>At some level, there's got to be some truth to this:  you learn things from the school of hard knocks that you'll never learn in the Ivy League, and so forth.  But . . . there are so many more poor people than rich people out there.  Isn't this just a story about a denominator?  Here's my hypothesis:</p>

<blockquote><p><strong>Pr (success | privileged background) &gt;&gt; Pr (success | humble background)</strong></p>

<p><strong># of people with privileged background &lt;&lt; # of people with humble background</strong></p></blockquote>

<p>Multiply each probability by the size of its group, and you might find that many extremely successful people have humble backgrounds, but that does not mean that being an outsider is actually an advantage.</p>

<p>Here's more from Gladwell's article:</p>]]></description>
<guid isPermaLink="false">2056@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p><a href="http://www.newyorker.com/reporting/2008/11/10/081110fa_fact_gladwell">Malcolm Gladwell recounts</a> the story of Sidney Weinberg, a kid who grew up in the slums of Brooklyn around 1900 and rose to become the head of Goldman Sachs and well-connected rich guy extraordinaire.  Gladwell conjectures that Weinberg's success came not in spite of but because of his impoverished background:</p>

<blockquote>Why did [his] strategy work . . . it's hard to escape the conclusion that . . . there are times when being an outsider is precisely what makes you a good insider.</blockquote>

<p>Later, he continues:</p>

<blockquote>It’s one thing to argue that being an outsider can be strategically useful. But Andrew Carnegie went farther. He believed that poverty provided a better preparation for success than wealth did; that, at root, compensating for disadvantage was more useful, developmentally, than capitalizing on advantage.</blockquote>

<p>At some level, there's got to be some truth to this:  you learn things from the school of hard knocks that you'll never learn in the Ivy League, and so forth.  But . . . there are so many more poor people than rich people out there.  Isn't this just a story about a denominator?  Here's my hypothesis:</p>

<blockquote><p><strong>Pr (success | privileged background) &gt;&gt; Pr (success | humble background)</strong></p>

<p><strong># of people with privileged background &lt;&lt; # of people with humble background</strong></p></blockquote>

<p>Multiply each probability by the size of its group, and you might find that many extremely successful people have humble backgrounds, but that does not mean that being an outsider is actually an advantage.</p>
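<p>To make the denominator argument concrete, here is a small sketch in Python. All four input numbers are hypothetical, chosen only so that the privileged group has a ten times higher success rate but is fifty times smaller; they are not estimates of anything.</p>

```python
# Hypothetical inputs, invented purely to illustrate the denominator effect.
p_success_given_privileged = 0.01   # assumed Pr(success | privileged background)
p_success_given_humble = 0.001      # assumed Pr(success | humble background)
n_privileged = 1_000_000            # assumed number of people with privileged background
n_humble = 50_000_000               # assumed number of people with humble background

# Expected number of successful people coming from each group.
successes_privileged = p_success_given_privileged * n_privileged
successes_humble = p_success_given_humble * n_humble

# Share of successful people who come from a humble background.
share_humble = successes_humble / (successes_privileged + successes_humble)
print(f"Pr(humble background | success) = {share_humble:.2f}")
```

<p>Under these made-up numbers most successful people have humble backgrounds, even though any one person's chance of success is ten times higher with a privileged background.</p>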

<p>Here's more from Gladwell's article:</p><p><a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/the_denominator.html" title="Continue Reading: The Denominator, or, Is it an advantage to have a humble background?">Continue reading The Denominator, or, Is it an advantage to have a humble background?...</a></p>
<p>Posted by Andrew at November 21, 2008  2:32 PM</p>]]></content:encoded>
<dc:subject>Miscellaneous Statistics</dc:subject>
<dc:date>2008-11-21T14:32:25-05:00</dc:date>
</item>
<item>
<title>Netflix Prize scoring function isn&apos;t Bayesian</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/netflix_prize_s.html</link>
<description><![CDATA[<p>The NY Times has a good article on the state of recommender systems: <a href="http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html?_r=1&pagewanted=all">"If You Liked This, You're Sure to Love That"</a>. This is a description of one of the problems:</p>

<blockquote>
But his progress had slowed to a crawl. [...] Bertoni says it’s partly because of “Napoleon Dynamite,” an indie comedy from 2004 that achieved cult status and went on to become extremely popular on Netflix. It is, Bertoni and others have discovered, maddeningly hard to determine how much people will like it. When Bertoni runs his algorithms on regular hits like “Lethal Weapon” or “Miss Congeniality” and tries to predict how any given Netflix user will rate them, he’s usually within eight-tenths of a star. But with films like “Napoleon Dynamite,” he’s off by an average of 1.2 stars.

<p>The reason, Bertoni says, is that “Napoleon Dynamite” is very weird and very polarizing. [...] It’s the type of quirky entertainment that tends to be either loved or despised.  <br />
</blockquote></p>

<p>And here is the stunning conclusion by fortunately anonymous computer scientists:<br />
<blockquote><br />
Some computer scientists think the “Napoleon Dynamite” problem exposes a serious weakness of computers. They cannot anticipate the eccentric ways that real people actually decide to take a chance on a movie.<br />
</blockquote></p>

<p>Actually, computers do quite a good job of modeling probability distributions for the more eccentric and unpredictable among us. Yes, the humble probability distribution, the centuries-old staple of statisticians, is enough to model eccentricity! The problem is that Netflix makes it hard to use sophisticated models: the scoring function is the antiquated, not just pre-Bayesian but actually pre-probabilistic, <i>root mean squared error</i>, or <a href="http://en.wikipedia.org/wiki/RMSE">RMSE</a>. For all practical purposes, the square root in RMSE is a monotonic transformation that won't affect the ranking of recommender models, so we can drop it outright.</p>

<p>If one looks at the distribution of ratings for Napoleon Dynamite on Amazon, it has high variance:<br />
<img alt="napoleondynamite.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/napoleondynamite.png" width="135" height="80" /></p>

<p>On the other hand, Lethal Weapon 4 ratings have lower variance:<br />
<img alt="lethalweapon4.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/lethalweapon4.png" width="133" height="85" /></p>

<p>If we use the average number of stars as the context-ignorant, unpersonalized predictor (which I've discussed <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2007/03/bayesian_sortin.html">before</a>), ND will give you a mean squared pain of 3.8, and LW4 a mean squared pain of 2.7. Now, your model might choose not to make recommendations for controversial movies, but this won't help you on the Netflix Prize: you're forced to make errors even when you know you're making them. <b>(R)MSE is pre-probabilistic: it gives no advantage to a probabilistic model that's aware of its own uncertainty.</b></p>]]></description>
<guid isPermaLink="false">2061@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p>The NY Times has a good article on the state of recommender systems: <a href="http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html?_r=1&pagewanted=all">"If You Liked This, You're Sure to Love That"</a>. This is a description of one of the problems:</p>

<blockquote>
But his progress had slowed to a crawl. [...] Bertoni says it’s partly because of “Napoleon Dynamite,” an indie comedy from 2004 that achieved cult status and went on to become extremely popular on Netflix. It is, Bertoni and others have discovered, maddeningly hard to determine how much people will like it. When Bertoni runs his algorithms on regular hits like “Lethal Weapon” or “Miss Congeniality” and tries to predict how any given Netflix user will rate them, he’s usually within eight-tenths of a star. But with films like “Napoleon Dynamite,” he’s off by an average of 1.2 stars.

<p>The reason, Bertoni says, is that “Napoleon Dynamite” is very weird and very polarizing. [...] It’s the type of quirky entertainment that tends to be either loved or despised.  <br />
</blockquote></p>

<p>And here is the stunning conclusion by fortunately anonymous computer scientists:<br />
<blockquote><br />
Some computer scientists think the “Napoleon Dynamite” problem exposes a serious weakness of computers. They cannot anticipate the eccentric ways that real people actually decide to take a chance on a movie.<br />
</blockquote></p>

<p>Actually, computers do quite a good job of modeling probability distributions for the more eccentric and unpredictable among us. Yes, the humble probability distribution, the centuries-old staple of statisticians, is enough to model eccentricity! The problem is that Netflix makes it hard to use sophisticated models: the scoring function is the antiquated, not just pre-Bayesian but actually pre-probabilistic, <i>root mean squared error</i>, or <a href="http://en.wikipedia.org/wiki/RMSE">RMSE</a>. For all practical purposes, the square root in RMSE is a monotonic transformation that won't affect the ranking of recommender models, so we can drop it outright.</p>
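<p>A minimal sketch of the two facts used in this post, written in Python with only the standard library. The rating lists and the two "models" are hypothetical, invented for illustration; they are not Netflix or Amazon data. The sketch shows that predicting every rating by the average gives a mean squared error equal to the variance of the ratings, and that ranking models by MSE or by RMSE gives the same order.</p>

```python
import math

def mean(values):
    return sum(values) / len(values)

def mse(predictions, ratings):
    """Mean squared error between paired predictions and observed ratings."""
    return mean([(p - r) ** 2 for p, r in zip(predictions, ratings)])

def rmse(predictions, ratings):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(predictions, ratings))

def variance(ratings):
    """Population variance of the ratings."""
    m = mean(ratings)
    return mean([(r - m) ** 2 for r in ratings])

def mse_of_mean_predictor(ratings):
    """MSE when every rating is predicted by the average rating."""
    m = mean(ratings)
    return mse([m] * len(ratings), ratings)

# Hypothetical star ratings, invented for illustration only.
polarizing = [1, 1, 1, 1, 5, 5, 5, 5]   # loved or despised
consensus = [3, 4, 4, 4, 5, 4, 3, 4]    # viewers roughly agree

for name, ratings in [("polarizing", polarizing), ("consensus", consensus)]:
    print(name, mse_of_mean_predictor(ratings), variance(ratings))

# Two hypothetical constant-prediction models scored on the consensus ratings:
# the ordering is the same under MSE and RMSE.
models = {"model_a": [4] * 8, "model_b": [3] * 8}
by_mse = sorted(models, key=lambda k: mse(models[k], consensus))
by_rmse = sorted(models, key=lambda k: rmse(models[k], consensus))
print(by_mse == by_rmse)
```

<p>The polarizing list forces a large squared error on any single-number prediction, which is the situation the scoring function cannot reward a model for recognizing.</p>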

<p>If one looks at the distribution of ratings for Napoleon Dynamite on Amazon, it has high variance:<br />
<img alt="napoleondynamite.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/napoleondynamite.png" width="135" height="80" /></p>

<p>On the other hand, Lethal Weapon 4 ratings have lower variance:<br />
<img alt="lethalweapon4.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/lethalweapon4.png" width="133" height="85" /></p>

<p>If we use the average number of stars as the context-ignorant, unpersonalized predictor (which I've discussed <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2007/03/bayesian_sortin.html">before</a>), ND will give you a mean squared pain of 3.8, and LW4 a mean squared pain of 2.7. Now, your model might choose not to make recommendations for controversial movies, but this won't help you on the Netflix Prize: you're forced to make errors even when you know you're making them. <b>(R)MSE is pre-probabilistic: it gives no advantage to a probabilistic model that's aware of its own uncertainty.</b></p>]]></content:encoded>
<dc:subject>Miscellaneous Statistics</dc:subject>
<dc:date>2008-11-21T13:27:39-05:00</dc:date>
</item>
<item>
<title>Still another 10 days to apply for an Earth Institute postdoc</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/still_another_1.html</link>
<description><![CDATA[<p>The Earth Institute is looking for applicants for its postdoctoral fellows program, and if you're doing statistics you can work with me.  It's a highly competitive program, and the deadline is 1 December, so apply now:</p>]]></description>
<guid isPermaLink="false">2060@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p>The Earth Institute is looking for applicants for its postdoctoral fellows program, and if you're doing statistics you can work with me.  It's a highly competitive program, and the deadline is 1 December, so apply now:</p><p><a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/still_another_1.html" title="Continue Reading: Still another 10 days to apply for an Earth Institute postdoc">Continue reading Still another 10 days to apply for an Earth Institute postdoc...</a></p>
<p>Posted by Andrew at November 20, 2008  9:42 PM</p>]]></content:encoded>
<dc:subject>Miscellaneous Statistics</dc:subject>
<dc:date>2008-11-20T21:42:27-05:00</dc:date>
</item>
<item>
<title>I don&apos;t know the answer to this one, or even if there is an answer</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/i_dont_know_the.html</link>
<description><![CDATA[<p>Thanh Nguyen writes:</p>

<blockquote>Could you tell me what is the difference between "uncertainty" and "ignorance" in this theory [of belief functions]? Some authors define "ignorance" as the "uncommitted belief" which is assigned to the whole frame of discernment, others define it as the difference between Plausibility and Belief (Pl() - Bel()). Some authors define value assigned by Belief function for elements as "uncertainty".</blockquote>

<p>I don't know.  All I know about belief functions is in <a href="http://www.stat.columbia.edu/~gelman/research/published/augie4.pdf">my article about the boxer, the wrestler, and the coin flip</a>, which is actually a writeup of something I did 20 years ago.  So no new thoughts, unfortunately.</p>]]></description>
<guid isPermaLink="false">2059@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p>Thanh Nguyen writes:</p>

<blockquote>Could you tell me what is the difference between "uncertainty" and "ignorance" in this theory [of belief functions]? Some authors define "ignorance" as the "uncommitted belief" which is assigned to the whole frame of discernment, others define it as the difference between Plausibility and Belief (Pl() - Bel()). Some authors define value assigned by Belief function for elements as "uncertainty".</blockquote>

<p>I don't know.  All I know about belief functions is in <a href="http://www.stat.columbia.edu/~gelman/research/published/augie4.pdf">my article about the boxer, the wrestler, and the coin flip</a>, which is actually a writeup of something I did 20 years ago.  So no new thoughts, unfortunately.</p>]]></content:encoded>
<dc:subject>Bayesian Statistics</dc:subject>
<dc:date>2008-11-20T21:38:47-05:00</dc:date>
</item>
<item>
<title>The Future of Data Analysis</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/the_future_of_bayes.html</link>
<description><![CDATA[<p><b>Introduction</b> A few days ago I was trying to explain the benefits of the Bayesian approach to a physicist who didn't care about the religion of truth and inference but primarily about solving a particular detection problem in particle physics. The probabilistic approach is rather standard and requires little persuasion, but the Bayesian aspect goes a level further than the probabilistic approach. So what is the benefit of the Bayesian approach? This posting will attempt to provide several reasons, from the most obvious to the least.</p>

<p><b>Frequentist Probability</b> Probability is easily justified as a very elegant way of dealing with uncertainty in cases and variables. But probability is not observed directly; it is inferred, as are the parameters, in contrast to observable predictors and outcomes. Frequentists state that probability should be measured through the gold standard of an infinite sequence of observations. They question the benefit of the Bayesian approach, pointing out that inferring a parameter the Bayesian way can yield worse accuracy than their favored method of "estimators", and that a bad prior can totally mess up inference. So why not use estimators if their asymptotic properties are good and the methodology is often simpler than Bayes?</p>

<p><b>Overfitting</b> Dividing the number of positive outcomes by the number of all outcomes to estimate the probability of a positive outcome is a very simple estimator, and it's easy to have enough data to calculate it. But most interesting questions are not as simple: the overall probability of getting cancer is not very interesting, and estimating the probability of getting cancer given smoking also requires removing the obvious effect of age. All these additional variables make a model more complicated and increase the number of parameters. Without care and attention the model can start hallucinating properties that aren't there. The problem is shown in the following picture:</p>

<p><img alt="why-bayes.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/why-bayes.png" width="432" height="403" /></p>

<p>If your modeling problem is in the green area, you can happily use estimators or maximum likelihood. If you're entering the yellow area and want to retain some generalization power, you need some sort of regularization, epitomized by L1 and L2 regularization, AIC, feature selection or support vector machines. So why shouldn't we just regularize?</p>

<p><b>Priors</b> Priors are how a Bayesian would perform regularization. After seeing a large number of regression problems from medical domains, we can safely assign a prior distribution to the size of a regression coefficient, as we have done in <a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1010421">our paper</a>. But then, what is the advantage over regularization? A prior is just a distribution of what the parameters should be over a particular category of problems! Isn't this a nice way to formulate regularization?</p>
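<p>One way to see the link between priors and regularization is a one-coefficient regression, sketched below in Python. This is an illustration under stated assumptions, not the model from the paper linked above: the data are made up, the noise is assumed normal with known variance, and the prior on the coefficient is assumed normal with mean zero. Under those assumptions the posterior mode coincides with the L2-regularized (ridge) estimate whose penalty is the noise variance divided by the prior variance.</p>

```python
def ridge_estimate(x, y, penalty):
    """Minimizer of sum((y_i - b * x_i)^2) + penalty * b^2 for one coefficient b."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + penalty)

def posterior_mode(x, y, noise_var, prior_var):
    """Posterior mode of b when y_i ~ Normal(b * x_i, noise_var)
    and b ~ Normal(0, prior_var) (both variances assumed known)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    posterior_precision = sxx / noise_var + 1.0 / prior_var
    return (sxy / noise_var) / posterior_precision

# Made-up data and variances, for illustration only.
x = [1.0, 2.0, 3.0]
y = [1.2, 1.9, 3.2]
noise_var = 1.0
prior_var = 0.5

unregularized = ridge_estimate(x, y, penalty=0.0)          # maximum likelihood
bayes = posterior_mode(x, y, noise_var, prior_var)
ridge = ridge_estimate(x, y, penalty=noise_var / prior_var)
print(unregularized, bayes, ridge)
```

<p>A tighter prior (smaller <code>prior_var</code>) corresponds to a larger penalty and pulls the estimate further toward zero.</p>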

<p><b>Model Uncertainty</b> The crux of Bayes is in using probability to represent the uncertainty about the Platonic - the model, its parameters, the probability. The Bayesian approach truly starts paying a dividend when there is uncertainty in models and parameters, when we have insufficient data to accurately fit the model. Even if an estimator could rather accurately match the predictions obtained by a posterior, the variance in the posterior allows us to understand when the model can't be fit. To the best of my knowledge, no other methodology can automatically detect such problems.</p>

<p>Another problem that Andrew identified is that there might be situations where the data don't match the model very well: even though there might be lots of data and a relatively simple model, it just doesn't fit, and the posterior will be vague.</p>

<p><b>Language of Modeling</b> WinBUGS is an example of a <i>higher-level modeling language</i>. Higher-level programming languages have been celebrated for improving programmers' productivity: they do not require the programmer to think in terms of individual statements such as SET or JMP but in terms of functions, procedures, and loops. Similarly, with Bayesian models we no longer have to think in terms of derivatives and fitting algorithms, but in terms of parameters that have distributions and are tied together in models. The Gibbs sampler is a general-purpose fitter and proto-compiler. Of course, it's not nearly as efficient as a hand-written optimizer, but in the future tools like the Hierarchical Bayes Compiler (<a href="http://www.cs.utah.edu/~hal/HBC/">HBC</a>) will create custom fitters given a higher-level specification of the model.</p>

<p><b>Summary</b> The primary value of the Bayesian paradigm is its formal elegance which allows automation of key problems: probability takes care of unpredictability in phenomena, priors help prevent overfitting by providing outside experience (AI practitioners would refer to it as background knowledge), the use of model uncertainty helps determine the reliability of predictions, and applied Bayesians are beginning to develop model compilers! </p>

<p><b>Future</b> The theory and practice of data analysis is currently all mixed up among a number of overlapping disciplines: (applied/mathematical/geo/medical/...)statistics, machine learning, data mining, (econo/psycho/bio)metrics, bioinformatics. All of them pursue the same problems with different but qualitatively similar tools, lacking the scale to build tools that would help them get to the next level. It is important to disentangle them. The future of data analysis should lie on these four fronts: <br />
<ol><br />
<li><b>reliable compilers and samplers</b> that will work with large databases and provide reliable sampling (see <a href="http://www.mrc-bsu.cam.ac.uk/bugs/">BUGS</a>, <a href="http://www.cs.utah.edu/~hal/HBC/">HBC</a> - empowered by the new generation of programming languages such as Haskell)</li></p>

<p><li><b>internet databases</b> intended to manage background knowledge and related data sets, where the same variable and the same phenomenon appear in multiple tables, allowing priors to be based on more than a single data set. Research should be presented as raw data in a standardized form, not as reports and aggregates that prevent others from building on top of the finished work. Too many people are working on the same problems but not sharing the data because of the unsolved issue of the rights of data collectors, who can gain credit only for publications (see <a href="http://www.freebase.com/">FreeBase</a>, <a href="http://archive.ics.uci.edu/ml/">Machine Learning Repository</a>, <a href="http://www.trendrr.com/">Trendrr</a>, <a href="http://swivel.com">Swivel</a>, <a href="http://lysander.sourceoecd.org/">OECD.Stat</a>)</li></p>

<p><li><b>visualization & modeling environments</b> that make it easier to clean and transform data, experiment with models, and present insights, reducing the amount of time needed to turn data into a model that can be communicated (see <a href="http://www.r-project.org/">R Project</a>, <a href="http://processing.org/">Processing</a>, <a href="http://www.gapminder.org/">Gapminder</a>)</li></p>

<p><li><b>interpretable modeling</b> is important to bring formal models closer to human intuition. It is still not clear how to measure the importance of a predictor for the outcome: the regression coefficient comes close, but is often confusing. With more powerful modeling frameworks, it is going to be possible to focus on this, worrying not about what one can fit but about model choice, model selection, model language, and visual language.</li><br />
</ol></p>

<p>What do you think? What links did we miss?</p>]]></description>
<guid isPermaLink="false">2058@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p><b>Introduction</b> A few days ago I was trying to explain the benefits of the Bayesian approach to a physicist who didn't care about the religion of truth and inference but cared primarily about solving a particular detection problem in particle physics. The probabilistic approach is rather standard and requires little persuasion, but the Bayesian aspect is a level further than the probabilistic approach. So what is the benefit of the Bayesian approach? This posting will attempt to provide several reasons, from the most obvious to the least.</p>

<p><b>Frequentist Probability</b> Probability is easily justified as a very elegant way of dealing with uncertainty in cases and variables. But probability is not observed directly; it is inferred, as are the parameters, in contrast to observable predictors and outcomes. Frequentists state that probability should be measured through the gold standard of an infinite sequence of observations, and they question the benefit of the Bayesian approach, pointing out that Bayesian inference for a parameter can yield worse accuracy than their favored method of "estimators" and that a bad prior can totally mess up inference. So why not use estimators if their asymptotic properties are good and the methodology is often simpler than Bayes?</p>

<p><b>Overfitting</b> Dividing the number of positive outcomes by the number of all outcomes to estimate the probability of the positive outcome is a very simple estimator: it's easy to have enough data to calculate this. But most interesting questions are not as simple: the overall probability of getting cancer is not very interesting, and estimating the probability of getting cancer given smoking also requires removing the obvious effect of age. All these additional variables make a model more complicated, and the number of parameters greater. Without care and attention the model can start hallucinating properties that aren't there. The problem is shown in the following picture:</p>

<p><img alt="why-bayes.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/why-bayes.png" width="432" height="403" /></p>

<p>If your modeling problem is in the green area, you can happily use estimators or maximum likelihood. If you're entering the yellow area and want to retain some generalization power, you need some sort of regularization, epitomized by L1 and L2 regularization, AIC, feature selection or support vector machines. So why shouldn't we just regularize?</p>

<p><b>Priors</b> Priors are how a Bayesian would perform regularization. After seeing a large number of regression problems from medical domains, we can safely assign a prior distribution to the size of a regression coefficient, as we have done in <a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1010421">our paper</a>. But then, what is the advantage over regularization? A prior is just a distribution of what the parameters should be over a particular category of problems! Isn't this a nice way to formulate regularization?</p>
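
<p>To make the "prior as regularization" idea concrete, here is a minimal Python sketch that compares the raw frequency estimator with the posterior mean under a Beta prior. The function names and the particular Beta(2, 8) prior are assumptions made up for illustration; they are not taken from the paper linked above.</p>

```python
def raw_frequency(successes, trials):
    """The simple estimator: positive outcomes divided by all outcomes."""
    return successes / trials


def posterior_mean(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a proportion under a Beta(prior_a, prior_b) prior.

    The prior acts like prior_a pseudo-successes and prior_b pseudo-failures
    added to the data, which is exactly a regularized frequency estimate.
    """
    return (successes + prior_a) / (trials + prior_a + prior_b)


if __name__ == "__main__":
    # Hypothetical prior: in "similar problems" the rate is around 0.2.
    for k, n in [(0, 3), (300, 1000)]:
        print(k, n, raw_frequency(k, n), posterior_mean(k, n, 2.0, 8.0))
```

<p>With only three observations the prior dominates and keeps the estimate away from an implausible zero; with a thousand observations the two estimates nearly coincide, which is the behavior one wants from a regularizer.</p>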

<p><b>Model Uncertainty</b> The crux of Bayes is in using probability to represent the uncertainty about the Platonic - the model, its parameters, the probability. The Bayesian approach truly starts paying a dividend when there is uncertainty in models and parameters, when we have insufficient data to accurately fit the model. Even if an estimator could rather accurately match the predictions obtained by a posterior, the variance in the posterior allows us to understand when the model can't be fit. To the best of my knowledge, no other methodology can automatically detect such problems.</p>
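
<p>A small Python sketch of the point about posterior variance: two data sets can give the same point estimate while the posterior spread shows that one of them is far too small to pin the parameter down. The Beta-binomial model, the uniform Beta(1, 1) prior, and the function name are assumptions chosen for illustration.</p>

```python
import math


def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Return (mean, standard deviation) of the Beta posterior for a proportion."""
    a = prior_a + successes
    b = prior_b + (trials - successes)
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1.0))
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Same observed proportion (one half), very different amounts of data.
    for k, n in [(2, 4), (500, 1000)]:
        mean, sd = beta_posterior(k, n)
        print(f"{k}/{n}: posterior mean {mean:.3f}, posterior sd {sd:.3f}")
```

<p>Both posteriors are centered at one half, but the posterior standard deviation for the four-observation data set is roughly an order of magnitude larger, which is the warning sign that a bare point estimate would hide.</p>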

<p>Another problem that Andrew identified is that there might be situations where the data don't match the model very well: even though there might be lots of data and a relatively simple model, it just doesn't fit, and the posterior will be vague.</p>

<p><b>Language of Modeling</b> WinBUGS is an example of a <i>higher-level modeling language</i>. Higher-level programming languages have been celebrated for improving programmers' productivity: they do not require the programmer to think in terms of individual statements such as SET or JMP but in terms of functions, procedures, and loops. Similarly, with Bayesian models we no longer have to think in terms of derivatives and fitting algorithms, but in terms of parameters that have distributions and are tied together in models. The Gibbs sampler is a general-purpose fitter and proto-compiler. Of course, it's not nearly as efficient as a hand-written optimizer, but in the future tools like the Hierarchical Bayes Compiler (<a href="http://www.cs.utah.edu/~hal/HBC/">HBC</a>) will create custom fitters given a higher-level specification of the model.</p>
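
<p>As a rough illustration of what such a general-purpose fitter does under the hood, here is a minimal Python sketch of a Gibbs sampler for a normal model with unknown mean and variance. The model, the priors (normal on the mean, inverse-gamma on the variance), the function name <code>gibbs_normal</code>, and all default values are assumptions picked for this example; this is not how WinBUGS or HBC is implemented.</p>

```python
import random
import statistics


def gibbs_normal(y, n_iter=2000, burn_in=500, seed=0,
                 mu0=0.0, tau0_sq=100.0, a0=1.0, b0=1.0):
    """Gibbs sampler for y_i ~ Normal(mu, sigma_sq).

    Assumed priors: mu ~ Normal(mu0, tau0_sq),
                    sigma_sq ~ InverseGamma(shape=a0, scale=b0).
    Returns a list of (mu, sigma_sq) draws kept after burn-in.
    """
    rng = random.Random(seed)
    n = len(y)
    y_sum = sum(y)
    # Start the chain at the sample mean and sample variance.
    mu, sigma_sq = statistics.mean(y), statistics.variance(y)
    draws = []
    for it in range(n_iter):
        # Step 1: draw mu from its normal full conditional given sigma_sq.
        precision = 1.0 / tau0_sq + n / sigma_sq
        cond_mean = (mu0 / tau0_sq + y_sum / sigma_sq) / precision
        mu = rng.gauss(cond_mean, precision ** -0.5)
        # Step 2: draw sigma_sq from its inverse-gamma full conditional given mu.
        shape = a0 + n / 2.0
        rate = b0 + 0.5 * sum((v - mu) ** 2 for v in y)
        # gammavariate takes (shape, scale); the reciprocal of a Gamma draw
        # with scale 1/rate is an InverseGamma(shape, rate) draw.
        sigma_sq = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if it >= burn_in:
            draws.append((mu, sigma_sq))
    return draws
```

<p>The two conditional draws had to be derived by hand for this one model. A modeling language lets the user write only the model statement (the data are normal, the mean and variance have these priors) and leaves the derivation and the sampling loop to the tool.</p>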

<p><b>Summary</b> The primary value of the Bayesian paradigm is its formal elegance which allows automation of key problems: probability takes care of unpredictability in phenomena, priors help prevent overfitting by providing outside experience (AI practitioners would refer to it as background knowledge), the use of model uncertainty helps determine the reliability of predictions, and applied Bayesians are beginning to develop model compilers! </p>

<p><b>Future</b> The theory and practice of data analysis is currently all mixed up among a number of overlapping disciplines: (applied/mathematical/geo/medical/...)statistics, machine learning, data mining, (econo/psycho/bio)metrics, bioinformatics. All of them pursue the same problems with different but qualitatively similar tools, lacking the scale to build tools that would help them get to the next level. It is important to disentangle them. The future of data analysis should lie on these four fronts: <br />
<ol><br />
<li><b>reliable compilers and samplers</b> that will work with large databases and provide reliable sampling (see <a href="http://www.mrc-bsu.cam.ac.uk/bugs/">BUGS</a>, <a href="http://www.cs.utah.edu/~hal/HBC/">HBC</a> - empowered by the new generation of programming languages such as Haskell)</li></p>

<p><li><b>internet databases</b> intended to manage background knowledge and related data sets, where the same variable and the same phenomenon appear in multiple tables, allowing priors to be based on more than a single data set. Research should be presented as raw data in a standardized form, not as reports and aggregates that prevent others from building on top of the finished work. Too many people are working on the same problems but not sharing the data because of the unsolved issue of the rights of data collectors, who can gain credit only for publications (see <a href="http://www.freebase.com/">FreeBase</a>, <a href="http://archive.ics.uci.edu/ml/">Machine Learning Repository</a>, <a href="http://www.trendrr.com/">Trendrr</a>, <a href="http://swivel.com">Swivel</a>, <a href="http://lysander.sourceoecd.org/">OECD.Stat</a>)</li></p>

<p><li><b>visualization & modeling environments</b> that make it easier to clean and transform data, experiment with models, and present insights, reducing the amount of time needed to turn data into a model that can be communicated (see <a href="http://www.r-project.org/">R Project</a>, <a href="http://processing.org/">Processing</a>, <a href="http://www.gapminder.org/">Gapminder</a>)</li></p>

<p><li><b>interpretable modeling</b> is important to bring formal models closer to human intuition. It is still not clear how to measure the importance of a predictor for the outcome: the regression coefficient comes close, but is often confusing. With more powerful modeling frameworks, it is going to be possible to focus on this, worrying not about what one can fit but about model choice, model selection, model language, and visual language.</li><br />
</ol></p>

<p>What do you think? What links did we miss?</p><a href="http://www.stat.columbia.edu/~cook/movabletype/mt-tb.cgi?__mode=view&entry_id=2058" onclick="OpenTrackback(this.href); return false">TrackBack (0)</a> | <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/the_future_of_bayes.html#comments" title="Comment on: The Future of Data Analysis">Comments (4)</a></p>]]></content:encoded>
<dc:subject>Bayesian Statistics</dc:subject>
<dc:date>2008-11-19T17:10:01-05:00</dc:date>
</item>
<item>
<title>Genetically-influenced traits running in families</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/geneticalyinflu.html</link>
<description><![CDATA[<p><a href="http://www.newyorker.com/reporting/2008/11/10/081110fa_fact_seabrook">John Seabrook writes</a>:</p>

<blockquote>There is also little consensus among researchers about what causes psychopathy. Considerable evidence, including several large-scale studies of twins, points toward a genetic component. Yet psychopaths are more likely to come from neglectful families than from loving, nurturing ones.</blockquote>

<p>I'm confused here.  If there's a big genetic component, wouldn't it stand to reason that parents of psychopaths are more likely to be neglectful and less likely to be loving and nurturing?  So why the "Yet" in the quote above?  Or is there something I'm missing?</p>

<p><strong>P.S. in response to commenters:</strong>  Yes, I agree that it's <em>possible</em> for psychopathy to be largely genetic without parents of psychopaths being much more likely to be neglectful.</p>

<p>What I didn't understand was Seabrook's implication that this would be <em>surprising</em>, the idea that if (a) a trait is genetically linked, and (b) a trait can be (somewhat) predicted by parental behavior, that the combination of (a) and (b) should be considered puzzling. By default, I'd think (a) and (b) would go together.</p>]]></description>
<guid isPermaLink="false">2057@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p><a href="http://www.newyorker.com/reporting/2008/11/10/081110fa_fact_seabrook">John Seabrook writes</a>:</p>

<blockquote>There is also little consensus among researchers about what causes psychopathy. Considerable evidence, including several large-scale studies of twins, points toward a genetic component. Yet psychopaths are more likely to come from neglectful families than from loving, nurturing ones.</blockquote>

<p>I'm confused here.  If there's a big genetic component, wouldn't it stand to reason that parents of psychopaths are more likely to be neglectful and less likely to be loving and nurturing?  So why the "Yet" in the quote above?  Or is there something I'm missing?</p>

<p><strong>P.S. in response to commenters:</strong>  Yes, I agree that it's <em>possible</em> for psychopathy to be largely genetic without parents of psychopaths being much more likely to be neglectful.</p>

<p>What I didn't understand was Seabrook's implication that this would be <em>surprising</em>, the idea that if (a) a trait is genetically linked, and (b) a trait can be (somewhat) predicted by parental behavior, that the combination of (a) and (b) should be considered puzzling. By default, I'd think (a) and (b) would go together.</p><a href="http://www.stat.columbia.edu/~cook/movabletype/mt-tb.cgi?__mode=view&entry_id=2057" onclick="OpenTrackback(this.href); return false">TrackBack (0)</a> | <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/geneticalyinflu.html#comments" title="Comment on: Genetically-influenced traits running in families">Comments (6)</a></p>]]></content:encoded>
<dc:subject>Miscellaneous Statistics</dc:subject>
<dc:date>2008-11-19T14:06:12-05:00</dc:date>
</item>
<item>
<title>Left-handers are more likely to be depressed</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/lefthanders_are.html</link>
<description><![CDATA[<p>Kevin Denny <a href="http://www.ucd.ie/economics/research/papers/2008/WP08.14.pdf">writes</a>:</p>

<blockquote>Depressive symptoms are significantly higher amongst left-handed men. While 19% of right handed men report experiencing depressive symptoms for at least a two week period, the figure for left handed men is almost 25%. For women the corresponding percentages are 33% and 36% respectively but the difference is not statistically significant.</blockquote>

<p>The analysis is of "a new large population survey from twelve European countries," a random sample of 27000 non-institutionalized people aged 50 and older.  Handedness was classified based on self-reporting, and depression is measured using standard questions.  Of the sample, about 7% of men and 6% of women were classified as left-handed.</p>

<p>My only suggestion (beyond reporting fewer significant digits in the tables) is to rescale the depression scale by <a href="http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf">dividing by two standard deviations</a>; this would allow the coefficients to be interpretable on the same scale as those for the binary outcome (see Table 2).</p>]]></description>
<guid isPermaLink="false">2054@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p>Kevin Denny <a href="http://www.ucd.ie/economics/research/papers/2008/WP08.14.pdf">writes</a>:</p>

<blockquote>Depressive symptoms are significantly higher amongst left-handed men. While 19% of right handed men report experiencing depressive symptoms for at least a two week period, the figure for left handed men is almost 25%. For women the corresponding percentages are 33% and 36% respectively but the difference is not statistically significant.</blockquote>

<p>The analysis is of "a new large population survey from twelve European countries," a random sample of 27000 non-institutionalized people aged 50 and older.  Handedness was classified based on self-reporting, and depression is measured using standard questions.  Of the sample, about 7% of men and 6% of women were classified as left-handed.</p>

<p>My only suggestion (beyond reporting fewer significant digits in the tables) is to rescale the depression scale by <a href="http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf">dividing by two standard deviations</a>; this would allow the coefficients to be interpretable on the same scale as those for the binary outcome (see Table 2).</p><a href="http://www.stat.columbia.edu/~cook/movabletype/mt-tb.cgi?__mode=view&entry_id=2054" onclick="OpenTrackback(this.href); return false">TrackBack (0)</a> | <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/lefthanders_are.html#comments" title="Comment on: Left-handers are more likely to be depressed">Comments (1)</a></p>]]></content:encoded>
<dc:subject>Miscellaneous Science</dc:subject>
<dc:date>2008-11-19T09:02:25-05:00</dc:date>
</item>
<item>
<title>Estimated votes by county among non-blacks</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/estimates_votes.html</link>
<description><![CDATA[<p><a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/03/new_faces_in_po.html">Ben Lauderdale</a> writes:</p>

<blockquote>I [Ben] had this map [see below] on my door for the last week.  Based on <a href="http://redbluerichpoor.com/blog/?p=289">exactly the same calculation using constant 95% black support and census-proportional representation</a>.  The white counties are the ones whose census names didn't match properly with the names used in the library(maps) package in R, I was too lazy to fix them.</blockquote>

<p><img alt="ben1.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/ben1.png" width="524" height="337" /></p>

<p>Cool.  I'd only suggest using light gray rather than heavy black lines between counties; the map as it is overemphasizes the county borders, I think.  But I respect his laziness; there's always time later to fix the details.</p>

<p>Ben continues:  </p>

<blockquote>[Below are] the state-by-state county share plots for the lower 49, Obama vote share as a function of black population share.  V.O. Key's observation that whites who live near blacks in southern states are less positively inclined towards them is *still* visible in several states.</blockquote>

<p><img alt="ben2.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/ben2.png" width="657" height="658" /></p>

<p>The circle areas are proportional to county voter turnout.  (The biggest circle is L.A. county in California, and so forth.)</p>

<p>Ben also had this comment about his map:</p>

<blockquote>It reminded me of something Bob Putnam would say every time someone presented an empirical talk in our Center for the Study of Democratic Politics series during the year he was a fellow here at Princeton: "You should include miles to the Canadian border as a variable in your regression, it is the most important proxy for political culture in America!"  At least in the eastern half of the country, he has a point.</blockquote>

<p>Except for New Hampshire and Vermont, I think. </p>

<p><strong>P.S.</strong>  For graphics enthusiasts, here are some earlier graphs that I gave the thumbs-down on before Ben came up with the 50 plots above:</p>]]></description>
<guid isPermaLink="false">2055@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p><a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/03/new_faces_in_po.html">Ben Lauderdale</a> writes:</p>

<blockquote>I [Ben] had this map [see below] on my door for the last week.  Based on <a href="http://redbluerichpoor.com/blog/?p=289">exactly the same calculation using constant 95% black support and census-proportional representation</a>.  The white counties are the ones whose census names didn't match properly with the names used in the library(maps) package in R, I was too lazy to fix them.</blockquote>
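
<p>For readers who want the arithmetic, here is a minimal Python sketch of how I read that calculation. It assumes that blacks made up the same share of each county's voters as of its census population and that 95% of them voted for Obama; the function name and the example numbers are hypothetical, and this is not Ben's actual R code.</p>

```python
def nonblack_obama_share(obama_share, black_pop_share, black_support=0.95):
    """Estimate Obama's vote share among non-black voters in one county.

    Solves  obama_share = black_support * b + x * (1 - b)  for x,
    where b is the black share of voters (assumed equal to the census share).
    """
    if not 0.0 <= black_pop_share < 1.0:
        raise ValueError("black_pop_share must be in [0, 1)")
    return (obama_share - black_support * black_pop_share) / (1.0 - black_pop_share)


if __name__ == "__main__":
    # Hypothetical county: Obama got 45% overall and 30% of residents are black.
    print(nonblack_obama_share(0.45, 0.30))
```

<p>Because turnout and support are assumed rather than measured, the estimate can fall below 0 or above 1 in some counties, which is a sign that the assumptions fail there.</p>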

<p><img alt="ben1.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/ben1.png" width="524" height="337" /></p>

<p>Cool.  I'd only suggest using light gray rather than heavy black lines between counties; the map as it is overemphasizes the county borders, I think.  But I respect his laziness; there's always time later to fix the details.</p>

<p>Ben continues:  </p>

<blockquote>[Below are] the state-by-state county share plots for the lower 49, Obama vote share as a function of black population share.  V.O. Key's observation that whites who live near blacks in southern states are less positively inclined towards them is *still* visible in several states.</blockquote>

<p><img alt="ben2.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/ben2.png" width="657" height="658" /></p>

<p>The circle areas are proportional to county voter turnout.  (The biggest circle is L.A. county in California, and so forth.)</p>

<p>Ben also had this comment about his map:</p>

<blockquote>It reminded me of something Bob Putnam would say every time someone presented an empirical talk in our Center for the Study of Democratic Politics series during the year he was a fellow here at Princeton: "You should include miles to the Canadian border as a variable in your regression, it is the most important proxy for political culture in America!"  At least in the eastern half of the country, he has a point.</blockquote>

<p>Except for New Hampshire and Vermont, I think. </p>

<p><strong>P.S.</strong>  For graphics enthusiasts, here are some earlier graphs that I gave the thumbs-down on before Ben came up with the 50 plots above:</p><p><a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/estimates_votes.html" title="Continue Reading: Estimated votes by county among non-blacks">Continued reading Estimated votes by county among non-blacks...</a>
<p>Posted by Andrew at November 18, 2008  4:25 PM</p><p>
<a href="http://www.stat.columbia.edu/~cook/movabletype/mt-tb.cgi?__mode=view&entry_id=2055" onclick="OpenTrackback(this.href); return false">TrackBack (0)</a> | <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/estimates_votes.html#comments" title="Comment on: Estimated votes by county among non-blacks">Comments (4)</a></p>]]></content:encoded>
<dc:subject>Political Science</dc:subject>
<dc:date>2008-11-18T16:25:29-05:00</dc:date>
</item>
<item>
<title>For teaching a course in the comic novel</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/for_teaching_a.html</link>
<description><![CDATA[<p>A colleague was asking for suggestions for teaching a course in the comic novel.  Beyond the obvious (Waugh, Wodehouse, Roth, Nabokov), I thought of Our Man in Havana, by Graham Greene.  Twain is another obvious call, except that his funny novels are also serious.  The funniest non-serious thing I know of by Twain is Adam's Diary, but that's just a short story.  We also discussed End Zone by Don DeLillo.  And I've also heard that Gulliver's Travels is pretty good; I've never read it.  I also think much of The Sportswriter and Independence Day by Richard Ford are hilarious, but I don't think they'd be classified as comic novels.</p>

<p>My latest thought is Little Children by Tom Perrotta.  It's an excellent book, though not a great work of art, and that's the point:  when teaching a class, maybe it's better to have something where the seams show a little.</p>

<p>P.S.  See comments below.  Also, Bridget Jones's Diary.  And some kids' humor book:  not something like Lemony Snicket that's supposed to be good, but something more lowbrow such as Goosebumps or Captain Underpants, to get a sense of what people think is funny.  Also, something funny but completely non-novel-like, for example Chris Rock's book.  Students can compare how the comic novels differ from the quick jokes.</p>]]></description>
<guid isPermaLink="false">2053@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p>A colleague was asking for suggestions for teaching a course in the comic novel.  Beyond the obvious (Waugh, Wodehouse, Roth, Nabokov), I thought of Our Man in Havana, by Graham Greene.  Twain is another obvious call, except that his funny novels are also serious.  The funniest non-serious thing I know of by Twain is Adam's Diary, but that's just a short story.  We also discussed End Zone by Don DeLillo.  And I've also heard that Gulliver's Travels is pretty good; I've never read it.  I also think much of The Sportswriter and Independence Day by Richard Ford are hilarious, but I don't think they'd be classified as comic novels.</p>

<p>My latest thought is Little Children by Tom Perrotta.  It's an excellent book, though not a great work of art, and that's the point:  when teaching a class, maybe it's better to have something where the seams show a little.</p>

<p>P.S.  See comments below.  Also, Bridget Jones's Diary.  And some kids' humor book:  not something like Lemony Snicket that's supposed to be good, but something more lowbrow such as Goosebumps or Captain Underpants, to get a sense of what people think is funny.  Also, something funny but completely non-novel-like, for example Chris Rock's book.  Students can compare how the comic novels differ from the quick jokes.</p><a href="http://www.stat.columbia.edu/~cook/movabletype/mt-tb.cgi?__mode=view&entry_id=2053" onclick="OpenTrackback(this.href); return false">TrackBack (0)</a> | <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/for_teaching_a.html#comments" title="Comment on: For teaching a course in the comic novel">Comments (15)</a></p>]]></content:encoded>
<dc:subject>Literature</dc:subject>
<dc:date>2008-11-17T23:33:46-05:00</dc:date>
</item>
<item>
<title>Bayesian Analysis for the Intelligence Community</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/bayesian_analys.html</link>
<description><![CDATA[<p>Drew Conway pointed me to <a href="http://blogs.nyu.edu/blogs/agc282/zia/2008/11/inital_forays_into_bayesian_an_1.html">this</a>:</p>

<blockquote>The article entitled, "Bayesian Analysis for Intelligence: Some Focus on the Middle East," was written by Nicholas Schweitzer . . . JIOX provides no information on the essay's origins, but . . . it appears to be a declassified CIA piece written sometime in the 1970's (note mentions of Presidents Asad and Sadat, and Prime Minister Rabin on page one). . . . Schweitzer concludes that in general the Bayesian technique was able to more quickly predict "non-events" (i.e., when no hostilities would occur among Middle Eastern nations) than analysts using only their expertise and intuitions. The research design included no baseline for comparison to an actual event; therefore, we are left wondering if the Bayesian technique described here would be able to predict when something will actually happen. Despite this obvious shortcoming, it is very encouraging to observe the level of sophistication being implemented by CIA analysts some thirty-odd years ago.</blockquote>

<p>I actually participated a couple years ago in an (unclassified) meeting on Bayesian analysis for military intelligence, so I know that these ideas are still out there.  My only comment, regarding the Bayesian issue per se, is that the key to good statistical methods is typically making use of relevant information; non-Bayesian methods can also be effective if they can be adapted to use the info that goes into a Bayesian procedure.</p>]]></description>
<guid isPermaLink="false">2052@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p>Drew Conway pointed me to <a href="http://blogs.nyu.edu/blogs/agc282/zia/2008/11/inital_forays_into_bayesian_an_1.html">this</a>:</p>

<blockquote>The article entitled, "Bayesian Analysis for Intelligence: Some Focus on the Middle East," was written by Nicholas Schweitzer . . . JIOX provides no information on the essay's origins, but . . . it appears to be a declassified CIA piece written sometime in the 1970's (note mentions of Presidents Asad and Sadat, and Prime Minister Rabin on page one). . . . Schweitzer concludes that in general the Bayesian technique was able to more quickly predict "non-events" (i.e., when no hostilities would occur among Middle Eastern nations) than analysts using only their expertise and intuitions. The research design included no baseline for comparison to an actual event; therefore, we are left wondering if the Bayesian technique described here would be able to predict when something will actually happen. Despite this obvious shortcoming, it is very encouraging to observe the level of sophistication being implemented by CIA analysts some thirty-odd years ago.</blockquote>

<p>I actually participated a couple years ago in an (unclassified) meeting on Bayesian analysis for military intelligence, so I know that these ideas are still out there.  My only comment, regarding the Bayesian issue per se, is that the key to good statistical methods is typically making use of relevant information; non-Bayesian methods can also be effective if they can be adapted to use the info that goes into a Bayesian procedure.</p><a href="http://www.stat.columbia.edu/~cook/movabletype/mt-tb.cgi?__mode=view&entry_id=2052" onclick="OpenTrackback(this.href); return false">TrackBack (0)</a> | <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/bayesian_analys.html#comments" title="Comment on: Bayesian Analysis for the Intelligence Community">Comments (4)</a></p>]]></content:encoded>
<dc:subject>Bayesian Statistics</dc:subject>
<dc:date>2008-11-17T21:40:44-05:00</dc:date>
</item>
<item>
<title>Too loose</title>
<link>http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/too_loose.html</link>
<description><![CDATA[<p>Will Wilkinson interviewed me for Bloggingheads today, and it was a disaster.  I was too relaxed and I treated it as a conversation rather than a formal presentation or interview.  As a result, I did too much b.s.-ing and too much conversational yapping, and not enough presentation of our research findings.  I also said a bunch of things that are interesting or funny in informal conversation but probably come off as obnoxious or off-the-cuff in an interview that can be viewed interactively.</p>

<p>It's too bad, because my Red State, Blue State presentation is fun and informative, and I think the radio interviews I've done (with lengths ranging from 5 minutes to an hour) have gone well also.  The three things that threw me off:<br />
1.  I've met Will before and I felt comfortable with him, hence too relaxed.  Will was an excellent interviewer and gave me many opportunities to explain things; it wasn't his fault that I spouted off too much.<br />
2.  I've already spoken with Will about the book and so it was hard for me to remember to start from scratch--the audience won't necessarily be familiar with it.<br />
3.  Seeing my image in front of me while I was talking made me extra-focused on not twitching--always a bad thing.  In a face-to-face or telephone interview, I usually forget about the twitching after a minute or so.  Trying to suppress it takes a lot of mental effort that would be better used to think about my responses.</p>

<p>It would've been better to have some written talking points in front of me to keep me focused.  The funny thing is, I did that for my early radio interviews but as I got more used to the format, I started speaking more off the cuff and it was going fine.  This was just an interview too far.   I had fun while it was happening, but afterward I realized what had gone wrong.</p>

<p>Anyway, it felt good to get this off my chest.</p>]]></description>
<guid isPermaLink="false">2050@http://www.stat.columbia.edu/~cook/movabletype/mlm/</guid>
<content:encoded><![CDATA[<p>Will Wilkinson interviewed me for Bloggingheads today, and it was a disaster.  I was too relaxed and I treated it as a conversation rather than a formal presentation or interview.  As a result, I did too much b.s.-ing and too much conversational yapping, and not enough presentation of our research findings.  I also said a bunch of things that are interesting or funny in informal conversation but probably come off as obnoxious or off-the-cuff in an interview that can be viewed interactively.</p>

<p>It's too bad, because my Red State, Blue State presentation is fun and informative, and I think the radio interviews I've done (with lengths ranging from 5 minutes to an hour) have gone well also.  The three things that threw me off:<br />
1.  I've met Will before and I felt comfortable with him, hence too relaxed.  Will was an excellent interviewer and gave me many opportunities to explain things; it wasn't his fault that I spouted off too much.<br />
2.  I've already spoken with Will about the book and so it was hard for me to remember to start from scratch--the audience won't necessarily be familiar with it.<br />
3.  Seeing my image in front of me while I was talking made me extra-focused on not twitching--always a bad thing.  In a face-to-face or telephone interview, I usually forget about the twitching after a minute or so.  Trying to suppress it takes a lot of mental effort that would be better used to think about my responses.</p>

<p>It would've been better to have some written talking points in front of me to keep me focused.  The funny thing is, I did that for my early radio interviews but as I got more used to the format, I started speaking more off the cuff and it was going fine.  This was just an interview too far.   I had fun while it was happening, but afterward I realized what had gone wrong.</p>

<p>Anyway, it felt good to get this off my chest.</p><p><a href="http://www.stat.columbia.edu/~cook/movabletype/mt-tb.cgi?__mode=view&entry_id=2050" onclick="OpenTrackback(this.href); return false">TrackBack (0)</a> | <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/too_loose.html#comments" title="Comment on: Too loose">Comments (1)</a></p>]]></content:encoded>
<dc:subject>Teaching</dc:subject>
<dc:date>2008-11-17T20:03:53-05:00</dc:date>
</item>


</channel>
</rss>
