January 2007 Archives

Families of prisoners

| 4 Comments

Ernest Drucker sent along this paper along with this story:

I [Drucker] have been speaking about the problem of mass imprisonment for years to anyone who would listen – mostly professional groups and students. Once I spoke to the Urban League national convention in Pittsburgh - to little response. But one such talk was to the medical students at Einstein where I teach - they had organized a social medicine course outside the formal curriculum and I was happy to see their interest went beyond clinical medicine. My topic was the epidemiology of incarceration. I showed all my usual PPT slides – tables of data showing the sharp rise in imprisonment in the USA over the last 30 years - and of how far imprisonment had spread in the black community. I talked about the epidemic and our country's drug war policies – something I’ve done dozens of times.

But in the audience was Dean S who came up to me after the presentation and asked if I'd be willing to give the same talk to her group of students – who it turned out were all in high school in the Bronx. They were in the Einstein program to bring Bronx HS kids into the medical school labs and hospitals – to let them see about medical science and maybe interest them in careers in health in some way. As select HS students many would go to college, so maybe it was a bit of early recruiting of local talent for Einstein admission in 4 or 5 years. They came from most of the public and parochial high schools of the Bronx, but these kids were the pick of the crop. To be in the program they had to sign up in advance for a limited number of slots, they had to get up there to Einstein every week for a term, and their parents were supposed to come in too - for a conference with Dean S about their progress. These were serious kids from families that supported their academic goals and valued education enough to go to some extra trouble to cultivate it.

As I often do with audiences, I asked who had ever had a family member or close friend go to prison. To my amazement they all raised their hands – 100% of them had a member who had been in prison - that’s a very simple and striking measure of the prevalence of incarceration at this time in the Bronx – every family in this select group was affected directly by incarceration.

We'll be talking with Ernie this Thursday in the social networks working group. His work seems related to the ideas of this paper (or at least to its title).

Here.

Dr. Bob Sports

| No Comments

Here's another one from Chance News (scroll down to "Hot streaks rarely last"). John Gavin refers to an interesting article from the Wall Street Journal by Sam Walker about a guy called Dr. Bob who has a successful football betting record:

YEAR WIN/LOSS/TIE %
1999 49-31-1 61%
2000 47-25-0 65%
2001 35-28-0 56%
2002 49-44-3 53%
2003 46-55-2 46%
2004 55-34-1 62%
2005 51-21-2 71%
2006 45-34-3 57%
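
As a quick check on the table, here is a minimal Python sketch that recomputes each season's percentage. It assumes (the source does not say so) that ties are dropped from the denominator, which is the convention that reproduces the listed figures. The name `win_pct` is made up for illustration.

```python
# Season records (wins, losses, ties) copied from the table above.
records = {
    1999: (49, 31, 1),
    2000: (47, 25, 0),
    2001: (35, 28, 0),
    2002: (49, 44, 3),
    2003: (46, 55, 2),
    2004: (55, 34, 1),
    2005: (51, 21, 2),
    2006: (45, 34, 3),
}


def win_pct(wins, losses):
    """Winning percentage with ties excluded (an assumed convention)."""
    return 100.0 * wins / (wins + losses)


# Per-season percentages, rounded to whole numbers as in the table.
season_pct = {year: round(win_pct(w, l)) for year, (w, l, _) in records.items()}

# Pooled record over all eight seasons.
total_wins = sum(w for w, _, _ in records.values())
total_losses = sum(l for _, l, _ in records.values())
overall_pct = win_pct(total_wins, total_losses)

for year, pct in season_pct.items():
    print(year, f"{pct}%")
print("overall", f"{overall_pct:.1f}%")
```

Pooling the seasons gives a single long-run percentage, which is the number to keep in mind when reading a headline like "Hot streaks rarely last."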

Chris Paulse writes,

A puzzle from Laurie Snell

| 10 Comments

From Chance News (scroll down to "A Challenge"), Laurie Snell writes:

The mathematics department at Dartmouth has just moved to a new building and the previous math building is being demolished. The students called this building "Shower Towers," suggested by this picture of one wall of the building.

bradley2.jpg

For at least 30 years we walked by this wall assuming that the tiles were randomly placed. One day, as we were walking by it, our colleague John Finn said "I see they are not randomly placed." What did he see?

This is sort of a funny quote, though, from a statistician's perspective, because those of us who do survey sampling know that random assignment is hard: in this case, they'd either have to have a pile of tiles that they randomly select from, or else take tiles and put them in random locations. Neither of these is easy, as it requires picking random numbers from a long list, or physical randomization of a really heavy pile of tiles!

Sometimes statisticians use the word "haphazard" to represent processes that do not have any known distribution and so would not be called "random" in the usual statistical sense.
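
For contrast with the heavy pile of real tiles, the "randomly select from a pile" scheme is easy to sketch in software. This is only an illustration: the tile colors, counts, wall dimensions, and the function name `random_wall` are all hypothetical and have nothing to do with the actual building.

```python
import random
from collections import Counter


def random_wall(tile_counts, n_rows, n_cols, seed=None):
    """Lay out a wall by shuffling the whole pile, then filling row by row."""
    pile = [color for color, n in tile_counts.items() for _ in range(n)]
    if len(pile) != n_rows * n_cols:
        raise ValueError("tile counts must fill the wall exactly")
    # A uniform random permutation of the pile is the software analogue
    # of physically mixing the tiles before laying them.
    random.Random(seed).shuffle(pile)
    return [pile[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


wall = random_wall({"light": 12, "dark": 8}, n_rows=4, n_cols=5, seed=0)
for row in wall:
    print(" ".join(tile[0] for tile in row))
```

A haphazard layout, by comparison, has no such generating procedure that one could write down.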

Vin Scully is a Republican

| No Comments

From Sports Media Watch, via Can't Stop the Bleeding. Kareem appears to be a Democrat, but he gave to Bill Bradley so maybe this doesn't really count.

Sarah Igo came to our seminar yesterday to tell us about her recent book, The Averaged American. It was a lot of fun, and she commented that when she speaks to historians, they just let her speak, but we're more fun because we interrupt her frequently. I assured her that if interruption=fun, then economists are the most fun of all...

The talk was an interesting historical overview of the Middletown, Gallup, and Kinsey surveys with some fun historical photos. One of the best things about the photos was the hidden world they revealed--all the things in the backgrounds. It reminded me of when I watched a bunch of Speed Racer cartoons with Phil in a movie theater in the early 90s. These were low-budget Japanese cartoons from the 60s that we loved as kids. From my adult perspective, the best parts were during the characters' long drives, where you could see Japanese industrial scenes in the background.

Survey costs and technology

To get back to Igo's topic: her main theme was how the existence of surveys changed Americans' ideas of the mass public as an agglomeration of individuals rather than interest groups or social classes. Thinking about the changes since the 1950s, there are a lot more polls now, and this would seem to be technology-driven: it's just a lot cheaper to do a survey now than it used to be. I actually think it should be considered unethical to survey people without compensating them.

Relevant to the discussion here is the hilarious (but sad) movie Our Brand is Crisis, which shows U.S. political consulting techniques being introduced in Bolivia.

What do we learn from polls?

There was a lively discussion in the seminar about what people learn from surveys. Most obviously, they learn issue opinions, for example that most people support the death penalty, the country is divided on abortion, and most people oppose the Iraq war. I commented that this tends to reduce the "availability bias" under which people tend to think that most people share their opinions on issues: for example, I might think everyone supports my preferred Presidential candidate, but a simple poll will tell me it's 50/50. But a couple of other possibilities came up:

1. Majorities can be politically strengthened. For example, once it is known that over 60% of the people agree with them, this strengthens the political efforts of death penalty advocates.

2. Minorities can be politically strengthened, for example, when Kinsey's results were extrapolated to estimate that 10% of Americans were gay [this was questionable for methodological reasons, but that's another story].

3. Tolerance: once I realize that I'm in the minority on many issues, I become more tolerant of minority rights in general. Of course, most people are in the majority on any given issue (if there are 2 options), but each of us is in the minority on some issues.

4. Polling and the illusion of control: According to Igo, presidents from Roosevelt on have had internal polling organizations. I conjecture that this convenient polling gives politicians the illusion that they can't lose--that they can just use polling to alter their pitches until they come up with something popular.

Shortening of the long tails

| 6 Comments

Chris Anderson popularized the idea that the internet will fundamentally change the way media works: while the mass media and retail create a small number of big hits, the internet will "flatten" the field in the sense that there will be a large number of smaller hits. There are particular reasons for a short-tailed world:

  • limited shelf space - smaller choice, the ubiquity of the "popular"
  • mass media broadcasts - coverage of the universally ("lowest common denominator") popular at the expense of the niche interests
The delivery model of internet retail obliterates the shelf space restrictions. The distributed media, such as blogs, can cater to more specific interests. The consequence is that people can find something closer to their needs, but at the same time the amount of investment into those works is smaller.

Ilya Grigorik used the Netflix data (I have written about his analysis already) to examine the shifts in taste across years on Netflix. He finds that over the years from 2000 to 2005, the head has grown bigger but the tail thinner:

Longtail comparison

The blockbusters have increased, while the tail has become thinner, especially among the hits. At first sight, this would appear to indicate that the long-tailed vision is wrong. I disagree: the reasons lie in marketing. There is a distinct difference between early adopters and late adopters. Early adopters are more experimental in their tastes, which is the fundamental reason why they would bother with novelty services. Moreover, they cannot find their favorite niche movie in the local Blockbuster, driving them towards mail distribution.

As the early adopters spread the word, as Netflix polishes up their service and reduces the cost, more and more of the late adopters join the ranks of Netflix customers, but these late adopters 1) do not have the itch for the rare 2) are not experimental 3) are still largely governed by the mass media. My guess is that the thinning of the tail is temporary.

In response to a summary by Alfred G. Cuzán of election forecasts, I wrote,

A new causality blog

| 9 Comments

A group from the University of California, Los Angeles, including Judea Pearl, the popular author of books on causality and on Bayesian networks (sometimes referred to as belief networks or as graphical models, as they aren't Bayesian in the Bayesian statistics sense), has set up a new blog on causality. Their approach to causality is based on probability theory with random variables and operators. For a taste of it, see "Causality is undefinable" or "The meaning of counterfactuals".

While it takes the form of a blog, the system is more like a help line. The good stuff is often in the comments.

Jeff Heer reports that IBM has released their Many Eyes platform for browser-based data analysis. I have already written about Swivel, and there is another similar system called Data 360. However, Many Eyes seems to be the most impressive of all, with very clean visualizations and numerous types of graphs, including, for example, social networks and maps.


manyeyes.png

Indecision and agency

| No Comments

I recently read Indecision by Benjamin Kunkel. It was hilarious--I can't wait to read more by him. Beyond this, I noticed two things about the book that I didn't see mentioned in any reviews:

1. The book reminded me a lot of the work of Philip K. Dick, in particular that all the characters had agency. That is, each character had his or her own ideas and seemed to act on his or her own ideas, rather than merely carrying the plot along or providing scenery. Not too many books (or dramatic productions) have this feature. The funny comments by the various characters (while just "being themselves") contributed a lot to the book's humor.

2. Much was made of the main character's indecision, and how he finally becomes more decisive. But, reading between the lines, it's clear that indecision has been good to Dwight. He likes to be with women but is never quite sure he wants to be with whoever he's with, and (in the context of the book, at least) this just draws them in. It's the indecision--the not needing it--that makes him so appealing to these women. But then, at the end, when he decides he really does want to be with a particular person, she tells him no. At least in this aspect of Dwight's life, indecision worked better than the alternative.

The Machine Learning Department

| 6 Comments

Radev pointed me to this discussion by John Langford of Carnegie-Mellon's new Machine Learning department. I don't have much to add to the links and comments posted there, but I'm generally supportive of new academic departments, or else of letting existing departments become more flexible about requirements. I think Columbia's Statistics Department would improve by splitting into two separate departments:
- Applied Statistics
- Probability and Theoretical Statistics
or else becoming a single department of Statistics and Probability with formal tracks for students and faculty in the different subfields. As it is, the statisticians end up suffering through measure theory and the probabilists spend a lot of time teaching intro statistics to unhappy undergrads. I don't think there's anything wrong with statisticians learning measure theory, but in practice it takes valuable time and effort that they could instead be using to learn computer science, or economics, or some other sister discipline. (I agree, perhaps, with John's statement of "‘rogramming as the missing member of reading, ‘riting, and ‘rithmetic.")

Anyway, having a Machine Learning department sounds like a good idea if it means that the students and faculty there can have a bit more flexibility in what they can do. I also think of statistics as a branch of engineering but it's not usually in engineering schools. (Columbia has an operations research department in the engineering school but they also do a lot of theoretical probability; some sort of rejiggering seems possible to me, just as they moved Dallas out of the NFC East (I assume they've done that by now???) etc.)

Slightly related is the fact that it can be difficult to persuade statistics Ph.D. students to take courses in experimental design and sample surveys, even though these are huge application areas. And then there's this. My experience at Berkeley taught me to be an intellectual pluralist.

Bayesian model selection

| No Comments

A researcher writes,

I have made use of the material in Ch. 6 of your Bayesian Data Analysis book to help select among candidate models for inference in risk analysis. In doing so, I have received some criticism from an anonymous reviewer that I don't quite understand, and was wondering if you have perhaps run into this criticism. Here's the setting. I have observable events occurring in time, and I need to choose between a homogeneous Poisson process, and a nonhomogeneous Poisson process, in which the rate is a function of time (e.g., loglinear model for the rate, which I'll call lambda).

I could use DIC to select between a model with constant lambda and one where the log of lambda is a linear function of time. However, I decided to try to come up with an approach that would appeal to my frequentist friends, who are more familiar with a chi-square test against the null hypothesis of constant lambda. So, following your approach in Ch. 6, I had WinBUGS compute two posterior distributions. The first, which I call the observed chi-square, subtracts the posterior mean (mu[i] = lambda[i]*t[i]) from each observed value, squares this, and divides by the mean. I then add all of these values up, getting a distribution for the total. I then do the same thing, but with draws from the posterior predictive distribution of X. I call this the replicated chi-square statistic.

If my putative model has good predictive validity, it seems that the observed and replicated distributions should have substantial overlap. I called this overlap (calculated with the step function in WinBUGS) a "Bayesian p-value." The model with the larger p-value is a better fit, just like my frequentist friends are used to.

Now to the criticism. An anonymous reviewer suggests this approach is weakened by "using the observed data twice." Well, yes, I do use the observed data to estimate the posterior distribution of mu, and then I use it again to calculate a statistic. However, I don't see how this is a problem, in the sense that empirical Bayes is problematic to some because it uses the data first to estimate a prior distribution, then again to update that prior. I am also not interested in "degrees of freedom" in the usual sense associated with MLEs either.

I am tempted to just write this off as a confused reviewer, but I am not an expert in this area, so I thought I would see if I am missing something. I appreciate any light you can shed on this problem.
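
For readers who want to see the mechanics, here is a rough Python sketch of the check described in the letter, for the constant-rate model only. It is not the correspondent's WinBUGS code. The conjugate Gamma prior (`prior_shape`, `prior_rate`), the toy counts, and all function names are assumptions made for illustration.

```python
import math
import random


def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def chi_square(counts, means):
    """Sum of (x - mu)^2 / mu, the discrepancy described in the letter."""
    return sum((x - m) ** 2 / m for x, m in zip(counts, means))


def bayesian_p_value(counts, exposures, n_draws=2000,
                     prior_shape=0.5, prior_rate=1e-4, seed=1):
    """Share of posterior draws where the replicated chi-square is at least the observed one."""
    rng = random.Random(seed)
    # Constant-rate Poisson model with an assumed Gamma prior, so the
    # posterior for lambda is Gamma(shape + sum x, rate + sum t).
    shape = prior_shape + sum(counts)
    rate = prior_rate + sum(exposures)
    exceed = 0
    for _ in range(n_draws):
        lam = rng.gammavariate(shape, 1.0 / rate)  # second argument is a scale
        means = [lam * t for t in exposures]
        replicated = [poisson_draw(rng, m) for m in means]
        if chi_square(replicated, means) >= chi_square(counts, means):
            exceed += 1
    return exceed / n_draws


exposures = [1.0] * 6
p_flat = bayesian_p_value([3, 4, 2, 5, 3, 4], exposures)    # roughly constant counts
p_trend = bayesian_p_value([0, 1, 2, 6, 9, 14], exposures)  # counts rising over time
print(p_flat, p_trend)
```

Note that the observed data enter twice in the loop: once through the posterior for lambda and once through the observed discrepancy, which is the feature the reviewer objected to.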

My thoughts:

I came across this video on making a taser from a disposable camera (following the link from Digg, from Buzzfeed, from Stay Free). I haven't tried it out yet, but it reminded me of a story that I'll tell sometime about my friend and diet author Seth.

Jeff Lax pointed me to this online article by Jeanna Bryner:

Higher education tied to memory problems later, surprising study finds

Going to college is a no-brainer for those who can afford it, but higher education actually tends to speed up mental decline when it comes to fumbling for words later in life.

Participants in a new study, all more than 70 years old, were tested up to four times between 1993 and 2000 on their ability to recall 10 common words read aloud to them. Those with more education were found to have a steeper decline over the years in their ability to remember the list, according to a new study detailed in the current issue of the journal Research on Aging. . . .

As Jeff pointed out, they only consider the slope and not the intercept. Perhaps the college graduates knew more words at the start of the study?

Here's a link to the study by Dawn Alley, Kristen Southers, and Eileen Crimmins. Looking at the article, we see "an older adult with 16 years of schooling or a college education scored about 0.4 to 0.8 points higher at baseline than a respondent with only 12 years of education." Based on Figures 1 and 2 of the paper, it looks like higher-educated people know more words at all ages, hence the title of the news article seems misleading.

The figures represent summaries of the fitted models. I'd like to see graphs of the raw data (for individual subjects in the study and for averages). It's actually pretty shocking to me that in a longitudinal analysis, such graphs are not shown.

Hans Rosling again

| 1 Comment

Cengiz Belentepe writes,

I’m trying to find a very interesting interactive presentation on demographics and sociology that I believe you posted a link to on your blog last year. I believe the professor who did the research was Scandinavian and also designed some special software to display the results but I’m not 100% sure. I know this description isn’t much to go on but if you do recall the presentation, I’d love to know the professor’s name and have a link to his presentation.

He's Hans Rosling, the software is Gapminder, and the link is here.

Unethical ethicists

| 1 Comment

Suresh sent along this item from Eric Schwitzgebel:

Ethics books are more likely to be stolen than non-ethics books in philosophy (looking at a large sample of recent ethics and non-ethics books from leading academic libraries). Missing books as a percentage of those off shelf were 8.7% for ethics, 6.9% for non-ethics, for an odds ratio of 1.25 to 1. [followed by analyses of various subsets of the data that confirm this result]
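
A small Python check of the arithmetic, using only the two rounded percentages quoted above (the variable names are mine). From these inputs the ratio of proportions comes out near 1.26 and the odds ratio near 1.29, so the quoted 1.25 presumably reflects unrounded counts that are not given here.

```python
# Rounded missing-book percentages as quoted in the item above.
p_ethics = 0.087
p_non_ethics = 0.069


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


risk_ratio = p_ethics / p_non_ethics
odds_ratio = odds(p_ethics) / odds(p_non_ethics)

print(f"ratio of proportions: {risk_ratio:.2f}")
print(f"odds ratio:           {odds_ratio:.2f}")
```

With proportions this small the two ratios are close to each other, which is why they are easy to conflate in a short summary.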

I'd like to see some data on how often these books are checked out before being stolen (or lost), but setting such statistical questions aside, I have to say that I've always been suspicious about ethics classes, in that I think it's natural for them to focus on the tough cases (true ethical dilemmas) rather than on the easier calls which people nonetheless get wrong. I'm not speaking with any expertise here, but my impression is that the most common ethical errors are clear-cut, where people do something they know (or should know) is unethical, but they think they won't get caught.

P.S. Here are some of my more considered thoughts on ethics and statistics. (Also see Section 10.5 of my book with Deb.)

Andrew Oswald (see here and here) sends in this paper. Here's the abstract:

It has been known for centuries that the rich and famous have longer lives than the poor and ordinary. Causality, however, remains trenchantly debated. The ideal experiment would be one in which status and money could somehow be dropped upon a sub-sample of individuals while those in a control group received neither. This paper attempts to formulate a test in that spirit. It collects 19th-century birth data on science Nobel Prize winners and nominees. Using a variety of corrections for potential biases, the paper concludes that winning the Nobel Prize, rather than merely being nominated, is associated with between 1 and 2 years of extra longevity. Greater wealth, as measured by the real value of the Prize, does not seem to affect lifespan.

The natural worry here is a selection bias, in which people who die at age X are less likely to receive the prize (for example, if you die at age 60, but you would have received the prize had you lived past age 62). The authors address this using a survival-analysis approach to condition on the age at which the relevant scientists are nominated for or receive the prize.

Two years is a large effect, but at the same time I could imagine this difference occurring from some sort of statistical artifact, so I wouldn't call such a study conclusive, but it adds to the literature on status, health, and longevity.

Explanation of the above title to this blog entry

Thinking more about the particular case of Nobel Prizes, I've long thought that the pain of not receiving the prize is far greater, on average, than the joy of receiving it. Feeling like you deserve the prize, then not getting it year after year . . . that can be frustrating, I'd think. Sort of like waiting for that promotion that never comes. Getting it, on the other hand, I'm sure is nice, but so many more eligible people don't get it than do (and the No comes year after year). I'd guess that it's a net reducer of scientists' lifespans.

Diversity in learning

| 3 Comments

Once I figure out how to do it, I'll be reorganizing the list of links and adding Seth's blog, but, in the meantime, here's a fascinating article on diversity in learning, where Seth describes a class assignment where he let students do whatever they wanted:

I [Seth] taught a class called Psychology and the Real World where the off-campus work essentially was the course. Students could do any off-campus work related to psychology – at least 60 hours of it during the 15-week semester. In addition, we met weekly for discussions and the students wrote three short papers. Eight students signed up. Their off-campus work was learning how to be a mediator, developing a television show about happiness, working at a shelter for battered women, working at a nursing home, talking with patients in a mental hospital for the criminally insane, taking care of two-year-old twins, tutoring high-school students, and making bereavement support calls. It was time well-spent.

I had a few thoughts:

1. This sounded a lot better than the class on left-handedness that Seth and I taught 12 years ago. The students liked the class OK but they certainly didn't do anything substantial on their own. But, even then, I recall Seth telling me that he thought a big problem with college courses, as they were usually configured, is that they have the goal of making the student as much like the instructor (or the textbook) as possible. It's a rare class where students' differing experiences and talents are appreciated. (One rare positive example among my own classes is my seminar with Shigeo, where it really works well that different students have different knowledge bases about political science. But in other classes it's been hard to make use of students' diversity.)

2. It's funny that only 8 students signed up, out of the 20,000 undergraduates at UC Berkeley. Setting aside selection issues, it sounds like at least a few more students would've benefited. But I have to say that it's hard to get good attendance in a non-required course. I recall that Mike Jordan said that he gets an enrollment of 125 in his Bayesian statistics course at Berkeley, which seems pretty impressive--I certainly don't get 125 in my classes here--but maybe it's required.

3. I somehow expect that this course wouldn't work so well if I--or almost anyone else--were teaching it. Part of this is that Seth knows a lot about psychology, but it's also something about working with students. When I've tried to have students do open-ended projects, they've almost always done something pretty uninteresting (see Section 11.4.3 in Teaching Statistics: A Bag of Tricks for more on this). I remember discussing this with Seth several years ago. The conversation went something like this:

Me: Students generally pick uninteresting topics, skimp on the real work of data collection, and avoid any kind of random sampling or even systematic design, so I'm thinking I have to give them more structure, a better list of project topics, maybe assign them to projects.

Seth: Try giving them less structure and see what they come up with.

It seems that Seth's suggestion has worked--for him. I'll give it a try. But I still think I'll have to check their ideas and rule out the worst, such as comparisons of GPAs of athletes and nonathletes, surveys of students about hours studying and drinking, etc etc. Actually, I really don't know what I should do about this.

4. Seth's article also has a bunch of hypotheses about evolution of various social behaviors. I neither believe nor disbelieve these things--I just don't know how to evaluate such things--but I think of them in a utilitarian sense as useful in helping Seth formulate hypotheses for his self-experimentation. Also, I like the Jane Jacobs references because I am also a big fan of her work (although maybe not all of it).

Futurist George Dvorsky included Bayesianism in the list of Must-know terms for the 21st Century intellectual:

Bayesian Rationality: Bayesian rationality is a probabilistic approach to reasoning. Bayesian rationalists describe probability as the degree to which a person should believe a proposition. They also apply Bayes' theorem when inferring or updating their degree of belief when given new information. Some scientists and epistemologists hope to replace the Popperian view of proof with a Bayesian view.

I agree (of course). Popperian falsification is just a special case of the Bayesian view: if the likelihood P(data|model) is zero (indicating that the data is impossible given the model), P(model|data) is zero, regardless of the prior. But the Bayesian approach offers some sort of a weighted preference among all the models that haven't been refuted yet, balancing the Ockhamist preference for simplicity through the prior and the desire for accuracy through the likelihood.
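
The point about falsification can be made concrete with a few lines of Python. This is only a toy: the three model names, the priors, and the likelihood values are invented for illustration.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of models."""
    unnormalized = {m: priors[m] * likelihoods[m] for m in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("every model has been refuted by the data")
    return {m: u / total for m, u in unnormalized.items()}


# P(data | model): the "refuted" model says the observed data are impossible.
likelihoods = {"refuted": 0.0, "simple": 0.02, "complex": 0.05}

# Two very different priors over the same three models.
skeptical = posterior({"refuted": 0.01, "simple": 0.66, "complex": 0.33}, likelihoods)
enthusiastic = posterior({"refuted": 0.98, "simple": 0.01, "complex": 0.01}, likelihoods)

print(skeptical)
print(enthusiastic)
```

Under either prior the refuted model ends up with posterior probability zero, while the surviving models are weighted by prior times likelihood rather than simply accepted or rejected.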

After several years I have looked at Brad Efron's R. A. Fisher in the 21st century. He provides an interesting chart of statistical techniques:


efron%20statistics%20space.png

Another interesting chart is the following description of the different ideals that each school in statistics is pursuing:


efron%20ideals.png

Many readers might already be familiar with the frequentist and the Bayesian ideologies, but the fiducial approach is often a bit of a mystery. I like to explain it as an analytic version of the parametric bootstrap: assume that θ is the ML parameter estimate; now draw samples of the same size as the data from the model with parameter θ, and do an ML estimate on each of them. This way you will obtain a distribution of θ-reestimates that serves a very similar function to the posterior distribution of θ had one adopted the Bayesian approach. I dislike the fact that we 'guess' the ML estimate of θ in the first place, however, and then proceed by assuming that it is true.
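
Here is a minimal Python sketch of the parametric bootstrap recipe just described, using an exponential model because its ML estimate has a closed form. The waiting times are made-up numbers and the function names are hypothetical. This shows the simulation version of the recipe, not Fisher's analytic fiducial argument.

```python
import random
import statistics


def ml_rate(sample):
    """ML estimate of an exponential rate: n divided by the sum of the data."""
    return len(sample) / sum(sample)


def parametric_bootstrap(sample, n_boot=2000, seed=0):
    """Re-estimate the rate on samples simulated from the fitted model."""
    rng = random.Random(seed)
    rate_hat = ml_rate(sample)  # treat the ML estimate as if it were the truth
    n = len(sample)
    return [
        ml_rate([rng.expovariate(rate_hat) for _ in range(n)])
        for _ in range(n_boot)
    ]


# Hypothetical waiting times, for illustration only.
waiting_times = [0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4, 0.6, 3.1,
                 0.2, 1.1, 0.8, 1.5, 0.5, 2.0, 0.3, 0.9, 1.4, 0.7]

rate_hat = ml_rate(waiting_times)
reestimates = parametric_bootstrap(waiting_times)
print("ML estimate:", round(rate_hat, 3))
print("spread of re-estimates:", round(statistics.stdev(reestimates), 3))
```

The list `reestimates` plays the role of the distribution of θ-reestimates; the line that fixes `rate_hat` before simulating is exactly the "assume it is true" step.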

Jean-Luc pointed me to Anomaly Hunt; or, How To Write a Research Paper. This brings me to the vague topic of what is interesting. They say that you haven't understood a concept until you have been able to explain it to something as dumb as a computer. For that reason, a lot of my past research has dealt with how to take a philosophical concept, such as "interaction", and convert it into a mathematical device. It turned out that the interestingness of an interaction is captured by measuring the additional information we gain by allowing two variables to jointly affect the outcome.
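
One way to make "additional information from allowing two variables to act jointly" concrete is the following Python sketch. It uses empirical mutual information and defines the gain as I(A,B;Y) - I(A;Y) - I(B;Y). That formula and the function names are my reading of the sentence above, not a quotation of the original research.

```python
import math
from collections import Counter


def entropy(items):
    """Empirical Shannon entropy in bits."""
    counts = Counter(items)
    n = len(items)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def interaction_gain(a, b, y):
    """Information about y from (a, b) jointly, beyond what each gives alone."""
    joint = list(zip(a, b))
    return (mutual_information(joint, y)
            - mutual_information(a, y)
            - mutual_information(b, y))


a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
xor_gain = interaction_gain(a, b, [0, 1, 1, 0])   # y = a XOR b
copy_gain = interaction_gain(a, b, [0, 0, 1, 1])  # y = a, b is irrelevant
print(xor_gain, copy_gain)
```

In the XOR case neither variable tells you anything alone but together they determine the outcome, so the gain is a full bit; when the outcome simply copies one variable, the gain is zero.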

The most interesting scientific articles are those that update our internal models of the world the most. An average case does not affect the model much, but an outlier or an extreme event will affect it more, because it updates our intuitive distribution more than an average case would.

Statistical models and spam

| 4 Comments

In the old days, internet technologies were developed for ethical well-behaved people. But when the hordes were unleashed on the internet, the old technologies could not cope with the bad behavior, and it has been very hard to change the underlying fabric of internet standards and protocols. Spam in particular is one of the most annoying problems. Spam filtering is an automated classification of messages (e-mails, but also blog comments, blog trackbacks, instant messages and so on) into the good (ham) and the bad (spam).

The so-called Bayesian filtering was popularized by Paul Graham a few years ago in his essays A Plan for Spam and Better Bayesian Filtering, but it goes back to researchers at Microsoft Research, who first worked on detecting insults in 1997 and then junk in 1998. The traditional way of dealing with spam has been to identify individual words that seem to be overrepresented in spam (Viagra, free, money, casino, games). The appearance of each such word increases the log-odds that the email is spam. On the other hand, the appearance of words related to one's interests and work decreases the log-odds that the email is spam. When we sum up all these log-odds, we obtain a score, which is used to classify the email. This approach is known as naive Bayesian classification. Of course, "Bayesian" here is as in Bayes' rule, not as in Bayesian statistics. Although models such as logistic regression and support vector machines would yield better accuracy for spam filtering, practitioners tell me that naive Bayes is still heavily used in practice because it scales to huge collections of data: given a list of spam emails and a list of non-spam emails, we can figure out how much log-odds to add or subtract for each word (or some other aspect of the message, such as the number of all-uppercase words, or the maximum length of a sequence of exclamation marks).
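
The word-by-word log-odds scoring can be sketched in a few lines of Python. This is a toy, not any particular filter's implementation: the six training messages are invented, the add-one smoothing is my choice, and the function names `train` and `score` are hypothetical.

```python
import math
from collections import Counter


def train(spam_docs, ham_docs, smoothing=1.0):
    """Return the prior log-odds of spam and a per-word log-odds weight."""
    spam_counts = Counter(w for d in spam_docs for w in d.lower().split())
    ham_counts = Counter(w for d in ham_docs for w in d.lower().split())
    vocab = set(spam_counts) | set(ham_counts)
    # Add-one smoothing keeps words seen in only one class from giving
    # infinite log-odds.
    spam_total = sum(spam_counts.values()) + smoothing * len(vocab)
    ham_total = sum(ham_counts.values()) + smoothing * len(vocab)
    weights = {
        w: math.log((spam_counts[w] + smoothing) / spam_total)
           - math.log((ham_counts[w] + smoothing) / ham_total)
        for w in vocab
    }
    prior = math.log(len(spam_docs) / len(ham_docs))
    return prior, weights


def score(message, prior, weights):
    """Sum of log-odds: positive leans spam, negative leans ham."""
    return prior + sum(weights.get(w, 0.0) for w in message.lower().split())


# Invented training messages, for illustration only.
spam = ["free money casino", "viagra free games", "casino games money free"]
ham = ["seminar on bayesian statistics", "draft of the statistics paper",
       "lunch after the seminar"]

prior, weights = train(spam, ham)
spam_score = score("free casino money", prior, weights)
ham_score = score("statistics seminar", prior, weights)
print(spam_score, ham_score)
```

Training is a single pass of counting, which is the scaling advantage mentioned above: adding more mail only means updating two tables of counts.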

My pet peeve with spam filtering is that it doesn't root the problem out. It merely provides an efficient broom to sweep it under the rug. The innocent users are paying for filtering and wasted internet bandwidth, have to keep separating emails into spam and ham, risk the loss of important emails due to spam filters, whereas the spammers incur practically no penalties, only profits. Furthermore, the adaptive aspect of filtering has been used in the adversarial strategy of Bayesian poisoning, where messages that will be classified as spam are made up of purely legitimate words, and where legitimate words are injected into the spam messages. Moreover, spammy messages are now stored in images, which cannot be easily filtered automatically. For these reasons, the effectiveness of spam filters has gone down over the past year or two. Until we get internet postage stamps, internet taxes and internet police, I would prefer vigilante approaches such as the notorious Make Love not Spam. But pessimists like to use the following cannot-solve-spam form.

It is still refreshing to observe new developments. I have come across a paper on the next generation of spam filtering techniques, Spam Filtering Using Statistical Data Compression Models, by some of my former colleagues. Andrej Bratko et al. have found that models based on individual letters outperform the models based on the word counts. For example, their method can employ the indentation pattern "> " which is far more frequent in legitimate emails than in spam. With sufficient training, they would also be able to detect misspellings and foreign languages. Although not rooting the problem out, they can still buy some time.
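
To give a feel for the compression idea, here is a crude Python sketch that swaps in the standard library's zlib (DEFLATE) for the statistical compression models studied in the paper. It asks which training corpus lets a new message be compressed into fewer extra bytes. The two tiny corpora, the test messages, and the function names are all invented for illustration.

```python
import zlib


def extra_bytes(corpus, message):
    """Extra compressed bytes needed to append the message to the corpus."""
    base = len(zlib.compress(corpus.encode("utf-8"), 9))
    combined = len(zlib.compress((corpus + message).encode("utf-8"), 9))
    return combined - base


def classify(message, spam_corpus, ham_corpus):
    """Pick the class whose corpus compresses the message more cheaply."""
    if extra_bytes(spam_corpus, message) < extra_bytes(ham_corpus, message):
        return "spam"
    return "ham"


# Invented training text, for illustration only.
spam_corpus = ("WIN FREE MONEY NOW!!! Click here for cheap viagra and "
               "casino games. Free money, free games, no risk!!! ")
ham_corpus = ("> I read your draft on hierarchical models.\n"
              "> Could we meet after the seminar on Thursday?\n"
              "Sure, I will bring the revised figures for the statistics paper.\n")

spam_label = classify("Click here for FREE MONEY and cheap casino games!!!",
                      spam_corpus, ham_corpus)
ham_label = classify("> Could we meet Thursday after the statistics seminar?\n"
                     "I will bring the draft.",
                     spam_corpus, ham_corpus)
print(spam_label, ham_label)
```

Because the compressor works on raw characters, patterns such as the "> " quoting prefix or runs of exclamation marks are picked up without any tokenization step.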

A taxonomy of visualizations

| 2 Comments

The Visual Literacy project has a wonderful taxonomy of visualizations formatted as a periodic table:

periodictablevisualization2.jpg

Each type of visualization is described in terms of five multi-level attributes:


  • high/low complexity of the visualization ("mass") [updated 1/11/07]

  • data/information/concept/strategy/metaphor/compound visualization

  • process/structure visualization

  • overview/detail/both

  • divergent(exploratory) / convergent(summary) thinking


While I find the examples of data visualization quite limited, it is interesting to see how much wider the scope of visualization is.

They also have a taxonomy/directory of visualization scholars.

I've had problems viewing it in Firefox (the pop-ups are empty), but it works fine in IE. I found this on Information Aesthetics.

Janet Rosenbaum writes:

New York City has recently required restaurants with uniform menus to post calorie content on their menus with a font size equal to the prices. This initiative may not decrease obesity, but if we're able to gather good data, posting calories on menus could help us better understand how people choose food.

Currently, we don't have a good understanding of how people choose what they eat. Observations of people's food choices through nutritional surveys and food diaries tell us only what people will admit to eating. Laboratory experiments tell us how people who volunteer for psychology experiments choose foods in a new environment, but may not generalize to larger populations in real life situations. Non-laboratory experiments with vending machines have found that people will buy more healthy foods when healthy food is "subsidized" and when less healthy food is "taxed", but nutritional information is not immediately available to subjects even in these experiments: the foods which were manipulated were pretty obvious candidates for healthy and unhealthy foods such as carrot sticks and potato chips.

We also don't know how much knowledge about food people have: when someone chooses a high calorie food, we don't know whether they have chosen that food in ignorance of its calorie content or despite its calorie content. Putting calories on the menu in a visible way gives consumers information which is more readily available than on food packages, and reduces the second problem: some people will read the calorie content of their food when making their choices, and the calorie content may influence their choices.

If calorie information becomes widespread, we could even begin to discuss an elasticity of demand according to both the price and calorie content of the food, as well as a willingness to pay for fewer calories. Just thinking about the McDonald's menu, people can minimize the number of calories they eat by choosing either the least expensive (basic hamburger + fries) or the most expensive items on the menu (salads, grilled chicken).

Some have speculated that posting calorie information on the menus won't affect behavior at all because people choosing to eat at places with unhealthy food can't expect lower calories, but that seems naive. After all, even people shopping at expensive stores are somewhat price sensitive, and all retailers go to lengths to make people feel as though they are getting a bargain.

The inclusion of calorie information on menus gives a tremendous opportunity for social scientists, if only we can get sales data suitable for a quasi-experiment (pre-post with control). Any ideas?

My first thought here is that people who eat salad and grilled chicken at McDonald's are probably only at Mickey's in the first place because someone else in their party wants a burger and fries. There's got to be some information on this sort of thing from marketing surveys (although these might not be easily accessible to researchers outside the industry).

My other thought is that it would be great if the food industry and public health establishment could work together on this (see note 4 here).

Measuring model fit

| No Comments

Ahmed Shihab writes,

I have a quick question on clustering validation.

I am interested in the problem of measuring how well a given data set fits a proposed GMM [Gaussian mixture model]. As opposed to the notion of comparing models, this "validation" idea asserts that a GMM already represents a specific mixture of distributions, it already represents an absolute, so we can find out directly if the data fits that representation or not.

In fuzzy clustering, such validity measures abound. But it struck me that in the probabilistic world of GMMs our only measure is the actual sum of probabilities given by the GMM. The closer it is to one, the better. However, if the sum is say 0.69 it can be misleading; when the clusters do not match in populations the bigger cluster, even though it fits badly, adds substantially to the overall probability score and so the overall impression is that there is a good fit.

My response: I don't have much experience with these models, but I recommend simulating replicated datasets from the fitted model and comparing them (visually, and using numerical summaries) to the observed data, as discussed in Chapter 6 of Bayesian Data Analysis.

My other comment is that clusters typically represent choices rather than underlying truth. For an extreme example, consider a simulation of 10,000 points from a unit bivariate normal distribution. This can certainly be considered as a single cluster, but it can also be divided into 50 or 100 or 200 little clusters (e.g., via k-means or any other clustering algorithm). Depending on the purpose, any of these choices can be useful. But if you have a generative model, then you can check it by comparing replications to the data.
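As a sketch of what such a replication check might look like, here is a small Python example for a one-dimensional Gaussian mixture. Everything in it is an assumption for illustration: the function names, the parameter values, and the choice of test statistic. It also simulates from a single set of fitted parameters, whereas a fully Bayesian check would use draws from the posterior distribution.

```python
import random

def simulate_gmm(n, weights, means, sds, rng):
    """Draw n points from a one-dimensional Gaussian mixture."""
    draws = []
    for _ in range(n):
        # Pick a component, then draw from that component's normal distribution.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        draws.append(rng.gauss(means[k], sds[k]))
    return draws

def replication_p_value(observed, weights, means, sds, statistic,
                        n_rep=500, seed=0):
    """Share of replicated datasets whose statistic is at least the observed one."""
    rng = random.Random(seed)
    t_obs = statistic(observed)
    exceed = 0
    for _ in range(n_rep):
        rep = simulate_gmm(len(observed), weights, means, sds, rng)
        if statistic(rep) >= t_obs:
            exceed += 1
    return exceed / n_rep

# Hypothetical fitted mixture: two equal-weight components at 0 and 5.
fitted = dict(weights=[0.5, 0.5], means=[0.0, 5.0], sds=[1.0, 1.0])

# Hypothetical observed data with a heavier right tail than the fitted model.
data_rng = random.Random(1)
observed = simulate_gmm(200, [0.5, 0.5], [0.0, 5.0], [1.0, 3.0], data_rng)

p_max = replication_p_value(observed, statistic=max, **fitted)
```

A value of `p_max` near 0 or 1 says the fitted mixture rarely reproduces the observed maximum. Any summary that matters for the application (cluster sizes, gaps between clusters, tail behavior) can be passed as `statistic`, and plotting a few replications next to the data is usually more informative than any single number.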

We've been extending our work on income and voting to include religion as well. For example:

superplot.png

MS, OH, and CT represent poor, middle-income, and rich states, respectively, and the red, blue, and gray lines on each plot represent frequent church attenders, occasional church attenders, and nonattenders.

Rich people vote more Republican and church attenders vote more Republican, but in addition, the difference between church attenders and nonattenders is greatest among rich people in poor states (as in the Mississippi graph above).

To get more understanding of the patterns of income and religion by state, here's state average religious attendance vs. state average income, with the colors indicating states that went for Bush or Gore in 2000:

st.rel.inc.png

There's a negative correlation, but also regional patterns, with the western states (except for Utah), some northeastern states, and some others below the main line.

And here are the within-state correlations between income and religious attendance (again plotted vs. state average income):

corr.st.rel.inc.png

Religious attendance is positively correlated with income in Mississippi (for example) but negatively correlated in Connecticut. Thus, the zero correlation between income and Republican voting in Connecticut ("What's the matter with Connecticut") is partly explained by the fact that poor people in Connecticut are more religious than rich people in Connecticut. (But this doesn't tell the whole story, as we can see from the graphs at the top of this entry.)

Anyway, we're thinking more about this--other factors to consider are religious denomination, urban/suburban/rural, and ethnicity. Our original goal was to understand the pattern that richer voters go Republican, but richer states support the Democrats, but now there are many more patterns to figure out.

By the way, if you're planning to use our new book in a class this winter/spring, you should email me directly so I can make sure your students get copies of the book.

cover.gif

Our book is finally out! (Here's the Amazon link) I don't have much to say about the book here beyond what's on its webpage, which has some nice blurbs as well as links to the contents, index, teaching tips, data for the examples, errata, and software.

But I wanted to say a little about how the book came to be.

I don't want to hear about it

| No Comments

The 1/4-power transformation

| 12 Comments

I like taking logs, but sometimes you can't, because the data have zeroes. So sometimes I'll take the square root (as, for example, for the graphs on the cover of our book). But often the square root doesn't seem to go far enough. Also it lacks the direct interpretation of the log. Peter Hoff told me that Besag prefers the 1/4-power. This seems like it might make sense in practice, although I don't really have a good way of thinking about it--except maybe to think that if the dynamic range of the data isn't too large, it's pretty much like taking the log. But then why not the 1/8 power? Maybe then you get weird effects near zero? I haven't ever really seen a convincing treatment of this problem, but I suspect there's some clever argument that would yield some insight.

Here's my first try: a set of plots of various powers (1/8, 1/4, 1/2, and 1, i.e., eighth-root, fourth-root, square-root, and linear), each plotted along with the logarithm, from 0 to 50:

4powers.png

OK, here's the point. Suppose you have data that mostly fall in the range [10, 50]. Then the 1/4 power (or the 1/8 power) is a close fit to the log, which means that the coefficients in a linear model on the 1/4 power or the 1/8 power can be interpreted as multiplicative effects.

On this scale, the difference between either of these powers and the log occurs at the low end of the scale. As x goes to 0, the log actually goes to negative infinity, and the powers go to zero. The big difference between the 1/8 power and the 1/4 power is that, relative to the spread of the rest of the data, the x-points near 0 are mapped much further away from the rest for the 1/8 power than for the 1/4 power.

An argument in favor of the 1/4-power transformation thus goes as follows:

First, the 1/4 power maps closely to the log over a reasonably large range of data (a factor of 5, for example from 10 to 50). Thus, an additive model on the 1/4-power scale approximately corresponds to a multiplicative model, which is typically what we want. (In contrast, the square-root does not map so well, and so a model on that scale is tougher to interpret.)

Second, on the 1/4-power scale, the x-points near zero map reasonably close to the main body of points. So we're not too worried about these values being unduly influential in our analysis. (In contrast, the 1/8-power takes x=0 and puts it so far away from the other data that I could imagine it messing up the model.)
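Both halves of this argument can be checked numerically. Here is a quick sketch in Python; the function names and the two summaries are my own choices. It measures how close each power is to a linear function of the log over [10, 50] (by correlation on an evenly spaced grid), and how far x=0 lands from x=10 in units of the transformed distance from 10 to 50.

```python
import math

def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def compare_power_to_log(power, lo=10.0, hi=50.0, n=200):
    """Return (correlation with log on [lo, hi], relative distance of zero)."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    transformed = [x ** power for x in xs]
    logs = [math.log(x) for x in xs]
    r = correlation(transformed, logs)
    # Zero maps to zero, so its distance from the low end of the data is
    # lo**power; divide by the transformed width of the main body of data.
    zero_gap = (lo ** power) / (hi ** power - lo ** power)
    return r, zero_gap

results = {p: compare_power_to_log(p) for p in (1/8, 1/4, 1/2, 1)}
for p, (r, gap) in results.items():
    print(f"power {p:5.3f}: corr with log = {r:.4f}, zero gap = {gap:.2f}")
```

On the second measure, zero sits about two data-widths below the main body on the 1/4-power scale, compared with about four and a half on the 1/8-power scale and under one on the square-root scale. The correlation with the log moves the other way, improving as the power shrinks, which is the tradeoff the argument turns on.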

Could this argument be made more formally? I've never actually used the 1/4 power, but I'm wondering if maybe it really is a good idea.

P.S. Just to clear up a few things: the purpose of a transformation, as I see it, is to get a better fit and better interpretability for an additive model. Students often are taught that transformations are about normality, or about equal variance, but these are chump change compared to additivity and interpretability.


About this Archive

This page is an archive of entries from January 2007 listed from newest to oldest.
