December 2004 Archives

How does one measure the fit of a model to data? Suppose the data are (y_1,...,y_n), and the estimates from the model are (x_1,...,x_n). Then one can simply measure fit by the correlation of x and y, or by the root-mean-squared error (the square root of the average of the (y_i-x_i)^2's).

When the n data points have structure, however, such simple pointwise error measures may miss the big picture. For example, suppose x and y are time series (that is, the n points are in a sequence), and x is a perfect predictor of y but just lagged by 2 time points (so that x_1=y_3, x_2=y_4, x_3=y_5, and so forth). Then we'd rather describe our error as "a lag of 2" than look at the unlagged pointwise errors.

More generally, the lag need not be constant; thus, for example, there could be an error in the lag with standard deviation 1.3 time units, and an error in the prediction (after correcting for the lag) with standard deviation 0.4 units in the scale of y. Hence the title of this entry.
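One way to make the simplest version of this concrete: search over integer lags for the one that minimizes pointwise error, then report the residual error at that lag. A rough Python sketch (the search range and example series are my own invention, not the method of the paper):

```python
def best_lag(y, x, max_lag=5):
    """Find the integer lag k minimizing mean squared error between
    y[t] and x[t-k], searching k in [-max_lag, max_lag]."""
    def mse_at(k):
        pairs = [(y[t], x[t - k]) for t in range(len(y))
                 if 0 <= t - k < len(x)]
        return sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    return min(range(-max_lag, max_lag + 1), key=mse_at)

# x predicts y perfectly, but shifted by 2 time points: x[t] = y[t+2]
y = [0, 1, 4, 9, 16, 25, 36, 49]
x = y[2:] + [64, 81]
print(best_lag(y, x))  # -> 2
```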

We have applied this idea to examples in time series and spatial statistics. Summarizing fitting error by a combination of distortion and additive error seems like a useful idea. It should be possible to do more by further decomposing fitting error.

For more, see the paper by Cavan Reilly, Phil Price, Scott Sandgathe, and myself (to appear in the journal Biometrics).

What is the value of a life?


What is the value of a life, and can it be estimated by finding a wage premium for risk?

In statistics, we learn about Type 1 and Type 2 errors. For example, from an intro stat book:

A Type 1 error is committed if we reject the null hypothesis when it is true.

A Type 2 error is committed if we accept the null hypothesis when it is false.

(Usually these are written as I and II, in the manner of World Wars and Super Bowls, but to keep things clean with later notation I'll stick with 1 and 2.)
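As a quick illustration of the Type 1 error rate, here is a small simulation in Python: a two-sided z-test at level 0.05, applied repeatedly to data generated under the null hypothesis, rejects (that is, commits a Type 1 error) about 5% of the time. The particular test, sample size, and seed are arbitrary choices for the demonstration:

```python
import math
import random

random.seed(1)

def z_test_rejects(sample, mu0=0.0):
    """Two-sided z-test of H0: mean = mu0, assuming known sd = 1."""
    n = len(sample)
    z = (sum(sample) / n - mu0) * math.sqrt(n)
    return abs(z) > 1.96  # critical value for alpha = 0.05

# Under the null (true mean 0), every rejection is a Type 1 error.
trials = 2000
errors = sum(z_test_rejects([random.gauss(0, 1) for _ in range(20)])
             for _ in range(trials))
rate = errors / trials
print(rate)  # close to 0.05
```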

Actually, though . . .

There has been some discussion about adjusting public opinion polls for party identification (for example, see this page by Alan Reifman, which I found in a Google search). Apparently there has been some controversy over the idea as it was applied in the 2004 Presidential election campaign. Setting aside details of recent implementations, adjusting for party ID is in general a good idea, although it's not as easy as adjusting for characteristics such as sex, age, and ethnicity, whose population proportions are well-estimated from the Census (and which change very little, if at all, during an election campaign).
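To illustrate what such an adjustment does, here is a toy poststratification calculation in Python; all the party shares and poll numbers below are invented. The catch, as noted above, is that unlike sex, age, and ethnicity, the population shares of party ID are not known from the Census and can themselves shift during a campaign:

```python
# Hypothetical poll: support for a candidate within each party-ID group,
# reweighted to assumed population shares (all numbers invented).
poll = {            # group: (respondents, support rate in poll)
    "Dem": (400, 0.90),
    "Rep": (300, 0.10),
    "Ind": (300, 0.50),
}
population_share = {"Dem": 0.35, "Rep": 0.35, "Ind": 0.30}

# Unadjusted estimate: weight each group by its share of respondents.
raw = sum(n * p for n, p in poll.values()) / sum(n for n, _ in poll.values())
# Adjusted estimate: weight each group by its assumed population share.
adjusted = sum(population_share[g] * p for g, (_, p) in poll.items())
print(round(raw, 3), round(adjusted, 3))  # raw vs. adjusted
```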

The "law of parsimony"?


Speaking of parsimony, I came across the following quotation from Commentary magazine (page 80 in the December 2004 issue):

The law of parsimony tells us that when there are alternative explanations of events, the simplest one is likely to be correct.

Commentary is a serious magazine, and this quotation (which I disagree with!) makes me wonder whether this idea of a scientific "law" is common among serious literary and political critics.

Hierarchical modeling is gradually being recognized as central to Bayesian statistics. Why? Well, one way of looking at it is that any given statistical model or estimation procedure will be applied to a series of problems--not just once--and any such set of problems corresponds to some distribution of parameter values which can themselves be modeled, conditional on any other available information. (This is the "meta-analysis" paradigm.)
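To make the idea slightly more concrete, here is a toy partial-pooling calculation in Python: each group's estimate is shrunk toward the grand mean, with the amount of shrinkage determined by the within-group and between-group variances (both simply assumed known here, and all numbers invented):

```python
# Toy partial pooling: shrink each group mean toward the grand mean.
group_means = {"A": 10.0, "B": 14.0, "C": 6.0}
n_per_group = 5
sigma2_within = 25.0   # variance of one observation within a group
tau2_between = 4.0     # variance of true group means around the grand mean

grand = sum(group_means.values()) / len(group_means)
# Shrinkage factor: larger when group means are noisy relative to
# the between-group variation.
shrink = (sigma2_within / n_per_group) / (sigma2_within / n_per_group
                                          + tau2_between)
pooled = {g: shrink * grand + (1 - shrink) * m
          for g, m in group_means.items()}
print({g: round(v, 2) for g, v in pooled.items()})
# each group mean is pulled part of the way toward the grand mean of 10
```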

Fully Bayesian Computing


Introducing A Programming Tool for Bayesian Data Analysis and Simulation using R.

Our new application for the data analysis program R eliminates most of the tedium in Bayesian simulation post-processing.

Now you can draw simulations from a posterior predictive distribution with a single line of code. You can pass random arguments to already existing functions such as mean() and sum(), and obtain simulations of distributions that you can summarize simply by typing the name of a variable on the console. It is also possible to plot credible intervals of a random vector y simply by typing plot(y)...

By enabling "random variable objects" in R, summarizing and manipulating posterior simulations will be as easy as dealing with regular numerical vectors and matrices.
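To give a flavor of the idea (the actual tool is written in R; this is just my rough Python analogue, with invented names): a random variable object carries its posterior simulation draws, and arithmetic operates draw-by-draw, so derived quantities automatically carry their simulation-based uncertainty.

```python
import random

class Rv:
    """A random variable represented by posterior simulation draws."""
    def __init__(self, draws):
        self.draws = list(draws)
    def __add__(self, other):
        o = other.draws if isinstance(other, Rv) else [other] * len(self.draws)
        return Rv(a + b for a, b in zip(self.draws, o))
    def __mul__(self, other):
        o = other.draws if isinstance(other, Rv) else [other] * len(self.draws)
        return Rv(a * b for a, b in zip(self.draws, o))
    def mean(self):
        return sum(self.draws) / len(self.draws)
    def interval(self, prob=0.95):
        s = sorted(self.draws)
        lo = int(len(s) * (1 - prob) / 2)
        return s[lo], s[-lo - 1]

random.seed(0)
theta = Rv(random.gauss(2.0, 0.5) for _ in range(4000))
y_new = theta * 3 + 1            # derived quantity, uncertainty propagated
print(round(y_new.mean(), 1))    # close to 7
```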

Read all about it in our new paper, "Fully Bayesian Computing."

The first beta version of the program will be released soon.

Our radon risk page (created jointly with Phil Price of the Indoor Environment Division, Lawrence Berkeley National Laboratory), is fully functional again.

You can now go over to the map, click on your state and then your county, give information about your house, give your risk tolerance (or use the default value), and get a picture of the distribution of radon levels in houses like yours. You also get an estimate of the dollar costs and lives saved from four different decision options along with a decision recommendation. (Here's an example of the output.)

We estimate that if all homeowners in the U.S. followed the instructions on this page, there would be a net savings of about $10 billion (with no additional loss of life) compared to what would happen if everybody followed the EPA's recommendation.
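For a sense of how such a decision recommendation can be computed, here is a toy expected-cost comparison in Python. All of the probabilities and dollar figures below are invented for illustration; they are not the values used on the site:

```python
# Toy expected-cost comparison of radon decision options
# (all numbers invented, not the site's actual values).
p_high = 0.1            # assumed probability the home's radon level is high
cost_test = 50.0        # cost of a measurement
cost_remediate = 2000.0
loss_if_high_untreated = 5000.0  # monetized risk of living with high radon

options = {
    "do nothing": p_high * loss_if_high_untreated,
    "remediate now": cost_remediate,
    "test, then remediate if high": cost_test + p_high * cost_remediate,
}
best = min(options, key=options.get)
print(best, options[best])
```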

Wacky computer scientists


Aleks pointed us to an interesting 1997 article on the foundations of statistical inference by Walter Kirchherr, Ming Li, and Paul Vitanyi. It's an entertaining article in which they discuss the strategy of putting a prior distribution on all possible models, with higher prior probabilities for models that can be described more concisely, thus linking Bayesian inference with Occam's razor.

I'm not convinced, though. They've convinced me that their model has nice mathematical properties, but I don't see why it should work for problems I've worked on, such as estimating radon levels, incumbency advantage, or the probability of having a death sentence overturned, or whatever.

Mark Hansen and Bin Yu have worked on applying this "minimum description length" idea to regression modeling, and I think it's fair to say that these ideas are potentially very useful without being automatically correct or optimal in the sense that seems to be implied by Kirchherr et al. in the paper linked to above.
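For concreteness, here is a toy Python version of the description-length prior: each model gets prior mass proportional to 2^(-L), where L is its code length in bits, and this prior is then combined with each model's likelihood of the data. The models, code lengths, and likelihoods are all invented:

```python
# Toy description-length prior: shorter models get more prior mass
# (all code lengths and likelihoods invented for illustration).
code_length = {"constant": 3, "linear": 6, "quadratic": 9}  # bits

weights = {m: 2.0 ** (-L) for m, L in code_length.items()}
z = sum(weights.values())
prior = {m: w / z for m, w in weights.items()}

# Posterior combines the prior with each model's likelihood of the data.
likelihood = {"constant": 0.01, "linear": 0.20, "quadratic": 0.25}
post_unnorm = {m: prior[m] * likelihood[m] for m in prior}
zp = sum(post_unnorm.values())
posterior = {m: v / zp for m, v in post_unnorm.items()}

best = max(posterior, key=posterior.get)
print(best)  # -> linear: the data can overcome the simplicity prior
```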

We had our first annual statistical teaching, application, and research conference here at Columbia last Friday. The goal of the conference was to bring together people at Columbia who do quantitative research, or who teach statistics, but are spread out among many departments and schools (including biology, psychology, law, medical informatics, economics, political science, sociology, social work, business, and many others, as well as statistics and biostatistics).

Both the application/research and teaching sessions went well, with talks that were of general interest but went into some depth, and informed and interesting discussions.

Multiple imputation is the standard approach to accounting for uncertainty about missing or latent data in statistics. Multiple imputation can be considered as a special case of Bayesian posterior simulation, but its focus is not so much on the imputation model itself as on the imputations themselves, which can be used, along with the observed data, in subsequent "completed-data analyses" of the dataset that would have been observed (under the model) had there been no missingness.
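For a scalar estimand, the standard way to combine the completed-data analyses is Rubin's rules; here is a minimal Python sketch (the estimates and variances below are invented):

```python
import math

def pool(estimates, variances):
    """Rubin's rules: combine point estimates and variances from m
    completed-data analyses into one estimate and a total variance."""
    m = len(estimates)
    qbar = sum(estimates) / m                    # pooled point estimate
    w = sum(variances) / m                       # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between
    t = w + (1 + 1 / m) * b                      # total variance
    return qbar, t

# Suppose five imputed datasets each gave an estimate and its variance:
qbar, t = pool([1.9, 2.1, 2.0, 2.2, 1.8], [0.04, 0.05, 0.04, 0.06, 0.05])
print(round(qbar, 2), round(math.sqrt(t), 2))  # estimate, standard error
```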

How can we check the fit of models in the presence of missing data?

Too many polls


The U.S. is over-polled. You might have noticed this during the recent election campaign when national polls were performed roughly every 2 seconds. (This graph shows the incredible redundancy just from the major polling organizations.)

It would be interesting to estimate the total number of persons polled during the last year. A few years ago, I refereed a paper for Public Opinion Quarterly reporting on a survey that asked people, How many times were you surveyed in the past year? I seem to recall that the average response was close to 1.

My complaint is not new, but this recent campaign was particularly irritating because it became commonplace for people to average batches of polls to get more accurate estimators. As news consumers, we're like gluttons stuffing our faces with 5 potato chips at a time, just grabbing them out of the bag.
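The appeal of averaging is easy to see: under the (optimistic) assumption of independent simple random samples with no house effects, pooling five polls of 1000 shrinks the standard error by a factor of sqrt(5). In Python:

```python
import math

def se(p, n):
    """Standard error of a proportion p from a simple random sample of n."""
    return math.sqrt(p * (1 - p) / n)

# One poll of 1000 vs. an average of five such polls (assuming
# independent simple random samples with no house effects).
p = 0.5
print(round(se(p, 1000), 4))      # single poll
print(round(se(p, 5 * 1000), 4))  # pooled polls: smaller by sqrt(5)
```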

Against parsimony


A lot has been written in statistics about "parsimony"--that is, the desire to explain phenomena using fewer parameters--but I've never seen any good general justification for parsimony. (I don't count "Occam's Razor," or "Ockham's Razor," or whatever, as a justification. You gotta do better than digging up a 700-year-old quote.)

Maybe it's because I work in social science, but my feeling is: if you can approximate reality with just a few parameters, fine. If you can use more parameters to fold in more information, that's even better.

In practice, I often use simple models--because they are less effort to fit and, especially, to understand. But I don't kid myself that they're better than more complicated efforts!

My favorite quote on this comes from Radford Neal's book, Bayesian Learning for Neural Networks, pp. 103-104:

Sometimes a simple model will outperform a more complex model . . . Nevertheless, I believe that deliberately limiting the complexity of the model is not fruitful when the problem is evidently complex. Instead, if a simple model is found that outperforms some particular complex model, the appropriate response is to define a different complex model that captures whatever aspect of the problem led to the simple model performing well.


P.S. regarding the title of this entry: there's an interesting paper by Albert Hirschman with this title.

Eduardo writes:

Blog, take two


I'm sure most of you noticed that our blog disappeared for a while last week.

Some f&*^ing kid in Michigan of all places hacked into my account through the Wiki. I think the Wiki security problems are now fixed, and have also learned the hard way not to rely on anyone else to back things up!

Anyway, I just wanted to let everyone know that the blog is now functional again, and getting close to being back to its old glory. Most if not all of the entries are back up. I'll work on comments tomorrow. The uploaded files and links were lost. I'll replace the ones I have access to, but you might want to check your own entries and update links and pictures (the same goes for the Wikis). All authors have the same user names as before, and passwords have been set back to the original ones I made up (let me know if you don't know your password).

I apologize for the interruption!



About this Archive

This page is an archive of entries from December 2004 listed from newest to oldest.

November 2004 is the previous archive.

January 2005 is the next archive.
