Don't let this distract you from the more serious items on this blog.

# June 2006 Archives

The blog is fully working, so your comments will be processed again. And have a fun 4th-of-July weekend!

As discussed here, I've been interested in finding studies of the costs and benefits of approvals of new medical treatments, not in the narrow sense of the costs and benefits to those being treated but in the sense of the larger balance sheet, including costs of running the study, risks to participants, and likely gains to the general population. (For example, approving a study early allows for potentially more gains to the general population but also more risks of unforeseen adverse events.)

Jim Hammitt pointed me to this paper by Tomas J. Philipson, Ernst R. Berndt, Adrian H. B. Gottschalk, and Matthew W. Strobeck, entitled "Assessing the Safety and Efficacy of the FDA: The Case of the Prescription Drug User Fee Acts." Here's the summary of the paper, and here's the abstract:

The US Food and Drug Administration (FDA) is estimated to regulate markets accounting for about 20% of consumer spending in the US. This paper proposes a general methodology to evaluate FDA policies, in general, and the central speed-safety tradeoff it faces, in particular. We apply this methodology to estimate the welfare effects of a major piece of legislation affecting this tradeoff, the Prescription Drug User Fee Acts (PDUFA). We find that PDUFA raised the private surplus of producers, and thus innovative returns, by about $11 to $13 billion. Dependent on the market power assumed of producers while having patent protection, we find that PDUFA raised consumer welfare between $5 to $19 billion; thus the combined social surplus was raised between $18 to $31 billions. Converting these economic gains into equivalent health benefits, we find that the more rapid access of drugs on the market enabled by PDUFA saved the equivalent of 180 to 310 thousand life-years. Additionally, we estimate an upper bound on the adverse effects of PDUFA based on drugs submitted during PDUFA I/II and subsequently withdrawn for safety reasons, and find that an extreme upper bound of about 56 thousand life-years were lost. We discuss how our general methodology could be used to perform a quantitative and evidence-based evaluation of the desirability of other FDA policies in the future, particularly those affecting the speed-safety tradeoff.

I haven't read the paper (that takes more effort than linking to it!) but I like that they're trying to measure all the costs and benefits quantitatively.

I got the following (unsolicited) email from a publisher today:

We are developing a new, introductory statistics textbook with a data analysis approach, and would value your answers to our brief survey regarding the proposed table of contents (attached). . . .

Having read the table of contents (see below), all I can say is . . . yuck! It's gotta be tough being a book publisher if you're expected to be coming up with new intro texts all the time.

OK, here it is:

Hybrid Monte Carlo is not a new energy-efficient auto race. It's a computational method developed by physicists to improve the efficiency of random-walk simulation (i.e., the Metropolis algorithm) by adding auxiliary variables that characterize the "momentum" of the simulation path. I saw Radford Neal give a talk on this over 10 years ago, and it made a lot of sense to me. (See here, for example.)

My question is: why isn't hybrid Monte Carlo used all the time in statistics? I can understand that it can be difficult to program, but why isn't it in software such as Bugs, where things only have to be programmed once? Even if it doesn't solve all problems, shouldn't it be an improvement over basic Metropolis?
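To make the idea concrete, here is a minimal sketch of hybrid Monte Carlo in Python for a one-dimensional standard normal target. The target, the step size, the number of leapfrog steps, and all function names are my own choices for illustration; this is not how Bugs or any particular package does it.

```python
import math
import random

def neg_log_density(q):
    # Illustrative target: standard normal, so -log p(q) = q^2 / 2 up to a constant.
    return 0.5 * q * q

def grad_neg_log_density(q):
    return q

def hmc_step(q, step_size, n_leapfrog, rng):
    """One hybrid Monte Carlo update for a one-dimensional target."""
    p = rng.gauss(0.0, 1.0)  # fresh auxiliary "momentum" variable
    q_new, p_new = q, p
    # Leapfrog integration: half step in p, alternating full steps, half step in p.
    p_new -= 0.5 * step_size * grad_neg_log_density(q_new)
    for i in range(n_leapfrog):
        q_new += step_size * p_new
        if i < n_leapfrog - 1:
            p_new -= step_size * grad_neg_log_density(q_new)
    p_new -= 0.5 * step_size * grad_neg_log_density(q_new)
    # Metropolis accept/reject based on the change in total "energy".
    h_old = neg_log_density(q) + 0.5 * p * p
    h_new = neg_log_density(q_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new
    return q

def run_hmc(n_draws=5000, step_size=0.25, n_leapfrog=8, seed=1):
    rng = random.Random(seed)
    q = 0.0
    draws = []
    for _ in range(n_draws):
        q = hmc_step(q, step_size, n_leapfrog, rng)
        draws.append(q)
    return draws

draws = run_hmc()
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

The momentum lets each update travel a long way along the target instead of taking one small random-walk step. The catch is the need for a gradient and for tuning the step size and path length, which may be part of the answer to the question above.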

Gregor sent another question:

The comment file got corrupted, so we're trying to figure out how to fix it. In the meantime, the blog is not currently displaying comments. It appears to be storing the comments, however, so I hope we'll get it fixed within a few days.

Jennifer writes,

You may want to check out the website for the Indianapolis Public Schools. It has several nice features. They have school and district report cards online. Also if you are looking at a school "snapshot" there is an interesting section called "Delve deeper into the data" which allows the user to do several things, one of which is to compare this school to similar schools where the user can define what characteristics to use when defining similarity (about a dozen characteristics including: number of students, avg % passing their standardized tests, attendance rates, schedule time, grade span, ethnic composition, % free lunch and several others including things like "school improvement models"). It might be a nice model for things we are working towards.

Here's an example (for the "George Washington Carver School"). I hate the pie chart and 3-D bar charts (of course), but they do allow access to quite a bit of data as well as comparisons such as here. It makes me realize how little information is available from Columbia (or other universities).

This item reminds me of the time I was riding on the New Jersey Transit train, sitting next to a 6-foot-2-inch woman. It turned out she played the role of Miss Frizzle on the traveling production of The Magic School Bus. She said the kids on the show are played by short adult actors.

I saw this in the New York Times today: A CBS News poll asked the following question:

Should U.S. troops stay in Iraq as long as it takes to make sure Iraq has a stable democracy, even if it takes a long time, or leave as soon as possible, even if Iraq is not completely stable?

I seem to recall some advice in the sample survey literature about not asking double-barrelled questions (here, the assumption that U.S. troops will "make sure Iraq has a stable democracy," along with the question of how long the troops should stay). In any case, it seems like a good example of a problem with question wording.

Incidentally, the Times feature on this poll (it was only a paragraph, not a full article) did not point out the problem with the question wording, and it also featured a yucky graph (as Tufte would put it, "chartjunk").

Jay Goodliffe writes,

I recently read your paper on scaling coefficients that you posted on the PolMeth site. I hope you don't mind if I send a comment/question on your manuscript. I usually like to include some sort of "substantive significance" table after the regular tables to report something like first differences. I have also thought recently about how to compare relative effects of variables when some variables are binary and others are not.

My current approach is to code all binary variables with the modal category as 0, set all variables to their median, and then see how the predicted dependent variable changes when each independent variable is moved to the 90th percentile, one at a time. This approach makes it easy to specify the "baseline" observation, so there are no .5 Female voters, which occurs if all variables are set to the mean instead. There are, of course, some problems with this. First, you need all of the binary variables to have at least 10% of the observations in each category. Second, it's not clear this is the best way to handle skewed variables. But it is similar in kind to what you are suggesting.

My comment is that your approach may not always work so well for skewed variables. With such variables, the range mean +/- s.d. will be beyond the range of observed data. Indeed, in your NES example, Black is such a variable. In linear models, this does not matter since you could use the range [mean, mean + 2 s.d.] and get the same size effect. But it might matter in non-linear models, since it matters what the baseline is. And there is something less...elegant in saying that you are moving Black from -0.2 to 0.5, rather than 0 to 1.

My question is: You make some comments in passing that you prefer to present results graphically. Could you give me a reference to something that shows your preferred practice?

Thanks.

--Jay

P.S. I've used tricks from _Teaching Statistics_ book in my undergraduate regression class.

To start with, I like anyone who uses our teaching tricks, and, to answer the last question first, here's the reference to my preferred practice on making graphs instead of tables.

On to the more difficult questions: There are really two different issues that Jay is talking about:

1. What's a reasonable range of variation to use in a regression input, so as to interpret how much of its variation translates into variation in y?

2. How do you summarize regressions in nonlinear models, such as logistic regression?

For question 1, I think my paper on scaling by dividing by two sd's provides a good general answer: in many cases, a range of 2 sd's is a reasonable low-to-high range. It works for binary variables (if p is not too far from .5) and also for many continuous variables (where the mean-sd is a low value, and the mean+sd is a high value). For this interpretation of standardized variables, it's not so important that the range be mean +/- 1sd; all that matters is the total range. (I agree that it's harder to interpret the range for a binary variable where p is close to 0 or 1 (for example, the indicator for African American), but in these cases, I don't know that there's any perfect range to pick--going from 0 to 1 seems like too much, it's overstating the reasonable changes that could be expected--and I'm happy with 2 sd's as a choice.)

For question 2, we have another paper just on the topic of these predictive comparisons. The short answer is that, rather than picking a single center point to make comparisons, we average over all of the data, considering each data point in turn as a baseline for comparisons. (I'll have to post a blog entry on this paper too....)
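Here is a much-simplified Python sketch of the difference between the two summaries. The logistic regression coefficients and the data values are made up for illustration, and the simple unweighted average below is only a stand-in for the procedure in the paper.

```python
import math
import statistics

def invlogit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical fitted model: Pr(y=1) = invlogit(B0 + B_U*u + B_V*v).
B0, B_U, B_V = -1.0, 1.0, 2.0

def predict(u, v):
    return invlogit(B0 + B_U * u + B_V * v)

def comparison_at_baseline(u_lo, u_hi, v_values):
    """Change in Pr(y=1) when u goes from lo to hi, holding v at its median."""
    v0 = statistics.median(v_values)
    return predict(u_hi, v0) - predict(u_lo, v0)

def average_comparison(u_lo, u_hi, v_values):
    """The same change, averaged over every observed v taken in turn as the baseline."""
    diffs = [predict(u_hi, v) - predict(u_lo, v) for v in v_values]
    return sum(diffs) / len(diffs)

# Made-up, skewed values of the other input v (illustration only).
v_values = [0.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.5, 1.0, 2.0, 4.0]

single = comparison_at_baseline(0.0, 1.0, v_values)
averaged = average_comparison(0.0, 1.0, v_values)
print(f"at the median baseline: {single:.3f}; averaged over the data: {averaged:.3f}")
```

In a linear model the two numbers would be identical. In a nonlinear model they can differ because the effect of u on the probability scale depends on where the other inputs put you on the curve.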

Following the link from Jon Baron's site, I found this interesting blog from the American Journal of Bioethics.

Interpretation of regression coefficients is sensitive to the scale of the inputs. One method often used to place input variables on a common scale is to divide each variable by its standard deviation. Here we propose dividing each variable by *two* times its standard deviation, so that the generic comparison is with inputs equal to the mean +/- 1 standard deviation. The resulting coefficients are then directly comparable for untransformed binary predictors. We have implemented the procedure as a function in R. We illustrate the method with a simple public-opinion analysis that is typical of regressions in social science.

Here's the paper, and here's the R function.

Standardizing is often thought of as a stupid sort of low-rent statistical technique, beneath the attention of "real" statisticians and econometricians, but I actually like it, and I think this 2 sd thing is pretty cool.
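For readers who want the idea without opening the R function, here is a minimal Python sketch of the rescaling. It is not the R function linked above; the function names and the data are made up, and subtracting the mean is a convenience I've assumed here.

```python
import math

def sample_sd(values):
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def rescale_2sd(values):
    """Center a numeric input and divide by two standard deviations."""
    mean = sum(values) / len(values)
    two_sd = 2.0 * sample_sd(values)
    return [(v - mean) / two_sd for v in values]

# Made-up inputs for illustration: an age-like variable and a binary indicator.
age = [22, 35, 41, 48, 53, 60, 67, 74]
female = [0, 1, 0, 1, 1, 0, 1, 0]  # binary predictor, left untransformed

age_z = rescale_2sd(age)
print(f"sd of rescaled age: {sample_sd(age_z):.3f}")
print(f"sd of binary input: {sample_sd(female):.3f}")
```

After rescaling, the continuous input has a standard deviation of 0.5, which is about the standard deviation of a binary variable with p near .5. A one-unit change in the rescaled input (a 2-sd change) is then roughly comparable to switching the binary input from 0 to 1.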

The posters for the second mid-Atlantic causal modeling conference have been listed (thanks to Dylan Small, who's organizing the conference). The titles all look pretty interesting, especially Egleston's and Small's on intermediate outcomes. Here they are:

Aleks pointed me to this report by Scott Keeter of the Pew Research Center. First the quick pictures:

These should not be a surprise (given that there are tons of surveys that ask about age, voting, and party ID) but it's interesting to see the pictures. Peak Republican age is about 46--that's a 1960 birthday, meaning that your first chances to vote were the very Republican years of 1978 and 1980, when everybody hated Jimmy Carter.

The Pew report also had information on political participation:

As expected, the under-30s vote a lot less than other Americans. But a lot more of them try to persuade others, which is interesting and is relevant to our studies of political attitudes in social networks.

P.S. The graphs are pretty good, although for party id vs age, I would get rid of those dotted lines and clean up the axes to every 10% on the y-axis and every 10 years on x. The table should definitely be made into a graph. The trouble is, it takes work to make the graph and you wouldn't really get any credit for doing it. That's why we'd like a general program that makes such tables into graphs.

I thought this was amusing.

Iven Van Mechelen told me about this postdoctoral research position in the psychology department at the University of Leuven. They're a great research group; I went on sabbatical there 10 years ago and have been collaborating with them since (here's my favorite paper of ours).

Here's the paper, coauthored with Jeff Fagan and Alex Kiss, which will appear in the Journal of the American Statistical Association (see earlier entry). We analyzed data from 15 months of street stops by NYC police in the late 1990s. The short version:

- Blacks (African-Americans) and Hispanics (Latinos) were stopped more than whites (European-Americans) in comparison to the prevalence of each group in the population.

- The claim has been made that these differences merely reflect differences in crime rates between these groups, but there was still a disparity when instead the comparison was made to the number of each group arrested in the previous year.

- The claim has been made that these differences are merely geographic--that police make more stops of *everyone* in high-crime areas--but the disparity remained (actually, increased) when the analysis also controlled for the precinct of the arrest.

In the years since this study was conducted, an extensive monitoring system was put into place that would accomplish two goals. First, procedures were developed and implemented that permitted monitoring of officers’ compliance with the mandates of the NYPD Patrol Guide for accurate and comprehensive recording of all police stops. Second, the new forms were entered into databases that would permit continuous monitoring of the racial proportionality of stops and their outcomes (frisks, arrests).

Monte Carlo is the ubiquitous little beast of burden in Bayesian statistics. Val points to the article by Nick Metropolis, "The Beginning of the Monte Carlo Method" (Los Alamos Science, No. 15, p. 125, 1987), about his years at Los Alamos (1943-1999) with Stan Ulam, Dick Feynman, Enrico Fermi and others. Some excerpts:

I've become increasingly convinced of the importance of treatment interactions--that is, models (or analyses) in which a treatment effect is measurably different for different units. Here's a quick example (from my 1994 paper with Gary King):

But there are lots more: see this talk.

Given all this, I was surprised to read Simon Jackman's blog describing David Freedman's talk at Stanford, where Freedman apparently said, "the default position should be to analyze experiments as experiments (i.e., simple comparison of means), rather than jamming in covariates and even worse, interactions between covariates and treatment status in regression type models."

Well, I agree with the first part--comparison of means is the best way to start, and that simple comparison is a crucial part of any analysis--for observational or experimental data. (Even for observational studies that need lots of adjustment, it's a good idea to compute the simple difference in averages, and then understand/explain how the adjustment changes things.) But why is it "even worse" to look at treatment interactions??? On the contrary, treatment interactions are often the most important part of a study!

I've already given one example--the picture above, where the most important effect of redistricting is to pull the partisan bias toward zero. It's not an additive effect at all. For another example that came up more recently, we found that the coefficient of income, in predicting vote, varies by state in interesting ways:

Now, I admit that these aren't experiments: the redistricting example is an observational study, and the income-and-voting example is a descriptive regression. But given the power of interactions in understanding patterns in a nonexperimental context, I don't see why anyone would want to abandon this tool when analyzing experiments. Simon refers to this as "fritzing around with covariate-adjustment via modeling" but in these examples, interactions are more important than the main effects.

**Interactions are important**

Dave Krantz has commented to me that it is standard in psychology research to be interested in interactions, typically 3-way interactions actually. The point is that, in psychology, the main effects are obvious; it's the interactions that tell you something.

To put it another way, the claim is that the simple difference in means is the best thing to do. This advice is appropriate for additive treatment effects. I'd rather not make the big fat assumption of additivity if I can avoid it; I'd rather look at interactions (to the extent possible given the data.)
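A small simulated experiment shows what the simple comparison can hide. This is a sketch in Python with made-up numbers (the effect sizes, sample size, and variable names are all hypothetical): the treatment has an effect of 2 when a pre-treatment covariate x equals 1 and no effect when x equals 0.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def simulate(n=4000, seed=2006):
    """Simulated randomized experiment in which the treatment helps only when x == 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)  # pre-treatment covariate
        z = rng.randint(0, 1)  # randomized treatment indicator
        effect = 2.0 if x == 1 else 0.0
        y = 1.0 + effect * z + rng.gauss(0.0, 1.0)
        rows.append((x, z, y))
    return rows

def diff_in_means(rows):
    treated = [y for _, z, y in rows if z == 1]
    control = [y for _, z, y in rows if z == 0]
    return mean(treated) - mean(control)

rows = simulate()
overall = diff_in_means(rows)
by_x = {x: diff_in_means([r for r in rows if r[0] == x]) for x in (0, 1)}
print(f"overall difference in means: {overall:.2f}")
print(f"difference when x = 0: {by_x[0]:.2f}; when x = 1: {by_x[1]:.2f}")
```

The overall difference in means is a perfectly good estimate of the average effect, but on its own it says nothing about the fact that the treatment does nothing for half the units and a lot for the other half.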

**Different perspectives yield different statistical recommendations**

I followed the link at Simon's blog and took a look at Freedman's papers. They were thought-provoking and fun to read, and one thing I noticed (in comparison to my papers and books) was: no scatterplots, and no plots of interactions! I'm pretty sure that it would've been hard for me to have realized the importance of interactions without making lots of graphs (which I've always done, even way back before I knew about interactions). In both the examples shown above, I wasn't looking for interactions--they were basically thrust upon me by the data. (Yes, I know that the Mississippi/Ohio/Connecticut plot doesn't show raw data, but we looked at lots of raw data plots along the way to making this graph of the fitted model.) If I hadn't actually looked at the data in these ways--if I had just looked at some regression coefficients or differences in means or algebraic expressions--I wouldn't have thought of modeling the interactions, which turned out to be crucial in both examples.

I know that all my data analyses could use a bit of improvement (if only I had the time to do it!). A first step for me is to try to model data and underlying processes as well as I can, and to go beyond the idea that there's a single "beta" or "treatment effect" that we're trying to estimate.

Gregor writes:

Through Robert Israel's sci.math posting I've found an excellent online resource, especially for many statistical topics: **Encyclopaedia of Mathematics**. It seems better than the other two better-known options Wolfram MathWorld and Wikipedia. Another valuable resource **Quantlets** has several interesting books and tutorials, especially on the more financially oriented topics; while some materials are restricted, much of it is easily accessible. Finally, I have been impressed by **Computer-Assisted Statistics Teaching** - while it is of introductory nature, the nifty Java applets make it worth registering.

Phil pointed me to this fun graph:

Preparing data for use, converting it, cleaning it, leafing through thick manuals that explain the variables, and asking collaborators for clarifications all take a lot of our time. The rule of thumb in data mining is that 80% of the time is spent on preparing the data. Also, it is often painful to read bad summaries of interesting data in papers when one would want to actually examine the data directly and do the analysis for oneself.

While there are many repositories of data on the web, they are not very sophisticated: usually there is a ZIP file with the data in some format that has yet to be figured out. Today I stumbled upon the Virtual Data System, which provides an open source implementation of a data repository that enables one to view variables and the distribution of their values in the data and to perform certain types of filtering, all through the internet browser interface. An example can be seen at Henry A. Murray Research Archive - click on Files tab and then on Variable Information button. Moreover, the system enables one to cite data much as one would cite a paper.

A similar idea developed for publications a few years earlier is GNU EPrints, which is a system of repositories of technical reports and papers that almost anyone can set up. Having used EPrints, I was frustrated by the inability to move data from one repository to another, to have some sort of a search system that would span several repositories, to have integration with search and retrieval tools such as Citeseer.

But regardless of the problems, such things are immensely useful parts of the now-developing scientific infrastructure on the internet. There would be wonders if even 5% of the money that goes into the antiquated library system was channelled into the development of such alternatives.

David Berri very nicely gave detailed answers to my four questions about his research in basketball-metrics. Below are my questions and Berri's responses.

Carrie links to a Wall Street Journal article about scientific journals that encourage authors to refer to other articles from the same journal:

John B. West has had his share of requests, suggestions and demands from the scientific journals where he submits his research papers, but this one stopped him cold. . . After he submitted a paper on the design of the human lung to the American Journal of Respiratory and Critical Care Medicine, an editor emailed him that the paper was basically fine. There was just one thing: Dr. West should cite more studies that had appeared in the respiratory journal. . . . "I was appalled," says Dr. West of the request. "This was a clear abuse of the system because they were trying to rig their impact factor." . . .The result, says Martin Frank, executive director of the American Physiological Society, which publishes 14 journals, is that "we have become whores to the impact factor." He adds that his society doesn't engage in these practices. . . .

From my discussions with Aleks and others, I have the impression that impact factors are taken more seriously in Europe than in the U.S. They also depend on the field. The Wall Street Journal article says that impact factors "less than 2 are considered low." In statistics, though, an impact factor of 2 would be great (JASA and JRSS are between 1 and 2, Biometrics and Biometrika are around 1). Among the stat journals with the highest impact factors are Statistics in Medicine (1.4) and Statistical Methods in Medical Research (1.9), which are considered OK but not top stat journals. You gotta reach those doctors (or the computer scientists and physicists; they cite each other a lot).

A question came in which relates to an important general problem in survey weighting. Connie Wang from Novartis pharmaceutical in Whippany, New Jersey, writes,

I read your article "Struggles with survey weighting and regression modeling" and I have a question on sampling weights. My question is on a stratified sample design to study revenues of different companies. We originally stratified the population (about 1000 companies) into 11 strata based on 9 attributes (large/small Traffic Volume, large/small Size of the Company, with/without Host/Remote Facilities, etc.) which could impact revenues, and created a sampling plan. In this sampling plan, samples were selected from within each stratum in PPS method (probabilities proportionate to size), and we computed the sampling weights (inverse of probability of selection) for all samples in all strata. In this sampling plan, sampling weights for different samples in the same stratum may not be the same since samples were drawn from within each stratum not in SRS (simple random sample) but in PPS/census.
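To see why weights can differ within a stratum under this kind of design, here is a minimal Python sketch of PPS inclusion probabilities and the corresponding inverse-probability weights. The company sizes, the sample size, and the rule for handling certainty units are my own assumptions for illustration; they are not taken from the sampling plan described in the email.

```python
def pps_inclusion_probs(sizes, n_sample):
    """Inclusion probabilities for PPS sampling of n_sample units from one stratum.

    Units whose proportional share would be 1 or more are taken with certainty
    (a "census" of those units), and the rest share the remaining sample.
    """
    probs = [None] * len(sizes)
    remaining = list(range(len(sizes)))
    n_left = n_sample
    while True:
        total = sum(sizes[i] for i in remaining)
        certain = [i for i in remaining if n_left * sizes[i] / total >= 1.0]
        if not certain:
            break
        for i in certain:
            probs[i] = 1.0
        remaining = [i for i in remaining if i not in certain]
        n_left -= len(certain)
    for i in remaining:
        probs[i] = n_left * sizes[i] / total
    return probs

# Hypothetical stratum of four companies with a size measure, sampling two of them.
sizes = [120, 40, 25, 15]
probs = pps_inclusion_probs(sizes, n_sample=2)
weights = [1.0 / p for p in probs]  # sampling weight = inverse probability of selection

for size, p, w in zip(sizes, probs, weights):
    print(f"size {size:>3}: inclusion prob {p:.4f}, weight {w:.3f}")
```

In this made-up stratum the largest company is sampled with certainty and gets weight 1, while the smaller companies get larger, unequal weights.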

Writing this last item reminded me that a friend once used the phrase "straw person," with a perfectly straight face, in conversation. I told him that in my opinion it was ok to say "straw man" since it's a negative thing and so non-sexist to associate it with men.

I've been thinking about this because we're doing the final copyediting for our book . . . there are some words you just can't use because people get confused:

- "which" and "that": Some people, including copy editors, are confused on this (see here for more on the topic). The short answer is that you can use "which" pretty much whenever you want, but various misinformed people will tell you otherwise.

- "comprise": I once had a coauthor correct my manuscript by crossing out "comprise" and replacing with the (wrong) "is comprised of." I try to avoid the word now so that people won't think I'm making a mistake when I use "comprise" correctly.

- "inflammable": This one has never come up in my books, but I've always been amused that "inflammable" means "can catch fire." But it sounds like it means "nonflammable," so now we just use "flammable" to avoid confusion.

- "forte": According to the dictionary, it's actually pronounced "fort," not "fortay." But everybody says "fortay," so I can either use it wrong, like everybody else, or say "fort" and leave everybody confused. I just avoid it.

- "bimonthly": I think by now you know where I'm heading on this one. Again, I just avoid the word (on the rare occasions that it would arise).

- "whom": I never know whether I should just use "who" and sound less pedantic.

- splitting infinitives: I do this when it sounds right, even when (misinformed) people try to stop me.

By the way, my copy editor has been great. He's made lots of unnecessary comments (for example, on "which" and "that") but I just ignore these. More importantly, he's found some typos that I hadn't caught.

People sometimes ask me how to combine ecological regression with survey data. This paper by Jackson, Best, and Richardson, "Improving ecological inference using individual-level data" from Statistics in Medicine seems like it should be very useful. Here's the abstract:

In typical small-area studies of health and environment we wish to make inference on the relationship between individual-level quantities using aggregate, or ecological, data. Such ecological inference is often subject to bias and imprecision, due to the lack of individual-level information in the data. Conversely, individual-level survey data often have insufficient power to study small-area variations in health. Such problems can be reduced by supplementing the aggregate-level data with small samples of data from individuals within the areas, which directly link exposures and outcomes. We outline a hierarchical model framework for estimating individual-level associations using a combination of aggregate and individual data. We perform a comprehensive simulation study, under a variety of realistic conditions, to determine when aggregate data are sufficient for accurate inference, and when we also require individual-level information. Finally, we illustrate the methods in a case study investigating the relationship between limiting long-term illness, ethnicity and income in London.

Mouser sent along this link to an applet that simulates World Cup outcomes.

While looking for the Willis distribution (which was mentioned in Mandelbrot's classic paper on taxonomies), Aleks found this, which linked to this site. For example, here's the frequency of the last name Lo, by state:

Not as fun as the baby name site but still somewhat cool.

A few weeks ago, I posted an entry about a bad graphical display of financial data; specifically, which asset classes have performed well, or badly, by year. Here's the graphic:

I pointed out that although this graphic is poor, it's not easy to display the same information really well, either. For instance, a simple line plot does a far better job than the original graphic of showing the extent to which asset classes do or don't vary together, and which ones have wilder swings from year to year, but it's also pretty confusing to read. Here's what I mean:

I suggested that others might take a shot at this, and a few people did.

[See update at end of this entry.]

Jeff Lax pointed me to the book, "Discrete choice methods with simulation" by Kenneth Train as a useful reference for logit and probit models as they are used in economics. The book looks interesting, but I have one question. On page 28 of his book (go here and click through to page 28), Train writes, "the coefficients in the logit model will be √1.6 times larger than those for the probit model . . . For example, in a mode choice model, suppose the estimated cost coefficient is −0.55 from a logit model . . . The logit coefficients can be divided by √1.6, so that the error variance is 1, just as in the probit model. With this adjustment, the comparable coefficients are −0.43 . . ."

This confused me, because I've always understood the conversion factor to be 1.6 (i.e., the variance scales by 1.6^2, so the coefficients themselves scale by 1.6). I checked via a little simulation in R:
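As a stand-in for that kind of check, here is a minimal Python sketch. It is not the R simulation, and in fact involves no simulation at all: it compares the logistic curve directly with a normal cdf whose scale is 1.6 or √1.6. The grid and the function names are my own choices.

```python
import math

def invlogit(x):
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_gap(scale, lo=-6.0, hi=6.0, steps=1200):
    """Largest |invlogit(x) - Phi(x / scale)| over an evenly spaced grid of x values."""
    gap = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        gap = max(gap, abs(invlogit(x) - normal_cdf(x / scale)))
    return gap

gap_16 = max_gap(1.6)
gap_sqrt16 = max_gap(math.sqrt(1.6))
print(f"max gap, scale 1.6:       {gap_16:.4f}")
print(f"max gap, scale sqrt(1.6): {gap_sqrt16:.4f}")

# Train's example logit coefficient of -0.55 under each conversion.
print(f"-0.55 / 1.6       = {-0.55 / 1.6:.2f}")
print(f"-0.55 / sqrt(1.6) = {-0.55 / math.sqrt(1.6):.2f}")
```

If logit coefficients really were √1.6 times the probit coefficients, the logistic curve should sit close to a normal cdf with standard deviation √1.6. On this grid the match is much closer with a standard deviation of 1.6.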

I read Malcolm Gladwell's article in the New Yorker about the book, "The Wages of Wins," by David J. Berri, Martin B. Schmidt, and Stacey L. Brook. Here's Gladwell:

Weighing the relative value of fouls, rebounds, shots taken, turnovers, and the like, they’ve created an algorithm that, they argue, comes closer than any previous statistical measure to capturing the true value of a basketball player. The algorithm yields what they call a Win Score, because it expresses a player’s worth as the number of wins that his contributions bring to his team. . . .In one clever piece of research, they analyze the relationship between the statistics of rookies and the number of votes they receive in the All-Rookie Team balloting. If a rookie increases his scoring by ten per cent—regardless of how efficiently he scores those points—the number of votes he’ll get will increase by twenty-three per cent. If he increases his rebounds by ten per cent, the number of votes he’ll get will increase by six per cent. . . . Every other factor, like turnovers, steals, assists, blocked shots, and personal fouls—factors that can have a significant influence on the outcome of a game—seemed to bear no statistical relationship to judgments of merit at all. Basketball’s decision-makers, it seems, are simply irrational.

I have a few questions about this, which I'm hoping that Berri et al. can help out with. (A quick search found this blog that they are maintaining.) I should also take a look at their book, but first some questions:

I came across this paper. Could someone please convert all the tables into graphs? Thank you.

Rafael pointed me toward some great stuff at the UCLA statistics website, including a page on multilevel modeling that's full of great stuff. (No link yet to our forthcoming book, but I'm sure that will change...) It would also benefit from a link to R's lmer() function.

**Fixed and random (whatever that means)**

One funny thing is that they link to an article on "distinguishing between fixed and random effects." Like almost everything I've ever seen on this topic, this article treats the terms "random" and "fixed" as if they have a precise, agreed-upon definition. People don't seem to be aware that these terms are used in different ways by different people. (See here for five different definitions that have been used.)

**del.icio.us isn't so delicious**

At the top of UCLA's multilevel modeling webpage is a note saying, "The links on this are being superseded by this link: Statistical Computing Bookmarks". I went to this link. Yuck! I like the original webpage better. I suppose the del.icio.us page is easier to maintain, so it's probably worth it, but it's too bad it's so much uglier.

Aleks sent me these slides by Jan de Leeuw describing the formation of the UCLA statistics department. Probably not of much interest unless you're a grad student or professor somewhere, but it's fascinating to me, partly because I know the people involved and partly because I admire the UCLA stat dept's focus on applied and computational statistics. In particular, they divide the curriculum into "Theoretical", "Applied", and "Computational". I think that's about right, and, to me, much better than the Berkeley-style division into "Probability", "Theoretical", and "Applied". Part of this is that you make do with what you have (Berkeley has lots of probabilists, UCLA has lots of computational people) but I think that it's a better fit to how statistics is actually practiced.

It's also interesting that much of their teaching is done by continuing lecturers and senior lecturers, not by visitors, adjuncts, and students. I'm not sure what to think about this. One of the difficulties with hiring lecturers is that the hiring and evaluation itself should be taken seriously, which really means that experienced teachers should be doing the evaluation. So I imagine that getting this started could be a challenge.

I also like the last few slides, on Research:

Marcia Angell has an interesting article in the New York Review of Books on the case of Vioxx, the painkiller drug that was withdrawn after it was found to cause heart attacks. (She cites an estimate of tens of thousands of heart attacks caused by the use of Vioxx and related drugs, referring to Eric J. Topol, "Failing the Public Health—Rofecoxib, Merck, and the FDA," The New England Journal of Medicine, October 21, 2004.) Angell writes,

In late 1998 and early 1999, Celebrex and then Vioxx were approved by the FDA. They were given rapid "priority" reviews—which means the FDA believed them likely to be improvements over drugs already sold to treat arthritis pain. Was that warranted? Neither drug was ever shown to be any better for pain relief than over-the-counter remedies such as aspirin or ibuprofen (Advil) or naproxen (Aleve). But theory predicted that COX-2 inhibitors would be easier on the stomach, and that was the reason for the enthusiasm. As it turned out, though, only Vioxx was shown to reduce the rate of serious stomach problems, like bleeding ulcers, and then, mainly in people already prone to these problems, a small fraction of users. In other words, the theory just didn't work out as anticipated. Furthermore, people vulnerable to stomach ulcers could probably get the same protection and pain relief by taking a proton-pump inhibitor (like Prilosec) along with an over-the-counter pain reliever. So the COX-2 inhibitors did not really fill an unmet need, despite the one seemingly attractive claim made in favor of them.

She also goes into detail on conflict of interest in the FDA advisory committees, and recommends that the FDA shouldn't approve new drugs so hastily. This sounds like a good recommendation for Vioxx etc. (tens of thousands of heart attacks doesn't seem good). But how many drugs are there on the other side--effective drugs that are still waiting for approval? I'm curious what Angell's colleagues at the Harvard Center for Risk Analysis would say. Would it be possible to have an approval process that catches the Vioxx-type drugs but approves others faster?

Jonathan Zaozao Zhang writes,

For the dataset in my research, I am currently trying to compare the fit between a linear model (y=a+bx) and a nonlinear model y=(a0+a1*x)/(1-a2*x). The question is: For the goodness of fit, can I compare R-squared values? (I doubt it... Also, the nls command in R does not give an R-squared value for the nonlinear regression.) If not, why not? And what would be a common goodness-of-fit measure that can be used for such a comparison?

My response: first off, you can compare the models using the residual standard deviations. R^2 is ok too, since that's just based on the residual sd divided by the data sd. The data sd is the same in the two models (since you're using the same dataset), so comparing R^2 is no different than comparing residual sd.
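
To make the R^2 point concrete, here is a minimal Python sketch. The data and the two sets of fitted values are made up for illustration, and the sd's are computed with no degrees-of-freedom correction (with the correction, a 2-parameter and a 3-parameter model would divide by slightly different numbers, so the equivalence is only approximate).

```python
import math

def residual_sd(y, fitted):
    """Root-mean-square residual (no degrees-of-freedom correction)."""
    n = len(y)
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n)

def data_sd(y):
    """Standard deviation of the data, dividing by n."""
    n = len(y)
    mean = sum(y) / n
    return math.sqrt(sum((yi - mean) ** 2 for yi in y) / n)

def r_squared(y, fitted):
    """R^2 written as 1 - (residual sd / data sd)^2."""
    return 1.0 - (residual_sd(y, fitted) / data_sd(y)) ** 2

# Made-up data and two made-up sets of fitted values (hypothetical models A and B)
y = [1.0, 2.1, 2.9, 4.2, 5.1]
fit_a = [1.0, 2.0, 3.0, 4.0, 5.0]
fit_b = [1.2, 2.0, 3.2, 3.8, 5.3]

for name, fit in [("A", fit_a), ("B", fit_b)]:
    print(name, "residual sd:", residual_sd(y, fit), "R^2:", r_squared(y, fit))
```

Since the data sd is shared, whichever model has the smaller residual sd necessarily has the larger R^2.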

Even simpler, I think, is to note that model 2 includes model 1 as a special case. If a2=0 in model 2, you get model 1. So you can just fit model 2 and look at the confidence interval for a2 to get a sense of how close you are to model 1.

Continuing on this theme, I'd graph the fitted model 2 as a curve of E(y) vs x, showing a bunch of lines indicating inferential uncertainty in the fitted regression curve. Then you can see the fitted model and related possibilities, and see how close it is to linear.
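
Here is a rough Python sketch of the "model 2 includes model 1" idea. It is not what nls does: for each candidate a2 on a grid it solves for a0 and a1 by ordinary least squares (the model is linear in a0 and a1 once a2 is fixed) and records the residual sum of squares. The data are synthetic and noise-free, generated with a0=1, a1=2, a2=0.1 so the answer is known; the function name and the grid are my own choices. A real confidence interval for a2 would come from the nonlinear fit itself.

```python
def fit_given_a2(x, y, a2):
    """Least-squares a0, a1 for a fixed a2; returns (a0, a1, rss).

    With a2 fixed, y = a0*u + a1*v where u = 1/(1 - a2*x), v = x/(1 - a2*x).
    """
    u = [1.0 / (1.0 - a2 * xi) for xi in x]
    v = [xi / (1.0 - a2 * xi) for xi in x]
    suu = sum(ui * ui for ui in u)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    svv = sum(vi * vi for vi in v)
    suy = sum(ui * yi for ui, yi in zip(u, y))
    svy = sum(vi * yi for vi, yi in zip(v, y))
    # Solve the 2x2 normal equations by Cramer's rule
    det = suu * svv - suv * suv
    a0 = (suy * svv - svy * suv) / det
    a1 = (svy * suu - suy * suv) / det
    rss = sum((yi - a0 * ui - a1 * vi) ** 2 for ui, vi, yi in zip(u, v, y))
    return a0, a1, rss

# Synthetic, noise-free data from model 2 with a0=1, a1=2, a2=0.1 (assumed values)
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [(1.0 + 2.0 * xi) / (1.0 - 0.1 * xi) for xi in x]

# Candidate a2 values, chosen so that 1 - a2*x stays positive for every x
grid = [k / 100 for k in range(-30, 31)]
profile = {a2: fit_given_a2(x, y, a2) for a2 in grid}

best_a2 = min(grid, key=lambda a2: profile[a2][2])
best_a0, best_a1, best_rss = profile[best_a2]
linear_rss = profile[0.0][2]  # a2 = 0 is exactly model 1 (the straight line)

print("best a2 on grid:", best_a2)
print("RSS at best a2:", best_rss, " RSS at a2=0 (linear):", linear_rss)
```

Plotting the fitted curves for the a2 values with near-minimal RSS against x would be one crude way to draw the "bunch of lines" and see how far the fit is from a straight line.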
