Adjusted R-sq = 0.001

A correspondent writes:

Wanted to add to my comment on the Black Swan review… but didn’t want to hang people in public.

You mentioned… (Mosteller and Wallace made a similar point in their Federalist Papers book about how they don't trust p-values less than 0.01, since there can always be unmodeled events. Saying p<0.01 is fine, but please please don't say p<0.00001 or whatever.) which is a terrific point! I had a related experience just last week when attending a seminar. Some guys were modeling some marketing information and showed ranges of coefficients from the set of regressions and argued that everything was significant. At the bottom of the table, it read: "Adjusted R-sq = 0.001". I had to check my glasses. I thought I was hallucinating. That line didn't seem to faze anyone else. The audience was asking modeling questions: why didn't you model it this way or that, etc. I turned around and asked my neighbor: were you bothered by an R-sq of 0.1%? His answer was "I have seen 0.001 or lower for panel data". Now I'm not an expert in panel data analysis. But I am shocked, shocked, that apparently such models are allowable in academia. Pray tell me not!

I don’t know what to say. In theory, R^2 can be as low as you want, but I have to admit I’ve never seen something like 0.001.

10 thoughts on “Adjusted R-sq = 0.001”

  1. Okay, it sounds like you're saying that 0.001 is inappropriately precise, when unanticipated events might influence the figure by as much as, say, 0.01.

    Question: is there a way to arithmetically determine what the appropriate precision should be, or is it just a matter of applying common sense?

  2. I for one couldn't care less about R2.

    Now, I am even less of a fan of pseudo R2 ML concoctions.

    I somehow suspect this is what the poster is talking about, as they are really badly behaved.

  3. Derek,

    Yeah, I guess it's common sense and depends on the context. Personally I cut off my p-value reporting at 1%. Of course the R^2 business is another matter entirely!

    Not an R2 fan,

    That's an interesting point: perhaps the R^2 was something like 4% or some other low but plausible value, and then the correction (depending on how it was done) took it down to 0.001.

    R-squared is not perfect (see the pictures on page 42 of our new book) but I think it can be helpful; see here.

  4. I expect that the sample size is very large, so inv(X'*X) is very small. For a fixed value of sigma^2, you can find a sample size that will give you any p-value you want.

    This brings into question whether those coefficient estimates really are significant in any meaningful sense: p-values for significance tests decrease with sample size, so a large enough sample makes even a negligible effect "significant" (a small simulation after the comments illustrates this).

  5. When Bayesians talk about p-values, are they talking about the probability of the data given a null hypothesis? Or are they talking about the probability of the null hypothesis given the posterior distribution?

  6. Yeah, the R2 thing can get weird sometimes…
    I have seen many papers in econ where the author(s) report R2 even with instrumental variable (IV) estimation.

  7. A financial analyst who was on my MSc course many years ago told me that he regularly had models with R² as low as 1 or 2%. He said that if the market were perfectly efficient there would be nothing to model and R² would be zero. He claimed his models detected small departures from perfect efficiency and that the models could be used to make a lot of money if you moved quickly enough. It seemed plausible at the time.

  8. It is really difficult to tell without further context. However, any non-zero R2 could in principle have either theoretical or practical importance. The theoretical importance is fairly obvious … I think.

    For practical importance R2 is a notoriously bad measure (e.g., the Salk vaccine trial data, where R2 = .0001 or so). There are other well-known examples. R2 will depend on measurement error and the 'dose' of the effect, amongst other things.

    My particular favourite point is that small effects can be cumulative (see Abelson, 1985, a variance explanation paradox), and so tiny effects can sometimes be much more important than supposedly big effects (see the second sketch after the comments).

    Equally, there are contexts where very high R2 can be 'inadequate' in some sense. I have a paper where two measures correlate at r = .985 but this correlation conceals an important (at least in the context of the paper) non-linearity.

  9. A belated remark. If one were to regress random variables on other random variables, randomly sampling the dependent variable and the number of independent variables, and so on, would one be more likely to get an R2 greater than, equal to, or less than 0.5? If you sample with replacement, you would occasionally regress a random draw onto another random draw of the same underlying distribution, and would that not be more likely to produce a high R2?

    I don't really know what I'm talking about, but is it not more likely to get a very high R2 than a very low R2?!?

    Another thing is that adjusted R2 can be negative, so 0.001 is not necessarily close to the lower admissible bound for adjusted R2: an adjusted R2 of 0 is not the lowest value you can get! (See the third sketch after the comments.)

    Pat Toche.
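
A small simulation in Python of the sample-size point in comment 4. The effect size and the sample sizes below are invented for illustration, not taken from the seminar in question: the true R^2 sits near 0.001 throughout, yet the p-value collapses toward zero as n grows.

```python
# Minimal sketch (numpy/scipy assumed; beta and the sample sizes are made up):
# a tiny true effect keeps R^2 near 0.001, but the p-value still shrinks
# toward zero as n grows.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)
beta = 0.03  # invented effect size; true R^2 ~ beta^2 / (beta^2 + 1), about 0.0009

for n in [1_000, 100_000, 10_000_000]:
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    fit = linregress(x, y)
    print(f"n={n:>10,}  R^2={fit.rvalue**2:.5f}  p={fit.pvalue:.1e}")
```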
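
A rough sketch of the Abelson cumulation point in comment 8, again with invented numbers: a difference in per-trial success probability that explains a trivial share of single-trial variance still separates the two sides clearly once hundreds of trials accumulate.

```python
# Rough sketch of Abelson's (1985) cumulation argument, with invented
# per-trial success rates: the per-trial R^2 is tiny, yet over 500 trials
# the 'better' side comes out ahead well over half the time.
import numpy as np

rng = np.random.default_rng(1)
p_a, p_b = 0.27, 0.26            # hypothetical per-trial success probabilities
pbar = (p_a + p_b) / 2
n_trials, n_reps = 500, 100_000

# Point-biserial R^2 of a single trial's outcome on group membership (~0.0001):
print("per-trial R^2:", (p_a - p_b) ** 2 / (4 * pbar * (1 - pbar)))

# Totals over 500 trials: A out-scores B in roughly 60% of runs.
a = rng.binomial(n_trials, p_a, size=n_reps)
b = rng.binomial(n_trials, p_b, size=n_reps)
print("share of runs where A out-scores B:", (a > b).mean())
```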
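
Finally, a numerical check of the closing point in comment 9, using the standard adjustment 1 - (1 - R^2)(n - 1)/(n - p - 1) with arbitrary example inputs: adjusted R^2 dips below zero whenever R^2 is small relative to the number of predictors per observation.

```python
# The usual adjusted-R^2 formula; the example inputs are arbitrary.
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Standard adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(0.05, 100, 10))     # ~ -0.057: negative despite R^2 = 5%
print(adjusted_r2(0.001, 10_000, 5))  # ~ 0.0005: essentially unchanged at large n
```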

Comments are closed.