Statistics in the real world

Here’s an interesting and informative rant I received recently by email:

This document is a consultant’s report to the Traverse City Convention & Visitor’s Bureau, which was quoted (literally photocopied) into a market analysis supporting an application for an approximately 270,000-square-foot shopping center. The full report is here. On page 6 of the PDF, we are told the following:

“After extensive evaluation and testing of these variables [that possibly determine tourist visitor volume to Grand Traverse County] for their predictive ability, the Consultant determined there are three variables with statistically significant associations. These are population in Grand Traverse County, Gross Domestic Product (GDP), and the External Event dummy variable.

“The Consultant found GDP [national, not regional or local] alone is a significant predictor however [sic] it does not hold up in association with either Grand Traverse Population or the External Event dummy variable.”
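Why might GDP be significant on its own yet fail to “hold up” alongside population? One common explanation, assumed here rather than confirmed by the report, is that the two predictors are strongly correlated, so neither adds much once the other is already in the model. Below is a minimal Python sketch of the usual diagnostic, the variance inflation factor (VIF). With exactly two predictors, VIF = 1 / (1 - r^2), where r is their correlation. The function names and both series are hypothetical, made up for illustration, and are not taken from the report.

```python
from statistics import mean


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """Variance inflation factor for a model with exactly two predictors.

    Regressing one predictor on the other gives R^2 = r^2,
    so both predictors share VIF = 1 / (1 - r^2).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical annual series for illustration only; these are NOT the report's data.
gdp = [12.6, 13.1, 13.6, 14.0, 14.5, 14.8, 14.6, 14.2, 14.8, 15.2]
population = [80.1, 81.0, 82.2, 83.4, 84.5, 85.6, 86.1, 86.4, 86.9, 87.4]

print(f"r   = {pearson_r(gdp, population):.3f}")
print(f"VIF = {vif_two_predictors(gdp, population):.1f}")
```

A common rule of thumb treats VIFs above roughly 5 or 10 as a warning sign. A high VIF would explain why only one of the two predictors survives, but it does not say which one to keep, so the choice still calls for a stated reason.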

The Consultant then goes on to run a regression using GT population and the dummy, but not GDP. The resulting equation has an adjusted R-squared of 0.95 and F = 87.0. While GT population has t = 10.9 and p = 0.000012, the dummy isn’t significant (p = 0.3). The Consultant thus uses GT population projections out to 2025 to forecast annual tourist visits through that year.

That seems rather sketchy to me. Correct me if I’m wrong (I likely am), but the Consultant basically said that 95% of the variation in annual tourist visits was due to (predicted by) county population, and then used population projections to forecast future tourist visits. And even though GDP was a significant variable, she used population instead, with no explanation (or none that I can find). Of the host of variables she tested (e.g., DoD/military contracts, even though our military presence is limited to a couple of Coast Guard helicopters), GDP and population were apparently the only two significant ones, though we don’t know how population would hold up if she removed the insignificant dummy from the specification. (And her regression is based on about 10 data points.)
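The reported statistics can be sanity-checked against the “about 10 data points.” Adjusted R² and the overall F statistic are tied together by the sample size n and the number of predictors k, so one can back out which n makes them agree. A minimal sketch, assuming k = 2 (population and the external-event dummy) and taking the reported adjusted R² of 0.95 at face value; the function names are hypothetical:

```python
def r2_from_adjusted(adj_r2, n, k):
    """Invert adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1) to recover R^2."""
    return 1.0 - (1.0 - adj_r2) * (n - k - 1) / (n - 1)


def overall_f(r2, n, k):
    """Overall regression F statistic, with k and n - k - 1 degrees of freedom."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Assumptions: k = 2 predictors and adjusted R^2 = 0.95 as reported;
# n, the number of annual observations, is what we are trying to pin down.
for n in range(8, 14):
    r2 = r2_from_adjusted(0.95, n, k=2)
    print(f"n = {n:2d}: R^2 = {r2:.4f}, F = {overall_f(r2, n, k=2):.1f}")
```

With n = 10 the implied F is 86.5, close to the reported 87.0, while n = 12 already gives 105.5. Because 0.95 is itself rounded, this only pins n down roughly, but it agrees with a regression fit to about ten annual observations.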

Surely, local population can’t be the driver of tourist visits. It does seem reasonable that population is driven by tourism, since people who visit here might end up wanting to move here, no? That makes population a questionable variable for forecasting future tourism, when at least one other significant variable, GDP, is available, even if GDP was also found by data mining.
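Whatever the direction of causation, two quantities that both grow over time will correlate strongly even if neither drives the other, so a high R² from about ten annual points says little by itself. A minimal sketch with two hypothetical series that share nothing but a linear time trend plus arbitrary small wiggles. Removing each series’ straight-line trend before correlating them is one simple illustrative check, not necessarily what the report did and not a substitute for proper time-series methods; all names and numbers are made up.

```python
from statistics import mean


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (the squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def detrend(values, times):
    """Residuals after subtracting a least-squares straight line in time."""
    mt, mv = mean(times), mean(values)
    slope = sum((t - mt) * (v - mv) for t, v in zip(times, values)) / sum(
        (t - mt) ** 2 for t in times
    )
    intercept = mv - slope * mt
    return [v - (intercept + slope * t) for t, v in zip(times, values)]


# Hypothetical data: ten "years", two series with arbitrary wiggles around separate linear trends.
years = list(range(10))
wiggle_a = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1]
wiggle_b = [-0.5, 0.4, 0.2, -0.1, -0.3, 0.5, 0.1, -0.4, 0.3, -0.2]
series_a = [2.0 + 1.0 * t + w for t, w in zip(years, wiggle_a)]   # stand-in for "population"
series_b = [50.0 + 3.0 * t + w for t, w in zip(years, wiggle_b)]  # stand-in for "visits"

print(f"R^2, raw series:       {r_squared(series_a, series_b):.3f}")
print(f"R^2, detrended series: {r_squared(detrend(series_a, years), detrend(series_b, years)):.3f}")
```

The raw R² comes out close to 1, while the detrended R² is far lower. With only ten points even arbitrary wiggles can correlate somewhat by chance, so the detrended value is not zero either.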

I wish I could say this is typical, but in my experience, local units of government, &c., pay money for analyses even more questionable than what I just presented. For example, the market study in which the above was quoted reports 2005 consumer demand as $194,896,255 less than supply. Setting aside the problems this claim has in view of economic theory, the values labeled “demand” and “supply” are consumer expenditures and retail sales: retailers sold approx. $195 million more than consumers purchased. And there is no explanation of why; in 2005, within a 50-mile radius, consumers spent $1,371,392 on “News Dealers and Newsstands,” while retail sales in the same category were $0, and there is no explanation of that $1.4-million gap!

Well, I [my correspondent] guess there’s no real point to this email other than to complain, and shouting at the sky is getting me a lot of strange looks. I’ll close by just asking you to ask your students to get involved in their communities, and at the very least, act as bullshit detectors and raise their voices when something smells.

This certainly doesn’t surprise me: I’ve seen worse from paid statistical consultants on court cases, including one from a consultant (nobody I’ve ever met or know personally in any way) who reportedly was paid hundreds of thousands of dollars for his services.

The key problems seem to be:

1. Statistics is hard, and not many people know how to do it.

2. The people who need statistical analysis don’t always know where to look.

8 thoughts on “Statistics in the real world”

  1. Umm... okay, I could stand to be educated here. Going solely on what's contained in the post, wouldn't the consultant have been justified in dropping either GDP or population if the two were highly correlated? (That wouldn't be shocking to me if GDP wasn't reported per capita: higher population probably = higher GDP, right?) I know there's no way to be sure, but isn't that a plausible scenario? Or am I just hallucinating?

    The comment "Surely, local population can’t be the driver of tourist visits." is also making me wonder. The regression described in the post couldn't have been describing a causal relationship, unless the consultant in question was *really* sleepwalking…right? The quote from the consultant given in the post just mentions that population is a "predictor" of tourism, not that the two are causally linked.

    The strongest argument, to me, seems to be the one about projecting from the limited data set into the future.

    I guess I'm just having trouble following the thread of the complaints. Is it such a strange idea that places with a larger population might have higher tourism revenue? I mean, if I have to choose between Don't-blink-or-you'll-miss-it, Manitoba, or Vancouver, I'm going to Vancouver. So…help me out! What am I missing here?

  2. Winawer,

    I didn't look into this in detail. My impression is that the complaint is that the analysis did not directly address the questions of interest to the client.

  3. "Surely, local population can’t be the driver of tourist visits."

    Sometimes people travel because they want to visit people that they know. If you have more people, more people will visit you.

  4. Unfortunately, this doesn't surprise me.

    With regard to local governments purchasing statistical analytic services (as opposed to data alone), it would seem that it's a market for lemons.

    It would be interesting to compare the quality of the services obtained by municipalities in which buyers may be presumed to have some expertise close at hand (e.g., in college towns) versus municipalities more generally. Unfortunately, I think college town populations differ in a number of other ways, which could make this dicey.

  5. I'm the poor slob who sent this rant to Andrew. What's important is not the particular strengths & weaknesses of this particular analysis; what's important is that people with the ability to act as b.s. detectors need to understand how decisions affecting their hearths & homes tend to be founded on a poor grasp of reality.

    Isabel—

    I'm happy to admit to my own poor writing. Elsewhere in the document, both why people visit and where they stay are addressed, and seeing friends or relatives isn't indicated as much of a reason for visiting. So (in addition to anecdotal evidence) the report itself discounts visiting residents as a driver of tourism and lists drivers of tourism other than population.

    If population serves as a proxy for those variables, then it might be useful; however, it would only be a proxy and the relevant variables seem quantifiable.

    Winawer—

    As above, I could have composed the original message more skillfully. Sorry. ^_^ I agree with you that ten annual figures seem a bit thin; I mentioned that only parenthetically because I assumed it'd be obvious to Andrew when I shot the email to him.

    In terms of GDP vs. population & external events, the report gives no reason for choosing population & external events rather than GDP. We're given a specification with population and an external-events dummy, but the dummy isn't significant, and still this specification is supposed to explain 95% of the variation in tourist visits!

    Think about that: 95% of the variation in tourist visits in this area is predicted by population and an insignificant dummy?

    Chris Walsh—

    Interesting question!

  6. Chris' point resonates with me. Over the last few years I have made an effort to keep an eye on what my local municipality is up to.

    Few people read the documents that are used to justify policy. Sometimes it is clear that I am the only one to actually read the document with a critical eye. I consistently find BS. Unfortunately, much of the time the BS is what the city administrators desired, so it will probably continue until enough people cry foul.

    Like John, I want to encourage people to spend an hour a month looking at what their local government is producing and make public comments on their findings (good or bad).

  7. If I follow the consultant's report, there was a lot of statistical hand-waving to get to the conclusion that tourist visits will follow the same growth curve as population.

    As a naive assumption, this probably isn't so bad. You might make the same assumption in projecting water, sewer, or waste disposal needs 20 years into the future.

    The main fault lies in dressing this pig up in a wedding dress.
