Direct mail marketing problem

Gordon Dare writes with an interesting example of the kind of statistics question that comes up all the time but isn’t in the textbooks:

I took survey sampling with you a few years ago. I would be grateful if you could help me with a hypothesis test problem. I work at a direct mail company and would like to know if the difference in results between two groups is significant. Here are the details:

Group 1: 50,000 customers mailed a catalog. Of these, 100 purchased, and their mean spend was $50 with a standard deviation of $10.

Group 2: 50,000 customers mailed a catalog. Of these, 120 purchased, and their mean spend was $55 with a standard deviation of $11.

I did extensive research but am still confused as to what N I should use to test whether there is a significant difference between these two means ($50 and $55). Is it the 50,000 or the 100? Should I use $0 for the non-buyers when calculating the mean and SD? If so, with so many non-buyers, how should I handle such a skewed distribution, or is this not an issue?

My response:

The quick answer is to use all 50,000 customers in each group (counting the non-buyers as zeroes). Skewness really isn't an issue given that you have over 100 nonzero observations in each group. You could also do more elaborate analyses, considering the purchasing decision and the average purchase separately (a sketch of that two-part version appears after the comments below), but the quick summary would be to just use the total.
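To make the quick version concrete, here is a minimal sketch of that calculation, reconstructed entirely from the summary statistics in the question (50,000 mailed per group; 100 buyers at mean $50, SD $10, versus 120 buyers at mean $55, SD $11). The non-buyers' zero spends are folded into each group's mean and variance, and the full-group means are then compared with an ordinary two-sample z-test; the helper function name is just illustrative, not from any library.

```python
# Sketch of the "count the non-buyers as zeroes" comparison, working only
# from the summary statistics given in the question.

from math import sqrt
from scipy import stats

def full_group_stats(n_mailed, n_buyers, buyer_mean, buyer_sd):
    """Mean and variance of spend across everyone mailed, with non-buyers at $0."""
    total_spend = n_buyers * buyer_mean
    # sum of squared spends among buyers, recovered from their mean and SD
    sum_sq = (n_buyers - 1) * buyer_sd**2 + n_buyers * buyer_mean**2
    mean = total_spend / n_mailed
    var = (sum_sq - n_mailed * mean**2) / (n_mailed - 1)
    return mean, var

m1, v1 = full_group_stats(50_000, 100, 50.0, 10.0)   # group 1
m2, v2 = full_group_stats(50_000, 120, 55.0, 11.0)   # group 2

# two-sample z-test on the full-group means (N is so large that z vs. t hardly matters)
se = sqrt(v1 / 50_000 + v2 / 50_000)
z = (m2 - m1) / se
p = 2 * stats.norm.sf(abs(z))
print(f"mean spend per customer mailed: ${m1:.3f} vs ${m2:.3f}")
print(f"difference = ${m2 - m1:.3f}, SE = {se:.4f}, z = {z:.2f}, two-sided p = {p:.3f}")
```

On these numbers the difference works out to roughly three cents per customer mailed and lands right around the conventional 0.05 boundary; whether a three-cent-per-piece difference matters in practice is the question the comments below take up.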

8 thoughts on “Direct mail marketing problem”

  1. Here's another typical direct marketing problem… one can almost always ensure a statistically significant effect (however small) if one were to increase N enough… in some scenarios, it is plausible to have N in the hundreds of thousands or even a million. The typical advice is "use your common sense", "stress practical significance".

    But aren't statisticians chickening out? That advice is equivalent to saying throw out the hypothesis testing apparatus because it doesn't matter what the test says. Do we have a better answer than that?

  2. Kaiser: The problem comes (as you half-mention with your remark on practical significance) from the notion that "statistical" as a modifier to the term "significance" has any more authority than the term "significance" by itself. When you have millions of observations, the chance that two datasets pulled from the same distribution will have noticeably different means just by chance is trivially small. All the "statistical" part means is that pure chance isn't the reason the means differ. So what? You knew that going in. So the test in this case measures something you aren't particularly interested in. That doesn't mean you should "throw out hypothesis testing." It just means you should test something interesting, something you didn't know before you ran the experiment.

  3. I think the right answer is "throw out hypothesis testing because it doesn't matter what the test says."

    Well, I exaggerate a little, but not much. I think hypothesis testing is OK when what you want to do is test a hypothesis, but I think it is very rare…maybe even "very very rare"…that that is what you want to do. Usually, as in the marketing problem discussed here, you have no interest whatsoever in the question "are the two catalogs EXACTLY the same, in terms of the orders they generate?" What you're actually interested in is something more like "which catalog do I think will make me more money?", or maybe "should I send out the additional 100,000 copies of Catalog B that I have in my warehouse, or should I discard them and spend an extra $5000 to print up more copies of Catalog A?" Hypothesis testing doesn't provide an answer to questions like these, so why use it?

    In the example above, Catalog A brought in $5000 in orders and Catalog B brought in $6600. If the question is "I want to print and send another 50,000 catalogs; should I do A or B?", you don't need any fancy-schmancy statistics to tell you the answer is B, all other things being equal.

    I think that what you want in this problem, and many similar ones, is estimates and uncertainties of the dollars brought in per issue of catalog A and B. To say that the difference is "statistically significant" at p=0.027 or whatever…who cares? This is really a decision analysis problem (which catalog should I send, or how much more should I pay for a better catalog), not a hypothesis testing problem.

  4. We all agree hypothesis testing is useless in this scenario, because we know the large N will lead to a statistically significant but probably not profitable result.

    However, the principle behind hypothesis testing is still extremely useful, and I disagree with the second half of Phil's comment. The p-value and so on tells us whether the difference between catalogs A and B is "real" or just random noise. This becomes even more important when the difference is small, because it is then more likely to be due to noise.

    One way to reason out of this is that with millions of observations, our precision of estimation is so high that we should take our sample difference as "real", and then only worry about the practical implication of that difference.

    But then this is ever so slightly uncomfortable because we move from statistics into accounting, essentially.

  5. First, I disagree that the p-value tells you whether the difference between catalogs is "real" or is random noise. What it tells you is that IF there were no difference between catalogs, you would only see a difference this big x% of the time. It's hard for me to picture a case in which that would be a directly relevant question. Well, maybe not "hard to picture", but most examples seem pretty contrived.

    Second, in the catalogs example given here, even if we did have an estimate of the probability that catalog B is really better than catalog A, that's probably not a relevant question either! We can see by inspection that, absent any other data, the probability that B is better will be more than 50%. So why would we send out catalog A? At least, if there's no cost associated with the decision, we send out B. If there is a cost (suppose it costs $0.25 more to print B than A), then we're not interested in the probability that B is actually better; we're interested in a quantitative estimate of how much better B is…but that's not going to be answered by a hypothesis test.

    Like everyone else, I do calculate (and look at) p-values — why not? — but I rarely think they're the "right" thing to look at. I certainly don't put any stock in arbitrary "significance" numbers: I wouldn't take a different action at p=0.0499 than at p=0.0501.

    I don't want to sound like a wacko anti-hypothesis-testing guy…I want to sound like a _sane_ anti-hypothesis-testing guy. Hypothesis testing has its place, p-values can be useful to look at, yada yada. But I think what one is really interested in is rarely answered by a hypothesis test.

  6. First, a hearty reiteration of ignoring standard levels like 0.05. If that's the part of hypothesis testing that dies, it's absolutely the right part. But I don't see how your example, Phil, undermines hypothesis testing… we just have a new hypothesis: is the profitability of catalog A greater than the profitability of catalog B? We just form a different hypothesis and test it directly in the same fashion.

  7. As a nonstatistician, my first response would be to separate the question into two parts:
    1) Is there a difference in the likelihood of purchase (n=50,000), and
    2) Is there a difference in the size of the purchase (n=100 or 120)?

    As I am desperately trying to relearn stats from years ago and update what I know to a level where it is useful, a little more detail on why another approach was recommended would be deeply appreciated.

    Also, thanks for the blog. It is one of my favourites, both fun and interesting, and deeply informative.

  8. Reviewing the given data and the null hypothesis, I assume a two-sample t-test would be a suitable test to apply, for the reason that it lets us include all of the given information when drawing conclusions.
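For the two-part version that the response alludes to and that comment 7 asks about, here is a minimal sketch under the same summary statistics: (1) a z-test for a difference in purchase rates using all 50,000 customers per group, and (2) Welch's t-test for a difference in mean spend among buyers only. These are standard choices rather than the specific analyses the original response had in mind.

```python
# Sketch of the two-part analysis: purchase decision and purchase size
# treated separately, again computed only from the reported summary statistics.

from math import sqrt
from scipy import stats

# Part 1: difference in purchase rates, using all 50,000 customers per group
n1, buyers1 = 50_000, 100
n2, buyers2 = 50_000, 120
p1, p2 = buyers1 / n1, buyers2 / n2
p_pool = (buyers1 + buyers2) / (n1 + n2)
se_rate = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_rate = (p2 - p1) / se_rate
p_rate = 2 * stats.norm.sf(abs(z_rate))

# Part 2: difference in mean spend among buyers only (Welch's t-test from
# the reported means and SDs; positive t means the $55 group spends more)
t_spend, p_spend = stats.ttest_ind_from_stats(
    mean1=55.0, std1=11.0, nobs1=120,
    mean2=50.0, std2=10.0, nobs2=100,
    equal_var=False,
)

print(f"purchase rate: {p1:.4f} vs {p2:.4f} (z = {z_rate:.2f}, two-sided p = {p_rate:.3f})")
print(f"spend among buyers: $50 vs $55 (Welch t = {t_spend:.2f}, p = {p_spend:.4f})")
```

On these numbers the purchase-rate difference on its own is hard to distinguish from noise, while the spend-among-buyers difference is clear, which helps explain why the quick all-50,000 comparison above comes out near the boundary.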
