“Science revolves around the discovery of new cause-effect relationships but the entire statistics literature says almost nothing about how to do this.”

Seth writes:

Is this a fair statement, do you think?

Science revolves around the discovery of new cause-effect relationships but the entire statistics literature says almost nothing about how to do this.

It’s part of an abstract for a talk I [Seth] will give at the ASA conference next July. I haven’t submitted the abstract yet, so I can revise it or leave it out.

My reply: This seems reasonable to me.

You could clarify that the EDA literature is all about the discovery of new relationships but says nothing about causality, while the identification literature is all about causality but says nothing about the discovery of something new.

What literature there is on discovery of new causal relationships comes from structural equation modeling (also called graphical modeling), but this work is not particularly exploratory (it’s all about discovering relationships in a pre-specified set of variables) and there’s a lot of debate about how causal it is. (The proponents of structural equation modeling and graphical modeling think these tools can be used to discover causality from just about any observational data, but others are skeptical of these claims–rightfully skeptical, in my opinion.)

My point here is not to bash structural equation modeling, just to say that, unless you happen to be a strong believer in that family of methods, Seth’s statement is pretty much correct. And an interesting point it is. It might very well be that statistics just isn’t suited to such questions–I think we make a lot of useful progress in descriptive analysis of the Red State, Blue State variety (or of the identifying-genes-associated-with-diseases variety) but it’s worth at least occasionally thinking about the deeper questions.

24 thoughts on ““Science revolves around the discovery of new cause-effect relationships but the entire statistics literature says almost nothing about how to do this.””

  1. And, but, maybe

    "discover causality from just about any observational data" _and_ assumptions (of varying reasonableness and check ability) [ if possible to credibly adjust the distribution due to observational occurrence to the distribution under randomized intervention ]

    _but_ are you here referring to causes of effects versus effects of causes?

    _maybe_ Seth would have time to read the expository article Pearl last posted on your blog

    Keith

  2. Building off Keith's comment: What about the attempts to tease causality out of observational data with a quasi-experimental structure? Regression discontinuity designs and instrumental variables (when the focus is causality) are two examples. Granted, those approaches have their flaws, but there is an enormous literature on both of those areas, a lot of which focuses on trying to get at causality.

  3. This has been bothering me greatly in the past few years, and the worst part is that the general public, the media, etc. think we are finding causal links with all our fancy statistics (otherwise what's the point of doing them, right?).

    Companies, politicians, courts, etc. that make use of the results of scientific studies rely on this perception to push their product/agenda. It is especially damaging in the medical field. My wife is a doctor, and most doctors do not know how to interpret statistics. I would even go as far as saying that the people who teach statistics to doctors do not know how, or do not have time, to teach the complexities of interpreting statistics. They'll teach up to p-values and that's about it. Pharmaceutical companies spend the greater part of their operating money sending representatives to doctors' offices to promote a cause-and-effect interpretation of taking their medication, even though, looking closely, the studies often show barely significant results that could easily be due to systematic biases or other small non-causal spurious correlations. You can't control everything in a study. Maybe one of the groups was interviewed on a bright sunny day and they were in a good mood. That is enough to make results significant. With the weak standards we have for interpreting statistics, no wonder it is so easy to find significant effects even in placebos, and everyone gets amazed at the great power of placebos!

    I'm afraid to say that those who are skeptical of scientific studies and their interpretations currently have it right. If scientists are to gain credibility, this problem has to be solved, or at least the scientific community should establish strict and _simple_ guidelines for interpreting results and call out anyone who goes outside these guidelines. The term "significant" should mean something that is orders of magnitude more significant than what you can find with a placebo. There should be a margin for results above the size of the placebo effect where they would be interpreted as "we are not sure". IMO, if the confidence intervals do not fall above two or three times what you can find with a placebo, you fall in a zone where the results can easily be the effect of unknown systematic biases in the study.

    I was just reading a study the other day about allergy medication, the non-drowsy kind that is sold over the counter in pharmacies. The results were significant at a level barely above placebo and way under what I would consider useful, a barely perceptible effect for the average person. Yet the conclusion of the study praised the results, doctors are pushing this medication, and advertisements show allergic people playing outside among flowers and pollen as if they were completely cured.

    but I digress…

  4. I suggest that you read the following books, which deal directly with both discovering and expressing causal claims from observational data:

    Pearl, J. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press.

    Spirtes, P., Glymour, C. and Scheines, R. 1993. Causation, Prediction, and Search. Springer-Verlag.

    I deal with these topics as well in:

    Shipley, B. 2000. Cause and Correlation in Biology: A User's Guide to Path Analysis, Structural Equations and Causal Inference. Cambridge University Press.

  5. Rigorous qualitative methodology is about discovering processes and mechanisms, and even generating theory.

    If you are interested in understanding causality, trying to explain it, or even trying to discover what it might be, qualitative methods and theory are critical. However, proving hypothesized causal relationships really requires quantitative methods. Quantitative data exploration can also suggest any number of ideas.

    Joe Maxwell does good basic stuff on qualitative methods. Creswell has a standard text. The grounded theory folks (e.g., Corbin, Strauss, Glaser) have some really fundamental stuff, too.

    I would suggest that this interest in discovering new causalities is really about theory generation. Look to qualitative methods for a lot of that.

  6. To all: The Pearl book is fine, and it's particularly popular among computer scientists, and that's fine too, but please read the last two paragraphs of my blog above. In particular:

    1. Structural equation modeling and its variants (including Pearl's work) are not about exploratory data analysis, the "discovery" issue that Seth is concerned with. They are methods for estimating relationships within existing datasets, which is something a little different.

    2. There are a lot of good reasons to be skeptical about the causal claims made from structural equation modeling. I've discussed some of this on the blog already and this isn't really the place to go over it again. Suffice it to say that many statisticians have been skeptical about this stuff for decades, and the reasons for this skepticism remain. I'm not saying these methods shouldn't be used, but I don't recommend bringing them up as a solution to Seth's question that got this discussion started.

    I think Ceolaf's comment above is more on target in relating to Seth's questions about theory generation and discovery.

  7. Andrew, can you explain what makes you skeptical of modern causality theory? (Or are you only skeptical of SEM?) I know you feel you have discussed this topic already, but as a frequent reader of your blog, I don't recall a direct discussion of this topic (other than the back-and-forth with Pearl, which wasn't particularly direct). If you think there are shaky claims or assumptions that underlie causality theory, I'm sure your readers would love to hear your thoughts.

    In any case, it seems that if we want to optimize the process of discovering new causal relationships, we can do it in two ways: first, by increasing the rate at which we expose ourselves to new evidence; second, by becoming better at using whatever evidence we have on hand to detect causal relationships. Seth's self-experimentation is a good example of the first. Causality theory provides many tools for the second, chief among them a formalism that clarifies a lot of concepts that were previously murky and gives researchers a way to talk clearly with each other about causal ideas and to share their data and intuitions more reliably.

    Having just read the first half of the 2nd edition of Causality, I think the theory is more useful than your comments about SEM imply. (The book starts with a general theory based on non-parametric functions, and only later in the book are those ideas applied to SEM.) It seems that your comments overlook the most important parts of the theory and boil it down to SEM, which Pearl himself spends a good bit of ink arguing has departed from its original causal roots and now serves as an unreliable guide to causality.

    To Ceolaf's comment, a large part of causality theory is about integrating qualitative and quantitative information, using information from the one side to test, interpret, and theorize about the other, and to formalize the relationships between the two. Aren't these exactly the kind of tools we will need if we want to discover causal relationships, to integrate data with intuition to get the best from both?

    As I framed it earlier, the discovery of new causal relationships is essentially a search problem. The input to the problem is our universe, a vast search space filled with hidden causal relationships. Our challenge is to search that space as efficiently as we can, to maximize our findings of interesting causal relationships before we run out of time and money. Seth has given us a way to explore nodes more cheaply — self-experimentation — which lets us cover more of the search space, and that helps us make more discoveries. Causality theory has given us a way to turn the search problem into one of directed search, where we can use the information we have already uncovered to select those nodes on the frontier of our understanding that are most likely to yield discoveries. It lets us spend our finite resources more efficiently, and that allows us to search more deeply, and that, too, helps us make more discoveries.

    Cheers,
    Tom
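
    A toy sketch of the directed-search framing in the comment above, with invented hypotheses and scores: keep a frontier of candidate cause-effect hypotheses ordered by how promising (and cheap) they are to check, and always spend the next experiment on the best one.

    ```python
    # Best-first search over hypothetical cause-effect hypotheses. The
    # hypotheses, scores, and expansion rule are all made up; the point
    # is only the shape of directed search.
    import heapq

    def directed_search(seeds, expand, score, budget):
        """Return the hypotheses examined, most promising first, within budget."""
        frontier = [(-score(h), h) for h in seeds]
        heapq.heapify(frontier)
        examined = []
        while frontier and len(examined) < budget:
            _, hypothesis = heapq.heappop(frontier)
            examined.append(hypothesis)            # stand-in for a cheap experiment
            for follow_up in expand(hypothesis):   # ideas suggested by what we just saw
                heapq.heappush(frontier, (-score(follow_up), follow_up))
        return examined

    # Invented example: hypotheses as (cause, effect) pairs with rough scores.
    seeds = [("evening light", "sleep"), ("skipped breakfast", "mood"), ("new soap", "acne")]
    scores = dict(zip(seeds, [0.6, 0.3, 0.8]))
    print(directed_search(seeds, expand=lambda h: [], score=scores.get, budget=2))
    ```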

  8. What's the context of the talk? I'm sure I'm missing something, but the obvious example of new causes being discovered using statistics is in areas like agronomy, genetics, and pharmaceutical research, where thousands of compounds are subject to exploratory RCTs on an industrial scale – without any specific theoretical reason to suspect a cause – and some of them have causal effects which are detected at the other end. This work doesn't get the respect it deserves because using lots of money and brute force to discover something is less glamorous than having a lone theorist deduce something from first principles, but it's hugely important.

    If we limit ourselves to exploratory analysis of observational data, I think you're right that statistics just isn't suited to such questions. I think the problem is that there isn't just one effect of interest – like higher yield – that you can get identification on through experimental design. So you have multiple potential cause-effect relationships, and the combinatorics of filtering out the scarce genuine causal relationships from the huge number of relationships generated by chance or confounding just isn't very good. Maybe the ASA talk will surprise me, though?

  9. With all due respect, Andrew, I believe you are interpreting the causal literature too narrowly – as just "methods for estimating relationships within existing datasets".

    Pearl, Rubin and others make interesting points about identifying exactly what _blocks_ credible causal claims and identifying/designing ways to get around it.

    For instance, Sander Greenland, in his multiple-bias-analysis case study of electromagnetic fields and childhood leukemia, argues that 12 case-control studies with similar risk estimates can't provide credible causal inference and another 100 new ones won't help – instead, studies directly dealing with issues such as misclassification are required.

    And of course Ceolaf's right, any evidence is necessarily a mix of qualitative and quantitative aspects.

    Keith

  10. Tom:

    I'm skeptical of Pearl et al.'s claim that one can learn about causal relations from observational data without additional scientific hypotheses or theory. This is a point I've discussed in the earlier blog entries. I also don't think it makes sense to ask questions such as "Does X cause Y?" with observational data. Statistically, this is the problem of estimating whether a parameter exactly equals zero, and in the sorts of problem I work on, that is not an interesting question.

    I am not an expert on SEM and the theories of Pearl, Spirtes, etc. I am referring to Pearl's work as an extension of SEM because it was my impression that he referred to it in that way. The connection to SEM was not meant to be a put-down of the causal graphical model stuff; I was just trying to put them in a general framework of methods that attempt to learn about causal relationships from arbitrary observational data.

    I agree with you that one can think of discovery as a search problem. I also agree with Seth's point that statistical methods, with the exception of the EDA literature (which I myself have contributed to!), aren't so strong on search.

    Alex:

    The stuff you're talking about seems very relevant to Seth's question. Perhaps you could supply some good references? (I've told Seth about the blog entry, so I think he's reading all of this–or he will be soon, at least).

    Keith:

    Interesting point. I went to a conference in 1989 on e-m waves and childhood leukemia, and in fact one of the examples in our multiple comparisons article comes from an experiment in this field. Since then, I've had the vague impression that the causal claims had been debunked, but I haven't followed the literature on this.

  11. > causal claims (e-m waves and childhood leukemia) had been debunked

    Greenland's position (a couple of years ago, anyway) was that careful explication and inclusion of informative prior information on biases and confounding outweighed or downweighted the 12 consistent estimates of harm from the case-control studies too much for any "conclusive" evidence of harm to be drawn – but, more importantly, that more case-control studies would not help; only studies that provide information about the relevant biases and confounding would.

    But the general point I was not making very clear earlier is that in my reading of Rubin (mostly) and Pearl (a bit), I have found the material about designing new studies, rather than making do with existing studies, to be the more important contribution. (Perhaps because I think of meta-analysis mostly as an opportunity to figure out what studies need to be done next, and how, rather than as an attempt to make do with existing studies.)

    But I also know of some people in the large clinical databases arena who (because of the causal literature) are on the lookout for instrumental variables or natural experiments where there might be a less risky opportunity to learn about as-yet-undefined effects – just knowing there were two roughly comparable groups of patients being treated differently in some way presents a possible learning opportunity.

    Sorry if this post was a bit long

    Keith

  12. Alex, my talk will be part of a session called "The quantified self: personal data collection, analysis and exploration" organized by Hadley Wickham. My talk will be about what I have learned from keeping track of my health (e.g., my weight) for long periods of time. These records have helped me discover cause-effect relationships in two ways:

    1. They make it easier to notice a sudden change (e.g., in acne). After I see such a change, I ask myself: What did I change just before that?

    2. During a self-experiment to see if X affects Y, Z changes — apparently X affects Z. That I was already measuring Z makes this easier to see.

    By "discover" a cause-effect relationship I mean give it a boost from very low plausibility to the level of plausibility called "plausible". This initial boost I call "discover"; later increases in plausibility come about from what I call "testing" the idea (that X causes Z) or "confirming" the idea. The statistics literature, with a few exceptions, is all about testing/confirmation. A paper by Pearl in Statistics Surveys was about testing/confirmation, not discovery. Likewise, the work mentioned by Rubin and Greenland is about moving an idea from plausible to nearly certain.

    My usage of "discover" can be confusing because some people might say to "discover" that X causes Z you must take the plausibility of the relationship from very low to very high. To move the plausibility that X causes Z from plausible to nearly certain usually requires the sort of experiments that statisticians are familiar with. With that broad meaning of "discover" (very low to very high), sure, the statistics literature says a lot. But I think scientists usually use the term the way I'm using it. The person who does the heavy lifting, who pulls the plausibility from nowhere to somewhere, gets credit. For example, Mendel is said to have discovered simple ratios in breeding experiments. He didn't make them nearly certain; he merely made them plausible. Only when they were confirmed by others did the idea become nearly certain. But Mendel gets all the "discovery" credit.

    In areas where designed experiments are possible, there is plenty of testing, of course: you design an experiment to test if X causes Z. You can't afford to do a designed experiment to test the possibility that X causes Z when that relationship has very low plausibility. The expected payoff is too low.

    The brute force approaches used in a few fields, such as combinatorial chemistry, owe little or nothing to statistics research, as far as I know. I agree with Tom that discovering cause-effect relationships is basically a search problem. As far as I know — which isn't much — causality theory has not yet helped anyone accomplish the sort of discovery I'm talking about.
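
    A minimal sketch, with entirely made-up numbers, of the "notice a sudden change" step in point 1 above: compare a short recent window of a tracked variable against an earlier baseline and flag the day the recent average drifts well outside the baseline noise, which is the cue to ask what changed just before that.

    ```python
    # Toy change detection on a hypothetical daily log (e.g., weight).
    # Data, window lengths, and threshold are all invented; the point is
    # only the mechanic of spotting a sudden shift worth explaining.
    import numpy as np

    rng = np.random.default_rng(1)
    weight = 80 + rng.normal(0, 0.4, 120)   # 120 days of made-up measurements
    weight[90:] -= 1.5                      # pretend something changed on day 90

    window = 7                              # the most recent week
    for day in range(37, 120):
        baseline = weight[day - 37:day - window]   # earlier 30-day reference
        recent = weight[day - window:day]          # last 7 days
        shift = recent.mean() - baseline.mean()
        if abs(shift) > 4 * baseline.std(ddof=1) / np.sqrt(window):
            print(f"day {day}: average shifted by {shift:+.2f}; what changed just before?")
            break
    ```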

  13. Regarding Seth's last comment — doesn't statistics have a well-developed framework for discovering plausible relationships from (large amounts of) observational data? This is what "Data Mining" is all about. It tends to be grouped under Machine Learning rather than Statistics, but I believe the fields are tied together very closely.

  14. Alex: If you have some references on this for Seth, that would be great. My impression is that machine learning is focused more on prediction than understanding, but then there are methods used in fields such as genetics to try to sift through masses of data to find patterns that warrant further investigation.

    One difference between data mining and Seth's self-experimentation is that the former is entirely observational while Seth's work is experimental.

    One of Seth's points, I believe, is that experimentation–manipulation–is crucial, but the statistical literature on experimentation focuses on less important aspects such as randomization and double-blindness.

    Seth: You might be interested in the literature on industrial experimentation (to start, take a look at the classic book by Box, Hunter, and Hunter) which considers how to estimate effects when varying several factors at once (rather than just one at a time). This work doesn't have a high profile in statistics right now, but it might be just the thing for what you're doing. I imagine that these ideas are being used a lot in marketing research and maybe will resurface soon in the statistical literature.
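
    A minimal sketch, with made-up response values, of the kind of multi-factor experiment Box, Hunter, and Hunter describe: a two-level full factorial design in three factors, where every main effect and interaction is estimated from simple contrasts over the same eight runs instead of varying one factor at a time.

    ```python
    # 2^3 full factorial: 8 runs covering every low/high combination of
    # three factors A, B, C. The responses y are invented for illustration.
    import itertools
    import numpy as np

    runs = np.array(list(itertools.product([-1, 1], repeat=3)))   # coded settings, 8 x 3
    y = np.array([60, 72, 54, 68, 52, 83, 45, 80], dtype=float)   # hypothetical responses

    def effect(cols):
        """Estimated change in y as the given contrast goes from -1 to +1."""
        contrast = np.prod(runs[:, cols], axis=1)
        return contrast @ y / (len(y) / 2)

    for name, cols in [("A", [0]), ("B", [1]), ("C", [2]),
                       ("AB", [0, 1]), ("AC", [0, 2]), ("BC", [1, 2]), ("ABC", [0, 1, 2])]:
        print(f"{name:>3}: {effect(cols):6.2f}")
    ```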

  15. Thanks, Andrew. The classic book by Box, Hunter, and Hunter is what got me thinking along these lines! It is full of suggestions about how to test ideas — with not one word about how to come up with an idea worth testing.

  16. Thanks, Seth – "plausibility from nowhere to somewhere" clarifies it for me, I think.

    You might find the first of C.S. Peirce's three categories of interest (abduction, deduction, induction; loosely put – might, must, should; or maybe, more aptly here – qualitative, mathematical, statistical inference).

    The challenge of his first category, as he put it, is that once you "think" about it – it's gone.

    Whether that's part of statistics is a good question – C.R. Rao's book on statistics for non-statisticians omitted abduction in the first edition and then added a whole chapter on it in the later edition.

    My abduction would be that what you're interested in is almost all qualitative – of course, until I start thinking about it a bit more …

    Hopefully you will post your talk sometime.

    Keith

  17. After thinking about it a bit more …

    J.G. Gardin actually argued that researchers should be strongly discouraged from using "statistics" (data mining, EDA) to generate hypotheses (abduction), especially in his forced publication "in protest" about the SNARK program he developed to data-mine archaeological data in the late 1960s. He argued these were non-productive hypothesis searches – and though one could not rule out a hypothesis by the way it was generated, one should be careful about choosing which ones to spend time further elaborating… (On the other hand, David Cox has suggested that one of the values of causal graphs might be their constraining of such hypothesis searches.)

    Also, one thing Peirce was clear about was his argument that creativity (good hypotheses) requires real confusion or doubt, and doubt cannot be faked. So if your self-experimentation does increase real doubt, that would be consistent with this.

    Keith

  18. In response to Box, Hunter, and Hunter being "full of suggestions about how to test ideas — with not one word about how to come up with an idea worth testing":

    That's not how I would view it. The 2nd edition, published in 2005, has useful, entertaining, and thought-provoking quotes on the front and back covers. Two that I particularly like are:

    Discovering the unexpected is more important than confirming the known.

    and

    The most exciting phrase in science, the one that heralds discoveries, is not "Eureka!" but "Now that's funny…" (Isaac Asimov)

    The first chapter of the book is entitled "Catalyzing the Generation of Knowledge", and that's how the authors view statistical methods, as a catalyst.

    There is a very nice example in Chapter 12 summarizing an experiment to optimize the yield of a chemical process. After a designed experiment was carried out, the experimenters found, rather surprisingly to them at the time, that there was a "ham sandwich" maximum rather than a point maximum, i.e., the yield was highest on a plane in the three experimental variables (the ham) but decreased away from the optimal plane (the bread). Trying to understand this, a chemist on the team suggested that the system might be explained by two consecutive reactions, where in the first reaction the desired product was formed, but in the second reaction the desired product reacted to form an undesired product. Assuming the activation energies of the two reactions were the same, a "ham sandwich" maximum would be obtained, similar to the statistical model. To increase the yield even further, some way of increasing the speed of the first reaction relative to that of the second had to be found, and this was achieved by using a different catalyst.

    The success of the project depended on both subject matter knowledge and statistical methods. Perhaps it could have been done without statistical methods, but certainly not as quickly and maybe not at all.
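
    A rough sketch, not the book's actual data, of the canonical analysis behind that "ham sandwich" story: fit a second-order response surface in three coded factors and inspect the eigenvalues of the matrix of quadratic coefficients. Eigenvalues near zero mark directions in which the fitted yield barely changes, i.e., a ridge or plane of maxima rather than a single point maximum.

    ```python
    # Simulated example: the yield falls off in only one direction of the
    # three-factor space, so the fitted quadratic has one clearly negative
    # eigenvalue and two near zero -- a planar ridge, not a point maximum.
    import numpy as np

    def quad_terms(X):
        """Design matrix columns: 1, x1, x2, x3, x1^2, x2^2, x3^2, x1x2, x1x3, x2x3."""
        x1, x2, x3 = X.T
        return np.column_stack([np.ones(len(X)), x1, x2, x3,
                                x1**2, x2**2, x3**2, x1*x2, x1*x3, x2*x3])

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(30, 3))                       # coded factor settings
    y = 90 - 5 * (X[:, 0] - 0.5 * X[:, 1]) ** 2 + rng.normal(0, 0.5, 30)

    beta, *_ = np.linalg.lstsq(quad_terms(X), y, rcond=None)
    B = np.array([[beta[4],     beta[7] / 2, beta[8] / 2],     # symmetric matrix of
                  [beta[7] / 2, beta[5],     beta[9] / 2],     # second-order terms
                  [beta[8] / 2, beta[9] / 2, beta[6]]])
    print("eigenvalues of B:", np.round(np.linalg.eigvalsh(B), 2))
    ```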

  19. Might be useful to distinguish "interest" confusion ("Now that's funny…") from nuisance confusion – i.e., "why was there a difference in adverse events between these apparently similar groups?" versus "were these groups actually similar?"

    Keith

  20. Keith, I agree with Peirce — creativity is greatly encouraged by doubt. That I had a sleep problem I wanted to solve (and didn't know how to solve) made it much easier to be creative. Yeah, I didn't just do self-experimentation, I did self-experimentation in the presence of doubt. A lot different from most self-experimentation by doctors.

    Neil, thanks for the example. I haven't looked closely at the 2nd edition of Box, Hunter, and Hunter. It was the first edition I was referring to.
