November 19, 2013

Fireside Science: The Inefficiency (and Information Content) of Scientific Discovery

This content has been cross-posted to Fireside Science.

In this post, I will discuss a somewhat trendy topic that needs further critical discussion. It combines a crisis in replicating experiments with the recognition that science is not a perfect or errorless pursuit. We start with a rather provocative article in the Economist called "Trouble at the Lab" [1]. The main idea: the practice of science needs serious reform, from the standardization of experimental replication to greater statistical rigor.

While there are indeed perpetual challenges in successfully replicating experiments and in finding the right statistical analysis for a given experimental design, most of the points in this article should be taken with a grain of salt. In fact, its conclusions seem to suggest that science should be run more like a business (GOAL: the most efficient allocation of resources). This article suffers from many of the same issues as the Science article featured in my last Fireside Science post. Far from being an efficient process, making scientific discoveries and uncovering the secrets of nature requires a very different set of ideals [2]. But don't just rely on my opinions. Here is a sampling of the letters to the editor that followed:

The first is from Stuart Firestein, the author of "Ignorance: how it drives science", which is discussed in [2]. He argues that applying a statistician's theoretical standards to all forms of data is not realistic. While the portion of the original article [1] discussing problems with statistical analysis in most scientific papers is the strongest point made, it also rests on some controversial assumptions. 

The first involves a debate over whether Null Hypothesis Significance Testing (NHST) is the best way to uncover significant relationships between variables. In practice, NHST usually means using t-tests and ANOVAs to determine whether experimental conditions (e.g. treatment vs. no treatment) differ significantly. As an alternative, naive and other Bayesian methods have been proposed [3]. However, these alternatives still make a number of assumptions about the scientific enterprise and the process of experimentation, to which we will return.
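For readers unfamiliar with the mechanics, here is a minimal sketch of the statistic behind one common NHST procedure, Welch's two-sample t-test, written with Python's standard library only. The measurements are invented for illustration, and the p-value uses a normal approximation (an assumption that is too liberal for samples this small; a real analysis would use the t distribution, e.g. from SciPy). The 0.005 comparison reflects the stricter evidence threshold Johnson argues for in [3].

```python
import math
from statistics import NormalDist, mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value (too liberal for small df)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Hypothetical measurements: treatment vs. no treatment
treatment = [5.1, 5.8, 6.2, 5.9, 6.4, 5.7]
control = [4.9, 5.0, 5.3, 4.7, 5.2, 5.1]

t, df = welch_t(treatment, control)
p = approx_two_sided_p(t)
print(f"t = {t:.2f}, df = {df:.1f}, approx p = {p:.5f}")
# Conventional 0.05 threshold vs. the stricter 0.005 argued for in [3]
print("significant at 0.05:", p < 0.05, "| at 0.005:", p < 0.005)
```

The verdict printed at the end depends entirely on the chosen threshold, which is one reason the choice of standard is itself contested.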

The second letter concerns one's orientation toward the philosophy of science. This touches on the issue of scientific practice, and on how the process of doing science may be misunderstood by a general audience. Interestingly, the notion of "trust, but verify" does not come from science at all, but from diplomacy and politics. It is odd that this is assumed to be the standard for science as well.

The third letter will serve as a lead-in to the rest of this post. This letter suggests that the scientific method is simply not up to the task of dealing with highly complex systems and issues. The problem, it argues, is one of public expectation, and I agree with this in part. While experimental methods provide a way to rigorously examine hypothesized relationships between two variables, uncertainty may often swamp that signal. Although I think this aspect of the critique is a bit too pessimistic, let's keep these thoughts in mind.

A reductionist tool in a complex world

Now let's turn to what an experiment uncovers with respect to the complex system you want to understand. While experiments offer great potential for control, they are essentially hyper-reductionist in scope. Considering that most experiments test the potential effect of one variable on another, an experiment may serve as much of a heuristic function as a simple mathematical model does [4]. And yet in the popular mind, empiricism (i.e. data) tends to trump conjecture (i.e. theory) [5].

Figure 1. A hypothesis of the relationship between a single experiment and a search space (e.g. nature) that contains some phenomenon of interest.

Ideally, the goal of a single experiment is to reliably uncover some phenomenon in what is usually a very large discovery space. As we can see in Figure 1, a single experiment must be designed to overlap with the phenomenon. This can be very difficult to accomplish when the problem at hand is complex and multi-dimensional (HINT: most problems are). A single experiment is also a relatively information-poor way to conduct this investigation, as shown in Figure 2. Besides treating experimental design as a highly controllable (or perhaps highly reduced) means of testing hypotheses, we can alternately think of it as an n-bit register [6].
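To make the register analogy slightly more concrete, here is one possible reading, offered as an assumption rather than as the definition intended in [6]: if each of n factors in a full factorial design is simply present or absent, the design distinguishes at most 2^n conditions and so can hold at most n bits about the search space. On this reading, a classic treatment vs. no-treatment experiment is a 1-bit register. The helper names and factor names are hypothetical.

```python
import itertools
import math

def design_conditions(factors):
    """All present/absent combinations of a full factorial design (hypothetical helper)."""
    return list(itertools.product((0, 1), repeat=len(factors)))

def max_bits(factors):
    """Upper bound on information held: log2 of the number of distinguishable conditions."""
    return math.log2(len(design_conditions(factors)))

single = ["treatment"]                              # one-factor experiment
factorial = ["treatment", "temperature", "diet"]    # hypothetical three-factor design
print(max_bits(single), max_bits(factorial))  # 1.0 3.0
```

Even the three-factor design covers a vanishingly small slice of a search space in which nature varies along many more dimensions, which is the sense in which a single experiment is information-poor.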

Figure 2. A single experiment may be an elegant way to uncover the secrets of nature, but how much information does it actually contain?

Now, to get an idea of how such overlap works in the context of replication, we can turn to the concept of an experimental footprint (Figure 3). An experimental footprint qualitatively describes what an experiment (or its replication) uncovers relative to some phenomenon of interest. Let's take animal behavior as an example. There are many sources of variation that contribute to a specific behavior. In any one experiment, we can observe only some of the behavior, and even less of its underlying contributing factors and causes.

A footprint is also useful for describing two things we often do not think about. One is the presence of hidden variables in the data. The other is the effect of uncertainty. Both depend on the variables tested and the problems chosen. But just because subatomic particles yield fewer surprises than human psychology does not necessarily mean that the psychologist is less capable than the physicist.

Figure 3. Experimental footprint of an original experiment and its replication relative to a natural phenomenon.
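Footprints are described qualitatively here, but a crude way to quantify them, assuming each footprint can be written down as a set of contributing factors, is to compute set overlaps (Jaccard similarity) and to list the factors that no experiment touched, which play the role of hidden variables. All factor names below are invented for the animal behavior example.

```python
def jaccard(a, b):
    """Overlap of two sets: size of intersection / size of union (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical factors contributing to a behavior (the phenomenon)
phenomenon = {"genotype", "early_experience", "hormones", "season", "social_context", "habitat"}
# Hypothetical footprints; "lab_lighting" is a factor outside the phenomenon itself
original = {"genotype", "early_experience", "lab_lighting"}
replication = {"early_experience", "hormones", "lab_lighting"}

print("original vs phenomenon:", round(jaccard(original, phenomenon), 2))
print("replication vs phenomenon:", round(jaccard(replication, phenomenon), 2))
print("original vs replication:", round(jaccard(original, replication), 2))
# Factors of the phenomenon that neither footprint covers (hidden variables)
print("hidden factors:", sorted(phenomenon - (original | replication)))
```

In this toy case the original and the replication overlap the phenomenon equally but cover different parts of it, so a "failed" replication need not contradict the original.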

The original maternal imprinting experiments that Konrad Lorenz conducted with geese serve as a good example. Those experiments were supposedly far messier [7] than the account presented in modern textbooks. What if we were suddenly to find out that replicating the original experimental template did not work in other animal species (or even among ducks anymore)? Such a result would suggest that we need a new way to assess it (other than chalking it up to mere sloppiness).

So while lack of replication is a problem, the notion of a crisis is overblown. As we have seen in the last example, the notion of replicable results is an idealistic one. Perhaps instead of saying that the goal of experimental science is replication, we should consider a great experiment to be one that reveals truths about nature.

This may be best achieved not by homogeneity alone, but also by a high degree of tolerance (or robustness) to changes in factors such as ecological validity. To assess the robustness of a given experiment and its replications (or variations), we can use information content to tell us whether or not a given set of non-replicable experiments actually yields information. This might be a happy medium between an anecdotal finding and a highly repeatable experiment.
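The post leaves the measure of information content open. One standard choice, assumed here, is the Shannon entropy (in bits) of the distribution of outcomes across a set of replications: a set that always yields the same result carries zero bits, while a mix of successes, failures, and unexpected results carries more. The outcome labels and counts are hypothetical.

```python
import math
from collections import Counter

def outcome_entropy(outcomes):
    """Shannon entropy (bits) of the empirical distribution of outcome labels."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical sets of eight replications each
always_replicates = ["success"] * 8
mixed = ["success"] * 4 + ["failure"] * 2 + ["unexpected"] * 2

print(outcome_entropy(always_replicates))  # 0.0
print(outcome_entropy(mixed))              # 1.5
```

Entropy captures only the variability side of the argument; it says nothing about whether the outcomes are predictive, which is the other half of the trade-off.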

Figure 4. Is the goal of an experiment unfailingly successful replication, or a robust design that provides diverse information (e.g. successful replications, failures, and unexpected results) across replications?

Consider the case of an experimental paradigm that yields various types of results, such as the priming example from [1]. While priming is highly replicable under certain conditions [8], its complexity requires that we take into account both the experimental footprint and the systematic variation between experimental replications.

This complexity can also be referred to as the error tolerance of a given experiment. Generally speaking, the error tolerance of a set of experiments rises as its information content (which is related to variability) increases. So even when replications do not pan out, they are still informative. To maximize error tolerance, we should aim for an experiment with a footprint small enough to be predictive but large enough to be informative.

In this way, experimental replication would no longer be the ultimate goal. Instead, the goal would be to achieve a sort of meta-consistency. Meta-consistency could be assessed by both the robustness and statistical power of an experimental replication. And we would be able to sleep a little better at night knowing that the line between hyper-reductionism and fraudulent science has been softened while not sacrificing the rigors of the scientific method.
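Statistical power, one of the proposed ingredients of meta-consistency, can be estimated before a replication is run. Below is a minimal sketch assuming a two-sided, two-sample z-test (normal approximation), equal group sizes, and a standardized effect size d; exact t-test power is slightly lower for small samples. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for standardized effect size d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # critical value, about 1.96 for alpha = 0.05
    shift = d * math.sqrt(n_per_group / 2)     # expected value of the test statistic
    # Probability of landing beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (10, 20, 64):
    print(n, round(approx_power(0.5, n), 2))
```

Under these assumptions, a replication with 10 subjects per group detects a medium effect (d = 0.5) only about one time in five, so its failure says little about the original result.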


[1] Unreliable Research: trouble at the lab. Economist, October 19 (2013).

[2] Alicea, B. Triangulating Scientific “Truths”: an ignorant perspective. Synthetic Daisies blog, December 5 (2012).

[3] Johnson, V.E. Revised standards for statistical evidence. PNAS (2013), doi: 10.1073/pnas.1313476110

[4] For more information, please see: Kaznatcheev, A. Are all models wrong? Theory, Games, and Evolution Group blog, November 6 (2013).

[5] Note that the popular conception of what a theory is and what theories actually are (in scientific practice) constitutes two separate spheres of reality. Perhaps this is part of the reason for all the consternation.

[6] An n-bit register is a concept from computer science, where a register is a place to hold information during processing. In this case, processing is analogous to exploring the search space of nature, and experimental designs are the representations of nature that serve as this register.

For a more formal definition of a register, please see: Rouse, M. What is a register? (2005).

[7] This is a personal communication, as I cannot remember the original source. The larger point here, however, is that groundbreaking science is often a trial-and-error affair. For an example (and its critique), please see: Lehrer, J. Trials and Errors: why science is failing us. Wired, December 16 (2011).

[8] For more on the complexity of psychological priming, please see: Van den Bussche, E., Van den Noortgate, W., and Reynvoet, B. Mechanisms of masked priming: a meta-analysis. Psychological Bulletin, 135(3), 452-477 (2009).
