June 21, 2014

Fireside Science: The Representation of Representations

This content is being cross-posted to Fireside Science, and is the third in a three-part series on the "science of science".

This is the final post in a series on the science of science and analysis. In past posts, we covered theory and analysis. However, there is a third component of scientific inquiry: representation. So this post is about the representation of representations, and how representations shape science in more ways than the casual observer might believe.

The three-pronged model of science (theory, experiment, simulation). Image is adapted from the Fermilab Today newsletter, April 27 (2012).

For the uninitiated, science is mostly analysis and data collection, with theory being a supplement at best and a necessary evil at worst. Ideally, however, modern science rests on three pillars: experiment, theory, and simulation. For these same uninitiated, the representation of scientific problems is a mystery. But in fact, representation has been the most important motivation behind many of the scientific results we celebrate today. Interestingly, the field of computer science relies heavily on representation, but this concern generally does not carry over into the empirical sciences.

Ideagram (i.e. representation) of complex problem solving. Embedded are a series of hypotheses and the processes that link them together. COURTESY: Diagram from [1].

Problem Representation
So exactly what is scientific problem representation? In short, it is the basis for designing experiments and conceiving of models. It is the sieve through which scientific inquiry flows, restricting the typical "question to be asked" to the most plausible or fruitful avenues. It is often the basis of consensus and assumptions. On the other hand, representation is quite a bit more subjective than people typically would like their scientific inquiry to be. Yet this subjectivity need not lead to an endless debate about the validity of one point of view versus another. There are heuristics one can use to ensure that problems are represented in a consistent and non-leading way.

3-D Chess: a high-dimensional representation of warfare and strategy.

Models that Converge
The idea of convergent models speaks to something I alluded to in "Structure and Theory of Theories" when I discussed the theoretical landscape of different academic fields. The first heuristic is whether or not allied sciences or models point in the same direction. To illustrate this, I will use a semi-hypothetical example: consider three models (A, B, and C) of the same phenomenon. Each of these models makes different assumptions and includes different factors, but they should at least be consistent with each other. One real-world example of this is the use of gene trees and species trees (both phylogenies) to understand evolution in a lineage [2]. In this case, each model uses the same taxa (evolutionary scenario), but draws on different data that can be incongruent. While there are a host of empirical reasons why these two models can exhibit incongruence [3], models that are as representationally complete as possible might resolve these issues.
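One way to make "pointing in the same direction" concrete for trees is to compare the clades (groups of taxa below each internal node) that two rooted trees imply. The sketch below is a minimal illustration, not a phylogenetics tool: trees are written as nested tuples, the three model topologies are hypothetical, and the distance is a simple Robinson-Foulds-style count of clades found in one tree but not the other.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree is either a taxon name (a leaf) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the set of taxa below every internal node of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def clade_conflict(a: Tree, b: Tree) -> int:
    """Count clades present in one tree but not the other
    (0 means the two topologies agree)."""
    return len(clades(a) ^ clades(b))


# Hypothetical topologies for three models of the same four taxa.
model_a = ((("human", "chimp"), "gorilla"), "orangutan")   # e.g. a species tree
model_b = ((("human", "gorilla"), "chimp"), "orangutan")   # e.g. one gene tree
model_c = ((("chimp", "human"), "gorilla"), "orangutan")   # same as A, reordered

if __name__ == "__main__":
    for name, other in [("B", model_b), ("C", model_c)]:
        print(f"A vs {name}: {clade_conflict(model_a, other)} conflicting clades")
```

Here A and C converge (0 conflicting clades), while A and B disagree on exactly two clades, {human, chimp} versus {human, gorilla}. The count says nothing about why they disagree; that is where the representational completeness of each model comes in.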

Orientation of Causality
The second heuristic is to ensure that one's representation gets the source of causality right. For problems that are ill-posed or poorly characterized, this can be an issue. Let's take Type III errors [4] as an example. In hypothesis testing, Type III errors involve using the wrong explanation for a significant result. In layman's terms, this is getting the right answer for the wrong reasons. Even more than in the case of Type I and II errors, focusing on the correct problem representation plays a critical role in resolving potential Type III errors.
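A small simulation can show how a highly significant result can coexist with the wrong causal story. In this sketch, which assumes variables X, Y, and a hidden common cause Z that do not appear in the post, Z drives both X and Y. Observationally, X and Y are strongly correlated, so a representation in which "X causes Y" passes a significance test. When X is instead set independently of Z (a simulated intervention), the association disappears.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing a correlation r against zero with n samples."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def simulate(n=5000, seed=1):
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]        # hidden common cause
    x_obs = [zi + rng.gauss(0, 1) for zi in z]     # X is driven by Z
    y = [zi + rng.gauss(0, 1) for zi in z]         # Y is driven by Z only, never by X
    # Simulated intervention: X is assigned independently of Z,
    # with the same variance (2) as the observational X.
    x_set = [rng.gauss(0, math.sqrt(2)) for _ in range(n)]
    return pearson(x_obs, y), pearson(x_set, y)


if __name__ == "__main__":
    n = 5000
    r_obs, r_int = simulate(n)
    print(f"observational r = {r_obs:.2f}, t = {t_statistic(r_obs, n):.1f}")
    print(f"interventional r = {r_int:.2f}, t = {t_statistic(r_int, n):.1f}")
```

The observational correlation should land near 0.5 with an enormous t statistic, while the interventional one should be near zero: the association is real, but the causal orientation built into the "X causes Y" representation is wrong.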

Yet problem representation does not always help resolve these types of errors. Skeptical interpretation of the data can also be useful [5]. To demonstrate this, let us turn to the over-hyped area of epigenetics and its larger place in evolutionary theory. Clearly, epigenetics plays some role in the evolution of life, but it is not deeply established in terms of models and theory. Because of this representational ambiguity, some interpretations play a trick. In a conceptual representation that embodies this trick, scarcely understood high-level phenomena such as epigenetics usurp the role of related phenomena such as genetic diversity and population processes. When the thing in your representation is poorly defined, popular, or both (e.g. epigenetics), it can take on a causal life of its own. Posing the problem in this way allows one to obscure known dependencies between genes, genetic regulation, and the environment without demonstrating any exceptions to these established relationships.

Popularity is Not Sufficiency
The third heuristic is to understand that popular conceptions do not translate into representational sufficiency. In logical deduction, it is often pointed out that necessity does not equal sufficiency. But as the epigenetics example shows, popularity also cannot make something sufficient in and of itself. In my opinion, this is one of the problems with using narrative structures in the communication of science: sometimes an appealing narrative does more to obscure scientific findings than to make them accessible to lay people.

This is easy to see in the media coverage of any big news story. CNN's coverage of the missing Malaysia Airlines Flight 370 [6] shows this quite clearly: airing rampant speculation and conspiracy theories was a way to keep an increasingly popular story in the spotlight. In such cases, speculation is the order of the day, while thoughtful analysis gets pushed aside. But is this simply a sin of the uninitiated, or can we see parallels in science? Most certainly, there is a problem with recognizing the difference between "popular" science and worthwhile science [7]. There is also precedent in the way certain studies or areas of study are hyped. Some in the scientific community [8] have argued that Nature's hype of the ENCODE project results [9] fell into this category.

One example of a mesofact: ratings for the TV show The Simpsons over the course of several hundred episodes. COURTESY: Statistical analysis in [10].

Related to these points is the explicit relationship between data and problem representation. In some ways, this brings us back to a computational view of science, in which data do not make sense unless they are viewed in the context of a data structure. But sometimes the factual aspect of data varies over time in a way that obscures our mental models and, in turn, problem representation.

To make this explicit, Sam Arbesman has coined the term "mesofact" [11]. A mesofact is knowledge that changes slowly over time as new data arrive. The populations of specific places (e.g. Minneapolis, Bolivia, Africa) have changed in both absolute and relative terms over the past 50 years. But when problems and experimental designs are formulated assuming that facts related to these data (e.g. the rank of cities by population) do not change over time, we can get the analysis fundamentally wrong.
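The city-ranking example can be sketched in a few lines. The cities Alpha, Beta, and Gamma, their starting populations, and their constant growth rates are all invented for illustration (they are not real data); the point is only that a ranking recorded once as a "fact" silently goes out of date as the underlying numbers drift.

```python
# Hypothetical cities: (population in year 0, constant annual growth rate).
CITIES = {
    "Alpha": (1_000_000, 0.000),
    "Beta": (800_000, 0.010),
    "Gamma": (500_000, 0.025),
}


def population(p0, rate, years):
    """Population after `years` of constant compound growth."""
    return p0 * (1 + rate) ** years


def ranking(years):
    """City names ordered from largest to smallest population."""
    pops = {name: population(p0, r, years) for name, (p0, r) in CITIES.items()}
    return sorted(pops, key=pops.get, reverse=True)


def years_until_stale(max_years=100):
    """First year in which the year-0 ranking is no longer true, or None."""
    baseline = ranking(0)
    for t in range(1, max_years + 1):
        if ranking(t) != baseline:
            return t
    return None


if __name__ == "__main__":
    print("year 0: ", ranking(0))
    print("year 50:", ranking(50))
    print("year-0 ranking goes stale in year", years_until_stale())
```

With these made-up numbers the year-0 ranking (Alpha, Beta, Gamma) first fails in year 23, when Beta overtakes Alpha, and by year 50 it is fully reversed. No single year's change is dramatic, which is exactly what makes a mesofact easy to miss.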

This may seem like a trivial example. However, mesofacts are relevant to a host of problems in science, from experimental replication to inferring the proper order of causation. The problem comes down to an interaction between the natural variance of data (variables) and the constructs used to represent our variables (facts). When the data vary around an unchanging mean, it is much easier to use a variable as a stand-in for a fact. But when this is not true, scientifically rigorous facts are much harder to come by. Instead of getting into an endless discussion about the nature of facts, we can look to how facts and problem representation might help us tease out the more metaphysical aspects of experimentation.
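The contrast between data that vary around an unchanging mean and data whose mean drifts can be sketched with two synthetic series. In this hypothetical setup, the mean of the first half of each series is treated as the "fact", and we measure how far the second half has moved away from it. The baseline of 10, the noise level, and the trend slope are arbitrary assumptions.

```python
import random
import statistics


def series(n, slope, seed):
    """Noisy series around 10 with a linear trend of `slope` per step."""
    rng = random.Random(seed)
    return [10 + slope * t + rng.gauss(0, 1) for t in range(n)]


def snapshot_error(values):
    """Treat the mean of the first half as a 'fact' and return how far
    the mean of the second half has drifted from it."""
    half = len(values) // 2
    fact = statistics.mean(values[:half])
    later = statistics.mean(values[half:])
    return later - fact


if __name__ == "__main__":
    stationary = series(100, slope=0.0, seed=7)   # unchanging mean
    trending = series(100, slope=0.1, seed=7)     # slowly drifting mean
    print(f"stationary drift: {snapshot_error(stationary):+.2f}")
    print(f"trending drift:   {snapshot_error(trending):+.2f}")
```

For the stationary series the snapshot stays close to later observations, so the variable can reasonably stand in for a fact. For the trending series the same snapshot is off by roughly five units (0.1 per step times the 50-step gap between the half-window midpoints), even though the noise is identical.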

Applying Problem Representation to Experimental Manipulation
When we do experiments, how do we know what our experimental manipulations really mean? The question may seem self-evident, but it is worth exploring. Suppose that you wanted to explore the causes of mental illness, but did not have the benefit of modern brain science as a guide. In defining mental illness itself, you might work from a behavioral diagnosis. But the mechanisms would still be a mystery. Is it a supernatural mechanism (e.g. demons) [12], an ultimate form of causation (reductionism), or a global but hard-to-see mechanism (e.g. quantum something) [13]? An experiment done the same way but assuming three different causal architectures could conceivably yield statistical significance for all of them.

In this case, a critical assessment of problem representation might be able to resolve this ambiguity. Computational scientists, as modelers and approximators, deal with this all the time. Yet it is also an implicit (and perhaps even more fundamental) component of experimental science. For most of the scientific method's history, we have gotten around this fundamental concern by relying on reductionism. But doing so restricts us to highly focused science that does not appeal to the big picture. In a sense, we are blinded by science by doing science.

Focusing on problem representation offers a way out of this. Not only does it allow us to break free from the straitjacket of reductionism, but it also allows us to address the problem of experimental replication more directly. As has been discussed in many other venues [14], the inability to replicate experiments has plagued both psychological and medical research. But it is in these areas that representation is most important, primarily because it is hard to get right. Even in cases where the causal mechanism is known, the underlying components and the amount of variance they explain can vary substantially from experiment to experiment.
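How much the variance explained can swing between identical experiments is easy to underestimate. The sketch below repeats one hypothetical experiment many times with a fixed, known effect (a true correlation of 0.3, 30 subjects per experiment, and 200 replicates, all assumed values) and records each replicate's R² and whether it reached p < 0.05. The critical value 2.048 is the two-sided 5% cutoff of Student's t with 28 degrees of freedom, so it is only valid for 30 subjects.

```python
import math
import random

N_PER_EXPERIMENT = 30
T_CRIT = 2.048  # two-sided 5% critical value of Student's t for 28 (= 30 - 2) df


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def replicate(n_experiments=200, true_r=0.3, seed=3):
    """Repeat one identical experiment and record, for each replicate,
    the variance explained (R^2) and whether the result was significant."""
    rng = random.Random(seed)
    n = N_PER_EXPERIMENT
    noise_sd = math.sqrt(1 - true_r ** 2)  # makes the true corr(x, y) equal true_r
    results = []
    for _ in range(n_experiments):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [true_r * xi + rng.gauss(0, noise_sd) for xi in x]
        r = pearson(x, y)
        t = r * math.sqrt((n - 2) / (1 - r * r))
        results.append((r * r, abs(t) > T_CRIT))
    return results


if __name__ == "__main__":
    results = replicate()
    r2 = [v for v, _ in results]
    share_sig = sum(sig for _, sig in results) / len(results)
    print(f"R^2 across replicates: min {min(r2):.2f}, max {max(r2):.2f}")
    print(f"share of replicates reaching p < 0.05: {share_sig:.0%}")
```

Even though the mechanism is fixed and known by construction, the estimated R² ranges widely across replicates, and well under half of them reach significance. A representation that treats any single replicate's variance decomposition as the fact would mislead.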

Theoretical Shorthand as Representation
Problem representation also allows us to make theoretical statements using mathematical shorthand. In this case, we face the same problem as the empiricist: are we focusing on the right variables? More to the point, are these variables fundamental or superficial? To flesh this out, I will discuss two examples of theoretical shorthand, and whether or not they might be concentrating on the deepest (and most generalizable) constructs possible.

The first example comes from Hamilton's rule, derived by the behavioral ecologist W.D. Hamilton [15]. Hamilton's rule describes altruistic behavior in terms of kin selection. The rule is a simple linear inequality, rB > C, where r is the genetic relatedness between the actor and the recipient, B is the fitness benefit to the recipient, and C is the fitness cost to the actor; it assumes that adaptive outcomes will be optimal ones. As a representation, these properties give it a sort of elegance that makes it very popular.

In this short representation, an individual's relatedness to a conspecific weights the benefit of helping that individual, so relatedness matters more to the behavioral motivation to help than a simple trade-off between costs and benefits would suggest. Thus, an individual will invest more in a social relationship with a closely related conspecific (e.g. a brother) than with non-kin, and will generally take more personal risks in doing so. While more math is used to support the logic of this statement [15], the inequality is often treated as a widely applicable theoretical statement. However, some observers [16] have found the parsimony of this representation to be both incomplete and intellectually unsatisfying. And indeed, an over-simplistic model sometimes does not deal with exceptions well.
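Hamilton's rule, rB > C (r the relatedness of actor to recipient, B the benefit to the recipient, C the cost to the actor), is compact enough to evaluate directly. The sketch below uses the standard coefficients of relatedness for an outbred diploid population; the benefit and cost values are hypothetical fitness units chosen only for illustration.

```python
import math

# Coefficients of relatedness for an outbred diploid population.
RELATEDNESS = {
    "full sibling": 0.5,
    "half sibling": 0.25,
    "first cousin": 0.125,
    "non-kin": 0.0,
}


def altruism_favored(r, benefit, cost):
    """Hamilton's rule: kin selection favors helping when r * B > C."""
    return r * benefit > cost


def break_even_benefit(r, cost):
    """Benefit that must be exceeded for helping to be favored: B > C / r."""
    return math.inf if r == 0 else cost / r


if __name__ == "__main__":
    benefit, cost = 3.0, 1.0  # hypothetical fitness units
    for kin, r in RELATEDNESS.items():
        print(f"{kin:12s} r={r:<5} favored={altruism_favored(r, benefit, cost)} "
              f"needs B > {break_even_benefit(r, cost)}")
```

With these numbers only full siblings clear the threshold (0.5 × 3 = 1.5 > 1); a first cousin would need a benefit above 8 cost units. The brevity is the appeal, and also the limitation: nothing in the three variables describes how the behavior arises or when the linear trade-off breaks down.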

The second example comes from Thomas Piketty's work. Piketty, an economist and author of "Capital in the 21st Century" [17], has proposed a simple inequality, r > g, in which r is the rate of return on capital and g is the rate of economic growth. (Piketty separately calls the identity α = r × β, which links capital's share of national income to the return on capital and the capital/income ratio, his "First Fundamental Law of Capitalism" [17].) The inequality characterizes the relationship between economic growth, inherited wealth, and income inequality within a society.

In this equally short representation, inequality is driven by the relative dominance of two factors: the return on (largely inherited) wealth and economic growth. When growth is very low relative to the return on capital, inherited wealth accumulates faster than the economy as a whole, so inequality persists and dampens economic mobility. In Piketty's book, other equations and a good deal of empirical investigation are used to support this statement. Despite its simplicity, it has held up (so far) to the scrutiny of other economists [18]. In this case, a representation built from variables that generalize broadly but do not handle exceptional behavior well produces a highly predictive model. On the other hand, this form of representation also makes it hard to distinguish between a highly unequal post-industrial society and a feudal, agrarian one.
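Piketty's shorthand can be illustrated with two tiny functions. The first applies his first fundamental law, α = r × β (capital's share of income equals the return on capital times the capital/income ratio). The second is a deliberately crude sketch of why r > g matters, under a strong simplifying assumption that is not Piketty's full model: a fortune reinvests all of its returns at rate r (no consumption, no taxes) while national income grows at rate g. The input values are illustrative only.

```python
def capital_share(r, beta):
    """First fundamental law: alpha = r * beta."""
    return r * beta


def wealth_to_income_drift(r, g, years):
    """Factor by which a fortune's size relative to national income changes
    after `years`, ASSUMING all returns are reinvested and income grows at g."""
    return ((1 + r) / (1 + g)) ** years


if __name__ == "__main__":
    print(f"capital share alpha = {capital_share(0.05, 6.0):.2f}")
    for r, g in [(0.05, 0.01), (0.03, 0.03), (0.02, 0.04)]:
        print(f"r={r:.2f} g={g:.2f} r>g={r > g} "
              f"30-year drift x{wealth_to_income_drift(r, g, 30):.2f}")
```

A 5 percent return on a capital stock worth six years of income gives a capital share of about 0.30. Under the reinvestment assumption, r = 5% against g = 1% roughly triples a fortune's weight relative to national income within 30 years, r = g leaves it unchanged, and r < g shrinks it. The same two rates could describe very different societies, which is the representational blind spot noted above.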

Final Thoughts
I hope to have shown you that representation is an underappreciated component of doing and understanding science. While the scientific method is our best strategy for discovering new knowledge about the natural world, it is not without its burden of conceptual complexity. In the theory of theories, we learned that formal theories are both based on deep reasoning and (by necessity) often incomplete. In the analysis of analyses, we learned that the data are not absolute: much reflection and analytical care are needed to ensure that an analysis represents meaningful facets of reality. And in this post, these loose ends were tied together in the form of problem representation. While it is an underappreciated aspect of practicing science, representing problems in the right way is essential for separating science from pseudoscience, reality from myth, and proper inference from hopeful inference.

[1] Eldrett, G.   The art of complex problem-solving. MediaExplored blog, July 10 (2010).

[2] Nichols, R.   Gene trees and species trees are not the same. Trends in Ecology and Evolution, 16(7), 358-364 (2001).

[3] Gene trees and species trees can be incongruent for many reasons. Nature Knowledge Project (2012).

[4] Schwartz, S. and Carpenter, K.M.   The right answer for the wrong question: consequences of type III error for public health research. American Journal of Public Health, 89(8), 1175–1180 (1999).

[5] It is important here to distinguish between careful skepticism and contrarian skepticism. In addition, skeptical analysis is not always compatible with the scientific method.

For more, please see: Myers, P.Z.   The difference between skeptical thinking and scientific thinking. Pharyngula blog, June 18 (2014) AND Hugin   The difference between "skepticism" and "critical thinking"? RationalSkepticism.org, May 19 (2010).

[6] Abbruzzese, J.   Why CNN is obsessed with Flight 370: "The Audience has Spoken". Mashable, May 9 (2014).

[7] Biba, E.   Why the government should fund unpopular science. Popular Science, October 4 (2013).

[8] Here are just a few examples of the pushback against the ENCODE hype:

a) Mount, S.   ENCODE: Data, Junk and Hype. On Genetics blog, September 8 (2012).

b) Boyle, R.   The Drama Over Project Encode, And Why Big Science And Small Science Are Different. Popular Science, February 25 (2013).

c) Moran, L.A.   How does Nature deal with the ENCODE publicity hype that it created? Sandwalk blog, May 9 (2014).

[9] For an example of the nature of this hype, please see: The Story of You: ENCODE and the human genome. Nature Video, YouTube, September 10 (2012).

[10] Fernihough, A.   Kalkalash! Pinpointing the Moments “The Simpsons” became less Cromulent. DiffusePrior blog, April 30 (2013).

[11] Arbesman, S.   Warning: your reality is out of date. Boston Globe, February 28 (2010). Also see the following website: http://www.mesofacts.org/

[12] Surprisingly, this is a contemporary phenomenon: Irmak, M.K.   Schizophrenia or Possession? Journal of Religion and Health, 53, 773-777 (2014). For a thorough critique, please see: Coyne, J.   Academic journal suggests that schizophrenia may be caused by demons. Why Evolution is True blog, June 10 (2014).

[13] This is an approach favored by Deepak Chopra. He borrows the rather obscure quantum-mechanical idea of "nonlocality" (the correlation between distant entangled particles) to link higher levels of conscious awareness to states of brain activity.

[14] Three (divergent) takes on this:

a) Unreliable Research: trouble at the lab. Economist, October 19 (2013).

b) Ioannidis, J.P.A.   Why Most Published Research Findings Are False. PLoS Med 2(8): e124 (2005).

c) Alicea, B.   The Inefficiency (and Information Content) of Scientific Discovery. Synthetic Daisies blog, November 19 (2013).

[15] Hamilton, W. D.   The Genetical Evolution of Social Behavior. Journal of Theoretical Biology, 7(1), 1–16 (1964). See also: Brembs, B.   Hamilton's Theory. Encyclopedia of Genetics.

[16] Goodnight, C.   Why I Don’t like Kin Selection. Evolution in Structured Populations blog, April 23 (2014).

[17] Piketty, T.   Capital in the 21st Century. Belknap Press (2014). See also: Galbraith, J.K.   Unpacking the First Fundamental Law. Economist's View blog, May 25 (2014).

[18] DeLong, B.   Trying, yet again, to communicate the arithmetic scaffolding of Piketty's "capital in the Twenty-First Century". Washington Center for Equitable Growth blog, June 5 (2014).
