June 27, 2014

Historical Contingencies at the Birthday Party

Historical contingencies are perhaps the most interesting outcomes of the evolutionary process. Stephen J. Gould spent a lot of time and energy popularizing this idea, but evidence for it comes from paleontology [1], extant populations [2], and experimental evolution [3]. However, the ubiquity of the contingency concept does not resolve its phylogenetic consequences. Is historical contingency highly specific (a hard constraint resulting in unique paths), or is it a softer constraint? And how can we understand the role of convergent evolution within this framework? We will approach this from a mathematical perspective, and answer the riddle of what evolution and birthdays have in common.

Definition of generative science (Wikipedia) and historical science (RationalWiki). 

Evolutionary Histories and Their Accidents
Like human history, evolutionary history is a product of many forces and causes. We often think of these factors as a series of chance events (sometimes unique) that lead to a given outcome [5]. Observers sometimes use this point to argue that history is not systematic and cannot be separated from its context, which would make comparative history impossible [5]. But this argument also assumes that the factors that make a given evolutionary history unique (its branching events) are "hard": not only are they irreversible, but the outcomes they produce should share no significant similarities. The outcomes of the evolutionary process (genotypes and phenotypes) are then locked into a specific trajectory. By itself, this constraint should favor some changes over others and disallow changes that resemble those of even closely related lineages.

If historical contingency is a hard constraint, this leads us to an evolutionary hypothesis: historical contingency creates irreversible paths to highly unique phenotypes. While a bit simplistic, this hypothesis nonetheless helps us understand the consequences of contingency. Recall that the evolutionary process occurs through branching and results in a series of evolutionary outcomes (Figure 1). While these outcomes are individually different, their degree of uniqueness depends on the "hardness" of the branch that separates one outcome from another. Figure 1 not only shows the results of branching, but also assumes "hard" constraints. The contingencies generated by this model involve hard constraints that result in a unique, lineage-specific partition of the search space.

Figure 1. An example of a phylogeny with unique, non-recurrent evolutionary outcomes. The evolutionary changes act as hard constraints, and each terminal taxon occupies a distinct 1-dimensional subspace. 

In Figure 1, a conventional phylogenetic model demonstrates how a search space can be partitioned through evolutionary branching. However, when the constraints are softer, each branching event produces less distinction between the resulting alternative forms and increases the chance that traits or forms resembling those of a related lineage (even a distantly related one) will emerge. Figure 2 demonstrates this difference using the analogy of the Plinko game [6]. In this case, the combination of the process and outcome of contingency creates an overlapping search space over time for a given lineage (see the distribution of Plinko balls at the bottom of Figure 2).
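One rough way to see the difference between the hard and soft cases is to count histories and outcomes. The sketch below assumes an idealized board in which every peg deflects a ball either left or right (a simplification, not the PhET simulation's code), and compares it with a strictly branching tree. The function names are hypothetical. In the tree, each of the 2^rows histories ends in its own outcome; on the board, the same histories pile into only rows + 1 bins.

```python
from math import comb


def tree_outcomes(depth: int) -> int:
    """Hard constraints (as in Figure 1): every left/right branching history
    ends in its own terminal outcome, so depth binary splits give 2**depth outcomes."""
    return 2 ** depth


def plinko_histories_per_bin(rows: int) -> list[int]:
    """Soft constraints (as in Figure 2): a ball deflected left or right at each
    of `rows` pegs lands in one of rows + 1 bins. Count how many distinct
    deflection histories end in each bin (Pascal's rule)."""
    counts = [1]  # one way to be at the top before meeting any peg
    for _ in range(rows):
        nxt = [0] * (len(counts) + 1)
        for position, ways in enumerate(counts):
            nxt[position] += ways      # deflected left
            nxt[position + 1] += ways  # deflected right
        counts = nxt
    return counts


if __name__ == "__main__":
    rows = 10
    per_bin = plinko_histories_per_bin(rows)
    print("distinct histories:", tree_outcomes(rows))
    print("distinct Plinko outcomes:", len(per_bin))
    print("histories per bin:", per_bin)
    # Sanity check: the counts are binomial coefficients.
    assert per_bin == [comb(rows, k) for k in range(rows + 1)]
```

With 10 rows there are 1024 distinct histories but only 11 bins, and 252 of those histories end in the middle bin: many different paths converge on the same outcome.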

Contingency also rests on the assumption that evolutionary randomness results in unique combinations of traits. One feature of historical contingency involves building upon previously acquired traits. As complexity is built up in this way, the total number of possibilities decreases. But while the stochastic nature of evolution is a matter of conditioned chance, branching is an assumption of theoretical intuition. Therefore, evolutionary outcomes can converge even when their forms nominally exist in different lineages. But given these constraints, shouldn't convergent evolution be impossible? Before we answer the question (and the answer is no), we must take an intellectual detour by way of birthday parties.

Figure 2. What it means to have an overlapping space of evolutionary outcomes enabled by soft historical constraints. COURTESY: Plinko Probability, version 2.02. PhET Interactive Simulations.

How are birthday parties at all relevant here? The birthday paradox, a statistical curiosity, might help us establish a link between contingency and recurrence. But first, let us revisit our model of the evolutionary process as a hierarchical tree. In this model, all possible combinations of traits are classified using a tree-like structure. Given that the search space is much larger than the number of objects being classified, do the objects also end up in unique categories? Perhaps. But, as we will learn, that may matter less than the size and complexity of the evolutionary landscape itself.

What is the birthday paradox [7]? Amazingly, it did not make a Quora list of the most counterintuitive mathematical results [8]. But perhaps this result is not so intuitive after all. Say you survey a room of n people. Assuming that every day of the year is equally likely to be a birthday, how many people do you need before there is a good chance that at least two of them share a birthday? Your answer depends on your intuitions about randomness. With 365 days in a typical year, you might assume that you would need a lecture hall of at least 300 people. In fact, with just 23 people the probability of a shared birthday exceeds 50%, with 47 people it reaches about 95%, and it approaches 100% as the group grows further (about 99.9% at 70 people). See Figure 2 for a graphical representation.
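The numbers above follow from the standard formula P(n) = 1 - 365! / ((365 - n)! * 365^n): the chance that all n birthdays differ is (365/365)(364/365)...((365 - n + 1)/365), and the paradox is the complement of that product. Below is a minimal sketch, assuming 365 equally likely days and no leap years; the function names are illustrative.

```python
from math import prod


def p_shared(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday,
    assuming all `days` birthdays are equally likely (no leap days)."""
    if n > days:
        return 1.0  # pigeonhole principle: a repeat is guaranteed
    p_all_distinct = prod((days - i) / days for i in range(n))
    return 1.0 - p_all_distinct


def smallest_group(target: float, days: int = 365) -> int:
    """Smallest group size whose shared-birthday probability reaches `target`."""
    n = 1
    while p_shared(n, days) < target:
        n += 1
    return n


if __name__ == "__main__":
    for n in (10, 23, 47, 70):
        print(n, round(p_shared(n), 4))
    print("50% threshold:", smallest_group(0.5))
    print("95% threshold:", smallest_group(0.95))
```

Here smallest_group(0.5) returns 23 and smallest_group(0.95) returns 47, matching the thresholds in the text.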

Figure 2. Number of people surveyed (x-axis) vs. probability of at least two people having the same birthday (y-axis).

Evolutionary Histories and Their Coincidents
This outcome results from a mathematical principle called recurrence. This principle suggests that motifs and themes can recur at an unknown frequency; it explains why you get runs of heads or tails in a series of coin flips. This recurrence has nothing to do with the outcomes being related to one another. The repeats are merely coincidences inherent in a generative process. In the evolutionary outcome space example, this suggests that overlap can occur in the form of deep similarities. Can this be applied to the probability that n lineages will exhibit convergence?
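The coin-flip case can be made concrete. The sketch below computes the exact probability that a sequence of fair, independent flips contains a run of at least k identical faces, by counting the sequences that avoid such a run while tracking the length of the current run (a standard dynamic-programming approach). The function name is hypothetical. Runs become very likely as the number of flips grows, even though no flip depends on any other.

```python
def p_run_at_least(flips: int, k: int) -> float:
    """Probability that `flips` fair coin flips contain a run of at least k
    identical outcomes (all heads or all tails in a row)."""
    if k <= 1:
        return 1.0 if flips >= 1 else 0.0
    if flips < k:
        return 0.0
    # ending[j] = number of sequences so far whose final run has length j + 1
    # and that contain no run of length k anywhere.
    ending = [0] * (k - 1)
    ending[0] = 2  # first flip: H or T, current run length 1
    for _ in range(flips - 1):
        new = [0] * (k - 1)
        new[0] = sum(ending)        # switch face: the run resets to length 1
        for j in range(1, k - 1):
            new[j] = ending[j - 1]  # repeat face: the run grows by one
        ending = new
    return 1.0 - sum(ending) / 2 ** flips


if __name__ == "__main__":
    for flips in (20, 100, 200):
        print(flips, "flips, run of 6 or more:", round(p_run_at_least(flips, 6), 3))
```

The counts are exact integers until the final division, so small cases can be checked by hand: for 3 flips, only HHH and TTT contain a run of 3, giving 2/8 = 0.25.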

Not exactly what we are talking about here, but an evolutionary birthday nonetheless.

Phylogenetic birthday (or contingency) paradox:
The next few tables show how the mathematics and problem formulation of the standard birthday paradox can be used to understand a generative set of evolutionary configurations and the probability of a parallel evolutionary outcome.

What the data should look like (standard birthday paradox):

What the data should look like (proposed evolutionary paradox):

In the case of the evolutionary paradox, an exceedingly small configuration space of 60 possible configurations was used for demonstration purposes. The model scales to very large numbers of distinct evolutionary configurations. Even so, it is clear that the probability of convergent evolution approaches 100% well before a given lineage is locked into a single point in the configuration space. As the number of changes increases, the number of possible configurations is correspondingly reduced.
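The evolutionary version replaces 365 days with d configurations and n people with n lineages. A minimal sketch, assuming d = 60 equally likely configurations (the demonstration value) and lineages that land in configurations independently; these are simplifying assumptions, and the functions are hypothetical rather than the Excel workbook's formulas. Summing logarithms keeps the exact calculation stable for very large d, and the familiar square-root approximation shows why the model scales.

```python
import math


def p_convergence(lineages: int, configurations: int) -> float:
    """Probability that at least two of `lineages` independent lineages occupy
    the same configuration, assuming every configuration is equally likely."""
    if lineages > configurations:
        return 1.0
    # log of P(all distinct) = sum of log(1 - i/d); log1p keeps small terms accurate.
    log_distinct = sum(math.log1p(-i / configurations) for i in range(lineages))
    return -math.expm1(log_distinct)  # 1 - exp(log_distinct)


def lineages_needed(target: float, configurations: int) -> int:
    """Smallest number of lineages for which p_convergence reaches `target`."""
    n = 1
    while p_convergence(n, configurations) < target:
        n += 1
    return n


def sqrt_rule(target: float, configurations: int) -> float:
    """Standard approximation: n is roughly sqrt(2 * d * ln(1 / (1 - p)))."""
    return math.sqrt(2 * configurations * math.log(1 / (1 - target)))


if __name__ == "__main__":
    d = 60  # demonstration configuration space
    for target in (0.5, 0.95):
        print(target, lineages_needed(target, d), round(sqrt_rule(target, d), 1))
```

With d = 60, ten lineages already give a better-than-even chance of a shared configuration and 19 give more than 95%; the square-root rule gives about 9.1 and 19.0. Because the required number of lineages grows only with the square root of d, even very large configuration spaces fill up with coincidental overlaps surprisingly quickly.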

But... but... there are assumptions!
This model makes a few general assumptions. First, while each change is assumed to be countable, there is no accounting of how hard or soft the constraint actually is. This could be resolved by using a soft classifier to characterize each change, although this would not remove the effects of geographically localized specialization. An example of this is in the supplemental Excel dataset (see Notes section below). A second assumption is that all evolutionary configurations are countable in the same way (e.g. no modularity). Again, this can be resolved by generating a matrix for each component of an organism (e.g. each phenotypic module).
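The modular fix can be sketched by giving each module its own configuration count. This assumes uniform configurations within each module and independent variation between modules; both assumptions are ours, and the function names are hypothetical. The sketch distinguishes convergence in at least one module from convergence of the whole organism, which requires a match in every module at once and is therefore a collision in the product space of all module configurations.

```python
import math


def p_distinct(lineages: int, configurations: int) -> float:
    """Probability that all lineages occupy different configurations,
    assuming every configuration is equally likely."""
    if lineages > configurations:
        return 0.0
    return math.prod((configurations - i) / configurations for i in range(lineages))


def p_convergence_by_module(lineages: int, module_sizes: list[int]) -> dict[str, float]:
    """Hypothetical modular extension: each phenotypic module has its own
    configuration count, and modules are assumed to vary independently."""
    # Some pair of lineages shares a configuration in at least one module.
    any_module = 1.0 - math.prod(p_distinct(lineages, d) for d in module_sizes)
    # Some pair of lineages matches in every module simultaneously.
    whole = 1.0 - p_distinct(lineages, math.prod(module_sizes))
    return {"any_module": any_module, "whole_organism": whole}


if __name__ == "__main__":
    lineages = 10
    print("one module of 60:   ", p_convergence_by_module(lineages, [60]))
    print("three modules of 60:", p_convergence_by_module(lineages, [60, 60, 60]))
```

Under these assumptions, splitting an organism into modules makes partial (module-level) convergence more likely while whole-organism convergence becomes much rarer.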

Despite these assumptions (for better or for worse), the general principle of recurrence should give us a somewhat useful model for estimating how plausible or implausible convergent evolution is for a given set of evolutionary relationships. Recurrence is a useful tool that is largely ignored in conventional discussions of evolutionary constraints and parallel evolution. Once again, recurrence (illustrated by way of Henri Poincaré in Figure 3) allows us to use principles of complexity theory to better understand evolutionary phenomena [9].

Figure 3. An example of Poincaré recurrence. In this example, an image of Henri Poincaré is repeatedly permuted, and the original image (or a reasonable approximation of it) reappears well before the maximum number of possible combinations is reached.
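The post does not say which permutation produced this kind of image; Arnold's cat map, a standard area-preserving map often used for such demonstrations, is assumed here. The sketch applies the map to a square grid of pixels and computes how many steps it takes for every pixel to return to its starting place, which can be compared with the (N^2)! possible arrangements of the pixels. The function names are hypothetical.

```python
import math


def apply_cat_map(image: list[list[int]]) -> list[list[int]]:
    """One step of Arnold's cat map on a square image: the pixel at (x, y)
    moves to ((2x + y) mod N, (x + y) mod N). The map's matrix has
    determinant 1, so it is a bijection and only permutes pixels."""
    size = len(image)
    out = [[0] * size for _ in range(size)]
    for x in range(size):
        for y in range(size):
            out[(2 * x + y) % size][(x + y) % size] = image[x][y]
    return out


def cat_map_period(size: int) -> int:
    """Steps after which every pixel of a size x size image is back in place,
    found by raising the map's matrix [[2, 1], [1, 1]] to successive powers mod size."""
    identity = (1 % size, 0, 0, 1 % size)
    a, b, c, d = 2 % size, 1 % size, 1 % size, 1 % size
    steps = 1
    while (a, b, c, d) != identity:
        # Multiply the current power [[a, b], [c, d]] by [[2, 1], [1, 1]].
        a, b, c, d = (2 * a + b) % size, (a + b) % size, (2 * c + d) % size, (c + d) % size
        steps += 1
    return steps


if __name__ == "__main__":
    for size in (2, 3, 5, 10, 50):
        period = cat_map_period(size)
        arrangements = math.factorial(size * size)
        print(f"{size}x{size}: back to the original after {period} steps; "
              f"{len(str(arrangements))}-digit count of possible arrangements")
```

Tracking the 2x2 matrix power rather than moving every pixel is a design shortcut: the image is restored exactly when that power is the identity modulo N. For a 5x5 grid this happens after 10 steps, compared with 25! possible arrangements.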

UPDATE (6/30/2014):
A reader pointed out to me that birthdays have a distribution of their own over the course of a year. For example, birthdates in late summer and early autumn (especially September, in the U.S. data shown in Figure 4) are more common than those in the winter months. This is of course due to human mating preferences and seasonality (so birthdays are actually a quasi-stochastic process). Hence, more common birthdates cluster together on the calendar (Figure 4).

Figure 4. Visualization of birthdate frequency (in heatmap form) distributed across the calendar year. COURTESY: VizWiz blog and NYTimes.

I suspect that a similar non-uniform density also holds for evolutionary data sampled across the diversity of a genus, order, or domain. But this type of clustering is also an outcome of stochastic processes themselves (and one reason why recurrence is possible). When a stochastic process is sampled at a given point in time, its outcomes are often not uniformly spread; they form clusters, and these must be distinguished from clusters produced by non-random processes. The question is whether birthdates (or confounding evolutionary processes) cluster strongly enough to override the clusters produced by randomness alone. The standard birthday paradox equations assume a uniform distribution and so do not take this into account, but that likely does not invalidate the larger pattern: any departure from a uniform distribution can only raise the probability of a shared birthday.
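The reader's point can be checked directly. For unequal day probabilities p_1, ..., p_d, the chance that n birthdays are all different is n! times the elementary symmetric polynomial of degree n in those probabilities, which a short dynamic program computes exactly. The seasonal profile below is a made-up cosine bump (peak around day 255, amplitude 0.1) for illustration only, not a fit to the heatmap data; the function names are hypothetical.

```python
import math


def p_shared_nonuniform(n: int, weights: list[float]) -> float:
    """Probability that at least two of n people share a birthday when day i
    has probability proportional to weights[i] (all weights positive).
    Uses P(all distinct) = n! * e_n(p), where e_n is the elementary
    symmetric polynomial of degree n in the day probabilities p."""
    total = sum(weights)
    probs = [w / total for w in weights]
    if n > len(probs):
        return 1.0  # pigeonhole principle
    # e[j] holds e_j of the probabilities processed so far.
    e = [1.0] + [0.0] * n
    for p in probs:
        for j in range(n, 0, -1):  # descending so each day is used at most once
            e[j] += p * e[j - 1]
    return 1.0 - math.factorial(n) * e[n]


def seasonal_weights(amplitude: float = 0.1, peak_day: int = 255, days: int = 365) -> list[float]:
    """Hypothetical seasonal profile: a cosine bump centered on `peak_day`.
    Illustrative only; not fitted to any birth data."""
    return [1 + amplitude * math.cos(2 * math.pi * (day - peak_day) / days) for day in range(days)]


if __name__ == "__main__":
    uniform = [1.0] * 365
    print("uniform: ", round(p_shared_nonuniform(23, uniform), 4))
    print("seasonal:", round(p_shared_nonuniform(23, seasonal_weights()), 4))
```

With uniform weights this reproduces the standard result; the seasonal profile nudges the probability upward, consistent with the observation that clustering makes shared birthdays more likely, not less.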

UPDATE (8/4/2014):
Here is a good recent article from Nautil.us Magazine on evolutionary contingency. It puts much of the contemporary support for the idea in perspective.

Zorich, Z. If the World Began Again, Would Life as We Know It Exist? Nautil.us, June 19 (2014).

Mathematical notation courtesy Wolfram MathWorld (http://mathworld.wolfram.com). Implemented in Excel courtesy of eXcel eXchange (http://excelexchange.com). Excel workbook (computed using pseudo-data) located on Github (https://github.com/balicea/evo-birthdays).

[1] Vermeij, G.J. Historical contingency and the purported uniqueness of evolutionary innovations. PNAS, 103(6), 1804-1809 (2006).

[2] Taylor, E.B. and McPhail, J.D. Historical contingency and ecological determinism interact to prime speciation in sticklebacks, Gasterosteus. Proceedings of the Royal Society of London B, 267, 2375-2384 (2000).

[3] Blount, Z.D., Borland, C.Z., and Lenski, R.E. Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. PNAS, 105(23), 7899-7906 (2008).

[4] Travisano, M., Mongold, J.A., Bennett, A.F., and Lenski, R.E. Experimental Tests of the Roles of Adaptation, Chance, and History in Evolution. Science, 267, 87-90 (1995).

[5] Fales, E. Uniqueness and Historical Laws. Philosophy of Science, 47(2), 260-276 (1980).

[6] The Plinko analogy has also been used to describe the epigenetic landscapes of Waddington: Gordon, R. Introduction to differentiation waves Part 2. The evo-devo of epigenetic landscapes as differentiation trees. Embryogenesis Explained course (2013).

[7] Fletcher, J. The Birthday Paradox at the World Cup. BBC News Magazine, June 15 (2014).

[8] Mathematics: what are some of the most counterintuitive mathematical results? Quora, March 27 (2014).

[9] Crutchfield, J., Farmer, J.D., Packard, N.H., and Shaw, R.S. Chaos. Scientific American, December (1986).
