I have recently read the book
"Proving Darwin" by Gregory Chaitin. Greg is a mathematician who is best known for his work in
computational complexity [1]. "Proving Darwin" is not only a mathematician's take on evolution by natural selection, but also a rebuttal of intelligent design. Despite its slimness [2], the book is
quite interesting. Chaitin is particularly interested in the
creative and algorithmic complexity aspects of natural selection. While this might seem a bit obscure to a biologist, the perspective provides crucial evidence for the viability of natural selection, and even parallels some empirical results from artificial life and evolutionary biology. Aside from reviewing the book, I will also offer a conceptual evaluation of its core thesis using simulated data.
To orient the biological audience: a
mathematical proof is a formal logical construct used to demonstrate the validity of a given statement. In the case of an unsolved mathematical problem (e.g. the Riemann hypothesis, P vs. NP), a proof is what is required to claim that the problem is solved [3]. In this case, Chaitin does not lay out his proof directly, but he does draw his conclusions from it. This is an approach he calls "
Metabiology" [4]. The metabiological approach characterizes living systems as programs; the primary difference between living and non-living entities, then, is the ability to transmit information. This is not a new observation, but it is key to his characterization of evolutionary systems. A flame is used to highlight this difference: while a flame has a metabolism and can self-reproduce, it cannot evolve like a living entity because it cannot transmit information [5]. While this may not be a perfect analogy, it allows us to further probe the nature of creativity in living systems.
The algorithm of evolution proposed in the book is a
k-bit program which can be modified via mutation [6]. This is similar to
in silico approaches to evolutionary biology such as
Tierra and
Avida. The relative information content (or program size complexity) of a mutation in the
k-bit program is the information content of a non-mutated program (e.g. genotype) given the information content of a mutant program [7]. The
k-bit program (Figure 1), as a highly stylized and abstract population of genomes, results in a partially stochastic dynamical system, the relevance of which will soon become clear.
Figure 1. Schematic of a k-bit program and how it evolves
In the context of evolution, creativity is the ability to create new combinations -- and ultimately new forms. According to Metabiology, the most creative lineages and forms are those that contain the most information-rich mutations. But what does it mean to possess "informative" mutations? To better understand this, Chaitin provides us with three evolutionary regimes to test his mathematical theory: 1) intelligent design, 2) brainless exhaustive search, and 3) cumulative random evolution. Intelligent design represents evolution as a deterministic process limited to a fixed number of outcomes. By contrast, brainless exhaustive search assumes that evolution is entirely random, with no constraints; this model has no memory, and does not allow for evolutionary conservation or modularity. Cumulative random evolution provides a middle path between these two scenarios: random mutation on a
limited fitness landscape, resulting in evolutionary trajectories that are highly path-dependent [8].
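To make the contrast concrete, here is a minimal Python sketch of the two random regimes (my own toy construction, not Chaitin's formal model): brainless exhaustive search redraws a fresh genome each trial, while cumulative random evolution accumulates single-bit mutations. The bit-counting fitness function, genome length, and trial counts are all illustrative assumptions.

```python
import random

GENOME_LEN = 20  # a toy stand-in for Chaitin's k-bit program

def fitness(genome):
    # Toy fitness: the number of 1-bits (a simple, single-peaked landscape).
    return sum(genome)

def exhaustive_search(trials, rng):
    # Brainless exhaustive search: every trial draws a fresh random genome;
    # nothing is remembered between trials.
    best = 0
    for _ in range(trials):
        g = [rng.randint(0, 1) for _ in range(GENOME_LEN)]
        best = max(best, fitness(g))
    return best

def cumulative_evolution(trials, rng):
    # Cumulative random evolution: flip one random bit at a time and keep
    # the mutant only if fitness does not decrease -- mutations accumulate,
    # making the trajectory path-dependent.
    g = [rng.randint(0, 1) for _ in range(GENOME_LEN)]
    for _ in range(trials):
        mutant = g[:]
        mutant[rng.randrange(GENOME_LEN)] ^= 1
        if fitness(mutant) >= fitness(g):
            g = mutant
    return fitness(g)

print(exhaustive_search(500, random.Random(0)))    # memoryless search
print(cumulative_evolution(500, random.Random(0)))  # cumulative search
```

On a landscape like this, the cumulative strategy reliably reaches the fitness peak, while the memoryless search plateaus below it -- speed of early gains is not the same as eventual attainment.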
By solely considering the
Big O metric of program complexity [9], it is shown that the intelligent design scenario achieves a maximum fitness value much faster than either brainless exhaustive search or cumulative random evolution (Figure 2). Why is this, and how does such a result prove natural selection? Evolutionary algorithms tend to be much slower than comparable techniques in contexts such as
multivariate optimization [10], even for a convex search space (i.e. one where the goal is clearly defined). However, the speed of attaining a maximum fitness value does not translate into a robust result. This is because in Metabiology, fitness [11] is defined as the rate of biological creativity rather than as being
related to differential reproduction. The speed (or tempo) of evolution is measured by how fast fitness grows [12]. While this does not provide a specific test of selective forces, it does move us away from the naive adage of "
survival of the fittest". Instead, we find lineages with varying amounts of creativity, and a situation akin to "survival of the most creative".
Figure 2. How quickly maximum fitness is achieved for three evolutionary scenarios. Inset: the earliest portion of evolution (indicated by bracket). Hypothetical scenario modeled using simulated (pseudo) data.
The maximum fitness for the intelligent design scenario (as shown in Figure 3) occurs very early in the evolutionary trajectory: intelligent design makes a big initial gain, presumably because this scenario finds the known solutions first. The other two strategies reach their maximal fitness later on. However, it is the secondary properties, not simply speed of solution, that actually make for an adaptive outcome. Using these secondary properties as a guide, we can assess exactly why cumulative random evolution is more powerful than the alternatives. As characterized in the book, there are four additional ways to evaluate these models: creativity, time, robustness, and coherence of function. Figure 4 shows continua for each of these.
Figure 3. Table that shows the degree of creativity, time taken, robustness exhibited, and systemic coherence exhibited by each evolutionary scenario. Hypothetical scenario modeled using simulated (pseudo) data.
Finally, let us consider how these results square with findings from the artificial life literature. When I first started to read about Metabiology, I was struck by how parallel this approach is to the
Avida platform. Both approaches use programs (in the form of instruction sets) as a stand-in for the genome, and in both, fitness (and complexity) tends to grow over evolutionary time. This either suggests that there is a fundamental insight to be had here, or that systems like these are simply very good at maximizing yield.
Figure 4. Table that shows the degree of creativity, time taken, robustness exhibited, and systemic coherence exhibited by each evolutionary scenario. Values along continuum are arbitrary. 1 = Intelligent Design, 2 = Brainless Exhaustive Search, 3 = Cumulative Random Search.
Instruction-based evolutionary models are derived from
Turing machines, for which halting behavior is undecidable: the halting problem means there is no general way to predict (or guide) such a system's behavior from within. Yet evolution must be guided in some fashion, and intelligent design provides brittle results at best. This is where Chaitin introduces readers to the concept of an
Oracle [13], which can be thought of as a top-down guidance mechanism for the
halting (or mutational) behavior of a random system. Think of this as a natural version of
site-directed mutagenesis, so that lethal combinations would be much less likely than otherwise. This natural (albeit hypothetical) oracle could also resolve some of the evolutionary paradoxes we often witness in natural populations. For example, an oracle might prevent populations from getting stuck in low-fitness regimes, or make large portions of a rugged
fitness landscape more accessible [14]. In metabiological explorations, throwing out lethal mutants maintains high levels of fitness and allows for evolutionary dynamics such as
competition and
sustained arms races.
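The oracle-as-filter idea can be sketched in a few lines of Python. This is my own toy construction, not Chaitin's formal halting oracle: the viability rule, genome length, and acceptance scheme are all illustrative assumptions. The point is structural -- lethal mutants are vetoed before they are ever tried, so the lineage never passes through a lethal state.

```python
import random

GENOME_LEN = 16
LETHAL_THRESHOLD = 3  # hypothetical rule: fewer than 3 one-bits is lethal

def is_viable(genome):
    # Stand-in for the oracle's test. Chaitin's oracle answers halting
    # questions; here lethality is a simple, made-up structural rule.
    return sum(genome) >= LETHAL_THRESHOLD

def evolve(steps, rng, use_oracle=True):
    # Hill-climb with drift: improvements are always kept, and mildly
    # deleterious mutants are kept half the time, so the lineage can
    # wander into lethal territory -- unless the oracle vetoes lethal
    # mutants outright.
    g = [1] * LETHAL_THRESHOLD + [0] * (GENOME_LEN - LETHAL_THRESHOLD)
    lethal_visits = 0
    for _ in range(steps):
        mutant = g[:]
        mutant[rng.randrange(GENOME_LEN)] ^= 1
        if use_oracle and not is_viable(mutant):
            continue  # the oracle throws out the lethal mutant outright
        if sum(mutant) >= sum(g) or rng.random() < 0.5:
            g = mutant
            if not is_viable(g):
                lethal_visits += 1
    return sum(g), lethal_visits
```

With the oracle enabled, `lethal_visits` is zero by construction; with it disabled, drift can carry the lineage through lethal configurations -- exactly the trap the hypothetical oracle is meant to prevent.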
A naturalistic mechanism for the proposed oracle is still very much a mystery. However, the architecture of execution routines and the structure of
genetic regulatory networks (GRNs) are very similar. From a programming perspective, a hierarchical structure could serve as a heuristic stand-in for this oracle. In particular, an approach called subrecursion (e.g. the use of
subrecursive hierarchies [15]) might provide a mechanism for the supervised maximization of fitness. This is the direction the end of the book should have taken -- though this might also be work in progress. A more general mechanism may lie in the mysteries of
mathematical incompleteness and randomness itself. Mathematical incompleteness (discussed at length in this book in the form of Chaitin's Omega number) implies that any sufficiently powerful consistent formal system must contain
undecidable propositions, which can neither be proven nor disproven. This might help us understand why, although the forces of evolution are well characterized, the explanatory power of evolutionary laws is limited [16].
So what are the theorems of evolution? Chaitin does not make them explicit in this book, which is a letdown. A casual reader gets the sense that Metabiology is simply a reformulation of results already confirmed in population genetics and digital biology. However, it is interesting that Chaitin has arrived at these results using a parallel methodology. Perhaps this convergence is the best evidence for evolution by natural selection.
NOTES:
[1] The field of
Algorithmic Information Theory, to be exact.
[2] Contrast this with the thickness of "
The Structure of Evolutionary Theory" by Stephen Jay Gould.
[3] As opposed to a
test of the null hypothesis (NHST) or
transitional fossil.
[4] This is based on a course at
UFRJ and a lecture at the
Santa Fe Institute (which I cannot locate on YouTube). For more information, please view this blog post at
Logical Atomist.
[5] Another difference between a flame and a living population is the scale of its
replicators. A flame, while hard to characterize as a population, behaves more like a
slime mold than an intermittently-coordinated population (e.g. flock of birds or herd of sheep). This is an interesting point which is beyond the scope of the book.
[6] The probability of mutation (or probability of M) scales asymptotically with program length as 2^-k.
[7] In other words, the lineage resembles a
probabilistic graphical model (PGM - e.g. Bayesian network). The more complex lineages are simply those with more informative mutations.
For a tutorial on PGMs, please see the following reference: Airoldi, E.M.
Getting Started in Probabilistic Graphical Models. PLoS Computational Biology, 3(12), e252 (2007).
[8] For simplicity's sake we can assume that all programs maximize their fitness on a fitness landscape. The actual topology is not important in this example.
[9]
A Beginner's Guide to Big O Notation. Rob Bell blog, June 2009. See also
Big O Notation, Wikipedia page.
[10] For the speed of evolutionary algorithms in the context of multiobjective optimization, please see: Coello-Coello, C.A., Lamont, G.B., and Van Veldhuizen, D.A. Evolutionary Algorithms for Solving Multi-objective Problems. Springer, Berlin (2007).
[11] Fitness is characterized using the Busy Beaver (BB) function. In metabiology, BB(N) is the maximum fitness attainable given the template program, the suite of mutations, and the strategy of evolution pursued.
For comparative purposes, another person doing work on the role of creativity in evolution (
in silico) is
Joel Lehman at UT-Austin.
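The Busy Beaver framing in this note can be illustrated with a toy model. The program encoding below is my own invented convention, not Chaitin's formal construction: treat each N-bit string as a "program" that either halts and names an integer or fails to halt, and let BB(N) be the largest integer named by any halting N-bit program.

```python
from itertools import product

def run(bits):
    # Toy program semantics (a hypothetical encoding, not Chaitin's):
    # a program halts iff its first bit is 1; if it halts, it names the
    # integer given by its remaining bits read as binary.
    if not bits or bits[0] == 0:
        return None  # non-halting program
    return int("".join(map(str, bits[1:])), 2) if len(bits) > 1 else 0

def busy_beaver(n):
    # BB(n): the largest integer named by any halting n-bit program.
    outputs = [run(p) for p in product([0, 1], repeat=n)]
    return max(o for o in outputs if o is not None)

print(busy_beaver(4))  # the halting 4-bit programs name 0 through 7
```

Under this toy encoding BB(n) grows only as 2^(n-1) - 1; in Chaitin's actual construction BB(N) grows uncomputably fast, which is what makes it a useful yardstick for creativity.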
[12] For more on the speed (or rate) of evolution through the accumulation of mutations in
E. coli (e.g. wet-lab experimentation), please see: Kryazhimskiy, S., Tkacik, G., and Plotkin, J.B. The dynamics of adaptation on correlated fitness landscapes. PNAS, 106(44), 18638-18643 (2009)
AND Desai, M.M., Fisher, D.S., and Murray, A.W. The Speed of Evolution and Maintenance of Variation in Asexual Populations. Current Biology, 17(5), 385-394 (2007).
For an Avidian study on the rate of evolution in the context of fitness landscape ruggedness (e.g. in silico experimentation), please see: Clune, J., Misevic, D., Ofria, C., Lenski, R.E., Elena, S.F., Sanjuan, R. Natural Selection Fails to Optimize Mutation Rates for Long-Term Adaptation on Rugged Fitness Landscapes. PLoS Computational Biology, 4(9), e1000187 (2008).
[13] In thermodynamics, such an oracle is referred to as Maxwell's Demon. While this demon is rendered hypothetically impossible, a computational oracle is less bound by the laws of physics. For a primer on Maxwell's Demon, please see
this Java app.
[14] Franke, J., Klozer, A., de Visser, J.A.G.M., and Krug, J. Evolutionary Accessibility of Mutational Pathways. PLoS Computational Biology, 7(8), e1002134 (2011).
[15] For more information, please see: Calude, C. Theories of Computational Complexity. Elsevier, Amsterdam (1988)
AND Calude, C. Incompleteness, Complexity, Randomness, and Beyond. Minds and Machines, 12(4), 503-517 (2002).
[16] For those who are unfamiliar, the best-known evolutionary law is
allometry, or the scaling of size and growth across phylogeny.