December 26, 2013

A Mechanistic but Non-deterministic Holiday Season.....

This has been cross-posted to Tumbld Thoughts

The cold and emotionless holiday season......

Have a recursive holiday season!

Season's Fractals!

Here are some nice technological and mathematical non-sequiturs for the holiday season. Full of modern Western traditions. Thanks to Craig Ferguson, Benoit Mandelbrot, and natural forms for the conceptual help.

December 21, 2013

Dragons, Sandpiles, and Cavefish: an evolutionary inquiry

One area of evolutionary science that has always fascinated me involves subtle evolutionary mechanisms. Having an interest in evolutionary modeling and theoretical biology, I am particularly interested in evolutionary mechanisms that are nonlinear, and provide a path towards complex evolutionary dynamics. It is somewhat different from a traditional phylogenetic model, and requires a significant departure from standard population genetics thinking as well.

Whether this belongs to the extended evolutionary synthesis or not is not clear, although a mechanism-first approach is inclusive of development and other life-history considerations. We will begin by looking at a new paper [1] on the evolution of Mexican cavefish (Astyanax mexicanus) populations. A. mexicanus had been previously identified as a prime example of developmental processes playing a role in morphological divergence between species [2]. Namely, the cave-dwelling morph has lost its eyes, which are not needed in the cave environment. Figure 1 shows the latest version of this story.

Overview of Hsp90 phenotypic capacitance mechanism, based on account from [1].

In the latest paper, an inducible system is tested which depletes the available amount of Hsp90 (a chaperone molecule which aids in protein folding). A few notes on the changes that have been linked to the absence of Hsp90:

1) the relationship between Hsp90 (chaperone) and proteins is one of a metastable signal transducer [3]. For example, one folding state results when the chaperone is present, while another state results when the chaperone is absent. This results in a sigmoidal response function. As the chaperone is depleted, some deleterious traits become unmasked. But for large-scale changes to occur, a complete depletion of the chaperone is required.

Schematic demonstrating the shape of a sigmoidal function.

2) Hsp90 is intentionally overproduced in the sense that enough of the chaperone is available when unpredictable environmental stresses occur, requiring a greater amount of chaperone to achieve the proper folding. This baseline is a conserved mechanism for morphological robustness (sometimes called phenotypic buffering).

3) in general, the more environmental stress that exists during development, the more Hsp90 is needed and used. When Hsp90 is exhausted, deleterious and large-scale changes can be unmasked (sometimes called phenotypic capacitance).

4) since selection does not act directly upon masked variation, multiple variants can be unmasked at the same time, revealing large changes in phenotype (similar to the notion of a hopeful monster).

A hopeful monster represented using the Fisherian model of evolutionary mutation. Taken from [4].
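The sigmoidal response described in note 1 can be sketched with a logistic curve. This is only an illustration: the function name, the midpoint, and the steepness are assumptions chosen for the example, not values from [1] or [3].

```python
import math

def unmasked_fraction(hsp90_level, midpoint=0.3, steepness=20.0):
    """Fraction of cryptic variation expressed at a relative Hsp90 level (0 to 1).

    Logistic (sigmoidal) curve: close to 0 while the chaperone is plentiful,
    close to 1 once it is depleted. midpoint and steepness are made-up values.
    """
    return 1.0 / (1.0 + math.exp(steepness * (hsp90_level - midpoint)))

# Walk from a full chaperone reserve down to complete depletion.
for level in (1.0, 0.75, 0.5, 0.3, 0.1, 0.0):
    print(f"Hsp90 = {level:.2f} -> unmasked fraction = {unmasked_fraction(level):.4f}")
```

With these illustrative parameters almost nothing is unmasked until the reserve falls near the midpoint, after which the response rises steeply. That is the buffering-then-release behavior the notes describe.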

The bottom line is that while an eyeless phenotype would not have a high fitness in an above-ground environment, having eyes would be quite costly in a cave environment. Thus, eyeless phenotypes would suddenly have a high fitness, but only in the context of this niche. But how do you get from point A to point B, particularly when most existing theoretical models assume gradual and/or genotypic-driven change?

This is where statistical models of extreme events come into play. While the biological model suggests that eyeless phenotypes are the consequence of a failed protective mechanism, we can also understand these changes as extreme events that have a statistical distribution in any evolutionary context. Fortunately, we can turn to statistical physics for two candidate models: the Abelian sandpile model [5], and the Dragon King model [6].

Conceptual model and agent-based Java simulation of an Abelian Sandpile Model (a.k.a. the Bak-Tang-Wiesenfeld model). Adding sand grains to the pile results in avalanches whose sizes follow a power law distribution.

In both cases, we must make the assumption that extreme events are not only possible but inevitable as evolutionary outcomes. In both "sandpile" and "dragon" evolution, extreme events can drive processes like speciation, niche specialization, and evolutionary diversification. The difference between sandpile evolution and dragon evolution involves whether or not extreme events are due to the same process as other evolutionary outcomes which are smaller in magnitude. This should not be interpreted as a verdict on so-called "gene-centered" evolution [7] -- while sandpile evolution is more likely to be dominated by changes in the genetic architecture, dragon evolution simply provides ways to organize the expression of these genotypic changes.

Distribution in time (top) and probability distribution (bottom) of Dragon King events (notice that they deviate from a conventional power law in the tail region). Images taken from [8].

The sandpile model demonstrates that the same underlying process (in this case, the growth and avalanche dynamics of a sandpile) is responsible for observed events of every magnitude. While this process is stochastic and unpredictable, it can be characterized using a power law distribution [5]. While you can predict the existence of avalanches (and perhaps at a certain frequency), you cannot predict when they will happen or the chain of events that lead up to them. In "sandpile evolution", the mutational structure serves as the driving force for evolutionary change. Even when the genotypic mutation rate is constant, cumulative changes (driven by delayed feedback) could sometimes lead to large-scale and sudden changes in phenotype.
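To make the sandpile dynamics concrete, here is a minimal sketch of the Bak-Tang-Wiesenfeld rule on a small square grid. The grid size, number of grains, random seed, and the choice to measure avalanche size as the number of topplings are all assumptions made for illustration; this is not the Java simulation pictured above.

```python
import random
from collections import Counter

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def add_grain(grid, row, col):
    """Drop one grain at (row, col), relax the pile, return the avalanche size.

    A site topples when it holds 4 or more grains, sending one grain to each
    of its four neighbors. Grains pushed past the edge of the grid are lost.
    Avalanche size is counted here as the number of topplings.
    """
    size = len(grid)
    grid[row][col] += 1
    stack = [(row, col)]
    topplings = 0
    while stack:
        r, c = stack.pop()
        if grid[r][c] < 4:
            continue  # this site was already relaxed
        grid[r][c] -= 4
        topplings += 1
        if grid[r][c] >= 4:
            stack.append((r, c))
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                grid[nr][nc] += 1
                if grid[nr][nc] >= 4:
                    stack.append((nr, nc))
    return topplings

def simulate(size=10, grains=5000, seed=1):
    """Drop grains at random sites and record every avalanche size."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    sizes = [add_grain(grid, rng.randrange(size), rng.randrange(size))
             for _ in range(grains)]
    return grid, sizes

grid, sizes = simulate()
histogram = Counter(sizes)
for avalanche_size in sorted(histogram)[:8]:
    print(avalanche_size, histogram[avalanche_size])
print("largest avalanche:", max(sizes))
```

The same toppling rule produces both the many tiny avalanches and the rare large ones, which is the sense in which a single process is responsible for events of every magnitude.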

A dynamical phase space representation of the Dragon King event, taken from [8]. In this case, the dynamical behavior of a coupled chaotic oscillator sporadically wanders far outside of its attractor orbit, resulting in an extreme event.

The Dragon King model, by contrast, assumes that events of large magnitude are not due to the same processes as events of small magnitude [6]. Dragon King events, such as financial crises [9], coherent structures in turbulent fluids [6], and the behavior of coupled chaotic oscillators [8], cannot be characterized well by a typical power law distribution, with exceptional differences in the tail region [6]. While "dragon evolution" relies upon two or more concurrent processes, there is a historical contingency that allows for one of these processes to be sporadically amplified. This amplification is accomplished through cumulative negative feedback from some mediating factor (perhaps chaperones). Much as in the case of sandpile evolution, this generates large-magnitude events at a low frequency.

From what I can tell, the model of phenotypic capacitance for the cavefish matches the Dragon King criteria quite well. In this case, you have a dynamical system -- a variable concentration of Hsp90 that changes deterministically with respect to stochastic environmental fluctuations. When the Hsp90 concentration reaches zero (which happens rarely and represents a lower-bound), the phenotypic system sojourns far from equilibrium. Crucially, the depletion of Hsp90 and the absence of Hsp90 behave as independent systems: the depletion of Hsp90 merely allows for deleterious phenotypes to be expressed.
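As a rough illustration of this dynamical picture, the sketch below treats Hsp90 as a bounded reserve that is replenished at a fixed rate and drawn down by random environmental stress. It is a toy buffer model with invented parameters (capacity, production rate, exponentially distributed stress), not the model of [1] or [6].

```python
import random

def step(level, demand, capacity=10.0, production=1.0):
    """Deterministic update: replenish, subtract demand, clamp to [0, capacity]."""
    return min(capacity, max(0.0, level + production - demand))

def simulate_reserve(steps=10000, mean_stress=0.9, seed=0):
    """Drive the reserve with stochastic stress and return its trajectory."""
    rng = random.Random(seed)
    level = 10.0
    trajectory = []
    for _ in range(steps):
        demand = rng.expovariate(1.0 / mean_stress)  # random environmental stress
        level = step(level, demand)
        trajectory.append(level)
    return trajectory

trajectory = simulate_reserve()
depleted = sum(1 for level in trajectory if level == 0.0)
print(f"time steps with the reserve fully depleted: {depleted} of {len(trajectory)}")
```

In this toy setting the update rule is deterministic while the input is stochastic. Time steps at the zero lower bound are the candidate moments at which masked variation would be released.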

It is of note that in the original Hsp90 experiments in Drosophila [10], most of these phenotypes turned out to be embryonic lethal. But, using a different mechanism, the absence of Hsp90 allows for suites of mutations (representing latent variation) to be expressed, resulting in a coherent, non-lethal embryonic phenotype that can have high fitness in a narrow range of environmental contexts. Perhaps this is the beginnings of a mathematical model for evo-devo!

Reconciling the Dragon King in phase space with phenotypic evolution. Images taken from [8] and [4].


[1] Rohner, N., Jarosz, D.F., Kowalko, J.E., Yoshizawa, M., Jeffery, W.R., Borowsky, R.L., Lindquist, S., and Tabin, C.J.   Cryptic Variation in Morphological Evolution: Hsp90 as a Capacitor for Loss of Eyes in Cavefish. Science, 342, 1372-1375 (2013). Associated Phenomena blog article can be found here.

[2] Jeffery, W.R.   Evolution and development in the cavefish Astyanax. Current Topics in Developmental Biology, 86, 191-221 (2009).

[3] Pratt, W.B., Morishima, Y., and Osawa, Y.   The Hsp90 Chaperone Machinery Regulates Signaling by Modulating Ligand Binding Clefts. Journal of Biological Chemistry, 283, 22885-22889 (2008).

[4] Chouard, T.   Revenge of the Hopeful Monster. Nature, 463, 864-867 (2010).

[5] Bak, P., Tang, C., and Wiesenfeld, K.   Self-organized criticality: an explanation of 1/ƒ noise. Physical Review Letters, 59(4), 381–384 (1987).

[6] Sornette, D. and Ouillon, G.   Dragon-kings: mechanisms, statistical methods, and empirical evidence. European Physical Journal Special Topics, 205, 1-26 (2012).

[7] For the latest shots (and resulting kerfuffle) in this debate, please see: Dobbs, D. "Die, Selfish Gene, Die" has evolved. David Dobbs' Neuron Culture blog, December 13 (2013).

[8] de S. Cavalcante, H.L.D., Oria, M., Sornette, D., Ott, E., and Gauthier, D.J.   Predictability and Suppression of Extreme Events in a Chaotic System. Physical Review Letters, 111, 198701 (2013).

[9] Sornette, D.   Dragon-kings, black swans, and the prediction of crises. International Journal of Terraspace Science and Engineering, 2(1), 1-18 (2009). Associated TED talk can be found here.

[10] Rutherford, S.L. and Lindquist, S.   Hsp90 as a capacitor for morphological evolution. Nature, 396, 336-342 (1998).

December 16, 2013

Fireside Science: Inspired by a visit to the Network's Frontier....

This post has been cross-posted to Fireside Science.

Recently, I attended the Network Frontiers Workshop at Northwestern University in Evanston, IL. This was a three-day session in which researchers engaged in network science from around the world gathered to present their work. They also came from many home disciplines, including computational biology, applied math and physics, economics and finance, neuroscience, and more.

The schedule (all researcher names and talk titles) can be found here. I was one of the first presenters on the first day, presenting “From Switches to Convolution to Tangled Webs” [1], which involves network science from an evolutionary systems biology perspective.

One Field, Many Antecedents
For many people who have a passing familiarity with network science, it may not be clear as to how people from so many disciplines can come together around a single theme. Unlike more conventional (e.g. causal) approaches to science, network (or hairball) science is all about finding the interactions between the objects of analysis. Network science is the large-scale application of graph theory to complex systems and ever-bigger datasets. These data can come from social media platforms, high-throughput biological experiments, and observations of statistical mechanics. 

The visual definition of a scientific "hairball". This is not causal at all.....

25,000 foot View of Network Science
But what does a network science analysis look like? To illustrate, I will use an example familiar to many internet users. Think of a social network with many contacts. The network consists of nodes (e.g. friends) and edges (e.g. connections) [2]. Although there may be causal phenomena in the network (e.g. influence, transmission), the structure of the network is determined by correlative factors. If two individuals interact in some way, this increases the correlation between the nodes they represent. This gives us a web of connections in which the connectivity can range from random to highly-ordered, and the structure can range from homogeneous to heterogeneous.

Friend data from my Facebook account, represented as a sizable (N=64) heterogeneous network. COURTESY: Wolfram|Alpha Facebook app.

Continuing with the social network example, you may be familiar with the notion of “six degrees of separation” [3].  This describes one aspect (e.g. something that enables nth-order connectivity) of the structure inherent in complex networks. Again consider the social network: if there are no preferences for who contacts whom, a randomly-connected network results. The path length between any two individuals in such a network is generally high, as there are no reliable short-cuts. The longest of these shortest paths across the network is also known as the network diameter, and is an important feature of a network's topology.
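Path length and diameter are easy to compute on a small example. The sketch below uses a made-up five-person friendship chain and a breadth-first search. It takes the diameter to be the longest of the shortest paths, which is the standard graph-theoretic definition.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def diameter(graph):
    """Longest shortest path over all pairs (assumes a connected graph)."""
    return max(max(shortest_path_lengths(graph, node).values()) for node in graph)

# Hypothetical friendship chain: each person only knows their neighbors in line.
friends = {
    "Ann": {"Bob"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Bob", "Dan"},
    "Dan": {"Cat", "Eve"},
    "Eve": {"Dan"},
}

print("Ann to Eve:", shortest_path_lengths(friends, "Ann")["Eve"], "hops")
print("diameter:", diameter(friends))
```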

Example of a social network. This example is homogeneous, but with highly-regular structure (e.g. non-random). 

Let us further assume that in the same network, there happen to be strong preferences for inter-node communication, which leads to changes in connectivity. In such cases, we get connectivity patterns that range from scale-free [4] to small-world [5]. In social networks, small-world networks have been implicated in the “six degrees” phenomenon, as the path between any two individuals is much shorter than in the random case. Scale-free and especially small-world networks have a heterogeneous structure, which can include local subnetworks (e.g. modules or communities) and small subpopulations of nodes with many more connections than other nodes (e.g. network hubs). Statistically, heterogeneity can be determined using a number of measures, including betweenness centrality and network diameter.
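The effect of short-cuts on path length can be sketched as follows. This is in the spirit of the Watts-Strogatz construction [5], but to keep the example deterministic it adds two hand-picked long-range edges to a ring instead of rewiring edges at random. The ring size and the choice of short-cuts are assumptions.

```python
from collections import deque

def ring_lattice(n):
    """Each node is linked only to its two nearest neighbors on a ring."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def add_edge(graph, a, b):
    graph[a].add(b)
    graph[b].add(a)

def average_path_length(graph):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

ring = ring_lattice(20)
shortcut = ring_lattice(20)
for a, b in [(0, 10), (5, 15)]:  # two long-range "weak ties"
    add_edge(shortcut, a, b)

print("ring only:      ", round(average_path_length(ring), 3))
print("with short-cuts:", round(average_path_length(shortcut), 3))
```

Adding an edge can never lengthen a shortest path, so even a handful of long-range ties pulls the average separation down. This is the intuition behind the "six degrees" phenomenon.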

Example of a small-world network, in the scheme of things. 

Emerging Themes
While this example was made using a social network, the basic methodological and statistical approach can be applied to any system of strongly-interacting agents that can provide a correlation structure [6]. For example, high-throughput measurements of gene expression can be used to form a gene-gene interaction network. Genes that correlate with each other (above a pre-determined threshold) are considered connected in a first-order manner. The connections, while indirectly observed, can be statistically robust and validated via experimentation. And since all assayed genes (on the order of 10^3 genes) are likewise connected, second and third-order connections are also possible. The topology of a given gene-gene interaction network may be informative about the general effects of knockout experiments, environmental perturbations, and more [7].
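The thresholding step can be sketched with a toy dataset. The gene names, expression values, and the 0.9 cutoff below are invented for illustration. Using the absolute value of the Pearson correlation (so that strongly anti-correlated genes are also linked) is one possible convention, not necessarily the one used in any particular study.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical expression levels for four genes across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # tracks geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],  # mirrors geneA
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related to the others
}

def correlation_network(data, threshold=0.9):
    """Link two genes when |r| meets the threshold (a first-order connection)."""
    edges = set()
    for g1, g2 in combinations(sorted(data), 2):
        if abs(pearson(data[g1], data[g2])) >= threshold:
            edges.add((g1, g2))
    return edges

for edge in sorted(correlation_network(expression)):
    print(edge)
```

Here geneD ends up isolated, while geneA, geneB, and geneC form a fully connected triangle.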

This combination of exploratory and predictive power is just one reason why the network approach has been applied to many disciplines, and has even formed a discipline in and of itself [8]. At the Network Frontiers Workshop, the talks tended to coalesce around several themes that define potential future directions for this new field. These include:

A) general mechanisms: there are a number of mechanisms that allow for the network to adaptively change, stay the same in the face of pressure to change, or function in some way. These mechanisms include robustness, the identification of switches and oscillators, and the emergence of self-organized criticality among the interacting nodes. Papers representing this theme may be found in [9].

The anatomy of a forest fire's spread, from a network perspective.

B) nestedness, community detection, and clustering: Along with the concept of core-periphery organization, these properties may or may not exist in a heterogeneous network. But such techniques allow us to partition a network into subnetworks (modules) that may operate with a certain degree of independence. Papers representing this theme may be found in [10].

C) multilevel networks: even in the case of social networks, each "node" can represent a number of parallel processes. For example, while a single organism possesses both a genotype and a phenotype, the correlational structure for genotypic and phenotypic interactions may not always be identical. To solve this problem, a bipartite (two independent) graph structure may be used to represent different properties of the population of interest. While this is just a simple example, multilevel networks have been used creatively to attack a number of problems [11].

D) cascades, contagions: the diffusion of information in a network can be described in a number of ways. While the common metaphor of "spreading" may be sufficient in homogeneous networks, it may be insufficient to describe more complex processes. Cascades occur when transmission is sustained beyond first-order interactions. In a social network, messages that get passed to a friend of a friend of a friend (e.g. third-order interactions) illustrate the potential of the network topology to enable cascades. Papers representing this theme may be found in [12].

E) hybrid models: as my talk demonstrates, the power and potential of complex networks can be extended to other models. For example, the theoretical "nodes" in a complex network can be represented as dynamic entities. Aside from real-world data, this can be achieved using point processes, genetic algorithms, or cellular automata. One theme I detected in some of the talks was the potential for a game-theoretic approach, while others involved using Google searches and social media activity to predict markets and disease outbreaks [13].

Here is a map of connectivity across three social media platforms: Facebook, Twitter, and Mashable. COURTESY: Figure 13 in [14].
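The idea of a cascade in theme D (transmission sustained beyond first-order contacts) can be sketched by tracking the order at which each person first receives a message. The network and names below are hypothetical, and every contact is assumed to pass the message on. Real cascade models typically add a transmission probability or an adoption threshold.

```python
def spread_by_order(graph, seed, max_order):
    """Return {person: order at which the message first reached them}."""
    reached = {seed: 0}
    frontier = [seed]
    for order in range(1, max_order + 1):
        next_frontier = []
        for person in frontier:
            for contact in graph[person]:
                if contact not in reached:
                    reached[contact] = order
                    next_frontier.append(contact)
        frontier = next_frontier
    return reached

# Hypothetical friendship network.
friends = {
    "you": ["ann", "bob"],
    "ann": ["you", "cat"],
    "bob": ["you"],
    "cat": ["ann", "dan"],
    "dan": ["cat"],
}

print("first-order only:", spread_by_order(friends, "you", 1))
print("up to third order:", spread_by_order(friends, "you", 3))
```

In this example "dan" is only reached at the third order (a friend of a friend of a friend), so the message reaches him only if transmission is sustained past the first hop.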

[1] Here is the abstract and presentation. The talk centered around a convolution architecture, my term for a small-scale physical flow diagram that can be evolved to yield not-so-efficient (e.g. sub-optimal) biological processes. These architectures can be embedded into larger, more complex networks as subnetworks (in a manner analogous to functional modules in gene-gene interaction or gene regulatory networks).

One person at the conference noted that this had strong parallels with the book “Plausibility of Life” (excerpts here) by Marc Kirschner and John Gerhart. Indeed, this book served as inspiration for the original paper and current talk.

[2] In practice, "nodes" can represent anything discrete, from people to cities to genes and proteins. For an example from brain science, please see: Stanley, M.L., Moussa, M.N., Paolini, B.M., Lyday, R.G., Burdette, J.H. and Laurienti, P.J.   Defining nodes in complex brain networks. Frontiers in Computational Neuroscience, doi:10.3389/fncom.2013.00169 (2013).

[3] the "six degrees" idea is based on an experiment conducted by Stanley Milgram, in which he sent out and tracked the progression of a series of chain letters through the US Mail system (a social network). 

The potential power of this phenomenon (the opportunity to identify and exploit weak ties in a network) was advanced by the sociologist Mark Granovetter: Granovetter, M.   The Strength of Weak Ties: A Network Theory Revisited. Sociological Theory, 1, 201–233 (1983).

The small-world network topology (the Watts-Strogatz model), which embodies the "six degrees" principle, was proposed in the following paper: Watts, D. J. and Strogatz, S. H.   Collective dynamics of 'small-world' networks. Nature, 393(6684), 440–442 (1998).

[4] A scale-free network can be defined as a network with no characteristic number of connections across all nodes. Connectivity tends to scale with growth in the number of nodes and/or edges. Whereas connectivity in a random network can be characterized using a Gaussian (e.g. normal) distribution, connectivity in a scale-free network can be characterized using a Power Law (e.g. heavy-tailed) distribution.

[5] Small-world networks are defined by their hierarchical (e.g. strongly heterogeneous) structure and a short path length across the network. This is a special case of the more general scale-free pattern, and can be characterized with a strong power law (e.g. the distribution has a thicker tail). Because any one node can reach any other node in a relatively small number of steps, there are a number of organizational consequences to this type of configuration.

[6] Here are two foundational papers on network science [a, b] and two enlightening primers on complexity and network science [c, d]:
[a] Albert, R. and Barabasi, A-L.   Statistical mechanics of complex networks. Reviews of Modern Physics, 74, 47–97 (2002).

[b] Newman, M.E.J.   The structure and function of complex networks. SIAM Review, 45, 167–256 (2003).

[c] Shalizi, C.   Community Discovery Methods for Complex Networks. Cosma Shalizi's Notebooks - Center for the Study of Complex Systems, July 12 (2013).

[d] Voytek, B.   Non-linear Systems. Oscillatory Thoughts blog, June 28 (2013).

[7] For an example, please see: Cornelius, S.P., Kath, W.L., and Motter, A.E.   Controlling complex networks with compensatory perturbations. arXiv:1105.3726 (2011).

[8] Guimera, R., Uzzi, B., Spiro, J., and Amaral, L.A.N   Team Assembly Mechanisms Determine Collaboration Network Structure and Team Performance. Science, 308, 697 (2005).

[9] References for general mechanisms (e.g. switches and oscillators):
[a] Taylor, D., Fertig, E.J., and Restrepo, J.G.   Dynamics in hybrid complex systems of switches and oscillators. Chaos, 23, 033142 (2013).

[b] Malamud, B.D., Morein, G., and Turcotte, D.L.   Forest Fires: an example of self-organized critical behavior. Science, 281, 1840-1842 (1998).

[c] Ellens, W. and Kooij, R.E.   Graph measures and network robustness. arXiv: 1311.5064 (2013).

[d] Francis, M.R. and Fertig, E.J.   Quantifying the dynamics of coupled networks of switches and oscillators. PLoS One, 7(1), e29497 (2012).

[10] References for clustering [a], community detection [b-e], core-periphery structure detection [f], and nestedness [g]:
[a] Malik, N. and Mucha, P.J.   Role of social environment and social clustering in spread of opinions in co-evolving networks. Chaos, 23, 043123 (2013).

[b] Rosvall, M. and Bergstrom, C.T.   Maps of random walks on complex networks reveal community structure. PNAS, 105(4), 1118-1123 (2008).

* the image above was taken from Figure 3 of [b]. In [b], an information-theoretic approach to discovering network communities (or subgroups) is introduced.

[c] Colizza, V., Pastor-Satorras, R. and Vespignani, A.   Reaction–diffusion processes and metapopulation models in heterogeneous networks. Nature Physics, 3, 276-282 (2007).

[d] Bassett, D.S., Porter, M.A., Wymbs, N.F., Grafton, S.T., Carlson, J.M., and Mucha, P.J.   Robust detection of dynamic community structure in networks. Chaos, 23, 013142 (2013).

* the authors characterize the dynamic properties of temporal networks using methods such as optimization variance and randomization variance.

[e] Nishikawa, T. and Motter, A.E.   Discovering network structure beyond communities, Scientific Reports, 1, 151 (2011).

[f] Bassett, D.S., Wymbs, N.F., Rombach, M.P., Porter, M.A., Mucha, P.J., and Grafton, S.T.   Task-Based Core-Periphery Organization of Human Brain Dynamics. PLoS Computational Biology, 9(9), e1003171 (2013).

* a good example of how core-periphery structure is extracted from brain networks constructed from fMRI data.

[g] Staniczenko, P.P.A., Kopp, J.C., and Allesina, S.   The ghost of nestedness on ecological networks. Nature Communications, doi:10.1038/ncomms2422 (2012).

[11] References for multilevel networks:
[a] Szell, M., Lambiotte, R., and Thurner, S.   Multirelational organization of large-scale social networks in an online world. PNAS, doi:10.1073/pnas.1004008107 (2010).

[b] Ahn, Y-Y., Bagrow, J.P., and Lehmann, S.   Link communities reveal multiscale complexity in networks. Nature, 466, 761-764 (2010).

[12] References for cascades and contagions:
[a] Centola, D.   The Spread of Behavior in an Online Social Network Experiment. Science, 329, 1194-1197 (2010).

[b] Brummitt, C.D., D’Souza, R.M., and Leicht, E.A.   Suppressing cascades of load in interdependent networks. PNAS, doi:10.1073/pnas.1110586109 (2011).

[c] Brockmann, D. and Helbing, D.   The Hidden Geometry of Complex, Network-Driven Contagion Phenomena. Science, 342(6164), 1337-1342 (2013).

[d] Glasserman, P. and Young, H.P.   How Likely is Contagion in Financial Networks? Oxford University Department of Economics Discussion Papers, #642 (2013).

[13] References for hybrid networks and other themes, including network evolution [a,b] and the use of big data in network analysis [c,d]:
[a] Pang, T.Y. and Maslov, S.   Universal distribution of component frequencies in biological and technological systems. PNAS, doi:10.1073/pnas.1217795110 (2012).

[b] Bassett, D.S., Wymbs, N.F., Porter, M.A., Mucha, P.J., and Grafton, S.T.   Cross-Linked Structure of Network Evolution. arXiv: 1306.5479 (2013).

[c] Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S., and Brilliant, L.   Detecting influenza epidemics using search engine query data. Nature, 457, 1012–1014 (2008).

[d] Michel, J-B., Shen, Y.K., Aiden, A.P., Veres, A., Gray, M.K., Google Books Team, Pickett, J.P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M.A., Aiden, E.L.   Quantitative Analysis of Culture Using Millions of Digitized Books. Science, 331(6014), 176-182 (2011).

[14] Ferrara, E.   A large-scale community structure analysis in Facebook. EPJ Data Science, 1:9 (2012).

December 11, 2013

Speculating (and modeling speculation) about Biology, Culture, and Peer-review

Here are the latest features cross-posted to Tumbld Thoughts. A cornucopia of themes, from a model of pure speculation (I), to a new paper and reflections on the diversity of life-history strategies to aging across 46 species (II), and the human actions and reactionary tendencies that result from massive cultural change (III). Also featured is an update on biases inherent in the peer-review process (IV). So let's get started.

I. Pure (or Applied) Speculation

Here is an interesting model of predicting the future, courtesy of Anthony Dunne and Stuart Candy. In the book "Speculative Everything", Dunne and co-author Fiona Raby present a design-centered vision for predicting the future [1].

Dunne's talk at the Resonante conference features a model of future prediction proposed by Stuart Candy of the Sceptical Futurist blog and LongNow foundation. Candy's model treats the future as a prismatic spectrum of future outcomes.

Using the prismatic spectrum metaphor (my coinage), the future is understood as an extension of the present, with progressively more and less likely outcomes. The "preferred futures" fall between the most likely and the most promising potentials.

II. What happens when you combine phylogeny, demography, meta-analysis, and life-history?

Apparently, yes it does.....

The top picture is from a new paper [2] that combines a phylogenetic perspective with demography (quasi-phylodemography) to look at variation in aging across the life-history of 46 species. A summary and set of insights from the Phenomena blog can be found in [3].

By compiling data from multiple sources and conducting a meta-analysis, the authors of [2] found that life history trends for fertility, mortality, and survivorship vary widely both cross-culturally (in humans) and across the tree of life. 

To make sense of this diversity, the authors of [2] propose a fast-slow continuum of senescence: from populations with a short-lived, early reproductive period to populations with a long-lived, extended reproductive period. These results can be compared with the review in [4], which presents the standard view of why we age, circa 2000.

III. The Action and Reaction of Cultural Change

Here is a rather long, detailed FAQ (some might say manifesto) from the Slate Star Codex on how to be an anti-reactionary. The detail is in the nature of Reactionism (or Neo-reactionism), which is the tendency to embrace the past (or zombie ideas) as if modernization will only bring degradation and "ruin everything" [5].

These notions of progress and reaction are largely based on human value systems, as Scott Alexander points out. While the reactionary would argue that turning away from traditional cultural value systems leads to economic ruin and rampant crime, the data show the opposite.

To get right to the point of this argument (so to speak), go to Section 3.3 (then where does progress come from). There you will find data from the World Values Survey, where the so-called "vanguard" countries (in terms of growth and safety) possess high levels of both secular-rational and self-expression values [6]. 

IV. Herding in peer-review

With significant apologies to Gary Larson and the scientific community. Please read on....

In the last Synthetic Daisies post, I featured a new paper of mine posted to the arXiv called "A Semi-supervised Peer Review System", which was itself based on a previous Synthetic Daisies post. In this paper, I introduced a model of automated objective manuscript evaluation focused on fraud detection. 

Now there is a new paper in Nature [7] that discusses the phenomenon of herding in peer-review and evaluation. Here, the authors use a Bayesian (as opposed to a signal detection) statistical model to describe what happens when reviewers converge upon a misclassification (e.g. rejecting a paper with solid conclusions and methods). They call for the inclusion of subjectivity in the decision-making criterion: subjective decisions are those that include assessing both the strength of a reviewer's agreement with the conclusions and more conventional features of the manuscript (e.g. strength of the premises and methods employed).

In the graphs above, two scenarios are compared: M1 (which is the subjective strategy) and M2 (which is a purely objective strategy). The authors claim that the M1 strategy prevents so-called herding and promotes a more unbiased outcome.


[1] the inability of people to envision novel and coherent futures is a prominent theme in "The Secret War between Downloading and Uploading" by Peter Lunenfeld. 

[2] Jones, O.R.   Diversity of aging across the tree of life. Nature, doi:10.1038/nature12789 (2013).

[3] Hughes, V.   Why do we age: a 46-species comparison. Phenomena blog, December 8 (2013).

[4] Kirkwood, T.B.L. and Austad, S.N.   Why do we age? Nature 408, 233-238 (2000). Source of the bottom image.

[5] what that "everything" constitutes is not always clear. However, it could be the shock and uncertainty of the culture change process itself.

[6] similar to the argument Steven Pinker makes in "The Better Angels of our Nature", and for similar reasons.

[7] Park, I-U., Peacey, M.W., and Munafo, M.R.   Modelling the effects of subjective and objective decision-making in scientific peer review. Nature, doi:10.1038/nature12786 (2013) AND Dapper, A.   Should Scientists be more subjective? Nothing in Biology Makes Sense! blog, December 11 (2013).

December 7, 2013

Thought (Memetic) Soup, December edition

Here is the latest installment of assorted features from my micro-blog, Tumbld Thoughts. These include heuristic-based prediction of the future, system shock and recovery (human edition), and exposure vs. prestige. More from the intersection of human culture, technology, and complexity theory.

Heuristic-based Prediction of the Future

How do you predict the future? How does anyone predict the future? Perhaps they use heuristics such as the extrapolation of current trends, gradualistic change, or stasis in human value systems (see the "future prediction heuristics GUI", top picture). Here are two attempts at future approximation from academia and the technology industry, respectively.

"We have tended to see the professor as a single figure, but he is now a multiple being of many types, tasks, and positions". Circa 2013.

The article in [1] is a counter to the common argument that academia has undergone a period of "deskilling". Here, the author thinks sociological differentiation rather than deskilling is at the root of institutional change, and that this trend will continue into the future.

"Do our computer pundits lack all common sense? The truth is no online database will replace your daily newspaper, no CD-ROM can take the place of a competent teacher and no computer network will change the way government works". Circa 1995.

The article in [2] is a retro look at critiques of the internet, circa 1995. This critique was set against the unbridled optimism of the time about what the internet would change in society. And even though many of the changes deemed too unrealistic actually came to pass, not all of them unfolded in the way people expected them to in 1995 [3].

Systemic Shock and Recovery: human edition

For the end of the 2013 hurricane season, I provide some storm-related free association. The first picture is about what tends to happen socio-economically in the aftermath of a hurricane. This was inspired not only by the aftermath of Typhoon Haiyan [4], but also by the response of fault-tolerant computer systems.

The picture above shows a summary [5] of all hurricane tracks in the Atlantic during the 2013 season. This season was fairly quiet, with no major storms and a relatively small number of landfalls.

Exposure vs. Prestige

Randomly sampling Wikipedia entries and then using them to predict h-index scores may mean nothing. This is a play on the title of [6], but serves as a good, one-sentence critique of [7].

The authors of [7] suggest that personal profiles of scientists on Wikipedia should correspond with scientific impact (measured using the h-index). If they do not, then it suggests that Wikipedia is the source of distortion, artificially giving attention to lesser mortals (as it were). 

However, this assumes two things: that the properties of Wikipedia entries should reflect the scoring of citation indices, and that random samples of Wikipedia entries will correspond to the distributions of h-index values.

The first assumption is only valid if h-indices capture all possible information about scientific impact. Clearly, this is not the case, as many different indices have been developed [8] to characterize the various nuances inherent in scientific output and influence. 
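To make the point concrete, here is a minimal sketch of the h-index itself (the largest h such that h papers have at least h citations each). The two citation records are made up for illustration and do not come from [7] or [8]. They show how very different publication profiles can collapse to the same score.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Two hypothetical citation records (invented for illustration only).
steady = [6, 5, 5, 4, 4, 2, 1]
blockbuster = [900, 350, 40, 4, 1, 0, 0]

for name, record in (("steady", steady), ("blockbuster", blockbuster)):
    print(name, "h-index:", h_index(record), "total citations:", sum(record))
```

Both records score the same h-index despite a roughly fifty-fold difference in total citations, which is one reason a single index is a thin summary of impact.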

The authors of [8] present a systematic review of various citation indices. Importantly, none of these indices produce a normal distribution centered around a mean. So when the mean h-index value of the Wikipedia sample is compared to the h-indices of different scientific fields, it does not mean as much as one would assume at first glance.

This brings us to the second assumption, which regards the underlying distribution of scientific impact. While this is not clearly discussed in [7], we know from other studies [9] that scientific impact can be explained using Lotka's Law (which can be characterized using a Pareto distribution).

While this long tail can be mitigated using specialized metrics such as the x-index [10], this was not considered in [7]. In fact, one could argue that Wikipedia profiles and citation indices are statistically independent of one another.
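A rough sketch of why the mean is a poor summary under a long tail: Lotka's Law says the number of authors with n papers falls off roughly as 1/n^a. The exponent of 2 is the classic textbook value, and the cutoff of 1000 papers is an arbitrary assumption of mine. Neither number is taken from [7] or [9].

```python
def lotka_distribution(exponent=2.0, n_max=1000):
    """Probability that an author has n = 1..n_max papers, proportional to n**-exponent."""
    weights = [n ** -exponent for n in range(1, n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_median(probs):
    """Mean and median of a discrete distribution over n = 1, 2, 3, ..."""
    mean = sum(n * p for n, p in enumerate(probs, start=1))
    cumulative = 0.0
    for n, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= 0.5:
            return mean, n
    return mean, len(probs)


probs = lotka_distribution()
mean, median = mean_and_median(probs)
print("share of authors with one paper:", round(probs[0], 3))
print("mean papers per author:", round(mean, 2), "median:", median)
```

Under these assumptions most authors sit at the minimum, while the mean is pulled several times higher by a handful of very prolific ones. Comparing sample means across such distributions says little about the typical case.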


[1] Williams, J.J.   The Great Stratification. Chronicle of Higher Education, December 2 (2013).

[2] This was a piece by Clifford Stoll in Newsweek. For more, please see: Yglesias, M.   Predictions About the Web From 1995. Moneybox blog, December 2 (2013).

[3] For an interesting look at the technological way forward from around that time, check out: Gates, B.   The Road Ahead. Penguin (1995).

[4] Langfitt, F.   After the Storm: commerce returns to damaged Philippines city. Parallels NPR, November 25 (2013) AND Quismorio, E.A.   Looters' goods sold on Tacloban streets. Tempo: news in a flash, November 21 (2013).

[5] Masters, J.   The Unusually Quiet Atlantic Hurricane Season of 2013 Ends. Dr. Jeff Masters' WunderBlog, November 29 (2013). 

[7] Samoilenko, A. and Yasseri, T.   The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics. arXiv:1310.8508 (2013).

[8] Alonso, S., Cabrerizo, F.J., Herrera-Viedma, E., and Herrera, F.   h-index: a review focused in its variants, computation and standardization for different scientific fields. Journal of Informetrics, 3(4), 273-289 (2009). doi:10.1016/j.joi.2009.04.001.

[9] MacRoberts, M.H. and MacRoberts, B.R.   A Re-Evaluation of Lotka's Law of Scientific Productivity. Social Studies of Science, 12(3), 443-450 (1982).

November 30, 2013

New Papers, Old Papers, and Re-convolved Concepts, November edition

I have been busy the past several months fleshing out new ideas and finishing up older ones. The first paper profiled here is "Cellular decision-making bias: the missing ingredient in cell functional diversity", something I published on arXiv [1] last month. This paper is a computationally-oriented derivative of the paper "Defining phenotypic respecification diversity using multiple cell lines and reprogramming regimens", published earlier this year in Stem Cells and Development [2].

In [2], it was demonstrated that a series of different cell lines of the same type (e.g. fibroblast) exhibit great variability (many-fold differences) in terms of their direct cellular reprogramming efficiency. The efficiency of this process was measured using phenotypic (e.g. immunocytochemical) assays. This variability may or may not be due to underlying genomic processes. Using a limited set of assays analyzed by means of differential gene expression, no smoking gun was found. While we did not investigate candidate epigenetic markers, the phenotypic trend was nevertheless consistent for both human and mouse cells reprogrammed to both generic muscle fiber and generic dopaminergic neurons [3].

The data collected and analyzed here also set up a series of computational investigations using a method derived from Signal Detection Theory (SDT) and other signal-to-noise characterization methods [4]. SDT is generally used to understand cognitive decision-making in humans and animals. However, decision-making theory has also been used to explain outcomes at the cellular and molecular level, particularly switch-like processes [5]. Using the standard SDT as inspiration, I propose in [1] that cellular and molecular processes can be characterized and analyzed using a technique called cellular SDT.

Major collaborator on the Stem Cells and Development paper [2]: Dr. Steven Suhr, Michigan State University. 

Cellular SDT can uncover something called decision-making bias, which is hypothesized to occur during the conversion of cells from one phenotype to another [3]. In this case, the term bias refers to the magnitude of difference in conversion efficiency for the same cell line given two distinct stimuli. The overarching assumption is that differences observed across different small-scale stimuli (e.g. forced transcription factor activity) can be characterized systematically within and between specific cell types and lines.
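For readers unfamiliar with SDT, here is a minimal sketch of the two classic measures from the standard (cognitive) version: sensitivity d' and the response criterion c, both computed from a hit rate and a false alarm rate. This is textbook SDT, not the cellular SDT method of [1]. The function name and the example rates are my own placeholders.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, false_alarm_rate):
    """Classic SDT sensitivity (d') and criterion (c).

    Both rates must lie strictly between 0 and 1, since the inverse
    normal CDF is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


# Placeholder rates, chosen only to show the direction of each measure.
d_prime, criterion = sdt_measures(hit_rate=0.9, false_alarm_rate=0.3)
print("d':", round(d_prime, 3), "c:", round(criterion, 3))
```

A negative criterion indicates a liberal observer (one inclined to respond "signal present"), while a positive one indicates a conservative observer. Equal hit and false alarm rates give a d' of zero, meaning no discrimination at all.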

My talk to the BEACON Center in May 2013. The first part (YouTube video) focused on modeling diversity in cellular reprogramming (an early version of cellular decision-making bias).

Here is the abstract of the paper. Associated code (on Github) can be found here:
"Cell functional diversity is a significant determinant on how biological processes unfold. Most accounts of diversity involve a search for sequence or expression differences. Perhaps there are more subtle mechanisms at work. Using the metaphor of information processing and decision-making might provide a clearer view of these subtleties. Understanding adaptive and transformative processes (such as cellular reprogramming) as a series of simple decisions allows us to use a technique called cellular signal detection theory (cellular SDT) to detect potential bias in mechanisms that favor one outcome over another. We can apply method of detecting cellular reprogramming bias to cellular reprogramming and other complex molecular processes. To demonstrate the scope of this method, we will critically examine differences between cell phenotypes reprogrammed to muscle fiber and neuron phenotypes. In cases where the signature of phenotypic bias is cryptic, signatures of genomic bias (pre-existing and induced) may provide an alternative. The examination of these alternates will be explored using data from a series of fibroblast cell lines before cellular reprogramming (pre-existing) and differences between fractions of cellular RNA for individual genes after drug treatment (induced). In conclusion, the usefulness and limitations of this method and associated analogies will be discussed."

The second paper profiled here is called "A Semi-automated Peer-review System", a short paper I published on the arXiv earlier this month [6]. The idea of an automated peer review system came to me after preparing a blog post [7] and reading a paper on the most common degree of novelty found among highly influential scientific papers [8]. The paper provides an outline of a human-assisted adaptive algorithm that detects fraud in a set of scientific papers without also filtering out innovative but highly novel work. As is the case in [1], the approach was based on signal detection theory (SDT). In this case, however, a more conventional application (e.g. standard ROC curves) is used to minimize the number of truly low-quality and fraudulent manuscripts while maintaining diversity and novelty in the scientific literature.
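As a generic illustration of what a standard ROC analysis involves (and not the code associated with [6]), the sketch below sweeps a threshold over classifier scores and computes the area under the resulting curve. The scores and labels are invented. A label of 1 stands for a manuscript that should be flagged.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented example: higher score = more suspicious manuscript.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
print("ROC points:", curve)
print("AUC:", round(auc(curve), 3))
```

An area of 0.5 corresponds to chance and 1.0 to perfect separation. Comparing such curves for human-only and human-assisted review is the kind of comparison the abstract below describes in hypothetical terms.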

Here is the abstract and here is the associated code (mostly pseudo-code) on Github:
"A semi-supervised model of peer review is introduced that is intended to overcome the bias and incompleteness of traditional peer review. Traditional approaches are reliant on human biases, while consensus decision-making is constrained by sparse information. Here, the architecture for one potential improvement (a semi-supervised, human-assisted classifier) to the traditional approach will be introduced and evaluated. To evaluate the potential advantages of such a system, hypothetical receiver operating characteristic (ROC) curves for both approaches will be assessed. This will provide more specific indications of how automation would be beneficial in the manuscript evaluation process. In conclusion, the implications for such a system on measurements of scientific impact and improving the quality of open submission repositories will be discussed". 

Finally, I am giving a presentation at the Network Frontiers Workshop at Northwestern University's NICO Institute on the 4th of December. The title of the talk is "From Switches to Convolution to Tangled Webs: evolving sub-optimal, subtle biological mechanisms". The work is an extension of my arXiv paper from 2011 [9] on Biological Rube Goldberg Machines (RGMs), something I also refer to as a convolution architecture. Here is the abstract and here is the associated code on Github:
"One way to understand complexity in biological networks is to isolate simple motifs like switches and bi-fans. However, this does not fully capture the outcomes of evolutionary processes. In this talk, I will introduce a class of process model called convolution architectures. These models demonstrate bricolage and ad-hoc formation of new mechanisms atop existing complexity. Unlike simple motifs (e.g. straightforward mechanisms), these models are intended to demonstrate how evolution can produce complex processes that operate in a sub-optimal fashion. The concept of convolution architectures can be extended to complex network topologies. Simple convolution architectures with evolutionary constraints and subject to natural selection can produce step lengths that deviate from optimal expectation. When convolution architectures are represented as components of bidirectional complex network topologies, these circuitous paths should become “spaghetti-fied”, as they are not explicitly constrained by inputs and outputs. This may also allow for itinerant and cyclic self-regulation resembling chaotic dynamics. The use of complex network topologies also allows us to better understand how higher-level constraints (e.g. hub formation, modularity, preferential attachment) affect the evolution of sub-optimality and subtlety. Such embedded convolution architectures are also useful for modeling physiological, economic, and social complexity". 

And last but not least, a new preprint server has come online called BioRxiv. BioRxiv (administered by Cold Spring Harbor Laboratory) accepts manuscripts from a number of biological disciplines, from Bioinformatics to Molecular Biology to Zoology. I kicked things off in the Zoology category with an older manuscript (originally presented at a conference in 2006) entitled "Filling up the Tree: considering the self-organization of avian roosting behavior" [10]. However, for more theoretical and interdisciplinary work such as the paper in [11], I still plan on using arXiv.


[1] Alicea, B.   Cellular decision-making bias: the missing ingredient in cell functional diversity. arXiv repository, arXiv:1310.8268 [q-bio.QM] (2013).

[2] Alicea, B., Murthy, S., Keaton, S.A., Cobbett, P., Cibelli, J.B., and Suhr, S.T.   Defining phenotypic respecification diversity using multiple cell lines and reprogramming regimens. Stem Cells and Development, 22(19), 2641-2654 (2013).

[3] In this example, conversion refers to direct cellular reprogramming techniques (e.g. the creation of iPS cells) that result in the creation of induced neural cells (iNCs) and induced skeletal muscle cells (iSMCs). However, conversion could also refer to carcinogenesis or developmental processes.

Figure 1 from Alicea (2013). Frames A-D, immunocytochemical characterization of iNCs and iSMCs. Frames E-H, diversity in reprogramming efficiency for a range of cell lines.

[4] Schultz, S.R.   Signal-to-noise ratio in neuroscience. Scholarpedia, 2(6), 2046 (2007).

[5] Balazsi, G., van Oudenaarden, A., and Collins, J.J.   Cellular Decision-Making and Biological Noise: From Microbes to Mammals. Cell, 144(6), 910–925 (2011). 

[6] Alicea, B.   A Semi-automated Peer-review System. arXiv: 1311.2504 [cs.DL, cs.HC, cs.SI, physics.soc-ph] (2013).

[7] Alicea, B.   The Novelty-Consensus Dampening.   Synthetic Daisies blog, October 22 (2013). 

[8] Uzzi, B., Mukherjee, S., Stringer, M., and Jones, B.   Atypical Combinations and Scientific Impact. Science, 342, 468-472 (2013).

[9] Alicea, B.   The ‘Machinery’ of Biocomplexity: understanding non-optimal architectures in biological systems. arXiv repository, arXiv:1104.3559 [nlin.AO, q-bio.QM, q-bio.PE] (2011).

[10] Alicea, B.   Filling up the Tree: considering the self-organization of avian roosting behavior. bioRxiv, doi:10.1101/000349 (2013).

[11] Alicea, B.   The Emergence of Animal Social Complexity: theoretical and biobehavioral evidence. arXiv repository, arXiv:1309.7990 [q-bio.PR, q-bio.NC] (2013).