September 30, 2014

Flash Lecture for "Toy Models for Macroevolution"

Last summer, I did a series of "flash lectures" on Human Augmentation on my Tumblr site, which was subsequently cross-posted to Synthetic Daisies [1]. Flash lectures are short, five-minute lectures that are either multimedia-rich or presented in the form of a very quick summary. Sometimes they meld two or three disparate topics together around a single theme. In this post, I have chosen to present a new Biosystems paper from myself and Richard Gordon in such a format [2].

The first part of the paper introduces the toy model as a unified concept. From a writing perspective, this was the most challenging part of the paper, as we re-interpreted a diverse set of biological and evolutionary models. Some of these models are more traditional (e.g. Hardy-Weinberg and fitness landscapes), while others are more novel (e.g. coupled avalanches/evolutionary dynamics and self-organized adaptive change). 

In the end, however, we were able to define toy models as a set of tools that summarize, represent, and allow for a prepared description of evolutionary change. Think of this strategy as a shortcut to complexity. While biological systems are extremely complex, we can nonetheless find much more compact, lower-dimensional representations that aid us in extracting useful patterns and trends.
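To make this concrete, consider the most traditional of the toy models mentioned above. Hardy-Weinberg equilibrium compresses the genotype dynamics of an idealized population into a single allele frequency; the sketch below is my own illustration of the concept, not code from the paper or its repository.

```python
def hardy_weinberg(p):
    """Equilibrium genotype frequencies at a biallelic locus,
    given p, the frequency of allele A (q = 1 - p, for allele a)."""
    q = 1.0 - p
    return {"AA": p ** 2, "Aa": 2 * p * q, "aa": q ** 2}

# A single number (p) stands in for the population's state; the full
# genotype distribution is recovered from it and always sums to 1.
freqs = hardy_weinberg(0.7)
print(freqs)
```

This is the sense in which a toy model is a shortcut to complexity: a one-parameter description substitutes for the high-dimensional state of a real population.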

We present 13 different toy models, loosely grouped into three functional categories. These include: the dynamic aspects of evolution, the hereditary aspects of evolution, and the adaptive and conserved features of populations in a small number of dimensions. Aside from these categories, there are three archetypes of toy model: hybrid, classical, and heuristic/phenomenological (see the slide above for details). 

Toy models are not simply theoretical concerns. They can be used in tandem with statistical tools to provide a sort of deep context to an analysis. This is particularly important for big data-type analyses (e.g. high-throughput sequence data). Yet such context can also be useful when multiple types of data are used in the same analysis. The kinds of debates that surround gene trees vs. species trees will likely be replicated for studies that compare genome sequences with transcriptional patterns, or genomic with behavioral analyses. Combining several different toy models to act as "filters" of datasets can aid in how these data are understood.

For supplemental information, there is also an emerging GitHub repository that will feature code for many of the toy models presented in this paper, plus additional models.


[2] The paper was previously mentioned as a standalone publication, but now the paper has been reassigned to a special issue of Biosystems called "Patterns of Evolution".

September 28, 2014

Ten to the Fifth!

Congratulations! Synthetic Daisies has just garnered 100,000 all-time views. This comes just two years after the 20,000 mark, and a bit less than a year after the 50,000 mark. Daily readership has been increasing steadily since the early years of the blog (2009-2011). Below is an analysis of the top 10 posts by readership and their frequency of readership.

September 23, 2014

Frontiers in Bioengineering Conference: The Farthest Out in Front

Earlier this month, I attended the Frontiers in Bioengineering conference at the Beckman Institute (UIUC). It was an interesting event with diverse perspectives on the field of Bioengineering. The two-day event featured researchers from all over the world, but focused on the synergies between Engineering and Cell/Molecular Biology.

One way to summarize the proceedings is to discuss the three most interesting (at least to me) technological advances. So here they are, in no particular order:

1) tissue co-cultures: Roger Kamm and Kevin Healey discussed the use of co-cultures to synthetically grow new organs and/or repair scaffolds. Co-cultures [1] are ex vivo systems within which multiple cell types are established and grown in media. The benefit of this artificial system is the endogenous production of growth factors and a microenvironment. Exogenously-delivered factors apparently do not have the same efficacy for applications such as nerve grafts and cardiac repair. In the case of nerve grafts, supplying a bona fide microenvironment can increase the distance of nerve innervation across a denervated gap. Co-cultures can also provide target tissues and anterograde cells to better approximate neuronal communication.

2) SHAPEseq: this is an up-and-coming technique that has been used by a number of research groups to sequence the secondary structure of RNA [2]. SHAPEseq, or selective 2'-hydroxyl acylation analyzed by primer extension sequencing, involves several steps that are similar to or go beyond the basic RNAseq technology. These include: preparing a barcoded RNA library, preparing a structure-specific cDNA library, aligning the corresponding reads, and calculating SHAPE reactivities [3]. As with RNAseq, the objective is to build sequence libraries. Unlike with RNAseq, these libraries are structure-dependent. This allows important structural information (e.g. hairpins, loops) to be estimated from a sample at single-nucleotide resolution.

A graphical summary of the SHAPEseq protocol. COURTESY: protocol description in [3].
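The final step, calculating SHAPE reactivities, can be caricatured as follows. The published pipeline uses a maximum-likelihood estimator [3]; the simple difference-of-stop-rates below, and the normalization, are my own simplifying assumptions.

```python
def shape_reactivities(plus_stops, minus_stops):
    """Toy per-nucleotide reactivity estimate: the excess rate of
    reverse-transcription stops in the reagent-treated (+) channel
    over the untreated (-) control channel, clipped at zero and
    normalized to the largest value."""
    plus_total = sum(plus_stops)
    minus_total = sum(minus_stops)
    raw = [max(p / plus_total - m / minus_total, 0.0)
           for p, m in zip(plus_stops, minus_stops)]
    top = max(raw) or 1.0  # avoid division by zero if nothing reacts
    return [r / top for r in raw]

# The middle position stops far more often when the reagent is present,
# suggesting an unpaired (e.g. loop) nucleotide.
print(shape_reactivities([10, 80, 10], [30, 40, 30]))  # -> [0.0, 1.0, 0.0]
```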

3) NiN (nonviral induced neuronal) cells: this is a technique that was presented by Kam Leong of Duke. The idea is to use a non-viral genetic engineering approach (such as CRISPR) to introduce reprogramming factors into a cell. Non-viral factor delivery, as opposed to viral-mediated delivery using polycistronic vectors (genetic elements), is supposedly safer for transplantation and other therapeutic uses [4]. Other non-viral techniques (such as RNA-mediated reprogramming) have been tried with a mixed record of success. But by using the gene editing method [5], a cell population can be reprogrammed to a level of efficiency approximating viral-mediated reprogramming techniques. Despite various issues with estimating reprogramming efficiency and diversity across source cells [6], NiN techniques might be an easy and relatively controllable way to produce highly-specialized types of induced neurons.

Honorable mention by association: The technology enabling the NiN advance is called CRISPR, or clustered, regularly interspaced, short palindromic repeat technology [7]. By using RNA-guided nucleases such as members of the Cas protein family (Cas9 in particular) [8], CRISPR technology can enable precise targeting of gene regulation. This includes the introduction and control of transgenes, something for which CRISPR has a lot of potential. To be fair, there are other, similar methods such as Transcription Activator-Like Effector Nucleases (TALENs) and Zinc Finger Nucleases (ZFNs) [9]. So congratulations to all of our gene editing technologies as we look to the future.

A diagram of the Cas-mediated CRISPR protocol. COURTESY: James Atmos, Wikipedia.
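To illustrate what "precise targeting" means in practice, here is a toy target-site search. It is entirely my own sketch, not part of any of the tools in [7-9]: Cas9 is directed by a guide RNA to a genomic protospacer that must be immediately followed by an NGG PAM motif.

```python
import re

def find_cas9_sites(dna, guide):
    """Toy Cas9 target search: positions where the guide sequence
    matches the DNA and is immediately followed by an NGG PAM.
    (Real tools also scan the reverse strand and score mismatches.)"""
    pattern = f"{guide}(?=[ACGT]GG)"  # lookahead keeps the PAM unconsumed
    return [m.start() for m in re.finditer(pattern, dna)]

guide = "ACGTTAGCATGCATGACCAA"         # hypothetical 20-nt guide
dna = "TTTT" + guide + "TGG" + "TTTT"  # one site, with a TGG PAM
print(find_cas9_sites(dna, guide))     # -> [4]
```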

[1] For examples, please see the following articles:
a) Paschos, N.K., Brown, W.E., Eswaramoorthy, R., Hu, J.C., and Athanasiou, K.A.   Advances in tissue engineering through stem cell-based co-culture. Journal of Tissue Engineering and Regenerative Medicine, doi:10.1002/term.1870 (2014).

b) Ma, J., Both, S.K., Yang, F., Cui, F-Z., Pan, J., Meijer, G.J., Jansen, J.A., and van den Beucken, J.J.J.P.   Cell-Based Strategies in Bone Tissue Engineering and Regenerative Medicine. Stem Cells and Translational Medicine, sctm.2013-0126 (2013).

c) Meijer, G.J., de Bruijn, J.D., Koole, R., van Blitterswijk, C.A.   Cell-Based Bone Tissue Engineering. PLoS Medicine, 4(2), e9. doi:10.1371/journal.pmed.0040009 (2007).

[2] For examples, please see the following articles:

a) Lucks, J.B., Mortimer, S.A., Trapnell, C., Luo, S., Aviran, S., Schroth, G.P., Pachter, L., Doudna, J.A., and Arkin, A.P.   Multiplexed RNA structure characterization with selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). PNAS, 108(27), 11063-11068 (2011).

b) Steen, K.A., Malhotra, A., Weeks, K.M.   Selective 2'-hydroxyl acylation analyzed by protection from exoribonuclease. Journal of the American Chemical Society, 132(29), 9940-9943 (2010).

[3] Mortimer, S.A., Trapnell, C., Aviran, S., Pachter, L., and Lucks, J.B.   SHAPE-Seq: High-Throughput RNA Structure Analysis. Current Protocols in Chemical Biology, doi:10.1002/9780470559277.ch120019 (2012).

[4] Park, H.J., Shin, J., Kim, J., and Cho, S.W.   Nonviral delivery for reprogramming to pluripotency and differentiation. Archives of Pharmacology Research, 37(1), 107-119 (2014).

[5] Perez-Pinera, P., Kocak, D.D., Vockley, C.M., Adler, A.F., Kabadi, A.M., Polstein, L.R., Thakore, P.I., Glass, K.A., Ousterout, D.G., Leong, K.W., Guilak, F., Crawford, G.E., Reddy, T.E., and Gersbach, C.A.   RNA-guided gene activation by CRISPR-Cas9–based transcription factors. Nature Methods, 10, 973-976 (2013).

[6] Alicea, B., Murthy, S., Keaton, S.A., Cobbett, P., Cibelli, J.B., and Suhr, S.T.   Defining phenotypic respecification diversity using multiple cell lines and reprogramming regimens. Stem Cells and Development, 22(19), 2641-2654 (2013).

[7] Hsu, P.D., Lander, E.S., and Zhang, F.   Development and Applications of CRISPR-Cas9 for Genome Engineering. Cell, 157(6), 1262-1278 (2014).

[8] Sander, J.D. and Joung, J.K.   CRISPR-Cas systems for editing, regulating and targeting genomes. Nature Biotechnology, 32, 347-355 (2014).

[9] Gaj, T., Gersbach, C.A., and Barbas, C.F.   ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends in Biotechnology, 31(7), 397-405 (2013).

September 17, 2014

Heuristic Haystacks and the Messy Lesson

As an admitted and self-styled parsimony skeptic, I was interested to see a discussion in the blogosphere on the seductive allure of simple explanations [1]. This was in the context of economic policy and decision-making, with Paul Krugman even offering an H.L. Mencken quote: "For every complex problem there is an answer that is clear, simple, and wrong" [2]. Yet while parsimony was never brought up, I suspect that hypotheses and arguments related to the efficient markets hypothesis were always somewhat in mind.

There are, of course, broader parallels between seductive simplicity and parsimony. As I have pointed out before, I find parsimony to be an overly-seductive null model [3]. The simplest explanation often leads us not to the truth, but to what is most conceptually consistent. In some cases (where theory is well-established) this works out well. Intuition in support of serendipity, and serendipity in support of discovery, is an unassuming (and often underplayed) pillar of science [4]. However, in cases where our intuitions get in the way of objective analysis, this becomes problematic. And this seeming exception is actually quite common. In a related manner, this brings up an interesting problem: the relationship between parsimony as a decision-making criterion and the epistemology of a scientific phenomenon.

An appalling lack of faith in both Occam's and Einstein's worldviews. More horrifying details in my Ignite! talk on the topic.

This relationship, or more accurately inconsistency, is due to argumentatively-influenced judgments on a naturalistic search space. Even in children, argumentation is observed to be rife with confirmation bias and with logically arguing to absurd positions [5]. While argumentation allows us to build hypotheses, it also gets us stuck in a conceptual minimum (my own ad-hoc phrase). In a previous post, I pointed to recent work on how belief systems and associated systems of argumentation can shape our perception of reality. But, of course, this cannot will the natural world to our liking. In fact, it often serves to muddy the conceptual and theoretical waters [6]. Therefore, you often have a conceptual gap, unrelated to problem incompleteness, which we will flesh out in the rest of this post.

The first point to be made here is that such an inconsistency introduces two biases that shape how we think about the simplest explanation, and more generally about what is optimal. First, can we even find the true simplest explanation? Perhaps the simplest possible statement that can be constructed cannot capture the true complexity of a given situation. This is particularly true when there are competing dimensions (or layers, or levels) of complexity. Second, and particularly in the face of complexity, simplicity can often be a foil to deep understanding. Unfortunately, this is often conceptualized and practiced in a destructive way, favoring simple and homogeneous mental models over more subtle ones.

How to dream of complex sheep....

In the parlance of decision-making theory, parsimony is consistent with the notion of good-enough heuristics. In the work of Gigerenzer [7], such heuristics are claimed to be nearly optimal when compared to formal analysis of a problem. This can also be seen with statistical prediction rules that outperform human judgments in a number of everyday contexts [8]. But is this a statement of problem "wickedness", or a statement of superiority with respect to human cognition? When compared to problems that require needle-in-a-haystack criteria, fast and frugal heuristics (and hence parsimony) are severely lacking.
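A hypothetical simulation (mine, not taken from [7] or [8]) shows why such heuristics perform so well: a "take-the-best"-style rule consults binary cues one at a time, in order of validity, and decides on the first cue that discriminates.

```python
import random

random.seed(1)

# Synthetic task: each object has 3 binary cues, in decreasing order of
# validity; the true criterion is a weighted sum of the cues plus noise.
def make_object():
    cues = [random.randint(0, 1) for _ in range(3)]
    criterion = 3 * cues[0] + 2 * cues[1] + cues[2] + random.gauss(0, 0.5)
    return cues, criterion

def take_the_best(cues_a, cues_b):
    """Fast-and-frugal choice: True if object A is judged larger.
    Decide on the first cue that discriminates; guess if none do."""
    for ca, cb in zip(cues_a, cues_b):
        if ca != cb:
            return ca > cb
    return random.random() < 0.5

pairs = [(make_object(), make_object()) for _ in range(2000)]
correct = sum(take_the_best(ca, cb) == (sa > sb)
              for (ca, sa), (cb, sb) in pairs)
print(f"take-the-best accuracy: {correct / len(pairs):.2f}")
```

On this synthetic task, the single-pass heuristic is right far more often than chance despite ignoring most of the available information: Gigerenzer's "nearly optimal" behavior in miniature. It is when no small set of cues carries this much signal that the heuristic breaks down.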

So complexity introduces a secondary bias at best, and serves as a severe limitation on achieving parsimony at worst. One might expect that experimentally verifying a prediction made in conjunction with Occam's Razor requires finding an exact analytical solution. Finding this proverbial "needle in a haystack" requires both a multi-criterion, algorithmically-friendly heuristic solution and a formal strategy that often defies intuition. Seemingly, the simple solution cannot keep up.

I found it! It was quick, but I was also quite lucky.

[1] The Simplicity Paradox. Stumbling and Mumbling blog, September 9 (2014) AND Krugman, P.   Simply Unacceptable. The Conscience of a Liberal blog, September 5 (2014).

[2] This is not to equate parsimony with methodological snake oil -- in fact, I am arguing quite the opposite. But I am merely pointing out that parsimony is an incomplete hypothesis for acquiring knowledge.

[3] For more, please see this Synthetic Daisies post: Alicea, B.   Argument from Non-Optimality: what does it mean to be optimal? Synthetic Daisies blog, July 28 (2013).

[4] Kantorovich, A.   Scientific Discovery: Logic and Tinkering. SUNY Press, Albany (1993).

[5] I say "even" in children even though the latter (logically arguing to absurd conclusions) is often expected from children. But we see these things in adults as well, and such is the point of argumentation theory. For more, please see: Mercier, H.   Reasoning Serves Argumentation in Children. Cognitive Development, 26(3), 177–191 (2011).

[6] Wolchover, N.   Is Nature Unnatural? Quanta Magazine, May 24 (2013).

[7] While there are likely other (and perhaps better) examples, I am using a reference cited in [1]: Gigerenzer, G.   Bounded and Rational. In "Contemporary Debates in Cognitive Science", R.J. Stainton eds. Blackwell, Oxford, UK (2006).

[8] lukeprog   Statistical Prediction Rules Out-Perform Expert Human Judgments. LessWrong blog, January 18 (2011).

September 10, 2014

Upcoming DevoWorm talk to the OpenWorm group

This Friday (9/12) at 9am PDT, I will be presenting a talk to the OpenWorm consortium Journal Club on the DevoWorm project. For those of you who are unfamiliar, DevoWorm is a collaborative attempt to simulate and theoretically re-interpret C. elegans development.

Cover slide with a list of the DevoWorm collaborators, circa September 2014.

The structure of the talk will loosely follow the white paper, with some additional theoretical and translational information. We are also trying to organize/raise money for a "science hackathon", which would greatly improve the state of the project [1].

 An explanation of a scientific hackathon (sensu DevoWorm Collaboration).

The talk will also deal with the issue of whole-organism emulation. In this case, we are using a sparse representation of the organism to model developmental processes. The key is to balance tractability with biological realism. Sparko the Robotic Dog and the EPFL's Human Brain Project are used as examples.

We also discuss the potential usefulness of C. elegans emulations to biological problems. One problem we identified was the need to emulate and identify the precursors and mechanisms of phenotypic mutants. While our discussion of this will be limited to only a few slides, DevoWorm has the potential to model the possibility space of phenotypic mutants and perhaps even suggest developmental precursors to phenotypic mutations. 

If you are interested in attending, here is the Google Hangouts link. I look forward to a good presentation.

UPDATE 9/12:
The talk went very well. We also changed the name to "DevoWorm: raising the (Open)Worm". Lots of discussion about the potential for future collaboration and the regenerative capacity of C. elegans (or lack thereof). The talk was recorded to YouTube, and the link is here.

[1] Improvements largely involve physically bringing the group together, solving some problems related to data analysis, and perhaps even planning out additional data collection. Apparently, the term "hackathon" has a rather broad definition. But if you are interested in participating/helping to facilitate this, please contact me.