Showing posts with label general-theory. Show all posts

January 22, 2021

OREL and DevoWorm Review of 2020

As part of our preparations for the New Year, I prepared a set of presentations for my two research groups: Representational Brains and Phenotypes and DevoWorm. I have posted the slides below, and if you see something interesting that you would like to participate in, please contact me. If you are interested in learning more, please join the Orthogonal Research and Education Lab or OpenWorm Slack. You can also attend our weekly meetings: 3pm UTC Saturdays for Saturday Morning NeuroSim (more info), or 3pm UTC Mondays for DevoWorm (more info).


Saturday Morning NeuroSim presentation, with a focus on the Representational Brains and Phenotypes group (click slides to enlarge).

DevoWorm weekly meeting presentation, with a focus on the DevoWorm group  (click slides to enlarge).


March 5, 2020

Open Data Day 2020


Welcome to Open Data Day 2020, sponsored by the Orthogonal Research and Education Laboratory! Our activities start today and will continue over the course of the next year. For this iteration of Open Data Day, we are looking for software developers, data scientists, statisticians, and quantitative biologists to work on a host of open data-related activities in the DevoWorm group. Listed below is a series of possible goals for the next year.

1) We would like to construct pseudo-data sets for theory-building and modeling. This involves establishing simulated and resampled data sets that can be used as the input to machine learning, statistical, and functional models. Examples of these would include numeric data generated using statistical distributions, a generative approach using selected features (cells) as inputs, or the energy potentials of kinetic processes in an embryo.
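As a sketch of this first goal, pseudo-data can be generated from statistical distributions and by resampling; the feature dimensions, sample size, and distribution parameters below are illustrative assumptions, not values from any DevoWorm data set:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated "cell position" data: 50 cells with (x, y, z) coordinates
# drawn from a normal distribution (a stand-in for real embryo data).
simulated = rng.normal(loc=0.0, scale=10.0, size=(50, 3))

# A resampled (bootstrap) data set: draw rows with replacement, which
# preserves the empirical distribution of the original data.
indices = rng.integers(0, len(simulated), size=len(simulated))
resampled = simulated[indices]

print(simulated.shape, resampled.shape)
```

Either array could then serve as input to a machine learning or statistical model in place of scarce empirical data.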

2) There is also a need to build towards metadata standards, particularly with respect to the integration of different data types. Metadata helpful to the DevoWorm group includes (but is not limited to) cell division timing, high-level descriptions, positional and geometric information, and other features. The development of metadata repositories according to a schema data structure would be helpful.
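One minimal sketch of what a record under such a metadata schema might look like; the field names and units here are hypothetical, not an established DevoWorm standard:

```python
from dataclasses import dataclass, asdict
from typing import Tuple

@dataclass
class CellRecord:
    """One entry in a hypothetical DevoWorm-style metadata schema."""
    cell_name: str                         # lineage name, e.g. "AB"
    division_time: float                   # minutes post-fertilization (assumed unit)
    position: Tuple[float, float, float]   # x, y, z coordinates
    description: str = ""                  # free-text high-level description

record = CellRecord("AB", 17.0, (1.2, -0.5, 3.0), "anterior founder cell")
print(asdict(record))
```

A repository of such records could be serialized to JSON and validated against the schema before integration with other data types.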

3) Also needed is a focus on DevoZoo maintenance, including the addition of datasets, the integration of data sets, and improvements in presentation style/interface design. Since last year's launch, the resources for each species or computational platform have become outdated. We would not only like to provide links to resources such as new data sets and gene expression atlases, but also provide access to “intermediate” resources such as ontologies, metadata, and models from other research groups. There is also a further desire to make DevoZoo sustainable.

Current iteration of DevoZoo (click to enlarge).

4) As an initiative farther off into the future, we would like to add semantic capabilities to our models and data sets. One such example is a “controlled vocabulary” for developmental microscopy images and molecular data. In concert with this, having the capability to attach meanings and other notes to image and simulation features would increase the interpretability of such data.

5) In conjunction with the Data Reuse Initiative, we would like to provide some application of the FAIR principles. FAIR stands for making data findable, accessible, interoperable, and reusable. There are two opportunities here: a FAIRness evaluation, or how to make data FAIR, and promotion of each component of FAIR. For example, making datasets on DevoZoo more findable by adding tags or other classification tools would help newcomers make the most of our resource.
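As a toy illustration of the findability idea, tags can be attached to data set names and then searched; the data set names and tags below are made up, not actual DevoZoo entries:

```python
# A minimal tagging scheme to make data sets more findable (FAIR's "F").
datasets = {
    "celegans-embryo-positions": {"nematode", "embryo", "microscopy"},
    "ciona-lineage-tree": {"tunicate", "lineage", "tree"},
}

def find_by_tag(tag):
    """Return the names of all data sets carrying a given tag."""
    return [name for name, tags in datasets.items() if tag in tags]

print(find_by_tag("lineage"))  # ['ciona-lineage-tree']
```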

February 16, 2019

Darwin meets Category Theory in the Tangential Space

For this Darwin Day (February 12), I would like to highlight the relationship between evolution by natural selection and something called category theory. While this post will be rather tangential to Darwin's work itself, it should be good food for thought with respect to evolutionary research. As we will see, category theory also has relevance to many types of functional and temporal systems (including those shaped by natural selection) [1], which is key to understanding how natural selection shapes individual phenotypes and populations more generally.

This isn't the last you'll hear from me in this post!

Category theory originated in the applied mathematics community, particularly with the "General Theory of Natural Equivalences" [2]. In many ways, category theory will be familiar to those with a conceptual knowledge of set theory. Uniquely, category theory deals with the classification of objects and the transformations (mappings) between them. However, category theory is far more powerful than set theory, and serves as a bridge to formal logic, systems theory, and classification.

A category is defined by two basic components: objects and morphisms. Objects might be, for example, a collection of interrelated variables or discrete states. Morphisms are the arrows that link objects together, either structurally or functionally. Together these provide a network of paths between objects that can be analyzed using categorical logic, allowing us to define a composition (or path) by tracing through the set of objects and morphisms (so-called diagram chasing) to find a solution.
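The idea of composing morphisms by chasing through a diagram can be sketched with a toy category in Python; the object and morphism names here are arbitrary placeholders:

```python
# A toy category: morphisms are named arrows with a (source, target) pair.
# Composition chains arrows head-to-tail, as in diagram chasing.
morphisms = {
    "f": ("A", "B"),
    "g": ("B", "C"),
    "h": ("C", "D"),
}

def compose(path):
    """Return the (source, target) of a composite morphism, checking
    that each arrow's target matches the next arrow's source."""
    src, tgt = morphisms[path[0]]
    for name in path[1:]:
        s, t = morphisms[name]
        if s != tgt:
            raise ValueError(f"{name} does not compose at {tgt}")
        tgt = t
    return (src, tgt)

print(compose(["f", "g", "h"]))  # the composite h . g . f, an arrow A -> D
```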

In this example, a pie recipe is represented as a category with objects (action steps) and morphisms (ingredients and results). This monoidal preorder can be added to as the recipe changes. From [3]. Click to enlarge.

Categories can also consist of classes: classes of objects might include all objects in the category, while classes of morphisms include all relational information such as pathways and mappings. Groupoids are functional descriptions that allow us to represent generalizations of group actions and equivalence relations. These modeling-friendly descriptions of a discrete dynamic system are quite similar to object-oriented programming (OOP) [4]. One biologically-oriented application of category theory can be found in the work of Robert Rosen, particularly topics such as relational biology and anticipatory systems.

Animal taxonomy according to category theory. This example focuses on exploring existing classifications, from species to kingdom. The formation of a tree from a single set of objects and morphisms is called a preorder. From [3]. Click to enlarge.

One potential application of this theory to evolution by natural selection is to establish an alternate view of phylogenetic relationships. By combining category theory with feature selection techniques, it may be possible to detect natural classes that correspond to common ancestry. Related to the discovery of evolutionary-salient features is the problem of phylogenetic scale [5], or hard-to-interpret changes occurring over multiple evolutionary timescales. Category theory might allow us to clarify these trends, particularly as they relate to evolving life embedded in ecosystems [6] or shaped by autopoiesis [7]. 

More relevant to physiological systems that are shaped by evolution are gene regulatory networks (GRNs). While GRNs can be characterized without the use of category theory, they also present an opportunity to produce an evolutionarily-relevant heteromorphic mapping [8]. While a single GRN structure can have multiple types of outputs, multiple GRN structures can also give rise to the same or similar output [8, 9]. As with the previous examples, category theory might help us recast these otherwise super-complex phenomena (and "wicked" problems) as well-composed systems-level representations.


NOTES:
[1] Spivak, D.I. (2014). Category theory for the sciences. MIT Press, Cambridge, MA.

[2] Eilenberg, S. and MacLane, S. (1945). General theory of natural equivalences. Transactions of the American Mathematical Society, 58, 231-294. doi:10.1090/S0002-9947-1945-0013131-6 

[3] Fong, B. and Spivak, D.I. (2018). Seven Sketches in Compositionality: an invitation to applied category theory. arXiv, 1803:05316.

[4] Stepanov, A. and McJones, P. (2009). Elements of Programming. Addison-Wesley Professional.

[5] Graham, C.H., Storch, D., and Machac, A. (2018). Phylogenetic scale in ecology and evolution. Global Ecology and Biogeography, doi:10.1111/geb.12686.

[6] Kalmykov, V.L. (2012). Generalized Theory of Life. Nature Precedings, 10101/npre.2012.7108.1.

[7] Letelier, J.C., Marin, G., and Mpodozis, J. (2003). Autopoietic and (M,R) systems. Journal of Theoretical Biology, 222(2), 261-272. doi:10.1016/S0022-5193(03)00034-1.

[8] Payne, J.L. and Wagner, A. (2013). Constraint and contingency in multifunctional gene regulatory circuits. PLoS Computational Biology, 9(6), e1003071. doi:10.1371/journal.pcbi.1003071.

[9] Ahnert, S.E. and Fink, T.M.A. (2016). Form and function in gene regulatory networks: the structure of network motifs determines fundamental properties of their dynamical state space. Journal of the Royal Society Interface, 13(120), 20160179. doi:10.1098/rsif.2016.0179.

March 3, 2018

Open Data Day 2018: Orthogonal Research Version

Time once again for International Open Data Day, an annual event hosted by organizations all around the world. For the Orthogonal Research contribution, I am sharing a presentation on the role of theory in data science (and the analysis of open data).

The full set of slides is available on Figshare, doi:10.6084/m9.figshare.5483746


A theory of data goes back to before there were concepts such as "big data" or "open data". In fact, we can learn a lot from attempts to characterize regularities in scientific phenomena, particularly in the behavioral sciences (e.g. Psychophysics).

There are a number of ways to build a mini-theory, but one advantage of the approach we are working on is that (assuming partial information about the data being analyzed) a theoretical model can be built with very limited amounts of data. I did not mention the role of non-empirical reasoning [1] in theory-building, but it might be an important issue for future consideration.


The act of theory-building also creates generalized models of pattern interpretation. In this case, our mini-theory detects sheep-shaped arrays. But there are bottom-up and top-down assumptions that go into this recognition, and theory-building is a way to make those explicit.

Naive theories are a particular mode of error in theory-building from sparse or incomplete data. In the case of human reasoning, naive theories result from generalization based on limited empirical observation and blind inference of mechanism. They are characterized in the Cognitive Science literature as being based on implicit and non-domain-specific knowledge [2].

Taken together, mini-theories and naive theories can help us not only better characterize unlabeled and sparsely labelled data, but also gain an appreciation for local features in the dataset. In some cases, naive theory-building might be beneficial for enabling feature engineering, ontologies/metadata [3] and other characteristics of the data.

In terms of usefulness, theory-building in data science lies somewhere in between mathematical discovery programs and epistemological models. 

NOTES:
[1] Dawid, R. (2013). Novel Confirmation and the Underdetermination of Scientific Theory Building. PhilSci Archive.

[2] Gelman, S.A., Noles, N.S. (2011). Domains and naive theories. WIREs Cognitive Science, 2, 490–502. doi:10.1002/wcs.124

[3] Rzhetsky, A., Evans, J.A. (2011). War of Ontology Worlds: Mathematics, Computer Code, or Esperanto? PLoS Computational Biology, 7(9), e1002191. doi:10.1371/journal.pcbi.1002191

June 18, 2017

Loose Ends Tied, Interdisciplinarity, and Consilience

LEFT: A network of scientific disciplines and concepts built from clickstream data. RIGHT: Science mapping based on relationships among a large database of publications. COURTESY: Figure 5 in [1] (left) and SciTech Strategies (right).

Having a diverse background in a number of fields, I have been quite interested in how people from different disciplines converge (or do not converge) upon similar findings. Given that disciplines are often methodologically distinct communities [2], it is encouraging when multiple disciplines can exhibit consilience [3] in attacking the same problem. For me, it is encouraging because it supports the notion that the phenomena we study are derived from deep principles consistent with a grand theorizing [4]. And we can see this in areas of inquiry such as learning and memory, with potential relevance to a wide variety of disciplines (e.g. cognitive psychology, history, cell biology) and the emergence of common themes according to various definitions of the phenomenon.

Maximum spanning tree of disciplinary interactions based on the Physics and Astronomy Classification Scheme (PACS). COURTESY: Figure 5 in [5].

The ability to converge upon a common set of findings may be an important part of establishing and maintaining coherent multidisciplinary communities. Porter and Rafols [6] have examined the growth of interdisciplinary citations as a proxy for increasing interdisciplinarity. Interdisciplinary citations tend to be less common than within-discipline citations, while also favoring linkages between closely-aligned topical fields. Perhaps consilience also relies upon the completeness of literature inclusion for people from different disciplines in an interdisciplinary context. Another recent paper [7] suggests that more complete literature citation might lead to better interdisciplinary science and perhaps ultimately consilience. This of course depends on whether the set of evidence itself is actually convergent or divergent, and what it means for concepts to be coherent. In the interest of not getting any more abstract and esoteric, I will leave the notion of coherence for another post.


NOTES:
[1] Bollen, J., Van de Sompel, H., Hagberg, A., Bettencourt, L., Chute, R., Rodriguez, M.A., and Balakireva, L. (2009). Clickstream Data Yields High-Resolution Maps of Science. PLoS One, 4(3), e4803. doi:10.1371/journal.pone.0004803.

[2] Osborne, P.  (2015). Problematizing Disciplinarity, Transdisciplinary Problematics. Theory, Culture, and Society, 32(5-6), 3–35.

[3] Wilson, E.O. (1998). Consilience: the unity of knowledge. Random House, New York.

[4] Weinberg, S. (1993). Dreams of a Final Theory: the scientist's search for the ultimate laws of nature. Vintage Books, New York.

[5] Pan, R.J., Sinha, S., Kaski, K., and Saramaki, J. (2012). The evolution of interdisciplinarity in physics research. Scientific Reports, 2, 551. doi:10.1038/srep00551.

[6] Porter, A.L. and Rafols, I. (2009). Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics, 81, 719.

[7] Estrada, E. (2017). The other fields also exist. Journal of Complex Networks, 5(3), 335-336.

August 19, 2016

From Toy Models to Quantifying Mosaic Development

Time travel in the Terminator metaverse. COURTESY: Michael Talley.

Almost two years ago, Richard Gordon and I published a paper in the journal Biosystems called "Toy Models for Macroevolutionary Patterns and Trends" [1]. Now, almost exactly two years later [2], we have published a second paper (not quite a follow-up) called "Quantifying Mosaic Development: towards an evo-devo postmodern synthesis of the evolution of development via differentiation trees of embryos". While the title is quite long, the approach can be best described as computational/statistical evolution of development (evo-devo).

Sketch of a generic differentiation tree, which figures prominently in our theoretical synthesis and analysis. COURTESY: Dr. Richard Gordon.

This paper is part of a special issue in the journal Biology called "Beyond the Modern Evolutionary Synthesis- what have we missed?" and a product of the DevoWorm project. The paper itself is a hybrid theoretical synthesis/research report, and introduces a variety of comparative statistical and computational techniques [3] that are used to analyze quantitative spatial and temporal datasets representing early embryogenesis. Part of this approach was previewed in our most recent public lecture to the OpenWorm Foundation.

The comparative data analysis involves investigations within and between two species from different parts of the tree of life: Caenorhabditis elegans (Nematode, invertebrate) and Ciona intestinalis (Tunicate, chordate). The main comparison involves different instances of early mosaic development, or a developmental process that is deterministic with respect to cellular fate. We also reference data from the regulative developing Axolotl (Amphibian, vertebrate) in one of the analyses. All of the analyses involve the reuse and analysis of secondary data, which is becoming an important part of the scientific process for many research groups.

One of the techniques featured in the paper is an information-theoretic technique called information isometry [4]. This method was developed within the DevoWorm group, and uses a mathematical representation called an isometric graph to visualize cell lineages organized in different ways (e.g. a lineage tree vs. a differentiation tree). The method is summarized and validated in our paper "Information Isometry Technique Reveals Organizational Features in Developmental Cell Lineages" [4]. Briefly, each level of the cell lineage is represented as an isoline, which contains points of a specific Hamming distance. The Hamming distance is the distance between that particular cell in two alternative cell lineage orderings (the aforementioned lineage and differentiation trees).
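The Hamming distance itself is straightforward to compute; here is a minimal sketch, with hypothetical binary codes standing in for one cell's address in the two alternative tree orderings:

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical binary codes for one cell's address in two orderings
# (a lineage tree vs. a differentiation tree).
lineage_code = "0110"
differentiation_code = "0101"
print(hamming(lineage_code, differentiation_code))  # 2
```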

An example of an isometric graph from Caenorhabditis elegans, taken from Figure 12 in [5]. The position of a point representing a cell is based on the depth of its node in the cell lineage. The positions of all points are rotated 45 degrees clockwise from a bottom-to-top differentiation tree (in this case) ordering, where the one-cell stage is at the bottom of the graph.

A final word on the new Biology paper as it relates to the use of references. Recently, I ran across a paper called "The Memory of Science: Inflation, Myopia, and the Knowledge Network" [6], which introduced me to the statistical definition of citation age. This inspired me to calculate the citation age of all journal references from three papers: Toy Models, Quantifying Mosaic Development, and a Nature Reviews Neuroscience paper from Bohil, Alicea (me), and Biocca, published in 2011. The latter was used as an analytical control -- as it is a review, it should contain papers which are older than the contemporary literature. Here are the age distributions for all three papers.
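Citation age, in the sense used here, is simply the difference between the citing paper's publication year and each reference's publication year; a sketch with made-up reference years:

```python
# Citation age: citing paper's publication year minus the publication
# year of each cited work. The reference years below are illustrative.
paper_year = 2016
reference_years = [2014, 2012, 1992, 2016, 2003, 1983, 2009]

ages = [paper_year - y for y in reference_years]
print(sorted(ages))           # [0, 2, 4, 7, 13, 24, 33]
print(sum(ages) / len(ages))  # mean citation age
```

A long right tail in this distribution indicates heavy use of older literature.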

Distribution of Citation Ages from "Toy Models for Macroevolutionary Patterns and Trends" (circa 2014).

Distribution of Citation Ages from "Quantifying Mosaic Development: Towards an Evo-Devo Postmodern Synthesis of the Evolution of Development Via Differentiation Trees of Embryos" (circa 2016).


Distribution of Citation Ages from "Virtual Reality in Neuroscience Research and Therapy" (circa 2011).

What is interesting here is that both "Toy Models" and "Quantifying Mosaic Development" show a long tail with respect to age, while the review article shows very little in terms of a distributional tail. While there are differences in topical literatures (the VR and associated perceptual literature is not that old, after all) that influence the result, it seems that the recurrent academic Terminators utilize the literature somewhat differently than most contemporary research papers. While the respect for history is somewhat author- and topic-dependent, it does seem to add an extra dimension to the research.


NOTES:
[1] the Toy Models paper was part of a Biosystems special issue called "Patterns in Evolution".

[2] This is a Terminator metaverse reference, in which the Terminator comes back every ten years to cause, effect, and/or stop Judgement Day.

[3] Gittleman, J.L. and Luh, H. (1992). On Comparing Comparative Methods. Annual Review of Ecology and Systematics, 23, 383-404.

[4] Alicea, B., Portegys, T.E., and Gordon, R. (2016). Information Isometry Technique Reveals Organizational Features in Developmental Cell Lineages. bioRxiv, doi:10.1101/062539

[5] Alicea, B. and Gordon, R. (2016). Quantifying Mosaic Development: Towards an Evo-Devo Postmodern Synthesis of the Evolution of Development Via Differentiation Trees of Embryos. Biology, 5(3), 33.

[6] Pan, R.K., Petersen, A.M., Pammolli, F., and Fortunato, S. (2016). The Memory of Science: Inflation, Myopia, and the Knowledge Network. arXiv, 1607.05606.

August 26, 2015

Scientific Bytes and Pieces, August 2015

Welcome to this month's version of Scientific Bytes and Pieces. The first feature is a sad note: complexity theorist John Holland has passed away at the age of 86. The father of genetic algorithms and a pioneer in the field of complex adaptive systems, Holland's contributions will live on. Here are two obituaries: one from the New York Times and another from the Washington Post (written by Holland's colleague Scott Page).

R.I.P. John Holland. COURTESY: Plexus Institute.

The SAVE/Point collaboration and Stefano Meschiari have developed an interactive game called Super Planet Crash. This "hours-of-fun"-type game simulates the gravitational dynamics of solar systems. Build your own solar system today! The virtual world physics come courtesy of algorithms designed to detect exoplanets.

Screenshot of Super Planet Crash. WARNING: it is not as easy as it looks.

Next up is a recent article from FiveThirtyEight Blog called "Science Isn't Broken". Despite the sizable body of blog posts and articles lamenting the "brokenness" of the modern scientific enterprise, it turns out that such fears are misplaced. As it turns out, science is a hard enterprise, and prone to error, unexpectedness, and revision. Since I believe that couching these realities as symptoms of dysfunction does the scientific community more harm than good, this discussion is a welcome contribution to our understanding of how science is done.

Interestingly, whenever the topic of "broken science" comes up, cognitive biases are almost never mentioned. Yet cognitive biases play an integral role in decision-making and interpretation. Even algorithms have been shown to exhibit significant social bias. Jim Davies offers us an article via Nautil.us called "Why You're Biased About Being Biased", in which he reviews the state of cognitive bias research. An accessible tour of the field as well as food for thought (and reevaluation of those thoughts).

Another reason why science is hard rather than broken is the existence of chaotic behavior. Strange and unpredictable phenomena such as transient chaos challenge the expectations and arguments of the "science is broken" crowd. Some people, such as Tamas Tel, find joy in these types of phenomena. See the recent Chaos article "The Joy of Transient Chaos" for this perspective. While not particularly accessible to a popular science audience, the article should give you a glimpse into an alternate perspective.

An artistic take on a series of hyperlinked documents. COURTESY: boingboing.net.

Despite the hard nature of the scientific enterprise, every once in a while breakthroughs are made. This year is a milestone for several of these. The word "hypertext" is 50 years old, and Einstein's publication on General Relativity is 100 years old. On a related note, Einstein's "Annus Mirabilis" was 110 years ago this year. So much for broken science.

Following-up on a previous Synthetic Daisies post about Theory Hackathons, here is an article that makes the case for hackers to support the cause of scientific data analysis. While the focus is on taming the glut of Neuroscience data, the same principle would apply to all large-scale data. Can hackers help to make sense of data and can they help us bridge the gulf between data and theory? Perhaps we will discuss this in a future post.

July 30, 2015

Theory Hackathons

The theoretical physicist/surfer Garrett Lisi has a long-range vision called the scientific hostel. A scientific hostel is a facility (in a desirable location such as Maui) where scientists can visit and do science/interact for short periods of time.

I have pursued another type of collective scientific endeavor called the theory hackathon [1]. The initial version of this idea occurred in November 2014, when Dr. Richard Gordon (part of the DevoWorm project) visited Champaign-Urbana for a few days of collaboration and discussion. The proceedings were hosted by Orthogonal Research.

In their original form, hackathons are multi-day events that bring programmers together from far-flung physical locations. The "hacking" involves solving problems in a collaborative atmosphere, with the extended period of collaboration allowing for participants to benefit from "extended cognitive flow" [2]. A theory hackathon is quite similar, except that instead of programmers solving programming puzzles, theorists work to solve scientific puzzles.



Some images of the hackathon proceedings (lecture component taken at the Champaign (IL) Public Library).

The basic outline of a theory hackathon (held over several days) involves three interrelated activities: exploration of ideas, organizational sessions, and a formal talk. The session between Richard and me was primarily to flesh out some pre-existing ideas, but this could be done on a larger scale and with a more formalized schedule.

Traditional Hackathon, with programming and programmers.

Beginnings of a theory hackathon?

As mentioned previously, our hackathon session was pretty informal. A more formal framework might include several activities:

* one-on-one or small group brainstorming sessions. This can be done using an electronic whiteboard or Python notebook to keep track of the cumulative efforts. The idea is to collectively explore a problem and develop as much of a solution as you can in a few hours.

* discussions and follow-ups on previous and outstanding projects. This is largely organizational, but including the housekeeping function as part of the theory hackathon can drive those old ideas forward in new directions. It's the "fresh eyes for an old problem" principle at work.

* semi-public lectures. Part of developing theory is working to organize concepts, references, and data in a lecture format. This part of the theory hackathon might involve developing a lecture either ad-hoc or in advance, and then deconstructing the contents in a group setting.

Theory hackathons can be organized around a specific topic (e.g. developmental biology), or around the mechanics of theory-building itself [3]. Either way, they can lead to fruitful collaborations and long-lasting ideas. If not, there will still be fledgling ideas to follow up on. While theory hackathons will undoubtedly produce many loose ends, subsequent collaborative meetings and hackathons can help advance this work further.

UPDATE (5/21/2018): If you want to develop your own Hackathon, please check out the badge series on Hackathons, hosted by the OpenWorm Foundation (on Badgelist). Begin with planning your agenda (Hackathon I), then move on to putting your plan into action (Hackathon II, Hackathon III).


NOTES:
[1] h/t Stephen Larson, for coining this phrase during one of our meetings.

[2] For more, please see: Csikszentmihalyi, M.   The Systems Model of Creativity: The Collected Works of Mihaly Csikszentmihalyi. Dordrecht, Springer (2014). 

[3] For one example of theory-building as a formal activity, please see: Weick, K.E.   Theory Construction as Disciplined Imagination. Academy of Management Review, 14(4), 516-531 (1989).

May 31, 2015

Kuhnian Practice as a Logical Reformulation

Are 01110000 01100001 01110010 01100001 [1] shifts a loss, a gain, a mismatch, or an opportunity for intellectual integration and the birth of a new field?


In the Kuhnian [2] approach to empiricism, a well-known outcome observed across the history of science is the "paradigm shift". This occurs when a landmark finding shifts our pre-existing models of a given natural phenomenon. One example of this: Darwin's finches and their evolutionary history in the Galapagos. In this case, a model system confirmed previous intuitions and overturned old facts in a short period of time (hence the idea of a scientific revolution). 

During a recent lecture by W. Ford Doolittle at the Institute for Genomic Biology, I was introduced to a term called "Kuhn loss" [3]. Kuhn loss refers to the loss of accumulated knowledge due to a conceptual shift in a certain field. One might consider this a matter of housecleaning, or a matter of throwing out the baby with the bathwater. The context of this introduction was the debate between evolutionary genomicists [4] and the ENCODE consortium over the extent and nature of junk DNA. During the talk, Ford Doolittle presented the definitions of genome function proposed by the ENCODE consortium as a paradigm shift. The deeper intellectual history of biological function would suggest that junk DNA not only exists, but would require a multidisciplinary and substantial set of results to overturn. Thus, rather than viewing the ENCODE results [5] as a paradigm shift, we can view them as a form of intellectual loss. The loss, paradigmatic or otherwise, provides us with a less satisfying and robust explanation than was previously the case.

A poster of the talk. COURTESY: IGB, University of Illinois, Urbana-Champaign

Whether or not you agree with Ford Doolittle's views of function (and I am of the opinion that you should), this introduces an interesting PoS issue. In the case of biological function, the caution is against a 'negative' Kuhn loss. But Kuhn loss (in a linear view of historical progress) usually refers to the loss of knowledge associated with folk theories or theories based on limited observational power. In some cases, these limited observations are augmented with deeper intuitive motivations. This type of intuition-guided theory usually becomes untenable given new observations and/or information about the world. Phlogiston theory [6] can be used to illustrate this type of 'positive' Kuhn loss. Popular among seventeenth- and eighteenth-century chemists, phlogiston theory predicts that the physical act of combustion releases fire-like elements called phlogistons. Phlogistons operate in a manner opposite to the role we now know oxygen serves in combustion and other chemical reactions. Another, less clear-cut example of 'positive' Kuhn loss involves a pre-relativity idea called aether theory, which holds that the aether (an all-enveloping medium) is responsible for the propagation of light through space.

In each of these cases, what was lost? Surely the conclusions that arose from a faulty premise needed to be re-examined. A new framework also swept away inadequate concepts (such as "the aether" and "phlogistons"). But there was also a deeper set of logical structures that needed to be reformulated. In phlogiston theory, the direction of causality was essentially reversed. In aether theory, we essentially have a precursor to a more sophisticated concept (spacetime). Scientific revolutions are not all equal, and so neither is the loss that results. In some cases, Kuhn losses can be recovered and contribute to the advancement of a specific theoretical framework. Midwinter and Janssen [7] introduce us to the physicist/chemist Van Vleck, who improved upon the Kuhn loss introduced when quantum theory was introduced and replaced its antecedent theory. Van Vleck did this by borrowing mathematical formalisms from the theory of susceptibilities, and bringing them over to physics. While neither a restoration nor a paradigm shift, Van Vleck was able to improve upon the ability of quantum theory to make experimental predictions.

Tongue-in-cheek description of an empirically verified version of phlogiston theory. COURTESY: [8]

Now let us revisit the Kuhnian content of the ENCODE kerfuffle vis-a-vis this framework of positive/negative Kuhn loss and Kuhn recovery. Is this conceptual clash ultimately a chance for a gain in theoretical richness and conceptual improvement? Does the tension between computational and traditional views of biological function necessitate Kuhn loss (positive or negative)? According to the standard dialectical view [9], the answer to the former would be yes. In that case, we might expect a paradigm shift that results in an improved version of the old framework (e.g. 'positive' Kuhn loss). But perhaps there is also a cultural mismatch at play here [10] that could be informative for all studies of Kuhn loss. Since these differing perspectives come from very different intellectual and methodological traditions, we could say that any Kuhn loss would be negative due to a mismatch. This is a bit different from the phlogiston example: while both approaches come from a scientific view of the world, they use different sets of assumptions to arrive at a coherent framework. What is more likely, however, is that computational approaches (as new as they are to the biological domain) will infuse themselves with older theoretical frameworks, resembling more of a Kuhnian recovery (the quantum/antecedent theory example) than a loss or gain.

It is this intellectual (and logical) reformulation that will mark the way forward in computational biology, using an integrative approach (as one might currently take for granted in biology) rather than reasoning through the biology and computation as parallel entities. While the current state of affairs involves technology-heavy computation being used to solve theoretically-challenging biological problems, better logical integration of the theory behind computational analysis and the theory behind biological investigation might greatly improve both enterprises. This might lead to new subfields such as the computation of biology, in which computation would be more than a technical appendage. Similarly, such a synthetic subfield would view biological phenomena much more richly, albeit with the same cultural biases as previous views of life. Most importantly, this does not take a revolution. It merely takes a logical reformulation, one that could be put into motion with the right model system.


NOTES:
[1] the word "paradigmatic", translated into binary. COURTESY: Ashbox Binary Translator.

[2] Kuhn, T.S.   The Structure of Scientific Revolutions. University of Chicago Press (1962).

[3] Hoyningen-Huene, P.   Reconstructing Scientific Revolutions. University of Chicago Press (1983).

[4] Doolittle, W.F.   Is junk DNA bunk? A critique of ENCODE. PNAS, 110(14), 5294-5300 (2013).

[5] The ENCODE Project Consortium   An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57-74 (2012).

[6] Vihalemm, R.   The Kuhn-loss Thesis and the Case of Phlogiston Theory. Science Studies, 13(1), 68 (2000).

[7] Midwinter, C. and Janssen, M.   Kuhn Losses Regained: Van Vleck from Spectra to Susceptibilities. arXiv, 1205.0179 [physics.hist-ph] (2012).

[8] DrKuha   The Phlogiston: Not Quite Vindicated. Spin One Half blog, May 19 (2009).

[9] what we should expect according to dialectical materialism: adherents of two ideologies struggle for dominance, with an eventual winner that is improved upon the both original ideologies. Not to be confused with the "argument to moderation".

[10] for more context (the difference between a scientific revolution and a scientific integration) please see: Alicea, B.   Does the concept of paradigm shift need a rethink? Synthetic Daisies blog, December 25 (2014).
