Synthetic Daisies: 2019

December 12, 2019

Google Season of Docs congratulations!

Congrats to Casper daCosta-Luis (and co-mentors Bradly Alicea and Chee-Wai Lee) for successfully completing the inaugural Google Season of Docs! Casper's project involved automating project documentation (using Continuous Integration) at the OpenWorm Foundation. His final project report can be found here.

Thanks to our sponsor INCF for supporting our application. Speaking of Google Seasons, applications for Google Summer of Code (GSoC) 2020 will be opening soon. Once again, I am hosting two projects: one through the DevoWorm group (OpenWorm Foundation), and the other through Orthogonal Research and Education Laboratory. More information to come.

October 30, 2019

Pre-trained Models for Developmental Biology

Authors: Bradly Alicea, Richard Gordon, Abraham Kohrmann, Jesse Parent, Vinay Varma

This content is cross-posted to The Node Developmental Biology blog.

Our virtual discussion group (DevoWormML) has been exploring a number of topics related to the use of pre-trained models in machine learning (specifically deep learning). Pre-trained models such as GPT-2 [1], pix2pix [2], and OpenPose [3] are used for analyzing many specialized types of data (linguistics, image-to-image translation, and human body features, respectively) and have a number of potential uses for the analysis of biological data in particular. It may be challenging to find large, rich, and specific datasets for training a more general model. This is often the case in the fields of Bioinformatics or Medical Image analysis. Data acquisition in such fields is often restricted due to the following factors:

* privacy restrictions inhibit public access to personal information, and may impose limits on data use.

* a lack of labels and effective metadata for describing cases, variables, and context.

* missing data points, which require a strategy to normalize and can make the input data useless.

We can use these pre-trained models to extract a general description of classes and features without requiring a prohibitive amount of training data. We estimate that the amount of required training data may be reduced by an order of magnitude. To get this advantage, pre-trained models must be suitable to the type of input data. There are a number of models specialized for language processing and general use, but options are fewer within the unique feature space of developmental biology, in particular. In this post, we will propose that developmental biology requires a specialized pre-trained model.

This vision for a developmental biology-specific pre-trained model would be specialized for image data. Whereas molecular data might be better served by existing linguistic- and physics-based models, we seek to address several features of developmental biology that might be underfit using current models:

* cell division and differentiation events.

* features demonstrating the relationship between growth and motion.

* mapping between spatial and temporal context.

Successful application of pre-trained models is contingent on our research problem. Most existing pre-trained models operate on two-dimensional data, while data types such as medical images are three-dimensional. A study by Raghu et al. [4] suggests that techniques built on pre-trained models (such as transfer learning from models trained on ImageNet) provide little benefit to performance when applied to a data set of medical images. In this case, performance can be improved using data augmentation techniques. Data augmentation, such as adding versions of the images that have undergone transformations such as magnification, translation, rotation, or shearing, can be used to add variability to our data and improve the generalizability of a given model.
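To make the idea of data augmentation concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the toy image, the function names (`rotate90`, `translate`, `augment`), and the choice of a 90-degree rotation and a one-pixel shift are hypothetical and do not come from any DevoWorm pipeline. Magnification and shearing are left out because they require interpolation.

```python
from typing import List

Image = List[List[int]]  # a 2D grayscale image stored as rows of pixel values


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def translate(img: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Shift the image by (dx, dy) pixels, padding exposed pixels with `fill`."""
    height, width = len(img), len(img[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                out[ny][nx] = img[y][x]
    return out


def augment(img: Image) -> List[Image]:
    """Return the original image plus a rotated and a translated copy."""
    return [img, rotate90(img), translate(img, dx=1, dy=0)]


# Hypothetical 2x2 "image" used only to show the transformations.
toy_image = [[1, 2],
             [3, 4]]
augmented = augment(toy_image)
```

Each transformed copy keeps the label of the original image, so a small labeled data set can be expanded without additional annotation.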

One aspect of pre-trained models we would like to keep in mind is that models are not perfect representations of the phenomenology we want to study. Models can be useful, but are often not completely accurate. A model of the embryo, for example, might be based on the mean behavior of the phenomenology. Transitional states [5], far-from-equilibrium behaviors [6], and rare events are not well-suited to such a model. By contrast, a generative model that considers many of these features might generally underfit the mean behavior. We will revisit this distinction in the context of “blobs” and “symbols”, but for now, it appears that models are expected to be both imperfect and incomplete.

The inherent imperfection of models is both good and bad news for our pursuit. On the one hand, specialized models cannot be too specific, lest they overfit some aspects of development but not others. Conversely, highly generalized models assume that there are universal features that transcend all types of systems, from physical to social, and from artificial to natural. One example of this is found in complex network models, widely used to represent everything from proteomes to brains to societies. In their general form, complex network models are not customized for specific problems, relying instead on the node and edge formalism to represent interactions between discrete units. But this also requires that the biological system be represented in a specific way to enforce the general rules of the model. For example, a neural network’s focus on connectivity requires representations of a nervous system to be simplified down to nodes and arcs. As opposed to universality, particularism is an approach that favors the particular features of a given system, and does not require an ill-suited representation of the data. Going back to the complex networks example, there are specialized models such as multi-level networks and hybrid models (dynamical systems and complex networks) that solve the problem of universal assumptions.

Another aspect of pre-trained models is in balancing the amount of training data needed to produce an improvement in performance. How much training data can we save by applying a pre-trained model to our data set? We can reformulate this question more specifically to match our specific phenomenon and research interests. To put this in concrete terms, let us consider a hypothetical set of biological images. These images can represent discrete points in developmental time, or a range of biological diversity. Now let us suppose a developmental phenotype for which we want to extract multiple features. What features might be of interest, and are those features immediately obvious?

In the DevoWorm group (where we mostly deal with embryogenetic data), we have approached this in two ways. The first is to model the embryo as a mass of cells, so that the major features of interest are the shape, size, and position of cells in an expanding and shifting whole. Last summer, we worked on applying deep learning to

* Caenorhabditis elegans embryogenesis. Github: https://github.com/devoworm/GSOC-2019.

* colonies of the diatom Bacillaria paradoxa. Github: https://github.com/devoworm/Digital-Bacillaria.

While these models were effective for discovering discrete structural units (cells, filaments), they were not as effective at directly modeling movement, currents, or transformational processes. The second way we have approached this is to model the process of cell division and differentiation as a spatial and discrete temporal process. This includes the application of representational models such as game theory [7] and cellular automata [8]. These models allow us to identify more subtle features that are not directly observable in the phenotype, but they are less useful for predicting specific events or defining a distinct feature space.
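As a rough illustration of what a spatial and discrete temporal model of division and differentiation can look like, here is a toy one-dimensional cellular automaton. It is a sketch under invented assumptions and is not the Morphozoic model from [8]: the three states, the update rule, and the names `step` and `run` are all hypothetical.

```python
from typing import List

EMPTY, STEM, DIFFERENTIATED = 0, 1, 2


def step(lattice: List[int]) -> List[int]:
    """Apply one synchronous update to a 1D lattice of cell states."""
    n = len(lattice)
    new = list(lattice)
    for i, state in enumerate(lattice):
        # Sites beyond the lattice boundary are treated as empty.
        left = lattice[i - 1] if i > 0 else EMPTY
        right = lattice[i + 1] if i < n - 1 else EMPTY
        if state == EMPTY and (left == STEM or right == STEM):
            # Division: an undifferentiated neighbor places a daughter cell here.
            new[i] = STEM
        elif state == STEM and left != EMPTY and right != EMPTY:
            # Differentiation: a cell enclosed on both sides changes fate.
            new[i] = DIFFERENTIATED
    return new


def run(lattice: List[int], steps: int) -> List[List[int]]:
    """Return the lattice at every time point, starting with the initial state."""
    history = [lattice]
    for _ in range(steps):
        history.append(step(history[-1]))
    return history


# A single founder cell in the middle of an empty lattice.
history = run([0, 0, 0, 1, 0, 0, 0], steps=3)
for generation in history:
    print(generation)
```

Even in this toy version, the differentiated core only emerges from the local rule over several time steps, which is the kind of feature that is not directly visible in a single image of the phenotype.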

Our model must be capable of modeling multiple structural features concurrently, but also sensitive to scenarios where single sets of attributes might yield more information. Ideally, we desire a training dataset that perfectly balances “biologically-typical” motion and transformations with clearly masked shapes representing cells and other phenotypic structures. Generally speaking, the greater the degree of natural variation in the training dataset, the more robust the pre-trained model will turn out to be. More robust models will generally be easier to use during the testing phase, and will reduce the need for subsequent training.

Finally, specialized pre-trained models bring up the issue of how to balance rival strategies for analyzing complex processes and data features. Conventional artificial intelligence techniques have relied on the manipulation of symbols, or on a symbolic layer that results from the transformation of raw data to a mental framework. By contrast, modern machine learning methods rely on data to build a series of relationships that inform a classificatory system. While a combination of these two strategies might seem obvious, it is by no means a simple matter of implementation [9]. The notion of “blobs” (data) versus “symbols” (representations) draws on the current debate related to data-intensive representations versus formal (innate) representations [10-12], which demonstrates the timeliness of our efforts. Balancing these competing strategies in a pre-trained model allows us to more easily bring expert knowledge or complementary data (e.g. gene expression data in an analysis of embryonic phenotypes) to bear.

We will be exploring the details of pre-trained models in future discussions and meetings of the DevoWormML group. Please feel free to join us on Wednesdays at 1pm UTC at https://tiny.cc/DevoWorm or find us on Github (https://github.com/devoworm/DW-ML) if you are interested in discussing this further. You can also view our previous discussions on the DevoWorm YouTube channel, DevoWormML playlist (https://bit.ly/2Ni7Fs2).

References:

[1] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI, https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.

[2] Isola, P., Zhu, J-Y., Zhou, T., Efros, A.A. (2017). Image-to-Image Translation with Conditional Adversarial Nets. Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Cao, Z., Hidalgo, G., Simon, T., Wei, S-E., and Sheikh, Y. (2018). OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. arXiv, 1812.08008.

[4] Raghu, M., Zhang, C., Kleinberg, J.M., and Bengio, S. (2019). Transfusion: Understanding Transfer Learning for Medical Imaging. arXiv, 1902.07208.

[5] Antolovic, V., Lenn, T., Miermont, A., Chubb, J.R. (2019). Transition state dynamics during a stochastic fate choice. Development, 146, dev173740. doi:10.1242/dev.173740.

[6] Goldenfeld, N. and Woese, C. (2011). Life is Physics: Evolution as a Collective Phenomenon Far From Equilibrium. Annual Review of Condensed Matter Physics, 2, 375-399. doi:10.1146/annurev-conmatphys-062910-140509.

[7] Stone, R., Portegys, T., Mikhailovsky, G., and Alicea, B. (2018). Origins of the Embryo: Self-organization through cybernetic regulation. Biosystems, 173, 73-82. doi:10.1016/j.biosystems.2018.08.005.

[8] Portegys, T., Pascualy, G., Gordon, R., McGrew, S., and Alicea, B. (2016). Morphozoic: cellular automata with nested neighborhoods as a metamorphic representation of morphogenesis. Chapter 3 in "Multi-Agent-Based Simulations Applied to Biological and Environmental Systems", IGI Global.

[9] Garnelo, M. and Shanahan, M. (2019). Reconciling deep learning with symbolic artificial intelligence: representing objects and relations. Current Opinion in Behavioral Sciences, 29, 17–23.

[10] Zador, A.M. (2019). A critique of pure learning and what artificial neural networks can learn from animal brains. Nature Communications, 10, 3770.

[11] Brooks, R.A. (1991). Intelligence without representation. Artificial Intelligence, 47, 139–159.

[12] Marcus, G. (2018). Innateness, AlphaZero, and Artificial Intelligence. arXiv, 1801.05667.

Resources:

* Model Zoo: pre-trained models for various platforms: https://modelzoo.co/

* DevoZoo: developmental data for model training and analysis: https://devoworm.github.io/

* Publicly available Medical Image datasets: https://medical-imaging-datasets and open-access-medical-imaging-datasets

* Popular papers on medical image segmentation along with code: https://paperswithcode.com/area/medical/medical-image-segmentation

* Microscopy specific image datasets: http://www.cellimagelibrary.org/pages/datasets and https://idr.openmicroscopy.org

October 25, 2019

OAWeek: share your own case study!

This post is part of a series published over the course of OAWeek 2019.

Do you use, share, or have an opinion about open data? The Data Reuse Initiative would like to hear from you! In honor of OAWeek 2019, we are looking for personal and research group testimonials on how you share or otherwise practice open data. Submit at your leisure (there is no deadline), but we would like to hear from you!

RULES:
* submit a testimonial (under 200 words) via a pull request to our Github repository or through this Google Form.

* if you choose to submit an image (screenshot, diagram, or cartoon), please issue a pull request on Github.

* if you cannot access either of the links, or need help with your submission, please [contact us](mailto:balicea@openworm.org).

October 24, 2019

OA Week: Digital Badges on Open Data

This post is part of a series published over the course of OAWeek 2019. Today's post will preview a series of digital badges related to Data Reuse. These badges were designed in conjunction with the new Data Reuse Initiative.

Overview of Data Reusability I.

The first digital badge (Data Reusability I) provides the learner with some practical skills in data sharing. The practical examples are mostly biology-oriented, but are useful for learners from a wide range of fields. Activities include work with a selected article from the journal Genome Biology, posting a sample data set to Figshare, and working with data sets published on the Dryad repository. While these activities provide just a taste of the work involved in sharing data, they nonetheless impart some key skills in interacting with and publishing data in an open fashion.

Overview of Data Reusability II.

The second digital badge (Data Reusability II) provides a tutorial that reviews public data sharing competencies in more depth. For this set of exercises, we have used the Mozilla Data Sharing Planning Template as a model for best-practice community standards. Earners of this badge will develop competencies in metadata creation, data cleaning/processing, documenting data set provenance, assigning credit for the published work, and enabling easy and reproducible reuse of the data set. Check them out!

October 22, 2019

OA Week: History of Open Access

This post is part of a series published over the course of OAWeek 2019.

Timeline of scientific output from [1].

This post will walk us through the History of Open Access (with a focus on Open Science) infographic mentioned in our inaugural blog post for this series. Randall Munroe [1] has previously summarized the progression of open science as a function of the scope of scientific output. The events and milestones for the featured historical overview were confirmed by internet search and synthesized from a survey of various tools and publications common in the field. This post characterizes the historical eras according to a developmental biology theme: from the embryo to developmental plasticity to an adult stage of life-history.

History of Open Access (1942-present), color-coded by historical era. Yellow: early, blue: transitional, green: contemporary. For a citable version and an alternate display type, please see [2].

1942-1999: Embryonic Ideas and Tools (early).

In the early period, there was an emergence of tools, ideas, and attempts to synthesize independent efforts. Early efforts such as the World Data System, MedLine, and Project Gutenberg served as inspiration for later efforts (particularly the development of MedLine into PubMed). Tools such as digital preprints (arXiv) and the internet (HTML, XML) served to provide the infrastructure of open science. Even tools such as Cyc (extraction of scientific rules from data) served to enable greater openness in the practice of science. The end of this era is marked by "Exploring the Development of the Independent, Electronic Scholarly Journal", a survey of open access journals in what coincides with the early internet era.

2000-2008: Institutional Plasticity (transitional).

The transitional period (or institutional plasticity) was a time for creating many of the institutions and established norms of the open science community. Many foundational ideas were either established (Creative Commons, digital object identifiers) or came to fruition (Human Genome Project) during this period. It is also of note that at least four declarations of practice were published during this period.

2009-present: A Juvenile No More! (contemporary).

The contemporary period has been defined by even more sophisticated tools (Altmetrics), quasi-historical summaries of past work for future development (Reinventing Discovery, The Future of OA), and the discussion of institutional standards at a greater level of specialization (FAIR Principles). This era is also marked by the use of open science to practice collaborative open science (Polymath Project), putting all of the pieces developed in previous eras into place.

NOTES:
[1] Munroe, R. (2013). The Rise of Open Access. Science, 342(6154), 58-59. doi:10.1126/science.342.6154.58

[2] Alicea, B. (2019). History of Open Access Infographic. Figshare, doi:10.6084/m9.figshare.9975713

October 21, 2019

Open Access Week 2019: Introduction

Welcome to OAWeek 2019! This year's features are being published in conjunction with the Orthogonal Research and Education Laboratory, the eLife Ambassadors program, and the associated Data Reuse Initiative.

The first feature for this year is an infographic called the History of Open Access [1]. Our history begins in 1942 with some Philosophy of Science [2], and proceeds through key innovations, publications, and institutions that span the late 20th and early 21st centuries. Below is a preview of the infographic, which will be discussed in more detail on Tuesday the 22nd.

History of Open Access infographic (Omega version).

The second feature is a series of digital badges (microcredentials) on Open Data practice [3]. The first badge in the series walks the learner through several lessons on how to identify, locate, and work with open datasets. The second badge walks the learner through preparing an open data set for publication. This lesson is based on the Mozilla Data Reuse Planning Template, which helps people adhere to best practices when making data public and shareable. These badges will be released on Thursday the 24th. Then, on Friday the 25th, we will give you the chance to make your own contributions (details to come). So join us for our week of celebrating Open Access!

NOTES:
[1] Figshare, doi:10.6084/m9.figshare.9975713

[2] Robert Merton, The Sociology of Science: theoretical and empirical investigations.

[3] Molloy, J.C. (2011). The Open Knowledge Foundation: Open Data Means Better Science. PLoS Biology, 9(12), e1001195.

September 23, 2019

Summer of Productivity at OREL

An update on what we did with our Summer at the Orthogonal Research and Education Laboratory. We hosted a study group based on an emerging collaboration to understand how Braitenberg Vehicles can be used as a model to study neural development [1].

We hosted one Google Summer of Code student (Stefan Dvoretskii), and mentored another GSoC student through the OpenWorm Foundation (Vinay Varma). Thanks go to INCF as well for their support. We also hosted the activities of several mentees (Ziyi Gong, Jesse Parent, Ankit Gupta, and Hrishikesh Kulkarni), whose work is topically diverse and will be featured in future blog posts [2]. Below are three tweets from the OREL Twitter account that show highlights from some of this work (congrats again to Stefan Dvoretskii, Vinay Varma, Jesse Parent, and Hrishikesh Kulkarni for their completed milestones).

NOTES:

[1] our project repository is located here.

[2] two additional students were hosted through the DevoWorm group: Ujjwal Singh and Asmit Singh, who worked on a project called Digital Bacillaria.

September 2, 2019

Introducing: DevoWormML

This has been cross-posted to The Node blog.

I am pleased to announce a new collaborative interest initiative called DevoWormML, based on work being done in the DevoWorm group. DevoWormML will meet on a weekly basis, and explore the application of machine learning and artificial intelligence to problems in developmental biology. These applications can be geared towards the analysis of imaging data, gaining a better understanding of thought experiments, or anything else relevant to the community.

While "ML" stands for machine learning, participation can include various types of intelligent systems approaches. Our goal is to stimulate interest in new techniques, discover new research domains, and establish new collaborations. Guests are welcome to attend, so if you know an interested colleague, feel free to direct them our way.

Meetings will be Wednesdays at 1pm UTC on Google Meet. Discussions will also take place on the #devowormml channel of OpenWorm Slack (request an invitation). We will discuss organizational details at our first meeting on September 4. If you cannot make this time but are still interested in participating, please contact me. Hope to see you there!

July 19, 2019

50th Anniversary of the Moon Landing

For the 50th Anniversary of the moon landing, here is an infographic of the mission flight path, courtesy of the Smithsonian National Air and Space Museum (above). Then explore all Apollo mission landing sites courtesy of Google Moon (below).

June 18, 2019

Twenty years ago in complexity.....

It's always a 20th Anniversary somewhere, but these two are particularly notable for the complexity community:

1) Swarm Intelligence

2) Scale-free Networks

May 31, 2019

Summer of Working Groups

I am happy to announce that this Summer I will be advising/mentoring two research groups of Google Summer of Code (GSoC) students and applicants. Thanks to INCF for sponsoring our applications and applicants again this year. The first group (DevoWorm), based in the OpenWorm Foundation, is interested in image segmentation and Machine Learning. GSoC applicants Asmit Singh and Ujjwal Singh (currently attending IIT Delhi) are working on extracting quantitative data to construct a digital model for organisms in the diatom genus Bacillaria. The GSoC student (Vinay Varma, currently attending Amrita Vishwa Vidyapeetham University) is working on developing a method for Semantic Segmentation based on microscopy of the embryogenetic process, focusing on the biology of Caenorhabditis elegans.

The other group is based in the Orthogonal Research and Education Laboratory, and is focused on creating a methodology for developmental Braitenberg Vehicles. This involves simulating the formation of a brain (neurons and connectome) in a simple body that continuously interacts with its environment. The GSoC student in this group (Stefan Dvoretskii) is developing such a model using Genetic Algorithms and the open-source SimBrain platform. The GSoC applicants (Ziyi Gong, Jesse Parent, and Ankit Gupta) are working on a variety of unique approaches that will aid in our understanding of this complex system. These alternative approaches range from biologically-inspired (Ziyi) to a cybernetic architecture based on the Every Good Regulator Theorem (Jesse).

The hybrid education/research working group is something I started with last year's Orthogonal Lab GSoC group. There is a good chance that this Summer's discussions and work periods will produce awesome, cutting-edge science. Follow us on Github (DW group, BV group) and YouTube (DW group, BV group) for more!

April 27, 2019

Open Leaders 7 is almost finished. Join us for a Sprint!

In a previous post, I mentioned that I was a part of Mozilla's Open Leaders program, which is dedicated to working open and improving the overall health of the internet. Each Open Leader worked on an open project over the duration of the 14-week program. In my case, the project is an open educational curriculum for the DevoWorm group and the OpenWorm Foundation more generally.

Purpose and Outcomes from Mozilla's POP (purpose, outcomes, process) standards for project management.

The purpose of building this curriculum is twofold: to encourage contributions to the organization, and to create educational opportunities that enrich people's contributions. Therefore, our curriculum combines topical tutorials with course materials focused on interdisciplinary topics (bridging data science, computer science, and biology) and training in niche topical areas. The curriculum has a front-end (managed at Eliademy) and a back-end (managed in a Github repository). You can make contributions to the front-end by either enrolling in the course or making notes in hypothes.is.

Examples of the front-end, back-end, and course content.

Another way to contribute is to attend the Sprint for Internet Health. Last year this was referred to as the Global Sprint, and it gives people an opportunity to engage in open source creation as well as consumption. Our project already has three contributors (Asmit Singh, Vinay Varma, and Ujjwal Singh, who are also Google Summer of Code applicants to the organization) who have been refining the materials by way of pull requests. More details to come!

Details on how to get involved.

UPDATE (5/25): An interview with project lead Bradly Alicea (by Robert Schafer) is now available on the Mozilla Open Leaders Medium blog.

April 4, 2019

April is Documentation Month!

This content is cross-posted from OpenWorm Foundation blog.

Based on popular demand, April will be “Docs Month” within the OpenWorm Foundation. The month will be filled with reevaluating our organizational docs, in addition to putting together a proposal (in conjunction with INCF) for the inaugural Google Season of Docs.

If you have a technical writing background, we can work on a community contribution, which can include (but is not limited to) the following current organizational needs:

1) building an OpenWorm Wiki, which might reference special topics within the subprojects.

2) website redesign so as to make it easy to find key information (e.g. people, how to get to Slack, Github, Docker and Blender models).

3) help refine a reference document started by Senior Contributor Gopal Sarma.

4) an inventory of projects and docs that align with current organizational goals.

5) a reorganization of all project docs, mainly merging DevoWorm, NeuroML, Geppetto, and other assorted docs into one location.

6) help with the educational curriculum.

7) find and write specs for “casual contributor-friendly” contributions.

Let's get working!

UPDATE (4/26): We submitted an application by the deadline that incorporates many of these elements. If you are interested in learning more, please join our Slack team and join the #documentation channel, or contact me via e-mail.

March 6, 2019

March is PyOpenWorm month!

This content is cross-posted from OpenWorm Foundation blog.

PyOpenWorm is a data-access and data sharing library for the OpenWorm project. It is designed to provide the project with a coordinated means for generically describing data in a variety of formats, tracking the origins of said data, and describing various transformations between formats.

This month should see a release for PyOpenWorm and further integration with c302, the neural network model generation tool set. We’re always happy to welcome contributors in this effort! There are, however, a number of other areas where you can help on PyOpenWorm: see the beginner label on the PyOpenWorm issue tracker for a listing of issues to start with, or else get in touch with Mark Watts (github:mwatts15) for suggestions.

Also this summer, OpenWorm is again participating in Google Summer of Code (GSoC) under the banner of the INCF. My personal favorite project idea focuses on file sharing (model files, experimental data, etc.) within PyOpenWorm. The file sharing component will have a major role in PyOpenWorm going forward, so this is a great project to get involved with now. You can see the list of all OpenWorm projects for 2019 here. Although applications aren’t open until 25 March, you can get in touch with the GSoC mentors and learn more about the problems they’re trying to solve today – so don’t be shy! You can learn more about GSoC in general on their website.

This project of the month kicks off with an “office hours” session on the OpenWorm Slack chat this Wednesday, 6 March at 17:00 UTC where several senior contributors (including me) simultaneously make themselves available for Q&A, discussion, and to talk about current and upcoming activities in the project. You can join by filling out this (short) form. This is also the best starting place if you’re interested in contributing to OpenWorm more broadly since it will let us know what you’re interested in and what skills you have now so we can help you find which area best fits your interests.

Thanks for reading! Looking forward to hearing from you :-)

Find us on Twitter: @OpenWorm

Link to the OW Contributor form is here

March 1, 2019

Open Data Day 2019: DevoZoo is Live!

Welcome to Open Data Day 2019, sponsored by Orthogonal Research and Education Laboratory! This year we are introducing an open data repository called DevoZoo. Based on work going on in the DevoWorm group, DevoZoo includes various primary, secondary, and tertiary datasets from a variety of developing organisms (all characterizing embryogenesis). There are several other tabs on the site, including references on C. elegans developmental biology (the embryo image), a collection of Jupyter notebooks (DevoNotes), methods developed within the DevoWorm group (DevoMethods), and an educational curriculum (DevoWormU).

DevoZoo is based on the "Models for Data Recycling" mini-grant submitted to (but not funded by) Mozilla Science a few months ago. If you want to understand the greater vision for this initiative, please read over the grant proposal document. Feel free to contribute by playing with the datasets, or by proposing new resources by forking our Github repository and submitting a pull request.

February 16, 2019

Darwin meets Category Theory in the Tangential Space

For this Darwin Day (February 12), I would like to highlight the relationship between evolution by natural selection and something called category theory. While this post will be rather tangential to Darwin's work itself, it should be good food for thought with respect to evolutionary research. As we will see, category theory also has relevance to many types of functional and temporal systems (including those shaped by natural selection) [1], which is key to understanding how natural selection shapes individual phenotypes and populations more generally.

This isn't the last you'll hear from me in this post!

Category Theory originated in the applied mathematics community, particularly the "General Theory of Natural Equivalence" [2]. In many ways, category theory is familiar to those with conceptual knowledge of set theory. Uniquely, category theory deals with the classification of objects and their transformations between mappings. However, category theory is far more powerful than set theory, and serves as a bridge to formal logic, systems theory, and classification.

A category is defined by two basic components: objects and morphisms. Examples of objects include a collection of interrelated variables or discrete states. Morphisms link objects together, either structurally or functionally. Together, these provide a network of paths between objects that can be analyzed using categorical logic. We can then define a composition (or path) by tracing through the set of objects and morphisms (so-called diagram chasing) to find a solution.
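The idea of objects, morphisms, and composition can be sketched in a few lines of Python. This is a minimal illustration, not a formal treatment: the object names (A through D), the morphism names (f, g, h), and the helper functions `compose`, `chase`, and `compose_path` are all hypothetical and chosen only for this example. Here diagram chasing is approximated by a breadth-first search for a chain of composable morphisms.

```python
from collections import deque
from typing import NamedTuple, Optional


class Morphism(NamedTuple):
    name: str
    source: str
    target: str


# Hypothetical toy category: four objects (discrete states) and three morphisms.
MORPHISMS = [
    Morphism("f", "A", "B"),
    Morphism("g", "B", "C"),
    Morphism("h", "C", "D"),
]


def identity(obj: str) -> Morphism:
    """Every object has an identity morphism from itself to itself."""
    return Morphism("id_" + obj, obj, obj)


def compose(f: Morphism, g: Morphism) -> Morphism:
    """Return 'g after f'; only defined when f's target is g's source."""
    if f.target != g.source:
        raise ValueError(f"cannot compose {g.name} after {f.name}")
    return Morphism(f"{g.name} . {f.name}", f.source, g.target)


def chase(morphisms, start: str, goal: str) -> Optional[list]:
    """Breadth-first search for a chain of composable morphisms."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        obj, path = queue.popleft()
        if obj == goal:
            return path
        for m in morphisms:
            if m.source == obj and m.target not in seen:
                seen.add(m.target)
                queue.append((m.target, path + [m]))
    return None  # no path between the two objects


def compose_path(path, start: str) -> Morphism:
    """Fold a chain of morphisms into a single composite morphism."""
    if not path:
        return identity(start)
    result = path[0]
    for m in path[1:]:
        result = compose(result, m)
    return result


path = chase(MORPHISMS, "A", "D")
composite = compose_path(path, "A")
print(composite.name, ":", composite.source, "->", composite.target)
```

Representing each morphism as a (name, source, target) record keeps the composability condition explicit: two morphisms compose only when the target of the first matches the source of the second.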

In this example, a pie recipe is represented as a category with objects (action steps) and morphisms (ingredients and results). This monoidal preorder can be added to as the recipe changes. From [3]. Click to enlarge.

Categories can also consist of classes: classes of objects might include all objects in the category, while classes of morphisms include all relational information such as pathways and mappings. Groupoids are functional descriptions, and allow us to represent generalizations of group actions and equivalence relations. These modeling-friendly descriptions of a discrete dynamic system are quite similar to object-oriented programming (OOP) [4]. One biologically-oriented application of category theory can be found in the work of Robert Rosen, particularly topics such as relational biology and anticipatory systems.

Animal taxonomy according to category theory. This example focuses on exploring existing classifications, from species to kingdom. The formation of a tree from a single set of objects and morphisms is called a preorder. From [3]. Click to enlarge.
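A preorder is a relation that is reflexive and transitive, and a taxonomy can be read as one: x is "below" y whenever x is y, or a chain of parent links leads from x to y. The sketch below is an illustration under stated assumptions rather than the construction used in [3]. The `PARENT` table is a deliberately abbreviated lineage (species, genus, phylum, kingdom, skipping intermediate ranks), and the function name `leq` is hypothetical.

```python
from itertools import product

# Abbreviated child -> parent links (intermediate ranks omitted for brevity).
PARENT = {
    "Caenorhabditis elegans": "Caenorhabditis",
    "Caenorhabditis": "Nematoda",
    "Nematoda": "Animalia",
    "Drosophila melanogaster": "Drosophila",
    "Drosophila": "Arthropoda",
    "Arthropoda": "Animalia",
}


def leq(x: str, y: str, parent=PARENT) -> bool:
    """True if x is y, or if following parent links from x reaches y."""
    while True:
        if x == y:
            return True
        if x not in parent:
            return False
        x = parent[x]


# All taxa mentioned in the table, as either a child or a parent.
TAXA = set(PARENT) | set(PARENT.values())

# Reflexivity: every taxon is below itself.
reflexive = all(leq(t, t) for t in TAXA)

# Transitivity: if a <= b and b <= c, then a <= c.
transitive = all(
    leq(a, c)
    for a, b, c in product(TAXA, repeat=3)
    if leq(a, b) and leq(b, c)
)

print(reflexive, transitive)
print(leq("Caenorhabditis elegans", "Animalia"))
print(leq("Caenorhabditis elegans", "Arthropoda"))
```

Because each taxon has at most one parent here, the relation forms a tree; a general preorder does not require that restriction.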

One potential application of this theory to evolution by natural selection is to establish an alternate view of phylogenetic relationships. By combining category theory with feature selection techniques, it may be possible to detect natural classes that correspond to common ancestry. Related to the discovery of evolutionarily salient features is the problem of phylogenetic scale [5], or hard-to-interpret changes occurring over multiple evolutionary timescales. Category theory might allow us to clarify these trends, particularly as they relate to evolving life embedded in ecosystems [6] or shaped by autopoiesis [7].

More relevant to physiological systems that are shaped by evolution are gene regulatory networks (GRNs). While GRNs can be characterized without the use of category theory, they also present an opportunity to produce an evolutionarily-relevant heteromorphic mapping [8]. While a single GRN structure can have multiple types of outputs, multiple GRN structures can also give rise to the same or similar output [8, 9]. As with previous examples, category theory might help us characterize these otherwise super-complex phenomena (and "wicked" problems) into well-composed systems-level representations.

NOTES:
[1] Spivak, D.I. (2014). Category theory for the sciences. MIT Press, Cambridge, MA.

[2] Eilenberg, S. and MacLane, S. (1945). General theory of natural equivalences. Transactions of the American Mathematical Society, 58, 231-294. doi:10.1090/S0002-9947-1945-0013131-6

[3] Fong, B. and Spivak, D.I. (2018). Seven Sketches in Compositionality: an invitation to applied category theory. arXiv, 1803.05316.

[4] Stepanov, A. and McJones, P. (2009). Elements of Programming. Addison-Wesley Professional.

[5] Graham, C.H., Storch, D., and Machac, A. (2018). Phylogenetic scale in ecology and evolution. Global Ecology and Biogeography, doi:10.1111/geb.12686.

[6] Kalmykov, V.L. (2012). Generalized Theory of Life. Nature Precedings, 10101/npre.2012.7108.1.

[7] Letelier, J.C., Marin, G., and Mpodozis, J. (2003). Autopoietic and (M,R) systems. Journal of Theoretical Biology, 222(2), 261-272. doi:10.1016/S0022-5193(03)00034-1.

[8] Payne, J.L. and Wagner, A. (2013). Constraint and contingency in multifunctional gene regulatory circuits. PLoS Computational Biology, 9(6), e1003071. doi:10.1371/journal.pcbi.1003071.

[9] Ahnert, S.E. and Fink, T.M.A. (2016). Form and function in gene regulatory networks: the structure of network motifs determines fundamental properties of their dynamical state space. Journal of the Royal Society Interface, 13(120), 20160179. doi:10.1098/rsif.2016.0179.

February 6, 2019

Mozilla Open Leaders 7

I have been selected into the Mozilla Open Leaders program as a cohort member. This is the seventh edition of the program (hence the abbreviation OL7), and runs for 14 weeks. During this time, I will learn how to better leverage digital tools for building online communities. I am working within the OpenWorm Foundation to apply these principles to a specific context.

An example of "open branding" (iterative logo design as a series of blog posts).

The hashtag #WOLO also serves as the guiding principle of Open Leaders: work open, lead open. Open Leaders come from all walks of life, and all sorts of digital organizations. I am in cohort A, which is part of the project track. Together, we will work to improve the "health" of the internet.

January 1, 2019

January is DevoWorm month!

Blossoms or fireworks to ring in the New Year?

Welcome to 2019! And welcome to OpenWorm Foundation's project of the month for January, featuring DevoWorm. Here I will briefly go over progress in the DevoWorm group over the last year and a half. If you would like to know more, we have a group Slack channel (#devoworm) in the OpenWorm team, a group website, and a GitHub repository.

For the uninitiated, the DevoWorm group has a multifaceted set of interests. We are interested in simulating and analyzing data related to worm development, but have an interest in the development of other model organisms as well. In terms of results, we have focused mostly on publications and open datasets, but as you will see from the website, we have also been involved in the creation of unique demos and software development.

The DevoWorm group is also interested in education. Our educational efforts have largely spread out over four types of pedagogy: digital badges, tutorials via interactive notebooks, public lectures, and one-on-one mentorship through the Google Summer of Code (GSoC) program. The OpenWorm Foundation has hosted a DevoWorm GSoC student for the past two years (2017 and 2018), and will be offering a third opportunity this year (2019).

This is the 15th anniversary for the GSoC program, and it is always an excellent experience. The application process begins on February 25th. If you are interested in a mixture of computational biology, image processing, and machine learning, please contact us for more information.

COURTESY: Image from "One, Two, Three,....GSoC!" by Vipal Gupta

While GSoC is a well-compensated opportunity to participate in DevoWorm, there are also less formal ways to collaborate. One of these is through a conventional research pathway such as analyzing data, building a simulation, or curating a dataset. Another way to collaborate is to help create new types of educational content. We are particularly interested in creating virtual reality-based offerings in the near future. If you enjoy creating educational content, or simply enjoy learning, please get in touch!

Another new initiative is called DevoZoo. The DevoZoo site aggregates open datasets, methods, and techniques relevant to computational developmental biology and data science biology. We currently host open datasets for the following model organisms: C. elegans, Drosophila, Zebrafish, Ascidians, and Mouse. DevoZoo also hosts raw microscopy data in the form of movies for many of these model organisms as well as Spiders. As if this were not enough, we also try to engage learners and open scientists with artificial life models. The DevoZoo presents three: Morphozoans, developmental Braitenberg Vehicles, and Multicell Systems. The artificial life models in particular could use some further development. Check out the DevoZoo webpage or ask us if you would like to learn more.

Finally, you can participate by collaborating on a publication. The DevoWorm group has been featured in four publications in the past year. The OpenWorm article in the "Connectome to Behavior" special issue of Royal Society B provides a succinct description of the project and its current course. Some of our members served as editors and contributors to a special issue of BioSystems in honor of Dr. Lev Beloussov. This issue features 32 articles that provide a very broad and innovative look at the topic of morphogenesis. Our set of contributions (peer-reviewed papers) spanned from network models of the embryo to the developmental emergence of the connectome and quantitative approaches to organogenesis in the eye imaginal disc.

If you are interested in joining in on the discussion, we hold group meetings online every Monday at 9pm UTC. We are also starting to host hackathons on Fridays during the late morning/early afternoon North American time. Check out our scheduling page for more information. Hope to encounter you soon, and have a great month!