February 12, 2025

A review of carcinization: from the biology to computational models

COURTESY: 10 Reasons to Celebrate Darwin Day, Paleontology World, February 14, 2018.

For Darwin Day 2025, I will talk about a convergent evolutionary phenomenon called carcinization. Carcinization is the convergent evolution of a crab phenotype. Crablike body plans (defined by a flat, rounded shell and a tail that is folded underneath the body) evolved independently at least five times over the course of Decapod evolution (Hamers, 2023; Wolfe et al., 2021). This is an example of convergent evolution, where similar phenotypes (and by extension functional evolution) recur in different lineages with ostensibly different underlying molecular mechanisms.

From a phylogenetic perspective, carcinization (acquisition of a crablike body plan) and decarcinization (loss of a crablike body plan) are ubiquitous across marine invertebrates. Ecological selection for such a body type has led to phenotypic integration of multiple traits, particularly the carapace shape and abdomen (Wolfe et al., 2021). Figure 1 shows the phylogenetic origins of carcinization in the Brachyura and Anomura clades (infraorders).
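One common way to count gains and losses of a trait on a phylogeny is parsimony. The following is a minimal sketch of Fitch parsimony on a toy tree. The taxon names, the tree shape, and the `fitch` helper are all hypothetical and are not taken from any of the cited studies.

```python
# Toy rooted binary tree as nested tuples. A leaf is a (name, state) pair of
# strings; an internal node is a (left, right) pair of subtrees.
# Taxa and states are hypothetical, chosen only for illustration.
tree = (
    (("taxon_A", "crab"), ("taxon_B", "non-crab")),
    ((("taxon_C", "non-crab"), ("taxon_D", "crab")), ("taxon_E", "non-crab")),
)


def fitch(node):
    """Return (candidate ancestral states, minimum number of state changes)."""
    if isinstance(node[1], str):  # leaf: (name, state)
        return {node[1]}, 0
    left_states, left_changes = fitch(node[0])
    right_states, right_changes = fitch(node[1])
    changes = left_changes + right_changes
    shared = left_states & right_states
    if shared:
        # The two subtrees agree on at least one state: no change needed here.
        return shared, changes
    # No agreement: at least one gain or loss must occur at this node.
    return left_states | right_states, changes + 1


root_states, min_changes = fitch(tree)
print(root_states, min_changes)
```

On this toy tree the most parsimonious reconstruction has a non-crab ancestor and two state changes, which is the pattern expected when the crab-like state arises independently in two lineages.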

Figure 1. A phylogeny showing a variety of crab phenotypes (left), and an illustration of transformation to different crab phenotypes (right). Left image: "How Does a Crustacean Become a Crab", Phys.org. Right image: The "hermit to king" transition within the infraorder Anomura (Tsang et al., 2011).

The most interesting attributes of the carcinized body plan are: 1) multiple paths to a basic phenotype (shape), in which many alternate genotypes result in a self-similar phenotype, 2) phenotypic elaboration not due to common ancestry, and 3) a generalized form of common ancestry not at the level of traits. There are varying definitions of carcinization (or brachyurization, see Footnote 1) across genera and orders. The strict definition of McLaughlin and Lemaitre (1997) is a reduction and folding of the abdomen beneath the thorax, or the evolution of a crab-like appearance.


We can use molecular methods to discover the deep evolutionary relationships between various instances of crab-like phenotypes. Using a mitochondrial phylogeny based on genomic rearrangements of an arthropod protein-coding gene, Morrison et al. (2002) suggest that once they appear, independently evolved crab-like forms may be irreversible. Another study by Wolfe et al. (2019) utilizes nuclear genes and the Anchored Hybrid Enrichment (AHE) method to confirm monophyletic (single origin) relationships between all infraorders of the clade Decapoda. They also demonstrate that monophyletic "lobster" and "crab" groups exist. In terms of developmental origins, carcinization may involve Brachyury (a T-box gene), which has been studied in great detail for its pivotal role in the development of the notochord and posterior mesoderm (Papaioannou, 2014; see also Footnote 1). Carcinization results from several transcriptional mechanisms related to physiology and phenotype, including energy metabolism-related pathways, ventral nerve cord fusion and associated apoptosis, metamorphosis, and abdominal-specific Hox genes (Yang et al., 2021).


The evolution and development of the crab-like body plan can be characterized computationally in order to expand our understanding of convergent evolution in the evolution of development. There are a number of means to build a computational model of this process. Ostachuk (2021) proposes a network-based topological model of crab metamorphic development. In this model, the stages of brachyuran metamorphosis are modeled as a series of complex networks. Figure 2 shows this process of defining morphological unit centroids as network nodes, and topological transformations between morphological units as the network edges. A topological overlap analysis was conducted to demonstrate changes in phenotypic complexity. Traditional measures of complexity, such as modularity and hierarchical organization, increase across the course of development. This corresponds to what Ostachuk (2021) defines as a transition from intensive to extensive complexity.
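To make the idea of topological overlap concrete, here is a minimal sketch using one widely used definition from the network literature: shared neighbors plus the direct link, divided by the smaller degree plus one minus the direct link. This is an assumption for illustration and is not necessarily the measure used by Ostachuk (2021). The morphological unit names and edges are hypothetical.

```python
# Hypothetical network of morphological units (nodes) and the connections
# between them (edges), stored as an adjacency dictionary of sets.
adjacency = {
    "carapace": {"pleon", "cheliped", "eye"},
    "pleon": {"carapace", "cheliped"},
    "cheliped": {"carapace", "pleon"},
    "eye": {"carapace"},
}


def topological_overlap(adjacency):
    """Return the topological overlap for every unordered pair of nodes."""
    nodes = sorted(adjacency)
    overlap = {}
    for index, i in enumerate(nodes):
        for j in nodes[index + 1:]:
            a_ij = 1 if j in adjacency[i] else 0           # direct link
            shared = len(adjacency[i] & adjacency[j])       # common neighbors
            k_min = min(len(adjacency[i]), len(adjacency[j]))
            overlap[(i, j)] = (shared + a_ij) / (k_min + 1 - a_ij)
    return overlap


overlap = topological_overlap(adjacency)
for pair, value in sorted(overlap.items()):
    print(pair, round(value, 2))
```

Pairs of units that are linked and share all of their other neighbors score 1.0, while unlinked units that share only some neighbors score lower. Comparing these values across developmental stages is one way to track how modular the network becomes.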


Figure 2. A network of morphological units derived from a crab-like phenotype. From Figure 1B, Ostachuk, 2021.


Figure 3. A comparison of developmental network topologies from egg to crab phenotype. From Figure 3 in Ostachuk, 2021.


Ostachuk (2021) uses morphological networks rather than gene regulatory networks (GRNs) because it is difficult to map network outputs to topological transformations of the phenotype. Yet one benefit of using genomic representations is that they allow for a further representation of canalized morphogenesis. This is consistent with Waddington's notion of reduced sensitivity to genetic or environmental perturbations (Agam and Braun, 2025) and is amenable to understanding via an epigenetic landscape model. The epigenetic landscape model (Wang et al., 2011) in particular is useful for modeling the evo-devo of carcinization. According to our evolutionary examples, we should expect the landscape to converge during development. A prediction can be made that most stable points in the epigenetic landscape favor a path towards crab-like phenotypes. Molecular mechanisms such as Hsp90 can provide a mechanism for phenotypic divergence in cavefish. Yet phenotypic buffering mechanisms can also work in the other direction: multiple configurations of genomic loci converge to the same phenotype (Kovuri et al., 2023). This phenotype tends to become irreversible as other options are no longer developmentally viable. Indeed, evolutionary irreversibility can be represented as a bifurcation in the landscape: either a saddle-node bifurcation, where a stable state disappears, or a pitchfork bifurcation, where two developmental pathways diverge (Ferrell, 2012).
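The pitchfork case can be sketched with a one-dimensional landscape U(x) = -r x^2 / 2 + x^4 / 4, so that the state moves downhill as dx/dt = r x - x^3. This is the textbook normal form rather than a model of any crustacean. The function name `settle`, the control parameter `r`, and the step sizes are assumptions for illustration.

```python
def settle(x0, r, dt=0.01, steps=5000):
    """Follow dx/dt = r*x - x**3 from x0 with forward Euler steps.

    For r <= 0 the landscape has one valley (x = 0); for r > 0 it has two
    valleys (x = +sqrt(r) and x = -sqrt(r)) separated by a ridge at x = 0.
    """
    x = x0
    for _ in range(steps):
        x += dt * (r * x - x ** 3)
    return x


starts = (-2.0, -0.5, 0.5, 2.0)

# Before the bifurcation: every starting point converges to the same state.
print([round(settle(x0, r=-1.0), 3) for x0 in starts])

# After the bifurcation: the starting side determines which valley is reached.
print([round(settle(x0, r=1.0), 3) for x0 in starts])
```

In the single-valley regime, different initial conditions are buffered into one outcome, which is the landscape analogue of convergence. In the two-valley regime, a trajectory that has settled into one valley does not cross the ridge without a large perturbation.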

Carcinization can also be summarized in the form of a computational genotype-phenotype map (Figure 4). On such a map, we can approximate convergent evolution as multiple genotypic representations that converge to a single phenotype. Genotype-phenotype maps also allow convergent evolution to be viewed as a study in self-similarity. In the complexity literature, self-similarity describes a complex system that has the same statistical properties across multiple orders of magnitude (Magnusson, 2023). Figure 4 demonstrates three different types of genotype-phenotype maps: GRNs to phenotypic modules (Figure 4A), a correspondence map that maps between domains (Figure 4B), and a genotypic representation that maps to a phenotypic representation (Figure 4C). In Figure 4A, the outputs of three GRNs (G1, G2, G3) are mapped to four phenotypic modules (P1, P2, P3, P4). Each GRN can map its outputs to multiple phenotypic modules, which are components of the phenotype that we observe in the crab-like body plan. This allows us to estimate the contribution of each GRN to each phenotypic module (Wagner et al., 2007; see also Footnote 2). Figure 4B provides a means to understand the space of genotypic variation and how this corresponds to the space of all possible phenotypic configurations. While the example we give is not specific to crab-like body plans, in such a case a wide variety of GRN activities (Wg) will correspond to a constrained set of locations in the phenotypic map (Wp). This allows us to apply more sophisticated computational biology models such as joint manifolds (Munteanu and Sole, 2008). Finally, Figure 4C demonstrates the concept of phenotypic redundancy (Ahnert, 2017), which is a common feature of genotype-phenotype maps. Phenotypic redundancy can also help us understand how multiple genotypic representations can converge upon a self-similar phenotype.
Our genotypic representation contains a simple chromosome with multiple loci, which in Figure 4C are recombined and mutated across our three examples. Our phenotype is a 2-D layer of black and white cells, which results from the expression of the genotypic representation. Based on an application of theory, the crab-like body plan can be said to exhibit robustness against gene duplication, mutation, and recombination events.
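Phenotypic redundancy in the spirit of Figure 4C can be sketched by enumerating a small genotype space and counting how many genotypes produce each phenotype. The expression rule below (each cell is black if either of two duplicated loci is on) is a hypothetical choice made for illustration and is not the rule used to draw the figure.

```python
from collections import Counter
from itertools import product

N_CELLS = 4  # a 2 x 2 layer of cells, flattened to a tuple of length 4


def express(genotype):
    """Map a chromosome of 2 * N_CELLS binary loci to black (1) / white (0) cells.

    Hypothetical rule: each cell is controlled by two duplicated loci and is
    black if either copy is on, so a mutation in one copy can be silent.
    """
    return tuple(genotype[2 * i] | genotype[2 * i + 1] for i in range(N_CELLS))


# Enumerate every genotype and tally the phenotype it produces.
redundancy = Counter(
    express(genotype) for genotype in product((0, 1), repeat=2 * N_CELLS)
)

for phenotype, count in redundancy.most_common(3):
    print(phenotype, count)
```

Under this rule the all-black phenotype is reached by 81 of the 256 genotypes while the all-white phenotype is reached by only one, so redundancy is distributed very unevenly across phenotypes. Highly redundant phenotypes are the ones that many independent genotypic routes can converge upon.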



Figure 4. Genotype-Phenotype map components. A) discrete model of genomic elements (containing a GRN) with their outputs mapping to different phenotypic modules, B) correspondence map showing how genotypic elements in domain Wg map to phenotypic elements in domain Wp, C) genotypic representations that converge upon a single phenotypic representation.

Let us conclude with two items for further study. Wolfe et al. (2021) ask: can you predict a phenotype from ecology or genomics? In the case of carcinization, we observe repeated gain and loss of the body plan, which reflects the polyphyletic nature of the crab phenotype. We might also be able to predict crab-like phenotypes from the results of computational models. The developmental network and epigenetic landscape approaches are particularly promising in this regard. Might carcinization be a form of developmental buffering as predicted by the epigenetic landscape model? Patterson and Klingenberg (2007) suggest that phenotypic buffering is triggered by Hsp90 activities in flies, fish, and plants. Genotype-phenotype maps are indeed possible but require a complete characterization of the genetic diversity underlying the multitude of examples of crab-like phenotypes found in nature.


Footnotes:
1. Brachyurization is perhaps related to brachyury, which is involved in the epithelial-mesenchymal transition in development. This process is reviewed in Huang et al. (2022) and Haerinck et al. (2023).

2. Further discussion of genotype-phenotype mapping using network approaches is presented in Kim and Przytycka (2013).


References:

Ahnert, S.E. (2017). Structural properties of genotype–phenotype maps. Royal Society Interface, 14(132), 20170275.

Ferrell, J.E. (2012). Bistability, Bifurcations, and Waddington's Epigenetic Landscape. Current Biology, 22(11), R458-R466.

Haerinck, J., Goossens, S., and Berx, G. (2023). The epithelial–mesenchymal plasticity landscape: principles of design and mechanisms of regulation. Nature Reviews Genetics, 24, 590–609.

Hamers, L. (2023). Why do animals keep evolving into crabs? LiveScience, June 1.

Huang, Z., Zhang, Z., Zhou, C., Liu, L., and Huang, C. (2022). Epithelial–mesenchymal transition: The history, regulatory mechanism, and cancer therapeutic opportunities. MedComm, 3(2), e144. doi:10.1002/mco2.144.

Khodaee, F., Zandie, R., and Edelman, E.R. (2025). Multimodal learning for mapping genotype–phenotype dynamics. Nature Computational Science, doi:10.1038/s43588-024-00765-7.

Kim, Y-A. and Przytycka, T.M. (2013). Bridging the Gap between Genotype and Phenotype via Network Approaches. Frontiers in Genetics, 3, 227.

Kovuri, P., Yadav, A., and Sinha, H. (2023). Role of genetic architecture in phenotypic plasticity. Trends in Genetics, 39, 703-714.


McLaughlin, P.A. and Lemaitre, R. (1997). Carcinization in the Anomura: fact or fiction? I. Evidence from adult morphology. Contributions to Zoology, 67(2), 79-123.

Morrison, C.L., Harvey, A.W., Lavery, S., Tieu, K., Huang, Y., and Cunningham, C.W. (2002). Mitochondrial Gene Rearrangements Confirm the Parallel Evolution of the Crab-like Form. Proceedings of the Royal Society B, 269(1489), 345-350.

Munteanu, A. and Sole, R. (2008). Neutrality and Robustness in Evo-Devo: Emergence of Lateral Inhibition. PLoS Computational Biology, 4(11), e1000226. 


Papaioannou, V.E. (2014). The T-box gene family: emerging roles in development, stem cells and cancer. Development, 141(20), 3819–3833.

Patterson, J.S. and Klingenberg, C.P. (2007). Developmental buffering: how many genes? Evolution and Development, 9(6), 525–526.

Tsang, L-M., Chan, T-Y., Ahyong, S., and Chu, K-H. (2011). Hermit to King, or Hermit to All: Multiple Transitions to Crab-like Forms from Hermit Crab Ancestors. Systematic Biology, 60(5), 616-629.

Wagner, G.P., Pavlicev, M., and Cheverud, J.M. (2007). The road to modularity. Nature Reviews Genetics, 8, 921–931.

Wang, J., Zhang, K., Xu, L., and Wang, E. (2011). Quantifying the Waddington landscape and biological paths for development and differentiation. PNAS, 108(20), 8257–8262. 

Wolfe, J.M., Breinholt, J.W., Crandall, K.A., Lemmon, A.R., Moriarty Lemmon, E., Timm, L.E., Siddall, M.E., and Bracken-Grissom, H. (2019). A phylogenomic framework, evolutionary timeline and genomic resources for comparative studies of decapod crustaceans. Proceedings of the Royal Society B: Biological Sciences, 286(1901), 20190079.

Wolfe, J.M., Luque, J., and Bracken-Grissom, H.D. (2021). How to become a crab: Phenotypic constraints on a recurring body plan. BioEssays, 2100020.

Yang, Y., Cui, Z., Feng, T., Bao, C., and Xu, Y. (2021). Transcriptome analysis elucidates key changes of pleon in the process of carcinization. Journal of Oceanology and Limnology, 39, 1471–1484.


January 18, 2025

Orthogonal Lab Annual Report for 2024

Another year of the Orthogonal Research and Education Laboratory (2024). We have had a great year! I posted a summary video on YouTube that covers activities related to Saturday Morning NeuroSim, our Open Science/Open-source interest group, the Computational Developmental Systems interest group, the Representational Brains and Phenotypes interest group, and the Cybernetics interest group. I also discuss our educational initiatives and conference/publication activities.

The flagship activity of the lab is the Saturday Morning NeuroSim meeting, which happens on Saturdays at 10AM ET (North America). This is based on the Saturday Morning Physics educational events, which originated in Germany and are popular in Physics departments and US National Labs. We had 45 meetings in 2024, and covered all manner of intersections between computation, neuroscience, social science, molecular biology, and complexity theory.

The Computational Developmental Systems interest group can best be described as Neuro-Devo-Psych. This combines our work on Computational Critical Periods and Computational Developmental Biology (DevoWorm-associated). This also intersects with work in the Representational Brains and Phenotypes interest group and the Developmental Neurosimulation approach. 


Our Open Science/Open-source interest group sponsors Google Summer of Code participation in addition to assorted activities in the research practice and open-source project management spheres. The rejuvenation of our Cybernetics interest group features a reading group and other academic activities. 

November 30, 2024

OpenWorm Annual Meeting 2024 (DevoWorm update)

 Here are the slides for the DevoWorm group's report to the OpenWorm Annual Meeting (2024). You can watch Bradly Alicea present the talk on YouTube.


Aside from all the great stuff going on in DevoWorm, there are two new tools being developed by Padraig Gleeson's team at University College London. First up is the Connectome Toolbox, which brings together multiple key datasets and visualization tools for C. elegans connectome analysis. The other is OpenWorm.ai, which uses a large language model (LLM) hosted on Huggingface to query information about C. elegans biology.

Here are some slides from the presentation. Thanks to Mehul Arora and Pakhi Banchalia for their Google Summer of Code efforts.

October 24, 2024

OAWeek 2024: Intrinsic and Extrinsic Approaches to Open Access

This post is in celebration of Open Access (OA) Week 2024. The theme for this year is "Community over Commercialization". 

How do we incentivize people to adopt Open Access practices? We can take lessons from motivational Psychology to think about routes to better practice. Before doing so, we need to consider the current (and often sorry state) of open access.

It could be argued that in some important ways, Open Access has failed. The system of access to academic goods as currently structured is built on significant benefits to publishers and costs to libraries and authors. Publishers have accrued this benefit through reputation: being published in Nature, Science, or Cell is highly prestigious. Yet the benefits of this prestige are necessarily limited to a few groups with lots of resources. And the beneficial attributes of open access have been captured by commercial entities. Similar problems plague the open-source community, and a shift to the open ethos is the only way out.

The different types of open access also play different roles in the marketplace of academic goods. Green open access, or the self-archiving of artifacts, is a community good. While this can be susceptible to the tragedy of the commons, proper social investment can ameliorate maintenance and growth imperatives. This is often seen as the highest standard for open access but requires community investment. Building a sustainable infrastructure of preprints, open peer review, and overlay journals has been elusive.

The Economic Benefits and Costs of Different Colored Access


Black open access using tools such as Sci-Hub is considered piracy (hence the "black" label). From an economic perspective, piracy is symptomatic of a dysfunctional market. Indeed, part of open access' failure is due to the dominant position of publishers and their own economic imperatives. In fact, black open access can be considered a rational response to closed access to articles in a research culture of sharing and finding alternate routes to success [1-3].

The publisher's advantage, and the failure point of open access more generally, is most evident in the current state of Gold, Diamond, and Platinum open access. Gold open access involves payment of an Article Processing Charge (APC) to the publisher. This often reduces the burden on libraries, which previously paid excessive subscription fees. However, APCs increase the burden on individual authors, with disappointing results for the prestige economy. Without market power for the authors (or home institutions), there is no incentive to build Diamond and Platinum access systems. In such systems, no APC is paid, and we still get the prestige that people seek. One barrier to this is shifting the burden back to publishers, but with proper management of community resources it is the least bad option.

Up to this point, I have been speaking in economically coded language. Without thinking about various motivations, however, we cannot fully understand ways to move forward. Let's think about various intrinsic/extrinsic motivations of authors and their institutions to reclaim open access. Intrinsic motivations are properties of individual cognition, while extrinsic motivations are things that motivate individual behaviors from the outside world.

Intrinsic motivations

There are many intrinsic motivations that drive acceptance and adoption of open access. But there are many that do not, and these motivations often come into tension. Positive drivers include striving for a better community, an imperative for sharing results with the community, the ability to provide different platforms for scientific communication (datasets, hypotheses, theory, out-of-scope studies), and recognition for unsung components of the scientific process (such as technical reports or negative results). Negative drivers include a need to satiate cultural traditions, an inability to convey prestige through open means, a conflation of open access with fraud and low-quality work, and an inability to meet the quality needed to do open access successfully.  

Extrinsic motivations

The many extrinsic motivations include institutional support, the need for career promotion, community rewards and prestige, the pressure for cost savings, and technological ease of adoption. These can be a mix of positive and negative drivers that make adoption of open access hard to justify. Interactions with open-source software can also drive open access adoption, as the commitment needed to develop shared data and code can be easily extended to other academic artifacts.

What is the path forward? 

Sometimes considering motivations is not enough, and the community is much pettier and more irrational than we like to admit. It is worth thinking about eLife's model of open peer review, which in part led to a backlash against the editorial staff [4, 5] by less sympathetic members of the scientific community (and barely-disguised corporate interests). Part of this is a disagreement about open strategies, but this is also about the gatekeeping nature of the scientific community itself. The eLife model allows for papers to be preprinted, and then peer reviewed. The paper remains live on eLife's website even if the reviews recommend rejection (although the rejection is noted) [6, 7]. This is not novel amongst open peer review platforms but has rankled the more hierarchically oriented members of the scientific community. Perhaps we need to also consider "irrational management strategies", or what intrinsic motivations drive decisions that favor obsolete conventions.


References:

[1] Melvin et al. (2020). Communicating and disseminating research findings to study participants: Formative assessment of participant and researcher expectations and preferences. Journal of Clinical and Translational Science, 4(3), 233–242.

[2] Casci and Adams (2020). Research Culture: Setting the right tone. eLife, 9, e55543.

[3] Nosek et al. (2015). Promoting an open research culture. Science, 348(6242), 1422-1425.

[4] eLife latest in string of major journals put on hold from Web of Science. RetractionWatch. https://retractionwatch.com/2024/10/24/elife-latest-in-string-of-major-journals-put-on-hold-from-web-of-science/

[5] Abbott (2023). Strife at eLife: inside a journal’s quest to upend science publishing. Nature News, March 17. https://www.nature.com/articles/d41586-023-00831-6

[6] F1000 Staff (2022). Open peer review: establishing quality. March 7.  https://www.f1000.com/blog/peer-review-establishing-quality

[7] McCallum et al. (2021). OpenReview NeurIPS 2021 Summary Report. https://docs.openreview.net/reports/conferences/openreview-neurips-2021-summary-report

May 22, 2024

Google Summer of Code 2024

 

Welcome to the new Google Summer of Code scholars for 2024! INCF is sponsoring four students for whom I (Bradly Alicea) am acting as mentor: two students for the DevoWorm group (via the OpenWorm Foundation community) and two students for the Orthogonal Research and Education Laboratory.


D-GNNs (sponsored by the DevoWorm group)

Congratulations to Pakhi Banchalia and Mehul Arora for being accepted to work on the Developmental Graph Neural Networks (D-GNNs) project. Pakhi will be working on incorporating Neural Developmental Programs (NDPs) into GNN models. Mehul will be working on hypergraph techniques for developmental lineage trees and embryogenesis [1]. Himanshu Chougule (Google Summer of Code scholar) is a co-mentor for this project.

Virtual Reality for Research and Open-source Sustainability (sponsored by the Orthogonal Research and Education Lab)

Congratulations to Sarrah Bastawala for being accepted to work on the Open-source Sustainability project [2]. Sarrah is incorporating Large Language Models (LLMs) into the collection of agent-based approaches for this project. Jesse Parent is a co-mentor for this project.

We also have two Open-source Development scholars for Summer 2024: Adama Koita and Shubham Soni. They will be participating in the Orthogonal Lab's open-source weekly meetings in addition to projects based around Virtual Reality and Open-source Sustainability, respectively.