Synthetic Daisies: 2024

November 30, 2024

OpenWorm Annual Meeting 2024 (DevoWorm update)

Here are the slides for the DevoWorm group's report to the OpenWorm Annual Meeting (2024). You can watch Bradly Alicea present the talk on YouTube.

Aside from all the great stuff going on in DevoWorm, there are two new tools being developed by Padraig Gleeson's team at University College London. First up is the Connectome Toolbox, which brings together multiple key datasets and visualization tools for C. elegans connectome analysis. The other is OpenWorm.ai, which uses a large language model (LLM) hosted on Hugging Face to query information about C. elegans biology.

Here are some slides from the presentation. Thanks to Mehul Arora and Pakhi Banchalia for their Google Summer of Code efforts.

October 24, 2024

OAWeek 2024: Intrinsic and Extrinsic Approaches to Open Access

This post is in celebration of Open Access (OA) Week 2024. The theme for this year is "Community over Commercialization".

How do we incentivize people to adopt Open Access practices? We can take lessons from motivational psychology to think about routes to better practice. Before doing so, we need to consider the current (and often sorry) state of open access.

It could be argued that in some important ways, Open Access has failed. The system of access to academic goods as currently structured is built on significant benefits to publishers and costs to libraries and authors. Publishers accrue these benefits through reputation: being published in Nature, Science, or Cell is highly prestigious. Yet the benefits of this prestige are necessarily limited to a few groups with lots of resources. And the beneficial attributes of open access have been captured by commercial entities. Similar problems plague the open-source community; a shift to the open ethos is the only way out.

The different types of open access also play different roles in the marketplace of academic goods. Green open access, or the self-archiving of artifacts, is a community good. While this can be susceptible to the tragedy of the commons, proper social investment can ameliorate maintenance and growth imperatives. This is often seen as the highest standard for open access but requires community investment. Building a sustainable infrastructure of preprints, open peer review, and overlay journals has been elusive.

The Economic Benefits and Costs of Different Colored Access

Black open access using tools such as Sci-Hub is considered piracy (hence the "black" label). From an economic perspective, piracy is symptomatic of a dysfunctional market. Indeed, part of the failure of open access is due to the dominant position of publishers and their own economic imperatives. In fact, black open access can be considered a rational response to closing access to articles in a research culture of sharing and finding alternate routes to success [1-3].

The publisher's advantage, and the failure point of open access more generally, is most visible in the current state of Gold, Diamond, and Platinum open access. Gold open access involves payment of an Article Processing Charge (APC) to the publisher. This often reduces the burden on libraries, which previously paid excessive subscription fees. But APCs actually increase the burden on individual authors, with disappointing results for the prestige economy. Without market power for the authors (or home institutions), there is no incentive to build Diamond and Platinum access systems. In such systems, no APC is paid, and we get the prestige that people seek. One barrier to this is shifting the burden back to publishers, but with proper management of community resources it is the least bad option.

Up to this point, I have been speaking in economically coded language. Without thinking about various motivations, however, we cannot fully understand ways to move forward. Let's think about various intrinsic/extrinsic motivations of authors and their institutions to reclaim open access. Intrinsic motivations are properties of individual cognition, while extrinsic motivations are things that motivate individual behaviors from the outside world.

Intrinsic motivations

There are many intrinsic motivations that drive acceptance and adoption of open access. But there are many that do not, and these motivations often come into tension. Positive drivers include striving for a better community, an imperative for sharing results with the community, the ability to provide different platforms for scientific communication (datasets, hypotheses, theory, out-of-scope studies), and recognition for unsung components of the scientific process (such as technical reports or negative results). Negative drivers include a need to satiate cultural traditions, an inability to convey prestige through open means, a conflation of open access with fraud and low-quality work, and an inability to meet the quality needed to do open access successfully.

Extrinsic motivations

The many extrinsic motivations include institutional support, the need for career promotion, community rewards and prestige, the pressure for cost savings, and technological ease of adoption. These can be a mix of positive and negative drivers that make adoption of open access hard to justify. Interactions with open-source software can also drive open access adoption, as the commitment needed to develop shared data and code can be easily extended to other academic artifacts.

What is the path forward?

Sometimes considering motivations is not enough, and the community is much pettier and more irrational than we like to admit. It is worth thinking about eLife's model of open peer review, which in part led to a backlash against the editorial staff [4, 5] by less sympathetic members of the scientific community (and barely disguised corporate interests). Part of this is a disagreement about open strategies, but this is also about the gatekeeping nature of the scientific community itself. The eLife model allows for papers to be preprinted, and then peer reviewed. The paper remains live on eLife's website even if the reviews recommend rejection (although the rejection is noted) [6, 7]. This is not novel amongst open peer review platforms but has rankled the more hierarchically oriented members of the scientific community. Perhaps we need to also consider "irrational management strategies", or what intrinsic motivations drive decisions that favor obsolete conventions.

References:

[1] Melvin et al. (2020). Communicating and disseminating research findings to study participants: Formative assessment of participant and researcher expectations and preferences. Journal of Clinical and Translational Science, 4(3), 233–242.

[2] Casci and Adams (2020). Research Culture: Setting the right tone. eLife, 9, e55543.

[3] Nosek et al. (2015). Promoting an open research culture. Science, 348(6242), 1422-1425.

[4] eLife latest in string of major journals put on hold from Web of Science. RetractionWatch. https://retractionwatch.com/2024/10/24/elife-latest-in-string-of-major-journals-put-on-hold-from-web-of-science/

[5] Abbott (2023). Strife at eLife: inside a journal’s quest to upend science publishing. Nature News, March 17. https://www.nature.com/articles/d41586-023-00831-6

[6] F1000 Staff (2022). Open peer review: establishing quality. March 7. https://www.f1000.com/blog/peer-review-establishing-quality

[7] McCallum et al. (2021). OpenReview NeurIPS 2021 Summary Report. https://docs.openreview.net/reports/conferences/openreview-neurips-2021-summary-report

May 22, 2024

Google Summer of Code 2024

Welcome to the new Google Summer of Code scholars for 2024! INCF is sponsoring four students for whom I (Bradly Alicea) am acting as mentor: two students for the DevoWorm group (via the OpenWorm Foundation community) and two students for the Orthogonal Research and Education Laboratory.

D-GNNs (sponsored by the DevoWorm group)

Congratulations to Pakhi Banchalia and Mehul Arora for being accepted to work on the Developmental Graph Neural Networks (D-GNNs) project. Pakhi will be working on incorporating Neural Developmental Programs (NDPs) into GNN models. Mehul will be working on hypergraph techniques for developmental lineage trees and embryogenesis [1]. Himanshu Chougule (Google Summer of Code scholar) is a co-mentor for this project.

Virtual Reality for Research and Open-source Sustainability (sponsored by the Orthogonal Research and Education Lab)

Congratulations to Sarrah Bastawala for being accepted to work on the Open-source Sustainability project [2]. Sarrah is incorporating Large Language Models (LLMs) into the collection of agent-based approaches for this project. Jesse Parent is a co-mentor for this project.

We also have two Open-source Development scholars for Summer 2024: Adama Koita and Shubham Soni. They will be participating in the Orthogonal Lab's open-source weekly meetings in addition to projects based around Virtual Reality and Open-source Sustainability, respectively.

February 11, 2024

Charles Darwin meets Rube Goldberg: a tale of biological convolutedness

Charles Darwin studying a Rube Goldberg Machine (Freepik Generative AI, text-to-image)

For this Darwin Day post (2024), I will discuss the paper Machinery of Biocomplexity [1]. This paper introduces the notion of Rube Goldberg machines as a way to explore biological complexity and non-optimal function. This concept was first highlighted on Synthetic Daisies in 2009 [2], while an earlier version of the paper was discussed on Synthetic Daisies in 2013 [3]. The paper was revised in 2014 to include a number of more advanced computational concepts, after a talk to the Network Frontiers Workshop at Northwestern University in 2013 [4].

Figure 1. Block and arrow model of a biological RGM (bRGM) that captures the non-optimal changes resulting from greater complexity. Mutation/Co-option removes the connection between A and B, then establishes a new set of connections with D. Inversion (bottom) flips the direction of connections between C-B and C-A, while also removing the output. This results in the addition of E and D, which reestablishes the output in a circuitous manner.

Biological Rube Goldberg Machines (bRGMs) are defined as a computational abstraction of convoluted, non-optimal mechanisms. Non-optimal biological systems are represented using flexible Markovian box and arrow models that can be mutated and expanded given functional imperatives [5]. Non-optimality is captured through the principle of "maximum intermediate steps": biological systems such as neural pathways, metabolic reactions, and serial interactions do not evolve to the shortest route but are constrained to (and perhaps even converge on) the largest number of steps. This results in a set of biological traits that functionally emerge as a biological process. Figure 1B shows an example where maximal steps represent a balance between the path of least resistance and exploration given constraints on possible interconnections [6]. The paths from A-E, E-B, and C-D are the paths of least resistance given the constraints of structure and function. In the sense that optimality is a practical outcome of physiological function, a great degree of intermediacy can preserve unconventional pathways that are utilized only spontaneously.
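
To make the "maximum intermediate steps" idea concrete, here is a minimal Python sketch. It is not the implementation from the paper: it assumes a plain directed graph in place of a Markovian model, and the component names, the edge set, and the helper functions (shortest_steps, longest_simple_path, co_opt) are all hypothetical. It contrasts the shortest route from input to output with the longest route that never revisits a component, and shows how a co-option-style mutation that reroutes a direct connection through a new component lengthens the shortest available route.

```python
from collections import deque

def shortest_steps(graph, source, target):
    """Fewest arrows from source to target (breadth-first search), or None."""
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, steps = queue.popleft()
        if node == target:
            return steps
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None

def longest_simple_path(graph, source, target, visited=()):
    """Longest path from source to target that visits no component twice."""
    visited = visited + (source,)
    if source == target:
        return list(visited)
    best = None
    for nxt in graph.get(source, []):
        if nxt in visited:
            continue
        candidate = longest_simple_path(graph, nxt, target, visited)
        if candidate is not None and (best is None or len(candidate) > len(best)):
            best = candidate
    return best

def co_opt(graph, src, dst, via):
    """Return a copy in which the direct src->dst arrow is rerouted through via."""
    mutated = {node: list(targets) for node, targets in graph.items()}
    mutated[src] = [t for t in mutated.get(src, []) if t != dst] + [via]
    mutated.setdefault(via, []).append(dst)
    return mutated

# Hypothetical ancestral mechanism (illustrative edges, not taken from Figure 1).
ancestral = {
    "input": ["A"],
    "A": ["B", "C"],
    "B": ["output"],
    "C": ["B"],
    "output": [],
}

# Assumed mutation: the direct A->B arrow is lost and replaced by A->D->B.
derived = co_opt(ancestral, "A", "B", "D")

print(shortest_steps(ancestral, "input", "output"))       # 3
print(longest_simple_path(ancestral, "input", "output"))  # input, A, C, B, output
print(shortest_steps(derived, "input", "output"))         # 4
```

In this toy example the mutation removes the only three-step route, so every remaining route from input to output takes four steps. The function is preserved, but only by way of more intermediates.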

Convolutedness of this kind can be seen in a wide variety of biological systems and is a consequence of evolution. Evolutionary exaptation, the evolution of alternative functions, and serial innovation all result in systems with a large number of steps from input to output. But sometimes convolution is the evolutionary imperative in and of itself. As fitness criteria change over evolutionary time, traces of these historical trajectories can be observed in redundant pathways and other results of subsequent evolutionary neutrality. One example from the paper involves a multiscale model (genotype-to-phenotype) that exploits both tree depth and lateral connectivity to maximize innovation in the production of a phenotype (Figure 2). While our models are based on connections between discrete states, bRGMs can also provide insight into the evolution of looser collections of single traits and even networks, where the sequence of function is bidirectional and hard to follow in stepwise fashion.

Figure 2. A hypothetical biological RGM representing a multi-scale relationship. Each set of elements (A-F) represents the number of elements at each scale (actual and potential connections are shown with bold and thin lines, respectively). Examples of convolutedness incorporate both loops (as with E5,1 and E5,5) and the depth of the entire network.

The paper also features extensions of the basic bRGM, including massively convoluted architectures and microfluidic implementations. In the former, interconnected networks represent systems that are not only maximal in terms of size or length, but also massively topologically complex [7]. One example of this is cortical folding and the resulting neuronal connectivity in mammalian brains. The latter example is based on fluid dynamics and combinatorial architectures that are more in line with discrete bRGMs (Figure 3).

Figure 3. A microfluidic-inspired bRGM model that mimics the complexity of biological fluid dynamics (e.g. blood vessel networks). G1, G2, and G3 represent iterations of the system.

References:

[1] Alicea, B. (2014). The "Machinery" of Biocomplexity: understanding non-optimal architectures in biological systems. arXiv, 1104.3559.

[2] Non-razors, unite! January 30, 2009. https://syntheticdaisies.blogspot.com/2009/01/non-razors-unite.html

[3] Maps, Models, and Concepts: July Edition. Synthetic Daisies blog. July 13, 2013. https://syntheticdaisies.blogspot.com/2013/07/maps-models-and-concepts-july-edition.html

[4] Inspired by a visit to the Network's Frontier.... Synthetic Daisies blog. December 16, 2013. https://syntheticdaisies.blogspot.com/2013/12/fireside-science-inspired-by-visit-to.html

[5] When dealing with a large number of steps or in a polygenic context, these types of models can also resemble renormalization groups. For more on the renormalization group, please see: Wilson, K.G. (1975). Renormalization group methods. Advances in Mathematics, 16(2), 170-186.

[6] This balance is as predicted by Constructive Neutral Evolution (CNE). For a relevant paper, please see: Gray et al. (2010). Irremediable Complexity? Science, 330(6006), 920-921.

[7] In the paper, this is referred to as Spaghettification, a term borrowed from the physics of gravitation. See this reference for an interesting implementation of this in soft materials: Bonamassa et al. (2024). Bundling by volume exclusion in non-equilibrium spaghetti. arXiv, 2401.02579.