October 24, 2024

OAWeek 2024: Intrinsic and Extrinsic Approaches to Open Access

This post is in celebration of Open Access (OA) Week 2024. The theme for this year is "Community over Commercialization". 

How do we incentivize people to adopt Open Access practices? We can take lessons from motivational psychology to think about routes to better practice. Before doing so, we need to consider the current (and often sorry) state of open access.

It could be argued that in some important ways, Open Access has failed. The system of access to academic goods, as currently structured, delivers significant benefits to publishers at significant cost to libraries and authors. Publishers have accrued this benefit through reputation: being published in Nature, Science, or Cell is highly prestigious. Yet the fruits of this prestige are necessarily limited to a few well-resourced groups, and the beneficial attributes of open access have been captured by commercial entities. Similar problems plague the open-source community; a shift to the open ethos is the only way out.

The different types of open access also play different roles in the marketplace of academic goods. Green open access (the self-archiving of artifacts) is a community good. While community goods can be susceptible to the tragedy of the commons, proper social investment can meet their maintenance and growth imperatives. Green is often seen as the highest standard for open access, but it requires community investment: building a sustainable infrastructure of preprints, open peer review, and overlay journals has proven elusive.

The Economic Benefits and Costs of Different Colored Access


Black open access, using tools such as Sci-Hub, is considered piracy (hence the "black" label). From an economic perspective, piracy is symptomatic of a dysfunctional market. Indeed, part of open access' failure is due to the dominant position of publishers and their own economic imperatives. In fact, black open access can be considered a rational response to the closing of access to articles within a research culture that prizes sharing and finding alternate routes to success [1-3].

The publishers' advantage, and the failure point of open access more generally, is best seen in the current state of Gold, Diamond, and Platinum open access. Gold open access involves payment of an Article Processing Charge (APC) to the publisher. This often reduces the burden on libraries, which previously paid excessive subscription fees, but it shifts that burden onto individual authors, with disappointing results in the prestige economy. Without market power for authors (or their home institutions), there is no incentive to build Diamond and Platinum access systems, in which no APC is paid and the prestige that people seek is still conferred. One barrier is that such systems shift the cost burden back to publishers, but with proper management of community resources they are the least bad option.

Up to this point, I have been speaking in economically coded language. Without thinking about motivations, however, we cannot fully understand ways to move forward. Let's consider the various intrinsic and extrinsic motivations of authors and their institutions to reclaim open access. Intrinsic motivations are properties of individual cognition, while extrinsic motivations originate in the outside world and act on individual behavior.

Intrinsic motivations

There are many intrinsic motivations that drive acceptance and adoption of open access. But there are many that do not, and these motivations often come into tension. Positive drivers include striving for a better community, an imperative to share results with that community, the ability to provide platforms for different modes of scientific communication (datasets, hypotheses, theory, out-of-scope studies), and recognition for unsung components of the scientific process (such as technical reports or negative results). Negative drivers include a need to satisfy cultural traditions, an inability to convey prestige through open means, a conflation of open access with fraud and low-quality work, and an inability to meet the quality bar needed to do open access successfully.

Extrinsic motivations

The multitude of extrinsic motivations includes institutional support, the need for career promotion, community rewards and prestige, the pressure for cost savings, and the technological ease of adoption. These can be a mix of positive and negative drivers that make adoption of open access hard to justify. Interactions with open-source software can also drive open access adoption, as the commitment needed to develop shared data and code extends easily to other academic artifacts.

What is the path forward? 

Sometimes considering motivations is not enough, and the community is much pettier and more irrational than we like to admit. It is worth thinking about eLife's model of open peer review, which in part led to a backlash against the editorial staff [4, 5] from less sympathetic members of the scientific community (and barely-disguised corporate interests). Part of this is a disagreement about open strategies, but it is also about the gatekeeping nature of the scientific community itself. The eLife model allows papers to be preprinted and then peer reviewed; the paper remains live on eLife's website even if the reviews recommend rejection (although the rejection is noted) [6, 7]. This is not novel amongst open peer review platforms, but it has rankled the more hierarchically oriented members of the scientific community. Perhaps we also need to consider "irrational management strategies", or which intrinsic motivations drive decisions that favor obsolete conventions.


References:

[1] Melvin et al. (2020). Communicating and disseminating research findings to study participants: Formative assessment of participant and researcher expectations and preferences. Journal of Clinical and Translational Science, 4(3), 233–242.

[2] Casci and Adams (2020). Research Culture: Setting the right tone. eLife, 9, e55543.

[3] Nosek et al. (2015). Promoting an open research culture. Science, 348(6242), 1422-1425.

[4] eLife latest in string of major journals put on hold from Web of Science. Retraction Watch, October 24, 2024. https://retractionwatch.com/2024/10/24/elife-latest-in-string-of-major-journals-put-on-hold-from-web-of-science/

[5] Abbot (2023). Strife at eLife: inside a journal’s quest to upend science publishing. Nature News, March 17. https://www.nature.com/articles/d41586-023-00831-6

[6] F1000 Staff (2022). Open peer review: establishing quality. March 7.  https://www.f1000.com/blog/peer-review-establishing-quality

[7] McCallum et al. (2021). OpenReview NeurIPS 2021 Summary Report. https://docs.openreview.net/reports/conferences/openreview-neurips-2021-summary-report

May 22, 2024

Google Summer of Code 2024

 

Welcome to the new Google Summer of Code scholars for 2024! INCF is sponsoring four students for whom I (Bradly Alicea) am acting as mentor: two students for the DevoWorm group (via the OpenWorm Foundation community) and two students for the Orthogonal Research and Education Laboratory.


D-GNNs (sponsored by the DevoWorm group)

Congratulations to Pakhi Banchalia and Mehul Arora for being accepted to work on the Developmental Graph Neural Networks (D-GNNs) project. Pakhi will be working on incorporating Neural Developmental Programs (NDPs) into GNN models. Mehul will be working on hypergraph techniques for developmental lineage trees and embryogenesis [1]. Himanshu Chougule (Google Summer of Code scholar) is a co-mentor for this project.

Virtual Reality for Research and Open-source Sustainability (sponsored by the Orthogonal Research and Education Lab)

Congratulations to Sarrah Bastawala for being accepted to work on the Open-source Sustainability project [2]. Sarrah is incorporating Large Language Models (LLMs) into the collection of agent-based approaches for this project. Jesse Parent is a co-mentor for this project.

We also have two Open-source Development scholars for Summer 2024: Adama Koita and Shubham Soni. They will be participating in the Orthogonal Lab's open-source weekly meetings in addition to projects based around Virtual Reality and Open-source Sustainability, respectively. 

February 11, 2024

Charles Darwin meets Rube Goldberg: a tale of biological convolutedness


Charles Darwin studying a Rube Goldberg Machine (Freepik Generative AI, text-to-image)

For this Darwin Day post (2024), I will discuss the paper Machinery of Biocomplexity [1]. This paper introduces the notion of Rube Goldberg machines as a way to explore biological complexity and non-optimal function. This concept was first highlighted on Synthetic Daisies in 2009 [2], while an earlier version of the paper was discussed on Synthetic Daisies in 2013 [3]. The paper was revised in 2014 to include a number of more advanced computational concepts, following a talk at the Network Frontiers Workshop at Northwestern University in 2013 [4].

Figure 1. Block and arrow model of a biological RGM (bRGM) that captures the non-optimal changes resulting from greater complexity. Mutation/Co-option removes the connection between A and B, then establishes a new set of connections with D. Inversion (bottom) flips the direction of connections between C-B and C-A, while also removing the output. This results in the addition of E and D, which reestablish the output in a circuitous manner.

Biological Rube Goldberg Machines (bRGMs) are defined as a computational abstraction of convoluted, non-optimal mechanisms. Non-optimal biological systems are represented using flexible Markovian box-and-arrow models that can be mutated and expanded given functional imperatives [5]. Non-optimality is captured through the principle of "maximum intermediate steps": biological systems such as neural pathways, metabolic reactions, and serial interactions do not evolve toward the shortest route but are constrained by (and perhaps even converge to) the largest number of steps. This results in a set of biological traits that functionally emerge as a biological process. Figure 1B shows an example where maximal steps represent a balance between the path of least resistance and exploration, given constraints on possible interconnections [6]. The paths from A-E, E-B, and C-D are the paths of least resistance given the constraints of structure and function. In the sense that optimality is a practical outcome of physiological function, a great degree of intermediacy can preserve unconventional pathways that are utilized only spontaneously.
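To make the box-and-arrow formalism concrete, here is a minimal Python sketch of a bRGM as a directed graph. The class, method names, and toy topology are my own illustrative assumptions rather than code from the paper; the operations loosely mirror the mutation/co-option and inversion rewrites of Figure 1.

    # Minimal, illustrative bRGM sketch: a directed box-and-arrow graph
    # with rewrite operations. Names and topology are assumptions for
    # illustration; the paper defines bRGMs conceptually, not as code.

    class BRGM:
        def __init__(self, edges):
            # edges: set of (source, target) pairs, e.g. ("A", "B")
            self.edges = set(edges)

        def mutate_coopt(self, old_edge, new_edges):
            """Mutation/co-option: remove one connection, establish new ones."""
            self.edges.discard(old_edge)
            self.edges.update(new_edges)

        def invert(self, edge):
            """Inversion: flip the direction of an existing connection."""
            if edge in self.edges:
                self.edges.remove(edge)
                self.edges.add((edge[1], edge[0]))

        def longest_path(self, start, end, seen=None):
            """Length of the longest loop-free path from start to end; a
            crude proxy for the 'maximum intermediate steps' principle.
            Returns -1 if no path exists."""
            seen = (seen or set()) | {start}
            best = -1
            for (s, t) in self.edges:
                if s != start:
                    continue
                if t == end:
                    best = max(best, 1)
                elif t not in seen:
                    sub = self.longest_path(t, end, seen)
                    if sub >= 0:
                        best = max(best, 1 + sub)
            return best

    # A three-step pathway: A -> B -> C -> output.
    m = BRGM({("A", "B"), ("B", "C"), ("C", "output")})
    print(m.longest_path("A", "output"))   # 3

    # Co-option reroutes through D, adding an intermediate step.
    m.mutate_coopt(old_edge=("A", "B"), new_edges={("A", "D"), ("D", "B")})
    print(m.longest_path("A", "output"))   # 4: same function, more convoluted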

This can be seen in a wide variety of biological systems and is a consequence of evolution. Evolutionary exaptation, the evolution of alternative functions, and serial innovation all result in systems with a large number of steps from input to output. But sometimes convolution is the evolutionary imperative in and of itself. As fitness criteria change over evolutionary time, traces of these historical trajectories can be observed in redundant pathways and other results of subsequent evolutionary neutrality. One example from the paper involves a multiscale model (genotype-to-phenotype) that exploits both tree depth and lateral connectivity to maximize innovation in the production of a phenotype (Figure 2). While our models are based on connections between discrete states, bRGMs can also provide insight into the evolution of looser collections of single traits and even networks, where the sequence of function is bidirectional and hard to follow in stepwise fashion.

Figure 2.  A hypothetical biological RGM representing a multi-scale relationship. Each set of elements (A-F) represents the number of elements at each scale (actual and potential connections are shown with bold and thin lines, respectively). Examples of convolutedness incorporate both loops (as with E5,1 and E5,5) and the depth of the entire network.

The paper also features extensions of the basic bRGM, including massively convoluted architectures and microfluidic implementations. In the former, interconnected networks represent systems that are not only maximal in terms of size or length, but also massively topologically complex [7]. One example of this is cortical folding and the resulting neuronal connectivity in mammalian brains. The latter example is based on fluid dynamics and combinatorial architectures that are more in line with discrete bRGMs (Figure 3).

Figure 3. A microfluidic-inspired bRGM model that mimics the complexity of biological fluid dynamics (e.g. blood vessel networks). G1, G2, and G3 represent iterations of the system.


References:

[1] Alicea, B. (2014). The "Machinery" of Biocomplexity: understanding non-optimal architectures in biological systems. arXiv, 1104.3559.

[2] Non-razors, unite! Synthetic Daisies blog. January 30, 2009. https://syntheticdaisies.blogspot.com/2009/01/non-razors-unite.html

[3] Maps, Models, and Concepts: July Edition. Synthetic Daisies blog. July 13, 2013. https://syntheticdaisies.blogspot.com/2013/07/maps-models-and-concepts-july-edition.html

[4] Inspired by a visit to the Network's Frontier... Synthetic Daisies blog. December 16, 2013. https://syntheticdaisies.blogspot.com/2013/12/fireside-science-inspired-by-visit-to.html

[5] When dealing with a large number of steps, or in a polygenic context, these types of models can also resemble renormalization groups. For more on renormalization groups, please see: Wilson, K.G. (1975). Renormalization group methods. Advances in Mathematics, 16(2), 170-186.

[6] This balance is as predicted by Constructive Neutral Evolution (CNE). For a relevant paper, please see: Gray et al. (2010). Irremediable Complexity? Science, 330(6006), 920-921.

[7] In the paper, this is referred to as Spaghettification, a term borrowed from the physics of gravitation. See this reference for an interesting implementation in soft materials: Bonamassa et al. (2024). Bundling by volume exclusion in non-equilibrium spaghetti. arXiv, 2401.02579.

November 30, 2023

OpenWorm Annual Meeting 2023 (DevoWorm update)

It's that time again -- time for the OpenWorm Annual Meeting. Below are the slides I presented on progress and the latest activities in the DevoWorm group. If anything looks interesting to you and you would like to contribute, please let me know.

August 24, 2023

Saturday Morning NeuroSim Discussion Thread: Physical Computing

 

From the “Macy Conference Redux” feature from our July 1 meeting

Over the past three years, the Saturday Morning NeuroSim group has met weekly on Saturdays (mornings in North America). The Saturday Morning format continues in the tradition of Saturday Morning Physics and covers a wide variety of topics.

One recent lecture/discussion thread is on Physical Computation. Our approach to the topic begins with the debate around the role of computation in Cognitive Science and the Neurosciences. And so we begin in Week 1 with a discussion of the connections between computation, information processing, and the brain, largely focusing on the work of Gualtiero Piccinini and Corey Maley. A starting point for this session is their Stanford Encyclopedia of Philosophy article on “Computation in Physical Systems”. Many current assumptions about computation in the brain stem from the Church-Turing thesis, which often leads to a poor fit between model and experiment. Piccinini and Maley propose that the Church-Turing-Deutsch thesis is preferable when talking about systems that perform non-digital computations. Amanda Nelson pointed out that it makes sense to think of evolved biological systems (brains) as instances of analogue computers. Another interesting point from the session is the distinction between digital (von Neumann) computers and alternatives such as “physical” or “analog” computation, which would be picked up on in the next session.

Physical Computation Session I from June 24 (roughly one hour in length).

The second session focused on physical computation and led us to discuss the idea of pancomputationalism. While pancomputationalism is the fundamental assumption behind the phrase “the brain is a computer” [1], we were also introduced to pancomputationalism in ferrofluidic systems and mycelial networks. We discussed the works of Richard Feynman (Feynman Lectures on Computation) and Edward Fredkin (Digital Physics), which helped us form an epistemic framework for computation in nature [2]. We also discussed Andrew Adamatzky’s work on unconventional computation, particularly his work on Reaction-Diffusion (R-D) Automata, which, while discrete in nature, have connections to excitable (e.g., neural) systems via the FitzHugh-Nagumo model.

Physical Computation Session II from July 1 (roughly one hour in length)
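As a concrete aside, the FitzHugh-Nagumo model that links R-D automata to excitable systems is simple enough to integrate in a few lines of Python. This is a minimal sketch using forward Euler, with common textbook parameter values of my own choosing rather than anything from the session:

    # Forward-Euler integration of the FitzHugh-Nagumo model:
    #   dv/dt = v - v^3/3 - w + I      (fast voltage-like variable)
    #   dw/dt = eps * (v + a - b*w)    (slow recovery variable)
    # Parameter values below are common textbook choices.

    def fitzhugh_nagumo(I=0.5, a=0.7, b=0.8, eps=0.08, dt=0.01, steps=30000):
        v, w = -1.0, 1.0
        trace = []
        for _ in range(steps):
            dv = v - v**3 / 3.0 - w + I
            dw = eps * (v + a - b * w)
            v, w = v + dt * dv, w + dt * dw
            trace.append(v)
        return trace

    trace = fitzhugh_nagumo()
    # In this regime the system fires repetitively; count upward
    # threshold crossings of v = 1.0 as a rough spike count.
    spikes = sum(1 for p, q in zip(trace, trace[1:]) if p < 1.0 <= q)
    print("spikes:", spikes)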

After taking a break from the topic, our July 15 meeting featured an alternative viewpoint on pancomputationalism. This was made manifest in a shorter discussion on physical computation, with views from Tommaso Toffoli and Stephen Wolfram. We covered Toffoli’s paper “Action, or the fungibility of computation”, which connects physical entropy, information, action, and the amount of computation performed by a system. This paper is of great interest to the group in light of our work and discussions on 4E (embodied, embedded, enactive, and extended) cognition [3]. Toffoli makes some provocative arguments herein, including the notion of computation as “units of action”. A concrete example of this is a 10-speed bicycle, which is not only not a conventional computer, but also has linkages to perception and action. Amanda Nelson found the notion of transformation from one unit into another particularly salient to the distinction between analogue and digital computation. The physical basis of all forms of computation can also be better defined by revisiting “A New Kind of Science” [4], in which Wolfram sketches out the essential components and analogies of a computational system with a physical substrate. We can then compare some of the more abstract aspects of a physical computer with neural systems. This is particularly relevant to engineered systems that include select components of biological networks.

Physical Computation Session III from July 15 (about 15 minutes in length)

The next session followed up on computation in natural systems, as well as Wolfram’s notion of universality, particularly in terms of computational models. In particular, Wolfram argues that cellular automata models can characterize universality, which is related to pancomputationalism. Universality suggests that a single computational model can capture system behavior across a wide variety of domains; in this sense, context is not important. Rule 30 produces an output that resembles pattern formation in biological phenotypes (the shell of the snail species Conus textile), but can also be used as a pseudo-random number generator [5]. In “A Framework for Universality in Physics, Computer Science, and Beyond”, this perspective is extended to understand the connections between computation defined by the Turing machine and a class of models called spin models. This provides a framework for universality that is useful for defining computation across the various levels of neural systems, but it also gives rise to understanding what is uncomputable. This session’s natural-system examples featured computation among bacterial colonies embedded in a colloidal substrate, along with computation in granular matter itself. The latter is an example of non-silicon-based polycomputation [6].

Physical Computation Session IV from July 22 (about 12 minutes in length).
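Rule 30 itself is also easy to experiment with. The following generic Python sketch (not code from the session) evolves the automaton from a single seed cell, printing the familiar chaotic triangle and collecting the center-column bits that Wolfram proposed as a pseudo-random stream:

    # Rule 30 elementary cellular automaton (wrap-around boundaries).
    # Each cell's next state is a function of its (left, self, right)
    # neighborhood; the rule number 30 (binary 00011110) encodes the
    # full truth table.

    def rule30_step(cells):
        n = len(cells)
        return [
            (30 >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
            for i in range(n)
        ]

    width, steps = 63, 32
    row = [0] * width
    row[width // 2] = 1          # single seed cell in the center

    center_bits = []
    for _ in range(steps):
        print("".join("#" if c else " " for c in row))
        center_bits.append(row[width // 2])
        row = rule30_step(row)

    # Wolfram's suggestion: the center column serves as a pseudo-random stream.
    print("center-column bits:", "".join(map(str, center_bits)))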

After taking a more extended break from the topic, we returned to this discussion four weeks later, in our August 19 meeting. This session covered three topics: physical computation and topology, morphological computation, and RNA computing/Molecular Biology as a universal computer.

We have discussed category theory before, in our sessions on Symbolic Systems and Causality. In this session, we revisited the role of category theory, this time with reference to Physical Computation. John Carlos Baez and Mike Stay give a tour of category theory’s role in computation via topology. The idea is that category theory forms analogies with computation, which can be expressed on a topological surface or space.

Computable Topology, Wikipedia.

Baez, J. and Stay, M. (2009). Physics, Topology, Logic and Computation: A Rosetta Stone. arXiv, 0903.0340.

Mapping category theory operators to a topological description.

We also covered the role of Morphological Computation by reviewing three papers on this form of physical computation that intersects with digital computational representations. Morphological Computation is the role of the body in the notion that “cognition is computation”. One idea critiqued in these papers is offloading from the brain to the body. Offloading is moving computational capacity from the central nervous system to the periphery. If you grab a ball with your hand, you recognize the ball and send commands to grasp it, but you must grasp and otherwise manipulate the object to fully compute it. Thus, this capacity is said to be offloaded to the hand or the peripheral nervous system.

Interestingly, offloading and embodiment are integral parts of 4E (Embodied, Embedded, Enactive, and Extended) Cognition, which itself critiques the brain-as-computation idea. But as an analytical tool, morphological computation is much more utilitarian than Cognitive Science theory, and is concerned with how robotic bodies and other mechanical systems interact with an intelligent controller. In non-embodied robotics, body dynamics are treated as noise. But in morphological computation, body dynamics play an integral role in the intelligent system and contribute to a dynamical system.
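As a toy illustration of offloading (my own construction, not an example from the reviewed papers), the sketch below has the controller issue a single coarse setpoint, after which the body’s passive spring-damper dynamics settle the “hand” onto the target without any further commands:

    # Toy "offloading" sketch: the controller (brain) issues one coarse
    # setpoint command; the body's passive spring-damper dynamics settle
    # the hand onto the target with no further neural computation.
    # All names and parameter values are illustrative assumptions.

    def reach(target, k=4.0, c=2.0, dt=0.01, steps=1000):
        x, v = 0.0, 0.0      # hand position and velocity
        u = target           # the single command issued by the controller
        for _ in range(steps):
            accel = -k * (x - u) - c * v   # spring-damper "body" dynamics
            v += dt * accel
            x += dt * v
        return x

    # The hand settles near the target even though the "brain" computed
    # nothing after the initial command.
    print(reach(1.0))   # ~1.0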

Muller, V.C. and Hoffmann, M. (2017). What Is Morphological Computation? On How the Body Contributes to Cognition and Control. Artificial Life, 23, 1–24.

Fuchslin, R.M., Dzyakanchuk, A., Flumini, D., Hauser, H., Hunt, K.J., Luchsinger, R.H., Reller, B., Scheidegger, S., and Walker, R. (2013). Morphological Computation and Morphological Control: Steps Toward a Formal Theory and Applications. Artificial Life, 19, 9–34.

Milkowski, M. (2018). Morphological Computation: Nothing but Physical Computation. Entropy, 20, 942.

The three insights from our morphological computational discussion.

While these papers do not delve too deeply into the role of pancomputation in Morphological Computation, it is implicitly present and plays a central role in our last topic: RNA computing and Molecular Biology. For more information, see this talk on YouTube and the paper below. Basically, while the pancomputationalist perspective is largely missing from biology, the structure and potential function of DNA and RNA provide a route to physical computation.

Akhlaghpour, H. (2022). An RNA-based theory of natural universal computation. Journal of Theoretical Biology, 537, 110984.

Bringing pancomputationalism into biology? What is its value?

Thanks to Morgan Hough for joining us from Hawaii (4:00 am!) on August 19.

References

[1] Richards, B.A. and Lillicrap, T.P. (2022). The Brain-Computer Metaphor Debate Is Useless: A Matter of Semantics. Frontiers in Computer Science, 4, 810358.

Should we just simply “shut up and calculate”, or debate some more?

[2] Fredkin, E. (2003). An Introduction to Digital Philosophy. International Journal of Theoretical Physics, 42(2), 189.

This work is the Rosetta Stone for many comparisons between modern AI systems and human-like intelligence, at least in terms of computation.

[3] Newen, A., DeBruin, L., and Gallagher, S. (2018). The Oxford Handbook of 4E Cognition. Oxford University Press.

[4] Wolfram, S. (2002). A New Kind of Science. Wolfram Media.

This is a link to the 20th Anniversary edition, with a full set of Cellular Automata rules, defined by number.

[5] Zenil, H. (2016). How can I generate random numbers using the Rule 30 Cellular Automaton? Quora post.

[6] Bongard, J. and Levin, M. (2023). There’s Plenty of Room Right Here: Biological Systems as Evolved, Overloaded, Multi-Scale Machines. Biomimetics, 8(1), 110.
