
November 27, 2020

Open-source Community-building Discussions

As part of my role as Community Manager at the Rokwire Initiative, I have been maintaining content on a personal blog. I am highlighting some of the systems-oriented posts here on Synthetic Daisies. The first re-post summarizes the functional aspects of open-source communities, while in the second re-post I propose that open-source communities are actually a form of collective intelligence. The third re-post, a revisitation of wicked problems, should be of particular interest to this blog's audience.


Striving for Issue 0 (why do we have a community)

Why do we want to build a community in the first place? This question may seem self-evident on its surface, even in the open-source context. But it is good to state the reasons why we might want to create a community, and more importantly, to think about communities as having explicitly stated benefits. Communities allow us to tap into a heterogeneous pool of expertise, build upon a network of contributors, recruit collaborators for future projects, and facilitate education.

What direction does the action flow? A countdown to the essential issue, or enumeration of the issues?

The first benefit of an open-source community is that it allows a single contributor to interact quickly with many other contributors who have a wide range of skills and expertise. For example, if you are a technical writer and have a question about a new programming language, you can post your inquiry to the Slack channel and receive a helpful answer much more quickly than by making a blanket inquiry or searching for the answer yourself. This actually sounds like the guiding principle of a University, but with a much more decentralized and informal structure.

Unlike a University, open-source communities are inherently interconnected [1], and as such serve to form a social network of contributors. Our hypothetical Slack channel is part of a Slack team (with many channels), and the Slack team is in turn a part of the entire community. As with a traditional workplace or school, this involves the interactions of many people, often with different roles and interests. A community allows for people who might not otherwise interact to cross paths, and mutually benefit from this serendipity [2].

Example of network structure in an open source community. Map of openrpg bug report interactions. From [3].

By their nature, open-source community networks are broad and loosely integrated. This is by design; many open-source contributors have other commitments and contribute erratically to the project. But this is a good thing! One benefit of this structure is that it enables participation by people who would otherwise not make the time commitment. Related to this is the existence of a talent pool that is poised to build out an existing initiative, or engage in a new one. This alleviates the need to recruit and incentivize people using conventional hiring mechanisms.

But these relationships should not be designed to be exploitative. In fact, one ingredient of a healthy and stable open-source community is the feeling that contributors are able to benefit from their participation. Short of financial compensation, there are many other ways that contributors can benefit. One of these is becoming educated about the software platform itself. Educational incentives such as badging systems (microcredentials) serve as small-scale incentives. Being a contributor also allows one to learn about the latest features, and even have a hand in developing them. Another way open-source communities can reinforce education is through outreach, particularly in the form of an Ambassadors program [4]. A focus on learning opportunities centered on a specific software platform, in addition to learning and promoting associated skills, builds participation incentives into the community.

In these ways, we can take a step back from issues and builds and think about how we can leverage the community as a benefit of the platform. Then we can work towards addressing Issue 0: building and maintaining a community.

NOTES:

[1] Teixeira, J., Robles, G., and Gonzalez-Barahona, J.M. (2015). Lessons learned from applying social network analysis on an industrial Free/Libre/Open Source Software ecosystem. Journal of Internet Services and Applications, 6, 14.

[2] Achakulvisut, T., Ruangrong, T., Acuna, D.E., Wyble, B., Goodman, D., and Kording, K. (2020). neuromatch: Algorithms to match scientists. eLife Labs, May 18.

[3] Crowston, K. and Howison, J. (2005). The social structure of free and open source software development. First Monday, 10(2).

[4] Starke, L. (2016). Building an Online Community Ambassador Program. Higher Logic blog, July 28.

The Role of Collective Intelligence in Open-source

What do ants have to do with open-source?

Ants (and social insects more generally) are capable of building structures that feature great complexity and labor specialization. These complicated structures result from the small contributions of individual ants, each of which has a specialized job in their community [1]. In this sense, ant colonies exhibit parallels with open-source collaboration. Both types of organization (ant societies and open-source communities) rely upon collective intelligence, or the wisdom and power of crowds, to create an artifact of great complexity shaped by design principles that emerge from these interactions.

Collective intelligence can be defined as intentional behaviors based on a coordinated set of goals among multiple individuals, and emerges from various forms of collaboration, competition, and group effort. For systems ranging from insect colonies to human societies [2], collective intelligence is a prime enabler of coordinated social behaviors and movements [3]. Being aware of how this process works is important for making the most of an open-source effort.

Traditional crowdsourcing can be understood as cooperation between two groups of people: requesters and contributors [4]. Requesters can be thought of as people who want a functional artifact, but may not be able to implement the solution. Contributors are people with technical know-how who realize the initial specification. I have discussed the open-source ethos and related contributor motivations in other posts on this blog. In this post, the focus will be on how requesters and contributors work together to produce a coherent outcome.

The find-fix-verify (FFV) workflow [5] has been identified as a facilitator of collective intelligence in open-source projects. FFV involves three steps: 1) finding room for improvement, such as a new feature or an error to correct, 2) proposing specific solutions for those improvements, and 3) verifying that such changes are acceptable solutions. FFV allows for a division of labor in open-source communities based on expertise and technical skill. Yet each step is not executed by a dedicated employee reporting to a supervisor. Rather, each step is performed by groups of contributors who make their own contributions based on personal experiences and context.
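As a rough illustration, here is a minimal sketch of an FFV-style pipeline in Python; the Issue class, contributor names, and acceptance criterion below are invented for illustration rather than drawn from any particular project.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Issue:
    """A candidate improvement moving through the find-fix-verify pipeline."""
    description: str
    fixes: List[str] = field(default_factory=list)  # proposed solutions
    verified: bool = False                          # accepted by verifiers?

def find(notes):
    """Find: contributors flag room for improvement (new features, errors)."""
    return [Issue(note) for note in notes if "TODO" in note or "bug" in note]

def fix(issue, contributors):
    """Fix: a different group of contributors proposes specific solutions."""
    issue.fixes = [f"{name}: patch for '{issue.description}'" for name in contributors]
    return issue

def verify(issue, min_fixes=2):
    """Verify: a third group accepts the issue once enough candidate fixes exist (toy criterion)."""
    issue.verified = len(issue.fixes) >= min_fixes
    return issue

notes = ["bug: login fails on Safari", "refactor docs", "TODO: add badge system"]
for issue in [verify(fix(i, ["ana", "raj", "mei"])) for i in find(notes)]:
    print(issue.description, "->", "accepted" if issue.verified else "pending")

Each function stands in for a different crowd of contributors, which is the point of FFV: no single person owns all three stages.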

In terms of organization, there are three ways in which open-source communities can be organized [4]. These are demonstrated graphically using Paul Baran's networking diagram. The first is through direct leadership (centralized, A), which typically involves a single coordinator. This is most typically found in traditional corporate or academic organizations, and is often the least effective at harnessing the power of open-source.

Open-source communities can also be organized through collaboratives (decentralized, B), which involves the coming-together of people with a common interest. This form of organization is usually maintained over the long-term through expedient subgroups that form and dissolve given the immediate imperative. Finally, the passive mode of organization (distributed, C) is perhaps the most effective mode for facilitating open-source. In this type of organization, members of the crowd work independently, and in fact may never collaborate directly. This mode resembles so-called leaderless movements [6]. While collaborative and passive organizational modes have their advantages for facilitating open-source, there is no one-size-fits-all solution. The optimal organizational structure is often project- and goal-dependent.

Paul Baran’s Networking Diagram
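To make the three modes concrete, here is a minimal sketch using the networkx library; the node counts and wiring are arbitrary toy choices, not a reconstruction of Baran's original figures.

import networkx as nx

def centralized(n=8):
    """A: a single coordinator connected to every other node (a star)."""
    return nx.Graph([(0, i) for i in range(1, n)])

def decentralized(hubs=3, leaves=3):
    """B: several local hubs, each with its own small cluster, linked hub-to-hub."""
    g = nx.Graph()
    for h in range(hubs):
        g.add_edges_from((f"hub{h}", f"hub{h}-leaf{l}") for l in range(leaves))
    g.add_edges_from((f"hub{h}", f"hub{(h + 1) % hubs}") for h in range(hubs))
    return g

def distributed(rows=3, cols=3):
    """C: no privileged node; each member connects only to nearby peers (a lattice)."""
    return nx.grid_2d_graph(rows, cols)

for name, g in [("centralized", centralized()),
                ("decentralized", decentralized()),
                ("distributed", distributed())]:
    degrees = [d for _, d in g.degree()]
    print(f"{name:13s} nodes={g.number_of_nodes():2d} max_degree={max(degrees)}")

The maximum degree shrinks as we move from A to C, which is one simple way to quantify how much any single contributor acts as a bottleneck.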

A great majority of open-source contributors are transient [7]. Generally, these people contribute to a single project, and within that project only contribute to a small portion of the codebase. Most open-source projects have a few key leaders (occasionally not dissimilar to queen ants) who coordinate and facilitate project management [8]. But this is only a coordination tactic; across hundreds or even thousands of contributors, great things can happen.

NOTES:

[1] Bonabeau, E., Dorigo, M., and Theraulaz, G. (1999). Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, New York.

[2] O’Bryan, L., Beier, M., and Salas, E. (2020). How Approaches to Animal Swarm Intelligence Can Improve the Study of Collective Intelligence in Human Teams. Journal of Intelligence, 8(1), 9.

[3] Sasaki, T. and Biro, D. (2017). Cumulative culture can emerge from collective intelligence in animal groups. Nature Communications, 15049.

[4] Bigham, J.P., Bernstein, M.S., and Adar, E. (2014). Human-Computer Interaction and Collective Intelligence. Chapter 2, Collective Intelligence Handbook.

[5] Bernstein, M.S., Little, G., Miller, R.C., Hartmann, B., Ackerman, M.S., Karger, D.R., Crowell, D., and Panovich, K. (2015). Soylent: A Word Processor with a Crowd Inside. Communications of the ACM, 58(8), 85-94. doi:10.1145/2791285.

[6] Alicea, B. (2012). Leaderless control: understanding unguided order. Synthetic Daisies blog, April 9.

[7] Cui, X. and Stiles, E. (2010). Workings of Collective Intelligence within Open Source Communities. Third International Conference on Social Computing, Behavioral Modeling, and Prediction, Bethesda, MD.

[8] Alicea, B. (2020). Building a Distributed Virtual Laboratory Adjacent to Academia. MetaArXiv, doi:10.31222/osf.io/4k3z6.


Infinite Issues (issue infinity): how to break down a wicked problem

In this blog post, I want to discuss how to break down a big problem into a set of smaller issues that are addressable on a short timescale. Previously, I discussed how to break a complex problem down into addressable action items. Now I want to do this in the context of wicked problems, or problems that are highly complex, hard to predict, and have multiple unintended or unforeseen outcomes.

Solving a basic problem requires you to take its most salient and/or interesting features, and then establish the outlines of a solution. This might involve breaking the problem up into more easily solved parts, or asking additional questions about the nature and context of the problem. Next is a consideration of what resources you might need. In an open-source project, many of these resources revolve around the time constraints of contributors. Therefore, one key to creating issues is to break down the problem into small pieces that are both relatively easy to solve and require a low time commitment (a toy sketch of this appears below). Individually, these might seem too small to matter. Taken together, however, they allow you to build large-scale applications.
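Here is that toy sketch in Python, with a hypothetical effort estimate (in hours) attached to each candidate piece and an arbitrary four-hour threshold for what counts as "small":

MAX_HOURS = 4  # hypothetical budget for a single volunteer-sized issue

def decompose(task, hours):
    """Recursively halve oversized tasks into smaller sub-issues."""
    if hours <= MAX_HOURS:
        return [(task, hours)]
    return (decompose(task + " (part A)", hours / 2) +
            decompose(task + " (part B)", hours / 2))

backlog = [("design notification API", 16), ("fix typo in README", 1)]
issues = [piece for task, hours in backlog for piece in decompose(task, hours)]
for title, hours in issues:
    print(f"{hours:>4.1f}h  {title}")

The individual issues look trivially small, but the full backlog still covers the original problem.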

Of course, this procedure works well for normal complex problems, such as “let’s build a smart watch”! But suppose we want to work on a problem that interacts strongly with social systems, such as how to mitigate a pandemic. Then we have to think about the problem in terms of so-called wicked problems.

Wicked problems have a very high degree of computational complexity, which translates into a system with so many moving parts that it cannot be analyzed in a way that provides an exact answer. Wicked problems share several hallmarks:

1) ill-defined problems reign supreme: it is hard to define the problem, or even a set of issues, at least initially.

2) all solutions to a given problem are at best a guess. This includes attempts to break down the problem into salient issues. Most issues in wicked problems will require approximating the salient issues, as well as forms of rapid prototyping to refine issues as the problem domain becomes more familiar.


Wicked Problems from a design perspective. Image is from Figure 1 in [1].

3) no natural end-point, where a system does not have clearly defined stopping points.

4) so-called messes: interactions between subdomains whose problems cannot be easily broken up into discrete parts [for more, see 2]. This might be because your problem domain has porous boundaries/categories or because the problem itself is highly variable over time.

One example of a wicked problem is an institutional response to COVID. The University of Illinois has mounted such a response fairly effectively, but doing so has involved innovations on multiple fronts. Part of this has involved Safer Illinois, which is a normal complex problem. But Safer is necessary but not sufficient for solving this problem. The real success has come through multiple institutional components such as testing regimens and building ambassadors. While an app designed to manage one part of the pandemic response is a basic problem, the response as a whole is a wicked problem. This is something to consider when we think about how our contributions fit into larger issues.

NOTES:

[1] Jobst, B. and Meinel, C. (2013). How Prototyping Helps to Solve Wicked Problems. Design Thinking Research, 105-113.

[2] Yeh, R.T. (1991). System Development as a Wicked Problem. International Journal of Software Engineering and Knowledge Engineering, 1(2), 117-130.

April 10, 2020

fQXi essay on the Undecidable, Uncomputable, and Unpredictable


It's that time of year again: the fQXi essay contest for 2020 is going strong! Every 12-24 months, fQXi (Foundational Questions Institute) sponsors an essay contest on a different topic. The fQXi community [1] then responds to the essays using a ratings and comment system. This year's topic was "Undecidability, Uncomputability, and Unpredictability", a topic applicable not only to physics, but to fields ranging from Computer Science to Sociology and even Biology. Check out the collection of submissions for some incredibly creative takes on the topic.


Along with Orthogonal Research and Education Lab members Jesse Parent (@JesParent on Twitter) and Ankit Gupta (@ankiitgupta7 on Twitter), I submitted an essay called "The illusion of structure or insufficiency of approach? the un(3) of unruly problems".

I have also posted several essays from years past as part of a ResearchGate project. These include "Establishing the Phenomenological Conditions of Intention-like Goal-oriented Behavior" from 2016 and "Towards the meta-fundamental: introducing intercontextual invariants" from 2018.

A few weeks after submitting this year's essay, I discovered the work of Nicolas Gisin, who has published a series of papers [2] on alternative forms of mathematics (such as intuitionism) for describing complex systems. While his examples are limited to physics, they are a complement to this year's essay.

NOTES:
[1] for some stimulating internet discussion, check out the Alternative Models of Reality section of the fQXi community.

[2] Gisin, N. (2020). Mathematical languages shape our understanding of time in physics. Nature Physics, 16, 114–116.

February 16, 2019

Darwin meets Category Theory in the Tangential Space

For this Darwin Day (February 12), I would like to highlight the relationship between evolution by natural selection and something called category theory. While this post will be rather tangential to Darwin's work itself, it should be good food for thought with respect to evolutionary research. As we will see, category theory also has relevance to many types of functional and temporal systems (including those shaped by natural selection) [1], which is key to understanding how natural selection shapes individual phenotypes and populations more generally.

This isn't the last you'll hear from me in this post!

Category Theory originated in the applied mathematics community, particularly with the "General Theory of Natural Equivalences" [2]. In many ways, category theory is familiar to those with conceptual knowledge of set theory. Uniquely, category theory deals with the classification of objects and the transformations (mappings) between them. However, category theory is far more powerful than set theory, and serves as a bridge to formal logic, systems theory, and classification.

A category is defined by two basic components: objects and morphisms. An example of objects is a collection of interrelated variables or discrete states. Morphisms are things that link objects together, either structurally or functionally. This provides us with a network of paths between objects that can be analyzed using categorical logic, and allows us to define a composition (or path) by tracing through the set of objects and morphisms (so-called diagram chasing) to find a solution.
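A minimal sketch of these ideas in Python, using an invented toy category with three objects and two composable morphisms:

class Morphism:
    """An arrow with a domain, a codomain, and a rule for getting from one to the other."""
    def __init__(self, name, dom, cod, rule):
        self.name, self.dom, self.cod, self.rule = name, dom, cod, rule

    def __call__(self, x):
        return self.rule(x)

def compose(g, f):
    """g after f: defined only when the codomain of f matches the domain of g."""
    assert f.cod == g.dom, "morphisms are not composable"
    return Morphism(f"{g.name} after {f.name}", f.dom, g.cod, lambda x: g(f(x)))

# Toy category: States --count--> Integers --parity--> Parity
count = Morphism("count", "States", "Integers", lambda states: len(states))
parity = Morphism("parity", "Integers", "Parity", lambda n: "even" if n % 2 == 0 else "odd")

path = compose(parity, count)  # a composite path found by chasing the diagram
print(path.dom, "->", path.cod, ":", path(["on", "off", "idle"]))  # States -> Parity : odd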

In this example, a pie recipe is represented as a category with objects (action steps) and morphisms (ingredients and results). This monoidal preorder can be added to as the recipe changes. From [3]. Click to enlarge.

Categories can also consist of classes: classes of objects might include all objects in the category, while classes of morphisms include all relational information such as pathways and mappings. Groupoids are functional descriptions, and allow us to represent generalizations of group actions and equivalence relations. These modeling-friendly descriptions of a discrete dynamic system are quite similar to object-oriented programming (OOP) [4]. One biologically-oriented application of category theory can be found in the work of Robert Rosen, particularly topics such as relational biology and anticipatory systems.

Animal taxonomy according to category theory. This example focuses on exploring existing classifications, from species to kingdom. The formation of a tree from a single set of objects and morphisms is called a preorder. From [3]. Click to enlarge.

One potential application of this theory to evolution by natural selection is to establish an alternate view of phylogenetic relationships. By combining category theory with feature selection techniques, it may be possible to detect natural classes that correspond to common ancestry. Related to the discovery of evolutionary-salient features is the problem of phylogenetic scale [5], or hard-to-interpret changes occurring over multiple evolutionary timescales. Category theory might allow us to clarify these trends, particularly as they relate to evolving life embedded in ecosystems [6] or shaped by autopoiesis [7]. 

More relevant to physiological systems that are shaped by evolution are gene regulatory networks (GRNs). While GRNs can be characterized without the use of category theory, they also present an opportunity to produce an evolutionarily-relevant heteromorphic mapping [8]. While a single GRN structure can have multiple types of outputs, multiple GRN structures can also give rise to the same or similar output [8, 9]. As with previous examples, category theory might help us recast these otherwise super-complex phenomena (and "wicked" problems) as well-composed systems-level representations.
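As a toy illustration of this many-to-one mapping (invented here, not drawn from [8] or [9]), two differently wired Boolean "circuits" for a hypothetical target gene can compute exactly the same input-output function:

from itertools import product

def circuit_1(a, b):
    """Target gene switches on only when both activators A and B are present."""
    return a and b

def circuit_2(a, b):
    """Same behavior built from repressor logic: on unless either repressor path fires."""
    return not ((not a) or (not b))

inputs = list(product([False, True], repeat=2))
table_1 = {pair: circuit_1(*pair) for pair in inputs}
table_2 = {pair: circuit_2(*pair) for pair in inputs}
print("different structure, identical output:", table_1 == table_2)  # True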


NOTES:
[1] Spivak, D.I. (2014). Category theory for the sciences. MIT Press, Cambridge, MA.

[2] Eilenberg, S. and MacLane, S. (1945). General theory of natural equivalences. Transactions of the American Mathematical Society, 58, 231-294. doi:10.1090/S0002-9947-1945-0013131-6 

[3] Fong, B. and Spivak, D.I. (2018). Seven Sketches in Compositionality: an invitation to applied category theory. arXiv, 1803:05316.

[4] Stepanov, A. and McJones, P. (2009). Elements of Programming. Addison-Wesley Professional.

[5] Graham, C.H., Storch, D., and Machac, A. (2018). Phylogenetic scale in ecology and evolution. Global Ecology and Biogeography, doi:10.1111/geb.12686.

[6] Kalmykov, V.L. (2012). Generalized Theory of Life. Nature Precedings, 10101/npre.2012.7108.1.

[7] Letelier, J.C., Marin, G., and Mpodozis, J. (2003). Autopoietic and (M,R) systems. Journal of Theoretical Biology, 222(2), 261-272. doi:10.1016/S0022-5193(03)00034-1.

[8] Payne, J.L. and Wagner, A. (2013). Constraint and contingency in multifunctional gene regulatory circuits. PLoS Computational Biology, 9(6), e1003071. doi:10.1371/journal.pcbi.1003071.

[9] Ahnert, S.E. and Fink, T.M.A. (2016). Form and function in gene regulatory networks: the structure of network motifs determines fundamental properties of their dynamical state space. Journal of the Royal Society Interface, 13(120), 20160179. doi:10.1098/rsif.2016.0179.

February 12, 2018

Darwin as a Universal Principle

Background Diagram: Mountian-Sky-Astronomy-Big-Bang blog.

For this year's Darwin Day post, I would like to introduce the concept of Universal Darwinism. To understand what is meant by Universal Darwinism, we need to explore the meaning of the term as well as the many contexts to which Darwinian ideas have been applied. The most straightforward definition of Universal Darwinism is a Darwinian process that can be extended to any adaptive system, regardless of its suitability. Darwinian processes can be boiled down to three essential features:
1) production of random diversity/variation (or stochastic process).  
2) replication and heredity (reproduction, historical contingency). 
3) natural selection (selective mechanism based on some criterion). 
A fourth feature, one that underlies all three of these points, is the production and maintenance of populations (e.g. population dynamics). These features are a starting point for many applications of Universal Darwinism; a minimal computational sketch of them appears below. Depending on the context of the application, these four features may be emphasized in different ways or additional features may be added.
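Here is that sketch, in the form of a toy evolutionary loop in Python; the bitstring genomes, ones-counting fitness function, and parameter values are arbitrary illustrative choices, not a model of any particular system.

import random
random.seed(1)

GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 40, 0.02

def fitness(genome):            # the selection criterion: count the 1s
    return sum(genome)

def mutate(genome):             # feature 1: random variation
    return [1 - g if random.random() < MUTATION_RATE else g for g in genome]

def reproduce(parent_a, parent_b):  # feature 2: replication and heredity (with crossover)
    cut = random.randrange(GENOME_LEN)
    return mutate(parent_a[:cut] + parent_b[cut:])

# the fourth feature: a population that persists across generations
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for generation in range(GENERATIONS):
    # feature 3: natural selection, where the fitter half becomes the parent pool
    parents = sorted(population, key=fitness, reverse=True)[:POP_SIZE // 2]
    population = [reproduce(*random.sample(parents, 2)) for _ in range(POP_SIZE)]

print("best fitness after selection:", max(fitness(g) for g in population))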

Taken collectively, these three features constitute many different types of process, ranging from evolutionary epistemology [1] to cultural systems [2], neural systems [3, 4], physical systems [5, 6], and informational/cybernetic systems [7, 8]. Many of these universal applications are explicitly selectionist, and do not have uniform fitness criteria. In fact, fitness is assumed in the adaptive mechanism. This provides a very loose analogy to organismal evolution indeed.

Universal computational model shaped by Darwinian processes. COURTESY: Dana Edwards, Universal Darwinism and Cyberspace.

Of these, the application to cybernetic systems is the most general. Taking inspiration from both cybernetics theory and the selectionist aspects of Darwinian models, Universal Selection Theory [7, 8] has four basic claims that can be paraphrased in the following three statements:
1) "operate on blindly-generated variation with selective retention". 
2) "process itself reveals information about the environment". 
3) "processes built atop selection also operate on variation with selective retention".
The key notions are that evolution acts to randomly generate variation, retains only the most fit solutions, then builds upon this in a modular and hierarchical manner. In this way, universal Darwinian processes act to build complexity. As with the initial list of features, the formation and maintenance of populations is an important bootstrapping and feedback mechanism. Populations and heredity underlie all Darwinian processes, even if they are not defined in the same manner as biological populations. Therefore, all applications of Darwinian principles must at least provide an analogue to dynamic populations, even at a superficial level.

There is an additional advantage of using universal Darwinian models: capturing the essence of Darwinian processes in a statistical model. Commonalities between Darwinian processes and Bayesian inference [3, 5] can be proposed as a mechanism for change in models of cosmic evolution. In the Darwinian-Bayesian comparison, heredity and selection are approximated using the relationship between statistical priors and empirical observation. The theoretical and conceptual connections between phylogeny, populations, and Bayesian priors are a post-worthy topic in and of themselves.

At this point, we can step out a bit and discuss the origins of universal Darwinian systems. The origin of a Darwinian (or evolutionary) system can take a number of forms [9]. There are two forms of "being from nothingness" in [9] that could be proposed as origin points for Darwinian systems. The first is an origin in the lowest possible energetic (or in our case also fitness) state, and the other is what exists when you remove the governance of natural laws. While the former is easily modeled using variations of the NK model (which can be generalized across different types of systems), the latter is more interesting and is potentially even more universal.

An iconic diagram of Cosmic Evolution. COURTESY: Inflation Theory by Dr. Alan Guth.


An iconic diagram of Biological Evolution. COURTESY: Palaeontological Scientific Trust (PAST).

So did Darwin essentially construct a "theory of everything" over 150 years ago? Did he find "42" in the Galapagos while observing finches and tortoises? There are a number of features from complexity theory that might also fit into the schema of Darwinian models. These include concepts from self-organization not explicitly part of the Darwinian formulation: scaling and complexity, dependence on initial conditions, tradeoffs between exploitation and exploration, and order arising from local interactions in a disordered system. More explicitly, contributions from chaos theory might provide a bridge between nonlinear adaptive mechanisms and natural selection.

The final relationship I would like to touch on here is a comparison between Darwinian processes and Universality in complex systems. The simplest definition of Universality states that the properties of a system are independent of the dynamical details and behavior of the system. Universal properties such as scale-free behavior [10] and conformation to a power law [11] occur in a wide range of systems, from biological to physical and from behavioral to social systems. Much like applications of Universal Darwinism, Universality allows us to observe commonalities among entities as diverse as human cultures, organismal orders/genera, and galaxies/universes. The link to Universality also provides a basis for the abstraction of a system's Darwinian properties. This is the key to developing more representationally-complete computational models.

8-bit Darwin. COURTESY: Diego Sanches.


Darwin viewed the development of his theory of evolution by natural selection as an exercise in inductive empiricism [12]. Ironically, people are now using his purely observational exercise as inspiration for theoretical mechanisms for systems from the natural world and beyond.


NOTES:
[1] Radnitzky, G.,‎ Bartley, W.W., and Popper, K. (1993). Evolutionary Epistemology, Rationality, and the Sociology of Knowledge. Open Court Publishing, Chicago. AND Dennett, D. (1995). Darwin's Dangerous Idea. Simon and Schuster, New York.

[2] Claidiere, N., Scott-Phillips, T.C., and Sperber, D. (2014). How Darwinian is cultural evolution? Philosophical Transactions of the Royal Society B, 369, 20130368.

[3] Friston, K. (2007). Free Energy and the Brain. Synthese, 159, 417-458.

[4] Edelman, G.M. (1987). Neural Darwinism: the theory of neuronal group selection. Oxford University Press, Oxford, UK.

[5] Campbell, J. (2011). Universal Darwinism: the path to knowledge. CreateSpace Independent Publishing.

[6] Smolin, L. (1992). Did the universe evolve? Classical and Quantum Gravity, 9, 173-191.

[7] Campbell, D.T. (1974). Unjustified Variation and Selective Retention in Scientific Discovery. In "Studies in the Philosophy of Biology", F.J. Ayala and T. Dobzhansky eds., pgs. 139-161. Palgrave, London.

[8] Cziko, G.A. (2001). Universal Selection Theory and the complementarity of different types of blind variation and selective retention. In "Selection Theory and Social Construction", C. Hayes and D. Hull eds. Chapter 2. SUNY Press, Albany, NY.

[9] Siegal, E. (2018). The Four Scientific Meanings Of ‘Nothing’. Starts with a Bang! blog, February 7.

[10] Barabási, A-L. (2009). Scale-Free Networks: a decade and beyond. Science, 325, 412-413.

[11] Lorimer, T., Gomez, F., and Stoop, R. (2015). Two universal physical principles shape the power-law statistics of real-world networks. Scientific Reports, 5, 12353.

[12] Ayala, F.J. (2009). Darwin and the Scientific Method. PNAS, 106(1), 10033–10039.

August 3, 2016

Slate and the Solitary Ethnographic Diagram

While his style and message do not resonate with me at all, I've always thought that Donald Trump's speeches were highly-structured rhetoric. He seems to be using a form of intersubjective signaling [1] that is understood by a number of constituencies as communicating their values in an authentic manner. Specifically, the speeches have a sentence structure and cadence that can be differentiated from the literalism of contemporary mainstream society or more traditional forms of doublespeak ubiquitous in American politics.

This is why the most recent challenge from Slate Magazine was too good to pass up. The challenge (which has the feel of a Will Shortz challenge): diagram a passage from a Donald Trump speech given on July 21 in Sun City, South Carolina. The passage is as follows:
"Look, having nuclear—my uncle was a great professor and scientist and engineer, Dr. John Trump at MIT; good genes, very good genes, OK, very smart, the Wharton School of Finance, very good, very smart—you know, if you’re a conservative Republican, if I were a liberal, if, like, OK, if I ran as a liberal Democrat, they would say I’m one of the smartest people anywhere in the world—it’s true!—but when you’re a conservative Republican they try—oh, do they do a number—that’s why I always start off: Went to Wharton, was a good student, went there, went there, did this, built a fortune—you know I have to give my like credentials all the time, because we’re a little disadvantaged—but you look at the nuclear deal, the thing that really bothers me—it would have been so easy, and it’s not as important as these lives are (nuclear is powerful; my uncle explained that to me many, many years ago, the power and that was 35 years ago; he would explain the power of what’s going to happen and he was right—who would have thought?), but when you look at what’s going on with the four prisoners—now it used to be three, now it’s four—but when it was three and even now, I would have said it’s all in the messenger; fellas, and it is fellas because, you know, they don’t, they haven’t figured that the women are smarter right now than the men, so, you know, it’s gonna take them about another 150 years—but the Persians are great negotiators, the Iranians are great negotiators, so, and they, they just killed, they just killed us"
Okay, here you go -- an ethnographic-style diagram [2] based on one man, but perhaps instructive of an entire American subculture (click to enlarge). The diagram focuses on the relationship between John and Donald Trump (context-specific braintrust) and a specific worldview of power wielded through nuclear weapons, financial ability, and persuasion.


NOTES:
[1] In this case, intersubjective signaling could be used as a mechanism to reinforce group cohesion, particularly when the group's belief structure is defined by epistemic closure.

[2] Perceived lack of agency shown as red arcs terminated with a dot.

May 31, 2015

Kuhnian Practice as a Logical Reformulation

Are 01110000 01100001 01110010 01100001 [1] shifts a loss, a gain, a mismatch, or an opportunity for intellectual integration and the birth of a new field?


In the Kuhnian [2] approach to empiricism, a well-known outcome observed across the history of science is the "paradigm shift". This occurs when a landmark finding shifts our pre-existing models of a given natural phenomenon. One example of this: Darwin's finches and their evolutionary history in the Galapagos. In this case, a model system confirmed previous intuitions and overturned old facts in a short period of time (hence the idea of a scientific revolution). 

During a recent lecture by W. Ford Doolittle at the Institute for Genomic Biology, I was introduced to a term called "Kuhn loss" [3]. Kuhn loss refers to the loss of accumulated knowledge due to a conceptual shift in a certain field. One might consider this to be a matter of housecleaning, or a matter of throwing out the baby with the bathwater. The context of this introduction was the debate between evolutionary genomicists [4] and the ENCODE consortium over the extent and nature of junk DNA. During the talk, Ford Doolittle presented the definitions of genome function proposed by the ENCODE consortium as a paradigm shift. The deeper intellectual history of biological function would suggest that junk DNA not only exists, but requires a multidisciplinary and substantial set of results to overturn. Thus, rather than viewing the ENCODE results [5] as a paradigm shift, they can be viewed as a form of intellectual loss. The loss, paradigmatic or otherwise, provides us with a less satisfying and robust explanation than was previously the case.

A poster of the talk. COURTESY: IGB, University of Illinois, Urbana-Champaign

Whether or not you agree with Ford Doolittle's views of function, and I am of the opinion that you should, this introduces an interesting PoS (philosophy of science) issue. In the case of biological function, the caution is against a 'negative' Kuhn loss. But Kuhn loss (in a linear view of historical progress) usually refers to the loss of knowledge associated with folk theories or theories based on limited observational power. In some cases, these limited observations are augmented with deeper intuitive motivations. This type of intuition-guided theory usually becomes untenable given new observations and/or information about the world. Phlogiston theory [6] can be used to illustrate this type of 'positive' Kuhn loss. Quite popular before the chemical revolution, phlogiston theory posits that the physical act of combustion releases fire-like elements called phlogistons. Phlogistons operate in a manner opposite to the role we now know oxygen serves in combustion and other chemical reactions. Another, less clear-cut example of 'positive' Kuhn loss involves a pre-relativity idea called aether theory, which posits that the aether (an all-enveloping medium) is responsible for the propagation of light through space.

In each of these cases, what was lost? Surely the conclusions that arose from a faulty premise needed to be re-examined. A new framework also swept away inadequate concepts (such as "the aether" and "phlogistons"). But there was also a deeper set of logical structures that needed to be reformulated. In phlogiston theory, the direction of causality was essentially reversed. In aether theory, we essentially have a precursor to a more sophisticated concept (spacetime). Scientific revolutions are not all equal, and so neither is the loss that results. In some cases, Kuhn losses can be recovered and contribute to the advancement of a specific theoretical framework. Midwinter and Janssen [7] introduce us to the physicist/chemist Van Vleck, who recovered some of the Kuhn loss incurred when quantum theory replaced its antecedent theory. Van Vleck did this by borrowing mathematical formalisms from the theory of susceptibilities and bringing them over to physics. While neither a restoration nor a paradigm shift, Van Vleck's work improved the ability of quantum theory to make experimental predictions.

Tongue-in-cheek description of an empirical verification of phlogiston theory. COURTESY: [8]

Now let us revisit the Kuhnian content of the ENCODE kerfuffle vis-a-vis this framework of positive/negative Kuhn loss and Kuhn recovery. Is this conceptual clash ultimately a chance for a gain in theoretical richness and conceptual improvement? Does the tension between computational and traditional views of biological function necessitate Kuhn loss (positive or negative)? According to the standard dialectical view [9], the answer to the former would be yes. In such a case, we might expect a paradigm shift that results in an improved version of the old framework (e.g. 'positive' Kuhn loss). But perhaps there is also a cultural mismatch at play here [10] that could be informative for all studies of Kuhn loss. Since these differing perspectives come from very different intellectual and methodological traditions, we could say that any Kuhn loss would be negative due to a mismatch. This is a bit different from the phlogiston example: while both approaches come from a scientific view of the world, they use different sets of assumptions to arrive at a coherent framework. However, what is more likely is that computational approaches (as new as they are to the biological domain) will become infused with older theoretical frameworks, resembling a Kuhnian recovery (the quantum/antecedent theory example) more than a loss or a gain.

It is this intellectual (and logical) reformulation that will mark the way forward in computational biology, using an integrative approach (as one might currently take for granted in biology) rather than reasoning through the biology and computation as parallel entities. While part of the current state of affairs involves technology-heavy computation being used to solve theoretically-challenging biological problems, better logical integration of the theory behind computational analysis and the theory behind biological investigation might greatly improve both enterprises. This might lead to new subfields such as the computation of biology, in which computation would be more than a technical appendage. Similarly, such a synthetic subfield would view biological phenomena much more richly, albeit with the same cultural biases as previous views of life. Most importantly, this does not take a revolution. It merely takes a logical reformulation, one that could be put into motion with the right model system.


NOTES:
[1] the word "paradigmatic", translated into binary. COURTESY: Ashbox Binary Translator.

[2] Kuhn, T.S.   The Structure of Scientific Revolutions. University of Chicago Press (1962).

[3] Hoyningen-Huene, P.   Reconstructing Scientific Revolutions. University of Chicago Press (1983).

[4] Doolittle, W.F.   Is junk DNA bunk? A critique of ENCODE. PNAS, 110(14), 5294-5300 (2013).

[5] The ENCODE Project Consortium   An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57-74 (2012).

[6] Vihalemm, R.   The Kuhn-loss Thesis and the Case of Phlogiston Theory. Science Studies, 13(1), 68 (2000).

[7] Midwinter, C. and Janssen, M.   Kuhn Losses Regained: Van Vleck from Spectra to Susceptibilities. arXiv, 1205.0179 [physics.hist-ph] (2012).

[8] DrKuha   The Phlogiston: Not Quite Vindicated. Spin One Half blog, May 19 (2009).

[9] what we should expect according to dialectical materialism: adherents of two ideologies struggle for dominance, with an eventual winner that is improved upon the both original ideologies. Not to be confused with the "argument to moderation".

[10] for more context (the difference between a scientific revolution and a scientific integration) please see: Alicea, B.   Does the concept of paradigm shift need a rethink? Synthetic Daisies blog, December 25 (2014).

October 6, 2014

The Map of the Cat, the Hair of the Dog, and Other Metaphors and Descriptors


What's in a set of descriptions, or a set of metaphors for that matter? Quite a bit or very little, depending on whether or not you are working in your area of specialty. Richard Feynman once (and to the great consternation of neurophysiologists within earshot) referred to a feline brain atlas as the “map of the cat” (not to be confused with Arnold’s Cat Map).

Recurrent cats! But what about its brain?

This parable, of course, speaks to the role of jargon in science. I am generally in support of jargon-filled science, provided it serves to conceptually unify and act as shorthand for complex phenomena. The problem occurs when it serves as a membership proxy into the high priesthood of Discipline x or Discipline y (ironically for Feynman, one of these disciplines was and is theoretical physics).



Far from making one sound like a drunken PoMo generator, jargon and highly-specialized language are sometimes an efficient information encoding scheme. But sometimes shortcuts that transcend jargon (if only briefly) are quite useful as well. Words alone are not enough, though. Sometimes it takes not a paradigm shift but a conceptual shift, and sometimes that takes a semi-humorous (and non-specialized) turn of phrase. Perhaps even a pun or two (to wit):

Q: what do airplane crash investigators and experimental scientists have in common?

A: both look for an answer inside of a black box!

June 21, 2014

Fireside Science: The Representation of Representations

This content is being cross-posted to Fireside Science, and is the third in a three-part series on the "science of science".


This is the final in a series of posts on the science of science and analysis. In past posts, we have covered theory and analysis. However, there is a third component of scientific inquiry: representation. So this post is about the representation of representations, and how representations shape science in more ways than the casual observer might believe.

The three-pronged model of science (theory, experiment, simulation). Image is adapted from Fermi Lab Today Newsletter, April 27 (2012).

For the uninitiated, science is mostly analysis and data collection, with theory being a supplement at best and a necessary evil at worst. Ideally, modern science rests on three pillars: experiment, theory, and simulation. For these same uninitiated, the representation of scientific problems is a mystery. But in fact, it has been the most important motivation for many of the scientific results we celebrate today. Interestingly, the field of computer science relies heavily on representation, but this concern generally does not carry over into the empirical sciences.

Ideagram (i.e. representation) of complex problem solving. Embedded are a series of hypotheses and the processes that link them together. COURTESY: Diagram from [1].

Problem Representation
So exactly what is scientific problem representation? In short, it is the basis for designing experiments and conceiving of models. It is the sieve through which scientific inquiry flows, restricting the typical "question to be asked" to the most plausible or fruitful avenues. It is often the basis of consensus and assumptions. On the other hand, representation is quite a bit more subjective than people typically would like their scientific inquiry to be. Yet this subjectivity need not lead to an endless debate about the validity of one point of view versus another. There are heuristics one can use to ensure that problems are represented in a consistent and non-leading way.

3-D Chess: a high-dimensional representation of warfare and strategy.

Models that Converge
Convergent models speak to something I alluded to in "Structure and Theory of Theories" when I discussed the theoretical landscape of different academic fields. The first way is whether or not allied sciences or models point in the same direction. To do this, I will use a semi-hypothetical example. The hypothetical case is to consider three models (A, B, and C) of the same phenomenon. Each of these models makes different assumptions and includes different factors, but they should at least be consistent with each other. One real-world example of this is the use of gene trees and species trees (both phylogenies) to understand evolution in a lineage [2]. In this case, each model uses the same taxa (evolutionary scenario), but includes incongruent data. While there are a host of empirical reasons why these two models can exhibit incongruence [3], models that are as representationally complete as possible might resolve these issues.

Orientation of Causality
The second way is to ensure that one's representation gets the source of causality right. For problems that are not well-posed or are poorly characterized, this can be an issue. Let's take Type III errors [4] as an example. In hypothesis testing, type III errors involve using the wrong explanation for a significant result. In layman's terms, this is getting the right answer for the wrong reasons. Even more than in the case of type I and II errors, focusing on the correct problem representation plays a critical role in resolving potential type III errors.

Yet problem representation does not always help resolve these types of errors. Skeptical interpretation of the data can also be useful [5]. To demonstrate this, let us turn to the over-hyped area of epigenetics and its larger place in evolutionary theory. Clearly, epigenetics plays some role in the evolution of life, but is not deeply established in terms of models and theory. Because of this representational ambiguity, some interpretations play a trick. In a conceptual representation that embodies this trick, scarcely-understood high-level phenomena such as epigenetics will usurp the role of related phenomena such as genetic diversity and population processes. When the thing in your representation is not well-defined or quite popular (e.g. epigenetics), it can take on a causal life of its own. Posing the problem in this way allows us to obscure known dependencies between genes, genetic regulation, and the environment without proving exceptions to these established relationships.

Popularity is Not Sufficiency
The third way is to understand that popular conceptions do not translate into representational sufficiency. In logical deduction, it is often pointed out that necessity does not equal sufficiency. But as with the epigenetics example, it also holds that popularity cannot make something sufficient in and of itself. In my opinion, this is one of the problems with using narrative structures in the communication of science: sometimes an appealing narrative does more to obscure scientific findings than to make things accessible to lay people.

Fortunately, this can be shown by looking at media coverage of any big news story. The CNN plane coverage [6] shows this quite clearly: coverage of rampant speculation and conspiracy theory was a way to emphasize an increasingly popular story. In such cases, speculation is the order of the day, while thoughtful analysis gets pushed aside. But is this simply a sin of the uninitiated, or can we see parallels in science? Most certainly, there is a problem with recognizing the difference between "popular" science and worthwhile science [7]. There is also precedent in the way certain studies or areas of study are hyped. Some in the scientific community [8] have argued that Nature's hype of the ENCODE project [9] results fell into this category.

One example of a mesofact: ratings for the TV show The Simpsons over the course of several hundred episodes. COURTESY: Statistical analysis in [10].

Mesofacts
Related to these points is the explicit relationship between data and problem representation. In some ways, this brings us back to a computational view of science, where data do not make sense unless they are viewed in the context of a data structure. But sometimes the factual aspect of data varies over time in a way that obscures our mental models, and in turn obscures problem representation.

To make this explicit, Sam Arbesman has coined the term "mesofact" [11]. A mesofact is knowledge that changes slowly over time given new data. Populations of specific places (e.g. Minneapolis, Bolivia, Africa) have changed in both absolute and relative terms over the past 50 years. But when problems and experimental designs are formulated assuming that facts related to these data (e.g. rank of cities by population) do not change over time, we can get the analysis fundamentally wrong.

This may seem like a trivial example. However, mesofacts have relevance to a host of problems in science, from experimental replication to inferring the proper order of causation. The problem comes down to an interaction between data's natural variance (variables) and the constructs used to represent our variables (facts). When the data exhibit variance around an unchanging mean, it is much easier to use the variable as a stand-in for facts. But when this is not true, scientifically-rigorous facts are much harder to come by. Instead of getting into an endless discussion about the nature of facts, we can look to how facts and problem representation might help us tease out the more metaphysical aspects of experimentation.

Applying Problem Representation to Experimental Manipulation
When we do experiments, how do we know what our experimental manipulations really mean? The question itself seems self-evident, but perhaps it is worth exploring. Suppose that you wanted to explore the causes of mental illness, but did not have the benefits of modern brain science as a guide. In defining mental illness itself, you might work from a behavioral diagnosis. But the mechanisms would still be a mystery. Is it a supernatural mechanism (e.g. demons) [12], an ultimate form of causation (reductionism), or a global but hard-to-see mechanism (e.g. quantum something) [13]? An experiment done the same way but assuming three different architectures could conceivably yield statistical significance for all of them.

In this case, a critical assessment of problem representation might be able to resolve this ambiguity. This is something that computational scientists, as modelers and approximators, deal with all of the time. Yet it is also an implicit (and perhaps even more fundamental) component of experimental science. For most of the scientific method's history, we have gotten around this fundamental concern by relying on reductionism. But in doing so, this restricts us to doing highly-focused science without appealing to the big picture. In a sense, we are blinded by science by doing science.

Focusing on problem representation allows us a way out of this. Not only does it allow us to break free from the straitjacket of reductionism, it also allows us to address the problem of experimental replication more directly. As has been discussed in many other venues [14], the lack of an ability to replicate experiments has plagued both Psychological and Medical research. But it is in these areas that representation is most important, primarily because it is hard to get right. Even in cases where the causal mechanism is known, the underlying components and the amount of variance they explain can vary substantially from experiment to experiment.

Theoretical Shorthand as Representation
Problem representation also allows us to make theoretical statements using mathematical shorthand. In this case, we face the same problem as the empiricist: are we focusing on the right variables? More to the point, are these variables fundamental or superficial? To flesh this out, I will discuss two examples of theoretical shorthand, and whether or not they might be concentrating on the deepest (and most generalizable) constructs possible.

The first example comes from Hamilton's rule, derived by the behavioral ecologist W.D. Hamilton [15]. Hamilton's rule describes altruistic behavior in terms of kin selection. The rule is a simple linear inequality that assumes adaptive outcomes will be optimal ones. In terms of a representation, these properties provide a sort of elegance that makes it very popular.
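For reference, Hamilton's rule is standardly written as the inequality

rB > C

where r is the coefficient of relatedness between actor and recipient, B is the fitness benefit to the recipient, and C is the fitness cost to the actor.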


In this short representation, an individual's relatedness to a conspecific contributes more to their behavioral motivation to help that individual than a typical trade-off between costs and benefits. Thus, a closely-related conspecific (e.g. a brother) will invest more into a social relationship with their kin than with non-kin. In general, they will take more personal risks in doing so. While more math is used to support the logic of this statement [15], this inequality is often treated as a widely applicable theoretical statement. However, some observers [16] have found the parsimony of this representation to be both too incomplete and intellectually unsatisfying. And indeed, sometimes an over-simplistic model does not deal with exceptions well.

The second example comes from Thomas Piketty's work. Piketty, economist and author of "Capital in the 21st Century" [17], has proposed something he calls the "First Law", which explains how income inequality relates to economic growth. The formulation, also a simple inequality, characterizes the relationship between economic growth, inherited wealth, and income inequality within a society.
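The inequality Piketty is best known for, and seemingly the one paraphrased here, is

r > g

where r is the average rate of return on capital and g is the rate of economic growth; when r exceeds g, returns to existing (and often inherited) wealth accumulate faster than incomes from growth, and inequality tends to widen.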


In this equally short representation, inequality is driven by the relative dominance of two factors: inherited wealth and economic growth. When growth is very low and inherited wealth exists at a nominal level, inequality persists and dampens economic mobility. In Piketty's book, other equations and a good amount of empirical investigation are used to support this statement. Yet, despite its simplicity, it has held up (so far) to the scrutiny of peer review [18]. In this case, representation through variables that generalize greatly but do not handle exceptional behavior well produces a highly-predictive model. On the other hand, this form of representation also makes it hard to distinguish between a highly unequal post-industrial society and a feudal, agrarian one.

Final Thoughts
I hope to have shown you that representation is an underappreciated component of doing and understanding science. While the scientific method is our best strategy for discovering new knowledge about the natural world, it is not without its burden of conceptual complexity. In the theory of theories, we learned that formal theories are based on deep reasoning and are (by necessity) often incomplete. In the analysis of analyses, we learned that the data are not absolute. Much reflection and analytical detail is needed to ensure that an analysis represents meaningful facets of reality. And in this post, these loose ends were tied together in the form of problem representation. While an underappreciated aspect of practicing science, representing problems in the right way is essential for separating science from pseudoscience, reality from myth, and proper inference from hopeful inference.

NOTES:
[1] Eldrett, G.   The art of complex problem-solving. MediaExplored blog, July 10 (2010).

[2] Nichols, R.   Gene trees and species trees are not the same. Trends in Ecology and Evolution, 16(7), 358-364 (2001).

[3] Gene trees and species trees can be incongruent for many reasons. Nature Knowledge Project (2012).

[4] Schwartz, S. and Carpenter, K.M.   The right answer for the wrong question: consequences of type III error for public health research. American Journal of Public Health, 89(8), 1175–1180 (1999).

[5] It is important here to distinguish between careful skepticism and contrarian skepticism. In addition, skeptical analysis is not always compatible with the scientific method.

For more, please see: Myers, P.Z.   The difference between skeptical thinking and scientific thinking. Pharyngula blog, June 18 (2014) AND Hugin   The difference between "skepticism" and "critical thinking"? RationalSkepticism.org, May 19 (2010).

[6] Abbruzzese, J.   Why CNN is obsessed with Flight 370: "The Audience has Spoken". Mashable, May 9 (2014).

[7] Biba, E.   Why the government should fund unpopular science. Popular Science, October 4 (2013).

[8] Here are just a few examples of the pushback against the ENCODE hype:


a) Mount, S.   ENCODE: Data, Junk and Hype. On Genetics blog, September 8 (2012).

b) Boyle, R.   The Drama Over Project Encode, And Why Big Science And Small Science Are Different. Popular Science, February 25 (2013).

c) Moran, L.A.   How does Nature deal with the ENCODE publicity hype that it created? Sandwalk blog, May 9 (2014).

[9] For an example of the nature of this hype, please see: The Story of You: ENCODE and the human genome. Nature Video, YouTube, September 10 (2012).

[10] Fernihough, A.   Kalkalash! Pinpointing the Moments “The Simpsons” became less Cromulent. DiffusePrior blog, April 30 (2013).

[11] Arbesman, S.   Warning: your reality is out of date. Boston Globe, February 28 (2010). Also see the following website: http://www.mesofacts.org/

[12] Surprisingly, this is a contemporary phenomenon: Irmak, M.K.   Schizophrenia or Possession? Journal of Religion and Health, 53, 773-777 (2014). For a thorough critique, please see: Coyne, J.   Academic journal suggests that schizophrenia may be caused by demons. Why Evolution is True blog, June 10 (2014).

[13] This is an approach favored by Deepak Chopra. He borrows the rather obscure idea of "nonlocality" (yes, basically a wormhole in spacetime) to explain higher levels of conscious awareness with states of brain activity.

[14] Three (divergent) takes on this:

a) Unreliable Research: trouble at the lab. Economist, October 19 (2013).

b) Ioannidis, J.P.A.   Why Most Published Research Findings Are False. PLoS Med 2(8): e124 (2005).

c) Alicea, B.   The Inefficiency (and Information Content) of Scientific Discovery. Synthetic Daisies blog, November 19 (2013).

[15] Hamilton, W. D.   The Genetical Evolution of Social Behavior. Journal of Theoretical Biology, 7(1), 1–16 (1964). See also: Brembs, B.   Hamilton's Theory. Encyclopedia of Genetics.

[16] Goodnight, C.   Why I Don’t like Kin Selection. Evolution in Structured Populations blog, April 23 (2014).

[17] Piketty, T.   Capital in the 21st Century. Belknap Press (2014). See also: Galbraith, J.K.   Unpacking the First Fundamental Law. Economist's View blog, May 25 (2014).

[18] DeLong, B.   Trying, yet again, to communicate the arithmetic scaffolding of Piketty's "capital in the Twenty-First Century". Washington Center for Equitable Growth blog, June 5 (2014).
