December 15, 2017

Work With Me, the Orthogonal Laboratory, and the OpenWorm Foundation This Summer!

The Google Summer of Code (GSoC) is once again accepting applications from students to work on a range of programming-oriented projects over the Summer of 2018. Orthogonal Laboratory and the OpenWorm Foundation have contributed a number of projects to the list. Here are links to the project descriptions (login required):

Orthogonal Laboratory:

DevoWorm Group:

OpenWorm Foundation:

I am the contact person for the Orthogonal Laboratory and DevoWorm Group projects, and Matteo Cantarelli is the contact person for the other projects. If you have any questions about the application process, or would like me to review your application before submission, please feel free to contact me. The deadline for application submission is tentatively in late March/early April. Stay tuned!

Join us on "The Road to GSoC"!

December 1, 2017

Coherence and Relevance in Scientific Communities

          During the past year, Synthetic Daisies featured a series of posts on relevance theory and intellectual coherence within research communities [1]. In this post, I would like to use a set of small datasets to demonstrate how relevance plays a role in shaping scientific practice [2]. We use a syntactic (word frequency) approach to infer changes over time in specific scientific fields. 

          This is done using lists of words extracted from the titles of accepted papers at the NIPS (Neural Information Processing Systems) conference in various years. The annual NIPS conference represents a set of fields (Neural Modeling, Artificial Intelligence, Machine Learning) that have experienced rapid innovation and vast methodological change over the past 20 years [3]. To get a handle on patterns that represent a more stable field, data from GECCO (Genetic and Evolutionary Computation Conference) are used as a comparison. While there is plenty of innovation in this area of CS research, the short-term wholesale turnover of methods is much less prominent.

          Our approach involves ranking words used in paper titles by frequency, and then comparing these rankings between different time intervals. Title words are in many ways more precise than keywords, in that titles tend to describe specific methods and approaches. Each list shows the top 15 words for a given year. Changes in rank are represented by lines connecting a word's location in each pairwise list, and words that newly appear in or disappear from the list project to a black dot underneath the ranked lists. 
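As a rough sketch of the ranking step (assuming titles are available as plain strings; the stopword list and example titles below are invented for illustration, not the actual NIPS data):

```python
from collections import Counter

# A minimal stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "via"}

def top_title_words(titles, n=15):
    """Rank words from a list of paper titles by frequency, excluding stopwords."""
    counts = Counter(
        word
        for title in titles
        for word in title.lower().split()
        if word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(n)]

titles_2016 = [
    "Deep learning for neural models",
    "Learning optimal gradient methods",
    "A neural model of learning",
]
print(top_title_words(titles_2016, n=2))  # → ['learning', 'neural']
```

The same function applied to each year's accepted-paper titles yields the per-year ranked lists that are compared below.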

          The working hypothesis is that periods of rapid change are characterized by very little carry-over between two neighboring time-points. Basic descriptive terms specific to the field should remain, but all other terms in the earlier list will be replaced by a new set of terms. 

NIPS Conference Accepted Papers for 10-year intervals.

          The first graph shows the change in top terms (relevance) across 10-year intervals. As expected for such a fast-moving field, the terms exhibit an almost complete turnover for each interval (4/15 terms are continuous between 1994 and 2007, and 5/15 terms are continuous between 2007 and 2016). The only three terms that are present in both 1994 and 2016 are "learning", "model", and "neural". These are consistent with the basic descriptive terms in our working hypothesis.
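The carry-over counts quoted above reduce to a set intersection between two ranked lists. A minimal sketch (the word lists here are illustrative toy data, not the actual NIPS rankings):

```python
def carry_over(list_a, list_b):
    """Count how many top-ranked terms are shared between two time points."""
    shared = set(list_a) & set(list_b)
    return len(shared), len(list_a)

top_1994 = ["learning", "model", "neural", "network", "recognition"]
top_2007 = ["learning", "model", "neural", "kernel", "bayesian"]

shared, total = carry_over(top_1994, top_2007)
print(f"{shared}/{total} terms are continuous")  # → 3/5 terms are continuous
```

Applied to the full top-15 lists, this yields the carry-over fractions (e.g. 4/15, 5/15) reported for each interval.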

NIPS Conference Accepted Papers for 3-year intervals.

          The second graph demonstrates changes in top terms (relevance) between 2010 and 2016, using intervals of three years. As expected, there is more continuity between terms (8/15 terms are continuous between 2010 and 2013, and 11/15 terms are continuous between 2013 and 2016). The 2013-2016 interval is interesting in that two of the terms new to the 2016 list ("optimal" and "gradient") are descriptors of a word that was lost from the 2013 list ("algorithm"). This suggests greater coherence in research topics within this interval as compared to the 2010-2013 interval.

GECCO Conference Accepted Papers for 1-year intervals.

          In each of the one-year intervals, 11/15 terms are preserved from one year to the next. The terms that exhibit this continuity are consistent with the idea of basic descriptive terms. This might be seen as the signature of stability within communities, as it matches what is observed between 2013 and 2016 for the NIPS data.

          In keeping with the idea of scientific revolutions [4], we might adjust our view of paradigm shifts as revolutions in relevance. This serves as an alternative to the "big person" view of history, where luminaries such as Newton or Einstein singularly make big discoveries that change the course of their field and upend prevailing views. In this case, revolutions occur when communities shift their discourse, sometimes quite rapidly, to new sets of topics. This seems to be the case with various samplings of the NIPS data.

          For papers presented at NIPS and GECCO, what is relevant in a particular year is made salient to the audience of people who attend the conference. Whether or not this results in a closed feedback loop (people perpetually revisiting a focused set of topics) is dependent on other social dynamics.

UPDATE (12/7):
A preprint is now available! Check it out here: How to find a scientific revolution: intellectual field formation and the analysis of terms. Psyarxiv, doi:10.17605/OSF.IO/RHS9G (2017).

[1] For more information, please see the following posts: Excellence vs. Relevance. July 2 AND Breaking Out From the Tyranny of the PPT, April 17 AND Loose Ends Tied, Interdisciplinarity, and Consilience. June 18.

[2] Lenoir, R. (2006). Scientific Habitus: Pierre Bourdieu and the Collective Individual. Theory, Culture, and Society, 23(6), 25-43.

[3] For more about the experience and history of NIPS, please see: Surmenok, P. (2017). NIPS 2016 Reviews. Medium.

[4] Kuhn, T.S. (1962). The Structure of Scientific Revolutions. University of Chicago Press, Chicago, IL.

November 18, 2017

New Badges (Microcredentials) for Fall 2017

I have some new badges to advertise, one set from the OpenWorm Badge System, and one set from the Orthogonal Lab Badge System. As discussed previously on this blog, badges are microcredentials we are using to encourage participation in our research ecosystems at an introductory level.

An education-centric sketch of the OpenWorm and Orthogonal Laboratory research ecosystems.

The first new badge series is an introduction to what is going on in the DevoWorm group, and also gives biologists and computationalists unfamiliar with Caenorhabditis elegans developmental biology a chance to get their feet wet with a multidisciplinary approach to the topic.

Worm Development I focuses on embryonic development and associated pattern formation. Worm Development I is a prerequisite to II, so be sure to try this one first.

Worm Development II focuses on larval development, including the postembryonic lineage tree and what characterizes each life-history stage.

The second badge series is hosted on the Orthogonal Lab Badge System, and provides an overview of Peer Review issues and techniques. This series is meant to give young scholars a working familiarity with the process of peer review. It is notable that Publons Academy now offers a course on Peer Review, to which this badge series might serve as an abbreviated complement.

Peer Review I covers the history of peer review and the basics of pre-publication peer review. Be aware that Peer Review I is a prerequisite for Peer Review II (but not Peer Review for Data).

Peer Review II delves into how to decompose an article for purposes of peer review. An evaluation strategy for post-publication peer review is also covered.

Peer Review for Data contains a brief how-to for conducting peer review for open datasets.

November 15, 2017

Deep Reading Brings New Things to Life (Science)

Here is an interesting Twitter thread from Jacquelyn Gill on 'deep reading':

The basic idea is that exploring older literature can lead to new insights, which in turn lead to new research directions. The new research of our era tends to focus on the most relevant and cutting-edge literature [1]. This recency bias excludes many similarly relevant articles, including articles that perhaps inspired the more recent citations to begin with [2]. 

I have my own list of deep reads that have influenced some of my research in a similar fashion. These references can be either foundational or so-called "sleeping beauties" [3]. Regardless, I am doing my part to maintain connectivity [4] amongst academic citation networks:

1) Woodger, J.H. The Axiomatic Method in Biology. 1937.

An argument for biological rules, an influence on cladistics (developed in the 1960s), and a natural bridge to geometric approaches to data analysis and modeling. While there is a strong argument to be made against the axiomatic approach [5], this book directly inspired much of my thinking in the area of biological modeling. 

2) Davis R.L., Weintraub H., and Lassar A.B. Expression of a single transfected cDNA converts fibroblasts to myoblasts. Cell 51, 987–1000. 1987.

This was the first proof-of-concept for direct cellular reprogramming, and predates the late-2000s Nobel-winning work in stem cells by decades. In this case, a single transcription factor (MyoD) was used to convert a cell from one phenotype to another without a strict regard for function. More generally, this paper helped inspire my thinking in the area of cellular reprogramming to go beyond a biological optimization or algorithmic approach [6].

3) Ashby, W.R. Design for a Brain. 1960.

"Design for a Brain" serves as a stand-in for the entirely of Ashby's bibliography, but this is the best example of how Ashby successfully merged explanations of adaptive behavior [7] with systems models (cybernetics). In fact, Ashby originally coined the phrase "Intelligence Augmentation" [8]. I first discovered Ashby's work while working in the area of Augmented Cognition, and has been more generally useful as inspiration for complex systems thinking.

Not so much a couple of sleeping beauties as easy-reading technical reference guides for all things complexity theory.

5) Bourdieu, P. Outline of a Theory of Practice. Cambridge University Press. 1977 AND Alexander, C., Ishikawa, S., and Silverstein, M. A Pattern Language: towns, buildings, construction. Oxford University Press. 1977.

This is a bonus, not because the references are particularly obscure or even from the same academic field, but because they partially influenced my own view of cultural evolution. This is yet another piece of advice to young researchers: take things that appear to be disparate on their surface and incorporate them into your mental model. If nothing else, you will gain valuable skills in intellectual synthesis.

UPDATE (11/17):
Here is another example of old (classic, not outdated) work influencing new scholarship.

[1] Evans, J.A. (2008). Electronic Publication and the Narrowing of Science and Scholarship. Science, 321(5887), 395-399 AND Scheffer, M. (2014). The forgotten half of scientific thinking. PNAS, 111(17), 6119.

[2] related topics discussed on this blog include distributions of citation ages and most-cited papers.

[3] van Raan, A.F.J. (2004). Sleeping Beauties in Science. Scientometrics, 59(3), 467–472.

[4] Editors (2010). On citing well. Nature Chemical Biology, 6, 79.

[5] For the semantic approach (which had been influential to my more recent work), please see: Lloyd, E.A. (1994). The Structure and Confirmation of Evolutionary Theory. Princeton University Press, Princeton, NJ.

[6] Ronquist, S. (2017). Algorithm for cellular reprogramming. PNAS, 114(45), 11832–11837.

[7] Sterling, P. and Eyer, J. (1988). Allostasis: A new paradigm to explain arousal pathology. In "Handbook of life stress, cognition, and health". Fisher, S. and Reason, J.T. eds. Wiley, New York. 

[8] Ashby, W.R. (1956). An Introduction to Cybernetics. Springer, Berlin.

October 26, 2017

Open Access Week 2017: Version-Controlled Papers

The subject of a recent workshop [1], the next-generation scientific paper will include digital tools that formalize things such as version control and data sharing/access. Orthogonal Laboratory is developing a method for version-controlled documents that integrates formatting, bibliographic aspects, and content management. While this is not a novel approach to writing and composition [2], this post will cover how to apply a version-controlled strategy to presenting a scientific workflow. Below are brief sketches of our system for generating next-generation papers.

The first element is the process through which a document is generated, styled, and published (assigned a unique digital identifier or doi):

The key element of our system is a version control repository. We are using Bitbucket, but GitHub or a more specialized platform such as Authorea or Penflip might also be sufficient. The idea is to build documents using the Markdown language [3], then incorporate stylistic elements using CSS and HTML. VS Code is used to manage spellcheck and grammar in the Markdown documents (which contain the authored content). Reference management is done via Zotero, but again, any open source alternative will do.

The diffs function [4] of version control can be used to operate on final versions of Markdown files for the purpose of alternating between document versions. The idea is not only to find a consensus between collaborators, but also to use branches strategically to push alternative versions of content to the doi as desired. This combinatorial editing framework could be desirable in appealing to different audiences or stressing specific aspects of the work at different points in time. Note that this is distinct from the editorial function of pulls and merges, which are meant to be more "under the hood".
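As an illustration of how diffs operate on Markdown sources (this uses Python's standard difflib rather than any particular platform's diff viewer; the document contents are invented):

```python
import difflib

# Two versions of the same Markdown source, as lists of lines.
version_a = """# Results
The terms exhibit almost complete turnover.
""".splitlines(keepends=True)

version_b = """# Results
The terms exhibit nearly complete turnover.
Only three terms persist across the full interval.
""".splitlines(keepends=True)

# A unified diff shows exactly which lines changed between versions.
for line in difflib.unified_diff(version_a, version_b,
                                 fromfile="paper.md (branch A)",
                                 tofile="paper.md (branch B)"):
    print(line, end="")
```

Because Markdown is plain text, line-level diffs like this remain human-readable, which is what makes the branch-per-version strategy practical.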

Pandoc serves as a conversion tool, and can style documents according to particular specifications. This includes conventions such as APA style, or document formats such as LaTeX or pdf [5]. Additional components include code and data repositories, supplemental materials, and post-publication peer review.
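One way to script the conversion step (the file names and CSL file below are placeholders; this simply assembles a command line using Pandoc's -o, --citeproc, --csl, and --bibliography options, without assuming anything else about your setup):

```python
def pandoc_command(source, output, csl=None, bibliography=None):
    """Assemble a pandoc invocation converting a Markdown source to a styled document."""
    cmd = ["pandoc", source, "-o", output]
    if bibliography:
        # Resolve citations against a bibliography file.
        cmd += ["--citeproc", "--bibliography", bibliography]
    if csl:
        # Apply a citation style (e.g. APA) via a CSL file.
        cmd += ["--csl", csl]
    return cmd

cmd = pandoc_command("paper.md", "paper.pdf",
                     csl="apa.csl", bibliography="refs.bib")
print(" ".join(cmd))
# To actually run the conversion (requires pandoc installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Swapping the output extension (e.g. paper.tex) is enough for Pandoc to target a different format.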

Orthogonal Lab generally uses a host such as Figshare to generate dois for such content, but there are other hosts that generate version-specific dois as well. It is worth noting that Github-hosted academic journals are beginning to appear. Two examples are ReScience and Journal of Open Science Software. What we are providing (for our community and yours) is a means to generate styled documents (technical papers, blogposts, formal publications) in a version-controlled format. This also means papers can be dynamic rather than static: content at a given doi can be updated as desired.

[1] Perkel, J. (2017). C. Titus Brown: Predicting the paper of the future. Nature TechBlog, June 1.

[2] Eve, M.P. (2013). Using git in my writing workflow. August 18. Also, much of this functionality is accessible in Overleaf using TeX and a GUI interface.

[3] Cifuentes-Goodbody, N. (2016). Academic Writing in Markdown. YouTube. AND Sparks, D. and Smith, E. Markdown Field Guide, MacSparky.

[4] Diffs are also useful in comparing different versions of a published document as events unfold. Newsdiffs performs this function quite nicely on documents containing unfolding news.

[5] A few references for further reading:

a) Building your own Document Processor Tools:
Building Publishing Workflows with Pandoc and Git. Simon Fraser University Publishing.

b) Git + Diffs = Word Diffs:
Diff (and collaborate on) Microsoft Word documents using GitHub. Ben Balter blog.

c) Using Microsoft Word with Git. Martin Fenner blog.