October 26, 2017

Open Access Week 2017: Version-Controlled Papers

The next-generation scientific paper, the subject of a recent workshop [1], will include digital tools that formalize practices such as version control and data sharing and access. Orthogonal Laboratory is developing a method for version-controlled documents that integrates formatting, bibliographic aspects, and content management. While this is not a novel approach to writing and composition [2], this post covers how to apply a version-controlled strategy to presenting a scientific workflow. Below are brief sketches of our system for generating next-generation papers.

The first element is the process through which a document is generated, styled, and published (assigned a unique digital object identifier, or DOI).

The key element of our system is a version control repository. We are using Bitbucket, but GitHub or a more specialized platform such as Authorea or Penflip might also be sufficient. The idea is to build documents using the Markdown language [3], then incorporate stylistic elements using CSS and HTML. VS Code is used to manage spellcheck and grammar in the Markdown documents (containing the authored content). Reference management is done via Zotero, but again, any open source alternative will do.

The diff function [4] of version control can be applied to final versions of Markdown files to alternate between document versions. The idea is not only to find a consensus between collaborators, but also to use branches strategically to push alternative versions of content to the DOI as desired. This combinatorial editing framework could be desirable for appealing to different audiences or for stressing specific aspects of the work at different points in time. Note that this is distinct from the editorial function of pulls and merges, which are meant to be more "under the hood".
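To make the comparison step concrete, here is a minimal sketch of diffing two versions of a Markdown file. It uses Python's standard difflib module as a stand-in for the diff produced by the version control system itself. The function name, file labels, and sample text are hypothetical and are not part of our actual repository.

```python
import difflib


def markdown_diff(old_text: str, new_text: str,
                  old_label: str = "v1", new_label: str = "v2") -> str:
    """Return a unified diff between two versions of a Markdown document."""
    # Keep line endings so the diff lines can be joined back together directly.
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(old_lines, new_lines,
                                fromfile=old_label, tofile=new_label)
    return "".join(diff)


# Hypothetical content standing in for two branches of the same paper.
version_general = "# Results\n\nThe model fits the data well.\n"
version_technical = "# Results\n\nThe model fits the data well (see Methods).\n"

print(markdown_diff(version_general, version_technical,
                    "general.md", "technical.md"))
```

Lines prefixed with "-" appear only in the first version and lines prefixed with "+" only in the second, which is the same convention a version control diff uses.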

Pandoc serves as a conversion tool, and can style documents according to particular specifications. These include citation conventions such as APA style, and output formats such as LaTeX or PDF [5]. Additional components include code and data repositories, supplemental materials, and post-publication peer review.
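The conversion step might look something like the sketch below, which assembles a Pandoc command line from Python. The file names (paper.md, style.css, refs.bib, apa.csl) and the function name are placeholders rather than our actual setup, and refs.bib is assumed to be a bibliography exported from the reference manager. The --citeproc flag assumes a recent Pandoc release (older releases used a separate pandoc-citeproc filter), and PDF output additionally requires a LaTeX engine to be installed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def pandoc_command(source: str, output: str,
                   css: Optional[str] = None,
                   bibliography: Optional[str] = None,
                   csl: Optional[str] = None) -> List[str]:
    """Build a Pandoc command; the output format is inferred from the extension."""
    cmd = ["pandoc", source, "--from", "markdown",
           "--standalone", "--output", output]
    if css:
        # Link a stylesheet into HTML output.
        cmd += ["--css", css]
    if bibliography:
        # Resolve citations against the exported bibliography file.
        cmd += ["--citeproc", "--bibliography", bibliography]
    if csl:
        # A CSL file selects the citation style (for example, APA).
        cmd += ["--csl", csl]
    return cmd


# Hypothetical file names, for illustration only.
inputs = ["paper.md", "style.css", "refs.bib", "apa.csl"]
cmd = pandoc_command("paper.md", "paper.html", css="style.css",
                     bibliography="refs.bib", csl="apa.csl")
print(" ".join(cmd))

# Only invoke Pandoc if it is installed and every input file is present.
if shutil.which("pandoc") and all(Path(p).exists() for p in inputs):
    subprocess.run(cmd, check=True)
```

Changing the output name to paper.pdf or paper.tex would produce the other formats from the same Markdown source, so the styled outputs never need to be edited by hand.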

Orthogonal Lab generally uses a host such as Figshare to generate DOIs for such content, but there are other hosts that generate version-specific DOIs as well. It is worth noting that GitHub-hosted academic journals are beginning to appear. Two examples are ReScience and the Journal of Open Source Software. What we are providing (for our community and yours) is a means to generate styled documents (technical papers, blog posts, formal publications) in a version-controlled format. This also means papers can be dynamic rather than static: content at a given DOI can be updated as desired.

[1] Perkel, J. (2017). C. Titus Brown: Predicting the paper of the future. Nature TechBlog, June 1.

[2] Eve, M.P. (2013). Using git in my writing workflow. August 18. Also, much of this functionality is accessible in Overleaf using TeX and a GUI.

[3] Cifuentes-Goodbody, N. (2016). Academic Writing in Markdown. YouTube. See also Sparks, D. and Smith, E. Markdown Field Guide. MacSparky.

[4] Diffs are also useful for comparing different versions of a published document as events unfold. NewsDiffs performs this function quite nicely on articles covering unfolding news.

[5] A few references for further reading:

a) Building your own Document Processor Tools:
Building Publishing Workflows with Pandoc and Git. Simon Fraser University Publishing.

b) Git + Diffs = Word Diffs:
Diff (and collaborate on) Microsoft Word documents using GitHub. Ben Balter blog.

c) Using Microsoft Word with Git. Martin Fenner blog.