March 14, 2015

A Modest Framework for Scientific Transparency

Here are six points for integrating open-access science publishing with open data. The framework grew out of my own practice and research, along with interactions with the Research Data Service (University of Illinois) and the SciFund Challenge. The pipeline begins at the write-up stage, but some points rely on practices established before analysis and write-up.

A)   Preprint (e.g. kernel of hypothesis- or question-driven results).

A number of options exist for this, including arXiv, bioRxiv, PLoS One, or another permanent location that provides a formal archival address or digital object identifier (DOI). The core paper should be brief (6-12 pages) and formal.

B)   Advanced methods/theory.

These can be submitted as supplemental materials, either in the same repository as the preprint itself or on another permanent server. Rather than simple auxiliary files, these should be set up along the lines of an IPython notebook.

C)   Advanced Analysis.

This can be treated in the same manner as the advanced methods/theory. This will include transformational datasets (e.g. time-frequency decompositions, log transforms, combinations of data from multiple sources in a common framework) and the associated data tables and figures/graphs.
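As a minimal sketch of one such transformational dataset, the Python below adds a log-transformed column to a small table and writes the result as CSV text. The column names, sample values, and the `log_transform` helper are hypothetical, chosen only for illustration; a real analysis would read its rows from the raw data files.

```python
import csv
import io
import math


def log_transform(rows, column):
    """Return copies of `rows` with an added natural-log column.

    Non-positive values have no real logarithm, so they are recorded as
    empty strings rather than being silently dropped.
    """
    out = []
    for row in rows:
        value = float(row[column])
        new_row = dict(row)  # copy, so the raw rows stay untouched
        new_row[f"log_{column}"] = math.log(value) if value > 0 else ""
        out.append(new_row)
    return out


# Hypothetical raw rows, as they might be read with csv.DictReader.
raw = [
    {"sample": "a", "intensity": "1.0"},
    {"sample": "b", "intensity": "100.0"},
    {"sample": "c", "intensity": "0"},
]
derived = log_transform(raw, "intensity")

# Write the derived table as CSV text so it can sit next to the raw file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(derived[0].keys()))
writer.writeheader()
writer.writerows(derived)
print(buffer.getvalue())
```

Keeping the raw column alongside the transformed one means the derived table documents its own origin, which is the point of treating the analysis as a notebook rather than a loose file.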

D)   Datasets.

1)   Raw Data: images, unprocessed vectorial or matricial output.

These will be stored as formatted image files, ASCII files, or tabular files.

2)   Processed Data: numeric variables, simple annotation.

These will be appended to the raw data either in the file or as linked files in the same directory.

3)   Higher-level Data: correlational, data fusion, decompositional.

These will include the transformational datasets mentioned in the section on Advanced Analysis. These datasets are to be linked to the raw and processed data directory. Simple annotation methods will confirm the identity of each linked dataset.

4)   Higher-level Representation: RDF/XML descriptive models, algorithmic (e.g. data landscapes, possibility spaces).

These types of representations can help us go beyond the typical reliance on “statistical significance” and “future directions” to provide a rigorous approach to guide future investigations. An example of this is parameterization models from existing data.
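One way the dataset layers above might be described in RDF/XML is sketched below in Python. This is an illustration under stated assumptions, not a prescribed schema: the file paths, titles, and the `describe` helper are hypothetical, the Dublin Core elements (`dc:title`, `dc:identifier`, `dc:source`) are one possible vocabulary, and a SHA-256 digest stands in for the "simple annotation" that confirms a dataset's identity.

```python
import hashlib
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def describe(datasets):
    """Build an RDF/XML string with one rdf:Description per dataset.

    Each dataset is a dict with hypothetical keys: uri, title,
    content (bytes), and derived_from (a URI string or None).
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    for d in datasets:
        node = ET.SubElement(
            root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": d["uri"]}
        )
        ET.SubElement(node, f"{{{DC}}}title").text = d["title"]
        # The checksum lets a reader confirm the file is the one described.
        digest = hashlib.sha256(d["content"]).hexdigest()
        ET.SubElement(node, f"{{{DC}}}identifier").text = "sha256:" + digest
        # Link a derived dataset back to the layer it came from.
        if d["derived_from"]:
            ET.SubElement(
                node, f"{{{DC}}}source", {f"{{{RDF}}}resource": d["derived_from"]}
            )
    return ET.tostring(root, encoding="unicode")


# Hypothetical raw and processed files, with contents inlined for the sketch.
example = describe([
    {"uri": "data/raw/run01.csv", "title": "Raw data",
     "content": b"t,v\n0,1.0\n", "derived_from": None},
    {"uri": "data/processed/run01.csv", "title": "Processed data",
     "content": b"t,v,log_v\n0,1.0,0.0\n", "derived_from": "data/raw/run01.csv"},
])
print(example)
```

In practice the `content` bytes would be read from the files themselves, and the resulting description could be stored in the same directory as the raw and processed data.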

E)   Blogging Publicity.

All materials should be promoted through a blog post. This can be in the form of a feature article, or as a series of annotated links. This can be followed up with reposting key features of the initial post to a social blog like Tumblr or sharing a link via Twitter.

F)   Peer Commentary.

While peer commentary is typically kept confidential, there are so-called post-peer-review venues that provide a means to review work openly (e.g. PeerJ, F1000). Such commentary includes both formal (actionable) statements and informal statements in the form of critiques.

This outline represents the entirety of a scientific reporting pipeline (from formal write-up to published items), although I am no doubt missing something. I will be fleshing out each of these points in future posts with real data and examples from Orthogonal Research and my work at the University of Illinois.
