March 5, 2020

Open Data Day 2020


Welcome to Open Data Day 2020, sponsored by the Orthogonal Research and Education Laboratory! Our activities start today and will continue over the course of the next year. For this iteration of Open Data Day, we are looking for software developers, data scientists, statisticians, and quantitative biologists to work on a host of open data activities in the DevoWorm group. Listed below is a series of possible goals for the next year.

1) We would like to construct pseudo-data sets for theory-building and modeling. This involves establishing simulated and resampled data sets that can be used as the input to machine learning, statistical, and functional models. Examples of these would include numeric data generated using statistical distributions, a generative approach using selected features (cells) as inputs, or the energy potentials of kinetic processes in an embryo.
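As a rough illustration of the first goal, here is a minimal sketch of a simulated data set and a resampled one, using only the Python standard library. The function names, the choice of a normal distribution, and the numbers (50 cells, a 30-minute mean) are assumptions made for the example, not DevoWorm measurements or methods.

```python
import random
import statistics


def simulate_division_times(n_cells, mean_minutes, sd_minutes, seed=0):
    """Draw hypothetical cell division times from a normal distribution."""
    rng = random.Random(seed)
    # Clamp at zero, since a division time cannot be negative.
    return [max(0.0, rng.gauss(mean_minutes, sd_minutes)) for _ in range(n_cells)]


def bootstrap_means(values, n_resamples, seed=0):
    """Resample with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]


# Illustrative parameters only (not measured values).
simulated = simulate_division_times(n_cells=50, mean_minutes=30.0, sd_minutes=5.0)
resampled_means = bootstrap_means(simulated, n_resamples=200)
```

Either list could then serve as input to a statistical or machine learning model. Fixing the seed keeps the pseudo-data reproducible, which matters if others are to reuse it.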

2) There is also a need to build towards metadata standards, particularly with respect to the integration of different data types. Metadata helpful to the DevoWorm group includes (but is not limited to) cell division timing, high-level descriptions, positional and geometric information, and other features. Developing metadata repositories organized around a defined schema would be helpful.
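To make the idea of a metadata schema concrete, here is one possible sketch of a single record covering the kinds of metadata listed above. The class name, field names, and validation rules are hypothetical, and the values in the example record are made up for illustration. This is not an existing DevoWorm standard.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class CellRecord:
    """Hypothetical metadata record for one cell (illustrative schema only)."""
    cell_name: str
    division_time_min: float
    position_xyz: tuple
    description: str = ""
    extra: dict = field(default_factory=dict)

    def validate(self):
        """Return a list of human-readable problems; an empty list means valid."""
        errors = []
        if not self.cell_name:
            errors.append("cell_name must not be empty")
        if self.division_time_min < 0:
            errors.append("division_time_min must be non-negative")
        if len(self.position_xyz) != 3:
            errors.append("position_xyz must have exactly three coordinates")
        return errors


# Example record with made-up values.
record = CellRecord("ABa", 35.0, (1.0, 2.0, 3.0), "example description")
serialized = json.dumps(asdict(record), sort_keys=True)
```

Serializing records to JSON is one simple way to integrate different data types, since any tool that reads JSON can consume the same record.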

3) Also needed is a focus on DevoZoo maintenance, including the addition of datasets, the integration of data sets, and improvements in presentation style/interface design. Since last year's launch, the resources for each species or computational platform have become outdated. We would not only like to provide links to resources such as new data sets and gene expression atlases, but also provide access to “intermediate” resources such as ontologies, metadata, and models from other research groups. There is also a further desire to make DevoZoo sustainable.

Current iteration of DevoZoo.

4) As an initiative farther off into the future, we would like to add semantic capabilities to our models and data sets. One such example is a “controlled vocabulary” for developmental microscopy images and molecular data. In concert with this, having the capability to attach meanings and other notes to image and simulation features would increase the interpretability of such data.
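A controlled vocabulary can start out as little more than a fixed set of allowed terms that every annotation is checked against. The sketch below assumes a tiny, made-up vocabulary and a hypothetical `annotate` helper. A real vocabulary would come from an agreed-upon ontology.

```python
# Hypothetical vocabulary terms, chosen only for illustration.
CONTROLLED_VOCABULARY = frozenset({"nucleus", "cell membrane", "division plane"})


def annotate(feature_id, term, note="", vocabulary=CONTROLLED_VOCABULARY):
    """Attach a vocabulary term and a free-text note to an image or simulation feature."""
    normalized = term.strip().lower()
    if normalized not in vocabulary:
        raise ValueError(f"'{term}' is not in the controlled vocabulary")
    return {"feature_id": feature_id, "term": normalized, "note": note}


# The feature identifier below is a placeholder, not a real file or region.
annotation = annotate("image_001/region_3", " Nucleus ", note="bright spot near center")
```

Rejecting unknown terms at annotation time keeps the labels consistent across images and simulations, which is what makes them searchable and comparable later.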

5) In conjunction with the Data Reuse Initiative, we would like to provide some application of the FAIR principles. FAIR stands for making data findable, accessible, interoperable, and reusable. There are two opportunities here: a FAIRness evaluation (that is, working out how to make data FAIR) and promotion of each component of FAIR. For example, making datasets on DevoZoo more findable by adding tags or other classification tools would help newcomers make the most of our resource.
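The tagging idea can be sketched as a small inverted index from tags to dataset names. The dataset names and tags below are placeholders rather than actual DevoZoo entries, and `build_tag_index` and `find_datasets` are hypothetical helpers.

```python
from collections import defaultdict


def build_tag_index(datasets):
    """Map each normalized tag to the set of dataset names that carry it."""
    index = defaultdict(set)
    for name, tags in datasets.items():
        for tag in tags:
            index[tag.strip().lower()].add(name)
    return index


def find_datasets(index, *tags):
    """Return the sorted names of datasets that carry all of the given tags."""
    wanted = [t.strip().lower() for t in tags]
    if not wanted:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in wanted))
    return sorted(matches)


# Placeholder catalog, not real DevoZoo entries.
catalog = {
    "example_lineage_tree": ["C. elegans", "lineage"],
    "example_expression_atlas": ["C. elegans", "gene expression"],
    "example_microscopy_stack": ["Microscopy", "lineage"],
}
tag_index = build_tag_index(catalog)
lineage_worm = find_datasets(tag_index, "c. elegans", "Lineage")
```

Normalizing tags to lowercase means a newcomer searching for "Lineage" or "lineage" gets the same results, a small step toward findability.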
