March 19, 2018

Google Summer of Code Application Deadline is Approaching!

A quick reminder as to what is near. Nothing to fear -- except that applications for Google Summer of Code 2018 are due on March 27th (in a little more than a week). Thanks to all the applicants to our projects so far. I am the mentor for the following projects (all sponsored by INCF):

Contextual Geometric Structures (Project 8)

Towards a k-D embryo (Project 10.2)

Physics-based XML (Project 10.1)

Apply to work with either the Orthogonal Research Lab community (Project 8) or the OpenWorm Foundation community (Projects 10.1, 10.2). Please contact me (Bradly Alicea) for more information.

There are several other projects hosted by the OpenWorm Foundation that combine work with Neuroscientific data and Computational Modeling. These include:

Advanced Neuron Dynamics in WormSim (Project 10.3)

Mobile application to explore C. elegans nervous system dynamics (Project 10.4)

Add support for Neurodata Without Borders 2.0 to Geppetto (Project 10.5)

All three of these projects are based in the OpenWorm Foundation community, and are led by mentors Matteo Cantarelli, Giovanni Idili, and Stephen Larson.

March 3, 2018

Open Data Day 2018: Orthogonal Research Version

Time once again for International Open Data Day, an annual event hosted by organizations all around the world. For the Orthogonal Research contribution, I am sharing a presentation on the role of theory in data science (and the analysis of open data).

The full set of slides is available on Figshare, doi:10.6084/m9.figshare.5483746

A theory of data goes back to before there were concepts such as "big data" or "open data". In fact, we can learn a lot from attempts to characterize regularities in scientific phenomena, particularly in the behavioral sciences (e.g. Psychophysics).

There are a number of ways to build a mini-theory, but one advantage of the approach we are working on is that (assuming partial information about the data being analyzed) a theoretical model can be built with very limited amounts of data. I did not mention the role of non-empirical reasoning [1] in theory-building, but it might be an important issue for future consideration.

Theory-building also creates generalized models of pattern interpretation. In this case, our mini-theory detects sheep-shaped arrays. But there are bottom-up and top-down assumptions that go into this recognition, and theory-building is a way to make those explicit.

Naive theories are a particular mode of error in theory-building from sparse or incomplete data. In the case of human reasoning, naive theories result from generalization based on limited empirical observation and blind inference of mechanism. They are characterized in the Cognitive Science literature as being based on implicit and non-domain-specific knowledge [2].

Taken together, mini-theories and naive theories can help us not only better characterize unlabeled and sparsely labeled data, but also gain an appreciation for local features in the dataset. In some cases, naive theory-building might be beneficial for enabling feature engineering, ontologies/metadata [3], and other characteristics of the data.

In terms of usefulness, theory-building in data science lies somewhere in between mathematical discovery programs and epistemological models. 

[1] Dawid, R. (2013). Novel Confirmation and the Underdetermination of Scientific Theory Building. PhilSci Archive.

[2] Gelman, S.A., Noles, N.S. (2011). Domains and naive theories. WIREs Cognitive Science, 2, 490–502. doi:10.1002/wcs.124

[3] Rzhetsky, A., Evans, J.A. (2011). War of Ontology Worlds: Mathematics, Computer Code, or Esperanto? PLoS Computational Biology, 7(9), e1002191. doi:10.1371/journal.pcbi.1002191