I have recently read the book "Ignorance: How It Drives Science" by Stuart Firestein, a neuroscientist at Columbia University. The title refers to how pervasively what we don't know shapes the scientific facts we teach, reference, and hold up as popular examples. In fact, Firestein teaches a course at Columbia on how "ignorance" can guide research, with guest lecturers from a variety of fields [1]. This is of particular interest to me, as I hosted a workshop last summer (at Artificial Life 13) called "Hard-to-Define Events" (HTDE).
"Hard-to-Define Events" (HTDE). From what I have seen in the past year or so [2], people seem to be converging on this idea.
In my opinion, two trends are converging to generate interest in this topic. One is the rise of big data and the internet, which make the communication of results and the rendering of the "research landscape" easier. Literature mining tools [3] are enabling discovery in their own right, but are also revealing the shortcomings of previously published studies. The other is the controversy raised over the last ten years by the replication crisis [4], coupled with the realization that the scientific method is not as rigorous [5] as previously thought.
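To make the literature-mining point concrete, here is a toy sketch of the kind of co-occurrence reasoning such tools build on (a rough Swanson-style "ABC" linkage written in Python; it is not the method of [3], and the terms and abstracts are invented):

```python
# Toy Swanson-style "ABC" linkage (not the method of [3]; terms are invented):
# terms that co-occur in an abstract are linked, and pairs that share many
# intermediate terms but never co-occur directly are flagged as candidate
# "hidden connections" worth a closer look.
from collections import defaultdict
from itertools import combinations

abstracts = [
    {"drugX", "geneA"},
    {"geneA", "diseaseY"},
    {"drugX", "geneB"},
    {"geneB", "diseaseY"},
    {"drugZ", "diseaseY"},
]

cooccur = defaultdict(set)          # term -> set of terms it has appeared with
for terms in abstracts:
    for a, b in combinations(terms, 2):
        cooccur[a].add(b)
        cooccur[b].add(a)

def hidden_connections(term):
    """Terms reachable via shared intermediates but never seen with `term`."""
    direct = cooccur[term]
    scores = defaultdict(int)
    for mid in direct:
        for far in cooccur[mid]:
            if far != term and far not in direct:
                scores[far] += 1    # count shared intermediates
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(hidden_connections("drugX"))  # [('diseaseY', 2)] in this toy corpus
```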
The work of Jeff Hawkins, head of Numenta, Inc. [6], addresses many of these issues from the perspective of formulating a theoretical synthesis. For a number of years, he has been interested in a unified theory of the human brain. While there are challenges both in testing such a theory and in getting the field to fit it into their conceptual schema, Jeff has nevertheless found success building technological artifacts based on these ideas.
Jeff's work illustrates the balance between integrating what we do know about a scientific field and accommodating what we don't. In his case, this involves using novel neural network models to generate "intelligent", predictive behavior. Computational abstraction is a useful tool in this regard, but in empirical science the challenge is to include in our models what we do know and exclude what we don't.
According to this viewpoint, the success of scientific prediction (i.e. the extent to which a theory is useful) depends on whether findings and deductions are convergent or divergent. By convergent and divergent, I mean the degree to which independent findings triangulate a single set of principles or predict similar outcomes. Examples of convergent findings include the diversity of finch beaks that Darwin used to understand natural selection [7] and the combination of behavioral and neuroimaging assays used to understand attention [8].
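One minimal way to make this notion quantitative (my own gloss, not anything from the book) is to treat each independent finding as an estimate with an uncertainty, pool the estimates, and ask whether they agree better than their error bars would allow:

```python
# My own numerical gloss on "convergent vs. divergent" findings: treat each
# independent finding as an estimate with an uncertainty, pool them by inverse
# variance, and check whether they agree more than their error bars allow.
# The numbers below are invented.

def pool(estimates):
    """estimates: list of (value, std_error) from independent methods.
    Returns the inverse-variance-weighted mean and a crude consistency score
    (reduced chi-square; around 1 or below suggests convergence, much larger
    suggests divergence)."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    mean = sum(w * v for (v, _), w in zip(estimates, weights)) / sum(weights)
    chi2 = sum(((v - mean) / se) ** 2 for v, se in estimates)
    return mean, chi2 / max(len(estimates) - 1, 1)

convergent = [(1.02, 0.05), (0.98, 0.04), (1.01, 0.06)]   # hypothetical assays
divergent  = [(1.02, 0.05), (1.60, 0.04), (0.55, 0.06)]

print(pool(convergent))   # consistency score near or below 1
print(pool(divergent))    # consistency score far above 1
```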
There are two ways in which the author proposes that ignorance operates in the domain of science. For one, ignorance defines our knowledge: the more we discover, the more we discover we don't know. This is often the case for problems and fields in which little a priori knowledge exists. Neuroscience has certainly had its share of landmark discoveries that ultimately raise more questions than they answer in terms of function or mechanism. Many sciences go through a "stamp-collecting" or "natural history" phase, during which characterization is the primary goal [9]. Only later does hypothesis-driven, predictive science even seem appropriate.
The second role of ignorance is a caveat, based on the root of the word: "to ignore". In this sense, scientific models can be viewed as tools of conformity. There is a tendency to ignore what does not fit the model, treating these data as outliers or noise. You can think of this as a challenge to the traditional use of curve-fitting and normalization [10], both of which are biased towards treating normalcy as the statistical signature of signal.
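A small sketch of this bias in action, using invented data and an ordinary least-squares fit (any standard curve-fitting routine would behave the same way):

```python
# Invented data: once a straight line is assumed, the points carrying the
# surprise are exactly the ones flagged for removal as "outliers".
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=x.size)
y[47:] += 15.0                      # a real effect the linear model was never built for

slope, intercept = np.polyfit(x, y, 1)          # ordinary least-squares line
residuals = y - (slope * x + intercept)
flagged = np.abs(residuals) > 2.5 * residuals.std()

print(f"fit: y = {slope:.2f}x + {intercept:.2f}")
print("points the model would discard:", np.where(flagged)[0])   # the bumped points
```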
If we think about this algorithmically [11], it requires a constantly growing problem space, one that grows reflexively. What would an algorithm for "what we don't know", for a reflexive science, look like? Perhaps this can be better understood using a metaphor of the Starship Enterprise embedded in a spacetime topology. Sometimes the Enterprise must venture into uncharted regions of space (regions that nevertheless still correspond to spacetime). While the newly-discovered features are embedded in the existing metric, they are unknown a priori [12]. Now consider features that exist beyond the spacetime framework, beyond the edge of the known universe [13]. How does a faux spacetime get extrapolated to features found there? The word extrapolation is key: such features will not necessarily be classified in a fundamentally new way (i.e. prior experience will dictate what the extended space looks like).
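Here is one toy way to render the metaphor in code (entirely my own construction): a nearest-neighbor "science" that has only ever seen a bounded region of feature space will still label points far beyond that region, and it labels them with whatever happens to sit at the known edge.

```python
# Toy rendering of the Enterprise metaphor (my construction, not the book's):
# a 1-nearest-neighbor classifier that has only ever seen features in a known
# range still forces out-of-range points into existing categories, i.e. the
# extrapolated region inherits whatever sits at the known edge.
known_features = [(0.1, "A"), (0.4, "A"), (0.6, "B"), (0.9, "B")]
known_min = min(f for f, _ in known_features)
known_max = max(f for f, _ in known_features)

def classify(x):
    label = min(known_features, key=lambda fc: abs(fc[0] - x))[1]
    region = "charted" if known_min <= x <= known_max else "beyond the map (extrapolated)"
    return label, region

for x in (0.5, 0.95, 3.0, -2.0):
    print(x, classify(x))
# 3.0 and -2.0 lie outside everything that has been measured, yet they come
# back labeled "B" and "A": prior experience dictates the extended space.
```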
With this in mind, there are several points that occurred to me as I was reading “Ignorance” that might serve as heuristics for doing exploratory and early-stage science:
1) Instead of focusing on convexity (optimal points), examine the trajectory:
* problem spaces that are less well known have a higher degree of nonconvexity and a moving global optimum.
* this allows us to derive trends in the problem space instead of merely finding isolated solutions, which is especially valuable for ill-defined problems. It also prevents an answer from being marooned in solution space (a minimal sketch of this heuristic follows the list below).
2) Instead of getting the correct answer, focus on defining the correct questions:
* according to Stuart Firestein, the approach of the mathematician David Hilbert was to predict the best questions rather than the best answers (i.e. what a futurist would do).
3) People tend to ask questions where they have the most complete information (i.e. they look where the light shines brightest, not where the answer actually lies):
* this leads us to make the comparison between prediction (function) and phenomenology (structure). Which is better? Are the relative benefits for each mode of investigation problem-dependent?
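Here is the minimal sketch promised under heuristic (1): a toy nonconvex objective whose optimum drifts over time. The landscape, the drift, and the crude random-search optimizer are all invented for illustration; the point is that the informative output is the trajectory of best-found points, not any single optimum.

```python
# Sketch of heuristic (1): on an ill-defined problem the landscape itself
# drifts, so record the trajectory of best-found points over time rather
# than a single answer marooned at one optimum. Landscape and drift are
# invented for illustration.
import math, random

def landscape(x, t):
    # nonconvex objective whose deepest valley drifts to the right with t
    return math.sin(3 * x) + 0.1 * (x - 0.5 * t) ** 2

random.seed(1)
trajectory = []
for t in range(10):
    # crude random search at each epoch; any optimizer would do for the sketch
    best_x = min((random.uniform(-10, 10) for _ in range(2000)),
                 key=lambda x: landscape(x, t))
    trajectory.append((t, round(best_x, 2)))

print(trajectory)   # the interesting object is the trend in x, not any single x
```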
Stepping back from the solution space-as-Starfleet mission metaphor, we tend to study what we can measure well, and in turn what we can characterize well. But what is the relationship between characterization, measurement, and a solution (or ground-breaking scientific finding)? There are two components in the progression from characterization to solution. The first is to characterize enough of the problem space so as to create a measure, and then use that measure for further characterization, ultimately arriving at a solution. The second is to characterize enough of the problem space so that coherent questions can be asked, which allows a solution to be derived. When combined, this may provide the best tradeoff between precision and profundity.
Yet which should come first, the measurement or the question? This largely depends on the nature of the measurement. In some cases, measures are more universal than single questions or solutions (e.g. information entropy, fMRI, optical spectroscopy). Metric spaces are also subject to this universality. If a measure can lead to any possible solution, then it is much more independent of the question. In "Ignorance", von Neumann's universal constructor, as applied to NKS theory by Wolfram [15], is discussed as a potentially universal measurement scheme.
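To illustrate what question-independence might look like for a measure such as information entropy (a minimal sketch with invented data): the same computation applies whether the observations are spike counts or characters of text, before any particular question has been posed.

```python
# Sketch of a "question-independent" measure: Shannon entropy can be computed
# on almost any discretized observation before we have decided what question
# the data are supposed to answer. The datasets below are invented.
import math
from collections import Counter

def shannon_entropy(observations):
    """Entropy (bits) of the empirical distribution over observed symbols."""
    counts = Counter(observations)
    n = len(observations)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

neural_like = [0, 0, 1, 3, 0, 2, 0, 1, 0, 0]       # hypothetical spike counts per bin
text_like   = list("what we don't know drives what we ask")

print(shannon_entropy(neural_like))   # same measure,
print(shannon_entropy(text_like))     # entirely different domains
```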
There are two additional points I found intriguing. The first is a fundamental difference between scientific fields where there is a high degree of "ignorance" (e.g. neuroscience) and those where the degree is relatively low (e.g. particle physics). This is not a new observation, but it has implications for applied science. For example, the interferometer is a tool used in the physical sciences to extract information from the relationships (interference) between signals in a system. Would it be possible to build an interferometer based on neural data? Yes and no. While there is an emerging technology called brain-machine interfaces (BMI), these interfaces are limited to well-characterized signals and favorable electrophysiological conditions [16]. Indeed, as we uncover more and more about brain function, perhaps brain-machine interface technology will come closer to being an interferometer. Or perhaps not, which in itself would reveal a lot about how intractable ignorance (e.g. an abundance of unknowable features) might be in this field.
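As a loose analogy only (not a BMI pipeline; the signals below are synthetic): the interferometer-like move is to read information out of the relationship between signals rather than out of either signal alone, for example recovering the delay of a weak shared component from the cross-correlation of two noisy channels.

```python
# Loose analogy, not a BMI pipeline: read information out of the relationship
# between two noisy "channels" that share a weak common component, the way an
# interferometer reads information out of how signals combine. All signals
# here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n, true_delay = 2000, 17
common = rng.normal(size=n + true_delay)
ch1 = common[true_delay:] + rng.normal(scale=2.0, size=n)   # shared component appears here first
ch2 = common[:n] + rng.normal(scale=2.0, size=n)            # ...and 17 samples later here

z1 = (ch1 - ch1.mean()) / ch1.std()
z2 = (ch2 - ch2.mean()) / ch2.std()
xcorr = np.correlate(z2, z1, mode="full") / n               # normalized cross-correlation
lags = np.arange(-n + 1, n)
print("recovered delay of ch2 relative to ch1:", lags[np.argmax(xcorr)])   # ~17
```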
The second point involves the nature of innovation, or rather, innovations that lead to useful inventions. It is generally thought that engaging in applied science is the shortest route to success in this area. After all, pure research (e.g. asking questions about the truly unknown) involves blind trial-and-error and ad-hoc experiments that lead to hard-to-interpret results. Yet Firestein argues that pure research might be more useful for generating future innovation than we tend to recognize. This is because while there are many blind alleyways in the land of pure research, there are also many opportunities for serendipity (e.g. luck). It is the experiments that benefit from luck which potentially drive innovation the furthest.
NOTES:
[1] One example is Brian Greene, a popularizer of string theory and theoretical physics and a faculty member at Columbia University.
[2] via researching the literature and informal conversations among colleagues.
[3] for an example, please see: Frijters, R., van Vugt, M., Smeets, R., van Schaik, R., de Vlieg, J., and Alkema, W. (2010). Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases. PLoS Computational Biology, 6(9), e1000943.
[4] Yong, E. (2012). Bad Copy. Nature, 485, 298.
[5] Ioannidis, J.P. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8), e124.
[6] Also the author of the following book: Hawkins, J. and Blakeslee, S. (2004). On Intelligence. Basic Books.
[7] here is a list of examples of adaptation and natural selection (COURTESY: PBS).
[8] for an example of how this has affected the field of Psychology, please see: Sternberg, R.J. and Grigorenko, E.L. (2001). Unified Psychology. American Psychologist, 56(12), 1069-1079.
[9] this was true of biology in the 19th century, and neuroimaging in the late 20th century. There are likely other examples I have not included here.
[10] here is more information on curve fitting (demo) and normalization (Wiki).
[11] this was discussed in the HTDE Workshop (2012). What is the computational complexity of a scientific problem? Can this be solved via parallel computing, high-throughput simulation, or other strategies?
Here are some additional insights from the philosophy of science (A) and the emerging literature on solving well-defined problems through optimizing experimental design (B):
A] Casti, J.L. (1989). Paradigms Lost: images of man in the mirror of science. William Morrow.
B] Feala, J.D., Cortes, J., Duxbury, P.M., Piermarocchi, C., McCulloch, A.D., and Paternostro, G. (2010). Systems approaches and algorithms for discovery of combinatorial therapies. Wiley Interdisciplinary Reviews: Systems Biology and Medicine, 2, 127.
AND Lee, C.J. and Harper, M. (2012). Basic Experiment Planning via Information Metrics: the RoboMendel Problem. arXiv, 1210.4808 [cs.IT].
[12] our spacetime topology corresponds to a metric space, a common context, or conceptual framework. In an operational sense, this could be the dynamic range of a measurement device or the logical structure of a theory.
[13] I have no idea how this would square away with current theory. For now, let’s just consider the possibility.....
[14] the citation is: Brooks, M. (2009). 13 Things That Don't Make Sense. Doubleday.
[15] NKS (New Kind of Science) demos are featured at Wolfram's NKS website.
[16] this is a theme throughout the BMI and BCI literature, and includes variables such as type of signal used, patient population, and what is being controlled.