
March 18, 2012

Methods of Controlling Intelligence

This blog post will focus on the recent (and not so recent) attempts to quantify, control, and augment intelligent performance-related behavior in human beings. The intersection of human intelligence and artificial intelligence by way of human performance goes by the name of Augmented Cognition. Augmented Cognition, generally regarded as a domain of Human Factors engineering, also has broad applications to human-machine systems. Relevant application domains could range from automotive and transportation performance to human interactions with information technologies and bioengineered prosthetic devices.

Augmented Cognition is distinct from traditional artificial intelligence, in which a general purpose intelligence is constructed de novo to control all aspects of intelligent behavior. Rather than machine intelligence compensating for the shortcomings of human intelligence, human intelligence compensates for the shortcomings of machine intelligence. Academic interest in this set of problems began in the 1950's [1], while contemporary approaches have included information technologies and DARPA's Augmented Cognition project. As applied to technology, this work falls into the broader category of human-assisted intelligent control.

There are two main components of augmenting human intelligence by computational means. The first is a closed-loop system involving a feedforward and a feedback component between the individual and the technological system that enables augmentation. This could be a heads-up display, a mobile device, or a brain-computer interface controlled by a real-time algorithm. The second is a model of human performance for a given set of cognitive and physiological functions, which determines a control policy. Examples of both are provided below, along with a consideration of open problems in this field.

Closed-loop System Design

In an article from the 1950's [2], W.R. Ashby took a cybernetic approach to first-order (i.e. no intermediate variables) intelligence augmentation (Figures 1-4). While somewhat crude by modern standards (under which sensors provide real-time measurements of physiological state), it lays out a simple theoretical model for augmenting cognitive and neural function.

 
Figure 1. Highlight for the X component.

Figure 2. Highlight for the G component.

Figure 3. Highlight for the S component.

Figure 4. Highlight for the U component.

In the Ashby model, the feedforward component (G) is the intelligence of the user applied to performance captured by the device. This might be driving performance, or accuracy in moving an object. While the idea that intelligence can be distilled to a single variable is controversial, modern applications have used variables such as accuracy counts or a specific electrophysiological signal to "drive forward" the system. The amplifier (S) gathers the feedforward elements of G and operates on them in a selective manner. This can be treated as either an optimization problem [3] or an inverse problem [4], and it defines the control policy imposed on the performance data. In the Yerkes-Dodson example shown later on, a minimax-style optimization method is used. The feedback element (U) is a signal derived from the information in G, and should contribute to an improvement in performance as reflected in subsequent measurements of G.
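To make the loop concrete, here is a minimal Python sketch of the G/S/U architecture. Everything in it beyond the three labels is an assumption for illustration: the noisy readout, the below-baseline threshold rule (standing in for the optimization [3] or inverse-problem [4] formulations), and the feedback gain are invented rather than taken from Ashby.

```python
import random

def measure_G(state):
    # Feedforward component G: a scalar performance measurement,
    # e.g. an accuracy count or an electrophysiological signal.
    # (Hypothetical noisy readout of the user's true performance.)
    return state["performance"] + random.gauss(0, 0.05)

def amplifier_S(g, history):
    # Amplifier S: operates selectively on the feedforward signal.
    # A simple below-baseline threshold rule stands in here for the
    # optimization [3] or inverse-problem [4] formulations.
    history.append(g)
    baseline = sum(history) / len(history)
    return max(0.0, baseline - g)      # intervene only when below baseline

def feedback_U(state, correction):
    # Feedback element U: a cue (display prompt, stimulus change)
    # intended to improve subsequent measurements of G.
    state["performance"] += 0.5 * correction   # invented feedback gain

state, history = {"performance": 0.6}, []
for step in range(20):
    g = measure_G(state)
    u = amplifier_S(g, history)
    feedback_U(state, u)
    print(f"step {step:2d}  G = {g:.3f}  U = {u:.3f}")
```

The point is only the loop structure: G feeds S, S imposes a control policy, and U closes the loop by altering the state that the next measurement of G will reflect.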

Contemporary Models from Human Performance

More contemporary models for augmenting human performance [5,6] have involved mapping closed-loop control to a physiological response function. Figures 5 through 7 show how this works in the context of the Yerkes-Dodson curve. The Yerkes-Dodson curve is an inverted U-shaped function that characterizes performance as a function of arousal, indexed by some physiological measurement. At both low and high values of the physiological indicator, performance is poor; at moderate values, performance peaks. The goal of an amplifier (also called a mitigation strategy) is to keep the measured physiological state within the range of arousal values where performance is highest; a toy sketch of such a policy follows the figures below.

 
Figure 5. Example of a physiological response function (e.g. Yerkes-Dodson curve).

Figure 6. Example of a mitigation strategy.

Figure 7. Keeping performance within an optimal range.
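To make Figures 5 through 7 concrete, the sketch below (in Python) models performance as an invented inverted-U function of a physiological indicator and applies a mitigation only when the indicator drifts out of the band where predicted performance is near its peak. The curve shape, band limits, and step size are all assumptions, not values from [5,6].

```python
import math

def performance(x):
    # Toy inverted-U (Yerkes-Dodson-style) response: peaks at
    # moderate arousal (x = 0.5), falls off at low and high arousal.
    return math.exp(-((x - 0.5) ** 2) / (2 * 0.15 ** 2))

def mitigate(x, low=0.35, high=0.65, step=0.05):
    # Mitigation strategy: nudge the indicator back toward the
    # optimal band whenever it drifts outside [low, high].
    if x < low:
        return x + step   # e.g. an alerting cue to raise arousal
    if x > high:
        return x - step   # e.g. task offloading to lower arousal
    return x              # within the optimal range: do nothing

x = 0.9   # start in an over-aroused state
for t in range(10):
    print(f"t={t}  indicator={x:.2f}  performance={performance(x):.3f}")
    x = mitigate(x)
```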

Two Outstanding Problems

There are two potential challenges to this control policy: a reliance on convexity, and the need for complete measurement of physiological state. The arousal and attention example shown here has attracted interest because it is relatively easy to mitigate. The development of brain-machine interfaces has likewise focused on simple-to-characterize physiological signals (such as population vector codes for movement [7] or spectral bands of an EEG [8]). However, not all physiological response functions are so simple to characterize. In cases of significant non-convexity (cases where the response function does not form smooth, convex gradients), it may be quite difficult to mitigate suboptimal behavior or physiological responses [9]. In such cases, there could be multiple locally optimal points, each with very different performance characteristics.
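The convexity problem can be seen in a toy example. In the Python sketch below, the response function (invented for illustration) has two peaks of unequal height; a greedy hill-climbing mitigation started on the wrong side of the valley settles on the lower peak and never finds the global optimum.

```python
import math

def response(x):
    # Invented non-convex response: two peaks of unequal height,
    # at x = 0.25 (lower) and x = 0.75 (the global optimum).
    return (0.6 * math.exp(-((x - 0.25) ** 2) / 0.005)
            + 1.0 * math.exp(-((x - 0.75) ** 2) / 0.005))

def hill_climb(x, step=0.01, iters=200):
    # Greedy mitigation: repeatedly move in whichever direction
    # improves the response; a stand-in for any purely local method.
    for _ in range(iters):
        x = max((x - step, x, x + step), key=response)
    return x

for start in (0.1, 0.6):
    end = hill_climb(start)
    print(f"start={start}  end={end:.2f}  response={response(end):.3f}")
# The run from 0.1 stalls at the lower peak (~0.25); only the run
# from 0.6 reaches the global optimum (~0.75).
```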

The complete measurement of physiological state is another potential problem with this method. While fully characterizing a physiological or behavioral process is the most obvious difficulty, the adaptability of a physiological system to repeated mitigation is a more subtle but important problem. In some cases, the physiological response will habituate to the mitigation treatments and render them ineffective. In the case of presenting information on a heads-up display, users might simply come to ignore the presented cues over long periods of time. It might also be that encouraging rapid changes in arousal level is more effective than encouraging a fixed level of performance over time. In both strength training regimens and more general physiological responses to the environment, switching between stimuli of alternating intensities can have complex and ultimately adaptive consequences for the long-term response.

Incorporating intelligence augmentation into the design of a technological system is an ongoing challenge. In a future post, I will focus on why certain aspects of human and animal intelligence are fundamentally different from current approaches to machine learning and artificial intelligence, and how they can potentially aid and complement those approaches.

References:

[1] Ashby, W.R. (1952). Design for a Brain. Chapman and Hall, London.

[2] Ashby, W.R. (1956). Design for an Intelligence Amplifier. In Shannon, C.E. and McCarthy, J. (Eds.), Automata Studies. Princeton University Press, Princeton, NJ.

[3] An optimization method uses an objective criterion to select a range of values thought to either minimize or maximize some property of the system.

[4] An inverse problem is one where the solution is known, but the route to that solution is not.

[5] Schmorrow, D.D. and Stanney, K.M. (Eds.) (2008). Augmented Cognition: A Practitioner's Guide. HFES Publications.

[6] Fuchs, S., Hale, K.S., Stanney, K.M., Juhnke, J., and Schmorrow, D.D. (2007). Enhancing Mitigation in Augmented Cognition. Journal of Cognitive Engineering and Decision Making, 1(3), 309-326.

[7] Jarosiewicz, B., Chase, S.M., Fraser, J.W., Velliste, M., Kass, R.E., and Schwartz, A.B. (2008). Functional network reorganization during learning in a brain-computer interface paradigm. PNAS, 105(49), 19486-19491.

[8] Lotte, F., Congedo, M., Lecuyer, A., Lamarche, F., and Arnaldi, B. (2007). A review of classification algorithms for EEG-based Brain-Computer Interfaces. Journal of Neural Engineering, 4, 1-24.

[9] Alicea, B. (2008). The adaptability of physiological systems optimizes performance: new directions in augmentation. arXiv:0810.4884 [cs.HC, cs.NE].

October 3, 2011

The Curse of Orthogonality

This is an idea I developed after reading a paper [1] on detecting pluripotency in different cell populations. The theory was developed in terms of cell biology, but may also be applicable to a broader range of biological systems and even social systems.

One of the goals in analyzing biological and social systems is to gain a complete understanding of a given system or context. One assumption people make is that the more variables you have, the more complete the level of understanding. This is the idea behind the phrase "two heads are better than one". For example, combining different types of data (e.g. sensor output, images) or different indicators gives one multiple perspectives on a process. This is similar to several people blindly feeling different parts of an object and coming to a consensus as to its identity [2].

This model of consensus is popular because it is consistent with normative statistical models. In other words, the more types of measurement we have, the closer to an "average" description we will get. Furthermore, variables of different classes (e.g. variables that describe different things) should be separable (e.g. should not interact with one another). However, a situation may arise whereby multiple measurements of the same phenomenon yield contradictory results. While "two heads are better than one", we also intuitively know that there can be "too much of a good thing". By contradictory, I mean that these variables will strongly interact with one another. Not only do these inconsistent variables interact, but they also exhibit non-normative behaviors with respect to their distribution. This is consistent with the idea of "deviant" effects which cannot be easily understood using a normative model [3,4]. This is what I am calling the "curse of orthogonality", inspired by the "curse of dimensionality" identified by Bellman and others [5,6] in complex systems.

Orthogonality is usually defined as "mutually exclusive, or at a right angle to". While it is not equivalent to statistical independence, there is an unexplored relationship between the two. This is particularly true of underspecified systems, which most biological and social systems are. The "curse" refers to orthogonality in a slightly different sense. For any two variables that each have predictive power on their own, the combined predictive power is subadditive, due to that predictive power being oriented in different directions. In cases where there are many such variables, adding variables to an analysis will dampen the increase in predictive power.
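A small simulation (invented numbers, ordinary least squares via numpy) makes the subadditivity concrete: two indicators that each explain roughly half of the variance in an outcome explain far less than the sum of those shares when combined, because their individual contributions are not independent of one another.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# A and B are two noisy indicators of the same underlying process,
# and they share part of their noise (all parameters invented).
process = rng.normal(size=n)
noise_A = rng.normal(scale=0.8, size=n)
A = process + noise_A
B = process + 0.9 * noise_A + rng.normal(scale=0.4, size=n)
y = process + rng.normal(scale=0.5, size=n)   # the process being predicted

def r_squared(cols, y):
    # Fraction of variance in y explained by a least-squares fit.
    X = np.column_stack([np.ones(len(y))] + list(cols))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ coef).var() / y.var()

rA, rB = r_squared([A], y), r_squared([B], y)
rAB = r_squared([A, B], y)
print(f"R2(A) = {rA:.2f}, R2(B) = {rB:.2f}, naive sum = {rA + rB:.2f}")
print(f"R2(A and B combined) = {rAB:.2f}  (subadditive)")
```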

Non-additivity is a common attribute of epistatic systems (where multiple genes interact [7] during expression), drug synergies (e.g. where drugs interact [8] when administered together), and sensory reweighting (e.g. when multisensory cues are dynamically integrated in the brain [9] during behavior). However, there is no general principle that explains tendencies involving such nonlinear effects.

As an example, let us take two indicators of the same process, called A and B. Taken separately, A and B correlate well with the process in question. However, most processes in complex systems have many moving parts and undercurrents that interact. Therefore, the combination of A and B will not be additive. We can see this in the two figures (1 and 2) below.


Figure 1.

Figure 2.


In Figure 1, the existence or lack of co-occurrence for A and B can be classified according to their sensitivity and specificity. In an actual data analysis, this gives us a subset of true positives which varies in proportion based on the actual data. In this example, true positives are cases where both A and B correctly predict a given state of the system under study. Seen as a series of pseudo-distributions (Figure 2), the overlap between A and B provides a region of true positives. What can be learned from this theoretical example? The first thing is that even though co-occurrence relations between A and B predict each category, one variable (variable B in the case of Figure 2) might exhibit more commonly-expressed variance than the other (e.g. more support in the center of the distribution). By extension, the other variable (variable A in the case of Figure 2) might exhibit longer tails (e.g. variance that contributes to rare events). It is predicted that greater heterogeneity between the distributions (e.g. different variables with different distributions) will result in poorer discrimination (e.g. a higher percentage of false positives).

In cases such as this, we can say that A and B are quasi-independently distributed. That is, even though their distributions are overlapping, observations of A and B's behavior in the same context can often mimic the behavior of two independent variables. This is because while A and B are part of the same process, they are not closely integrated in a functional sense. In other words, using a small number of variables to predict a highly complex system will often yield a sub-additive explanation of the observed variance.
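The following toy construction (all distributions invented) illustrates this quasi-independence: the only thing indicators A and B share is the underlying state of the system, so conditioned on that state their correct predictions co-occur at almost exactly the rate expected of two independent variables. B is also given longer tails than A, in the spirit of Figure 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
state = rng.integers(0, 2, size=n)     # true system state: 0 or 1

# Two indicators of the same state: A with compact Gaussian noise,
# B with longer-tailed noise (cf. Figure 2). Parameters are invented.
A = state + rng.normal(scale=0.7, size=n)
B = state + rng.standard_t(df=3, size=n)

flag_A, flag_B = A > 0.5, B > 0.5      # each indicator's prediction
on = state == 1
tp_A = np.mean(flag_A[on])             # sensitivity of A
tp_B = np.mean(flag_B[on])             # sensitivity of B
tp_joint = np.mean((flag_A & flag_B)[on])

print(f"P(A correct | state=1) = {tp_A:.3f}")
print(f"P(B correct | state=1) = {tp_B:.3f}")
print(f"P(both correct | state=1) = {tp_joint:.3f}")
print(f"product if independent    = {tp_A * tp_B:.3f}")
```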

Unlike the curse of dimensionality, the curse of orthogonality depends more on the effects of interactions than on the number of dimensions or variables. Therefore, overcoming the curse of orthogonality is tied to a better understanding of sub- and super-additive interactions and the effects of extreme events in complex systems. While this "curse" does not affect all complex systems, it is worth considering when dealing with highly complex systems where different measurements are in disagreement and multivariate analyses yield poor results.

References:
[1] Hough et al. (2009). A continuum of cell states spans pluripotency and lineage commitment in human embryonic stem cells. PLoS One, 4(11), e7708.

[2] Aitchison, J. and Schwikowski, B. (2003). Systems Biology and Elephants. Nature Cell Biology, 5, 285.

[3] Samoilov, M.S. and Arkin, A.P. (2006). Deviant effects in molecular reaction pathways. Nature Biotechnology, 24(10), 1235-1240.

[4] Evans, M., Hastings, N., and Peacock, B. (2000). Statistical Distributions. Wiley, New York.

[5] Bellman, R.E. (1957). Dynamic programming. Princeton University Press, Princeton, NJ.

[6] Donoho, D.L. (2000). High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. Lecture, AMS Conference on Math Challenges of the 21st Century.

[7] Cordell, H.J. (2002). Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans. Human Molecular Genetics, 11, 2463–2468.

[8] Tallarida, R.J. (2001). Drug Synergism: Its Detection and Applications. Journal of Pharmacology and Experimental Therapeutics, 298(3), 865–872.

[9] Carver, S., Kiemel, T., Jeka, J.J. (2006). Modeling the dynamics of sensory reweighting. Biological Cybernetics, 95, 123–134.

December 23, 2008

Evolution as an Inverse Problem, Part I

A few years back, I was at a conference (SwarmFest '04, if you must know) at which I heard Marco Dorigo (from the Swarm Intelligence community) characterize the collective behavior of social insects (nest building in this case) as an "inverse problem". For those of you unfamiliar, inverse problems are complex problems where the data is well-known, but the model parameters are not.


Definition of an Inverse Problem (Wolfram MathWorld)


Take Dorigo's anthill-construction problem as an example. We can easily observe the actions of each ant and the interactions between them. This can even be extrapolated to a description of the anthill structure. However, this description is not generalizable to all instances of an anthill. The reason is that because the structure is emergent, an anthill of a particular morphology can be produced by any number of equally suitable sequences of interactions. In other words, there are many different, equally suitable ways to produce the observed structure. If we were to attempt a reconstruction of the anthill without our a priori observations, we would fail: for this reason, inverse problems such as these are called ill-posed problems. Other behaviors (such as arm movements) have also been called ill-posed problems.
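A toy enumeration (in Python, with sites and placements as invented stand-ins for ant interactions) makes the many-to-one mapping explicit: a dozen distinct construction histories all terminate in the same final structure, so the inverse map from structure back to history has no unique solution.

```python
from itertools import permutations

# A final "anthill" is just the multiset of grains deposited at each
# site; a construction history is an ordered sequence of placements.
placements = ["site1", "site1", "site2", "site3"]

histories = set(permutations(placements))     # all distinct orderings
structure = tuple(sorted(placements))         # the observed end state

print(f"observed structure: {structure}")
print(f"distinct histories producing it: {len(histories)}")
# 12 interaction sequences map onto one structure; recovering the
# actual history from the structure alone is therefore ill-posed.
```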

Why is this relevant to the post?

If the parallels to evolution weren't already apparent from the word "reconstruction", consider the following: there are many possible ways to get to a coherent anthill, just as there are many ways to get to a fit phenotype. A form of convergent evolution called neutral networks, where selection is not extreme and many genotypes are fit enough to provide an adaptive solution, comes to mind here.


Definition of Neutral Networks
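Here is a minimal sketch of the neutral-network idea, under the invented assumption that only the first three loci of a six-bit genotype affect fitness: genotypes differing at the remaining loci are selectively neutral, and single point mutations connect them into a network of equally fit solutions.

```python
from itertools import product

L = 6   # genotype length in bits

def fitness(genotype):
    # Invented fitness function: only the first three loci matter,
    # so variation at the last three loci is selectively neutral.
    return sum(genotype[:3])

def neighbors(g):
    # All genotypes one point mutation away.
    return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(L)]

# Group genotypes into sets of equal fitness.
neutral_sets = {}
for g in product((0, 1), repeat=L):
    neutral_sets.setdefault(fitness(g), []).append(g)

# Count neutral edges among the maximally fit genotypes: single
# mutations that leave fitness unchanged (each edge seen twice).
top = neutral_sets[max(neutral_sets)]
edges = sum(fitness(n) == fitness(g)
            for g in top for n in neighbors(g)) // 2
print(f"{len(top)} maximally fit genotypes,")
print(f"{edges} neutral single-mutation connections among them")
```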


In addition, the concept (if not the analytical techniques) of inverse problems could be useful for a better understanding of parameters such as transcriptional regulation, gene action, selection, and even fitness. This is especially important for understanding how these parameters assemble a complex phenotype from a genotype.

In the next installment, I will consider the basic combinatorics of anthills and gene action, and how this might produce emergent structures with and without "selection".
