October 3, 2011

The Curse of Orthogonality

This is an idea I developed after reading a paper [1] on detecting pluripotency in different cell populations. The theory was developed in terms of cell biology, but may also be applicable to a broader range of biological systems and even social systems.

One of the goals in analyzing biological and social systems is to gain a complete understanding of a given system or context. A common assumption is that the more variables you have, the more complete your understanding will be. This is the idea behind the phrase "two heads are better than one". For example, combining different types of data (e.g. sensor output, images) or different indicators gives one multiple perspectives on a process. This is similar to several people blindly feeling different parts of an object and coming to a consensus as to its identity [2].

This model of consensus is popular because it is consistent with normative statistical models. In other words, the more types of measurement we have, the closer to an "average" description we will get. Furthermore, variables of different classes (i.e. variables that describe different things) should be separable (i.e. should not interact with one another). However, a situation may arise whereby multiple measurements of the same phenomenon yield contradictory results. While "two heads are better than one", we also intuitively know that there can be "too much of a good thing". By contradictory, I mean that these variables will strongly interact with one another. Not only do these inconsistent variables interact, they also exhibit non-normative behaviors with respect to their distribution. This is consistent with the idea of "deviant" effects which cannot be easily understood using a normative model [3,4]. This is what I am calling the "curse of orthogonality", which is inspired by the "curse of dimensionality" identified by Bellman and others [5,6] in complex systems.

Orthogonality is usually defined as "mutually exclusive, or at a right angle to". While it is not equivalent to statistical independence, there is an unexplored relationship between the two. This is particularly true of underspecified systems, a category that includes most biological and social systems. The "curse" refers to orthogonality of a slightly different definition. For any two variables that on their own have predictive power, the combined predictive power is subadditive because that predictive power is oriented in different directions. In cases where there are many such variables, each variable added to an analysis will yield a smaller increase in predictive power than it would on its own.
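One way to write this down, as a sketch only: assume some measure of predictive power P (for example, the fraction of variance explained). The symbol P and the variables X_1, ..., X_k are notation introduced here for illustration and are not part of the original theory. Under that assumption, the two-variable claim and one reading of the many-variable claim are:

```latex
% Two variables: combined predictive power is less than the sum of the parts
P(A, B) < P(A) + P(B)

% Many variables: the gain from adding X_{k+1} is less than its standalone power
P(X_1, \dots, X_k, X_{k+1}) - P(X_1, \dots, X_k) < P(X_{k+1})
```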

Non-additivity is a common attribute of epistatic systems (where multiple genes interact [7] during expression), drug synergies (e.g. where drugs interact [8] when administered together), and sensory reweighting (e.g. when multisensory cues are dynamically integrated in the brain [9] during behavior). However, there is no general principle that explains tendencies involving such nonlinear effects.

As an example, let us take two indicators of the same process, called A and B. Taken separately, A and B correlate well with the process in question. However, most processes in complex systems have many moving parts and undercurrents that interact. Therefore, the combination of A and B will not be additive. We can see this in the two figures (1 and 2) below.

Figure 1

Figure 2

In Figure 1, the existence or lack of co-occurrence for A and B can be classified according to their sensitivity and specificity. In an actual data analysis, this gives us a subset of true positives whose proportion depends on the data. In this example, true positives are cases in which both A and B correctly predict a given state of the system under study. Seen as a series of pseudo-distributions (Figure 2), the overlap between A and B provides a region of true positives. What can be learned from this theoretical example? The first thing is that even though co-occurrence relations between A and B predict each category, one variable (variable B in the case of Figure 2) might exhibit more commonly-expressed variance than the other (e.g. more support in the center of the distribution). By extension, the other variable (variable A in the case of Figure 2) might exhibit longer tails (e.g. variance that contributes to rare events). The prediction is that the more heterogeneity there is between the distributions (e.g. different variables with different distributions), the less discriminant the result will be (e.g. a higher percentage of false positives).
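The classification described for Figure 1 can be sketched in a few lines of Python. This is only an illustration: the eight observations below are made-up toy data, and the function name `sensitivity_specificity` is hypothetical. It compares each indicator on its own with the stricter rule that A and B must co-occur.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for two equal-length lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)              # true positives
    tn = sum((not p) and (not a) for p, a in pairs)  # true negatives
    fp = sum(p and (not a) for p, a in pairs)        # false positives
    fn = sum((not p) and a for p, a in pairs)        # false negatives
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy data: the true state of the system and two indicators.
state = [True, True, True, True, False, False, False, False]
a = [True, True, True, False, True, False, False, False]
b = [True, True, False, True, False, True, False, False]

# Co-occurrence: call a positive only when A and B agree on it.
both = [x and y for x, y in zip(a, b)]

result_a = sensitivity_specificity(a, state)        # (0.75, 0.75)
result_b = sensitivity_specificity(b, state)        # (0.75, 0.75)
result_both = sensitivity_specificity(both, state)  # (0.5, 1.0)
print(result_a, result_b, result_both)
```

In this toy case, requiring co-occurrence removes the false positives but also discards half of the true positives, so the region of agreement between A and B is smaller than what either indicator captures alone.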

In cases such as this, we can say that A and B are quasi-independently distributed. That is, even though their distributions are overlapping, observations of A and B's behavior in the same context can often mimic the behavior of two independent variables. This is because while A and B are part of the same process, they are not closely integrated in a functional sense. In other words, using a small number of variables to predict a highly complex system will often yield a sub-additive explanation of the observed variance.
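A sub-additive explanation of variance is easy to reproduce in a simulation. The sketch below assumes one simple mechanism, which may not be the only one at work in real systems: A and B are each a noisy readout of the same hidden process. All names and noise levels are hypothetical. It uses the standard formula for the R² of a two-predictor linear regression in terms of pairwise correlations.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def r2_single(x, y):
    """Variance in y explained by a linear fit on x alone."""
    return pearson(x, y) ** 2

def r2_pair(x1, x2, y):
    """Variance in y explained by a linear fit on x1 and x2 together."""
    r1, r2, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
process = [rng.gauss(0, 1) for _ in range(n)]  # hidden process (assumed)
a = [z + rng.gauss(0, 1) for z in process]     # indicator A: process plus noise
b = [z + rng.gauss(0, 1) for z in process]     # indicator B: process plus independent noise

r2_a = r2_single(a, process)
r2_b = r2_single(b, process)
r2_ab = r2_pair(a, b, process)
print(f"A alone: {r2_a:.3f}  B alone: {r2_b:.3f}  A and B: {r2_ab:.3f}")
```

With these settings each indicator explains roughly half of the variance on its own, while the pair explains roughly two thirds, well short of the sum of the two individual values.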

Unlike the curse of dimensionality, the curse of orthogonality depends more on the effects of interactions than on the number of dimensions or variables. Therefore, overcoming the curse of orthogonality is tied to a better understanding of sub- and super-additive interactions and the effects of extreme events in complex systems. While this "curse" does not affect all complex systems, it is worth considering when dealing with highly complex systems in which different measurements are in disagreement and multivariate analyses yield poor results.

[1] Hough et al. (2009). A continuum of cell states spans pluripotency and lineage commitment in human embryonic stem cells. PLoS One, 4(11), e7708.

[2] Aitchison, J. and Schwikowski, B. (2003). Systems Biology and Elephants. Nature Cell Biology, 5, 285.

[3] Samoilov, M.S. and Arkin, A.P. (2006). Deviant effects in molecular reaction pathways. Nature Biotechnology, 24(10), 1235-1240.

[4] Evans, M., Hastings, N., and Peacock, B. (2000). Statistical Distributions. Wiley, New York.

[5] Bellman, R.E. (1957). Dynamic programming. Princeton University Press, Princeton, NJ.

[6] Donoho, D.L. (2000). High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. Analysis, 1-33. American Mathematical Society.

[7] Cordell, H.J. (2002). Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans. Human Molecular Genetics, 11, 2463–2468.

[8] Tallarida, R.J. (2001). Drug Synergism: Its Detection and Applications. Journal of Pharmacology and Experimental Therapeutics, 298(3), 865–872.

[9] Carver, S., Kiemel, T., Jeka, J.J. (2006). Modeling the dynamics of sensory reweighting. Biological Cybernetics, 95, 123–134.
