Graphical free-association

1. Demography and Economics

I found these graphs online in the course of reading the news, following up on fleeting ideas, etc. This is an example of a free-association exercise I like to do with collections of graphs and data to better understand their underlying dynamics.

The first is a graph representing the imbalance between GDP growth and employment growth among major cities in the USA. I have read that this imbalance is thought to result from gains in productivity (where technology reduces the work normally available to humans). While this might be thought of as an optimizing process (technology makes a process more efficient), it is also a displacement process (people trained for work in one sector of the economy must shift to a new domain). I've never heard an economist talk about it in terms of the latter, but in any case here is the graph:

GDP by US Metro Area, The Economist



The second graph is a log-log plot of population estimates for the planet from 10,000 years ago to the present. The lower bound of this graph corresponds roughly with the advent of agriculture and the beginning of the Holocene (i.e. the end of the last ice age). There are several interesting trends in these graphs that are related to economic growth and the maturity of a society.

Most people are aware of the population expansion that has taken place in the last 500 years. However, on a log-log scale we can see that there was a correspondingly large proportional increase in human populations about 5000 to 8000 years ago. This corresponds to a period when several societies around the planet were rapidly increasing in complexity. In the archaeological community, this is a well-known signature called the Neolithic demographic transition, and it is characterized by large increases in the proportion of 8-15 year old individuals at burial sites [1].
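To see why a log scale brings out an early expansion that a linear scale hides, here is a minimal sketch. The population figures are hypothetical round numbers picked to illustrate the scaling argument; they are not read off the graph.

```python
import math


def log_distance(start, end):
    """Distance between two values on a log10 axis, in decades."""
    return math.log10(end) - math.log10(start)


# Hypothetical round numbers (NOT estimates taken from the graph):
early = (5e6, 5e7)    # a tenfold rise from a small base
modern = (5e8, 5e9)   # a tenfold rise from a large base

early_abs = early[1] - early[0]
modern_abs = modern[1] - modern[0]

# On a linear axis the modern rise dwarfs the early one...
print(f"absolute increase: early={early_abs:.2e}, modern={modern_abs:.2e}")
# ...but on a log axis both span the same distance (one decade).
print(f"log10 distance: early={log_distance(*early):.2f}, "
      f"modern={log_distance(*modern):.2f}")
```

Equal proportional changes take up equal lengths on a log axis, which is why the two expansions can look comparable on this kind of plot even though their absolute sizes differ by orders of magnitude.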

Log-Log World Population Growth, Wikipedia


A comparison and contrast between this era and the modern era (which has also seen large population growth) may also be understood by turning to dynamical models. In particular, the Lotka-Volterra model uses coupled differential equations to model the growth trajectories of two interacting populations given variation in the birth and death rate for each. In Lotka-Volterra (e.g. predator-prey) dynamics, it is assumed that one population has a direct effect on the other [2]. For example, the birth rate in one has an effect on the death rate in the other, with some allowable degree of lag. Compare this to the demographic transition model, for which a graph is shown below:

Concept of Demographic Transition in Human Societies, Wikipedia

In this example, there is no competition between populations per se, although it is not specified what is driving the internal dynamics. In the case of the above graph, it almost seems as though a high birth rate that transiently outstrips the death rate can prime future population growth, even if the birth rate falls during the exponential phase of population growth. This model has been used to explain human population growth, but may also reveal how complex and fleeting the precursors to human demographic and economic fluctuations actually are.
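The contrast between the two models can be sketched numerically. This is a minimal illustration under stated assumptions, not a fitted model. Every parameter value and rate schedule below is hypothetical, the simple Euler integration is my own choice, and the function names are made up for this example. The first function couples two populations in the standard predator-prey form. The second follows a single population whose birth and death rates are supplied from outside, with the death rate falling before the birth rate does.

```python
def lotka_volterra(prey, pred, alpha, beta, delta, gamma, dt, steps):
    """Euler integration of the predator-prey equations:
        d(prey)/dt = alpha*prey - beta*prey*pred
        d(pred)/dt = delta*prey*pred - gamma*pred
    Each population's rate of change depends directly on the other."""
    history = [(prey, pred)]
    for _ in range(steps):
        d_prey = (alpha * prey - beta * prey * pred) * dt
        d_pred = (delta * prey * pred - gamma * pred) * dt
        prey, pred = prey + d_prey, pred + d_pred
        history.append((prey, pred))
    return history


def demographic_transition(pop, birth_rates, death_rates):
    """One population; yearly per-capita rates are supplied from outside."""
    history = [pop]
    for b, d in zip(birth_rates, death_rates):
        pop *= 1.0 + b - d
        history.append(pop)
    return history


# Hypothetical parameters, chosen only so the oscillation is visible.
lv = lotka_volterra(prey=10.0, pred=5.0, alpha=1.0, beta=0.1,
                    delta=0.075, gamma=1.5, dt=0.001, steps=20000)

# Hypothetical rate schedules: the death rate starts falling at year 50,
# the birth rate only at year 100, and both level off at low values.
years = 200
death = [max(0.010, 0.040 - 0.0006 * max(0, t - 50)) for t in range(years)]
birth = [max(0.011, 0.041 - 0.0006 * max(0, t - 100)) for t in range(years)]
pop = demographic_transition(1.0, birth, death)

prey_values = [p for p, _ in lv]
print(f"prey range over the run: {min(prey_values):.1f} to {max(prey_values):.1f}")
print(f"population at year 100: {pop[100]:.2f}, at year 150: {pop[150]:.2f}")
```

In the coupled model the prey population rises and falls as the predator responds. In the single-population sketch the total keeps growing between years 100 and 150 even though the birth rate is falling, because births still exceed deaths throughout that period.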




If you really want to watch a conservative's head explode [3], try explaining the demographic dividend [4] to them. The demographic dividend is the relative payoff for each child produced [5], and can be loosely thought of as the true cost of reproduction. The shapes of both functions (red and blue) were determined by me.

The notion of "keeping up the family lineage" is traditionally a source of pride for many people, so continuous population growth seems naturally intuitive. Yet the above graph (based on actual trends) shows that we are entering a new statistical regime.

The figure above is a conceptual explanation, although the causal factors and driving forces are a bit more subtle (and beyond the scope of this post). Briefly, they involve the interplay between human demography, labor market dynamics, automation/technology, and ecological sustainability. Happy head exploding!


References:
[1] Bocquet-Appel, J-P. (2011). When the World’s Population Took Off: The Springboard of the Neolithic Demographic Transition. Science, 333(6042), 560-561. Podcast with author (Science).

[2] Hoppensteadt, F. (2006). Predator-Prey Model. Scholarpedia, 1(10), 1563.


[3] exploding head demonstration courtesy of the movie "Visioneers" (2008). Some conservatives (Ross Douthat, "More Babies, Please", NYT, December 1) still think that the route to economic growth is an increase in the birth rate. Strangely, this concept is not among their reasons for supporting it. The counterpoint (Mark Adomanis, "Ross Douthat, Demography, and Innovation", Forbes, December 3) makes it clear how short-sighted this argument can be.

[4] The demographic dividend can vary quite a bit by region and specific period of history, as the graph above is a rough average over the last 10,000 years or so.

Please see this book for more information: Bloom, D.E., Canning, D., and Sevilla, J. (2003). The Demographic Dividend: a new perspective on the economic consequences of population change. Rand Corporation.

[5] For more information on the history of human demography (particularly the Neolithic Demographic Transition, or NDT), please see [1]



2. Technology and Economics


Is the hype cycle of technological development (TOP) a social/marketwide version of the "uncanny valley" curve (BOTTOM) describing the emotional response to simulated human realism? The hype (or exposure) cycle involves the initial stages of diffusion for a new innovation (e.g. personal robots). There is an initial burst of interest (more idea than tangible technology), followed by a period of disillusionment and eventual mass adoption in the form of tangible products that contribute to economic productivity. 

The comparison to the uncanny valley curve involves the collective cognitive response to a new innovation: the idea and tangible applications diffuse in two conceptual waves. People must first be willing to accept the idea, which allows them to build a frame of reference for the technology. The ability of people to build a concept of the technology and sort through what it may and may not be able to do proceeds faster than the diffusion of the second wave. This contributes to the trough of disillusionment, the length of which can be mitigated by the rapid development and propagation of tangible technological products.



This one is a bit hard to understand at first, but will become clear as the graphs are explained. The main graph is a screenshot from a recently posted YouTube video [1] that demonstrates three things: what people think is the ideal income distribution in American society, what people think the distribution actually is [2], and what the distribution actually is. Notice that the right-hand side of the actual distribution is much sharper than the popular expectation. This sharpening also reflects the evolution of this distribution over the last 40 years.

The inset graph (lower left) characterizes newly-released survey data (from the General Social Survey) featured in the New York Times about gun ownership trends over the past 40 years [3]. Contrary to some popular conceptions about gun ownership (that increases in gun sales necessarily mean that more people own guns), the data actually show a trend that parallels the evolution of income inequality: more guns are being purchased by increasingly fewer people.

The connection between these two graphs involves an explanation for these independent results. Perhaps they suggest a general mechanism for demographic aggregation, or perhaps there is a more subtle mechanism for sorting and resource consolidation at work. Whatever the cause, the result is counterintuitive, and the underlying process is not easily understood at first glance.

[1] Politizane. Wealth Inequality in America. YouTube, November 20 (2012).

[2] For the work on cognitive biases and wealth distribution estimates, please see: Norton, M.I. and Ariely, D. Building a Better America -- One Wealth Quintile at a Time. Perspectives on Psychological Science, 6, 9 (2011).

[3] Tavernise, S. and Gebeloff, R. Share of Homes With Guns Shows 4-Decade Decline. NYT, March 9 (2013).



3. Something's not right with these data...


Which side of the first graph (top) [1] would you guess is a signature of accounting fraud? According to The Big Picture blog [2], it is the left-hand side (i.e. Jack Welch's tenure). This was all brought to light because Jack Welch is the one who accused the Obama administration of "cooking" recent unemployment statistics [3].

For comparison, the graph below is the U3 unemployment rate over the course of the first Obama administration [4].


References:
[1] COURTESY: Tom Brakke's Twitter Feed.

[2] GE’s Jack Welch Knows About Cooking the Books. October, 2012.

[3] see Paul Krugman's NYT article from 10/8, "The Truth about Jobs".

[4] Data from the US Bureau of Labor Statistics. While not the best measure of unemployment, notice the variability (spikiness) from month to month. That's what a "natural" time-series of a stochastic process should look like.
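The "spikiness" mentioned in note [4] can be given a rough number. The sketch below is only an illustration. The series is synthetic (it is not real unemployment or earnings data), and the mean absolute month-to-month change is just one simple score I picked, not a formal test for manipulated figures.

```python
import random


def mean_abs_change(series):
    """Average absolute month-to-month change: a crude 'spikiness' score."""
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(diffs) / len(diffs)


def moving_average(series, window):
    """Trailing moving average; smooths out month-to-month noise."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Synthetic data, NOT real figures: a slow decline plus random noise.
rng = random.Random(0)
months = 48
noisy = [9.5 - 0.04 * t + rng.gauss(0, 0.15) for t in range(months)]
smooth = moving_average(noisy, window=12)

print(f"spikiness of noisy series:    {mean_abs_change(noisy):.3f}")
print(f"spikiness of smoothed series: {mean_abs_change(smooth):.3f}")
```

A series that tracks a noisy process should score closer to the first value. One that is suspiciously close to a smooth trend scores closer to the second.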



4. Reporting Scientific Results

I found these graphs while reading papers for a laboratory-associated Journal Club. All of these examples are drawn from the cell biology literature, but similar examples can be found in other disciplines (so keep your eyes open!). If you have an example for me to post, please pass it along to me.

This is one of the worst graphs I have ever encountered (from [1]), and serves as a lesson of what NOT to do when visualizing data (NOTE: let's assume that fold-enrichment is on a scale from 0-100, or include a break between the maximal value for all condition-associated bars and the 100-fold level).

First of all, the y-axis is out of proportion with the data. And I have to tell you that this is supposed to be a bar graph, which is hard to discern because the bars are almost invisible (in that they are so close to the x-axis). If they wanted to emphasize the scale of their finding, there are probably better ways of doing it.


This example comes from Figure 2 in Szabo et al. [2]. It is more hard-to-interpret than a bona fide "worst graph", mainly because there are too many undifferentiated conditions (in this case, independent genes). Nevertheless, I am posting it here.



The last example is a bit more subtle, and comes from [3]. This is a graph of relative conversion, or percentage of cells positive for a certain marker that are also positive for a secondary marker (see the y-axis). Question: what is wrong with it?


Answer: the authors should have taken the conditional nature of this measure into account. The y-axis is a conditional percentage (the share of reporter-positive cells that are also tubulin-positive), not an overall conversion rate. For example, say 30% of all cells are reporter-positive and 30% of those reporter-positive cells are also tubulin-positive. The overall efficiency (a better measure for the y-axis) is then 9% (0.3 * 0.3). By contrast, the measure that was used to generate this graph provides an artificially large percentage, especially when compared to similar measures of cellular reprogramming efficiency.
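The arithmetic in the answer can be spelled out with cell counts. The counts below are hypothetical, chosen only to match the 30% / 30% example. They also assume that the second 30% is a share of the reporter-positive cells, as the y-axis description suggests. The function names are made up for this sketch.

```python
def conditional_fraction(n_reporter_pos, n_double_pos):
    """Share of reporter-positive cells that are also tubulin-positive."""
    return n_double_pos / n_reporter_pos


def overall_efficiency(n_total, n_double_pos):
    """Share of ALL counted cells that are positive for both markers."""
    return n_double_pos / n_total


# Hypothetical counts matching the 30% / 30% example in the text.
n_total = 1000
n_reporter_pos = 300   # 30% of all cells
n_double_pos = 90      # 30% of the reporter-positive cells

plotted = conditional_fraction(n_reporter_pos, n_double_pos)
efficiency = overall_efficiency(n_total, n_double_pos)

print(f"plotted (conditional) value: {plotted:.0%}")
print(f"overall efficiency:          {efficiency:.0%}")
```

The two numbers differ by the fraction of cells that were reporter-positive in the first place. So the plotted value always looks at least as large as the overall efficiency, and much larger when few cells carry the reporter.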

References: 
[1] Donmez, G., Wang, D., Cohen, D.E., and Guarente, L. (2010). SIRT1 Suppresses β-Amyloid Production by Activating the α-Secretase Gene ADAM10. Cell, 142(2), 320-332.

NOTE: The lead author on this paper was involved in a case of manipulated Western Blots. While this has nothing to do with the plotting of a graph, it speaks to quality control in the research process.

[2] Szabo, E., Rampalli, S., Risueno, R.M., Schnerch, A., Mitchell, R., Fiebig-Comyn, A., Levadoux-Martin, M., and Bhatia, M. (2010). Direct conversion of human fibroblasts to multilineage blood progenitors. Nature, 468, 521–526.

[3] Karow, M., Sanchez, R., Schichor, C., Masserdotti, G., Ortega, F., Heinrich, C., Gascon, S., Khan, M.A., Lie, D.C., Dellavalle, A., Cossu, G., Goldbrunner, R., Gotz, M., and Berninger, B. (2012). Reprogramming of Pericyte-Derived Cells of the Adult Human Brain into Induced Neuronal Cells. Cell Stem Cell, 11(4), 471-476.

5. Myths about Experimental Design and Analysis


6. Characterizing the Development of an Academic Field


I was going through some of my hard-copy magazines and other reading materials the other day, and stumbled upon a few issues of SEED Magazine. They have a feature where they create "curves" (this curve being interpolated from selected watershed events, which are ranked categorically) that represent the trajectory of a given research field. Inasmuch as one can quantify changes, major advances, and crises in various fields, they provide us with a good heuristic. The point of the graphic is to assure us that science does not proceed in a straight line. 

Since I could not find a virtual copy of this, I re-created the curve for group selection theory in the image above. The major landmarks were determined by the staff at SEED, and many lead-up events (lesser-known papers or influence from other fields or strong personalities) are not specified. The graph ends in 2010 at the publication of [6]. The future trajectory of the field is of course unknown, and in this case it is really unknown, as there was significant pushback and other discussion about this article in Nature and in the blogosphere (examples can be found here and here). In any case, I hope you find this useful as a conversational stimulant.

References:
[1] Wilson, D.S. (1983). The Group Selection Controversy: history and current status. Annual Review of Ecology and Systematics, 14, 159-187.

[2] Hamilton, W. (1964). The genetical evolution of social behaviour. Journal of Theoretical Biology 7 (1): 1–16. Also see the following review: Dugatkin, L.A. (2007). Inclusive Fitness Theory from Darwin to Hamilton. Genetics, 176(3), 1375-1380.

[3] Williams, G.C. (1966). Adaptation and Natural Selection. AND (1971). Group Selection.

[4] Margulis, L. (1981). Symbiosis in cell evolution: life and its environment on the early Earth. 

[5] Dawkins, R. (1976). The Selfish Gene. 

[6] Nowak, M.A., Tarnita, C.E., and Wilson, E.O. (2010). The evolution of eusociality. Nature, 466(7310), 1057–1062.