Graphical free-association

1. Demography and Economics

I found these graphs online in the course of reading the news, following up on fleeting ideas, etc. This is an example of a free-association exercise I like to do with collections of graphs and data to better understand their underlying dynamics.

The first is a graph representing the imbalance between GDP growth and employment growth among major cities in the USA. I have read that this imbalance is thought to result from gains in productivity (where work normally available to humans is reduced by technology). While this might be thought of as an optimizing process (technology makes a process more efficient), it is also a displacement process (people trained for work in one sector of the economy must shift to a new domain). I have never heard an economist talk about it in the latter terms, but in any case here is the graph:

GDP by US Metro Area, The Economist

The second graph is a log-log plot of population estimates for the planet from 10,000 years ago to the present. The lower bound of this graph corresponds roughly with the advent of agriculture and the beginning of the Holocene (i.e., the end of the last glacial period). There are several interesting trends in this graph that are related to economic growth and the maturity of a society.

Most people are aware of the population expansion that has taken place in the last 500 years. However, on a log-log scale we can see that a proportionally comparable increase in human populations occurred about 5000 to 8000 years ago. This corresponds to a period when several societies around the planet were rapidly increasing in complexity. In the archaeological community, this is a well-known signature called the Neolithic demographic transition, and is characterized by large increases in the proportion of 8-15 year old individuals at burial sites [1].

Log-Log World Population Growth, Wikipedia

A comparison between this era and the modern era (which has also seen large population growth) may be understood by turning to dynamical models. In particular, the Lotka-Volterra model uses coupled differential equations to model the growth trajectories of two competing populations given variation in the birth and death rates of each. In Lotka-Volterra (i.e., predator-prey) dynamics, it is assumed that one population has a direct effect on the other [2]. For example, the birth rate in one has an effect on the death rate in the other, with some allowable degree of lag. Compare this to the demographic transition model, for which a graph is shown below:
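As a concrete sketch of these dynamics, here is a minimal Lotka-Volterra simulation using simple Euler integration (all parameter values are hypothetical, chosen only to illustrate the cyclic coupling between the two populations):

```python
# Minimal Lotka-Volterra (predator-prey) sketch, Euler integration.
#   dx/dt = a*x - b*x*y   (prey: births minus losses to predation)
#   dy/dt = d*x*y - c*y   (predator: growth from predation minus deaths)
# All parameter values below are hypothetical, for illustration only.

def lotka_volterra(x0, y0, a=1.0, b=0.1, c=1.5, d=0.075, dt=0.001, steps=20000):
    x, y = x0, y0
    trajectory = [(x, y)]
    for _ in range(steps):
        dx = (a * x - b * x * y) * dt
        dy = (d * x * y - c * y) * dt
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory

traj = lotka_volterra(x0=10.0, y0=5.0)
prey = [p for p, _ in traj]
# The prey population cycles rather than settling at a fixed value:
print(min(prey), max(prey))
```

Plotting the full trajectory shows the familiar out-of-phase cycles: prey peaks are followed, after a lag, by predator peaks.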

Concept of Demographic Transition in Human Societies, Wikipedia

In this example, there is no competition between populations per se, although it is not specified what is driving the internal dynamics. In the case of the above graph, it almost seems as though a high birth rate that transiently outstrips the death rate can prime future population growth, even if the birth rate falls during the exponential phase of population growth. This model has been used to explain human population growth, but may also reveal how complex and fleeting the precursors to human demographic and economic fluctuations actually are.
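The intuition above can be sketched numerically: let the death rate fall first and the birth rate fall later, and the population keeps growing for a long while even as births decline (all rates here are hypothetical, chosen only to illustrate the staging):

```python
# Toy demographic transition: the death rate falls first (stage 2), the
# birth rate falls later (stage 3), so the population keeps growing even
# after births start to decline. All rates are hypothetical, per capita,
# per time step.

def simulate(steps=200):
    pop = 1000.0
    history = [pop]
    for t in range(steps):
        birth_rate = 0.04 if t < 100 else 0.04 - 0.0003 * (t - 100)
        death_rate = 0.04 if t < 50 else max(0.01, 0.04 - 0.001 * (t - 50))
        pop *= 1 + birth_rate - death_rate
        history.append(pop)
    return history

hist = simulate()
# Growth continues well after the birth rate begins falling at t = 100:
print(hist[60], hist[120], hist[-1])
```

The point of the sketch is momentum: a transient surplus of births over deaths primes further growth even once the birth rate is in decline.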

If you really want to watch a conservative's head explode [3], try explaining the demographic dividend [4] to them. The demographic dividend is the relative payoff for each child produced, and can be loosely thought of as the true cost of reproduction. I determined the shapes of both functions (red and blue) myself.

The notion of "keeping up the family lineage" is traditionally a source of pride for many people, so continuous population growth feels naturally intuitive. Yet the above graph (based on actual trends) shows that we are entering a new statistical regime.

The figure above is a conceptual explanation, although the causal factors and driving forces are a bit more subtle (and beyond the scope of this post). Briefly, they involve the interplay between human demography, labor market dynamics, automation/technology, and ecological sustainability. Happy head exploding!

[1] Bocquet-Appel, J-P. (2011). When the World’s Population Took Off: The Springboard of the Neolithic Demographic Transition. Science, 333(6042), 560-561. Podcast with author (Science).

[2] Hoppensteadt, F. (2006). Predator-Prey Model. Scholarpedia, 1(10), 1563.

[3] exploding head demonstration courtesy of the movie "Visioneers" (2008). Some conservatives (Ross Douthat, "More Babies, Please", NYT, December 1) still think that the route to economic growth is an increase in the birth rate. Strangely, this concept is not among their reasons for supporting it. The counterpoint (Mark Adomanis, "Ross Douthat, Demography, and Innovation", Forbes, December 3) makes it clear how short-sighted this argument can be.

[4] The demographic dividend can vary quite a bit by region and specific period of history, as the graph above is a rough average over the last 10,000 years or so.

Please see this book for more information: Bloom, D.E., Canning, D., and Sevilla, J.   The Demographic Dividend: a new perspective on the economic consequences of population change. Rand Corporation (2003).

[5] For more information on the history of human demography (particularly the Neolithic Demographic Transition, or NDT), please see [1].

2. Technology and Economics

Is the hype cycle of technological development (TOP) a social/marketwide version of the "uncanny valley" curve (BOTTOM) describing the emotional response to simulated human realism? The hype (or exposure) cycle involves the initial stages of diffusion for a new innovation (e.g. personal robots). There is an initial burst of interest (more idea than tangible technology), followed by a period of disillusionment and eventual mass adoption in the form of tangible products that contribute to economic productivity. 

The comparison to the uncanny valley curve involves the collective cognitive response to a new innovation: the idea and tangible applications diffuse in two conceptual waves. People must first be willing to accept the idea, which allows them to build a frame of reference for the technology. The ability of people to build a concept of the technology and sort through what it may and may not be able to do proceeds faster than the diffusion of the second wave. This contributes to the trough of disillusionment, the length of which can be mitigated by the rapid development and propagation of tangible technological products.

This one is a bit hard to understand at first, but will become clear as the graphs are explained. The main graph is a screenshot from a recently posted YouTube video [1] that demonstrates three things: what people think is the ideal income distribution in American society, what people think the distribution actually is [2], and what the distribution actually is. Notice that the right-hand side of the actual distribution is much sharper than the popular expectation. This sharpening also reflects the evolution of this distribution over the last 40 years.

The inset graph (lower left) characterizes newly-released survey data (from the General Social Survey) featured in the New York Times about gun ownership trends over the past 40 years [3]. Contrary to some popular conceptions about gun ownership (that increases in gun sales necessitate that more people own guns), the data actually show a trend that parallels the evolution of income inequality: more guns are being purchased by fewer and fewer people.

The connection between these two graphs involves an explanation for these independent results. Perhaps they suggest a general mechanism for demographic aggregation, or perhaps there is a more subtle mechanism for sorting and resource consolidation at work. Whatever the cause, the result is counterintuitive, driven by a process not easily understood at first glance.

[1] Politizane   Wealth Inequality in America. YouTube, November 20 (2012).

[2] For the work on cognitive biases and wealth distribution estimates, please see: Norton, M.I. and Ariely, D.   Building a Better America -- One Wealth Quintile at a Time. Perspectives on Psychological Science, 6, 9 (2011).

[3] Tavernise, S. and Gebeloff, R.   Share of Homes With Guns Shows 4-Decade Decline. NYT, March 9 (2013).

3. Job Markets and Long-term Trends

How hard is it to get a job? And, by extension, how hard is it for employers to fill jobs? While some people debate whether or not there is massive skill mismatching in the job market [1], another way to approach the problem is to construct a metric. Virginia Postrel [2] introduces us to the Dice-DFH Vacancy Duration Measure, which is constructed from monthly data on job openings and labor turnover and demonstrates the relative tightness of the job market throughout the recent deep recession. While this measure shows a recovery from the nadir of 2009, related data also show that this varies by economic sector. My take is that job openings are becoming more specialized [3], creating a situation with fewer applicants qualified for the position as listed. This specialization is unrealistic given the nature of experience and the needs of labor [4]. It is distinct from the need for labor, but may play a role in driving acute shortages in specific industries.

Next, consider the phenomenon of long-term unemployment and the degree to which it may be due to chance. Paul Krugman [5] writes about the chance nature of the problem, and how this is often not reflected in policy debates. Matt O'Brien [6] takes it further by comparing the U6 unemployment rate against the odds of becoming one of the long-term unemployed.

To turn this debate on its head, David Graeber [7] provides a discussion of whether many jobs are simply a means to no productive end. This would suggest that long-term unemployment could be solved by a proliferation of BS jobs, but of course a guaranteed income might actually be more productive.

[1] Shapiro, L.   'Skills Mismatch' Causing High Unemployment? Not Quite. Huff Post Business, February 21 (2012).

[2] Postrel, V.   Why is it so hard to get a job? Bloomberg View, May 14 (2014).

[3] Button, A.   Employers struggling to find workers, but are they really trying? Forex Live, May 14 (2014).

[4] Hintze, A.   Arend’s Job Description – The Big Question. Adami Lab blog.

[5] Krugman, P.   Unemployment: it's not personal. Conscience of a Liberal blog, May 17 (2014).

[6] O'Brien, M.   The odds you’ll join the ranks of the long-term unemployed. WaPo Wonkblog, May 16 (2014).

[7] Graeber, D.   Ask yourself, says a notorious ‘Occupy’ academic, should your job exist? PBS Newshour Making Sense blog, May 9 (2014).

4. Something's not right with these data......

Which side of the first graph (top) [1] would you guess is a signature of accounting fraud? According to The Big Picture blog [2], it is the left-hand side (i.e., Jack Welch's tenure). This all came to light because Jack Welch is the one who accused the Obama administration of "cooking" recent unemployment statistics [3].

For comparison, the graph below is the U3 unemployment rate over the course of the first Obama administration [4].

[1] COURTESY: Tom Brakke's Twitter Feed.

[2] GE’s Jack Welch Knows About Cooking the Books. The Big Picture blog, October (2012).

[3] see Paul Krugman's NYT article from 10/8, "The Truth about Jobs".

[4] data from the US Bureau of Labor Statistics. While the U3 is not the best measure of unemployment, notice the variability (spikiness) from month to month. That is what a "natural" time series of a stochastic process should look like.
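To make this concrete, here is a small sketch contrasting a simulated noisy monthly series (trend plus random shocks) with an implausibly smooth one; the average month-to-month change is what gives a "natural" series its spikiness (the numbers are simulated, not real unemployment data):

```python
import random

random.seed(42)

# Simulated monthly rate: slow downward trend plus month-to-month shocks.
noisy = [8.0 - 0.02 * m + random.gauss(0, 0.15) for m in range(48)]
# An implausibly smooth series: the same trend with no shocks at all.
smooth = [8.0 - 0.02 * m for m in range(48)]

def mean_abs_change(series):
    # Average absolute month-to-month change: a crude "spikiness" score.
    return sum(abs(b - a) for a, b in zip(series, series[1:])) / (len(series) - 1)

# A natural stochastic series shows much larger month-to-month changes
# than one that has been smoothed (or massaged) into a clean trend.
print(mean_abs_change(noisy), mean_abs_change(smooth))
```

A series whose month-to-month changes are suspiciously small relative to its sampling noise is one informal red flag for manipulated numbers.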

5. Reporting Scientific Results

I found these graphs while reading papers for a laboratory-associated journal club. All of these examples are drawn from the cell biology literature, but similar examples can be found in other disciplines (so keep your eyes open!). If you have an example for me to post, please pass it along.

This is one of the worst graphs I have ever encountered (from [1]), and serves as a lesson in what NOT to do when visualizing data (NOTE: let's assume that fold-enrichment is on a scale from 0-100; a fix would be to include a break between the maximal value of the condition-associated bars and the 100-fold level).

First of all, the y-axis is out of proportion with the data. I also have to tell you that this is supposed to be a bar graph, which is hard to discern because the bars are almost invisible (they sit so close to the x-axis). If the authors wanted to emphasize the scale of their finding, there are probably better ways of doing it.

This example comes from Figure 2 in [2]. It is harder to interpret than a bona fide "worst graph", mainly because there are too many undifferentiated conditions (in this case, independent genes). Nevertheless, I am posting it here.

The last example is a bit more subtle, and comes from [3]. This is a graph of relative conversion, or percentage of cells positive for a certain marker that are also positive for a secondary marker (see the y-axis). Question: what is wrong with it?

Answer: the authors should have used a conditional probability. For example, say 30% of cells are reporter-positive and, of those, 30% are also tubulin-positive. The efficiency (a better measure for the y-axis) is 9% (0.3 * 0.3). By contrast, the measure used to generate this graph produces an artificially large percentage, especially when compared to similar measures of cellular reprogramming efficiency.
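The arithmetic of that argument can be written out directly (the 30% figures are the hypothetical ones used above):

```python
# Hypothetical figures from the argument above.
reporter_positive = 0.30               # fraction of all cells reporter-positive
tubulin_given_reporter = 0.30          # of those, fraction also tubulin-positive

# A conditional measure like the one plotted would report 30%. A joint
# "efficiency" measure (fraction of ALL cells that are double-positive)
# is the product of the two proportions:
efficiency = reporter_positive * tubulin_given_reporter
print(f"efficiency = {efficiency:.0%}")  # prints "efficiency = 9%"
```

The conditional measure inflates the apparent conversion rate by ignoring how small the reporter-positive pool is in the first place.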

[1] Donmez, G., Wang, D., Cohen, D.E., and Guarente, L. (2010). SIRT1 Suppresses β-Amyloid Production by Activating the α-Secretase Gene ADAM10. Cell, 142(2), 320-332.

NOTE: The lead author on this paper was involved in a case of manipulated Western Blots. While this has nothing to do with the plotting of a graph, it speaks to quality control in the research process.

[2] Szabo, E., Rampalli, S., Risueno, R.M., Schnerch, A., Mitchell, R., Fiebig-Comyn, A., 
Levadoux-Martin, M., and Bhatia, M. (2010). Direct conversion of human fibroblasts to multilineage blood progenitors. Nature, 468, 521–526.

[3] Karow, M., Sanchez, R., Schichor, C., Masserdotti, G., Ortega, F., Heinrich, C., Gascon, S., Khan, M.A., Lie, D.C., Dellavalle, A., Cossu, G., Goldbrunner, R., Gotz, M., and Berninger, B. (2012). Reprogramming of Pericyte-Derived Cells of the Adult Human Brain into Induced Neuronal Cells. Cell Stem Cell, 11(4), 471-476.

6. Myths about Experimental Design and Analysis

7. Characterizing the Development of an Academic Field

I was going through some of my hard-copy magazines and other reading materials the other day, and stumbled upon a few issues of SEED Magazine. They have a feature where they create "curves" (this curve being interpolated from selected watershed events, which are ranked categorically) that represent the trajectory of a given research field. Inasmuch as one can quantify changes, major advances, and crises in various fields, they provide us with a good heuristic. The point of the graphic is to assure us that science does not proceed in a straight line. 

Since I could not find a virtual copy of this, I re-created the curve for group selection theory in the image above. The major landmarks were determined by the staff at SEED, and many lead-up events (lesser-known papers, or influence from other fields and strong personalities) are not specified. The graph ends in 2010 at the publication of [6]. The future trajectory of the field is of course unknown, and in this case it is really unknown, as there was significant pushback and other discussion about this article in Nature and in the blogosphere. In any case, I hope you find this useful as a conversational stimulant.

[1] Wilson, D.S. (1983). The Group Selection Controversy: history and current status. Annual Review of Ecology and Systematics, 14, 159-187.

[2] Hamilton, W. (1964). The genetical evolution of social behaviour. Journal of Theoretical Biology 7 (1): 1–16. Also see the following review: Dugatkin, L.A. (2007). Inclusive Fitness Theory from Darwin to Hamilton. Genetics, 176(3), 1375-1380.

[3] Williams, G.C. (1966). Adaptation and Natural Selection; and Williams, G.C. (1971). Group Selection.

[4] Margulis, L. (1981). Symbiosis in cell evolution: life and its environment on the early Earth. 

[5] Dawkins, R. (1976). The Selfish Gene. 

[6] Nowak, M.A., Tarnita, C.E., and Wilson, E.O. (2010). The evolution of eusociality. Nature, 466(7310), 1057–1062.

8. Chart Blogging and Concepts at a Glance

Is chart-blogging inherently misleading? Matthew Yglesias makes the case for this using a graph built from Bureau of Economic Analysis data, in which the typical indicators appear not to strongly affect economic growth at all. Lesson: many graphs are misleading, and some are even impossible to interpret at first glance.

SOURCE: Yglesias, M.   Economic Policy Institute Produces Dual Axis Chart That Proves Nothing Affects Economic Growth. Moneybox blog, June 4 (2013).

This graph, from James Brown, shows the scaling relationship between annual fertility rate (by country and mode of production) and per capita power consumption (wattage). Interestingly, there is a strong linear relationship between the two. This suggests a fundamental limit to growth.

SOURCE: Brown, J.M.   Gasoline and Fertility. Energy: we consume unlike any other, 1 (2013).

9. The Retrospective Demand Fallacy

This fallacy (summarized in many popular media accounts of the so-called employment mismatch) assumes that the production of human capital (in this case, college degrees) must meet the demands of existing industry, and that if the structure of the job market changes, the supply will simply adapt. The prescription that follows is low investment in education (only what is needed at the time), and it assumes that labor is maximally malleable. To assess the damage, let's explore two examples: the STEM job market, and the PhD (all fields) job market.

The first example is from [1], and is a critical look at the idea of a STEM job crisis. The conventional notion is that we must train as many people for STEM jobs as possible, while data imply that the job openings are simply not there. Some analyses [2] have framed this mismatch as a decoupling of productivity and conventional employment, which is in turn a peculiar feature of the information technology revolution. This also ignores the idea of education as market empowerment, particularly in terms of creating new enterprises and opportunities [3]. 

In short, when people are provided with a broad education, they build human capital with the potential for a much larger multiplier effect than that of people trained only in the latest trend. When liberal education (or generalized technical training) is minimized, this potential for empowering individuals to create their own opportunities goes away [4].

The second example is about the supposed glut of PhDs [5, 6]. Basically, the problem is that there are too many PhDs for the current number of academic and other jobs. This has advanced arguments that go something like this: if we reduce the number of PhDs produced, the glut will take care of itself. This has strong parallels with the first example.

The proposed solutions to the PhD glut include everything from shutting down most of the PhD training programs in the United States [7] to cutting (or, paradoxically, increasing) the number of foreign PhDs [8]. While these solutions are devised as economically-realistic ways of addressing the issue, none of them are particularly well-informed in terms of how the job market works as a system [9].

This brings us back to the subject of what exactly can be done about creating opportunity given the current job market. The graphic above demonstrates the actual choice to be made: does the current state of the market determine what people learn (Example 1 -- market optimization), or does training develop new markets and opportunities (Example 2 -- expertise-driven economies)?

Why does the expertise-driven economy work? Consider that such an economy has the potential to drive growth in a way that does not necessitate further environmental degradation or interpersonal exploitation. This is actually what we have witnessed with the growth of the technology industry. New job and product markets emerged, but only as a function of the talent required to design and implement these technologies. 

[1] Charette, R.N.   The STEM crisis is a myth. IEEE Spectrum, August 30 (2013).

[2] Rotman, D.   How Technology is Destroying Jobs. MIT Technology Review, June 12 (2013).

[3] For some thoughts on this, please see: Bucci, A.   Market power, human capital, and growth. IDEAS Repository, 2002012 (2002).

[4] For more on conventional thinking about human capital, please see: Allison, D.   Human Capital: the most overlooked asset class. Investopedia, December 11 (2011).

[5] Eisenbrey, R.   America's Genius Glut. New York Times Op-Ed. February 7 (2013).

[7] A proposition which would likely lead to a downward spiral in job opportunities, not to mention opportunities for innovation. The definition of a zero-sum, brute-force solution.

[8] As you will read in [1], this might be a cynical ploy by existing employers to drive wages down. If this is the case, it has certainly worked. For an alternate view, however, please see: Pethokoukis, J.   The NYT Op-ed about America's genius glut is ludicrous. Business Insider, February 10 (2013).

[9] A position we will call market over-optimization. For more on this idea of optimization and its connection to planned economies, please see: Shalizi, C.   In Soviet Union, Optimization Problem Solves You. Three-Toed Sloth blog, May 30 (2012).

10. The Emergence of Theories

A follow-up on the "theory of theories" post: every great scientific idea (like a good tree) has its roots. Here, we can see the roots of two biological theories. The top diagram, tracing key events in the development of the double-helix model of DNA, is based on data from [1]. The bottom diagram, tracing key events in the notion of evolution by natural selection, is based on data from [2].

[1] Graur, D.   A short history of DNA. SlideShare presentation.

[2] Wilkins, J.   Precursors of Darwinian evolution. TalkOrigins FAQ.

11. Keeping up Appearances, with Chartjunk

Does science have to be convincing, or does it merely have to be cognitively pleasing? Although a major feature of "Proofiness" [1], the latter is an important factor in the "science-signaling" aspect of graphs [2]. As shown in the figure above, the graphical form is in and of itself reassuring, regardless of what it actually means. The signaling of scientific content can also be used as a scare tactic and a source of very effective propaganda.

So how do you present data without relying too heavily on cognitive biases? After all, while such biases may inadvertently make your case seem stronger, they can also undermine what would otherwise be a landmark study. A recent PLoS Computational Biology feature, "Ten Simple Rules for Better Figures" [3], might help here. The ten points present ways to craft clearer and more effective visual support for your hypotheses. Above is an example of "never trusting the defaults", while below is an example of embracing chartjunk (which the authors advise you to avoid).

[1] Seife, C.   Proofiness: How You're Being Fooled by the Numbers. Penguin Books (2011).

[2] Tal, A. and Wansink, B.   Blinded with Science: trivial graphs and formulas increase ad persuasiveness and belief in product efficiency. Public Understanding of Science, 1-9 (2014).

[3] Rougier, N.P., Droettboom, M., and Bourne, P.E.   Ten Simple Rules for Better Figures. PLoS Computational Biology, 10(9), e1003833 (2014).