November 6, 2014

The Top 100 Needles in a Haystack

A week or so ago, Nature News published a feature on the Top 100 (i.e. most-cited) articles of all time [1]. It is an interesting read, even if you don't agree with their methods or conclusions. Briefly, the Science Citation Index (SCI) was used to generate a ranked list, the top 100 entries of which were considered the most highly-cited papers. Another, more inclusive list was generated for comparison using Google Scholar. The outcomes were then evaluated.

The top papers-as-mountain peak analogy. COURTESY: Nature Publishing.

So which type of paper dominated the top of the list? As you might have guessed, papers that describe the details of a now-ubiquitous method dominate the top 100. Articles on methods such as single-step RNA isolation (#5) or density-functional thermochemistry (#8) have been cited in the neighborhood of 50,000-60,000 times because they provide a simple description of a method that has now become widespread. As it is considered good form to cite the source article for a given method, these are the top articles in this index. It may seem a bit disingenuous to count these articles as the most influential in science. For example, the Watson and Crick paper [3] describing the structure of DNA does not make the top 100. But this methodology allows us to see the relative diffusion of such methods in the literature. Normalizing each paper's citation count by its age (the number of years since publication) gives us a citation rate, which in turn allows us to estimate the velocity [4] of a given method through the scientific community.
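To make the normalization concrete, here is a minimal sketch in Python. It assumes the simplest possible definition of citation rate (total citations divided by years since publication). The `Paper` class, the function names, and the two example entries are hypothetical and purely illustrative; the counts are round numbers chosen for easy arithmetic, not figures from the Nature News feature.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    label: str      # free-text description (hypothetical)
    year: int       # publication year
    citations: int  # total citations to date (made-up, illustrative)


def citation_rate(paper: Paper, as_of_year: int) -> float:
    """Citations per year: total citations divided by the paper's age."""
    age = as_of_year - paper.year
    if age <= 0:
        raise ValueError("paper must be at least one year old")
    return paper.citations / age


def rank_by_rate(papers: list[Paper], as_of_year: int) -> list[Paper]:
    """Sort papers from highest to lowest citation rate."""
    return sorted(
        papers,
        key=lambda p: citation_rate(p, as_of_year),
        reverse=True,
    )


# Illustrative inputs only: an older paper with more total citations
# and a newer paper with fewer total citations.
papers = [
    Paper("older methods paper", 1964, 50_000),
    Paper("newer methods paper", 2004, 20_000),
]

for p in rank_by_rate(papers, as_of_year=2014):
    print(f"{p.label}: {citation_rate(p, 2014):.0f} citations/year")
```

With these made-up numbers, the newer paper comes out ahead on rate despite having fewer total citations, which is the kind of reordering that normalizing by age is meant to expose. Note that this covers only the rate; the "velocity" in note [4] would also require some model of the community's shape, which this sketch does not attempt.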

Number 12 on the list is the paper that introduced the BLAST sequence alignment method [2]. COURTESY: Nature Publishing.

For comparative purposes, the Nature News article also provides an alternate index, one compiled by Google Scholar. This index not only includes books, but deals with citations that are more conceptual in nature. For example, the top 100 citations of all time include: Thomas Kuhn's "Structure of Scientific Revolutions" (#7), Claude Shannon's "Mathematical Theory of Communication" (#9), Rogers' "Diffusion of Innovations" (#17), "The Rat Brain in Stereotaxic Coordinates" (#23), and Zadeh's "Fuzzy Sets" (#29). Much like the SCI method, the Google Scholar method results in a long-tailed distribution, with a few papers far exceeding the rest of the literature in terms of citations. While this list (which also includes books) is less dominated by methods papers, most of the top references on it have nevertheless had a major influence on a number of fields.

But what does all this mean for your recent publication? Will your recent paper on the mathematical structure of inter-neuronal conversion be worthy of one of these top spots at some point in the future? One interesting exercise might be to predict the future citation rates and diffusion velocities for recent papers and books (published within the last 10 years). While not at the very top of the list, these papers would demonstrate how the middle of the distribution lives. And for all of those papers with only a few citations even after years of being published, don't despair. It could be that your paper does not fit the criterion of a "top" paper (e.g. covers a series of clever but non-landmark studies), and so has not gained a high profile. Citation patterns are a curious thing.


NOTES:
[1] The relevant databases have limited this to the 20th and 21st centuries. Article citation: Van Noorden, R., Maher, B., Nuzzo, R.   The Top 100 Papers. Nature News, October 29 (2014).

[2] Full citation: Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J.   Basic local alignment search tool. Journal of Molecular Biology, 215, 403-410 (1990).

[3] Watson, J.D. and Crick, F.H.C.   Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. Nature, 171, 737-738 (1953).

[4] Calculating method or publication "velocity" involves mapping the citation rate (per unit time) to a metric space or graph that represents the "shape" of the scientific community (disciplines, interest areas, etc). This is intentionally vague, as various community shapes are contingent on many factors.

1 comment:

  1. I read the first edition of Kuhn's book (1962) in the mid-1960s as a graduate student, and excited, wrote Kuhn about it. He sent back a letter expressing wonder that anyone was still interested in it!
    Yours, -Dr. Richard Gordon DickGordonCan@gmail.com
