Synthetic Daisies: Fireside Science: The Consensus-Novelty Dampening

This content is being cross-posted to Fireside Science. NOTE: this content has not been peer-reviewed!

I am going to start this post with a rhetorical question: why do people often assume that traditional (or common sense) practices are inherently better, even when the cumulative evidence is inconclusive? In discussing political and economic policy-making, Duncan Black and Paul Krugman use the term "very serious people" (VSPs) [1] to describe important people who back positions that sound serious but are actually wrong-headed and perhaps even dangerous. Part of this "seriousness" stems from appeals to their own authority or to broad issues that have always been legitimate concerns.

Recently, such a "very serious person" (not a famous scientist, but a VSP in spirit -- and you will see why as we move along) published an article in Science called "Who’s Afraid of Peer Review?" [2]. This paper involved an experiment to test quality control in peer review at open-access journals, and had some useful results that did not particularly surprise me. For example, open-access journals that send out copious amounts of spam encouraging you to submit your work may not reject papers with faked data in them.

To recap the experiment, the author generated a large number of scientific papers with scientific-sounding (but false) results and accompanying bad graphs. The generative model used here is similar in concept to the Dada Engine [3], and the experimental treatment could best be described as hundreds of Sokal hoaxes. The papers were sent out to many of the open-access journals that have popped into existence in the past 15 years, and a fair number were accepted. There were also many rejections, most notably from PLoS One, perhaps the flagship open-access journal [4].

These data speak for themselves, or do they? HINT: beware of obvious answers offering gifts...

There are a number of problems with this article, not the least of which is that it does not distinguish between predatory open-access journals and more reputable ones [5]. But perhaps the real problem with Bohannon's article is that it does not 1) explore the role of lax editorial standards at traditional peer-review journals, or 2) conceive of this as a problem of false positives rather than a moral failing. This is along the lines of the chief criticism of the article by Michael Eisen (co-founder of PLoS) [6], and it is why the article's publication in Science makes it seem a bit like subterfuge.

Eisen's other criticism is that the Science article is biased against open access. I read the article this way as well -- the moral imperative is quite thinly veiled. The paper takes the tone of a reactionary pundit who thinks a return to traditional norms (perhaps even imagined ones) can solve any social problem. In light of this, here is some vitriol from Michael Eisen on the problem with subscription publishers vis-a-vis this issue:

"And the real problem isn’t that some fly-by-night publishers hoping to make a quick buck aren’t even doing peer review (although that is a problem). While some fringe OA publishers are playing a short con, subscription publishers are seasoned grifters playing a long con. They fleece the research community of billions of dollars every year by convincing them of something manifestly false – that their journals and their “peer review” process are an essential part of science, and that we need them to filter out the good science – and the good scientists – from the bad. Like all good grifters playing the long con, they get us to believe they are doing something good for us – something we need. While they pocket our billions, with elegant sleight of hand, then get us to ignore the fact that crappy papers routinely get into high-profile journals simply because they deal with sexy topics"

Which one of these is not like the other two? HINT: the guy (on the left) who violated copyright law. HYPOTHESIS: Open access is not a crime. COURTESY: Time Magazine cover.

From a phase space (i.e. parametric) perspective, the problem may be that traditional peer review is a sparse sampling of quality control. Of all the possible gatekeepers, we get 3-4 people, either chosen at random or chosen explicitly to prime the pump (NOTE: when you suggest reviewers, you prime the pump). This is not exactly the strict consensus that defenders of the traditional gatekeeper model like to believe exists.

A related observation (also inspired by physics) is something I call the "bifurcating opinion" issue. This occurs more often than one would think (or hope for). For example, one reviewer thinks an article is great, while the other reviewer hates it. The solution might be to add reviewers, but this might simply extend the problem in a manner similar to flipping a coin. Is this a legitimate way to reach consensus on quality control? Or is consensus even necessary?
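To make the coin-flip analogy concrete, here is a minimal sketch under a loudly hypothetical assumption: each reviewer independently recommends acceptance with probability p = 0.5, the case where opinion on a manuscript is genuinely split. The function names are illustrative only, and the model is not an estimate from any journal's data.

```python
from math import comb

def prob_unanimous(n, p=0.5):
    """Chance that all n reviewers agree, if each independently
    recommends acceptance with probability p (hypothetical model)."""
    return p**n + (1 - p)**n

def prob_majority_accept(n, p=0.5):
    """Chance that a strict majority of n reviewers recommends acceptance."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (2, 3, 4, 5):
    print(n, prob_unanimous(n), prob_majority_accept(n))
```

Under this model, unanimity becomes less likely as reviewers are added: 0.5 with two reviewers, 0.25 with three, and 0.0625 with five. With an odd-sized panel, the majority verdict stays at exactly 0.5, so the outcome is still a coin toss. An even-sized panel can also deadlock, for example 2-2 with four reviewers.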

Now for a story. In 2009, I posted a manuscript [7] to Nature Precedings, a preprint service (now archived) run by a traditional publisher. The paper was accepted under limited quality control standards (there is a screening process, but no formal peer review). I posted it for two reasons: 1) a belief in scientific transparency, and 2) it did not fit cleanly into any existing journal (based on my first-pass survey of the journal landscape). Soon after I posted the paper, a journal editor contacted me and encouraged me to submit it to their journal (which I did).

Three months later, the editor contacted me to say that 45 reviewers felt they could not review the article impartially. So at least my intuitions were vindicated! But what does this say about quality control? The reviewers were certainly not willing to risk a false positive acceptance. But does this come at the expense of rejecting novelty (a false negative)?

A schematic of a 3-D phase space of scientific expertise for a given research area or article, showing examples of sparse sampling and bifurcating opinion. Reference [8] uses the phenomenon of bifurcating opinion to show that agreement among reviewers can be no better than would be expected by chance.
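One standard way to quantify "agreement no better than chance" is Cohen's kappa. It compares the observed agreement between two reviewers with the agreement expected if each reviewer decided independently at their own acceptance rate. A kappa of 0 means chance-level agreement, and 1 means perfect agreement. The sketch below assumes simple accept (1) / reject (0) verdicts. The verdicts themselves are invented for illustration and are not data from [8].

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two reviewers' binary verdicts (1 = accept, 0 = reject)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty verdict lists of equal length")
    n = len(a)
    # Observed agreement: share of manuscripts on which both reviewers made the same call.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Each reviewer's marginal acceptance rate.
    pa, pb = sum(a) / n, sum(b) / n
    # Agreement expected by chance if the two reviewers decided independently.
    p_e = pa * pb + (1 - pa) * (1 - pb)
    if p_e == 1:
        return float("nan")  # undefined: both reviewers gave one identical verdict throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical verdicts on eight manuscripts.
rev_1 = [1, 1, 1, 1, 0, 0, 0, 0]
rev_2 = [1, 1, 0, 0, 1, 1, 0, 0]   # agrees with rev_1 on 4 of 8 manuscripts
rev_3 = [1, 1, 1, 0, 0, 0, 0, 1]   # agrees with rev_1 on 6 of 8 manuscripts
print(cohens_kappa(rev_1, rev_2))  # 0.0
print(cohens_kappa(rev_1, rev_3))  # 0.5
```

Raw agreement of 50% sounds like a partial consensus. However, when both reviewers accept half of what they see, 50% is exactly what chance predicts, so kappa comes out at 0.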

An article in the Chronicle of Higher Education [9] points out that open-access journals are in a frontier (i.e. Wild West) phase of development. In that sense, a non-uniform degree of quality control, including some degree of predatory enterprise, should be expected across a random sample of journals. A representative from Science said this about the results of [2]:

“We don’t know whether peer review is as bad at traditional journals,” he said. “Then again, OA is the growth area in scientific publishing.”

This brings up another issue: does selectivity necessarily reflect quality? In the Bohannon study [2], open-access journals accepted fraudulent papers even after putting them through a peer-review process. As far as I am aware, the reviewers' qualitative comments were not considered as a factor in whether the fraudulent articles were accepted.

Journals with high selectivity are widely assumed to be better at filtering out noise (e.g. weak results and methods) and potential fraud. However, as long as the rejection rate (100% minus the acceptance rate) exceeds the proportion of fraudulent manuscripts, selectivity and fraud (or error) detection tend to be two separate things: a journal can reject a large share of its submissions without rejecting the right ones. Sure, journals with a low acceptance rate are likely to include fewer fraudulent papers. But these same journals will also tend to reject many reasonable, and sometimes even outstanding, papers.
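The trade-off between selectivity and error can be illustrated with a toy simulation. Every number in it is a hypothetical assumption chosen for illustration, not a measurement:

- a pool of 1,000 manuscripts with a 5% fraud rate;
- fraudulent papers that look somewhat weaker on average but overlap with real ones;
- reviewers who see each paper's merit only through Gaussian noise;
- an arbitrary cutoff for what counts as a "strong" sound paper.

The journal accepts the top-scoring fraction of the pool. `make_pool` and `review` are illustrative names, not part of any real tool.

```python
import random

def make_pool(n=1000, fraud_rate=0.05, noise=1.0, seed=42):
    """Hypothetical manuscript pool of (is_fraud, is_strong, reviewer_score) tuples."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        is_fraud = rng.random() < fraud_rate
        # Assumption: fraudulent papers look weaker on average but overlap with real ones.
        merit = rng.gauss(-1.0 if is_fraud else 0.0, 1.0)
        # Assumption: reviewers see merit only through noise.
        score = merit + rng.gauss(0.0, noise)
        # Assumption: an arbitrary bar for a "reasonable or better" sound paper.
        is_strong = (not is_fraud) and merit > 0.5
        pool.append((is_fraud, is_strong, score))
    return pool

def review(pool, acceptance_rate):
    """Accept the top `acceptance_rate` fraction by score; count both error types."""
    ranked = sorted(pool, key=lambda m: m[2], reverse=True)
    k = round(acceptance_rate * len(ranked))
    accepted, rejected = ranked[:k], ranked[k:]
    false_pos = sum(1 for fraud, _, _ in accepted if fraud)     # fraudulent papers accepted
    false_neg = sum(1 for _, strong, _ in rejected if strong)   # strong, sound papers rejected
    return false_pos, false_neg

pool = make_pool()
for rate in (0.5, 0.25, 0.1):
    fp, fn = review(pool, rate)
    print(f"acceptance {rate:.0%}: fraud accepted={fp}, strong papers rejected={fn}")
```

Every acceptance rate is applied to the same scored pool. As a result, a stricter journal's accepted set is a subset of a laxer one's, and fewer false positives are bought with more false negatives. The exact counts depend on the seed and the assumed parameters.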

It is notable that retractions of papers from highly selective journals are not that rare. Take the case of Anil Potti, whose data were found to be fabricated. The result (as of 2012) is 11 retractions, 7 corrections, and 1 partial retraction [10]. Only 2 of these retractions involved an open-access journal (PLoS One). The rest, in fact, involved traditional peer-reviewed biomedical journals.

A classification scheme for Type I (B -- a false negative) and Type II (C -- a false positive) errors in manuscript evaluation. The goal of peer review should be to minimize the number of manuscripts in categories B and C. Of course, this scheme does not consider manuscripts rejected for reasons other than fraud.

What are potential solutions to some of these problems [11]? In particular, how can we keep selectivity from stifling innovation (e.g. novel interpretations, groundbreaking findings)? Can the concept of crowdsourcing provide any inspiration here? John Hawks [12] discusses radically open peer review as practiced by F1000. The F1000 model operates on the premise of popularity: the more votes an article gets, the more staying power it has.

But should popularity be linked to significance and/or quality? Investigations into the incongruity between popularity and influence suggest that these should be decoupled [13]. Or put another way: is it the percentage of accepted manuscripts that makes a quality journal, or is it that all articles meet certain benchmarks? And if the acceptance rate is the only acceptable measure of quality, is it an unfortunate one that stifles innovation [14]?

Here's the deal: you give me $1,000, and I'll give you legitimacy, or you pay me a subscription fee, and I'll give you even more legitimacy... COURTESY: South Park, "Scott Tenorman Must Die."

So are there legitimate issues of concern here? Of course there are. But there are also pressing problems with the status quo that are, for some reason, not as shocking. Fooling people with non-sequiturs and supposedly self-evident experimental design flaws is a clever rhetorical device. But it does not answer some of the most pressing issues in balancing academic quality control with getting things out there (e.g. reporting results and scientific interaction) [15]. In the spirit of non-sequiturs, I leave you with a video clip from Patton Oswalt's TED talk highlighting the lack of quality control in the motivational speaking industry.

Still image from the Patton Oswalt TED talk, which parodied motivational speaking by generating nonsensical passages using the generalized motivational schema (e.g. sentence styles, jargon).

NOTES:

[1] For more, please see: Black, D. Everything Liberal Activists Do Is Wrong and Destructive. Eschaton blog, July 30 (2010) AND Krugman, P. VSP Economics. The Conscience of a Liberal blog, May 7 (2011).

[2] Bohannon, J. Who’s Afraid of Peer Review? Science, 342, 60-65 (2013). I make the judgmental "VSP" statement because it is important to distinguish between legitimate skepticism and fostering a moral panic (e.g. open access is bad for science, and I'm going to use the organ of a major journal to foster support of the cause). I feel that Bohannon has crossed this line.

For a more nuanced take on the phenomenon of predatory open-access journals, please see: Beall, J. "Predatory" Open-access Scholarly Publishing. The Charleston Advisor, April (2010).

[3] The Dada Engine models non-sequiturs that resemble a particular field's jargon (e.g. legalese, postmodernism) using recursive transition networks. For more on the Dada Engine, please see: Bulhak, A. On the Simulation of Postmodernism and Mental Debility Using Recursive Transition Networks. CiteSeerX repository (1996).

[4] I have a confession: I was rejected from PLoS One! However, this might not be as "bad" as it sounds, if these two references are correct:

a) Neylon, C. In defence of author-pays business models. Science in the Open blog, April 29 (2010).

b) Anderson, K. PLoS’ Squandered Opportunity — Their Problems with the Path of Least Resistance. The Scholarly Kitchen blog, April 27 (2010).

[5] Hawks, J. "Open access spam" and how journals sell scientific reputation. John Hawks weblog, October 3 (2013).

Of course, conventional journals also rely on the same sense of reputability, whether deserved or not. For more please see: Reich, E.S. Science publishing: the golden club. Nature News, October 16 (2013).

[6] Eisen, M. I confess, I wrote the Arsenic DNA paper to expose flaws in peer-review at subscription-based journals. It is NOT Junk blog, October 3 (2013).

Since the Bohannon article deals with a competing publication model, Science should have at least issued a conflict-of-interest disclaimer upon publication. As the Wikipedia cleanup editors would say: this article sounds like an advertisement.

[7] Alicea, B. Range-based techniques for discovering optimality and analyzing scaling relationships in neuromechanical systems. Nature Precedings, 2845.v1 (2009).

[8] Rothwell, P.M. and Martyn, C.N. Reproducibility of peer review in clinical neuroscience: is agreement between reviewers any greater than would be expected by chance alone? Brain, 123, 1964-1969 (2000).

[9] Basken, P. Critics Say Sting on Open-Access Journals Misses Larger Point. Chronicle of Higher Education, October 4 (2013).

[10] Oransky, I. The Anil Potti retraction record so far. Retraction Watch blog, February 14 (2012).

* or simply Google the names "Yoshitaka Fujii" and "Joachim Boldt" -- their retraction count is astounding.

More insight might be found in the following paper: Steen, R.G., Casadevall, A. and Fang, F.C. Why has the number of scientific retractions increased? PLoS One, 8(7), e68397 (2013).

[11] For a visionary take (written in 1998 and using National Lab pre-print servers as a template for the future) on open-access publishing, please see: Harnad, S. The invisible hand of peer review. Nature Web Matters, November 5 (1998).

* this reference also discusses self-policing vs. peer consensus and the issue of peer review as a popularity poll.

[12] Hawks, J. Time to trash anonymous peer review? John Hawks weblog, October 3 (2013).

[13] Solis, B. The Difference between Popularity and Influence Online. PaidContent, March 24 (2012).

[14] I was once told that to be accepted for publication, a scientific article should not have too many novelties in it. For example, an article with a novel theoretical position or a novel method is okay, but not both (or additional novelties). This is only anecdotal, but it does seem to reflect a built-in conservative bias in the peer-review system.

UPDATE (11/5)! What is the optimal level of novelty relative to scientific impact? For a large-scale analysis, please see: Uzzi, B., Mukherjee, S., Stringer, M., and Jones, B. Atypical Combinations and Scientific Impact. Science, 342, 468-472 (2013).

[15] Food for thought: do peer review and its standards actually harm science by excluding negative results from the literature? For more about this and the replicability crisis in science, please see this article (which I will be coming back to in a future post): Unreliable research: trouble at the lab. Economist, October 19 (2013).

October 22, 2013
