December 1, 2017

Coherence and Relevance in Scientific Communities

          During the past year, Synthetic Daisies featured a series of posts on relevance theory and intellectual coherence within research communities [1]. In this post, I would like to use a set of small datasets to demonstrate how relevance plays a role in shaping scientific practice [2]. We use a syntactic approach (word frequency) to infer how specific scientific fields change over time.

          This is done using a list of words extracted from the titles of accepted papers at the NIPS (Neural Information Processing Systems) conference from various years past. The annual NIPS conference represents a set of fields (Neural Modeling, Artificial Intelligence, Machine Learning) that have experienced rapid innovation and vast methodological change over the past 20 years [3]. To get a handle on patterns that represent a more stable field, we also use data from GECCO (Genetic and Evolutionary Computation Conference). While there is plenty of innovation in this area of CS research, the short-term wholesale turnover of methods is much less prominent.

          Our approach involves ranking the words used in paper titles by frequency, and then comparing these rankings between different time intervals. Title words are in many ways more precise than keywords, in that titles tend to describe specific methods and approaches. Each list shows the top 15 terms for a given year. Changes in rank are represented by lines connecting a word's position in each pairwise list, and words that newly appear in or disappear from the list project to a black dot underneath the ranked lists.
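The ranking step can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for the figures: the titles below are invented stand-ins, the stopword list is ad hoc, and real data would come from the NIPS or GECCO proceedings.

```python
from collections import Counter

# Toy stand-in titles; the actual analysis uses accepted-paper titles
# from the NIPS/GECCO proceedings for each year.
titles_1994 = [
    "Learning in Neural Networks",
    "A Model of Neural Learning Dynamics",
    "Bayesian Model Selection for Neural Networks",
]

# Ad hoc stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "of", "in", "for", "and", "with", "on", "to"}

def top_terms(titles, k=15):
    """Rank title words by frequency, ignoring case and stopwords."""
    counts = Counter(
        word
        for title in titles
        for word in title.lower().split()
        if word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(k)]

print(top_terms(titles_1994, k=4))
```

In this toy sample, "neural" appears three times and "learning", "networks", and "model" twice each, so those four words head the ranked list.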

          The working hypothesis is that periods of rapid change are characterized by very little carry-over between two neighboring time-points. Basic descriptive terms specific to the field should remain, but all other terms in the earlier list will be replaced by a new set of terms.
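Carry-over between two neighboring time-points can be quantified as the fraction of the earlier top-k list that survives into the later one. A minimal sketch, with hypothetical five-word lists standing in for the full top-15 rankings:

```python
def continuity(earlier, later):
    """Fraction of the earlier ranked list that also appears in the later list."""
    shared = set(earlier) & set(later)
    return len(shared) / len(earlier)

# Hypothetical shortened lists for illustration; the figures use top-15 lists.
top_a = ["learning", "model", "neural", "network", "recognition"]
top_b = ["learning", "model", "inference", "kernel", "neural"]

print(continuity(top_a, top_b))  # 3 of 5 terms carry over -> 0.6
```

Under the working hypothesis, this fraction should be low across intervals of rapid change and high in stable fields.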

NIPS Conference Accepted Papers for 10-year intervals.

          The first graph shows the change in top terms (relevance) across 10-year intervals. As expected for such a fast-moving field, the terms exhibit an almost complete turnover for each interval (4/15 terms are continuous between 1994 and 2007, and 5/15 terms are continuous between 2007 and 2016). The only three terms that are present in both 1994 and 2016 are "learning", "model", and "neural". These are consistent with the basic descriptive terms in our working hypothesis.

NIPS Conference Accepted Papers for 3-year intervals.

          The second graph demonstrates changes in top terms (relevance) between 2010 and 2016, using intervals of three years. As expected, there is more continuity between terms (8/15 terms are continuous between 2010 and 2013, and 11/15 terms are continuous between 2013 and 2016). The 2013-2016 interval is interesting in that two of the terms new to the 2016 list ("optimal" and "gradient") are descriptors of a word that was lost from the 2013 list ("algorithm"). This suggests that there was more coherence in research topics within this interval than in the 2010-2013 interval.

GECCO Conference Accepted Papers for 1-year intervals.

          For both one-year intervals, 11/15 terms are preserved from one interval to the next. The terms that exhibit this continuity are consistent with the idea of basic descriptive terms. This might be seen as the signature of stability within communities, as it matches what is observed between 2013 and 2016 for the NIPS data.

          In keeping with the idea of scientific revolutions [4], we might adjust our view of paradigm shifts as revolutions in relevance. This serves as an alternative to the "big person" view of history, where luminaries such as Newton or Einstein singularly make big discoveries that change the course of their field and upend prevailing views. In this case, revolutions occur when communities shift their discourse, sometimes quite rapidly, to new sets of topics. This seems to be the case with various samplings of the NIPS data.

          For papers presented at NIPS and GECCO, what is relevant in a particular year is made salient to the audience of people who attend the conference. Whether or not this results in a closed feedback loop (people perpetually revisiting a focused set of topics) is dependent on other social dynamics.

UPDATE (12/7):
A preprint is now available! Check it out here: How to find a scientific revolution: intellectual field formation and the analysis of terms. Psyarxiv, doi:10.17605/OSF.IO/RHS9G (2017).

[1] For more information, please see the following posts: "Excellence vs. Relevance" (July 2), "Breaking Out From the Tyranny of the PPT" (April 17), and "Loose Ends Tied, Interdisciplinarity, and Consilience" (June 18).

[2] Lenoir, R. (2006). Scientific Habitus: Pierre Bourdieu and the Collective Individual. Theory, Culture, and Society, 23(6), 25-43.

[3] For more about the experience and history of NIPS, please see: Surmenok, P. (2017). NIPS 2016 Reviews. Medium.

[4] Kuhn, T.S. (1962). The Structure of Scientific Revolutions. University of Chicago Press, Chicago, IL.
