When scientific papers are, literally, gibberish



This is a republish of this blog

More than 120 scientific papers have been removed from the electronic databases published by Springer and the Institute of Electrical and Electronics Engineers (IEEE). The papers contain fraudulent research findings, but not exactly of the kind that you might expect.

Apparently, the papers were generated using a program called SCIgen, or some knock-off. Developed by scientists at MIT almost a decade ago, SCIgen works from a database of existing scientific papers and strings common but arbitrary phrases together to create papers that superficially sound like genuine scholarship but are actually absolute gibberish.
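
To make the idea of "stringing phrases together" concrete, here is a toy sketch in Python. It is not SCIgen's code and makes no claim about how SCIgen is actually implemented; the phrase lists and the function names (`gibberish_sentence`, `gibberish_abstract`) are invented purely for illustration.

```python
import random

# Hypothetical phrase pools, invented for this illustration only.
SUBJECTS = [
    "Our heuristic",
    "The proposed framework",
    "This methodology",
]
VERBS = [
    "synthesizes",
    "refines",
    "evaluates",
]
OBJECTS = [
    "the emulation of virtual machines",
    "stochastic archetypes",
    "the partition table",
]


def gibberish_sentence(rng):
    """Glue one randomly chosen phrase from each pool into a sentence."""
    return f"{rng.choice(SUBJECTS)} {rng.choice(VERBS)} {rng.choice(OBJECTS)}."


def gibberish_abstract(n_sentences, seed=None):
    """Return n_sentences of grammatical but meaningless text."""
    rng = random.Random(seed)  # a fixed seed makes the output repeatable
    return " ".join(gibberish_sentence(rng) for _ in range(n_sentences))


if __name__ == "__main__":
    print(gibberish_abstract(3, seed=2014))
```

Every sentence it prints is grammatical and uses plausible vocabulary, yet none of it means anything, which is roughly why such text can slip past a reader who only skims.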

The developers of the program wanted to demonstrate that papers for presentation at professional conferences could be accepted simply because they sounded “scientific.”

The current demonstration of the program’s efficacy was provided by Cyril Labbé, a computer scientist at Joseph Fourier University, in Grenoble, France, who has developed a method for identifying papers produced with SCIgen.

The revelation that such a large number of papers with no scientific value whatsoever had made their way into databases being accessed by students and researchers all over the world will undoubtedly lead to a chorus of declarations about the uselessness of much academic research and publication.

But I think that a few somewhat more subtle conclusions might be drawn from these revelations:

1. A large number of the fraudulent papers were framed as having been delivered at Chinese conferences. That could suggest a serious lack of oversight among Chinese scholars and institutions, which might be a product of the rapid expansion of the Chinese economy and higher education in China, in particular in its research and development capacity.

But such a conclusion depends on the papers actually having originated in China and their having been produced by the Chinese scholars to whom they have been attributed. At this very early stage, before any formal inquiries have been undertaken as a result of the revelations, there is no detailed information on how many of the authors of the fraudulent papers are actual researchers and how many are fictional creations. In one instance, however, a Chinese scientist to whom one of the papers had been attributed has announced that he was completely unaware of the paper’s existence.

So, instead of being a black mark on the integrity of the review processes and oversight in the Chinese scientific community, the publication of these papers may very well turn out to be an attempt by someone outside of China to undermine the credibility of scientific research produced in China. In short, we should reserve judgment until we have a clearer idea about who produced the papers.

2. For all of the great advantages to having so much scholarship so readily available through massive databases, there are clearly some downsides. The attempt to be as inclusive as possible will lead almost inevitably to the inclusion of some dubious, if not patently fraudulent, papers.

But I think that the model that we have followed in digital scholarly publishing has exacerbated the problem. In making ourselves increasingly dependent on corporations for the production of our scholarly journals and the management of these massive databases, we have opened the door to less scrupulous scholarly oversight of what is being published.

It may very well be that the creation and the management of these massive databases could not have been accomplished without corporate investment, but there is an almost inevitable trade-off in a number of areas, including the verification of the integrity of the publications being made available.

3. At first glance, it seems astonishing that such a large number of fraudulent papers should be exposed by one individual. One might naturally wonder why many of those papers had not been identified as fraudulent in a more piecemeal fashion by other scholars reviewing the literature. One might be tempted to jump to the conclusion that if no one is reading the literature any more closely than this (or at all), it is for the most part useless.

Of course, this would assume that the value of research is always immediately evident. It would also assume that the problem is in the volume of material being produced, rather than in the time available to read it.

I would suggest that faculty need to be given more time to digest what is going on in their fields—not just in their narrow specialties but across related disciplines. Some of the most important developments have emerged from transferring a concept, an approach, or an insight from one discipline to another. We have become so focused on scholarly output that we are giving short shrift to input.

One of the ways that we can measure (and reward) attention to current scholarship, and in the process facilitate that attention, may be to re-emphasize the annotated bibliography and the literature review as scholarly projects. These publications used to be a way simply to identify and to collect citations of the relevant literature on a topic, a function now largely replaced by search functions within a database. But if one moves beyond descriptive reviews to pointedly analytic and evaluative reviews, these kinds of scholarship could be more valuable now than they ever have been in the past.