As long ago as 1998, Colin Blakemore, then president of the British Neuroscience Association, expressed his reservations about the burden imposed by the UK’s research assessment exercise (RAE) on both institutions and those charged with “peer reviewing” their submissions.
“The changes in ranking that now occur from exercise to exercise are generally small in magnitude and in number,” he noted. “In other words, huge effort and cost are being invested to discover less and less information.”
In 2009, the Treasury proposed a much simpler system that awarded block grants in relation to institutions’ research income. However, this was greeted by howls of anguish from academics. Institutions had invested a lot in preparing for the RAE. Those who had done well out of it were particularly wedded to the status quo. And those with a high proportion of arts and humanities research had realistic concerns about losing funding, given that grants in these fields are small compared with those in science. So, despite its new name, the research excellence framework (REF) ended up duplicating the RAE’s assessment methodology.
The influential Metric Tide report of 2015 discussed “metrics based” assessment, conceptualised as evaluation of individuals via indicators such as publication or citation volumes. But this was roundly rejected as unable to capture nuances of quality. “Peer review is not perfect, but it is the least worst form of academic governance we have,” the report concluded.
Now, with the UK government on a crusade against red tape and the academic workforce reeling from the pandemic, the REF – whose latest submission deadline passed at the end of last month – is again coming under scrutiny. It is worth reflecting on possible alternatives.
We should start by accepting that any new system will soon fall foul of Goodhart’s law: “When a measure becomes a target, it ceases to be a good measure.” This was recognised in 2009: if grant income were to be the metric, just watch that researcher who previously managed to do excellent work with few resources suddenly go cap in hand to the research councils.
We should also ask ourselves what we are trying to achieve. The stated purposes of the REF are to “provide accountability for public investment in research and produce evidence of the benefits”, “provide benchmarking information and establish reputational yardsticks” and “inform the selective allocation of funding for research”.
I’d be happy to ditch the second aim. A healthy higher education sector has a diversity of institutions, including some that may specialise in research on small and local issues. Measuring everyone against a single “world leading” yardstick is as undesirable as it is unrealistic.
The other two goals make sense. Public accountability is vital, and we need a fair way to allocate research funding. Indeed, in countries where corrupt and nepotistic practices are entrenched, the REF is seen as admirably transparent.
I have two objections to the current approach. First, the costs and benefits still seem massively unbalanced. For the past three years, institutions have been engaged in preparing their submissions, with many going through a mock REF in the process. Between May 2021 and February 2022, subject panels will assess the submissions. I did a few back-of-the-envelope calculations for my subject area: assuming each output is doubly assessed, each panel member could have about 500 outputs to assess over those 10 months. And that’s before we get to impact case studies.
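For readers who want to redo this kind of back-of-the-envelope estimate for their own panel, here is a minimal sketch. The input figures are placeholders chosen only so that the arithmetic lands on the "about 500" mentioned above; they are not the numbers used in the article, which are not given, and the function name is hypothetical.

```python
def outputs_per_assessor(total_outputs: int, reads_per_output: int, panel_size: int) -> float:
    """Average number of outputs each panel member must read."""
    if panel_size <= 0:
        raise ValueError("panel_size must be positive")
    return total_outputs * reads_per_output / panel_size


# Placeholder inputs (assumptions, not real REF figures).
TOTAL_OUTPUTS = 10_000    # outputs submitted to one subject panel
READS_PER_OUTPUT = 2      # "doubly assessed"
PANEL_SIZE = 40           # assessors on the panel
ASSESSMENT_MONTHS = 10    # May 2021 to February 2022

load = outputs_per_assessor(TOTAL_OUTPUTS, READS_PER_OUTPUT, PANEL_SIZE)
per_month = load / ASSESSMENT_MONTHS

print(f"{load:.0f} outputs per assessor, roughly {per_month:.0f} per month")
```

Swapping in the real submission and panel numbers for a given unit of assessment gives the corresponding workload.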
Second, the REF process is not peer review in the sense that this term is usually understood. This is not a slur on the integrity of panel members; nobody, however dedicated, could have the expertise and time to adequately peer review such a large volume of work, much of which may be outside their specific interests. As Derek Sayer asked in 2014, how could a panel of historians evaluate the thoroughness, rigour and accuracy of a monograph on a specific period of eastern European history if none of them had a background in that area?
Largely because academics are suspicious of simple metrics, we’ve ended up with a hugely complex system that is intended to give a more detailed and nuanced evaluation of quality but, in effect, just generates an enormous workload while achieving no more valid a result than could be obtained from a simpler system.
Yet what could go in its place? In 2014, I suggested that we could award funds in proportion to the number of active researchers at an institution, weighted, as is currently done, by the expense of research in each discipline. This would be far from perfect, but it would certainly cost a lot less.
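To make the proposal concrete, the allocation rule can be sketched in a few lines. This is only an illustration under stated assumptions: the institution names, headcounts, budget and discipline cost weights are invented, and the function name is hypothetical rather than part of any existing funding formula.

```python
def allocate_block_grant(budget, headcounts, cost_weights):
    """Split `budget` in proportion to cost-weighted researcher headcount.

    headcounts:   {institution: {discipline: number of active researchers}}
    cost_weights: {discipline: relative expense of research in that discipline}
    """
    weighted = {
        name: sum(count * cost_weights[discipline]
                  for discipline, count in by_discipline.items())
        for name, by_discipline in headcounts.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no active researchers to fund")
    return {name: budget * w / total for name, w in weighted.items()}


# Invented example data (assumptions for illustration only).
cost_weights = {"lab_science": 2.0, "humanities": 1.0}
headcounts = {
    "University A": {"lab_science": 100, "humanities": 50},
    "University B": {"humanities": 150},
}

shares = allocate_block_grant(4_000_000, headcounts, cost_weights)
for name, amount in shares.items():
    print(f"{name}: {amount:,.0f}")
```

The only inputs are a headcount and a cost weight per discipline, which is what would make such an exercise cheap to run.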
It is easy to predict that institutions would proceed to designate everyone a researcher, with no quality control. However, if we anticipate such problems, we should be able to minimise them. The basic rule would be to start out with the simplest system possible and only add more complexity if the benefit demonstrably outweighed the cost.
We already have data from many REF rounds. We could evaluate how different the outcomes would have been if we had used simpler indicators. We could then decide on whether the cost of our current system is justified. Are we really spending all that time and money to achieve a fairer and more precise result? Or would we get an equally defensible outcome from an exercise based on existing indicators that could be completed in a matter of weeks rather than years?
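One way such a comparison might be run is to rank institutions by the funding they actually received and by the funding a simpler indicator would have given them, then measure how closely the two rankings agree. The sketch below uses Spearman's rank correlation as that measure, which is my choice of statistic rather than anything the REF prescribes, and the two lists of figures are made up purely to show the mechanics.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when one list is constant")
    return cov / (var_x * var_y) ** 0.5


# Made-up funding figures for four institutions (illustration only).
ref_based = [10, 20, 30, 40]      # allocation under the current exercise
simple_metric = [12, 18, 45, 33]  # allocation under a simpler indicator

rho = spearman(ref_based, simple_metric)
print(f"rank agreement: {rho:.2f}")
```

A value close to 1 across past rounds would suggest the elaborate exercise changes little about who gets funded; a low value would suggest the extra effort is buying real information.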
Author Bio: Dorothy Bishop is Professor of Neurodevelopmental Psychology at the University of Oxford.