Mock REFs need a neutral referee


If you work at a UK university, your department is probably using some form of internal review to identify which of your recent papers should be submitted to the Research Excellence Framework (REF) later this year.

Unlike some, I don’t have any visceral objection to the REF. Good performance measures generate incentives that motivate staff and promote good work. Nor have I anything against the REF’s design. The criteria of originality, rigour and significance amount to a conceptual framework that is elegant in its simplicity and universal applicability. It is the implementation of that framework that is the problem.

In a mock REF, a department assigns colleagues’ recent papers and impact case studies a score of one to four stars, with anything less than three considered ineligible for submission. But a commenter under a recent story in Times Higher Education encapsulates the problem: “I had two papers scored by four people last time (two internal, two external). The scores on both? 1, 2, 3 and 4 stars. One external gave two ones, the other two fours. Both were professors at Russell Group universities, in top ranked departments. Clearly my work divides opinion, but to determine someone’s career trajectory based on one score is grossly unfair.”

REF panellists are typically eminent scholars in their fields, but most departments can’t call on internal reviewers with anything like the same experience. The problem of inconsistent scoring can be lessened by training, but the anonymity of the mock REF review process opens the door to huge biases, given that the internal reviewer knows all the people being assessed and has various academic and personal relationships – supportive or adversarial – with them. There are other sources of distortion too, such as a shortage of internal social science reviewers with the expertise to assess quantitative work.

Reputations, egos and jobs are on the line, so the review processes are bizarrely politicised and emotive. It is relatively easy to push the score of a paper above or below a critical boundary; even if a second internal reviewer exists, they are probably less specialist and will not put up much of a fight over a well-put case. Moreover, the subtlety of unconscious biases means that sometimes even the reviewers may not realise that they are being more lenient towards someone because they attend departmental socials and smile in the corridor – or because they are close allies of the department head.

Many universities use external reviewers to promote accountability, but the way they are recruited and managed can also reflect bias. Some universities require all studies to be submitted for external review, but senior central staff may have little way of knowing whether departments conform. In other cases, only a sample of papers must be sent for external review, and favourites’ and mentees’ highly internally rated papers might be protected from such scrutiny. Remarkably, some departments even send their external reviewers the internals’ scores and comments, undermining their independence; externals are paid, so they are unlikely to bite the hand that feeds them.

But what if people who feel undermarked could call on the judgement of field experts from around the world to back them up? Universities, you would think, would routinely consider such evidence. And, to be fair, some have begun to incorporate field-weighted citation impact (FWCI) into their mock REFs. But too many have not.

FWCI gauges a publication’s overall significance, originality and rigour – the REF criteria – relative to other studies in the same field with similar publication dates, based on the citation behaviour of scholars everywhere. Some people object that citations are not always positive, but even a critical reference to someone’s work indicates their contribution because it demonstrates that the work is pushing boundaries (and academic spats are useful when they clear the air).
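For readers unfamiliar with the metric, the underlying arithmetic is simple. What follows is a minimal sketch in Python, not Scopus's actual implementation: it assumes FWCI is the ratio of a paper's citations to the average citations of comparable papers (same field, similar publication date). The function name `fwci` and the peer citation counts are hypothetical, chosen purely for illustration.

```python
def fwci(citations, peer_citations):
    """Ratio of a paper's citations to the average for comparable papers.

    `peer_citations` is an assumed list of citation counts for papers in
    the same field with similar publication dates (illustrative data only).
    """
    if not peer_citations:
        raise ValueError("need at least one comparable paper")
    expected = sum(peer_citations) / len(peer_citations)
    if expected == 0:
        raise ValueError("comparable papers have no citations to compare against")
    return citations / expected


# Hypothetical example: comparable papers average 5 citations each,
# so a paper with 15 citations scores 3.0.
print(fwci(15, [2, 4, 6, 8]))
```

On this reading, a value of 1.0 corresponds to the global average for comparable papers, so a value of 3.0 means a paper has been cited three times as often as that average.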

REF panellists themselves are highly likely to use FWCIs to inform their own decisions, as they should. And while no citation index is perfect, FWCIs could be useful in lots of mock-REF situations.

Take the frustrated colleague whose paper was assigned a two-star rating internally (and was denied external review) despite receiving its journal’s annual award for best paper. Scopus shows an FWCI close to three times the global average, putting it in the 95th percentile. It must surely count as at least three-star.

Another colleague’s major output, from an award-winning, research council-funded collaboration with internationally renowned colleagues, was internally awarded a “low three-star”, putting it on the borderline for possible REF inclusion. The study has an FWCI score 10 times the global average, putting it in the 99th percentile.

Such discrepancies are not minor, and they have implications for individual careers.

Department heads and REF leads derive additional influence from the current processes and will not easily relinquish it. But I’m not suggesting that internal review be completely abandoned. As with most research, triangulating evidence from different sources and angles is better than relying on one. I do believe, however, that FWCIs should be consulted, as in the REF itself – especially in borderline cases. The fact that all outputs can be ranked on FWCI would allow for clearer, evidence-based demarcation of where borders lie.

Moreover, used at the university level, FWCIs could identify departments whose proposed REF submission profile differs significantly from that which metrics would suggest is optimal. Departments found to have severe biases could then have their internal review processes overhauled.

Used judiciously, FWCIs have the potential to improve the quality of REF submissions while – fingers crossed – reducing bias, bullying and unfairness along the way.

The author is a professor at a Russell Group university.