Made to measure? Why university rankings are flawed

An unwritten law has emerged in both the sciences and social sciences: that it is better to measure than not to measure. Perhaps our affinity for measurement is attributable to Galileo, who is purported to have said: "Count what is countable, measure what is measurable and, what is not measurable, make measurable."

In a forthcoming paper in the academic journal Measurement, my co-authors and I examine the question of measurability and what constitutes a measure, as opposed to an opinion or an estimate.

Measurement is stronger than opinion or estimation. It’s analogous to counting and requires the measured values to satisfy the mathematical operations of addition and ordering, and also the standards represented in the International Vocabulary of Metrology (the science of measurement), which tries to find a common language for measurements.

But there’s an increasing tendency to confuse opinion with measurement, especially in the social sciences and in multidimensional data, where the measurement of many dimensions is reduced to a single value. Examples include measuring the quality of a university, the quality of research, or the liveability of a city.

In our paper, we establish conditions for measurement related to changeability and continuity with respect to observers, characteristics, and time. In simple terms, these conditions are designed to impart reliability to measurement, so that a measure doesn’t change instantaneously, or change markedly when there’s a small change in attributes. Physical quantities such as length, mass, temperature and hardness satisfy these conditions; but measurements in the social sciences often do not.

Measurement in the social sciences is usually carried out by humans on human behaviour. Typically this measurement cannot be replicated and is strongly dependent on its model. In the paper, we characterise measurement in the social sciences as akin to constructing a financial portfolio. The ranking of universities is a good example. Three different rankings, Webometrics, the Times Higher Education World University Rankings and the Shanghai Jiao Tong Academic Ranking of World Universities, each weight different indicators, such as teaching and research, into a portfolio of indicators.
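To make the portfolio analogy concrete, here is a minimal sketch of how two weightings of the same indicators can order the same institutions differently. The universities, indicator scores and weights are all hypothetical, invented purely for illustration; they are not taken from any of the three rankings or from our paper.

```python
# Hypothetical indicator scores (0-100) for three made-up universities.
scores = {
    "Uni A": {"teaching": 90, "research": 60},
    "Uni B": {"teaching": 70, "research": 85},
    "Uni C": {"teaching": 80, "research": 75},
}


def composite(indicators, weights):
    """Collapse several indicators into one value as a weighted sum."""
    return sum(weights[name] * indicators[name] for name in weights)


def rank(all_scores, weights):
    """Return university names ordered from best to worst composite score."""
    return sorted(
        all_scores,
        key=lambda uni: composite(all_scores[uni], weights),
        reverse=True,
    )


# Two hypothetical "portfolios": same indicators, different weights.
teaching_heavy = {"teaching": 0.7, "research": 0.3}
research_heavy = {"teaching": 0.3, "research": 0.7}

ranking_teaching = rank(scores, teaching_heavy)
ranking_research = rank(scores, research_heavy)

print(ranking_teaching)
print(ranking_research)
```

With these invented numbers the top and bottom institutions swap places when the weights change, even though nothing about the universities themselves has changed.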

The competition between ranking agencies is then like a tournament of portfolios, and their objective is to persuade observers to converge on their portfolio. They want observers to be anchored by their ranking. When a proposed measure becomes widely accepted, universities minimise risk by adopting practices that maximise their position in that ranking.

In terms of our financial markets analogy, it is akin to a fund manager persuading clients that their asset allocation is the optimal portfolio for investors. But unlike financial portfolios, there can be no ex-post assessment of the reliability of a measure relative to a true value, because the true value is unknown. The problem is that these measures are no more than opinions, and opinions don’t satisfy the conditions for measurement.

For the rankings of universities, when we considered the three ranking measures for 2011, we found a very high rank correlation across the measures for the top ten universities. It would be difficult to imagine that any ranking measure could omit Harvard, MIT, Stanford, Oxford, Cambridge, Princeton and UC Berkeley from its top ten.

But for universities ranked in the fifth decile, that is 41st to 50th, it’s an entirely different story. There’s a very low rank correlation across the three measures in 2011; that is, very little agreement as to the rankings, suggesting greater uncertainty and opinion dependence. But, more importantly, in the lower deciles, the volatility of rankings over time is appreciably greater, reinforcing the notion that these rankings are more uncertain.
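For readers unfamiliar with rank correlation, the following sketch shows one common statistic, Spearman's rho, computed for two rankings of the same institutions. This is an illustration only: the choice of Spearman's rho is an assumption for this example, and the rank positions are hypothetical, not the 2011 data.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rank correlation for two rankings with no tied ranks.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the
    difference between the two ranks given to the same institution.
    """
    n = len(ranks_x)
    d_squared = sum((ranks_x[u] - ranks_y[u]) ** 2 for u in ranks_x)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical positions within a group of five top institutions:
# the two rankings disagree only on who is 4th and who is 5th.
top_x = {"T1": 1, "T2": 2, "T3": 3, "T4": 4, "T5": 5}
top_y = {"T1": 1, "T2": 2, "T3": 3, "T4": 5, "T5": 4}

# Hypothetical positions within a group of five mid-table institutions:
# the two rankings order them almost independently of each other.
mid_x = {"M1": 1, "M2": 2, "M3": 3, "M4": 4, "M5": 5}
mid_y = {"M1": 4, "M2": 1, "M3": 5, "M4": 2, "M5": 3}

rho_top = spearman_rho(top_x, top_y)
rho_mid = spearman_rho(mid_x, mid_y)

print(round(rho_top, 2), round(rho_mid, 2))
```

A value near 1 means the two rankings largely agree on the ordering; a value near 0 means that knowing one ranking tells you almost nothing about the other.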

And for the lower deciles, cross-correlations over time across rankings are very low, again suggesting opinion dependence, rather than measurement. Universities are long-term institutions and their rankings should not change so quickly; nor should the divergence between measures change so quickly.

Measurability involves a process of convergence to an accepted measure satisfying invariance and continuity conditions. In most physical measurement, the conditions are almost always satisfied, with competing measures converging to values that are highly correlated across the measures and invariant to small changes in attributes.

For a characteristic that is not measurable, convergence is not possible because differences between competing measures cannot be minimised. And that is the problem with measuring opinion-laden concepts such as the quality of universities, the quality of research and the liveability of cities.