Information and knowledge have always been essential drivers of social progress, and the technologies through which knowledge is acquired, stored and communicated have been determinants of the nature and scale of their impact.
A technological milestone was passed at the turn of the millennium when the global volume of data and information that was stored digitally overtook that stored in analogue systems on paper, tape and disk. A digital explosion ensued that has immensely increased the annual rate of data acquisition and storage (40 times greater than 10 years ago), and dramatically reduced its cost.
In 2003, the human genome was sequenced for the first time. It had taken 10 years and cost $4 billion. It now takes three days and costs $1,000 (£770).
Like all revolutions that have not yet run their course, it is often difficult to distinguish reality and potential from hype. So what lies behind the “big data” phrase that has become the rallying cry of this revolution, and with which all levels of business and government, and increasingly universities and researchers, are struggling to come to terms?
The world of “big data” is one of enormous fluxes of digital data streaming into computational and storage devices, often from a great diversity of sources. It contrasts dramatically with the analogue world of relatively sparse and discontinuous data, and is consequently able to reveal patterns in phenomena that were hitherto far beyond our capacity to resolve. The observation of patterns in nature has often been the empirical starting point for productive quests for meaning, whether in the hands of a Copernicus, Darwin or Marx.
But we can go further. The learning algorithms developed by artificial intelligence (AI) researchers can now be fed with immense and varied data streams, which are the equivalent of empirical experiences, from which a device can learn to solve problems of great complexity, and without the prejudices that inhibit human learning. Such machine learning is increasingly used in commerce and has great potential for research, but it also poses threats to even the highly skilled jobs that have been regarded as essentially human, with profound implications for the future of work.
Sixteen years ago, Tim Berners-Lee proposed that the web that he invented, which discovers and produces electronic documents on request, could become a “semantic web” that allows data to be shared and reused across applications, enterprises and community boundaries, and machine-integrated to create knowledge, most profoundly of the behaviour of complex systems, including interactions between human and non-human systems.
These approaches offer novel opportunities not only for the natural sciences, engineering and medicine, but also for the social sciences and humanities. A common challenge that they all face, however, is that their data should be “intelligently open” (findable, accessible, intelligible, assessable and reusable). Without openness, researchers remain trapped inside a cage of their own data, and a community of ideas and knowledge, one with powerful collaborative potential and the capacity to interact with wider society through a more open science, fails to materialise.
These imperatives pose ethical challenges to publicly funded researchers to make the data they acquire intelligently open so that it can be reused, repurposed or added to by others, particularly if that data provides the evidence for a published scientific claim.
They pose operational challenges to institutions and national science systems, not merely to prioritise the “hard” infrastructure of high-performance computing or cloud technologies and the software tools needed to acquire and manipulate data, but, more problematically, the “soft” infrastructure of national policies, institutional relationships and practices, and the incentives and capacities of individuals. For although science is an international enterprise, it is done within national systems of priorities, institutional roles and cultural practices, such that university policies and practices need to accommodate to their national environment.
The digital revolution is a world historical event as significant as Gutenberg’s invention of moveable type and certainly more pervasive. A crucial question for the research and scholarly community is the extent to which our current habits of storing and communicating data, information and the knowledge derived from them are fundamental to creative knowledge production and its communication for use in society, irrespective of the supporting technologies, or whether many are merely adaptations to an increasingly outmoded paper/print technology.
Do we any longer need expensive commercial publishers as intermediaries in the communication process? Do conventional means of recognising and rewarding research achievements militate against creative collaboration? Has pre-publication peer review ceased to have a useful function? These are non-trivial questions that need non-trivial responses.
Access to knowledge and information has increased in value in advanced economies such that it is becoming the primary capital asset. If the value of knowledge and information is so high, it is unlikely that private sector companies will readily cede this terrain to public bodies such as universities that have been society’s traditional knowledge nodes, with Google, Amazon and “platform” enterprises (such as Uber and Airbnb) as possible precursors of powerful interventions in the university space.
This may not simply be the replacement of one form of delivery of public good by another; it could be a trend towards the privatisation of knowledge, with profound implications for democracy and civic society. It is a potential trend that should be anathema to universities.
Author Bio: Geoffrey Boulton is Regius emeritus professor of geology at the University of Edinburgh, and president of the Committee on Data for Science and Technology.