Getting with the PID programme


If you’re a researcher in any field, chances are you want people to find, read and use your stuff, right?

You probably want them to continue finding it, using it and correctly attributing it to you, whether it’s twenty days or twenty years after publication. In our current state of digital deluge, we’re pretty good on the twenty days. It’s the twenty years where we come unstuck.

Enter the persistent identifier, or PID. Slayer of the Error 404 message!

A PID is a long-lasting, unambiguous reference to a digital object. That object could be a journal article, dataset, scientific sample, artwork, PhD thesis or other publication, or the record of a person. You name it.

The PID essentially takes you to a record containing metadata about that object or person including, where applicable, its current location for access or download.

The great thing about PIDs is that they stay put. If the location of an object changes, the metadata behind its PID record can be updated by automated or manual processes to reflect that new location. The location of the PID itself – the record of the object – doesn’t change.
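To make that idea concrete, here is a minimal sketch in Python of a toy, in-memory PID registry. Everything in it is hypothetical and for illustration only (the PidRegistry class, the "example:0001" identifier and the example.org URLs); real PID services are far more involved. It simply shows that the identifier stays the same while the location stored behind it is updated.

```python
from dataclasses import dataclass


@dataclass
class PidRecord:
    """Simplified, hypothetical metadata record behind a PID."""
    pid: str
    title: str
    location: str  # current URL for access or download


class PidRegistry:
    """Toy in-memory registry: maps each PID to its metadata record."""

    def __init__(self):
        self._records = {}

    def mint(self, pid, title, location):
        """Create a new record for a PID."""
        self._records[pid] = PidRecord(pid, title, location)

    def update_location(self, pid, new_location):
        """The object has moved: update the record, not the PID."""
        self._records[pid].location = new_location

    def resolve(self, pid):
        """Return the current location of the object behind the PID."""
        return self._records[pid].location


registry = PidRegistry()
registry.mint("example:0001", "Survey dataset",
              "https://old.example.org/data/survey.csv")

before = registry.resolve("example:0001")

# The hosting institution reorganises its website...
registry.update_location("example:0001",
                         "https://new.example.org/archive/survey.csv")

after = registry.resolve("example:0001")
print(before)
print(after)
```

Anyone who cited "example:0001" before the move still reaches the dataset afterwards, because only the record behind the PID changed.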

What does a PID look like?

PIDs are minted and hosted by a range of organisations. They could be research institutes like CERN in Geneva catering for a specific research community, or not-for-profit organisations dedicated to sharing global research data such as Crossref, DataCite and ORCID.

There are many PID types depending on the nature of the digital object being described. DOI, ARK, Handle, ISBN, IGSN… it’s acronym city out there but there are some excellent online resources to help you make sense of which PID is for what.

The PID itself is an alphanumeric string that might look like this: doi:10.1186/2041-1480-3-9, or this: hdl:2381/12775, or this: ark:/13030/tf5p30086k. It can be hyperlinked or used alongside a citation.
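As a rough sketch of how such a string becomes a hyperlink, the Python snippet below prepends a public resolver address to each of the three examples above. The pid_to_url helper is a hypothetical name, and the choice of resolvers (doi.org for DOIs, hdl.handle.net for Handles, n2t.net for ARKs) is an assumption for illustration; other resolvers exist, and the snippet only builds the link without checking that it works.

```python
# Resolver prefixes per PID scheme (an illustrative selection, not exhaustive).
RESOLVER_PREFIXES = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    # ARK links conventionally keep the "ark:" label inside the URL.
    "ark": "https://n2t.net/ark:",
}


def pid_to_url(pid: str) -> str:
    """Turn a 'scheme:value' PID string into a resolvable URL."""
    scheme, sep, value = pid.partition(":")
    prefix = RESOLVER_PREFIXES.get(scheme.lower())
    if not sep or prefix is None:
        raise ValueError(f"Unrecognised PID scheme in {pid!r}")
    return prefix + value


for pid in ("doi:10.1186/2041-1480-3-9",
            "hdl:2381/12775",
            "ark:/13030/tf5p30086k"):
    print(pid, "->", pid_to_url(pid))
```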

What’s in it for me?

The benefit of the PID is not only its ability to point directly to a very specific object or person – driving other people to you and your work. It’s about connections to other PIDs. The potential for linking authors to articles to funding bodies to data, other articles, other datasets and so on. This is the true meaning of linked open data. Exposing research, connecting researchers, reducing duplication, increasing collaboration, improving discoverability, ensuring appropriate attribution and citation.

Let’s say you’re a geologist and you’ve found an article about sequences of quartzites in the Flinders Ranges in a peer-reviewed journal. In an ideal PID universe, the article would cite other journal articles in earth sciences using a DOI (Digital Object Identifier); reference fieldwork and quartzite samples using an IGSN (International Geo Sample Number); and link to its author, say, at the University of South Australia and co-authors at Nankai University using an ORCID iD (Open Researcher and Contributor ID). Those are all different PID types. Each would take you, the researcher, to a record which is unambiguously that of the sample, article or person in question. That record could link you to related data, people and publications to help you continue your research and to make new connections.

PIDs for People: ORCID

There are multiple PIDs for people but a good example is the ORCID iD. You may already have come across this as a compulsory field when submitting a research grant application. An ORCID iD is a record about you, compiled by you, controlled by you. You might think of it like a social media profile page minus the news feed, memes, unsolicited comments and endless notifications. It’s a clean, central record listing and linking to all of your research outputs, educational affiliations, funded projects and collaborators.

The nice thing about ORCID is that it’s a not-for-profit, member-driven organisation. There’s no sense of handing your data over for on-selling or nefarious purposes. Researchers registered with an ORCID iD can use the ORCID registry’s transparent linking mechanism to pull in data from other persistent identifier platforms (such as for publications, funding bodies, websites, datasets, researchers and research institutions) and create their own verifiable research profile. ORCID provides an API to support system-to-system communication and authentication.

Having an ORCID iD is particularly handy for disambiguation – i.e. if your name is David Smith and you’re continually confused with other David Smiths in your field. It ensures that your work is correctly attributed to you, helps to monitor interest in your research outputs, and makes you far easier to find.
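Part of what keeps ORCID iDs unambiguous is their structure: an iD is 16 characters long, usually written in four hyphenated groups, and the final character is a check digit (ISO 7064 MOD 11-2) that catches most typing errors. The Python sketch below is one way to check that digit; the function names are hypothetical, and it only tests whether an iD is well formed, not whether it is registered to anyone. The sample iD belongs to the fictitious researcher ORCID uses in its own examples.

```python
import re


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if the string has a valid ORCID iD shape and check digit."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_well_formed_orcid("0000-0002-1825-0097"))  # sample iD from ORCID's documentation
print(is_well_formed_orcid("0000-0002-1825-0098"))  # same iD with the last digit mistyped
```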

Sustaining the PID universe: the FREYA project

For PIDs to have their full effect, of course, they need to be widely understood, adopted and sustained. Research institutions, libraries and funding bodies around the world have for some time now been adding PIDs to the metadata behind their catalogue records, generating PIDs themselves, or insisting on the provision of PIDs when an article or project proposal is submitted.

As new PID types emerge, these will need to be incorporated into the metadata behind our systems. In some instances we’ll need to build new infrastructure, and we’ll need to ensure that different PID systems are interoperable, so that they can ‘talk’ to one another.

All of this is driving the three-year, European Commission-funded FREYA project. FREYA brings together twelve partners across Europe, Australia and the United States to improve PID systems, assess emerging PID types and foster their development and adoption.

We still have plenty of work to do to get PIDs working globally.

The FREYA team is continually collecting stories from the research community that tell us how we could improve and extend the PID system. We need PIDs for records of cultural artefacts, for example, or scientific instruments, grants, longitudinal projects, geographic boundaries, historical personages. We’re also uncovering potential issues. Who has the right to assign PIDs to heritage material out of copyright? How do we recognise contributors to scholarly research who don’t necessarily fit the researcher ‘mould’? What’s the best way to deal with multiple versions of the same dataset?

The FREYA project runs an Ambassador Programme to extend its reach into a range of research organisations around the world, and to gather feedback and input from a group of interested experts. Our project team members regularly run workshops and attend conferences to spread the word, generate discussion, and troubleshoot the technical challenges around PIDs.

We welcome contact from anybody interested in learning more or building awareness about PIDs in their own research community. More information about PIDs for researchers can be found in the FREYA Knowledge Hub.

Author Bio: Dr Barbara Lemon is a member of the FREYA project team at the British Library.