Mind the gap (in the literature)


Hey – Before I start, here are some upcoming events I’m doing at Cambridge University, which are open to the public:

A lecture at Wolfson College, Cambridge on Tuesday 6 June at 5:30pm, which touches on themes from my new book with Simon Clews, ‘Be Visible or Vanish’. There’s an in-person option if you can get to Cambridge, and an online version if you can’t: you can book here.

I’m giving a lecture about the PhD in uncertain times, where I will talk about the latest data we have on the UK market for research skills, at King’s College, Cambridge on Monday 10 July, 5:30 – 7:30pm: you can book here.

Ok, here’s the post for this month – one in a planned series about how to speed up writing about the literature.

I don’t enjoy doing literature reviews.

Academics have amazing ideas and insights, but tend to hide them under obtuse jargon and behind annoying paywalls. Once you dig the papers out of the digital ether, the real pain begins. Academic writing can be both boring and frustrating; reading a lot of it all at once can really drive you nuts.

I’m on sabbatical and foolishly told my boss I would do a big literature review on neurodiversity and the PhD while I was away. This is proving hard. I’m doing my sabbatical at Cambridge University and the folks here are really hospitable. I can never say “no” to a free lunch or dinner, especially when it’s with smart, interesting people in a medieval (or modernist) college dining hall filled with ancient academic bling!

Behold the culinary splendour that is the Cambridge dining scene. This is what they would call in Glasgow a ‘fish tea’, complete with mushy peas (at Wolfson College, where I’m ‘attached’ while being a visiting scholar at ThinkLab):

And here’s what remained of a lovely piece of cod at King’s College by the time I was brave enough to sneak out my phone (under the guise of pausing to check my email before finishing my greens):

To maximise my socialising time, I am trying to be super efficient with this literature review nonsense.

This is the biggest literature review I’ve attempted since I started my PhD in 2006. We read everything on paper back then, even journal articles (we were living like animals I tell you). I ended up with a metric shit ton of photocopies and books full of post-it notes. I’m sure this physical clutter contributed to my literature review anxiety syndrome and subsequent avoidance behaviour.

This time I’m using every technological trick I know to manage the literature pain. Here’s my process:

Segment the literature

The temptation is to just start throwing keywords at Google Scholar and seeing what gets loads of hits. And hey – don’t let me stop you. I’ve found lots of good stuff this way, but you can quickly feel like you’re drowning when you do this scattergun thing.

To avoid overwhelm, I ‘segment’ the task, aiming to map a small part of the larger literature universe. In this case, just the literature on neurodiversity in education: not the causes, diagnostic tools, or debates about whether or not it should be considered a disability. I may have to delve into all that, but one thing at a time.

If you’re working on something big, like a book or a PhD, you’ll probably need more than one literature map, and that’s ok. Several tightly curated maps can help you work your way across the vast literature expanse in a controlled and methodical fashion.

Express your search as simply as you can

Assume there is a genuine gap in the literature and express it as a question, like so:

Is there any existing research about the experiences of neurodiverse PhD students and working academics that can inform my research?

Writing your question down on a post-it note and sticking it to the edge of the screen will keep you focussed. Now you need a search phrase or two. Think Boolean – you want key terms with modifiers, like so:

Neurodiversity and higher education
Neurodiversity and employment

(The second one is there because PhD students are often staff members too).

If your search starts turning up too much unrelated stuff, add modifiers. For instance, there are a lot of papers about how to help people with cognitive functioning problems into basic employment. I am more interested in people who do knowledge work, so I added a ‘not’ modifier:

Neurodiversity and employment not disability
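If you like to keep a record of exactly what you searched for, the phrases above can be generated from lists of terms. This is only a rough sketch: the `build_query` helper is made up for illustration, and the uppercase AND / NOT operators and the quoting of multi-word phrases are assumptions based on common database conventions, so check the search help for your own database before relying on them.

```python
def build_query(required, excluded=()):
    """Join required terms with AND, then append a NOT clause per excluded term."""
    def quote(term):
        # Quote multi-word phrases so they are searched as a unit (assumed convention).
        return f'"{term}"' if " " in term else term

    query = " AND ".join(quote(t) for t in required)
    for term in excluded:
        query += f" NOT {quote(term)}"
    return query


queries = [
    build_query(["neurodiversity", "higher education"]),
    build_query(["neurodiversity", "employment"]),
    build_query(["neurodiversity", "employment"], excluded=["disability"]),
]
for q in queries:
    print(q)
```

Keeping the terms in lists makes it easy to paste the same searches into a second database later, or to note in your spreadsheet which query turned up which paper.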

It’s honestly ok to start with a couple of simple phrases like this – or even just one. The magic of algorithmic citation-ish searches (we’ll get there) means finding just a bit of stuff is going to help you find lots of other stuff.

Find the right database(s) to begin your search

Google Scholar is fantastic and has the best interface for searching, but you should start in a ‘proper’ subject matter database. In my case, ERIC (via ProQuest).

I am not sure why I offer this advice.

I hear tell that ‘Google doesn’t have everything’, but in my experience it does, at least for my field. Look, there’s no harm sticking to orthodox methods; at least that’s defensible during a peer review process. The first database is only a starting point anyway.

(If you’re not familiar with the databases in your discipline, time for a visit with your friendly librarian).

Do not read any full papers yet!

Stick with just reading abstracts for now; you’re just trying to get the vibe. It’s a rookie mistake to download and read individual papers at this stage. Trust me: I’ve made this mistake countless times, enough to know this way lies madness. You’ll fall down rabbit holes and before you know it you’ll be reading a paper about the politics of dusting in 18th century houses (joking, kind of).

I also advise against importing papers into your reference manager straight away. (If you are not using a reference manager, install one – I’ll wait). If you put all the papers into your reference manager from the beginning, you’ll get attached and end up with way too much ‘stuff’. Take it from a reformed hoarder: having too much ‘literature stuff’ is bad for thinking.

The process of curating papers is a bit like buying clothes from a huge charity shop: it’s hard to choose from abundance and some of them will be a bit rubbish. You’ll need to try on a lot before you find the few that will suit you.

Your digital ‘change room’: a humble spreadsheet

If a paper seems promising, save the citation only in your ‘digital change room’. Fire up Excel, Google Sheets or some other spreadsheet tool. Cut and paste the formatted citations into each row as you go. The formatting will be horrible, but tell your perfectionistic self that’s a problem for later. As long as the first author name at the start is correct, you’ll be ok.

The reason I suggest spreadsheets is they enable you to curate ‘on the fly’. As you search, you will find yourself developing inclusion and exclusion criteria, further narrowing the scope. Write these criteria down as you go so you remember why you made specific decisions. As you get a better idea of the whole, some stuff you initially collected may seem less relevant. Letting go of papers when they are in your reference manager is psychologically hard for some reason; deleting a line from an Excel spreadsheet is a painless couple of clicks.

A program like Excel enables you to make a useful look up table; less fuss than a reference manager, easy to sort and more portable. You can search author name(s) before you add a new paper, or simply ask the software to remove duplicates at the end. Add a separate column for the year the paper was written so you can sort the literature into a timeline. Reading papers in the order they were written helps you see how the conversations have developed.
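If you ever export the sheet as a CSV file, the same tidy-up (remove duplicates, sort into a timeline) takes a few lines of Python. Treat this as a sketch: the column names (`first_author`, `year`, `title`) and the rows are invented for illustration, and the rule that two rows are duplicates when author, year and title match regardless of capitalisation is my assumption, not a standard.

```python
import csv
import io

# Hypothetical export of the 'digital change room' sheet; these are not real papers.
SHEET = """first_author,year,title
Smith,2019,Example paper A
Jones,2015,Example paper B
smith,2019,Example Paper A
Lee,2021,Example paper C
"""


def dedupe_and_sort(rows):
    """Drop repeated rows (same author, year and title, ignoring case), then sort by year."""
    seen = set()
    unique = []
    for row in rows:
        key = (
            row["first_author"].strip().lower(),
            row["year"].strip(),
            row["title"].strip().lower(),
        )
        if key not in seen:
            seen.add(key)
            unique.append(row)  # keep the first copy we saw
    return sorted(unique, key=lambda r: int(r["year"]))


rows = list(csv.DictReader(io.StringIO(SHEET)))
timeline = dedupe_and_sort(rows)
for row in timeline:
    print(row["year"], row["first_author"], row["title"])
```

With a real file you would swap `io.StringIO(SHEET)` for an opened CSV file. The spreadsheet's own sort and remove-duplicates buttons do the same job; the script is only worth it if you repeat the clean-up often.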

A list of tightly curated references is a valuable research artefact which you can share with your supervisor and/or team mates. Later, you may wish to publish this sheet in a data commons for other researchers to use.

Explore the citation neighbourhood (with robots)

Your initial list may be quite large. From ProQuest alone I got 90-some papers, just from the two search terms I listed above.

Hopefully, like me, you found a couple of literature review papers: where another researcher in the past has done the work you are doing now. You’re now ready to broaden out the search by using the references (what’s called in the trade a ‘citation search’). Start by scraping the references out of the literature reviews you have found, then reach for help from the robots.

There are some interesting literature review helpers/robots on the market now; all use some form of algorithmic search. You might want to check out Petal and Scite, but I used connectedpapers.com for this task because it gives me a lot of control and is simple to use. Sadly, it’s no longer free. I paid $30 for the year, but I’ve often spent this amount on a book I haven’t read all the way through, so I think it’s well priced.

Connected Papers displays your selected paper as part of a ‘social graph’ with nodes representing individual, related papers. Connected Papers is not strictly a citation searching tool: it shows your paper within an overall citation ‘neighbourhood’. According to the notes, the algorithm that powers the graph works on a principle of ‘Co-citation and Bibliographic Coupling’. Basically, papers that are closer together in the graph are likely to be similar in topic:

Clicking on individual papers will show you the abstract and give you a link to Google Scholar, or other database. Connected Papers harnesses the wisdom of the scholarly network. Basically, if another researcher on your list has missed something relevant, but other scholars have listed it, it will appear – even if your origin paper did not directly reference those other papers in the graph. Pretty clever!
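For the curious, here is a toy version of the two ideas named above. It uses the textbook definitions only: bibliographic coupling counts the references two papers share, and co-citation counts the later papers that cite both. I don’t know how Connected Papers actually weights or combines these, so this is not their algorithm, and the papers A to E and X to Z are made up.

```python
from itertools import combinations

# Hypothetical reference lists: each paper maps to the set of papers it cites.
CITES = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"Z"},
    "D": {"A", "B"},
    "E": {"A", "B", "C"},
}


def bibliographic_coupling(p, q, cites):
    """Number of references that papers p and q have in common."""
    return len(cites.get(p, set()) & cites.get(q, set()))


def co_citation(p, q, cites):
    """Number of papers whose reference list contains both p and q."""
    return sum(1 for refs in cites.values() if p in refs and q in refs)


for p, q in combinations(["A", "B", "C"], 2):
    print(
        p, q,
        "coupling:", bibliographic_coupling(p, q, CITES),
        "co-citation:", co_citation(p, q, CITES),
    )
```

In this toy data, A and B score highest on both measures, so a graph built on these numbers would draw them close together. B and C never cite each other and share no references, yet they still get linked because paper E cites both, which is how a relevant paper can surface even when your starting paper never referenced it.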

I systematically put in each of my 90 original papers and explored the neighbourhood. By the time I exhausted the search I’d found 229 highly relevant papers. Your search is exhausted when you start discovering the same papers, over and over without finding much new stuff. Warning: any literature research is like an asymptote – getting closer and closer, but never hitting zero. At some point you just have to call Time. I stopped when it was taking at least half an hour to get new hits.

Add categories to aid writing

A good literature review should describe, at a high level, the trends and concerns of other researchers. You can use your spreadsheet to start this task. I simply read through each title and quickly tagged each paper, in a new column, with a few simple words and phrases like #autism, #adhd, #supportprograms, #academicsuccess:

At the end of this process I wrote a high level summary for myself:

Of the papers on higher education (156), a title analysis showed 109 were about autism exclusively, 25 were about ADHD exclusively and 4 were concerned with dyslexia. The rest (18) included all diverse neurotypes. The common themes were: reporting on or evaluating specific support programs (39); lived experience (34); academic success (32); transition from high school to college (13); mental health (10); raising awareness in peers or teachers who did not identify as neurodiverse (10); and bias (3). A small number of papers (7) tackled the issue of writing as a specific skill while 14 tackled social skills. There were no papers exclusively about PhD students and only seven papers about neurodiverse academics.
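Counts like these can come from a spreadsheet filter or pivot table, but if the tag column is exported, a few lines of Python will tally it. This is a sketch under assumptions: the tags sit in one cell per paper separated by spaces, and the four rows below are invented examples, not my actual data.

```python
from collections import Counter

# Hypothetical tag column: one string of hashtags per paper.
TAG_COLUMN = [
    "#autism #supportprograms",
    "#adhd #academicsuccess",
    "#autism #academicsuccess",
    "#autism",
]


def count_tags(tag_cells):
    """Count how many papers carry each tag."""
    counts = Counter()
    for cell in tag_cells:
        # set() means a tag typed twice in one cell still counts once for that paper
        counts.update(set(cell.lower().split()))
    return counts


tag_counts = count_tags(TAG_COLUMN)
for tag, n in tag_counts.most_common():
    print(tag, n)
```

Because one paper can carry several tags, the theme counts will usually add up to more than the number of papers, which is worth remembering when you write the summary.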

I’ve validated my ‘gap’! Relief.

Because I measure these things, searching, indexing and tagging all this literature took me around 10 hours. I know it takes me around 40 minutes (on average) to read and take notes from a paper, so that’s 152 hours or so of reading I’ve lined up.

If all I did was read all day, it would take me a month. But no one can read intensely for 8 hours a day! Especially when there are so many free lunches on offer…
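The back-of-envelope sum behind those numbers, using the 229 papers and 40 minutes per paper mentioned above, looks like this. The 8-hour reading day is the deliberately unrealistic assumption; change it to whatever you can actually sustain.

```python
PAPERS = 229            # papers found in the search (from the post)
MINUTES_PER_PAPER = 40  # average time to read and take notes (from the post)
HOURS_PER_DAY = 8       # the 'read all day' scenario, an assumption

total_hours = PAPERS * MINUTES_PER_PAPER / 60
full_days = total_hours / HOURS_PER_DAY

print(f"{total_hours:.1f} hours, or {full_days:.1f} eight-hour days")
# prints: 152.7 hours, or 19.1 eight-hour days
```

Nineteen or so solid working days is roughly a calendar month once weekends are counted, which is where the ‘month’ comes from.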

Do I have to read everything? Frankly: no. Skim reading is a ‘legitimate academic deviance’ for a reason. Some papers will be original and useful; worth reading closely, maybe even multiple times. Others will be derivative: just adding to the weight of evidence, in which case speed reading the abstract and the discussion will likely be enough.

Connected Papers can show me which papers other researchers think are useful, which is a starting point for winnowing down. Having the whole list in Excel makes it easy to realistically plan the next stage of the project. I estimate close reading and note taking of this lot will take around 30 days, about the length of time I have left on my sabbatical. Easy!

The next step is to go back and import all the papers into Zotero, so I can take notes in Obsidian. I’ve added ChattieG functionality with the TextGenerator plugin, which helps me make quick summaries of my notes, and the Smart Connections plugin, which helps me link related notes together. But all of this is a blog post for another time.