The 100-scientific-papers rule

If you’re ready to submit a scientific paper, you will have read 100 related papers.

Why 100? Well, that advice has no basis more reliable than my own meandering experience. It’s my take on what it takes these days to be well versed in a specific topic and its broader background.

A typical scientific paper these days includes 30–50 references. Personally, I’ve gone as low as 24 and as high as 77. Twenty years ago, these numbers would’ve been lower, perhaps half as many. But rather than dwell on issues of inflation of the academic coin, we’ll just stick with 30–50 papers as our rough guess for now.

By the time you’re writing your own paper, you should’ve read more papers than you cite. And if you do the math, I’m perhaps implying that you should read 2–3 papers for every one that gets cited. Explore the literature beyond its essentials, but only so far before you reach a point of diminishing returns. Reasonable advice, right?

Not quite. My choice of 100 papers is simpler than that. It’s just a round number that’s a multiple of 10. It’s only slightly more legit than other quantities starting with a one and ending with a zero. Ten thousand steps a day. Ten thousand hours of practice to master a skill¹. A picture is worth a thousand words. We only use ten percent of our brain. Clearly these kinds of numbers are suspiciously neat². (Had the Babylonians’ base-60 number system prevailed, would we get away with 3600 steps a day? The next order of magnitude is 60³ = 216 000…)

So don’t quote me on 100 papers as a hard-and-fast rule, though I’m not the only one who has said it. My PhD advisor—from whom I might’ve (sub)consciously picked up this idea—suggests that a good PhD thesis will have over 100 references, properly referred to. (I know that citing 100 references is more than reading 100 papers, but then a PhD thesis is more than a single paper.) On Quora, toxicology scientist David Belair notes that he read and compared 100 or so papers for a review paper that he wrote. And biology professor Claus Wilke calls 100 papers a concrete (but somewhat arbitrary) cutoff beyond which reading more will be of little value unless you’re making progress with your own paper.

Wilke goes on to advise that reading extensively just to find a gap in the literature, and only then doing the work to fill that gap, is an inefficient way to research a new field. And this is what gives value to having a ballpark figure of 100 papers. It (hopefully) provides structure and rhythm to the otherwise seemingly infinite task of getting to know the literature. If you know you’re going to spend, say, two years on a project, then you need to average one paper a week³.
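If you want to sanity-check that pacing for yourself, the arithmetic is trivial. Here’s a toy sketch (the numbers come from the post’s ballpark, not from any real rule):

```python
# Reading pace implied by the "100 papers" ballpark
# over a hypothetical two-year project.
papers_to_read = 100
project_years = 2
weeks = project_years * 52

pace = papers_to_read / weeks  # papers per week
print(f"{pace:.2f} papers per week")  # just under one per week
```

Swap in your own project length or paper count to see how the weekly target shifts.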

Once you’ve read 100 papers, it’s tempting to milk them for all they’re worth. In the short term, this may save you some time, but it is bad for science as a whole. If you’ve ever complained that movie companies rely too much on making sequels upon sequels (*cough* Fast and the Furious *cough*) rather than creating new characters and stories, then … well, hopefully you see the analogy.

100 papers deeply read or skimmed?

Let’s calibrate expectations. When I say 100 papers, I’m not suggesting that you read all 100 completely and carefully⁴. Read only some of the 100 in depth. Remember, you’re not going to cite them all. In fact, of those cited, many will be fluff citations anyway. In those cases, you don’t need to know all the details, just the gist.

Since we’re talking about a continuum of effort, this is typically the point where I could call on (and overreach with) the Pareto principle. I could conjecture that 80% of what you’ll learn from the literature comes from 20% of the papers, and that’s where you should focus your time. But 80/20 doesn’t fit nicely with the factor-of-ten theme throughout this post. So instead I’ll call out Sturgeon’s law that 90% of everything is crap⁵. That means you should focus especially on 10 of the papers.

Also aim to strike a balance between reading papers just-in-time and just-in-case. Just-in-time learning is like on-the-job training, whereas just-in-case learning is like learning algebra at high school (which is taught to everyone just in case they need it later).

For just-in-case learning, you can read in a passive, start-to-finish way because your goal is to gain awareness of what’s out there in the literature. For just-in-time reading, that’s not the case. Instead, you’re trying to solve a specific issue at a specific time.

Will this post be relevant in a few years?

I’m writing this post four months after ChatGPT’s release raised existential questions as to whether machines can organise and combine information as well as or better than humans. Right now, at least, the answer is no⁶.

In his 2013 book Smarter Than You Think, tech journalist Clive Thompson argues that the internet isn’t dumbing us down. He states that “you can’t be googling information; it’s got to be inside you.” Google searches work when what you want is a fact or a stat. What they can’t help you with (as easily) is making linkages across the literature and sparking new insights and ideas.

Who knows? Maybe the paragraphs above will age like milk. Maybe quoting a ten-year-old book is a bad idea. And I’ll admit that it does seem old-fashioned and perhaps inefficient to read 100 papers simply to become familiar with what’s already been achieved. It’s a massive investment of time. But right now I don’t see a way around this.

Footnotes

1. There might be some truth to the 10 000 hour rule, or there might not. Take anything Malcolm Gladwell popularises with a large grain of salt.

2. I’m borrowing the phrase “suspiciously neat” from Paul Graham, who uses it when describing the success rate of tech startups as 1/10.

3. The top answer on Quora to the question “How many scientific papers do you read before you write your own?” starts with “My first guess is 52.” If you’re like me, your first reaction is that 52 is an awfully specific guess. And then two seconds later it clicks that 52 papers is one per week for a year. At least I presume that’s how the responder came up with 52; he never says so.

4. Efficiency hacks for reading scientific papers abound on the internet, such as the three-pass approach, reading the paper out of order, and a bunch of no-brainers like reading the title and abstract first to see whether the paper is relevant.

5. I was sure it’d be clear I was exaggerating when calling out Sturgeon’s law… until I checked:

[Image: poll results for “what percentage of everything is crap”]

6. There are a bunch of articles in which ChatGPT’s repercussions for some field are discussed by asking the bot itself what it thinks. For scientific publishing, ChatGPT thinks that it could be used to generate new hypotheses or research questions based on existing data or to explore alternative explanations for existing findings. The human who wrote the rest of that article wasn’t persuaded.

Author: Ken Hughes

Post-doctoral research scientist in physical oceanography
