Novel writers use an average of 100 clichés for every 100 000 words. Or about one every four pages. That’s what Ben Blatt found by comparing a range of novels against a list of 4000 clichés. How does scientific writing compare?
In one sense, scientific writing avoids clichés. A scientist isn’t going to write that their new results put the nail in the coffin of the outgoing theory, that they were careful to dot their i’s and cross their t’s so as to follow the methods of Jones et al. by the book, that Brown et al.’s finding is a diamond in the rough, or that two possible interpretations are six of one and half a dozen of the other.
In another sense, scientific writing is full of clichés. Our writing often feels like a fill-in-the-blanks exercise: the results of this study show X, these findings are in good agreement with Y, or Z is poorly understood and needs further study. Need more examples? Check out the Manchester Academic Phrasebank, a collection of phrases from the academic literature that are “content neutral and generic in nature.”
Perhaps I’m being too harsh. Maybe scientific writing isn’t full of dull, procedural, and formulaic expressions. Maybe it’s confirmation bias in that I’ve read hundreds of scientific papers, so it’s easy to recall at least a few such expressions.
This calls for some data. Specifically, an inspection of the text of 360 papers that I’ve collected over the years in my field of physical oceanography (published since 2000). I’ve used this set of papers before in a similar post. Combining an automated search with some manual intervention, I checked the 360 papers for a range of clichés: third-person constructs, hedging terms, overzealous assertions, and directives for future work.
The author dislikes third-person statements
It’s a myth that scientific writing should demonstrate dispassionate observation. But let’s say you do buy into the myth and want to pretend you’re somehow impartial and uninvolved in the experiments that you’re reporting. When it comes time to recognise your own role, you’ll be forced to describe yourself and co-authors in the third person as “the authors”. As in, for example, “the categories were selected by the authors”. This awkward phrasing appears in 6% of the papers (a small number, fortunately). And not only is it awkward, it can be ambiguous. Another 6% of papers use “the authors” to refer to other writers, not themselves.
The most common scenario for third-person references is phrases along the lines of “as far as the authors are aware, no studies have considered process X”. The second-most common usage of “the authors” is in abstracts, as if for some reason they need to be even more dispassionate? Though maybe a desire to remain neutral is not actually relevant here. Case in point: Acknowledgements, the one section of a scientific paper where personality is always allowed. Yet, if I include that section in my count, the 6% ramps up to over 30%. “The authors thank …”, “The authors wish to thank …”, “The authors acknowledge …”, etc. Why gratitude is often expressed in this odd manner beats me.
There’s no good reason to use “the authors” when “we” is simpler, shorter, and better. That’s why 96% of the papers use “we” at least once. (Six of the 14 that don’t use “we” are single-author papers, but none of these use “I” instead.) On average, the pronoun comes up 22 times per paper. One paper features 247 uses. Ironically, that paper has a single author (though it’s mathematical convention to use “we” regardless).
Hedging our bets
“Extraordinary claims require extraordinary evidence.” A flip side of this oft-repeated aphorism is that if you have only ordinary evidence, you can make only ordinary claims. What’s a sure-fire way to ensure your claim is ordinary? Hedge.
Take the word “suggest”. There’s at least one use of suggest (or suggests, suggested, suggesting, suggestion) in 87% of the papers. On average the word shows up five times per paper, or about once for every three pages. When you get to using “suggest” more than once per page (as 10% of papers do), that’s a sign of overuse.
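For anyone wanting to run a similar tally on their own collection of papers, here is a minimal sketch in Python. It is not necessarily the search behind the numbers above: the regular expression, the `usage_stats` helper, and the three toy "papers" are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: suggest, suggests, suggested, suggesting, suggestion(s).
# The word boundaries keep it from matching e.g. "suggestive".
SUGGEST = re.compile(r"\bsuggest(?:s|ed|ing|ions?)?\b", re.IGNORECASE)


def usage_stats(papers, pattern):
    """Return (share of papers with at least one match, mean matches per paper)."""
    counts = [len(pattern.findall(text)) for text in papers]
    share = sum(c > 0 for c in counts) / len(counts)
    mean = sum(counts) / len(counts)
    return share, mean


# Toy stand-ins for full paper texts, not the real corpus.
papers = [
    "These results suggest a link. Our suggestion is tentative.",
    "The data are consistent with theory.",
    "Suggested mechanisms are discussed; this suggests mixing.",
]

share, mean = usage_stats(papers, SUGGEST)
print(f"{share:.0%} of papers, {mean:.1f} uses per paper")
```

An automated count like this would still need a manual pass, since a pattern cannot tell a hedge from, say, a quoted title.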
Close cousins of suggest are “consistent with” and “likely”. Each occurs, on average, three times per paper. As it happens, each term showed up at least once in 256 of the 360 papers. The stronger phrase “agrees with” (or “agree with”, without the s) is 10 times less common.
A different type of hedge is “believe”. This is used in 1/4 of papers, implying that at least 3/4 of us agree that science isn’t about “belief”. It’s about facts, evidence, theories, experiments. (Obviously, using the word “believe” doesn’t imply a scientist disagrees with the statement. And seldom is the word used more than once in a paper.)
This is important. That’s important. Everything is important.
We all strive to do important science. But in some sense, importance is a zero-sum game. If everything is important, then nothing is. But that doesn’t stop us from claiming importance, however tenuous it is.
“Important” or “importance” shows up in 96% of papers. That’s as often as “we”! (Though “we” is used three times more often when tallying up all uses.) Importance is referenced, on average, seven times per paper. But don’t use that number as a guideline for your own writing since it is skewed by a few large values. Instead aim for something closer to the mode of the distribution: three times per paper. Better yet, aim for a single thing being important.
Like “important”, but more assertive, is “crucial”. This is used far less often: it shows up in only 20% of papers. And only 20% of that 20% uses it more than once. You might say that the comparatively limited usage of “crucial” is consistent with our propensity to hedge.
Leaving it for later
The cliché that something creates more questions than it answers often applies to science. Suggestions for future research are an acceptable way to flesh out a discussion (though you shouldn’t end with them). Yet I was surprised at how few papers explicitly identify such suggestions.
“Future study/studies/work/research/experiment” showed up in 18% of papers. “Beyond the scope” showed up in 9%. “Cannot explain”, “do not explain” or “does not explain” showed up in 4%.
Here’s where I might acknowledge the shortcomings of my methods and note that in future work I’d check whether there are phrases equivalent to those above that I forgot and thereby excluded from my counts. But this is a blog post, not a scientific paper.
What isn’t cliché in scientific writing?
Among the 360 papers, there’s a single use of “gigantic”. I also recently came across the hedge “hopingly,” which is sufficiently rare that WordPress is underlining it as a spelling mistake as I type. But, of course, single words don’t really count. Real answers to the question of what isn’t cliché might be humour, contractions, or one-word sentences and single-sentence paragraphs. I look forward to the days when these become common.
Asides inspired by the main text that didn’t quite fit.
- Regular clichés, like those in the second paragraph, seldom occur within the body text of scientific papers. Yet they are common fodder for titles, as shown by Google Scholar searches for All in a day’s work, Back to square one, or Don’t judge a book by its cover.
- For a particularly notable academic cliché combination, consider one of the topic sentences in a well-known psychology paper: “To clarify the distinctive nature of our proposal it is useful to briefly consider prior research on overconfidence”. The first 17 of the 18 words are a generic framing of the only meaningful word in the sentence. (Do check out the paper though. Its topic, the illusion of explanatory depth, is fascinating and relevant to the practice of science in general.)
- The Academic Phrasebank notes on its homepage that it was designed for non-native speakers of English, but it is the native speakers that have ended up as the majority of its users.
- I acknowledge that the fill-in-the-blanks approach is a good way to get started writing, but some pushback against this approach is warranted.
- Like “we” and “important/importance”, another common word is “data”, which shows up, on average, 22 times per paper (“dataset” is included in this count). 2% of papers had triple-digit usage. Conversely, 22 of the 360 papers did not use the word. Of those 22 papers, 20 were focused on either theory or simulations.
- Awkward second-person references are worse than third-person. 10% of papers mentioned “the reader”, whereas <1% of papers mentioned “you” in any second-person sense. (That 10% excludes the boilerplate phrase used in 10 papers that “the reader is referred to the web version of this article” [for colour figures].)
- In a similar vein to “the author” and “the reader”, there’s the phrase “this paper”. This phrase is used in 60% of papers. On average, it was used 1.4 times per paper, which is reasonable. Four of the 360 papers, however, had double-digit uses of the phrase.
- I wanted to include stats for “possible/possibly” in the hedging section. But these words can be used in many ways other than hedging, so I left them out. Similarly, I excluded “critical” in the importance section as that has specific meanings in my field.
- The heading This is important. That’s important. Everything is important should be read in Oprah’s voice.
- Rather than proclaiming importance where it’s not due, this paper is more honest: “From a practical perspective, that result, of course, is only moderately interesting.”