Novel writers use an average of 100 clichés for every 100 000 words. Or about one every four pages. That’s what Ben Blatt found by comparing a range of novels against a list of 4000 clichés. How does scientific writing compare?
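The “one every four pages” figure comes from simple division. This assumes a typical page holds about 250 words, which is my assumption; the page length isn't stated here:

```latex
\frac{100\,000\ \text{words}}{100\ \text{clichés}} = 1000\ \text{words per cliché},
\qquad
\frac{1000\ \text{words}}{250\ \text{words per page}} = 4\ \text{pages per cliché}
```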
In one sense, scientific writing avoids clichés. A scientist isn’t going to write that their new results put the nail in the coffin of the outgoing theory, that they were careful to dot their i’s and cross their t’s so as to follow the methods of Jones et al. by the book, that Brown et al.’s finding is a diamond in the rough, or that two possible interpretations are six of one and half a dozen of the other.
In another sense, scientific writing is full of clichés. Our writing often feels like a fill-in-the-blanks exercise: the results of this study show X, these findings are in good agreement with Y, or Z is poorly understood and needs further study. Need more examples? Check out the Manchester Academic Phrasebank, a collection of phrases from the academic literature that are “content neutral and generic in nature.”
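Blatt’s approach of checking texts against a list of clichés can be sketched just as easily for scientific stock phrases. The Python below is a minimal illustration, not Blatt’s method. Its four phrases are only the examples above, `phrases_per_100k` is a hypothetical helper, and words are counted by splitting on whitespace.

```python
import re

# Stock phrases drawn from the examples above (an assumed, very short list).
STOCK_PHRASES = (
    "the results of this study show",
    "in good agreement with",
    "is poorly understood",
    "needs further study",
)


def phrases_per_100k(text: str, phrases: tuple[str, ...] = STOCK_PHRASES) -> float:
    """Return the number of stock-phrase matches per 100,000 words of text."""
    # Lower-case and collapse whitespace so line breaks don't hide a match.
    norm = " ".join(text.lower().split())
    n_words = len(norm.split())
    if n_words == 0:
        return 0.0
    # \b anchors stop a phrase from matching inside a longer word.
    hits = sum(len(re.findall(r"\b" + re.escape(p) + r"\b", norm)) for p in phrases)
    return 100_000 * hits / n_words


abstract = "These findings are in good agreement with Y. Z is poorly understood and needs further study."
print(phrases_per_100k(abstract))  # 3 matches in 16 words -> 18750.0
```

A realistic count would need a far longer list (Blatt’s had 4000 entries), and it would still miss paraphrases.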
“There is this scientific convention of: ‘You put the images on one side, then you put the text to decipher it on the other side.’” That’s Jonathan Corum, science graphics editor for the New York Times, politely critiquing one of the ways in which a typical scientific paper creates unnecessary work for the reader, or “cognitive overhead.”
Decipher is the key word above (and a word I’ll use again below). If deciphering is necessary, it will precede understanding, but that doesn’t mean it is necessary. “No one intends to build a product with large cognitive overhead, but it happens if there isn’t forethought and recognition for it.”
Einstein had it easy as a scientist. His most famous paper had no references, and his work was seldom peer reviewed. In one instance, in 1936, he withdrew a paper submitted to Physical Review on the grounds that he had not authorised it to be shown to a specialist before publication. In another instance, he asserts:
Other authors might have already elucidated part of what I am going to say. […] I felt that I should be permitted to forgo a survey of the literature, […] especially since there is good reason to hope this gap will be filled by other authors.
Einstein, of course, didn’t actually have it easy: being forced to flee his native Germany is the obvious counterexample. And he faced stiff competition in the scientific arena. I mean, have you ever been to a scientific conference at which half of the attendees had won or would go on to win a Nobel Prize?
Feeling like your scientific papers aren’t getting the attention they deserve? Wanna bump up your citation counts for the next decade? Then consider dying young. It apparently helps: a posthumous spike in recognition arises owing to the promotional efforts of colleagues.
This morbid example is but one of many arguments that citations in the scientific literature are not a true meritocracy. Another example: last month I hypothesised that many papers are cited only because they’re new, not because their content is new. It makes me think there’s a better way to rank references.
Scorning citation metrics is a favourite pastime of scientists (up there with scorning p values). The standard argument is that distilling a study’s quality to a single value is simplistic. But what if we double down? What if we focus more on numbers when it comes to citations?
Why are the references in your research so old? That’s feedback I remember receiving on my first bit of true research, my honours dissertation. The examiner wasn’t as blunt as my paraphrasing, but the gist of his comment was memorable enough. At the time, it seemed an odd comment. I now realise that it’s a valid concern.
How many colours do you need to visualise data scored on a five-point scale?
If you went with the obvious answer of five colours, here’s what you get:
The green-and-grey figure wins in two ways. First, it tells a story: about a third of respondents view Wikipedia favourably. (Although the data shown allow other interpretations, a good figure emphasises a single message.) Second, the green-and-grey version just looks better.
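The two-colour approach amounts to collapsing a five-colour palette into “the message” and “everything else”. Below is a minimal Python sketch. The category labels and hex colours are hypothetical, because the survey’s wording and the figure’s exact colours aren’t given here. The resulting mapping could be passed to whichever plotting library you use.

```python
# Hypothetical labels for a five-point scale, ordered from least to most favourable.
SCALE = (
    "Very unfavourable",
    "Somewhat unfavourable",
    "Neutral",
    "Somewhat favourable",
    "Very favourable",
)
GREEN = "#2e8b57"  # assumed highlight colour
GREY = "#bdbdbd"   # assumed colour for everything else


def two_colour_palette(labels, highlight):
    """Map each category to green if it carries the message, grey otherwise."""
    return {label: GREEN if label in highlight else GREY for label in labels}


def highlighted_share(counts, highlight):
    """Fraction of all responses that fall in the highlighted categories."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for label, n in counts.items() if label in highlight) / total


FAVOURABLE = {"Somewhat favourable", "Very favourable"}
palette = two_colour_palette(SCALE, FAVOURABLE)
```

The figure’s story (“about a third”) rests on the highlighted share. It is worth computing that number explicitly so readers don’t have to add up segments to check it.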
“Everything should be made as simple as possible, but no simpler,” said Einstein. Except he didn’t. His version of the quote was four times longer.
I’m not surprised that it took a non-scientist to paraphrase the quote and create the short, popular version. As scientists, we are not accustomed to brevity. We want to provide every detail. We read papers filled with columns of 10-point text. We construct figures with dozens of lines and colours. We fill every last bit of white space when we design posters. And don’t get me started on logos for scientific campaigns (long story short: too many elements, too many colours, and too literal).
We lack minimalism.
You may argue that detail, nuance, and chains of logic—hallmarks of science—are not easily reduced to 280 characters or a sexy soundbite. I don’t disagree. But there are still aspects of minimalism we should embrace.
Every writer leaves a hidden fingerprint in their texts, whether they know it or not. It’s hidden in the relative usage of words: some words appear more often than average and others less often. Imagine there’s a rumour that a well-established author has written a new book under a pen name, but they are pretending that this is not the case. One piece of evidence that the two authors are one and the same is to count mundane words like “and” and “but”, or -ly adverbs, in the new book and then compare the numbers to the author’s past works. Authors use surprisingly similar numbers of each word over the length of a book. Don’t believe me? Then check out Ben Blatt’s book Nabokov’s Favorite Word is Mauve.
The title of this post is a nod to Blatt’s book. In it, he statistically analyses word frequency in a range of texts, from literature to fan fiction to New York Times bestsellers. He uses numbers to teach us about writing. Early on, he shows how using fewer -ly adverbs correlates with a book’s appeal. This is but one of many predictors of a text’s success based only on word frequency. In the same vein, I’m going to scrutinise my own scientific writing to find room for improvement. Navel-gazing? Yes. Will you learn something if you read on? Also yes.
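Counting words this way takes only a few lines. The Python sketch below is an illustration built on several assumptions, not Blatt’s code. It uses a crude regular-expression tokenizer and normalises counts to rates per 10,000 words so that texts of different lengths can be compared. It also treats any word ending in “ly” as an -ly adverb, which wrongly catches words like “only” and “family”. Finally, it scores how far apart two fingerprints are with a simple sum of absolute differences. All function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def rates_per_10k(text: str, words: list[str]) -> dict[str, float]:
    """Occurrences of each chosen word per 10,000 words of text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid dividing by zero on empty text
    return {w: 10_000 * counts[w] / n for w in words}


def ly_rate_per_10k(text: str) -> float:
    """Crude -ly adverb rate: any token ending in "ly" counts."""
    tokens = tokenize(text)
    n = len(tokens) or 1
    return 10_000 * sum(t.endswith("ly") for t in tokens) / n


def fingerprint_distance(a: str, b: str, words: list[str]) -> float:
    """Sum of absolute rate differences; smaller means more similar usage."""
    ra, rb = rates_per_10k(a, words), rates_per_10k(b, words)
    return sum(abs(ra[w] - rb[w]) for w in words)


sample = "I walked slowly and quietly, but the dog ran quickly."
print(ly_rate_per_10k(sample))               # 3 of 10 tokens -> 3000.0
print(rates_per_10k(sample, ["and", "but"]))  # {'and': 1000.0, 'but': 1000.0}
```

For the pen-name scenario, you would compare the new book against each candidate author’s past works and look for the smallest distance. With only a handful of mundane words, that is suggestive evidence, not proof.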
Have you ever noticed the similarities between stock images that convey an increase? More often than not, it’s an arrow initially heading up at about a 30° angle, followed by a downturn, before continuing back up. There’s occasionally a second down-and-up for good measure. It’s sufficiently clichéd that Yale Economics should feel a little embarrassed to have incorporated it into their logo.
In the spirit of an English teacher inferring a lot from a little, I wonder: are the downturns intended to give the progress implied by the ascending arrow some kind of story arc? As in, you have to get knocked down before you can get back up. Or maybe the downturns are there to instil a sense of realism?
Absurd as these rhetorical questions sound, they hint at a surprisingly profound issue about fake data and what makes it look real.
Science is full of abstractions. A line plot is an abstraction. A false-colour image is an abstraction. Even scientific notation like 6.0 × 10²³ is an abstraction. Abstractions like these are central to science, invoked all the time, and easy to understand. Other abstractions are far from simple and may cause more confusion than clarity.
Abstraction is a slippery slope. Things can quickly get out of hand. Let’s start with a simple abstraction like velocity. The units, metres per second or kilometres per hour, tell us what it is: how fast something moves. I imagine most scientists would still be comfortable if I increased the level of abstraction by calculating the acceleration (i.e., moving from m s⁻¹ to m s⁻²). But what if I had instead started talking about a quantity in m² s⁻¹, not m s⁻²? Diffusivity and kinematic viscosity are such quantities; they measure how quickly something spreads. All I did was change m s⁻¹ to m² s⁻¹, and I’ve turned a concept a kid can understand into something that may trip up an undergraduate physicist.
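One way to see how small that step is: treat units as symbols with integer exponents and track them through multiplication and division. The Python sketch below does only that. The `combine` and `fmt` helpers are hypothetical, and the sketch ignores prefixes such as kilo- and any conversion factors. Multiplying a velocity by one more metre is all it takes to land on the units of diffusivity.

```python
def combine(a: dict[str, int], b: dict[str, int], sign: int = 1) -> dict[str, int]:
    """Multiply (sign=1) or divide (sign=-1) two units given as {symbol: exponent}."""
    out = dict(a)
    for symbol, exponent in b.items():
        out[symbol] = out.get(symbol, 0) + sign * exponent
    # Drop symbols whose exponents cancelled out.
    return {s: e for s, e in out.items() if e != 0}


def fmt(unit: dict[str, int]) -> str:
    """Render a unit in the 'm^2 s^-1' style."""
    return " ".join(s if e == 1 else f"{s}^{e}" for s, e in unit.items())


METRE = {"m": 1}
SECOND = {"s": 1}

velocity = combine(METRE, SECOND, sign=-1)         # m s^-1
acceleration = combine(velocity, SECOND, sign=-1)  # m s^-2
diffusivity = combine(velocity, METRE)             # m^2 s^-1

for name, unit in [("velocity", velocity), ("acceleration", acceleration), ("diffusivity", diffusivity)]:
    print(f"{name}: {fmt(unit)}")
```

The bookkeeping is trivial for a computer. The hard part for a reader is attaching a physical picture to m² s⁻¹, which is exactly where the abstraction starts to bite.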