During her rise, but before becoming a poker champion, Maria Konnikova was counselled by her coach that she was winning prize money in too many tournaments.
Wait, why wouldn’t she want to win prize money in every tournament? And what’s that got to do with a post about productivity in science? A loose answer to both questions: nonlinearity.
Maria’s initial goal in tournaments was to survive until enough other players had lost so that she reached the threshold, say the top 15%, to earn prize money. To reach this threshold, she was playing cautiously. Too cautiously, that is, for a realistic shot at the big money that goes to the top-placed finishers. Given how poker and its payouts work, a good player is better served by aiming high and winning a few large prizes (hence incurring many failures) compared to having many small wins.
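To make the nonlinearity concrete, here’s a toy expected-value calculation. The probabilities and payouts are invented for illustration, not real tournament numbers:

```python
# Toy illustration of nonlinear tournament payouts (all numbers invented).
# Cautious play: min-cash 25% of the time for 3x the buy-in.
# Aggressive play: reach a top finish 4% of the time for 30x the buy-in.
buy_in = 100

ev_cautious = 0.25 * 3 * buy_in - buy_in    # many small wins
ev_aggressive = 0.04 * 30 * buy_in - buy_in  # few large wins, many busts

print(ev_cautious)    # negative: frequent cashes, but losing overall
print(ev_aggressive)  # positive: fewer "wins", better expectation
```

The aggressive player cashes far less often yet comes out ahead, because the payout curve is steep at the top.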
Science poses the same conundrum. Instead of poker chips, we’re betting time. You can spend years on a high-risk, high-reward project and, if you’re lucky, you make a big breakthrough. Or you play it safe and produce incremental contributions.
Novel writers use an average of 100 clichés for every 100 000 words. Or about one every four pages. That’s what Ben Blatt found by comparing a range of novels against a list of 4000 clichés. How does scientific writing compare?
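Blatt’s comparison boils down to counting matches against a cliché list and normalising by word count. A minimal sketch (his list runs to 4000 entries; the three below are stand-ins):

```python
import re

# Stand-in cliché list; Blatt's actual list has ~4000 entries.
CLICHES = ["nail in the coffin", "diamond in the rough", "by the book"]

def cliches_per_100k(text: str) -> float:
    """Count cliché occurrences, normalised per 100,000 words."""
    text = text.lower()
    hits = sum(len(re.findall(re.escape(c), text)) for c in CLICHES)
    n_words = len(text.split())
    return 100_000 * hits / n_words

# Ten clichés in ninety words: far above a novelist's average.
sample = "Their reply put the final nail in the coffin. " * 10
print(round(cliches_per_100k(sample)))  # → 11111
```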
In one sense, scientific writing avoids clichés. A scientist isn’t going to write that their new results put the nail in the coffin of the outgoing theory, that they were careful to dot their i’s and cross their t’s so as to follow the methods of Jones et al. by the book, that Brown et al.’s finding is a diamond in the rough, or that two possible interpretations are six of one and half a dozen of the other.
In another sense, scientific writing is full of clichés. Our writing often feels like a fill-in-the-blanks exercise: the results of this study show X, these findings are in good agreement with Y, or Z is poorly understood and needs further study. Need more examples? Check out the Manchester Academic Phrasebank, a collection of phrases from the academic literature that are “content neutral and generic in nature.”
“There is this scientific convention of: ‘You put the images on one side, then you put the text to decipher it on the other side.’” That’s Jonathan Corum, science graphics editor for the New York Times, politely critiquing one of the ways in which a typical scientific paper creates unnecessary work for the reader, or “cognitive overhead.”
Decipher is the key word above (and a word I’ll use again below). If deciphering is required, it must happen before understanding, but that doesn’t mean deciphering is necessary in the first place. “No one intends to build a product with large cognitive overhead, but it happens if there isn’t forethought and recognition for it.”
Einstein had it easy as a scientist. His most famous paper had no references and his work was seldom peer reviewed. In one instance in 1936, he withdrew a paper submitted to Physical Review on the grounds that he had not authorised it to be shown to a specialist before publication. In another instance, he asserted:
Other authors might have already elucidated part of what I am going to say. […] I felt that I should be permitted to forgo a survey of the literature, […] especially since there is good reason to hope this gap will be filled by other authors.
Einstein, of course, didn’t actually have it easy—being forced to flee his native Germany is the obvious counterexample. And he faced stiff competition in the scientific arena. I mean, have you ever been to a scientific conference in which half of the attendees had won or would win a Nobel prize?
Feeling like your scientific papers aren’t getting the attention they deserve? Wanna bump up your citation counts for the next decade? Then consider dying young. It apparently helps: a posthumous spike in recognition arises owing to the promotional efforts of colleagues.
This morbid example is but one of many arguments that citations in the scientific literature are not a true meritocracy. Another example: last month I hypothesised that many papers are cited only because they’re new, not because their content is new. It makes me think there’s a better way to rank references.
Scorning citation metrics is a favourite pastime of scientists (up there with scorning p values). Distilling a study’s quality to a single value is simplistic, so goes the standard argument. But what if we double down? What if we focus more on numbers when it comes to citations?
Why are the references in your research so old? That’s feedback I remember receiving on my first bit of true research, my honours dissertation. The examiner wasn’t as blunt as my paraphrasing, but the gist of his comment was memorable enough. At the time, it seemed an odd comment. I now realise that it’s a valid concern.
How many colours do you need to visualise data scored on a five-point scale?
If you went with the obvious answer of five colours, here’s what you get:
The green and grey figure wins in two ways. First, it tells a story: about a third of respondents view Wikipedia favourably. (Although there are other interpretations of the data shown, a good figure emphasises a single message.) Second, the grey and green version just looks better.
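The two-colour approach is easy to sketch. The following is a hypothetical example in matplotlib, assuming made-up survey shares rather than the actual data behind the figure: everything is grey except the categories that carry the message.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, writes straight to file
import matplotlib.pyplot as plt

# Hypothetical five-point-scale data (invented, not the real survey numbers).
labels = ["Very unfav.", "Unfav.", "Neutral", "Fav.", "Very fav."]
shares = [5, 10, 30, 35, 20]  # percentage of respondents

# Two colours instead of five: grey recedes, green carries the story
# (the favourable responses).
colours = ["0.7", "0.7", "0.7", "seagreen", "seagreen"]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(labels, shares, color=colours)
ax.set_ylabel("Respondents (%)")
for side in ("top", "right"):       # minimal axes: drop the box
    ax.spines[side].set_visible(False)
fig.tight_layout()
fig.savefig("favourability.png")
```

The reader’s eye lands on the green bars first, which is exactly where the message is.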
“Everything should be made as simple as possible, but no simpler,” said Einstein. Except, he didn’t. His version of the quote was four times longer.
I’m not surprised that it took a non-scientist to paraphrase and create the short, popular version. As scientists, we are not accustomed to brevity. We want to provide every detail. We read papers filled with columns of 10pt text. We construct figures with dozens of lines and colours. We spare no bit of white space when we design posters. And don’t get me started on logos for scientific campaigns (long story short: too many elements, too many colours, and too literal).
We lack minimalism.
You may argue that detail, nuance, and chains of logic—hallmarks of science—are not easily reduced to 280 characters or a sexy soundbite. I don’t disagree. But there are still aspects of minimalism we should embrace.
Every writer leaves a hidden fingerprint in their texts, whether they know it or not. It’s hidden in the relative usage of words: some words appear more often than average and other words less. Imagine there’s a rumour that a well-established author has written a new book under a pen name, but they’re pretending that this is not the case. One piece of evidence that the authors are one and the same is to count the number of mundane words, like and and but, or -ly adverbs, used within the new book and then compare the counts to the author’s past works. Authors use surprisingly similar numbers of each word over the length of a book. Don’t believe me? Then check out Ben Blatt’s book Nabokov’s Favorite Word is Mauve.
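The counting itself is simple to sketch. Here’s a minimal, illustrative version, assuming a tiny marker-word list (Blatt’s actual analysis is far more thorough):

```python
from collections import Counter

# Illustrative marker words; a real stylometric study would use a longer,
# carefully chosen list of function words.
MARKERS = ["and", "but", "the", "of", "to"]

def fingerprint(text: str) -> dict:
    """Relative frequency of marker words and -ly-ending words in a text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    fp = {w: counts[w] / total for w in MARKERS}
    # Crude proxy for -ly adverbs: any word ending in "ly".
    fp["-ly"] = sum(1 for w in words if w.endswith("ly")) / total
    return fp

def distance(fp_a: dict, fp_b: dict) -> float:
    """Total absolute difference between two fingerprints."""
    return sum(abs(fp_a[k] - fp_b[k]) for k in fp_a)
```

Two books by the same author should have a small `distance` between their fingerprints; a mismatch is evidence against the pen-name theory.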
The title of this post is a nod to Blatt’s book. In it, he statistically analyses word frequency in a range of texts from literature to fan fiction to New York Times bestsellers. He uses numbers to teach us about writing. Early on, he shows how a reduction in the usage of -ly adverbs correlates with a book’s appeal. This is but one of many predictors of a text’s success based only on word frequency. In the same vein, I’m going to scrutinise my own scientific writing to find room for improvement. Navel-gazing? Yes. Will you learn something if you read on? Also yes.