100 things a scientist should know about

Inspired by 250 things an architect should know but 60% less ambitious

  1. Anscombe’s quartet

    Four distinct datasets (x vs y) that produce the same summary statistics (mean, variance, correlation coefficient, and line of best fit)

  2. HSL colour space

    A colour space that defines colours in terms of their Hue (e.g., red or blue), Saturation (vivid to washed out), and Lightness (white to black)

  3. Whitespace

    The area within a design (website, poster, figure, etc) that lacks text, images, or other elements

  4. HARKing

    A questionable approach to research: Hypothesising After the Results are Known

  5. Fermi problems

    Problems in which an answer cannot be estimated outright but is instead derived as the product of more easily estimated quantities (e.g., how many grains of rice are eaten across the world every year?)

  6. Use cases of .png and .jpg images

    The JPG format is optimised for photos, whereas PNGs are for graphs and diagrams

  7. That vs which

    “That” and “which”, although similar, have opposing implications about whether a clause is restrictive or not

  8. McNamara Fallacy

    Basing a decision on only numbers or other objective measures without reference to any qualitative factors

  9. Version control

    A tool for tracking and recording all changes to software and other digital files as they evolve

  10. Serial position effect

    The human tendency to better remember what happened at the start and the end and forget what happened in the middle

  11. Zenodo and Figshare

    Online repositories for datasets, code, and other research output

  12. Your carbon footprint

    A typical person living in a western country will have an annual footprint of 5–20 tonnes CO2

  13. Matthew effect

    Well-known scientists get cited more often than lesser-known ones, leading to a positive feedback loop

  14. Hand-waving solutions

    A metaphor for an answer that might gloss over details, be vague, or rely on many approximations

  15. Logarithmic scales

    Scales that increase geometrically (e.g., 1, 2, 4, 8, 16, …) rather than linearly (2, 4, 6, 8, …)

  16. “Data” is a plural

    It can sound odd, but “data were collected” is correct and “data was collected” is not

  17. Left-branching sentences

    A sentence structure to avoid because the initial words only make sense as the sentence nears its end

  18. Regression to the mean

    A statistical tendency for outliers in an initial experiment to deviate less in a subsequent experiment

  19. ImageMagick

    Software for all manner of image manipulations and conversions that can be run from the command line

  20. Bash shell

    The default command line interface on most Linux systems

  21. Butterworth filters

    A widely used approach for smoothing time-series data

  22. Golden ratio

    The value 1.618…; an aesthetically pleasing aspect ratio for a rectangle among many other claims to fame

  23. Types of map projections

    Flattening the earth to a two-dimensional image can be achieved in numerous ways, each with its own pros and cons

  24. DOI and PMID

    Unique digital identifiers that can point to publications, datasets, software, and more

  25. Edward Tufte

    An early name in data visualisation and author of several books on the topic

  26. Widows and orphans

    A line at the beginning or end of a paragraph that is separated from the rest by a page break

  27. Construction cost of the Large Hadron Collider

    One of the most expensive scientific experiments took ~3 billion Swiss Francs to build (or ~5 billion US dollars back in 2001)

  28. Resolution of an electron microscope

    Electron microscopes can resolve objects as small as 0.1 nanometers

  29. Adding epicycles

    Tweaking a fundamentally flawed theory in a last-ditch effort to make it explain observations

  30. Why governments fund basic scientific research

    Among many reasons, basic scientific research (i) lowers the barrier for firms that want to develop new products and (ii) develops skilled scientists and engineers who can capitalise on research undertaken elsewhere

  31. William Shockley’s thoughts on productivity

    Shockley speculated that a small number of scientists can be exponentially more productive in total because the creation of a scientific paper is the combination of many individual tasks, and productivity in each of these tasks multiplies together to give overall productivity.

  32. Kerning

    Adjusting the spacing between individual letters in text to improve aesthetics

  33. How last authorship varies across fields

    Depending on scientific field, the last author either did the least work, is the group leader, obtained funding for the project, or has a surname near the end of the alphabet

  34. Project Jupyter

    An open source project that simplifies and promotes interactive use of many programming languages

  35. Uncertainty propagation

    The uncertainty of a derived quantity (e.g., kinetic energy derived from speed and mass) can be calculated from the uncertainty of the input quantities following simple (though sometimes tedious) arithmetic

  36. SSH (Secure Shell)

    The standard way to access a remote server via the command line

  37. Difference between a hyphen and a minus sign

    Although similar, they should not be confused; a hyphen (-) is a short dash used to combine words, whereas a minus sign is longer (−)

  38. Optimal number of characters per line

    A line of text should have 60–70 characters (counting spaces) for a single-column layout and 40–50 for multiple columns (see page 32 of Detail in Typography)

  39. The .eps file type

    A predecessor to PDF that was developed in the late 1980s and is almost obsolete

  40. Stroke and fill

    For line drawings, the edge is known as the stroke and the interior is known as the fill

  41. The Greek alphabet

    The order doesn’t matter, but knowing the individual letters is worthwhile

  42. Text anti-aliasing

    The smoothing of text to improve its appearance (especially relevant at coarse resolution)

  43. Triptychs

    A three-panel image or collection of images (and an easy way to create an attractive title slide)

  44. Active voice

    Better than passive voice in most cases

  45. Fast Fourier transform

    An algorithm that makes much of modern technology possible

  46. Pseudoscience

    Statements and methods purportedly grounded in science but obviously flawed

  47. ORCID

    A unique digital identifier for a researcher that is linked with their scholarly works

  48. Transistors

    Your phone likely has billions of them

  49. RAM

    A computer’s short-term memory in a sense (distinct from the long-term memory that is the hard drive)

  50. Effective cost of a night’s worth of observation from a large telescope

    About $50 000

  51. Functions in programming

    One of the building blocks of any programming language that, typically, (i) takes one or more inputs, (ii) does whatever to those inputs, and (iii) returns an output

  52. Argument from authority fallacy

    The incorrect assumption that a claim is true because it is coming from an authority figure

  53. Preregistered studies

    Studies in which the methodology and hypothesis are published before data are obtained

  54. Strawman argument

    Misrepresenting a claim or changing its context so as to make it easier to argue against

  55. The amount of freely available satellite data

    NASA, for example, currently has about 30 active earth-observing satellites producing about 30 TB of data each day

  56. For loops

    The simplest way in most programming languages to make a computer do something again and again

  57. Illusion of explanatory depth

    Most people are overconfident in their understanding of a complex phenomenon or procedure until they try to explain it step by step

  58. How regression works

    Calculating a line of best fit is one of those things everyone should do manually at least once to understand the procedure that can otherwise be a black box

  59. Bayesian statistics

    An approach to statistics in which probabilities are continually updated as new information is obtained

  60. System 1 and 2 thinking

    Two distinct ways of thinking: system 1 is fast and driven by intuition and emotion, whereas system 2 is slower and more deliberate

  61. Pasteur’s quadrant

    Use-inspired basic research, or the view that basic and applied research aren’t mutually exclusive

  62. Sayre’s law

    In any dispute the intensity of feeling is inversely proportional to the value of the issues at stake

  63. Planning fallacy

    The tendency to underestimate the time needed to complete a task (e.g., writing a scientific paper) even with prior experience in the same or similar tasks

  64. Floating point numbers

    The system used by computers that allows a small number of bits (a zero or one) to represent a wide range of numbers (e.g., 64 bits can be used to closely approximate any number, positive or negative, up to 1.8×10^308)

  65. The Dobzhansky Template

    A format coined by scientist-turned-filmmaker Randy Olson that aims to drill down to the essence of an idea: Nothing in ___ makes sense except in light of ___ (e.g., nothing in biology makes sense except in light of evolution)

  66. Newman design squiggle

    A visual metaphor for the design process that works equally well for the process of doing science

  67. Gestalt Laws

    Design laws, grounded in psychology, for how humans perceive combinations of objects or elements

  68. Einstellung effect

    An inefficient problem-solving habit in which you rely on approaches that worked in the past despite there being better methods

  69. Bike-shedding

    Also known as the Law of Triviality, bike-shedding is placing undue emphasis on minor matters such as the design of bike sheds to be included within the development of a nuclear power plant

  70. Simpson’s Paradox

    Subsets of a dataset, all of which have a negative statistical trend, can still produce a positive trend in the overall dataset

  71. Parkinson’s law

    Work expands to fill the time available for its completion

  72. Epistemic trespassing

    When an expert in a given field trespasses into another and makes claims where they lack expertise

  73. Decline effect

    The strength or effect size of a scientific result tends to decline over successive replications

  74. Base rate neglect

    Misjudging the probability of an event due to more intuitive individuating information (e.g., thinking it’s more likely than not that someone who is 6-foot-8 plays basketball professionally, except that the chances are a fraction of 1%)

  75. Identifiable victim effect

    The desire to assist a specific individual facing a certain hardship but not a large, unknown group of people facing the same hardship

  76. John Ioannidis

    A somewhat controversial physician/scientist perhaps best known for his claim that most published research findings are false

  77. Texas sharpshooter fallacy

    Deriving incorrect conclusions by overly focusing on clusters of data points that may have arisen by chance

  78. Survivorship bias

    A type of selection bias in which the dataset contains only people who made it past some hurdle

  79. BANs: big-ass numbers

    One of the simplest ways to visualise data in which, in place of graphs, a few select metrics are displayed as numbers in large text

  80. Anchoring bias

    The tendency (and salesperson’s boon) for people to focus on relative changes from an initial value rather than the absolute amount

  81. The difference between science and engineering

    Scientists aim to generate new knowledge and engineers aim to apply knowledge to solve real-world problems

  82. Researcher degrees of freedom

    A measure of the flexibility a scientist has in developing, analysing, and publishing an experiment

  83. Banking to 45°

    As a rule of thumb, the aspect ratio of a line graph should be one in which the changes to be emphasised have a slope of ~45°

  84. The linear model of innovation

    The conjecture that basic research informs applied research, which promotes development and production, which ultimately lead to economic growth

  85. Daryl Bem’s precognition paper

    An infamous study—that passed peer-review—that purportedly shows that people can essentially see briefly into the future

  86. Starting with the cake

    A teaching philosophy that starts with the big picture rather than tedious fundamentals

  87. arXiv

    One of the original preprint servers (now 30 years old)

  88. Principle of least astonishment

    A guideline that encourages a design (say, an interface or piece of software) to be built to behave in a way that most users expect it to

  89. Germanic vs Latinate words

    Words with a Germanic heritage tend to be simpler and less pretentious than those from Latin

  90. Altmetrics

    A type of citation measure that counts mentions in blogs, tweets, and other social media rather than standard citations in scientific papers

  91. scite.ai

    A (now expensive) AI service that summarises the different ways a paper is cited (supported, contrasted, or mentioned) rather than merely counting the number of citations

  92. Root cause analysis

    A problem solving technique that looks to solve the underlying issue rather than the immediate (and possibly superficial) problem

  93. Complex vs complicated

    Something that is complicated may involve a tedious number of straightforward steps, whereas something that is complex may have multiple nonlinear interactions and emergent behaviour

  94. WEIRD subjects

    People from Western, Educated, Industrialized, Rich, and Democratic societies who are over-represented in scientific studies involving human subjects

  95. Inkscape

    A vector graphics editor that is more than sufficient for a scientist’s needs

  96. Donald Knuth

    A computer scientist notable for, among many things, the creation of the TeX typesetting language and his decision to forgo email as of Jan 1, 1990

  97. Oblique, isometric, and one- and two-point perspective

    Four standard ways to project a three-dimensional object into two dimensions

  98. The second law of thermodynamics

    Entropy of an isolated system cannot decrease or, more simply, heat flows from hot to cold

  99. The rate of sea level rise

    The current global average is about 4 mm/yr, but this varies regionally depending on the vertical movement of land

  100. The Oxford comma

    The comma placed before “and” or “or” in a list of three or more items
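Item 58 suggests calculating a line of best fit by hand at least once. As a rough companion to that, here is a minimal sketch of ordinary least squares written out step by step in plain Python. The function name `fit_line` and the sample points are made up for illustration; this is one common way to do the calculation, not the only one.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")

    # Step 1: the means of x and y
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 2: how x and y vary together, and how x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    # Step 3: the slope is the ratio of the two; the line passes through the means
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative (made-up) points that lie exactly on y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```

Doing the three steps with pencil and paper on four or five points is usually enough to demystify what a plotting package does behind the scenes.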
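Item 35 mentions kinetic energy derived from speed and mass as an example of uncertainty propagation. A minimal sketch of that arithmetic follows, assuming the two uncertainties are independent and small enough for the usual first-order formula to apply. The function name and the input values are hypothetical, chosen only to show the steps.

```python
import math


def kinetic_energy_with_uncertainty(mass, mass_err, speed, speed_err):
    """Return (E, sigma_E) for E = 0.5 * m * v**2.

    Assumes independent, small uncertainties, so that relative
    uncertainties add in quadrature. Speed is squared in the formula,
    so its relative uncertainty counts twice.
    """
    energy = 0.5 * mass * speed ** 2
    rel_err = math.sqrt((mass_err / mass) ** 2 + (2 * speed_err / speed) ** 2)
    return energy, energy * rel_err


# Made-up inputs: 2.0 +/- 0.1 kg moving at 3.0 +/- 0.15 m/s
energy, energy_err = kinetic_energy_with_uncertainty(2.0, 0.1, 3.0, 0.15)
print(f"E = {energy:.1f} +/- {energy_err:.1f} J")  # E = 9.0 +/- 1.0 J
```

Both inputs here have a 5% uncertainty, yet the speed contributes four times as much to the variance of the result because of the squared term.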
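Item 64 describes how 64 bits can closely approximate a huge range of numbers. The short sketch below pokes at both halves of that claim, the range and the approximation, using Python's built-in floats. It assumes Python floats are IEEE 754 double precision, which holds on essentially all current platforms.

```python
import math
import sys

# Largest finite 64-bit float: roughly 1.8 x 10^308
largest = sys.float_info.max

# Only 53 bits carry significant digits (about 15 to 17 decimal digits)
significant_bits = sys.float_info.mant_dig

# "Closely approximate" is not "exactly represent": 0.1, 0.2, and 0.3
# have no exact binary form, so the sum is off by a tiny amount
sum_is_exact = (0.1 + 0.2 == 0.3)

# Going past the largest value overflows to infinity
overflow = largest * 2

print(largest)                 # 1.7976931348623157e+308
print(significant_bits)        # 53
print(sum_is_exact)            # False
print(math.isinf(overflow))    # True

# When comparing floats, test for closeness rather than equality
print(math.isclose(0.1 + 0.2, 0.3))  # True
```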

Unintentional entertainment in scientific writing

Save for the occasional pun in the title, scientific papers seldom contain intentional humour. But there’s entertainment to be had if you have the right mindset. Let me show you.

Relatability can be the basis of a good laugh. And as a scientist who routinely uses time series data, I can relate to the struggle of unwanted gaps in a dataset. So I was entertained when I came across the following sentence:

No data are available for 1991 and 1992 because the volcanic eruption of Mt Pinatubo in 1991 contaminated the signal. (ref)

Why, exactly, am I entertained, you ask? Partly, it’s the notion of a very expensive satellite being thwarted by a bit of ash. More so, it’s that the sentence is the epitome of scientific writing. A freakin’ volcanic eruption messes up two years’ worth of data, and yet it’s described in the same matter-of-fact tone as the other technical details like the satellite’s pixel resolution. Good luck finding any other types of writers who recount a long-lived effect of a natural disaster in a single sentence.

True scientific news doesn’t make the news

Journalism and science have very different time scales. A week-old newspaper is barely worth reading. A week-old scientific paper is still warm from the photocopier. Journalism neglects this discrepancy and pressures science to hurry up. Numerous headlines with the eye-roll-inducing opening “A new study shows” imply that only the newest (or weirdest) science is worthy of attention. Google declares about 2000 times as many results for that phrase compared to “an old study shows”.

To make sense of the differences between science and its representation in the media, it first helps to figure out what does and doesn’t make news, and what does and doesn’t get widely shared or discussed. With that established, we’re better positioned to overcome the challenges of communicating all scientific research, not just the sexy aspects that make good headlines.

Ready, fire, aim: the myth of the scientific method

Scientific literature is like social media: its content disproportionately comprises successes and achievements. Just as social media seldom features mundane necessities like trips to the supermarket, scientific papers seldom feature abandoned experiments or fruitless pursuits. In fact, we generally work backwards from the results and conclusions when writing these papers. We start with the answer and present only the relevant methods. To a non-scientist, this may sound dishonest and deceptive, but it’s not. It’s for the reader’s benefit: a linear narrative is much easier to follow than the actual story with its many tangents, setbacks, and realisations.

Anyone exposed to the process of scientific research quickly learns that it seldom follows the so-called scientific method, those dispassionate experiments meant to objectively test hypotheses. Science is better described as Ready, Fire, Aim. Yes, in that order. This phrase, borrowed from Neil Gershenfeld, concisely captures how science is an iterative procedure without a specific target. Put in some groundwork to figure out the general direction (Ready), but take what might otherwise be a shot in the dark (Fire), then spend time making sense of what you hit (Aim). You may well fail and hit nothing, but as Gershenfeld elaborates, you can’t hit anything unexpected by aiming first.

If the result confirms the hypothesis, then you’ve made a measurement. If the result is contrary to the hypothesis, then you’ve made a discovery – Enrico Fermi

Benefits of avoiding replication studies

The benefits of replication studies in science seem obvious and intuitive. Yet they are neither particularly prevalent nor encouraged. The typical reasoning is that there’s no value in being the second scientist or group to observe a result. Some¹ take this to suggest that the current scientific publishing system is flawed and promotes papers with provocative results rather than technically sound methods. Journals like PLOS One that disregard perceived importance are the exception. There are, however, a number of advantages of the status quo.

Scientific rationale: convenient little white lies?

“Physics is like sex: sure, it may give some practical results, but that’s not why we do it,” quipped Richard Feynman. The oceanographer Curtis Ebbesmeyer¹ provides a similar, albeit less memorable, quote when describing his early work on water slabs (aka snarks), which had relevance to both military and pollution issues: “such practical matters did not interest me. I found snarks fascinating, even beautiful in their own right.” The introductions to many scientific papers, however, are framed in terms of practical results. Hence the rhetorical question implied in the title: are the rationales we as scientists publish convenient little white lies, simply a way to validate undertaking the science that we find personally interesting and intrinsically satisfying?

Benefits of an e-reader for academics

E-readers are no good for reading scientific papers.¹ They’re grayscale, they’re too small, and flipping back and forth between pages takes time. That said, my e-reader has two key benefits for me as a scientist/academic. It provides a truly offline method to read content later and it lets me read books that are only available as PDFs.

An example of how a given blog post looks on my e-reader

This is the best decade to be a grad student

Catching up on the literature is a daunting aspect of graduate studies. As a physical oceanographer, I regularly cite work from 30 to 40 years ago. In that time, and all the way back to the turn of the 20th century, the scientists before me got to answer all the low-hanging-fruit problems and write the papers that will be cited thousands of times. They leave behind the messy, complex, and esoteric questions for the current grad students. Surely, then, I would think the 60s or 70s or even earlier would have been the best time to be a grad student?

Article titles are more important than your name

A webpage or CV with a list of publications serves two purposes. A useful one: to help readers discover papers related to one that interested them. And a less altruistic one: to say ‘Hey, look at how many publications I have’. These days, the latter is somewhat necessary, but shouldn’t overshadow the former. Furthermore, discovering related papers should be an easy task, but too often isn’t.

Too many publication lists that I come across these days obscure the title—surely the most important part of the citation—by bracketing it with the authors’ names and journal details. While this form is necessary for a reference list in a paper, it makes no sense in a CV or personal webpage.
