Captioning a scientific figure is like commenting code

Comments within code are harmless, right? They don’t affect run-time, so you might as well use them whenever there’s any doubt something is unclear.

I hope you aren’t nodding your head, because a liberal use of comments is the wrong approach. Not all types of code comments are evil, but many are rightfully despised by programmers as (i) band-aid solutions to bad code, (ii) redundant, or even (iii) worse than no comment at all.

The same is true for scientific figures and their captions. In fact, many of the rules discussed in the post Best Practices for Writing Code Comments remain valid when we replace comments and code with captions and figures, respectively.

Captions should not duplicate the figure

Here’s what the Best Practices post calls the canonical example of a bad code comment:

i = i + 1;     // Add one to i

Redundant comments like this do “positive bad: if I see comments like that I’ll quit reading them—and miss the helpful ones” – programmer Charles Moore.

I share Moore’s sentiment when reading redundant figure captions. As I’ve said before and I’ll say again, if your figure is a plot of, say, wind speed vs time, then your figure caption should not be anything close to “A plot of wind speed (m/s) vs time (hrs)”. Those details will be labelled on the x and y axes. Or if you’ve got a multi-panel plot, your caption shouldn’t be “Time series data from the first experiment: (a) wind speed, (b) shortwave radiation, (c) current speed, (d) temperature.” All of those details are labels. They do not belong in the caption.

Did you notice the other bit of redundancy in my first example? “A plot of …”. There’s no need for these three words, nor for “A photograph of …”, “A schematic of …”, or anything similar.

Ultimately, your caption should be more insightful than something a fancy AI system could generate if it was shown your figure.
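To make the redundancy rule concrete, here is a minimal sketch of a caption check in Python. Everything in it is an assumption made for illustration: the function name `caption_warnings`, the list of lead-ins, and the axis labels are hypothetical, and a plain substring match is only a rough stand-in for an editor's judgement.

```python
# Hypothetical lead-ins to flag; extend the tuple to suit your own field.
REDUNDANT_LEAD_INS = ("a plot of", "a photograph of", "a schematic of")


def caption_warnings(caption, axis_labels):
    """Return a list of warnings about redundancy in a figure caption."""
    warnings = []
    lowered = caption.lower().strip()

    # Flag openings that only announce what kind of figure this is.
    for lead_in in REDUNDANT_LEAD_INS:
        if lowered.startswith(lead_in):
            warnings.append(f"redundant lead-in: '{lead_in}'")

    # Flag axis labels that are restated word for word in the caption.
    for label in axis_labels:
        if label.lower() in lowered:
            warnings.append(f"restates axis label: '{label}'")

    return warnings


# Assumed axis labels for two made-up figures.
wind_labels = ["wind speed (m/s)", "time (hrs)"]
temp_labels = ["air temperature (deg C)", "time (hrs)"]

bad = "A plot of wind speed (m/s) vs time (hrs)"
good = "Air temperatures peak two hours before sunset"

print(caption_warnings(bad, wind_labels))   # three warnings
print(caption_warnings(good, temp_labels))  # []
```

The first caption trips every check because it only repeats what the axes already say. The second passes because it states a finding instead.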

Good captions do not excuse unclear figures

Don’t misinterpret the previous rule to mean that it’s okay to add to the caption anything that isn’t redundant. In particular, ask yourself if the only reason the details are not redundant is because they’re not already included in the figure itself.

All too often I see captions of the form “Measurements from experiment 1 (black line), experiment 2 (dark blue line), and experiment 3 (light blue line)”. If that information were on the figure as a legend, the caption would no longer be necessary.

The advice “Don’t comment bad code—rewrite it” becomes “Don’t write long captions for bad figures—redesign them”.

Explain unidiomatic figures with captions

One good use for captions is to point out the unusual. In my papers, I’ve used captions to highlight or explain

  • That the vertical scale has been exaggerated
  • That one panel has a different y axis scale compared to the rest
  • Seemingly unexpected references (like why measurements from 1911 are attributed to a paper from 1975)

Captions can also be used to spell out details that would never fit on the figure itself. For example, it’s sometimes necessary to use acronyms on the figure and spell them out in the caption. But don’t overdo it just for completeness. I once had a paper in which, to my displeasure, a copy editor added text to five of the captions to specify that CTD stands for conductivity-temperature-depth. This is common knowledge to anyone who would ever read the paper. Telling an oceanographer what CTD stands for is as unnecessary as telling a microbiologist what DNA is. This kind of clarification is simply a waste of space.

Use captions to mark incomplete implementations

In the same vein as the previous rule, captions can be used to explain

  • Why there are gaps in the dataset
  • That certain panels are omitted
  • Whether any data has been interpolated or extrapolated

And any other realities of scientific research. Again, use your best judgement as to whether these details are helpful or overkill.

Never write a caption until you have to.

Much of the discussion thread for the Best Practices post can be summarised as suggesting that one should approach writing code comments as a last resort. Someone noted that “if you write good code, then comments should be extremely infrequent.”

Imagine if we applied this line of thought to scientific figures: The rule would become “if you make a good figure, then a caption should be unnecessary.” This, of course, would be an extreme approach. But it’s a good thought experiment. Rather than using the caption as an info dump, you’d have to be more thoughtful about layout and labels.

Think about this in the context of designing a website or other interface. If your design includes 5–10 lines of explanatory text (like a lot of captions do), then that’s not good. As every programmer should know, help text is a poor substitute for good user interface design. (By the way, there are many other things scientists could learn from rules of interface design.)

Save your readers some time

Good code is easy to read. Bad code is capable of being understood with time. It’s the same with figure captions and scientific writing in general. Most scientific writing has the goal of conveying intuition. A good paper can let someone learn in just a few short hours something that took years to discover. A bad paper won’t necessarily prevent this, but it might make a few short hours feel like a few long hours.

The best way to quickly convey intuition with regard to figures is to say why they are there, not what they contain. Check out any of the results for a Google search of “should code be commented?” I guarantee that the recurring theme will be programmers advocating for the why, not the what.
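For readers who don't write much code, here is a small Python sketch of the why-versus-what distinction. The function, the constant, and the data logger scenario are all invented for this illustration.

```python
SECONDS_PER_HOUR = 3600  # hypothetical constant for this illustration


def to_hours(seconds):
    # A "what" comment would say: divide seconds by 3600.
    # The reader can already see that from the code.
    #
    # A "why" comment: the logger (assumed here) records elapsed time in
    # seconds, but the figure's time axis is labelled in hours.
    return seconds / SECONDS_PER_HOUR


print(to_hours(7200))  # 2.0
```

The "what" comment is the code-world equivalent of “A plot of wind speed vs time”. The "why" comment tells the reader something they could not work out from the line itself.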

Indeed, to write a good caption, imagine you were describing your figure to someone over the phone. You wouldn’t say “it’s a two-panel plot with a time series of wind speed at the top and a contour plot of temperature below”. Instead, ask yourself why you’ve created a figure, and then tell them that. Examples might be that “Air temperatures peak two hours before sunset” or “Regions of blocked flow develop upstream of the tall obstacles”.

Or, as the old saying goes, don’t miss the forest for the trees. Literally, if you were captioning a figure with trees, then

[Figure: two contrasting examples of how to caption a scientific figure. It is better to explain the high-level overview, not the small details.]

I’m pretty sure a real forest doesn’t actually have so many big tree species all together like this. But if that’s your concern, then you’re, y’know, missing the forest for the trees.

Author: Ken Hughes

Post-doctoral research scientist in physical oceanography
