Jupyter Notebooks are gone from my scientific workflow

TL;DR: I’ve just learned that the text editor Sublime Text can display images within Markdown files. Gone therefore is my need to use Jupyter Notebooks.


I was never a true convert to Jupyter Notebooks. I used them for several years, and saw their appeal, but they just didn’t quite feel right to me.

Most complaints against Notebooks are technical ones: they’re awkward to version control, they’re hard to debug, and they promote poor programming practices. But these issues are tangential to my complaints against Notebooks, which are less concrete:

  • I’m always scrolling. It’s inefficient.
  • I don’t want to do work in a browser. Maybe it’s a weak reason, but I like keeping my scientific and programming tools separate from the browser.
  • Editing and navigating Notebooks feels clumsy. Maybe it’s a lack of practice, but I’d rather leverage the time I’ve invested in learning and setting up my text editor than spend time learning a bunch of new shortcuts specific to Notebooks.

Line graphs: the best and worst way to visualise data

Line graphs are the Swiss army knives of data visualisation. They can be almost anything… which is both good and bad.

Line graphs are slow to interpret

Many graphs serve one clear purpose. Take the five graphs below:

Even without labels, it’s clear what role each of these graphs serves:

  • Pie chart—components of a total
  • Thermometer—progress toward a goal amount
  • Speedometer—percentage of the largest possible value
  • Histogram—distribution of values
  • Box plot—statistical summaries of several datasets

In other words, if I’m presented with one of the graphs above, I have an immediate head start on interpreting it. If, instead, I’m presented with a line graph, I’m forced to read the axis labels and limits first.

Deciphering text is the slowest way to take in information. Shape is fastest, then colour, and only then text. This so-called Sequence of Cognition, popularised by Alina Wheeler, is something marketers need to know about.


A better way to code up scientific figures

I typically write 100–200 lines of code each time I develop a scientific figure that is destined for publication. This is a dangerous length because it’s easy to create a functioning mess. With shorter code fragments, it’s feasible to start over from scratch, and with thousands of lines of code, it makes sense to invest time upfront to organise and plan. But in between these extremes lurks the temptation to write a script that feels coherent at the time, but just creates problems for future you.

Let’s say you want to create a moderately complicated figure like this:

A script for this figure could be envisaged as a series of sequential steps:

  1. Read data in from a csv file
  2. Remove any flagged data
  3. Create four subplots
  4. Plot the first line of data against time
  5. Label the y axis
  6. Set the y axis limit
  7. Repeat steps 4–6 for the second and third lines of data
  8. Add the coloured contours and grey contour lines
  9. Label the time axis
  10. Add various annotations
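
As a rough sketch only (not the actual script behind the figure), the sequential steps above might look like this in Python with matplotlib. The inline csv text, the column names `a`, `b`, `c`, and `flag`, and the synthetic contour field are all placeholders I have made up for illustration.

```python
import csv
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for the csv file: time, three series, and a quality flag
CSV_TEXT = """time,a,b,c,flag
0,1.0,10.0,0.1,0
1,1.5,11.0,0.2,0
2,9.9,12.0,0.3,1
3,2.0,13.0,0.4,0
4,2.5,14.0,0.5,0
"""

# 1. Read data in from a csv file
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# 2. Remove any flagged data
rows = [r for r in rows if r["flag"] == "0"]
time = np.array([float(r["time"]) for r in rows])

# 3. Create four subplots
fig, axs = plt.subplots(4, 1, sharex=True, figsize=(6, 8))

# 4-7. Plot each line against time, label its y axis, set its y limit
for ax, name in zip(axs[:3], ["a", "b", "c"]):
    values = np.array([float(r[name]) for r in rows])
    ax.plot(time, values)
    ax.set_ylabel(name)
    ax.set_ylim(0, 1.2 * values.max())

# 8. Add coloured contours and grey contour lines (synthetic 2D field)
depth = np.linspace(0, 1, 5)
field = np.outer(depth, time)
axs[3].contourf(time, depth, field)
axs[3].contour(time, depth, field, colors="grey")

# 9. Label the time axis
axs[3].set_xlabel("Time")

# 10. Add various annotations
axs[0].annotate("flagged point removed", xy=(2.5, 1.5))
```

Even at this toy size, every step shares the same top-level variables, which is how a script of 100–200 lines can end up as a functioning mess.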

Captioning a scientific figure is like commenting code

Comments within code are harmless, right? They don’t affect run-time, so you might as well use them whenever there’s any chance something is unclear.

I hope you aren’t nodding your head, because a liberal use of comments is the wrong approach. Not all types of code comments are evil, but many are rightfully despised by programmers as (i) band-aid solutions to bad code, (ii) redundant, or even (iii) worse than no comment at all.

The same is true for scientific figures and their captions. In fact, many of the rules discussed in the post Best Practices for Writing Code Comments remain valid when we replace comments and code with captions and figures, respectively.


Transit map–style scientific figures

A good map is geographically accurate and to scale, right?

Not always. Transit maps are one exception. They are intentionally distorted in order to be information dense, yet clean, spacious, and organised.

Many of the design decisions that go into a transit map also apply to scientific figures. There’s a lot for us scientists to learn from a careful look at transit maps.

A typical transit map: Singapore

Singapore’s official transit map.

Grey and one other colour: a foolproof colour scheme for scientists

How many colours do you need to visualise data scored on a five-point scale?

Two.

bar_charts_grey
Data from here.

If you went with the obvious answer of five colours, here’s what you get:

bar_charts_colors

The green and grey figure wins in two ways. First, it tells a story: about a third of respondents view Wikipedia favourably. (Although there are other interpretations of the data shown, a good figure emphasises a single message.) Second, the grey and green version just looks better.
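
One possible way to build such a colour list in Python is sketched below. The helper name `grey_and_one`, the category labels, and the default colours are my own placeholders, not taken from the figures above.

```python
def grey_and_one(categories, highlight, accent="tab:green", grey="0.7"):
    """Return one colour per category: accent if highlighted, grey otherwise."""
    return [accent if c in highlight else grey for c in categories]


# Hypothetical five-point scale; the labels are placeholders
scale = [
    "Very unfavourable",
    "Unfavourable",
    "Neutral",
    "Favourable",
    "Very favourable",
]

# Emphasise only the categories that carry the message
colours = grey_and_one(scale, highlight={"Favourable", "Very favourable"})
```

The resulting list can be passed to the `color` argument of matplotlib’s `bar`, which accepts one colour per bar. The strings `"0.7"` and `"tab:green"` are matplotlib colour specifications for a light grey and a green.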


Embracing minimalism when presenting science

“Everything should be made as simple as possible, but no simpler,” said Einstein. Except, he didn’t. His version of the quote was four times longer.

I’m not surprised that it took a non-scientist to paraphrase and create the short, popular version. As scientists, we are not accustomed to brevity. We want to provide every detail. We read papers filled with columns of 10pt text. We construct figures with dozens of lines and colours. We spare no bit of white space when we design posters. And don’t get me started on logos for scientific campaigns (long story short: too many elements, too many colours, and too literal).

We lack minimalism.

You may argue that detail, nuance, and chains of logic—hallmarks of science—are not easily reduced to 280 characters or a sexy soundbite. I don’t disagree. But there are still aspects of minimalism we should embrace.


Don’t start paragraphs with “Figure n shows …”

“This article is going to describe …” would be a terrible opening for this article. It’s six words that convey nothing. You already know this is an article, and you already know that it’s going to describe something. We don’t see this, fortunately, because the importance of a strong and compelling opening sentence is well recognised. At the paragraph level, however, it’s easy to forget the importance of the first sentence. In scientific cases, a symptom of poor or lazy writing is opening a paragraph with “Figure n shows”.

When it comes to visualising your data, the most important question to ask yourself is “what’s your point?” Wording a paragraph by starting with “Figure n shows” will not convey the point. It tells me what you did, but not why I should care. Using this phrase would be like putting the Methods section of a scientific paper before the Introduction.


Scientific software skills make up for minimal artistic talent

A computer is a better artist than I am. If I can tell it what to draw, it will produce attractive results. To make a nice schematic, the hardest part is to tell the computer what I want to draw. Fortunately for us so-called left-brain types prevalent throughout the sciences, a familiarity with scientific software can overcome a lack of artistic talent, allow rapid iteration of a design, and even provide creative inspiration.

Invoking my scientific software skills, I am able to produce elegant figures:

shear_rotation_schematic
From Evolution of the velocity structure in the diurnal warm layer

Now, compare that with my initial sketch…

shear_schematic_sketch
