I typically write 100–200 lines of code each time I develop a scientific figure that is destined for publication. This is a dangerous length because it’s easy to create a functioning mess. With shorter code fragments, it’s feasible to start over from scratch, and with thousands of lines of code, it makes sense to invest time upfront to organise and plan. But between these extremes lurks the temptation to write a script that feels coherent at the time but only creates problems for future you.
Let’s say you want to create a moderately complicated figure like this:
A script for this figure could be envisaged as a series of sequential steps:
1. Read data in from a CSV file
2. Remove any flagged data
3. Create four subplots
4. Plot the first line of data against time
5. Label the y axis
6. Set the y axis limit
7. Repeat steps 4–6 for the second and third lines of data
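One way to keep these steps from turning into a functioning mess is to give each step its own small function and to replace the copy-and-paste "repeat steps 4–6" with a loop over per-panel settings. The sketch below is an illustration under stated assumptions, not the script behind the figure: it uses matplotlib, and the CSV layout (a `time` column, three data columns, and a `flag` column where non-zero means bad data), the column names, the axis labels and the y limits are all hypothetical placeholders. The fourth subplot is left empty because the steps above don't say what goes in it.

```python
import csv
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data standing in for the CSV file: time, three variables,
# and a quality flag (non-zero = flagged). Swap in open("your_file.csv").
CSV_TEXT = """time,temperature,salinity,pressure,flag
0,10.1,34.5,100.0,0
1,10.3,34.6,100.5,0
2,99.9,34.7,101.0,1
3,10.6,34.8,101.5,0
"""

# Per-panel settings gathered in one place: (column, y label, y limits).
# Labels and limits are placeholders chosen for the dummy data above.
PANELS = [
    ("temperature", "Temperature (°C)", (9, 12)),
    ("salinity", "Salinity", (34, 35)),
    ("pressure", "Pressure (dbar)", (99, 103)),
]


def read_data(source):
    """Step 1: read CSV rows into a list of dicts with float values."""
    reader = csv.DictReader(source)
    return [{key: float(value) for key, value in row.items()} for row in reader]


def remove_flagged(rows):
    """Step 2: keep only rows whose flag is zero."""
    return [row for row in rows if row["flag"] == 0]


def make_figure(rows):
    """Steps 3-7: create four stacked subplots and fill the first three."""
    fig, axes = plt.subplots(4, 1, sharex=True, figsize=(6, 8))
    time = [row["time"] for row in rows]
    # zip stops after the three PANELS entries, leaving the fourth axis empty
    for ax, (column, label, ylim) in zip(axes, PANELS):
        ax.plot(time, [row[column] for row in rows])  # step 4: data vs time
        ax.set_ylabel(label)                          # step 5: y label
        ax.set_ylim(*ylim)                            # step 6: y limits
    return fig, axes


rows = remove_flagged(read_data(io.StringIO(CSV_TEXT)))
fig, axes = make_figure(rows)
# fig.savefig("figure.png", dpi=300)  # uncomment to write the figure to disk
```

Adding a fourth line of data then means adding one entry to `PANELS` rather than copying another block of plotting code, which keeps the script's length from growing with every tweak.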
In a previous post, I listed a range of Matlab’s idiosyncrasies and flaws that seemed so much more blatant once I returned to it after several years of using Python. This post is a continuation, but this time it highlights ways in which Python makes life simpler rather than ways in which Matlab makes life more difficult.
If you haven’t tried Python and you’re on the fence about whether it’s worth learning, let the points below convince you.
Keeping track of scripts used to generate figures is difficult. Before realising that Jupyter Notebooks could solve most of my problems, I would end up with directories containing dozens of scripts whose filenames had varying levels of ambiguity. Names that probably meant something to me at the time but are hardly descriptive months or years later. Names like ISW_plume_plots.m, new_ISW_model_plots.m, and plot_model_behaviour.m. A certain PhD comic springs to mind.
Regardless of whether it’s Python, R, Julia, Matlab, or pretty much any other language, Jupyter Notebooks solve the problem. For example, I use a single notebook to archive the code for all figures in a paper and, more importantly, I can associate each set of code with the figure it generates. Rather than trying to remember what file I want, I need only remember which figure I want. (I say archive because I much prefer to do the bulk of my exploratory analysis in an editor. Alternatively, JupyterLab may work better for you.)
Catching up on the literature is a daunting aspect of graduate studies. As a physical oceanographer, I regularly cite work from 30 to 40 years ago. In that time, and all the way back to the turn of the 20th century, the scientists before me got to answer all the low-hanging-fruit problems and write the papers that will be cited thousands of times. They leave behind the messy, complex, and esoteric questions for the current grad students. Surely, then, the 1960s, the 1970s, or even earlier would have been the best time to be a grad student?
Scientists should invest time in a good text editor: pay the upfront cost of learning and customising a single editor for all of your text needs. This may be obvious to programmers, but less so to scientists, who may not yet recognise the benefits of a good editor.
Much scientific analysis and documentation can be done with plain text files (e.g., .py, .m, .f, .r, .tex, or .md). The default approach to working with multiple file types is to use multiple IDEs (Integrated Development Environments): Matlab for m-files, Spyder or IPython notebooks for Python scripts, TeXstudio or TeXnicCenter for LaTeX files, RStudio for R, or one of the countless Markdown editors currently available.
Using a single editor has many benefits over using the separate editor built into each IDE: