Publication quality 2D graphs with less manual effort due to explicit use of dual coordinate systems
© Wagenaar; licensee BioMed Central Ltd. 2014
Received: 29 May 2014
Accepted: 18 September 2014
Published: 21 October 2014
Creating visually pleasing graphs in data visualization programs such as Matlab is surprisingly challenging. One common problem is that the positions and sizes of non-data elements such as textual annotations must typically be specified in either data coordinates or in absolute paper coordinates, whereas it would be more natural to specify them using a combination of these coordinate systems. I propose a framework in which it is easy to express, e.g., “this label should appear 2 mm to the right of the data point at (3, 2)” or “this arrow should point to the datum at (2, 1) and be 5 mm long.” I describe an algorithm for the correct layout of graphs of arbitrary complexity with automatic axis scaling within this framework. An implementation is provided in the form of a complete 2D plotting package that can be used to produce publication-quality graphs from within Matlab or Octave.
Computer programs for the two-dimensional graphical display of numerical data abound (e.g., Gnuplot, Igor Pro, Matlab, Octave). Such programs commonly allow specification of data in arbitrary coordinates, and will automatically make sensible choices for axis ranges and many other aspects of the visualization. Rarely, however, are the results immediately usable for professional publication. Less-than-elegant automatic positioning of text labels is especially common. While manual fine-tuning is possible, this quickly becomes laborious even for relatively simple graphs, and in any case requires great attention to visual detail from users who would probably rather concentrate on their science.
Why is correct positioning of text labels in graphs so challenging? Consider a typical task: placement of a textual annotation by a data point in an x-y graph. The text should appear centered a little below the data point. How much is a little? Probably a millimeter or two, but you cannot tell that to Matlab (or Gnuplot, etc.). Instead, you have to specify the label location in data coordinates. By trial and error, the user may find that in one particular graph the annotation should be placed at (1, 0.95) and in another at (1, 0.6) to appear at the desired distance below the data point at (1, 1). The right value depends on the range of the graph’s axes and on the scale at which the graph is rendered. As an alternative, the graph could be exported and postprocessed in a graphics program such as Inkscape or Illustrator, but producing consistent results this way is difficult, and it becomes laborious when many similar graphs are to be made. Wouldn’t it be convenient to simply be able to specify: “the top of this label should go 1.5 mm below the data point (1, 1)” from within the program that generates the graph in the first place?
A related, but not identical, problem is the determination of appropriate axis ranges when text labels may extend beyond the data range. If the final size of the graph is prespecified, then it is usually necessary to manually shrink the output from Matlab post hoc to make such labels fit. Naturally, that affects font sizes, which then need to be corrected, which typically makes it necessary to move the labels a little bit, and so on.
Here, I describe a program for generating two-dimensional graphs that explicitly acknowledges the relevance of two complementary coordinate systems: the logical coordinates of the data, and the physical coordinates of the output medium. As a result, text annotations can be positioned in the most natural way, and scaling the data axes to make the graph and its annotations fit in a prespecified area can be automated. The program can be used stand-alone, or—more conveniently—from within Matlab or Octave.
Notation and statement of the problem
The location of graphical elements may be written as a 4-tuple (x, y, δX, δY) specifying a point displaced by (δX, δY) paper units from the data point (x, y). This simple formalism is more flexible than it might seem at first glance: it can, for instance, readily handle rotated text, for which (δX, δY) is a function of the rotation angle, the point size, the font, and the text itself, but, importantly, not of the transformation parameters (a, b, c, d). However, this formalism cannot handle situations where the displacement is a function of the transformation parameters, as would be the case, e.g., when rotating text to parallel a curve specified in data coordinates. More on this later.
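To make the notation concrete, the following Python sketch maps such a 4-tuple to paper coordinates. It assumes the linear transformation X = a x + b, Y = c y + d that is implied by the bounding-box expressions used in this article; the names Placement and to_paper are hypothetical and are not part of QPlot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """A location (x, y, dX, dY): a data point plus a paper-unit offset."""
    x: float   # data coordinates
    y: float
    dX: float  # displacement in paper units
    dY: float


def to_paper(p, a, b, c, d):
    """Map a placement to paper coordinates, assuming X = a*x + b, Y = c*y + d."""
    return (a * p.x + b + p.dX, c * p.y + d + p.dY)


# A label anchored 2 paper units to the right of the data point (3, 2).
point = Placement(x=3.0, y=2.0, dX=0.0, dY=0.0)
label = Placement(x=3.0, y=2.0, dX=2.0, dY=0.0)

# Two arbitrary axis scalings (c < 0 assumes paper Y increases downward).
for a, b, c, d in [(10.0, 0.0, -10.0, 50.0), (4.0, 5.0, -4.0, 30.0)]:
    px, py = to_paper(point, a, b, c, d)
    lx, ly = to_paper(label, a, b, c, d)
    # The horizontal distance on paper is the same under both scalings.
    print((px, py), (lx, ly), lx - px)
```

Whatever values (a, b, c, d) take, the label stays the same paper distance from its data point, which is the property the layout algorithm relies on.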
The problem to be solved is to find a set of transformation parameters (a, b, c, d) that makes all graphical elements fit within a predetermined area of size (W, H) on the paper and that causes all of the available space to be filled.
In principle, this problem could be solved directly by linear programming. However, for a practical implementation in software, an alternative approach was found to be preferable, not just because it was easier to program, but also because it naturally circumvented the limitation noted above.
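For illustration only, a direct solution for a single axis can be sketched without a linear-programming library by eliminating b from the constraints pairwise. This is not the formulation used in QPlot. The sketch assumes X = a x + b, a destination interval [0, W], and elements given as hypothetical triples (x, lo, hi) that occupy the paper interval from a x + b + lo to a x + b + hi.

```python
def fit_axis_directly(elements, W):
    """Largest scale a, with its offset b, so that every element fits in [0, W].

    elements: list of (x, lo, hi) triples (an assumed representation).
    Returns (a, b), or None if no positive scale makes everything fit.
    """
    a_max = float("inf")  # tightest upper bound on a found so far
    a_min = 0.0           # tightest lower bound on a found so far
    for xi, lo_i, _ in elements:        # element i presses on the left edge
        for xj, _, hi_j in elements:    # element j presses on the right edge
            # Both fit for some b only if a * (xj - xi) <= W - hi_j + lo_i.
            room = W - hi_j + lo_i
            span = xj - xi
            if span > 0:
                a_max = min(a_max, room / span)
            elif span < 0:
                a_min = max(a_min, room / span)
            elif room < 0:
                return None  # offsets at a single x are wider than W
    if a_max == float("inf") or a_max <= 0 or a_max < a_min:
        return None
    a = a_max
    # Smallest b that keeps every left edge at or right of 0.
    b = max(-lo - a * x for x, lo, _ in elements)
    return a, b


# Data points at x = 0 and x = 10 (zero extent) and a 20-unit-wide label
# that starts 2 paper units to the right of the point at x = 10.
elements = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 2.0, 22.0)]
print(fit_axis_directly(elements, W=100.0))
```

The double loop is quadratic in the number of elements, which is one practical reason to prefer an iterative scheme for graphs with many elements.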
with analogous expressions for the vertical axis. (For brevity, only the expressions for the horizontal axis placement will be given in the following. The expressions for the vertical axis are always directly analogous.)
Next, all graph elements that are specified by a location displaced from given data points are considered. Take, for instance, a text label that measures (w, h) points as rendered and that is to be placed a distance (δX, δY) away from data point (x, y). This label would have a bounding box (X0, Y0) – (X1, Y1), where X0 = a x + b + δX and X1 = a x + b + δX + w.
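As a minimal sketch of this step, the bounding box can be computed as follows. The horizontal expressions are those given above; the vertical ones (Y0 = c y + d + δY, Y1 = Y0 + h) are written by analogy and assume that paper Y increases downward. The function name bounding_box is hypothetical.

```python
def bounding_box(x, y, dX, dY, w, h, a, b, c, d):
    """Paper-coordinate bounding box (X0, Y0, X1, Y1) of a w-by-h element.

    (dX, dY) is the paper offset of the box's top-left corner from the
    data point (x, y); paper Y is assumed to increase downward.
    """
    X0 = a * x + b + dX
    Y0 = c * y + d + dY
    return (X0, Y0, X0 + w, Y0 + h)


# A 20-by-8 label centered horizontally, 1.5 paper units below data
# point (1, 1): dX = -w/2 centers it, dY = 1.5 puts its top below the point.
print(bounding_box(1.0, 1.0, -10.0, 1.5, 20.0, 8.0,
                   a=10.0, b=0.0, c=-10.0, d=50.0))
```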
If the bounding boxes for all graph elements fall within the destination area, the algorithm is done. Otherwise, the transformation matrix needs to be modified to shrink the graph. Define ΔL, ΔR, ΔT, and ΔB to be the amounts by which the union of all bounding boxes protrudes outside the destination area to the left, right, top, and bottom respectively, and let ΔL (ΔR, etc.) be zero if the corresponding edge does not protrude.
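The four protrusions can be computed from the bounding boxes as in the sketch below. It assumes that the destination area is the rectangle from (0, 0) to (W, H) with paper Y increasing downward, and that boxes are (X0, Y0, X1, Y1) tuples; the function name protrusions is hypothetical.

```python
def protrusions(boxes, W, H):
    """Return (dL, dR, dT, dB): how far the union of boxes sticks out.

    boxes: non-empty list of (X0, Y0, X1, Y1) in paper coordinates.
    Each value is zero when the corresponding edge does not protrude.
    """
    left = min(box[0] for box in boxes)
    top = min(box[1] for box in boxes)
    right = max(box[2] for box in boxes)
    bottom = max(box[3] for box in boxes)
    dL = max(0.0, -left)
    dR = max(0.0, right - W)
    dT = max(0.0, -top)
    dB = max(0.0, bottom - H)
    return (dL, dR, dT, dB)


boxes = [(-3.0, 5.0, 40.0, 20.0), (10.0, -2.0, 108.0, 60.0)]
print(protrusions(boxes, W=100.0, H=60.0))
```

The algorithm is done exactly when all four values are zero.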
It should be noted that not all layout challenges have solutions. For instance, if the label in Figure 2 had been “Impossible” rather than “Text,” no amount of shrinking would have allowed it to fit together with the label “(x1,y1)” attached to the rightmost data point. In such instances, a becomes negative after a number of iterations, at which point the algorithm simply has to give up.
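The overall loop, including the give-up condition, might look like the following sketch for the horizontal axis. The update rule used here (rescale so that the two most protruding elements just touch the edges of the destination interval) is an assumption made for illustration and is not taken from the article. Elements are hypothetical (x, lo, hi) triples that occupy the paper interval from a x + b + lo to a x + b + hi, and the data points themselves are assumed to be included as zero-extent elements.

```python
def fit_axis_iteratively(elements, W, max_iter=100, tol=1e-9):
    """Iteratively shrink the axis until every element fits in [0, W].

    Returns (a, b) for X = a*x + b, or None when the sketch gives up.
    """
    xs = [x for x, _, _ in elements]
    x_min, x_max = min(xs), max(xs)
    if x_max <= x_min:
        return None
    # Start with the data range filling the full width.
    a = W / (x_max - x_min)
    b = -a * x_min
    for _ in range(max_iter):
        # Elements whose edges lie furthest left and furthest right.
        left = min(elements, key=lambda e: a * e[0] + b + e[1])
        right = max(elements, key=lambda e: a * e[0] + b + e[2])
        dL = max(0.0, -(a * left[0] + b + left[1]))
        dR = max(0.0, a * right[0] + b + right[2] - W)
        if dL <= tol and dR <= tol:
            return a, b  # everything fits
        span = right[0] - left[0]
        if span <= 0:
            return None  # rescaling cannot separate these two elements
        # Assumed update: make both extreme elements touch the edges.
        a = (W - right[2] + left[1]) / span
        if a <= 0:
            return None  # scale is no longer positive: give up
        b = -left[1] - a * left[0]
    return None


fits = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 2.0, 22.0)]
too_wide = [(0.0, -80.0, 0.0), (10.0, 0.0, 30.0)]
print(fit_axis_iteratively(fits, W=100.0))
print(fit_axis_iteratively(too_wide, W=100.0))
```

In the second example the two labels together need 110 paper units of a 100-unit interval, so the first update already yields a negative scale and the function returns None.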
A layout algorithm by itself is not practically usable, so I implemented the above algorithm in conjunction with a 2D graph plotting package, named QPlot. The core of the program was written in C++ using the Qt library to achieve operating system independence. This program and its user manual are available at http://www.danielwagenaar.net/qplot. The program may be used stand-alone, which is especially useful for automated graph generation. In addition, a library of Matlab/Octave functions may be used to conveniently graph data produced within these popular scientific computation environments with QPlot. (QPlot could not have been implemented directly in Matlab or Octave, because these languages do not permit graphical elements to be accurately measured.)
Results and discussion
A new representation for laying out visualizations of scientific data has been presented that explicitly acknowledges the existence of two complementary sets of coordinates: paper coordinates (measured in millimeters or inches) and data coordinates (measured in arbitrary units). Using these dual coordinate systems, describing the placement of non-data elements in appropriate locations relative to the data becomes much more straightforward. As a result, the placement of text labels, axes, and other elements can be guaranteed (within certain limits) to remain correct irrespective of the scaling of the data axes. This dual representation also enabled the formulation of a layout algorithm that automatically scales a graph to fit the available space while respecting constraints on the placement of text and other elements (Figures 1, 2 and 3).
To ensure that the results of this study are practically usable, a 2D graph plotting package was written that implements the dual representation and the automatic scaling algorithm. This software, QPlot, can be used from within the popular Matlab/Octave environments and is freely available online.
While the representation introduced in this article was described in terms of two-dimensional graphs and the current version of QPlot likewise only produces two-dimensional graphs, extension to three-dimensional data coordinates is in principle straightforward and may be implemented in a future version of the software.
Availability and requirements
Project name: QPlot
Project home page: http://www.danielwagenaar.net/qplot
Project archive: http://www.launchpad.net/qqplot
Operating systems: QPlot has been tested on Linux. A binary version is available for Ubuntu 14.04. QPlot should compile from available sources on MacOS and Windows and wherever Qt and Octave are available.
Programming languages: C++, Matlab
Other requirements: QPlot needs the Qt libraries, version 4.8 or later. QPlot needs either Matlab, version 7 or later, or Octave, version 3.6 or later.
License: GNU General Public License ver. 3+
Any restrictions to use by non-academics: None
DAW is an Assistant Professor at the Department of Biological Sciences at the University of Cincinnati. Previously, he was a Senior Research Fellow at the California Institute of Technology.
DAW is the recipient of a Career Award at the Scientific Interface from the Burroughs Wellcome Fund.
- Williams T, Kelley C, Bröker H-B, Campbell J, Cunningham R, Denholm D, Elber G, Fearick R, Grammes C, Hart L, Hecking L, Koenig T, Kotz D, Kubaitis E, Lang R, Lecomte T, Lehmann A, Mai A, Merritt EA, Mikulík P, Steger C, Tkacik T, Van der Woude, Woo A, Van Zandt JR, Zellner J: Gnuplot. [http://www.gnuplot.info/].
- Wavemetrics: Igor Pro. [http://www.wavemetrics.com/products/igorpro/igorpro.htm].
- Mathworks: Matlab: The language of technical computing. [http://www.mathworks.com/products/matlab/].
- Abbott B, Adler A, Aitkenhead AH, Anderson G, Andersson J, Annamalai M, Appel M, Atzeri M, Ayal S, Banks R, Barrowes B, Barth A, Bateman D, Bauschke H, Bect J, Belov R, Berry K, Billinghurst D, Bindner D, Bogusz J, Borgmann M, Boven P, Bovey R, Bradshaw J, Brinkmann M, Brister M, Bruno R, Buchacher C, Burchard A, Caliari M: GNU Octave. [http://www.gnu.org/software/octave].
- Harrington B, Hurst N, Gould T, Albert M, Andler J, Bah T, Barbry-Blot P, Barraud J-F, Baxter B, Beard J, Bintz J, Biro A, Bishop N, Blocher JL, Böck H, Bohre H, Borgmann D, Bouclet B, Broberg G, Brown C, Breuer H, Brubaker M, Bruno L, Buculei N, Byak B, Caclin P, Caldwell I, Carmichael G, Catmur E, Boldewyn: Inkscape. [http://inkscape.org].
- Adobe Systems Inc: Illustrator. [http://www.adobe.com/products/illustrator.html].
- Dantzig GB: Maximization of a linear function of variables subject to linear inequalities. Activity Analysis of Production and Allocation. Edited by: Koopmans TC. 1951, New York and London: Wiley and Chapman-Hall, 339-347.
- Stroustrup B: The C++ Programming Language, 4th edn. 2013, Upper Saddle River: Addison-Wesley Professional.
- Digia: Qt. [http://qt.digia.com/].
- Doumont J-L: Trees, Maps, and Theorems: Effective Communication for Rational Minds. 2009, Kraainem: Principiae.
- Tufte ER: The Visual Display of Quantitative Information, 2nd edn. 2001, Cheshire: Graphics Press.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.