We are often interested in ratios between two quantities. As an example, let’s use data from a study on the sugar content of soft drinks, where the sugar content declared on the drink label was compared to the actual sugar content measured in the laboratory (Ventura et al. 2010, Obesity – pdf). The paper includes a nice table summarizing their measurements, which I have adapted to produce the plots shown here.
How can we present this data to get the most insight? In my opinion, presenting such data as ratios can obscure useful information; showing scatterplots of the two quantities can make it easier to spot patterns.
Alternative 1: Bar charts of ratios
This is how the authors of the original paper presented their results: as a bar chart of the ratio between total measured sugar and declared sugar content on the drink label. This ratio was expressed as a deviation from 100% (which represents the case when the declared and actual sugar content are the same). To highlight the worst offenders, they arranged the bar chart so that the highest positive deviations (more sugar than they say they have) are at the top, as I have recreated above. This would seem to be a reasonable way to show the results. We want to know whose labels are the most misleading, and the deviation from 100% is a direct measure of this.
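As a rough sketch of the calculation behind such a chart, the deviation from 100% can be computed and sorted as shown here. The drink names and gram values are made-up placeholders, not figures from the paper, and `percent_deviation` is a hypothetical helper name.

```python
def percent_deviation(measured, declared):
    """Deviation of measured from declared sugar, as a percentage of declared."""
    return (measured / declared - 1.0) * 100.0


# Placeholder values (not from the paper): name -> (declared, measured).
drinks = {
    "Drink A": (10.0, 13.0),
    "Drink B": (12.0, 12.0),
    "Drink C": (11.0, 9.9),
}

# Largest positive deviation (more sugar than declared) comes first.
ranked = sorted(
    (
        (name, percent_deviation(measured, declared))
        for name, (declared, measured) in drinks.items()
    ),
    key=lambda item: item[1],
    reverse=True,
)

for name, deviation in ranked:
    print(f"{name}: {deviation:+.1f}%")
```

Note that the ranking depends only on the ratio: the absolute amounts drop out as soon as `percent_deviation` is applied.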
The main drawback with ratios is that it is difficult to indicate uncertainty. What happens when there is some uncertainty or margin of error associated with the measurements? As the denominator value gets smaller, the calculated ratio also becomes more sensitive to small fluctuations in the absolute value of the numerator. Of course we can calculate the error of a derived quantity given the error in its components, but the interpretation is not intuitive, and you don’t know whether the error/uncertainty comes predominantly from one of the two, or whether it is spread more evenly between them.
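To make the sensitivity to the denominator concrete, here is a minimal sketch of standard first-order error propagation for a ratio, assuming the errors in the numerator and denominator are independent. The function name and the numbers are illustrative assumptions, not values from the study.

```python
import math


def ratio_with_uncertainty(num, num_err, den, den_err):
    """Return (ratio, standard error) using first-order error propagation.

    Assumes the errors in numerator and denominator are independent.
    """
    if den == 0:
        raise ZeroDivisionError("ratio is undefined for a zero denominator")
    ratio = num / den
    err = math.sqrt((num_err / den) ** 2 + (num * den_err / den ** 2) ** 2)
    return ratio, err


# Same ratio and same absolute error in the numerator (0.5), different scale.
# The denominator is treated as exact here (den_err = 0), purely for illustration.
large = ratio_with_uncertainty(11.0, 0.5, 10.0, 0.0)
small = ratio_with_uncertainty(1.1, 0.5, 1.0, 0.0)

print(large)
print(small)
```

Both cases give a ratio of about 1.1, but the uncertainty in the ratio is ten times larger in the second case, because the same absolute error is divided by a denominator that is ten times smaller. A single error bar on the ratio also cannot tell the reader which of the two measurements contributed it.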
In the extreme case where the denominator is zero (in this example, a brand of drink that declares zero sugar on its label), you cannot even calculate a valid ratio, because that would be a division by zero!
Finally, a ratio gives no clue about absolute quantities. The Coca Cola sold at McDonald’s may have the most misleading label in terms of sugar content, but if we are using this as a guide to choosing which soda to buy, that does not help much, because its total sugar content may still be lower than that of a more honestly labelled brand.
At least they did not organize their bar chart alphabetically (see below).
Alternative 2: Bar charts of percentages
In the case where we are comparing quantities that sum up to some total, we could also draw a stacked bar chart. Conveniently the same paper has data that can be shown in this way: the amounts of different types of sugars in the various drink brands.
With such a chart, there is no more division by zero. We can also compare more than two values, which we could not do with a ratio. However, there is still no indication of absolute quantities, and it is not easy or intuitive to indicate error bars. For example, for many of the drinks, the sucrose content was not zero, but was below the measurement limit (0.5 g per 100 mL). But how do you show that in a bar chart where each bar must be the same length?
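A small sketch of the normalization behind a 100% stacked bar chart shows why absolute quantities disappear. The compositions are invented placeholders rather than data from the paper, and `percent_shares` is a hypothetical helper.

```python
def percent_shares(amounts):
    """Convert absolute amounts into percentages of their total."""
    total = sum(amounts.values())
    return {name: 100.0 * value / total for name, value in amounts.items()}


# Placeholder compositions in g per 100 mL (not data from the paper).
weak = {"fructose": 3.0, "glucose": 2.0, "sucrose": 0.0}
strong = {"fructose": 6.0, "glucose": 4.0, "sucrose": 0.0}

print(percent_shares(weak))
print(percent_shares(strong))
```

The second drink contains twice as much sugar as the first, yet both produce identical bars. A sucrose value that is merely below the detection limit would also have to be entered as some definite number before normalizing, which hides the uncertainty.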
Alternative 3: Scatter plot of absolute quantities
Scatterplots may not be as pretty as a neatly symmetrical bar chart, but they are more informative and reward exploration. Here I have plotted the measured sugar content (vertical axis) against the sugar content declared on the label (horizontal axis). A 1:1 line shows where the points would fall if all labels were completely accurate. If a drink has more sugar than its label claims, its point falls above the line, whereas points below the line represent drinks that have less sugar than declared.
Error bars can be easily added in both dimensions, there is no problem with division by zero, and we have a feel for the quantities involved because they are no longer hidden in a ratio. The human eye is also good at spotting clusters and other patterns that may pop up in the plot.
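One way such a plot could be put together is sketched below, assuming matplotlib is available. The rows are made-up placeholder values rather than the paper’s measurements, the axis units are an assumption, and both function names are hypothetical.

```python
def side_of_line(declared, measured):
    """Say where a point falls relative to the 1:1 line."""
    if measured > declared:
        return "above"  # more sugar than the label declares
    if measured < declared:
        return "below"  # less sugar than the label declares
    return "on"


def plot_measured_vs_declared(rows, path):
    """Scatter measured against declared sugar, with error bars and a 1:1 line.

    rows: iterable of (name, declared, measured, measured_err) tuples.
    """
    # Imported here so that side_of_line works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # draw to a file, no display required
    import matplotlib.pyplot as plt

    names, declared, measured, errs = zip(*rows)
    fig, ax = plt.subplots()
    ax.errorbar(declared, measured, yerr=errs, fmt="o", capsize=3)
    upper = max(max(declared), max(measured)) * 1.1
    ax.plot([0, upper], [0, upper], linestyle="--", color="grey", label="1:1 line")
    for name, x, y in zip(names, declared, measured):
        ax.annotate(name, (x, y), textcoords="offset points", xytext=(5, 5))
    ax.set_xlabel("Declared sugar (g per 100 mL)")  # units are an assumption
    ax.set_ylabel("Measured sugar (g per 100 mL)")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Placeholder rows (not from the paper): name, declared, measured, error.
rows = [
    ("Drink A", 10.0, 13.0, 0.4),
    ("Drink B", 12.0, 12.0, 0.4),
    ("Drink C", 11.0, 9.9, 0.4),
]

for name, declared, measured, _ in rows:
    print(name, side_of_line(declared, measured))
```

Calling `plot_measured_vs_declared(rows, "sugar.png")` writes the figure to a file. Horizontal error bars could be added in the same call through the `xerr` argument of `errorbar` if the declared values also carry an uncertainty.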
We see a cluster of points above the line, which should correspond to the drink brands at the top of the bar chart in the first figure. To check, we add the drink names to the graph as labels.
As expected, these turn out to be the same soft drink brands we saw previously, the ones that have more sugar than they claim. The plot with labels also shows the main drawback of scatter plots: they can be cluttered and difficult to read. However, a careful choice of plot axes, and manual editing of the chart to make it clearer, should help in most cases.
Having the absolute quantities is useful, and is the main advantage of a scatter plot over charts of ratios or percentages. In the bar chart of deviations, the brand with the greatest negative deviation, i.e. less sugar than it says it has, is Kroger’s Apple Juice Cocktail. The naive reader might think that this is the healthiest drink on offer. Looking at the scatter plot, however, we see that although it is indeed falling below the line, the actual sugar content is on par with Coca Cola and Sprite from fast food outlets which are the “worst” options. So it is probably not so good after all, if your main intention is to cut back on sugar intake! Presenting ratios to obscure absolute values is a common marketing tactic – witness cigarette ads that claim “30% less tar”.
So avoid presenting data as ratios if you can!