The image below is from an Apple keynote delivered by Steve Jobs. At this point in the keynote Jobs is talking about the iPhone and presents a graph of the market share of different smartphone manufacturers in the US. Take a minute to look at the chart. The blue slice representing RIM is obviously the biggest in this chart (that alone tells an interesting story!), but which is the second largest segment?
I show this image in data visualization courses all of the time, and pretty reliably people choose the green segment, representing Apple, rather than the purple one, representing other manufacturers. The reason is that this chart is designed, perhaps carefully, in a way that misleads. In Apple’s defence, they did include labels showing the percentage of market share on this chart (we have removed them above).
The 3D effect in the chart is created by tilting the chart away from the screen, and this causes the green Apple segment to appear larger than it should for two reasons. First, because it is closer to the screen it appears bigger than the purple segment, which is further away. Second, and more importantly, the side of the 3D pie chart is visible in green at the front of the image, which makes us perceive the area of the green segment to be larger than it really is. These two things taken together make most people see the green Apple segment as the second biggest in this chart. The image below shows a flat 2D version of the same pie chart in which it is clear (even without the labels) that the purple segment is the second biggest.
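To put rough numbers on the second effect, here is a minimal sketch that estimates how much screen area each slice of a tilted pie occupies once the visible side wall is counted. Everything in it is an assumption made for illustration: the function name `apparent_shares`, the tilt and thickness values, the slice names, and the shares themselves (they are made up, not the figures from the keynote). It also assumes an orthographic projection, so it deliberately ignores the first effect (nearer slices looking bigger).

```python
import math


def apparent_shares(shares, start_deg=0.0, tilt_deg=60.0, thickness=0.25):
    """Estimate each slice's share of the on-screen area of a tilted 3D pie.

    shares: dict of slice name -> fraction of the whole, laid out
    counterclockwise from start_deg (0 degrees is 3 o'clock).
    The pie has radius 1; thickness is relative to that radius.
    Assumes an orthographic projection (no perspective).
    """
    tilt = math.radians(tilt_deg)
    squash = math.cos(tilt)                    # vertical squash of the top face
    wall_height = thickness * math.sin(tilt)   # on-screen height of the side wall

    areas = {}
    angle = math.radians(start_deg) % (2 * math.pi)
    for name, share in shares.items():
        a, b = angle, angle + share * 2 * math.pi
        # Top face: sector area (radius 1) times the squash factor.
        top = 0.5 * (b - a) * squash
        # Side wall: only visible along the front half of the rim, i.e.
        # angles in [pi, 2*pi] (and the same interval one full turn later).
        wall = 0.0
        for lo, hi in ((math.pi, 2 * math.pi), (3 * math.pi, 4 * math.pi)):
            s, e = max(a, lo), min(b, hi)
            if s < e:
                # Wall area = on-screen wall height * horizontal width of the arc.
                wall += wall_height * (math.cos(e) - math.cos(s))
        areas[name] = top + wall
        angle = b

    total = sum(areas.values())
    return {name: area / total for name, area in areas.items()}


# Hypothetical shares: "back" is truly larger than "front",
# but "front" sits at the front of the tilted pie.
example = {"back": 0.22, "large": 0.40, "front": 0.19, "rest": 0.19}
for name, seen in apparent_shares(example).items():
    print(f"{name}: true {example[name]:.0%}, on-screen {seen:.0%}")
```

With these made-up numbers the "front" slice ends up covering more of the screen than the "back" slice, even though its true share is smaller. Setting `tilt_deg=0` removes both the squash and the wall and gives back the true shares.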
This is a nice example of someone breaking one of Edward Tufte’s tenets of good data visualization:
“show all of the data and only the data”
We should not add things to a chart that do not contribute to the display of data in the chart. The 3D effect in the pie chart is a perfect example of this. It adds nothing to the data content of the chart; in fact, it masks that content.
In his book The Visual Display of Quantitative Information, Tufte captures this idea nicely in his data-ink ratio. The ink used to show data in a chart is divided by the total amount of ink used to print the chart to give a measure of its focus on data alone:
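Written out as a formula, the definition described above is roughly:

```latex
\[
\text{data-ink ratio} = \frac{\text{data ink}}{\text{total ink used to print the chart}}
\]
```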
The data-ink ratio should be as close to 1 as possible; that is, all ink should be data ink (we can easily substitute pixels for ink for charts displayed on screen). To see an example of this in action, consider the (purposely pretty horrible) chart below based on the 2014 Irish local elections.
There is an awful lot of non-data ink in this chart. Through a series of simple steps we can remove the non-data ink to increase the data-ink ratio.
By removing the background colours, legend, bar colours, gridlines, vertical axis, and border we move the data-ink ratio much closer to 1 in the final chart below. This contains all of the data from the original chart, but none of the superfluity.
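For readers who want to try the same clean-up themselves, here is a minimal sketch using matplotlib. It is not the code behind the charts in this post: the function name `minimal_bar_chart`, the grey bar colour, and the choice to write each value above its bar (so that dropping the vertical axis loses no data) are assumptions, and the labels and values in the usage lines are placeholders rather than the election results.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt


def minimal_bar_chart(labels, values):
    """Draw a bar chart with as little non-data ink as possible."""
    fig, ax = plt.subplots()

    # Plain white background for both the figure and the plotting area.
    fig.patch.set_facecolor("white")
    ax.set_facecolor("white")

    # One neutral colour for every bar; no legend is created at all.
    bars = ax.bar(labels, values, color="0.4")

    # No gridlines, no vertical axis, no border around the plot.
    ax.grid(False)
    ax.yaxis.set_visible(False)
    for spine in ax.spines.values():
        spine.set_visible(False)

    # Keep the category labels but hide the tick marks beneath them.
    ax.tick_params(axis="x", length=0)

    # Write each value directly above its bar, replacing the vertical axis.
    for bar, value in zip(bars, values):
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
                str(value), ha="center", va="bottom")

    return fig, ax


# Placeholder categories and values, for illustration only.
fig, ax = minimal_bar_chart(["Party A", "Party B", "Party C"], [3, 5, 2])
fig.savefig("minimal_bar_chart.png")
```

Anyone who finds the result too bare can comment out individual steps, for example restoring the vertical axis, and stop wherever the chart reads best.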
Some would argue that a chart like this final one is a little too minimalist, and they would stop one or two steps back in our process. I believe that there are some grounds for that argument, but the overall point remains: we should show all of the data and just the data.