Category Archives: charts

Shark season

Summer in Australia comes with cicadas, sunburn and, in the media at least, sharks. So far, I have learned that aerial shark patrols are inefficient (or perhaps not) and that the Western Australian government plans to keep swimmers safe by shooting big sharks.

Sharks are compelling objects of fear, right up there with spiders and snakes in the package of special terrors for visitors to Australia. As good hosts, we are quick to reassure: sharks may be the stuff of nightmares and 70s horror movies, but attacks are rare.

But, exactly how rare is death by shark? Over a Boxing Day lunch, I heard an excellent ‘statistic’, designed to reassure a visiting American. Apparently, more people are killed each year in the US by falling vending machines than are killed by sharks around the world. I was skeptical, but had no data to hand. Later, with the help of Google, I discovered that this statistic is 10 years old and the source? Los Angeles life guards. The tale has, however, become taller over time. Originally, vending machine deaths in the US were compared to shark attack fatalities in the US, not the entire world.

While data on vending machine related deaths are hard to come by, subsequent attempts to validate the story concluded that it was plausible, on the basis that there were two vending machine deaths in 2005 in the US but no fatal shark attacks.

Fun though the vending machine line may be, it is not relevant to Australia and, if you are on the beach contemplating a quick dip, your risk of shark attack is certainly higher than your risk of death by vending machine. Local data is in order.

According to the Taronga Zoo Australian Shark Attack File (ASAF):

 In the last 50 years, there have been 50 recorded unprovoked fatalities due to shark attack, which averages one per year.

Fatalities have been higher than average over the last couple of years. The ASAF recorded two deaths in 2012 and, although validated figures for 2013 are yet to be published, six deaths have been reported over the last two years, suggesting that fatalities rose further to four this year.

To compare shark fatalities to other causes of mortality, a common scale is useful. My unit of choice is the micromort. A one-in-a-million chance of death corresponds to a micromort of 1.0, a one-in-ten-million chance of death to a micromort of 0.1. Taking the recent average death rate of three per year (more conservative than the longer run average of one), and a population of 23 million in Australia leads to a figure of 0.13 micromorts for the annual risk of death for a randomly chosen Australian.
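The arithmetic is simple enough to check directly in R:

```r
# Annual shark-attack risk for a randomly chosen Australian, in micromorts
# (1 micromort = a one-in-a-million chance of death)
deaths_per_year <- 3      # recent average number of fatal attacks
population <- 23e6        # approximate Australian population
micromorts <- deaths_per_year / population * 1e6
round(micromorts, 2)      # 0.13
```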

The most recent data on causes of death published by the Australian Bureau of Statistics (ABS) are for 2009. That year, three people were killed by crocodiles. Sharks are not specifically identified, but any fatal shark attacks would be included among the three deaths due to ‘contact with marine animals’. The chart below illustrates the risk of death associated with a number of ‘external causes’. None of these come close to heart disease, cancer or car accidents. Death by shark ranks well below drowning, even drowning in the bath, as well as below a variety of different types of falls, whether from stairs, cliffs or ladders.

Shark barplot

Annual risk of death in Australia (2009 data)*

Of course, you and I are not randomly chosen Australians and our choices change the risks we face. I am far less likely to suffer death by vending machine if I steer clear of the infernal things and I am far less likely to be devoured by a shark if I stay out of the water.

So, care should be taken when interpreting the data in the chart. Drug addicts (or perhaps very serious Hendrix imitators) are far more likely to asphyxiate on their own vomit than summer beach-goers. The fairest point of comparison is drowning in natural waters. At almost 3.5 micromorts, drowning in the sea (or lakes and rivers) is more than 25 times more common than fatal shark attack. And the risk of both can be reduced by swimming between the flags.

What does that leave us with for conversations with foreign visitors? If you are headed to the beach, the risk of shark attack is higher than the risk of death by vending machine, but it is still very low. The drive there (at 34.3 micromorts) is almost certainly more dangerous.

I will be taking comfort from my own analysis as I am heading to Jervis Bay tomorrow and sharks were sighted there this weekend:

Bendigo Bank Aerial Patrol spotted up to 14 sharks between 50 and 100 metres from shore at various beaches in Jervis Bay. [The] crew estimated the sharks at between 2.5 and 3.5 metres in length at Nelsons, Blenheim, Greenfields, Chinaman’s Beach and Hyams Beaches.

The beaches are un-patrolled, so wish me luck…but I don’t think I’ll need it.

* The figure for ‘Shark attack’ is based on the estimate of three deaths per year rather than the ABS data.

ngramr – an R package for Google Ngrams

The recent post How common are common words? made use of unusually explicit language for the Stubborn Mule. As expected, a number of email subscribers reported that the post fell foul of their email filters. Here I will return to the topic of n-grams, while keeping the language cleaner, and describe the R package I developed to generate n-gram charts.

Rather than an explicit language warning, this post carries a technical language warning: regular readers of the blog who are not familiar with the R statistical computing system may want to stop reading now!

The Google Ngram Viewer is a tool for tracking the frequency of words or phrases across the vast collection of scanned texts in Google Books. As an example, the chart below shows the frequency of the words “Marx” and “Freud”. It appears that Marx peaked in popularity in the late 1970s and has been in decline ever since. Freud persisted for a decade longer but has likewise been in decline.

Freud vs Marx ngram chart

The Ngram Viewer will display an n-gram chart, but it does not provide the underlying data for your own analysis. All is not lost, though. The chart is produced using JavaScript, so the n-gram data is buried in the code within the source of the web page. It looks something like this:

// Add column headings, with escaping for JS strings.

data.addColumn('number', 'Year');
data.addColumn('number', 'Marx');
data.addColumn('number', 'Freud');

// Add graph data, without autoescaping.

data.addRows(
[[1900, 2.0528437403299904e-06, 1.2246303970897543e-07],
[1901, 1.9467918036752963e-06, 1.1974195999187031e-07],
...
[2008, 1.1858645848406013e-05, 1.3913611155658145e-05]]
)

With the help of the RJSONIO package, it is easy enough to parse this data into an R dataframe. Here is how I did it:

ngram_parse <- function(html){
  if (any(grepl("No valid ngrams to plot!", html)))
    stop("No valid ngrams.")

  # Pull out the addColumn lines and extract the quoted column names
  cols <- lapply(strsplit(grep("addColumn", html, value=TRUE), ","),
                 getElement, 2)
  cols <- gsub(".*'(.*)'.*", "\\1", cols)
  # ... (parsing of the addRows data continues from here)
}

I realise that is not particularly beautiful, so to make life easier I have bundled everything up neatly into an R package which I have called ngramr, hosted on GitHub.

The core functions are ngram, which queries the Ngram viewer and returns a dataframe of frequencies; ngrami, which does the same thing in a somewhat case-insensitive manner (by which I mean that, for example, the results for "mouse", "Mouse" and "MOUSE" are all combined); and ggram, which retrieves the data and plots the results using ggplot2. All of these functions allow you to specify various options, including the date range and the language corpus (Google can provide results for US English, British English and a number of other languages, including German and Chinese).
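A minimal session might look like the following (a sketch based on the package as described above; exact argument names may vary between versions):

```r
library(ngramr)

# Fetch the frequency data as a dataframe for your own analysis
freqs <- ngram(c("Marx", "Freud"), year_start = 1900)
head(freqs)

# Or fetch and plot in one step via ggplot2
ggram(c("Marx", "Freud"), year_start = 1900)
```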

The package is easy to install from GitHub and I may also post it on CRAN.

I would be very interested in feedback from anyone who tries out this package and will happily consider implementing any suggested enhancements.

UPDATE: ngramr is now available on CRAN, making it much easier to install.

Quandl

I spend a lot of time trawling the internet for data, particularly economic and financial data. Yahoo Finance and Google Finance are handy for market data, and "FRED", the St. Louis Fed database, is an excellent, albeit US-centric, resource for a broad range of financial aggregates. While these sites make it very easy to automate data downloads, most others (including, unfortunately, the Australian Bureau of Statistics) provide data in Excel format or other inconvenient forms. At times this has become sufficiently frustrating that I have periodically entertained vague plans to build my own time-series data web-site, one that would source data from across the world and the web, making it available in a consistent, useful way.

Needless to say, I never got around to it, but it seems that someone else has. Today I stumbled across Quandl, which aggregates and re-publishes over 5 million time-series. The data can be presented as charts on their website, downloaded or accessed programmatically through their application programming interface (API). There is even an R package available to make it easy to load data directly into my favourite statistical package, R.

Here is an example of how it all works. Quandl has data on the Australian All Ordinaries index. To read this data into R, you will first need to register with Quandl and obtain an authentication key for the API. This key is a random string which looks something like jEGfHz9HF7C3zTus6ZuK (this one is not a real key!). Once you have your key, you can fire up R and install and load the R package by entering the following commands:

install.packages("Quandl")
library(Quandl)

Once this is done, you will need to find the Quandl code for the data you are interested in. Near the bottom of the Quandl page, there is a pane showing the data-set information, including the provenance of the data.

Screen Shot 2013-04-20 at 10.54.02 PM

Armed with the text labelled “Quandl Code”, in this case “YAHOO/INDEX_AORD”, you now have everything you need. I will assume you already have the ggplot2 and scales packages installed. To plot the history of the All Ordinaries, simply enter the following code (replacing the string in the third line with your own authentication key).

library(ggplot2)
library(scales)
Quandl.auth("jEGfHz9HF7C3zTus6ZuK")
aord <- Quandl("YAHOO/INDEX_AORD")
ggplot(aord, aes(x=Date, y=Close)) + geom_line() + labs(x="")

All Ordinaries

I can see I am going to have fun with Quandl. It even has Bitcoin price history. But that is a subject for another post.

Problem Pies

Last month the IMF published their latest Global Financial Stability Report. A colleague, who knows I rarely approve of pie charts*, drew my attention to the charts on page 27 of Chapter 3 of the report, which I have reproduced here (click on the image to enlarge). 

Here the authors of the report have decided to attempt some graphical improvisation, taking the pie chart and extending it. Over time some inspired new chart designs have been developed, but these have been rare. More often the result is inferior to using an established technique. While I do not wish to discourage innovation, the results should always be tested before being foisted on an unsuspecting audience.

The aim of this pair of charts is to illustrate the dwindling supply of “safe assets” in the form of highly rated sovereign debt as a result of the global financial crisis. For example, at the end of 2007, 68% of advanced economies boasted a AAA Standard & Poor’s credit rating (left hand chart, outer red arc) but  by January 2012 this proportion had fallen to 52% (left hand chart, inner red sector).

The heart of each chart is a conventional pie chart showing the current distribution of country ratings. Taken in isolation, either one of these would be a reasonable chart. But moving beyond a single pie chart, comparing the Advanced Economies chart to the Emerging Markets chart is not so easy. Edward Tufte’s adage from The Visual Display of Quantitative Information comes to mind: “the only worse design than a pie chart is several of them”. The crime against charting here is made particularly egregious with the choice of a colour scheme for ratings that is not consistent across the two charts!

If that wasn’t bad enough, the design comes right off the rails with the outer charts. These are a form of annular pie chart, but the alignment of each segment is shifted in an attempt to make the pre-crisis figure more readily comparable to the post-crisis figure for each rating. The result is highly confusing: it takes a while to work out exactly what is going on. Messing with the alignment of the outer chart also makes it harder to compare one rating to another. Even the decision to position the 2012 data in the middle and the 2007 data on the outside is a mistake. My eye expects a flow from the centre of the circle outwards rather than from outside in. An informal, if statistically insignificant, survey suggests that I am not the only one with this expectation.

The aim of any data visualisation is to provide easy access to the information. Understanding the IMF report’s chart is just too much work. A simple table of figures would have been easier to understand. But there are also more conventional charts that would do a better job. The chart below is an example of the “small multiples” technique. This involves a grid of similar charts which are readily compared as certain parameters are varied. In this case, scanning the charts horizontally reveals changes through time and vertically the differences between advanced economies and emerging markets.

Sovereign ratings from before the crisis (2007) to now (2012)

Some space could have been saved by restricting the vertical axis to a 0% to 70% range, but with the full 0% to 100% range the proportions for each rating are more readily grasped.

The small multiples chart is a vast improvement on the IMF original, and is a good illustration of the fact that choosing the right chart makes it far easier to visualise the patterns in your data.

* One of the few pie charts I do approve of is this one (I have seen this one in many places, but I am not sure of the original source).

Hans Rosling: data visualisation guru

It is no secret that I am very interested in data visualisation, and yet I have never mentioned the work of Hans Rosling here on the blog. It is an omission I should finally correct, not least to acknowledge those readers who regularly email me links to Rosling’s videos.

Rosling is a doctor with a particular interest in global health and welfare trends. In an effort to broaden awareness of these trends, he founded the non-profit organisation Gapminder, which is described as:

a modern “museum” on the Internet – promoting sustainable global development and achievement of the United Nations Millennium Development Goals

Gapminder provides a rich repository of statistics from a wide range of sources and it was at Gapminder that Rosling’s famous animated bubble charting tool Trendalyzer was developed. I first saw Trendalyzer in action a number of years ago in a presentation Rosling gave at a TED conference. Rosling has continued to update his presentation and there are now seven TED videos available. But the video that Mule readers most often send me is the one below, taken from the BBC documentary “The Joy of Stats”.

If the four minutes of video here have whetted your appetite, the entire hour-long documentary is available on the Gapminder website. You can also take a closer look at Trendalyzer in action at Gapminder World.

Micromorts

Everyone knows hang-gliding is risky. How could throwing yourself off a mountain not be? But then again, driving across town is risky too. In both cases, the risks are in fact very low and assessing and comparing small risks is tricky.

Ronald A. Howard, the pioneer of the field of decision analysis (not the Happy Days star turned director) put it this way:

A problem we continually face in describing risks is how to discuss small probabilities. It appears that many people consider probabilities less than 1 in 100 to be “too small to worry about.” Yet many of life’s serious risks, and medical risks in particular, often fall into this range.

R. A. Howard (1989)

Howard’s solution was to come up with a better scale than percentages for measuring small risks. Shopping for coffee, you would not ask for 0.00025 tons (unless you were naturally irritating); you would ask for 250 grams. In the same way, talking about a 1/125,000 or 0.000008 risk of death associated with a hang-gliding flight is rather awkward. With that in mind, Howard coined the term “microprobability” (μp) to refer to an event with a chance of 1 in 1 million, and a 1 in 1 million chance of death he calls a “micromort” (μmt). We can now describe the risk of hang-gliding as 8 micromorts, and you would have to drive around 3,000 km in a car before accumulating a risk of 8 μmt, which helps compare these two remote risks.
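The conversions in this paragraph are easy to verify:

```r
# Converting small probabilities of death into micromorts
# (1 micromort = a 1-in-1,000,000 chance of death)
p_flight <- 1 / 125000            # risk per hang-gliding flight
p_flight * 1e6                    # 8 micromorts

# If about 3,000 km of driving also accumulates 8 micromorts,
# the implied risk per kilometre driven is tiny
8 / 3000                          # roughly 0.0027 micromorts per km
```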

Before going too far with micromorts, it is worth getting a sense of just how small the probabilities involved really are. Howard observes that the chance of flipping a coin 20 times and getting 20 heads in a row is around 1μp and the chance of being dealt a royal flush in poker is about 1.5μp. In a post about visualising risk I wrote about “risk characterisation theatres” or, for more remote risks, a “risk characterisation stadium”. The lonely little spot in this stadium of 10,000 seats represents a risk of 100μp.

One enthusiastic user of the micromort for comparing remote risks is Professor David Spiegelhalter, a British statistician who holds the professorship of the “Public Understanding of Risk” at the University of Cambridge. He recently gave a public lecture on quantifying uncertainty at the London School of Economics*. The chart below provides a micromort comparison adapted from some of the mortality statistics appearing in Spiegelhalter’s lecture. They are UK figures and some would certainly vary from country to country.

Risk Ranking

Based on these figures, a car trip across town comes in at a mere 0.003μmt (or perhaps 3 “nanomorts”) and so is much less risky, if less fun, than a hang-gliding flight.

It is worth noting that assessing the risk of different modes of travel can be controversial. It is important to be very clear whether comparisons are being made based on risk per annum, risk per unit distance or risk per trip. These different approaches will result in very different figures. For example, for most people plane trips are relatively infrequent (which will make annual risks look better), but the distances travelled are much greater (so the per unit distance risk will look much better than the per trip risk).

Here are two final statistics to round out the context for the micromort unit of measurement: the average risk of premature death (i.e. dying of non-natural causes) in a single day for someone living in a developed nation is about 1μmt and the risk for a British soldier serving in Afghanistan for one day is about 33μmt.

*Thanks to Stephen from the SURF group for bringing this lecture to my attention.

Generate your own Risk Characterization Theatre

In the recent posts Visualizing Smoking Risk and Shades of grey I wrote about the use of “Risk Characterization Theatres” (RCTs) to communicate probabilities. I found the idea in the book The Illusion of Certainty, by Eric Rifkin and Edward Bouwer. Here is how they explain the RCTs:

Most of us are familiar with the crowd in a typical theater as a graphic illustration of a population grouping. It occurred to us that a theater seating chart would be useful for illustrating health benefit and risk information. With a seating capacity of 1,000, our Risk Characterization Theater (RCT) makes it easy to illustrate a number of important values: the number of individuals who would benefit from screening tests, the number of individuals contracting a disease due to a specific cause (e.g., HIV and AIDS), and the merits of published risk factors (e.g., elevated cholesterol, exposure to low levels of environmental contaminants).

As regular readers would know, most of the charts here on the blog are produced using the statistics and graphics tool called R. The RCT graphics were no exception. Writing the code involved painstakingly reproducing Rifkin and Bouwer’s theatre floor plan (as well as a few of my own design, including the stadium). For the benefit of anyone who would like to try generating their own RCTs, I have published the code on github.

RCT (Shaded theatres)

Using the code is straightforward (once you have installed R). Copy the two files plans.Rdata and RCT.R onto your computer. Fire up R and switch to the directory containing the downloaded files. Load the code using the following command:

source("RCT.R")

You will then have a function available called rct which will generate the RCTs. Try the following examples:

rct(18)
rct(18, type="theatre")
rct(18, type="stadium")
rct(c(10, 8, 5))

The rct function has quite a few optional parameters to tweak the appearance of the theatre:

rct(cases, type="square", border="grey", fill=NULL, xlab=NULL, ylab="", lab.cex=1, seed=NULL, label=FALSE, lab.col="grey", draw.plot=TRUE)

  • cases: single number or vector giving the number of seats to shade. If a vector is supplied, the values indicate how many seats of each colour to shade. The sum of this vector gives the total number of seats shaded
  • type: the floor plan to be used. Current options are “square”, “theatre” (the original Rifkin and Bouwer floor plan), “stadium” and “bigsquare”
  • border: the color for the outlines of the floor plan
  • fill: vector of colours for shading seats. If no value is supplied, the default is a sequence of shades of grey
  • xlab: text label to appear below floor plan. Default is “x cases in n”
  • lab.cex: character expansion factor (see ‘par’) to specify size of text labels (if any) on the floor plan
  • seed: specify the starting seed value for the random number generator. Setting this makes it possible to reproduce exactly the same shaded seats on successive calls of rct
  • label: if TRUE, any text labels for the specified floor plan will be displayed
  • lab.col: colour used for any text labels
  • draw.plot: if this is FALSE, the RCT is not drawn and instead a data frame is returned showing the seats that would have been shaded and the colours that would have been used
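Putting a few of these options together, a call might look like this (a sketch only, assuming plans.Rdata and RCT.R have been loaded as described above):

```r
# Shade 12 seats in red and 6 in blue on the original theatre plan,
# with a fixed seed so the same seats are shaded on every run
rct(c(12, 6), type="theatre", fill=c("red", "blue"),
    seed=42, label=TRUE, xlab="18 cases in 1,000")
```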

Risk Characterization Stadium

Shades of grey

The recent post on the risks of smoking looked at Rifkin and Bouwer’s “Risk Characterization Theatre” (RCT), a graphical device for communicating risks. The graphic in that post, which compared mortality rates of smokers and non-smokers taken from the pioneering British doctors smoking study, highlighted both the strengths and weaknesses of RCTs.

The charts certainly illustrate the risks of smoking in a striking way and seem to elicit a far stronger reaction than drier statistical tables or charts. I also suspect that, for many people, the charts succeed in conveying the relative risks more effectively than more traditional approaches. On the other hand, there is no doubt that RCTs are extremely inefficient. The smoking graphic required an awful lot of ink to represent a mere eight data points.

In the comments on the original post, it was suggested that a colour-coding scheme could be used to combine the charts for the different age ranges, reducing the inefficiency while still preserving the immediacy of the theatre graphic. I took that as a challenge, and here is the result. Returning to the Rifkin and Bouwer theatre floor plan, rather than the more prosaic squares, I have coded deaths in different age ranges with shades of grey: the earlier the death, the darker the grey.

RCT (Shaded theatres)

Mortality of doctors born between 1900 and 1930

The risks of smoking still come through clearly in this version of the chart, but the increased efficiency may come at the expense of a potential for confusion.

What do you think?

Visualizing smoking risk

Risk is something many people have a hard time thinking about clearly. Why is that? In his book Risk: The Science and Politics of Fear, subtitled “why we fear the things we shouldn’t–and put ourselves in greater danger”, Dan Gardner surveyed many of the theories that have been used to explain this phenomenon. They range from simple innumeracy to the influence of the media to the psychology of the short-cut “heuristics” (rules of thumb) we all use to make decisions quickly but that can also lead us astray.

In Reckoning With Risk, Gerd Gigerenzer argues that the traditional formulation of probability is particularly unhelpful, making calculations even harder than they should be. Studies have shown that even doctors struggle to handle probabilities correctly when explaining risks associated with illnesses and treatments. Gigerenzer instead proposed expressing risk in terms of “natural frequencies” (e.g. thinking in terms of 8 patients out of 1,000 rather than a 0.8% probability) and tests with general practitioners suggest that this kind of re-framing can be very effective.

The latest book on the subject that I have been reading is The Illusion of Certainty: Health Benefits and Risks by Erik Rifkin and Edward Bouwer. Rifkin and Bouwer are particularly critical of the common practice of reporting medical risks in terms of relative rather than absolute frequencies. When news breaks that a new treatment reduces the risk of dying from condition X  by 33%, should you be excited? That depends. This could mean that (absolute) risk of dying from X is currently 15% and the treatment brings this down to 10%. That would be big news. However, if the death rate from X is currently 3 in 10,000 and the treatment brings this down to 2 in 10,000 then the reduction in (relative) risk is still 33% but the news is far less exciting because the absolute risk of 3 in 10,000 is so much lower.
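A couple of lines of R make the distinction concrete, using the numbers from the example above:

```r
# Relative vs absolute risk reduction
# Scenario 1: the treatment cuts the death rate from 15% to 10%
rel_1 <- (0.15 - 0.10) / 0.15              # relative reduction: about 33%
abs_1 <- 0.15 - 0.10                       # absolute reduction: 5 points

# Scenario 2: the treatment cuts the death rate from 3 to 2 in 10,000
rel_2 <- (3/10000 - 2/10000) / (3/10000)   # still about 33%
abs_2 <- 3/10000 - 2/10000                 # only 1 in 10,000
```

Both treatments can honestly be advertised as cutting the risk by a third, but the absolute benefit in the first scenario is 500 times larger.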

In an effort to facilitate the perception of risk, Rifkin and Bouwer devised an interesting graphical device. They note that it is particularly difficult to conceive and compare small risks, say a few cases in 1,000. In thinking about this problem, they came up with the idea of picturing a theatre with 1,000 seats and representing the cases as occupied seats in that theatre. They call the result a “Risk Characterization Theatre” (RCT). Here is an example to illustrate a 2% risk, or 20 cases in 1,000.

Risk Characterization Theatre

Now data visualization purists would be horrified by this picture. In The Visual Display of Quantitative Information, Edward Tufte argues that a graphic’s “data-ink ratio” should be kept as high as possible, devoting as little ink as possible to anything other than the data, but the RCT uses a lot of ink just to display a single number! Still, I do think that the RCT can be an effective tool and perhaps this can be justified by thinking of it as a way of visualizing numbers rather than data (but maybe that’s a long bow).

Attractive though the theatre layout may be, there is probably no real need for the detail of the aisles, seating sections and labels, so here is a simpler version (again illustrating 20 in 1,000).

Simple Risk Characterization Theatre

To illustrate the use of RCTs, I’ll use one of the case studies from Rifkin and Bouwer’s book: smoking. One of the most significant studies of the health effects of smoking tracked the mortality of almost 35,000 British doctors (a mix of smokers and non-smokers). The study commenced in 1951 and the first results, published in 1954, indicated a significantly higher incidence of lung cancer among smokers. The study ultimately continued until 2001 and the final results were published in the 2004 paper Mortality in relation to smoking: 50 years’ observations on male British doctors.

The data clearly showed that, on average, smokers died earlier than non-smokers. The chart below would be the traditional way of visualizing this effect*.

Smoking Survival Rates

Survival of doctors born between 1900 and 1930

While it may be clear from this chart that being a smoker is riskier than being a non-smoker, thinking in terms of percentage survival rates may not be intuitive for everyone. Here is how the same data would be illustrated using RCTs. Appropriately, the black squares indicate a death (and for those who prefer the original layout, there is also a theatre version).

Smoking RCTs

Mortality of doctors born between 1900 and 1930

This is a rather striking chart. Particularly looking at the theatres for doctors up to 70 and 80 years old, the higher death rate of smokers is stark. However, the charts also highlight the inefficiency of the RCT. This graphic in fact only shows 8 of the 12 data points on the original charts.

So, the Risk Characterization Theatre is an interesting idea that may be a useful tool for helping to make numbers more concrete, but it is unlikely to be added to the arsenal of the serious data analyst.

As a final twist on the RCT, I have also designed a “Risk Characterization Stadium” which could be used to visualize even lower risks. Here is an illustration of 20 cases in 10,000 (0.2%).

Risk Characterization Stadium

* Note that the figures here differ slightly from those in Rifkin and Bouwer’s book. I have used data for doctors born between 1900 and 1930, whereas they refer to the 1900-1909 data but would in fact appear to have used the 1910-1919 data.

Bubbles to Brains

A couple of weeks ago I ranted about a bubble chart which attempted to illustrate trends in CDO issuance by large investment banks. If circles are a bad choice for depicting data, pictures of brains are even worse, but brains are what the BBC News designers settled on when it came to looking at the countries which have been most successful at winning Nobel prizes.

Nobel Brains - bad chart

There is no doubt that the idea to link Nobel prizes to brains is an appealing one, but comparing the relative sizes of these blobs of grey matter is not easy. In fact, it’s hard to avoid simply reading the numbers rather than looking at the graphics, which rather defeats the purpose of charting the data. A simple league table would have done the same job.

This would come as no surprise to William Cleveland, a statistician who took an experimental approach to understanding the effectiveness of different graphing techniques. In Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods, published jointly with Robert McGill in 1984, Cleveland ranked our ability to judge variation in charts in the following order:

  1. Position along a common scale
  2. Positions along nonaligned scales
  3. Length, direction, angle
  4. Area
  5. Volume, curvature
  6. Shading, color saturation

Furthermore, Cleveland’s experiments used circles rather than brains when area perception was tested; I suspect brains would fall somewhere between four (area) and five (volume) on this list. The perception ranking also points to a better choice of graphic: a simple bar chart, which relies on judgement of length rather than area. Better still, since the bars share a common baseline, comparing them in fact requires judgement of position along a common scale, the easiest of the perception tasks.

Nobel Prize Bar Chart

Top 5 Nobel Prize winning countries from 1901

The bar chart is much easier to read, but it may seem a little pedestrian to graphic designers excited by the idea of weaving in a brain image. While I am happy with the simple bar chart, sprucing it up with a background image does not interfere very much with the ease of reading the data. Here is an example, although I am sure those more adept at the use of Photoshop (or Gimp in this case) could come up with something better still.

Nobel Prize Bar Chart with Brain

The BBC post includes two more charts, which also have their shortcomings. The pie chart showing just how few women have won Nobel prizes is a particular waste of space. Certainly it is evident from the chart that women have not been awarded very many prizes, but simply stating in words that “the 41 of 806 prizes that went to women represent a mere 5.1%” does an even better job. Of course, the percentage could be added to the chart, but the necessity of adding a lot of numbers to a chart is a sure sign that the chart is not doing its job very well.