Following on from my post on Visualizing the Hottest 100, I noticed that the UK’s Guardian newspaper has published a list of 1000 songs to hear before you die*. The list was assembled from nominations posted by readers. Even before looking at the list, I suspected that the demographic profile of the Guardian’s readers may be a little different to that of Triple J’s listeners. A look at the distribution of year of release in the two lists bears that out.
Statistic | Hottest 100 | Guardian 1000 |
Minimum | 1965 | 1916 |
1st quartile | 1984 | 1968 |
Median | 1994 | 1977 |
3rd quartile | 1997 | 1988 |
Maximum | 2008 | 2008 |
Year of Release “Five Number” Statistics
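For anyone who wants to reproduce this kind of table, here is a minimal sketch of a five-number summary in Python. The years in `sample_years` are made up for illustration and are not the real list data, and the function name `five_number_summary` is my own. Quartile conventions differ between tools, so another package may give slightly different quartiles for the same data.

```python
from statistics import quantiles


def five_number_summary(years):
    """Return (minimum, 1st quartile, median, 3rd quartile, maximum)."""
    ordered = sorted(years)
    # method="inclusive" treats the data as the whole population, so the
    # quartiles always lie within the observed range of years.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical release years, for illustration only.
sample_years = [1965, 1971, 1984, 1991, 1994, 1995, 1997, 2003, 2008]
print(five_number_summary(sample_years))
```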
In fact, fully 14% of the tracks in the Guardian’s list were released before the earliest track in the Hottest 100. Interestingly, that track was Bob Dylan’s “Like A Rolling Stone”, which also features in the Guardian’s list.
While the 1000 songs are not presented in any particular rank order, they are grouped by “theme”. The themes are heartbreak, life and death, love, party songs, people and places, politics and protest and, of course, sex. This allows us to investigate the evolution over time of these different themes.
The chart below is a “box and whisker plot”, also known more prosaically as a “box plot”. It provides a graphical representation of the distribution over songs in each theme by year of release. The box shows the “interquartile range”, from the 1st quartile to the 3rd quartile. This means that half the songs fall inside the box, while a quarter were released in earlier years and a quarter in later years. The solid band shows the median year, which is the year right in the middle of the distribution. The light grey line shows the average year of release. Since most of the distributions are skewed to the right (later years) within the interquartile range [see UPDATE below], the mean is a bit higher than the median. The “whiskers” on the plot extend no more than 1.5 times the width of the box beyond each end of the box. Any outliers beyond the whiskers are shown as points.
Distribution of Year of Release
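The whisker rule described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `box_plot_stats`, the inclusive quartile method and the sample years are all hypothetical, and plotting packages may use slightly different quartile definitions.

```python
from statistics import quantiles


def box_plot_stats(years, whisker=1.5):
    """Compute the box, whisker ends and outliers for a box plot."""
    ordered = sorted(years)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1  # the width of the box
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [y for y in ordered if low_fence <= y <= high_fence]
    outliers = [y for y in ordered if y < low_fence or y > high_fence]
    return {
        "box": (q1, median, q3),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Hypothetical release years with one very early song.
sample_years = [1916, 1965, 1971, 1984, 1991, 1994, 1995, 1997, 2003, 2008]
print(box_plot_stats(sample_years))
```

With this made-up sample, the 1916 song falls below the lower fence and would be drawn as a point rather than being covered by the whisker.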
So what can be made of these distributions? It looks as though love songs are not as popular as they once were and people and places have fared worse still. But while love may be old-fashioned, sex and party songs have become more prevalent and there is still plenty of heartbreak.
And what of the most popular artists? The three most successful artists in Triple J’s Hottest 100 were Nirvana, Jeff Buckley and Radiohead. Nirvana and Radiohead managed one song each in the Guardian’s list: “Lithium” and “Paranoid Android” respectively (both in the life and death theme). Jeff did not make the list, although his father Tim did, with the song “On Top”. The artist with the most entries in the Guardian’s list was Bob Dylan, and the top 12 features a few who did not make it into the Hottest 100 at all, including Randy Newman, Frank Sinatra and The Kinks.
Artist | Entries |
Bob Dylan | 24 |
The Beatles | 19 |
David Bowie | 9 |
Randy Newman | 8 |
The Rolling Stones | 8 |
Elvis Presley | 6 |
Frank Sinatra | 6 |
Madonna | 6 |
Marvin Gaye | 6 |
Prince | 6 |
The Beach Boys | 6 |
The Kinks | 6 |
It’s hard to read much more than that into these numbers but, importantly, it gave me the opportunity to use a box and whisker plot, which this blog has been sorely lacking.
UPDATE: As Mark has commented, this is a bit of a dodgy explanation. There is only so much that can be deduced about a distribution from a box and whisker plot (appealing though they may be). This histogram shows the distribution of the year of release for life and death songs.
Life and Death Theme Histogram
Mark also pointed out that the box and whisker plot does not really show the relative popularity of the different themes over time. I haven’t used pie charts yet, but I am not a fan, so I have come up with a mosaic plot instead.
This confirms the decline in popularity of the love theme, but suggests that, while sex boomed in the 1990s, it has lost ground again in the 21st century. Heartbreak and party songs are the most popular themes of the current decade. The chart also shows that there are more songs in the list from the 60s and 70s than from the 90s, again a departure from the Hottest 100.
I have added this chart to the Guardian Datastore photo pool on flickr.
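The numbers behind a mosaic plot are just a cross-tabulation: each decade gets a column whose width is its share of all songs, and each column is split by the share of each theme within that decade. Here is a rough sketch in Python, with a hypothetical function name (`mosaic_table`) and a made-up handful of songs rather than the real list.

```python
from collections import Counter, defaultdict


def mosaic_table(songs):
    """Map decade -> (share of all songs, {theme: share within decade}).

    `songs` is an iterable of (year, theme) pairs.
    """
    counts = defaultdict(Counter)
    for year, theme in songs:
        counts[year // 10 * 10][theme] += 1
    total = sum(sum(c.values()) for c in counts.values())
    table = {}
    for decade in sorted(counts):
        decade_total = sum(counts[decade].values())
        shares = {t: n / decade_total for t, n in counts[decade].items()}
        table[decade] = (decade_total / total, shares)
    return table


# Hypothetical (year, theme) pairs, for illustration only.
sample = [(1967, "love"), (1968, "love"), (1969, "heartbreak"), (1994, "sex")]
for decade, (width, shares) in mosaic_table(sample).items():
    print(decade, round(width, 2), shares)
```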
* To be precise, there are only 988 different songs in the list (and six are duplicated, each appearing in two different categories).
More nice work from The Mule, but there’s something very strange about the skewness depicted here. If the average is above the median, it usually means more outliers are to the upside (outliers contribute heavily to the average but not the median). But this isn’t observed in the box plots, nor is it expected given the hard upper bound of 22 years above the median and no firm lower bound with observations 51 years below the median. Perhaps we need to see a histogram to understand what is going on.
The other point I would make is that we shouldn’t conclude anything about the popularity of different themes at different times from this data. The changes in frequency of themes could have other causes, such as changes in music styles that affect how memorable songs of different themes are today. The data tell us how much relative value people ascribe today to music of a given theme from a given point in the past. To put it more concretely: perhaps party songs were more popular in the 60s and 70s than love songs, but the music style of the day has rendered the love songs more durable, while the 80s saw the reverse.
In a similar vein, we can’t conclude that “sex and party songs have become more prevalent” unless we also know the total populations of the various themes. If Love songs make up 950 of the 1000, then the prevalence of valued Love songs from the 80s is still higher than that of party songs. Would a pie chart of thematic composition complete the graphical zoo?
Excellent points Mark. The position of the means reflects the skew to the right of the interquartile range. I will update the post to reflect this and histograms are on the way. As for pie charts, I can’t bring myself to do that. Pie charts are evil.
I agree that pie charts are (mostly) evil. A better approach would be a histogram of all songs with colours within each bar showing decomposition according to theme. This would give a good visual representation of the relative prevalence of different themes, both overall and in each period.
While the averages seem to indicate “skew to the right”, I don’t understand how that happens — it is, at least for me, highly unexpected.
Mark: I’ve added a couple of additional charts that should make the distribution a bit clearer. What is happening is that the skew to early years is due to a very small number of songs. Stripping them out, the skew is actually to the more recent years.
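To illustrate how a handful of very early songs can flip the sign of the gap between mean and median, here is a small Python sketch. The years are invented for the purpose and the helper name `mean_minus_median` is my own, so treat it as a toy example rather than a reconstruction of the real data.

```python
from statistics import mean, median


def mean_minus_median(years):
    """Positive when the mean sits above the median, negative when below."""
    return mean(years) - median(years)


# Hypothetical years: two very early songs plus a tail of recent ones.
early = [1920, 1925]
rest = [1970, 1971, 1972, 1973, 1975, 1980, 1990, 2000, 2008]

print(mean_minus_median(early + rest))  # negative: early songs drag the mean down
print(mean_minus_median(rest))          # positive once they are stripped out
```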
I just realised that there are a few songs that are repeated, albeit under different themes:
“Brother Can You Spare a Dime?” by Bing Crosby (people and places, politics and protest)
“Short People” by Randy Newman (politics and protest, party songs)
“I Want You” by Elvis Costello and the Attractions (heartbreak, sex)
“My Generation” by The Who (politics and protest, party songs)
“Smalltown Boy” by Bronski Beat (people and places, politics and protest)
“The Village Green Preservation Society” by The Kinks (people and places, politics and protest)
Strictly speaking, this means that Randy Newman and The Kinks should drop down a notch in the top 12 ranking.
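A simple way to avoid this double counting is to tally each artist over unique (title, artist) pairs rather than over rows. The sketch below uses three of the repeated songs listed above as sample rows; the function name `entries_per_artist` and the row layout are my own assumptions about how the data might be stored.

```python
from collections import Counter


def entries_per_artist(rows):
    """Count songs per artist, ignoring repeats across themes.

    `rows` is an iterable of (title, artist, theme) tuples.
    """
    unique_songs = {(title, artist) for title, artist, _theme in rows}
    return Counter(artist for _title, artist in unique_songs)


# Rows taken from the repeated songs noted above.
rows = [
    ("Short People", "Randy Newman", "politics and protest"),
    ("Short People", "Randy Newman", "party songs"),
    ("My Generation", "The Who", "politics and protest"),
    ("My Generation", "The Who", "party songs"),
    ("Smalltown Boy", "Bronski Beat", "people and places"),
    ("Smalltown Boy", "Bronski Beat", "politics and protest"),
]
print(entries_per_artist(rows))
```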
Yo yo yo yo yo. Just don’t get too cray-zee here. There’s the faintest whiff of the error that you warned against in your Hottest one hundy post. I know you haven’t actually made it in your brainbox, but that whiff is there in the text, yo.
Said error: This doesn’t show that – say – party songs boomed in the 70s and 80s. That’s poppycock. I’ve heard plenty of songs from the 20s-30s about jumpin’ joints and drinking blueberry wine etc. What it does show is that UK-ers ‘remember’ the ‘more important’ songs (i.e. people and places) and so vote for them accordingly. So, yes, maybe 20s-50s party songs are underrepresented in this list, but that doesn’t show they were so at the time.
Also: why the goddamn-it do 40-sumthin’ UK-ers have such terrible taste in music?
Michael Michael: your point is well made. Also, given that the Hottest 100 was determined by votes and the Guardian’s list was based on nominations that were then curated by the Guardian, perhaps a case could be made that the Hottest 100 reflects popular songs (among the listeners of Triple J) while the Guardian’s list more simply reflects what is memorable. A long bow perhaps, but that is the spirit of this post after all.
That mosaic chart, erm, rocks — brilliant.
Another tenuous interpretation: The Guardian list appears to have a dip in frequency during the 1990s, while this is the most popular decade in the Hottest 100 list; so perhaps the two audiences are complementary. The alternative-rock-loving Gen Y has turned its back on the newspaper, leaving a Hottest-100-shaped hole in the Guardian list.
Ooh, just noticed some ‘chartjunk’ on your mosaic plot, yo! The bottom right ‘stubbornmule.net’ is TOTALLY blowing out your data:ink ratio. Yo!
Mule’s note: I don’t usually let spam through, but this comment was sufficiently amusing that I’m letting it through (stripped of contact details):
Hi
I Am Sanjay india No-1 love guru
What you have any Problem in your love
plz… call me….962884xxxx , 993604xxxx
Email- sanju.xxxx@gmail.com
your love guru
sanjay kumar
love the mosaic plot! Much prettier than the ones I’ve used in JMP as well!
Notched boxplots are a bit of a “best of both worlds” (short of bootstrap anyway), although the literature is a bit vague. You can also prod the boxplots to have the box width proportional to sample size, for even greater data density.