My recent randomness post hinged on people’s expectations of how long a run of heads or tails you can expect to see in a series of coin tosses. In the post, I suggested that people tend to underestimate the length of runs, but what does the fox maths say? The exploration of the numbers in this post draws on the excellent 1991 paper “The Longest Run of Heads” by Mark Schilling, which would be a good starting point for further reading for the mathematically inclined.
When I ran the experiment with the kids, I asked them to try to simulate 100 coin tosses, writing down a sequence of heads and tails. Their longest run was 5 heads, but on average, for 100 tosses, the length of the longest run (which can be either heads or tails) is 7. Not surprisingly, this figure increases for a longer sequence of coin tosses. What might be a bit more surprising is how slowly the length of the longest run grows: just to bump the average length up from 7 to 8, the number of tosses has to double from 100 to 200. It turns out that the average length of the longest run grows approximately logarithmically with the total number of tosses. This formula gives a pretty decent approximation of the expected length:
average length of longest run in n tosses ≃ log₂ n + 1/3
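The larger the value of n, the better the approximation: once n reaches 20, the error falls below 0.1%. Because log₂ n increases by exactly 1 each time n doubles, the formula also explains why going from 100 to 200 tosses only adds 1 to the average.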
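The approximation can be checked against exact values. The sketch below is my own and is not the method used in the post or in Schilling's paper. Assuming fair, independent tosses, it computes the probability that no run exceeds k with a small dynamic programme over the length of the current run. It then sums the tail probabilities P(longest > k) to get the expected longest run. The function names `prob_longest_at_most` and `expected_longest` are hypothetical.

```python
import math


def prob_longest_at_most(n, k):
    """P(longest run of heads or tails in n fair tosses is <= k)."""
    if k < 1:
        return 0.0
    # state[j] = P(current run has length j + 1 and no run so far exceeds k)
    state = [0.0] * k
    state[0] = 1.0  # after the first toss the current run has length 1
    for _ in range(n - 1):
        total = sum(state)
        # Next toss differs (prob 1/2): a new run of length 1 starts.
        # Next toss matches (prob 1/2): the run grows by one; probability
        # mass that would grow past length k is dropped.
        state = [total / 2] + [p / 2 for p in state[:-1]]
    return sum(state)


def expected_longest(n):
    """E[longest run] = sum over k >= 0 of P(longest run > k)."""
    total = 0.0
    for k in range(n):
        tail = 1.0 - prob_longest_at_most(n, k)
        total += tail
        if tail < 1e-12:  # the remaining terms are negligible
            break
    return total


for n in (20, 100, 200, 1000):
    exact = expected_longest(n)
    approx = math.log2(n) + 1 / 3
    print(f"n={n:5d}  exact={exact:.4f}  log2(n)+1/3={approx:.4f}")
```

Summing tail probabilities avoids having to build the full distribution. Each call to `prob_longest_at_most` is only O(n·k), and k never needs to go much beyond log₂ n before the tail vanishes.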
Growth of the Longest Run
However, averages (or, technically, expected values) like this should be used with caution. While the average length of the longest run seen in 100 coin tosses is 7, that does not mean that the longest run will typically have length 7. The probability distribution of the length of the longest run is quite skewed, as is evident in the chart below. The most likely length for the longest run is 6. There is always some chance of a much longer run, but the longest run can never be shorter than 1. This long tail to the right pushes the average length of the longest run above its most likely value.
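The skew can be reproduced numerically. The sketch below assumes fair, independent tosses, and its function name `longest_run_distribution` is illustrative. It tracks the joint distribution of the current run length and the longest run seen so far, then reads off the exact distribution of the longest run. Its mode and mean can then be compared directly.

```python
from collections import defaultdict


def longest_run_distribution(n):
    """Exact P(longest run of heads or tails == L) for n fair, independent tosses."""
    # Key: (length of current run, longest run so far) -> probability
    states = {(1, 1): 1.0}
    for _ in range(n - 1):
        nxt = defaultdict(float)
        for (run, longest), p in states.items():
            # Next toss differs: a new run of length 1 begins.
            nxt[(1, longest)] += p / 2
            # Next toss matches: the current run grows by one.
            nxt[(run + 1, max(longest, run + 1))] += p / 2
        states = nxt
    dist = defaultdict(float)
    for (_, longest), p in states.items():
        dist[longest] += p
    return dict(sorted(dist.items()))


dist = longest_run_distribution(100)
mode = max(dist, key=dist.get)
mean = sum(length * p for length, p in dist.items())
print(f"most likely longest run: {mode}, mean longest run: {mean:.3f}")
```

For 100 tosses the mode should come out at 6, with the mean close to 7, matching the gap between the two that the chart illustrates.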
Distribution of the Longest Run in 100 coin tosses
What the chart also shows is that the chance of the longest run being only 1, 2 or 3 heads or tails long is negligible (less than 0.03%). Even allowing for a longest run of 4 adds less than 3% to the cumulative probability. So, the probability that the longest run has length at least 5 is a little over 97%. If you ever try the coin toss simulation experiment yourself and see a supposed simulation which does not have a run of at least 5, it’s a good bet that it was the work of a human rather than a random coin. Like the average length of the longest run, this probability distribution shifts (approximately) logarithmically as the number of coin tosses increases. With a sequence of 200 coin tosses, the average length of the longest run is 8, the most likely length for the longest run is 7 and the chance of seeing a run of at least 5 heads or tails in a row is now over 99.9%. If your experimental subjects have the patience, asking them to simulate 200 coin tosses puts you on even safer ground to prove your randomness detection skills.
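Every sequence of fair tosses is equally likely, so these cumulative probabilities can also be obtained exactly by counting. The sketch below uses function names of my own. It counts how many of the 2**n sequences have a longest run of at most k, using Python's arbitrary-precision integers and `fractions.Fraction`. Nothing is rounded until the final conversion for display.

```python
from fractions import Fraction


def count_longest_at_most(n, k):
    """Number of the 2**n toss sequences whose longest run is at most k."""
    if k < 1:
        return 0
    # ends[j] = number of sequences ending in a run of length j + 1
    ends = [0] * k
    ends[0] = 2  # the first toss is "H" or "T"
    for _ in range(n - 1):
        # Switching symbol starts a new run (one way per sequence);
        # repeating the symbol extends the run, unless it would exceed k.
        ends = [sum(ends)] + ends[:-1]
    return sum(ends)


def prob_longest_at_most(n, k):
    return Fraction(count_longest_at_most(n, k), 2 ** n)


for n in (100, 200):
    p3 = prob_longest_at_most(n, 3)
    p4 = prob_longest_at_most(n, 4)
    print(f"n={n}: P(longest <= 3) = {float(p3):.6f}, "
          f"P(longest <= 4) = {float(p4):.5f}, "
          f"P(longest >= 5) = {float(1 - p4):.5f}")
```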
Distribution of the Longest Run in 200 coin tosses
What about even longer runs? The chart below shows how the chances of getting runs of a given minimum length increase with the length of the coin toss sequence. As we’ve already seen, the chance of seeing a run of at least 5 rises very quickly, but you have to work harder to see longer runs. In 100 coin tosses, the probability that the longest run has length at least 8 is a little below 1/3, and it is still only just over 1/2 in 200 tosses. Even in a sequence of 200 coin tosses, the chance of seeing at least 10 heads or tails in a row is only 17%.
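Curves of this kind can be generated in a single pass for each minimum run length r. The sketch below assumes fair, independent tosses and uses the hypothetical name `run_at_least_curve`. It advances the run-length dynamic programme one toss at a time and records, after each toss, the probability that a run of at least r has already appeared.

```python
def run_at_least_curve(r, n_max):
    """curve[n] = P(some run of heads or tails has length >= r in n tosses)."""
    if r < 2:
        raise ValueError("r must be at least 2")
    # state[j] = P(current run has length j + 1 and no run has reached r yet)
    state = [0.0] * (r - 1)
    state[0] = 1.0
    curve = [0.0, 0.0]  # with 0 or 1 tosses, no run of length >= 2 is possible
    for _ in range(2, n_max + 1):
        total = sum(state)
        # Differ: new run of length 1. Match: run grows; reaching r drops out.
        state = [total / 2] + [p / 2 for p in state[:-1]]
        curve.append(1.0 - sum(state))
    return curve


for r in (5, 8, 10):
    curve = run_at_least_curve(r, 200)
    print(f"run >= {r:2d}: 100 tosses {curve[100]:.3f}, 200 tosses {curve[200]:.3f}")
```

Because the state after n tosses is reused for n + 1, the whole curve up to `n_max` costs no more than computing its final point.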
Longest Run probabilities
Getting back to the results of the experiment I conducted with the kids, the longest run in both the real coin toss sequence and the one created by the children was 5 heads, so the longest run alone could not distinguish them. Instead, I counted the number of “long” runs. Keeping the distribution of the longest run for 100 tosses in mind, I took “long” to be any run of 4 or more heads or tails. To calculate the probability distribution for “long” runs, I used simulation*, generating 100,000 separate sequences of 100 coin tosses. The chart below shows the results, giving an empirical estimate of the probability distribution for the number of runs of 4 or more heads or tails in a sequence of 100 coin tosses. The probability of seeing no more than two of these “long” runs is only 2%, while the probability of seeing 5 or more is 81%.
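The post does not show the simulation code, so the following is only a sketch of how such a Monte Carlo estimate could be set up. The function names, the seed and the '0'/'1' encoding of tails/heads are all assumptions. It draws 100 random bits per sequence and counts maximal runs of length 4 or more with `itertools.groupby`.

```python
import random
from itertools import groupby


def count_long_runs(tosses, min_len=4):
    """Count maximal runs of identical symbols whose length is >= min_len."""
    return sum(1 for _, grp in groupby(tosses) if sum(1 for _ in grp) >= min_len)


def simulate_long_run_counts(n_samples=100_000, n_tosses=100, min_len=4, seed=1):
    """Empirical distribution of the number of long runs per sequence."""
    rng = random.Random(seed)  # fixed (arbitrary) seed for reproducibility
    tally = {}
    for _ in range(n_samples):
        # n_tosses independent fair bits, written out as a '0'/'1' string
        tosses = format(rng.getrandbits(n_tosses), f"0{n_tosses}b")
        c = count_long_runs(tosses, min_len)
        tally[c] = tally.get(c, 0) + 1
    return {c: k / n_samples for c, k in sorted(tally.items())}


freq = simulate_long_run_counts()
p_at_most_2 = sum(p for c, p in freq.items() if c <= 2)
p_at_least_5 = sum(p for c, p in freq.items() if c >= 5)
print(f"P(<= 2 long runs) ~ {p_at_most_2:.3f}")
print(f"P(>= 5 long runs) ~ {p_at_least_5:.3f}")
```

With 100,000 samples the standard error on each estimate is well under half a percentage point, which is ample for separating a 2% outcome from an 81% one.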
These results provide the ammunition for uncovering the kids’ deceptions. Quoting from the Randomness post:
One of the sheets had three runs of 5 in a row and two runs of 4, while the other had only one run of 5 and one run of 4.
So, one of the sheets was in the 81% bucket and the other in the 2% bucket. I guessed that the former was the record of real coin tosses and the latter was devised by the children. That guess turned out to be correct and my reputation as an omniscient father was preserved! For now.
If you have made it this far, I would encourage you to do the following things (particularly the first one):
- Listen to Stochasticity, possibly the best episode of the excellent Radiolab podcast, which features the coin toss challenge
- Try the experiment on your own family or friends (looking for at least 3 runs of 5 or more heads or tails and ideally at least one of 6 or more)
- Share your results in the comments below.
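If you do try the experiment, a small helper can tally a handwritten sheet of H and T against the rule of thumb above, ignoring spaces and any other characters. The `summarize_runs` function below is hypothetical and does not come from the post.

```python
from itertools import groupby


def summarize_runs(sheet):
    """Tally runs in a written sequence of H/T, ignoring other characters."""
    tosses = [ch for ch in sheet.upper() if ch in "HT"]
    lengths = [sum(1 for _ in grp) for _, grp in groupby(tosses)]
    return {
        "tosses": len(tosses),
        "longest": max(lengths, default=0),
        "runs_4_plus": sum(1 for n in lengths if n >= 4),
        "runs_5_plus": sum(1 for n in lengths if n >= 5),
        "runs_6_plus": sum(1 for n in lengths if n >= 6),
    }


summary = summarize_runs("HHHHH T HHHH TTTTTT H")
# Rule of thumb from the post: at least 3 runs of 5 or more,
# ideally with at least one run of 6 or more.
looks_random = summary["runs_5_plus"] >= 3
print(summary, "looks random:", looks_random)
```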
I look forward to hearing about any results.
* UPDATE: I subsequently did the exact calculations, which confirmed that these simulated results were quite accurate.
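For readers who want to reproduce exact figures, here is one possible approach. It is an assumption about method, not a reconstruction of the author's calculation. A dynamic programme over (current run length, number of long runs so far) gives the exact distribution of the number of runs of at least 4 in 100 fair tosses. As a sanity check, the expected count is exactly 1/8 + 96/16 = 6.125. A run can first reach length 4 at toss 4, which needs the first four tosses to match (probability 1/8). At each of the 96 later tosses, it needs one change followed by three repeats (probability 1/16).

```python
def long_run_count_distribution(n_tosses=100, min_len=4):
    """Exact distribution of the number of runs of length >= min_len."""
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # Key: (current run length capped at min_len, long runs so far) -> probability
    states = {(1, 0): 1.0}
    for _ in range(n_tosses - 1):
        nxt = {}
        for (run, count), p in states.items():
            # Next toss differs (prob 1/2): a new run of length 1 starts.
            key = (1, count)
            nxt[key] = nxt.get(key, 0.0) + p / 2
            # Next toss matches (prob 1/2): the run grows. It is counted once,
            # at the moment its length reaches min_len.
            hits = 1 if run + 1 == min_len else 0
            key = (min(run + 1, min_len), count + hits)
            nxt[key] = nxt.get(key, 0.0) + p / 2
        states = nxt
    dist = {}
    for (_, count), p in states.items():
        dist[count] = dist.get(count, 0.0) + p
    return dict(sorted(dist.items()))


dist = long_run_count_distribution()
mean = sum(c * p for c, p in dist.items())  # should be 6.125
p_at_most_2 = sum(p for c, p in dist.items() if c <= 2)
p_at_least_5 = sum(p for c, p in dist.items() if c >= 5)
print(f"mean = {mean:.4f}, P(<= 2) = {p_at_most_2:.4f}, P(>= 5) = {p_at_least_5:.4f}")
```

Capping the run length at `min_len` keeps the state space tiny, at four run lengths times a few dozen counts, so the exact calculation is far quicker than the simulation.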