In this follow-up guest post on The Stubborn Mule, Mark Lauer takes a closer look at the relationship between national development and fertility rates.
STOP PRESS: Switzerland’s population would be decimated in just two generations if it weren’t for advances in their development.
At least, that’s what the modelling in a recent Nature paper projects. The paper, widely reported in The New York Times, The Washington Post and The Economist, amongst others, was the subject of my recent Stubborn Mule guest post. In that post, I shared an animated chart and some statistical arguments that cast doubt on the paper’s conclusion. In this post, I’ll take a firmer stance: the conclusion is plain wrong. But to understand why, we’ll have to delve a little deeper into their model. Still, I’ll try to keep things as non-technical as possible.
First, let’s recap the evidence presented in the paper. It comprised three parts: a snapshot chart (republished in most of the reportage), a trajectory chart, and the results of an econometric model. As argued in my earlier post, the snapshot is misleading for several reasons, not least the distorted scales. And the trajectory chart suffers from a serious statistical bias, also explained in my earlier post. I’ll reproduce here my chart showing the same information without the bias.
That leaves the econometric model. From reading the paper, where details of the model are sketchy, I had wrongly inferred that the model suffered from the same statistical bias as the trajectory chart. I have since looked at the supplementary information for the paper, and at the SAS code used to run the model. From these, it is clear that a fixed HDI threshold of 0.86 is used to define when a country’s fertility should begin to increase. So there’s no statistical bias. However, I discovered far more serious problems.
Possibly the best demonstration of these problems is to plot the so-called “null hypothesis” trajectories in the model. These show what the model expects would have happened if fertility changes were wholly insensitive to development after development passed the threshold. The model determines the sensitivity of fertility to development by measuring the true trajectories relative to these null trajectories. Here they are.
Given these null trajectories, it is perhaps not surprising that the model finds a statistically significant positive effect for development. But let’s consider just how negative the null trajectories are. According to these trajectories, the TFR in Switzerland in 2005 would have been just 0.66. After only one generation at this TFR, a country’s native population would shrink by a factor of 3. Two generations would collapse the population to barely over a tenth of the original. This is Switzerland we are talking about here.
Likewise, according to these null trajectories, the TFR in Germany, the Netherlands and the United States would have fallen to 0.75, 0.72 and 0.83 respectively in 2005. In fact, 15 of the 37 countries in the model have null trajectories which drive TFR below 1.0 in 2005. Remember a TFR below 1.0 means the native population will more than halve every generation. Given this null hypothesis, it would be extremely surprising if it weren’t rejected by the data.
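The arithmetic behind these shrinkage figures can be checked with a short sketch. It assumes a replacement-level TFR of 2.0 (two children per woman, which is what the "factor of 3" and "more than halve" figures above imply), no migration, and a constant TFR across generations. The function name is purely illustrative.

```python
def population_fraction(tfr, generations, replacement_tfr=2.0):
    """Fraction of the original native population remaining after the
    given number of generations at a constant TFR.

    Assumption: each generation scales the population by
    tfr / replacement_tfr, with no migration or mortality changes.
    """
    return (tfr / replacement_tfr) ** generations


# Null-trajectory TFRs for 2005 quoted in the post.
null_tfr_2005 = {
    "Switzerland": 0.66,
    "Netherlands": 0.72,
    "Germany": 0.75,
    "United States": 0.83,
}

for country, tfr in null_tfr_2005.items():
    one = population_fraction(tfr, 1)
    two = population_fraction(tfr, 2)
    print(f"{country}: {one:.2f} after one generation, {two:.2f} after two")
```

For Switzerland this gives about a third of the population after one generation and roughly 11% after two, in line with the figures quoted above.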
The obvious question is how can the model possibly make such dire projections? The answer lies in a combination of two things: the set of countries chosen for inclusion in the model, and the use of so-called “time fixed-effects”. In the words of the paper:
This specification controls for … time trends, and thus allows us to test whether the reversal … persists after controlling for potentially confounding factors such as … common time trends.
To do this, the model contains so-called “year dummies”, factors that represent the impact of a particular calendar year. For example, being in 1990 is modelled as having the same fixed impact on the fertility of all countries. Being in 1991 is modelled as having another impact, but the same one across all countries. And so on. These fertility impacts are estimated from the data as part of the econometric analysis. Here is a chart showing the values of the year dummies estimated in the model.
For example, in 1982, fertility in all countries is estimated to fall by 0.066 simply due to the fact of it being 1982. Taken together, these year dummies impose a steeply declining path of fertility. Now notice the resemblance between this path and the null trajectories above. The time trend estimates dominate the modelling.
But we still haven’t explained why they are so negative. The answer is that estimates for the year dummies are made using the entire data set, including fertility changes in countries whose HDI is below the threshold. For example, in 1982 twelve of the countries have HDI scores below 0.86, some of them substantially below. Kuwait had an HDI of 0.75 in 1982 (a year when its TFR was 4.87). In the same year, South Korea’s HDI was only 0.76 and Chile’s just 0.74. Those are the development levels of Iran, Armenia and Ecuador today. But the model uses the fertility changes of Kuwait, South Korea and Chile in 1982 to establish a baseline for developed countries like Germany, Switzerland and the United States.
By including a dozen or so developing countries, whose fertility rates start out higher and fall quickly, the model estimates a steeply declining time trend. But those same data points are excluded from the HDI impact estimation because the corresponding countries are below the threshold in those years. As a result, HDI alone is forced to explain the absence of the steeply declining time trend in developed countries.
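A toy sketch may make this mechanism easier to see. It is not the paper's model: it assumes the regression works on year-on-year TFR changes and strips it down to year dummies only, in which case the least-squares estimate for a year is simply the mean change across the included countries. The country names and numbers are made up for illustration.

```python
from statistics import mean

# Made-up annual TFR changes for one calendar year (illustration only,
# not taken from the paper's data set).
tfr_change = {
    "developed_a": -0.01,
    "developed_b": 0.00,
    "developed_c": -0.02,
    "developing_a": -0.15,
    "developing_b": -0.12,
}
above_threshold = {"developed_a", "developed_b", "developed_c"}


def year_dummy(changes, countries):
    """With year dummies as the only regressors, the least-squares
    estimate for a year is the mean change over the included countries."""
    return mean(changes[c] for c in countries)


dummy_all = year_dummy(tfr_change, tfr_change.keys())
dummy_developed = year_dummy(tfr_change, above_threshold)

# What is left over for a development effect to explain in the
# above-threshold countries, under each baseline.
gap_all = mean(tfr_change[c] - dummy_all for c in above_threshold)
gap_developed = mean(tfr_change[c] - dummy_developed for c in above_threshold)

print(f"year dummy, all countries:    {dummy_all:+.3f}")
print(f"year dummy, developed only:   {dummy_developed:+.3f}")
print(f"unexplained gap vs all:       {gap_all:+.3f}")
print(f"unexplained gap vs developed: {gap_developed:+.3f}")
```

With the developing countries included, the year dummy is dragged down to -0.06 and the developed countries appear to outperform the baseline by 0.05, a gap that only HDI is left to explain. Estimated on the developed countries alone, the baseline is -0.01 and the gap vanishes.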
The paper’s criterion for inclusion of a country in the model is that its HDI must reach 0.85 by 2005. This allows countries like Argentina and the Slovak Republic to just scrape in, countries that provide no information whatsoever about fertility changes beyond an HDI of 0.86 because they never reach that threshold. Yet these countries contribute to the time trend estimation, and they exhibit large falls in fertility because they have higher fertility to start with.
To see this graphically, let’s consider what happens when we change the criterion for inclusion. Here is a chart showing the modelled “reversal” in fertility found by applying the paper’s model while varying the minimum HDI required for a country to be included.
The x-axis shows the minimum level of HDI in 2005 which allows a country to be included. The labelled points indicate the numbers of countries that are included as a result. The blue point is the criterion chosen in the paper. The point labelled 27 represents Australia, Austria, Belgium, Canada, Cyprus, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Israel, Italy, Japan, South Korea, Luxembourg, the Netherlands, New Zealand, Norway, Portugal, Slovenia, Spain, Sweden, Switzerland, the United Kingdom, and the United States. These are all developed countries, though Cyprus and Portugal could be considered borderline cases.
The ten additional countries used in the paper are Argentina, Chile, the Czech Republic, Estonia, Hungary, Kuwait, Malta, Poland, the Slovak Republic and the United Arab Emirates, all developing countries or only recently developed. Including these countries drives down the time trend estimates (that is, the year dummies) and drives up the measured reversal. If we include still more developing countries, this reversal goes higher. If we use all 143 countries in the data set, the measured reversal is that HDI increases fertility in developed countries at a rate of almost 11 (that is, a 0.05 increase in HDI yields a 0.55 increase in TFR). So even if we believe the authors’ model, their reversal rate of around 4 is still wrong, because by including more data we converge on a rate over 2.5 times their estimate.
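To put the two rates side by side, here is a small sketch of the arithmetic. It treats the reversal rate as a simple slope (TFR change per unit of HDI) and uses the rounded values quoted above, "around 4" and "almost 11", so the outputs are approximate. The function name is illustrative.

```python
def implied_tfr_increase(rate, hdi_increase):
    """TFR change implied by a linear reversal rate for a given rise in HDI."""
    return rate * hdi_increase


rate_paper = 4.0          # "around 4", the paper's 37-country sample
rate_all_countries = 11.0  # "almost 11", all 143 countries

hdi_increase = 0.05
print(f"paper's sample: {implied_tfr_increase(rate_paper, hdi_increase):.2f}")
print(f"all countries:  {implied_tfr_increase(rate_all_countries, hdi_increase):.2f}")
print(f"ratio of rates: {rate_all_countries / rate_paper:.2f}")
```

On these rounded figures, the same 0.05 rise in HDI implies a TFR increase of about 0.20 under the paper's estimate and about 0.55 when every country is included, a ratio of 2.75.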
More sensibly, we should exclude the developing countries. And, as the chart above shows, the reversal then disappears: fertility marginally decreases with increasing development. Better still, we should modify the model to only use data points from years where the corresponding country is above the threshold. Under this model, the rate of decline in fertility with development is even more negative (the Mathematica source code for all the charts and analysis in this post has been uploaded to GitHub).
When I first read about this paper, I was impressed. But after looking at the data I had serious doubts. Now I am convinced that its conclusion is unsupportable. At best, we can say that fertility stabilises or falls more slowly once development reaches the threshold. But that would be a reasonably predictable result: there are lower bounds to fertility after all. And it wouldn’t generate headlines in The New York Times, The Washington Post and The Economist.
It’s depressing that this paper made it into Nature. (Though apparently some large fraction even of Nature and Science papers have fairly elementary statistical problems.)
Nature has now published a comment on how difficult it is to have errors corrected:
http://www.nature.com/news/reproducibility-a-tragedy-of-errors-1.19264#comment-2498390708