The maths behind the model

This section of the Life of Riley website is really only for those of you obsessively interested in the statistics behind life expectancy in the UK, and in how I calculated the facts and figures within the book and built the 5 tribes model. If you are not engrossed by numbers, and just want the tips on how to live healthier for longer – this section is definitely not for you!

I need to start with the word “average”. There are three “averages” used in statistics, each for a different, legitimate purpose.

Most of us tend to think of the arithmetic average, or “mean” when we are quoted an average. A bag of 9 apples weighs 900g, so each apple on average weighs 100g. That’s the “mean” weight of the apples.

The other main example of average is the median. This is the number that represents the midpoint in an ordered list of numbers. It is quite widely used when looking at human population data, for reasons that will become obvious.

The third measure is the “mode” which is the number that occurs most frequently in a series of numbers.

Back to the median. Remember when you lined up in class at school in order of your birthday, or your height? Well, the median would be the kid who stood in the middle of the line – the one with equal numbers younger and older (or smaller / taller) than him or her.

The mean and the median can be quite close – but occasionally they can differ quite sharply, because of two effects. The first is the mixing of different groups in the sample being measured, and the second effect is due to “outliers” in a sample. Let me explain both.

Group Mixing

If you weighed each of the apples in my example earlier, you’d be looking for what the 5th heaviest apple weighed if you wanted to know the median weight of the apples in the bag. Given most apples in a bag bought from a supermarket tend to be very similar if they are the same variety, it would probably be close to 100g – but it could be a little lighter or heavier.

However, if you had a bag of mixed apples of different varieties, the median could be a markedly different number to the mean. Imagine a bag with six Cox’s pippins each weighing exactly 50 grams, and three cooking apples each weighing 200 grams. The overall weight of the bag is still 900g, and there are still nine apples in the bag – so the mean is still 100g – but the median isn’t. The median is 50g, because if you lined up the nine apples in order of weight, the fifth (i.e. the median) would be a pippin.
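For anyone who wants to check the sums, here is a minimal Python sketch of the mixed bag described above. The variable names are just illustrative; the weights are the ones from the example.

```python
from statistics import mean, median

# Weights in grams for the mixed bag: six pippins and three cooking apples.
pippins = [50] * 6
cookers = [200] * 3
bag = pippins + cookers

bag_mean = mean(bag)      # total weight (900g) divided by nine apples
bag_median = median(bag)  # the fifth apple when sorted by weight

print("mean:", bag_mean, "median:", bag_median)
```

The mean stays at 100g because the total weight and the number of apples have not changed, while the median drops to the weight of a pippin.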

This is important, because what it shows us is that within a single variety, mean and median align pretty closely, but if you mix different varieties of things, whilst the mean might stay the same, the median can shift dramatically. And then it becomes less meaningful to use “mean” to look at this mixed group of things as a whole. At that point it is much more sensible to split the bigger group into its different varieties, and measure and analyse those.

And that’s true of us too. Instead of thinking about us as one giant group of humans, as far as life expectancy is concerned, it is much more illuminating to split us into different groups who are easier to understand.

This is why I liked and developed the Tribe concept, as I had a suspicion it would explain a lot of what was going on in the ONS data, and provide some clues as to what the life outcomes for the live longer Tribe 5 members were. We know intuitively that life expectancy is going to be different for each Tribe, but perhaps not quite what those numbers are – and in particular what the numbers for “Tribe 5” are.

The Outlier Effect

So hopefully you now grasp why this concept of mixing five different tribes within a population might make the arithmetic mean a rather unhelpful measure. In addition, as I mentioned earlier, there is one further reason why median trumps mean for populations, and that is because of what are known as population “outliers”.

Imagine you had a room of 10 people and their average (mean) wealth was, say, £50,000. Then Richard Branson walks in. Suddenly the ‘average’ wealth in the room is millions of pounds if you use the arithmetic average, or mean. If you choose the “median” wealth instead, however, you look at the 6th richest of the 11 people now in the room, who would still, I guess, be pretty close to £50,000 in terms of his or her wealth. This is the second reason why median is a better measure of average for human populations, which can contain significant “outliers”. Even within single tribes you will get outliers who can distort arithmetic averages, so using a median works here too.
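The same effect can be sketched in a few lines of Python. Two things here are assumptions made purely for illustration: everyone in the room is given exactly £50,000, and the newcomer's wealth is a round hypothetical figure, not a real estimate of anyone's fortune.

```python
from statistics import mean, median

# Assumption: all ten people hold exactly 50,000 pounds.
room = [50_000] * 10

# Hypothetical round number for the wealthy newcomer (illustration only).
newcomer_wealth = 4_000_000_000

room_after = room + [newcomer_wealth]

mean_before = mean(room)
mean_after = mean(room_after)      # dragged far upwards by one person
median_after = median(room_after)  # the 6th richest of 11 people

print(f"mean before:  {mean_before:,.0f}")
print(f"mean after:   {mean_after:,.0f}")
print(f"median after: {median_after:,.0f}")
```

However large you make the newcomer's figure, the median does not move, which is the whole point of using it.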

As far as life expectancy is concerned, the outliers include the small number of us who die very early. Tragically of course some babies die at birth, and some children die from specific childhood diseases. Some teenagers and people in their early 20s die in accidents too. In fact around 1% of all deaths each year are of people under 30, and these are nearly always unrelated to lifestyle.

Yet including these tragically cut-short lives means any arithmetic calculations of life expectancy are bound to under-represent the average age achieved by mature adults.

And what is average life expectancy in the UK?

All marketing/business plans start with a bit of desk research – looking at what’s publicly available (and therefore free) and trying to discern some patterns in it you can use in developing your plan.

And in this regard, for our purposes the UK mortality tables are a fascinating read if you are a numbers geek like me. Every year, the government publishes a series of “period expectation of life tables”, which predict how long we can expect to live based on our current age (from 0 to 100), assuming nothing about life expectancy changes. The latest are based on data for the years 2011-2013, and they are split by men and women.

According to this latest published ONS data, average (mean) life expectancy has crept up to 79.8 (78.5 for men, 81 for women). For the last few years it has been 79ish, and many recent newspaper articles quote 79 when reporting these stories. This number has now seeped into the public consciousness I think, as when I ask most people to tell me what average life expectancy is, 79 is the most quoted number – although many folk actually have no idea and are prepared to speculate with some wild guesses. I do worry about their pension provision. People also know women live longer than men, a topic to which I shall return.

But this 79 number is unhelpful in a couple of respects.

Firstly, this figure doesn’t take account of people who live beyond their 101st birthday, because the ONS doesn’t publish this data. So the “79” number of recent years only comes from analysing the predictions for people who die between the ages of 0 and 100. People who live to 101 and beyond aren’t included in any “mean” calculations you can do on the published data. So they simply get missed out when people do the quick maths to work out mean age of death (by the way, I am not blaming the ONS for this – they are clear their data only runs to 100).

Why do we ignore those aged 101 and older? Possibly because when these tables were first prepared (each published data set goes back 30 years, but we have records going much further back) the numbers of 101+ folk were really small – less than one third of 1%, or around 300 per 100,000 in 1980-82, and obviously far fewer before then. So up until fairly recently they were, statistically at least, insignificant.

However, the number of people living beyond 100 has increased from 300 to almost 1,300 per 100,000 – more than 4 times the size – in the last thirty years, and that number of oldies does have a measurable impact when you are working on arithmetic averages, or means. Hence the discrepancy between the widely reported 79 figure and the more accurate number from the full 2011-2013 tables for the “true mean”, including the centenarians, of exactly 81 (the ONS were kind enough to send me their private data to allow me to calculate this). I don’t think we can view these people as outliers now – there are simply too many of them, and for our purposes most will have achieved their longevity by making and following good lifestyle choices, so they should be in any calculations we make.

Wow – I’ve added over a year to your life expectancy in just a couple of sentences!

I suspect the ONS will have to think again about publishing the 101+ data in future years as the number living beyond 100 grows.

As I said earlier however, we are still including in this calculation the true outliers, those people who are sadly dying before the age of 30, nearly all of whom are doing so for non-lifestyle related reasons.

So from a lifestyle vs life expectancy perspective, the ONS data has been delivering a double whammy. It leaves in those early deaths, which are not lifestyle related for the most part, and it excludes the oldest survivors, whom one might reasonably surmise got to that age at least in part because of good lifestyle choices!

So if we adjust for both effects, and simply look at people who have reached the age of 30 (let’s call them “Mature Adults”), and include the centenarians in any calculations, the mean life expectancy overall rises to 82 (80 for men, 83.5 for women).

That’s quite some shift from where we started.

Remembering what I said earlier about mixing different tribes, if we instead look for the mid-point average for people 30+, the median life expectancy for the adult population of the UK, this number is quite a bit higher, at 84.2 (82.5 for men and 85.9 for women).

Trying to summarise, you might say that for people over the age of 30, the difference between the mean life expectancy of 82 and median life expectancy of 84 is down to groups vs individuals. The mean of 82 is “the average life expectancy for the adult population” and the median of 84.2 is “the life expectancy for an average person” – which is why I’d pick median if I were trying to communicate your likely life expectancy.

As a side point, the eminent Cambridge statistician David Spiegelhalter wrote a blog on just this subject of the gap between mean and median when discussing life expectancy, which is one of the reasons I started digging into the topic. He is as concerned as me that we are reporting on, and in many cases using for public policy, the wrong number. Median is what should be driving us here, not mean, when it comes to thinking about age.

So, if we start with that adult median figure of 84, trying to then disaggregate the ONS data to see what is going on under the surface of life expectancy is tricky. We simply don’t have the data to look at the Tribes individually. In particular, although there is data on ages of death for drinkers, smokers etc. – there is nothing really on those in Tribe 5 which is where I wanted to focus.

Hence the attempt to build a simple, theoretical model to illustrate what might be happening.

Now I should admit outright that this is a simple, superficial look at the data, and clearly the real world is a lot more complex than this. Nevertheless, I think there are some conclusions which emerge from this which can help us understand things a little more easily.

So to begin with what do we know? I summarised in chapter three of the book the key pieces of data, and how they combined to produce some interesting “Tribe median life expectancy” numbers, but to break that down a little further this is how the logic flows.

I’m going to concentrate on men here. I have done the work for both men and women, but just using one set of numbers makes the explanation easier. I am also going to use the median age of death as my overall reference point rather than mean, as it ends up modelling the real world much more closely.

In order to let you imagine what is going on, come with me on a rather “ghoulish” thought experiment. Imagine we selected 1,000 adult men at random across the UK today, invited them to a party, and, by dint of being able to see into the future, gave them a special “ticket” which had their actual life expectancy (i.e. age of death) written on it (assume here nothing happens in terms of science/medicine etc. to alter the current likely projections for folk). If they were lined up to get into the party in order of the age printed on their tickets, the 500th man in that queue would have a ticket which read ‘82 years old’ (the current median UK adult male life expectancy according to ONS data). I imagine him holding a flag to indicate he’s at the half-way point in the queue.

Now imagine instead of there being just one entrance to get into the party, there are in fact five entrances. And each gate corresponds to one of our tribes.

The men begin to sort themselves out.

In front of the first entrance there are our heavy drinkers. There are 58 of them (according to the German study I quoted in the book). The chap half-way along, in position 29, has coincidentally got a ticket which says ‘58 years old’. No age, as they say.

If we got all of the men who smoked to put their hands up, there would be 220 of them, and if we just put them all in a line in front of entrance 2, the chap in 110th position (initially carrying the Smoker flag) would have a ticket saying ‘75 years old’. However, 46 of those smokers have already shuffled across from queue 2, and are already in the queue for entrance 1 as they are also heavy drinkers. So there are only 174 smokers left in the queue for entrance 2. Because many of the youngest smokers to die are in queue 1, the median age of those left behind in queue 2 is perhaps older than you’d expect, and now, with only 174 left in this queue, the flag for the half-way point in queue 2 has passed back to the 87th bloke in line, a man with a ticket that reads ‘79 years old’.

260 of the chaps we are looking at are obese (according to the ONS data for 2013), and they live on average just under five years less than the population as a whole. If we got them to queue up just on their own in front of entrance 3, the bloke in position 130 would have a ticket saying ‘79 years old’. However, about 52 of these men also smoke, and were already in queue 2. The remaining 208 obese men in queue 3 who don’t smoke have a chap carrying their flag at median position 104. He has a ticket saying he will be ‘80 years old’ when he dies.

That’s 58 + 174 + 208 chaps in these three queues. That’s 440 men in tribes 1, 2 and 3 now queuing up in front of entrances 1, 2 and 3 to this party. That’s not far off half of all the men overall. And their overall median life expectancy according to those tickets across the three lines is 76.

If we asked the remaining 560 men we invited to queue up for us in one line for a minute, the chap in position 280 would have a ticket saying ‘87 years old’. That’s over a decade older than the median for the guys in queues 1, 2 and 3.

At this point we can separate these guys, and ask the 200 or so physically active guys (according to the Bristol University study) to move across to gate 5.

The sedentary chaps shuffle forward in queue 4. There are 360 of these men left – and the man in the middle here, at position 180, has a ticket saying ‘86 years old’.

And the man carrying the flag for Tribe 5, half-way along their queue for gate 5, in position 100, has a ticket which says ‘89 years old’ (this tallies, because if you recall from the book, the data suggests exercise gives you an extra three to four years of life expectancy).

That’s the average age of these tribes taken care of – but what about the distribution around that figure?
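The head-counts in the five queues are simple subtraction, and can be checked with a short Python sketch. The numbers are the ones quoted in the walk-through above; the variable names and the `flag_position` helper are mine, added only for illustration.

```python
TOTAL_MEN = 1000

heavy_drinkers = 58        # queue 1
smokers_all = 220          # every smoker, before anyone moves
smokers_who_drink = 46     # already standing in queue 1
obese_all = 260            # every obese man, before anyone moves
obese_who_smoke = 52       # already standing in queue 2
physically_active = 200    # move across to gate 5

queue1 = heavy_drinkers
queue2 = smokers_all - smokers_who_drink
queue3 = obese_all - obese_who_smoke
tribes_1_to_3 = queue1 + queue2 + queue3

remaining = TOTAL_MEN - tribes_1_to_3
queue5 = physically_active
queue4 = remaining - queue5


def flag_position(queue_length):
    """Half-way position in a queue, as used for the flag carrier."""
    return queue_length // 2


for name, size in [("1", queue1), ("2", queue2), ("3", queue3),
                   ("4", queue4), ("5", queue5)]:
    print(f"queue {name}: {size} men, flag at position {flag_position(size)}")
```

The flag positions come out at 29, 87, 104, 180 and 100, matching the men described in each queue.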

Sadly there simply isn’t any large-scale, reliable data on life-expectancy distribution within these different Tribes. So I’ve simply assumed each Tribe dies according to a “normal distribution”. This means you are as likely to die after as before the mean or median, according to a straightforward formula. Those of you with advanced stats will know that in groups with normal distribution, the mean and median are identical.

This “normal distribution” assumption is quite simplistic, and when I applied it to the data to produce a “5 tribes model” for life expectancy, the model does underplay a little the number of deaths overall for people in their 50s/60s compared to the real world data, so I suspect this normal distribution isn’t capturing everything – but it’s the best I could do with the data available. I strongly suspect the data for Tribes 2 and 3 is skewed in real life, with those who smoke or are clinically obese and are also engaging in harmful drinking (the level below alcohol dependency, but still suspected of having damaging effects on life expectancy) dying younger. This would cause their graphs to begin sloping upwards earlier, and tail off more quickly.

What about the spread of life expectancy (known as standard deviation) – are people packed tightly around the guy holding the flag, or spread out a long way either side? When I was building and testing my model I didn’t have any actual standard deviation numbers for my tribes, as this level of data simply isn’t available (except for the heavy drinkers in the German study – thank goodness for Teutonic efficiency).

Below is the overall table of data for men derived from the model. The mean/median ages are accurately drawn from real life, as is the size of each tribe. Only the standard deviations were estimated (except for Tribe 1, courtesy of the Germans), and these are the estimates that made the overall model fit most accurately with the real world data. One thing jumps out: in order to get the model data to best fit the real world numbers, I needed to start with that German number and then reduce the standard deviation for the tribes with older life expectancy (especially Tribes 4 and 5).


table 1

In the end, what fitted best was for Tribe 5 to only have a standard deviation of around 7 years. So for men, this would mean their deaths would be more tightly clustered around the mean age of 89/90, with 2/3rds living to between 83 and 97.

And that seems intuitively right – the less risk you place your body under, the more likely you are to end up somewhere near the norm for your Tribe.

What does this mean? The larger the standard deviation, the more the graph is flat and wide. Small standard deviations mean the graph is narrow and tall. As a rough approximation, around 2/3rds of all measurements will lie within one standard deviation either side of the central point (mean/median).

The German study of alcohol-dependent men and women found that the standard deviation for men was 11 years, and for women 12 years. So men died on average aged 58, but two thirds of them died between the ages of 47 and 69. For women, the mean age of death was 60, and 2/3rds died between 48 and 72.
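The “two thirds within one standard deviation” rule of thumb can be checked with Python's standard library. This is a minimal sketch that assumes, as the model does, that ages at death follow a normal distribution; the means and standard deviations are the German study figures quoted above.

```python
from statistics import NormalDist

# Assumption: ages at death are normally distributed within the group.
men = NormalDist(mu=58, sigma=11)
women = NormalDist(mu=60, sigma=12)


def one_sd_range(dist):
    """Return (low, high, share) for one standard deviation either side."""
    low = dist.mean - dist.stdev
    high = dist.mean + dist.stdev
    share = dist.cdf(high) - dist.cdf(low)
    return low, high, share


men_low, men_high, men_share = one_sd_range(men)
women_low, women_high, women_share = one_sd_range(women)

print(f"men:   {men_low:.0f} to {men_high:.0f}, share {men_share:.1%}")
print(f"women: {women_low:.0f} to {women_high:.0f}, share {women_share:.1%}")
```

The share is the same for any normal distribution, a little over 68%, which is where the “around 2/3rds” approximation comes from.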

If we assume that the less damage you are doing to your body, the closer to your “normal” allotted span you are likely to live, you can see why each Tribe might expect to have a smaller standard deviation than the one before. I’ve built this into the model by reducing the standard deviation by 0.12 years for every year by which a Tribe’s mean age of death increases for men, and by 0.18 years for women.

I used all those assumptions to model a set of normal distributions for each Tribe (assuming a sample of 100,000 men), and it looks like this.
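As a rough illustration of that rule for men, here is a sketch that starts from the German figures (age 58, standard deviation 11) and knocks off 0.12 years of standard deviation for each extra year of age. It uses the flag-carrier ages from the queue example as the central age for each Tribe, which is an assumption on my part; the fitted values in the table may differ slightly from what this straight-line rule gives.

```python
BASE_AGE = 58            # Tribe 1 central age (German study, men)
BASE_SD = 11.0           # Tribe 1 standard deviation (German study, men)
SD_DROP_PER_YEAR = 0.12  # reduction per extra year of age, for men

# Assumption: central ages taken from the queue example for men.
tribe_ages = {1: 58, 2: 79, 3: 80, 4: 86, 5: 89}


def tribe_sd(central_age):
    """Standard deviation implied by the linear reduction rule."""
    return BASE_SD - SD_DROP_PER_YEAR * (central_age - BASE_AGE)


tribe_sds = {tribe: tribe_sd(age) for tribe, age in tribe_ages.items()}

for tribe, sd in tribe_sds.items():
    print(f"Tribe {tribe}: central age {tribe_ages[tribe]}, SD {sd:.2f} years")
```

For Tribe 5 the rule gives a little over 7 years, in line with the “around 7 years” figure mentioned earlier.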

image 1

You can see Tribes 1, 2 and 3 are quite spread out, with Tribe 4 a little more tightly bunched, and Tribe 5 concentrated at the upper end of the age range.

If you add all of those individual Tribes together each year, what does overall life expectancy look like? [I did also build in a Tribe 13 – accidental deaths, at 3.5% per annum – just to capture unfortunates, along with idiot ladder-climbers like me who didn’t get lucky!]

image 2

Here’s the real ONS data.

image 3

And here are the two sets, side by side:

image 4

Doing all of this, and adding these Tribe distributions together, produces a model which looks pretty much like the ONS data.

It isn’t perfect of course – and how could it be, given it is using a wide combination of data sets, and combining everything into just five discrete modelling units?
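To give a feel for how the Tribes are added together, here is a simplified sketch for men: five normal distributions, weighted by the size of each queue. It is not the full model. The standard deviations are my own application of the 0.12-per-year rule to the queue ages, and the accidental deaths group is left out, so the output will only approximate the charts.

```python
from statistics import NormalDist

# (share of men, central age at death, assumed standard deviation)
# Shares and ages come from the queue example; SDs are assumptions.
TRIBES = [
    (0.058, 58, 11.00),
    (0.174, 79, 8.48),
    (0.208, 80, 8.36),
    (0.360, 86, 7.64),
    (0.200, 89, 7.28),
]


def model_cdf(age):
    """Share of all men who have died by the given age."""
    return sum(w * NormalDist(mu, sd).cdf(age) for w, mu, sd in TRIBES)


def deaths_in_year(age, cohort=100_000):
    """Deaths between age and age + 1 in a cohort of the given size."""
    return cohort * (model_cdf(age + 1) - model_cdf(age))


def model_median(low=0.0, high=120.0, tolerance=0.01):
    """Age by which half the cohort has died, found by bisection."""
    while high - low > tolerance:
        mid = (low + high) / 2
        if model_cdf(mid) < 0.5:
            low = mid
        else:
            high = mid
    return (low + high) / 2


print(f"blended median age at death: {model_median():.1f}")
print(f"deaths at age 85 per 100,000: {deaths_in_year(85):.0f}")
```

Because the inputs are rounded and partly assumed, the blended median should land in the same region as the ONS figure rather than match it exactly.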

As you can see, the ONS data is showing slightly more deaths per year for people in their 50s and 60s (possibly the early smoking deaths I am missing). The Tribe model is predicting slightly more deaths for people in their 70s, and finally the ONS data peaks slightly later in the 80s but tails off a touch more quickly.

However, the correlation between the model’s predicted overall life expectancy data and the ONS data is almost 99% for both men and women. In statistical circles that would be considered “pretty damn good!” and it’s certainly close enough for us to take great comfort from the model that this “Tribe” approach is closely correlated with what is happening in the real world.

Interestingly, the reason why women live longer than men seems in large part to be because men are simply making more of the wrong lifestyle choices. 44% of men are in Tribes 1-3, vs just 36% of women (including 3 times as many heavy drinkers). Although overall the median gap in life expectancy between men and women is almost 3.5 years, within most Tribes it is less than 2 years.

The women’s model looks very similar in shape, and when combined the overall model vs ONS data looks like this:

image 5

So the model looks pretty robust, and what it tells me is that if you are a man in Tribe 5, you can expect to have a 50:50 chance of living until you are almost 90, and you have a better than 80% chance of at least getting into your 80s.
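Those two odds for Tribe 5 men can be reproduced with a quick sketch. It assumes a normal distribution centred on 89 with a standard deviation of 7 years, the rounded figures quoted earlier, so treat the result as indicative only.

```python
from statistics import NormalDist

# Assumption: Tribe 5 men, central age 89, standard deviation about 7 years.
tribe5_men = NormalDist(mu=89, sigma=7)

chance_of_reaching_80 = 1 - tribe5_men.cdf(80)
chance_of_reaching_89 = 1 - tribe5_men.cdf(89)

print(f"chance of reaching 80: {chance_of_reaching_80:.0%}")
print(f"chance of reaching 89: {chance_of_reaching_89:.0%}")
```

Under these assumptions the chance of reaching 80 comes out comfortably above 80%, and the chance of reaching 89 is exactly half by construction.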

For women in Tribe 5, the news is even better, with a median age of death of 91.

Obesity Impact on Life expectancy

One final point I allude to in the book is the data on the effects of obesity. Perhaps surprisingly, there is no simple measure for the impact of obesity (BMI > 30) on life expectancy, so I have had to build my own simple model to provide sensible estimates. The data is as follows:


table 2

There is no readily available breakout in obesity levels between clinical and severe, so I have simply assumed the level of severe obesity to be twice that of morbid obesity (data for which is available). I then deducted this estimate from the Clinically + Severely obese data, which is also available, to calculate the clinical level. Next I took the NHS NOO estimates of life expectancy shortening (2-4 years for clinical obesity; 8-10 years for morbid obesity) and used 3, 6 and 9 years as the years lost at the clinical, severe and morbid levels respectively. Simple maths then produces an average years lost figure of 4.8 for all obese men and 4 for all obese women. Clearly different levels for individuals will produce different results; I am simply applying population averages to the available data.
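The “simple maths” is a weighted average, which can be sketched as below. The function name and the two prevalence figures in the example call are hypothetical placeholders, not the values from the table, so the result will not reproduce the 4.8 and 4 figures. Only the 3, 6 and 9 years and the “severe is twice morbid” assumption come from the text.

```python
# Years of life lost at each obesity level, as described in the text.
YEARS_LOST = {"clinical": 3, "severe": 6, "morbid": 9}


def average_years_lost(clinical_plus_severe_pct, morbid_pct):
    """Weighted average years lost across all obese people.

    Assumes severe obesity is twice the morbid level, and that clinical
    obesity is what remains of the combined clinical + severe figure.
    """
    severe_pct = 2 * morbid_pct
    clinical_pct = clinical_plus_severe_pct - severe_pct
    shares = {
        "clinical": clinical_pct,
        "severe": severe_pct,
        "morbid": morbid_pct,
    }
    total_obese_pct = sum(shares.values())
    weighted = sum(shares[level] * YEARS_LOST[level] for level in shares)
    return weighted / total_obese_pct


# Hypothetical inputs for illustration only (percent of the population).
example = average_years_lost(clinical_plus_severe_pct=24.0, morbid_pct=2.0)
print(f"average years lost (placeholder inputs): {example:.2f}")
```

Feeding in the real prevalence figures for men and for women separately is what produces the two averages quoted above.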