Separating Hyperplanes

Childhood obesity is a tricky concept to define

Matthew Martin 3/01/2014 11:50:00 PM

Slate has some righteous indignation over all the media outlets reporting that a study found a 43 percent decline in childhood obesity rates over the past decade. In contrast to the media headlines, the study really found "no significant changes in obesity prevalence in youth or adults between 2003-2004 and 2011-2012."

So what's up with the headlines? They all come from one tiny part of the paper, where the authors found a "statistically significant" 43% decrease in obesity rates among children aged 2 to 5 between 2003 and 2012.

This is a good opportunity to talk about:

what it means to be statistically significant
how we define childhood obesity

On the first point, what you need to recognize is that there are 171[*] different ways to define "child" in the dataset the researchers used. That means that even if there were no difference in population childhood obesity rates, at 95 percent confidence we'd expect to find a "statistically significant" difference (i.e., a false positive) for roughly 8 or 9 of these possible definitions. The study in question found one, when "child" is defined as 2 to 5 year olds. Basically what it boils down to is this: if you recode your variables enough, you'll almost always be able to get a false positive.
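The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope check: it assumes each of the 171 definitions is tested at the 5 percent level, and the second number additionally assumes the tests are independent, which they are not, since overlapping age bands share children. The variable names are made up for illustration.

```python
# Back-of-the-envelope multiple-comparisons arithmetic.
# Assumption: 171 candidate definitions, each tested at alpha = 0.05,
# with no true change in the population.
n_definitions = 171
alpha = 0.05

# Expected number of false positives. This holds whether or not the
# tests are independent (linearity of expectation).
expected_false_positives = n_definitions * alpha

# Chance of at least one false positive. This DOES assume independence,
# which overlapping age bands violate, so treat it as a rough guide only.
p_at_least_one = 1 - (1 - alpha) ** n_definitions

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"P(at least one), if independent: {p_at_least_one:.4f}")
```

The expected count comes out to 8.55, which is where "roughly 8 or 9" comes from.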

But really I tend to be skeptical of childhood obesity studies because of the second point. In adults, obesity is defined as being over a specific, time-invariant body mass index (BMI) threshold, usually 30. This in itself is problematic, because many muscular people can be exceptionally healthy and have a BMI above 30, while a flabby couch potato can easily have diabetes and obesity-related heart attacks with a BMI below 30. These issues are just as much of a problem when we talk about childhood obesity, but they are overshadowed by an even bigger problem for children: for people under age 18, the normal range for healthy BMI varies wildly with age.

In children, what counts as "obese" depends on exactly how old the child is. Pediatricians diagnose a child as obese by comparing them to other children of the same age and sex (boys and girls mature at different rates too!). A common rule of thumb is that if they have a higher BMI than 95 percent of other children of the same age and sex, they are considered obese. Now that's a problem for healthcare researchers, because it means that, by definition, the prevalence of childhood obesity is always 5 percent. This metric doesn't allow year-over-year comparisons.

Instead, healthcare researchers use a slightly modified definition of obesity. A child is considered obese if their BMI is greater than 95 percent of a previous year's children's BMIs.
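To make the difference between the two definitions concrete, here is a minimal Python sketch of a literal reading of the rule. It is not the method of any particular study: the function names are hypothetical, the numbers are toy values rather than real BMIs, and it ignores the age and sex matching entirely.

```python
def is_obese(bmi, base_bmis, cutoff_pct=95):
    """True if `bmi` is greater than at least `cutoff_pct` percent of the
    values in the reference group `base_bmis` (hypothetical helper)."""
    below = sum(1 for b in base_bmis if b < bmi)
    # Integer comparison avoids floating-point rounding at the boundary.
    return below * 100 >= cutoff_pct * len(base_bmis)


def prevalence(current_bmis, base_bmis, cutoff_pct=95):
    """Share of `current_bmis` classified as obese relative to `base_bmis`."""
    flagged = sum(is_obese(b, base_bmis, cutoff_pct) for b in current_bmis)
    return flagged / len(current_bmis)


# Toy numbers only, chosen so the percentiles are easy to read off.
base_year = list(range(1, 101))          # 100 distinct values: 1..100
later_year = [b - 1 for b in base_year]  # everyone shifted down by one

# Compared against itself, a group is always about 5 percent "obese".
same_year_rate = prevalence(base_year, base_year)

# Compared against a fixed base year, the rate can move.
vs_base_rate = prevalence(later_year, base_year)

print(same_year_rate, vs_base_rate)
```

In this toy setup the same-year rate is pinned at 5 percent, while the rate measured against the fixed base year drops when the whole distribution shifts down. That is the property that makes year-over-year comparisons possible at all.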

This modified definition allows some year-over-year comparisons, but should be taken with a grain of salt. This definition of obesity depends on a lot of totally subjective choices: in particular, researchers pick the base year against which to define childhood obesity, and they also decide how to match age groups (should we compare a 5 year old to only other 5 year olds, or lump them in with 2-4 year olds?). What this means is that there are actually way more than 171 ways to code for childhood obesity, so "statistical significance" at the 95 percent confidence level may not mean anything at all. And besides, with a measure as subjective as that, it's really hard to determine whether the prevalence of childhood obesity is high or low. All the data really tell us is how it compares to years past.

Now, as it happens, I've been working with the NHANES dataset recently, which happens to include BMI data for a large number of children over the past decade:

A slight decrease in average childhood BMI over time.

I've just computed the raw average BMI for each of the previous six NHANES panels. Average BMI is a more objective measure than "childhood obesity," but also less meaningful because it doesn't tell us much about the distribution of BMI--we really want a measure of the fatness of the right tail of the distribution. Nevertheless, assuming a relatively constant age and sex distribution across time, this looks to me like a slight decrease over time. Probably not a 43 percent reduction in childhood obesity, but it's at least consistent with some slight decrease in childhood obesity since 2003.

[*]It's just combinatorics. In NHANES and other surveys, age is a count variable numbered in years, so there are 18 possible values for children. Researchers must then define age-bands within which to define obesity. An age-band consists of a lower and upper bound (i.e., 2 to 5 year olds), so, choosing two of the 18 ages with replacement (the two bounds may coincide), there are (19 choose 2) = 18*19/2 = 171 ways to define a particular age group. The probability of being able to find an age grouping that will show a "statistically significant" difference, conditional on there being no change in the population, is actually quite high.
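The count in this footnote can be checked by brute-force enumeration. The sketch below assumes the 18 ages are 0 through 17 and treats a band as a (lower, upper) pair with lower <= upper, so single-year bands such as "5 to 5" are included.

```python
from itertools import combinations_with_replacement
from math import comb

# Assumption: "children" means ages 0 through 17, i.e. 18 integer ages.
ages = range(18)

# Every (lower, upper) pair with lower <= upper is one possible age band.
age_bands = list(combinations_with_replacement(ages, 2))

n_bands = len(age_bands)              # bands, single-year bands included
n_closed_form = comb(19, 2)           # same count via 19 choose 2
n_without_single_year = comb(18, 2)   # count if lower < upper is required

print(n_bands, n_closed_form, n_without_single_year)
```

Enumeration gives 171 bands, matching 19 choose 2. Requiring the two bounds to differ would give 18 choose 2 = 153 instead.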