
## Wednesday, March 5, 2014

### Federal Spending

For no particular reason, here's a graph of federal spending:

If you're wondering why it stops at 2010, the answer is that I made this over two years ago, and I'm only posting it now because I happened across it while organizing some files. I've broken down the spending data in various ways: red regions represent Republican presidencies while blue regions represent Democratic ones. You can also see that, these days, most federal spending is in the form of transfer payments, and that almost all the rest is the military--a perfect illustration of Krugman's point that the federal government is just an insurance company with an army. I will leave it to my readers to infer how party affiliations affect spending policies.

## Saturday, March 1, 2014

### Childhood obesity is a tricky concept to define

Slate has some righteous indignation over all the media outlets reporting that a study found a 43 percent decline in childhood obesity rates over the past decade. In contrast to the media headlines, the study really found "no significant changes in obesity prevalence in youth or adults between 2003-2004 and 2011-2012."

So what's up with the headlines? They all come from one tiny part of the paper where they found a "statistically significant" 43% decrease in obesity rates among children aged 2 to 5 between 2003 and 2012.

This is a good opportunity to talk about:

1. what it means to be statistically significant
2. how we define childhood obesity

On the first point, what you need to recognize is that there are 171[*] different ways to define "child" in the dataset the researchers used. That means that even if there were no difference in population childhood obesity rates, at the 95 percent confidence level we'd expect to find a "statistically significant" difference (i.e., a false positive) for roughly 8 or 9 of these possible definitions (171 × 0.05 ≈ 8.6). The study in question found one, when "child" is defined as 2 to 5 year olds. Basically, what it boils down to is this: if you recode your variables enough, you'll almost always be able to get a false positive.

But really I tend to be skeptical of childhood obesity studies because of the second point. In adults, obesity is defined as being over a specific, time-invariant body mass index (BMI) threshold, usually 30. This in itself is problematic because many muscular people can be exceptionally healthy and have a BMI above 30, while a flabby couch potato can easily have diabetes and obesity-related heart attacks with a BMI below 30. These issues are just as much of a problem when we talk about childhood obesity, but they are overshadowed by an even bigger problem for children: for people under age 18, the normal range for healthy BMI varies wildly with age.

In children, what counts as "obese" depends on exactly how old the child is. Pediatricians diagnose a child as obese by comparing them to other children of the same age and sex (boys and girls mature at different rates too!). A common rule of thumb is that if they have a higher BMI than 95 percent of other children of the same age and sex, they are considered obese. Now that's a problem for healthcare researchers, because it means that, by definition, the prevalence of childhood obesity is always 5 percent. This metric doesn't allow year-over-year comparisons.

Instead, healthcare researchers use a slightly modified definition of obesity. A child is considered obese if their BMI is higher than that of 95 percent of the children measured in some previous reference year.

This modified definition allows some year-over-year comparisons, but should be taken with a grain of salt. This definition of obesity depends on a lot of totally subjective choices: in particular, researchers pick the base year against which to define childhood obesity, and they also decide how to match age groups (should we compare a 5 year old to only other 5 year olds, or lump them in with 2-4 year olds?). What this means is that there are actually way more than 171 ways to code for childhood obesity, so "statistical significance" at the 95 percent confidence level may not mean anything at all. And besides, with a measure as subjective as that, it's really hard to determine whether the prevalence of childhood obesity is high or low. All the data really tell us is how it compares to years past.
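The reference-year definition described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the BMI values are made up (they are not NHANES data), the nearest-rank percentile is one of several possible conventions, and the sketch ignores the age and sex matching that real studies perform. The function names are hypothetical.

```python
import math


def nearest_rank_percentile(values, pct):
    """Smallest value with at least pct percent of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def obesity_prevalence(current_bmis, reference_bmis, pct=95):
    """Share of current_bmis above the pct-th percentile of reference_bmis."""
    threshold = nearest_rank_percentile(reference_bmis, pct)
    return sum(bmi > threshold for bmi in current_bmis) / len(current_bmis)


# Made-up BMI values for illustration only; these are not NHANES data.
base_year = [14.0 + 0.5 * i for i in range(20)]   # 14.0, 14.5, ..., 23.5
later_year = [bmi + 1.0 for bmi in base_year]     # same shape, shifted up by 1

print(obesity_prevalence(base_year, base_year))    # 0.05
print(obesity_prevalence(later_year, base_year))   # 0.15
print(obesity_prevalence(later_year, later_year))  # 0.05
```

Measured against its own year, each sample is 5 percent "obese" by construction; the later sample only looks different once it is compared to the earlier base year, which is why the choice of base year matters so much.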

Now, as it happens, I've been working with the NHANES dataset recently, which happens to include BMI data for a large number of children over the past decade:

I've just computed the raw average BMI for each of the previous six NHANES panels. Average BMI is a more objective measure than "childhood obesity," but also less meaningful because it doesn't tell us much about the distribution of BMI--we really want a measure of the fatness of the right tail of the distribution. Nevertheless, assuming a relatively constant age and sex distribution across time, this looks to me like a slight decrease over time. Probably not a 43 percent reduction in childhood obesity, but it's at least consistent with some slight decrease in childhood obesity since 2003.

[*]It's just combinatorics. In NHANES and other surveys, age is a count variable numbered in years, so there are 18 possible values for children. Researchers must then define age-bands within which to define obesity. An age-band consists of a lower and an upper bound (e.g., 2 to 5 year olds). Allowing the two bounds to coincide (that is, choosing with replacement), there are (19 choose 2) = 18 × 19 / 2 = 171 ways to define a particular age group. The probability of being able to find an age grouping that will show a "statistically significant" difference, conditional on there being no change in the population, is actually quite high.
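The count in the footnote above is easy to check by brute force. The sketch below assumes whole-year ages 0 through 17 and a 5 percent significance level. The last step additionally assumes the tests are independent, which overlapping age-bands are not, so treat that figure as a rough indication rather than an exact probability.

```python
from itertools import combinations_with_replacement
from math import comb

ages = range(18)  # assumption: whole-year ages 0 through 17

# An age-band is a (lower, upper) pair with lower <= upper.
bands = list(combinations_with_replacement(ages, 2))
print(len(bands), comb(19, 2))  # 171 171

alpha = 0.05
expected_false_positives = alpha * len(bands)
print(round(expected_false_positives, 2))  # 8.55

# Assumption: independent tests (overlapping bands are actually correlated).
p_at_least_one = 1 - (1 - alpha) ** len(bands)
print(p_at_least_one > 0.999)  # True
```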

## Friday, February 28, 2014

### Studies probably understate the cost of ER misuse

I've been thinking about Emergency Room (ER) misuse lately. It seems to me that many of the studies out there probably understate the cost of ER misuse.

Most studies of healthcare costs use "charges" as the dependent variable. Charges are the prices that hospital pricers have negotiated with various individuals and insurers, representing the actual charge to the healthcare recipient. This data is easy to get but not particularly meaningful because in reality charges are not perfectly correlated with the costs of treatment--the hospital's markups over cost vary widely both by procedure and by recipient, influenced by things such as elasticity of demand and bargaining power. Relatively costly procedures with cheaper alternatives like, say, proton-beam therapies, may have relatively small margins over costs, while little everyday things like aspirin have massive margins relative to their tiny costs. (Even worse, of course, are studies that use list prices, which are mostly about bargaining with insurers and bear little relation to costs.)

Some studies do a little extra diligence and get the hospital pricers to divulge the baseline "costs" they use, before markups are factored in. Costs are certainly an improvement over charges, but they aren't really what many researchers seem to think--the hospital pricers didn't literally tabulate the cost of every little thing that goes into a given procedure to come up with the cost, nor did they use any kind of reduced-form modeling to estimate the cost of individual procedures. Actually, what they do is produce a table of relative factor intensities and linearly scale them by the division's total cost. Here's an example of what I mean:

Suppose that the division performs only two types of procedures, called procedure X and procedure Y. They performed procedure X twice last quarter, performed procedure Y only once last quarter, and the division as a whole incurred a grand total of \$100 worth of expenses, including all capital, labor and whatever other costs. The pricing division has determined that procedure Y uses twice as many resources, including all capital and labor inputs, as procedure X uses, which means that Y has a relative factor intensity of 2 (and X has a relative factor intensity of 1). To cover the costs of the division, then, we must have $100=cx+2cy$, where $x$ is the number of X procedures, $y$ is the number of Y procedures, and $c$ is a scalar on the relative factor intensity. X was performed twice and Y once, so $100=2c+2c=4c$, which implies $c=25$. Thus, the "costs" used in all these studies are just the scalar $c$ times the relative factor intensity; in our case, the data set would say that the cost of X is \$25 and the cost of Y is \$50.

My point in all this is that the cost data used in all these studies aren't true costs. They are actually linearly homogeneous factor intensities, and the scalar $c$ above provides absolutely no additional information of any kind. This has important implications for cost comparison studies in situations where costs do not actually scale linearly.

Here's a concrete example. Consider a pediatric hospital that has a clinic for children with sickle cell disease. Sickle cell is a chronic condition with frequent complications such as pain and fever that often, for various reasons, lead patients to be admitted to ERs even though the sickle cell clinic could treat those complications. So we want to know the cost difference between treating a sickle cell patient experiencing pain or fever in the ER versus treating them in the sickle cell clinic. One way of approaching that question would be to estimate the effect of treating sickle cell patients on total division costs for the ER and the sickle cell clinic.

So for the ER, we get a regression of the form:

$$Y_t=\beta_0+\beta_1S_t+\beta_2P_t+\beta_3S_tP_t+\epsilon_t$$

where $Y_t$ is the ER division's total cost in period $t$, $S_t$ is the number of sickle cell patients treated in period $t$, $P_t$ is the number of non-sickle-cell patients treated in period $t$, and $\epsilon_t$ is the error term. Assuming that the necessary OLS assumptions etc. are satisfied, estimating this model lets us easily calculate the average marginal cost (AMC) of treating sickle cell patients in the ER, which is given by

$$AMC_{ER}=\frac{\partial Y_t}{\partial S_t}=\beta_1+\beta_3P_t.$$

That gives us the cost of treating sickle cell patients in the ER, which we can then compare to the cost of treating them in the sickle cell clinic. The sickle cell clinic, obviously, does not treat all the other kinds of patients that the ER treats, so the sickle cell clinic's cost regression looks like this:

$$X_t=\gamma_0+\gamma_1S_t+\mu_t$$

where $X_t$ is the clinic's total costs in period $t$, $S_t$ is the number of sickle cell patients treated in the clinic in period $t$, and $\mu_t$ is the error term. As before, we can compute the average marginal cost of treating a sickle cell patient in the sickle cell clinic as

$$AMC_{SCC}=\frac{\partial X_t}{\partial S_t}=\gamma_1$$

and we can now present the difference in average marginal cost of treating them in the ER rather than the clinic as

$$AMC_{ER}-AMC_{SCC}=\beta_1+\beta_3\bar{P_t}-\gamma_1,$$

where $\bar{P_t}$ is the average value of $P_t$ in the data (i.e., the average number of non-sickle-cell patients treated in the ER each period).

But that's not what existing studies are doing. Existing studies are using the hospital-provided "costs," which are really just linear factor intensities.
Thus, they are estimating a model that looks more like this:

$$Z_i=\gamma_0+\left(\beta_1-\gamma_1\right)S_i+\nu_i$$

where $Z_i$ is the "cost" of treating patient $i$ based on the "cost" data given to them by the hospital pricers, $S_i$ is a dummy variable equal to 1 for sickle cell patients that were treated in the ER and zero for sickle cell patients treated in the sickle cell clinic, and $\nu_i$ is the error term. Thus in these papers,

$$AMC_{ER}-AMC_{SCC}=\beta_1-\gamma_1.$$

My point is that papers that use hospital "costs"--which are really just linear factor intensities--will underestimate the cost of treating patients in the ER by $\beta_3\bar{P_t}$. In economics terms, $\beta_3\bar{P_t}$ is the gains from specialization, or alternatively, the economies of scale. The problem is that linear factor intensities are by definition linear--they will never, ever, let you estimate economies of scale, which are by definition non-linear.

Actually, there are other problems with hospital-provided "costs" as well. For example, "costs" are calculated in a way so that they include a proportion of the division's fixed costs as part of the cost of treating a patient. That means that cost-effectiveness studies that use this "cost" data are actually basing their recommendations on sunk costs--that is, costs that actually would be the same under either policy--which is a huge no-no. The ER is going to be equipped to treat sickle cell patients regardless of whether a particular sickle cell patient goes to the clinic or to the ER, so the fixed costs of equipping the ER to treat sickle cell should not be counted as part of the gains from triaging sickle cell patients to the sickle cell clinic--but by using hospital-provided "costs," much of the literature is implicitly committing this fallacy.

The implication for future research is that we should favor reduced-form estimates of average marginal cost over hospital-provided "costs" per procedure.

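For readers who prefer code to algebra, here is a minimal sketch of the two pieces of arithmetic in this post: the linear allocation that produces hospital "costs" in the X/Y example, and the size of the term that such linear "costs" cannot capture. The function names are hypothetical, and the regression coefficients are made-up numbers chosen only to illustrate the comparison; they are not estimates from any data.

```python
def allocate_costs(total_cost, counts, intensities):
    """Scale relative factor intensities so allocated costs sum to total_cost.

    Solves total_cost = c * sum(count * intensity) for the scalar c and
    returns c together with the per-procedure "cost" c * intensity.
    """
    weighted_volume = sum(counts[name] * intensities[name] for name in counts)
    c = total_cost / weighted_volume
    return c, {name: c * intensities[name] for name in counts}


def amc_gap(beta1, beta3, gamma1, p_bar):
    """ER-minus-clinic average marginal cost: beta1 + beta3 * p_bar - gamma1."""
    return beta1 + beta3 * p_bar - gamma1


# The X/Y example from the post: X performed twice, Y once, $100 in total.
c, costs = allocate_costs(100, {"X": 2, "Y": 1}, {"X": 1, "Y": 2})
print(c, costs)  # 25.0 {'X': 25.0, 'Y': 50.0}

# Hypothetical coefficients, for illustration only.
beta1, beta3, gamma1, p_bar = 300.0, 0.5, 200.0, 100.0
full_gap = amc_gap(beta1, beta3, gamma1, p_bar)  # includes the interaction term
linear_gap = beta1 - gamma1                      # all that linear "costs" can recover
print(full_gap, linear_gap, full_gap - linear_gap)  # 150.0 100.0 50.0
```

With these made-up numbers, a study based on allocated "costs" would report a gap of 100 when the gap implied by the regression is 150; the missing 50 is exactly the interaction term, beta3 times the average number of other ER patients.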
## Thursday, February 27, 2014

### Gradschool is great. Here's how to do it right.

There are a lot of nay-sayers on grad schools these days. Matt Yglesias is one of them. They like to claim that graduate degrees are expensive, teach you nothing, and don't increase your earnings or employment prospects.

Now, I don't doubt that useless graduate degrees exist--don't bother with MBAs from no-name for-profit colleges. But a master's or PhD from a mid-tier or elite-tier university is great. First, the claim that graduate degrees don't pay is simply wrong:

I'm not quite sure what the education nay-sayers think of this chart. Maybe they argue that correlation isn't causation, and that what's actually happening is that better workers are selecting into grad schools. I'm sure that that does explain part of it, but it really doesn't change the calculus at all--by not going to grad school, you are signalling to employers that you are not grad school material. That harms employment prospects even if grad schools have no effect on productivity.

There may be some professions, like blogging for Slate perhaps, where having a graduate degree really doesn't change how much employers are willing to pay. The editors at Slate can easily observe Matt Yglesias's writing quality and web traffic, so they don't need an economics diploma on his wall to know he's a great economics writer (I'm assuming, of course, that Yglesias's leaving Slate has nothing to do with his lack of education). You also don't need a graduate degree to work at McDonalds. But for nearly all mid to upper-level positions at any firm or institution, you will find that having a graduate degree will increase your starting salary by a lot. You'll also find that a lot more people will return your calls when you inquire about job openings.

But more importantly, in my view, the biggest error that these nay-sayers make when they argue against grad school is that they assume grad school is all signalling and no actual learning.
This is clearly false. Getting a graduate degree in computer science really does make you a much better programmer. A master's degree in English really does make you a better writer and editor. In my own case, I'm sure I got the job researching the health economics of pediatric sickle cell at the leading children's hospital in the US in large part because I have the graduate credential in economics from Cornell. But on the other hand, the reason why I'm able to do research at this level is that I studied graduate-level health economics and econometrics at Cornell.

I suspect that Yglesias and others think that your employers will just teach you what you need to know, but this is false--my job isn't something where my manager just points to something and asks me to do it. To a large extent, I'm operating by my own parameters--they expected me to start with enough expertise to marshal these projects from nothingness to publication. Of course, I have great colleagues who are willing to lend their expertise as well when I ask for it, but there are no training sessions, and no one is showing me how or even what to do. I'm expected to have learned all that at grad school, and that's exactly where I learned it. This is all to say that even if grad school weren't causally related to better earnings and employment prospects--though it is--it would still be worthwhile solely for the human capital you get from it.

Something else I want to add here is that statistics on graduate degrees probably understate the gains they offer. That's because most people who go to grad school, especially PhD programs, do so with the aim of doing research afterwards. Research is a notorious public good: it is hugely valuable to society, but usually the people who do it cannot profit from it once it becomes public knowledge, so we end up being severely under-invested in research. As a result, researchers are usually not paid much.
You can graduate from a PhD program in biology and end up a modestly paid minion in some NSF-funded lab somewhere. These workers are motivated in part because they love research, and in part because they hope to break into "the profession" and get a higher-paying academic appointment somewhere with their own lab and minions. But that doesn't change the fact that they had the option of making fantastic salaries elsewhere making biological weapons for DARPA or what have you. Time it right and as a private-sector PhD environmental economist you can become a millionaire from a couple years of contracting with Exxon, if you catch my drift... Point is, low-earning graduate degree holders are typically low-earning by choice.

Gradschool naysayers are being deeply deceptive and unhelpful, biased by their own unusual experience. Fortunately, I have a couple of pointers that may be helpful:

• Many employers will pay for your graduate degree.
For example, if you get a full-time job at Cincinnati Children's Hospital Medical Center right after you graduate with a bachelor's, one of the standard benefits is that they will pay for your graduate courses (at any accredited school), which you can take while working full-time. There is an annual limit to how much they will pay, but I know lots of people who got totally free master's degrees from great universities this way.
• PhDs are about research.
So are many master's degrees. If you are only looking for a credential and really don't like research, find a work-oriented terminal master's program. The only good reason to do a PhD is if you want to do research. End of story.
• Collaborate, collaborate, collaborate.
Here's a big secret: having two papers where you did only half the work really does count as more than one paper where you did all the work. In grad school, your job is to coauthor on as many different papers as possible.
Gradschool is a foot-in-the-door for tons of opportunities that you cannot have any other way, so if you don't get anything out of gradschool, it is entirely your own fault. A good rule of thumb is that if you don't personally know most of the students and all of the professors in your graduate field, you haven't collaborated nearly enough.
• Most universities will let full-time employees complete graduate degrees
I've known several people who ended up with PhDs simply because they worked for the university after finishing their bachelor's. Typically the university will let you take any classes for free (provided there is enough space in the class), and if you impress the right people with your ability to research, they'll give you the degree. Also, working at a university is kind of a free credential when it comes to submitting research to publications--they often won't notice that it doesn't say "assistant professor" after your name.
• Adjunct teaching doesn't pay.
DO NOT MAKE THIS YOUR MAIN SOURCE OF INCOME. Based on how much I've observed adjuncts working, I calculate their hourly wage to be around \$12 an hour at major universities. That said, I do suggest teaching a class or two in addition to a full-time job. It's fun, it's illuminating, it's something you can only do with a graduate degree, and it does pay a little bit.
• Programming and Statistics are the most marketable skills
Don't let yourself leave gradschool until you've mastered programming and statistics. Gradschools have so many resources you may never have again.
• Gradstudent RA positions are kind of crap.
Typical RA jobs are just cleaning data and summarizing journal articles for some well-funded professor somewhere. Which is to say, RAs do more clerical work than research. By contrast, right now, I teach principles of economics courses and do health economics research full-time. That's basically exactly like grad school, except I make twice as much, don't pay tuition, and do actual substantive research for publication where I will be first author--without doing clerical crap for someone else's research. Grad students don't seem to realize that they are allowed to turn down RA positions and seek employment elsewhere, even outside the university. That's hard to do in, say, Ithaca, New York. But if you are a grad student at Ohio State University, I recommend you look for better-paying jobs downtown. Turning down RA funding may cause you to lose the tuition remission too, but remember, many employers will pay your tuition.

## Wednesday, February 26, 2014

### The criticisms of the mammogram study are valid

...but we need to act on the evidence we have.

Aaron Carroll reminds me of the people who were unconvinced by the latest mammogram study that showed, basically, that detecting breast cancer before a noticeable lump exists adds no value. In post after post, Carroll responded to various criticisms of the mammogram study that argued, for example, that the technology in that study was two decades old, and thus the results are not generalizable to today. As Carroll points out, that critique can always be applied to pretty much any large long-term study in medicine; the critique is technically valid, but not a good reason for ignoring the evidence.

It's very true that we are in the process of researching and improving treatment options for breast cancer, and thus what is true about early detection with mammograms today might not be true in the near future. There's no way to know. But there's a right way and a wrong way to handle that uncertainty.

The right way is to demand more mammogram studies, while recommending against universal mammogram screenings in the meantime. Mammogram screenings represent more than minimal risk to patients: even ignoring the substantial economic costs (resources that could have been used more productively), they lead to more biopsies and other procedures in women who do not have cancer, which can cause serious complications in a small fraction of people[*] that should not be ignored in the cost-benefit calculus.

The wrong way to handle this uncertainty is to continue to advocate universal mammogram screening in spite of the fact that the evidence says it does more harm than good.

That's not to say that no one should get mammograms until further notice. People in particularly high-risk subgroups, such as women in families with high rates of breast cancer, should probably still get regular screenings. And generally the first thing you'll do after feeling a lump in the breast is get a mammogram, which may also be a good idea. But, a healthy 40 year old woman with no prior breast cancer risks should not be getting mammograms, until further notice.

[*]I use the gender neutral "people" instead of "women" because breast cancer also occurs in men. Sometimes, mammograms are also performed on men, though this is less common because it tends to be more painful and has less chance of detecting cancer than it does for women.