Thursday, October 30, 2014

Some ebola quarantine math

22% of all cases were in the past week. This epidemic isn't over yet.
You've probably heard of the case of Kaci Hickox, the nurse who is currently quarantined under mandatory voluntary1 house arrest in Maine for 21 days after she returned from fighting ebola in west Africa, even though she tested negative for ebola, has no symptoms, and is not infectious in any way. Several states, beginning with New York and New Jersey, have promulgated these mandatory quarantines of all people returning to the US from west Africa despite the lack of evidence that these are medically necessary, and despite unanimous opposition from medical experts. The argument against the quarantines that I see most often is that
  • Asymptomatic individuals are not infectious even if they have ebola, and
  • Medically unnecessary quarantines would both harm and discourage desperately needed healthcare workers from joining the fight against ebola where it is most important: in west Africa
These are good points, but leave out a lot from the cost-benefit calculus, I think.

One comment I've been making in various forums is that quarantines aren't even that effective to begin with. Some 13,703 people have been infected with ebola during this epidemic, of which 521 were healthcare workers treating quarantined ebola patients. Not all infected health care workers got ebola from these patients necessarily, but the vast majority of them did. Any way you look at it, that is an astonishingly high quarantine failure rate. And don't think this is a west Africa thing: there have been two ebola transmissions within the US, and in both cases the ebola patient was quarantined at the time the transmission happened.

The ineffectiveness of quarantines further undermines the argument for mandatory quarantines of low-risk patients. To see why, here's the math. Let [$]Q[$] represent the marginal social and economic cost of quarantining an individual while [$]p[$] is the probability that an individual returning from west Africa has ebola, [$]t_u[$] is the probability that he will infect someone else given that he is not quarantined, [$]t_q[$] is the probability that he will infect others if he is quarantined, and [$]I[$] is the social and economic cost if an infection occurs. Thus, given the probability of having ebola [$]p[$] and transmission rate [$]t,[$] an individual has an expected infection cost of [$]Ipt.[$] The marginal benefit of quarantining returning health care workers from west Africa is the decrease in the expected social and economic costs from infecting others: [$]Ipt_u-Ipt_q.[$] It is beneficial to quarantine this individual if and only if the marginal benefit exceeds the costs of the quarantine itself: [$$]Ip\left(t_u-t_q\right) \geq Q.[$$] Setting this out explicitly helps clarify our thinking about the costs and benefits of a quarantine. A smaller probability of having ebola, for example, makes it less likely that a quarantine will be beneficial, as does having a lower probability of being infectious. In particular, the higher [$]t_q[$] is, the smaller the benefit from quarantining and the more likely it is that the quarantine will do more harm than good.
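The decision rule can be written out directly as code. A minimal sketch (the numbers plugged in below are hypothetical placeholders, not estimates of any of these quantities):

```python
def quarantine_net_benefit(I, p, t_u, t_q, Q):
    """Marginal benefit of a quarantine minus its marginal cost.

    I   -- social and economic cost of one onward infection
    p   -- probability the returning individual has ebola
    t_u -- probability of transmission if not quarantined
    t_q -- probability of transmission if quarantined
    Q   -- social and economic cost of the quarantine itself
    """
    return I * p * (t_u - t_q) - Q

# Hypothetical numbers, purely for illustration: even with a very large
# infection cost I, a small p and a modest (t_u - t_q) gap leave the
# quarantine a net loss.
print(quarantine_net_benefit(I=1_000_000, p=0.0006, t_u=0.4, t_q=0.3, Q=10_000))
```

A positive result says quarantine; a negative result says the quarantine does more harm than good.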

First, note that the probability that an individual arriving from west Africa has ebola--[$]p[$]--is still pretty low. The population of the three countries we are talking about--Guinea, Sierra Leone, and Liberia--is 22.136 million people, so the raw probability of a person from the heart of the epidemic having ebola is roughly 0.0006 (13,703 cases out of 22.136 million people). I suspect most people think that [$]t_u[$] is high--that they are likely to get ebola from someone who has it--and that [$]t_q[$] is zero--that putting someone in quarantine means they can't spread the disease. Neither is true. Out of thousands of contacts between the time of infection and the time of either death or recovery, the average ebola patient transmits the infection only twice--compared to 4 for HIV or 18 for measles, for some context--making for a relatively small transmission rate outside of quarantine. And the fact that some 521 out of roughly 3,000 health care workers--a much higher rate than the general public--got infected despite quarantines in place suggests that [$]t_q[$] is not that much lower than [$]t_u[$]. The reduction in the probability of transmission is [$]p\left(t_u-t_q\right),[$] and we've established that [$]t_q[$] is relatively high (certainly way higher than zero), so a quarantine can only be justified if [$]p[$] and [$]t_u[$] are large. That is, quarantines are only beneficial when a person is very likely to transmit an infection.

But of course, that is almost irrelevant given the added fact that asymptomatic individuals who have ebola are not infectious. Hence, [$]t_u=0[$] for the people we are talking about, and as a result, regardless of [$]I,[$] [$]p,[$] or [$]t_q,[$] it can't be net beneficial to quarantine these asymptomatic individuals.

So the case against quarantines of asymptomatic individuals returning from west Africa is
  1. The probability that they have ebola is small
  2. They aren't infectious even if they did have ebola
  3. Quarantines would not be that effective even if the subjects were infectious
  4. Quarantines are economically costly
  5. Quarantines discourage much-needed healthcare workers from volunteering to help fight ebola in Africa
  6. Discouraging health workers imposes a high social cost because fighting ebola at the source in Sierra Leone, Liberia, and Guinea is by far the most effective way to stop the epidemic.


1 Not a typo.

Tuesday, October 28, 2014

Theory says Nate Silver is full of crap

...not just Nate Silver.

I keep seeing these Senate forecasts all over, and can't make heads or tails of them:
"Probability" of GOP winning control of Congress, according to several experts.
Now I'm a fairly competent forecaster, and I don't know what that means. You can't observe a probability of something, and therefore you can't forecast the probability of something. Matt Yglesias thought he was waxing philosophical when he pointed this out, but it's actually a rigorous statistical critique--the "probabilities" we compute in statistical procedures are nothing more than artifacts of the procedure used, not real world objects that can be forecast.

The Bayesians have all finally gone crazy. A forecast consists of a range of outcomes at a given, pre-determined probability level. For example, "we are 95% confident that the GOP will win between 45 and 55 Senate seats" is a forecast. This claim is falsifiable--if the GOP gets fewer than 45 seats or more than 55 seats our forecast is incorrect. This claim has a margin of error. This claim is scientific.

The pollsters, by contrast, have all gotten it exactly backwards by picking a pre-determined outcome--GOP gets 51 Senate seats--and giving us a probability. How do we validate such a claim? We can't even falsify it, because if Republicans don't take control of Congress, these pollsters will all simply say their probabilities were correct and that something improbable happened. We aren't even given a margin of error, which is all the more troubling given the huge range--from 55% to 93%--of forecast probabilities!
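The contrast is easy to make concrete. An interval forecast can be checked mechanically against the outcome; a lone probability number cannot. A hypothetical sketch:

```python
def interval_forecast_correct(lo, hi, outcome):
    """An interval forecast is falsifiable: once the votes are
    counted, the claim is simply true or false."""
    return lo <= outcome <= hi

# "We are 95% confident that the GOP will win between 45 and 55 seats."
print(interval_forecast_correct(45, 55, 52))  # True: the forecast survives
print(interval_forecast_correct(45, 55, 44))  # False: the forecast is falsified

# "There is a 76% chance the GOP takes the Senate" admits no analogous
# check: either outcome is consistent with the claim.
```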

At the same time, it's obvious why they are reporting probabilities instead of forecasts. It's because, with the possible exception of the Washington Post, none of these forecasts are statistically significant. Science typically requires 95% confidence (or, in the Bayesian world, "probability") for a prediction to count as significant. None of these models produces a statistically significant prediction of who will control Congress.

A recent paper in AER explains why "experts" would make such a vacuous prediction. The authors present a model of an "expert" who makes forecasts about a sequence of future events, and a client who tests these forecasts against the data. Even though the client chooses how to test the expert's predictions, it is usually possible for the expert to make forecasts that pass the test even when the expert has no expertise whatsoever. In our case, the implicit "test" of the forecasts is whether the actual outcome was likely given their forecasts, which is pretty darn easy to pass even without looking at any polls--the standard scientific threshold for an unlikely event is a probability less than 5%, and all of these models are still saying that a non-GOP Congress is a likely outcome. They can't be falsified. Even if we change the significance threshold, the pollsters could simply change their forecasts so that they still can't be falsified.

The alternative to this is claim validation. The AER paper shows that if you really want to know who is full of BS, you make them validate their claims. Make them give an actual prediction of how many GOP congressmen and Senators there will be, and if the actual number is not in their predicted range, call them out for being wrong. Although this is not strictly logical--one can falsify but never truly validate--the authors show that because forecasts are endogenous to the tests we apply to them, only a standard of validation can actually force forecasters to produce truly falsifiable claims.
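To see how weak the implicit likelihood test is, note that with a single binary outcome, a forecast only "fails" if it assigned less than 5% probability to what actually happened. A sketch:

```python
def passes_likelihood_test(prob_assigned_to_actual_outcome, alpha=0.05):
    """The forecast is rejected only if it called the actual outcome
    an 'unlikely event' (probability below alpha)."""
    return prob_assigned_to_actual_outcome >= alpha

# A 60/40 forecast passes no matter which side wins:
print(passes_likelihood_test(0.60))  # True
print(passes_likelihood_test(0.40))  # True
# Only a near-certain forecast can ever be falsified by a single election:
print(passes_likelihood_test(0.03))  # False
```

Any forecaster who keeps their announced probability comfortably inside (5%, 95%) can never be caught out by one election, whatever happens.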

Until then, I'm calling them out. All the pollsters are full of crap. That isn't how you do statistics.

Wednesday, October 22, 2014

What happens to health during recessions? (continued)

A quick follow-up to my previous post. Austin Frakt pointed me to another study finding that while recessions appear to slightly improve health for working-age adults, mortality and healthcare utilization by Medicare-age seniors rises during recessions. Both Frakt and Jason Shaffrin discuss medical crowding out as a possible cause for this trend:
"Unlike younger adults who seem to use less healthcare as unemployment rates rise, seniors use more inpatient care…the rise in healthcare use may be tied to an increased willingness of healthcare providers to accept Medicare patients."
Ok, that explains healthcare utilization, but not mortality, unless going to the doctor's office increases mortality risk.

And here is yet another trend: men got more vasectomies during this recession.
"In 2006, 3.9 percent of men said they had had a vasectomy; in 2010, 4.4 percent reported having the surgery. That means an additional 150,000 to 180,000 men per year had vasectomies in each year of the recession."
I'd guess the financial hardship reminded a lot of men that having children is expensive.

Tuesday, October 21, 2014

What happens to health during recessions?

From Eagan et al, comparison of actual GDP to health-adjusted GDP, using selective VSL estimates that are way too high.
A few recent studies have me puzzled. The first is Eagan, Mulligan, and Phillipson (2014) on mortality over the business cycle, which documents that the mortality rate is pro-cyclical, with deaths increasing in expansions and decreasing in recessions. The effect was so large in 2009 that the health value of the lives saved offset a majority of the loss to GDP! Eagan et al aren't the first to note this trend, and some plausible theories have been proposed to explain it: work-place injuries and commuter traffic accidents, for example, increase when more people have jobs, resulting in higher fatalities. And yet, here's a research letter in JAMA saying that there was a marked increase in Emergency Department visits by children coinciding with the recession in 2009. So did health improve during the recession or not? You could spin a story in which recessions both improve health and increase ED use--by, for example, reducing traffic accidents and workplace injuries while also reducing health insurance coverage, which many believe increases ED use (though, see here). Or perhaps this is just an age thing: the Eagan et al result is heavily weighted towards older adults, who are the only demographic with any substantial mortality risk in their data, whereas JAMA looked at children, and if becoming unemployed causes the reduction in mortality, then this would explain why the cyclicality differs between children and adults.

But here's a third article in American Journal of Public Health finding that becoming unemployed significantly increases an individual's mortality risk during recessions but slightly decreases their risk during expansions! So we have a compositional puzzle on our hands:
  1. Employment decreases during recessions, and
  2. Aggregate mortality decreases during recessions, but
  3. Losing employment increases individual mortality risk
So which is it? Do recessions increase health or decrease it? And how can aggregate mortality decrease if individual mortality increases?
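The three facts are at least arithmetically compatible. A toy illustration (all numbers invented purely to show the composition effect, not estimates): if employment itself becomes safer during a recession, aggregate mortality can fall even while the newly unemployed face higher risk.

```python
def aggregate_mortality(share_employed, m_employed, m_unemployed):
    """Population mortality as an employment-weighted average."""
    return share_employed * m_employed + (1 - share_employed) * m_unemployed

# Expansion: 95% employed; mortality 1.0% employed, 0.9% unemployed.
# Recession: 90% employed; employment itself gets safer (0.8%), but
# losing a job now raises individual risk (1.1%).
expansion = aggregate_mortality(0.95, 0.010, 0.009)
recession = aggregate_mortality(0.90, 0.008, 0.011)
print(round(expansion, 5))  # 0.00995
print(round(recession, 5))  # 0.0083
# Aggregate mortality fell even though unemployment rose and the
# unemployed now die at a higher rate than the employed.
```

This doesn't tell us which story is true, only that the aggregate and individual findings need not contradict each other.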

Friday, October 17, 2014

On object oriented programming

tl;dr() return true;

As a result of a recent promotion, I've been brushing up on my programming skills. By far the most dominant programming paradigm today is "object oriented programming" (OOP), which is a style of coding that aims to represent programs as data structures contained in objects. What is an object? Yes.

According to Smash Company, it turns out that "Object Oriented Programming is an expensive disaster which must end." The article is quite lengthy and well informed, but I think it is at once too harsh on OOP and not harsh enough.

In OOP languages everything in a program is an "object"--your program itself is an object, all of the components of your program are objects, all of the components of those components are objects--"it's objects all the way down," as the saying goes. This is where I think Smash Company was too harsh: the generic and abstract nature of the "objects" in OOP languages makes them extraordinarily flexible, allowing these languages to support a diverse range of programming styles. The article's real target is not so much OOP languages themselves as the programming style that is also known as "Object Oriented Programming." I think this is something of a misnomer. The programming style that has come to be associated with OOP should more precisely be called "Class-Oriented Programming," because its central tenet is that all data and functions in your code should be organized in a specific type of object called a class. Indeed, Bjarne Stroustrup, who invented the first popular OOP language, originally called his creation "C with Classes" before settling on the name "C++."

While you can do Class-Oriented Programming (COP) in languages like Python and Javascript, these are more general languages than the core COP languages of C++, C#, and Java, which more or less torture programmers into adopting the COP style. In COP, the aim is to put all your code into classes that are separate from each other and to minimize the amount of code inside the main section of your program. The classes themselves shouldn't do anything other than store your initial data. Instead, to use the data contained in a class, the main program creates "instances" of the class as needed at run time and performs all operations on these instances rather than on the classes themselves. The advantage of this is that it prevents data destruction. In a non-COP program, for example, you might start with a dataset, perform some operations that change the dataset to get the desired output, but then be unable to reuse the original, unmodified dataset because it has now been altered. Ideally, a COP design prevents this problem by allowing you to create additional, separate instances of the original dataset (i.e., the class) for each separate routine you are performing, without ever modifying the original data. This is called "abstraction," one of the three pillars of COP. (Note: I'm using the term "abstraction" throughout this post generically to refer to a few separate but related OOP concepts--abstraction, encapsulation, and information hiding. Sources disagree on the exact definitions, which aren't really relevant here.)

But more likely than not, your program does not consist of doing a series of identical routines on identical datasets, but instead performs distinct routines on a set of related datasets. You could just make separate classes for each type of dataset and leave it at that. But if the types are related to each other, then they probably have a lot of common features that would require you to copy and paste identical chunks of code, and that's just a pain in the butt, especially when, months from now, someone asks for additional parameters to be included in all the data and you have to change every single dataset type you use. For this, COP offers inheritance. You still have to define as many classes as you have types, but if you create one or more additional parent classes where you put all the common code the types share, you can save time by simply telling the child classes to inherit this code from the parent classes. Indeed, "inheritance" is usually cited as the second pillar of COP.

But sometimes your classes are so similar that you just say what the heck and combine them into one, programming the one class to respond differently under different situations, mimicking the multiple classes you eliminated. And programmers are so lazy that this "polymorphism" is actually called the third pillar of COP.
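For readers who haven't met the term, here is a minimal illustration of polymorphism (in Python rather than C#, purely for brevity; the point values are my invention, not part of the deck example below):

```python
class Card:
    def __init__(self, rank):
        self.rank = rank
    def points(self):
        return int(self.rank)   # default behavior: numeric ranks

class FaceCard(Card):
    def points(self):           # same method name, different behavior
        return 10

# One loop handles both kinds of card; each object supplies its own rule.
hand = [Card("7"), FaceCard("Jack")]
print([card.points() for card in hand])  # [7, 10]
```

The caller never asks which kind of card it has; the object "knows" its own rule.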

So how does all this actually fare in practice? Like Smash Company, I'm inclined to conclude that it fares not well. At its core, programming is just set theory. And it turns out that this set is too complicated for the COP paradigm:
There are four classes in your basic Venn diagram. A is the superclass, representing all the things that the two subsets B and C have in common. B and C each inherit from A but have their own distinctive features as well. D inherits from both B and C. Unless you use C#, in which case D is just screwed.
This is where I think Smash Company was not harsh enough: inheritance, at least as it is implemented in COP is bad bad bad.

Let's explore why with a simple application. For reasons no one is entirely happy with, my employer uses C#, so we'll use C# to construct a standard 52-card deck of playing cards. The goal is to use the COP paradigm to produce 52 objects representing the 52 playing cards--that the goal is to produce objects means we are already biasing this exercise in favor of COP. Each card bears the identifying features of a playing card--color, rank, and symbol--and does stuff for game play. So let's rewrite that in C# notation:
abstract class Card {
    private string symbol;
    private string rank;
    private string color;
    public string[] deal(){
        return new string[]{color,rank,symbol};
    }
    public void show() {
        Console.Write("\n{0} {1} of {2}", color, rank, symbol);
    }
    public Card(string c,string s,string r) {
        color=c;
        symbol=s;
        rank=r;
    }
}
Ok, so all I've done here is create a dataset describing what a card is like: each card has three data fields called symbol, rank, and color, which are left blank for now, as well as three methods called deal(), show(), and Card(). Technically, there are no functions in C#; all behaviors are carried out by methods. What is a method? It's a function.

Two of the methods in the class Card, deal() and show(), are used in game play, while the third is called a constructor--its purpose is to create objects of the type Card, and those objects are called "instances" of the Card class. Remember, the code above isn't a card; it creates cards with all of those specified features. We haven't actually created anything yet.

Now, no card player ever says "this is a card that has the color red, rank Jack, and symbol Diamond"--rather we say "this is a red Jack of Diamonds." In COP, anything that fills the blank "is a ____" should be described by a class, while "has a ____" are things that should be properties of the class. So we can divide Card into two subclasses, Red Card and Black Card which are themselves classes:
abstract class Red:Card {
    public Red(string s, string r)
        :base("Red",s,r)
    {
       
    }
}
abstract class Black:Card {
    public Black(string s, string r)
        :base("Black",s,r)
    {
       
    }
}

These two new classes, Red and Black, inherit all the fields and methods of the Card class, but differ in the colors they assign to cards. We can go further, because there are two kinds of Red cards, Diamonds and Hearts, and two kinds of Black cards, Clubs and Spades. Thus we have four more classes, inheriting from Red and Black and, by transitivity, from Card as well:
class Heart : Red
{
    public Heart(string r) : base("Hearts", r) { }
}
class Diamond : Red
{
    public Diamond(string r) : base("Diamonds", r) { }
}
class Spade : Black
{
    public Spade(string r) : base("Spades", r) { }
}
class Club : Black
{
    public Club(string r) : base("Clubs", r) { }
}

Ok, we've still not created any cards. But we have created various classes that describe some of the data that makes up each card, and so far we've done it without having to copy and paste anything, because our inheritance classes let us apply all the common features to their respective cards without needing to mindlessly reuse code. This is good. But now we're stuck. We still have 13 more classes--2 through 10, plus Jack, Queen, King, and Ace--but no way to represent them as classes in the COP paradigm, because of inheritance.
All programming is set theory. The COP paradigm is an incomplete representation of set theory.
A Jack of Hearts, for example, should inherit from the classes Jack and Heart, but COP does not support instantiating a single object from more than one class. Even if we wanted to create a class called JackOfHearts, C# does not allow multiple inheritance, so class JackOfHearts : Jack, Heart won't even compile. It's true that other languages like C++ do allow multiple inheritance at the class level, but this doesn't really help us, as it would mean coding 72 separate classes to generate 52 playing cards.
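For contrast, here is roughly what the missing combination looks like in a language that does allow multiple inheritance (Python, not the C# we're stuck with; class names are mine):

```python
class Heart:
    color, suit = "Red", "Hearts"

class Jack:
    rank = "Jack"

# A single class can inherit from both a rank class and a suit class...
class JackOfHearts(Jack, Heart):
    pass

card = JackOfHearts()
print(card.color, card.rank, "of", card.suit)  # Red Jack of Hearts
# ...though we would still need a class per rank, per suit, and per
# combination--the 72-class problem doesn't go away.
```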

It's true that there are some silly tricks you can do with interfaces here, but they are exactly that: stupid hacks that probably shouldn't work but do. Here's what I will do. Instead of creating classes to represent the actual ranks of the cards, I will create one more class, representing the concept of a deck of cards. This final class will contain the instructions for making all 52 cards from the 7 classes we've already defined. Here it is:
class Deck :List<Card>
{
    private string[] rankList = {"Ace","King","Queen","Jack","10","9","8","7","6","5","4","3","2"};
    public Deck(){
        for(int i=0;i < 13; i++){
            base.Add(new Heart(rankList[i]));
            base.Add(new Diamond(rankList[i]));
            base.Add(new Spade(rankList[i]));
            base.Add(new Club(rankList[i]));
        }
    }
}
What this does is create a class that follows the structure of a List, which is a pre-defined class in C#. The elements of the list will be our 52 card deck.

So that's all the classes. All of the data needed to make a 52-card deck are contained in those 8 classes, and we wrote this with hardly any repeated code. But we still don't have our deck of cards; these classes are merely instructions to the computer about how it would go about making the cards. To actually tell the computer to make them, we have a 9th class--common to all C# programs--that contains a static void Main() method. When you run the application, it starts by running this method and then follows your instructions to all the various classes and methods in the order that you specify within Main(). At this point, creating the deck is as simple as Deck myDeck = new Deck(); We can now use this deck for card playing. You can imagine doing something fun here, but in our case we will merely read all of the cards off into the console:
for(int i=0; i < myDeck.Count; i++){
   myDeck[i].show();
}

Putting all the code together now:
using System;
using System.Collections.Generic;
namespace cards
{
abstract class Card {
   private string symbol;
   private string rank;
   private string color;
   public string[] deal(){
      return new string[]{color,rank,symbol};
   }
   public void show() {
      Console.Write("\n{0} {1} of {2}", color, rank, symbol);
   }
   public Card(string c,string s,string r) {
      color=c;
      symbol=s;
      rank=r;
   }
}
abstract class Red:Card {
   public Red(string s, string r):base("Red",s,r){}
}
abstract class Black:Card {
   public Black(string s, string r):base("Black",s,r){}
}
class Heart : Red
{
public Heart(string r) : base("Hearts", r) { }
}
class Diamond : Red
{
public Diamond(string r) : base("Diamonds", r) { }
}
class Spade : Black
{
public Spade(string r) : base("Spades", r) { }
}
class Club : Black
{
public Club(string r) : base("Clubs", r) { }
}
class Deck :List<Card>
{
   private string[] rankList = { "Ace", "King", "Queen", "Jack", "10", "9", "8", "7", "6", "5", "4", "3", "2" };
   public Deck(){
      for(int i=0;i < 13; i++){
         base.Add(new Heart(rankList[i]));
         base.Add(new Diamond(rankList[i]));
         base.Add(new Spade(rankList[i]));
         base.Add(new Club(rankList[i]));
      }
   }
}

class Program
{
   static void Main(string[] args)
   {
      Deck newdeck=new Deck();
      foreach(Card i in newdeck){
         i.show();
      }
      Console.ReadKey();
   }
}
}





This code has some admirable features. For one thing, we've achieved a significant amount of data abstraction: almost everything in our code is protected, so that computer, programmer, or user errors cannot corrupt the data. The color, rank, and symbol data for all of the cards, for example, are private variables that cannot be altered once the cards are created. Moreover, the Red, Black, and Card classes are all abstract, meaning that we cannot instantiate Red cards that aren't Hearts or Diamonds, for example. The abstraction of this code into modular classes also makes it exceedingly easy to add new types of cards--adding Green Flowers as a new suit alongside the traditional Hearts, Diamonds, Clubs, and Spades is as simple as adding Green and Flower:Green classes and inserting one line into the Deck class. And, at least for our purposes, adding another rank--say, an 11 to follow the traditional 2 through 10--is as easy as inserting "11" into the rankList array.

At the same time, this code represents a complete failure of the COP paradigm to achieve its goal. We aren't actually creating the right objects. We aren't, for example, creating a Red Jack of Hearts, but rather a Red Heart that has a Jack. The only COP way to produce a true Red Jack of Hearts requires more classes than we have cards, and isn't even supported in all the major COP languages. And the level of abstraction here could be better: this code gives both the Deck class and the Main() method direct access to the Heart, Diamond, Spade, and Club classes, which ideally would be hidden implementation details. This exposes our code unnecessarily to additional possible sources of error that--in a larger program--could be extremely hard to track down, because they could originate anywhere in the code. Moreover, we are calling constructors from inside the Deck class, which is bad according to COP.

The failure of COP to actually be capable of representing a standard 52-card deck is a bit comical. This is programming, after all, and computing repetitive patterns of data in a systematic way is what computers are supposed to be good at. Hardly anything in the real world is as systematically patterned as the 52 card playing deck where, although each card is unique, no card contains unique attributes and all of the attributes are assigned according to fixed, very simple rules. I should be able to tell the computer what these simple rules are, and get it to do the rest.

But isn't a simple rule...wait for it...just a function? YES! In fact, if we adopt a more functional paradigm instead of full-out COP, multiple inheritance is no problem at all. Consider the following sketch in a generic non-COP language (the syntax here is JavaScript-like):
function setA(){
   var obj={};
   obj.prop1= <properties of set A> ;
   return obj;
}
function setB(obj){
   obj.prop2= <properties of set B>;
   return obj;
}
function setC(obj){
   obj.prop3= <properties of set C>;
   return obj;
}
function main(){
   var b=setB(setA());
   var c=setC(setA());
   var d=setC(setB(setA()));
   return {b:b, c:c, d:d};
}
var closure=main();
We just constructed what COP could not. We have created three objects: object b is an element of sets A and B from the Venn diagram above, object c is an element of sets A and C, while object d is an element of A,B, and C, corresponding to the set D in the diagram. This is where Smash Company was not harsh enough. We want inheritance, but we do not want hierarchical data structures. The real world does not consist of coverings of mutually disjoint sets. In the real world, data is not hierarchical.

So, with the functional approach we've achieved a stronger version of inheritance than COP allows for. We've fully protected and abstracted all our data by using local variables defined inside functions, along with closures, and we don't need any darn polymorphism because our functions are not loitering inside objects pretending to be "methods." Adding more types is still just as easy as inserting more functions. The code is still totally modular and serviceable. And we still ended up with objects as the result of our initialization procedures. My point is that the functional paradigm does object oriented programming better than the dominant COP paradigm.

And yes, in the real world an object can belong to both the class of objects that are "too harsh" and the class of objects that are "not harsh enough."