Separating Hyperplanes
...not just Nate Silver.

I keep seeing these Senate forecasts all over, and can't make heads or tails of them:
"Probability" of GOP winning control of Congress, according to several experts.
Now I'm a fairly competent forecaster, and I don't know what that means. You can't observe a probability of something, and therefore you can't forecast the probability of something. Matt Yglesias thought he was waxing philosophical when he pointed this out, but it's actually a rigorous statistical critique--the "probabilities" we compute in statistical procedures are nothing more than artifacts of the procedure used, not real world objects that can be forecast.

The Bayesians have all finally gone crazy. A forecast consists of a range of outcomes at a given, pre-determined probability level. For example, "we are 95% confident that the GOP will win between 45 and 55 Senate seats" is a forecast. This claim is falsifiable--if the GOP gets fewer than 45 seats or more than 55 seats our forecast is incorrect. This claim has a margin of error. This claim is scientific.

The pollsters, by contrast, have all gotten it exactly backwards by picking a pre-determined outcome--GOP gets 51 Senate seats--and giving us a probability. How do we validate such a claim? We can't even falsify it, because if Republicans don't take control of Congress, these pollsters will all simply say their probabilities were correct and that something improbable happened. We aren't even given a margin of error, which is all the more troubling given the huge range--from 55% to 93%--of forecast probabilities!

At the same time, it's obvious why they are reporting probabilities instead of forecasts. It's because, with the possible exception of the Washington Post, none of these forecasts are statistically significant. Science typically requires 95% confidence (or, in the Bayesian world, "probability") in a prediction for it to count as significant. None of these models produces a statistically significant prediction of who will control Congress.

A recent paper in the AER explains why "experts" would make such a vacuous prediction. The authors present a model of an "expert" who makes forecasts about a sequence of future events, and a client who tests those forecasts against the data. Even though the client chooses how to test the expert's predictions, it is usually possible for the expert to construct forecasts that pass the test even when the expert has no expertise whatsoever. In our case, the implicit "test" of the forecasts is whether the actual outcome was likely given the forecast, which is pretty darn easy to pass even without looking at any polls--the conventional scientific threshold for an unlikely event is a probability below 5%, so all of these models are still saying that a non-GOP Congress is a likely outcome. They can't be falsified. And even if we changed the significance threshold, the pollsters could simply change their forecasts so that they still can't be falsified.

The alternative to this is claim validation. The AER paper shows that if you really want to know who is full of BS, make them validate their claims. Make them give an actual prediction of how many GOP congressmen and Senators there will be, and if the actual number is not in their predicted range, call them out for being wrong. Although this is not strictly logical--one can falsify but never truly validate--the authors show that because forecasts are endogenous to the tests we apply to them, only a standard of validation can actually force forecasters to produce truly falsifiable claims.

Until then, I'm calling them out. All the pollsters are full of crap. That isn't how you do statistics.
10/28/2014 08:54:00 AM
A quick follow-up to my previous post. Austin Frakt pointed me to another study finding that while recessions appear to slightly improve health for working-age adults, mortality and healthcare utilization by Medicare-age seniors rise during recessions. Both Frakt and Jason Shaffrin discuss medical crowding out as a possible cause for this trend:
"Unlike younger adults who seem to use less healthcare as unemployment rates rise, seniors use more inpatient care…the rise in healthcare use may be tied to an increased willingness of healthcare providers to accept Medicare patients."
Ok, that explains healthcare utilization, but not mortality, unless going to the doctor's office increases mortality risk.

And here is yet another trend: men got more vasectomies during this recession.
"In 2006, 3.9 percent of men said they had had a vasectomy; in 2010, 4.4 percent reported having the surgery. That means an additional 150,000 to 180,000 men per year had vasectomies in each year of the recession."
I'd guess the financial hardship reminded a lot of men that having children is expensive.
10/22/2014 07:29:00 AM
From Eagan et al, comparison of actual GDP to health-adjusted GDP, using selective VSL estimates that are way too high.
A few recent studies have me puzzled. The first is Eagan, Mulligan, and Phillipson (2014) on mortality over the business cycle, which documents that the mortality rate is pro-cyclical, with deaths increasing in expansions and decreasing in recessions. The effect was so large in 2009 that the health value of the lives saved offset a majority of the loss to GDP! Eagan et al aren't the first to note this trend, and some plausible theories have been proposed to explain it: workplace injuries and commuter traffic accidents, for example, increase when more people have jobs, resulting in higher fatalities. And yet, here's a research letter in JAMA saying that there was a marked increase in Emergency Department visits by children coinciding with the recession in 2009. So did health improve during the recession or not? You could spin a story in which recessions both improve health and increase ED use--by, for example, reducing traffic accidents and workplace injuries while also reducing health insurance coverage, which many believe increases ED use (though, see here). Or perhaps this is just an age thing: the Eagan et al result is heavily weighted towards older adults, who are the only demographic with any substantial mortality risk in their data, whereas JAMA looked at children, and if becoming unemployed is what causes the reduction in mortality, that would explain why the cyclicality differs between children and adults.

But here's a third article in the American Journal of Public Health finding that becoming unemployed significantly increases an individual's mortality risk during recessions but slightly decreases that risk during expansions! So we have a compositional puzzle on our hands:
  1. Employment decreases during recessions, and
  2. Aggregate mortality decreases during recessions, but
  3. Losing employment increases individual mortality risk
So which is it? Do recessions increase health or decrease it? And how can aggregate mortality decrease if individual mortality increases?
10/21/2014 01:26:00 PM
tl;dr() return true;

As a result of a recent promotion, I've been brushing up on my programming skills. By far the most dominant programming paradigm today is "object oriented programming" (OOP), a style of coding that aims to represent programs as data structures contained in objects. What is an object? Yes.

According to Smash Company, it turns out that "Object Oriented Programming is an expensive disaster which must end." The article is quite lengthy and well informed, but I think it is at once too harsh on OOP and not harsh enough.

In OOP languages everything in a program is an "object"--your program itself is an object, all of the components of your program are objects, all of the components of those components are objects--"it's objects all the way down," as the saying goes. This is where I think Smash Company was too harsh: the generic and abstract nature of the "objects" in OOP languages makes them extraordinarily flexible, allowing these languages to support a diverse range of programming styles. What they really argue against is not so much OOP languages themselves as the programming style that is also known as "Object Oriented Programming." I think this is something of a misnomer. The programming style that has come to be associated with OOP should more precisely be called "Class-Oriented Programming," because its central tenet is that all data and functions in your code should be organized in a specific type of object called a class. Indeed, Bjarne Stroustrup, who invented the first popular OOP language, originally called his creation "C with Classes" rather than the later name "C++."

While you can do Class-Oriented Programming (COP) in languages like Python and Javascript, these are actually more general languages than the core COP languages of C++, C#, and Java, which more or less torture programmers into adopting the COP style. In COP, the aim is to put all your code into classes that are separate from each other and to minimize the amount of code inside the main section of your program. The classes themselves shouldn't do anything other than store your initial data. Instead, to use the data contained in a class, the main program creates "instances" of the class as needed at run time and performs all operations on these instances rather than on the classes themselves. The advantage of this is that it prevents data destruction. In a non-COP program, for example, you might start with a dataset, perform some operations that change the dataset to get the desired output, but then be unable to reuse the original, unmodified dataset because it has now been altered. Ideally, a COP design prevents this problem by allowing you to create additional, separate instances of the original dataset (i.e., the class) for each separate routine you are performing, without ever modifying the original data. This is called "abstraction," one of the three pillars of COP. (Note: I'm using the term "abstraction" throughout this post generically to refer to a few separate but related OOP concepts--abstraction, encapsulation, and information hiding. Sources disagree on the exact definitions, which aren't really relevant here.)
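To make the abstraction point concrete, here is a minimal sketch of the pattern (my own illustration--the Dataset, Scale, and Sum names are made up, not from any real library): the data lives privately inside the class, and each routine gets its own instance, so nothing ever touches the original.
using System;

class Dataset
{
   private int[] values;                                    // original data, hidden from the outside
   public Dataset(int[] v) { values = (int[])v.Clone(); }   // each instance keeps its own copy
   public void Scale(int factor)                            // a routine that modifies only this instance
   {
      for (int i = 0; i < values.Length; i++) values[i] *= factor;
   }
   public int Sum() { int s = 0; foreach (int v in values) s += v; return s; }
}

class Demo
{
   static void Main()
   {
      int[] raw = { 1, 2, 3 };
      Dataset forRoutineA = new Dataset(raw);   // one instance for one routine
      Dataset forRoutineB = new Dataset(raw);   // a separate instance for another
      forRoutineA.Scale(10);
      Console.WriteLine(forRoutineB.Sum());     // prints 6: the second instance is untouched
   }
}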

But more likely than not, your program does not consist of doing a series of identical routines on identical datasets, but instead performs distinct routines on a set of related datasets. You could just make separate classes for each type of dataset and leave it at that. But if the types are related to each other, then they probably have a lot of common features that would require you to copy and paste identical chunks of code, and that's just a pain in the butt, especially when, months from now, someone asks for additional parameters to be included in all the data and you have to change every single dataset type you use. For this, COP offers inheritance. You still have to define as many classes as you have types, but if you create one or more additional parent classes where you put all the common code the types share, you can save time by simply telling the child classes to inherit this code from the parent classes. Indeed, "inheritance" is usually cited as the second pillar of COP.

But sometimes your classes are so similar you just say what the heck and combine them into one by programming the one class to respond differently under different situations, mimicking the multiple classes you eliminated. And programmers are so lazy that this "polymorphism" is actually called the third pillar of COP.
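In C#, the textbook form of this is virtual dispatch: one base type whose methods are overridden by its subtypes, so a single call site behaves differently depending on what it is handed at run time. A minimal sketch, with made-up Animal/Dog/Cat names:
using System;

// One base type whose behavior varies with the runtime type of the object.
abstract class Animal
{
   public abstract string Describe();
}
class Dog : Animal { public override string Describe() { return "woof"; } }
class Cat : Animal { public override string Describe() { return "meow"; } }

class Demo
{
   static void Main()
   {
      Animal[] pets = { new Dog(), new Cat() };
      foreach (Animal a in pets)
         Console.WriteLine(a.Describe());   // same call site, two different behaviors
   }
}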

So how does all this actually fare in practice? Like Smash Company, I'm inclined to conclude that it fares not well. At its core, programming is just set theory. And it turns out that even a basic Venn diagram is too complicated for the COP paradigm:
There are four classes in your basic Venn diagram. A is the superclass, representing all the things that the two subsets B and C have in common. B and C each inherit from A but have their own distinctive features as well. D inherits from both B and C. Unless you use C#, in which case D is just screwed.
This is where I think Smash Company was not harsh enough: inheritance, at least as it is implemented in COP, is bad bad bad.
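For the record, here is roughly what the diamond looks like when you try to write it down in C# (my own minimal reproduction of the Venn diagram above). The first three classes compile; the fourth is, as promised, just screwed:
class A { public int SharedStuff; }   // the superclass: what B and C have in common
class B : A { public int OnlyB; }     // inherits from A
class C : A { public int OnlyC; }     // inherits from A
// class D : B, C { }                 // will not compile: a C# class cannot have two base classes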

Let's explore why with a simple application. For reasons no one is entirely happy with, my employer uses C#, so we'll use C# to construct a standard 52-card deck of playing cards. The goal is to use the COP paradigm to produce 52 objects representing the 52 playing cards--that the goal is to produce objects means we are already biasing this exercise in favor of COP. Each card bears the identifying features of a playing card--color, rank, and symbol--and does stuff for game play. So let's rewrite that in C# notation:
abstract class Card {
    private string symbol;
    private string rank;
    private string color;
    public string[] deal(){
        return new string[]{color,rank,symbol};
    }
    public void show() {
        Console.Write("\n{0} {1} of {2}", color, rank, symbol);
    }
    public Card(string c,string s,string r) {
        color=c;
        symbol=s;
        rank=r;
    }
}
Ok, so all I've done here is create a dataset describing what a card is like: each card has three data fields called symbol, rank, and color, which are left blank for now, as well as three methods called deal(), show(), and Card(). Technically, there are no functions in C#, and all behaviors are carried out by methods. What is a method? It's a function.

Two of the methods in the class Card, deal() and show(), are used in game play, while the third is called a constructor--its purpose is to create objects of the type Card, and those objects are called "instances" of the Card class. Remember, the code above isn't a card; it creates cards with all of those specified features. We haven't actually created anything yet.

Now, no card player ever says "this is a card that has the color red, rank Jack, and symbol Diamond"--rather we say "this is a red Jack of Diamonds." In COP, anything that fills the blank "is a ____" should be described by a class, while anything that fills the blank "has a ____" should be a property of the class. So we can divide Card into two subclasses, Red Card and Black Card, which are themselves classes:
abstract class Red:Card {
    public Red(string s, string r)
        :base("Red",s,r)
    {
       
    }
}
abstract class Black:Card {
    public Black(string s, string r)
        :base("Black",s,r)
    {
       
    }
}

These two new classes, Red and Black, inherit all the fields and methods of the Card class, but differ in the colors they assign to cards. We can go further, because there are two kinds of Red cards, Diamonds and Hearts, and two kinds of Black cards, Clubs and Spades. Thus we have four more classes, inheriting from Red and Black and, by transitivity, from Card as well:
class Heart : Red
{
    public Heart(string r) : base("Hearts", r) { }
}
class Diamond : Red
{
    public Diamond(string r) : base("Diamonds", r) { }
}
class Spade : Black
{
    public Spade(string r) : base("Spades", r) { }
}
class Club : Black
{
    public Club(string r) : base("Clubs", r) { }
}

Ok, we've still not created any cards. But we have created various classes that describe some of the data that makes up each card, and so far we've done it without having to copy and paste anything, because our class hierarchy lets us apply all the common features to their respective cards without needing to mindlessly reuse code. This is good. But now we're stuck. We still have 13 ranks--2 through 10, plus Jack, Queen, King, and Ace--but no way to represent them as classes in the COP paradigm, because of inheritance.
All programming is set theory. The COP paradigm is an incomplete representation of set theory.
A Jack of Hearts, for example, should inherit from the classes Jack and Heart, but COP does not support instantiating a single object from more than one class. Even if we wanted to create a class called JackOfHearts, C# does not allow multiple inheritance, so class JackOfHearts : Jack, Heart simply won't compile. It's true that other languages like C++ do allow multiple inheritance at the class level, but this doesn't really help us, as it would mean coding 72 separate classes to generate 52 playing cards.

It's true that there are some silly tricks you can do with interfaces here, but they are exactly that: stupid hacks that probably shouldn't work but do. Here's what I will do. Instead of creating classes to represent the actual ranks of the cards, I will create one more class, representing the concept of a deck of cards. This final class will contain the instructions for making all 52 cards from the 7 classes we've already defined. Here it is:
class Deck :List<Card>
{
    private string[] rankList = {"Ace","King","Queen","Jack","10","9","8","7","6","5","4","3","2"};
    public Deck(){
        for(int i=0;i < 13; i++){
            base.Add(new Heart(rankList[i]));
            base.Add(new Diamond(rankList[i]));
            base.Add(new Spade(rankList[i]));
            base.Add(new Club(rankList[i]));
        }
    }
}
What this does is create a class that follows the structure of a List, a generic collection class built into C#. The elements of the list will be our 52-card deck.

So that's all the classes. All of the data needed to make a 52-card deck are contained in those 8 classes, and we wrote this while hardly needing to repeat any bits of code. But we still don't have our deck of cards; these classes are merely instructions to the computer about how it would go about making the cards. To actually tell the computer to make them, we have a 9th class--common to all C# programs--that contains a static void Main() method. When you actually run the application, it will start by running this method, and then follow your instructions to all the various classes and methods in the order that you specify within Main(). At this point, creating the deck is as simple as Deck myDeck = new Deck(); and we can then use this deck for card playing. You can imagine doing something fun here, but in our case we will merely read all of the cards off into the console:
for(int i=0; i < myDeck.Count; i++){
   myDeck[i].show();
}
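If everything is wired up correctly, this loop should write one line per card to the console--Red Ace of Hearts, Red Ace of Diamonds, Black Ace of Spades, Black Ace of Clubs, then the Kings, and so on down through the 2s--since the Deck constructor adds the four suits for each rank in turn.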

Putting all the code together now:
using System;
using System.Collections.Generic;
namespace cards
{
abstract class Card {
   private string symbol;
   private string rank;
   private string color;
   public string[] deal(){
      return new string[]{color,rank,symbol};
   }
   public void show() {
      Console.Write("\n{0} {1} of {2}", color, rank, symbol);
   }
   public Card(string c,string s,string r) {
      color=c;
      symbol=s;
      rank=r;
   }
}
abstract class Red:Card {
   public Red(string s, string r):base("Red",s,r){}
}
abstract class Black:Card {
   public Black(string s, string r):base("Black",s,r){}
}
class Heart : Red
{
   public Heart(string r) : base("Hearts", r) { }
}
class Diamond : Red
{
   public Diamond(string r) : base("Diamonds", r) { }
}
class Spade : Black
{
   public Spade(string r) : base("Spades", r) { }
}
class Club : Black
{
   public Club(string r) : base("Clubs", r) { }
}
class Deck :List<Card>
{
   private string[] rankList = { "Ace", "King", "Queen", "Jack", "10", "9", "8", "7", "6", "5", "4", "3", "2" };
   public Deck(){
      for(int i=0;i < 13; i++){
         base.Add(new Heart(rankList[i]));
         base.Add(new Diamond(rankList[i]));
         base.Add(new Spade(rankList[i]));
         base.Add(new Club(rankList[i]));
      }
   }
}

class Program
{
   static void Main(string[] args)
   {
      Deck newdeck=new Deck();
      foreach(Card i in newdeck){
         i.show();
      }
      Console.ReadKey();
   }
}
}

This code has some admirable features. For one thing, we've achieved a significant amount of data abstraction: almost everything in our code is protected, so that computer, programmer, or user errors cannot corrupt the data. The color, rank, and symbol data for all of the cards, for example, are private variables that cannot be altered once the cards are created. Moreover, the Red, Black, and Card classes are all abstract, meaning that we cannot instantiate Red cards that aren't Hearts or Diamonds, for example. The abstraction of this code into modular classes also makes it exceedingly easy to add new types of cards--adding Green Flowers as a new suit alongside the traditional Hearts, Diamonds, Clubs, and Spades, for example, is as simple as adding a Green class and a Flower : Green class, and inserting one line into the Deck class. And, at least for our purposes, adding another rank--say, an 11 to follow the traditional 2 through 10--is as easy as inserting "11" into the rankList array.
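To make the extensibility claim concrete, here is roughly what that hypothetical fifth suit would look like, mirroring the Red/Heart pattern above (Green and Flower are, of course, made-up names for this illustration):
// A hypothetical fifth suit, following the same pattern as the existing classes:
abstract class Green : Card {
   public Green(string s, string r) : base("Green", s, r) { }
}
class Flower : Green {
   public Flower(string r) : base("Flowers", r) { }
}
// ...plus one extra line inside the Deck constructor's loop:
// base.Add(new Flower(rankList[i]));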

At the same time, this code represents a complete failure of the COP paradigm to achieve its goal. We aren't actually creating the right objects. We aren't, for example, creating a Red Jack of Hearts, but rather a Red Heart that has a Jack. The only COP way to produce a true Red Jack of Hearts requires more classes than we have cards, and isn't even supported in all the major COP languages. And the level of abstraction that is possible here could be better, since this code gives both the Deck class and the Main() method direct access to the Heart, Diamond, Spade, and Club classes, which in reality are abstract concepts. This exposes our code unnecessarily to additional possible sources of error that--in a larger program--could be extremely hard to track down, because they could originate anywhere in the code. Moreover, we are calling constructors from inside the Deck class, which is bad practice according to COP.

The failure of COP to actually be capable of representing a standard 52-card deck is a bit comical. This is programming, after all, and computing repetitive patterns of data in a systematic way is what computers are supposed to be good at. Hardly anything in the real world is as systematically patterned as the 52-card playing deck, where, although each card is unique, no card contains unique attributes and all of the attributes are assigned according to fixed, very simple rules. I should be able to tell the computer what these simple rules are, and get it to do the rest.

But isn't a simple rule...wait for it...just a function? YES! In fact, if we adopt a more functional paradigm instead of full-out COP, multiple inheritance is no problem at all. Consider the following pseudocode in a generic non-COP language:
function setA(){
   var obj={};
   obj.prop1= <properties of set A> ;
   return obj;
}
function setB(obj){
   obj.prop2= <properties of set B>;
   return obj;
}
function setC(obj){
   obj.prop3= <properties of set C>;
   return obj;
}
function main(){
   var b=setB(setA());
   var c=setC(setA());
   var d=setC(setB(setA()));
   return {b:b, c:c, d:d};
}
var closure=main();
We just constructed what COP could not. We have created three objects: object b is an element of sets A and B from the Venn diagram above, object c is an element of sets A and C, while object d is an element of A, B, and C, corresponding to the set D in the diagram. This is where Smash Company was not harsh enough. We want inheritance, but we do not want hierarchical data structures. The real world does not consist of coverings of mutually disjoint sets. In the real world, data is not hierarchical.

So, with the functional approach we've achieved a stronger version of inheritance than COP allows for. We've fully protected and abstracted all our data by using local variables defined inside functions, along with closures, and we don't need any darn polymorphism because our functions are not loitering inside objects pretending to be "methods." Adding more types is still just as easy as inserting more functions. The code is still totally modular and serviceable. And we still ended up with objects as the result of our initialization procedures. My point is that the functional paradigm does object oriented programming better than the dominant COP paradigm.
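And for what it's worth, you don't have to leave C# to program in this spirit. Here is a rough sketch of my own (not from the original post) that generates the same 52 combinations from two plain arrays and the one simple rule--every pairing of suit and rank is a card--with no class hierarchy at all:
using System;

class FunctionalDeck
{
   static void Main()
   {
      // The two "simple rules" that define a deck: four suits (with their colors) and thirteen ranks.
      string[,] suits = { { "Red", "Hearts" }, { "Red", "Diamonds" },
                          { "Black", "Spades" }, { "Black", "Clubs" } };
      string[] ranks = { "Ace", "King", "Queen", "Jack", "10", "9",
                         "8", "7", "6", "5", "4", "3", "2" };

      // Every combination of suit and rank is a card; no Card/Red/Heart hierarchy required.
      for (int s = 0; s < suits.GetLength(0); s++)
         for (int r = 0; r < ranks.Length; r++)
            Console.WriteLine("{0} {1} of {2}", suits[s, 0], ranks[r], suits[s, 1]);
   }
}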

And yes, in the real world an object can belong to both the class of objects that are "too harsh" and the class of objects that are "not harsh enough."
10/17/2014 04:41:00 PM