On object oriented programming

10/17/2014 04:41:00 PM
Tweetable
tl;dr() return true;

As a result of a recent promotion, I've been brushing up on my programming skills. By far the most dominant programming paradigm today is "object oriented programming," (OOP) which is a style of coding that aims to represents programs as data structures contained in objects. What is an object? Yes.

According to Smash Company, it turns out that "Object Oriented Programming is an expensive disaster which must end." The article is quite lengthy and well informed, but I think it is at once too harsh on OOP and not harsh enough.

In OOP languages everything in a program is an "object"--your program itself is an object, all of the components of your program are objects, all of the components of those components are objects--"it's objects all the way down" as the saying goes. This is where I think Smash Company was too harsh: the generic and abstract nature of the "objects" in OOP languages makes them extraordinarily flexible, allowing these languages to support a diverse range of programming styles. Rather, they argue against not so much OOP languages themselves as the programming style which is also known as "Object Oriented Programming." I think this is something of a misnomer. The programming style that has come to be associated with OOP should more precisely be called "Class-Oriented Programming" because the central tenant is that all data and functions in your code should be organized in a specific type of object called a class. Indeed, Bjarne Stroustrup, who invented the first popular OOP language originally called his creation "C with classes" instead of the later name "C++."

While you can do Class-Oriented Programming (COP) in languages like Python and Javascript, these are actually more highly general versions of the core COP languages of C++, C#, and Java which more or less torture programmers into adopting the COP style. In COP, the aim is to put all your code into classes that are separate from each other and minimize the amount of code inside the main section of your program. The classes themselves shouldn't do anything other than store your initial data. Instead, to use the data contained in a class the main program will create "instances" of the class as needed at run time, and perform all operations on these instances rather than the classes themselves. The advantage of this is that it prevents data destruction. In a non-COP program, for example, you might start with a dataset, perform some operations that change the dataset to get the desired output, but then be unable to reuse the original, unmodified dataset because it has now been altered. Ideally, a COP design prevents this problem by allowing you to call additional, separate instances of the original dataset (ie, the class) for each separate routine you are performing, without ever modifying the original data. This is called "abstraction," one of the three pillars of COP (note, I'm use the term "abstraction" through out this post generically to refer to a few separate but related OOP concepts: abstraction, encapsulation, and information hiding. Sources disagree on the exact definitions, which aren't really relevant here.).

But more likely than not, your program does not consist of doing a series of identical routines on identical datasets, but instead performs distinct routines on a set of related datasets. You could just make separate classes for each type of dataset and leave it at that. But if the types are related to each other, then they probably have a lot of common features that would require you to copy and paste identical chunks of code, and that's just a pain in the butt, especially when, months from now, someone asks for additional parameters to be included in all the data and you have to change every single dataset type you use. For this, COP offers inheritance. You still have to define as many classes as you have types, but if you create one or more additional parent classes where you put all the common code the types share, you can save time by simply telling the child classes to inherit this code from the parent classes. Indeed, "inheritance" is usually cited as the second pillar of COP.

But sometimes your classes are so similar you just say what the heck and combine them into one by programming the one class to respond differently under different situations, mimicking the multiple class you eliminated. And programmers are so lazy that this "polymorphism" is actually called the third pillar of COP.

So how does all this actually fare in practice? Like Smash Company, I'm inclined to conclude that it fares not well. At it's core, programming is just set theory. And it turns out that this set is too complicated for the COP paradigm:
There are four classes in your basic Venn diagram. A is the superclass, representing all the things that the two subsets B and C have in common. B and C each inherit from A but have their own distinctive features as well. D inherits from both B and C. Unless you use C#, in which case D is just screwed.
This is where I think Smash Company was not harsh enough: inheritance, at least as it is implemented in COP is bad bad bad.

Let's explore why with a simple application. For reasons no one is entirely happy with, my employer uses C#, so we'll use C# to construct a standard 52-card deck of playing cards. The goal is to use the COP paradigm to produce 52 objects representing the 52 playing cards--that the goal is to produce objects means we are already biasing this exercise in favor of COP. Each card bears the identifying features of a playing card--color, rank, and symbol--and does stuff for game play. So let's rewrite that in C# notation:
abstract class Card {
    private string symbol;
    private string rank;
    private string color;
    public string[] deal(){
        return new string[]{color,rank,symbol};
    }
    public void show() {
        Console.Write("\n{0} {1} of {2}", color, rank, symbol);
    }
    public Card(string c,string s,string r) {
        color=c;
        symbol=s;
        rank=r;
    }
}
Ok, so all I've done here is create a dataset describing what a card is like: each card has three data fields called symbol, rank, and color, that are left blank for now, as well as three methods called deal(), show(), and Card(). Technically, there are no functions in C#, and all behaviors are carried out by methods. What is a method? It's a function.

Two of the methods in the class Card, deal() and show(), are used in game play while the third is called a constructor--it's purpose is to create objects of the type Card, and those objects are called "instances" of the Card class. Remember, the code above isn't a card, it creates cards with all of those specified features. We haven't actually created anything yet.

Now, no card player ever says "this is a card that has the color red, rank Jack, and symbol Diamond"--rather we say "this is a red Jack of Diamonds." In COP, anything that fills the blank "is a ____" should be described by a class, while "has a ____" are things that should be properties of the class. So we can divide Card into two subclasses, Red Card and Black Card which are themselves classes:
abstract class Red:Card {
    public Red(string s, string r)
        :base("Red",s,r)
    {
       
    }
}
abstract class Black:Card {
    public Black(string s, string r)
        :base("Black",s,r)
    {
       
    }
}

These two new classes, Red and Black, inherit all the fields and methods of the Card class, but differ in the colors they assign to cards. We can go further, because there are two kinds of Red cards, Diamonds and Hearts, and two kinds of Black cards, Clubs and Spades. Thus we have four more classes, inheriting from Red and Black and, by transitivity, from Card as well:
class Heart : Red
{
    public Heart(string r) : base("Hearts", r) { }
}
class Diamond : Red
{
    public Diamond(string r) : base("Diamonds", r) { }
}
class Spade : Black
{
    public Spade(string r) : base("Spades", r) { }
}
class Club : Black
{
    public Club(string r) : base("Clubs", r) { }
}

Ok, we've still not created any cards. But we have created various classes that describe some of the data that makes up each card, and so far we've done it without having to copy and paste anything, because our inheritance classes let us apply all the common features to their respective cards without needing to mindlessly reuse code. This is good. But now we're stuck. We still have 13 more classes--2 through 10, plus Jack, Queen, King, and Ace--but no way to represent them as classes in the COP paradigm, because of inheritance.
All programming is set theory. The COP paradigm is an incomplete representation of set theory.
A Jack of Hearts, for example, should inherit from the classes Jack and Hearts, but COP does not support instantiating a single object from more than one class. Even if we wanted to create a class called Jack of Hearts, C# does not allow multiple inheritances so class JackOfHearts:Jack,Heartsdoes nothing. It's true that other languages like C++ do allow multiple inheritances at the class level, but this doesn't really help us as it would mean coding 72 separate classes to generate 52 playing cards.

It's true that there are some silly tricks you can do with interfaces here, but they are exactly that: stupid hacks that probably shouldn't work but do. Here's what I will do. Instead of creating classes to represent the actual ranks of the cards, I will create one more class, representing the concept of a deck of cards. This final class will contain the instructions for making all 52 cards from the 7 classes we've already defined. Here it is:
class Deck :List<Card>
{
    private string[] rankList = {"Ace","King","Queen","Jack","10","9","8","7","6","5","4","3","2"};
    public Deck(){
        for(int i=0;i < 13; i++){
            base.Add(new Heart(rankList[i]));
            base.Add(new Diamond(rankList[i]));
            base.Add(new Spade(rankList[i]));
            base.Add(new Club(rankList[i]));
        }
    }
}
What this does is create a class that follows the structure of a List, which is a pre-defined class in C#. The elements of the list will be our 52 card deck.

So that's all the classes. All of the data needed to make a 52 card deck are contained in those 8 classes, and we wrote this without hardly needing to repeat any bits of code. But we still don't have our deck of cards, these classes are merely instructions to the computer about how it would go about making the cards. To actually tell the computer to make them, we have a 9th class--which is common to all C# programs, that contains a static void main() method. When you actually run the application, it will start by running this method, and then following your instructions to all the various classes and methods in the order that you specify within main(). At this point, creating the deck is as simple as Deck myDeck=new Deck();We can now use this deck for card playing. You can imagine doing something fun here, but in our case we will merely read all of the cards off into the console:
for(int i=0; i < myDeck.Count;i++){
   Console.WriteLine(myDeck[i].show());
}

Putting all the code together now:
using System;
using System.Collections.Generic;
namespace cards
{
class Card {
   private string symbol;
   private string rank;
   private string color;
   public string[] deal(){
      return new string[]{color,rank,symbol};
   }
   public void show() {
      Console.Write("\n{0} {1} of {2}", color, rank, symbol);
   }
   public Card(string c,string s,string r) {
      color=c;
      symbol=s;
      rank=r;
   }
}
abstract class Red:Card {
   public Red(string s, string r):base("Red",s,r){}
}
abstract class Black:Card {
   public Black(string s, string r):base("Black",s,r){}
}
class Heart : Red
{
public Heart(string r) : base("Hearts", r) { }
}
class Diamond : Red
{
public Diamond(string r) : base("Diamonds", r) { }
}
class Spade : Black
{
public Spade(string r) : base("Spades", r) { }
}
class Club : Black
{
public Club(string r) : base("Clubs", r) { }
}
class Deck :List<card>
{
   private string[] rankList = { "Ace", "King", "Queen", "Jack", "10", "9", "8", "7", "6", "5", "4", "3", "2" };
   public Deck(){
      for(int i=0;i < 13; i++){
         base.Add(new Heart(rankList[i]));
         base.Add(new Diamond(rankList[i]));
         base.Add(new Spade(rankList[i]));
         base.Add(new Club(rankList[i]));
      }
   }
}

class Program
{
   static void Main(string[] args)
   {
      Deck newdeck=new Deck();
      foreach(Card i in newdeck){
         i.show();
      }
      Console.ReadKey();
   }
}
}





This code has some admirable features. For one thing, we've achieved a significant amount of data abstraction: almost everything in our code is protected, so that computer, programmer, or user errors cannot corrupt the data. The color, rank, and symbol data for all of the cards, for example, are private variables that cannot be altered once the cards are created. Moreover, the Red, Black, and Card classes are all abstract, meaning that we cannot instantiate Red cards that aren't Hearts or Diamonds, for example. The abstraction of this code into modular classes also makes it exceedingly easy to add new types of cards--for example, Green Flowers as a new type of card along side the traditional Hearts, Diamonds, Clubs, and Spades, is as simple as adding Green and Flower:Green classes, and inserting one line into the Deck class. And, at least for our purposes, adding another rank--say, an 11 to follow the traditional 2 through 10, is as easy as inserting "11" into the rankList array.

At the same time, this code represents a complete failure of the COP paradigm to achieve its goal. We aren't actually creating the right objects. We aren't, for example, creating a Red Jack of Hearts, but rather a Red Heart that has a Jack. The only COP way to produce a true Red Jack of Hearts requires more classes than we have cards, and isn't even supported in all the major COP languages. And the level of abstraction that is possible here could be better, since this code gives both the Deck class and the main() method direct access to the Heart, Diamond, Spade, and Club classes, which in reality are abstract. This exposes our code unnecessarily to additional possible sources of error that--in a larger program--could be extremely hard to track down because it could originate anywhere in the code. Moreover, we are calling constructors from inside the Deck class, which is bad according to COP.

The failure of COP to actually be capable of representing a standard 52-card deck is a bit comical. This is programming, after all, and computing repetitive patterns of data in a systematic way is what computers are supposed to be good at. Hardly anything in the real world is as systematically patterned as the 52 card playing deck where, although each card is unique, no card contains unique attributes and all of the attributes are assigned according to fixed, very simple rules. I should be able to tell the computer what these simple rules are, and get it to do the rest.

But isn't a simple rule...wait for it...just a function? YES! In fact, if we adopt a more functional paradigm instead of full-out COP, multiple inheritance is no problem at all. Consider the following generic non-COP language
function setA(obj){
   var obj={};
   obj.prop1= <properties of set A> ;
   return obj;
}
function setB(obj){
   obj.prop2= <properties of set B>;
   return obj;
}
function setC(obj){
   obj.prop3= <properties of set C>;
   return obj;
}
function main(){
   var b=setB(setA());
   var c=setC(setA());
   var d=setC(setB(setA()));
}
var closure=main();
We just constructed what COP could not. We have created three objects: object b is an element of sets A and B from the Venn diagram above, object c is an element of sets A and C, while object d is an element of A,B, and C, corresponding to the set D in the diagram. This is where Smash Company was not harsh enough. We want inheritance, but we do not want hierarchical data structures. The real world does not consist of coverings of mutually disjoint sets. In the real world, data is not hierarchical.

So, with the functional approach we've achieved a stronger version of inheritance than COP allows for. We've fully protected and abstracted all our data by using local variables defined inside functions, along with closures, and we don't need any darn polymorphism because our functions are not loitering inside objects pretending to be "methods." Adding more types is still just as easy as inserting more functions. The code is still totally modular and serviceable. And we still ended up with objects as the result of our initialization procedures. My point is that the functional paradigm does object oriented programming better than the dominant COP paradigm.

And yes, in the real world an object can belong to both the class of objects that are "too harsh" and the class of objects that are "not harsh enough."