mdlbear: (technonerdmonster)
Back when smalltalk was sports and the weather, And an object was what you could see, And we watched "Captain Video" in black-and-white, Before there was color TV. -- "When I Was a Lad"

Even if you're not a programmer (and most of the people I expect to be reading this aren't) you've probably heard one of your geek friends mentioning "object-oriented" programming. You might even have heard them mention "functional" programming, and wondered how it differs from non-functional programming. The computer curmudgeon can help you.

If you are a programmer, you probably know a lot of this. You may find a few items of historical interest, and it might be useful for explaining things to the non-programmers in your life.

Imperative languages

As you may recall from last month's post on computer languages, the first programming languages were assembly languages -- just sequences of things for the computer to do, one after another (with a few side-trips). This kind of language is called "imperative", because each statement is a command telling the computer what to do next: "load a into register 2!" "add register 2 to register 3!" "roll over!" "sit!" You get the idea.

Most programmers use "statement" instead of "command" for these things even though the latter might be more correct from a grammatical point of view; a "command" is something one types at a terminal in order to run a program (which is also called a "command").

Most of the earlier programming languages (with one notable exception that we'll get to later) were also imperative. Fortran, Cobol, and Algol were the most prominent of these; most of today's languages descend more-or-less from Algol. (By the way, the Report on the Algorithmic Language Algol 60 is something I always point to as an example of excellent technical writing.)

Imperative languages make a distinction between statements, which do things (like "a := 2", which puts the number 2 into a variable called "a"), and "expressions", which compute results. Another way of saying that is that an expression has a value (like 2+2, which has a value of 4), and a statement doesn't. In earlier languages that value was always a number; later languages added things like lists of symbols and strings of characters enclosed in quotes.
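Here's the difference in Python, which I'll use for examples throughout -- purely as illustration, since it reads almost like pseudocode:

```python
a = 2          # a statement: it puts 2 into the variable a, but has no value of its own
total = 2 + 2  # "2 + 2" is an expression with the value 4; the assignment around it is a statement
print(total)   # prints 4
```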

Algol and its descendants lend themselves to what's called "Structured Programming" -- control structures like conditionals ("if this then {that} else {something-else}") and loops (for x in some-collection do {something}) let you build your program out of simple sections that nest together but always have a single entrance and exit. Notice the braces around those sections. Programming languages have different ways of grouping statements; braces and indentation are probably the most common, but some languages use "if ... fi" and "do ... done".

Languages also have ways of packaging up groups of statements so that they can be used in different places in the program. These are called either "subroutines" or "functions"; if a language has both it means that a function returns a value (like "sqrt(2)", which returns the square root of 2) and a subroutine doesn't (like "print('Hello world!')", which prints a familiar greeting.) Subroutines are statements, and functions are expressions. The expressions in parentheses are called either "arguments" or "parameters" -- I was bemused to find out recently that verbs have arguments, too. (A verb's arguments are the subject, and the direct and indirect objects.)
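In Python, for instance, both kinds are written with "def", and the only difference is whether the body returns a value:

```python
import math

def greet(name):                 # acts like a subroutine: it does something, but returns no value
    print("Hello " + name + "!")

def hypotenuse(a, b):            # a function: it computes and returns a value
    return math.sqrt(a*a + b*b)

greet("world")                   # prints "Hello world!"
print(hypotenuse(3, 4))          # prints 5.0
```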

Object-oriented languages

As long as we're on the subject of objects, ...

In the mid 1960s Ole-Johan Dahl and Kristen Nygaard, working at the University of Oslo, developed a language called Simula, designed for simulating objects and events in the real world. Objects are things like cars, houses, people, and cats, and Simula introduced software objects to represent them. Objects have "attributes": my car is blue, my cats are both female, my wife has purple hair. Some of these attributes are objects in their own right: a car has wheels, a cat has four legs and a person has two legs and two arms. Some of these are variables: sometimes my driveway has a car in it, sometimes it doesn't, and sometimes it has two or three.

What this means is that objects have state. A car's speed and direction vary all the time, as does the position of a cat's tail. An object is a natural way to bundle up a collection of state variables.

An object also can do things. In a program, objects rarely do things on their own; some other object tells them what to do. This is a lot like sending the object a message. "Hey, car -- turn left!" "Hey, cat! What color are your eyes?" Software objects don't usually have any of their state on the outside where other parts of the program can see it; they have to be asked politely. That means that the object might be computing something that looks to the outside like a simple state variable -- a car computes its speed from the diameter of its tires and how fast its wheels are turning. All these things are hidden from the rest of the program.

An object has to have a method for handling any message thrown at it, so the documentation of most languages refers to "calling a method" rather than "sending a message". It's the same thing in most cases. As seen from the outside, calling a method is just like calling a function or subroutine, except that there's an extra argument that represents the object. If you think of the message as a command -- an imperative sentence -- it has an implied subject, which is the object you're talking to. Inside the little program that handles the method, the object is represented by a name that is usually either "self" or "this", depending on the language.

One natural thing to do with objects is to classify them, so objects have "classes", and classes tend to be arranged in hierarchies. My cats are cats -- we say that Ticia and Desti are instances of the class "Cat". All cats are animals, so Cat is a subclass of Animal, as are Dog and Person. (In programming languages, classes tend to have their names capitalized. Variables (like "x" or "cat") almost always start with a lowercase letter. Constants (like "Pi", "True", and "Ticia") can go either way.)
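Here's the whole menagerie -- a class, a subclass, an instance, a method, and "self" -- sketched in Python:

```python
class Animal:
    def __init__(self, name):
        self.name = name            # a state variable, hidden inside the object

    def speak(self):
        return "..."

class Cat(Animal):                  # Cat is a subclass of Animal
    def speak(self):                # Cat handles the "speak" message its own way;
        return self.name + " says meow"   # "self" is the object being talked to

ticia = Cat("Ticia")                # Ticia is an instance of the class Cat
print(ticia.speak())                # prints "Ticia says meow"
```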

An object's class contains everything the computer needs to determine the behavior of its instances: a list of the state variables, a list of the methods it can handle, and so on. Now we get to the fun part.

Simula was just objects and methods tacked on to a "traditional" imperative language -- Algol in that case. Most other "object-oriented" languages work the same way: C++ is just objects tacked on to C, and Java still includes things -- numbers, in particular -- that aren't objects, or (like strings) are objects dressed up to look like primitive values. Python and Perl had objects from the beginning or close to it, though Perl's ordinary numbers and strings still aren't objects. It doesn't have to be that complicated.

Shortly after Simula was released, Alan Kay at Xerox PARC realized that you could design a programming language where everything was an object. The result was Smalltalk, and it's still the best example of a pure object-oriented language. (It inspired C++ and Java, and indeed almost all existing OO languages, but very few of them went all the way. Ruby comes closest of the languages I'm familiar with.) Smalltalk is turtles -- er, objects -- all the way down.

In Smalltalk numbers are objects -- you can add new methods to them, or redefine existing methods. (Your program will eventually stop working if you redefine addition so that 2+2 is 5, but it will keep going for a surprisingly long time.) So are booleans. True and False are instances of different subclasses of Boolean, and they behave differently when given messages like ifTrue: or ifFalse:. (I'm not going to go into the details here; Smalltalk warrants an article all by itself). Loops are methods, too. The trickiest bit, though, is that classes are objects. A class is just an instance of Metaclass, and Metaclass is an instance of itself.
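Python isn't Smalltalk, but it borrowed enough of this that you can see the idea from your own keyboard:

```python
print((2).bit_length())         # 2 -- the integer 2 answers a method call
print(isinstance(2, object))    # True: numbers are objects
print(isinstance(True, int))    # True: Python's booleans are instances of a subclass of int
print(type(int))                # <class 'type'> -- classes are themselves objects
```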

Someone trying to implement Smalltalk has to cheat in a couple of places -- the Metaclass weirdness is one of them -- but they have to cover their tracks and not get caught.

Functional languages

Smalltalk, for all its object orientation, is still an imperative language underneath. A program is still a sequence of statements, even if they're all wrapped up in methods. Objects have state -- lots of it; they're just a good way of organizing it. Alan Kay once remarked that programming in Smalltalk was a lot like training a collection of animals to perform tricks together. Sometimes it's more like herding cats.

But it turns out that you don't need statements -- or even state -- at all! All you need is functions.

About the same time Alan Turing was inventing the Turing Machine, which is basically a computer stripped down and simplified to something that will just barely work, Alonzo Church was developing something he called the Lambda Calculus, which did much the same thing to the mathematical concept of functions.

A mathematical function takes a set of inputs, and produces an output. The key thing about it is that a given set of inputs will always produce exactly the same output. It's like a meat grinder -- put in beef, and you get ground beef. Put in pork, and you get ground pork. A function has no state, and no side-effects. State would be like a secret compartment in the meat grinder, so that you get a little chicken mixed in with your ground beef, and beef in your ground pork. Ugh. A side effect would be like a hole in the side that lets a little of the ground meat escape.
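In Python, the difference looks like this (the meat-grinder names are mine, not anyone's library):

```python
def grind(meat):                    # a pure function: same input, same output, no side effects
    return "ground " + meat

history = []
def grind_and_log(meat):            # NOT pure: it quietly changes state outside itself
    history.append(meat)            # the side effect -- the hole in the side of the grinder
    return "ground " + meat

print(grind("beef"))                # always "ground beef", no matter what happened before
```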

Objects are all about state and side effects. You can call a method like "turn left" on a car (in most languages that would look like car.turn(left)) and as a side effect it will change the car's direction state to something 90 degrees counterclockwise from where it was. That makes it hard to reason about what's going to happen when you call a particular method. It's even harder with cats.

Anyway, in lambda calculus, functions are things that you can pass around, pass to other functions, and return from functions. They're first class values, just like numbers, strings, and lists. It turns out that you can use functions to compute anything that you can compute with a Turing machine.

Lambda calculus gets its name from the way you represent functions: To define a function that squares a number, for example, we say something like:

square = λ x: x*x

Now, this looks suspiciously like assigning a value to a variable, and that's exactly what it's doing. If you like, you can even think of λ as a funny kind of function that takes a list of symbols and turns it into a function. You find the value of an expression like square(3) by plugging the value 3 into the function's definition in place of x. So,

square(3)
  → (λ x: x*x)(3)
  → 3*3
  → 9

Most functional languages don't allow you to redefine a name once you've bound it to a value; if you have a function like square it wouldn't make much sense to redefine it in another part of the program as x*x*x. This is very different from imperative languages, where variables represent state and vary all over the place. (Names that represent the arguments of functions are only defined inside any one call to the function, so x can have different values in square(2) and square(3) and nobody gets confused.)
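Python even borrows Church's word for it:

```python
square = lambda x: x * x   # the keyword "lambda" comes straight from the lambda calculus
print(square(3))           # prints 9
print(square(2))           # prints 4 -- x is bound afresh in each call
```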

Lambda calculus lends itself to recursive functions -- functions that call themselves -- and recursion is particularly useful when you're dealing with lists. You do something to the first thing on the list, and then combine that with the result of doing the same thing to the rest of the list. Suppose you have a list of numbers, like (6 2 8 3). If you want another list that contains the squares of those numbers, you can use a function called map that takes two arguments: a function and a list. It returns the result of applying the function to each element of the list. So

map(square, (6, 2, 8, 3))
 → (36, 4, 64, 9)
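That works almost verbatim in Python, give or take a list() wrapper (Python's built-in map returns a lazy iterator rather than a list):

```python
square = lambda x: x * x
print(list(map(square, [6, 2, 8, 3])))   # prints [36, 4, 64, 9]
```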

It's pretty easy to define map, too. It's just

map = λ (f, l):
    if isEmpty(l)
       then l
       else cons(f(first(l)), map(f, rest(l)))

The cons function constructs a new list that consists of its first argument, which can be anything, tacked onto the front of its second argument, which is a list. Using first and rest as the names of the functions that get the first item in a list, and the remainder of the list after the first item, is pretty common these days. Historically, the function that returns the first item in a list is called car, and the function that returns the rest of the list is called cdr (pronounced sort of like "could'r").
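Here's that definition transcribed into runnable Python, with first, rest, and cons as little helpers (the names follow the text above, not any standard library):

```python
def first(l):   return l[0]        # Lisp's "car"
def rest(l):    return l[1:]       # Lisp's "cdr"
def cons(x, l): return [x] + l     # tack x onto the front of the list l

def my_map(f, l):
    if not l:                      # isEmpty(l): an empty list maps to itself
        return l
    return cons(f(first(l)), my_map(f, rest(l)))

print(my_map(lambda x: x * x, [6, 2, 8, 3]))   # prints [36, 4, 64, 9]
```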

I know, the classic introduction to recursive functions is factorial. Lists are more interesting. Besides, I couldn't resist an excuse for mentioning "car". We'll see "cat" in a future post when I discuss scripting languages.

We call map a "higher-order" function because it takes a function as one of its arguments. Other useful higher-order functions include filter, which takes a boolean function (one that returns true or false), and flatmap, which takes a function that returns lists and flattens the results into a single list. (Some languages call that concatMap.) Higher-order functions are what make functional languages powerful, beautiful, and easy to program in.
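Python has filter built in; flatmap you can improvise with itertools (the variable names here are mine):

```python
from itertools import chain

nums = [6, 2, 8, 3]
evens = list(filter(lambda x: x % 2 == 0, nums))            # keep only the even numbers
pairs = list(chain.from_iterable([x, -x] for x in nums))    # a flatmap: flatten the little lists
print(evens)   # prints [6, 2, 8]
print(pairs)   # prints [6, -6, 2, -2, 8, -8, 3, -3]
```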

Now, Church's lambda calculus was purely mathematical -- Church used it to prove things about what kinds of functions were computable, and which weren't, which is exactly what Alan Turing did with the Turing Machine. We'll save that for another post, except to point out that Church's lambda calculus can compute anything that a Turing machine can compute, and vice versa.

But in 1958 John McCarthy at MIT invented a simple way of representing lambda expressions as lists, so that they could be processed by a computer: just represent a function and its arguments as a list with the function first, followed by its arguments. Trivial? Yes. But brilliant. His paper included a simple function called eval that takes a list representing a function and its arguments, and returns the result. Steve Russell realized that it wouldn't be hard to implement eval in machine language (on an IBM 704).
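Here's a toy version of the idea in Python -- a function and its arguments represented as a list, with a tiny eval to process it. (This is my sketch of McCarthy's trick, not his actual code, which operated on symbolic expressions in Lisp itself.)

```python
def my_eval(expr, env):
    if isinstance(expr, list):                      # a list means: apply a function
        f = env[expr[0]]                            # the function comes first...
        args = [my_eval(a, env) for a in expr[1:]]  # ...followed by its arguments
        return f(*args)
    return expr                                     # anything else evaluates to itself

env = {"square": lambda x: x * x, "+": lambda a, b: a + b}
print(my_eval(["+", ["square", 3], 1], env))        # prints 10
```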

I'll save the details for another post, but the result was a language called Lisp. It's the second oldest programming language still in use. The post about LISP will explain how it works (elsewhere I've described eval as The most amazing piece of software in the world), and hopefully show you why Kanef's song The Eternal Flame says that God wrote the universe in LISP. Programmers may want to take a look at "Sex and the Single Link".

And finally,

I'd be particularly interested in seeing comments from the non-programmers among my readers, if I haven't lost you all by now. Was this post interesting? Was it understandable? Was it too long? Were my examples sufficiently silly? Inquiring minds...

Another fine post from The Computer Curmudgeon.
