Today in my continuing series on programming languages I'm going to talk
about "scripting languages". "Scripting" is a rather fuzzy category,
because unlike the kinds of languages we've discussed before, scripting
languages are really distinguished by how they are used, and
they're used in two very different ways. It's also confusing because most
scripting languages are interpreted, and people tend to use "scripting"
when they should be using "interpreted". In my opinion it's more correct
to say that a language is being used as a scripting language,
rather than to say that it is a scripting language. As we'll
see, this is particularly true when the language is being used to
customize some application.
But first, let's define scripts. A script is basically a
sequence of commands that a user could type at a terminal[1] --
often called "the command line" -- that have been put together in a file
so that they can be run automatically. The script then becomes a
new command. In Linux, and before that Unix, the program that interprets
user commands is called a "shell", possibly because it's the visible outer
layer of the operating system. The quintessential script is a shell
script. We'll dive into the details later.
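To make that concrete, here is a minimal sketch of turning two commands into a new one. The command name "hello-twice" and the temporary directory are made up for illustration, and the sketch assumes your temporary directory allows programs to be run from it.

```shell
#!/bin/sh
# Sketch only: "hello-twice" and the temporary directory are hypothetical.
workdir=$(mktemp -d)

# Put two commands into a file, exactly as you would type them.
cat > "$workdir/hello-twice" <<'EOF'
#!/bin/sh
echo hello
echo hello
EOF

# Mark the file as executable and add its directory to the search path.
chmod +x "$workdir/hello-twice"
PATH="$workdir:$PATH"

# The script can now be run by name, like any other command.
hello-twice
```

Nothing in the file is special; the only things that make it a command are the execute permission and a place on the search path.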
[1] okay, a terminal emulator. Hardly anyone uses physical terminals
anymore. Or remembers that the "tty" in /dev/tty stands for "teletype".
The second kind of scripting language is used to implement commands inside
some interactive program that isn't a shell. (These languages
are also called extension languages, because they're
extending the capabilities of their host program, or sometimes
configuration languages.) Extension languages generally look
nothing at all like something you'd type into a shell -- they're really
just programming languages, and often are just the programming language
the application was written in. The commands of an interactive program
like a text editor or graphics editor tend to be things like single
keystrokes and mouse gestures, and in most cases you wouldn't want to --
or even be able to -- write programs with them. I'll use "extension
languages" for languages used in this way. There's some overlap between
the two kinds, and I'll talk about that later.
Shell scripting languages
Before there was Unix, there were mainframes. At first, you would punch
out decks of Hollerith cards and hand them to the computer operator, who
would (eventually) put them in the reader and push the start button, and you
would come back an hour or so later and pick up your deck with a pile of
listings from the printer.
Computers were expensive in those days, so to save time the
operator would pile a big batch of card decks on top of one another with
a couple of "job control" cards in between to separate the jobs. Job
control languages were really the first scripting languages. (And the old
terminology lingers on, as such things do, in the ".bat" extension of
MS-DOS (later Windows) "batch files". Which are shell scripts.)
By far the most sophisticated job control language ran on the Burroughs
5000 and 6000 series computers, which were designed to run Algol very
efficiently. (So efficiently that they used Algol as what amounted to
their assembly language! Programs in other languages, including Fortran
and Cobol, were compiled by first translating them into Algol.) The job
control language was a somewhat extended version of Algol in which some
variables had files as their values, and programs were simply
subroutines. Don't let anyone tell you that all scripting languages are
interpreted.
Side note: the Burroughs machines' operating system was called MCP, which
stands for Master Control Program. Movie fans may find that name familiar.
Even DOS batch files had control-flow statements (conditionals and loops)
and the ability to substitute variables into commands. But these features
were clumsy to use. In contrast, the Unix shell written by Stephen Bourne
at Bell Labs was designed as a scripting language. The syntax of
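the control structures was, in fact, derived from Algol 68, which
introduced the "if...fi" and "case...esac" style of closing keywords.
(Algol 68 closes its loops with "od", but that name was already taken by
a Unix command, so the shell uses "do...done" instead.)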
Bourne's shell was called sh in Unix's characteristically terse style.
The version of Unix developed at Berkeley (BSD, for Berkeley Software
Distribution -- I'll talk about the history of operating systems some
time) had a shell called the C shell, csh, which had a syntax derived
from the C programming language. That immediately gave rise to the
popular tongue-twister "she sells cshs by the cshore".
The GNU (GNU's Not Unix) project, started by Richard Stallman with the
goal of producing a completely free replacement for Unix, naturally had
its own rewrite of the Bourne Shell called bash -- the Bourne Again
Shell. It's a considerable improvement over the original, pulling in
features from csh and some other shell variants.
Let's look at shell scripting a little more closely. The basic statement
is a command -- the name of a program followed by its arguments, just as
you would type it on the command line. If the command isn't one of the
few built-in ones, the shell then looks for a file that matches the name
of the command, and runs it. The program eventually produces some output,
and exits with a result code that indicates either success or failure.
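Here is a small sketch of both halves of that: how the shell resolves a command name, and what the command reports back. It uses the standard "command -v" built-in; the variable names and the nonexistent directory are made up for illustration.

```shell
#!/bin/sh
# Sketch only: variable names and /no/such/directory are hypothetical.

# "command -v" prints how the shell would resolve a name:
# the bare name for a built-in, a full path for a program file.
builtin_name=$(command -v cd)
program_path=$(command -v ls)
echo "cd resolves to: $builtin_name"
echo "ls resolves to: $program_path"

# Every command exits with a result code; 0 means success.
ok_status=0
ls / > /dev/null || ok_status=$?
echo "ls / exited with $ok_status"

# Asking ls for a directory that does not exist makes it fail,
# and the "||" branch records the nonzero result code.
bad_status=0
ls /no/such/directory 2> /dev/null || bad_status=$?
echo "ls /no/such/directory exited with $bad_status"
```

The exact nonzero value depends on the program; the shell only cares whether it is zero or not.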
There are a few really brilliant things going on here.
- Each program gets run in a separate process. Unix was originally a
time-sharing operating system, meaning that many people
could use the computer at the same time, each typing at their own
terminal, and the OS would run all their commands at once, a little at
a time.
- That means that you can pipe the output of one command into the input
of another. That's called a "pipeline"; the commands are separated by
vertical bars, like | this, so the '|' character is often called "pipe"
in other contexts. It's a lot shorter than saying "vertical bar".
- You can "redirect" the output of a command into a file. There's even a
"pipe fitting" command called tee that does both: copies its input into
a file, and also passes it along to the next command in the pipeline.
- The shell uses the command's result code for control -- there's a
program called true that does nothing but immediately returns success,
and another called false that immediately fails. There's another one,
test, which can perform various tests, for example to see whether two
strings are equal, or a file is writable. There's an alias for it: "[".
Unix allows all sorts of characters in filenames. Anyway, you can say
things like if [ -w "$f" ]; then...
- You can also use a command's output as part of another command line, or
put it into a variable. today=`date` takes the result of running the
date program and puts it in a variable called today.
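The features in the list above fit together in just a few lines. This is only a sketch: the file names and sample words are made up, and it uses $(...), which is the newer spelling of the backtick form.

```shell
#!/bin/sh
# Sketch only: file names and sample data are hypothetical.
workdir=$(mktemp -d)

# Pipeline: printf writes three lines, sort orders them, and tee copies
# the sorted lines into a file while passing them along to wc.
count=$(printf 'pear\napple\nfig\n' | sort | tee "$workdir/sorted.txt" | wc -l)

# Redirection: send a command's output into a file.
echo "lines: $count" > "$workdir/summary.txt"

# Result codes drive control flow; "[" is the test command.
if [ -w "$workdir/summary.txt" ]; then
    echo "summary.txt is writable"
fi

# Command substitution: capture a command's output in a variable.
first=$(head -n 1 "$workdir/sorted.txt")
echo "first line after sorting: $first"
```

Each stage of the pipeline is a separate process, and none of them knows or cares what is on the other end of its pipe.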
This is basically functional programming, with programs as functions and
files as variables. (Of course, you can define variables and functions in
the shell as well.) In case you were wondering whether Bash is a "real"
programming language, take a look at nanoblogger and
Abcde (A Better CD Encoder).
Sometime later in this series I'll devote a whole post to an introduction
to shell scripting. For now, I'll just show you a couple of my favorite
one-liners to give you a taste for it. These are tiny but useful scripts
that you might type off the top of your head. Note that comments in shell
-- almost all Unix scripting languages, as a matter of fact -- start with
an octothorpe. (I'll talk about octothorpe/sharp/hash/pound later, too.)
# wait until nova (my household server) comes back up after a reboot
until ping -c1 nova; do sleep 10; done
# count my blog posts. wc counts words, lines, and/or characters.
find $HOME/.ljarchive -type f -print | wc -l
# find all posts that were published in January.
# grep prints lines in its input that match a pattern.
find $HOME/.ljarchive -type f -print | grep /01/ | sort
Other scripting languages
As you can see, shell scripts tend to be a bit cryptic. That's partly
because shells are also meant to have commands typed at them directly, so
brevity is often favored over clarity. It's also because all of the
operations that work on files are programs in their own right;
they often have dozens of options and were written at different times by
different people. The find program is often cited as a good (or bad)
example of this -- it has a very different set of options from any other
program, because you're trying to express a rather complicated
combination of tests on its command line.
Some things are just too complicated to express on a single line, at least
with anything resembling readability, so many other programs besides
shells are designed to run scripts. Some of the first of these in Unix
were sed, the "stream editor", which applies text editing operations to
its input, and awk, which splits lines into "fields" and lets you do
database-like operations on them. (Simpler programs that also split
lines into fields include sort, uniq, and join.)
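To give a feel for these tools, here is a minimal sketch that runs a tiny sed script and a tiny awk script over the same input. The sample data (a name, a fruit, and a count on each line) is made up for illustration.

```shell
#!/bin/sh
# Sketch only: the sample data is hypothetical.
data='alice apple 3
bob pear 5
carol apple 2'

# sed: apply an editing command to every line (replace "apple" with "fig").
edited=$(printf '%s\n' "$data" | sed 's/apple/fig/')
printf '%s\n' "$edited"

# awk: split each line into fields and add up the third field.
total=$(printf '%s\n' "$data" | awk '{ sum += $3 } END { print sum }')
echo "total: $total"

# sort also understands fields: order the lines numerically by field 3.
smallest=$(printf '%s\n' "$data" | sort -k 3,3n | head -n 1)
echo "smallest count: $smallest"

# awk, sort, and uniq together: count how often each fruit appears.
printf '%s\n' "$data" | awk '{ print $2 }' | sort | uniq -c
```

The quoted arguments to sed and awk are scripts in their own little languages; anything longer usually goes in a file of its own.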
DOS and Windows look at the last three characters of a program's name
(e.g., "exe" for "executable" machine language and "bat" for "batch"
scripts) to determine what it contains and how to run it. Unix, on the
other hand, looks at the first few characters of the file itself. In
particular, if these are "#!" followed by the name of a program (I'm
simplifying a little), the file is passed to that program to be run as a
script. The "#!" combination is usually pronounced "shebang". This
accounts for the popularity of "#" to mark comments -- lines that are
meant to be ignored -- in most scripting languages.
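A minimal sketch of the shebang at work follows. The file name "greet" and the temporary directory are made up for illustration, and the sketch assumes your temporary directory allows programs to be run from it.

```shell
#!/bin/sh
# Sketch only: "greet" and the temporary directory are hypothetical.
workdir=$(mktemp -d)

cat > "$workdir/greet" <<'EOF'
#!/bin/sh
echo "hello from a script"
EOF
chmod +x "$workdir/greet"

# The first two characters of the file are the shebang.
magic=$(head -n 1 "$workdir/greet" | cut -c 1-2)
echo "first two characters: $magic"

# Run directly: the system reads the first line and hands the file to /bin/sh.
direct=$("$workdir/greet")
echo "$direct"

# Run through sh explicitly: the same first line is now just a comment.
explicit=$(sh "$workdir/greet")
echo "$explicit"
```

Both runs print the same thing, which is the point: the interpreter never has to know that its first line meant something to the operating system.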
The scripting programs we've seen so far -- sh, sed, awk, and some
others -- are all designed to do one kind of thing. Shells mostly just
run commands, assign variables, and substitute variables into commands,
and rely on other programs like find and grep to do most other things.
Wouldn't it be nice if one could combine all these functions into one
program, and give it a better language to write programs in? The first
of these that really took off was Larry Wall's Perl. Like the others it
allows you to put simple commands on the command line -- with exactly
the same syntax as grep and awk.
Perl's operations for searching and substituting text look just like the
ones in sed and grep. It has associative arrays (basically lookup
tables) just like the ones in awk. It can run programs and get their
results exactly the way sh does, by enclosing them in backtick characters
(`...` -- originally meant to be used as left single quotes), and it can
easily read lines out of files, mess with them, and write them out. It
has objects, methods, and (more or less) first-class functions. And just
like find and the Unix command line, it has a well-earned reputation for
scripts that are obscure and hard to read.
You've probably heard Python mentioned. It was designed by Guido van
Rossum in an attempt to be a better scripting language than Perl, with an
emphasis on making programs more readable, easier to write, and easier
to maintain. He succeeded. At this point Python has mostly replaced Perl
as the most popular scripting language, in addition to being a good
first language for learning programming. (Which is the best language for
learning is a subject guaranteed to provoke strong opinions and heated
discussions;
I'll avoid it for now.) I avoided Python for many years, but I'm finally
learning it and finding it much better than I expected.
Extension languages
The other major kind of scripting is done to extend a program that
isn't a shell. In most cases this will be an interactive program
like an editor, but it doesn't have to be. Extensions of this sort may
also be called "plugins".
Extension languages are usually small, simple, and interpreted, because
nobody wants their text editor (for example) to include something as large
and complex as a compiler when its main purpose is defining keyboard
shortcuts. There's an exception to this -- sometimes when a program is
written in a compiled language, the same language may be used for
extensions. In that case the extensions have to be compiled in, which is
usually inconvenient, but they can be particularly powerful. I've already
written about one such case -- the Xmonad window
manager, which is written and configured in Haskell.
Everyone these days has at least heard of JavaScript, which is
the scripting language used in web pages. Like most scripting languages,
JavaScript has escaped from its enclosure in the browser and run wild, to
the point where text editors, whole web browsers, web servers,
and so on are built in it.
Other popular extension languages include various kinds of Lisp, Tcl, and
Lua. Lua and Tcl were explicitly designed to be embedded in
programs. Lua is particularly popular in games, although it has recently
turned up in other places, including the TeX typesetting system.
Lisp is an interesting case -- probably its earliest use as an extension
language was in the Emacs text editor, which is almost entirely written in
it. (To the point where many people say that it's a very good Lisp
interpreter, but it needs a better text editor. I'm not one of them: I'm
passionately fond of Emacs, and I'll write about it at greater length
later on.) Because of its radically simple structure, Lisp is
particularly easy to write an interpreter for. Emacs isn't the only
example; there are Lisp variants in the Audacity audio workstation and
Autodesk's AutoCAD program. I used the one in Audacity for the sound effects in
my computer/horror crossover song "Vampire Megabyte".
Emacs, Atom (a text editor written in JavaScript), and Xmonad are good
examples of interactive programs where the same language is used for
(most, if not all, of) the implementation as well as for the configuration
files and the extensions. The boundaries can get very fuzzy in cases like
that; as a Mandelbear I find that particularly appealing.
Another fine post from
The Computer Curmudgeon.