mdlbear: (technonerdmonster)

You may remember this post about renaming the default branch in Git repositories. Since then I've done some script writing -- they say you don't really understand a process until you can write a program that does it, and this was no exception. (There are lots of exceptions, actually, but that's rather beside the point of this post...)

Anyway, here's what I think is the best way to rename master to main in a clone of a repository where that rename has already been done. (That's a common case anywhere you have multiple developers, each with their own clone, or one developer like me who works on a different laptop depending on the time of day and where the cats are sitting.)

     git fetch
     git branch -m master main
     git branch -u origin/main main
     git remote set-head origin main
     git remote prune origin

The interesting part is why this is the best way I've found of doing it: 1. It works even if master isn't the current branch, or if it's out of date or diverged from upstream. 2. It doesn't print extraneous warnings or fail with an error. Neither of those is a problem if you're doing everything manually, but it can be annoying or fatal in a script. So here it is again, with commentary:

git fetch -- you have to do this first, or the git branch -u ... line will fail because git will think you're setting upstream to a branch that doesn't exist on the origin.

git branch -m master main -- note that the renamed branch will still be tracking master. We fix that with...

git branch -u origin/main main -- many of the pages I've seen use git push -u..., but the push isn't necessary and has several different ways it can fail, for example if the current branch isn't main or if it isn't up to date.

git remote set-head origin main -- This sets main as the default branch, so things like git push will work without naming the branch. You can use -a for "automatic" instead of the branch name, but why make git do extra work? Many of the posts I've seen use the following low-level command, which works but isn't very clear and relies on implementation details you shouldn't have to bother with:

    git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main

git remote prune origin -- I've seen people suggesting git fetch --prune, but we already did the fetch way back in step 1. Alternatively, we could use --prune on that first fetch, but then git will complain about master tracking a branch that doesn't exist. It still works, but it's annoying in a script.

Just as an aside because I think it's amusing: my former employer (a large online retailer) used and probably still uses "mainline" for the default branch, and I've seen people suggesting as an alternative to "main". It is, if anything, more jarring than "master" for someone who has previously encountered "mainlining" only in the context of self-administered street drugs.

Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com).
Donation buttons in profile.

mdlbear: (technonerdmonster)

Today the computer curmudgeon talks about how he blogs.

I think it's widely known that I update this journal (and by that I mean both my Dreamwidth journal, which you are probably reading now, and my Computer Curmudgeon website), using a rickety combination of shell scripts and makefiles called MakeStuff. There are several good reasons for that. (Whether "because I can" is a good reason or not is debatable. There are others.)

A little over a year ago I made a planning post, and the "Where I am now" section remains a pretty good, albeit sketchy, description of the process. There's also an even sketchier one in the README file for MakeStuff/blogging. It had a list of what I wanted to do next, but essentially the only thing I've actually done is posting in either HTML or Markdown.

I have, however, reorganized things a bit, so that all of the relevant scripts are in MakeStuff/blogging -- the last part was moving in the script, now called charm-wrapper, that takes the post's metadata out of its email-like header and turns it into a command line for charm, a livejournal/dreamwidth posting client written in Python. It isn't a very good solution, but it works.

And since I'm running out of time to make a post today, I'm going to start this series with the other utilities in MakeStuff/blogging. And then go add this list to the README.

  • check-html is a simple wrapper for html-tidy, a popular HTML syntax checker. Since it's not actually being used to fix syntax, it can play fast-and-loose with things like the header (it's just text) and blog-specific tags like <cut> and <user>. It handles those by putting a space after the "<" character. It would be trivial to add this to the make recipe for post.
  • last-post is a site-scraper that returns the URL of your most recent post. It's useful, because charm doesn't return it. (I eventually put that functionality into charm-wrapper, but it's still useful. Eventually it wants to take a date on the command line.
  • word-count is what I use to generate the NaBloPoMo statistics at the bottom of posts in November. (In other months, you just get a straight list; you can also get a listing for a whole year.)
NaBloPoMo stats:
     43 2019/11/01--rabbit-rabbit-rabbit.html
   1731 2019/11/02--s4s-memorials.html
   1465 2019/11/03--done-since-1027.html
    145 2019/11/04--i-ought-to-post-something.html
    430 2019/11/05--how-to-makestuff.html
-------
   3814 words in 5 posts this month (average 762/post)
    430 words in 1 post today

Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com).
Donation buttons in profile.

mdlbear: (technonerdmonster)

Today in my continuing series on programming languages I'm going to talk about "scripting languages". "Scripting" is a rather fuzzy category, because unlike the kinds of languages we've discussed before, scripting languages are really distinguished by how they are used, and they're used in two very different ways. It's also confusing because most scripting languages are interpreted, and people tend to use "scripting" when they should be using "interpreted". In my opinion it's more correct to say that a language is being used as a scripting language, rather than to say that it is a scripting language. As we'll see, this is particularly true when the language is being used to customize some application.

But first, let's define scripts. A script is basically a sequence of commands that a user could type at a terminal[1] -- often called "the command line" -- that have been put together in a file so that they can be run automatically. The script then becomes a new command. In Linux, and before that Unix, the program that interprets user commands is called a "shell", possibly because it's the visible outer layer of the operating system. The quintessential script is a shell script. We'll dive into the details later.

[1] okay, a terminal emulator. Hardly anyone uses physical terminals anymore. Or remembers that the "tty" in /dev/tty stands for "teletype"'.

The second kind of scripting language is used to implement commands inside some interactive program that isn't a shell. (These languages are also called extension languages, because they're extending the capabilities of their host program, or sometimes configuration languages.) Extension languages generally look nothing at all like something you'd type into a shell -- they're really just programming languages, and often are just the programming language the application was written in. The commands of an interactive program like a text editor or graphics editor tend to be things like single keystrokes and mouse gestures, and in most cases you wouldn't want to -- or even be able to -- write programs with them. I'll use "extension languages" for languages used in this way. There's some overlap in between, and I'll talk about that later.

Shell scripting languages

Before there was Unix, there were mainframes. At first, you would punch out decks of Hollerith cards, hand them to the computer operator, and they would (eventually) put it in the reader and push the start button, and you would come back an hour or so later and pick up your deck with a pile of listings from the printer.

Computers were expensive in those days, so to save time the operator would pile a big batch of card decks on top of one another with a couple of "job control" cards in between to separate the jobs. Job control languages were really the first scripting languages. (And the old terminology lingers on, as such things do, in the ".bat" extension of MS/DOS (later Windows) "batch files". Which are shell scripts.)

By far the most sophisticated job control language ran on the Burroughs 5000 and 6000 series computers, which were designed to run Algol very efficiently. (So efficiently that they used Algol as what amounted to their assembly language! Programs in other languages, including Fortran and Cobol, were compiled by first translating them into Algol.) The job control language was a somewhat extended version of Algol in which some variables had files as their values, and programs were simply subroutines. Don't let anyone tell you that all scripting languages are interpreted.

Side note: the Burroughs machines' operating system was called MCP, which stands for Master Control Program. Movie fans may find that name familiar.

Even DOS batch files had control-flow statements (conditionals and loops) and the ability to substitute variables into commands. But these features were clumsy to use. In contrast, the Unix shell written by Stephen Bourne at Bell Labs was designed as a scripting language. The syntax of the control structures was, in fact, derived from Algol 68, which introduced the "if...fi" and "do...done" syntax.

Bourne's shell was called sh in Unix's characteristically terse style. The version of Unix developed at Berkeley, (BSD, for Berkeley System Distribution -- I'll talk about the history of operating systems some time) had a shell called the C shell, csh, which had a syntax derived from the C programming language. That immediately gave rise to the popular tongue-twister "she sells cshs by the cshore".

The GNU (GNU's Not Unix) project, started by Richard Stallman with the goal of producing a completely free replacement for Unix, naturally had its own rewrite of the Bourne Shell called bash -- the Bourne Again Shell. It's a considerable improvement over the original, pulling in features from csh and some other shell variants.

Let's look at shell scripting a little more closely. The basic statement is a command -- the name of a program followed by its arguments, just as you would type it on the command line. If the command isn't one of the few built-in ones, the shell then looks for a file that matches the name of the command, and runs it. The program eventually produces some output, and exits with a result code that indicates either success or failure.

There are a few really brilliant things going on here.

  • Each program gets run in a separate process. Unix was originally a time-sharing operating system, meaning that many people could use the computer at the same time, each typing at their own terminal, and the OS would run all their commands at once, a little at a time.
  • That means that you can pipe the output of one command into the input of another. That's called a "pipeline"; the commands are separated by vertical bars, like | this, so the '|' character is often called "pipe" in other contexts. It's a lot shorter than saying "vertical bar".
  • You can "redirect" the output of a command into a file. There's even a "pipe fitting" command called tee that does both: copies its input into a file, and also passes it along to the next command in the pipeline.
  • The shell uses the command's result code for control -- there's a program called true that does nothing but immediately returns success, and another called false that immediately fails. There's another one, test, which can perform various tests, for example to see whether two strings are equal, or a file is writable. There's an alias for it: [. Unix allows all sorts of characters in filenames. Anyway, you can say things like if [ -w $f ]; then...
  • You can also use a command's output as part of another command line, or put it into a variable. today=`date` takes the result of running the date program and puts it in a variable called today.

This is basically functional programming, with programs as functions and files as variables. (Of course, you can define variables and functions in the shell as well.) In case you were wondering whether Bash is a "real" programming language, take a look at nanoblogger and Abcde (A Better CD Encoder).

Sometime later in this series I'll devote a whole post to an introduction to shell scripting. For now, I'll just show you a couple of my favorite one-liners to give you a taste for it. These are tiny but useful scripts that you might type off the top of your head. Note that comments in shell -- almost all Unix scripting languages, as a matter of fact -- start with an octothorpe. (I'll talk about octothorpe/sharp/hash/pound later, too.)

# wait until nova (my household server) comes back up after a reboot
until ping -c1 nova; do sleep 10; done

# count my blog posts.  wc counts words, lines, and/or characters.
find $HOME/.ljarchive -type f -print | wc -l

# find all posts that were published in January.
# grep prints lines in its input that match a pattern.
find $HOME/.ljarchive -type f -print | grep /01/ | sort

Other scripting languages

As you can see, shell scripts tend to be a bit cryptic. That's partly because shells are also meant to have commands typed at them directly, so brevity is often favored over clarity. It's also because all of the operations that work on files are programs in their own right; they often have dozens of options and were written at different times by different people. The find program is often cited as a good (or bad) example of this -- it has a very different set of options from any other program, because you're trying to express a rather complicated combination of tests on its command line.

Some things are just too complicated to express on a single line, at least with anything resembling readability, so many other programs besides shells are designed to run scripts. Some of the first of these in Unix were sed, the "stream editor", which applies text editing operations to its input, and awk, which splits lines into "fields" and lets you do database-like operations on them. (Simpler programs that also split lines into fields include sort, uniq, and join.)

DOS and Windows look at the last three characters of a program's name (e.g., "exe" for "executable" machine language and "bat" for "batch" scripts) to determine what it contains and how to run it. Unix, on the other hand, looks at the first few characters of the file itself. In particular, if these are "#!" followed by the name of a program (I'm simplifying a little), the file is passed to that program to be run as a script. The "#!" combination is usually pronounced "shebang". This accounts for the popularity of "#" to mark comments -- lines that are meant to be ignored -- in most scripting languages.

The scripting programs we've seen so far -- sh, sed, awk, and some others -- are all designed to do one kind of thing. Shells mostly just run commands, assign variables, and substitute variables into commands, and rely on other programs like find and grep to do most other things. Wouldn't it be nice if one could combine all these functions into one program, and give it a better language to write programs in. The first of these that really took off was Larry Wall's Perl. Like the others it allows you to put simple commands on the command line -- with exactly the same syntax as grep and awk.

Perl's operations for searching and substituting text look just like the ones in sed and grep. It has associative arrays (basically lookup tables) just like the ones in awk. It can run programs and get their results exactly the way sh does, by enclosing them in backtick characters (`...` -- originally meant to be used as left single quotes), and it can easily read lines out of files, mess with them, and write them out. It has has objects, methods, and (more or less) first-class functions. And just like find and the Unix command line, it has a well-earned reputation for scripts that are obscure and hard to read.

You've probably heard Python mentioned. It was designed by Guido van Rossum in an attempt to be a better scripting language Perl, with an emphasis on making programs more readable, easier to write, and easier to maintain. He succeeded. At this point Python has mostly replaced Perl as the most popular scripting language, in addition to being a good first language for learning programming. (Which is the best language for learning is a subject guaranteed to invoke strong opinions and heated discussions; I'll avoid it for now.) I avoided Python for many years, but I'm finally learning it and finding it much better than I expected.

Extension languages

The other major kind of scripting is done to extend a program that isn't a shell. In most cases this will be an interactive program like an editor, but it doesn't have to be. Extensions of this sort may also be called "plugins".

Extension languages are usually small, simple, and interpreted, because nobody wants their text editor (for example) to include something as large and complex as a compiler when its main purpose is defining keyboard shortcuts. There's an exception to this -- sometimes when a program is written in a compiled language, the same language may be used for extensions. In that case the extensions have to be compiled in, which is usually inconvenient, but they can be particularly powerful. I've already written about one such case -- the Xmonad window manager, which is written and configured in Haskell.

Everyone these days has at least heard of JavaScript, which is the scripting language used in web pages. Like most scripting languages, JavaScript has escaped from its enclosure in the browser and run wild, to the point where text editors, whole web browsers, web servers, and so on are built in it.

Other popular extension languages include various kinds of Lisp, Tcl, and Lua. Lua and Tcl were explicitly designed to be embedded in programs. Lua is particularly popular in games, although it has recently turned up in other places, including the TeX typesetting system.

Lisp is an interesting case -- probably its earliest use as an extension language was in the Emacs text editor, which is almost entirely written in it. (To the point where many people say that it's a very good Lisp interpretor, but it needs a better text editor. I'm not one of them: I'm passionately fond of Emacs, and I'll write about it at greater length later on.) Because of its radically simple structure, Lisp is particularly easy to write an interpretor for. Emacs isn't the only example; there are Lisp variants in the Audacity audio workstation and the Autodesk CAD program. I used the one in Audacity for the sound effects in my computer/horror crossover song "Vampire Megabyte".

Emacs, Atom (a text editor written in JavaScript), and Xmonad are good examples of interactive programs where the same language is used for (most, if not all, of) the implementation as well as for the configuration files and the extensions. The boundaries can get very fuzzy in cases like that; as a Mandelbear I find that particularly appealing.

Another fine post from The Computer Curmudgeon.

Most Popular Tags

Syndicate

RSS Atom

Style Credit

Page generated 2025-06-07 03:05 pm
Powered by Dreamwidth Studios