mdlbear: (technonerdmonster)
2019-01-23 01:52 pm
mdlbear: (technonerdmonster)
2019-01-18 09:43 pm
Entry tags:

Yet Another Data Breach

It's getting so that data breaches aren't news anymore unless they're huge. The Gizmodo article calls it The Mother of All Breaches, exposing 773 million email addresses and 21 million passwords. There's a more complete post by Troy Hunt: The 773 Million Record "Collection #1" Data Breach. Hunt is the person behind the Have I Been Pwned website. That should be your next stop -- it lets you check to see which of your email addresses, usernames, and passwords have appeared in any data breach.

If your password shows up in Pwned Passwords, stop using it. Consider enabling two-factor authentication where you can, and getting a password vault. Hunt recommends 1Password. If you want open source, you can try KeePassX.

Another fine post from The Computer Curmudgeon (also at

mdlbear: (technonerdmonster)
2019-01-07 08:20 pm

Privacy Tips

Some day I ought to put together a comprehensive list of privacy-related links. This is not that list; it's just a few of the links that came my way recently, in no particular order.

I'd suggest starting with the ACLU's What Individuals Should Do Now That Congress Has Obliterated the FCC’s Privacy Protections. It's a good overview.

DuckDuckGo is my current privacy-preserving search engine of choice. The DuckDuckGo Blog has been a good source of additional information. I especially recommend this article on How to Set Up Your Devices for Privacy Protection -- it has advice for iOS, Android, Mac, Windows 10 and 7, and Linux. Also check out a broader range of tips here.

The Electronic Frontier Foundation, as you might expect, is another great source of information. I suggest starting with Tools from EFF's Tech Team. While you're there, install Privacy Badger. It's not exactly an ad blocker; what it does is block trackers.

Here's an article on Which Browser Is Better for Privacy? (Spoiler: it's Firefox.) Then go to Firefox Privacy - The Complete How-To Guide.

For the paranoid among us, there are few things better than Tor Browser. If you use it, you'll probably want to turn off Javascript as well.

The Linux Journal's article on Data Privacy: Why It Matters and How to Protect Yourself has a lot of good advice, most of which isn't Linux-specific at all.

However, if you are running Linux, you'll want to look at How To Encrypt Your Home Folder After Ubuntu Installation, Locking down and securing SSH access to your server, and Own Your DNS Data.

Another fine post from The Computer Curmudgeon (also at

mdlbear: (technonerdmonster)
2018-12-28 06:10 pm
Entry tags:

PSA: WA Emergency Numbers

From :

In case of an emergency and you can't get through by dialing 911, you can dial the following numbers for dispatch centers:

Chelan/Douglas County 911 Countywide 911 Center for Police and Fire (509) 663-9911 Clallam County 911 Countywide 911 Center for Police and Fire 360-417-2259/2459 or 360-417-4970 Grays Harbor 911 Countywide 911 Center for Police and Fire (800) 281-6944 Island County 911 Countywide 911 Center for Police and Fire (360) 678-6116 Jefferson County 911 Countywide 911 Center for Police and Fire 360-385-3831 or 360-344-9779 EXT. 0 or text 911 King County 911 Bothell Police (425) 486-1254 Enumclaw Police (360) 417-2259 Lake Forrest Park Police (425) 486-1254 Issaquah Police (425) 837- 3200 Redmond Police (425) 556-2500 Snoqualmie Police (425) 888-3333 Seattle Police (206) 625-5011 Seattle Fire (206) 583-2111 Norcom (425) 577-5656 Fire Departments – Bellevue FD, Bothell FD, Duvall FD, Eastside Fire and Rescue, Fall City FD, Kirkland FD, Mercer Island FD, Northshore FD, Redmond FD, Shoreline FD, Skykomish FD, Snoqualmie FD, Snoqualmie Pass Fire and Rescue and Woodinville Fire and Rescue Police Departments – Bellevue PD, Clyde Hill PD, Medina PD, Kirkland PD, Mercer Island PD and Normandy Park Police. Valley Com (253) 852-2121 Fire Departments - Valley Regional Fire Authority (Algona, Pacific and Auburn), South King Fire and Rescue (Federal Way and Des Moines), Puget Sound Regional Fire Authority (Kent, Seatac, Covington and Maple Valley), Tukwila FD, Renton FD, Burien /Normandy Park FD, Skyway Fire, Mountain View Fire and Rescue, Palmer Selleck Fire Districts, Vashon Island Fire and Rescue, Enumclaw FD, King County Airport (Boeing Field) and King County Medic One Police Departments - Algona PD, Pacific PD, Auburn PD, Des Moines PD, Federal Way PD, Kent PD, Renton PD and Tukwila PD. King County Sheriff’s Office (206) 296-3311 Town of Beaux Arts, City of Burien, City of Carnation, City of Covington, City of Kenmore, King County Airport Police (Boeing Field), City of Maple Valley, King County Metro Transit, Muckleshoot Indian Tribe, City of Newcastle, City of Sammamish, City of Seatac, City of Shoreline, Town of Skykomish, Sound Transit and City of Woodinville. Kitsap County 911 Countywide 911 Center for Police and Fire (360)-308-5400 Kittitas County 911 Lower County: 509 925 8534 Upper County: 509 674 2584, select 1, then select 1 for KITTCOM Lewis County 911 Countywide 911 Center for Police and Fire (360) 740-1105 Mason County 911 Countywide 911 Center for Police and Fire (360) 426-4441 Pacific County 911 Countywide 911 Center for Police and Fire (360) 875-9397 Pierce County 911 Countywide 911 Center for Police and Fire (253) 798-4722 *Except Tacoma, Fircrest, Fife and Ruston - call Tacoma Fire Dispatch (253)627-0151 San Juan County 911 Countywide 911 Center for Police and Fire (360) 378-4151 Skagit County 911 Countywide 911 Center for Police and Fire (360) 428-3211 Snohomish County 911 Countywide 911 Center for Police and Fire (425) 407-3999 Thurston County 911 Countywide 911 Center for Police and Fire (360) 704-2740 Whatcom County 911 Whatcom County Fire (360) 676-6814 Whatcom County Sheriff (360) 676-6911

Another fine post from The Computer Curmudgeon (also at

mdlbear: (technonerdmonster)
2018-12-28 02:02 pm
Entry tags:

PSA: Centurylink outage

Last night around 11pm I was awakened by an alert on my phone telling me that 911 service was down, and giving me an alternat number to call. By morning, it was clear that it wasn't a local problem. A quick search showed that the problem was caused by CenturyLink, which tweeted, blaming it on a a network element that was impacting customer services and saying that they estimated it would be fixed in about four hours.

It was more like twelve here on Whidbey Island, and some parts of the country are still (as of 2pm) offline, according to Outage.Report. The FCC is investigating.

If you live in Washington, has a handy list of numbers to call, by county. (The news article also has auto-playing video - you may want to mute your speakers.)

Notes & links, as usual )

Another fine post from The Computer Curmudgeon (also at

mdlbear: (technonerdmonster)
2018-11-12 04:53 pm


Most humans multitask rather badly -- studies have shown that when one tries to do two tasks at the same time, both tasks suffer. That's why many states outlaw using a cell phone while driving. Some people are much better than others at switching between tasks, especially similar tasks, and so give the appearance of multitasking. There is still a cost to switching context, though. The effect is much less if one of the tasks requires very little attention, knitting during a conversation, or sipping coffee while programming. (Although I have noticed that if I get deeply involved in a programming project my coffee tends to get cold.) It may surprise you to learn that computers have the same problem.

Your computer isn't really responding to your keystrokes and mouse clicks, playing a video from YouTube in one window while running a word processor in another, copying a song to a thumb drive, fetching pages from ten different web sites, and downloading the next Windows update, all at the same time. It's just faking it by switching between tasks really fast. (That's only partially true. We'll get to that part later, so if you already know about multi-core processors and GPUs, please be patient. Or skip ahead. Like a computer, my output devices can only type one character at a time.)

Back when computers weighed thousands of pounds, cost millions of dollars, and were about a million times slower than they are now, people started to notice that their expensive machines were idle a lot of the time -- they were waiting for things to happen in the "real world", and when the computer was reading in the next punched card it wasn't getting much else done. As computers got faster -- and cheaper -- the effect grew more and more noticable, until some people realized that they could make use of that idle time to get something else done. The first operating systems that did this were called "foreground/background" systems -- they used the time when the computer was waiting for I/O to switch to a background task that did something that did a lot of computation and not much I/O.

Once when I was in college I took advantage of the fact that the school's IBM 1620 was just sitting there most of the night to write a primitive foreground/background OS that consisted of just two instructions and a sign. The instructions dumped the computer's memory onto punched cards and then halted. The sign told whoever wanted to use the computer to flip a switch, wait for the dump to be punched out, and load it back in when they were done with whatever they were doing. I got a solid week of computation done. (It would take much less than a second on your laptop or even your phone, but we had neither laptop computers nor cell phones in 1968.)

By the end of the 1950s computers were getting fast enough, and had enough memory, that people could see where things were headed, and several people wrote papers describing how one could time-share a large, fast computer among several people to give them each the illusion that they had a (perhaps somewhat less powerful) computer all to themselves. The users would type programs on a teletype machine or some other glorified typewriter, and since it takes a long time for someone to type in a program or make a change to it, the computer had plenty of time to do actual work. The first such systems were demonstrated in 1961.

I'm going to skip over a lot of the history, including minicomputers, which were cheap enough that small colleges could afford them (Carleton got a PDP-8 the year after I graduated). Instead, I'll say a little about how timesharing actually works.

A computer's operating system is there to manage resources, and in a timesharing OS the goal is to manage them fairly, and switch contexts quickly enough for users to think that they're using the whole machine by themselves. There are three main resources to manage: time (on the CPU), space (memory), and attention (all those users typing at their keyboards).

There are two ways to manage attention: polling all of the attached devices to see which ones have work to do, and letting the devices interrupt whatever was going on. If only a small number of devices need attention, it's a lot more efficient to let them interrupt the processor, so that's how almost everything works these days.

When an interrupt comes in, the computer has to save whatever it was working on, do whatever work is required, and then put things back the way they were and get back to what it was doing before. This takes time. So does writing about it, so I'll just mention it briefly before getting back to the interesting stuff.

See what I did there? This is a lot like what I'm doing writing this post, occasionally switching tasks to eat lunch, go shopping, sleep, read other blogs, or pet the cat that suddenly sat on my keyboard demanding attention.

Let's look at time next. The computer can take advantage of the fact that many programs perform I/O to use the time when it's waiting for an I/O operation to finish to look around and see whether there's another program waiting to run. Another good time to switch is when an interrupt comes in -- the program's state already has to be saved to handle the interrupt. There's a bit of a problem with programs that don't do I/O -- these days they're usually mining bitcoin. So there's a clock that generates an interrupt every so often. In the early days that used to be 60 times per second (50 in Britain); a sixtieth of a second was sometimes called a "jiffy". That way of managing time is often called "time-slicing".

The other way of managing time is multiprocessing: using more than one computer at the same time. (Told you I'd get to that eventually.) The amount of circuitry you can put on a chip keeps increasing, but the amount of circuitry required to make a CPU (a computer's Central Processing Unit) stays pretty much the same. The natural thing to do is to add another CPU. That's the point at which CPUs on a chip started being called "cores"; multi-core chips started hitting the consumer market around the turn of the millennium.

There is a complication that comes in when you have more than one CPU, and that's keeping them from getting in one another's way. Think about what happens when you and your family are making a big Thanksgiving feast in your kitchen. Even if it's a pretty big kitchen and everyone's working on a different part of the counter, you're still occasionally going to have times when more than one person needs to use the sink or the stove or the fridge. When this happens, you have to take turns or risk stepping on one another's toes.

You might think that the simplest way to do that is to run a completely separate program on each core. That works until you have more programs than processors, and it happens sooner than you might think because many programs need to do more than one thing at a time. Your web browser, for example, starts a new process every time you open a tab. (I am not going to discuss the difference between programs, processes, and threads in this post. I'm also not going to discuss locking, synchronization, and scheduling. Maybe later.)

The other thing you can do is to start adding specialized processors for offloading the more compute-intensive tasks. For a long time that meant graphics -- a modern graphics card has more compute power than the computer it's attached to, because the more power you throw at making pretty pictures, the better they look. Realistic-looking images used to take hours to compute. In 1995 the first computer-animated feature film, Toy Story, was produced on a fleet of 117 Sun Microsystems computers running around the clock. They got about three minutes of movie per week.

Even a mediocre graphics card can generate better-quality images at 75 frames per second. It's downright scary. In fairness, most of that performance comes from specialization. Rather than being general-purpose computers, graphics cards mostly just do the computations required for simulating objects moving around in three dimensions.

The other big problem, in more ways than one, is space. Programs use memory, both for code and for data. In the early days of timesharing, if a program was ready to run that didn't fit in the memory available, some other program got "swapped out" onto disk. All of it. Of course, memory wasn't all that big at the time -- a megabyte was considered a lot of memory in those days -- but it still took a lot of time.

Eventually, however, someone hit on the idea of splitting memory up into equal-sized chunks called "pages". A program doesn't use all of its memory at once, and most operations tend to be pretty localized. So a program runs until it needs a page that isn't in memory. The operating system then finds some other page to evict -- usually one that hasn't been used for a while. The OS writes out the old page (if it has to; if it hasn't been modified and it's still around in swap space, you win), and schedules the I/O operation needed to read the new page in. And because that take a while, it goes off and runs some other program while it's waiting.

There's a complication, of course: you need to keep track of where each page is in what its program thinks of as a very simple sequence of consecutive memory locations. That means you need a "page table" or "memory map" to keep track of the correspondence between the pages scattered around the computer's real memory, and the simple virtual memory that the program thinks it has.

There's another complication: it's perfectly possible (and sometimes useful) for a program to allocate more virtual memory than the computer has space for in real memory. And it's even easier to have a collection of programs that, between them, take up more space than you have.

As long as each program only uses a few separate regions of its memory at a time, you can get away with it. The memory that a program needs at any given time is called its "working set", and with most programs it's pretty small and doesn't jump around too much. But not every program is this well-behaved, and sometimes even when they are there can be too many of them. At that point you're in trouble. Even if there is plenty of swap space, there isn't enough real memory for every program to get their whole working set swapped in. At that point the OS is frantically swapping pages in and out, and things slow down to a crawl. It's called "thrashing". You may have noticed this when you have too many browser tabs open.

The only things you can do when that happens are to kill some large programs (Firefox is my first target these days), or re-boot. (When you restart, even if your browser restores its session to the tabs you had open when you stopped it, you're not in trouble again because it only starts a new process when you look at a tab.)

And at this point, I'm going to stop because I think I've rambled far enough. Please let me know what you think of it. And let me know which parts I ought to expand on in later posts. Also, tell me if I need to cut-tag it.

Another fine post from The Computer Curmudgeon (also at If you found it interesting or useful, you might consider using one of the donation buttons on my profile page.

NaBloPoMo stats:
   8632 words in 13 posts this month (average 664/post)
   2035 words in 1 post today

mdlbear: (technonerdmonster)
2018-11-02 07:39 pm
Entry tags:

Learn Enough to be Dangerous

Recently I started reading this Ruby on Rails Tutorial by Michael Hartl. It's pretty good; very hands-on, and doesn't assume that you know Ruby (that's a programming language; Rails is a web development framework). It does assume that you know enough about software development and web technology to be dangerous. And if you're not dangerous yet,...

It points you at a web site where you can learn enough to be dangerous. Starting from knowing nothing at all.

It's the author's contention that Tech is the new literacy [and] [l]earning the basics of programming is only one piece of the puzzle. LearnEnough to Be Dangerous teaches [you] to code as well as a much more powerful skill: technical sophistication. Part of that technical sophistication is knowing how to look things up or figure things out when you don't know them.

There are seven volumes in the series leading up to the Rails tutorial, giving you an introductory course in software development. I haven't gone to a bootcamp, but I'd guess that this is roughly the equivalent. More importantly, by the end of this series you'll be able to work through and understand just about any of the thousands of free tutorials on the web, and more importantly you'll have learned how to think and work like a software developer.

The first three tutorials lay the groundwork: Learn Enough Command Line..., Learn Enough Text Editor..., and Learn Enough Git to Be Dangerous. With just those, you'll know enough to set up a simple website -- and you do, on GitHub Pages. You'll also end up with a pretty good Linux or MacOS development environment (even if you're using Windows).

I have a few quibbles -- the text editor book doesn't mention Emacs, and the author is clearly a Mac user. (You don't need a tutorial on Emacs, because it has one built in -- along with a complete set of manuals. So you'll be able to try it on your own.)

The next three books are Learn Enough HTML to Be Dangerous, Learn Enough CSS & Layout, and Learn Enough JavaScript. The JavaScript is a real introduction to programming -- you'll also learn how to write tests, and of course you'll also know how to use version control, from the git tutorial.

At this point I have to admit that after starting the Ruby tutorial I went back and skimmed through the others; I'll probably want to take a closer look at the JavaScript tutorial to see if I've missed anything in my somewhat haphazard journey toward front-end web development.

The next book in the series is Learn Enough Ruby to Be Dangerouse. (If you skip it on your way to the Rails tutorial, there's a quick introduction there as well.) Ruby seems like a good choice for a second language, and learning a second programming language is important because it lets you see which ideas and structures are fundamental, and which aren't. (There's quite a lot of that about JavaScript -- it's poorly-designed in many ways, and some things about it are quite peculiar.)

Another good second or third programming language would be Python. If you'd like to go there next, or start from the beginning with Python, I can recommend Django Girls and their Tutorial. This is another from-the-ground-up introduction to web development, so of course there's a lot of overlap in the beginning.

Another fine post from The Computer Curmudgeon (also at

NaBloPoMo stats: 593 words in this post, 1172 words in 3 posts this month.

mdlbear: (technonerdmonster)
2018-10-22 05:11 pm

Further Adventures in Hyperspace

You may remember from my previous post about Hyperviewer that I’d been plagued by a mysterious bug. The second time the program tried to make a simplex (the N-dimensional version of a triangle (N=2) or tetrahedron (N=3), a whole batch of “ghost edges” appeared and the program (quite understandably) blew up. I didn’t realize it until somewhat later, but there were ghost vertices as well, and that was somewhat more fundamental. Basically, nVertices, the field that holds the number of vertices in the polytope, was wildly wrong.

Chasing ghosts

Eventually I narrowed things down to someplace around here, which is where things stood at the end of the previous post.

1        let vertices = [];
2        /* something goes massively wrong, right here. */
3        for (let i = 0; i < dim; ++i) {
4            vertices.push(new vector(dim).fill((j) => i === j? 1.0 : 0));
5        }

I found this by throwing an error, with a big data dump, right in the middle if nVertices was wrong (it’s supposed to be dim+), or if the length of the list of vertices was different from nVertices.

 1        let vertices = [];
 2        /* something goes massively wrong, right here. */
 3        if (this.nVertices !== (dim + 1) || this.nEdges !== ((dim + 1) * dim / 2) ||
 4            this.vertices.length !== 0 || this.edges.length !== 0 ) {
 5            throw new Error("nEdges = " + this.nEdges + " want " +  ((dim + 1) * dim / 2) +
 6                            "; nVertices = " + this.nVertices + " want " + dim +
 7                            "; vertices.length = " + this.vertices.length +
 8                            ' at this point in the initialization, where dim = ' + dim +
 9                            " in " + this.dimensions + '-D ' + 
10                           );
11        } 
12        for (let i = 0; i < dim; ++i) {

It appeared that nVertices was wildly wrong at that point. If I’d looked carefully and thought about what nVertices actually was, I would probably have found the bug at that point. Or even earlier. Instead, what clinched it was this:

 1        this.vertices = vertices;
 2        this.nVertices = vertices.length;  // setting this to dim+1 FAILS:
 3        // in other words, this.nVertices is getting changed between these two statements!
 4        if (this.vertices.length !== this.nVertices || this.edges.length !== 0) {
 5            throw new Error("expect " + this.nVertices + " verts, have " + this.vertices.length +
 6                            " in " + this.dimensions + '-D ' + +
 7                            "; want " + this.nEdges + " edges into " + this.edges.length
 8                           );
 9        }

The code that creates the list of vertices produces the right number of vertices. If I set nVertices equal to the length of that list, everything was fine.

If instead I set

1        this.nVertices = dim+1;

it was wrong. Huh? For example, in four dimensions, the number of vertices is supposed to be five, and that was the length of the list. When is 4+1 not equal to 5?

At this point a light bulb went off, because it was clear that dim+1 was coming out equal to 41. In three dimensions it was 31. When is 4+1 not equal to 5? When it’s actually "4"+1. In other words, dim was a string. JavaScript “helpfully” converts a string to a number when you do anything arithmetical to it, like multiply it by something or raise it to a power. But + isn’t always an arithmetic operation! In JavaScript (and many other languages) it’s also used for string concatenation.

What went wrong, and a rant

The problem was that, the second time I tried to create a simplex, the number of dimensions was coming from the user interface. From an <input element in a web form. And every value that you get from a web form is a string. HTML knows nothing about numbers, and it has no way to know what you’re going to do with the input you get.

So the fix was simple (and you can see it here on GitHub: convert the value from a string to a number right off before trying to use it as a number of dimensions. But… But… But cubes and octohedrons were right!

That’s because the number of vertices in a N-cube is 2**N, and in an N-orthoplex (octohedron in three dimensions) it’s N*2 (and multiplication is always an arithmetic operator in JavaScript). And it worked when I was creating the simplex’s vertices because it was being compared against in a for loop. And so on.

If I’d been using a strongly-typed language, the compiler would have found this two weeks ago.

There are two main ways of dealing with data in a programming language, called “strong typing” and “dynamic typing”. In a strongly-typed language, both values and variables (the boxes you put values into) have types (like “string” or “integer”), and the types have to match. You can’t put a string into a variable with a type of integer. Java is like that (mostly).

Some people find this burdensome, and they prefer dynamically-typed languages like JavaScript. In JavaScript, values have types, but variables don’t. It’s called “dynamic” typing because a variable can hold anything, and its type is that of the last thing that was put into it.

You can write code very quickly in a language where you don’t have to declare your variables and make sure they’re the right type for the kind of values you want to put into them. You can also shoot yourself in the foot much more easily.

There are a couple of strongly-typed variants on JavaScript, for example CoffeeScript and TypeScript, and a type-checker called “Flow”. I’m going to try one of those next.

There was one more problem with simplexes

(simplices?) … but that was purely geometrical, and just because I was trying to do all the geometry in my head instead of on paper, and wasn’t thinking things through.

If you’re in N dimensions, you can create an N-1 dimensional simplex by simply connecting the points with coordinates like [1,0,0], [0,1,0], and [0,0,1] (in three dimensions – it’s pretty easy to see that that gives you an equilateral triangle). Moreover, all the vertices are on the unit sphere, which is where we want them. The last vertex is a bit of a problem.

A fair amount of googling around (or DuckDuckGoing around, in my case) will eventually turn up this answer on, which says that in N dimensions, the last vertex has to be at [x,...,x] where x=-1/(1+sqrt(1+N)). Cool! And it works. Except that it’s not centered – that last vertex is a lot closer to the origin than the others. It took me longer than it should have to get this right, but the center of the simplex is its “center of mass”, which is simply the average of all the vertices. So that’s at y=(1+x)/(N+1) because there are N+1 vertices. Now we just have to subtract y from all the coordinates to shift it over until the center is at the origin.

Then of course we have to scale it so that all the vertices are back on the unit sphere. You can find the code here, on GitHub.

Another fine post from The Computer Curmudgeon.

mdlbear: (technonerdmonster)
2018-10-08 09:01 pm

Adventures in Hyperspace (and Javascript)

(This will be something of an experiment. The original was written in markdown and posted on We'll see whether the process made a hash of it. I may have to do some cleaning up.

This post is about Hyperviewer, an update of a very old demo program of mine from 1988 that displays wireframe objects rotating in hyperspace. (Actually, anywhere between four and six dimensions.) Since this is 2018, I naturally decided to write it in JavaScript, using Inferno and SVG, and put it on the web. It was a learning experience, in more ways than one.

Getting started

I had been doing a little work with React, which is pretty good an very popular, and had recently read about Inferno, which is a lighter-weight, faster framework that's almost completely interchangeable with React. Sounded good, especially since I wanted high performance for something that's going to be doing thousands of floating-point matrix multiplies per second. (A hypercube in N dimensions has 2^N vertices, and a rotation matrix has N^2 entries -- do the math). (It turns out I really didn't have to worry -- Moore's Law over three decades gives a speedup by a factor of a million, give or take a few orders of magnitude, so even using an partially-interpreted language speed isn't a problem. Perhaps I'm showing my age.)

To keep things simple -- and make it possible to eventually save pictures -- I decided to use SVG: the web standard for Scalable Vector Graphics, rather than trying to draw them out using an HTML5 Canvas tag. It's a perfect match for something that's nothing but a bunch of vectors. SVG is XML-based, and you can simply drop it into the middle of an HTML page. SVG is also really easy to generate using the new JSX format, which is basically XML tags embedded in a JavaScript file.

Modern JavaScript uses a program called a "transpiler" -- the most common one is Babel -- that compiles shiny new JavaScript constructs (and even some new languages like TypeScript and CoffeeScript, which I want to learn soon) into the kind of plain old JavaScript that almost any browser can understand. (There are still some people using Microsoft Exploiter from the turn of the century millennium; if you're reading this blog it's safe for me to assume that you aren't one of them.)

Anyway, let's get started:

cut tag added to protect your sanity )

(Not too bad of a formatting job, though of course the color didn't come through. Cut tag added because it's over 2000 words.)

Another fine post from The Computer Curmudgeon.
Cross-posted on

mdlbear: (technonerdmonster)
2018-09-21 02:38 pm

PSA: Data breach at Newegg

TL;DR: if you bought anything from Newegg between August 14th and September 18th, call your bank and get a new credit card. You can find more details in these articles: NewEgg cracked in breach, hosted card-stealing code within its own checkout | Ars Technica // Hackers stole customer credit cards in Newegg data breach | TechCrunch // Magecart Strikes Again: Newegg in the Crosshairs | Volexity // Another Victim of the Magecart Assault Emerges: Newegg

The credit-card skimming attack appears to have been done by Magecart, the organization behind earlier attacks on British Airways and Ticketmaster. If you are one of the customers victimized by one of these attacks, it's not your fault, and there isn't much you could have done to protect yourself (but read on for some tips). Sorry about that.

This article, Compromised E-commerce Sites Lead to "Magecart", gives some useful advice. (It's way at the end, of course; search for "Conclusion and Guidance".) The most relevant for users is

An effective control that can prevent attacks such as Magecart is the use of web content whitelisting plugins such as NoScript (for Mozilla’s Firefox). These types of add-ons function by allowing the end user to specify which websites are “trusted” and prevents the execution of scripts and other high-risk web content. Using such a tool, the malicious sites hosting the credit card stealer scripts would not be loaded by the browser, preventing the script logic from accessing payment card details.

Note that I haven't tried NoScript myself -- yet. I'll give you a review when I do. They also advise selecting your online retailers carefully, but I'm not sure I'd consider, say, British Airlines to be all that dubious. (Ticketmaster is another matter.)

Impacts of a Hack on a Magento Ecommerce Website, which talks about an attack on a site using the very popular Magento platform, gives some additional advice:

Shy away from sites that require entering payment details on their own page. Instead prefer the websites that send you to a payment organization (PayPal, payment gateway, bank, etc) to complete the purchase. These payment organizations are required to have very strict security policies on their websites, with regular assessments, so they are less likely to be hacked or miss some unauthorized modifications in their backend code.

They also suggest checking to see whether the website has had recent security issues, and using credit cards with additional levels of authentication (e.g. 2FA -- two-factor authentication).


Things are more difficult for retailers, but the best advice (from this article, again) is

Stay away from processing payment details on your site. If your site never has access to clients’ payment details, it can’t be used to steal them even if it is hacked. Just outsource payments to some trusted third-party service as PayPal, Stripe, Google Wallet,, etc.

Which is the flip side of what they recommend for shoppers. If the credit card info isn't collected on your site, you're not completely safe, but it avoids many of the problems, including Magecart. Keep your site patched anyway.

If you insist on taking payment info on your own site, and even if you don't, the high-order bit is this paragraph:

E-commerce site administrators must ensure familiarity and conformance to recommended security controls and best practices related to e-commerce, and particularly, the software packages utilized. All operating system software and web stack software must be kept up to date. It is critical to remain abreast of security advisories from the software developers and to ensure that appropriate patch application follows, not only for the core package but also third-party plugins and related components. [emphasis mine]

Be careful out there! links )

Another fine post from The Computer Curmudgeon, cross-posted to

mdlbear: (technonerdmonster)
2018-09-06 10:52 am

PSA: Security

Actually two PSAs.

First: Especially if you're running Windows, you ought to go read The Untold Story of NotPetya, the Most Devastating Cyberattack in History | WIRED. It's the story of how a worldwide shipping company was taken out as collateral damage in the ongoing cyberwar between Russia and the Ukraine. Three takeaways:

  1. If you're running Windows, keep your patches up to date.
  2. If you're running a version of Windows that's no longer supported (which means that you can't keep it patched, by definition), either never under any circumstances connect that box to a network, or wipe it and install an OS that's supported.
  3. If at all possible, keep encrypted offline backups of anything really important. (I'm not doing that at the moment either. I need to fix that.) If you're not a corporation and not using cryptocurrency, cloud backups encrypted on the client side are probably good enough.

Second: I don't really expect that any of you out there are running an onion service. (If you had to click on that link to find out what it is, you're not.) But just in case you are, you need to read Public IP Addresses of Tor Sites Exposed via SSL Certificates, and make sure that the web server for your service is listening to (localhost) and not or *. That's the way the instructions (at the "onion service" link above) say to set it up, but some people are lazy. Or think they can get away with putting a public website on the same box. They can't.

If you're curious and baffled by the preceeding paragraph, Tor (The Onion Router) is a system for wrapping data packets on the internet in multiple layers of encryption and passing them through multiple intermediaries between you and whatever web site you're connecting with. This will protect both your identity and your information as long as you're careful! An onion service is a web server that's only reachable via Tor.

Onion services are part of what's sometimes called "the dark web".

Be safe! The network isn't the warm, fuzzy, safe space it was in the 20th Century.

Another public service announcement from The Computer Curmudgeon.

mdlbear: the positively imaginary half of a cubic mandelbrot set (Default)
2018-09-03 08:42 pm
Entry tags:

Uploading and showing an image on DW

This started out in a comment, but I think enough people may be struggling with DW's relatively new image upload feature that it's worth making it into a post. And it's a good excuse for posting.

(aside: The key to all of this is knowing enough about HTML to be able to identify the correct URL and create the img tag to refer to it -- it probably all looks like black magic to somebody who hasn't been doing this stuff for the last few years. Don't expect to get it all on one reading.)

You might also want to know some terminology. Those things enclosed in angle brackets are commonly called "tags". The things with a name, an equals sign, and something in quotes are called "attributes". And single and double quotes are identical in effect, but have to match: you can't start with a single and end with a double.

So -- I uploaded an image; down at the bottom there's a box labeled Image Code. Copy, paste:

  <a href=''>
     <img src=''
         title='Cat and mouse'
           alt='Grey tabby cat looking intently at a Logitech mouse.'  /></a>

(That's what it looked like, except that I wrapped the lines to make it clearer.)

Looking at that line of HTML I can see that there's an img tag that, from the /100x100/ in its URL, looks like a thumbnail. It's wrapped in a link -- an <a...> tag. The href attribute is, which ought to be the full-sized image, so I'll get a preview just to make sure...

Then I clicked the "Preview" button, and it looked like this:

Grey tabby cat looking intently at a Logitech mouse.

Yup. So -- the href attribute of the img tag, is the URL of the actual image. (At that point I could just as well have copied it out of the browser's location bar, since I was already looking at the full-sized image. I didn't think of that, and just copied it out of the comment I was writing.)

I can put that into an img tag that looks like

<:img src="" width="512">

Chrome says (in the page title) that the image is 1024x719, so I'll try cutting it down by adding a width="512" attribute. Here we go:

-- purrfect.

One final aside: if you're following along and using a word processor that automagically turns quotation marks into matching curly quotes, you need to turn that feature off when you're editing HTML, because they won't be recognized as quotes. If you're creating your post in your browser and you want to enter HTML directly, you need to click the "HTML" tab (over on the upper right). The "Rich Text" setting will create HTML automatically and you'll need to click one of its little icons to create an actual image tag.

Happy hacking!

Another fine post from The Computer Curmudgeon.

mdlbear: (technonerdmonster)
2018-07-26 05:46 pm

Scripting Languages

Today in my continuing series on programming languages I'm going to talk about "scripting languages". "Scripting" is a rather fuzzy category, because unlike the kinds of languages we've discussed before, scripting languages are really distinguished by how they are used, and they're used in two very different ways. It's also confusing because most scripting languages are interpreted, and people tend to use "scripting" when they should be using "interpreted". In my opinion it's more correct to say that a language is being used as a scripting language, rather than to say that it is a scripting language. As we'll see, this is particularly true when the language is being used to customize some application.

But first, let's define scripts. A script is basically a sequence of commands that a user could type at a terminal[1] -- often called "the command line" -- that have been put together in a file so that they can be run automatically. The script then becomes a new command. In Linux, and before that Unix, the program that interprets user commands is called a "shell", possibly because it's the visible outer layer of the operating system. The quintessential script is a shell script. We'll dive into the details later.

[1] okay, a terminal emulator. Hardly anyone uses physical terminals anymore. Or remembers that the "tty" in /dev/tty stands for "teletype"'.

The second kind of scripting language is used to implement commands inside some interactive program that isn't a shell. (These languages are also called extension languages, because they're extending the capabilities of their host program, or sometimes configuration languages.) Extension languages generally look nothing at all like something you'd type into a shell -- they're really just programming languages, and often are just the programming language the application was written in. The commands of an interactive program like a text editor or graphics editor tend to be things like single keystrokes and mouse gestures, and in most cases you wouldn't want to -- or even be able to -- write programs with them. I'll use "extension languages" for languages used in this way. There's some overlap in between, and I'll talk about that later.

Shell scripting languages

Before there was Unix, there were mainframes. At first, you would punch out decks of Hollerith cards, hand them to the computer operator, and they would (eventually) put it in the reader and push the start button, and you would come back an hour or so later and pick up your deck with a pile of listings from the printer.

Computers were expensive in those days, so to save time the operator would pile a big batch of card decks on top of one another with a couple of "job control" cards in between to separate the jobs. Job control languages were really the first scripting languages. (And the old terminology lingers on, as such things do, in the ".bat" extension of MS/DOS (later Windows) "batch files". Which are shell scripts.)

By far the most sophisticated job control language ran on the Burroughs 5000 and 6000 series computers, which were designed to run Algol very efficiently. (So efficiently that they used Algol as what amounted to their assembly language! Programs in other languages, including Fortran and Cobol, were compiled by first translating them into Algol.) The job control language was a somewhat extended version of Algol in which some variables had files as their values, and programs were simply subroutines. Don't let anyone tell you that all scripting languages are interpreted.

Side note: the Burroughs machines' operating system was called MCP, which stands for Master Control Program. Movie fans may find that name familiar.

Even DOS batch files had control-flow statements (conditionals and loops) and the ability to substitute variables into commands. But these features were clumsy to use. In contrast, the Unix shell written by Stephen Bourne at Bell Labs was designed as a scripting language. The syntax of the control structures was, in fact, derived from Algol 68, which introduced the "" and "do...done" syntax.

Bourne's shell was called sh in Unix's characteristically terse style. The version of Unix developed at Berkeley, (BSD, for Berkeley System Distribution -- I'll talk about the history of operating systems some time) had a shell called the C shell, csh, which had a syntax derived from the C programming language. That immediately gave rise to the popular tongue-twister "she sells cshs by the cshore".

The GNU (GNU's Not Unix) project, started by Richard Stallman with the goal of producing a completely free replacement for Unix, naturally had its own rewrite of the Bourne Shell called bash -- the Bourne Again Shell. It's a considerable improvement over the original, pulling in features from csh and some other shell variants.

Let's look at shell scripting a little more closely. The basic statement is a command -- the name of a program followed by its arguments, just as you would type it on the command line. If the command isn't one of the few built-in ones, the shell then looks for a file that matches the name of the command, and runs it. The program eventually produces some output, and exits with a result code that indicates either success or failure.

There are a few really brilliant things going on here.

  • Each program gets run in a separate process. Unix was originally a time-sharing operating system, meaning that many people could use the computer at the same time, each typing at their own terminal, and the OS would run all their commands at once, a little at a time.
  • That means that you can pipe the output of one command into the input of another. That's called a "pipeline"; the commands are separated by vertical bars, like | this, so the '|' character is often called "pipe" in other contexts. It's a lot shorter than saying "vertical bar".
  • You can "redirect" the output of a command into a file. There's even a "pipe fitting" command called tee that does both: copies its input into a file, and also passes it along to the next command in the pipeline.
  • The shell uses the command's result code for control -- there's a program called true that does nothing but immediately returns success, and another called false that immediately fails. There's another one, test, which can perform various tests, for example to see whether two strings are equal, or a file is writable. There's an alias for it: [. Unix allows all sorts of characters in filenames. Anyway, you can say things like if [ -w $f ]; then...
  • You can also use a command's output as part of another command line, or put it into a variable. today=`date` takes the result of running the date program and puts it in a variable called today.

This is basically functional programming, with programs as functions and files as variables. (Of course, you can define variables and functions in the shell as well.) In case you were wondering whether Bash is a "real" programming language, take a look at nanoblogger and Abcde (A Better CD Encoder).

Sometime later in this series I'll devote a whole post to an introduction to shell scripting. For now, I'll just show you a couple of my favorite one-liners to give you a taste for it. These are tiny but useful scripts that you might type off the top of your head. Note that comments in shell -- almost all Unix scripting languages, as a matter of fact -- start with an octothorpe. (I'll talk about octothorpe/sharp/hash/pound later, too.)

# wait until nova (my household server) comes back up after a reboot
until ping -c1 nova; do sleep 10; done

# count my blog posts.  wc counts words, lines, and/or characters.
find $HOME/.ljarchive -type f -print | wc -l

# find all posts that were published in January.
# grep prints lines in its input that match a pattern.
find $HOME/.ljarchive -type f -print | grep /01/ | sort

Other scripting languages

As you can see, shell scripts tend to be a bit cryptic. That's partly because shells are also meant to have commands typed at them directly, so brevity is often favored over clarity. It's also because all of the operations that work on files are programs in their own right; they often have dozens of options and were written at different times by different people. The find program is often cited as a good (or bad) example of this -- it has a very different set of options from any other program, because you're trying to express a rather complicated combination of tests on its command line.

Some things are just too complicated to express on a single line, at least with anything resembling readability, so many other programs besides shells are designed to run scripts. Some of the first of these in Unix were sed, the "stream editor", which applies text editing operations to its input, and awk, which splits lines into "fields" and lets you do database-like operations on them. (Simpler programs that also split lines into fields include sort, uniq, and join.)

DOS and Windows look at the last three characters of a program's name (e.g., "exe" for "executable" machine language and "bat" for "batch" scripts) to determine what it contains and how to run it. Unix, on the other hand, looks at the first few characters of the file itself. In particular, if these are "#!" followed by the name of a program (I'm simplifying a little), the file is passed to that program to be run as a script. The "#!" combination is usually pronounced "shebang". This accounts for the popularity of "#" to mark comments -- lines that are meant to be ignored -- in most scripting languages.

The scripting programs we've seen so far -- sh, sed, awk, and some others -- are all designed to do one kind of thing. Shells mostly just run commands, assign variables, and substitute variables into commands, and rely on other programs like find and grep to do most other things. Wouldn't it be nice if one could combine all these functions into one program, and give it a better language to write programs in. The first of these that really took off was Larry Wall's Perl. Like the others it allows you to put simple commands on the command line -- with exactly the same syntax as grep and awk.

Perl's operations for searching and substituting text look just like the ones in sed and grep. It has associative arrays (basically lookup tables) just like the ones in awk. It can run programs and get their results exactly the way sh does, by enclosing them in backtick characters (`...` -- originally meant to be used as left single quotes), and it can easily read lines out of files, mess with them, and write them out. It has has objects, methods, and (more or less) first-class functions. And just like find and the Unix command line, it has a well-earned reputation for scripts that are obscure and hard to read.

You've probably heard Python mentioned. It was designed by Guido van Rossum in an attempt to be a better scripting language Perl, with an emphasis on making programs more readable, easier to write, and easier to maintain. He succeeded. At this point Python has mostly replaced Perl as the most popular scripting language, in addition to being a good first language for learning programming. (Which is the best language for learning is a subject guaranteed to invoke strong opinions and heated discussions; I'll avoid it for now.) I avoided Python for many years, but I'm finally learning it and finding it much better than I expected.

Extension languages

The other major kind of scripting is done to extend a program that isn't a shell. In most cases this will be an interactive program like an editor, but it doesn't have to be. Extensions of this sort may also be called "plugins".

Extension languages are usually small, simple, and interpreted, because nobody wants their text editor (for example) to include something as large and complex as a compiler when its main purpose is defining keyboard shortcuts. There's an exception to this -- sometimes when a program is written in a compiled language, the same language may be used for extensions. In that case the extensions have to be compiled in, which is usually inconvenient, but they can be particularly powerful. I've already written about one such case -- the Xmonad window manager, which is written and configured in Haskell.

Everyone these days has at least heard of JavaScript, which is the scripting language used in web pages. Like most scripting languages, JavaScript has escaped from its enclosure in the browser and run wild, to the point where text editors, whole web browsers, web servers, and so on are built in it.

Other popular extension languages include various kinds of Lisp, Tcl, and Lua. Lua and Tcl were explicitly designed to be embedded in programs. Lua is particularly popular in games, although it has recently turned up in other places, including the TeX typesetting system.

Lisp is an interesting case -- probably its earliest use as an extension language was in the Emacs text editor, which is almost entirely written in it. (To the point where many people say that it's a very good Lisp interpretor, but it needs a better text editor. I'm not one of them: I'm passionately fond of Emacs, and I'll write about it at greater length later on.) Because of its radically simple structure, Lisp is particularly easy to write an interpretor for. Emacs isn't the only example; there are Lisp variants in the Audacity audio workstation and the Autodesk CAD program. I used the one in Audacity for the sound effects in my computer/horror crossover song "Vampire Megabyte".

Emacs, Atom (a text editor written in JavaScript), and Xmonad are good examples of interactive programs where the same language is used for (most, if not all, of) the implementation as well as for the configuration files and the extensions. The boundaries can get very fuzzy in cases like that; as a Mandelbear I find that particularly appealing.

Another fine post from The Computer Curmudgeon.

mdlbear: (technonerdmonster)
2018-07-22 09:25 pm

Blog posting software planning

In comments on "Done Since 2018-07-15" we started having a discussion of mirroring and cross-posting DW blog entries, and in particular what my plans are for implementing personal blog sites that mirror all or some of a -- this -- Dreamwidth journal.

Non-techie readers might conceivably want to skip this post.

Where I am now:

Right now, my blog posting process is, well, let's just say idiosyncratic. Up until sometime late last year, I was posting using an Emacs major mode called lj-update-mode; it was pretty good. It had only two significant problems:

  1. It could only create one post at a time, and there was no good way to save a draft and come back to it later. I could live with that.
  2. It stopped working when DW switched to all HTTPS. It was using an obsolete http library, and noone was maintaining either of them.

My current system is much better.

  1. I run a command, either make draft or, if I'm pretty sure I'm going to post immediately, make entry. I pass the filename, without the yyyy/mm/dd prefix, along with an optional title. If I don't pass the title I can add it later. The draft gets checked in with git; I can find out when I started by using git log.
  2. I edit the draft. It can sit around for days or months; doesn't matter. It' an ordinary html file except that it has an email-like header with the metadata in it.
  3. When I'm done, I make post. Done. If I'm posting a draft I have to pass the filename again to tell it which draft; make entry makes a symlink to the entry, which is already in a file called yyyy/mm/dd-filename.html. It gets posted, and committed in git with a suitable commit message.

You can see the code in MakeStuff/blogging on GitHub. It depends on a Python client called charm, which I forked to add the Location: header and some sane defaults like not auto-formatting. Charm is mostly useless -- it does almost everything using a terminal-based text editor. Really? But it does have a "quick-post" mode that takes metadata on the command line, and a "sync" mode that you can use to sync your journal with an archive. Posts in the archive are almost, but not quite, in the same format as the MakeStuff archive; the main difference is that the filenames look like yyyy/mm/dd_HHMM. Close, but not quite there.

There's another advantage that isn't apparent in the code: you can add custom make targets that set up your draft using a template. For example, my "Done since ..." posts are started with make done, and my "Computer Curmudgeon" posts are started with make curmudgeon. There are other shortcuts for River and S4S posts. I also have multiple directories for drafts, separated roughly by subject, but all posting into the same archive.

Where I want to go:

Here's what I want next:

  • The ability to post in either HTML or markdown -- markdown has a great toolchain, including the ability to syntax-color your code blocks.
  • The ability to edit posts by editing the archived post and uploading it. Right now it's a real pain to keep them in sync.
  • A unified archive, with actual URLs in the metadata rather than just the date and time in the filename.
  • The ability to put all or part of my blog on different sites. I really want the computer-related posts to go on (usually shortened to in my notes), and a complete mirror on (
  • Cross-links in both directions between my sites and DW.

How to get there:

Here's a very brief sketch of what needs to be done. It's only vaguely in sequence, and I've undoubtedly left parts out. But it's a start.

Posting, editing, and archiving

  • Posting in HTML or markdown is a pretty easy one; I can do that just by modifying the makefiles and (probably) changing the final extension from .html to .posted so that make can apply its usual dependency-inference magic.
  • Editing and a unified archive will both require a new command-line client. There aren't any. There are libraries, in Ruby, Haskell, and Javascript, that I can wrap a program around. (The Python code in charm doesn't look worth saving.) I wanted to learn Ruby anyway.
  • The unified archive will also require a program that can go back in time and match up archived posts with the right URLs, reconcile the two file naming conventions, and remove the duplicates that are due to archiving posts both in charm and MakeStuff. Not too hard, and it only has to be done once.
  • It would be nice to be able to archive comments, too. The old ljbackup program can do it, so it's feasible. It's in Perl, so it might be a good place to start.

Mirror, mirror, on the server...

This is a separate section because it's mostly orthogonal to the posting, archiving, etc.

  • The only part of the posting section that really needs to be done first is the first one, changing the extension of archived posts to .posted. (That's because make uses extensions to figure out what rules to apply to get from one to another. Remind me to post about make some time.)
  • The post archive may want to have its own git repository.
  • Templating and styling. My websites are starting to show their age; there's nothing really wrong with a retro look, but they also aren't responsive (to different screen sizes -- that's important when most people are reading websites on their phones), or accessible (screen-reader friendly and navigable by keyboard; having different font sizes helps here, too). Any respectable static site generator can do it -- you may remember this post on The Joy of Static Sites -- but the way I'm formatting my metadata will require some custom work. Blosxom and nanoblogger are probably the closest, but they're ancient. I probably ought to resist the temptation to roll my own.

Yeah. Right.

Another fine post from The Computer Curmudgeon.

mdlbear: the positively imaginary half of a cubic mandelbrot set (Default)
2018-07-21 04:18 pm

Songs for Saturday: Stormy Weather

This week's S4S isn't actually a song, but it is music. "The Opposite of Forecasting" is Austin, TX's Weather, Sonified. It's a project by Douglas Lausten <>.

You can read the Music Notes, explaining how the weather is mapped into muic, or the Tech Notes, about how the hardware works. Or you could just go to this link to hear it in your browser.

Let's see whether I can add a player: -- yup! Can't seem to apply any styles to it, though, and I have no idea whether DW will allow it.

Credit for finding today's s4s goes to siderea.

mdlbear: (wtf-logo)
2018-07-20 09:19 pm
Entry tags:

PSA: Venmo users check your privacy settings!

If you're using the popular social/money-transfer phone app Venmo check your privacy settings!! It seems that the default is that every transaction you make is public! It is difficult for me to express just how broken this is. In case you're having trouble grasping the implications, just go to PUBLIC BY DEFAULT - Venmo Stories of 2017. There you will find profiles of five unsuspecting Venmo users -- one of them is a cannabis retailer -- whose transactions were among the over two hundred thousand exposed to public view during 2017.

The site is a project of Mozilla Media Fellow Hang Do Thi Duc. She has some other interesting things on her site.

It's worth noting that Venmo is owned by PayPal, and that according to a PayPal spokesperson quoted in this article on Gizmodo the public-by-default nature of person-to-person transfers (person-to-business transactions are private) is apparently a deliberate feature, not a bug.

“Venmo was designed for sharing experiences with your friends in today’s social world, and the newsfeed has always been a big part of this,” a company spokesperson told Gizmodo, asserting that the “safety and privacy” of its users is a “top priority.”

Yeah. Right.

Here are more articles at The Guardian, Lifehacker, and CNET.

"We make it default because it's fun to share [information] with friends in the social world," a Venmo representative told CNET Friday. "[We've seen that] people open up Venmo to see what their family and friends are up to."

Because it's fun. Kind of puts it in the same category as other "fun" things like cocaine, binge drinking, and unprotected sex, doesn't it?

This has been a public service announcement from The Computer Curmudgeon. With a tip of the hat to Thnidu.

mdlbear: (technonerdmonster)
2018-07-05 10:36 pm

Languages: Imperative, Object-Oriented, and Functional

Back when smalltalk was sports and the weather, And an object was what you could see, And we watched "Captain Video" in black-and-white, Before there was color TV. -- "When I Was a Lad"

Even if you're not a programmer (and most of the people I expect to be reading this aren't) you've probably heard one of your geek friends mentioning "object-oriented" programming. You might even have heard them mention "functional" programming, and wondered how it differs from non-functional programming. The computer curmudgeon can help you.

If you are a programmer, you probably know a lot of this. You may find a few items of historical interest, and it might be useful explaining things to the non-programmers in your life.

Imperative languages

As you may recall from last month's post on computer languages, the first programming languages were assembly languages -- just sequences of things for the computer to do, one after another (with a few side-trips). This kind of language is called "imperative", because each statement is a command telling the computer what to do next: "load a into register 2!" "add register 2 to register 3!" "roll over!" "sit!" You get the idea.

Most programmers use "statement" instead of "command" for these things even though the latter might be more correct from a grammatical point of view; a "command" is something one types at a terminal in order to run a program (which is also called a "command").

Most of the earlier programming languages (with one notable exception that we'll get to later) were also imperative. Fortran, Cobol, and Algol were the most prominent of these; most of today's languages descend more-or-less from Algol. (By the way, the Report on the Algorithmic Language Algol 60 is something I always point to as an example of excellent technical writing.)

Imperative languages make a distinction between statements, which do things (like "a := 2", which puts the number 2 into a variable called "a"), and "expressions", which compute results. Another way of saying that is that an expression has a value (like 2+2, which has a value of 4), and a statement doesn't. In earlier languages that value was always a number; later languages add things like lists of symbols and strings of characters enclosed in quotes.

Algol and its descendents lend themselves to what's called "Structured Programming" -- control structures like conditionals ("if this then {that} else {something-else}") and loops (for x in some-collection do {something}) let you build your program out of simple sections that nest together but always have a single entrance and exit. Notice the braces around those sections. Programming languages have different ways of grouping statements; braces and indentation are probably the most common, but some languages use "if ... fi" and "do ... done".

Languages also have ways of packaging up groups of statements so that they can be used in different places in the program. These are called either "subroutines" or "functions"; if a language has both it means that a function returns a value (like "sqrt(2)", which returns the square root of 2) and a subroutine doesn't (like "print('Hello world!')", which prints a familiar greeting.) Subroutines are statements, and functions are expressions. The expressions in parentheses are called either "arguments" or "parameters" -- I was bemused to find out recently that verbs have arguments, too. (A verb's arguments are the subject, and the direct and indirect objects.)

Object-oriented languages

As long as we're on the subject of objects, ...

In the mid 1960s Ole-Johan Dahl and Kristen Nygaard, working at the University of Oslo, developed a language called Simula, designed for simulating objects and events in the real world. Objects are things like cars, houses, people, and cats, and Simula introduced software objects to represent them. Objects have "attributes": my car is blue, my cats are both female, my wife has purple hair. Some of these attributes are objects in their own right: a car has wheels, a cat has four legs and a person has two legs and two arms. Some of these are variables: sometimes my driveway has a car in it, sometimes it doesn't, and sometimes it has two or three.

What this means is that objects have state. A car's speed and direction vary all the time, as does the position of a cat's tail. An object is a natural way to bundle up a collection of state variables.

An object also can do things. In a program, objects rarely do things on their own, some other object tells them what to do. This is a lot like sending the object a message. "Hey, car -- turn left!" "Hey, cat! What color are your eyes?" Software objects don't usually have any of their state on the outside where other parts of the program can see it; they have to be asked politely. That means that the object might be computing something that looks to the outside like a simple state variable -- a car computes its speed from the diameter of its tires and how fast its wheels are turning. All these things are hidden from the rest of the program.

An object has to have a method for handling any message thrown at it, so the documentation of most languages refers to "calling a method" rather than "sending a message". It's the same thing in most cases. As seen from the outside, calling a method is just like calling a function or subroutine, except that there's an extra argument that represents the object. If you think of the message as a command -- an imperative sentence -- it has an implied subject, which is the object you're talking to. Inside the little program that handles the method, the object is represented by a name that is usually either "self" or "this", depending on the language.

One natural thing to do with objects is to classify them, so objects have "classes", and classes tend to be arranged in hierarchies. My cats are cats -- we say that Ticia and Desti are instances of the class "Cat". All cats are animals, so Cat is a subclass of Animal, as are Dog and Person. (In programming languages, classes tend to have their names capitalized. Variables (like "x" or "cat") almost always start with a lowercase letter. Constants (like "Pi", "True", and "Ticia") can go either way.)

An object's class contains everything the computer needs to determine the behavior of its instances: a list of the state variables, a list of the methods it can handle, and so on. Now we get to the fun part.

Simula was just objects and methods tacked on to a "traditional" imperative language -- Algol in that case. Most other "object-oriented" languages work the same way -- C++ is just objects tacked on to C; Python and Perl had objects from the beginning or close to it, but they still include a lot of things -- numbers, strings, arrays, and so on, that aren't objects or (like strings in Java) are pretending to be objects. It doesn't have to be that complicated.

Shortly after Simula was released, Alan Kay at Xerox PARC realized that you could design a programming language where everything was an object. The result was Smalltalk, and it's still the best example of a pure object-oriented language. (It inspired C++ and Java, and indeed almost all existing OO languages, but very few of them went all the way. Ruby comes closest of the languages I'm familiar with.) Smalltalk is turtles objects all the way down.

In Smalltalk numbers are objects -- you can add new methods to them, or redefine existing methods. (Your program will eventually stop working if you redefine addition so that 2+2 is 5, but it will keep going for a surprisingly long time.) So are booleans. True and False are instances of different subclasses of Boolean, and they behave differently when given messages like ifTrue: or ifFalse:. (I'm not going to go into the details here; Smalltalk warrants an article all by itself). Loops are methods, too. The trickiest bit, though, is that classes are objects. A class is just an instance of Metaclass, and Metaclass is an instance of itself.

Someone trying to implement Smalltalk has to cheat in a couple of places -- the Metaclass weirdness is one of them -- but they have to hide their tracks and not get caught.

Functional languages

Smalltalk, for all its object orientation, is still an imperative language underneath. A program is still a sequence of statements, even if they're all wrapped up in methods. Objects have state -- lots of it; they're just a good way of organizing it. Alan Kay once remarked that programming in Smalltalk was a lot like training a collection of animals to perform tricks together. Sometimes it's more like herding cats.

But it turns out that you don't need statements -- or even state -- at all! All you need is functions.

About the same time Alan Turing was inventing the Turing Machine, which is basically a computer stripped-down and simplified to something that will just barely work, Alonzo Church was developing something he called the Lambda Calculus, which did something similar to the mathematical concept of functions.

A mathematical function takes a set of inputs, and produces an output. The key thing about it is that a given set of inputs will always produce exactly the same output. It's like a meat grinder -- put in beef, and you get ground beef. Put in pork, and you get ground pork. A function has no state, and no side-effects. State would be like a secret compartment in the meat grinder, so that you get a little chicken mixed in with your ground beef, and beef in your ground pork. Ugh. A side effect would be like a hole in the side that lets a little of the ground meat escape.

Objects are all about state and side effects. You can call a method like "turn left" on a car (in most languages that would look like car.turn(left)) and as a side effect it will change the car's direction state to something 90 degrees counterclockwise from where it was. That makes it hard to reason about what's going to happen when you call a particular method. It's even harder with cats.

Anyway, in lambda calculus, functions are things that you can pass around, pass to other functions, and return from functions. They're first class values, just like numbers, strings, and lists. It turns out that you can use functions to compute anything that you can compute with a Turing machine.

Lambda calculus gets its name from the way you represent functions: To define a function that squares a number, for example, we say something like:

square = λ x: x*x

Now, this looks suspiciously like assigning a value to a variable, and that's exactly what it's doing. If you like, you can even think of λ as a funny kind of function that takes a list of symbols and turns it into a function. You find the value of an expression like square(3) by plugging the value 3 into the function's definition in place of x. So,

  → (λ x: x*x)(3)
  → 3*3
  → 9

Most functional languages don't allow you to redefine a name once you've bound it to a value; if you have a function like square it wouldn't make much sense to redefine it in another part of the program as x*x*x. This is vary different from imperative languages, where variables represent state and vary all over the place. (Names that represent the arguments of functions are only defined inside any one call to the function, so x can have different values in square(2) and square(3) and nobody gets confused.)

Lambda calculus lends itself to recursive functions -- functions that call themselves -- and recursion is particularly useful when you're dealing with lists. You do something to the first thing on the list, and then combine that with the result of doing the same thing to the rest of the list.Suppose you have a list of numbers, like (6 2 8 3). If you want another list that contains the squares of those numbers, you can use a function called map that takes two arguments: a function and a list. It returns the result of applying the function to each element of the list. So

map(square, (6, 2, 8, 3))
 → (36, 4, 64, 9)

It's pretty easy to define map, too. It's just

map = λ (f, l):
	if isEmpty(l)
           then l
	   else cons(f(first(l)), map(f, rest(l)))

The cons function constructs a new list that consists of its first argument, which can be anything, tacked onto the front of its second argument, which is a list. Using first and rest as the names of the functions that get the first item in a list, and the remainder of the list after the first item, is pretty common tese days. Historically, the function that returns the first item in a list is called car, and the function that returns the rest of the list is called cdr (pronounced sort of like "could'r").

I know, the classic introduction to recursive functions is factorial. Lists are more interesting. Besides, I couldn't resist an excuse for mentioning "car". We'll see "cat" in a future post when I discuss scripting languages.

We call map a "higher-level" function because it takes a function as one of its arguments. Other useful higher-level functions include filter, which takes a boolean function (one that returns true or false) and flatmap, which takes a function that returns lists and flattens them out into into a single list. (Some languages call that concatmap.) Higher-level functions are what make functional languages powerful, beautiful, and easy to program in.

Now, Church's lambda calculus was purely mathematical -- Church used it to prove things about what kinds of functions were computable, and which weren't, which is exactly what Alan Turing did with the Turing Machine. We'll save that for another post, except to point out that Church's lambda calculus can compute anything that a Turing machine can compute, and vice versa.

But in 1958 John McCarthy at MIT invented a simple way of representing lambda expressions as lists, so that they could be processed by a computer: just represent a function and its arguments as a list with the function first, followed by its arguments. Trivial? Yes. But brilliant. His paper included a simple function called eval that takes a list representing a function and its arguments, and returns the result. Steve Russell realized that it wouldn't be hard to implement eval in machine language (on an IBM 704).

I'll save the details for another post, but the result was a language called Lisp. It's the second oldest programming language still in use. The post about LISP will explain how it works (elsewhere I've described eval as The most amazing piece of software in the world), and hopefully show you why Kanef's song The Eternal Flame says that God wrote the universe in LISP. Programmers may want to take a look at "Sex and the Single Link".

And finally,

I'd be particularly interested in seeing comments from the non-programmers among my readers, if I haven't lost you all by now. Was this post interesting? Was it understandable? Was it too long? Were my examples sufficiently silly? Inquiring minds...

Another fine post from The Computer Curmudgeon.

mdlbear: (technonerdmonster)
2018-06-19 04:26 pm
Entry tags:

HTML for Poets: Some simple magic

Here are a couple of little tricks that may keep you from tearing your hair out in frustration, and in any case will make your posts look better.

Tip the first: Make beautiful poetry with no effort.

There are two obvious ways to post poetry: enclose it in <pre>...</pre> tags (which tells the browser that it's preformatted, so it should leave your line breaks alone), or tediously add a line break tag -- <br> -- to the end of every line. Unfortunately, preformatted text uses a monospaced font, which is great for code and concrete poetry, but really doesn't look very good. And adding line breaks is just tedious.

There is a third way: use

<blockquote style="white-space: pre-wrap;" >to wrap your poem. </blockquote>

This tells your browser to keep spaces and line breaks the way you want them, but continue using the same font as the rest of your page.

There's one subtlety: that start tag is pretty long, but if you start your poem on the following line there will be an annoying extra blank line at the top. So do what I did up there, and start a new line before the >. It turns out that you can replace spaces with newlines anywhere outside of quotes.

That brings me to the next tip:

Tip the Second: Don't let your word processor try to help.

A lot of people compose their blog posts using a word processor, and paste them into the "post" form. If you don't, and aren't using some kind of posting client that lets you save your work any time you want to, you should. Especially if you have cats. But there's a problem. Word processors want to help you create great-looking documents. And that's a problem if you're trying to write HTML.

One thing word processors may try to do is to replace all your quotation marks with “curly” quotes (also called “smart” quotes), instead of using "straight" quotes. That's great except for when you're trying to enter a link, or one of style attributes I just showed you up above. A browser won't recognize them. So instead of

<a href="">this</a>

you get

<a href=“”>this</a>

See the difference? You may have to zoom in a little. Hover over the links. You'll see that your browser thinks that it's looking at a local link, because it doesn't start with https://, it starts with “https://! Oops!

Another thing I've seen word processors and some other programs do is, behind your back, replace "<" and ">" characters with "&lt;" and "&gt;". Those weird things are called "character entities", and they're HTML's way of forcing the browser to show something that looks like "<" without letting it interpret it as part of a tag. That probably isn't what you wanted, unless you're trying to write math inequalities or an article about how to write HTML.

Yet another problem happens because word processors use proportional spacing, so that your post looks to you the way it's going to look in a browser, more or less. The problem happens because some characters are really small in proportional-spaced fonts, so it's hard to select exactly the text you wanted to, and easy to overlook stray spaces and periods if they get pasted into your links by mistake.

That's why programmers use text editors with fixed-width fonts, and rely on programs like web browsers and LaTeX to make their pages look good when you read or print them.


By the way, I did a little searching and found The Mandelbear's Rough Guide to HTML, from back in 2006. It's still mostly good advice.

Another fine post from The Computer Curmudgeon.

mdlbear: (technonerdmonster)
2018-06-05 10:44 am

Git and GitHub, with a side of Microsoft

If you develop software and haven't just returned from the moon, you've undoubtedly heard that GitHub is being acquired by Microsoft. Depending on your affiliations you might be spelling "being acquired by" as "selling out to". The rest of you are probably wondering what on Earth a GitHub is, and why Microsoft would want one. Let me explain.

Please note: this post isn't about my opinion of today's news. It's really too early to tell, though I may get into that a little toward the end. Instead, I'm going to explain what GitHub is, and why it matters. But first I have to explain Git.

Git is a version-control system. (Version-control systems are sometimes called "source code management" (SCM) systems. If you look closely you might even have spotted "scm" in git's URL up there at the end of the last paragraph.) Basically, a version-control system lets you record the complete history of a project, with what changes were made, who made the each change, when they changed it, and their notes about what they did and why. It doesn't have to be a software project, either. It can be recipes, photographs, books, the papers you're writing for school, or even blog entries. (Yes, I do.)

Before git, most version-control systems kept track of changes in text files (which of course is what all source code is) by recording which lines are different from the previous version. (It's usually done by a program called diff.) This was very compact, but it could also be very slow if you had to undo all the changes between two versions in order to see what the older one looked like.

Git, on the other hand, is blindingly fast in part because it works in the stupidest way possible (which is why it's called "git"). It simply takes the new version of each file that changed since the last version, zips it up, and stuffs it whole into its repository. So it takes git about the same amount of time to roll a file back two versions or two hundred.

The other thing that makes git fast is where it keeps all of its version information. Before git, most version-control systems used a centralized repository on a server somewhere. (Subversion, one of the best of these, even lets you browse the repository with a web browser.) That means that all the change information is going over a network. Git keeps its repository (these days everyone shortens that to "repo") on your local disk, right next to your working copy, in a hidden subdirectory called ".git".

Because its repo is local, and contains the entire history of your project, you don't need a network connection to use git. On the beach, in an airplane, on a boat, with a goat, it doesn't matter to git. It's de-centralized. It gets a little more complicated when more than one developer is working on a project.

Bob's been in the office all week working on a project. When his boss, Alice, comes back from the open source conference she's been at all week, all she has to do is tell git to fetch all the changes that Bob made while she was away. Git gets them directly from Bob's repo. If Alice didn't make any changes, that's called a "fast-forward" merge -- git just takes the changes that Bob made, copies those files into Alice's repo, updates her working tree, and it's done.

It's a little trickier if Alice had time to make some changes, too. Now Alice has to merge the two sets of changes, and then let Bob pull the merged files onto his computer. By the way, a "pull" is just a fetch followed by a merge, but it's so common that git has a shorthand way of doing it. (I'm oversimplifying here, but this isn't the time to go into the difference between merge and rebase. It's also not a good time to talk about branches -- maybe some other week.) As you can imagine, this gets out of hand pretty quickly, and it's even worse if there's a whole team working on the project.

The obvious thing to do is for the group to have one repo on a server somewhere that has what everyone agrees is the definitive set of files on it. Bob pushes his changes to the server, and when Alice tries to push her changes, git balks and gives her an error message. Now it's Alice's responsibility to make any necessary fixes and push them to the server. Actually, in a real team, Alice would send her proposed changes around by making a diff and sending email to the other team members to review, and not actually push her changes until someone approves them.

In a large team, this is kind of a hub-and-spokes arrangement. You can see where this is going, right?

GitHub is a company that provides a place for people and projects to put shared git repositories where other people can see them, clone them, and contribute to them. GitHub has become wildly popular, because it's a great place to share software. If you have an open-source software project, putting a public repo on GitHub is the most effective way to reach developers. It's so popular that Google and Microsoft shut down their own code-hosting sites (Google Code and CodePlex respectively) and moved to GitHub. Microsoft, it turns out, is GitHub's biggest contributor.

Putting a public repository on GitHub is free. If you want to set up private repositories, GitHub will charge you for it, and if your company wants to put a clone of GitHub on its own private servers they can buy GitHub Enterprise, but if your software is free, so's your space on GitHub.

That's a bit of a problem, because the software that runs GitHub is not free. That means that they need a steady stream of income to pay their in-house developers, because they're not going to get any help from the open-source developer community. GitHub lost $66 million in 2016, and doesn't really have a sustainable business model that would make them attractive to investors. They needed to get acquired, or they had a real risk of going under. And when a service based on proprietary software goes under, all of their customers have a big problem. But their users? Heh.

Everybody knows the old adage, "if you're getting a service for free you're not the customer, you're the product." That's especially true for companies like Google and Facebook, which sell their users' eyeballs to advertisers. It's a lot less true for a company whose users can leave any time they want, painlessly, taking all their data and their readers with them. I'm sure most of my readers here on Dreamwidth remember what happened to Livejournal when they got bought by the Russians. Well, GitHub is being bought by Microsoft. It's not entirely clear which is worse.

GitHub has an even worse problem than Livejournal did, because "cross-posting" is basically the way git works. There's a company called GitLab that looks a lot like GitHub, except that their core software -- the stuff that wraps a slick web interface around a git repository -- is open source. (They do sell extensions, but most projects aren't going to need them.) If you want to set up your own private GitLab site, it's free, and you can do it in ten minutes with a one-line command. If you find bugs, you can fix them yourself. You'll find a couple of great quotes from their blog at the end of the notes, but the bottom line is that 100,000 repositories have moved from GitHub to GitLab in the last 24 hours.

And once you've moved a project to GitLab, you don't have to worry about what happens to it, because the open-source core of it will continue to be maintained by its community. That's what happened when a company called Netscape went belly-up: Mozilla Firefox is still around and doing fine. And if the fact that GitLab is for profit is a problem for you, there's Apache Allura, gitolite3, gitbucket, and gitweb (to name a few). Go for it!


This so wasn't what I was planning to write today.

  @ Microsoft Reportedly Acquires GitHub | Linux Journal
    The article ends with a list of alternatives:
    Apache Allura
    GitBucket: A Git platform
  @ Microsoft acquires GitHub for $7.5 billion - TFiR
    " According to reports, GitHub lost over $66 millions in 2016. At the same time
      GitLab, a fully open source and decentralized service is gaining momentum, giving
      users a fully open source alternative. "
  @ Microsoft to acquire GitHub for $7.5 billion | Stories official press release
  @ Microsoft + GitHub = Empowering Developers - The Official Microsoft Blog
  @ A bright future for GitHub | The GitHub Blog
  @ Congratulations GitHub on the acquisition by Microsoft | GitLab
    " While we admire what's been done, our strategy differs in two key areas. First,
      instead of integrating multiple tools together, we believe a single application,
      built from the ground up to support the entire DevOps lifecycle is a better
      experience leading to a faster cycle time. Second, it’s important to us that the
      core of our product always remain open source itself as well. "
  @ GitLab Ultimate and Gold now free for education and open source | GitLab 
    " It has been a crazy 24 hours for GitLab. More than 2,000 people tweeted about
      #movingtogitlab. We imported over 100,000 repositories, and we've seen a 7x increase
      in orders. We went live on Bloomberg TV. And on top of that, Apple announced an
      Xcode integration with GitLab. "

Another fine post from The Computer Curmudgeon.

mdlbear: (technonerdmonster)
2018-05-25 09:52 pm

Computer Languages: compiled, interpreted, and assembled.

A few months ago while waiting for a ferry with my sister I mentioned a programming language called Elm, which I've been particularly interested in because it's a functional language that compiles into Javascript. Her response, predictably, was something like "I have no idea what you just said. What does 'compiles' mean?" So I explained the difference between compilers and interpreters, and a few other things about programming languages, and she said that it was the first time anyone had explained how programming languages work in a way that made sense to her. In the back of my mind, I thought it might someday make an interesting blog post. So here it is.

Inside of a computer, everything is a number. These days, these numbers are all represented in binary (base two -- just ones and zeroes), stored in a couple of billion numbered locations each of which holds eight binary digits. (A binary digit is called a "bit", and eight of them together make a "byte", which is the amount of data needed to represent a number between 0 and 255, or a single character in a block of text. Apart from the numerical coincidence, computer bits have nothing to do with the bits that were made by breaking apart "pieces of eight".)

The numbers in the computer's memory are used for three different things. First, some of them are used to represent the data that the computer is going to be working on. Second, some of them represent the location of those data. And finally, some of them are instructions that tell the computer what to do with the data. Those instructions are called "machine language" because they're the only thing the machine actually understands. Nobody writes programs in machine language if they can possibly avoid it.

the details )

TL;DR: an assembler turns each line in a program directly into a machine-language instruction. A compiler takes a program written in a more complicated (for the computer) but easier to write in (for people) and turns it into a sequence of machine-language instructions, that can then be read back in and run. An interpreter skips that last step -- instead of writing out the machine language instructions, it just does them on the spot.

Teaser: Next time I'll talk about different kinds of programming languages: functional, imperative, object-oriented, and scripting. I'm also open to suggestions -- what would you like me to write about?