mdlbear: blue fractal bear with text "since 2002" (Default)

The Git version-control system was released 18 years ago today, so it's old enough to vote in the US. (The most recent version, 2.40.0, was released this year on my birthday. But I digress.)

In related news, I'd like to join elf and ysabetwordsmith in boosting the signal for The Fujoshi Guide to Web Development by Essential Randomness, on Kickstarter.

The Fujoshi Guide to Web Development is a series of zines/books featuring anthropomorphized versions of programming languages and concepts (aka gijinka), each one engineered from the ground up to cater to transformational fandom's sensibilities and interests.

[personal profile] elf summarizes it as:

Problem: The corporate webosphere is all based on public feeds of identical-looking scrolling content. Fandom has mostly lost the habit of creating their own webspaces for purposes other than constant interaction. But most of the tutorials are horribly hostile to beginners, or to fandom purposes, or both.

Solution: Learn web development from hot anime guys in a dating sim. Well, not actually a dating sim. But it looks like a dating sim.

... and appropriately enough the first demo is about Git (personified as a hawt catboy). (My guess is that Git's persona was chosen so that they could personify GitHub as an octocat-boy.) I'm not in the target demographic, obviously, but I'm all for anything that promises to get fans and other outsiders hooked on web development and version control.

mdlbear: (technonerdmonster)

You may remember this post about renaming the default branch in Git repositories. Since then I've done some script writing -- they say you don't really understand a process until you can write a program that does it, and this was no exception. (There are lots of exceptions, actually, but that's rather beside the point of this post...)

Anyway, here's what I think is the best way to rename master to main in a clone of a repository where that rename has already been done. (That's a common case anywhere you have multiple developers, each with their own clone, or one developer like me who works on a different laptop depending on the time of day and where the cats are sitting.)

     git fetch
     git branch -m master main
     git branch -u origin/main main
     git remote set-head origin main
     git remote prune origin

The interesting part is why this is the best way I've found of doing it:

  1. It works even if master isn't the current branch, or if it's out of date or diverged from upstream.
  2. It doesn't print extraneous warnings or fail with an error.

Neither of those is a problem if you're doing everything manually, but it can be annoying or fatal in a script. So here it is again, with commentary:

git fetch -- you have to do this first, or the git branch -u ... line will fail because git will think you're setting upstream to a branch that doesn't exist on the origin.

git branch -m master main -- note that the renamed branch will still be tracking master. We fix that with...

git branch -u origin/main main -- many of the pages I've seen use git push -u..., but the push isn't necessary and has several different ways it can fail, for example if the current branch isn't main or if it isn't up to date.

git remote set-head origin main -- This sets main as the default branch, so things like git push will work without naming the branch. You can use -a for "automatic" instead of the branch name, but why make git do extra work? Many of the posts I've seen use the following low-level command, which works but isn't very clear and relies on implementation details you shouldn't have to bother with:

    git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main

git remote prune origin -- I've seen people suggesting git fetch --prune, but we already did the fetch way back in step 1. Alternatively, we could use --prune on that first fetch, but then git will complain about master tracking a branch that doesn't exist. It still works, but it's annoying in a script.

Just as an aside because I think it's amusing: my former employer (a large online retailer) used and probably still uses "mainline" for the default branch, and I've seen people suggesting it as an alternative to "main". It is, if anything, more jarring than "master" for someone who has previously encountered "mainlining" only in the context of self-administered street drugs.

Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com).
Donation buttons in profile.

mdlbear: (technonerdmonster)

Hopefully, this post will become the first of a series about solving various common problems with Git. Note that the grouping in that phrase is intentionally ambiguous – it could be either “(solving various common problems) with Git”, or “solving (various common problems with Git)”, and I expect to cover both meanings. Often there are aspects of both: Git got you into trouble, and you need to use Git to get yourself out of it.

“It is easy to shoot your foot off with git, but also easy to revert to a previous foot and merge it with your current leg.” —Jack William Bell

In many cases, though, this will involve git rebase rather than merge, and I think “rebase it onto your current leg” reads better.

Overcoming your fear of git rebase

Many introductions to Git leave out rebase, either because the author considers it an “advanced technique”, or because “it changes history” and the author thinks that it’s undesirable to do so. The latter is undermined by the fact that they usually do talk about git commit --amend. But, like amend, rebase lets you correct mistakes that you would otherwise simply have to live with, and avoid some situations that you would have a lot of trouble backing out of.

In order to rebase fearlessly, you only need to follow these simple rules:

  • Always commit your changes before you pull, merge, rebase, or check out another branch! If you have your changes committed, you can always back out with git reset if something goes wrong. Stashing also works, because git stash commits your work in progress before resetting back to the last commit.
  • Never rebase or amend a commit that’s already been pushed to a shared branch! You can undo changes that were pushed by mistake with git revert. (There are a few cases where you really have to force-push changes, for example if you foolishly commit a configuration file that has passwords in it. It’s a huge hassle, and everyone else on your team will be annoyed at you. If you’re working on a personal project, you’ll be annoyed at yourself, which might be even worse.)
  • If you’re collaborating, do your work on a feature branch. You can use amend and rebase to clean it up before you merge it. You can even share it with a teammate (although it might be simpler to email a patch set).

That last rule is a lot less important if you’re working by yourself, but it’s still a good idea if you want to keep your history clean and understandable – see Why and How To Keep Your Master Happy. And remember that you’re effectively collaborating if your project is on GitHub or GitLab, even if nobody’s forked it yet.
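To make the first rule concrete, here's a minimal sketch of backing out of a rebase gone wrong; git sets ORIG_HEAD for you whenever you merge or rebase:

    git commit -a -m "WIP"        # commit everything first
    git rebase master             # ... suppose this goes horribly wrong ...
    git rebase --abort            # bails out of a conflicted rebase, or...
    git reset --hard ORIG_HEAD    # ...undoes one that finished but shouldn't have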

Push rejected (not fast forward)

One common situation where you may want to rebase is when you try to push a commit and it gets rejected because there’s another commit on the remote repo. You can detect this situation without actually trying to push – just use git fetch followed by git status.
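It looks something like this (the exact wording of the message varies between git versions):

    git fetch
    git status
    # On branch master
    # Your branch and 'origin/master' have diverged,
    # and have 1 and 1 different commits each, respectively.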

I get into this situation all the time with my to-do file, because I make my updates on the master branch and I have one laptop on my desk and a different one in my bedroom, and sometimes I make and commit some changes without pulling first to sync up. This usually happens before I’ve had my first cup of coffee.

The quick fix is git pull --rebase. Now all of the changes you made are sitting on top of the commit you just pulled, and it’s safe for you to push. If you’re developing software, be sure to run all your tests first, and take a close look at the files that were merged. Just because Git is happy with your rebase or merge, that doesn’t mean that something didn’t go subtly wrong.

Pull before pushing changes

I get into a similar situation at bedtime if I try to pull the day’s updates and discover that I hadn’t pushed the changes I made the previous night, resulting in either a merge commit that I didn’t want, or merge conflicts that I really didn’t want. You can avoid this problem by always using git pull --rebase (and you can set the config variable pull.rebase to true to make that the default, but it’s a little risky). But you can also fix the problem.

If you have a conflict, you can back out of it with git merge --abort. (Remember that pull is just shorthand for fetch followed by merge.) If the merge succeeded and made an unwanted merge commit, you can use git reset --hard HEAD^.
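Putting that together, backing out of an ill-advised plain pull looks something like:

    git pull                    # oops: conflicts, or an unwanted merge commit
    git merge --abort           # if it stopped with conflicts, or...
    git reset --hard HEAD^      # ...if it quietly made a merge commit
    git pull --rebase           # then redo it the way you meant to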

Another possibility in this situation is that you have some uncommitted changes. In most cases Git will either go ahead with the merge, or warn you that a locally-modified file will be overwritten by the merge. In the first case, you may have merge conflicts to resolve. In the second, you can stash your changes with git stash, and after the pull has finished, merge them back in with git stash pop. (This combination is almost exactly the same as committing your changes and then rebasing on top of the pulled commit – stash actually makes two hidden commits, one to preserve the working tree, and the other to preserve the index. You can see it in action with gitk --all.)
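In commands, the stash dance is just:

    git stash          # tucks your uncommitted changes away
    git pull
    git stash pop      # re-applies them on top of what you pulled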

… and I’m going to stop here, because this has been sitting in my drafts folder, almost completely finished, since the middle of January.


NaBloPoMo stats:
   5524 words in 11 posts this month (average 502/post)
    967 words in 1 post today

Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com).
Donation buttons in profile.

mdlbear: (technonerdmonster)

If you've been paying attention to the software-development world, you may have noticed a movement to remove racist terms in tech contexts. The most obvious such terms are "master" and "slave", and there are plenty of good alternatives: primary/secondary, main/replica, leader/follower, etc. The one that almost every software developer sees every day is Git's "master" default branch. This issue on GitLab includes some good discussion of what makes "main" the best choice for git. (I've also seen "mainline" used.)

Renaming the default branch on an existing repo is easy. If it has no remotes, for example if it's purely local or a shared repo on a server you have an ssh account on, it's a one-liner:

   git branch -m master main

It's a little more complicated for a clone, but not much more complicated:

   git branch -m master main
   git push -u origin main
   git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main
   git pull

What you need to do at this point depends on where your origin repo is located. If you've already renamed its default branch, you're done. If you haven't, the git push -u created it. At this point, if your origin repo is on GitHub, you need to log in and change its default branch from master to main, because it won't let you delete its default branch.

Then, delete the old master branch with

   git push origin --delete master

This works for simple cases. It gets a little more complicated on GitHub because you might have web hooks, pull requests, and so on that still refer to master. GitHub says that renaming master will be a one-step process later in the year, so you may want to wait until then. For less complicated situations, any URLs that reference master will get automatically redirected to main. See this page for details.

I had a slightly different problem: my shared repositories are on my web host, and there are hook scripts that pull from the shared repo into the web directory. My version of the post-update hook only looks for changes in the master branch. Fortunately that's a one-liner, too:

   ssh HOST sed -i -e s/master/main/g REPO/hooks/post-update


The next problem is creating a new repo with main as the default branch. GitHub already does this, so if you are starting your project there you're good to go. Otherwise, read on:

The Git project has also added a configuration variable, init.defaultBranch, to specify the default branch for new repositories, but it's probably not in many distributions yet. Fortunately, there's a workaround, so if you don't want to wait for your distribution to catch up, you can take advantage of the way git init works, as described in this article by Leigh Brenecki:

  1. Find out where Git keeps the template that git init copies to initialize a new repo. On Ubuntu, that's /usr/share/git-core/templates, but if it isn't there look at the man page for git-init.
  2. Copy it to someplace under your control; I used .config/git/init-template.
  3. cd to the (new) template and create a file called HEAD, containing ref: refs/heads/main.
  4. Set the init.templateDir config variable to point to the new template.

Now when git wants to create a new repo, it will use HEAD to tell it which branch to create. Putting all that together, it looks like:

   cp -a /usr/share/git-core/templates/ ~/.config/git/init-template
   echo ref: refs/heads/main > ~/.config/git/init-template/HEAD
   git config --global init.templateDir ~/.config/git/init-template

You can actually replace that initial copy with mkdir; git is able to fill in the missing pieces. Alternatively, you can add things like a default config file, hooks, and so on.
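If you go the mkdir route, the whole setup is just:

   mkdir -p ~/.config/git/init-template
   echo ref: refs/heads/main > ~/.config/git/init-template/HEAD
   git config --global init.templateDir ~/.config/git/init-template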

(I've already updated my configuration repository, Honu, to set up the modified template along with all the other config files it creates. But that probably doesn't help anyone but me.)
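For the record, once your git is new enough to have init.defaultBranch (it was added in 2.28, if I have the version right), the whole template dance reduces to a one-liner:

   git config --global init.defaultBranch main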


Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com).
Donation buttons in profile.

NaBloPoMo stats:
   2146 words in 4 posts this month (average 536/post)
    814 words in 1 post today

mdlbear: (technonerdmonster)

It's been a while since I described the way I do backups -- in fact, the only public document I could find on the subject was written in 2006, and things have changed a great deal since then. I believe there have been a few mentions in Dreamwidth and elsewhere, but in this calamitous year it seems prudent to do it again. Especially since I'm starting to feel mortal, and starting to think that some day one of my kids is going to have to grovel through the whole mess and try to make sense of it. (Whether they'll find anything worth keeping or even worth the trouble of looking is, of course, an open question.)

My home file server, a small Linux box called Nova, is backed up by simply copying (almost -- see below) its entire disk to an external hard drive every night. (It's done using rsync, which is efficient because it skips over everything that hasn't been changed since the last copy.) When the disk crashes (it's almost always the internal disk, because the external mirror is idle most of the time) I can (and have, several times) swap in the external drive, make it bootable, order a new drive for the mirror, and I'm done. Or, more likely, buy a new pair of drives that are twice as big for half the price, copy everything, and archive the better of the old drives. Update it occasionally.
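The nightly job boils down to a single rsync invocation, something like this (the mount point is illustrative, not my actual script):

   # -a preserves everything, -H keeps hard links, -x stays on one filesystem
   rsync -aHx --delete / /mnt/backup/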

That's not very interesting, but it's not the whole story. I used to make incremental backups -- instead of the mirror drive being an exact copy of the main one, it was a sequence of snapshots (like Apple's Time Machine, for example). There were some problems with that, including the fact that, because of the way the snapshots were made (using cp -l to copy directories but leave hard links to the files that haven't changed), it took more space than it needed to, and made the backup disk very difficult -- not to mention slow -- to copy if it started flaking out. There are ways of getting around those problems now, but I don't need them.
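For the curious, the classic snapshot trick looks roughly like this (directory names made up):

   cp -al snapshots/day.1 snapshots/day.0     # hard-link copy of the previous snapshot
   rsync -a --delete live/ snapshots/day.0/   # then update the new copy in place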

The classic solution is to keep copies offsite. But I can do better than that because I already have a web host, and I have Git. I need to back up a little.

I noticed that almost everything I was backing up fell into one of three categories:

  1. Files I keep under version control.
  2. Files (mostly large ones, like audio recordings) that never change after they've been created -- recordings of past concerts, my collection of ripped CDs, the masters for my CD, and so on. I accumulate more of them as time goes by, but most of the old ones stick around.
  3. Files I can reconstruct, or that are purely ephemeral -- my browser cache, build products like PDFs, executable code, downloaded install CDs, and of course the entire OS, which I can re-install any time I need to in under an hour.

Git's biggest advantage for both version control and backups is that it's distributed -- each working directory has its own repository, and you can have shared repositories as well. In effect, every repository is a backup. In my case the shared repositories are in the cloud on Dreamhost, my web host. There are working trees on Nova (the file server) and on one or more laptops. A few of the more interesting ones have public copies on GitLab and/or GitHub as well. So that takes care of Group 1.

The main reason for using incremental backup or version control is so that you can go back to earlier versions of something if it gets messed up. But the files in Group 2 don't change, they just accumulate. So I put all of the files in Group 2 -- the big ones -- into the same directory tree as the Git working trees; the only difference is that they don't have an associated Git repo. I keep thinking I should set up git-annex to manage them, but it doesn't seem necessary. The workflow is very similar to the Git workflow: add something (typically on a laptop), then push it to a shared server. The Rsync commands are in a Makefile, so I don't have to remember them: I just make rsync. (Rsync doesn't copy anything that is already at the destination and hasn't changed since the previous run, and by default it ignores files on the destination that don't have corresponding source files. So I don't have to have a complete copy of my concert recordings (for example) on my laptop, just the one I just made.)
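The target itself amounts to a one-liner, something like this, with the host and paths standing in for mine:

   rsync -av Recordings/ example-host:Recordings/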

That leaves Group 3 -- the files that don't have to be backed up because they can be reconstructed from version-controlled sources. All of my working trees include a Makefile -- in most cases it's a link to MakeStuff/Makefile -- that builds and installs whatever that tree needs. Programs, web pages, songbooks, what have you. Initial setup of a new machine is done by a package called Honu (Hawaiian for the green sea turtle), which I described a little over a year ago in Sable and the turtles: laptop configuration made easy.

The end result is that "backups" are basically a side-effect of the way I normally work, with frequent small commits that are pushed almost immediately to a shared repo on Dreamhost. The workflow for large files, especially recording projects, is similar, working on my laptop and backing up with Rsync to the file server as I go along. When things are ready, they go up to the web host. Make targets push and rsync simplify the process. Going in the opposite direction, the pull-all command updates everything from the shared repos.

Your mileage may vary.


Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com).
Donation buttons in profile.

mdlbear: (technonerdmonster)

Okay, this one's a bit weird. Apparently A hacker is wiping Git repositories and asking for a ransom (of 0.1 Bitcoin). It was done by scanning the entire web for /.git/config files and mining those for credentials (including access tokens and URLs of the form http://user:password@victim.com). The hacker "replaced" the contents of the repository with a ransom demand.

The perpetrator is apparently hoping that anyone stupid enough to leave their git repo accessible through the web (I admit -- I used to do that) and to put login credentials in it (no, I'm not that stupid -- that's one of the things everyone is warned about multiple times, just in case it wasn't obvious), is probably stupid enough to pay the ransom instead of simply restoring their repo from any clone of it and changing their password.

And of course it turns out that the entire repo is still there after the attack -- the perpetrator is apparently just adding a commit and pointing HEAD at it. This post on StackExchange explains how to recover.

It's even easier, though, if you've actually been using the repo, because then you'll have a clone of it somewhere and all you have to do is

  cd clone
  git push --force origin HEAD:master

There's still the perp's threat to release your code if you don't pay. If your code is in a public repo on GitHub, GitLab, or BitBucket -- who cares? If it's in a private repo, you may have a problem, provided you (1) think it's likely that this threat can be carried out (there is reason to believe that your code hasn't actually been stashed away anywhere) and (2) think that whatever secrets may have been in your private repo are worth more than about $570.

You can see by looking at Bitcoin Address 1ES14c7qLb5CYhLMUekctxLgc1FV2Ti9DA that, so far (4pm today) nobody has paid up.


Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com).
Donation buttons in profile.

mdlbear: (technonerdmonster)

In my previous curmudgeon post, Writing Without Distractions, I gave version control only a brief mention, and promised a follow-up post. That would be this one. This post is intended for people who are not in the software industry, including not only poets but other writers, students, people who program as a hobby, and programmers who have been in suspended animation for the last decade or three and are just now waking up.

The Wikipedia article on version control gives a pretty good overview, but it suffers from being way too general, and at the same time too focused on software development. This post is aimed at poets and other writers, and will be using the most popular version control system, git. (That Wikipedia article shares many of the same flaws as the one on version control.) My earlier post, Git: The other blockchain, was aimed at software developers and blockchain enthusiasts.

What is version control and why should I use it?

A version control system, also called a software configuration management (SCM) system, is a system for keeping track of changes in a collection of files. (The two terms have slightly different connotations and are used in different contexts, but it's like "writer" and "author" -- a distinction without much of a difference. For what it's worth, git's official website is git-scm.com/, but the first line of text on the site says that "Git is a free and open source distributed version control system". Then in the next paragraph they use the initialism SCM when they want to shorten it. Maybe it's easier to type? Go figure.)

So what does the ability to "track changes" really get you?

( Quite a lot, actually! )

...and Finally

The part you've been waiting for -- the end. This post is already long, so I'll just refer you to the resources for now. Expect another installment, though, and please feel free to suggest future topics.

Resources

Tutorials

Digging Deeper

Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com).
Donation buttons in profile.

mdlbear: (technonerdmonster)

Part 1: Blockchain

Blockchain is the technology behind Bitcoin and other cybercurrencies. That's about all anyone outside the software industry knows about it; that and the fact that lots of people are claiming that it's going to transform everything. (The financial industry, the Web, manufacturing supply chains, identity, the music industry, ... the list goes on.) If you happen to be in the software industry and have a moderately good idea of what blockchain is, how it works, and what it can and can't do, you may want to skip to Part 2.

Still with me? Here's the fifty-cent summary of blockchain. Blockchain is a distributed, immutable ledger. Buzzword is a buzzword, buzzword, buzzword? Blockchain is a chain of blocks? That's closer.

The purpose of a blockchain is to keep track of financial transactions (that's the "ledger" part) and other data by making them public (that's half of the "distributed" part), keeping them in blocks of data (that's the "block" part) that can't be changed (that's the "immutable" part, and it's a really good property for a ledger to have), are linked together by hashes (that's the "chain" part, and we'll get to what hashes are in a moment), with the integrity of that chain guaranteed by a large group of people (that's the other half of the "distributed" part) called "miners" (WTF?).

Let's start in the middle: how can we link blocks of data together so that they can't be changed? Let's start by making it so that any change to a block, or to the order of those blocks, can be detected. Then, the fact that everything is public makes the data impossible to change without that change being glaringly obvious. We do that with hashes.

A hash function is something that takes a large block of data and turns it into a very long sequence of bits (which we will sometimes refer to as a "number", because any whole number can be represented by a sequence of binary digits, and sometimes as a "hash", because the data has been chopped up and mashed together like the corned beef hash you had for breakfast). A good hash function has two important properties:

  1. It's irreversible. Starting with a hash, it is effectively impossible to construct a block of data that will produce that hash. (It is significantly easier to construct two blocks with the same hash, which is why the security-conscious world moves to larger hashes from time to time.)
  2. It's unpredictable. If two blocks of data differ anywhere, even by a single bit, their hashes will be completely different.

Those two together mean that if two blocks have the same hash, they contain the same data. If somebody sends you a block and a hash, you can compare the hash of the block and if it matches, you can be certain that the block hasn't been damaged or tampered with before it got to you. And if they also cryptographically sign that hash, you can be certain that they used the key that created that signature.
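You can watch the unpredictability property in action from the shell; sha256sum comes with GNU coreutils:

# the inputs differ by a single character; the digests will have nothing in common
echo 'Pay Alice 10 coins' | sha256sum
echo 'Pay Alice 90 coins' | sha256sum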

Now let's guarantee the integrity of the sequence of blocks by chaining them together. Every block in the chain contains the hash of the previous block. If block B follows block A in the chain, B's hash depends in part on the hash of block A. If a villain tries to insert a forged transaction into block A, its hash won't match the one in block B.

Now we get to the part that makes blockchain interesting: getting everyone to agree on which transactions go into the next block. This is done by publishing transactions where all of the miners can see them. The miners then get to work with shovels and pickaxes (well, big fast computers), validating the transaction, putting it into a block, and then running a contest to see which of them gets to add their block to the chain and collect the associated reward. Winning the contest requires doing a lot of computation. It's been estimated that miners' computers collectively consume roughly the same amount of electricity as Ireland.

There's more to it, but that's blockchain in a nutshell. I am not going to say anything about what blockchain might be good for besides keeping track of virtual money -- that's a whole other rabbit hole that I'll save for another time. For now, the important thing is that blockchain is a system for keeping track of financial transactions by using a chain of blocks connected by hashes.

The need for miners to do work is what makes the virtual money they're mining valuable, and makes it possible for everyone to agree on who owns how much of it without anyone having to trust anyone else. It's all that work that makes it possible to detect cheating. It also makes it expensive and slow. The Ethereum blockchain can handle about ten transactions per second. Visa handles about 10,000.

Part 2: The other blockchain

Meanwhile, in another part of cyberspace, software developers are using another system based on hash chains to keep track of their software -- a distributed version control system called git. It's almost completely different, except for the way it uses hashes. How different? Well, for starters it's both free and fast, and you can use it at home. And it has nothing to do with money -- it's a version control system.

If you've been with me for a while, you've probably figured out that I'm extremely fond of git. This post is not an introduction to git for non-programmers -- I'm working on that. However, if you managed to get this far, it contains enough information to stand on its own.

Git doesn't use transactions and blocks; instead it uses "objects", but just like blocks each object is identified by its hash. Instead of keeping track of virtual money, it keeps track of files and their histories. And just as blockchain keeps a complete history of everyone's coins, git records the complete history of everyone's data.

Git uses several types of object, but the most fundamental one is called a "blob", and consists of a file, its size, and the word "blob". For example, here's how git identifies one of my Songs for Saturday posts:

git hash-object 2019/01/05--s4s-welcome-to-acousticville.html
957259dd1e41936104f72f9a8c451df50b045c57

Everything you do with git starts with the git command. In this case we're using git hash-object and giving it the pathname of the file we want to hash. Hardly anyone needs to use the hash-object subcommand; it's used mainly for testing and the occasional demonstration.

Git handles a directory (you may know directories as "folders" if you aren't a programmer) by combining the names, metadata, and hashes of all of its contents into a type of object called a "tree", and taking the hash of the whole thing.
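If you want to poke at one yourself, git will pretty-print the tree at the current commit of any repo:

git cat-file -p 'HEAD^{tree}'

Each line of the output gives an entry's mode, type (blob or tree), hash, and name.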

Here, by the way, is another place where git really differs from blockchain. In a blockchain, all the effort of mining goes into making sure that every block points to its one guaranteed-unique correct predecessor. In other words, the blocks form a chain. Files and directories form a tree, with the ordinary files as the leaves, and directories as branches. The directory at the top is called the root. (Yes, the top; for some reason software trees grow from the root down. After a while you get used to it.)

Actually, that's not quite accurate, because git stores each object in exactly one place, and it's perfectly possible for the same file to be in two different directories. This can be very useful -- if you make a hundred copies of a file, git only has to store one of them. It's also inaccurate because trees, called Merkle trees, are used inside of blocks in a blockchain. But I digress.

Technically the hash links in both blockchains and git form a directed acyclic graph -- that means that the links all point in one direction, and there aren't any loops. In order to make a loop you'd have to predict the hash of some later block, and you just can't do that. I have another post about why this is a good thing.

And that brings us to the things that make git, git: commits. ("Commit" is used in the same sense, more or less, as it is in the phrase "commit something to memory", or "commit to a plan of action". It has very little to do with crime. Hashes are even more unique than fingerprints, and we all know what criminals think about fingerprints. In cryptography, the hash of a key is called its fingerprint.)

Anyway, when you're done making changes in a project, you type the command

git commit

... and git will make a new commit object which contains, among other things, the time and date, your name and email address, maybe your cryptographic signature, a brief description of what you did (git puts you into your favorite text editor so you can enter this if you didn't put it on the command line), the hash of the current root, and the hash of the previous commit. Just like a blockchain.
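You can look at all of that for yourself, too:

git cat-file -p HEAD

The output starts with the hashes -- tree, then parent -- followed by author, committer, and the message.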

Unlike earlier version control systems, git never has to compare files; all it has to do is compare their hashes. This is fast -- git's hashes are only 20 bytes long, no matter how big the files are or how many are in a directory tree. And if the hashes of two trees are the same, git doesn't have to look at any of the blobs in those trees to know that they are all the same.

  @ Blockchain 101 — only if you ‘know nothing’! – Hacker Noon
  @ When do you need blockchain? Decision models. – Sebastien Meunier
  @ Git - Git Objects
  @ git ready » how git stores your data
  @ Git/Internal structure - Wikibooks, open books for an open world
  @ Why Singly-Linked Lists Win* | Stephen Savitzky

Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com).

mdlbear: (technonerdmonster)

If you develop software and haven't just returned from the moon, you've undoubtedly heard that GitHub is being acquired by Microsoft. Depending on your affiliations you might be spelling "being acquired by" as "selling out to". The rest of you are probably wondering what on Earth a GitHub is, and why Microsoft would want one. Let me explain.

Please note: this post isn't about my opinion of today's news. It's really too early to tell, though I may get into that a little toward the end. Instead, I'm going to explain what GitHub is, and why it matters. But first I have to explain Git.

Git is a version-control system. (Version-control systems are sometimes called "source code management" (SCM) systems. If you look closely you might even spot "scm" in git's URL, git-scm.com.) Basically, a version-control system lets you record the complete history of a project: what changes were made, who made each change, when they changed it, and their notes about what they did and why. It doesn't have to be a software project, either. It can be recipes, photographs, books, the papers you're writing for school, or even blog entries. (Yes, I do.)

Before git, most version-control systems kept track of changes in text files (which of course is what all source code is) by recording which lines are different from the previous version. (It's usually done by a program called diff.) This was very compact, but it could also be very slow if you had to undo all the changes between two versions in order to see what the older one looked like.

Git, on the other hand, is blindingly fast in part because it works in the stupidest way possible (which is why it's called "git"). It simply takes the new version of each file that changed since the last version, zips it up, and stuffs it whole into its repository. So it takes git about the same amount of time to roll a file back two versions or two hundred.

The other thing that makes git fast is where it keeps all of its version information. Before git, most version-control systems used a centralized repository on a server somewhere. (Subversion, one of the best of these, even lets you browse the repository with a web browser.) That means that all the change information is going over a network. Git keeps its repository (these days everyone shortens that to "repo") on your local disk, right next to your working copy, in a hidden subdirectory called ".git".

Because its repo is local, and contains the entire history of your project, you don't need a network connection to use git. On the beach, in an airplane, on a boat, with a goat, it doesn't matter to git. It's de-centralized. It gets a little more complicated when more than one developer is working on a project.

Bob's been in the office all week working on a project. When his boss, Alice, comes back from the open source conference she's been at all week, all she has to do is tell git to fetch all the changes that Bob made while she was away. Git gets them directly from Bob's repo. If Alice didn't make any changes, that's called a "fast-forward" merge -- git just takes the changes that Bob made, copies those files into Alice's repo, updates her working tree, and it's done.

It's a little trickier if Alice had time to make some changes, too. Now Alice has to merge the two sets of changes, and then let Bob pull the merged files onto his computer. By the way, a "pull" is just a fetch followed by a merge, but it's so common that git has a shorthand way of doing it. (I'm oversimplifying here, but this isn't the time to go into the difference between merge and rebase. It's also not a good time to talk about branches -- maybe some other week.) As you can imagine, this gets out of hand pretty quickly, and it's even worse if there's a whole team working on the project.
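In commands, assuming the usual remote name origin, a pull is just:

git fetch origin
git merge origin/master

... or git pull origin master to do both at once.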

The obvious thing to do is for the group to have one repo on a server somewhere that has what everyone agrees is the definitive set of files on it. Bob pushes his changes to the server, and when Alice tries to push her changes, git balks and gives her an error message. Now it's Alice's responsibility to make any necessary fixes and push them to the server. Actually, in a real team, Alice would send her proposed changes around by making a diff and sending email to the other team members to review, and not actually push her changes until someone approves them.

In a large team, this is kind of a hub-and-spokes arrangement. You can see where this is going, right?

GitHub is a company that provides a place for people and projects to put shared git repositories where other people can see them, clone them, and contribute to them. GitHub has become wildly popular, because it's a great place to share software. If you have an open-source software project, putting a public repo on GitHub is the most effective way to reach developers. It's so popular that Google and Microsoft shut down their own code-hosting sites (Google Code and CodePlex respectively) and moved to GitHub. Microsoft, it turns out, is GitHub's biggest contributor.

Putting a public repository on GitHub is free. If you want to set up private repositories, GitHub will charge you for it, and if your company wants to put a clone of GitHub on its own private servers they can buy GitHub Enterprise, but if your software is free, so's your space on GitHub.

That's a bit of a problem, because the software that runs GitHub is not free. That means that they need a steady stream of income to pay their in-house developers, because they're not going to get any help from the open-source developer community. GitHub lost $66 million in 2016, and doesn't really have a sustainable business model that would make them attractive to investors. They needed to get acquired, or they had a real risk of going under. And when a service based on proprietary software goes under, all of their customers have a big problem. But their users? Heh.

Everybody knows the old adage, "if you're getting a service for free you're not the customer, you're the product." That's especially true for companies like Google and Facebook, which sell their users' eyeballs to advertisers. It's a lot less true for a company whose users can leave any time they want, painlessly, taking all their data and their readers with them. I'm sure most of my readers here on Dreamwidth remember what happened to Livejournal when they got bought by the Russians. Well, GitHub is being bought by Microsoft. It's not entirely clear which is worse.

GitHub has an even worse problem than Livejournal did, because "cross-posting" is basically the way git works. There's a company called GitLab that looks a lot like GitHub, except that their core software -- the stuff that wraps a slick web interface around a git repository -- is open source. (They do sell extensions, but most projects aren't going to need them.) If you want to set up your own private GitLab site, it's free, and you can do it in ten minutes with a one-line command. If you find bugs, you can fix them yourself. You'll find a couple of great quotes from their blog at the end of the notes, but the bottom line is that 100,000 repositories have moved from GitHub to GitLab in the last 24 hours.

And once you've moved a project to GitLab, you don't have to worry about what happens to it, because the open-source core of it will continue to be maintained by its community. That's what happened when a company called Netscape went belly-up: Mozilla Firefox is still around and doing fine. And if the fact that GitLab is for profit is a problem for you, there's Apache Allura, gitolite3, gitbucket, and gitweb (to name a few). Go for it!


This so wasn't what I was planning to write today.

Notes:
  @ Microsoft Reportedly Acquires GitHub | Linux Journal
    The article ends with a list of alternatives:
    Gitea
    Apache Allura
    GitBucket: A Git platform
    GitLab
  @ Microsoft acquires GitHub for $7.5 billion - TFiR
    " According to reports, GitHub lost over $66 millions in 2016. At the same time
      GitLab, a fully open source and decentralized service is gaining momentum, giving
      users a fully open source alternative. "
  @ Microsoft to acquire GitHub for $7.5 billion | Stories official press release
  @ Microsoft + GitHub = Empowering Developers - The Official Microsoft Blog
  @ A bright future for GitHub | The GitHub Blog
  @ Congratulations GitHub on the acquisition by Microsoft | GitLab
    " While we admire what's been done, our strategy differs in two key areas. First,
      instead of integrating multiple tools together, we believe a single application,
      built from the ground up to support the entire DevOps lifecycle is a better
      experience leading to a faster cycle time. Second, it’s important to us that the
      core of our product always remain open source itself as well. "
  @ GitLab Ultimate and Gold now free for education and open source | GitLab 
    " It has been a crazy 24 hours for GitLab. More than 2,000 people tweeted about
      #movingtogitlab. We imported over 100,000 repositories, and we've seen a 7x increase
      in orders. We went live on Bloomberg TV. And on top of that, Apple announced an
      Xcode integration with GitLab. "

Another fine post from The Computer Curmudgeon.

mdlbear: blue fractal bear with text "since 2002" (Default)

On the health front, I may finally be learning to relax the muscles in my lower back that make it hurt when I walk. Maybe. It also seems to have a lot to do with how heavy my shoulder bag is, so that's going to be an ongoing problem. A backpack would be better, except that it's hard to get off when I take a seat in the bus, and unlike a shoulder bag I can't swing it around when I want to get at something like my wallet.

I've finally started doing some serious system administration/scripting work to get my website working directories the rest of the way under git control. That's done -- I can now say "make deploy" in a web directory and have it committed, pushed to the remote repo, and pulled into the website with no further attention.

In the process, I had to write a script for converting a directory from CVS to git. There are a couple of challenges in that process because the old CVS repositories were in pretty bad shape, with stuff not having been checked in consistently. Not like a well-maintained software project, in other words. Bad bear. No cookie. My websites don't use cookies anyway.

The associated asset archive is going to be harder, because some directories have large media files in them. Like, um... the audio. The goal is to eliminate the use of rsync snapshots for backups (for reasons I will probably go into in more detail in a later post).

Detail in the notes, as usual.

( raw notes, with links )
mdlbear: blue fractal bear with text "since 2002" (Default)
( raw notes )

A pretty good day. I even got a walk in, though I cut it a little short because I was getting some foot pain. Naturally it went away as soon as I turned around. (It's because I have shoes with three different insoles. I get arch pain sometimes when readjusting between them. :P )

Quite a lot of puttering around the websites and associated makefiles, including finally getting HyperSpace-Express.com online. After owning it for how many years? Did I mention that I procrastinate?

Speaking of procrastination, I also got No Greater Love fully chorded out. About 2 weeks late, but in time for Tempered Glass's Orycon gig. Which is next Saturday evening. Eeek!

Lots of good links under the cut. Don Marti provided a lot of them, including a few great git links, one of which had this marvelous quote:

It is easy to shoot your foot off with git, but also easy to revert to a previous foot and merge it with your current leg.

mdlbear: blue fractal bear with text "since 2002" (Default)
( raw notes )

A very good day. I woke up to find myself still happy from having finished writing my presentation about git the night before. And it went well; it was planned as a half-hour talk, but ran to nearly an hour with a lot of good audience interaction. Go me! It helps that git is just plain cool; I expect to have the presentation up on the web soon.

The Cat and I went to the Valley Fair mall for our evening out. Parts were almost deserted; I could walk fast enough for exercise, and Colleen could keep up. Delightful.

I'm going to have to re-investigate LJ's set of pre-defined moods; I'm almost completely unfamiliar with the "better-than-OK" range.

mdlbear: blue fractal bear with text "since 2002" (Default)

At the museum yesterday we spotted a man on a nice little folding scooter: almost certainly this one. Either folds up or comes apart; the combination of small wheels, plastic seat, and small battery means that it's probably limited to light-duty, mainly indoor use, but it looks especially convenient for travel. A bit pricy, though.

From my Mom, a link to pomegranate.com, an art publishing house.

From [livejournal.com profile] gmcdavid, this post linking to an obituary for the last surviving member of Nicolas Bourbaki. I've read a couple of their books - crystal clear even with my rather limited high school French. Sad: another Great Old One gone.

As long as I'm clearing my tabs, here's a link to an article on organizing a web site with git, on linuxworld.com.

mdlbear: (hacker glider)
Getting a static web site organized with git | LinuxWorld Community
Yes, I still end up maintaining some static web sites. I've started doing them under git revision control, just to be safe, and because "git push origin" is just as easy as rsync anyway. Here's a rough cut at a system for keeping these things organized.
Not as directly useful to me as it would be if I wasn't already syncing my entire web-related directory tree up to a large external hosting site for backup.
mdlbear: (hacker glider)

Excellent post by [livejournal.com profile] don_marti on becoming more productive by going offline. Git (distributed version control, basically syncing on steroids), ikiwiki (offline-rendered wiki), blosxom (offline-rendered blog), and more. It's related to a lot of what I've been saying about keeping control of your own data. In essence, what you want to do is to separate writing from publishing.

mdlbear: (hacker glider)

...and other geeky things.

Since I've started to use a version control system called git for my recording projects, and since the subject has come up in the comments to my last post, I thought I'd dive a little deeper into git and why I'm using it. (This is a good summary of git's features.)

( non-geeks may want to skip this. )

It's getting lateish, so I'll continue this little dissertation this evening. Happy hacking!

mdlbear: (hacker glider)

I'm in the process of uploading my most recent take of "Someplace in the Net", along with enough version control information to make it possible for a collaborator (waves at [livejournal.com profile] cflute) to upload some additions. I'm almost certainly doing it wrong; possibly somebody more familiar with the git version-control system could tell me how to do what I really want to do.

( technical details )

Anything to avoid doing actual work...
