mdlbear: (technonerdmonster)

In my previous curmudgeon post, Writing Without Distractions, I gave version control only a brief mention, and promised a follow-up post. That would be this one. This post is intended for people who are not in the software industry, including not only poets but other writers, students, people who program as a hobby, and programmers who have been in suspended animation for the last decade or three and are just now waking up.

The Wikipedia article on version control gives a pretty good overview, but it suffers from being way too general, and at the same time too focused on software development. This post is aimed at poets and other writers, and will be using the most popular version control system, git. (That Wikipedia article shares many of the same flaws as the one on version control.) My earlier post, Git: The other blockchain, was aimed at software developers and blockchain enthusiasts.

What is version control and why should I use it?

A version control system, also called a software configuration management (SCM) system, is a system for keeping track of changes in a collection of files. (The two terms have slightly different connotations and are used in different contexts, but it's like "writer" and "author" -- a distinction without much of a difference. For what it's worth, git's official website is git-scm.com/, but the first line of text on the site says that "Git is a free and open source distributed version control system". Then in the next paragraph they use the initialism SCM when they want to shorten it. Maybe it's easier to type? Go figure.)

So what does the ability to "track changes" really get you?

Quite a lot, actually! )

...and Finally

The part you've been waiting for -- the end. This post is already long, so I'll just refer you to the resources for now. Expect another installment, though, and please feel free to suggest future topics.

Resources

Tutorials

Digging Deeper

Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com).
Donation buttons in profile.

mdlbear: (technonerdmonster)

Part 1: Blockchain

Blockchain is the technology behind Bitcoin and other cybercurrencies. That's about all anyone outside the software industry knows about it; that and the fact that lots of people are claiming that it's going to transform everything. (The financial industry, the Web, manufacturing supply chains, identity, the music industry, ... the list goes on.) If you happen to be in the software industry and have a moderately good idea of what blockchain is, how it works, and what it can and can't do, you may want to skip to Part 2.

Still with me? Here's the fifty-cent summary of blockchain. Blockchain is a distributed, immutable ledger. Buzzword is a buzzword buzzword buzzword? Blockchain is a chain of blocks? That's closer.

The purpose of a blockchain is to keep track of financial transactions (that's the "ledger" part) and other data by making them public (that's half of the "distributed" part), keeping them in blocks of data (that's the "block" part) that can't be changed (that's the "immutable" part, and it's a really good property for a ledger to have), are linked together by hashes (that's the "chain" part, and we'll get to what hashes are in a moment), with the integrity of that chain guaranteed by a large group of people (that's the other half of the "distributed" part) called "miners" (WTF?).

Let's start in the middle: how can we link blocks of data together so that they can't be changed? Let's start by making it so that any change to a block, or to the order of those blocks, can be detected. Then, the fact that everything is public makes the data impossible to change without that change being glaringly obvious. We do that with hashes.

A hash function is something that takes a large block of data and turns it into a very long sequence of bits (which we will sometimes refer to as a "number", because any whole number can be represented by a sequence of binary digits, and sometimes as a "hash", because the data has been chopped up and mashed together like the corned beef hash you had for breakfast). A good hash function has two important properties:

  1. It's irreversible. Starting with a hash, it is effectively impossible to construct a block of data that will produce that hash. (It is significantly easier to construct two blocks with the same hash, which is why the security-conscious world moves to larger hashes from time to time.)
  2. It's unpredictable. If two blocks of data differ anywhere, even by a single bit, their hashes will be completely different.

Those two together mean that if two blocks have the same hash, they contain the same data. If somebody sends you a block and a hash, you can compare the hash of the block and if it matches, you can be certain that the block hasn't been damaged or tampered with before it got to you. And if they also cryptographically sign that hash, you can be certain that they used the key that created that signature.

Now let's guarantee the integrity of the sequence of blocks by chaining them together. Every block in the chain contains the hash of the previous block. If block B follows block A in the chain, B's hash depends in part on the hash of block A. If a villain tries to insert a forged transaction into block A, its hash won't match the one in block B.

Now we get to the part that makes blockchain interesting: getting everyone to agree on which transactions go into the next block. This is done by publishing transactions where all of the miners can see them. The miners then get to work with shovels and pickaxes big fast computers, validating the transaction, putting it into a block, and then running a contest to see which of them gets to add their block to the chain and collect the associated reward. Winning the contest requires doing a lot of computation. It's been estimated that miners' computers collectively consume roughly the same amount of electricity as Ireland.

There's more to it, but that's blockchain in a nutshell. I am not going to say anything about what blockchain might be good for besides keeping track of virtual money -- that's a whole other rabbit hole that I'll save for another time. For now, the important thing is that blockchain is a system for keeping track of financial transactions by using a chain of blocks connected by hashes.

The need for miners to do work is what makes the virtual money they're mining valuable, and makes it possible for everyone to agree on who owns how much of it without anyone having to trust anyone else. It's all that work that makes it possible to detect cheating. It also makes it expensive and slow. The Ethereum blockchain can handle about ten transactions per second. Visa handles about 10,000.

Part 2: The other blockchain

Meanwhile, in another part of cyberspace, software developers are using another system based on hash chains to keep track of their software -- a distributed version control system called git. It's almost completely different, except for the way it uses hashes. How different? Well, for starters it's both free and fast, and you can use it at home. And it has nothing to do with money -- it's a version control system.

If you've been with me for a while, you've probably figured out that I'm extremely fond of git. This post is not an introduction to git for non-programmers -- I'm working on that. However, if you managed to get this far it does contain enough information to stand on its own,

Git doesn't use transactions and blocks; instead it uses "objects", but just like blocks each object is identified by its hash. Instead of keeping track of virtual money, it keeps track of files and their histories. And just as blockchain keeps a complete history of everyone's coins, git records the complete history of everyone's data.

Git uses several types of object, but the most fundamental one is called a "blob", and consists of a file, its size, and the word "blob". For example, here's how git idenifies one of my Songs for Saturday posts:

git hash-object 2019/01/05--s4s-welcome-to-acousticville.html
957259dd1e41936104f72f9a8c451df50b045c57

Everything you do with git starts with the git command. In this case we're using git hash-object and giving it the pathname of the file we want to hash. Hardly anyone needs to use the hash-object subcommand; it's used mainly for testing and the occasional demonstration.

Git handles a directory (you may know directories as "folders" if you aren't a programmer) by combining the names, metadata, and hashes of all of its contents into a type of object called a "tree", and taking the hash of the whole thing.

Here, by the way, is another place where git really differs from blockchain. In a blockchain, all the effort of mining goes into making sure that every block points to its one guaranteed-unique correct predecessor. In other words, the blocks form a chain. Files and directories form a tree, with the ordinary files as the leaves, and directories as branches. The directory at the top is called the root. Top? Top. For some reason software trees grow from the root down. After a while you get used to it.

Actually, that's not quite accurate, because git stores each object in exactly one place, and it's perfectly possible for the same file to be in two different directories. This can be very useful -- if you make a hundred copies of a file, git only has to store one of them. It's also inaccurate because trees, called Merkle Trees are used inside of blocks in a blockchain. But I digress.

Technically the hash links in both blockchains and git form a directed acyclic graph -- that means that the links all point in one direction, and there aren't any loops. In order to make a loop you'd have to predict the hash of some later block, and you just can't do that. I have another post about why this is a good thing.

And that brings us to the things that make git, git: commits. ("Commit" is used in the same sense, more or less, as it is in the phrase "commit something to memory", or "commit to a plan of action". It has very little to do with crime. Hashes are even more unique than fingerprints, and we all know what criminals think about fingerprints. In cryptography, the hash of a key is called its fingerprint.)

Anyway, when you're done making changes in a project, you type the command

git commit

... and git will make a new commit object which contains, among other things, the time and date, your name and email address, maybe your cryptographic signature, a brief description of what you did (git puts you into your favorite text editor so you can enter this if you didn't put it on the command line), the hash of the current root, and the hash of the previous commit. Just like a blockchain.

Unlike earlier version control systems, git never has to compare files; all it has to do is compare their hashes. This is fast -- git's hashes are only 20 bytes long, no matter how big the files are or how many are in a directory tree. And if the hashes of two trees are the same, git doesn't have to look at any of the blobs in those trees to know that they are all the same.

@ Blockchain 101 — only if you ‘know nothing’! – Hacker Noon @ When do you need blockchain? Decision models. – Sebastien Meunier @ Git - Git Objects @ git ready » how git stores your data @ Git/Internal structure - Wikibooks, open books for an open world @ Why Singly-Linked Lists Win* | Stephen Savitzky

Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com).

mdlbear: (technonerdmonster)

If you develop software and haven't just returned from the moon, you've undoubtedly heard that GitHub is being acquired by Microsoft. Depending on your affiliations you might be spelling "being acquired by" as "selling out to". The rest of you are probably wondering what on Earth a GitHub is, and why Microsoft would want one. Let me explain.

Please note: this post isn't about my opinion of today's news. It's really too early to tell, though I may get into that a little toward the end. Instead, I'm going to explain what GitHub is, and why it matters. But first I have to explain Git.

Git is a version-control system. (Version-control systems are sometimes called "source code management" (SCM) systems. If you look closely you might even have spotted "scm" in git's URL up there at the end of the last paragraph.) Basically, a version-control system lets you record the complete history of a project, with what changes were made, who made the each change, when they changed it, and their notes about what they did and why. It doesn't have to be a software project, either. It can be recipes, photographs, books, the papers you're writing for school, or even blog entries. (Yes, I do.)

Before git, most version-control systems kept track of changes in text files (which of course is what all source code is) by recording which lines are different from the previous version. (It's usually done by a program called diff.) This was very compact, but it could also be very slow if you had to undo all the changes between two versions in order to see what the older one looked like.

Git, on the other hand, is blindingly fast in part because it works in the stupidest way possible (which is why it's called "git"). It simply takes the new version of each file that changed since the last version, zips it up, and stuffs it whole into its repository. So it takes git about the same amount of time to roll a file back two versions or two hundred.

The other thing that makes git fast is where it keeps all of its version information. Before git, most version-control systems used a centralized repository on a server somewhere. (Subversion, one of the best of these, even lets you browse the repository with a web browser.) That means that all the change information is going over a network. Git keeps its repository (these days everyone shortens that to "repo") on your local disk, right next to your working copy, in a hidden subdirectory called ".git".

Because its repo is local, and contains the entire history of your project, you don't need a network connection to use git. On the beach, in an airplane, on a boat, with a goat, it doesn't matter to git. It's de-centralized. It gets a little more complicated when more than one developer is working on a project.

Bob's been in the office all week working on a project. When his boss, Alice, comes back from the open source conference she's been at all week, all she has to do is tell git to fetch all the changes that Bob made while she was away. Git gets them directly from Bob's repo. If Alice didn't make any changes, that's called a "fast-forward" merge -- git just takes the changes that Bob made, copies those files into Alice's repo, updates her working tree, and it's done.

It's a little trickier if Alice had time to make some changes, too. Now Alice has to merge the two sets of changes, and then let Bob pull the merged files onto his computer. By the way, a "pull" is just a fetch followed by a merge, but it's so common that git has a shorthand way of doing it. (I'm oversimplifying here, but this isn't the time to go into the difference between merge and rebase. It's also not a good time to talk about branches -- maybe some other week.) As you can imagine, this gets out of hand pretty quickly, and it's even worse if there's a whole team working on the project.

The obvious thing to do is for the group to have one repo on a server somewhere that has what everyone agrees is the definitive set of files on it. Bob pushes his changes to the server, and when Alice tries to push her changes, git balks and gives her an error message. Now it's Alice's responsibility to make any necessary fixes and push them to the server. Actually, in a real team, Alice would send her proposed changes around by making a diff and sending email to the other team members to review, and not actually push her changes until someone approves them.

In a large team, this is kind of a hub-and-spokes arrangement. You can see where this is going, right?

GitHub is a company that provides a place for people and projects to put shared git repositories where other people can see them, clone them, and contribute to them. GitHub has become wildly popular, because it's a great place to share software. If you have an open-source software project, putting a public repo on GitHub is the most effective way to reach developers. It's so popular that Google and Microsoft shut down their own code-hosting sites (Google Code and CodePlex respectively) and moved to GitHub. Microsoft, it turns out, is GitHub's biggest contributor.

Putting a public repository on GitHub is free. If you want to set up private repositories, GitHub will charge you for it, and if your company wants to put a clone of GitHub on its own private servers they can buy GitHub Enterprise, but if your software is free, so's your space on GitHub.

That's a bit of a problem, because the software that runs GitHub is not free. That means that they need a steady stream of income to pay their in-house developers, because they're not going to get any help from the open-source developer community. GitHub lost $66 million in 2016, and doesn't really have a sustainable business model that would make them attractive to investors. They needed to get acquired, or they had a real risk of going under. And when a service based on proprietary software goes under, all of their customers have a big problem. But their users? Heh.

Everybody knows the old adage, "if you're getting a service for free you're not the customer, you're the product." That's especially true for companies like Google and Facebook, which sell their users' eyeballs to advertisers. It's a lot less true for a company whose users can leave any time they want, painlessly, taking all their data and their readers with them. I'm sure most of my readers here on Dreamwidth remember what happened to Livejournal when they got bought by the Russians. Well, GitHub is being bought by Microsoft. It's not entirely clear which is worse.

GitHub has an even worse problem than Livejournal did, because "cross-posting" is basically the way git works. There's a company called GitLab that looks a lot like GitHub, except that their core software -- the stuff that wraps a slick web interface around a git repository -- is open source. (They do sell extensions, but most projects aren't going to need them.) If you want to set up your own private GitLab site, it's free, and you can do it in ten minutes with a one-line command. If you find bugs, you can fix them yourself. You'll find a couple of great quotes from their blog at the end of the notes, but the bottom line is that 100,000 repositories have moved from GitHub to GitLab in the last 24 hours.

And once you've moved a project to GitLab, you don't have to worry about what happens to it, because the open-source core of it will continue to be maintained by its community. That's what happened when a company called Netscape went belly-up: Mozilla Firefox is still around and doing fine. And if the fact that GitLab is for profit is a problem for you, there's Apache Allura, gitolite3, gitbucket, and gitweb (to name a few). Go for it!

 

This so wasn't what I was planning to write today.

Notes:
  @ Microsoft Reportedly Acquires GitHub | Linux Journal
    The article ends with a list of alternatives:
    Gitea
    Apache Allura
    GitBucket: A Git platform
    GitLab
  @ Microsoft acquires GitHub for $7.5 billion - TFiR
    " According to reports, GitHub lost over $66 millions in 2016. At the same time
      GitLab, a fully open source and decentralized service is gaining momentum, giving
      users a fully open source alternative. "
  @ Microsoft to acquire GitHub for $7.5 billion | Stories official press release
  @ Microsoft + GitHub = Empowering Developers - The Official Microsoft Blog
  @ A bright future for GitHub | The GitHub Blog
  @ Congratulations GitHub on the acquisition by Microsoft | GitLab
    " While we admire what's been done, our strategy differs in two key areas. First,
      instead of integrating multiple tools together, we believe a single application,
      built from the ground up to support the entire DevOps lifecycle is a better
      experience leading to a faster cycle time. Second, it’s important to us that the
      core of our product always remain open source itself as well. "
  @ GitLab Ultimate and Gold now free for education and open source | GitLab 
    " It has been a crazy 24 hours for GitLab. More than 2,000 people tweeted about
      #movingtogitlab. We imported over 100,000 repositories, and we've seen a 7x increase
      in orders. We went live on Bloomberg TV. And on top of that, Apple announced an
      Xcode integration with GitLab. "

Another fine post from The Computer Curmudgeon.

mdlbear: the positively imaginary half of a cubic mandelbrot set (Default)

On the health front, I may finally be learning to relax the muscles in my lower back that make it hurt when I walk. Maybe. It also seems to have a lot to do with how heavy my shoulder bag is, so that's going to be an ongoing problem. A backpack would be better, except that it's hard to get off when I take a seat in the bus, and unlike a shoulder bag I can't swing it around when I want to get at something like my wallet.

I've finally started doing some serious system administration/scripting work to get my website working directories the rest of the way under git control. That's done -- I can now say "make deploy" in a web directory and have it committed, pushed to the remote repo, and pulled into the website with no further attention.

In the process, I had to write a script for converting a directory from CVS to git. There are a couple of challenges in that process because the old CVS repositories were in pretty bad shape, with stuff not having been checked in consistently. Not like a well-maintained software project, in other words. Bad bear. No cookie. My websites don't use cookies anyway.

The associated asset archive is going to be harder, because some directories have large media files in them. Like, um... the audio. The goal is to eliminate the use of rsync snapshots for backups (for reasons I will probably go into in more detail in a later post).

Detail in the notes, as usual.

raw notes, with links )
mdlbear: the positively imaginary half of a cubic mandelbrot set (Default)
raw notes )

A pretty good day. I even got a walk in, though I cut it a little short because I was getting some foot pain. Naturally it went away as soon as I turned around. (It's because I have shoes with three different insoles. I get arch pain sometimes when readjusting between them. :P )

Quite a lot of puttering around the websites and associated makefiles, including finally getting HyperSpace-Express.com online. After owning it for how many years? Did I mention that I procrastinate?

Speaking of procrastination, I also got No Greater Love fully chorded out. About 2 weeks late, but in time for Tempered Glass's Orycon gig. Which is next Saturday evening. Eeek!

Lots of good links under the cut. Don Marti provided a lot of them, including a few great git links, one of which had this marvelous quote:

It is easy to shoot your foot off with git, but also easy to revert to a previous foot and merge it with your current leg.

mdlbear: the positively imaginary half of a cubic mandelbrot set (Default)
raw notes )

A very good day. I woke up to find myself still happy from having finished writing my presentation about git the night before. And it went well; it was planned as a half-hour talk, but ran to nearly an hour with a lot of good audience interaction. Go me! It helps that git is just plain cool; I expect to have the presentation up on the web soon.

The Cat and I went to the Valley Fair mall for our evening out. Parts were almost deserted; I could walk fast enough for exercise, and Colleen could keep up. Delightful.

I'm going to have to re-investigate LJ's set of pre-defined moods; I'm almost completely unfamiliar with the "better-than-OK" range.

mdlbear: the positively imaginary half of a cubic mandelbrot set (Default)

At the museum yesterday we spotted a man on a nice little folding scooter: almost certainly this one. Either folds up or comes apart; the combination of small wheels, plastic seat, and small battery means that it's probably limited to light-duty, mainly indoor use, but it looks especially convenient for travel. A bit pricy, though.

From my Mom, a link to pomegranate.com, an art publishing house.

From [livejournal.com profile] gmcdavid, this post linking to an obituary for the last surviving member of Nicolas Bourbaki. I've read a couple of their books - crystal clear even with my rather limited high school French. Sad: another Great Old One gone.

As long as I'm clearing my tabs, here's a link to an article on organizing a web site with git, on linuxworld.com.

mdlbear: (hacker glider)
Getting a static web site organized with git | LinuxWorld Community
Yes, I still end up maintaining some static web sites. I've started doing them under git revision control, just to be safe, and because "git push origin" is just as easy as rsync anyway. Here's a rough cut at a system for keeping these things organized.
Not as directly useful to me as it would be if I wasn't already syncing my entire web-related directory tree up to a large external hosting site for backup.
mdlbear: (hacker glider)

Excellent post by [livejournal.com profile] don_marti on becoming more productive by going offline. Git (distributed version control, basically syncing on steroids), ikiwiki (offline-rendered wiki), blosxom (offline-rendered blog), and more. It's related to a lot of what I've been saying about keeping control of your own data. In essence, what you want to do is to separate writing from publishing.

mdlbear: (hacker glider)

...and other geeky things.

Since I've started to use a version control system called git for my recording projects, and since the subject has come up in the comments to my last post, I thought I'd dive a little deeper into git and why I'm using it. (This is a good summary of git's features.)

non-geeks may want to skip this. )

It's getting lateish, so I'll continue this little dissertation this evening. Happy hacking!

mdlbear: (hacker glider)

I'm in the process of uploading my most recent take of "Someplace in the Net", along with enough version control information to make it possible for a collaborator (waves at [livejournal.com profile] cflute) to upload some additions. I'm almost certainly doing it wrong; possibly somebody more familiar with the git version-control system could tell me how to do what I really want to do.

technical details )

Anything to avoid doing actual work...

Most Popular Tags

Syndicate

RSS Atom

Style Credit

Page generated 2019-04-23 02:10 am
Powered by Dreamwidth Studios