<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dw="https://www.dreamwidth.org">
  <id>tag:dreamwidth.org,2010-04-27:505737</id>
  <title>The Mandelbear's Musings</title>
  <subtitle>mdlbear</subtitle>
  <author>
    <name>mdlbear</name>
  </author>
  <link rel="alternate" type="text/html" href="https://mdlbear.dreamwidth.org/"/>
  <link rel="self" type="text/xml" href="https://mdlbear.dreamwidth.org/data/atom"/>
  <updated>2019-02-22T20:22:17Z</updated>
  <dw:journal username="mdlbear" type="personal"/>
  <entry>
    <id>tag:dreamwidth.org,2010-04-27:505737:1662041</id>
    <link rel="alternate" type="text/html" href="https://mdlbear.dreamwidth.org/1662041.html"/>
    <link rel="self" type="text/xml" href="https://mdlbear.dreamwidth.org/data/atom/?itemid=1662041"/>
    <title>Git: The other blockchain</title>
    <published>2019-02-22T06:25:57Z</published>
    <updated>2019-02-22T20:22:17Z</updated>
    <category term="hashing"/>
    <category term="curmudgeon"/>
    <category term="blockchain"/>
    <category term="git"/>
    <category term="software"/>
    <dw:mood>didactic</dw:mood>
    <dw:security>public</dw:security>
    <dw:reply-count>2</dw:reply-count>
    <content type="html">&lt;h3&gt;Part 1: Blockchain&lt;/h3&gt;
&lt;p&gt; Blockchain is the technology behind Bitcoin and other cybercurrencies.
    That's about all anyone outside the software industry knows about it; that
    and the fact that lots of people are claiming that it's going to transform
    everything.  (The financial industry, the Web, manufacturing supply chains,
    identity, the music industry, ... the list goes on.)  If you happen to be
    &lt;em&gt;in&lt;/em&gt; the software industry and have a moderately good idea of what
    blockchain is, how it works, and what it can &lt;em&gt;and can't&lt;/em&gt; do, you
    may want to skip to &lt;a href="#part-2"&gt;Part 2&lt;/a&gt;.

&lt;p&gt; Still with me?  Here's the fifty-cent summary of blockchain.  Blockchain
    is a distributed, immutable ledger.  Buzzword is a buzzword buzzword
    buzzword?  Blockchain is a chain of blocks?  That's closer.

&lt;p&gt; The purpose of a blockchain is to keep track of financial transactions
    (that's the "ledger" part) and other data by making them public (that's
    half of the "distributed" part) and keeping them in blocks of data (that's
    the "block" part) that can't be changed (that's the "immutable" part, and
    it's a really good property for a ledger to have) and that are linked
    together by hashes (that's the "chain" part, and we'll get to what hashes
    are in a moment).  The integrity of that chain is guaranteed by a large
    group of people (that's the other half of the "distributed" part) called
    "miners" (WTF?).

&lt;p&gt; Let's start in the middle:  how can we link blocks of data together so
    that they can't be changed?  First, let's make it so that any change
    to a block, or to the order of those blocks, can be detected.  Then, the
    fact that everything is public makes the data impossible to change without
    that change being glaringly obvious.  We do that with hashes.

&lt;p&gt; A hash function is something that takes a block of data of any size and
    turns it into a long, fixed-length sequence of bits (which we will
    sometimes refer to as a
    "number", because any whole number can be represented by a sequence of
    binary digits, and sometimes as a "hash", because the data has been
    chopped up and mashed together like the corned beef hash you had for
    breakfast).  A good hash function has two important properties:

&lt;ol&gt;
  &lt;li&gt; It's irreversible.  Starting with a hash, it is effectively impossible to
       construct a block of data that will produce that hash.  (It is
       significantly easier to construct two blocks with the same hash, which
       is why the security-conscious world moves to larger hashes from time to
       time.) 
  &lt;li&gt; It's unpredictable.  If two blocks of data differ anywhere, even by a
       single bit, their hashes will be &lt;em&gt;completely&lt;/em&gt; different.
&lt;/li&gt;&lt;/li&gt;&lt;/ol&gt;
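
&lt;p&gt; If you'd like to see these properties for yourself, here's a small
    sketch in Python using SHA-256 from the standard &lt;code&gt;hashlib&lt;/code&gt;
    module (SHA-256 is the hash Bitcoin uses; the helper names are made up for
    this example).  The output is always 256 bits long no matter how big the
    input is, and changing a single input bit flips, on average, about half of
    the output bits.&lt;/p&gt;

&lt;pre&gt;
import hashlib

def sha256_as_int(data):
    """Return the SHA-256 hash of data as a 256-bit whole number."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def bits_changed(a, b):
    """Count how many of the 256 bits differ between the hashes of a and b."""
    return bin(sha256_as_int(a) ^ sha256_as_int(b)).count("1")

# The hash is 32 bytes (256 bits) whether the input is tiny or huge.
print(len(hashlib.sha256(b"hi").digest()), len(hashlib.sha256(b"x" * 1_000_000).digest()))  # 32 32

# "h" (0x68) and "i" (0x69) differ in exactly one bit, yet the hashes look nothing alike.
print(bits_changed(b"hello", b"iello"))
&lt;/pre&gt;

&lt;p&gt; The second number depends on the inputs you pick, but it typically lands
    somewhere near 128, which is what "completely different" looks like in
    practice.&lt;/p&gt;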

&lt;p&gt; Those two together mean that if two blocks have the same hash, you can
    safely assume they contain the same data.  If somebody sends you a block
    and its hash, you can hash the block yourself and compare the results; if
    they match, you can be certain that the block hasn't been damaged or
    tampered with before it got to you.  And if they also cryptographically
    &lt;em&gt;sign&lt;/em&gt; that hash, you can be certain that it was signed with
    their key.
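
&lt;p&gt; Checking a block you've received really is that simple.  Here's a
    minimal Python sketch with SHA-256 (the function name is hypothetical).  It
    only checks integrity; signatures need public-key cryptography, which
    Python's standard library doesn't provide.&lt;/p&gt;

&lt;pre&gt;
import hashlib
import hmac

def block_is_intact(block, claimed_hash):
    """Re-hash the block we received and compare it with the hash we were sent."""
    actual = hashlib.sha256(block).hexdigest()
    # compare_digest compares in constant time, a good habit for security checks.
    return hmac.compare_digest(actual, claimed_hash)

block = b"Alice pays Bob 5 coins"
sent_hash = hashlib.sha256(block).hexdigest()
print(block_is_intact(block, sent_hash))                        # True
print(block_is_intact(b"Alice pays Bob 500 coins", sent_hash))  # False
&lt;/pre&gt;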

&lt;p&gt; Now let's guarantee the integrity of the &lt;em&gt;sequence&lt;/em&gt; of blocks by
    chaining them together.  Every block in the chain contains the hash of the
    previous block.  If block B follows block A in the chain, B's hash depends
    in part on the hash of block A.  If a villain tries to insert a forged
    transaction into block A, its hash won't match the one in block B.
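
&lt;p&gt; Here's a toy hash chain in Python, just to make the idea concrete.  It's
    a sketch, not how any real blockchain lays out its blocks:  each block here
    is only a list of transaction strings plus the hash of the block before it,
    and the function names are invented for the example.&lt;/p&gt;

&lt;pre&gt;
import hashlib

def block_hash(prev_hash, transactions):
    """A block's hash covers its predecessor's hash as well as its own contents."""
    data = prev_hash + "\n" + "\n".join(transactions)
    return hashlib.sha256(data.encode()).hexdigest()

def build_chain(blocks):
    """Turn a list of transaction lists into linked blocks."""
    chain = []
    prev = "0" * 64  # the first block has no predecessor, so use a placeholder
    for transactions in blocks:
        chain.append({"prev": prev, "txs": transactions})
        prev = block_hash(prev, transactions)
    return chain

def first_broken_link(chain):
    """Return the index of the first block whose stored link no longer matches, or None."""
    for i in range(1, len(chain)):
        before = chain[i - 1]
        if chain[i]["prev"] != block_hash(before["prev"], before["txs"]):
            return i
    return None

chain = build_chain([["Alice pays Bob 5"], ["Bob pays Carol 2"], ["Carol pays Dan 1"]])
print(first_broken_link(chain))           # None
chain[0]["txs"] = ["Alice pays Bob 500"]  # a villain forges a transaction in block A
print(first_broken_link(chain))           # 1: block B's link gives it away
&lt;/pre&gt;

&lt;p&gt; A villain who also rewrote every later block's stored hash would get
    past this particular check, which is why it matters that copies of the
    chain are public.&lt;/p&gt;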

&lt;p&gt; Now we get to the part that makes blockchain interesting:  getting
    everyone to agree on which transactions go into the next block.  This is
    done by &lt;em&gt;publishing&lt;/em&gt; transactions where all of the miners can see
    them.  The miners then get to work with &lt;del&gt;shovels and pickaxes&lt;/del&gt;
    &lt;ins&gt;big fast computers&lt;/ins&gt;, validating the transactions, putting them into
    a block, and then running a contest to see which of them gets to add their
    block to the chain and collect the associated reward.  Winning the contest
    requires doing a &lt;em&gt;lot&lt;/em&gt; of computation.  It's been estimated that
    miners' computers collectively consume roughly the same amount of
    electricity as Ireland.
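
&lt;p&gt; In Bitcoin, for example, the contest is a &lt;em&gt;proof of work&lt;/em&gt;:
    miners keep changing a number in the block, called a nonce, until the
    block's hash comes out below a target value.  The Python sketch below is a
    simplified version under made-up rules:  "below the target" becomes "starts
    with a given number of zero hex digits", the block is just a byte string,
    and the function names are invented.  Notice the asymmetry:  finding a
    winning nonce takes many tries, but checking one takes a single hash.&lt;/p&gt;

&lt;pre&gt;
import hashlib
from itertools import count

def block_digest(block_data, nonce):
    """Hash the block contents together with a candidate nonce."""
    return hashlib.sha256(block_data + str(nonce).encode()).hexdigest()

def mine(block_data, difficulty):
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zero hex digits.
    Each extra digit makes the search about 16 times longer, on average."""
    for nonce in count():
        digest = block_digest(block_data, nonce)
        if digest.startswith("0" * difficulty):
            return nonce, digest

def check(block_data, nonce, difficulty):
    """Anyone can verify the winner's claim with a single hash."""
    return block_digest(block_data, nonce).startswith("0" * difficulty)

block = b"previous-hash|Alice pays Bob 5|"
nonce, digest = mine(block, 4)
print(nonce, digest)
print(check(block, nonce, 4))  # True
&lt;/pre&gt;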

&lt;p&gt; There's more to it, but that's blockchain in a nutshell.  I am
    &lt;em&gt;not&lt;/em&gt; going to say anything about what blockchain might be good for
    besides keeping track of virtual money -- that's a whole other rabbit hole
    that I'll save for another time.  For now, the important thing is that
    blockchain is a system for keeping track of financial transactions by
    using a chain of blocks connected by hashes.

&lt;p&gt; The need for miners to do work is what makes the virtual money they're mining
    valuable, and makes it possible for everyone to agree on who owns how much
    of it without anyone having to trust anyone else.  It's all that work that
    makes it possible to detect cheating.  It also makes it expensive and
    slow.  The Ethereum blockchain can handle about ten transactions per
    second.  Visa handles about 10,000.


&lt;h3 id="part-2"&gt;Part 2: The &lt;em&gt;other&lt;/em&gt; blockchain&lt;/h3&gt;

&lt;p&gt; Meanwhile, in another part of cyberspace, software developers are using
    another system based on hash chains to keep track of their software -- a
    distributed version control system called &lt;code&gt;git&lt;/code&gt;.  It's almost
    completely different, except for the way it uses hashes.  How different?
    Well, for starters it's both free and fast, and you can use it at home.
    And it has nothing to do with money -- it's a version control system.

&lt;blockquote&gt;
&lt;p&gt; If you've been with me for a while, you've probably figured out that I'm
    extremely fond of git.  This post is &lt;em&gt;not&lt;/em&gt; an introduction to git
    for non-programmers -- I'm working on that.  However, if you've managed to
    get this far, it contains enough information to stand on its own.
&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt; Git doesn't use transactions and blocks; instead it uses "objects", but
    just like blocks each object is identified by its hash.  Instead of
    keeping track of virtual money, it keeps track of files and their
    histories.  And just as blockchain keeps a complete history of everyone's
    coins, git records the complete history of everyone's data.

&lt;p&gt; Git uses several types of object, but the most fundamental one is called a
    "blob", and consists of the contents of a file, preceded by the word
    "blob" and the file's size.  For example, here's how git identifies one
    of my Songs for Saturday posts:

&lt;pre&gt;git hash-object 2019/01/05--s4s-welcome-to-acousticville.html
957259dd1e41936104f72f9a8c451df50b045c57&lt;/pre&gt;

&lt;blockquote&gt;
&lt;p&gt; Everything you do with git starts with the &lt;code&gt;git&lt;/code&gt; command.  In
    this case we're using &lt;code&gt;git&amp;nbsp;hash-object&lt;/code&gt; and giving it the
    pathname of the file we want to hash.  Hardly anyone needs to use the
    &lt;code&gt;hash-object&lt;/code&gt; subcommand; it's used mainly for testing and the
    occasional demonstration.
&lt;/p&gt;&lt;/blockquote&gt;
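
&lt;p&gt; What &lt;code&gt;hash-object&lt;/code&gt; computes is simple enough to
    reproduce:  it's SHA-1 of a short header (the word &lt;code&gt;blob&lt;/code&gt;,
    a space, the file's size in bytes written out in decimal, and a zero byte)
    followed by the file's contents.  Here's a minimal Python sketch; the
    function name is made up for this example.  It should agree with
    &lt;code&gt;git&amp;nbsp;hash-object&lt;/code&gt; in repositories that use git's
    default SHA-1 hashes, assuming no line-ending conversion or other filters
    apply to the file.&lt;/p&gt;

&lt;pre&gt;
import hashlib
import sys

def git_blob_hash(data):
    """Hash raw file bytes the way git hashes a blob: SHA-1 over the header
    "blob SIZE" and a zero byte, followed by the contents themselves."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

if __name__ == "__main__":
    # Run with one or more file names; prints one hash per file.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(git_blob_hash(f.read()), path)
&lt;/pre&gt;

&lt;p&gt; For instance, &lt;code&gt;git_blob_hash(b"")&lt;/code&gt; returns
    &lt;code&gt;e69de29bb2d1d6434b8b29ae775ad8c2e48c5391&lt;/code&gt;, the hash git
    gives every empty file.&lt;/p&gt;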

&lt;p&gt; Git handles a &lt;em&gt;directory&lt;/em&gt; (you may know directories as "folders" if
    you aren't a programmer) by combining the names, metadata, and hashes of
    all of its contents into a type of object called a "tree", and taking the
    hash of the whole thing.
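
&lt;p&gt; A tree object is hashed the same way as a blob, just with a different
    body:  one entry per file or subdirectory, each consisting of a mode
    (&lt;code&gt;100644&lt;/code&gt; for an ordinary file, &lt;code&gt;40000&lt;/code&gt;
    for a directory), a space, the name, a zero byte, and the entry's hash as
    20 raw bytes, sorted by name.  The Python sketch below handles only those
    two modes (executables and symbolic links have others), and its helper
    names are invented.  As a sanity check, an empty tree should come out as
    &lt;code&gt;4b825dc642cb6eb9a060e54bf8d69288fbee4904&lt;/code&gt;, the hash git
    gives the empty tree.&lt;/p&gt;

&lt;pre&gt;
import hashlib

def git_object_hash(kind, body):
    """SHA-1 over the header "KIND SIZE", a zero byte, and the body."""
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def git_tree_hash(entries):
    """entries maps each name to (mode, hex_hash).  Only two modes are handled:
    "100644" for an ordinary file and "40000" for a subdirectory."""
    def sort_key(name):
        # git sorts subdirectories as though their names ended in "/".
        mode = entries[name][0]
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for name in sorted(entries, key=sort_key):
        mode, hex_hash = entries[name]
        # Each entry: mode, space, name, zero byte, then the 20 raw hash bytes.
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_hash)
    return git_object_hash("tree", body)

# The same blob can appear in more than one directory; it is stored only once.
notes = git_object_hash("blob", b"test content\n")
docs = git_tree_hash({"notes.txt": ("100644", notes)})
root = git_tree_hash({"docs": ("40000", docs), "notes.txt": ("100644", notes)})
print(root)
&lt;/pre&gt;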

&lt;p&gt; Here, by the way, is another place where git really differs from blockchain.
    In a blockchain, all the effort of mining goes into making sure that every
    block points to its one guaranteed-unique correct predecessor.  In other
    words, the blocks form a chain.  Files and directories form a tree, with
    the ordinary files as the leaves, and directories as branches.  The
    directory at the top is called the root.  &lt;em&gt;Top?&lt;/em&gt; Top.  For some
    reason software trees grow from the root down.  After a while you get used
    to it.

&lt;p&gt; Actually, that's not quite accurate, because git stores each object in
    exactly one place, and it's perfectly possible for the same file (that is,
    the same blob) to be in two different directories.  This can be
    &lt;em&gt;very&lt;/em&gt; useful -- if you make a hundred copies of a file, git
    only has to store one of them.  It's also inaccurate because blockchains
    use trees too:  hash trees, called
    &lt;a href="https://en.wikipedia.org/wiki/Merkle_tree"&gt;Merkle trees&lt;/a&gt;,
    are used &lt;em&gt;inside&lt;/em&gt; each block of a blockchain.  But I digress.
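
&lt;p&gt; For the curious, here's roughly how a Merkle tree boils a block's worth
    of transactions down to a single hash, the Merkle root, which goes into the
    block's header.  This Python sketch is simplified:  it uses a single
    SHA-256 where Bitcoin uses SHA-256 twice, it ignores byte-order details,
    and the names are made up for the example.&lt;/p&gt;

&lt;pre&gt;
import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

def merkle_root(transactions):
    """Hash each transaction, then hash adjacent pairs, level by level, until
    one hash is left.  An odd one out at the end of a level is paired with
    itself (Bitcoin's rule; Bitcoin also hashes twice, which this skips)."""
    if not transactions:
        raise ValueError("need at least one transaction")
    level = [sha256(tx) for tx in transactions]
    while len(level) != 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()

txs = [b"Alice pays Bob 5", b"Bob pays Carol 2", b"Carol pays Dan 1"]
print(merkle_root(txs))
# Changing any one transaction changes the root that goes into the block.
print(merkle_root(txs) == merkle_root([b"Alice pays Bob 500"] + txs[1:]))  # False
&lt;/pre&gt;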

&lt;blockquote&gt;
&lt;p&gt; Technically the hash links in both blockchains and git form a &lt;a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph"&gt;directed
    acyclic graph&lt;/a&gt; -- that means that the links all point in one direction,
    and there aren't any loops.  In order to make a loop you'd have to predict
    the hash of some later block, and you just can't do that.  I have &lt;a href="https://computer-curmudgeon.com/Blog/2018/09/19/single-link/"&gt;another post about why this is a good thing.&lt;/a&gt;
&lt;/p&gt;&lt;/blockquote&gt;
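
&lt;p&gt; Here's a toy content-addressed store in Python that makes the no-loops
    argument concrete.  It's a sketch, not git's storage code, and the class
    and method names are invented.  Each object is named by the hash of its
    contents &lt;em&gt;including&lt;/em&gt; its links, so it can only link to objects
    that already exist; the explicit check in &lt;code&gt;put&lt;/code&gt; just makes
    that visible.&lt;/p&gt;

&lt;pre&gt;
import hashlib

class ObjectStore:
    """A toy content-addressed store: each object is filed under its own hash."""

    def __init__(self):
        self.objects = {}

    def put(self, content, links=()):
        """Store content plus links to existing objects; return the new hash."""
        links = tuple(links)
        for link in links:
            # A new object can only point at hashes that already exist; its own
            # hash isn't known until its links are fixed, so nothing older can
            # ever point forward at it.
            if link not in self.objects:
                raise KeyError("no such object: " + link)
        body = content + "\n" + "\n".join(links)
        key = hashlib.sha256(body.encode()).hexdigest()
        self.objects[key] = (content, links)
        return key

    def history(self, key):
        """Follow first links back to an object with no links, a little like
        git log --first-parent.  This always ends, because there are no loops."""
        while True:
            yield key
            _content, links = self.objects[key]
            if not links:
                return
            key = links[0]

store = ObjectStore()
a = store.put("first")
b = store.put("second", [a])
c = store.put("third", [b])
print(list(store.history(c)) == [c, b, a])  # True
&lt;/pre&gt;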

&lt;p&gt; And that brings us to the things that make git, git:  commits.  ("Commit"
    is used in the same sense, more or less, as it is in the phrase "commit
    something to memory", or "commit to a plan of action".  It has very little
    to do with crime.  Hashes are even more unique than fingerprints, and we
    all know what criminals think about fingerprints.  In cryptography, the
    hash of a key is &lt;em&gt;called&lt;/em&gt; its fingerprint.)

&lt;p&gt; Anyway, when you're done making changes in a project, you type the command

&lt;pre&gt;git commit&lt;/pre&gt;

&lt;p&gt; ... and git will make a new commit object which contains, among other
    things, the time and date, your name and email address, maybe your
    cryptographic signature, a brief description of what you did (git puts you
    into your favorite text editor so you can enter this if you didn't put it
    on the command line), the hash of the current root, and &lt;em&gt;the hash of
    the previous commit&lt;/em&gt;.  Just like a blockchain.
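
&lt;p&gt; A commit object is just a few lines of text:  a &lt;code&gt;tree&lt;/code&gt;
    line with the root's hash, one &lt;code&gt;parent&lt;/code&gt; line per previous
    commit (none for the very first commit, two or more for a merge),
    &lt;code&gt;author&lt;/code&gt; and &lt;code&gt;committer&lt;/code&gt; lines, a blank
    line, and the message; git hashes it just like a blob or a tree.  The
    Python sketch below leaves out signatures and uses a placeholder identity,
    so its hashes won't match any real repository, and the function names are
    invented.  What it does show is the chain:  the second commit's hash
    depends on the first's.&lt;/p&gt;

&lt;pre&gt;
import hashlib

def git_object_hash(kind, body):
    """SHA-1 over the header "KIND SIZE", a zero byte, and the body."""
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def commit_hash(tree, parents, author, committer, message):
    """Build and hash a commit object.  author and committer are passed
    through unchanged; message should normally end with a newline."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]  # none for a first commit
    lines += ["author " + author, "committer " + committer, "", message]
    return git_object_hash("commit", "\n".join(lines).encode())

empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Placeholder identity; real git writes a name, an email address in angle
# brackets, a Unix timestamp, and a time zone offset.
me = "Example Person 1550816757 +0000"
first = commit_hash(empty_tree, [], me, me, "First commit\n")
second = commit_hash(empty_tree, [first], me, me, "Second commit\n")
print(first, second)
&lt;/pre&gt;

&lt;p&gt; Change anything in the first commit and its hash changes, which changes
    the &lt;code&gt;parent&lt;/code&gt; line of the second, and so on down the
    line.&lt;/p&gt;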

&lt;p&gt; Unlike earlier version control systems, git never has to compare two
    files byte by byte to see whether they're the same; all it has to do is
    compare their hashes.  This is &lt;em&gt;fast&lt;/em&gt; -- git's
    hashes are only 20 bytes long, no matter how big the files are or how many
    are in a directory tree.  And if the hashes of two &lt;em&gt;trees&lt;/em&gt; are the
    same, git doesn't have to look at any of the blobs in those trees to know
    that they are all the same.
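
&lt;p&gt; Here's a Python sketch of that shortcut, in the spirit of how git
    compares two trees (it is not git's actual diff code).  Trees are
    represented as nested dictionaries invented for this example, mapping each
    name to a hash and, for a directory, its contents; whenever two hashes
    match, the sketch moves on without opening anything underneath.&lt;/p&gt;

&lt;pre&gt;
def changed_paths(old, new, prefix=""):
    """Compare two trees, each a dict of name -> (hash, children), where
    children is another such dict for a directory and None for a file.
    Entries with equal hashes are skipped without looking inside them."""
    changed = []
    for name in sorted(set(old) | set(new)):
        path = prefix + name
        if name not in old or name not in new:
            changed.append(path)  # added or deleted
            continue
        old_hash, old_children = old[name]
        new_hash, new_children = new[name]
        if old_hash == new_hash:
            continue  # same hash: same contents, no need to open it
        if old_children is not None and new_children is not None:
            changed += changed_paths(old_children, new_children, path + "/")
        else:
            changed.append(path)
    return changed

# Placeholder hashes ("t1", "h2", ...) stand in for real 40-digit ones.
old = {
    "README": ("h1", None),
    "docs": ("t1", {"guide.txt": ("h2", None)}),
    "src": ("t2", {"main.c": ("h3", None), "util.c": ("h4", None)}),
}
new = {
    "README": ("h1", None),
    "docs": ("t1", {"guide.txt": ("h2", None)}),
    "src": ("t3", {"main.c": ("h5", None), "util.c": ("h4", None)}),
}
print(changed_paths(old, new))  # ['src/main.c']
&lt;/pre&gt;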

&lt;p&gt; 


&lt;blockquote style="white-space: pre-wrap;"&gt;
  @ &lt;a href="https://hackernoon.com/blockchain-101-only-if-you-know-nothing-b883902c59f7"&gt;Blockchain 101 — only if you ‘know nothing’! – Hacker Noon&lt;/a&gt; 
  @ &lt;a href="https://medium.com/@sbmeunier/when-do-you-need-blockchain-decision-models-a5c40e7c9ba1"&gt;When do you need blockchain? Decision models. – Sebastien Meunier&lt;/a&gt;

  @ &lt;a href="https://git-scm.com/book/en/v2/Git-Internals-Git-Objects"&gt;Git - Git Objects&lt;/a&gt;
  @ &lt;a href="http://gitready.com/beginner/2009/02/17/how-git-stores-your-data.html"&gt;git ready » how git stores your data&lt;/a&gt;
  @ &lt;a href="https://en.wikibooks.org/wiki/Git/Internal_structure"&gt;Git/Internal structure - Wikibooks, open books for an open world&lt;/a&gt;
  @ &lt;a href="https://computer-curmudgeon.com/Blog/2018/09/19/single-link/"&gt;Why Singly-Linked Lists Win* | Stephen Savitzky&lt;/a&gt;
&lt;/blockquote&gt;

&lt;p class="colophon"&gt; &lt;em&gt;Another fine post from
    &lt;a href="https://mdlbear.dreamwidth.org/tag/curmudgeon"&gt;The Computer Curmudgeon&lt;/a&gt; (also at
    &lt;a href="https://computer-curmudgeon.com/"&gt;computer-curmudgeon.com&lt;/a&gt;).&lt;/em&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src="https://www.dreamwidth.org/tools/commentcount?user=mdlbear&amp;ditemid=1662041" width="30" height="12" alt="comment count unavailable" style="vertical-align: middle;"/&gt; comments</content>
  </entry>
</feed>
