mdlbear: (technonerdmonster)

It's been a while since I described the way I do backups -- in fact, the only public document I could find on the subject was written in 2006, and things have changed a great deal since then. I believe there have been a few mentions in Dreamwidth and elsewhere, but in this calamitous year it seems prudent to do it again. Especially since I'm starting to feel mortal, and starting to think that some day one of my kids is going to have to grovel through the whole mess and try to make sense of it. (Whether they'll find anything worth keeping or even worth the trouble of looking is, of course, an open question.)

My home file server, a small Linux box called Nova, is backed up by simply copying (almost -- see below) its entire disk to an external hard drive every night. (It's done using rsync, which is efficient because it skips over everything that hasn't been changed since the last copy.) When the disk crashes (it's almost always the internal disk, because the external mirror is idle most of the time) I can (and have, several times) swap in the external drive, make it bootable, order a new drive for the mirror, and I'm done. Or, more likely, buy a new pair of drives that are twice as big for half the price, copy everything, and archive the better of the old drives. Update it occasionally.
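For the record, the nightly mirror boils down to rsync doing all the work -- something like this sketch (the paths are illustrative, not the actual script):

    # Minimal sketch of the nightly mirror; paths are illustrative.
    # -a preserves permissions, ownership, and timestamps; -x keeps rsync
    # on one filesystem; --delete makes the destination an exact mirror by
    # removing anything that no longer exists on the source.
    rsync -ax --delete / /mnt/backup/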

That's not very interesting, but it's not the whole story. I used to make incremental backups -- instead of the mirror drive being an exact copy of the main one, it was a sequence of snapshots (like Apple's Time Machine, for example). There were some problems with that, including the fact that, because of the way the snapshots were made (using cp -l to copy directories while leaving hard links to the files that hadn't changed), it took more space than it needed to, and made the backup disk very difficult -- not to mention slow -- to copy if it started flaking out. There are ways of getting around those problems now, but I don't need them.
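For the curious, the hard-link-snapshot technique looks roughly like this -- a sketch of the general idea, not my old script; the directory names are made up:

    # Sketch of hard-link snapshots (the general technique, not my script;
    # directory names are made up).  cp -al clones the previous snapshot
    # using hard links, so unchanged files cost no extra space; rsync then
    # replaces only the files that actually changed, breaking their links.
    today=$(date +%Y-%m-%d)
    cp -al /backup/latest /backup/$today
    rsync -a --delete /home/ /backup/$today/
    ln -sfn /backup/$today /backup/latest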

The classic solution is to keep copies offsite. But I can do better than that because I already have a web host, and I have Git. I need to back up a little.

I noticed that almost everything I was backing up fell into one of three categories:

  1. Files I keep under version control.
  2. Files (mostly large ones, like audio recordings) that never change after they've been created -- recordings of past concerts, my collection of ripped CDs, the masters for my CD, and so on. I accumulate more of them as time goes by, but most of the old ones stick around.
  3. Files I can reconstruct, or that are purely ephemeral -- my browser cache, build products like PDFs, executable code, downloaded install CDs, and of course the entire OS, which I can re-install any time I need to in under an hour.

Git's biggest advantage for both version control and backups is that it's distributed -- each working directory has its own repository, and you can have shared repositories as well. In effect, every repository is a backup. In my case the shared repositories are in the cloud on Dreamhost, my web host. There are working trees on Nova (the file server) and on one or more laptops. A few of the more interesting ones have public copies on GitLab and/or GitHub as well. So that takes care of Group 1.
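Setting up one of those shared repositories is nothing exotic -- roughly this, with the host name, paths, and repo name all placeholders:

    # Rough sketch: create a bare repo on the web host and use it as the
    # shared remote.  Host name, paths, and repo name are placeholders.
    ssh example.dreamhost.com 'git init --bare ~/git/songbook.git'
    git remote add origin example.dreamhost.com:git/songbook.git
    git push -u origin master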

The main reason for using incremental backup or version control is so that you can go back to earlier versions of something if it gets messed up. But the files in Group 2 don't change; they just accumulate. So I put all of the files in Group 2 -- the big ones -- into the same directory tree as the Git working trees; the only difference is that they don't have an associated Git repo. I keep thinking I should set up git-annex to manage them, but it doesn't seem necessary. The workflow is very similar to the Git workflow: add something (typically on a laptop), then push it to a shared server. The rsync commands are in a Makefile, so I don't have to remember them: I just type make rsync. (Rsync doesn't copy anything that is already at the destination and hasn't changed since the previous run, and by default it ignores files on the destination that don't have corresponding source files. So I don't have to have a complete copy of my concert recordings (for example) on my laptop, just the one I just made.)
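The Makefile target doesn't amount to much; a sketch of the idea, with the directory names and host made up (note the absence of --delete, so files that exist only on the server are left alone):

    # Sketch of a "make rsync" target for the big, write-once files.
    # Directory names and host are made up for illustration.
    rsync:
    	rsync -av Concerts Tracks example.dreamhost.com:Archive/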

That leaves Group 3 -- the files that don't have to be backed up because they can be reconstructed from version-controlled sources. All of my working trees include a Makefile -- in most cases it's a link to MakeStuff/Makefile -- that builds and installs whatever that tree needs. Programs, web pages, songbooks, what have you. Initial setup of a new machine is done by a package called Honu (Hawaiian for the green sea turtle), which I described a little over a year ago in Sable and the turtles: laptop configuration made easy.

The end result is that "backups" are basically a side effect of the way I normally work, with frequent small commits that are pushed almost immediately to a shared repo on Dreamhost. The workflow for large files, especially recording projects, is similar: I work on my laptop and back up with rsync to the file server as I go along. When things are ready, they go up to the web host. The make targets push and rsync simplify the process. Going in the opposite direction, the pull-all command updates everything from the shared repos.
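(pull-all is essentially just a loop over the working trees; something like this sketch, where the ~/vv path is a stand-in for wherever the trees actually live:)

    # Sketch of a pull-all style update: visit each working tree and pull
    # from its shared repo.  The ~/vv path is a stand-in.
    for d in ~/vv/*/.git; do
        (cd "${d%/.git}" && git pull --ff-only)
    done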

Your mileage may vary.

Resources and references

Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com).
Donation buttons in profile.

mdlbear: blue fractal bear with text "since 2002" (Default)

Following up on mdlbear | Welcome, tumblr refugees: this might otherwise have just been a longish section of next Sunday's "done" post, but the Tumblr apocalypse (tumbling-down?) is happening now and I wanted to get tumblr_backup.py out there. (It's a tumblr backup script, via this tumblr post by greywash, who notes that the original post by Greymask has disappeared). I think some of my readers will find it useful.

It's also worth noting greywash | State of the Migration: On fannish archival catastrophes, and what happens next (by way of ysabetwordsmith; I saw this someplace else last week, but apparently didn't log it.)

More meta stuff:

mdlbear: blue fractal bear with text "since 2002" (Default)

Tried to log in on my file server last week and found out that the hard drive was dead. Finally went to Fry's yesterday, and bought a couple of Western Digital Red (NAS) 2TB drives. Designed for continuous duty, which would be a good thing. Disassembled the lock on the docking bay I had the backup drive in (and promptly found the key, lurking in what had been my nightstand).

Confirmed that the backup works and the old main drive doesn't, and installed the latest Debian. Which only took about an hour. It boots fast as a bat, and ships with a driver for the Realtek ethernet controller on my motherboard. So I can free up the PCI slot for something more useful, like maybe an ESATA/USB-3 card, if I can find one.

Now begins the tedious process of restoring (done, as of this evening) and reconfiguring. Which will take time because I want to make some long-overdue changes in the config.

It looks like the last time a backup was made was June 25th. I don't *think* I did much, if anything, since then except maybe add a couple of passwords to the keychain. And of course I've lost a lot of email. If you sent anything to steve at thestarport.org in the last couple of months, I haven't seen it. (It is now forwarded to my gmail account, along with steve at savitzky.net which I've been doing pretty well at keeping up with.)

It's possible that some of the transient stuff can be rescued from the old drive -- it seems to run ok for a few minutes before suddenly going offline. Not entirely clear that it's worth bothering with.

Apart from that... Colleen has been getting physical therapy three times/week, and is now able to stand up and transfer into her power chair. Progress. Her caregiver is an excellent cook -- Thai, Chinese, and Japanese, with an emphasis on lean and low sodium. Yum!

Links in the notes, as usual. One, found by a coworker after I'd mentioned something to that effect, is one of my favorite stats: iPad 2 as fast as Cray 2 supercomputer. I also dropped a donation on YsabetWordsmith's poem, "Part of Who I Am". Some great links there, too.


Backing up

2007-11-17 12:11 pm
mdlbear: (hacker glider)

So I finally decided to get serious about off-site backups: i.e., stop planning and start doing. This was assisted by the fact that work finally got around to installing a second T1 line yesterday -- my upstream bandwidth at home is barely sufficient to keep up with incremental backups; it would be hopeless for uploading the roughly 80GB already on the fileserver and needing to be backed up. (There's a lot that doesn't need to be backed up, fortunately.)

Sometime last Friday I dragged home a bare 500GB drive that was sitting around at work (originally intended for an outside-the-firewall server that never quite got off the ground), stuck it into a USB/eSATA enclosure, and loaded it up. Yesterday I mounted it on my desktop machine, and started uploading to my server at Dreamhost last night. Got about 250KB/s, which works out to about 890MB/h.

I'm doing it in pieces, of course: the web master directories last night, then my working directories today -- which amount to about 10GB, excluding the Audacity projects. Those are another 60GB -- I'll do those a little bit at a time, at night, with bandwidth limiting.
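Rsync's --bwlimit option makes the trickle-upload straightforward; a sketch, with the host, path, and the 100 KB/s cap all made up:

    # Bandwidth-limited nightly upload of the big project directories.
    # Host, paths, and the 100 KB/s cap are made up for illustration;
    # --partial keeps partially-transferred files for the next run.
    rsync -av --partial --bwlimit=100 \
          /mm/audacity/ example.dreamhost.com:backups/audacity/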

At that point, the only thing left will be the /home partition -- I can't do that until I have my planned encryption scheme in place. (Although in the interim I can fake it with an encrypted tar file.)
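The interim trick is a one-liner: tar the partition and pipe it through gpg before it ever leaves the machine. A sketch, with the key ID and output path as placeholders:

    # Interim scheme: an encrypted tar file instead of per-file encryption.
    # Recipient key and output path are placeholders.
    tar -czf - /home | gpg --encrypt --recipient backup@example.org \
        > /mnt/stage/home-$(date +%Y%m%d).tar.gz.gpg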


Hopefully I'll have everything uploaded by the end of the year, which would be nice.

mdlbear: (hacker glider)
flyback - Google Code
Apple's Time Machine is a great feature in their OS, and Linux has almost all of the required technology already built in to recreate it. This is a simple GUI to make it easy to use.
(from this post on slashdot.)

I just upgraded my work laptop to Leopard yesterday, and fired up Time Machine because, well, automatic incremental backups are a Good Thing. I was intrigued to find, though, that it's not really doing anything special: behind that pretty interface is a directory tree with pathnames like nodename/yyyy-mm-dd-hhmmss. Whee! It keeps hourly backups for 24 hours, daily backups for a month, and weekly backups until you run out of space on your backup disk, at which point it presumably throws up its hands and begs for more storage.

Apart from the naming conventions and intervals, that's pretty close to what I've been doing with rsync for the last couple of years on Linux. What took them so long?

(eta: Other, similar packages for Win$ and Linux include BackupPC and Dirvish. What are you using?)
mdlbear: (hacker glider)

Did backups this morning using the new SATA backup drive and new scripts. Fast as a bat: 10 minutes for 273GB of data.

I still haven't done the rest of the associated reorganization; I just wanted to get a snapshot of the current state.


Backups

2007-06-17 12:29 pm
mdlbear: (hacker glider)

Set up a massive file transfer to my shiny new backup drive last night and went to bed; I was rather disturbed to come into the office this morning and find an I/O error on the screen, and the OS unable to find the drive. Gleep!

I took the drive out of the USB enclosure, powered down, put it in Trantor's case, powered up, and was greatly relieved to find the drive up and running. A thorough fsck and a fresh rsync confirmed that all data was present and accounted for. I'm guessing it may have been a glitch somewhere in the external box's USB interface or the cable. Not going to worry about it much. (eta: power-cycling the drive enclosure didn't work; I didn't try rebooting or power-cycling the computer with the drive still external; that would probably have worked, I was just impatient.)

I'm still trying to resist the temptation to do more work on reorganizing my directory tree and setting up the offsite backups.

mdlbear: (hacker glider)

Meanwhile, my disk test is finally on its final write pass. At roughly 3.5 hours per pass, I'm guessing sometime between 2 and 3am. (ETA: 03:56:13, as it turns out) I'm really enjoying having an OS that's stable enough that you can run a two-day I/O-bound process without having to worry about anything more likely than a possible power failure. (Not entirely unlikely, though -- we've had two at work so far this season. There's a reason why my machines are on APC UPSs.)
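(The post doesn't say which tool the test was; badblocks' write-mode test is the usual suspect, so take this as an assumption rather than the actual command:)

    # A destructive, multi-pass write-mode surface test of the kind
    # described above.  badblocks here is an assumption -- the post doesn't
    # name the tool.  -w writes and verifies four patterns, -s shows
    # progress, -v is verbose.  This erases everything on the drive.
    badblocks -wsv /dev/sdb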

mdlbear: (hacker glider)

... part mumble. Just before leaving for six days of Westercon, I very sensibly put my backup drive in another room and made a second set of backups (just of the important stuff: /home, /local/starport, and /mm/record) on yet another drive that I had lying around. Came back, retrieved the backup drive, and did a full backup. The system hung when I tried to unmount it.

Oops.

Taking this as a Bad Sign, I did an fsck after the reboot, and sure enough the disk was fscked up, though not too badly. From the dates on the inodes in lost+found, I'd say I had a couple of corrupted directories due to a crash back in 2005. Redid the current backups, and all is well for the moment. But I was very glad of the spare backup disk -- things could have been much worse.

But a corrupted directory can potentially cause an arbitrary amount of data to go kablooie, or at least become very hard to recover. My current nefarious plan to back up remotely using encrypted blobs has a similar problem unless there's enough redundancy in the system to ensure that I never lose all the copies of any one blob. (It's still somewhat safer because blobs -- even directory blobs -- are immutable and so never have to be rewritten. Hmm: log-structured blob store?)

mdlbear: (hacker glider)

This post by Mark Pilgrim, along with a (unfortunately friends-locked) post by my nephew [livejournal.com profile] asavitzk, got me thinking about backups again. I'm doing OK, but I can do better.


The current setup, with "hot" daily mirroring and "cold" weekly backups and monthly archives, works pretty well. It isn't disaster-proof, though.
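In cron terms the hot/cold split is nothing fancier than a few entries like these (script names and times are illustrative, not the actual setup):

    # The hot/cold schedule as crontab entries; script names and times are
    # illustrative, not the actual setup.
    10 3 * * *   /root/bin/backup-hot      # nightly mirror to the live backup drive
    30 3 * * 0   /root/bin/backup-cold     # weekly backup to the offline drive
    45 3 1 * *   /root/bin/backup-archive  # monthly archive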


A lot of what I work on is public, or at least semi-public: websites, recorded songs, and the like. That gets offsite "backups" automatically, but it needs a little more work.


Update: a slightly modified version of this post can be found on my website under the title Keeping Backups

mdlbear: (hacker glider)

My new backup script seems to be working -- about 8 minutes to mirror a day's changes in over 100GB of files. Still needs to be parametrized, then I'll write it up and put it up on my website. And I really need to move the Debian mirror to a larger disk on the gateway; it's occupying 70GB that I'm going to be needing soon.

mdlbear: blue fractal bear with text "since 2002" (Default)

Finally got my daily backup script up and installed on the fileserver. Basically all it does is mount the backup drive and all of its partitions (which are identical to the ones on the main drive), mirror each partition with rsync, and unmount the backup drive.

It still needs to be parametrized better -- right now it's specialized for that particular set of partitions.
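In outline it's just this (a sketch of the shape; the device names and mount points are stand-ins for the real, hard-coded ones):

    #!/bin/sh
    # Outline of the daily backup script: mount the backup drive's
    # partitions (laid out like the main drive's), mirror each one with
    # rsync, and unmount.  Device names and mount points are stand-ins.
    set -e
    mount /dev/sdb1 /bak
    mount /dev/sdb5 /bak/home
    mount /dev/sdb6 /bak/local
    rsync -ax --delete --exclude=/bak --exclude=/home --exclude=/local / /bak/
    rsync -ax --delete /home/  /bak/home/
    rsync -ax --delete /local/ /bak/local/
    umount /bak/local /bak/home /bak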

mdlbear: (hacker glider)

I've been rearranging directories in my public website, and corresponding directories on the fileserver. The most recent operation was to move theStarport.com/people/steve/Doc/ to theStarport.com/Steve_Savitzky/. Everything went well; I did the move, made the corresponding move in the CVS repository (it's done using a one-line find command), fixed up the Makefiles (two similar one-liners), and then went to move the latest backup directory so that the next rsync wouldn't have to copy all the sound files and other bulky stuff.

That's when I noticed that /bak/usr/local/starport was a symlink. I'd installed a new, large disk on the fileserver a little over a month ago, and moved /usr/local into a separate partition called /local. I then made a new /usr/local just for the fileserver. I was backing up the new directory, which was in the same old place, but not the new partition. Oops.

No real harm done -- there haven't been many changes since late March when I installed the new disk. Except for the major changes I made this weekend, and I've backed all that up now.
