OpenWRT, rsync, and linux love

I use an rsync / hard link backup system of my own design (but similar in concept to this).  I have it providing 180 days of backups for numerous production machines spread around the internet, along with more permanent external backups provided by spideroak (referral link, but we both get free stuff if you sign up).  My internal backup system serves as my hot backups, so I want it available 24×7 via a remote file mount (sshfs) should I need it.
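For anyone who has not met the rsync / hard link technique, here is a minimal Ruby sketch of one common way to do it. It assumes rsync's --link-dest option is what creates the hard links, which is an assumption about the general approach and not the actual script used here. The snapshot_command name, the host, and the paths are all hypothetical; only the "back-" timestamp naming comes from this post.

```ruby
# Sketch: build the rsync command line for one hard-linked snapshot.
# Assumption: snapshots rely on rsync's --link-dest option. The host, the
# paths, and the snapshot_command name are made up for illustration.

# Returns the argument list for one snapshot run.
#   source   - what to back up, e.g. "user@host:/var/www/"
#   root     - local directory that holds all snapshot directories
#   previous - name of the most recent snapshot directory, or nil on the first run
#   now      - timestamp used to name the new snapshot
def snapshot_command(source, root, previous, now = Time.now)
  target = File.join(root, now.strftime('back-%Y-%m-%d-%H:%M:%S'))
  cmd = ['rsync', '-a']
  # Files unchanged since the previous snapshot become hard links to it, so
  # every snapshot looks complete but only changed files use new disk space.
  cmd << "--link-dest=#{File.join(root, previous)}" if previous
  cmd + [source, target]
end

example = snapshot_command('user@example.org:/var/www/', '/mnt/backup',
                           'back-2012-01-26-04:44:05',
                           Time.local(2012, 1, 27, 4, 44, 5))
puts example.join(' ')
```

Passing the array to system(*cmd) would run it without going through a shell, which avoids quoting trouble with the colons in the directory names.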

The machine I was running it on, though, was WAY overpowered and idled at around 70 watts – this was having a noticeable effect on our electricity bill.  So I put openwrt on my asus wl500gp v2 and have that now doing my backups. It’s silent, fanless, and combined with a good external USB drive has as much storage as you can afford. It also idles at 5 watts total, device and USB drive combined!

Notes:

  • The external drive is formatted for small files, with small inodes and a higher inode / block ratio. I formatted it from a full linux machine using the “-T small” usage type (for example “mkfs.ext3 -T small /dev/partition”; note the capital -T, since lowercase -t selects the filesystem type) – these options are a better fit for the usage pattern of an rsync / hard link backup system.
  • I installed the openwrt image with the 2.4 kernel because it seems it has better hardware support for this device – it works great.
  • I disabled the wlan and lan, leaving only the wan enabled with a static IP. I port forwarded an external port from my verizon router to allow ssh access from anywhere.
  • I had to install openssh via opkg because the dropbear ssh client doesn’t support outgoing key auth, or if it does it doesn’t support openssh-style keys.
  • I switched the default shell from ash to bash – just too many minor differences for me.
  • rsync is available via opkg. Install it.
  • cron is provided by busybox and has some minor differences in crontab syntax; I could not get @reboot jobs working.
  • USB storage is fairly easy to set up. I found, however, that the external device partitions were recognized at “/dev/discs/disc0/part1” instead of the more traditional “/dev/sda1” locations. No biggie, just odd.  You should read and implement the “start on boot” section.
  • You can see syslog output via the command “logread”
  • I needed to slow down the automount process via a “sleep” command to allow the drive to spin up before mounting. Details here. Once I put in that delay, automount worked great.
  • “find” provided by busybox is way limited compared to gnu find, and I can’t seem to locate gnu find in the otherwise complete openwrt repos. Busybox find can’t search based on modification times or link counts – both key to how I implemented my backup system. I reap backup directories via their modification times to expire old backups. I installed ruby and ruby-core (which contains the ruby stdlib that provides file / directory classes) and wrote my own little timed reaper. Source is below.
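Both of the tests that busybox find lacks are available through Ruby's File::Stat. The following is a minimal sketch, assuming the goal is something roughly like GNU find's link-count and modification-time tests. The helper names files_with_link_count and older_than are hypothetical, and the rounding rules of GNU find's -mtime are not reproduced.

```ruby
require 'find'

# Regular files under `dir` whose hard link count is exactly `count`.
# In a hard link backup tree, a count of 1 means the file exists in only
# one snapshot.
def files_with_link_count(dir, count)
  matches = []
  Find.find(dir) do |path|
    stat = File.lstat(path) # lstat: look at the entry itself, not a symlink target
    matches << path if stat.file? && stat.nlink == count
  end
  matches.sort
end

# Every entry under `dir` (files and directories, including `dir` itself)
# last modified more than `days` days before `now`.
def older_than(dir, days, now = Time.now)
  cutoff = now - (days * 24 * 60 * 60)
  matches = []
  Find.find(dir) do |path|
    matches << path if File.lstat(path).mtime < cutoff
  end
  matches.sort
end
```

For example, files_with_link_count('/mnt/backup', 1) would list the files that are not shared with any other snapshot (the path is a placeholder).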

It’s working great so far – quiet, low-power and fast enough for me.

Stupidly simple timed directory reaper written in ruby

# expire_directories.rb. My backup directory names all look like "back-2012-01-27-04:44:05", 
# hence the regex along with the date check.
require 'find'
require 'fileutils'

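# Refuse to run without a positive day count: nil.to_i is 0, and a cutoff of
# "now" would expire every backup directory.
days = ARGV[0].to_i
abort "usage: ruby expire_directories.rb DAYS" unless days > 0

cutoff = Time.now - (60 * 60 * 24 * days)

Find.find('./') do |path|
  next unless FileTest.directory?(path)
  # Only "./" and its immediate children contain exactly one slash; don't descend further.
  Find.prune unless path.scan('/').size == 1
  if path.match(/back/) && File.stat(path).mtime < cutoff
    puts "Removing: #{path}"
    FileUtils.rm_rf(path)
    # The directory is gone, so there is nothing left to descend into.
    Find.prune
  end
end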

invoked thusly:

cd /some/directory/that/contains/your/backup_directories && ruby ~/bin/expire_directories.rb 60

so pass it the number of days. Be sure you're in the proper directory before running this: it's doing an "rm -rf".
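Because the reaper deletes for real, it can help to preview what it would remove before letting it loose. Here is a minimal dry-run sketch, assuming the same rule as the script above (immediate child directories whose name contains "back", judged by mtime). The expired_backups name is hypothetical, and nothing is deleted.

```ruby
# Dry run: return the names of backup directories directly inside `dir`
# that were last modified more than `days` days before `now`.
# Nothing is removed; this only reports what the reaper would pick.
def expired_backups(dir, days, now = Time.now)
  cutoff = now - (60 * 60 * 24 * days)
  names = Dir.entries(dir) - ['.', '..']
  names.sort.select do |name|
    path = File.join(dir, name)
    name =~ /back/ && File.directory?(path) && File.stat(path).mtime < cutoff
  end
end
```

Calling puts expired_backups('.', 60) from the backup directory prints the candidates for a 60-day window, one per line.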

Cool stuff you can do with ssh and fuse

fuse over ssh rocks, as we all know. It allows you to mount remote filesystems from anywhere you can reach with SCP or SSH. But wait – there’s more!

Run commands on the filesystems of hosts you don’t control

I needed to use rsync on a host I don’t control (godaddy, in this case). So I used fuse to remotely mount the godaddy filesystem and then used rsync to do a local copy.

sshfs -C godaddyuser@godaddyhostname.org:/var/chroot/home/content/38/382342342/html/ ~/godaddy/
rsync -auvz --delete-excluded ~/godaddy/ ~/godaddy-copy/

I also created a git repo on that remote godaddy fuse mount – I feel naked without source control.

cd ~/godaddy/ && git init

just like working directly on the machine – except slower because of the network overhead.

How to extract uniq IPs from apache via grep, cut, and uniq

Say you’d like to find the IP addresses on the lines of your apache access.log (or any log file with a similar format, really) that contain “Googlebot”:

grep 'Googlebot' access.log | cut -d' ' -f1 | sort | uniq

which finds the lines via grep, uses cut to extract the first field (space delimited), sorts the IP addresses and then uniqifies them.

Dirt simple, stupidly powerful.
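If the result needs further processing inside a script, the same steps can be sketched in Ruby. This is a minimal sketch, assuming the usual access log layout where the client IP is the first space-separated field. The unique_ips name and the sample log lines are made up for illustration.

```ruby
# Return the sorted, de-duplicated first fields of the lines containing `needle`.
def unique_ips(lines, needle)
  lines.select { |line| line.include?(needle) } # grep 'Googlebot'
       .map { |line| line.split(' ', 2).first } # cut -d' ' -f1
       .uniq                                    # uniq
       .sort                                    # sort
end

# Made-up sample lines in the common access log layout.
sample = [
  '66.249.66.1 - - [27/Jan/2012:04:44:05 -0500] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
  '10.0.0.7 - - [27/Jan/2012:04:45:00 -0500] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
  '66.249.66.1 - - [27/Jan/2012:04:46:10 -0500] "GET /a HTTP/1.1" 200 128 "-" "Googlebot/2.1"',
  '66.249.65.9 - - [27/Jan/2012:04:47:30 -0500] "GET /b HTTP/1.1" 200 256 "-" "Googlebot/2.1"'
]
puts unique_ips(sample, 'Googlebot')
```

For a real log, unique_ips(File.foreach('access.log'), 'Googlebot') reads the file line by line instead of using the sample array.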