I was once offered a job at a small development shop. I asked all the right questions: Were they using version control? Did they have automated testing? And so on. The answers all sounded good, so I accepted the offer.
My plan for the first day was to get their software product building and running on my laptop, then look through the version control history to see what a typical change set looked like, so I'd know a bit more about what to expect from the codebase.
They were using CVS, but after a few queries I couldn't find any commits more than a few days old. It was as if the source had been loaded into CVS a day or two before I started. When I asked why there wasn't any history in CVS, I heard the following horror story.
They needed a server for an onsite demonstration, so they decided to use the CVS server. They installed the software on it and shipped it out to a customer’s location for a few weeks. I guess they decided they could just hold off on updating CVS during that period.
I never got the full story about exactly what happened, just that the hard drives "crashed". Given the RAID setup, I doubt that hardware failure was the root cause. My guess is that someone reformatted them.
Anyway, when they got the server back, they weren't too worried, because they could always restore from their backup. The backup ran automatically every night and copied all of CVS's data files. Unfortunately, they had never tested this backup, and none of them realized that copying CVS's files while it is running doesn't give you a usable backup. All of their backups were corrupted.
In the end, they just copied the contents of a developer's laptop back into CVS and ran with that. Unfortunately, they had several very large clients with custom installations built from source code that lived in a CVS branch. That branch was gone, so there was no way to give those clients bug fixes.
Obviously I wanted to make sure that never happened again. Since their current repository had essentially no history anyway, I suggested we switch to Subversion. Everyone was fine with this, because they still thought the whole fiasco was somehow CVS's fault.
Once Subversion was installed and running, I wanted a foolproof way to back it up. Subversion has some very nice commands for dumping a repository to a file, but I wanted something that would guarantee not only that the repository had been backed up, but that the backup was good.
Here is the Subversion backup plan we eventually ended up with:
- Dump the repository to a file
- Zip the file and save it with the date/time in the filename
- Copy the zip to another server
- Delete the unzipped file
- Copy the zip file back down from the server to a temp directory
- Unzip the file
- Create a new repository
- Load the dump file into the new repository
- Check out trunk to a temporary directory
- Delete the temp directories
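The steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the Perl script mentioned at the end of this post: it assumes `svnadmin` and `svn` are on the PATH, that the "other server" is reachable as a mounted directory (`offsite_dir`, for example an NFS share), and that the repository uses a standard layout with a `trunk` directory. The names `backup_filename`, `run`, and `backup_and_verify` are hypothetical.

```python
import datetime
import os
import shutil
import subprocess
import tempfile
import zipfile


def backup_filename(repo_name, when):
    """Name the archive with the repository name and a date/time stamp."""
    return f"{repo_name}-{when:%Y%m%d-%H%M%S}.svndump.zip"


def run(cmd, **kwargs):
    """Run a command; raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True, **kwargs)


def backup_and_verify(repo_path, offsite_dir):
    """Dump, zip, copy offsite, then restore the offsite copy to prove it works."""
    repo_name = os.path.basename(os.path.normpath(repo_path))
    archive = backup_filename(repo_name, datetime.datetime.now())
    work = tempfile.mkdtemp(prefix="svnbackup-")
    try:
        # Dump the repository to a file (-q suppresses per-revision progress).
        dump_path = os.path.join(work, "repo.svndump")
        with open(dump_path, "wb") as out:
            run(["svnadmin", "dump", "-q", repo_path], stdout=out)

        # Zip it under a timestamped name.
        zip_path = os.path.join(work, archive)
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.write(dump_path, arcname="repo.svndump")

        # Copy the zip to the other server (assumed to be a mounted directory).
        offsite_path = os.path.join(offsite_dir, archive)
        shutil.copy2(zip_path, offsite_path)

        # Delete the unzipped dump.
        os.remove(dump_path)

        # Copy the zip back down into a fresh temp directory and unzip it.
        restore = os.path.join(work, "restore")
        os.mkdir(restore)
        local_copy = os.path.join(restore, archive)
        shutil.copy2(offsite_path, local_copy)
        with zipfile.ZipFile(local_copy) as zf:
            restored_dump = zf.extract("repo.svndump", restore)

        # Create a new repository and load the restored dump into it.
        new_repo = os.path.join(restore, "repo")
        run(["svnadmin", "create", new_repo])
        with open(restored_dump, "rb") as inp:
            run(["svnadmin", "load", "-q", new_repo], stdin=inp)

        # Check out trunk from the restored repository.
        run(["svn", "checkout", "-q", "file://" + new_repo + "/trunk",
             os.path.join(restore, "trunk")])
        return offsite_path
    finally:
        # Delete the temp directories whether or not verification passed.
        shutil.rmtree(work, ignore_errors=True)
```

A cron job would call something like `backup_and_verify("/var/svn/repo", "/mnt/backup")` (hypothetical paths). Because every step either succeeds quietly or raises, a bad dump, a corrupt zip, a failed load, or a broken checkout all surface as an uncaught exception with a traceback on stderr.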
This entire process ran as a cron job with the quiet option enabled on every command, so a successful run produced no output at all. If there was a problem, the error was printed, and cron emailed it to me.
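This "silent unless something breaks" behavior works because cron mails any output a job produces to the job's owner (or to the address set in `MAILTO`). A minimal Python sketch of the pattern, assuming each backup step is an external command; `run_quietly` is a hypothetical helper, not part of the original script. It captures each command's output and only writes it to stderr when that command fails.

```python
import subprocess
import sys


def run_quietly(commands):
    """Run each command in order; print nothing if they all succeed.

    Output is captured instead of printed, so a cron job built on this
    generates no mail on success. On the first failure, the failing
    command and its captured output go to stderr (which cron mails),
    and 1 is returned so the caller can exit non-zero.
    """
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"backup step failed ({result.returncode}): {' '.join(cmd)}",
                  file=sys.stderr)
            sys.stderr.write(result.stdout)
            sys.stderr.write(result.stderr)
            return 1
    return 0
```

Passing the return value to `sys.exit` also gives the job a non-zero exit status on failure, which is useful if anything other than cron's mail is watching it.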
This SVN backup process made it easy to verify that each backup the process produced was intact, that it could be loaded into a valid repository, and that trunk could be checked out without any errors.
I refined the process over the years and eventually posted it as a Perl script, if you want to try this SVN backup process yourself.