fill the void - bdunagan

02 Apr 2014
Migrating Retrospect from SVN to GitHub Enterprise

The Retrospect codebase has been around since 1984 (counting DiskFit), when Dantz Development was founded. Thirty years of code.

Our original version control system was naturally Retrospect. When our Windows development team spun up in the 90s, they used Visual SourceSafe; we made sure to back that up as well, which saved us multiple times. In 2000, both the Mac and Windows teams switched to CVS, and in 2008, I migrated the various code repositories to SVN.

In 2013, I migrated Retrospect from SVN to Git, in the form of GitHub Enterprise. A year later, we still comment on how much better Git is, and GitHub Enterprise has proven itself to be worth every penny.

Mass Migration

SVN is old. It was an excellent replacement for CVS, but it’s old. Branching isn’t cheap. Repos aren’t distributed. History isn’t local. It was time for a new version control system. We chose Git, mainly because the CEO and I were already familiar with it and with GitHub.

All told, we had 27 SVN repos to convert. I used nirvdrum’s svn2git, which worked very well. The main repo, with 33k commits dating back to 2000, took 10 hours to convert on a MacBook Pro with an SSD. (There was no vss2cvs to go along with cvs2svn and svn2git.)

First, I dumped the SVN repos on the SVN server and transferred them to my laptop.

# Dump SVN repo, within 'screen' as the process takes a while.
svnadmin dump repo_name | gzip -9 > repo_name.dump.gz
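With 27 repos, a small loop saves retyping that command for each one. This is a minimal sketch, assuming the repositories sit side by side under one parent directory on the server; the helper names svn_repo? and dump_command, the script name, and the path are hypothetical.

```ruby
require "shellwords"

# Heuristic: a repository created by svnadmin has a "format" file and a "db" directory.
def svn_repo?(path)
  File.file?(File.join(path, "format")) && File.directory?(File.join(path, "db"))
end

# Same pipeline as the single-repo command, with the output named after the repo directory.
def dump_command(path)
  name = File.basename(path)
  "svnadmin dump #{Shellwords.escape(path)} | gzip -9 > #{Shellwords.escape(name)}.dump.gz"
end

# Hypothetical usage, inside 'screen': ruby dump_all.rb /path/to/svn/parent
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  Dir.children(ARGV[0]).sort.each do |entry|
    path = File.join(ARGV[0], entry)
    next unless svn_repo?(path)
    # The shell reports gzip's exit status for the pipeline, so this catches only some failures.
    system(dump_command(path)) or warn("dump may have failed: #{path}")
  end
end
```

Shell-escaping the paths keeps a repo name with spaces from breaking the pipeline.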

Next, I collated the authors list for each repo.

# One-liner, thanks to StackOverflow: print each SVN author once, with surrounding spaces trimmed.
svn log -q -r 1:HEAD svn://localhost | grep '^r' | awk -F'|' '{gsub(/^ +| +$/, "", $2)} !x[$2]++ {print $2}'
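svn2git passes its --authors file through to git-svn, which expects one mapping per line in the form "svnuser = Full Name <email>". The following is a minimal sketch that turns saved svn log -q output into a starting template. The example.com addresses are placeholders to fill in by hand, and the helper names are hypothetical.

```ruby
# Collect unique author names from `svn log -q` output, in order of first appearance.
# Revision lines look like: "r123 | alice | 2001-03-04 12:34:56 -0800 (...)".
def unique_authors(log_text)
  log_text.each_line
          .select { |line| line.start_with?("r") }
          .map { |line| line.split("|")[1].to_s.strip }
          .reject(&:empty?)
          .uniq
end

# One git-svn style mapping per author; names and emails are placeholders to edit.
def authors_template(authors, domain = "example.com")
  authors.map { |a| "#{a} = #{a} <#{a}@#{domain}>\n" }.join
end

# Hypothetical usage: svn log -q svn://localhost > log.txt; ruby authors.rb log.txt > svn_authors.txt
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  print authors_template(unique_authors(File.read(ARGV[0])))
end
```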

Finally, I wrote a Ruby script to manage svn2git. I ran it manually for every repo, so that I could check the state of the new Git repo and the log before moving on to the next conversion.

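# Ruby script for setting up svn2git. Usage: ruby <script> repo_name
repo = ARGV.fetch(0)                          # expects repo_name.dump.gz in the current directory
system("killall svnserve")                    # stop any svnserve left over from a previous run
system("svnadmin", "create", "#{repo}_svn")   # empty local SVN repo to load the dump into
system("gunzip < #{repo}.dump.gz | svnadmin load #{repo}_svn") or abort("svnadmin load failed")
system("svnserve", "-d", "-r", "#{repo}_svn") # serve it as svn://localhost/
Dir.mkdir(repo)
# svn2git converts into the current directory, so run it inside the new folder.
system("svn2git svn://localhost/ -v --authors ../svn_authors.txt --metadata", chdir: repo)
system("killall svnserve")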
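Before a long run, it can help to confirm that the authors file is complete. git-svn, which svn2git drives, stops when it meets an SVN author that the authors file does not map. This is a minimal sketch, assuming one file with an SVN username per line and svn_authors.txt in the "svnuser = Full Name <email>" format. The helper names are hypothetical, and skipping blank and #-prefixed lines is this sketch's own choice.

```ruby
# SVN usernames on the left-hand side of "svnuser = Full Name <email>" lines.
def mapped_authors(authors_text)
  authors_text.each_line.filter_map do |line|
    next if line.strip.empty? || line.lstrip.start_with?("#")
    key, value = line.split("=", 2)
    key.strip if value
  end
end

# SVN authors that have no mapping yet.
def missing_authors(svn_authors, authors_text)
  svn_authors - mapped_authors(authors_text)
end

# Hypothetical usage: ruby check_authors.rb svn_usernames.txt svn_authors.txt
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  svn_names = File.readlines(ARGV[0], chomp: true).reject(&:empty?)
  missing = missing_authors(svn_names, File.read(ARGV[1]))
  missing.each { |name| puts "not in authors file: #{name}" }
  exit(missing.empty? ? 0 : 1)
end
```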

I included these snippets of Ruby and Bash code to help others along, but none of them will work as-is. Each repo migration is unique to its codebase and engineering team, and so are its issues. Luckily, I encountered only three problems:

  • quotes in commits: svn2git had trouble escaping single and double quotes in commit messages. I fixed it with the code from the pull request that addressed the issue.
  • bad commits: Three repos had a bad commit from around 2001 that produced the following error message: “Filesystem has no item: Working copy path ‘branches/dantz’ does not exist in repository at /usr/local/git/lib/perl5/site_perl/Git/SVN/ line 282”. In the main repo, I tried running svn2git on the commits before the bad one, then rerunning it on the commits after it; the second run took 80 hours to fail. In the end, I split the affected repos at these commits.
  • line endings: Normalizing the main codebase to LF meant changing 30% of our files. We added the line "* text=auto" to .gitattributes.
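To gauge how many files a normalization like this will touch, a quick scan for CRLF line endings helps. This is a rough sketch, assuming a checked-out working tree. It treats any file containing a NUL byte as binary, which is simpler than Git's own check, and the helper names are hypothetical.

```ruby
# True for text content (no NUL byte) that contains at least one CRLF.
def crlf_content?(data)
  !data.include?("\0") && data.include?("\r\n")
end

# Every regular file under root, skipping .git, whose content has CRLF endings.
def crlf_files(root)
  Dir.glob("**/*", File::FNM_DOTMATCH, base: root)
     .reject { |rel| rel.split("/").include?(".git") }
     .map { |rel| File.join(root, rel) }
     .select { |path| File.file?(path) && crlf_content?(File.binread(path)) }
end

# Hypothetical usage: ruby find_crlf.rb path/to/checkout
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  files = crlf_files(ARGV[0])
  puts files
  puts "#{files.size} file(s) with CRLF line endings"
end
```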

The entire process took a week. I spent the first few days figuring out the best migration strategy. The actual conversion only took a day, so the team held off on commits until it was done.

Then, I ran git push --all to GitHub Enterprise.
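Keep one detail in mind when repeating this step. git push --all sends every branch but no tags, and svn2git turns SVN tags into Git tags, so they need a separate git push --tags. This is a minimal sketch, assuming an empty repository already exists on the server. The URL, remote name, and helper name below are hypothetical.

```ruby
require "shellwords"

# Commands to publish a converted repo to a new, empty remote:
# branches first (--all), then tags (--all does not include them).
def push_commands(remote_url, remote = "origin")
  url = Shellwords.escape(remote_url)
  [
    "git remote add #{remote} #{url}",
    "git push --all #{remote}",
    "git push --tags #{remote}"
  ]
end

# Hypothetical usage: ruby push.rb repo_dir git@github.example.com:retrospect/main.git
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  repo_dir, url = ARGV
  push_commands(url).each do |cmd|
    system(cmd, chdir: repo_dir) or abort("failed: #{cmd}")
  end
end
```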

The Value of Social Coding

GitHub Enterprise is GitHub running on an internal server. It works just like GitHub.com, with the same feature set. It costs $250 per person per year, sold in packs of twenty. We migrated to GitHub Enterprise a year ago, and we’ve been extremely pleased with it.
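At that price, one pack of twenty seats comes to 20 × $250 = $5,000 a year, and a team pays for whole packs. The following is a small sketch of that arithmetic using the 2014 figures quoted above. The constant and method names are illustrative.

```ruby
PRICE_PER_SEAT = 250 # USD per person per year, as quoted above
PACK_SIZE = 20       # seats are sold in packs of twenty

# Annual cost for a team, rounding the seat count up to whole packs.
def annual_cost(developers)
  packs = (developers + PACK_SIZE - 1) / PACK_SIZE # integer ceiling division
  packs * PACK_SIZE * PRICE_PER_SEAT
end
```

For example, a team of 12 still buys one full pack (annual_cost(12) is 5000), and a 21st developer means a second pack (annual_cost(21) is 10000).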

We chose GitHub Enterprise because the CEO and I were already familiar with GitHub. (I migrated our website to it six months earlier, and we work on it together.) In particular, we found pull requests and inline discussions immensely useful. Pull requests encapsulated features, and discussions happened inline and in context. We didn’t need to exchange long emails with code pasted in. Everything happened in GitHub.

That’s what we were missing: social coding. Git provided cheap branching, distributed repos, and local history, but GitHub added a set of tools that made code discussions seamless. Our development team had tried code review tools in the past, but none lasted long. They weren’t integrated into our existing workflow, and they were awkward to use. GitHub doesn’t feel awkward. Pull requests for feature branches seem obvious, and discussions are natural. The 800 emoji it supports make them fun, too; we frequently post “ship it!” on pull requests. In fact, during our recent release, we merged our 500th pull request. Pull requests are what we paid for.

GitHub gets even better with its second-order features: Contributions and Pulse. These give everyone on the team a high-level view of every developer and project, and we’ve discovered interesting patterns in both views. I realize all this data was already there, locked away in svn log, but I never took the time to visualize it. The News Feed shows what’s going on right now, Pulse aggregates that over a month, and Contributions aggregates it over a year. The data visualization is automatic.
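Even a crude version of that view can be pulled out of svn log. This is a rough sketch, assuming svn log -q output saved to a file. The helper name commits_by_author_month is hypothetical, and it only approximates what Pulse and Contributions show.

```ruby
# Tally commits per [author, "YYYY-MM"] from `svn log -q` revision lines,
# which look like: "r123 | alice | 2001-03-04 12:34:56 -0800 (Sun, 04 Mar 2001)".
def commits_by_author_month(log_text)
  counts = Hash.new(0)
  log_text.each_line do |line|
    next unless line.start_with?("r")
    _rev, author, date = line.split("|").map(&:strip)
    next if author.nil? || date.nil?
    counts[[author, date[0, 7]]] += 1
  end
  counts
end

# Hypothetical usage: svn log -q svn://localhost > log.txt; ruby pulse.rb log.txt
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  commits_by_author_month(File.read(ARGV[0])).sort.each do |(author, month), n|
    puts "#{month}  #{author.ljust(12)} #{n}"
  end
end
```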

Below is my own Contributions page:

If you’re still on SVN, think about migrating to Git. If you use Git, consider GitHub Enterprise. Both proved worthwhile investments for our team at Retrospect.
