No Habla Branching
The primary issue when bringing changesets from Git to Subversion stems from the major differences in how the two systems handle branching and changesets to begin with. In theory, projects like Tailor were created to solve this problem by first translating both the source and destination repositories into an intermediary changeset format, and then cross-applying changes from one end to the other. Unfortunately, after I spent a couple of days battling with Tailor, I couldn't get it to properly handle some of the revisions in Slide's three-year history.
If you've ever used git-svn(1) you might be familiar with the git-svn dcommit command, which will work for some percentage of users who want to maintain dual repositories between Git and Subversion. Things break down, however, once you introduce branching into the mix.
Giant Disclaimer
What I'm about to write I dutifully accept as Git-heresy, a nasty hack and not something I'm proud of.
Flattening into Subversion
First, the icky bash script that supports properly flattening revisions into the "master" branch in the git-svn repository and dcommits the results:
#!/bin/bash
MERGE_BRANCH=mergemaster
REPO=$1
BRANCH=$2

if [[ -z "${1}" || -z "${2}" ]]; then
    echo "===> You must provide a \"remote\" and a \"refspec\" for Git to use!"
    echo "===> Exiting :("
    exit 1;
fi

LATEST_COMMIT=`git log --max-count=1 --no-merges --pretty=format:"%H"`

function master
{
    echo "==> Making sure we're on 'master'"
    git checkout master
}

function setup_mergemaster
{
    master
    echo "==> Killing the old mergemaster branch"
    git branch -D $MERGE_BRANCH
    echo "==> Creating a new mergemaster branch"
    git checkout -b $MERGE_BRANCH
    git checkout master
}

function cleanup
{
    rm -f .git/SVNPULL_MSG
}

function prepare_message
{
    master
    echo "===> Pulling from ${REPO}:${BRANCH}"
    git pull ${REPO} ${BRANCH}
    git checkout ${MERGE_BRANCH}
    echo "==> Merging across change from master to ${MERGE_BRANCH}"
    git pull --no-commit --squash . master
    cp .git/SQUASH_MSG .git/SVNPULL_MSG
    master
}

function merge_to_svn
{
    git reset --hard ${LATEST_COMMIT}
    master
    setup_mergemaster
    echo "===> Pulling from ${REPO}:${BRANCH}"
    git pull ${REPO} ${BRANCH}
    git checkout ${MERGE_BRANCH}
    echo "==> Merging across change from master to ${MERGE_BRANCH}"
    git pull --no-commit --squash . master
    echo "==> Committing..."
    git commit -a -F .git/SVNPULL_MSG && git-svn dcommit --no-rebase
    cleanup
}

setup_mergemaster
prepare_message
merge_to_svn
master
echo "===> All done!"
This setup means that when we "pull" (or merge) from Primary_Git/master, we flatten at that point in order to properly merge the changes into Primary_SVN. Without further ado, here's the breakdown of the pieces of the script:
function setup_mergemaster
{
    master
    echo "==> Killing the old mergemaster branch"
    git branch -D $MERGE_BRANCH
    echo "==> Creating a new mergemaster branch"
    git checkout -b $MERGE_BRANCH
    git checkout master
}
The setup_mergemaster function is responsible for deleting any prior branch that was used for merging into the proxy repository and Primary_SVN. It gives us a fresh "mergemaster" branch in the git-svn repository that sits at the same point in history as the master branch before any merging occurs.
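One wrinkle with the delete-and-recreate approach is that `git branch -D` prints an error the very first time, when no mergemaster branch exists yet. A minimal sketch of an alternative, assuming all we need is for the scratch branch to point at the tip of master, is to force-move it with `git branch -f`. The function name `reset_merge_branch` and its arguments are hypothetical and not part of the script above.

```shell
#!/bin/bash
# Sketch only: reset_merge_branch is a hypothetical helper, not part of
# the original svnproxy script.

# Point a scratch branch at the current tip of a base branch, creating the
# scratch branch if it does not exist yet and moving it if it does.
reset_merge_branch() {
    local merge_branch="${1:-mergemaster}"
    local base_branch="${2:-master}"

    # Be on the base branch first; git refuses to force-move the branch
    # that is currently checked out.
    git checkout --quiet "${base_branch}" || return 1

    # -f creates the branch, or resets an existing one to base_branch.
    git branch -f "${merge_branch}" "${base_branch}"
}
```

Calling `reset_merge_branch` with no arguments uses the same branch names as the script (mergemaster and master) and leaves master checked out, just as setup_mergemaster does.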
function prepare_message
{
    master
    echo "===> Pulling from ${REPO}:${BRANCH}"
    git pull ${REPO} ${BRANCH}
    git checkout ${MERGE_BRANCH}
    echo "==> Merging across change from master to ${MERGE_BRANCH}"
    git pull --no-commit --squash . master
    cp .git/SQUASH_MSG .git/SVNPULL_MSG
    master
}
The prepare_message function is part of the nastiest code in the entire script. In order to get an accurate "squashed commit" message when the changesets are pushed into Primary_SVN, we have to generate the commit message separately from the actual merging. Since this function performs a `git pull` from "master" into "mergemaster", the changesets being pulled are the only ones that show up in the message (for reasons I'm about to explain).
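To make the message-capturing step easier to see in isolation, here is a small sketch of just that piece. It assumes that `git pull --no-commit --squash . master` behaves like a local `git merge --squash master`, which stages the combined changes without committing and writes a summary of the squashed commits to SQUASH_MSG in the Git directory. The function name `capture_squash_message` is hypothetical; the SVNPULL_MSG file name is borrowed from the script.

```shell
#!/bin/bash
# Sketch only: capture_squash_message is a hypothetical helper that
# isolates the "save the squash message" step of prepare_message.

capture_squash_message() {
    local source_branch="${1:-master}"
    local git_dir
    git_dir="$(git rev-parse --git-dir)" || return 1
    local out_file="${2:-${git_dir}/SVNPULL_MSG}"

    # Stage the combined changes on the current branch without committing.
    # Git records the list of squashed commits in SQUASH_MSG.
    git merge --squash "${source_branch}" >/dev/null || return 1

    # Keep a copy, since a later squash would overwrite SQUASH_MSG.
    cp "${git_dir}/SQUASH_MSG" "${out_file}"
}
```

Run from the mergemaster branch, `capture_squash_message master` leaves the message in .git/SVNPULL_MSG, ready to be handed to `git commit -F` later on. If the current branch already contains everything in the source branch, no SQUASH_MSG is written and the function returns a non-zero status.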
function merge_to_svn
{
    git reset --hard ${LATEST_COMMIT}
    master
    setup_mergemaster
    echo "===> Pulling from ${REPO}:${BRANCH}"
    git pull ${REPO} ${BRANCH}
    git checkout ${MERGE_BRANCH}
    echo "==> Merging across change from master to ${MERGE_BRANCH}"
    git pull --no-commit --squash . master
    echo "==> Committing..."
    git commit -a -F .git/SVNPULL_MSG && git-svn dcommit --no-rebase
    cleanup
}
You may have noticed the "LATEST_COMMIT" line in the full script above. Here's where it's used, and it is one of the most important pieces of the entire script. The LATEST_COMMIT line grabs the hash of the latest non-merge commit from the `git log` output and saves it for later use (here), where it's used to roll back the proxy repository to the point in time just before the last merge commit. This is done to avoid issues with git-svn(1) not understanding how to handle merge commits whatsoever. After rolling back the proxy repository, a new "mergemaster" branch is created. Then the Primary_Git changesets that differ between the proxy repository and Primary_Git are pulled into the proxy repository's master branch and squashed into the mergemaster branch, where they are subsequently committed with the commit message that was prepared before. The "prepare_message" part of the script becomes important at that step because the "squashed commit" message that Git generates at this point will effectively contain every commit that has ever been proxied across in this manner.
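The LATEST_COMMIT lookup is easy to try on its own. The sketch below wraps the same `git log` invocation in a function and adds a companion that counts merge commits in a range using `git rev-list --merges --count`. Both function names are hypothetical. One caveat worth keeping in mind: `git log` orders commits by date rather than by first parent, so after a merge the "latest non-merge commit" may well sit on the merged-in side of history.

```shell
#!/bin/bash
# Sketch only: both helpers are hypothetical names wrapped around
# standard git commands.

# Print the hash of the most recent non-merge commit reachable from a
# revision (HEAD by default), using the same flags as LATEST_COMMIT.
latest_non_merge_commit() {
    git log --max-count=1 --no-merges --pretty=format:"%H" "${1:-HEAD}"
}

# Print how many merge commits a revision range contains, for example
# "abc123..HEAD". A non-zero count means the range is not linear.
count_merge_commits() {
    git rev-list --merges --count "$1"
}
```

For example, `count_merge_commits "$(latest_non_merge_commit)..HEAD"` gives a rough idea of how many merge commits the rollback would discard.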
After the "merge_to_svn" function has run, the "transaction" is complete and the changesets that once differed between Primary_SVN/trunk and Primary_Git/master are now normalized.
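One way to sanity-check that claim locally is to compare the file contents of the two branches after the squash commit. The commits themselves will differ (one flattened commit versus many), but the trees should be identical. This is only a sketch: the function name `branches_have_same_tree` is hypothetical, and it checks the local master and mergemaster branches, not what actually landed in Subversion.

```shell
#!/bin/bash
# Sketch only: branches_have_same_tree is a hypothetical helper.

# Succeed (exit status 0) when two branches contain identical files,
# regardless of how their commit histories differ.
branches_have_same_tree() {
    # --quiet suppresses output and makes the exit status reflect
    # whether any differences were found.
    git diff --quiet "${1:-master}" "${2:-mergemaster}"
}
```

A typical use would be `branches_have_same_tree master mergemaster || echo "===> Squash is missing changes!"` right after the commit step.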
Mostly Automagically
In the near future I intend to incorporate this script into the post-receive hook on Primary_Git in a way that will truly propagate changesets automatically from Primary_Git into Primary_SVN, but currently I'm utilizing one of my new favorite "hammers", Hudson (see: One-line Automated Testing). Currently there are two jobs set up for proxying changesets across. The first, "Subversion-to-Git", simply polls Subversion for changes and executes a series of commands when changes come in: git-svn fetch && git merge git-svn && git push $Primary_Git master. This is fairly straightforward and fits in line with what git-svn(1) is intended to do. The other job that I created is "Git-to-Subversion", which must be manually invoked by a user, but will still automatically take care of squashing commits into Primary_SVN/trunk (i.e. bash svnproxy.sh $Primary_Git master).
Wrap-up
Admittedly, this sort of setup leaves a lot to be desired. In the ideal world, Tailor would have coped with both our Git and our Subversion repositories in such a way that would have made this script nothing more than a silly idea I had on a bus. Unfortunately that wasn't the case, and the time budget I had for figuring out a way to force Tailor to work was about 47.5 hours less than it took me to sit down and write the script above. I'd be interested to see what solutions other organizations are utilizing to migrate from one system to the other, but at the time of this writing I can't honestly say I've heard much about people dealing with the "hybrid" scenario that we currently have at Slide.
Did you know! Slide is hiring! Looking for talented engineers to write some good Python and/or JavaScript; feel free to contact me at tyler[at]slide