OITOFELIX

Article: Savannah CVS to Git migration

The Savannah CVS to Git migration article guides the reader through the steps needed to migrate a Savannah-hosted project that uses CVS as its VCS (Version Control System) to the more modern and powerful Git, preserving the project’s entire history and improving how users are named in commit records.

  • Savannah is the GNU project’s software forge, also available for the free software community at large.
  • CVS is the Concurrent Versions System, a traditional and popular centralized VCS.
  • Git is the stupid content tracker, a modern and widely adopted distributed VCS.

This article presents two alternative methods for converting a CVS repository to Git, each based on a particular migration tool: cvs2git and cvs-fast-export. You can experiment with both and choose the one that suits you best.

  • cvs2git is part of the larger cvs2svn package.
  • cvs-fast-export is a tool authored by Keith Packard and currently maintained by Eric S. Raymond.

A handful of command-line tools are used in the procedures described in this article. They must be properly installed on your computer, and are likely to be available from your GNU/Linux distribution’s package repositories. If any of them is not, you’ll have to fetch its source code and build and install it from there, which should be straightforward. In addition to cvs2git, cvs-fast-export and git, we’ll use rsync.

  • rsync is a tool designed for fast, incremental transfer and synchronization of files with remote hosts.

The commands that you need to type are preceded by a $ sign, and each command’s output is shown in the lines immediately following it. That output resembles what you would get by running the same command adapted to your circumstances, but it is likely to differ in the details. Each command-line and output pair is meant to be thought of as a screenshot of a terminal window, but for brevity’s sake we’ll omit repetitive output using the [...] ellipsis character sequence.

The original motivation for writing this article arose when I decided to migrate GNU ccd2cue, a package I’ve authored and maintain for the GNU project, from its original CVS-based code repository to a Git-based one, before I started working on the package’s new release (version 0.4 at the time). Therefore, for the sake of simplicity and comprehension, in this article we’ll assume we are working on that particular case, so the following meta-information about the project on Savannah holds:

  • User name: oitofelix
  • Project name: ccd2cue
  • Domain: gnu.org

Needless to say, you’ll have to adapt this information to the case at hand; to that end you can consider these as meta-variables, if you like. For instance, if you are not a GNU maintainer, your project is probably hosted at non-GNU Savannah, and therefore the domain must be regarded as nongnu.org. Other modifications that may be required (that I’m aware of) are explicitly noted as such in their respective context. However, unforeseen circumstances might arise from differences in repository structure, run-time environment, project requirements, server-side modifications, among other factors, so be warned that your mileage may vary. Furthermore, this article is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.
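If you prefer to treat them literally as meta-variables, one option is to set them once as shell variables and derive from them the two Savannah locations used in the following sections. This is a minimal sketch: the variable names are hypothetical, and while the gnu.org values reproduce this article’s URLs exactly, the host names it yields for DOMAIN=nongnu.org are an assumption to check against your project’s Savannah pages.

```shell
#!/bin/sh
# Hypothetical meta-variables for the migration; adapt them to your project.
SAV_USER=oitofelix   # Savannah user name
PROJECT=ccd2cue      # Savannah project name
DOMAIN=gnu.org       # nongnu.org for non-GNU projects (host names: assumption)

# Locations derived from the meta-variables; with the values above they
# match the URLs used throughout this article.
CVS_RSYNC_URL="rsync://cvs.savannah.${DOMAIN}/sources/${PROJECT}"
GIT_URL="${SAV_USER}@git.sv.${DOMAIN}:/srv/git/${PROJECT}.git"

printf '%s\n%s\n' "$CVS_RSYNC_URL" "$GIT_URL"
```

With these set, the commands below could be written as, for example, rsync -av "$CVS_RSYNC_URL" . and git clone "$GIT_URL" "$PROJECT.git".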

I’d like to thank Assaf Gordon, the very helpful Savannah hacker whose expertise with cvs-fast-export is the basis for this article on that matter.

Obtaining CVS repository from Savannah

First, we need a local copy of the project’s entire CVS repository from Savannah. We’ll use rsync to do that:

$ rsync -av rsync://cvs.savannah.gnu.org/sources/ccd2cue .
receiving incremental file list
ccd2cue/
ccd2cue/CVSROOT/
ccd2cue/CVSROOT/checkoutlist
ccd2cue/CVSROOT/commitinfo
[...]
ccd2cue/ccd2cue/src/Attic/error.c,v
ccd2cue/ccd2cue/src/Attic/error.h,v

sent 2,685 bytes  received 1,907,505 bytes  103,253.51 bytes/sec
total size is 1,897,875  speedup is 0.99

If everything went well, the local directory ccd2cue should now contain the CVS repository.

$ ls ccd2cue/
ccd2cue  CVSROOT
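Before going on, it doesn’t hurt to check that the copy really is a CVS repository: it should have a CVSROOT directory and contain RCS master files, whose names end in ,v. The sketch below wraps that check in two hypothetical shell functions.

```shell
# looks_like_cvs_repo DIR (hypothetical helper): succeed if DIR has a
# CVSROOT directory and at least one RCS master file (*,v) somewhere below it.
looks_like_cvs_repo() {
    [ -d "$1/CVSROOT" ] || return 1
    [ -n "$(find "$1" -type f -name '*,v' | head -n 1)" ]
}

# count_rcs_files DIR (hypothetical helper): number of RCS master files under DIR.
count_rcs_files() {
    find "$1" -type f -name '*,v' | wc -l | tr -d ' '
}
```

For instance, looks_like_cvs_repo ccd2cue && count_rcs_files ccd2cue. The count includes files under Attic and CVSROOT, so it need not match exactly the number of files cvs2git reports later.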

Enabling Git repository at Savannah and cloning it

We need to enable the Git repository at Savannah and clone it, so that we can import the converted repository into it and push it back. To do that, go to the Savannah project’s feature selection page and check the “Git” option, as in the picture below:

[Picture: Savannah project’s feature selection page]

Wait about 30 minutes until the server processes the repository creation job, and then clone the repository into the ccd2cue.git directory (not ccd2cue, since that one holds the copy of ccd2cue’s CVS repository made in the previous step).

$ git clone oitofelix@git.sv.gnu.org:/srv/git/ccd2cue.git ccd2cue.git
Cloning into 'ccd2cue.git'...
warning: You appear to have cloned an empty repository.
Checking connectivity... done.

If the server hasn’t finished creating the Git repository yet, we’ll see an error message instead.

$ git clone oitofelix@git.sv.gnu.org:/srv/git/ccd2cue.git ccd2cue.git
Cloning into 'ccd2cue.git'...
fatal: '/srv/git/ccd2cue.git' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

We have to keep trying from time to time until we succeed.
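Rather than retrying by hand, the polling can be scripted. The sketch below assumes that git ls-remote fails while the repository doesn’t exist yet and succeeds (printing nothing) once it exists, even if it is still empty; retry_until is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary.

```shell
# retry_until MAX_TRIES DELAY COMMAND [ARG...] (hypothetical helper)
# Run COMMAND until it succeeds, sleeping DELAY seconds between attempts;
# return 1 after MAX_TRIES failed attempts.
retry_until() {
    _max=$1 _delay=$2
    shift 2
    _try=1
    until "$@"; do
        if [ "$_try" -ge "$_max" ]; then
            return 1
        fi
        _try=$((_try + 1))
        sleep "$_delay"
    done
}

# Usage (not run here): poll every 5 minutes for up to 2 hours, then clone.
# retry_until 24 300 git ls-remote oitofelix@git.sv.gnu.org:/srv/git/ccd2cue.git >/dev/null &&
#     git clone oitofelix@git.sv.gnu.org:/srv/git/ccd2cue.git ccd2cue.git
```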

Using cvs2git to convert the repository

Now it’s time to do the actual conversion to a Git repository. You can use the method described in this section, or skip to the cvs-fast-export section for an alternative method.

The cvs2git conversion process is driven by the so-called “options file”. That file is a regular Python program that can be used to fine-tune the conversion. The easiest and most practical way to get started is to modify the extensively commented example options file distributed with the cvs2svn package. On my computer this file is located at /usr/share/doc/cvs2svn/examples/cvs2git-example.options.gz.
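The example file has to be copied (and, in this case, decompressed) before editing. Its location and compression vary between distributions, so the path below is only the one mentioned above and the helper name is hypothetical; the sketch handles both the compressed and the plain variant.

```shell
# Path mentioned above; it differs between distributions (assumption).
EXAMPLE_OPTIONS=/usr/share/doc/cvs2svn/examples/cvs2git-example.options.gz

# copy_options SRC DEST (hypothetical helper): copy the example options
# file to DEST, decompressing it first when it is gzip-compressed.
copy_options() {
    case $1 in
        *.gz) gzip -dc "$1" > "$2" ;;
        *)    cp "$1" "$2" ;;
    esac
}

# Usage (not run here): create the file named "options" edited below.
# copy_options "$EXAMPLE_OPTIONS" options
```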

Setting up cvs2git options file

To produce a working options file that gives good results for this conversion, we only need to make half a dozen or so changes to the vanilla example. Below are the necessary changes in unified diff format, grouped by their intent.

Define CVS repository directory and unset temporary directory

The local CVS repository directory is the ccd2cue directory fetched from Savannah with rsync in the first step.

@@ -560,7 +550,7 @@
     # The filesystem path to the part of the CVS repository (*not* a
     # CVS working copy) that should be converted.  This may be a
     # subdirectory (i.e., a module) within a larger CVS repository.
-    r'test-data/main-cvsrepos',
+    r'ccd2cue',
 
     # A list of symbol transformations that can be used to rename
     # symbols in this project.

The example file writes its resulting files under this temporary directory (see the paths in the next two changes). We want them placed in the current working directory instead, so we remove this setting and give the output files explicit names.

@@ -122,8 +122,6 @@
 #logger.log_level = logger.DEBUG
 
 
-# The directory to use for temporary files:
-ctx.tmpdir = r'cvs2svn-tmp'
 
 # During FilterSymbolsPass, cvs2git records the contents of file
 # revisions into a "blob" file in git-fast-import format.  The

Define blob and dump output file names

The conversion process outputs two files, both of which must be fed to git fast-import. The blob file contains the file revision contents.

@@ -135,7 +133,7 @@
 ctx.revision_collector = GitRevisionCollector(
     # The file in which to write the git-fast-import stream that
     # contains the file revision contents:
-    'cvs2svn-tmp/git-blob.dat',
+    'blob',
 
     # The following option specifies how the revision contents of the
     # RCS files should be read.

The dump file contains the changesets and the branch/tag information.

@@ -528,7 +518,7 @@
 ctx.output_option = GitOutputOption(
     # The file in which to write the git-fast-import stream that
     # contains the changesets and branch/tag information:
-    os.path.join(ctx.tmpdir, 'git-dump.dat'),
+    'dump',
 
     # The blobs will be written via the revision recorder, so in
     # OutputPass we only have to emit references to the blob marks:

Set symbol transformation rules

When moving to Git, it’s good practice to tag the HEAD of the CVS repository with something like cvs-repository-moved-to-git, so that people who come across it can see that it is no longer being updated. The change below prevents cvs2git from carrying that tag over to the Git repository.

@@ -575,6 +565,7 @@
         # branches correctly.  The argument is a Python-style regular
         # expression that has to match the *whole* CVS symbol name:
         #IgnoreSymbolTransform(r'nightly-build-tag-.*')
+        IgnoreSymbolTransform(r'cvs-repository-moved-to-git'),
 
         # RegexpSymbolTransforms transform symbols textually using a
         # regular expression.  The first argument is a Python regular
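If you haven’t tagged the CVS repository yet, cvs rtag can do it directly in the repository, without a working copy, tagging the latest trunk revisions of a module. Tagging before the rsync step means the local copy will carry the tag, which is precisely the case the transformation above takes care of. This is a sketch under assumptions: the variable names are hypothetical, the :ext: access string follows the form Savannah shows project members (check your project’s source code page), and the command needs commit access, so it is left commented out.

```shell
# Hypothetical variables; the module name matches ccd2cue's repository layout.
SAV_USER=oitofelix
PROJECT=ccd2cue
MODULE=ccd2cue
CVSROOT_EXT=":ext:${SAV_USER}@cvs.savannah.gnu.org:/sources/${PROJECT}"

# Make CVS reach the server over SSH.
CVS_RSH=ssh
export CVS_RSH

# Not run here (needs commit access): tag the latest trunk revisions of
# MODULE directly in the repository.
# cvs -d "$CVSROOT_EXT" rtag cvs-repository-moved-to-git "$MODULE"
```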

CVS tag names are quite restrictive (periods, for instance, are not allowed), while Git’s are a lot more permissive, so we’ll transform the old release tags rel_0-1, rel_0-2,… into the more appropriate 0.1, 0.2 and so on.

@@ -591,6 +582,7 @@
         #                      r'release-\1.\2'),
         #RegexpSymbolTransform(r'release-(\d+)_(\d+)_(\d+)',
         #                      r'release-\1.\2.\3'),
+        RegexpSymbolTransform(r'rel_(\d+)-(\d+)', r'\1.\2'),
 
         # Simple 1:1 character replacements can also be done.  The
         # following transform, which converts backslashes into forward
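To preview the effect of this transformation before running the conversion, the same pattern can be tried with sed on a list of tag names. A minimal sketch: preview_tag_rename is a hypothetical helper, the sample list holds this project’s tags, and [0-9] stands in for Python’s \d. The ^ and $ anchors mirror cvs2git’s requirement that the expression match the whole symbol name.

```shell
# preview_tag_rename (hypothetical helper): read CVS tag names on standard
# input and print "OLD -> NEW" for those the rel_X-Y pattern would rename.
preview_tag_rename() {
    sed -n -E 's/^rel_([0-9]+)-([0-9]+)$/& -> \1.\2/p'
}

printf '%s\n' rel_0-1 rel_0-2 rel_0-3 cvs-repository-moved-to-git | preview_tag_rename
```

For the sample list it prints rel_0-1 -> 0.1, rel_0-2 -> 0.2 and rel_0-3 -> 0.3, one per line; cvs-repository-moved-to-git is left out because it doesn’t match.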

Map CVS users to full names and emails

CVS uses Unix user names in commit records, while Git allows a full name plus an email address. It’s helpful to take advantage of that, so we’ll map the former to the latter. To that end, we first need a list of all the users who have committed to the CVS repository.

$ sed -n 's/^[AMR][^|]*|\([^|]*\)|.*$/\1/p' ccd2cue/CVSROOT/history | sort -u
oitofelix
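The history file only contains what the CVS server logged, so it may miss commits, for instance if logging wasn’t enabled for the repository’s whole life. A cross-check, sketched below under that assumption, is to read the author field that RCS stores for every revision in the ,v files. list_rcs_authors is a hypothetical helper, and since it is a plain text search, a file revision whose content happens to contain “author x;” would add a spurious name, so review the output.

```shell
# list_rcs_authors DIR (hypothetical helper): print the distinct user names
# found in the "author NAME;" fields of the RCS master files (*,v) under DIR.
list_rcs_authors() {
    find "$1" -type f -name '*,v' -exec cat {} + |
        grep -o 'author[[:space:]][^;]*;' |
        sed 's/^author[[:space:]]*//; s/;$//' |
        sort -u
}

# Usage (not run here): list_rcs_authors ccd2cue
```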

With this list in hand, we can create the mapping in the options file.

@@ -512,15 +510,7 @@
 # (name, email).  Please substitute your own project's usernames here
 # to use with the author_transforms option of GitOutputOption below.
 author_transforms={
-    'jrandom' : ('J. Random', 'jrandom@example.com'),
-    'mhagger' : 'Michael Haggerty <mhagger@alum.mit.edu>',
-    'brane' : (u'Branko Čibej', 'brane@xbc.nu'),
-    'ringstrom' : 'Tobias Ringström <tobias@ringstrom.mine.nu>',
-    'dionisos' : (u'Erik Hülsmann', 'e.huelsmann@gmx.net'),
-
-    # This one will be used for commits for which CVS doesn't record
-    # the original author, as explained above.
-    'cvs2svn' : 'cvs2svn <admin@example.com>',
+    'oitofelix' : 'Bruno Félix Rezende Ribeiro <oitofelix@gnu.org>',
     }
 
 # This is the main option that causes cvs2svn to output to a

Running cvs2git

The cvs2git options file has all the settings needed to guide the conversion, so no arguments other than --options, which specifies the file created in the previous step, are required when invoking cvs2git. We assume it’s named options and has been placed in the current working directory.

$ cvs2git --options=options
----- pass 1 (CollectRevsPass) -----
Examining all CVS ',v' files...
ccd2cue/ccd2cue/.cvsignore,v
[...]
cvs2svn Statistics:
------------------
Total CVS Files:               116
Total CVS Revisions:           444
Total CVS Branches:              0
Total CVS Tags:                140
Total Unique Tags:               5
Total Unique Branches:           0
CVS Repos Size in KB:         1816
Total SVN Commits:             154
First Revision Date:    Fri Mar 18 09:45:50 2011
Last Revision Date:     Tue Feb 11 15:13:46 2014
------------------
Timings (seconds):
------------------
0.936   pass1    CollectRevsPass
0.072   pass2    CleanMetadataPass
0.015   pass3    CollateSymbolsPass
5.734   pass4    FilterSymbolsPass
0.056   pass5    SortRevisionsPass
0.028   pass6    SortSymbolsPass
0.278   pass7    InitializeChangesetsPass
0.202   pass8    BreakRevisionChangesetCyclesPass
0.200   pass9    RevisionTopologicalSortPass
0.100   pass10   BreakSymbolChangesetCyclesPass
0.202   pass11   BreakAllChangesetCyclesPass
0.182   pass12   TopologicalSortPass
0.421   pass13   CreateRevsPass
0.022   pass14   SortSymbolOpeningsClosingsPass
0.019   pass15   IndexSymbolsPass
0.314   pass16   OutputPass
8.781   total

Two files should have been generated in the current directory: blob and dump. To create a self-contained Git fast-import file, we need to concatenate them, blob first, because the dump only refers to the blob marks defined there.

$ cat blob dump > fast-import-file

The blob and dump files are no longer necessary and can be deleted.
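If you’d like the concatenation to fail loudly when cvs2git didn’t produce its output, a small wrapper can check both files first. The function name is hypothetical; the order, blob before dump, follows from the dump referring to the blob marks.

```shell
# make_fast_import_file BLOB DUMP OUT (hypothetical helper): concatenate
# cvs2git's two output files into OUT, blob first, refusing to proceed if
# either input is missing or empty.
make_fast_import_file() {
    for _f in "$1" "$2"; do
        if [ ! -s "$_f" ]; then
            printf 'missing or empty: %s\n' "$_f" >&2
            return 1
        fi
    done
    cat -- "$1" "$2" > "$3"
}

# Usage (not run here): make_fast_import_file blob dump fast-import-file
```

Alternatively, the output of cat blob dump can be piped straight into git fast-import, which reads its stream from standard input, skipping the intermediate file.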

Using cvs-fast-export to convert the repository

If you have chosen the cvs2git method, you can skip to the next section; otherwise, read on.

CVS uses Unix user names in commit records, while Git allows a full name plus an email address. It’s helpful to take advantage of that, so we’ll map the former to the latter. To that end, we first need a list of all the users who have committed to the CVS repository.

$ find ccd2cue -type f | cvs-fast-export -a
oitofelix

With this list in hand, we can create a mapping file that tells cvs-fast-export how to transform user identities. We can use any editor of our choice or, if the user list is small, generate the file straight from the command line.

$ cat > author-map << EOF
> oitofelix=Bruno Félix Rezende Ribeiro <oitofelix@gnu.org>
> EOF
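For a longer user list, a skeleton of the map can be generated from the output of cvs-fast-export -a and then edited by hand. A sketch under assumptions: author_map_skeleton is a hypothetical helper, and the user@domain addresses it writes are mere placeholders that must be replaced with each person’s real name and email.

```shell
# author_map_skeleton EMAIL_DOMAIN (hypothetical helper): read CVS user names,
# one per line, on standard input and write placeholder
# "user=user <user@EMAIL_DOMAIN>" entries to be corrected by hand.
author_map_skeleton() {
    while IFS= read -r _user; do
        if [ -n "$_user" ]; then
            printf '%s=%s <%s@%s>\n' "$_user" "$_user" "$_user" "$1"
        fi
    done
}

# Usage (not run here):
# find ccd2cue -type f | cvs-fast-export -a | author_map_skeleton gnu.org > author-map
```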

The last step in this method is to invoke cvs-fast-export in order to produce the Git fast-import file.

$ find ccd2cue -type f | cvs-fast-export -A author-map > fast-import-file
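Before importing, a rough look at the stream can catch an empty or truncated conversion. The hypothetical helper below counts lines that start with the fast-import commands blob, commit, reset and tag. The figures are approximate, because a line of file content inside a data section could start with the same word, but they can be compared with the statistics git fast-import prints in the next step.

```shell
# stream_summary FILE (hypothetical helper): approximate counts of the blob,
# commit, reset and tag commands in a git fast-import stream.
stream_summary() {
    for _cmd in blob commit reset tag; do
        # grep -c prints 0 but exits 1 when nothing matches; ignore that status.
        printf '%s: %s\n' "$_cmd" "$(grep -cE "^$_cmd( |\$)" "$1" || true)"
    done
}

# Usage (not run here): stream_summary fast-import-file
```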

Importing the repository and pushing it back to Savannah

Finally, importing the converted repository is as simple as feeding git fast-import the file generated by cvs2git or cvs-fast-export.

$ cd ccd2cue.git && git fast-import < ../fast-import-file
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:       5000
Total objects:          940 (        22 duplicates                  )
      blobs  :          373 (        21 duplicates        310 deltas of        350 attempts)
      trees  :          419 (         1 duplicates        233 deltas of        399 attempts)
      commits:          148 (         0 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:           4 (         1 loads     )
      marks:     1073741824 (       542 unique    )
      atoms:             96
Memory total:          2294 KiB
       pools:          2098 KiB
     objects:           195 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize =   33554432
pack_report: core.packedGitLimit      =  268435456
pack_report: pack_used_ctr            =         11
pack_report: pack_mmap_calls          =          4
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =     623459 /     623459
---------------------------------------------------------------------

If you have gitk installed, you can run gitk --all to inspect the repository’s sanity and health; otherwise, a simple git log may be enough. You may also find it useful to recompact the repository and discard any garbage.

$ git gc --prune=now
Counting objects: 940, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (238/238), done.
Writing objects: 100% (940/940), done.
Total 940 (delta 543), reused 940 (delta 543)
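Besides gitk and git log, a few figures can be checked from the command line and compared with the statistics printed earlier. The sketch below is a hypothetical helper meant to be run inside ccd2cue.git; git fsck --full additionally verifies the connectivity and validity of every object.

```shell
# summarize_import (hypothetical helper): run inside the imported repository.
summarize_import() {
    printf 'commits: %s\n' "$(git rev-list --all --count)"
    printf 'branches: %s\n' "$(git for-each-ref refs/heads | wc -l | tr -d ' ')"
    printf 'tags: %s\n' "$(git for-each-ref refs/tags | wc -l | tr -d ' ')"
    # Check connectivity and validity of all objects in the repository.
    git fsck --full --no-progress
}

# Usage (not run here): cd ccd2cue.git && summarize_import
```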

Our working tree is empty; let’s populate it.

$ git checkout master
Already on 'master'
Your branch is based on 'origin/master', but the upstream is gone.
  (use "git branch --unset-upstream" to fixup)

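Lastly, we have to push the whole repository, branches and tags, back to the remote. The tag names in the output below keep their CVS form (rel_0-1 and so on); with the cvs2git symbol transformation set up earlier they would be 0.1, 0.2 and 0.3 instead.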

$ git push --all && git push --tags
Counting objects: 792, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (238/238), done.
Writing objects: 100% (792/792), 602.45 KiB | 0 bytes/s, done.
Total 792 (delta 543), reused 792 (delta 543)
To oitofelix@git.sv.gnu.org:/srv/git/ccd2cue.git
 * [new branch]      master -> master
 * [new tag]         rel_0-1 -> rel_0-1
 * [new tag]         rel_0-2 -> rel_0-2
 * [new tag]         rel_0-3 -> rel_0-3
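To confirm that the push really transferred everything, the local branches and tags can be compared with what the remote advertises. A minimal sketch: refs_in_sync is a hypothetical helper, run inside ccd2cue.git, and it assumes the remote is named origin, as it is after the git clone above.

```shell
# refs_in_sync [REMOTE] (hypothetical helper): succeed if the local branches
# and tags point at the same objects as those on REMOTE (default: origin);
# otherwise print both lists to standard error and return 1.
refs_in_sync() {
    _remote=${1:-origin}
    _local=$(git for-each-ref --format='%(objectname)%09%(refname)' refs/heads refs/tags | sort)
    # Drop the peeled "^{}" lines that ls-remote prints for annotated tags.
    _remote_refs=$(git ls-remote --heads --tags "$_remote" | grep -v '\^{}$' | sort)
    if [ "$_local" = "$_remote_refs" ]; then
        return 0
    fi
    printf 'local:\n%s\nremote:\n%s\n' "$_local" "$_remote_refs" >&2
    return 1
}

# Usage (not run here): cd ccd2cue.git && refs_in_sync origin
```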

Now, it’s all done. Happy hacking!