Tune Subversion FSFS repository with fsfs-reshard.py

I have worked on an improved version of fsfs-reshard.py that:

  • supports the Subversion 1.6 format
  • can unpack revisions (packing is a 1.6 feature) in order to change the shard size
  • generates statistics about shards for the current shard size or for a target shard size, so that you can fine-tune the shard size for each repository

Context

When you have used Subversion for years, a repository in an old format is still usable with recent Subversion binaries, but some features may not be available. This becomes obvious when the svn:mergeinfo property is not supported and your Subversion client complains that the repository server lacks this feature.

You have two options to upgrade an FSFS repository to the latest version:

  • Generate a dump of your old repository with svnadmin dump and create a brand new repository fed by svnadmin load
  • Simply run svnadmin upgrade

The dump/load option is often recommended, but it requires more than three times the original disk space of your repository and consumes a lot of CPU, with significant downtime for the repository users.

After investigation, it seems that svnadmin upgrade makes only a few file changes, which seem harmless. If in doubt, you can run svnadmin verify to check repository integrity.

Since Subversion 1.5, an FSFS repository can be sharded, which is the default for a brand new repository created with 1.5 or later. But an upgraded repository that was created with a pre-1.5 version remains in the original layout, now called linear. That is not an issue: recent versions keep working with the linear mode too.
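
To make the difference between the two layouts concrete, here is a minimal Python sketch of where a revision file lives under db/revs in each mode. The function name rev_file_path is hypothetical and is not part of Subversion or of fsfs-reshard.py. The sketch assumes the usual FSFS convention that a sharded repository stores revision N in the subdirectory numbered N divided by the shard size.

```python
import posixpath


def rev_file_path(rev, shard_size=None):
    """Return the path of a revision file relative to the repository root.

    shard_size=None models the linear layout; a positive integer models
    the sharded layout. Hypothetical helper, for illustration only.
    """
    if shard_size is None:
        # Linear layout: every revision file sits directly in db/revs.
        return posixpath.join("db", "revs", str(rev))
    if shard_size <= 0:
        raise ValueError("shard_size must be a positive integer")
    # Sharded layout: revisions are grouped into numbered subdirectories.
    shard = rev // shard_size
    return posixpath.join("db", "revs", str(shard), str(rev))
```

With a shard size of 1000, rev_file_path(1234, 1000) gives db/revs/1/1234, whereas the linear layout keeps the same revision at db/revs/1234 alongside every other revision file.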

If you want to benefit from the sharded mode, the repository must be converted with the fsfs-reshard.py script described in the Subversion reference book. The original script, available in the third-party section of the Subversion sources, tries to do both the upgrade and the reshard (which sounds too risky to me) and does not support the Subversion 1.6 formats.

So I have worked on that script, and here is an improved version of fsfs-reshard.py that:

  • supports the Subversion 1.6 format
  • can unpack revisions (packing is a 1.6 feature) in order to change the shard size
  • generates statistics about shards for the current shard size or for a target shard size, so that you can fine-tune the shard size for each repository

This version does not upgrade the repository db format, so a 1.5 repository remains at version 1.5. You have to run svnadmin upgrade first.

How to upgrade a Subversion repository

This procedure has been tested and works with versions 1.3 to 1.6. Take extra care if your FSFS repository dates back to 1.2.
If you are not confident in the script and the procedure, test them first on a copy of the repository. Once you have gained confidence, you will probably apply the procedure directly to your repositories.

  • Take the repository offline: notify users, stop the Apache2/WebDAV or svnserve service, and change file permissions to prevent read and write access
  • Make a full file backup of the repository, either as a zip or a .tar.gz archive
  • Run svnadmin upgrade repository with your recent Subversion binaries
  • Check your repository format version and mode with fsfs-reshard.py repository
    No change is made to the repository yet.
    For Subversion 1.6, the expected db format version is 4. If the repository was created with a pre-1.5 version, it is in linear mode
  • If you want more confidence, run svnadmin verify repository to check integrity
  • Now proceed with the next steps to convert from linear mode to sharded mode.
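
The format check in the steps above boils down to reading the repository's db/format file. As a rough sketch (not the actual code of fsfs-reshard.py), the following assumes that the file holds the format number on its first line, optionally followed by a line such as "layout sharded 1000" or "layout linear". The name parse_db_format is hypothetical.

```python
def parse_db_format(text):
    """Parse the content of db/format into (format_number, layout, shard_size).

    Assumes the first line is the format number and that an optional
    'layout' line follows. A file without a layout line is treated as linear.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty format file")
    format_number = int(lines[0])
    layout, shard_size = "linear", None
    for line in lines[1:]:
        fields = line.split()
        if fields[0] != "layout" or len(fields) < 2:
            continue  # skip options this sketch does not know about
        if fields[1] == "sharded" and len(fields) == 3:
            layout, shard_size = "sharded", int(fields[2])
        elif fields[1] == "linear":
            layout, shard_size = "linear", None
        else:
            raise ValueError("unrecognised layout line: " + line)
    return format_number, layout, shard_size
```

Feeding it the content of repository/db/format after the upgrade should give a format number of 4 for Subversion 1.6, with a linear layout if the repository predates 1.5.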

How to tune Subversion repository shard size

These steps can be applied to your repository at any time to improve performance.
If the repository has already been packed, the script unpacks it before resharding to the new shard size. You will then have to run svnadmin pack repository again after the reshard.

Prerequisites: your repository is offline and you have just made a backup of it.
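
One way to produce such a backup from Python is sketched below, using only the standard tarfile module. The function name backup_repository is hypothetical, and the check for a db subdirectory is only a crude sanity test chosen for this example. Write the archive outside the repository directory.

```python
import os
import tarfile


def backup_repository(repo_path, archive_path):
    """Write a gzip-compressed tar archive of the whole repository directory."""
    repo_path = os.path.abspath(repo_path)
    if not os.path.isdir(os.path.join(repo_path, "db")):
        raise ValueError("%s does not look like a repository" % repo_path)
    with tarfile.open(archive_path, "w:gz") as archive:
        # Store members under the repository's own directory name
        # rather than under its absolute path.
        archive.add(repo_path, arcname=os.path.basename(repo_path))
    return archive_path
```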

  • Check your repository format version and mode with fsfs-reshard.py repository
    No change is made to the repository yet. A 1.5 or 1.6 repository can be in linear mode, sharded mode, or sharded mode with packed shards.
  • Estimate packed shard file sizes with fsfs-reshard.py repository target=1000
    No change is made to the repository yet. The default Subversion shard size is 1000 revisions per shard.
    It may not suit your specific repository if the files become too large after packing.
  • When you are happy with the target shard size, run fsfs-reshard.py repository 1000
    The job is done: repository revisions are now grouped into shards of 1000.
  • Check your repository format version and mode with fsfs-reshard.py repository
    No change is made to the repository. You should get
    Current FSFS db format version 4 with sharded layout, max files per shard: 1000.
    and the list of effective shard sizes.
  • To improve disk access, you can pack complete shards into large files with svnadmin pack repository
  • Check your repository format version and packed mode with fsfs-reshard.py repository
    No change is made to the repository.
  • Run svnadmin verify repository to check integrity.
    If it fails, you will probably have to restore your repository backup and use the dump/load method.
  • Set the repository online again.
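
The estimate made in the second step can be approximated by hand. The sketch below is not the script's implementation. It assumes that a packed shard is roughly as large as the sum of the revision files it contains, and that only complete shards are packed. The names estimate_pack_sizes and rev_sizes (a mapping from revision number to revision file size in bytes) are hypothetical.

```python
def estimate_pack_sizes(rev_sizes, shard_size):
    """Return {shard_number: total_bytes} for every complete shard.

    rev_sizes maps revision numbers to revision file sizes in bytes.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be a positive integer")
    if not rev_sizes:
        return {}
    totals = {}
    for rev, size in rev_sizes.items():
        shard = rev // shard_size
        totals[shard] = totals.get(shard, 0) + size
    # The last shard is complete only if the youngest revision fills it.
    youngest = max(rev_sizes)
    if (youngest + 1) % shard_size != 0:
        totals.pop(youngest // shard_size, None)
    return totals
```

Running it with several values of shard_size on the same rev_sizes shows how the largest pack file grows with the shard size, which is the trade-off to weigh before resharding.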

I hope the script and procedures help you get the best out of Subversion.

Comments and feedback are welcome.
