Tuesday, August 07, 2007

What version control systems are missing

Sorry again for the long delay. I meant to blog earlier and then went on vacation.

Anyway, I've been looking into different version control systems lately, and I find that two things are missing. The first is a way to make them more reliable. The second is a way to make them perform better when dealing with a central repository.

On the first point: a version control system typically has either a central server, as with Subversion or CVS, or a set of distributed nodes, as with git or Bazaar. What happens if the central server or a node goes down irrecoverably? The usual answer is to restore from a backup, which may not have all of the recent changes.

On the second point, which may not be as big an issue, you may simply have a very busy server. Take, for example, the git repository that stores the entire Linux kernel. Granted, the load will most likely be mostly reads, but that can still slow things down.

Replication, I think, would help with both of these, especially for central servers or nodes where a lot of changes occur: a replica gives you an up-to-date copy to fall back on, and reads can be spread across replicas. The small solution would be to use Oracle's BerkeleyDB, which supports replication and is very lightweight. Alternatively, though this is less practical for distributed systems, you could build the system on top of a relational database such as MySQL. Then you get both replication and clustering.
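To make the idea a bit more concrete, here is a rough sketch of a repository whose objects live in a database table and are copied to a replica. Everything in it is an assumption for illustration: Python's built-in sqlite3 module stands in for MySQL or BerkeleyDB, the `objects` table and the `put`, `get`, and `replicate` helpers are hypothetical names, and a real back end would handle replication itself rather than through a hand-written copy loop like this one.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per stored object, keyed by its content hash.
# The seq column gives the replica a cursor for "what have I already copied?".
SCHEMA = """
CREATE TABLE IF NOT EXISTS objects (
    seq  INTEGER PRIMARY KEY AUTOINCREMENT,
    sha  TEXT UNIQUE NOT NULL,
    data BLOB NOT NULL
)
"""

def open_store(path=":memory:"):
    """Open (or create) an object store; sqlite3 is only a stand-in back end."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def put(conn, data: bytes) -> str:
    """Store a blob under its SHA-1 and return the hash. Duplicates are ignored."""
    sha = hashlib.sha1(data).hexdigest()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO objects (sha, data) VALUES (?, ?)",
            (sha, data),
        )
    return sha

def get(conn, sha: str):
    """Return the blob stored under sha, or None if this store lacks it."""
    row = conn.execute(
        "SELECT data FROM objects WHERE sha = ?", (sha,)
    ).fetchone()
    return row[0] if row else None

def replicate(primary, replica) -> int:
    """Copy rows the replica has not seen yet; return how many were read."""
    last = replica.execute(
        "SELECT COALESCE(MAX(seq), 0) FROM objects"
    ).fetchone()[0]
    rows = primary.execute(
        "SELECT seq, sha, data FROM objects WHERE seq > ? ORDER BY seq",
        (last,),
    ).fetchall()
    with replica:
        replica.executemany(
            "INSERT OR IGNORE INTO objects (seq, sha, data) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)

if __name__ == "__main__":
    primary, replica = open_store(), open_store()
    sha = put(primary, b"first revision of README")
    replicate(primary, replica)
    # If the primary were lost here, the replica could still serve the object.
    print(get(replica, sha))
```

The point of the sketch is only that once the objects are rows in a database, keeping a second copy current is a matter of shipping new rows, and reads can go to either copy.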

The main reason I'm thinking this way is that by using a back end like BerkeleyDB or MySQL, the version control system doesn't have to deal with many of the low-level issues itself. These back ends also have built-in technology that allows for faster recovery. To me, that can be very important in a corporate or open-source environment.

Food for thought.

