Friday, August 31, 2007

Who's right when it comes to version control?

There's been a bit of discussion about whose tools can piss the furthest...I mean, whose tool is the better version control tool. The discussion is mainly between Subversion and Git and which approach is right. Subversion relies on a centralized repository. Git is a distributed system.

So, which is better? I say both are good.

Here's my thinking. First, it depends on the organization and which tool fits in better with its business model. A distributed system may not be what the organization wants. Second, they're both good tools, though neither is perfect. Git is better than Subversion for things like branching and merging; however, I really like Subversion's diff algorithm and the fact that it stores diffs instead of snapshots. Git also doesn't have a horde of annoying little hidden subdirectories strewn throughout your directory structure. However, Subversion has several good GUI tools that make it much easier for people who aren't willing to use the command line. TortoiseSVN is a great tool for Windows users.

I've used both and I like both. Git, I think, is a bit easier to use, but there's nothing wrong with Subversion. In the end, it's what works best for you.


More LaTeX goodness

Yeah, I'm boring, but I figured I'd pass along some tidbits. Most of my technical goodies I discover or do at work, which really limits what I can blog about. However, this one I can.

You see, I don't like MS Word, and lord knows how many people feel the same. So, when I had to do some documentation recently, like documenting potential requirements or some design ideas, I decided to skip Word and use LaTeX instead. The way I figure it, it won't go to a customer, so why should I bother? Besides, I seem to get it done faster in gVim, and I want to get some practice for when I really decide to use LaTeX in earnest.

Overall, I must say it worked well. Images are included very easily. Just needed some tweaking to get the sizes right and all is good. I even got one image to sit on the right-hand side of the document with the text wrapping around it. No fighting or anything. It just works. Sweet!
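For anyone curious what the wrapped image looks like in source, here is a minimal sketch. It assumes the wrapfig package, which is one common way to do this and not necessarily the one used for the document described above. The file name example-image is a placeholder (it ships with the mwe package) and the widths are arbitrary.

```latex
\documentclass{article}
\usepackage{graphicx} % provides \includegraphics
\usepackage{wrapfig}  % assumed package: provides the wrapfigure environment

\begin{document}

% {r} puts the figure on the right; 0.4\textwidth is the space reserved for it
\begin{wrapfigure}{r}{0.4\textwidth}
  \centering
  % placeholder image name; replace with your own file
  \includegraphics[width=0.38\textwidth]{example-image}
  \caption{Placeholder diagram.}
\end{wrapfigure}

The paragraph that follows the environment flows around the image on the
left-hand side. Tweak the two widths above until the figure and the
surrounding text sit the way you want them.

\end{document}
```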

Now, I did several documents recently using LaTeX. One was an outline that I felt would be better to do on a computer than on paper, since it's easier to reorganize. Today, however, I was documenting some potential requirements. Nothing big. I figured I'd do it as a table and, when done, get it into an RTF document and finally a Word document. Needless to say, it was pretty damn easy. The only trouble I came across was that the one table I created was pretty long. Turns out there's a package that comes with MiKTeX, and maybe regular LaTeX, called longtable that handles that. Here's a link:
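For anyone who hasn't used it, a minimal longtable sketch looks something like the following. The column layout and the requirement rows are made up for illustration; they are not the actual table from work.

```latex
\documentclass{article}
\usepackage{longtable} % tables that can break across pages

\begin{document}

% one narrow ID column and one wrapped paragraph column
\begin{longtable}{|l|p{0.7\textwidth}|}
\hline
ID & Requirement \\
\hline
\endfirsthead % everything above is the header on the first page
\hline
ID & Requirement (continued) \\
\hline
\endhead      % header repeated on every following page
\hline
\endfoot      % rule drawn at the bottom of each page of the table
R1 & Hypothetical requirement text goes here. \\
R2 & Another hypothetical requirement. \\
\end{longtable}

\end{document}
```

The two header definitions are how longtable knows what to print at the top of the first page and what to repeat on the pages after it.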

The only other thing I did was find a LaTeX class that makes a document look like it was formatted in Word. I think it was called wordstyle, but I can't remember for sure.

The result was pretty good. I had to adjust some margins and the table cell widths. The only real bug was that the LaTeX-to-RTF tool didn't understand the longtable package, so I ended up with the header displayed twice. Whoopee. I deleted the extra row and went on my merry way.

This will be strange to say, but I enjoyed the experience. This either means that I really like working with LaTeX or I'm just that bored at work. Either way, it seems much easier and nicer to work on documents in gVim than MS Word or even, which is relatively sane.


Friday, August 10, 2007

Data mining done right.

This is fantastic. Law enforcement actually doing data mining properly and with good results.


Tuesday, August 07, 2007

What version control systems are missing

Sorry again for the long delays...meant to blog earlier and then went on vacation.

Anyway, I've been looking into different version control systems lately and I find that there are two things missing. First is a way to make them more reliable. Second is a way to make them perform better when dealing with a central repository.

To address the first, when you have a version control system, you typically have either a central server, as with Subversion or CVS, or a series of distributed nodes, as with Git or Bazaar. What happens if the central server or a node goes down irrecoverably? One answer is to go back to a backup, which may not have all of the recent changes.

To address the second, which may not be as big of an issue, there is the possibility that you may have a very busy server, for example the one hosting the Git repository that stores the entire Linux kernel. Granted, the load will most likely be primarily reads, but that can still slow things down.

Replication, I think, would help with both of these, especially for centralized servers or nodes where a lot of changes occur. The small solution would be to use Oracle's BerkeleyDB, which supports replication and is very lightweight. Also, though this is less practical for distributed systems, you could build the system on top of a relational database, such as MySQL. Then you can get both replication and clustering.

The main reason I'm thinking this way is that by using a back-end like BerkeleyDB or MySQL, you don't have to deal with many of the low-level issues. Also, they have the built-in technologies to allow for faster recovery. To me, this can be very important in a corporate or open-source environment.

Food for thought.