Sunday, December 28, 2008

Creating More Reliable Software With Good Concurrency

First off, hope everyone had a good holiday season.

Second, something has been on my mind of late: creating better quality software that has good concurrency. You see, I'm working on a system right now where high availability is a major concern, at least for me. Also, with multi-core systems becoming the norm, it would be good to be able to take advantage of these systems without running into more problems.

I have been contemplating this for a while, and even though I feel I am good enough to do it in just about any language, I'm a bit nervous about doing it correctly, at least when it comes to concurrency. It appears that there are a variety of solutions, and not all of them work as well as we would like them to. Also, to me, the language should make it easy to do this without the programmer having to ensure they don't make a mistake. This isn't quite like C, where accidentally using = instead of == causes unexpected behavior. In that case, C is allowing you to be more expressive if you choose. Do it incorrectly and yes, you'll shoot yourself in the foot. Use it correctly, and your code can be more compact and elegant. There's a benefit to doing things in the not-so-clear way. With concurrency, I don't see a benefit to doing it any way but the correct way.

Now, concurrency, from what I can tell, comes in two different forms. The first is threading/forking, where you run multiple "processes" on the same machine. This is what I was talking about above, and it is considered difficult, mainly because of the sharing of data between "processes." The second is running a job over multiple machines. The same issues hold true, but they appear to be more easily managed. I did this on a project, but I relied on a central database to manage the jobs. By leveraging transactions, I ensured that no two machines worked on the same set of data. This works great, as long as your central server is up and running. Clustering would remedy that, but that doesn't make it a good general-case solution for all problems. For what I was doing, this worked perfectly because I was expecting each server to perform some long-running processes, so the overhead of connecting to a central database had a minimal impact on performance.

Making more reliable software is much easier than getting concurrency right, though we as an industry still fail at it. We have many tools, such as unit testing libraries, static analysis tools, and debuggers, as well as techniques for improving software quality. My concerns are more along the lines of high availability. This doesn't just mean software bugs; it's also about ensuring that if something beyond the programmer's control goes wrong, the system can try to remain available.

Initially, I was looking into D to help with this. It has a number of features that help, but I'm not 100% sure the threading model is ideal with respect to keeping me out of trouble. I also looked into Haskell, which essentially forces you into doing good things by virtue of being functional, and its use of software transactional memory is impressive. It wasn't quite what I was looking for, though I didn't realize that for some time.

I finally found what I think is the right answer: Erlang. You see, the more I read about it and tried to understand it, the more I think it's what I'm looking for. Now, I still have to actually develop some code in it and understand it some more, but the design seems to be just what I want. Each "process" is completely independent of the others and communicates with other "processes" through message passing. Therefore, there are no locking issues, which seems to make things easier. Also, these processes can communicate over multiple machines without any special coding, or at least that's how it seems.
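Since I haven't written any real Erlang yet, here is a rough sketch of the message-passing style in Python instead, just to show the shape of it. This is not how Erlang does it under the hood: Python threads share memory, so the isolation here is only by convention, and the counter and run_demo names are invented for the example. The worker owns its state, and the only way to touch that state is to send it a message.

```python
import queue
import threading


def counter(inbox):
    """A worker that owns `total`; others reach it only through messages."""
    total = 0
    while True:
        msg = inbox.get()
        kind = msg[0]
        if kind == "add":
            total += msg[1]
        elif kind == "get":
            # The sender passes its own mailbox so we can reply to it.
            msg[1].put(total)
        elif kind == "stop":
            return


def run_demo():
    inbox = queue.Queue()
    worker = threading.Thread(target=counter, args=(inbox,))
    worker.start()

    # No locks in our code: we only send messages, and the queue
    # delivers them to the worker in the order they were sent.
    for n in (1, 2, 3):
        inbox.put(("add", n))

    reply = queue.Queue()
    inbox.put(("get", reply))
    total = reply.get(timeout=5)

    inbox.put(("stop",))
    worker.join()
    return total


if __name__ == "__main__":
    print(run_demo())
```

Nothing outside counter ever reads or writes total directly, which is why there is nothing for me to lock. The queue does its own locking internally, of course, but that's the library's problem and not mine.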

Erlang is also supposed to have very good error handling, as it was designed for systems that require high availability. I haven't looked into it yet, but I'm hoping it's pretty easy as well. A good feature that I definitely will be exploring is its ability to be upgraded in place. This is very nice for when you have to perform an upgrade but can't take the system down for very long, if at all. Needless to say, these interest me, especially when I look at what I'm doing at work.

My plan right now is to first try out the communication between processes on a single machine, then across a couple. Not really sure how I'll pull that off other than I may just have to run a couple VMs sometime. Then I'll look into the error handling and upgrades. I'm seriously considering working on a project using Yaws sometime, but that's another post. Web applications are probably where this is going in terms of projects, but I don't know if I'll get to do it at work.

My main concerns are database connectivity and integrating with C libraries. The people behind Erlang seem to be very honest about it not being the best tool for every job, which I appreciate, and which is why they designed it to access C libraries. In fact, that may be how I end up communicating with a database. I'm not really sure, as there are several libraries for Erlang that do it already. I just don't want these to be the cause of a performance issue.

Anyway, wish me luck in my endeavor as I'm doing this in part to improve software quality at work and in part to make me a better programmer. Oh, and Happy New Year!
