Sunday, December 28, 2008

Creating More Reliable Software With Good Concurrency

First off, hope everyone had a good holiday season.

Second, something has been on my mind of late: creating better quality software that has good concurrency. You see, I'm working on a system right now where high availability is a major concern, at least for me. Also, with multi-core systems becoming the norm, it would be good to be able to take advantage of these systems without running into more problems.

I have been contemplating this for a while, and even though I feel I could do it in just about any language, I'm a bit nervous about doing it correctly, at least when it comes to concurrency. There appear to be a variety of solutions, and not all of them work as well as we would like. Also, to me, the language should make it easy to do this without the programmer having to ensure they don't make a mistake. This isn't quite like C, where accidentally using = instead of == causes unexpected behavior. In that case, C is allowing you to be more expressive if you choose. Do it incorrectly and, yes, you'll shoot yourself in the foot. Use it correctly, and your code can be more compact and elegant. There's a benefit to doing things in the not-so-clear way. With concurrency, I don't see a benefit to doing it any way but the correct way.

Now, concurrency, from what I can tell, comes in two different forms. The first is threading/forking, where you run multiple "processes" on the same machine. This is what I was talking about above, and it's considered difficult, mainly because of the sharing of data between "processes." The second is running a job over multiple machines. The same issues hold true, but they appear to be more easily managed. I did this on a project, but I relied on a central database to manage the jobs. By leveraging transactions, I ensured that no two machines worked on the same set of data. This works great, as long as your central server is up and running. Clustering would remedy that, but it still doesn't make this a good general solution for all problems. For what I was doing, it worked perfectly because each server performed some long-running processes, so the overhead of connecting to a central database had a minimal impact on performance.

Making software more reliable is the much easier part, though we as an industry still fail at it. We have many tools, such as unit testing libraries, static analysis tools, and debuggers, as well as techniques for improving software quality. My concerns are more along the lines of high availability. That doesn't just mean software bugs; it's also about ensuring that if something beyond the programmer's control goes wrong, the system can try to remain available.

Initially, I was looking into D to help with this. It has a number of features that help, but I'm not 100% sure its threading model is ideal with respect to keeping me out of trouble. I also looked into Haskell, which essentially forces you into doing good things by virtue of being functional, and its use of software transactional memory is impressive, but it wasn't quite what I was looking for, though I didn't realize that for some time.

I finally found what I think is the right answer: Erlang. The more I read about it and try to understand it, the more I think it's what I'm looking for. Now, I still have to actually develop some code in it and understand it better, but the design seems to be just what I want. Each "process" is completely independent of the others and communicates with other "processes" through message passing. Therefore, there are no locking issues, which seems to make things easier. Also, these processes can communicate across multiple machines without any special coding, or at least that's how it seems.
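To make that concrete for myself, here's the kind of thing I have in mind, just a rough sketch based on my reading so far (the module and function names are mine):

```erlang
%% pingpong.erl - a minimal sketch of message passing between two processes.
-module(pingpong).
-export([start/0, pong/0]).

start() ->
    Pong = spawn(fun pong/0),          % each process gets its own heap; nothing is shared
    Pong ! {ping, self()},             % data only moves by sending messages
    receive
        pong -> io:format("got pong~n")
    after 1000 ->
        io:format("no reply~n")
    end.

pong() ->
    receive
        {ping, From} -> From ! pong    % reply to whoever pinged us
    end.
```

The part I like is that pong/0 never sees any of start/0's variables; the only way data moves between the two is through the messages themselves.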

Erlang is also supposed to have very good error handling, as it was designed for systems that require high availability. I haven't looked into it much yet, but I'm hoping it's pretty easy as well. A feature I definitely will be exploring is its ability to be upgraded in place. This is very nice when you have to perform an upgrade but can't take the system down for very long, if at all. Needless to say, these things interest me, especially when I look at what I'm doing at work.
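From what I've read, a lot of the error handling comes down to linking processes, letting them crash, and having something else restart them; real systems apparently use OTP supervisors for this, but a hand-rolled sketch of the idea (names are mine, untested) might look like:

```erlang
%% watchdog.erl - rough sketch of the "let it crash and restart it" style,
%% plus the fully-qualified call trick I've seen for in-place upgrades.
-module(watchdog).
-export([start/0, worker/0]).

start() ->
    spawn(fun() -> supervise() end).   % the watchdog runs as its own process

supervise() ->
    process_flag(trap_exit, true),     % turn a linked crash into an 'EXIT' message
    Pid = spawn_link(fun worker/0),
    register(worker_proc, Pid),        % so the shell can do: worker_proc ! some_work
    receive
        {'EXIT', Pid, Reason} ->
            io:format("worker died (~p), restarting~n", [Reason]),
            supervise()                % restart the worker instead of going down
    end.

worker() ->
    receive
        crash -> exit(boom);           % worker_proc ! crash. simulates a failure
        Msg   -> io:format("working on ~p~n", [Msg]),
                 ?MODULE:worker()      % fully-qualified call, so a newly loaded
    end.                               % version of the module gets picked up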

My plan right now is to first try out communication between processes on a single machine, then across a couple of machines. I'm not really sure how I'll pull that off, other than I may just have to run a couple of VMs at some point. Then I'll look into the error handling and upgrades. I'm seriously considering working on a project using Yaws sometime, but that's another post. Web applications are probably where this is going in terms of projects, but I don't know if I'll get to do it at work.
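For the multi-machine part, my understanding is that it mostly comes down to starting named nodes (which could just be my two VMs) and addressing a registered process as {Name, Node}. Roughly, and with made-up node names:

```erlang
%% dist_demo.erl - sketch of sending across nodes. Start two nodes, e.g.
%%   erl -sname a -setcookie demo     and     erl -sname b -setcookie demo
-module(dist_demo).
-export([listen/0, say/2]).

listen() ->                            % on node a: dist_demo:listen().
    register(listener, self()),
    receive
        Msg -> io:format("got ~p~n", [Msg])
    end.

say(Node, Msg) ->                      % on node b: dist_demo:say('a@myhost', hello).
    {listener, Node} ! Msg.            % same send operator, just addressed to another node
```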

My main concerns are database connectivity and integrating with C libraries. The people behind Erlang seem to be very honest about it not being the best tool for every job, which I appreciate, and which is why they designed it to interface with C libraries. In fact, that may be how I end up communicating with a database. I'm not really sure, as there are already several Erlang libraries that do it. I just don't want these to be the cause of a performance issue.
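For a first database experiment, OTP apparently ships an odbc application, so something roughly like this might be the starting point (the DSN and query are made up, and I haven't measured anything about performance yet):

```erlang
%% db_check.erl - rough, untested sketch using OTP's odbc application.
-module(db_check).
-export([run/0]).

run() ->
    odbc:start(),                                            % start the odbc application
    {ok, Conn} = odbc:connect("DSN=mydb;UID=me;PWD=secret", []),
    Result = odbc:sql_query(Conn, "SELECT COUNT(*) FROM jobs"),
    io:format("~p~n", [Result]),
    odbc:disconnect(Conn).
```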

Anyway, wish me luck in my endeavor as I'm doing this in part to improve software quality at work and in part to make me a better programmer. Oh, and Happy New Year!


Wednesday, December 03, 2008

Portability via Stored Procedures

This is a thought that hit me just yesterday: if you encapsulate all of your queries in stored procedures and build your app correctly, I think you can increase its portability. If you use something like Perl's DBI, you can treat every RDBMS identically from an API perspective. So, if all your application does is call stored procedures, then all you should have to change is the driver/connection string, and you're done. Your stored procedures hold all of the queries and still let you use vendor-specific features to be as optimized as possible. Granted, you still have to maintain a set of stored procedure scripts per vendor, but chances are you'll have to maintain SQL scripts anyway for table creation, tablespaces, etc.
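Since I've been poking at Erlang's odbc application anyway, here's roughly what I mean; the same sketch translates directly to Perl's DBI. The SQL lives in the stored procedure, and the only vendor-specific piece on the application side is the connection string (the procedure and names here are made up):

```erlang
%% proc_call.erl - sketch of the idea: all queries live in stored procedures,
%% so the application code only varies by connection string.
-module(proc_call).
-export([find_user/2]).

%% ConnStr is the only vendor-specific piece,
%% e.g. "DSN=mysql_box;..." vs "DSN=pg_box;..."
find_user(ConnStr, UserId) ->
    {ok, Conn} = odbc:connect(ConnStr, []),
    %% "{call ...}" is the ODBC escape for invoking a stored procedure
    Result = odbc:param_query(Conn, "{call find_user(?)}",
                              [{sql_integer, [UserId]}]),
    odbc:disconnect(Conn),
    Result.
```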

I bring this up because in every discussion I've seen related to databases, stored procedures, and portability, I've never seen this suggested.


Current MySQL Release Comments

Well, MySQL 5.1 has been released, and it's apparently not up to par with previous releases. While we all know that no software is completely bug free, unless of course you're talking about NASA (WOW!), apparently some major bugs did slip through. Granted, these are edge cases if my reading is correct; however, it is cause for concern.

So, does this mean that the anti-MySQL crowd should rejoice and say, "Hah! Ours is better!"? No. First, there are still things you can do with MySQL that work just fine and can't be done with other RDBMSs. Second, those systems have their problems as well. While vendors like Oracle, and I think even the PostgreSQL people to a degree, would love to say that theirs is the end-all, be-all of databases, it isn't.

Now, how does this impact my view of things? Well, it was kind of a shot in the arm to look into other solutions a bit more. Yes, I will most likely still be using MySQL for some projects, but oddly enough, I now have an interest in PostgreSQL, mainly for some of the more programmatic things it can do. It has its warts, but I think it brings a lot of nice things to the table. Also, I'm now more interested in Drizzle.

Maybe I should look into Firebird as well. Anyone know what the benefits are of it compared to PostgreSQL?
