Sunday, July 26, 2009

Language choice a micro-optimization?

Last weekend I read an article, whose link I wish I had kept, arguing that choosing a language for speed reasons is a micro-optimization. If I recall correctly, the article led one to believe that the time a developer spends creating an application is more important than the speed of the application itself. Granted, there are situations where this is true; however, I have to disagree with the article because I don't believe it's a micro-optimization, but a macro-optimization.

First, there are applications where speed is of the utmost importance. Let's take a look at a common one: networking. Let's say you have two choices: Python and C. Which would you rather use to write the code to send/receive messages over a network? Personally, I'd use C. Overhead can be a killer, especially if you consider that there are applications, like web servers and databases, where you have to handle a large number, potentially thousands, of requests per second. Wouldn't you want your networking code to handle all of these requests as efficiently as possible? With Python, you have some overhead by virtue of it being a dynamic language with a garbage collector and, for lack of a better term, enhanced data types that make it easier to write programs. All of this imposes overhead on the runtime of the program. Let's say it's only 30%. Would you want to lose 30% of your network performance just to save a few days of development time?

Second, there's more to performance than just computational speed. What about memory usage? Those who have followed this blog will know I reduced the memory usage of a program I wrote by switching from Perl to D. For the dataset I was testing with, this means I went from 600MB of memory to only 60MB. That's a great nicety when you have multiple applications running on a system.

To me, these are not micro-optimizations, they're macro. It makes me wonder how many applications written under the other author's assumption would be better if the person or persons who wrote the software had taken the time to consider the impact of the language on the runtime performance of the app. One that I used would crash every time I tried to load a second large data set after I had loaded, examined, and closed a first one. It was written in Java and it would not free the memory it had already used, even though it was not using that data anymore. I'm not 100% sure it was because of the Java runtime, but I do know that Perl generally doesn't return memory it has allocated to the operating system until the program ends.


Monday, July 13, 2009

Parallel Regular Expression Engine

Here's an interesting use for multi-core/multi-processor programming: parallel regular expression engines. Granted, this only helps with certain regular expressions, but when a regex has to choose between two or more different paths, trying them one after another can slow matching down, especially if the pattern is complicated. If you instead try each branch in parallel, the paths that don't match simply stop processing. At least this is how I see it working, as I'm thinking in terms of Erlang-style concurrency. From the perspective of a programmer, I don't care about the paths that fail; I only care about the path that succeeds.

Now, I'm not 100% sure how much of an improvement this would give us in the normal cases; however, more complicated patterns may see some significant benefit, perhaps even becoming more practical. By more practical, I mean more likely to run in a reasonable amount of time.
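To make the idea a little more concrete, here is a rough sketch in Python of what "try each branch in parallel and keep whichever one succeeds" could look like. It is only an illustration of the structure, not a real regex engine: the function name parallel_alternation is made up for this example, the branches are passed in as separate patterns rather than being parsed out of a single regex, and in CPython the threads will not actually run the matches simultaneously because the standard re module holds the interpreter lock while matching.

```python
import re
from concurrent.futures import ThreadPoolExecutor, as_completed


def parallel_alternation(branches, text):
    """Try each alternative of a hypothetical pattern (b1|b2|...) in its own worker.

    Returns (branch, matched_text) for the first worker that reports a
    match, or None if every branch fails. Both the function and its
    interface are assumptions made for illustration.
    """
    if not branches:
        return None

    def try_branch(branch):
        # Each worker only knows about its own branch.
        m = re.fullmatch(branch, text)
        return (branch, m.group(0)) if m else None

    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(try_branch, b) for b in branches]
        for fut in as_completed(futures):
            result = fut.result()
            if result is not None:
                # A branch succeeded; ask any branch that has not
                # started yet to give up. Running ones cannot be
                # interrupted here and simply finish on their own.
                for other in futures:
                    other.cancel()
                return result
    # Every branch failed, which is the only case where all of the
    # work was needed.
    return None


branches = [r"\d{4}-\d{2}-\d{2}", r"[a-z]+@[a-z]+\.com", r"0x[0-9a-f]+"]
print(parallel_alternation(branches, "0xdeadbeef"))
```

One design consequence worth noting: this returns whichever branch finishes first, whereas a typical backtracking engine prefers the leftmost alternative that matches, so the two can disagree when more than one branch matches the same text.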


Tuesday, July 07, 2009

We need MySQL/Drizzle

Yes, I said it. We need them. I came to this realization yesterday when I saw an article about TokuDB and how it apparently can save you money by being energy efficient. The premise is simple: fewer disk seeks = less energy used.

This type of development is just cool and I don't see it with any other product. This is not to say that PostgreSQL and Firebird aren't cool. MySQL and Drizzle just provide opportunities to do some very cool things because they're both very modular. If we didn't have them, I think the DB world would be more boring. I just hope we can get to a state where we have a lot of the nice features that a PostgreSQL or Oracle has and still have the flexibility of a MySQL or Drizzle.
