Thursday, August 25, 2011

Optimization bad? No!

I recently read another article that has an anti-optimization slant to it. In this case, it argues that students should not learn the various little tricks that shave a few nanoseconds off the time it takes to execute a block of code. While the post did have a point, it still angered me.

You see, there have been a number of articles expounding that readability, and probably a few other aspects of coding, are much more important, and that we should listen to Knuth and not optimize! This recent article even made it sound like such optimizations aren't important anymore, and that's what truly got my goat.

Let me phrase this as succinctly as possible: optimizations are at least as important as, if not more important than, they were years ago. These tricks were originally meant for early computers, where saving a few clock cycles made a large performance improvement. Now, with such fast processors, the theory goes that we don't have to save a couple of clock cycles here and there because the amount of time saved is inconsequential. Hate to say it, but that theory isn't true.

You see, software has gotten bigger and bigger, and by getting bigger it has accumulated many components that could be optimized. Because each component is such a small part of the system, the theory says none of them is worth optimizing. However, these optimizations add up and, in many cases, may live in a library used by many parts of the system.

Now, let's take a step back and look at the reasons for optimization: performance, bandwidth, and memory. Most people talk about performance optimizations, where we attempt to reduce the amount of time it takes to execute a specific task. And yes, in many cases it probably doesn't matter too much, but to state that one should not learn some of the little tricks to save a few clock cycles is very mistaken.
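As a concrete (if tiny) example of the kind of trick I mean, here's a sketch in Python of classic strength reduction: replacing a modulo by a power of two with a bitwise AND. (The function names are mine, purely for illustration. In C or Java this saves a division instruction, and a good compiler often does it for you, but you still need to know the trick to read or hand-tune hot code.)

```python
# Strength reduction: for non-negative x and a power-of-two divisor,
# x % 8 is the same as x & 7 (the mask keeps the low three bits).
def bucket_mod(x):
    return x % 8

def bucket_mask(x):
    return x & 7

# The two agree for all non-negative inputs.
for x in range(1000):
    assert bucket_mod(x) == bucket_mask(x)
```

Trivial on its own, yes, but applied inside a loop body that runs billions of times, these are exactly the cycles the rest of this post is about.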

There are two great examples of where these little tricks are still applicable: big data and real-time processing. Systems like Hadoop process terabytes of data and more, so squeezing out every bit of performance means the jobs get done faster and potentially with less hardware. As for real-time systems, a great article was recently written about the algorithms used on Wall Street and how those companies need results as fast as possible. They even eliminated firewalls in the name of performance!

Another example of where performance matters is web servers, though not in the way you may think. Currently, most web sites use interpreted languages because they're easier to set up and use. However, interpreters incur a cost in terms of performance and other factors, so in some cases there's a shift toward compiled languages. Facebook has even mentioned creating a compiler that compiles PHP code down to executable code. (There may be an intermediate step where it's compiled to C.) The savings in these cases aren't just in performance, but in power usage: the faster a request is completed, the fewer resources are used. That can mean fewer servers, reducing their energy footprint.

Memory usage is probably the one case where more optimization is desperately needed but not done. How many applications these days use significantly more memory than their previous versions? How many apps are written in an interpreted language, including Java, where users must account for the virtual machine as well as the application itself? How often is memory simply allocated poorly? Many Linux distros can run well on a system with 1GB of RAM, but recent versions of Windows require 1GB and don't necessarily run well without more.

By optimizing memory usage, you allow more programs to run concurrently without as much contention. You can also use cheaper computers to perform the same tasks, or, better yet, keep more data in memory for those data-intensive tasks. Personally, if I could find a browser that kept its memory usage small and was still fully functional, I would be very happy, as it would let me keep my browser running in my little VM while I'm doing my development.
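To make the memory point concrete, here's a small Python sketch (the variable names and the million-element figure are my own illustration): storing a million integers as a packed array instead of a list of boxed objects roughly halves the container's footprint, and that's before counting the per-object cost of the boxed ints themselves.

```python
import array
import sys

n = 1_000_000
nums_list = list(range(n))               # one 8-byte pointer per element, each to a boxed int
nums_array = array.array('i', range(n))  # raw 4-byte machine ints, packed contiguously

# getsizeof counts only the containers; the list's boxed ints cost even more on top.
print(sys.getsizeof(nums_list))   # roughly 8 MB of pointers on a 64-bit build
print(sys.getsizeof(nums_array))  # roughly 4 MB of packed values
```

Same data, same program logic, half the memory: exactly the kind of win that lets more processes coexist on one box.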

Part of my job is to create software that analyzes data in batches. The server it runs on isn't all that big and has several processes running on it. The one aspect of the system in our favor is that it's multi-core, so running processes in parallel doesn't hurt CPU performance too much, but there's still the issue of memory. So my software is optimized to use as little memory as possible, which also leaves more memory for the file cache and helps keep file system performance reasonable.

Lastly, there's bandwidth optimization. Here we look at things like compression, JSON vs. XML, and the like. This works both for file system bandwidth and for network bandwidth: reading compressed data off a disk means fewer disk seeks, and compressed data over the network uses fewer packets. The short of it is, the faster we get data to where it can be used, the better.
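As a quick illustration, using Python's standard zlib and some made-up records: compressing a repetitive JSON payload before it hits the disk or the wire shrinks it to a fraction of its original size.

```python
import json
import zlib

# A batch of repetitive records, like a typical API response or log dump.
records = [{"id": i, "name": "user%d" % i, "active": True} for i in range(1000)]
raw = json.dumps(records).encode("utf-8")
packed = zlib.compress(raw)

print(len(packed), "vs", len(raw))      # the compressed form is far smaller
assert zlib.decompress(packed) == raw   # and the round trip is lossless
```

The CPU cost of the compression is usually tiny next to the seeks and packets it saves, which is the whole point of this kind of optimization.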

Now, most optimizations should occur at the algorithmic level. These are things like using a merge sort vs. a quicksort for sorting a multi-gigabyte file, or how one distributes work over multiple cores or machines. However, the lower-level optimizations are still important. Isn't it important to save a few clock cycles per record when you're processing billions or trillions of them? Isn't it important to save a few clock cycles when data needs to be converted to information as quickly as possible? If so, then where are people supposed to learn this?
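To sketch what that algorithmic-level choice looks like for a multi-gigabyte sort, here's a toy external merge sort in Python (the function names, chunk size, and temp-file handling are my own simplifications, not a production design):

```python
import heapq
import tempfile

def spill_run(sorted_chunk):
    """Write one sorted run to a temp file and return its path."""
    f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
    f.writelines("%d\n" % v for v in sorted_chunk)
    f.close()
    return f.name

def external_sort(values, chunk_size):
    """Sort a stream too big for memory: sort chunks that fit, spill each
    to disk, then k-way merge the runs. Memory stays bounded by chunk_size."""
    runs, chunk = [], []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            runs.append(spill_run(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(spill_run(sorted(chunk)))
    files = [open(r) for r in runs]
    try:
        # heapq.merge streams the merge instead of loading all runs at once.
        yield from heapq.merge(*((int(line) for line in f) for f in files))
    finally:
        for f in files:
            f.close()
```

(Cleanup of the temp run files is omitted for brevity.) This is exactly why merge sort wins here: quicksort wants random access to the whole dataset, while merge sort only ever needs sequential reads of a few runs at a time.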

And if people keep downplaying proper optimization, how hard will it become to find developers who can do proper optimization?

Please note that I'm not saying optimization is the most important part of software development. What I'm saying is that we shouldn't treat it as if it's the least important. Unless we know for certain that performance doesn't matter, usually because the code isn't run very often, we should attempt to make the software as efficient as possible. The process is relatively simple: make it right, then make it fast. Well, "efficient" would be a better word, since we know it's not always about raw speed.

Ugh...getting tired. Here's to hoping this is well written enough to be understandable. I apologize if this sounds angry, but it just burns me that optimization is becoming less and less of a priority, but then we complain about computers getting slower.

