Contrary to all reason, discussions about performance are very popular among software developers. This article explains why most of them are superfluous or premature, shows examples of such discussions, and gives tips on when and how to improve performance.
I often see software developers talking about the potential performance of different solutions. Which programming language has the better performance? How fast are if, switch or bit shifting? Is call by value better or worse than call by reference?
My answer to all of these questions: I don’t care yet! Most of these problems are of absolutely no interest to your final product. The final software will probably be fast enough, no matter which option you choose. So I won’t base my decisions on the answers to these questions.
In this article I’m not talking about team or productivity performance, I’m talking about the performance of the software product itself.
Later I’ll show you some examples to validate the above statements. I’ll explain when to care about and how to improve performance.
Justifiable performance discussions
Despite my objection to performance discussions, there are some cases where those discussions are justified. Before blindly assuming that you have one of the cases below, do some rough calculations to see if they really apply to your project.
Big amounts of data
And I mean really big amounts of data, not just a database with 10,000 users. I’m talking about operations on millions of records or more. Typical examples are social networks, search engines, and image, video and audio processing.
These problems are not solved on a technical level, but on a mathematical or conceptual level. To solve them, make yourself familiar with the meaning of big O notation and time complexities. Then choose or invent algorithms, data structures or technologies with the desired complexities for adding, removing, sorting and searching data. Estimation algorithms or distributed algorithms may also be helpful.
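How much a data structure’s complexity matters can be sketched in a few lines of Python. The example below is illustrative only (actual timings depend on your machine): it compares an O(n) membership test on a list with the average O(1) lookup of a set.

```python
import timeit

# Looking up an element: O(n) in a list vs. O(1) on average in a set.
data_list = list(range(1_000_000))
data_set = set(data_list)

needle = 999_999  # worst case for the list: the element is at the very end

list_time = timeit.timeit(lambda: needle in data_list, number=100)
set_time = timeit.timeit(lambda: needle in data_set, number=100)

# With a million elements, the set lookup is typically orders of
# magnitude faster -- a difference no micro-optimization can match.
print(f"list: {list_time:.4f}s, set: {set_time:.6f}s")
```

This is exactly the kind of difference that swamps any “if vs. switch” micro-optimization: picking the right structure changes the complexity class, not just a constant factor.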
Short amounts of time
And I mean really short amounts of time, like 100 milliseconds or less. Typical examples are real-time applications like games and video chats, or GUI response times.
Those problems are typically solved through parallelization, the choice of technology and, in rare cases, through technical hacks. To give a few examples:
- Graphics calculations are done on the graphics card instead of the CPU.
- Fast network communication uses the low-level TCP or UDP layer instead of the high-level HTTP layer.
- Today’s GUIs run in parallel with the application logic. Thus the GUI doesn’t freeze while the rest of the application is busy.
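The last point can be illustrated with plain Python threading. This is a hypothetical sketch, not tied to any particular GUI toolkit: the slow work runs on a worker thread while the “GUI” thread keeps ticking instead of freezing.

```python
import threading
import time

def slow_task(result):
    # Stand-in for heavy application logic (e.g. loading or processing data).
    time.sleep(0.2)
    result.append("done")

result = []
worker = threading.Thread(target=slow_task, args=(result,))
worker.start()

# Meanwhile the "GUI" thread keeps handling events instead of freezing.
ticks = 0
while worker.is_alive():
    ticks += 1          # stand-in for processing one user event
    time.sleep(0.01)

worker.join()
print(result[0], "after", ticks, "event-loop ticks")
```

Real GUI frameworks hide this pattern behind background workers or async callbacks, but the principle is the same: never block the event loop with slow work.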
Many requests at the same time
This is a typical problem for servers and networks. At rush hour, servers have to answer thousands of requests per second. This problem is solved by load balancing combined with parallel server clones. Networks can suffer from too many participants. This is solved by not naively connecting everyone to everyone, but by choosing sophisticated topologies instead.
The problems of data, time and request density may combine to form new problems in any combination. Some examples:
- The user of your application wants to sort a table of 1000 customers by age. There are only 1000 customers, not billions. It’s normally okay if sorting takes a second, but a minute would already be too much.
- When Windows searches for a file on your file system, it takes many minutes. What the hell are you doing Windows? You are wasting 8 GB of my disk space for some “index file”, and yet it takes you minutes instead of milliseconds?
- Your server may only get 10 requests per second, but each of them takes 10 seconds to execute. This means your server must be able to execute 100 requests in parallel, or requests will pile up.
You solve those problems by solving each of the partial problems: data, time and request density.
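The server example above is just Little’s law: the average number of requests in flight equals the arrival rate times the average service time. A quick back-of-the-envelope check in Python:

```python
# Little's law: average concurrency = arrival rate x average service time.
arrival_rate = 10   # requests per second
service_time = 10   # seconds each request takes to execute

concurrent_requests = arrival_rate * service_time
print(concurrent_requests)  # -> 100
```

A one-line calculation like this is exactly the “rough calculation” I recommended above, and it tells you whether you have a capacity problem at all before you start optimizing anything.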
Often you can wait with performance optimization until the first performance problem actually occurs or until someone complains. This happens more rarely than you’d expect. Once it happens, you can profile your application to find the exact bottleneck.
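In Python, such profiling is one import away. A minimal cProfile sketch (the function names are made up for illustration): the profiler ranks functions by time spent, so the bottleneck shows up at the top instead of being guessed at.

```python
import cProfile
import io
import pstats

def cheap_logic():
    # The part you might wrongly suspect.
    return sum(range(1_000))

def expensive_rendering():
    # Stand-in for the real bottleneck, e.g. rendering or I/O.
    return sum(i * i for i in range(200_000))

def frame():
    cheap_logic()
    expensive_rendering()

profiler = cProfile.Profile()
profiler.enable()
for _ in range(50):
    frame()
profiler.disable()

# Print the functions sorted by cumulative time: the real bottleneck
# appears at the top of the report.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Most languages have an equivalent tool (VisualVM for Java, perf for C, etc.); the point is to measure before optimizing.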
Examples of useless debates about performance
To show what I mean by useless debates, I’ll just take the latest five performance questions on Stack Overflow as random samples. I reformulated and simplified the questions to fit into this article and gave some rough answers:
Q: Which has better performance: jquery.add or $(‘#id1, #id2’)? – source
A: Who cares? Just use what’s more readable. Especially when you’re on the client side!
Q: When I cache something in Ruby, it doesn’t impact my performance. Why? – source
A: Who cares? Because simple caching is already done for you by Ruby. You tried to optimize performance too early.
Q: My Matlab matrix operation is too slow for the giant matrices I have. How to solve it? – source
A: Reasonable question! You’ve got an insufficient prototype: the implementation is too slow even for prototyping purposes. See the accepted Stack Overflow answer for some hacks that solve the problem.
Q: Python’s standard matrix operations are too slow. How to fix it? – source
A: Reasonable question! An insufficient prototype again. The problem is not really about Python; you have to find or invent a more sophisticated algorithm instead of applying many standard matrix operations by brute force… but your first shot at a prototype was good.
Q: I’m building a product search platform. Which search platform library has the better performance? One has a user interface. – source
A: Your search engine will probably never reach a size where performance is challenging for any library specialized in delivering search results quickly. So base your decision on questions other than performance.
As you can see, three out of five questions are unnecessary and answerable with “You shouldn’t care!”. These five samples may not be statistically significant, but they reflect my experience very well.
When to solve performance issues?
At the end! In requirements engineering, one distinguishes between functional and non-functional requirements. Functional requirements determine if the software does what it should do. Non-functional requirements – like performance – only determine how well software fulfills its tasks.
So before caring too much about how well you fulfill a task, you should fulfill the task in the first place. Or in other words: make it work before you make it fast. The exception to this rule are the justifiable performance discussions mentioned above.
It was hard to find a statistic that objectively shows how relevant performance is compared to other tasks in a real project. The best thing I could find was the labels GitHub uses in its issue tracker. As soon as a minimum of project management is applied (by this I mean thinking about the real issues), performance turns out to be just one label among many:
Why are performance discussions so problematic?
For several reasons. First, discussing and implementing improvements costs time. Time that could have been better spent on useful tasks, like implementing functional requirements.
Second, they shift the weight of arguments in the wrong direction. Thus, you might make decisions based on irrelevant arguments. Two examples appear among the five Stack Overflow questions above. One mistake is choosing a notation based on performance instead of readability. The other is choosing a library based on performance instead of fundamental questions like SaaS vs. your own server.
Recently, I found this video, in which a Google expert was asked to compare the performance of Java, Python and Go on Google App Engine. The Google guy did exactly what I encourage you to do: he argued about everything but performance.
And last, as a result of early performance optimization, you get messed-up software that is hard to maintain and update. Donald Knuth put it like this:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. – Donald Knuth
Examples of successfully delaying performance optimization
Although many of my own software projects are cases where performance matters, I still successfully ignored performance issues until they became urgent. Here are some examples.
In one project, I developed the prototype of an image processing algorithm. It had to identify the topology of deformed wallpaper patterns (see picture). There was no need to do it in real time, so my algorithm took a lot of inefficient, brute-force approaches. Thus I could maintain a very clean design and could change the algorithm and fix bugs very easily. It took about one hour to process an image.
To be fair, there was one bug that only occurred at the very end of that hour, so I had to wait through the full hour three times to fix it. But three hours of waiting isn’t much in a six-month project. In the end, the slow implementation was sufficient for the prototype and was never optimized later on. So any optimization taking longer than three hours would have been a waste of time.
In another project, I implemented a Minecraft clone in Java. Java is often criticized for its bad performance.
We had a 3D world consisting of 10,000 × 10,000 × 500 cubes which had to be updated and rendered every 16 milliseconds. Soon we reached the limits of performance: a world of just 100 × 100 × 10 cubes already took 60 times as long as required.
By profiling our game, we recognized that only 1% of the time was spent on updates and calculations of the world. The other 99% were spent on rendering, i.e. work done by the graphics card and the game engine. It had nothing to do with Java, nor with 99% of our code. Wild guesses about the performance issues would thus have had a 99% chance of pointing in the wrong direction. Instead, we had to deliver slightly different data to the graphics card and clone Minecraft’s chunk mechanism.
In my current project, I develop a method and tool for GUI testing which includes some capture and replay. There were already two performance debates.
Firstly, we discussed how much sense it would make to store some data in XML, since XML has bad performance. Luckily, we kept XML. It probably saved me from multiple weeks of workarounds, reinventing the wheel and writing code instead of generating it. Also, the requirements changed, and XML is now only used for persistence, not for data transfer anymore. Thus, the two seconds the tool needs for XML processing only occur when someone saves a document, which is absolutely acceptable for a prototype.
Secondly, I again use some brute-force algorithms, which slow down the capturing of test sequences. One has to wait around one second after each click and about ten seconds after stopping the capture process. This doesn’t feel very comfortable and probably needs to be corrected for a final product, but it is totally sufficient for the prototype. At the moment, it’s only about validating whether the method works at all. For a production version, many libraries will have to be changed and much code reimplemented. Thus, the performance issues may vanish on their own, just by fixing other issues.
Contrary to all reason, discussions about performance are very popular among developers. I showed some examples of such discussions, explained when they make sense and why to avoid most of them, and described when and how to do performance optimizations and when to avoid them.
Looking forward to your opinions,
Addendum: I found this tweet, which gives an idea about what can be regarded as BigData:
Event streams smaller than 100000000 items don’t deserve to be called “BigData”. You can easily handle them on 1 PC. pic.twitter.com/rwlUXyvPYy
— Rinat Abdullin (@abdullin) November 4, 2014