Thursday, January 04, 2007

The Java performance debate

Two criticisms of Java come up again and again:

  1. Java is slow
  2. Java is a memory hog

The corollary is that Java on the desktop is infeasible for those without the patience of a saint. Funnily enough, in my experience, many of the people who have told me, "Andy, I don't like Java because it's slow" are the very same people who use Perl/Python/PHP. It suggests to me that not all the criticism is entirely objective! This article aims to discuss some of the criticisms aimed at Java and see whether they are justified.

Memory

To be honest, even the most faithful Java evangelist would have trouble believing that Java is light on memory usage (relatively speaking, of course). I don't believe that it is, so I'm not going to pretend. Object-orientated languages typically have a slightly larger memory footprint, as you have to carry around a lot of information about all the objects you currently have initialised. A large factor is simply the JVM itself. A simple "Hello world!" class of, say, a kilobyte will still require the entire JVM, and the several megabytes that entails. Yet, you must think about what you get for your money, so to speak. The JVM and the Java runtime classes are feature-packed.

However, one man's feature is another man's bloat and I suppose this is where a lot of the debate stems from. When designing languages, you go either minimalist, and rely on the users to implement all the functionality they need; or you go for the opposite, and provide an extremely rich language where developers can rapidly produce software. As a Java developer, I don't need to manage my memory programmatically: I can leave it all to the garbage collector. This is great for me, but not necessarily the most efficient way to manage memory. If you want to partake in old-school manual memory-management then you can do a certain amount (such as nulling out references to objects you no longer need and then explicitly requesting a collection with System.gc(), which the JVM treats as a hint rather than a command), but nothing as low-level as what C programmers would be accustomed to. There are other things you can do as a Java programmer to reduce the overhead, such as using third-party libraries like fastutil or Javolution, which provide more efficient replacements for commonly used classes like the collections framework.

It's worth being aware that the JVM by default doesn't use all the available memory on a given system. This means that by default a Java application won't overwhelm your system and cause heavy swapping and other such nastiness. Programs known to be memory intensive by nature, like scientific applications, can in fact be limited by the JVM's conservative usage, and so many developers launch their Java apps with special JVM flags (such as -Xmx, which sets the maximum heap size) that permit Java to utilise additional memory.
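As a rough sketch of what that ceiling looks like from inside a program, the standard Runtime API reports the maximum heap the JVM will allow itself. The class name HeapReport is purely illustrative and not part of any library.

```java
// Minimal sketch: report the heap ceiling the JVM is currently working under.
// HeapReport is an illustrative name, not part of any library.
final class HeapReport {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** One-line summary of heap usage against the JVM's own limit. */
    static String summary() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();                       // ceiling (Long.MAX_VALUE if there is no limit)
        long used = rt.totalMemory() - rt.freeMemory();  // heap currently occupied
        return "heap used: " + toMiB(used) + " MiB of max " + toMiB(max) + " MiB";
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

Launching it with something like `java -Xmx2g HeapReport` should report a larger maximum than a plain `java HeapReport`; the default itself varies with the JVM version and the machine.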

Speed

To be fair, before I began looking into this, I had never thought that Java was the fastest language out there. I suppose, however, that because I'm still using Java, I believe it's fast enough. Assuming that C++ is the holy grail in terms of performance because it's so fast, then even if Java could achieve half its speed, it's still fast! People seem to get so focussed on the milliseconds that they lose perspective: if a task takes 0.01s in C++ and 0.02s in Java, then, "oh no", it's twice as slow, yet the difference is a hundredth of a second!

Yet, since mulling over this topic, I have found interestingly that Java has really made great gains in overall performance that can in fact put it on par with C++, if not a little quicker! There are various benchmarks that have reported Java algorithms running quicker than C++ equivalents. (Java Pulling Ahead, Java Faster than C++, FreeTTS case study) You will of course find many benchmarks finding the converse. What this shows is that they're at least comparable which is enough for me to imply that Java is fast, and I'll leave it to the benchmark zealots to fight over their nanoseconds.

I think Java is somehow still seen as an interpreted language; in fact, it does get compiled to native code using Just In Time (JIT) compilation. It is also a myth to think that JIT code is slower than pre-compiled code. The only difference is that bytecode gets JITed when it's required (on first use in the simplest case, although a JVM such as HotSpot interprets a method at first and compiles it once it proves to be called frequently); the compilation time is usually negligible, and the native code is then cached for subsequent calls. JIT code can benefit from all the same optimisations that pre-compiled code can get, plus some more (from Lewis and Neumann, 2004):

  • The compiler knows what processor it is running on, and can generate code specifically for that processor. It knows whether (for example) the processor is a PIII or P4, if SSE2 is present, and how big the caches are. A pre-compiler on the other hand has to target the least-common-denominator processor, at least in the case of commercial software.
  • Because the compiler knows which classes are actually loaded and being called, it knows which methods can be de-virtualized and inlined. (Remarkably, modern Java compilers also know how to "uncompile" inlined calls in the case where an overriding method is loaded after the JIT compilation happens.)
  • A dynamic compiler may also get the branch prediction hints right more often than a static compiler.
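One visible side effect of compiling at run time is a warm-up period. The following is a rough illustration rather than a rigorous benchmark: it times the same method over several rounds, and on a JIT-compiling JVM the first round is often noticeably slower than the later ones. The class name WarmUpDemo and the round sizes are arbitrary choices for this sketch, and the timings will differ between JVMs and machines.

```java
// Rough illustration of JIT warm-up: time the same method over several rounds.
// Not a rigorous benchmark; timings vary by JVM and machine.
final class WarmUpDemo {

    /** Sums the squares of 0..n-1; deliberately simple so the JIT can optimise it. */
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    /** Calls sumOfSquares repeatedly and returns the elapsed time in nanoseconds. */
    static long timeRound(int calls, int n) {
        long start = System.nanoTime();
        long sink = 0;
        for (int c = 0; c < calls; c++) {
            sink += sumOfSquares(n);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the JIT cannot discard the loop as dead code.
        if (sink == 42) {
            System.out.println("unexpected");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 5; round++) {
            long micros = timeRound(10000, 1000) / 1000;
            System.out.println("round " + round + ": " + micros + " us");
        }
    }
}
```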

Even if it were slower, once again, you have to think what value for money you get per clock cycle with Java. Many of the core classes are thread-safe as standard, for example; and let's not forget free garbage collection. Also, that bit of code will run fine on all supported platforms without any additional effort. All the OS abstraction is done for you. That's pretty incredible when you think about what it takes to actually pull off such a large abstraction layer.

Java Virtual Machine

The JVM once again causes controversy when evaluating Java's speed. Java benchmarks typically discount the startup time of the JVM itself. Non-Java benchmarkers would consider this unfair, and not fully representative of the overall program performance. It's a difficult debate. If I want to simply compare, say, raw Java Vector performance versus raw C++ STL vector performance, then why should I consider start-up times? Also, if I'm trying to profile algorithms within my program, again, why should start-up be useful in evaluating? But, if I'm comparing two programs that perform an equivalent task, I'd probably want to consider the overall time from initial execution to completion. Having said that, the larger the application, the less of an issue the JVM load times will be due to economies of scale. (Note, even this is getting better thanks to Class Data Sharing, amongst other things.) One other thing to remember is that many Java applications are actually deployed as a Jar file (a glorified Zip file, essentially). The JVM can execute the code directly from the Jar, but it has to decompress the classes first, and this obviously incurs a time penalty.
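To make the two viewpoints concrete, here is a small sketch that reports both figures side by side: the time spent in the task alone, and the time elapsed since the JVM started (taken from the standard runtime MXBean, so it only approximates what a stopwatch on the whole process would show). The class name StartupVsTask, the sorting workload and the array size are assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.Random;

// Sketch: separate "time spent in the task" from "time since the JVM launched".
final class StartupVsTask {

    /** Sorts a copy of the input and returns the time taken in nanoseconds. */
    static long timeSortNanos(int[] data) {
        int[] copy = Arrays.copyOf(data, data.length);
        long start = System.nanoTime();
        Arrays.sort(copy);
        return System.nanoTime() - start;
    }

    /** Milliseconds since the JVM started, as reported by the runtime MXBean. */
    static long jvmUptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    public static void main(String[] args) {
        int[] data = new Random(1).ints(1000000).toArray();
        long taskNanos = timeSortNanos(data);
        System.out.println("task only:        " + taskNanos / 1000000 + " ms");
        System.out.println("since JVM launch: " + jvmUptimeMillis() + " ms");
    }
}
```

A benchmark of the algorithm would quote the first number; a comparison of two complete programs would be closer to the second.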

The Project Barcelona research group at Sun are coming up with some interesting technologies to improve the JVM in the future. That's not to say that the current JVM hasn't been improving steadily during each release. The Multi-tasking Virtual Machine (MVM) looks most promising as it'll allow multiple Java applications to share the VM, reducing the overall burden on system resources. OK, I know you can do this already on the current JVM, but it's still a little raw. The MVM will be much more scalable and efficient. It seems to me that with such a system, you can finally run the JVM as a daemon on your system at startup, and then all subsequent Java programs you load will load as quickly as native apps. One can't help but lose some optimism on learning that the MVM may not even make it into Java 1.7 (Dolphin), due in the second half of 2007. I hope it does! In the meantime, you could do worse than look at some of the existing shared VMs like Janos, JKernel and Alta.

Java desktop

It's on the desktop where many people form their opinions of Java, and it is here where people complain most about performance issues. The Swing toolkit is very powerful and rich, yet it is huge. Werner Randelshofer states that a Swing "Hello world!" program will require approximately 800 classes to be initialised before the user sees the infamous greeting. Yet, there is no shortage of Java developers who will tell you that Swing is fast.

Perceived performance

What those in the know will tell you is that Java in fact has a problem in perceived performance. You often hear about grey rectangles, unresponsive widgets, etc. The issue here is that Swing is perfectly responsive, until you hit a widget that triggers a task. The biggest trap Swing programmers fall into is that they don't understand the threading model that is employed.

Here's the science...

The Java designers decided not to make Swing components thread-safe, and instead opted for a single thread (known as the event-dispatch thread), to maintain the state of the GUI components. Therefore, your code that you want to run when a user clicks on a given button by default gets added to the EDT. If the code is something non-trivial, like loading a large file or querying a database, then this will block the EDT until it completes, resulting in an unresponsive GUI in the meantime. The remedy is to dispatch long tasks (i.e., anything likely to take > 0.25s) on their own thread, which frees up the EDT for doing purely GUI related tasks (see Ben Galbraith's excellent tutorial).
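The remedy can be sketched in a few lines. This is a minimal example under stated assumptions, not a complete GUI: slowTask() is a hypothetical stand-in for the file load or database query, EdtDemo is an illustrative class name, and the method blocks its caller only so that the result can be printed. The work runs on its own thread, and the result is handed back to the EDT with SwingUtilities.invokeLater, which is where Swing components should be updated.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import javax.swing.SwingUtilities;

// Sketch: run slow work on its own thread, then hand the result back to the EDT.
final class EdtDemo {

    /** Hypothetical stand-in for a slow job such as loading a large file. */
    static String slowTask() {
        try {
            Thread.sleep(300); // simulate more than 0.25s of work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "loaded";
    }

    /** Runs slowTask() off the EDT and reports which thread each phase ran on. */
    static String runOffEdt() {
        AtomicReference<String> report = new AtomicReference<>();
        CountDownLatch finished = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            String result = slowTask(); // the EDT stays free to repaint meanwhile
            boolean workOnEdt = SwingUtilities.isEventDispatchThread();
            SwingUtilities.invokeLater(() -> {
                // Back on the EDT: the safe place to update labels, tables, etc.
                report.set(result + ", work on EDT: " + workOnEdt
                        + ", update on EDT: " + SwingUtilities.isEventDispatchThread());
                finished.countDown();
            });
        }, "long-task");
        worker.start();

        try {
            finished.await(); // demo only: a real event handler would return immediately
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return report.get();
    }

    public static void main(String[] args) {
        System.out.println(runOffEdt());
    }
}
```

In a real application the body of an action listener would start the worker and return straight away. Utilities such as SwingWorker package up the same pattern.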

The EDT approach is hardly exclusive to Java, because it makes toolkit design much easier. You'll find .NET's WinForms opts for a similar model. The EDT is supposed to simplify things, yet it seems to be a common pitfall. Someone commented recently:

"... I may point out that Swing is not slow... The problem is that's very easy to shoot yourself in the foot with Swing, but can happen with any toolkit, even those using native code. Well written Swing code can be as fast as a native application, if not faster."

Shooting yourself in the foot

I was interested in this comment and decided to go off at a bit of a tangent here to examine if Java is really that prone to making unresponsive interfaces. I was discussing this with Jasper Potts from Xerto (producer of the visually impressive Java app, Imagery) and we got on to documentation:

"Swing itself is not slow, but that's not to say that it's that easy to write large applications that perform well. There is definitely room for some more articles on using Swing in the desktop... There is a real need for Swing books that tell you how to do things in the real world. [Desktop Java Live] is the first book I have come across that is trying to do that. Even all the books called "Advanced ...","Extreme...", etc., all they do is redo the Java [API] docs for the more advanced components. There are a few topics that just don't seem to be documented at all well anywhere like the Focus System."

Scott Delap, author of Desktop Java Live, indulged me in a discussion on this topic too:

"Java/Swing being slow is one of those myths that has become embedded in the mindset of developers. I can easily write a desktop application in any language that is slow. I would suspect that 8 out of 10 developers that comment "Swing is slow" haven't written a Swing app in at least 5 years. It is much like the Linux developer community labelling Microsoft as evil. Even if Microsoft became the best open source citizen tomorrow, it would take many years before the majority of developers would have this opinion of them. As far as countering the statement, I think the best rebuttal is the continuing production of quality desktop applications written in Java. The more there are the harder it is to argue against them."

DJL devotes an entire chapter to threading in Swing. I asked Scott whether he thought threading was a common pitfall for Java Swing developers and, if so, whether that meant there was something inherently wrong with Java's design:

"Anyone that is a software developer gets paid for a reason. Software development isn't something that you can master in a three day correspondence course. I've always felt that it is each developer's responsibility to become competent with the features of the language/toolkit they are developing with. Threading is a core part of writing desktop applications. You can easily find discussions about desktop threading on the Internet in regards to numerous other toolkits. It makes no sense to me when people complain about threading in respect to a language. No one complains they have to learn OO to use Java or for-loops to use .NET effectively. In regards to Swing specifically, it is designed very similarly to other common UI toolkits. If you look at QT, GTK, SWT, .NET, etc., you will see they all use some variation on the single event thread concept. Some have the concept of a global lock, however I'd much rather have the ability to address such complexities myself than a brute force approach supplied for me. I will concede that a few more threading utilities built into the core of Swing might help new developers. However, open source projects such as Foxtrot, Spin, and SwingWorker address a large portion of Swing threading related issues. Relating back to my earlier point, I think it is the responsibility of good developers to become aware of the tools that can enhance their development process."

An interesting point here is that other UI toolkits use this threading model, but you rarely hear that these toolkits are slow. Yet, I have seen examples for most of them which also block the event thread and therefore become unresponsive. I think Java has had some bad luck in that its legacy has stuck. Nowadays though, I do believe that if you see a slow Java application, the fault lies at the developer's door, and not Java's. For example, Eclipse has recently come under fire for becoming slow - especially with its start-up. Critics automatically assume that this is because of Java. In its most recent M7 milestone release, performance was significantly improved - a bottleneck being that the plugin framework Eclipse relies on did not scale well to increased numbers of plugins. The point being that this is an algorithmic issue - one that could have occurred in any language, yet it seems to become a Java issue. Of course, some more prominent articles and examples covering common pitfalls would certainly help to ensure new UI programmers get good performance from the outset.

Java2D

Some other performance issues have been down to the underlying graphics framework, Java2D. Because Swing components are all emulated (drawn by Java itself rather than by native widgets), they are very much at the mercy of Java2D. Fortunately, this package is extremely competent in features, and by virtue of being graphics orientated, very quick too (i.e., performance of any graphics framework is a core priority, and Java2D is no different). However, the Java2D team has been working hard recently to improve several aspects that should enhance this even further. One area is the use of hardware acceleration where possible. This must be a nightmare to implement in a multi-platform fashion, but it's being done, slowly but surely. I hate to say it, but Microsoft has made this easier for the Windows platform due to its DirectX graphics abstraction layer. Windows users can expect to see this pipeline utilised even more in future releases. The OpenGL pipeline is also being advanced significantly, which will benefit all platforms (currently, support is disabled by default but can be switched on; this will be enabled in the next release).

There are also a few little tweaks for Mustang that will even help with the "perceived" performance problems, for example the abolition of the infamous "grey rect" problem. The fix required the use of true double-buffering support for Swing, which is another great boost for the toolkit.

Will someone please think of the users?!

As a Java developer, I will happily sacrifice some memory, and a bit of speed, because the Java platform affords many other benefits. I find myself being productive with Java because of its richness and ease of use. I do benefit from the multi-platform aspect too, because I use Linux to develop yet many of my typical users are on Windows.

However, the argument is that a potential end-user doesn't care how you made the software, but simply how well it works. If my program becomes unusable due to, say, lack of memory, that's one disappointed user. Are they happy that I've sacrificed resources to make my life easier? Well, I'd like to think that such cases are extreme and that actually users will be happier that I use Java, because I can produce better software with it. I suppose that if I knew in advance that the target machine specification was one where CPU and memory resources were at a premium, then I expect I would strongly consider the alternative technologies. That said, much can be done to make your Java code more CPU and memory efficient, but you have to program specifically for that.

Summary

My experience with Java Swing (as a user and a developer) over the past year has been very positive. There was a lot of mud slung at Java in its early days and I reckon that much has stuck. This is in spite of Swing - and Java itself - being faster than ever. There's no reason that Java applications should be poor performers nowadays... unless you're low on memory, that is. Memory consumption is still a make-or-break issue, I think. The "memory is cheap" come-back is not very useful in this situation either, and I think Sun knows this and is hoping to continually improve on this issue. Yet, don't be fooled into thinking that it's only Java that can suffer from memory issues.

Despite all that's been discussed, I know that this will not help to change many people's minds. As Lewis and Neumann say, "...in web flame wars, people are happy to discuss their speed impressions for many pages without ever referring to actual data." I'd like to think that those Java critics reading would re-examine the current state of Java, especially with the many enhancements coming in the next release. Joshua Marinacci recently blogged: "When I see an ugly webpage I don't blame my browser, I blame the site designer." This can now be applied to Java performance issues, insofar as poorly performing Java programs are due to poorly programmed Java code, and not because it's a Java program.

Probably the only real way of proving how far Java has come is by demonstration, yet Jasper pointed out:

"Java lacks the great applications that show what is possible; some are getting there like Netbeans or IntelliJ but nothing mass market. Limewire is ok, but you can't say wow! about it."

Perhaps this is why the perception of poor performance still exists: think of all the typical everyday apps one uses, such as email client, browser, IM client, media player, etc. There aren't many Java examples within those categories. I bet within the enterprise, there are lots of bespoke solutions that already show off Swing's potential - yet we'll never see them. Many Java developers are aware of Java's capabilities because many of the best examples are tools for Java developers! I've found apps like IntelliJ - a massive and complex Java IDE - to be blisteringly fast. Java is already good enough for the desktop, but I really believe that Mustang will be a watershed release and we'll be seeing quick growth of (well-written and well-performing) Java desktop apps in the next couple of years.
