Tuning The JVM For Unusual Uses - Have Some Tricks Under Your Hat

Knowing a little more about the guts of the JVM can make a huge difference, especially if you are coding off the beaten track.


Speed counts!


I have been working with Java all the way from Java 1.0, back when I was a post-doc at Zurich University. At the time I used Java to create a simple molecular visualisation applet for web browsers. Apart from being an interesting distraction from writing FORTRAN, Java did not seem to be much use for anything serious because it was critically, horrifically and ineffably slow. Move forward a couple of years (I think - it was a while ago) and 'HotSpot' came out. Java moved to version 1.2 and suddenly it all started to make sense. The performance was amazing. From being tens to hundreds of times slower than C or FORTRAN, Java was suddenly only a lot slower. A good friend of mine coined the phrase 'Java speed' for the hypnotically graceful yet glacial way Java applications behaved.


Bang, everything changed! I am not quite sure what happened in the HotSpot technology, but something huge did happen. This was sometime in late 2000 or early 2001. I was working for Armature at the time, on application servers. Adam Flinton - with whom I was working - suddenly said 'you say Java is slow, but I've just seen it go faster than C++' - or words to that effect. Not being the most believing of people, I decided to put this to the test. Lo and behold - for some simple loops involving floating point maths, and some simple IO, Java was faster than C++. The world had changed.


HotSpot is a Just In Time (JIT) compiler. It is the progress in this technology which has taken Java from being a performance joke to being as fast or faster than C++. Indeed, sometimes it is now faster than C. But truth be told, and this is a distinction I only really started to consider recently, it is not Java which is fast, it is the JVM.


Java is a high level language which is compiled to Byte Code. Byte Code is a low level language. The JVM is the Java Virtual Machine. It can interpret Byte Code and so make the program actually do something. However, it can also compile Byte Code to machine instructions at runtime. Traditionally, it only does this to those pieces of Byte Code which are executed a lot. This is where the 'HotSpot' name comes from; code which is executed a lot is often referred to as 'hot', as in a 'hot loop'.
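If you want to see the Byte Code itself, the JDK ships with a disassembler, javap. The class below is a toy of my own invention, purely to have something to disassemble:

```java
public class ByteCodePeek {
    // After "javac ByteCodePeek.java", running "javap -c ByteCodePeek"
    // prints the stack-based Byte Code which the JVM interprets, and
    // which HotSpot may later compile to machine instructions.
    static int square(int x) {
        return x * x;
    }

    public static void main(String[] args) {
        System.out.println(square(7));
    }
}
```

For square you will see just a handful of instructions (iload_0, iload_0, imul, ireturn) - which is exactly the sort of count the JIT weighs up when deciding what to compile.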




People often think of Java being compiled to Byte Code and the Byte Code being interpreted. This was the way the very first versions of the system worked.


With JIT (HotSpot etc.) a common misconception sprang up that some code was just-in-time compiled whilst the rest was interpreted. This is nearly the case, but the reality works out to be much more clever than that.


The real situation is that all code is interpreted, at least to start with (with one exception - see below). Once the JVM has seen that a method has been executed more than a certain number of times and that it is a good candidate for compilation, it will be compiled. This has the huge advantage that the JVM has 'seen' how the code is used before it compiles it and so has very good information upon which to base its optimizations.
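You can watch this behaviour for yourself with the standard -XX:+PrintCompilation flag, which logs each method as it is compiled. The class below is my own sketch; it simply calls one tiny method far more times than any sensible compile threshold:

```java
public class WarmUpDemo {
    // Called tens of thousands of times, so HotSpot will interpret it
    // first and then, once the call-count threshold is crossed, compile it.
    static long hotAdd(long total, long inc) {
        return total + inc;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 50000; i++) {
            total = hotAdd(total, 1);
        }
        // Run as: java -XX:+PrintCompilation WarmUpDemo
        // and look for WarmUpDemo::hotAdd appearing in the log.
        System.out.println("total = " + total);
    }
}
```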

Separating the JVM From Java

The Java Virtual Machine (JVM) was designed to work with Java. However, it actually knows nothing about the Java language; it only knows about Byte Code. In practice, other languages can be compiled to Byte Code and that code executed. Alternatively, automated Byte Code generators can create Byte Code from non-language sources and again, the JVM can execute this code. These realisations have led me to start thinking about the JVM as a platform and to try to understand how to make it work optimally, rather than relying on the more traditional ideas of 'this works well for Java'.


Let me give an example of where thinking in Java terms does not necessarily give a good understanding of the JVM. It is often recommended that Java be written using lots of small methods. This is probably good practice for object oriented programs in general, but it also seems to magically produce faster executing programs. The reason for the speed-up is in the JVM, not in Java. The JIT in the JVM actually checks the number of byte codes in each method it comes across, and if that count is bigger than a magic number, then the method will not be compiled. The magic number is the 'Huge Method Size' and it has some profound implications.
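As a toy illustration (entirely my own, nothing to do with the compiler discussed below), logic written as several small methods keeps each method comfortably under the limit, and small methods are also candidates for inlining:

```java
public class SplitMethods {
    // Each step is tiny, so each is individually eligible for JIT
    // compilation (and inlining) - unlike one enormous monolithic
    // method holding all the steps, which risks crossing the limit.
    static double stepA(double x) { return x * 1.5 + 2.0; }
    static double stepB(double x) { return x / 2.0 - 1.0; }

    static double pipeline(double x) {
        return stepB(stepA(x));
    }

    public static void main(String[] args) {
        System.out.println(pipeline(4.0));
    }
}
```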


I came across this not in Java (hence the whole thrust of this post) but in COBOL. I am lucky enough to have access to an experimental COBOL->Byte Code compiler. Please note, this compiles COBOL to Byte Code; there is no intermediate Java step here - this actually has nothing to do with Java at all! This compiler compiles COBOL program procedure divisions into Byte Code methods. If all the performs are in-line then it is very easy indeed to end up with a method bigger than the huge method size, which is only 8k on most JVMs.


All this added up to produce a really amazing effect: adding a single accept statement (reading from standard input, for non-COBOLers) would make a program run 10 times slower! What was happening here? Well, without the accept the method was less than 8k, so the JIT compiled it; the accept pushed it over the 8k limit.


To remove the huge method limit, use -XX:-DontCompileHugeMethods on the java command line. The - after the : tells the JVM to ignore the size of a method when choosing whether or not to compile it. This works for the 1.6 and 1.7 versions of the Sun JVM; I cannot vouch for any other versions.

Traversing The Code Graph

In my little example above, the effect of not compiling huge methods was much bigger than I originally expected; what is more, the understanding which came from learning why has proved invaluable since! As the JVM interpreter walks the code graph from a method to the methods which that method calls, it is constantly making decisions as to which methods to compile using the JIT compiler. Consider some method:
method a
...
...
load object x
call method x.b
...
end method

I have used pseudo code here to emphasise that this is not Java; it is Byte Code which we are discussing. Now, consider the case (which I saw recently) where method b is really small but method a is really big. Without -XX:-DontCompileHugeMethods, all of a will be ignored by the JIT compiler. Methods are considered for compilation as they are called, and this means that methods called from a will also not get considered for compilation unless they are called from somewhere else enough times. I.e. a huge method is not only not compiled itself, but all the methods called from it on the code graph will also be ignored.
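Here is a Java sketch of that shape (the names are my own hypothetical stand-ins; imagine bigDriver's Byte Code exceeding the huge method limit):

```java
public class HugeCaller {
    // Stand-in for a method whose Byte Code exceeds the huge method
    // limit; in that case the JIT skips it - and smallMath, called
    // only from here, may never be considered for compilation either.
    static double bigDriver(int n) {
        double acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += smallMath(i); // the real computational work hides here
        }
        return acc;
    }

    static double smallMath(int i) {
        return 1.0 / ((double) i * i);
    }

    public static void main(String[] args) {
        System.out.println(bigDriver(1000000));
    }
}
```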


This really caught me out because in my example of a huge method, all the computational hard work was being done in a small (mathematical) method which was being called by the huge method. Before I realized that it is not just the huge method but all the sub-methods on that part of the calling graph which are ignored, I was stumped by the results.

-Xcomp - Don't Compile Too Early

Performance profiling JVM-executed Byte Code can be tricky. It is important to work out what you actually want to profile. In my case, I am interested in profiling which is relevant to long running programs which have a large JIT code cache. So, what does that mean? Well, it means that I need to ensure that methods which are going to be JIT compiled are JIT compiled before timing starts. The trick to doing this is to performance profile using loops and let the loop spin a bit before starting to time. The COBOL and Java below illustrate this:

  01 startTime.
      02  Hours     PIC 99.
      02  Minutes   PIC 99.
      02  Seconds   PIC 9(4).

  01 currentTime.
      02  Hours     PIC 99.
      02  Minutes   PIC 99.
      02  Seconds   PIC 9(4).
  
  01 tSeconds     binary-long.

....

        perform varying i from 1 by 1 until i equals e1
        if i = 2 then
            accept startTime from time
            move startTime to initialTime
        end-if
            perform varying j from 1 by 1 until j equals e2
                compute r = (a + b) / (a - b)
                compute r = (r + b) / (a - b)
                compute r = (r + b) / (a - b)
                compute r = (r + b) / (a - b)
                compute r = (r + b) / (a - b)
            end-perform
        end-perform

        move initialTime to startTime
        perform get-interval
        display "TotalTime:                                 "
                tSeconds " centi seconds"
        .

For Java it would be something like this:
long startTime = 0;
for (int i = 0; i < e1; i++) {
    if (i == 1) {
        startTime = System.currentTimeMillis(); // start timing on pass two, after JIT warm-up
    }
    for (int j = 0; j < e2; j++) {
        r = (a + b) / (a - b);
        r = (r + b) / (a - b);
    }
}
long totalTime = System.currentTimeMillis() - startTime;
System.out.println("TotalTime: " + totalTime + " ms");

Now that I have the ability to measure the performance of the code ignoring the cost of JIT compiling, I can see what happens if I mess with the settings used by the JIT compiler. An interesting setting is -Xcomp. This forces the compiler to compile all methods as it comes across them (unless something stops it, like the huge method limit). This is very different from the normal approach of not compiling a method until it has been run a threshold number of times (10,000 by default for the server JVM, 1,500 for the client JVM).


One might think that -Xcomp would be a good idea from a performance point of view. However, it is not! The JIT compiler uses the interpreted runs before compilation to gather information on how the method should be compiled for optimal efficiency. -Xcomp removes its ability to do so and thus we can actually see performance slip. Here is an example:


c:\Users\Administrator\Desktop\scratch>java -Xincgc -XX:-DontCompileHugeMethods -Xcomp BENCHINTERNAL
4-byte binaries:                           +0000000961 centi seconds
4-byte binaries in group:                  +0000000000 centi seconds
8-byte binary DIVIDE:                      +0000000583 centi seconds
8-byte binary ADD:                         +0000000001 centi seconds
8-byte binary SUBTRACT:                    +0000000001 centi seconds
8-byte binary group DIVIDE:                +0000000581 centi seconds
8-byte binary group ADD:                   +0000000970 centi seconds

c:\Users\Administrator\Desktop\scratch>java -Xincgc -XX:-DontCompileHugeMethods -XX:CompileThreshold=1 BENCHINTERNAL
4-byte binaries:                           +0000000952 centi seconds
4-byte binaries in group:                  +0000000001 centi seconds
8-byte binary DIVIDE:                      +0000000530 centi seconds
8-byte binary ADD:                         +0000000001 centi seconds
8-byte binary SUBTRACT:                    +0000000001 centi seconds
8-byte binary group DIVIDE:                +0000000535 centi seconds
8-byte binary group ADD:                   +0000000714 centi seconds

c:\Users\Administrator\Desktop\scratch>java -Xincgc -XX:-DontCompileHugeMethods BENCHINTERNAL
4-byte binaries:                           +0000000976 centi seconds
4-byte binaries in group:                  +0000000001 centi seconds
8-byte binary DIVIDE:                      +0000000486 centi seconds
8-byte binary ADD:                         +0000000002 centi seconds
8-byte binary SUBTRACT:                    +0000000000 centi seconds
8-byte binary group DIVIDE:                +0000000483 centi seconds
8-byte binary group ADD:                   +0000000712 centi seconds

Here we can clearly see how -Xcomp has reduced performance for all but the first test. Setting the compilation threshold to 1 (one method invocation before JIT) is almost as bad; for most parts of the test, the default of letting a method run many times in the interpreter before compilation is much better.
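If you want to try the comparison on your own machine, here is a tiny self-contained probe (my own sketch, not the BENCHINTERNAL program above); run it once with -Xcomp and once without, and compare the reported times:

```java
public class XcompProbe {
    // A floating point loop of the kind the JIT optimizes well when it
    // has been able to profile the method in the interpreter first.
    static double work(int n) {
        double r = 0;
        for (int i = 1; i <= n; i++) {
            r = (r + i) / (r - i - 0.5);
        }
        return r;
    }

    public static void main(String[] args) {
        work(1000000); // warm-up pass so the JIT can act under default settings
        long t0 = System.currentTimeMillis();
        double r = work(10000000);
        long ms = System.currentTimeMillis() - t0;
        // Compare: java -Xcomp XcompProbe   versus   java XcompProbe
        System.out.println("result=" + r + " took " + ms + " ms");
    }
}
```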

Some final thoughts

I have just skated across the surface of this huge subject. However, I hope that I have at least managed to indicate how thinking of the JVM as a separate apparatus from the Java language can be a major help in application development and tuning. This is especially so if you are not even using Java!

For the record, the very fastest options I have found so far for my bench test with the 1.7 x64 JVM on Windows are: -Xincgc -XX:-DontCompileHugeMethods -XX:MaxInlineSize=1024 -XX:FreqInlineSize=1024. But I am still working on it...