Comparing Java 7 Garbage Collectors Under Extreme Load

"Realtime contracts with the standard JVM garbage collectors" That was what this post was supposed to be. I will write that another day because I have found some amazing things about JVM 7 garbage collectors and have serious concerns about the new G1 collector.

To explain, a realtime system must have a deterministic time to complete a task. "Stop the world" garbage collectors break this model. I have an idea how to allow a mix of deterministic/realtime and normal JVM execution without having to go over to the full Java Real Time VM. I was going to write this up but I cannot right now because I have run out of time. I ran out of time because there are some fascinating things going on in garbage collection.

Here is my 'kick the crud out of the garbage collector' code. I wrote this based on a lot of experience with garbage collectors in the CLR and JVM (and some I wrote my self in C++). This is designed to give a garbage collector a 'really bad day' and it works!

package com.nerdscentral.realtime.spikes;

import java.util.ArrayList;
import java.util.List;

public class KillTheGarbageCollector {
 
 private List<String> outerStore = new ArrayList<>(1000000);
 private List<String> oldStore = new ArrayList<>(1000000);
 private static int nThread=3;
 
 public static void main(String[] args){
  
  for(int i=0;i<nThread;++i){
   new Thread(){
    public void run(){
     new KillTheGarbageCollector().go();
    } 
   }.start();
  }
 }
 
 private void go(){
  long time=System.currentTimeMillis();
  for(int i=0;i<1000;++i){
   if(i%100==0){
    doOld();
    time=System.currentTimeMillis(); // reset the clock
   }
   for(int j=0;j<1000000/nThread;++j){
    outerStore.add(((Integer)j).toString());
   }
   outerStore.clear();
   long t=System.currentTimeMillis();
   System.out.println("" + (t-time));
   time=t;
  }
 }
 
 
 private void doOld(){
  oldStore.clear();
  for(int j=0;j<10000000/nThread;++j){
   oldStore.add(((Integer)j).toString());
  }
 }

}

The code does its dirty work by creating a huge array of 10 million strings and keeping them around whilst it repeatedly creates another array of 1 million strings. Every 100 creations of the million strings, it makes the 10 million ones garbage and then makes another 10 million. That kills the young and old generations of the garbage collector (if it is generational) and gives it a real run for its money. It also makes a lot of references to small objects which is unkind to say the least. Finally, it breaks the task over several threads (thus the there are more arrays which are smaller). That is savage because of the resultant memory contention and synchronisation.

But - this is also exactly the sort of high load that modern programming techniques with lots of immutable small objects can actually create; in my view, a garbage collector should be able to function well under these conditions.

The G1 collector is no good at all under these conditions.

The default and incremental garbage collectors in the JVM 7 (Java HotSpot(TM) 64-Bit Server VM (build 21.1-b02, mixed mode)) do OK but not great; the G1 collector is no good at all. To show this I took the timings from the above code and pivoted them when divided by ten to produce density charts giving the time for the inner 1 million allocation to take place in hundredths of a second (less than one hundredth of a second timing in these conditions is not very reliable on the standard JVM).

This is the performance using the default garbage collector. We can see that some iterations took a
very long time (over two seconds) whilst the vast majority were less than 4 hundredths of a second.
The cumulative time was 203.22 seconds.
The incremental garbage collector is much the same but with the timings more spread out at the lower end.
The pronounced 'second bump' is clear in this chart as well as in that for the default. This is almost certainly
due to major collections (i.e. old generation collections). Please note that the maximum time is also
longer for the incremental collector making it even less viable for time constrained work. The cumulative time
was 192.29 seconds just making it the fastest run of the three.
The G1 collector has a large amount of density in the 7 to 50 centi-seconds region. There makes the very all performance
bad with cumulative being 522.37 seconds. Yes - the run took over twice as long as using the default collector! We can
also see that it is also worse than the default collector in terms of the longest time for an iteration.
So, what on earth is going so horribly wrong with the G1 collector?


This is the visual-vm view of the run using the G1 garbage collector. Of interest is not
the time taken in the garbage collector, but the time that the processing threads are not
running at full speed which is associated with collections.

The trace for the default garbage collector shows that the working threads are getting
more time to run at full performance.


These test completely took me by surprise. In the past, I have not noticed much difference between garbage collectors in the JVM (10% at most between the different ones). The difference between the G1 and default collector is huge (257%). The test machine was a Window 7 x64 box with a dual core I5 Intel processor and 6 gig of RAM. I ran the test using 3 worker threads (that machine having 4 hardware threads). 

My guess as to what is happening is that synchronisation in the G1 collector is causing the throughput to drop significantly. Maybe in Oracle hardware with more highly tuned locking this effect would not be seen. However, on commodity hardware and Windows as the OS, I would strongly suggest avoiding the G1 collector unless you are prepared to put in a lot of benchmarking time to establish it is not giving these troublesome results.

** Please note that the reason the benchmark only measures the time taken for the smaller, young generation allocation is that it was originally designed with the idea that this part could be protected in the realtime contract which I will discuss in a later post.