Java/JVM POD Part II : It's the economy stupid

The days of sitting around reading about performance coding are gone. We don't have time to sit around; we need to do it!

Software performance matters now in a way it has not for decades; we really do live in a different world.

In response to my previous post I was asked "does anyone care?" i.e. does anyone in the Java community, or any regular day job Java coder, actually care about high performance features like POD and serialisation?

It is a good question but I think one from yesteryear. Things have changed a great deal over the last decade. One way they have changed is in the quantifiable cost of compute power. Once upon a time, servers were quite underused; often machines would be 10% loaded. Further, a website, payment system or data warehouse would run off a computer (yes - a single computer). Big systems might have had two or even four servers and a hot standby. The cost of the hardware was granular and only considered once every two or three years. A 10% performance gain in the software made no measurable impact on cost. If a system needed a Sun E1600 (or whatever) and then the system was made 10% faster, it still needed an E1600, so the hardware cost did not change.

"You cannot manage what you cannot measure"

Now all the above systems will be running on a cluster, grid or cloud. Reducing a web site from 100 nodes to 90 nodes has an instant, quantifiable financial benefit (your bill reduces instantly). Suddenly the cost of inefficient programs is just that - a cost.

Probably the single best example of a system which would turn POD into saved dollars (or Euros or whatever) is distributed caching. Memory caching is the very lifeblood of large websites and big data processing. Writing stuff to and from disk is intolerably slow, so we cache query results in RAM. But we cannot cache all the results on one machine because that would form a huge bottleneck; we distribute the caching across a mesh of nodes and the data in the cache is shared out much like a modern NUMA hardware architecture but on a data centre scale. So, we have caches distributed over many nodes and clients moving data in and out of these caches.

Do you see the pattern yet?

Yes, distributed caching necessitates serialisation. Suddenly POD turns to gold.
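To put a rough number on that, here is a minimal sketch comparing standard Java serialisation with a hand-written fixed layout for one small record. The Thing class, its field values and the method names are assumptions made up for this illustration; the point is only that the bytes on the wire are much larger than the 36 bytes of actual data, and that both routes require code to run per object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

public class SerialisationCost {
    // Hypothetical record used only for this illustration.
    static class Thing implements Serializable {
        private static final long serialVersionUID = 1L;
        long t = 42L;
        char[] z = new char[10];
        double q = 3.5;
    }

    // Size of the data itself: 8 (long) + 10 * 2 (chars) + 8 (double) = 36.
    static final int POD_BYTES = Long.BYTES + 10 * Character.BYTES + Double.BYTES;

    // Standard Java serialisation: writes class descriptors and stream
    // headers as well as the field values.
    static byte[] javaSerialise(Thing thing) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(thing);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hand-written field-by-field copy into a fixed 36 byte layout.
    static byte[] manualSerialise(Thing thing) {
        ByteBuffer buffer = ByteBuffer.allocate(POD_BYTES);
        buffer.putLong(thing.t);
        for (char c : thing.z) {
            buffer.putChar(c);
        }
        buffer.putDouble(thing.q);
        return buffer.array();
    }

    public static void main(String[] args) {
        Thing thing = new Thing();
        System.out.println("Java serialisation: " + javaSerialise(thing).length + " bytes");
        System.out.println("Fixed layout:       " + manualSerialise(thing).length + " bytes");
    }
}
```

With true POD neither method would be needed; the 36 bytes would simply be copied as a block.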

Here are a few more real world cash benefits to POD and direct off heap access for all POD types:

1) Compute grids for the inter-node communication of queries and results.
2) Pumping database objects over JNI for JDBC drivers.
3) Distributed column based databases.
4) Mathematical computing using large off heap stores.
5) Map/Reduce data mining.

Photons to the chip will utterly transform the way we think about node interconnect.

Consider this: eventually DMA and other fast interconnect technologies will revolutionise inter-node communications in clouds. Fibre to the chip (see Silicon Photonics) will fundamentally reduce latency and hugely increase throughput between the off heap memory of individual processes distributed across grids and in the cloud. As this happens, the current cost of moving data on and off heap for Java will change from a quantifiable cost to a game changing barrier. Nothing in the current JVM road map is preparing this powerful, yet increasingly old fashioned, technology for the onslaught of serialisation demands which are coming.

The Ultimate Expression Of Me

In a small suburban bedroom 33 years ago a nerdy boy played his tune using a computer.

Handwritten machine code forced music from a tinny speaker one bit at a time. Somehow, that moment was never lost.

All these years later, married for 25 years to the most amazing person anyone could wish to meet and with three fantastic children, the adult manifestation of this young boy sat down and made music with another computer. A machine thousands of times more powerful than that with which he played as a child turned the cryptic messages of a beautifully encoded MIDI file into this rendition of JS Bach's Passacaglia And Fugue in C minor.

Every fibre of my soul soars at this sound. The size, the passion, the beauty. Every note, every vibration, created from nothing but mathematics in the cold sterile brain of a computer. Yet, somehow, the fusion of this great master some 300 years ago and my tiny input on my Mac has come together to create something which, for me at least, symbolises the human need to create.

Java Power Features To Stay Relevant

Relative count of job openings for C and Java on Dec 29 2014

Java has a problem; as clock speeds are stagnant or even dropping and core counts are only creeping up, its multi-threaded model is starting to become less and less relevant.

If one has to create an inter-process communication system anyhow, because an application needs to run on multiple machines, then the benefit of multi-threading is eroded. Further, on very heavily multi-core machines with huge address spaces, memory mapped files and shared memory make a very simple way to share data between processes. So, what benefit Java's complex multi-threaded model? Why not write simple, single threaded code and dispatch to separate processes to perform parallel work?

The truth is, just such an approach is becoming more and more accessible. The cost of fork on a *nix platform is very low compared to the power of modern computers. With copy on write memory pages, forking off a process and using IPC is pretty trivial. One might complain, for example, that Python is single threaded (well, the CPython implementation is single threaded in the interpreter) but is this really a problem? Writing multi-threaded code is a real pain when the entire state is shared between threads, so we should limit shared state as much as possible. At that point we might as well just use shared memory. So, multiple Python processes using shared memory to communicate are not really any harder to write for than a single process, multi-threaded system.
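For comparison, the nearest thing Java offers today is a memory mapped file. The following is a minimal sketch, not a production pattern: the file name prefix, region size and method names are assumptions, and both mappings live in one JVM purely to keep the example self-contained. In real use the writer and the reader would be separate processes mapping the same file. Whether one mapping sees another's writes is formally system dependent, although it holds on the common platforms.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedFileExchange {
    // Size of the shared region; an arbitrary choice for the sketch.
    static final int REGION_BYTES = 64;

    // Map the first REGION_BYTES of the file for reading and writing.
    static MappedByteBuffer map(Path file) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, REGION_BYTES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a value through one mapping and reads it back through another,
    // standing in for two cooperating processes.
    static double roundTrip(double value) {
        try {
            Path file = Files.createTempFile("pod-exchange", ".bin");
            file.toFile().deleteOnExit();
            MappedByteBuffer writer = map(file);
            MappedByteBuffer reader = map(file);
            writer.putDouble(0, value);
            return reader.getDouble(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Reader saw: " + roundTrip(2.5));
    }
}
```

Note that even here the data travels as bytes addressed by hand-computed offsets, not as typed structures.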

Indeed, making such a multi-process system stable is easier than making a multi-threaded one stable. For example, it is pretty much impossible to get the JVM to recover from something as easy to accidentally induce as an Out Of Memory error. So, such an issue causes the whole thing to collapse. In a multi-process approach one of the leaf processes can go away - even with a SEGV - and the system as a whole can recover.

From Dec 28 2014

I guess where I am going with this is that Java and JVM languages are under attack from several directions. One is from C and C derived languages (Objective C, C++ etc). Most programmers learn these (at least C) to some extent eventually; they are extremely mature and can be made to go as fast as assembler in the right hands (faster under many conditions). Multi-threaded programming is a real pain even in C++11, never mind raw C; but as I pointed out above, it is not even required as much these days. Multi-process programming with shared state and serialisation is taking over.

From Dec 28 2014

Java and the JVM are also under attack from the Python community. As we can see from both the above graphs, the demand for Python programmers is reasonably flat whilst that for Java is dropping. Further, other interpreted languages like Perl (which was once dominant) and Ruby are having their lunch eaten by Python. If these trends continue, it is reasonable to expect the demand for Python to outstrip the demand for Java in the next two years. These trends simply feed into the support for C and C++, which are easy to interface with and co-develop alongside Python.

POD - Off And On Heap

Right now the support for high performance IPC is near non-existent in Java; it does not support shared memory other than memory mapped files and it cannot share anything but byte arrays (actually DirectByteBuffers, which if anything is worse). Java lacks a concept of POD (plain old data), which cripples its ability to use IPC either over a network or via mapped files. In C++ or C the following is POD:

typedef struct {
    size_t x;
    char z[10];
    double q;
} thing;

In other words, it is directly mapped to a simple block of memory. We can do something like sizeof(thing). We can put thing into shared memory:

thing a_thing;

Where is the serialisation? Where is all the boilerplate? There is none because 'thing' defines the use of a block of memory. This is exactly what Java is missing. It needs POD. Why can we not have:

class Thing {
    long t;
    char[10] z;
    double q;
}

Now I realise that there is a bit of an issue around the char in Java because normally we would put:

char[] z=new char[10];

But with this char[10] syntax (or something like it) we are defining POD. If Java and the JVM could support POD like that then it could also start to support proper shared memory and proper memory mapped files. We could have arrays of POD data which are not serialised to and from an underlying ByteBuffer but are actually mapped to off heap memory directly.

None of this actually invalidates Java's garbage collection. We can have off heap storage of an array of POD where the handle to the off heap is attached to the garbage collector just as direct byte buffers work now. The only difference is that this would be a sensible way of doing things which would have realistic 21st century performance rather than the pathos of byte buffers and serialisation we need to work with now. As long as off heap memory is treated as a precious resource (as file handles are now, for example) it can be managed using the try-with-resources statement introduced in Java 7. I have implemented this approach for the memory management in Sonic Field and it appears to work very well (see link below).
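As a minimal sketch of the idea (this is not the Sonic Field code; OffHeapBlock and its methods are names invented for the illustration), an off heap block can be wrapped in an AutoCloseable so that its lifetime follows a try-with-resources scope. The public ByteBuffer API has no explicit free, so close() here only drops the reference and blocks further use; the native memory is reclaimed once the buffer object is collected.

```java
import java.nio.ByteBuffer;

public class OffHeapScope {
    // Hypothetical wrapper treating off heap memory as a scoped resource.
    static final class OffHeapBlock implements AutoCloseable {
        private ByteBuffer buffer;

        OffHeapBlock(int bytes) {
            buffer = ByteBuffer.allocateDirect(bytes);
        }

        ByteBuffer buffer() {
            if (buffer == null) {
                throw new IllegalStateException("block already closed");
            }
            return buffer;
        }

        boolean isClosed() {
            return buffer == null;
        }

        @Override
        public void close() {
            // Drop the reference; the JVM releases the native memory when
            // the buffer object itself is garbage collected.
            buffer = null;
        }
    }

    // Uses a temporary off heap block whose lifetime is exactly this method.
    static double sum(double[] values) {
        try (OffHeapBlock block = new OffHeapBlock(values.length * Double.BYTES)) {
            ByteBuffer data = block.buffer();
            for (double v : values) {
                data.putDouble(v);
            }
            data.flip();
            double total = 0;
            while (data.hasRemaining()) {
                total += data.getDouble();
            }
            return total;
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(new double[] {1.5, 2.5, 3.0}));
    }
}
```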

Surely This Is 'ValueType'?

Well, it could be. If we look at the proposal for value types they could be POD. On the other hand, they might not be. The key to POD is that it has a layout in memory which can be copied as a single, simple loop. Further, whilst arrays are covered in the specification, off heap mapping is not. Serialisation is discussed but this is not good enough. We need to be able to copy blocks of memory for serialisation off heap not run a complex program to do it.

Real Off Heap

I have discussed this a little so far but here I shall make it explicit. We need to be able to access off heap objects transparently. There is no real reason this cannot be done; it is just another level of indirection. Currently DirectByteBuffer acts as the only mechanism for this behaviour. Years of experience in the Java community with this structure have shown that Java objects connected to off heap memory can be used just fine. Indeed, I use this mechanism in Sonic Field to handle memory mapped files. The problem is that we have to use DirectByteBuffer. Notice the issue: 'ByteBuffer'. Even if we want to use the intrinsic double we need to call special methods to translate these into bytes.
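To make the complaint concrete, here is a sketch of what accessing one Thing in off heap memory looks like with the current API. ThingView, its offsets and its accessor names are assumptions for illustration; the layout is packed by hand (a C compiler would normally pad the struct), and every field access goes through a typed get or put call at a manually computed offset.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ThingView {
    // Hand-maintained packed layout for: long t; char[10] z; double q;
    static final int Z_LENGTH = 10;
    static final int T_OFFSET = 0;
    static final int Z_OFFSET = T_OFFSET + Long.BYTES;                  // 8
    static final int Q_OFFSET = Z_OFFSET + Z_LENGTH * Character.BYTES;  // 28
    static final int SIZE = Q_OFFSET + Double.BYTES;                    // 36

    private final ByteBuffer memory;

    ThingView(ByteBuffer memory) {
        this.memory = memory;
    }

    // Allocates off heap storage for exactly one Thing.
    static ThingView allocate() {
        return new ThingView(ByteBuffer.allocateDirect(SIZE).order(ByteOrder.nativeOrder()));
    }

    long t() { return memory.getLong(T_OFFSET); }
    void t(long value) { memory.putLong(T_OFFSET, value); }

    char z(int index) { return memory.getChar(zOffset(index)); }
    void z(int index, char value) { memory.putChar(zOffset(index), value); }

    double q() { return memory.getDouble(Q_OFFSET); }
    void q(double value) { memory.putDouble(Q_OFFSET, value); }

    // Bounds check so z cannot spill into the neighbouring fields.
    private static int zOffset(int index) {
        if (index < 0 || index >= Z_LENGTH) {
            throw new IndexOutOfBoundsException("z index " + index);
        }
        return Z_OFFSET + index * Character.BYTES;
    }

    public static void main(String[] args) {
        ThingView thing = ThingView.allocate();
        thing.t(7L);
        thing.z(9, 'x');
        thing.q(2.5);
        System.out.println(thing.t() + " " + thing.z(9) + " " + thing.q());
    }
}
```

All of this boilerplate is what a POD declaration would replace.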

To work in a realistic way with the trending multi-process shared resource programming paradigm, Java (more specifically, the JVM) needs to be able to directly address POD in off heap storage.

1) Create off heap storage of type Thing
2) Access that Thing storage directly from Java

For example (just to illustrate - I am not saying this needs to be the syntax).

// Create off heap storage for 10 Thing POD data
Thing[] things=new OffHeapArray(Thing.class,10);

We can explicitly clear the memory:

This syntax is broken - do not criticise it - you will just be being foolish. However, it does contain a key point. Off heap storage should be just like on heap storage. The use of the array syntax highlights this and indicates that all the same optimisations, including SIMD should apply. Indeed, we should be able to have non array off heap as well:

Thing thing=new OffHeap(Thing.class,10);

I realise such a system would be tricky; but it is also necessary. Java cannot continue to sit in splendid isolation with slow, clunky heterogeneous interoperation. We cannot afford the cost of storing everything on the garbage collected heap. We must accept that the JVM needs transparent, native speed communication with off heap, plain old data.

More Advantages Of Off Heap POD

Off heap POD has other performance advantages. Storing structures off heap means they are not traversed in garbage collection. Clearly, POD cannot have a handle to other objects, else the model is broken. So:

class Thing {
    long t;
    char[10] z;
    double q;
}

class ThingIEr {
    short key;
    Thing payload;
}

This could be legal as the payload member in ThingIEr is not a reference to a Thing (as would be the case for a Java class) but is an inlined instance of Thing. This obviates the need for garbage collection traversal. Thus, large instance counts of POD can be stored off heap and massively drop the garbage collection load.
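The effect can be approximated today, at the cost of hand-written offsets. The sketch below (ThingIErArray and its accessors are hypothetical names, and the packed layout is an assumption) stores every record, including its inlined Thing payload, in one direct buffer. However many records there are, the garbage collector sees a single buffer object and no per-record references.

```java
import java.nio.ByteBuffer;

public class ThingIErArray {
    // Packed size of the inlined Thing: long t; char[10] z; double q;
    static final int THING_SIZE = Long.BYTES + 10 * Character.BYTES + Double.BYTES; // 36

    // Record layout: short key, then the Thing payload inlined directly after it.
    static final int KEY_OFFSET = 0;
    static final int PAYLOAD_OFFSET = KEY_OFFSET + Short.BYTES;                 // 2
    static final int PAYLOAD_T_OFFSET = PAYLOAD_OFFSET;                         // 2
    static final int PAYLOAD_Q_OFFSET = PAYLOAD_OFFSET + THING_SIZE - Double.BYTES; // 30
    static final int STRIDE = PAYLOAD_OFFSET + THING_SIZE;                      // 38

    private final ByteBuffer memory;
    private final int length;

    ThingIErArray(int length) {
        this.length = length;
        this.memory = ByteBuffer.allocateDirect(length * STRIDE);
    }

    // Start of record 'index' within the block.
    private int base(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index * STRIDE;
    }

    short key(int index) { return memory.getShort(base(index) + KEY_OFFSET); }
    void key(int index, short value) { memory.putShort(base(index) + KEY_OFFSET, value); }

    long payloadT(int index) { return memory.getLong(base(index) + PAYLOAD_T_OFFSET); }
    void payloadT(int index, long value) { memory.putLong(base(index) + PAYLOAD_T_OFFSET, value); }

    double payloadQ(int index) { return memory.getDouble(base(index) + PAYLOAD_Q_OFFSET); }
    void payloadQ(int index, double value) { memory.putDouble(base(index) + PAYLOAD_Q_OFFSET, value); }

    public static void main(String[] args) {
        ThingIErArray records = new ThingIErArray(1000);
        records.key(999, (short) 12);
        records.payloadQ(999, 0.25);
        System.out.println(records.key(999) + " " + records.payloadQ(999));
    }
}
```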

We can also use off heap stacks for deterministic memory management. Thus, much larger execution flows can be created as 'allocation free'. We move allocation from the heap to off heap stacks or POD and by so doing allow deterministic behaviour in short execution paths. This approach could bring some of the real time behaviour available in C++ and C to Java without an excessively complex programming model or a complete JVM overhaul.
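One way to picture an off heap stack is as a bump allocator over a single pre-allocated block. This is only a sketch under that assumption; OffHeapStack, push, mark and release are invented names, not an existing API. Scratch space is reserved by advancing an offset and released by restoring it, so the working method allocates nothing on the garbage collected heap.

```java
import java.nio.ByteBuffer;

public class OffHeapStack {
    private final ByteBuffer memory;
    private int top = 0;

    OffHeapStack(int capacityBytes) {
        memory = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Current top of stack; pass it to release() to free everything pushed since.
    int mark() {
        return top;
    }

    // Reserve 'bytes' of scratch space and return the offset where it starts.
    int push(int bytes) {
        if (bytes < 0 || bytes > memory.capacity() - top) {
            throw new IllegalStateException("off heap stack exhausted");
        }
        int offset = top;
        top += bytes;
        return offset;
    }

    // Deterministically free everything reserved after the given mark.
    void release(int mark) {
        if (mark < 0 || mark > top) {
            throw new IllegalArgumentException("invalid mark " + mark);
        }
        top = mark;
    }

    // Example of an 'allocation free' routine: scratch doubles live on the
    // off heap stack and are gone when the method returns.
    static double sumOfSquares(OffHeapStack stack, int count) {
        int mark = stack.mark();
        try {
            int scratch = stack.push(count * Double.BYTES);
            for (int i = 0; i < count; i++) {
                stack.memory.putDouble(scratch + i * Double.BYTES, (double) i * i);
            }
            double total = 0;
            for (int i = 0; i < count; i++) {
                total += stack.memory.getDouble(scratch + i * Double.BYTES);
            }
            return total;
        } finally {
            stack.release(mark);
        }
    }

    public static void main(String[] args) {
        OffHeapStack stack = new OffHeapStack(1024);
        System.out.println(sumOfSquares(stack, 4));
    }
}
```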

Why Do I Care?

I really like the JVM. It saddens me that development of the JVM is moving away from solid, high performance technology and concentrating on syntactic sugar like lambda syntax. It saddens me further that interoperation with other technology is so poorly supported in the Java/JVM world. We have to realise that web services and other programmatic systems of serialisation, whilst necessary, are not sufficient. High performance, multi-process and off heap programming are required to ensure that Java, and more importantly the JVM itself, do not lose relevance over the next decade.