Comparing C++ -> Java and Java -> C++ Performance

Let's do some serious clock cycle counting instead!

Copyright: Dr Alexander J Turner
It is 'common knowledge' that calling Java from native code is slower than calling native from Java. I remember reading a paper from the turn of the century which demonstrated this.

The conclusion, which I have also picked up from other developers and from papers such as Dawid Kurzyniec and Vaidy Sunderam (2001), is that calling Java from native code takes around 10 times longer than calling native code from the JVM.

The question is, as almost everything else in the JVM has changed, why would we expect these figures to remain relevant? I have created a very cut-down test system to check the cost of calls in the two directions. To get the cleanest results possible, my test has the following characteristics:

1) The methods being called just increment a 64 bit signed integer.
2) We test using 'warming loops' to give the JVM a chance to get the JIT in place.
3) Timing is all done using C++ routines so that they are comparable.
4) I use rdtsc to get raw clock cycle counts. I will discuss this later.
5) The methods being timed are called one million times between timing calls
    to amortise the overhead of acquiring the time.
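
The characteristics above can be sketched without a JVM. This is a minimal illustration, not the test program itself: `read_ticks`, `ticks_per_call` and `increment` are hypothetical names, and on non-x86 targets the sketch assumes a `steady_clock` fallback that reports nanoseconds rather than cycles.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define HAVE_RDTSC 1
#elif defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define HAVE_RDTSC 1
#endif

// Read a tick count: rdtsc on x86, otherwise a steady_clock fallback
// (an assumption for portability; the fallback is not a cycle count).
inline std::uint64_t read_ticks() {
#ifdef HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: run `warmups` untimed batches, then time one batch of
// `calls` invocations and return the average ticks per invocation.
template <typename Step>
std::uint64_t ticks_per_call(Step step, std::int64_t calls, int warmups) {
    assert(calls > 0);
    for (int w = 0; w < warmups; ++w) {
        for (std::int64_t j = 0; j < calls; j = step(j)) {}
    }
    const std::uint64_t t0 = read_ticks();
    for (std::int64_t j = 0; j < calls; j = step(j)) {}
    const std::uint64_t elapsed = read_ticks() - t0;
    // Two timer reads are shared by `calls` invocations, so their cost
    // is amortised away when `calls` is large.
    return elapsed / static_cast<std::uint64_t>(calls);
}

// The method under test: increment a 64 bit signed integer.
inline std::int64_t increment(std::int64_t v) { return v + 1; }
```

Calling `ticks_per_call(increment, 1000000, 10)` mirrors the loop sizes used in this test. An optimising compiler may well collapse such a trivial in-process loop, so the sketch shows the structure of the measurement rather than a meaningful number.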

Writing JNI in C can be a bit tiring. Writing JNI code using javah is just a ridiculously hard mess. However, writing JNI with C++11 style code and using method injection is clean, elegant and easy. I have used the latest version of the Visual Studio 2010 C++ compiler rather than a full C++11 compiler. However, it supports lambdas well enough to do what I want, and it has TR1 in place as well. The only thing I had to remove when back porting from true C++11 was a for_each over a [] lambda; bind1st and a member function pointer did the trick instead:

JVMInterface::~JVMInterface(){
        for_each(globals.begin(),globals.end(), bind1st(mem_fun(&JVMInterface::deleteRef), this));
        jvm->DestroyJavaVM();
    }
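
For comparison, here is a sketch of what the lambda form of that clean-up might look like on a full C++11 compiler. `HandleOwner` is a hypothetical stand-in for `JVMInterface`, with plain `int` handles instead of `jobject` and a log vector that exists only so the effect can be observed. It is worth knowing that `bind1st` and `mem_fun` were deprecated in C++11 and removed in C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for JVMInterface: it owns a list of handles and
// releases each one in its destructor.
class HandleOwner {
public:
    explicit HandleOwner(std::vector<int>& releasedLog) : released(releasedLog) {}
    void add(int handle) { handles.push_back(handle); }
    ~HandleOwner() {
        // A lambda capturing `this` replaces
        // bind1st(mem_fun(&HandleOwner::deleteRef), this).
        std::for_each(handles.begin(), handles.end(),
                      [this](int handle) { deleteRef(handle); });
    }
private:
    void deleteRef(int handle) { released.push_back(handle); }
    std::vector<int> handles;
    std::vector<int>& released;  // records releases for the demo only
};

// Usage: every handle added is released exactly once when the owner dies.
inline void handleOwnerDemo() {
    std::vector<int> log;
    {
        HandleOwner owner(log);
        owner.add(10);
        owner.add(20);
    }
    assert(log.size() == 2);
}
```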

Which JVM?
I used Oracle 1.7_5 server x64 on Windows 7 Professional on an Intel i5 one-socket, two-core machine with the CPU locked at 2.4 GHz. My understanding is that rdtsc is consistent between cores on the i5 (not sockets, but cores). The C++ was compiled with full program optimization for speed turned on.

So - what are the results?

I will explain the code and how it all works, but let's just jump to the result quickly. Just as with the previous authors, I found that calling managed code from native is very slow indeed. It takes around 20 times more clock cycles for these trivial calls than calling native from Java:

Performing tests where Java calls C++
=====================================

Initial run from Java took 35789061 Cycles
Performing warmed runs
Warmed  run from Java took 25 Cycles per call
Warmed  run from Java took 25 Cycles per call
Warmed  run from Java took 25 Cycles per call
Warmed  run from Java took 25 Cycles per call
Warmed  run from Java took 23 Cycles per call
Warmed  run from Java took 24 Cycles per call
Warmed  run from Java took 24 Cycles per call
Warmed  run from Java took 25 Cycles per call
Warmed  run from Java took 24 Cycles per call
Warmed  run from Java took 25 Cycles per call
****
Performing tests where C++ calls Java
=====================================

Initial run from Java took 531871728 Cycles
Performing warmed runs
Warmed  run from Java took 508 Cycles per call
Warmed  run from Java took 511 Cycles per call
Warmed  run from Java took 521 Cycles per call
Warmed  run from Java took 501 Cycles per call
Warmed  run from Java took 580 Cycles per call
Warmed  run from Java took 519 Cycles per call
Warmed  run from Java took 515 Cycles per call
Warmed  run from Java took 605 Cycles per call
Warmed  run from Java took 510 Cycles per call
Warmed  run from Java took 504 Cycles per call
****
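
To put the two sets of warmed figures side by side, the small sketch below averages them and converts cycles to nanoseconds at the 2.4 GHz clock quoted earlier. The numbers are copied from the output above; the function names are mine and purely illustrative.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Warmed cycles-per-call figures copied from the output above.
inline std::vector<int> javaToNative() {
    return {25, 25, 25, 25, 23, 24, 24, 25, 24, 25};
}
inline std::vector<int> nativeToJava() {
    return {508, 511, 521, 501, 580, 519, 515, 605, 510, 504};
}

// Arithmetic mean of a non-empty list of samples.
inline double mean(const std::vector<int>& samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}

// At a fixed clock of `ghz` GHz there are `ghz` cycles per nanosecond.
inline double cyclesToNanos(double cycles, double ghz) { return cycles / ghz; }
```

With these inputs the means come out at 24.5 and 527.4 cycles per call, a ratio of roughly 21.5, or about 10 ns against about 220 ns per call at 2.4 GHz.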

These results show that my work on using lambdas as continuations might well be worth using in a performance-critical situation where the native code really has to return control to the JVM code. The best alternative, if it is possible, is to marshal everything the native code will require and pass it in with a single call from the JVM to native code.
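
The difference between the two styles can be modelled without a JVM. In this sketch `Boundary`, `Inputs`, `sumWithCallbacks` and `sumBatched` are all hypothetical: `Boundary` simply counts how many times native code would have to call back into managed code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical model of the boundary: each fetch() stands for one
// native -> JVM call. No real JVM is involved.
struct Boundary {
    int crossings = 0;
    std::int64_t fetch(const std::function<std::int64_t()>& managedGetter) {
        assert(static_cast<bool>(managedGetter));  // must be callable
        ++crossings;
        return managedGetter();
    }
};

// Everything the native routine needs, marshalled up front by the caller.
struct Inputs {
    std::int64_t a, b, c;
};

// Chatty style: native code calls back once for every value it needs.
inline std::int64_t sumWithCallbacks(Boundary& boundary) {
    return boundary.fetch([] { return std::int64_t(1); })
         + boundary.fetch([] { return std::int64_t(2); })
         + boundary.fetch([] { return std::int64_t(3); });
}

// Batched style: one inbound call carries all the inputs, no callbacks.
inline std::int64_t sumBatched(const Inputs& in) { return in.a + in.b + in.c; }
```

Both functions return the same sum, but the chatty one records three crossings in the expensive direction while the batched one records none. Going by the figures measured here, each avoided callback would save in the region of 500 cycles.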

Future Work
I would like to look at the performance of field access from native to JVM code. Also of great interest will be what performance looks like on ARM/Dalvik. I will need to replace my timing system but, other than that and normal platform porting, the code should just work. As Google appear to be relying on the NDK as a way of getting high performance into Android, this is a key area of interest to me and other Android developers.

The Code
I never use javah unless someone forces me to. The 'Register Natives' approach is very much cleaner and simpler than javah. One huge benefit is that if there is a mismatch in signatures between the native and JVM methods it can be detected programmatically rather than the binding silently failing and the program only showing symptoms when the JVM calls the non-linked native method.

Here is my main JNI interface class which provides an easy to use micro-framework for JNI coding:


#ifndef JVMINTERFACE_H
#define JVMINTERFACE_H
#include "Windows.h"
#include <stdio.h>
#include <tchar.h>
#include "jni.h"
#include <iostream>
#include <functional>
#include <string>
#include <memory>
#include <utility>
#include <vector>
#include <algorithm>
#include "timing.h"
using namespace std;

namespace nerds_central {
typedef pair<jclass,jmethodID>               jmethodInfo;
typedef pair<jclass,jfieldID>                jfieldInfo;
typedef function<pair<jlong,jlong>(JNIEnv*,jobject)> continuation;

// An RAII object for managing JVM instances
class JVMInterface{

private:
    // RAII
    JVMInterface(const JVMInterface &);
    JVMInterface& operator=(const JVMInterface &);

    // Real
    JNIEnv* create_vm(const char*);
    JNIEnv* env; // the env for creation, do not use in calls from JVM!
    JavaVM* jvm;
    vector<jobject> globals;
    void deleteRef(jobject ref);

public:
    JVMInterface(const char* cp)  {create_vm(cp);}
    ~JVMInterface();

    void        bindNativeMethod(char*,char*,char*,void*);
    jmethodInfo findJVMMethod   (char*, char*, char*,bool);
    jfieldInfo  findJVMField    (char*, char*, char*,bool);
    JNIEnv*     getEnv          (){return env;}
    void        makeJNICall     (function<void()>);
};

typedef shared_ptr<JVMInterface> JVM_ptr;

class JVMPerformanceTester{
private:
    JVMTimer startTime;
public:

    JVMPerformanceTester() : startTime(){}

    static jlong getInstance(JNIEnv* env,jclass clazz){
        void* ret=new JVMPerformanceTester();
        return (jlong) ret;
    }

    static void deleteInstance(JNIEnv* env,jclass clazz,jlong ptr){
        delete ((JVMPerformanceTester*)ptr);
    }

    static jlong getRDTSCCycles(JNIEnv*,jclass,jlong);

    jlong getElapsedCycles(){
        return startTime.getElapsed();
    }

};


}// nerds-central
#endif // JVMINTERFACE_H

#include "jvm_interface.h"
#include "timing.h"

namespace nerds_central{

    // JVMInterface definitions
    JNIEnv* JVMInterface::create_vm(const char* classPath) {
        JavaVMInitArgs args;
        JavaVMOption options[1];

        args.version = JNI_VERSION_1_6;
        args.nOptions = 1;

        // TODO: clunky
        // create the constant data incoming to something the JVM can use
        string classPathBuilder = "-Djava.class.path=";
        classPathBuilder+=classPath;
        const char* cp=classPathBuilder.c_str();
        
        // clean way of getting rid of const
        unique_ptr<char[]> cp_ptr(new char[classPathBuilder.size() +1]);
        memcpy(cp_ptr.get(),cp,classPathBuilder.size() +1);
        
        // I assume the string is copied in the JVM!
        options[0].optionString = cp_ptr.get();
        cout << "Launching with options '" << options[0].optionString << "'" << endl;

        // end of clunk - back to setting up JVM
        args.options = options;
        args.ignoreUnrecognized = JNI_FALSE;

        JNI_CreateJavaVM(&jvm, (void **)&env, &args);
        return env;
    }

    void JVMInterface::bindNativeMethod(char* className,char* name,char* signature,void* functionPointer){
        jclass toBind;
        makeJNICall([&]{toBind=env->FindClass(className);});
        // TODO: a throw in here could leak local references
        makeJNICall([&]{
            JNINativeMethod ms[]={name,signature,functionPointer};
            env->RegisterNatives(toBind,ms,1);
        });
        makeJNICall([&]{
            env->DeleteLocalRef(toBind);
        });
    }

    // Simple way to handle JNI calls safely
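    void makeJNICall(JNIEnv* env,function<void()> func){
        env->ExceptionClear();
        func();
        // ExceptionDescribe clears the pending exception as a side effect,
        // so grab the throwable before describing it
        jthrowable pending=env->ExceptionOccurred();
        if(pending){
            env->ExceptionDescribe();
            throw pending;
        }
    }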

    void JVMInterface::makeJNICall(function<void()> func){
        nerds_central::makeJNICall(env,func);
    }

    jmethodInfo JVMInterface::findJVMMethod(char* clazz, char* name, char* signature,bool isStatic){
        jmethodInfo ret;
        makeJNICall([&]{
            ret.first=env->FindClass(clazz);
        });
        makeJNICall([&]{
            ret.first=(jclass)(env->NewGlobalRef(ret.first));
            globals.push_back(ret.first);
        });
        makeJNICall([&]{
            ret.second=isStatic
                ?env->GetStaticMethodID(ret.first,name,signature)
                :env->GetMethodID(ret.first,name,signature);
        });
        return ret;
    }

    jfieldInfo JVMInterface::findJVMField(char* clazz, char* name, char* signature,bool isStatic){
        jfieldInfo ret;
        makeJNICall([&]{
            ret.first=env->FindClass(clazz);
        });
        makeJNICall([&]{
            ret.first=(jclass)(env->NewGlobalRef(ret.first));
            globals.push_back(ret.first);
        });
        makeJNICall([&]{
            ret.second=isStatic
                ?env->GetStaticFieldID(ret.first,name,signature)
                :env->GetFieldID(ret.first,name,signature);
        });
        return ret;
    }

    // JVMPerformanceTester definitions
    jlong JVMPerformanceTester::getRDTSCCycles(JNIEnv *env,jclass clazz,jlong obj_ptr){
        return ((JVMPerformanceTester*)obj_ptr)->getElapsedCycles();
    }


    void JVMInterface::deleteRef(jobject ref){
        env->ExceptionClear();
        env->DeleteGlobalRef(ref);
        if(env->ExceptionCheck()){
            env->ExceptionDescribe();
            env->FatalError("Error occurred");
        }
    }

    JVMInterface::~JVMInterface(){
        for_each(globals.begin(),globals.end(), bind1st(mem_fun(&JVMInterface::deleteRef), this));
        jvm->DestroyJavaVM();
    }

} // nerds-central
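
The makeJNICall helper is an instance of the execute-around idiom: clear any stale error state, run the call, then turn a pending error into a C++ exception. Here is a JVM-free sketch of that idea. `FakeEnv` and `guardedCall` are hypothetical, and throwing `std::runtime_error` is my choice for the sketch rather than what the framework does.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for JNIEnv: it only tracks one pending error message.
struct FakeEnv {
    std::string pending;
    void clear() { pending.clear(); }
    bool check() const { return !pending.empty(); }
};

// Execute-around helper: clear stale state, run the call, then convert any
// pending error into a C++ exception.
inline void guardedCall(FakeEnv& env, const std::function<void()>& call) {
    assert(static_cast<bool>(call));  // an empty std::function cannot be run
    env.clear();
    call();
    if (env.check()) {
        const std::string message = env.pending;  // copy before clearing
        env.clear();
        throw std::runtime_error(message);
    }
}
```

Because every call goes through one choke point, callers cannot forget the error check, which is what makes the lambda-wrapping style pleasant to use.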


RDTSC access is done by a very simple C++ class.


#ifndef TIMING_H
#define TIMING_H
#include <intrin.h> // declares __rdtsc; avoids a circular include with jvm_interface.h
#pragma intrinsic(__rdtsc)

namespace nerds_central{

    typedef unsigned __int64 CounterType;

    class JVMTimer{
    private:
        const CounterType t0;
    public:
        JVMTimer() : t0(__rdtsc()){}

        CounterType getElapsed(){
            return (__rdtsc()-t0);
        }
    };
}

#endif


I pass instances of that class to Java by reference using a jlong to hold a pointer to the C++ instance. The Java wraps the C++ objects in objects of its own. It then ties the deletion of each C++ object to the garbage collection of its Java wrapper using phantom references. One might consider this overkill, but it was fun!
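
The pointer-in-a-jlong trick can be shown on its own. In this sketch `Counter` is a hypothetical native class and `Handle` is a `std::int64_t` standing in for jlong so that no JNI headers are needed; the function names are illustrative too.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical native object that managed code wants to hold on to.
class Counter {
public:
    std::int64_t next() { return ++value; }
private:
    std::int64_t value = 0;
};

// std::int64_t stands in for jlong; it must be wide enough for a pointer.
typedef std::int64_t Handle;
static_assert(sizeof(void*) <= sizeof(Handle), "pointer must fit in the handle");

// Allocate the native object and hand back its address as an integer.
inline Handle createCounter() {
    return static_cast<Handle>(reinterpret_cast<std::intptr_t>(new Counter()));
}

// Turn the integer back into a typed pointer.
inline Counter* fromHandle(Handle handle) {
    assert(handle != 0);
    return reinterpret_cast<Counter*>(static_cast<std::intptr_t>(handle));
}

inline std::int64_t counterNext(Handle handle) { return fromHandle(handle)->next(); }

// The holder of the handle is responsible for calling this exactly once.
inline void destroyCounter(Handle handle) { delete fromHandle(handle); }
```

The managed side only ever sees an opaque number, so the lifetime of the native object has to be driven explicitly, which is the job the phantom references do in the Java code.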

Here is the entire Java code:


package com.nerdscentral.jni;

import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.IdentityHashMap;

public class TwoWayTester {
    // =================================
    //       TIMER INTERFACE
    // =================================
    private static native long getRDTSCCycles(long ptr);
    private static native long getTimerInstance();
    private static native void deleteTimerInstance(long ptr);
    public static class RDSCTimer{
        
        private static ReferenceQueue<? super RDSCTimer> clearingQueue = new ReferenceQueue<>();
        private static IdentityHashMap<PhantomReference<RDSCTimer>, Long> instances = new IdentityHashMap<>();

        private long instance;

        /* Create a thread which garbage collects C++ instances of RDSCTimer when the
         * Java instance is enqueued for collection.
         */
        static{
            Thread th = new Thread(new Runnable(){
                @SuppressWarnings("unchecked")
                @Override
                public void run() {
                    while(true){
                        try{
                            PhantomReference<RDSCTimer> refToDelete;
                            try {
                                refToDelete=(PhantomReference<RDSCTimer>) clearingQueue.remove();
                            } catch (InterruptedException e) {
                                // Don't just stare at it
                                continue;
                            }
                            deleteTimerInstance(instances.get(refToDelete));
                            // Mark the phantom to be collectable
                            instances.remove(refToDelete);
                            refToDelete.clear();
                            // Not required as we have cleared it, but it means that even the ref
                            // can now be collected before the next remove on the queue returns
                            refToDelete=null;
                        }catch(Throwable t){
                            t.printStackTrace();
                        }
                    }
                }
            });
            th.setDaemon(true);
            th.start();
        }
        public RDSCTimer(){
            instance = getTimerInstance();
            // Store away the C++ ptr to the RDSCTimer object so that it can be deleted
            instances.put(new PhantomReference<RDSCTimer>(this, clearingQueue), instance);
        }
        
        public long getCurrentCycles(){
            return getRDTSCCycles(instance);
        }
    }
    
    // =================================
    //       CALL INTERFACE
    // =================================
    public static native long jniIncrement(long toIncrement);
    
    public static long jvmIncrement(long toIncrement){
        return toIncrement+1;
    }
    
    public static long jvmIncrementHolder;
    
    /* Run the test by calling the increment in C++ from Java
     * many times and measure the time taken.
     */
    public static long runTestFromJVM(){
        //long dog=System.nanoTime();
        RDSCTimer timer=new RDSCTimer();
        long t0=timer.getCurrentCycles();
        long j=0;
        for(;j<1000000;j=jniIncrement(j));
        long ret = timer.getCurrentCycles()-t0;
        //System.out.println("Dog " + ((System.nanoTime()-dog)/1000));
        return ret;
    }
    
    public static long _runTestFromJVM(){
        Method m=null;
        try {
            m = TwoWayTester.class.getMethod("jvmIncrement",long.class);
        } catch (NoSuchMethodException | SecurityException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        RDSCTimer timer=new RDSCTimer();
        long t0=timer.getCurrentCycles();
        long j=0;
        try {
            for(;j<1000000;){
                //j=jvmIncrement(j);
                j=(long) m.invoke(null,j);
            }
        } catch (IllegalAccessException | IllegalArgumentException
                | InvocationTargetException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return timer.getCurrentCycles()-t0;
    }
}


The actual program is here:


#ifndef TEST_RUNNER_H
#define TEST_RUNNER_H
#include "jni_calls.h"

namespace nerds_central {
    class TestRunner{
    private:
        double callJavaFromNative_Inner();
        double callNativeFromJava_Inner();
        JVM_ptr jvm;
    public:
        TestRunner(JVM_ptr jvmIn);
        void callJavaFromnative();
        void callNativeFromJava();
    };
}// nerds-central

#endif


#include "test_runner.h"
namespace nerds_central{
    jfieldInfo  incrField;
    jmethodInfo incrMethod;
    jmethodInfo testerMethod;

    TestRunner::TestRunner(JVM_ptr jvmIn) : jvm(jvmIn){};
    
    double TestRunner::callJavaFromNative_Inner(){
        return 0;
    }

    double TestRunner::callNativeFromJava_Inner(){
        return 0;
    }

    void TestRunner::callJavaFromnative(){
    }

    void TestRunner::callNativeFromJava(){
    }

    void main(char* classPath){
        auto jvm=setUpJVM(classPath,incrMethod,testerMethod,incrField);
        jlong timeTaken;
        jvm->makeJNICall([&](){
            timeTaken=jvm->getEnv()->CallStaticLongMethod(testerMethod.first,testerMethod.second);
        });
        cout << "Performing tests where Java calls C++" << endl;
        cout << "=====================================" << endl << endl;
        cout << "Initial run from Java took " << timeTaken << " Cycles" << endl;
        cout << "Performing warmed runs " << endl;
        for(int j=0;j<10;++j){
            for(int i=0;i<11;++i){
                jvm->makeJNICall([&](){
                    timeTaken=jvm->getEnv()->CallStaticLongMethod(testerMethod.first,testerMethod.second);
                });
            }
            cout << "Warmed  run from Java took " << (timeTaken/1000000) << " Cycles per call" << endl;
        }
        cout << "****" << endl;

        cout << "Performing tests where C++ calls Java" << endl;
        cout << "=====================================" << endl << endl;
        cout << "****" << endl;
        JVMPerformanceTester timer;
        auto t0=timer.getElapsedCycles();
        for(jlong i=0;i<1000000;){
             i=jvm->getEnv()->CallStaticLongMethod(incrMethod.first,incrMethod.second,i);
        }
        timeTaken=timer.getElapsedCycles()-t0;
        cout << "Initial run from Java took " << timeTaken << " Cycles" << endl;
        cout << "Performing warmed runs " << endl;
        for(int j=0;j<10;++j){
            for(int k=0;k<11;++k){
                t0=timer.getElapsedCycles();
                for(jlong i=0;i<1000000;){
                     i=jvm->getEnv()->CallStaticLongMethod(incrMethod.first,incrMethod.second,i);
                }
                timeTaken=timer.getElapsedCycles()-t0;
            }
            cout << "Warmed  run from Java took " << (timeTaken/1000000) << " Cycles per call" << endl;
        }
        cout << "****" << endl;
    }

}// nerds_central

int main(int argc, char *argv[]){
    if(argc<2){
        cout << "Requires class path on command line" << endl;
    }else{
        nerds_central::main(argv[1]);
    }
}

In this code we can see just how easy calling Java methods from C++ is! The code which binds the interface together is below:


#ifndef JNI_CALLS_H
#define JNI_CALLS_H
#include "jvm_interface.h"
namespace nerds_central{
    JVM_ptr setUpJVM(char* cp,jmethodInfo& methodInfo,jmethodInfo& testerMethod,jfieldInfo& fieldInfo);
} // nerds_central
#endif

#include "jni_calls.h"
namespace nerds_central{

    // Junk method to be called
    static jlong jniIncrement(JNIEnv*env,jclass clazz,jlong what){
        return ++what;
    }

    // Set up the system
    JVM_ptr setUpJVM(char* cp,jmethodInfo& methodInfo,jmethodInfo& testerMethod,jfieldInfo& fieldInfo){
        cout << "About to create JVM" << endl;
        auto vm = make_shared<JVMInterface>(cp);
        //shared_ptr<JVMInterface> vm(new JVMInterface(cp));
        cout << "Created JVM" << endl;

        vm->bindNativeMethod(
            "com/nerdscentral/jni/TwoWayTester",
            "getRDTSCCycles",
            "(J)J",
            &JVMPerformanceTester::getRDTSCCycles
            );

        vm->bindNativeMethod(
            "com/nerdscentral/jni/TwoWayTester",
            "getTimerInstance",
            "()J",
            &JVMPerformanceTester::getInstance
            );

        vm->bindNativeMethod(
            "com/nerdscentral/jni/TwoWayTester",
            "deleteTimerInstance",
            "(J)V",
            &JVMPerformanceTester::deleteInstance
            );

        vm->bindNativeMethod(
            "com/nerdscentral/jni/TwoWayTester",
            "jniIncrement",
            "(J)J",
            &jniIncrement
            );

        methodInfo = vm->findJVMMethod(
            "com/nerdscentral/jni/TwoWayTester",
            "jvmIncrement",
            "(J)J",
            true
        );

        testerMethod = vm->findJVMMethod(
            "com/nerdscentral/jni/TwoWayTester",
            "runTestFromJVM",
            "()J",
            true
        );


        fieldInfo = vm->findJVMField(
            "com/nerdscentral/jni/TwoWayTester",
            "jvmIncrementHolder",
            "J",
            true
        );

        return vm;
    }

} // nerds-central




Just to give an example of how easy this is:


        vm->bindNativeMethod(
            "com/nerdscentral/jni/TwoWayTester",
            "getRDTSCCycles",
            "(J)J",
            &JVMPerformanceTester::getRDTSCCycles
            );


Here I pass in the class, name and signature of the native method definition in Java. Along with that I pass in a pointer to the C++ method I would like invoked when Java calls the native method. The framework binds these together or throws an exception if the binding does not match correctly.
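
The signature strings such as "(J)J" are JNI type descriptors, and typing them by hand is the one place where a mismatch can creep in. As a sketch of how they could be derived from a C++ function type instead, the templates below build the descriptor at run time. This is not part of the framework above: all the names are hypothetical, the fixed-width integer types stand in for jint and jlong, only a handful of types are covered, and variadic templates need a fuller C++11 compiler than Visual Studio 2010.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Map a C++ type to its JNI type code (only the cases needed here).
template <typename T> struct JniCode;
template <> struct JniCode<void>         { static std::string get() { return "V"; } };
template <> struct JniCode<std::int32_t> { static std::string get() { return "I"; } };
template <> struct JniCode<std::int64_t> { static std::string get() { return "J"; } };
template <> struct JniCode<double>       { static std::string get() { return "D"; } };

// Concatenate the codes of an argument pack.
template <typename... Args> struct ArgCodes;
template <> struct ArgCodes<> { static std::string get() { return ""; } };
template <typename First, typename... Rest>
struct ArgCodes<First, Rest...> {
    static std::string get() {
        return JniCode<First>::get() + ArgCodes<Rest...>::get();
    }
};

// Build the descriptor for a function type such as std::int64_t(std::int64_t).
template <typename Fn> struct Descriptor;
template <typename Ret, typename... Args>
struct Descriptor<Ret(Args...)> {
    static std::string get() {
        return "(" + ArgCodes<Args...>::get() + ")" + JniCode<Ret>::get();
    }
};

// Self-check against the three descriptors used in this article.
inline void checkDescriptors() {
    assert(Descriptor<std::int64_t(std::int64_t)>::get() == "(J)J");
    assert(Descriptor<std::int64_t()>::get() == "()J");
    assert(Descriptor<void(std::int64_t)>::get() == "(J)V");
}
```

The JNIEnv* and jclass parameters of a native function are not part of the Java signature, so a real binding helper would have to strip those two leading parameters before building the string.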