Saturday, February 6, 2016

C code always runs way faster than Java, right? Wrong!

We all know the prejudice: Java is interpreted and therefore slow, while C is compiled and optimized and therefore very fast. Well, as you might know, the picture is quite different.

TL;DR: Java can be faster in situations where the JIT can perform inlining because all methods/functions are visible to it, whereas the C compiler (unless link-time optimization is used) cannot perform optimizations across compilation units (think of libraries etc.).

A C compiler takes the C code as input, compiles and optimizes it and generates machine code for a specific CPU or architecture. This leads to an executable which can be run directly on the given machine without further steps. Java, on the other hand, has an intermediate step: bytecode. The Java compiler takes Java code as input and generates bytecode, which is basically machine code for an abstract machine. For each (popular) CPU architecture there is a Java Virtual Machine, which simulates this abstract machine and executes (interprets) the generated bytecode. And this is as slow as it sounds. But on the other hand, bytecode is quite portable, as the same output will run on all platforms - hence the slogan "Write once, run anywhere".
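To make "simulating an abstract machine" a bit more concrete, here is a minimal sketch of an interpreter loop for a toy stack machine. The opcodes (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) and the function run_bytecode are made up for this illustration and have nothing to do with real JVM bytecode; the point is only that every single instruction goes through a fetch-and-dispatch step, which is what makes pure interpretation slow.

```c
#include <stddef.h>

/* Hypothetical opcodes of a toy stack machine (NOT real JVM bytecode). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_SIZE 64

/* Interprets `code` and returns the value on top of the stack at OP_HALT.
 * Returns 0 for malformed programs to keep the sketch short. */
int run_bytecode(const int *code, size_t len) {
    int stack[STACK_SIZE];
    size_t sp = 0; /* number of values currently on the stack */
    size_t pc = 0; /* index of the next instruction */

    while (pc < len) {
        switch (code[pc++]) { /* fetch and dispatch, once per instruction */
        case OP_PUSH:
            if (pc >= len || sp >= STACK_SIZE) return 0;
            stack[sp++] = code[pc++]; /* operand follows the opcode */
            break;
        case OP_ADD:
            if (sp < 2) return 0;
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            if (sp < 2) return 0;
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : 0;
        default:
            return 0; /* unknown opcode */
        }
    }
    return sp > 0 ? stack[sp - 1] : 0;
}
```

A program such as { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT } computes (2 + 3) * 4 this way, paying for a switch dispatch on each of its five instructions.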

With the approach described above it would rather be "write once, wait everywhere", as the interpreter would be quite slow. So what a modern JVM does is just-in-time (JIT) compilation. This means the JVM internally translates the bytecode into machine code for the CPU at hand. But as this process is quite complex, the HotSpot JVM (the one most commonly used) only does this for code fragments which are executed often enough (hence the name HotSpot). Besides being faster at startup (the interpreter starts right away, the JIT compiler kicks in as needed), this has another benefit: the HotSpot JIT already knows which parts of the code are called frequently and which are not, so it can use that information while optimizing the output - and this is where our example comes into play.
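The "only compile what is hot" idea can be modelled in a few lines of C. This is a toy sketch and not how HotSpot is implemented: the threshold HOT_THRESHOLD, the struct method and the two implementations are all assumptions made for illustration. Each call bumps a counter, and once the counter reaches the threshold the slow implementation is swapped for a fast one.

```c
/* Hypothetical threshold; real JVMs use their own, tunable heuristics. */
#define HOT_THRESHOLD 1000

/* A "method" is a pointer to its current implementation plus a call counter. */
struct method {
    int (*impl)(int);
    unsigned calls;
};

/* Stand-in for the slow, interpreted version of a method. */
int interpreted_compute(int i) {
    volatile int r = i; /* volatile only to mimic extra work */
    r = r + 1;
    return r;
}

/* Stand-in for the machine code a JIT compiler would produce. */
int compiled_compute(int i) {
    return i + 1;
}

/* Every method starts out interpreted. */
struct method make_method(void) {
    struct method m = { interpreted_compute, 0 };
    return m;
}

/* Counts calls and switches to the fast version once the method is hot. */
int invoke(struct method *m, int arg) {
    if (m->impl == interpreted_compute && ++m->calls >= HOT_THRESHOLD) {
        m->impl = compiled_compute; /* "JIT compile" the hot method */
    }
    return m->impl(arg);
}
```

In this model the first 999 calls run the slow version and every call from the 1000th onwards runs the fast one, so code that is executed only a handful of times never pays the compilation cost.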

Before having a look at my tiny, totally made-up example, let me note that Java has a lot of features, like dynamic dispatch (calling a method on an interface), which also come with runtime overhead. So Java code is probably easier to write but will still generally be slower than C code. However, when it comes to pure number crunching, like in my example below, there are interesting things to discover.

So without further talk, here is the example C code:

test.c:

int compute(int i);

int test(int i);

int main(int argc, char** argv) {
    int sum = 0;
    for (int l = 0; l < 1000; l++) {
        int i = 0;
        while (i < 2000000) {
            if (test(i))
                sum += compute(i);
            i++;
        }
    }
    return sum;
}

test1.c:

int compute(int i) {
    return i + 1;
}

int test(int i) {
    return i % 3;
}


What the main function actually computes isn't important at all. The point is that it calls two functions (test and compute) very often and that those functions are in another compilation unit (test1.c). Now let's compile and run the program:

> gcc -O2 -c test1.c
> gcc -O2 -c test.c
> gcc test.o test1.o
> time ./a.out

real    0m6.693s
user    0m6.674s
sys    0m0.012s


So this takes about 6.7 seconds to perform the computation. Now let's have a look at the Java program:

Test.java:

public class Test {

    private static int test(int i) {
        return i % 3;
    }

    private static int compute(int i) {
        return i + 1;
    }

    private static int exec() {
        int sum = 0; 
        for (int l = 0; l < 1000; l++) {
            int i = 0; 
            while (i < 2000000) {
                if (test(i) != 0) {
                    sum += compute(i); 
                }
                i++; 
            }
        }
        return sum; 
    }

    public static void main(String[] args) {
        System.out.println(exec());     
    }
} 
 
Now let's compile and execute this:

> javac Test.java
> time java Test

real    0m3.411s
user    0m3.395s
sys     0m0.030s


So taking 3.4 seconds, Java is considerably faster for this simple task (and this even includes the slow startup of the JVM). The question is why? The answer, of course, is that the JIT can perform code optimizations that the C compiler, as invoked above, can't. In our case it is function inlining. As we defined our two tiny functions in their own compilation unit, the compiler cannot inline them when compiling test.c (gcc can only do this across files with link-time optimization, -flto, which we did not use). The JIT, on the other hand, has all methods at hand and can perform aggressive inlining, and hence the compiled code is way faster.
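To show what inlining buys here, the following sketch puts the loop as written next to a hand-inlined version of it. The function names sum_with_calls and sum_inlined are mine, the loop bound is a parameter, and the sum is unsigned (unlike in test.c) so that the arithmetic is well defined for any input. It is only an approximation of what an optimizer does: once the bodies of test and compute are visible, the two calls per iteration disappear and further optimizations on the loop become possible.

```c
/* The same tiny functions as in test1.c. */
int compute(int i) {
    return i + 1;
}

int test(int i) {
    return i % 3;
}

/* The loop as written in test.c: two function calls per iteration. */
unsigned sum_with_calls(int n) {
    unsigned sum = 0;
    for (int i = 0; i < n; i++) {
        if (test(i))
            sum += (unsigned)compute(i);
    }
    return sum;
}

/* The same loop after inlining both functions by hand: no calls left,
 * only a remainder check and an addition. */
unsigned sum_inlined(int n) {
    unsigned sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 3)
            sum += (unsigned)i + 1u;
    }
    return sum;
}
```

Both functions return the same result for every non-negative n; the difference is only in how much work each iteration costs when the compiler cannot see into compute and test.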

So is that a totally exotic and made-up example which never occurs in real life? Yes and no. Of course it is an extreme case, but think about all the libraries you include in your code. Functions that live in a separately compiled library cannot be considered for inlining in C, whereas in Java it does not matter where the bytecode comes from. As it is all present in the running JVM, the JIT can optimize to its heart's content. Of course there is a dirty trick in C to ease this pain: macros. This is, in my eyes, one of the major reasons why so many libraries in C still use macros instead of proper functions - with all the problems and headaches that come with them.
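One of the classic headaches with macros is that arguments are pasted in textually and may be evaluated more than once. The sketch below contrasts a function-like macro with a static inline function, which is the usual way to put a small function into a header so that every file including it can inline it. All names (SQUARE_MACRO, square_inline, counted and the two helper functions) are made up for this illustration.

```c
/* Function-like macro: expanded textually, so `x` appears twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* A proper function that can live in a header; its body is visible to
 * every file that includes it, so the compiler is free to inline it. */
static inline int square_inline(int x) {
    return x * x;
}

/* Counts how often an argument expression gets evaluated. */
static int evaluations = 0;

int counted(int v) {
    evaluations++;
    return v;
}

/* Number of times the argument is evaluated when using the macro. */
int evaluations_by_macro(int v) {
    evaluations = 0;
    (void)SQUARE_MACRO(counted(v)); /* expands to two calls of counted() */
    return evaluations;
}

/* Number of times the argument is evaluated when using the function. */
int evaluations_by_inline(int v) {
    evaluations = 0;
    (void)square_inline(counted(v)); /* the argument is evaluated once */
    return evaluations;
}
```

With the macro the argument expression runs twice, with the function only once, which matters as soon as the argument has side effects or is expensive to compute.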

Now before the flamewars start: Both of these languages have their strengths and weaknesses and both have their place in the world of software engineering. This post was only written to open your eyes to the magic and wonders that a modern JVM makes happen each and every day.