Monday, January 23, 2012

Modular Java Applications - A Microkernel Approach

Software Engineering is all about reuse. We programmers therefore love to split applications up into smaller components so that each of them can be reused or extended in an independent manner.

A keyword here is "loose coupling". Slightly simplified, this means that each component should have as few dependencies on other components as possible. Most importantly, if I have a component B which relies on component A, I don't want A to need to know about B. Component A should just provide a clean interface which can be used and extended by B.

In Java there are many frameworks which provide this exact functionality: JavaEE, Spring, OSGi. However, each of those frameworks comes with its own way of doing things and provides lots and lots of additional functionality - whether you want it or not!

Since we here at scireum love modularity (we build 4 products out of a set of about 10 independent modules) we built our own little framework. I factored out the most important parts and now have a single class with fewer than 250 lines of code+comments!

I call this a microkernel approach, since it nicely compares to the situation we have with operating systems: There are monolithic kernels like the one of Linux with about 11,430,712 lines of code. And there is a concept called a microkernel, like the one of Minix with about 6,000 lines of executable kernel code. There is still an ongoing discussion about which of the two solutions is better. A monolithic kernel is faster, a microkernel has way less critical code (critical code means: a bug there will crash the complete system). If you haven't already, you should read more about microkernels on Wikipedia.

Whatever one might think about operating systems - when it comes to Java I prefer fewer dependencies and, if possible, no black magic I don't understand. Especially if this magic involves complex ClassLoader structures. Therefore, here comes Nucleus...

How does this work?
The framework (Nucleus) solves two problems of modular applications:
  • I want to provide a service to other components - but I only want to expose an interface, and they should be provided with my implementation at runtime without knowing (referencing) it.
  • I want to provide a service or callback for other components. I provide an interface, and I want to know all classes implementing it, so I can invoke them.
Ok, we probably need examples for this. Say we want to implement a simple timer service. It provides an interface:

public interface EveryMinute {
    void runTimer() throws Exception;
}

All classes implementing this interface should be invoked every minute. Additionally we provide some information - namely, when the timer was last executed.

public interface TimerInfo {
    String getLastOneMinuteExecution();
}

Ok, next we need a client for our services:


@Register(classes = EveryMinute.class)
public class ExampleNucleus implements EveryMinute {

    private static Part<TimerInfo> timerInfo = 
                                     Part.of(TimerInfo.class);

    public static void main(String[] args) throws Exception {
        Nucleus.init();

        while (true) {
            Thread.sleep(10000);
            System.out.println("Last invocation: "
                    + timerInfo.get().getLastOneMinuteExecution());
        }

    }

    @Override
    public void runTimer() throws Exception {
        System.out.println("The time is: "
                + DateFormat.getTimeInstance().format(new Date()));
    }
}

The static field "Part<TimerInfo> timerInfo" uses a simple helper class which fetches the registered instance from Nucleus on the first call and caches it in a private field. So accessing this part has almost no overhead compared to a normal field access - yet we only reference an interface, not an implementation.
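To make the caching behavior a bit more tangible, here is a minimal sketch of how such a lazy lookup helper could work. Note that LazyPart, REGISTRY and Greeter are hypothetical names made up for this illustration; the real Part class asks Nucleus instead of a local map and may differ in detail.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LazyPartSketch {

    // Hypothetical stand-in for the Nucleus registry: interface -> implementation.
    static final Map<Class<?>, Object> REGISTRY = new ConcurrentHashMap<>();

    // Hypothetical counterpart of Part<T>: resolves the implementation on first use.
    static final class LazyPart<T> {
        private final Class<T> type;
        private volatile T cached;

        private LazyPart(Class<T> type) {
            this.type = type;
        }

        static <T> LazyPart<T> of(Class<T> type) {
            return new LazyPart<>(type);
        }

        T get() {
            T result = cached;
            if (result == null) {
                // First call: ask the registry and remember the answer.
                result = type.cast(REGISTRY.get(type));
                cached = result;
            }
            return result;
        }
    }

    // Example interface, only known by its type to the caller.
    interface Greeter {
        String greet();
    }

    public static void main(String[] args) {
        REGISTRY.put(Greeter.class, (Greeter) () -> "hello");
        LazyPart<Greeter> greeter = LazyPart.of(Greeter.class);
        System.out.println(greeter.get().greet());
    }
}
```

After the first call, get() is little more than a field read, which is why the indirection costs next to nothing.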

The main method first initializes Nucleus (this performs the classpath scan etc.) and then simply goes into an infinite loop, printing the last execution of our timer every ten seconds.

Since our class wears a @Register annotation, it will be discovered by a special ClassLoadAction (not by Nucleus itself), instantiated and registered for the EveryMinute interface. Its method runTimer will then be invoked by our timer service every minute.

Ok, but what would our TimerService look like?

@Register(classes = { TimerInfo.class })
public class TimerService implements TimerInfo {

    @InjectList(EveryMinute.class)
    private List<EveryMinute> everyMinute;
    private long lastOneMinuteExecution = 0;

    private Timer timer;

    public TimerService() {
        start();
    }

    public void start() {
        timer = new Timer(true);
        // Schedule the task to wait 60 seconds and then invoke
        // every 60 seconds.
        timer.schedule(new InnerTimerTask(), 
                       1000 * 60, 
                       1000 * 60);
    }

    private class InnerTimerTask extends TimerTask {

        @Override
        public void run() {
            // Iterate over all instances registered for
            // EveryMinute and invoke their runTimer method.
            for (EveryMinute task : everyMinute) {
                try {
                    task.runTimer();
                } catch (Exception e) {
                    // runTimer may throw a checked exception, which
                    // run() must not propagate. Catching it here also
                    // keeps one failing task from cancelling the Timer.
                    e.printStackTrace();
                }
            }
            // Update lastOneMinuteExecution
            lastOneMinuteExecution = System.currentTimeMillis();
        }

    }

    @Override
    public String getLastOneMinuteExecution() {
        if (lastOneMinuteExecution == 0) {
            return "-";
        }
        return DateFormat.getDateTimeInstance().format(
                new Date(lastOneMinuteExecution));
    }
}




This class also wears a @Register annotation so that it will also be loaded by the ClassLoadAction named above (the ServiceLoadAction actually). As above, it will be instantiated and put into Nucleus (as implementation of TimerInfo). Additionally it wears an @InjectList annotation on the everyMinute field. This will be processed by another class named Factory which performs simple dependency injection. Since its constructor starts a Java Timer for the InnerTimerTask, from that point on all instances registered for EveryMinute will be invoked by this timer - as the name says - every minute.

How is it implemented?
The good thing about Nucleus is that it is powerful on the one hand, but very simple and small on the other hand. As you could see, there is no inner part for special or privileged services. Everything is built around the kernel - the class Nucleus. Here is what it does:

  • It scans the classpath and looks for files called "component.properties". Those need to be in the root folder of a JAR or in the /src folder of each Eclipse project, respectively.
  • For each identified JAR / project / classpath element, it then collects all contained class files and loads them using Class.forName.
  • For each class, it checks whether it implements ClassLoadAction. If so, it is put into a special list.
  • Each ClassLoadAction is instantiated and each previously seen class is sent to it using: void handle(Class<?> clazz)
  • Finally, each ClassLoadAction is notified that Nucleus is complete, so that final steps (like annotation-based dependency injection) can be performed.
That's it. The only other thing Nucleus provides is a registry which can be used to register and retrieve objects for a class. (An in-depth description of the process above can be found here: http://andreas.haufler.info/2012/01/iterating-over-all-classes-with.html).
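The steps above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the classpath scan is replaced by a list of classes that is passed in, and loadingCompleted, RunnableCollector and HelloTask are hypothetical names which are not taken from the Nucleus sources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KernelSketch {

    // handle(Class<?>) is the callback described above; loadingCompleted is a
    // hypothetical name for the final "kernel is complete" notification.
    interface ClassLoadAction {
        void handle(Class<?> clazz);

        void loadingCompleted();
    }

    // Example action: remembers the names of all Runnable implementations.
    public static class RunnableCollector implements ClassLoadAction {
        static final List<String> FOUND = new ArrayList<>();

        @Override
        public void handle(Class<?> clazz) {
            if (Runnable.class.isAssignableFrom(clazz) && !clazz.isInterface()) {
                FOUND.add(clazz.getSimpleName());
            }
        }

        @Override
        public void loadingCompleted() {
            // A real action would perform its final steps (e.g. injection) here.
        }
    }

    public static class HelloTask implements Runnable {
        @Override
        public void run() {
            System.out.println("hello");
        }
    }

    // Stand-in for the classpath scan: the loaded classes are simply passed in.
    static void init(List<Class<?>> classes) {
        List<ClassLoadAction> actions = new ArrayList<>();
        for (Class<?> clazz : classes) {
            if (ClassLoadAction.class.isAssignableFrom(clazz) && !clazz.isInterface()) {
                try {
                    actions.add((ClassLoadAction) clazz.getDeclaredConstructor().newInstance());
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot instantiate " + clazz, e);
                }
            }
        }
        // Every action gets to see every class, then everyone is notified.
        for (ClassLoadAction action : actions) {
            for (Class<?> clazz : classes) {
                action.handle(clazz);
            }
        }
        for (ClassLoadAction action : actions) {
            action.loadingCompleted();
        }
    }

    public static void main(String[] args) {
        init(Arrays.asList(RunnableCollector.class, HelloTask.class, String.class));
        System.out.println(RunnableCollector.FOUND);
    }
}
```

The point of the sketch is that the kernel itself knows nothing about Runnable or any other service type; all such knowledge lives in the actions.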

Now to make this framework usable as shown above, there is a set of classes around Nucleus. Most important is the class ServiceLoadAction, which instantiates each class which wears a @Register annotation, runs Factory.inject (our mini DI tool) on it, and throws it into Nucleus for the listed classes. What's important: The ServiceLoadAction has no specific rights or privileges; you can easily write your own implementation which does smarter stuff.

Next to some annotations, there are three other handy classes when it comes to retrieving instances from Nucleus: Factory, Part and Parts. As noted above, the Factory is a simple dependency injector. Currently only the ServiceLoadAction automatically uses the Factory, as all classes wearing the @Register annotation are scanned for required injections. You can however use this factory to run injections on your own classes or in other ClassLoadActions to do the same as ServiceLoadAction. If you can't or don't want to rely on annotation-based dependency magic, you can use the two helper classes Part and Parts. Those are used like normal fields (see ExampleNucleus.timerInfo above) and fetch the appropriate object or list of objects automatically. Since the result is cached, repeated invocations have almost no overhead compared to a normal field.
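For readers who wonder what "simple dependency injection" boils down to, here is a rough sketch based on plain reflection. It is not the actual Factory: the annotation InjectAll, the REGISTRY map and the class TaskRunner are hypothetical names chosen for this example, and the real implementation may handle more cases.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InjectionSketch {

    // Hypothetical annotation in the spirit of @InjectList.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface InjectAll {
        Class<?> value();
    }

    // Hypothetical registry: type -> all instances registered for it.
    static final Map<Class<?>, List<Object>> REGISTRY = new HashMap<>();

    static void register(Class<?> type, Object instance) {
        REGISTRY.computeIfAbsent(type, k -> new ArrayList<>()).add(instance);
    }

    // Fills every annotated field of the target with the registered instances.
    static void inject(Object target) {
        for (Field field : target.getClass().getDeclaredFields()) {
            InjectAll annotation = field.getAnnotation(InjectAll.class);
            if (annotation == null) {
                continue;
            }
            List<Object> instances =
                    REGISTRY.getOrDefault(annotation.value(), Collections.emptyList());
            try {
                field.setAccessible(true);
                field.set(target, new ArrayList<>(instances));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot inject " + field, e);
            }
        }
    }

    static class TaskRunner {
        @InjectAll(Runnable.class)
        private List<Runnable> tasks;

        int countTasks() {
            return tasks.size();
        }
    }

    public static void main(String[] args) {
        register(Runnable.class, (Runnable) () -> System.out.println("task"));
        TaskRunner runner = new TaskRunner();
        inject(runner);
        System.out.println(runner.countTasks());
    }
}
```

Injection of this kind has to run after all classes have been registered, which is why the "loading complete" notification mentioned above is a natural place for it.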

Nucleus and the example shown above are open source (MIT-License) and available here:
https://github.com/andyHa/scireumOpen/blob/master/src/examples/ExampleNucleus.java
https://github.com/andyHa/scireumOpen/tree/master/src/com/scireum/open/nucleus


If you're interested in using Nucleus, I could put the relevant sources into a separate repository and also provide a release jar - just write a comment below and let me know.

Update:  I moved nucleus into a repository on its own: https://github.com/andyHa/nucleus - It even includes a distribution jar.


This post is the fourth part of my series "Enterprisy Java" - We share our hints and tricks on how to overcome the obstacles when trying to build several multi-tenant web applications out of a set of common modules.

Tuesday, January 10, 2012

Launching and Debugging Tomcat from Eclipse without complex plugins

Modern IDEs like Eclipse provide various plugins to ease web development. However, I believe that starting Tomcat as a "normal" Java application still provides the best debugging experience. Most of the time, this is because these tools launch Tomcat or any other servlet container as an external process and then attach a remote debugger to it. While you're still able to set breakpoints and inspect variables, other features like hot code replacement don't work that well.

Therefore I prefer to start my Tomcat just like any other Java application from within Eclipse. Here's how it works:

This article addresses experienced Eclipse users. You should already know how to create projects, change their build path and how to run classes. If you need any help, feel free to leave a comment or contact me.

We'll add Tomcat as an additional Eclipse project, so that paths and everything else remain platform independent. (I even keep this project in our SVN so that everybody works with the same setup).

Step 1 - Create new Java project named "Tomcat7"




Step 2 - Remove the "src" source folder





Step 3 - Download Tomcat (Core Version) and unzip into our newly created project. This should now look something like this:




Step 4 - If you haven't, create a new Test project which contains your sources (servlets, jsp pages, jsf pages...). Make sure you add the required libraries to the build path of the project.





Step 5.1 - Create a run configuration. Select our Test project as base and set org.apache.catalina.startup.Bootstrap as main class.





Step 5.2 - Optionally specify larger heap settings as VM arguments. Important: Select the "Tomcat7" project as working directory (click on the "Workspace" button below the entry field).





Step 5.3 - Add bootstrap.jar and tomcat-juli.jar from the Tomcat7/bin directory as bootstrap classpath. Add everything in Tomcat7/lib as user entries. Make sure the Test project and all other classpath entries (e.g. Maven dependencies) are below those.






Now you can "Apply" and start Tomcat by hitting "Debug". After a few seconds (check the console output) you can go to http://localhost:8080/examples/ and check out the examples provided by Tomcat.


Step 6 - Add Demo-Servlet - Go to our Test project, add a new package called "demo" and a new servlet called "TestServlet". Be creative with some test output - like I was...




Step 7 - Change web.xml - Go to the web.xml of the examples context and add our servlet (as shown in the image). Below all servlets you also have to add a servlet-mapping (not shown in the image below). It looks like this:

    <servlet-mapping>
        <servlet-name>test</servlet-name>
        <url-pattern>/demo/test</url-pattern>
    </servlet-mapping>




Hit save and restart Tomcat. You should now see your debug output by surfing to http://localhost:8080/examples/demo/test - You can now set breakpoints, change the output (thanks to hot code replacement) and do all the other fun stuff you do in other debugging sessions.


Hint: Keeping your JSP/JSF files as well as your web.xml and other resources already in another project? Just create a little ANT script which copies them into the webapps folder of the tomcat - and you get re-deployment with a single mouse click. Even better (this is what we do): You can modify/override the ResourceResolver of JSF. Therefore you can simply use the classloader to resolve your .xhtml files. This way, you can keep your Java sources and your JSF sources close to each other. I will cover that in another post - The fun stuff starts when running multi tenant systems with custom JSF files per tenant. The JSF implementation of Sun/Oracle has some nice gotchas built-in for that case ;-)

Friday, January 6, 2012

Iterating over all Classes with an Annotation or Interface

...is impossible unless you use JavaEE, right? Wrong!

With some tricks you can iterate over all classes which fulfill a given predicate, like implementing an interface or wearing an annotation. But why should you care?

Well, software engineering is all about reuse. For example, we split our software up into about five modules and currently build four different products on top of them. Since a module obviously has no knowledge about any of these products and their classes, we need to discover extensions and handlers and other hooks at runtime.

Yes, we could use OSGi or Spring for that, but when we started out, both of these "feature battleships" were far too large for our concerns. So we built our own little DI (dependency injection) framework (with about a handful of classes). Well, DI is probably not the key aspect, it's actually all about getting all classes implementing a given interface or wearing a given annotation. (Some concrete examples will follow in the next posts).

So how do we get this magic list? Well, it's tricky - but works like a charm in our setting:

As I said, we have several modules, each of which participates in the search for classes. Each module becomes a JAR file. What we do now is place a file called "component.properties" in the root folder of this JAR, or in the root folder of the Eclipse project, respectively. This file contains some meta-data like name, version and build-date (filled in by ant) - but that's irrelevant for now.

Now when we want to discover our classes, we first get a list of all component.properties files in the classpath. Thanks to the convention above, there will be one per module/JAR:

Enumeration<URL> e = Nucleus.class.getClassLoader().
                       getResources("component.properties");

We then use each of the returned URLs and apply the following (dirty) algorithm:

    /**
     * Takes a given url and creates a list which contains 
     * all children of the given url. 
     * (Works with Files and JARs).
     */
    public static List<String> getChildren(URL url) {
        List<String> result = new ArrayList<String>();
        if ("file".equals(url.getProtocol())) {
            File file = new File(url.getPath());
            if (!file.isDirectory()) {
                file = file.getParentFile();
            }
            addFiles(file, result, file);
        } else if ("jar".equals(url.getProtocol())) {
            try {
                JarFile jar = ((JarURLConnection) url.openConnection())
                        .getJarFile();
                Enumeration<JarEntry> e = jar.entries();
                while (e.hasMoreElements()) {
                    JarEntry entry = e.nextElement();
                    result.add(entry.getName());
                }
            } catch (IOException e) {
                Log.UTIL.WARN(e);
            }
        }
        return result;
    }

    /**
     * Collects all children of the given file into the given 
     * result list. The resulting string is the relative path
     * from the given reference.
     */
    private static void addFiles(File file, 
                                 List<String> result, 
                                 File reference) 
    {
        if (!file.exists() || !file.isDirectory()) {
            return;
        }
        for (File child : file.listFiles()) {
            if (child.isDirectory()) {
                addFiles(child, result, reference);
            } else {
                String path = null;
                while (child != null && !child.equals(reference)) {
                    if (path != null) {
                        path = child.getName() + "/" + path;
                    } else {
                        path = child.getName();
                    }
                    child = child.getParentFile();
                }
                result.add(path);
            }
        }
    }


So what we now have is a list of files like:
    com/acme/MyClass.class
    com/acme/resource.txt
    ... 

We can now filter this list for classes and load each one. Then we can check our predicates (implements interface, has annotation):

...iterating over each result of getChildren(url)...:


if (relativePath.endsWith(".class")) {
  // Remove .class and change / to .
  String className = relativePath.substring(0,
                      relativePath.length() - 6).replace("/", ".");
  try {
     Class<?> clazz = Class.forName(className);
     for (ClassLoadAction action : actions) {
        action.handle(clazz);
     }
  } catch (ClassNotFoundException e) {
     System.err.println("Failed to load class: " + className);
  } catch (NoClassDefFoundError e) {
     System.err.println("Failed to load dependent class: " +
                        className);
  }
}

Now you have loaded all classes which match your predicates. We do this once on startup and fill a lookup Map. Once a component wants to know all implementations of X, we simply query this Map. Furthermore we use the component.properties to let the framework know which component depends on which. We then load the classes in the correct order. This is important, since while loading classes you can provide more implementations of ClassLoadAction - Yes, you can extend the extension loader while loading extensions.
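To illustrate the predicate checks and the lookup Map mentioned above, here is a small sketch. The names Plugin, Handler, MailHandler, LogHandler and LOOKUP are invented for this example and do not appear in our framework; the real code performs these checks inside its ClassLoadAction implementations.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupSketch {

    // Hypothetical marker annotation; RUNTIME retention is required for reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Plugin {
    }

    interface Handler {
    }

    @Plugin
    static class MailHandler implements Handler {
    }

    static class LogHandler implements Handler {
    }

    static abstract class BaseHandler implements Handler {
    }

    // Lookup map: interface -> all concrete classes implementing it.
    static final Map<Class<?>, List<Class<?>>> LOOKUP = new HashMap<>();

    // Predicate "implements interface": only concrete classes are indexed.
    static void index(Class<?> candidate, Class<?>... typesOfInterest) {
        if (candidate.isInterface() || Modifier.isAbstract(candidate.getModifiers())) {
            return;
        }
        for (Class<?> type : typesOfInterest) {
            if (type.isAssignableFrom(candidate)) {
                LOOKUP.computeIfAbsent(type, k -> new ArrayList<>()).add(candidate);
            }
        }
    }

    // Predicate "wears annotation".
    static boolean isPlugin(Class<?> candidate) {
        return candidate.isAnnotationPresent(Plugin.class);
    }

    public static void main(String[] args) {
        Class<?>[] loaded = {MailHandler.class, LogHandler.class, BaseHandler.class, String.class};
        for (Class<?> clazz : loaded) {
            index(clazz, Handler.class);
        }
        System.out.println(LOOKUP.get(Handler.class));
        System.out.println(isPlugin(MailHandler.class));
    }
}
```

Once the map is filled at startup, asking for all implementations of an interface is a plain map lookup.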

I've made the framework open source. An article about it can be found here: http://andreas.haufler.info/2012/01/modular-java-applications-microkernel.html

This post is the third part of my series "Enterprisy Java" - We share our hints and tricks on how to overcome the obstacles when trying to build several multi-tenant web applications out of a set of common modules.

Java: Caching without Crashing

When building larger Java applications you sooner or later stumble over the decision "Quite intensive computation - should I recompute this every time?". Most of the time the alternative is to take a Map<K,V> and cache the computed value. This works for short-lived objects where the total number of cached values can be predicted. But sometimes you just don't know how many values will be cached. So you'd say, we better take Apache Commons Collections' LRUMap and limit the size of each cache, so things don't get out of hand. This is of course far better than the first solution. However, those caches still grow and grow until they have reached their max size - and they'll never shrink!

We've had such a solution running on our servers. After some days of operation, the JVM had always maxed out its heap. Well, it didn't crash, it was actually very stable, it just consumed a lot of expensive resources, since all caches were full, no matter if they were currently used or not.

So what we needed and built is a CacheManager. Whenever you need a cache, you ask the CacheManager to provide one for you. Internally it still uses Apache Commons Collections' robust LRUMap, along with some statistics and bookkeeping. In addition, it lets you specify a maximum "Time To Live (TTL)" for each cache. The CacheManager then checks all caches regularly and cleans out unused entries. Using this, you can easily use caches here and there, knowing that once the utilization goes down, the entries will be evicted and no longer block expensive resources.
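The combination of a size limit and a TTL can be sketched with JDK classes only. This is not our CacheManager: the sketch swaps LRUMap for a LinkedHashMap in access order, assumes that the TTL is measured from the last use of an entry, and uses made-up names (TtlCacheSketch, Slot) plus an injectable clock to keep the example testable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

public class TtlCacheSketch<K, V> {

    // A cached value plus the time of its last use.
    private static final class Slot<V> {
        final V value;
        long lastUsed;

        Slot(V value, long lastUsed) {
            this.value = value;
            this.lastUsed = lastUsed;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clock;
    private final LinkedHashMap<K, Slot<V>> entries;

    public TtlCacheSketch(final int maxSize, long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // accessOrder = true keeps the least recently used entry first;
        // removeEldestEntry drops it as soon as maxSize is exceeded.
        this.entries = new LinkedHashMap<K, Slot<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Slot<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    public synchronized V get(K key) {
        Slot<V> slot = entries.get(key);
        if (slot == null) {
            return null;
        }
        slot.lastUsed = clock.getAsLong();
        return slot.value;
    }

    public synchronized void put(K key, V value) {
        entries.put(key, new Slot<>(value, clock.getAsLong()));
    }

    // Removes all entries which were not used within the TTL.
    public synchronized void runEviction() {
        long limit = clock.getAsLong() - ttlMillis;
        entries.values().removeIf(slot -> slot.lastUsed < limit);
    }

    public synchronized int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        TtlCacheSketch<String, Integer> cache =
                new TtlCacheSketch<>(10, 60_000L, System::currentTimeMillis);
        cache.put("key", 42);
        System.out.println(cache.get("key"));
        cache.runEviction();
        System.out.println(cache.size());
    }
}
```

The size limit protects against sudden bursts, while the periodic eviction is what finally lets an idle cache shrink again.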

Here's how you'd use this class - see ExampleCache.java for a full example:

    Cache<String, Integer> test = CacheManager
            .createCache("Test", // Name of the cache
                         10,     // Max number of entries
                         1,      // TTL
                         TimeUnit.HOURS // Unit of TTL (hours)
                         );

The Cache can then be used like a Map:

     value = test.get("key");
     test.put("key", value);

All you need to do is set up a Timer or another service which regularly invokes "CacheManager.runEviction()" to clean up all caches.

A complete example and the implementation can be found here (open source - MIT-License):

Each Cache can also provide statistics like size, hit rate, number of uses, etc:
Visualization of the provided usage statistics in our systems - captions are in German, sorry ;-)

This post is the second part of my series "Enterprisy Java" - We share our hints and tricks on how to overcome the obstacles when trying to build several multi-tenant web applications out of a set of common modules.

Conveniently Processing Large XML Files with Java

When processing XML data I find it most convenient to load the whole document using a DOM parser and fire some XPath queries against the result. However, since we're building a multi-tenant eCommerce platform we regularly have to handle large XML files, with file sizes above 1 GB. You certainly don't want to load such a beast into the heap of a production server, since it easily grows up to 3GB+ as DOM representation.

So what to do? Well, SAX to the rescue! Processing a large XML file using a SAX parser requires only constant (low) memory, since it merely invokes callbacks for detected XML tokens. But, on the other hand, parsing complex XML this way really becomes a mess.

To resolve this problem we need to have a closer look at our XML input data. Most of the time, at least in our cases, you don't need the whole DOM at once. Say you're importing product information: it is sufficient to look at one product at a time. Example:

<nodes>
    <node>
        <name>Node 1</name>
        <price>100</price>
    </node>
    <node>
        <name>Node 2</name>
        <price>23</price>
    </node>
    <node>
        <name>Node 3</name>
        <price>12.4</price>
        <resources>
            <resource type="test">Hello 1</resource>
            <resource type="test1">Hello 2</resource>
        </resources>
    </node>
</nodes>

When processing Node 1, we don't need access to any attribute of Node 2 or 3. Likewise, when processing Node 2, we don't need access to Node 1 or 3, and so on. So what we want is a partial DOM, in our example one for every <node>.

What we've therefore built is a SAX parser, for which you can specify in which XML elements you are interested. Once such an element starts, we record the whole sub-tree. When this completes we notify a handler which then can run XPath expressions against this partial DOM. After that, the DOM is released and the SAX parser continues.
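The recording idea can be sketched with the XML APIs of the JDK alone. This is a simplified illustration and not the implementation linked below: PartialDomSketch and namesOfNodes are hypothetical names, only a single element name is supported, and namespaces are ignored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class PartialDomSketch extends DefaultHandler {

    private final String tagOfInterest;
    private final Consumer<Element> callback;
    private Document document; // non-null only while a sub-tree is being recorded
    private Node current;      // node which receives the next child

    public PartialDomSketch(String tagOfInterest, Consumer<Element> callback) {
        this.tagOfInterest = tagOfInterest;
        this.callback = callback;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (document == null) {
            if (!tagOfInterest.equals(qName)) {
                return;
            }
            document = newDocument();
            current = document;
        }
        Element element = document.createElement(qName);
        for (int i = 0; i < attributes.getLength(); i++) {
            element.setAttribute(attributes.getQName(i), attributes.getValue(i));
        }
        current.appendChild(element);
        current = element;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (document != null) {
            current.appendChild(document.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (document == null) {
            return;
        }
        current = current.getParentNode();
        if (current == document) {
            // The recorded element is complete: hand it over, then release the DOM.
            callback.accept(document.getDocumentElement());
            document = null;
        }
    }

    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Demo: collects the <name> of every <node>, one partial DOM at a time.
    static List<String> namesOfNodes(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> names = new ArrayList<>();
        PartialDomSketch handler = new PartialDomSketch("node", node -> {
            try {
                names.add(xpath.evaluate("name", node));
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        });
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesOfNodes("<nodes><node><name>Node 1</name></node></nodes>"));
    }
}
```

At any point in time at most one recorded sub-tree is held in memory, which is what keeps the heap usage flat regardless of the file size.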

Here is a shortened example of how you could parse the XML above - one "<node>" at a time:

   XMLReader r = new XMLReader();

   r.addHandler("node", new NodeHandler() {

     @Override
     public void process(StructuredNode node) {
       System.out.println(node.queryString("name"));
       System.out.println(node.queryValue("price").asDouble(0d));
     }
   });

   r.parse(new FileInputStream("src/examples/test.xml"));

The full example, along with the implementation is open source (MIT-License) and available here:
https://github.com/andyHa/scireumOpen/tree/master/src/com/scireum/open/xml
https://github.com/andyHa/scireumOpen/blob/master/src/examples/ExampleXML.java

We successfully handle up to five parallel imports of 1GB+ XML files in our production system, without measurable heap growth. (Instead of using a FileInputStream, we use Java's ZIP capabilities and directly open and process ZIP versions of the XML file. This shrinks those monsters down to 20-50MB and makes uploads etc. much easier.)
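Reading directly from a ZIP archive could look roughly like the following sketch. It assumes that the XML file is the first entry of the archive; ZipStreamSketch and its helper methods are hypothetical names, and the in-memory archive only serves to keep the demo self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipStreamSketch {

    // Positions the stream at the first entry of the archive. The returned stream
    // yields the uncompressed bytes of that entry and can be handed to a parser.
    static InputStream openFirstEntry(InputStream zipped) {
        try {
            ZipInputStream zip = new ZipInputStream(zipped);
            if (zip.getNextEntry() == null) {
                throw new IllegalArgumentException("ZIP archive contains no entries");
            }
            return zip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper only: builds a small ZIP archive in memory.
    static byte[] zip(String entryName, String content) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(buffer)) {
                out.putNextEntry(new ZipEntry(entryName));
                out.write(content.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper only: drains a stream into a String.
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] archive = zip("test.xml", "<nodes/>");
        System.out.println(readAll(openFirstEntry(new ByteArrayInputStream(archive))));
    }
}
```

Since the data is decompressed while it is being read, the parser never sees more than a small buffer of the file at once.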


This post is the first part of my series "Enterprisy Java" - We share our hints and tricks on how to overcome the obstacles when trying to build several multi-tenant web applications out of a set of common modules.

Thursday, January 5, 2012

Calculating VAT Correctly - Without Going to Jail


...sounds quite simple, but unfortunately it isn't.

But first a bit of theory: As is well known, monetary amounts are rounded commercially (to be precise: according to DIN 1333), just as we learned it in school: up to 0.4 you round down, from 0.5 on you round up. Right? Wrong! Forgive your elementary school teacher: values are "rounded away from 0", so for negative numbers it works exactly the other way around: -0.4 becomes 0 and -0.5 becomes -1. But fine, nothing new so far.

Calculating with rounded numbers is somewhat more confusing, because suddenly the rules of mathematics no longer apply, in particular the distributive law:

a * (b + c) = (a * b) + (a * c)

Do we really need it? Yes: if you replace "a" with "VAT", "b" with "price of line item 1" and "c" with "price of line item 2", you see two ways of calculating the VAT of an invoice. On the left, the item prices are summed up and the VAT is calculated on the sum. On the right, the VAT is calculated per line item and then summed up. Both should yield the same result - but they don't.

Ok, but why does the law no longer hold? Since we round after each arithmetic operation, each step introduces a rounding error. This error can be amplified by subsequent operations. Here is a small example:

a = 1.19  "Gross price is net price (100%) + 19% VAT".
b = 14.71 "Price of line item 1"
c = 10.18 "Price of line item 2"

a * (b + c) = 1.19 * (14.71 + 10.18) 
            = 1.19 * (24.89) 
            = 29.6191
            = (rounded) 29.62

(a * b) + (a * c) = (1.19 * 14.71) + (1.19 * 10.18)
                  = (17.5049) + (12.1142)
                  = (rounded) 17.50 + 12.11
                  = 29.61
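The two calculations can be reproduced with BigDecimal. The following is only a sketch: the class and method names are made up for this illustration, and RoundingMode.HALF_UP is used because it rounds ties away from zero, which matches the commercial rounding described above.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class VatRoundingSketch {

    // Commercial rounding to two decimal places (ties are rounded away from zero).
    static BigDecimal round(BigDecimal amount) {
        return amount.setScale(2, RoundingMode.HALF_UP);
    }

    // Left side: sum up the net prices, apply the factor, round once.
    static BigDecimal grossOfSum(BigDecimal factor, BigDecimal... netPrices) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal price : netPrices) {
            sum = sum.add(price);
        }
        return round(factor.multiply(sum));
    }

    // Right side: apply the factor per line item, round each result, then sum up.
    static BigDecimal sumOfGross(BigDecimal factor, BigDecimal... netPrices) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal price : netPrices) {
            sum = sum.add(round(factor.multiply(price)));
        }
        return sum;
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.19");
        BigDecimal b = new BigDecimal("14.71");
        BigDecimal c = new BigDecimal("10.18");
        System.out.println(grossOfSum(a, b, c)); // prints 29.62
        System.out.println(sumOfGross(a, b, c)); // prints 29.61
    }
}
```

Note that the amounts are created from strings; a double literal like 1.19 would already carry a binary representation error into the calculation.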

Ok, then let's just calculate exactly internally and only round the final result commercially! Good approach, but unfortunately, according to UStG §14, an invoice must contain the following information (roughly):
  • Net sum of the line items
  • Sum of the VAT to be paid (possibly broken down per tax rate)
  • Sum including VAT
As we saw above, this means that intermediate results which are displayed rounded
(e.g. net sum = b + c = 24.89) do not verifiably match the rounded final result:
29.61 (not equal to 24.89 * 1.19 = 29.62)



Ok, convinced, we round after each arithmetic operation - or even better, what does the legislator say about how to round? That's the interesting part: nothing.

There is, however, a kind of "best practice": internally, you should always calculate as exactly as possible. In our example only 1 cent was "missing". For invoices with many line items, however, this error can build up and the difference can become almost arbitrarily large. If you calculate exactly internally and only round the displayed intermediate and final results, you do get the special case described above, namely a difference of possibly 1 cent, but this error stays within bounds, since it is not amplified by subsequent operations.

So in the future, feel free to take a closer look at the occasional receipt - but as we have seen, there is no need to alert the tax office right away if the net and gross sums don't match to the cent.