On Classpath Hygiene

One of the nastiest problems in large JVM-based systems is that of type conflicts. These arise when more than one definition of a class is found for one and the same name or, similarly, when there is no single version of a given class that is compatible with all the code using it.

This post is about how much pain you can inflict when you expose APIs in a modular environment and do not pay attention to the unwanted dependencies you expose to your users along with them.

These situations do not occur because of ignorance or negligence in the first place, and most likely not in the code you wrote.

The actual root cause is, seen from another perspective, one of Java’s biggest strengths: the enormous ecosystem of frameworks and libraries to choose from. Using a third-party implementation almost always means pulling in its dependencies on other libraries, and not necessarily in compatible versions.

Almost from its beginning, Java has had a way of splitting “class namespaces”, so that clashes between classes with the same name but different code can be avoided, type visibility can be limited, and, not least, code can be retrieved from places other than the virtual machine’s classpath: class loaders.

Even if they share the same name, classes loaded (defined) by one class loader are distinct from classes loaded by other class loaders and cannot be cast to one another. They may, however, share a common supertype, or use identical classes in their signatures and implementations. Indeed, the whole concept makes little sense unless the splitting approach also comes with a way of sharing.
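To make the isolation concrete, here is a minimal sketch that defines the very same class file twice, in two separate class loaders, and checks the result. All names (`IsolationDemo`, `Widget`, `SingleClassLoader`) are hypothetical, and the sketch assumes the compiled `.class` file is readable as a classpath resource, which is the usual case when running from a directory or a JAR.

```java
import java.io.IOException;
import java.io.InputStream;

// Defines the same class file in two different loaders to show that
// "same name" does not mean "same type". All names are hypothetical.
public class IsolationDemo {

    // A trivial public class whose bytecode we define twice.
    public static class Widget {
        public Widget() {}
    }

    // Defines exactly one class from raw bytes; delegates everything else to its parent.
    static final class SingleClassLoader extends ClassLoader {
        private final String className;
        private final byte[] bytes;

        SingleClassLoader(ClassLoader parent, String className, byte[] bytes) {
            super(parent);
            this.className = className;
            this.bytes = bytes;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(className)) {
                return super.loadClass(name, resolve); // ordinary parent delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Define locally without asking the parent first.
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    // Reads the .class file of the given type from the classpath.
    static byte[] classBytes(Class<?> type) throws IOException {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            return in.readAllBytes();
        }
    }

    // Defines Widget in a fresh, separate loader and returns the resulting Class.
    static Class<?> defineWidgetInNewLoader() throws Exception {
        String name = Widget.class.getName();
        ClassLoader parent = IsolationDemo.class.getClassLoader();
        return new SingleClassLoader(parent, name, classBytes(Widget.class)).loadClass(name);
    }

    public static void main(String[] args) throws Exception {
        Class<?> w1 = defineWidgetInNewLoader();
        Class<?> w2 = defineWidgetInNewLoader();
        Object instance = w1.getDeclaredConstructor().newInstance();

        System.out.println(w1.getName().equals(w2.getName())); // true: same name
        System.out.println(w1 == w2);                          // false: two distinct types
        System.out.println(w2.isInstance(instance));           // false: a cast would fail
    }
}
```

Sharing, by contrast, is what happens for `java.lang.Object`: both copies of `Widget` obtain it through ordinary delegation to the same parent, so their common supertype is one and the same class.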

Isolation by class loaders, combined with more or less clever ways of sharing types and resources, is the underpinning of all Java runtime modularization (as in any Java EE server, OSGi, and of course Z2).

In the default setup provided by Java’s makers, class loaders are arranged in a tree structure, where each class loader has a parent class loader:

[Figure: standard class loader hierarchy]

The golden rule is: when asked to load a class, a class loader first asks its parent (parent delegation). If the parent cannot provide the class, the class loader is supposed to search for it in its own way and, if it finds it, define the class with the VM.

This simple pattern makes sure that types available at some class loader node in the tree will be consistently shared by all descendants.
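Java’s own `ClassLoader.loadClass` already follows this order: it checks for an already loaded class, then asks the parent, and only then calls `findClass`. The hypothetical subclass below is a sketch that spells those steps out explicitly. Its `findClass` is a placeholder, assuming a real loader would fetch bytes from its own source and call `defineClass`.

```java
// Sketch of the parent-delegation rule, written out step by step.
// java.lang.ClassLoader.loadClass already behaves like this; the class
// name ParentFirstLoader is hypothetical.
public class ParentFirstLoader extends ClassLoader {

    public ParentFirstLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 0. Reuse a class this loader has already loaded.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // 1. Ask the parent first (a null parent means the bootstrap loader).
                    ClassLoader parent = getParent();
                    c = (parent != null) ? parent.loadClass(name) : Class.forName(name, false, null);
                } catch (ClassNotFoundException parentMissed) {
                    // 2. Only if the parent cannot provide it, search locally.
                    c = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Placeholder: a real loader would read the bytes from its own source
        // (a plugin JAR, a repository, the network) and call defineClass.
        throw new ClassNotFoundException(name);
    }
}
```

Because every lookup goes up the tree before it goes sideways, any class visible at some node is handed out unchanged to all of that node’s descendants.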

So far, so good.

Frequently, however, when developers add extensibility through plugins, modularization comes in as an afterthought, and little thought goes into making sure that plugin code gets to see no more than what is strictly needed.

Unfortunately, if you choose to expose (for example) a version of Hibernate via your API, you essentially make your version the one and only version that can responsibly be used. This is a direct consequence of the standard parent-delegation model: a plugin asking for a Hibernate class will always be handed the one its parent exposes.
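One way of exposing no more than strictly needed, sketched here as an illustration rather than as the approach of any particular server, is to put a filtering loader between the host and its plugins. It shares only selected API packages from the host and hides everything else, such as the host’s Hibernate. The class name `ApiOnlyLoader` and any package prefixes passed to it are hypothetical.

```java
import java.util.List;

// Sketch of a parent loader for plugins that exposes only selected API
// packages of the host and hides everything else. Names are hypothetical.
public class ApiOnlyLoader extends ClassLoader {

    private final ClassLoader host;
    private final List<String> exportedPrefixes;

    public ApiOnlyLoader(ClassLoader host, List<String> exportedPrefixes) {
        // Parent is the platform loader, so JDK classes stay visible as usual.
        super(ClassLoader.getPlatformClassLoader());
        this.host = host;
        this.exportedPrefixes = List.copyOf(exportedPrefixes);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        for (String prefix : exportedPrefixes) {
            if (name.startsWith(prefix)) {
                // Share the host's Class object; no second definition is made.
                return host.loadClass(name);
            }
        }
        // Anything else of the host (e.g. its Hibernate) stays invisible.
        throw new ClassNotFoundException(name + " is not part of the exposed API");
    }
}
```

A plugin loader would then be created as, for instance, `new URLClassLoader(pluginJars, new ApiOnlyLoader(hostLoader, List.of("com.example.api.")))`, where `pluginJars`, `hostLoader`, and the package name are placeholders. The plugin can bring its own Hibernate without any local-first trickery, because the host’s copy is simply not visible to it.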

Now let’s imagine that a plugin cannot work with the version that was “accidentally” imposed by the class loading hierarchy, so that the standard model becomes a problem. Then why not turn things around and let the plugin find its own version in preference to the provided one?

This is exactly what many Java EE server developers thought as well. And it’s an incredibly bad solution to the problem.

Imagine a parent/child class loader setup where the parent exposes some API with a class B (named “B”) that uses another class A (named “A”). Second, assume that the child has some class C that uses a class A’ with the same name as A, “A”. Because of a local-first configuration, C indeed uses A’. This was set up because C had some problem with the class A exposed by the parent.
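A local-first (often called child-first) loader can be sketched by reversing the order of the two delegation steps. The sketch below is an assumption about the general shape only, not the code of any particular Java EE server; real servers add many more exception rules. The name `ChildFirstLoader` is hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch of a "local-first" (child-first) loader. The class name is
// hypothetical; real servers add many more exception rules.
public class ChildFirstLoader extends URLClassLoader {

    public ChildFirstLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            // java.* classes may only be defined by the bootstrap loader anyway.
            if (c == null && !name.startsWith("java.")) {
                try {
                    // Look in our own URLs first: this is where C finds A'.
                    c = findClass(name);
                } catch (ClassNotFoundException notLocal) {
                    // Not available locally; fall back to the parent below.
                }
            }
            if (c == null) {
                // Regular parent-first lookup: this is where B (and its A) come from.
                return super.loadClass(name, resolve);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With such a loader for the child, a class found in the child’s own URLs is linked against other classes from those URLs (C against A’), while classes that exist only in the parent are linked against the parent’s classes (B against A). Both sides now refer to a type named “A”, but to two different ones.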

[Figure: local-first class loader setup]

Suppose that C can provide instances of A’ and you want to use that capability at some later time. Then an innocent-looking

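C c = new C();                  // C comes from the child and was linked against A'
B b = new B();                  // B comes from the parent and was linked against A
b.doSomethingWithA(c.getA());   // passes an A' where B expects an A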

will shoot you with a java.lang.LinkageError reporting a “loader constraint violation”, because A and A’ are different types from the JVM’s perspective, even though nothing in the source code reveals it.

At this level, you might say that’s no big deal. In practice, however, this happens somewhere deep down in some third-party library, and at an unexpected point in time.

Debug that!
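When you do have to debug it, a small helper that reports which loader defined a class, and where its bytes came from, can at least make the two “A”s visible. The helper below is a hypothetical sketch that uses only standard reflection calls.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports the defining class loader of a
// class and the location its bytes were loaded from, if known.
public final class WhereFrom {

    private WhereFrom() {}

    public static String describe(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        // A null loader means the class was defined by the bootstrap loader.
        String loaderText = (loader == null) ? "bootstrap" : loader.toString();
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "unknown"
                : source.getLocation().toString();
        return type.getName() + " defined by " + loaderText + " from " + location;
    }
}
```

In the scenario above, comparing the output for the parameter type of B’s method with the output for `c.getA().getClass()` would show the same name “A”, but two different defining loaders and, typically, two different JAR locations.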

Summary