A simple modularization algorithm

Lately I worked on breaking up a module that had grown too big. It had started to feel hard to maintain, and getting oriented in the module’s code felt increasingly cumbersome. As we run tests by module, the automated tests triggered by check-ins started taking too long, and as several developers work on the code of that one module, failing module tests became harder to attribute.

In other words: It was time to think about some module refactoring, some housekeeping.

There was a reason, however, that the module had not been broken up already: It had some lurking dependency problems. Breaking it up would mean changing other modules’ dependencies just because – which felt arbitrary – and there was still shared re-use code that would have to be made accessible to any split-off module.

Judging from past experience, this is the typical situation in which everybody feels that something needs to be done, but it always turns out to be a little too risky and unpredictable, so no one really dares. And after all: You can always push things a bit further still.

As that eventually leads to a messed up, stalling code-base and we are smart enough (or simply small enough?) to acknowledge that, we made the decision to fix it.

Now – I have done this kind of exercise on and off. It has some unpleasantly tiring parts and overall feels a little repetitive. Shouldn’t there be some kind of algorithm to follow?

That is what this post is about:

A simple modularization algorithm

Of course, as you will notice shortly: We cannot magically remove the inherent complexity of the problem. But nevertheless, we can put it into a frame that takes out some of the distracting elements:

(figure: overview of the modularization algorithm)

Step 1: Group and Classify

It may sound ridiculous, but the very first thing is to understand what is actually provided by the current module’s code. This may not be as obvious as it sounds. If it were clear and easy to grasp, you most probably wouldn’t have ended up in the current mess anyway.

So the job to do is to classify contents into topics and use-cases. E.g.

  • API definitions. Possibly API definitions that can even be split into several APIs
  • Implementation of one or more APIs for independent uses
  • Utility code that exists to support implementations of some API

At this stage, we do not refactor or add abstraction. We only assess content so that we end up with a graph of code fragments (a class, a group of classes) with dependencies. Note: The goal of the exercise is not to get a UML class diagram. Instead we aim for groups that can be named by what they are doing: “API for X”, “Implementation of job Y”, “Helper classes for Z”.
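To make that concrete, here is a minimal sketch (plain Java, all names made up) of how the result of the classification might be captured as a graph of named fragments and their dependencies:

import java.util.*;

// One node per named group of code: "API for X", "Implementation of job Y",
// "Helper classes for Z", ... The names are purely illustrative.
class Fragment {
    final String name;
    final Set<Fragment> dependencies = new LinkedHashSet<>();

    Fragment(String name) { this.name = name; }

    void addDependency(Fragment other) { dependencies.add(other); }
}

class FragmentGraph {
    private final Map<String, Fragment> fragments = new LinkedHashMap<>();

    // Get-or-create a fragment by name, so the graph can be built incrementally
    // while walking through the module's code, e.g.:
    //   g.fragment("Implementation of job Y").addDependency(g.fragment("API for X"));
    Fragment fragment(String name) {
        return fragments.computeIfAbsent(name, Fragment::new);
    }

    Collection<Fragment> all() { return fragments.values(); }
}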

Most likely the result will look ugly. You might find an intermingled mess of some fifty different background service implementations that are all tied together by some shared wiring registry class that wants to know them all. You might find some class hierarchy that is deeply cluttered with business-logic-specific implementation, where extending it further is the only practical way of enhancing the application. Remember: If it were not for any of these, you would not be here.

Our eventual goal is to change and enhance the resulting structure in a way that allows us to form useful compartments and to turn a mess into a scalable module structure:

(figure: transformation of the fragment graph – from intermingled mess to scalable structure)

That’s what step 2 and step 3 are about.

Step 2: Abstract

The second step is the difficult piece of work. After step one, looking at your resulting graph, it should be easy to categorize subgraphs into one of the following categories:

  1. many of the same kind (e.g. many independent job implementations),
  2. undesirably complex and/or cyclic
  3. a mix of the two

If only the first holds, you are essentially done with step 2. If there is actual unmanageable complexity left – which is why you are here – you now need to start refactoring to get rid of it.

This is the core of the exercise and where you need to apply your design skills. This comes down to applying software design patterns ([1]), using extensibility mechanisms, and API design. The details are well beyond the scope of this post.

After you have completed one such abstraction exercise, repeat step 2 until there are no more cases of the second and third kind.

Eventually you need to be left with a set of reasonably sized, well-defined code fragments that form a directed acyclic graph of linking dependencies.

(figure: fragment graph after abstraction – cycles removed)

(For example, in the figure the cycle was removed and the one-to-many dependency broken by replacing A with a delegation interface A’ and some lookup or registration mechanism.)
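As an illustration only (the names A’, B and the registry are made up), such a de-coupling could look roughly like this: instead of B referring to concrete A implementations (and A referring back to B), B only knows a delegation interface, and implementations register themselves:

import java.util.*;

// The delegation interface A': B depends on this instead of on concrete As.
interface APrime {
    void doWork();
}

// Some lookup/registration mechanism. Concrete A implementations register
// themselves here (at startup or via whatever wiring you use), so the
// one-to-many dependency from B is inverted into registrations.
final class ARegistry {
    private static final List<APrime> IMPLEMENTATIONS = new ArrayList<>();

    static void register(APrime impl) { IMPLEMENTATIONS.add(impl); }

    static List<APrime> all() { return Collections.unmodifiableList(IMPLEMENTATIONS); }
}

// B no longer references any concrete A type – the cycle is gone.
class B {
    void run() {
        for (APrime a : ARegistry.all()) {
            a.doWork();
        }
    }
}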

Step 3: Arrange & Extract

After completing step 2 we are still in one module. Now is the time to split the fragments up into several modules so that we can eventually reap the benefits: less to comprehend at a time, a clearer place for new implementations, a structure that has come back to manageability – provided of course that you did a good job in step 2 (bet you saw that coming). This post is not about general strategies for and benefits of modularization. But there are plenty in this blog (see below) and elsewhere.

Given our graph of fragments from step 2, make sure it is topologically ordered in the direction of linking dependency (in the example from upper left to lower right).
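A minimal sketch of such an ordering (Kahn’s algorithm over a plain map of fragment names to their dependencies; all names assumed) could look like this – it also fails loudly if a cycle survived step 2:

import java.util.*;

// Topological ordering of fragments by their linking dependencies.
// Input: fragment name -> names of fragments it depends on.
// Output: dependencies first, dependents later.
class TopologicalOrder {

    static List<String> order(Map<String, Set<String>> dependsOn) {
        Map<String, Integer> openDependencies = new HashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();

        for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
            openDependencies.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
                // fragments that only occur as dependencies have no open dependencies themselves
                openDependencies.putIfAbsent(dep, dependsOn.getOrDefault(dep, Set.of()).size());
            }
        }

        Deque<String> ready = new ArrayDeque<>();
        openDependencies.forEach((fragment, open) -> { if (open == 0) ready.add(fragment); });

        List<String> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            String fragment = ready.remove();
            ordered.add(fragment);
            for (String dependent : dependents.getOrDefault(fragment, List.of())) {
                if (openDependencies.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }

        if (ordered.size() < openDependencies.size()) {
            throw new IllegalStateException("dependency cycle left over from step 2");
        }
        return ordered;
    }
}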

Now start extracting graph nodes into modules. Typically this is easy as most of the naming and abstraction effort was done in the previous steps. Also, when starting you probably had other constraints in mind, like extensibility patterns or different life cycle constraints – e.g. some feature not being part of one deployment while being in another. These all play into the effort of extraction.

The nice thing is: At this time, having the graph chart at hand, the grouping can be easily altered again.

Repeat this effort until done:

(figure: fragments extracted into modules)

Enjoy!

References

  1. Software Design Patterns (Wikipedia)
  2. Directed Acyclic Graphs (Wikipedia)
  3. Dependency Management for Modular Applications
  4. Modularization is more than cutting it into pieces
  5. Extend me maybe…

This is not OO!


Once in a while I am part of a discussion, typically more of a complaint, that a certain design is not OO. OO as in object oriented.

While that is sometimes a not-so-well-thought-through statement anyway, there is something to it. I don’t have a problem with a design not being OO though, and I am not sure why anybody would – as if OO were valuable in itself, without justification.

Of course it is not. And many have written about it.

In the field that I am mostly concerned with, data driven applications, if not “the same old database apps”, I would go as far as saying: Of course it is not OO. The whole problem is so not OO – no wonder the application design does not breathe OO.

Unlike a desktop application, where what you expect to interact with as a user translates, at least on a naive level, rather naturally into an object oriented design, this holds only on a very, very abstract level for data driven applications.

That requires some clarification. The term data driven application hardly makes sense in the singular. There are always several data driven applications – otherwise you would hardly mention the “data” in the term. As the term suggests, the assumption is that there is a highly semantic, well-defined (not necessarily as in good) data model that makes sense in its own right. And there are many applications that make sense of it, provide ways of modification, and analyse or cross-connect it with other data and applications.

It is not far from here to a classic:

Data outlive applications

Or as a corollary:

Data is not bound to its methods, methods are bound to their data.

But that’s what OO in data driven applications tends to do: It ties not only methods to data – but instead also tends to tie data to methods. As for example in OR-Mapping – if you want to call that OO.

That is of course nonsensical and would undermine all modularization and development scale-out efforts – unless of course it is the data of essentially a single application. Then, and only then, does it make sense to consider it the state of the objects that in turn describe its methods.

It is still perfectly meaningful to use object orientation as a structuring means offered by a programming language. Objects will typically not represent database-backed state – other than as so-called value objects. But that by no means diminishes the usefulness of object oriented language features.

Not with the target audience anymore?

Lately, when I listen to talks on new and hot stuff for developers, be it in person or on the internet, I have the feeling that it cannot possibly be me who is being talked to.

It’s not just like having seen the latest web development framework over and over again – it’s that it’s only ever Web frameworks, sort of. It’s the unpleasant feeling that there is a hell of a lot of noise about stuff that matters very little.

This post got triggered when my long-time colleague Georgi told me about Atomist, the new project by Rod Johnson of Spring Framework fame. Atomist is code generation on steroids and will no doubt get a lot of attention in the future. It allows you to create and re-use code transformations to quickly create or alter projects rather than, say, copy-pasting project skeletons.

There may well be a strong market for tools and practices that focus on rapidly starting a project. And that is definitely good for the show. It is really far away from my daily software design and development experience though, where getting started is the least problem.

Problems we work on are the actual business process implementation, code and module structure, extensibility in projects and for specific technology adaptations, data consistency and data scalability, point-in-time recovery, and stuff like “what does it mean to roll this out into production systems that are constantly under load?”.

Not saying that there is no value in frameworks that can do some initial stuff, or even consistently alter some number of projects later (can it? – any better than a normal refactoring?) – but over the lifetime of a project or even product this seems to add comparatively little value.

So is this because it is just so much simpler to build and market a new Web framework than anything else? Is there really so much more demand? Or is this simply a case of the Streetlight effect?

I guess most of the harder problems that show up in the back-ends of growing and heavily used applications cannot be addressed by technology per se – but instead can only be addressed by solution patterns (see e.g. Martin Fowler’s Patterns of Enterprise Application Architecture) to be adhered to. The back-end is where long-lasting value is created, though – far away from the front end. So it should be worthwhile to do something about it.

There is one example of an extremely successful technology that has a solid foundation, some very successful implementations, an impressively long history, and has become the true spine of business computing: The relational database.

Any technology that would standardize and solve problems like application life cycle management, work load management – just to name two – on the level that the relational database model and SQL have standardized data operations should have a golden future.

Links

  1. Atomist
  2. Rod Johnson on Atomist
  3. Streetlight Effect
  4. Martin Fowler’s Patterns of Enterprise Application Architecture

Some more…

* it may as well be micro-services (a technical approach that aims to become the “Web” of the backend). But then look at stuff like this

The Human Factor in Modularization

Here is yet another piece on my favorite subject: keeping big and growing systems manageable. Or, conversely, why that is so hard and fails so often.

Why do large projects fail and why is productivity diminishing in large projects?

Admittedly, that is a big question. But here is some piece on humans in that picture.

Modularization – once more

Let’s concentrate on modularization as THE tool to scale software development successfully – and on the lack of it that drives projects into death march mode and eventually into failure.

In this write-up, modularization means all the organization of software structure above the coding level and below the actual process design: all the organization of artefacts to obtain building blocks for the assembly of solutions that implement processes as desired – in many ways the conceptual or even practical interface between the processes to implement and their actual implementation. So in short: This is not about any specific modularization approach into modules, packages, bundles, namespaces, or whatever.

I hereby boldly declare that modularization is about

Isolation of declarations and artifacts from the visibility and harmful impact of other declarations, artifacts, and resources;

Sharing of declarations, artifacts, and resources with other declarations, artifacts, and resources in a controlled way;

Leading the extensibility and further development of the system by describing the structure and interplay of declarations, artifacts, and resources in an instructive way.

Depending on the specific toolset, using these mechanisms we craft APIs and implementations and assemble systems from modular building blocks.

If this were only some well-defined engineering craft, we would be done. Obviously this is not the case, as so many projects end up as some messy black hole that nobody wants to get near.

The Problem is looking back at you from the Mirror

The task of getting a complex software system into shape is a task that is performed by a group of people and is hence subject to all human flaws and failures we see elsewhere – sometimes leading to one of the greater examples of successful teamwork assembling something much greater than the sum of its pieces.

I found it appropriate to follow the lead of the deadly sins and divine virtues.

Let’s start by talking about hubris: the lack of respect when confronted with the challenge of growing a system, and the overestimation of one’s abilities to fix structural problems on the go. “That’s not rocket science” has been heard more than once before hitting the wall.

This is followed closely in rank and time by symptoms of greed: the unwillingness to invest in structural maintenance. Not so much when things start off, but very much further down the timeline, when restructurings are required to preserve the ability to move on.

Different, but possibly more harmful, is astronaut architecting: creating an abundance of abstraction layers and “too-many-steps-ahead” designs. The modularization gluttony.

Taking pride in designing for unrequested capabilities while avoiding early verticals, showing off platforms and frameworks where solutions are expected, is a sign of over-indulgence in modularization lust and a build-up of vainglory from achievements that can be useful at best but are secondary for value creation.

Sticking to a working structure and carefully evolving it for upcoming tasks and challenges requires an ongoing team effort and a practiced consensus. Little is as destructive as team members who work against a commonly established practice out of wrath, resentment, ignorance, or simply sloth.

Modularization efforts fail out of ignorance and incompetence

But it does not need to be this way. If human sins increase the likelihood of failure, the virtues should work in the opposite direction.

Every structure is only as good as it is adaptable. A certain blindness to personal taste in style and people may help implement justice towards future requirements and team talent, and so improve development scalability. By offering insulation from harmful influences, a modularized structure can limit the impact of changes that have yet to prove their value.

At times it is necessary to restructure larger parts of the code base that are either not up to the latest requirements or have been silently rotting due to unfitness for some time already. It can take enormous courage and discipline to push through days or weeks of work for a benefit that is not immediate.

Courage is nothing without the prudence to guide it towards the right goal, including the correction of previous errors.

The wise thing, however, is to avoid getting carried too far by the momentum of gratifying design, by subjecting yourself to a general mode of temperance and patience.

 

 

MAY YOUR PROJECTS SUCCEED!

(Pictures by Pieter Brueghel the Elder and others)

 

IT Projects vs. Product Projects

A while back when I was discussing a project with a potential client, I was amazed at how little willingness to invest into analysis and design there was. Instead of trying to understand the underlying problem space and projecting what would have to happen in near and mid-term future, the client wanted an immediate solution recipe – something that simply would fix it for now.

What had happened?

I acted like a product developer – the client acted like an IT organization

This made me wonder about the characteristic differences between developing a product and solving problems in an IT organization.

A Question of Attitude

Here’s a little – incomplete –  table of attitudes that I find characterize the two mindsets:

IT Organization | Product Organization
Let’s not talk about it too much – make decisions! | Let’s think it through once more.
The next goal counts. No need to solve problems we do not experience today. | We want to solve the “whole” problem – once and forever.
Maintenance and future development is a topic for the next budget round. | Let’s try to build something that has defined and limited maintenance needs and can be developed further with little modification.
If it works for hundreds, it will certainly work perfectly well for billions. | Prepare for scalability challenges early.
We have an ESB? Great, let’s use that! | We do not integrate around something else. We are a “middle” to integrate with.
There is that SAP/ORCL consultant who claims to know how to do it? Pay him to solve it! | We need to have the core know-how that helps us plan for the future.

I have seen these in action more than once. Both points of view are valid and justified: Either you care about keeping something up and running within budget, or you care about addressing a problem space for as long and as effectively as possible. Competing goals.

It gets a little problematic though when applying the one mindset onto the other’s setting. Or, say, if you think you are solving an IT problem but are actually having a product development problem at hand.

For example, a growing IT organization may discover that some initially simple job of maintaining software client installations on workstations reaches a level where the procedures to follow have effectively turned into a product – a solution to the problem domain – without anybody noticing or paying due attention.

The sad truth is that you cannot build a product without some foresight and a long-lasting mission philosophy. Without growing and cultivating an ever-refined “design story”, any product development effort will end up as the stereotypical big ball of mud.

In the case of the potential client of ours, I am not sure how things worked out. Given their attitude I guess they simply ploughed on making only the most obvious decisions – and probably did not get too far.

Conclusion

As an IT organization, make sure not to miss the moment when a problem space starts asking for a product development approach – when it will pay off to dedicate resources and planning to beat the day-to-day plumbing effort by securing key field expertise and maintaining a solution with foresight.

Java Modularity – Failing once more?

Like so many others, I have pretty much ignored project Jigsaw for some time now – assuming it would stay irrelevant for my work or slowly fade away and be gone for good. The repeated shifts in planned inclusion with the JDK seemed to confirm this course. Jigsaw started in 2009 – more than six years ago.

Jigsaw is about establishing a Java Module System deeply integrated with the Java language and core Java runtime specifications. Check out its goals on the project home page. It is important to note the fourth goal:

 

Make it easier for developers to construct and maintain libraries and large applications, for both the Java SE and EE Platforms.

 

Something Missing?

Lately I have run into this mail thread: http://permalink.gmane.org/gmane.comp.java.openjdk.jigsaw/2492

In that mail thread Jürgen Höller (of Spring fame) notes that in order to map Spring’s module layout to Jigsaw modules, it would be required to support optional dependencies – dependencies that may or may not be satisfied given the presence of another module at runtime.

This is how Spring supports its set of adapter and support types for numerous other frameworks that you may or may not use in your application: Making use of Java’s late linking approach, it is possible to expose types that are not usable without the presence of some dependent type, but that do not create a problem unless you actually start using them. That is, optional dependencies would allow Spring to preserve its way of encapsulating the subject of “Spring support for many other libraries” in one single module (or actually a JAR file).
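For illustration only (this is not Spring code, just the general late-linking pattern it relies on): a class may reference an optional library in one method only; with the JVM’s typically lazy resolution the class loads and works fine without that library, as long as that one method is never called. Here, Jackson stands in as the optional dependency:

// A support class whose optional dependency (here: Jackson) may or may not be
// on the classpath. Loading and using this class is fine either way – the
// reference to ObjectMapper is only resolved when toJson(...) is first called,
// and only then does a missing Jackson show up as a NoClassDefFoundError.
public class OptionalJsonSupport {

    public String describe() {
        return "safe to call without Jackson";   // touches no optional types
    }

    public String toJson(Object value) throws Exception {
        com.fasterxml.jackson.databind.ObjectMapper mapper =
                new com.fasterxml.jackson.databind.ObjectMapper();
        return mapper.writeValueAsString(value); // requires Jackson at this point
    }
}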

In case you do not understand the technical problem, it is sufficient to note that anybody who has been anywhere near Java class loading considerations as well as actual Java application construction in real life should know that Spring’s approach is absolutely common for Java infrastructure frameworks.

Do Jigsaw developers actually know or care about Java applications?

Who knows, maybe they simply forgot to fix their goals. I doubt it.

 

Module != JAR file

 

There is a deeper problem: the overloaded use of the term module and the belief of infrastructure developers in the magic of the term.

Considering use of the module term in programming languages, it typically denotes some encapsulation of code with some interface and rules on how to expose or require some other module. This is what Jigsaw focussed on and it is what OSGi focussed on. It is what somebody interested in programming language design would most likely do.

In Java this approach naturally leads to using or extending the class loading mechanism to expose or hide types between modules for re-use (or information hiding, respectively), which in turn means inventing descriptors that describe use relationships (meaning the ability to reference types, in this case) and so on.

This is what Jigsaw does and this is what OSGi did for that matter.

It is not what application developers care about – most of the time.

There is an overlap in interest of course. Code modules are an important ingredient in application assembly. Problems of duplicate type definitions by the same name (think different library versions) and separation of API and implementation are essential to scalable, modular system design.

But knowing how to build a great wall is not the same as knowing how to build a great house.

From an application development perspective, a module is much rather a generic complexity management construct. A module encapsulates a responsibility and in particular should absolutely not be limited to code and is not particularly well-served by squeezing everything into the JAR form factor.

What we see here is a case of Application vs. Infrastructure culture clash in action (see for example Local vs. Distributed Complexity).

The focus on trying to find a particularly smart and technically elegant solution for the low-level modularization problem eventually hurts the usefulness of the result for the broader application development community (*).

Similarly, ignorance of runtime modularization leads to unmaintainable, growth-limited, badly deployable code bases as I tried to describe in Modularization is more than cutting it into pieces.

The truth is somewhere in between – which is necessarily less easily described and less universal in nature.

I believe that z2 is one suitable approach for a wide class of server-side applications. Other usage scenarios might demand other approaches.

I believe that Jigsaw will not deliver anything useful for application developers.

I wish you a happy new year 2016!


Ps.:

* One way of telling that the approach will be useless for developers is when discussions conclude that “tools will fix the complexity”. What that comes down to is that you need a compiler (the tool) to make use of the feature, which in turn means you need another input language. So who is going to design that language and why would that be easier?

* It is interesting to check out the history of SpringSource’s dm Server (later passed on to R.I.P. at Eclipse as project Virgo). See in particular the interview with Rod Johnson.

Z2 as a Functional Application Server

Intro

As promised, this is the first in a series of posts elaborating on the integration of Clojure with Z2. It probably looks like a strange mix; however, I believe it is an extremely empowering combination of two technologies sharing a lot of design philosophy. Clojure has brought me lots of joy by enabling me to achieve much more in my hobby projects, and I see how the combination of z2 and Clojure further extends the horizon of what’s possible. I’d be happy if I manage to help other people give it a try and benefit in the same way I did.

The LISP universe is a very different one. It’s hard to convince someone with zero previous experience to look at the strange thing with tons of parentheses written using Polish notation, so I am going to share my personal story and hope it resonates with you. I will focus on the experience, or how I “felt” about it. There is enough theory and intellectual knowledge on the internet already, and I will link to it where appropriate.

So, given this is clearly personal and subjective view, let’s put in some context.

Short Bio

I’ve been using Java professionally for 12+ years, predominantly in the backend. I’ve worked on application servers as well as on business applications in large, medium and small organizations. Spring is also something I have been heavily relying on for the last 8 years. Same goes for Maven. I’ve used JBoss and done a bit of application server development myself, but when Spring Boot came up I fell in love with it.

Like every other engineer out there, my major struggle through the years has been to manage complexity: the inherent complexity of the business problem we have to solve, plus the accidental complexity added by our tools, our poor understanding of the problem domain, and our limited conceptual vocabulary. I have been crushed more than once under the weight of that complexity. Often my own and the team’s share would be more than 50%. I have seen firsthand how a poorly groomed code base ends up in a state where the next feature is just not possible. This has real business impact.

The scariest thing about complexity is that it grows exponentially with size. This is why I strongly subscribe to the “code is liability” worldview. The same goes for organizations: the slimmer you are, the faster and further you can go.

Ways to deal with complexity

Now that the antagonist is clearly labeled, let’s focus on my survival kit.

#1 Modularization

One powerful way to get on top of complexity is divide and conquer by using modularization. This is where z2 comes into the game. It has other benefits as well, but I would put its modularization capabilities as feature #1. Maven and Spring have been doing that for me through the years. On a more coarse level, Tomcat and JBoss provide some modularization facilities as well; however, in my experience it is extremely rare that they are deliberately exploited.

Getting modularization right is hard on both ends:

  • The framework has to strike a balance between exercising control and enabling extensibility, otherwise it becomes impractical.
  • The component developers still have to think hard and define the boundaries of “things” while using the framework idioms with mastery. I haven’t yet met a technology that removes this need. It’s all about methodology and concepts (I dislike the pattern cult).

Discussing a more precise definition of what exactly modularization is, and why the well-known methodologies are too general to be useful as recipes, is too big a discussion for here.

My claim is that z2 strikes the best balance I have seen so far while employing very powerful concept.

#2 Abstractions

Another powerful way is to use better abstractions. While modularization puts structure into chaos, the right abstractions reduce the amount of code and other artifacts, and hence the potential for chaos. Just like any other thing, not all abstractions are made equal, and I assume they can be ordered according to their power.

My personal definition of power: if abstraction A allows you to achieve the same result with less code than abstraction B, then it’s more powerful. Of course, reality is much more hairy than this. You have to account for the abstraction’s implementation, the investment in learning, long-term maintenance costs, and so on.

Alternative definition: if abstraction A allows you to get further in terms of project size and complexity (before the project collapses) it’s more powerful.

The abstractions we use on a daily basis are strongly influenced by the language. A language can encourage, discourage (due to ergonomics) or even blacklist an abstraction by having no native support for it. My claim here is that the Java language designers have made some very limiting choices, and this has a profound effect on overall productivity as well as on the potential access to new abstractions. Clojure, on the other side, has an excellent mix right out of the box, with convenient access to a very wide range of other abstractions.

The OO vs. FP discussion deserves special attention and will get it. I won’t claim that Clojure is perfect, far from it. However the difference in power I have experienced is significant and big part of that difference is due to carefully picked set of base abstractions implemented in very pragmatic way.

 So, what’s next?

Next comes the story of how Java and DDD helped me survive, and how JavaScript made me feel like a fool for wasting so many hours slicing problems the wrong way and worrying about insignificant things. Clojure will show up as well, you can count on this.

While you wait for the next portion, here are two links that have influenced heavily my current thinking:

  • Beating the averages — the blub paradox has been an eye opening concept for me. I have read this article in 2008 for the first time and kept coming back to it. It validated my innate tendency to be constantly dissatisfied with how things are and look for something better. Paradoxically, it never made me try out LISP 🙂
  • Simple made easy — This is the presentation that, among other side effects, made me give Clojure a chance. This presentation probably has the best return on investment for an hour spent in front of the screen.

Microservices Nonsense

Microservice Architecture (MSA) is a software design approach in which applications are intentionally broken up into remoteable services, in order to be built from small and independently deployable application building blocks, with the goal of reducing deployment operations and dependency management complexity.

(See also Fowler, Thoughtworks)

Back in control

Sounds good, right? Anybody developing applications of some size knows that increasing complexity leads to harder to manage updates, increased deployment and restart durations, more painful distribution of deployables. In particular library dependencies have the tendency to get out of control and version graphs tend to become unmanageable.

So, why not break things up into smaller pieces and gain back control?

This post is on why that is typically the wrong conclusion and why Microservice Architecture is a misleading idea.

Conjunction Fallacy

From a positive angle, one might say that MSA is a harmless case of a conjunction fallacy: the clear cut sounds more specific as a solution approach and therefore appears more plausible (see the Linda Problem of …).

If you cannot handle it here, why do you think you can handle it there?

If you cannot organize your design to manage complexity in-process, however, why should things work out more smoothly if you move to a distributed setup, where aspects like security, transaction boundaries, interface compatibility, and modifiability are substantially harder to manage (see also Distributed big ball of …)?

No question, there can be good reasons for distributed architectures: Organization, load distribution, legacy systems, different expertise and technology preferences.

It’s just the platform (and a little bit of discipline)

Do size of deployables and dependency management complexity belong into that list?

No. The former simply implies that your technology choice has a poor roll-out model. In particular Java EE implementations are notoriously bad at handling large code bases (unlike, you might have guessed, z2). Similarly, loss of control over dependencies shows a lack of dependency discipline and, more often, a brutal lack of modularization effort and capabilities (see also Modularization is….)

Use the right tool

Now these problems might lead to an MSA approach out of desperation. But one should at least be aware that this is a platform shortcoming and not a logical implication of functional complexity.

If you were asked to move a piece of furniture you would probably use your car. If you were asked to move ten pieces of furniture, you would not look for ten cars – you would get a truck.

 

 

On Classpath Hygiene


One of the nasty problems in large JVM-based systems is that of type conflicts. These arise when more than one definition of a class is found for one and the same name – or, similarly, if there is no single version of a given class that is compatible with all using code.

This post is about how much pain you can inflict, when you expose APIs in a modular environment and do not pay attention about unwanted dependencies exposed to your users.

These situations do not occur because of ignorance or negligence in the first place, and most likely not in the code you wrote.

The actual root cause is, from another perspective, one of Java’s biggest strengths: the enormous ecosystem of frameworks and libraries to choose from. Using some third-party implementation almost always means including dependencies on other libraries – not necessarily of compatible versions.

Almost from its beginning, Java had a way of splitting “class namespaces” so that name clashes between classes with different code could be avoided and type visibility could be limited – and, not least, so that code may be retrieved from elsewhere (than the classpath of the virtual machine): class loaders.

Even if they share the same name, classes loaded (defined) by one class loader are separate from classes loaded by other class loaders and cannot be cast to one another. They may share some common super type though, or use identical classes in their signatures and in their implementation. Indeed the whole concept makes little sense if the splitting approach does not include an approach for sharing.

Isolation by class loaders, combined with more or less clever ways of sharing types and resources, is the underpinning of all Java runtime modularization (as in any Java EE server, OSGi, and of course Z2).

In the default setup provided by Java’s makers, class loaders are arranged in a tree structure, where each class loader has a parent class loader:

(figure: standard parent-delegation class loader tree)

The golden rule is: When asked to load a class, a class loader first asks its parent (parent delegation). If the parent cannot provide the class, the class loader is supposed to search it in its own way and, if found, define the class with the VM.
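In code, the golden rule amounts to roughly the following – a simplified sketch of what java.lang.ClassLoader#loadClass does by default (the real implementation additionally handles a null bootstrap parent and per-name locking):

// Simplified parent-delegation logic of a class loader.
class SketchClassLoader extends ClassLoader {

    SketchClassLoader(ClassLoader parent) { super(parent); }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        Class<?> c = findLoadedClass(name);         // already defined by this loader?
        if (c == null) {
            try {
                c = getParent().loadClass(name);    // 1. ask the parent first
            } catch (ClassNotFoundException e) {
                c = findClass(name);                // 2. then search and define locally
            }
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }
}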

This simple pattern makes sure that types available at some class loader node in the tree will be consistently shared by all descendants.

So far so good.

Frequently however, when developers invent the possibility of extension by plugins, modularization comes in as a kind of afterthought and little thinking is invested in making sure that plugin code gets to see no more than what is strictly needed.

Unfortunately, if you choose to expose (e.g.) a version of Hibernate via your API, you essentially make your version the one and only that can responsibly be used. This is a direct consequence of the standard parent-delegation model.

Now let’s imagine that a plugin cannot work with the version that was “accidentally” imposed by the class loading hierarchy, so that the standard model becomes a problem. Then, why not turn things around and let the plugin find its version with preference over the provided one?

This is exactly what many Java EE server developers thought as well. And it’s an incredibly bad solution to the problem.

Imagine you have a parent/child class loader setup, where the parent exposes some API with a class B (named “B”) that uses another class A (named “A”). Secondly, assume that the child has some class C that uses a class A’ with the same name as A, “A”. Because of a local-first configuration, C indeed uses A’. This was set up due to some problem C had with the exposed class A of the parent.

(figure: local-first parent/child class loader setup)

Suppose that C can provide instances of A’ and you want to use that capability at some later time. That other time, an innocent

C c = new C(); 
B b = new B(); 
b.doSomethingWithA(c.getA());

will shoot you with a loader constraint violation (a LinkageError), because A and A’ are incompatible from the JVM’s perspective – which is completely invisible from the code.

At this level, you might say that’s no big deal. In practice however, this happens somewhere deep down in some third party lib. And it happens at some surprise point in time.

Debug that!


Java Data API Design Revisited

When domain entities get bigger and more complex, designing a safe, usable, future-proof modification API is tricky.

 

This article is on a simple but effective approach to designing for data updates.

 

Providing an update API for a complex domain entity is more complicated than most developers initially expect. As usual, problems start showing when complexity increases.

Here’s the setup: Suppose your software system exposes a service API for some domain entity X to be used by other modules.

When using the Java Persistence API (JPA) it is not uncommon to expose the actual domain classes for API users. That greatly simplifies simple updates: Just invoke domain class setters, and unless the whole transaction fails, updates will be persisted. There is a number of problems with that approach though. Here are some:

  • If modifications of the domain object instance are not performed in one go, other code invoked in between may see inconsistent states (this is one reason why using immutables is favourable).
  • Updates that require non-trivial constraint checking may not be performed on the entity in full but rather require service invocations – leading to a complex to use API.
  • Exposing the persistent domain types, including their “transparent persistence” behavior is very much exposing the actual database structure which easily deviates from a logical domain model over time, leading to an API that leaks “internal” matters to its users.

The obvious alternative to exposing JPA domain classes is to expose read-only, immutable domain type interfaces and complement that by service-level modification methods whose arguments represent all or some state of the domain entity.

Only for very simple domain types is it practical to offer modification methods that take simple built-in types such as numbers or strings though; beyond that, it leads to hard-to-maintain and even harder-to-use APIs.

So we need some change-describing data transfer object (DTO – we use that term regardless of the remoting case) that can serve as a parameter of our update method.

As soon as updates are to be prepared either remotely or in some multi-step editing process, intermediate storage of yet-to-be-applied updates needs to be implemented, and having some help for that is great in any case. So DTOs are cool.

Given a domain type X (as read-only interface), and some service XService we assume some DTO type XDto, so that the (simplified) service interface looks like this:

 

public interface XService {
 X find(String id);
 X create(XDto xdto);
 X update(String id, XDto xdto);
}

 

If XDto is a regular Java Bean with some members describing updated attributes for X, there are a few annoying issues that take away a lot of the initial attractiveness:

  • You cannot distinguish a null value from undefined. That is, suppose X has a name attribute and XDto has a name attribute as well – describing a new value for X’s attribute. In that case, null may be a completely valid value. But then: How to describe the case that no change at all should be applied?
  • This is particularly bad, if setting some attribute is meant to trigger some other activity.
  • You need to write or generate a lot of value object boilerplate code to have good equals() and hashcode() implementations.
  • As with the first issue: How do you describe the change of a single attribute only?

In contrast to that, consider an XDto that is implemented as an extension of HashMap<String,Object>:

public class XDto extends HashMap<String,Object> {
  public final static String NAME = "name";
  public XDto() { }
  public XDto(XDto u) {
    if (u.containsKey(NAME)) { setName(u.getName()); }
  }
  public XDto(X x) {
    setName(x.getName());
  }
  public String getName() {
    return (String) get(NAME);
  }
  public void setName(String name) {
    put(NAME,name);
  }
}

Apart from having a decent equals, hashcode, toString implementation, considering it is a value object, this allows for the following features:

  • We can perfectly distinguish between null and undefined using Map.containsKey (see the short example after this list).
  • This is great, as now, in the implementation of the update method for X, we can safely assume that any single attribute change was meant to be. This allows for atomic, consistent updates with very relaxed concurrency constraints.
  • Determining the difference, compared to some initial state is just an operation on the map’s entry set.
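For instance, using the XDto sketch from above:

XDto dto = new XDto();
dto.setName(null);                                       // explicit: change the name to null
boolean changeToNull = dto.containsKey(XDto.NAME);       // true  -> apply the change

XDto untouched = new XDto();
boolean leaveAlone = !untouched.containsKey(XDto.NAME);  // true  -> name stays as it is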

 

In short: We get a data operation programming model (see the drawing below) consisting of initializing some temporary update state as a DTO, operating on it as long as needed, extracting the actual change by comparing DTOs, and sending back the change.
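A minimal sketch of the “extracting the actual change” part could look like this (plain Java, names assumed; it works on any Map-based DTO such as the XDto above):

import java.util.*;

// Keep only the entries of the edited DTO that are new or differ from the
// original state. Note that an entry whose value is null survives here –
// unlike with a bean, "set to null" and "not present" stay distinguishable.
class DtoDiff {

    static Map<String, Object> diff(Map<String, Object> original, Map<String, Object> edited) {
        Map<String, Object> delta = new HashMap<>();
        for (Map.Entry<String, Object> e : edited.entrySet()) {
            boolean unchanged = original.containsKey(e.getKey())
                    && Objects.equals(original.get(e.getKey()), e.getValue());
            if (!unchanged) {
                delta.put(e.getKey(), e.getValue());
            }
        }
        return delta;
    }
}

Sending back the change then simply means shipping the resulting delta map to the update method.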

 

Things get a little more tricky when adding collections of related persistent value objects to the picture. Assume X has some related Ys that are nevertheless owned by X. Think of a user with one or more addresses. As for X we assume some YDto. Where X has some method getYs that returns a list of Y instances, XDto now works with YDtos.

Our goal is to use simple operations on collections to extend the difference computation from above to this case. Ideally, we support adding and removing of Y’s as well as modification, where modified Y‘s should be represented, for update, by a “stripped” YDto as above.

Here is one way of achieving that: As Y is a persistent entity, it has an id. Now, instead of holding on to a list of YDto, we construct XDto to hold a list of pairs (id, value).

Computing the difference between two such lists of pairs means removing all pairs that are equal and, for those with the same id, recursing into the YDto instances for difference computation. Back on the list level, a pair with no id indicates a new Y to be created, and a pair with no YDto indicates a Y that is no longer part of X. This is actually rather simple to implement generically (see the sketch below).
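Here is a rough sketch of that generic list diff (again, all names are made up; it reuses the map-level DtoDiff sketch from above):

import java.util.*;

// One (id, value) pair per related Y. id == null marks a Y still to be created,
// value == null marks a Y to be removed.
class IdValue {
    final String id;
    final Map<String, Object> value;

    IdValue(String id, Map<String, Object> value) {
        this.id = id;
        this.value = value;
    }
}

class ListDiff {

    static List<IdValue> diff(List<IdValue> before, List<IdValue> after) {
        Map<String, IdValue> remaining = new LinkedHashMap<>();
        for (IdValue b : before) {
            remaining.put(b.id, b);
        }

        List<IdValue> delta = new ArrayList<>();
        for (IdValue a : after) {
            if (a.id == null) {
                delta.add(a);                                    // a new Y to be created
                continue;
            }
            IdValue b = remaining.remove(a.id);
            Map<String, Object> change =
                    DtoDiff.diff(b == null ? Map.of() : b.value, a.value);
            if (b == null || !change.isEmpty()) {
                delta.add(new IdValue(a.id, change));            // a modified Y, stripped to its change
            }
        }
        for (String removedId : remaining.keySet()) {
            delta.add(new IdValue(removedId, null));             // a Y that is no longer part of X
        }
        return delta;
    }
}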

That is, serialized as JSON, the delta between two XDto states with a modified Y collection would look like this:

{
  "y": [
    {"id": "1", "value": {"a": "new A"}},              // update "a" in Y "1"
    {"id": "2"},                                       // delete Y "2"
    {"value": {"a": "initial a", "b": "initial b"}}    // add a new Y
  ]
}

All in all, we get a programming model that supports efficient and convenient data modifications with some natural serialization for the remote case.

(figure: the data operation programming model)

The supplied DTO types serve as state types in editors (for example) and naturally extend to change computation purposes.

As a side note: Between 2006 and 2008 I was a member of the very promising Service Data Objects (SDO) working group. SDO envisioned a similar programming style but went much further in terms of abstraction and implementation requirements. Unfortunately, SDO seems to be pretty much dead now – probably due to scope creep and the lack of an accessible, easy-to-use implementation (last I checked). The good thing is we can achieve a lot of its goodness with a mix of existing technologies.

References