Continuity is King

Unfortunately there has been so much going on in my work life and my private life lately that I didn’t get around to thinking and writing much.

Here is just a short note that v2.4 of z2 is ready: v2.4

It simply upgrades z2 to Java 8 and upgrades the version of Jetty we use to 9.3. The latter implies that Java 8 is a strict requirement too.

Little change is good: projects that run on z2 need anything but platform disruption at this time.

As the core did not change incompatibly (at least not that I know of), using a v2.4 core with the previous z2-base version 2.3 will simply add Java 8 support to any such setup as well.

Anyway, here is what I hope to continue with once I am back to normal operations again:

  • A piece on how to use Clojure on z2, with a guest author from Vienna
  • A piece on a variation of the Stockholm Syndrome that can be observed in the relationship of developers with their toolset
  • A piece on how organizations can be classified as project-driven vs. product-driven


Microservices Nonsense

Microservice Architecture (MSA) is a software design approach in which applications are intentionally broken up into remoteable services, so that they are built from small and independently deployable building blocks, with the goal of reducing deployment and dependency management complexity.

(See also Fowler, Thoughtworks)

Back in control

Sounds good, right? Anybody developing applications of some size knows that increasing complexity leads to harder-to-manage updates, longer deployment and restart durations, and more painful distribution of deployables. In particular, library dependencies have a tendency to get out of control, and version graphs tend to become unmanageable.

So, why not break things up into smaller pieces and gain back control?

This post is about why that is typically the wrong conclusion and why Microservice Architecture is a misleading idea.

Conjunction Fallacy

From a positive angle, one might say that MSA is a harmless case of a conjunction fallacy: the clear cut sounds more specific as a solution approach and therefore appears more plausible (see the Linda Problem of …).

If you cannot handle it here (in-process), why do you think you can handle it there (distributed)?

If you cannot organize your design to manage complexity in-process, however, why should things work out more smoothly in a distributed setup, where aspects like security, transaction boundaries, interface compatibility, and modifiability are substantially harder to manage? (See also the distributed big ball of ….)

No question, there can be good reasons for distributed architectures: organization, load distribution, legacy systems, different expertise and technology preferences.

It’s just the platform (and a little bit of discipline)

Do size of deployables and dependency management complexity belong on that list?

No. The former simply implies that your technology choice has a poor roll-out model. In particular, Java EE implementations are notoriously bad at handling large code bases (unlike, you might have guessed, z2). Similarly, loss of control over dependencies shows a lack of dependency discipline and, more often, a brutal lack of modularization effort and capability (see also Modularization is ….).

Use the right tool

Now these problems might lead to an MSA approach out of desperation. But one should at least be aware that this is a platform shortcoming and not a logical implication of functional complexity.

If you were asked to move a piece of furniture you would probably use your car. If you were asked to move ten pieces of furniture, you would not look for ten cars – you would get a truck.


If you do it like all the others, what makes you believe you will do any better?

When deciding on a tool, process, or methodology, developers, architects, and team leads feel a strong urge to do exactly what is “best practice” (or standard), that is, what is recommended on popular web sites or in text books.

On average, following some “best practice” may protect against bad mistakes. Still, some reasoning on why to choose one approach over another should always be done. Sometimes there are glaring examples showing that a given best practice is quite obviously not so good after all.

An Example

For the sake of this post I will use the example of “feature branching”. The idea of feature branching is, instead of having many developers implement different features on the same code line, to have developers create a branch per feature that is integrated back into the main code line when done.

While the first part of this idea sounds fantastic and fits wonderfully with models of distributed source control, the second part becomes absurdly complex for large and “wide” code bases and when applied with more than a handful of developers.

There is no need to discuss for what kinds of projects and management setups feature branching may work. Assuming you are about to develop an actual solution that is under constant development, feature branching obviously does not work well: companies that have products with large code bases simply do not use this approach.

This is obvious in the sense that where large teams work on a large solution code base (note: see also local vs. distributed complexity), feature branching is not used as a means of regular feature development. To my knowledge this includes Facebook and Google, and definitely traditional SAP (check out the references below).

How Come?

How come something is touted as best practice that does not get embraced where it should matter most: at scale?

Here are some guesses: For one, peer pressure is strong. It can be a frustrating intellectual effort to argue against a best practice. Secondly, “experts” are somewhat wrong most of the time, simply because professional writing about a field as wide as software engineering leaves little time to actually practice first-hand what you are writing about. Most expert authors will only ever have experienced tools and approaches for a short time and for small problems – and actually have little incentive to do otherwise. And finally: something that solves a problem in the small and distributed (as is often the case in the OSS community) frequently does not work well in the large and interconnected.

But then: Does it make sense to do differently?

It obviously does – considering the examples above. But one need not look to the likes of Facebook. The simple truth is that, if you do not stand out in some way, you are by definition mediocre – which is nothing else but saying that, other than for political protection or some other non-market reason, there is no particular reason for you to win the market.

When does it make sense to do differently?

Even assuming you are completely sure to know a better approach, when does it make sense to fight for adoption?

I think the simple answer is: For disruptive approaches, the only meaningful point in time to fight for it is in the very beginning of a project or product (and sometimes organization).

Remember the technology adoption life cycle (see “Crossing the Chasm” and others):

What this says is that even if you manage to win the enthusiasts, winning the mainstream audience is the harder part.

In our example, the market is the product organization and the disruptive tool or approach is the technology we want to see used. Luckily, initially our organization will have a significant number of visionaries and enthusiasts with respect to anything that promises to give our product a head start.

Over time, choices have been made and customers acquired; visionaries become pragmatists, and the willingness to do anything but the most predictable and least harmful, as far as the specific product’s development is concerned, fades. That is, the product organization as a whole makes a transition very much in correspondence with the changes in the product’s target group.

Consequently, introducing a disruptive change from within might turn out to be a futile exercise in any but the earliest stages of a product’s life cycle.

Summary

Doing differently can be the difference between mediocrity and excellence. Foundations are laid at the beginning of product development. Non-mainstream choices must be made early on.

References

  1. Martin Fowler on Feature Branching
  2. Does Facebook use feature branching? (Quora)
  3. Paul Hammant on Trunk Based Development
  4. Paul Hammant on Google’s vs Facebook’s Trunk Based Development


The Mechanics of Getting Known

“…a soliton is a self-reinforcing solitary wave (a wave packet or pulse) that maintains its shape while it propagates at a constant velocity. Solitons are caused by a cancellation of nonlinear and dispersive effects in the medium.” (see Wikipedia).

Solitons occur in shallow water. Shock waves like tsunamis can be modeled as solitons. Solitons can also be observed in lattices (see the Toda lattice).

Among the many interesting properties of solitons is that solitons can pass “through” each other while overtaking – as if they move completely independently of each other:

By Kraaiennest (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons


This post is my own little theory on the mechanics of how a piece of information (on a subject, a person, anything) becomes relevant over time:

The Theory

A piece of information on a subject, such as “Grails is very popular among Web developers” (of which I am not so sure anymore) or “No point in buying a Blackberry phone – they will be gone any time soon” (I bought one just last fall) or “Web development agency X is reliable and competent” (this time I really don’t know), spreads through a lattice of people (vertices) and relationships (edges) just like a soliton. It may pass others, and it may differ in velocity of traversal.

Its velocity corresponds to how “loud” it is – its amplitude. It is louder if it was generally considered more noteworthy when it entered the lattice.

As so many information snippets reach us every day, we sort out most of them as insignificant right away. So what makes a piece of information memorable and, in particular, recallable (e.g. when wondering “what is actually a good Web development agency?”), or even a trigger of actions like researching something in more depth?

It is the number of times that that piece of information (and its equivalent variants) has reached us, and some (yet unknown) increasing function of the sum of the amplitudes of all those hits.
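To make this a bit more concrete, here is a hedged formalization of my own (all symbols are assumptions of this sketch, not taken from any referenced work): a piece of information becomes actionable once

\sum_{k=1}^{n} f(a_k) \geq \theta

where n is the number of hits, a_k is the amplitude of the k-th hit, f is the (yet unknown) increasing function, and \theta is a personal relevance threshold. The non-linearity discussed below is then simply the threshold effect of this sum.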

So what?

Now that we have this wonderful theory, let’s see where that takes us.

It fits common observations: big marketing campaigns (high amplitude) send big solitons into the lattice. They do not necessarily suffice to create action and so need to be augmented with talks, articles, and rumors to add more hits.

It also explains why that is equivalent to creating many small information solitons. There are great examples of open source tools that made it to impressive fame via repeated references in articles and books – without any big bang.

Most importantly, it explains the non-linearity of “return” on marketing: little will lead to nothing in the short term – not just little but actually nothing. Over time, however, hit thresholds will be exceeded and interest will lead to action. As the speed with which solitons pass through the lattice does not change, talking to many will not speed up the overall process – but it will increase the later return.

References

Surprisingly enough, some 15 years ago, as part of my dissertation work, I published some math papers on PDEs with solitons.

On Classpath Hygiene

One of the nasty problems in large JVM-based systems is that of type conflicts. These arise when more than one definition of a class is found for one and the same name – or, similarly, when there is no single version of a given class that is compatible with all using code.

This post is about how much pain you can inflict when you expose APIs in a modular environment and do not pay attention to the unwanted dependencies you expose to your users.

These situations do not occur because of ignorance or negligence in the first place – and most likely not in the code you wrote.

The actual root cause is, from another perspective, one of Java’s biggest strengths: the enormous ecosystem of frameworks and libraries to choose from. Using some third-party implementation almost always means including dependencies on other libraries – not necessarily of compatible versions.

Almost from its beginning, Java had a way of splitting “class namespaces” so that name clashes between classes with different code could be avoided and type visibility could be limited – and, not least, so that code may be retrieved from elsewhere (than the classpath of the virtual machine): class loaders.

Even if they share the same name, classes loaded (defined) by one class loader are separate from classes loaded by other class loaders and cannot be cast to one another. They may share some common super type though, or use identical classes in their signatures and in their implementations. Indeed, the whole concept makes little sense if the splitting approach does not include an approach for sharing.

Isolation by class loaders, combined with more or less clever ways of sharing types and resources, is the underpinning of all Java runtime modularization (as in any Java EE server, OSGi, and of course z2).

In the default setup provided by Java’s makers, class loaders are arranged in a tree structure, where each class loader has a parent class loader:

[Figure: the standard class loader tree with parent delegation]

The golden rule is: when asked to load a class, a class loader first asks its parent (parent delegation). If the parent cannot provide the class, the class loader is supposed to search for it in its own way and, if found, define the class with the VM.

This simple pattern makes sure that types available at some class loader node in the tree will be consistently shared by all descendants.
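In code, parent delegation looks roughly like the following sketch, a simplified rendition of what java.lang.ClassLoader does by default (the class name is made up for illustration, and a non-null parent is assumed):

public class ParentFirstClassLoader extends ClassLoader {

  public ParentFirstClassLoader(ClassLoader parent) {
    super(parent);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);      // 1. already defined by this loader?
      if (c == null) {
        try {
          c = getParent().loadClass(name);     // 2. ask the parent first
        } catch (ClassNotFoundException e) {
          c = findClass(name);                 // 3. only then search locally
        }
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }
}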

So far so good.

Frequently however, when developers invent the possibility of extension by plugins, modularization comes in as a kind of afterthought and little thinking is invested in making sure that plugin code gets to see no more than what is strictly needed.

Unfortunately, if you choose to expose (e.g.) a version of Hibernate via your API, you essentially make your version the one and only that can responsibly be used. This is a direct consequence of the standard parent-delegation model.

Now let’s imagine that a plugin cannot work with the version that was “accidentally” imposed by the class loading hierarchy, so that the standard model becomes a problem. Then why not turn things around and let the plugin find its version with preference over the provided one?

This is exactly what many Java EE server developers thought as well. And it’s an incredibly bad solution to the problem.
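To see concretely what “turning things around” means, here is the local-first counterpart of the sketch above, under the same assumptions, with only the lookup order flipped:

public class LocalFirstClassLoader extends ClassLoader {

  public LocalFirstClassLoader(ClassLoader parent) {
    super(parent);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        try {
          c = findClass(name);                 // 1. search locally first
        } catch (ClassNotFoundException e) {
          c = getParent().loadClass(name);     // 2. fall back to the parent
        }
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }
}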

Imagine you have a parent/child class loader setup, where the parent exposes some API with a class B (named “B”) that uses another class A (named “A”). Secondly, assume that the child has some class C that uses a class A’ with the same name as A, “A”. Because of a local-first configuration, C indeed uses A’. This was set up due to some problem C had with the exposed class A of the parent.

[Figure: parent/child class loader setup with local-first delegation]

Suppose that C can provide instances of A’ and you want to use that capability at some later time. That other time, an innocent

C c = new C(); 
B b = new B(); 
b.doSomethingWithA(c.getA());

will shoot you with a loader constraint violation (a JVM LinkageError), because A and A’ are incompatible from the JVM’s perspective – which is completely invisible in the code.

At this level you might say that’s no big deal. In practice however, this happens somewhere deep down in some third-party lib – and at some surprising point in time.

Debug that!

Summary

Be deliberate about which types your API exposes in a modular environment: every exposed type effectively becomes the one and only version your users can rely on. And local-first class loading is no way out, as it merely trades a visible version conflict for a surprise LinkageError at runtime.

Working on z2env Version 3

Despite its fantastic qualities as a development and execution environment, z2’s adoption is very low. That of course does not at all stop us from improving it further (as we are actively benefiting from it anyway).

Whenever I talk about z2, the feedback is typically in one of two categories.

The first one is the “I don’t get it”-category.

There was a time when running builds was such a natural ingredient of software development to me that I would have been in that category as well. So I forgive them their ignorance.

The other category is the “Great idea – too bad I cannot use it” category.

Being a disruptive approach, and knowing how change-averse the development community is (contrary to common belief), it is natural that z2 has to fight for adoption. Specifically, the more profound critique of z2 is that it is too big, too proprietary, too non-standard.

So this is what version 3 is all about:

Less and more focussed

The one thing z2 is about is removing obstacles between code and execution. You should only think about code, modules, and software structure. In order to enhance the “do one thing and do it well” qualities of z2, we will strip out capabilities whose value may not be totally obvious (for example z2’s JTA implementation and support for worker processes) and either drop them completely or move them into add-ons.

Better and Friendlier Open Source

Z2 has always been open source. In version 3, all package names will be “org.z2env” and, possibly more interesting than that cosmetic change, we will make sure that there is no use of libraries with a problematic license like the GPL. Only Apache 2 or compatible licenses will be included.

Integrating with Tomcat

Previously, z2 embedded Jetty as its preferred Web container. Jetty is a great Web container and its embeddability is (rightfully) legendary. The vast majority of Java developers use Tomcat though.

With version 3 we found a cool way of hooking z2 up with an ordinary Tomcat installation and its configuration, so that Web applications defined in z2 work next to whatever else you have deployed.

If TomEE did not make such harsh structural assumptions on application deployment – assumptions we cannot agree with, much less adhere to – we would even have EJBs in z2.

That is no big deal though – as I have the vague feeling that EJB enthusiasts are probably even less likely to adopt z2.

Getting Started

Enough talk! While there is still a lot to do (porting the Spring add-on and all the sample applications), a simple Getting Started guide can be found here:

https://redmine.z2-environment.net/projects/z2env/wiki/Getting_Started

Willing to invest 15 Min into something cool? Here it is!

Feedback greatly welcome!

Java Data API Design Revisited

When domain entities get bigger and more complex, designing a safe, usable, future-proof modification API is tricky.

 

This article is on a simple but effective approach to designing for data updates.

 

Providing an update API for a complex domain entity is more complicated than most developers initially expect. As usual, problems start showing when complexity increases.

Here’s the setup: Suppose your software system exposes a service API for some domain entity X to be used by other modules.

When using the Java Persistence API (JPA), it is not uncommon to expose the actual domain classes to API users. That greatly simplifies simple updates: just invoke domain class setters and, unless the whole transaction fails, updates will be persisted. There are a number of problems with that approach though. Here are some:

  • If modifications of the domain object instance are not performed in one go, other code invoked in between may see inconsistent states (this is one reason why using immutables is favourable).
  • Updates that require non-trivial constraint checking may not be performable on the entity alone but may rather require service invocations – leading to a complex-to-use API.
  • Exposing the persistent domain types, including their “transparent persistence” behavior, very much exposes the actual database structure, which easily deviates from the logical domain model over time – leading to an API that leaks “internal” matters to its users.

The obvious alternative to exposing JPA domain classes is to expose read-only, immutable domain type interfaces and complement that by service-level modification methods whose arguments represent all or some state of the domain entity.

Only for very simple domain types is it practical to offer modification methods taking built-in types such as numbers or strings though; beyond that, this leads to hard-to-maintain and even harder-to-use APIs.

Thus, we need some change-describing data transfer object (DTO – we use that term regardless of the remoting case) that can serve as the parameter of our update method.

As soon as updates are to be prepared either remotely or in some multi-step editing process, intermediate storage of yet-to-be-applied updates needs to be implemented, and having some help with that is great in any case. So DTOs are cool.

Given a domain type X (as a read-only interface) and some service XService, we assume some DTO type XDto, so that the (simplified) service interface looks like this:

 

public interface XService {
  X find(String id);
  X create(XDto xdto);
  X update(String id, XDto xdto);
}

 

If XDto is a regular Java Bean with some members describing updated attributes of X, there are a few annoying issues that take away a lot of the initial attractiveness:

  • You cannot distinguish a null value from an undefined one. Suppose X has a name attribute and XDto has a name attribute as well – describing a new value for X’s attribute. In that case, null may be a completely valid value. But then: how do you describe the case that no change at all should be applied?
  • This is particularly bad if setting some attribute is meant to trigger some other activity.
  • You need to write or generate a lot of value object boilerplate code to have good equals() and hashCode() implementations.
  • As with the first issue: how do you describe the change of a single attribute only?

In contrast to that, consider an XDto that is implemented as an extension of HashMap<String,Object>:

public class XDto extends HashMap<String,Object> {
  public final static String NAME = "name";
  public XDto() { }
  // copy constructor: copies only attributes that are actually defined
  public XDto(XDto u) {
    if (u.containsKey(NAME)) { setName(u.getName()); }
  }
  // snapshot constructor: initializes from the domain entity
  public XDto(X x) {
    setName(x.getName());
  }
  public String getName() {
    return (String) get(NAME);
  }
  public void setName(String name) {
    put(NAME, name);
  }
}

Apart from having decent equals, hashCode, and toString implementations (considering it is a value object), this allows for the following features:

  • We can perfectly distinguish between null and undefined using Map.containsKey.
  • This is great because now, in the implementation of the update method for X, we can safely assume that any single attribute change was meant to be. This allows for atomic, consistent updates with very relaxed concurrency constraints.
  • Determining the difference compared to some initial state is just an operation on the maps’ entry sets.
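For example, a quick usage sketch (the id "42" and the service variable are hypothetical):

XDto change = new XDto();
change.setName(null);                              // name is defined as null: clear it
boolean defined = change.containsKey(XDto.NAME);   // true, distinguishable from "not set"
X updated = xService.update("42", change);         // only defined attributes are applied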

 

In short: we get a data operation programming model (see the drawing below) consisting of initializing some temporary update state as a DTO, operating on it as long as needed, extracting the actual change by comparing DTOs, and sending back the change.
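Here is a minimal sketch of the change extraction step (class and method names are illustrative, not part of any existing API):

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Computes the delta between an initial and an edited DTO state. Only keys
// that are defined in the edited state and differ from the initial state make
// it into the delta, so null stays distinguishable from undefined.
public final class DtoDiff {

  public static Map<String, Object> diff(Map<String, Object> initial, Map<String, Object> edited) {
    Map<String, Object> delta = new HashMap<>();
    for (Map.Entry<String, Object> e : edited.entrySet()) {
      boolean changed = !initial.containsKey(e.getKey())
        || !Objects.equals(initial.get(e.getKey()), e.getValue());
      if (changed) {
        delta.put(e.getKey(), e.getValue());
      }
    }
    return delta;
  }
}

Applied to the model above: take an XDto snapshot of X, let the editor work on a copy, and pass the result of diff(snapshot, copy) to the update method.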

 

Things get a little more tricky when adding collections of related persistent value objects to the picture. Assume X has some related Ys that are nevertheless owned by X; think of a user with one or more addresses. As for X, we assume some YDto. Where X has some method getYs that returns a list of Y instances, XDto now works with YDtos.

Our goal is to use simple operations on collections to extend the difference computation from above to this case. Ideally, we support adding and removing of Ys as well as modification, where modified Ys are represented, for update, by a “stripped” YDto as above.

Here is one way of achieving that: As Y is a persistent entity, it has an id. Now, instead of holding on to a list of YDto, we construct XDto to hold a list of pairs (id, value).

Computing the difference between two such lists of pairs means removing all pairs that are equal and, in addition, for those with the same id, recursing into the YDto instances for difference computation. Back on the list level, a pair with no id indicates a new Y to be created; a pair with no YDto indicates a Y that is no longer part of X. This is actually rather simple to implement generically.
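Here is a hedged sketch of that generic list-level diff (all names are made up for illustration; it reuses the attribute-level DtoDiff from the earlier sketch):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class PairListDiff {

  // One (id, value) pair: id == null marks a new Y, value == null a deleted Y.
  public static final class Pair {
    public final String id;
    public final Map<String, Object> value;
    public Pair(String id, Map<String, Object> value) {
      this.id = id;
      this.value = value;
    }
  }

  public static List<Pair> diff(List<Pair> initial, List<Pair> edited) {
    // index the initial state by id
    Map<String, Map<String, Object>> before = new HashMap<>();
    for (Pair p : initial) {
      before.put(p.id, p.value);
    }
    List<Pair> delta = new ArrayList<>();
    for (Pair p : edited) {
      Map<String, Object> old = (p.id == null) ? null : before.remove(p.id);
      if (old == null) {
        delta.add(p);                                        // no id: a new Y to create
      } else {
        Map<String, Object> d = DtoDiff.diff(old, p.value);  // same id: recurse attribute-wise
        if (!d.isEmpty()) {
          delta.add(new Pair(p.id, d));
        }
      }
    }
    for (String removedId : before.keySet()) {
      delta.add(new Pair(removedId, null));                  // left-over ids were removed from X
    }
    return delta;
  }
}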

That is, serialized as JSON, the delta between two XDto states with a modified Y collection would look like this:

{
  "y": [
    {"id": "1", "value": {"a": "new A"}},               // update "a" in Y "1"
    {"id": "2"},                                        // delete Y "2"
    {"value": {"a": "initial a", "b": "initial b"}}     // add a new Y
  ]
}

All in all, we get a programming model that supports efficient and convenient data modifications with some natural serialization for the remote case.

[Figure: the DTO-based data operation programming model]

The supplied DTO types serve as state types in editors (for example) and naturally extend to change computation purposes.

As a side note: between 2006 and 2008 I was a member of the very promising Service Data Objects (SDO) working group. SDO envisioned a similar programming style but went much further in terms of abstraction and implementation requirements. Unfortunately, SDO seems to be pretty much dead now – probably due to scope creep and the lack of an accessible, easy-to-use implementation (last I checked). The good thing is that we can achieve a lot of its goodness with a mix of existing technologies.


Local vs. Distributed Complexity

As a student or programming enthusiast, you will spend considerable time getting your head around data structures and algorithms. It is those elementary concepts that make up the essential tool set to make a dumb machine perform something useful and enjoyable.

When going professional, i.e. when building software to be used by others, developers typically end up building either enabling functionality, e.g. low-level frameworks and libraries (infrastructure), or applications and parts thereof, e.g. user interfaces and jobs (solutions).

There is a cultural divide between infrastructure developers and solution developers. The former have a tendency to believe the latter do somehow intellectually inferior work, while the latter believe the former have no clue about real life.

While it is definitely beneficial to develop skills in API design and system-level programming, doing so without the experience of developing and delivering an end-to-end solution is like knowing the finest details of kitchen equipment without ever cooking for friends.

The Difference

A typical characteristic of an infrastructure library is a rather well-defined problem scope that is known to imply some level of non-trivial complexity in its implementation (otherwise it would be pointless):

 

Local complexity is expected and accepted.

 

In contrast, solution development is driven by business flows, end-user requirements, and other requirements that are typically far from stable until done – and much less so over time. Complete solutions typically consist of many spread-out, if not distributed, implementation pieces, so local complexity is simply not affordable.

 

Distributed complexity is expected, local complexity is not acceptable.

 

The natural learning order is from left to right:

[Figure: the natural learning order, from local to distributed complexity]

Conclusion

Unfortunately, many careers and whole companies do not get past the infrastructure/solution line. This produces deciders that have very little idea about “the real” and tend to view it as a simplified extrapolation of their previous experience. Eventually we see astronaut architectures full of disrespect for the problem space, absurd assumptions on how markets adapt, and little sense of how much time and reality exposure solutions require to become solid problem solvers.

 

Java EE is not for Standard Business Software

The “official” technology choice for enterprise software development on the Java platform is the Java Enterprise Edition or Java EE for short. Java EE is a set of specifications and APIs defined within the Java Community Process (JCP) – it is a business software standard.

 

This post is about why it is naive to think that knowing Java EE is your ticket to creating standard business software.

I use the term standard business software for software systems that are developed by one party and used by many, and that are typically extended and customized for and by specific users (customers) to integrate them with customer-specific business processes. The use of the word “standard” does not indicate that the software is necessarily widely used or somehow agreed on by some committee – it just says that it standardizes a solution approach to a business problem for a range of possible applications and typically requires some form of adaptation before being usable in a specific setting.

How hard can it be?

It is a myth that Java enterprise development is harder than on other platforms – per se. That is, from the point of view of the programming language and, specifically, the Java EE APIs, writing the software as such is not more complex than in other environments. Complex software is complex, regardless of the technology choice.

In order to turn your software into “standard software” however, the following needs to be addressed as well:

You need an approach to customize and extend your software

This is only partially a software architecture problem. It also means providing your customer with the ability to add code, manage upgrades, and integration-test. Java EE provides very little in terms of code extensibility, close to nothing for modularity with isolation, and obviously it says nothing about how to actually produce software.

You need an operational approach

This is the most underestimated aspect. While any developer knows that the actual Java EE implementation, the Java EE server, makes a huge difference when things get serious, the simplistic message that an API standard is good enough to make implementations interchangeable has led organizations to standardize on some specific Java EE product.

This situation has positive side effects for two parties: IT can extend its claim, and the Java EE vendor can sell more licenses. And it has a terrible side effect for one party: you as a developer.

It’s up to you to qualify your software for different Java EE implementations in different versions. It’s up to you to describe operations of your software in conjunction with the specific IT-mandated version. When things go bad, however, you will still get the blame.

Why is it so limited?

There is a pattern here: there is simply no point for Java EE vendors to extend the standard with anything that helps you solve those problems – no point in providing standard means that help you ship customizable, extensible business solutions.

It is hard to tell, considering the quality of the commercial tools I know of, but addressing the operational side and solving the modularity questions definitely seemed to provide excellent potential for selling added value on the one side and effective vendor lock-in on the other.

This extends to the API specifications. When I was working on JCP committees in my days at SAP, it was rather common to argue that some capability should specifically be excluded from the standard, or even precluded, in order to make sure that you may well be able to develop for some Java EE server product but not in competition to it. And that makes a lot of sense from a vendor’s perspective. This is saying that

Java EE is a customization and extension tool for Java EE vendor solution stacks.

 

Not that any vendor was particularly successful in implementing that effect – thanks to the competition stemming from open source projects that have become de-facto standards, such as the Spring Framework and Hibernate, to name only two of many more.

Summary

Outside of an established IT organization, i.e. as a party selling solutions into IT organizations, it makes very little sense to focus on supporting a wide range of Java EE implementations and pay the price for it yourself. Instead, try to bundle as much infrastructure as possible with your solution to limit operational combinatorics.

To be fair: it is a good thing that we have Java EE. But one should not be fooled into believing that it is the answer to interoperability.

References

  1. Java EE, http://en.wikipedia.org/wiki/Java_Platform,_Enterprise_Edition
  2. JCP, http://en.wikipedia.org/wiki/Java_Community_Process