Scoping JPA

This post is about the following question:

If you have bought into the Java Persistence API (JPA) and your solution keeps getting bigger, a) should you limit the scope in which managed objects are visible, and b) if so, how?

There are a lot of bad things to say about JPA, and a few good and rather strong points in its favour. On the bad side, you might bring forward that the abstraction is incomplete (you need to understand SQL anyway), JPQL is not algebraically closed, large persistence units (PUs) impose a loading-time and memory penalty without even touching a single persistent bit, getting started is ugly, and a given persistence unit cannot be extended from other PUs. In short, it violates so many significant properties of the relational model that it can become a little embarrassing at times (see also [1] and [2]).

On the plus side, however, simple things are simple, the technology is mature, and, most importantly, any Java developer knows how to use it (or can be expected to). It is the standard API for accessing relational databases in Java. We do use JPA, and we do so for these latter reasons.

One key problem, in particular once you leave the realm of trivial applications, is that JPA-managed objects have a rather non-intuitive runtime behavior, which becomes the source of many problems unless you know the details of the application assembly extremely well: detaching of objects with or without pre-fetching, lazy vs. eager fetching, and accidental persistent updates that bypass business-level validation simply because a setter was called from a previously unforeseen control flow.
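To make the last failure mode concrete, here is a toy model of the automatic dirty checking a JPA provider performs when it flushes a persistence context (typically at transaction commit). It is a plain-Java simulation under stated assumptions, not the JPA API: Customer, ToyEntityManager and commit() are hypothetical names, and a map stands in for the database table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy model of JPA-style automatic dirty checking; this is NOT the JPA API.
// Customer, ToyEntityManager and commit() are hypothetical names.
class Customer {
    final long id;
    String email;

    Customer(long id, String email) {
        this.id = id;
        this.email = email;
    }

    // A plain setter: no business-level validation happens here.
    void setEmail(String email) {
        this.email = email;
    }
}

class ToyEntityManager {
    // Stand-in for the database table: id -> stored email.
    final Map<Long, String> table = new HashMap<>();
    // Managed instances, plus a snapshot of their state at load time.
    private final Map<Long, Customer> managed = new HashMap<>();
    private final Map<Long, String> snapshots = new HashMap<>();

    // Returns the same managed instance for repeated lookups of one id.
    Customer find(long id) {
        return managed.computeIfAbsent(id, key -> {
            String email = table.get(key);
            snapshots.put(key, email);
            return new Customer(key, email);
        });
    }

    // At commit, every managed object whose state differs from its snapshot
    // is written back, no matter which code path modified it.
    List<Long> commit() {
        List<Long> flushed = new ArrayList<>();
        for (Customer c : managed.values()) {
            if (!Objects.equals(c.email, snapshots.get(c.id))) {
                table.put(c.id, c.email);
                snapshots.put(c.id, c.email);
                flushed.add(c.id);
            }
        }
        return flushed;
    }
}

class DirtyCheckingDemo {
    public static void main(String[] args) {
        ToyEntityManager em = new ToyEntityManager();
        em.table.put(1L, "alice@example.com");

        Customer c = em.find(1L);
        // Some unforeseen control flow merely calls a setter...
        c.setEmail("not-an-email");
        // ...and the change is persisted without an explicit save or any validation.
        System.out.println(em.commit() + " " + em.table.get(1L)); // prints "[1] not-an-email"
    }
}
```

Nothing in the calling code says "save": holding a reference to a managed object is enough to change persistent state, which is exactly why the reach of such references matters.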

So, in fact, there is a development scalability problem: a lack of clean separation of concerns, that is, of modularization, which, as we will see below, is to be mitigated by a modularization approach.

To answer the first question, then: the visibility of domain types as JPA-managed objects should be limited exactly when encapsulation and information hiding become an issue due to the size or complexity of the solution at hand.

When projects start off small and don’t bother decomposing early, the layering typically looks like this:

[Figure: getting started]

The bold, dotted line in the picture above emphasizes that in JPA, the entity types of the persistence unit and their metadata do indeed represent the stored database model completely, and hence all Java bindings of the domain model are in one place. Unlike JDBC-to-POJO mapping tools such as MyBatis, JPA does not inherently offer use-case-driven, potentially overlapping views onto the data model.

When the code base grows and becomes modular (one way or the other, be it with or without runtime support for modules and a separation of API and implementation), the natural tendency, out of convenience and simplicity, is to spread the use of managed entities across otherwise separated concerns:

[Figure: too big to maintain]

(In the picture above, the light arrows indicate visibility.)

Now all the issues mentioned at the beginning start to show: concerns that should be separate remain tightly coupled because of the implicit requirements that come with operating on entities directly.

We are finally about to tackle the second question.

As functions get separated into modules to achieve encapsulation, so should the overall data model be decomposed into separate concerns. That is, different persistence units should be used to represent different parts of the data model. In terms of Domain-Driven Design (see [3]), this corresponds to the concept of bounded contexts, so you might say PU == bounded context. Of course, this means nothing other than breaking up the overall model. Note that the resulting splits may indeed overlap in the database schema, at least for reading, but not in the code architecture.
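The last point, overlap in the schema but not in the code, can be sketched in plain Java. The assumptions: a map stands in for one shared "users" table, and UserRow, Database, UserManagementContext and SalesContext are hypothetical names. In a JPA setup, each context's type would be an entity class listed only in that context's own persistence unit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Stand-in for one shared table "users"; hypothetical schema for illustration.
record UserRow(long id, String email, String displayName, String passwordHash) {}

final class Database {
    final Map<Long, UserRow> users = new HashMap<>();
}

// Bounded context 1: user management owns the users table and writes to it.
final class UserManagementContext {
    // This context's own mapping of the table (its PU's entity, in JPA terms).
    record User(long id, String email, String displayName, String passwordHash) {}

    private final Database db;

    UserManagementContext(Database db) {
        this.db = db;
    }

    void save(User u) {
        db.users.put(u.id(), new UserRow(u.id(), u.email(), u.displayName(), u.passwordHash()));
    }
}

// Bounded context 2: sales only reads a narrow projection of the same table.
final class SalesContext {
    // A separate, immutable type with just the columns sales needs; no password hash.
    record Buyer(long id, String displayName) {}

    private final Database db;

    SalesContext(Database db) {
        this.db = db;
    }

    Optional<Buyer> findBuyer(long id) {
        return Optional.ofNullable(db.users.get(id))
                .map(row -> new Buyer(row.id(), row.displayName()));
    }
}
```

Both contexts touch the same rows, yet neither shares a Java type with the other, so a change to the user management model cannot silently ripple into sales code.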

Once the split into persistence units, or bounded contexts, is in place, the visibility of JPA types should be restricted to the modules that require deep integration with persistent structures. Put differently, the code providing the behaviour associated with a persistence unit may itself be complex enough to require a modular approach, so that the bounded context spans several modules. One of these modules then naturally holds all type definitions of the JPA persistence unit. All other access to the respective bounded context should be hidden behind higher-level API abstractions. In fact, the classical term subsystem seems quite adequate to capture this scope. Ideally, a subsystem consists of only one module, which allows its domain types to be hidden completely from the other parts of the solution:

[Figure: two bounded contexts]

As an example, consider a large business context that is naturally subject to functional extensions and grows to many modules over time, while a conceptually more tightly constrained user management subsystem can be completely encapsulated in one module.
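The single-module user management subsystem might look roughly like the following sketch. It assumes that a private map stands in for the subsystem's persistence unit; UserManagement, UserEntity, UserView, register and changeEmail are hypothetical names. The persistent type is private, so other modules only ever receive immutable snapshots and every write passes through validation. With the Java module system, a similar effect could come from keeping the entity package out of the module's exported API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical subsystem facade; the private map stands in for its persistence unit.
final class UserManagement {

    // Would be the @Entity class of this subsystem's PU. Being private, no other
    // module can obtain an instance and call setters on it behind the API's back.
    private static final class UserEntity {
        final long id;
        String email;

        UserEntity(long id, String email) {
            this.id = id;
            this.email = email;
        }
    }

    // The only representation other modules ever see: an immutable snapshot.
    record UserView(long id, String email) {}

    private final Map<Long, UserEntity> store = new HashMap<>();
    private final AtomicLong ids = new AtomicLong();

    // Every write goes through the API, so validation cannot be bypassed.
    UserView register(String email) {
        validate(email);
        UserEntity e = new UserEntity(ids.incrementAndGet(), email);
        store.put(e.id, e);
        return toView(e);
    }

    UserView changeEmail(long id, String newEmail) {
        validate(newEmail); // validate before touching persistent state
        UserEntity e = store.get(id);
        if (e == null) {
            throw new IllegalArgumentException("no user with id " + id);
        }
        e.email = newEmail;
        return toView(e);
    }

    Optional<UserView> find(long id) {
        return Optional.ofNullable(store.get(id)).map(UserManagement::toView);
    }

    private static void validate(String email) {
        if (email == null || !email.contains("@")) {
            throw new IllegalArgumentException("invalid email: " + email);
        }
    }

    private static UserView toView(UserEntity e) {
        return new UserView(e.id, e.email);
    }
}

class UserManagementDemo {
    public static void main(String[] args) {
        UserManagement users = new UserManagement();
        UserManagement.UserView carol = users.register("carol@example.com");
        try {
            users.changeEmail(carol.id(), "not-an-email");
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
        // The stored address is unchanged because the invalid update was refused.
        System.out.println(users.find(carol.id()).orElseThrow());
    }
}
```

A business module holding a UserView can read it freely, but changing a user requires an explicit API call, which is the coupling the post argues for.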

That brings another rather lengthy post to an end.

As a final remark, note that everything discussed above may well apply to JPA persistence in particular, due to its specifics as laid out at the beginning. At the end of the day, however, the choice of persistence access technology is rather irrelevant to our conclusions. To make sure data is consistently validated, exposed and interpreted, limiting all direct access to model data, and all application-level representations of it, to one coherent implementation is a smart move in any case.

References

[1] “An Introduction to Database Systems”, Chris Date

[2] “The Vietnam of Computer Science”, Ted Neward

[3] “Domain-Driven Design”, Eric Evans