Java Data API Design Revisited

When domain entities get bigger and more complex, designing a safe, usable, future-proof modification API is tricky.


This article is on a simple but effective approach to designing for data updates.


Providing an update API for a complex domain entity is more complicated than most developers initially expect. As usual, problems start showing when complexity increases.

Here’s the setup: Suppose your software system exposes a service API for some domain entity X to be used by other modules.

When using the Java Persistence API (JPA) it is not uncommon to expose the actual domain classes for API users. That greatly simplifies simple updates: Just invoke domain class setters, and unless the whole transaction fails, updates will be persisted. There is a number of problems with that approach though. Here are some:

  • If modifications of the domain object instance are not performed in one go, other code invoked in between may see inconsistent states (this is one reason why using immutables are favourable).
  • Updates that require non-trivial constraint checking may not be performed on the entity in full but rather require service invocations – leading to a complex to use API.
  • Exposing the persistent domain types, including their “transparent persistence” behavior is very much exposing the actual database structure which easily deviates from a logical domain model over time, leading to an API that leaks “internal” matters to its users.

The obvious alternative to exposing JPA domain classes is to expose read-only, immutable domain type interfaces and complement that by service-level modification methods whose arguments represent all or some state of the domain entity.

Only for very simple domain types, it is practical to offer modification methods with simple built-in types such as numbers or strings though, as that leads to hard to maintain and even harder to use APIs.

Alas, we need some change describing data transfer object (DTO – we use that term regardless of the remoting case) that can serve as a parameter of our update method.

As soon as updates are to prepared either remotely or in some multi-step editing process, intermediate storage of yet-to-be-applied updates needs to be implemented and having some help for that is great in any case. So DTOs are cool.

Given a domain type X (as read-only interface), and some service XService we assume some DTO type XDto, so that the (simplified) service interface looks like this:


public interface XService {
 X find(String id);
 X create(XDto xdto);
 X update(String id, XDto xdto);


If XDto is a regular Java Bean with some members describing updated attributes for X, there are a few annoying issues that take away a lot of the initial attractiveness:

  • You cannot differ a null value from undefined. That is, suppose X has a name attribute and XDto has a name attribute as well – describing a new value for X’s attribute. In that case, null may be a completely valid value. But then: How to describe the case that no change at all should be applied?
  • This is particularly bad, if setting some attribute is meant to trigger some other activity.
  • You need to write or generate a lot of value object boilerplate code to have good equals() and hashcode() implementations.
  • As with the first issue: How do you describe the change of a single attribute only?

In contrast to that, consider an XDto that is implemented as an extension of HashMap<String,Object>:

public class XDto extends HashMap<String,Object> {
  public final static String NAME = "name";
  public XDto() { }
  public XDto(XDto u) {
    if (u.containsKey(NAME)) { setName(u.getName()); }
  public XDto(X x) {
  public String getName() {
    return (String) get(NAME);
  public void setName(String name) {

Apart from having a decent equals, hashcode, toString implementation, considering it is a value object, this allows for the following features:

  • We can perfectly distinguish between null and undefined using Map.containsKey.
  • This is great, as now, in the implementation of the update method for X, we can safely assume that any single attribute change was meant to be. This allows for atomic, consistent updates with very relaxed concurrency constraints.
  • Determining the difference, compared to some initial state is just an operation on the map’s entry set.


In short: We get a data operation programming model (see the drawing below) consisting of initializing some temporary update state as a DTO, operating on this as long as needed, extracting the actual change by comparing DTOs, sending back the change


Things get a little more tricky when adding collections of related persistent value objects to the picture. Assume X has some related Ys that are nevertheless owned by X. Think of a user with one or more addresses. As for X we assume some YDto. Where X has some method getYs that returns a list of Y instances, XDto now works with YDtos.

Our goals is to use simple operations on collections to extend the difference computation from above to this case. Ideally, we support adding and removing of Y’s as well as modification, where modified Y‘s should be represented, for update, with a “stripped” YDto as above.

Here is one way of achieving that: As Y is a persistent entity, it has an id. Now, instead of holding on to a list of YDto, we construct XDto to hold a list of pairs (id, value).

Computing the difference between two such lists of pairs means to remove all that are equal and in addition, for those with the same id, to recures into YDto instances for difference computation. Back on the list level, a pair with no id indicates a new Y to be created, a pair with no YDto indicates a Y that no longer is part of X. This is actually rather simple to implement generically.

That is, serializated as JSON, the delta between two XDto states with modified Y collection would look like this:

    {“id”:”1”, “value”:{“a”=”new A”}},             // update "a" in Y "1"
    {“id”:”2” },                                   // delete Y "2"
    {“value” : {“a”=”initial a”, “b”:”initial b”}} // add a new Y

All in all, we get a programming model that supports efficient and convenient data modifications with some natural serialization for the remote case.


The supplied DTO types serve as state types in editors (for example) and naturally extend to change computation purposes.

As a side note: Between 2006 and 2008 I was a member of the very promising Service-Data-Objects (SDO) working group. SDO envisioned a similar programming style but went much further in terms of abtraction and implementation requirements. Unfortunately, SDO seems to be pretty much dead now – probably due to scope creep and lack of an accessible easy to use implementation (last I checked). Good thing is we can achieve a lot of its goodness with a mix of existing technologies.