Software Design Part 3 – Working With Persistent Data

This is a follow-up to the posts Software Design – The Big Picture (Intro) and Software Design – Part 2: Structuring For Control Flow.

Imagine a world with no constraints on the availability of computational power and memory. We could have software that simply runs forever, designed so that all the data it processes is kept in its runtime memory indefinitely. Would we call the data it manages persistent data? It never gets lost, right? That is, in fact, not what we generally refer to as persistent data.

Persistent data in the sense used here is data that is kept independently of the execution of any specific interpreting or modifying software, and that may be useful for a variety of existing or yet-to-be-designed applications. And that is where the trouble starts.

Firstly, we need to consider that the same data set may be accessed from different execution environments at the same time, implying consistency and concurrency considerations.

Secondly, in many cases, just as we want a modular software design, we want our data to be up to serving such a design.

Finally, we want an implementation pattern for data access that is easy to implement, easy to use for dependent code, and provides good control over the former two aspects.

Consistency and Concurrency

The cornerstone technique for consistent data access is transactional data access as implemented by your typical relational database management system (RDBMS). While there are some niche scenarios where this is not required, any implementation scenario where a control flow spans an a-priori unknown scope of data access (which is generally true for modular, extensible applications) will rely on transactional data access for operational state management. Trying otherwise is pointless.

As laid out in the previous post (Notes on Working with Transactions), a transaction should typically span an “interaction”, be it a user interaction or a “step” in a background workflow execution. This is the span where your code will invoke data changing operations.
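
To make this concrete, here is a minimal sketch using plain JPA; the CheckoutInteraction class and the Order entity are invented for illustration:

```java
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

public class CheckoutInteraction {

    private final EntityManagerFactory emf;

    public CheckoutInteraction(EntityManagerFactory emf) {
        this.emf = emf;
    }

    // One interaction, one transaction: all data-changing operations of
    // this "step" happen within the transaction span, or not at all.
    public void submitOrder(long orderId) {
        EntityManager em = emf.createEntityManager();
        try {
            em.getTransaction().begin();
            Order order = em.find(Order.class, orderId);
            order.setStatus("SUBMITTED");
            em.getTransaction().commit();
        } catch (RuntimeException e) {
            if (em.getTransaction().isActive()) {
                em.getTransaction().rollback();
            }
            throw e;
        } finally {
            em.close();
        }
    }
}
```

In a container-managed environment the same boundary is typically declared rather than hand-coded, but the span stays the same: one interaction, one transaction.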

While transactional database access guarantees that either all or none of a transaction's changes will be applied, it does not necessarily guarantee consistent reading of data, nor does it prevent concurrent changes of data. At times you need to make sure that some control flow has exclusive access to some portion of data. That is what (so-called pessimistic) locking is for. As the use of “pessimistic” suggests, there is also “optimistic” locking, which however is beneficial only in certain scenarios and hardly in the general case.
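
For illustration, this is what requesting a pessimistic lock looks like through JPA's standard locking API (the Account entity and the repository class are invented for this sketch):

```java
import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

public class AccountRepository {

    private final EntityManager em;

    public AccountRepository(EntityManager em) {
        this.em = em;
    }

    // Reads the account and acquires an exclusive row lock (typically
    // translated to SELECT ... FOR UPDATE). Other control flows trying
    // to lock the same row will block until this transaction ends.
    public Account lockForUpdate(long accountId) {
        return em.find(Account.class, accountId, LockModeType.PESSIMISTIC_WRITE);
    }
}
```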

Modular Data

An intrinsic feature of the relational data model is that it allows for arbitrary relationships between tabular data, as long as you can make the data types fit. In particular, this means any given data model may be enhanced by adding related data. By modeling constraints, even new consistency rules may be added to an existing data model.

If your application structure reflects that same modularization, you will want to re-use data type definitions just the same, instead of re-implementing extended domain model types for every extension.

While the relational model is a natural fit here, object-relational mapping tools like JPA for Java make this a little less obvious. It is possible, but requires some care when crafting shared entity classes. I wrote about this in Modularization And Data Relationships.
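
As a rough sketch of the idea (all names invented for illustration), an extension module can add its own entity that relates to a core module's entity, re-using the core type definition instead of re-implementing it:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;

// Lives in an extension module. The core module's User entity is re-used
// as-is rather than re-implemented or subclassed by the extension.
@Entity
public class UserPreference {

    @Id
    @GeneratedValue
    private Long id;

    // Relationship into the core module's data model.
    @ManyToOne(optional = false)
    private User user;

    private String prefKey;
    private String prefValue;

    // getters and setters omitted for brevity
}
```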

We will not go into domain model design here. There is an abundance of literature on the subject and I wouldn't know how to add to it. See e.g. the classics Domain-Driven Design and The Data Model Resource Book.

In any case, extensibility is a key aspect of data modeling and data model implementation. If you are not prepared for the future, you will not be ready for the present!

Designing Repositories, Implementing Data Access

Given a data model, say, one already implemented as a database schema, you need to expose data structures and access methods to those parts of the application that need to read and write data. Typically, that means two things (both sketched below):

  • Data types, specified in your programming language of choice, that represent the database schema – the domain API.
  • And a Repository API:
    • Data access methods for reading entities of your data model into memory (typically by id, or paged for display, etc.).
    • Data access methods for writing changes of entities to the database.
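
A minimal sketch of both, assuming a hypothetical Customer entity (the CustomerUpdate type used on the write side is sketched further below):

```java
import java.util.List;
import java.util.Optional;

// Domain API: a behavior-free data type representing the database schema.
public class Customer {
    private long id;
    private String name;
    private String email;
    // getters and setters omitted for brevity
}

// Repository API: read and write access, separated by intent.
interface CustomerRepository {

    // Read access: by id, and paged for display.
    Optional<Customer> findById(long id);
    List<Customer> findPage(int offset, int limit);

    // Write access: one method taking a structured update object.
    void update(long customerId, CustomerUpdate update);
}
```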

I strongly advocate separating read access to data from write access, and keeping data type implementations more or less completely behavior-free. The repository API should very much follow a Command Query Responsibility Segregation (CQRS) approach with complex update objects.

That is:

  • Design read methods matching your query needs.
  • Have one or very few write methods per updatable domain entity type, each taking a structured update data object that describes the possible updates.

This is not an object-oriented design. But, let's face it: data modeling does not fit object orientation very well. Here are some benefits of this design:

  1. When mixing structure with update behavior in the implementation of your data model, you easily end up with spread-out update logic that has unclear validation points and is not very instructive on where to find the right update logic.
    In contrast, going for an update object forces you to design a document that explains the possible scope of an update; it also makes it easy to standardize complex updates (e.g. of nested collections) and provides a very natural point in time to apply validation logic.
  2. Adding business-level behavior to entity implementations is a bad idea, as it tends to ignore the possibility that there may be many future extensions with modified or extended behavior for the same data.
    In contrast, strictly separating these two aspects makes sure extensions of business functions working on the data do not compete over the entity model interfaces.
  3. A dedicated update data “document” structure is light on your implementation, as it substitutes for possibly many small update methods, and it is highly efficient in terms of coding effort, as it can be re-used in service interfaces and user interface models.

I wrote about this in the past (Java Data API Design Revisited). This is similar to the concept of Service Data Objects (I was representing SAP in the SDO expert group at the time) – an idea that is even more effective within the application than it is outside: Use a generic, easy-to-use update descriptor that can be applied on many layers to describe modifications of a domain model.
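
As a rough, invented sketch (not the actual API from the referenced post, nor SDO's), such an update descriptor might look like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// A structured update "document" for the Customer entity: it describes
// the possible scope of an update instead of spreading update logic
// across many small methods.
public class CustomerUpdate {

    // Optional.empty() means "leave unchanged"; a present value means "set".
    private Optional<String> name = Optional.empty();
    private Optional<String> email = Optional.empty();

    // Standardized handling of a nested collection: explicit adds and removes.
    private final List<Long> addedGroupIds = new ArrayList<>();
    private final List<Long> removedGroupIds = new ArrayList<>();

    public CustomerUpdate setName(String name) {
        this.name = Optional.of(name);
        return this;
    }

    public CustomerUpdate setEmail(String email) {
        this.email = Optional.of(email);
        return this;
    }

    public CustomerUpdate addGroup(long groupId) {
        addedGroupIds.add(groupId);
        return this;
    }

    public CustomerUpdate removeGroup(long groupId) {
        removedGroupIds.add(groupId);
        return this;
    }

    // accessors for the repository implementation omitted for brevity
}
```

A caller expresses a change as, e.g., repository.update(customerId, new CustomerUpdate().setEmail("jane@example.com").addGroup(7)), and the repository applies it in one place – a natural point for validation before anything reaches the database.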

Conclusion

Describing more implementation details here is beyond the scope of this blog. But, if there is one thing to take from this post:

Carefully crafting domain APIs and repositories that not only effectively represent the data model but also provide a simple, widely usable, extensible, and instructive data API is probably the best implementation-related investment possible.