Modularization And Data Relationships

A lot of posts in this blog are on structuring largish applications via one or another modularization approach. Size is not the only reason to modularize, though. Another important reason to split into independent subsystems is optionality. For example, given a core product, industry-specific extensions may exist that rely on the core system’s APIs and data models but consist of enough code and data structures of their own to justify an independent software life cycle. While code dependencies have been treated extensively in previous posts, we have not looked much at data relationships.

This post addresses data model dependencies between software subsystems of a modularized software system.

Data Model != Code

When we talk about data model dependencies between subsystems, we are talking about data types, in particular data types representing persistent data, that are exposed by one subsystem to be used by other subsystems.

Exposing data types, i.e. making the knowledge of their definition available between subsystems, is typically done using an API contract – be it using a regular programming language or some data description language such as XML Schema.

Making data types available between subsystems is only one side of the story, however. Eventually, data described by the model will need to be exchanged and combined with other data. Typically, describing data and providing access to data is part of a subsystem API, and combining data from different subsystems is done by the caller.

In many cases, however, this is not good enough: If subsystems share the same database, losing the ability to query data across domain definitions of subsystems can be a real showstopper for modularization.

An Example

Consider the following hypothetical example: An application system’s core functionality is to manage equipment records, such as for computers, printers, or machinery. An extension to the core system provides functionality specific to the health care industry. In health care, we need to adhere to more regulation than the core system covers. For example, we need to observe schedules for inspections by third-party inspectors.

The health care extension hence has its own database structures that refer to data in the core system. It uses database foreign keys for that. That is, in its database tables we find fields that identify data records in the core system’s database tables.

Now, for a single inspection schedule that refers to one or more pieces of equipment, looking up the individual equipment via the core system poses no problem. Answering a question such as what are the top ten pieces of equipment of some given health care inspection schedule that have the most service incidents is a different case. Providing efficient, sortable access to any such combined data to end users via independent data retrieval is hard, if not a pointless goal, in the general case. This is what joins in the relational database world are for, after all.
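To make the cost of caller-side combination concrete, here is a minimal sketch of the top-ten question answered without a join. All names in it (Equipment, CoreEquipmentApi, InspectionScheduleApi, topByIncidents) are hypothetical and only serve the illustration; they are not taken from any real system. It assumes Java 16 or later for the record syntax.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CallerSideJoin {

    // Hypothetical equipment record owned by the core system.
    record Equipment(long id, String name, int serviceIncidents) {}

    // Hypothetical core API that only offers single lookups.
    interface CoreEquipmentApi {
        Equipment findById(long equipmentId);
    }

    // Hypothetical extension API: the equipment ids (foreign keys) of a schedule.
    interface InspectionScheduleApi {
        List<Long> equipmentIdsOf(long scheduleId);
    }

    // Combines both subsystems in the caller: every record is fetched
    // individually, then sorted and truncated in memory.
    static List<Equipment> topByIncidents(CoreEquipmentApi core,
                                          InspectionScheduleApi schedules,
                                          long scheduleId,
                                          int limit) {
        return schedules.equipmentIdsOf(scheduleId).stream()
                .map(id -> core.findById(id)) // one core lookup per referenced id
                .sorted(Comparator.comparingInt(Equipment::serviceIncidents).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // In-memory stand-ins for the two subsystems.
        Map<Long, Equipment> coreData = Map.of(
                1L, new Equipment(1L, "Infusion pump", 4),
                2L, new Equipment(2L, "Ventilator", 9),
                3L, new Equipment(3L, "Label printer", 1));
        List<Equipment> top = topByIncidents(
                id -> coreData.get(id),
                scheduleId -> List.of(1L, 2L, 3L),
                42L,
                2);
        top.forEach(e -> System.out.println(e.name() + ": " + e.serviceIncidents()));
    }
}
```

The result is correct, but the whole candidate set has to be loaded and sorted in the caller before the limit applies. With a join, the database could order and truncate the rows itself and return only the ten that are needed.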

Joining Data Between Subsystems

Given the construction above, we want to extend the core system’s API to support not only single data lookups but also a query abstraction for more sophisticated data retrieval, in particular so that combined data queries, i.e. SQL joins, are computed in the database rather than in memory.

Unfortunately, a query interface that would include data defined outside of the core application’s data model can typically not be expressed easily.

However, we have a few choices for providing meaningful access to joinable data between subsystems.

Exposing Data

For one, the core application may simply document database table or database view structures to be used by other subsystems for querying its data.

This way, the health care extension would extend its own database model by those portions of the core system’s database model that are a) of relevance to the extension and b) part of the new API contract that includes these data access definitions. This would only be used for read-only access, as the extension has no natural way of knowing what other data update and validation logic may be implemented in the core system.

Something more integrated is possible using the Java Persistence API, and should be similarly (if not better) available in Microsoft’s .NET Framework via the LINQ and LINQ to SQL features.

Views in Modular JPA

When using the Java Persistence API (JPA), an underlying relational data model is mapped onto a set of Java class definitions and some connecting relationships. The mapping information, together with some provider- and database-specific configuration, is called a persistence unit. At runtime, all operations of a JPA Entity Manager, such as performing queries and updates, run within the scope of a persistence unit.

In our example, we would have a core persistence unit for equipment management and one persistence unit for the health care extension. As a principle, both would be private matters of the respective subsystem. We would not want the health care extension to make use of non-public persistence definitions of the core model as that would break any hope for a stable contract between core and extension.

As such, there is no simple sharing mechanism for persistence units that would allow exposing a subset of the core persistence unit to other subsystems as part of an API contract.

A close equivalent of a simple database view that still stays within the JPA model, however, is a set of read-only JPA classes that can be included in an extension’s persistence unit definition.

That is, we would have the same data types used in different persistence units. For the extending subsystem, those types appear as a natural extension to its own data model and can hence be used in joining queries, while they are defined in, and hence naturally fit, the core’s domain model.

As a mix of Entity-Relationship and Class Diagram this would look like this:

Highlighting the scope of persistence units:

Now we are at a point where data access and sharing between subsystems is well-defined and as efficient as if the data were not split into separate domain models.

It’s time to move on to the next level.

Data Consistency – or what if data gets deleted?

When a system is split into subsystems, the responsibility for data management gets split as well. Let’s take a look at our example.

In the health care extension to the equipment management system, inspection schedules, stored in the extension’s domain model, refer to equipment data stored in the core application’s domain model. Based on the ideas above, the health care extension can efficiently integrate the core data model into its own.

But then, what happens on updates or on deletions issued by the core application’s equipment management user interface? It would be simplistic to assume that there are no restrictions imposed by the health care extension. Here are some possible considerations:

  • Deletion of equipment should only be possible if some state has been cleared in the extension.
  • Updating equipment data might be subject to extended validation in the extension.
  • Can the extension subsystem be inactive and miss changes by the core application so that it would end up with logically inconsistent data?
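As a rough illustration of the first consideration only, one conceivable shape is a veto hook that the core publishes and the extension implements. This is an assumption made for the sake of the example, not the approach of this blog or of any particular framework, and the names EquipmentDeletionGuard and EquipmentService are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DeletionGuardSketch {

    // Hypothetical contract published by the core system as part of its API.
    interface EquipmentDeletionGuard {
        // Returns true if the implementing subsystem allows the deletion.
        boolean mayDelete(long equipmentId);
    }

    // Hypothetical core service that asks all registered guards before deleting.
    static class EquipmentService {
        private final List<EquipmentDeletionGuard> guards = new ArrayList<>();
        private final Set<Long> equipmentIds = new HashSet<>();

        void add(long equipmentId) {
            equipmentIds.add(equipmentId);
        }

        void register(EquipmentDeletionGuard guard) {
            guards.add(guard);
        }

        // Deletes only if no guard objects; returns whether something was deleted.
        boolean delete(long equipmentId) {
            for (EquipmentDeletionGuard guard : guards) {
                if (!guard.mayDelete(equipmentId)) {
                    return false;
                }
            }
            return equipmentIds.remove(equipmentId);
        }
    }

    public static void main(String[] args) {
        EquipmentService core = new EquipmentService();
        core.add(1L);

        // Stand-in for the extension: equipment 1 still has an open inspection.
        Set<Long> openInspections = new HashSet<>(Set.of(1L));
        core.register(id -> !openInspections.contains(id));

        System.out.println(core.delete(1L)); // vetoed while the inspection is open
        openInspections.remove(1L);
        System.out.println(core.delete(1L)); // allowed once the state is cleared
    }
}
```

Note that a hook like this only protects the data while the extension is active and registered, which is exactly where the third consideration comes in.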

We will look into these exciting topics in the next post: Updates in Modular Data