Updates in Modular Data

In a modular system, data also tends to be modularized. In the post Modularization And Data Relationships we looked at cross-subsystem and cross-domain data dependencies and how to make those available for efficient querying.

In this second part we discuss updates, and in particular deletions, of data that other data may depend on. Remember, we are considering a modular domain in which some central piece of data (we will call it dependency data) is referenced by extension modules and their domain data (we call that dependent data), neither of which was specifically considered when the shared data was conceived.

In our example we chose the domain model of an equipment management core system (the dependency data) that is referenced by the extension modules and data definitions of a health care extension to that core system, in our case Inspections and Schedules (the dependent data).

When equipment data changes or gets deleted, dependent inspection data may easily become invalid. For example, a change of equipment type may mean that inspection types or schedules need to be updated as well. And of course, inspections for deleted equipment will be obsolete altogether.

There is also the technical aspect of foreign key relationships between the inspection database table and the equipment database table. If the foreign key has become obsolete and unresolvable, all equipment information that was previously resolved by joining database tables via the foreign key relationship is gone and can no longer be presented to users.

Simply put, there are two ways to handle deletions:

  • Do not actually delete records; only mark them as unavailable.
    In that case, the original data set is still available to dependent data, and follow-up actions may be offered to users. But the domain model becomes more complicated (e.g. w.r.t. uniqueness constraints).
  • Delete and simply cope with it.
    That means the inspection application must be coded throughout for the case that dependency data may be gone. And whatever users need in order to plan follow-up actions must be stored with the dependent data.
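The two options above can be contrasted in a minimal Python sketch. All names here (Equipment, Inspection, the deleted flag, equipment_name_snapshot) are hypothetical and chosen for illustration only; in particular, the snapshot field is just one assumed way of storing "whatever users need" with the dependent data.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Equipment:
    key: str
    name: str
    deleted: bool = False  # option 1: soft-delete marker (hypothetical field)


@dataclass
class Inspection:
    key: str
    equipment_key: str
    # Option 2: copy of the information users still need once the
    # equipment record itself is gone (hypothetical field).
    equipment_name_snapshot: str


def soft_delete(store: Dict[str, Equipment], key: str) -> None:
    """Option 1: keep the record and only mark it as unavailable."""
    store[key].deleted = True


def hard_delete(store: Dict[str, Equipment], key: str) -> None:
    """Option 2: remove the record; dependent data has to cope."""
    del store[key]


def equipment_label(store: Dict[str, Equipment], inspection: Inspection) -> str:
    """Resolve a display name that works under either strategy."""
    equipment: Optional[Equipment] = store.get(inspection.equipment_key)
    if equipment is None:
        # Dependency data is gone: fall back to the stored snapshot.
        return f"{inspection.equipment_name_snapshot} (deleted)"
    if equipment.deleted:
        return f"{equipment.name} (unavailable)"
    return equipment.name
```

Note that under option 2 every read path in the dependent application needs a fallback like the one in equipment_label, which is exactly the complexity discussed next.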

Similarly for updates:

  • When data is updated to such an extent that dependent data has become invalid, this needs to be discovered when the dependent data is next visited, so that whatever corrections are required can be planned and performed.

While these approaches look theoretically sound, there is much to be improved. Firstly, developing an application to cope with any kind of inconsistency implied by changes of dependency data will be rather complex in the general case.

Secondly, from a user perspective, it will typically be highly undesirable to even be allowed to perform updates that will lead to situations that require follow-up actions and repairs without being told beforehand.

Hence:

How can we generically handle updates of dependency data in a way that is not only consistent but also user-friendly?

Here is at least one approach:

A Data Lease System.

With that approach we use a shared persistent data structure that explicitly expresses the relationships between dependent and dependency data and that is known to all subsystems.

In the simplest case, a lease that a dependent data record holds on a dependency data record consists of:

  1. The dependent data record key and a unique domain type identifier for which that key is valid.
    No other present or future domain use of that identifier should be possible.
  2. Likewise, the dependency data record key with a corresponding domain type identifier.

(<dependent type>, <dependent key>, <dependency type>, <dependency key>)

In our example this could look something like this:

(“inspection.schedule”, “11ed12f2-8c94-41a7-9143-8d4ff6070f”, “equipment.equipment”, “c5f92006-dc40-41aa-97d0-3ae709b4aca6”)

In other words: From a database perspective, the lease is simply a shared join table structure that is annotated with additional meta-information on “what” is being talked about.
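To illustrate the join-table view, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names (equipment, inspection_schedule, data_lease, interval_days) and the short keys are assumptions made for this example, not part of any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE equipment (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE inspection_schedule (id TEXT PRIMARY KEY, interval_days INTEGER NOT NULL);
-- The shared lease table: a join table annotated with domain type identifiers.
CREATE TABLE data_lease (
    dependent_type  TEXT NOT NULL,
    dependent_key   TEXT NOT NULL,
    dependency_type TEXT NOT NULL,
    dependency_key  TEXT NOT NULL,
    PRIMARY KEY (dependent_type, dependent_key, dependency_type, dependency_key)
);
""")
conn.execute("INSERT INTO equipment VALUES ('eq-1', 'Infusion pump')")
conn.execute("INSERT INTO inspection_schedule VALUES ('sch-1', 180)")
conn.execute(
    "INSERT INTO data_lease VALUES (?, ?, ?, ?)",
    ("inspection.schedule", "sch-1", "equipment.equipment", "eq-1"),
)

# Resolve equipment information for schedules by joining through the lease.
rows = conn.execute("""
SELECT s.id, s.interval_days, e.name
FROM inspection_schedule s
JOIN data_lease l
  ON l.dependent_type = 'inspection.schedule' AND l.dependent_key = s.id
JOIN equipment e
  ON l.dependency_type = 'equipment.equipment' AND l.dependency_key = e.id
""").fetchall()
```

The type columns are what distinguish this from an ordinary join table: the same data_lease table can serve any pair of domain types without schema changes.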

Updating lease information is delegated to a shared service, which can provide additional handling for one-to-many, many-to-one, or many-to-many relationships as well as cleanup and generic check methods. From a read-access perspective, however, the principles of Modularization And Data Relationships can be applied in full, and the table structure should indeed be used as a join structure that establishes the actual relationship.

That is, in the example, the health care to equipment relationship is now expressed via the data lease.
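What such a shared service might look like is sketched below as an in-memory Python class. The names Lease, LeaseService, acquire, release_all_held_by, and leases_on are hypothetical; a real service would be backed by the shared persistent structure and would likely offer more operations.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Lease:
    dependent_type: str
    dependent_key: str
    dependency_type: str
    dependency_key: str


class LeaseService:
    """In-memory stand-in for the shared lease service (illustrative only)."""

    def __init__(self) -> None:
        self._leases: Set[Lease] = set()

    def acquire(self, lease: Lease) -> None:
        # Set semantics make acquiring the same lease twice harmless.
        self._leases.add(lease)

    def release_all_held_by(self, dependent_type: str, dependent_key: str) -> int:
        """Cleanup: drop every lease held by one dependent record."""
        stale = {
            lease for lease in self._leases
            if (lease.dependent_type, lease.dependent_key)
            == (dependent_type, dependent_key)
        }
        self._leases -= stale
        return len(stale)

    def leases_on(self, dependency_type: str, dependency_key: str) -> List[Lease]:
        """Generic check: which leases point at this dependency record?"""
        hits = [
            lease for lease in self._leases
            if (lease.dependency_type, lease.dependency_key)
            == (dependency_type, dependency_key)
        ]
        return sorted(hits, key=lambda l: (l.dependent_type, l.dependent_key))
```

Because a lease is just a tuple of four values, one-to-many and many-to-many relationships need no special representation here: they are simply several leases.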

Now, with this system in place, numerous improvements are readily available.

Giving Users a Choice

First of all, the equipment application can now trivially determine whether there are dependent data sets, how many, and of what kind (via the additional domain type identifier).

Based on that information the equipment application could offer users a choice before applying an update. For example, upon a request for deletion, the application may inform the user that the piece of equipment is still referenced by dependent data and refuse the deletion.

For updates, however, this does not remove the need to understand the implications for dependent data.
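As a rough sketch of that check, assume leases are available as plain four-element tuples in the order shown earlier. The function names and the DeletionRefused exception are made up for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (dependent type, dependent key, dependency type, dependency key)
LeaseRow = Tuple[str, str, str, str]


class DeletionRefused(Exception):
    """Raised when a record is still referenced by dependent data."""


def dependents_by_type(
    leases: Iterable[LeaseRow], dependency_type: str, dependency_key: str
) -> Counter:
    """Count dependent records per domain type for one dependency record."""
    return Counter(
        dep_type
        for dep_type, _dep_key, dcy_type, dcy_key in leases
        if (dcy_type, dcy_key) == (dependency_type, dependency_key)
    )


def delete_equipment(
    equipment: Dict[str, str], leases: Iterable[LeaseRow], key: str
) -> None:
    """Delete an equipment record unless some lease still references it."""
    counts = dependents_by_type(leases, "equipment.equipment", key)
    if counts:
        summary = ", ".join(f"{n} x {t}" for t, n in sorted(counts.items()))
        raise DeletionRefused(f"Equipment {key} is still referenced by: {summary}")
    del equipment[key]
```

The equipment application needs no knowledge of inspections for this; the domain type identifiers in the lease are enough to produce a meaningful message.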

The approach becomes most useful if we pair it up with an extension point approach, in which the domain type identification proposed above is used to look up implementations of callback interfaces provided by the extension modules of the subsystem owning the dependent data.

Giving Power to the Extension

Once we can generically identify the owner of the lease, we can pass on some decision support to the dependent subsystem. In particular, upon deletion or update the dependent subsystem can be notified and can analyze exactly what the implications would be.

We can remove any need for lazy responses to changes of dependency data by involving the dependent subsystem in the update processing: it validates the update and then handles it.

In our example, a change to equipment type would be validated by the inspection application. The type change may be unacceptable, in which case the user should be informed that the change is not possible or that the update would disable an inspection schedule. Or, in other cases, the user would be told that it is OK to proceed, but that some settings of dependent inspection data would be altered automatically.

Similarly, a deletion of an equipment record may be rejected, or the user may be told that all related inspection data would also be deleted as a consequence.

In any case, inspection data would remain consistent and no lazy checks would be required anymore.
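The validate-then-handle flow for a deletion could be sketched as follows. Everything here is an assumption for illustration: the DependentCallback interface, the Verdict type, the registry keyed by domain type identifier, and the "mandatory" flag on schedules. The step of asking the user for confirmation between validation and handling is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple

# (dependent type, dependent key, dependency type, dependency key)
LeaseRow = Tuple[str, str, str, str]


@dataclass
class Verdict:
    allowed: bool
    message: str


class DependentCallback(Protocol):
    """Extension point implemented by the subsystem owning dependent data."""

    def validate_delete(self, dependent_key: str, dependency_key: str) -> Verdict: ...
    def on_delete(self, dependent_key: str, dependency_key: str) -> None: ...


class ScheduleCallback:
    """Hypothetical callback of the inspection extension for schedules."""

    def __init__(self, schedules: Dict[str, dict]) -> None:
        self.schedules = schedules

    def validate_delete(self, dependent_key: str, dependency_key: str) -> Verdict:
        if self.schedules[dependent_key]["mandatory"]:
            return Verdict(False, f"schedule {dependent_key} is mandatory")
        return Verdict(True, f"schedule {dependent_key} will be deleted as well")

    def on_delete(self, dependent_key: str, dependency_key: str) -> None:
        del self.schedules[dependent_key]


def delete_dependency(
    registry: Dict[str, DependentCallback],
    leases: List[LeaseRow],
    dependency_type: str,
    dependency_key: str,
) -> Tuple[bool, List[str]]:
    """Phase 1: let every lease owner validate. Phase 2: let them handle."""
    affected = [l for l in leases if l[2:] == (dependency_type, dependency_key)]
    verdicts = [registry[l[0]].validate_delete(l[1], dependency_key) for l in affected]
    messages = [v.message for v in verdicts]
    if not all(v.allowed for v in verdicts):
        return False, messages  # nothing has been changed yet
    for lease in affected:
        registry[lease[0]].on_delete(lease[1], dependency_key)
        leases.remove(lease)
    return True, messages
```

Separating validation from handling means that a single objection leaves all data untouched, and the collected messages can be shown to the user in either outcome. An update of the dependency record could follow the same pattern with a corresponding pair of callback methods.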

There is still much to be defined for the specific software system of course. Enjoy designing!