This is a follow-up to the article Notes on Working with Transactions.
While there are constraints to keep in mind when working with transactional resources (the main point of that article), one thing keeps matters in shape: if things go wrong, simply roll back! This is the all-or-nothing quality of atomic state change in transactional processing: either the whole state change is applied or none of it.
This post is about handling cases where this assumption cannot be made.
Naturally this occurs when working with an inherently non-transactional resource like the average file system or remote web service.
Another prominent case results from breaking a long-running state change, even one implemented over a transactional database, into many small transactions. Although we are technically working with a transactional resource, other constraints such as long execution time force us to implement an overall non-atomic state change.
Unfortunately, there is no single generic approach that fits all cases. There are, however, ways of reducing complexity into workable pieces.
In order to get there, let’s work out some basic observations:
It is All About Handling Failure
Considering the introduction, this may sound obvious. However, what do we do if things go wrong and the system leaves us with a partial state change?
For an automation script that is run once in a while this may not be a crucial question. For a business process running millions of times, failure is a normal and repeating aspect of execution that needs to be taken into account.
The crux is to make sure the system is never in a state that prevents either of the following two actions:
Repetition: If a previous attempt at changing the system state failed due to some external problem (unavailability of the file system, power outage), the attempt at state change must be repeatable. That is, the system or the user needs to understand that the attempted state change failed and how to start over.
For example: If the state change implies moving a file, a repetition would check if the file was moved and only try again if not.
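A minimal sketch of such a repeatable move. The function name `move_once` and the return values are illustrative, and the sketch assumes source and destination are on the same file system, so that `os.replace` can swap the file in a single step:

```python
import os


def move_once(src: str, dst: str) -> str:
    """Move src to dst unless an earlier attempt already did so.

    Assumption: src and dst are on the same file system, so os.replace
    either happens completely or not at all.
    """
    if not os.path.exists(src):
        if os.path.exists(dst):
            # A previous attempt completed the move: nothing left to do.
            return "already_moved"
        # Neither file exists: this is outside what we consider consistent.
        raise FileNotFoundError(f"neither {src} nor {dst} exists")
    # Source is still there: (re)try the move.
    os.replace(src, dst)
    return "moved"
```

Calling `move_once` a second time after a successful run returns `"already_moved"` instead of failing, which is what makes the step safe to repeat.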
Compensation: If it is clear that a state change will not be completed, or if that is not desirable, it must be possible for the system or the user to understand the impact of a partial change and possibly how to undo it.
For example: If the state change marked some database entries as deleted by setting a deletion flag, identify deletions by transaction id and unset the delete flag.
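As a sketch of this kind of compensation, the following uses an in-memory SQLite database. The table `entries`, the column `deleted_by_tx`, and the function names are hypothetical and only serve the illustration; the idea is that every soft deletion records the id of the process run that caused it, so that exactly those deletions can be reverted later:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a tiny example table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries ("
        " id INTEGER PRIMARY KEY,"
        " deleted INTEGER NOT NULL DEFAULT 0,"
        " deleted_by_tx TEXT)"
    )
    conn.executemany("INSERT INTO entries (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.commit()
    return conn


def mark_deleted(conn: sqlite3.Connection, tx_id: str, ids: list[int]) -> None:
    """Set the deletion flag and remember which run set it."""
    conn.executemany(
        "UPDATE entries SET deleted = 1, deleted_by_tx = ? WHERE id = ?",
        [(tx_id, entry_id) for entry_id in ids],
    )
    conn.commit()


def compensate(conn: sqlite3.Connection, tx_id: str) -> int:
    """Unset the deletion flag for one run; returns the number of rows restored."""
    cur = conn.execute(
        "UPDATE entries SET deleted = 0, deleted_by_tx = NULL"
        " WHERE deleted = 1 AND deleted_by_tx = ?",
        (tx_id,),
    )
    conn.commit()
    return cur.rowcount
```

Because the compensation only touches rows tagged with the given id, it leaves deletions made by other runs alone and can itself be repeated without harm.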
For both actions there is an underlying requirement that is even more essential:
At any time during a state change the system is always within its consistency model
Technically, this means that the scope of what has to be considered consistent, and hence what is an acceptable precondition to a state change, has just become considerably broader.
Implement a State Chart
In reality, however, processes quickly get complicated, and ensuring repeatability and compensability becomes a non-trivial exercise.
Consider the following still simple example: Suppose we need to
- Pick up a file F from a remote file system
- Send its content to some remote REST service – at most once. Ask for help if failing.
- Move it to some folder depending on whether processing completed successfully or not.
Sounds easy enough. A simple flow chart could render this process like this:
However, that tells us very little about how to handle failures, or where to pick up work if a previous attempt at running the process failed. For that, it is more suitable to create a state chart. The natural benefit of the state engine model is that it tells us right away where work may be interrupted and continued, hence allowing for repeated execution. A trivial state chart would complete work in one go. But as we want to send the file content no more than once, we need to safeguard against duplicate attempts, and as we want to avoid getting tricked into failed operations by broken (remote) file system access, we add some extra states before the file is moved:
Given the state chart, we can now concentrate on implementing robust and repeatable state transitions that only need to worry about simple preconditions. For example, in the processing state we would check only for a previous attempt. In the errored or sent state we only need to check whether the file has already been moved.
Let us consider how an implementation based on the state chart above would behave under failure:
| Failure | Behavior |
|---|---|
| File access failed during read of file | Stay in processing. Pause and retry. |
| Notice a second sending attempt | Give up, as we do not know whether the sending actually completed before but we failed to notice. Case to resolve manually. |
| File move failed | Must have completed sending attempts. Stay in sent or errored respectively, pause and retry. |
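The behavior described above can be sketched as a single transition function. This is an illustration under several assumptions, not the original design: only the states processing, sent, and errored are named in the text, so the final `DONE` state, the `Job` record, and the callbacks `read_file`, `send`, `move_file`, and `save` are hypothetical. The sketch also assumes that a detected second sending attempt ends in errored, so that the file lands in the failure folder for manual resolution, and that `move_file` itself checks whether the file has already been moved.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    PROCESSING = "processing"
    SENT = "sent"
    ERRORED = "errored"
    DONE = "done"  # assumed final state, not named in the text


@dataclass
class Job:
    state: State = State.PROCESSING
    send_attempted: bool = False  # persisted marker guarding "at most once"


def step(job: Job, read_file, send, move_file, save) -> None:
    """Run one state transition. Callers pause and call again until DONE."""
    if job.state is State.PROCESSING:
        if job.send_attempted:
            # A previous run started sending and we never learned the outcome.
            # Do not send again; hand the case over for manual resolution.
            job.state = State.ERRORED
            save(job)
            return
        try:
            content = read_file()
        except OSError:
            return  # stay in processing; pause and retry
        job.send_attempted = True
        save(job)  # persist the marker BEFORE sending
        try:
            send(content)
        except Exception:
            job.state = State.ERRORED
        else:
            job.state = State.SENT
        save(job)
    elif job.state in (State.SENT, State.ERRORED):
        try:
            move_file(job.state)  # assumed to be repeatable
        except OSError:
            return  # stay in sent or errored; pause and retry
        job.state = State.DONE
        save(job)
```

The important design choice is that the attempt marker is saved before the call to `send`: if the process dies in between, the next run finds the marker while still in processing and refuses to send a second time.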
Obviously, in order to implement that state chart, you need some state persistence model. Describing that and how to provide feedback to users is out of scope of this article. Depending on your needs and scenarios, a simple database table to manage a stateful process may be sufficient. Other cases may benefit from implementation tools such as Spring Batch. Others may demand a complete Business Process Management suite, but then you would most likely not be reading this post.
Going by this artificial example, the point of this post is that in the absence of transactional resources, non-trivial processes can be implemented reliably and robustly, but they require significantly more care and modeling attention. Very much like real world processes involving people and physical resources.