Transactional state handling is nothing we think much about anymore. It’s there. So much so that, I suspect, many have never thought much about it in the first place.
This post is about some, say, transaction potholes that I ran into at times – fooled by my own misleading intuition. And then… it is a little database transactions 101 that you may have forgotten again.
When I left university I had not studied database theory. Many Math and CS students did not. And yet I spent most of my professional career working on software solutions that have a relational database system (RDBMS) such as Oracle, Postgres, DB2, MS SQL Server or even MySQL, if not at their heart, then at least as their spine.
I have also worked on solutions that relied on self-implemented file system storage, when an RDBMS could have done the job. I consider that a delusional phase and a waste of time.
There are of course limits, and there are problems where an RDBMS is not suitable – or available. Where it is, however, it is the one solution because:
- There is a rather well-defined and well-standardized storage interface (SQL, Drivers) to store, retrieve, and query your data.
- The relational algebra is logically sound, mostly implemented, well-documented, and really flexible, and actually proven to be so.
- Any popular RDBMS provides an operational environment, can be extended with professional support, has backup & recovery methods, etc.
- It is transactional!
Let’s talk about the last bullet point. The key feature of transactional behavior is its all-or-nothing promise: Either all your changes will be applied or none. You will have heard about that one.
This is so important not because it is convenient and saves you some work of change compensation. It is important because normally you have absolutely no idea which changes your code, the code you called, or the code that was called by the code you called actually made! That’s big.
But what is in that “all or nothing” scope? How do you demarcate transactions?
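To make the all-or-nothing promise tangible, here is a minimal sketch using Python's built-in sqlite3 module. The `account` table, the names and the `transfer` function are made up for illustration. The second update fails on a CHECK constraint, and the first update, which had already been applied, disappears with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        # If src cannot cover the amount, the CHECK constraint rejects this
        # update, and the credit to dst above is rolled back with it.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


try:
    transfer(conn, "alice", "bob", 500)  # more than alice has
except sqlite3.IntegrityError:
    pass
after_failed = balances(conn)  # unchanged: {'alice': 100, 'bob': 0}

transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)  # {'alice': 70, 'bob': 30}
```

Note that `transfer` contains no compensation logic at all. The caller does not need to know which rows were touched before the failure.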
That depends a bit on what you are implementing. The principle of least surprise is your friend though. There are some simple cases:
- User interactions are always a good transaction scope
- Processing an event or a service call with a response is a good transaction scope
Or, to cut things short, a good scope is every control flow…
- … that represents a complete state change in your system’s logic and
- … that does not take long.
Why should it not take long?
There is hardly a system that executes a single control flow. There will be many concurrent control flows – otherwise nobody will want to use the system, right? And they share the same database system. The real problem of a long-running transaction is not so much that it holds on to a database connection for a long time, which may or may not be a scarce resource, but that it may prevent other control flows from proceeding by holding on to (pessimistic) locks in your database:
In order to prevent you from creating nonsense updates, an RDBMS can and does provide exclusive access to parts of the data stored – e.g. in the form of a row-level lock when updating a record. There are extremely good reasons for that, which you can read up on elsewhere. Point is: It happens.
And as you have effectively no idea what updates are caused by your code – as we learned above – you can be sure that, in the presence of a long-running transaction scheme, your system will run into blocked concurrency situations, will not be responsive, and will not scale well. You do not want that.
Coming back to transaction demarcation – here is a classic. In Java EE, a long time ago, when it was still called J2EE, there was really no declarative out-of-the-box transaction demarcation for Web applications. Following the logic of what is a good transaction demarcation above, the normal case is however that a single Web application request, at least when representing a user interaction, is a prime candidate for a transaction scope.
In an attempt to map what was thought to be useful for a distributed application, most likely because Enterprise Java Beans (EJB) had been re-purposed from remote objects to local application components (from hell), the proposed model du jour was to capture every Web application user transaction into a method invocation of a so-called Session Bean (EJB) – because those were by default transactional. See e.g. Core J2EE Patterns – Session Facade.
Now imagine due to some oversight, you called two of those for a single interaction: Two transactions. If the first invocation completed, a state change would be committed that a failure of the second invocation would not roll back and “boom!” you would have an incomplete or even inconsistent state change. Stupid.
To avoid that, it is best to move transaction demarcation as high as possible, that is, as near to the entry point of the control flow as your transaction management allows.
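Here is a small sketch of the difference, again with sqlite3 and entirely hypothetical names (`reserve_stock`, `record_order`, and the two handler functions). The worker functions do not commit anything themselves. Only the handler, which stands in for the entry point of a request, decides where the transaction begins and ends.

```python
import sqlite3


def fresh_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
    conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
    conn.execute("INSERT INTO stock VALUES ('widget', 10)")
    conn.commit()
    return conn


def reserve_stock(conn, item, qty):
    conn.execute("UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item))


def record_order(conn, item, qty):
    if qty > 5:
        raise ValueError("order too large")  # simulated failure
    conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))


def handle_with_two_transactions(conn, item, qty):
    with conn:  # first transaction: committed on its own
        reserve_stock(conn, item, qty)
    with conn:  # second transaction: its failure cannot undo the first
        record_order(conn, item, qty)


def handle_with_one_transaction(conn, item, qty):
    with conn:  # one scope around the whole interaction
        reserve_stock(conn, item, qty)
        record_order(conn, item, qty)


def stock(conn):
    return conn.execute("SELECT qty FROM stock WHERE item = 'widget'").fetchone()[0]


db = fresh_db()
try:
    handle_with_two_transactions(db, "widget", 8)
except ValueError:
    pass
stock_after_two = stock(db)  # 2: stock is reserved, but no order exists

db = fresh_db()
try:
    handle_with_one_transaction(db, "widget", 8)
except ValueError:
    pass
stock_after_one = stock(db)  # 10: the reservation was rolled back
```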
Sometimes however, you need more than one transaction within one control flow, even though there is no timing constraint. Sometimes you need nested transactions. That is, even before the current transaction terminates, the control flow starts another transaction.
This is needed if some deeper layer, traversed by the current control flow, needs to commit a state change regardless of the outcome of what is happening after. For example, a log needs to be written that an attempt of an interaction was performed.
Seeing this in code, a nested transaction typically radiates an aura of splendid isolation. But it is deceiving and dangerous. The same problem as for long-running transactions applies here: Your nested transaction may need to wait for a lock. In this case: It may need to wait for a lock that was acquired by the very same control flow – a deadlock!
The need is rare – but the concept is so tempting that it is definitely used much more frequently than justified.
So what about long running state changes?
Unfortunately some applications need to implement state changes that take long. Some mass update of database objects, some long running stateful interaction. This is a rich subject in its own right and it cannot be excluded that there will be some posts on that in this blog.
- C. J. Date, An Introduction to Database Systems, https://openlibrary.org/books/OL9422130M/An_Introduction_to_Database_Systems
- Core J2EE Patterns – Session Facade, http://www.oracle.com/technetwork/java/sessionfacade-141285.html