Scrum Should Indeed Be Run Like Multiple Parallel Waterfall Projects

I do not normally write about processes and methodologies. It is not really my preferred subject.

Recently, however, I read an article (see below) that restated the claim that agile is not like doing small waterfalls. I think that claim is misleading.

Over time, I have been working on all kinds of projects, ranging from proof of concept work to large systems for 24/7 production, from ongoing maintenance to custom extensions to existing solutions.

Each of those seemed to respond best to a different process approach.

For simple projects, it can be best to only have a rough outline or simply start from an existing example and just get going.

For maintenance projects, a Kanban approach, essentially a work stream comprised of work items of limited conceptual impact, can be best.

It gets more interesting when considering projects that are clearly beyond a few days of work and do have a perfectly clear objective. For example, consider a specialized front end for some user group over an existing backend service.

As a paying customer, you would want to define (and understand) what should be the very specific result of the development effort as well as how much that will cost you. Therefore, as a customer, you naturally want development to follow a Waterfall Model:

It starts with a (joint) requirement analysis (the “why”) and a specification and design phase (the “how”). Let’s just call this the Definition Phase.

After Definition a time plan is made (implying costs) and the actual implementation can commence.

Once implementation completes, the development result is verified and put into use – ideally on time and on budget. Or, as a simplified flow: Definition → Planning → Implementation → Verification and use.

As we all know, this approach does not work all too well for all projects.

Why is that?

Think of a project as a set of design decisions and work packages that have some interdependence or, more simply, as a sequence of N work packages, where a single work package is always assumed to be doable by your average developer in one day. So, effectively, there is some prophecy, N steps deep, that after step X all prerequisites for step X+1 are fulfilled and that after step N the specification is implemented.

For very simple tasks, or tasks that have been done many times, the average probability of failure (that is, the probability that the invariant above does not hold) can be sufficiently small that simple measures like adding extra time buffers will make sure things still work out overall.

In software projects, in particular those that are not highly repetitive (think non-maintenance development projects), we typically find lots of non-repetitive tasks mixed with use of new technologies and designs that are implemented for the first time. In a situation like that, the probability of any sort of accurate project prediction from the start decreases rapidly with the “depth” of planning.
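To make the effect of planning depth concrete, here is a minimal sketch. It rests on a simplifying assumption that is not part of the argument above: every work package independently goes as planned with the same probability, so a plan of depth n holds with that probability raised to the power n. The class and method names are made up for illustration.

```java
// Illustrative only: assumes independent work packages with an equal chance of going as planned.
final class PlanDepth {

    /** Probability that all n consecutive work packages go as planned. */
    static double planHolds(double perStepSuccess, int n) {
        return Math.pow(perStepSuccess, n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 20, 60}) {
            System.out.printf("depth %d: %.3f%n", n, planHolds(0.95, n));
        }
    }
}
```

Under these assumptions, even a 95% chance per work package leaves a plan that is 20 packages deep with only about a 36% chance of holding as a whole.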

There are ways to counter this risk, most notably by continuously validating progress and adapting planning in short iterations, for example in the form of a Scrum-managed process.

While that sounds as if we are discussing opposing, alternative process approaches, each having a sweet spot at some different point on the scale of project complexity, that is not so.

Execute Parallel Waterfalls

In fact: The gist of this post is that an agile process like Scrum is best run when considering it a parallel execution of multiple smaller waterfall projects.

Here is why: Many projects use Scrum as an excuse not to plan and design ahead of time, but instead only focus on short-term feature goals – leaving design decisions to an implementation detail of a small increment. That is not only a great source of frustration, as it increases the risk that even small increments end up brutally mis-estimated; it also leads to superficially designed architectures that – at best – require frequent and costly redesign.

Instead, we should look for a combination of the two that, on the one hand, makes sure we design aspects of the overall project upfront to an extent that we feel certain they can be done and estimated reliably and yet, on the other hand, preserves the flexibility to adapt to changed requirements when needed.

As a result, we run multiple parallel waterfall projects (let’s call them part-projects) that span one to several sprints, but use resources smartly when we need to adapt or, for example, work on bugs introduced by previous work.

Visualized simply as parallel execution lanes processing several planned-ahead part-projects: at sprint N we work on some subset of tasks (with B denoting a bug ticket), while at sprint N+1 we proceed and take in the next tasks.

The sprint cycle forces us to re-assess frequently and enables us to make predictions on work throughput and hence helps in planning of resource assignments. Our actual design and estimation process for part-projects is not part of sprint planning but serves as crucial input to sprint planning.

References


Microservices Once More

Imagine you were running a small company. For one line of products certain skills and some amount of working hours per unit are required. The people you work with have worked on many products previously and adopted a wide range of skills so that, as customer demand changes, responding to a change of workload is no problem.

Now imagine you split teams by skill and turned each team into an independent company of its own, to which you outsource all work requiring that skill.


Would you imagine that to work better than your previous setup? Do you think that would be more efficient while equally responsive to changing needs? Would it make good use of resources?

I certainly would not. But that is exactly what the Microservices Approach proposes:

Microservices are a software development technique—a variant of the service-oriented architecture (SOA) architectural style that structures an application as a collection of loosely coupled services. In a microservices architecture, services are fine-grained and the protocols are lightweight. The benefit of decomposing an application into different smaller services is that it improves modularity. This makes the application easier to understand, develop, test, and become more resilient to architecture erosion.

(https://en.wikipedia.org/wiki/Microservices)

Apart from the fact that this definition is overly packed with feel-good terms, it also gets causality upside-down.

Let’s read it in reverse: Yes, good modularity helps preserve architectural integrity and simplifies understanding, developing, and maintaining a solution. But while good modularization helps identify useful service interfaces, having service interfaces as such does not imply good or easily achieved modularization.

In fact, definition of modules or software components is best done by some subject of responsibility for some aspect of the solution, or – and that is now really important for this discussion – for non-functional requirements in the first place. The ability to identify service interfaces in this mix is mostly a result of the modularization at hand rather than the other way around.

Next, the fact that you identified service interfaces in no way means that it is even remotely useful to distribute them in a loosely coupled way (meaning via remote invocation interfaces). In particular, the more fine-grained services are defined, the harder and less meaningful it becomes to distribute them. Imagine services that rely on services that rely on other services.

Any remote interface introduced comes at a tremendous cost in complexity: you lose transactionality and simple refactoring, and you introduce remote invocation performance and security problems, complex deployments, and complex management and monitoring operations.

The thing is: As for outsourcing of business functions, there can be very good reasons to distribute application functions. Those are however never driven by discovery of some API that qualifies as service boundary but almost exclusively by non-functional requirements on components of the solution. For example:

You want to separate some expensive asynchronous load from the user interfacing parts of your application to avoid harming the user experience.

Your database system will be separated from your application server as it requires a single point of data ownership.

Some function requires specialized hardware or has license and security restrictions that prevent it from being embedded into an application directly.

Some parts of your application have much stricter robustness constraints and should be isolated from application failures. And very prominently:

Your system is integrating with some legacy system that is technologically different or is not to be touched at all.

To Summarize…

  1. Do not use service interfaces as a driver of modularization – take a look from higher up to identify responsibilities.
  2. Responsibilities drive good modularity not technological artifacts.
  3. Avoid the complexity of distributed deployments unless for clear non-functional requirements.

 

Simply Fooled

It is always good to start out with a diagram. Here’s one:

[Figure: simply-fooled]

It tries to say:

Tools that invest in “getting you started quickly” seem to always fail to stay productive or particularly useful in the mid and long term.

The typical example for this is visual programming tools or process designers of any kind. I have yet to see one that is useful beyond the most simplistic demos.

I believe however the same problem is inherent with “scaffolding generation frameworks” like Ruby on Rails, Grails, and the like – possibly anything that claims “convention over configuration”.

While these approaches deliver some result quickly, eventually you will need to understand the framework, which is a huge pile of stuff that, once you lift the lid, can be totally overwhelming.

If you start with far fewer, well-understood ingredients, you will move more slowly but at least on solid ground.

Not that I ever had an urge to use any of this kind of toolset and platform – I rather build from more essential building blocks that I understand and can keep under control – convinced that that leads to more robust, understandable, and maintainable solutions.

There is one example, however, that I use a lot and have a bit of a love-hate relationship with: the Java Persistence API (JPA).

And yet…

We use JPA a lot in most projects. It is rather convenient for simple tasks like storing records in the database and retrieving them for transactional data processing. The session cache is handy most of the time. Updating entities simply by setting values is nice. But you had better not have complex queries, large transactions, or non-trivial relations. And better not ask for too much flexibility in combining modular domain models (a.k.a. Persistence Units in JPA speak). So much has been written on JPA that there is no need for me to repeat it (The Vietnam of Computer Science is a must-read, however).

Anyway, the point is: When you reach the limits of JPA, you better have your persistence tier already abstracted in a way that it does not hurt too much replacing the actual data access with other techniques where needed, which comes at a price.
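As a minimal sketch of what "already abstracted" could look like: the application codes against a small repository interface that exposes no JPA types, so that a single implementation can later be swapped for another data access technique. All names here (Customer, CustomerRepository, InMemoryCustomerRepository) are hypothetical, and the in-memory class merely stands in for a JPA-backed or JDBC-backed implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain type, for illustration only.
final class Customer {
    final String id;
    final String name;

    Customer(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

/** The persistence API the application codes against; no JPA types leak through it. */
interface CustomerRepository {
    Optional<Customer> findById(String id);

    void save(Customer customer);
}

/** Stand-in implementation; a JPA-backed or JDBC-backed class would implement the same interface. */
final class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<String, Customer> store = new HashMap<>();

    @Override
    public Optional<Customer> findById(String id) {
        return Optional.ofNullable(store.get(id));
    }

    @Override
    public void save(Customer customer) {
        store.put(customer.id, customer);
    }
}
```

The price mentioned above shows up here as well: conveniences such as updating an entity simply by setting values are no longer visible to callers, who have to call save explicitly.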

Let’s put it this way: if I had not invested in JPA previously, it would probably feel out of place and unwieldy now. Given the style of clean persistence API that we tend to implement on the application level these days, I would rather use some more accessible “bean mapper” tool.

Meaning…

Here is my little theory: Putting your main focus on solving 70% of the problem quickly compromises so hard on the completeness of the approach that the remaining 30% of the problem gets overly expensive for its users. Worse, the “left-overs” may require breaking your architecture, bringing in unexpected third-party stuff, etc.

In addition, if you have made a choice for the early win, the price to pay later on may be due at the worst point in time, just when your architecture has to stand the test of growth.

Hence:

  • Don’t be lured by promises of simplifications of what is inherently not simple
  • Always try to look under the hood. It is not necessary to understand everything down to the bare metal – but knowing how much there is, helps getting an idea
  • Get into it, avoid what cannot be mastered in time – it will be mastered later!

Wish you a great holiday season!

Updates in Modular Data

In a modular system, data also tends to be modularized. In the post Modularization And Data Relationships we looked at cross-subsystem and cross-domain data dependencies and how to make those available for efficient querying.

In this second part we discuss aspects of update, in particular deletion, of data that other data may depend on. Remember, we are considering a modular domain in which some central piece of data (we will call that dependency data) is referenced by extension modules and domain data (we call that dependent data) that were not specifically considered in the conception of the shared data.

In our example we chose a domain model of some Equipment management core system (the dependency data) that is referenced by extension modules and data definitions, in our case Inspections and Schedules (the dependent data), of a health care extension to that core system:

When equipment data changes or gets deleted, dependent inspection data may easily become invalid. For example, a change of equipment type may mean that inspection types or schedules need to be updated as well. And of course, inspections for deleted equipment will be obsolete altogether.

There is also the technical aspect of foreign key relationships between the inspection database table and the equipment database table. If the foreign key has become obsolete and unresolvable, all equipment information that was previously resolved by joining database tables via the foreign key relationship would be gone and cannot be presented to users anymore.

Simply put, there are two choices of how to handle deletions:

  • Do not actually delete records, but only mark them as unavailable.
    In that case, for dependent data the original data set is still available and follow up actions may be offered to users. But the domain model becomes more complicated (e.g. w.r.t. uniqueness constraints).
  • Delete and simply cope with it.
    That means: The inspection application needs to be completely coded towards the case that dependency data may be gone. And whatever is needed to supply users with necessary information to plan follow-up action needs to be stored with the dependent data.

Similarly for updates:

  • When data is updated to an extent that dependent data has become invalid, this situation needs to be discovered when the dependent data is visited again, so that whatever corrections are required can be planned and performed.

While these approaches look theoretically sound, there is much to be improved. Firstly, developing an application to cope with any kind of inconsistency implied by changes of dependency data will be rather complex in the general case.

Secondly, from a user perspective, it will typically be highly undesirable to even be allowed to perform updates that will lead to situations that require follow-up actions and repairs without being told beforehand.

Hence:

How can we generically handle updates of dependency data not only consistently but also in a user-friendly way?

Here is at least one approach:

A Data Lease System.

With that approach we use a shared persistent data structure that explicitly expresses the relationships between dependent and dependency data and that is known by all subsystems.

In the simplest case, a lease held by a dependent data record on a dependency data record holds:

  1. The dependent data record key and a unique domain type identifier for which that key is valid.
    No other present or future domain use of that identifier should be possible.
  2. Likewise, the dependency data record key with a corresponding domain type identifier.

( <dependent type>, <dependent key>, <dependency type>, <dependency key>)

In our example this could be something like this:

( “inspection.schedule”, “11ed12f2-8c94-41a7-9143-8d4ff6070f”, “equipment.equipment”, “c5f92006-dc40-41aa-97d0-3ae709b4aca6”)

In other words: From a database perspective, the lease is simply a shared join table structure that is annotated with additional meta-information on “what” is being talked about.
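In code, one lease row could be represented by a small immutable value type along the following lines. This is only a sketch: the class name DataLease and its field names are assumptions, and the dependent key used in the usage note is a made-up placeholder.

```java
import java.util.Objects;

/** One lease row: (dependent type, dependent key, dependency type, dependency key). */
final class DataLease {
    final String dependentType;
    final String dependentKey;
    final String dependencyType;
    final String dependencyKey;

    DataLease(String dependentType, String dependentKey,
              String dependencyType, String dependencyKey) {
        this.dependentType = Objects.requireNonNull(dependentType);
        this.dependentKey = Objects.requireNonNull(dependentKey);
        this.dependencyType = Objects.requireNonNull(dependencyType);
        this.dependencyKey = Objects.requireNonNull(dependencyKey);
    }

    // Two leases are the same if all four parts match, like a composite key of a join table.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DataLease)) {
            return false;
        }
        DataLease other = (DataLease) o;
        return dependentType.equals(other.dependentType)
                && dependentKey.equals(other.dependentKey)
                && dependencyType.equals(other.dependencyType)
                && dependencyKey.equals(other.dependencyKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(dependentType, dependentKey, dependencyType, dependencyKey);
    }

    @Override
    public String toString() {
        return "(" + dependentType + ", " + dependentKey + ", "
                + dependencyType + ", " + dependencyKey + ")";
    }
}
```

For example, `new DataLease("inspection.schedule", "schedule-1", "equipment.equipment", "c5f92006-dc40-41aa-97d0-3ae709b4aca6")` would describe a schedule that depends on one piece of equipment.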

Updating lease information is delegated to a shared service, which provides additional handling for one-to-many, many-to-one, or many-to-many relationships as well as cleanup and generic check methods. From a read-access perspective, however, the principles of Modularization And Data Relationships can be applied in full, and the table structure should indeed be used as a join structure establishing the actual relationship.

That is, in the example, the health care to equipment relationship is now expressed via the data lease.

Now, if we have this system in place, numerous improvements are readily available.

Giving Users a Choice

First of all, the equipment application can now trivially determine whether there are dependent data sets, how many, and of what kind (via the additional domain type identifier).

Based on that information, the equipment application could offer a choice to users before applying an update. For example, upon a request for deletion, the application may inform the user that the piece of equipment is still referenced by dependent data and refuse the deletion.
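The check described here can be sketched as follows. The in-memory list stands in for the shared lease table, and the class LeaseRegistry with its methods is hypothetical, as is the type identifier "inspection.inspection" used in the usage note.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for the shared lease table (illustration only). */
final class LeaseRegistry {
    // Each entry is {dependentType, dependentKey, dependencyType, dependencyKey}.
    private final List<String[]> leases = new ArrayList<>();

    void add(String dependentType, String dependentKey,
             String dependencyType, String dependencyKey) {
        leases.add(new String[] {dependentType, dependentKey, dependencyType, dependencyKey});
    }

    /** Number of dependent records, grouped by dependent type, for one dependency record. */
    Map<String, Integer> dependentsOf(String dependencyType, String dependencyKey) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] lease : leases) {
            if (lease[2].equals(dependencyType) && lease[3].equals(dependencyKey)) {
                counts.merge(lease[0], 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Simplest possible policy: refuse deletion while any lease exists. */
    boolean mayDelete(String dependencyType, String dependencyKey) {
        return dependentsOf(dependencyType, dependencyKey).isEmpty();
    }
}
```

With two "inspection.schedule" leases and one "inspection.inspection" lease on the same piece of equipment, dependentsOf reports both kinds with their counts, which is enough to tell the user what is still referencing the record before refusing the deletion.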

For updates this does however not remove the need for understanding the implications on dependent data.

The approach becomes most useful if we pair it up with an extension point approach, in which the domain type identification proposed above is used to look up implementations of callback interfaces provided by the extension modules of the subsystem owning the dependent data.

Giving Power to the Extension

Once we can generically identify the owner of the lease, we can pass on some decision support to the dependent subsystem. In particular: Upon deletion or update, the dependent subsystem may be made aware and analyze exactly what the implications would be.

We can remove all needs for lazy responses to changes of dependency data by involving the dependent subsystem in the update processing in terms of validating the update and eventually handling the update.

In our example, a change to equipment type would be validated by the inspection application. The type change may be unacceptable, in which case the user should be informed that the update is not possible or that it would disable an inspection schedule. Or, in some other cases, the user would be told that it is OK to proceed, but that some settings of dependent inspection data would be altered automatically.

Similarly, a deletion of an equipment record may be rejected, or the user be told that all related inspection data would also be deleted as a consequence.

In any case: Inspection data would be consistent and no lazy checks would be required anymore.
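A rough sketch of such an extension point, limited to the deletion case: extensions register a callback under their dependent type identifier, and the core application consults the callbacks for all types that hold a lease on the record. The interface LeaseHolder, the class DeletionCoordinator, and the decision to refuse when no callback is registered are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Callback an extension module registers for its dependent type (hypothetical). */
interface LeaseHolder {
    /** Returns null if the deletion is acceptable, otherwise a message for the user. */
    String objectionToDelete(String dependencyKey);
}

final class DeletionCoordinator {
    private final Map<String, LeaseHolder> holders = new HashMap<>();

    void register(String dependentType, LeaseHolder holder) {
        holders.put(dependentType, holder);
    }

    /**
     * Consults the owner of every dependent type that holds a lease on the record.
     * An empty result means the deletion may proceed.
     */
    List<String> objectionsToDelete(String dependencyKey, Collection<String> leasingTypes) {
        List<String> objections = new ArrayList<>();
        for (String type : leasingTypes) {
            LeaseHolder holder = holders.get(type);
            String objection = (holder == null)
                    ? "no handler registered for " + type // conservative choice of this sketch
                    : holder.objectionToDelete(dependencyKey);
            if (objection != null) {
                objections.add(objection);
            }
        }
        return objections;
    }
}
```

Refusing when a lease exists but its owner is not available is only one possible policy; it trades convenience for the guarantee that no dependent subsystem misses a change.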

There is still much to be defined for the specific software system of course. Enjoy designing!

Modularization And Data Relationships

A lot of posts in this blog are on structuring largish applications via one or another modularization approach. Size is not the only reason to modularize though. Another important reason to split into independent subsystems is optionality. For example, given a core product, industry-specific extensions may exist that rely on the core system’s APIs and data models but comprise enough code and data structures to justify an independent software life cycle. While code dependencies have been treated extensively in previous posts, we have not looked much at data relationships.

This post addresses data model dependencies between software subsystems of a modularized software system.

Data Model != Code

When we talk about data model dependencies between subsystems, we are talking about data types, in particular data types representing persistent data, that are exposed by one subsystem to be used by other subsystems.

Exposing data types, i.e. making knowledge of their definition available between subsystems, is typically done using an API contract – be it using a regular programming language, or some data description language such as XML schema.

Making data types available between subsystems is only one side of the story however. Eventually data described by the model will need to be exchanged and combined with other data. Typically describing data and providing access to data is part of a subsystem API and combining data from different subsystems is done by the caller:

In many cases, however, this is not good enough: If subsystems share the same database, losing the ability to query data across domain definitions of subsystems can be a real showstopper for modularization.

An Example

Consider the following hypothetical example: An application system’s core functionality is to manage equipment records, such as for computers, printers, some machinery. An extension to the core system provides functionality specific to the health care industry. In health care, we need to adhere to more regulation than the core system offers. For example we need to observe schedules for inspections by third-party inspectors.

The health care extension hence has its own database structures that refer to data in the core system. It uses database foreign keys for that. That is, in its database tables we find fields that identify data records in the core system’s database tables.

Now, for a single inspection schedule that refers to one or more pieces of equipment, looking up the individual equipment via the core system poses no problem. Answering a question such as "what are the top ten pieces of equipment of some given health care inspection schedule that have the most service incidents?" is a different case. Providing efficient, sortable access to any such combined data to end users via independent data retrieval is hard, if not pointless, in the general case. This is what joins in the relational database world are for, after all.
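To illustrate what combining data on the caller side means for the top-ten question, here is a small sketch. The inputs stand for results of two independent lookups (equipment keys from the extension, incident counts from the core system); the class and parameter names are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class InMemoryTopTen {

    /**
     * Caller-side "join": every equipment key of the schedule and its incident count
     * must be loaded into memory before sorting and truncating can even start.
     */
    static List<String> topByIncidents(List<String> equipmentKeysOfSchedule,
                                       Map<String, Integer> incidentsByEquipmentKey,
                                       int limit) {
        List<String> keys = new ArrayList<>(equipmentKeysOfSchedule);
        keys.sort(Comparator
                .comparingInt((String key) -> incidentsByEquipmentKey.getOrDefault(key, 0))
                .reversed());
        return new ArrayList<>(keys.subList(0, Math.min(limit, keys.size())));
    }
}
```

A database computes the same result with a join, an ordering, and a row limit, without shipping all candidate rows to the application first.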

Joining Data between subsystems

Given the construction above, we want to extend the core system’s API to not only support single data lookups but also a query abstraction for more clever data retrieval, in particular so that combined data queries, i.e. SQL joins, would be computed on the database rather than in memory.

Unfortunately, a query interface that would include data defined outside of the core application’s data model can typically not be expressed easily.

We have a few choices on providing meaningful access to joinable data between subsystems however.

Exposing Data

For one, the core application may simply document database table or database view structures to be used by other subsystems for querying its data.

This way, the health care extension would extend its own database model by those portions of the core system’s database model that are a) of relevance to the extension and b) part of the new API contract that includes these data access definitions. This would only be used for read-only access, as the extension has no natural way of knowing what other data update and validation logic may be implemented in the core system.

Something more integrated is possible using the Java Persistence API, and should be similarly (if not better) available in Microsoft’s .NET framework via the LINQ and LINQ to SQL features.

Views in Modular JPA

When using the Java Persistence API (JPA), an underlying relational data model is mapped onto a set of Java class definitions and some connecting relationships. Together, the mapping information and some additional provider-specific and database-specific configuration are called a persistence unit. At runtime all operations of a JPA Entity Manager, such as performing queries and updates, run within the scope of a persistence unit.

In our example, we would have a core persistence unit for equipment management and one persistence unit for the health care extension. As a principle, both would be private matters of the respective subsystem. We would not want the health care extension to make use of non-public persistence definitions of the core model as that would break any hope for a stable contract between core and extension.

As such, there is no simple sharing mechanism for persistence units that would allow exposing a subset of the core persistence unit to other subsystems as part of an API contract.

A close equivalent of a simple database view that is still within the JPA model, however, are read-only JPA classes that can be included into an extension’s persistence unit definition.

That is, we would have the same data types used in different persistence units. For the extending subsystem, those types appear as a natural extension to its own data model and can hence be used in joining queries, while defined in and hence naturally fitting to the core’s domain model.

As a mix of Entity-Relationship and Class Diagram this would look like this:

Highlighting the scope of persistence units:

Now we are at a point where data access and sharing between subsystems is well-defined and as efficient as if it were not split into separate domain models.

It’s time to move on to the next level.

Data Consistency – or what if data gets deleted?

When a system is split into subsystems, responsibility for data management gets split as well. Let’s take a look at our example.

In the health care extension to the equipment management system, inspection schedules, stored in the extension’s domain model refer to equipment data stored in the core application’s domain model. Based on the ideas above, the health care extension can efficiently integrate the core data model into its own.

But then, what happens on updates or on deletions issued by the core application’s equipment management user interface? It would be simplistic to assume that there are no restrictions imposed by the health care extension. Here are some possible considerations:

  • Deletion of equipment should only be possible if some state has been cleared in the extension.
  • Updating equipment data might be subject to extended validation in the extension.
  • Can the extension subsystem be inactive and miss changes by the core application so that it would end up with logically inconsistent data?

We will look into these exciting topics in the next post: Updates in Modular Data

Java 9 Module System – Useful or Not?

Actually… rather not.

I am currently working on preparing z2-Environment version 2.6. While it is not used by a lot of teams, where it is used we have large, modular code bases.

This blog is packed with posts on all kinds of aspects of modularization of largish software solutions. Essentially it all boils down to isolation, encapsulation, and sharing to keep complexity under control by separating inner complexities from the “public model” of a solution, in order to foster long-term maintainability and ability to change and extend.

Modularization is a means of preserving structural sanity and comprehensibility over time.

That said, modularization is a concern of developing and evolving software solutions – not libraries.

The Java 9 module system is however exactly that: A means to express relationships between JAR libraries within JAR libraries. I wrote about it a while back (Java Modularity – Failing once more?). Looking at it again – I found nothing that makes it look more useful.

First of all, very few libraries out there will have useful module descriptors – unless they work together trivially anyway. Inconsistent Maven dependencies are bad enough, but can usually be worked around in your own assembly. A bad or missing module descriptor essentially requires you to change an existing library.

Even if all were good, clean, and consistent with our popular third-party libraries, what problem would those module-infos and the module system actually solve for us?

The only effective means to really hide implementation details to the extent of keeping definitions completely out of visibility, and to expose definitions in a very controlled way even if other versions of identically named definitions are present in the system, is still a class loader based modularization approach. Java 9 modularization does not preclude that. It does not add anything useful either, as far as I can tell.

Z2 v2.6 will not have explicit integration for the Java 9 module system for now – for lack of usefulness.

References

Some Notes On Working with Non-Transactional Resources

This is a follow up to the article Notes on Working with Transactions.

While there are constraints to keep in mind when working with transactional resources, the main point of the article, there is the one thing that keeps matters in shape: If things go wrong, simply roll back! This is the all-or-nothing quality of atomic state change in transactional processing: Either the whole state change is applied or none of it.

This post is about handling cases where this assumption cannot be made.

Naturally this occurs when working with an inherently non-transactional resource like the average file system or remote web service.

Another prominent case results from breaking a long running state change, even when implemented over a transactional database, into many small transactions. Even if we are technically working with a transactional resource, due to other constraints such as long execution time, we are forced to implement an overall non-atomic state change.

Unfortunately, there is no single generic approach that would fit all cases. There are, however, ways of breaking complexity down into workable pieces.

In order to get there, let’s work out some basic observations:

It is All About Handling Failure

Considering the introduction, this may sound obvious. However, what is it we do if things go wrong and the system leaves us with a partial state change?

For an automation script that is run once in a while this may not be a crucial question. For a business process running millions of times, failure is a normal and repeating aspect of execution that needs to be taken into account.

The crux is to make sure the system is never in a state that prevents either of the following two actions:

Repetition: If a previous attempt at changing the system state failed due to some external problem (unavailability of file system, power outage), the attempt at state change must be repeatable. That is, the system or the user needs to understand that the attempted state change failed and how to start over.

For example: If the state change implies moving a file, a repetition would check if the file was moved and only try again if not.
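A minimal sketch of such a repeatable move, using the standard java.nio.file API. The class and method names are made up, and treating "target exists, source gone" as proof of an earlier successful move is an assumption of this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RepeatableMove {

    /**
     * Moves source to target unless an earlier attempt already did so.
     * Returns true if this call performed the move, false if nothing was left to do.
     * If both files exist, or neither does, Files.move throws and the caller has to
     * look at the situation, since the state is not one this sketch can explain.
     */
    static boolean moveIfNotMoved(Path source, Path target) throws IOException {
        if (Files.exists(target) && !Files.exists(source)) {
            return false; // a previous attempt completed the move
        }
        Files.move(source, target);
        return true;
    }
}
```

Calling moveIfNotMoved a second time after a success is harmless, which is exactly what makes the surrounding state change repeatable.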

Compensation: If it is clear that a state change will not be completed, or if that is not desirable, it must be possible for the system or the user to understand the impact of a partial change and possibly how to undo it.

For example: If the state change marked some database entries as deleted by setting a deletion flag, identify deletions by transaction id and unset the delete flag.
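The compensation case can be sketched in memory as follows. For brevity, the deletion flag and the transaction id are folded into a single nullable field, which is a simplification of this sketch; in a database these would typically be columns of the affected table. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for a table with a soft-delete marker (illustration only). */
final class SoftDeleteLog {

    static final class Row {
        final String key;
        String deletedByTx; // null means "not deleted"

        Row(String key) {
            this.key = key;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    Row insert(String key) {
        Row row = new Row(key);
        rows.add(row);
        return row;
    }

    /** Marks the row as deleted and remembers which state change did it. */
    void markDeleted(String key, String txId) {
        for (Row row : rows) {
            if (row.key.equals(key)) {
                row.deletedByTx = txId;
            }
        }
    }

    /** Compensation: undo only the deletions made by the given state change. */
    int undoDeletions(String txId) {
        int restored = 0;
        for (Row row : rows) {
            if (txId.equals(row.deletedByTx)) {
                row.deletedByTx = null;
                restored++;
            }
        }
        return restored;
    }
}
```

Because every marker carries the id of the state change that set it, undoing one partial change leaves deletions made by other changes untouched.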

For both actions there is an underlying requirement that is even more essential:

At any time during a state change the system is always within its consistency model

Technically this means that the scope of what has to be considered consistent, and hence what is acceptable precondition to a state change, has just become considerably broader.

Implement a State Chart

In reality however, processes quickly get complicated and assuring repeatability and compensability becomes a non-trivial exercise.

Consider the following still simple example: Suppose we need to

  1. Pick up a file F from a remote file system
  2. Send its content to some remote REST service – at most once. Ask for help if failing.
  3. Move it to some folder depending on whether processing completed successfully or not.

Sounds easy enough. A simple flow chart could render this process like this:

flow_chart

However, that tells us very little about how to handle failures – or where to pick up work if a previous attempt at running the process failed. For that, a state chart is more suitable. The natural benefit of the state engine model is that it tells us right away where work may be interrupted and continued – hence allowing for repeated execution. A trivial state chart would complete work in one go. But as we want to send file content no more than once, we need to safeguard against duplicate attempts, and as we want to avoid getting tricked into failed operations by broken (remote) file system access, we add some extra states before moving the file:

state_chart

Given the state chart we can now concentrate on implementing robust and repeatable state transitions that only need to worry about simple preconditions. For example, in the processing state we would check only for a previous attempt. In the errored or sent state we only need to check for whether the file has already been moved.

Let us consider how an implementation based on the state chart above would behave under failure:

File access failed during read of file: Stay in processing. Pause and retry.
Notice a second sending attempt: Give up, as we do not know whether the sending actually completed before but we failed to notice. Case to resolve manually.
File move failed: Must have completed sending attempts. Stay in sent or errored respectively, pause and retry.
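The behavior described above can be sketched as a small state engine. This is an illustration under assumptions, not the state chart itself. The state names `new`, `done` and `manual` are invented here, the state is kept in memory instead of being persisted, and a `send` that raises is taken to mean that the service reported a definite failure:

```python
from dataclasses import dataclass


@dataclass
class Process:
    state: str = "new"            # would live in a database row in practice
    send_attempted: bool = False  # recorded before the send is started


def step(p, read_file, send, move_file):
    """Run one transition. Return True if the process should be scheduled again."""
    if p.state == "new":
        p.state = "processing"
        return True
    if p.state == "processing":
        if p.send_attempted:
            # A previous run may or may not have delivered the content.
            p.state = "manual"
            return False
        try:
            content = read_file()
        except OSError:
            return True               # stay in processing, pause and retry
        p.send_attempted = True       # record the attempt first: at most once
        try:
            send(content)
            p.state = "sent"
        except Exception:
            p.state = "errored"
        return True
    if p.state in ("sent", "errored"):
        folder = "processed" if p.state == "sent" else "failed"
        try:
            move_file(folder)
        except OSError:
            return True               # stay in sent/errored, pause and retry
        p.state = "done"
        return False
    return False                      # done or manual: nothing left to do


def run(p, read_file, send, move_file, max_steps=10):
    # A real scheduler would pause between retries; this loop just repeats.
    for _ in range(max_steps):
        if not step(p, read_file, send, move_file):
            break
    return p.state


def demo():
    sent, moves = [], []
    failures = [OSError("share offline")]  # the first read attempt fails

    def read_file():
        if failures:
            raise failures.pop()
        return "content of F"

    ok = Process()
    run(ok, read_file, sent.append, moves.append)
    # Simulates a crash after the attempt was recorded but before its outcome was.
    crashed = Process(state="processing", send_attempted=True)
    run(crashed, read_file, sent.append, moves.append)
    return ok.state, sent, moves, crashed.state
```

Each transition only checks the simple precondition of its own state, which is what makes re-running the process after a failure safe.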

Summary

Obviously, in order to implement that state chart, you need some state persistence model. Describing that and how to provide feedback to users is out of scope of this article. Depending on your needs and scenarios a simple database table to manage a stateful process may be sufficient. Other cases may benefit from implementation tools such as Spring Batch. Others may demand a complete Business Process Management suite – but then you would most likely not read this post.

Going by this artificial example, the point of this post is that in the absence of transactional resources, non-trivial processes may be implemented reliably and robustly but require significantly more care and modeling attention. Very much like real world processes involving people and physical resources.

Notes on Working with Transactions

Transactional state handling is nothing we think much about anymore. It’s there. To the extent, that – I suspect – many have not thought much about it in the first place.

This post is about some, say, transaction potholes that I ran into at times – fooled by my own misleading intuition. And then… it is a little database transactions 101 that you may have forgotten again.

Recap

When I left university I had not studied database theory. Many Math and CS students did not. And yet I spent most of my professional career working on software solutions that have a relational database system (RDBMS) such as Oracle, Postgres, DB2, MS SQL Server or even MySQL, if not at their heart, then at least as their spine.

I have also worked on solutions that relied on self-implemented file system storage, when an RDBMS could have done the job. I consider that a delusional phase and waste of time.

There are of course limits, and there are problems where an RDBMS is not suitable – or available. Where it is, however, it is the one solution because:

  1. There is a rather well-defined and well-standardized storage interface (SQL, Drivers) to store, retrieve, and query your data.
  2. The relational algebra is logically sound, mostly implemented, well-documented, and really flexible, and actually proven to be so.
  3. Any popular RDBMS system provides for an operational environment, can be extended with professional support, has backup & recovery methods, etc, etc.
  4. It is transactional!

Let’s talk about the last bullet point. The key feature of transactional behavior is that of its all or nothing promise: Either all your changes will be applied or none. You will have heard about that one.
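To make the all or nothing promise concrete, here is a small sketch using Python's built-in `sqlite3` module. The `account` table and the simulated failure are assumptions for illustration:

```python
import sqlite3


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


def transfer(conn, amount, fail=False):
    """Move an amount from account a to account b within one transaction."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'a'",
                (amount,),
            )
            if fail:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'b'",
                (amount,),
            )
    except RuntimeError:
        pass  # the first update has been rolled back with the transaction


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    transfer(conn, 30, fail=True)
    after_failure = balances(conn)
    transfer(conn, 30)
    return after_failure, balances(conn)
```

After the failed attempt both balances are unchanged, although the first update had already been executed.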

This is so important not because it is convenient and saves you some work of change compensation. It is important because normally you have absolutely no idea what are those changes that your code, the code you called, or the code that was called by the code you called actually made! That’s big.

But what is in that “all or nothing” scope? How do you demarcate transactions?

That depends a bit on what you are implementing. The principle of least surprise is your friend though. There are some simple cases:

  1. User interactions are always a good transaction scope
  2. Processing an event or a service call with a response is a good transaction scope

Or to cut things short a good scope is every control flow…

  • … that represents a complete state change in your system’s logic and
  • … that does not take long.

Why should it not take long?

There is hardly a system that executes a single control flow. There will be many concurrent control flows – otherwise nobody will want to use the system, right? And they share the same database system. The real problem of a long running transaction is not so much that it holds on to a database connection for a long time, which may or may not be a scarce resource, but that it may prevent other control flows from proceeding by holding on to (pessimistic) locks in your database:

In order to prevent you from creating nonsense updates an RDBMS can and does provide exclusive access to parts of the data stored – e.g. in the form of a row-level lock when updating a record. There are extremely good reasons for that, which you can read up on elsewhere. Point is: It happens.

notlong

And as you have effectively no idea what updates are caused by your code – as we learned above – you can be sure that your system will run into blocked concurrency situations and will not be responsive and will not scale well in the presence of a long running transaction scheme. You do not want that.

Split Transactions

Coming back to transaction demarcation – here is a classic. In Java EE, a long time ago, when it was still called J2EE, there was really no declarative out-of-box transaction demarcation for Web applications. Following the logic of what is a good transaction demarcation above, the normal case is however that a single Web application request, at least when representing a user interaction, is a premier candidate for a transaction scope.

In an attempt to map what was thought to be useful for a distributed application, most likely because Enterprise Java Beans (EJB) had been re-purposed from remote objects to local application components (from hell), the proposed model du jour was to capture every Web application user transaction into a method invocation of a so-called Session Bean (EJB) – because those were by default transactional. See e.g. Core J2EE Patterns – Session Facade.

Now imagine due to some oversight, you called two of those for a single interaction: Two transactions. If the first invocation completed, a state change would be committed that a failure of the second invocation would not roll back and “boom!” you would have an incomplete or even inconsistent state change. Stupid.

To avoid that, it is best to move transaction demarcation as high as possible, that is, as near to the entry point as your transaction management allows.
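The difference between split and single demarcation can be sketched with `sqlite3` instead of Session Beans. The function names and the `orders` table are hypothetical:

```python
import sqlite3


def create_order(conn):
    conn.execute("INSERT INTO orders (item) VALUES ('book')")


def reserve_stock(conn):
    raise RuntimeError("simulated failure in the second step")


def handle_request_split(conn):
    with conn:  # transaction 1 commits here
        create_order(conn)
    with conn:  # transaction 2 fails, but transaction 1 stays committed
        reserve_stock(conn)


def handle_request_single(conn):
    with conn:  # one transaction, demarcated at the entry point
        create_order(conn)
        reserve_stock(conn)


def count_orders_after(handler):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (item TEXT)")
    try:
        handler(conn)
    except RuntimeError:
        pass
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

With split demarcation the order survives the failed interaction, with single demarcation it does not. Note that the two service functions do not demarcate anything themselves, which is what allows the entry point to decide.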

Nested Scopes

Sometimes however, you need more than one transaction within one control flow, even though there is no timing constraint. Sometimes you need nested transactions. That is, even before the current transaction terminates, the control flow starts another transaction.

This is needed if some deeper layer, traversed by the current control flow, needs to commit a state change regardless of the outcome of what is happening after. For example, a log needs to be written that an attempt of an interaction was performed.

Seeing this in code, a nested transaction typically radiates an aura of splendid isolation. But it is deceiving and dangerous. The same problem as for long running transactions applies here: Your nested transaction may need to wait for a lock. In this case: It may need to wait for a lock that was acquired by the very same control flow – a deadlock!

deadlock
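The effect can be reproduced in a few lines with SQLite, used here only as a stand-in: it locks the whole database file rather than single rows, and the second connection plays the role of the nested transaction. The file name, table and timeout are assumptions:

```python
import os
import sqlite3
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: transactions are controlled explicitly.
        outer = sqlite3.connect(path, isolation_level=None)
        outer.execute("CREATE TABLE audit (msg TEXT)")
        outer.execute("BEGIN IMMEDIATE")  # outer transaction takes the write lock
        outer.execute("INSERT INTO audit VALUES ('outer change')")

        # "Nested" transaction of the same control flow on a second connection.
        inner = sqlite3.connect(path, timeout=0.2, isolation_level=None)
        try:
            inner.execute("BEGIN IMMEDIATE")  # waits for the lock held by outer
            outcome = "inner transaction proceeded"
        except sqlite3.OperationalError as exc:
            outcome = str(exc)
        finally:
            inner.close()
            outer.execute("ROLLBACK")
            outer.close()
        return outcome
```

The short timeout turns the wait into an error so the sketch terminates. Without a timeout, the inner transaction would wait for a lock that can only be released once the inner transaction itself has returned.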

The need is rare – but the concept so tempting that it is used definitely much more frequently than justified.

So what about long running state changes?

Unfortunately some applications need to implement state changes that take long. Some mass update of database objects, some long running stateful interaction. This is a rich subject in its own right and it cannot be excluded that there will be some posts on that in this blog.

Applications Create Programming Models

Typically when we think about applications we structure them by business functions that will be implemented based on given, quite generic programming models such as, broadly speaking, Web applications and background jobs. These serve as entry points to our internal architecture comprising, say, services and repositories.

Any code base that grows requires modularization along its abstraction boundaries if it wants to stay manageable. Contracts between modules are expressed as APIs. We tend to think about APIs as “access points” to methods that the “API consumer” invokes:

simple

But that’s only the trivial direction. It is typical that some work be delegated to the API consumer to fill in some aspect. For example streaming some data, handling some event, consuming some computation result. That is: It is not only the subsystem that implements the API, but the consumer does as well:

bidirectional

In the still simple cases, the APIs implemented by the consumer are passed-on callbacks (or event handlers, etc.). This is by far the most prominent method employed by your typical open source library.

If callbacks need to be invoked asynchronously, or completely independently from some previous invocation, e.g. as a job activity on some other cluster node, this approach becomes increasingly cumbersome: In order to make sure callbacks can be found, they need to be registered with the implementation before any invocation need occurs.

In short: You need to start thinking seriously about how to find and manage callback implementations in your runtime environment. That is, switching to the term extension interfaces, you have a serious case of Extend me maybe….

andfind
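A minimal sketch of what finding and managing such implementations could look like is shown below. The registry class, the extension point name `order.created` and the handler are all made up for this example:

```python
from typing import Callable, Dict, List


class ExtensionRegistry:
    """Keeps extension implementations findable by extension point name."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], str]]] = {}

    def register(self, extension_point: str, handler: Callable[[dict], str]) -> None:
        self._handlers.setdefault(extension_point, []).append(handler)

    def handlers(self, extension_point: str) -> List[Callable[[dict], str]]:
        return list(self._handlers.get(extension_point, []))


registry = ExtensionRegistry()


# Consumer side: implement the extension interface and register it up front,
# before any invocation need occurs.
def notify_by_mail(event: dict) -> str:
    return f"mail sent for order {event['id']}"


registry.register("order.created", notify_by_mail)


# Provider side: possibly much later and from a different control flow.
def fire(extension_point: str, event: dict) -> List[str]:
    return [handler(event) for handler in registry.handlers(extension_point)]
```

In a clustered setup the registry would have to be populated on every node, which is where a simple in-memory dictionary like this one stops being sufficient.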

Being an API that may be used by a probably yet undefined number of consumers, other aspects may require rules and documentation such as

  • The transactional environment propagated to extension interface invocations.
  • Are there authorization constraints that have to be considered or can be declared by implementors?
  • Are there concurrency/threading considerations to take into account?

And that is where it starts becoming worthy to be called a programming model.

While this looks like just another way of saying callback or extension interface, it is important to consider the weight of responsibility implied by this – better early than later.

For a growing software system, managing extensions in a scalable and robust way is a life saver.

Not considering this in time may well become a deferred death blow.

From Here to the Asteroid Belt (I)

When I came up with the title line, I had a completely different conclusion in mind. It’s a nice line though. In contrast to the conclusion, it stayed.

Oh and by the way: Spring is finally here:

(spring at the office, so much work lately, so little time to enjoy it)

This is one of those “what’s the right tool for the problem” posts. Most people, me being no different, try to use the tools they know best for essentially any problem at hand. And this is a good instinct. It’s what people have always done and obviously they did something right. Knowing a tool well is of great value and is typically more effective than using a tool that might be more powerful – if used correctly – but that you are not an expert at.

At scale however, when building something more complex or widely distributed, tool choice becomes decisive and intrinsic qualities such as simplicity, reliability, popularity, platform independence, performance, etc. may outweigh the benefits of tool expertise.

What I want to look at specifically is the applicability of a programming and execution platform for deployment scenarios ranging from in-house or SaaS deployments to massively distributed stand-alone applications such as mobile apps or desktop applications.

These two form the endpoints of the custom vs. non-custom development scale and of the non-distributed to arbitrarily distributed scale.

The rules are pretty clear:

In-house/SaaS: Complete control. The system is the application is the solution. There is no customization or distribution problem because everything is (essentially) 100% custom and 0% distributed.

Mobile/Desktop: No control over the single instance that is executed somewhere in the wild. Hard to monitor what is going on, minimal to no customization, potentially infinitely many instances in use concurrently.

But what about the places in between? The customized business solutions that drive our economic backbone from production sites to warehouse solutions, from planning to financials, from team productivity to workflow orchestration?

diagram.png

Let’s say you have an application that is part standard solution (to be used as is) but typically requires non-trivial customization, adaptation, extension to be effectively useful.

What are the options?

Option C: Maintain a code line per instance or customer

That is (still?) a popular method – probably because it is simple to start with and it makes sure the original developer is in complete control.

That is also its downside: It does not scale well into any sort of eco-system and licensing model including third-parties. For a buyer it means 100% dependency on a supplier that most likely got paid dearly for a customer specific modification and will ask to be paid again for any further adaptation and extension.

Option P: Build a plugin model on top of a binary platform API

That is the model chosen for browsers and similar applications. It works very well as long as the platform use-case is sufficiently well defined, and the market interesting enough.

It obviously requires significant investment in feature rich and stable APIs, as well as in an effective plug-in model, a development approach for plug-ins, and a distribution channel or bundling/installation model.

In essence you build a little operating system for some specific application case – and that is simply not an easy and cheap task to do right.

Option S: Ship (significant) source code and support extension and customization on site

This approach has several strong advantages: You can supply hot fixes and highly special customization with minimal interference. Customization is technically not limited to particular functions or API. There is no extra cost per installation on the provider side compared to Option C.

It assumes however that the ability to change, version, and deploy is built-in and that the necessary tools are readily available. As the code life cycle is now managed on-site, some attention needs to be paid to handling it cleanly.

From a consumer’s point of view it reduces dependency and (leaving legal considerations aside) technically enables inclusion of third-party modifications and extensions.

Scenario Determines Tool

In part II we will look at how the three different scenarios above translate into tool approaches. Stay tuned.