Software Design Part 3 – Working With Persistent Data

This is a follow-up to the posts Software Design – The Big Picture (Intro) and Software-Design – Part 2: Structuring For Control Flow.

Imagine a world with no constraints on the availability of computational power and memory. We could have software that simply runs forever, designed so that all data it processes is kept indefinitely in its runtime memory. Would we call the data it manages persistent data? It will not get lost, right? Still, that is not what we generally refer to as persistent data.

Persistent data in the sense used here is data that is kept independently from the execution of any specific interpreting or modifying software and may be useful for a variety of different existing or yet to be designed applications. And that is where the trouble starts.

First of all, we need to consider that the same data set may be accessed from different execution environments at the same time, which implies consistency and concurrency considerations.

Secondly, in many cases, just as we want a modular software design, we want our data to be up to serving such a design.

Finally, we want an implementation pattern for data access that is easy to implement, easy to use for dependent code, and provides good control over the former two aspects.

Consistency and Concurrency

The cornerstone technique for consistent data access is transactional data access as implemented by your typical relational database management system (RDBMS). While there are some niche scenarios where this is not required, any implementation scenario where a control flow spans an a-priori unknown scope of data access (which is generally true for modular, extensible applications) will rely on transactional data access for operational state management. Trying otherwise is pointless.

As laid out in the previous post (Notes on Working with Transactions), a transaction should typically span an “interaction”, be it a user interaction or a “step” in a background workflow execution. This is the span where your code will invoke data changing operations.

While transactional database access guarantees that either all or none of a transaction's changes are applied, it does not necessarily guarantee consistent reading of data, nor does it prevent concurrent changes of data. At times you need to make sure that some control flow has exclusive access to some portion of data. That is what (so-called pessimistic) locking is for. As the use of "pessimistic" suggests, there is also "optimistic" locking, which, however, is beneficial only in certain scenarios and hardly in the general case.
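As a minimal sketch of what pessimistic locking looks like with JPA (the Invoice entity and its status handling are made up for illustration, not taken from any particular code base):

```java
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.LockModeType;

// Hypothetical entity, for illustration only.
@Entity
class Invoice {
    @Id
    Long id;
    String status;
}

public class PessimisticLockSketch {

    // Assumed to be called within an active transaction that spans the
    // interaction, as discussed above.
    void markPaid(EntityManager em, Long invoiceId) {
        // PESSIMISTIC_WRITE translates to a database row lock (typically
        // SELECT ... FOR UPDATE), giving this control flow exclusive access
        // to the row until the transaction completes.
        Invoice invoice = em.find(Invoice.class, invoiceId, LockModeType.PESSIMISTIC_WRITE);
        invoice.status = "PAID";
        // On commit, either all changes of the transaction are applied or none.
    }
}
```

Optimistic locking would instead rely on a version attribute and fail at commit time if a concurrent change happened in between.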

Modular Data

An intrinsic feature of the relational data model is that it allows for arbitrary relationships between tabular data for as long as you can make data types fit. In particular this means any given data model may be enhanced by adding related data. By modeling constraints, even new consistency rules may be added to an existing data model.

If your application structure reflects that same modularization, you would want to re-use data type definitions just the same, instead of re-implementing extended domain model types for every extension.

While the relational model is a natural fit here, object-relational mapping tools like JPA for Java make this a little less obvious. It is possible but requires some care when crafting shared entity classes. I wrote about this in Modularization And Data Relationships.
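To sketch the idea (entity names are made up for the example): a core module can expose a shared entity, and an extension module adds related data through its own entity and a relationship, rather than modifying or re-implementing the shared type.

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.OneToOne;

// Core module: a shared entity that extensions do not need to touch.
@Entity
class Customer {
    @Id
    Long id;
    String name;
}

// Extension module: adds related data via its own table and a relationship.
@Entity
class CustomerLoyaltyInfo {
    @Id
    Long id;

    @OneToOne
    @JoinColumn(name = "customer_id")
    Customer customer;

    int bonusPoints;
}
```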

We will not go into domain model design here. There is an abundance of literature on that and I wouldn’t know how to add to that. See e.g. the classics Domain Driven Design, and The Data Model Resource Book.

In any case, extensibility is a key aspect of data modeling and data model implementation. If you are not prepared for the future, you will not be ready for the present!

Designing Repositories, Implementing Data Access

Given a data model, say, one already modeled via some database schema, you need to expose data structures and access methods to those parts of the application that need to read and write data. That comes down to two things:

  • Data types, specified using your programming language of choice, that represent the database schema – the domain API.
  • And a Repository API:
    • Data access methods for reading entities of your data model into memory (typically by id and paged for display etc.)
    • Data access methods for writing changes of entities to the database.

I strongly advocate separating read access to data from write access, and having more or less completely behavior-free data type implementations. The repository API should very much follow a Command Query Responsibility Segregation (CQRS) approach with complex update objects.

That is:

  • Design read methods matching your query needs.
  • Have one or very few write methods per updatable domain entity type that take a structured update data object describing the possible updates.
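As a rough sketch of such a repository API (all type and method names are illustrative, not prescriptive):

```java
import java.util.List;
import java.util.Optional;

// Behavior-free domain type – part of the domain API.
class Order {
    final long id;
    final String customerName;
    final String status;

    Order(long id, String customerName, String status) {
        this.id = id;
        this.customerName = customerName;
        this.status = status;
    }
}

// Structured update "document" describing the possible scope of an update.
// Null fields mean "leave unchanged".
class OrderUpdate {
    String customerName;
    String status;
}

// Repository API: reads match concrete query needs, a single write method
// per entity type takes the update document.
interface OrderRepository {

    Optional<Order> findById(long id);

    // Paged read matching a concrete display or processing need.
    List<Order> findByStatus(String status, int offset, int limit);

    // Validating the update document has a natural place inside this method.
    void update(long orderId, OrderUpdate update);
}
```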

This is not an object oriented design. But, let’s face it: Data modeling does not fit object orientation very well. Here are some benefits of this design:

  1. When mixing structure with update behavior in the implementation of your data model, you easily end up with update logic that is spread out, has unclear validation points, and gives little guidance on where to find the right place for a change.
    In contrast, going for an update object forces you to design a document that explains the possible scope of an update. It also makes it easy to standardize complex updates (e.g. of nested collections) and provides a very natural point in time to apply validation logic.
  2. Adding business level behavior to entity implementations is a bad idea as it tends to ignore the possibility that there may be many future extensions with modified or extended behavior for the same data.
    In contrast, strictly separating these two aspects makes sure we do not force any competition on entity model interfaces by extensions of business functions working on the data.
  3. A dedicated update data "document" structure is light on your implementation, as it substitutes for possibly many small update methods and is highly efficient in terms of coding effort, since it can be re-used in service interfaces and user interface models.

I wrote about this in the past (Java Data API Design Revisited). This is similar to the concept of Service Data Objects (I was representing SAP in the SDO expert group at the time) – an idea that is even more effective within the application than it is outside: use a generic update descriptor that can be used on many layers to describe modifications on a domain model:
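A minimal sketch of what such a generic descriptor could look like (structure and names are purely illustrative and not the SDO API):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A generic update descriptor: it records field changes and nested collection
// modifications so the same structure can travel through user interface,
// service, and repository layers before being validated and applied in one place.
class UpdateDescriptor {

    // Simple field changes: field name -> new value.
    final Map<String, Object> fieldChanges = new LinkedHashMap<>();

    // Modifications of a nested collection, e.g. the line items of an order.
    final List<UpdateDescriptor> addedChildren = new ArrayList<>();
    final List<Object> removedChildIds = new ArrayList<>();
    final Map<Object, UpdateDescriptor> updatedChildren = new LinkedHashMap<>();

    UpdateDescriptor set(String field, Object value) {
        fieldChanges.put(field, value);
        return this;
    }
}
```

A user interface or service could assemble such a descriptor incrementally and hand it to a single repository write method that validates and applies it within one transaction.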

Conclusion

Describing more implementation details here is beyond the scope of this blog. But, if there is one thing to take from this post:

Carefully crafting domain APIs and repositories to not only effectively represent the data model but also provide a simple and widely usable data API that is extensible and instructive is probably the best implementation-related investment possible.

Software-Design – Part 2: Structuring For Control Flow

This is a follow-up on the post Software Design – The Big Picture (Intro).

Processing a request to do something, be it a user interaction, a remote service call, or a local scheduler invocation, always requires the same "logical" sequence of steps:

  • Check whether the request can be granted execution (permission checking)
  • Translate the external request into internal data structures and validate input
  • Log the request
  • Call one or more repositories and/or local services to retrieve data and perform computations, progress some workflow, and update some data.

Simple enough. In reality however, things get messy quickly. That is why in this post we describe a simple but effective and many-times proven abstraction that helps us break up complexity in code into layers of responsibility:

The separation of code into three layers that are ordered in terms of control flow simplifies dependencies and makes the implementation simple to understand.

Interaction

A Facade is a consumer-specific presentation of a function that is implemented via one or more internal services and data subsystems. Facades are the natural place to check permissions and to do any kind of logging that describes the interaction (as in audit logging). Facades are the "adapter" between internal business logic and data definitions on the one side and the external interface (e.g. service or user interface) on the other.

It may well be that your code has a definition of a facade. For example, if you use a Model-View-Controller structure, a controller class would be part of the facade.

Data

Essentially any kind of application needs some understanding of and access to shared business data. For many reasons, this is what should be modeled first when starting a development effort, as

Data lasts longer than function, and function lasts longer than interface.

Investing in an effective and robust data model is money and time much better spent than polishing the last details of your user interface. We define all domain data type definitions as well as access methods (query and update) to form the Repository layer.

Function

We want our application to actually do something useful besides reading and updating data. What we really want is to solve a business problem, and that means implementing some business logic that humans cannot or should not perform by hand.

Services implement business functions over the domain data model beyond data access and modification. Services are modeled in terms of a function domain, not a presentation or interaction and so services are the place to implement logic that is not specific to a facade but instead inherent to the business domain and may be re-used for other facades.

While facade changes are driven by external presentation requirements, service changes are driven by changes in business workflows, which leads to the inverse formulation of what we had above:

Facades change more often than services and services change more often than the domain data model.
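A compact sketch of the three layers working together (all names, the permission flag, and the logging are simplified placeholders):

```java
// Repository layer: behavior-free domain type and data access.
class Order {
    long id;
    String status;
}

interface OrderRepository {
    Order findById(long id);
    void updateStatus(long id, String newStatus);
}

// Service layer: business function over the domain model,
// independent of any particular facade and reusable by other facades.
class OrderService {
    private final OrderRepository orders;

    OrderService(OrderRepository orders) {
        this.orders = orders;
    }

    void cancelOrder(long orderId) {
        Order order = orders.findById(orderId);
        if (!"SHIPPED".equals(order.status)) {
            orders.updateStatus(orderId, "CANCELLED");
        }
    }
}

// Facade (interaction layer): permission check, audit logging, and translation
// between the external interface and the internal service call.
class OrderFacade {
    private final OrderService service;

    OrderFacade(OrderService service) {
        this.service = service;
    }

    void cancelOrder(String externalOrderId, boolean callerMayCancel) {
        if (!callerMayCancel) {
            throw new SecurityException("cancellation not permitted");
        }
        System.out.println("audit: cancel requested for order " + externalOrderId);
        service.cancelOrder(Long.parseLong(externalOrderId));
    }
}
```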

Conclusion

By introducing and sticking to an implementation pattern, which is what we are discussing here, we make the structure of code more readable, more maintainable, and more extensible because concepts repeat.

Note that this layering has very little to do with modularization, let alone deployment and distribution of microservices or hexagonal architecture and the like. We are still talking about code structuring – micro-architecture if you will. The motivation and tasks of modularization are different and require deep, solution-specific abstractions to be done well.

That is, we are still on a rather fine-granular level. That will change when we discuss modularization.

Software Design – The Big Picture (Intro)

Most of my blog posts on software matters are about very specific, almost niche, subjects. For a long time, however, I have wanted to summarize the big picture – some of the most important aspects at least.

Starting from a big picture is important – and slightly dangerous. Important because it helps avoid jumping to conclusions too early. Dangerous because we need to avoid getting trapped in astronaut diagram architecture. Let’s dive in and start with some high level but fundamental aspects of pretty much any non-trivial software system:

Structuring for Control Flow

Any kind of software system will have to process interactions (or requests) from users, other systems, or itself. Structuring for control flow is about how to structure your code so as to most "usefully" check permissions, where to best implement data types of a given interface or protocol, where to implement the actual domain logic, and how to update persistent state. This sounds bigger than it is, and it can be nicely explained without going into implementation details.

Working with Persistent Data

While there are certainly applications that derive their value purely from solving some algorithmic problem, the vast majority of software systems we get in touch with only exist to help us work with data (state) that represents some aspect of our reality and will live much longer than any specific system we use to access it. That is why data or domain modeling is hugely important and there is excellent literature abundantly available. So we will only look into how to be smart about accessing and updating persistent data.

Modularization

Modularization is one of the key methods, if not the key method, for controlling complexity in software design. It is a mystery to me why it is – I think – absurdly underrepresented in publications. My pet subject.

Distribution

Often abused as a mechanism of modularization, but in fact a completely different subject. At some point you need to bring your code to execution. If you need to or want to run a distributed system, what are good reasons, not-so-good reasons, and outright nonsense reasons for accepting remote boundaries.

With the next posts we will look at each aspect in more detail. Stay tuned!

The Bulletin Board Pattern

A rather common problem of software system design is organizing background work in a robust, reliable, and scalable way. For example, incoming queries need to be processed, messages sent, or remote systems called. Single work tasks emerge but are not to be done at the time and place they are generated; instead, work is to be performed in the background and asynchronously – possibly using a separate infrastructure from where it originated.

work rushing towards processing

Messaging for Work Distribution

Suitable orchestration of distributed work is not completely trivial though, and there are a number of pitfalls. Driven by non-functional requirements such as reliability, robustness, and asynchronicity, people often turn to messaging systems, such as Apache ActiveMQ, as a convenient mechanism to announce work to the system and to distribute work to processing elements of the system.

However, message-oriented middleware inherently has no understanding of a message's meaning. In the human analogy, the transmission of a message corresponds to the delivery of a letter via the mail service. Apart from quality-of-service aspects such as express delivery or the requirement to produce a return receipt, the delivery is completely oblivious to the letter's content. Once the letter is on its way it has no relationship to other pending letters, and once delivery is completed, the mail service is out of the picture and whatever must happen next is with the receiver.

Hence, using an approach like this for work assignment will inherently be ignorant of work details, pending work, as well as pretty much any particular state the designated processor is in.

How about Bulletin Boards?

Instead, consider a bulletin board that holds a table of pending tasks. Instead of receiving isolated work tasks, an interested task processor may use a rule set to select one or multiple tasks depending on its own state as well as the overall system's current workload (as seen on the bulletin board).

Since the bulletin board is a design element of the solution, we may decide to note down highly specific business attributes with the tasks to allow for sophisticated task selection rules. For example, a worker may process similar tasks much more efficiently than random sequences of tasks. Or business rules may imply a time-of-day-specific prioritization based on related business data such as customer status.

In other words: messaging is stateless. A bulletin board can be arbitrarily stateful.

But How?

So it seems there are some advantages in using a bulletin board and a “pick your work” approach rather than a generic work assignment. But how are we going to build that?

The correct answer is of course to use a relational database management system (RDBMS). The whole setup asks for it! In its simplest incarnation, the bulletin board is just a database table that holds all pending tasks, some attributes we need for management, and whatever business data we deem useful for smart work organization. How about reliability and robustness? After all, we just decided to build this ourselves.

Whatever RDBMS you are using, most likely there will be an approach for backup/restore and replication/fail-over available. Typically we will want to have a recovery feature: if there was an outage, the application crashed, etc., we want the application to retry whatever was running last. That is, tasks should be picked up "at least once", possibly multiple times (and consequently task execution should be idempotent).

A simple recovery implementation would work like this: When picking one or a set of tasks the processor leaves a designation that identifies it with the “checked out” task. It does so again when finishing a task. In a recovery situation, these markers can be used by the processor to discover any previously started but not yet finished work to pick up and process first. It’s not complex, and in contrast to many other “advanced” approaches, it is highly transparent and simple to work with when things go bad.
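A minimal JDBC sketch of such a task pickup (table and column names are made up; SELECT … FOR UPDATE SKIP LOCKED is available in, for example, PostgreSQL and newer MySQL versions):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class TaskPicker {

    // Picks one pending task and marks it as checked out by this processor.
    // Assumes auto-commit is disabled so selection and marking happen in one transaction.
    Long pickTask(Connection con, String processorId) throws SQLException {
        String select =
            "SELECT id FROM tasks WHERE status = 'PENDING' " +
            "ORDER BY created_at LIMIT 1 FOR UPDATE SKIP LOCKED";
        try (PreparedStatement ps = con.prepareStatement(select);
             ResultSet rs = ps.executeQuery()) {
            if (!rs.next()) {
                return null; // nothing to do right now
            }
            long taskId = rs.getLong(1);
            try (PreparedStatement mark = con.prepareStatement(
                    "UPDATE tasks SET status = 'IN_PROGRESS', processor = ? WHERE id = ?")) {
                mark.setString(1, processorId);
                mark.setLong(2, taskId);
                mark.executeUpdate();
            }
            con.commit();
            return taskId;
        }
    }
}
```

On restart, the same processor can first query for tasks still marked with its processor id and resume those before picking up new work – exactly the recovery behavior described above.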

Summarizing

Getting work done is the reason to build software systems. Knowing and organizing a system's work is a central aspect of its design and should not be delegated to external tools as an afterthought, but should – and easily can – be implemented as a fundamental built-in feature.

Bad Software Architecture Doesn’t Matter When It Doesn’t Matter…

Lately I ran into http://www-scf.usc.edu/~csci201/lectures/Lecture11/royce1970.pdf, a paper published in 1970. After 50 years, anybody in the field can perfectly relate to the problems described. Even the terminology has hardly changed. While we may have a slightly better grip on a pragmatic development process, the difference in sophistication is not really impressive, and we are suffering from the very same problems today: poor decision making, too little upfront analysis and design.

Yet there is a whole industry of people selling simple technology driven approaches to problems that are at most partially technological but first and foremost design questions.

As a developer one (unfortunately) rarely gets the chance of designing and being responsible for a system that grows to a large and vibrant organism. And so actually validated design experience is scarce out there.

Fortunately, singular and small solutions can typically be made to work somehow anyway, even if they are built with a mix of containerized Node.js microservices storing an order record in a non-transactional NoSQL database.

I just made a resolution to write about something positive in my next post. Stay healthy!

Z2-environment Version 2.8 is Available

Finally, version 2.8 is available for download and use. Version 2.8 comes with some useful improvements.

Please check out the wiki and online documentation.

Support for Java 13

Version 2.8 requires Java 9, runs with Java versions up to 13, and supports a language level up to Java 13 based on the Eclipse Java Compiler ECJ 4.14 (#2035).

Upgraded to Jetty 9.4.24

While that was rather a necessity to run on Java 13, it was also kind of nice to be up-to-date again (#2052).

Follow HEAD!

Previously it was kind of cumbersome to change Git Component Repo declarations when working with feature branches, or to make a connected system switch branches when implementing a system local repo.

At this point, I probably lost you. Anyway, as z2 is self-serving its code and configuration, it would be really cool if switching branches were just that: switching branches and (of course) synchronizing z2. And that is now the case.

A Git Component Repository declaration may now use “HEAD” as a reference:

In that case, whatever is the current branch of the repo: Z2 will follow.

Remote Management Goodies

Z2 exposes metrics and basic operations via JMX. Via JMX or via the simple admin user interface, you can check on runtime health and trigger synchronization or reload of worker processes for example. Some things were – in practice – still rather user unfriendly:

  • Synchronizing a remote installation from the command line;
  • Accessing the main log remotely.

There is now a simple-to-use command line integrated with Z2 that can be used to do just that: trigger some JMX-implemented function and stream back the log. Or simply stream the log continuously to a remote console.

Remote log streaming is also available from the admin user interface:

More…

Check out the version page for more details. Go to download and getting started in five minutes or check out some samples.


A Model for Distributed System-Centric Development

It’s been a while and fall has been very busy. I am working on z2 version 2.8, which will bring some very nice remote management additions to simplify managing a distributed application setup. That was the motivation behind this post.

This post is about a deployment approach for distributed software systems that is particularly useful for maintenance and debugging.

But let's start from the beginning – let's start from the development process.

Getting Started

The basic model of any but the most trivial software development is based on checking out code and configuration from some remotely managed version control system (or Software Configuration Management system, SCM) to a local file system, updating it as needed and testing it on a local execution environment:

At least for the kind of application I care about, various versions, for development, testing, and productive use are stored in version control. In whatever way, be it build and deploy or pull, the different execution environments get updated from changes in the shared SCM. Tagging and branching is used to make sure that latest changes are separated from released changes. Schematically, the real situation is more like this:

There are good reasons we want to have permanent deployments for testing and staging: in large and complex environments, a pre-production staging system may consist of a complex distributed setup that integrates with surrounding legacy or mocked third-party systems and has corresponding configurations. In order to collaboratively test workflows, check system configurations, and test with historic data, it is not only convenient but really natural to have a named installation to turn to. We call that a test system. But then:

How do you collaboratively debug and hotfix a distributed test system?

For compile-package-deploy technologies, you could set up a build pipeline and a distributed deployment mechanism that allows you to push changes you applied locally on your PC to the test system installation. But that would only be you. In order to share and collaborate with other developers on the test system, you need some collaborative change tracking. In other words, you should use an SCM for that.

Better yet, you should have an SCM as an integral part of the test system!

Using an SCM as Integral Part of the System

Here is one such approach. We assume that our test system has a mechanism to either pull changes from an SCM, or that there is a custom build and deploy pipeline to update the test system from a named branch. Using the z2-Environment, we strongly prefer a pull approach – due to its inherently better robustness.

From a test system's perspective we would see this:

Here "test-system" is the branch defining the current code and configuration of the test system deployment. We simply assume there is a master development branch and a release branch that is still in testing.

So, any push to "test-system" and a following "pull" by the test system leads to a consistently tracked system update.

Let's assume we are using a distributed version control system (DVCS) like Git. In that case, there is not only an SCM centrally and on the test system, but your development environment has just as capable an SCM. We are going to make use of that.

Overall we are here now:

What we added in this picture is a remote reference to the test-system branch of the test system’s SCM from our local development SCM. That will be important for the workflows we discuss next.

The essence of our approach is that a DVCS like Git provides us a common versioning graph spanning multiple repositories.

Example Workflows

Let's play through two main workflows:

  • Consistent and team-enabled update of the test system without polluting the main code line
  • Extracting fix commits from test commits and consolidating the test system

Assume we are in the following situation: In our initial setup, we have a main code line (master) and a release branch. Both have been pushed to our central repository (origin). The test system is supposed to run the release branch but received one extra commit (e.g. for configuration). We omitted the master branch from the test-system repository for clarity. In our development repository (local), we have the master branch, the release branch as well as the test-system branch. The latter two from different remotes respectively. We have remote branches origin/master, origin/release, test-system/test-system to reflect that. We will however not show those here unless that adds information:

In order to test changes on the test system, we develop locally, push to the test system repo, and have the test system be updated from there. None of that affects the origin repository. Let's say we need two rounds:

We are done testing our change with the test system. We want to have the same change in the release and eventually in the master branch.

The most straightforward way of getting there would be to merge the changes back into release and then into master. We did not write particularly helpful commit messages during testing, however. For the history of the release and the development branch we prefer some better commit log content. That is why we squash-merge the test commits and merge the resulting, good commit into release.

After that we can push the release branch and master changes to origin:

While this leads to a clean history centrally, it puts our test system into an unfortunate state. The downside of a squash-merge is that there is no relationship between the resulting commit and the originating history anymore. If we now merged the "brown" commit into the test-system branch, we would most likely end up with merge conflicts. That may still be the best way forward, as it gets you a consistent relationship with the release branch and includes testing information.

At times, however, we may want to "reset" the test system into a clean state again. In that case, we can do something that we would not allow on the origin repository: overwrite the test-system history with a new, clean history, starting at where we left off initially. That is, we reset the test-system branch, merge the release commit, and finally force-push the new history.

After this, the test system has a clean history, just as we would have it when updating from the release branch normally. None of what we did had any impact on the origin repository until we had decided on meaningful changes.

Summary

What looked rather complicated was actually no more than equipping a runtime environment with its own change history and using some ordinary Git versioning "trickery" to walk through a code and configuration maintenance scenario. We turned an execution environment into a long-living system with a configuration history.

The crucial pre-requisite for any such scenario is the ability of the runtime environment to be updated automatically and easily from a defining configuration repository implemented over Git or a similar DVCS.

A capability that the z2-environment has. With version 2.8 we intend to introduce much better support for distributed update scenarios.

Docker is essentially a Linux Installation Format

When you are designing a software system and you start worrying about how to get it installed on a machine to run, it is time to think about where to put your code, configuration, and supporting resources.

In reality however, you will have thought about that already during development as, I presume, you ran and tested your software. You did just not call it installation.

And when you came up with a concept to keep the various artifacts required by your solution in a sound place, you most likely made sure that there is a cohesive whole: in most cases a folder structure that holds everything needed to run, configure, and customize your software.

Wouldn’t it be nice if that was what installation was all about: Regardless of your hosting operating system, installation of your software would mean to unpack/copy/pull a folder structure that provides your software, adapt the configuration a little maybe and be done?

Application in folder

That of course is not all there is to it. In many cases you need other supplementary software. Third-party libraries that come with the operating system. Or a database system that should run on the same host OS. It is here that we find crucially different philosophies of re-use depending on what technology you use.

If you are a Java developer and your dependencies are Java libraries, you will typically bring them all with your application. In that case, if you even include the Java Runtime, you are pretty much there already.

JAVA application in folder

If you are developing applications on the LAMP stack, to go for the other extreme, you typically depend strongly on third-party software packages that are (typically again) installed using the OS-defined package manager. That is, you blend in with the OS way of installing software.

LAMP application “in” Linux

Going back to the Java case and one step further: suppose you come up with an extension model for your solution – additional modules that can be deployed with your application. They would need configuration, and to have that and be good citizens, they should adhere to your installation layout.

That is exactly what Linux does. If you want to be a good citizen on your Linux distribution, you use the software packaging style of your target distribution, install in /opt, keep your data in /var/lib, and your configuration in /etc.

But should you? Think about it: This is probably not the structure you use during development as you want to have the freedom to use different versions and variations without switching the OS. More dramatically, if you want to support multiple Operating System distributions, styles of configuration and scripts may vary. In fact, unlike what the drawing of packages installed above suggests, the artifacts of packages are spread out in the file system structure – in sometimes distribution specific ways.

Everything can get messy easily.

Things get messy and complicated anyway, because in that approach you rely on 3rd party software packages that are not distributed with the application but are expected to be provided via the distribution.

From an end user perspective, using today’s Linux package managers is great. From a developer perspective it is the classical dependency hell:

Every to-be-supported distribution and version will require a different dependency graph to be qualified for your solution!

Application with dependencies in Linux

Docker as a Solution Container

Many people look at Docker from the perspective of virtualization – with a focus on isolation of runtimes. But it is actually the opposite. Docker is a means to share operating-system-managed resources among applications that are packaged with their dependencies in a distribution-independent way. From the packaged application's perspective, its execution environment "looks" like a reduced installation of a Linux distribution that, by means of building Docker files, was completely defined at development time:

Dockerized Applications on Linux

By providing means to map shared resources like ports from the hosting OS to Docker containers, Docker even allows "tricking" internal configuration (e.g. of the database port) into a shared execution.

Looking at it from Higher

If we take one step back again, what we actually see is a way of deploying a statically linked solution that includes everything except for the actual OS kernel. That is great and it solves the dependency problems noted above at the expense of somewhat higher resource consumption.

However, if there was a better standardized Linux base layout and better defined ways of including rather than referencing libraries and well defined “extension points”, e. g. if the Apache Web Server could discover Web Applications in “Application Folders”, if databases would discover database schemas and organize storage within the deployment, if port mapping was a deployment descriptor feature and so on … we would need none of it and have much more flexibility. If we had it on the level of the OS, we would have a huge eco-system opportunity.

It is this extensibility problem that any application server environment needs to solve as well – but never does (see for example Modularization is more than cutting it into pieces, Extend me Maybe, and Dependency Management for Modular Applications).

Summary

Creating a Docker image is not as simple as building a folder hierarchy. In essence, however, Docker provides a way to have our own solution layout on a Linux system while keeping strong control over third-party dependencies and still being easily installable and runnable on a variety of hosting environments. It is a cross-platform installation medium.

That is great. But it is really the result of wrong turns in the past. Docker found a dry spot in a swamp.

If it were safely possible to reliably contain required dependencies and configuration, a simple folder-based software installation mechanism would have saved the world a lot of trouble.

Scrum Should Indeed Be Run Like Multiple Parallel Waterfall Projects

Normally I am not writing about processes and methodologies. Not my preferred subject really.

Lately, however, I read an article (see below) that restated that agile is not like doing small waterfalls. I think that claim is misleading.

Over time, I have been working on all kinds of projects, ranging from proof of concept work to large systems for 24/7 production, from ongoing maintenance to custom extensions to existing solutions.

Each of those seemed to respond best to a different process approach.

For simple projects, it can be best to only have a rough outline or simply start from an existing example and just get going.

For maintenance projects, a Kanban approach – essentially a work stream comprised of work items of limited conceptual impact – can be best.

It gets more interesting when considering projects that are clearly beyond a few days of work and do have a perfectly clear objective. For example consider a specialized front end for some user group over an existing backend service.

As a paying customer, you would want to define (and understand) what should be the very specific result of the development effort as well as how much that will cost you. Therefore, as a customer, you naturally want development to follow a Waterfall Model:

It starts with a (joint) requirement analysis (the “why”) and a specification and design phase (the “how”). Let’s just call this the Definition Phase.

After Definition a time plan is made (implying costs) and the actual implementation can commence.

Once implementation completes the development result is verified and put into use – ideally on time and on budget. Or, as a simplified flow chart:

As we all know, this approach does not work all too well for all projects.

Why is that?

Think of a project as a set of design decisions and work packages that have some interdependence, or simpler as a sequence of N work packages, where a single work package is always assumed to be doable by your average developer in one day. So, effectively, there is some prophecy, N steps deep, that after step X all prerequisites for step X+1 are fulfilled and that after step N the specification is implemented.

For very simple tasks, or tasks that have been done many times, the average probability of failure – that is, of the invariant above not holding – can be sufficiently small that some simple measures like adding extra time buffers will make sure things still work out overall.

In software projects, in particular those that are not highly repetitive (think non-maintenance development projects), we typically find lots of non-repetitive tasks mixed with use of new technologies and designs that are implemented for the first time. In a situation like that, the probability of any sort of accurate project prediction from the start decreases rapidly with the “depth” of planning.
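To make that concrete with a deliberately simplified model: if each of the N planned steps independently works out as predicted with probability p, the whole plan holds with probability p^N. Even with p = 0.99, a plan of 100 steps survives intact with a probability of only about 37%, and one of 250 steps with about 8%.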

There are ways to counter this risk. Most notably by continuously validating progress and adapting planning in short iterations, for example in the form of a Scrum managed process.

While that may sound as if we are discussing opposing, alternative process approaches, each having a sweet spot at a different point on the scale of project complexity, that is not so.

Execute Parallel Waterfalls

In fact: The gist of this post is that an agile process like Scrum is best run when considering it a parallel execution of multiple smaller waterfall projects.

Here is why: many projects use Scrum as an excuse not to plan and design ahead of time, but instead only focus on short-term feature goals – leaving design decisions as an implementation detail of a small increment. That is not only a great source of frustration, as it raises the risk that even small increments end up brutally mis-estimated, it also leads to superficially designed architectures that – at best – require frequent and costly re-design.

Instead we should look for a combination of the two that, on the one hand, makes sure we design upfront those aspects of the overall project to an extent that we feel certain they can be done and estimated reliably, and yet, on the other hand, preserves the flexibility to adapt to changed requirements when needed.

As a result we run multiple parallel waterfall projects – let's call them part-projects – that span one to several sprints, while using resources smartly when we need to adapt or, for example, work on bugs introduced by previous work.

Visualized simply as parallel execution lanes processing several planned-ahead part-projects, at sprint n we work on some subset

(B denoting a bug ticket), while at sprint n+1 we have proceeded and take in the next tasks:

The sprint cycle forces us to re-assess frequently and enables us to make predictions on work throughput and hence helps in planning of resource assignments. Our actual design and estimation process for part-projects is not part of sprint planning but serves as crucial input to sprint planning.

References

Z2-environment Version 2.7 is Available

I am happy to declare version 2.7 ready for download and use. Version 2.7 comes with a lot of small improvements and some notable albeit rather internal changes.

Please check out the wiki and online documentation.

Support for Java 11

Version 2.7 requires Java 9, runs with Java versions up to 12, and supports a language level up to Java 11 based on the Eclipse Java Compiler ECJ 4.10 (#2021).

Well… note the use of var in lambdas. The most noteworthy changes around Java 11, however, concern its support and licensing model. Please visit the Oracle website for more details.

Updated Jetty Version

As the integrated Jetty Web container required an upgrade to run with Java 11 as well, z2 2.7 now includes Jetty 9.4.14 (#2027).

Robust Multi-Instance Operation

The one core feature of z2 is that any small installation of the core runtime is a full-blown representation of a potentially huge code base.

This is frequently used when running not only an application server environment, but instead, from the same installation, command line tools that can make direct use of possibly heavy-weight backend operations.

In development scenarios however, code may have changed between executions and z2 previously sometimes created resource conflicts between long running tasks and freshly started executions. This has been fixed with #1491.

No more Home Layouts

The essential feature that makes it easy to set up a system whose execution nodes serve different but well-defined purposes within a greater, coherent whole is the concept of system states. System states allow expressing the grouping of features to be enabled in a given configuration and extend naturally into the component dependency chain that is a backbone of z2's modularization scheme.

Unfortunately, Home Layouts, which defined what worker processes to run in a given application server configuration, duplicated parts of this logic but did not integrate with it. That has been fixed with issue #1981. Now, worker processes are simply components that are part of a home process dependency graph. In essence, while the documentation still mentions Home Layouts, a home layout is now simply a system state that serves as a home layout by convention.

More…

Check out the version page for more details. Go to download and getting started in five minutes or check out some samples.

 
