The Ability to Create Abstraction Necessarily Wins Over any Ability to Keep Track of Many Pieces

A Simple Thought.

We know that our ability to create abstractions is key to managing complexity – in life, in science, in mastering technology. Without creating abstractions we would not be able to make sense of our daily routine or of what we work on, much less of the constant sensory input we receive.

In fact, I doubt that anybody can meaningfully keep track of more than a handful of interconnected things while “thinking”. That is why PowerPoint presentations explaining a concept should never have more than three boxes with arrows between them – nobody will buy your idea otherwise. Likewise, any concept described by three connected boxes looks convincing to most people – most likely the true reason for the demise of countless companies.

Abstractions are essential to software development. Not only does the whole idea of software require some serious level of abstraction, but thankfully programming languages also provide the means to stack abstractions on top of each other – leading to libraries of libraries of concepts and abstractions borrowed from those around and before us, allowing us to create software that encompasses many millions, if not billions, of lines of code – while writing only a fraction of that ourselves.

All that while being mostly ignorant of the intricacies of the lower layers of the pile of abstractions (actually the shoulders of the giants) we are standing on. So much so that something like a file system seems as natural a concept to us as, say, a horse.

And here is the catch: Because any layer of abstraction hides a number of lower-level concepts, and since that number is naturally at least two (otherwise: why bother?), the number of lower-level abstractions made tangible by introducing higher-level concepts essentially grows exponentially.

Not very scientifically speaking, for code this means:
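Roughly: if each layer hides at least two lower-level concepts, then a stack of $n$ layers makes on the order of

$$ 2^{\,n} $$

lower-level concepts effectively available – while the number of concepts we have to juggle at the top remains a small handful.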

However, the same pattern applies to other realms, be it running an organization, taking care of business, or being a school teacher. Somebody good at computing does not necessarily make a good mathematician, and being good at computing is not at all required to be a good mathematician either. The ability to understand, create, and apply abstractions hands down wins over any “increased clock speed”.

In other words: As long as we are good at building abstractions, it’s ok that we cannot handle more than three boxes with arrows per slide….

If You Want to Make It, You’ve Got to Own It

Imagine you are a high-volume manufacturer of vacuum cleaners. Everything runs smoothly, but you feel there may be some business potential for configurable high-end vacuum cleaners that are built to spec.

You imagine a GOLD series of vacuum cleaners for which customers can configure various color schemes and decorations, various sensor add-ons, GPS tracking, and other features that a certain high-profile customer group finds exciting.

Of course, ordering a GOLD configuration from the web site comes at a premium.

Problem is: Your current production process does not accommodate ad hoc, built-to-spec production. If you cannot reliably produce it, you cannot sell it!

So you come up with a pragmatic process that makes sure you can track from order to shipment and that everything stays consistent with the data recorded in your ERP. For example something like this (which I just made up – you will get the point, I suppose):

Obviously this requires some software support. It is not huge, but it may need to evolve and go through changes as you evolve your business. Who knows, maybe one day you will want to inform your customers about the production progress of their GOLD product.

Unfortunately you do not have much software development expertise in-house. So where do you get that software from? Do you ask your ERP supplier? Do you ask your Production Automation / MES supplier? Maybe not. Neither is exactly into custom development, and both will only increase your lock-in with them.

You could ask a software development agency – maybe even something really cheap with developers elsewhere but a local project manager.

Problem is: You might get a great solution, but it will be a one-off solution. Who is going to maintain it if the team that developed it breaks up and joins other projects right after? How do you make sure you can maintain it later?

The catch is:

You need to own it, if you want to make it!

Developing appropriate software development expertise is difficult. Developing and maintaining a custom business application that manages some long-running workflows and integrates with legacy systems in a manageable way is different from developing a Web site. So you should look for a partner that provides:

  • The expertise to build a solution;
  • A blueprint on how to extend and expand the solution into YOUR business platform;
  • A technology platform that you can build on, and
  • Support when you feel it is time to take over.

This is the essence of digital transformation: It is not about creating digital versions of processes you already have, it is about making use of digital capabilities to implement new business models or process optimizations that were simply not possible before.

Please check out the great article by Volker Stiehl linked below.

References

  1. https://www.volkerstiehl.de/digitalisierung-vs-digitale-transformation/ (German only)
  2. How to Contract a Software Developer

Software Design Part 5 – Distribution

This is a follow-up on the post Software Design – The Big Picture (Intro) and in particular Software Design Part 4 – Modularization.

Eventually you will need to bring your code into productive use – i.e. you need to make sure it is executed and accessible for users or systems that want to interact with it.

That will involve some form of deployment, maybe automatically or semi-automatically – nothing I really want to talk about here.

What I want to talk about is whether and for what reasons you may want to consider a system setup that involves deployment to more than one execution environment, where those environments interact, if at all, across a network boundary. In other words, we are talking about distribution as in Distributed Systems or, more specifically, Microservices.

To cut things short: The point that I want to make is that you should not distribute except for non-functional reasons – which in many instances require much less distribution consideration than you may be led to believe.

Good Reasons for Distributed Architecture

In an ideal world there would be just one all-encompassing code execution environment and all code we conceive would just be put there to solve the problem it was designed for. We would not worry about reliability, security, availability, performance. But that is obviously not the case.

Throughput: If your code needs to perform a lot of work, a single piece of hardware may not be enough to achieve the required throughput. In that case, you want to scale out (horizontally). That is, you want your code to run on multiple machines to perform more work in parallel.

Security: Some aspects of your code may need access to sensitive data while other parts do not need such access. In that case you may want to have the corresponding control flows performed in isolated environments that meet higher security requirements for access and maintenance.

Availability: Your code may serve under different quality-of-service expectations. For example, some customers may be paying a premium for top performance while others are willing to accept some lagging performance during peak usage. Likewise, you may want to keep some very demanding workload from impacting the performance of end-user interfaces. In both cases, driven by external requirements such as the type of user or the type of workload, you will want to separate workloads into different environments.

Bad Reasons for Distributed Architecture

So that was clear, right? In order to identify some bad reasons for distributed deployments, let’s make a thought experiment. Imagine we have some logical module “Order Management”, which is all about managing orders for something. We do not really care what exactly – but we are short on imagination and this is one of the typical examples. So, there are some aspects that make up our order management:

Let’s – for simplicity and to make sure you think beyond implementation code – call this the Order Management Module. Looking at this somewhat cohesive picture, you might think that it would be a good idea to turn the Order Management Module into an Order Management System that is deployed on its own and that others integrate with via remote APIs – i.e. a Microservice.

Let’s run some what-if experiments on that:

Availability: If Order Management is down but our DB is up, we cannot check on orders – just because.

Extensibility: Our system gets more complex and some other services need to be informed about order status changes to trigger some follow-up workflow. Can the Order Management invoke my code? Nope. So we need messaging? Yes!

Likewise: Special types of orders need extended data validation in scenarios that reuse the Order Management. How do these extensions get to the Order Management? And how do we enforce compatibility if the Order Management is decoupled from its extenders?

Scaling: Some scenario involving orders needs to be scaled out. How much do we need to scale up the individual parts like our Order Management exactly?

Refactoring: Let’s not even go there.

In short: This form of distribution comes at a high cost in terms of additional complexity due to the introduction of remote boundaries and possibly even split project management. It is almost as if the Order Management were being developed and provided by an independent organization – just like any old 3rd-party system you need to integrate with.

Oh wait… it is exactly like that! Is that what you wanted?

What Happened?

We confused modularization and distribution – essentially for no better reason than that it looked obvious in naming.

The better solution however would be to include the Order Management as a module within the system so that its capabilities are available and can potentially be part of any execution of anything in the system.

That does of course not imply that every workload or control flow should run everywhere.

For example: Probably the most obvious reason for separation is to keep front-end performance from being impacted by background work and to be able to scale background work independently from front-end work.

So our deployment would be more like this:

You can tell from the naming that we are concerned less with function than with the kind of function.

Indeed the Order Management as a named set of capabilities is simply integrated with the overall capabilities of the system. Checking on an order is done directly from the front-end. Administration of the Order Management is done directly from the front-end as well.

Mass checking and email notification jobs are however performed on background nodes.

If the Order Management needs to look for extensions to call based on a type of order, that is done in-place, as part of the control flow that requires it – since, by definition, it is all part of the system and so potentially available when needed.
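To sketch what that in-place lookup can look like (purely illustrative – the types are invented here, and java.util.ServiceLoader merely stands in for whatever extension lookup your module system provides):

import java.util.ServiceLoader;

// Minimal stand-in for the module's domain type (hypothetical).
record Order(String id, String type) {}

// Hypothetical extension contract: validators declare which order types they handle.
interface OrderValidation {
    boolean appliesTo(String orderType);
    void validate(Order order);
}

// Inside the Order Management module: extensions are discovered and invoked
// in-process, as part of the very control flow that needs them. Implementations
// are registered via META-INF/services (or module declarations) – no remote hop.
final class OrderValidationDispatcher {
    void validateForType(Order order) {
        for (OrderValidation v : ServiceLoader.load(OrderValidation.class)) {
            if (v.appliesTo(order.type())) {
                v.validate(order);
            }
        }
    }
}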

That is: Any kind of control flow of the system can be performed in any execution environment. We make sure however, based on smart non-functional reasoning, that this does not happen where it would violate our non-functional requirements.

Conclusion

With this post, finally that little series referenced below comes to an end. It has been a busy year so far and I did not get around to writing a lot. I will try to post some smaller pieces next.

Hope you stick around!

References

  1. Software Design – The Big Picture (Intro)
  2. Software-Design – Part 2: Structuring For Control Flow
  3. Software Design Part 3 – Working With Persistent Data
  4. Software Design Part 4 – Modularization

Software Design Part 4 – Modularization

This is a follow-up on the post Software Design – The Big Picture (Intro) and Software-Design – Part 2: Structuring For Control Flow, as well as Software Design Part 3 – Working With Persistent Data.

One of the fundamental problems of software development (and not only that) is that a) humans are really bad at managing complexity and b) code becomes really complex quickly.

The number one reason behind code going bad is that we stop being able to grasp how it actually works – so much so that we start being afraid to change it structurally (i.e. refactor it). Possibly contrary to intuition, code that has reached that state is essentially a car that has lost its steering and – if at all – still moves only out of inertia. Not good!

Obviously there must be a way around our limited intellectual capacity – after all we see enormously complex systems at work around us. But are they?

The trick to managing complexity is to avoid it. And the trick to avoiding complexity is to build abstractions. Finally something we are quite good at.

Building abstractions happens everywhere from science to accounting. In software, the abstraction is a means to structure code and to decouple the code relying on an abstraction (such as a mobile app wanting to take a picture) from the details of the implementation of the abstraction (such as the hardware driver for the camera).

The same is true when creating an interface or a generic type in our favorite programming language so that we can make effective use of polymorphism to better structure some code.
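As a tiny illustration (names invented for the example): the code that wants a picture depends only on the abstraction, while the concrete driver remains an exchangeable detail behind it.

// The abstraction the app depends on – no hardware details leak through.
interface Camera {
    byte[] takePicture();
}

// One implementation detail behind the abstraction; a mock or a different
// driver can be substituted without touching the app code.
class DeviceCamera implements Camera {
    @Override
    public byte[] takePicture() {
        // ... talk to the actual hardware driver here ...
        return new byte[0];
    }
}

// The dependent code is written against Camera only.
class PhotoFeature {
    private final Camera camera;

    PhotoFeature(Camera camera) {
        this.camera = camera;
    }

    byte[] snap() {
        return camera.takePicture();
    }
}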

You can look at modularization at different levels. For example, as a way of structuring the code of a library or application that is developed, packaged, and distributed as a whole from an (essentially) single source code folder – say, by arranging packages and folders in ways that help explain and understand responsibilities.

While maintaining a clear and instructive code structure is really important, it only carries so far. The reason is simple: As there are many, many more ways to screw up than there are to improve, any sufficiently large and non-usage-constrained code base is prone to rot through violations of abstractions and complexity creep.

This kind of local (if you will) modularization is not what I am considering in this post. Instead, I am talking about moving whole slews of implementation (modules) away from your code, so that at any time the complexity you actually need to deal with is manageable.

The means of modularization are abstraction, encapsulation, and information hiding. However, these are really the outcome of three steps (a small sketch follows the list):

  1. Coming up with an API (contract)
  2. Separating implementation details from its API (hiding details)
  3. Making sure implementation is neither visible nor impacting other parts of the system (encapsulate)
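A minimal sketch of steps 2 and 3 using the Java Platform Module System (module and package names are hypothetical; other module systems – including the z2-environment – have their own means to the same end):

// module-info.java of the order management module: the API package is the
// published contract, the implementation package is not visible to other
// modules at all – details stay hidden and encapsulated.
module order.management {
    exports com.example.orders.api;
    // com.example.orders.impl is intentionally not exported
}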

How to Do It

I wrote a few posts on techniques and aspects of modularization. I will just enumerate the basics:

Re-Using and Extending

The two most notable patterns in contract building between modules are providing an API to be used by another module to invoke some function and, in contrast to that, providing an API that is to be implemented by another module so that it can be invoked. The latter is in many ways how Object-Oriented Programming factors into this story.

See Extend me maybe…

Sharing and Isolating

Exposing a contract means sharing capabilities to be used by others. Whatever is needed to do so, however, should not only be hidden from view (so that it cannot be accidentally used), it should not affect other parts of the system by its presence either.

See Modularization is more than cutting it into pieces.

Refactoring

Looking at a large non-modularized code base can easily be overwhelming. I tried to come up with an algorithm to reduce complexity iteratively:

See A simple modularization algorithm.

Conclusion

This post looked at modularization as if it only applied to code. A good modular system should provide modularization capabilities for essentially all aspects it is used for though. If managing configuration is an important aspect, it just means that configuration can be part of modules as well. So can Web application resources or whatever else your platform of choice is typically used for. That is a core feature of the z2-environment.

Modularization is a rich topic. Doing it right – keeping a sustainable complexity level over time, regardless of solution size, by finding appropriate contracts and managing those contracts – is skillful craftsmanship.

Not paying attention or a lack of willingness to invest in structural maintenance easily leads to frustrating, endless “too little, too late” activities. A growing code base built on weak isolation requires a development discipline that is unrealistic to expect in most commercial circumstances.

Getting it right, seeing it work out however is a great experience of collaborative creation that I am fortunate to have been part of!

How to Contract a Software Developer

We are a small company developing custom software that typically implements some business critical function. Actual back-ends with lots of asynchronous transactional business workflows, mass-data processing, integration with other back-ends, machine-data and shop floor user interfaces.

We do not design or implement this software from scratch. We have tools and a solid software foundation and experience to analyze business processes, map them into software and eventually implement them. That’s what we bring to the party.

We do in general not do fixed-price projects. We do not do that because – in general – that simply does not make sense – not for us, not for our clients.

This post is on why asking for a fixed-price project is more often than not the wrong thing to ask for, for us as developer and for you as client. It is on why you should not want to contract a developer for a fixed price project and what you should do instead – to make life better for you as client and us as developer.

Groundwork

Normally you will read that the very first step of any software project is to develop an understanding of the actual business problem, its essential data relationships and what users will need to solve it using a software system.

And indeed, while there will be an initial problem description, it does not necessarily describe the problem to solve in terms that map easily to a technical solution approach. So you need to create a more technical and fundamental formulation of the business problem so as to create a foundation on which the project can be planned and implemented.

However that is not the whole story. When you are at that point, you are already in the project. Another indispensable step that comes first is to build ground for common trust between client and developer.

Why would a client trust a software project that potentially evolves into a multi-million euro endeavor to a developer based on an exchange of design ideas and some vague planning?

Why would a software developer risk expensive litigation because of a misunderstanding of what a solution to a million-euro software project is supposed to deliver based on a design that turned out to be wishful thinking?

I believe there are three essential (moving) meta-milestones in any project:

Next: All the features and fixes you know are needed and that the developer knows (or believes to know) how to do right. Everything in Next can be done now.

Near: All those features that you believe could be done down the road, possibly relying on the Next, that you think would be really useful to have – but you are not sure you are really willing to pay for all of them just yet, nor is your developer certain how long they will take and how well they will work.

Far: The vision of what could be done if you had the Next and some of the Near, and maybe some cool idea and the right business framework. You would not know how to plan for it now, but sharing it provides orientation on where, eventually, we want to go.

These moving target meta-milestones define the grounds on which to repeatedly plan and commit. By agreeing on them, we build a common understanding on how we believe the project is to move forward – while committing to the next “realistic” fraction of it:

The Near defines the Next by showing you the boundary of what you feel sure about. The Far on the other hand guides the creation of the Near and provides the vision to communicate when justifying the effort as a whole.

While working in the Next, the Near and the Far become clearer – ideally Near flows into Next and there is constantly food for work and success in the project.

Here is the deal however:

  • While agreeing on the Near and communicating the Far, you only contract on the Next.
  • While working on the Next, you fill it up again from the Near.
  • You make sure that splitting up, while not desirable, leaves no more burned ground than the current Next.

Practically Speaking

As potential project partners, developer and client should agree on a first set of Near and Far. I tend to call them Phase 1 and Phase 2, as that is probably more expected. The first thing to do, however, is to come up with an initial high-level design or even somewhat of a specification – and that is exactly what defines the Next.

And that is what should be the first commitment.

The result of the specification will be an understanding of a refined Next, Near, and possibly an updated Far as well. The goal posts will have moved, and you can move forward into the next iteration: Actually implementing the Next.

Speaking in agile development terms: An iteration here is generally not a sprint, but more likely multiple sprints, depending on the size of the project and the planning horizon. You would nevertheless align budgeting and mid-term planning with sprint boundaries so as not to interrupt work unnecessarily.

At any time, you make sure that work has been specified and documentation has been updated to the extent that work can be passed on if required.

As a developer, you know that everything is set and you do not have (unexpected) technical or documentation debts that will haunt you later on.

As a client you know that there is no unnecessary dependency that may mean that you lose control over your asset.

In particular this means:

  • Contracts do make sure that anything developed belongs to the client
  • If necessary, the client can continue development with a different team, bring in new developers, move development in-house, if that is desired.

The latter means that project organization tools and content as well as development and testing infrastructure are either already operated by the client, come with the project, or can easily be re-created by the client.

It is naturally best if development and testing are inherently contained within the project sources and mostly independent of other external or proprietary tools.

In order to maintain trust in the project and in you as a developer, you should make sure to manage a well-stuffed backlog for the Near so that the continuity of the project is preserved.

Z2-environment Version 2.9 is Available

Finally, Version 2.9 is available for download and use. Version 2.9 comes with some useful improvements.

Please check out the wiki and online documentation.

Support for Java 15

Version 2.9 requires Java 11, runs with Java up to version 16, and supports a language level up to Java 15 based on the Eclipse Java Compiler ECJ 4.19 (#2088).

With Java 15, we now finally have multi-line text blocks, saving us some painful reformatting when we need markup, code blocks, or long messages as string literals.

@Test
public void multilineStrings() {
	// Text blocks are kind of
	// nice for mark up, messages and code
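	// 'err' here assumes a static import of java.lang.System.err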
	err.println("""
	create extension pg_stat_statements;
	select  
	pd.datname, 
	substring(pss.query,1,100) as query,
	calls,
	pss.rows as totalRowCount,
	(pss.total_time / 1000) AS duration,
	((pss.total_time / 1000)/calls) as "avg"  
	from pg_stat_statements as pss 
	join pg_database as pd on pss.dbid=pd.oid 
	order by duration desc limit 20;
	""");
}

Check out the JDK 15 Documentation for more on Java 15.

Upgrade to Jetty 10.0.1

This version now embeds Jetty 10.0.1 as its Web container (#2090). Jetty 10 is the last version supporting the Jakarta EE 8 namespace and the first to support the Servlet 4.0 API.

NOTE: With the next upgrade (Version 2.10) we will move on to Jakarta EE 9, which is NOT backwards compatible with previous versions of the Jakarta EE or Java EE APIs. This is mainly because package names change from “javax.*” to “jakarta.*” throughout the EE APIs.

See also Understanding Jakarta EE 9.

Supporting JUnit 5 (a.k.a. JUnit Jupiter)

This is arguably the coolest new feature in Z2. Previously Z2 already included an extremely useful in-container testing feature z2 Unit that was built on JUnit 4. I described it in detail in In-system testing re-invented. This is so useful for integration testing of anything that may call itself a meaningful application that I could not imagine developing without it anymore.

Hence it was all the more painful that it took so long to support the JUnit 5 API. Compared to JUnit 4, JUnit 5 is not only completely different but also significantly more complex from the perspective of an extender. However, it is also architecturally cleaner and allows for more testing features and testing flexibility.

The new implementation of #2036, called z2 Jupiter, now allows running remote integration tests transparently to the client (IDE, ANT, Jenkins, etc.) without compromising on JUnit 5 features in your tests – even more so than z2Unit did.

package mytest;

import org.junit.jupiter.api.Test;

import com.zfabrik.dev.z2jupiter.Z2JupiterTestable;

@Z2JupiterTestable(componentName="my_tests/java")
public class AZ2IntegratedUnitTest {

    @Test
    public void someTestMethod() {  
        System.out.println("Hello World!");
    }
}

I will describe the implementation approach in another blog post. For now please check out How to Unit Test in Z2.

More…

Check out the version page for more details. Go to download and getting started in five minutes or check out some samples.


Software Design Part 3 – Working With Persistent Data

This is a follow-up on the post Software Design – The Big Picture (Intro) and Software-Design – Part 2: Structuring For Control Flow.

Imagine a world where there is no constraint on the availability of computational power and memory. We would have software that could simply run forever, and we could design it in a way that all data it processes is kept within its runtime memory indefinitely. Would we call the data it manages persistent data? It would not get lost, right? That is in fact not what we generally refer to as persistent data.

Persistent data in the sense used here is data that is kept independently from the execution of any specific interpreting or modifying software and may be useful for a variety of different existing or yet to be designed applications. And that is where the trouble starts.

Generally speaking, we need to consider that the same data set is accessed from different execution environments at the same time, implying consistency and concurrency considerations.

Secondly, in many cases, just as we want to have a modular software design, we want our data to be up to serving such a design.

Finally, we want an implementation pattern for data access that is easy to implement, easy to use for dependent code, and provides good control over the former two aspects.

Consistency and Concurrency

The cornerstone technique for consistent data access is transactional data access as implemented by your typical relational database management system (RDBMS). While there are some niche scenarios where this is not required, any implementation scenario where a control flow spans an a-priori unknown scope of data access (which is generally true for modular, extensible applications) will rely on transactional data access for operational state management. Trying otherwise is pointless.

As laid out in the previous post (Notes on Working with Transactions), a transaction should typically span an “interaction”, be it a user interaction or a “step” in a background workflow execution. This is the span where your code will invoke data changing operations.

While transactional database access will guarantee that either all or none of a transaction’s changes are applied, it does not necessarily guarantee consistent reading of data, nor does it prevent concurrent changes of data. At times you need to make sure that some control flow has exclusive access to some portion of data. That is what (so-called pessimistic) locking is for. As the use of “pessimistic” suggests, there is also “optimistic” locking, which however is beneficial only in certain scenarios and hardly in the general case.
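For illustration, a minimal JPA sketch of pessimistic locking (the Order entity and repository are hypothetical; the database lock is held for the duration of the surrounding transaction):

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.LockModeType;
import javax.persistence.Table;

@Entity
@Table(name = "orders")
class Order {            // minimal, made-up entity
    @Id Long id;
    String status;
}

class OrderRepository {
    private final EntityManager em;

    OrderRepository(EntityManager em) {
        this.em = em;
    }

    // Loads the order and acquires a database-level write lock until the
    // current transaction commits or rolls back – no concurrent control
    // flow can modify the row in the meantime.
    Order findForUpdate(long orderId) {
        return em.find(Order.class, orderId, LockModeType.PESSIMISTIC_WRITE);
    }
}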

Modular Data

An intrinsic feature of the relational data model is that it allows for arbitrary relationships between tabular data for as long as you can make data types fit. In particular this means any given data model may be enhanced by adding related data. By modeling constraints, even new consistency rules may be added to an existing data model.

If your application structure is reflecting that same modularization, you would want to re-use data type definitions just the same, instead of re-implementing extended domain model types for every extension.

While the relational model is a natural fit here, object-relational mapping tools like JPA for Java make this a little less-obvious. It is possible but requires some care when crafting shared entity classes. I wrote about this in Modularization And Data Relationships.

We will not go into domain model design here. There is an abundance of literature on that and I wouldn’t know how to add to that. See e.g. the classics Domain Driven Design, and The Data Model Resource Book.

In any case, extensibility is a key aspect of data modeling and data model implementation. If you are not prepared for the future, you will not be ready for the present!

Designing Repositories, Implementing Data Access

Given a data model, say, already modeled via some database schema, you need to provide data structures and access methods to those parts of the application that need to read and write data. That is:

  • Data types specified using your programming language of choice that correspond to the database schema – the domain API.
  • And a Repository API:
    • Data access methods for reading entities of your data model into memory (typically by id and paged for display etc.)
    • Data access methods for writing changes of entities to the database.

I strongly advocate separating read access to data from write access and having more or less completely behavior-free data type implementations. The repository API should very much follow a Command Query Responsibility Segregation (CQRS) approach with complex update objects.

That is:

  • Design read methods matching your query needs.
  • Have one or very few write methods per updatable domain entity type that take a structured update data object describing the possible updates.

This is not an object oriented design. But, let’s face it: Data modeling does not fit object orientation very well. Here are some benefits of this design:

  1. When mixing structure with update behavior in the implementation of your data model, you easily end up with a spread-out update logic that has unclear validation times and is not very instructive on where to find the right update logic.
    In contrast, going for an update object forces you to design a document that explains the possible scope of an update; plus it makes it easy to standardize complex updates (e.g. of nested collections) and provides a very natural point in time to apply validation logic.
  2. Adding business-level behavior to entity implementations is a bad idea, as it tends to ignore the possibility that there may be many future extensions with modified or extended behavior for the same data.
    In contrast, strictly separating these two aspects makes sure we do not force any competition on entity model interfaces by extensions of business functions working on the data.
  3. A dedicated update data “document” structure is light on your implementation as it substitutes for possibly many small update methods and is highly efficient in terms of coding effort, as it can be re-used in service interfaces and user interface models.

I wrote about this in the past (Java Data API Design Revisited). This is similar to the concept of Service Data Objects (I was representing SAP in the SDO expert group at the time). An idea that is even more effective within the application than it is outside: Use a generic, easy-to-use update descriptor that can be used on many layers to describe modifications on a domain model:
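As a purely illustrative sketch of the idea (types and fields are invented here, not the actual descriptor from the posts referenced above): reads return plain views, while all writes go through one method per entity type that takes a structured update descriptor.

import java.util.List;
import java.util.Optional;

// Read side: a plain, behavior-free view of an order.
record OrderView(long id, String status, List<String> items) {}

// Write side: a structured "document" describing a possible update.
// Empty fields mean "leave unchanged"; validation happens in one place,
// when the descriptor is applied.
record OrderUpdate(
        Optional<String> newStatus,
        Optional<List<String>> replaceItems) {}

// The repository separates queries from the single update entry point.
interface OrderRepository {
    Optional<OrderView> findById(long id);
    List<OrderView> findByStatus(String status, int offset, int limit);

    void update(long id, OrderUpdate update);
}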

Conclusion

Describing more implementation details here is beyond the scope of this blog. But, if there is one thing to take from this post:

Carefully crafting domain APIs and repositories to not only effectively represent the data model but also provide a simple and widely usable data API that is extensible and instructive is probably the best implementation-related investment possible.

Software-Design – Part 2: Structuring For Control Flow

This is a follow-up on the post Software Design – The Big Picture (Intro).

Processing a request to do something, be it a user interaction, a remote service call, or a local scheduler invocation, always requires the same “logical” sequence of steps:

  • Check whether the request can be granted execution (permission checking)
  • Translate the external request into internal data structures and validate input
  • Log the request
  • Call one or more repositories and/or local services to retrieve data and perform computations, progress some workflow, and update some data.

Simple enough. In reality however things get messy quickly. That is why in this post we describe a simple but effective and many-times-proven abstraction that helps us break up complexity in code by layers of responsibility:

The separation of code into three layers that are ordered in terms of control flow simplifies dependencies and makes the implementation simple to understand.

Interaction

A Facade is a consumer-specific presentation of a function that is implemented via one or more internal services and data subsystems. Facades are the natural place to check on permissions and to do any kind of logging that describes the interaction (as in audit logging). Facades are the “adapter” between internal business logic and data definitions on the one side and the external interface (e.g. service or user interface) on the other.

It may well be that your code already has a definition of a facade. For example, if you use a Model-View-Controller structure, a controller class would be part of the facade.

Data

Essentially any kind of application needs some understanding of and access to shared business data. This is, for many reasons, what should be modeled first when starting a development effort, as

Data lasts longer than function, and function lasts longer than interface.

Money and time invested in an effective and robust data model are much better spent than on polishing the last details of your user interface. We define all domain data type definitions as well as access methods (query and update) to form the Repository layer.

Function

We want our application to actually do something useful besides reading and updating data. What we really want is to solve a business problem and that means to implement some business logic that humans cannot or should not perform by hand.

Services implement business functions over the domain data model beyond data access and modification. Services are modeled in terms of a function domain, not a presentation or interaction and so services are the place to implement logic that is not specific to a facade but instead inherent to the business domain and may be re-used for other facades.

While facade changes are driven by external presentation requirements, service changes are driven by changes in the business workflows, which leads to the inverse formulation of what we had above:

Facades change more often than services and services change more often than the domain data model.
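To make the layering concrete, here is a minimal sketch (all names are hypothetical):

// Facade: adapts an external interaction, checks permissions, does audit
// logging, and translates between external and internal representations.
class OrderFacade {
    private final OrderService service;
    private final AuditLog audit;

    OrderFacade(OrderService service, AuditLog audit) {
        this.service = service;
        this.audit = audit;
    }

    String cancelOrder(String user, long orderId) {
        if (!Permissions.mayCancel(user, orderId)) {
            throw new SecurityException("not allowed");
        }
        audit.log(user, "cancel", orderId);
        service.cancel(orderId);      // delegate to the business function
        return "ok";                  // presentation-specific result
    }
}

// Service: business logic over the domain model, facade-agnostic and reusable.
class OrderService {
    private final OrderRepository orders;

    OrderService(OrderRepository orders) {
        this.orders = orders;
    }

    void cancel(long orderId) {
        orders.updateStatus(orderId, "CANCELLED");
    }
}

// Repository: domain data types and access methods (query and update).
interface OrderRepository {
    void updateStatus(long orderId, String newStatus);
}

// Helpers assumed for the sketch.
interface AuditLog { void log(String user, String action, long orderId); }
final class Permissions {
    static boolean mayCancel(String user, long orderId) { return true; }
}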

Conclusion

By introducing and sticking to an implementation pattern, which is what we are discussing here, we make the structure of code more readable, more maintainable, and more extensible because concepts repeat.

Note that the layering has very little to do with modularization, let alone deployment and distribution of microservices or hexagonal architecture and the like. We are still talking about code structuring – micro-architecture, if you will. The motivation and tasks in modularization are different and require deep, solution-specific abstractions to be done well.

That is, we are still on a rather fine-granular level. That will change when we discuss modularization.

Software Design – The Big Picture (Intro)

Most of my blog posts on software matters are about very specific, almost niche, subjects. I wanted to summarize a big picture however for a long time – some of the most important aspects at least.

Starting from a big picture is important – and slightly dangerous. Important because it helps avoid jumping to conclusions too early. Dangerous because we need to avoid getting trapped in astronaut diagram architecture. Let’s dive in and start with some high level but fundamental aspects of pretty much any non-trivial software system:

Structuring for Control Flow

Any kind of software system will have to process interactions (or requests) from users, from other systems, or from itself. Structuring for control flow is about how to structure your code so as to most “usefully” check permissions, where to best implement the data types of a given interface or protocol, where to implement the actual domain logic, and how to update persistent state. This sounds bigger than it is, and it can be nicely explained without going into implementation details.

Working with Persistent Data

While there are certainly applications that derive their value purely from solving some algorithmic problem, the vast majority of software systems we get in touch with only exist to help us work with data (state) that represents some aspect of our reality and will live much longer than any specific system we use to access it. That is why data or domain modeling is hugely important and there is excellent literature abundantly available. So we will only look into how to be smart about accessing and updating persistent data.

Modularization

Modularization is one, if not the key method for controlling complexity in software design. It is a mystery to me why this is – I think – absurdly underrepresented in publications. My pet subject.

Distribution

Often abused as a mechanism of modularization, but in fact a completely different subject. At some point you need to bring your code to execution. If you need to or want to run a distributed system, what are good reasons, not-so-good reasons, and actually nonsensical reasons for accepting remote boundaries?

With the next posts we will look at each aspect in more details. Stay tuned!

The Bulletin Board Pattern

A rather common problem of software system design is to organize background work in a robust, reliable, and scalable way. For example, incoming queries need to be processed, messages sent, or remote systems called. Single work tasks emerge but are not to be done at the time and place they are generated; instead the work is to be performed in the background and asynchronously – possibly using a separate infrastructure from where it originated.

work rushing towards processing

Messaging for Work Distribution

Suitable orchestration of distributed work is not completely trivial though, and there are a number of pitfalls. Driven by non-functional requirements such as reliability, robustness, and asynchronicity, people often turn to messaging systems, such as Apache ActiveMQ, as a convenient mechanism to announce work to the system and to distribute it to the processing elements of the system.

However, message-oriented middleware does not inherently have an understanding of a message’s meaning. In the human analogy, the transmission of a message corresponds to the delivery of a letter via the mail service. Apart from quality-of-service aspects such as express delivery or the requirement to produce a return receipt, the delivery is completely oblivious to the letter’s content. Once the letter is on its way it has no relationship to other pending letters, and once delivery is completed, the mail service is out of the picture and whatever must happen next is up to the receiver.

Hence, using an approach like this for work assignment will inherently be ignorant of work details, pending work, as well as pretty much any particular state the designated processor is in.

How about Bulletin Boards?

Instead consider a bulletin board that holds a table of pending tasks. Instead of receiving isolated work tasks, an interested task processor may use a rule set to select one or multiple tasks depending on its own state as well as the overall system’s current workload (as seen on the bulletin board).

Since the bulletin board is a design element of the solution, we may decide to note down highly specific business attributes with the tasks to allow for sophisticated task selection rules. For example, a worker may process similar tasks much more efficiently than random sequences of tasks. Or business rules may imply a time-of-day-specific prioritization based on related business data such as a customer’s status.

In other words: Messaging is stateless. A bulletin board can be arbitrarily stateful.

But How?

So it seems there are some advantages in using a bulletin board and a “pick your work” approach rather than a generic work assignment. But how are we going to build that?

The correct answer is of course to use a relational database management system (RDBMS). The whole setup asks for it! In its simplest incarnation, the bulletin board is just a database table that holds all pending tasks, some attributes we need for management, and whatever business data we deem useful for smart work organization. How about reliability and robustness? After all, we just decided to build this ourselves.

Whatever RDBMS you are using, most likely there will be an approach for backup/restore and replication/fail-over available. Typically we will want to have a recovery feature: If there was an outage, the application crashed, etc., we will want the application to retry whatever was running last. That is, tasks should be picked up “at least once”, possibly multiple times (and consequently task execution should be idempotent).

A simple recovery implementation would work like this: When picking one or a set of tasks, the processor leaves a designation that identifies it with the “checked out” tasks. It does so again when finishing a task. In a recovery situation, these markers can be used by the processor to discover any previously started but not yet finished work, to be picked up and processed first. It’s not complex, and in contrast to many other “advanced” approaches, it is highly transparent and simple to work with when things go bad.
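A minimal sketch of that check-out and recovery scheme over plain JDBC (table, columns, and SQL are invented for illustration and assume a PostgreSQL-style database; auto-commit is off, i.e. transactions are managed on the connection):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Assumed bulletin board table:
//   tasks(id bigint primary key, type text, payload text,
//         worker text null, done boolean not null default false)
class BulletinBoard {
    private final Connection con;
    private final String workerId;

    BulletinBoard(Connection con, String workerId) {
        this.con = con;
        this.workerId = workerId;
    }

    // Check out up to 'limit' unassigned tasks by leaving our designation on them.
    List<Long> checkOut(int limit) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE tasks SET worker = ? " +
                "WHERE id IN (SELECT id FROM tasks WHERE worker IS NULL AND NOT done LIMIT ?) " +
                "RETURNING id")) {
            ps.setString(1, workerId);
            ps.setInt(2, limit);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) ids.add(rs.getLong(1));
            }
        }
        con.commit();
        return ids;
    }

    // Mark a task as finished once it has been processed (processing is idempotent).
    void finish(long taskId) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE tasks SET done = true WHERE id = ? AND worker = ?")) {
            ps.setLong(1, taskId);
            ps.setString(2, workerId);
            ps.executeUpdate();
        }
        con.commit();
    }

    // On recovery: whatever we checked out but did not finish is picked up first.
    List<Long> recover() throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT id FROM tasks WHERE worker = ? AND NOT done")) {
            ps.setString(1, workerId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) ids.add(rs.getLong(1));
            }
        }
        return ids;
    }
}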

Summarizing

Getting work done is the reason to build software systems. Knowing and organizing that work is a central aspect of a system’s design and should not be delegated to external tools as an afterthought – it should and can easily be implemented as a fundamental built-in feature.
