A simple modularization algorithm

Lately I worked on breaking up a module that had grown too big. It had started to feel hard to maintain and getting oriented in the code’s module felt increasingly cumbersome. As we run tests by module, automated tests triggered by check ins started taking too long and as several developers are working on code of the one module, failing module tests became harder to attribute.

In other words: It was time to think about some module refactoring, some house keeping.

There was a reason however that the module had not been broken up already: It had some lurking dependency problems. Breaking it up would mean to change other module’s dependencies just because – which felt arbitrary – and still there was re-use code to be made accessible to any split off.

Comparing to past experience this is the typical situation when everybody feels that something needs to be done but it always turns out to be a little too risky and unpredictable so that no one really dares. And after all: You can always push things still a bit further.

As that eventually leads to a messed up, stalling code-base and we are smart enough (or simply small enough?) to acknowledge that, we made the decision to fix it.

Now – I have done this kind of exercise on and off. It has some unpleasantly tiring parts and overall feels a little repetitive. Shouldn’t there be some kind of algorithm to follow?

That is what this post is about:

A simple modularization algorithm

Of course, as you will notice shortly: We cannot magically remove the inherent complexity of the problem. But nevertheless, we can put it into a frame that takes out some of distracting elements:

algo

Step 1: Group and Classify

It may sound a ridiculous, but the very first thing is to understand what is actually provided by the current module’s code. This may not be as obvious as it sounds. If it would be clear and easy to grasp, you most probably wouldn’t have ended up in the current mess anyway.

So the job to do is to classify contents into topics and use-cases. E.g.

  • API definitions. Possibly API definitions that can even be split into several APIs
  • Implementation of one or more APIs for independent uses
  • Utility code that exists to support implementations of some API

At this stage, we do not refactor or add abstraction. We only assess content in a way that we end up getting a graph of code fragments (a class, a group of classes) with dependencies. Note: The goal of the excercise is not to get a UML class diagram. Instead we aim for groups that can be named by what they are doing: “API for X”, “Implementation of job Y”, “Helper classes for Z”.

Most likely the result will look ugly. You might find an intermingled mess of some fifty different background service implementations that are all tight together by some shared wiring registry class that wants to know them all. You might find some class hierarchy that is deeply clutterd with business logic specific implementation and extending it further is the only practical way of enhancing the application. Remember: If it was not for any of these, you would not be here.

Our eventual goal is to change and enhance the resulting structure in a way that allows to form useful compartmentation and to turn a mess into a scalable module structure:

trans1

That’s what step 2 and step 3 are about.

Step 2: Abstract

The second step is the difficult piece of work. After step one, looking at your resulting graph it should be easy to categorize sub graphs into either one of the following categories:

  1. many of the same kind (e.g. many independent job implementations),
  2. undesirably complex and/or cyclic
  3. a mix of the two

If only the first holds, you are essentially done with step 2. If there is actual unmanageable complexity left, which is why you are here, you need to now start refactoring to get rid of it.

This is the core of the exercise and where you need to apply your design skills. This comes down to applying software design patterns ([1]), using extensibility mechanisms, and API design. The details are well beyond the scope of this post.

After you completed one such abstraction exercise, repeat step 2 until there is no more b) and c) cases.

Eventually you need to be left with a set of reasonably sized, well-defined code fragment sets, that are in an acyclic directed graph linking them up by linking dependencies.

trans2

(For example, removing the cycle and breaking the one-to-many dependency was achieved by replacing A by a delegation interface A’ and some lookup or registration mechanism).

Step 3: Arrange & Extract

After completing step 2 we are still in one module. Now is the time to split fragments up into several modules so that we can eventually reap the benefits of: Less to comprehend at a time, clearer locating of new implementations, a structure that has come back to manageability – provided of course that you did a good job in step 2 (bet, you saw that coming). This post is not about general strategies and benefits for modularization. But there are plenty in this blog (see below) and elsewhere.

Given our graph of fragments from step 2, make sure it is topologically ordered in direction of linking dependency (in the example from upper left to lower right).

Now start extracting graph nodes into modules. Typically this is easy as most of the naming and abstraction effort was done in the previous steps. Also, when starting you probably had other constraints in mind, like extensibility patterns or different life cycle constraints – e.g. some feature not being part of one deployment while being in another. These all play into the effort of extraction.

The nice thing is: At this time, having the graph chart at hand, the grouping can be easily altered again.

Repeat this effort until done:

trans3

Enjoy!

References

  1. Software Design Patterns (Wikipedia)
  2. Directed Acyclic Graphs (Wikipedia)
  3. Dependency Management for Modular Applications
  4. Modularization is more than cutting it into pieces
  5. Extend me maybe…