When you are designing a software system and you start worrying about how to get it installed on a machine to run, it is time to think about where to put your code, configuration, and supporting resources.
In reality, however, you will have thought about that already during development as, I presume, you ran and tested your software. You just did not call it installation.
And when you came up with a concept for keeping the various artifacts required by your solution in a sound place, you most likely made sure there is a cohesive whole: in most cases a folder structure that holds everything needed to run, configure, and customize your software.
Wouldn’t it be nice if that was what installation was all about? Regardless of your hosting operating system, installing your software would mean unpacking, copying, or pulling a folder structure that provides your software, perhaps adapting the configuration a little, and being done.
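As a minimal sketch of that idea (the package name, paths, and script here are hypothetical), installation reduces to a single unpack step:

```shell
# Build time: assemble the solution as one self-contained folder structure
# (hypothetical tool "mytool"; /tmp stands in for real build/install locations).
mkdir -p /tmp/build/mytool/bin /tmp/build/mytool/conf
printf 'echo hello\n' > /tmp/build/mytool/bin/run.sh
tar -czf /tmp/mytool-1.0.tar.gz -C /tmp/build mytool

# "Installation" on any host: unpack the folder structure, done.
mkdir -p /tmp/target
tar -xzf /tmp/mytool-1.0.tar.gz -C /tmp/target
sh /tmp/target/mytool/bin/run.sh   # prints "hello"
```

Everything the software needs travels inside that one folder; nothing is scattered across the host.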
That of course is not all there is to it. In many cases you need other supplementary software. Third-party libraries that come with the operating system. Or a database system that should run on the same host OS. It is here that we find crucially different philosophies of re-use depending on what technology you use.
If you are a Java developer and your dependencies are Java libraries, you will typically bring them all with your application. In that case, if you include the Java runtime as well, you are pretty much there already.
If you are developing applications on the LAMP stack, to go for the other extreme, you typically depend strongly on third-party software packages that are (again, typically) installed using the OS-defined package manager. That is, you blend in with the OS way of installing software.
Going back to the Java case and one step further: suppose you come up with an extension model for your solution, i.e. additional modules that can be deployed with your application. They would need configuration, and to have that and be good citizens, they should adhere to your installation layout.
That is exactly what Linux does. If you want to be a good citizen on your Linux distribution, you use the software packaging style of your target distribution: install into /opt, keep your data in /var/lib, and your configuration in /etc.
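For a hypothetical package "mytool", that convention spreads the solution across the file system roughly like this:

```shell
# Hypothetical layout following common Linux filesystem-hierarchy conventions:
#   /opt/mytool/        application code and bundled libraries
#   /etc/mytool/        configuration
#   /var/lib/mytool/    variable data (state, databases)
```

Note that the solution is no longer one cohesive folder; it is split across several distribution-managed locations.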
But should you? Think about it: this is probably not the structure you use during development, where you want the freedom to use different versions and variations without switching the OS. More dramatically, if you want to support multiple operating system distributions, configuration styles and scripts may vary. In fact, unlike what the drawing of installed packages above suggests, the artifacts of a package are spread out across the file system structure, sometimes in distribution-specific ways.
Everything can get messy easily.
Things get messy and complicated anyway, because in that approach you rely on third-party software packages that are not distributed with the application but are expected to be provided by the distribution.
From an end user perspective, using today’s Linux package managers is great. From a developer perspective it is the classical dependency hell:
Every to-be-supported distribution and version will require a different dependency graph to be qualified for your solution!
Docker as a Solution Container
Many people look at Docker from the perspective of virtualization, with a focus on isolating runtimes. But it is actually the opposite: Docker is a means of sharing operating-system-managed resources among applications that are packaged with their dependencies in a distribution-independent way. From the packaged application’s perspective, its execution environment “looks” like a reduced installation of a Linux distribution that, by means of building Dockerfiles, was completely defined at development time.
By providing means to map shared resources such as ports from the host OS into Docker containers, Docker even makes it possible to “trick” internal configuration (e.g. of the database port) into a shared execution environment.
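A sketch of such a development-time definition might look like this (the image name, paths, and port are hypothetical):

```dockerfile
# Hypothetical Dockerfile: the complete execution environment of "mytool"
# is defined here, independent of the hosting distribution.
FROM debian:stable-slim
COPY mytool/ /opt/mytool/
COPY conf/ /etc/mytool/
EXPOSE 5432
CMD ["/opt/mytool/bin/run.sh"]
```

A host-side mapping such as `docker run -p 15432:5432 mytool` would then let the contained service share the host’s port space without touching its internal configuration.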
Looking at It from Higher Up
If we take one step back again, what we actually see is a way of deploying a statically linked solution that includes everything except the actual OS kernel. That is great, and it solves the dependency problems noted above at the expense of somewhat higher resource consumption.
However, suppose there were a better-standardized Linux base layout, better-defined ways of including rather than referencing libraries, and well-defined “extension points”: if the Apache Web Server could discover web applications in “application folders”, if databases could discover database schemas and organize storage within the deployment, if port mapping were a deployment descriptor feature, and so on. Then we would need none of it and have much more flexibility. If we had that at the level of the OS, we would have a huge eco-system opportunity.
It is this extensibility problem that any application server environment needs to solve as well, but never does (see, for example, Modularization is more than cutting it into pieces, Extend me Maybe, and Dependency Management for Modular Applications).
Creating a Docker image is not as simple as building a folder hierarchy. In essence, however, Docker provides a way to have our own solution layout on a Linux system while keeping strong control over third-party dependencies, and still being easily installable and runnable on a variety of hosting environments. It is a cross-platform installation medium.
That is great. But it is really the result of wrong turns in the past. Docker found a dry spot in a swamp.
If it were safely and reliably possible to contain required dependencies and configuration, a simple folder-based software installation mechanism would have saved the world a lot of trouble.