When designing software that runs in a distributed environment, it helps enormously to look for slow-world analogies. Our brains reason far more intuitively about processes carried out by humans, so flaws in a system deployment architecture are significantly easier to spot in the analogy, and the conclusions carry over surprisingly accurately.
In the analogy we identify the following counterparts:
|A thread||An activity to attend to (e.g. sorting letters)|
|An OS process||A worker, or more politely: A human|
|An OS instance (a VM)||A home|
|A remote message||A letter|
|A remote invocation||A phone call|
|A file||A file|
You can easily go more fine-grained: A big server running a big database for example corresponds to a big administration building with lots of workers running around piling files in some huge archive packed with file cabinets.
In contrast some legacy host running a lot of under-equipped virtual machines is more like a … trailer park.
Asynchronous communication clearly corresponds to the exchange of letters, while phone calls play the role of synchronous service calls. That makes the analogy a very good fit for modeling the scalability and reliability characteristics of both communication styles.
Example 1: De-coupling via asynchronous communication
It is not uncommon for crucial bottlenecks in a distributed architecture to derive from many-to-one state updates that were simply not taken seriously, i.e. many places synchronously call one place to drop off some state update.
In the analogy it is perfectly obvious that having many people call in by phone is much more expensive in terms of capacity requirements and much less reliable than processing piles of letters: every call ties up a clerk for its whole duration, and a caller who hits a busy line has to try again. A pile of letters, in contrast, is a workload that can be scaled independently, is very reliable, and makes good use of resources.
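To make the letter pile a bit more tangible, here is a minimal in-process sketch in Python. It only illustrates the idea and assumes a lot: the names `run_mailroom`, `inbox` and `clerk` are made up for this example, the "state updates" are simple counters, and a real system would use a durable message queue between machines rather than `queue.Queue` within one process.

```python
import queue
import threading


def run_mailroom(updates, num_clerks=2):
    """Apply (key, amount) updates dropped into an inbox by a pool of clerks."""
    inbox = queue.Queue()   # the pile of letters
    state = {}              # the one place that owns the state
    lock = threading.Lock()

    def clerk():
        while True:
            letter = inbox.get()
            if letter is None:      # sentinel: no more mail for this clerk
                return
            key, amount = letter
            with lock:
                state[key] = state.get(key, 0) + amount

    clerks = [threading.Thread(target=clerk) for _ in range(num_clerks)]
    for c in clerks:
        c.start()

    # Senders drop their letter and move on; nobody waits on the line.
    for update in updates:
        inbox.put(update)

    # One sentinel per clerk, then wait until the pile is worked off.
    for _ in clerks:
        inbox.put(None)
    for c in clerks:
        c.join()
    return state
```

The number of clerks can be changed without the senders noticing, which is exactly the independent scaling the letter pile gives you.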
Example 2: Node-local search index
In online portals, a shared database can become a major read bottleneck that, on top of that, has to process the most crucial updates as well. In the analogy this corresponds to a blackboard (the DB) and many remote workers (the front ends) calling in to ask for some piece of information. It is much more efficient to hand out a periodically updated copy (a catalog) to the front end workers.
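A rough sketch of such a catalog, assuming Python and a hypothetical `fetch_snapshot` callable that returns the current catalog as a dictionary. For brevity this version refreshes lazily on the first lookup after the copy has expired; a real node would more likely rebuild a proper search index in the background on a schedule.

```python
import time


class LocalCatalog:
    """Node-local copy of shared data, refreshed once it is older than max_age_seconds."""

    def __init__(self, fetch_snapshot, max_age_seconds=60.0, clock=time.monotonic):
        self._fetch_snapshot = fetch_snapshot   # hypothetical: asks the central DB
        self._max_age = max_age_seconds
        self._clock = clock                     # injectable so it can be tested
        self._snapshot = {}
        self._loaded_at = None

    def lookup(self, key):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            # One trip to the blackboard, then many local reads.
            self._snapshot = dict(self._fetch_snapshot())
            self._loaded_at = now
        return self._snapshot.get(key)
```

The price is the usual one: between refreshes the front end may answer from a slightly outdated catalog, which is acceptable for a search index but not for the crucial updates mentioned above.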
Example 3: Zero-Downtime deployment
This is a particularly nice one. The problem addressed by ZDD is that in a distributed setup, a partial rollout of a new software version introduces some not completely trivial compatibility constraints. In particular, any shared resource (a database, a shared service), when upgraded, still needs to accept interactions with some range of previous software versions running on its clients. In the analogy this corresponds to remote offices where clerks still use an old form in some offices and a new version of the form in others. A central office needs to be able to process old forms as well as new revisions. Likewise, when it sends out information to remote offices, that information needs to be presented in a format comprehensible to clerks who have not been trained for the new version, and yet it needs to comply with the new version as well. All ZDD requirements on the IT side follow from this.
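In code, the central office could look roughly like the following Python sketch. The form layout is entirely made up for illustration: version 1 is assumed to carry a single `name` field, version 2 splits it into `first_name` and `last_name`, and a form without a `version` field is assumed to be an old one.

```python
def read_form(form):
    """Central office: accept the old form as well as the new revision."""
    version = form.get("version", 1)    # assumption: old forms carry no version
    if version == 1:
        first, _, last = form["name"].partition(" ")
    elif version == 2:
        first, last = form["first_name"], form["last_name"]
    else:
        raise ValueError(f"unsupported form version: {version}")
    return {"first_name": first, "last_name": last}


def write_notice(first_name, last_name):
    """Outgoing mail: keep the old field next to the new ones.

    Untrained clerks read "name" and ignore the rest; trained clerks
    read the new fields.
    """
    return {
        "version": 2,
        "name": f"{first_name} {last_name}",
        "first_name": first_name,
        "last_name": last_name,
    }
```

Once no office uses the old form any more, the version 1 branch and the duplicated `name` field can be retired in a later release.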
I guess you get the point, so I will stop here.
A Final Note
One last piece, however, an axiom underlying the whole idea, if you will, is the
Underlying principle: We all are built the same – we just happen to do different things
Considering traditional labor, this is pretty much true in the real world. It should similarly be true for your solution: If your (analogy) workers are overspecialized (can only speak on the phone, will not process paper forms…) for no other reason than a deployment diagram that seemed to be a good idea at some point, you are in for trouble in the medium term.
That is: As a general principle (modulo well-justified exceptions) all nodes in your deployment decomposition can – in principle – do any kind of application work, from rendering a front end to computing a report.
As a corollary, being able to do something without actually doing it should not incur pain in terms of added deployment and configuration complexity (see also modularization and integratedness).
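One way to picture this in Python: every node ships with all handlers, and what it actually does is just a matter of which roles are switched on. The names `HANDLERS`, `Node` and `active_roles` are invented for this sketch and the handlers are trivial placeholders.

```python
# Every node carries the full set of capabilities (hypothetical handlers).
HANDLERS = {
    "render_front_end": lambda payload: f"<h1>{payload}</h1>",
    "compute_report": lambda payload: sum(payload),
}


class Node:
    """A worker that can do any job; its current duties are plain configuration."""

    def __init__(self, name, active_roles=None):
        self.name = name
        # Default: no specialization at all.
        self.active_roles = set(HANDLERS) if active_roles is None else set(active_roles)

    def handle(self, kind, payload):
        if kind not in self.active_roles:
            raise RuntimeError(f"{self.name} is not currently assigned to {kind}")
        return HANDLERS[kind](payload)
```

Reassigning a node is then a one-line change to `active_roles` rather than a new deployment.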