Lately I have been working on (re-)designing the operational mode of a Java-based solution that includes Hadoop and HBase and is to be scaled out in a “cloud-style” manner. In other words: the set of nodes that defines the operational environment may change dramatically over time – in number and in other parameters (such as the exact OS used) – and hence such change must be implementable as cheaply as possible. If you do not want to get overwhelmed by scale complexity later on, this is something to definitely keep in mind right from the beginning.
Complexity is a killer for clustered solutions. For systems that may need to scale into the hundreds of nodes, any complexity avoided up-front is not just risk reduction – it’s a definite must-have.
In order to reduce complexity, you need to narrow the dependencies between your stuff – the parts you fully control – and their stuff – the parts you have only reduced control of. In an ideal world, you would not need anything like a node-local OS in the first place. Of course (and unfortunately) that is not realistic. The next best solution would obviously (!) be that all nodes get a copy (and updates) of everything they need by something as simple as running an rsync, a Subversion checkout, or a “git pull”.
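To make the copy-style update concrete, here is a minimal sketch of what such a node-local install can look like: every release lives in its own versioned folder, and a `current` symlink selects the active one. All paths and version numbers below are made up for illustration.

```shell
#!/bin/sh
# Sketch of a copy-only install: each release lands in its own
# versioned folder; a "current" symlink selects the active one.
# Paths and version names are hypothetical.
set -e

root=$(mktemp -d)            # stand-in for the node's install root
mkdir -p "$root/hbase-0.90.1" "$root/hbase-0.90.2"

# "Installing" a new version is just copying files into place...
echo "example config" > "$root/hbase-0.90.2/hbase-site.xml"

# ...and activating it is an atomic symlink flip. The old version
# stays around, which gives you cheap rollback and parallel versions.
ln -sfn "$root/hbase-0.90.2" "$root/current"

readlink "$root/current"
```

The same layout is what makes clean removal trivial: deleting a version is deleting its folder, nothing else.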
How to Install
The other important aspect, next to narrowing OS dependencies, is the locality of what you install and of the users that run it.
Linux package managers (rpm, dpkg) are a great way of configuring your OS as far as it concerns basic, stable components where incompatibilities are highly unlikely. If you can, getting along without them is better though. Anything you can configure to run from where you copied it to is naturally superior in simplicity, consistency, clean removal, clean update, ability to install parallel versions – i.e. in all practical matters.
Also, and I personally think this is a quality that hardly gets the recognition it deserves, software that can be installed and run from a folder of your choice can be used by developers just the way it is used in production.
Package managers solve a problem of Linux distributions: providing packages with consistent dependencies. They do not necessarily solve your problems: in a later version of the distribution, packages may have become incompatible with previous versions (as in the case of Ganglia, for example). The JDK you used may no longer be available, etc. You do not want the Linux distribution providers or other third parties to determine when you need to change your solution! You need to be on top of that!
Here’s my recipe:
- Everything possible is installed in and executed by one and the same user account
- As far as possible, all software is installed in one simple folder layout with one folder per software component
- Moving software into place must be good enough. No post-copy transformation is acceptable; at most, environment variables may be used to convey topological information.
Specifically, the JDK, Hadoop, and HBase are installed by folder checkouts.
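The recipe above can be sketched in a few lines of shell. The account layout, component names, and variable names are assumptions for illustration – the point is that everything lives under one user, one folder per component, with topology conveyed only via environment variables.

```shell
#!/bin/sh
# Sketch of the one-user, one-folder-per-component layout.
set -e

home=$(mktemp -d)            # stand-in for the cluster user's home
for c in jdk hadoop hbase; do
  mkdir -p "$home/run/$c"    # one folder per software component
done

# Topological information goes into environment variables only --
# no post-copy transformation of the installed files.
cat > "$home/run/env.sh" <<EOF
export JAVA_HOME=$home/run/jdk
export HADOOP_HOME=$home/run/hadoop
export HBASE_HOME=$home/run/hbase
EOF

. "$home/run/env.sh"
echo "$JAVA_HOME"
```

Since the layout is self-contained, the same checkout works on a developer laptop and on a production node alike.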
Fortunately, Java provides rather solid isolation from OS matters, to the extent that you can install a Java Runtime Environment essentially by copying it, and you generally do not need native libraries that would have to be installed “out of band” (admittedly, as “malloc uses excessive memory for multi-threaded applications” shows, the JDK insulation is sometimes not strong enough).
Just like people may look alike but do different things, so do nodes in your cluster. Doing different tasks translates to running different processes with potentially different configurations, and may also mean installing different software components.
Fortunately there are tools that automate preparation of a node configuration as well as making sure that exactly the right stuff runs.
I chose CFEngine. While its configuration language sucks, the underlying model (promise convergence) is cool and it has hardly any dependency on anything else (which is arguably the coolest feature of CFEngine). In short, the process of adding a node to the cluster means:
- Add the node to the policy server stored configuration (i. e. designate a purpose – which can be based on IP ranges for example – and hence be “automatic”).
- Throw CFEngine at the node (install one .deb or .rpm).
- Bootstrap the local CFEngine against the policy server.
All of this can even be done by an SSH-remoted script.
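As a sketch, such an SSH-remoted bootstrap could look like the following. The package name, policy server address, node name, and the exact `cf-agent` bootstrap invocation are assumptions – check them against your CFEngine version and distribution.

```shell
#!/bin/sh
# Sketch: generate a bootstrap script to ship to a new node.
# Node name, policy server, and cf-agent paths are hypothetical.
set -e

workdir=$(mktemp -d)
node=node17.example.org
policy_server=10.0.0.1

cat > "$workdir/bootstrap-node.sh" <<EOF
#!/bin/sh
# runs on the new node
dpkg -i cfengine-community.deb   # or: rpm -i cfengine-community.rpm
/var/cfengine/bin/cf-agent --bootstrap $policy_server
EOF
chmod +x "$workdir/bootstrap-node.sh"

# Shipping and running it is one scp/ssh pair (not executed here):
# scp "$workdir/bootstrap-node.sh" root@$node: && ssh root@$node ./bootstrap-node.sh
echo "wrote bootstrap-node.sh for $node"
```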
Once up, CFEngine will check every few minutes whether promises still hold. This includes configuration updates but also making sure your server is still running the right processes and services (assuming your promises are designed accordingly).
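A promise of the “right processes are running” kind might look roughly like this CFEngine 3 fragment. The bundle name, process pattern, and start script path are made-up placeholders, not taken from the actual setup:

```
bundle agent hbase_running
{
processes:
  # Promise: a process matching "hbase" exists; if not,
  # raise the start_hbase class so the command below fires.
  "hbase"
    restart_class => "start_hbase";

commands:
  start_hbase::
    "/home/cluster/hbase/bin/start-hbase.sh";
}
```

Because cf-agent re-evaluates promises on every run, a crashed daemon is restarted within minutes without any extra tooling.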
The other thing that is cool about CFEngine is that it can run in user space. That is, you could split your configuration into promises that require root permissions (which had better be few and robust) and leave the ugly stuff to user-space configuration. Anything you want to hide from user-space processes (e.g. private keys) then needs to be handled by the root-level part, with access blocked as much as possible.
Finally Some Words on Monitoring
Along with the ability to bring your solution to the nodes, to adapt it as needed, and to make sure all necessary things are up, you can make sure monitoring agents are there and running. Agent-based monitoring tools such as Ganglia and Zabbix provide a powerful information interface to the piece you are looking at. The “push from agent” model has other advantages:
- As agents call the monitoring hub, they may self-register
- If the agent doesn’t talk anymore, it is clear the node is in trouble
- Less contention: A node gone bad will not hamper information retrieval (i.e. no blocked and waiting connections).
In the setup above, Z2 is used as the execution environment for application code, either to run some actual service endpoint, to run background jobs, or to run MapReduce jobs as in How to Hadoop. Z2 is installed by a core checkout (as usual) and pulls its updates on its own.