User-Friendly Production Updates

February is almost over – so little time to spend on this blog. Here is a short post on something cool we did a while back in Z2.

The original request was along the lines of:

Can we do a software upgrade without interrupting a user’s work?

This request comes from the not-so-insignificant class of applications in which users work on a rich but temporary state before a persistent change to the system can be made. In other words, applications that keep non-trivial session state, where kicking users out costs real time and nerves and is more than just unfriendly.

Still, what if an important update to an intranet application needs to be applied and, say, a group of users should try it out after lunch?

Applying a software upgrade to a running application without interfering with the work of currently logged-in users has some natural limitations. For example, smart data migrations are extremely hard to get right in a stateful scenario. But changes above the domain level might actually work.

In theory, the most natural approach would be to temporarily store user session data somewhere, “replace” the application, and load the session data back into memory. In practice, however, few complex Java applications use session serialization in a way that can be trusted to reliably save and restore a user session. Furthermore, while the application restarts there would still be some downtime that users may perceive as a system failure.

So, instead of doing something smart, why not do something really obvious:

Leave the application running until the last user has logged off (or has been logged off due to session expiration). Present the new application version to all users who log in after the update.
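
Conceptually, this policy needs very little bookkeeping. Below is a minimal Java sketch of it. It assumes that every session can be tied to the version it logged in to, and that logouts and expirations are reported back. The class SessionDrain and its methods are hypothetical names for illustration. This is not how Z2 implements it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical bookkeeping for the "leave it running" strategy: new logins
 * are bound to the latest version, older versions stay up until their last
 * session has ended. Not Z2 code; all names are illustrative.
 */
public class SessionDrain {
    private final Map<String, Set<String>> sessionsByVersion = new HashMap<>();
    private final Map<String, String> versionBySession = new HashMap<>();
    private String currentVersion;

    public SessionDrain(String initialVersion) {
        deploy(initialVersion);
    }

    /** An update arrives: all subsequent logins see newVersion. */
    public synchronized void deploy(String newVersion) {
        currentVersion = newVersion;
        sessionsByVersion.putIfAbsent(newVersion, new HashSet<>());
    }

    /** Binds a freshly logged-in session to the current version. */
    public synchronized String login(String sessionId) {
        end(sessionId); // drop any stale binding of the same id first
        sessionsByVersion.get(currentVersion).add(sessionId);
        versionBySession.put(sessionId, currentVersion);
        return currentVersion;
    }

    /** Logout or session expiration: the session no longer pins its version. */
    public synchronized void end(String sessionId) {
        String version = versionBySession.remove(sessionId);
        if (version != null) {
            sessionsByVersion.get(version).remove(sessionId);
        }
    }

    /** Old versions that nobody needs anymore and that may be shut down. */
    public synchronized List<String> retirable() {
        return sessionsByVersion.entrySet().stream()
                .filter(e -> !e.getKey().equals(currentVersion))
                .filter(e -> e.getValue().isEmpty())
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    /** Forgets a version after it has been shut down. */
    public synchronized void retire(String version) {
        if (retirable().contains(version)) {
            sessionsByVersion.remove(version);
        }
    }

    public static void main(String[] args) {
        SessionDrain drain = new SessionDrain("v1");
        drain.login("alice");                  // alice starts working on v1
        drain.deploy("v2");                    // the update is applied
        drain.login("bob");                    // bob logs in after the update: v2
        System.out.println(drain.retirable()); // prints [] (alice is still on v1)
        drain.end("alice");                    // alice logs off
        System.out.println(drain.retirable()); // prints [v1]
    }
}
```

In a servlet container, end(...) could be driven by an HttpSessionListener. Its sessionDestroyed callback fires both when a session is invalidated on logout and when it times out.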

In fact, the approach we took leaves the whole (frontend) application stack running until the last session running on it has become invalid. It is implemented by the Gateway module and described in detail in the wiki. Here is a short summary:

Normally, a Z2 instance in “server mode” runs at least two processes: a home process that serves as a watchdog and synchronization service, plus a webWorker node that runs the (Jetty) Web server and whatever Web applications are supposed to be up.

With the Gateway module, this setup is altered (by configuration): the actual Web server entry point now runs in the home process and forwards requests, session by session, to the actual webWorker process. Worker processes such as the webWorker can now be detached from the synchronization procedure. That is, instead of being stopped because of changes, detached worker processes are simply left untouched by any updates until nobody needs them anymore:
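
The post does not show the Gateway's internals. To make forwarding “on a by-session scheme” more concrete, here is a hypothetical Java sketch of the routing decision alone. It assumes sessions are identified by the servlet container's default JSESSIONID cookie and that workers are addressed by plain string names. SessionRouter, bind, and detach are illustrative names, not the Gateway module's API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of by-session forwarding in a gateway. Requests that
 * carry a known session go to the worker holding that session; all other
 * requests go to the current worker. Not the Gateway module's actual API.
 */
public class SessionRouter {
    // Assumption: sessions are identified by the servlet default cookie name.
    private static final String SESSION_COOKIE = "JSESSIONID";

    private final Map<String, String> workerBySession = new ConcurrentHashMap<>();
    private volatile String currentWorker;

    public SessionRouter(String initialWorker) {
        this.currentWorker = initialWorker;
    }

    /** Extracts the session id from a raw Cookie request header, if present. */
    static Optional<String> sessionId(String cookieHeader) {
        if (cookieHeader == null) {
            return Optional.empty();
        }
        for (String part : cookieHeader.split(";")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].equals(SESSION_COOKIE)) {
                return Optional.of(kv[1]);
            }
        }
        return Optional.empty();
    }

    /** Picks the worker for a request: the session's worker, else the current one. */
    public String route(String cookieHeader) {
        return sessionId(cookieHeader)
                .map(workerBySession::get) // empty if the session is unknown
                .orElse(currentWorker);
    }

    /** Records which worker created a session (e.g. seen in its Set-Cookie response). */
    public void bind(String sessionId, String worker) {
        workerBySession.put(sessionId, worker);
    }

    /** Logout or expiration: the session no longer needs its worker. */
    public void unbind(String sessionId) {
        workerBySession.remove(sessionId);
    }

    /** Detaches the current worker: existing sessions stay, new ones go elsewhere. */
    public void detach(String replacementWorker) {
        currentWorker = replacementWorker;
    }

    public static void main(String[] args) {
        SessionRouter router = new SessionRouter("worker-1");
        router.bind("abc", "worker-1"); // a user is working on worker-1
        router.detach("worker-2");      // update: worker-2 runs the new version
        System.out.println(router.route("theme=dark; JSESSIONID=abc")); // prints worker-1
        System.out.println(router.route(null));                          // prints worker-2
    }
}
```

Detaching only changes currentWorker and leaves existing bindings intact. Users with a session on the old worker therefore keep landing there until they log off.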

[Figure: steady_workers]

Using that mechanism, users can finish their current work and switch to the new version at their convenience by logging off and on again.

Finally, this is another cool showcase of how beneficial it is to control worker processes from within the execution environment.

References