Continuing on where I left it on my previous post, I’m going to explain how the Announcement service works and why we choose that approach.
The way JBoss and mod_proxy work now is that every time something changes in the topology, either a new proxy is added or removed or a JBoss node, then the proxy list has to be updated and both the node and the proxy have to be aware of their existence.
Mod_proxy is using multicast to announce itself to the cluster but as this is not supported on Windows Azure, we created our own service that runs on the proxy and on the node also. Each time a new proxy or a node is added/removed, the service notifies the rest of the instances that something changed in the topology and they should update their lists with the new record.
The service is not running under a dedicated WorkerRole but it’s part of the same deployment as the proxy and the JBoss node. It’s a WCF service hosted inside a Windows NT Service listening on a dedicated port. That approach gives us greater flexibility as we keep a clear separation of concern between the services on the deployment and we don’t mix code and logic that has to do with the proxy, with the Announcement service. Originally the approach of using an NT Service caused some concerns as how this service is going to be installed on the machines and how can we keep one single code base for that service, running on both scenarios.
First of all, you should be aware that any port you open through your configuration is only available to the host process of the Role. That means if the port is not explicitly open again on the firewall, your service won’t be able to communicate as the port it’s blocked. After we realized that, we fix it by adding an extra line to our Startup Task which was installing the service on the machines. The command looks like this:
which is part of the installer startup task
To make our service even more robust and secure we introduced a couple of NetworkRules that they only allow communication between Proxies and Jboss nodes:
Any kind of communication between the services is secured by certificate based authentication and message level encryption. The service it’s a vital component in our approach and we want it to be as secure as possible.
The service is monitoring a couple of things that helps us also collect telemetry data from the Jboss nodes, but it’s also wired to a couple of RoleEnvironment events like OnStopping and OnChanged. Everytime there is an OnStopping, we send messages out to all of the other service instances to de-register that proxy from their list because it’s going down. Also, the service itself is checking on specific intervals if the others nodes are alive. If they don’t respond after 3 times, they are removed. The reason we do this, is to handle possible crashes of the proxy as fast as possible. Lastly, everytime there is an OnChanged event fired, we verify that everything is as we know they should be (nodes available etc).
Next post in the series, the cluster setup.