Monday 25 May 2009

Still stuck with a JEE application server while using Spring?

The GigaSpaces application server offers a great alternative to commercial application servers, which are becoming overkill now that the resources of a full-fledged application server are often no longer needed. It is a common paradox nowadays to see JEE app-servers that are really just running Spring…


The migration effort requires some analysis. In many cases, such as a standard web application, the migration is seamless; in other cases, usage of specific app-server resources such as transaction management or connection pooling should be inspected and may require some configuration changes.


What are the benefits and what still needs to be resolved?


By simply deploying your web application into GigaSpaces you’ll benefit from an advanced, elastic runtime known as the GigaSpaces service grid, which is built on SLA-driven containers. The elastic runtime is essentially application-level virtualization (as opposed to KVM/VMware) which does not rely on static IPs or a specific hardware configuration.


What can the SLA driven container do for you out of the box?


It aligns with your SLA requirements, which can be based on CPU, memory, or custom business-logic monitoring beans. In practice that means specifying how many nodes should serve your application at any given time, with real-time provisioning of your existing commodity servers. No more provisioning guessing game that ends in costly over-provisioning or, worse, downtime due to under-provisioning.
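
To make the “custom business logic” part concrete, here is a minimal sketch of such a monitoring bean, in plain Java: a bean exposing an order-backlog gauge that an SLA rule could sample. The class and property names are made up for illustration; the actual SLA wiring is done in the processing unit's deployment descriptor, not in code.

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Illustrative monitoring bean: exposes a numeric gauge (the backlog
    // size) that an SLA rule could sample to decide when to scale out.
    public class OrderBacklogMonitor {

        private final Queue<Object> backlog = new ConcurrentLinkedQueue<Object>();

        public void enqueue(Object order) {
            backlog.add(order);
        }

        public Object take() {
            return backlog.poll();
        }

        // The property an SLA threshold would be defined against,
        // e.g. "scale out when backlogSize crosses 10,000".
        public int getBacklogSize() {
            return backlog.size();
        }
    }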


The SLA-driven container takes a proactive approach, scaling out on demand based on real-time events such as an order-backlog threshold or high CPU/memory utilization. But what happens if your server hardware just crashed? This is where self-healing kicks in. Self-healing is invoked automatically once the SLA is breached, for example by any loss of computational resources. In such an event the grid service manager will provision a new resource and re-deploy the missing application/service.
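
Conceptually, the healing pass boils down to something like the sketch below. This is purely illustrative; Deployment and Node are made-up interfaces, not GigaSpaces APIs, and in practice the grid service manager runs this logic for you.

    import java.util.List;

    // Illustrative only: the re-provisioning logic a grid service manager
    // performs when instances are lost. Not a GigaSpaces API.
    public class SelfHealingSketch {

        interface Node { void deploy(Deployment d); }

        interface Deployment {
            int plannedInstances();   // what the SLA demands
            int runningInstances();   // what is actually alive
        }

        private final List<Deployment> deployments;
        private final List<Node> spareNodes;

        public SelfHealingSketch(List<Deployment> deployments, List<Node> spareNodes) {
            this.deployments = deployments;
            this.spareNodes = spareNodes;
        }

        // One monitoring pass: every instance lost to a crash is re-deployed
        // onto a freshly provisioned resource.
        public void heal() {
            for (Deployment d : deployments) {
                int missing = d.plannedInstances() - d.runningInstances();
                while (missing-- > 0 && !spareNodes.isEmpty()) {
                    spareNodes.remove(0).deploy(d);
                }
            }
        }
    }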


High availability has many levels. In JEE it is normally based on database HA and a simplistic app-server restart invoked by a watchdog service. Another aspect, specific to web applications, is session replication, which is based on in-memory replication of the HTTP session objects across the entire application server cluster. GigaSpaces extends session-replication resiliency by clustering the application data itself in memory; this part normally involves some coding, depending on the complexity of the data model and transactions.
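
As a rough sketch of what that coding can look like, assuming the OpenSpaces GigaSpace API and a made-up SessionAttribute class: the session-scoped state is modeled as a space class and written to a clustered space, so it survives the loss of any single web container.

    import com.gigaspaces.annotation.pojo.SpaceClass;
    import com.gigaspaces.annotation.pojo.SpaceId;
    import org.openspaces.core.GigaSpace;
    import org.openspaces.core.GigaSpaceConfigurer;
    import org.openspaces.core.space.UrlSpaceConfigurer;

    public class SessionReplicationSketch {

        // Made-up session attribute modeled as a space class.
        @SpaceClass
        public static class SessionAttribute {
            private String sessionId;
            private String cart;

            public SessionAttribute() {}
            public SessionAttribute(String sessionId, String cart) {
                this.sessionId = sessionId;
                this.cart = cart;
            }

            @SpaceId
            public String getSessionId() { return sessionId; }
            public void setSessionId(String sessionId) { this.sessionId = sessionId; }

            public String getCart() { return cart; }
            public void setCart(String cart) { this.cart = cart; }
        }

        public static void main(String[] args) {
            // Embedded space for the example; in production the URL points
            // at the clustered, partitioned space with in-memory backups.
            GigaSpace gigaSpace = new GigaSpaceConfigurer(
                    new UrlSpaceConfigurer("/./sessionSpace").space()).gigaSpace();

            gigaSpace.write(new SessionAttribute("JSESSION-123", "3 items"));

            // Template match on the id; null fields match anything.
            SessionAttribute back =
                    gigaSpace.read(new SessionAttribute("JSESSION-123", null));
            System.out.println(back.getCart());
        }
    }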


Designing a scalable architecture


A scalable application is measured by its linearity. Trying to solve the problem at the database level or with JVM replication layers will not lead to linearity, as bottlenecks will just keep popping up wherever there is a stateful service or data. There are two fundamental requirements for scalability: first, scaling the entire application stack (not just the data); second, implementing a share-nothing approach (AKA a Processing Unit) per business unit.
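
In GigaSpaces terms the share-nothing split is usually expressed through a routing field on the space class: every object with the same routing value lands in the same partition, together with the services that process it. A minimal sketch, with a made-up Order class (the state field is plain data, used again in a later example):

    import com.gigaspaces.annotation.pojo.SpaceClass;
    import com.gigaspaces.annotation.pojo.SpaceId;
    import com.gigaspaces.annotation.pojo.SpaceRouting;

    // Made-up domain class: orders are partitioned by customer, so each
    // partition owns a disjoint slice of the data (share-nothing).
    @SpaceClass
    public class Order {
        private String orderId;
        private String customerId;
        private String state;

        public Order() {}
        public Order(String orderId, String customerId) {
            this.orderId = orderId;
            this.customerId = customerId;
            this.state = "NEW";
        }

        @SpaceId
        public String getOrderId() { return orderId; }
        public void setOrderId(String orderId) { this.orderId = orderId; }

        // All orders of a given customer route to the same partition and
        // can be processed there locally, with no cross-node chatter.
        @SpaceRouting
        public String getCustomerId() { return customerId; }
        public void setCustomerId(String customerId) { this.customerId = customerId; }

        public String getState() { return state; }
        public void setState(String state) { this.state = state; }
    }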


Trying to handle all of this normally involves clustering different technologies for messaging, data, and services. You'll quickly end up with a complex stack with many moving parts that must be integrated and maintained over time, each with its own fail-over strategy and a steep skill-set requirement.


Critical path approach


I often get questions and remarks about in-memory data clustering from a size and safety perspective. The remark usually goes like: “we have terabytes of data, we can't cache it all, and if we could it would cost too much”


Ideally a processing unit would hold its own service, data, and messaging. One must realize that this is not always possible, especially in complex business applications or with massive data sets (where it might not be cost effective). BUT there is a way out: the rule of thumb is to identify the critical path, both in terms of throughput and latency, and then use the tools available (such as the Space API and collocation) to crush the latency on that path and cache (and partition) its data.
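
Crushing latency on the critical path typically means processing events inside the partition that owns the data. With the OpenSpaces event-container annotations that looks roughly like the sketch below; Order (and its state field) is the made-up class from the earlier partitioning example.

    import org.openspaces.events.EventDriven;
    import org.openspaces.events.EventTemplate;
    import org.openspaces.events.adapter.SpaceDataEvent;
    import org.openspaces.events.polling.Polling;

    // Deployed collocated with the partition that holds the data, so the
    // take + process + write-back cycle never leaves the local JVM.
    @EventDriven
    @Polling
    public class OrderProcessor {

        // Which objects to take from the space: new (unprocessed) orders.
        @EventTemplate
        public Order unprocessedOrder() {
            Order template = new Order();
            template.setState("NEW");
            return template;
        }

        // Called for each matching order; the returned object is written back.
        @SpaceDataEvent
        public Order process(Order order) {
            order.setState("PROCESSED");
            return order;
        }
    }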


The soft points are the shared resources. No matter how optimized the replication mechanism is (“cool concurrent field-level”, object-level, or in-memory database binary-log replication), you can put Amdahl's law to the test and watch throughput per node diminish until you hit a dead end. If your solution is based on replication or master/slave, you'll probably feel it by the 2nd or 3rd node...
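
To see why, plug some numbers into Amdahl's law, speedup(N) = 1 / ((1 - p) + p / N), where p is the fraction of the work that parallelizes cleanly. A back-of-the-envelope calculation:

    // Amdahl's law: even a 10% serialized (shared/replicated) fraction
    // makes per-node throughput collapse as nodes are added.
    public class Amdahl {
        public static void main(String[] args) {
            double p = 0.9;  // 90% parallel, 10% hits a shared resource
            for (int n = 1; n <= 8; n *= 2) {
                double speedup = 1.0 / ((1.0 - p) + p / n);
                System.out.printf("%d nodes -> speedup %.2f (%.2f per node)%n",
                        n, speedup, speedup / n);
            }
        }
    }

With these assumptions eight nodes deliver less than 5x the throughput of one, and each node you add contributes less than the one before it.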


A short word about cost


Costs are measured along many dimensions, such as learning curve, time to market, hidden costs, TCO, ROI, ROA, etc.; there is a lot of public information about them. What I want to emphasize is future-proofing, existing skills, and open source.


Application servers as runtime platforms can no longer rely on hardware/database clustering; it is their job to take care of HA and scaling, and to meet the requirements while leveraging existing hardware. Real-time HA should not be an extra cost. IT is moving towards elasticity (the cloud) based on pure cost savings, and that environment requires different runtime capabilities in order to leverage the cloud; the main one is being agnostic to hardware provisioned on the fly (IP, disk, etc.).


A last word on open source, which I use on a daily basis. Open source has a dramatic impact on future standards through its bottom-up approach, so when I choose an app-server I want to verify that it supports open-source standards such as Spring. BUT as this market grows and commercial entities take part in it, you’ll soon find out some are using the “drug dealer approach”: giving you something for free (sometimes a very essential part of your system!), and then, when it comes to future requirements or moving to production, you discover that HA and other features are not part of the community package. And when you try to find out the costs, you may discover that there isn’t any list price available….