Wednesday 3 June 2009

Data Partitioning using GigaSpaces

Data partitioning is the key to reaching linear scalability as data grows. Here is my take on GigaSpaces partitioning:

Each Data Grid instance (partition) holds a different subset of the objects in the Data Grid. When objects are written to the Data Grid, they are routed to the proper partition according to a predefined attribute of the object that acts as the routing index.

When a component (e.g. a data feeder or processor) writes, reads, or takes an object from the data grid, it uses GigaSpaces’ built-in “content-based routing” functionality, which means the operation is directed to the relevant partition based on the object’s routing index property.
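
As a rough sketch of what this looks like in code (the Trade class and its fields are invented for illustration, and the snippet assumes the GigaSpaces OpenSpaces POJO annotations of the XAP 6.x/7.x era; check the product documentation for the exact usage in your version):

    import com.gigaspaces.annotation.pojo.SpaceClass;
    import com.gigaspaces.annotation.pojo.SpaceId;
    import com.gigaspaces.annotation.pojo.SpaceRouting;

    @SpaceClass
    public class Trade {
        private String id;
        private Integer accountId;   // acts as the routing index
        private Double amount;

        public Trade() {}            // space classes need a default constructor

        @SpaceId(autoGenerate = false)
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        // All trades with the same accountId are routed to the same partition.
        @SpaceRouting
        public Integer getAccountId() { return accountId; }
        public void setAccountId(Integer accountId) { this.accountId = accountId; }

        public Double getAmount() { return amount; }
        public void setAmount(Double amount) { this.amount = amount; }
    }

A feeder then simply calls gigaSpace.write(trade) (or read/take with a template carrying the routing value); the clustered proxy picks the target partition from the routing value, so neither the feeder nor the processor needs to know the cluster topology.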

Having said that, one might come to the conclusion that the number of partitions can’t change. Well, that is true. BUT GigaSpaces can still achieve the effect of “dynamic re-partitioning” by using its ability to scale out. Looking into dynamic repartitioning, the actual requirement behind it is the ability to cope with a growing amount of data, which naturally consumes a growing amount of memory.
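
To see why the partition count has to stay fixed, consider a simplified sketch of how a routing value could be mapped to a partition (the real hashing scheme is internal to the product, so this is purely an illustration):

    // Illustration only: the routing property's hash code, reduced modulo the
    // number of partitions, selects the target partition. If the number of
    // partitions changed at runtime, the same routing value would suddenly map
    // to a different partition, which is why the count is fixed at deployment.
    static int partitionFor(Object routingValue, int numberOfPartitions) {
        return (routingValue.hashCode() & Integer.MAX_VALUE) % numberOfPartitions;
    }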

Using GigaSpaces you can host any (predefined) number of partitions on a single server, and when a memory shortage occurs the existing partitions can start the “scale-out procedure”, utilizing standby or new hardware available on the cloud. In other words: on data peak loads, GigaSpaces is able to relocate partitions to available computational resources using the GigaSpaces virtualized runtime, the Grid Service Container (GSC).
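
To put made-up numbers on this: a space deployed with 8 partitions can initially run on 2 GSCs, 4 partitions per container; when memory runs low, 2 more GSCs can be started on standby or cloud machines and half of the partitions relocated to them, halving the memory footprint per container while the partition count itself never changes.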

From an architectural perspective, having a static, predefined number of partitions leads to better data affinity, minimum latency in terms of network hops, and a natural load-sharing model that is more resilient. Compared with true dynamic repartitioning, one must consider the following: first, an additional cache instance does not lead to balanced load, because the new instance fills up slowly while the existing partitions continue to share the load and can still potentially run out of memory; secondly, in many cases there will be additional network hops to locate the distributed data (and will that be transactional?).
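
To put rough, made-up numbers on the first point: if four existing partitions each hold a million objects and a fifth instance is added under a scheme that only spreads new writes, the next 100,000 writes land at roughly 20,000 per instance, leaving the original four at about 1,020,000 objects each against about 20,000 in the newcomer; the old partitions still carry almost all of the data and remain the ones at risk of running out of memory.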


Mickey

8 comments:

  1. .. or use Coherence, which dynamically re-partitions as servers are added or removed ;-)

    Peace,

    Cameron Purdy | Oracle
    http://coherence.oracle.com

  2. As I pointed out in the Gojoko post, Coherence vs GigaSpaces, in response to the re-partitioning comment:

    “As far as I know this is the same with both products, only that we use explicit partition instances while Coherence uses implicit logical partitions. In both cases dynamic scaling means changing the number of running partitions per JVM container (GSC in our terminology). You can start with 100 partitions even if you have two machines, and spread those partitions as soon as more resources become available. When a machine goes down the system will not wait until a new machine becomes available - it will scale down to the existing containers as long as it detects that there is enough memory and CPU capacity available. I’m not sure how scaling down works with Coherence, but one thing to check is whether their scaling down could lead to out-of-memory issues in case there is not enough capacity on the available machines.

    I would refer to the main difference between the two approaches as black-box (Coherence) vs. white-box (GS). In our philosophy, clustering behaviour should be managed in the same consistent way across the entire application stack which means that when we manage a partition or when we scale the application we scale not just the data but the business logic and messaging and any other component that needs to be associated with that. Our experience showed that the black-box approach is simpler to get started with but can be fairly complex once you start to deal with scaling on other layers of the application such as the business logic, messaging layer or the web layer. In many cases this leads to different clustering models across the application tiers, which leads to more moving parts, complexity, etc. For example, in our case, if a data grid container or a web container crashes, the process of maintaining their high availability would be exactly the same."


    As of the XAP 7.0 release we also added the ability for users to write their own custom SLA and scaling behavior programmatically - see the reference here.
    This will enable you to monitor the entire application behavior, decide which threshold should trigger scaling out or down, automate the entire deployment, and manage self-healing when there is a failure.

    This also seems to be confirmed by one of the Coherence users in his comment:

    "P.S. Coherence’s partition count is also fixed and cannot be changed while the cluster is running."

  3. Nati -

    Your comments are both confusing and misleading, which is quite an accomplishment.

    To start with, Coherence partitions are not tied to threads, or nodes, or JVMs, or servers .. so your comment about Coherence partitions being fixed is like saying "the size of the earth is fixed, so you can't add a new room to your house."

    You also say, "clustering behaviour should be managed in the same consistent way across the entire application stack which means that when we manage a partition or when we scale the application we scale not just the data but the business logic and messaging and any other component that needs to be associated with that," which is exactly how Coherence works, and has always worked, since it introduced these capabilities to the market :-)

    Peace,

    Cameron Purdy | Oracle Coherence
    http://coherence.oracle.com/

  4. Cameron, it's funny, but I could argue that continuously posting comments about your dynamic clustering, without any details on the limitations and assumptions behind it such as the fixed number of logical partitions, is confusing and misleading ...

    You seem to keep making vague comments about how Coherence handles application-level scaling and clustering. Just saying it's a great product that does magic is not helpful.

    Perhaps you can answer the following questions:

    1. If I want to make sure that the primary and backup copies of my data are not running in the same data center, how do I do that?

    2. How do I know where each data item "lives" in the cluster?

    3. If a node fails and the entire cluster is at full capacity, what would happen?

    4. When a node moves from backup to primary state, how would the business logic associated with it change its state accordingly?

    5. How can I write my custom SLA, i.e. trigger scaling of the cluster when an application reaches a certain threshold?


    Nati S.

  5. Nati -

    Since I answered most of these already on TSS, I'll just cut and paste here for you:

    1. Is there a way to control whether data migration will happen only when there is a real need for it (not when a new JVM joins the cluster)?

    It's going to be hard to explain this to you, since Gigaspaces has such a different architecture, resembling more of a federated system (e.g. routing by the clients, a la memcached) than a cluster (a la Coherence). In Coherence, the location and load-balancing of data is performed asynchronously and transparently to the application, without data loss or corruption, and *without relaxing write consistency*. When a server joins the cluster and it is configured to manage data, that is what it does, so if there is "no real need for it", then the server wouldn't be there.

    2. Is there a way to control that primary and backup will not be provisioned in the same data center?

    Yes, but we do not suggest clustering to be used across two data centers (again, because our clustering is not just a bunch of servers federated by clients, but an actual cluster).

    Our Push Replication feature is typically used instead; see the Oracle Incubator for more details. High availability across a WAN is certainly one of our main selling points for our data grid edition, since (we've been told by customers) *it's the only solution for achieving HA data grids across multiple data centers*.

    3. What will happen to my data if a machine goes down and the existing machines are already at full capacity?

    Running Gigaspaces or running Coherence? ;-)

    Coherence supports off-heap storage, overflow, etc. It has for years.

    Normally though you'd use an N+1 capacity plan, and (e.g. with Oracle Enterprise Manager or WebLogic Liquid Operations Control) that would ensure that there was always an extra server, even if one went down.

    Honestly though, if you're running a system at full capacity, you should not be running systems .. ;-)

    4. When a node moves from backup to primary state, how would the business logic associated with it change its state accordingly?

    With Coherence, all nodes with a storage role can manage both primary and backup information, and when information moves from backup to primary there are both data events and partition events that the application can respond to. When a certain large bank moved from a certain failed "SBA" implementation to Coherence, this is exactly how they solved their HA problems, by modeling their foreign exchange system as a finite state machine, and using the events on failover to relocate their market matching engines.

    5. How can I write my custom SLA, i.e. trigger scaling of the cluster when an application reaches a certain threshold?

    Coherence Suite includes this capability out-of-the-box, and has for some time. See the answer to #3 above.

    Peace,

    Cameron Purdy | Oracle Coherence
    http://coherence.oracle.com/

  6. Cameron

    See my detailed response on TSS here

    "Coherence Suite includes this capability out-of-the-box"

    Can you point me to Coherence Suite?

    "when information moves from backup to primary there are both data events and partition events that the application can respond to"

    What will happen during the period when the partition has become primary but the event has not yet arrived at the listener?

    Nati S.
    GigaSpaces

  7. Nati -

    I responded to you on TSS:

    http://www.theserverside.com/news/thread.tss?thread_id=54903#310134

    > Can you point me to Coherence Suite?

    Coherence Suite used to be called WebLogic Application Grid.

    > What will happen during the period when
    > the partition has become primary but the
    > event has not yet arrived at the listener?

    What listener? Or perhaps I should ask "which listener"? The listeners I was referring to are those _inside_ the partition itself.

    Coherence partitioning (like all of our services) is modeled as a finite state machine. To the entire cluster, the partition is never in an "in between" state. The event arrives to listeners outside of the partition at the instant that the partition is primary. (Consider that in a distributed system, the concept of "the instant" is actually a concept of data flow, not of wall clock time.)

    Peace,

    Cameron Purdy | Oracle Coherence
    http://coherence.oracle.com/

  8. Cameron,

    I responded to you on TSS:

    http://www.theserverside.com/news/thread.tss?thread_id=54903#332217

    In summary:
    "
    For Pete's sake Cameron, how can you spew this BS and keep a straight face? Either you're willfully ignoring facts you don't like to hear or have a VERY selective way of assimilating information about competitors. You can hide behind "our customer tells us" . . . pretty lame. RAISE YOUR GAME!
    "

    Dear Readers: I post this here because it happens to be where I noticed Cameron's comments first, and I felt they were seriously stepping over the line.

    Cheers,

    Gideon Low
    Principal Architect -- Customer Facing Solutions
    GemStone Systems
    http://community.gemstone.com
    gideon.low-AT-gemstone.com
