What is a Data Grid
Often when doing meetup presentations about Apache Ignite, I ask the crowd if anyone has ever heard of what a Data Grid is. I usually get only a few hands. However, when I flip the question and ask what Distributed Caching is, everyone in the room immediately raises their hands and nods in understanding. The reality is that a Data Grid can be viewed as a Distributed Cache with extra features, so if you do know what a Distributed Cache is, you probably already know a lot about Data Grids as well.
Generally, the term distributed cache means the ability to replicate data in memory so that it is accessible from anywhere in the cluster. Data Grids usually accomplish this by partitioning data in memory, where each cluster member is responsible only for its own subset of the data. You can also think of it as a distributed hash table. This way, the more servers you have in your cluster, the more data you can cache.
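To make the distributed hash table analogy concrete, here is a minimal single-process sketch of partitioning in Python. The names `PartitionedCache`, `partition_of`, and the `node-a`/`node-b`/`node-c` labels are hypothetical and chosen only for illustration. It also assumes the simplest possible mapping, a stable hash taken modulo the node count, which is not how any particular product assigns keys to nodes.

```python
import hashlib


def partition_of(key: str, partitions: int) -> int:
    """Map a key to a partition using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


class PartitionedCache:
    """Toy cache: each 'node' is a plain dict holding only its own keys."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.stores = {name: {} for name in self.node_names}

    def owner(self, key: str) -> str:
        # Assumption: one partition per node, chosen by hash modulo node count.
        return self.node_names[partition_of(key, len(self.node_names))]

    def put(self, key: str, value) -> None:
        self.stores[self.owner(key)][key] = value

    def get(self, key: str):
        return self.stores[self.owner(key)].get(key)


cache = PartitionedCache(["node-a", "node-b", "node-c"])
for i in range(1000):
    cache.put(f"key-{i}", i)

# Each node holds only a subset of the data; together they hold all of it.
sizes = {name: len(store) for name, store in cache.stores.items()}
print(sizes)
```

Because every node stores only the keys it owns, total capacity grows with the number of nodes. The modulo mapping is a simplification: changing the node count would remap most keys, so real data grids typically rely on more elaborate assignment schemes that move as little data as possible when the cluster changes.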
Data grids are generally known for having a fairly rich feature set on top of in-memory caches. The 3 main features that are absolutely mandatory for any data grid solution are:
collocation of compute and data
Without the above 3 features, you cannot really call a product a data grid. Many vendors also differentiate between each other by adding other popular features, including:
Off-Heap Memory (to avoid lengthy GC pauses)
Some of the popular Data Grid providers include Apache Ignite (incubating), Hazelcast, and Infinispan in the open source space, and Oracle Coherence and GridGain among the commercial offerings. GridGain is a commercial offering of Apache Ignite.
What is an In-Memory Data Fabric
In-Memory Data Fabrics represent the natural evolution of in-memory computing. Data Fabrics generally take a broader approach to in-memory computing, grouping the whole set of in-memory computing use cases into a collection of well-defined independent components. Usually a Data Grid is just one of the components provided by a Data Fabric. In addition to the data grid functionality, an In-Memory Data Fabric typically also includes a Compute Grid, CEP Streaming, an In-Memory File System, and more.
The main advantage of an In-Memory Data Fabric is that all of the provided in-memory computing components can be used independently, while being well integrated with each other. For example, in Apache Ignite a Compute Grid knows how to load-balance and schedule computations within a cluster, but when used together with a Data Grid, the Compute Grid will also route all the computations that process data to the cluster members responsible for caching that data. The same goes for Streaming and CEP - when working with streamed data, all the processing happens on the cluster members responsible for caching that data as well.
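The routing behavior described above, where a computation is sent to the cluster member that caches the data it needs, can be sketched in a few lines of Python. This is a single-process toy under stated assumptions and not Apache Ignite's API: `Node`, `Cluster`, and `affinity_run` are hypothetical names, and ownership is decided by a simple hash modulo the node count.

```python
import hashlib


class Node:
    """Toy cluster member holding its own slice of the cached data."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.executed = []  # keys of jobs that ran on this node

    def run(self, job, key):
        # The job reads the value from this node's local store, so the data
        # never has to travel to wherever the job was submitted.
        self.executed.append(key)
        return job(self.store[key])


class Cluster:
    def __init__(self, names):
        self.nodes = [Node(name) for name in names]

    def owner(self, key: str) -> Node:
        # Assumption: ownership is a stable hash of the key modulo node count.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, value) -> None:
        self.owner(key).store[key] = value

    def affinity_run(self, key: str, job):
        """Route the job to the node that caches `key` and run it there."""
        return self.owner(key).run(job, key)


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("orders:alice", [120, 80, 45])

# Sum the order amounts on whichever node owns the key.
total = cluster.affinity_run("orders:alice", sum)
print(total)
```

The design point is that the same ownership function serves both the cache and the scheduler: because `put` and `affinity_run` agree on who owns a key, the computation always lands next to its data.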
Commonly seen features of In-Memory Data Fabrics include:
Data Grid (must have for any Data Fabric)
Streaming & CEP
Distributed File System
Apache Ignite, an Apache Incubator project, is the only In-Memory Data Fabric available in the open source space. GridGain provides a commercial, enterprise edition of Apache Ignite that is targeted toward production, business-critical use cases.