Only a few short decades ago, the IT industry worked within technical limits that the steady hardware improvements described by Moore’s Law had not yet eased. Technology (especially data storage technology) was expensive, limited in its capabilities, and complicated to apply. Unless an enterprise had the significant financial means to overcome such hurdles, it usually had no real choice but to accept them and work within those limitations. Almost everyone can remember a time when storage was expensive and capacity calculations were critical for maximizing impact while minimizing costs. Keeping a single representation of data was difficult enough, and very few people could wrap their minds or their wallets around redundant representations of the same data in different technologies. It was simply not practical to pursue such options. Now, data redundancies are not only viable but are also becoming the norm. Welcome to the world of data Persistence Polyglots.
Defining a Polyglot
As a noun, a Polyglot is a combination or mixture of languages or nomenclatures.
The simplest example is a person who can speak many different languages. That person is a language polyglot (a.k.a. a polyglot).
Defining a Technology Polyglot
A Technology Polyglot is a combination or mixture of technologies that facilitate or represent the same things.
One example is a monitoring tool whose agents are written in different languages and deployed to different operating systems, yet all perform the same monitoring and data collection functions regardless of language or operating system.
Defining Polyglot Persistence
Polyglot Persistence (a.k.a. Polyglot Storage) is the paradigm of using multiple storage technologies to facilitate data storage.
Polyglot Persistence is often used as a paradigm in applications where different things are stored in different storage technologies to facilitate the interactions of one larger complex system. For example, in-memory storage or caching might be used to speed up access to smaller and more frequently accessed sets of data, while large back-end operational databases might be used to store all operational data.
Another important example of Polyglot Persistence (a paradigm) is what has come to be known as the Persistence Polyglot (i.e. a solution that is a manifestation of the paradigm).
Defining the Persistence Polyglot
The Persistence Polyglot (a.k.a. Storage Polyglot) is a storage solution that allows the same data to be redundantly represented (possibly in different ways) across different data and/or document storage technologies.
As noted earlier, this solution seems counter-intuitive and wasteful when compared to older design paradigms, which were forced to limit data storage to a single solution because of technology and cost limitations. However, it has benefits, discussed later in this document, that are driving its broad adoption and use.
The Purpose of the Persistence Polyglot
The purpose of the Persistence Polyglot is to allow the same data to be accessed and used in different ways, by different technologies and by different system components that need to work in different ways.
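To make that purpose concrete, here is a minimal sketch of one write landing in three differently shaped stores, each of which is then read in its own way. Everything in it is an assumption made for illustration: the `PersistencePolyglot` class, the `products` table, and the use of SQLite plus two dictionaries as stand-ins for a relational database, a document store, and a search index.

```python
import json
import sqlite3


class PersistencePolyglot:
    """Toy polyglot: every save is written to three differently shaped stores."""

    def __init__(self):
        # Stand-in for a relational database.
        self.relational = sqlite3.connect(":memory:")
        self.relational.execute(
            "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
        )
        self.documents = {}     # stand-in document store: id -> JSON string
        self.search_index = {}  # stand-in search index: word -> set of ids

    def save(self, product_id, name, price):
        # 1. Relational representation, suited to reporting queries.
        self.relational.execute(
            "INSERT OR REPLACE INTO products VALUES (?, ?, ?)",
            (product_id, name, price),
        )
        self.relational.commit()
        # 2. Document representation, suited to whole-record retrieval.
        self.documents[product_id] = json.dumps(
            {"id": product_id, "name": name, "price": price}
        )
        # 3. Search representation: drop stale entries, then index each word.
        for ids in self.search_index.values():
            ids.discard(product_id)
        for word in name.lower().split():
            self.search_index.setdefault(word, set()).add(product_id)


store = PersistencePolyglot()
store.save(1, "Red Widget", 9.5)
store.save(2, "Blue Widget", 12.0)

# The same data, used three different ways:
total = store.relational.execute("SELECT SUM(price) FROM products").fetchone()[0]
document = json.loads(store.documents[1])
matches = store.search_index.get("widget", set())
```

A reporting tool would use the relational copy, an API returning whole records would use the document copy, and a search box would use the index, with no translation layer between them. Note that the three writes here are not atomic; a production design would have to decide what happens when one of them fails.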
Tiny Data versus Big Data Persistence Polyglots
Tiny Data Persistence Polyglots usually focus on small and controlled data record sets (i.e. Data Instances) across a smaller set of Data Types. For example, if we have some marketing and sales applications that we group together, we may want them to share the same Persistence Polyglot that contains that marketing and sales data.
Big Data Persistence Polyglots tend to play in more of a data warehouse space. They have many different Data Types with massive quantities of Data Instances/Records across all those types. These polyglots tend to grow easily into the petabyte range and are often used for enterprise reporting, analytics, and data science activities. They are also used as sources for enterprise functions like Data Governance (DG), Master Data Management (MDM), and Data Lineage (DL), providing Static Reference Data and Dynamic Reference Data to the downstream systems that need them.
The Benefits or Pros of Using a Persistence Polyglot
There are numerous advantages that come with simultaneously and redundantly storing your data in different storage technologies:
- You do not have to write many different custom translators and integrators that are intended to tie data to downstream systems and tools that all leverage many different technologies.
- Because you don’t have to write as many translators and integrators, you can reuse many of the existing technologies that already work with your data (e.g. your reporting tools or your search tools). This extends the usable life of those previously purchased technologies and lets you get more out of your investment in them.
- Because you don’t have to build so many translators and integrators, building and delivering solutions becomes far simpler and quicker.
- By physically separating Persistence Polyglot technologies, your enterprise gains natural data redundancy, because the same data exists on different storage devices in different locations. If one technology fails, data can be copied from any working component of the polyglot to restore the failed one.
- You can see and use your data in many different ways with very little effort.
- Data Governance (DG), Master Data Management (MDM), Data Lineage (DL), and Records Management (RM) functions become much easier since certain database technologies are far easier to work with for such activities than other databases.
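The redundancy benefit in the list above can be sketched as a rebuild of one failed component from another that is still working. This is a toy illustration, not a recovery procedure: the `orders` table, the `rebuild_document_store` function, and the dictionary used as a document store are all hypothetical.

```python
import json
import sqlite3

# Working component: a relational copy of the data (in-memory SQLite stand-in).
relational = sqlite3.connect(":memory:")
relational.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
relational.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Ada", 30.0), (2, "Grace", 45.5)],
)
relational.commit()


def rebuild_document_store(conn):
    """Recreate the document representation from the relational copy."""
    rows = conn.execute("SELECT id, customer, total FROM orders ORDER BY id")
    return {
        order_id: json.dumps({"id": order_id, "customer": customer, "total": total})
        for order_id, customer, total in rows
    }


# Failed component: a document store that has lost its contents.
document_store = {}

# Recovery: copy the data back from the component that is still working.
document_store = rebuild_document_store(relational)
```

The rebuild works only because each record carries the same fields in both representations. If one store held data the others did not, that extra data could not be recovered this way.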
The Detriments or Cons of Using Persistence Polyglots
- While storage costs are very low compared to just ten years ago, using multiple storage technologies simultaneously will drive your solution’s cost up, because you will have to engineer, deploy, operate, and support each Persistence Polyglot component. However, this cost is generally considered to be offset, and sometimes completely eliminated, by the savings that come from not having to build as many custom downstream integrations.
- You will need data synchronization solutions, often ones you write yourself, to ensure that data stays aligned across each component of the polyglot.
- You and/or your enterprise will require skills that can deal with each technology in your Persistence Polyglot.
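The synchronization point in the list above is the one most likely to need custom code. The sketch below shows one possible approach, assuming that one store is designated as the primary and that each store can be viewed as a mapping from record id to a dictionary of fields. The function names `fingerprint`, `find_drift`, and `resync` are hypothetical.

```python
import hashlib
import json


def fingerprint(record):
    """Return a stable hash of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(primary, replica):
    """Return the ids that are missing from, extra in, or different in the replica."""
    drifted = set()
    for record_id in primary.keys() | replica.keys():
        expected = primary.get(record_id)
        actual = replica.get(record_id)
        if expected is None or actual is None:
            drifted.add(record_id)  # present on only one side
        elif fingerprint(expected) != fingerprint(actual):
            drifted.add(record_id)  # present on both sides but different
    return drifted


def resync(primary, replica):
    """Make the replica match the primary for every drifted id."""
    for record_id in find_drift(primary, replica):
        if record_id in primary:
            replica[record_id] = dict(primary[record_id])
        else:
            del replica[record_id]


primary = {1: {"name": "Ada", "tier": "gold"}, 2: {"name": "Grace", "tier": "silver"}}
replica = {
    1: {"tier": "gold", "name": "Ada"},      # same data, different key order
    2: {"name": "Grace", "tier": "bronze"},  # stale value
    3: {"name": "Stale"},                    # deleted from the primary
}
drift_before = find_drift(primary, replica)
resync(primary, replica)
```

Comparing fingerprints rather than whole records keeps the check cheap when the stores sit on different machines, since only hashes need to travel. Treating one store as the primary is a design choice made here for simplicity; a polyglot that accepts writes in several stores would need a conflict-resolution rule as well.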
Summary and Conclusions
- Years ago, Engineers and Computer Scientists would limit themselves to one selected database storage technology because of the constraints (e.g. costs and complexities) associated with using more than one. These limitations are now largely gone.
- Because these limitations have eased, it has become practical to use many different database storage technologies simultaneously, allowing data to be redundantly spread across them.
- In short, the advantage of such a redundant data solution is that your data is readily available in different forms that already comply with other pre-existing technology interfaces, ultimately eliminating many of the costs and complexities associated with transforming and integrating data with consuming technologies and systems.
- The downside is that you’ll require the skills to design, deliver, operate, and support many different data storage technologies, and you will have to keep the data synchronized across them. The intent is to offset this negative with the cost and time saved by not having to create and manage custom data translations and integrations for both upstream and downstream systems.