With the advancements in semantic technologies, CMDBs are moving out of the depths of IT Infrastructure and Operations organizations and into areas of businesses that are further up the value chain to help solve complex knowledge management problems. Being able to do this requires a very clear understanding of CMDBs. This publication covers what a CMDB is and is not, their purposes, their common capabilities and features, their different design paradigms, their different implementation models, important use cases, how they can be used to solve higher-value business problems, and different advantages and disadvantages that any professional thinking of implementing a CMDB should be familiar with.
Defining a CMDB
At a minimum, a Configuration Management Database (CMDB) is a Knowledge Management (KM) tool that allows people to see and explore data and the relationships within and between such data. Some CMDBs go beyond this basic definition and offer capabilities like content embedding, analytical reporting, interactive data visualizations, and automated Semantic Relationship Harvesting and Creation. However, the minimum capabilities of a CMDB are those that allow end users to see and explore data and data relationships. Anything else is icing on the cake.
That said, not all CMDBs are semantic. For example, some do not use Semantic Keys for intuitive identifiers or Semantic Relationship Descriptors (a.k.a. Predicates) to help capture context and meaning within relationships (a topic that is addressed in more detail later in this publication).
Configuration Management Vernacular
For reference, terms that are commonly used with Configuration Management and CMDBs are:
- A data type is called a Configuration Item Type (CIT)
- A data record of a specific data type is called a Configuration Item Instance (CII or often just CI)
- A relationship between any two data records is called a CI Relationship (CIR).
- A non-semantic relationship is just referred to as a Relationship.
- An attribute or field within a CII is a CI Attribute (CIA) or CI Field (CIF).
We will use these terms and acronyms throughout this publication.
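To make the vocabulary concrete, the following minimal Python sketch models these terms as plain data classes. The class and field names (`ConfigurationItemType`, `ConfigurationItem`, `CIRelationship`, `predicate`) and the sample keys are illustrative assumptions, not the schema of any particular CMDB product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ConfigurationItemType:
    """A CIT: the data type, e.g. 'Person' or 'Product'."""
    name: str


@dataclass
class ConfigurationItem:
    """A CII (or CI): one data record of a given CIT."""
    key: str                                        # hypothetical record identifier
    cit: ConfigurationItemType
    attributes: dict = field(default_factory=dict)  # CIAs / CIFs


@dataclass(frozen=True)
class CIRelationship:
    """A relationship between two CIIs; `predicate` is None when it is non-semantic."""
    source_key: str
    target_key: str
    predicate: Optional[str] = None

    @property
    def is_semantic(self) -> bool:
        return self.predicate is not None


person = ConfigurationItemType("Person")
product = ConfigurationItemType("Product")
jane = ConfigurationItem("jane-doe", person, {"full_name": "Jane Doe"})
xyz = ConfigurationItem("myproduct-xyz", product, {"name": "MyProduct XYZ"})
owns = CIRelationship(jane.key, xyz.key, predicate="is a Product Owner for")
```

In this sketch a relationship with no predicate stands for a plain, non-semantic Relationship, while one with a predicate carries the descriptive context.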
Alternate names for CMDB
Alternate or synonymous names for Configuration Management Database (CMDB) include but are not limited to:
- Semantic Knowledge Base
- Semantic Knowledge Repository
- Semantic Repository
- Semantic Database
- Semantic Library
- Digital Library
What a CMDB is not
While a CMDB can be a powerful solution for visualizing relationships between people, places, and things, it is not the only type of Configuration Management (CfM) tool available for solving CfM problems. To be clear, no single tool solves all CfM problems or requirements. For example, a CMDB is almost never designed to capture sequence or state, and those few that are designed to do so rarely perform such functions well. (As technologies evolve, this statement may change in the future.)
For these reasons, you should be aware that other tools do exist to solve other CfM problems. For example, there are software versioning, software build, and software release management tools, which all fall into CfM. There are also robots that can build things like computers or engines, which also are CfM tools. A CMDB is simply (or maybe not so simply) another tool that helps solve just a few of the many different problems related to the discipline of CfM. Therefore, a CMDB is often coupled with other CfM tools to help enterprises with the broader problems of CfM that are important to them.
Understanding CMDB Relationships
A good CMDB allows for and handles different relationship permutations. For example:
- Circular Relationships
- One-to-One Relationships
- One-to-Many Relationships
- Many-to-One Relationships
- Many-to-Many Relationships
Relationships are usually stored in one of two forms: Non-Semantic form or Semantic form.
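As a rough illustration of the permutations listed above, the sketch below classifies a set of relationships by cardinality and checks whether they are circular. It assumes relationships are given as directed `(source, target)` pairs; the function names `cardinality` and `has_cycle` are hypothetical helpers, not features of any specific CMDB.

```python
from collections import defaultdict


def cardinality(pairs):
    """Classify (source, target) pairs as one-to-one, one-to-many, and so on."""
    targets_of = defaultdict(set)
    sources_of = defaultdict(set)
    for source, target in pairs:
        targets_of[source].add(target)
        sources_of[target].add(source)
    fan_out = any(len(t) > 1 for t in targets_of.values())  # one source, many targets
    fan_in = any(len(s) > 1 for s in sources_of.values())   # many sources, one target
    if fan_out and fan_in:
        return "many-to-many"
    if fan_out:
        return "one-to-many"
    if fan_in:
        return "many-to-one"
    return "one-to-one"


def has_cycle(pairs):
    """Return True if following relationships can lead back to the starting record."""
    graph = defaultdict(set)
    for source, target in pairs:
        graph[source].add(target)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))
```

A CMDB that stores relationships as a graph needs this kind of cycle awareness so that traversal of circular relationships terminates.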
Non-Semantic Relationships versus Semantic Relationships
There are two types of relationships that can be supported between CMDB data records (CIIs). They are:
- Non-Semantic Relationships (NSRs): NSRs are connections or relationships between two data records that have no descriptive context to help humans understand meaning or purpose. For example, we can relate a specific person “Jane Doe” to a specific product “MyProduct XYZ” but we have no context for how the two are connected. Humans don’t know if Jane is a Product Owner, a Subject Matter Expert, or a Technical Support Contact for that product.
- Semantic Relationships (SRs): SRs are connections or relationships between two data records that do have descriptive context, which helps humans understand meaning or purpose. For example, we can explicitly relate a specific person “Jane Doe” with a specific context “Product Owner” to a specific product “MyProduct XYZ”. This allows humans to clearly understand that, in this very specific case, “Jane Doe is a Product Owner for MyProduct XYZ”. There can also be other distinct relationships that exist simultaneously: “Jane Doe is a Subject Matter Expert for MyProduct XYZ” and “Jane Doe is a Technical Support Contact for MyProduct XYZ”.
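The difference between the two relationship types can be sketched in a few lines of Python. The tuple layout and the helper names `describe` and `roles_of` are assumptions made for illustration only.

```python
# Non-semantic relationship: a bare pair with no descriptive context.
nsr = ("Jane Doe", "MyProduct XYZ")

# Semantic relationships: (subject, predicate, object) triples.
srs = [
    ("Jane Doe", "is a Product Owner for", "MyProduct XYZ"),
    ("Jane Doe", "is a Subject Matter Expert for", "MyProduct XYZ"),
    ("Jane Doe", "is a Technical Support Contact for", "MyProduct XYZ"),
]


def describe(triple):
    """Render a semantic relationship as a readable sentence."""
    subject, predicate, obj = triple
    return f"{subject} {predicate} {obj}"


def roles_of(subject, obj, triples):
    """List every predicate that connects `subject` to `obj`."""
    return [p for s, p, o in triples if s == subject and o == obj]
```

With only `nsr`, a reader knows the two records are connected but not how; `roles_of("Jane Doe", "MyProduct XYZ", srs)` returns all three distinct contexts that exist simultaneously.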
CMDB Relationship Harvesting
Getting to the point where an enterprise can successfully harvest relationships (i.e. identify, collect, create and maintain them) between data records is not a simple task because:
- there are far too many different data types that have different existence traits,
- there could be endless quantities of relationships to harvest, and
- both data and relationships are constantly changing for most enterprises.
There are two methods for harvesting and maintaining relationships:
- Manual Relationship Harvesting and Maintenance
- Automated Relationship Harvesting and Maintenance
Manual harvesting and maintenance of relationships usually breaks down very quickly. There are simply far too many relationships to collect and to update as underlying dependencies change (which happens frequently). While there are some exceptions (such as the data created in Enterprise Architecture functions), it is considered a very bad practice to attempt to manually harvest and maintain all your relationships. This is especially true if you intend to leverage Semantic Relationships, which add a much higher level of complexity.
Automated harvesting leverages Auto-Discovery Tools (ADTs) and Semantic Engines to harvest data relationships from specific data domains. For example, network crawlers can crawl computer and telecommunications networks for technical CIIs and the CIRs between them, but they cannot identify or harvest relationships for non-discoverable Business and Industry CIIs and CIRs. Those commonly require different solutions, like Semantic Engines, which can more easily discover relationships within and across data records but do not crawl networks. Hence, different tools solve different problems, and a holistic tool approach should always be considered.
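A very small, rule-based harvester gives a feel for what automated harvesting from data records involves. This is only a sketch under stated assumptions: records are dictionaries keyed by a hypothetical identifier, and `rules` maps an attribute name to the predicate it implies. Real Auto-Discovery Tools and Semantic Engines are far more sophisticated.

```python
def harvest_relationships(records, rules):
    """Derive (source, predicate, target) triples from attribute values.

    records: {key: {attribute: value}}
    rules:   {attribute: predicate}; an attribute whose value is the key of
             another known record yields a relationship with that predicate.
    """
    harvested = []
    for key, attributes in records.items():
        for attribute, predicate in rules.items():
            target = attributes.get(attribute)
            if target in records and target != key:
                harvested.append((key, predicate, target))
    return harvested


# Hypothetical sample data; none of these names come from a real system.
records = {
    "billing-app": {"runs_on": "srv-001", "owner": "jane-doe"},
    "srv-001": {"location": "dc-east"},
    "jane-doe": {},
    "dc-east": {},
}
rules = {
    "runs_on": "is hosted on",
    "owner": "is owned by",
    "location": "is located in",
}
triples = harvest_relationships(records, rules)
```

Because the harvester is rerun against fresh records, the relationships are regenerated rather than edited by hand, which is the main point of automating the process.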
Important CMDB Use Cases
There are many different CMDB Use Cases that you should be aware of and personally evaluate when selecting a CMDB for your enterprise. Some are associated with implementation while others are associated with use and benefit. The most critical Use Cases include the following:
- Stand-alone single-instance CMDB model
- Federated multi-instance CMDB model
- CMDB Branding
- Data Diversity
- Creation of CI Templates for CI collection and organization
- Configurable Data Indexing
- Automated Semantic Relationship Generation and Modification
- Out-of-the-Box and Ready-to-Use Business Intelligence
- Extensible Custom Reports and Data Visualizations
- Advanced Tiny Data and Big Data Analytics
- Content Management for Configuration Items
- Implementation (a.k.a. Simplicity, Speed, Comprehensiveness, Quality & Cost)
Loading Your CMDB With Data
The data types that are defined and tracked within a CMDB are called Configuration Item Types (CITs). Examples of different CI Types can be explored in the article: Understanding Configuration Item Types (CI Types) for CMDBs.
Data records for a specific CIT are loaded into the CMDB and represent the known inventory of Configuration Item Instances for that CIT. So, for example, if we have a CIT called Application (Singular Form) or Applications (Plural Form), we would want to load every Application instance we know of into the CMDB.
Loading the CMDB with CIIs alone has rather limited use, since one of the key capabilities of a CMDB is to view relationships within and across CIIs. These relationships are called CI Relationships (CIRs).
CIRs are loaded into the CMDB manually or through numerous different automated tools that know how to handle CITs, CIIs, CIAs, and CIRs. The industry calls this process of collecting and creating relationships, either manually or automatically, Relationship Harvesting. If harvesting Semantic Relationships, this process is more specifically called Semantic Relationship Harvesting.
When the CMDB is properly loaded with data, that data will be assembled into a computer science data structure known as a Data Graph (DG) or Data Network (DN) for non-semantic data, or a Semantic Data Graph (SDG) or Semantic Data Network (SDN) for semantic data. Upon creation of such graphs/networks, you should be able to view and explore relationships for any CII. For example:
- 360 Degree View of an Application, which represents a technical CII, or a
- 360 Degree View of a Capability, which represents a Business CII.
From any given CII in context, you should also be able to explore (e.g. “drill into”) any related CII nodes to see other data and relationships associated with them.
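The graph structure and the 360 Degree View described above can be sketched as follows. The function names `build_graph` and `view_360`, and the sample triples, are assumptions for illustration; an actual CMDB would present this through its own user interface.

```python
from collections import defaultdict


def build_graph(triples):
    """Index triples by node so each CII can list outgoing and incoming relationships."""
    graph = defaultdict(lambda: {"outgoing": [], "incoming": []})
    for source, predicate, target in triples:
        graph[source]["outgoing"].append((predicate, target))
        graph[target]["incoming"].append((predicate, source))
    return graph


def view_360(graph, key):
    """Return everything directly related to one CII, in both directions."""
    node = graph.get(key, {"outgoing": [], "incoming": []})
    return {
        "ci": key,
        "outgoing": list(node["outgoing"]),
        "incoming": list(node["incoming"]),
    }


# Hypothetical sample data mixing technical and business CIIs.
triples = [
    ("billing-app", "is hosted on", "srv-001"),
    ("billing-app", "supports", "invoicing-capability"),
    ("jane-doe", "is a Product Owner for", "billing-app"),
    ("srv-001", "is located in", "dc-east"),
]
graph = build_graph(triples)
app_view = view_360(graph, "billing-app")
# "Drilling into" a related node is simply another 360 view from that node.
server_view = view_360(graph, "srv-001")
```

Indexing both directions is a deliberate choice here: a 360 Degree View needs to show what a CII points to as well as what points to it.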
It should be noted that no single tool will solve all relationship harvesting issues. For example, tools that know how to crawl a network for technical assets rarely know how to collect and create relationships in data that is logical and off-network, such as Applications, Capabilities, Market Segments, Contracts, etc. For this reason, it is very important to have a diverse set of solutions for harvesting and maintaining both CIIs and CIRs.
CMDB Data Diversity
One CMDB use case that is definitely worth covering, here, is that of Data Diversity. CMDB Data Diversity means that a CMDB is designed to handle many different data types or Configuration Item Types (CITs) along with their associated CI Instances (CIIs) and CI Relationships (CIRs). The more diverse your data becomes, the more powerful the CMDB becomes as a knowledge management solution because, for example, you have a broader base of data for complex analytics, informatics, data science, semantic search, natural language processing, and much more. (NOTE: Data Diversity is a key requirement for successful Big Data implementations.)
When assessing your CMDB for Data Diversity it is recommended that you evaluate for three types or categories of data:
- Technical Data: This type of data represents the bulk of the tools and technologies that are used by IT professionals and IT organizations to facilitate business functions. Examples include but are not limited to: Software, Computing Equipment, Communications Equipment, Network Equipment, Data Center Equipment, etc.
- Business Data: This type of data is common to most business functions, regardless of their industry. These data types represent things like People, Organizations, Products, Services, Capabilities, Markets, Market Segments, Contracts, Initiatives, etc.
- Industry Data: This type of data is specific to one or more vertical industries that an enterprise operates in. For example: Pharmaceutical Industry Data, Insurance Industry Data, Retail Industry Data, Medical Industry Data, etc. A great CMDB will easily accommodate whatever industry data your enterprise needs.
The best CMDBs easily handle any and all combinations of the above data types.
Fact: Most vendor CMDB implementations (not all) are very limited in their ability to mix highly diverse sets of CITs, together. For example, most CMDBs work very well with technical data, like data collected from crawling networks, but have a difficult time with non-technical data, such as business data and industry-specific data. In other words, many CMDBs have a very difficult time mixing Pharmaceutical, Banking, Medical, Insurance, or Retail data with Technical Data. There are, however, some CMDBs that are explicitly designed to handle Data Diversity (See an Example of a CMDB that supports Data Diversity).
For these reasons, when evaluating CMDBs for an enterprise, it is highly recommended that you be very clear about how diverse your own data sets will need to be. Will your enterprise require that its CMDB hold only technical data? Does it require only non-technical data? Does it require both technical and non-technical data? (Most mature enterprises strive for a CMDB solution that easily loads and works with technical and non-technical data.)
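One simple way to assess Data Diversity is to count the loaded CITs per category and flag any category with no coverage. The sketch below assumes a hand-maintained mapping from CIT name to category; the mapping, the CIT name "Insurance Policies", and the function name `diversity_report` are all hypothetical.

```python
# Hypothetical mapping of CIT names to the three categories discussed above.
CATEGORY_OF_CIT = {
    "Software": "technical",
    "Network Equipment": "technical",
    "People": "business",
    "Contracts": "business",
    "Insurance Policies": "industry",
}


def diversity_report(loaded_cits, category_of_cit=CATEGORY_OF_CIT):
    """Count loaded CITs per category and flag categories with no coverage."""
    counts = {"technical": 0, "business": 0, "industry": 0}
    unclassified = []
    for cit in loaded_cits:
        category = category_of_cit.get(cit)
        if category is None:
            unclassified.append(cit)
        else:
            counts[category] += 1
    missing = [name for name, count in counts.items() if count == 0]
    return {"counts": counts, "missing": missing, "unclassified": unclassified}


report = diversity_report(["Software", "Network Equipment", "People"])
```

For this sample input, the report shows technical and business coverage but lists the industry category as missing, which is the kind of gap the evaluation questions above are meant to surface.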
Repository-based CMDB versus Compiler-based CMDB Design Paradigms
There are normally two unique paradigms that are used to design CMDBs. The first is the more traditional repository-based paradigm and the second is a compiler-based paradigm.
A repository-based CMDB is one that includes and uses its own storage technology; usually a database technology. The CMDB application is built around and is wholly integrated with the database, all as part of a traditional multi-tier application architecture.
The storage technologies in these CMDB applications are designed with fixed and sometimes extensible schematic (schema) structures. This means that in this paradigm CITs, CIAs, CIIs and CIRs are all stored in the stand-alone repository with a fixed schema so that they can be accessed and leveraged by the core application, at a later time.
The primary advantages of repository-based CMDBs are:
- They usually provide transactional integrity (which many will argue is never needed).
- They allow for custom queries that can be written against the data held within their structures.
The primary disadvantages of a repository-based model are:
- They raise the complexity of the design and implementation.
- As the data grows within them, their complexity rises and they become much harder to maintain, change, and work with (especially since their schematic/schema structures are often fixed).
- Because they get more difficult to maintain and change, it takes a very long time to change them, meaning very long change and release cycles (think entire development release cycles).
- Because of their fixed schemas, they are very inflexible and often make it very difficult to store, combine, and correlate data that comes from different domain spaces (e.g. it is very difficult to mix technical, business, and industry data).
- They are very expensive to implement, integrate, own, and maintain.
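To illustrate the repository-based paradigm, here is a minimal sketch that uses Python's built-in `sqlite3` module as the integrated storage. The table and column names (`cit`, `cii`, `cir`, `ci_key`) and the sample rows are assumptions; real products use far richer schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cit (name TEXT PRIMARY KEY);
    CREATE TABLE cii (
        ci_key TEXT PRIMARY KEY,
        cit    TEXT NOT NULL REFERENCES cit(name),
        name   TEXT NOT NULL
    );
    CREATE TABLE cir (
        source    TEXT NOT NULL REFERENCES cii(ci_key),
        predicate TEXT,              -- NULL for a non-semantic relationship
        target    TEXT NOT NULL REFERENCES cii(ci_key)
    );
""")

# A transaction: either every row below is stored or none is.
with conn:
    conn.executemany("INSERT INTO cit VALUES (?)", [("Application",), ("Server",)])
    conn.executemany(
        "INSERT INTO cii VALUES (?, ?, ?)",
        [("billing-app", "Application", "Billing"), ("srv-001", "Server", "Server 001")],
    )
    conn.execute(
        "INSERT INTO cir VALUES (?, ?, ?)", ("billing-app", "is hosted on", "srv-001")
    )

# A custom query: which applications relate to which other items, and how?
rows = conn.execute("""
    SELECT a.name, r.predicate, s.name
    FROM cir AS r
    JOIN cii AS a ON a.ci_key = r.source
    JOIN cii AS s ON s.ci_key = r.target
    WHERE a.cit = 'Application'
""").fetchall()
```

The sketch shows both advantages (transactions and custom queries) and hints at the main disadvantage: storing a new kind of attribute means changing the schema first.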
A compiler-based CMDB is one that can be installed and run just like any other software compiler. Instead of compiling software, compiler-based CMDBs compile data (along with processing rules) and automatically generate massive quantities of electronic CMDB documentation.
A very important distinction between the repository-based and compiler-based CMDB is that, in the case of a repository-based CMDB the tool or application (along with its integrated repository) is the CMDB, while in the case of a compiler-based CMDB the tool generates a new and stand-alone CMDB every time it runs (e.g. think traversable electronic documentation).
The latter distinction is very important to understand: A compiler-based CMDB is automatically generated or synthesized by a compiler. In other words, one tool that is a compiler automatically generates another tool that is the CMDB. The CMDB is an output. In the compiler paradigm, you provide your CITs, CIIs, CIAs, and CIRs to the compiler and it (the compiler) assembles them into a CMDB (using a paradigm known as Data Driven Synthesis). In fact, the best compilers will identify and harvest CI Relationships (CIRs) that exist in your data, for you.
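The compiler idea can be illustrated with a toy batch step that takes CIIs and CIRs as input and generates cross-referenced pages as output. This is a sketch only: `compile_cmdb`, the page format, and the sample data are assumptions, and it does not attempt to reproduce Data Driven Synthesis or any vendor's compiler.

```python
def compile_cmdb(items, triples):
    """Batch-generate one plain-text page per CII from the input data.

    items:   {key: {attribute: value}}
    triples: [(source_key, predicate, target_key), ...]
    Returns {file_name: page_text}; the caller decides where to store it.
    """
    pages = {}
    for key, attributes in items.items():
        lines = [f"CI: {key}"]
        lines += [f"  {name}: {value}" for name, value in sorted(attributes.items())]
        lines.append("Relationships:")
        for source, predicate, target in triples:
            if source == key:
                lines.append(f"  {key} {predicate} {target} (see {target}.txt)")
            elif target == key:
                lines.append(f"  {source} {predicate} {key} (see {source}.txt)")
        pages[f"{key}.txt"] = "\n".join(lines)
    return pages


# Hypothetical input data.
items = {
    "billing-app": {"cit": "Application", "name": "Billing"},
    "srv-001": {"cit": "Server", "name": "Server 001"},
}
triples = [("billing-app", "is hosted on", "srv-001")]
pages = compile_cmdb(items, triples)
```

Each run produces a complete, stand-alone set of pages, so the output of every compilation is a new CMDB rather than an update to an existing one.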
Unlike a repository-based CMDB, which has its own data storage, the results of a compiler-based CMDB can be placed wherever you’d like, such as but not limited to: a file system, many different relational databases, many different NoSQL databases, other systems, etc. In fact, you can use the outputs of your compiled CMDB(s) to create what is called a Big Data Lake, which redundantly stores your data in different technologies that, combined, are called a Persistence Polyglot, as mentioned in an earlier section of this document.
It should be noted that compiler-based CMDBs, like all other forms of compilers, are also considered to support agile design, development, and delivery processes. Read more about Agile CMDBs. They can be used with all forms of Rapid Application Development (RAD) methodologies that require tight development iterations with the ability to immediately adapt and adopt change.
The primary advantages of compiler-based CMDBs are that:
- They are very simple to use (i.e. they are less complex than repository-based CMDBs). Unlike repository-based CMDBs, which can require significant development efforts to get data into them, compiler-based CMDBs can often be generated in minutes, with very little effort.
- They are blazing fast. They can generate entire CMDBs in minutes.
- They require far less money to implement and maintain than repository-based CMDBs require.
- Since each compilation is a snapshot in time, each snapshot can be stored and versioned for history, comparison, and analysis.
- They allow for highly diverse data sets that are sourced from many domains (i.e. it’s very easy to combine technical data with business data and/or industry data).
- They can be used to feed and fan data within and across your Big Data Lake’s Persistence Polyglot.
- Since they are light-weight, affordable, and simple to use, you can have many compiler-based CMDBs in your enterprise, which allows for a Federated CMDB Implementation Model (covered below).
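The snapshot advantage listed above lends itself to a short illustration: if each compilation is kept, two snapshots can be compared directly. The sketch assumes each snapshot is a set of `(source, predicate, target)` triples; `diff_snapshots` and the sample data are hypothetical.

```python
def diff_snapshots(old, new):
    """Compare two compiled snapshots, each a collection of relationship triples."""
    old, new = set(old), set(new)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


# Hypothetical snapshots from two consecutive compilations.
snapshot_monday = {
    ("billing-app", "is hosted on", "srv-001"),
    ("jane-doe", "is a Product Owner for", "billing-app"),
}
snapshot_tuesday = {
    ("billing-app", "is hosted on", "srv-002"),
    ("jane-doe", "is a Product Owner for", "billing-app"),
}
changes = diff_snapshots(snapshot_monday, snapshot_tuesday)
```

Here the comparison shows that the application moved from one server to another between the two runs, without either snapshot having recorded the change explicitly.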
The primary disadvantages of compiler-based CMDBs are that:
- Compiler paradigms are batch processes and, like all batch processes, there is a delay (i.e. the compile time) between the time the data is provided as input to the compiler and the time the CMDB is generated as its output. This means there is no real-time, high-frequency transactional persistence of CI-related records. (Many will argue such a feature is never really needed for CMDBs.)
- The more data you feed to a compiler, the more compute resources it consumes and the longer time it takes to generate its output. However, both are still always small fractions of the resources and time required for a repository-based CMDB.
Triple Stores as CMDBs
There are databases called Triple-Stores that store semantic relationship triples (a.k.a. tuples) and that are sometimes used in the design of repository-based CMDBs. These databases often capture only Semantic Relationships and do not store detailed data about the Semantic Nodes themselves. They are rapidly becoming less relevant, as multiple new database technologies allow for storage and correlation of both Semantic Node data and the Semantic Relationships (i.e. triples/tuples) between such nodes.
When evaluating such technologies, it is highly recommended that you thoroughly verify that you can store both Semantic Node data and the Semantic Relationships between such nodes.
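The distinction between storing only triples and storing triples together with node data can be sketched as follows. The dictionaries and the helper names `nodes_missing_data` and `expand` are assumptions for illustration, not the API of any triple store.

```python
# Relationships only: this is all that a pure triple store would hold.
triples = [
    ("jane-doe", "is a Product Owner for", "myproduct-xyz"),
]

# Node data lives alongside the triples, keyed by the same identifiers.
nodes = {
    "jane-doe": {"cit": "Person", "full_name": "Jane Doe"},
    "myproduct-xyz": {"cit": "Product", "name": "MyProduct XYZ"},
}


def nodes_missing_data(triples, nodes):
    """Return identifiers that appear in relationships but have no node record."""
    referenced = {t[0] for t in triples} | {t[2] for t in triples}
    return sorted(referenced - set(nodes))


def expand(triple, nodes):
    """Join a triple with the stored attributes of both of its nodes."""
    source, predicate, target = triple
    return {"source": nodes[source], "predicate": predicate, "target": nodes[target]}
```

A check like `nodes_missing_data` is one way to verify, during evaluation, that every node referenced by a relationship also has its detailed data available.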
Single Instance CMDB Implementation Model versus Federated CMDB Implementation Model
Many enterprises pursue the implementation of only one single repository-based CMDB in what is called a Single Instance CMDB Implementation Model. Enterprises that use this single instance model have one and only one CMDB operating in their production environment. The enterprise uses the single-instance CMDB like one big data warehouse, feeding data to it from multiple sources. Just like data warehouses, these CMDBs get complicated and very expensive to change and support as the data within them grows.
An alternative to the Single Instance CMDB Implementation Model is the Federated CMDB Implementation Model, which is rapidly gaining pace in the industry due to its power, low cost, and flexibility in supporting a wide range of knowledge activities. In this model, an enterprise has more than one CMDB implementation (usually many), where each sits in a different area of the enterprise and has a different purpose. Very detailed data can be localized in local, fit-for-purpose CMDBs while coarse reusable data can be rolled up into larger CMDBs. For example, an enterprise may have 100 different software development teams, where each delivers one unique product (e.g. software, application, system, etc.). In a federated model, each development team gets its own local CMDB so that they can see and share semantic data at the most granular level of development, specifically for the product that team delivers. This means 100 unique and localized CMDBs. Each of these 100 CMDBs then feeds data to one larger enterprise CMDB that correlates and shares data across all 100 software products. In fact, each local CMDB can itself be a unique CII in the enterprise CMDB, so that it can be reached from the enterprise CMDB as well.
Because repository-based CMDBs are very expensive and complex, they rarely get used in Federated CMDB implementation models, unless they act as the enterprise-level CMDB that is fed by all other federated CMDBs. Instead, in federated models, compiler-based CMDBs tend to be used because they are more flexible, faster to implement, and far more affordable. They even support Agile development and delivery processes like Continuous Integration and Continuous Delivery (CI/CD), DevOps, etc. (See article on Agile-CMDB.)
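The roll-up from local CMDBs into an enterprise CMDB can be sketched as a simple filter-and-merge step. This is a minimal illustration that assumes each local CMDB is a dictionary of CIIs and that CI keys are unique across teams; `roll_up`, `shared_cits`, and the sample data are hypothetical.

```python
def roll_up(local_cmdbs, shared_cits):
    """Build an enterprise-level view from many local CMDBs.

    local_cmdbs: {cmdb_name: {ci_key: {"cit": ..., ...}}}
    shared_cits: CITs coarse enough to be shared across the enterprise.
    """
    enterprise = {}
    for cmdb_name, items in local_cmdbs.items():
        # Each local CMDB is itself registered as a CII in the enterprise CMDB.
        enterprise[cmdb_name] = {"cit": "CMDB", "item_count": len(items)}
        for key, attributes in items.items():
            if attributes.get("cit") in shared_cits:
                enterprise[key] = {**attributes, "source_cmdb": cmdb_name}
    return enterprise


# Hypothetical local CMDBs for two development teams.
local_cmdbs = {
    "team-billing-cmdb": {
        "billing-app": {"cit": "Application"},
        "billing-build-4711": {"cit": "Build"},
    },
    "team-claims-cmdb": {
        "claims-app": {"cit": "Application"},
    },
}
enterprise = roll_up(local_cmdbs, shared_cits={"Application"})
```

Granular records such as individual builds stay in the team's local CMDB, while the coarse, reusable Application records and a pointer to each local CMDB appear at the enterprise level.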
Summary and Conclusions
- Configuration Management Databases (CMDBs) are knowledge management tools whose capabilities revolve around seeing, exploring, making sense of, and learning from relationships between data records that represent people, places, and things that are important to the enterprise.
- Not all CMDBs are semantic or support Semantic Relationships. The more advanced CMDBs support Semantic Relationships.
- There are two types of CMDBs, repository-based and compiler-based. Each has its pros and cons. However, compiler-based CMDBs are far more affordable and flexible. For example, they more easily allow for Federated CMDB implementations.
- Not all CMDBs support Data Diversity. Data Diversity is important because it allows you to mix different types of data together and will ultimately allow you to achieve more advanced levels of reporting, analytics, data science, etc. You should beware of CMDBs that easily support technical data but not business or industry data.
- A Single Instance CMDB Implementation Model uses one and only one repository-based CMDB, while a Federated CMDB Implementation Model uses many CMDBs.