The International Foundation for Information Technology


Home Page for the Information Technology (IT) Discipline

"Failover Management"


Table of Contents

Introduction: Introduction to Failover Management
Framework: Using This Artifact as a "Failover Management Framework"
Key Terms: Key Terms for Failover Management
Glossary: The "Failover Management Glossary"
Capabilities: Failover Management as an Enterprise Capability
Ownership: Clearly Defined Failover Management Ownership is Critical for Success
Verbs and Actions: Understanding Why Verbs and Actions are Important to Failover Management
Roles: Key Verb and Action Driven Roles For Failover Management
Taxonomy: Understanding Failover Management Classifications or Categorizations
Ontology: Failover Management Ontology as a Means for Language Standardization
Life Cycle (Lifecycle): Lifecycle Phases for Failover Management
Inventories: Failover Management Inventories
Environments: Failover Management Environments
Metrics: Failover Management Metrics
Services: Failover Management as a Set of Services (a.k.a. Failover Management Services)
Service Paradigms: Centralized Failover Management vs. Federated Failover Management
Principles & Best Practices: Common Principles and Best Practices for Failover Management
Further Reading and Reference Material for Failover Management


Introduction: Introduction to Failover Management

This document represents an aggregated, ordered and contextualized view of the material we've been able to compile and publish that is related to the topic of "Failover Management." The goal is to make this page a landing and launch point for all things related to this topic. As our content becomes more complete and more accurate, this page should become a very useful and powerful knowledge base for this topic and all parties interested in it.

You'll find that the content of this document is consistent with that of other discipline-related documents. This is intentional. The consistency is based on a knowledge pattern that helps individuals learn about different topics more quickly and efficiently. We hope you find the material useful and easy to learn.

It's important to realize that the content in this document and any related sub-documents is constantly evolving. Therefore, we recommend you check for updates regularly to keep up with the latest material.

The Foundation always welcomes your feedback and suggestions for improvement, as we're always looking for ways to improve our solutions and offerings to the general community.

All solutions published by the Foundation are subject to the terms and conditions of the Foundation's Master Agreement.


Framework: Using This Artifact as a "Failover Management Framework"

This document or artifact, along with everything in it, is intended to act as a "Framework" that addresses various aspects of Failover Management.

Readers will notice that most sections in the Table of Contents (TOC) use a format where the TOC entry is prefixed with a topic name, followed by a short descriptive title (i.e. "TOPIC_NAME: TOPIC_RELATED_SECTION_TITLE"). This is intentional. The format helps the Foundation identify appropriate topic areas, segregate distinct topic areas from each other, order topic areas appropriately, and maintain consistency both within and across different IT Disciplines.

To elaborate, this artifact is intended to:

  1. Organize different areas of the discipline known as Failover Management into clear and compartmentalized areas that allow the Foundation to more effectively and productively collect, document and publish information that pertains to this discipline.
  2. Decompose each area of Failover Management into smaller and, therefore, more digestible units for more efficient learning and understanding.
  3. Document common industry wisdom about each area, piece or subcomponent of Failover Management.
  4. Act as a set of Failover Management related best practices and guidelines that have been collected, documented, and published for the benefit of IT Professionals, regardless of their specific industry, line of business, or area of expertise.
  5. Act as a consistent and repeatable pattern for documenting, publishing and learning, both, within this Discipline and across "all" Disciplines.

From the Foundation's perspective, if done correctly, all of the above will allow the Foundation to properly decompose, document and publish content related to each sub-area or sub-topic for each IT Discipline, including this specific discipline (i.e. "Failover Management").

From the reader's perspective, if done correctly, all of the above will allow him or her to easily find and learn about specific areas of interest associated with this and all other IT Disciplines in a manner where the reader may effectively consume and digest material in small atomic segments that act as repeatable and more effective learning units.

As this artifact evolves and progresses, the reader will see it address key areas of the professional IT Discipline "Failover Management" that range from its detailed definition through closely related terms, phrases and their definitions, to its detailed specification of Failover Management Capabilities, and all the way through to defining, delivering, operating and supporting Failover Management Services.

As mentioned previously, this document will continue to evolve and the Foundation recommends the reader check back, regularly, to stay abreast of modifications and new developments. It is also important to understand that the structure of this artifact may change to meet the needs of such evolution.


Key Terms: Key Terms for Failover Management

Before moving on to learn more about the rest of the Failover Management framework, we suggest that you take some time to familiarize yourself with the following very basic term(s)...

Failover Management:

"1. The professional discipline that involves working with, in or on any aspect of planning, delivering, operating or supporting for one or more Failover Items or any and all solutions put in place to deal with such Items.

2. The solution set that a person or organization puts in place to manage one or more Failover Items.

3. The process or processes put in place by a person or organization to assist in the management, coordination, control, delivery, or support of one or more Failover Items.

4. The Enterprise Capability that represents the general ability or functional capacity for a Resource or Organization to deal with or handle one or more Failover Items. Such a term is often used by Information Technology (IT) Architects when performing or engaging in the activities associated with general Capability Modeling."

In addition to the above basic term(s), you can also learn a great deal about Failover Management by familiarizing yourself with the broader spectrum of terms that make up the Failover Management Glossary...


Glossary: The "Failover Management Glossary"

Language between IT professionals and the businesses we serve is often a significant barrier to success, as we often spend countless hours trying to interpret each other's meanings. This is often also true between IT professionals who are taught to use certain terms and definitions as part of the organizations and industries they serve. It's when you start to jump from organization to organization, from enterprise to enterprise, and from industry to industry that you realize how much time and effort is wasted on just getting language and meanings correct. For these reasons, the Foundation puts a great deal of focus on terms and phrases, as well as their corresponding definitions. We highly recommend you spend time learning and understanding all of the related terms and phrases, along with their meanings, for all areas of "Failover Management."

Failover Management Glossary
Centralized Failover Management
Decentralized Failover Management
Enterprise Failover Management
Failover
Failover Automation
Failover Capacity Management
Failover Catalog
Failover Catalogue
Failover Configuration
Failover Configuration Item
Failover Configuration Management
Failover Cost
Failover Data Entity
Failover Database
Failover Decommission
Failover Delivery
Failover Dependency
Failover Deployment
Failover Document
Failover Document Management
Failover File Plan
Failover Framework
Failover Governance
Failover History
Failover Identifier
Failover Inventory
Failover Item
Failover Lifecycle
Failover Lifecycle Management
Failover Management
Failover Management Application
Failover Management Best Practice
Failover Management Blog
Failover Management Capability
Failover Management Center of Excellence
Failover Management Certification
Failover Management Class
Failover Management Community of Practice (CoP)
Failover Management Course
Failover Management Data
Failover Management Data Dictionary
Failover Management Database
Failover Management Demand
Failover Management Dependency
Failover Management Discussion Forum
Failover Management Document
Failover Management Documentation
Failover Management File Plan
Failover Management Form
Failover Management Framework
Failover Management Governance
Failover Management Knowledge
Failover Management Lessons Learned
Failover Management Metric
Failover Management Operating Model
Failover Management Organization
Failover Management Plan
Failover Management Platform
Failover Management Policy
Failover Management Portfolio
Failover Management Principle
Failover Management Procedure
Failover Management Process
Failover Management Professional
Failover Management Program
Failover Management Project
Failover Management Reference Architecture
Failover Management Release
Failover Management Report
Failover Management Reporting
Failover Management Roadmap
Failover Management Role
Failover Management Rule
Failover Management Schedule
Failover Management Security
Failover Management Service
Failover Management Service Assurance
Failover Management Service Contract
Failover Management Service Level Agreement (SLA)
Failover Management Service Level Objective (SLO)
Failover Management Service Level Requirement (SLR)
Failover Management Service Level Target (SLT)
Failover Management Service Provider
Failover Management Service Request
Failover Management Software
Failover Management Solution
Failover Management Stakeholder
Failover Management Standard
Failover Management Strategy
Failover Management Supply
Failover Management Support
Failover Management System
Failover Management Theory
Failover Management Training
Failover Management Vision
Failover Management Wiki
Failover Management Workflow
Failover Metadata
Failover Migration
Failover Plan
Failover Portfolio
Failover Portfolio Management
Failover Processing
Failover Record
Failover Records Management
Failover Repository
Failover Reuse
Failover Review
Failover Schedule
Failover Schematic (Schema)
Failover Security
Failover Software
Failover Strategy
Failover Support
Failover Taxonomy
Failover Termination
Failover Tracking
Failover Tracking Software
Failover Transaction
Failover Unique Identifier
Failover Verification
Failover Version
Failover Workflow
Federated Failover Management
Regional Failover Management

Please refer to the IT Glossary for other terms and phrases that may be relevant to this professional discipline.

Readers may also refer to the Taxonomy of Glossaries for terms and phrases that are semantically grouped according to IT Disciplines or enterprise domains.

This Failover Management Glossary is a contextual subset of the master IF4IT Glossary of Terms and Phrases. The master glossary can be used by you and your enterprise as a foundation for broader understanding of Information Technology and can be used as a teaching and learning tool for those you work with, helping to ensure a common and more standard language.


Capabilities: Failover Management as an Enterprise Capability

A Capability, as it pertains to Information Technology (IT) or to an enterprise that an IT Organization serves, is defined to be "A manageable feature, faculty, function, process, service or discipline that represents an ability to perform something which yields an expected set of results and is capable of further advancement or development." In other words, a Capability is nothing more than "the ability to do something" or, quite simply, a Feature or Function. Therefore, when applied to an enterprise, a Capability represents a critical Enterprise Feature or Enterprise Function.

When it comes to Capabilities, there are multiple types that an enterprise needs to be aware of. Examples include but are not limited to:

As can be seen above, there are Capabilities that are associated with Resources, Organizations, and Assets such as Systems. All are important to an enterprise.

In the case of this IT Discipline (i.e. Failover Management), we use the word Capability in the context of an Enterprise Capability or an IT Capability, which are both equivalent to Enterprise Disciplines or IT Disciplines, respectively. In short, the Capability of Failover Management represents the ability to deal with any and all Failover Items and anything relevant that is related to or associated with any Failover Items.

If you think about it, a capability is really nothing more than a "verb" or "action" that represents "the ability to do something." Understanding this allows us to derive a consistent and highly repeatable set of sub-capabilities for any Noun we're dealing with. For example:

In summary, the implication is that the Enterprise Capability or Enterprise Discipline known as Failover Management is the superset of all the above Sub-Capabilities, as they pertain to or are applied to the discipline-specific Noun: "Failover." This now translates more specifically to:

For a more complete list of very specific Capabilities/Disciplines, refer to the Foundation's Master Inventory of IT Disciplines. It is important to note that this inventory is in a flat or non-hierarchical form, specifically because "hierarchy" is almost always a matter of personal preference or context (what hierarchy is important to one Resource or Organization may be unimportant to another's needs or requirements). Therefore, the Foundation has published its inventory of Capabilities in a non-hierarchical, flat form.

This now brings us to a very obvious problem that surrounds Capabilities, which is the fact that there are simply too many "granular" or "specific" Capabilities to document and publish in any single Capability Model. The end result is that a Capability Model may become unwieldy because of trying to incorporate so many different specific Capabilities. Also, Capability Modeling "Purists," who all have their own (and very differing) opinions about how Capability Models should or should not be represented, almost always refuse to get into the details. To address this, we recommend using a generic set of Capabilities that map to and are driven by the Systems Development Life Cycle. For example:

As you can see from the above, we now have a very limited, controlled and manageable set of Discipline-specific Capabilities for the Discipline Failover Management.

As a reminder, the above Capability representations are "suggestions" for baselining or initializing your own Enterprise Capability Model (ECM). It's recommended that you take the time to work with your enterprise stakeholders to improve upon and/or customize your own ECM so that you can help meet their needs. However, with that being said, it's always a better idea to go in with a baseline that you can modify rather than building your own solution from scratch, especially if your goals are to standardize, not reinvent the wheel, and not deviate too far from what other enterprises are doing to model their own environments. This is especially true if you've never had any experience building ECMs that have gained and maintained full adoption.

Why do enterprises perform Capability Modeling? Enterprises most often build Capability Models that are associated with Failover Management for the following reasons...

Capability Modeling Recommendations: Some things to consider and keep in mind when working on or creating your Failover Management and Enterprise Capability Models...

Learn More About Capability Models: Taking the time to learn about and understand Capability Models, what they're for, and how they're used may help you learn how Failover Management better fits into the broader enterprise. Therefore, we suggest you spend some time reviewing and understanding the IF4IT Enterprise Capability Model...

Enterprise Capability Model

Ownership: Clearly Defined Failover Management Ownership is Critical for Success

Here's a very simple fact... If an enterprise does not establish and enforce clearly defined Ownership (i.e. a Resource and his or her Organization are assigned accountable ownership) for Failover Management, the enterprise has automatically set itself up for failure in its implementation of that discipline. Therefore, if you and your enterprise want to implement and maintain a successful solution for Failover Management, there must be a clearly defined Owner that can and will be held accountable for getting work done, providing transparency, helping with strategy setting, and coordinating implementation of Failover Management as a fully functional and mature enterprise Service.

Having clearly defined Ownership should not be confused with having fully dedicated Resources that spend one hundred percent of their time working on Failover Management. In fact, smaller enterprises can rarely afford to dedicate full-time Resources to all enterprise IT Disciplines the way larger enterprises can. Even so, all IT Disciplines, including Failover Management, should "always" have clearly defined Owners so that there is always a clear point of accountability and contact for any issues or work that need to be addressed.

In addition to the common best practice of having clearly assigned Ownership for Failover Management, it is also considered a best practice to clearly publish and socialize Failover Management Ownership details to a centralized location (often referred to as a "Service Catalog" or an "Enterprise Service Catalog"), along with Ownership details for all other IT Disciplines, so that the entire enterprise has constant access to it.

Figure: How Ownership of the Capability Failover Management fits into the Canonical Model for IT

The above figure helps us understand how Capability or Discipline Ownership fits into the Canonical Model for Information Technology (IT) (i.e. "Think," "Deliver," and "Operate"). Owners are assigned to individual Disciplines or Capabilities, such as Failover Management, and are instantly made accountable to the enterprise for the results of all Failover Management Thinking activities (i.e. Strategy, Research, Planning and Design), all Failover Management Delivery activities (i.e. Construction, Deployment and Quality Assurance), and all Failover Management Operations activities (i.e. Use, Maintenance and Support). Done correctly, Failover Management Ownership is constant and ongoing. It's important to understand that such assigned Ownership should "never" end so that there is clear and constant accountability and transparency for all aspects of the Canonical Model to the enterprise.

Not having clear Ownership for Failover Management means that there is no clear understanding of who is accountable for it, who can provide understanding of what's going on within it, who can help the enterprise provide short term and long term descriptions of work being performed within the Discipline area to improve it over time for its customers, and who can help with getting work done that's associated with it. It means your or your enterprise's implementation for Failover Management will be highly incomplete and erratic because no one is constantly (or even partially) watching over the Discipline and its needs for maintenance and evolution. Not having clear Failover Management Ownership is a recipe for confusion and, sometimes, even chaos.

In summary, if you and your enterprise truly want to be successful with your implementation of Failover Management, ensure that a clear and highly accountable owner is identified and assigned to the Discipline. Publish those ownership details, preferably in an enterprise's Service Catalog, and socialize it so everyone knows whom to go to for answers and for help with Failover Management related work. In other words, if you want to implement Failover Management as an enterprise Service, then you absolutely must start with clearly defined, published and socialized Ownership.
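As a rough illustration of what a published Ownership entry might look like, the following Python sketch models one Service Catalog record and a check for Disciplines that have no accountable Owner. The Think, Deliver and Operate activities are taken from the Canonical Model described above. The record fields, the function names, and the sample values (including the "Backup Management" discipline and the contact address) are hypothetical and are not part of any Foundation specification.

```python
from dataclasses import dataclass

# Canonical Model phases and activities, as named in this section.
CANONICAL_MODEL = {
    "Think": ["Strategy", "Research", "Planning", "Design"],
    "Deliver": ["Construction", "Deployment", "Quality Assurance"],
    "Operate": ["Use", "Maintenance", "Support"],
}


@dataclass(frozen=True)
class OwnershipRecord:
    """One Service Catalog entry (all field names are hypothetical)."""
    discipline: str
    owner: str         # the accountable Resource
    organization: str  # the Owner's Organization
    contact: str       # where to go for answers and help


def unowned_disciplines(disciplines, catalog):
    """Return the disciplines that have no record with a named owner."""
    owned = {record.discipline for record in catalog if record.owner.strip()}
    return [name for name in disciplines if name not in owned]


def accountable_activities(record):
    """List every Canonical Model activity the Owner answers for."""
    return [
        f"{record.discipline}: {phase} / {activity}"
        for phase, activities in CANONICAL_MODEL.items()
        for activity in activities
    ]


# Hypothetical sample data.
catalog = [
    OwnershipRecord(
        discipline="Failover Management",
        owner="J. Doe",
        organization="IT Operations",
        contact="jdoe@example.com",
    )
]
missing = unowned_disciplines(["Failover Management", "Backup Management"], catalog)
```

In this sketch, a report built from unowned_disciplines would flag any Discipline that lacks a point of accountability before the catalog is socialized.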


Verbs and Actions: Understanding Why Verbs and Actions are Important to Failover Management

Throughout the Foundation's documentation, you will repeatedly run into references to "Nouns and Verbs." These concepts are key to consistency and standardization throughout the IT Industry, down to each and every IT Discipline. Given that we've discussed the impact of "Nouns" on the discipline of "Failover Management," this section discusses the importance of the "Verbs" or "Actions" that can be performed with or against the key Noun or Nouns associated with this Discipline. To reiterate, Verbs or Actions allow us to clearly understand what can be performed on or with the Noun in question. As will be discussed in the next section, Verbs or Actions also help us clearly identify who (more specifically, which Roles) performs or executes such Verbs or Actions against a Discipline and its associated Noun or Nouns. As will be discussed later, Verbs or Actions also help identify key Attributes (i.e. Field Names) that are necessary for the data definition of the Noun or Nouns for this Discipline, and even help identify which Verbs or Actions can be automated for this Discipline.

As a reminder, the base Noun for the discipline known as Failover Management is: "Failover," which is sometimes referred to as the Noun: "Failover Item."

By now, it should be apparent that verbs represent a baseline for defining solid functional requirements and sub-capabilities for what would be a part of any good Failover Management System or Service. What this means is that if you and/or your Organization are looking for a solution in this space (e.g. the purchasing or building of a software solution or the implementation of a Service to address the needs of Failover Management), you could use discipline-related verbs to drive the foundation of what the solution should or shouldn't do, as mapped to specific stakeholders that will use or provide the solution.

Examples of the types of Verbs or Actions that are important to this Discipline include but are not limited to:

The above list represents a very small subset of all Verbs or Actions that are relevant for this Discipline. The more complete set can be found in the Roles section of this document, where readers can see the direct correlation of Verb to Noun and to, both, Generic Role and Discipline Specific Role.


Roles: Key Verb and Action Driven Roles For Failover Management

An "action" or a "verb" is something that can be performed on or with a specific "noun." The reason it is important to itemize all relevant verbs is because we can now start to determine what we can or cannot do with the noun in question, where in this case the noun is "Failover."

Actions/Verbs | Example as Applied to "Failover" | Generic Roles | Discipline-Specific Roles
Administrate | Administrate Failover | Administrator | Failover Administrator
Approve | Approve Failover | Approver | Failover Approver
Architect | Architect Failover | Architector | Failover Architector
Archive | Archive Failover | Archiver | Failover Archiver
Audit | Audit Failover | Auditor | Failover Auditor
Bundle | Bundle Failover | Bundler | Failover Bundler
Clone | Clone Failover | Cloner | Failover Cloner
Code | Code Failover | Coder | Failover Coder
Configure | Configure Failover | Configurer | Failover Configurer
Copy | Copy Failover | Copier | Failover Copier
Create | Create Failover | Creator | Failover Creator
Decommission | Decommission Failover | Decommissioner | Failover Decommissioner
Delete | Delete Failover | Deletor | Failover Deletor
Deploy | Deploy Failover | Deployer | Failover Deployer
Deprecate | Deprecate Failover | Deprecator | Failover Deprecator
Design | Design Failover | Designer | Failover Designer
Destroy | Destroy Failover | Destroyer | Failover Destroyer
Develop | Develop Failover | Developer | Failover Developer
Distribute | Distribute Failover | Distributor | Failover Distributor
Download | Download Failover | Downloader | Failover Downloader
Edit | Edit Failover | Editor | Failover Editor
Educate | Educate Failover | Educator | Failover Educator
Export | Export Failover | Exporter | Failover Exporter
Govern | Govern Failover | Governor | Failover Governor
Import | Import Failover | Importer | Failover Importer
Initialize | Initialize Failover | Initializer | Failover Initializer
Install | Install Failover | Installer | Failover Installer
Instantiate | Instantiate Failover | Instantiator | Failover Instantiator
Integrate | Integrate Failover | Integrator | Failover Integrator
Manage | Manage Failover | Manager | Failover Manager
Merge | Merge Failover | Merger | Failover Merger
Modify | Modify Failover | Modifier | Failover Modifier
Move | Move Failover | Mover | Failover Mover
Own | Own Failover | Owner | Failover Owner
Package | Package Failover | Packager | Failover Packager
Persist | Persist Failover | Persister | Failover Persister
Plan | Plan Failover | Planner | Failover Planner
Purge | Purge Failover | Purger | Failover Purger
Receive | Receive Failover | Receiver | Failover Receiver
Record | Record Failover | Recorder | Failover Recorder
Recover | Recover Failover | Recoverer | Failover Recoverer
Register | Register Failover | Registrar | Failover Registrar
Relocate | Relocate Failover | Relocator | Failover Relocator
Reject | Reject Failover | Rejecter | Failover Rejecter
Remove | Remove Failover | Remover | Failover Remover
Replicate | Replicate Failover | Replicator | Failover Replicator
Report | Report Failover | Reporter | Failover Reporter
Request | Request Failover | Requestor | Failover Requestor
Restore | Restore Failover | Restorer | Failover Restorer
Review | Review Failover | Reviewer | Failover Reviewer
Save | Save Failover | Saver | Failover Saver
Search | Search Failover | Searcher | Failover Searcher
Split | Split Failover | Splitter | Failover Splitter
Sponsor | Sponsor Failover | Sponsor | Failover Sponsor
Store | Store Failover | Storer | Failover Storer
Strategize | Strategize Failover (or Set Failover Strategy) | Strategizer (or Strategy Setter) | Failover Strategizer (or Failover Strategy Setter)
Support | Support Failover | Supporter | Failover Supporter
Test | Test Failover | Tester | Failover Tester
Train | Train Failover | Trainer | Failover Trainer
Upgrade | Upgrade Failover | Upgrader | Failover Upgrader
Upload | Upload Failover | Uploader | Failover Uploader
Verify | Verify Failover | Verifier | Failover Verifier
Version | Version Failover | Versioner | Failover Versioner
View | View Failover | Viewer | Failover Viewer

At a minimum, the above list of Verbs can be used to help identify, track, and manage the basic "Features" required by and associated with Failover Management, even if your enterprise doesn't maintain a Capability Model that lists specific Failover Management Capabilities. Application designers, developers, and architects often find such Verb Lists or Feature Inventories to be invaluable.
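The Verb-to-Noun pattern in the table above lends itself to a small generator. The Python sketch below builds a Feature Inventory for the Noun "Failover" from a handful of the Verbs and Generic Roles listed in the table. Because role names are not always derivable from the verb (for example, "Register" maps to "Registrar"), the sketch uses an explicit mapping instead of a suffix rule. The function name and the dictionary keys are hypothetical choices made for illustration.

```python
NOUN = "Failover"

# Verb -> Generic Role, copied from a few rows of the table above.
VERB_TO_ROLE = {
    "Approve": "Approver",
    "Copy": "Copier",
    "Deploy": "Deployer",
    "Register": "Registrar",
    "Test": "Tester",
}


def feature_inventory(noun, verb_to_role):
    """Build one inventory row per verb, sorted alphabetically by verb."""
    rows = []
    for verb, role in sorted(verb_to_role.items()):
        rows.append(
            {
                "verb": verb,
                "feature": f"{verb} {noun}",          # e.g. "Deploy Failover"
                "generic_role": role,                 # e.g. "Deployer"
                "discipline_role": f"{noun} {role}",  # e.g. "Failover Deployer"
            }
        )
    return rows


inventory = feature_inventory(NOUN, VERB_TO_ROLE)
for row in inventory:
    print(f'{row["feature"]}: performed by {row["discipline_role"]}')
```

Swapping NOUN for the base Noun of another Discipline would yield the equivalent inventory for that Discipline, which is the kind of repeatability the Verb list is meant to provide.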


Taxonomy: Understanding Failover Management Classifications or Categorizations

A Taxonomy, in its noun form, is defined as:

...a documented and orderly set of types, classifications, categorizations and/or principles that are often achieved through mechanisms including but not limited to naming, defining and/or the grouping of attributes, and which ultimately help to describe, differentiate, identify, arrange and provide contextual relationships between the entities for which the Taxonomy exists.

From this general definition, we can derive that the definition for a Failover Management Taxonomy is:

...a documented and orderly set of types, classifications, categorizations and/or principles that are often achieved through mechanisms including but not limited to naming, defining and/or the grouping of attributes, and which ultimately help to describe, differentiate, identify, arrange and provide contextual relationships between Failover Items, Entities or Types.

In short, what this all means is that a Taxonomy is nothing more than a classification or typing mechanism and that a Failover Taxonomy is nothing more than a classification or typing mechanism that helps people and systems distinguish between different Failover Items, Entities, Types, Records or any other Failover Management element you can think of.

It's important to understand that Taxonomies can be as simple as a list of relevant terms or phrases with respective meanings or definitions, or they can take on more complex forms, such as hierarchical and graphical model structures that can be homogeneous or heterogeneous in nature. More complex Taxonomies include examples such as "Visual Taxonomies" and "Audible Taxonomies" but, except in the case of very special technologies, these are typically out of scope for general Information Technology (IT) Operations.
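To make the hierarchical form more concrete, here is a minimal Python sketch of a Taxonomy stored as nested dictionaries, with helpers that list every classification path and look up where a given type sits. The category names ("Automatic Failover", "Manual Failover", "Planned", "Unplanned") are hypothetical placeholders chosen for illustration. They are not an IF4IT Standard Taxonomy.

```python
# A hierarchical taxonomy as nested dictionaries; an empty dict marks a leaf.
# All category names below are hypothetical examples.
TAXONOMY = {
    "Failover Item": {
        "Automatic Failover": {},
        "Manual Failover": {
            "Planned": {},
            "Unplanned": {},
        },
    }
}


def paths(tree, prefix=()):
    """Return every root-to-leaf path in the taxonomy as a tuple of names."""
    result = []
    for name, children in tree.items():
        current = prefix + (name,)
        if children:
            result.extend(paths(children, current))
        else:
            result.append(current)
    return result


def classify(tree, leaf):
    """Return the full classification path for a leaf type, or None."""
    for path in paths(tree):
        if path[-1] == leaf:
            return " > ".join(path)
    return None


print(classify(TAXONOMY, "Planned"))
```

A flat Taxonomy is the degenerate case of the same structure: a single level of names, each mapped to a definition instead of to child categories.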

The Foundation directs readers to its ever-evolving Inventory of Taxonomies for Standard Taxonomy suggestions. Specifically, readers may want to start with the Taxonomy of Taxonomies, which helps make it clear that the IT Industry is composed of many hundreds if not thousands of Taxonomies, Classifications, Categorizations or Types.


Ontology: Failover Management Ontology as a Means for Language Standardization

While Taxonomies represent organized classifications or types, you can think of Ontologies as the design and representation of entire languages, with the specific intent to control things like structure, behavior, representation, and meaning. Without getting into a theoretical conversation about Ontologies, you can view this entire article as a foundation for the ontology of Failover Management. Or, in other words, a Failover Management Ontology.

Throughout this artifact/framework, you will find things like Failover Management related terms, phrases, definitions, roles, responsibilities, nouns, verbs, classifications, and so on, all as a means of defining a standard representation for and interpretation of the language of Failover Management.

It is only through the definition, communication, and establishment of such Ontologies that we can standardize language and communication associated with Failover Management, whether it be between humans and/or systems.


Life Cycle (Lifecycle): Lifecycle Phases for Failover Management

When we talk about Life Cycle (or lifecycle) for Failover Management, it's important to keep in mind that there are two different types of Life Cycles that apply. The first is a Data Life Cycle, which addresses Failover Management data or entities, and the second is associated with delivering Failover Management Assets like Systems or Software solutions.

Failover Management Data Life Cycle Phases:

Data Lifecycle (or Life Cycle) for any and all data is the period from the "inception" of data through to its ultimately being "purged" from existence. This is no different for Failover Management related data.

Like the data associated with any other professional IT Discipline, Failover Management related data adheres to the following common Data Lifecycle Phases:

Figure: Failover Management Lifecycle Phases

  1. Inception: Data is in its raw idea-like form and is not ready for consumption by the general population because it has not been documented or registered, anywhere, in a formal manner.
  2. Creation and Registration: Data is formally put into existence for day-to-day use by appropriate stakeholders.
  3. Iterative Maintenance: Data is in a mode of constant use and is updated and modified, as needed, to meet the needs of daily use by various stakeholders.
  4. Decommission and Deletion: Data is prepared for deletion and eventually deleted from daily operational use but still exists for administrative or organizational purposes, such as historical auditing. It can be restored to any one of its relevant last states and, therefore, can be brought back into existence for day-to-day use.
  5. Purged From Existence: Data is completely removed from an environment with no means to restore or reconstruct it, without recreating it from scratch and with no guarantees that it will match its previous state.

The above Life Cycle Phases represent the high level transitions that occur from the inception of Failover Items or Entities all the way through to their complete elimination from existence. A more detailed breakdown of these transitions or phases represents what are referred to as "Failover Management States."
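The phases above can be read as a small state machine. The following is a minimal sketch, not a definitive model: the names Phase, ALLOWED, can_transition, and transition are hypothetical, and the set of allowed transitions is an assumption inferred from the phase descriptions (deleted data can be restored to daily use, purged data cannot).

```python
from enum import Enum


class Phase(Enum):
    """The five Data Lifecycle Phases listed above."""
    INCEPTION = 1
    CREATION_AND_REGISTRATION = 2
    ITERATIVE_MAINTENANCE = 3
    DECOMMISSION_AND_DELETION = 4
    PURGED_FROM_EXISTENCE = 5


# Assumed transitions (illustrative only, not taken from the source):
# deleted data may return to daily use, purged data has no way back.
ALLOWED = {
    Phase.INCEPTION: {Phase.CREATION_AND_REGISTRATION},
    Phase.CREATION_AND_REGISTRATION: {
        Phase.ITERATIVE_MAINTENANCE,
        Phase.DECOMMISSION_AND_DELETION,
    },
    Phase.ITERATIVE_MAINTENANCE: {Phase.DECOMMISSION_AND_DELETION},
    Phase.DECOMMISSION_AND_DELETION: {
        Phase.ITERATIVE_MAINTENANCE,   # restore to a last known state
        Phase.PURGED_FROM_EXISTENCE,
    },
    Phase.PURGED_FROM_EXISTENCE: set(),
}


def can_transition(current: Phase, target: Phase) -> bool:
    """Return True if moving from current to target is allowed."""
    return target in ALLOWED[current]


def transition(current: Phase, target: Phase) -> Phase:
    """Return the new phase, or raise ValueError for a disallowed move."""
    if not can_transition(current, target):
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```

Recording each successful call to transition, with a timestamp, is one way to capture the more detailed "Failover Management States" over time.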

Failover Management Systems Development Life Cycle (SDLC) Phases or Failover Management Software Development Life Cycle (SDLC) Phases:

The SDLC is a means for facilitating and controlling how IT Professionals deliver Assets, such as Failover Management Systems and Software. In this case, you should default to the master SDLC, which is used to deliver any Asset of any type, including those associated with the Failover Management discipline.

Failover Management SDLC Diagram

Inventories: Failover Management Inventories

There are probably no greater or more important tools for providing Failover Management transparency and direction than the collection, ordering, categorizing, grouping, and maintenance of all related Failover Items. In other words, Failover Management Inventories.

In short, an Inventory represents a list of individual things or instances of things that are typically all of the same Noun Type or Data Type, where these instances are described and detailed by their Attributes, along with the Data and Information that act as values for such Attributes.
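As a rough illustration of that definition, the sketch below models an Inventory as a collection of instances of one Noun Type, each described by Attributes and their values. All names here (InventoryItem, Inventory, register, count_by, and the sample attribute names) are hypothetical and chosen only for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class InventoryItem:
    """One instance in an Inventory, described by its Attributes."""
    item_id: str
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Inventory:
    """A list of instances that all share one Noun Type."""
    noun_type: str
    items: Dict[str, InventoryItem] = field(default_factory=dict)

    def register(self, item: InventoryItem) -> None:
        """Add an item, rejecting duplicate identifiers."""
        if item.item_id in self.items:
            raise KeyError(f"{item.item_id} is already registered")
        self.items[item.item_id] = item

    def count_by(self, attribute_name: str) -> Counter:
        """Count items per value of one Attribute (missing values count as None)."""
        return Counter(
            item.attributes.get(attribute_name) for item in self.items.values()
        )


# Example usage with made-up data.
inventory = Inventory(noun_type="Failover Item")
inventory.register(InventoryItem("fo-001", {"environment": "production"}))
inventory.register(InventoryItem("fo-002", {"environment": "production"}))
inventory.register(InventoryItem("fo-003", {"environment": "test"}))
per_environment = inventory.count_by("environment")
```

Keeping the Attributes in a dictionary is a deliberate simplification; a real Inventory would more likely fix the Attribute set per Noun Type.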

At a minimum, Failover Management Inventories are used for the establishment of solid Failover Configuration Management practices, as the Failover Instances tracked within such Failover Inventories act as Configuration Items (in Target and/or Dependency form) for key Configurations (Failover Management Configurations or otherwise).

Inventories are also used for solid decision making. Good decisions, either strategic or tactical, are made based on having good Data and Information. And, good Data and Information only come from taking the time to follow best practices associated with Inventory Management. It's only through building such Inventories that an enterprise can achieve solid Failover Management Business Intelligence and Reporting.

Also, it's these very same Inventories that act as the foundation for understanding and managing Total Cost of Ownership (a.k.a. "TCO") for Failover Management. Without such Inventories, trying to understand your costs can be nothing more than uneducated guessing.

The obvious place to start is with Failover Inventories and then move on to surrounding Inventories that are directly and indirectly related to Failover Management.

Additionally, there are many other types of Inventories that are common and important to Failover Management, which include but are not limited to examples such as:

  1. People and Organizations related to Failover Management
  2. Roles, Responsibilities, and Skills related to Failover Management
  3. Products and Services related to Failover Management
  4. Capabilities related to Failover Management
  5. Contracts, Agreements, and Licenses related to Failover Management
  6. Processes related to Failover Management
  7. Tools and Technologies (e.g. Systems/Applications/Software/Computers) related to Failover Management
  8. Data Types and Instances related to Failover Management
  9. Data Interfaces related to Failover Management
  10. Environments related to Failover Management
  11. Facilities and Locations related to Failover Management

If you and/or your enterprise are not collecting and maintaining such Inventories, you're probably considered to be very low on the efficiency and effectiveness maturity scale.

It's important to keep in mind that collecting and managing Failover Management Inventories is something that should be performed across all phases of the Failover Management Lifecycle and across all Environments (i.e. Failover Management Environments). Both are considered to be very important Best Practices. For example, you and/or your enterprise cannot get a complete understanding of Failover Management costs or impacts without knowing all related Inventory Items in all environments. And, tracking across all lifecycle phases gives a temporal perspective that is important for things like problem analysis, historical reporting, and the reconstruction of state (i.e. Configuration Management).

NOTE: Failover Management Inventories are also important for other enterprise functions, such as Architecture and Design. Such Inventories represent the foundation for understanding an enterprise's Current State and are critical for planning Future State and any related strategies, roadmaps, and transition plans for facilitating change.


Environments: Failover Management Environments

Building environments that are specific to and for the discipline known as Failover Management is no different than doing so for any other discipline area. The reader should, therefore, refer to the IT Environment Framework to understand such environments.

IT Environment Framework for Failover Management

Metrics: Failover Management Metrics

As with any professional Discipline, the place to start when dealing with Failover Management specific metrics is with standard metrics categorizations. Standard Metrics Categorizations, or what are commonly referred to as "SMCs," include but are not limited to...

Failover Management Quantitative Metrics: Quantitative metrics for Failover Management often revolve around the "counting" of key constructs that are associated with the Discipline. For example, the number of Failover Items or Entities that have been Created, Edited or Modified, Copied or Cloned, Destroyed, Archived, Restored, etc. (Note the correlations to key Failover Management Verbs!). Counts of Failover Management Stakeholders, such as but not limited to Paying Customers, End Users, Employees, and Consultants, are also very useful.

Failover Management Qualitative Metrics: Qualitative metrics for Failover Management often revolve around concepts such as Failover Management Defects, Failures, Problems, Incidents, and/or Issues. So, for example, if we were to capture the number of Failover Management Defects (i.e. their counts) over time, we could do things like see if Defect quantities are going up or down, over time, allowing us to explore that area for things like correlating Causes and Effects.

Failover Management Time Metrics: When dealing with Failover Management Time Metrics, there are usually two forms. The first was introduced in the previous paragraph, which has to do with capturing and measuring things like Quantitative or Qualitative Metrics, over time. In this case, we capture other metric categories, over time, with the intent to see how they change and perform, based on modifications to the Failover Management Operating Environment. The second form of Time related metrics has to do with system or operational performance, such as in the case of how long it takes to process a Failover Management Request, from the time it is created to the time the Requester gets a satisfactory deliverable that allows him or her to move on with his or her work.
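Both forms of Time Metrics described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names turnaround_hours and counts_per_month, the monthly bucketing, and the sample timestamps are all hypothetical.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def turnaround_hours(created: datetime, delivered: datetime) -> float:
    """Second form: elapsed hours from Request creation to delivery."""
    return (delivered - created).total_seconds() / 3600.0


def counts_per_month(timestamps) -> Counter:
    """First form: count events (e.g. Requests or Defects) per calendar month."""
    return Counter(ts.strftime("%Y-%m") for ts in timestamps)


# Made-up sample data: (created, delivered) pairs for three Requests.
requests = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 17, 0)),
    (datetime(2024, 1, 20, 9, 0), datetime(2024, 1, 21, 9, 0)),
    (datetime(2024, 2, 2, 8, 0), datetime(2024, 2, 2, 12, 0)),
]

average_turnaround = mean(turnaround_hours(c, d) for c, d in requests)
monthly_requests = counts_per_month(created for created, _ in requests)
```

Comparing monthly_requests or average_turnaround across periods is the kind of trend view the first form of Time Metrics is meant to provide.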

Failover Management Utilization Metrics: Utilization Metrics specifically have to do with the consumption of Failover Management specific solutions or deliverables. For example, tracking the number of Failover Management Service Requests, over periods of time, along with their corresponding Failover Management Deliverables, allows one to measure how active Failover Management Services are against other Services that may exist within the Enterprise.

Failover Management Financial Metrics: As is always the case for any single Discipline, Financial Metrics for Failover Management always revolve around things like revenue, expenses, and profits, both for operators of the Service or Services and for consumers of the Service or Services. For example, if a Failover Management Request is invoked by a Failover Management Customer (acting as the "Requester"), it becomes important to be able to identify and understand what the cost is to that Customer who is invoking the Request, and it also becomes important to understand why that cost is what it is. In the case of Services that do not yield revenue or profits, measuring costs is a strong way to, at the very least, help understand the costs associated with each Service being performed by, within, external to, and for the Enterprise and its Customers.

Note: It's important to understand that, when it comes to metrics, enterprises should take a "Crawl," "Walk," "Run" approach to collecting, working with, and understanding them. That is, you cannot get to complex metrics collection, dissection, analysis, and understanding until you start with basic metrics and slowly work your way to more complex metrics representations.


Services: Failover Management as a Set of Services (a.k.a. Failover Management Services)

One of the most important concepts you will learn about Failover Management (or any Discipline, for that matter) is the notion of implementing the Discipline as an accountable, planned, controlled, transparent, and managed "Service."

In short, a Service represents a logically "bounded" and repeatable set of work types, activities or tasks that are performed by humans and/or machines, with the specific intent to provide outputs or deliverables, in the form of solutions for the requesting Stakeholders who are commonly considered the customers of such Services. In other words, we perform and/or provide a Service to deliver very specific solutions to very specific Stakeholders who are looking for a means to solve a certain problem they have.

A Failover Management Service is defined as:

"1. A set of solutions, either transactional (i.e. Transactional Failover Management Services) or dial-tone (i.e. Dial-Tone Failover Management Services), that are being or have been put in place to yield an intended, controlled, expected, repeatable and measurable set of results or deliverables for Failover Management specific Customers, Consumers or Clients.

NOTE: Failover Management Service Consumers or Clients can be either Human Resources or Systems."

All Services, including Failover Management Services, can be performed manually (i.e. by people), automatically (i.e. by machines such as Computers), or by a combination of the two (i.e. a hybrid that is both manual and automated).

Also, all Services, including Failover Management Services, can be either transactional or dial tone, in nature.

In the case of Transactional Services for Failover Management, a Service Request is submitted and that Request is fulfilled as part of a process that is either manual, automated, or a hybrid of both (e.g. a Service to perform maintenance on your Failover Management System).

In the case of Dial Tone Services for Failover Management, a Service is expected to be up, running, available, and accessible to an End User so that he/she/it may perform some controlled and highly repeatable function (e.g. a "Failover Management System" that is up and running all the time).

Failover Management Service Components: The successful implementation of Failover Management as a set of Services for your enterprise usually implies that a number of key components have been established to support it. These components are:

  1. A clearly documented and socialized Failover Management Service Owner that is held accountable for Service performance, quality, and cost.
  2. A clearly documented and socialized Failover Management Service Provider, Organization or Group who is performing the Service or work.
  3. A clearly documented and socialized inventory of all Failover Management Service Inputs, including Failover Management Service Requests and any artifacts necessary to support such Requests so that consumers of the Service know how to engage and request or take advantage of them.
  4. For every Failover Management Service Input, a clearly documented and socialized inventory of Failover Management Service Outputs, making it clear to consumers what they can expect to receive as a result of a successful Service Request.
  5. For every Failover Management Service Input, a clearly documented and socialized inventory of the work being performed by the Service Provider to achieve such Outputs or Deliverables.
  6. For every Failover Management Service Input, a clearly documented and socialized inventory of Service Level Agreements (e.g. Service Availability, Service Duration, Service Guarantees, etc.) that can be used to set expectations and measure actuals against for said Service Outputs.
  7. Clearly specified Failover Management Service Costs that help set expectations for Service Requesters (i.e. the cost of a request) and that provide clear transparency to the organizations that fund and sponsor such Services (i.e. the Total Cost of Ownership (TCO) of your Service(s)).
  8. Failover Management Service Request Patterns (Estimation, Creation, Modification, Decommission, Support/Incidents, Complaints, etc.) in order to create intuitive and repeatable user experiences across different Service Types.
  9. Clearly understand what Failover Management Service Resources are required, human or otherwise, to create and deliver your Failover Management Service Deliverables, in a repeatable, cost-efficient, timely, and high quality manner.
  10. For every Failover Management Service Request, understand the chargeback mechanism, in order to recoup your Service Costs.
  11. For every Failover Management Service, it's important to understand the skills that are required, will need to be developed, and will need to be maintained by Service Resources, in order to deliver each Service Deliverable.
  12. It's important to understand who your Failover Management Service Stakeholders are (including but not limited to your Customers, Consumers, Clients, and Sponsors), as well as the types of problems they're trying to solve or the interests they will have in your Services.

Failover Management Ownership: The most important thing to understand about a Failover Management Service is that, in order for such a Service to be successful, there must be a clear and accountable Owner for it. That is, there needs to be a very clear and accountable named person or organization that owns and is fully responsible for the Service, all of its sub-Services and, most importantly, all of the Service's "Outcomes." Without clear ownership, Services are almost never successful. And, for those few occasions where Services are successful without clear ownership, you can assume that they're successful because the people working in those Service areas are acting as heroes, or... those Services are just plain lucky (that kind of luck doesn't last for long).

Failover Management Service Inputs: There are typically two types of inputs to any Failover Management Service. The first is what is known as a "Failover Management Service Request" and the second really represents any and all supporting artifacts that are necessary to support such requests, including but not limited to Data and Information in the form of Documents, either electronic or paper in form. Many would argue that the "money" to pay for the Service execution of the Request would be the third but, for now, we will assume that payment is controlled through the Data and Information provided to the Service Operators, in support of the Request.

Failover Management Service Outputs: The outputs of any Service are often referred to as the Service's Deliverables. Therefore, the readers should be aware that the terms "Failover Management Outputs" and "Failover Management Deliverables" are synonymous and interchangeable. All work performed in any enterprise is, by default, a Service that is being performed for someone else and, therefore, all work or Services yield results. These results are the Service's Outputs or Deliverables and a good Service ensures that such Outputs are appropriately documented to the consumers of said Service. This means that for any given Failover Management Service Request Type or Category there will be one or more clearly defined and documented Outputs or Deliverables, making it clear to the consumer what he, she, or they will get in response to their Request. This can be as simple as an answer to a question or as complex as the Merger of two enterprises.

Failover Management Service Levels: Service Levels represent "performance agreements," contractual or otherwise, that dictate how well a Failover Management Service should perform, most often keeping the Customers, Consumers, Clients or End Users of the Service in mind. Failover Management Service Levels can come in many forms and are often worked out by the Customers paying for the Services and the Service Providers who sell or provide the Services. In many cases, Service Levels are also self-imposed by the Service Providers performing the Services as a means to set expectations for Service Customers. In short, Failover Management Service Levels are constraints, limitations, and/or expectations that are tied directly to Failover Management Service Deliverables. They represent measures for things like quality, efficiency, and cost against said Deliverables or Outputs that allow the consumer of such Services to measure what they actually get against what they expected to get.
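Measuring "what they actually get against what they expected to get" can be sketched as a comparison of an actual value with an agreed target. The example below is an assumption-laden illustration: ServiceLevel, is_met, availability_percent, and the sample targets are hypothetical and not drawn from any particular agreement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevel:
    """An expectation tied to a Service Deliverable."""
    name: str
    target: float
    higher_is_better: bool  # True for availability, False for durations or costs

    def is_met(self, actual: float) -> bool:
        """Compare a measured actual against the agreed target."""
        if self.higher_is_better:
            return actual >= self.target
        return actual <= self.target


def availability_percent(uptime_minutes: float, total_minutes: float) -> float:
    """Availability of a Dial Tone Service over a measurement window."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * uptime_minutes / total_minutes


# Made-up example: a 30-day window (43,200 minutes) with 30 minutes of downtime.
availability_target = ServiceLevel("Service Availability (%)", 99.9, True)
duration_target = ServiceLevel("Service Duration (hours)", 24.0, False)

measured_availability = availability_percent(43170, 43200)
availability_ok = availability_target.is_met(measured_availability)
duration_ok = duration_target.is_met(30.0)  # a Request that took 30 hours
```

The higher_is_better flag is one simple way to treat availability-style and duration-style Service Levels with the same comparison logic.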


Service Paradigms: Centralized Failover Management vs. Federated Failover Management

Assuming an enterprise pursues the establishment of Failover Management as a set of controlled Services, there are three common paradigms for doing so. These include:

  1. A "Centralized Failover Management" implementation paradigm
  2. A "Federated Failover Management" implementation paradigm
  3. A "Hybrid Failover Management" implementation paradigm

Centralized Failover Management is defined as:

"1. The term or phrase that implies establishing and/or practicing the Discipline known as Failover Management as a concentric and singular set of organizations and services, usually in order to serve an entire enterprise, regardless of geographic location, further implying full centralization and no federation of any and all Failover Management associated Work, Activities, Actions, Tasks, Capabilities and/or Services."

Federated Failover Management, which is also referred to as Decentralized Failover Management, is defined as:

"1. The term or phrase that implies establishing and/or practicing the Discipline known as Failover Management in multiple pockets, communities, or organizations, further implying no centralization in the implementation and execution of Failover Management associated Work, Activities, Actions, Tasks, Capabilities and/or Services."

There are clear tradeoffs to each of the two models. For example, in a Centralized paradigm, it's normally easier to coordinate work and provide broad coverage, across many areas of the enterprise and relevant stakeholders. However, it becomes far more difficult for a centralized organization to properly fund and staff resources and services in order to perform all required work across all stakeholders, in a much larger enterprise.

It's also important to note that a third paradigm exists as an option. This is known as a Hybrid Failover Management paradigm or model. In this case, there is a centralized Failover Management organization that is often responsible for things like centralized governance, command, control, and communications, while federated staff and services deal with localized forms of Failover Management. In this type of paradigm, federated staff and services usually report directly into their local management but may have matrix reporting or responsibilities into the Centralized Failover Management organization.


Principles & Best Practices: Common Principles and Best Practices for Failover Management

A "Principle" is defined as being: "A professed assumption, basis, tenet, doctrine, plan of action or code of conduct for activities, work or behavior." Therefore, we can deduce the definition of "a Failover Management Principle" to be:

Failover Management Principle: "1. A professed assumption, basis, tenet, doctrine, plan of action or code of conduct for any activities, work or behavior associated with the Discipline known as Failover Management."

A "Best Practice" is defined as being: "One or more Activities, Actions, Tasks or Functions that often do not conform with strict Standards and that have evolved, over time, to be considered as conventional wisdom for consistently and repeatedly achieving Outcomes or Results that can be measured as being equal to or above acceptable norms." Therefore, we can deduce the definition of "a Failover Management Best Practice" to be:

Failover Management Best Practice: "1. One or more Failover Management related Activities, Actions, Tasks or Functions that often do not conform with strict standards and that have evolved, over time, to be considered as conventional wisdom for consistently and repeatedly achieving Outcomes or Results that can be measured as being equal to or above acceptable norms."

The plural form of this term would be "Failover Management Best Practices."

Common Failover Management related principles and best practices exist to help achieve higher than average expectations of quality and to ease in the implementation, support, operations, and future change associated with the solutions industry professionals put in place to address the needs of this Discipline and all its related stakeholders.

While this entire document is meant to represent and serve as a set of common principles and best practices for Failover Management, the following list represents a summary of some very basic examples of what implementers, supporters, and operators of Failover Management should constantly be working toward:

Principle or Best Practice Description
Establish and always have very clear Ownership for Failover Management. Establishing, publishing and socializing clear Ownership for Failover Management allows an enterprise and all its Resources, regardless of their geographic location, to assign accountability for all aspects of the Discipline. It also ensures that there's always at least one person that everyone can go to for transparency into the Discipline as well as for handling work that is associated with the Discipline.
Define, Collect, and Manage Relevant Failover Management Inventories. As an IT professional, there are probably few things that are as important as knowing what is or is not in your portfolio, as well as understanding key traits about your portfolio. You cannot achieve this without the transparency provided by your inventories. Therefore, it is critical that you clearly define, collect, manage, and govern any and all relevant Failover Management inventories. Lack of Failover Management Inventories means no transparency, a chaotic and immature environment, and (even worse) the implication that you don't know how to do your job.
Always use standard terminology for Failover Management, in order to standardize communications between stakeholders. It is often argued that the biggest mistake you can make is to create your own words and/or your own definitions, when communicating with others. There is no place where this is more accurate than in the field of Information Technology. IT Stakeholders make up their own words and definitions far too often, or let their business constituents do so. When you make up words or definitions, or you let others do so, you're creating a grave injustice for your organization. Self invented terminology and grammar often leads to poor communications, which in turn leads to redundancy of solutions, higher complexity of environments, slower delivery times, and much higher costs. Therefore, the IF4IT always recommends that you leverage standard terminology for Failover Management, whenever possible.
Centralization of Failover related data. While often impossible to centralize and collocate all Failover related data and information, especially in a geographically dispersed environment, Failover Management related stakeholders should always strive to centralize all data and information. The goals are to eliminate data fragmentation, improve source of truth for data, reduce the number of systems needed to support stakeholders, reduce the complexity of solutions, improve usability, and to ultimately reduce the costs associated with Failover Management.
Clearly define, implement, track, and analyze Failover Management Metrics. In order to successfully set up the discipline of Failover Management and its related Services, it is critical to clearly define, track, and constantly analyze Failover Management metrics. Such metrics include but are not limited to Supply and Demand Metrics (i.e. Operational Metrics), Performance Metrics, Quality Metrics, and Financial Metrics.
Transparency of Failover related data. Stakeholders should always strive to make any and all Failover Management data transparent to all other appropriate stakeholders, at a minimum, and often to the entire enterprise. The exception is when private user data must be protected. Many stakeholders often make the mistake of treating internal operational data as private or protected. This often creates a data silo and will often lead to internally silo-ed organizations that revolve around such data silos.
Do not let "perfection" of Failover Management solutions stand in the way of "good enough solutions". Often, Failover Management stakeholders "overthink" solutions, leading to the impression that best-of-breed or perfect solutions are more effective than "good enough" solutions. Experience tells us that "good enough" is, almost always, the better path to follow. We live in an age where technologies grow old in the blink of an eye. Even the implementation of something that looks perfect, today, will look antiquated, tomorrow. This is especially true if your enterprise doesn't have a long term funding plan and commitment to improvements and upgrades of the solution(s) put in place.
Follow industry Standards, Best Practices, and Guiding Principles for Failover Management, whenever possible. One of the most common errors many enterprises make is to create solutions from scratch or without the guidance, assistance and/or experience of others who have created such solutions, before them. Whenever possible, the IF4IT recommends that you research existing Standards, Best Practices, and Guiding Principles to avoid the mistakes of others, while also gaining from their successes. Remember, we live in a vast world. Chances are very high that someone else has already experienced the pain you're about to create for yourself. Wise people will always look to learn from such people's experiences before they go down the road of implementing their own solutions.
Work toward and maintain a Single Source of Truth (SSoT), whenever possible. While it may be impossible to truly maintain a Single Source of Truth (SSoT) for all data items at all times, especially in the case where the same data entity or instance enters an enterprise through unique data channels, it is an accepted, industry-wide best practice to always work toward such a goal.

Further Reading and Reference Material for Failover Management

The Information Technology (IT) Learning Framework. A tutorial that helps readers understand Information Technology and how disciplines, such as this one, fit into the bigger picture of IT Operations.

Copyright 2009 - Present by The International Foundation for Information Technology (IF4IT) : Privacy Policy and Terms of Use