This publication covers the various types of Data Sources that are critical for successful Data, Information, and Records Management efforts, which in turn involve and rely on Data Lineage (DL), Master Data Management (MDM), and Data Governance (DG).
Defining a Data Source
A Data Source is a location from which one or more persons or systems can access one or more types of data.
NOTE: The word Source is preferred over the word System because a Source can represent a Person, Organization, System, Repository, or any other entity from which data can be accessed.
Data Source Types
The following Data Source Types are relevant for Data, Records, and Information Management:
- Source of Origin (SoO)
- Source of Truth (SoT)
- Single Source of Truth (SSoT)
- Source of Mastering (SoM)
- Source of Record (SoR)
- Dirty Data Source (DDS)
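As a rough illustration, the types listed above could be modeled as an enumeration attached to a data source. This is only a sketch: the `DataSourceType` and `DataSource` names, and the idea of letting one source carry several types at once, are assumptions made for the example rather than part of any standard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class DataSourceType(Enum):
    """The Data Source Types from the list above, keyed by abbreviation."""
    SOO = "Source of Origin"
    SOT = "Source of Truth"
    SSOT = "Single Source of Truth"
    SOM = "Source of Mastering"
    SOR = "Source of Record"
    DDS = "Dirty Data Source"


@dataclass
class DataSource:
    """Hypothetical description of one data source."""
    name: str
    kind: str  # e.g. Person, Organization, System, Repository
    types: Set[DataSourceType] = field(default_factory=set)


# Assumption for illustration: one system can play several roles at once.
hr_system = DataSource(
    name="Employee Management System",
    kind="System",
    types={DataSourceType.SOO, DataSourceType.SOT},
)
```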
Source of Origin
A Source of Origin (SoO) is an entity in which data is first created. Data Lineage for a Data Type and its Data Records begins at the Source of Origin and often continues to other people, systems, and locations (all known as Data Targets), where that data may or may not change.
It is important to note that a single real-world entity may have multiple Sources of Origin. For example, if “Jane Doe” is an employee, the Human Resources department may have originally entered her name and data about her in the Employee Management System, where her existence as an employee started. At some other point, “Jane Doe” may have also purchased products from the company she works for, and the Sales organization may have entered her name and data about her in the Sales Management System, where her existence as a customer started.
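The “Jane Doe” case can be sketched as two records that point back to different Sources of Origin. The `person_key` field is a hypothetical identifier added only so the example can tell that both records describe the same person; in practice, establishing that link is itself a matching problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OriginRecord:
    person_key: str        # hypothetical key tying records to one real-world person
    source_of_origin: str  # where this representation was first created
    role: str              # what the person is in that source


records = [
    OriginRecord("jane-doe", "Employee Management System", "employee"),
    OriginRecord("jane-doe", "Sales Management System", "customer"),
]


def sources_of_origin(all_records, person_key):
    """Return the distinct Sources of Origin recorded for one person."""
    return sorted(
        {r.source_of_origin for r in all_records if r.person_key == person_key}
    )


jane_origins = sources_of_origin(records, "jane-doe")
```

Here `jane_origins` contains both systems, which is the point of the example: lineage for the same person starts in more than one place.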
Source of Truth
A Source of Truth (SoT) is a data source that has been deemed to have an appropriate level of quality, integrity, consistency, and availability, which establishes an adequate level of trust for the Data Targets that access and use its data.
In many complex environments, there may be multiple representations of the same data in different systems. For example, a person “Jane Doe” may originate and exist in multiple systems simultaneously. She might be an employee and exist in the Employee Management System, and she may also buy products from the company she works for and therefore exist in the Customer Management System. The two representations of “Jane Doe” may differ. For example, in one system her name may be written “Jane Doe”, while in another it may be written “Doe, Jane”. The representations are different, but both are accurate and both represent the same person.
In this example, the Human Resources organization and all systems that generate employee-related reports may use the Employee Management System as their SoT, while the Sales organization and all systems that generate sales-related reports may use the Customer Management System as their SoT. In each case, context dictates which system represents the appropriate SoT.
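One way to picture "context dictates the SoT" is a simple lookup from reporting context to the designated system. The context labels (`"employee"`, `"sales"`) and the `source_of_truth` function are illustrative assumptions, not an established API.

```python
# Hypothetical mapping from a consumer's context to its designated Source of Truth.
SOT_BY_CONTEXT = {
    "employee": "Employee Management System",
    "sales": "Customer Management System",
}


def source_of_truth(context: str) -> str:
    """Return the SoT designated for a context, or fail loudly if none exists."""
    try:
        return SOT_BY_CONTEXT[context]
    except KeyError:
        raise LookupError(
            f"No Source of Truth has been designated for context {context!r}"
        ) from None
```

Raising an error for an unknown context, rather than falling back to a default system, reflects the idea that trust has to be established explicitly for each group of Data Targets.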
Single Source of Truth
A Single Source of Truth (SSoT) is the name given to a data source that has been deemed the single most trusted source for one or more downstream Data Targets or consumers. In short, an SSoT is just another name for an SoT. However, the term is often misused to imply that there can be one and only one place where data is good and can be accessed for consumption and processing by all.
Unless you are working in a very simple environment, it is almost impossible to achieve a Single Source of Truth for data.
Source of Mastering
A Source of Mastering (SoM) is a data source where multiple different records from different Sources of Origin and/or Sources of Truth may be combined to create a new SoT for other downstream Data Targets.
In the example where “Jane Doe” exists in both the Employee Management System and the Customer Management System, an enterprise might collect and combine the two records to create a new representation of “Jane Doe” (called a mastered representation or mastered record) that can be used for purposes beyond employee management and sales.
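A minimal sketch of mastering the two “Jane Doe” records might look like the following. The record layout (`name` and `role` fields), the `normalize_name` rule, and the decision to reject records whose names still disagree after normalization are all assumptions made for illustration; real mastering rules are usually far more involved.

```python
def normalize_name(name: str) -> str:
    """Turn 'Last, First' into 'First Last'; otherwise just trim whitespace."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        return f"{first} {last}"
    return name.strip()


def master(records_by_source: dict) -> dict:
    """Combine per-source records for one person into a mastered record."""
    names = {normalize_name(rec["name"]) for rec in records_by_source.values()}
    if len(names) != 1:
        # Assumption: disagreeing names need human review rather than a guess.
        raise ValueError(f"Names do not agree after normalization: {sorted(names)}")
    return {
        "name": names.pop(),
        "roles": sorted(rec["role"] for rec in records_by_source.values()),
        "sources": sorted(records_by_source),  # keep lineage back to each source
    }


mastered = master({
    "Employee Management System": {"name": "Jane Doe", "role": "employee"},
    "Customer Management System": {"name": "Doe, Jane", "role": "customer"},
})
```

The mastered record keeps the list of contributing sources so that Data Lineage back to each original system is not lost.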
Source of Record
A Source of Record (SoR) is a data source that has been deemed the entity in which data that represents evidence (i.e., Records) is stored and from which it can be accessed, as part of the greater discipline known as Records Management.
In the above example, Human Resources and employee evidence will come from the Employee Management System, while Sales and customer evidence will come from the Customer Management System.
NOTE: A Source of Record is always a Source of Truth.
Dirty Data Source
A Dirty Data Source (DDS) is a data source whose data has been deemed too untrustworthy to use. For example, the data in a DDS may be incomplete and/or inconsistent and may not be suitable for use by any system until it is somehow improved or mastered.
For example, there may be an application with an online form that has an attribute called “Name”. When people fill in their respective names, we may get many different name formats and levels of completeness, such as:
- “Jane Doe”
- “King of the Hill”
- “T. Brawly”
- “Smith, Robert”
Each of the above values has a different format and a different level of completeness. The data is, therefore, untrustworthy and cannot be used for things like automatically generating very formal letters that start with the string “Dear” followed by the person’s name.
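To make the inconsistency concrete, the four sample values from the form can be sorted by shape with a few regular expressions. The shape names and patterns below are assumptions chosen for this example and are deliberately naive (ASCII letters only); real names are far more varied, which is part of why such a source is hard to trust.

```python
import re

# Hypothetical name shapes; each pattern must match the whole value.
NAME_SHAPES = {
    "first_last": re.compile(r"[A-Z][a-z]+ [A-Z][a-z]+"),
    "last_first": re.compile(r"[A-Z][a-z]+, [A-Z][a-z]+"),
    "initial_last": re.compile(r"[A-Z]\. [A-Z][a-z]+"),
}


def classify_name(value: str) -> str:
    """Return the first shape whose pattern matches the whole value."""
    text = value.strip()
    for shape, pattern in NAME_SHAPES.items():
        if pattern.fullmatch(text):
            return shape
    return "unrecognized"


samples = ["Jane Doe", "King of the Hill", "T. Brawly", "Smith, Robert"]
shapes = {sample: classify_name(sample) for sample in samples}
```

With these patterns, the four samples fall into four different buckets (`first_last`, `unrecognized`, `initial_last`, and `last_first`), so no single template could safely consume the “Name” attribute as it stands.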