Introduction To Data Driven Synthesis
Data Driven Synthesis, also known as Data Compilation, is the process of automatically compiling data into some desired output. Unlike conventional compilers, which translate source code according to the fixed rules of a programming language, Data Synthesizers or Data Compilers take in both data and user-defined rules and generate the desired output from them.
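As a rough illustration of the idea (not any particular product), a data compiler can be thought of as a function that applies user-defined rules to raw records and emits an artifact. The following is a minimal Python sketch; every name, record, and rule in it is hypothetical.

```python
# Minimal, illustrative sketch of a "data compiler": data + rules in, artifact out.
# All names and rules here are hypothetical; real tools are far more sophisticated.

def compile_data(records, rules):
    """Apply user-defined rules to raw records and synthesize an output artifact."""
    lines = []
    for record in records:
        for rule in rules:
            if rule["when"](record):                 # rule condition
                lines.append(rule["emit"](record))   # rule output template
    return "\n".join(lines)

records = [{"name": "Server-01", "type": "host"}, {"name": "CRM", "type": "app"}]
rules = [
    {"when": lambda r: r["type"] == "host", "emit": lambda r: f"Host inventory entry: {r['name']}"},
    {"when": lambda r: r["type"] == "app",  "emit": lambda r: f"Application catalog entry: {r['name']}"},
]

print(compile_data(records, rules))
```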
Data Driven Synthesis is also a paradigm used in Big Data processing, where tools transform data into useful output artifacts such as reports and interactive data visualizations.
If you think that Data Driven Synthesis is a new concept, it’s not. The semiconductor industry has been using synthesis concepts to automatically specify the circuitry layout of massively complex semiconductor circuits for more than twenty years. Companies like Synopsys and Silc Technologies (ultimately acquired by Synopsys) used synthesis to revolutionize how semiconductors were designed, tested, and fabricated. Unfortunately, the general Information Technology (IT) industry has not caught up yet. The bright side is that people are finally getting smarter and are taking advantage of such concepts more frequently.
Why We Use Data Driven Synthesis
We use Data Compilers, which perform Data Driven Synthesis, for three key reasons:
- They’re lightning fast compared to people.
- They’re highly affordable compared to other technology solutions.
- They yield much higher levels of quality than humans can achieve.
Examples of Data Driven Synthesis Solutions
Two common examples of Data Driven Synthesis solutions are the Java Spring framework and the Ruby on Rails framework. Both use metadata to automatically generate large numbers of source code constructs, saving developers a great deal of coding time and effort.
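Neither framework generates code in exactly this way, but the underlying metadata-to-code idea can be sketched briefly. The following Python sketch, with a hypothetical schema, emits a class definition from a metadata description; Spring and Rails apply the same principle using their own metadata formats and templates.

```python
# Hypothetical sketch: generate a simple class from a metadata description.
# This shows only the general idea; Spring and Rails use their own mechanisms.

schema = {"entity": "Customer", "fields": ["id", "name", "email"]}

def generate_class(meta):
    """Emit a Python class definition from entity metadata."""
    field_args = ", ".join(meta["fields"])
    assignments = "\n".join(f"        self.{f} = {f}" for f in meta["fields"])
    return (
        f"class {meta['entity']}:\n"
        f"    def __init__(self, {field_args}):\n"
        f"{assignments}\n"
    )

print(generate_class(schema))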
Another example is a Digital Library Generator, such as our own NOUNZ data compiler, which takes in various types of data, along with user-defined rules, and compiles them into a massive static HTML documentation tree that embeds advanced knowledge constructs, such as highly interactive catalogs, indexes, reports, charts, and visualizations. Tools like this can generate millions of web pages in minutes, directly from data, eliminating the need for armies of people to document an enterprise. Imagine generating your own web site, bigger and more powerful than sites like Wikipedia, in minutes to hours.
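NOUNZ itself is proprietary, so the following is only a toy sketch of the general technique, using made-up records: walk a dataset and emit an interlinked tree of static HTML pages plus an index page that links to them.

```python
# Toy sketch of a digital-library generator: emit one static HTML page per record
# plus an index page that links them all. Not NOUNZ; just the general technique.

from pathlib import Path

records = [
    {"id": "asset-001", "title": "Payroll System", "owner": "Finance"},
    {"id": "asset-002", "title": "CRM Platform", "owner": "Sales"},
]

out = Path("library")
out.mkdir(exist_ok=True)

links = []
for r in records:
    page = out / f"{r['id']}.html"
    page.write_text(
        f"<html><body><h1>{r['title']}</h1>"
        f"<p>Owner: {r['owner']}</p>"
        f"<a href='index.html'>Back to index</a></body></html>"
    )
    links.append(f"<li><a href='{page.name}'>{r['title']}</a></li>")

(out / "index.html").write_text(
    "<html><body><h1>Library Index</h1><ul>" + "".join(links) + "</ul></body></html>"
)
```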
Here are examples of knowledge constructs that can be auto-generated (i.e. synthesized) using the DDS paradigm (a brief sketch of the semantic-relationship case follows the list):
- Catalogs (as in Library Catalogs of sets of Indexes),
- Indexes (as in Library Indexes that group classified sets of artifacts),
- Inventory Reports,
- Graphs and Charts (including Dashboards),
- Interactive Visualizations (e.g. complex 360 degree views of data and filterable data relationship views),
- Semantic Relationships (triples that include subjects, predicates, and objects),
- Millions of HTML links between data and all of the above artifacts, and
- All of the above, combined and neatly organized and interlinked, in one massive Digital Library.
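As promised above, here is a small sketch of the semantic-relationship case: subject-predicate-object triples can be synthesized directly from flat records. The column names used below ("name", "runs_on", "owned_by") are hypothetical.

```python
# Illustrative sketch: derive subject-predicate-object triples from flat records.
# The record fields are hypothetical; real datasets would drive real predicates.

records = [
    {"name": "CRM Platform", "runs_on": "Server-01", "owned_by": "Sales"},
    {"name": "Payroll System", "runs_on": "Server-02", "owned_by": "Finance"},
]

triples = []
for r in records:
    triples.append((r["name"], "runs on", r["runs_on"]))
    triples.append((r["name"], "is owned by", r["owned_by"]))

for subject, predicate, obj in triples:
    print(f"{subject} -- {predicate} --> {obj}")
```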
Here are example solutions that can be auto-generated (i.e. synthesized) with data synthesis tools (i.e. Data Compilers):
- Architecture Models
- Configuration Management Models / Configuration Management Databases
- Knowledge Maps
- Digital Libraries
- Source Code (for frameworks like Java Spring and Ruby on Rails)
One interesting thing to consider is that all of the above can be combined into one synthesized output. This is nearly impossible to achieve with traditional database-centric systems, which would require complex, time-consuming, and expensive integration between each of them.
Disadvantages of Data Driven Synthesis
It’s important to understand that, while Data Driven Synthesis has many advantages and offers solutions for many problems, it does not solve all problems, and there are times when you may want to use a system with a more complex technology stack that includes databases. For example:
- Data Driven Synthesis tools, like data compilers, don’t usually support transactional persistence (as databases do).
- The larger your dataset, the longer the output takes to compile (although very large datasets can still be handled).
- Metadata and data changes require recompiles.
Advantages of Data Driven Synthesis
While there may be some disadvantages to DDS, there are many advantages to using this paradigm. They include, but are not limited to:
- Agile “Fail Fast”: Speed to address change is the primary advantage of compilers. When there’s a problem, you simply change the data or rules and recompile. Data Compilers allow you to quickly set up your data and rules, run the compilation, and see your end state much faster than 3-tier transactional database platforms. You can also very quickly change your data and rules, and rerun the compilation to get to a new state or set of views. 3-tier systems do not allow this iterative and agile flexibility, as you have to change the model, transform old data to fit the new model, and build integrations to bring data into the new model structure. The more data you have in your 3-tier system, the longer, more difficult, and more expensive it is to change. Data Compilers allow for rapid change.
- History and Comparison: Data Compilers allow you to create versionable snapshots of points in time. This means you can compare what your data looked like between any two points in time (see the sketch after this list). Very few 3-tier systems implement such functionality due to its complexity and high costs.
- Multiple Instances: With a compiler you can have multiple instances of the same data or of different data, because the output of each compilation stands alone as a separate instance. Such multiple instances are the foundation for History and Comparison (mentioned in the previous item), but they also mean you can isolate different pockets of results based on context. For example, NOUNZ generates Digital Libraries. I can use the same compiler to generate a Digital Library that contains data about my company or enterprise, and also use it to create a Digital Library that contains data about Astronomy. One tool can generate multiple separate, stand-alone instances. These instances can, if desired, be tied together at a broader level.
- Portability: Most Data Compilers generate stand-alone outputs. Because of this, the results can usually sit on storage as small as a flash drive or a portable hard drive. This means the output can be carried around conveniently on devices like laptops. This is something that is not easily achieved with 3-tier transactional database platforms.
- Lower Complexity and Costs: Because Data Compilers are comparatively simple, they’re much more affordable. They do not require all the integrations, data transformations, modeling, and other work associated with 3-tier transactional database systems. This makes them much simpler and faster to leverage, which translates directly into cost savings and cost avoidance.
- Versioning and History: Every compilation output is a snapshot in time that can be easily versioned, archived and restored.
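As a concrete illustration of the snapshot idea referenced above (not any particular tool's behavior), each compilation run can simply write into its own timestamped directory, which makes versioning, archiving, and side-by-side comparison straightforward.

```python
# Illustrative sketch: each compilation run lands in its own timestamped snapshot
# directory, so any two runs can be compared or archived independently.

from datetime import datetime
from pathlib import Path

def compile_snapshot(records, base_dir="snapshots"):
    """Write a simple inventory report into a new, timestamped snapshot folder."""
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S%f")
    snapshot = Path(base_dir) / stamp
    snapshot.mkdir(parents=True, exist_ok=True)
    report = "\n".join(f"{r['name']}: {r['status']}" for r in records)
    (snapshot / "inventory.txt").write_text(report)
    return snapshot

# Two runs against changed data yield two independent, comparable snapshots.
first = compile_snapshot([{"name": "Server-01", "status": "active"}])
second = compile_snapshot([{"name": "Server-01", "status": "retired"}])
print(first, second)
```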
Summary and Conclusions
Years ago, compilation of complex artifacts was considered slow and tedious due to the limitations of computer processing technology. However, computers are now much faster and memory can hold far greater quantities of data, making Data Compilation and the paradigm of Data Driven Synthesis far more effective and efficient. While it does not solve all problems, DDS is a very powerful alternative to expensive multi-tier applications that require significant physical computing infrastructure and virtual software infrastructure to run. This is especially true when it comes to processing data and turning it into complex structures that facilitate knowledge management and sharing, such as Digital Libraries. In the end, DDS poses a simple question: why deliver and maintain a complex solution that costs significant time, money, and energy to deploy and operate when you can do the same, and far more, for a fraction of the time, effort, and investment? We believe that sometimes you have to think outside the boundaries that the industry imposes on itself.