Data Vault vs. The World (1 of 3)
Okay, maybe not “the world” but it does sometimes seem like it.
Even though Data Vault has been around for well over 10 years now, and has multiple books, videos, and tons of success stories, I am constantly asked to compare and contrast Data Vault with approaches generally accepted in the industry.
What’s up with that?
When was the last time you got asked to justify using a star schema for your data warehouse project?
Or when was that expensive consulting firm ever asked “so what data modeling technique do you recommend for our data warehouse?”
Such is the life of the “new guy.” (If you are new to Data Vault, read this first.)
So, over the next few posts, I am going to lay out some of the explanations and justifications I use when comparing Data Vault to other approaches to data warehousing.
The first contestant: Poor man’s ODS vs. Data Vault
This approach entails simply replicating the operational (OLTP) tables to another server for read-only reporting. It can serve as a partial data warehouse solution, using something like Oracle GoldenGate to support near-real-time operational reporting while minimizing the impact on the operational system.
This solution, however, does not adequately support dimensional analysis, nor does it allow for historical tracking of changes to the data (beyond any temporal tracking inherent in the OLTP data model).
A big risk of this approach is that as the OLTP structures continue to morph and change over time, reports and other extracts that access the changed structures will of course break as soon as the change is replicated to the ODS.
How does Data Vault handle this?
Data Vault avoids these problems by using structures that are not tightly coupled to any one source system. As the source systems change, we simply add Satellite and Link structures as needed. In the Data Vault methodology we do not drop any existing structures, so reports continue to work until we can properly rewrite them to take advantage of the new structures. If entirely new data is added to a source, we would likely end up adding new Hubs as well.
An additional advantage of this loosely coupled approach is that we can load data from multiple sources. If we simply replicated specific OLTP structures, we could not easily integrate other source system feeds – we would have to build another repository to do the integration (which would likely entail duplicating quite a bit of the data).
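The loose coupling described above can be sketched in a few lines. This is a minimal illustration using Python’s built-in sqlite3, not a production Data Vault load: the table names (hub_customer, sat_customer_crm, sat_customer_billing) are hypothetical, and a plain integer surrogate key stands in for the hash or sequence keys a real implementation would use. The point is simply that a second source is absorbed by adding a Satellite, so a report written against the original structures keeps working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hub: just the business key plus load metadata (illustrative names).
cur.execute("""
    CREATE TABLE hub_customer (
        customer_hk   INTEGER PRIMARY KEY,   -- surrogate key (simplified)
        customer_bk   TEXT NOT NULL UNIQUE,  -- the business key
        load_date     TEXT,
        record_source TEXT
    )""")

# Satellite holding the first source system's descriptive attributes.
cur.execute("""
    CREATE TABLE sat_customer_crm (
        customer_hk INTEGER REFERENCES hub_customer(customer_hk),
        load_date   TEXT,
        name        TEXT,
        email       TEXT,
        PRIMARY KEY (customer_hk, load_date)
    )""")

cur.execute("INSERT INTO hub_customer VALUES (1, 'CUST-001', '2024-01-01', 'CRM')")
cur.execute("INSERT INTO sat_customer_crm VALUES (1, '2024-01-01', 'Acme Ltd', 'a@acme.com')")

# A report written against the existing structures.
report_sql = """
    SELECT h.customer_bk, s.name
    FROM hub_customer h
    JOIN sat_customer_crm s ON h.customer_hk = s.customer_hk
"""
report = cur.execute(report_sql).fetchall()

# A second source (billing) appears: we ADD a Satellite on the same Hub.
# Nothing existing is dropped, so the report above is untouched.
cur.execute("""
    CREATE TABLE sat_customer_billing (
        customer_hk  INTEGER REFERENCES hub_customer(customer_hk),
        load_date    TEXT,
        credit_limit REAL,
        PRIMARY KEY (customer_hk, load_date)
    )""")
cur.execute("INSERT INTO sat_customer_billing VALUES (1, '2024-02-01', 5000.0)")

# The original report still runs unchanged after the new source is added.
report_after = cur.execute(report_sql).fetchall()
print(report_after)  # → [('CUST-001', 'Acme Ltd')]
```

Contrast this with the replicated-ODS approach: there, the billing feed would have forced a change to the replicated customer table itself, breaking the report the moment the change arrived.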
Don’t get me wrong, there is nothing wrong with using replication tools to build real time operational data stores.
In fact, it is an excellent way to offload your operational reporting from the main production server.
It is a tried and true solution – for a specific problem.
It is, however, not the right solution if you are building an enterprise data warehouse and need to integrate multiple sources or report on changes to your data over time.
So let’s use the right tool for the right job.
Data Vault is the newer, better tool.
In the next two posts I will compare Data Vault to the Kimball-style dimensional approach and then to Inmon-style 3NF.
P.S. Be sure to sign up to follow my blog so you don’t miss the next round of Data Vault vs. The World.