Is the Data Vault too complex?
This was a very interesting topic that came up on LinkedIn the other day, so I wanted to address it here too.
There seem to be quite a few people who think that Data Vault models are harder to build and manage than what we do today in most shops. So let me explain how I came to learn Data Vault data modeling.
Before learning Data Vault, I had successfully built several 3NF OLTP, 3NF DW, and Kimball-style Dimensional data warehouses (and wrote about it with Bill Inmon and Len Silverston in the original Data Model Resource Book).
In other words, I had a reasonable amount of experience in the subject area (data modeling and data warehousing).
I personally found Data Vault extremely easy to learn as a modeling technique (once I took the time to study it a bit). At the time that meant reading the old white papers, attending some lunch & learns with Dan Linstedt and then building a few sample models.
I was definitely skeptical at first (and asked lots of questions at the public lunch & learns). I did not care about MPP, scalability, or many of the other benefits Dan mentioned. I just knew from experience there were a few issues I had seen with the other approaches when it came to building a historical enterprise data store and was hoping Data Vault might be a solution.
In comparison to trying to learn how to design and load a Type 2 slowly changing dimension, Data Vault was a piece of cake (for me anyway).
Once I was convinced, I then introduced the technique to my team in Denver – who had virtually no data warehouse experience.
It was universal – everyone from the modelers to the DBAs to the ETL programmers found the technique very easy to learn.
Our investment: One week of training from Dan for 7 people and 3 or 4 days of follow-on consulting where Dan came in once a month (for a day) to do a QA review on our models and load routines and mentor us on any issues we were having.
Dan did not make much $$ off of us. 😦
Since then, I have found that experienced 3NF modelers pick up the technique in no time flat.
Why is that?
Because Data Vault relies on solid relational principles, experienced 3NF modelers seem to grasp it pretty fast.
Modelers who only have experience with star schemas, on the other hand, seem to have a bit of a hard time with the approach. For some of them it is a paradigm shift in modeling technique (i.e., feels very unfamiliar – “too many tables and joins”); for others it is almost a dogmatic objection as they were (sadly) taught that dimensional/star was the only “right” way to do data warehousing.
They are just not open to a new approach for any reason (sad but true). 😦
The biggest issue I have seen with clients is a reluctance to try the approach for fear of failure, both because they don’t personally know anyone (other than me) who has used it and because they think it is easier (and cheaper?) to find dimensional modelers.
This happens, even if they agree in concept that Data Vault sounds like a very valid and flexible modeling approach.
As we all know, it takes $$ to train people on star schema design too, so my advice is that if you have a team of people who know 3NF but don’t know dimensional, train them on Data Vault to build your EDW, then hire one or two dimensional modelers to build your end user reporting layer (i.e., data marts) off the Data Vault.
So that’s my 25 cent testimonial. (You get it for free!)
If you want to learn more about Data Vault, check out my presentations on SlideShare or click on the Super Charge book cover (below my picture in the sidebar) to buy the Data Vault modeling book.
Check it out and let me know what you think in the comments. How do we get people over the fear of trying Data Vault?
Talk to you later.
Kent







