The Data Warrior

Changing the world, one data model at a time. How can I help you?

Is the Data Vault too complex?

This was a very interesting topic that came up on LinkedIn the other day, so I wanted to address it here to.

There seems to be quite a few people who think that Data Vault models are harder to build and manage than what we do today in most shops. So let me explain how I came to learn Data Vault Data Modeling.

Before learning Data Vault, I had successfully built several 3NF OLTP, 3NF DW, and Kimball-style Dimensional data warehouses (and wrote about it with Bill Inmon and Len Silverston in the original Data Model Resource Book).

In other words, I had a reasonable amount of experience in the subject area (data modeling and data warehousing).

I personally found Data Vault extremely easy to learn as a modeling technique (once I took the time to study it a bit). At the time that meant reading the old white papers, attending some lunch & learns with Dan Linstedt and then building a few sample models.

I was definitely skeptical at first (and asked lots of questions at the public lunch & learns). I did not care about MPP, scalability, or many of the other benefits Dan mentioned. I just knew from experience there were a few issues I had seen with the other approaches when it came to building a historical enterprise data store and was hoping Data Vault might be a solution.

In comparison to trying to learn how to design and load a Type 2 slowly changing dimension, Data Vault was a piece of cake (for me anyway).

Once I was convinced, I then introduced the technique to my team in Denver – who had virtually no data warehouse experience.

It was universal – everyone from the modelers to the dbas to the ETL programmers found the technique very easy to learn.

Our investment: One week of training from Dan for 7 people and 3 or 4 days of follow-on consulting where Dan came in once a month (for a day) to do a QA review on our models and load routines and mentor us on any issues we were having.

Dan did not make much $$ off of us. 😦

Since then, I have found that experienced 3NF modelers pick up the technique in no-time flat.

Why is that?

Because Data Vault relies on solid relational principles, experienced 3NF modelers seem to grasp it pretty fast.

Modelers who only have experience with star schemas, on the other hand, seem to have a bit of a hard time with the approach. For some of them it is a paradigm shift in modeling technique (i.e., feels very unfamiliar – “too many tables and joins”),  for others it is almost a dogmatic objection as they were (sadly) taught that dimensional/star was the only “right” way to do data warehousing.

They are just not open to a new approach for any reason (sad but true). 😦

The biggest issue I have seen with clients is a reluctance to try the approach for fear of failure because they don’t personally know anyone (other than me) who has used the approach and because they think it is easier (and cheaper?) to find dimensional modelers.

This happens, even if they agree in concept that Data Vault sounds like a very valid and flexible modeling approach.

As we all know, it takes $$ to train people on star schema design too, so my advice is that if you have a team of people who know 3NF but don’t know dimensional, train them on Data Vault to build your EDW, then hire one or two dimensional modelers to build your end user reporting layer (i.e., data marts) off the Data Vault.

So that’s my 25 cent testimonial. (You get if for free!)

If you want to learn more about Data Vault, check out my presentations on SlideShare or click on the Super Charge book cover (below my picture in the sidebar) to buy the Data Vault modeling book..

Check it out and let me know what you think in the comments. How do we get people over the fear of trying Data Vault?

Talk to you later.

Kent

Advertisements

Single Post Navigation

6 thoughts on “Is the Data Vault too complex?

  1. I had substantial background in both 3NF and Star solutions when I was exposed to DV. I found it intuitive and natural from the start. But, most importantly, I found it to address many of the nastiest, common problems of the other approaches. Yes, it’s different. But I believe that, were one to learn DW from the up in each of the 3 paradigms…having no previous bias…one would conclude that DV was the most appropriate for the front-end of the warehouse. As Kent points out, you’ll still want star-based DMs on the backend. I see DV as a part of the overall architecture…not a full replacement.

  2. Production data models leave a deep impression in operator’s memories. Statistical, analytical, reporting, and mining models are sufficiently different and may seem alien to the uninformed.

    The reasons for the differences in model and methodology are easy to understand: each demands a distinct and diverse end-point. Production data serve well-knwn business processes. Analytical data (my shorthand for all the non-production uses) fuel the derivation engines of the Magic Data 8 Ball. (Answer uncertain. Try again later.)

    Like a kata, the production demands, forms, states, stages, and uses are drilled into the early students of production data. It may be harder for some to extend (or even break) the kata before they can explore what may seem far less obvious.

    As you point out in your blog entry, the concepts embraced in this “different” data philosophy are natural and understood in other fields: mostly scientific, or meta-mathematical — but rarely in engineering.

    As it is the schools of engineering that have influenced, badly in many cases, the foundational training of our next generation technologists, the rigidity of engineering can be seen, sometimes, in the difficulty when breaking the norms of the production data kata. A broader understanding of the natural and theoretical worlds would make this transition easier. But it is as it is.

    A natural law in chemistry is that the most common structure of a molecule can be found when the internal energy of the molecule is lowest (of all possible combinations of its atoms). I hold the same theory regarding data. While not the lowest energy state possible of all data elements (in a manner of speaking), the data vault provides an obvious reduction in internal energy for a schema.

    We may not like to acknowledge the fact, but the structure of business process data demands a higher state of internal energy, reducing the energy consumption necessary in business processes or function. The internal energy level in a proposed analytical data state is theoretically much lower. However, by removing the energy demands to support business processes, the data appear to lose a lot of structure (when, in fact, the data take on the greatest possible structure).

    Perhaps this is the greatest impediment to those who expect to see familiar business process structures in their data. They observe a “natural state” of data. Unfortunately for many data professionals, they have never seen data in a natural state.

    As you mention, for those who are comfortable with 3nf data the transition to a more natural state seems easier. I have concluded that the 3nf is a good break-away point to migrate from production to analytical states. Those who expect cognitive compatibility between production and analytical data may experience greater transitional dissonance.

    In short, complexity is different to different people. After becoming comfortable with a high-complexity field of production data, the natural data state may seem so unfamiliar as to mimic complexity, when in fact it is the reduction of such.

  3. Not hard to learn, but a bit hard to implement. I am in the process of building a SCD Type 2 Dimension based upon four hubs with their satellites and three link tables and the join structures are killing my brain 🙂 I try to build point in time tables using autogeneration of sql (we are on Oracle ) and I am searching for relevant and suitable documents on the web telling me exactly how to do that. Any ideas or pointers in the right direction will be much appreciated !

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: