The Data Warrior

Changing the world, one data model at a time. How can I help you?


Want to be a Data Vault implementation Black Belt?

Are you tired of seeing failed data warehouse projects?

Tired of being part of the problem or having to clean up after someone else messed it up?

Well, now you can be part of the solution and kick implementation failures in the <you know what>.

I am pleased to tell you that my good friend, Dan Linstedt, creator of the Data Vault, has just launched a new, online, Data Vault training portal.

And it is now open to the public!

The first class you can get is on Data Vault implementation.

It is way cool!

The quality is excellent and the material is even better (including material I have never seen in a class). Dan not only covers the right way to implement the Data Vault, he also gives examples of how he has done it and provides code templates (for multiple databases) you can use on a real project.

Why would you want to sign up for this training? Well lots of reasons:

  1. You read the Data Vault modeling book, but can’t quite see how to load the model after it is built.
  2. It’s less expensive than face-to-face training. No time off or travel required!
  3. You can rewind and watch the training at your own pace (no need to feel behind or ahead of the rest of the class).
  4. You get access to the course for an entire year instead of 1 to 3 days in a lecture format. So you can watch it over and over again.
  5. You get to ask Dan questions directly (and you can even engage and interact with other students).
  6. Dan is going to host tele-seminars for members only where you can ask him any question (without having to pay his normal consulting fees).
  7. It is currently on sale at a huge discount.

This is really a great deal.

So, what are you waiting for?

Head on over to the site now and get started! (If you are ready to buy and want to skip the sales stuff, just scroll to the bottom and hit “add to cart”. So why are you still here?)

You can’t get this material anywhere else and get direct access to the guy that invented it.

Doesn’t get much better than that.

Later.

Kent

More free stuff!

Hey gang,

I have been working hard over the past few weeks to find some of my old white papers so I could make them available to everyone on my blog site. Well, I finally found a few of them on some flash drives and figured out how to upload them here to WordPress.

If you look above you should now see a new menu item called “White papers”. Click that link to get access to the papers I have found so far.

They are FREE for you to download. I am not even asking you to “opt in” or anything.

I just ask that you respect the copyrights and tell folks where you found them (share on Facebook, LinkedIn, Tweet it, etc).

I know there are more but have to figure out which ones are still useful (or at least moderately so). So be sure to check back often to see what I have added.

If you remember any paper I did in the past that you might want a copy of, tell me in the comments (below) and I will see if I can find it.

Oh and as a bonus, I have also included a copy of my recent “Introduction to Data Vault Data Modeling” article just in case you have not read it yet.

Hope you find some of these useful. Have a great week!

Kent

P.S. I am thinking about publishing some of these, with minor revisions, to Kindle. Do you think that would be useful to any of you?

Is the Data Vault too complex?

This was a very interesting topic that came up on LinkedIn the other day, so I wanted to address it here too.

There seem to be quite a few people who think that Data Vault models are harder to build and manage than what we do today in most shops. So let me explain how I came to learn Data Vault Data Modeling.

Before learning Data Vault, I had successfully built several 3NF OLTP, 3NF DW, and Kimball-style Dimensional data warehouses (and wrote about it with Bill Inmon and Len Silverston in the original Data Model Resource Book).

In other words, I had a reasonable amount of experience in the subject area (data modeling and data warehousing).

I personally found Data Vault extremely easy to learn as a modeling technique (once I took the time to study it a bit). At the time that meant reading the old white papers, attending some lunch & learns with Dan Linstedt and then building a few sample models.

I was definitely skeptical at first (and asked lots of questions at the public lunch & learns). I did not care about MPP, scalability, or many of the other benefits Dan mentioned. I just knew from experience there were a few issues I had seen with the other approaches when it came to building a historical enterprise data store and was hoping Data Vault might be a solution.

In comparison to trying to learn how to design and load a Type 2 slowly changing dimension, Data Vault was a piece of cake (for me anyway).

Once I was convinced, I then introduced the technique to my team in Denver – who had virtually no data warehouse experience.

It was universal – everyone from the modelers to the DBAs to the ETL programmers found the technique very easy to learn.

Our investment: One week of training from Dan for 7 people and 3 or 4 days of follow-on consulting where Dan came in once a month (for a day) to do a QA review on our models and load routines and mentor us on any issues we were having.

Dan did not make much $$ off of us. 😦

Since then, I have found that experienced 3NF modelers pick up the technique in no-time flat.

Why is that?

Because Data Vault relies on solid relational principles, experienced 3NF modelers seem to grasp it pretty fast.

Modelers who only have experience with star schemas, on the other hand, seem to have a bit of a hard time with the approach. For some of them it is a paradigm shift in modeling technique (i.e., feels very unfamiliar – “too many tables and joins”), for others it is almost a dogmatic objection as they were (sadly) taught that dimensional/star was the only “right” way to do data warehousing.

They are just not open to a new approach for any reason (sad but true). 😦

The biggest issue I have seen with clients is a reluctance to try the approach for fear of failure because they don’t personally know anyone (other than me) who has used the approach and because they think it is easier (and cheaper?) to find dimensional modelers.

This happens, even if they agree in concept that Data Vault sounds like a very valid and flexible modeling approach.

As we all know, it takes $$ to train people on star schema design too, so my advice is that if you have a team of people who know 3NF but don’t know dimensional, train them on Data Vault to build your EDW, then hire one or two dimensional modelers to build your end user reporting layer (i.e., data marts) off the Data Vault.

So that’s my 25 cent testimonial. (You get it for free!)

If you want to learn more about Data Vault, check out my presentations on SlideShare or click on the Super Charge book cover (below my picture in the sidebar) to buy the Data Vault modeling book.

Check it out and let me know what you think in the comments. How do we get people over the fear of trying Data Vault?

Talk to you later.

Kent

Data Vault Certification Class – Day 3 (It’s a wrap!)

Well, not so cold today in Montreal. Instead we got very cold rain and snow mix. Yuk. (But I definitely want to come back in the summer!)

This morning, Dan dived into how to load the Data Vault with all new material he has not taught in the certification class before. We really got lucky by attending this class. I knew most of the concepts, and have implemented most of them, but his new slides are just killer. They really get the point across and cleared up a few points for me too.

Not only do the slides include sample SQL for various parts of the load, and the logical patterns, Dan even demonstrated some working code on his really cool, high-powered laptop. It was great for the class to see the Data Vault concepts put into practice. (And he of course had some more tales to tell)

Cool Phrase for the Day

Short Circuit Boolean Evaluation: A mathematical concept, which Dan laid on us, that is used to get very fast results from the Data Vault change detection code. We use it in doing column compares on attribute data to determine if a Satellite row has changed.

In Oracle it looks like this: decode(src_att1, trg_att1, 0, 1) = 1

In ANSI-SQL it is a bit longer but has the same effect:

CASE WHEN (src_att1 IS NULL AND trg_att1 IS NULL)
OR src_att1 = trg_att1

THEN 1 ELSE 0 END = 0

I have been using this for years (learned it from Dan) but had no idea there was a math term for it.

Okay so I am a geek. 🙂
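If you want to try the ANSI predicate yourself, here is a minimal sketch using Python's built-in sqlite3 (the table and values are made up for illustration, not from Dan's class materials):

```python
# Minimal sketch: check the ANSI change-detection predicate against a
# few source/target value pairs. Uses Python's built-in sqlite3; the
# table and values are made up for illustration.
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE pairs (src_att1 TEXT, trg_att1 TEXT)")
cur.executemany("INSERT INTO pairs VALUES (?, ?)", [
    ("red", "red"),   # equal         -> unchanged
    ("red", "blue"),  # different     -> changed
    (None,  None),    # both null     -> unchanged
    ("red", None),    # value vs null -> changed
])

# Predicate result 0 means "changed" (same semantics as the Oracle
# decode(src_att1, trg_att1, 0, 1) = 1 test, with the 0/1 roles swapped).
changed = cur.execute("""
    SELECT src_att1, trg_att1 FROM pairs
    WHERE CASE WHEN (src_att1 IS NULL AND trg_att1 IS NULL)
                 OR src_att1 = trg_att1
          THEN 1 ELSE 0 END = 0
""").fetchall()
print(changed)  # only the two changed pairs survive
```

Note how the CASE version also treats "both null" as unchanged, which a plain `src_att1 = trg_att1` comparison would miss.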

The Test

After all that cool stuff came the certification test.

Not easy. My hand cramped writing out the answers.

We get our results next week. (Dan has a fun weekend ahead of him doing a bunch of scoring).

I am sure everyone in class will do fine. As I said, they all seemed to get it.

Anyway, the class is over now and I am in a hotel in Vermont (where it is snowing now). I fly back to Houston in the morning.

I had a good week here in the northeast (despite the weather). It was definitely worth the time and money to come for this class. I met some great people, learned a lot, and got to spend time with my good friend Dan.

Watch out Montreal – you are about to be descended upon by a whole new batch of Data Vault experts.

It could change the way you do data warehousing.

Later.

Kent

Data Vault Certification – Day 2

Still really cold here in Montreal…

But the classroom discussions heated up a bit today as we dove deeper into the secrets of the Data Vault. Today we got into the nitty-gritty of things like Point in Time (PIT) tables, Bridge tables, DV loading, DV agility, set logic, and designing for performance. We got into some brand new material that has not been in prior certification classes. All great stuff.

Dan sure knows a lot about a lot of things (hardware, software, operating systems, disk blocks, cache, etc.). His broad knowledge and experience definitely contributed to what is now Data Vault. We got to hear several juicy stories about real world issues he encountered over his illustrious career that led him to architect Data Vault to have all the neat features it has (hint: lots of unnamed 3-letter gov’t agencies were apparently involved). Dan is bound and determined to help as many people as possible avoid the many pitfalls he has seen in data warehouse architectures.

What I learned today:

I FINALLY understand the legitimate use for a Bridge table and why it is a good idea (i.e., better query performance getting data out of a DV). The examples in the class got through to me (this time). It is all spelled out in the book, but now I really do get it.

ETL Golden Rule (for performance improvement):

1 process per target per action (i.e., Insert, Update, Delete)

In other words, don’t make your ETL too complicated by trying to account for everything in one routine. It becomes too complex to maintain over time, and it reduces the amount of parallelization you can do (hence it is SLOWER).

Think about it – new rows get inserted, right? So why waste the cycles forcing them through update logic? In a Data Vault structure it is very easy to separate these out. The SQL is pretty simple actually (but that is another post by itself).
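As a sketch of what that separation can look like, here is a hypothetical example loading a toy satellite table in two single-action passes (the table and column names are mine for illustration, not Dan's templates):

```python
# Hypothetical sketch of "1 process per target per action": load a toy
# satellite with one INSERT-only pass and one UPDATE-only (end-dating)
# pass. Table and column names are illustrative, not Dan's templates.
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("""CREATE TABLE sat_customer (
    hub_key TEXT, load_dts TEXT, load_end_dts TEXT, name TEXT)""")
cur.execute("CREATE TABLE stg_customer (hub_key TEXT, name TEXT)")

# One existing (open) satellite row, plus a new staging batch.
cur.execute("INSERT INTO sat_customer VALUES ('C1', '2020-01-01', NULL, 'Acme')")
cur.executemany("INSERT INTO stg_customer VALUES (?, ?)",
                [("C1", "Acme Corp"),   # changed attribute
                 ("C2", "Bravo")])      # brand new key

# Action 1: INSERT only. New rows = staged records with no open
# satellite row, or whose attributes differ from the open row.
new_rows = cur.execute("""
    SELECT s.hub_key, '2020-02-01', NULL, s.name
    FROM stg_customer s
    LEFT JOIN sat_customer t
      ON t.hub_key = s.hub_key AND t.load_end_dts IS NULL
    WHERE t.hub_key IS NULL
       OR CASE WHEN (s.name IS NULL AND t.name IS NULL)
                 OR s.name = t.name
          THEN 1 ELSE 0 END = 0
""").fetchall()
cur.executemany("INSERT INTO sat_customer VALUES (?, ?, ?, ?)", new_rows)

# Action 2: UPDATE only. End-date any older open row that now has a
# newer version. No inserts happen here, so each pass stays simple
# and can be parallelized on its own.
cur.execute("""
    UPDATE sat_customer SET load_end_dts = '2020-02-01'
    WHERE load_end_dts IS NULL AND load_dts < '2020-02-01'
      AND hub_key IN (SELECT hub_key FROM sat_customer
                      WHERE load_dts = '2020-02-01')
""")

rows = cur.execute("""SELECT hub_key, load_dts, load_end_dts, name
                      FROM sat_customer ORDER BY hub_key, load_dts""").fetchall()
for r in rows:
    print(r)
```

Each pass touches one target with one action, so neither ever needs branching logic to decide between insert and update.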

Dan will be teaching more about this in his upcoming online Implementation training course. Stay tuned for that.

Data Vault Data Load Golden Rule:

100% of the data loaded 100% of the time

That really makes the load process pretty easy and streamlined. It is correlated to the Data Vault concept of loading the facts – straight from the source, no modifications. This is why the Data Vault is 100% auditable.

So for Day 3 we will get even more into how to load a Data Vault.

And then there is the TEST (insert dramatic music here).

We get four hours. 60 questions (half of which are essay!).

Guess I better get studying!

Look for my Day 3 wrap up tomorrow night (assuming I can still write by then).

See ya.

Kent
