The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “Data Vault”

Data Vault Certification – Day 2

Still really cold here in Montreal…

But the classroom discussions heated up a bit today as we dove deeper into the secrets of the Data Vault. Today we got into the nitty-gritty of things like Point in Time (PIT) tables, Bridge tables, DV loading, DV agility, set logic, and designing for performance. We got into some brand new material that has not been in prior certification classes. All great stuff.

Dan sure knows a lot about a lot of things (hardware, software, operating systems, disk blocks, cache, etc.). His broad knowledge and experience definitely contributed to what is now Data Vault. We got to hear several juicy stories about real-world issues he encountered over his illustrious career that led him to architect Data Vault to have all the neat features it has (hint: lots of unnamed 3-letter gov’t agencies were apparently involved). Dan is bound and determined to help as many people as possible avoid the many pitfalls he has seen in data warehouse architectures.

What I learned today:

I FINALLY understand the legitimate use for a Bridge table and why it is a good idea (i.e., better query performance getting data out of a DV). The examples in the class got through to me (this time). It is all spelled out in the book, but now I really do get it.

ETL Golden Rule (for performance improvement):

1 process per target per action (i.e., Insert, Update, Delete)

In other words, don’t make your ETL too complicated by trying to account for everything in one routine. It becomes too complex to maintain over time and it reduces the amount of parallelization you can do (hence it is SLOWER).

Think about it – new rows get inserted, right? So why waste the cycles forcing them through update logic? In a Data Vault structure it is very easy to separate these out. The SQL is pretty simple actually (but that is another post by itself).
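To make the "one process per target per action" idea concrete, here is a minimal sketch. It is not the SQL from class, just an illustration under my own assumptions: the tables `stg_customer` and `hub_customer`, their columns, and the function names are all hypothetical, and SQLite stands in for whatever database you actually use. The routine does exactly one thing: insert business keys the hub has not seen yet. No update or delete logic is mixed in.

```python
import sqlite3


def load_hub_customer(conn, load_date, record_source):
    """Insert-only process for one target (the hypothetical hub_customer).

    Update and delete handling, if needed at all, would live in
    separate processes so each one stays simple and can run in parallel.
    """
    conn.execute(
        """
        INSERT INTO hub_customer (customer_bk, load_date, record_source)
        SELECT DISTINCT s.customer_bk, ?, ?
        FROM stg_customer AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM hub_customer AS h
            WHERE h.customer_bk = s.customer_bk
        )
        """,
        (load_date, record_source),
    )


def demo_hub_load():
    """Run the insert-only load twice and return the hub row count after each run."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stg_customer (customer_bk TEXT, name TEXT);
        CREATE TABLE hub_customer (
            customer_bk   TEXT PRIMARY KEY,
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL
        );
        INSERT INTO stg_customer VALUES ('C1', 'Ann'), ('C2', 'Bob'), ('C1', 'Ann');
        """
    )
    count_sql = "SELECT COUNT(*) FROM hub_customer"
    load_hub_customer(conn, "2012-01-26", "CRM")
    after_first = conn.execute(count_sql).fetchone()[0]
    load_hub_customer(conn, "2012-01-27", "CRM")  # same staging data again
    after_second = conn.execute(count_sql).fetchone()[0]
    return after_first, after_second
```

Running the load a second time against the same staging data adds nothing, because rows that already exist never pass through any update logic; they are simply skipped by the `NOT EXISTS` filter.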

Dan will be teaching more about this in his upcoming online Implementation training course. Stay tuned for that.

Data Vault Data Load Golden Rule:

100% of the data loaded 100% of the time

That really makes the load process pretty easy and streamlined. It ties back to the Data Vault concept of loading the facts: straight from the source, no modifications. This is why the Data Vault is 100% auditable.
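A tiny sketch of what "straight from the source, no modifications" can look like in practice. Everything here is an assumption for illustration: the satellite table `sat_customer_raw`, its columns, and the sample rows are made up, and I have left out change detection to keep it short. The point is only that rows with blank or odd-looking values are loaded as-is, stamped with a load date and record source, rather than filtered or scrubbed on the way in.

```python
import sqlite3


def load_raw_satellite(conn, rows, load_date, record_source):
    """Load every source row unchanged; only audit columns are added."""
    conn.executemany(
        """
        INSERT INTO sat_customer_raw
            (customer_bk, load_date, record_source, name, email)
        VALUES (?, ?, ?, ?, ?)
        """,
        [(bk, load_date, record_source, name, email) for bk, name, email in rows],
    )


def demo_full_load():
    """Return the source rows and what ended up in the hypothetical satellite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sat_customer_raw (
            customer_bk   TEXT NOT NULL,
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL,
            name          TEXT,
            email         TEXT,
            PRIMARY KEY (customer_bk, load_date)
        )
        """
    )
    source_rows = [
        ("C1", "Ann", "ann@example.com"),
        ("C2", "", "not-an-email"),  # looks wrong, loaded anyway
        ("C3", None, None),          # missing values, loaded anyway
    ]
    load_raw_satellite(conn, source_rows, "2012-01-26", "CRM")
    loaded = conn.execute(
        "SELECT customer_bk, name, email FROM sat_customer_raw ORDER BY customer_bk"
    ).fetchall()
    return source_rows, loaded
```

Because nothing is dropped or altered during the load, what sits in the vault can always be reconciled against what the source system sent.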

So for Day 3 we will get even more into how to load a Data Vault.

And then there is the TEST (insert dramatic music here).

We get four hours. 60 questions (half of which are essay!).

Guess I better get studying!

Look for my Day 3 wrap up tomorrow night (assuming I can still write by then).

See ya.

Kent

Data Vault Certification Class – Day 1

As promised, here is your update from the 1st day of the certification class…

First let me say it is COLD here. I forgot how truly bone chilling winter in the northeast could be. Glad I brought extra layers. I walked about 10 blocks from the hotel to the conference center this morning (and back tonight). I was definitely awake when I got there! I am sure that walk alone must have burned off a few hundred calories. 🙂

So the class was fun and educational today. We have 9 people attending, all from the Montreal area, except of course me (and Dan). Nice group of people; very into it. One gentleman, Pierre, was part of Dan’s online coaching program and has actually already built a successful data vault. It is really nice being with a group of people who “get it”, are engaged, and want to learn more.

Bits and pieces from today:

The goal of certification:

  1. To validate that we actually understand the Data Vault Model and the standards
  2. To validate that we can actually explain it to someone else
  3. To test us to be sure we can actually apply the rules and standards when we develop a real model

Word of the Day:

Deformity: The URGE to continue “slamming” data into an existing conformed dimension until it simply cannot sustain any further changes. This results in a “deformed” dimension, increased support costs, and likely leads to re-engineering.

Cause: Business saying “But can’t you just add a few more columns to the table? That should be easy right?”

New question to ask:

When you change or re-engineer part of your ETL that scrubs or transforms your source data, do you keep a copy of the original code and structures under some sort of source control? If not, how will you explain to the auditors why the data in the quarterly report changed between runs?

Concept I understand better after today: 

Transaction Links: This is a very special case where you can denormalize Satellite attributes up into the Link (at a previous job we called these SLINKs). You only do this when the transaction values can NEVER, EVER change. Examples of this are GL entries and stock trades. Dan’s examples and explanations today really improved my understanding immensely.
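Here is a rough sketch of what a Transaction Link for stock trades might look like. The table name `link_stock_trade`, its columns, and the helper functions are my own hypothetical choices, not something from the class or the book. For brevity the link references plain business keys where a real model would carry hub keys, and the "no updates" trigger is just one way I might enforce the NEVER, EVER change rule in SQLite.

```python
import sqlite3


def build_transaction_link():
    """Create a hypothetical transaction link with the trade values stored on the link itself."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE link_stock_trade (
            trade_id      TEXT PRIMARY KEY,
            account_bk    TEXT NOT NULL,     -- would reference an account hub
            security_bk   TEXT NOT NULL,     -- would reference a security hub
            quantity      INTEGER NOT NULL,  -- attributes denormalized up from a satellite
            price         REAL NOT NULL,
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL
        );
        CREATE TRIGGER link_stock_trade_no_update
        BEFORE UPDATE ON link_stock_trade
        BEGIN
            SELECT RAISE(ABORT, 'transaction links are insert-only');
        END;
        """
    )
    return conn


def try_update_price(conn, trade_id, new_price):
    """Attempt an update; return True if it succeeded, False if it was rejected."""
    try:
        conn.execute(
            "UPDATE link_stock_trade SET price = ? WHERE trade_id = ?",
            (new_price, trade_id),
        )
    except sqlite3.DatabaseError:
        return False
    return True
```

If the values could ever change after the fact, this structure would be the wrong choice and the attributes would belong back in a regular Satellite with history.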

Phrase I coined today in the class:

Data Archaeology: Dan uses the analogy of Data Geology (i.e., layers) to explain how (and why) we load data the way we do in the Data Vault. I said that enables us (architects, analysts, users) to effectively do Data Archaeology to find and extract the data we need. We search for those nuggets of wisdom in the data to help our businesses. Sometimes that data is near the surface and sometimes it is fossilized deep in the historic layers within the Data Vault.

No doubt somebody, somewhere, has said this before, but just in case they haven’t, you heard it here first. 😉

We also had great discussions about Agile BI, virtualization of data marts, in-memory dbs, solid state disks, ontologies, and the future of data warehousing in general. And what data warehouse class would be complete without mentioning Bill Inmon and Dr. Ralph Kimball?

Well, that’s it for now.

Stay tuned for Day 2.

Kent

P.S. As I mentioned yesterday, feel free to leave any questions you might have for Dan in the comments and I will pass them on. Or better still, just go buy the Data Vault book.

Hello Montreal!

Well, it took all day, but I am safely tucked away for the evening in my hotel (Le Meridien Versailles) in Montreal. I am here for a three-day Data Vault Certification class.

I left Houston this morning, flew through Cleveland (CLE) and then to Burlington, Vermont. There, my good friend, Dan Linstedt, kindly picked me up at the airport and drove us across the border to Montreal, Canada (which took about 2 hours) and deposited me at my hotel. Dan lives north of Burlington and was coming this way anyhow (he is actually teaching the class I am attending). It was a great chance for us to catch up as we have not seen each other (face to face anyway) for quite a few years. Look for some big things from Dan in the coming months.

Monday night I prepared for the class by loading up my laptop with all the latest Oracle toys: SQL Developer Data Modeler Release 3.1 Early Adopter, SQL Developer Release 3.1 Early Adopter, and the Oracle Database 11g Express Edition Release 2 (AKA Oracle XE). These are all free tools and very easy to set up (even for a non-DBA like moi). With that I should be able to actually build out the examples in the class for real.

(Hmm…maybe I can use those for examples in my KScope 12 session…)

In any case, it should be a fun few days meeting some new people, seeing a new city (I have never been here before), and seeing what new techniques Dan has come up with in the Data Vault arena.

Stay tuned this week as I plan to provide some updates each day on what I learned in the class. (You might want to click the “follow” button if you haven’t already)

If there are any questions you want me to ask Dan, leave them below in the comments.

 

Announcement: Data Vault Model & Methodology Certification Class in Montreal!

I just found out today that my good friend Dan is going to teach a Data Vault certification class later this month in Montreal, Canada.

Guess what? I have already registered!

What a great opportunity to learn about Data Vault from the guy who invented it.

Plus you will get to be part of an elite group of data warehouse professionals who are actually certified as Data Vault Practitioners (assuming you pass the test, of course 😉).

This is going to be the only such class he will teach this year in North America so this is a rare opportunity. The class is being coordinated by agileDSS in Montreal. You can get details and register here. Don’t waste any time: the class starts January 25th and there are only 5 seats left as of today.

So, you might ask why I am going to this class – “Didn’t you coauthor a book with Dan and help him with the most recent one?”

Why yes, yes I did.

So why am I going to spend my hard earned money and take time away from my current gig to fly from Houston, Texas to Montreal to take this class?

  1. Dan is my friend. I want to support him in his business ventures.
  2. Dan is my friend. I haven’t seen him face-to-face since he moved from Colorado to Vermont and I moved from Colorado to Texas. This is a rare opportunity for us to catch up a bit in person.
  3. Dan is some kind of genius. He invented a new way to model data warehouse structures and has proven its effectiveness over the last 10+ years. It is not often you get to learn from someone like that (live and in person).
  4. I can always stand a refresher course. I have not been in a formal Data Vault class, taught by someone else, in close to 10 years. I am pretty sure Dan has come up with some new variations, interpretations, techniques, and in-the-trenches war stories since then!
  5. I love networking with other data warehouse professionals in person.
  6. I have never been to Montreal.
  7. It’s a new year. Time to invest a bit in my own professional development.
  8. Oh, and did I mention, Dan is my friend.

So why don’t you sign up too and join me in Montreal? I guarantee it will change the way you do data warehousing.

And you get to hang out with me and Dan! Such a deal!

So go to the link, register, and tell them the Oracle Data Warrior sent you.

See you in Montreal!

Kent

2012 – Year of the Data Vault?

Well, several of us sure hope so! 😉

Data Vault Modeling appears to finally be catching on in the DW community here in the USA and in Canada. My co-author (and co-conspirator) Dan Linstedt provided some details on organizations and consultants who have been successful using Data Vault in 2011. We got some great quotes from a few industry luminaries to boot! You can see the details in his year-end blog post.

We (Dan & I) spent a fair amount of time (i.e., years) and effort trying to get to this point. One big highlight (for me anyway) was finally seeing the technical book on data vault data modeling get published. We started writing parts of the book over five years ago, but things like making the mortgage payment kind of got in the way. Writing and self-publishing a book can be pretty intense so it is really nice to see it in print. You can get your very own hard-copy on Amazon, or an e-copy (with some cool bonuses) from Dan’s Learn Data Vault site. (Full disclosure – if you go buy the book from either site, I will of course make huge piles of $$$$ in royalties that might let me spring for lunch occasionally.)

If you are not really sure what Data Vault is all about and want the short course first, check out my Introduction to Data Vault slides from my presentation at Oracle Open World 2011. I had about 30 people attend the session and had some great discussions with the attendees. That was pretty gratifying since it was the LAST session on the LAST day of the conference.

So 2011 was a very successful year for getting the word out that there is a better way to develop your enterprise data warehouse.

Here’s to continuing the momentum in 2012.

Later.

Kent
