The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the month “January, 2012”

Data Vault Certification Class – Day 3 (It’s a wrap!)

Well, not so cold today in Montreal. Instead we got very cold rain and snow mix. Yuk. (But I definitely want to come back in the summer!)

This morning, Dan dived into how to load the Data Vault with all new material he has not taught in the certification class before. We really got lucky by attending this class. I knew most of the concepts, and have implemented most of them, but his new slides are just killer. They really get the point across and cleared up a few points for me too.

Not only do the slides include sample SQL for various parts of the load, along with the logical patterns, but Dan even demonstrated some working code on his really cool, high-powered laptop. It was great for the class to see the Data Vault concepts put into practice. (And he of course had some more tales to tell.)

Cool Phrase for the Day

Short Circuit Boolean Evaluation: A mathematical concept that Dan laid on us, used to get very fast results from the Data Vault change detection code. We use it in column compares on attribute data to determine whether a Satellite row has changed.

In Oracle it looks like this: decode(src_att1, trg_att1,0,1) = 1

In ANSI-SQL it is a bit longer but has the same effect:

CASE WHEN (src_att1 IS NULL AND trg_att1 IS NULL) OR src_att1 = trg_att1 THEN 1 ELSE 0 END = 0

I have been using this for years (learned it from Dan) but had no idea there was a math term for it.
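For anyone who wants to try the compare, here is a minimal sketch that runs the ANSI-style CASE expression against an in-memory SQLite database from Python. The table names (stg, sat), the hub_key column, and the sample rows are made up for illustration; they are not from Dan's slides.

```python
import sqlite3

# Hypothetical staging (source) and Satellite (target) rows, keyed by hub_key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg (hub_key INTEGER, src_att1 TEXT);
    CREATE TABLE sat (hub_key INTEGER, trg_att1 TEXT);
    INSERT INTO stg VALUES (1, 'A'), (2, 'B'), (3, NULL), (4, NULL);
    INSERT INTO sat VALUES (1, 'A'), (2, 'X'), (3, NULL), (4, 'Y');
""")

# The CASE returns 1 when the values match (treating NULL vs NULL as a match),
# so "= 0" keeps only the rows whose attribute has changed.
CHANGED_SQL = """
    SELECT stg.hub_key
    FROM stg
    JOIN sat ON sat.hub_key = stg.hub_key
    WHERE CASE
            WHEN (stg.src_att1 IS NULL AND sat.trg_att1 IS NULL)
                 OR stg.src_att1 = sat.trg_att1
            THEN 1 ELSE 0
          END = 0
    ORDER BY stg.hub_key
"""

changed_keys = [row[0] for row in conn.execute(CHANGED_SQL)]
print(changed_keys)
```

Keys 1 and 3 are treated as unchanged (same value, or NULL on both sides), while keys 2 and 4 come back as changed. A plain src_att1 <> trg_att1 would have missed key 4, because a comparison with NULL is never true.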

Okay so I am a geek. 🙂

The Test

After all that cool stuff came the certification test.

Not easy. My hand cramped writing out the answers.

We get our results next week. (Dan has a fun weekend ahead of him doing a bunch of scoring).

I am sure everyone in class will do fine. As I said, they all seemed to get it.

Anyway, the class is over now and I am in a hotel in Vermont (where it is snowing now). I fly back to Houston in the morning.

I had a good week here in the northeast (despite the weather). It was definitely worth the time and money to come for this class. I met some great people, learned a lot, and got to spend time with my good friend Dan.

Watch out Montreal – you are about to be descended upon by a whole new batch of Data Vault experts.

It could change the way you do data warehousing.

Later.

Kent

Data Vault Certification – Day 2

Still really cold here in Montreal…

But the classroom discussions heated up a bit today as we dove deeper into the secrets of the Data Vault. Today we got into the nitty-gritty of things like Point in Time (PIT) tables, Bridge tables, DV loading, DV agility, set logic, and designing for performance. We got into some brand new material that has not been in prior certification classes. All great stuff.

Dan sure knows a lot about a lot of things (hardware, software, operating systems, disk blocks, cache, etc.). His broad knowledge and experience definitely contributed to what is now Data Vault. We got to hear several juicy stories about real world issues he encountered over his illustrious career that led him to architect Data Vault to have all the neat features it has (hint: lots of unnamed 3-letter gov’t agencies were apparently involved). Dan is bound and determined to help as many people as possible avoid the many pitfalls he has seen in data warehouse architectures.

What I learned today:

I FINALLY understand the legitimate use for a Bridge table and why it is a good idea (i.e., better query performance getting data out of a DV). The examples in the class got through to me (this time). It is all spelled out in the book, but now I really do get it.

ETL Golden Rule (for performance improvement):

1 process per target per action (i.e., Insert, Update, Delete)

In other words don’t make your ETL too complicated by trying to account for everything in one routine. It becomes too complex to maintain over time and it reduces the amount of parallelization you can do (hence it is SLOWER).

Think about it – new rows get inserted, right? So why waste the cycles forcing them through update logic? In a Data Vault structure it is very easy to separate these out. The SQL is pretty simple actually (but that is another post by itself).
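Here is a rough sketch of what one "insert only" process for a single target might look like, using Python's sqlite3 module so it can be run anywhere. Everything named here (hub_customer, stg_customer, customer_bk, the load_hub_inserts function) is hypothetical and only meant to show the shape of the pattern, not Dan's actual load code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical Hub and staging tables; all names are illustrative.
    CREATE TABLE hub_customer (
        customer_bk   TEXT PRIMARY KEY,   -- business key
        load_dts      TEXT NOT NULL,
        record_source TEXT NOT NULL
    );
    CREATE TABLE stg_customer (customer_bk TEXT, name TEXT);
    INSERT INTO hub_customer VALUES ('C1', '2012-01-01', 'CRM');
    INSERT INTO stg_customer VALUES
        ('C1', 'Ann'), ('C2', 'Bob'), ('C2', 'Bob'), ('C3', 'Cy');
""")


def load_hub_inserts(conn, load_dts, record_source):
    """One process, one target (hub_customer), one action (INSERT)."""
    cur = conn.execute(
        """
        INSERT INTO hub_customer (customer_bk, load_dts, record_source)
        SELECT DISTINCT s.customer_bk, ?, ?
        FROM stg_customer s
        WHERE NOT EXISTS (
            SELECT 1 FROM hub_customer h WHERE h.customer_bk = s.customer_bk
        )
        """,
        (load_dts, record_source),
    )
    conn.commit()
    return cur.rowcount  # number of new Hub rows inserted


inserted = load_hub_inserts(conn, "2012-01-27", "CRM")
rerun = load_hub_inserts(conn, "2012-01-28", "CRM")
hub_keys = [r[0] for r in conn.execute(
    "SELECT customer_bk FROM hub_customer ORDER BY customer_bk")]
```

The routine has no update or delete branch at all: existing keys are simply filtered out by the NOT EXISTS, so running it a second time inserts nothing. Any other action on the same target would live in its own, separately scheduled process.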

Dan will be teaching more about this in his upcoming online Implementation training course. Stay tuned for that.

Data Vault Data Load Golden Rule:

100% of the data loaded 100% of the time

That really makes the load process pretty easy and streamlined. It is correlated to the Data Vault concept of loading the facts – straight from the source, no modifications. This is why the Data Vault is 100% audit-able.

So for Day 3 we will get even more into how to load a Data Vault.

And then there is the TEST (insert dramatic music here).

We get four hours. 60 questions (half of which are essay!).

Guess I better get studying!

Look for my Day 3 wrap up tomorrow night (assuming I can still write by then).

See ya.

Kent

Data Vault Certification Class – Day 1

As promised, here is your update from the 1st day of the certification class…

First let me say it is COLD here. I forgot how truly bone chilling winter in the northeast could be. Glad I brought extra layers. I walked about 10 blocks from the hotel to the conference center this morning (and back tonight). I was definitely awake when I got there! I am sure that walk alone must have burned off a few hundred calories. 🙂

So the class was fun and educational today. We have 9 people attending, all from the Montreal area, except of course me (and Dan). Nice group of people; very into it. One gentleman, Pierre, was part of Dan’s online coaching program and has actually already built a successful data vault. It is really nice being with a group of people who “get it”, are engaged, and want to learn more.

Bits and pieces from today:

The goal of certification:

  1. To validate that we actually understand the Data Vault Model and the standards
  2. To validate that we can actually explain it to someone else
  3. To test us to be sure we can actually apply the rules and standards when we develop a real model

Word of the Day:

Deformity: The URGE to continue “slamming” data into an existing conformed dimension until it simply cannot sustain any further changes. This results in a “deformed” dimension, increased support costs, and likely leads to re-engineering.

Cause: Business saying “But can’t you just add a few more columns to the table? That should be easy right?”

New question to ask:

When you change or re-engineer part of your ETL that scrubs or transforms your source data, do you keep a copy of the original code and structures under some sort of source control? If not, how will you explain to the auditors why the data in the quarterly report changed between runs?

Concept I understand better after today: 

Transaction Links: This is a very special case where you can denormalize Satellite attributes up into the Link (at a previous job we called these SLINKs). You only do this when the transaction values can NEVER, EVER change. Examples of this are GL entries and stock trades. Dan’s examples and explanations today improved my understanding immensely.
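To make the idea concrete, here is a small sketch of what such a table could look like, run through Python's sqlite3 module. The table and column names (tlink_stock_trade, account_key, security_key, and so on) are my own placeholders, and the trigger that blocks updates is just one way to illustrate the "never changes" rule, not something prescribed by the Data Vault standards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical transaction Link for stock trades. The descriptive values
    -- (quantity, price) sit on the Link itself instead of in a Satellite.
    CREATE TABLE tlink_stock_trade (
        trade_id      TEXT PRIMARY KEY,
        account_key   INTEGER NOT NULL,   -- would reference a hub_account
        security_key  INTEGER NOT NULL,   -- would reference a hub_security
        quantity      INTEGER NOT NULL,
        price         REAL NOT NULL,
        load_dts      TEXT NOT NULL,
        record_source TEXT NOT NULL
    );
    -- Illustration only: reject any UPDATE so rows stay insert-only.
    CREATE TRIGGER tlink_stock_trade_no_update
    BEFORE UPDATE ON tlink_stock_trade
    BEGIN
        SELECT RAISE(ABORT, 'transaction link rows are insert-only');
    END;
""")

conn.execute(
    "INSERT INTO tlink_stock_trade VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("T-1001", 1, 42, 100, 25.5, "2012-01-25", "TRADING"),
)

try:
    conn.execute(
        "UPDATE tlink_stock_trade SET price = 26.0 WHERE trade_id = 'T-1001'")
    update_blocked = False
except sqlite3.DatabaseError:
    # The trigger aborts the statement, so the original row is untouched.
    update_blocked = True

price = conn.execute("SELECT price FROM tlink_stock_trade").fetchone()[0]
```

Because the values can never change, there is no history to track, which is why the attributes can live on the Link without a separate Satellite.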

Phrase I coined today in the class:

Data Archaeology: Dan uses the analogy of Data Geology (i.e., layers) to explain how (and why) we load data the way we do in the Data Vault. I said that enables us (architects, analysts, users) to effectively do Data Archaeology to find and extract the data we need. We search for those nuggets of wisdom in the data to help our businesses. Sometimes that data is near the surface and sometimes it is fossilized deep in the historic layers within the Data Vault.

No doubt somebody, somewhere, has said this before, but just in case they haven’t, you heard it here first. 😉

We also had great discussions about Agile BI, virtualization of data marts, in-memory dbs, solid state disks, ontologies, and the future of data warehousing in general. And what data warehouse class would be complete without mentioning Bill Inmon and Dr. Ralph Kimball?

Well, that’s it for now.

Stay tuned for Day 2.

Kent

P.S. As I mentioned yesterday, feel free to leave any questions you might have for Dan in the comments and I will pass them on. Or better still, just go buy the Data Vault book.

Hello Montreal!

Well, it took all day, but I am safely tucked away for the evening in my hotel (Le Meridien Versailles) in Montreal. I am here for a three-day Data Vault Certification class.

I left Houston this morning, flew through Cleveland (CLE) and then to Burlington, Vermont. There, my good friend, Dan Linstedt, kindly picked me up at the airport and drove us across the border to Montreal, Canada (which took about 2 hours) and deposited me at my hotel. Dan lives north of Burlington and was coming this way anyhow (he is actually teaching the class I am attending). It was a great chance for us to catch up as we have not seen each other (face to face anyway) for quite a few years. Look for some big things from Dan in the coming months.

Monday night I prepared for the class by loading up my laptop with all the latest Oracle toys: SQL Developer Data Modeler Release 3.1 Early Adopter, SQL Developer Release 3.1 Early Adopter, and the Oracle Database 11g Express Edition Release 2 (AKA Oracle XE). These are all free tools and very easy to set up (even for a non-DBA like moi). With that I should be able to actually build out the examples in the class for real.

(Hmm…maybe I can use those for examples in my KScope 12 session…)

In any case, it should be a fun few days meeting some new people, seeing a new city (I have never been here before), and seeing what new techniques Dan has come up with in the Data Vault arena.

Stay tuned this week as I plan to provide some updates each day on what I learned in the class. (You might want to click the “follow” button if you haven’t already)

If there are any questions you want me to ask Dan, leave them below in the comments.

 

10 Commandments of Health

Being a modern day Data Warrior is hard work!

It’s amazing how hard it is to sit all day, looking at a computer monitor, talking on con calls, communicating via IM and email, keeping up on trends with a little online video training, or sitting in conference rooms collecting requirements.

To be effective and focused, both your mind and your body must be strong. That is as true today for modern IT workers as it was in the past for warriors of ancient Sparta.

In 1928, the U.S. Congress hired a doctor, Dr. George W Calver, to try to keep them healthy so they could effectively do their jobs for the country.

Dr. Calver came up with his 10 Commandments for Health and posted them all over the U.S. Capitol to remind the congressmen of how to stay in good health.

Here they are for you:

  • Eat wisely
  • Drink Plentifully (water)
  • Eliminate Thoroughly
  • Bathe Cleanly
  • Exercise Rationally
  • Accept Inevitables (i.e., don’t worry)
  • Play Enthusiastically
  • Relax Completely
  • Sleep Sufficiently
  • Check up Occasionally

His final piece of advice: “Give 5% of your time to keeping well. You won’t have to give 100% getting over being sick.”

Great advice for any age (or career). For more check out the January AARP Bulletin.

Take care. Stay healthy.

Kent

P.S. If you liked this post (or any of my other posts) please “like” it, Tweet it, email your friends, repost it to Facebook, LinkedIn or wherever using the buttons below.

P.P.S. Want to get my new posts in your email? Just click the “follow” button to the right.
