The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “Data Warehouse”

Ready for a Rocky Mountain High?

Got the mid winter blues?

Ready for some bright sunshine and fresh mountain air? Maybe some snow?

The wait is almost over! RMOUG Training Days 2012, February 14 – 16, is only 14 days away! That’s right, only two weeks until the biggest and best regional Oracle users group conference kicks off! But why wait until the last minute?

You still have a few days to take advantage of the lower Standard Registration rates. Register by February 9 to receive the lower rate.

And don’t forget to sign up for a University session. I am participating in this one:

Data Warehouse Performance

Jerry Ireland, Rightsizing, Inc.; Mark W. Farnham, Rightsizing, Inc.; Tim Gorman, Evergreen Database Technologies, Inc.; & Kent Graziano, TrueBridge Resources

This session brings together four experts in performance and data warehousing. All are Oracle ACEs, two are members of the OakTable, and together they represent over ninety years of experience with Oracle. All of the ideas and techniques presented can be implemented with minimal additional hardware, and most with a manageable effort. The session will start with some hints and tips that can be used on existing data warehouses and progress to a method of getting performance when the CBO does not recognize a star schema. The session will present partitioning strategies to increase scaling, and finally take a glimpse at some of the benefits of a relatively new, but growing, database architecture. The day will be wrapped up with a Q & A with all symposium presenters.

So don’t delay, make your plans today!

And maybe get some skiing in?

See ya!

Kent

Data Vault Certification Class – Day 3 (It’s a wrap!)

Well, not so cold today in Montreal. Instead we got a very cold rain and snow mix. Yuk. (But I definitely want to come back in the summer!)

This morning, Dan dived into how to load the Data Vault, with all-new material he has not taught in the certification class before. We really got lucky by attending this class. I knew most of the concepts, and have implemented most of them, but his new slides are just killer. They really get the point across and cleared up a few points for me too.

Not only do the slides include sample SQL for various parts of the load, and the logical patterns, but Dan even demonstrated some working code on his really cool, high-powered laptop. It was great for the class to see the Data Vault concepts put into practice. (And he of course had some more tales to tell.)

Cool Phrase for the Day

Short Circuit Boolean Evaluation: A mathematical concept that Dan laid on us, used to get very fast results from the Data Vault change detection code. We use it when doing column compares on attribute data to determine whether a Satellite row has changed.

In Oracle it looks like this: decode(src_att1, trg_att1,0,1) = 1

In ANSI-SQL it is a bit longer but has the same effect:

CASE WHEN (src_att1 IS NULL AND trg_att1 IS NULL) OR src_att1 = trg_att1

THEN 1 ELSE 0 END = 0
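To see how the ANSI-style comparison behaves with NULLs, here is a minimal sketch that runs it in an in-memory SQLite database from Python. The table name `pairs`, the helper `changed_ids`, and the sample rows are made up for illustration; this is not Dan's load code, just the same comparison applied to five test rows.

```python
import sqlite3

# The CASE yields 1 when the two values match (including both NULL),
# so "= 0" selects the rows whose attribute has changed.
CHANGED_SQL = """
SELECT id
FROM pairs
WHERE CASE
        WHEN (src_att1 IS NULL AND trg_att1 IS NULL)
          OR src_att1 = trg_att1
        THEN 1 ELSE 0
      END = 0
ORDER BY id
"""

def changed_ids(rows):
    """Return ids of (id, src_att1, trg_att1) rows whose attribute differs."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table holding source and target values side by side.
    conn.execute("CREATE TABLE pairs (id INTEGER, src_att1 TEXT, trg_att1 TEXT)")
    conn.executemany("INSERT INTO pairs VALUES (?, ?, ?)", rows)
    result = [r[0] for r in conn.execute(CHANGED_SQL)]
    conn.close()
    return result

sample = [
    (1, "red", "red"),    # same value: unchanged
    (2, "red", "blue"),   # different value: changed
    (3, None, None),      # both NULL: unchanged
    (4, None, "blue"),    # NULL vs value: changed
    (5, "red", None),     # value vs NULL: changed
]
print(changed_ids(sample))  # prints [2, 4, 5]
```

The point of the extra NULL check is rows 3 through 5: a plain `src_att1 <> trg_att1` would miss any row where either side is NULL, because that comparison evaluates to NULL rather than true.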

I have been using this for years (learned it from Dan) but had no idea there was a math term for it.

Okay so I am a geek. 🙂

The Test

After all that cool stuff came the certification test.

Not easy. My hand cramped writing out the answers.

We get our results next week. (Dan has a fun weekend ahead of him doing a bunch of scoring).

I am sure everyone in class will do fine. As I said, they all seemed to get it.

Anyway, the class is over now and I am in a hotel in Vermont (where it is snowing now). I fly back to Houston in the morning.

I had a good week here in the northeast (despite the weather). It was definitely worth the time and money to come for this class. I met some great people, learned a lot, and got to spend time with my good friend Dan.

Watch out Montreal – you are about to be descended upon by a whole new batch of Data Vault experts.

It could change the way you do data warehousing.

Later.

Kent

Data Vault Certification – Day 2

Still really cold here in Montreal…

But the classroom discussions heated up a bit today as we dove deeper into the secrets of the Data Vault. Today we got into the nitty-gritty of things like Point in Time (PIT) tables, Bridge tables, DV loading, DV agility, set logic, and designing for performance. We got into some brand new material that has not been in prior certification classes. All great stuff.

Dan sure knows a lot about a lot of things (hardware, software, operating systems, disk blocks, cache, etc.). His broad knowledge and experience definitely contributed to what is now Data Vault. We got to hear several juicy stories about real world issues he encountered over his illustrious career that led him to architect Data Vault to have all the neat features it has (hint: lots of unnamed 3-letter gov’t agencies were apparently involved). Dan is bound and determined to help as many people as possible avoid the many pitfalls he has seen in data warehouse architectures.

What I learned today:

I FINALLY understand the legitimate use for a Bridge table and why it is a good idea (i.e., better query performance getting data out of a DV). The examples in the class got through to me (this time). It is all spelled out in the book, but now I really do get it.

ETL Golden Rule (for performance improvement):

1 process per target per action (i.e., Insert, Update, Delete)

In other words, don’t make your ETL too complicated by trying to account for everything in one routine. It becomes too complex to maintain over time and it reduces the amount of parallelization you can do (hence it is SLOWER).

Think about it – new rows get inserted, right? So why waste the cycles forcing them through update logic? In a Data Vault structure it is very easy to separate these out. The SQL is pretty simple actually (but that is another post by itself).
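As a rough illustration of what an insert-only process can look like, here is a minimal sketch using Python and an in-memory SQLite database. The table names `stg_customer` and `hub_customer`, the column names, and the function names are all hypothetical, and this is not the SQL from the class. It shows one process for one target and one action: inserting business keys the Hub has not seen yet. Any update-style work would live in a separate routine.

```python
import sqlite3

def make_db():
    """Create a hypothetical staging table and Hub in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stg_customer (customer_bk TEXT, name TEXT)")
    conn.execute(
        "CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY, load_date TEXT)"
    )
    return conn

def hub_count(conn):
    return conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]

def load_new_hub_keys(conn, load_date):
    """Insert-only step: add staged business keys missing from the Hub.

    Returns the number of keys added. There is no update logic here at all.
    """
    before = hub_count(conn)
    conn.execute(
        """
        INSERT INTO hub_customer (customer_bk, load_date)
        SELECT DISTINCT s.customer_bk, ?
        FROM stg_customer s
        WHERE NOT EXISTS (
            SELECT 1 FROM hub_customer h WHERE h.customer_bk = s.customer_bk
        )
        """,
        (load_date,),
    )
    return hub_count(conn) - before

conn = make_db()
conn.execute("INSERT INTO hub_customer VALUES ('C1', '2012-01-01')")
conn.executemany(
    "INSERT INTO stg_customer VALUES (?, ?)",
    [("C1", "Acme"), ("C2", "Globex"), ("C2", "Globex")],
)
print(load_new_hub_keys(conn, "2012-01-26"))  # prints 1 (only C2 is new)
print(load_new_hub_keys(conn, "2012-01-27"))  # prints 0 (rerun adds nothing)
```

Because the statement only ever inserts, rerunning it against the same staging data is harmless, and it can run alongside the loads for other targets.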

Dan will be teaching more about this in his upcoming online Implementation training course. Stay tuned for that.

Data Vault Data Load Golden Rule:

100% of the data loaded 100% of the time

That really makes the load process pretty easy and streamlined. It ties directly to the Data Vault concept of loading the facts: straight from the source, no modifications. This is why the Data Vault is 100% auditable.

So for Day 3 we will get even more into how to load a Data Vault.

And then there is the TEST (insert dramatic music here).

We get four hours. 60 questions (half of which are essay!).

Guess I better get studying!

Look for my Day 3 wrap up tomorrow night (assuming I can still write by then).

See ya.

Kent

Data Vault Certification Class – Day 1

As promised, here is your update from the 1st day of the certification class…

First let me say it is COLD here. I forgot how truly bone chilling winter in the northeast could be. Glad I brought extra layers. I walked about 10 blocks from the hotel to the conference center this morning (and back tonight). I was definitely awake when I got there! I am sure that walk alone must have burned off a few hundred calories. 🙂

So the class was fun and educational today. We have 9 people attending, all from the Montreal area, except of course me (and Dan). Nice group of people; very into it. One gentleman, Pierre, was part of Dan’s online coaching program and has actually already built a successful data vault. It is really nice being with a group of people who “get it”, are engaged, and want to learn more.

Bits and pieces from today:

The goal of certification:

  1. To validate that we actually understand the Data Vault Model and the standards
  2. To validate that we can actually explain it to someone else
  3. To test us to be sure we can actually apply the rules and standards when we develop a real model

Word of the Day:

Deformity: The URGE to continue “slamming” data into an existing conformed dimension until it simply cannot sustain any further changes. This results in a “deformed” dimension, increased support costs, and likely leads to re-engineering.

Cause: Business saying “But can’t you just add a few more columns to the table? That should be easy right?”

New question to ask:

When you change or re-engineer part of your ETL that scrubs or transforms your source data, do you keep a copy of the original code and structures under some sort of source control? If not, how will you explain to the auditors why the data in the quarterly report changed between runs?

Concept I understand better after today: 

Transaction Links: This is a very special case where you can denormalize Satellite attributes up into the Link (at a previous job we called these SLINKs). You only do this when the transaction values can NEVER, EVER change. Examples of this are GL entries and stock trades. Dan’s examples and explanations today really improved my understanding immensely.
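To make the Transaction Link idea a little more concrete, here is a minimal sketch in Python with an in-memory SQLite database. Everything in it is an assumption made for illustration: the table `link_trade`, its columns, and especially the trigger that rejects updates, which is just one way to express "these values never change" and is not something prescribed in the class.

```python
import sqlite3

def make_trade_link():
    """Create a hypothetical stock-trade Link that carries its own attributes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE link_trade (
            trade_sk      INTEGER PRIMARY KEY,
            account_sk    INTEGER NOT NULL,   -- key of a hypothetical account Hub
            security_sk   INTEGER NOT NULL,   -- key of a hypothetical security Hub
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL,
            quantity      INTEGER NOT NULL,   -- attributes that would normally
            price         REAL NOT NULL       -- sit in a Satellite
        );
        -- Illustrative guard: the Link is insert-only, so any UPDATE is rejected.
        CREATE TRIGGER link_trade_no_update
        BEFORE UPDATE ON link_trade
        BEGIN
            SELECT RAISE(ABORT, 'transaction links are insert-only');
        END;
        """
    )
    return conn

def try_update(conn):
    """Attempt to change a trade price; return True only if it succeeded."""
    try:
        conn.execute("UPDATE link_trade SET price = 0 WHERE trade_sk = 1")
    except sqlite3.DatabaseError:
        return False
    return True

conn = make_trade_link()
conn.execute(
    "INSERT INTO link_trade VALUES (1, 10, 20, '2012-01-25', 'TRADES', 100, 12.5)"
)
print(try_update(conn))  # prints False
print(conn.execute("SELECT price FROM link_trade").fetchone()[0])  # prints 12.5
```

The trade-off is the one described above: you save a join to a Satellite, but only because there will never be a second version of the row to track.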

Phrase I coined today in the class:

Data Archaeology: Dan uses the analogy of Data Geology (i.e., layers) to explain how (and why) we load data the way we do in the Data Vault. I said that enables us (architects, analysts, users) to effectively do Data Archaeology to find and extract the data we need. We search for those nuggets of wisdom in the data to help our businesses. Sometimes that data is near the surface and sometimes it is fossilized deep in the historic layers within the Data Vault.

No doubt somebody, somewhere, has said this before, but just in case they haven’t, you heard it here first. 😉

We also had great discussions about Agile BI, virtualization of data marts, in-memory dbs, solid state disks, ontologies, and the future of data warehousing in general. And what data warehouse class would be complete without mentioning Bill Inmon and Dr. Ralph Kimball?

Well, that’s it for now.

Stay tuned for Day 2.

Kent

P.S. As I mentioned yesterday, feel free to leave any questions you might have for Dan in the comments and I will pass them on. Or better still, just go buy the Data Vault book.

Announcement: Data Vault Model & Methodology Certification Class in Montreal!

I just found out today that my good friend Dan is going to teach a Data Vault certification class later this month in Montreal, Canada.

Guess what? I have already registered!

What a great opportunity to learn about Data Vault from the guy who invented it.

Plus you will get to be part of an elite group of data warehouse professionals who are actually certified as Data Vault Practitioners, assuming you pass the test of course 😉

This is going to be the only such class he will teach this year in North America so this is a rare opportunity. The class is being coordinated by agileDSS in Montreal. You can get details and register here. Don’t waste any time: the class starts January 25th and there are only 5 seats left as of today.

So, you might ask why I am going to this class – “Didn’t you coauthor a book with Dan and help him with the most recent one?”

Why yes, yes I did.

So why am I going to spend my hard-earned money and take time away from my current gig to fly from Houston, Texas to Montreal to take this class?

  1. Dan is my friend. I want to support him in his business ventures.
  2. Dan is my friend. I haven’t seen him face-to-face since he moved from Colorado to Vermont and I moved from Colorado to Texas. This is a rare opportunity for us to catch up a bit in person.
  3. Dan is some kind of genius. He invented a new way to model data warehouse structures and has proven its effectiveness over the last 10+ years. It is not often you get to learn from someone like that (live and in person).
  4. I can always stand a refresher course. I have not been in a formal Data Vault class, taught by someone else, in close to 10 years. I am pretty sure Dan has come up with some new variations, interpretations, techniques, and in-the-trenches war stories since then!
  5. I love networking with other data warehouse professionals in person.
  6. I have never been to Montreal.
  7. It’s a new year. Time to invest a bit in my own professional development.
  8. Oh, and did I mention, Dan is my friend.

So why don’t you sign up too and join me in Montreal? I guarantee it will change the way you do data warehousing.

And you get to hang out with me and Dan! Such a deal!

So go to the link, register, and tell them the Oracle Data Warrior sent you.

See you in Montreal!

Kent
