The Data Warrior

Changing the world, one data model at a time. How can I help you?

Data Vault Informatica Class is Live!

Just a quick note to let you all know that Dan has finally released the class on how to easily implement a Data Vault using Informatica.

I wrote about the class here.

I have gone through a few of the lessons already and can tell you the instruction is very clear and easy to follow (even for me!), and the audio and video are excellent. The audio seems to come on a bit loud, so be sure to turn your volume down a bit when you start the videos.

And there is a money back guarantee if for some reason you decide the class is not for you.

If you did not get on Dan’s early notice list you can still sign up by going directly here: http://learndatavault.com/kentdvi

And, since you are a reader of my blog, if you sign up in the next few weeks and enter the coupon code DATAWARRIOR13, you can get $100 off!

So if you use Informatica and plan to do a Data Vault, you owe it to yourself to take a look at this course.

Take care.

Kent

Data Vault and the Oracle Reference Architecture

Thanks to Mark Rittman and Twitter, I found out just before RMOUG that Oracle had published a new reference architecture. It used to be called the Data Warehouse Reference Architecture; now it is called the Information Management Reference Architecture.

Oracle Information Management Ref Architecture

Oracle updated the architecture to make room for unstructured data and big data.

In my talks about Data Vault over the last few years, I have been referring to the Foundation Layer of the architecture as the place where Data Vault fits. The new version of the architecture actually fits the definition of Data Vault even better.

Now the Foundation Layer is defined as “Immutable Enterprise Data with Full History”.

If that is not the definition of Data Vault, I don’t know what is!

Immutable – does not change. Data Vault is insert only, no update – ever.

Enterprise Data – well duh! That pretty well fits any real data warehouse architecture. The model covers an enterprise view of the data, not just a departmental view (like a data mart).

Full History – tracks data changes over time. That is one of the keys to the Data Vault approach. We track all the data change history in Satellites so we can always refer to a view of the data at any point in time. That allows us to build (or rebuild) dependent data marts whenever we need to or whenever the business changes the rules.
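To make "insert only" and "any point in time" concrete, here is a minimal Python sketch, assuming a hypothetical customer Satellite that holds just a business key, a load date, and two descriptive attributes (all names are illustrative, not taken from any Data Vault tool; a real Satellite would also carry things like a record source). Rows are only ever appended, and an as-of lookup returns the latest row loaded on or before the requested date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class SatelliteRow:
    customer_key: str  # hypothetical business key of the parent Hub
    load_date: date    # when this version of the attributes was loaded
    name: str
    city: str


class CustomerSatellite:
    """Insert-only history: rows are appended, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: List[SatelliteRow] = []

    def insert(self, row: SatelliteRow) -> None:
        # Append only when the descriptive attributes differ from the
        # version in effect at this load date (assumes in-order loads).
        current = self.as_of(row.customer_key, row.load_date)
        if current is None or (current.name, current.city) != (row.name, row.city):
            self._rows.append(row)

    def as_of(self, customer_key: str, when: date) -> Optional[SatelliteRow]:
        # The version in effect on `when`: latest row loaded on or before it.
        candidates = [r for r in self._rows
                      if r.customer_key == customer_key and r.load_date <= when]
        return max(candidates, key=lambda r: r.load_date, default=None)


sat = CustomerSatellite()
sat.insert(SatelliteRow("C001", date(2012, 1, 1), "Acme", "Denver"))
sat.insert(SatelliteRow("C001", date(2012, 6, 1), "Acme", "Denver"))   # unchanged, not stored
sat.insert(SatelliteRow("C001", date(2013, 2, 1), "Acme", "Boulder"))  # changed, new row
print(sat.as_of("C001", date(2012, 12, 31)).city)  # Denver
print(sat.as_of("C001", date(2013, 3, 1)).city)    # Boulder
```

Rebuilding a data mart "as of" some date then amounts to running the as-of lookup for every key at that date; nothing in the history had to be overwritten to make that possible.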

So it is possible to do a Data Vault approach and be compliant with Oracle’s reference architecture.

Guess Dan was just a bit ahead of the game…

Later

Kent

RMOUG Training Days 2013 – Day 1

Unlike many conferences, today started off not with the keynote but with an actual session (probably some advanced psychology at work here). 🙂

I started off with John King’s session on Oracle 11g features that developers should know about. (He was going to talk about 12c, but since it has not been released yet, he could not speak about it.)

John King giving Session 1 at RMOUG 2013

John is a great speaker and gave us some very detailed information.

One very interesting piece to me, as a data modeler and data warehouse designer, was the addition of Virtual Columns. These let you declare a calculated (derived) column as part of a table definition. You define the calculation once, and the value appears when you query the table, without physically adding a column to store it. Looks promising.
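In Oracle the feature is declared in DDL, as a column defined with `GENERATED ALWAYS AS (expression) VIRTUAL`. As a language-neutral analogy only, the Python sketch below shows the same idea: the derived value is defined once and computed on every read, while only the base values are stored. The `OrderLine` class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    # Only these two values are "stored" (hypothetical example fields).
    quantity: int
    unit_price: float

    @property
    def extended_price(self) -> float:
        # Defined once, computed on every read from the stored values,
        # much like the expression behind a virtual column.
        return self.quantity * self.unit_price


line = OrderLine(quantity=3, unit_price=2.5)
print(line.extended_price)  # 7.5
line.quantity = 4
print(line.extended_price)  # 10.0
```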

John told us about lots of new things, like PIVOT, UNPIVOT, the Result Cache, the PL/SQL Function Result Cache, and the NTH_VALUE function. Some of them are shown in the following pictures.

SQL PIVOT Example

Example of UNPIVOT

Another cool SQL Function: Nth Value

All neat options I did not really know about.
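For readers who have not used these functions, here is a rough Python sketch of what PIVOT, UNPIVOT, and NTH_VALUE compute, using a small hypothetical sales table. It mimics the semantics, not Oracle's syntax: PIVOT with SUM turns quarter values into columns, UNPIVOT turns them back into rows (dropping NULLs, which is Oracle's default), and the NTH_VALUE sketch assumes a window spanning the whole partition (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING). It returns one value per partition, whereas the SQL function repeats that value on every row.

```python
# Hypothetical sales rows: (region, quarter, amount)
sales = [
    ("East", "Q1", 100), ("East", "Q2", 150),
    ("West", "Q1", 80), ("West", "Q1", 20), ("West", "Q2", 90),
]


def pivot(rows, quarters):
    """Like PIVOT (SUM(amount) FOR quarter IN (...)): one output row per region,
    one column per listed quarter, None where no rows matched."""
    result = {}
    for region, quarter, amount in rows:
        cols = result.setdefault(region, {q: None for q in quarters})
        if quarter in cols:
            cols[quarter] = amount if cols[quarter] is None else cols[quarter] + amount
    return result


def unpivot(pivoted):
    """Like UNPIVOT: quarter columns become rows again; None (NULL) cells are skipped."""
    return [(region, quarter, value)
            for region, cols in pivoted.items()
            for quarter, value in cols.items()
            if value is not None]


def nth_value(rows, n):
    """Like NTH_VALUE(amount, n) OVER (PARTITION BY region ORDER BY amount)
    over the whole partition; None when the partition has fewer than n rows."""
    by_region = {}
    for region, _quarter, amount in rows:
        by_region.setdefault(region, []).append(amount)
    return {region: (sorted(amounts)[n - 1] if len(amounts) >= n else None)
            for region, amounts in by_region.items()}


print(pivot(sales, ["Q1", "Q2"]))  # {'East': {'Q1': 100, 'Q2': 150}, 'West': {'Q1': 100, 'Q2': 90}}
print(nth_value(sales, 2))         # {'East': 150, 'West': 80}
```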

Next up was the keynote speech by Mogens Norgaard from Denmark. Mogens is an ACE Director, CEO of his own consulting firm, and a brewmaster. Interesting guy.

He showed up in his bathrobe to talk to us all about how the smartphone is taking over the world and all the cool apps you could build (and some he has built).

Mogens Norgaard in his keynote best.

Next was my turn – my first session of the conference – 5 Ways to Make Data Modeling Fun (based on a blog post).

I was pleasantly surprised that I had 40-50 people attend and most stayed for the whole talk. It was a good, interactive session. My good buddy Jon Arnold assisted me in administering some of the activities. It was great fun getting the attendees to actually collaborate on activities during a session.

Great participant collaboration during my talk

As promised, I did give out prizes for some of the activities (all branded Data Warrior LLC stuff).

Next was the ACE Director networking lunch, where they put our names on tables so people could sit with us to ask questions (if they wanted to).

Networking Lunch

After lunch there were some vendor sessions (which I skipped) and several panel discussions. These included the Women in Technology Panel and an Oracle Careers Roundtable.

Women in Technology Panel

Oracle Careers Roundtable

Anyone notice that the Women in Tech panel had one man on it, but the Oracle Careers panel had no women? Just sayin’ folks…

Next I sat in on part of a session on the Oracle TimesTen database for real-time BI. It turned out to be the same material I heard at Oracle OpenWorld, so I did not stay.

Last for my day at RMOUG was my joint session with Stewart Bryson on Data Vault and OBIEE. Unfortunately, due to the late slot (5:15 PM), we had a very low turnout. 😦 But it was a good session, as I got to hear about all the things Stewart learned trying to use the Data Vault model to virtualize the data mart layer (in OBIEE). It was all very good and reinforced my belief that Data Vault is a great way to model an EDW, and that people without a Data Vault background can understand it and apply it to dimensional modeling (or that Stewart is really exceptional).

Adios for now.

Kent

P.S. I forgot to mention (again) that I will be conducting another morning Chi Gung class at 7 AM above the registration area. Please join!

Rocky Mountain High and Some Chi Gung

Are you ready?

Are you coming?

The annual Rocky Mountain Oracle Users Group Training Days kicks off next Monday, February 11, at the Denver Convention Center. I can hardly wait! I love the beautiful surroundings and vibe of Denver and the mountains. It is a great place to hang out, relax, and sharpen your skills.

I will be there attending sessions, networking with some old friends, and doing three presentations. Here they are:

Five Ways to Make Data Modeling Fun: Tuesday, 11:15 AM – 12:15 PM

Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach: Tuesday, 5:15 – 6:15 PM

Top 10 Cool Features in Oracle SQL Developer Data Modeler: Wednesday, 2:45 – 3:45 PM

In addition to these officially scheduled sessions, I will be conducting morning Chi Gung sessions at 7 AM on both Tuesday and Wednesday for those interested. I am not sure yet where we will gather, but probably somewhere in the Convention Center. (Keep an eye on my Twitter stream @KentGraziano for details once I am on site next Monday.)

The Chi Gung sessions are starting to become a tradition with me at various Oracle user events, so please come and join our growing tribe. If you are not sure what Chi Gung is, check out my post from last summer for a video introduction. For those of us coming from lower altitudes, these morning sessions should help us adjust a little more easily. Since the air is a bit thinner in Denver, learning to breathe deeply will be a useful skill. 🙂

If you are signed up and planning to attend, I look forward to meeting up with you. If not, get signed up soon and get yourself to Denver to enjoy a little Rocky Mountain high with 1,000 of the brightest Oracle minds in the world.

See ya!

Kent

Data Vault vs. The World (3 of 3)

So far in this series, I have compared and contrasted the Data Vault approach with using a replicated ODS or a dimensional Kimball-style structure for an enterprise data warehouse repository. Let me thank everyone who has replied or commented for taking these posts as they were intended, keeping the conversation professional and not starting any flame wars. 🙂

So to close out the thread, I must of course discuss the classic Inmon-style CIF (Corporate Information Factory) approach. Many of you may also think of this as a 3NF EDW approach.

To start, I should note that I began my data warehouse career building 3NF data warehouses and applying Bill Inmon’s principles. Very shortly after I began learning about data warehousing, I had the incredible privilege not only of meeting Mr. Inmon and learning from him, but also of co-authoring a book (my first) with him. That book was The Data Model Resource Book. I worked with Bill on the data warehouse chapters and, because of that experience, became well versed in his theories and principles. For many years I even gave talks at user group events about how to convert an enterprise data model to an enterprise data warehouse model.

So how does this approach work? Basically, you do some moderate denormalization of the source system model (where it is not already denormalized) and add a snapshot date to all the primary keys (to track changes over time). This is, of course, an oversimplification: there are a number of denormalization techniques that could be used to build a data warehouse using Bill’s original thesis.

Additionally, this approach (CIF or even DW 2.0) calls for building dimensional data marts on top of that EDW (along with other reporting structures as needed). The risks here are similar to those mentioned in the previous posts.

The primary risk is that the resulting EDW data structure is usually pretty tightly coupled to the OLTP model, so the risk of reworking and reloading data is very high as the OLTP structure changes over time. This, of course, would have downstream impacts on the dimensional models, reports, and dependent extracts.

Adding snapshot dates to all the PKs in this style of data warehouse model also adds quite a bit of complexity to the load and query logic, as the dates cascade down through multi-level parent-child relationships. Getting data out ends up needing lots of nested Max(Date) sub-queries. Miss a sub-query or get it wrong, and you get the wrong data. Overall, it is a fairly fragile architecture in the long run.
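Here is a rough Python sketch of why those nested sub-queries pile up, assuming two hypothetical snapshot tables (a customer and an order line) whose primary keys end in a snapshot date. Each level of an as-of query needs its own MAX(snapshot date) lookup, the equivalent of one correlated sub-query per table; the table names and data are illustrative only.

```python
from datetime import date

# Hypothetical snapshot-style tables: the snapshot date is the last part of every PK.
# customer: (customer_id, snapshot_date) -> name
customer = {
    (1, date(2012, 1, 1)): "Acme",
    (1, date(2012, 9, 1)): "Acme Corp",
}
# order_line: (order_id, line_no, snapshot_date) -> (customer_id, quantity)
order_line = {
    (10, 1, date(2012, 2, 1)): (1, 5),
    (10, 1, date(2012, 10, 1)): (1, 7),
}


def latest_snapshot(table, key, as_of):
    """Stand-in for the correlated sub-query
    SELECT MAX(snapshot_date) FROM t WHERE <key matches> AND snapshot_date <= :as_of."""
    dates = [pk[-1] for pk in table if pk[:-1] == key and pk[-1] <= as_of]
    return max(dates, default=None)


def order_line_as_of(order_id, line_no, as_of):
    # Level 1: which version of the order line was current on as_of?
    line_date = latest_snapshot(order_line, (order_id, line_no), as_of)
    if line_date is None:
        return None
    customer_id, quantity = order_line[(order_id, line_no, line_date)]
    # Level 2: the parent needs its own MAX(date) lookup. Skipping it, or
    # reusing line_date, returns the wrong customer version or no row at all.
    cust_date = latest_snapshot(customer, (customer_id,), as_of)
    if cust_date is None:
        return None
    return customer[(customer_id, cust_date)], quantity


print(order_line_as_of(10, 1, date(2012, 6, 30)))   # ('Acme', 5)
print(order_line_as_of(10, 1, date(2012, 9, 15)))   # ('Acme Corp', 5)
print(order_line_as_of(10, 1, date(2012, 12, 31)))  # ('Acme Corp', 7)
```

Every additional level in the hierarchy adds another lookup of this kind, which is where the fragility creeps in.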

Also, as with the dimensional approach, I have encountered few teams that have successfully implemented this style of data warehouse in an incremental or agile fashion. My bad luck? Maybe…

The loosely coupled Data Vault data model mitigates these risks and also allows for agile deployment.

As discussed in the previous posts, the data model for a Data Vault-based data warehouse is based on business keys and processes rather than on the model of any one source system. The approach was specifically developed to mitigate the risks and struggles that were evident in the traditional approaches to data warehousing, including what we all considered the Inmon approach.
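To illustrate what "based on business keys" means, here is a minimal Python sketch of loading a Hub, assuming a hypothetical customer Hub keyed on a customer number, with a load date and a record source. Two hypothetical source systems feed it, and a key that is already present is never inserted twice, so the Hub does not depend on either source's structure. The normalization rule (trim and upper-case) is also an assumption for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict


@dataclass(frozen=True)
class HubCustomerRow:
    customer_number: str  # the business key, not any one source's internal id
    load_date: date       # first time the warehouse saw this key
    record_source: str    # which system delivered it first


class HubCustomer:
    def __init__(self) -> None:
        self.rows: Dict[str, HubCustomerRow] = {}

    def load(self, customer_number: str, load_date: date, record_source: str) -> bool:
        """Insert the business key only if it is new; return True if a row was added."""
        key = customer_number.strip().upper()  # hypothetical normalization rule
        if key in self.rows:
            return False
        self.rows[key] = HubCustomerRow(key, load_date, record_source)
        return True


hub = HubCustomer()
# Two hypothetical source systems that share the customer number as a business key.
hub.load("cust-042", date(2013, 1, 5), "CRM")
hub.load("CUST-042 ", date(2013, 1, 6), "BILLING")  # same business key, no new row
hub.load("CUST-077", date(2013, 1, 6), "BILLING")
print(len(hub.rows))                       # 2
print(hub.rows["CUST-042"].record_source)  # CRM
```

If either source system is restructured, the Hub rows stay as they are; only the load mapping from that source changes.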

As I mentioned earlier I got to interact with Bill Inmon while we worked on a book. The interaction did not stop there. I have had many discussions over the years with Bill on many topics related to data warehousing, which of course includes talking about Data Vault. Both Dan and I talked with Bill about the ideas in the Data Vault approach. I spent a number of lunches telling him about my real-world experience with the approach and how it compared to his original approach (since I had done both). There were both overlaps and differences. Initially, Bill simply agreed it sounded like a reasonable approach (which was a relief to me!).

Over time, through many conversations with many people, study, and research, we actually won Bill Inmon over and got his endorsement for Data Vault. In June of 2007, Bill Inmon stated for the record:

The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.

So if Bill Inmon agrees that Data Vault is a better approach for modeling an enterprise data warehouse, why would anyone keep using his old methods and not at least consider learning more about Data Vault?

Something to think about, eh?

I hope you enjoyed this little series about Data Vault and will keep it in mind as you get into your new data warehouse projects for 2013.

Kent

P.S. – if you are ready to learn more about Data Vault, check out the introductory paper on my White Papers page, or just go for broke and buy the Super Charge book.
