The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “@dlinstedt”

Data Vault and the Oracle Reference Architecture

Thanks to Mark Rittman and Twitter, I found out just before RMOUG that Oracle had published a new reference architecture. It used to be called the Data Warehouse Reference Architecture; now it is called the Information Management Reference Architecture.

Oracle Information Management Ref Architecture

Oracle updated the architecture to allow for unstructured and big data to fit into the picture.

In my talks about Data Vault over the last few years I have been referring to the Foundation Layer of the architecture as the place where Data Vault fits. The new version of the architecture actually fits the definition of the Data Vault even better.

Now the Foundation Layer is defined as “Immutable Enterprise Data with Full History”.

If that is not the definition of Data Vault, I don’t know what is!

Immutable – does not change. Data Vault is insert only, no update – ever.

Enterprise Data – well duh! That pretty well fits any real data warehouse architecture. The model covers an enterprise view of the data, not just a departmental view (like a data mart).

Full History – tracks data changes over time. That is one of the keys to the Data Vault approach. We track all the data change history in Satellites so we can always refer to a view of the data at any point in time. That allows us to build (or re-build) dependent data marts whenever we need or whenever the business changes the rules.
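
To make the insert-only, full-history idea a bit more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table and column names (sat_customer, customer_hk, load_dts) and the sample rows are my own illustrative assumptions, not something taken from Oracle's architecture document or from any official Data Vault standard. The point is only that every change arrives as a new row, and a point-in-time view is just the latest row at or before the date you ask for.

```python
import sqlite3


def build_demo():
    """Create an in-memory, insert-only satellite with a little change history."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sat_customer (      -- hypothetical satellite table
            customer_hk TEXT NOT NULL,   -- key of the parent hub (assumed name)
            load_dts    TEXT NOT NULL,   -- load date as an ISO-8601 string
            city        TEXT,
            PRIMARY KEY (customer_hk, load_dts)
        )
        """
    )
    # Every change is a new row; nothing is ever updated or deleted.
    rows = [
        ("C1", "2012-01-01", "Houston"),
        ("C1", "2012-06-15", "Denver"),
        ("C2", "2012-03-01", "Austin"),
    ]
    conn.executemany("INSERT INTO sat_customer VALUES (?, ?, ?)", rows)
    return conn


def city_as_of(conn, customer_hk, as_of):
    """Return the city recorded for a customer as of a given date, or None."""
    row = conn.execute(
        """
        SELECT city
        FROM sat_customer
        WHERE customer_hk = ? AND load_dts <= ?
        ORDER BY load_dts DESC
        LIMIT 1
        """,
        (customer_hk, as_of),
    ).fetchone()
    return row[0] if row else None
```

Calling city_as_of(conn, "C1", "2012-03-01") on the demo data gives the earlier value, and asking for a later date gives the newer one, which is the behavior you need to re-build a data mart as of any point in time.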

So it is possible to do a Data Vault approach and be compliant with Oracle’s reference architecture.

Guess Dan was just a bit ahead of the game…

Later

Kent

How to Use Informatica to Build a Data Vault

Yes, it’s true – you will soon be able to get online training on how to build a Data Vault data warehouse using Informatica.

Dan Linstedt has been working hard for several months now to put together some top notch training for all you who use Informatica.

Dan will teach you all his best practices for getting the job done quickly using Informatica for your ETL tool.

If you want in, it’s not too late to get on the VIP early notification list (which entitles you to some discounts). Get in on the list here.

Here are a few common questions that Dan recently answered:

Q. What are the prerequisites?
A. You must know Informatica PowerCenter and Data Vault modeling basics. The training works with Informatica PowerCenter v8.x and v9.x, but the mappings will only import to version 9.x or higher.

Q. Do I need access to an Informatica installation?
A. You will, if you want to do any of the hands-on portions. We can’t help you with this. Informatica used to provide a limited developer edition with a devnet membership, but that seems to have been discontinued.

Q. Will I learn Informatica PowerCenter?
A. No! This course assumes you have at least 3 months of experience with Informatica and know the difference between mapping, session and workflow objects. If you’ve never worked with Informatica tools, then we recommend that you DO NOT invest in it.

Q. Do I need to know Data Vault Modeling?
A. Yes, and the knowledge in the book “Super Charge your Data Warehouse” is sufficient for the course. It’s better if you have more hands-on experience though.

Q. Would it benefit me if I’ve gone through the Data Vault Implementation and Best Practices course?
A. Yes.

Want to know more? Check out this video that has more details about the class and what it covers.

That’s it for now.

Later.

Kent

Data Vault vs. The World (3 of 3)

So far in this series I have compared and contrasted the Data Vault approach to that of using a replicated ODS or a dimensional Kimball-style structure for an enterprise data warehouse repository. Let me thank everyone who has replied or commented for taking these posts as they were intended, keeping the conversation professional and not starting any flame wars. 🙂

So to close out the thread I must of course discuss using the classic Inmon-style CIF (Corporate Information Factory) approach. Many of you may also think of this as a 3NF EDW approach.

To start it should be noted that I started my data warehouse career building 3NF data warehouses and applying Bill Inmon’s principles. Very shortly after I began learning about data warehousing I had the incredible privilege of getting to not only meet Mr. Inmon and learn from him but I got to co-author a book (my first) with him. That book was The Data Model Resource Book. I got to work with Bill on the data warehouse chapters and because of that experience became well versed in his theories and principles. For many years I even gave talks at user group events about how to convert an enterprise data model to an enterprise data warehouse model.

So how does this approach work? Basically you do some moderate denormalization of the source system model (where it is not already denormalized) and add a snapshot date to all the primary keys (to track changes over time). This of course is an oversimplification – there are a number of denormalization techniques that could be used in building a data warehouse using Bill’s original thesis.

Additionally this approach (CIF or even DW 2.0) calls for then building dimensional data marts on top of that EDW (along with other reporting structures as needed). Risks here are similar to those mentioned in the previous posts.

The primary one being that the resulting EDW data structure is usually pretty tightly coupled to the OLTP model so the risk of reworking and reloading data is very high as the OLTP structure changes over time. This of course would have impacts downstream to the dimensional models, reports, and dependent extracts.

The addition of snapshot dates to all the PKs in this style data warehouse model also adds quite a bit of complexity to the load and query logic as the dates cascade down through parent-child-child-type relationships. Getting data out ends up needing lots of nested Max(Date) sorts of sub-queries. Miss a sub-query or get it wrong and you get the wrong data. Overall a fairly fragile architecture in the long run.
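
Here is a small sketch of what those nested Max(Date) sub-queries look like in practice, again in Python with the standard-library sqlite3 module. The tables (customer_snap, order_snap), their columns, and the data are made up for illustration, and I have simplified things by not carrying the parent's snapshot date down into the child key. Even so, it shows the fragility: the first query filters each table to its latest snapshot, while the second one forgets the sub-query on the child table and quietly returns an out-of-date row as well.

```python
import sqlite3


def build_snapshot_demo():
    """Create two hypothetical snapshot-dated tables with a little history."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer_snap (
            customer_id INTEGER NOT NULL,
            snap_date   TEXT    NOT NULL,   -- snapshot date is part of the PK
            name        TEXT,
            PRIMARY KEY (customer_id, snap_date)
        );
        CREATE TABLE order_snap (
            order_id    INTEGER NOT NULL,
            snap_date   TEXT    NOT NULL,
            customer_id INTEGER NOT NULL,
            status      TEXT,
            PRIMARY KEY (order_id, snap_date)
        );
        INSERT INTO customer_snap VALUES (1, '2012-01-01', 'Acme');
        INSERT INTO customer_snap VALUES (1, '2012-05-01', 'Acme Corp');
        INSERT INTO order_snap VALUES (100, '2012-02-01', 1, 'OPEN');
        INSERT INTO order_snap VALUES (100, '2012-03-01', 1, 'SHIPPED');
        """
    )
    return conn


def current_orders(conn):
    """Latest customer row joined to latest order row: one sub-query per table."""
    return conn.execute(
        """
        SELECT c.name, o.order_id, o.status
        FROM customer_snap c
        JOIN order_snap o ON o.customer_id = c.customer_id
        WHERE c.snap_date = (SELECT MAX(c2.snap_date)
                             FROM customer_snap c2
                             WHERE c2.customer_id = c.customer_id)
          AND o.snap_date = (SELECT MAX(o2.snap_date)
                             FROM order_snap o2
                             WHERE o2.order_id = o.order_id)
        ORDER BY o.order_id
        """
    ).fetchall()


def current_orders_missing_subquery(conn):
    """Same query with the order sub-query forgotten: stale rows leak through."""
    return conn.execute(
        """
        SELECT c.name, o.order_id, o.status
        FROM customer_snap c
        JOIN order_snap o ON o.customer_id = c.customer_id
        WHERE c.snap_date = (SELECT MAX(c2.snap_date)
                             FROM customer_snap c2
                             WHERE c2.customer_id = c.customer_id)
        ORDER BY o.snap_date
        """
    ).fetchall()
```

With only two tables this is manageable. Add a few more levels of parent-child relationships and every query needs one of these sub-queries per level, which is where the mistakes creep in.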

Also, as with the dimensional approach, I have encountered few teams that have been successful trying to implement this style of data warehouse in an incremental or agile fashion. My bad luck? Maybe…

The loosely coupled Data Vault data model mitigates these risks and also allows for agile deployment.

As discussed in the previous posts, the data model for a Data Vault based data warehouse is based on business keys and processes rather than the model of any one source system. The approach was specifically developed to mitigate the risks and struggles that were evident in the traditional approaches to data warehousing, including what we all considered the Inmon approach.

As I mentioned earlier I got to interact with Bill Inmon while we worked on a book. The interaction did not stop there. I have had many discussions over the years with Bill on many topics related to data warehousing, which of course includes talking about Data Vault. Both Dan and I talked with Bill about the ideas in the Data Vault approach. I spent a number of lunches telling him about my real-world experience with the approach and how it compared to his original approach (since I had done both). There were both overlaps and differences. Initially, Bill simply agreed it sounded like a reasonable approach (which was a relief to me!).

Over a period of time, many conversations with many people, study, and research, we actually won Bill Inmon over and got his endorsement for Data Vault. In June of 2007, Bill Inmon stated for the record:

The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.

So if Bill Inmon agrees that Data Vault is a better approach for modeling an enterprise data warehouse, why would anyone keep using his old methods and not at least consider learning more about Data Vault?

Something to think about, eh?

I hope you enjoyed this little series about Data Vault and will keep it in mind as you get into your new data warehouse projects for 2013.

Kent

P.S. – if you are ready to learn more about Data Vault, check out the introductory paper on my White Papers page, or just go for broke and buy the Super Charge book.

2012: Year in the Life of an Oracle Data Warrior

Hard to believe it is nearly the end of the year. But…it is here.

I will be taking time until the end of the year so I am doing my “year-end” post now.

It was a significant year for me with many new things, events, conferences, and clients. Here is a list, by month of a few of them:

January

I launched this blog – Oracle Data Warrior! At the stroke of midnight on January 1, I hit publish for this posting. So far I have had over 22,000 views on the site with the best/biggest day drawing 294 views on September 24th. People came to check out a free promotion for my new Kindle book.

So far 78 of you have subscribed to this blog and hence get notification whenever I post something new.

Thanks for your support! (For the rest – subscribe now so you don’t miss anything in 2013.)

In January I also launched the Year of the Data Vault by going to Dan Linstedt’s Data Vault certification class in Montreal. It was a great class. Check the January archive for my posts about the class.

February

I posted what has turned out to be THE most popular article so far: The best FREE data modeling tool ever. So far it has had 8,213 views! Wow! (of course since a bunch of you just clicked the link that number has gone up again)

Also big in February (every year) is the RMOUG Training Days in Denver, Colorado. This year I did the first ever remote presentation via Skype as part of their pre-conference seminar on data warehousing. My presentation was, of course, on Data Vault. There were a few technical issues but with the help of my good friend Jerry Ireland we got through it fine.

(Note: For RMOUG 2013, I will actually be presenting in person).

March

Two really big things this month:

  1. I filed with the state of Texas and formed Data Warrior LLC, signed my very first 1099 (independent) contract and became an official business.
  2. The Data Vault Training Portal was launched. You can read my post about that here.

April

Business wise, I started the 1099 contract work at MD Anderson Cancer Center and got to work building a data vault for one of their internal projects.

On the blog, I made some modifications to the layout and added a War Chest page with links to some resources that cost a little money (as opposed to my White Papers page, which has free stuff).

May

After one month of being an independent contractor I bought my first smartphone – an LG Nitro. I am not really a huge gadget guy so I had put this off for some time but finally gave in so I could tweet at the upcoming ODTUG conference in San Antonio.

Of course this means I signed up for Twitter. You can find me there at https://twitter.com/KentGraziano.

June

June was a HUGE month.

  1. The Data Vault modeling book hit #1 on Kindle.
  2. I got “promoted” to Oracle ACE Director (and found out via a Facebook post!).
  3. And of course there was KScope12 in San Antonio, Texas. I taught Chi Gung every morning at 7 AM and blogged about the event every night (at about midnight). Just check my June archives for all the posts and plenty of pictures.

July

Slowed down a bit here. Recovered from KScope12 (started planning for KScope13). Wrote a bit about work/life balance and posted this cool InfoGraphic.

August

Another first for me in August: I published my first eBook on Kindle, about data model design reviews.

Then we had an excellent family vacation with my father back east. We drove through the Adirondack Mountains in New York State and then to the Green Mountains of Vermont where we stayed at the Trapp Family Lodge. It gets my highest recommendation for a family friendly, environmentally aware, upscale, outdoor vacation resort. Pay the money and go – you only live once!

While on the trip, my nine-year-old son came up with a great idea for a blog post: How to make data modeling fun. When we got back, I wrote and posted it here. (Soon it will be a presentation at a conference near you.)

September

This was another big and fun month – all about Oracle Open World 2012 and getting to attend my first Oracle ACE Director meeting at Oracle HQ. Like at KScope, I blogged every night in the wee hours to capture what I saw and learned that day. The smartphone got a lot of use taking pictures in sessions and around San Francisco. It is all logged in the September archives.

October

Actually OOW 2012 bled over into October so there are even more posts and pictures in the October Archive folder.

The other biggie in October was that I finished out my contract at MD Anderson Cancer Center and started a new gig at McKesson Specialty Health (US Oncology). This has turned out to be a great project with a good team (like I had at MD Anderson), but with the added benefit of only being 9 miles from my house. This is the shortest commute I have had since college! Saves me 2.5 hours a day in driving.

Needless to say, that is a very nice aspect of the job.

November

This month was less about data (and my normal work) and more about fitness, a new habit, and being a warrior. (Though I did get accepted to present at the RMOUG Training Days in Denver.)

The highlight of the month was attending the 20th Anniversary celebration for the International Combat Hapkido Federation. I have been attending their workshops and seminars for over 15 of those years and have had the privilege to train with several of their masters as well as their founder and grand master John Pellegrini. Combat Hapkido is a very practical martial art for self-defense and a lot of fun to learn and practice.

It was a great event with back to back workshops (i.e., work outs!) with many masters and grand masters. We got training in Tai Chi, stretching, conditioning, kicking, Filipino Escrima, ground survival, and pressure points. There were actual martial arts celebs in attendance including Bill “Superfoot” Wallace, Cynthia Rothrock, and Stephen Hayes.

Since my main art is Tae Kwon Do, I was very privileged to meet and train with Grandmaster Bill Wallace (who actually has signed my last two black belt certificates along with GM Pellegrini). GM Wallace’s session was challenging and fun. He is quite entertaining.

Me (right) with GM Superfoot Wallace (center) and Master Ramon Voils

At 67 years old, GM Wallace can kick faster and higher than pretty much everyone I have ever trained with. I can only hope to be doing so well when I reach that age.

This is why he is called “Superfoot”

For more pictures from the event, you can subscribe to my newsfeed on Facebook or like my page. You might even find a picture of me in a suit!

December

And now we are up to this final month of 2012. I have been very busy with my work at McKesson so have only got one post out about the newest release of SQL Developer Data Modeler (which I use nearly every day!).

I did however recently get notification that I had several papers accepted for presentation at the ODTUG KScope13 conference in New Orleans next June. Be sure to register for that event too!

Yes it was quite the busy year…

Stay tuned for 2013 and see what happens.

Merry Christmas and Happy New Year!

Kent

The Oracle Data Warrior

List of Top Data Vault Resources (UPDATED 2016)

As I finished out my latest contract, my teammates wanted to know where they could go to get their data vault questions answered (besides emailing me!).

So I put together this list for them and figured the readers of my blog would probably like to see the same list.

Here it is!

My Stuff

Introduction to Data Vault 1.0 (pdf):

https://kentgraziano.com/white-papers/

Book:

Intro to Agile Data Engineering Using Data Vault 2.0

Slides

Introduction to Data Vault and Why Data Vault? (ppt):

http://www.slideshare.net/kgraziano/why-data-vault?

http://www.slideshare.net/kgraziano/agile-data-warehouse-modeling-introduction-to-data-vault-data-modeling

Dan’s Data Vault Books

The NEW Data Vault 2.0 Book:

http://www.amazon.com/Building-Scalable-Data-Warehouse-Vault-ebook/dp/B015KKYFGO/

The Data Vault Modeling book (DV 1.0):

http://www.amazon.com/Super-Charge-Your-Data-Warehouse/dp/1463778686/

The Data Vault Modeling book – Kindle version:

http://www.amazon.com/Super-Charge-Your-Warehouse-ebook/dp/B00853265G/

The Data Vault Modeling book – downloadable PDF version:

http://learndatavault.com/books/super-charge-your-data-warehouse/

Data Vault Implementation using Pentaho (DV 1.0):

http://www.lulu.com/shop/peter-van-til/implementing-a-datavault-architecture-with-pentaho-data-integration/paperback/product-17580260.html

Around the Web

Dan has two online classes for Implementing Data Vault (1.0):

  1. Using Informatica. You can see that here.
  2. Using SQL. You can see that here.

Dan’s main site and blog – Subscribe to this to get email updates/announcements regarding data vault:

http://danlinstedt.com/

Best overall source of Q&A – Data Vault Discussion group on LinkedIn:

http://www.linkedin.com/groups?gid=44926&trk=hb_side_g

Martin Evers, data vault expert from Europe (just one of his articles):

http://dm-unseen.blogspot.nl/2012/10/data-vault-business-key-mutations-matter.html

On YouTube

Data Vault videos from Dan (and Sanjay):

http://www.youtube.com/user/learndatavault

Older videos (includes RapidACE demo):

http://www.youtube.com/user/dlinstedt/videos?sort=dd&flow=list&page=1&view=0

Data Vault Architecture:

http://www.youtube.com/watch?v=WmFENnqgoS0&feature=youtu.be&a  (BTW – turn the volume down first. The “theme” music is loud)

Well, those are the main ones for now.

What’s your favorite?

Enjoy!

Kent

The Data Warrior

Data Vault Master and CDVP2

Authorized DV Bootcamp Instructor
