The Data Warrior

Changing the world, one data model at a time. How can I help you?

Better Data Modeling: The Oracle Data Warrior Speaks!

Looks like I will be doing a bit of speaking this year at a number of events around the country and, later on, around the globe (more on that later).

As usual, all my talks will center on using SQL Developer Data Modeler, data modeling standards, Data Vault, agile, or a combination of all of the above.

If you have the budget and time, please come out to at least one of these events this year; I would love to meet you in person and talk about the world of Oracle and data modeling.

If you aren’t planning to attend one of these – WHY NOT?

These are all great events with tons of learning opportunities. The networking alone is worth the price of admission.

Here is a list of the first three events confirmed on my calendar (and SURPRISE – they are NOT all Oracle related events):

RMOUG Training Days

In less than two weeks: The Rocky Mountain Oracle Users Group Training Days 2014 in Denver, Colorado. This runs from Feb 5-7, will have at least 1,000 people, and you cannot beat the price. I will be presenting Friday at 1:30 PM on how I save my clients big $$ by applying repeatable processes and standards to my data models.

Follow it on twitter with #RMTD14.

Data Vault Consortium

Next up, March 20-22, I will be participating in the 1st ever World Wide Data Vault Consortium and User Group meetup in beautiful northern Vermont, near the home of my good friend, the inventor of the Data Vault Model and Methodology, Dan Linstedt. I will be speaking about agile and data warehousing, using SDDM to do Data Vault modeling, and no doubt engaging in some lively debates with Data Vault experts from around the globe. Check out the agenda on the event page for more details on who will be speaking (hint: Bill Inmon, father of data warehousing, is participating!).

Enterprise Data World 2014

The #EDW14 event is really the annual conference put on by DAMA International, and the speaker list is a veritable who’s who of the data architecture and modeling world. This year the event is in Austin, Texas, April 27 – May 1. Since that is quite close to where I live, I figured I would submit an abstract, and I was honored to be accepted. I have attended this event only once before, when it was in Denver (a long time ago!), and have been a member of DAMA on and off for years, but this is the first time I have been asked to speak. I am looking forward to it for sure (not sure how I will fit my talk into a 45-minute slot!). Sign up for it here.

If you are planning to attend any of these, drop me a line over Twitter or LinkedIn so we can plan to meet up.

Later.

Kent

The Oracle Data Warrior

Agile Data Warehouse Modeling: How to Build a Virtual Type 2 Slowly Changing Dimension

One of the ongoing complaints about many data warehouse projects is that they take too long to deliver. This is one of the main reasons that many of us have tried to adopt methods and techniques (like SCRUM) from the agile software world to improve our ability to deliver data warehouse components more quickly.

So, what activity takes the bulk of development time in a data warehouse project?

Writing (and testing) the ETL code to move and transform the data can take up to 80% of the project resources and time.

So if we can eliminate, or at least curtail, some of the ETL work, we can deliver useful data to the end user faster.

One way to do that would be to virtualize the data marts.

For several years Dan Linstedt and I have discussed the idea of building virtual data marts on top of a Data Vault modeled EDW.

In the last few years I have floated the idea among the Oracle community. Fellow Oracle ACE Stewart Bryson and I even created a presentation this year (for #RMOUG and #KScope13) on how to do this using the Business Model (meta-layer) in OBIEE (It worked great!).

While doing this with a BI tool is one approach, I like to be able to prototype the solution first using Oracle views (that I build in SQL Developer Data Modeler of course).

The approach to modeling a Type 1 SCD this way is very straightforward.
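
For illustration, here is a minimal sketch of what such a Type 1 view might look like, assuming a Customer Hub with a single Name Satellite. The table and column names (HUB_CUST, SAT_CUST_NAME, CSID, LOAD_DTS) match the Type 2 example later in this post but are otherwise illustrative:

-- Virtual Type 1 dimension: show only the most recent Satellite row per Hub key.
CREATE OR REPLACE VIEW Dim1_Customer (Customer_Number, Customer_Name)
AS
SELECT hub.CUSTOMER_NUM,
       sat.NAME
  FROM HUB_CUST hub
  JOIN SAT_CUST_NAME sat ON sat.CSID = hub.CSID
 WHERE sat.LOAD_DTS = (SELECT MAX(s2.LOAD_DTS)   -- latest version only
                         FROM SAT_CUST_NAME s2
                        WHERE s2.CSID = sat.CSID);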

How to do this easily for a Type 2 SCD has evaded me for years, until now.

Building a Virtual Type 2 SCD (VSCD2)

So how do you create a virtual Type 2 dimension (one that is “Kimball compliant”) on a Data Vault when you have multiple Satellites on one Hub?

(NOTE: the next part assumes you understand Data Vault data modeling. If you don’t, start by reading my free white paper, but better still, go buy the Data Vault book on LearnDataVault.com.)

Here is how:

Build an insert-only PIT (Point-in-Time) table that keeps history. This is sometimes referred to as a historicized PIT table. (See the Super Charge book for an explanation of the types of PIT tables.)

Add a surrogate Primary Key (PK) to the table. The PK of the PIT table will then serve as the PK for the virtual dimension. This meets the classical star schema design standard of having a surrogate key on a Type 2 SCD.
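
To make the structure concrete, here is a hypothetical DDL sketch of such a historicized, insert-only PIT table with its surrogate key. The column names mirror the ones used in the view example below; your own standards may differ:

-- Insert-only, historicized PIT table: one row per Hub key per snapshot load.
CREATE TABLE SAT_CUST_PIT (
    PIT_SEQ          NUMBER NOT NULL,   -- surrogate PK; becomes the dimension key
    CSID             NUMBER NOT NULL,   -- Hub (customer) key
    LOAD_DTS         DATE   NOT NULL,   -- snapshot load date for this PIT row
    NAME_LOAD_DTS    DATE   NOT NULL,   -- matching SAT_CUST_NAME.LOAD_DTS
    ADDRESS_LOAD_DTS DATE   NOT NULL,   -- matching SAT_CUST_ADDR.LOAD_DTS
    CONSTRAINT SAT_CUST_PIT_PK PRIMARY KEY (PIT_SEQ)
);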

To build the VSCD2 you now simply create a view that uses the PIT table to join the Hub and all the Satellites together. Here is an example:

CREATE OR REPLACE VIEW Dim2_Customer (Customer_key, Customer_Number, Customer_Name, Customer_Address, Load_DTS)
AS
SELECT sat_pit.PIT_SEQ,
       hub.CUSTOMER_NUM,
       sat_1.NAME,
       sat_2.ADDRESS,
       sat_pit.LOAD_DTS
  FROM HUB_CUST      hub
  JOIN SAT_CUST_PIT  sat_pit ON sat_pit.CSID   = hub.CSID
  JOIN SAT_CUST_NAME sat_1   ON sat_1.CSID     = hub.CSID
                            AND sat_1.LOAD_DTS = sat_pit.NAME_LOAD_DTS
  JOIN SAT_CUST_ADDR sat_2   ON sat_2.CSID     = hub.CSID
                            AND sat_2.LOAD_DTS = sat_pit.ADDRESS_LOAD_DTS;

Benefits of a VSCD2

  1. We can now rapidly demonstrate the contents of a Type 2 dim prior to ETL programming.
  2. By using PIT tables we don’t need the Load End DTS on the Sats, so the Sats become insert only as well (simpler loads, no update pass required).
  3. Another by-product is that the Sat is now also Hadoop compliant (again, insert only).
  4. Since the nullable Load End DTS is not needed, you can now more easily partition the Sat table by Hub Id and Load DTS (see the sketch below).
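
On point 4, here is one hedged example of what that partitioning might look like in Oracle (the table, column names, and partition boundaries are illustrative only):

-- Insert-only Satellite, range-partitioned by Load DTS and hash-subpartitioned by Hub key.
CREATE TABLE SAT_CUST_ADDR (
    CSID     NUMBER        NOT NULL,   -- Hub (customer) key
    LOAD_DTS DATE          NOT NULL,   -- load date (no nullable Load End DTS required)
    ADDRESS  VARCHAR2(200),
    CONSTRAINT SAT_CUST_ADDR_PK PRIMARY KEY (CSID, LOAD_DTS)
)
PARTITION BY RANGE (LOAD_DTS)
SUBPARTITION BY HASH (CSID) SUBPARTITIONS 8
(
    PARTITION p_2013 VALUES LESS THAN (DATE '2014-01-01'),
    PARTITION p_max  VALUES LESS THAN (MAXVALUE)
);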

Objections

The main objection to this approach is that the virtual dimension will perform very poorly. While this may be true for very high volumes, or on poorly tuned or under-resourced databases, I maintain that with today’s evolving hardware appliances (e.g., Exadata, Exalogic) and the advent of in-memory databases, these concerns will soon be a thing of the past.

UPDATE 26-May-2018 – Now, five years later, I have successfully done the above on Oracle. But now we also have the Snowflake elastic cloud data warehouse, where all the prior constraints are indeed eliminated. With Snowflake you can easily choose to add compute power instantly if the view is too slow, or do the work and processing to materialize the view. (end update)

Worst case, after you have validated the data with your users, you can always turn it into a materialized view or a physical table if you must.
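
For example, in Oracle you could materialize the same query with something along these lines (a sketch only; the build and refresh options would depend on your load schedule):

-- Materialize the virtual dimension if the plain view proves too slow.
CREATE MATERIALIZED VIEW Dim2_Customer_MV
    BUILD IMMEDIATE
    REFRESH COMPLETE ON DEMAND
AS
SELECT * FROM Dim2_Customer;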

So what do you think? Have you ever tried something like this? Let me know in the comments.

Get virtual, get agile!

Kent

The Data Warrior

P.S. I am giving a talk on Agile Data Warehouse Modeling at the East Coast Oracle Conference this week. If you are there, look me up and we can discuss this post in person!

Free Introduction to Data Warehousing the Data Vault Way

My good friend Dan (@dlinstedt) has put together a sweet set of three videos to introduce everyone to the wonderful world of Data Vault.

When you sign up you will get a set of email messages from Dan discussing the Data Vault Approach to data warehousing. You get access to three videos about the architecture, the methodology, and the modeling technique.

Plus you get free downloads of the first few chapters of the Data Vault book Super Charge Your Data Warehouse.

So if you have always wanted to learn more about Data Vault but don’t have the budget for a full-on class, this offer will get you headed in the right direction.

Head on over to the Learn Data Vault site now. It is all free.

(NB: this is an affiliate link. If you eventually buy the book or some other training off Dan’s site, I will get a small piece of the action. Not enough to retire mind you, but it might buy lunch.)

The videos are pretty cool. Enjoy.

Kent

Are you certifiable?

Just a quick note to let everyone know that Dan Linstedt (inventor of the Data Vault Model and Methodology) has announced that he will be teaching a live Data Vault 2.0 Certification Class on July 29th in St Albans, Vermont.

This class will cover all the new DV2.0 standards, which incorporate big data and NoSQL as options for implementation.

So do you have what it takes to become a Certified Data Vault Practitioner – CDVP2?

Check out these details:

Concepts that are covered include the following:

  • Implementation (ETL & SQL) best practices – structured, performance and restartability
  • Data Integration Strategies – real time, big data, and unstructured
  • Overview of Architecture – 3 tier systems architecture
  • Overview of Methodology – agile project planning & delivery
  • Overview of Data Vault Modeling – DV tables, patterns, business vault
  • Foundation of NoSQL & Big Data – where it fits, key structures, parallel loading
  • DV2.0 Methodology (Agility, Roles and Responsibilities, Architecture, CMMI Level 5)
  • DV2.0 Structure Changes
  • Managed Self Service BI
  • Loading Templates (including DV2.0 performance improvements)
  • Business Vault – introduction and overview, application & use

The new course relies heavily on the Super Charge Your Data Warehouse book. Students must have completed reading the book before attending class. The class will spend minimal time covering the structural details of the Data Vault itself, and will focus more on the application of the Data Vault, the methodology (or project implementation), and the overall architecture, including a look at managed self-service BI.

via Data Vault Certification | Class Descriptions.

Get the time, place, costs etc. on the Data Vault Certification site.

Join the elite ranks of certified Data Vault Practitioners.

Get certified!

Take care.

Kent

Data Vault Informatica Class is Live!

Just a quick note to let you all know that Dan has finally released the class on how to easily implement a Data Vault using Informatica.

I wrote about the class here.

I have gone through a few of the lessons already and can tell you the instruction is very clear and easy to follow (even for me!), and the audio and video are excellent. The audio seems to come on a bit loud, so just be sure you have your volume turned down a bit when you start the videos.

And there is a money back guarantee if for some reason you decide the class is not for you.

If you did not get on Dan’s early notice list you can still sign up by going directly here: http://learndatavault.com/kentdvi

And, since you are a reader of my blog, if you sign up in the next few weeks and enter the coupon code DATAWARRIOR13, you can get $100 off!

So if you use Informatica and plan to do a Data Vault, you owe it to yourself to take a look at this course.

Take care.

Kent
