The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “#aced”

Agile Data Warehouse Modeling: How to Build a Virtual Type 2 Slowly Changing Dimension

One of the ongoing complaints about many data warehouse projects is that they take too long to delivery. This is one of the main reasons that many of us have tried to adopt methods and techniques (like SCRUM) from the agile software world to improve our ability to deliver data warehouse components more quickly.

So, what activity takes the bulk of development time in a data warehouse project?

Writing (and testing) the ETL code to move and transform the data can take up to 80% of the project resources and time.

So if we can eliminate, or at least curtail, some of the ETL work, we can deliver useful data to the end user faster.

One way to do that would be to virtualize the data marts.

For several years Dan Linstedt and I have discussed the idea of building virtual data marts on top of a Data Vault modeled EDW.

In the last few years I have floated the idea among the Oracle community. Fellow Oracle ACE Stewart Bryson and I even created a presentation this year (for #RMOUG and #KScope13) on how to do this using the Business Model (meta-layer) in OBIEE (It worked great!).

While doing this with a BI tool is one approach, I like to be able to prototype the solution first using Oracle views (that I build in SQL Developer Data Modeler of course).

The approach to modeling a Type 1 SCD this way is very straight forward.

How to do this easily for a Type 2 SCD has evaded me for years, until now.

Building a Virtual Type 2 SCD (VSCD2)

So how to create a virtual type 2 dimension (that is “Kimball compliant” ) on a Data Vault when you have multiple Satellites on one Hub?

(NOTE: the next part assumes you understand Data Vault Data Modeling. if you don’t, start by reading my free white paper, but better still go buy the Data Vault book on LearnDataVault.com)

Here is how:

Build an insert only PIT (Point-in-Time) table that keeps history. This is sometimes referred to as a historicized PIT tables.  (see the Super Charge book for an explanation of the types of PIT tables)

Add a surrogate Primary Key (PK) to the table. The PK of the PIT table will then serve as the PK for the virtual dimension. This meets the standard for classical star schema design to have a surrogate key on Type 2 SCDs.

To build the VSCD2 you now simply create a view that uses the PIT table to join the Hub and all the Satellites together. Here is an example:

Create view Dim2_Customer (Customer_key, Customer_Number, Customer_Name, Customer_Address, Load_DTS)
as
Select sat_pit.pit_seq, hub.customer_num, sat_1.name, sat_2.address, sat_pit.load_dts
from HUB_CUST hub,        
          SAT_CUST_PIT sat_pit,        
          SAT_CUST_NAME sat_1,        
          SAT_CUST_ADDR sat_2
where  hub.CSID = sat_pit.CSID           
    and hub.CSID = sat_1.CSID           
    and hub.CSID = sat_2.CSID           
    and sat_pit.NAME_LOAD_DTS = sat_1.LOAD_DTS           
    and sat_pit.ADDRESS_LOAD_DTS = sat_2.LOAD_DTS 
 

Benefits of a VSCD2

  1. We can now rapidly demonstrate the contents of a type 2 dim prior to ETL programming
  2. With using PIT tables we don’t need the Load End DTS on the Sats so the Sats become insert only as well (simpler loads, no update pass required)
  3. Another by product is the Sat is now also Hadoop compliant (again insert only)
  4. Since the nullable Load End DTS is not needed, you can now more easily partition the Sat table by Hub Id and Load DTS.

Objections

The main objection to this approach is that the virtual dimension will perform very poorly. While this may be true for very high volumes, or on poorly tuned or resourced databases, I maintain that with today’s evolving hardware appliances  (e.g., Exadata, Exalogic) and the advent of in memory databases, these concerns will soon be a thing of the past.

UPDATE 26-May-2018  – Now 5 years later I have successfully done the above on Oracle. But now we also have Snowflake elastic cloud data warehouse where all the prior constraints are indeed eliminated. With Snowflake you can now easily chose to instantly add compute power if the view is too slow or do the work and processing to materialize the view. (end update)

Worst case, after you have validated the data with your users, you can always turn it into a materialized view or a physical table if you must.

So what do you think? Have you ever tried something like this? Let me know in the comments.

Get virtual, get agile!

Kent

The Data Warrior

P.S. I am giving a talk on Agile Data Warehouse Modeling at the East Coast Oracle Conference this week. If you are there, look me up and we can discuss this post in person!

Advertisements

At the mother ship: Oracle ACE Director Briefing

Well, I can’t really talk about the meeting (yet) as everything is under NDA until next week (at least), but I can say it is great to get together with the top Oracle practitioners in the world and chat and discuss the issues and solutions in our field. It is like returning to the mother ship.

Oracle HQ - The Mother Ship

Oracle HQ – The Mother Ship

With all the tech talk and product futures, there is also just general catching up, networking, and bonding. Much is learned over pizza and beer.

And we are all following the America’s Cup race with Oracle Team USA trying to keep the cup away from Team New Zealand

I try to do my part in making sure we stay healthy and energized as well by running Chi Gung (Qigong) sessions in the morning before breakfast. Here is a shot of some of us outside this morning:

 

Not the best photo of Chi Gung practice, but what are you gonna do.

Lots more to come this week at Oracle Open World, so stay tuned for some very exciting announcements.

Go Team Oracle!

Kent

The 1st Tip is Free: Oracle SQL Developer Data Modeler

This year’s ODTUG KScope conference is less than a month away and I am ready to get to New Orleans and meet up with all my friends and colleagues to talk shop and learn some new things.

So in hopes of getting you to attend, I will share with you one of my productivity tips for using SQL Developer Data Modeler (SDDM).

DDL Previewer

I love this tool. I use it every single day in my work building data warehouses for my clients.

Really.

It’s easy – just right click on the table (or view) you want to check and Voila!

Right mouse over object to get preview option

Right mouse over object to get preview option

Once you pick “DDL Preview”, then you get the DDL in another window.

Example of DDL script in Preview Window

Example of DDL script in Preview Window

With this you can:

1. Check the design to see if it is what you wanted/expected.

2. Copy and paste it to a worksheet in SQL Developer where you can then execute it (if you have the rights!).

One more cool thing is that if you leave the DDL window open, you can now click on any other object in the open diagram and instantly see the DDL for that object.

So there you go – really 3 tips for the price of one!

If you want more, I will be presenting my Top 10 SDDM Tips and Tricks at KScope13 in New Orleans in a few weeks. So go register now!

See you in NOLA!

Kent

 

RMOUG Training Days 2013 – Day 1

Unlike many conferences, today started off not with the keynote but with an actual session (probably some advanced psychology at work here). 🙂

I started off with John King’s session on Oracle 11g features that developers should know about. (He was going to talk about 12c but since it has not been released yet, he could not speak about it)

John King giving Session 1 at RMOUG 2013

John King giving Session 1 at RMOUG 2013

John is a great speaker and gave us some very detailed information.

One very interesting piece to me, as a data modeler and data warehouse designer, was the addition of Virtual Columns. With this you can declare a virtual, calculated/derived column to be part of a table definition. With this you can define a calculation once and have it appear when querying the table without actually physically adding a column to the table. Looks promising.

John told us about lots of new things like Pivot, Unpivot, Results Cache, PL/SQL Results cache and Nth Value functions. Some of them are shown in the following pictures.

SQL PIVOT Example

SQL PIVOT Example

Example of UNPIVOT

Example of UNPIVOT

Another cool SQL Function: Nth Value

Another cool SQL Function: Nth Value

All neat options I did not really know about.

Next up was the keynote speech by Mogens Norgaard from Denmark. Mogens is an ACE Director, CEO of his own consulting firm, and a brew master. Interesting guy.

He showed up in his bathrobe to talk to us all about how the smartphone is taking over  the world and all the cool apps you could build (and some he has built).

Mogens Norgaard in his keynote best.

Mogens Norgaard in his keynote best.

Next was my turn – my first session of the conference – 5 Ways to Make Data Modeling Fun (based on a blog post).

I was pleasantly surprised that I had 40-50 people attend and most stayed for the whole talk. It was a good, interactive session. My good buddy Jon Arnold assisted me in administering some of the activities. It was great fun getting the attendees to actually collaborate on activities during a session.

Great participant collaboration during my talk

Great participant collaboration during my talk

As promised, I did give out prizes for some of the activities (all branded Data Warrior LLC stuff).

Next was the ACE Director networking lunch where they put our names on tables so people could sit with us to ask questions (if they wanted too).

Networking Lunch

Networking Lunch

After lunch we some vendor sessions (which I skipped) and several panel discussions. These included the Women in Technology Panel and an Oracle Career Roundtable.

Women in Technology Panel

Women in Technology Panel

Oracle Careers Roundtable

Oracle Careers Roundtable

Anyone notice that the Women in Tech had one male on the panel but the Oracle Career panel had no women? Just sayin’ folks…

Next I sat in for part of a session on Oralce TimesTem database for real-time BI. It turned out to be the same stuff I heard at Oracle Open World so I did not stay.

Last for my day at RMOUG was my joint session with Stewart Bryson on Data Vault and OBIEE. Unfortunately due to the late slot (5:15 PM) we had a very low turn out. 😦 But is was a good session as I discovered all the things Stewart learned trying to use the data vault model for virtualizing the data mart layer (in OBIEE). It was all very good and reinforced my belief that Data Vault is a great way to model an EDW and that non-data vault people could understand it and apply it to dimensional modeling (or that Stewart is really exceptional).

Adios for now.

Kent

P.S. Forgot to mention again that I will be conducting another morning Chi Gung class at & AM above the registration area. Please join!

Happy 2013! What will you do this year?

Happy New Year! Welcome to year #2 of the Oracle Data Warrior.

I hope everyone is looking forward to a bright, happy, and successful year (however you measure it).

For me it will be a year of figuring out my long term business model (maybe?), writing a few more short ebooks (stay tuned), doing my Oracle ACE Director thing, continuing to work as a Data Vault and Data Warehouse advisor and consultant,  presenting at RMOUG, KScope13, and hopefully a few other choice events, and of course writing on this blog (and practicing my martial arts).

That ought to do it, don’t you think?

But you never know what life may throw your way, so I am not tied to any of that really, but that is where my wave seems to be heading today.

One thing I have already done was to take advantage of Vizify to build a visual story about myself. I really like the look and feel of the app and the way it presents my information. Check out the animation on the location page and then the timeline on the career page (which is not quite complete yet). Very cool.

How about you? What is on your horizon for 2013?

Cowabunga!

Kent

P.S. See this cool 2012 report WordPress generated automatically. It covers the stats I put in my last post but much nicer presentation. 😉

The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.

Here’s an excerpt:

4,329 films were submitted to the 2012 Cannes Film Festival. This blog had 23,000 views in 2012. If each view were a film, this blog would power 5 Film Festivals

Click here to see the complete report.

Post Navigation

%d bloggers like this: