The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the category “Data Warehouse”

Agile Data Warehouse Modeling: How to Build a Virtual Type 2 Slowly Changing Dimension

One of the ongoing complaints about many data warehouse projects is that they take too long to delivery. This is one of the main reasons that many of us have tried to adopt methods and techniques (like SCRUM) from the agile software world to improve our ability to deliver data warehouse components more quickly.

So, what activity takes the bulk of development time in a data warehouse project?

Writing (and testing) the ETL code to move and transform the data can take up to 80% of the project resources and time.

So if we can eliminate, or at least curtail, some of the ETL work, we can deliver useful data to the end user faster.

One way to do that would be to virtualize the data marts.

For several years Dan Linstedt and I have discussed the idea of building virtual data marts on top of a Data Vault modeled EDW.

In the last few years I have floated the idea among the Oracle community. Fellow Oracle ACE Stewart Bryson and I even created a presentation this year (for #RMOUG and #KScope13) on how to do this using the Business Model (meta-layer) in OBIEE (It worked great!).

While doing this with a BI tool is one approach, I like to be able to prototype the solution first using Oracle views (that I build in SQL Developer Data Modeler of course).

The approach to modeling a Type 1 SCD this way is very straight forward.

How to do this easily for a Type 2 SCD has evaded me for years, until now.

Building a Virtual Type 2 SCD (VSCD2)

So how to create a virtual type 2 dimension (that is “Kimball compliant” ) on a Data Vault when you have multiple Satellites on one Hub?

(NOTE: the next part assumes you understand Data Vault Data Modeling. if you don’t, start by reading my free white paper, but better still go buy the Data Vault book on LearnDataVault.com)

Here is how:

Build an insert only PIT (Point-in-Time) table that keeps history. This is sometimes referred to as a historicized PIT tables.  (see the Super Charge book for an explanation of the types of PIT tables)

Add a surrogate Primary Key (PK) to the table. The PK of the PIT table will then serve as the PK for the virtual dimension. This meets the standard for classical star schema design to have a surrogate key on Type 2 SCDs.

To build the VSCD2 you now simply create a view that uses the PIT table to join the Hub and all the Satellites together. Here is an example:

Create view Dim2_Customer (Customer_key, Customer_Number, Customer_Name, Customer_Address, Load_DTS)
as
Select sat_pit.pit_seq, hub.customer_num, sat_1.name, sat_2.address, sat_pit.load_dts
from HUB_CUST hub,        
          SAT_CUST_PIT sat_pit,        
          SAT_CUST_NAME sat_1,        
          SAT_CUST_ADDR sat_2
where  hub.CSID = sat_pit.CSID           
    and hub.CSID = sat_1.CSID           
    and hub.CSID = sat_2.CSID           
    and sat_pit.NAME_LOAD_DTS = sat_1.LOAD_DTS           
    and sat_pit.ADDRESS_LOAD_DTS = sat_2.LOAD_DTS 
 

Benefits of a VSCD2

  1. We can now rapidly demonstrate the contents of a type 2 dim prior to ETL programming
  2. With using PIT tables we don’t need the Load End DTS on the Sats so the Sats become insert only as well (simpler loads, no update pass required)
  3. Another by product is the Sat is now also Hadoop compliant (again insert only)
  4. Since the nullable Load End DTS is not needed, you can now more easily partition the Sat table by Hub Id and Load DTS.

Objections

The main objection to this approach is that the virtual dimension will perform very poorly. While this may be true for very high volumes, or on poorly tuned or resourced databases, I maintain that with today’s evolving hardware appliances  (e.g., Exadata, Exalogic) and the advent of in memory databases, these concerns will soon be a thing of the past.

UPDATE 26-May-2018  – Now 5 years later I have successfully done the above on Oracle. But now we also have Snowflake elastic cloud data warehouse where all the prior constraints are indeed eliminated. With Snowflake you can now easily chose to instantly add compute power if the view is too slow or do the work and processing to materialize the view. (end update)

Worst case, after you have validated the data with your users, you can always turn it into a materialized view or a physical table if you must.

So what do you think? Have you ever tried something like this? Let me know in the comments.

Get virtual, get agile!

Kent

The Data Warrior

P.S. I am giving a talk on Agile Data Warehouse Modeling at the East Coast Oracle Conference this week. If you are there, look me up and we can discuss this post in person!

Let’s Review #OOW13 and #OTW13 in Pictures

Yes I have been derelict in my duty and not posted about the sessions I attended at Oracle OpenWorld (#OOW13) and OakTable World (#OTW).

Well here are the high points with pictures!

Monday

Monday started off with the now annual Swim the Bay (so I missed the keynote). If you have Facebook, you can see pictures from the event here.

Most of the day I then spent at the alternate conference, OakTable World (#OTW13) seeing a few talk and giving one myself.

My good friend from Denver, Tim Gorman gave a nice talk about all the data compression options available in Oracle.

Tim Gorman: Oracle Compression Options

Tim Gorman: Oracle Compression Options

Next was a great session from the well known blogger and author Fabian Pascal. I have been reading his work for years but this was the first time I got to hear him speak in person. As with his writing, the talk was both intellectually stimulating and challenging!

Fabian Pascal: The Last Null

Fabian Pascal: The Last Null

It really is quite a debate in the database world about the meaning and use of NULL in an RDBMS. Fabian has a proposal on how we can (and should) represent data in a way where there will never be NULL attributes.

After a some scheduling issues. later in the day, I did my presentation on using Data Vault Modeling for Agile Data Warehouse Modeling. The room I got had a huge wall for me to project my session on. Definitely the biggest screen ever for one of my talks.

Biggest screen ever for me and my data vault presentation.

Biggest screen ever for me and my data vault presentation.

Tuesday

Started the morning with a few friends doing morning Chi Gung in Union Square, then followed by getting a quick survey of the exhibit hall in Moscone South and a trip to the Demo grounds.

The throng descends into the depths of Moscone West to hunt the exhibit hall for goodies.

The throng descends into the depths of Moscone West to hunt the exhibit hall for goodies.

The hall was of course HUGE as usual so some of the vendors who were tucked in back got creative on getting the foot traffic to come their way.

A clever gimmick one vendor did to get traffic to their booth in the gigantic hall

A clever gimmick one vendor did to get traffic to their booth in the gigantic hall

For sessions, I attend a road map session on Oracle’s Big Data strategy given by my friend JP Dijcks.

JP talks all things Big Data

JP talks all things Big Data

Mostly he painted a picture of the issues with figuring out how to collect and put all that data to real work. Of course Oracle has a ton of products to offer to help solve the problem.

How to shrink the gap between getting big data and actually using it!

How to shrink the gap between getting big data and actually using it!

Next up I attended Jeff Smith’s session on SQL Developer 4.0 and got to learn that there was a data mining extension available for the tool that makes doing some advanced analytics a lot easier.

Definition for Data Mining. An extension for Data Mining is available for SQL Developer.

Definition for Data Mining. An extension for Data Mining is available for SQL Developer.

Next on my agenda was the Cloud keynote with Microsoft. I wrote about that here.

Finally for the day, a late presentation by Maria Colgan and Jonathan Lewis giving us their top tuning tips in what they called the SQL Tuning Bootcamp.

Optimizer tips from a pro Jonathan Lewis. I am sure it means something to someone out there!

Optimizer tips from a pro Jonathan Lewis. I am sure it means something to someone out there!

As always with these type session, there was a ton of useful information that makes my brain hurt. I have to keep reviewing  my notes to make sure I can use at least 10% of what they taught.

Wednesday

This was mostly a work day for me at a client site. And a late lunch to see the final race of the America’s Cup.

In case you have been under a rock since last week, Team USA won! It was great to actually be there on Pier 27 during the final race. Not a great vantage point overall but with the big screen to watch and then seeing the boats right after they finished, it was worth the walk.

After the race and a little more data model work at my client’s office, I walked back to the conference to see a final session (for me) given by Gwen Shapira about using solid state disks with Exadata.

I really did not know much about SSDs before this session but feel really educated now. I actually had no idea that SSD and FLASH drives or FLASH memory were the same thing. Guess I was behind on the hardware buzzwords.

Gwen and Mark on Solid State Disk AKA Flash

Gwen and Mark on Solid State Disk AKA Flash

Then it was off to the annual blogger meetup then dinner on the town with friends at The Stinking Rose (thanks Tim!).

I decided to skip the appreciation event this year and take it easy, have a nice dinner, then pack up to head home. Thursday it was breakfast at Lori’s Diner then off to the airport and back home.

As a reminder if you want to see what the buzz was at the events, just check out the hashtags #OOW13 and #OTW13 on twitter (if you had a big data machine you might even be able to generate some insight from those feeds).

Well that’s a wrap for this years big show.

Next up, I will be speaking at the upcoming ECO conference in North Carolina. Should be fun.

Later.

Kent

P.S. If you want to see my OTW presentation, you can find them on Slideshare.

P.P.S. For another great review of OOW13 check out this post by my friend from Turkey, Gurcan. See if you can find my unlabeled cameo in the post.

#OOW13 – Oracle OpenWorld 2013: Pictures, Tweets and more

#OOW13 is well underway now with lots of tweets and blog posts already out on the web. Check the stream on twitter under #oow13 for all the news.

I tried to see some of the Americas Cup race on Saturday but unfortunately the weather and wind did  not cooperate so the race was called for the day. 😦

But got a few pictures to share:

Americas Cup Racing: Team USA

Americas Cup Racing: Team USA

And a  nice view:

Golden Gate from the East Bleachers at the Americas Cup

Golden Gate from the East Bleachers at the Americas Cup

The New Oracle Plaza

Imagine our shock when we learned that there would be no Howard Street Tent this year! The nerve.

But in its place is the open-air Oracle Plaza, full of comfortable seats for hanging out, networking, eating lunch, and watching the keynotes on a big screen. Oracle is really counting on the weather to be typical dry fall weather. Hopefully that holds up.

OOW13: The New Oracle Plaza

OOW13: The New Oracle Plaza

User Group Sunday

I did attend a few sessions on Sunday.

The only session this year (from a user) on my favorite tool, Oracle SQL Developer Data Modeler was given by Heli Helskyaho, the CEO of Miracle Finland (@HeliFromFinland).

Heli Helskyaho: Why do I need #SQLDevModler?

Heli Helskyaho: Why do I need #SQLDevModler?

She did an excellent job of introducing folks to SDDM and laid out 9 use cases for using the tool.

9 use cases for #SQLDevModeler

9 use cases for #SQLDevModeler

(Hard to read I know, but zoom in…)

I also attended a session by my buddy Stewart Bryson, who is now the new Chief Innovation Officer for Rittman Mead globaly, Congrats Stewart!

He did an amazing (IMO) talk about how to go beyond agile and achieve Extreme BI. He gave us the beginnings of a complete Oracle-based framework which looks very promising.

He said some heretical things like skip the staging area for your data warehouse! I do think he is on to something and can make it work so we have a real chance at real time BI and still have a solid architectural foundation.

Stewart Bryson's talk about Extreme BI

Stewart Bryson’s talk about Extreme BI

Opening Keynote

Of course the was the opening Keynote by Oracle CEO Larry Ellison. Key words: #bigdata #inmemory and #AmericasCup (of course). Lots of big news.

Check out this info graphic about the event.

Mark Rittman already did a nice review of the talk (among other things). So rather than repeat check out his post.

And for those interested, right now the Americas Cup standings are USA 5 to New Zealand 8. But that will change later today for sure when they race again.

More to come about #oow13 and this years Oaktable World (#OTW13).

Later.

Kent

The Oracle Data Warrior

P.S. Had a great evening last night at the annual Oracle #ACE dinner which was hosted at the Walt Disney Museum in The Presidio.

Another Quick Tip for SQL Developer Data Modeler

Not sure how I missed this little utility, but I did.

Ever need to quickly build a set of views on a set of tables to create a read-only data layer you could expose to some users or processes?

Well as I am developing a new data warehouse for my current client, we decided to control access by creating a read-only user that would hold views that pointed back to our main data warehouse schema. The BI tool points to the read only schema (for now anyway).

Anyway, under the Tools menu I found a Table to View generator.

Under the Tools menu look for the Wizard to create a view definition based on a table

Under the Tools menu look for the Wizard to create a view definition based on a table

Once you select it from the menu you get a dialog box with all the tables in your model selected. So with one push of a button you have views on all the tables.

Get a list of table to convert and Select or Deselect all

Get a list of table to convert and Select or Deselect all

Then you can edit the views, if needed, using the view query builder.

Or you can select (or de-select) specific tables to build views on.

Even better – the tool applies a naming standard on the output view names (v_<table name>).

On top of this, if you happen to have some views (maybe for testing?) that you want to turn into tables and then populate with an ETL process, right below the first wizard is an option to create a table definition from the view definition.

Now granted most of you can easily do either of these tasks using plain old SQL, but imagine you need to do it for several hundred tables.

This little wizard would save you a ton of time (and testing).

And you will have a documented data model when you are done.

So go give it a try!

Later.

Kent

NB: I am taking a little downtime in early August so don’t look for any new posts until near the end of the month.

KScope13 Day Five: C’est Fini!

Yes, the last day of the conference arrived. Many folks got a bit of a slow start as they recovered from the prior evenings festivities (but that is true almost any day in the French Quarter!).

Morning Chi Gung was smaller but still a respectable turnout of 14 people. Some of my participants had already started their trek home, others just could not quite get up… c’est la vie.

Morning Chi Gung participants practicing a  still stance meditation and breathing exercise.

Morning Chi Gung participants practicing a still stance meditation and breathing exercise.

The KScope Social Network

My first session on this final day was Charles Elliott from Rittman Mead, discussing how to do Social Network Analysis with Oracle Tools. He discussion centered around using R and a graphing tool called D2 (www.d3js.org).

Network analysis of KScope contributors and influencers across two tracks.

Network analysis of KScope contributors and influencers across two tracks.

Charles had mined some data (from twitter I think) and then charted some of the connections to determine who were the major influencers related to KScope13 and several of our tracks. My name is up there, but I am a very small dot. You might be able to see in the picture a pretty big dot which is Gwen Shapira, who was not even at the conference this year!

Looks like very cool tech. Not sure where I would use it (yet).

Agile Case Study?

One session I was not too happy with was this one. It was titled as an agile implementation success story but it was (IMO) a veiled pitch for a free addin to SQL Developer.

The entire presentation was a statement of issues in trying to do version control of database objects. And they were all legitimate issues. But there was no case study of a specific project and how they tried to solve the problem. The solution was a product from the speaker’s company that they have developed as an add in to SQL Developer to allow you to control database object check in and check out at the database level.

Nice idea. Looks like it will work.

But, the product is not even available today! It will be release later this summer and is free for up to five users (then I assume it will cost $$).

My beef is that this should have been clearly labeled as a vendor presentation not as a “success” story.

More Crossover

For my last session I went to Stewart Bryson and Edward Roske’s presentation on Innovation in BI: Oracle Business Intelligence Against Essbase & Relational.

This was a great collaborative effort between these two guys who work for companies that might be considered competitors. I am pretty sure this is a KScope first (in many ways).

A KScope 1st: Stewart Bryson and Edward Roske do a joint presentation on using OBIEE against both Essbase and a relation data warehouse at the same time.

A KScope 1st: Stewart Bryson and Edward Roske do a joint presentation on using OBIEE against both Essbase and a relation data warehouse at the same time.

In the end, after some demonstrations of how to do this, they guys left us with a really nice comparison chart on when to use which tool and its relative effectiveness in solving specific problems.

Edward and Stewart came up with this nice chart trying to compare the two technologies on a number of features and functions.

Edward and Stewart came up with this nice chart trying to compare the two technologies on a number of features and functions.

Nice job guys!

C’est Fini

A relatively new KScope tradition is to hold a final closing general session. Here we say farewell to the event and the city with some slides and humorous videos from the week.

We also get to learn who the top five speakers were for the event (as ranked by session evaluations) and who the speaker of the year winner is.

This year the Best Speaker Award went to Edward (don’t call me Ed!) Roske from InterRel.

There was also an award for the best Kscope Ambassador (the one who helped with the most sessions). This went to Mark Becerra.

Congratulations to both!

Attendees at the KScope13 closing session with their KScope14 travel mugs

Attendees at the KScope13 closing session with their KScope14 travel mugs

As a final note we got a great little video welcoming KScope14 to Seattle.

The even bigger news was that we already have sponsors for the 2014 event! Platinum, gold, and sliver level sponsors have already signed! Plus a boat load of exhibitors.

We must be doing something right that they have secured their spot a year in advance.

We have come a long way baby!

The website for Kscope14 is up and running. You can register, submit an abstract, and make your hotel reservation for next year.

So head on over there now while you are thinking of it!  KScope14.com

One Last Shot

I can’t end this series without at least one picture of the great food here in New Orleans. I did eat well!

Ending the conference on a high note with blueberry upside down cake at celebrity chef Emeril Lagasse's restaurant NOLA in the French Quarter.

Ending the conference on a high note with blueberry upside down cake at celebrity chef Emeril Lagasse’s restaurant NOLA in the French Quarter.

See you in Seattle (if not sooner)!

Kent

The Oracle Data Warrior

Post Navigation