The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the category “Data Vault”

Oracle Data Warrior: 2013 in Review

Happy New Year!

I have been busy relaxing with my family on vacation so I decided not to write a full review this year. It was a busy year in many ways (not just blogging!). If you take a look in the right column, you will see a list of recent posts, top posts, and a complete archive by month. Go crazy…

If you want some highlights and details to see what happened on Oracle Data Warrior, check out this cool report provided by WordPress. Below the report I am also posting the link for Jeff Smith’s SQL Developer year-in-review.

Year in Review Report

The WordPress.com stats helper monkeys prepared a 2013 annual report for my blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 46,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 17 sold-out performances for that many people to see it.

Click here to see the complete report.

Other Reviews

Well, my main man Jeff Smith did a nice write-up for SQL Developer. Check it out here: www.thatjeffsmith.com/archive/2013/12/the-best-of-2013-sql-developer-posts/

I was pretty busy on Twitter this year too. Here is a list of my top tweets: https://twitter.com/KentGraziano/timelines/419701819502104577.

For something fun, check out the Vizify video link in my Twitter profile.

For me, 2014 will be more adventures in Oracle data land, attending conferences, listening in at the BBBT, working with Dan and Sanjay on DV 2.0 stuff, and working to build and improve data models and data warehouses for my clients.

What will you do in 2014?

Cheers!

Kent G

The Oracle Data Warrior

Merry Christmas!

To all my loyal readers, followers, #BBBTers, #ODTUGers, #ChiGungers, #RMOUGers, #DataVaulters, #TKDers, and fellow Oracle #ACEs and #ACEDs, I want to wish you all a

Very Merry Christmas!

Talk to you soon.
Kent

East Coast Oracle Users Conference (#ECOracle13) Review

This week I did a little travel and went to Durham, North Carolina to present at the 2013 East Coast Oracle Users Conference (aka ECO). While I have been aware of this event for over 20 years, it is the first time I have attended.

It was worth the trip. (Thanks to Jeff Smith at Oracle for alerting me to the event and encouraging me to submit.) He actually sent me, Danny, and Sarah (The EPM Queen). It was great to have members of the ODTUG clan together.

The gang of three - ODTUGers at ECO13 thanks to That Jeff Smith guy. Yea - he sent us!

Overall, it was a well-run event held at the Sheraton Imperial Hotel and Conference Center. It drew over 300 attendees, and a long list of Oracle ACEs and ACE Directors were there to present to a crowd eager to learn and network.

Fun and Games: The Keynote

Our opening keynote from Steven Feuerstein (inventor of the PL/SQL Challenge) was a fun take on different types of therapy and how they might be applied to software developers.

PL/SQL Evangelist Steven Feuerstein discusses Coding Therapy for Software Developers

He discussed the use of:

  • Game therapy (try out Mastermind or setgame.com)
  • Dream Therapy
  • Confessional Therapy
  • Shock Therapy
  • Couples Therapy
    • For DBA & Developers
    • For Developers & Their Managers

It was a fun, light way to start the conference with some very valuable advice.

Heavy Duty DBA-type Tuning Talks

Oracle ACE Director, author, and trainer Craig Shallahamer did two deep dive tuning sessions that I attended. In the first one, Introduction to Time-based Performance Analysis: Stop the Guessing, Craig gave us his four point framework for Holistic Performance Analysis. The points were:

  1. The Three Circles to consider (OS, Database, Application)
  2. Be Quantitative (i.e., trust the numbers not a hunch)
  3. Serialization is death, Parallel is life
  4. Tell a story (make the explanation of the issue understandable to managers)

With that he got into all sorts of v$ view stuff that went mostly over my head. Needless to say I will have to download the slides from his site (orapub.com) and give them to someone more attuned to this kind of tuning than I!

Oracle ACE Director, Craig Shallahamer discusses low level details for understanding Oracle CPU consumption

The second presentation Craig gave was called Understanding Oracle CPU Consumption: The Missing Link. Again lots of views and some Linux OS utilities (e.g., perf) and lots of numbers were displayed and discussed to try to ferret out how to determine what Oracle functions were actually taking up CPU time.

Even though I don’t really understand a lot of this (hey, I am a data modeler, not a DBA, right?), I like to go to sessions like this as I enjoy listening to smart people talk passionately about the things they do. I figure I might retain just enough to point someone else in the right direction in the future, even if it is only to give them a copy of these slides!

Lovely Southern Style Lunch

ECO had one of the nicest little lunch buffets I have eaten in a while. Very simple southern food that included cole slaw, potato salad, baked chicken, fried chicken, pulled pork (with N. Carolina BBQ sauce), hush puppies, and apple cobbler. (I did not say it was a light lunch, right?)

I love all kinds of BBQ and the pulled pork did not disappoint. I do not usually like fried chicken but figured I should try it and was pleasantly surprised. Crisp and moist. Very nice.

Traditional Southern Fare for Lunch

My 1st Session – Making Data Modeling Fun

I had the best turnout ever for this topic, with over 40 people in the session, most of whom were game to try my gamification of data model review sessions.

Session attendees developing Haiku poems based on a Data Model

One of the tasks was to translate relationship sentences and model descriptions into Haiku (or another form). There were prizes as an incentive to play along.

Some of the prizes for participants at my talk

The winner by general acclamation was Edie Waite from Raleigh, NC with this little limerick:

There once was a country named France
Which had many regions for dance
The locations they chose to dance on their toes
Made employees all look askance.

The data model we used had the entities: Country, Region, Employee, Locations, and a few others.

Another Haiku from Sarah Zumbrum (a noted non-data modeler) went like this:

More than one region
Can reside in a country
Like the USA

The session was really a lot of fun thanks to everyone being open-minded and willing to try some unconventional approaches to gathering data model requirements. (There was one other Haiku in French which I will add as soon as the author sends it to me!)

ECO 13 – Day 2

Keynote today was about E-Business Suite stuff. I sat there after breakfast mostly not listening as I started to put this blog post together.

Then I did my 2nd talk.

Agile Data Warehouse Modeling

I had a somewhat disappointing turnout (only 5 people, sigh) but it was a great exchange with those 5 people. We had a very good discussion about applying agile techniques to building a data warehouse and I was able to introduce them to some of the details of Data Vault Data Modeling. None of them knew much about data vault, but some had heard the term.

One attendee did tell me he was skeptical about the approach when he came in as he was a traditional Kimball dimensional data warehouse guy. But after the session he was willing to concede there was some merit and ideas he had not seen before and he was going to take those into consideration as he embarked on a new phase of his project where there were some complex problems to solve. He could see that data vault might just help.

Really can’t ask for more than that!

Embedded Analytics

So my last session for the event was to attend Craig Warman’s talk on embedded analytics. It was a good discussion about how BI and analytics have evolved. Craig presented a simple maturity model as part of the talk:

Level 0: BI reporting and analytic applications are completely separate from other applications.
Level 1: Gateway Analytics – Operational applications have a report tab or menu item to launch the BI reporting tool interface. Maybe there is a login pass-through.
Level 2: Inline Analytics – At this level, the analytics and BI tool has been incorporated into the operational application interface to the point that it has the same look and feel and you can’t tell it is a separate product or tool. This is where many organizations are today.
Level 3: Infused Analytics – This is the goal. At this level the analytics are truly part of the application and provide core functionality. Examples of this are the recommendations you get on Amazon as you check out or the movie suggestions you get on Netflix based on your prior movie choices. If the analytic pieces were removed, the application would not function correctly.

Craig Warman (ECO13 conference chair) talks about what embedded analytics is (and is not)

Well that’s it for this conference.

Put ECO on your radar for 2014.

See you around.

Kent

P.S. Next conference on my agenda is RMOUG TD 2014. Let me know if you will be there.

Agile Data Warehouse Modeling: How to Build a Virtual Type 2 Slowly Changing Dimension

One of the ongoing complaints about many data warehouse projects is that they take too long to deliver. This is one of the main reasons that many of us have tried to adopt methods and techniques (like Scrum) from the agile software world to improve our ability to deliver data warehouse components more quickly.

So, what activity takes the bulk of development time in a data warehouse project?

Writing (and testing) the ETL code to move and transform the data can take up to 80% of the project resources and time.

So if we can eliminate, or at least curtail, some of the ETL work, we can deliver useful data to the end user faster.

One way to do that would be to virtualize the data marts.

For several years Dan Linstedt and I have discussed the idea of building virtual data marts on top of a Data Vault modeled EDW.

In the last few years I have floated the idea among the Oracle community. Fellow Oracle ACE Stewart Bryson and I even created a presentation this year (for #RMOUG and #KScope13) on how to do this using the Business Model (meta-layer) in OBIEE (It worked great!).

While doing this with a BI tool is one approach, I like to be able to prototype the solution first using Oracle views (that I build in SQL Developer Data Modeler of course).

The approach to modeling a Type 1 SCD this way is very straightforward.
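As a point of comparison, here is a minimal sketch of what a virtual Type 1 dimension could look like: a view that exposes only the most recent Satellite row for each Hub key. It runs against an in-memory SQLite database through Python's standard `sqlite3` module purely so it can be tried anywhere; it is not Oracle-specific. The table and column names (`HUB_CUST`, `SAT_CUST_NAME`, `CSID`) are borrowed from the Type 2 example later in this post, while the view name `Dim1_Customer`, the sample rows, and the ISO-text load dates are assumptions made for illustration.

```python
import sqlite3

# In-memory database; the schema is a simplified, assumed Data Vault fragment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HUB_CUST (CSID INTEGER PRIMARY KEY, customer_num TEXT NOT NULL);
CREATE TABLE SAT_CUST_NAME (CSID INTEGER, LOAD_DTS TEXT, name TEXT,
                            PRIMARY KEY (CSID, LOAD_DTS));

INSERT INTO HUB_CUST VALUES (1, 'C-100'), (2, 'C-200');
INSERT INTO SAT_CUST_NAME VALUES
    (1, '2013-01-01', 'Acme'),
    (1, '2013-03-01', 'Acme Corp'),
    (2, '2013-02-01', 'Globex');

-- Type 1: only the latest Satellite row per Hub key is visible.
CREATE VIEW Dim1_Customer AS
SELECT hub.CSID         AS Customer_Key,
       hub.customer_num AS Customer_Number,
       sat.name         AS Customer_Name
FROM HUB_CUST hub
JOIN SAT_CUST_NAME sat ON sat.CSID = hub.CSID
WHERE sat.LOAD_DTS = (SELECT MAX(s.LOAD_DTS)
                        FROM SAT_CUST_NAME s
                       WHERE s.CSID = hub.CSID);
""")

dim1 = conn.execute(
    "SELECT * FROM Dim1_Customer ORDER BY Customer_Key").fetchall()
for row in dim1:
    print(row)
```

Because the view always resolves to the newest Satellite row, a name change simply overwrites what the user sees, which is the Type 1 behavior.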

How to do this easily for a Type 2 SCD has evaded me for years, until now.

Building a Virtual Type 2 SCD (VSCD2)

So how do you create a virtual type 2 dimension (that is “Kimball compliant”) on a Data Vault when you have multiple Satellites on one Hub?

(NOTE: The next part assumes you understand Data Vault Data Modeling. If you don’t, start by reading my free white paper, but better still go buy the Data Vault book on LearnDataVault.com.)

Here is how:

Build an insert-only PIT (Point-in-Time) table that keeps history. This is sometimes referred to as a historicized PIT table. (See the Super Charge book for an explanation of the types of PIT tables.)

Add a surrogate Primary Key (PK) to the table. The PK of the PIT table will then serve as the PK for the virtual dimension. This meets the standard for classical star schema design to have a surrogate key on Type 2 SCDs.

To build the VSCD2 you now simply create a view that uses the PIT table to join the Hub and all the Satellites together. Here is an example:

CREATE VIEW Dim2_Customer (Customer_Key, Customer_Number, Customer_Name, Customer_Address, Load_DTS)
AS
-- One dimension row per PIT row; the PIT surrogate key becomes the dimension key.
SELECT sat_pit.pit_seq, hub.customer_num, sat_1.name, sat_2.address, sat_pit.load_dts
FROM   HUB_CUST hub
JOIN   SAT_CUST_PIT sat_pit ON sat_pit.CSID = hub.CSID
-- Each PIT row points at the Satellite rows in effect at that point in time.
JOIN   SAT_CUST_NAME sat_1  ON sat_1.CSID = hub.CSID
                           AND sat_1.LOAD_DTS = sat_pit.NAME_LOAD_DTS
JOIN   SAT_CUST_ADDR sat_2  ON sat_2.CSID = hub.CSID
                           AND sat_2.LOAD_DTS = sat_pit.ADDRESS_LOAD_DTS;
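To make the three steps concrete, the following is a small runnable sketch that builds the tables, loads a few rows, and queries the view. It uses an in-memory SQLite database via Python's standard `sqlite3` module as a stand-in for Oracle, so details will differ on a real platform. The table, column, and view names come from the example above; the column data types, the `add_pit_row` helper, the sample data, and the use of ISO-formatted text for load dates are all assumptions made for illustration, and the way the PIT row is derived here is only one possible loading approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HUB_CUST (CSID INTEGER PRIMARY KEY, customer_num TEXT NOT NULL);
CREATE TABLE SAT_CUST_NAME (CSID INTEGER, LOAD_DTS TEXT, name TEXT,
                            PRIMARY KEY (CSID, LOAD_DTS));
CREATE TABLE SAT_CUST_ADDR (CSID INTEGER, LOAD_DTS TEXT, address TEXT,
                            PRIMARY KEY (CSID, LOAD_DTS));

-- Insert-only PIT table; pit_seq is the surrogate key the dimension reuses.
CREATE TABLE SAT_CUST_PIT (
    pit_seq          INTEGER PRIMARY KEY AUTOINCREMENT,
    CSID             INTEGER NOT NULL,
    LOAD_DTS         TEXT NOT NULL,
    NAME_LOAD_DTS    TEXT,
    ADDRESS_LOAD_DTS TEXT);

CREATE VIEW Dim2_Customer AS
SELECT sat_pit.pit_seq  AS Customer_Key,
       hub.customer_num AS Customer_Number,
       sat_1.name       AS Customer_Name,
       sat_2.address    AS Customer_Address,
       sat_pit.LOAD_DTS AS Load_DTS
FROM HUB_CUST hub
JOIN SAT_CUST_PIT sat_pit ON sat_pit.CSID = hub.CSID
JOIN SAT_CUST_NAME sat_1  ON sat_1.CSID = hub.CSID
                         AND sat_1.LOAD_DTS = sat_pit.NAME_LOAD_DTS
JOIN SAT_CUST_ADDR sat_2  ON sat_2.CSID = hub.CSID
                         AND sat_2.LOAD_DTS = sat_pit.ADDRESS_LOAD_DTS;

INSERT INTO HUB_CUST VALUES (1, 'C-100');
INSERT INTO SAT_CUST_NAME VALUES
    (1, '2013-01-01', 'Acme'), (1, '2013-03-01', 'Acme Corp');
INSERT INTO SAT_CUST_ADDR VALUES
    (1, '2013-01-01', '1 Main St'), (1, '2013-02-01', '9 Oak Ave');
""")


def add_pit_row(conn, csid, load_dts):
    """Append one PIT row pointing at the Satellite rows current as of load_dts.

    Hypothetical helper: it only ever inserts, so earlier PIT rows keep history.
    """
    conn.execute(
        """
        INSERT INTO SAT_CUST_PIT (CSID, LOAD_DTS, NAME_LOAD_DTS, ADDRESS_LOAD_DTS)
        VALUES (:csid, :dts,
                (SELECT MAX(LOAD_DTS) FROM SAT_CUST_NAME
                  WHERE CSID = :csid AND LOAD_DTS <= :dts),
                (SELECT MAX(LOAD_DTS) FROM SAT_CUST_ADDR
                  WHERE CSID = :csid AND LOAD_DTS <= :dts))
        """,
        {"csid": csid, "dts": load_dts},
    )


# One PIT row for each date on which any Satellite received a new row.
for dts in ("2013-01-01", "2013-02-01", "2013-03-01"):
    add_pit_row(conn, 1, dts)

rows = conn.execute(
    "SELECT * FROM Dim2_Customer ORDER BY Customer_Key").fetchall()
for row in rows:
    print(row)
```

With this sample data the view returns three versions of customer C-100, one per PIT row, each carrying the name and address that were in effect at that load date.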
 

Benefits of a VSCD2

  1. We can now rapidly demonstrate the contents of a type 2 dim prior to ETL programming
  2. By using PIT tables, we don’t need the Load End DTS on the Sats, so the Sats become insert-only as well (simpler loads, no update pass required)
  3. Another by-product is that the Sat is now also Hadoop compliant (again, insert-only)
  4. Since the nullable Load End DTS is not needed, you can now more easily partition the Sat table by Hub Id and Load DTS.
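If a downstream consumer still expects an end date on each row, it could be derived at query time instead of being stored and updated. The sketch below is an assumption layered on top of benefit 2, not something described in this post: it treats the next load date for the same Hub key as the end of the current row, using a correlated subquery against an insert-only Satellite. It again uses SQLite through Python's `sqlite3` module, with illustrative sample data and ISO-formatted text dates; the `load_end_dts` alias is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Insert-only Satellite: no Load End DTS column, so rows are never updated.
CREATE TABLE SAT_CUST_NAME (CSID INTEGER, LOAD_DTS TEXT, name TEXT,
                            PRIMARY KEY (CSID, LOAD_DTS));
INSERT INTO SAT_CUST_NAME VALUES
    (1, '2013-01-01', 'Acme'),
    (1, '2013-03-01', 'Acme Corp'),
    (2, '2013-02-01', 'Globex');
""")

# The end date is computed on read: the next LOAD_DTS for the same CSID,
# or NULL when the row is still the current one.
with_end = conn.execute("""
    SELECT s.CSID, s.name, s.LOAD_DTS,
           (SELECT MIN(n.LOAD_DTS)
              FROM SAT_CUST_NAME n
             WHERE n.CSID = s.CSID
               AND n.LOAD_DTS > s.LOAD_DTS) AS load_end_dts
    FROM SAT_CUST_NAME s
    ORDER BY s.CSID, s.LOAD_DTS
""").fetchall()

for row in with_end:
    print(row)
```

The trade-off is extra work at query time in exchange for loads that never have to revisit an existing row.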

Objections

The main objection to this approach is that the virtual dimension will perform very poorly. While this may be true for very high volumes, or on poorly tuned or resourced databases, I maintain that with today’s evolving hardware appliances (e.g., Exadata, Exalogic) and the advent of in-memory databases, these concerns will soon be a thing of the past.

UPDATE 26-May-2018 – Now, 5 years later, I have successfully done the above on Oracle. But now we also have the Snowflake elastic cloud data warehouse, where all the prior constraints are indeed eliminated. With Snowflake you can now easily choose to instantly add compute power if the view is too slow, or do the work and processing to materialize the view. (end update)

Worst case, after you have validated the data with your users, you can always turn it into a materialized view or a physical table if you must.
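A rough sketch of that fallback, under stated assumptions: SQLite (used here through Python's `sqlite3` module so the example runs anywhere) has no materialized views, so the physical copy is made with `CREATE TABLE ... AS SELECT`. On Oracle you would more likely reach for a materialized view. The view is deliberately cut down to the PIT table alone, and the name `Dim2_Customer_Phys` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SAT_CUST_PIT (pit_seq INTEGER PRIMARY KEY,
                           CSID INTEGER, LOAD_DTS TEXT);
INSERT INTO SAT_CUST_PIT VALUES (1, 1, '2013-01-01'), (2, 1, '2013-02-01');

-- Simplified stand-in for the virtual dimension validated with the users.
CREATE VIEW Dim2_Customer AS
SELECT pit_seq AS Customer_Key, CSID, LOAD_DTS AS Load_DTS
FROM SAT_CUST_PIT;

-- Physical copy of the view's current contents.
CREATE TABLE Dim2_Customer_Phys AS SELECT * FROM Dim2_Customer;
""")

virtual = conn.execute(
    "SELECT * FROM Dim2_Customer ORDER BY Customer_Key").fetchall()
physical = conn.execute(
    "SELECT * FROM Dim2_Customer_Phys ORDER BY Customer_Key").fetchall()
print(virtual == physical)
```

Unlike the view, the physical table is a snapshot, so it needs a refresh step whenever new PIT rows arrive.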

So what do you think? Have you ever tried something like this? Let me know in the comments.

Get virtual, get agile!

Kent

The Data Warrior

P.S. I am giving a talk on Agile Data Warehouse Modeling at the East Coast Oracle Conference this week. If you are there, look me up and we can discuss this post in person!

Let’s Review #OOW13 and #OTW13 in Pictures

Yes, I have been derelict in my duty and not posted about the sessions I attended at Oracle OpenWorld (#OOW13) and OakTable World (#OTW13).

Well here are the high points with pictures!

Monday

Monday started off with the now annual Swim the Bay (so I missed the keynote). If you have Facebook, you can see pictures from the event here.

Most of the day I then spent at the alternate conference, OakTable World (#OTW13), seeing a few talks and giving one myself.

My good friend from Denver, Tim Gorman, gave a nice talk about all the data compression options available in Oracle.

Tim Gorman: Oracle Compression Options

Next was a great session from the well-known blogger and author Fabian Pascal. I have been reading his work for years but this was the first time I got to hear him speak in person. As with his writing, the talk was both intellectually stimulating and challenging!

Fabian Pascal: The Last Null

It really is quite a debate in the database world about the meaning and use of NULL in an RDBMS. Fabian has a proposal on how we can (and should) represent data in a way where there will never be NULL attributes.

After some scheduling issues, later in the day I did my presentation on using Data Vault Modeling for Agile Data Warehouse Modeling. The room I got had a huge wall for me to project my session on. Definitely the biggest screen ever for one of my talks.

Biggest screen ever for me and my data vault presentation.

Tuesday

Started the morning with a few friends doing morning Chi Gung in Union Square, then followed by getting a quick survey of the exhibit hall in Moscone South and a trip to the Demo grounds.

The throng descends into the depths of Moscone West to hunt the exhibit hall for goodies.

The hall was of course HUGE as usual so some of the vendors who were tucked in back got creative on getting the foot traffic to come their way.

A clever gimmick one vendor did to get traffic to their booth in the gigantic hall

For sessions, I attended a roadmap session on Oracle’s Big Data strategy given by my friend JP Dijcks.

JP talks all things Big Data

Mostly he painted a picture of the issues with figuring out how to collect and put all that data to real work. Of course Oracle has a ton of products to offer to help solve the problem.

How to shrink the gap between getting big data and actually using it!

Next up I attended Jeff Smith’s session on SQL Developer 4.0 and got to learn that there was a data mining extension available for the tool that makes doing some advanced analytics a lot easier.

Definition for Data Mining. An extension for Data Mining is available for SQL Developer.

Next on my agenda was the Cloud keynote with Microsoft. I wrote about that here.

Finally for the day, a late presentation by Maria Colgan and Jonathan Lewis giving us their top tuning tips in what they called the SQL Tuning Bootcamp.

Optimizer tips from a pro Jonathan Lewis. I am sure it means something to someone out there!

As always with these types of sessions, there was a ton of useful information that makes my brain hurt. I have to keep reviewing my notes to make sure I can use at least 10% of what they taught.

Wednesday

This was mostly a work day for me at a client site. And a late lunch to see the final race of the America’s Cup.

In case you have been under a rock since last week, Team USA won! It was great to actually be there on Pier 27 during the final race. Not a great vantage point overall but with the big screen to watch and then seeing the boats right after they finished, it was worth the walk.

After the race and a little more data model work at my client’s office, I walked back to the conference to see a final session (for me) given by Gwen Shapira about using solid state disks with Exadata.

I really did not know much about SSDs before this session but feel really educated now. I actually had no idea that SSD and FLASH drives or FLASH memory were the same thing. Guess I was behind on the hardware buzzwords.

Gwen and Mark on Solid State Disk AKA Flash

Then it was off to the annual blogger meetup then dinner on the town with friends at The Stinking Rose (thanks Tim!).

I decided to skip the appreciation event this year and take it easy, have a nice dinner, then pack up to head home. Thursday it was breakfast at Lori’s Diner then off to the airport and back home.

As a reminder, if you want to see what the buzz was at the events, just check out the hashtags #OOW13 and #OTW13 on Twitter (if you had a big data machine you might even be able to generate some insight from those feeds).

Well, that’s a wrap for this year’s big show.

Next up, I will be speaking at the upcoming ECO conference in North Carolina. Should be fun.

Later.

Kent

P.S. If you want to see my OTW presentation, you can find it on SlideShare.

P.P.S. For another great review of OOW13 check out this post by my friend from Turkey, Gurcan. See if you can find my unlabeled cameo in the post.
