The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the category “SQL Developer Data Modeler”

Better Data Modeling: The best FREE data modeling tool just got better!

Yes, it’s true, Virginia, there is a Santa Claus!

And this year Santa brought you a new, improved version of the best FREE data modeling tool in the known universe: Oracle SQL Developer Data Modeler 4.0.

The team at Oracle went all out this year and produced three (yes three) pre-release versions to make sure all the fixes and new features were rock solid before they called it production.

That is a lot of testing and work.

But worth the effort – they fixed piles of bugs and added dozens of new features.

Oracle product manager Jeff Smith (@thatjeffsmith) has already published several articles highlighting his favorite new features. Check out what he has to say here, then go download the new version and give it a try.

Let me know what your favorite new feature is.

Merry Christmas!

The Oracle Christmas Elf

(Kent)

P.S. Without proven methods and standards, even the best tool will not ensure that you build the best model. So why not increase your chances by giving yourself the gift of knowledge and picking up a copy of my data model checklist book (on sale for a few more hours)?

 

Agile Data Warehouse Modeling: How to Build a Virtual Type 2 Slowly Changing Dimension

One of the ongoing complaints about many data warehouse projects is that they take too long to deliver. This is one of the main reasons that many of us have tried to adopt methods and techniques (like SCRUM) from the agile software world to improve our ability to deliver data warehouse components more quickly.

So, what activity takes the bulk of development time in a data warehouse project?

Writing (and testing) the ETL code to move and transform the data can take up to 80% of the project resources and time.

So if we can eliminate, or at least curtail, some of the ETL work, we can deliver useful data to the end user faster.

One way to do that would be to virtualize the data marts.

For several years Dan Linstedt and I have discussed the idea of building virtual data marts on top of a Data Vault modeled EDW.

In the last few years I have floated the idea among the Oracle community. Fellow Oracle ACE Stewart Bryson and I even created a presentation this year (for #RMOUG and #KScope13) on how to do this using the Business Model (meta-layer) in OBIEE (It worked great!).

While doing this with a BI tool is one approach, I like to be able to prototype the solution first using Oracle views (that I build in SQL Developer Data Modeler of course).

The approach to modeling a Type 1 SCD this way is very straightforward.

How to do this easily for a Type 2 SCD has evaded me for years, until now.

Building a Virtual Type 2 SCD (VSCD2)

So how do you create a virtual Type 2 dimension (one that is “Kimball compliant”) on a Data Vault when you have multiple Satellites on one Hub?

(NOTE: The next part assumes you understand Data Vault data modeling. If you don’t, start by reading my free white paper, or better still, go buy the Data Vault book on LearnDataVault.com.)

Here is how:

Build an insert-only PIT (Point-in-Time) table that keeps history. This is sometimes referred to as a historicized PIT table. (See the Super Charge book for an explanation of the types of PIT tables.)

Add a surrogate Primary Key (PK) to the table. The PK of the PIT table will then serve as the PK for the virtual dimension. This meets the standard for classical star schema design to have a surrogate key on Type 2 SCDs.

To build the VSCD2 you now simply create a view that uses the PIT table to join the Hub and all the Satellites together. Here is an example:

-- Virtual Type 2 dimension: one row per PIT row, keyed by the PIT surrogate key
CREATE VIEW Dim2_Customer (Customer_Key, Customer_Number, Customer_Name, Customer_Address, Load_DTS)
AS
SELECT sat_pit.pit_seq, hub.customer_num, sat_1.name, sat_2.address, sat_pit.load_dts
FROM HUB_CUST hub
JOIN SAT_CUST_PIT sat_pit
  ON sat_pit.CSID = hub.CSID
-- Each PIT row records which load of each Satellite applies at that point in time
JOIN SAT_CUST_NAME sat_1
  ON sat_1.CSID = hub.CSID
 AND sat_1.LOAD_DTS = sat_pit.NAME_LOAD_DTS
JOIN SAT_CUST_ADDR sat_2
  ON sat_2.CSID = hub.CSID
 AND sat_2.LOAD_DTS = sat_pit.ADDRESS_LOAD_DTS;
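To make the mechanics of the view easier to follow, here is a minimal in-memory sketch in JavaScript. It is not how you would load a real PIT table; it only illustrates the idea that each PIT row points at one load of each Satellite, and that the view simply follows those pointers. The sample data and the helper names (latestAsOf, buildPit, dimCustomer) are made up for illustration, and the sketch assumes load timestamps are ISO date strings so that they sort chronologically.

```javascript
// Hypothetical sample data: one customer, two Satellites changing at different times.
const hub = [{ csid: 1, customerNum: "C-100" }];
const satName = [
  { csid: 1, loadDts: "2013-01-01", name: "Acme" },
  { csid: 1, loadDts: "2013-03-01", name: "Acme Corp" },
];
const satAddr = [
  { csid: 1, loadDts: "2013-01-01", address: "1 Main St" },
  { csid: 1, loadDts: "2013-02-01", address: "9 Oak Ave" },
];

// Latest Satellite row for a hub key at or before the given timestamp.
function latestAsOf(sat, csid, dts) {
  const rows = sat
    .filter((r) => r.csid === csid && r.loadDts <= dts)
    .sort((a, b) => a.loadDts.localeCompare(b.loadDts));
  return rows.length ? rows[rows.length - 1] : null;
}

// Build PIT rows: one per distinct load timestamp per hub key,
// each with a surrogate key (pitSeq) and a pointer to each Satellite load.
function buildPit(hubRows, nameRows, addrRows) {
  const pitRows = [];
  let seq = 0;
  for (const h of hubRows) {
    const stamps = [
      ...new Set(
        [...nameRows, ...addrRows]
          .filter((r) => r.csid === h.csid)
          .map((r) => r.loadDts)
      ),
    ].sort();
    for (const dts of stamps) {
      const n = latestAsOf(nameRows, h.csid, dts);
      const a = latestAsOf(addrRows, h.csid, dts);
      if (!n || !a) continue; // the view uses inner joins, so both must exist
      pitRows.push({
        pitSeq: ++seq,
        csid: h.csid,
        loadDts: dts,
        nameLoadDts: n.loadDts,
        addressLoadDts: a.loadDts,
      });
    }
  }
  return pitRows;
}

// Same joins as the Dim2_Customer view, expressed as lookups.
function dimCustomer(hubRows, pitRows, nameRows, addrRows) {
  return pitRows.map((p) => {
    const h = hubRows.find((x) => x.csid === p.csid);
    const n = nameRows.find((x) => x.csid === p.csid && x.loadDts === p.nameLoadDts);
    const a = addrRows.find((x) => x.csid === p.csid && x.loadDts === p.addressLoadDts);
    return {
      customerKey: p.pitSeq,
      customerNumber: h.customerNum,
      customerName: n.name,
      customerAddress: a.address,
      loadDts: p.loadDts,
    };
  });
}

const pit = buildPit(hub, satName, satAddr);
const dim = dimCustomer(hub, pit, satName, satAddr);
console.log(dim);
```

With this sample data the customer gets three dimension rows, one for each point in time at which either the name or the address changed, which is the Type 2 behavior we are after.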
 

Benefits of a VSCD2

  1. We can now rapidly demonstrate the contents of a Type 2 dim prior to ETL programming.
  2. By using PIT tables we don’t need the Load End DTS on the Sats, so the Sats become insert only as well (simpler loads, no update pass required).
  3. Another by-product is that the Sat is now also Hadoop compliant (again, insert only).
  4. Since the nullable Load End DTS is not needed, you can now more easily partition the Sat table by Hub Id and Load DTS.
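If someone still asks for an end date on an insert-only Sat, it can be derived at read time from the next row's Load DTS rather than stored and updated. The following is only a small JavaScript sketch of that idea under my own assumptions: the function name withEndDates and the sample rows are hypothetical, and timestamps are assumed to be ISO date strings.

```javascript
// Derive a load end date for each row from the next row of the same hub key.
// The current (latest) row for a key gets null, meaning "still open".
function withEndDates(rows) {
  const sorted = [...rows].sort(
    (a, b) => a.csid - b.csid || a.loadDts.localeCompare(b.loadDts)
  );
  return sorted.map((r, i) => {
    const next = sorted[i + 1];
    const loadEndDts = next && next.csid === r.csid ? next.loadDts : null;
    return { ...r, loadEndDts };
  });
}

// Hypothetical insert-only Satellite rows (no stored end date).
const satRows = [
  { csid: 1, loadDts: "2013-01-01", name: "Acme" },
  { csid: 2, loadDts: "2013-01-15", name: "Globex" },
  { csid: 1, loadDts: "2013-03-01", name: "Acme Corp" },
];

const satWithEnds = withEndDates(satRows);
console.log(satWithEnds);
```

The stored rows are never touched; the end date exists only in the result, so the load process stays a pure insert.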

Objections

The main objection to this approach is that the virtual dimension will perform very poorly. While this may be true for very high volumes, or on poorly tuned or resourced databases, I maintain that with today’s evolving hardware appliances (e.g., Exadata, Exalogic) and the advent of in-memory databases, these concerns will soon be a thing of the past.

UPDATE 26-May-2018 – Now, 5 years later, I have successfully done the above on Oracle. But now we also have the Snowflake elastic cloud data warehouse, where all the prior constraints are indeed eliminated. With Snowflake you can now easily choose to instantly add compute power if the view is too slow, or do the work and processing to materialize the view. (end update)

Worst case, after you have validated the data with your users, you can always turn it into a materialized view or a physical table if you must.

So what do you think? Have you ever tried something like this? Let me know in the comments.

Get virtual, get agile!

Kent

The Data Warrior

P.S. I am giving a talk on Agile Data Warehouse Modeling at the East Coast Oracle Conference this week. If you are there, look me up and we can discuss this post in person!

Better Data Modeling: New and Improved Oracle SQL Developer Data Modeler (#SQLDevModeler)

Yup, my friends at Oracle have been hard at work enhancing what was already the best FREE data modeling tool out there.

They just released SDDM R4 EA3! You can go get it right now: http://www.oracle.com/technetwork/developer-tools/datamodeler/downloads/datamodeler-4ea-downloads-1988443.html

As always there are both new features and bug fixes.

One of the coolest new features is the ability to show entity (or table) comments right on the diagram in the object. This will be very useful for enabling data model reviews with the business users.

Product manager Ashley tweeted an example the other day:

 

For even more details and ideas on how to use this feature, check out Jeff Smith’s post on the feature here.

So what are you waiting for? Go get it today!

Data Modeling is Fun!

Later

Kent
The Oracle Data Warrior

Better Data Modeling: Finding Missing Unique Keys in Oracle #SQLDevModeler

One of the best practices I recommend is to always define unique business keys for every entity (or table) in a model.

It is the only way to really understand what the data in that object represents.

So what do you do when you inherit someone else’s model with hundreds of tables and few (if any) unique keys to be found?

After you reverse engineer it into SDDM (SQL Developer Data Modeler), you could go through the model table by table and look at the properties.

Or, you could scan all the diagrams for the little U’s indicating a column is part of a unique key constraint (assuming there are any diagrams to look at).

Or you could create a Custom Design Rule that checks for you.

So how do you write a design rule that will list all tables with no UKs on them?

Open your design, then go to Tools -> Design Rules -> Custom Rules.

  1. Hit the green plus sign to add a new rule.
  2. Give it a name (like Missing UKs).
  3. Select Table for the object type.
  4. Select Mozilla Rhino for the engine.
  5. Select Warning for the type.
  6. Select table as the variable.
  7. Paste in this code:
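// Flag any table that has no unique key constraints defined
function checkUKs(table) {
  ruleMessage = "";
  if (table.getUKeys().size() == 0) {
    // Set the message and error type for the failing table
    ruleMessage = "no UKs";
    errType = "Problem:";
    return false;
  } else {
    return true;
  }
}
checkUKs(table);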

Hit Save, then Apply.

The result will be a list of all the tables in your design that do not have any Unique Key Constraints defined.
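If you want to convince yourself of the rule's logic before pasting it into SDDM, you can exercise it against stand-in objects in any JavaScript runtime. The sketch below is only that: mockTable and its getName method are hypothetical helpers I made up, not the SDDM API. The only call taken from the real rule is table.getUKeys().size(), and ruleMessage and errType are declared here as ordinary variables because there is no rule engine to supply them.

```javascript
// Stand-ins for the variables the rule sets inside SDDM (assumption for this sketch).
let ruleMessage = "";
let errType = "";

// Same logic as the custom design rule.
function checkUKs(table) {
  ruleMessage = "";
  if (table.getUKeys().size() == 0) {
    ruleMessage = "no UKs";
    errType = "Problem:";
    return false;
  } else {
    return true;
  }
}

// Hypothetical mock: only mimics the getUKeys().size() call the rule relies on.
function mockTable(name, ukCount) {
  return {
    getName: () => name,
    getUKeys: () => ({ size: () => ukCount }),
  };
}

const tables = [
  mockTable("CUSTOMER", 1),
  mockTable("ORDER_LINE", 0),
  mockTable("AUDIT_LOG", 0),
];

// Collect the names of the tables the rule would flag.
const missingUKs = tables.filter((t) => !checkUKs(t)).map((t) => t.getName());
console.log(missingUKs);
```

Inside SDDM the design rule engine does the looping for you and calls the script once per table, so the filter at the end exists only to imitate the report.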

Now the real work begins – fixing those tables! As you work your way through the model adding the new business keys, you can keep using this report to see which ones you have left, and make sure you don’t miss any.

Get to it my friends!

Kent

The Oracle Data Warrior

P.S. Special thanks to DimitarSlavov of Oracle for posting the code to answer my question. If you want to see the whole thread, go here.

Let’s Review #OOW13 and #OTW13 in Pictures

Yes, I have been derelict in my duty and have not posted about the sessions I attended at Oracle OpenWorld (#OOW13) and OakTable World (#OTW13).

Well here are the high points with pictures!

Monday

Monday started off with the now annual Swim the Bay (so I missed the keynote). If you have Facebook, you can see pictures from the event here.

I then spent most of the day at the alternate conference, OakTable World (#OTW13), seeing a few talks and giving one myself.

My good friend from Denver, Tim Gorman gave a nice talk about all the data compression options available in Oracle.

Tim Gorman: Oracle Compression Options

Next was a great session from the well known blogger and author Fabian Pascal. I have been reading his work for years but this was the first time I got to hear him speak in person. As with his writing, the talk was both intellectually stimulating and challenging!

Fabian Pascal: The Last Null

It really is quite a debate in the database world about the meaning and use of NULL in an RDBMS. Fabian has a proposal on how we can (and should) represent data in a way where there will never be NULL attributes.

After some scheduling issues, later in the day I did my presentation on using Data Vault Modeling for Agile Data Warehouse Modeling. The room I got had a huge wall for me to project my session on. Definitely the biggest screen ever for one of my talks.

Biggest screen ever for me and my data vault presentation.

Tuesday

Started the morning with a few friends doing Chi Gung in Union Square, followed by a quick survey of the exhibit hall in Moscone South and a trip to the Demo grounds.

The throng descends into the depths of Moscone West to hunt the exhibit hall for goodies.

The hall was of course HUGE as usual so some of the vendors who were tucked in back got creative on getting the foot traffic to come their way.

A clever gimmick one vendor did to get traffic to their booth in the gigantic hall

For sessions, I attended a road map session on Oracle’s Big Data strategy given by my friend JP Dijcks.

JP talks all things Big Data

Mostly he painted a picture of the issues with figuring out how to collect and put all that data to real work. Of course Oracle has a ton of products to offer to help solve the problem.

How to shrink the gap between getting big data and actually using it!

Next up I attended Jeff Smith’s session on SQL Developer 4.0 and got to learn that there was a data mining extension available for the tool that makes doing some advanced analytics a lot easier.

Definition for Data Mining. An extension for Data Mining is available for SQL Developer.

Next on my agenda was the Cloud keynote with Microsoft. I wrote about that here.

Finally for the day, a late presentation by Maria Colgan and Jonathan Lewis giving us their top tuning tips in what they called the SQL Tuning Bootcamp.

Optimizer tips from a pro Jonathan Lewis. I am sure it means something to someone out there!

As always with these types of sessions, there was a ton of useful information that makes my brain hurt. I have to keep reviewing my notes to make sure I can use at least 10% of what they taught.

Wednesday

This was mostly a work day for me at a client site. And a late lunch to see the final race of the America’s Cup.

In case you have been under a rock since last week, Team USA won! It was great to actually be there on Pier 27 during the final race. Not a great vantage point overall but with the big screen to watch and then seeing the boats right after they finished, it was worth the walk.

After the race and a little more data model work at my client’s office, I walked back to the conference to see a final session (for me) given by Gwen Shapira about using solid state disks with Exadata.

I really did not know much about SSDs before this session but feel really educated now. I actually had no idea that SSD and FLASH drives or FLASH memory were the same thing. Guess I was behind on the hardware buzzwords.

Gwen and Mark on Solid State Disk AKA Flash

Then it was off to the annual blogger meetup then dinner on the town with friends at The Stinking Rose (thanks Tim!).

I decided to skip the appreciation event this year and take it easy, have a nice dinner, then pack up to head home. Thursday it was breakfast at Lori’s Diner then off to the airport and back home.

As a reminder, if you want to see what the buzz was at the events, just check out the hashtags #OOW13 and #OTW13 on Twitter (if you had a big data machine you might even be able to generate some insight from those feeds).

Well, that’s a wrap for this year’s big show.

Next up, I will be speaking at the upcoming ECO conference in North Carolina. Should be fun.

Later.

Kent

P.S. If you want to see my OTW presentation, you can find it on Slideshare.

P.P.S. For another great review of OOW13 check out this post by my friend from Turkey, Gurcan. See if you can find my unlabeled cameo in the post.
