The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “Data Modeling”

Better Data Modeling: The best FREE data modeling tool just got better!

Yes, it true Virginia, the is a Santa Claus!

And this year Santa brought you a new, improved version of the best FREE data modeling tool in the known universe: Oracle SQL Developer Data Modeler 4.0.

The team at Oracle went all out this year and produced three (yes three) pre-release versions to make sure all the fixes and new features were rock solid before they called it production.

That is a lot of testing and work.

But worth the effort – they fixed piles of bugs and added dozens of new features.

Oracle product manager Jeff Smith (@thatjeffsmith) has already published several articles highlighting his favorite new features. Check out what he has to say here then go download the new version and give it try.

Let me know what you your favorite new feature is.

Merry Christmas!

The Oracle Christmas Elf

(Kent)

P.S. Without proven methods and standards, even the best tool will not insure you build the best model, so why not increase your chances by giving yourself the gift of knowledge by picking up a copy of my data model checklist book (on sale for a few more hours).

 

Better Data Models: Early Black Friday Data Model Book Sale!

I am truly thankful for the career I have in Oracle, data modeling and data warehouse design.

And I am thankful for you, my loyal readers and followers!

So, in honor of American Thanksgiving and our crazy Black Friday and Cyber Monday insane shopping addiction, I am putting my Kindle ebook on SALE!

Get my book A Check List for Doing Data Model Design Reviews for 34% off starting 8 AM PST Thanksgiving Day (November 29th).

Just go here on Amazon.com.

This is a limited time sale which ends next week on December 5th.

Get a copy for your favorite data modeler!

Happy Thanksgiving and Happy Shopping!

Kent

P.S. For all my overseas friends attending the #UKOUG_Tech13 event next week, you can go here for a similar sale from Sunday December 1st though December 7th.

Agile Data Warehouse Modeling: How to Build a Virtual Type 2 Slowly Changing Dimension

One of the ongoing complaints about many data warehouse projects is that they take too long to delivery. This is one of the main reasons that many of us have tried to adopt methods and techniques (like SCRUM) from the agile software world to improve our ability to deliver data warehouse components more quickly.

So, what activity takes the bulk of development time in a data warehouse project?

Writing (and testing) the ETL code to move and transform the data can take up to 80% of the project resources and time.

So if we can eliminate, or at least curtail, some of the ETL work, we can deliver useful data to the end user faster.

One way to do that would be to virtualize the data marts.

For several years Dan Linstedt and I have discussed the idea of building virtual data marts on top of a Data Vault modeled EDW.

In the last few years I have floated the idea among the Oracle community. Fellow Oracle ACE Stewart Bryson and I even created a presentation this year (for #RMOUG and #KScope13) on how to do this using the Business Model (meta-layer) in OBIEE (It worked great!).

While doing this with a BI tool is one approach, I like to be able to prototype the solution first using Oracle views (that I build in SQL Developer Data Modeler of course).

The approach to modeling a Type 1 SCD this way is very straight forward.

How to do this easily for a Type 2 SCD has evaded me for years, until now.

Building a Virtual Type 2 SCD (VSCD2)

So how to create a virtual type 2 dimension (that is “Kimball compliant” ) on a Data Vault when you have multiple Satellites on one Hub?

(NOTE: the next part assumes you understand Data Vault Data Modeling. if you don’t, start by reading my free white paper, but better still go buy the Data Vault book on LearnDataVault.com)

Here is how:

Build an insert only PIT (Point-in-Time) table that keeps history. This is sometimes referred to as a historicized PIT tables.  (see the Super Charge book for an explanation of the types of PIT tables)

Add a surrogate Primary Key (PK) to the table. The PK of the PIT table will then serve as the PK for the virtual dimension. This meets the standard for classical star schema design to have a surrogate key on Type 2 SCDs.

To build the VSCD2 you now simply create a view that uses the PIT table to join the Hub and all the Satellites together. Here is an example:

Create view Dim2_Customer (Customer_key, Customer_Number, Customer_Name, Customer_Address, Load_DTS)
as
Select sat_pit.pit_seq, hub.customer_num, sat_1.name, sat_2.address, sat_pit.load_dts
from HUB_CUST hub,        
          SAT_CUST_PIT sat_pit,        
          SAT_CUST_NAME sat_1,        
          SAT_CUST_ADDR sat_2
where  hub.CSID = sat_pit.CSID           
    and hub.CSID = sat_1.CSID           
    and hub.CSID = sat_2.CSID           
    and sat_pit.NAME_LOAD_DTS = sat_1.LOAD_DTS           
    and sat_pit.ADDRESS_LOAD_DTS = sat_2.LOAD_DTS 
 

Benefits of a VSCD2

  1. We can now rapidly demonstrate the contents of a type 2 dim prior to ETL programming
  2. With using PIT tables we don’t need the Load End DTS on the Sats so the Sats become insert only as well (simpler loads, no update pass required)
  3. Another by product is the Sat is now also Hadoop compliant (again insert only)
  4. Since the nullable Load End DTS is not needed, you can now more easily partition the Sat table by Hub Id and Load DTS.

Objections

The main objection to this approach is that the virtual dimension will perform very poorly. While this may be true for very high volumes, or on poorly tuned or resourced databases, I maintain that with today’s evolving hardware appliances  (e.g., Exadata, Exalogic) and the advent of in memory databases, these concerns will soon be a thing of the past.

UPDATE 26-May-2018  – Now 5 years later I have successfully done the above on Oracle. But now we also have Snowflake elastic cloud data warehouse where all the prior constraints are indeed eliminated. With Snowflake you can now easily chose to instantly add compute power if the view is too slow or do the work and processing to materialize the view. (end update)

Worst case, after you have validated the data with your users, you can always turn it into a materialized view or a physical table if you must.

So what do you think? Have you ever tried something like this? Let me know in the comments.

Get virtual, get agile!

Kent

The Data Warrior

P.S. I am giving a talk on Agile Data Warehouse Modeling at the East Coast Oracle Conference this week. If you are there, look me up and we can discuss this post in person!

Better Data Modeling: New and Improved Oracle SQL Developer Data Modeler (#SQLDevModeler)

Yup, my friends at Oracle have been hard at working enhancing what was already the best FREE data modeling tool out there.

They just released SDDM R4 EA3! You can go get it right now: http://www.oracle.com/technetwork/developer-tools/datamodeler/downloads/datamodeler-4ea-downloads-1988443.html

As always there are both new features and bug fixes.

One of the coolest new features is the ability to show entity (or table) comments right on the diagram in the object. This will be very useful for enabling data model reviews with the business users.

Product manager Ashley tweeted and example the other day:

 

For even more details and ideas how to use this feature check out Jeff Smith’s post on the feature here.

So what are you waiting for? Go get it today!

Data Modeling is Fun!

Later

Kent
The Oracle Data Warrior

Better Data Modeling: Finding Missing Unique Keys in Oracle #SQLDevModeler

One of the best practices I recommend is to always define unique business keys for every entity (or table) in a model.

It is the only way to really understand what the data in that object represents.

So what do you do when you inherit someone else’s model with hundreds of tables and few (if any) unique keys to be found?

After you reverse engineer it into SDDM (SQL Developer Data Modeler), you could go through the model table by table and look at the properties.

Or, you could look at all the diagrams to look for the the little U’s indicating a column is part of a unique key constraint (assuming there are any diagrams to look at).

Or you could create a Custom Design Rule that checks for you.

So how do you write a design rule that will list all tables with no UKs on them?

Open your design, the go to Tools -> Design Rules -> Custom Rules.

  1. Hit the green Plus sign to add a new rule.
  2. Give it a name (like Missing UKs),
  3. Select Table for the object type,
  4. Mozilla Rhino for the Engine,
  5. Warning for the type, and
  6. Select table as the variable
  7. Past in this code: 
function checkUKs(table){
ruleMessage=””;
if(table.getUKeys().size() == 0){
  ruleMessage=”no UKs”;
  errType=”Problem:”;
  return false;
} else {
  return true;
}
}
checkUKs(table);

Hit Save, then Apply.

The result will be a list of all the tables in your design that do not have any Unique Key Constraints defined.

Now the real work begins – fixing those tables! As you work your way through the model adding the new business keys, you can keep using this report to see which ones you have left, and make sure you don’t miss any.

Get to it my friends!

Kent

The Oracle Data Warrior

P.S. Special thanks to DimitarSlavov  of Oracle for posting the code to answer my question. If you want to see the whole thread go here.

Post Navigation