The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “Data Modeling”

Better Data Models: Early Black Friday Data Model Book Sale!

I am truly thankful for the career I have in Oracle, data modeling and data warehouse design.

And I am thankful for you, my loyal readers and followers!

So, in honor of American Thanksgiving and our insane Black Friday and Cyber Monday shopping addiction, I am putting my Kindle ebook on SALE!

Get my book A Check List for Doing Data Model Design Reviews for 34% off starting 8 AM PST Thanksgiving Day (November 29th).

Just go here on Amazon.com.

This is a limited time sale which ends next week on December 5th.

Get a copy for your favorite data modeler!

Happy Thanksgiving and Happy Shopping!

Kent

P.S. For all my overseas friends attending the #UKOUG_Tech13 event next week, you can go here for a similar sale from Sunday, December 1st through December 7th.

Agile Data Warehouse Modeling: How to Build a Virtual Type 2 Slowly Changing Dimension

One of the ongoing complaints about many data warehouse projects is that they take too long to deliver. This is one of the main reasons that many of us have tried to adopt methods and techniques (like SCRUM) from the agile software world to improve our ability to deliver data warehouse components more quickly.

So, what activity takes the bulk of development time in a data warehouse project?

Writing (and testing) the ETL code to move and transform the data can take up to 80% of the project resources and time.

So if we can eliminate, or at least curtail, some of the ETL work, we can deliver useful data to the end user faster.

One way to do that would be to virtualize the data marts.

For several years Dan Linstedt and I have discussed the idea of building virtual data marts on top of a Data Vault modeled EDW.

Over the last few years I have floated the idea within the Oracle community. Fellow Oracle ACE Stewart Bryson and I even created a presentation this year (for #RMOUG and #KScope13) on how to do this using the Business Model (meta-layer) in OBIEE (it worked great!).

While doing this with a BI tool is one approach, I like to be able to prototype the solution first using Oracle views (that I build in SQL Developer Data Modeler of course).

The approach to modeling a Type 1 SCD this way is very straightforward.
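For comparison, one common way to virtualize a Type 1 dimension is to join each Hub key to only the most recent row of each Satellite. The sketch below shows that logic in JavaScript over in-memory arrays. It is an illustration under my own assumptions, not the view I actually build: the row shapes (loosely named after the HUB_CUST, SAT_CUST_NAME and SAT_CUST_ADDR tables used in the Type 2 view in this post), the function names, and the use of ISO-8601 strings for load dates are all hypothetical.

```javascript
// Hypothetical in-memory rows; loadDts values are ISO-8601 strings,
// so plain string comparison orders them chronologically.

// For each Hub key (csid), keep the Satellite row with the latest loadDts.
function latestByKey(satRows) {
  const latest = new Map();
  for (const row of satRows) {
    const current = latest.get(row.csid);
    if (current === undefined || row.loadDts > current.loadDts) {
      latest.set(row.csid, row);
    }
  }
  return latest;
}

// Type 1 virtual dimension: one row per Hub key, current attribute values only.
// Keys with no Satellite row yet get null (outer-join behaviour).
function buildType1Dim(hub, satName, satAddr) {
  const names = latestByKey(satName);
  const addrs = latestByKey(satAddr);
  return hub.map((h) => ({
    customerNumber: h.customerNum,
    customerName: names.has(h.csid) ? names.get(h.csid).name : null,
    customerAddress: addrs.has(h.csid) ? addrs.get(h.csid).address : null,
  }));
}

// Usage: the name changed once, so only the newer value appears.
const dimType1 = buildType1Dim(
  [{ csid: 1, customerNum: "C001" }],
  [
    { csid: 1, loadDts: "2013-01-01", name: "Acme" },
    { csid: 1, loadDts: "2013-06-01", name: "Acme Corp" },
  ],
  [{ csid: 1, loadDts: "2013-01-01", address: "1 Main St" }]
);
console.log(dimType1[0].customerName); // Acme Corp
```

History is simply discarded here, which is exactly why Type 2 is the harder case.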

How to do this easily for a Type 2 SCD has evaded me for years, until now.

Building a Virtual Type 2 SCD (VSCD2)

So how do you create a virtual Type 2 dimension (one that is “Kimball compliant”) on a Data Vault when you have multiple Satellites on one Hub?

(NOTE: The next part assumes you understand Data Vault data modeling. If you don’t, start by reading my free white paper, or better still, go buy the Data Vault book on LearnDataVault.com.)

Here is how:

Build an insert-only PIT (Point-in-Time) table that keeps history. This is sometimes referred to as a historicized PIT table (see the Super Charge book for an explanation of the types of PIT tables).

Add a surrogate Primary Key (PK) to the table. The PK of the PIT table will then serve as the PK for the virtual dimension. This meets the classic star schema standard of having a surrogate key on Type 2 SCDs.
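The two steps above (an insert-only PIT table plus a surrogate key) can be sketched in memory as follows. This is a JavaScript illustration under my own assumptions, not production load code: the row shapes, the function names, and the field names pitSeq, nameLoadDts and addressLoadDts (chosen to mirror the PIT_SEQ, NAME_LOAD_DTS and ADDRESS_LOAD_DTS columns referenced in this post's view) are hypothetical, and load dates are assumed to be ISO-8601 strings.

```javascript
// Hypothetical in-memory model of an insert-only (historicized) PIT table.
// loadDts values are ISO-8601 strings, so string comparison is chronological.

// Latest Satellite load date for this key at or before time t (null if none yet).
function latestAtOrBefore(satRows, csid, t) {
  let best = null;
  for (const r of satRows) {
    if (r.csid === csid && r.loadDts <= t && (best === null || r.loadDts > best)) {
      best = r.loadDts;
    }
  }
  return best;
}

const byDts = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Appends one PIT row per new Satellite change point; existing rows are never
// updated. Each new row gets the next surrogate key (pitSeq), which later
// serves as the dimension key. Returns the rows that were added.
function appendPitRows(pit, hub, satName, satAddr) {
  let nextSeq = pit.reduce((max, r) => Math.max(max, r.pitSeq), 0) + 1;
  const added = [];
  for (const h of hub) {
    // Latest snapshot already stored for this key, if any.
    const stored = pit.filter((r) => r.csid === h.csid).map((r) => r.loadDts).sort(byDts);
    const lastSnapshot = stored.length > 0 ? stored[stored.length - 1] : null;
    // Every distinct Satellite load date for this key is a change point.
    const changePoints = [...new Set(
      satName.concat(satAddr).filter((r) => r.csid === h.csid).map((r) => r.loadDts)
    )].sort(byDts);
    for (const t of changePoints) {
      if (lastSnapshot !== null && t <= lastSnapshot) continue; // already captured
      const row = {
        pitSeq: nextSeq++,
        csid: h.csid,
        loadDts: t,
        nameLoadDts: latestAtOrBefore(satName, h.csid, t),
        addressLoadDts: latestAtOrBefore(satAddr, h.csid, t),
      };
      pit.push(row);
      added.push(row);
    }
  }
  return added;
}

// Usage: the name changed once and the address changed once -> three snapshots.
const pit = [];
const hub = [{ csid: 1, customerNum: "C001" }];
const satName = [
  { csid: 1, loadDts: "2013-01-01", name: "Acme" },
  { csid: 1, loadDts: "2013-06-01", name: "Acme Corp" },
];
const satAddr = [
  { csid: 1, loadDts: "2013-01-01", address: "1 Main St" },
  { csid: 1, loadDts: "2013-03-01", address: "9 Elm St" },
];
appendPitRows(pit, hub, satName, satAddr);
console.log(pit.length); // 3
console.log(appendPitRows(pit, hub, satName, satAddr).length); // 0 (nothing new to insert)
```

Each PIT row pins which Name and Address Satellite rows were current at one moment, so the view can join on exact load dates instead of date ranges, and rerunning the load only ever appends.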

To build the VSCD2 you now simply create a view that uses the PIT table to join the Hub and all the Satellites together. Here is an example:

CREATE VIEW Dim2_Customer (Customer_Key, Customer_Number, Customer_Name, Customer_Address, Load_DTS)
AS
-- One dimension row per PIT row; the PIT surrogate key is the dimension key
SELECT sat_pit.pit_seq,
       hub.customer_num,
       sat_1.name,
       sat_2.address,
       sat_pit.load_dts
  FROM HUB_CUST hub
  JOIN SAT_CUST_PIT sat_pit
    ON sat_pit.CSID = hub.CSID
  -- Name Satellite row that was current as of this PIT snapshot
  JOIN SAT_CUST_NAME sat_1
    ON sat_1.CSID = hub.CSID
   AND sat_1.LOAD_DTS = sat_pit.NAME_LOAD_DTS
  -- Address Satellite row that was current as of this PIT snapshot
  JOIN SAT_CUST_ADDR sat_2
    ON sat_2.CSID = hub.CSID
   AND sat_2.LOAD_DTS = sat_pit.ADDRESS_LOAD_DTS;
 

Benefits of a VSCD2

  1. We can now rapidly demonstrate the contents of a Type 2 dimension before writing any ETL code.
  2. Because we use PIT tables, we don’t need a Load End DTS on the Sats, so the Sats become insert-only as well (simpler loads, no update pass required).
  3. Another by-product is that the Sats are now also Hadoop-compliant (again, insert-only).
  4. Since the nullable Load End DTS is not needed, you can now more easily partition the Sat table by Hub Id and Load DTS.
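Benefit 2 works because an end date is implied rather than lost: a Satellite version stays current until the next row for the same key arrives. If a consumer ever does want an explicit end date, it can be derived at query time (in Oracle SQL this is usually done with the LEAD analytic function). The JavaScript sketch below shows the same derivation over in-memory rows; the row shape, the function name and the loadEndDts field are hypothetical, and load dates are assumed to be ISO-8601 strings.

```javascript
// Hypothetical insert-only Satellite rows with no stored end date.
// loadDts values are ISO-8601 strings, so string comparison is chronological.
const byDts = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Derives loadEndDts per key: each version ends when the next version for the
// same key was loaded; the current (latest) version gets null.
function withDerivedEndDts(satRows) {
  const byKey = new Map();
  for (const r of satRows) {
    if (!byKey.has(r.csid)) byKey.set(r.csid, []);
    byKey.get(r.csid).push(r);
  }
  const result = [];
  for (const versions of byKey.values()) {
    versions.sort((a, b) => byDts(a.loadDts, b.loadDts));
    versions.forEach((r, i) => {
      const next = versions[i + 1];
      // Copy the row rather than updating it: the stored data stays insert-only.
      result.push({ ...r, loadEndDts: next ? next.loadDts : null });
    });
  }
  return result;
}

const history = withDerivedEndDts([
  { csid: 1, loadDts: "2013-06-01", name: "Acme Corp" },
  { csid: 1, loadDts: "2013-01-01", name: "Acme" },
]);
console.log(history[0].loadEndDts); // 2013-06-01
console.log(history[1].loadEndDts); // null
```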

Objections

The main objection to this approach is that the virtual dimension will perform very poorly. While this may be true for very high volumes, or on poorly tuned or under-resourced databases, I maintain that with today’s evolving hardware appliances (e.g., Exadata, Exalogic) and the advent of in-memory databases, these concerns will soon be a thing of the past.

UPDATE 26-May-2018: Now, five years later, I have successfully done the above on Oracle. But now we also have the Snowflake elastic cloud data warehouse, where all the prior constraints are indeed eliminated. With Snowflake you can easily choose to instantly add compute power if the view is too slow, or do the work and processing to materialize the view. (end update)

Worst case, after you have validated the data with your users, you can always turn it into a materialized view or a physical table if you must.

So what do you think? Have you ever tried something like this? Let me know in the comments.

Get virtual, get agile!

Kent

The Data Warrior

P.S. I am giving a talk on Agile Data Warehouse Modeling at the East Coast Oracle Conference this week. If you are there, look me up and we can discuss this post in person!

Better Data Modeling: New and Improved Oracle SQL Developer Data Modeler (#SQLDevModeler)

Yup, my friends at Oracle have been hard at work enhancing what was already the best FREE data modeling tool out there.

They just released SDDM R4 EA3! You can go get it right now: http://www.oracle.com/technetwork/developer-tools/datamodeler/downloads/datamodeler-4ea-downloads-1988443.html

As always there are both new features and bug fixes.

One of the coolest new features is the ability to show entity (or table) comments right on the diagram in the object. This will be very useful for enabling data model reviews with the business users.

Product manager Ashley tweeted an example the other day:

 

For even more details and ideas how to use this feature check out Jeff Smith’s post on the feature here.

So what are you waiting for? Go get it today!

Data Modeling is Fun!

Later

Kent
The Oracle Data Warrior

Better Data Modeling: Finding Missing Unique Keys in Oracle #SQLDevModeler

One of the best practices I recommend is to always define unique business keys for every entity (or table) in a model.

It is the only way to really understand what the data in that object represents.

So what do you do when you inherit someone else’s model with hundreds of tables and few (if any) unique keys to be found?

After you reverse engineer it into SDDM (SQL Developer Data Modeler), you could go through the model table by table and look at the properties.

Or, you could scan all the diagrams for the little U’s indicating a column is part of a unique key constraint (assuming there are any diagrams to look at).

Or you could create a Custom Design Rule that checks for you.

So how do you write a design rule that will list all tables with no UKs on them?

Open your design, then go to Tools -> Design Rules -> Custom Rules.

  1. Hit the green Plus sign to add a new rule.
  2. Give it a name (like Missing UKs),
  3. Select Table for the object type,
  4. Mozilla Rhino for the Engine,
  5. Warning for the type,
  6. Select table as the variable, and
  7. Paste in this code:
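// Flags any table that has no unique key constraints defined.
// ruleMessage and errType are the message and type the rule reports back to SDDM.
function checkUKs(table) {
  ruleMessage = "";
  if (table.getUKeys().size() == 0) {
    ruleMessage = "no UKs";
    errType = "Problem:";
    return false;
  } else {
    return true;
  }
}
// Run the check on the table SDDM passes in as the rule variable
checkUKs(table);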

Hit Save, then Apply.

The result will be a list of all the tables in your design that do not have any Unique Key Constraints defined.
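If you want to sanity-check the rule logic before pointing it at a large reverse-engineered model, you can run the same function outside SDDM against mock objects. This is a hypothetical Node.js harness, not part of SDDM: mockTable and runRule are illustrative names of my own, and the mock imitates only the one call the rule makes, table.getUKeys().size().

```javascript
// Hypothetical stand-in for an SDDM table object. It mimics only the call
// the rule makes: table.getUKeys().size().
function mockTable(name, ukCount) {
  return {
    name: name,
    getUKeys: function () {
      return { size: function () { return ukCount; } };
    },
  };
}

// In the SDDM script these carry the rule's message and type;
// here they are plain globals.
let ruleMessage = "";
let errType = "";

// Same logic as the custom rule above.
function checkUKs(table) {
  ruleMessage = "";
  if (table.getUKeys().size() == 0) {
    ruleMessage = "no UKs";
    errType = "Problem:";
    return false;
  } else {
    return true;
  }
}

// Runs the rule over every table and collects the names of the ones it flags.
function runRule(tables) {
  return tables.filter((t) => !checkUKs(t)).map((t) => t.name);
}

// Flags ORDERS and ORDER_LINE, which have no unique keys.
console.log(runRule([mockTable("CUSTOMER", 1), mockTable("ORDERS", 0), mockTable("ORDER_LINE", 0)]));
```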

Now the real work begins – fixing those tables! As you work your way through the model adding the new business keys, you can keep using this report to see which ones you have left, and make sure you don’t miss any.

Get to it my friends!

Kent

The Oracle Data Warrior

P.S. Special thanks to Dimitar Slavov of Oracle for posting the code to answer my question. If you want to see the whole thread go here.

Data Modeling for Fun and Profit: Are you ready to take the Database Design Challenge?

Are you up to publicly testing your database design chops?

Want to improve your street cred?

If so, then read on…

Relational databases form the backbone of thousands, if not millions, of applications and systems around the globe. A key part of building these applications is designing and implementing the data structures they use.

(Well, a few of us “old school” guys still think so, no matter what anyone else thinks!)

Proper table design can mean the difference between a scalable, high performing database that is a joy to query and an unfathomable, unsupportable mess that makes your brain melt.

Given the importance of these databases, understanding good data modelling techniques and physical implementation methods are essential skills for data and system architects, database administrators and developers creating database applications.

Building on the hugely successful SQL and PL/SQL quizzes already available at the PL/SQL Challenge, the new weekly Database Design Quiz kicks off on Saturday, October 5, 2013 to help you build these skills. The quiz will cover many areas of database modeling and design, from logical design all the way to physical database design, including topics such as:

  • Normalization – ensuring you have high quality data
  • Referential integrity – saving you the time and effort of writing your own application-based constraints
  • Indexing – enabling you to write fast and efficient queries

Whether you’re an experienced data modeler or completely new to relational databases, the weekly Database Design Quiz offers you the opportunity to both learn new approaches and show off your expertise. It will teach techniques that you can use to improve the quality of your work and impress future employers with your achievements.

This weekly quiz is managed by Chris Saxon, who has been playing the PL/SQL Challenge since August 2010 and placed second in the most recent PL/SQL Championship. More to the point of this quiz, however, he is also a database technologist with 10 years of experience designing and building Oracle database applications. He currently works as the Data Architect for the airline Flybe, a role which sees him creating the data structures for the flybe.com database and the company’s enterprise data warehouse. He also runs the blog www.sqlfail.com, a project to explain database concepts and other topics of interest using just SQL and PL/SQL.

Registering is quick, easy and free. If you’re not already a member of the PL/SQL Challenge, then head to www.plsqlchallenge.com and sign up for a free account.

Let the Database Design competition commence!

Show me what ya got!

Kent

P.S. I have been helping Chris a little by reviewing the questions and I can tell you this quiz will be a real challenge even for those of you with years of experience. So get on over and sign up today (there will be prizes): www.plsqlchallenge.com
