The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “data model”

Better Data Models: Early Black Friday Data Model Book Sale!

I am truly thankful for the career I have in Oracle, data modeling and data warehouse design.

And I am thankful for you, my loyal readers and followers!

So, in honor of American Thanksgiving and our crazy Black Friday and Cyber Monday insane shopping addiction, I am putting my Kindle ebook on SALE!

Get my book A Check List for Doing Data Model Design Reviews for 34% off starting 8 AM PST Thanksgiving Day (November 29th).

Just go here on Amazon.com.

This is a limited time sale which ends next week on December 5th.

Get a copy for your favorite data modeler!

Happy Thanksgiving and Happy Shopping!

Kent

P.S. For all my overseas friends attending the #UKOUG_Tech13 event next week, you can go here for a similar sale from Sunday December 1st though December 7th.

Agile Data Warehouse Modeling: How to Build a Virtual Type 2 Slowly Changing Dimension

One of the ongoing complaints about many data warehouse projects is that they take too long to delivery. This is one of the main reasons that many of us have tried to adopt methods and techniques (like SCRUM) from the agile software world to improve our ability to deliver data warehouse components more quickly.

So, what activity takes the bulk of development time in a data warehouse project?

Writing (and testing) the ETL code to move and transform the data can take up to 80% of the project resources and time.

So if we can eliminate, or at least curtail, some of the ETL work, we can deliver useful data to the end user faster.

One way to do that would be to virtualize the data marts.

For several years Dan Linstedt and I have discussed the idea of building virtual data marts on top of a Data Vault modeled EDW.

In the last few years I have floated the idea among the Oracle community. Fellow Oracle ACE Stewart Bryson and I even created a presentation this year (for #RMOUG and #KScope13) on how to do this using the Business Model (meta-layer) in OBIEE (It worked great!).

While doing this with a BI tool is one approach, I like to be able to prototype the solution first using Oracle views (that I build in SQL Developer Data Modeler of course).

The approach to modeling a Type 1 SCD this way is very straight forward.

How to do this easily for a Type 2 SCD has evaded me for years, until now.

Building a Virtual Type 2 SCD (VSCD2)

So how to create a virtual type 2 dimension (that is “Kimball compliant” ) on a Data Vault when you have multiple Satellites on one Hub?

(NOTE: the next part assumes you understand Data Vault Data Modeling. if you don’t, start by reading my free white paper, but better still go buy the Data Vault book on LearnDataVault.com)

Here is how:

Build an insert only PIT (Point-in-Time) table that keeps history. This is sometimes referred to as a historicized PIT tables.  (see the Super Charge book for an explanation of the types of PIT tables)

Add a surrogate Primary Key (PK) to the table. The PK of the PIT table will then serve as the PK for the virtual dimension. This meets the standard for classical star schema design to have a surrogate key on Type 2 SCDs.

To build the VSCD2 you now simply create a view that uses the PIT table to join the Hub and all the Satellites together. Here is an example:

Create view Dim2_Customer (Customer_key, Customer_Number, Customer_Name, Customer_Address, Load_DTS)
as
Select sat_pit.pit_seq, hub.customer_num, sat_1.name, sat_2.address, sat_pit.load_dts
from HUB_CUST hub,        
          SAT_CUST_PIT sat_pit,        
          SAT_CUST_NAME sat_1,        
          SAT_CUST_ADDR sat_2
where  hub.CSID = sat_pit.CSID           
    and hub.CSID = sat_1.CSID           
    and hub.CSID = sat_2.CSID           
    and sat_pit.NAME_LOAD_DTS = sat_1.LOAD_DTS           
    and sat_pit.ADDRESS_LOAD_DTS = sat_2.LOAD_DTS 
 

Benefits of a VSCD2

  1. We can now rapidly demonstrate the contents of a type 2 dim prior to ETL programming
  2. With using PIT tables we don’t need the Load End DTS on the Sats so the Sats become insert only as well (simpler loads, no update pass required)
  3. Another by product is the Sat is now also Hadoop compliant (again insert only)
  4. Since the nullable Load End DTS is not needed, you can now more easily partition the Sat table by Hub Id and Load DTS.

Objections

The main objection to this approach is that the virtual dimension will perform very poorly. While this may be true for very high volumes, or on poorly tuned or resourced databases, I maintain that with today’s evolving hardware appliances  (e.g., Exadata, Exalogic) and the advent of in memory databases, these concerns will soon be a thing of the past.

UPDATE 26-May-2018  – Now 5 years later I have successfully done the above on Oracle. But now we also have Snowflake elastic cloud data warehouse where all the prior constraints are indeed eliminated. With Snowflake you can now easily chose to instantly add compute power if the view is too slow or do the work and processing to materialize the view. (end update)

Worst case, after you have validated the data with your users, you can always turn it into a materialized view or a physical table if you must.

So what do you think? Have you ever tried something like this? Let me know in the comments.

Get virtual, get agile!

Kent

The Data Warrior

P.S. I am giving a talk on Agile Data Warehouse Modeling at the East Coast Oracle Conference this week. If you are there, look me up and we can discuss this post in person!

Better Data Modeling: New and Improved Oracle SQL Developer Data Modeler (#SQLDevModeler)

Yup, my friends at Oracle have been hard at working enhancing what was already the best FREE data modeling tool out there.

They just released SDDM R4 EA3! You can go get it right now: http://www.oracle.com/technetwork/developer-tools/datamodeler/downloads/datamodeler-4ea-downloads-1988443.html

As always there are both new features and bug fixes.

One of the coolest new features is the ability to show entity (or table) comments right on the diagram in the object. This will be very useful for enabling data model reviews with the business users.

Product manager Ashley tweeted and example the other day:

 

For even more details and ideas how to use this feature check out Jeff Smith’s post on the feature here.

So what are you waiting for? Go get it today!

Data Modeling is Fun!

Later

Kent
The Oracle Data Warrior

Better Data Modeling: Are you making these 3 beginner mistakes in your data models?

There are lots of people in the database industry that end up building data models.

Some of them are very educated and well trained and are professional data modelers and data architects. (If that is you, you can probably skip this article)

Others have learned on-the-job with little or no training or education on modeling concepts or techniques. They may be database administrators or even programmers that got asked to produce a model diagram by their boss or project manager, after the project was delivered (but there was no data modeler on the project ever).

This article is for this last group of folks who may want to improve their knowledge or skill in data modeling.

So here are three of the most common mistakes I have seen over the years:

1. Only defining surrogate keys

Do all your tables have a primary key defined and that primary key is a single column integer generated by the system? That is a surrogate key.

Instead of that, you should be defining a natural, or business, key for every table in your system. A natural key is a an attribute or set of attributes (that occur naturally in the data set) required to uniquely identify a row in that table. In addition you should define a Unique Key Constraint on those attributes in the database. Then you can be sure you will not get any duplicate data into the tables.

CLARIFICATION: This point has caused a lot of questions and comments. To be clear, the mistake here is to have ONLY defined a surrogate key. i believe that even if using surrogate keys is the best solution for your design, you should ALSO define an alternate unique natural key.

2. Using Visio or PowerPoint to draw your data model diagrams

This is all too common when no data modeler has been hired for a project or there is a really tight (as in NO) budget. The result is a pretty picture that gets out of date very quickly and can’t help you generate DDL code to build (or rebuild) the database.

Instead of that, you should either invest in a real data modeling tool (like ERWin), or better still, get Oracle’s totally free SQL Developer Data Modeler (my favorite!). The point being with a real data modeling  tool, you can forward and reverse engineer your database tables, make changes, review them, generate DDL, etc.

3. Not reviewing the model with real business people

Build it and they will come does not work!  How do you even know you are using the right terminology or have defined the relationships correctly if you have not talked to a business person about their data and data needs?

Instead of that, you need to involve business users from the very beginning of the project. You need to either set up 1:1 interviews with key stake holders or better yet have group JAD sessions to discuss the project and have them help you define the data model. That is the best way to get buy in to the end result.

Just avoiding these three rookie mistakes can greatly improve your chance of success.

Bonus Tip:  The best way to avoid these and other common rookie mistakes is to use a pre-defined data model review checklist.

You can get a jump-start on your own check list by downloading  my Kindle book A Checklist for Doing Data Model Design Reviews on Amazon.com.

Get it here: http://www.amazon.com/Check-Doing-Design-Reviews-ebook/dp/B008RG9L5E/, or just search Amazon for author Kent Graziano.

Here’s to building better data models! Are you with me?

Kent Graziano

The Data Warrior

Another Quick Tip for SQL Developer Data Modeler

Not sure how I missed this little utility, but I did.

Ever need to quickly build a set of views on a set of tables to create a read-only data layer you could expose to some users or processes?

Well as I am developing a new data warehouse for my current client, we decided to control access by creating a read-only user that would hold views that pointed back to our main data warehouse schema. The BI tool points to the read only schema (for now anyway).

Anyway, under the Tools menu I found a Table to View generator.

Under the Tools menu look for the Wizard to create a view definition based on a table

Under the Tools menu look for the Wizard to create a view definition based on a table

Once you select it from the menu you get a dialog box with all the tables in your model selected. So with one push of a button you have views on all the tables.

Get a list of table to convert and Select or Deselect all

Get a list of table to convert and Select or Deselect all

Then you can edit the views, if needed, using the view query builder.

Or you can select (or de-select) specific tables to build views on.

Even better – the tool applies a naming standard on the output view names (v_<table name>).

On top of this, if you happen to have some views (maybe for testing?) that you want to turn into tables and then populate with an ETL process, right below the first wizard is an option to create a table definition from the view definition.

Now granted most of you can easily do either of these tasks using plain old SQL, but imagine you need to do it for several hundred tables.

This little wizard would save you a ton of time (and testing).

And you will have a documented data model when you are done.

So go give it a try!

Later.

Kent

NB: I am taking a little downtime in early August so don’t look for any new posts until near the end of the month.

Post Navigation