The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the category “Data Modeling”

RI (Referential Integrity) Constraints: 3 Reasons to Include Them in Your Data Warehouse

Over the years, I have had numerous conversations about the value of having referential integrity (RI) constraints, such as primary and foreign keys, in a relational data warehouse or data mart.

Many DBAs object that RI constraints slow the load process. This is a valid point if you are talking about enforced constraints that are checked in real-time during the load. But this is not an issue if you define the constraints as disabled.

Which then leads to this common question:

Is there any reason to maintain a permanently disabled FK in the data model?  If it is not going to be enabled, then from my perspective, it doesn’t make any sense to define the FK.  Instead, the relationship can be described in the comment of the child column.

So, why would I want RI constraints in my data warehouse?

Here are 3 reasons to consider…RI (Referential Integrity) Constraints: 3 Reasons to Include Them in Your Data Warehouse

Model on!

Kent

The Data Warrior

Agile Amped Video – Live from Southern Fried Agile

A few weeks back I had the pleasure of attending and speaking at the Southern Fried Agile event in Charlotte, North Carolina.

It was quite an event with over 700 people in attendance. This is a primarily local event drawing people mostly from the Charlotte area. To say agile is big in that town is an understatement. There were some very large, name brand, companies present. If you are interested in learning more about taking an agile approach to development, I can highly recommend this event which draws excellent speakers (and vendors) from all over. (And the traditional southern fried lunch was pretty darn good too!)

southernfriedlunch

My talk was on applying agile methods to data warehousing projects. I had a great audience of about 50 who wanted to learn how they might approach their DW projects differently.

In addition to giving my talk I got the opportunity to participate in my 2nd Agile Amped interview. This time my interviewer was Leslie from SolutionsIQ. We talked a lot about the rise of agile in the data warehousing space and how new technology solutions like Snowflake’s Cloud-Native Elastic Data Warehouse are helping organizations achieve a more agile deployment cycle.

Here is the entire interview for your viewing pleasure:

If you want to watch more Agile Amped video podcasts you can find them online here.

Enjoy.

Kent

The Data Warrior

Drill to Detail Podcast: Data Modeling, Data Vault, and Snowflake!

My good friend Mark Rittman has embarked on a new adventure as an independent analyst and consultant. As part of his new venture Mark started a new podcast series on iTunes that he calls Drill to Detail where he will feature interviews discussing a range of topics related to data warehousing, business intelligence, analytics, and big data.

I was honored to be asked to take part in this new venture and got to spend a hour with Mark a few weeks back recording what is now Episode 5 of the series. In this interview we talk about:

The podcast is about 60 minutes with each topic being about 20 minutes (so feel free to skip ahead if you are short on time). Please have a listen and let us know what you think in the comments below.

Cheers!

Kent

The Data Warrior

P.S. I will be speaking on these and related topics at a bunch of events over the next few weeks. Check out my speaking schedule and join me in person if you can!

Maintaining disabled FK’s, wisdom or farce?

A while back, I wrote a post about having FKs (foreign keys) in your data warehouse.

Well, a similar question came up recently on an Oracle forum with the above title. It is a fair question and it does surface fairly regularly in a variety of contexts (not just data warehousing).

Of course, as The Data Warrior, I felt is was my duty to respond.

The Question

Is there any reason to maintain a permanently disabled FK in the data model?  I’m not envisioning a reason to do it.  If it is not going to be enabled, then from my perspective, it would not make any sense to have it defined.  If anything, provide the definition of the relationship in the comment of the child column.

My Answer

Yes, by all means keep the FK please!

I see three good reasons for doing so:

  1. It is valuable metadata (& documentation). If somebody reverse engineers the database (say with ERWin or Oracle Data Modeler), the FK shows up in the diagram (way better than having to read a column comment to find out)
    Data Vault 2.0 Example

    A picture is worth a thousand words!

    .

  2. BI Metadata – If you want to use any sort of reporting or BI tool against the database, most tools will import the FK definition with the tables and build the proper join conditions. Way better than having someone guess what the join will be and then manually adding it to the metadata layer in the reporting tool. Examples that can read the Oracle data dictionary include OBIEE, Business Objects, COGNOS, Looker, and many others.(Note here that since the FK is not enforced on the remote databases, you might want to make sure these are treated as outer joins, lest you lose some transaction in the reports).
  3. The Oracle optimizer will use disabled constraints to improve query performance of joins. Again, this is metadata in the data dictionary which the optimizer can read. This is documented in the Oracle Data Warehouse guide and I have validated it on multiple occasions with Oracle product management.

While #3 applies specifically to Oracle, for other databases like MS SQL Server and Snowflake, #1 and #2 still apply.

Even if only one of the above is true for a given database, that, in my opinion, still justifies keeping the disabled constraint around.

Final Answer = Wisdom

What do you think? Feel free to comment below.

And please share on your favorite social media platform!

Model on!

Kent

The Data Warrior

 

One more time: Do we still need Data Modeling?

More specifically do we still need to worry about data modeling in the NoSQL, Hadoop, Big Data, Data Lake, world?

This keeps coming up. Today it was via email after a presentation I gave last week. This time the query was about the place of data modeling tools in this new world order.

Bottom line: YES, YES, YES! We still need to do data modeling and therefore need good data modeling tools and skills.

Snowflake with RI

A picture can say so much!

 

In order to get any business value out of the data, regardless of where or how it is stored, you have to understand the data, right?

That means you have to understand the model of the data. Even if the model (or schema) is not needed upfront to store the data (schema-on-write), you must discern the model in order to use it (schema-on-read).

It is (mostly) impossible to get repeatable, auditable metrics, KPIs, dashboard, or reports that bring value to the business without understanding the semantics of the data – which means you at least need a conceptual or logical model.

And if you want/need to join data from multiple source then you really have to understand each source or there is no way to properly join it all together to get meaningful results.

There are a few data cleansing, discovery,and “virtualization” tools out there that will help you figure out those relationships but they are expensive and mostly rely on standard data profiling techniques to find similar data objects across the sets and propose “relationships”. Some allow for the definition of fairly sophisticated matching rules including customizations. But a human still needs to figures those out, test, and validate the results.

In the end you still have to know your data.

One of the best ways to do that, in my opinion, is to model that data. Otherwise your data lake will likely become a data swamp!

So keep your data modeling tool and keep building your data dictionary with your business folks.

Final Stage Table

A good modeling tool can act as a visual data dictionary too!

If you agree with me, please share on social media!

#LoveYourData

Kent

The Data Warrior

P.S. If you need a good modeling tool, check out Oracle SQL Developer Data Modeler. And check out my books and training offering for SDDM on the blog sidebar.

Post Navigation

%d bloggers like this: