The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “foreign keys”

Maintaining disabled FK’s, wisdom or farce?

A while back, I wrote a post about having FKs (foreign keys) in your data warehouse.

Well, a similar question came up recently on an Oracle forum with the above title. It is a fair question and it does surface fairly regularly in a variety of contexts (not just data warehousing).

Of course, as The Data Warrior, I felt is was my duty to respond.

The Question

Is there any reason to maintain a permanently disabled FK in the data model?  I’m not envisioning a reason to do it.  If it is not going to be enabled, then from my perspective, it would not make any sense to have it defined.  If anything, provide the definition of the relationship in the comment of the child column.

My Answer

Yes, by all means keep the FK please!

I see three good reasons for doing so:

  1. It is valuable metadata (& documentation). If somebody reverse engineers the database (say with ERWin or Oracle Data Modeler), the FK shows up in the diagram (way better than having to read a column comment to find out)
    Data Vault 2.0 Example

    A picture is worth a thousand words!

    .

  2. BI Metadata – If you want to use any sort of reporting or BI tool against the database, most tools will import the FK definition with the tables and build the proper join conditions. Way better than having someone guess what the join will be and then manually adding it to the metadata layer in the reporting tool. Examples that can read the Oracle data dictionary include OBIEE, Business Objects, COGNOS, Looker, and many others.(Note here that since the FK is not enforced on the remote databases, you might want to make sure these are treated as outer joins, lest you lose some transaction in the reports).
  3. The Oracle optimizer will use disabled constraints to improve query performance of joins. Again, this is metadata in the data dictionary which the optimizer can read. This is documented in the Oracle Data Warehouse guide and I have validated it on multiple occasions with Oracle product management.

While #3 applies specifically to Oracle, for other databases like MS SQL Server and Snowflake, #1 and #2 still apply.

Even if only one of the above is true for a given database, that, in my opinion, still justifies keeping the disabled constraint around.

Final Answer = Wisdom

What do you think? Feel free to comment below.

And please share on your favorite social media platform!

Model on!

Kent

The Data Warrior

 

Advertisements

Better Data Modeling: My Top 3 Reasons why you should put Foreign Keys in your Data Warehouse

This question came up at the recent World Wide Data Vault Consortium. Seems there are still many folks who build a data warehouse (or data mart) that do not include FKs in the database.

The usual reason is that it “slows down” load performance.

No surprise there. Been hearing that for years.

And I say one of two things:

1. So what! I need my data to be correct and to come out fast too!

or

2. Show me! How slow is it really?

Keep in mind that while getting the data in quickly is important, so is getting the data out.

Who would you rather have complain – the ETL programmer or the business user trying to run a report?

Yes, it has to be a balance, but you should not immediately dismiss including FKs in your warehouse without considering the options and benefits of those options.

So here are my three main reasons why you should include FK constraints in your Oracle data warehouse database:

  1. The Oracle optimizer uses the constraints to make better decisions on join paths.
  2. Your Data Modeling and BI tools can read the FKs from the data dictionary to create correct joins in the meta data of the tool (SDDM, Erwin, OBIEE, Cognos, Bus Objects can all do this).
  3. It is a good QA check on your ETL. (Yeah, I know… the ETL code is perfect and checks all that stuff, bla, bla, bla)

Now of course there are compromise options. The three main ones are I know:

  1. Drop the constraints at the start of the load then add them back in after the load completes. If any fail to build, that tells you immediately where you may have some data quality problems or your model is wrong (or something else changed).
  2. Build all the constraints as DISABLE NOVALIDATE. This puts them in the database for the BI tools and data modeling tools to see and capture but, since they are not enforced, they put minimal overhead on the load process. And, so I am told by those that know, even a disabled constraint helps the optimizer make a smarter choice on the join path.
  3. (really 2a) Best of both – disable the constraints, load your data, then re-enable the constraints. You get optimization and quality checks.

So NOW what is your reason for not using FKs in your data warehouse?

Happy Modeling!

Kent

Post Navigation

%d bloggers like this: