The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “#BetterDataModeling”

Better Data Models (redux)

Lately there have been quite a few posts and articles about what is needed for AI to be successful. In the data world, that means high-quality data is table stakes. But it’s not just the data in your database. It’s also having nailed down the semantic meaning of that data (in business terms) and all the context surrounding that data (including relationships to other data and systems).

It sounds like we need to get back to the basics of good #datamanagement (like #DAMA has taught for decades via their DMBOK and certifications).

And that gets back to having a well curated #datamodel (you must have seen that coming from me!). And that then reminded me that I once wrote a book with a Checklist for Data Model Design Reviews. And it is still available on Amazon! Check it out: https://lnkd.in/gVE4u5FR

By no means is it everything you need to be successful in building a data-driven #AI agent or process. But if you have not looked at your data models in years, or (more likely) don’t have one, this will get you started in the right direction, so you can at least ask your data team the right questions and see where you are.

The output from AI is only as good as the information we make available to it. So get going and make sure your data is indeed #AIReady.

NB – the book mentions specific data modeling tools I have used in the past, but the concepts and questions can be applied to any modeling tool you are using today.

Model on!

Kent G

All new Data Vault Alliance Forums are open!

Great news for all fans and followers of #DataVault – Ask DVA forums are up and running!

You can sign up at https://forums.datavaultalliance.com (NB: No longer active, please see my more recent post on the new site.)

These all new forums are the sanctioned voice of DVA (Data Vault Alliance). That means you can expect the answers will be reviewed and in compliance with data vault standards and best practices as taught by Dan Linstedt and all the authorized DV 2.0 CDVP trainers. Hopefully this will help reduce confusion in the market and assist organizations in successfully applying the approach.

Other (important) details:

  • 100% free – to anyone and everyone
  • Anyone with an account can reply or post a new topic.
  • 100% readable by guests – without ever creating an account
  • 100% moderated (by DVA for starters, and by a few others who have already agreed to moderate)
  • Members can POST POLLS!!!

And better still, the forums are OPEN to both Business and Technical questions!!

You can expect answers to your toughest data vault questions through examples, best practices, and standards.

So why wait? Hop on now and take a minute to view the forums, sign up / register, and possibly ask or reply to questions.

Enjoy!

Kent

The Data Warrior

Data Vault 2.0 Automation with erwin and Snowflake

I am seeing a HUGE uptick in interest in Data Vault around the globe. Part of the interest is the need for agility in building a modern data platform. One of the benefits of the Data Vault 2.0 method is the repeatable patterns, which lend themselves to automation. I am pleased to pass on this great new post with details on how to automate building your Data Vault 2.0 architecture on Snowflake using erwin! Thanks to my buddy John Carter at erwin for taking this project on.

The Data Vault methodology can be applied to almost any data store and populated by almost any ETL or ELT data integration tool. As Snowflake Chief Technical Evangelist Kent Graziano mentions in one of his many blog posts, “DV (Data Vault) was developed specifically to address agility, flexibility, and scalability issues found in the other mainstream data modeling approaches used in the data warehousing space.” In other words, it enables you to build a scalable data warehouse that can incorporate disparate data sources over time. Traditional data warehousing typically requires refactoring to integrate new sources, but when implemented correctly, Data Vault 2.0 requires no refactoring.

Successfully implementing a Data Vault solution requires skilled resources and traditionally entails a lot of manual effort to define the Data Vault pipeline and create ETL (or ELT) code from scratch. The entire process can take months or even years, and it is often riddled with errors, slowing down the data pipeline. Automating design changes and the code to process data movement ensures organizations can accelerate development and deployment in a timely and cost-effective manner, speeding the time to value of the data.

Snowflake’s Data Cloud contains all the necessary components for building, populating, and managing Data Vault 2.0 solutions. erwin’s toolset models, maps, and automates the creation, population, and maintenance of Data Vault solutions on Snowflake. The combination of Snowflake and erwin provides an end-to-end solution for a governed Data Vault with powerful performance.

Get the rest of the details here: Data Vault Automation with erwin and Snowflake

Vault away my friends!

Kent

The Data Warrior

Tips for Optimizing the #DataVault Architecture on #Snowflake

Data Vault is an architectural approach that includes a specific data model design pattern and methodology developed specifically to support a modern, agile approach to building an enterprise data warehouse and analytics repository.

Typical Data Vault Design with Hubs, Sats, and a Link
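For readers without the diagram handy, the three table types can be sketched in a few lines. This is only an illustrative sketch and is not taken from the linked post: it uses Python's built-in sqlite3 module as a stand-in for Snowflake, and the table names, column names, and the MD5-based hash_key helper are all assumptions made for the example.

```python
import hashlib
import sqlite3


def hash_key(*business_keys: str) -> str:
    """Hypothetical helper: hash normalized business keys into a fixed-length key."""
    normalized = "||".join(key.strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Assumed example schema: two hubs, one link between them, one satellite.
DDL = """
-- Hub: one row per unique business key
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,
    customer_id   TEXT NOT NULL,
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_hk      TEXT PRIMARY KEY,
    order_id      TEXT NOT NULL,
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Link: a relationship between hubs
CREATE TABLE link_customer_order (
    customer_order_hk TEXT PRIMARY KEY,
    customer_hk       TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    order_hk          TEXT NOT NULL REFERENCES hub_order (order_hk),
    load_dts          TEXT NOT NULL,
    record_source     TEXT NOT NULL
);
-- Satellite (Sat): descriptive attributes of a hub, versioned by load date
CREATE TABLE sat_customer (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    load_dts      TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_dts)
);
"""


def build_sample_vault() -> sqlite3.Connection:
    """Create the sample tables in memory and load one row into each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    cust_hk, order_hk = hash_key("C-1"), hash_key("O-9")
    now, src = "2024-01-01T00:00:00", "demo"
    conn.execute("INSERT INTO hub_customer VALUES (?, ?, ?, ?)", (cust_hk, "C-1", now, src))
    conn.execute("INSERT INTO hub_order VALUES (?, ?, ?, ?)", (order_hk, "O-9", now, src))
    conn.execute(
        "INSERT INTO link_customer_order VALUES (?, ?, ?, ?, ?)",
        (hash_key("C-1", "O-9"), cust_hk, order_hk, now, src),
    )
    conn.execute(
        "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
        (cust_hk, now, "Acme", "Houston", src),
    )
    conn.commit()
    return conn
```

The point of the sketch is the shape: hubs hold business keys, the link holds the relationship, and the satellite holds the descriptive attributes that change over time. A real Snowflake implementation would use its own data types and loading patterns.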

Snowflake Cloud Data Platform was built to be design pattern agnostic. That means you can use it with equal efficiency for 3NF models, dimensional (star) schemas, DV, or any hybrid you might have. Snowflake supports DV designs and handles several DV design variations very well with excellent performance.

This series of blog posts will present some tips and recommendations that have evolved over the last few years for implementing a DV-style warehouse in Snowflake.

Here is the first set of tips: Tips for Optimizing the Data Vault Architecture on Snowflake (part 1)

I hope you find this helpful!

Kent

The Data Warrior and Chief Technical Evangelist for Snowflake

RI (Referential Integrity) Constraints: 3 Reasons to Include Them in Your Data Warehouse

Over the years, I have had numerous conversations about the value of having referential integrity (RI) constraints, such as primary and foreign keys, in a relational data warehouse or data mart.

Many DBAs object that RI constraints slow the load process. This is a valid point if you are talking about enforced constraints that are checked in real-time during the load. But this is not an issue if you define the constraints as disabled.
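To make the idea of a declared but unenforced constraint concrete, here is a minimal sketch. It is an assumption-laden illustration, not how any particular warehouse does it: it uses Python's built-in sqlite3 module with foreign key enforcement switched off as a stand-in for a disabled constraint, and the dim_customer and fact_sales tables are made up for the example.

```python
import sqlite3


def load_with_unenforced_fk():
    """Load rows with FK enforcement off, then read the FK metadata and check it afterwards."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for a "disabled" constraint: the FK is declared but not checked on insert.
    conn.execute("PRAGMA foreign_keys = OFF")
    conn.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE fact_sales ("
        " sale_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER REFERENCES dim_customer (customer_id),"
        " amount REAL)"
    )
    conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme')")
    # The second row points at a customer that does not exist; the load still succeeds.
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(100, 1, 25.0), (101, 99, 10.0)],
    )
    # The relationship is still available as metadata: (parent table, child column, parent column).
    declared = [
        (row[2], row[3], row[4])
        for row in conn.execute("PRAGMA foreign_key_list(fact_sales)")
    ]
    # Validation can run after the load instead of during it: (child table, rowid of orphan).
    orphans = [
        (row[0], row[1])
        for row in conn.execute("PRAGMA foreign_key_check(fact_sales)")
    ]
    conn.close()
    return declared, orphans
```

In this sketch the load pays no per-row checking cost, yet the relationship stays in the catalog where tools can read it, and orphaned rows can still be found with a check run after the load. The exact syntax for disabling a constraint differs from one database to another.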

Which then leads to this common question:

Is there any reason to maintain a permanently disabled FK in the data model?  If it is not going to be enabled, then from my perspective, it doesn’t make any sense to define the FK.  Instead, the relationship can be described in the comment of the child column.

So, why would I want RI constraints in my data warehouse?

Here are 3 reasons to consider… RI (Referential Integrity) Constraints: 3 Reasons to Include Them in Your Data Warehouse

Model on!

Kent

The Data Warrior
