More specifically, do we still need to worry about data modeling in the NoSQL, Hadoop, Big Data, Data Lake world?
This keeps coming up. Today it was via email after a presentation I gave last week. This time the query was about the place of data modeling tools in this new world order.
Bottom line: YES, YES, YES! We still need to do data modeling and therefore need good data modeling tools and skills.
A picture can say so much!
In order to get any business value out of the data, regardless of where or how it is stored, you have to understand the data, right?
That means you have to understand the model of the data. Even if a model (or schema) is not required up front to store the data, the way it is with schema-on-write, you must still discern the model in order to use the data (schema-on-read).
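To make the schema-on-read point concrete, here is a minimal sketch in Python. The records, the field names, and the `CUSTOMER_MODEL` mapping are all made up for illustration; the point is only that the data went in without a schema, yet nothing useful comes out until a model is applied at read time.

```python
import json

# Hypothetical raw records as they might land in a data lake.
# Nothing enforced a schema when they were written.
RAW_LINES = [
    '{"cust_id": "17", "signup": "2015-03-02", "region": "TX"}',
    '{"cust_id": 18, "signup": "2015-03-05"}',
    '{"customer": 19, "signup": "2015-03-07", "region": "CA"}',
]

# The model we had to discern before the data became usable
# (illustrative field names and types, not from any real system).
CUSTOMER_MODEL = {"cust_id": int, "signup": str, "region": str}


def read_with_model(lines, model, key="cust_id"):
    """Apply the model at read time; return (usable, rejected) records."""
    usable, rejected = [], []
    for line in lines:
        record = json.loads(line)
        if key not in record:
            # Without the identifying attribute we cannot say what this row is.
            rejected.append(record)
            continue
        typed = {}
        for field, cast in model.items():
            value = record.get(field)
            typed[field] = cast(value) if value is not None else None
        usable.append(typed)
    return usable, rejected


usable, rejected = read_with_model(RAW_LINES, CUSTOMER_MODEL)
print(len(usable), "usable,", len(rejected), "rejected")
```

The third record was stored without complaint, but it only shows up as a problem once someone decides what a customer record is supposed to look like. That decision is the model.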
It is (mostly) impossible to get repeatable, auditable metrics, KPIs, dashboards, or reports that bring value to the business without understanding the semantics of the data, which means you need at least a conceptual or logical model.
And if you want or need to join data from multiple sources, then you really have to understand each source, or there is no way to properly join it all together and get meaningful results.
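Here is a small, hedged illustration of why joining sources requires understanding each one. The two sources, their column names, and the `C-0017` key format are invented for this example. Both hold the same customers, but a join written without knowing how each source models the customer key quietly returns nothing.

```python
# Hypothetical source 1: a CRM export where the customer key is an integer.
crm = [
    {"customer_id": 17, "name": "Acme"},
    {"customer_id": 18, "name": "Globex"},
]

# Hypothetical source 2: a billing feed where the same key is a
# prefixed, zero-padded string.
billing = [
    {"cust": "C-0017", "amount": 250.0},
    {"cust": "C-0017", "amount": 100.0},
    {"cust": "C-0018", "amount": 75.0},
]


def naive_join(left, right):
    """Join on the raw values, with no knowledge of either model."""
    return [(l, r) for l in left for r in right if l["customer_id"] == r["cust"]]


def billing_key_to_id(cust):
    """Translate the billing key into the CRM key (assumed 'C-<digits>' format)."""
    return int(cust.split("-")[1])


def modeled_join(left, right):
    """Total billed amount per customer name, using the key mapping."""
    totals = {}
    for row in right:
        key = billing_key_to_id(row["cust"])
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return {l["name"]: totals.get(l["customer_id"], 0.0) for l in left}


print(naive_join(crm, billing))
print(modeled_join(crm, billing))
```

The naive join does not fail; it just produces an empty result, which is arguably worse. Only the knowledge that `cust` and `customer_id` describe the same business concept makes the second version possible.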
There are a few data cleansing, discovery, and “virtualization” tools out there that will help you figure out those relationships, but they are expensive and mostly rely on standard data profiling techniques to find similar data objects across the sets and propose “relationships”. Some allow for the definition of fairly sophisticated matching rules, including customizations. But a human still needs to figure those rules out, test them, and validate the results.
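As a rough sketch of the kind of profiling described above (not how any particular product works), the Python below proposes a relationship whenever nearly all the distinct values in one column also appear in a column of another table. The table names, sample values, and the 0.8 threshold are assumptions for illustration.

```python
def propose_relationships(tables, threshold=0.8):
    """Propose column pairs where one column's values are mostly contained
    in a column of another table.

    tables: {table_name: {column_name: [values, ...]}}
    Returns a list of (from_column, to_column, containment) tuples.
    """
    columns = [
        (table, column, set(values))
        for table, cols in tables.items()
        for column, values in cols.items()
    ]
    proposals = []
    for t1, c1, v1 in columns:
        for t2, c2, v2 in columns:
            if t1 == t2 or not v1:
                continue
            # Share of distinct values in the first column found in the second.
            containment = len(v1 & v2) / len(v1)
            if containment >= threshold:
                proposals.append((f"{t1}.{c1}", f"{t2}.{c2}", round(containment, 2)))
    return proposals


# Hypothetical sample data.
tables = {
    "orders": {"customer_id": [1, 2, 2, 3], "quantity": [1, 2, 3, 1]},
    "customers": {"id": [1, 2, 3, 4], "name": ["Ann", "Bob", "Cy", "Di"]},
}

for proposal in propose_relationships(tables):
    print(proposal)
```

On this sample the profiler proposes the real relationship from `orders.customer_id` to `customers.id`, but it also proposes `orders.quantity` to `customers.id`, simply because small integers overlap. Sorting the real relationship from the coincidence is exactly the part a human who knows the data still has to do.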
In the end you still have to know your data.
One of the best ways to do that, in my opinion, is to model that data. Otherwise your data lake will likely become a data swamp!
So keep your data modeling tool and keep building your data dictionary with your business folks.
A good modeling tool can act as a visual data dictionary too!
If you agree with me, please share on social media!
The Data Warrior
P.S. If you need a good modeling tool, check out Oracle SQL Developer Data Modeler. And check out my books and training offerings for SDDM on the blog sidebar.