One more time: Do we still need Data Modeling?
More specifically, do we still need to worry about data modeling in the NoSQL, Hadoop, Big Data, Data Lake world?
This keeps coming up. Today it was via email after a presentation I gave last week. This time the query was about the place of data modeling tools in this new world order.
Bottom line: YES, YES, YES! We still need to do data modeling and therefore need good data modeling tools and skills.
In order to get any business value out of the data, regardless of where or how it is stored, you have to understand the data, right?
That means you have to understand the model of the data. Even if the model (or schema) is not needed upfront to store the data (as it would be with schema-on-write), you must discern the model in order to use the data (schema-on-read).
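To make the schema-on-read point concrete, here is a minimal Python sketch. The field names (`cust_id`, `customer`, `amt`, `amount`), the `Order` class, and the sample records are all made up for illustration; they are assumptions, not taken from any real system. The raw lines are stored with no schema enforced, yet nothing useful comes out until a model is applied at read time.

```python
import json
from dataclasses import dataclass

# Hypothetical raw records as they might land in a data lake:
# stored as-is, with no schema enforced at write time.
RAW_LINES = [
    '{"cust_id": "17", "amt": "19.99", "ts": "2016-03-01"}',
    '{"customer": 17, "amount": 5.0, "ts": "2016-03-02"}',
    '{"cust_id": "42", "amt": "7.50"}',
]


@dataclass
class Order:
    """The (assumed) logical model we impose when reading."""
    customer_id: int
    amount: float


def read_order(line: str) -> Order:
    """Apply the model at read time (schema-on-read)."""
    rec = json.loads(line)
    # Knowing that "cust_id" and "customer" mean the same thing,
    # and that amounts may arrive as strings, is modeling work.
    customer = rec.get("cust_id", rec.get("customer"))
    amount = rec.get("amt", rec.get("amount"))
    return Order(customer_id=int(customer), amount=float(amount))


def total_by_customer(lines):
    """Sum order amounts per customer using the read-time model."""
    totals = {}
    for line in lines:
        order = read_order(line)
        totals[order.customer_id] = totals.get(order.customer_id, 0.0) + order.amount
    return totals


totals = total_by_customer(RAW_LINES)
```

The storage step needed no model at all, but `read_order` cannot be written without deciding what a customer and an amount are. That decision is the data model, just made later.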
It is (mostly) impossible to get repeatable, auditable metrics, KPIs, dashboards, or reports that bring value to the business without understanding the semantics of the data – which means you at least need a conceptual or logical model.
And if you want/need to join data from multiple sources, then you really have to understand each source or there is no way to properly join it all together to get meaningful results.
There are a few data cleansing, discovery, and “virtualization” tools out there that will help you figure out those relationships, but they are expensive and mostly rely on standard data profiling techniques to find similar data objects across the sets and propose “relationships”. Some allow for the definition of fairly sophisticated matching rules, including customizations. But a human still needs to figure those out, test, and validate the results.
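As a rough illustration of the profiling idea, the sketch below proposes candidate relationships by measuring how much the distinct values of two columns overlap. This is a deliberately simplified stand-in, not how any particular commercial tool works; the function names, the 0.8 threshold, and the sample tables are all assumptions made up for the example.

```python
def value_overlap(left_values, right_values):
    """Share of the smaller distinct-value set that also appears in the other."""
    left, right = set(left_values), set(right_values)
    if not left or not right:
        return 0.0
    return len(left & right) / min(len(left), len(right))


def propose_relationships(left_table, right_table, threshold=0.8):
    """Propose (left_column, right_column, score) pairs above the threshold.

    Each table is a dict mapping a column name to a list of values.
    """
    proposals = []
    for lcol, lvals in left_table.items():
        for rcol, rvals in right_table.items():
            score = value_overlap(lvals, rvals)
            if score >= threshold:
                proposals.append((lcol, rcol, score))
    # Highest-scoring candidates first.
    return sorted(proposals, key=lambda p: -p[2])


# Hypothetical sample data.
orders = {"cust_id": [1, 2, 3, 3], "qty": [1, 2, 1, 5]}
customers = {"id": [1, 2, 3, 4], "region_code": [1, 2, 1, 2]}

proposals = propose_relationships(orders, customers)
```

On this toy data the real relationship (`cust_id` to `id`) is proposed, but so are `cust_id` to `region_code` and `qty` to `region_code`, simply because small integers overlap. Telling the real one from the coincidences takes someone who knows what the columns mean.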
In the end you still have to know your data.
One of the best ways to do that, in my opinion, is to model that data. Otherwise your data lake will likely become a data swamp!
So keep your data modeling tool and keep building your data dictionary with your business folks.
If you agree with me, please share on social media!
The Data Warrior
P.S. If you need a good modeling tool, check out Oracle SQL Developer Data Modeler. And check out my books and training offering for SDDM on the blog sidebar.
Reblogged this on TH TECHNOLOGY and commented:
Yes, we still need to do data modeling, and yes, data modeling is a skill that developers need to learn and practice.
Ken’s points are important:
— To get business value out of the data, you need to understand the data
— Even if you don’t need a model to store the data, you DO need a model to understand the data and use it properly
— Know your data!
I see fewer and fewer data modeling sessions at conferences these days – dull and boring maybe, but essential for efficient development and data retrieval based on that data.
As ever, wise words. Many industries not only need models to understand the data but details of where the data comes from, who is using it and where it is going. There are often legal requirements to manage and understand your data consumption and use. Data lakes, swamps and canals have leaks, often undetected.
We certainly still need data models and associated metadata to ensure legal compliance, let alone ease of management.
All the best from the UK.
Schema on read allows you to DEFER the hard work of data modeling, and that is great. But the fact that you can do it later, by no means implies you can skip it. It’s a pity those fried pigeons still don’t come flying straight into your mouth, isn’t it? 🙂
Agreed! Sadly many vendors seem to imply we can skip this step and it helps us deliver faster so CIO/CTO folk start to believe they no longer need to worry about it. But eventually they will get the wakeup call once they need to do something with that data!
If anything, we actually need MORE data modelling, not less. Maybe more sophisticated, maybe less prescriptive (e.g. referential-ISH integrity, relationships with a degree of uncertainty or relationships which are in fact annotations). The pure and simple truth is never pure and rarely simple – life is messy. But that’s no reason to make a bigger mess of it!!
Agreed, Paul! I am trying to get people to avoid letting their data lake turn into a data swamp where they can’t get any useful data out!
Pingback: Is Data Modelling still important? | Tobias Maasland
First, thank you for your retweet. I did not expect that to happen.
I’ve had some discussions about this topic in the past and your post just really fit into it, so I decided to blog about it as well. With all those visualization tools and the NoSQL market, somehow this question comes up more and more often. And it is always fun to explain why it is important to understand the data first and THEN use those insights to generate value for the business. Especially with the ever-growing mass of data.
I appreciate your quoting me in your post. We seem to be a small group that continues to preach about the value of data modeling, so I figure we need to stick together and support each other!
Sound data modelling is like world-class fresh garlic to a lot of these vampiric big data, data lake, pool and puddle initiatives. Then they die when exposed to the cold rational light of good data engineering principles and practice. 🙂
Your post and comments are music to our ears! Check out our visual data modeling software for NoSQL and multi-model databases at http://hackolade.com