The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “#NoSQL”

One more time: Do we still need Data Modeling?

More specifically, do we still need to worry about data modeling in the NoSQL, Hadoop, Big Data, Data Lake world?

This keeps coming up. Today it was via email after a presentation I gave last week. This time the query was about the place of data modeling tools in this new world order.

Bottom line: YES, YES, YES! We still need to do data modeling and therefore need good data modeling tools and skills.

Snowflake with RI

A picture can say so much!


In order to get any business value out of the data, regardless of where or how it is stored, you have to understand the data, right?

That means you have to understand the model of the data. Even if a model (or schema) is not needed up front to store the data, as it would be with schema-on-write, you must still discern the model in order to use the data (schema-on-read).
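
To make schema-on-read concrete, here is a minimal sketch in Python. The records, field names, and the `infer_schema` helper are all made up for illustration; the point is only that somebody has to work out the fields and types before the data can be used.

```python
import json
from collections import defaultdict

# Hypothetical raw records, landed in storage with no declared schema.
RAW_LINES = [
    '{"customer_id": 1, "name": "Acme", "region": "West"}',
    '{"customer_id": "2", "name": "Globex"}',
    '{"customer_id": 3, "name": "Initech", "region": null}',
]


def infer_schema(lines):
    """Return {field: sorted list of Python type names seen} across all records."""
    seen = defaultdict(set)
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


if __name__ == "__main__":
    for field, types in infer_schema(RAW_LINES).items():
        print(field, types)
```

Even on three rows, the discovered "schema" raises modeling questions: `customer_id` arrives as both a number and a string, and `region` is sometimes missing or null. Deciding what those mean is data modeling, whether or not anyone draws a diagram.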

It is (mostly) impossible to get repeatable, auditable metrics, KPIs, dashboards, or reports that bring value to the business without understanding the semantics of the data – which means you at least need a conceptual or logical model.

And if you want or need to join data from multiple sources, then you really have to understand each source, or there is no way to properly join it all together to get meaningful results.
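
Here is a small Python sketch of why that matters. The two sources, their column names, and the `join_on` helper are hypothetical; the assumption is that one system stores the customer key as an integer and the other as a zero-padded string.

```python
# Hypothetical source 1: a CRM extract with integer customer keys.
crm = [
    {"customer_id": 7, "name": "Acme"},
    {"customer_id": 12, "name": "Globex"},
]

# Hypothetical source 2: a billing extract with zero-padded string keys.
billing = [
    {"cust_no": "0007", "amount": 250.0},
    {"cust_no": "0012", "amount": 90.0},
    {"cust_no": "0099", "amount": 15.0},
]


def join_on(left, right, left_key, right_key):
    """Inner join two lists of dicts using a key function for each side."""
    index = {left_key(row): row for row in left}
    joined = []
    for row in right:
        match = index.get(right_key(row))
        if match is not None:
            joined.append({**match, **row})
    return joined


# Joining on the raw values finds nothing, because 7 != "0007".
naive = join_on(crm, billing, lambda r: r["customer_id"], lambda r: r["cust_no"])

# Joining with knowledge of both sources normalizes the billing key first.
informed = join_on(crm, billing, lambda r: r["customer_id"], lambda r: int(r["cust_no"]))

if __name__ == "__main__":
    print(len(naive), len(informed))
```

The naive join fails silently rather than loudly, which is exactly how bad numbers end up on a dashboard. Knowing that `cust_no` and `customer_id` are the same business key in two formats is model knowledge.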

There are a few data cleansing, discovery, and “virtualization” tools out there that will help you figure out those relationships, but they are expensive and mostly rely on standard data profiling techniques to find similar data objects across the sets and propose “relationships”. Some allow for the definition of fairly sophisticated matching rules, including customizations. But a human still needs to figure those out, test, and validate the results.
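
As a rough illustration of that kind of profiling (a simplified sketch, not how any particular product works), the Python below proposes a relationship wherever most of one column's values appear in a unique column of another table. The table data, the `propose_relationships` name, and the 0.8 threshold are all assumptions made for the example.

```python
def propose_relationships(tables, min_overlap=0.8):
    """Suggest (child_column, parent_column, overlap) candidates.

    tables: {table_name: {column_name: list_of_values}}
    A column qualifies as a parent key only if its non-null values are unique.
    """
    candidates = []
    for child_table, child_cols in tables.items():
        for child_col, child_vals in child_cols.items():
            child_set = {v for v in child_vals if v is not None}
            if not child_set:
                continue
            for parent_table, parent_cols in tables.items():
                if parent_table == child_table:
                    continue
                for parent_col, parent_vals in parent_cols.items():
                    non_null = [v for v in parent_vals if v is not None]
                    if len(set(non_null)) != len(non_null):
                        continue  # not unique, so not a candidate key
                    overlap = len(child_set & set(non_null)) / len(child_set)
                    if overlap >= min_overlap:
                        candidates.append((
                            f"{child_table}.{child_col}",
                            f"{parent_table}.{parent_col}",
                            round(overlap, 2),
                        ))
    return candidates


# Hypothetical profiled tables.
TABLES = {
    "orders": {"order_id": [100, 101, 102, 103], "cust_id": [1, 2, 2, 3]},
    "customers": {
        "customer_id": [1, 2, 3, 4],
        "name": ["Acme", "Globex", "Initech", "Umbrella"],
    },
}

if __name__ == "__main__":
    for candidate in propose_relationships(TABLES):
        print(candidate)
```

On this sample it proposes `orders.cust_id` to `customers.customer_id`, which happens to be right. Value overlap alone can just as easily pair two unrelated columns of small integers, which is why a person who knows the data still has to review every proposal.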

In the end you still have to know your data.

One of the best ways to do that, in my opinion, is to model that data. Otherwise your data lake will likely become a data swamp!

So keep your data modeling tool and keep building your data dictionary with your business folks.

Final Stage Table

A good modeling tool can act as a visual data dictionary too!

If you agree with me, please share on social media!



The Data Warrior

P.S. If you need a good modeling tool, check out Oracle SQL Developer Data Modeler. And check out my books and training offering for SDDM on the blog sidebar.

Oracle Open World 2012: Day 2

A crazy group of us got the day off to a rip roaring start by heading down to the Dolphin Club on San Francisco Bay for an early morning swim in the bay. We cabbed it down and were greeted by a full moon over the Golden Gate Bridge.

Good Morning San Francisco Bay!

It was about 62 degrees (F) and the water was purported to be 59 degrees (F), even if it felt much colder. Here I am with my friend Debra Lilley (@debralilley) and a few other crazy Oracle people, right before we took the plunge.

Ready to swim the bay!

Well, we all survived. You can see more pictures on Twitter (@kentgraziano) and Facebook if you really feel the need.

Thanks to our buddy Chet (@oraclenerd) for setting this up and shaming us into doing it. It was quite invigorating and a great way to start the day. (I did some Chi Gung on the beach too, in order to prepare and to warm back up.) Next year, we want ribbons or t-shirts or something for the effort.

After trekking back to the hotel and cleaning up, it was on to the conference.

Missed the keynotes but instead got to attend a real live press briefing (thanks to my blogger status) with Mark Hurd, the president of Oracle Corporation.

Mark Hurd Press Conference

It was great to be in the small room with all the reporters and bloggers, getting the scoop firsthand from Mr. Hurd about Oracle’s strategy.

Oracle Strategy

The slide sums it up well – simplify IT by providing a complete stack of software and hardware and by giving customers complete choice. The choice now is to host your own setup, use a private cloud, use a public cloud (hosted by Oracle), or use a hybrid model. You pick where and what you want hosted. You can mix and match and change your mind later. Sounds like a good idea. The next year or so will show us how well it works as a model.

Mark had a lot to tell us, much of which you will be able to read elsewhere in the mainstream tech media. The thing that got my attention was the fact that Oracle has spent over $14 billion (with a B) in the last two years on R&D, and over $6 billion in the last year on mergers and acquisitions. It is good to be Uncle Larry.

Most memorable quote from Mark: “we are the best”.

In other news…

The exhibit halls opened today. Bigger and more stuffed than before. A dizzying array of vendors hawking their wares. This year there is even an Airstream trailer and a very large Buddha in the hall (check my Twitter stream for pictures of those).

OOW 2012 Exhibit Hall Opens

I spent some time in the hall today catching up with some product managers, learning about Oracle NoSQL, and even talking to the MongoDB guys (another NoSQL engine). So much information, so little time.

Went to a few sessions as well. Checked out Big Data Mining and RDF Graph Tools. Still trying to get my head around why you would use these other technology approaches like RDF, NoSQL, and Hadoop. Spatial I get, since I did some GIS work in the past, but these others are harder. Lots of companies seem to be including them in their overall solution architecture, so there is something to it. I think I just have not run across a real need on my projects (at least not yet).

Like Oracle Endeca, there are a lot of advances in what Oracle is building in this space.

I am sure it will sink in eventually.

Oracle In-Database Analytics Platform

On the networking side, I attended the ODTUG reception this evening and managed to hear the last two tunes from Jimmy Cliff, who was performing at the Oracle Music Festival.

Tomorrow – chi gung in the morning, two keynotes and a few sessions. Then the first ever Tweet Meet at OOW.

More to come…



