The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “#bigdata”

One more time: Do we still need Data Modeling?

More specifically, do we still need to worry about data modeling in the NoSQL, Hadoop, Big Data, Data Lake world?

This keeps coming up. Today it was via email after a presentation I gave last week. This time the query was about the place of data modeling tools in this new world order.

Bottom line: YES, YES, YES! We still need to do data modeling and therefore need good data modeling tools and skills.

[Image: Snowflake with RI. Caption: A picture can say so much!]

In order to get any business value out of the data, regardless of where or how it is stored, you have to understand the data, right?

That means you have to understand the model of the data. Even if the model (or schema) is not needed upfront to store the data (as it is with schema-on-write), you must discern the model in order to use the data (schema-on-read).
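To make the schema-on-read point concrete, here is a minimal Python sketch. The sample records and the `infer_schema` helper are hypothetical, invented for illustration only. It shows that data stored with no declared schema still has a model, and that you have to discover it (including its inconsistencies) before you can use the data.

```python
import json
from collections import defaultdict


def infer_schema(json_lines):
    """Return {field: set of Python type names} seen across all records."""
    schema = defaultdict(set)
    for line in json_lines:
        record = json.loads(line)
        for field, value in record.items():
            schema[field].add(type(value).__name__)
    return dict(schema)


# Hypothetical raw events, written with no schema declared upfront.
raw_events = [
    '{"customer_id": 1, "amount": 19.99, "region": "West"}',
    '{"customer_id": "2", "amount": 5, "region": null}',
    '{"customer_id": 3, "amount": 12.5}',
]

schema = infer_schema(raw_events)
for field in sorted(schema):
    print(field, sorted(schema[field]))
```

Even in this tiny sample, `customer_id` arrives as both a number and a string, and `region` is sometimes null or missing. Those are exactly the questions a conceptual or logical model forces you to answer with the business.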

It is (mostly) impossible to get repeatable, auditable metrics, KPIs, dashboards, or reports that bring value to the business without understanding the semantics of the data, which means you need at least a conceptual or logical model.

And if you want or need to join data from multiple sources, then you really have to understand each source, or there is no way to properly join it all together and get meaningful results.

There are a few data cleansing, discovery, and “virtualization” tools out there that will help you figure out those relationships, but they are expensive and mostly rely on standard data profiling techniques to find similar data objects across the sets and propose “relationships”. Some allow for the definition of fairly sophisticated matching rules, including customizations. But a human still needs to figure those rules out, then test and validate the results.
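As a rough illustration of what “profiling to propose relationships” can mean, here is a small Python sketch based on value overlap between columns. It is not how any particular tool works; the `propose_relationships` function, the 0.8 threshold, and the sample tables are all assumptions made up for this example.

```python
def propose_relationships(tables, min_overlap=0.8):
    """Propose candidate joins from value overlap between columns.

    tables: {table_name: {column_name: list of values}}
    Returns (child_column, parent_column, overlap) tuples where the share of
    distinct child values also found in the parent column is >= min_overlap.
    """
    columns = [
        (table, column, set(values))
        for table, cols in tables.items()
        for column, values in cols.items()
    ]
    candidates = []
    for t1, c1, v1 in columns:
        for t2, c2, v2 in columns:
            if t1 == t2 or not v1:
                continue  # skip same-table pairs and empty columns
            overlap = len(v1 & v2) / len(v1)
            if overlap >= min_overlap:
                candidates.append((f"{t1}.{c1}", f"{t2}.{c2}", round(overlap, 2)))
    return candidates


# Hypothetical sample data from two sources.
tables = {
    "orders": {"order_id": [100, 101, 102, 103], "cust_id": [1, 2, 2, 3]},
    "customers": {"id": [1, 2, 3, 4], "zip": [97201, 77001, 60601, 10001]},
}

candidates = propose_relationships(tables)
print(candidates)
```

The output is only a list of candidates. Two columns of small integers can overlap by pure coincidence, so someone who knows the data still has to confirm that a proposed relationship means something to the business.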

In the end you still have to know your data.

One of the best ways to do that, in my opinion, is to model that data. Otherwise your data lake will likely become a data swamp!

So keep your data modeling tool and keep building your data dictionary with your business folks.

[Image: Final Stage Table. Caption: A good modeling tool can act as a visual data dictionary too!]

If you agree with me, please share on social media!

#LoveYourData

Kent

The Data Warrior

P.S. If you need a good modeling tool, check out Oracle SQL Developer Data Modeler. And check out my books and training offerings for SDDM on the blog sidebar.

Introduction to Snowflake!

Heads up! I will be giving a webinar next week, called Enabling Cloud-Native Elastic Data Warehousing, to introduce folks to the Snowflake Elastic Data Warehouse. Sign up here and join me on July 12th!

Special thanks to DAMA International for inviting me to do this!

See you there!

Kent

The Data Warrior

Forecast: Partly Cloudy — Connecting #OBIEE to @SnowflakeDB

My good friends at RedPill Analytics have done it again! In their never-ending mission to #ChallengeEverything, they thought it would be cool to try to connect OBIEE (Oracle Business Intelligence Enterprise Edition) to the Snowflake Elastic Data Warehouse as a way to give OBIEE users access to a high-performance data warehouse cloud service. This is their story:

If you’re reading this post, you’ve probably seen my other post on connecting OBIEE to Amazon Web Services’ Redshift database offering. If not, thanks for joining in for this part of the adventure. To summarize the last post, cloud applications are becoming increasingly more common, and because of this fact, more of the data that is produced and consumed will be in the cloud. How can we connect on-premises business intelligence tools to these cloud sources? Specifically, how can we connect OBIEE to them?

See the details on how to connect to Snowflake here:

Forecast: Partly Cloudy (Part 2) — Connecting OBIEE to Snowflake

Here comes the sun!

Kent

The Data Warrior

#Kscope16 Blog Hop: #BigData and #AdvancedAnalytics Sessions Not to Miss

You are attending #KScope16, right?

Me too.

But there are so many sessions to choose from (mine included), which do you pick? How do you pick?

Well, my fellow bloggers and I are here to help you out with a Blog Hop. We are going to give you our top picks for each track. In this post, I will give you my picks for the Big Data and Advanced Analytics track.

Big Data and Advanced Analytics Sessions

Why did I pick this track? Really because it is a necessary adjunct to BI and Data Warehousing. In fact, I find it hard to imagine that these two won’t merge over the next few years (at my company, Snowflake, they really have already). Every company that is investing in BI/DW is also finding that they need to deal with Big Data too. And Advanced Analytics is, to me, the logical extension to BI.

So after looking at the agenda, really most of the sessions are of interest to me (sigh). But in reality I am sure I will not be able to attend them all, so here are my top 5 picks to see at KScope16:

  1. How to Build an Internet of Things Data Pipeline presented by Rex Eng
  2. Oracle Big Data Discovery: Extending into Machine Learning and Advanced Visualizations presented by Mark Rittman
  3. Introduction to Apache Kafka and Real-Time ETL presented by Gwen Shapira
  4. Getting Started with a Data Discovery Lab: You Don’t Have to Go Big to Gain Big presented by Kathryn Watson
  5. Getting Started with Oracle R and OBIEE presented by Kevin McGinley

Why those? Simply because they hit on all the top issues and topics that I see being discussed (or written about) in the field, and I need to get a better grip on these things:
  • IoT – it is here already
  • Machine Learning – I am pretty clueless about this one so far
  • Kafka – ETL/ELT in the cloud
  • Data Discovery – the next step beyond BI
  • R – the language of choice for data scientists

And I actually know all but one of the presenters, so I am sure their talks will be very informative and lively.

The rest of the blog hop:

Thanks for attending this ODTUG blog hop!

Looking for some other juicy cross-track sessions to make your Kscope16 experience more educational? Check out the following session recommendations from fellow experts!

I hope this gives you some great ideas on what to see at KScope16!

See you in Chicago.

Kent

The Data Warrior

P.S. Don’t forget to make time to attend my Morning Chi Gung sessions down by the river to get each day started right with a clear mind and strong heart. Look for signs at the hotel.

 

Support for Multiple Workloads in Snowflake DB

Excerpt from my most recent post on the Snowflake blog:

With our unique Multi-cluster, shared data architecture, Snowflake can easily support multiple and disparate workloads.  This is a common issue in traditional data warehouses so it makes total sense to be able to keep disparate workloads separate, to truly avoid resource contention, rather than just saying we support “mixed” workloads.

Read the rest here: Support for Multiple Workloads

 

Love your data!

Kent

The Data Warrior

P.S. Don’t forget to join me at #WWDVC May 25-28 in Stowe, Vermont. I will be giving away a GoPro after my talk on Thursday!
