The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “Data Warehouse”

Data Vault and Data Mesh – A Match Made in the Cloud?

Check out my latest thoughts about Data Vault and Data Mesh:

Over the last six years (basically since I joined Snowflake), I have witnessed a massive increase in the interest in, and implementation of, Data Vault 2.0. I have talked to literally hundreds of companies across the globe and across all industries about changing their approach to building an enterprise data platform. It was sort of mind-boggling how many folks wanted to speak to me about this. So why, after almost two decades of successful data vault implementations, have so many people “suddenly” become interested in Data Vault?

Well, a few reasons:

1. They are moving to the cloud (in this case, Snowflake) and figured it was time to look at their approach to data warehousing and data lakes.

2. What they had been doing for decades, on critical review, really was not working (i.e., lots of expensive re-engineering all the time) and definitely could not scale.

3. Things are changing so rapidly, they needed to find a way to be more agile.

Read the rest of the post on the Data Rebels site to find out how Data Vault relates to Data Mesh: Data Vault and Data Mesh

What do you think?

Have a great week!

Kent

The Data Warrior

Life and Times of The Data Warrior

Please join me on March 30th at 11AM CDT for my first official post-retirement talk!

I will be chatting with Mike Lampa from Great Data Minds about my career and plans post-Snowflake. I expect we will cover a wide range of topics, including what community and technical evangelism are all about, as well as what made me jump from nearly 30 years in the Oracle world to a small startup with fewer than 100 people that claimed to have re-invented data warehousing in the cloud.

You can register here for this free event.

See you then!

Kent

The Data Warrior

Mass Data Fragmentation: Reducing ‘Data Puddles’

Shortly before leaving Snowflake last year, I was interviewed for this post about one of the worst-case examples of data silos I had seen – we called them data puddles!

A few years ago, Kent Graziano joined a big organization to work on its data. The first problem was that nobody really knew what and where all the data was. Graziano took his first three months on the job investigating data sources and targets, ultimately creating an enterprise data map to illustrate all the flows. It wasn’t pretty.

“In the end, I discovered that the same data was being sent to three or four places,” he said. In one case raw data was transformed and stored in a data warehouse, then moved from there into another warehouse—which was also pulling in the original raw data.

Graziano, who recently retired from his post as Chief Technical Evangelist at Snowflake, said this scenario is entirely common. Data is scattered and copied across lakes, warehouses, data marts, SaaS platforms, spreadsheets, test systems, and more. That’s mass data fragmentation, or, more colloquially, data sprawl or data puddles.

Indeed, 75% of organizations do not have a complete architecture in place to manage an end-to-end set of data activities including integration, access, governance, and protection, according to IDC’s State of the CDO research, December 2021. This lack of governance combines with legacy systems, shadow IT, and good intentions to pave the road to a lot of fragmentation.

Check out the rest of the post to learn how data sprawl hurts businesses and what to do about it. Read it all here!

Try not to step into any of those puddles!

Kent

The Data Warrior

Snowflake Resources from The Data Warrior

Since you are on this blog, I assume that means you are “following” me in that techie, non-stalker sort of way. 😉

That being the case, you are aware of my tenure for the last 5+ years at the hottest data-focused software company in the world – Snowflake. And you know I have produced a lot of content (really – A LOT!). Everything from blogs to ebooks to videos to podcasts – in addition to my usual array of industry and Snowflake-sponsored webinars and talks.

But if, like me, you have consumed so much content in the last year or so, you have probably lost track of some of it, right?

Well, if you are trying to find something you saw me do (or tweet about doing) but just can’t remember what it was (or when or where), and you don’t want to sift through the hundreds of Google search results or the thousands of social media posts I have done, look no further!

I decided to make it easier for you (and me frankly) by putting a pretty comprehensive list together on a permanent page here on my site.

I broke it up into videos/podcasts, thought leader articles, ebooks, and blogs, and have included links to all of these so you can quickly get to the content you need when you need it.

So check it out and bookmark the page now, before you forget or lose track of this post too. 🙂

You’re welcome.

Kent

The Data Warrior

Data Engineering Podcast – What is Snowflake?

A few months back, I had the privilege of being interviewed by Tobias Macey on his Data Engineering Podcast. This came about because Tobias actually tweeted at me about wanting to do the interview! In this episode, we spent an hour discussing the ins and outs of the Snowflake Cloud Data Platform. You can find it here. Hope you enjoy it!

Interview Outline

  • How did you get involved in the area of data management?
  • Can you start by explaining what Snowflake is for anyone who isn’t familiar with it?
    • How does it compare to the other available platforms for data warehousing?
    • How does it differ from traditional data warehouses?
      • How does the performance and flexibility affect the data modeling requirements?
  • Snowflake is one of the data stores that is enabling the shift from an ETL to an ELT workflow. What are the features that allow for that approach and what are some of the challenges that it introduces?
  • Can you describe how the platform is architected and some of the ways that it has evolved as it has grown in popularity?
    • What are some of the current limitations that you are struggling with?
  • For someone getting started with Snowflake what is involved with loading data into the platform?
    • What is their workflow for allocating and scaling compute capacity and running analyses?
  • One of the interesting features enabled by your architecture is data sharing. What are some of the most interesting or unexpected uses of that capability that you have seen?
  • What are some other features or use cases for Snowflake that are not as well known or publicized which you think users should know about?
  • When is Snowflake the wrong choice?
  • What are some of the plans for the future of Snowflake?

This is a great podcast series, so you might want to add it to your regular list!

Cheers.

Kent

The Data Warrior & Chief Technical Evangelist at Snowflake
