The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “Data Warehouse”

Life and Times of The Data Warrior

Please join me on March 30th at 11AM CDT for my first official post-retirement talk!

I will be chatting with Mike Lampa from Great Data Minds about my career and my plans post-Snowflake. I expect we will cover a wide range of topics, including what community and technical evangelism are all about and what made me jump from nearly 30 years in the Oracle world to a small startup of fewer than 100 people that claimed to have reinvented data warehousing in the cloud.

You can register here for this free event.

See you then!

Kent

The Data Warrior

Mass Data Fragmentation: Reducing ‘Data Puddles’

Shortly before leaving Snowflake last year, I was interviewed for this post about one of the worst-case examples of data silos I had seen – we called them data puddles!

A few years ago, Kent Graziano joined a big organization to work on its data. The first problem was that nobody really knew what data existed or where it lived. Graziano spent his first three months on the job investigating data sources and targets, ultimately creating an enterprise data map to illustrate all the flows. It wasn’t pretty.

“In the end, I discovered that the same data was being sent to three or four places,” he said. In one case raw data was transformed and stored in a data warehouse, then moved from there into another warehouse—which was also pulling in the original raw data.

Graziano, who recently retired from his post as Chief Technical Evangelist at Snowflake, said this scenario is extremely common: data scattered and copied across lakes, warehouses, data marts, SaaS platforms, spreadsheets, test systems, and more. That’s mass data fragmentation, or, more colloquially, data sprawl or data puddles.

Indeed, 75% of organizations do not have a complete architecture in place to manage an end-to-end set of data activities including integration, access, governance, and protection, according to IDC’s State of the CDO research, December 2021. This lack of governance combines with legacy systems, shadow IT, and good intentions to pave the road to a lot of fragmentation.

Check out the rest of the post to learn how data sprawl hurts businesses and what to do about it. Read it all here!

Try not to step into any of those puddles!

Kent

The Data Warrior

Snowflake Resources from The Data Warrior

Since you are on this blog, I assume that means you are “following” me in that techie, non-stalker sort of way. 😉

That being the case, you are aware of my tenure over the last 5+ years at the hottest data-focused software company in the world – Snowflake. And you know I have produced a lot of content (really – A LOT!): everything from blogs to ebooks to videos to podcasts, in addition to my usual array of industry and Snowflake-sponsored webinars and talks.

But if, like me, you have consumed so much content in the last year or so, you have probably lost track of some of it, right?

Well, if you are trying to find something you saw me do (or tweet about doing) but can’t remember what it was (or when or where), and you don’t want to sift through hundreds of Google search results or the thousands of social media posts I have done, look no further!

I decided to make it easier for you (and, frankly, for me) by putting together a pretty comprehensive list on a permanent page here on my site.

I broke it up into videos/podcasts, Thought Leader articles, ebooks, and blogs, and included links to all of them so you can quickly get to the content you need when you need it.

So check it out and bookmark the page now, before you forget or lose track of this post too. 🙂

You’re welcome.

Kent

The Data Warrior

Data Engineering Podcast – What is Snowflake?

A few months back I had the privilege of being interviewed by Tobias Macey on his Data Engineering Podcast. This came about because Tobias actually tweeted at me about wanting to do the interview! In this episode, we spent an hour discussing the ins and outs of the Snowflake Cloud Data Platform. You can find it here. Hope you enjoy it!

Interview Outline

  • How did you get involved in the area of data management?
  • Can you start by explaining what Snowflake is for anyone who isn’t familiar with it?
    • How does it compare to the other available platforms for data warehousing?
    • How does it differ from traditional data warehouses?
      • How do the performance and flexibility affect the data modeling requirements?
  • Snowflake is one of the data stores that is enabling the shift from an ETL to an ELT workflow. What are the features that allow for that approach and what are some of the challenges that it introduces?
  • Can you describe how the platform is architected and some of the ways that it has evolved as it has grown in popularity?
    • What are some of the current limitations that you are struggling with?
  • For someone getting started with Snowflake what is involved with loading data into the platform?
    • What is their workflow for allocating and scaling compute capacity and running analyses?
  • One of the interesting features enabled by your architecture is data sharing. What are some of the most interesting or unexpected uses of that capability that you have seen?
  • What are some other features or use cases for Snowflake that are not as well known or publicized which you think users should know about?
  • When is Snowflake the wrong choice?
  • What are some of the plans for the future of Snowflake?

This is a great podcast series, so you might want to add it to your regular list!

Cheers.

Kent

The Data Warrior & Chief Technical Evangelist at Snowflake

The Snowflake Data Sharehouse. Wow!

Data Sharing for All Your Data

They say the Internet changed everything…

Then Big Data changed everything…

Then the Cloud changed everything…

Well, my friends, Snowflake’s announcement of its new data sharing feature has changed the game again! Your data warehouse in the cloud can now be a data sharehouse.

Building on all these technology evolutions, Snowflake has taken what we can now do with big data in a cloud-native data warehouse to a whole new level by introducing what I like to think of as Data Sharing as a Service (DSaaS).

This may be my new #1 favorite feature of Snowflake.

What is Snowflake Data Sharing?

Snowflake Data Sharing is a new feature that lets you easily, seamlessly, and securely share tables, views, and even entire databases with anyone inside the Snowflake ecosystem in read-only mode. They can then query the data from within their own Snowflake account and even join it to their own internal data as if it were all in their own database.

Snowflake Data Sharing architecture

That means no more reformatting and exporting data to flat files, transmitting them (via secure FTP or some other transfer protocol), and then loading them into your customer’s or partner’s database.

All that time and effort – gone!

Data extraction process – gone!

Data movement – gone!

Data latency – gone!

Extra storage – gone!

You create your database, load the data, then share the data. And once the data object is shared, as you add more data or update the data set, those changes are immediately available for the data consumers to query. No more wasted time waiting for an incremental update file to be built and transmitted.

And you have complete control over who sees what data. In fact, you can revoke anyone’s access instantly with a single command.

Oh – did I mention that the new feature is FREE to all Snowflake customers? It is built into the Standard Edition! (That’s just crazy!)

How does it work?

Only Snowflake can do this because of its unique multi-cluster, shared data architecture, which completely separates compute resources from storage. That is why the data can be stored once (by the data provider) and then shared with an unlimited number of data consumers. The global metadata and security services in Snowflake’s cloud services layer are key components that make sharing not only fast but also secure. With independent compute clusters (i.e., virtual warehouses), data consumers can use whatever amount of compute they require to query the shared data without impacting either the data provider or other data consumers.
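To make the point about independent compute concrete, here is a minimal sketch of what a data consumer might run to get its own virtual warehouse for querying shared data. The warehouse name consumer_wh and the sizes chosen are hypothetical; the point is that resizing this warehouse changes only the consumer's own compute, while the shared data itself stays where the provider stored it.

```sql
-- Hypothetical warehouse owned by the data consumer, sized for light queries
CREATE WAREHOUSE IF NOT EXISTS consumer_wh
  WITH WAREHOUSE_SIZE = 'XSMALL'
       AUTO_SUSPEND = 60      -- suspend after 60 seconds of inactivity
       AUTO_RESUME = TRUE;    -- resume automatically when a query arrives

-- Scale up for a heavier analysis; only the consumer's compute changes
ALTER WAREHOUSE consumer_wh SET WAREHOUSE_SIZE = 'LARGE';
```

Other consumers (and the provider) each use their own warehouses, so one account's heavy query does not compete for this account's compute.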

So the basic process for data sharing is simple:

  1. Data Provider creates a share container with the objects (databases, schemas, tables, or views) to be shared.
  2. Data Provider then grants a Data Consumer account access to the share.
  3. Data Consumer creates a new database that maps to the shared object(s).
  4. Data Consumer then grants access privileges on that database to a role in their account.
  5. Data Consumer starts querying, using the privileged role and their virtual warehouse.

Snowflake Data Sharing setup

Code examples:

Data Provider code:

Here is a scenario where the data provider wants to share just a single table in a database with several accounts. This approach lets the provider verify the configuration and contents of the share before making it visible to other accounts (this is the recommended approach).

CREATE SHARE sales_s1; -- create an empty share

GRANT USAGE on DATABASE sales to SHARE sales_s1; -- add database

GRANT USAGE on SCHEMA sales.east to SHARE sales_s1; -- add schema

GRANT SELECT on TABLE sales.east.new_orders 
             to SHARE sales_s1; -- add table

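SHOW SHARES; -- confirm the share was created

SHOW GRANTS TO SHARE sales_s1; -- verify the objects in the share before adding accounts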

ALTER SHARE sales_s1 ADD ACCOUNTS=a1, a2, a3; -- add accounts

Data Consumer code:

On the consumer side, each account would create a database from the share sales_s1, then grant a role access to the new database so that the role can query the table NEW_ORDERS.

CREATE DATABASE External_SalesData from SHARE ProviderAcct1.sales_s1;

GRANT IMPORTED PRIVILEGES on DATABASE External_SalesData to MyRole;
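Step 5 of the process (querying) isn't shown in the consumer code. A minimal sketch follows, assuming the consumer has a warehouse named consumer_wh that MyRole can use, plus a local table my_db.public.customers; the warehouse, the local table, and the columns customer_id and customer_name are all hypothetical names. The second query shows shared data joined to the consumer's own data.

```sql
USE ROLE MyRole;              -- role granted IMPORTED PRIVILEGES above
USE WAREHOUSE consumer_wh;    -- hypothetical consumer-owned warehouse

-- Read the shared table directly; no data was copied into this account
SELECT *
FROM External_SalesData.east.new_orders
LIMIT 10;

-- Join shared data to the consumer's own (hypothetical) customer table
SELECT c.customer_name, COUNT(*) AS order_count
FROM External_SalesData.east.new_orders AS o
JOIN my_db.public.customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.customer_name;
```

Because the shared database is read-only for the consumer, writes stay with the provider; rows the provider adds to NEW_ORDERS show up the next time these queries run, with no reload step.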

Security – Revoking a Share

If for some reason a Data Provider needs to stop sharing their data, either with a single account or with everyone, that is also easy to do. For a single account, they can remove that account from the share; to cut everyone off, they can either REVOKE the privileges granted or completely DROP the share.

REVOKE SELECT ON TABLE sales.east.new_orders
  FROM SHARE sales_s1;

or just

DROP SHARE sales_s1;
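REVOKE and DROP affect every consumer of the share. Removing just one account uses ALTER SHARE instead. A minimal sketch, reusing the placeholder account a2 from the provider example above:

```sql
-- Remove a single consumer account; a1 and a3 keep their access
ALTER SHARE sales_s1 REMOVE ACCOUNTS = a2;

-- Check the consumer accounts still listed for the share
SHOW SHARES LIKE 'sales_s1';
```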

Unlimited Possibilities for the New Data Economy

So, how can your business change and grow with this capability (which costs you nothing)? Do you have partners who have wanted access to your data but found it too difficult to engineer that data pipeline? Is there a market for your data, and the insights it provides, that you have not even explored?

This feature turns the old Data Warehouse into a modern Data Sharehouse that lets you derive even more value from all your data – with no limits.

With Snowflake Data Sharing, you can now transform your data into a valuable, strategic business asset.

For More Information

For more details on Snowflake Data Sharing, check out these posts:

https://www.snowflake.net/data-sharehouse-brings-forth-new-market/

https://www.snowflake.net/data-sharehouse/

Then download the free ebook “From Data Warehouse to Data Sharehouse” for an even more in-depth look at Snowflake Data Sharing.

And sign up for the live webinar “A Deeper Look at Data Sharing” coming next week.

So what do you think? How could this change your business?

Cheers.

Kent

The Data Warrior
