The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “Data Warehouse”

Data Engineering Podcast – What is Snowflake?

A few months back I had the privilege of being interviewed by Tobias Macey on his Data Engineering Podcast show. This came about because Tobias actually Tweeted at me about wanting to do the interview! In this episode we spent an hour discussing the ins and outs of the Snowflake Cloud Data Platform. You can find it here. Hope you enjoy it!

Interview Outline

  • How did you get involved in the area of data management?
  • Can you start by explaining what Snowflake is for anyone who isn’t familiar with it?
    • How does it compare to the other available platforms for data warehousing?
    • How does it differ from traditional data warehouses?
      • How does the performance and flexibility affect the data modeling requirements?
  • Snowflake is one of the data stores that is enabling the shift from an ETL to an ELT workflow. What are the features that allow for that approach and what are some of the challenges that it introduces?
  • Can you describe how the platform is architected and some of the ways that it has evolved as it has grown in popularity?
    • What are some of the current limitations that you are struggling with?
  • For someone getting started with Snowflake what is involved with loading data into the platform?
    • What is their workflow for allocating and scaling compute capacity and running analyses?
  • One of the interesting features enabled by your architecture is data sharing. What are some of the most interesting or unexpected uses of that capability that you have seen?
  • What are some other features or use cases for Snowflake that are not as well known or publicized which you think users should know about?
  • When is Snowflake the wrong choice?
  • What are some of the plans for the future of Snowflake?

This is a great podcast series, so you might want to add it to your regular list!

Cheers.

Kent

The Data Warrior & Chief Technical Evangelist at Snowflake

The Snowflake Data Sharehouse. Wow!

Data Sharing for All Your Data

They say the Internet changed everything…

Then Big Data changed everything…

Then the Cloud changed everything…

Well my friends, Snowflake’s announcement of its new data sharing feature has changed the game again! Your data warehouse in the cloud can now be a data sharehouse.

Building on all these technology evolutions, Snowflake has taken what we can now do with big data in a cloud-native data warehouse to a whole new level by introducing what I like to think of as Data Sharing as a Service (DSaaS).

This may be my new #1 favorite feature of Snowflake.

What is Snowflake Data Sharing?

Snowflake Data Sharing is a new feature that lets you easily, seamlessly, and securely share tables, views, even entire databases with anyone inside the Snowflake ecosystem, in a read-only mode. They can then query the data from within their own Snowflake account and even join it to their own internal data as if it were all in their database.

Snowflake Data Sharing architecture

That means no more need to reformat and export data to flat files so they can be transmitted (via secure FTP or some other transfer protocol) and then loaded into your customer’s or partner’s database.

All that time and effort – gone!

Data extraction process – gone!

Data movement – gone!

Data latency – gone!

Extra storage – gone!

You create your database, load the data, then share the data. And once the data object is shared, as you add more data or update the data set, those changes are immediately available for the data consumers to query. No more wasted time waiting for an incremental update file to be built and transmitted.

And you have complete control over who sees what data. In fact, you can revoke anyone’s access instantly with a single command.

Oh – did I mention that the new feature is FREE to all Snowflake customers? It is built into the standard edition! (That’s just crazy!)

How does it work?

Only Snowflake can do this because of its unique multi-cluster, shared data architecture, which completely separates compute resources from storage. That is why the data can be stored once (by the data provider) and then be shared with an unlimited number of data consumers. The global metadata and security services in Snowflake’s cloud services layer are the key components that make sharing not only fast but secure. With independent compute clusters (i.e., virtual warehouses), data consumers can use whatever amount of compute they require to query and use the shared data without impacting either the data provider or other data consumers.

So the basic process for data sharing is simple:

  1. Data Provider creates a share container with the objects (databases, schemas, tables, or views) to be shared.
  2. Data Provider then grants a Data Consumer account access to the share.
  3. Data Consumer creates a new database that maps to the shared object(s).
  4. Data Consumer then grants access privileges to a role in their account.
  5. Data Consumer starts querying, using the privileged role and their virtual warehouse.

Snowflake Data Sharing setup

Code examples:

Data Provider code:

Here is a scenario where the data provider wants to share just a single table in a database with several accounts. The provider creates an empty share, adds the objects to it, and only adds the consumer accounts as the last step. This approach allows the provider to verify the configuration and contents of the share before making it visible to other accounts (this is the recommended approach).

CREATE SHARE sales_s1; -- create an empty share

GRANT USAGE ON DATABASE sales TO SHARE sales_s1; -- add database

GRANT USAGE ON SCHEMA sales.east TO SHARE sales_s1; -- add schema

GRANT SELECT ON TABLE sales.east.new_orders
             TO SHARE sales_s1; -- add table

SHOW SHARES; -- list shares to confirm the new share before adding accounts

ALTER SHARE sales_s1 ADD ACCOUNTS=a1, a2, a3; -- add accounts
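
To check what a share actually exposes, a provider could also inspect its grants. This is only a minimal sketch that assumes the sales_s1 share from the example above. Running the first statement before the ALTER SHARE step is one way to confirm that only the intended objects are in the share.

```sql
-- Assumes the share sales_s1 created in the provider example above.

-- Objects (database, schema, table) that have been granted to the share.
SHOW GRANTS TO SHARE sales_s1;

-- Accounts that the share has been made available to.
SHOW GRANTS OF SHARE sales_s1;
```

The first statement lists what consumers will be able to see, and the second lists who can see it.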

Data Consumer code:

On the consumer side, each account would create a database from the share sales_s1, then grant privileges on the new database to a role so that role can query the table NEW_ORDERS.

CREATE DATABASE External_SalesData FROM SHARE ProviderAcct1.sales_s1; -- read-only database mapped to the share

GRANT IMPORTED PRIVILEGES ON DATABASE External_SalesData TO ROLE MyRole; -- let MyRole query the shared objects
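
The last step in the process, querying the shared data, might look something like the sketch below. The warehouse name (consumer_wh), the local table (MyDb.public.customers), and all of the column names are hypothetical and only there for illustration. The sketch also assumes MyRole already has usage on that warehouse and access to the local table.

```sql
-- Hypothetical names: consumer_wh, MyDb.public.customers,
-- and the columns customer_id, customer_name, order_total.
USE ROLE MyRole;
USE WAREHOUSE consumer_wh;  -- the consumer's own compute runs the query

-- Join the shared, read-only table to the consumer's own local data.
SELECT c.customer_name,
       SUM(o.order_total) AS total_sales
FROM External_SalesData.east.new_orders AS o
JOIN MyDb.public.customers AS c
  ON c.customer_id = o.customer_id
GROUP BY c.customer_name;
```

Because the query runs on the consumer's virtual warehouse, it has no impact on the provider's compute.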

Security – Revoking a Share

If for some reason a Data Provider needs to stop sharing their data, either with a single account or with everyone, that is also easy to do. They can either REVOKE the privileges granted or completely DROP the share.

REVOKE SELECT ON TABLE sales.east.new_orders
  FROM SHARE sales_s1;

or just

DROP SHARE sales_s1;
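
To cut off just one consumer while leaving the share in place for everyone else, the account can be removed from the share instead. A minimal sketch, assuming the share and account names (sales_s1, a2) from the provider example:

```sql
-- Assumes share sales_s1 and consumer account a2 from the provider example.
-- Removes only account a2; accounts a1 and a3 keep their access.
ALTER SHARE sales_s1 REMOVE ACCOUNTS = a2;
```

By contrast, the REVOKE takes the table out of the share for every consumer, and the DROP removes the share altogether.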

Unlimited Possibilities for the New Data Economy

So, how can your business change and grow with this capability (that costs you nothing)? Do you have partners that have wanted access to your data but found it too difficult to engineer that data pipeline? Is there a market for your data, and the insights it provides, that you have not even explored?

This feature redefines the old Data Warehouse into a modern Data Sharehouse that lets you derive even more value from all your data – with no limits.

With Snowflake Data Sharing, you can now transform your data into a valuable, strategic business asset.

For More Information

For more details on Snowflake Data Sharing, check out these posts:

https://www.snowflake.net/data-sharehouse-brings-forth-new-market/

https://www.snowflake.net/data-sharehouse/

Then download the free ebook “From Data Warehouse to Data Sharehouse” for an even more in-depth look at Snowflake Data Sharing.

And sign up for the live webinar “A Deeper Look at Data Sharing” coming next week.

So what do you think? How could this change your business?

Cheers.

Kent

The Data Warrior

Cloud Analytics Conference – London!

Next up on The Data Warrior speaking tour 2017 is the Snowflake Cloud Analytics Conference in London on June 1st!

Snowflake is kicking off this year’s Cloud Analytics City Tour with a blowout event in London, England. This will be a full-day, workshop-style event where you get to hear and learn from industry veterans and thought leaders like myself and the CEO of Snowflake Computing, Bob Muglia (to name just a few). In addition, we will have a Practitioner Panel discussion that includes several of our customers along with other industry thought leaders.

The unique value proposition for this event is that in the afternoon you can choose from two tracks of in-depth sessions related to implementing your BI solutions and your data warehouse in the cloud.

I will be presenting my talk Agile Methods and Data Warehousing: How to Deliver Faster. My highly seasoned colleagues from Snowflake (all industry experts) will teach you about loading data in the cloud, deploying BI in the cloud, and how to best use Snowflake to be successful with your cloud analytics program.

And of course there will be food, drinks, and networking.

You can find all the agenda details here along with the registration form. Use discount code DATAWARRIOR for 50% off the registration fee. Sign up today!

This will be my first time ever in London, so if you are in the area, please come by, say “hi” and learn about the new world of Cloud Analytics.

Until then, cheers!

Kent

The Data Warrior

P.S. I will be in London the day before and after the event, so if you want to have a more detailed or personalized discussion of the benefits of cloud-native data warehousing, please reach out to me at kent.graziano@snowflake.net.

Meet me in St. Louie, Louie.

Next up on the Data Warrior speaking schedule is the St. Louis SilverLinings event on May 2nd. It will be held at the St. Charles Convention Center, St. Louis, MO.

This promises to be a very exciting event boasting “edgy” and forward-looking technical topics. It’s going to be a very busy day for me, with three talks in total on some of my favorite topics.

Topic 1: Demystifying Data Warehousing as a Service: Top 10 Cool Features in Snowflake

Topic 2: Agile Methods and Data Warehousing: How to Deliver Faster

Topic 3: Agile Data Engineering: Introduction to Data Vault Data Modeling

So if you are in the St. Louis area, or fancy a trip to the Gateway to the West, please join me there on May 2nd.

Special Discount for Data Warrior fans!

The organizers were kind enough to offer my followers a 50% discount. Wow!

Just use this code when you sign up: KGraz280790

So what are you waiting for? Register here.

See you soon!

Kent

The Data Warrior

Data Vault 2.0 Online Training – Early Adopter

Finally! People have been asking for this literally for years – to be able to get authentic Data Vault 2.0 (CDVP2) training in an online format.

Please remember there are no refunds, and to get the best deal on the Early Adopter offer ($300 off), you must purchase by Friday, March 24th, 2017. After that, the price goes up to $997.

So if you have been waiting to get Data Vault 2.0 training straight from the inventor, Dan “Data Vault” Linstedt – this is your chance! Get it here.

Happy Vaulting!

Kent

The Data Warrior

NB: I have seen the videos and can say the content is the quality and caliber I expect from Dan and Sanjay, but you should also know that by buying via the links in this post, I will get a cut. Thank you.

P.S. Don’t forget about the upcoming World Wide Data Vault Consortium in Stowe this May. Sign up here.
