The Data Warrior

Archive for the tag “#datawarehousing”

Tips for Optimizing the #DataVault Architecture on #Snowflake (Part 2)

SETTING UP FOR MAXIMAL PARALLEL LOADING!

In this post, I discuss how to engineer your Data Vault load in Snowflake Cloud Data Platform for maximum speed.

Because Snowflake separates compute from storage and lets you define multiple independent compute clusters, it offers some truly unique opportunities to configure virtual warehouses for optimal throughput of DV loads.

Along with using larger “T-shirt size” warehouses to increase throughput, multi-cluster warehouses during data loading increase concurrency for even faster loads at scale.
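To make those two tips concrete, here is a minimal sketch using the snowflake-connector-python package. The warehouse name, size, and cluster counts are placeholder assumptions of mine, not settings from the linked post (note that multi-cluster warehouses require Snowflake Enterprise edition or higher):

```python
# A hedged sketch, not a recommendation: names and sizes are illustrative.
import snowflake.connector

# Placeholder credentials -- fill in for your own account.
conn = snowflake.connector.connect(
    account="<your_account>",
    user="<your_user>",
    password="<your_password>",
)
cur = conn.cursor()

# A dedicated load warehouse: the larger "T-shirt size" raises per-query
# throughput, while MIN/MAX_CLUSTER_COUNT lets Snowflake spin up extra
# clusters automatically when many parallel load jobs start to queue.
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS dv_load_wh
      WITH WAREHOUSE_SIZE = 'LARGE'
           MIN_CLUSTER_COUNT = 1
           MAX_CLUSTER_COUNT = 4
           AUTO_SUSPEND = 60
           AUTO_RESUME = TRUE
""")
```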

Get the details – Tips for Optimizing the Data Vault Architecture on Snowflake (Part 2)

Enjoy!

Kent

The Data Warrior & Chief Technical Evangelist for Snowflake

Tips for Optimizing the #DataVault Architecture on #Snowflake

Data Vault is an architectural approach that includes a specific data model design pattern and methodology developed specifically to support a modern, agile approach to building an enterprise data warehouse and analytics repository.

Typical Data Vault Design with Hubs, Sats, and a Link
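For readers new to the pattern, here is a hypothetical sketch of the three core table types from the diagram above, written as Snowflake DDL executed from Python. All table and column names are my own illustration, not taken from the post; Data Vault 2.0 commonly keys these tables on hashes of the business keys:

```python
# Illustrative hub/satellite/link DDL; placeholder connection parameters.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<your_account>", user="<your_user>", password="<your_password>",
    warehouse="dv_load_wh",  # hypothetical warehouse from the earlier sketch
    database="<your_db>", schema="<your_schema>",
)
cur = conn.cursor()

ddl_statements = [
    # Hub: one row per unique business key.
    """CREATE TABLE IF NOT EXISTS hub_customer (
           hub_customer_hk BINARY(20)    NOT NULL,  -- e.g., SHA1 of business key
           customer_bk     VARCHAR       NOT NULL,  -- the business key itself
           load_dts        TIMESTAMP_NTZ NOT NULL,
           record_source   VARCHAR       NOT NULL
       )""",
    # Satellite: descriptive attributes, historized by load timestamp.
    """CREATE TABLE IF NOT EXISTS sat_customer_details (
           hub_customer_hk BINARY(20)    NOT NULL,
           load_dts        TIMESTAMP_NTZ NOT NULL,
           hash_diff       BINARY(20)    NOT NULL,  -- change detection
           customer_name   VARCHAR,
           customer_email  VARCHAR,
           record_source   VARCHAR       NOT NULL
       )""",
    # Link: the relationship between two hubs.
    """CREATE TABLE IF NOT EXISTS link_customer_order (
           link_customer_order_hk BINARY(20)    NOT NULL,
           hub_customer_hk        BINARY(20)    NOT NULL,
           hub_order_hk           BINARY(20)    NOT NULL,
           load_dts               TIMESTAMP_NTZ NOT NULL,
           record_source          VARCHAR       NOT NULL
       )""",
]
for ddl in ddl_statements:
    cur.execute(ddl)
```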

Snowflake Cloud Data Platform was built to be design pattern agnostic. That means you can use it with equal efficiency for 3NF models, dimensional (star) schemas, DV, or any hybrid you might have. Snowflake supports DV designs and handles several DV design variations very well with excellent performance.

This series of blog posts will present some tips and recommendations that have evolved over the last few years for implementing a DV-style warehouse in Snowflake.

Here is the first set of tips: Tips for Optimizing the Data Vault Architecture on Snowflake (part 1)

I hope you find this helpful!

Kent

The Data Warrior and Chief Technical Evangelist for Snowflake

5 Business Needs That Fuel Enterprise Data Warehouse Development

The global market for data warehousing is expected to grow to $34.7 billion by 2025, according to a recent report from Allied Market Research. That’s nearly double the $18.6 billion it was worth in 2017.

What fuels investment in enterprise data warehouse development? Cloud data warehouse technology has given rise to innovative systems and practices that increase efficiency and reduce costs across company functions. Today, departments like marketing, finance, and supply chain operations benefit from a modern data warehouse as much as the organization’s engineering and data science teams.

In this blog post, I list five business priorities that fuel increased investment in modern enterprise data warehouse development. See them here on the Snowflake blog site:

5 Business Needs That Fuel Enterprise Data Warehouse Development

And don’t forget to download my newest ebook (free) listed at the end of the post!

Kent

The Data Warrior

The Elephant in the Data Lake and Snowflake

So is Hadoop finally dead? For many use cases, I think it really is. The cloud and the continued evolution of technology have created newer, better ways of working with data at scale. Check out what Jeff has to say about it!

Jeffrey Jacobs, Consulting Data Architect, Snowflake SnowPro Core Certified

Let’s talk about the elephant in the data lake, Hadoop, and the constant evolution of technology.

Hadoop (symbolized by an elephant) was created to handle massive amounts of raw data that were beyond the capabilities of existing database technologies. At its core, Hadoop is simply a distributed file system. There are no restrictions on the types of data files that can be stored, but the primary file contents are structured and semi-structured text. “Data lake” and Hadoop have been largely synonymous, but, as we’ll discuss, it’s time to break that connection with Snowflake’s cloud data warehouse technology.
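To make that contrast concrete, here is a hedged sketch (my own illustration, not from Jeff’s post) of how the semi-structured files that typically fill a Hadoop data lake can be loaded straight into Snowflake and queried with SQL. The stage, table, and JSON attribute names are hypothetical:

```python
# Assumed setup: a named external stage (@my_json_stage) already points
# at JSON files in cloud storage; connection parameters are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<your_account>", user="<your_user>", password="<your_password>",
    warehouse="<your_wh>", database="<your_db>", schema="<your_schema>",
)
cur = conn.cursor()

# Raw JSON lands in a single VARIANT column -- no upfront schema needed.
cur.execute("CREATE TABLE IF NOT EXISTS raw_events (v VARIANT)")

# Load the files directly from the stage.
cur.execute("""
    COPY INTO raw_events
    FROM @my_json_stage
    FILE_FORMAT = (TYPE = 'JSON')
""")

# Query nested attributes with path notation -- no MapReduce job required.
cur.execute("""
    SELECT v:user.id::VARCHAR AS user_id, COUNT(*) AS events
    FROM raw_events
    GROUP BY 1
""")
```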

Hadoop’s infrastructure requires a great deal of system administration, even in cloud-managed systems. Administration tasks include replication, adding nodes, creating directories and partitions, performance tuning, workload management, data (re-)distribution, etc. Core security tools are minimal, often requiring add-ons. Disaster recovery is another major headache. Although Hadoop is considered a “shared nothing” architecture, all…

