The Data Warrior

Changing the world, one data model at a time. How can I help you?

4 Keys to Succeeding with Agile Data Warehousing in 2016

I have been out giving talks again on using agile methods for data warehouse and business intelligence projects, so I thought it was time for me to share my thoughts about the 4 key elements you need to be successful with an Agile DW project in 2016.

Adopt an Agile Methodology

By this I am talking about SCRUM, Kanban, ScrumBan, or DAD (Disciplined Agile Development), among others.

Go read the blogs, read the books, study these methods. Attend a conference (like Agile Tech in April). Figure out what will work for your organization’s culture and leverage the skills of your staff. One size does not fit all.

In past engagements I have used approaches primarily based on SCRUM and Kanban. Both have been very effective once we got our processes down.

If you need/want help, find a good agile coach.

Use an Agile Data Engineering Approach

If you want to develop your data warehouse in an agile, iterative manner, then you need a way to design your EDW repository that lends itself to this approach without causing huge re-engineering pains (known as refactoring) in future iterations.

The best way I have found is using the Data Vault modeling approach. It was designed specifically for building data warehouses in this manner. I have written much about this approach and give many talks showing examples of successful agile projects using Data Vault. And there is plenty of material available to help you learn how to do it (see the books on the sidebar of this blog).

Also keep an eye on Dan Linstedt’s twitter feed and blog for his training classes.

Use Data Warehouse Automation Software

No better way to get agile and deliver results fast, than to automate as much of your development work as possible. If you use repeatable patterns (like Data Vault) in your design methodology, then it is even easier to automate and greatly reduce your time to market.

There are two vendors in the market that I like a lot and have had some experience with. They are WhereScape and AnalytixDS. And both support not only “traditional” approaches to data warehousing (like automating the ETL for a Type 2 Slowly Changing Dimension) but they both also support Data Vault (and both will be at WWDVC 2016).

Which of these tools you might use depends on your approach, your current tools, and your skills.

If you are coming from a more traditional DW paradigm and use ETL tools like Informatica, Talend, or DataStage, then I would recommend you look at AnalytixDS Mapping Manager which allows you to generate your ETL code from source to target mappings.

If you are just getting started or are committed to more of a database-centric approach and want your ETL or ELT code to run in the database, then look at WhereScape’s products.

Both are great companies with knowledgable people and happy customers.

Your third option is to write your own automation routines. There are many shops doing that as well. Just be sure you have the appropriate skills in house and can allocate the upfront time to get going (a month or so at least).

Deploy on an Agile Data Warehouse Platform

So now that I have learned about Elastic Data Warehousing in the cloud, I can’t imagine trying to do an agile DW project any other way.

Of course I am referring to Snowflake Computing’s DWaaS (data warehouse as a service) offering. Yes, I might be a bit biased since I do work for them now, but…this tech is really good!

From a features perspective, what I am talking about is having a high powered, easily scalable database that supports BI and analytic workloads and does not require a ton of time to configure and tweak.

Why do I think that is a success criteria? Because I have spent way too many months on way too many “agile” projects waiting to get access to the hardware! Or I get access and we either run out of space (e.g., “we had no idea you need THAT much storage”) or we can’t properly test production level loads and queries because the development box does not have enough horsepower.

Taking advantage of the elasticity of the cloud solves both of these problems and the folks at Snowflake have successfully built an RDBMS in the cloud that specifically harnesses these features and leverages them for data warehouse and analytic workloads by providing the ability to scale up and scale down both storage and compute resources on demand.

That and its many other features, give me the infrastructure I need to get an agile data warehouse project off the ground almost instantly. And I can do a Data Vault on Snowflake too.

Very cool.

So what do you think? Are you ready to accelerate your team’s performance and adopt an agile approach to data warehousing?

I hope this post gives you a few ideas on how to make that happen.

Model on!

Kent

The Data Warrior

 

Data Vault Modeling and Snowflake Elastic Data Warehouse

Since I have joined Snowflake, I have been asked multiple times what data warehouse modeling approach does Snowflake support best. Well, the cool thing is that we support multiple data modeling approaches equally.

Turns out we have a few customers who have existing data warehouses built using a particular approach known as the Data Vault modeling approach (which my readers no well by now) and they have decided to move into Snowflake (yeah!).

So the conversation often goes like this:

Customer: “Can you do Data Vault on Snowflake?”

Me: “Yes you can! Why do you ask?”

Customer: “Well, your name is “snowflake” so we thought that might mean you only support snowflake-type schemas.”

Me: “Well, yes I can see your confusion in that case, but the name has nothing to do with data warehouse design really. In fact, we support any type of relational design, including Data Vault.”

See the rest of the post here:

Data Vault Modeling and Snowflake

Keep on Modeling!

Kent

The Data Warrior

P.S. Next week I will be in Grapevine, Texas at the Gartner BI & Analytics Summit. Snowflake has a booth there. Come by and say “howdy!”

Connect to Snowflake with JDBC

As promised in my earlier post, here is a deeper dive into the next of the Top 10 Cool Features from Snowflake:

#9 Ability to connect with JDBC

This seems like a no brainer but is very important. If you’re interested in connecting any custom or packaged Java based applications to Snowflake, JDBC is what you need. JDBC technology lets you access information in SQL databases using standard SQL queries.

So why is this cool? Because all of the modern applications written in Java can take advantage of our elastic cloud based data warehouse through a JDBC connection.

And we have plenty of customers doing that today with industry leading tools.

You can read the rest of the post here:

Connect to Snowflake with JDBC

Enjoy!

Kent

The Data Warrior

P.S. Don’t forget to join me in Nashville this week for their TDWI Meetup

Viva Las Vegas!

Later this morning I am flying out to Las Vegas to attend my first HIMSS conference. This is the biggest IT Healthcare related event on the planet.

I will be hanging out at the Snowflake booth (#14078) in the Clinical & Business Intelligence Knowledge Center, ready to talk about elastic data warehousing and how it can help with BI and analytics in healthcare.

Here are the hours:

  • Monday, February 29 6:30 pm – 8:00 pm (Opening Reception)
  • Tuesday, March 1 9:30 am – 6:00 pm
  • Wednesday, March 2 9:30 am – 6:00 pm
  • Thursday, March 3 9:30 am – 4:00 pm

So, if you are attending HIMSS this year, please stop by and say hi.

Kent

The Data Warrior

P.S. I will be tweeting as always and may even attempt to Periscope live from HIMSS. Watch for #ElasticDW #HIMSS16

Snowflake Cool Features: Query History 

As promised in the previous post I wrote for Snowflake, here is a deeper dive into one of the Top 10 Cool Features from Snowflake:

#10 Result sets available via History

There are a lot of times when you want to make a small change to your large query, and want to be able to see the effect of a change quickly without rerunning the previous query. This is hard in most systems because you have to rerun the previous query, using up resources and time. Our solution allows users to view the result sets from queries that were executed previously, via history. One benefit users get is that if they had already executed a complex query that took some amount of time to execute, the user doesn’t have to run the query again to access the previous results. They can just go back to the history, and access the result set. This is also beneficial when working on a development project using the data warehouse. Developers can use the result set history to compare the effects of changes to the query or to the data set, without running the previous queries again.

Read the rest of the post here:

Snowflake Query Result Sets Available to Users via History – Snowflake

Enjoy!

Kent

The Data Warrior

Post Navigation