The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “data warehousing”

Snowflake SQL: Making Schema-on-Read a Reality (Part 1) 

This is my first official post on the Snowflake blog in my new role as their Technical Evangelist. It discusses getting results from semi-structured JSON data using our extensions to ANSI SQL.

Schema? I don’t need no stinking schema!

Over the last several years, I have heard the phrase schema-on-read used to explain the benefit of loading semi-structured data into a Big Data platform like Hadoop. The idea is that you can delay data modeling and schema design until long after the data is loaded (so as not to slow down getting your data while waiting for those darn data modelers).

Every time I heard it, I thought (and sometimes said) – “but that implies there is a knowable schema.”  So really you are just delaying the inevitable need to understand the structure in order to derive some business value from that data. Pay me now or pay me later.
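To make that point concrete, here is a minimal sketch (plain Python, not Snowflake SQL; the record and key names are made up for illustration) of why "schema-on-read" still implies a knowable schema:

```python
import json

# A made-up semi-structured record, loaded with no declared schema.
record = json.loads('{"customer": {"name": "Acme", "orders": [{"total": 42.5}]}}')

# Reading a value "schema-free" still requires knowing the structure:
# the key names, the nesting, and the types ARE the schema, just undeclared.
total = record["customer"]["orders"][0]["total"]
print(total)  # 42.5
```

If the source system renames "total" to "amount" tomorrow, that read breaks. The schema was always there; you only deferred discovering it.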

Why delay the pain?

Check out the rest of the post here:

Snowflake SQL: Making Schema-on-Read a Reality (Part 1) – Snowflake

Enjoy!

Kent

The Data Warrior

Tech Tip: Connect to Snowflake db using #SQLDevModeler

So, some of you may have noticed that I took a “real” job this week. I am now the Senior Technical Evangelist for a cool startup company called Snowflake Computing.

Basically we provide a data warehouse database as a service in the cloud.

Pretty cool stuff. (If you want to know more, check out our site at snowflake.net)

I will talk more about the coolness of Snowflake (pun intended) in the future, but for now I just want to show you how easy it is to connect to.

Of course, the first thing I want to do when I meet a new database is see if I can connect my favorite data modeling tool, Oracle SQL Developer Data Modeler (SDDM), to it and reverse engineer some tables.

The folks here told me that tools like Informatica, MicroStrategy, and Tableau connect just fine using either JDBC or ODBC, and that since we are ANSI SQL compliant, there should be no problem.

And they were right. It was almost as easy as connecting to Oracle but it was WAY easier than connecting to SQL Server.

First you need a login to a Snowflake database. No problem here. Since I am an employee, I do get a login. Check.

We have both a web UI and a desktop command-line tool. It turned out I needed the command-line tool, which incidentally needed our Snowflake JDBC connector to work. I followed the Snowflake documentation and downloaded the JDBC driver (to my new Mac!). Piece of cake.

So connecting from SDDM is really easy. First add the third-party JDBC driver in preferences: Preferences -> Data Modeler -> Third Party JDBC Driver (press the green + sign, then browse to the driver).

Add JDBC Driver

As you can see our JDBC driver is conveniently named snowflake_jdbc.jar.

Next step is to configure the database connection. To do this you go to File -> Import -> Data Dictionary, then add a new connection in the wizard.

Configure Connection

Give it a name and login information, then go to the JDBC tab.

So getting the URL was the trick (for me anyway). Luckily the command line tool displayed the URL when I launched it in a terminal window, so I just copied it from there (totally wild guess on my part).

So the URL (for future reference) is:

jdbc:snowflake://sfcsandbox.snowflakecomputing.com:443/?account=<service name>&user=<account>&ssl=on

Where account is whatever you named your account in Snowflake (once you have one of your very own that is).
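For future reference, the URL is just string assembly. Here is a small sketch (plain Python, not part of the Snowflake driver or tooling; the helper function name is my own) showing how the pieces fit together:

```python
# Assemble the Snowflake JDBC URL from its parts. This helper is purely
# illustrative -- substitute your real account and user names for the
# placeholders when you have an account of your very own.
def snowflake_jdbc_url(host, account, user, port=443):
    return (f"jdbc:snowflake://{host}:{port}/"
            f"?account={account}&user={user}&ssl=on")

print(snowflake_jdbc_url("sfcsandbox.snowflakecomputing.com",
                         account="<service name>", user="<account>"))
```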

The driver class was a little trickier – I had to read our documentation! Thankfully it is very good and has an entire section on how to connect using JDBC. In there I found the driver class name:

com.snowflake.client.jdbc.SnowflakeDriver

That was it.

I pushed the Test button and success!

Now to really test it, I did a typical reverse engineer: I was able to see the demo schema and tables and brought them all in.

Snowflake Schema

Demo schema in Snowflake (no, not a snowflake schema!)

So I call that a win.

Not a bad week's work really:

  1. New job orientation
  2. Started learning a new tech and “the cloud”
  3. Got logged in
  4. Installed SDDM on a Mac for the first time ever!
  5. Configured it to speak to an “alien” database
  6. Successfully reverse engineered a schema
  7. Blogged about it.

So that was my first week as a Senior Technical Evangelist.

TGIF!

Kent

still, The Data Warrior

P.S. If you want to see more about my week, just check my twitter stream and start following @SnowflakeDB too.

Better Data Modeling: What is #DataVault 2.0 and Why do I care?

Have you heard?

Dan Linstedt has just had his new book published on Data Vault 2.0. It is called Building a Scalable Data Warehouse with Data Vault 2.0. If you are at all into data warehousing and agile design, you need to get this book now. So click here and be done.

For those of you not sure what this DV 2.0 stuff is all about and why you might want to learn about it, I recently did a series of guest posts for Vertabelo to introduce folks to the concepts. In the series I walk you through some of the history of Data Vault and why we need a new way to do data warehousing. Then I get into the basics of modeling the Data Vault, the Business Vault, and finally building Information Marts from a Data Vault.

So you can find the posts here:

#1 – Agile Modeling: Not an Option Anymore

#2 – Data Vault 2.0 Modeling Basics

#3 – The Business Data Vault

#4 – Building an Information Mart with Your Data Vault

Once you have read these, I am sure you will want to go buy the new Data Vault 2.0 book and maybe sign up for some online training on LearnDataVault.com

Model on!

Kent

The Data Warrior

P.S. If you want to catch up, you can still purchase the original Data Vault (1.0) modeling book Super Charge Your Data Warehouse. It is a great reference book to have on hand (you can get it on Kindle too). Might as well have the whole set.

P.P.S. I turned this series into a Kindle ebook for easier reference, you can find it on my Author Profile or just click on the book cover in the right side bar above.

Better Data Modeling: 7 Differentiating Characteristics of Data Vault 2.0

Hard to believe that the 2nd Annual World Wide Data Vault Consortium (WWDVC15) is NEXT WEEK in beautiful Stowe, Vermont. It promises to be an excellent event. The speakers include myself, Claudia Imhoff, Dan Linstedt (the inventor of Data Vault), Scott Ambler, Roelant Vos, Dirk Lerner, and many more. The focus will be DV 2.0, agile data warehousing, big data, NoSQL, virtualization, and automation. Check out the agenda here: http://wwdvc.com/schedule/

So in preparation (and to encourage you to attend), I thought it might be good to review some of the important basics about Data Vault 2.0 and why it is an important evolution for the data warehousing community.

The approach started out under its official name, the Common Foundational Warehouse Modeling Architecture. It then became more commonly known as the “Data Vault” and grew into a modeling method for data warehouses. It also had a methodology with implementation guidelines, and it worked very, very well on relational platforms for many, many years (over 10 years, for those who did not know).

But technology evolved. NoSQL architectures came into the picture primarily as sources. The Apache Hadoop platform started offering a cheaper storage and processing MPP architecture.

Data Vault evolved into Data Vault 2.0 and already has many successful implementations. The original Data Vault is now referred to as Data Vault 1.0 (or DV 1.0), and it primarily has a modeling focus. DV 2.0, on the other hand, changes some things and adds a LOT.

Data Vault 2.0 has the following 7 differentiating characteristics:

1. DV 2.0 is a complete system of Business Intelligence. It covers everything from concept to delivery. While DV 1.0 had a major focus on modeling and many of the modeling concepts are similar, DV 2.0 goes a step further and talks about data from source to business-user-facing constructs, with guidelines for implementation, agile, virtualization, and more.

2. DV 2.0 can adapt to changes better than pretty much ANY other data warehouse architecture or framework. It can do it even better than DV 1.0 because of the change in design to adapt to NoSQL and MPP platforms, if needed. DV 2.0 has successfully been implemented on MPP RDBMS platforms like Teradata as well (ask Dan for details).

3. DV 2.0 is both “big data” and “NoSQL” ready. In fact, there are implementations where data is sourced in real-time from NoSQL databases with phenomenal success stories. One of these was presented at the WWDVC 2014 where an organization saved lots of money by using this architecture.

A near real-time case study for absorbing data from MongoDB is being presented at WWDVC2015. It’s not to be missed.

4. DV 2.0 takes advantage of MPP-style platforms and is designed with MPP in mind. While DV 1.0 also did this to an extent, DV 2.0 takes it to a whole other level with a zero-dependency type of architecture. Of course, there are a few caveats you will need to learn.

5. DV 2.0 lets you easily tie structured and multi-structured data together (logically) where you can join data across environments easily. This particular aspect lets you build your Data Warehouse on multiple platforms while using the most appropriate storage platform to the particular data set. It lets you build a truly distributed Data Warehouse.

6. DV 2.0 has a greater focus on agility with principles of Disciplined Agile Delivery (DAD) embedded in the architecture and approach. Again, being agile was certainly possible with DV 1.0, but it wasn’t a part of the methodology. DV 2.0 is not just “agile ready”, it’s completely agile.

7. DV 2.0 has a very strong focus on both automation and virtualization, as much as possible. There are already a couple of automation tools in the market that have Dan’s approval (just ask). Some of them will be at WWDVC15.

It’s real-time ready, cloud ready, NoSQL ready and big data friendly. And practitioners have already had success in all these areas (on real projects not just in the lab).

And, as you’ll notice on the agenda, the focus at WWDVC15 will be Data Vault 2.0 with examples of sourcing it from MongoDB, with examples of virtualization (from me!), with examples of design mods (also one from me), with examples of Hadoop implementations and more. It’s not something you want to miss, and there’s hardly any time or seats left.

If you are coming, I look forward to seeing you and chatting about the world of DW/BI and agile. If you want to attend, grab one of the last seats over at http://wwdvc.com/#tile_registration  (if there are still seats left by the time you get this message).

See you soon!

Kent

The Data Warrior

P.S. After the conference, the next place you’ll hear about DV 2.0 is in Berlin. There is a boot camp and certification starting June 16th in Berlin, Germany. The details are here: http://www.doerffler.com/en/data-vault-training/data-vault-2-0-boot-camp-and-certification-berlin/

The 12 Steps to Faster Data Warehouse Success

Announcement!

I have exciting news!

With the help of my good friend Dan Linstedt (of LearnDataVault.com fame), I have just launched my first online training video, based on my very popular white paper and talk: Agile Methods and Data Warehousing: How to Deliver Faster.

Most of you will agree that data warehousing and business intelligence projects take too long to deliver tangible results. I am sure all you project and program managers wish it was not true.

Often by the time a solution is in place, the business needs have changed.

With all the talk about Agile development methods, including SCRUM and Extreme Programming, the question arises as to how these approaches can be used to deliver data warehouse and business intelligence projects faster. This new online course will look at the 12 principles behind the Agile Manifesto and see how they can be applied in the context of a data warehouse project. Then I will talk about some of the specific agile techniques I have used with great success on my projects over the last 15 years. The goal is to determine a method or methods to get a more rapid (2-4 week) delivery of portions of an enterprise data warehouse architecture.

The last time I gave this talk, in Helsinki, Finland at Harmony 2014, I had standing room only and ended up being rated the 2nd best speaker at the event (pretty cool!). It was so popular that the UK Oracle Users Group asked me to write an article on the same topic for their international newsletter.

Since many of you don’t get the chance to travel to events like this (or may have missed my session), you can now see my talk online, at your convenience, for much less than the cost of a conference fee (and the airfare to get there!). We just filmed it last week, after I completed my most recent agile data warehouse engagement, so it contains some new insights and stories that even the folks in Helsinki did not get to hear.

As a bonus, once you have finished the course, you will be able to download a free copy of the detailed article I wrote for UKOUG.

If you have questions during or after the course, you can post them right there in the training portal where I will answer them. So in addition to the training course and the white paper, you also get interactive access to me!

How do I sign up?

So how do you sign up for this new class and how much does it cost?

Well, the full price for the course will be $199, but for those of you who read my blog, I have a special Valentine’s Day offer: if you are one of the first 50 people to purchase the class between now and midnight February 15, 2015, you get a full 50% off the retail price.

So that is $99.50 for over an hour of valuable content PLUS a copy of my white paper (and access to ask me your burning questions).

Use the coupon code: GRAZIANO50

You can buy it now by going to the all new Learn Data Vault training portal now.

On the site you will see the class description, outline, and my introductory video, along with the “Buy Now!” button.

So hurry and cash in my special gift to you before the time is up (remember after 2/15/15 it will be $199).

Applying Agile

For those of you who had no idea there were 12 Principles behind the Agile Manifesto, let me tell you about one that I think is vitally important: Principle #6

The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.

This means the team works best when co-located so they can easily talk to each other during the day.

HINT: If not co-located, you need to be sure you have an adequate instant messaging system in place to facilitate their daily interaction. And that the team agrees to use it!

In addition, another best practice is to hold Team Huddles every morning. In the class, I give a lot of details about huddles and how they work, but the main point is that the team needs to meet briefly once a day (usually in the morning) to make sure they are all on the same page about what everyone is working on.

I can tell you for a fact that daily huddles and ongoing interaction are definitely a critical success factor in adopting agile practices for your data warehouse team. I have seen great success where this was implemented properly, and I have also seen lots of issues when the team did not communicate daily. There is no better recipe for disaster than having your data architect build the wrong view while the report writer is trying to finalize the output with the user. Yikes!

So, if you want to learn how to apply the 12 Principles of Agile to become more successful in delivering usable results to your data warehouse and BI program, please go over to the training site and sign up for my class.

Here’s to your success!

Kent

The Data Warrior

P.S. Don’t forget to sign up before 2/15/15 with coupon code GRAZIANO50 to get 50% off the full price.
