The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “Data Warehouse”

Tech Tip: Connect to Snowflake db using #SQLDevModeler

So, some of you may have noticed that I took a “real” job this week. I am now the Senior Technical Evangelist for a cool startup company called Snowflake Computing.

Basically we provide a data warehouse database as a service in the cloud.

Pretty cool stuff. (If you want to know more, check out our site at snowflake.net)

I will talk more about the coolness of Snowflake (pun intended) in the future, but for now I just want to show you how easy it is to connect to.

Of course, the first thing I want to do when I meet a new database is see if I can connect my most favorite data modeling tool, Oracle SQL Developer Data Modeler (SDDM), to it and reverse engineer some tables.

The folks here told me that tools like Informatica, MicroStrategy, and Tableau connect just fine using either JDBC or ODBC, and that since we are ANSI SQL compliant, there should be no problem.

And they were right. It was almost as easy as connecting to Oracle but it was WAY easier than connecting to SQL Server.

First you need a login to a Snowflake database. No problem here. Since I am an employee, I do get a login. Check.

We have both a web UI and a desktop command line tool. It turned out I needed the command line tool, which incidentally needs our Snowflake JDBC connector to work. I followed the Snowflake documentation and downloaded the JDBC driver (to my new Mac!). Piece of cake.

So connecting from SDDM is really easy. First, add the 3rd party JDBC driver in preferences: Preferences -> Data Modeler -> Third Party JDBC Driver (press the green + sign, then browse to the driver).

Add JDBC Driver

As you can see our JDBC driver is conveniently named snowflake_jdbc.jar.

The next step is to configure the database connection. To do this, go to File -> Import -> Data Dictionary, then add a new connection in the wizard.

Configure Connection

Give it a name and login information, then go to the JDBC tab.

So getting the URL was the tricky part (for me anyway). Luckily, the command line tool displayed the URL when I launched it in a terminal window, so I just copied it from there (a totally wild guess on my part).

So the URL (for future reference) is:

jdbc:snowflake://sfcsandbox.snowflakecomputing.com:443/?account=<service name>&user=<account>&ssl=on

Where account is whatever you named your account in Snowflake (once you have one of your very own, that is).
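For instance, with a purely hypothetical account named acme and a user named kent (both made up just for illustration), the filled-in URL would look something like this:

jdbc:snowflake://sfcsandbox.snowflakecomputing.com:443/?account=acme&user=kent&ssl=on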

The driver class was a little trickier: I had to read our documentation! Thankfully it is very good and has an entire section on how to connect using JDBC. In there I found the driver class name:

com.snowflake.client.jdbc.SnowflakeDriver

That was it.

I pushed the Test button and success!

Now, to really test it, I did the typical reverse engineer and was able to see the demo schema and tables and bring them all in.

Snowflake Schema

Demo schema in Snowflake (no, not a snowflake schema!)
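If you want to sanity-check what SDDM is pulling in, you can also query the metadata directly. Here is a rough sketch, assuming Snowflake exposes the ANSI-standard INFORMATION_SCHEMA views (in line with the ANSI SQL compliance mentioned above); the DEMO schema name is just a placeholder for whatever your demo schema is called:

-- List the tables and columns the reverse engineer will pick up
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'DEMO'
ORDER BY table_name, ordinal_position;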

So I call that a win.

Not a bad week's work, really:

  1. New job orientation
  2. Started learning a new tech and the “cloud”
  3. Got logged in
  4. Installed SDDM on a Mac for the 1st time ever!
  5. Configured it to speak to an “alien” database
  6. Successfully reverse engineered a schema
  7. Blogged about it.

So that was my 1st week as a Senior Technical Evangelist.

TGIF!

Kent

still, The Data Warrior

P.S. If you want to see more about my week, just check my Twitter stream and start following @SnowflakeDB too.


Better Data Modeling: What is #DataVault 2.0 and Why do I care?

Have you heard?

Dan Linstedt has just had his new book published on Data Vault 2.0. It is called Building a Scalable Data Warehouse with Data Vault 2.0. If you are at all into data warehousing and agile design, you need to get this book now. So click here and be done.

For those of you not sure what this DV 2.0 stuff is all about and why you might want to learn about it, I recently did a series of guest posts for Vertabelo to introduce folks to the concepts. In the series I walk you through some of the history of Data Vault and why we need a new way to do data warehousing. Then I get into the basics of modeling the Data Vault, the Business Vault, and finally building Information Marts from a Data Vault.

So you can find the posts here:

Data Vault 2.0 Modeling Basics

The Business Data Vault

Building an Information Mart with Your Data Vault

Once you have read these, I am sure you will want to go buy the new Data Vault 2.0 book and maybe sign up for some online training on DataVaultAlliance.com.

Model on!

Kent

The Data Warrior

P.S. If you want to catch up, you can still purchase the original Data Vault (1.0) modeling book Super Charge Your Data Warehouse. It is a great reference book to have on hand (you can get it on Kindle too). Might as well have the whole set.

P.P.S. I turned this series into a Kindle ebook for easier reference; you can find it on my Author Profile or just click on the book cover in the right sidebar above.

The Data Warrior Goes to Oracle Open World 2015 (#OOW15)

Yes, boys and girls, it is that time of year again. Time for the BIG show in San Francisco: OOW15.

I will, as usual, attend, but only for a few days this year (been away from home too much).

My Talk

I will be presenting my session Worst Practices in Data Warehouse Design (UGF9985) on Sunday at 10:00 AM in Moscone South, room 274 (courtesy of ODTUG!). This will be a quick one, as I only have 45 minutes to do a one-hour talk. Yikes!

If you will be in SF by Sunday, be sure to come by and say “hi.”

Morning Chi Gung?

As usual, I will try to gather some folks near either Union Square or maybe by Yerba Buena Gardens for a little morning relaxation with some Chi Gung (Qigong). Most likely around 7 AM for 30 minutes, probably Saturday, Sunday, and Tuesday. Follow me on Twitter @kentgraziano and look for #MorningChiGung #OOW15 for the final time and location.

Monday for sure, though, I will be doing a shortened session on the beach at the 4th Annual OOW SF Bay Swim, around 9:15 AM or so. Nothing like a little chi movement to warm you up before and after jumping in the bay.

The Rest of My Schedule

Much shorter than usual, but here is what I plan to see so far:

Session Title | Session Code | Date | Start Time | End Time | Room
Oracle Database 12c-Data Redaction | UGF1513 | 10/25/2015 | 8:00 | 8:45 | Moscone West-3011
Oracle Business Intelligence Best Practices: the Bad, the Worse, and the Really Ugly | UGF7409 | 10/25/2015 | 9:00 | 9:45 | Moscone South-301
Worst Practices in Data Warehouse Design (ME) | UGF9985 | 10/25/2015 | 10:00 | 10:45 | Moscone South-274
Oracle Database Security: Frustrating the Patient Intruder | UGF10016 | 10/25/2015 | 11:00 | 11:45 | Moscone South-274
Improving Analytics: Experiences Testing Oracle Database In-Memory 12c Feature | UGF9936 | 10/25/2015 | 13:30 | 14:15 | Moscone South-303
Fennia Two Years in Production with Oracle Exadata | UGF5105 | 10/25/2015 | 14:30 | 15:15 | Moscone South-305
Managing the Changes in Database Structures (Heli) | UGF6317 | 10/25/2015 | 15:30 | 16:15 | Moscone South-306
Oracle OpenWorld Welcome Keynote | KEY10818 | 10/25/2015 | 17:00 | 19:00 | Moscone North-Hall D
Oracle OpenWorld Monday Keynote | KEY10819 | 10/26/2015 | 9:00 | 10:00 | Moscone North-Hall D
Swim The Bay! | | 10/26/2015 | | | The Dolphin Club
Oracle SQL Developer Tips and Tricks (Jeff Smith) | CON8721 | 10/26/2015 | 16:00 | 16:45 | Moscone South-304
Database Development for the Cloud and On Premises (Kris Rice) | CON8779 | 10/26/2015 | 17:15 | 18:00 | Moscone South-304

Other Fun Stuff

Along with the educational sessions there is, of course, the opportunity to network with folks. I expect I will be doing that as much as possible. There is the annual Oracle ACE dinner, and who knows what else. I am sure I will find plenty to keep me busy.

Since I arrive early for the Oracle ACE Directors annual briefing at HQ, I know I will be downtown Friday evening and all day Saturday. Drop me a note if you will be there too and want to hang out, see some of the sights, and eat some top-notch grub.

Cheers.

Kent

The Data Warrior

The 12 Steps to Faster Data Warehouse Success

Announcement!

I have exciting news!

With the help of my good friend Dan Linstedt (of LearnDataVault.com fame), we have just launched my first online training video based on my very popular white paper and talk: Agile Methods and Data Warehousing: How to Deliver Faster.

Most of you will agree that data warehousing and business intelligence projects take too long to deliver tangible results. I am sure all you project and program managers wish it was not true.

Often by the time a solution is in place, the business needs have changed.

With all the talk about Agile development methods, including SCRUM and Extreme Programming, the question arises as to how these approaches can be used to deliver data warehouse and business intelligence projects faster. This new online course will look at the 12 principles behind the Agile Manifesto and see how they can be applied in the context of a data warehouse project. Then I will talk about some of the specific agile techniques I have used with great success on my projects over the last 15 years. The goal is to determine a method or methods to get a more rapid (2-4 week) delivery of portions of an enterprise data warehouse architecture.

The last time I gave this talk, in Helsinki, Finland at Harmony 2014, I had standing room only and ended up being rated the 2nd best speaker at the event (pretty cool!). It was so popular that the UK Oracle Users Group asked me to write an article on the same topic for their international newsletter.

Since many of you don’t get the chance to travel to events like this (or may have missed my session), you can now see my talk online, at your convenience, for much less than the cost of a conference fee (and the airfare to get there!). We just filmed it last week, after I completed my most recent agile data warehouse engagement, so it contains some new insights and stories that even the folks in Helsinki did not get to hear.

As a bonus, once you have finished the course, you will be able to download a free copy of the detailed article I wrote for UKOUG.

If you have questions during or after the course, you can post them right there in the training portal where I will answer them. So in addition to the training course and the white paper, you also get interactive access to me!

How do I sign up?

So how do you sign up for this new class and how much does it cost?

Well, the full price for the course will be $199, but for those of you who read my blog, I have a special Valentine's Day offer: if you are one of the first 50 people to purchase the class between now and midnight February 15, 2015, you get a full 50% off the retail price.

So that is $99.50 for over an hour of valuable content PLUS a copy of my white paper (and access to ask me your burning questions).

Use the coupon code: GRAZIANO50

You can buy it now by going to the all new Learn Data Vault training portal now.

On the site you see the class description, outline, and my introductory video, along with the “Buy Now!” button.

So hurry and cash in on my special gift to you before the time is up (remember, after 2/15/15 it will be $199).

Applying Agile

For those of you who had no idea there were 12 Principles behind the Agile Manifesto, let me tell you about one that I think is vitally important: Principle #6

The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.

This means the team works best when co-located so they can easily talk to each other during the day.

HINT: If not co-located, you need to be sure you have an adequate instant messaging system in place to facilitate their daily interaction. And that the team agrees to use it!

In addition, another best practice is to hold Team Huddles every morning. In the class, I give a lot of details about huddles and how they work, but the main point is that the team needs to meet briefly once a day (usually in the morning) to make sure they are all on the same page as to what they are working on.

I can tell you for a fact that the daily huddles and ongoing interaction are definitely a critical success factor in adopting agile practices for your data warehouse team. I have seen great success where this was implemented properly, and I have also seen lots of issues when the team did not communicate daily. There is no better recipe for disaster than having your data architect build the wrong view while the report writer is trying to finalize the output with the user. Yikes!

So, if you want to learn how to apply the 12 Principles of Agile to become more successful in delivering usable results to your data warehouse and BI program, please go over to the training site and sign up for my class.

Here’s to your success!

Kent

The Data Warrior

P.S. Don’t forget to sign up before 2/15/15 with coupon code GRAZIANO50 to get 50% off the full price.


Better Data Modeling: My Top 3 Reasons why you should put Foreign Keys in your Data Warehouse

This question came up at the recent World Wide Data Vault Consortium. It seems there are still many folks who build a data warehouse (or data mart) without including FKs in the database.

The usual reason is that it “slows down” load performance.

No surprise there. Been hearing that for years.

And I say one of two things:

1. So what! I need my data to be correct and to come out fast too!

or

2. Show me! How slow is it really?

Keep in mind that while getting the data in quickly is important, so is getting the data out.

Who would you rather have complain – the ETL programmer or the business user trying to run a report?

Yes, it has to be a balance, but you should not immediately dismiss including FKs in your warehouse without considering the options and benefits of those options.

So here are my three main reasons why you should include FK constraints in your Oracle data warehouse database (with a quick DDL sketch after the list):

  1. The Oracle optimizer uses the constraints to make better decisions on join paths.
  2. Your data modeling and BI tools can read the FKs from the data dictionary to create correct joins in the metadata of the tool (SDDM, Erwin, OBIEE, Cognos, and Business Objects can all do this).
  3. It is a good QA check on your ETL. (Yeah, I know… the ETL code is perfect and checks all that stuff, bla, bla, bla)
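If your warehouse tables do not have the constraints yet, adding them is just standard DDL. Here is a minimal sketch (the sales_fact and customer_dim tables and the customer_key column are hypothetical names, not from any real schema):

-- Declare the relationship between a fact table and its dimension
ALTER TABLE sales_fact
  ADD CONSTRAINT fk_sales_customer
  FOREIGN KEY (customer_key)
  REFERENCES customer_dim (customer_key);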

Now, of course, there are compromise options. The three main ones I know are:

  1. Drop the constraints at the start of the load then add them back in after the load completes. If any fail to build, that tells you immediately where you may have some data quality problems or your model is wrong (or something else changed).
  2. Build all the constraints as DISABLE NOVALIDATE. This puts them in the database for the BI tools and data modeling tools to see and capture but, since they are not enforced, they put minimal overhead on the load process. And, so I am told by those that know, even a disabled constraint helps the optimizer make a smarter choice on the join path.
  3. (really 2a) Best of both: disable the constraints, load your data, then re-enable them. You get both optimization and quality checks (sketched below).
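To make those options concrete, here is a rough sketch of options 2 and 3 in Oracle DDL, using the same hypothetical table names as above:

-- Option 2: the constraint is visible to tools and the optimizer,
-- but adds no enforcement overhead during loads
ALTER TABLE sales_fact
  ADD CONSTRAINT fk_sales_customer
  FOREIGN KEY (customer_key)
  REFERENCES customer_dim (customer_key)
  DISABLE NOVALIDATE;

-- Option 3 (2a): disable before the load, re-enable afterwards;
-- the ENABLE step will fail if the load produced orphan fact rows
ALTER TABLE sales_fact DISABLE CONSTRAINT fk_sales_customer;
-- ... run the ETL load here ...
ALTER TABLE sales_fact ENABLE CONSTRAINT fk_sales_customer;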

So NOW what is your reason for not using FKs in your data warehouse?

Happy Modeling!

Kent
