The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “agile”

The 12 Steps to Faster Data Warehouse Success

Announcement!

I have exciting news!

With the help of my good friend Dan Linstedt (of LearnDataVault.com fame), we have just launched my first online training video based on my very popular white paper and talk: Agile Methods and Data Warehousing: How to Deliver Faster.

Most of you will agree that data warehousing and business intelligence projects take too long to deliver tangible results. I am sure all you project and program managers wish it were not true.

Often by the time a solution is in place, the business needs have changed.

With all the talk about Agile development methods, including SCRUM and Extreme Programming, the question arises: how can these approaches be used to deliver data warehouse and business intelligence projects faster? This new online course will look at the 12 principles behind the Agile Manifesto and show how they can be applied in the context of a data warehouse project. Then I will talk about some of the specific agile techniques I have used with great success on my projects over the last 15 years. The goal is to find a method (or methods) for delivering portions of an enterprise data warehouse architecture more rapidly, in 2 to 4 week increments.

The last time I gave this talk, in Helsinki, Finland at Harmony 2014, I had standing room only and ended up being rated the 2nd best speaker at the event (pretty cool!). It was so popular that the UK Oracle Users Group asked me to write an article on the same topic for their international newsletter.

Since many of you don’t get the chance to travel to events like this (or may have missed my session), you can now see my talk online, at your convenience, for much less than the cost of a conference fee (and the airfare to get there!). We just filmed it last week, after I completed my most recent agile data warehouse engagement, so it contains some new insights and stories that even the folks in Helsinki did not get to hear.

As a bonus, once you have finished the course, you will be able to download a free copy of the detailed article I wrote for UKOUG.

If you have questions during or after the course, you can post them right there in the training portal where I will answer them. So in addition to the training course and the white paper, you also get interactive access to me!

How do I sign up?

So how do you sign up for this new class and how much does it cost?

Well, the full price for the course will be $199, but for those of you who read my blog, I have a Valentine’s Day special offer: if you are one of the first 50 people to purchase the class between now and midnight February 15, 2015, you get a full 50% off the retail price.

So that is $99.50 for over an hour of valuable content PLUS a copy of my white paper (and access to ask me your burning questions).

Use the coupon code: GRAZIANO50

You can buy it now by going to the all-new Learn Data Vault training portal.

On the site you will see the class description, outline, and my introductory video, along with the “Buy Now!” button.

So hurry and cash in on my special gift to you before time is up (remember, after 2/15/15 it will be $199).

Applying Agile

For those of you who had no idea there were 12 Principles behind the Agile Manifesto, let me tell you about one that I think is vitally important, Principle #6:

The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.

This means the team works best when co-located so they can easily talk to each other during the day.

HINT: If the team is not co-located, you need to be sure you have an adequate instant messaging system in place to facilitate their daily interaction, and that the team agrees to use it!

In addition, another best practice is to hold Team Huddles every morning. In the class I give a lot of detail about huddles and how they work, but the main point is that the team needs to meet briefly once a day (usually in the morning) to make sure everyone is on the same page about what they are all working on.

I can tell you for a fact that daily huddles and ongoing interaction are definitely a critical success factor in adopting agile practices for your data warehouse team. I have seen great success where this was implemented properly, and I have also seen lots of issues when the team did not communicate daily. There is no better recipe for disaster than having your data architect build the wrong view while the report writer is trying to finalize the output with the user. Yikes!

So, if you want to learn how to apply the 12 Principles of Agile to deliver usable results from your data warehouse and BI program more successfully, please go over to the training site and sign up for my class.

Here’s to your success!

Kent

The Data Warrior

P.S. Don’t forget to sign up before 2/15/15 with coupon code GRAZIANO50 to get 50% off the full price.

 

Better Data Modeling: The Data Warrior Speaks 2015

Great news, I have confirmed three major events, and one local event so far this year where you can come out and hear me speak about some of my favorite topics: #DataModeling, #SQLDevModeler, and #DataVault.

So, line up your training budget and get registered for at least one of these great events.

DAMA Houston

My first talk for the year will be local – downtown Houston. I will present an Introduction to Data Vault Modeling for the Houston Chapter of DAMA International (next week!).

When: 10-Feb-2015
Time: 1pm – 4:30pm
Where: Chevron Building, Rio Grande Room – 51st floor, 1600 Smith, Houston, TX 77002

If you plan to attend, please RSVP directly to stephen.pace@kalido.com.

RMOUG Training Days 2015

Held every year in mid-February at the Denver Convention Center, the Rocky Mountain Oracle Users Group Training Days is the best value around for user group events: low cost ($395 to $455), a great location (Denver!), and an excellent speaker lineup (international speakers, Oracle ACEs and ACE Directors).

I will be speaking both Wednesday, February 18, and Thursday, February 19 (last session!). My topics this year will be an Introduction to SQL Developer Data Modeler, and Worst Practices in Data Warehouse Design.

Plus I will be leading Morning Chi Gung exercises at 7 AM both days to get you all warmed up for a great day of learning. Check the entire agenda here.

As a bonus there are some excellent deep dive sessions on Tuesday, February 17th that are not to be missed, so get there early.

New this year will be Special Interest Group (SIG) meetings during lunch on Wednesday. I will be co-leading one on Data Integration & Data Warehousing with Bobby Curtis.

So, lots to do, see, and learn. Sign up today (and bring your ski equipment for the weekend after).

2nd Annual World Wide Data Vault Consortium

WWDVC was so successful last year that Dan decided to do it again. This year there is even a cool new website for the event, which will be held May 28-30 in Stowe, Vermont, at the Trapp Family Lodge. This will be a small event (fewer than 60 people) with a single track, so you won’t have to decide which talk to attend.

Yes, the hills will be alive with the sounds of Data Vault geeks from around the world telling their tales of trials, tribulations, and success as they try to implement large, agile, enterprise data warehouse programs across many industries. Topics include:

  • Big Data, NoSQL
  • Virtualization of Data Marts
  • Data Vault 2.0 & Agility
  • Changing roles of Data Modeling
  • Managed Self-Service BI

The speaker lineup is a who’s-who of the data warehouse and agile world.

Special guests this year include a keynote from Claudia Imhoff, Dan Linstedt, and newest addition Scott Ambler (well-known agile methodologist and creator of Agile Modeling).

I will be there again, giving two talks with my buddy from McKesson, Keith Hoyle. We will discuss Data Warehousing in the Real World and talk about our endeavors to develop Virtualized Hybrid Type 1-2 Dimensions to enable Extreme BI.

Don’t miss this chance to rub elbows and network with the top innovators and thinkers in the data warehouse and BI space. Sign up soon as there are limited slots and limited rooms at the inn.

ODTUG KScope15

Another amazing annual event, this user group gathering will be a veritable who’s-who in the Oracle community. Again you will find Oracle ACEs and ACE Directors, as well as Oracle Product Managers, all ready and willing to discuss the latest and greatest tools for doing Oracle development work. Check out the amazing list of talks and presenters.

This year it is back to the beach for KScope. It will be held June 21-25 at the Diplomat Resort and Spa on the beach in Hollywood, Florida.

By popular demand, the last day of the conference will be all Deep Dive sessions, so be sure to plan your travel to hang out until the end (and then enjoy the beach!).

I will be giving two talks during the week (same ones as at RMOUG), answering questions on a panel or two, and again running my annual Morning Chi Gung sessions every morning (but this year outside on the beach).

This should be a very educational and relaxing event, as it is every year. And it is in a family-friendly location, so bring the gang along. You can register today and still get a huge early registration discount.

So what are you waiting for?

See you soon!

Kent

The Data Warrior

P.S. While at these events I do expect to have some limited free time, so if you would like some one-on-one coaching in person, contact me directly at kent <dot> graziano <at> att <dot> net to set up a session.

 

Better Data Modeling: #DataVault 2.0 Virtualising your Data Vault – Satellites

Guest Blog Link: Virtualising your Data Vault – Satellites

This is a MUST READ for anyone wanting to get Agile in their DW/BI program and for anyone doing Data Vault 2.0.

Actually anyone doing Data Vault 1.0 can benefit from the technique as well.

Roelant Vos has done quite a bit of work trying to virtualize and automate data warehouses using DV 2.0 (especially since the WWDVC in Vermont). Please check out his blog and follow him on Twitter too.

Cheers

Kent

The Oracle Data Warrior

Better Data Modeling: Color Code Your Data Model Diagrams using #SQLDevModeler

One of the standards I recommend in my book, Check List for Doing Data Model Design Reviews, is to use color in your diagrams to visually differentiate types of entities or tables.

As luck would have it, Oracle SQL Developer Data Modeler has a feature that makes this very easy: Classification Types.

In the latest version, 4.0, you set these up from the context menu at the Design level. From that menu pick Properties. Once in the property dialog, go to Settings -> Diagram -> Classification Types. (In 3.x, look under Tools -> Preferences.)

The default install already comes with several types: fact, dimensions, logging, summary, and temporary. Each has a preset color assigned. You can change that color by clicking on it and selecting another option from the palette. You can also set a prefix for each type. (Note: if you change the color of a classification that is already in use, the new color is applied to all existing diagrams within the design when you click Apply.)

You add new types by clicking the green plus (+) sign and then just add in whatever you want and save.

For Data Vault modeling, I add three new types: Hub, Link, and Satellite with the colors you see in the screen shot here.

Using Classification Types to Color Code Your Diagrams


To apply a classification type to an existing table, open the table property dialog and look for the classification types node in the tree (in 4.0). In 3.x, there is a simple classification type drop down on the main property page.

Once applied, the first letter of the classification type will appear in the upper left corner of the table (see screen shot).

Another way I have used this recently was in my current data warehouse project, where I have source, stage, and dimensional tables all in one design. I found I often want to show all three tiers in one diagram (sub view) for a sprint (we are using a SCRUM approach) so the ETL programmers and QA folks have one place to go where they can see how these layers are related. So for this project, I also added source and stage classification types.

So if you have been color coding your diagrams by hand, this tip should save you a bunch of time since you won’t have to pick the colors on each table. Plus, the color selection will be more consistent.

If you aren’t color coding, now would be a great time to start!

Bonus Tip: If, like me, you want to be consistent across all your designs with the types and colors, I just figured out I can hack the dl_settings.xml file to copy my classification type customizations from one design to another. Just be sure to exit and then restart SDDM after you update the file for it to take effect.

Have fun coloring your diagrams! (Maybe more people will read them.)

Kent

The Oracle Data Warrior

 

Agile Data Warehouse Modeling: How to Build a Virtual Type 2 Slowly Changing Dimension

One of the ongoing complaints about many data warehouse projects is that they take too long to deliver. This is one of the main reasons many of us have tried to adopt methods and techniques (like SCRUM) from the agile software world to improve our ability to deliver data warehouse components more quickly.

So, what activity takes the bulk of development time in a data warehouse project?

Writing (and testing) the ETL code to move and transform the data can take up to 80% of the project resources and time.

So if we can eliminate, or at least curtail, some of the ETL work, we can deliver useful data to the end user faster.

One way to do that would be to virtualize the data marts.

For several years Dan Linstedt and I have discussed the idea of building virtual data marts on top of a Data Vault modeled EDW.

In the last few years I have floated the idea in the Oracle community. Fellow Oracle ACE Stewart Bryson and I even created a presentation this year (for #RMOUG and #KScope13) on how to do this using the Business Model (meta-layer) in OBIEE (it worked great!).

While doing this with a BI tool is one approach, I like to be able to prototype the solution first using Oracle views (that I build in SQL Developer Data Modeler of course).

The approach to modeling a Type 1 SCD this way is very straightforward.
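For comparison, here is a minimal sketch of the Type 1 case, using SQLite from Python as a stand-in for Oracle so it can run anywhere. It assumes the Hub and Satellite names from the Type 2 example in this post (HUB_CUST, SAT_CUST_NAME, SAT_CUST_ADDR, keyed on CSID). The other columns, the sample rows, and using the Hub key as the dimension key are hypothetical. Each Hub row is joined only to the most recent row of each Satellite, so no history is kept.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE HUB_CUST (CSID INTEGER PRIMARY KEY, CUSTOMER_NUM TEXT NOT NULL);
CREATE TABLE SAT_CUST_NAME (CSID INTEGER, LOAD_DTS TEXT, NAME TEXT,
                            PRIMARY KEY (CSID, LOAD_DTS));
CREATE TABLE SAT_CUST_ADDR (CSID INTEGER, LOAD_DTS TEXT, ADDRESS TEXT,
                            PRIMARY KEY (CSID, LOAD_DTS));

-- Hypothetical sample data: the customer name changed once.
INSERT INTO HUB_CUST VALUES (1, 'C-100');
INSERT INTO SAT_CUST_NAME VALUES (1, '2013-01-01', 'Acme'), (1, '2013-03-01', 'Acme Corp');
INSERT INTO SAT_CUST_ADDR VALUES (1, '2013-01-01', '1 Main St');

-- Virtual Type 1 dimension: each Hub key joined to only the latest row of
-- each Satellite, so older values are simply not visible (Type 1 behavior).
CREATE VIEW DIM1_CUSTOMER (Customer_key, Customer_Number, Customer_Name, Customer_Address) AS
SELECT hub.CSID, hub.CUSTOMER_NUM, s1.NAME, s2.ADDRESS
FROM HUB_CUST hub
JOIN SAT_CUST_NAME s1
  ON s1.CSID = hub.CSID
 AND s1.LOAD_DTS = (SELECT MAX(x.LOAD_DTS) FROM SAT_CUST_NAME x WHERE x.CSID = hub.CSID)
JOIN SAT_CUST_ADDR s2
  ON s2.CSID = hub.CSID
 AND s2.LOAD_DTS = (SELECT MAX(y.LOAD_DTS) FROM SAT_CUST_ADDR y WHERE y.CSID = hub.CSID);
""")

rows = con.execute("SELECT * FROM DIM1_CUSTOMER").fetchall()
print(rows)  # [(1, 'C-100', 'Acme Corp', '1 Main St')]
```

Because there is exactly one row per Hub key, there is no need for a new surrogate key here. That is exactly what a Type 2 dimension cannot get away with.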

How to do this easily for a Type 2 SCD has evaded me for years, until now.

Building a Virtual Type 2 SCD (VSCD2)

So how do you create a virtual Type 2 dimension (one that is “Kimball compliant”) on a Data Vault when you have multiple Satellites on one Hub?

(NOTE: The next part assumes you understand Data Vault data modeling. If you don’t, start by reading my free white paper, or better still, go buy the Data Vault book on LearnDataVault.com.)

Here is how:

Build an insert-only PIT (Point-in-Time) table that keeps history. This is sometimes referred to as a historicized PIT table. (See the Super Charge book for an explanation of the types of PIT tables.)

Add a surrogate Primary Key (PK) to the table. The PK of the PIT table will then serve as the PK for the virtual dimension. This meets the standard for classical star schema design to have a surrogate key on Type 2 SCDs.
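The two steps above can be sketched as follows, again with SQLite standing in for Oracle. This is one possible insert-only PIT load, not a prescribed pattern: the snapshot dates, the hypothetical `load_pit_snapshot` helper, and the rule of inserting a new PIT row only when a Satellite's effective LOAD_DTS has changed are all assumptions for illustration. PIT_SEQ is the surrogate key, and NAME_LOAD_DTS and ADDRESS_LOAD_DTS are the columns the view in this post joins on.

```python
import sqlite3

DDL = """
CREATE TABLE HUB_CUST (CSID INTEGER PRIMARY KEY, CUSTOMER_NUM TEXT NOT NULL);
CREATE TABLE SAT_CUST_NAME (CSID INTEGER, LOAD_DTS TEXT, NAME TEXT, PRIMARY KEY (CSID, LOAD_DTS));
CREATE TABLE SAT_CUST_ADDR (CSID INTEGER, LOAD_DTS TEXT, ADDRESS TEXT, PRIMARY KEY (CSID, LOAD_DTS));
-- Historicized, insert-only PIT table. PIT_SEQ is the surrogate key that the
-- virtual Type 2 dimension exposes as its own primary key.
CREATE TABLE SAT_CUST_PIT (
    PIT_SEQ          INTEGER PRIMARY KEY AUTOINCREMENT,
    CSID             INTEGER NOT NULL,
    LOAD_DTS         TEXT    NOT NULL,  -- snapshot date of this PIT row
    NAME_LOAD_DTS    TEXT,              -- Satellite rows in effect at the snapshot
    ADDRESS_LOAD_DTS TEXT
);
"""

PIT_INSERT = """
INSERT INTO SAT_CUST_PIT (CSID, LOAD_DTS, NAME_LOAD_DTS, ADDRESS_LOAD_DTS)
SELECT c.CSID, :snap, c.NAME_LOAD_DTS, c.ADDRESS_LOAD_DTS
FROM (
    -- For each Hub key, the latest row of each Satellite as of the snapshot.
    SELECT hub.CSID,
           (SELECT MAX(n.LOAD_DTS) FROM SAT_CUST_NAME n
             WHERE n.CSID = hub.CSID AND n.LOAD_DTS <= :snap) AS NAME_LOAD_DTS,
           (SELECT MAX(a.LOAD_DTS) FROM SAT_CUST_ADDR a
             WHERE a.CSID = hub.CSID AND a.LOAD_DTS <= :snap) AS ADDRESS_LOAD_DTS
    FROM HUB_CUST hub
) c
WHERE c.NAME_LOAD_DTS IS NOT NULL        -- skip keys with no Satellite data yet
  AND c.ADDRESS_LOAD_DTS IS NOT NULL
  AND NOT EXISTS (                       -- insert only when something changed
      SELECT 1 FROM SAT_CUST_PIT p
       WHERE p.CSID = c.CSID
         AND p.PIT_SEQ = (SELECT MAX(p2.PIT_SEQ) FROM SAT_CUST_PIT p2 WHERE p2.CSID = c.CSID)
         AND p.NAME_LOAD_DTS = c.NAME_LOAD_DTS
         AND p.ADDRESS_LOAD_DTS = c.ADDRESS_LOAD_DTS)
"""

def load_pit_snapshot(con: sqlite3.Connection, snap: str) -> None:
    """Hypothetical helper: append PIT rows for one snapshot date (never updates)."""
    con.execute(PIT_INSERT, {"snap": snap})

con = sqlite3.connect(":memory:")
con.executescript(DDL)
con.execute("INSERT INTO HUB_CUST VALUES (1, 'C-100')")
con.executemany("INSERT INTO SAT_CUST_NAME VALUES (?, ?, ?)",
                [(1, "2013-01-01", "Acme"), (1, "2013-03-01", "Acme Corp")])
con.executemany("INSERT INTO SAT_CUST_ADDR VALUES (?, ?, ?)",
                [(1, "2013-01-01", "1 Main St"), (1, "2013-05-01", "2 Oak Ave")])

for snap in ["2013-01-31", "2013-02-28", "2013-03-31", "2013-05-31"]:
    load_pit_snapshot(con, snap)

pit_rows = con.execute("SELECT * FROM SAT_CUST_PIT ORDER BY PIT_SEQ").fetchall()
for row in pit_rows:
    print(row)
```

The February snapshot adds nothing because neither Satellite changed, so the PIT table ends up with three rows (PIT_SEQ 1 to 3), one per version of the customer.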

To build the VSCD2 you now simply create a view that uses the PIT table to join the Hub and all the Satellites together. Here is an example:

CREATE VIEW Dim2_Customer (Customer_key, Customer_Number, Customer_Name, Customer_Address, Load_DTS)
AS
SELECT sat_pit.pit_seq,          -- PIT surrogate key becomes the dimension key
       hub.customer_num,
       sat_1.name,
       sat_2.address,
       sat_pit.load_dts
  FROM HUB_CUST      hub,
       SAT_CUST_PIT  sat_pit,
       SAT_CUST_NAME sat_1,
       SAT_CUST_ADDR sat_2
 WHERE hub.CSID = sat_pit.CSID
   AND hub.CSID = sat_1.CSID
   AND hub.CSID = sat_2.CSID
   -- pick the Satellite rows the PIT row says were in effect at that snapshot
   AND sat_pit.NAME_LOAD_DTS    = sat_1.LOAD_DTS
   AND sat_pit.ADDRESS_LOAD_DTS = sat_2.LOAD_DTS;
 
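To see a view like this produce Type 2 rows, here is a small runnable sketch in SQLite (via Python) with hypothetical sample data. The PIT rows are written directly, as an insert-only PIT load might produce them, and the view uses the same join logic written with ANSI JOIN syntax. Each PIT row yields one dimension row, so a name change followed by an address change shows up as three versions of customer C-100.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE HUB_CUST (CSID INTEGER PRIMARY KEY, CUSTOMER_NUM TEXT);
CREATE TABLE SAT_CUST_NAME (CSID INTEGER, LOAD_DTS TEXT, NAME TEXT);
CREATE TABLE SAT_CUST_ADDR (CSID INTEGER, LOAD_DTS TEXT, ADDRESS TEXT);
CREATE TABLE SAT_CUST_PIT (PIT_SEQ INTEGER PRIMARY KEY, CSID INTEGER, LOAD_DTS TEXT,
                           NAME_LOAD_DTS TEXT, ADDRESS_LOAD_DTS TEXT);

-- Hypothetical sample data.
INSERT INTO HUB_CUST VALUES (1, 'C-100');
INSERT INTO SAT_CUST_NAME VALUES (1, '2013-01-01', 'Acme'), (1, '2013-03-01', 'Acme Corp');
INSERT INTO SAT_CUST_ADDR VALUES (1, '2013-01-01', '1 Main St'), (1, '2013-05-01', '2 Oak Ave');
-- PIT rows as an insert-only PIT load might produce them (one per change).
INSERT INTO SAT_CUST_PIT VALUES
    (1, 1, '2013-01-31', '2013-01-01', '2013-01-01'),
    (2, 1, '2013-03-31', '2013-03-01', '2013-01-01'),
    (3, 1, '2013-05-31', '2013-03-01', '2013-05-01');

-- Virtual Type 2 dimension: Hub + PIT + Satellites, written with ANSI joins.
CREATE VIEW Dim2_Customer (Customer_key, Customer_Number, Customer_Name, Customer_Address, Load_DTS) AS
SELECT sat_pit.PIT_SEQ, hub.CUSTOMER_NUM, sat_1.NAME, sat_2.ADDRESS, sat_pit.LOAD_DTS
FROM HUB_CUST hub
JOIN SAT_CUST_PIT  sat_pit ON hub.CSID = sat_pit.CSID
JOIN SAT_CUST_NAME sat_1   ON hub.CSID = sat_1.CSID AND sat_pit.NAME_LOAD_DTS = sat_1.LOAD_DTS
JOIN SAT_CUST_ADDR sat_2   ON hub.CSID = sat_2.CSID AND sat_pit.ADDRESS_LOAD_DTS = sat_2.LOAD_DTS;
""")

rows = con.execute("SELECT * FROM Dim2_Customer ORDER BY Customer_key").fetchall()
for row in rows:
    print(row)
```

Note that these are inner joins: if a PIT row points at a Satellite row that does not exist, that version silently disappears from the dimension, so the PIT load has to guarantee a match for every Satellite it references.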

Benefits of a VSCD2

  1. We can now rapidly demonstrate the contents of a Type 2 dimension before any ETL programming is done.
  2. By using PIT tables, we don’t need the Load End DTS on the Sats, so the Sats become insert-only as well (simpler loads, no update pass required).
  3. Another byproduct is that the Sat is now also Hadoop compliant (again, insert-only).
  4. Since the nullable Load End DTS is not needed, you can more easily partition the Sat table by Hub ID and Load DTS.
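If a consumer still wants an end date on a Satellite row (for example, for a between-dates filter), it can be derived at query time instead of being stored and updated. This sketch uses the standard LEAD window function, which Oracle also supports as an analytic function; in SQLite it needs version 3.25 or later. The view name SAT_CUST_NAME_EFF and the sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Insert-only Satellite: no Load End DTS column, so loads never update rows.
CREATE TABLE SAT_CUST_NAME (CSID INTEGER, LOAD_DTS TEXT, NAME TEXT,
                            PRIMARY KEY (CSID, LOAD_DTS));
INSERT INTO SAT_CUST_NAME VALUES
    (1, '2013-01-01', 'Acme'),
    (1, '2013-03-01', 'Acme Corp'),
    (2, '2013-02-01', 'Bolt Inc');

-- Derive the end date virtually: the next LOAD_DTS for the same Hub key.
-- A NULL LOAD_END_DTS marks the current row for that key.
CREATE VIEW SAT_CUST_NAME_EFF AS
SELECT CSID, LOAD_DTS, NAME,
       LEAD(LOAD_DTS) OVER (PARTITION BY CSID ORDER BY LOAD_DTS) AS LOAD_END_DTS
FROM SAT_CUST_NAME;
""")

rows = con.execute("SELECT * FROM SAT_CUST_NAME_EFF ORDER BY CSID, LOAD_DTS").fetchall()
for row in rows:
    print(row)
```

The end date is computed from data that is already there, so the physical table stays insert-only and partition-friendly.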

Objections

The main objection to this approach is that the virtual dimension will perform very poorly. While this may be true for very high volumes, or on poorly tuned or under-resourced databases, I maintain that with today’s evolving hardware appliances (e.g., Exadata, Exalogic) and the advent of in-memory databases, these concerns will soon be a thing of the past.

UPDATE 26-May-2018: Now, 5 years later, I have successfully done the above on Oracle. But we also now have the Snowflake elastic cloud data warehouse, where all the prior constraints are indeed eliminated. With Snowflake you can easily choose to instantly add compute power if the view is too slow, or do the work and processing to materialize the view. (end update)

Worst case, after you have validated the data with your users, you can always turn it into a materialized view or a physical table if you must.
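A sketch of that fallback in SQLite: copy the validated view's rows into a physical table, then drop the view and give the table its name so existing reports keep working unchanged. In Oracle you would more likely use CREATE MATERIALIZED VIEW ... AS SELECT. The simple DIM_CUSTOMER view and its data are hypothetical stand-ins for a validated virtual dimension.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hypothetical stand-in for a virtual dimension already validated with users.
CREATE TABLE SAT_CUST_NAME (CSID INTEGER, LOAD_DTS TEXT, NAME TEXT);
INSERT INTO SAT_CUST_NAME VALUES (1, '2013-01-01', 'Acme'), (1, '2013-03-01', 'Acme Corp');
CREATE VIEW DIM_CUSTOMER AS
SELECT o.CSID AS CUSTOMER_KEY, o.NAME
FROM SAT_CUST_NAME o
WHERE o.LOAD_DTS = (SELECT MAX(s.LOAD_DTS) FROM SAT_CUST_NAME s WHERE s.CSID = o.CSID);

-- Materialize: copy the view's rows into a physical table, then swap names so
-- anything that queries DIM_CUSTOMER keeps working unchanged.
CREATE TABLE DIM_CUSTOMER_TMP AS SELECT * FROM DIM_CUSTOMER;
DROP VIEW DIM_CUSTOMER;
ALTER TABLE DIM_CUSTOMER_TMP RENAME TO DIM_CUSTOMER;
""")

kind = con.execute("SELECT type FROM sqlite_master WHERE name = 'DIM_CUSTOMER'").fetchone()
rows = con.execute("SELECT * FROM DIM_CUSTOMER").fetchall()
print(kind, rows)
```

Keeping the same object name is the point of the swap: the report layer built against the virtual version does not need to change when the dimension becomes physical.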

So what do you think? Have you ever tried something like this? Let me know in the comments.

Get virtual, get agile!

Kent

The Data Warrior

P.S. I am giving a talk on Agile Data Warehouse Modeling at the East Coast Oracle Conference this week. If you are there, look me up and we can discuss this post in person!
