The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “Claudia Imhoff”

Better Data Modeling: 7 Differentiating Characteristics of Data Vault 2.0

Hard to believe that the 2nd Annual World Wide Data Vault Consortium (WWDVC15) is NEXT WEEK in beautiful Stowe, Vermont. It promises to be an excellent event. The speakers include myself, Claudia Imhoff, Dan Linstedt (the inventor of Data Vault), Scott Ambler, Roelant Vos, Dirk Lerner and many more. The focus will be DV 2.0, agile data warehousing, big data, NoSQL, virtualization and automation. Check out the agenda here: http://wwdvc.com/schedule/

So in preparation (and to encourage you to attend), I thought it might be good to review some of the important basics about Data Vault 2.0 and why it is an important evolution for the data warehousing community.

The approach started out with the official name Common Foundational Warehouse Modeling Architecture. It then became more commonly known as the “Data Vault”, a modelling method for data warehouses. It also had a methodology with implementation guidelines and worked very, very well on relational platforms for many, many years (over 10 years for those who did not know).

But technology evolved. NoSQL architectures came into the picture primarily as sources. The Apache Hadoop platform started offering a cheaper storage and processing MPP architecture.

Data Vault evolved into Data Vault 2.0 and already has many successful implementations. The original Data Vault is now referred to as Data Vault 1.0 (or DV 1.0) and it primarily has a modelling focus. DV 2.0 on the other hand changes some things, and adds a LOT.

Data Vault 2.0 has the following 7 differing characteristics:

1. DV 2.0 is a complete system of Business Intelligence. It talks about everything from concept to delivery. While DV 1.0 had a major focus on modelling and many of the modelling concepts are similar, DV 2.0 goes a step further and talks about data from source to business user facing constructs with guidelines for implementation, agile, virtualization and more.

2. DV 2.0 can adapt to changes better than pretty much ANY other data warehouse architecture or framework. It can do it even better than DV 1.0 because of the change in design to adapt to NoSQL and MPP platforms, if needed. DV 2.0 has successfully been implemented on MPP RDBMS platforms like Teradata as well (ask Dan for details).

3. DV 2.0 is both “big data” and “NoSQL” ready. In fact, there are implementations where data is sourced in real-time from NoSQL databases with phenomenal success stories. One of these was presented at the WWDVC 2014 where an organization saved lots of money by using this architecture.

A near real-time case study for absorbing data from MongoDB is being presented at WWDVC2015. It’s not to be missed.

4. DV 2.0 takes advantage of MPP style platforms and is designed with MPP in mind. While DV 1.0 also did this to an extent, DV 2.0 takes it to a completely other level with a zero-dependency type architecture. Of course, there are a few caveats you will need to learn.

5. DV 2.0 lets you easily tie structured and multi-structured data together (logically) where you can join data across environments easily. This particular aspect lets you build your Data Warehouse on multiple platforms while using the most appropriate storage platform to the particular data set. It lets you build a truly distributed Data Warehouse.

6. DV 2.0 has a greater focus on agility with principles of Disciplined Agile Delivery (DAD) embedded in the architecture and approach. Again, being agile was certainly possible with DV 1.0, but it wasn’t a part of the methodology. DV 2.0 is not just “agile ready”, it’s completely agile.

7. DV 2.0 has a very strong focus on both automation and virtualization as much as possible. There are already a couple of automation tools on the market that have Dan’s approval (just ask). Some of them will be at WWDVC15.

It’s real-time ready, cloud ready, NoSQL ready and big data friendly. And practitioners have already had success in all these areas (on real projects not just in the lab).
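The list above does not spell out how a “zero-dependency” load works. One technique commonly associated with DV 2.0 is to derive keys by hashing the business key instead of looking up a sequence number, so each table can compute its own keys without waiting on another load. Here is a minimal sketch of that idea, offered as an illustration rather than the official method; the table layouts, column names and the `hash_key` helper are all hypothetical.

```python
import hashlib

def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from one or more business keys (hypothetical helper)."""
    # Normalise first so the same business key hashes identically on every platform.
    normalised = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()

# Hypothetical source rows; note the inconsistent formatting of the customer number.
customers = [{"customer_no": "C-100", "name": "Acme"}]
orders = [{"order_no": "O-1", "customer_no": " c-100 "}]

def load_hub_customer(rows):
    # The hub load needs nothing except its own source rows.
    return {hash_key(r["customer_no"]): r["customer_no"].strip().upper() for r in rows}

def load_link_order_customer(rows):
    # The link load computes the customer key itself, with no lookup against the hub,
    # so the two loads could run in any order or in parallel.
    return [
        {
            "link_hk": hash_key(r["order_no"], r["customer_no"]),
            "order_hk": hash_key(r["order_no"]),
            "customer_hk": hash_key(r["customer_no"]),
        }
        for r in rows
    ]

hub = load_hub_customer(customers)
link = load_link_order_customer(orders)
```

Because both loads arrive at the same key independently, the link rows still line up with the hub rows afterwards. The same property is what would let keys computed on different platforms be joined later.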

And, as you’ll notice on the agenda, the focus at WWDVC15 will be Data Vault 2.0 with examples of sourcing it from MongoDB, with examples of virtualization (from me!), with examples of design mods (also one from me), with examples of Hadoop implementations and more. It’s not something you want to miss, and there’s hardly any time or seats left.

If you are coming, I look forward to seeing you and chatting about the world of DW/BI and agile. If you want to attend, grab one of the last seats over at http://wwdvc.com/#tile_registration  (if there are still seats left by the time you get this message).

See you soon!

Kent

The Data Warrior

P.S. After the conference, the next place you’ll hear about DV 2.0 is Berlin, Germany, where a boot camp and certification starts on 16th June. The details are here: http://www.doerffler.com/en/data-vault-training/data-vault-2-0-boot-camp-and-certification-berlin/

Better Data Modeling: The Data Warrior Speaks 2015

Great news, I have confirmed three major events, and one local event so far this year where you can come out and hear me speak about some of my favorite topics: #DataModeling, #SQLDevModeler, and #DataVault.

So, line up your training budget and get registered for at least one of these great events.

DAMA Houston

My first talk for the year will be local – downtown Houston. I will present an Introduction to Data Vault Modeling for the Houston Chapter of DAMA International (next week!).

When: 10-Feb-2015
Time: 1pm – 4:30pm
Where: Chevron Building, Rio Grande Room – 51st floor, 1600 Smith, Houston, TX 77002

If you plan to attend, please RSVP directly to stephen.pace@kalido.com.

RMOUG Training Days 2015

Held every year at the Denver Convention Center in mid-February, the Rocky Mountain Oracle Users Group Training Days is the best value around for user group events – low cost ($395–$455), a great location (Denver!), and an excellent speaker lineup (international speakers, Oracle ACEs and ACE Directors).

I will be speaking both Wednesday, February 18, and Thursday, February 19 (last session!). My topics this year will be an Introduction to SQL Developer Data Modeler, and Worst Practices in Data Warehouse Design.

Plus I will be leading Morning Chi Gung exercises at 7 AM both days to get you all warmed up for a great day of learning. Check the entire agenda here.

As a bonus there are some excellent deep dive sessions on Tuesday, February 17th that are not to be missed, so get there early.

New this year will be Special Interest Group (SIG) meetings during lunch on Wednesday. I will be co-leading one on Data Integration & Data Warehousing with Bobby Curtis.

So, lots to do, see and learn. Sign up today (and bring your ski equipment for the weekend after).

2nd Annual World Wide Data Vault Consortium

WWDVC was so successful last year that Dan decided to do it again. This year there is even a cool new website for the event, which will be held May 28-30 in Stowe, Vermont at the Trapp Family Lodge. This will be a small event (fewer than 60 people), with a single track so you won’t have to decide which talk to attend.

Yes, the hills will be alive with the sounds of Data Vault geeks from around the world telling their tales of trials, tribulations, and success as they try to implement large, agile, enterprise data warehouse programs across many industries. Topics include:

  • Big Data, NoSQL
  • Virtualization of Data Marts
  • Data Vault 2.0 & Agility
  • Changing roles of Data Modeling
  • Managed Self-Service BI

The speaker lineup is a who’s-who of the data warehouse and agile world.

Special guests this year include a keynote from Claudia Imhoff, Dan Linstedt, and newest addition Scott Ambler (co-creator of Disciplined Agile Delivery).

I will be there again, giving two talks with my buddy from McKesson, Keith Hoyle. We will discuss Data Warehousing in the Real World and talk about our endeavors to develop Virtualized Hybrid Type 1-2 Dimensions to enable Extreme BI.

Don’t miss this chance to rub elbows and network with the top innovators and thinkers in the data warehouse and BI space. Sign up soon as there are limited slots and limited rooms at the inn.

ODTUG KScope15

Another amazing annual event, this user group gathering will be a veritable who’s-who in the Oracle community. Again you will find Oracle ACEs and ACE Directors, as well as Oracle Product Managers, all ready and willing to discuss the latest and greatest tools for doing Oracle development work. Check out the amazing list of talks and presenters.

This year it is back to the beach for KScope. It will be held June 21-25 at the Diplomat Resort and Spa on the beach in Hollywood, Florida.

By popular demand, the last day of the conference will be all Deep Dive sessions, so be sure to plan your travel to hang out until the end (and then enjoy the beach!).

I will be giving two talks during the week (same ones as at RMOUG), answering questions on a panel or two, and again running my annual Morning Chi Gung sessions every morning (but this year outside on the beach).

This should be a very educational and relaxing event as it is every year. And it is in a family-friendly location so bring the gang along. You can register today and still get a huge early registration discount.

So what are you waiting for?

See you soon!

Kent

The Data Warrior

P.S. While at these events I do expect to have some limited free time, so if you would like some one-on-one coaching in person, contact me directly at kent <dot> graziano <at> att <dot> net to set up a session.

 

Live from Boulder: Denodo speaks to the #BBBT

Yes, I was in Boulder, Colorado for my first in-person BBBT meeting.

First, what is the BBBT?

BBBT – The Boulder BI Brain Trust

The BBBT was founded by Dr. Claudia Imhoff, an internationally known data warehousing and business intelligence expert who happens to live near Boulder, Colorado. I have known Claudia personally for many years. She did me the honor of critiquing and editing chapters in the very first book I wrote (and definitely made it better).

BBBT HQ Office in Boulder


The BBBT is a group of independent BI analysts, practitioners, authors, experts and consultants that gather periodically to learn about the various vendors and trends in this industry.

The group is by invitation from Claudia, and I was very honored to be asked to join this amazing group earlier this year.

Since I was heading to Denver for some other activities, I decided to go a bit earlier so I could attend the briefing with Denodo and surprise Claudia (since I usually just dial in from Houston). As a bonus I got to partake in some nice little treats Claudia had laid out for us.

Yum!


Who is Denodo?

So this week’s briefing was from Denodo Technologies. They are a privately held global company that provides modern data virtualization software. Here is a bit from their website:

Denodo Technologies, Inc. has redefined data integration to make the delivery of data to the corporate business applications simple.

The Denodo Data Services Platform is an enterprise Data Virtualization, Data Federation and Cloud Data Integration middleware that uses a declarative approach to abstract, unify, federate and understand disparate data sources and systems, supporting multiple acquisition and delivery modes and latency requirements, as well as a rich set of easy to use data transformation, data federation and data mashup capabilities.

Through Data Virtualization and Data Services, Denodo makes virtual data integration more flexible to adapt to the changing business needs and the evolution of the IT infrastructure, more universal to connect to a wider range of internal and external data sources, including the Web data, Cloud data, SaaS applications and less structured sources, and more cost-effective, by radically reducing licenses costs and the need for professional services and support.

Get the rest of the details about the company here.

Our in-person presenters were Suresh Chandrasekaran, Senior VP, and Paul Moxon, Senior Director.

Denodo - Suresh presenting


One key phrase they use is Broad Spectrum Capabilities. Here is a tweet with a nice picture of what they meant by that:

Capabilities of the @denodo #data platform #BBBT pic.twitter.com/UEoIcQOs26

— Jorge García (@jgptec) December 13, 2013

Data Virtualization

Everyone has their own take on what “data virtualization” means. Here is one slide showing what Denodo means:

Denodo Data Virtualization

It is a pretty broad-based definition overall, but it definitely set the context for the discussion. Being a Data Warrior, I am particularly interested in their Common Data Layer (alternatively called a Data Services Layer). Basically they allow you to map any data from nearly any source (and type of source) to what I would call a logical canonical model.

Not easy by any means, but I was reasonably impressed with what I saw of their data modeling tool where you defined not only the logical definition of the “entity” but also defined where that data came from and how it was joined to other sources for virtual integration.

One key point that people need to remember is that even with an easy-to-use, web-based, graphical tool for defining these virtualized data objects, somebody, somewhere, still has to do the very hard work of determining how this data joins together. That is hard enough to do when you are joining relational tables, and it does not get any easier when you throw in NoSQL, unstructured or semi-structured data streams, JSON documents, etc. from sources like Cloudera and MongoDB (to name only a very few).
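To make that point concrete, here is a small sketch of a “virtual” canonical entity assembled on read from one relational source and one set of JSON documents. This is plain Python for illustration only and is not Denodo’s tool or API; the table, document fields and the `customer_orders` function are hypothetical.

```python
import json
import sqlite3

# Hypothetical relational source: a customer table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Hypothetical document source: order documents as a document store might return them.
order_docs = [
    json.loads('{"orderId": "A-1", "cust": {"id": 1}, "total": 250.0}'),
    json.loads('{"orderId": "A-2", "cust": {"id": 1}, "total": 100.0}'),
    json.loads('{"orderId": "B-1", "cust": {"id": 2}, "total": 75.5}'),
]

def customer_orders():
    """Virtual 'CustomerOrder' entity: joined on read, nothing is copied or persisted."""
    names = dict(conn.execute("SELECT id, name FROM customer"))
    for doc in order_docs:
        # The join rule itself (nested cust.id matches customer.id) is the hard part
        # that somebody still had to work out by understanding both sources.
        cust_id = doc["cust"]["id"]
        yield {
            "customer_id": cust_id,
            "customer_name": names.get(cust_id),
            "order_id": doc["orderId"],
            "order_total": doc["total"],
        }

rows = list(customer_orders())
```

The code that stitches the sources together is short. The knowledge encoded in that one join line is what takes the effort, and no tool removes that step.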

While I have not used the Denodo tool myself, their data modeling and mapping tool did look fairly easy to use and navigate.

Virtualization Patterns

One unique thing Denodo has done is define a variety of patterns they have observed with their clients. In turn they have developed best practices and solutions for implementing those patterns efficiently with their tool (of course).

Virtualization Patterns


While Agile BI and Data Warehousing are among the patterns, they are not the only ones.

This is good and progressive thinking (IMO). Most organizations really have a need for more than one of these patterns to truly solve their modern data management issues. There are many sources and uses of data in the modern data landscape, and thinking that addressing only one of these (for example BI) will solve your issues is thinking too much “inside the box”.

I am pleased to see a vendor taking this broad-based view of the situation and working to provide a unified solution platform to help.

So if you are considering building a data services architecture and are looking at data virtualization tools, I would recommend you at least consider Denodo.

And of course stay tuned to the BBBT, and check the archived podcasts to see what other interesting vendors there are out there that you might want to consider.

Keep learning!

Kent

The Oracle Data Warrior
