The Data Warrior

Better Data Modeling: 7 Differentiating Characteristics of Data Vault 2.0

Hard to believe that the 2nd Annual World Wide Data Vault Consortium (WWDVC15) is NEXT WEEK in beautiful Stowe, Vermont. It promises to be an excellent event. The speakers include Claudia Imhoff, Dan Linstedt (the inventor of Data Vault), Scott Ambler, Roelant Vos, Dirk Lerner, myself, and many more. The focus will be DV 2.0, agile data warehousing, big data, NoSQL, virtualization, and automation. Check out the agenda here: http://wwdvc.com/schedule/

So in preparation (and to encourage you to attend), I thought it might be good to review some of the important basics about Data Vault 2.0 and why it is a significant evolution for the data warehousing community.

The approach started out under its official name, the Common Foundational Warehouse Modeling Architecture. It then became more commonly known as the “Data Vault” and grew into a modeling method for data warehouses. It also included a methodology with implementation guidelines, and it worked very, very well on relational platforms for many years (over 10 years, for those who did not know).

But technology evolved. NoSQL architectures came into the picture, primarily as sources. The Apache Hadoop platform started offering cheaper MPP-style storage and processing.

Data Vault evolved into Data Vault 2.0, which already has many successful implementations. The original Data Vault is now referred to as Data Vault 1.0 (or DV 1.0), and it primarily has a modeling focus. DV 2.0, on the other hand, changes some things and adds a LOT.

Data Vault 2.0 has the following 7 differentiating characteristics:

1. DV 2.0 is a complete system of Business Intelligence. It covers everything from concept to delivery. While DV 1.0 had a major focus on modeling, and many of the modeling concepts are similar, DV 2.0 goes a step further and addresses data from source to business-user-facing constructs, with guidelines for implementation, agile, virtualization, and more.

2. DV 2.0 can adapt to changes better than pretty much ANY other data warehouse architecture or framework. It can do so even better than DV 1.0 because of changes in the design, most notably the move from sequence-based surrogate keys to hash keys, that let it adapt to NoSQL and MPP platforms if needed. DV 2.0 has also been implemented successfully on MPP RDBMS platforms like Teradata (ask Dan for details).

3. DV 2.0 is both “big data” and “NoSQL” ready. In fact, there are implementations where data is sourced in real time from NoSQL databases, with phenomenal success stories. One of these was presented at WWDVC 2014, where an organization saved a great deal of money by using this architecture.

A near real-time case study on absorbing data from MongoDB is being presented at WWDVC15. It’s not to be missed.

4. DV 2.0 takes advantage of MPP-style platforms and is designed with MPP in mind. While DV 1.0 also did this to an extent, DV 2.0 takes it to a whole other level with a zero-dependency load architecture (see the sketch after this list). Of course, there are a few caveats you will need to learn.

5. DV 2.0 lets you easily tie structured and multi-structured data together (logically), so you can join data across environments easily. This particular aspect lets you build your Data Warehouse on multiple platforms, using the storage platform most appropriate to each particular data set. It lets you build a truly distributed Data Warehouse.

6. DV 2.0 has a greater focus on agility, with principles of Disciplined Agile Delivery (DAD) embedded in the architecture and approach. Again, being agile was certainly possible with DV 1.0, but it wasn’t part of the methodology. DV 2.0 is not just “agile ready”; it’s completely agile.

7. DV 2.0 has a very strong focus on both automation and virtualization, as much as possible. There are already a couple of automation tools on the market that have Dan’s approval (just ask). Some of them will be at WWDVC15.
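
To make the zero-dependency idea concrete, here is a minimal sketch in Python of the DV 2.0-style hash key approach. The function, table, and column names are my own illustration, not from any particular implementation: because each table derives the same key from the business key itself (instead of waiting on a database sequence), hubs, links, and satellites can load in parallel, even on different platforms.

```python
import hashlib

def hash_key(*business_key_parts: str) -> str:
    """Build a DV 2.0-style hash key from one or more business key parts.

    Parts are trimmed, upper-cased, and delimited before hashing so the
    same business key always yields the same key on any platform.
    """
    normalized = "||".join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# A hub load and a satellite load can each compute the key independently,
# with no lookup against the other table and no sequence generator:
customer_hk = hash_key("CUST-1001")      # e.g., hub_customer.customer_hk (illustrative)
same_hk = hash_key(" cust-1001 ")        # a NoSQL feed derives the same key itself
assert customer_hk == same_hk
```

This independence is also what makes characteristic 5 workable: two platforms that never talk to each other during loading still end up with join-compatible keys.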

DV 2.0 is real-time ready, cloud ready, NoSQL ready, and big data friendly. And practitioners have already had success in all these areas (on real projects, not just in the lab).

And, as you’ll notice on the agenda, the focus at WWDVC15 will be Data Vault 2.0, with examples of sourcing data from MongoDB, examples of virtualization (from me!), examples of design mods (also one from me), examples of Hadoop implementations, and more. It’s not something you want to miss, and there is hardly any time (or seats) left.

If you are coming, I look forward to seeing you and chatting about the world of DW/BI and agile. If you want to attend, grab one of the last seats over at http://wwdvc.com/#tile_registration (if there are still seats left by the time you read this).

See you soon!

Kent

The Data Warrior

P.S. After the conference, the next place you’ll hear about DV 2.0 is Berlin. A boot camp and certification starts on 16 June in Berlin, Germany. The details are here: http://www.doerffler.com/en/data-vault-training/data-vault-2-0-boot-camp-and-certification-berlin/

Live from Boulder: Denodo speaks to the #BBBT

Yes, I was in Boulder, Colorado for my first in-person BBBT meeting.

First, what is the BBBT?

BBBT – The Boulder BI Brain Trust

The BBBT was founded by Dr. Claudia Imhoff, an internationally known data warehousing and business intelligence expert who happens to live near Boulder, Colorado. I have known Claudia personally for many years. She did me the honor of critiquing and editing chapters in the very first book I wrote (and definitely made it better).

BBBT HQ Office in Boulder

The BBBT is a group of independent BI analysts, practitioners, authors, experts and consultants that gather periodically to learn about the various vendors and trends in this industry.

The group is by invitation from Claudia, and I was very honored to be asked to join this amazing group earlier this year.

Since I was heading to Denver for some other activities, I decided to go a bit earlier so I could attend the briefing with Denodo and surprise Claudia (since I usually just dial in from Houston). As a bonus I got to partake in some nice little treats Claudia had laid out for us.

Yum!

Who is Denodo?

So this week’s briefing was from Denodo Technologies. They are a privately held global company that provides modern data virtualization software. Here is a bit from their website:

Denodo Technologies, Inc. has redefined data integration to make the delivery of data to the corporate business applications simple.

The Denodo Data Services Platform is an enterprise Data Virtualization, Data Federation and Cloud Data Integration middleware that uses a declarative approach to abstract, unify, federate and understand disparate data sources and systems, supporting multiple acquisition and delivery modes and latency requirements, as well as a rich set of easy to use data transformation, data federation and data mashup capabilities.

Through Data Virtualization and Data Services, Denodo makes virtual data integration more flexible to adapt to the changing business needs and the evolution of the IT infrastructure, more universal to connect to a wider range of internal and external data sources, including the Web data, Cloud data, SaaS applications and less structured sources, and more cost-effective, by radically reducing licenses costs and the need for professional services and support.

Get the rest of the details about the company here.

Our in-person presenters were Suresh Chandrasekaran, Senior VP, and Paul Moxon, Senior Director.

Denodo – Suresh presenting

One key phrase they use is Broad Spectrum Capabilities. Here is a tweet with a nice picture of what they meant by that:

Capabilities of the @denodo #data platform #BBBT pic.twitter.com/UEoIcQOs26

— Jorge García (@jgptec) December 13, 2013

Data Virtualization

Everyone has their own take on what “data virtualization” means. Here is one slide showing what Denodo means:

Denodo Data Virtualization

Pretty broad-based definition overall, but it definitely set the context for the discussion. Being a Data Warrior, I am particularly interested in their Common Data Layer (alternatively called a Data Services Layer). Basically, they allow you to map data from nearly any source (and type of source) to what I would call a logical canonical model.

Not easy by any means, but I was reasonably impressed with what I saw of their data modeling tool, where you define not only the logical definition of the “entity” but also where that data comes from and how it is joined to other sources for virtual integration.

One key point that people need to remember is that even with an easy-to-use, web-based, graphical tool for defining these virtualized data objects, somebody, somewhere, still has to do the very hard work of determining how the data joins together. That is hard enough when you are joining relational tables, and it does not get any easier when you throw in NoSQL, unstructured or semi-structured data streams, JSON documents, etc. from sources like Cloudera and MongoDB (to name only a very few).
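
To illustrate what that hand-defined join work amounts to, here is a minimal sketch in plain Python (deliberately not Denodo’s actual tooling or query language); every table, field, and key name is hypothetical. The point is that the mapping from two very different sources to one logical entity has to be spelled out by a person:

```python
import json
import sqlite3

def customers_from_rdbms(conn: sqlite3.Connection):
    """Relational source: core customer records."""
    for cust_id, name in conn.execute("SELECT cust_id, name FROM customers"):
        yield {"cust_id": cust_id, "name": name}

def preferences_from_documents(raw_docs):
    """Document source: JSON records sharing the same (hypothetical) business key."""
    return {doc["cust_id"]: doc["prefs"] for doc in map(json.loads, raw_docs)}

def virtual_customer_view(conn, raw_docs):
    """The logical 'canonical' Customer entity: a join someone defined by hand."""
    prefs_by_id = preferences_from_documents(raw_docs)
    for row in customers_from_rdbms(conn):
        yield {**row, "prefs": prefs_by_id.get(row["cust_id"], {})}

# Tiny demonstration with in-memory stand-ins for the two sources:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id TEXT, name TEXT)")
conn.execute("INSERT INTO customers VALUES ('C1', 'Acme Corp')")
docs = ['{"cust_id": "C1", "prefs": {"channel": "email"}}']
print(list(virtual_customer_view(conn, docs)))
# [{'cust_id': 'C1', 'name': 'Acme Corp', 'prefs': {'channel': 'email'}}]
```

A tool like Denodo’s wraps this kind of mapping in a graphical interface, but the decision that cust_id is the join key, and what to do when one side is missing, is still human work.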

While I have not used the Denodo tool myself, their data modeling and mapping tool did look fairly easy to use and navigate.

Virtualization Patterns

One unique thing Denodo has done is define a variety of patterns they have observed with their clients. In turn, they have developed best practices and solutions for implementing those patterns efficiently with their tool (of course).

Virtualization Patterns

While Agile BI and Data Warehousing are among the patterns, they are not the only ones.

This is good and progressive thinking (IMO). Most organizations really need more than one of these patterns to truly solve their modern data management issues. There are many sources and uses of data in the modern data landscape, and thinking that addressing only one of them (for example, BI) will solve your issues is thinking too much “inside the box”.

I am pleased to see a vendor taking this broad-based view of the situation and working to provide a unified solution platform to help.

So if you are considering building a data services architecture and are looking at data virtualization tools, I recommend you at least consider Denodo.

And of course stay tuned to the BBBT, and check the archived podcasts to see what other interesting vendors are out there that you might want to consider.

Keep learning!

Kent

The Oracle Data Warrior
