The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “data warehouse solution”

Oracle 12c Release 1 is here!

After much anticipation, the new Release 1 of Oracle 12c is finally available!

Get it here:

What does this mean?

Oracle’s groundbreaking In-Memory Database is now available!

As they say… this is BIG. Get more details about In-Memory here:

I am very excited about the possibilities of what we can do with this technology. It will be a game changer, especially for those of us trying to do agile data warehousing and business intelligence.

Now we can talk about having big data in memory – quickly accessible so it can be used.

Anyway, it is a big day in Oracle-land.

Now if I can just get my client to upgrade….



The Oracle Data Warrior

Live from Boulder: Denodo speaks to the #BBBT

Yes, I was in Boulder, Colorado for my first in-person BBBT meeting.

First, what is the BBBT?

BBBT – The Boulder BI Brain Trust

The BBBT was founded by Dr. Claudia Imhoff, an internationally known data warehousing and business intelligence expert who happens to live near Boulder, Colorado. I have known Claudia personally for many years. She did me the honor of critiquing and editing chapters in the very first book I wrote (and definitely made it better).

BBBT HQ Office in Boulder

The BBBT is a group of independent BI analysts, practitioners, authors, experts and consultants that gather periodically to learn about the various vendors and trends in this industry.

Membership is by invitation from Claudia, and I was very honored to be asked to join this amazing group earlier this year.

Since I was heading to Denver for some other activities, I decided to go a bit earlier so I could attend the briefing with Denodo and surprise Claudia (since I usually just dial in from Houston). As a bonus I got to partake in some nice little treats Claudia had laid out for us.



Who is Denodo?

So this week’s briefing was from Denodo Technologies. They are a privately held global company that provides modern data virtualization software. Here is a bit from their website:

Denodo Technologies, Inc. has redefined data integration to make the delivery of data to the corporate business applications simple.

The Denodo Data Services Platform is an enterprise Data Virtualization, Data Federation and Cloud Data Integration middleware that uses a declarative approach to abstract, unify, federate and understand disparate data sources and systems, supporting multiple acquisition and delivery modes and latency requirements, as well as a rich set of easy to use data transformation, data federation and data mashup capabilities.

Through Data Virtualization and Data Services, Denodo makes virtual data integration more flexible to adapt to the changing business needs and the evolution of the IT infrastructure, more universal to connect to a wider range of internal and external data sources, including the Web data, Cloud data, SaaS applications and less structured sources, and more cost-effective, by radically reducing licenses costs and the need for professional services and support.

Get the rest of the details about the company here.

Our in person presenters were Suresh Chandrasekaran, Senior VP, and Paul Moxon, Senior Director.

Denodo - Suresh presenting

One key phrase they use is Broad Spectrum Capabilities. Here is a tweet with a nice picture of what they meant by that:

Capabilities of the @denodo #data platform #BBBT

— Jorge García (@jgptec) December 13, 2013

Data Virtualization

Everyone has their own take on what “data virtualization” means. Here is one slide showing what Denodo means:

Denodo Data Virtualization

It is a pretty broad definition overall, but it definitely set the context for the discussion. Being a Data Warrior, I am particularly interested in their Common Data Layer (alternatively called a Data Services Layer). Basically, they allow you to map data from nearly any source (and type of source) to what I would call a logical canonical model.

It is not easy by any means, but I was reasonably impressed with what I saw of their data modeling tool, in which you define not only the logical definition of the “entity” but also where that data comes from and how it is joined to other sources for virtual integration.
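To make the idea of a common data layer a bit more concrete, here is a minimal Python sketch, assuming two hypothetical sources: a relational CRM table (columns `CUST_NO`, `CUST_NAME`, `EMAIL_ADDR`) and a store of nested JSON documents. Each source gets its own mapping into one canonical `Customer` entity, and consumers only ever see that canonical shape. This is not Denodo's API or modeling tool; every name here is made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Customer:
    """Hypothetical canonical (logical) entity exposed by the common data layer."""
    customer_id: str
    name: str
    email: str
    source: str


def from_crm_row(row: Dict[str, Any]) -> Customer:
    # Hypothetical relational CRM row: CUST_NO, CUST_NAME, EMAIL_ADDR
    return Customer(
        customer_id=str(row["CUST_NO"]),
        name=row["CUST_NAME"].strip(),
        email=row["EMAIL_ADDR"].lower(),
        source="crm",
    )


def from_web_doc(doc: Dict[str, Any]) -> Customer:
    # Hypothetical JSON document with nested profile and contact sections
    return Customer(
        customer_id=str(doc["id"]),
        name=f'{doc["profile"]["first"]} {doc["profile"]["last"]}',
        email=doc["contact"]["email"].lower(),
        source="web",
    )


def virtual_customers(crm_rows: List[dict], web_docs: List[dict]) -> List[Customer]:
    # The "virtual view": one canonical shape, regardless of where the data lives
    return [from_crm_row(r) for r in crm_rows] + [from_web_doc(d) for d in web_docs]
```

The design point is that the per-source mapping functions absorb the differences in shape, so a change in one source means changing one mapping rather than every consumer.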

One key point that people need to remember is that even with an easy-to-use, web-based, graphical tool for defining these virtualized data objects, somebody, somewhere, still has to do the very hard work of determining how this data joins together. That is hard enough to do when you are joining relational tables, but it does not get any easier when you throw in NoSQL, unstructured or semi-structured data streams, JSON documents, etc., from sources like Cloudera and MongoDB (to name only a very few).
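Here is a small illustration of why the join logic is the hard part, again in Python with hypothetical data: the relational side stores account numbers as integers, while the JSON documents store them as zero-padded strings. Nothing matches until someone decides on a normalization rule (the `normalize_key` function below is an assumption for this example, not a standard), and someone still has to decide what to do with the records that never match.

```python
import json
from typing import Dict, List, Tuple


def normalize_key(raw) -> str:
    # Assumed rule: trim whitespace and leading zeros so 123, "123" and "000123" all match
    key = str(raw).strip().lstrip("0")
    return key or "0"


def join_orders_to_accounts(account_rows: List[dict],
                            orders_json: str) -> Tuple[List[dict], List[dict]]:
    # Build a lookup on the relational side, keyed by the normalized account number
    accounts: Dict[str, dict] = {normalize_key(r["acct_no"]): r for r in account_rows}
    joined, unmatched = [], []
    for doc in json.loads(orders_json):
        key = normalize_key(doc["account"]["number"])
        acct = accounts.get(key)
        if acct is None:
            unmatched.append(doc)  # a human still has to decide what these mean
            continue
        joined.append({"acct_no": key, "acct_name": acct["name"],
                       "order_id": doc["orderId"], "total": doc["total"]})
    return joined, unmatched
```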

While I have not used the Denodo tool myself, their data modeling and mapping tool did look fairly easy to use and navigate.

Virtualization Patterns

One unique thing Denodo has done is define a variety of patterns they have observed with their clients. In turn they have developed best practices and solutions for implementing those patterns efficiently with their tool (of course).

Virtualization Patterns

While Agile BI and Data Warehousing are among the patterns, they are not the only ones.

This is good, progressive thinking (IMO). Most organizations really need more than one of these patterns to truly solve their modern data management issues. There are many sources and uses of data in the modern data landscape, and thinking that addressing only one of them (for example, BI) will solve your issues is thinking too much “inside the box”.

I am pleased to see a vendor taking this broad-based view of the situation and working to provide a unified solution platform to help.

So if you are considering building a data services architecture and are evaluating data virtualization tools, I would recommend you at least consider Denodo.

And of course stay tuned to the BBBT, and check the archived podcasts to see what other interesting vendors there are out there that you might want to consider.

Keep learning!


The Oracle Data Warrior

Data Vault vs. The World (1 of 3)

Okay, maybe not “the world” but it does sometimes seem like it.

Even though the Data Vault has been around for well over 10 years now, and has multiple books, videos, and tons of success stories, I am constantly asked to compare and contrast Data Vault to approaches generally accepted in the industry.

What’s up with that?

When was the last time you got asked to justify using a star schema for your data warehouse project?

Or when was that expensive consulting firm ever asked “so what data modeling technique do you recommend for our data warehouse?”

Oh…like never.

Such is the life of the “new guy.” (If you are new to Data Vault, read this first.)

So, over the next few posts, I am going to lay out some of the explanations and justifications I use when comparing Data Vault to other approaches to data warehousing.

The first contestant: Poor man’s ODS vs. Data Vault

This approach entails simply replicating the operational (OLTP) tables to another server for read-only reporting. This could be used as a partial data warehouse solution, using something like Oracle’s GoldenGate to support near-real-time operational reporting with minimal impact on the operational system.

This solution, however, does not adequately support the need for dimensional analysis, nor would it allow for tracking historical changes to the data (beyond any temporal tracking inherent in the OLTP data model).

A big risk of this approach is that as the OLTP structures continue to morph and change over time, reports and other extracts that access the changed structures will, of course, break as soon as the change is replicated to the ODS.
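A minimal sketch of that failure mode, using Python's built-in `sqlite3` module as a stand-in for the replicated ODS. Table and column names are hypothetical, and `ALTER TABLE ... RENAME COLUMN` assumes SQLite 3.25 or later.

```python
import sqlite3

# A report written directly against the replicated OLTP structure (hypothetical names)
ODS_REPORT = "SELECT cust_name, credit_limit FROM customer ORDER BY cust_name"


def demo_schema_drift():
    conn = sqlite3.connect(":memory:")  # stand-in for the replicated ODS
    conn.execute(
        "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT, credit_limit REAL)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                     [(1, "Acme", 5000.0), (2, "Globex", 2500.0)])
    before = conn.execute(ODS_REPORT).fetchall()  # the report works today

    # The OLTP team renames a column; replication faithfully copies the change to the ODS
    conn.execute("ALTER TABLE customer RENAME COLUMN credit_limit TO credit_limit_usd")
    try:
        conn.execute(ODS_REPORT).fetchall()
        error = None
    except sqlite3.OperationalError as exc:  # the report no longer runs
        error = str(exc)
    finally:
        conn.close()
    return before, error
```

Nothing about the data changed here; a single rename in the source was enough to break a report that depends on the replicated structure.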

How does Data Vault handle this?

Data Vault avoids these problems by using structures that are not tightly coupled to any one source system. So as the source systems change, we simply add Satellite and Link structures as needed. In the Data Vault methodology we do not drop any existing structures, so reports will continue to work until we can properly rewrite them to take advantage of the new structures. If totally new data is added to a source, we would probably end up adding new Hubs as well.
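Here is a simplified sketch of that additive approach, again using `sqlite3` with hypothetical table names. It is not a complete Data Vault implementation (there are no Links, and keys and load dates are simplified); it only shows that a new source attribute lands in a new Satellite, the existing report query keeps returning the same answer, and the original Satellite still holds every historical version of the data.

```python
import sqlite3

# Hypothetical Data Vault structures (names are illustrative only)
DDL = """
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,      -- simplified surrogate key
    customer_bk   TEXT NOT NULL UNIQUE,  -- business key from the source
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    load_dts      TEXT NOT NULL,
    name          TEXT,
    credit_limit  REAL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_dts)  -- each change is a new row, so history is kept
);
"""

# Existing report: latest credit limit per customer
VAULT_REPORT = """
SELECT h.customer_bk, s.credit_limit
FROM hub_customer h
JOIN sat_customer_details s ON s.customer_hk = h.customer_hk
WHERE s.load_dts = (SELECT MAX(x.load_dts) FROM sat_customer_details x
                    WHERE x.customer_hk = h.customer_hk)
ORDER BY h.customer_bk
"""


def build_vault() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute("INSERT INTO hub_customer VALUES ('hk1', 'C1', '2013-01-01', 'crm')")
    # A credit-limit change arrives as a new Satellite row rather than an UPDATE
    conn.executemany(
        "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)",
        [("hk1", "2013-01-01", "Acme", 5000.0, "crm"),
         ("hk1", "2013-06-01", "Acme", 7500.0, "crm")],
    )
    return conn


def add_loyalty_satellite(conn: sqlite3.Connection) -> None:
    # The source starts sending a loyalty tier: add a new Satellite and
    # leave existing tables (and the reports that use them) untouched
    conn.execute("""
        CREATE TABLE sat_customer_loyalty (
            customer_hk   TEXT NOT NULL REFERENCES hub_customer (customer_hk),
            load_dts      TEXT NOT NULL,
            loyalty_tier  TEXT,
            record_source TEXT NOT NULL,
            PRIMARY KEY (customer_hk, load_dts)
        )""")
    conn.execute(
        "INSERT INTO sat_customer_loyalty VALUES ('hk1', '2013-07-01', 'GOLD', 'crm')")
```

Running `VAULT_REPORT` before and after `add_loyalty_satellite` returns the same rows, because the change was purely additive.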

An additional advantage is that because Data Vault uses this loosely coupled approach we can load data from multiple sources. If we replicate specific OLTP structures, we would not be able to easily integrate other source system feeds – we would have to build another repository to do the integration (which would likely entail duplicating quite a bit of the data).
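The same idea can be sketched for integration: two hypothetical feeds (a CRM and a billing system) register their business keys in one shared Hub, and each keeps its own Satellite. Table names and the `INSERT OR IGNORE` deduplication rule are assumptions made for illustration, not a prescribed Data Vault loading pattern.

```python
import sqlite3

# Integrated view over both sources, driven by the shared Hub
INTEGRATED_VIEW = """
SELECT h.customer_bk, c.name, b.balance
FROM hub_customer h
LEFT JOIN sat_customer_crm c ON c.customer_bk = h.customer_bk
LEFT JOIN sat_customer_billing b ON b.customer_bk = h.customer_bk
ORDER BY h.customer_bk
"""


def build_integrated_vault(crm_rows, billing_rows) -> sqlite3.Connection:
    """crm_rows: (customer_bk, name) tuples; billing_rows: (customer_bk, balance) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY, record_source TEXT NOT NULL);
        CREATE TABLE sat_customer_crm (customer_bk TEXT NOT NULL, name TEXT);
        CREATE TABLE sat_customer_billing (customer_bk TEXT NOT NULL, balance REAL);
    """)
    for source, rows in (("crm", crm_rows), ("billing", billing_rows)):
        # Each feed registers its business keys in the shared Hub; keys already present are skipped
        conn.executemany("INSERT OR IGNORE INTO hub_customer VALUES (?, ?)",
                         [(bk, source) for bk, _ in rows])
    # Each source keeps its own Satellite, so neither feed's structure constrains the other
    conn.executemany("INSERT INTO sat_customer_crm VALUES (?, ?)", crm_rows)
    conn.executemany("INSERT INTO sat_customer_billing VALUES (?, ?)", billing_rows)
    return conn
```

Because the Hub holds only business keys, a third source could be added later with one more Satellite and no change to the existing ones.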

Don’t get me wrong, there is nothing wrong with using replication tools to build real time operational data stores.

In fact, it is an excellent way to offload your operational reporting from the main production server.

It is a tried and true solution – for a specific problem.

It is, however, not the right solution if you are building an enterprise data warehouse and need to integrate multiple sources or need to report on changes to your data over time.

So let’s use the right tool for the right job.

Data Vault is a newer, better tool.

In the next two posts I will compare Data Vault to the Kimball-style dimensional approach (part 2 of 3) and then to Inmon-style 3NF (part 3 of 3).

Stay tuned.


P.S. Be sure to sign up to follow my blog so you don’t miss the next round of Data Vault vs. The World.

