The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the category “BBBT”

The Hills were Alive with the Sounds of #DataVault

Yes folks, a few weeks back we held the 2nd Annual World Wide Data Vault Consortium (#WWDVC) at the lovely Trapp Family Lodge outside Stowe, Vermont. What a great venue! Beautiful scenery, near-perfect weather, great food, and great beer (they have their own brewery). Standing on the hillside, it is easy to see why the Von Trapp Family Singers (you know, from “The Sound of Music”) decided to settle here to build their new life in America.

What a view!

Of course the learning and networking were outstanding again. This year was even better than last year.

Why?

  1. Location, location, location
  2. It was May (so much warmer than last year in St Albans – brrrr.)
  3. Dr. Claudia Imhoff gave the keynote! I love her new concept #XDW – the Extended Data Warehouse.
  4. Scott Ambler talked about Agile DW! It takes a Disciplined approach to be agile.
  5. Dan talked about DV 2.0 and Big Data.
  6. Sanjay showed us how he built a DV 2.0 platform on Hadoop.
  7. Multiple, real world case studies of DV 2.0 working in the wild around the globe.
  8. I gave two talks and showed models and code from one of my recent adventures.
  9. Five members of the Boulder BI Brain Trust (#BBBT) in attendance.
  10. We had multiple 30-minute networking sessions between the talks (who does that?). Plenty of time to ask questions and get to know each other.
  11. Three (count ’em 3!) global software vendors with off-the-shelf tools that support the automatic generation of DV 2.0-compliant components. Wow!
  12. BBQ dinner hosted by AnalytixDS. Yum!
  13. Crazy shirt day and contest.
  14. And did I mention three days of face-to-face networking with world-renowned experts? (I got to have lunch with Claudia Imhoff AND Scott Ambler at the same time – a once-in-a-lifetime opportunity.)
  15. Fresh German-style craft beer.
  16. Bavarian pastries from the in-house bakery.
  17. Did I mention the food?
  18. The view.
  19. The hiking. (Good to get outside and exercise after all those sessions.)
  20. The mountain biking (after the conference of course).

As if that were not enough, I was privileged to attend an exclusive workshop/mentoring/Q&A session with Dan the day before the event. He told us about new, as-yet-unpublished DV 2.0 additions and explained in depth the zero-key concept, the right way to use hash keys, the 3 stages of managed self-service BI, and a host of other topics and issues we all wanted feedback on. My brain was tired before the conference even started.

Hint: if you want to get invited to that special session next year, you need to get DV 2.0 certified ASAP. Keep an eye on LearnDataVault for Dan’s teaching and speaking schedule and locations or contact me about setting up a class if you can’t make one of his (I am an authorized DV 2.0 Bootcamp Instructor too).

Bummed out now that you missed all this great learning? Not like I did not warn you!

Well first, you can catch a lot of the action and a bunch of pictures by mining the Twitter stream for #WWDVC. But since I know you are all too busy (or lazy?), here it is for you:

Really wish you were there? Really?

You are in luck because Dan managed to record some of the best sessions on video! The videos and all the PowerPoint presentations are now available, for sale, on the Data Vault learning site. Just check out this offering: WWDVC 2015 Videos. In addition to the videos listed, you get all the other presentation materials from the speakers (including me).

Right now the cost is $499 (yup, more than the conference, but hey, no travel expense). Since you are a loyal reader of my blog, you can get a 20% discount off that by using the coupon code KENT10S during checkout.

Even without the discount, it is more than worth the money. The videos of Claudia’s keynote and Scott Ambler’s talk alone are worth that much. The videos are high quality and both of them are amazing speakers. (FYI – some of the videos are very long and may take a minute or two to load, depending on your internet connection.)

So that is my short review of WWDVC 2015. Glad I was able to be a part of this great event!

Keep your eyes on http://wwdvc.com/ for the announcement of the 2016 event and the call for papers (which will open soon).

See you next year? (Somewhere near Stowe again)

Kent

The Data Warrior

P.S. Dan’s newest book that covers Data Vault 2.0 is now available for pre-order on Amazon. Get a preview of Dan’s new DV 2.0 book.

Better Data Modeling: 7 Differentiating Characteristics of Data Vault 2.0

Hard to believe that the 2nd Annual World Wide Data Vault Consortium (WWDVC15) is NEXT WEEK in beautiful Stowe, Vermont. It promises to be an excellent event. The speakers include me, Claudia Imhoff, Dan Linstedt (the inventor of Data Vault), Scott Ambler, Roelant Vos, Dirk Lerner and many more. The focus will be DV 2.0, agile data warehousing, big data, NoSQL, virtualization and automation. Check out the agenda here: http://wwdvc.com/schedule/

So in preparation (and to encourage you to attend), I thought it might be good to review some of the important basics about Data Vault 2.0 and why it is an important evolution for the data warehousing community.

The approach started out with the official name Common Foundational Warehouse Modeling Architecture. It then became more commonly known as the “Data Vault”, a modelling method for Data Warehouses. It also had a methodology with implementation guidelines and worked very, very well on relational platforms for many, many years (over 10 years for those who did not know).

But technology evolved. NoSQL architectures came into the picture primarily as sources. The Apache Hadoop platform started offering a cheaper storage and processing MPP architecture.

Data Vault evolved into Data Vault 2.0 and already has many successful implementations. The original Data Vault is now referred to as Data Vault 1.0 (or DV 1.0) and it primarily has a modelling focus. DV 2.0 on the other hand changes some things, and adds a LOT.

Data Vault 2.0 has the following 7 differentiating characteristics:

1. DV 2.0 is a complete system of Business Intelligence. It talks about everything from concept to delivery. While DV 1.0 had a major focus on modelling and many of the modelling concepts are similar, DV 2.0 goes a step further and talks about data from source to business user facing constructs with guidelines for implementation, agile, virtualization and more.

2. DV 2.0 can adapt to changes better than pretty much ANY other data warehouse architecture or framework. It can do it even better than DV 1.0 because of the change in design to adapt to NoSQL and MPP platforms, if needed. DV 2.0 has successfully been implemented on MPP RDBMS platforms like Teradata as well (ask Dan for details).

3. DV 2.0 is both “big data” and “NoSQL” ready. In fact, there are implementations where data is sourced in real-time from NoSQL databases with phenomenal success stories. One of these was presented at WWDVC 2014, where an organization saved lots of money by using this architecture.

A near real-time case study for absorbing data from MongoDB is being presented at WWDVC2015. It’s not to be missed.

4. DV 2.0 takes advantage of MPP style platforms and is designed with MPP in mind. While DV 1.0 also did this to an extent, DV 2.0 takes it to a whole other level with a zero-dependency type architecture. Of course, there are a few caveats you will need to learn.

5. DV 2.0 lets you easily tie structured and multi-structured data together (logically) where you can join data across environments easily. This particular aspect lets you build your Data Warehouse on multiple platforms while using the most appropriate storage platform to the particular data set. It lets you build a truly distributed Data Warehouse.

6. DV 2.0 has a greater focus on agility with principles of Disciplined Agile Delivery (DAD) embedded in the architecture and approach. Again, being agile was certainly possible with DV 1.0, but it wasn’t a part of the methodology. DV 2.0 is not just “agile ready”, it’s completely agile.

7. DV 2.0 has a very strong focus on both automation and virtualization as much as possible. There are already a couple of automation tools in the market that have Dan’s approval (just ask). Some of them will be at WWDVC15.

It’s real-time ready, cloud ready, NoSQL ready and big data friendly. And practitioners have already had success in all these areas (on real projects not just in the lab).
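The zero-dependency idea in point 4 is easier to see with a small example. One widely described DV 2.0 technique is to derive each key by hashing the business key, so hubs, links and satellites can be loaded in parallel without first looking up a sequence-generated key in another table. What follows is only an illustrative sketch, not Dan’s reference implementation: the function name `hash_key`, the choice of MD5, the trim-and-uppercase normalization, and the `||` delimiter are all assumptions made for the example.

```python
import hashlib

DELIMITER = "||"  # assumed separator between business key parts


def hash_key(*business_key_parts: str) -> str:
    """Return a deterministic hex key for one or more business key parts."""
    # Normalize so " abc " and "ABC" map to the same key (an assumed rule).
    normalized = [part.strip().upper() for part in business_key_parts]
    return hashlib.md5(DELIMITER.join(normalized).encode("utf-8")).hexdigest()


# A hub key and a link key can each be computed independently, in any order,
# because neither needs a lookup against the other table.
customer_hk = hash_key("cust-001")
order_hk = hash_key("ORD-9")
customer_order_link_hk = hash_key("cust-001", "ORD-9")
```

Because the key depends only on the business key itself, two loaders running on different nodes (or different platforms) arrive at the same value without talking to each other. The caveats mentioned in point 4 still apply; the normalization and hashing rules have to be agreed on everywhere the key is computed.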

And, as you’ll notice on the agenda, the focus at WWDVC15 will be Data Vault 2.0 with examples of sourcing it from MongoDB, with examples of virtualization (from me!), with examples of design mods (also one from me), with examples of Hadoop implementations and more. It’s not something you want to miss, and there’s hardly any time or seats left.

If you are coming, I look forward to seeing you and chatting about the world of DW/BI and agile. If you want to attend, grab one of the last seats over at http://wwdvc.com/#tile_registration  (if there are still seats left by the time you get this message).

See you soon!

Kent

The Data Warrior

P.S. After the conference, the next place you’ll hear about DV 2.0 is Berlin, Germany. There is a bootcamp and certification starting there on 16th June. The details are here: http://www.doerffler.com/en/data-vault-training/data-vault-2-0-boot-camp-and-certification-berlin/

Oracle Data Warrior: 2013 in Review

Happy New Year!

I have been busy relaxing with my family on vacation so I decided not to write a full review this year. It was a busy year in many ways (not just blogging!). If you take a look in the right column, you will see a list of recent posts, top posts, and a complete archive by month. Go crazy…

If you want some highlights and details to see what happened on Oracle Data Warrior, check out this cool report provided by WordPress. Below the report I am also posting the link for Jeff Smith’s SQL Developer year-in-review.

Year in Review Report

The WordPress.com stats helper monkeys prepared a 2013 annual report for my blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 46,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 17 sold-out performances for that many people to see it.

Click here to see the complete report.

Other Reviews

Well, my main man Jeff Smith did a nice write-up for SQL Developer. Check it out here: www.thatjeffsmith.com/archive/2013/12/the-best-of-2013-sql-developer-posts/

I was pretty busy on Twitter this year too. Here is a list of my top tweets: https://twitter.com/KentGraziano/timelines/419701819502104577.

For something fun check out the Vizify video link in my twitter profile.

For me, 2014 will be more adventures in Oracle data land, attending conferences, listening in at the BBBT, working with Dan and Sanjay on DV 2.0 stuff, and working to build and improve data models and data warehouses for my clients.

What will you do in 2014?

Cheers!

Kent G

The Oracle Data Warrior

Merry Christmas!

To all my loyal readers, followers, #BBBTers, #ODTUGers, #ChiGungers, #RMOUGers, #DataVaulters, #TKDers, and fellow Oracle #ACEs and #ACEDs, I want to wish you all a

Very Merry Christmas!

Talk to you soon.
Kent

Live from Boulder: Denodo speaks to the #BBBT

Yes, I was in Boulder, Colorado for my first in-person BBBT meeting.

First, what is the BBBT?

BBBT – The Boulder BI Brain Trust

The BBBT was founded by Dr. Claudia Imhoff, an internationally known data warehousing and business intelligence expert who happens to live near Boulder, Colorado. I have known Claudia personally for many years. She did me the honor of critiquing and editing chapters in the very first book I wrote (and definitely made it better).

BBBT HQ Office in Boulder

The BBBT is a group of independent BI analysts, practitioners, authors, experts and consultants that gather periodically to learn about the various vendors and trends in this industry.

The group is by invitation from Claudia, and I was very honored to be asked to join this amazing group earlier this year.

Since I was heading to Denver for some other activities, I decided to go a bit earlier so I could attend the briefing with Denodo and surprise Claudia (since I usually just dial in from Houston). As a bonus I got to partake in some nice little treats Claudia had laid out for us.

Yum!

Who is Denodo?

So this week’s briefing was from Denodo Technologies. They are a privately held global company that provides modern data virtualization software. Here is a bit from their website:

Denodo Technologies, Inc. has redefined data integration to make the delivery of data to the corporate business applications simple.

The Denodo Data Services Platform is an enterprise Data Virtualization, Data Federation and Cloud Data Integration middleware that uses a declarative approach to abstract, unify, federate and understand disparate data sources and systems, supporting multiple acquisition and delivery modes and latency requirements, as well as a rich set of easy to use data transformation, data federation and data mashup capabilities.

Through Data Virtualization and Data Services, Denodo makes virtual data integration more flexible to adapt to the changing business needs and the evolution of the IT infrastructure, more universal to connect to a wider range of internal and external data sources, including the Web data, Cloud data, SaaS applications and less structured sources, and more cost-effective, by radically reducing licenses costs and the need for professional services and support.

Get the rest of the details about the company here.

Our in-person presenters were Suresh Chandrasekaran, Senior VP, and Paul Moxon, Senior Director.

Denodo - Suresh presenting

One key phrase they use is Broad Spectrum Capabilities. Here is a tweet with a nice picture of what they meant by that:

Capabilities of the @denodo #data platform #BBBT pic.twitter.com/UEoIcQOs26

— Jorge García (@jgptec) December 13, 2013

Data Virtualization

Everyone has their own take on what “data virtualization” means. Here is one slide showing what Denodo means:

Denodo Data Virtualization

Pretty broad-based definition overall, but it definitely set the context for the discussion. Being a Data Warrior, I am particularly interested in their Common Data Layer (alternatively called a Data Services Layer). Basically they allow you to map any data from nearly any source (and type of source) to what I would call a logical canonical model.

Not easy by any means, but I was reasonably impressed with what I saw of their data modeling tool, where you define not only the logical definition of the “entity” but also where that data comes from and how it is joined to other sources for virtual integration.

One key point that people need to remember is that even with an easy-to-use, web-based, graphical tool for defining these virtualized data objects, somebody, somewhere, still has to do the very hard work of determining how this data joins together. That is hard enough to do when you are joining relational tables, but it does not get any easier when you throw in NoSQL, unstructured or semi-structured data streams, JSON documents, etc. from sources like Cloudera and MongoDB (to name only a very few).
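To make that point concrete, here is a minimal sketch in plain Python. It does not use Denodo or any vendor API, and the sample records, field names, and the `normalize_id` rule are all hypothetical. It shows that even a tiny “virtual” join between relational-style rows and JSON documents only works after someone has decided which fields match and how to reconcile their formats.

```python
import json

# Hypothetical relational rows, e.g. fetched from a CUSTOMER table.
customer_rows = [
    {"customer_id": 101, "name": "Acme Corp"},
    {"customer_id": 102, "name": "Globex"},
]

# Hypothetical JSON documents, e.g. read from a document store.
order_docs = [
    '{"order": "A-1", "cust": {"id": "0101"}, "total": 250.0}',
    '{"order": "A-2", "cust": {"id": "0102"}, "total": 75.5}',
    '{"order": "A-3", "cust": {"id": "0999"}, "total": 10.0}',
]


def normalize_id(raw) -> int:
    """Reconcile key formats: the documents store zero-padded strings."""
    return int(str(raw).strip())


def virtual_join(rows, docs):
    """Inner-join rows to documents on the reconciled customer id."""
    by_id = {normalize_id(row["customer_id"]): row for row in rows}
    for doc in map(json.loads, docs):
        customer = by_id.get(normalize_id(doc["cust"]["id"]))
        if customer is not None:  # unmatched documents are skipped
            yield {
                "name": customer["name"],
                "order": doc["order"],
                "total": doc["total"],
            }


joined = list(virtual_join(customer_rows, order_docs))
```

The join logic itself is only a few lines. The hard part is knowing that `cust.id` in the documents is the same thing as `customer_id` in the table, and that one is a zero-padded string while the other is an integer. No graphical tool can work that out for you.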

While I have not used the Denodo tool myself, their data modeling and mapping tool did look fairly easy to use and navigate.

Virtualization Patterns

One unique thing Denodo has done is define a variety of patterns they have observed with their clients. In turn they have developed best practices and solutions for implementing those patterns efficiently with their tool (of course).

Virtualization Patterns

While Agile BI and Data Warehousing are among the patterns, they are not the only ones.

This is good and progressive thinking (IMO). Most organizations really have a need for more than one of these patterns to truly solve their modern data management issues. There are many sources and uses of data in the modern data landscape, and thinking that addressing only one of these (for example BI) will solve your issues is thinking too much “inside the box”.

I am pleased to see a vendor taking this broad-based view of the situation and working to provide a unified solution platform to help.

So if you are considering building a data services architecture and are looking at data virtualization tools, I would recommend you at least consider Denodo.

And of course stay tuned to the BBBT, and check the archived podcasts to see what other interesting vendors there are out there that you might want to consider.

Keep learning!

Kent

The Oracle Data Warrior
