The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the category “Data Modeling”

The best FREE data modeling tool ever

Yup, I said FREE!

Oracle just released the latest and greatest version of SQL Developer Data Modeler (SDDM), and it is free to the world not only to download but to use in your production environment to develop all your models.

As many of you know, I have been using this tool for several years now and have mentioned it many times on various LinkedIn forums (just search for me and check out my activity). I have used SDDM for both Oracle and SQL Server. For forward engineering and reverse engineering. For conceptual, logical, and physical data models.

I think it is a great tool (even if it was not free).

I loved Oracle Designer and got quite good at that, but once shops stopped buying and using Designer (and Oracle pretty much sun-setted the tool), I suffered for a few years using other tools.

I was a very happy camper when Oracle came out with this new data modeling tool. I am even happier now with the new features they have added.

The one I like the most, so far, is the visual editor they added for defining views. The previous version had a decent declarative approach that allowed you to specify tables, columns, and joins, but you could not really “see” the implied data model.

The newest version of SDDM (version 3.1) adds a visual editor that shows you a diagram of the tables, columns, and joins. So now when you open (or define) a view and click the “query” property button, you get a drag-and-drop interface to build the view and a nice visual diagram.

And the best part is when you upgrade your existing models from previous versions, the old views automatically get diagrammed.

To get the best out of the new version you need to run a one-time utility labeled “Parse Older Style Views”. You can find it on the right-click menu in any diagram with views. It runs very fast and basically reads the SQL for your views, then parses it out so it shows up properly in the diagram.

One nice new feature with the parsed views is that if the underlying tables in the view are part of the same design file (hopefully you did not drop those), then the view object on your diagram will now list those tables below all the columns. This is nice because now I do not have to open the view definition to see which tables the view is pulling from.

The other great new feature is the “Test Query” button on the view property dialog.

No more writing views that do not work. You press the button, specify a database connection to use, then the base query for the view fires.

If there is an error in your syntax, or a table you don’t have access to, you find out immediately.

So gone are the days of writing the view in your modeling tool, logging into SQL*Plus or SQL Developer, testing the view, having it fail, then going back to SDDM to fix it.

Now you can do agile view development! In one tool!

Neat!

Oh, and if the view works, there is a data tab so you can see the actual data the view will produce – live. Right in the data modeling tool.

Pretty cool.

Nice job guys.

Convinced yet? Head over to the Oracle site, download your own copy, and give it a try.

UPDATE 2015: Data Modeler is now up to version 4.1 and going strong. Plus now there is an Oracle Press book available on Amazon: http://www.amazon.com/Oracle-Developer-Modeler-Database-Mastery-ebook/dp/B00VMMR9EA/

And I even have a tips and tricks Kindle book out on SDDM. You can find that here.

Let me know what you think in the blog comments.

Talk to you all later.

Kent

P.S. For all the new features in SDDM 4.1 check out the full list over here.

P.P.S Need training on SDDM? Check out my post about my new workshop.

Is the Data Vault too complex?

This was a very interesting topic that came up on LinkedIn the other day, so I wanted to address it here too.

There seems to be quite a few people who think that Data Vault models are harder to build and manage than what we do today in most shops. So let me explain how I came to learn Data Vault Data Modeling.

Before learning Data Vault, I had successfully built several 3NF OLTP, 3NF DW, and Kimball-style Dimensional data warehouses (and wrote about it with Bill Inmon and Len Silverston in the original Data Model Resource Book).

In other words, I had a reasonable amount of experience in the subject area (data modeling and data warehousing).

I personally found Data Vault extremely easy to learn as a modeling technique (once I took the time to study it a bit). At the time that meant reading the old white papers, attending some lunch & learns with Dan Linstedt and then building a few sample models.

I was definitely skeptical at first (and asked lots of questions at the public lunch & learns). I did not care about MPP, scalability, or many of the other benefits Dan mentioned. I just knew from experience there were a few issues I had seen with the other approaches when it came to building a historical enterprise data store and was hoping Data Vault might be a solution.

In comparison to trying to learn how to design and load a Type 2 slowly changing dimension, Data Vault was a piece of cake (for me anyway).

Once I was convinced, I then introduced the technique to my team in Denver – who had virtually no data warehouse experience.

It was universal – everyone from the modelers to the DBAs to the ETL programmers found the technique very easy to learn.

Our investment: One week of training from Dan for 7 people and 3 or 4 days of follow-on consulting where Dan came in once a month (for a day) to do a QA review on our models and load routines and mentor us on any issues we were having.

Dan did not make much $$ off of us. 😦

Since then, I have found that experienced 3NF modelers pick up the technique in no-time flat.

Why is that?

Because Data Vault relies on solid relational principles, experienced 3NF modelers seem to grasp it pretty fast.

Modelers who only have experience with star schemas, on the other hand, seem to have a bit of a hard time with the approach. For some of them it is a paradigm shift in modeling technique (i.e., feels very unfamiliar – “too many tables and joins”), for others it is almost a dogmatic objection as they were (sadly) taught that dimensional/star was the only “right” way to do data warehousing.

They are just not open to a new approach for any reason (sad but true). 😦

The biggest issue I have seen with clients is a reluctance to try the approach for fear of failure because they don’t personally know anyone (other than me) who has used the approach and because they think it is easier (and cheaper?) to find dimensional modelers.

This happens, even if they agree in concept that Data Vault sounds like a very valid and flexible modeling approach.

As we all know, it takes $$ to train people on star schema design too, so my advice is that if you have a team of people who know 3NF but don’t know dimensional, train them on Data Vault to build your EDW, then hire one or two dimensional modelers to build your end user reporting layer (i.e., data marts) off the Data Vault.

So that’s my 25 cent testimonial. (You get it for free!)

If you want to learn more about Data Vault, check out my presentations on SlideShare or click on the Super Charge book cover (below my picture in the sidebar) to buy the Data Vault modeling book.

Check it out and let me know what you think in the comments. How do we get people over the fear of trying Data Vault?

Talk to you later.

Kent

Data Vault Certification Class – Day 3 (It’s a wrap!)

Well, not so cold today in Montreal. Instead we got very cold rain and snow mix. Yuk. (But I definitely want to come back in the summer!)

This morning, Dan dove into how to load the Data Vault with all-new material he had not taught in the certification class before. We really got lucky by attending this class. I knew most of the concepts, and have implemented most of them, but his new slides are just killer. They really get the point across and cleared up a few points for me too.

Not only do the slides include sample SQL for various parts of the load, and the logical patterns, Dan even demonstrated some working code on his really cool, high-powered laptop. It was great for the class to see the Data Vault concepts put into practice. (And he of course had some more tales to tell)

Cool Phrase for the Day

Short-Circuit Boolean Evaluation: A mathematical concept, which Dan laid on us, that is used to get very fast results from the Data Vault change-detection code. We use it in doing column compares on attribute data to determine if a Satellite row has changed.

In Oracle it looks like this: decode(src_att1, trg_att1, 0, 1) = 1

In ANSI-SQL it is a bit longer but has the same effect:

CASE WHEN (src_att1 IS NULL AND trg_att1 IS NULL) OR src_att1 = trg_att1

THEN 1 ELSE 0 END = 0
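This null-safe compare is easy to verify for yourself. Here is a minimal, runnable sketch using Python's built-in sqlite3 module; the staging and satellite table names and columns are hypothetical, not from Dan's class material:

```python
import sqlite3

# Detect changed Satellite rows by comparing a source attribute to the
# current target attribute with the null-safe CASE predicate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg (bk TEXT, att1 TEXT);
CREATE TABLE sat (bk TEXT, att1 TEXT);
INSERT INTO stg VALUES ('A', 'red'), ('B', NULL), ('C', 'blue');
INSERT INTO sat VALUES ('A', 'red'), ('B', NULL), ('C', 'green');
""")
# Rows where the CASE evaluates to 0 (values differ, treating NULL = NULL
# as a match) are the deltas that need a new Satellite row.
changed = conn.execute("""
SELECT s.bk
FROM stg s
JOIN sat t ON t.bk = s.bk
WHERE CASE WHEN (s.att1 IS NULL AND t.att1 IS NULL)
             OR s.att1 = t.att1
      THEN 1 ELSE 0 END = 0
""").fetchall()
print(changed)  # only 'C' changed ('blue' vs 'green')
```

Note how 'B' does not show up as a change even though both sides are NULL – that is exactly the trap a plain `src_att1 <> trg_att1` compare falls into.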

I have been using this for years (learned it from Dan) but had no idea there was a math term for it.

Okay so I am a geek. 🙂

The Test

After all that cool stuff came the certification test.

Not easy. My hand cramped writing out the answers.

We get our results next week. (Dan has a fun weekend ahead of him doing a bunch of scoring).

I am sure everyone in class will do fine. As I said, they all seemed to get it.

Anyway, the class is over now and I am in a hotel in Vermont (where it is snowing now). I fly back to Houston in the morning.

I had a good week here in the northeast (despite the weather). It was definitely worth the time and money to come for this class. I met some great people, learned a lot, and got to spend time with my good friend Dan.

Watch out Montreal – you are about to be descended upon by a whole new batch of Data Vault experts.

It could change the way you do data warehousing.

Later.

Kent

Data Vault Certification – Day 2

Still really cold here in Montreal…

But the classroom discussions heated up a bit today as we dove deeper into the secrets of the Data Vault. Today we got into the nitty-gritty of things like Point in Time (PIT) tables, Bridge tables, DV loading, DV agility, set logic, and designing for performance. We got into some brand new material that has not been in prior certification classes. All great stuff.

Dan sure knows a lot about a lot of things (hardware, software, operating systems, disk blocks, cache, etc.). His broad knowledge and experience definitely contributed to what is now Data Vault. We got to hear several juicy stories about real world issues he encountered over his illustrious career that led him to architect Data Vault to have all the neat features it has (hint: lots of unnamed 3-letter gov’t agencies were apparently involved). Dan is bound and determined to help as many people as possible avoid the many pitfalls he has seen in data warehouse architectures.

What I learned today:

I FINALLY understand the legitimate use for a Bridge table and why it is a good idea (i.e., better query performance getting data out of a DV). The examples in the class got through to me (this time). It is all spelled out in the book, but now I really do get it.

ETL Golden Rule (for performance improvement):

1 process per target per action (i.e., Insert, Update, Delete)

In other words, don’t make your ETL too complicated by trying to account for everything in one routine. It becomes too complex to maintain over time, and it reduces the amount of parallelization you can do (hence it is SLOWER).

Think about it – new rows get inserted, right? So why waste the cycles forcing them through update logic? In a Data Vault structure it is very easy to separate these out. The SQL is pretty simple actually (but that is another post by itself).
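As a rough illustration of the insert-only idea (this is my sketch, not the exact SQL from the class, and the table and column names are hypothetical), here is a hub load that does nothing but insert new business keys. Because it never touches existing rows, it can run in parallel with other insert-only processes:

```python
import sqlite3

# One process, one target (the hub), one action (INSERT) – no update
# logic mixed in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_customer (customer_bk TEXT);
CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY, load_dts TEXT);
INSERT INTO stg_customer VALUES ('C1'), ('C2'), ('C2'), ('C3');
INSERT INTO hub_customer VALUES ('C1', '2012-01-01');
""")
# Insert only the business keys not already in the hub.
conn.execute("""
INSERT INTO hub_customer (customer_bk, load_dts)
SELECT DISTINCT s.customer_bk, date('now')
FROM stg_customer s
WHERE NOT EXISTS (SELECT 1 FROM hub_customer h
                  WHERE h.customer_bk = s.customer_bk)
""")
keys = [r[0] for r in conn.execute(
    "SELECT customer_bk FROM hub_customer ORDER BY customer_bk")]
print(keys)  # ['C1', 'C2', 'C3'] – 'C1' untouched, new keys appended
```

The same separation applies to every other target: a distinct routine for Satellite inserts, another for end-dating, and so on.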

Dan will be teaching more about this in his upcoming online Implementation training course. Stay tuned for that.

Data Vault Data Load Golden Rule:

100% of the data loaded 100% of the time

That really makes the load process pretty easy and streamlined. It correlates to the Data Vault concept of loading the facts – straight from the source, no modifications. This is why the Data Vault is 100% auditable.

So for Day 3 we will get even more into how to load a Data Vault.

And then there is the TEST (insert dramatic music here).

We get four hours. 60 questions (half of which are essay!).

Guess I better get studying!

Look for my Day 3 wrap up tomorrow night (assuming I can still write by then).

See ya.

Kent

Data Vault Certification Class – Day 1

As promised, here is your update from the 1st day of the certification class…

First let me say it is COLD here. I forgot how truly bone chilling winter in the northeast could be. Glad I brought extra layers. I walked about 10 blocks from the hotel to the conference center this morning (and back tonight). I was definitely awake when I got there! I am sure that walk alone must have burned off a few hundred calories. 🙂

So the class was fun and educational today. We have 9 people attending. All from the Montreal area, except of course me (and Dan). Nice group of people; very into it. One gentleman, Pierre, was part of Dan’s online coaching program and has actually already built a successful data vault. It is really nice being with a group of people who “get it”, are engaged, and want to learn more.

Bits and pieces from today:

The goal of certification:

  1. To validate that we actually understand the Data Vault Model and the standards
  2. To validate that we can actually explain it to someone else
  3. To test us to be sure we can actually apply the rules and standards when we develop a real model

Word of the Day:

Deformity: The URGE to continue “slamming” data into an existing conformed dimension until it simply cannot sustain any further changes. This results in a “deformed” dimension, increased support costs, and likely leads to re-engineering.

Cause: Business saying “But can’t you just add a few more columns to the table? That should be easy right?”

New question to ask:

When you change or re-engineer part of your ETL that scrubs or transforms your source data, do you keep a copy of the original code and structures under some sort of source control? If not, how will you explain to the auditors why the data in the quarterly report changed between runs?

Concept I understand better after today: 

Transaction Links: This is a very special case where you can denormalize Satellite attributes up into the Link (at a previous job we called these SLINKs). You only do this when the transaction values can NEVER, EVER change. Examples of this are GL entries and stock trades. Dan’s examples and explanations today really improved my understanding immensely.
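To make the idea concrete, here is a hypothetical sketch of what a transaction link might look like as a table, with the immutable measures carried on the link itself rather than in a separate Satellite. The names are purely illustrative, not from the class:

```python
import sqlite3

# A transaction link for GL entries: because a posted entry never
# changes, its amount and date can live directly on the link.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE lnk_gl_entry (
    gl_entry_id     INTEGER PRIMARY KEY,
    hub_account_key INTEGER NOT NULL,  -- FK to the account hub
    hub_ledger_key  INTEGER NOT NULL,  -- FK to the ledger hub
    amount          NUMERIC NOT NULL,  -- immutable transaction value
    entry_date      TEXT NOT NULL,
    load_dts        TEXT NOT NULL
)
""")
conn.execute(
    "INSERT INTO lnk_gl_entry VALUES (1, 10, 20, 99.5, '2012-03-01', '2012-03-02')")
rows = conn.execute("SELECT amount FROM lnk_gl_entry").fetchall()
print(rows)  # [(99.5,)]
```

Since the row is never updated, there is no change detection to run against it – one less Satellite to build and load.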

Phrase I coined today in the class:

Data Archaeology: Dan uses the analogy of Data Geology (i.e., layers) to explain how (and why) we load data the way we do in the Data Vault. I said that enables us (architects, analysts, users) to effectively do Data Archaeology to find and extract the data we need. We search for those nuggets of wisdom in the data to help our businesses. Sometimes that data is near the surface and sometimes it is fossilized deep in the historic layers within the Data Vault.

No doubt somebody, somewhere, has said this before, but just in case they haven’t, you heard it here first. 😉

We also had great discussions about Agile BI, virtualization of data marts, in-memory dbs, solid state disks, ontologies, and the future of data warehousing in general. And what data warehouse class would be complete without mentioning Bill Inmon and Dr. Ralph Kimball?

Well, that’s it for now.

Stay tuned for Day 2.

Kent

P.S. As I mentioned yesterday, feel free to leave any questions you might have for Dan in the comments and I will pass them on. Or better still, just go buy the Data Vault book.

Post Navigation