The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “automation”

Why Column-Aware Metadata Is Key to Automating Data Transformation

So it is 2023, and like everyone, I have a few predictions. Check out my thoughts in this post about automating data transformations using column-aware metadata, and why it is critical to enabling more collaboration and data sharing.

Data, data, data. It does seem we are not only surrounded by talk about data, but by the actual data itself. We are collecting data from every nook and cranny of the universe (literally!). IoT devices in every industry; geolocation information on our phones, watches, cars, and every other mobile device; every website or app we access—all are collecting data. 

In order to derive value from this avalanche of data, we have to get more agile when it comes to preparing the data for consumption. This process is known as data transformation, and while automation in many areas of the data ecosystem has changed the data industry over the last decade, data transformations have lagged behind. 
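To make the idea of column-aware metadata concrete, here is a minimal sketch (my own illustration, not from the linked post) of how column-level metadata can drive transformation code generation. All table names, column names, and rules below are hypothetical.

```python
# Hypothetical column-level metadata: each entry records a source column,
# its target name, and the transformation rule to apply.
COLUMN_METADATA = [
    {"source": "cust_nm",   "target": "customer_name", "rule": "TRIM({col})"},
    {"source": "sgn_up_dt", "target": "signup_date",   "rule": "CAST({col} AS DATE)"},
    {"source": "email",     "target": "email",         "rule": "LOWER({col})"},
]

def generate_select(source_table, metadata):
    """Generate a SQL SELECT statement from column-level metadata."""
    exprs = [
        m["rule"].format(col=m["source"]) + " AS " + m["target"]
        for m in metadata
    ]
    return "SELECT\n  " + ",\n  ".join(exprs) + "\nFROM " + source_table

print(generate_select("raw.customers", COLUMN_METADATA))
```

Because the transformation logic lives in metadata rather than hand-written code, adding or changing a column means editing one metadata entry instead of rewriting SQL, which is what makes this style of automation agile.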

That started to change in 2022, and in 2023 I predict we will see an accelerated adoption of platforms that enable data transformation automation. 

Read the rest of the post here – Why Column-Aware Metadata Is Key to Automating Data Transformation

May 2023 be your most agile year yet!

Kent

The Data Warrior

Why Automation is No Longer a Choice for Your Data Architecture

Back in the saddle again for The Data Warrior! Here is a piece I just did for the folks at WhereScape about one of my favorite topics – Automation!

The world of data has certainly changed, especially over the past several years. In fact, the pandemic accelerated some changes, like the migration to cloud-based data platforms. When everyone needed to be remote, it just made sense to move to the cloud and use a service for your data platform.

Along with that came more data, more data types, and a real business need to move faster. Companies had to adapt very quickly during the pandemic if they wanted to survive. Many did and thrived while others, well, not so much.

As the demand for data continues to grow at unprecedented rates, and as it becomes a non-negotiable asset for organizational success, the requirement to rapidly deliver value from that data (i.e., turn it into information for data-driven decision making) has become an imperative.

So how do we deliver value faster with our data warehouses, data meshes, and enterprise data hubs? Automate, automate, automate.

Check out the full post here – Why Automation is No Longer a Choice for Your Data Architecture

Enjoy!

Kent

The Data Warrior

4 Keys to Succeeding with Agile Data Warehousing in 2016

I have been out giving talks again on using agile methods for data warehouse and business intelligence projects, so I thought it was time for me to share my thoughts about the 4 key elements you need to be successful with an Agile DW project in 2016.

Adopt an Agile Methodology

By this I am talking about Scrum, Kanban, Scrumban, or DAD (Disciplined Agile Delivery), among others.

Go read the blogs, read the books, study these methods. Attend a conference (like Agile Tech in April). Figure out what will work for your organization’s culture and leverage the skills of your staff. One size does not fit all.

In past engagements I have used approaches based primarily on Scrum and Kanban. Both have been very effective once we got our processes down.

If you need/want help, find a good agile coach.

Use an Agile Data Engineering Approach

If you want to develop your data warehouse in an agile, iterative manner, then you need a way to design your EDW repository that lends itself to this approach without causing huge re-engineering pain (i.e., refactoring) in future iterations.

The best way I have found is using the Data Vault modeling approach. It was designed specifically for building data warehouses in this manner. I have written much about this approach and give many talks showing examples of successful agile projects using Data Vault. And there is plenty of material available to help you learn how to do it (see the books on the sidebar of this blog).

Also keep an eye on Dan Linstedt’s Twitter feed and blog for his training classes.

Use Data Warehouse Automation Software

There is no better way to get agile and deliver results fast than to automate as much of your development work as possible. If you use repeatable patterns (like Data Vault) in your design methodology, then it is even easier to automate and greatly reduce your time to market.
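To illustrate why repeatable patterns automate so well, here is a rough sketch of my own (not from any vendor tool): a Data Vault hub always has the same shape (surrogate hash key, business key, load date, record source), so its DDL can be generated from a one-line metadata entry. All names and types below are illustrative assumptions.

```python
# A Data Vault hub follows one fixed pattern, so a simple template plus
# two metadata values is enough to generate its DDL.
HUB_TEMPLATE = """CREATE TABLE hub_{name} (
  {name}_hkey    CHAR(32)     NOT NULL,  -- hash of the business key
  {bk}           VARCHAR(100) NOT NULL,  -- the business key itself
  load_date      TIMESTAMP    NOT NULL,  -- when the row was first loaded
  record_source  VARCHAR(50)  NOT NULL,  -- where the row came from
  PRIMARY KEY ({name}_hkey)
)"""

def hub_ddl(name, business_key):
    """Render hub-table DDL from the repeatable Data Vault hub pattern."""
    return HUB_TEMPLATE.format(name=name, bk=business_key)

print(hub_ddl("customer", "customer_number"))
```

The same idea extends to links and satellites: one template per pattern, driven by metadata, which is essentially what the automation tools discussed below do at scale.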

There are two vendors in the market that I like a lot and have had some experience with. They are WhereScape and AnalytixDS. And both support not only “traditional” approaches to data warehousing (like automating the ETL for a Type 2 Slowly Changing Dimension) but they both also support Data Vault (and both will be at WWDVC 2016).

Which of these tools you might use depends on your approach, your current tools, and your skills.

If you are coming from a more traditional DW paradigm and use ETL tools like Informatica, Talend, or DataStage, then I would recommend you look at AnalytixDS Mapping Manager, which allows you to generate your ETL code from source-to-target mappings.

If you are just getting started or are committed to more of a database-centric approach and want your ETL or ELT code to run in the database, then look at WhereScape’s products.

Both are great companies with knowledgeable people and happy customers.

Your third option is to write your own automation routines. There are many shops doing that as well. Just be sure you have the appropriate skills in house and can allocate the upfront time to get going (a month or so at least).

Deploy on an Agile Data Warehouse Platform

So now that I have learned about Elastic Data Warehousing in the cloud, I can’t imagine trying to do an agile DW project any other way.

Of course I am referring to Snowflake Computing’s DWaaS (data warehouse as a service) offering. Yes, I might be a bit biased since I do work for them now, but…this tech is really good!

From a features perspective, what I am talking about is having a high-powered, easily scalable database that supports BI and analytic workloads and does not require a ton of time to configure and tweak.

Why do I think that is a success criterion? Because I have spent way too many months on way too many “agile” projects waiting to get access to the hardware! Or I get access and we either run out of space (e.g., “we had no idea you needed THAT much storage”) or we can’t properly test production-level loads and queries because the development box does not have enough horsepower.

Taking advantage of the elasticity of the cloud solves both of these problems. The folks at Snowflake have successfully built an RDBMS in the cloud that harnesses these features for data warehouse and analytic workloads, providing the ability to scale both storage and compute resources up and down on demand.

That, along with its many other features, gives me the infrastructure I need to get an agile data warehouse project off the ground almost instantly. And I can do a Data Vault on Snowflake too.

Very cool.

So what do you think? Are you ready to accelerate your team’s performance and adopt an agile approach to data warehousing?

I hope this post gives you a few ideas on how to make that happen.

Model on!

Kent

The Data Warrior


Oracle Designer Lives!

Amazing as it seems, I picked this article up on Twitter today.

An up-to-date, brand NEW article about automating builds of applications from the Oracle Designer repository.

How very agile…

Thanks to all the gang over at AMIS (http://technology.amis.nl/) for keeping the technology alive and for being innovative enough to adapt it for the modern agile development world.

Running Oracle Designer Generation from Ant and Hudson

Introduction

Oracle Designer is a Windows client-server development tool that is meant to be operated manually by a developer. Anyone trying to integrate Designer with an automatic build environment will find that it does not provide an API or a command-line version to kick off any generation automatically.

There is, however, a hook that can be exploited by generating so-called GBU files directly from the Designer Repository. These GBU files are then fed to an executable called dwzrun61.exe, which executes the actual generation of DDL scripts and forms.

This article describes how this can be done, using examples from a real-world situation. It shows how to generate the GBU files, the different strategies that can be followed, and some of the pitfalls you might run into trying to pull this off yourself.

The code of the program we wrote can be found here and is free to be adjusted to fit situations other than ours.

via Running Oracle Designer Generation from Ant and Hudson.
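The batch-generation idea in the excerpt above can be sketched roughly as follows. This is my own illustration, not the AMIS code: collect the GBU files exported from the Designer Repository and build one dwzrun61.exe invocation per file. The install path and the single-argument invocation are assumptions.

```python
# Sketch: turn a directory of exported GBU files into a list of
# dwzrun61.exe commands that a build tool (Ant, Hudson) could run.
from pathlib import Path

def build_commands(gbu_dir, dwzrun=r"C:\orant\bin\dwzrun61.exe"):
    """Build one dwzrun61.exe invocation per GBU file found in gbu_dir.

    The dwzrun path and argument style are hypothetical placeholders.
    """
    return [[dwzrun, str(gbu)] for gbu in sorted(Path(gbu_dir).glob("*.gbu"))]

# In a real Ant/Hudson job, each command would then be executed (e.g. with
# subprocess.run) and its exit code checked so the build fails on a
# generation error.
```

Driving the generation from a script like this is what turns a manually operated client-server tool into something a continuous-build server can run unattended.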

If you want to meet some of the guys from AMIS and pick their brains, be sure to sign up for KScope13 and come meet them live in person.

See you there.

Kent
