The Data Warrior

Changing the world, one data model at a time. How can I help you?

Archive for the tag “semi-structured data”

Schema-on-what? How to model JSON

It seems hard to believe, but all year, all around the world, I have kept having the same conversation about whether or not we still need data modeling.

I know! Crazy!

Thought we were past that…

As I have said before,

Schema-on-read has the word SCHEMA in it!

So instead of continuing to rant about it, I decided to put together a talk that shows people, graphically, what I mean. It decomposes a few JSON documents, step by step, into real data models. For the sake of the talk I went with the 3NF and Data Vault styles to make my point.
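To make the idea of decomposing a JSON document concrete, here is a minimal Python sketch of the 3NF flavor of that exercise. The document, its field names, and the `decompose` helper are all invented for illustration; they are not the documents or models from the talk. The sketch only assumes that each nested array becomes its own child table carrying the parent's key.

```python
import json

# Hypothetical document, invented for illustration (not taken from the talk).
doc = json.loads("""
{
  "customer_id": 42,
  "name": "Acme Corp",
  "orders": [
    {"order_id": 1001, "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
    {"order_id": 1002, "items": [{"sku": "A1", "qty": 5}]}
  ]
}
""")


def decompose(doc):
    """Split one nested document into three flat row sets linked by keys."""
    # The top-level scalar attributes form the parent row.
    customers = [{"customer_id": doc["customer_id"], "name": doc["name"]}]
    orders, order_items = [], []
    for order in doc["orders"]:
        # Each element of the "orders" array becomes a row that carries
        # the parent's key as a foreign key.
        orders.append({"order_id": order["order_id"],
                       "customer_id": doc["customer_id"]})
        # The nested "items" array becomes a grandchild table; the position
        # in the array supplies a line number for the composite key.
        for line_no, item in enumerate(order["items"], start=1):
            order_items.append({"order_id": order["order_id"],
                                "line_no": line_no,
                                "sku": item["sku"],
                                "qty": item["qty"]})
    return customers, orders, order_items


customers, orders, order_items = decompose(doc)
```

Even in a toy case like this, the keys and relationships were already implied by the document. The modeling step just writes them down.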

This talk has been very well received, so I decided to share it a bit more publicly by posting it here on my blog.

 

Now that you can see how to model JSON, check out my Snowflake ebook on how to easily analyze JSON using SQL.

If you know of any meet-ups or conferences where I should give this talk, please let me know. Or check out my speaking schedule for 2019 and join me at one of the events already on my calendar. (First up is ITOUG in Milano!)

Ciao!

Kent

The Data Warrior & Chief Evangelist at Snowflake

P.S. There was no magic or built-in wizard involved in creating the models. I did it all by hand using Oracle SQL Developer Data Modeler.

 

Snowflake SQL: Making Schema-on-Read a Reality (Part 2)

This is the 2nd of my articles on the Snowflake blog.

In the first article of this series, I discussed the Snowflake data type VARIANT, showed a simple example of how to load a JSON document into a VARIANT column in a table, and then showed how easy it is to query data directly from that data type. In this post I will show you how to access an array of data within the JSON document and how we handle nested arrays. Finally, I will give you an example of an aggregation over data in the JSON structure and show how simple it is to filter your query results by referring to values within an array.
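The three operations named above (unnesting an array inside an array, aggregating, and filtering on an array value) can be sketched in plain Python. This is a stand-in to show the shape of the result, not Snowflake SQL and not the example from the linked post; the document and all field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical document, invented for illustration.
doc = {
    "department": "Analytics",
    "employees": [
        {"name": "Ann", "skills": [{"skill": "SQL", "years": 5},
                                   {"skill": "Python", "years": 3}]},
        {"name": "Bob", "skills": [{"skill": "Java", "years": 4}]},
        {"name": "Cy", "skills": []},
    ],
}


def flatten(doc):
    """Yield one flat row per (employee, skill) pair."""
    for emp in doc["employees"]:          # outer array
        for s in emp["skills"]:           # nested array
            yield {"department": doc["department"],
                   "employee": emp["name"],
                   "skill": s["skill"],
                   "years": s["years"]}


rows = list(flatten(doc))

# Aggregation: total years of experience per employee.
years_by_employee = defaultdict(int)
for r in rows:
    years_by_employee[r["employee"]] += r["years"]

# Filter: employees who have a particular value somewhere in the nested array.
sql_people = sorted({r["employee"] for r in rows if r["skill"] == "SQL"})
```

Note that an employee with an empty `skills` array produces no rows at all in this sketch, much as an inner join would drop a parent with no children. Whether that is the behavior you want is itself a modeling decision.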

Check out the rest of the post here:

Snowflake SQL: Making Schema-on-Read a Reality (Part 2) – Snowflake

Enjoy!

Kent

The Data Warrior

Snowflake SQL: Making Schema-on-Read a Reality (Part 1) 

This is my 1st official post on the Snowflake blog in my new role as their Technical Evangelist. It discusses getting results from semi-structured JSON data using our extensions to ANSI SQL.

Schema? I don’t need no stinking schema!

Over the last several years, I have heard the phrase “schema-on-read” used to explain the benefit of loading semi-structured data into a Big Data platform like Hadoop. The idea is that you can delay data modeling and schema design until long after the data is loaded (so as to not slow down getting your data while waiting for those darn data modelers).

Every time I heard it, I thought (and sometimes said) – “but that implies there is a knowable schema.” So really you are just delaying the inevitable need to understand the structure in order to derive some business value from that data. Pay me now or pay me later.

Why delay the pain?
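As a small illustration of the "knowable schema" point, the sketch below walks a couple of JSON documents and records every path it finds along with the value types seen there. The documents and the `infer_schema` helper are hypothetical, written only for this example; the point is that the structure is sitting in the data whether or not anyone has modeled it yet.

```python
def infer_schema(value, path="$", schema=None):
    """Record each leaf path in a JSON value and the type names seen there."""
    if schema is None:
        schema = {}
    if isinstance(value, dict):
        for key, child in value.items():
            infer_schema(child, f"{path}.{key}", schema)
    elif isinstance(value, list):
        # All elements of an array share one path, marked with [].
        for child in value:
            infer_schema(child, f"{path}[]", schema)
    else:
        schema.setdefault(path, set()).add(type(value).__name__)
    return schema


# Hypothetical documents, invented for illustration.
docs = [
    {"id": 1, "tags": ["a", "b"], "owner": {"name": "Ann"}},
    {"id": "2", "tags": [], "owner": {"name": "Bob", "email": "bob@example.com"}},
]

schema = {}
for d in docs:
    infer_schema(d, schema=schema)

for path in sorted(schema):
    print(path, sorted(schema[path]))
```

In this made-up data, `$.id` turns up as both an integer and a string, and `$.owner.email` appears in only one document. Those are exactly the kinds of questions someone has to answer sooner or later before the data is useful.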

Check out the rest of the post here:

Snowflake SQL: Making Schema-on-Read a Reality (Part 1) – Snowflake

Enjoy!

Kent

The Data Warrior
