So is Hadoop finally dead? For many use cases, I think it really is. The cloud and the continued evolution of technology have created newer, better ways of working with data at scale. Check out what Jeff has to say about it!
Let’s talk about the elephant in the data lake, Hadoop, and the constant evolution of technology.
Hadoop (symbolized by an elephant) was created to handle massive amounts of raw data that were beyond the capabilities of existing database technologies. At its core, Hadoop is simply a distributed file system. There are no restrictions on the types of data files that can be stored, but the primary file contents are structured and semi-structured text. “Data lake” and Hadoop have been largely synonymous, but, as we’ll discuss, it’s time to break that connection with Snowflake’s cloud data warehouse technology.
Hadoop’s infrastructure requires a great deal of system administration, even in cloud-managed systems. Administration tasks include replication, adding nodes, creating directories and partitions, performance, workload management, data (re-)distribution, etc. Core security tools are minimal, often requiring add-ons. Disaster recovery is another major headache. Although Hadoop is considered a “shared nothing” architecture, all…
Maybe not as cool as Star Fleet Academy, but this is pretty cool.
Snowflake and a number of our partners have come together to create the first self-paced, vendor-agnostic online training academy for analytics in the cloud. This academy will get you up to speed on what is happening today in the cloud with respect to data warehousing and analytics so that you can be a leader in your organization.
The Cloud Analytics Academy is a training and certification program for data professionals who want to advance their skills for the technology and business demands of today’s data analytics. It’s a collective industry effort from executives at Snowflake Computing, AWS, Looker, Talend and WhereScape.
The Academy is designed for data professionals of all technical and business levels and backgrounds. You can complete any or all of the following Academy tracks:
Executive Fast Track: Learn the key technologies and techniques to foster an effective cloud analytics team.
Cloud Foundation Track: Become proficient with the fundamental building blocks of cloud analytics.
Modern Data Analytics Track: Learn advanced technical concepts to propel your cloud analytics.
There will be quiz questions at the end of each session to test your retention. By completing the sessions and the quizzes in a track, you will earn the related academy badge. Anyone who completes all three tracks will be certified as a Cloud Analytics Academy Master.
The Academy launches on November 14th with a live keynote from Tom Davenport. You can get all the details and sign up here. (Don’t worry if you miss it; it will be recorded and available for the rest of the year.)
So what are you waiting for? Sign up today and take your career to the next level – above the clouds!
Snowflake Computing is making great strides in the evolution of our Elastic DWaaS in the cloud. Here is a recent update from engineering and product management on our integration with Spark:
This is the first post in an ongoing series describing Snowflake’s integration with Spark. In this post, we introduce the Snowflake Connector for Spark (package available from Maven Central or Spark Packages, source code on GitHub) and make the case for using it to bring Spark and Snowflake together to power your data-driven solutions.
Big Data. NoSQL. The Cloud. Self-service <whatever>.
And Cloud Data Warehousing.
Some of the offerings and solutions are real. Some less so.
Newest on the scene is cloud data warehousing (or data warehousing in the cloud). As with all new tech, there are a variety of offerings out there with different characteristics. To help folks understand the space a bit better, the company I work for (Snowflake Computing) put together a (hopefully) hype-free, vendor-agnostic book on the topic called Cloud Data Warehousing for Dummies, which I blogged about last month. If you have not already gotten a copy and read it, I encourage you to do so soon. I think you will find it very helpful in the coming months as this topic heats up.
It is where data warehousing is going. Period.
But is Cloud Data Warehousing really for real?
I may be biased here (okay, likely), but based on my experience working with Snowflake for over a year now, I have to say yes. Emphatically, yes!
Cloud Data Warehousing is real. It can handle real data and real workloads, to the tune of hundreds of terabytes and even petabytes of structured and semi-structured data, all for a fraction of the cost of traditional on-premises data warehouse solutions and with the ease of administration you expect from a cloud-based SaaS solution.
But, as they say, the proof is in the pudding!
So here are a few proof-points for you from real, live customers, who have been using Snowflake to improve their business outcomes.
AthenaHealth is a leading healthcare services provider (with a network of 85,000 providers and 83 million patients nationwide). So yes, it is possible to have a cloud data warehouse that is secure enough to meet HIPAA requirements for holding PHI (Protected Health Information).
In this video, Adam Weinstein, Executive Director of Analytics & Data Science, explains how AthenaHealth leverages the Snowflake Cloud Data Warehouse service to radically accelerate their reporting with real-time updates, more advanced analytics, and machine learning, while minimizing overhead and maintenance.
Some of the key benefits AthenaHealth experienced using Snowflake:
Ability to work with petabytes of healthcare data
Ability to scale to meet analytic needs both internally and externally
Lower total cost of ownership (TCO) than other options
Ability to support machine learning-based products
Reduction in overhead maintenance thanks to the Snowflake service offering
What I see Snowflake enabling us to deliver to our clients, internal stakeholders and paying customers will be pretty freaking cool!
Iovation is the leading SaaS provider of fraud prevention and multifactor authentication solutions. So needless to say, they know security and they feel very secure with their data in the cloud.
In this video, Kurk Spendlove, Director of Engineering, shares why they switched from Vertica to the Snowflake Cloud Data Warehouse service in order to load semi-structured data directly into the cloud data warehouse and analyze years of data in a matter of minutes.
Some of the key benefits Iovation experienced using Snowflake:
Ability to load semi-structured data directly into Snowflake
Loading schema-less data without having to modify the schema every time the data changes in a new weekly release
Ability to scan through years’ worth of data and have the report back in minutes
Powerful support for new machine learning-based products
Minimal data warehouse management and overhead
I’m a big fan of Snowflake and the people behind it.
Rue La La
Rue La La is a flash sale site with over 18 million members looking for great deals on designer fashion and accessories.
Erick Roesch, Director of BI and Data Warehousing at Rue La La, says:
Snowflake’s separation of compute and storage is just revolutionary!
In this video, he explains how they replaced their legacy data warehouse and Hadoop data lake with a Snowflake Cloud Data Warehouse to merge data sources for fast, data-driven business decisions.
Key benefits Rue La La saw from switching to Snowflake:
Merge different data sources for data-driven insights: a 360-degree view of their customers!
Better targeted marketing and promotions to Rue La La members based on their personalized preferences
Better purchasing decisions for the merchandising and planning department: they can learn more about the context of a product and avoid residual inventory of things that don’t sell
All data in one place in real time: internal and external data feeds (demographic, census, geo-location data)
No admin and infrastructure costs
Streamlined development cycles: traditional development activities and processes become very simple
Sharethrough is the leading global native advertising (adtech) platform. In this short video, listen to the Head of Analytics, Joseph Bates, explain how they were able to drastically reduce query times, streamline complex processes, and build new data pipelines by switching from MySQL to the Snowflake Cloud Data Warehouse.
Some key benefits Sharethrough saw from using Snowflake:
Reduced query times from hours to seconds (before, basic queries took an hour to return)
Streamlined complex processes with minimal cost
“Query that used to take an entire weekend & $1,200 of compute time to run, now in Snowflake runs with bare minimum ETL, 4 lines of SQL in 30 seconds.”
Minimal database administration
The next step will be to see how we can build new data pipelines and meet the demands of our business, and I think Snowflake is unparalleled in this regard.
Cloud Data Warehousing is not just hype
Hopefully you can see from the passion and excitement of these customers that it is not all hype. The promise of the cloud combined with a next-generation SQL-based data warehouse engine is in fact delivering the goods.
I am even more excited about the possibilities now than when I joined a year ago. It is awesome to see what these and other companies are doing to transform their businesses and really challenge the status quo, not only in the data warehousing arena but in big data as well.
If this tech excites you too, please share on social media with any and all who love data and want to change the story for enterprise data warehousing! And don’t forget to follow Snowflake on Twitter @snowflakedb for more customer success stories, upcoming webinars, and product announcements.
This is the time of year when we all make plans and set goals for the new year, right?
So what are you going to do differently this year? How about growing your career by learning something new?
Read a book on an area of tech you are not so familiar with. Cloud perhaps? My recommendation, of course, is to check out the new Cloud Data Warehousing for Dummies book I mentioned in my post last month. Or maybe one of my ebooks listed on the blog sidebar?
Attend a webinar. My favorite user group, ODTUG, has a continuous lineup of FREE webinars throughout the year. You can see the list and sign up here. (I will be giving one next week!)
Attend a conference or meetup. As I mentioned in my post on staying current, nothing beats meeting and learning from folks face-to-face. Plan ahead, budget some time and training money to attend one of the many industry events that happen all year. For some ideas, check out my speaking schedule for 2017 with options around the US.
So what will it be? If possible, try to do at least one of each: read a book, attend a webinar, go to a meetup!
Make 2017 a great year!
The Data Warrior
P.S. If you plan to attend any of my in-person talks, please drop me a line to let me know! And be sure to follow me on Twitter or check my schedule periodically, as I am adding new talks and locations all the time.