The Elephant in the Data Lake and Snowflake
So is Hadoop finally dead? For many use cases, I think it really is. The cloud and the continued evolution of technology have created newer, better ways of working with data at scale. Check out what Jeff has to say about it!
Jeffrey Jacobs, Consulting Data Architect, Snowflake SnowPro Core Certified
Let’s talk about the elephant in the data lake, Hadoop, and the constant evolution of technology.
Hadoop (symbolized by an elephant) was created to handle massive amounts of raw data beyond the capabilities of existing database technologies. At its core, Hadoop is simply a distributed file system. There are no restrictions on the types of data files that can be stored, but the primary file contents are structured and semi-structured text. “Data lake” and Hadoop have been largely synonymous, but, as we’ll discuss, Snowflake’s cloud data warehouse technology makes it time to break that connection.
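To make "simply a distributed file system" concrete, here is a minimal sketch (mine, not from the original post) using Hadoop's Java FileSystem API. The cluster URI, paths, and sample record are hypothetical placeholders; the point is that HDFS accepts arbitrary bytes with no schema, and that knobs like replication are set manually by the client or administrator, which previews the operational burden discussed next.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSketch {
    public static void main(String[] args) throws Exception {
        // Point the client at a cluster; this URI is a placeholder.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        FileSystem fs = FileSystem.get(conf);

        // HDFS imposes no schema: directories and files hold arbitrary bytes.
        Path dir = new Path("/data/raw/events");
        fs.mkdirs(dir);

        Path file = new Path(dir, "sample.json");
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeBytes("{\"event\":\"click\",\"ts\":1690000000}\n");
        }

        // Replication is a per-file setting managed by hand —
        // one of the administration tasks described below.
        fs.setReplication(file, (short) 3);

        fs.close();
    }
}
```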
Hadoop’s infrastructure requires a great deal of system administration, even in cloud-managed systems. Administration tasks include managing replication, adding nodes, creating directories and partitions, performance tuning, workload management, data (re-)distribution, and more. Core security tools are minimal, often requiring add-ons. Disaster recovery is another major headache. Although Hadoop is considered a “shared nothing” architecture, all…