December 13, 2012 | Nick Ochoa
Hadoop and Analytic Data Platforms Play Nice
Recently, Mark Madsen posted an article titled “What Hadoop Is. What Hadoop Isn’t.” It is a well-thought-out discussion of the place of Hadoop vis-à-vis other data technologies. He describes a challenge that analytic database and platform vendors are feeling in the market: the misperception that Hadoop can deliver it all. The reality is that it can’t do it all efficiently or optimally. From Calpont’s perspective, Hadoop is complementary to InfiniDB (and to other data technologies, such as NoSQL). Hadoop doesn’t offer both the scalability and the fast processing times now required for massive structured analytics, but it is great for general storage, processing, ETL, and custom unstructured and structured analytic needs.
To Mark’s point, database vendors need to figure out how to work with emerging technologies like Hadoop rather than fight against them. We saw this need in 2011, based on customer and prospect feedback, and created a data connector between InfiniDB and the Hadoop cluster. Unlike Sqoop, the connector is bi-directional, enabling organizations to decide where they want to do the analysis. We see the primary need as bringing data into InfiniDB and processing the analytics within it, but every organization has its own requirements, so we provide the flexibility.
While more organizations are using Hadoop for large-scale structured and unstructured data, the sheer volume of data collected often results in a Hadoop implementation that no longer performs at the desired speed for analysis. This can mean a redesign or a great deal of manual tuning, which yields both a sub-optimal maintenance experience and a sub-optimal data environment for analysis.
High-performance analytics requires a high-performance analytic data platform. The Hadoop framework is great for custom development, but it doesn't inherently have the “out-of-the-box” ability to run analytics extremely quickly, as Mr. Madsen points out. We enabled InfiniDB to connect to the Hadoop environment and pull data into InfiniDB so that large-scale analytics can be executed rapidly. And since InfiniDB converts SQL queries into small “map and reduce” jobs, it also provides the scale necessary to accommodate Hadoop Big Data deployments, whether for ad-hoc analytics, data warehousing or predictive analytics. InfiniDB requires no special database tuning, indexing or materialized views, making it an ideal element for a hybrid data architecture.
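To make the “map and reduce” idea concrete, here is a minimal Python sketch of how a simple aggregate query could, in principle, be split into small per-partition jobs whose partial results are then merged. It is an illustration of the general technique only, not InfiniDB's actual implementation; the sales data, the partition layout, and the function names (map_partition, merge_partials, run_query) are all hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical rows split across three data partitions: (region, amount).
partitions = [
    [("east", 100), ("west", 50)],
    [("east", 25), ("north", 70)],
    [("west", 30), ("north", 5)],
]


def map_partition(rows):
    """Map step: compute partial sums per region within one partition."""
    partial = defaultdict(int)
    for region, amount in rows:
        partial[region] += amount
    return dict(partial)


def merge_partials(left, right):
    """Reduce step: combine two partial results into one."""
    merged = dict(left)
    for region, subtotal in right.items():
        merged[region] = merged.get(region, 0) + subtotal
    return merged


def run_query(parts):
    """Roughly: SELECT region, SUM(amount) FROM sales GROUP BY region."""
    # Each map_partition call is independent, so a real engine could run
    # these small jobs in parallel on different nodes.
    partials = [map_partition(rows) for rows in parts]
    return reduce(merge_partials, partials, {})


totals = run_query(partitions)
print(totals)
```

Because each partition is processed independently and only small partial results are merged, this style of execution can scale out by adding partitions and workers, which is the property the paragraph above refers to.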
For more information on connectivity between InfiniDB and Hadoop, visit http://www.calpont.com/products/hadoop-connectivity