July 20, 2013 | abalogh
An Introduction to InfiniDB and Hadoop
Often customers find InfiniDB while evaluating a range of tools to solve their Big Data issues. In fact, InfiniDB is used alongside Hadoop in various ways. We thought it would be helpful to describe the similarities and differences between Hadoop and InfiniDB.
Hadoop has rapidly gained popularity due to its power, scalability and affordability. Hadoop's strengths shine when dealing with semi-structured and unstructured data -- that is, information found in web log files, text documents, social media networks and other data sources that sit outside traditional enterprise data warehouses. That said, by itself, Hadoop cannot and does not solve every Big Data analytics problem.
First, Hadoop is designed to be a batch processing tool. In other words, it executes a specific sequence of operations across a distributed infrastructure. The challenge is that Hadoop currently incurs much the same overhead for small queries as it does for large ones. Thus, small Hadoop queries can take hundreds or thousands of times longer to execute than they would on a special-purpose analytic tool such as InfiniDB.
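To make the overhead argument concrete, here is a minimal sketch of a cost model in which every query pays a fixed startup cost plus a scan cost proportional to the rows it reads. The function names and all of the numbers (startup times, throughput) are assumptions chosen purely for illustration; they are not measurements of Hadoop or InfiniDB.

```python
def query_time(rows, startup_seconds, rows_per_second):
    """Total time = fixed startup overhead + time spent scanning rows."""
    return startup_seconds + rows / rows_per_second

# Hypothetical figures for illustration only, not benchmark results.
BATCH_STARTUP = 30.0         # assumed fixed job-launch overhead (seconds)
INTERACTIVE_STARTUP = 0.05   # assumed overhead of a low-latency engine (seconds)
THROUGHPUT = 1_000_000       # assumed scan rate, identical for both engines

def slowdown(rows):
    """How many times slower the batch engine is for a query of this size."""
    batch = query_time(rows, BATCH_STARTUP, THROUGHPUT)
    interactive = query_time(rows, INTERACTIVE_STARTUP, THROUGHPUT)
    return batch / interactive

for rows in (10_000, 1_000_000, 1_000_000_000):
    print(f"{rows:>13,} rows: batch engine is {slowdown(rows):.1f}x slower")
```

Under these assumed numbers the fixed overhead dominates small queries and becomes negligible for very large ones, which is the pattern the paragraph above describes.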
Second, Hadoop cannot be accessed by SQL directly. Hadoop has firm roots in Java. Although the greater Hadoop ecosystem features tools such as Hive, which is accessed through HiveQL (a SQL-like language), working with Hadoop generally still requires significant Java development experience. Correspondingly, performing Business Intelligence tasks using Hadoop requires developer time to create and modify queries. Because Hadoop developers are in high demand, this unfortunately restricts the viability of Hadoop-based solutions for many enterprises.
For these reasons, while Hadoop can be acceptable for operational uses, it effectively excludes many ad-hoc analytics use cases. It's difficult to ask investigative questions of a dataset when requests can take up to several hours to process.
InfiniDB combines MPP scalability with columnar storage of data on disk to deliver fast query performance. The InfiniDB distributed architecture has been tuned to avoid overhead for small queries, and the columnar architecture delivers a 10x or greater reduction in I/O, accelerating operations from disk and memory.
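The I/O saving from columnar storage comes from reading only the columns a query touches instead of every full row. The sketch below is a simplified back-of-the-envelope calculation, not a description of InfiniDB internals: the table shape, column names and widths are hypothetical, and it ignores compression and block-level effects.

```python
def bytes_read_row_store(num_rows, column_widths):
    """A row store scans every column of every row."""
    return num_rows * sum(column_widths.values())

def bytes_read_column_store(num_rows, column_widths, columns_needed):
    """A column store scans only the columns the query references."""
    return num_rows * sum(column_widths[c] for c in columns_needed)

# Hypothetical table: 50 columns, each 8 bytes wide.
widths = {f"col{i}": 8 for i in range(50)}
# Hypothetical analytic query that touches 5 of those columns.
needed = ["col0", "col1", "col2", "col3", "col4"]
rows = 100_000_000

row_io = bytes_read_row_store(rows, widths)
col_io = bytes_read_column_store(rows, widths, needed)
reduction = row_io / col_io

print(f"row store reads    {row_io:,} bytes")
print(f"column store reads {col_io:,} bytes")
print(f"I/O reduction:     {reduction:.0f}x")
```

With these assumed figures, a query that needs 5 of 50 equally wide columns reads one tenth of the data; the narrower the query relative to the table, the larger the reduction.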
Companies that need fast analytic response times, MySQL familiarity AND access to Hadoop can use the free InfiniDB Hadoop connector. The Hadoop connector allows data transfer between a Hadoop Cluster and an InfiniDB deployment.
Finally, we're proud to share that later in 2013 InfiniDB will integrate directly with Hadoop as a native query engine. Look for more details soon! For more information on using InfiniDB alongside Hadoop, please contact us.