January 11, 2012 | Nick Ochoa
To SQL or NoSQL for Analytics is Not the Question
If you missed it, one of the biggest Big Data events of 2011 was the NoSQL Now Conference, which took place August 23-25 in San Jose. William McKnight, President of the McKnight Consulting Group, presented on columnar database technologies for Big Data analytics. (He also authored a white paper on best practices for columnar databases, which you can find here.)
As William discussed, the columnar approach dramatically speeds up analytic queries by minimizing disk I/O: a query reads only the columns it references rather than every column of every row. The idea is not new (column-oriented databases have been around since the 1970s), but growing data volumes and the demand for “real time” analytics have spurred the development of columnar and row-column hybrid data engines. The columnar paradigm has proven to be the most efficient approach to Big Data analytics. In fact, columnar databases have become so popular that they are on the cusp of reaching critical mass in the market, rounding out the last phase in visibility and maturity on the Gartner hype cycle.
2011 Gartner Hype Cycle for Data Management
Columnar databases benefit an enterprise in a multitude of ways, making it clear that they’ve pushed past the “slope of enlightenment” and are moving into mainstream adoption. Query performance on large data sets, the obvious value add, has been so much greater than that of OLTP and row-based OLAP databases that a comparison is hardly warranted.
Wow @Calpont @InfiniDB is 85 times faster than @Oracle on a 1B record table. The perfect start to our p.o.c.
25 Aug via web
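To make the I/O argument concrete, here is a minimal Python sketch comparing how many bytes an aggregate over one column has to read under a row layout versus a column layout. The table, its column names, and the byte layout are hypothetical and chosen only for illustration; this is not how InfiniDB or any other product stores data on disk.

```python
import struct

# Hypothetical fact table: (order_id, customer_id, region_id, amount).
# "<iiid" = three 4-byte ints and one 8-byte float, no padding: 20 bytes per row.
ROW_FORMAT = "<iiid"
rows = [(i, i % 100, i % 5, float(i)) for i in range(1000)]

# Row layout: each row is stored whole, so a scan touches every column.
row_store = b"".join(struct.pack(ROW_FORMAT, *r) for r in rows)

# Column layout: each column is stored contiguously on its own.
column_store = {
    "order_id": struct.pack(f"<{len(rows)}i", *(r[0] for r in rows)),
    "customer_id": struct.pack(f"<{len(rows)}i", *(r[1] for r in rows)),
    "region_id": struct.pack(f"<{len(rows)}i", *(r[2] for r in rows)),
    "amount": struct.pack(f"<{len(rows)}d", *(r[3] for r in rows)),
}


def sum_amount_row_store(data):
    """SUM(amount) over the row layout; returns (total, bytes_read)."""
    row_size = struct.calcsize(ROW_FORMAT)
    total = 0.0
    for offset in range(0, len(data), row_size):
        total += struct.unpack_from(ROW_FORMAT, data, offset)[3]
    return total, len(data)


def sum_amount_column_store(columns):
    """SUM(amount) over the column layout; returns (total, bytes_read)."""
    data = columns["amount"]  # only this one column is read
    values = struct.unpack(f"<{len(data) // 8}d", data)
    return sum(values), len(data)


row_total, row_bytes = sum_amount_row_store(row_store)
col_total, col_bytes = sum_amount_column_store(column_store)
print(row_total == col_total, row_bytes, col_bytes)
```

Both functions return the same total, but the column layout reads only the bytes of the `amount` column. The gap widens as tables get wider, and real columnar engines typically add compression on top, which this sketch leaves out.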
Traditional RDBMSs have also struggled with data load performance, another area where columnar databases have picked up the slack. Together, faster queries and faster loads have enabled an expansion of dimensional analyses within the same data warehouse schema (i.e., without splitting the data into separate federated marts) while still performing as needed, at scale. They’ve also enabled improved segmentation of data for analysis (e.g., aggregations).
Earlier this year, I sponsored a primary research study on the use of analytic databases in telco and online media organizations. Over 95% of respondents were familiar with columnar databases, and a good number of them planned to evaluate one for dimensional business analytics and non-dimensional predictive/data mining analytics, either as an augmentation of or a replacement for legacy database systems. (In fact, why build MOLAP cubes on top of columnar-based relational star schemas if the stars will run faster than the cubes, whether sparsely or densely populated? Keep the semantic layers simple.)
So now that you know what columnar databases are and why they are important, what do you know about InfiniDB, and why should you be interested? You may be surprised to learn that InfiniDB is not only an MPP columnar analytic database, but also converts SQL statements (MySQL syntax, to be exact) into map and reduce operations to execute queries of massive size. So not only does it provide the benefit of a 100% columnar design (which is, by the way, more efficient than row-column hybrids from an I/O perspective), it also enables SQL for Big Data (NoSQL = Not only SQL, providing the best of both worlds).
Although not a Hadoop implementation, InfiniDB’s map-reduce-style execution provides query performance and scale in one environment: fully distributed, parallel, and ready “out-of-the-box”. Add automatic partitioning, compression, partition drop, and database tuning, and InfiniDB is one of the easiest high-performance data solutions on the market to own and use.
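For readers unfamiliar with the idea, here is a rough Python sketch of how a SQL aggregate such as `SELECT region, SUM(amount) FROM sales GROUP BY region` can be expressed as a map step and a reduce step over partitioned data. The `sales` table, the partitions, and the function names are all hypothetical; this illustrates the general map-reduce pattern, not InfiniDB's actual query planner or execution engine.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sales data, already split into partitions the way an MPP
# engine might distribute it across nodes. Each row is (region, amount).
partitions = [
    [("east", 10.0), ("west", 5.0)],
    [("east", 2.5), ("north", 7.0)],
    [("west", 1.0), ("north", 3.0)],
]


def map_partition(rows):
    """Map step: compute a partial SUM(amount) per region for one partition."""
    partial = defaultdict(float)
    for region, amount in rows:
        partial[region] += amount
    return dict(partial)


def merge_partials(left, right):
    """Reduce step: combine two sets of per-region partial sums."""
    merged = dict(left)
    for region, subtotal in right.items():
        merged[region] = merged.get(region, 0.0) + subtotal
    return merged


def group_by_sum(parts):
    """Run the map step on every partition, then fold the partial results."""
    return reduce(merge_partials, map(map_partition, parts), {})


result = group_by_sum(partitions)
print(result)
```

Because each map step touches only its own partition, the partitions could be processed in parallel on separate nodes, with only the small per-region subtotals sent on for the final merge.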
With the inevitable rise in Hadoop use, we continue to look for ways to enhance our product and anticipate the needs of data administrators, developers, and users looking to leverage other data environments and tools. This is why Calpont created the InfiniDB-Hadoop data connector, which transfers data between InfiniDB and a Hadoop cluster. It removes the heavy lifting needed to use Hadoop data for low-latency analytics of Big Data, providing a perfect complement to a heterogeneous Hadoop environment.
As William’s presentation and other industry talk at the NoSQL event made clear, Big Data continues to grow bigger and bigger, and with it the need for the right data infrastructure to meet the needs of the enterprise. A “one size fits all” approach to data processing and analytics just won’t work with today’s varied workloads. In fact, Forrester Research indicates that the term we’ve all become familiar with, “Big Data,” will likely soon be replaced by “Any Data”, as ALL data is becoming hugely important for enterprises the world over. With InfiniDB, we’ve got Any Data analytics covered, now and in the future.