October 28, 2013 | Bob Wilkinson

Seeking the Essential Truths of Big Data

The world of “Big Data” is a fascinating place – never before has so much capability to store and analyze data been so accessible and affordable.  And yet, all that capability comes in a dizzying array of overlapping technologies and one-off interfaces.  

Now, I am an engineer and relish creating new and innovative products as much as the next engineer.  However, at the same time I think our calling as engineers should be to seek out those essential truths that simplify rather than complicate.  I suggest that these simple and essential “Big Data” truths are:

  • There will be lots (and lots and lots…) of data put into Hadoop/HDFS.  It’s simply too easy, too cost-effective, and too enabling not to do it.
  • Most of that data is inherently structured – we don’t always store it as such, but I challenge anyone to try to come up with something without any structure – I think you will find it awfully hard.
  • The world already knows how to query structured data – with that little standard called SQL.
  • Faster always wins out over slower – no, this is not an AT&T commercial, but in all seriousness I argue that you can always show value to the enterprise by delivering answers sooner and with less use of resources (all the Hive users out there take note…)

At Calpont, we took these truths and conceived our InfiniDB 4 release.  First, because of the flexible and clean architecture of InfiniDB, we had the potential to integrate natively with HDFS in a manner that achieves very high performance.  Second, we recognized that part of the value of Hadoop is in its openness, so we decided to open-source the core InfiniDB engine.  The result: a world-class analytics capability that is open source, runs natively in Hadoop, and performs 100x – 1000x faster than “native” Hadoop solutions offering some modicum of SQL access.  I am excited because I see whole new classes of valuable business problems being solved that would otherwise languish because existing solutions take too long or cost too much.

Download InfiniDB 4 and try it out for yourself.  Throw a hard problem at it and see if you think we might be on to something too.  We would love to hear about what you find.