Hardware Sizing Guide

Release: 4.5

Document Version: 4.5-1

InfiniDB Hardware Sizing Guide

March 2014

Copyright © 2014 InfiniDB Corporation. All Rights Reserved.


InfiniDB, the InfiniDB logo and any other product or service names or slogans contained in this document are trademarks of InfiniDB and its suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior written permission of InfiniDB or the applicable trademark holder.


Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of InfiniDB.

InfiniDB may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from InfiniDB, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. The information in this document is subject to change without notice. InfiniDB shall not be liable for any damages resulting from technical errors or omissions which may be present in this document, or from use of this document.

Contents

Overview

Data Size and Data Volumes

Workload and Active Data Set

Memory Requirements

Linear Scale Expectations



Overview

InfiniDB uses a number of techniques to deliver much greater I/O efficiency than a traditional row-based DBMS. This I/O efficiency allows queries to scale linearly with additional processing power without encountering traditional bottlenecks. Cores can be added either by scaling up on today’s multi-core systems or by scaling out across commodity hardware.


Sizing discussions for InfiniDB involve understanding both the business requirements related to the size of the data and the target workload to be processed.


Data size requirements will dictate the drive capacity requirements.

Workload processing requirements will dictate the CPU and memory requirements.

Data Size and Data Volumes

Configuration decisions based on data size determine the capacity of the storage system, and do not necessarily determine the required processing power or the number of servers. InfiniDB supports multiple data volumes, and the number of data volumes can be scaled independently of the number of servers. The number of data volumes configured for InfiniDB should be larger than the planned number of servers.


InfiniDB 2.x runs with each node having full visibility of all disk resources, typically in a SAN configuration. This can be accomplished via a Fibre Channel or other switch providing connectivity to many data volumes.


Each data volume should consist of between 4 and 8 disks in a RAID 10 configuration, delivering 250 to 350 MB/s read throughput per volume. Data volumes can be configured to support between 1 TB and 3 TB of storage. RAID 5 can be used if the load rate requirements are minimal and the load operations don’t overlap with significant query workload.


For example, assuming a 5:1 compression ratio and 2 TB data volumes, 40 TB of raw data compresses to about 8 TB and fills 4 data volumes; provisioning 5-6 data volumes allows for some future growth.
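The volume count in the example above follows from simple arithmetic, which can be sketched as below. This is only an illustration: the function name `data_volumes_needed` and the `growth_factor` parameter are hypothetical and not part of InfiniDB, and the growth allowance of 25% to 50% is an assumption chosen to reproduce the 5-6 volume range.

```python
import math


def data_volumes_needed(raw_tb, compression_ratio, volume_tb, growth_factor=1.0):
    """Estimate how many data volumes are needed to hold the compressed data.

    raw_tb            -- raw (uncompressed) data size in TB
    compression_ratio -- e.g. 5.0 for a 5:1 compression ratio
    volume_tb         -- usable capacity of one data volume in TB
    growth_factor     -- hypothetical headroom multiplier (1.25 = 25% growth)
    """
    compressed_tb = raw_tb / compression_ratio
    return math.ceil(compressed_tb * growth_factor / volume_tb)


# 40 TB raw at 5:1 on 2 TB volumes, with no headroom and with 25% / 50% headroom.
baseline = data_volumes_needed(40, 5.0, 2)
with_25_percent_growth = data_volumes_needed(40, 5.0, 2, growth_factor=1.25)
with_50_percent_growth = data_volumes_needed(40, 5.0, 2, growth_factor=1.5)
print(baseline, with_25_percent_growth, with_50_percent_growth)
```

The result is a capacity estimate only; the number of volumes may still need to be raised so that it exceeds the planned number of servers.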


Advanced SAN systems may be configured with different numbers of drives depending on the SAN architecture; however, the ratio of read performance (250 to 350 MB/s) to volume capacity (1 to 3 TB) should remain consistent. Write-back cache can be used if it is protected by battery or UPS backup.


On Amazon Web Services, configure as many Elastic Block Store (EBS) volumes as practical to maximize overall I/O throughput.


Workload and Active Data Set

Workload requirements for a database system are a function of Query Size, Query Concurrency, and the Active Data Set.


The output of the workload calculations is a metric expressed in blocks touched per second. A modern CPU can typically process between 10,000 and 50,000 block operations per second per core. These figures can be used to project how much workload a configuration can process. Newer CPUs will deliver higher processing capabilities, and processing data from cache will also result in higher block processing rates.
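As a rough sketch of how the per-core figures above translate into a projection, the following multiplies a core count by the 10,000 to 50,000 range and, in the other direction, estimates the cores needed for a target rate. The function names and the 400,000 blocks-per-second workload are hypothetical, chosen only for illustration; they are not InfiniDB tools or measured values.

```python
import math

# Per-core processing range quoted in this guide (block operations per second).
BLOCKS_PER_CORE_LOW = 10_000
BLOCKS_PER_CORE_HIGH = 50_000


def block_capacity_range(cores):
    """Return (low, high) projected block operations per second for `cores` cores."""
    return cores * BLOCKS_PER_CORE_LOW, cores * BLOCKS_PER_CORE_HIGH


def cores_needed(required_blocks_per_second, blocks_per_core):
    """Return the number of cores needed to sustain the required block rate."""
    return math.ceil(required_blocks_per_second / blocks_per_core)


# Projected capacity of a 16-core configuration.
low, high = block_capacity_range(16)

# Cores for a hypothetical workload of 400,000 blocks touched per second,
# at the pessimistic and optimistic ends of the per-core range.
pessimistic_cores = cores_needed(400_000, BLOCKS_PER_CORE_LOW)
optimistic_cores = cores_needed(400_000, BLOCKS_PER_CORE_HIGH)
print(low, high, pessimistic_cores, optimistic_cores)
```

Because the per-core range spans a factor of five, a projection like this is best treated as a bracket to be narrowed by measuring the actual workload on the target hardware.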

Memory Requirements

Memory is critical to both query flexibility and performance with InfiniDB. Memory is used to avoid I/O from storage and to provide space for memory-based join and aggregation operations. The minimum specification for evaluating InfiniDB is 32 GB, but typical sizing for full-scale testing or production use is higher. A significant number of customers go into production with between 96 GB and 512 GB of memory.


Analysis with dimension tables under 1 million rows and minimal need for self-joins of fact tables can be handled with 32 GB of memory. For more complex analysis, large dimension tables, and self-joins of fact tables, a larger memory configuration is recommended. The largest configuration in production has approximately 200 GB allocated for TotalUmMemory.


The recommended amount of memory for the data buffer cache, to avoid I/O from storage, is 20% of the Active Data Set. For example, with 36 months of history and 36 TB of raw data, an Active Data Set of one month would be approximately 1 TB, and a data buffer cache of approximately 200 GB would be appropriate.
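The buffer cache example above can be reproduced with a short calculation. This sketch assumes that data is spread evenly across the months of history and that the Active Data Set covers one month; both are simplifying assumptions, and the function names are hypothetical. Decimal units (1 TB = 1000 GB) are used to match the round figures in the text.

```python
def active_data_set_tb(total_tb, history_months, active_months):
    """Estimate the Active Data Set size, assuming data is spread evenly by month."""
    return total_tb * active_months / history_months


def buffer_cache_gb(active_tb, cache_fraction=0.20, gb_per_tb=1000):
    """Size the data buffer cache as a fraction (20% by default) of the Active Data Set."""
    return active_tb * cache_fraction * gb_per_tb


# 36 TB of data over 36 months, with queries concentrated on one month.
active_tb = active_data_set_tb(total_tb=36, history_months=36, active_months=1)
cache_gb = buffer_cache_gb(active_tb)
print(f"Active Data Set: {active_tb:.1f} TB, buffer cache: {cache_gb:.0f} GB")
```

This covers only the data buffer cache; memory for join and aggregation operations is sized separately.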

Linear Scale Expectations

InfiniDB offers scalable disk resources and scalable processing that allow for linear scaling of queries. In general, doubling the processing power and I/O throughput will deliver twice the query performance, cutting query time in half.


Scaling expectations:


Note that doubling the total data size while the Active Data Set remains the same will not generally impact performance. For example, if the Active Data Set is 1 month but the historical data grows from 36 months to 72 months, query performance will remain consistent.
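The linear scaling expectation described in this section can be expressed as a one-line model. This is an idealized sketch, not a guarantee: the function name is hypothetical, and it assumes processing power and I/O throughput are scaled together by the same factor. Total history size is deliberately not an input, since the text above states that it does not generally affect performance while the Active Data Set stays the same.

```python
def projected_query_seconds(baseline_seconds, resource_scale):
    """Project query time after scaling processing power and I/O throughput.

    baseline_seconds -- measured query time on the current configuration
    resource_scale   -- factor by which cores and I/O throughput are both scaled
                        (2.0 means doubling both)
    """
    if resource_scale <= 0:
        raise ValueError("resource_scale must be positive")
    return baseline_seconds / resource_scale


# A query that takes 60 seconds today, after doubling and quadrupling resources.
doubled = projected_query_seconds(60.0, 2.0)
quadrupled = projected_query_seconds(60.0, 4.0)
print(doubled, quadrupled)
```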