Extremely high disk space consumption

10 posts
testing4tester
Joined: Jan 4 2012
Junior Boarder

Posts: 4

te tes
Extremely high disk space consumption

Hi,
I recently installed InfiniDB to evaluate it as a database for logs. While I'm satisfied with its query performance, I'm very unsatisfied with its disk space consumption.
I created a test table in MySQL and then copied it to InfiniDB using mysqldump and InfiniDB's source function. The table consumes ~75 MB in MySQL and its dump is ~94 MB. However, when I looked at the size of the files created by InfiniDB, I was shocked: over 350 MB. The table itself has ~1.5M rows and three columns (a datetime and two varchars).
Is this normal behaviour for InfiniDB? The example above is just my preliminary test case. In the real application the table will have around 14 columns and hundreds of millions of rows, so this kind of overhead is simply unacceptable.

radams
Joined: Jan 3 2011
Administrator

Posts: 481

Robert Adams
Re: Extremely high disk space consumption

Hi,

Thanks for checking out InfiniDB!

Please review some of the information available on how our database works. The Concepts Guide has a section that describes how we store data. One of the steps we take when creating a table is to allocate space for a minimum of 8 million rows of data.

http://infinidb.org/downloads/cat_view/39-documentation

The table you're testing with may only have 1.5 million rows, but the disk space is sized to store up to 8 million rows.
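As a back-of-the-envelope check, the preallocation can roughly account for the size reported above. This sketch uses the 8-byte token and 64 MB dictionary-extent figures that come up later in this thread; the exact on-disk layout may differ:

```python
ROWS_PREALLOC = 8_000_000  # InfiniDB sizes column files for 8 million rows

# Per-row bytes in each column file. Dictionary varchars store an
# 8-byte token per row in the column file; the strings themselves live
# in separate dictionary extents. (Assumption: both varchar columns in
# the original post exceed the inline-storage limit.)
column_bytes = {"datetime": 8, "varchar20_token": 8, "varchar100_token": 8}

column_files_mb = sum(column_bytes.values()) * ROWS_PREALLOC / 1_000_000
dictionary_mb = 2 * 64  # assume one 64 MB extent minimum per dictionary column

total_mb = column_files_mb + dictionary_mb
print(total_mb)  # 320.0 -- in the ballpark of the ~350 MB observed
```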

Thanks,

Robert
Calpont

testing4tester
Joined: Jan 4 2012
Junior Boarder

Posts: 4

te tes
Re: Extremely high disk space consumption

Dear Robert,

Thank you for your response. However, I still have doubts, and I've just run some more tests.
As you stated, InfiniDB allocates space for 8 million rows, so I filled the database with ~7 million rows of the same type as before and expected similar space consumption. However, the size has grown to over 600 MB (from ~350 MB with 1.5 million rows), while exactly the same table in MySQL takes ~320 MB. Is there any explanation for that?

radams
Joined: Jan 3 2011
Administrator

Posts: 481

Robert Adams
Re: Extremely high disk space consumption

Hi,

InfiniDB uses a token-dictionary strategy to manage strings longer than 8 bytes fixed length (char) or 7 bytes variable length (varchar). Non-dictionary columns are sized for 8 million rows once they exceed 256K rows. Your varchar columns are apparently dictionary columns, and dictionary columns grow as needed, 64 MB at a time.
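The rule for which columns go through the dictionary can be sketched as a small helper (a simplification of the description above; the function name is illustrative, not an InfiniDB API):

```python
def is_dictionary_column(col_type: str, length: int) -> bool:
    """Rough sketch of the rule above: char columns longer than 8 bytes
    and varchar columns longer than 7 bytes are stored via the token
    dictionary; shorter ones are stored inline in the column file."""
    if col_type == "char":
        return length > 8
    if col_type == "varchar":
        return length > 7
    return False  # numeric and date types are never dictionary columns

print(is_dictionary_column("varchar", 20))   # True  -> dictionary
print(is_dictionary_column("varchar", 100))  # True  -> dictionary
print(is_dictionary_column("char", 8))       # False -> inline, 8 bytes
```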

The Enterprise version of InfiniDB does offer compression if you are interested in checking it out.

Thanks,

Robert
Calpont

testing4tester
Joined: Jan 4 2012
Junior Boarder

Posts: 4

te tes
Re: Extremely high disk space consumption

Dear Robert,

Thank you for the details. We're not currently considering the Enterprise version, as the Community edition will most probably be more than sufficient for our needs.

However, I'm still curious about the disk space consumption. As I said, my table has three columns: datetime, varchar(20) and varchar(100). If we go to production with InfiniDB, the structure will be much more complex, with approx. 10 varchar(100) columns. I simply want to avoid a situation where the DB becomes too big.

Correct me if I'm wrong, but do I understand correctly that if a column stores varchar/char values longer than 7 or 8 characters, the table's size grows by 64 MB (per table? per column?) whenever the allocated space is exhausted?

And one more question: is there any way to determine the size of a table (or database) from within idbmysql?

testing4tester
Joined: Jan 4 2012
Junior Boarder

Posts: 4

te tes
Re: Extremely high disk space consumption

Hi,

Just one more question on this topic. What is Logical Data Compression? Is it enabled by default, or does it have to be enabled explicitly?

radams
Joined: Jan 3 2011
Administrator

Posts: 481

Robert Adams
Re: Extremely high disk space consumption

Hi,

Dictionary columns grow as needed, 64 MB at a time, whenever additional space is required.

A table's size cannot be determined using commands in idbmysql.
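One workaround is to measure the extent files on the filesystem instead. A minimal sketch (the data directory shown is an assumption; check your Calpont.xml for the actual DBRoot location on your install):

```python
import os

def dir_size_bytes(path):
    """Sum the sizes of all files under `path`.

    Since table sizes aren't exposed through idbmysql, this walks the
    on-disk data directory and totals the extent files directly."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# Example (hypothetical DBRoot path -- adjust for your install):
# print(dir_size_bytes("/usr/local/Calpont/data1") / 1e6, "MB")
```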

Logical Data Compression means that when a string column has only a few unique values that repeat, each string is stored once and multiple rows reuse it.
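The idea can be illustrated in a few lines (illustrative only; InfiniDB's actual on-disk token format is more involved):

```python
# Each repeated string is stored once in a dictionary, and every row
# holds only a small token referring back to it.
values = ["ERROR", "INFO", "INFO", "ERROR", "INFO", "INFO"]

dictionary = {}   # string -> token
tokens = []       # per-row tokens referencing the dictionary
for v in values:
    if v not in dictionary:
        dictionary[v] = len(dictionary)
    tokens.append(dictionary[v])

print(dictionary)  # {'ERROR': 0, 'INFO': 1} -- each string stored once
print(tokens)      # [0, 1, 1, 0, 1, 1]
```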

Thanks,

Robert
Calpont

vaultec
Joined: Jun 23 2012
Junior Boarder

Posts: 3

Hans Steinmann
Re: Extremely high disk space consumption

Hi there!

Sorry to warm up an old post, but this was the closest to my problem. (And it's my first post too. :)

I have a similar issue: I just started developing with InfiniDB, and I ran out of disk space with my first 10 tables. :(

All tables are identical (85 columns: 10 decimal, 35 int and 40 varchar, ranging from varchar(1) to varchar(18)).

I loaded just 3.2 million rows into those tables in total (0 into the smallest, 260,000 into the largest), and the total size is now 17 GB. I'm not sure how much space was allocated when I first created the tables...

I thought the initial size of every table would be enough for 8 million rows... but I ran out of disk space because of many, many extents. (OK, my mistake, I'll resize my VMware image...)

But is it possible that in the end I'll allocate 30 GB of tablespace for my 2 GB of test data?

Greetings
vaultec

radams
Joined: Jan 3 2011
Administrator

Posts: 481

Robert Adams
Re: Extremely high disk space consumption

Hi vaultec,

The amount of disk space can be calculated based on the table elements being populated:

bigint = 8 bytes
int = 4 bytes
smallint = 2 bytes
tinyint = 1 byte

varchar(8) or larger = 8-byte token + string length
char(9) or larger = 8-byte token + string length

char(1) = 1 byte
char(2) = 2 bytes
char(3-4) = 4 bytes
char(5-8) = 8 bytes

varchar(1) = 2 bytes
varchar(2-3) = 4 bytes
varchar(4-7) = 8 bytes

date = 4 bytes
datetime = 8 bytes

Non-dictionary columns are set to the 8 million row size when they exceed 256K rows. Dictionary columns will grow as needed, 64 MB at a time.
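Putting those per-column costs together, here is a rough sizing sketch (assuming the 8-million-row preallocation and a minimum of one 64 MB extent per dictionary column; the actual extent layout may differ):

```python
# Per-row byte costs for fixed-width types, from the list above.
FIXED_BYTES = {"bigint": 8, "int": 4, "smallint": 2, "tinyint": 1,
               "date": 4, "datetime": 8}

def column_file_bytes(col_type, length=0):
    """Per-row bytes in the column file for one column."""
    if col_type in FIXED_BYTES:
        return FIXED_BYTES[col_type]
    if col_type == "char":
        if length > 8:
            return 8                     # dictionary: 8-byte token
        return {1: 1, 2: 2}.get(length, 4 if length <= 4 else 8)
    if col_type == "varchar":
        if length > 7:
            return 8                     # dictionary: 8-byte token
        return {1: 2}.get(length, 4 if length <= 3 else 8)
    raise ValueError(col_type)

def estimate_mb(columns, rows=8_000_000):
    """Estimate table footprint: column files sized for `rows` rows,
    plus at least one 64 MB extent per dictionary column."""
    col_files = sum(column_file_bytes(t, n) for t, n in columns) * rows
    dict_cols = sum(1 for t, n in columns
                    if (t == "char" and n > 8) or (t == "varchar" and n > 7))
    return col_files / 1e6 + dict_cols * 64

# The original poster's table: datetime, varchar(20), varchar(100)
print(estimate_mb([("datetime", 0), ("varchar", 20), ("varchar", 100)]))
# -> 320.0 MB, close to the ~350 MB reported at the top of the thread
```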

Thanks,

Robert

vaultec
Joined: Jun 23 2012
Junior Boarder

Posts: 3

Hans Steinmann
Re: Extremely high disk space consumption

Hi Robert,

I found your list in a different thread, but I was not able to calculate the size I need from my sample data. In the worst case I'd double the number of bytes I need. But after some thought, looking at my data:

When I have a table with 10 columns of varchar(12) and more than 256,000 rows, the next extent will allocate:

8 bytes for the token + 12 bytes for the data -> 20 bytes per column
20 bytes * 8,000,000 rows (because of the non-dictionary sizing) -> 160 MB per column
160 MB * 10 columns -> 1.6 GB

Or am I making a mistake somewhere? (And by the way, are there any differences when I load the data with DML statements, other than speed?)
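Sanity-checking that arithmetic (with the caveat that in practice the 8-byte tokens go into the column file while the 12-byte strings land in dictionary extents that grow 64 MB at a time, so the real footprint can differ from this lumped estimate):

```python
token_bytes = 8
string_bytes = 12                              # varchar(12) payload
per_col_bytes = token_bytes + string_bytes     # 20 bytes per row per column

per_col_mb = per_col_bytes * 8_000_000 / 1e6   # column sized for 8M rows
total_gb = per_col_mb * 10 / 1000              # ten such columns

print(per_col_mb)  # 160.0 MB per column
print(total_gb)    # 1.6 GB for all ten columns
```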

Greetings
vaultec

Maybe I'll have a look at the Enterprise data compression...