I am currently working with a very large table (approx. 300M rows) in SQL Server which is constantly being updated (the table records log entries). The schema is quite inefficient/inelegant in my opinion: it is a very flat structure of about 15 columns, most of them nullable VARCHAR(255), and the table contains an extremely large amount of duplicate data.
As part of setting up a BI/analysis tool, I would like to do two things (in no particular order):
1. Import the data into Calpont's InfiniDB
2. Normalize the database, turning as many of the VARCHAR columns as possible into INTs acting as foreign keys to tables holding the distinct values.
Essentially, for each column this step requires a 'SELECT DISTINCT [column name] INTO [new table]', where [new table] gets an auto-increment primary key column. The transformation should then add a foreign key column to the original table pointing at the matching row in [new table], and finally drop the original [column name] column.
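To make that concrete, here is a minimal T-SQL sketch of the transformation for a single column, assuming a hypothetical log table dbo.LogEntries with a VARCHAR column HostName (both names are placeholders for illustration):

[code]
-- 1. Build a lookup table of distinct values with an auto-increment key.
SELECT DISTINCT HostName
INTO   dbo.HostNames
FROM   dbo.LogEntries
WHERE  HostName IS NOT NULL;

ALTER TABLE dbo.HostNames
    ADD HostNameId INT IDENTITY(1,1) NOT NULL PRIMARY KEY;

-- 2. Add the foreign key column to the original table and populate it.
ALTER TABLE dbo.LogEntries ADD HostNameId INT NULL;

UPDATE le
SET    le.HostNameId = hn.HostNameId
FROM   dbo.LogEntries AS le
JOIN   dbo.HostNames  AS hn ON hn.HostName = le.HostName;

ALTER TABLE dbo.LogEntries
    ADD CONSTRAINT FK_LogEntries_HostNames
    FOREIGN KEY (HostNameId) REFERENCES dbo.HostNames (HostNameId);

-- 3. Drop the original VARCHAR column.
ALTER TABLE dbo.LogEntries DROP COLUMN HostName;
[/code]

On a table this size that is constantly being written to, the UPDATE would presumably need to be done in batches, but the shape of the transformation is the same for each column.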
I've been told on a [url=http://forums.pentaho.org/showthread.php?p=232920#post232920]separate forum[/url] that there is no point in normalizing the data, and that restructuring the schema this way might even degrade performance in InfiniDB. Is this true? Am I better off keeping the data as one large table of VARCHAR values?