Data/Server corruption after bulk importing

kensuda
Ken Suda
Data/Server corruption after bulk importing

In testing, I have come across two strange occurrences, neither of which I have since reproduced. Both happened after doing a bulk import.
1) A summary query began to return incorrect counts, even though selecting the full set of data showed the correct records. For example, select gender, count(*) from data where state='ME' group by gender returned something like 1 for male, but select * from data where state='ME' and gender='male' returned around 150 rows. After deleting a bogus record (one that was neither male nor ME), the queries started returning the correct results.
2) After importing a large data set (20 million rows by 200 columns), a simple query that should have taken a couple of seconds pegged 50+ cores and had not returned after 20 minutes. After restarting the InfiniDB service, the exact same query took 5 seconds.

These two issues combined make me think there is some step I am missing when doing an import.
1) Should the server be shutdown or restarted after doing an import?
2) Is there some "clear cache" or "sync up" command that needs to be called after doing an import?

The sequence I have been following while testing is: drop the tables and database, create a new database and tables, create the import job, and import the data using the bulk loader.
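
In rough shell terms, the sequence looks something like the sketch below; the database name, table definition, job id, and file path here are just placeholders, not my actual schema.

# 1. Drop and recreate the database and table
idbmysql -e "DROP DATABASE IF EXISTS testdb; CREATE DATABASE testdb;"
idbmysql testdb -e "CREATE TABLE data (state CHAR(2), gender CHAR(6)) ENGINE=infinidb;"

# 2. Create the import job for the table (the job id is arbitrary)
/usr/local/Calpont/bin/colxml testdb -t data -j 300

# 3. Bulk load the data file through cpimport using that job
/usr/local/Calpont/bin/cpimport -f STDIN -j 300 < /path/to/data.tbl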

davidhill
david hill
Re: Data/Server corruption after bulk importing

You definitely don't have to do a restart after doing an import.

Are you using any external disk storage for the database, or the local disk?
With external disk storage, there can be a delay in data caching/syncing.

kensuda
Ken Suda
Re: Data/Server corruption after bulk importing

The storage is a local RAID controller/array; no iSCSI or Fibre Channel.

patgade
Patrick Gade
Re: Data/Server corruption after bulk importing

Hi,

I have noticed a similar problem. To recreate it with the simplest example:

1. Create a new table with a single character field
--- create table test_table (test_field char(8)) engine=infinidb;
2. Run colxml and then cpimport to import the data (in my case, 2,716,720 records).
3. Run a summary group by query
--- select test_field, count(*) from test_table group by test_field order by test_field;
--- This works and gives the correct counts
4. Run a simple count using the like operator
--- select count(*) from test_table where test_field like 'foo';
--- This also works and gives the correct count
5. Run a simple count using the = operator
--- select count(*) from test_table where test_field = 'foo';
--- This does not work and returns a count of zero.

After that, insert a bogus record and then delete it:
--- insert into test_table(test_field) values('bogus');
--- delete from test_table where test_field = 'bogus';
Once this is complete, you can run the simple count with the = operator again:
--- select count(*) from test_table where test_field = 'foo';
--- This now works and returns the correct count.

Thanks,

Patrick

davidhill
david hill
Re: Data/Server corruption after bulk importing

Hey Patrick, thanks for the post.

We haven't been able to reproduce the problem. Here are the cpimport script and queries.
We are importing alternating 'aa' and 'foo' rows for our testing; I don't know whether that should make any difference. Can you check our steps to see what we might be doing differently that could explain why we can't reproduce it?
And if you try our steps, do they work for you?

1. Create the table:
create table test_table (test_field char(8)) engine=infinidb;
2. Run colxml and cpimport; here is the cpimport script I used:
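# (job 299 referenced below is assumed to have been created beforehand by the colxml step)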
k=1
while [ $k -le 1500000 ] ; do
echo 'foo|'
echo 'aa|'
k=$(( $k + 1 ))
done | /usr/local/Calpont/bin/cpimport -f STDIN -j 299
3. Ran the queries:
mysql> select test_field, count(*) from test_table group by test_field order by test_field;
+------------+----------+
| test_field | count(*) |
+------------+----------+
| aa         |  1500000 |
| foo        |  1500000 |
+------------+----------+
2 rows in set (0.49 sec)

mysql> select count(*) from test_table where test_field like 'foo';
+----------+
| count(*) |
+----------+
|  1500000 |
+----------+
1 row in set (2.35 sec)

mysql> select count(*) from test_table where test_field = 'foo';
+----------+
| count(*) |
+----------+
|  1500000 |
+----------+
1 row in set (0.13 sec)

mysql>

David Hill
Calpont

patgade
Patrick Gade
Re: Data/Server corruption after bulk importing

Hi,

I tried out your example and I had the same result as you, no problem. I've gone back to the data that originally showed me the problem and I've tried to isolate it further.

I've found it doesn't require any volume of data to reproduce; just 10 rows was enough.

I've also found that when I loaded 20 files with 2-3 million records each and 44 fields (most of them char), the problem was there as well. However, in this case the problem did not go away after an insert and delete, or even after a restart; it seemed to be permanently wrong. It also seemed to affect only character fields; numbers and dates were ok.

Anyway, here is my 10 row test_table.tbl to reproduce:

MCARD|
POS|
POS|
MCARD|
MCARD|
VISA|
POS|
POS|
MCARD|
MCARD|

You can see the results below: LIKE works, but = and IN don't.

mysql> drop table test_table;
Query OK, 0 rows affected (1.37 sec)

mysql> create table test_table(test_field char(8)) engine = infinidb;
Query OK, 0 rows affected (0.61 sec)

mysql> select test_field, count(*) from test_table group by test_field order by test_field;
+------------+----------+
| test_field | count(*) |
+------------+----------+
| MCARD      |        5 |
| POS        |        4 |
| VISA       |        1 |
+------------+----------+
3 rows in set (0.42 sec)

mysql> select count(*) from test_table where test_field = 'MCARD';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.24 sec)

mysql> select count(*) from test_table where test_field = 'POS';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.17 sec)

mysql> use idb
Database changed
mysql> select count(*) from test_table where test_field = 'VISA';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.21 sec)

mysql> select count(*) from test_table where test_field LIKE 'VISA';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.22 sec)

mysql> select count(*) from test_table WHERE test_field IN (select test_field from test_table where test_field LIKE 'VISA');
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.92 sec)

Thanks,

Patrick

davidhill
david hill
Re: Data/Server corruption after bulk importing

Thanks Patrick, we were able to reproduce the problem in-house and I have filed a bug for it:

https://bugs.launchpad.net/infinidb/+bug/614380

patgade
Patrick Gade
Re: Data/Server corruption after bulk importing

Cool, thanks. Just to reiterate, the problem goes away when you perform inserts and deletes after a single load. However, after multiple loads it seems to become permanent/unfixable, at least in the case of our actual archived transaction table.

Thanks,

Patrick

wweeks
Walt Weeks
Re: Data/Server corruption after bulk importing

Hi Patrick,

The bug is that cpimport is incorrectly setting the extent map min and max values in certain circumstances for chars with length <= 8 and varchars with length <= 7. We will have this fixed in the 1.5.4 patch release due out in early September. In the meantime, the editem utility has an option to clear the min and max entries in the extent map for a given column, which can be used as a workaround.

Here's a one-liner that can be used on Linux to clear the min and max for all of the extent map entries for test_table.test_field.

/usr/local/Calpont/bin/editem -c `idbmysql calpontsys -e "select objectid from syscolumn where columnname='test_field' and tablename='test_table';" | grep -v objectid`

If you run this after each import into test_table it should clear up the problem. It should complete very quickly.
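
For example, a load script could incorporate the workaround roughly like this; the table name, column name, job id, and data file path below are placeholders to adapt to your own schema:

#!/bin/sh
# Example only: table, column, job id, and data file path are placeholders.
TABLE=test_table
COLUMN=test_field
JOB=299

# Bulk load the data as usual
/usr/local/Calpont/bin/cpimport -f STDIN -j $JOB < /path/to/test_table.tbl

# Look up the column's object id and clear its extent map min/max entries
OID=`idbmysql calpontsys -e "select objectid from syscolumn where columnname='$COLUMN' and tablename='$TABLE';" | grep -v objectid`
/usr/local/Calpont/bin/editem -c $OID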

Thanks very much for using InfiniDB and reporting this issue. Please let us know if this works for you.

Thanks,
Walt Weeks

patgade
Patrick Gade
Re: Data/Server corruption after bulk importing

Hi,

Yes, I added the editem command to my load script and it resolved the problem.

Thanks,

Patrick