select speed when retrieving data.

8 posts / 0 new
Last post
rmaniac
rmaniac's picture
Offline
Last seen: 1 year 3 months ago
Joined: May 12 2012
Junior Boarder

Posts: 6

Brian Sutherland
select speed when retrieving data.

We have been evaluating the InfiniDB Community Edition. We have been running PostgreSQL and are currently running both on identical servers. When I run any sort of count on the InfiniDB machine it will generally beat PostgreSQL hands down. This works well for getting quick information out of the database and we were planning to standardize on InfiniDB.

The problem is, when I try to do a match and retrieve data PostgreSQL is much faster. At first I assume this might have something to do with using the substring function, so I took a common substring we use and put it in it's own column. The column is 16 char long, and I just select where column = 'something' on it. Even with this column set up, running about 150k selects takes some 30 minutes on PostgreSQL while checking about 500 million rows, but I have allow InfiniDB to run all night and into the next day without finishing.

Is there a setting I need to be looking at or is InfiniDB just not build to handle rapid fire queries like that? I am running them one at a time on each DB.

Thanks

radams
radams's picture
Offline
Last seen: 15 hours 45 min ago
Joined: Jan 3 2011
Administrator

Posts: 481

Robert Adams
Re: select speed when retrieving data.

Hi,

You can post the query being executed along with the schema(create table statements) and we can determine if we can provide a recommendation. A string function is going to be slower than a numeric filter.

Thanks,

Robert
Calpont

rmaniac
rmaniac's picture
Offline
Last seen: 1 year 3 months ago
Joined: May 12 2012
Junior Boarder

Posts: 6

Brian Sutherland
Re: select speed when retrieving data.

Sure, I am selecting like this:

SELECT email FROM customer WHERE individual = $key LIMIT 1

$key is first initial, 5 of last, 5 of address and 5 of zip.

Homer Simpson
742 Evergreen Terrace
Springfield, ST 90701

Would be:
"HSIMPS742 EV90701"

The tables are something like this:

CREATE TABLE IF NOT EXISTS `customer` (
`first_name` varchar(30) DEFAULT NULL,
`last_name` varchar(30) DEFAULT NULL,
`address` varchar(50) DEFAULT NULL,
`address2` varchar(50) DEFAULT NULL,
`city` varchar(30) DEFAULT NULL,
`state` char(2) DEFAULT NULL,
`zip5` char(5) DEFAULT NULL,
`zip4` char(4) DEFAULT NULL,
`dob` char(15) DEFAULT NULL,
`gender` char(1) DEFAULT NULL,
`home_phone` bigint(20) DEFAULT NULL,
`email` varchar(75) DEFAULT NULL,
`ip` varchar(15) DEFAULT NULL,
`url` varchar(100) DEFAULT NULL,
`record_date` datetime DEFAULT NULL,
`interest` double DEFAULT NULL,
`tobacco_use` varchar(5) DEFAULT NULL,
`vin` varchar(20) DEFAULT NULL,
`weight` smallint(6) DEFAULT NULL,
`amount_earned` smallint(6) DEFAULT NULL,
`amount_per_week` smallint(6) DEFAULT NULL,
`ownership` varchar(5) DEFAULT NULL,
`title` varchar(5) DEFAULT NULL,
`filename` varchar(70) DEFAULT NULL,
`coreg_id` bigint(20) DEFAULT NULL,
`individual` varchar(16) DEFAULT NULL
) ENGINE=InfiniDB DEFAULT CHARSET=latin1;

Before matching against 'individual' I was just using substring in sql to match each field.

So, in postgreSQL I use this:

select email from customer where substring(last_name,1,5) = ? and substring(address,1,5) = ? and zip5 = ? limit 1

and it is MUCH faster. It takes maybe 30min to run 150k against the same tables.

InfiniDB was stated at "Mon Aug 6 03:00:53 EDT 2012" and ran till "Mon Aug 13 15:35:57 EDT 2012" when I just cancelled it. It was done with 135k at that point. I have no idea what could be going on. Both machines are the same and are only running the dbs.

radams
radams's picture
Offline
Last seen: 15 hours 45 min ago
Joined: Jan 3 2011
Administrator

Posts: 481

Robert Adams
Re: select speed when retrieving data.

Hi,

The InfiniDB engine is going to process the query and send the results to the mysql front end where the limit is going to be applied. Not sure what your trying to accomplish but using a limit like this on InfiniDB is not going to be a good process.

Thanks,

Robert

rmaniac
rmaniac's picture
Offline
Last seen: 1 year 3 months ago
Joined: May 12 2012
Junior Boarder

Posts: 6

Brian Sutherland
Re: select speed when retrieving data.

I am confused, are you complementing my use on limits or did you mean it would not be good and are making a suggestion?

I am simply trying to pull the first record that matches the listed criteria. Normally putting a limit of 1 will stop it from doing more than it has to. I can try without, of course.

mikesloper
mikesloper's picture
Offline
Last seen: 1 year 11 months ago
Joined: Aug 20 2012
Junior Boarder

Posts: 1

Mike Sloper
Re: select speed when retrieving data.

I also noticed a huge performance drop compared to indexed databases when using ORDER BY and LIMIT without filters. (20 seconds vs 0.3 seconds on 30m rows)

While this is fine for most data analysis I don't think it is irrelevant. It is also a type of query I would use in day to day monitoring and checking. (eg. daily data loads)

I believe addressing this was considered a while back. See http://infinidb.org/forum/13-performance-monitoring-and-tuning/679-how-t...

Would be nice if Infinidb would give you the option of adding an index if you wanted. ;)

bdempsey
bdempsey's picture
Offline
Last seen: 18 hours 57 min ago
Joined: Oct 27 2009
Platinum Boarder

Posts: 194

Robert Dempsey
Re: select speed when retrieving data.

Most string functions will be slower in InfiniDB than in row-based databases that have indexes and other auxiliary structures specifically designed to find one row. InfiniDB is specifically designed to process billions of rows and aggregate them in a multi-threaded, parallel, distributed manner.

Generally speaking, if you're only interested in one row, you don't have a problem that InfiniDB will be particularly good at solving.

Specifically in this case, InfiniDB schedules a lot of work that begins and must finish before the limit clause is even applied. So, if your application is trying to toss 100K rows and keep only 1, and you are doing this repeatedly, there will be a lot of extra work done by InfiniDB that is thrown away.

If you can convert your string hash to a BIGINT hash and filter on that, it will work faster, but you'll still need to aggregate the rows or apply other filters to reduce the number of rows thrown away.

rmaniac
rmaniac's picture
Offline
Last seen: 1 year 3 months ago
Joined: May 12 2012
Junior Boarder

Posts: 6

Brian Sutherland
Re: select speed when retrieving data.

Ok, that makes sense now. The original reason we started looking InfiniDB was to process hundreds or millions of rows to return an aggregate. As you mentioned, it accels at this and we are happy to use if for those types of queries. We were just going to standardize on it for all queries. It looks like this is not a good idea, so we will just have to use the best DB for each task. Our only issue there is resource management, since we only have a small number of machines.

So, if you have any comment/suggestions on running (or not) InfiniDB on a machine with another DB (ie PostgreSQL), please let me know.

Thank you for your support,
Brian