How to export data quickly?

10 posts / 0 new
Last post
joegh
joegh's picture
Offline
Last seen: 1 month 4 days ago
Joined: Jan 17 2014
Junior Boarder

Posts: 10

Joe Wu
How to export data quickly?

Hello,

 

When I export data from InfiniDB, I find it a little slow. I do exporting in two usages: 

 

1. export data to out file

Because InfiniDB has param limitation using mysqldump, so I use "SELECT ... INTO OUTFILE" and "idbmysql -e 'SELECT ... ' > outfile". Both is slow, nearly 10 million records per min.

 

 2. export data to table internal InfiniDB

It's recommend to use "idbmysql -e 'SELECT ... ' | cpimport ... " in document of Admin Guide. But it's also too slow to do that even though the SELECT statement can be respones quickly, the whole process spends many minutes. And it's much more faster to load same size data from file using cpimort.

 

I guess it may caused by the data must process from PMs to UM, and then ouput from single UM node.

So, any other ways to make such exporting faster? Thank you.

djoshi
djoshi's picture
Offline
Last seen: 4 weeks 1 hour ago
Joined: Jun 6 2012
Administrator

Posts: 28

Dipti Joshi
How to export data quickly?

Joe:

 

Our 4.5 release is coming out soon at the end of this month, where we have added new feature to allow extract data in parallel at each PM node, rather than going through UM node. Please lookforward to its announcement in few days and you will get access to all the 4.5 documents with instructions.

 

Thanks for using InfiniDB

 

Dipti Joshi

joegh
joegh's picture
Offline
Last seen: 1 month 4 days ago
Joined: Jan 17 2014
Junior Boarder

Posts: 10

Joe Wu
Looking forward to the new

Looking forward to the new release very much!

joegh
joegh's picture
Offline
Last seen: 1 month 4 days ago
Joined: Jan 17 2014
Junior Boarder

Posts: 10

Joe Wu
Hi djoshi,

Hi djoshi,

 

I have tried the new 4.5 release. But I don't find the new feature to extract data in parallel at each PM node. And use "idbmysql -e 'SELECT ... ' | cpimport ... " is still slow.

 

The only related feature I found in the new document is "Local PM Query". It's useful if I only export data from one table with simple SELECT statement. But it can not be used if I want to join/aggregate data from one or more table and extract the result then reload to a new table.

 

I'm confused if I miss some new features stated in the documents?

djoshi
djoshi's picture
Offline
Last seen: 4 weeks 1 hour ago
Joined: Jun 6 2012
Administrator

Posts: 28

Dipti Joshi
Hi Joe: Can you please share

Hi Joe:

 

Can you please share your query ? And let us know the size of the tables involved in the join - I have solution for you.

 

Thanks,

Dipti

djoshi
djoshi's picture
Offline
Last seen: 4 weeks 1 hour ago
Joined: Jun 6 2012
Administrator

Posts: 28

Dipti Joshi
How to export data quickly with Local PM query for a join query

Hello Joe:

 

When  you install make sure to select "yes" for the answer to "Do you want to install UM on each PM node"  in the postConfigure step - which enables "Local PM Query"

 

Now when you want to do parallel extract out of the PM node

 

(1) if the select only involves single table use  "idbmysql -e 'SELECT ... ' | cpimport ... " on the PM nodes

 

(2) if the select involves a join between a fact table on the PM node and dimension table across all the nodes, create a script like following

 

set infinidb_local_query=0;

select fact.column, dim.column2 from fact join dim using (key) where idbPm(fact.key) = idbLocalPm();
 
and then run it on the PM node and pipe the output to cpimport. e.g  'idbmysql  dbName  < extract_query_script.sql | cpimport ..."
 
This way the UM process on the PM node gets the fact table data locally from the PM node(as indicated by the use of of idbLocalPm() function), while the dimension table is data is extracted from all the PM nodes.
 
Let us know how it goes.
 
We will update our syntax guide to explain the join scenario that you have. Thanks for your feedback.
 
Thanks,
 
Dipti
joegh
joegh's picture
Offline
Last seen: 1 month 4 days ago
Joined: Jan 17 2014
Junior Boarder

Posts: 10

Joe Wu
How to export data quickly with Local PM query for a join query

Hi djoshi,

 

Thank you for your reply. I will try your solution as soon as possible, and then feedback my test result.

 

I tried to update InfiniDB to 4.5 in recent days and met some problems.

I install binary package using non-root user and use the option of postConfigure -lq to enable local PM Query.

I found that these files' install path will not point to the param of INFINIDB_INSTALL_DIR: 

  1. Calpont/bin/rsync.sh
  2. Calpont/mysql/my.cnf (PMs)
  3. Calpont/mysql/install_calpont_mysql.sh (PMs)

 

Would you please check and fix it, Thank you.

radams
radams's picture
Offline
Last seen: 1 day 17 hours ago
Joined: Jan 3 2011
Administrator

Posts: 492

Robert Adams
How to export data quickly with Local PM query for a join query

 

 

Hi,

 

Thank you for the feedback. We have this identified as an open bug which occurs when trying to use local PM query on a non-root system. We will be addressing this issue with an upcoming release.

 

Thank you,

 

Robert

 

 

joegh
joegh's picture
Offline
Last seen: 1 month 4 days ago
Joined: Jan 17 2014
Junior Boarder

Posts: 10

Joe Wu
How to export data quickly with Local PM query for a join query

Hello djoshi:

 

I have tried to do parallel extract out of the PM node using the methods you supplied above. And it works.

Both the single table and tables join extract from PM nodes can improve the speed of ETL. With our 8 PMs InfiniDB cluster, when a ETL process spend 53s in UM, it takes 7s to process in every PM. Because every ETL in PMs can be processing parallel, it can save a lot of time totally.

 

In addition, this method for ETL is a bit complex, especially the multi tables join extract. I think it would be much better if all operations can be done in UM-side. Looking forward to more perfect version, thank you.

djoshi
djoshi's picture
Offline
Last seen: 4 weeks 1 hour ago
Joined: Jun 6 2012
Administrator

Posts: 28

Dipti Joshi
How to export data quickly with Local PM query for a join query

Hi Joe:

 

Can you install as a root user ?

 

Thanks,

Dipti