Failed to deploy a 25-node cluster

5 posts / 0 new
Last post
bianhaoqiong
bianhaoqiong's picture
Offline
Last seen: 4 months 2 weeks ago
Joined: Mar 16 2014
Junior Boarder

Posts: 6

Haoqiong Bian
Failed to deploy a 25-node cluster

I successfully deployed a 5-node(1um and 4 pms) infinidb4.5 cluster on CDH4.5 and tested the query performance. Verything was perfect.

But when I tried to install a 25-node(1um and 24 pms) cluster, it failed in the "System Installation" step, the transcript of postConfigure is:

 

 

===== System Installation =====

 

System Configuration is complete, System Installation is the next step.

Would you like to continue with the System Installation? [y,n] (y) > 

 

Enter the Package Type being installed to other servers [rpm,deb,binary] (binary) > 

Performing an InfiniDB System install using a Binary package located in the /root directory.

 

Next step is to enter the password to access the other Servers.

This is either your password or you can default to using a ssh key

If using a password, the password needs to be the same on all Servers.

 

Enter password, hit 'enter' to default to using a ssh key, or 'exit' > 

Confirm password > 

 

----- Performing Install on 'um1 / infinidb1' -----

 

Install log file is located here: /tmp/um1_binary_install.log

 

 

----- Performing Install on 'pm2 / infinidb3' -----

 

Install log file is located here: /tmp/pm2_binary_install.log

 

 

----- Performing Install on 'pm3 / infinidb4' -----

 

Install log file is located here: /tmp/pm3_binary_install.log

 

 

----- Performing Install on 'pm4 / infinidb5' -----

 

Install log file is located here: /tmp/pm4_binary_install.log

 

 

----- Performing Install on 'pm5 / infinidb6' -----

 

Install log file is located here: /tmp/pm5_binary_install.log

 

 

----- Performing Install on 'pm6 / infinidb7' -----

 

Install log file is located here: /tmp/pm6_binary_install.log

 

 

----- Performing Install on 'pm7 / infinidb8' -----

 

Install log file is located here: /tmp/pm7_binary_install.log

 

 

----- Performing Install on 'pm8 / infinidb9' -----

 

Install log file is located here: /tmp/pm8_binary_install.log

 

 

----- Performing Install on 'pm9 / infinidb10' -----

 

Install log file is located here: /tmp/pm9_binary_install.log

 

 

----- Performing Install on 'pm10 / infinidb11' -----

 

Install log file is located here: /tmp/pm10_binary_install.log

 

 

----- Performing Install on 'pm11 / infinidb12' -----

 

Install log file is located here: /tmp/pm11_binary_install.log

 

 

----- Performing Install on 'pm12 / infinidb13' -----

 

Install log file is located here: /tmp/pm12_binary_install.log

 

 

----- Performing Install on 'pm13 / infinidb14' -----

 

Install log file is located here: /tmp/pm13_binary_install.log

 

 

----- Performing Install on 'pm14 / infinidb15' -----

 

Install log file is located here: /tmp/pm14_binary_install.log

 

 

----- Performing Install on 'pm15 / infinidb16' -----

 

Install log file is located here: /tmp/pm15_binary_install.log

 

 

----- Performing Install on 'pm16 / infinidb17' -----

 

Install log file is located here: /tmp/pm16_binary_install.log

 

 

----- Performing Install on 'pm17 / infinidb18' -----

 

Install log file is located here: /tmp/pm17_binary_install.log

 

 

----- Performing Install on 'pm18 / infinidb19' -----

 

Install log file is located here: /tmp/pm18_binary_install.log

 

 

----- Performing Install on 'pm19 / infinidb20' -----

 

Install log file is located here: /tmp/pm19_binary_install.log

 

 

----- Performing Install on 'pm20 / infinidb21' -----

 

Install log file is located here: /tmp/pm20_binary_install.log

 

 

----- Performing Install on 'pm21 / infinidb22' -----

 

Install log file is located here: /tmp/pm21_binary_install.log

 

 

----- Performing Install on 'pm22 / infinidb23' -----

 

Install log file is located here: /tmp/pm22_binary_install.log

 

 

----- Performing Install on 'pm23 / infinidb24' -----

 

Install log file is located here: /tmp/pm23_binary_install.log

 

 

----- Performing Install on 'pm24 / infinidb25' -----

 

Install log file is located here: /tmp/pm24_binary_install.log

 

 

InfiniDB Package being installed, please wait ...

Failure with a remote module install, check install log files in /tmp

 

I read the log files in /tmp and found that some pms was successfully installed but some others failed, the log like:

 

 

Uninstall Calpont Package                       bash: ./setenv-hdfs-20: No such file or directory
 
date
ssh root@172.16.0.117 'rm -f /etc/init.d/infinidb /etc/init.d/mysql-Calpont /usr/local/Calpont/releasenum >/dev/null 2>&1'
bash-4.1#  
bash-4.1# date
Fri Apr 25 10:55:46 EDT 2014
bash-4.1# ssh root@172.16.0.117 'rm -f /etc/init.d/infinidb /etc/init.d/mysql-Calpont /usr/local/Calpont/releasenum >/dev/null 2>&1'
bash-4.1# 
Copy New Calpont Package to Module               
bash-4.1# date
Fri Apr 25 10:56:46 EDT 2014
bash-4.1# scp /root/infinidb-4.5.0-1.x86_64.bin.tar.gz root@172.16.0.117:/root/infinidb-4.5.0-1.x86_64.bin.tar.gz
infinidb-4.5.0-1.x86_64.bin.tar.gz                                                                                                                                   8% 4752KB   2.2MB/s   00:24 ETA
infinidb-4.5.0-1.x86_64.bin.tar.gz                                                                                                                                  17%   10MB   2.1MB/s   00:22 ETAE
RROR: Invalid package
 
I found that /root/infinidb-4.5.0-1.x86_64.bin.tar.gz on these failed nodes were really incomplete, but when I tried to copy this file from AMO node to these nodes, verything is OK...
 
And I retried with Number of PMs = 8, the installation is successful.
 
What is the matter?
 
PS:
this 25-node installtion was done on a 25-node CDH cluster(um on namenode and pms on datanodes)
each node is a vm on openstack with 4 vcpus, 20GB rams and 500GB storage.
 
Thanks.

 

 

 

bianhaoqiong
bianhaoqiong's picture
Offline
Last seen: 4 months 2 weeks ago
Joined: Mar 16 2014
Junior Boarder

Posts: 6

Haoqiong Bian
I build 25 new openstack vms

I build 25 new openstack vms and set the timeout to 1200000 and tried again, it still failed to scp the packages to some nodes......

 

time of all vms are synchronized by ntp. and the network between vm nodes is gigabit Eth, scp a 1GB file from one node to another, the speed is about 70MB/s, ping from one node to another, the time delay is about 0.7 ms,  we have tested the scalability and performance of hive\impala\presto\spark-shark on this openstack platform, so I think the bandwidth and network delay may be OK.... 

 

davidhill
davidhill's picture
Offline
Last seen: 1 month 3 weeks ago
Joined: Oct 27 2009
Administrator

Posts: 595

david hill
Did you get a chance to try

Did you get a chance to try it by doing step 2? I'm still unsure why is error is being reported ..

 

2. You could try this now or in a second run if change for 1 doesn't make a different, command out line beginning with "No such file or directory"

 

David

bianhaoqiong
bianhaoqiong's picture
Offline
Last seen: 4 months 2 weeks ago
Joined: Mar 16 2014
Junior Boarder

Posts: 6

Haoqiong Bian
Sorry, I am not sure what

Sorry, I am not sure what should I do in step 2.

The binary-installer.sh always fail to copy the package to other nodes. I changed timeout from 120 to 240 or higher value, it still does not work.

 

Thanks.

radams
radams's picture
Offline
Last seen: 2 hours 7 min ago
Joined: Jan 3 2011
Administrator

Posts: 493

Robert Adams
Sorry, I am not sure what

 

Hi,

 

David was asking to try a test by commenting out the line in the script with "No such file or directory" .

 

Have you been able to try this?

 

Thanks,

 

Robert