PXEClusterInstall

From WBITT's Cooker!

bismilla-hirrahma-nirraheem

= Introduction and scenario =

My host machine/physical machine is named kworkbee, running Fedora 8 (i386), and VMware Server 2.0 on top of it.

My virtual machines, which are part of this Beowulf cluster, are:-
* headnode
* node1 (MAC address: 00:50:56:00:00:11)
* node2 (MAC address: 00:50:56:00:00:12)

I am using "redhat" (without quotes) as the root password on all machines.
The /etc/hosts file on my host machine and headnode looks like this:-

<pre>
[root@kworkbee ~]# cat /etc/hosts
127.0.0.1       localhost.localdomain localhost

192.168.0.11    node1 node1
192.168.0.12    node2 node2
</pre>
On the host server, I extracted the ISO file into a directory:-

<pre>
mount -o loop /data/cdimages/CentOS-5.2-i386-bin-DVD.iso /media/loop/
rsync -av /media/loop/* /data/cdimages/centos/
</pre>

Configure an Apache Alias and restart the apache service:-

<pre>
# vi /etc/httpd/conf.d/centos.conf

# service httpd restart
</pre>
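The contents of centos.conf are not preserved above. A minimal sketch might look like the following; the Alias target path is taken from the rsync destination above, while the directory options are assumptions (written to the current directory here so it can be inspected before copying it into place):

```shell
# Hypothetical sketch of /etc/httpd/conf.d/centos.conf; written locally
# for inspection rather than directly into the Apache config directory.
cat > centos.conf <<'EOF'
# Serve the extracted CentOS tree at http://<host>/centos
Alias /centos /data/cdimages/centos
<Directory /data/cdimages/centos>
    Options Indexes FollowSymLinks
    Order allow,deny
    Allow from all
</Directory>
EOF
```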
Check by opening a browser window on the host computer. You should get a list of files from the top-level directory of the CentOS distribution:-

The same should be accessible from the head node of your cluster, using the address:-

http://192.168.0.1/centos

To ease the pain of installing various software on the headnode, I have edited the yum repository file on the headnode as shown below. Comment out the rest of the file:-

<pre>
# vi /etc/yum.repos.d/CentOS-Base.repo

gpgcheck=0
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-5
</pre>
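Only the tail of the edited stanza survives above. A complete [base] section pointing at the host's HTTP share might look like this; the repository name and baseurl are assumptions based on the http://192.168.0.1/centos address used elsewhere in this article:

```shell
# Hypothetical reconstruction of the edited [base] stanza; written locally
# for inspection rather than directly to /etc/yum.repos.d/.
cat > CentOS-Base.repo <<'EOF'
[base]
name=CentOS-5 - Base (local HTTP mirror)
baseurl=http://192.168.0.1/centos/
gpgcheck=0
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-5
EOF
```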
Install the necessary software on the headnode:-

* DHCP server
* TFTP server
* syslinux

<pre>
[root@headnode ~]# yum -y install dhcp tftp-server syslinux
</pre>
Set up the /etc/dhcpd.conf file as shown here:-

<pre>
# vi /etc/dhcpd.conf
ddns-update-style interim;

}
</pre>
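Most of the dhcpd.conf content is elided above. A configuration consistent with the MAC addresses and IPs used in this article might look like the sketch below; the subnet options are assumptions, while next-server points at the headnode (the TFTP server) and each host entry pins a node's MAC to its fixed address:

```shell
# Hypothetical dhcpd.conf consistent with the MACs/IPs used in this article;
# written locally for inspection before copying to /etc/dhcpd.conf.
cat > dhcpd.conf <<'EOF'
ddns-update-style interim;
ignore client-updates;

subnet 192.168.0.0 netmask 255.255.255.0 {
    option routers 192.168.0.1;
    option subnet-mask 255.255.255.0;
    allow booting;
    allow bootp;
    next-server 192.168.0.10;      # TFTP server (the headnode)
    filename "pxelinux.0";         # PXE boot loader served over TFTP

    host node1 {
        hardware ethernet 00:50:56:00:00:11;
        fixed-address 192.168.0.11;
    }
    host node2 {
        hardware ethernet 00:50:56:00:00:12;
        fixed-address 192.168.0.12;
    }
}
EOF
```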
Note: The file pxelinux.0 is a special file, which gets installed through the package syslinux. It is the actual PXE boot loader, and it runs as soon as the PXE client / client machine gets a DHCP lease. The actual location of this file is mentioned later in this howto.

<pre>
[root@headnode ~]# service dhcpd restart
Starting dhcpd: [ OK ]
[root@headnode ~]# chkconfig --level 35 dhcpd on
</pre>
Now enable TFTP as well:-

<pre>
[root@headnode ~]# vi /etc/xinetd.d/tftp
service tftp

[root@headnode ~]# service xinetd restart
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]

[root@headnode ~]# chkconfig --level 35 xinetd on
</pre>
Next, we need to copy two special files, from a special location on the distribution media, to the headnode. On the host machine, I checked the list of files to make sure that the files are available:-

<pre>
[root@kworkbee ~]# ls /data/cdimages/centos/images/pxeboot/
initrd.img  README  TRANS.TBL  vmlinuz
[root@kworkbee ~]#
</pre>
We will copy the vmlinuz and initrd.img files from this location on the host machine to the /tftpboot directory on the headnode:-

On the headnode, we have the CentOS distribution available through the HTTP path http://192.168.0.1/centos . We will use wget to download these two files from the host machine. I could also use scp, but just to make things interesting, here it is:-

<pre>
[root@headnode ~]# cd /tftpboot/
[root@headnode tftpboot]# wget http://192.168.0.1/centos/images/pxeboot/vmlinuz
[root@headnode tftpboot]# wget http://192.168.0.1/centos/images/pxeboot/initrd.img
</pre>
PXE configuration detail:-

When a client boots, by default it will look for a configuration file from TFTP named after its MAC address. After trying several fallback names, it will finally request a default file, with the name "default". This file needs to be in the pxelinux.cfg directory under /tftpboot on the headnode.

<pre>
# mkdir /tftpboot/pxelinux.cfg

kernel vmlinuz
append vga=normal initrd=initrd.img
</pre>
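The search order described above can be sketched as a small helper function. This is an illustration only, not part of the setup; the MAC and IP are the article's own example values. pxelinux first asks for 01-&lt;mac-with-dashes&gt; (the "01" hardware-type prefix is explained further down), then for the client's IP address as uppercase hex, progressively shortened one digit at a time, and finally for "default":

```shell
# Illustrative only: print the file names a pxelinux client would request,
# in order, given its MAC address and the IP it received from DHCP.
pxe_search_order() {
    mac="$1" ip="$2"
    # 1. Hardware-type prefix "01" (Ethernet) plus the MAC with dashes.
    echo "01-$(echo "$mac" | tr ':' '-' | tr 'A-F' 'a-f')"
    # 2. The IP address as 8 uppercase hex digits, shortened digit by digit.
    hexip=$(printf '%02X%02X%02X%02X' $(echo "$ip" | tr '.' ' '))
    while [ -n "$hexip" ]; do
        echo "$hexip"
        hexip=${hexip%?}
    done
    # 3. Finally, the fallback file used in this article.
    echo "default"
}

pxe_search_order 00:50:56:00:00:11 192.168.0.11
```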
Now is the time to copy the pxelinux.0 file from its installed location to the /tftpboot directory. This file (pxelinux.0) is provided by the syslinux package, which comes with your linux distribution.

<pre>
# cp /usr/lib/syslinux/pxelinux.0 /tftpboot/
</pre>

Now make sure that all files and directories inside /tftpboot are world readable:-

<pre>
# chmod +r /tftpboot/* -R
</pre>
Now you are ready to boot your client. You should see your client getting an IP address and a boot file, and booting off the PXE boot image.

While your client boots, you should see the following in /var/log/messages on your headnode:-

<pre>
# tail -f /var/log/messages
Feb 27 19:55:15 beowulf dhcpd: DHCPDISCOVER from 00:50:56:00:00:11 via eth0

Feb 27 19:55:17 beowulf dhcpd: DHCPACK on 192.168.0.11 to 00:50:56:00:00:11 via eth0
Feb 27 19:55:17 beowulf in.tftpd[12476]: tftp: client does not accept options
</pre>
By doing this, you have managed to start up the interactive installation. Congratulations!

= Automated KickStart installations =

For automated KickStart based setups, you need to do the following additional steps.

First, you need a kickstart file. You can use the minimal kickstart file from your headnode! How? Well, when you installed your headnode, the installer created a file anaconda-ks.cfg in your /root directory. You can take this file, modify it a bit, and use it as the kickstart file for your compute nodes.
<pre>
[root@headnode ~]# cp anaconda-ks.cfg compute-ks.cfg
</pre>

Edit this file as per your requirements:-

<pre>
[root@headnode ~]# vi compute-ks.cfg

%packages
@base
</pre>
Now, copy this file to the document root of your web server, and make it world readable:-

<pre>
cp /root/compute-ks.cfg /var/www/html/
chmod +r /var/www/html/compute-ks.cfg
</pre>

Edit the PXE default file again and add the extra options:-

<pre>
[root@headnode ~]# vi /tftpboot/pxelinux.cfg/default

kernel vmlinuz
append vga=normal initrd=initrd.img ip=dhcp ksdevice=eth0 ks=http://192.168.0.10/compute-ks.cfg
</pre>

You need to make sure that httpd is running on your head node, otherwise the installer will not be able to access this file.
- | |||
Another option is to copy this compute-ks.cfg file to the document root of your host machine (192.168.0.1), in /var/www/html directory. | Another option is to copy this compute-ks.cfg file to the document root of your host machine (192.168.0.1), in /var/www/html directory. | ||
+ | <pre> | ||
[root@headnode ~]# service httpd restart | [root@headnode ~]# service httpd restart | ||
- | + | </pre> | |
For the logs of package retrieval during the actual installation, you should check the apache access log on the host machine.

<pre>
[root@headnode ~]# tail -f /var/log/httpd/access_log
192.168.0.11 - - [27/Feb/2009:21:06:50 +0300] "GET /compute-ks.cfg HTTP/1.0" 200 680 "-" "anacona/11.1.2.113"

Feb 27 21:06:51 beowulf dhcpd: DHCPREQUEST for 192.168.0.11 (192.168.0.10) from 00:50:56:00:00:11 via eth0
Feb 27 21:06:51 beowulf dhcpd: DHCPACK on 192.168.0.11 to 00:50:56:00:00:11 via eth0
</pre>
Try logging in to a node after installation. The screenshot below shows that, regardless of which node we log into, we are shown "node" as the hostname. This is because we fixed the hostname as "node" in the kickstart file. This is a limitation of this type of installation. If you are installing more than one node of a cluster using this method, you can either create separate PXE files and related separate kickstart files through some sort of script and install the nodes that way, or manually change the node names after they are installed.

I found the following links helpful:

http://www.debian-administration.org/articles/478

This can be observed by doing the following manual steps.

Rename the file /tftpboot/pxelinux.cfg/default to 01-&lt;MAC address of any one of your nodes&gt; (e.g. node1):-

<pre>
[root@headnode pxelinux.cfg]# mv /tftpboot/pxelinux.cfg/default /tftpboot/pxelinux.cfg/01-00-50-56-00-00-11
</pre>

The "01" before the MAC address represents the hardware type of Ethernet.
Note that you no longer have a default file in your tftp setup. Only node1 should be able to boot and install from a PXE image properly; node2 should fail. This can be seen in the screenshot (pxe-boot-7.png) below:-

So you need to develop a mechanism to automate all this for your cluster. And for the sake of automating a simple task, such as initial installation, there are management tools / software, such as ROCKS, OSCAR, Cobbler, Scali / Platform, etc.

With the help of a little scripting, and the availability of hostnames, MAC addresses and an IP address range, we can create multiple PXE boot files and related multiple KickStart files. Each PXE file will have an entry pointing at its related ks file only. A simple example is shown below:-

First, the PXE file for node1:-

<pre>
[root@headnode ~]# vi /tftpboot/pxelinux.cfg/01-00-50-56-00-00-11

kernel vmlinuz
append vga=normal initrd=initrd.img ip=dhcp ks=http://192.168.0.10/node1-ks.cfg
</pre>

And then, the KickStart file for node1. Notice the different network line, carrying both a static IP and a hostname:-

<pre>
[root@headnode ~]# vi /var/www/html/node1-ks.cfg

%packages
@base
</pre>
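The "little scripting" mentioned above could be sketched like this. Everything here is hypothetical: the node list, the PXE template lines, and the output directories (shown as local paths; the real targets would be /tftpboot/pxelinux.cfg for the PXE files and /var/www/html for the kickstart files):

```shell
# Hypothetical generator for per-node PXE and kickstart files.
# Writes into local directories; the real targets would be
# /tftpboot/pxelinux.cfg (PXE) and /var/www/html (kickstart).
PXE_DIR=pxelinux.cfg
KS_DIR=html
mkdir -p "$PXE_DIR" "$KS_DIR"

for spec in node1/00-50-56-00-00-11/192.168.0.11 \
            node2/00-50-56-00-00-12/192.168.0.12; do
    node=${spec%%/*}
    rest=${spec#*/}
    mac=${rest%%/*}
    ip=${rest#*/}

    # PXE file named after the MAC, pointing at this node's kickstart only.
    cat > "$PXE_DIR/01-$mac" <<EOF
default linux
prompt 0
label linux
  kernel vmlinuz
  append vga=normal initrd=initrd.img ip=dhcp ks=http://192.168.0.10/$node-ks.cfg
EOF

    # Minimal per-node kickstart fragment with a static IP and hostname.
    cat > "$KS_DIR/$node-ks.cfg" <<EOF
network --device eth0 --bootproto static --ip $ip --netmask 255.255.255.0 --gateway 192.168.0.1 --hostname $node
EOF
done
```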
= Collection of SSH fingerprints of all cluster nodes =

Excellent article at:
http://itg.chem.indiana.edu/inc/wiki/software/openssh/189.html

<pre>
[root@headnode ~]# ping node1
PING node1 (192.168.0.11) 56(84) bytes of data.

rtt min/avg/max/mdev = 1.695/2.355/3.016/0.662 ms
[root@headnode ~]#
</pre>
Let's try to connect to a node and see if it asks about saving the fingerprint. We will say "no" to the fingerprint save option for the time being:-

<pre>
[root@headnode ~]# ssh node1
The authenticity of host 'node1 (192.168.0.11)' can't be established.

Are you sure you want to continue connecting (yes/no)? no
Host key verification failed.
</pre>

The file /etc/ssh/ssh_known_hosts holds all the fingerprints. However, it does not exist by default:-

<pre>
[root@headnode ~]# ls /etc/ssh/ssh_known_hosts
ls: /etc/ssh/ssh_known_hosts: No such file or directory
</pre>
Let's generate the RSA fingerprints of our cluster nodes. (Remember, the nodes were ping-able):-

<pre>
[root@headnode ~]# ssh-keyscan -t rsa localhost headnode node1 node2 > /etc/ssh/ssh_known_hosts
# node1 SSH-2.0-OpenSSH_4.3
# node2 SSH-2.0-OpenSSH_4.3
</pre>
See! It worked!

<pre>
[root@headnode ~]# ssh node1
Warning: Permanently added the RSA host key for IP address '192.168.0.11' to the list of known hosts.
root@node1's password:
[root@localhost ~]#
</pre>

Notice that it did not ask to save any fingerprint.
- | |||
+ | = Host based authentication (Warning: This is NOT much liked solution) = | ||
+ | |||
+ | Good article at: | ||
http://kbase.redhat.com/faq/docs/DOC-9164 | http://kbase.redhat.com/faq/docs/DOC-9164 | ||
- | |||
For hostbased authentication to work, you should have ssh host keys on both headnode and compute nodes. | For hostbased authentication to work, you should have ssh host keys on both headnode and compute nodes. | ||
- | |||
Normally the following would be needed to be setup on the "server" side, which you are trying to access from a "client" . In our case, our head node is in-fact acting in client role. And the compute node is infact the "server". | Normally the following would be needed to be setup on the "server" side, which you are trying to access from a "client" . In our case, our head node is in-fact acting in client role. And the compute node is infact the "server". | ||
Line 436: | Line 472: | ||
Server:- | Server:- | ||
+ | <pre> | ||
# vi /etc/ssh/sshd_config | # vi /etc/ssh/sshd_config | ||
Line 449: | Line 486: | ||
# ssh-keyscan -t rsa node1 node1.mybeowulf.local > /etc/ssh/ssh_known_hosts | # ssh-keyscan -t rsa node1 node1.mybeowulf.local > /etc/ssh/ssh_known_hosts | ||
- | + | </pre> | |
Client:-

<pre>
...
# vi /etc/ssh/ssh_config

# chmod 600 ~/.shosts
</pre>

Make sure that your name resolution is set up correctly:-

<pre>
# vi /etc/hosts
127.0.0.1      localhost.localdomain localhost
192.168.0.11   node1.mybeowulf.local node1
192.168.0.10   headnode headnode
</pre>
= (Not directly linked topic) "How to Control VMware Virtual Machines from command line?" =

Note: Some people say that VMware Tools must be installed in the virtual machines for this to work. I did not install VMware Tools on my virtual machines, and yet I got this working properly.

This works perfectly for restarting the virtual machines from the linux command prompt. Please note that my datastore is named "standard", and the exact location of the VMware machine (node2) on my disk is /data/vmachines/beowulf_node2/beowulf_node2.vmx :-

<pre>
[root@kworkbee ~]# vmrun -T server -h https://localhost:8333/sdk -u root -p redhat reset "[standard] beowulf_node2/beowulf_node2.vmx"
[root@kworkbee ~]#
</pre>
= SSH key based authentication =

The snippet below (apparently from the sshd init script) shows how host keys are generated with empty comments and passphrases:

<pre>
KEYGEN -q -t rsa1 -f $RSA1_KEY -C '' -N '' >&/dev/null
chmod 600 $RSA1_KEY
chmod 644 $RSA1_KEY.pub
</pre>

We will do the same for our user keys:

<pre>
[root@headnode .ssh]# ssh-keygen -t rsa -f /root/.ssh/id_rsa -C '' -N ''
Generating public/private rsa key pair.

[root@headnode .ssh]# cat id_dsa.pub >> authorized_keys
[root@headnode .ssh]# cat id_rsa.pub >> authorized_keys
</pre>
Now try logging in to this machine:-

<pre>
[root@headnode .ssh]# ssh localhost
Last login: Sat Mar 7 13:12:09 2009 from localhost.localdomain
[root@headnode ~]#
</pre>

As you can see, it works without asking for a password. Good. Now let's copy the private and public key files to the .ssh directory of the nodes. We can put them in a special directory named ssh in our webroot, and fetch them with wget on each node during the %post section of the kickstart. This way, all nodes will share a single ssh private and public key pair. Not much hassle.

<pre>
[root@headnode .ssh]# mkdir /var/www/html/ssh

-rw-r--r-- 1 root root 382 Mar 7 13:17 id_rsa.pub
[root@headnode .ssh]#
</pre>
+ | |||
Now lets write down steps of setting up a correct .ssh directory on the node. | Now lets write down steps of setting up a correct .ssh directory on the node. | ||
+ | <pre> | ||
mkdir /root/.ssh | mkdir /root/.ssh | ||
chmod 0700 /root/.ssh | chmod 0700 /root/.ssh | ||
Line 564: | Line 609: | ||
cd | cd | ||
+ | </pre> | ||
Let's test:-

<pre>
[root@headnode .ssh]# ssh root@node1
Last login: Sat Mar 7 13:27:13 2009 from headnode
[root@node1 ~]#
</pre>

Great! Let's check the other way round:-

<pre>
[root@node1 .ssh]# ssh root@headnode
Last login: Sat Mar 7 13:13:11 2009 from localhost.localdomain
[root@headnode ~]#
</pre>

This works great too!
The following four commands set up the headnode's ssh host key on the node.

<pre>
[root@headnode .ssh]# ssh-keyscan -t rsa headnode > /var/www/html/ssh/ssh_known_hosts
chmod +r /var/www/html/ssh/ssh_known_hosts
</pre>

On the node side:-

<pre>
[root@node1 ssh]# rm ssh_known_hosts* -f
[root@node1 ssh]# wget http://192.168.0.10/ssh/ssh_known_hosts
</pre>
Then, I would use the following to add the ssh host key of the newly installed compute node to the ssh_known_hosts file on the headnode:-

<pre>
[root@node1 ssh]# SSH_SCAN_KEY=$(ssh-keyscan -t rsa `hostname -s`)
# node1 SSH-2.0-OpenSSH_4.3

[root@node1 ssh]# ssh headnode "echo $SSH_SCAN_KEY >> /tmp/hosts.txt"
[root@node1 ssh]#
</pre>
Let's send the FQDN key as well:-

<pre>
[root@node1 ssh]# SSH_SCAN_KEY=$(ssh-keyscan -t rsa `hostname`)
# node1.mybeowulf.local SSH-2.0-OpenSSH_4.3

[root@node1 ssh]# ssh headnode "echo $SSH_SCAN_KEY >> /tmp/hosts.txt"
[root@node1 ssh]#
</pre>

Let's check on the server side:-

<pre>
[root@headnode ssh]# cat /tmp/hosts.txt
node1 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzKIg/MmPJzPoQxBWRN8G8ZGad74EqXRyR1T6EXWXQ+xvSKZmI6CuvExBuXoKCBVJ/TzTQ5x46c4fM2+3aU0xTpupzCGhrpcI+21ITwhJjlaF6Kc0CGhyTG8ztftxIdcBus0rW8VkSvVbLnMTDPQstHAVvrSqahoBfLCAWqLnWcJ8+BqenFtPI9Tvq6Dj+Ilx+ukNiGoS7+ng43WGWMHWP4LtGeI/628Hzt23WCjSLL+HqzoUF3u8ouwZlPiYP8BbUXOoTG9XME9M4Oiny0X6LoHMf0lNO89dlFpllRL3ZzURXPO+bT4KiR/Juo645JhTDi0Y7Nk6MToML0ji00yKVw==
node1.mybeowulf.local ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzKIg/MmPJzPoQxBWRN8G8ZGad74EqXRyR1T6EXWXQ+xvSKZmI6CuvExBuXoKCBVJ/TzTQ5x46c4fM2+3aU0xTpupzCGhrpcI+21ITwhJjlaF6Kc0CGhyTG8ztftxIdcBus0rW8VkSvVbLnMTDPQstHAVvrSqahoBfLCAWqLnWcJ8+BqenFtPI9Tvq6Dj+Ilx+ukNiGoS7+ng43WGWMHWP4LtGeI/628Hzt23WCjSLL+HqzoUF3u8ouwZlPiYP8BbUXOoTG9XME9M4Oiny0X6LoHMf0lNO89dlFpllRL3ZzURXPO+bT4KiR/Juo645JhTDi0Y7Nk6MToML0ji00yKVw==
[root@headnode ssh]#
</pre>
Alhumdulillah. As you can see, the test is successful. So I will use the /etc/ssh/ssh_known_hosts file instead of /tmp/hosts.txt.

<pre>
[root@node1 ssh]# SSH_SCAN_KEY=$(ssh-keyscan -t rsa `hostname -s`)
# node1 SSH-2.0-OpenSSH_4.3

[root@node1 ssh]# ssh headnode "echo $SSH_SCAN_KEY >> /etc/ssh/ssh_known_hosts"
[root@node1 ssh]#
</pre>

On the server:-

<pre>
[root@headnode ssh]# cat /etc/ssh/ssh_known_hosts
node1 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzKIg/MmPJzPoQxBWRN8G8ZGad74EqXRyR1T6EXWXQ+xvSKZmI6CuvExBuXoKCBVJ/TzTQ5x46c4fM2+3aU0xTpupzCGhrpcI+21ITwhJjlaF6Kc0CGhyTG8ztftxIdcBus0rW8VkSvVbLnMTDPQstHAVvrSqahoBfLCAWqLnWcJ8+BqenFtPI9Tvq6Dj+Ilx+ukNiGoS7+ng43WGWMHWP4LtGeI/628Hzt23WCjSLL+HqzoUF3u8ouwZlPiYP8BbUXOoTG9XME9M4Oiny0X6LoHMf0lNO89dlFpllRL3ZzURXPO+bT4KiR/Juo645JhTDi0Y7Nk6MToML0ji00yKVw==
node1.mybeowulf.local ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzKIg/MmPJzPoQxBWRN8G8ZGad74EqXRyR1T6EXWXQ+xvSKZmI6CuvExBuXoKCBVJ/TzTQ5x46c4fM2+3aU0xTpupzCGhrpcI+21ITwhJjlaF6Kc0CGhyTG8ztftxIdcBus0rW8VkSvVbLnMTDPQstHAVvrSqahoBfLCAWqLnWcJ8+BqenFtPI9Tvq6Dj+Ilx+ukNiGoS7+ng43WGWMHWP4LtGeI/628Hzt23WCjSLL+HqzoUF3u8ouwZlPiYP8BbUXOoTG9XME9M4Oiny0X6LoHMf0lNO89dlFpllRL3ZzURXPO+bT4KiR/Juo645JhTDi0Y7Nk6MToML0ji00yKVw==
[root@headnode ssh]#
</pre>
Now I can ssh into my nodes without a password:-
<pre>
[root@headnode ssh]# ssh node1
Last login: Sat Mar 7 13:27:38 2009 from headnode
[root@node1 ~]#
</pre>
Alhumdulillah.
= Or should I set up rlogin first? (Warning: not needed / desired) =

For rlogin, we need the rsh-server package installed on the compute nodes. And to have it work both ways, we need it on both the compute nodes and the headnode.

Put this in the %post section:-
<pre>
[root@headnode ssh]# yum -y install rsh-server
...
}
[root@headnode ssh]#

service xinetd restart
</pre>
Same on the node:-
<pre>
[root@node1 ssh]# perl -pi -e 's/= yes/= no/' /etc/xinetd.d/rlogin
[root@node1 ssh]# perl -pi -e 's/= yes/= no/' /etc/xinetd.d/rexec
[root@node1 ssh]# perl -pi -e 's/= yes/= no/' /etc/xinetd.d/rsh

[root@node1 ssh]# service xinetd restart
Stopping xinetd: [FAILED]
...
[root@node1 ssh]# chkconfig --level 35 xinetd on
[root@node1 ssh]#
</pre>
Server:-
<pre>
(Incomplete. To do!)
</pre>

= YUM repositories =
What about a YUM repository on the nodes? Put the following in the %post:-
<pre>
[root@node1 ssh]# cat > /etc/yum.repos.d/CentOS-Base.repo << EOF
[base]
...
gpgkey=http://192.168.0.1/centos/RPM-GPG-KEY-CentOS-5
EOF
</pre>

= MPI =
OK. Now, let's set up MPI on this cluster.

We need a central storage location for MPI programs and user home directories.

On the server:-
<pre>
mkdir /cluster
mkdir /cluster/mpiuser
...
vi /etc/exports
/cluster *(rw,no_root_squash,sync)
service nfs restart
chkconfig --level 35 nfs on
</pre>
We need to mount this directory on all cluster nodes.
<pre>
[root@node1 ~]# mkdir /cluster
[root@node1 ~]# mount -t nfs headnode:/cluster /cluster
...
[root@node2 ~]# mkdir /cluster
[root@node2 ~]# mount -t nfs headnode:/cluster /cluster
</pre>
Put the mount request in /etc/fstab on all the compute nodes, and add the same to the %post of compute.ks.
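For reference, a minimal sketch of the fstab line in question (the mount options are my assumption; "defaults" is the usual starting point). The sketch appends to a temporary file so it is safe to run anywhere; on a real compute node the target is /etc/fstab:

```shell
# Append the NFS mount entry for the shared /cluster directory.
# Using a temp file for illustration; on a node this would be /etc/fstab.
fstab=$(mktemp)
cat >> "$fstab" << 'EOF'
headnode:/cluster    /cluster    nfs    defaults    0 0
EOF
grep -c '^headnode:/cluster' "$fstab"   # prints: 1
```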
Now we need an MPI user with the same user ID on all nodes. This can be done manually as follows, or through NIS.
<pre>
[root@headnode ~]# groupadd -g 600 mpiuser
[root@headnode ~]# useradd -u 600 -g 600 -c "MPI user" -d /cluster/mpiuser mpiuser
...
drwx------ 3 mpiuser mpiuser 4096 Mar 8 09:16 mpiuser
[root@headnode ~]#
</pre>
And on all compute nodes as well:-
<pre>
groupadd -g 600 mpiuser
useradd -u 600 -g 600 -c "MPI user" -d /cluster/mpiuser mpiuser
</pre>
Next we need ssh equivalence for mpiuser on all nodes. We already know that they share a common home directory, mounted on each node as /cluster/mpiuser. So we just need to generate ssh keys and put the public key in the authorized_keys file on the headnode only. By doing that, we set up the ssh equivalence automatically.
<pre>
[mpiuser@headnode ~]$ ssh-keygen -t rsa -C '' -N '' -f /cluster/mpiuser/.ssh/id_rsa
...
The key fingerprint is:
63:ef:8b:62:94:ea:88:83:c9:73:78:5b:f7:a0:0f:08
</pre>
Now let's copy the public key into authorized_keys.
<pre>
[mpiuser@headnode ~]$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
...
[mpiuser@headnode ~]$ chmod 600 .ssh/*
</pre>
Try logging in to node1 as mpiuser:-
<pre>
[mpiuser@headnode ~]$ ssh mpiuser@node1
Warning: Permanently added the RSA host key for IP address '192.168.0.11' to the list of known hosts.
[mpiuser@node1 ~]$
</pre>
Great! Alhumdulillah!
OK. Now we need GCC on all nodes (headnode + compute nodes).
<pre>
yum -y install gcc
</pre>
After that, we need to download MPICH2, an implementation of MPI version 2. The site is:-
<pre>
http://www.mcs.anl.gov/research/projects/mpich2
</pre>
Download it on the headnode and install it into the shared location /cluster/mpich2.
<pre>
cd /cluster
...
./configure --prefix=/cluster/mpich2
</pre>
On a VMware machine, the configuration part takes 2-3 minutes.
<pre>
make
</pre>
On a VMware machine, the compilation part takes 2-3 minutes.
<pre>
make install
</pre>
Alright, now we need to define certain environment variables in the .bashrc or .bash_profile of the mpiuser.
<pre>
vi /cluster/mpiuser/.bash_profile
...
PATH=$PATH:$HOME/bin:/cluster/mpich2/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cluster/mpich2/lib
</pre>
I personally think that the following two lines are totally useless:-
<pre>
Next we run this command in order to define MPICH installation path to SSH.
mpiu@ub0:~$ sudo echo /mirror/mpich2/bin >> /etc/environment
</pre>
Let's log in as mpiuser and see if the MPI executables are found when needed:-
<pre>
su - mpiuser
...
[mpiuser@headnode ~]$ which mpirun
/cluster/mpich2/bin/mpirun
</pre>
Set up MPD:-

MPD is the MPI daemon. We need to create a file named mpd.hosts in mpiuser's home directory and put in the names of our compute nodes. (I am using the headnode as a compute node as well.)
<pre>
vi /cluster/mpiuser/mpd.hosts
headnode
node1
node2
</pre>
We also need to have a secrets file for the cluster:-
<pre>
vi /cluster/mpiuser/.mpd.conf
secretword=redhat
</pre>
Tighten the permissions:-
<pre>
chmod 0600 /cluster/mpiuser/.mpd.conf
</pre>
Now run the following sequence of commands to check if things are working:-

On the headnode:-
<pre>
mpd &
sleep 2
mpdtrace
mpdallexit
</pre>
This should give you the following output. Notice the hostname returned by the mpdtrace command:-
<pre>
[mpiuser@headnode ~]$ mpd&
[1] 19235
...
[mpiuser@headnode ~]$ mpdallexit
[mpiuser@headnode ~]$
</pre>
Here is an interesting check. I intentionally shut down node2 and then checked what MPD reports:-
<pre>
[mpiuser@headnode ~]$ mpdboot -n 3 --chkuponly
checking node1
...
['node2']
[mpiuser@headnode ~]$
</pre>
Let's try booting MPD on all three nodes:-
<pre>
[mpiuser@headnode ~]$ mpdboot -n 3
mpdboot_headnode (handle_mpd_output 406): from mpd on node2, invalid port info:
no_port
</pre>
It failed, as expected. OK, let's boot MPD on two nodes only:-
<pre>
[mpiuser@headnode ~]$ mpdboot -n 2
</pre>
mpdtrace should return the names of the hosts mpd is successfully running on:-
<pre>
[mpiuser@headnode ~]$ mpdtrace
headnode
...
headnode_52307 (192.168.0.10)
node1_38651 (192.168.0.11)
</pre>
So far so good. Let's execute a sample program provided in the examples directory of the mpich2 source code:-
<pre>
[mpiuser@headnode cluster]$ cd /cluster/mpich2-1.0.8/examples/
</pre>
There is a pre-compiled program (cpi) in this directory. The other programs need compiling with mpicc, NOT with plain make.
<pre>
[mpiuser@headnode examples]$ ls -l
total 968
...
</pre>
Let's run one process. mpiexec will automatically select a node to run it on:-
<pre>
[mpiuser@headnode examples]$ mpiexec -n 1 ./cpi
Process 0 of 1 is on headnode
...
wall clock time = 0.000014
[mpiuser@headnode examples]$
</pre>
Let's run two processes. mpiexec will automatically select two nodes to run them on:-
<pre>
[mpiuser@headnode examples]$ mpiexec -n 2 ./cpi
Process 0 of 2 is on headnode
...
wall clock time = 0.001619
[mpiuser@headnode examples]$
</pre>
Let's run four processes. mpiexec will automatically place two of them on each node, as we have only two nodes running:-
<pre>
[mpiuser@headnode examples]$ mpiexec -n 4 ./cpi
Process 0 of 4 is on headnode
...
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.005809
</pre>
The wall time has increased with the number of processes. You must be thinking it should have decreased. You are right, but the hardware is not! Notice that these machines are virtual machines created inside VMware on a single laptop. As soon as we add processes, the same single CPU is divided and shared between the compute nodes, effectively decreasing the compute power of each node. On real hardware-based compute nodes this time WILL decrease, as each process will have a full CPU to itself and thus will take less time.
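The effect can be illustrated with a toy model: with P processes on C physical CPUs, the best-case wall time scales like t/min(P, C), so on a single shared CPU adding processes buys nothing. The numbers below are illustrative, not measured:

```shell
# Toy model of oversubscription: t is a hypothetical single-process wall
# time, C the number of physical CPUs. Extra processes beyond C only share
# the same CPU, so the "shared-cpu" column stays flat while "ideal" shrinks.
model=$(awk 'BEGIN {
    t = 1.0    # hypothetical single-process wall time (seconds)
    C = 1      # one physical CPU shared by all the VMs on this laptop
    for (P = 1; P <= 4; P *= 2) {
        eff = (P < C) ? P : C
        printf "P=%d  ideal=%.2f  shared-cpu=%.2f\n", P, t/P, t/eff
    }
}')
echo "$model"
```

This ignores communication overhead, which in practice makes the oversubscribed case even slower, as seen in the cpi timings above.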
There is a file named icpi. Let's compile and run it. [icpi is the interactive version of cpi.]
<pre>
[mpiuser@headnode examples]$ ls -l
total 968
...
</pre>
Compile:-
<pre>
[mpiuser@headnode examples]$ mpicc -o /cluster/mpiuser/icpi /cluster/mpich2-1.0.8/examples/icpi.c
</pre>
Execute:-
<pre>
cd ~
...
Enter the number of intervals: (0 quits) 0
[mpiuser@headnode ~]$
</pre>
Terminate the MPD daemon:-
<pre>
mpdallexit
</pre>
By using these simple examples, we have seen how to set up and run MPI and MPI-based programs. Alhumdulillah.

= Linpack =

Linpack needs MPI (LAM, MPICH, or OpenMPI) installed on the system. User equivalence should also be set up. This is what we have already done in the steps above.
Revision as of 22:04, 14 February 2010
bismilla-hirrahma-nirraheem
Introduction and scenario
My host machine/physical machine is named kworkbee, running Fedora 8 (i386), and VMware server 2.0 on top of it.
My virtual machines are connected on HostOnly network vmnet1. 192.168.0.0/24 . Whereas 192.168.0.1 is the ip of the vmnet1 interface on host machine.
The head node of my cluster is named "headnode" and has an IP of 192.168.0.10 on its eth0.
My virtual machines, part of this Beowulf cluster, are:-
- headnode
- node1 (MAC address: 00:50:56:00:00:11)
- node2 (MAC address: 00:50:56:00:00:12)
I am using "redhat" (without quotes), as my root password on all machines.
The /etc/hosts file on my host machine and headnode, looks like:-
<pre>
[root@kworkbee ~]# cat /etc/hosts
127.0.0.1       localhost.localdomain   localhost
192.168.0.1     kworkbee        kworkbee
192.168.0.10    headnode        headnode
192.168.0.11    node1   node1
192.168.0.12    node2   node2
</pre>
On the host server, I extracted the contents of the ISO file into a directory:
<pre>
mount -o loop /data/cdimages/CentOS-5.2-i386-bin-DVD.iso /media/loop/
rsync -av /media/loop/* /data/cdimages/centos/
</pre>
Configure an apache Alias and restart apache service.
<pre>
# vi /etc/httpd/conf.d/centos.conf
Alias /centos /data/cdimages/centos/
<Location /centos>
        Order deny,allow
        Allow from all
        Options +Indexes
</Location>

# service httpd restart
</pre>
Check by opening a browser window, on the host computer. You should get a list of files from the top level directory of the Centos distribution:-
The same should be accessible from the head node of your cluster, using the address:-
To ease the pain of installation of various software on the headnode, I have edited the yum repository file on headnode as shown below. Comment out the rest of the file:-
<pre>
# vi /etc/yum.repos.d/CentOS-Base.repo
[base]
name=CentOS-$releasever - Base
baseurl=http://192.168.0.1/centos/
gpgcheck=0
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-5
</pre>
Install necessary software on the headnode :-
- DHCP-server
- TFTP-server
- syslinux
<pre>
[root@headnode ~]# yum -y install dhcp tftp-server syslinux
</pre>
Set up the /etc/dhcpd.conf file as shown here:-
<pre>
# vi /etc/dhcpd.conf
ddns-update-style interim;
ignore client-updates;

subnet 192.168.0.0 netmask 255.255.255.0 {
        # --- default gateway
        option routers                  192.168.0.1;
        option subnet-mask              255.255.255.0;
        option domain-name              "mybeowulf.local";
        option domain-name-servers      192.168.0.10;
        option time-offset              -18000;    # Eastern Standard Time
        option ntp-servers              192.168.0.10;
        filename "pxelinux.0";
        range dynamic-bootp 192.168.0.11 192.168.0.20;
        default-lease-time 21600;
        max-lease-time 43200;

        # next-server points to the TFTP/PXE server.
        host node1 {
                filename "pxelinux.0";
                next-server 192.168.0.10;
                hardware ethernet 00:50:56:00:00:11;
                fixed-address 192.168.0.11;
        }
        host node2 {
                filename "pxelinux.0";
                next-server 192.168.0.10;
                hardware ethernet 00:50:56:00:00:12;
                fixed-address 192.168.0.12;
        }
}
</pre>
Note: The file pxelinux.0 is a special file, which gets installed through the syslinux package. It is the PXE boot loader, and it is run as soon as the PXE client / client machine gets a DHCP lease. The actual location of this file is mentioned later in this howto.
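Typing one host stanza per node does not scale past a handful of nodes. As a sketch (not from the original article), the stanzas can be generated from a simple name/MAC/IP list; the printf format mirrors the host blocks shown above:

```shell
# Generate dhcpd.conf host stanzas from a node list. The node data below
# is the set used in this article; extend the list for more nodes.
stanzas=$(
while read name mac ip; do
    printf 'host %s {\n' "$name"
    printf '\tfilename "pxelinux.0";\n'
    printf '\tnext-server 192.168.0.10;\n'
    printf '\thardware ethernet %s;\n' "$mac"
    printf '\tfixed-address %s;\n' "$ip"
    printf '}\n'
done << 'NODES'
node1 00:50:56:00:00:11 192.168.0.11
node2 00:50:56:00:00:12 192.168.0.12
NODES
)
echo "$stanzas"
```

The output can be pasted (or redirected) into the subnet block of /etc/dhcpd.conf.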
<pre>
[root@headnode ~]# service dhcpd restart
Starting dhcpd:                      [  OK  ]
[root@headnode ~]# chkconfig --level 35 dhcpd on
</pre>
Now enable TFTP as well:-
<pre>
[root@headnode ~]# vi /etc/xinetd.d/tftp
service tftp
{
        socket_type             = dgram
        protocol                = udp
        wait                    = yes
        user                    = root
        server                  = /usr/sbin/in.tftpd
        server_args             = -s /tftpboot
        disable                 = no
        per_source              = 11
        cps                     = 100 2
        flags                   = IPv4
}

[root@headnode ~]# service xinetd restart
Stopping xinetd:                     [  OK  ]
Starting xinetd:                     [  OK  ]
[root@headnode ~]# chkconfig --level 35 xinetd on
</pre>
Next, we need to copy two special files from the distribution media on the host machine. On the host machine, I checked the list of files to make sure that they are available:-
<pre>
[root@kworkbee ~]# ls /data/cdimages/centos/images/pxeboot/
initrd.img  README  TRANS.TBL  vmlinuz
[root@kworkbee ~]#
</pre>
We will copy the vmlinuz and initrd.img files, from this location on the host machine, to /tftpboot directory on the headnode machine :-
On the headnode, we have the CentOS distribution available through the HTTP path http://192.168.0.1/centos . We will use wget to download these two files from the host machine. I could also use scp, but just to make things interesting, here it is:-
<pre>
[root@headnode ~]# cd /tftpboot/
[root@headnode tftpboot]# wget http://192.168.0.1/centos/images/pxeboot/vmlinuz
[root@headnode tftpboot]# wget http://192.168.0.1/centos/images/pxeboot/initrd.img
</pre>
PXE configuration detail:- When a client boots, it will first ask TFTP for a configuration file named after its MAC address. After trying several options, it will fall back to requesting a default file, named "default". This file needs to be in the /tftpboot/pxelinux.cfg directory on the headnode.
<pre>
# mkdir /tftpboot/pxelinux.cfg
# vi /tftpboot/pxelinux.cfg/default
prompt 1
timeout 5
default linux

label linux
        kernel vmlinuz
        append vga=normal initrd=initrd.img
</pre>
Now is the time to copy the pxelinux.0 file from its installed location to the /tftpboot directory. This file (pxelinux.0) is provided by the syslinux package, which comes with your linux distribution.
<pre>
# cp /usr/lib/syslinux/pxelinux.0 /tftpboot/
</pre>
Now make sure that all files and directories inside /tftpboot are world readable.
<pre>
# chmod +r /tftpboot/* -R
</pre>
Now you are ready to boot your client. You should see your client getting an IP and a boot file and booting off from the pxe boot image.
While your client boots, you should see the following in /var/log/messages on your headnode:-
<pre>
# tail -f /var/log/messages
Feb 27 19:55:15 beowulf dhcpd: DHCPDISCOVER from 00:50:56:00:00:11 via eth0
Feb 27 19:55:15 beowulf dhcpd: DHCPOFFER on 192.168.0.11 to 00:50:56:00:00:11 via eth0
Feb 27 19:55:17 beowulf dhcpd: Dynamic and static leases present for 192.168.0.11.
Feb 27 19:55:17 beowulf dhcpd: Remove host declaration node1 or remove 192.168.0.11
Feb 27 19:55:17 beowulf dhcpd: from the dynamic address pool for 192.168.0/24
Feb 27 19:55:17 beowulf dhcpd: DHCPREQUEST for 192.168.0.11 (192.168.0.10) from 00:50:56:00:00:11 via eth0
Feb 27 19:55:17 beowulf dhcpd: DHCPACK on 192.168.0.11 to 00:50:56:00:00:11 via eth0
Feb 27 19:55:17 beowulf in.tftpd[12476]: tftp: client does not accept options
</pre>
By doing this, you have managed to start up the interactive installation. Congratulations!
Automated KickStart installations
For automated KickStart based setups, you need to do the following additional steps.
First you need a kickstart file. You can use the minimal kickstart file from your headnode! How ? Well, when you installed your headnode, the installer created a file anaconda-ks.cfg in your /root directory. You can use this file, modify it a bit and use it as the kickstart file for your compute nodes.
<pre>
[root@headnode ~]# cp anaconda-ks.cfg compute-ks.cfg
</pre>
Edit this file as per your requirements.
<pre>
[root@headnode ~]# vi compute-ks.cfg
# Kickstart file automatically generated by anaconda.
install
# My centos distribution is on the hostmachine (kworkbee)(192.168.0.1),
# not on the headnode.
url --url http://192.168.0.1/centos
lang en_US.UTF-8
keyboard us
network --device eth0 --bootproto dhcp --hostname node
rootpw --iscrypted $1$t7dSrF04$Ea4kcb4QFbC3JdZmVyTTA/
firewall --disabled
authconfig --enableshadow --enablemd5
selinux --disabled
timezone Asia/Riyadh
zerombr yes
bootloader --location=mbr --driveorder=sda
# The following is the partition information you requested
# Note that any partitions you deleted are not expressed
# here so unless you clear all partitions first, this is
# not guaranteed to work
clearpart --all --initlabel
part / --fstype ext3 --size=1 --grow
part swap --size=256
reboot

%packages
@base
</pre>
Now, copy this file to the document root of your web server, and make it world readable:-
<pre>
cp /root/compute-ks.cfg /var/www/html/
chmod +r /var/www/html/compute-ks.cfg
</pre>
Edit the PXE default file again and add the extra kickstart options:-
<pre>
[root@headnode ~]# vi /tftpboot/pxelinux.cfg/default
prompt 1
timeout 5
default linux

label linux
        kernel vmlinuz
        append vga=normal initrd=initrd.img ip=dhcp ksdevice=eth0 ks=http://192.168.0.10/compute-ks.cfg
</pre>
You need to make sure that httpd is running on your head node, otherwise the installer will not be able to access this file. Another option is to copy this compute-ks.cfg file to the document root of your host machine (192.168.0.1), in the /var/www/html directory.
<pre>
[root@headnode ~]# service httpd restart
</pre>
Time to test this setup. On the head node, open up /var/log/messages and /var/log/httpd/access_log files in separate terminals. You should get an entry in your apache access log file, when the installer gets the kickstart file, from the headnode.
For the logs of package retrieval during the actual installation, you should check apache access log on the hostmachine.
<pre>
[root@headnode ~]# tail -f /var/log/httpd/access_log
192.168.0.11 - - [27/Feb/2009:21:06:50 +0300] "GET /compute-ks.cfg HTTP/1.0" 200 680 "-" "anaconda/11.1.2.113"

[root@headnode ~]# tail -f /var/log/messages
Feb 27 21:06:51 beowulf dhcpd: DHCPDISCOVER from 00:50:56:00:00:11 via eth0
Feb 27 21:06:51 beowulf dhcpd: DHCPOFFER on 192.168.0.11 to 00:50:56:00:00:11 via eth0
Feb 27 21:06:51 beowulf dhcpd: Dynamic and static leases present for 192.168.0.11.
Feb 27 21:06:51 beowulf dhcpd: Remove host declaration node1 or remove 192.168.0.11
Feb 27 21:06:51 beowulf dhcpd: from the dynamic address pool for 192.168.0/24
Feb 27 21:06:51 beowulf dhcpd: DHCPREQUEST for 192.168.0.11 (192.168.0.10) from 00:50:56:00:00:11 via eth0
Feb 27 21:06:51 beowulf dhcpd: DHCPACK on 192.168.0.11 to 00:50:56:00:00:11 via eth0
</pre>
Try logging in to a node after installation. The screenshot below shows that regardless of which node we log into, the hostname shown is "node". This is because we fixed the hostname as "node" in the kickstart file, and it is a limitation of this type of installation. If you are installing more than one node of a cluster using this method, you either create separate PXE files and related separate kickstart files through some sort of script and install the nodes that way, or you manually change the node names after they are installed.
I found the following links helpful:
http://www.debian-administration.org/articles/478
http://linux-sys.org/internet_serving/pxeboot.html
This can be demonstrated by doing the following manual steps.
Rename the file /tftpboot/pxelinux.cfg/default to 01-<MAC address of one of your nodes> (e.g. node1):
<pre>
[root@headnode pxelinux.cfg]# mv /tftpboot/pxelinux.cfg/default /tftpboot/pxelinux.cfg/01-00-50-56-00-00-11
</pre>
The "01" before the MAC address represents the hardware type (Ethernet).
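For the curious, pxelinux's documented lookup order can be sketched as a small function: it asks TFTP for 01-<mac> first, then for the client's IP address in uppercase hex, progressively shortened, and finally for "default". (This is a simplification; newer syslinux versions also try a UUID-based name before the MAC.)

```shell
# Print the config filenames a pxelinux client requests, in order,
# for a given MAC address and IP address (simplified; UUID step omitted).
pxe_search_list() {
    mac=$(echo "$1" | tr 'A-Z:' 'a-z-')               # lowercase, ':' -> '-'
    hexip=$(printf '%02X%02X%02X%02X' $(echo "$2" | tr '.' ' '))
    echo "01-$mac"
    i=8
    while [ $i -ge 1 ]; do                            # C0A8000B, C0A8000, ... C
        echo "$hexip" | cut -c1-$i
        i=$((i - 1))
    done
    echo "default"
}

pxe_search_list 00:50:56:00:00:11 192.168.0.11
```

The first line printed is exactly the 01-00-50-56-00-00-11 filename used above, which is why renaming "default" to that name makes only node1 bootable.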
Note that you no longer have a default file in your TFTP setup. Only node1 should be able to boot and install from a PXE image properly; node2 should fail. This can be seen in the screenshot (pxe-boot-7.png) below:-
So you need to develop a mechanism to automate this all, for your cluster. And for the sake of automating a simple task, such as initial installation, there are management tools / software, such as ROCKS, OSCAR, Cobbler, Scali / Platform, etc.
With the help of a little scripting, and knowing the hostnames, MAC addresses and IP address range, we can create multiple PXE boot files and related kickstart files. Each PXE file will point only to its related ks file. A simple example is shown below:-
First, the PXE file for node1:-
<pre>
[root@headnode ~]# vi /tftpboot/pxelinux.cfg/01-00-50-56-00-00-11
prompt 1
timeout 5
default linux

label linux
        kernel vmlinuz
        append vga=normal initrd=initrd.img ip=dhcp ks=http://192.168.0.10/node1-ks.cfg
</pre>
And then the kickstart file for node1. Notice the different network line, which sets both a static IP and the hostname:-
<pre>
[root@headnode ~]# vi /var/www/html/node1-ks.cfg
# Kickstart file automatically generated by anaconda.
install
# My centos distribution is on the hostmachine (kworkbee)(192.168.0.1),
# not on the headnode.
url --url http://192.168.0.1/centos
lang en_US.UTF-8
keyboard us
network --device eth0 --bootproto static --ip 192.168.0.11 --netmask 255.255.255.0 --gateway 192.168.0.10 --nameserver 192.168.0.10 --hostname node1
rootpw --iscrypted $1$t7dSrF04$Ea4kcb4QFbC3JdZmVyTTA/
firewall --disabled
authconfig --enableshadow --enablemd5
selinux --disabled
timezone Asia/Riyadh
zerombr yes
bootloader --location=mbr --driveorder=sda
# The following is the partition information you requested
# Note that any partitions you deleted are not expressed
# here so unless you clear all partitions first, this is
# not guaranteed to work
clearpart --all --initlabel
part / --fstype ext3 --size=1 --grow
part swap --size=256
reboot

%packages
@base
</pre>
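The "little scripting" mentioned above can be sketched like this. The script stamps out one PXE file and one kickstart file per node from a name/MAC/IP list. It writes into a temporary directory so it is safe to try; on the real headnode you would point it at /tftpboot/pxelinux.cfg and /var/www/html, and the kickstart template here is abbreviated to the lines that differ per node:

```shell
# Generate per-node PXE and kickstart files from a node list.
# Temp directory stands in for /tftpboot/pxelinux.cfg and /var/www/html.
out=$(mktemp -d)
mkdir -p "$out/pxelinux.cfg" "$out/html"

while read name mac ip; do
    macfile="01-$(echo "$mac" | tr ':' '-')"

    # PXE file: same as the hand-written one, but ks= points at this node.
    cat > "$out/pxelinux.cfg/$macfile" << PXE
prompt 1
timeout 5
default linux

label linux
        kernel vmlinuz
        append vga=normal initrd=initrd.img ip=dhcp ks=http://192.168.0.10/$name-ks.cfg
PXE

    # Kickstart: only the per-node network line shown; the rest would be
    # copied from compute-ks.cfg.
    cat > "$out/html/$name-ks.cfg" << KS
install
url --url http://192.168.0.1/centos
network --device eth0 --bootproto static --ip $ip --netmask 255.255.255.0 --gateway 192.168.0.10 --nameserver 192.168.0.10 --hostname $name
# ... remaining kickstart directives as in compute-ks.cfg ...
KS
done << 'NODES'
node1 00:50:56:00:00:11 192.168.0.11
node2 00:50:56:00:00:12 192.168.0.12
NODES

ls "$out/pxelinux.cfg"
```

Adding a node then means adding one line to the list and rerunning the script.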
Collection of SSH fingerprints of all cluster nodes
Excellent article at :
http://itg.chem.indiana.edu/inc/wiki/software/openssh/189.html
<pre>
[root@headnode ~]# ping node1
PING node1 (192.168.0.11) 56(84) bytes of data.
64 bytes from node1 (192.168.0.11): icmp_seq=1 ttl=64 time=3.15 ms

--- node1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 3.153/3.153/3.153/0.000 ms

[root@headnode ~]# ping node2
PING node2 (192.168.0.12) 56(84) bytes of data.
64 bytes from node2 (192.168.0.12): icmp_seq=1 ttl=64 time=3.01 ms
64 bytes from node2 (192.168.0.12): icmp_seq=2 ttl=64 time=1.69 ms

--- node2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 1.695/2.355/3.016/0.662 ms
[root@headnode ~]#
</pre>
Let's try to connect to a node and see if it asks to save the fingerprint. For now, we will answer NO to the fingerprint-save prompt:-
<pre>
[root@headnode ~]# ssh node1
The authenticity of host 'node1 (192.168.0.11)' can't be established.
RSA key fingerprint is b0:7c:d0:45:3a:98:ee:b8:8c:4c:47:c5:0e:31:91:13.
Are you sure you want to continue connecting (yes/no)? no
Host key verification failed.
</pre>
The file /etc/ssh/ssh_known_hosts has all the fingerprints. However it does not exist by default:
<pre>
[root@headnode ~]# ls /etc/ssh/ssh_known_hosts
ls: /etc/ssh/ssh_known_hosts: No such file or directory
</pre>
Let's generate the RSA fingerprints of our two cluster nodes. (Remember, the nodes were ping-able):
<pre>
[root@headnode ~]# ssh-keyscan -t rsa localhost headnode node1 node2 > /etc/ssh/ssh_known_hosts
# node1 SSH-2.0-OpenSSH_4.3
# node2 SSH-2.0-OpenSSH_4.3
</pre>
See! It worked !
<pre>
[root@headnode ~]# ssh node1
Warning: Permanently added the RSA host key for IP address '192.168.0.11' to the list of known hosts.
root@node1's password:
[root@localhost ~]#
</pre>
Notice that it did not ask to save any finger-print.
= Host based authentication (Warning: this is not a well-liked solution) =
Good article at: http://kbase.redhat.com/faq/docs/DOC-9164
For host-based authentication to work, you need ssh host keys on both the headnode and the compute nodes.
Normally, the following would be set up on the "server" side, i.e. the machine you are trying to access from a "client". In our case, the headnode is in fact acting in the client role, and the compute node is in fact the "server".

So you may need to set up the following on both sides, if you want both sides to log on to each other without passwords.
Server:-
<pre>
# vi /etc/ssh/sshd_config
...
HostbasedAuthentication yes
IgnoreUserKnownHosts yes
IgnoreRhosts no
ChallengeResponseAuthentication no
GSSAPIAuthentication no
GSSAPICleanupCredentials no
...

# ssh-keyscan -t rsa node1 node1.mybeowulf.local > /etc/ssh/ssh_known_hosts
</pre>
Client :-
<pre>
# vi /etc/ssh/ssh_config
...
GSSAPIAuthentication no
HostbasedAuthentication yes
EnableSSHKeySign yes
...

[root@node1 ssh]# vi ~/.shosts
headnode root
(or/and)
192.168.0.10 root
(or/and)
headnode.mybeowulf.local root    # needs a working DNS

# chmod 600 ~/.shosts
</pre>
Make sure that your name resolution is setup correctly:
<pre>
# vi /etc/hosts
127.0.0.1       localhost.localdomain localhost
192.168.0.11    node1.mybeowulf.local node1
192.168.0.10    headnode headnode
</pre>
= (Not directly linked topic) How to control VMware virtual machines from the command line? =
Note: Some people say that VMware Tools must be installed in the virtual machines for this to work. I did not install VMware Tools on my virtual machines and yet I got this working properly.
This works perfectly for restarting the virtual machines from the Linux command prompt. Please note that my datastore is named "standard", and the exact location of the virtual machine (node2) on my disk is /data/vmachines/beowulf_node2/beowulf_node2.vmx :-
<pre>
[root@kworkbee ~]# vmrun -T server -h https://localhost:8333/sdk -u root -p redhat reset "[standard] beowulf_node2/beowulf_node2.vmx"
[root@kworkbee ~]#
</pre>
= SSH key based authentication =
<pre>
$KEYGEN -q -t rsa1 -f $RSA1_KEY -C '' -N '' >&/dev/null
chmod 600 $RSA1_KEY
chmod 644 $RSA1_KEY.pub
</pre>
<pre>
[root@headnode .ssh]# ssh-keygen -t rsa -f /root/.ssh/id_rsa -C '' -N ''
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
4a:74:2d:62:09:16:2e:fd:2e:31:a2:97:d3:e5:a6:15

[root@headnode .ssh]# ssh-keygen -t dsa -f /root/.ssh/id_dsa -C '' -N ''
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
8f:90:a2:02:82:e3:7a:4b:55:a7:79:cc:14:87:3b:d9
[root@headnode .ssh]#

[root@headnode .ssh]# cat id_dsa.pub >> authorized_keys
[root@headnode .ssh]# cat id_rsa.pub >> authorized_keys
</pre>
Now try logging in to this machine:-
<pre>
[root@headnode .ssh]# ssh localhost
Last login: Sat Mar  7 13:12:09 2009 from localhost.localdomain
[root@headnode ~]#
</pre>
As you can see, it works without asking for a password. Good. Now let's copy the private and public key files to the .ssh directory of the nodes. We can put them in a special directory named ssh in our webroot and fetch them with wget on each node, during the %post of the kickstart. This way all nodes will share a single ssh private and public key. Not much hassle.
<pre>
[root@headnode .ssh]# mkdir /var/www/html/ssh
[root@headnode .ssh]# cp /root/.ssh/id_* /var/www/html/ssh/
[root@headnode .ssh]# cp /root/.ssh/authorized_keys /var/www/html/ssh/
[root@headnode .ssh]# chmod +r /var/www/html/ssh/*
[root@headnode .ssh]# ls -l /var/www/html/ssh/
total 20
-rw-r--r-- 1 root root  972 Mar  7 13:18 authorized_keys
-rw-r--r-- 1 root root  672 Mar  7 13:17 id_dsa
-rw-r--r-- 1 root root  590 Mar  7 13:17 id_dsa.pub
-rw-r--r-- 1 root root 1675 Mar  7 13:17 id_rsa
-rw-r--r-- 1 root root  382 Mar  7 13:17 id_rsa.pub
[root@headnode .ssh]#
</pre>
Now lets write down steps of setting up a correct .ssh directory on the node.
<pre>
mkdir /root/.ssh
chmod 0700 /root/.ssh
cd /root/.ssh
wget http://192.168.0.10/ssh/id_dsa
wget http://192.168.0.10/ssh/id_dsa.pub
wget http://192.168.0.10/ssh/id_rsa
wget http://192.168.0.10/ssh/id_rsa.pub
wget http://192.168.0.10/ssh/authorized_keys
chmod 0600 /root/.ssh/*
cd
</pre>
Let's test:
<pre>
[root@headnode .ssh]# ssh root@node1
Last login: Sat Mar  7 13:27:13 2009 from headnode
[root@node1 ~]#
</pre>
Great ! Let's check the other way round.
<pre>
[root@node1 .ssh]# ssh root@headnode
Last login: Sat Mar  7 13:13:11 2009 from localhost.localdomain
[root@headnode ~]#
</pre>
This works great too!
Note: This exercise assumes that the RSA host keys were scanned for each host and saved in /etc/ssh/ssh_known_hosts. However, that doesn't quite add up. In the beginning there will be only one RSA host key in /etc/ssh/ssh_known_hosts on the headnode, and that would be the entry for the headnode itself. We also cannot know the RSA host key of any node that is not installed yet. So one way to do it is to make this ssh_known_hosts file available to the nodes over http as well. We will copy this file from the headnode to the client in the %post of the client kickstart.
But how would the headnode know that a particular node has finished installing, so that the node's ssh host key can be generated and added to its /etc/ssh/ssh_known_hosts? I wonder how my company is doing it?

Remember, we are doing this manually anyway; the nodes are restarted one by one. We are not doing it through any management software as yet. So we need to manually add the ssh host key of each node to the /etc/ssh/ssh_known_hosts file on the server, at the end of each node's installation.
We can still automate it somehow. That is, we can put the ssh host key of a node on the headnode as soon as the node is installed. Through a cron job or something similar, we can add the keys of all installed nodes to the ssh_known_hosts file on the headnode.

Or we can constantly monitor some log file to check when a node sends a completion signal, and then initiate its key-collection process.
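The merging step of such a cron job could be as simple as the following sketch. The file names and the sample key lines here are illustrative assumptions, not taken from a real cluster; on the headnode the two files would be /tmp/hosts.txt and /etc/ssh/ssh_known_hosts.

```shell
# Sketch: nodes append their freshly scanned host keys to a drop file;
# the headnode periodically merges that file into its known-hosts file,
# removing duplicate entries. Local example files are used here.
DROP=hosts.txt
KNOWN=ssh_known_hosts

# Simulate two identical appends from node1, plus an existing headnode entry:
printf 'node1 ssh-rsa AAAAB3...key1\nnode1 ssh-rsa AAAAB3...key1\n' > "$DROP"
printf 'headnode ssh-rsa AAAAB3...key0\n' > "$KNOWN"

# Merge, deduplicated:
sort -u "$KNOWN" "$DROP" > "$KNOWN.new" && mv "$KNOWN.new" "$KNOWN"
cat "$KNOWN"
```

The duplicate node1 line is merged away, leaving one entry per host.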
The following four commands set up the headnode's ssh host key on the node.
<pre>
[root@headnode .ssh]# ssh-keyscan -t rsa headnode > /var/www/html/ssh/ssh_known_hosts
[root@headnode .ssh]# chmod +r /var/www/html/ssh/ssh_known_hosts
</pre>
On the node side:-
<pre>
[root@node1 ssh]# rm ssh_known_hosts* -f
[root@node1 ssh]# wget http://192.168.0.10/ssh/ssh_known_hosts
</pre>
Then, I would use the following to add the ssh-host-key of the newly installed compute node to the ssh_known_hosts file on the headnode:-
<pre>
[root@node1 ssh]# SSH_SCAN_KEY=$(ssh-keyscan -t rsa `hostname -s`)
# node1 SSH-2.0-OpenSSH_4.3
[root@node1 ssh]#
[root@node1 ssh]# ssh headnode "echo $SSH_SCAN_KEY >> /tmp/hosts.txt"
[root@node1 ssh]#
</pre>
Let's send the FQDN key as well:-
<pre>
[root@node1 ssh]# SSH_SCAN_KEY=$(ssh-keyscan -t rsa `hostname`)
# node1.mybeowulf.local SSH-2.0-OpenSSH_4.3
[root@node1 ssh]#
[root@node1 ssh]# ssh headnode "echo $SSH_SCAN_KEY >> /tmp/hosts.txt"
[root@node1 ssh]#
</pre>
Let's check on the server side:-
<pre>
[root@headnode ssh]# cat /tmp/hosts.txt
node1 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzKIg/MmPJzPoQxBWRN8G8ZGad74EqXRyR1T6EXWXQ+xvSKZmI6CuvExBuXoKCBVJ/TzTQ5x46c4fM2+3aU0xTpupzCGhrpcI+21ITwhJjlaF6Kc0CGhyTG8ztftxIdcBus0rW8VkSvVbLnMTDPQstHAVvrSqahoBfLCAWqLnWcJ8+BqenFtPI9Tvq6Dj+Ilx+ukNiGoS7+ng43WGWMHWP4LtGeI/628Hzt23WCjSLL+HqzoUF3u8ouwZlPiYP8BbUXOoTG9XME9M4Oiny0X6LoHMf0lNO89dlFpllRL3ZzURXPO+bT4KiR/Juo645JhTDi0Y7Nk6MToML0ji00yKVw==
node1.mybeowulf.local ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzKIg/MmPJzPoQxBWRN8G8ZGad74EqXRyR1T6EXWXQ+xvSKZmI6CuvExBuXoKCBVJ/TzTQ5x46c4fM2+3aU0xTpupzCGhrpcI+21ITwhJjlaF6Kc0CGhyTG8ztftxIdcBus0rW8VkSvVbLnMTDPQstHAVvrSqahoBfLCAWqLnWcJ8+BqenFtPI9Tvq6Dj+Ilx+ukNiGoS7+ng43WGWMHWP4LtGeI/628Hzt23WCjSLL+HqzoUF3u8ouwZlPiYP8BbUXOoTG9XME9M4Oiny0X6LoHMf0lNO89dlFpllRL3ZzURXPO+bT4KiR/Juo645JhTDi0Y7Nk6MToML0ji00yKVw==
[root@headnode ssh]#
</pre>
Alhumdulillah. As you can see, the test is successful. So I will use the /etc/ssh/ssh_known_hosts file instead of /tmp/hosts.txt .
<pre>
[root@node1 ssh]# SSH_SCAN_KEY=$(ssh-keyscan -t rsa `hostname -s`)
# node1 SSH-2.0-OpenSSH_4.3
[root@node1 ssh]# ssh headnode "echo $SSH_SCAN_KEY >> /etc/ssh/ssh_known_hosts"
[root@node1 ssh]# SSH_SCAN_KEY=$(ssh-keyscan -t rsa `hostname`)
# node1.mybeowulf.local SSH-2.0-OpenSSH_4.3
[root@node1 ssh]# ssh headnode "echo $SSH_SCAN_KEY >> /etc/ssh/ssh_known_hosts"
[root@node1 ssh]#
</pre>
On the server:-
<pre>
[root@headnode ssh]# cat /etc/ssh/ssh_known_hosts
node1 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzKIg/MmPJzPoQxBWRN8G8ZGad74EqXRyR1T6EXWXQ+xvSKZmI6CuvExBuXoKCBVJ/TzTQ5x46c4fM2+3aU0xTpupzCGhrpcI+21ITwhJjlaF6Kc0CGhyTG8ztftxIdcBus0rW8VkSvVbLnMTDPQstHAVvrSqahoBfLCAWqLnWcJ8+BqenFtPI9Tvq6Dj+Ilx+ukNiGoS7+ng43WGWMHWP4LtGeI/628Hzt23WCjSLL+HqzoUF3u8ouwZlPiYP8BbUXOoTG9XME9M4Oiny0X6LoHMf0lNO89dlFpllRL3ZzURXPO+bT4KiR/Juo645JhTDi0Y7Nk6MToML0ji00yKVw==
node1.mybeowulf.local ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzKIg/MmPJzPoQxBWRN8G8ZGad74EqXRyR1T6EXWXQ+xvSKZmI6CuvExBuXoKCBVJ/TzTQ5x46c4fM2+3aU0xTpupzCGhrpcI+21ITwhJjlaF6Kc0CGhyTG8ztftxIdcBus0rW8VkSvVbLnMTDPQstHAVvrSqahoBfLCAWqLnWcJ8+BqenFtPI9Tvq6Dj+Ilx+ukNiGoS7+ng43WGWMHWP4LtGeI/628Hzt23WCjSLL+HqzoUF3u8ouwZlPiYP8BbUXOoTG9XME9M4Oiny0X6LoHMf0lNO89dlFpllRL3ZzURXPO+bT4KiR/Juo645JhTDi0Y7Nk6MToML0ji00yKVw==
[root@headnode ssh]#
</pre>
Now, I can ssh into my nodes, without password:-
<pre>
[root@headnode ssh]# ssh node1
Last login: Sat Mar  7 13:27:38 2009 from headnode
[root@node1 ~]#
</pre>
Alhumdulillah.
= Or should I setup Rlogin first? (Warning: Not needed / desired) =
For rlogin, we need the rsh-server package installed on the compute nodes. And to have it two-way, we need it on both the compute nodes and the headnode.
Put this in post:-
<pre>
[root@headnode ssh]# yum -y install rsh-server
[root@headnode ssh]# perl -pi -e 's/= yes/= no/' /etc/xinetd.d/rlogin
[root@headnode ssh]# cat /etc/xinetd.d/rlogin
# default: on
# description: rlogind is the server for the rlogin(1) program. The server \
#       provides a remote login facility with authentication based on \
#       privileged port numbers from trusted hosts.
service login
{
        socket_type     = stream
        wait            = no
        user            = root
        log_on_success  += USERID
        log_on_failure  += USERID
        server          = /usr/sbin/in.rlogind
        disable         = no
}
[root@headnode ssh]# perl -pi -e 's/= yes/= no/' /etc/xinetd.d/rexec
[root@headnode ssh]# cat /etc/xinetd.d/rexec
# default: off
# description: Rexecd is the server for the rexec(3) routine. The server \
#       provides remote execution facilities with authentication based \
#       on user names and passwords.
service exec
{
        socket_type     = stream
        wait            = no
        user            = root
        log_on_success  += USERID
        log_on_failure  += USERID
        server          = /usr/sbin/in.rexecd
        disable         = no
}
[root@headnode ssh]# perl -pi -e 's/= yes/= no/' /etc/xinetd.d/rsh
[root@headnode ssh]# cat /etc/xinetd.d/rsh
# default: on
# description: The rshd server is the server for the rcmd(3) routine and, \
#       consequently, for the rsh(1) program. The server provides \
#       remote execution facilities with authentication based on \
#       privileged port numbers from trusted hosts.
service shell
{
        socket_type     = stream
        wait            = no
        user            = root
        log_on_success  += USERID
        log_on_failure  += USERID
        server          = /usr/sbin/in.rshd
        disable         = no
}
[root@headnode ssh]# service xinetd restart
</pre>
Same on the node:-
<pre>
[root@node1 ssh]# perl -pi -e 's/= yes/= no/' /etc/xinetd.d/rlogin
[root@node1 ssh]# perl -pi -e 's/= yes/= no/' /etc/xinetd.d/rexec
[root@node1 ssh]# perl -pi -e 's/= yes/= no/' /etc/xinetd.d/rsh
[root@node1 ssh]# service xinetd restart
Stopping xinetd:            [FAILED]
Starting xinetd:            [  OK  ]
[root@node1 ssh]# chkconfig --level 35 xinetd on
[root@node1 ssh]#
</pre>
setup hosts.equiv files for r* commands.
Server:-
(Incomplete. To do!)
= YUM repositories =
What about a YUM repository for the nodes? Put the following in the %post:-
<pre>
[root@node1 ssh]# cat > /etc/yum.repos.d/CentOS-Base.repo << 'EOF'
[base]
name=CentOS-$releasever - Base
baseurl=http://192.168.0.1/centos/
gpgcheck=1
gpgkey=http://192.168.0.1/centos/RPM-GPG-KEY-CentOS-5
EOF
</pre>

(The EOF is quoted so that $releasever is written literally into the file instead of being expanded by the shell.)

= MPI =

OK. Now, let's set up MPI on this cluster. We need central storage for MPI programs and user home directories. On the server:-

<pre>
mkdir /cluster
mkdir /cluster/mpiuser

vi /etc/exports
/cluster *(rw,no_root_squash,sync)

service nfs restart
chkconfig --level 35 nfs on
</pre>
We need to mount this directory on all cluster nodes.
<pre>
[root@node1 ~]# mkdir /cluster
[root@node1 ~]# mount -t nfs headnode:/cluster /cluster

[root@node2 ~]# mkdir /cluster
[root@node2 ~]# mount -t nfs headnode:/cluster /cluster
</pre>
Put the mount request in the /etc/fstab of all the compute nodes. Also put the same in the %post of compute.ks .
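A sketch of that fstab entry (assuming the export and the mount point are both /cluster, as set up above). It is written to a local example file here for illustration; in the %post you would append it to /etc/fstab instead:

```shell
# Example fstab line for the compute nodes; written to a local file here,
# appended to /etc/fstab on a real node.
cat > fstab.example << 'EOF'
headnode:/cluster    /cluster    nfs    defaults    0 0
EOF
cat fstab.example
```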
Now we need an MPI user with the same user ID on all nodes. This can be done manually as follows, or through NIS.
<pre>
[root@headnode ~]# groupadd -g 600 mpiuser
[root@headnode ~]# useradd -u 600 -g 600 -c "MPI user" -d /cluster/mpiuser mpiuser
[root@headnode ~]# ls -l /cluster/
total 4
drwx------ 3 mpiuser mpiuser 4096 Mar  8 09:16 mpiuser
[root@headnode ~]#
</pre>
And on all compute nodes as well :-
<pre>
groupadd -g 600 mpiuser
useradd -u 600 -g 600 -c "MPI user" -d /cluster/mpiuser mpiuser
</pre>
Next we need ssh equivalence for mpiuser on all nodes. We already know that the nodes share a common home directory, mounted on each node as /cluster/mpiuser. So we just need to generate ssh keys for the user and put the public key in the authorized_keys file once, on the headnode. By doing that, we set up the ssh equivalence automatically.
<pre>
[root@headnode ~]# su - mpiuser
[mpiuser@headnode ~]$ ssh-keygen -t rsa -C '' -N '' -f /cluster/mpiuser/.ssh/id_rsa
Generating public/private rsa key pair.
Created directory '/cluster/mpiuser/.ssh'.
Your identification has been saved in /cluster/mpiuser/.ssh/id_rsa.
Your public key has been saved in /cluster/mpiuser/.ssh/id_rsa.pub.
The key fingerprint is:
1e:93:94:b8:f3:51:d7:84:31:2a:66:28:ad:23:ab:e8

[mpiuser@headnode ~]$ ssh-keygen -t dsa -C '' -N '' -f /cluster/mpiuser/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /cluster/mpiuser/.ssh/id_dsa.
Your public key has been saved in /cluster/mpiuser/.ssh/id_dsa.pub.
The key fingerprint is:
63:ef:8b:62:94:ea:88:83:c9:73:78:5b:f7:a0:0f:08
</pre>
Now lets copy the public key to the authorized_keys.
<pre>
[mpiuser@headnode ~]$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
[mpiuser@headnode ~]$ cat .ssh/id_dsa.pub >> .ssh/authorized_keys
[mpiuser@headnode ~]$ chmod 600 .ssh/*
</pre>
Try logging on to the node1 as mpiuser:-
<pre>
[mpiuser@headnode ~]$ ssh mpiuser@node1
Warning: Permanently added the RSA host key for IP address '192.168.0.11' to the list of known hosts.
[mpiuser@node1 ~]$
</pre>
Great! Alhumdulillah!
Ok. Now we need GCC on all nodes (headnode+compute).
<pre>
yum -y install gcc
</pre>
After that, we need to download MPICH2, an implementation of the MPI-2 standard. The site is:-
http://www.mcs.anl.gov/research/projects/mpich2
Download on headnode and compile it in the shared location /cluster/mpich2 .
<pre>
cd /cluster
wget http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.0.8/mpich2-1.0.8.tar.gz
tar xzf mpich2-1.0.8.tar.gz
cd mpich2-1.0.8
mkdir /cluster/mpich2
./configure --prefix=/cluster/mpich2
</pre>
On a VMware machine, the configuration part takes 2-3 minutes.
<pre>
make
</pre>
On a VMware machine, the compilation part takes 2-3 minutes.
<pre>
make install
</pre>
Alright, now we need to define certain environment variables in the .bashrc or .bash_profile of the mpiuser.
<pre>
vi /cluster/mpiuser/.bash_profile
...
PATH=$PATH:$HOME/bin:/cluster/mpich2/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cluster/mpich2/lib
export PATH LD_LIBRARY_PATH
</pre>

(The export line is needed so that LD_LIBRARY_PATH actually reaches the child processes.)
Some guides add the following step, with the explanation "we run this command in order to define the MPICH installation path to SSH". I personally think these lines are totally useless:-

<pre>
mpiu@ub0:~$ sudo echo /mirror/mpich2/bin >> /etc/environment
</pre>
Let's log in as mpiuser and see if our MPI executables are found when needed:-
<pre>
su - mpiuser
[mpiuser@headnode ~]$ which mpd
/cluster/mpich2/bin/mpd
[mpiuser@headnode ~]$ which mpiexec
/cluster/mpich2/bin/mpiexec
[mpiuser@headnode ~]$ which mpirun
/cluster/mpich2/bin/mpirun
</pre>
Setup MPD:- MPD is the MPI process-management daemon. We need to create a file named mpd.hosts in mpiuser's home directory and put in the names of our compute nodes. (I am using the headnode as a compute node as well.)
<pre>
vi /cluster/mpiuser/mpd.hosts
headnode
node1
node2
</pre>
We also need to have a secrets file for the cluster:-
<pre>
vi /cluster/mpiuser/.mpd.conf
secretword=redhat
</pre>
Tighten the permissions:-
<pre>
chmod 0600 /cluster/mpiuser/.mpd.conf
</pre>
Now run the following sequence of commands to check if things are working:-
On the headnode:-
<pre>
mpd &
sleep 2
mpdtrace
mpdallexit
</pre>
This should give you the following output. Notice the hostname returned by the mpdtrace command:-
<pre>
[mpiuser@headnode ~]$ mpd &
[1] 19235
[mpiuser@headnode ~]$ mpdtrace
headnode
[mpiuser@headnode ~]$ mpdallexit
[mpiuser@headnode ~]$
</pre>
Here is an interesting check. I intentionally shut down node2 and then checked what MPD reports:-
<pre>
[mpiuser@headnode ~]$ mpdboot -n 3 --chkuponly
checking node1
checking node2
these hosts are down; exiting
['node2']
[mpiuser@headnode ~]$
</pre>
Lets try booting MPD on all three nodes:-
<pre>
[mpiuser@headnode ~]$ mpdboot -n 3
mpdboot_headnode (handle_mpd_output 406): from mpd on node2, invalid port info: no_port
</pre>
Failed! OK. Let's boot MPD on two nodes only:-
<pre>
[mpiuser@headnode ~]$ mpdboot -n 2
</pre>
mpdtrace should return the name of hosts mpd is successfully running on:-
<pre>
[mpiuser@headnode ~]$ mpdtrace
headnode
node1
[mpiuser@headnode ~]$
[mpiuser@headnode ~]$ mpdtrace -l
headnode_52307 (192.168.0.10)
node1_38651 (192.168.0.11)
</pre>
So far so good. Let's execute a sample program provided in the examples directory of the mpich2 source code:-
<pre>
[mpiuser@headnode cluster]$ cd /cluster/mpich2-1.0.8/examples/
</pre>
There is a pre-compiled program (cpi) in this directory. The other programs need to be compiled with mpicc, NOT with plain make.
<pre>
[mpiuser@headnode examples]$ ls -l
total 968
-rw-r--r-- 1 3714  311    678 Nov  3  2007 child.c
-rwxr-xr-x 1 root root 577390 Mar  8 10:17 cpi
-rw-r--r-- 1 3714  311   1515 Nov  3  2007 cpi.c
-rw-r--r-- 1 root root   1964 Mar  8 10:17 cpi.o
...
...
</pre>
Let's run one process with mpiexec. mpiexec will automatically select a node to run it on:-
<pre>
[mpiuser@headnode examples]$ mpiexec -n 1 ./cpi
Process 0 of 1 is on headnode
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000014
[mpiuser@headnode examples]$
</pre>
Let's run two processes. mpiexec will automatically select two nodes to run them on:-
<pre>
[mpiuser@headnode examples]$ mpiexec -n 2 ./cpi
Process 0 of 2 is on headnode
Process 1 of 2 is on node1.mybeowulf.local
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.001619
[mpiuser@headnode examples]$
</pre>
Let's run four processes. mpiexec will automatically place two processes on each node, as we have only two nodes:-
<pre>
[mpiuser@headnode examples]$ mpiexec -n 4 ./cpi
Process 0 of 4 is on headnode
Process 1 of 4 is on node1.mybeowulf.local
Process 2 of 4 is on headnode
Process 3 of 4 is on node1.mybeowulf.local
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.005809
</pre>
The wall time has increased with the number of processes. You must be thinking it should have decreased, and you are right, but the hardware is not! Notice that these machines are VMware virtual machines running on a single laptop. As soon as we add processes, the same single CPU is divided among the compute nodes, suddenly reducing the compute power of each node. On real hardware-based compute nodes, this time WILL decrease, as each process will have a full CPU to itself and thus take less time.
You can run other examples as well, by compiling them:-
There is a file named icpi, the interactive version of cpi. Let's compile and run it.
<pre>
[mpiuser@headnode examples]$ ls -l
total 968
...
...
-rw-r--r-- 1 root root 1964 Mar  8 10:17 cpi.o
-rw-r--r-- 1 3714  311 4469 Nov  3  2007 cpi.vcproj
drwxr-xr-x 2 3714  311 4096 Mar  8 10:12 cxx
drwxr-xr-x 2 3714  311 4096 Oct 24 20:31 developers
drwxr-xr-x 2 3714  311 4096 Mar  8 10:12 f90
-rw-r--r-- 1 3714  311 1892 Nov  3  2007 icpi.c
...
...
</pre>
Compile:-
<pre>
[mpiuser@headnode examples]$ mpicc -o /cluster/mpiuser/icpi /cluster/mpich2-1.0.8/examples/icpi.c
</pre>
Execute:-
<pre>
cd ~
[mpiuser@headnode ~]$ mpiexec -n 2 ./icpi
Enter the number of intervals: (0 quits) 1000
pi is approximately 3.1415927369231258, Error is 0.0000000833333327
wall clock time = 0.008686
Enter the number of intervals: (0 quits) 100000000
pi is approximately 3.1415926535900001, Error is 0.0000000000002069
wall clock time = 1.812169
Enter the number of intervals: (0 quits) 0
[mpiuser@headnode ~]$
</pre>
Terminate the MPD daemon:-
<pre>
mpdallexit
</pre>
By using these simple examples, we have seen how we can setup and run MPI and MPI based programs. Alhumdulillah.
= Linpack =
Linpack needs an MPI implementation (LAM, MPICH, or OpenMPI) installed on the system, with user equivalence set up. This is what we have already done in the steps above.
We need g77, gcc and related compilers, on all nodes:-
<pre>
yum -y install gcc compat-gcc-34-g77
</pre>
Next we need to download the GotoBLAS library, available at www.tacc.utexas.edu/resources/software/software.php . It requires a trivial user registration, which you should go through.
After downloading it, extract it and compile it.
<pre>
[as root]
chown mpiuser:mpiuser /cluster -R

su - mpiuser
tar xzf GotoBLAS-1.26.tar.gz
cd ~/GotoBLAS
</pre>
Some guides may ask you to uncomment a line in Makefile.rule (around line 14 or 16), so that it reads:

<pre>
F_COMPILER = G77
</pre>

Please note that in GotoBLAS-1.26, the comments in Makefile.rule say that g77 will be used anyway if the line stays commented out. So there is no need to change anything here.
The README file tells us to run :-
<pre>
./quickbuild.32bit
</pre>
<pre>
...
...
./gensymbol linktest _ 1 > linktest.c
gcc -O2 -D_GNU_SOURCE -Wall -fPIC -DF_INTERFACE_GFORT -DMAX_CPU_NUMBER=1 -DNUM_BUFFERS=\(2*1\) -DEXPRECISION -m128bit-long-double -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DBUNDERSCORE=_ -DNEEDBUNDERSCORE -I.. -DARCH_X86 -DCORE2 -DL1_CODE_SIZE=32768 -DL1_CODE_ASSOCIATIVE=8 -DL1_CODE_LINESIZE=64 -DL1_DATA_SIZE=32768 -DL1_DATA_ASSOCIATIVE=8 -DL1_DATA_LINESIZE=64 -DL2_SIZE=4194304 -DL2_ASSOCIATIVE=8 -DL2_LINESIZE=64 -DITB_SIZE=4096 -DITB_ASSOCIATIVE=4 -DITB_ENTRIES=128 -DDTB_SIZE=4096 -DDTB_ASSOCIATIVE=4 -DDTB_ENTRIES=256 -DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_CFLUSH -DHAVE_HIT=1 -DNUM_SHAREDCACHE=1 -DNUM_CORES=1 -DCORE_CORE2 -w -o linktest linktest.c ../libgoto_core2-r1.26.so -lm -lm && echo OK.
OK.
rm -f linktest
</pre>
<pre>
Done. This library is compiled with following conditions.

  Binary  ... 32bit
  Fortran ... GFORTRAN
</pre>
Then run make:-
<pre>
[mpiuser@headnode GotoBLAS]$ make
...
...
-DCNAME=zpotri -DBUNDERSCORE=_ -DNEEDBUNDERSCORE -I../.. -DARCH_X86 -DCORE2 -DL1_CODE_SIZE=32768 -DL1_CODE_ASSOCIATIVE=8 -DL1_CODE_LINESIZE=64 -DL1_DATA_SIZE=32768 -DL1_DATA_ASSOCIATIVE=8 -DL1_DATA_LINESIZE=64 -DL2_SIZE=4194304 -DL2_ASSOCIATIVE=8 -DL2_LINESIZE=64 -DITB_SIZE=4096 -DITB_ASSOCIATIVE=4 -DITB_ENTRIES=128 -DDTB_SIZE=4096 -DDTB_ASSOCIATIVE=4 -DDTB_ENTRIES=256 -DHAVE_CMOV -DHAVE_MMX -DHAVE_SSE -DHAVE_SSE2 -DHAVE_SSE3 -DHAVE_SSSE3 -DHAVE_CFLUSH -DHAVE_HIT=1 -DNUM_SHAREDCACHE=1 -DNUM_CORES=1 -DCORE_CORE2 -DCOMPLEX -DDOUBLE zpotri.c -o zpotri.o
ar -ru ../../libgoto_core2-r1.26.a spotri.o dpotri.o cpotri.o zpotri.o
make[2]: Leaving directory `/cluster/GotoBLAS/lapack/potri'
make[1]: Leaving directory `/cluster/GotoBLAS/lapack'
</pre>
Next, download LinPack (also known as HPL or xHPL), from www.netlib.org/benchmark/hpl . There are now two versions available on this site:-
* hpl-2.0.tar.gz (updated September 10, 2008)
* hpl.tgz (updated January 20, 2004)
I downloaded both of them. First I will check hpl.tgz .
<pre>
cd /cluster
tar xzf hpl.tgz
cd /cluster/hpl

[mpiuser@headnode hpl]$ cp setup/Make.Linux_PII_FBLAS_gm .
</pre>
Now, I need some information first.
What is my GCC version? 4.1.2
<pre>
[mpiuser@node1 ~]$ gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=i386-redhat-linux
Thread model: posix
gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)
</pre>
Do I have my GCC related files on the OS? yes:-
<pre>
[mpiuser@node1 ~]$ ls /usr/lib/gcc/i386-redhat-linux/4.1.2/
crtbegin.o   crtend.o       include      libgcc_s.so  libgomp.so
crtbeginS.o  crtendS.o      libgcc.a     libgcov.a    libgomp.spec
crtbeginT.o  crtfastmath.o  libgcc_eh.a  libgomp.a    SYSCALLS.c.X
[mpiuser@node1 ~]$
</pre>
Time to edit this file:-
<pre>
[mpiuser@headnode hpl]$ vi Make.Linux_PII_FBLAS_gm
...
...
TOPdir       = $(HOME)/hpl
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)

HPLlib       = $(LIBdir)/libhpl.a
...
...
MPdir        =
MPinc        =
MPlib        =
...
...
LAdir        = $(HOME)/GotoBLAS
LAinc        =
# LAlib      = $(LAdir)/libf77blas.a $(LAdir)/libatlas.a
LAlib        = $(LAdir)/libgoto.a -lm -L/usr/lib/gcc/i386-redhat-linux/4.1.2
...
...
CC           = mpicc
CCNOOPT      = $(HPL_DEFS)
# CCFLAGS    = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
CCFLAGS      = $(HPL_DEFS) -O3
...
LINKER       = mpicc
LINKFLAGS    = $(CCFLAGS)

ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
</pre>
Now build this:-
<pre>
[mpiuser@headnode hpl]$ make arch=Linux_PII_FBLAS_gm
</pre>
<pre>
...
...
make[2]: Leaving directory `/cluster/mpiuser/hpl/testing/ptimer/Linux_PII_FBLAS_gm'
( cd testing/ptest/Linux_PII_FBLAS_gm; make )
make[2]: Entering directory `/cluster/mpiuser/hpl/testing/ptest/Linux_PII_FBLAS_gm'
mpicc -DAdd_ -DF77_INTEGER=int -DStringSunStyle -I/cluster/mpiuser/hpl/include -I/cluster/mpiuser/hpl/include/Linux_PII_FBLAS_gm -O3 -o /cluster/mpiuser/hpl/bin/Linux_PII_FBLAS_gm/xhpl HPL_pddriver.o HPL_pdinfo.o HPL_pdtest.o /cluster/mpiuser/hpl/lib/Linux_PII_FBLAS_gm/libhpl.a /cluster/mpiuser/GotoBLAS/libgoto.a -lm -L/usr/lib/gcc/i386-redhat-linux/4.1.2
make /cluster/mpiuser/hpl/bin/Linux_PII_FBLAS_gm/HPL.dat
make[3]: Entering directory `/cluster/mpiuser/hpl/testing/ptest/Linux_PII_FBLAS_gm'
( cp ../HPL.dat /cluster/mpiuser/hpl/bin/Linux_PII_FBLAS_gm )
make[3]: Leaving directory `/cluster/mpiuser/hpl/testing/ptest/Linux_PII_FBLAS_gm'
touch dexe.grd
make[2]: Leaving directory `/cluster/mpiuser/hpl/testing/ptest/Linux_PII_FBLAS_gm'
make[1]: Leaving directory `/cluster/mpiuser/hpl'
[mpiuser@headnode hpl]$
</pre>
Now you should have HPL installed. We can check:-
<pre>
[mpiuser@headnode hpl]$ ls /cluster/mpiuser/hpl/bin/Linux_PII_FBLAS_gm/
HPL.dat  xhpl
[mpiuser@headnode hpl]$
</pre>
Alhumdulillah!
Now, let's run LinPack:-
<pre>
[mpiuser@headnode hpl]$ cd /cluster/mpiuser/hpl/bin/Linux_PII_FBLAS_gm/
[mpiuser@headnode Linux_PII_FBLAS_gm]$ cp HPL.dat HPL.dat.original
</pre>
Here is the file, with its various default values:-
<pre>
[mpiuser@headnode Linux_PII_FBLAS_gm]$ cat HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
4            # of problems sizes (N)
29 30 34 35  Ns
4            # of NBs
1 2 3 4      NBs
0            PMAP process mapping (0=Row-,1=Column-major)
3            # of process grids (P x Q)
2 1 4        Ps
2 4 1        Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
[mpiuser@headnode Linux_PII_FBLAS_gm]$
</pre>
Now edit the file HPL.dat and edit a few values:-
Remember, I have set up VMware machines, each with 1 x 2.2 GHz processor and 256 MB of memory. I will run the Linpack test on a single machine first.
The values to change in the file HPL.dat are changed based on following rules:-
First are P and Q. The rule is P x Q = total number of cores in the system, with Q >= P. So if you have 4 cores, then 2 x 2 = 4. I am benchmarking on two single-core systems, which is why I have 2 processes in total. If you were benchmarking a single node with two processor cores, you could make P = 1 and Q = 2.
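The P x Q rule can be sketched as a small helper (my own sketch, not part of HPL) that lists the candidate grids for a given total core count, keeping Q >= P:

```shell
# List all (P,Q) factorizations of CORES with Q >= P, writing them to a
# file and printing them. For CORES=4 this yields P=1 Q=4 and P=2 Q=2.
CORES=4
P=1
: > pq.txt
while [ "$P" -le "$CORES" ]; do
    if [ $((CORES % P)) -eq 0 ]; then
        Q=$((CORES / P))
        [ "$Q" -ge "$P" ] && echo "P=$P Q=$Q" >> pq.txt
    fi
    P=$((P + 1))
done
cat pq.txt
```

HPL's own tuning notes suggest preferring the "squarest" grid (here 2 x 2) on most networks.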
The second value you will need to change is the value of N. This is something you may experiment, before deciding on a final value. We can use the following formula as a good starting point:
<pre>
sqrt( 0.1 * (available memory in bytes, from free -b) * (number of nodes) )
</pre>
ClusterVision tells us the formula as:-
<pre>
( sqrt( [GB/node] * 1024 * 1024 * 128 ) ) * 0.85
</pre>
free -b is :-
<pre>
[mpiuser@headnode Linux_PII_FBLAS_gm]$ free -b
             total       used       free     shared    buffers     cached
Mem:     261730304  208621568   53108736          0   29437952  122265600
-/+ buffers/cache:   56918016  204812288
Swap:    271425536      61440  271364096
[mpiuser@headnode Linux_PII_FBLAS_gm]$
</pre>
value of free -b = 53108736
So let's start our basic calculator:-
<pre>
[mpiuser@node1 Linux_PII_FBLAS_gm]$ bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
sqrt (.1 * (53108736)*2)
3259.1
quit
</pre>
So we have 3259.1 as a result. But N should be a multiple of the value of NB in HPL.dat. Since NB is 4, as seen in the file below, a suitable value for N would be 3256.
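The whole calculation can also be scripted, which is handy when node memory varies. The numbers below are taken from the text above (53108736 free bytes, 2 nodes, NB = 4); awk stands in for bc so the rounding can be done in shell arithmetic:

```shell
# Compute N = sqrt(0.1 * free_bytes * nodes), rounded down to a multiple of NB.
FREE=53108736
NODES=2
NB=4
N=$(awk -v f="$FREE" -v n="$NODES" 'BEGIN { printf "%d", sqrt(0.1 * f * n) }')
N=$(( (N / NB) * NB ))    # round down to a multiple of NB
echo "$N"                 # 3256, matching the manual bc result above
```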
Let's boot two nodes:-
<pre>
cd ~
[mpiuser@headnode ~]$ mpdboot -n 2
[mpiuser@headnode ~]$

[mpiuser@node1 ~]$ cd hpl/bin/Linux_PII_FBLAS_gm/
</pre>
Try running the program on one node only, with the default HPL.dat:-
<pre>
[mpiuser@node1 Linux_PII_FBLAS_gm]$ mpiexec -n 1 ./xhpl
HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
>>> Need at least 4 processes for these tests <<<

HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<

[mpiuser@node1 Linux_PII_FBLAS_gm]$
</pre>
Next, I changed the following lines in HPL.dat and ran it on two nodes:-
<pre>
3256         # of problems sizes (N)
4            # of NBs
1            # of problems sizes (N)
1            # of process grids (P x Q)
1            Ps
2            Qs
</pre>
<pre>
[mpiuser@headnode Linux_PII_FBLAS_gm]$ mpiexec -n 2 ./xhpl
HPL ERROR from process # 0, on line 331 of function HPL_pdinfo:
>>> Number of values of N is less than 1 or greater than 20 <<<

HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<

[mpiuser@headnode Linux_PII_FBLAS_gm]$
</pre>
Next, I changed the following lines in HPL.dat and ran it on two nodes:-

<pre>
4            # of problems sizes (N)
4            # of NBs
1            # of problems sizes (N)
1            # of process grids (P x Q)
1            Ps
2            Qs
</pre>
<pre>
[mpiuser@headnode Linux_PII_FBLAS_gm]$ mpiexec -n 2 ./xhpl
================================================================
HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
...
...
================================================================
T/V                N    NB     P     Q               Time          Gflops
WR00R2C4          35     4     1     2               0.01       2.765e-03
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0469732 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0515020 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0180039 ...... PASSED
================================================================
T/V                N    NB     P     Q               Time          Gflops
WR00R2R2          35     4     1     2               0.01       2.544e-03
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0455498 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0499414 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0174583 ...... PASSED
================================================================
T/V                N    NB     P     Q               Time          Gflops
WR00R2R4          35     4     1     2               0.00       1.265e-02
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0370092 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0405774 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0141849 ...... PASSED
================================================================

Finished 288 tests with the following results:
288 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.

End of Tests.
================================================================
[mpiuser@headnode Linux_PII_FBLAS_gm]$
</pre>
So from the above output, the best result I get is 1.265e-02 Gflops, which means 0.01265 GFlops, i.e. 12.65 MegaFlops.
The machine we have here has 1 processor, with 1 core, and each core can do 4 floating-point operations per clock cycle. The clock runs at 2.2 GHz. Multiplying this out, we get the machine's "Rpeak", or theoretical maximum performance:

1 processor * 1 core * 4 FLOPs/clock cycle * 2.2 GHz = 8.8 GFlops
This is really confusing.

* Q-1: How do I find the theoretical maximum performance value for a processor?
* Q-2: How do I know how many FLOPs my processor can do in one clock cycle? Is it mentioned in the technical specifications of a processor?
* Q-3: How do I correctly run Linpack?
Let's run the test again with a new HPL.dat file:-

<pre>
[mpiuser@headnode Linux_PII_FBLAS_gm]$ cat HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
3256         Ns
1            # of NBs
100          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
[mpiuser@headnode Linux_PII_FBLAS_gm]$
</pre>
<pre>
[mpiuser@headnode Linux_PII_FBLAS_gm]$ mpiexec -n 2 ./xhpl > performance.txt
[mpiuser@headnode Linux_PII_FBLAS_gm]$ grep ^WR performance.txt
WR00L2L2        3256   100     1     2        5.06      4.552e+00
WR00L2L4        3256   100     1     2        4.14      5.563e+00
WR00L2C2        3256   100     1     2        4.88      4.715e+00
WR00L2C4        3256   100     1     2        4.65      4.950e+00
WR00L2R2        3256   100     1     2        4.94      4.661e+00
WR00L2R4        3256   100     1     2        4.61      4.994e+00
WR00C2L2        3256   100     1     2        5.60      4.114e+00
WR00C2L4        3256   100     1     2        3.80      6.056e+00   <<<--- Highest!
WR00C2C2        3256   100     1     2        5.11      4.510e+00
WR00C2C4        3256   100     1     2        4.39      5.251e+00
WR00C2R2        3256   100     1     2        4.92      4.685e+00
WR00C2R4        3256   100     1     2        4.43      5.196e+00
WR00R2L2        3256   100     1     2        4.65      4.951e+00
WR00R2L4        3256   100     1     2        4.67      4.931e+00
WR00R2C2        3256   100     1     2        4.19      5.500e+00
WR00R2C4        3256   100     1     2        4.58      5.030e+00
WR00R2R2        3256   100     1     2        5.59      4.119e+00
WR00R2R4        3256   100     1     2        4.36      5.283e+00
[mpiuser@headnode Linux_PII_FBLAS_gm]$
</pre>
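Instead of eyeballing the table for the highest Gflops figure, sort can pick it out. A sketch, with two sample rows standing in for the real performance.txt (point grep at performance.txt to use the real data):

```shell
#!/bin/sh
# Sketch: pick the run with the highest Gflops figure automatically.
# Two sample rows from the table above stand in for the real output file.
printf '%s\n' \
  'WR00L2L2        3256   100     1     2        5.06      4.552e+00' \
  'WR00C2L4        3256   100     1     2        3.80      6.056e+00' > sample.txt

# Field 7 is the Gflops column; 'sort -g' understands the 6.056e+00 notation
BEST=$(grep '^WR' sample.txt | sort -g -k7 | tail -n 1)
echo "$BEST"
```

With the sample rows above, this prints the WR00C2L4 line, the same one marked "Highest!" by hand.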
As you can see above, one of the lines states 6.056 GFlops! Alhumdulillah.
Remember, this is the output from a two-node cluster.

Now let's see how much we should get, in theory, from one node:-
1 processor * 1 core * 4 FLOPs/clock cycle * 2.2 GHz = 8.8 GFlops

4 FLOPs per clock cycle is a fixed value, and holds true for most modern processors. If I ignore the communication overhead between the two nodes, I should ideally be getting 8.8 GFlops x 2 = 17.6 GFlops in total. Whereas I am getting less than half of that (6 GFlops) on my test cluster. There is a reason for it. My nodes are VMware machines; as soon as they get a job, their CPU is shared in half. You also need to keep in mind the overhead/CPU usage of the host machine itself. So on real machines, I should get around 6 GFlops x 2 = 12 GFlops.
Efficiency of a cluster is simply: (Number of GFlops achieved / Number of theoretical GFlops) * 100.

My cluster's efficiency looks like: (6 / 17.6) * 100 = 34 %
Later, I will show you the results from a real cluster, and the efficiency level. InshaAllah.