This guide describes the initial iteration of the xCAT support for booting a node over an HFI network with AIX 71B.
Install xCAT and DB2 on the AIX management node.
rpm -Uvh xCAT-IBMhpc*.rpm
Download HFI packages
Booting over HFI requires several separate packages: the NIM server, the HFI device driver, and the xCAT scripts. Contact IBM to get the builds. This example assumes that all the packages have been downloaded and extracted to the /hfi directory on the management node.
Update the /etc/hosts file
The examples in this guide assume the following IP addresses and hostnames:
10.0.0.208 c250mgrs03-pvt //management node
10.7.2.5 c250f07c02ap05.ppd.pok.ibm.com c250f07c02ap05 //host name and IP of the ethernet network interface on service node.
20.7.2.5 c250f07c02ap05-hf0.ppd.pok.ibm.com c250f07c02ap05-hf0 //service node
21.7.2.5 c250f07c02ap05-hf1.ppd.pok.ibm.com c250f07c02ap05-hf1 //service node
22.7.2.5 c250f07c02ap05-hf2.ppd.pok.ibm.com c250f07c02ap05-hf2 //service node
23.7.2.5 c250f07c02ap05-hf3.ppd.pok.ibm.com c250f07c02ap05-hf3 //service node
20.7.2.9 c250f07c02ap09-hf0.ppd.pok.ibm.com c250f07c02ap09-hf0 //compute node
21.7.2.9 c250f07c02ap09-hf1.ppd.pok.ibm.com c250f07c02ap09-hf1 //compute node
22.7.2.9 c250f07c02ap09-hf2.ppd.pok.ibm.com c250f07c02ap09-hf2 //compute node
23.7.2.9 c250f07c02ap09-hf3.ppd.pok.ibm.com c250f07c02ap09-hf3 //compute node
Make sure /etc/hosts on the management node contains the hostnames of both the Ethernet interface and the HFI interfaces of each service node.
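The check described above can be sketched as a small shell function. This is only an illustration: the function name check_hosts is hypothetical and is not part of xCAT. It reports every hostname that has no entry in the given hosts file.

```shell
# check_hosts FILE NAME...
# Prints each NAME that has no entry in FILE and fails if any is missing.
# (check_hosts is a hypothetical helper, not an xCAT command.)
check_hosts() (
    file=$1
    shift
    missing=0
    for name in "$@"; do
        # Drop "#" comments, then compare the name with every hostname
        # column (field 2 onward) of each line.
        if ! awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit (found ? 0 : 1) }' "$file"; then
            echo "missing: $name"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)
```

For the addresses used in this guide, a call might look like: check_hosts /etc/hosts c250f07c02ap05 c250f07c02ap05-hf0 c250f07c02ap05-hf1 c250f07c02ap05-hf2 c250f07c02ap05-hf3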
Define the service node and compute node
If no HMC is configured to manage the P7 IH hardware, you have to define the service nodes and compute nodes manually. If the hardware is managed by an HMC, you can use the xCAT command "rscan" to generate the compute node definitions. See the rscan man page for more details.
This is an example of service node definition in a mkdef stanza file:
c250f07c02ap05:
objtype=node
arch=ppc64
cons=fsp
groups=all,service //The "service" group indicates that this is a service node.
hcp=f07c02fsp1_a //Definition of the FSP node that manages this node.
id=5
installnic=en0
ip=10.7.2.5
mgt=fsp
monserver=10.0.0.208
nfsserver=10.0.0.208
nodetype=lpar,osi
os=AIX
parent=f07c02fsp1_a //Set to the FSP that manages this node.
postbootscripts=servicenode
pprofile=c250f07c02ap05
primarynic=en0
provmethod=1040A_SN
setupconserver=0
setupdhcp=1
setupftp=1
setupnfs=1
setuptftp=1
tftpserver=10.0.0.208
xcatmaster=10.0.0.208
This is an example of the compute node's definition:
c250f07c02ap09-hf0:
objtype=node
arch=ppc64
cons=fsp
currchain=boot
currstate=boot
groups=lpar,all
hcp=Server-9125-F2C-SNP7IH019-A
id=9
ip=10.4.32.224
mgt=fsp
nodetype=lpar,osi
os=AIX
parent=Server-9458-100-SNBPCF007-A
pprofile=xcatnode9
servicenode=c250f07c02ap05
xcatmaster=c250f07c02ap05-hf0
You can put the definition into one stanza file and import it to xCAT with the mkdef xCAT command.
cat /percs/hfi/c250f07c02ap09-hf0.stanza | mkdef -z
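The example stanzas in this guide carry trailing "//" annotations. Assuming those annotations are explanatory only and should not end up inside attribute values (an assumption; check how your xCAT level parses stanza files), a small filter can remove them before the stanza is piped to mkdef -z. The function name strip_annotations is hypothetical.

```shell
# strip_annotations: read a stanza on stdin, remove any trailing
# " //..." annotation and trailing blanks, and write the result to stdout.
# (strip_annotations is a hypothetical helper, not an xCAT command.)
strip_annotations() {
    sed -e 's,[[:space:]]*//.*$,,' -e 's/[[:space:]]*$//'
}

# Possible use with the import command shown above:
#   strip_annotations < /percs/hfi/c250f07c02ap09-hf0.stanza | mkdef -z
```

The filter treats everything after "//" as an annotation, so it is not suitable for a stanza whose attribute values legitimately contain "//", such as URLs.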
mknimimage -s /percs/aiximages/aix/ 1040A_SN
where the directory passed with -s contains the AIX image.
Note: If a resource already exists, you can pass its type and name as an option to mknimimage so that mknimimage does not create another resource of that type. For example, if you have copied an lpp_source named "1040A_SN_lpp_source" to /install/nim/lpp_source and defined the NIM lpp_source "1040A_SN_lpp_source", you can reuse it instead of creating a new lpp_source:
mknimimage -s /install/aiximages/1040a/aix/ 1040A_SN lpp_source=1040A_SN_lpp_source spot=1040A_SN -f
2. Add required service node software
The following steps add the HFI device driver, replace NIM with the new version, and replace the bootpd binary for HFI support. Note that this software only provides HFI support; other software must still be installed on the service node. See "Add required service node software" in the service node setup document for the additional packages: https://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_Up_an_AIX_Hierarchical_Cluster#Add_required_service_node_software
3. Update the HFI device driver to lpp source
inutoc /hfi/dd/
nim -o update -a packages=all -a source=/hfi/dd/ 1040A_SN_lpp_source
4. Define HFI device driver installp bundle
nim -o define -t installp_bundle -a server=master -a location=/hfi/dd/xCATaixHFIdd.bnd xCATaixHFIdd
5. Assign the HFI device driver installp bundle to the service node image so that the driver is installed during service node installation.
chdef -t osimage -o 1040A_SN installp_bundle=xCATaixHFIdd,xCATaixSN71
where the xCATaixSN71 installp bundle should already have been defined when you installed the required service node software.
6. Configure HFI interfaces with postscript
cp /hfi/scripts/confighfi /install/postscripts/
cp /hfi/scripts/confignim /install/postscripts/
chdef -t osimage -o 1040A_SN synclists=/hfi/scripts/synclist
chdef c250f07c02ap05 postscripts=confighfi,confignim
7. If you are using an existing NIM resource that includes HPC software, two steps are needed to install and configure the HPC software automatically:
1. Create installp bundles that specify which HPC software to install:
mkdir -p /install/nim/installp_bundle
cp /opt/xcat/share/xcat/IBMhpc/IBMhpc_base.bnd /install/nim/installp_bundle
nim -o define -t installp_bundle -a server=master -a location=/install/nim/installp_bundle/IBMhpc_base.bnd IBMhpc_base
cp /opt/xcat/share/xcat/IBMhpc/IBMhpc_all.bnd /install/nim/installp_bundle
nim -o define -t installp_bundle -a server=master -a location=/install/nim/installp_bundle/IBMhpc_all.bnd IBMhpc_all
chdef -t osimage -o 1040A_SN installp_bundle="xCATaixHFIdd,xCATaixSN71,IBMhpc_base,IBMhpc_all"
2. Assign the postscripts to the service node to configure the HPC software automatically.
cp -p /opt/xcat/share/xcat/IBMhpc/IBMhpc.postbootscript /install/postscripts
cp -p /opt/xcat/share/xcat/IBMhpc/IBMhpc.postscript /install/postscripts
cp -p /opt/xcat/share/xcat/IBMhpc/gpfs/gpfs_updates /install/postscripts
cp -p /opt/xcat/share/xcat/IBMhpc/compilers/compilers_license /install/postscripts
cp -p /opt/xcat/share/xcat/IBMhpc/pe/pe_install /install/postscripts
cp -p /opt/xcat/share/xcat/IBMhpc/essl/essl_install /install/postscripts
cp -p /opt/xcat/share/xcat/IBMhpc/loadl/loadl_install /install/postscripts
chdef c250f07c02ap05 postscripts=confighfi,confignim,IBMhpc.postbootscript
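The cp commands above can also be written as one loop that reports any file missing from the xCAT share directory instead of failing partway through. This is a minimal sketch; the function name copy_postscripts is hypothetical.

```shell
# copy_postscripts SRC_ROOT DEST FILE...
# Copies each SRC_ROOT/FILE into DEST, preserving attributes (cp -p).
# Reports missing files on stderr and fails if any copy did not happen.
# (copy_postscripts is a hypothetical helper, not an xCAT command.)
copy_postscripts() (
    src=$1
    dest=$2
    shift 2
    rc=0
    for f in "$@"; do
        if [ -f "$src/$f" ]; then
            cp -p "$src/$f" "$dest/" || rc=1
        else
            echo "not found: $src/$f" >&2
            rc=1
        fi
    done
    [ "$rc" -eq 0 ]
)

# Possible call with the same files as the cp commands above:
#   copy_postscripts /opt/xcat/share/xcat/IBMhpc /install/postscripts \
#       IBMhpc.postbootscript IBMhpc.postscript gpfs/gpfs_updates \
#       compilers/compilers_license pe/pe_install essl/essl_install \
#       loadl/loadl_install
```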
For more details on the HPC integration steps, see the following page: http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_up_all_IBM_HPC_products_in_a_Stateful_Cluster
8. Follow the service node setup document, starting from section 3.8 "Define xCAT networks", to install the service node.
http://xcat.svn.sourceforge.net/viewvc/xcat/xcat-core/trunk/xCAT-client/share/doc/xCAT2onAIXServiceNodes.pdf
9. Check that the HFI interfaces are up after the service node is set up.
xdsh c250f07c02ap05 ifconfig hf0
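The command above checks only hf0. Assuming the service node has four HFI interfaces named hf0 through hf3 (inferred from the -hf0 to -hf3 hostnames in the example /etc/hosts, not stated explicitly in this guide), the same check can be generated for each interface. The function name hfi_check_cmds is hypothetical; it only prints the xdsh commands so that they can be reviewed before they are run.

```shell
# hfi_check_cmds NODE [COUNT]
# Prints one "xdsh NODE ifconfig hfN" command per HFI interface.
# COUNT defaults to 4 (hf0..hf3), which is an assumption.
# (hfi_check_cmds is a hypothetical helper, not an xCAT command.)
hfi_check_cmds() (
    node=$1
    count=${2:-4}
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "xdsh $node ifconfig hf$i"
        i=$((i + 1))
    done
)
```

To run the generated commands on the management node, pipe the output to a shell, for example: hfi_check_cmds c250f07c02ap05 | sh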
mknimimage -V -f -r -t diskless -s /percs/aiximages/aix 1040A_CN
where the directory passed with -s contains the AIX 710 GOLD source images.
As with diskfull image creation, if a resource already exists, you can pass its type and name as an option to mknimimage so that mknimimage does not create another resource of that type. For example, if you have copied an lpp_source named 1040A_CN_lpp_source to /install/nim/lpp_source, you can reuse it instead of creating a new lpp_source:
mknimimage -V -f -r -t diskless -s /install/AIX71GOLD/ 1040A_CN lpp_source=1040A_CN_lpp_source spot=1040A_CN
Update the spot
Note: Skip this step if you are using an existing image that already includes the HFP software, since all the packages have already been updated in that image.
See section 4.2.1 "Update options" and section 4.2.2 "Adding required software" in the xCAT document "Using xCAT Service Nodes with AIX" to add the necessary software to the spot: http://xcat.svn.sourceforge.net/viewvc/xcat/xcat-core/trunk/xCAT-client/share/doc/xCAT2onAIXServiceNodes.pdf
Create an xCAT HFI network definition
Run a command similar to the following:
mkdef -t network -o hfinet net=20.0.0.0 mask=255.0.0.0 gateway=20.7.2.5
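With mask 255.0.0.0, the hfinet definition above covers the 20.x.x.x addresses, which are the hf0 interfaces in the example /etc/hosts. It does not cover the 21.x, 22.x, and 23.x addresses used by hf1 through hf3. This guide does not say whether those interfaces need their own network definitions. The following sketch shows how to check which addresses fall inside a given network. The function names are hypothetical, and the arithmetic assumes a shell with at least 64-bit integers.

```shell
# ip_to_int A.B.C.D: print the dotted-quad address as an integer.
# (ip_to_int and in_network are hypothetical helpers, not xCAT commands.)
ip_to_int() (
    IFS=.
    set -- $1    # split the address into its four octets
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# in_network IP NET MASK: succeed when IP lies inside NET/MASK.
in_network() (
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( ip & mask )) -eq $(( net & mask )) ]
)
```

For example, in_network 20.7.2.9 20.0.0.0 255.0.0.0 succeeds, while in_network 21.7.2.9 20.0.0.0 255.0.0.0 fails.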
Install HFI device driver into spot
(Skip this step if you are using an existing image that already includes the HFP software, since all the packages have already been updated.)
Install the HFI device driver into the spot on the management node.
inutoc /hfi/dd/
nim -o update -a packages=all -a source=/hfi/dd 1040A_CN_lpp_source
where /hfi/dd contains the HFI device driver installp packages.
chdef -t osimage -o 1040A_CN installp_bundle="xCATaixHFIdd,xCATaixCN71"
where xCATaixCN71 should already have been defined when the additional software was added to the spot.
mknimimage -u 1040A_CN
Synchronize /etc/hosts to the SPOT so that all the HFI interfaces can be brought up on the compute nodes.
chdef -t osimage -o 1040A_CN synclists=/hfi/scripts/synclist
mknimimage -u 1040A_CN
(Optional) If you are using an existing NIM resource that includes HPC software, assign the postscripts to the compute nodes so that the HPC software is configured automatically.
chdef c250f07c02ap09-hf0 postscripts=IBMhpc.postbootscript
Initialize console for the compute node.
makeconservercf
Add the confighfi postscript to the compute node to configure the HFI interfaces automatically.
cp /hfi/scripts/confighfi /install/postscripts/
chdef c250f07c02ap09-hf0 postscripts=confighfi,IBMhpc.postbootscript
Note: Remove IBMhpc.postbootscript if you are not using a diskless image with HPC software.
Get the MAC address of the compute node
getmacs c250f07c02ap09-hf0 -D --hfi
Initialize the AIX/NIM diskless nodes
mkdsklsnode -i 1040A_CN c250f07c02ap09-hf0 --hfi -f -V
Open remote console
Open another window, log in to the management node (ih1901), and run the following command to watch the installation from the console:
rcons c250f07c02ap09-hf0
Boot the compute node
rnetboot c250f07c02ap09-hf0 --hfi
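The getmacs, mkdsklsnode, and rnetboot steps above can be collected into one function with a dry-run switch, so that the sequence can be reviewed before anything is executed. This is a sketch only: the names run, boot_hfi_node, and DRY_RUN are hypothetical, and the xCAT commands and flags are copied unchanged from the steps in this guide.

```shell
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
# (run, boot_hfi_node and DRY_RUN are hypothetical names, not part of xCAT.)
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# boot_hfi_node IMAGE NODE
# Collects the MAC address, initializes the diskless node, and network-boots
# it over HFI. Stops at the first command that fails.
boot_hfi_node() (
    image=$1
    node=$2
    run getmacs "$node" -D --hfi &&
    run mkdsklsnode -i "$image" "$node" --hfi -f -V &&
    run rnetboot "$node" --hfi
)
```

With DRY_RUN=1 set, boot_hfi_node 1040A_CN c250f07c02ap09-hf0 only prints the three commands. Opening the console with rcons is left out because it runs in a separate window.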