xCAT Linux on IBM System P Clusters
11/12/2010, 10:35:04 AM
This cookbook describes how to use xCAT 2 to install Linux on IBM Power Systems machines.
The Power Systems machines have the following characteristics:
xCAT supports two types of installations for compute nodes: diskful installation (stateful) and diskless (stateless). xCAT also supports hierarchical management clusters, where one or more service nodes are used to handle the installation and management of compute nodes. Please refer to xCAT2SetupHierarchy.pdf for hierarchical usage.
Based on the two types of installation, the following installation scenarios will be described in this document:
To provide an easier understanding of the installation steps, this cookbook provides an example.
The management node:
Arch: an LPAR on a p5/p6/p7 machine
OS: Red Hat Enterprise Linux 5.2
Hostname: pmanagenode
IP: 192.168.0.1
HCP: HMC
The management Network:
Net: 192.168.0.0
NetMask: 255.255.255.0
Gateway: 192.168.0.1
Cluster-face-IF: eth1
dhcpserver: 192.168.0.1
tftpserver: 192.168.0.1
nameservers: 192.168.0.1
The compute nodes:
Arch: an LPAR on a p5/p6/p7 machine
OS: Red Hat Enterprise Linux 5.2
HCP: HMC
Hostname: pnode1 - this node will be installed statefull
IP: 192.168.0.10
Cluster-face-IF: eth0
Hostname: pnode2 - this node will be installed stateless
IP: 192.168.0.20
Cluster-face-IF: eth0
The Hardware Control Point:
Name: hmc1
IP: 192.168.0.100
Before proceeding to set up your pLinux cluster, you should first read the following for information on downloading and installing xCAT on your Management Node:
https://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_Up_a_Linux_xCAT_Mgmt_Node
Some xCAT database tables will be used in the following chapters; for more details on the xCAT database tables, see the xcatdb man page.
This section serves as an overview of the P7 IH support working with the xCAT EMS. The detailed implementation is described in the P7 IH cluster guide.
[P7_IH_Cluster_on_Linux_MN]
The tftp client in the Power5 open firmware is only compatible with tftp-server, not with atftpd, which is the tftp server normally required by xCAT 2. So we have to remove atftpd first and then install tftp-server. This is not required for Power6 or later.
rpm -qa | grep atftp
You could find one or both of the following rpms:
atftp-xcat-*
atftp-*
service tftpd stop
rpm --nodeps -e atftp-xcat atftp
[RH]:
yum install tftp-server.ppc
[SLES]:
zypper install tftp
Note: make sure the entry "disable = no" is set in /etc/xinetd.d/tftp.
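For reference, a typical /etc/xinetd.d/tftp entry looks roughly like the following (the server path and arguments may differ on your distribution; the important line is "disable = no"):
service tftp
{
        socket_type     = dgram
        protocol        = udp
        wait            = yes
        user            = root
        server          = /usr/sbin/in.tftpd
        server_args     = -s /tftpboot
        disable         = no
}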
service xinetd restart
The xCAT database table passwd contains default userids and passwords for xCAT to access cluster components. This section describes how to set the default userids and passwords for the system and hmc keys in the xCAT passwd table.
chtab key=system passwd.username=root passwd.password=cluster
chtab key=hmc passwd.username=hscroot passwd.password=abc123
Note: The username and password that xCAT uses to access the HMCs can also be specified through the mkdef or chdef command; this is useful when specific HMCs use a different username and password from the default ones. For example:
mkdef -t node -o hmc1 groups=hmc,all nodetype=hmc mgt=hmc username=hscroot password=abc1234
chdef -t node -o hmc1 username=hscroot password=abc1234
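To verify the default entries you just set, you can dump the passwd table:
tabdump passwd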
The definition of a node is stored in several tables of the xCAT database.
You can use the rscan command to discover the HCP and get the nodes that are managed by it. The discovered nodes can be stored in a stanza file. Then edit the stanza file to keep the nodes you want to create, and use the mkdef command to create the node definitions.
The following command will create an xCAT node definition for an HMC with a host name of hmc1. The groups, nodetype, mgt, username, and password attributes will be set.
mkdef -t node -o hmc1 groups=hmc,all nodetype=hmc mgt=hmc username=hscroot password=abc123
To change and add new groups:
chdef -t node -o hmc1 groups=hmc,rack1,all
To verify your data:
lsdef -l hmc1
If the xCAT Management Node is on the same service network as the HMC, you can discover the HMC and create an xCAT node definition for it automatically.
lsslp -w -s HMC
To check that the hmc name was added to the nodelist:
tabdump nodelist
The lsslp command above discovers the HMCs and writes them into the xCAT database, but we still need to set the HMCs' username and password.
chdef -t node -o <hmcname from lsslp> username=hscroot password=abc123
For more details on the hardware discovery feature in xCAT, please refer to the document:
Run the rscan command to gather the LPAR information. This command can be used to display the LPAR information in several formats and can also write the LPAR information directly to the xCAT database. In this example we will use the "-z" option to create a stanza file that contains the information gathered by rscan as well as some default values that could be used for the node definitions.
To write the stanza format output of rscan to a file called "node.stanza", run the following command. We are assuming, for our example, that the hmc name returned from lsslp was hmc1.
rscan -z hmc1 > node.stanza
This file can then be checked and modified as needed. For example you may need to add a different name for the node definition or add additional attributes and values.
Note: The stanza file will contain stanzas for things other than the LPARs. This information must also be defined in the xCAT database. The stanza file will repeat the same bpa information for multiple fsps. It is not necessary to modify the non-LPAR stanzas in any way.
The stanza file will look something like the following.
Server-9117-MMA-SN10F6F3D:
    objtype=node
    nodetype=fsp
    id=5
    model=9118-575
    serial=02013EB
    hcp=hmc1
    pprofile=
    parent=Server-9458-10099201WM_A
    groups=fsp,all
    mgt=hmc
pnode1:
    objtype=node
    nodetype=lpar,osi
    id=9
    hcp=hmc1
    pprofile=lpar9
    parent=Server-9117-MMA-SN10F6F3D
    groups=lpar,all
    mgt=hmc
    cons=hmc
pnode2:
    objtype=node
    nodetype=lpar,osi
    id=7
    hcp=hmc1
    pprofile=lpar6
    parent=Server-9117-MMA-SN10F6F3D
    groups=lpar,all
    mgt=hmc
    cons=hmc
Note: The rscan command supports a "-w" option to automatically create node definitions in the xCAT database. To do this, the LPAR name gathered by rscan is used as the node name and the command sets several default values. If you use the "-w" option, make sure the LPAR name you defined is the name you want used as your node name.
For a node which was defined correctly before, you can use the "lsdef -z [nodename] > node.stanza" command to export the definition into node.stanza, edit it as needed, and then use "cat node.stanza | chdef -z" to update the node definition.
The information gathered by the rscan command can be used to create xCAT node definitions by running the following command:
cat node.stanza | mkdef -z
Verify the data:
lsdef -t node -l all
See the section on xCAT node group support in xCAT2top for more details on how to define xCAT groups. For the example below, add the compute group to the nodes.
chdef -t node -o pnode1,pnode2 -p groups=compute
chdef -t node -o pnode1 netboot=yaboot tftpserver=192.168.0.1 nfsserver=192.168.0.1 \
monserver=192.168.0.1 xcatmaster=192.168.0.1 installnic="eth0" primarynic="eth0"
Note: Please make sure the attributes "installnic" and "primarynic" are set to the correct Ethernet interface of the compute node. Otherwise the compute node installation may hang requesting information on an incorrect interface. The "installnic" and "primarynic" attributes can also be set to a MAC address if you are not sure about the Ethernet interface name; the MAC address can be obtained through the getmacs command. They can also be set to the keyword "mac", which means that the network interface specified by the MAC address in the mac table will be used.
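For example, either of the following forms can be used (the MAC address below is only a placeholder; substitute the one reported by getmacs):
chdef -t node -o pnode1 installnic=00:14:5e:55:0d:a0 primarynic=00:14:5e:55:0d:a0
chdef -t node -o pnode1 installnic=mac primarynic=mac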
Make sure that the address used above (192.168.0.1) is the address of the Management Node as known by the node, and that site.master is set to this address or name.
To change site.master to this address:
chtab key=master site.value=192.168.0.1
chdef -t node -o pnode1 os=<os> arch=ppc64 profile=compute
For valid options:
tabdump -d nodetype
xCAT supports the running of customization scripts on the nodes when they are installed. You can see what scripts xCAT will run by default by looking at the xcatdefaults entry in the xCAT postscripts database table. The postscripts attribute of the node definition can be used to specify the comma separated list of the scripts that you want to be executed on the nodes. The order of the scripts in the list determines the order in which they will be run.
For example, if you want to have your two scripts called foo and bar run on node node01 you could add them to the postscripts table:
chdef -t node -o node01 -p postscripts=foo,bar
(The -p flag means to add these to whatever is already set.)
For more information on creating and setting up postscripts:
[Postscripts_and_Prescripts]
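As a minimal illustration (the script name and its content are only an example), a custom postscript such as foo could look like the following; place it in /install/postscripts and make it executable with chmod 755:
#!/bin/bash
# xCAT exports several variables, such as $NODE, to postscripts;
# this example simply records that the script ran on the node.
logger -t xcat "postscript foo ran on $NODE"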
To enable the NTP services on the cluster, first configure NTP on the management node and start ntpd.
service ntpd start
Next set the ntpservers attribute in the site table. Whatever time servers are listed in this attribute will be used by all the nodes that boot directly from the management node. In our example, the Management Node will be used as the ntp server.
chdef -t site ntpservers=myMN
To have xCAT automatically set up ntp on the cluster nodes you must add the setupntp script to the list of postscripts that are run on the nodes.
To do this you can either modify the postscripts attribute for each node individually or you can just modify the definition of a group that all the nodes belong to.
For example, if all your nodes belong to the group compute, then you could add setupntp to the group definition by running the following command.
chdef -p -t group -o compute postscripts=setupntp
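After the nodes are installed, you can verify that NTP is configured on them, assuming the ntp package is installed on the nodes:
xdsh compute "ntpq -p"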
A basic networks table was created for you during the xCAT install. Review that table and add additional networks based on your hardware configuration.
Create the networks that are used for cluster management:
mkdef -t network -o net1 net=192.168.0.0 mask=255.255.255.0 gateway=192.168.0.1 mgtifname=eth1 \
dhcpserver=192.168.0.1 tftpserver=192.168.0.1 nameservers=192.168.0.1
vi /etc/hosts
127.0.0.1 localhost
192.168.0.1 pmanagenode
192.168.0.10 pnode1
192.168.0.20 pnode2
192.168.0.100 hmc1
.
.
.
Add the following lines to /etc/resolv.conf:
vi /etc/resolv.conf
search cluster.net
nameserver 192.168.0.1
Setup the nameserver:
chdef -t site nameservers=192.168.0.1
Setup the external nameserver:
chdef -t site forwarders=9.112.4.1
Setup the local domain name:
chdef -t site domain=cluster.net
makedns
service named start
chkconfig --level 345 named on
If you add nodes or update the networks table at a later time, then rerun makedns:
makedns
service named restart
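To verify that name resolution works against the management node, you can query it directly (assuming the host utility is installed on the MN):
host pnode1 192.168.0.1
host 192.168.0.10 192.168.0.1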
The xCAT rcons command uses the conserver package to provide support for multiple read-only consoles on a single node and for console logging. For example, if a user has a read-write console session open on node node1, other users can also log in to that console session on node1 as read-only users. This allows sharing a console server session between multiple users for diagnostic or other collaborative purposes. The console logging function will log the console output and activity for any node with remote console attributes set to the following file, which can be replayed for debugging or other purposes:
/var/log/consoles/<management node>
Note: conserver=<management node> is the default, so it is optional in the command.
Each xCAT node with remote console attributes set should be added into the conserver configuration file to make the rcons work. The xCAT command makeconservercf will put all the nodes into conserver configuration file /etc/conserver.cf. The makeconservercf command must be run when there is any node definition changes that will affect the conserver, such as adding new nodes, removing nodes or changing the nodes' remote console settings.
To add new nodes or remove nodes for conserver support:
makeconservercf
service conserver stop
service conserver start
The rnetboot and getmacs functions depend on conserver; check that it is available:
rcons pnode1
If it works, you will get into the console interface of pnode1. If it does not work, review your rcons setup as documented in the previous steps.
To see if your setup is correct at this point, run rpower to check the node status:
rpower pnode1 stat
Before running getmacs, make sure the node is off, because the HMC has an issue where it cannot shut down Linux nodes that are in the running state.
Check the node state with:
rpower pnode1 stat
If the node is on, force the lpar to shut down with:
rpower pnode1 off
If there is only one Ethernet adapter on the node, or you have specified the installnic or primarynic attribute of the node, the following command will get the correct MAC address.
Check for the *nic definitions by running:
lsdef pnode1
To set installnic or primarynic:
chdef -t node -o pnode1 installnic=eth0 primarynic=eth0
Get mac addresses:
getmacs pnode1
However, if there is more than one Ethernet adapter on the node and you do not know which one has been configured for the installation process, or the lpar was just created and has no active profile, or the lpar is on a P5 system and there are no lhea/sea Ethernet adapters, you have to specify more parameters so that the lpar can figure out an available interface by a ping test. In that case, run this command:
getmacs pnode1 -D -S 192.168.0.1 -G 192.168.0.10
The output looks like the following:
pnode1:
Type Location Code MAC Address Full Path Name Ping Result Device Type
ent U9133.55A.10E093F-V4-C5-T1 f2:60:f0:00:40:05 /vdevice/l-lan@30000005 virtual
The MAC address will be written into the xCAT mac table. Run the following to verify:
tabdump mac
Set the dhcpinterfaces attribute in the site table so that DHCP serves requests over the management node's cluster-facing interface (eth1 in this example):
chdef -t site dhcpinterfaces='pmanagenode|eth1'
On a SLES management node, the dhcp-server rpm may not have been installed automatically. Use the following command to check whether it has been installed:
rpm -qa | grep -E "^dhcp-server"
If it is not installed, install it manually:
zypper install dhcp-server
Add the relevant networks into the DHCP configuration:
makedhcp -n
Add the defined nodes into the DHCP configuration:
makedhcp -a
Restart the dhcp service:
service dhcpd restart
Note: Please make sure that only one dhcpd server serves these compute nodes.
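One way to check which addresses and interfaces dhcpd is actually bound to is the following (assuming net-tools is installed on the management node):
netstat -ulnp | grep dhcpd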
You can use the iso file of the OS to extract the installation files. For example, if you have an iso file /iso/RHEL5.2-Server-20080430.0-ppc-DVD.iso:
copycds /iso/RHEL5.2-Server-20080430.0-ppc-DVD.iso
Note: If you encounter an issue where the iso cannot be mounted by the copycds command, make sure that SELinux is disabled.
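To check the current SELinux mode and disable it temporarily, you can run the following; to disable it permanently, set SELINUX=disabled in /etc/selinux/config and reboot:
getenforce
setenforce 0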
Before following the next steps of the installation, you need to understand the relationship between <os> and <platform>. Normally, <os> is the name of the operating system, while <platform> is the family or platform that contains many operating systems; you can think of <platform> as containing <os>.
For example, for Red Hat Enterprise Linux 6.0, rhels6 is the <os> and rh is the <platform>. For SUSE Linux Enterprise Server 11 SP1, sles11.1 is the <os> and sles is the <platform>.
Note: This naming convention is suitable for the installation of the Stateful/Stateless/Statelite Compute/Service nodes.
xCAT uses KickStart or AutoYaST installation templates and related installation scripts to complete the installation and configuration of the compute node.
You can find sample templates for common profiles in the following directory:
/opt/xcat/share/xcat/install/<platform>/
If you customize a template, you should copy it to the following directory:
/install/custom/install/<platform>
The profile, os and architecture of the node were set up in "Set the type attributes of the node" above.
To check the settings of your node's profile, os and architecture, run:
lsdef pnode1
Object name: pnode1
.
.
.
arch=ppc64
os=rhels5.5
profile=compute
For this example, the search order for the template file is as follows:
The directory /install/custom/install/<platform> will be searched first, and then /opt/xcat/share/xcat/install/<platform>.
Then, within the directory, the following order will be honored:
compute.rhels5.5.ppc64.tmpl
compute.rhels5.ppc64.tmpl
compute.rhels.ppc64.tmpl
compute.rhels5.5.tmpl
compute.rhels5.tmpl
compute.rhels.tmpl
compute.ppc64.tmpl
compute.tmpl
If you want to customize a template for a node, copy the template to the /install/custom/install/<platform>/ directory and make your modifications there (unless you rename your file). You need to copy it to the custom directory so that the next install of xCAT will not wipe out your modifications, since xCAT updates the /opt/xcat/share directory. Keep in mind the search order above to make sure your template is picked up.
Note: Sometimes the directory /opt/xcat/share/xcat/install/scripts also needs to be copied to /install/custom/install/ to make the customized profile work, because the customized profiles need to include the files in the scripts directory as prescripts and postscripts.
For example, if you need to install other packages, you need to put the <profile>.otherpkgs.pkglist file into the /install/custom/install/<platform>/ directory.
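As an illustration (assuming the compute profile), the otherpkgs pkglist file is simply a list of additional package names, one per line; the package names below are only placeholders:
vi /install/custom/install/<platform>/compute.otherpkgs.pkglist
my-extra-package1
my-extra-package2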
If you want to install a specific package like a specific .rpm onto the compute node, copy the rpm into the following directory:
/install/post/otherpkgs/<os>/<arch>
Another thing you MUST do is create repodata for this directory. You can use the "createrepo" command to create the repodata.
On RHEL5.x, the "createrepo" rpm package can be found in the install ISO; on SLES11, it can be found in SLE-11-SDK-DVD Media 1 ISO.
After "createrepo" is installed, you need to create one text file which contains the complete list of files to include in the repository. For example, the name of the text file is rpms.list in /install/post/otherpkgs/<os>/<arch> directory. Create rpms.list:
cd /install/post/otherpkgs/<os>/<arch>
ls *.rpm >rpms.list
Then, please run the following command to create the repodata for the newly-added packages:
createrepo -i rpms.list /install/post/otherpkgs/<os>/<arch>
The createrepo command with -i rpms.list option will create the repository for the rpm packages listed in the rpms.list file. It won't destroy or affect the rpm packages that are in the same directory, but have been included into another repository.
Alternatively, if you create a sub-directory to contain the rpm packages, for example a directory named other in /install/post/otherpkgs/<os>/<arch>, run the following command to create the repodata for the directory /install/post/otherpkgs/<os>/<arch>/other:
createrepo /install/post/otherpkgs/<os>/<arch>/other
Note: Please replace other with your real directory name.
To begin the installation, set the node to install and then network boot it:
nodeset pnode1 install
rnetboot pnode1
After the node installation completes successfully, the node's status will be changed to booted. Run the following command to check the node's status:
lsdef pnode1 -i status
When the node's status has changed to booted, you can also check that the ssh service on the node is working and that you can log in without a password. Note: Do not run ssh or xdsh against the node until the node installation has completed successfully. Running ssh or xdsh against the node before the installation completes may result in ssh hostkey issues.
If ssh is working but you cannot log in without a password, force an exchange of the ssh key with the compute node using xdsh:
xdsh pnode1 -K
After exchanging the ssh key, the following command should work without prompting for a password.
xdsh pnode1 date
To update the kernel, use a postinstall script (you could also use the updatenode method):
mkdir /install/postscripts/data
cp <kernel> /install/postscripts/data
Create the postscript updatekernel:
vi /install/postscripts/updatekernel
#!/bin/bash
# install the kernel rpm that was copied into /install/postscripts/data
rpm -Uvh data/kernel-*.rpm
chmod 755 /install/postscripts/updatekernel
Add the script to the postscripts table and run the install:
chdef -p -t group -o compute postscripts=updatekernel
rinstall compute
Typically, you can build your stateless compute node image on the Management Node if it has the same OS and architecture as the node. If you need an image with a different OS or architecture than the Management Node, you will need a machine that matches the OS and architecture you want for the image, and you must create the image on that node.
The default lists of rpms to add to or exclude from the diskless images are shipped in the following directory:
/opt/xcat/share/xcat/netboot/<platform>
If you want to modify the current defaults for the *.pkglist, *.exlist or *.postinstall files, copy the shipped default files to the following directory so that your modifications will not be removed on the next xCAT rpm update. xCAT will first look in the custom directory for the files before going to the share directory.
/install/custom/netboot/<platform>
If you want to exclude more packages, add them into the following exlist file:
/install/custom/netboot/<platform>/<profile>.exlist
Add the names of any additional packages that need to be installed on the stateless node to the pkglist file:
/install/custom/netboot/<platform>/<profile>.pkglist
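For example, exlist entries are paths relative to the image root, usually with trailing wildcards; the entries below are only illustrative:
./usr/share/man*
./usr/share/locale*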
There are rules (release 2.4 or later) for which *.postinstall files will be selected to be used by genimage.
If you are going to make modifications, copy the appropriate /opt/xcat/share/xcat/netboot/<platform>/*postinstall file to the
/install/custom/netboot/<platform> directory:
cp /opt/xcat/share/xcat/netboot/<platform>/*postinstall /install/custom/netboot/<platform>/.
Use these basic rules to edit the correct file in the /install/custom/netboot/<platform> directory. The rules allow you to customize your image down to the profile, os and architecture level, if needed.
You will find postinstall files of the following formats, and genimage will process the files in the order of the formats below:
<profile>.<os>.<arch>.postinstall
<profile>.<arch>.postinstall
<profile>.<os>.postinstall
<profile>.postinstall
This means that if "<profile>.<os>.<arch>.postinstall" is there, it will be used first.
Make sure you have the basic postinstall script set up in the directory to run for your genimage. The one shipped will set up fstab and rcons to work properly and is required.
You can add more postinstall processing if you want. The basic postinstall script (2.4) is named <profile>.<arch>.postinstall (e.g. compute.ppc64.postinstall). You can create one for a specific os by copying the shipped one to, for example, compute.rhels5.4.ppc64.postinstall.
Note: you can use the samples here: /opt/xcat/share/xcat/netboot/<platform>/
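For example, to create an os-specific copy of the shipped postinstall file (using rh as the <platform> here; adjust the platform, os and profile to your environment):
mkdir -p /install/custom/netboot/rh
cp /opt/xcat/share/xcat/netboot/rh/compute.ppc64.postinstall /install/custom/netboot/rh/compute.rhels5.4.ppc64.postinstall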
[RH]:
Add the following package names into the <profile>.pkglist:
bash
nfs-utils
stunnel
dhclient
kernel
openssh-server
openssh-clients
busybox-anaconda
wget
vim-minimal
ntp
You can add any other packages that you want to install on your compute node. For example, if you want to have userids with passwords you should add the following:
cracklib
libuser
passwd
[SLES]:
Add the following package names into the <profile>.pkglist:
aaa_base
bash
nfs-utils
dhcpcd
kernel
openssh
psmisc
wget
sysconfig
syslog-ng
klogd
vim
cd /opt/xcat/share/xcat/netboot/rh
./genimage -i eth0 -n ibmveth -o rhels5.2 -p compute
cd /opt/xcat/share/xcat/netboot/sles
./genimage -i eth0 -n ibmveth -o sles11 -p compute
packimage -o rhels5.2 -p compute -a ppc64
packimage -o sles11 -p compute -a ppc64
nodeset pnode2 netboot
rnetboot pnode2
After the node installation completes successfully, the node's status will be changed to booted. Run the following command to check the node's status:
lsdef pnode2 -i status
When the node's status has changed to booted, you can also check that the ssh service on the node is working and that you can log in without a password.
Note: Do not run ssh or xdsh against the node until the node installation has completed successfully. Running ssh or xdsh against the node before the installation completes may result in ssh hostkey issues.
If ssh is working but you cannot log in without a password, force an exchange of the ssh key with the compute node using xdsh:
xdsh pnode2 -K
After exchanging the ssh key, the following command should work without prompting for a password.
xdsh pnode2 date
Put your new kernel and kernel modules on the MN. If the new kernel is already installed on your MN, you can go directly to the genimage command below. More likely, this new kernel is not installed on the MN; in that case, you can copy the kernel into /boot and the modules into /lib/modules/<new kernel directory>, and genimage will pick them up from there. Assuming you have the kernel in RPM format in /tmp:
cd /tmp
rpm2cpio kernel-2.6.32.10-0.5.ppc64.rpm | cpio -idv ./boot/vmlinux-2.6.32.10-0.5-ppc64
cp ./boot/vmlinux-2.6.32.10-0.5-ppc64 /boot
rpm2cpio kernel-2.6.32.10-0.5.ppc64.rpm | cpio -idv './lib/modules/2.6.32.10-0.5-ppc64/*'
cp -r ./lib/modules/2.6.32.10-0.5-ppc64 /lib/modules
Run genimage/packimage to update the image with the new kernel:
genimage -i eth0 -n ibmveth -o sles11.1 -p compute -k 2.6.32.10-0.5-ppc64
packimage -o sles11.1 -p compute -a ppc64
Reboot the node with the new image:
nodeset pnode2 netboot
rnetboot pnode2
To show the new kernel, run:
xdsh pnode2 uname -a
If you want to remove an image, use the rmimage command to remove the Linux stateless or statelite image from the file system. It is better to use this command than to remove the file system yourself, because it also removes the appropriate links to the real file systems, which could otherwise be destroyed on your Management Node if you just use rm -rf.
You can specify the <os>, <arch> and <profile> values to the rmimage command:
rmimage -o <os> -a <arch> -p <profile>
Or, you can specify one imagename to the command:
rmimage <imagename>
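For example, to remove the stateless image built earlier in this document:
rmimage -o rhels5.2 -a ppc64 -p compute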
Please refer to the [XCAT_Linux_Statelite] documentation.
Ensure that ssh is installed on the xCAT management node. If you are using an AIX management node, make sure the value of "useSSHonAIX" is "yes" in the site table:
chtab key="useSSHonAIX" site.value=yes
The Lpar, CEC, or BPA must have been defined in the nodelist, nodehm, nodetype, vpd, and ppc tables.
Define the HMC as a node on the management node. For example,
chdef hmc01.clusters.com nodetype=hmc mgt=hmc groups=hmc username=hscroot password=abc123
Run the rspconfig command to set up and generate the ssh keys on the xCAT management node and transfer the public key to the HMC. You must also manually configure the HMC to allow remote ssh connections. For example:
rspconfig hmc01.clusters.com sshcfg=enable
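To verify that passwordless ssh to the HMC works, you can run a simple HMC command over ssh (lshmc -V prints the HMC version; this assumes hscroot is the userid set above):
ssh hscroot@hmc01.clusters.com lshmc -V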
Download the Microcode update package and associated XML file from the IBM Web site:
http://www14.software.ibm.com/webapp/set2/firmware/gjsn.
For the P5/P6 (with HMC) and P7 (without HMC) node definitions, please refer to XCAT_System_p_Hardware_Management.
rinv Server-m_tmp-SNs_tmp firm
Download the Microcode update package and associated XML file from the IBM Web site:
http://www14.software.ibm.com/webapp/set2/firmware/gjsn.
Create the /tmp/fw directory, if necessary, and copy the downloaded files to the /tmp/fw directory.
Run the rflash command with the --activate flag to specify the update mode used to perform the updates. Please see the rflash man page for more information.
rflash Server-m_tmp-SNs_tmp -p /tmp/fw --activate disruptive
NOTE: You need to check whether your update is concurrent or disruptive here! Concurrent update is only supported for P5/P6 with an HMC. Another command sample:
rflash Server-m_tmp-SNs_tmp -p /tmp/fw --activate concurrent
Notes:
1) If the noderange is the group lpar, the upgrade steps are the same as the CEC's.
2) System p5, p6 and p7 updates can take time to complete, and there is no visual indication that the command is proceeding.
For the P5/P6 (with HMC) and P7 (without HMC) node definitions, please refer to XCAT_System_p_Hardware_Management.
rinv Server-m_tmps_tmp firm
See the rinv man page for more options.
Download the Microcode update package and associated XML file from the IBM Web site:
http://www14.software.ibm.com/webapp/set2/firmware/gjsn
Create the /tmp/fw directory, if necessary, and copy the downloaded files to the /tmp/fw directory.
Run the rflash command with the --activate flag to specify the update mode to perform the updates.
rflash Server-m_tmps_tmp -p /tmp/fw --activate disruptive
NOTE: You need to check whether your update is concurrent or disruptive here! Concurrent update is only supported for P5/P6 with an HMC. Another command sample:
rflash Server-m_tmps_tmp -p /tmp/fw --activate concurrent
Refer to the environment setup in the section 'Firmware upgrade for CEC on P5/P6/P7' to make sure the firmware version is correct.
Run the rflash command with the commit flag.
rflash Server-m_tmp-SNs_tmp --commit
Notes:
(1) If the noderange is an Lpar, the commit steps are the same as the CEC's.
(2) When the --commit or --recover flag is used, the noderange cannot be a BPA. It can only be a CEC or LPAR for P5/P6, and will take effect for both managed systems and power subsystems. It can be a frame or BPA for P7, and will take effect for power subsystems only.
Refer to [Using_Linux_Driver_Update_Disk].
Kdump is a kexec-based kernel crash dumping mechanism for Linux. Currently i386, x86_64 and ppc64 ports of kdump are available, and the mainstream distributions, including Fedora, Red Hat Enterprise Linux and SUSE Linux Enterprise Server, ship the kdump rpm packages.
For RHELS6 and other Linux OSes, there are two rpm packages for kdump, which are:
kexec-tools, crash
Before creating the stateless/statelite Linux root images with kdump enabled, please add these two rpm packages into the <profile>.<os>.<arch>.pkglist file.
For Linux images, there is one attribute called dump, which is used to define the remote NFS path where the crash information is dumped to.
The value of dump follows the standard URI format; since currently only the NFS protocol is supported, its value should be set to:
nfs://<nfs_server_ip>/<kdump_path>
If you intend to use the node's Service Node or Management Node (if no service node is available) as the nfs_server, you can omit the <nfs_server_ip> field and set the value of the dump attribute like:
nfs:///<kdump_path>
which treats the node's SN/MN as the NFS server for the kdump service.
Based on <profile>, <os> and <arch>, there should be definitions for two Linux images, one for diskless and one for statelite:
<os>-<arch>-netboot-<profile>
<os>-<arch>-statelite-<profile>
For the diskless image, set the value of the dump attribute with the following command:
chdef -t osimage <os>-<arch>-netboot-<profile> dump=nfs://<nfs_server_ip>/<kdump_path>
For example, if the image name is rhels6-ppc64-netboot-compute, the NFS server used for kdump is 10.1.0.1, and the path on the NFS server is /install/kdump, you can set the value with:
chdef -t osimage rhels6-ppc64-netboot-compute dump=nfs://10.1.0.1/install/kdump
For the statelite image, set the value of the dump attribute with the following command:
chdef -t osimage <os>-<arch>-statelite-<profile> dump=nfs://<nfs_server_ip>/<kdump_path>
Note: If there are no osimage definitions called <os>-<arch>-netboot-<profile> or <os>-<arch>-statelite-<profile> in the linuximage table, update the dump attribute after the genimage command is executed.
Please make sure the NFS path (nfs://<nfs_server_ip>/<kdump_path>) specified for the dump attribute is writable. Once a kernel panic is triggered, the node will reboot into the capture kernel and a kernel dump (vmcore) will be automatically saved to the <kdump_path>/var/crash/<node_ip>-<time>/ directory on the specified NFS server (<nfs_server_ip>). You do not need to create the /var/crash/ directory under the NFS path (nfs://<nfs_server_ip>/<kdump_path>); it will be created by the kdump service when saving the crash information.
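Once a node boots with kdump enabled, you can verify the setup on a test node by forcing a crash through sysrq; the node will crash, reboot into the capture kernel, and write a vmcore to the NFS path configured above:
xdsh pnode2 "echo 1 > /proc/sys/kernel/sysrq"
xdsh pnode2 "echo c > /proc/sysrq-trigger"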
This step is for statelite only.
For the statelite image, the /boot/ directory and the /etc/kdump.conf file should be added into the litefile table.
You can use the tabedit litefile command to update the litefile table. After they are added to the table, there should be two new entries like the following:
"ALL","/etc/kdump.conf",,,
"ALL","/boot/",,,
The <profile>.exlist file is located in the /opt/xcat/share/xcat/netboot/<platform>/ directory. You need to copy it to the /install/custom/netboot/<platform> directory and then edit the <profile>.exlist file there. kdump needs to create a new initrd file in the /boot/ directory of the rootimg, so the line which contains "/boot*" should be removed from the <profile>.exlist file.
In order to enable the kdump service for the specified node/nodegroup, you need to add the "enablekdump" postscript with the following command:
chdef <noderange> -p postscripts=enablekdump
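For example, to enable kdump on the stateless node used in this document:
chdef pnode2 -p postscripts=enablekdump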
Please follow Stateless_node_installation to generate the diskless rootimg, and follow Create_Statelite_Image to generate the statelite image.
Please follow the xCAT_pLinux_Clusters and xCAT_Linux_Statelite documents to set up the diskless/statelite image and to boot the specified noderange with the diskless/statelite image.