XCAT_pLinux_Clusters

 xCAT Linux on IBM System P Clusters

11/12/2010, 10:35:04 AM

Introduction

This cookbook describes how to use xCAT 2 to install Linux on IBM Power Systems machines.

Power Systems machines have the following characteristics:

  • May have multiple LPARs (an LPAR will be the target machine to install an operating system image on, i.e. the LPAR will be the compute node).
  • The Ethernet card and SCSI disk can be virtual devices.
  • An HMC or IVM is used for the HCP (hardware control point).

xCAT supports two types of installation for compute nodes: diskful (stateful) and diskless (stateless). xCAT also supports hierarchical management clusters, where one or more service nodes handle the installation and management of compute nodes. Please refer to xCAT2SetupHierarchy.pdf for hierarchical usage.

Based on the two types of installation, the following installation scenarios will be described in this document:

  • Install a stateful compute node
  • Install a stateless compute node

To provide an easier understanding of the installation steps, this cookbook provides an example.

  • The management node:

    Arch: an LPAR on a p5/p6/p7 machine
    OS: Red Hat Enterprise Linux 5.2
    Hostname: pmanagenode
    IP: 192.168.0.1
    HCP: HMC

  • The management Network:

    Net: 192.168.0.0
    NetMask: 255.255.255.0
    Gateway: 192.168.0.1
    Cluster-face-IF: eth1
    dhcpserver: 192.168.0.1
    tftpserver: 192.168.0.1
    nameservers: 192.168.0.1

  • The compute nodes:

    Arch: an LPAR on a p5/p6/p7 machine
    OS: Red Hat Enterprise Linux 5.2
    HCP: HMC
    Hostname: pnode1 - this node will be installed stateful
    IP: 192.168.0.10
    Cluster-face-IF: eth0
    Hostname: pnode2 - this node will be installed stateless
    IP: 192.168.0.20
    Cluster-face-IF: eth0

  • The Hardware Control Point:

    Name: hmc1
    IP: 192.168.0.100

Install xCAT 2 on the Management node

Before proceeding to set up your pLinux cluster, you should first read the following for information on downloading and installing xCAT on your management node:

https://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_Up_a_Linux_xCAT_Mgmt_Node

Some xCAT database tables will be used in the following chapters; for more details on the xCAT database tables, see the xcatdb man page.

Setup for P7 IH Cluster on xCAT MN

This section is an overview of the P7 IH support working with the xCAT EMS. The detailed implementation is described in the P7 IH cluster guide.

[P7_IH_Cluster_on_Linux_MN]

Setup the management node

[Power 5] Workaround the atftpd issue

The tftp client in the Open Firmware of Power5 machines is only compatible with tftp-server, not with the atftpd package required by xCAT 2. So remove atftpd first and then install tftp-server. This is not required for Power6 or later.

Remove atftp

Check which atftp rpms are installed (you could find one or both of atftp-xcat-* and atftp-*):

rpm -qa | grep atftp

Stop the tftp service and remove the rpms:

service tftpd stop
rpm --nodeps -e atftp-xcat atftp

Install the tftp server needed by xCAT, and restart it

[RH]:

yum install tftp-server.ppc

[SLES]:

zypper install tftp

Restart the tftp server

Note: make sure the entry "disable = no" is set in /etc/xinetd.d/tftp.

service xinetd restart
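For reference, a typical /etc/xinetd.d/tftp on a Red Hat system looks roughly like the following; the exact server_args and paths may differ on your distribution, and the key line for xCAT is "disable = no":

service tftp
{
        socket_type     = dgram
        protocol        = udp
        wait            = yes
        user            = root
        server          = /usr/sbin/in.tftpd
        server_args     = -s /tftpboot
        disable         = no
}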

Setup common attributes for xCAT in the database

The xCAT database table passwd contains the default userids and passwords that xCAT uses to access cluster components. This section describes how to set the default userids and passwords for the system and hmc keys in that table.

Add the default account for system

chtab key=system passwd.username=root passwd.password=cluster

Add the default account for hmc

chtab key=hmc passwd.username=hscroot passwd.password=abc123
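To verify the entries, you can dump the passwd table; the output should contain lines similar to the following (the exact column set may vary by xCAT release):

tabdump passwd
#key,username,password,cryptmethod,comments,disable
"system","root","cluster",,,
"hmc","hscroot","abc123",,,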

Note: The username and password that xCAT uses to access the HMCs can also be specified through the mkdef or chdef command; this is useful when specific HMCs use a different username and password from the defaults. For example:

mkdef -t node -o hmc1 groups=hmc,all nodetype=hmc mgt=hmc username=hscroot password=abc1234
chdef -t node -o hmc1 username=hscroot password=abc1234

Define the compute nodes

The definition of a node is stored in several tables of the xCAT database.

You can use the rscan command to discover the HCP and get the nodes managed by it. The discovered nodes can be stored in a stanza file. Edit the stanza file to keep the nodes you want to create, then use the mkdef command to create the node definitions.

Define the hardware control point for the nodes.

The following command will create an xCAT node definition for an HMC with a host name of hmc1. The groups, nodetype, mgt, username, and password attributes will be set.

mkdef -t node -o hmc1 groups=hmc,all nodetype=hmc mgt=hmc username=hscroot password=abc123

To change or add groups:

chdef -t node -o hmc1 groups=hmc,rack1,all

To verify your data:

lsdef -l hmc1
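The output should look roughly like the following (attribute order may vary):

Object name: hmc1
    groups=hmc,all
    mgt=hmc
    nodetype=hmc
    password=abc123
    username=hscroot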

If the xCAT management node is on the same service network as the HMC, you can discover the HMC and create an xCAT node definition for it automatically:

lsslp -w -s HMC

To check that the HMC name was added to the nodelist table:

tabdump nodelist

The lsslp command above discovers the HMCs and writes them into the xCAT database, but you still need to set each HMC's username and password:

chdef -t node -o <hmcname from lsslp> username=hscroot password=abc123

For more details on the hardware discovery feature in xCAT, please refer to:

http://xcat.svn.sourceforge.net/viewvc/xcat/xcat-core/trunk/xCAT-client/share/doc/xCAT2pHWManagement.pdf

Discover the LPARs managed by HMC using rscan

Run the rscan command to gather the LPAR information. This command can be used to display the LPAR information in several formats and can also write the LPAR information directly to the xCAT database. In this example we will use the "-z" option to create a stanza file that contains the information gathered by rscan as well as some default values that could be used for the node definitions.

To write the stanza format output of rscan to a file called "node.stanza", run the following command. For our example we assume the HMC name returned by lsslp was hmc1.

rscan -z hmc1 > node.stanza

This file can then be checked and modified as needed. For example you may need to add a different name for the node definition or add additional attributes and values.

Note: The stanza file will contain stanzas for things other than the LPARs. This information must also be defined in the xCAT database. The stanza file will repeat the same BPA information for multiple FSPs. It is not necessary to modify the non-LPAR stanzas in any way.

The stanza file will look something like the following.

Server-9117-MMA-SN10F6F3D:
objtype=node
nodetype=fsp
id=5
model=9118-575
serial=02013EB
hcp=hmc01
pprofile=
parent=Server-9458-10099201WM_A
groups=fsp,all
mgt=hmc
pnode1:
objtype=node
nodetype=lpar,osi
id=9
hcp=hmc1
pprofile=lpar9
parent=Server-9117-MMA-SN10F6F3D
groups=lpar,all
mgt=hmc
cons=hmc
pnode2:
objtype=node
nodetype=lpar,osi
id=7
hcp=hmc1
pprofile=lpar6
parent=Server-9117-MMA-SN10F6F3D
groups=lpar,all
mgt=hmc
cons=hmc

Note: The rscan command supports a "-w" option to automatically create node definitions in the xCAT database. In that case the LPAR name gathered by rscan is used as the node name and the command sets several default values. If you use the "-w" option, make sure the LPAR name is the name you want used as your node name.

For a node that was already defined correctly, you can use the "lsdef -z <nodename> > node.stanza" command to export the definition into node.stanza, edit the file as needed, and then use "cat node.stanza | chdef -z" to apply the updated definition.
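For example, using the node names from this cookbook, the round trip looks like this (a sketch; edit node.stanza between the two commands):

lsdef -z pnode1 > node.stanza
(edit node.stanza as needed)
cat node.stanza | chdef -z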

Define xCAT node using the stanza file

The information gathered by the rscan command can be used to create xCAT node definitions by running the following command:

cat node.stanza | mkdef -z

Verify the data:

lsdef -t node -l all

Define xCAT groups

See the section xCAT node group support in xCAT2top for more details on how to define xCAT groups. For the example below add the compute group to the nodes.

chdef -t node -o pnode1,pnode2 -p groups=compute

Update the attributes of the node

chdef -t node -o pnode1 netboot=yaboot tftpserver=192.168.0.1 nfsserver=192.168.0.1 monserver=192.168.0.1 xcatmaster=192.168.0.1 installnic="eth0" primarynic="eth0"

Note: Please make sure the "installnic" and "primarynic" attributes are set to the correct Ethernet interface of the compute node; otherwise the compute node installation may hang requesting information on an incorrect interface. The "installnic" and "primarynic" attributes can also be set to a MAC address if you are not sure about the Ethernet interface name; the MAC address can be obtained with the getmacs command. They can also be set to the keyword "mac", which means the network interface specified by the MAC address in the mac table will be used.

Make sure that the address used above ( 192.168.0.1) is the address of the Management Node as known by the node. Also make sure site.master has this address.

Check the site.master value

Make sure site.master is the address or name of the management node as known by the nodes.

To change site.master to this address:

chtab key=master site.value=192.168.0.1
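To verify the setting, dump the site table and check the master entry; the output should contain a line similar to:

tabdump site | grep master
"master","192.168.0.1",,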

Set the type attributes of the node

chdef -t node -o pnode1 os=<os> arch=ppc64 profile=compute

For valid options:

 tabdump -d nodetype

Set up customization scripts (optional)

xCAT supports the running of customization scripts on the nodes when they are installed. You can see what scripts xCAT will run by default by looking at the xcatdefaults entry in the xCAT postscripts database table. The postscripts attribute of the node definition can be used to specify the comma separated list of the scripts that you want to be executed on the nodes. The order of the scripts in the list determines the order in which they will be run.

For example, if you want to have your two scripts called foo and bar run on node node01 you could add them to the postscripts table:

chdef -t node -o node01 -p postscripts=foo,bar

(The -p flag means to add these to whatever is already set.)

For more information on creating and setting up postscripts, see:
[Postscripts_and_Prescripts]
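As a minimal sketch of a custom postscript, the hypothetical foo script below only writes a message to the node's syslog so you can confirm that it ran (NODE is one of the environment variables xCAT exports to postscripts):

vi /install/postscripts/foo
#!/bin/bash
# example only: record that this postscript ran on the node
logger -t xcat "postscript foo ran on $NODE"
chmod 755 /install/postscripts/foo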

Add NTP setup script (optional)

To enable the NTP services on the cluster, first configure NTP on the management node and start ntpd.

service ntpd start

Next set the ntpservers attribute in the site table. Whatever time servers are listed in this attribute will be used by all the nodes that boot directly from the management node. In our example, the Management Node will be used as the ntp server.

chdef -t site ntpservers=pmanagenode

To have xCAT automatically set up ntp on the cluster nodes you must add the setupntp script to the list of postscripts that are run on the nodes.

To do this you can either modify the postscripts attribute for each node individually or you can just modify the definition of a group that all the nodes belong to.

For example, if all your nodes belong to the group compute, then you could add setupntp to the group definition by running the following command.

 chdef -p -t group -o compute postscripts=setupntp

Setup Basic Services

A basic networks table was created for you during the xCAT install. Review that table and add additional networks based on your hardware configuration.

Setup the networks table

Create the networks that are used for cluster management:

mkdef -t network -o net1 net=192.168.0.0 mask=255.255.255.0 gateway=192.168.0.1 mgtifname=eth1 dhcpserver=192.168.0.1 tftpserver=192.168.0.1 nameservers=192.168.0.1
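To verify the network definition:

lsdef -t network -l net1
tabdump networks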

Setup Name Resolution

Set up /etc/hosts with entries for all your nodes, HMCs, FSPs, etc.

vi /etc/hosts
127.0.0.1  localhost
192.168.0.1 pmanagenode
192.168.0.10 pnode1
192.168.0.20 pnode2
192.168.0.100 hmc1
       .
       .
       .

Setup the nameserver

Add the following lines to /etc/resolv.conf:

vi /etc/resolv.conf
search cluster.net
nameserver 192.168.0.1

Setup the DNS attributes in the Site table

Setup the nameserver:

chdef -t site nameservers=192.168.0.1

Setup the external nameserver:

chdef -t site forwarders=9.112.4.1

Setup the local domain name:

chdef -t site domain=cluster.net

Setup DNS configuration

makedns
service named start
chkconfig --level 345 named on

Updating the DNS configuration

If you add nodes or update the networks table at a later time, then rerun makedns:

makedns
service named restart

Configure conserver

The xCAT rcons command uses the conserver package to provide support for multiple read-only consoles on a single node and for console logging. For example, if a user has a read-write console session open on node node1, other users can also log in to that console session on node1 as read-only users. This allows a console session to be shared between multiple users for diagnostic or other collaborative purposes. The console logging function logs the console output and activity for any node with remote console attributes set to the following file, which can be replayed for debugging or other purposes:

/var/log/consoles/<node name>

Note: conserver=<management node> is the default, so setting it explicitly is optional.

Update conserver configuration

Each xCAT node with remote console attributes set must be added to the conserver configuration file for rcons to work. The xCAT command makeconservercf puts all such nodes into the conserver configuration file /etc/conserver.cf. The makeconservercf command must be rerun whenever a node definition change affects conserver, such as adding new nodes, removing nodes, or changing a node's remote console settings.

To add or remove new nodes for conserver support:

makeconservercf
service conserver stop
service conserver start

Check rcons (rnetboot and getmacs depend on it)

The rnetboot and getmacs commands depend on conserver functions, so check that it is working:

rcons pnode1

If it works, you will get into the console of pnode1. If it does not work, review your rcons setup as documented in the previous steps.

Check hardware control setup to the nodes

To see if your setup is correct at this point, run rpower to check the node status:

rpower pnode1 stat
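If the hardware control path is working, rpower returns the power state of the LPAR; the output should look something like:

pnode1: Running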

Update the mac table with the address of the node(s)

Before running getmacs, make sure the node is powered off. The HMC has an issue where it cannot shut down Linux nodes that are in the running state.

Check the power state and, if the node is on, force the LPAR off:

rpower pnode1 stat
rpower pnode1 off

If there is only one Ethernet adapter on the node, or you have specified the installnic or primarynic attribute of the node, the following command will get the correct MAC address.

Check the *nic settings by running:

lsdef pnode1

To set installnic or primarynic:

chdef -t node -o pnode1 installnic=eth0 primarynic=eth0

Get mac addresses:

getmacs pnode1

However, if there is more than one Ethernet adapter on the node and you don't know which one has been configured for the installation process, or the LPAR was just created and has no active profile, or the LPAR is on a P5 system and has no LHEA/SEA Ethernet adapters, you have to specify additional parameters so that getmacs can find an available interface with a ping test:

getmacs pnode1 -D -S 192.168.0.1 -G 192.168.0.10

The output looks like the following:

pnode1:
Type Location Code MAC Address Full Path Name Ping Result Device Type
ent U9133.55A.10E093F-V4-C5-T1 f2:60:f0:00:40:05 /vdevice/l-lan@30000005 virtual

The MAC address will be written into the xCAT mac table. To verify, run:

tabdump mac
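The mac table should now contain a line similar to the following (using the MAC address from the getmacs example above):

#node,interface,mac,comments,disable
"pnode1",,"f2:60:f0:00:40:05",,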

Setup dhcp service

Setup the dhcp listen interfaces in site table

 chdef -t site dhcpinterfaces='pmanagenode|eth1'

[SLES] Check the installation of dhcp-server

On the SLES management node, the dhcp-server rpm may not have been automatically installed. Use the following command to check whether it is installed:

 rpm -qa | grep -E "^dhcp-server"

If it is not installed, install it manually:

 zypper install dhcp-server

Configure the DHCP

Add the relevant networks into the DHCP configuration:

 makedhcp -n

Add the defined nodes into the DHCP configuration:

 makedhcp -a

Restart the dhcp service:

 service dhcpd restart

Note: Please make sure there is only one dhcpd server serving these compute nodes.
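As a quick sanity check (the paths shown are Red Hat defaults and may differ on your distribution), confirm that dhcpd is running and that the node's entry was recorded; makedhcp normally adds the host entry to the dhcpd leases file:

service dhcpd status
grep pnode1 /var/lib/dhcpd/dhcpd.leases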

Install a Compute Node

Prepare the installation source

You can use the ISO file of the OS to extract the installation files. For example, if you have the ISO file /iso/RHEL5.2-Server-20080430.0-ppc-DVD.iso:

copycds /iso/RHEL5.2-Server-20080430.0-ppc-DVD.iso

Note: If the ISO cannot be mounted by the copycds command, make sure SELinux is disabled.
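When copycds completes, the installation files should be under /install/<os>/<arch>; for the example ISO above you can verify with:

ls /install/rhels5.2/ppc64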

Stateful Node installation

OS versus Platform

Before following the next installation steps, you need to understand the relationship between <os> and <platform>. <os> is the name of a specific operating system release, while <platform> is the family or platform that contains many operating systems; in other words, <platform> contains <os>.

For example, for Red Hat Enterprise Linux 6.0, rhels6 is the <os> and rh is the <platform>. For SuSE Linux Enterprise Server 11 SP1, sles11.1 is the <os> and sles is the <platform>.

Note: This naming convention is suitable for the installation of the Stateful/Stateless/Statelite Compute/Service nodes.

Customize the install profile

xCAT uses KickStart or AutoYaST installation templates and related installation scripts to complete the installation and configuration of the compute node.

You can find sample templates for common profiles in the following directory:

/opt/xcat/share/xcat/install/<platform>/

If you customize a template, copy it to the following directory:

/install/custom/install/<platform>/

Search order for installation templates

The profile, os, and arch of the node were set up in "Set the type attributes of the node" above.

To check your node's profile, os, and arch settings, run:

lsdef pnode1
Object name: pnode1
.
.
.
arch=ppc64
os=rhels5.5
profile=compute

For this example, the search order for the template file is as follows:

The directory /install/custom/install/<platform> will be searched first, and then /opt/xcat/share/xcat/install/<platform>.

Within the directory, the following order will be honored:

compute.rhels5.5.ppc64.tmpl
compute.rhels5.ppc64.tmpl
compute.rhels.ppc64.tmpl
compute.rhels5.5.tmpl
compute.rhels5.tmpl
compute.rhels.tmpl
compute.ppc64.tmpl
compute.tmpl

Customizing templates

If you want to customize a template for a node, copy the template to the /install/custom/install/<platform>/ directory and make your modifications there (or rename your copy). You need to copy it to the custom directory so that the next xCAT update will not wipe out your modifications, because updates refresh the /opt/xcat/share directory. Keep the search order above in mind to make sure your template is picked up.
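For example, for the Red Hat compute profile used in this cookbook (a sketch; pick whichever shipped template file from the search order above matches your node):

mkdir -p /install/custom/install/rh
cp /opt/xcat/share/xcat/install/rh/compute.tmpl /install/custom/install/rh/
vi /install/custom/install/rh/compute.tmpl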

Note: Sometimes the directory /opt/xcat/share/xcat/install/scripts also needs to be copied to /install/custom/install/ to make the customized profile work, because the customized profiles need to include the files in the scripts directory as prescripts and postscripts.

For example, if you need to install other packages, put the <profile>.otherpkgs.pkglist file into the /install/custom/install/<platform>/ directory.

Install other specific packages

If you want to install additional specific packages (rpms) onto the compute node, copy the rpms into the following directory:

/install/post/otherpkgs/<os>/<arch>

You MUST also create repodata for this directory. You can use the "createrepo" command to do this.

On RHEL5.x, the "createrepo" rpm package can be found in the install ISO; on SLES11, it can be found in SLE-11-SDK-DVD Media 1 ISO.

After "createrepo" is installed, you need to create one text file which contains the complete list of files to include in the repository. For example, the name of the text file is rpms.list in /install/post/otherpkgs/<os>/<arch> directory. Create rpms.list:

cd /install/post/otherpkgs/<os>/<arch>
ls *.rpm > rpms.list

Then, please run the following command to create the repodata for the newly-added packages:

createrepo -i rpms.list /install/post/otherpkgs/<os>/<arch>

The createrepo command with the -i rpms.list option creates the repository for the rpm packages listed in the rpms.list file. It will not destroy or affect rpm packages in the same directory that are already included in another repository.

Alternatively, if you create a subdirectory to contain the rpm packages, for example one named other under /install/post/otherpkgs/<os>/<arch>, run the following command to create the repodata for that directory:

createrepo /install/post/otherpkgs/<os>/<arch>/other

Note: Please replace other with your real directory name.
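Putting these steps together for the example cluster in this cookbook (Red Hat 5.2 on ppc64; the rpm names below are placeholders for whatever extra packages you want to install):

mkdir -p /install/post/otherpkgs/rhels5.2/ppc64
cp /tmp/myextra-*.rpm /install/post/otherpkgs/rhels5.2/ppc64
cd /install/post/otherpkgs/rhels5.2/ppc64
ls *.rpm > rpms.list
createrepo -i rpms.list /install/post/otherpkgs/rhels5.2/ppc64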

Set the node status to ready for installation

nodeset pnode1 install

Use network boot to start the installation

rnetboot pnode1

Check the installation results

After the node installation completes successfully, the node's status will change to booted. Use the following command to check the node's status:

lsdef pnode1 -i status

When the node's status has changed to booted, you can also check that the ssh service on the node is working and that you can log in without a password. Note: Do not run ssh or xdsh against the node until the node installation has completed successfully; doing so earlier may result in ssh host key issues.

If ssh is working but you cannot log in without a password, force an exchange of the ssh keys to the compute node using xdsh:

xdsh pnode1 -K

After exchanging the ssh keys, the following command should work without prompting for a password:

xdsh pnode1 date

Install a new Kernel on the nodes

Using a postinstall script (you could also use the updatenode method):

mkdir /install/postscripts/data
cp <kernel> /install/postscripts/data

Create the postscript updatekernel:

vi /install/postscripts/updatekernel
#!/bin/bash
rpm -Uvh data/kernel-*rpm
chmod 755 /install/postscripts/updatekernel

Add the script to the postscripts table and run the install:

chdef -p -t group -o compute postscripts=updatekernel
rinstall compute

Stateless node installation

Generate the stateless image for compute node

Typically, you build your stateless compute node image on the management node, if it has the same OS and architecture as the node. If you need an image for a different OS or architecture than the management node, you will need a machine that matches the OS and architecture you want for the image, and you must create the image on that node.

Make the compute node add/exclude packaging list

The default lists of rpms to add to or exclude from the diskless images are shipped in the following directory:

/opt/xcat/share/xcat/netboot/<platform>

If you want to modify the current defaults for *.pkglist, *.exlist, or *.postinstall, copy the shipped default files to the following directory, so your modifications will not be removed on the next xCAT rpm update. xCAT will look in the custom directory for these files before going to the share directory.

/install/custom/netboot/<platform>

If you want to exclude more packages, add them into the following exlist file:

/install/custom/netboot/<platform>/<profile>.exlist

Add more package names that need to be installed on the stateless node into the pkglist file:

/install/custom/netboot/<platform>/<profile>.pkglist
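For example, for the Red Hat stateless compute image in this cookbook (a sketch; the shipped file names may differ slightly between xCAT releases):

mkdir -p /install/custom/netboot/rh
cp /opt/xcat/share/xcat/netboot/rh/compute.pkglist /install/custom/netboot/rh/
cp /opt/xcat/share/xcat/netboot/rh/compute.exlist /install/custom/netboot/rh/
vi /install/custom/netboot/rh/compute.pkglist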

Setting up postinstall files

There are rules (release 2.4 or later) for which *.postinstall files will be selected to be used by genimage.

If you are going to make modifications, copy the appropriate /opt/xcat/share/xcat/netboot/<platform>/*postinstall file to the

/install/custom/netboot/<platform> directory:

cp /opt/xcat/share/xcat/netboot/<platform>/*postinstall /install/custom/netboot/<platform>/.

Use these basic rules to edit the correct file in the /install/custom/netboot/<platform> directory. The rules allow you to customize your image down to the profile, os, and architecture level, if needed.

You will find postinstall files of the following formats, and genimage will process them in this order:

<profile>.<os>.<arch>.postinstall
<profile>.<arch>.postinstall
<profile>.<os>.postinstall
<profile>.postinstall

This means, if "<profile>.<os>.<arch>.postinstall" is there, it will be used first.

  • If there is no such a file, then the "<profile>.<arch>.postinstall" file will be used.
  • If there's no such a file , then the "<profile>.<os>.postinstall" file will be used.
  • If there is no such file, then it will use "<profile>.postinstall".

Make sure you have the basic postinstall script set up in the directory for genimage to run. The shipped script sets up fstab and rcons to work properly and is required.

You can add more postinstall processing if you want. The basic postinstall script (2.4) is named <profile>.<arch>.postinstall (e.g. compute.ppc64.postinstall). You can create one for a specific os by copying the shipped one to, for example, compute.rhels5.4.ppc64.postinstall.

Note: you can use the sample here: /opt/xcat/share/xcat/netboot/<platform>/

[RH]:

Add the following package names to the <profile>.pkglist:

bash
nfs-utils
stunnel
dhclient
kernel
openssh-server
openssh-clients
busybox-anaconda
wget
vim-minimal
ntp

You can add any other packages that you want to install on your compute node. For example, if you want to have userids with passwords you should add the following:

cracklib
libuser
passwd

[SLES11]:

Add the following package names to the <profile>.pkglist:

aaa_base
bash
nfs-utils
dhcpcd
kernel
openssh
psmisc
wget
sysconfig
syslog-ng
klogd
vim

Run image generation

[RHEL]:

cd /opt/xcat/share/xcat/netboot/rh
./genimage -i eth0 -n ibmveth -o rhels5.2 -p compute

[SLES11]:

cd /opt/xcat/share/xcat/netboot/sles
./genimage -i eth0 -n ibmveth -o sles11 -p compute

Pack the image

[RHEL]:

packimage -o rhels5.2 -p compute -a ppc64

[SLES]:

packimage -o sles11 -p compute -a ppc64

Set the node status ready for network boot

nodeset pnode2 netboot

Use network boot to start the installation

rnetboot pnode2

Check the installation result

After the node installation completes successfully, the node's status will change to booted. Use the following command to check the node's status:

lsdef pnode2 -i status

When the node's status has changed to booted, you can also check that the ssh service on the node is working and that you can log in without a password.

Note: Do not run ssh or xdsh against the node until the node installation has completed successfully; doing so earlier may result in ssh host key issues.

If ssh is working but you cannot log in without a password, force an exchange of the ssh keys to the compute node using xdsh:

xdsh pnode2 -K

After exchanging the ssh keys, the following command should work without prompting for a password:

xdsh pnode2 date

Installing a new Kernel in the stateless image

Put your new kernel and kernel modules on the MN. If the new kernel is already installed on the MN, you can go directly to the genimage command below. More likely, the new kernel is not installed on the MN; in that case, copy the kernel into /boot and the modules into /lib/modules/<new kernel directory>, and genimage will pick them up from there. Assuming you have the kernel in RPM format in /tmp:

cd /tmp
rpm2cpio kernel-2.6.32.10-0.5.ppc64.rpm | cpio -idv ./boot/vmlinux-2.6.32.10-0.5-ppc64
cp ./boot/vmlinux-2.6.32.10-0.5-ppc64 /boot
rpm2cpio kernel-2.6.32.10-0.5.ppc64.rpm | cpio -idv './lib/modules/2.6.32.10-0.5-ppc64/*'
cp -r ./lib/modules/2.6.32.10-0.5-ppc64 /lib/modules

Run genimage/packimage to update the image with the new kernel:

genimage -i eth0 -n ibmveth -o sles11.1 -p compute -k 2.6.32.10-0.5-ppc64
packimage -o sles11.1 -p compute -a ppc64

Reboot the node with the new image:

nodeset pnode2 netboot
rnetboot pnode2

To show the new kernel, run:

xdsh pnode2 uname -a

Remove an image

If you want to remove an image, use rmimage to remove the Linux stateless or statelite image from the file system. It is better to use this command than to remove the file system yourself, because it also removes the associated links to real file systems that could otherwise be destroyed on your management node if you just used rm -rf.

You can specify the <os>, <arch>, and <profile> values to the rmimage command:

rmimage -o <os> -a <arch> -p <profile>

Or, you can specify an image name to the command:

rmimage <imagename>

Statelite Node installation

Please refer to the [XCAT_Linux_Statelite] documentation.

Firmware upgrade

Prepare for Firmware upgrade

Enable the HMC to allow remote ssh connections (only for P5/P6 with HMC).

[AIX]

Ensure that ssh is installed on the AIX xCAT management node. If you are using an AIX management node, make sure the value of "useSSHonAIX" is "yes" in the site table.

chtab key="useSSHonAIX" site.value=yes

Define the necessary attributes

The LPAR, CEC, or BPA must already be defined in the nodelist, nodehm, nodetype, vpd, and ppc tables.

Define the HMC as a node (only for P5/P6 with HMC)

Define the HMC as a node on the management node. For example,

chdef hmc01.clusters.com nodetype=hmc mgt=hmc groups=hmc username=hscroot password=abc123

Setup SSH connection to the HMC (only for P5/P6 with HMC)

Run the rspconfig command to set up and generate the ssh keys on the xCAT management node and transfer the public key to the HMC. You must also manually configure the HMC to allow remote ssh connections. For example:

rspconfig hmc01.clusters.com sshcfg=enable

Get the Microcode update package and associated XML file

Download the Microcode update package and associated XML file from the IBM Web site:

http://www14.software.ibm.com/webapp/set2/firmware/gjsn

Perform Firmware upgrade for CEC on P5/P6/P7

Define the CEC as a node on the management node

For P5/P6 (with HMC) and P7 (without HMC) node definitions, please refer to XCAT_System_p_Hardware_Management.

Check firmware level

rinv Server-m_tmp-SNs_tmp firm

Update the firmware

Download the Microcode update package and associated XML file from the IBM Web site:

http://www14.software.ibm.com/webapp/set2/firmware/gjsn

Create the /tmp/fw directory, if necessary, and copy the downloaded files to the /tmp/fw directory.

Run the rflash command with the --activate flag to specify the update mode and perform the updates. Please see the rflash man page for more information.

rflash Server-m_tmp-SNs_tmp -p /tmp/fw --activate disruptive

NOTE: You need to check whether your update is concurrent or disruptive here. Concurrent update is only supported for P5/P6 with HMC. Another command sample:

rflash Server-m_tmp-SNs_tmp -p /tmp/fw --activate concurrent

Notes:

1) If the noderange is the lpar group, the upgrade steps are the same as for the CEC.

2) System p5, p6, and p7 updates can take time to complete, and there is no visual indication that the command is proceeding.

Perform Firmware upgrades for BPA on P5/P6/P7

Define the BPA as a node on the Management Node

For P5/P6 (with HMC) and P7 (without HMC) node definitions, please refer to XCAT_System_p_Hardware_Management.

Use rinv to check the firmware level

rinv Server-m_tmps_tmp firm

See rinv manpage for more options.

Update the firmware

Download the Microcode update package and associated XML file from the IBM Web site:

http://www14.software.ibm.com/webapp/set2/firmware/gjsn

Create the /tmp/fw directory, if necessary, and copy the downloaded files to the /tmp/fw directory.

Run the rflash command with the --activate flag to specify the update mode to perform the updates.

rflash Server-m_tmps_tmp -p /tmp/fw --activate disruptive

NOTE: You need to check whether your update is concurrent or disruptive here. Concurrent update is only supported for P5/P6 with HMC. Another command sample:

rflash Server-m_tmps_tmp -p /tmp/fw --activate concurrent

Commit the currently activated LIC update (copy T to P) for a CEC/BPA on P5/P6/P7

Check firmware level

Refer to the environment setup in the section 'Perform Firmware upgrade for CEC on P5/P6/P7' to make sure the firmware version is correct.

Commit the firmware LIC

Run the rflash command with the --commit flag.

rflash Server-m_tmp-SNs_tmp --commit

Notes:

(1) If the noderange is an LPAR, the commit steps are the same as for the CEC.

(2) When the --commit or --recover flag is used, the noderange cannot be a BPA. It can only be a CEC or LPAR for P5/P6, and will take effect for both managed systems and power subsystems. It can be a frame or BPA for P7, and will take effect for power subsystems only.

Advanced features

Use the driver update disk:

Refer to [Using_Linux_Driver_Update_Disk].

Setup Kdump Service over Ethernet/HFI on diskless Linux (for xCAT 2.6 and higher)

Overview

Kdump is a kexec-based kernel crash dumping mechanism for Linux. Currently i386, x86_64, and ppc64 ports of kdump are available, and the mainstream distributions, including Fedora, Red Hat Enterprise Linux, and SuSE Linux Enterprise Server, ship the kdump rpm packages.

Update the .pkglist file

For RHELS6 and other Linux OSes, there are two rpm packages for kdump:

 kexec-tools, crash

Before creating the stateless/statelite Linux root images with kdump enabled, please add these two rpm packages into the <profile>.<os>.<arch>.pkglist file.

Update the "dump" attribute for the image

For Linux images, there is an attribute called dump, which defines the remote NFS path where the crash information is dumped.

The Format of the "dump" attribute

The dump attribute follows the standard URI format. Since currently only the NFS protocol is supported, its value should be set to:

 nfs://<nfs_server_ip>/<kdump_path>

If you intend to use the node's service node, or the management node if no service node is available, as the NFS server, you can omit the <nfs_server_ip> field and set the dump attribute like:

 nfs:///<kdump_path>

which treats the node's SN/MN as the NFS server for the kdump service.

Update the "dump" attribute

Based on <profile>, <os>, and <arch>, there should be definitions for two Linux images, one for diskless and one for statelite:

 <os>-<arch>-netboot-<profile>
 <os>-<arch>-statelite-<profile>

For the diskless image, set the dump attribute with the following command:

 chdef -t osimage <os>-<arch>-netboot-<profile> dump=nfs://<nfs_server_ip>/<kdump_path>

For example, if the image name is rhels6-ppc64-netboot-compute, the NFS server used for kdump is 10.1.0.1, and the path on the NFS server is /install/kdump, you can set the value with:

 chdef -t osimage rhels6-ppc64-netboot-compute dump=nfs://10.1.0.1/install/kdump
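You can verify the setting afterwards with:

 lsdef -t osimage rhels6-ppc64-netboot-compute -i dump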

For the statelite image, set the dump attribute with the following command:

 chdef -t osimage <os>-<arch>-statelite-<profile> dump=nfs://<nfs_server_ip>/<kdump_path>

Note: If there are no osimages called <os>-<arch>-netboot-<profile> or <os>-<arch>-statelite-<profile> in the linuximage table yet, update the dump attribute after the genimage command has been run.

Notes

Please make sure the NFS path (nfs://<nfs_server_ip>/<kdump_path>) specified for the dump attribute is writable. Once a kernel panic is triggered, the node will reboot into the capture kernel, and a kernel dump (vmcore) will automatically be saved to the <kdump_path>/var/crash/<node_ip>-<time>/ directory on the specified NFS server (<nfs_server_ip>). You do not need to create the /var/crash/ directory under the NFS path; it will be created by the kdump service when saving the crash information.

Edit litefile table (for statelite only)

This step is for statelite only.

For the statelite image, the /boot/ directory and the /etc/kdump.conf file should be added into the litefile table.

You can use the "tabedit litefile" command to update the litefile table. After they are added, there should be two new entries like the following:

 "ALL","/etc/kdump.conf",,,
 "ALL","/boot/",,,

Edit the .exlist file (for diskless only)

The <profile>.exlist file is located in the /opt/xcat/share/xcat/netboot/<platform>/ directory. Copy it to the /install/custom/netboot/<platform> directory, then edit the copy there. kdump needs to create a new initrd file in the /boot/ directory of the rootimg, so the line that contains "/boot*" should be removed from the <profile>.exlist file.

Add the "enablekdump" postscript for the node

To enable the kdump service for the specified node or node group, add the "enablekdump" postscript with the following command:

 chdef <noderange> -p postscripts=enablekdump

Generate rootimage for diskless/statelite

Please follow Stateless_node_installation to generate the diskless rootimg; and please follow Create_Statelite_Image to generate the statelite image.

The Remaining Steps

Please follow the xCAT_pLinux_Clusters and xCAT_Linux_Statelite documents to set up the diskless/statelite image and to boot the specified noderange with it.
