XCAT_pLinux_Clusters


Introduction

This cookbook provides instructions on how to use xCAT to create and deploy a Linux cluster on IBM power system machines.

The power system machines have the following characteristics:

  • May have multiple LPARs (an LPAR will be the target machine to install an operating system image on, i.e. the LPAR will be the compute node).
  • The Ethernet card and SCSI disk can be virtual devices.
  • An HMC is used as the hardware control point (HCP); for Power 775 nodes, DFM (Direct FSP/BPA Management) is used instead.

xCAT supports three types of installations for compute nodes: diskful (stateful), diskless stateless, and diskless statelite. xCAT also supports hierarchical clusters in which one or more service nodes handle the installation and management of compute nodes. Please refer to [Setting_Up_a_Linux_Hierarchical_Cluster] for hierarchical usage.

This document will guide you through installing xCAT on your management node, configuring your cluster, deploying a Linux operating system to your compute nodes, and optionally upgrading firmware on your power system hardware.

To provide an easier understanding of the installation steps, this cookbook provides example commands for the following cluster configuration:

  • The management node:

    Arch: an LPAR on a p5/p6/p7 machine
    OS: Red Hat Enterprise Linux 5.2
    Hostname: pmanagenode
    IP: 192.168.0.1
    HCP: HMC

  • The management Network:

    Net: 192.168.0.0
    NetMask: 255.255.255.0
    Gateway: 192.168.0.1
    Cluster-face-IF: eth1
    dhcpserver: 192.168.0.1
    tftpserver: 192.168.0.1
    nameservers: 192.168.0.1

  • The compute nodes:

    Arch: an LPAR on a p5/p6/p7 machine
    OS: Red Hat Enterprise Linux 5.2
    HCP: HMC
    Hostname: pnode1 - this node will be installed stateful
    IP: 192.168.0.10
    Cluster-face-IF: eth0
    Hostname: pnode2 - this node will be installed stateless
    IP: 192.168.0.20
    Cluster-face-IF: eth0

  • The Hardware Control Point (only for the non-DFM case):

    Name: hmc1
    IP: 192.168.0.100

Install xCAT 2 on the Management node

Before proceeding to set up your pLinux cluster, you must first follow the instructions for installing the operating system, configuring your server so that xCAT can set up valid defaults, and downloading and installing the xCAT rpms on your Linux management node:

[Setting_Up_a_Linux_xCAT_Mgmt_Node]

The details in the following chapters refer to various xCAT database tables. It may be helpful to briefly review the list of tables in the xCAT database and their descriptions for background before proceeding; see the xcatdb man page.
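For a quick look, the following commands can be run on the management node; tabdump with no arguments lists the table names, -d prints the column descriptions of one table (site is used here only as an example), and man xcatdb shows the full database overview:

tabdump
tabdump -d site
man xcatdb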

Setup the management node

[Power 5] Workaround the atftpd issue

The tftp client in the Open Firmware of Power5 machines is only compatible with tftp-server, not with atftpd, which xCAT 2 normally requires. Therefore, remove atftpd first and then install tftp-server. This workaround is not required for Power6 or later.

Remove atftp

rpm -qa | grep atftp

You may find one or both of the following rpms: atftp-xcat-* and atftp-*. Stop the tftp service and remove them:

service tftpd stop
rpm --nodeps -e atftp-xcat atftp

Install the tftp server needed by xCAT, and restart it

[RH]:

yum install tftp-server.ppc

[SLES]:

zypper install tftp

Restart the tftp server

Note: make sure the entry disable = no is set in /etc/xinetd.d/tftp.
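As a minimal sketch, assuming the default xinetd configuration layout, the flag can be flipped with sed before restarting xinetd:

sed -i 's/\(disable[[:space:]]*=[[:space:]]*\)yes/\1no/' /etc/xinetd.d/tftp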

service xinetd restart

Setup common attributes for xCAT in the database

The xCAT database table passwd contains default userids and passwords that xCAT uses to access cluster components. This section describes how to set the default userids and passwords for the system and hmc keys in that table.

Add the default account for system

This is the password that will be assigned to root, when nodes are installed.

chtab key=system passwd.username=root passwd.password=cluster
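If an HMC is your hardware control point, a similar entry can be added for the hmc key. The userid and password below are only placeholders and must match a valid login on your HMC:

chtab key=hmc passwd.username=hscroot passwd.password=abc123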

Setup the networks table

A basic networks table was created for you during the xCAT rpm install on the management node. Review that table, remove entries that are not relevant to your cluster management, and add additional networks based on your hardware configuration.

To list the existing network definitions:

lsdef -t network -l

To create an additional network that will be used for cluster management:

mkdef -t network -o net1 net=192.168.0.0 mask=255.255.255.0 gateway=192.168.0.1 mgtifname=eth1
 dhcpserver=192.168.0.1 tftpserver=192.168.0.1 nameservers=192.168.0.1

The makenetworks command gathers network information from the management node and any configured xCAT service nodes and automatically saves it in the xCAT database. However, several services must be configured before the service nodes are installed, so you have to add these networks manually in order to define the hfi, ml, fsp, gpfs, and similar networks.
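For example, to let makenetworks discover the networks it can see and then review the result:

makenetworks
tabdump networks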

Add NTP setup script (optional)

To enable the NTP services on the cluster, first configure NTP on the management node:

 vi /etc/ntp.conf

Set a list of time servers the management node will be using and any other configuration information you desire. If your management node will be a timeserver for the cluster, make sure you have a "restrict" stanza that will allow queries to this server. For example, if your cluster network is 192.168.0.0, you can add:

 restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap

Adjust your system clock to be "close" to the time served by your timeserver. For example, if 0.north-america.pool.ntp.org is one of your timeservers, run:

 ntpdate -u 0.north-america.pool.ntp.org

If your system clock varies greatly from the timeserver, you may need to run ntpdate several times to close the time gap. The "offset" return value tells you how much of a time change was made. Use caution when making large changes to the system clock, since this can impact time-sensitive applications running on your management node.

Configure the ntpd service to start at system boot:

  chkconfig ntpd on

Start ntpd:

service ntpd start

Next set the ntpservers attribute in the site table. Whatever time servers are listed in this attribute will be used by all the nodes that boot directly from the management node. In our example, the Management Node will be used as the ntp server.

 chdef -t site ntpservers=pmanagenode

To have xCAT automatically set up ntp on the cluster nodes you must add the setupntp script to the list of postscripts that are run on the nodes.

To do this you can either modify the postscripts attribute for each node individually or you can just modify the definition of a group that all the nodes belong to.

For example, if all your nodes belong to the group compute, then you could add setupntp to the group definition by running the following command.

 chdef -p -t group -o compute postscripts=setupntp

Setup Name Resolution

There are many different ways to set up name resolution for your cluster. You must ensure that all hostnames and IP addresses for your nodes, hmcs, fsps, etc., are resolvable. This section outlines some simple examples for setting up name resolution.

Setup /etc/hosts

Specify all hostnames and IP addresses for your cluster nodes, hmcs, fsps, etc., in your /etc/hosts file:

vi /etc/hosts
127.0.0.1  localhost
192.168.0.1 pmanagenode
192.168.0.10 pnode1
192.168.0.20 pnode2
192.168.0.100 hmc1
       .
       .
       .

Another way to create entries in your /etc/hosts file is with the xCAT makehosts command. This is useful for large clusters where the hostname/IP address mapping follows regular patterns; however, it requires that entries exist in your xCAT hosts table and that node definitions exist for your cluster. You may wish to create a minimal /etc/hosts file now and come back to this step after you have defined all of your nodes to xCAT.
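As a hedged sketch, assuming pnode1 and pnode2 are already defined and belong to the group compute, and that their names map to IP addresses by the pattern pnodeN -> 192.168.0.(N*10) used in this cookbook, a regular-expression ip attribute plus makehosts could generate the entries:

chdef -t group -o compute ip='|pnode(\d+)|192.168.0.($1*10)|'
makehosts compute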

Setup the nameserver

Add your cluster domain name and the management node's nameserver into /etc/resolv.conf:

vi /etc/resolv.conf
search cluster.net
nameserver 9.112.4.1

Setup the DNS attributes in the Site table

Setup the nameserver for your cluster nodes. In our example, the xCAT management node will be the nameserver for the cluster:

chdef -t site nameservers=192.168.0.1

Setup the external nameserver:

chdef -t site forwarders=9.112.4.1

Setup the local domain name:

chdef -t site domain=cluster.net

Setup DNS configuration

Set up your xCAT management node as a DNS server using named:

makedns
service named start
chkconfig --level 345 named on

Updating the DNS configuration

If you add nodes or update the networks table at a later time, then rerun makedns:

makedns
service named restart

Set the type attributes of the node

Set the os, arch and profile attributes for each compute node; these values determine which installation template or image is used for the node:

chdef -t node -o pnode1 os=<os> arch=ppc64 profile=compute

For valid options:

 tabdump -d nodetype

Configure conserver

The xCAT rcons command uses the conserver package to provide support for a single read-write console and multiple read-only consoles on a node, as well as console logging. For example, if a user has a read-write console session open on node node1, other users can also log in to that console session on node1 as read-only users. This allows a console server session to be shared between multiple users for diagnostic or other collaborative purposes. The console logging function logs the console output and activities for any node with remote console attributes set to the following file, which can be replayed using the xCAT replaycons command for debugging or other purposes:

/var/log/consoles/<node name>

Note: conserver=<management node> is the default for a node's conserver attribute, so setting it explicitly is optional.

Update conserver configuration

Each xCAT node with remote console attributes set must be added to the conserver configuration file for rcons to work. The xCAT command makeconservercf puts all such nodes into the conserver configuration file /etc/conserver.cf. Run makeconservercf whenever a node definition change affects conserver, such as adding new nodes, removing nodes, or changing a node's remote console settings.

To add or remove new nodes for conserver support:

makeconservercf
service conserver stop
service conserver start

Check rcons (rnetboot and getmacs depend on it)

The rnetboot and getmacs commands depend on conserver, so check that it is working:

rcons pnode1

If it works, you will get the console interface of pnode1. If it does not work, review your rcons setup as documented in the previous steps.

Check hardware control setup to the nodes

To see if your setup is correct at this point, run rpower to check the node status:

rpower pnode1 stat

Update the mac table with the address of the node(s)

Before running getmacs, make sure the node is off; the HMC cannot shut down Linux nodes that are in the running state.

Check the node state and, if it is on, force the LPAR off:

rpower pnode1 stat
rpower pnode1 off

If there is only one Ethernet adapter on the node, or you have specified the installnic or primarynic attribute of the node, the following command will get the correct MAC address.

Check the current *nic settings by running:

lsdef pnode1

To set installnic or primarynic:

chdef -t node -o pnode1 installnic=eth0 primarynic=eth1

Get mac addresses:

getmacs pnode1

If there is more than one Ethernet adapter on the node and you do not know which one has been configured for the installation process, if the LPAR has just been created and has no active profile, or if the LPAR is on a Power5 system without LHEA/SEA Ethernet adapters, then you must pass additional parameters so that getmacs can determine an available interface with a ping test. Run this command:

getmacs pnode1 -D -S 192.168.0.1 -G 192.168.0.10

The output looks like the following:

pnode1:
Type Location Code MAC Address Full Path Name Ping Result Device Type
ent U9133.55A.10E093F-V4-C5-T1 f2:60:f0:00:40:05 /vdevice/l-lan@30000005 virtual

The MAC address will be written to the xCAT mac table. To verify, run:

tabdump mac

Update the mac table with the address of the node(s) for Power 775

To set installnic or primarynic:

chdef -t node -o c250f07c04ap13 installnic=hf0 primarynic=hf1

Get mac addresses:

getmacs c250f07c04ap13 -D

Configure DHCP

Add the defined nodes into the DHCP configuration:

 makedhcp c250f07c04ap13

Restart the dhcp service:

 service dhcpd restart

Set up customization scripts (optional)

xCAT supports the running of customization scripts on the nodes when they are installed. You can see what scripts xCAT will run by default by looking at the xcatdefaults entry in the xCAT postscripts database table. The postscripts attribute of the node definition can be used to specify the comma separated list of the scripts that you want to be executed on the nodes. The order of the scripts in the list determines the order in which they will be run.

To check the current postscripts and postbootscripts settings:

tabdump postscripts

For example, if you want to have your two scripts called foo and bar run on node node01 you could add them to the postscripts table:

chdef -t node -o node01 -p postscripts=foo,bar

(The -p flag means to add these to whatever is already set.)

For more information on creating and setting up Post*scripts: [Postscripts_and_Prescripts]

Install a Compute Node

Prepare the installation source

You can use the ISO file of the OS to be installed to extract the installation files. For example, if you have the ISO file /iso/RHEL5.2-Server-20080430.0-ppc-DVD.iso:

copycds /iso/RHEL5.2-Server-20080430.0-ppc-DVD.iso

Note: If the ISO cannot be mounted by the copycds command, make sure SELinux is disabled.

Stateful Node Installation

OS versus Platform

Before following the next installation steps, you need to understand the relationship between <os> and <platform>. <os> is the name of a specific operating system release, while <platform> is the family that contains many operating systems; in other words, a <platform> contains multiple <os> values.

For example, for Red Hat Enterprise Linux 6.0, rhels6 is the <os> and rh is the <platform>. For SUSE Linux Enterprise Server 11 SP1, sles11.1 is the <os> and sles is the <platform>.

Note: This naming convention applies to stateful, stateless, and statelite installations of both compute and service nodes.

Customize the install profile

xCAT uses KickStart or AutoYaST installation templates and related installation scripts to complete the installation and configuration of the compute node.

You can find sample templates for common profiles in following directory:

/opt/xcat/share/xcat/install/<platform>/

If you customize a template then you should copy it to:

/install/custom/install/<platform>

Search order for installation templates

The profile, os and arch attributes of the node were set in "Set the type attributes of the node" above.

To check your node's profile, os and arch settings, run:

lsdef pnode1
Object name: pnode1
.
.
.
arch=ppc64
os=rhels5.5
profile=compute

For this example, the search order for the template file is as follows:

The directory /install/custom/install/<platform> is searched first, then /opt/xcat/share/xcat/install/<platform>.

Within each directory, the following file name order is honored:

compute.rhels5.5.ppc64.tmpl
compute.rhels5.ppc64.tmpl
compute.rhels.ppc64.tmpl
compute.rhels5.5.tmpl
compute.rhels5.tmpl
compute.rhels.tmpl
compute.ppc64.tmpl
compute.tmpl
Customizing templates

If you want to customize a template for a node, copy the template to the /install/custom/install/<platform>/ directory and make your modifications there (renaming the file if appropriate). You must copy it to the custom directory so that the next xCAT update does not wipe out your modifications, because an update replaces the /opt/xcat/share directory. Keep the search order above in mind to make sure your template is picked up.

Note: Sometimes the directory /opt/xcat/share/xcat/install/scripts also needs to be copied to /install/custom/install/ to make the customized profile work, because customized profiles may include files from the scripts directory as pre- and post-installation scripts.

For example, if you need to install other packages, put a <profile>.otherpkgs.pkglist file into the /install/custom/install/<platform>/ directory.
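As an illustration for the example cluster in this cookbook (rh platform, compute profile), the copy and an optional otherpkgs list might look like the following; the shipped template name can vary by xCAT release, and the screen package is only a placeholder:

mkdir -p /install/custom/install/rh
cp /opt/xcat/share/xcat/install/rh/compute.rhels5.tmpl /install/custom/install/rh/
echo "screen" > /install/custom/install/rh/compute.otherpkgs.pkglist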

Install other specific packages

If you want to install a specific package like a specific .rpm onto the compute node, copy the rpm into the following directory:

/install/post/otherpkgs/<os>/<arch>

You MUST also create repodata for this directory; use the createrepo command to do so.

On RHEL5.x, the "createrepo" rpm package can be found in the install ISO; on SLES11, it can be found in SLE-11-SDK-DVD Media 1 ISO.

After "createrepo" is installed, you need to create one text file which contains the complete list of files to include in the repository. For example, the name of the text file is rpms.list in /install/post/otherpkgs/<os>/<arch> directory. Create rpms.list:

cd /install/post/otherpkgs/<os>/<arch>
ls *.rpm > rpms.list

Then run the following command to create the repodata for the newly added packages:

createrepo -i rpms.list /install/post/otherpkgs/<os>/<arch>

The createrepo command with the -i rpms.list option creates the repository only for the rpm packages listed in rpms.list. It does not destroy or affect rpm packages in the same directory that are already included in another repository.

Alternatively, if you create a sub-directory to contain the rpm packages, for example one named other under /install/post/otherpkgs/<os>/<arch>, run the following command to create repodata for that sub-directory:

createrepo /install/post/otherpkgs/<os>/<arch>/other

Note: Please replace other with your real directory name.

Set the node status to ready for installation

nodeset pnode1 install

Use network boot to start the installation

rnetboot pnode1

Check the installation results

After the node installation completes successfully, the node's status changes to booted. Run the following command to check the node's status:

lsdef pnode1 -i status

Once the node's status has changed to booted, you can also check that the ssh service on the node is working and that you can log in without a password. Note: Do not run ssh or xdsh against the node until the installation has completed successfully; doing so earlier may cause ssh host key problems.

If ssh is working but you cannot log in without a password, exchange the ssh keys with the compute node using xdsh:

xdsh pnode1 -K

After exchanging the ssh keys, the following command should work without prompting for a password.

xdsh pnode1 date

Install a new Kernel on the nodes

Using a postinstall script (you could also use the updatenode method):

mkdir /install/postscripts/data
cp <kernel> /install/postscripts/data

Create the postscript updatekernel:

vi /install/postscripts/updatekernel
#!/bin/bash
rpm -Uivh data/kernel-*rpm
chmod 755 /install/postscripts/updatekernel

Add the script to the postscripts table and run the install:

chdef -p -t group -o compute postscripts=updatekernel
rinstall compute

Stateless Node Deployment

Generate the stateless image for compute node

Typically you can build the stateless compute node image on the Management Node if it has the same OS and architecture as the node. If you need an image for a different OS or architecture than the Management Node, you will need a machine running the desired OS and architecture and must create the image on that node.

  • If the stateless image you are building doesn't match the OS/architecture of the Management Node, log on to a node with the desired OS and architecture:

    ssh <node>
    mkdir /install
    mount xcatmn:/install /install    (make sure the mount is read-write)

Make the compute node add/exclude packaging list

The default lists of rpms to add to or exclude from the diskless image are shipped in the following directory:

/opt/xcat/share/xcat/netboot/<platform>

If you want to modify the default *.pkglist, *.exlist or *.postinstall files, copy the shipped defaults to the following directory so that your modifications are not removed by the next xCAT rpm update. xCAT looks in the custom directory first before falling back to the share directory:

/install/custom/netboot/<platform>

If you want to exclude more packages, add them to the following exlist file:

/install/custom/netboot/<platform>/<profile>.exlist

Add any additional package names that need to be installed on the stateless node to the pkglist file:

/install/custom/netboot/<platform>/<profile>.pkglist
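For the example in this cookbook (rh platform, compute profile), the shipped lists could be copied into place as a starting point; the file names below assume the defaults shipped with xCAT:

mkdir -p /install/custom/netboot/rh
cp /opt/xcat/share/xcat/netboot/rh/compute.pkglist /install/custom/netboot/rh/
cp /opt/xcat/share/xcat/netboot/rh/compute.exlist /install/custom/netboot/rh/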

Setting up postinstall files

There are rules (in release 2.4 or later) that determine which *.postinstall files genimage will use.

If you are going to make modifications, copy the appropriate /opt/xcat/share/xcat/netboot/<platform>/*postinstall file to the /install/custom/netboot/<platform> directory:

cp /opt/xcat/share/xcat/netboot/<platform>/*postinstall /install/custom/netboot/<platform>/.

Use these basic rules to edit the correct file in the /install/custom/netboot/<platform> directory. The rule allows you to customize your image down to the profile, os and architecture level, if needed.

You will find postinstall files in the following formats; genimage processes them in this order:

<profile>.<os>.<arch>.postinstall
<profile>.<arch>.postinstall
<profile>.<os>.postinstall
<profile>.postinstall

This means, if "<profile>.<os>.<arch>.postinstall" is there, it will be used first.

  • If there is no such a file, then the "<profile>.<arch>.postinstall" file will be used.
  • If there's no such a file , then the "<profile>.<os>.postinstall" file will be used.
  • If there is no such file, then it will use "<profile>.postinstall".

Make sure the basic postinstall script is set up in the directory for genimage to run. The shipped script sets up fstab and rcons to work properly and is required.

You can add more postinstall processing if you want. The basic postinstall script (2.4) is named <profile>.<arch>.postinstall (e.g. compute.ppc64.postinstall). You can create one for a specific os by copying the shipped one to, for example, compute.rhels5.4.ppc64.postinstall.

Note: you can use the sample here: /opt/xcat/share/xcat/netboot/<platform>/
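For instance, an os-specific copy of the shipped script for the rh platform could be made like this (the target file name is only an example following the naming rules above):

cp /opt/xcat/share/xcat/netboot/rh/compute.ppc64.postinstall /install/custom/netboot/rh/compute.rhels5.4.ppc64.postinstall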

[RH]:

Add the following package names to the <profile>.pkglist file:

bash
nfs-utils
stunnel
dhclient
kernel
openssh-server
openssh-clients
busybox-anaconda
wget
vim-minimal
ntp

You can add any other packages that you want to install on your compute node. For example, if you want to have userids with passwords you should add the following:

cracklib
libuser
passwd

[SLES11]:

Add the following package names to the <profile>.pkglist file:

aaa_base
bash
nfs-utils
dhcpcd
kernel
openssh
psmisc
wget
sysconfig
syslog-ng
klogd
vim

Configure postinstall files for Power 775 (Optional)

The HFI kernel can be installed by xCAT automatically, but other packages, such as hfi_util and net-tools, require the rpm options --nodeps or --force, which xCAT cannot handle automatically. Therefore, modify the postinstall file manually so that those packages are installed during diskless image generation.

Add the following lines to /install/custom/netboot/rh/compute.rhels6.ppc64.postinstall. (rhels6 stands for the OS version - it should be the same as the previous step.)

cp /hfi/dd/* /install/netboot/rhels6/ppc64/compute/rootimg/tmp/
chroot /install/netboot/rhels6/ppc64/compute/rootimg/ /bin/rpm -ivh /tmp/dhclient-4.1.1-13.P1.el6.ppc64.rpm --force
chroot /install/netboot/rhels6/ppc64/compute/rootimg/ /bin/rpm -ivh /tmp/dhcp-4.1.1-13.P1.el6.ppc64.rpm --force
chroot /install/netboot/rhels6/ppc64/compute/rootimg/ /bin/rpm -ivh /tmp/kernel-headers-2.6.32-71.el6.20110617.ppc64.rpm --force
chroot /install/netboot/rhels6/ppc64/compute/rootimg/ /bin/rpm -ivh /tmp/net-tools-1.60-102.el6.ppc64.rpm --force
chroot /install/netboot/rhels6/ppc64/compute/rootimg/ /bin/rpm -ivh /tmp/hfi_ndai-1.2-0.el6.ppc64.rpm --force
chroot /install/netboot/rhels6/ppc64/compute/rootimg/ /bin/rpm -ivh /tmp/hfi_util-1.12-0.el6.ppc64.rpm --force

Run image generation

[RHEL]:

cd /opt/xcat/share/xcat/netboot/rh
./genimage -i eth0 -n ibmveth -o rhels5.2 -p compute

[SLES11]:

cd /opt/xcat/share/xcat/netboot/sles
./genimage -i eth0 -n ibmveth -o sles11 -p compute

Run image generation for Power 775

[RHEL]: On Power 775, an HFI-enabled kernel is required in the diskless image so that the compute nodes can boot this customized kernel over the HFI interfaces.

cp /hfi/dd/kernel-2.6.32-71.el6.20110617.ppc64.rpm /install/kernels/
cd /opt/xcat/share/xcat/netboot/rh/ 
./genimage -i hf0 -n hf_if -o rhels6 -p compute -k 2.6.32-71.el6.20110617.ppc64

Sync /etc/hosts to the diskless image for Power 775 (Optional)

This is used by the postscript hficonfig to configure all the HFI interfaces on the compute nodes. Setup a synclist file containing this line:

/etc/hosts -> /etc/hosts

The file can be put anywhere, but let's assume you name it /tmp/synchosts.

Make sure you have an OS image object in the xCAT database associated with your nodes and issue command:

xdcp -i <imagepath> -F /tmp/synchosts

<imagepath> stands for the OS image path, which is generally located under the /install/netboot directory. For example, if the image path is /install/netboot/rhels6/ppc64/compute/rootimg, the command is:

xdcp -i /install/netboot/rhels6/ppc64/compute/rootimg -F /tmp/synchosts

Pack the image

[RHEL]:

packimage -o rhels5.2 -p compute -a ppc64

[SLES]:

packimage -o sles11 -p compute -a ppc64

Set the node status ready for network boot

nodeset pnode2 netboot

Use network boot to start the installation

rnetboot pnode2

Check the installation result

After the node installation completes successfully, the node's status changes to booted. Run the following command to check the node's status:

lsdef pnode2 -i status

Once the node's status has changed to booted, you can also check that the ssh service on the node is working and that you can log in without a password.

Note: Do not run ssh or xdsh against the node until the installation has completed successfully; doing so earlier may cause ssh host key problems.

If ssh is working but you cannot log in without a password, exchange the ssh keys with the compute node using xdsh:

xdsh pnode2 -K

After exchanging the ssh keys, the following command should work without prompting for a password.

xdsh pnode2 date

Installing a new Kernel in the stateless image

The kerneldir attribute in the linuximage table specifies the directory that holds the new kernel to be installed into the stateless/statelite image. Its default value is /install/kernels; create a directory named <kernelver> under kerneldir and genimage will pick the kernel up from there.

The examples below assume the kernel is available in RPM format in /tmp and that kerneldir is not set (so the default /install/kernels is used).

Before xCAT 2.6.1, the contents of the kernel packages had to be extracted for genimage to use them. In 2.6.1 and later, the kernel is installed directly from the rpm packages.

  • For RHEL:

The kernel RPM package is usually named kernel-<kernelver>.rpm, for example: kernel-2.6.32.10-0.5.ppc64.rpm is the kernel package for 2.6.32.10-0.5.ppc64.

2.6.1 and later

cp /tmp/kernel-2.6.32.10-0.5.ppc64.rpm /install/kernels/

Before 2.6.1

mkdir -p /install/kernels/2.6.32.10-0.5.ppc64
cd /install/kernels/2.6.32.10-0.5.ppc64
rpm2cpio /tmp/kernel-2.6.32.10-0.5.ppc64.rpm |cpio -idum

  • For SLES:

Usually the kernel files for SLES are split into two packages, kernel-<arch>-base and kernel-<arch>, and the RPM package naming differs from the kernel version. For example, there are two RPM packages in /tmp:

kernel-ppc64-base-2.6.27.19-5.1.ppc64.rpm
kernel-ppc64-2.6.27.19-5.1.ppc64.rpm

Note that 2.6.27.19-5.1.ppc64 is NOT the kernel version; the actual kernel version is 2.6.27.19-5-ppc64. Follow this naming rule to determine the kernel version.

After the kernel version is determined for SLES, then:

2.6.1 and later

cp /tmp/kernel-ppc64-base-2.6.27.19-5.1.ppc64.rpm /install/kernels/
cp /tmp/kernel-ppc64-2.6.27.19-5.1.ppc64.rpm /install/kernels/

Before 2.6.1

mkdir -p /install/kernels/2.6.27.19-5-ppc64
cd /install/kernels/2.6.27.19-5-ppc64
rpm2cpio /tmp/kernel-ppc64-base-2.6.27.19-5.1.ppc64.rpm |cpio -idum
rpm2cpio /tmp/kernel-ppc64-2.6.27.19-5.1.ppc64.rpm |cpio -idum

Run genimage/packimage to update the image with the new kernel (using sles as the example):

2.6.1 and later

Since the kernel version differs from the rpm package version, the -g flag must be used to specify the rpm version of the kernel packages.

genimage -i eth0 -n ibmveth -o sles11.1 -p compute -k 2.6.27.19-5-ppc64 -g 2.6.27.19-5.1

Before 2.6.1

genimage -i eth0 -n ibmveth -o sles11.1 -p compute -k 2.6.27.19-5-ppc64
packimage -o sles11.1 -p compute -a ppc64

Reboot the node with the new image:

nodeset pnode2 netboot
rnetboot pnode2

To show the new kernel, run:

xdsh pnode2 uname -a

Remove an image

If you want to remove an image, use rmimage to remove the Linux stateless or statelite image from the file system. This is better than removing the file system yourself, because rmimage also removes links to real file systems on your Management Node that could be destroyed if you simply ran rm -rf.

You can specify the <os>, <arch> and <profile> values to the rmimage command:

rmimage -o <os> -a <arch> -p <profile>

Or, you can specify one imagename to the command:

rmimage <imagename>

Statelite Node Deployment

Statelite is an xCAT feature which allows you to have mostly stateless nodes (for ease of management), but tell xCAT that just a little bit of state should be kept in a few specific files or directories that are persistent for each node. If you would like to use this feature, refer to the [XCAT_Linux_Statelite] documentation.

Advanced features

Use the driver update disk:

Refer to [Using_Linux_Driver_Update_Disk].

Setup Kdump Service over Ethernet/HFI on diskless Linux (for xCAT 2.6 and higher)

Overview

Kdump is a kexec-based kernel crash dumping mechanism for Linux. Currently i386, x86_64 and ppc64 ports of kdump are available, and the mainstream distributions, including Fedora, Red Hat Enterprise Linux and SUSE Linux Enterprise Server, ship the kdump rpm packages.

Update the .pkglist file

For RHEL 6 and other Linux distributions, kdump requires two rpm packages:

 kexec-tools, crash

Before creating the stateless/statelite Linux root images with kdump enabled, please add these two rpm packages into the <profile>.<os>.<arch>.pkglist file.

Update "dump" attribute for Linux image

For Linux images, a new attribute called dump has been added to the linuximage table; it defines the remote NFS path to which the crash information is dumped.

The dump attribute uses standard URI format. Since only the NFS protocol is currently supported, its value should be set to:

 nfs://<nfs_server_ip>/<kdump_path>

If you intend to use the Service Node, or the Management Node if no service node is available, as the NFS server, you can omit the <nfs_server_ip> field and set the value to:

 nfs:///<kdump_path>

which treats the node's service node or management node as the NFS server for the kdump service.

Based on <profile>, <os> and <arch>, there should be two Linux image definitions, one for diskless and one for statelite:

 <os>-<arch>-netboot-<profile>
 <os>-<arch>-statelite-<profile>

For the diskless image, set the dump attribute with the following command:

 chdef -t osimage <os>-<arch>-netboot-<profile> dump=nfs://<nfs_server_ip>/<kdump_path>

For example, if the image name is rhels6-ppc64-netboot-compute, the NFS server used for kdump is 10.1.0.1, and the path on the NFS server is /install/kdump, set the value with:

 chdef -t osimage rhels6-ppc64-netboot-compute dump=nfs://10.1.0.1/install/kdump

For the statelite image, set the dump attribute with the following command:

 chdef -t osimage <os>-<arch>-statelite-<profile> dump=nfs://<nfs_server_ip>/<kdump_path>

Note: If the osimage definitions <os>-<arch>-netboot-<profile> or <os>-<arch>-statelite-<profile> do not yet exist in the linuximage table, update the dump attribute after the genimage command has been run.

Make sure the remote NFS path (nfs://<nfs_server_ip>/<kdump_path>) used for the dump attribute is writable.

Once a kernel panic is triggered, the node reboots into the capture kernel and the kernel dump (vmcore) is automatically saved to the <kdump_path>/var/crash/<node_ip>-<time>/ directory on the specified NFS server (<nfs_server_ip>). You do not need to create the /var/crash/ directory under the NFS path (nfs://<nfs_server_ip>/<kdump_path>); the kdump service creates it when saving the crash information.

Special Notes for Hierarchical Cluster

In most cases, the service nodes in a Linux hierarchical cluster automatically mount the /install directory from the management node, unless the installloc attribute in the site table is left blank.

If the installloc attribute is set in the site table and the service node is chosen as the remote NFS server for the kdump service, the default value of dump is not suitable; you must assign a separate path (which should NOT be under the /install/ directory on the service node) to the dump attribute, and make sure that path is exported and writable by the compute nodes.

For example, you can create a directory called /kdump on the service node, export it with write permission, and then set the dump attribute to one of the following values:

 nfs://<service_node_ip>/kdump

OR:

 nfs:///kdump
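A minimal sketch of creating and exporting that directory on the service node, assuming a standard NFS server setup:

mkdir -p /kdump
echo "/kdump *(rw,no_root_squash,sync)" >> /etc/exports
exportfs -ra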

Edit litefile table (for statelite only)

This step is for statelite only.

For the statelite images, the /boot/ directory and the /etc/kdump.conf file should be added into the litefile table.

You can use the tabedit litefile command to update the litefile table. After they are added, the table should contain two new entries like the following:

 "ALL","/etc/kdump.conf",,,
 "ALL","/boot/",,,

Edit the .exlist file (for diskless only)

The <profile>.exlist file is located in the /opt/xcat/share/xcat/netboot/<platform>/ directory. To create your own .exlist file, copy it to /install/custom/netboot/<platform> and update it there.

The kdump service needs to create its own initial ramdisk under the /boot/ directory of the rootimg, so the line which contains

/boot*

MUST be removed from <profile>.exlist.
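As a hedged one-liner, assuming the rh platform, the compute profile, and that the shipped entry is written literally as /boot* (or ./boot*), the line could be removed with:

sed -i '/boot\*/d' /install/custom/netboot/rh/compute.exlist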

Add "enablekdump" as the postscript

In order to enable the kdump service for the specified node/nodegroup, you should add the enablekdump postscript by running the following command:

 chdef <noderange> -p postscripts=enablekdump

If enablekdump is not added as a postscript, it will not be run and the kdump service will fail to start when the node/nodegroup boots.

Generate rootimage for diskless/statelite

Follow Stateless_node_installation to generate the diskless rootimg, and Create_Statelite_Image to generate the statelite image.

Remaining Steps

Follow the xCAT_pLinux_Clusters and xCAT_Linux_Statelite documents to set up the diskless/statelite image and to boot the specified noderange with it.

Additional configuration

After the noderange has booted with the diskless/statelite image, add a dynamic IP range to the networks table entry for the network used for compute node installation. The dynamic range should be large enough to accommodate all of the nodes on the network. For example:

 #netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,nodehostname,ddnsdomain,vlanid,comments,disable
 "hfinet","20.0.0.0","255.0.0.0","hf0","20.7.4.1","20.7.4.1","20.7.4.1","20.7.4.1",,,"20.7.4.100-20.7.4.200",,,,,
