xCAT Wiki

An extreme cluster/cloud administration toolkit

Brought to you by: besawn, cxhong, gurevich, obihoernchen, victorhu

XCAT_system_p_support_for_IBM_Flex

Introduction
- Terminology
- Command Man Pages and Database Attribute Descriptions
Prepare the Management Node for xCAT Installation
Install xCAT on the Management Node
Downloading and Installing DFM
Use xCAT to Configure Services on the Management Node
- Setup /etc/hosts File
- Setup DNS
- Setup DHCP
Define the CMMs and Switches
CMM Discovery and Configuration
Create node object definitions of flex blade servers
- Create predefined nodes in the Database first and run rscan -u to Discovery
  - Run rscan -u to discover all the compute node servers.
- Create object definition of flex blades by discovery using stanza files
- Set the network configuration for the fsp
- Modify blade server device names
Create the hardware server connection for the IBM Flex power 7 server
- Update the FSP firmware (optional)
Prepare for Node Deployment
Update the IBM Flex Power 7 Server firmware
Deploying an OS on the Blades
Deploying Stateless Nodes
Installing Stateful Nodes
Appendix 1: IBM Flex Recovery and CMM Redundancy
- Replacement of CMM
- CMM Redundancy
Appendix 2: CMM and Flexible Service Processor(FSP) password
- Errors caused by an FSP authentication problem
Appendix 3: Updating Firmware on Flex Ethernet and IB Switch Modules
- Firmware Update using CLI
**Appendix 4 Perform Deferred Firmware upgrades for Flex blade CEC **
- Deferred firmware update Background
- temp/perm side, pending_power_on_side attributes in Deferred firmware update
- **The procedure of the deferred firmware update **
Appendix 5 lshwconn LINE DOWN after power outage
Appendix 6: Migrate your Management Node to a new Service Pack of Linux
Appendix 7: Install your Management Node to a new Release of Linux

Introduction

IBM Flex combines networking, storage and servers in a single offering. It's consist of an IBM Flex Chassis, one or two Chassis Management Modules(CMM) and Power 7 and x86 compute node servers. The type of the management module for IBM Flex is 'cmm', and the Power 7 compute node servers include the IBM Flex System. p260, p460, and 24L Power 7 servers as well as the IBM Flex System. x240 Compute Node which is an x86 Intel-processor based server. In this document only the management of POWER 7 servers running Linux will be covered.

IBM Flex System. p260, p460, and 24L Power 7 servers need to be managed by a xCAT Management Node (MN) which is to be created on a standalone System P7 server. There needs to be a proper ethernet network communication between the xCAT MN to the CMMs, and to all the compute node through the Ethernet Switch Module. The xCAT support uses the hardware type 'hwtype=blade' to manage the P7 Flex blade servers working through the CMM management module). IBM Flex xCAT will use a management type of 'mgt=fsp' to control the POWER 7 servers which is done through the xCAT DFM (Direct FSP Management)). For xCAT IBM Flex Power 7 servers, the management approach is mixture of 'blade' and 'fsp'. Most of the discovery work will be done through CMM and the hardware management work with the server's FSP directly.

Terminology

The following terms will be used in this document:

xCAT DFM: Direct FSP Management is the name that we will use to describe the ability for xCAT software to communicate directly to the IBM FLex Power 7 server's service processor without the use of the HMC for management.

Chassis Management Module(CMM) - this term is used to reflect the pair of management modules installed in the rear of the chassis which have an Ethernet connection. The CMM is used to discover the servers within the chassis and for some data collection regarding the servers and chassis.

Compute node: This term is used to refer to the servers in an IBM Flex system. Compute nodes can be either Power 7 servers or x86 Intel based servers.

blade node: blade node refers to a node with the hwtype set to blade and represents the whole blade server. And the hcp attribute of the blade is set to the FSP's IP.

Command Man Pages and Database Attribute Descriptions

All of the commands used in this document are described in the xCAT man pages.
All of the database attributes referred to in this document are described in the xCAT database object and table descriptions.

Prepare the Management Node for xCAT Installation

These steps prepare the Management Node for xCAT Installation.

Install the Management Node OS
Supported OS and Hardware
[RH] Ensure that SELinux is Disabled
Disable the Firewall
Set Up the Networks
Configure NICS
Prevent DHCP client from overwriting DNS configuration (Optional)
Configure hostname
Setup basic hosts file
Setup the TimeZone
Create a Separate File system for /install (optional)
Restart Management Node

Install the Management Node OS

Install one of the supported distros on the Management Node (MN). It is recommended to ensure that dhcp, bind (not bind-chroot), httpd, nfs-utils, and perl-XML-Parser are installed. (But if not, the process of installing the xCAT software later will pull them in, assuming you follow the steps to make the distro RPMs available.)

Hardware requirements for your xCAT management node are dependent on your cluster size and configuration. A minimum requirement for an xCAT Management Node or Service Node that is dedicated to running xCAT to install a small cluster ( < 16 nodes) should have 4-6 Gigabytes of memory. A medium size cluster, 6-8 Gigabytes of memory; and a large cluster, 16 Gigabytes or more. Keeping swapping to a minimum should be a goal.

Supported OS and Hardware

For a list of supported OS and Hardware, refer to XCAT_Features.

[RH] Ensure that SELinux is Disabled

To disable SELinux manually:

 echo 0 > /selinux/enforce
 sed -i 's/^SELINUX=.*$/SELINUX=disabled/' /etc/selinux/config

Disable the Firewall

Note: you can skip this step in xCAT 2.8 and above, because xCAT does it automatically when it is installed.

The management node provides many services to the cluster nodes, but the firewall on the management node can interfere with this. If your cluster is on a secure network, the easiest thing to do is to disable the firewall on the Management Mode:

For RH:

 service iptables stop
 chkconfig iptables off

For SLES:

 SuSEfirewall2 stop

If disabling the firewall completely isn't an option, configure iptables to allow the ports described in XCAT_Port_Usage.

Set Up the Networks

The xCAT installation process will scan and populate certain settings from the running configuration. Having the networks configured ahead of time will aid in correct configuration. (After installation of xCAT, all the networks in the cluster must be defined in the xCAT networks table before starting to install cluster nodes.) When xCAT is installed on the Management Node, it will automatically run makenetworks to create an entry in the networks table for each of the networks the management node is on. Additional network configurations can be added to the xCAT networks table manually later if needed.

The networks that are typically used in a cluster are:

Management network - used by the management node to install and manage the OS of the nodes. The MN and in-band NIC of the nodes are connected to this network. If you have a large cluster with service nodes, sometimes this network is segregated into separate VLANs for each service node. See Setting Up a Linux Hierarchical Cluster for details.
Service network - used by the management node to control the nodes out of band via the hardware control point, e.g. BMC or HMC. If the BMCs are configured in shared mode, then this network can be combined with the management network.
Application network - used by the HPC applications on the compute nodes. Usually an IB network.
Site (Public) network - used to access the management node and sometimes for the compute nodes to provide services to the site.

In our example, we only focus on the management network:

The service network usually does not need special configuration, just the management node and service nodes need to communicate with the hardware control points through service network. In system x cluster, if the BMCs are in shared mode, so they don't need a separate service network.
we are not showing how to have xCAT automatically configure the application network NICs. See Configuring_Secondary_Adapters if you are interested in that.
under normal circumstances there is no need to put the site network in the networks table

For a sample Networks Setup, see the following example: Setting_Up_a_Linux_xCAT_Mgmt_Node#Appendix_A:_Network_Table_Setup_Example

Configure NICS

Configure the cluster facing NIC(s) on the management node.
For example edit the following files:

On RH: /etc/sysconfig/network-scripts/ifcfg-eth1
On SLES: /etc/sysconfig/network/ifcfg-eth1

 DEVICE=eth1
 ONBOOT=yes
 BOOTPROTO=static
 IPADDR=172.20.0.1
 NETMASK=255.240.0.0

Prevent DHCP client from overwriting DNS configuration (Optional)

If the public facing NIC on your management node is configured by DHCP, you may want to set '''PEERDNS=no''' in the NIC's config file to prevent the dhclient from rewriting /etc/resolv.conf. This would be important if you will be configuring DNS on the management node (via makedns - covered later in this doc) and want the management node itself to use that DNS. In this case, set '''PEERDNS=no''' in each /etc/sysconfig/network-scripts/ifcfg-* file that has '''BOOTPROTO=dhcp'''.

On the other hand, if you '''want''' dhclient to configure /etc/resolv.conf on your management node, then don't set PEERDNS=no in the NIC config files.

Configure hostname

The xCAT management node hostname should be configured before installing xCAT on the management node. The hostname or its resolvable ip address will be used as the default master name in the xCAT site table, when installed. This name needs to be the one that will resolve to the cluster-facing NIC. Short hostnames (no domain) are the norm for the management node and all cluster nodes. Node names should never end in "-enx" for any x.

To set the hostname, edit /etc/sysconfig/network to contain, for example:

 HOSTNAME=mgt

If you run hostname command, if should return the same:

 # hostname
 mgt

Setup basic hosts file

Ensure that at least the management node is in /etc/hosts:

 127.0.0.1               localhost.localdomain localhost
 ::1                     localhost6.localdomain6 localhost6
 ###
 172.20.0.1 mgt mgt.cluster

Setup the TimeZone

When using the management node to install compute nodes, the timezone configuration on the management node will be inherited by the compute nodes. So it is recommended to setup the correct timezone on the management node. To do this on RHEL, see http://www.redhat.com/advice/tips/timezone.html. The process is similar, but not identical, for SLES. (Just google it.)

You can also optionally set up the MN as an NTP for the cluster. See Setting_up_NTP_in_xCAT.

Create a Separate File system for /install (optional)

It is not required, but recommended, that you create a separate file system for the /install directory on the Management Node. The size should be at least 30 meg to hold to allow space for several install images.

Restart Management Node

Note: in xCAT 2.8 and above, you do not need to restart the management node. Simply restart the cluster-facing NIC, for example: ifdown eth1; ifup eth1

For xCAT 2.7 and below, though it is possible to restart the correct services for all settings, the simplest step would be to reboot the Management Node at this point.

Note: for Flex hardware, the switch configuration is only needed to discover (really to locate) the CMMs. The location of each blade is determined by the CMMs.

Install xCAT on the Management Node

Get the xCAT Installation Source

There are two options to get the installation source of xCAT:

download the xCAT installation packages
or install directly from the internet-hosted repository

Pick either one, but not both.

Note:
1. Due to the packages "net-snmp-libs" and "net-snmp-agent-libs"(required by "net-snmp-perl" in xcat-dep) are updated in Redhat 7.1 iso, a xcat-dep branch for Redhat 7.0 is created. Thus, please use the repo under "xcat-dep/rh7.0" for Redhat 7.0 and use "xcat-dep/rh7" for other Redhat 7 releases.
2. for CentOS and ScientificLinux, could use the same xcat-dep configuration with RHEL. For example, CentOS 7.0 could use xcat-dep/rh7.0/x86_64 as the xcat-dep repo.

Option 1: Prepare for the Install of xCAT without Internet Access

If not able to, or not want to, use the live internet repository, choose this option.

Go to the Download xCAT site and download the level of xCAT tarball you desire. Go to the xCAT Dependencies Download page and download the latest snap of the xCAT dependency tarball. (The latest snap of the xCAT dependency tarball will work with any version of xCAT.)

Copy the files to the Management Node (MN) and untar them:

    mkdir /root/xcat2
    cd /root/xcat2
    tar jxvf xcat-core-2.*.tar.bz2     # or core-rpms-snap.tar.bz2
    tar jxvf xcat-dep-*.tar.bz2

Point yum/zypper to the local repositories for xCAT and its dependencies:

[RH]:

    cd /root/xcat2/xcat-dep/<release>/<arch>;
    ./mklocalrepo.sh
    cd /root/xcat2/xcat-core
    ./mklocalrepo.sh

[SLES 11, SLES12]:

     zypper ar file:///root/xcat2/xcat-dep/<os>/<arch> xCAT-dep 
     zypper ar file:///root/xcat2/xcat-core  xcat-core

[SLES 10.2+]:

    zypper sa file:///root/xcat2/xcat-dep/sles10/<arch> xCAT-dep
    zypper sa file:///root/xcat2/xcat-core xcat-core

Option 2: Use the Internet-hosted xCAT Repository

When using the live internet repository, you need to first make sure that name resolution on your management node is at least set up enough to resolve sourceforge.net. Then make sure the correct repo files are in /etc/yum.repos.d.

Internet repo for xCAT-core

You could use the official release or latest snapshot build or development build, based on your requirements.

To get the repo file for the current official release:

[RH]:

wget http://sourceforge.net/projects/xcat/files/yum/<xCAT-release>/xcat-core/xCAT-core.repo

for example:

    cd /etc/yum.repos.d
    wget http://sourceforge.net/projects/xcat/files/yum/2.8/xcat-core/xCAT-core.repo

[SLES11, SLES12]:

zypper ar -t rpm-md http://sourceforge.net/projects/xcat/files/yum/<xCAT-release\>/xcat-core xCAT-core

for example:

    zypper ar -t rpm-md http://sourceforge.net/projects/xcat/files/yum/2.8/xcat-core xCAT-core

[SLES10.2+]:

    zypper sa http://sourceforge.net/projects/xcat/files/yum/<xCAT-release\>/xcat-core xCAT-core

for example:

     zypper sa http://sourceforge.net/projects/xcat/files/yum/2.8/xcat-core xCAT-core

To get the repo file for the latest snapshot build, which includes the latest bug fixes, but is not completely tested:

[RH]:

wget http://sourceforge.net/projects/xcat/files/yum/<xCAT-release>/core-snap/xCAT-core.repo

for example:

    cd /etc/yum.repos.d
    wget http://sourceforge.net/projects/xcat/files/yum/2.8/core-snap/xCAT-core.repo

[SLES11, SLES12]:

zypper ar -t rpm-md http://sourceforge.net/projects/xcat/files/yum/<xCAT-release>/core-snap xCAT-core

for example:

    zypper ar -t rpm-md http://sourceforge.net/projects/xcat/files/yum/2.8/core-snap xCAT-core

[SLES10.2+]:

    zypper sa http://sourceforge.net/projects/xcat/files/yum/<xCAT-release\>/core-snap xCAT-core

for example:

     zypper sa http://sourceforge.net/projects/xcat/files/yum/2.8/core-snap xCAT-core

To get the repo file for the latest development build, which is the snap shot build of the new version we are actively developing. This version has not been released yet. Use at your own risk:

[RH]:

    wget http://sourceforge.net/projects/xcat/files/yum/devel/core-snap/xCAT-core.repo

[SLES11, SLES12]:

    zypper ar -t rpm-md http://sourceforge.net/projects/xcat/files/yum/devel/core-snap xCAT-core

[SLES10.2+]:

    zypper sa http://sourceforge.net/projects/xcat/files/yum/devel/core-snap xCAT-core

Internet repo for xCAT-dep

To get the repo file for xCAT-dep packages:

** [RH]:**

wget http://sourceforge.net/projects/xcat/files/yum/xcat-dep/<OS-release>/<arch>/xCAT-dep.repo

for example:

wget http://sourceforge.net/projects/xcat/files/yum/xcat-dep/rh6/x86_64/xCAT-dep.repo

[SLES11, SLES12]:

zypper ar -t rpm-md http://sourceforge.net/projects/xcat/files/yum/xcat-dep/<OS-release>/<arch> xCAT-dep

for example:

zypper ar -t rpm-md http://sourceforge.net/projects/xcat/files/yum/xcat-dep/sles11/x86_64 xCAT-dep

[SLES10.2+]:

zypper sa http://sourceforge.net/projects/xcat/files/yum/xcat-dep/<OS-release\>/<arch\> xCAT-dep

for example:

zypper sa http://sourceforge.net/projects/xcat/files/yum/xcat-dep/sles10/x86_64 xCAT-dep

For both Options: Make Required Packages From the Distro Available

xCAT uses on several packages that come from the Linux distro. Follow this section to create the repository of the OS on the Management Node.

See the following documentation:

Setting Up the OS Repository on the Mgmt Node

Install xCAT Packages

[RH]: Use yum to install xCAT and all the dependencies:

yum clean metadata

or
yum clean all

then
yum install xCAT

[SLES]Use zypper to install xCAT and all the dependencies:

zypper install xCAT

(Optional) Install the Packages for sysclone

Note:syslcone is not supported on SLES.

In xCAT 2.8.2 and above, xCAT supports cloning new nodes from a pre-installed/pre-configured node, we call this provisioning method as sysclone. It leverages the opensource tool systemimager. xCAT ships the required systemimager packages with xcat-dep. If you will be installing stateful(diskful) nodes using the sysclone provmethod, you need to install systemimager and all the dependencies:

[RH]: Use yum to install systemimager and all the dependencies:

yum install systemimager-server

[SLES]: Use zypper to install systemimager and all the dependencies:

zypper install systemimager-server

Quick Test of xCAT Installation

Add xCAT commands to the path by running the following:

source /etc/profile.d/xcat.sh

Check to see the database is initialized:

tabdump site

The output should similar to the following:

    key,value,comments,disable
    "xcatdport","3001",,
    "xcatiport","3002",,
    "tftpdir","/tftpboot",,
    "installdir","/install",,
         .
         .
         .

If the tabdump command does not work, see Debugging xCAT Problems.

Restart or Reload xcatd

If you really encountered certain problem that xcat daemon failed to function, you can try to restart the xcat daemon.

[For xcat daemon is running on NON-systemd enabled Linux OS like rh6.x and sles11.x]

    service xcatd restart

[For xcat daemon is running on systemd enabled Linux OS like rh7.x and sles12.x. And AIX.]

    restartxcatd

Refer to the doc of restartxcatd to get the information why you need to use it for systemd enabled system.

If you want to restart xcat daemon but do not want to reconfigure the network service on the management (this will restart xcat daemon quickly for a large cluster).

[For xcat daemon is running on NON-systemd enabled Linux OS like rh6.x and sles11.x]

    service xcatd reload

[For xcat daemon is running on systemd enabled Linux OS like rh7.x and sles12.x. And AIX.]

    restartxcatd -r

If you want to rescan plugin when you added a new plugin, or you changed the subroutine handled_commands of certain plugin.

    rescanplugins

Updating xCAT Packages Later

If you need to update the xCAT RPMs later:

If the management node does not have access to the internet: download the new version of xCAT from Download xCAT and the dependencies from xCAT Dependencies Download and untar them in the same place as before.
If the management node has access to the internet, the commands below will pull the updates directly from the xCAT site.

To update xCAT:

[RH]:

    yum clean metadata or you may need to use yum clean all
    yum update '*xCAT*'

[SLES]:

    zypper refresh
    zypper update -t package '*xCAT*'

Note: this will not apply updates that may have been made to some of the xCAT deps packages. (If there are brand new deps packages, they will get installed.) In most cases, this is ok, but if you want to make all updates for xCAT rpms and deps, run the following command. This command will also pick up additional OS updates.

[RH]:

    yum update

[SLES]:

    zypper refresh
    zypper update

Note: Sometimes zypper refresh fails to refresh zypper local repository. Try to run zypper clean to clean local metadata, then use zypper refresh.

Note: If you are updating from xCAT 2.7.x (or earlier) to xCAT 2.8 or later, there are some additional migration steps that need to be considered:

Switch from xCAT IBM HPC Integration support to using Software Kits - see
IBM_HPC_Software_Kits#Switching_from_xCAT_IBM_HPC_Integration_Support_to_Using_Software_Kits
for details.
(Optional) Use nic attibutes to replace the otherinterfaces attribute to configure secondary see Cluster_Name_Resolution for details.
Convert non-osimage based system to osimage based system - see
Convert_Non-osimage_Based_System_To_Osimage_Based_System for details

Downloading and Installing DFM

This requires the new xCAT Direct FSP Management(dfm) plugin and hardware server(hdwr_svr) plugin, which are not part of the core xCAT open source, but are available as a free download from IBM. You must download this and install them on your xCAT management node (and possibly on your service nodes, depending on your configuration) before proceeding with this document.

Download xCAT-dfm RPM: http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=ibm~ClusterSoftware&product=ibm/Other+software/IBM+direct+FSP+management+plug-in+for+xCAT&release=All&platform=All&function=all

Download ISNM-hdwr_svr RPM: http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=ibm~ClusterSoftware&product=ibm/Other+software/IBM+High+Performance+Computing+%28HPC%29+Hardware+Server&release=All&platform=All&function=all

Download the suitable dfm and hdwr_svr packages for different OSes. Once you have downloaded these packages, install the hardware server package first, and then install DFM.

If you have been following the xCAT documentation, you should already have the yum repositories set up to pull in whatever xCAT dependencies and distro RPMs are needed (libstdc++.ppc, libgcc.ppc, openssl.ppc, etc.).

    yum install xCAT-dfm-* ISNM-hdwr_svr-*

Use xCAT to Configure Services on the Management Node

Setup /etc/hosts File

Since the map between the xCAT node names and IP addresses have been added in the xCAT database, you can run the makehosts xCAT command to create the /etc/hosts file from the xCAT database. (You can skip this step if you are creating /etc/hosts manually.)

    makehosts switch,blade,cmm

Verify the entries have been created in the file /etc/hosts.

Setup DNS

To get the hostname/IP pairs copied from /etc/hosts to the DNS on the MN:

Ensure that /etc/sysconfig/named does not have ROOTDIR set
Set site.forwarders to your site-wide DNS servers that can resolve site or public hostnames. The DNS on the MN will forward any requests it can't answer to these servers.

    chdef -t site forwarders=1.2.3.4,1.2.5.6

Edit /etc/resolv.conf to point the MN to its own DNS. (Note: this won't be required in xCAT 2.8 and above, but is an easy way to test that your DNS is configured properly.)

    search cluster
    nameserver 10.1.0.1

Run makedns

    makedns

For more information about name resolution in an xCAT Cluster, see [Cluster_Name_Resolution].

Setup DHCP

You usually don't want your DHCP server listening on your public (site) network, so set site.dhcpinterfaces to your MN's cluster facing NICs. For example:

    chdef -t site dhcpinterfaces=eth1

Then this will get the network stanza part of the DHCP configuration (including the dynamic range) set:

    makedhcp -n

The IP/MAC mappings for the nodes will be added to DHCP automatically as the nodes are discovered.

Define the CMMs and Switches

Define the CMMs
Define the Switches
Fill in More xCAT Tables
- The passwd Table
- The networks Table
Declare a dynamic range of addresses for discovery

Define the CMMs

First just add the list of CMMs and the groups they belong to:

    nodeadd cmm[01-15] groups=cmm,all

Now define attributes that are the same for all CMMs. These can be defined at the group level. For a description of the attribute names, see the node object definition.

    chdef -t group cmm hwtype=cmm mgt=blade

Next define the attributes that vary for each CMM. There are 2 different ways to do this. Assuming your naming conventions follow a regular pattern, the fastest way to do this is use regular expressions at the group level:

    chdef -t group cmm mpa='|(.*)|($1)|' ip='|cmm(\d+)|10.0.50.($1+0)|'

Note: The Flow for CMM IP addressing is 1) initially each CMM obtains a DHCP address from a dynamic range of IP addresses specified later, 2) This DHCP address will be listed when we do CMM discovery using lsslp 3) CMM configuration steps will change the CMM DHCP obtained ip address to the permanent static IP address which is specified here.

This chdef might look confusing at first, but once you parse it, it's not too bad. The regular expression syntax in xcat database attribute values follows the form:

    |pattern-to-match-on-the-nodename|value-to-give-the-attribute|

You use parentheses to indicate what should be matched on the left side and substituted on the right side. So for example, the mpa attribute above is:

    |(.*)|($1)|

This means match the entire nodename (.*) and substitute it as the value for mpa. This is what we want because for CMMs the mpa attribute should be set to itself.

For the ip attribute above, it is:

    |cmm(\d+)|10.0.50.($1+0)|

This means match the number part of the node name and use it as the last part of the IP address. (Adding 0 to the value just converts it from a string to a number to get rid of any leading zeros, i.e. change 09 to 9.) So for cmm07, the ip attribute will be 10.0.50.7.

For more information on xCAT's database regular expressions, see http://xcat.sourceforge.net/man5/xcatdb.5.html . To verify that the regular expressions are producing what you want, run lsdef for a node and confirm that the values are correct.

If you don't want to use regular expressions, you can create a stanza file containing the node attribute values:

    cmm01:
      objtype=node
      mpa=cmm01
      ip=10.0.50.1
    cmm02:
      objtype=node
      mpa=cmm02
      ip=10.0.50.2
    ...

Then pipe this into chdef:

    cat <stanzafile> | chdef -z

When you are done defining the CMMs, listing one should look like this:

    lsdef cmm07
    Object name: cmm07
        groups=cmm,all
        hwtype=cmm
        ip=10.0.50.7
        mgt=blade
        mpa=cmm07
        postbootscripts=otherpkgs
        postscripts=syslog,remoteshell,syncfiles

Define the Switches

    nodeadd switch[1-4] groups=switch,all
    chdef -t group switch ip='|switch(\d+)|10.0.60.($1+0)|'

Fill in More xCAT Tables

The passwd Table

There are several passwords required for management:

blade - The userid and password for the CMM.
ipmi - The userid and password used to communicate with the IPMI service on the IMM (BMC) of each blade. To avoid problems, this should be the same as the CMM userid and password above.
system - The root id and password which will be set on the node OS during node deployment and used later for the administrator to login to the node OS.

Use tabedit to give the passwd table contents like:

    key,username,password,cryptmethod,comments,disable
    "blade","USERID","PASSW0RD",,,
    "ipmi","USERID","PASSW0RD",,,
    "system","root","cluster",,,

The networks Table

All networks in the cluster must be defined in the networks table. When xCAT was installed, it ran makenetworks, which created an entry in this table for each of the networks the management node is connected to. Now is the time to add to the networks table any other networks in the cluster, or update existing networks in the table.

For a sample Networks Setup, see the following example: Setting_Up_a_Linux_xCAT_Mgmt_Node/#appendix-a-network-table-setup-example.

Declare a dynamic range of addresses for discovery

If you want to use hardware discovery, 2 dynamic ranges must be defined in the networks table: one for the service network (CMMs and IMMs), and one for the management network (the OS for each blade). The dynamic range in the service network (in our example 10.0) is used while discovering the CMMs and IMMs using SLP. The dynamic range in the management network (in our example 10.1) is used when booting the blade with the genesis kernel to get the MACs.

    chdef -t network 10_0_0_0-255_255_0_0 dynamicrange=10.0.255.1-10.0.255.254
    chdef -t network 10_1_0_0-255_255_0_0 dynamicrange=10.1.255.1-10.1.255.254

CMM Discovery and Configuration

Overview
Optional Discovery Method 1 - Mapping the CMMs to the switch port information (Development)
Optional Discovery Method 2 - Manually Discovering the CMMs Instead of Using the Switch Ports
CMM Configuration
CMM Security Password Expiration
Redundant CMM Support
Update the CMM firmware (optional)

Overview

In this section you will perform the CMM discovery and configuration tasks for the CMMs.

During the CMM discovery process all CMMs are discovered using Service Location Protocol(SLP) and the xCAT lsslp command. There are two methods which will allow mapping the SLP discovered CMMs to the CMMs predefined in the xCAT DB. You can either use method 1 which map the SLP data to the switch SNMP data together to update the xCAT DB or you can use method 2 to capture the SLP information to a file and edit it manually and then update the xCAT DB.

Two factors will determine which method you use. If this is a large configuration with many chassis and you are able to enable SNMP on the switch that the CMMs are connected to then method 1 would be preferred. If you are only defining a few chassis then method 2 might be an easier choice.

Note: xCAT Flex discovery now does not support the CMM with both primary and standby port.

Optional Discovery Method 1 - Mapping the CMMs to the switch port information (Development)

This supported will be available in xCAT 2.8.2 and later. This method requires SNMP access to the Ethernet switch where the CMMs are connected. If you can't configure SNMP on your switches, then use the section after:

CMM_Discovery_and_Configuration/#optional-discovery-method-2-manually-discovering-the-cmms-instead-of-using-the-switch-ports to discover and define the CMMs to xCAT.

In large clusters the most automated method for discovering is to map the SLP CMM information to the Ethernet switch SNMP data from which each chassis CMM is connected.

To use this method the xCAT switch and switches tables must be configured. The xCAT switch table will need to be updated with the switch port that each CMM is connected. The xCAT switches table must contain the SNMP access information.

Add the CMM switch/port information to the switch table.

 tabdump switch
 node,switch,port,vlan,interface,comments,disable
 "cmm01","switch","0/1",,,,
 "cmm02","switch","0/2",,,,

where: node is the cmm node object name switch is the hostname of the switch port is the switch port id. Note that xCAT does not need the complete port name. Preceding non numeric characters are ignored.

If you configured your switches to use SNMP V3, then you need to define several attributes in the switches table. Assuming all of your switches use the same values, you can set these attributes at the group level:

    tabch switch=switch switches.snmpversion=3 switches.username=xcatadmin \
         switches.password=passw0rd switches.auth=SHA



   tabdump switches
   switch,snmpversion,username,password,privacy,auth,linkports,sshusername,...
    "switch","3","xcatadmin","passw0rd",,"SHA",,,,,,

Note: It might also be necessary to allow authentication at the VLAN level

    snmp-server group xcatadmin v3 auth context vlan-230

Discover and update the xCAT CMM node definitions with the MAC, Model Type, and Serial Number.

    lsslp -s CMM -w

Verify that the CMMs have been updated with the mac, mtm, and serial information.

    lsdef cmm01
    cmm01:
           objtype=node
           mpa=cmm01
           nodetype=mp
           mtm=789392X
           serial=100037A
           side=2
           groups=cmm,all
           mgt=blade
           mac=5c:f3:fc:25:da:99
           hidden=0
           otherinterfaces=10.0.0.235
           hwtype=cmm

Optional Discovery Method 2 - Manually Discovering the CMMs Instead of Using the Switch Ports

If you can't enable SNMP on your switches, use this more manual approach to discover your hardware. If you have already discovered your hardware using spldiscover of lsslp --flexdiscover, skip this whole section.

Assuming your CMMs have at least received a dynamic address from the DHCP server, you can run lsslp to discover them and create a stanza file that contains their attributes that can be used to update the existing CMM nodes in the xCAT database. The problem is that without the switch port information, lsslp has no way to correlate the responses from SLP to the correct nodes in the database, so you must do that manually. Run:

    lsslp -m -z -s CMM > cmm.stanza

and it will create a stanza file with entries for each CMM that look like this:

    Server--SNY014BG27A01K:
           objtype=node
           mpa=Server--SNY014BG27A01K
           nodetype=mp
           mtm=789392X
           serial=100CF0A
           side=1
           groups=cmm,all
           mgt=blade
           mac=3440b5df0abe
           hidden=0
           otherinterfaces=10.0.0.235
           hwtype=cmm

Note: the otherinterfaces attribute is the dynamic IP address assigned to the CMM.

The first thing we want to do is strip out the non-essential attributes, because we have already defined them at a group level:

    grep -v -E '(mac=|nodetype=|groups=|mgt=|hidden=|hwtype=)' cmm.stanza > cmm2.stanza

Now edit cmm2.stanza and change each "<node>:" line and mpa to have the correct node name. Then put these attributes into the database:

    cat cmm2.stanza | chdef -z

CMM Configuration

For a new CMM the user USERID password is set as expired and you must use the xCAT rspconfig command to change the password to a new password before any other commands can access the CMM.

rspconfig cmm01 USERID=<new password>

Note: If password for CMM has been changed after discovery, you must make sure the correct password for CMM user USERID is updated into mpa table: chtab mpa=<cmm> mpa.username=USERID mpa.password=<password>; . You can then run the rspconfig command listed above.

Once a new password is use rspconfig to set the IP address of each CMM to the permanent (static) address specified in the ip attribute:

    rspconfig cmm01 initnetwork=*

Note: The rspconfig command with the initnetwork option will set the CMM IP address
to a the static IP address specified in the cmm01 node object ip attribute value.
The changing of the CMM network definition and will reset the CMM to boot
with the new value which will cause the CMM to temporarily loose its ethernet connection.

Checking the CMM definition will show that the DHCP value stored in otherinterfaces
has been removed since it is no longer being used.
You should use ping to test the IP address defined in the CMM node ip attribute to know when the CMM comes up before issuing other commands.

Once the CMM is back up and operationals use rspconfig to set the CMM to allow SSH and SNMP.

    rspconfig cmm01 sshcfg=enable 
    rspconfig cmm01 snmpcfg=enable

Note: If you receive error cmmxx: Failed to login to cmmxx, you can run "ssh USERID@cmm01" and set the ssh password for the CAT MN. If this does not work, we may need to check the passwords being referenced on the target CMM and in the xCAT database.

Note: If the cmm was previously defined and the rspconfig sshcfg=enable fails, you may need to clean up the old ssh entry in the know_hosts table on the xCAT MN. You can run "makeknownhosts cmm01 -r" to clean this ssh entry.

Check the values to make sure they were enabled properly.

    rspconfig cmm01 sshcfg snmpcfg
    cmm01: SSH: enabled
    cmm01: SNMP: enabled

Test the SSH connection to the CMM with the rscan CMM info command.

    ssh USERID@cmm01 info
    system> info
    UUID: 5CFB E60F 2EFB 4143 9154 B677 2A37 2143 
    Manufacturer: IBM (BG)
    Manufacturer ID: 20301
    Product ID: 336
    Mach type/model: 789392X
    Mach serial number: 100037A
    Manuf date: 2411
    Hardware rev: 52.48
    Part no.: 88Y6660
    FRU no.: 81Y2893
    FRU serial no.: Y130BG16D022
    CLEI: Not Available
    CMM bays: 2
    Blade bays: 14
    I/O Module bays: 4
    Power Module bays: 6
    Blower bays: 10
    Rear LED Card bays: 1
    U Height of Chassis 10
    Product Name: IBM Chassis Midplane

Test the SNMP connection to the CMM using rscan.

    rscan cmm01
    type    name             id      type-model  serial-number  mpa    address
    cmm     SN#Y014BG27A01K  0       789392X     100CF0A        cmm01  cmm01
    blade   node01           1       789523X     1082EAB        cmm01  10.0.0.232
    blade   node02           2       789523X     1082EBB        cmm01  10.0.0.231

CMM Security Password Expiration

The default security setting for the CMM is secure. This setting will require that the CMM user USERID password be changed within 90 days by default. You can change the password expiration date with the CMM accseccfg command. The following are examples of changing the expiration date.

List the security settings. The -pe is the password expiration:

   > ssh USERID@cmm01 accseccfg -T mm[1]
    system&gt; accseccfg -T mm[1]
    Custom settings:
    -alt 300
    -am local
    -cp on
    -ct 0
    -dc 2
    -de on
    -ia 120
    -ici off
    -id 180
    -lf 20
    -lp 2
    -mls 0
    -pc on
    -pe 90
    -pi 0
    -rc 5
    -wt user

You can change the password expiration date using the CMM flex command accseccfg .

     ssh USERID@cmm01 accseccfg -pe 300 -T mm[1] (set expiration days to 300)
     ssh USERID@cmm01 accseccfg -pe 0 -T mm[1]   (set expiration date to not expire)

More details on the CMM accseccfg command can be found at: http://publib.boulder.ibm.com/infocenter/flexsys/information/index.jsp?topic=%2Fcom.ibm.acc.cmm.doc%2Fcli_command_accseccfg.html

Redundant CMM Support

The xCAT support for CMM redundancy is to use the second CMM as the default standby CMM that has its own ethernet connection into the HW VLAN. For CMM discovery, it is recommended that the Flex cluster admin only plug in and connect the Bay 1 CMM as the primary CMM, where the admin does discovery and configuration of the Flex cluster with one primary CMM. When the primary CMM is fully working as a "static" IP with proper firmware levels, the admin can plug in the second Bay 2 CMM into the Flex chassis, and it will automatically come online as a standby CMM with same CMM firmware as the primary CMM. You can see more information about CMM recovery with Redundant CMM in a different section below.

Update the CMM firmware (optional)

This section specifies how to update the CMM firmware. You can run the xCAT "rinv cmm firm" command to list the cmm firmware level.

     rinv  cmm firm

The CMM firmware can be updated by loading the new cmefs.uxp firmware file using the CMM update command working with the http or tftp interface. Since the AIX xCAT MN does not usually support http, we have provided CMM update instructions working with tftp. The administrator needs to download firmware from IBM Fix Central. The compressed tar file will need to be uncompressed and unzipped to extract the firmware update files. You need to place the cmefs.uxp file in the /tftpboot directory on the xCAT MN for CMM update to work properly.

Once the firmware is unzipped and the cmefs.uxp is placed in the /tftpboot directory on the xCAT MN you can use the CMM update command to update the firmware on one chassis at a time or on all chassis managed by xCAT MN. More details on the CMM update command can be found at: http://publib.boulder.ibm.com/infocenter/flexsys/information/index.jsp?topic=%2Fcom.ibm.acc.cmm.doc%2Fcli_command_update.html

The format of the update command is: flash (-u) the CMM firmware file and reboot (-r) afterwards

    update -T system:mm[1] -r -u tftp://<server>/<update file>

flash (-u), show progress (-v), and reboot (-r) afterwards

    update -T system:mm[1] -v -r -u tftp://<server>/<update file>

Note: Make sure the CMM firmware file cmefs.uxp is placed in /tftpboot directory on xCAT MN. The tftp interface from the CMM will reference the /tftpboot as the default location.

To update firmware and restart a single CMM cmm01 from xCAT MN 70.0.0.1 use:

    ssh USERID@cmm01 update -T system:mm[1] -v  -r -u tftp://70.0.0.1/cmefs.uxp

If unprompted password is setup on all CMMs then you can use xCAT psh to update all CMMs in the cluster at once.

    psh -l USERID cmm update -T system:mm[1] -v -u tftp://70.0.0.1/cmefs.uxp

If you are experiencing a "Unsupported security level" message after the CMM firmware was updated then you should run the following command to overcome this issue.

    rspconfig cmm sshcfg=enable snmpcfg=enable

You can run the xCAT "rinv cmm firm" command to list the new cmm firmware.

     rinv  cmm firm

Create node object definitions of flex blade servers

There are different methods used to create the flex blade node objects in the xCAT database. One method is to create the predefined node objects, and then update the node objects using ""rscan -u "". The other method is to create ""rscan -z"", and then manually update the flex blade stanza file. The admin can then create the node objects using the stanza file.

Create predefined nodes in the Database first and run rscan -u to Discovery

This implementation should only be used when there are uniformed blade configurations working in the chassis. If there are mixtures of single and double wide blades in the chassis, the admin will need to remove unused blade node objects.

First just create the predefined node based on cmm and blade location, add the list of blades and the groups they belong too:

    nodeadd cmm[01-02]node[01-14] groups=all,blade

Change the blade definitions with the common attributes.

    chdef -t  group blade mgt=fsp cons=fsp

The attribute 'mpa' should be set to the node name of cmm. The attribute 'slotid' should be set to the physical slot id of the blade. The attribute 'hcp' should be set to the IP that admin try to assign to the fsp of the blade. Use chdef with patterns that will map to the settings you require.

    chdef -t group blade mpa='|cmm(\d+)node(\d+)|cmm($1)|'slotid='|cmm(\d+)node(\d+)|($2+0)|' hcp='|cmm(\d+)node(\d+)|10.0.($1+0).($2+0)|****'

List the blade entries to review the blade the definitions created.

    [root@c870f3ap01 ~]# nodels blade
    cmm01node01
    cmm01node03
    cmm01node05
    cmm01node07
    cmm01node09
    cmm01node10
    cmm01node11

Use lsdef to check each entry to validate the hcp, slotid, and hsp attributes:

    [root@c870f3ap01 ~]# lsdef cmm01node01
    Object name: cmm01node01
    cons=fsp
    groups=blade,all
    hcp=12.0.0.32
    hwtype=blade
    id=1
    mgt=fsp
    mpa=cmm01
    mtm=789542X
    nodetype=ppc,osi
    parent=cmm01
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    serial=10F752A
    slotid=1

Run rscan -u to discover all the compute node servers.

The rscan -u will match the xCAT nodes which have been defined in the xCAT database and update them instead of create a new one. It will also provide an error message that specifies if the blade node object is not found in the xCAT database. This type of error should happen when there is a configuration where the chassis contains both single wide and double wide blade configurations. The admin can execute the rmdef command for any unused blade node objects.

    rscan cmm -u

(For "rscan" details see: http://xcat.sourceforge.net/man1/rscan.1.html )

If there are a mixture of single and double wide blade in the chassis, the admin should remove the unused blade objects from the xCAT DB.

    rmdef  <cmmxxnodeyy>

Create object definition of flex blades by discovery using stanza files

This method is suggested when there are a different mix of flex blades being used in the flex blade cluster.

The rscan command reads the actual configuration of blade server in the CMM and creates node definitions in the xCAT database to reflect them. This command will create node objects for the target CMM, and the flex blades on the CMM in a stanza file. The admin should manually update the different nodes objects to specify the proper node names they want to use in the xCAT cluster. The admin may also want to change the hcp=<FSP IP> to be a different IP address than what was provided by DHCP server. If the CMM node object is already created, you can remove the CMM entries from the stanza file. You may need to add the "id=0" attribute to cmm objects.

There are unique differences between System P and System X Flex blade node objects working with rscan command. The big differences are the following attributes.

For System P Flex blades

    mgt=fsp
    cons=fsp
    id=1
    slotid=<blade slot>
    hcp=<FSP IP>

For System X Flex blades

    mgt not set,   admin can update  with mgt=ipmi
    cons not set,  admin can update with cons=ipmi
    slotid=<blade slot>
    id attribute is not used
    there is no hcp

Run the rscan command against all of the CMMs to create a stanza file for the definitions of all the compute node servers.

    rscan cmm -z >nodes.stanza

The Power 7 compute node stanza file is like this:

    SN#YL10JH184084:
           objtype=node
           nodetype=ppc,osi
           slotid=1
           id=1
           mtm=789542X
           serial=10F69BA
           mpa=flexcmm01
           parent=flexcmm01
           hcp=70.0.0.41
           groups=blade,all
           mgt=fsp
           cons=fsp
           hwtype=blade
    SN#Y110UF18P003:
           objtype=node
           nodetype=ppc,osi
           slotid=3
           id=1
           mtm=789522X
           serial=10F75AA
           mpa=flexcmm01
           parent=flexcmm01
           hcp=70.0.0.22
           groups=blade,all
           mgt=fsp
           cons=fsp
           hwtype=blade

In a stanza file, the user can get the blade server with the attributes hcp (fsp of the blade), mtm, serial and id attributions. For the stanza file above, the node SN#YL10JH184084 is a power blade(nodetype=ppc,hwtype=blade,mpa=cmm01). In order to easily access or operate those compute node servers, the user can edit the stanza file and give the node the name user want them to be for definition of each compute node server.

For Power 7 compute nodes the administrator will change the object name and hcp attribute for the IP of fsp. For example, the user can modify the definition of SN#YL10JH184084 as followings:

    cmm01node01:
       objtype=node
       cons=fsp
       groups=blade,all
       hcp=70.0.0.41
       hwtype=blade
       slotid=3
       id=1
       mgt=fsp
       mpa=cmm01
       mtm=789542X
       nodetype=ppc,osi
       parent=flexcmm01
       serial=10F69BA
       slotid=1

Then create the definitions in the database:

    cat nodes.stanza | mkdef -z

If CMM node objects are not updated from the target stanza file, make sure that the """id=0""" attribute is updated for the CMMs.

     chdef cmm  id=0

Set the network configuration for the fsp

The FSP for the System P flex blade will initially be setup as a dynamic IP address. The admin can choose to use this IP, or has the option to change it to another static IP address in the service VLAN. This FSP IP is controlled by the ""hcp"" attribute for the node. You can use mkdef/chdef or rscan to update the hcp entries to set the proper FSP IP addresses. The rspconfig command with the network=* option will set the FSP IP address to the value you specified in the hcp attribute.

    chdef  cmm01node01 hcp=12.0.0.101
    rspconfig blade network=*

Modify blade server device names

In order to conveniently manage the blade servers, the customer may want to have a cleaner name for the blade node. The following command can be used to modify a blade device name.

    rspconfig singlenode textid="cmm01node01"

The following command can be used to change a group of blade device name to the node names that are defined in xCAT DB.

    rspconfig blade textid=*

Create the hardware server connection for the IBM Flex power 7 server

1. Add the server's connections for the DFM management:

    mkhwconn blade -t

2. check the connections are LINE_UP:

    lshwconn blade

3. make sure the server powered on

    rpower blade state
    rpower blade on

Update the FSP firmware (optional)

This is accomplished by using the rflash xCAT command from the xCAT Management node. The admin should download the supported GFW from the IBM Fix central website, and place it in a directory that is available to be read by the xCAT Management node. The default firmware option with rflash is working with "disruptive". Since the Flex blades work with DFM, the admin may use the rflash "deferred" firmware option which is listed in the Appendix.

1. Use rinv command to get the current firmware levels of the blades' FSPs:

    rinv bladenoderange firm

(For "rinv" details see: http://xcat.sourceforge.net/man1/rinv.1.html )

2.Use the rflash command to update the firmware levels for the blades' FSPs. Then validate that the new firmware is loaded:

For firmware disruptive update, you should make sure the blade in power off state firstly.

     rpower bladenoderange off

And then use rflash to do the update:

    rflash bladenoderange -p <directory> --activate disruptive

(For "rflash" details see: http://xcat.sourceforge.net/man1/rflash.1.html )

    rinv bladenoderange firm

Note: If there is an error during the rflash update where the firmware is not loaded properly, you ran reference the firmware recovery procedure at the following xCAT document location.
XCAT_Power_775_Hardware_Management/#recover-the-system-from-a-pp-situation-because-of-the-failed-firmware-update.

3. Verify that the blades are healthy, then power on and boot up the blades:

    rpower bladenoderange state
    rvitals bladenoderange lcds
    rpower bladenoderange on

(For "rvitals" details see: http://xcat.sourceforge.net/man1/rvitals.1.html )

Prepare for Node Deployment

rcons configuration

It is important that the admin disable the Serial Over Lan (SOL) support on the CMM, so that xCAT DFM can control the remote console for the System P flex blades:

    rspconfig cmm solcfg=disable

Update conserver configuration

    makeconservercf

Check rcons. Before running rcons to open the console, make sure the Power 7 Servers are on:

    rpower blade state  # if any of the nodes are off, then run...
    rpower blade on

    rcons onebladenode

**Set the 'getmac' attribute to 'blade' **

    chdef blade getmac=blade

Update the mac table with the MAC address of Each Blade

In order to successfully deploy the OS you need to get the MAC for each blades in-band NIC that is connected to the management network and store it in the blade node object.

You can display all of the MACs for blades:

    # getmacs cmm01node11 -d
    MAC Address 1: 34:40:b5:be:c0:08
    MAC Address 2: 34:40:b5:be:c0:0c

To get the first MAC for each blade and store it in the database:

    getmacs blade

If you want to use the MAC for an adapter other than the first one, use the -i option of getmacs. For example:

    getmacs blade -i eth1

To display the MACs just collected:

    # lsdef blade -ci mac
    cmm01node01: mac=34:40:b5:be:c0:08
    ...

Set the Boot String for Each Blade

Ensure the blades are powered to onstandby (already done when collecting the MAC addresses).

    rpower blade onstandby

Then run rbootseq to set the blades to boot from the network first:

    rbootseq blade net

After using rbootseq to set the boot string, you should run rpower with reset to make the boot string permanent:

    rpower blade reset

Note: you can leave the blades always booting from the network first. Even for stateful nodes that have already been installed with a valid boot image on their hard disk, they will contact DHCP on the xCAT management node and it will instruct the nodes to boot from their hard disk.

Update the IBM Flex Power 7 Server firmware

1. Use rinv command to get the current firmware levels of the IBM Flex Power 7 Server:

    rinv bladenoderange firm (output to be added here)

2.Use the rflash command to update the firmware levels for the IBM Flex Power 7 Server. Then validate that the new firmware is loaded:

For firmware disruptive update, you should make sure the server in power off state firstly.

     rpower bladenoderange state
     rpower bladenoderange off

And then use rflash to do the update:

    rflash bladenoderange -p <directory> --activate disruptive
    (output to be added here)
    rinv bladenoderange firm

3. Verify that the blades are healthy and power on the servers:

    rpower bladenoderange state
    rpower bladenoderange on

Deploying an OS on the Blades

If you want to define one or more stateless (diskless) OS images and boot the nodes with those, see section XCAT_system_p_support_for_IBM_Flex/#deploying-stateless-nodes. This method has the advantage of managing the images in a central place, and having only one image per node type.
In you want to install your nodes as stateful (diskful) nodes, follow section XCAT_system_p_support_for_IBM_Flex/#installing-stateful-nodes.
If you want to have nfs-root statelite nodes, see [XCAT_Linux_Statelite]. This has the same advantage of managing the images from a central place. It has the added benefit of using less memory on the node while allowing larger images. But it has the drawback of making the nodes dependent on the management node or service nodes (i.e. if the management/service node goes down, the compute nodes booted from it go down too).
If you have a very large cluster (more than 500 nodes), at this point you should follow [Setting_Up_a_Linux_Hierarchical_Cluster] to install and configure your service nodes. After that you can return here to install or diskless boot your compute nodes.

Deploying Stateless Nodes

Note: this section is included from another document. Some of the examples refer to "x86_64". Just substitute "ppc64" instead.

Note: this section describes how to create a stateless image using the genimage command to install a list of rpms into the image. As an alternative, you can also capture an image from a running node and create a stateless image out of it. See [Capture_Linux_Image] for details.

Create the Distro Repository on the MN
Using an osimage Definition
Select or Create an osimage Definition
Set up pkglists
Set up a postinstall script (optional)
Set up Files to be synchronized on the nodes
Configure the nodes to use your osimage
Generate and pack your image
Boot the nodes

Create the Distro Repository on the MN

The copycds command copies the contents of the linux distro media to /install/<os>/<arch> so that it will be available to install nodes with or create diskless images.

Obtain the Redhat or SLES ISOs or DVDs.
If using an ISO, copy it to (or NFS mount it on) the management node, and then run:

copycds <path>/RHEL6.2-Server-20080430.0-x86_64-DVD.iso
If using a DVD, put it in the DVD drive of the management node and run:

copycds /dev/dvd # or whatever the device name of your dvd drive is

Tip: if this is the same distro version as your management node, create a .repo file in /etc/yum.repos.d with content similar to:

[local-rhels6.2-x86_64]
name=xCAT local rhels 6.2
baseurl=file:/install/rhels6.2/x86_64
enabled=1
gpgcheck=0

This way, if you need some additional RPMs on your MN at a later, you can simply install them using yum. Or if you are installing other software on your MN that requires some additional RPMs from the disto, they will automatically be found and installed.

Using an osimage Definition

Note: To use an osimage as your provisioning method, you need to be running xCAT 2.6.6 or later.

The provmethod attribute of your nodes should contain the name of the osimage object definition that is being used for those nodes. The osimage object contains paths for pkgs, templates, kernels, etc. If you haven't already, run copycds to copy the distro rpms to /install. Default osimage objects are also defined when copycds is run. To view the osimages:

    lsdef -t osimage          # see the list of osimages
    lsdef -t osimage <osimage-name>
          # see the attributes of a particular osimage

Select or Create an osimage Definition

From the list found above, select the osimage for your distro, architecture, provisioning method (install, netboot, statelite), and profile (compute, service, etc.). Although it is optional, we recommend you make a copy of the osimage, changing its name to a simpler name. For example:

    lsdef -t osimage -z rhels6.3-x86_64-netboot-compute | sed 's/^[^ ]\+:/mycomputeimage:/' | mkdef -z

This displays the osimage "rhels6.3-x86_64-netboot-compute" in a format that can be used as input to mkdef, but on the way there it uses sed to modify the name of the object to "mycomputeimage".

Initially, this osimage object points to templates, pkglists, etc. that are shipped by default with xCAT. And some attributes, for example otherpkglist and synclists, won't have any value at all because xCAT doesn't ship a default file for that. You can now change/fill in any osimage attributes that you want. A general convention is that if you are modifying one of the default files that an osimage attribute points to, copy it into /install/custom and have your osimage point to it there. (If you modify the copy under /opt/xcat directly, it will be over-written the next time you upgrade xCAT.) An important attribute to change is the rootimgdir which will contain the generated osimage files so that you don't over-write an image built with the shipped definitions. To continue the previous example:

      chdef -t osimage -o mycomputeimage rootimgdir=/install/netboot/rhels6.3/x86_64/mycomputeimage

Set up pkglists

You likely want to customize the main pkglist for the image. This is the list of rpms or groups that will be installed from the distro. (Other rpms that they depend on will be installed automatically.) For example:

    mkdir -p /install/custom/netboot/rh
    cp -p /opt/xcat/share/xcat/netboot/rh/compute.rhels6.x86_64.pkglist /install/custom/netboot/rh
    vi /install/custom/netboot/rh/compute.rhels6.x86_64.pkglist
    chdef -t osimage mycomputeimage pkglist=/install/custom/netboot/rh/compute.rhels6.x86_64.pkglist

The goal is to install the fewest number of rpms that still provides the function and applications that you need, because the resulting ramdisk will use real memory in your nodes.

Also, check to see if the default exclude list excludes all files and directories you do not want in the image. The exclude list enables you to trim the image after the rpms are installed into the image, so that you can make the image as small as possible.

    cp /opt/xcat/share/xcat/netboot/rh/compute.exlist /install/custom/netboot/rh
    vi /install/custom/netboot/rh/compute.exlist 
    chdef -t osimage mycomputeimage exlist=/install/custom/netboot/rh/compute.exlist

Make sure nothing is excluded in the exclude list that you need on the node. For example, if you require perl on your nodes, remove the line "./usr/lib/perl5*".

Installing OS Updates By Setting linuximage.pkgdir(only support for rhels and sles)

The linuximage.pkgdir is the name of the directory where the distro packages are stored. It can be set to multiple paths. The multiple paths must be separated by ",". The first path is the value of osimage.pkgdir and must be the OS base pkg directory path, such as pkgdir=/install/rhels6.5/x86_64,/install/updates/rhels6.5/x86_64 . In the os base pkg path, there is default repository data. In the other pkg path(s), the users should make sure there is repository data. If not, use "createrepo" command to create them.

If you have additional os update rpms (rpms may be come directly the os website, or from one of the os supplemental/SDK DVDs) that you also want installed, make a directory to hold them, create a list of the rpms you want installed, and add that information to the osimage definition:

Create a directory to hold the additional rpms:

    mkdir -p /install/updates/rhels6.5/x86_64 
    cd /install/updates/rhels6.5/x86_64 
    cp /myrpms/* .

OR, if you have a supplemental or SDK iso image that came with your OS distro, you can use copycds:

    copycds RHEL6.5-Supplementary-DVD1.iso -n rhels6.5-supp

If there is no repository data in the directory, you can run "createrepo" to create it:

    createrepo .

The createrepo command is in the createrepo rpm, which for RHEL is in the 1st DVD, but for SLES is in the SDK DVD.

NOTE: when the management node is rhels6.x, and the otherpkgs repository data is for rhels5.x, we should run createrepo with "-s md5". Such as:

    createrepo -s md5 .

Append the additional packages to install into the corresponding pkglist. For example, in /install/custom/install/rh/compute.rhels6.x86_64.pkglist, append:

    ...
    myrpm1
    myrpm2
    myrpm3

Remember, if you add more rpms at a later time, you must run createrepo again.

If not already specified, set the custom pkglist file in your osimage definition:

    chdef -t osimage mycomputeimage pkglist=/install/custom/install/rh/compute.rhels6.x86_64.pkglist

Add the new directory to the list of package directories in your osimage definition:

    chdef -t osimage mycomputeimage -p pkgdir=/install/updates/rhels6.5/x86_64

OR, if you used copycds:

    chdef -t osimage mycomputeimage -p pkgdir=/install/rhels6.5-supp/x86_64

Note: After making the above changes,

For diskfull install, run "nodeset <noderange> mycomputeimage" to pick up the changes, and then boot up the nodes
For diskless or statelite, run genimage to install the packages into the image, and then packimage or liteimg and boot up the nodes.
If the nodes are up, run "updatenode <noderange> ospkgs" to update the packages.
These functions are only supported for rhels6.x and sles11.x

Installing Additional Packages Using an Otherpkgs Pkglist

If you have additional rpms (rpms not in the distro) that you also want installed, make a directory to hold them, create a list of the rpms you want installed, and add that information to the osimage definition:

Create a directory to hold the additional rpms:

mkdir -p /install/post/otherpkgs/rh/x86_64
cd /install/post/otherpkgs/rh/x86_64
cp /myrpms/* .
createrepo .

NOTE: when the management node is rhels6.x, and the otherpkgs repository data is for rhels5.x, we should run createrepo with "-s md5". Such as:

createrepo -s md5 .

Create a file that lists the additional rpms that should be installed. For example, in /install/custom/netboot/rh/compute.otherpkgs.pkglist put:

myrpm1
myrpm2
myrpm3
Add both the directory and the file to the osimage definition:

chdef -t osimage mycomputeimage otherpkgdir=/install/post/otherpkgs/rh/x86_64 otherpkglist=/install/custom/netboot/rh/compute.otherpkgs.pkglist

If you add more rpms at a later time, you must run createrepo again. The createrepo command is in the createrepo rpm, which for RHEL is in the 1st DVD, but for SLES is in the SDK DVD.

If you have multiple sets of rpms that you want to keep separate to keep them organized, you can put them in separate sub-directories in the otherpkgdir. If you do this, you need to do the following extra things, in addition to the steps above:

Run createrepo in each sub-directory
In your otherpkgs.pkglist, list at least 1 file from each sub-directory. (During installation, xCAT will define a yum or zypper repository for each directory you reference in your otherpkgs.pkglist.) For example:

xcat/xcat-core/xCATsn
xcat/xcat-dep/rh6/x86_64/conserver-xcat

There are some examples of otherpkgs.pkglist in /opt/xcat/share/xcat/netboot/<distro>/service.*.otherpkgs.pkglist that show the format.

Note: the otherpkgs postbootscript should by default be associated with every node. Use lsdef to check:

lsdef node1 -i postbootscripts

If it is not, you need to add it. For example, add it for all of the nodes in the "compute" group:

chdef -p -t group compute postbootscripts=otherpkgs

Set up a postinstall script (optional)

Postinstall scripts for diskless images are analogous to postscripts for diskfull installation. The postinstall script is run by genimage near the end of its processing. You can use it to do anything to your image that you want done every time you generate this kind of image. In the script you can install rpms that need special flags, or tweak the image in some way. There are some examples shipped in /opt/xcat/share/xcat/netboot/<distro>. If you create a postinstall script to be used by genimage, then point to it in your osimage definition. For example:

    chdef -t osimage mycomputeimage postinstall=/install/custom/netboot/rh/compute.postinstall

Set up Files to be synchronized on the nodes

Note: This is only supported for stateless nodes in xCAT 2.7 and above.

Sync lists contain a list of files that should be sync'd from the management node to the image and to the running nodes. This allows you to have 1 copy of config files for a particular type of node and make sure that all those nodes are running with those config files. The sync list should contain a line for each file you want sync'd, specifying the path it has on the MN and the path it should be given on the node. For example:

    /install/custom/syncfiles/compute/etc/motd -> /etc/motd
    /etc/hosts -> /etc/hosts

If you put the above contents in /install/custom/netboot/rh/compute.synclist, then:

    chdef -t osimage mycomputeimage synclists=/install/custom/netboot/rh/compute.synclist

For more details, see Sync-ing_Config_Files_to_Nodes.

Configure the nodes to use your osimage

You can configure any noderange to use this osimage. In this example, we define that the whole compute group should use the image:

     chdef -t group compute provmethod=mycomputeimage

Now that you have associated an osimage with nodes, if you want to list a node's attributes, including the osimage attributes all in one command:

    lsdef node1 --osimage

Generate and pack your image

There are other attributes that can be set in your osimage definition. See the osimage man page for details.

Building an Image for a Different OS or Architecture

If you are building an image for a different OS/architecture than is on the Management node, you need to follow this process: [Building_a_Stateless_Image_of_a_Different_Architecture_or_OS]. Note: different OS in this case means, for example, RHEL 5 vs. RHEL 6. If the difference is just an update level/service pack (e.g. RHEL 6.0 vs. RHEL 6.3), then you can build it on the MN.

Building an Image for the Same OS and Architecture as the MN

If the image you are building is for nodes that are the same OS and architecture as the management node (the most common case), then you can follow the instructions here to run genimage on the management node.

Run genimage to generate the image based on the mycomputeimage definition:

    genimage mycomputeimage

Before you pack the image, you have the opportunity to change any files in the image that you want to, by cd'ing to the rootimgdir (e.g. /install/netboot/rhels6/x86_64/compute/rootimg). Although, instead, we recommend that you make all changes to the image via your postinstall script, so that it is repeatable.

The genimage command creates /etc/fstab in the image. If you want to, for example, limit the amount of space that can be used in /tmp and /var/tmp, you can add lines like the following to it (either by editing it by hand or via the postinstall script):

    tmpfs   /tmp     tmpfs    defaults,size=50m             0 2
    tmpfs   /var/tmp     tmpfs    defaults,size=50m       0 2

But probably an easier way to accomplish this is to create a postscript to be run when the node boots up with the following lines:

    logger -t xcat "$0: BEGIN"
    mount -o remount,size=50m /tmp/
    mount -o remount,size=50m /var/tmp/
    logger -t xcat "$0: END"

Assuming you call this postscript settmpsize, you can add this to the list of postscripts that should be run for your compute nodes by:

    chdef -t group compute -p postbootscripts=settmpsize

Now pack the image to create the ramdisk:

    packimage mycomputeimage

Installing a New Kernel in the Stateless Image

Note: This procedure assumes you are using xCAT 2.6.1 or later.

The kerneldir attribute in linuximage table can be used to assign a directory containing kernel RPMs that can be installed into stateless/statelite images. The default for kernerdir is /install/kernels. To add a new kernel, create a directory named <kernelver> under the kerneldir, and genimage will pick them up from there.

The following examples assume you have the kernel RPM in /tmp and is using the default value for kerneldir (/install/kernels).

The RPM names below are only examples, substitute your specific level and architecture.

[RHEL]:

The RPM kernel package is usually named: kernel-<kernelver>.rpm.
For example, kernel-2.6.32.10-0.5.x86_64.rpm means kernelver=2.6.32.10-0.5.x86_64.

    mkdir -p /install/kernels/2.6.32.10-0.5.x86_64
    cp /tmp/kernel-2.6.32.10-0.5.x86_64.rpm /install/kernels/2.6.32.10-0.5.x86_64/
    createrepo /install/kernels/2.6.32.10-0.5.x86_64/

Run genimage/packimage to update the image with the new kernel.
Note: If downgrading the kernel, you may need to first remove the rootimg directory.

    genimage <imagename> -k 2.6.32.10-0.5.x86_64
    packimage <imagename>

[SLES]:

The RPM kernel package is usually separated into two parts: kernel-<arch>-base and kernel<arch>.
For example, /tmp contains the following two RPMs:

    kernel-ppc64-base-2.6.27.19-5.1.x86_64.rpm
    kernel-ppc64-2.6.27.19-5.1.x86_64.rpm

2.6.27.19-5.1.x86_64 is NOT the kernel version, 2.6.27.19-5-x86_64 is the kernel version.
The "5.1.x86_64" is replaced with "5-x86_64".

    mkdir -p /install/kernels/2.6.27.19-5-x86_64/
    cp /tmp/kernel-ppc64-base-2.6.27.19-5.1.x86_64.rpm /install/kernels/2.6.27.19-5-x86_64/
    cp /tmp/kernel-ppc64-2.6.27.19-5.1.x86_64.rpm /install/kernels/2.6.27.19-5-x86_64/

Run genimage/packimage to update the image with the new kernel.
Note: If downgrading the kernel, you may need to first remove the rootimg directory.

Since the kernel version name is different from the kernel rpm package name, the -g flag MUST to be specified on the genimage command.

    genimage <imagename> -k 2.6.27.19-5-x86_64 -g 2.6.27.19-5.1
    packimage <imagename>

Installing New Kernel Drivers to Stateless Initrd

The kernel drivers in the stateless initrd are used for the devices during the netboot. If you are missing one or more kernel drivers for specific devices (especially for the network device), the netboot process will fail. xCAT offers two approaches to add additional drivers to the stateless initrd during the running of genimage.

Use the '-n' flag to add new drivers to the stateless initrd

    genimage <imagename> -n <new driver list>

Generally, the genimage command has a default driver list which will be added to the initrd. But if you specify the '-n' flag, the default driver list will be replaced with your <new driver list>. That means you need to include any drivers that you need from the default driver list into your <new driver list>.

The default driver list:

    rh-x86:   tg3 bnx2 bnx2x e1000 e1000e igb mlx_en virtio_net be2net
    rh-ppc:   e1000 e1000e igb ibmveth ehea
    sles-x86: tg3 bnx2 bnx2x e1000 e1000e igb mlx_en be2net
    sels-ppc: tg3 e1000 e1000e igb ibmveth ehea be2net

Note: With this approach, xCAT will search for the drivers in the rootimage. You need to make sure the drivers have been included in the rootimage before generating the initrd. You can install the drivers manually in an existing rootimage (using chroot) and run genimage again, or you can use a postinstall script to install drivers to the rootimage during your initial genimage run.

Use the driver rpm package to add new drivers from rpm packages to the stateless initrd

Refer to the doc Using_Linux_Driver_Update_Disk#Driver_RPM_Package.

Boot the nodes

    nodeset compute osimage=mycomputeimage

(If you need to update your diskless image sometime later, change your osimage attributes and the files they point to accordingly, and then rerun genimage, packimage, nodeset, and boot the nodes.)

Now boot your nodes...

    rpower blade boot

Installing Stateful Nodes

Note: this section is included from another document. Some of the examples refer to "x86_64". Just substitute "ppc64" instead. Also, the rsetboot command is not necessary with Flex Power 7 blades.

This section describes deploying stateful nodes.
There are two options to install your nodes as stateful (diskful) nodes:

Use ISOs or DVDs, follow the Option 1 , Installing Stateful Nodes using ISOs or DVDs below.
Clone new nodes from a pre-installed/pre-configured node, follow the Option 2,Installing Stateful Nodes Using Sysclone below.

Option 1: Installing Stateful Nodes Using ISOs or DVDs
Option 2: Installing Stateful Nodes Using Sysclone
- Install or Configure the Golden Client
- **Capture image from the Golden Client **

Option 1: Installing Stateful Nodes Using ISOs or DVDs

This section describes the process for setting up xCAT to install nodes; that is how to install an OS on the disk of each node.

Create the Distro Repository on the MN

The copycds command copies the contents of the linux distro media to /install/<os>/<arch> so that it will be available to install nodes with or create diskless images.

Obtain the Redhat or SLES ISOs or DVDs.
If using an ISO, copy it to (or NFS mount it on) the management node, and then run:

    copycds <path>/RHEL6.2-*-Server-x86_64-DVD1.iso

If using a DVD, put it in the DVD drive of the management node and run:

    copycds /dev/dvd       # or whatever the device name of your dvd drive is

Tip: if this is the same distro version as your management node, create a .repo file in /etc/yum.repos.d with content similar to:

    [local-rhels6.2-x86_64]
    name=xCAT local rhels 6.2
    baseurl=file:/install/rhels6.2/x86_64
    enabled=1
    gpgcheck=0

Select or Create an osimage Definition

The copycds command also automatically creates several osimage defintions in the database that can be used for node deployment. To see them:

    lsdef -t osimage          # see the list of osimages
    lsdef -t osimage <osimage-name>          # see the attributes of a particular osimage

From the list above, select the osimage for your distro, architecture, provisioning method (in this case install), and profile (compute, service, etc.). Although it is optional, we recommend you make a copy of the osimage, changing its name to a simpler name. For example:

    lsdef -t osimage -z rhels6.2-x86_64-install-compute | sed 's/^[^ ]\+:/mycomputeimage:/' | mkdef -z

This displays the osimage "rhels6.2-x86_64-install-compute" in a format that can be used as input to mkdef, but on the way there it uses sed to modify the name of the object to "mycomputeimage".

But for now, we will use the default values in the osimage definition and continue on. (If you really want to see examples of modifying/creating the pkglist, template, otherpkgs pkglist, and sync file list, see the section [Using_Provmethod=osimagename]. Most of the examples there can be used for stateful nodes too.)

Install a New Kernel on the Nodes (Optional)

Create a postscript file called (for example) updatekernel:

    vi /install/postscripts/updatekernel

Add the following lines to the file:

    #!/bin/bash
    rpm -Uivh data/kernel-*rpm

Change the permission on the file:

    chmod 755 /install/postscripts/updatekernel

Make the new kernel RPM available to the postscript:

    mkdir /install/postscripts/data
    cp <kernel> /install/postscripts/data

Add the postscript to your compute nodes:

    chdef -p -t group compute postscripts=updatekernel

Now when you install your nodes (done in a step below), it will also update the kernel.

Alternatively, you could install your nodes with the stock kernel, and update the nodes afterward using updatenode and the same postscript above, in this case, you need to reboot your nodes to make the new kernel be effective.

Customize the disk partitioning (Optional)

By default, xCAT will install the operating system on the first disk and with default partitions layout in the node. However, you may choose to customize the disk partitioning during the install process and define a specific disk layout. You can do this in one of two ways:

Partition definition file

You could create a customized osimage partition file, say /install/custom/my-partitions, that contains the disk partitioning definition, then associate the partition file with osimage, the nodeset command will insert the contents of this file directly into the generated autoinst configuration file that will be used by the OS installer.

Create partition file

The partition file must follow the partitioning syntax of the installer(e.g. kickstart for RedHat, AutoYaST for SLES， Preseed for Ubuntu).

Here are examples of the partition file:

RedHat Standard Partitions for IBM Power machines

# Uncomment this PReP line for IBM Power servers
#part None --fstype "PPC PReP Boot" --size 8 --ondisk sda
# Uncomment this efi line for x86_64 servers
#part /boot/efi --size 50 --ondisk /dev/sda --fstype efi
part /boot --size 256 --fstype ext4
part swap --recommended --ondisk sda
part / --size 1 --grow --fstype ext4 --ondisk sda

** RedHat LVM Partitions**

# Uncomment this PReP line for IBM Power servers
#part None --fstype "PPC PReP Boot" --ondisk /dev/sda --size 8
# Uncomment this efi line for x86_64 servers
#part /boot/efi --size 50 --ondisk /dev/sda --fstype efi
part /boot --size 256 --fstype ext4 --ondisk /dev/sda
part swap --recommended --ondisk /dev/sda
part pv.01 --size 1 --grow --ondisk /dev/sda
volgroup system pv.01
logvol / --vgname=system --name=root --size 1 --grow --fstype ext4

** RedHat RAID 1 configuration **

See Use_RAID1_In_xCAT_Cluster for more details.

** x86_64 SLES Standard Partitions**

      <drive>
         <device>/dev/sda</device>
         <initialize config:type="boolean">true</initialize>
         <use>all</use>
         <partitions config:type="list">
           <partition>
             <create config:type="boolean">true</create>
             <filesystem config:type="symbol">swap</filesystem>
             <format config:type="boolean">true</format>
             <mount>swap</mount>
             <mountby config:type="symbol">path</mountby>
             <partition_nr config:type="integer">1</partition_nr>
             <partition_type>primary</partition_type>
             <size>32G</size>
           </partition>
           <partition>
             <create config:type="boolean">true</create>
             <filesystem config:type="symbol">ext3</filesystem>
             <format config:type="boolean">true</format>
             <mount>/</mount>
             <mountby config:type="symbol">path</mountby>
             <partition_nr config:type="integer">2</partition_nr>
             <partition_type>primary</partition_type>
             <size>64G</size>
           </partition>
         </partitions>
       </drive>

** x86_64 SLES LVM Partitions**

<drive>
  <device>/dev/sda</device>
  <initialize config:type="boolean">true</initialize>
  <partitions config:type="list">
    <partition>
      <create config:type="boolean">true</create>
      <crypt_fs config:type="boolean">false</crypt_fs>
      <filesystem config:type="symbol">ext3</filesystem>
      <format config:type="boolean">true</format>
      <loop_fs config:type="boolean">false</loop_fs>
      <mountby config:type="symbol">device</mountby>
      <partition_id config:type="integer">65</partition_id>
      <partition_nr config:type="integer">1</partition_nr>
      <pool config:type="boolean">false</pool>
      <raid_options/>
      <resize config:type="boolean">false</resize>
      <size>8M</size>
      <stripes config:type="integer">1</stripes>
      <stripesize config:type="integer">4</stripesize>
      <subvolumes config:type="list"/>
    </partition>
    <partition>
      <create config:type="boolean">true</create>
      <crypt_fs config:type="boolean">false</crypt_fs>
      <filesystem config:type="symbol">ext3</filesystem>
      <format config:type="boolean">true</format>
      <loop_fs config:type="boolean">false</loop_fs>
      <mount>/boot</mount>
      <mountby config:type="symbol">device</mountby>
      <partition_id config:type="integer">131</partition_id>
      <partition_nr config:type="integer">2</partition_nr>
      <pool config:type="boolean">false</pool>
      <raid_options/>
      <resize config:type="boolean">false</resize>
      <size>256M</size>
      <stripes config:type="integer">1</stripes>
      <stripesize config:type="integer">4</stripesize>
      <subvolumes config:type="list"/>
    </partition>
    <partition>
      <create config:type="boolean">true</create>
      <crypt_fs config:type="boolean">false</crypt_fs>
      <format config:type="boolean">false</format>
      <loop_fs config:type="boolean">false</loop_fs>
      <lvm_group>vg0</lvm_group>
      <mountby config:type="symbol">device</mountby>
      <partition_id config:type="integer">142</partition_id>
      <partition_nr config:type="integer">3</partition_nr>
      <pool config:type="boolean">false</pool>
      <raid_options/>
      <resize config:type="boolean">false</resize>
      <size>max</size>
      <stripes config:type="integer">1</stripes>
      <stripesize config:type="integer">4</stripesize>
      <subvolumes config:type="list"/>
    </partition>
  </partitions>
  <pesize></pesize>
  <type config:type="symbol">CT_DISK</type>
  <use>all</use>
</drive>
<drive>
  <device>/dev/vg0</device>
  <initialize config:type="boolean">true</initialize>
  <partitions config:type="list">
    <partition>
      <create config:type="boolean">true</create>
      <crypt_fs config:type="boolean">false</crypt_fs>
      <filesystem config:type="symbol">swap</filesystem>
      <format config:type="boolean">true</format>
      <loop_fs config:type="boolean">false</loop_fs>
      <lv_name>swap</lv_name>
      <mount>swap</mount>
      <mountby config:type="symbol">device</mountby>
      <partition_id config:type="integer">130</partition_id>
      <partition_nr config:type="integer">5</partition_nr>
      <pool config:type="boolean">false</pool>
      <raid_options/>
      <resize config:type="boolean">false</resize>
      <size>auto</size>
      <stripes config:type="integer">1</stripes>
      <stripesize config:type="integer">4</stripesize>
      <subvolumes config:type="list"/>
    </partition>
    <partition>
      <create config:type="boolean">true</create>
      <crypt_fs config:type="boolean">false</crypt_fs>
      <filesystem config:type="symbol">ext3</filesystem>
      <format config:type="boolean">true</format>
      <loop_fs config:type="boolean">false</loop_fs>
      <lv_name>root</lv_name>
      <mount>/</mount>
      <mountby config:type="symbol">device</mountby>
      <partition_id config:type="integer">131</partition_id>
      <partition_nr config:type="integer">1</partition_nr>
      <pool config:type="boolean">false</pool>
      <raid_options/>
      <resize config:type="boolean">false</resize>
      <size>max</size>
      <stripes config:type="integer">1</stripes>
      <stripesize config:type="integer">4</stripesize>
      <subvolumes config:type="list"/>
    </partition>
  </partitions>
  <pesize></pesize>
  <type config:type="symbol">CT_LVM</type>
  <use>all</use>
</drive>

** ppc64 SLES Standard Partitions**

    <drive>
      <device>/dev/sda</device>
      <initialize config:type="boolean">true</initialize>
      <partitions config:type="list">
        <partition>
          <create config:type="boolean">true</create>
          <crypt_fs config:type="boolean">false</crypt_fs>
          <filesystem config:type="symbol">ext3</filesystem>
          <format config:type="boolean">false</format>
          <loop_fs config:type="boolean">false</loop_fs>
          <mountby config:type="symbol">device</mountby>
          <partition_id config:type="integer">65</partition_id>
          <partition_nr config:type="integer">1</partition_nr>
          <resize config:type="boolean">false</resize>
          <size>auto</size>
        </partition>
        <partition>
          <create config:type="boolean">true</create>
          <crypt_fs config:type="boolean">false</crypt_fs>
          <filesystem config:type="symbol">swap</filesystem>
          <format config:type="boolean">true</format>
          <fstopt>defaults</fstopt>
          <loop_fs config:type="boolean">false</loop_fs>
          <mount>swap</mount>
          <mountby config:type="symbol">id</mountby>
          <partition_id config:type="integer">130</partition_id>
          <partition_nr config:type="integer">2</partition_nr>
          <resize config:type="boolean">false</resize>
          <size>auto</size>
        </partition>
        <partition>
          <create config:type="boolean">true</create>
          <crypt_fs config:type="boolean">false</crypt_fs>
          <filesystem config:type="symbol">ext3</filesystem>
          <format config:type="boolean">true</format>
          <fstopt>acl,user_xattr</fstopt>
          <loop_fs config:type="boolean">false</loop_fs>
          <mount>/</mount>
          <mountby config:type="symbol">id</mountby>
          <partition_id config:type="integer">131</partition_id>
          <partition_nr config:type="integer">3</partition_nr>
          <resize config:type="boolean">false</resize>
          <size>max</size>
        </partition>
      </partitions>
      <pesize></pesize>
      <type config:type="symbol">CT_DISK</type>
      <use>all</use>
    </drive>

** SLES RAID 1 configuration **

See Use_RAID1_In_xCAT_Cluster for more details.

** Ubuntu standard partition configuration on PPC64le **

8 1 32 prep
        $primary{ }
        $bootable{ }
        method{ prep } .

256 256 512 ext3
        $primary{ }
        method{ format }
        format{ }
        use_filesystem{ }
        filesystem{ ext3 }
        mountpoint{ /boot } .

64 512 300% linux-swap
        method{ swap }
        format{ } .

512 1024 4096 ext3
        $primary{ }
        method{ format }
        format{ }
        use_filesystem{ }
        filesystem{ ext4 }
        mountpoint{ / } .

100 10000 1000000000 ext3
        method{ format }
        format{ }
        use_filesystem{ }
        filesystem{ ext4 }
        mountpoint{ /home } .

** Ubuntu standard partition configuration on X86_64 **

256 256 512 vfat
        $primary{ }
        method{ format }
        format{ }
        use_filesystem{ }
        filesystem{ vfat }
        mountpoint{ /boot/efi } .

256 256 512 ext3
        $primary{ }
        method{ format }
        format{ }
        use_filesystem{ }
        filesystem{ ext3 }
        mountpoint{ /boot } .

64 512 300% linux-swap
        method{ swap }
        format{ } .

512 1024 4096 ext3
        $primary{ }
        method{ format }
        format{ }
        use_filesystem{ }
        filesystem{ ext4 }
        mountpoint{ / } .

100 10000 1000000000 ext3
        method{ format }
        format{ }
        use_filesystem{ }
        filesystem{ ext4 }
        mountpoint{ /home } .

If none of these examples could be used in your cluster, you could refer to the Kickstart documentation or Autoyast documentation or Preseed documentation to write your own partitions layout. Meanwhile, RedHat and SuSE provides some tools that could help generate kickstart/autoyast templates, in which you could refer to the partition section for the partitions layout information:

RedHat:
- The file /root/anaconda-ks.cfg is a sample kickstart file created by RedHat installer during the installation process based on the options that you selected.
- system-config-kickstart is a tool with graphical interface for creating kickstart files
SLES
- Use yast2 autoyast in GUI or CLI mode to customize the installation options and create autoyast file
- Use yast2 clone_system to create autoyast configuration file /root/autoinst.xml to clone an existing system
Ubuntu
- For detailed information see the files partman-auto-recipe.txt and partman-auto-raid-recipe.txt included in the debian-installer package. Both files are also available from the debian-installer source repository. Note that the supported functionality may change between releases.

Associate partition file with osimage

      chdef -t osimage <osimagename> partitionfile=/install/custom/my-partitions
      nodeset <nodename> osimage=<osimage>

For Redhat, when nodeset runs and generates the /install/autoinst file for a node, it will replace the #XCAT_PARTITION_START#...#XCAT_PARTITION_END# directives from your osimage template with the contents of your custom partitionfile.

For Ubuntu, when nodeset runs and generates the /install/autoinst file for a node, it will generate a script to write the partition configuration to /tmp/partitionfile, this script will replace the #XCA_PARTMAN_RECIPE_SCRIPT# directive in /install/autoinst/<node>.pre. </node>

Partitioning definition script(for RedHat and Ubuntu)

Create a shell script that will be run on the node during the install process to dynamically create the disk partitioning definition. This script will be run during the OS installer %pre script on Redhat or preseed/early_command on Unbuntu execution and must write the correct partitioning definition into the file /tmp/partitionfile on the node.

Create partition script

The purpose of the partition script is to create the /tmp/partionfile that will be inserted into the kickstart/autoyast/preseed template, the script could include complex logic like select which disk to install and even configure RAID, etc..

Note: the partition script feature is not thoroughly tested on SLES, there might be problems, use this feature on SLES at your own risk.

Here is an example of the partition script on Redhat and SLES, the partitioning script is /install/custom/my-partitions.sh:

instdisk="/dev/sda"

modprobe ext4 >& /dev/null
modprobe ext4dev >& /dev/null
if grep ext4dev /proc/filesystems > /dev/null; then
        FSTYPE=ext3
elif grep ext4 /proc/filesystems > /dev/null; then
        FSTYPE=ext4
else
        FSTYPE=ext3
fi
BOOTFSTYPE=ext3
EFIFSTYPE=vfat
if uname -r|grep ^3.*el7 > /dev/null; then
    FSTYPE=xfs
    BOOTFSTYPE=xfs
    EFIFSTYPE=efi
fi

if [ `uname -m` = "ppc64" ]; then
        echo 'part None --fstype "PPC PReP Boot" --ondisk '$instdisk' --size 8' >> /tmp/partitionfile
fi
if [ -d /sys/firmware/efi ]; then
    echo 'bootloader --driveorder='$instdisk >> /tmp/partitionfile
        echo 'part /boot/efi --size 50 --ondisk '$instdisk' --fstype $EFIFSTYPE' >> /tmp/partitionfile
else
    echo 'bootloader' >> /tmp/partitionfile
fi

echo "part /boot --size 512 --fstype $BOOTFSTYPE --ondisk $instdisk" >> /tmp/partitionfile
echo "part swap --recommended --ondisk $instdisk" >> /tmp/partitionfile
echo "part / --size 1 --grow --ondisk $instdisk --fstype $FSTYPE" >> /tmp/partitionfile

The following is an example of the partition script on Ubuntu, the partitioning script is /install/custom/my-partitions.sh:

if [ -d /sys/firmware/efi ]; then
    echo "ubuntu-efi ::" > /tmp/partitionfile
    echo "    512 512 1024 fat16" >> /tmp/partitionfile
    echo '    $iflabel{ gpt } $reusemethod{ } method{ efi } format{ }' >> /tmp/partitionfile
    echo "    ." >> /tmp/partitionfile
else
    echo "ubuntu-boot ::" > /tmp/partitionfile
    echo "100 50 100 ext3" >> /tmp/partitionfile
    echo '    $primary{ } $bootable{ } method{ format } format{ } use_filesystem{ } filesystem{ ext3 } mountpoint{ /boot }' >> /tmp/partitionfile
    echo "    ." >> /tmp/partitionfile
fi
echo "500 10000 1000000000 ext3" >> /tmp/partitionfile
echo "    method{ format } format{ } use_filesystem{ } filesystem{ ext3 } mountpoint{ / }" >> /tmp/partitionfile
echo "    ." >> /tmp/partitionfile
echo "2048 512 300% linux-swap" >> /tmp/partitionfile
echo "    method{ swap } format{ }" >> /tmp/partitionfile
echo "    ." >> /tmp/partitionfile

Associate partition script with osimage:

         chdef -t osimage <osimagename> partitionfile='s:/install/custom/my-partitions.sh'
         nodeset <nodename> osimage=<osimage>

Note: the 's:' preceding the filename tells nodeset that this is a script.
For Redhat, when nodeset runs and generates the /install/autoinst file for a node, it will add the execution of the contents of this script to the %pre section of that file. The nodeset command will then replace the #XCAT_PARTITION_START#...#XCAT_PARTITION_END# directives from the osimage template file with "%include /tmp/partitionfile" to dynamically include the tmp definition file your script created.
For Ubuntu, when nodeset runs and generates the /install/autoinst file for a node, it will replace the "#XCA_PARTMAN_RECIPE_SCRIPT#" directive and add the execution of the contents of this script to the /install/autoinst/<node>.pre, the /install/autoinst/<node>.pre script will be run in the preseed/early_command.</node></node>

Partitioning disk file(For Ubuntu only)

The disk file contains the name of the disks to partition in traditional, non-devfs format and delimited with space “ ”, for example,

/dev/sda /dev/sdb

If not specified, the default value will be used.

Associate partition disk file with osimage:

         chdef -t osimage <osimagename> -p partitionfile='d:/install/custom/partitiondisk'
         nodeset <nodename> osimage=<osimage>

Note: the 'd:' preceding the filename tells nodeset that this is a partition disk file.
For Ubuntu, when nodeset runs and generates the /install/autoinst file for a node, it will generate a script to write the content of the partition disk file to /tmp/boot_disk, this context to run the script will replace the #XCA_PARTMAN_DISK_SCRIPT# directive in /install/autoinst/<node>.pre. </node>

Partitioning disk script(For Ubuntu only)

The disk script contains a script to generate a partitioning disk file named "/tmp/boot_disk". for example,

    rm /tmp/devs-with-boot 2>/dev/null || true; 
    for d in $(list-devices partition); do 
        mkdir -p /tmp/mymount; 
        rc=0; 
        mount $d /tmp/mymount || rc=$?; 
        if [[ $rc -eq 0 ]]; then 
            [[ -d /tmp/mymount/boot ]] && echo $d >>/tmp/devs-with-boot; 
            umount /tmp/mymount; 
        fi 
    done; 
    if [[ -e /tmp/devs-with-boot ]]; then 
        head -n1 /tmp/devs-with-boot | egrep  -o '\S+[^0-9]' > /tmp/boot_disk; 
        rm /tmp/devs-with-boot 2>/dev/null || true; 
    else 
        DEV=`ls /dev/disk/by-path/* -l | egrep -o '/dev.*[s|h|v]d[^0-9]$' | sort -t : -k 1 -k 2 -k 3 -k 4 -k 5 -k 6 -k 7 -k 8 -g | head -n1 | egrep -o '[s|h|v]d.*$'`; 
        if [[ "$DEV" == "" ]]; then DEV="sda"; fi; 
        echo "/dev/$DEV" > /tmp/boot_disk; 
    fi;

If not specified, the default value will be used.

Associate partition disk script with osimage:

         chdef -t osimage <osimagename> -p partitionfile='s:d:/install/custom/partitiondiskscript'
         nodeset <nodename> osimage=<osimage>

Note: the 's:' prefix tells nodeset that is a script, the 's:d:' preceding the filename tells nodeset that this is a script to generate the partition disk file.
For Ubuntu, when nodeset runs and generates the /install/autoinst file for a node, this context to run the script will replace the #XCA_PARTMAN_DISK_SCRIPT# directive in /install/autoinst/<node>.pre. </node>

Additional preseed configuration file and additional preseed configuration script (For Ubuntu only)

To support other specific partition methods such as RAID or LVM in Ubuntu, some additional preseed configuration entries should be specified, these entries can be specified in 2 ways:

'c:<the absolute path of the additional preseed config file>', the additional preseed config file
 contains the additional preseed entries in "d-i ..." syntax. When "nodeset", the     
#XCA_PARTMAN_ADDITIONAL_CFG# directive in /install/autoinst/<node> will be replaced with 
content of the config file, an example:

d-i partman-auto/method string raid
d-i partman-md/confirm boolean true

's:c:<the absolute path of the additional preseed config script>',  the additional preseed config
 script is a script to set the preseed values with "debconf-set". When "nodeset", the 
#XCA_PARTMAN_ADDITIONAL_CONFIG_SCRIPT# directive in /install/autoinst/<node>.pre will be replaced 
with the content of the script, an example:

debconf-set partman-auto/method string raid
debconf-set partman-md/confirm boolean true

If not specified, the default value will be used.

Associate additional preseed configuration file or additional preseed configuration script with osimage:

Associate additional preseed configuration file by:

         chdef -t osimage <osimagename> -p partitionfile='c:/install/custom/configfile'
         nodeset <nodename> osimage=<osimage>

Associate additional preseed configuration script by:

         chdef -t osimage <osimagename> -p partitionfile='s:c:/install/custom/configscript'
         nodeset <nodename> osimage=<osimage>

Debug partition script

If the partition script has any problem, the os installation will probably hang, to debug the partition script, you could enable the ssh access in the installer during installation, then login the node through ssh after the installer has started the sshd.
For Redhat, you could specify sshd in the kernel parameter and then kickstart will start the sshd when Anaconda starts, then you could login the node using ssh to debug the problem:

chdef <nodename> addkcmdline="sshd"
nodeset <nodename> osimage=<osimage>

For Ubuntu, you could insert the following preseed entries to /install/autoinst/<node> to tell the debian installer to start the ssh server and wait for you to connect:</node>

d-i anna/choose_modules string network-console
d-i preseed/early_command string anna-install network-console

d-i network-console/password-disabled boolean false
d-i network-console/password           password cluster
d-i network-console/password-again     password cluster

** Note: For the entry "d-i preseed/early_command string anna-install network-console",if there is already a "preseed/early_command" entry in /install/autoinst/<node>, the value "anna-install network-console" should be appended to the existed "preseed/early_command" entry carefully, otherwise, the former will be overwritten. </node>

set the kernel options which will be persistent the installed system(Optional)

The attributes “linuximage.addkcmdline” and “bootparams.addkcmdline” are the interfaces for the user to specify some additional kernel options to be passed to kernel/installer for node deployment.

The added kernel parameters can be 'OS deployment Only' or 'Reboot Only'(Added to the grub2.conf). A specific prefix 'R::' is defined to identify that this parameter is 'Reboot Only'. Otherwise, it's 'OS deployment Only'.

For example, to specify the redhat7 kernel option “net.ifnames=0” to be persistent (Reboot Only), that means it does take effect even after reboot:

chdef -t osimage -o rhels7-ppc64-install-compute -p addkcmdline="R::net.ifnames=0"

Note: The persistent kernel options with prefix 'R::' won't be passed to the OS installer for node deployment. So that means if you want a parameter to be available for both 'OS deployment' and 'Reboot', you need to specify the parameter twice with and without 'R::' prefix.

[SLES] set the netwait kernel parameter (Optional)

If there are quite a few(e.g. 12) network adapters on the SLES compute nodes, the os provisioning progress might hang because that the kernel would timeout waiting for the network driver to initialize. The symptom is the compute node could not find os provisioning repository, the error message is "Please make sure your installation medium is available. Retry?".

To avoid this problem, you could specify the kernel parameter "netwait" to have the kernel wait the network adapters initialization. On a node with 12 network adapters, the netwait=60 did the trick.

  chdef <nodename> -p addkcmdline="netwait=60"

Update the Distro at a Later Time

After the initial install of the distro onto nodes, if you want to update the distro on the nodes (either with a few updates or a new SP) without reinstalling the nodes:

create the new repo using copycds:

    copycds <path>/RHEL6.3-*-Server-x86_64-DVD1.iso

 Or, for just a few updated rpms, you can copy the updated rpms from the distributor into a directory under /install and run createrepo in that directory.

add the new repo to the pkgdir attribute of the osimage:

    chdef -t osimage rhels6.2-x86_64-install-compute -p pkgdir=/install/rhels6.3/x86_64

 Note: the above command will add a 2nd repo to the pkgdir attribute. This is only supported for xCAT 2.8.2 and above. For earlier versions of xCAT, omit the -p flag to replace the existing repo directory with the new one.

run the ospkgs postscript to have yum update all rpms on the nodes

    updatenode compute -P ospkgs

Option 2: Installing Stateful Nodes Using Sysclone

This section describes how to install or configure a diskful node (we call it a golden-client), capture an osimage from this golden-client, then the osimage can be used to install/clone other nodes. See Using_Clone_to_Deploy_Server for more information.

Note: this support is available in xCAT 2.8.2 and above.

Install or Configure the Golden Client

If you want to use the sysclone provisioning method, you need a golden-client. In this way, you can customize and tweak the golden-client’s software and configuration according to your needs, and verify it’s proper operation. Once the image is captured and deployed, the new nodes will behave in the same way the golden-client does.

To install a golden-client, follow the section Installing_Stateful_Linux_Nodes#Option_1:_Installing_Stateful_Nodes_Using_ISOs_or_DVDs.

To install the systemimager rpms on the golden-client, do these steps on the mgmt node:

Download the xcat-dep tarball which includes systemimager rpms. (You might already have the xcat-dep tarball on the mgmt node.)

Go to xcat-dep and get the latest xCAT dependency tarball. Copy the file to the management node and untar it in the appropriate sub-directory of /install/post/otherpkgs. For example:

    (For RH/CentOS):    
    mkdir -p /install/post/otherpkgs/rhels6.3/x86_64/xcat
    cd /install/post/otherpkgs/rhels6.3/x86_64/xcat
    tar jxvf xcat-dep-*.tar.bz2



    (For SLES):  
    mkdir -p /install/post/otherpkgs/sles11.3/x86_64/xcat
    cd /install/post/otherpkgs/sles11.3/x86_64/xcat
    tar jxvf xcat-dep-*.tar.bz2

Add the sysclone otherpkglist file and otherpkgdir to osimage definition that is used for the golden client, and then use updatenode to install the rpms. For example:

    (For RH/CentOS): 
    chdef -t osimage -o <osimage-name> otherpkglist=/opt/xcat/share/xcat/install/rh/sysclone.rhels6.x86_64.otherpkgs.pkglist
    chdef -t osimage -o <osimage-name> -p otherpkgdir=/install/post/otherpkgs/rhels6.3/x86_64
    updatenode <my-golden-cilent> -S



    (For SLES):  
    chdef -t osimage -o <osimage-name> otherpkglist=/opt/xcat/share/xcat/install/sles/sysclone.sles11.x86_64.otherpkgs.pkglist
    chdef -t osimage -o <osimage-name> -p otherpkgdir=/install/post/otherpkgs/sles11.3/x86_64
    updatenode <my-golden-cilent> -S

Capture image from the Golden Client

On the mgmt node, use imgcapture to capture an osimage from the golden-client.

    imgcapture <my-golden-client> -t sysclone -o <mycomputeimage>

Tip: when imgcapture is run, it pulls the osimage from the golden-client, and creates the image files system and a corresponding osimage definition on the xcat management node.

 lsdef -t osimage <mycomputeimage> to check the osimage attributes.

Appendix 1: IBM Flex Recovery and CMM Redundancy

The CMM is the gateway for the hardware management and monitoring communication for the Flex chassis and the Flex P7 blades. If you lose the network communication between the xCAT MN and the primary CMM, you can not execute any hardware management commands to the CMM or blades. If the Flex P7 blades and Ethernet SM are running, the blades should be able to keep running for some time.

Replacement of CMM

If you only have one CMM configured in your Flex chassis, you will need to work with IBM service to fix this CMM quickly, since you will not be able to properly manage the Flex blades until you have a working CMM. The CMM replacement activity is to execute CMM HW discovery on new CMM, where you locate the new MAC address and current DHCP dynamic IP address for CMM. You then update the CMM node object's "mac" and "otherinterfaces" attributes with data found from hardware discovery. Once the CMM node object has new data, we execute the configuration CMM steps working with rspconfig. Once the CMM is configured using the static IP, the DHCP and mac address is not referenced.

The following scenario is to replace the CMM working with node object "cmm01" with a static IP of 10.1.100.1.

Locate new mac and DHCP IP for replacement CMM

     lsslp -m -z -s CMM > /tmp/cmm01.stanza

Update cmm01 object with new mac and current DHCP IP

     chdef cmm01 otherinterfaces=<dhcpip> mac=<macaddr>

Set password for USERID for new cmm0

     rspconfig cmm01 USERID=<new_passwd>  1

Set new cmm01 back to original static IP

     rspconfig cmm01 initnetwork=*

Enable ssh and snmp for new cmm01

     rspconfig cmm01 sshcfg=enable snmpcfg=enable

CMM Redundancy

The recommended support strategy with xCAT is to setup each Flex chassis with 2 CMM's where the primary CMM is located in bay 1 and the standby CMM is in bay 2. Each CMM needs to have their own ethernet connection into the xCAT HW VLAN, and the primary CMM must be configured as a static IP that is listed in the xCAT DB. The xCAT MN only can communicate with the primary CMM when executing hardware management commands. The Standby CMM is only there as a backup, and will take ownership as the primary CMM using the same static IP. The xCAT Flex only supports the default CMM redundancy configuration, and does not support the advanced failover settings. The activity for CMM fail over is that the standby CMM takes over the roll of the primary CMM, and that failed CMM is setup as the standby CMM when registered by the Flex chassis. The xCAT MN will lose it's network connection to the primary CMM during the CMM fail over, but will automatically reconnect back to the new primary CMM when it completes the failover in about 3-4 minutes

The fail over from the primary CMM to the standby CMM happens in the following scenarios.

 Admin executes software failover from the CMM GUI
 Admin executed software failover using the CMM CLI
 Admin physically pulls out primary CMM from the Flex chassis

Fail over software reset from CMM GUI

The admin will have a network connection into the CMM, and has activated the CMM GUI. They will reference the "Mgt Module Management" and select on the "Restart" . The admin selects the "Restart and Switch to Standby Management Module" . This will cause the primary CMM to reset, and will change the setting of primary to the "Standby CMM" which now becomes the new primary CMM when the fail over completes.

Fail over software reset from CMM CLI

The admin will have a network connection into the CMM, and has a ssh connection into primary CMM with USERID from xCAT MN. The admin use CMM CLI command "env -T" to get to the primary CMM then executes command "reset -f" for the CMM failover. This will cause the primary CMM to reset, and will change the setting of primary to the "Standby CMM" which now becomes the new primary CMM when the fail over completes.

     # ssh USERID@cmm01
      Hostname:              cmm01
      Static IP address:     10.0.100.1
      Burned-in MAC address: 5F:FF:FF:FF:FF:FF
      DHCP:                  Disabled - Use static IP configuration.
     system> env -T system:mm[1]
     OK
     system:mm[1]> reset -f

Fail over hardware reset of CMM

The scenario is when there is a physical activity where the primary CMM is pulled from the chassis. There are different reasons why the admin may want pull out the CMM. This could be when the CMM is no longer working properly or there is an issue with the ethernet interface of the primary CMM. At this time when the primary CMM is pulled, it will do an automatic failover to the standby CMM, and the standby CMM is now the primary. The admin can work IBM or network support to understand the CMM or network failure. When the failed CMM is ready, the admin can just plug it in the Flex chassis, and it will now become the new Standby CMM. The admin can schedule a CMM software fail over if they want to swap back to the original CMM primary.

Appendix 2: CMM and Flexible Service Processor(FSP) password

In the IBM Flex chassis the architecture is designed to simplify some aspects of the systems management of the chassis. As part of this goal the IBM Flex system has integrated the CMM USERID and password into the IBM Flex system p compute nodes FSP. This is done through an internal LDAP server on the CMM serving the userids and passwords to LDAP on the FSPs. What this means to the system xCAT administrator is that the CMM USERID is tightly coupled with xCAT DFM authentication of the FSP. xCAT hardware control failures to authenticate on the FSP is likely the result of an issue with the chassis CMM USERID password. This section will provide commands which will help you determine that you have an authentication problem, verify that its an issue with the CMM USERID password, as well as how to resolve the problem.

Errors caused by an FSP authentication problem

The system administrator may first notice a problem with some of the hardware control
commands giving an authentication error.

   rpower cmm01node01 stat
    cmm01node01: Error: state=CEC AUTHENTICATION FAILED, 
       type=02, MTMS=7895-42X*10F752A, sp=primary, slot=A, ipadd=12.0.0.32, 
           alt_ipadd=unavailable

Checking the connection to the FSP shows that the authenication for this FSP is failing:

    lshwconn cmm01node01
    cmm01node01: sp=primary,ipadd=12.0.0.32,alt_ipadd=unavailable,state=CEC 
         AUTHENTICATION FAILED

This could be caused by the USERID password being expired on the CMM. You can check with the following:

    ssh USERID@cmm01 users -T mm[1]
    system> users -T mm[1]
    Users
    =====
    USERID
      Group(s): supervisor
      Max 0 session(s) allowed
      1 active session(s)
      Account is active
      **Password is expired**
      Password is compliant
      Number of SSH public keys installed for this user: 3
    User Permission Groups
    ======================

In order to correct this problem you need to activate the CMM USERID and then remove and add the connections to the FSP.

     ssh USERID@cmm01 accseccfg -pe 0 -T mm[1]

Checking the USERID password is active:

     ssh USERID@cmm01 users -T mm[2]
    system> users -T mm[2]
    Users
    =====
    USERID
      Group(s): supervisor
      Max 0 session(s) allowed
      1 active session(s)
      **Account is active**
      Password does not expire
      Password is compliant
      Number of SSH public keys installed for this user: 3
    User Permission Groups
    ======================

Second you need to remove and add back each FSP connection for this chassis to create new connections:

     rmhwconn cmm01node01
     mkhwconn cmm01node01 -t

The last step is to check the connection:

    lshwconn cmm01node01
    cmm01node01: sp=primary,ipadd=12.0.0.32,alt_ipadd=unavailable,state=LINE UP

Appendix 3: Updating Firmware on Flex Ethernet and IB Switch Modules

This section provides manual procedures to help update the firmware for Ethernet and Infiniband (IB) Switch modules. There is more detail information can be referenced in the IBM Flex System documentation under Network switches: http://publib.boulder.ibm.com/infocenter/flexsys/information/

The IB6131 Switch module is a Mellanox IB switch, and you down load firmware (image-PPC_M460EX-SX_3.2.xxx.img) from the Mellanox website into your xCAT Management Node or server that can communicate to Flex IB6131 switch module. We provided the firmware update procedure for the Mellanox IB switches including IB6131 Switch module in our xCAT document Managing the Mellanox Infiniband Network: Managing_the_Mellanox_Infiniband_Network/#mellanox-switch-and-adapter-firmware-update.

The IBM Flex system supports Ethernet switch modules models (EN2092 (1GB), EN4093 (10GB), and the firmware is available from the IBM Support Portal http://www-947.ibm.com/support/entry/portal/overview?brandind=hardware~puresystems~pureflex_system. The firmware update procedure used with the Flex Ethernet (EN2092) switch module which will reference two firmware images for OS (GbScSE-1G-10G-7.5.1.xx_OS.img) and Boot (GbScSE-1G-10G-7.5.1.x_Boot.img). These images should be placed on the xCAT MN or FTP server in the /tftpboot directory. Make sure that this server has proper ethernet communication to the Ethernet switch module.

Firmware Update using CLI

1) Login to the Ethernet switch using the "admin" userid and specify the admin password.

       ssh admin@<switchipaddr>

2) Get into boot directory, and list current image settings with cur command. This includes 2 OS images called image1 and image2,and will specify which image is the current boot image.

       >> boot
       >> cur

3) Get the new Ethernet OS image file from the ftp server to replace the older image on the ethernet switch using gtimg command. The gtimg command will prompt you for full path OS image file name, ftp/root userid, and password. It will ask to specify "data" port, and a confirmation to complete the download, and flashes the update. An example of EN2092 OS image would be "GbScSE-1G-10G-7.5.1.0_OS.img", and replaces "image2" on the ethernet switch.

       >> gtimg image2 <FTP server> GbScSE-1G-10G-7.5.1.0_OS.img
          Enter name of file on FTP/TFTP server: /tftpboot/GbScSE-1G-10G-7.5.1.0_OS.img
          Enter username for FTP server or hit return for TFTP server: root
          Enter password for username on FTP server:  <root password>
          Enter the port to use for downloading the image ["data"|"mgt"]: "data"
          Confirm download operation [y/n]: y

4) Get the new Ethernet boot image file from the ftp server to replace cuurent boot image on the ethernet switch using gtimg command. The gtimg command will prompt you for full path OS image file name, ftp/root userid, and password. It will ask to specify "data" port, and a confirmation to complete the download, and flashes the update. An example of EN2092 OS image would be "GbScSE-1G-10G-7.5.1.0_Boot.img", and will point to new boot image2.

       >> gtimg image2 <FTP server> GbScSE-1G-10G-7.5.1.0_Boot.img
          Enter name of file on FTP/TFTP server: /tftpboot/GbScSE-1G-10G-7.5.1.0_Boot.img
          Enter username for FTP server or hit return for TFTP server: root
          Enter password for username on FTP server:  <root password>
          Enter the port to use for downloading the image ["data"|"mgt"]: "data"
          Confirm download operation [y/n]: y

5) Validate the current image settings with cur command, where image2 now has the latest firmware level, and that the current boot image is working with latest image2 file. You can then execute the reset command to boot the ethernet switch using the latest firmware level.

       >> cur
       >> reset

Appendix 4 Perform Deferred Firmware upgrades for Flex blade CEC

Deferred firmware update Background

It may take some time to execute a disruptive firmware update in a large cluster. To reduce the down time of the cluster, customers may want to flash new firmware levels while the Flex blades are up and running, The deferred firmware update will load the new firmware into the T (temp) side, but will not activate it like the disruptive firmware. The customer can continue to run with the P (perm) side and can wait for a maintenance window where they can activate and boot the blades/cec with new firmware levels.

temp/perm side, pending_power_on_side attributes in Deferred firmware update

The deferred firmware update includes 2 parts: The first part (1) is to apply the firmware to the T (temp) sides of Flex blade FSPs when the cluster is up and running. The second part (2) is to activate the new firmware on the blades at a scheduled time.

The default setting is that the CEC/FSPs are working from the temp side (current_power_on_side). During part(1) of the deferred firmware update implementation, the CEC will continue to run on the perm side while the rflash of the new firmware levels will installed to the temp side. It is very important that the perm side contains the current stable version of firmware. The perm side is usually only used as a recovery environment when working with firmware updates.

When executing a reboot to the blade (FSPs), it will run on the side which the pending_power_on_side attribute is set. After we finish the part (1), the admin will want to make sure the pending_power_on_side attribute is set to "perm" if the blades want to be rebooted working with the older stable firmware. When you are ready to activate the new firmware and reboot the blades, you will want to make sure the pending_power_on_side attribute is set to "temp".

The procedure of the deferred firmware update

Before starting the deferred firmware update, the admin should first make sure that the most recent stable firmware level has been applied to the P (perm) side. We should note that T-side firmware will be moved over to the P-side automatically when we execute the rflash of the new firmware into the T (temp) side.

1.1 Apply the firmware for Flex blades

      rinv <blade> firm

1.2 Apply the new GFW code into the blade's FSPs

      rflash <blade> -p <rpm_directory> --activate deferred

1.3 Check to make sure the proper Firmware levels have been loaded into the temp side (new) and the perm side (previous) for the Frames or CECs. The rflash working with "deferred" should now specify Current Power on side to now be "perm":

      rinv <blade> firm

2. Setup Cecs/blades pending power to Perm (needed for CEC/blade reboot -- power off/on)

In part 1, the new firmware is now loaded on the temp side. If you need to keep the Flex blade active for a period of time (such as several days) we need to make sure we are working with previous firmware level, which is running on the P-side. You should change the pending_power_on_side attribute from temp to perm.

      rspconfig <blade> pending_power_on_side

  If not, set CEC's the pending power on side to P-side:

      rspconfig <blade> pending_power_on_side=perm

3.Activate the new firmware at schedule time

The new firmware level has been loaded on the temp side, and it is time to activate the blade/CECs with new firmware level. The admin should make sure the pending_power_on_side is now set back from perm to temp.

3.1 Check if the pending power on side for CEC are on T-side

      rspconfig <blade> pending_power_on_side
      If not, set the pending power on side to T-side
      rspconfig <blade> pending_power_on_side=temp

3.2 Power off the target Flex blades

      rpower <blade> off

3.3 Reboot the service processor for the CECs/blade

       rpower <blade> resetsp

Wait for 5-10 minutes for FSPs to restart. When the connections become LINE_UP again, the FSPs have finished the reboot.

       lshwconn <blade>

3.4 Verify that the cec/blade updates are the new firmware level and that they are using the temp side for the current_power_on_side .

       rinv <blade> firm

3.5 Power on the Flex blades and bring up the Flex blade cluster. The power on of the flex blades will be based on the install environment.

If this is a diskful environment, the admin should be able to "rpower <blade> on " to bring the blade up on the local disk.

      rpower  <blade> on

If this is a diskless environment, the admin should power up the blade to onstandby, set the boot sequence to network, and reset the blade.

      rpower <blade> onstandby
      rbootseq <blade> net
      rpower <blade> reset

Appendix 5 lshwconn LINE DOWN after power outage

Testing has shown that when a chassis looses power and is started back up it is possible that the connections to the blade FSPs will be LINE DOWN. If this occurs you should reset the CMM for the chassis with this problem.

     ssh USERID@cmm01 service -T mm[1] -vr

Appendix 6: Migrate your Management Node to a new Service Pack of Linux

If you need to migrate your xCAT Management Node with a new SP level of Linux, for example rhels6.1 to rhels6.2 you should as a precautionary measure:

Backup database and save critical files to be used if needed to reference or restore using xcatsnap. Move the xcatsnap log and *gz file off the Management Node.
Backup images and custom data in /install and move off the Management Node.
service xcatd stop
service xcatd stop on any service nodes
Migrate to the new SP level of Linux.
service xcatd start

If you have any Service Nodes:

Migrate to the new SP level of linux and reinstall the servicenode with xCAT following normal procedures.
service xcatd start

The documentation Setting_Up_a_Linux_xCAT_Mgmt_Node/#appendix-d-upgrade-your-management-node-to-a-new-service-pack-of-linux gives a sample procedure on how to update the management node or service nodes to a new service pack of Linux.

Appendix 7: Install your Management Node to a new Release of Linux

First backup critical xCAT data to another server so it will not be loss during OS install.

Back up the xcat database using xcatsnap, important config files and other system config files for reference and for restore later. Prune some of the larger tables:

     tabprune eventlog -a
     tabprune auditlog -a
     tabprune isnm_perf -a (Power 775 only)
     tabprune isnm_perf_sum -a (Power 775 only)
     xcatsnap

xcatsnap will capture database, config files. You should copy to another host. By default it will create in /tmp/xcatsnap two files, for example:

     xcatsnap.hpcrhmn.10110922.log
     xcatsnap.hpcrhmn.10110922.tar.gz

Back up from /install directory, all images, custom setup data that you want to save. and move to another server. xcatsnap will not backup the install directory.

After the OS install:

Proceed to to setup the xCAT MN as a new xCAT MN using the instructions in this document.

Wiki: Cluster_Name_Resolution
Wiki: Setting_Up_a_Linux_Hierarchical_Cluster
Wiki: Setting_Up_a_Linux_xCAT_Mgmt_Node
Wiki: XCAT_Documentation
Wiki: XCAT_Linux_Statelite
Wiki: XCAT_Overview,_Architecture,_and_Planning
Wiki: XCAT_System_p_Hardware_Management_for_HMC_Managed_Systems
Wiki: XCAT_System_p_Hardware_Management_for_hmc_managed_systems