XCAT_system_p_support_for_IBM_Flex

There is a newer version of this page. You can find it here.

THIS PAGE UNDER CONSTRUCTION

Introduction

IBM Flex combines networking, storage and servers in a single offering. It consists of an IBM Flex Chassis, one or two Chassis Management Modules (CMM), and Power 7 and x86 compute node servers. The type of the management module for IBM Flex is 'cmm'. The blade servers include the IBM Flex System™ p260, p460, and p24L Power 7 servers as well as the IBM Flex System™ x240 Compute Node, which is an x86 Intel-processor based server. This document covers only the management of the POWER 7 blade servers.

IBM Flex System™ p260, p460, and p24L Power 7 server hardware management: Generally, xCAT uses the management type 'blade' to manage the BladeCenter and its blade servers (the management work is done through the management module). For IBM Flex, xCAT uses the management type 'fsp' to manage the POWER 7 blade servers (the management work is done through xCAT DFM, Direct FSP Management). For IBM Flex Power 7 servers the management approach is therefore a mix of 'blade' and 'fsp': most of the discovery work is done through the CMM, and most of the hardware management work is done directly through the blade's FSP.

Terminology

The following terms will be used in this document:

xCAT DFM: Direct FSP Management is the name used for the ability of xCAT software to communicate directly with the service processor of an IBM Flex Power blade, without using an HMC for management.

Chassis Management Module (CMM): This term refers to the management modules (one or two per chassis) installed in the rear of the chassis, each of which has an Ethernet connection. The CMM is used to discover the servers within the chassis and for some data collection regarding the servers and chassis.

Compute node: This term is used to refer to the servers in an IBM Flex system. Compute nodes can be either Power 7 servers or x86 Intel based servers.

blade node: A node with hwtype set to blade, representing the whole blade server. The hcp attribute of the blade node is set to the FSP's IP address.

Downloading and Installing DFM

This requires the new xCAT Direct FSP Management plugin (xCAT-dfm-*.ppc64.rpm), which is not part of the core xCAT open source, but is available as a free download from IBM. You must download this and install it on your xCAT management node (and possibly on your service nodes, depending on your configuration) before proceeding with this document.

Download xCAT-dfm RPM: http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=ibm~ClusterSoftware&product=ibm/Other+software/IBM+direct+FSP+management+plug-in+for+xCAT&release=All&platform=All&function=all

Download ISNM-hdwr_svr RPM (linux) http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=ibm~ClusterSoftware&product=ibm/Other+software/IBM+High+Performance+Computing+%28HPC%29+Hardware+Server&release=All&platform=All&function=all

Once you have downloaded these packages, install the hardware server package, and then install DFM:

rhels:

If you have been following the xCAT documentation, you should already have the yum repositories set up to pull in whatever xCAT dependencies and distro RPMs are needed (libstdc++.ppc, libgcc.ppc, openssl.ppc, etc.).

yum install xCAT-dfm-*.ppc64.rpm ISNM-hdwr_svr-*.ppc64.rpm

Hardware Discovery

Overview

The discovery procedure simplifies cluster setup for the administrator, especially for clusters with thousands of nodes. The administrator needs to connect the Ethernet cables and provide power before the discovery process is started. First discover and configure the CMM nodes, then discover and configure the blade servers and their FSPs.

Preparation for the discovery

1. The Ethernet interfaces of the CMMs and of the xCAT management node must be connected to the service VLAN so that the xCAT management node can reach the hardware for discovery and management.

2. Configure a DHCP dynamic range from which the CMMs and FSPs get the temporary IP addresses needed to finish hardware discovery. In this example, 10.0.0.0/16 is used as the service VLAN, and 10.0.200.0/24 is used as the temporary network for the discovery of the CMMs.

Note: As of RH6.2, the dhcpd daemon requires the "dhcpd" user to be present in the /etc/passwd file. The dhcpd user should be added automatically when the dhcp.ppc64 rpm is installed. If you need to add it by hand, run: adduser -s /sbin/nologin -d / dhcpd

chdef -t network 10_0_0_0-255_255_0_0 dynamicrange=10.0.200.1-10.0.200.200
makedhcp -n
service dhcpd restart   # linux
startsrc -s dhcpsd      # AIX

Discover the CMMs with the lsslp command

1. Power on all of the chassis. This causes the CMMs to get a temporary DHCP IP address from the xCAT management node.

2. Run lsslp to discover the CMMs:

lsslp -m -z -s CMM > /tmp/cmm.stanza

3. Edit the stanza file to give the CMMs meaningful node names (the mpa attribute must have the same value as the node name). For simplicity, the names can be set to cmm01 through cmm99. These CMM node names require name resolution (entries in /etc/hosts).

4. Define the CMMs in the xCAT database:

cat /tmp/cmm.stanza | mkdef -z

5. Define the static IP addresses for all the CMMs:

chdef -t group cmm ip='|cmm(\d+)|10.0.100.($1+0)|'

6. Add the CMM node names to /etc/hosts, and to DNS if DNS is used for name resolution:

makehosts cmm
makedns cmm
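
The regular-expression value in step 5 derives each CMM's static IP address from the numeric suffix of its node name, and the `+0` drops any leading zeros. The sketch below mimics that expansion outside xCAT so the resulting addresses can be previewed before running chdef. The function name `cmm_ip` is hypothetical and not part of xCAT, and unlike the xCAT expression it rejects any name that is not exactly `cmm` followed by digits.

```shell
#!/bin/sh
# Sketch only: previews what ip='|cmm(\d+)|10.0.100.($1+0)|' is expected to yield.
# cmm_ip is a hypothetical helper, not an xCAT command.
cmm_ip() {
    num=${1#cmm}
    case $num in
        ''|*[!0-9]*) echo "unexpected node name: $1" >&2; return 1 ;;
    esac
    # Strip leading zeros by hand so that "08" is not parsed as octal
    while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do
        num=${num#0}
    done
    echo "10.0.100.$num"
}

# Preview the addresses for a few CMM names
for n in cmm01 cmm02 cmm10; do
    printf '%s %s\n' "$n" "$(cmm_ip "$n")"
done
```

Because the last octet comes straight from the node number, this scheme only produces valid addresses for the cmm01 to cmm99 naming suggested in step 3.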

Configure the cmm

1. To change the password for USERID, use the following command:

rspconfig cmm USERID=<new_passwd>

2. Initialize the network configuration for the CMMs. This configures the static IP address on each CMM.

rspconfig cmm initnetwork=*

3. Enable SSH and SNMP on all the CMMs:

rspconfig cmm sshcfg=enable snmpcfg=enable

Create the blade server definitions in one of two ways

Create the blade server object definitions in the database first, then run rscan -u to discover them

This approach should only be used when the blade configuration in the chassis is uniform. If the chassis contains a mixture of single-wide and double-wide blades, the administrator will need to remove the unused blade node objects.

1. Define the blade server node definitions

For Power 7 servers:

The attribute 'mpa' should be set to the node name of the CMM. The attribute 'slotid' should be set to the physical slot id of the blade. The attribute 'hcp' should be set to the IP address that the administrator intends to assign to the FSP of the blade.

mkdef cmm[01-02]node[01-14] groups=all,blade mgt=fsp cons=fsp
chdef -t group blade mpa='|cmm(\d+)node(\d+)|cmm($1)|' slotid='|cmm(\d+)node(\d+)|($2+0)|' hcp='|cmm(\d+)node(\d+)|10.0.($1+0).($2+0)|' mgt=fsp

[root@c870f3ap01 ~]# nodels blade
cmm01node01
cmm01node03
cmm01node05
cmm01node07
cmm01node09
cmm01node10
cmm01node11
[root@c870f3ap01 ~]# lsdef cmm01node01
Object name: cmm01node01
cons=fsp
groups=blade,all
hcp=12.0.0.32
hwtype=blade
id=1
mgt=fsp
mpa=cmm01
mtm=789542X
nodetype=ppc,osi
parent=cmm01
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
serial=10F752A
slotid=1
[root@c870f3ap01 ~]#

For x86 Intel compute nodes:

The attribute 'mpa' should be set to the node name of the CMM. The attribute 'slotid' should be set to the physical slot id of the blade. The attribute 'hcp' should be set to the IP address that the administrator assigns to the BMC of the blade.

mkdef cmm[01-02]node[01-14] groups=all,blade mgt=ipmi cons=ipmi
chdef -t group blade mpa='|cmm(\d+)node(\d+)|cmm($1)|' slotid='|cmm(\d+)node(\d+)|($2+0)|' hcp='|cmm(\d+)node(\d+)|10.0.($1+0).($2+0)|' mgt=ipmi

[root@c870f3ap01 ~]# nodels blade
cmm01node01
cmm01node03
cmm01node05
cmm01node07
cmm01node09
cmm01node10
cmm01node11
[root@c870f3ap01 ~]# lsdef cmm01node01
Object name: cmm01node01
cons=ipmi
groups=blade,all
hcp=12.0.0.32
hwtype=blade
id=1
mgt=ipmi
mpa=cmm01
mtm=789542X
nodetype=ppc,osi
parent=cmm01
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
serial=10F752A
slotid=1
[root@c870f3ap01 ~]#

2. Run rscan -u to discover all the compute node servers. 'rscan -u' matches the discovered blades against the nodes already defined in the xCAT database and updates those definitions instead of creating new ones. It also reports an error when a blade node object is not found in the xCAT database. This type of error typically occurs when the chassis contains both single-wide and double-wide blades.

rscan cmm -u

If there is a mixture of single-wide and double-wide blades in the chassis, remove the unused blade objects from the xCAT database:

rmdef  <cmmxxnodeyy>
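
In a mixed chassis, finding the node objects to remove amounts to comparing the list of defined node names with the list of names that actually correspond to a blade. A minimal sketch of that comparison follows. The helper `unused_nodes` and both input files are hypothetical; how the two lists are produced (for example, saving the output of nodels blade as the first list) is left to the administrator.

```shell
#!/bin/sh
# Sketch only: print node names that are defined but have no matching blade.
# unused_nodes and both input files are hypothetical, not part of xCAT.
unused_nodes() {
    # $1 = file with one defined node name per line
    # $2 = file with one matched (really present) node name per line
    # -v inverts the match, -x matches whole lines, -F disables regex, -f reads patterns
    grep -v -x -F -f "$2" "$1"
}

# Example data: three slots defined, one double-wide blade occupying slots 1-2
tmp=$(mktemp -d)
printf '%s\n' cmm01node01 cmm01node02 cmm01node03 > "$tmp/defined.txt"
printf '%s\n' cmm01node01 cmm01node03 > "$tmp/present.txt"
unused_nodes "$tmp/defined.txt" "$tmp/present.txt"
rm -rf "$tmp"
```

Each name printed is a candidate for rmdef. Review the list before removing anything, since a blade that was simply powered off or unreachable during the scan would also appear in it.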

Create object definition of blade server by discovery directly

The rscan command reads the actual configuration of the blade servers from the CMM and creates node definitions in the xCAT database to reflect them. Run the rscan command against all of the CMMs to create a stanza file containing the definitions of all the compute node servers.

rscan cmm -z >nodes.stanza

The stanza file for Power 7 compute nodes looks like this:

SN#YL10JH184084:
       objtype=node
       nodetype=ppc,osi
       slotid=1
       id=1
       mtm=789542X
       serial=10F69BA
       mpa=flexcmm01
       parent=flexcmm01
       hcp=70.0.0.41
       groups=blade,all
       mgt=fsp
       cons=fsp
       hwtype=blade
SN#Y110UF18P003:
       objtype=node
       nodetype=ppc,osi
       slotid=3
       id=1
       mtm=789522X
       serial=10F75AA
       mpa=flexcmm01
       parent=flexcmm01
       hcp=70.0.0.22
       groups=blade,all
       mgt=fsp
       cons=fsp
       hwtype=blade

The stanza file for x86 Intel based compute nodes looks like this:

SN#YL10JH184095:
       objtype=node
       nodetype=mp,osi
       id=7
       mtm=8737AC1
       serial=23FFP69
       mpa=flexcmm01
       parent=flexcmm01
       groups=blade,all
       mgt=ipmi
       cons=ipmi
       hwtype=blade
SN#Y110UF18P005:
       objtype=node
       nodetype=mp,osi
       id=8
       mtm=789522X
       serial=23FFP92
       mpa=flexcmm01
       parent=flexcmm01
       groups=blade,all
       mgt=ipmi
       cons=ipmi
       hwtype=blade

In the stanza file, each blade server is listed with its hcp (the FSP of the blade), mtm, serial and id attributes. In the stanza file above, the node SN#YL10JH184084 is a Power blade (nodetype=ppc, hwtype=blade, mpa=flexcmm01). To make the compute node servers easier to access and operate, edit the stanza file and give each node the name you want it to have.

For Power 7 compute nodes, the administrator changes the object name and sets the hcp attribute to the IP address of the FSP. For example, the definition of SN#YL10JH184084 can be modified as follows:

cmm01node01:
   objtype=node
   cons=fsp
   groups=blade,all
   hcp=70.0.0.41
   hwtype=blade
   id=1
   mgt=fsp
   mpa=flexcmm01
   mtm=789542X
   nodetype=ppc,osi
   parent=flexcmm01
   serial=10F69BA
   slotid=1

For x86 compute nodes, the administrator changes the object name only. For example, the definition of SN#Y110UF18P005 can be modified as follows:

cmm01node08:
       objtype=node
       nodetype=mp,osi
       id=8
       mtm=789522X
       serial=23FFP92
       mpa=flexcmm01
       parent=flexcmm01
       groups=blade,all
       mgt=ipmi
       cons=ipmi
       hwtype=blade

Then create the definitions in the database:

cat nodes.stanza | mkdef -z
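
For a large chassis count, renaming every object by hand in the stanza file is tedious. The following sketch rewrites the object names in a stanza file using awk. It assumes, based only on the examples above, that the node number comes from slotid when that attribute is present and from id otherwise. The function `rename_stanza` and the `<prefix>nodeNN` naming are hypothetical. The hcp attribute of Power 7 nodes still has to be edited separately.

```shell
#!/bin/sh
# Sketch only: rename stanza objects to <prefix>nodeNN.
# rename_stanza is a hypothetical helper, not part of xCAT.
# Assumption: NN is taken from slotid if present, otherwise from id.
rename_stanza() {
    # $1 = node name prefix (for example cmm01), $2 = stanza file
    awk -v prefix="$1" '
        function emit() {
            if (header == "") return
            n = (slot != "") ? slot : id
            if (n != "") printf "%snode%02d:\n", prefix, n
            else print header              # nothing to derive a name from
            printf "%s", body
            header = body = slot = id = ""
        }
        # An unindented line ending in ":" starts a new object
        /^[^ \t].*:[ \t]*$/ { emit(); header = $0; next }
        /^[ \t]*slotid=/    { v = $0; sub(/^[ \t]*slotid=/, "", v); slot = v }
        /^[ \t]*id=/        { v = $0; sub(/^[ \t]*id=/, "", v); id = v }
        header != ""        { body = body $0 "\n"; next }
        { print }
        END { emit() }
    ' "$2"
}

# Intended use (one CMM per stanza file):
#   rename_stanza cmm01 nodes.stanza > renamed.stanza
```

Review the renamed file before piping it to mkdef -z, since a stanza file that covers several CMMs would need a different prefix for each chassis.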

Set the network configuration for the fsp

rspconfig blade network=*

Modify blade server device names

To manage the blade servers more conveniently, the customer may want a cleaner name for each blade node. The following command modifies the device name of a single blade.

rspconfig singlenode textid="cmm01node01"

The following command changes the device names of a group of blades to the node names defined in the xCAT database.

rspconfig blade textid=*

Power 7 Compute Servers Only - Create the hardware server connection for the blades' FSPs

1. Add the blade's fsp connections for the DFM management:

mkhwconn blade -t

2. Check that the connections are in the LINE_UP state:

lshwconn blade

3. Make sure the blade servers are powered on:

rpower blade state 
rpower blade on

Power 7 Compute Servers Only - Prepare the provision configuration

rcons configuration

1. Make sure SOL on the CMM has been disabled

rspconfig cmm solcfg 
rspconfig cmm solcfg=disable

2. Update the conserver configuration

makeconservercf 
#For Linux: 
service conserver stop 
service conserver start 
#For AIX: 
stopsrc -s conserver 
startsrc -s conserver

3. Check rcons

Before running rcons to open the console, make sure the blades are powered on:

rpower bladenoderange state   # if a node is off, then run:
rpower bladenoderange on


rcons onebladenode

If the blade is powered off, rcons reports "Destination BLADE is in POWER OFF state, Please power it on and wait.". In that case, power on the blade:

rpower onebladenode state   # if the node is off, then run:
rpower onebladenode on

If the blade is powered off and then on again while the console is open, the console must be refreshed manually.

Check hardware control setup to the nodes

To verify that your setup is correct at this point, run rpower to check the node status:

rpower bladenoderange  stat

Update the mac table with the address of the node(s)

Get mac addresses: IBM Flex POWER 7 blades support getting the MAC address through the CMM:

1. Set the 'getmac' attribute to 'blade':

chdef cmm01node01 getmac=blade

Because the firmware is not yet stable, the following two steps are recommended to get the MAC address of a specific interface.

2. Run getmacs to display all the MAC addresses:

getmacs cmm01node01 -d

3. Set the 'installnic' attribute to select the interface whose MAC address will be collected. The administrator must know exactly which interface is connected. Note: If getmacs returns 4 MAC addresses, all of them belong to the blade, and x ranges from 0 (which maps to eth0 of the blade) to 3. If it returns 5 MAC addresses, the first one is the MAC address of the blade's FSP, so x ranges from 1 (which maps to eth0 of the blade) to 4.

chdef cmm01node01 installnic=x
getmacs cmm01node01
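
The mapping in the note for step 3 is easy to get wrong by one. The sketch below encodes it as a small function that returns the value of x for a given blade Ethernet interface, depending on how many MAC addresses getmacs -d displayed. The function name `installnic_index` is hypothetical and not part of xCAT. It only does the arithmetic described in the note and does not call getmacs itself.

```shell
#!/bin/sh
# Sketch only: compute the installnic value x described in the note above.
# installnic_index is a hypothetical helper, not an xCAT command.
installnic_index() {
    # $1 = number of MAC addresses shown by "getmacs <node> -d" (4 or 5)
    # $2 = blade Ethernet interface number (0 for eth0 ... 3 for eth3)
    count=$1
    eth=$2
    case $eth in
        0|1|2|3) ;;
        *) echo "interface number must be 0-3" >&2; return 1 ;;
    esac
    case $count in
        4) echo "$eth" ;;            # all four MACs belong to the blade
        5) echo $((eth + 1)) ;;      # first MAC is the FSP, so shift by one
        *) echo "expected 4 or 5 MAC addresses, got $count" >&2; return 1 ;;
    esac
}

# Example: eth0 on a blade for which five MAC addresses were displayed
installnic_index 5 0
```

The returned number is the x to use in chdef cmm01node01 installnic=x.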

Set boot string: Before running rbootseq to set the boot device, make sure the blades are powered on:

rpower bladenoderange state   # if a node is off, then run:
rpower bladenoderange on

Then run rbootseq to set the boot string:

rbootseq bladenoderange  net

After using rbootseq to set the boot string, run rpower with the cycle action to make the boot string permanent, and wait about 1 minute:

 rpower bladenoderange cycle

Use network boot to start the installation

This document covers hardware control and management only. Before starting the installation, you need to configure the required services and prepare the network settings, the osimage, and so on.

1. Installation preparation

 For Linux, you should refer to the link:
    https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_pLinux_Clusters
 For AIX,  you should refer to the link:
    https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_AIX_RTE_Diskfull_Nodes
    https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_AIX_mksysb_Diskfull_Nodes
    https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_AIX_Diskless_Nodes

NOTE: If you want to use the hardware control commands for IBM Flex, you should use the commands listed in this doc.

2. Use network boot to start the installation

rpower bladenoderange cycle

The 'cycle' process takes about 1 minute, after which the boot screen can be seen with 'rcons'.

Update the CMM firmware

The CMM firmware is updated by downloading the firmware from <http TBD>. Once the firmware has been downloaded, the compressed tar file needs to be uncompressed, extracted, and placed in a directory on the EMS. This example uses /install/firmware as the directory for the firmware.

Once the firmware has been extracted and placed in this directory, you can use the CMM update command to update the firmware either on one chassis at a time or on all chassis managed by xCAT. The format of the command is:

flash the file and reboot afterwards

update -r -u http://<server>/<path to file>

flash, show progress, and reboot afterwards

update -v -r -u http://<server>/<path to file>

To update a single CMM use:

ssh USERID@cmm01 update -T system:mm[1] -v -u http://70.0.0.1/install/firmware/2pet07v/cmefs.uxp

If passwordless SSH access has been set up on all CMMs, you can use the xCAT psh command to update all CMMs in the cluster at once.

psh cmm  -l USERID update -T system:mm[1] -v -u http://70.0.0.1/install/firmware/2pet07v/cmefs.uxp

Update the FSP firmware

This is accomplished by using the xCAT rflash command from the xCAT management node. The administrator should download the supported GFW from the IBM Fix Central website and place it in a directory that is readable from the xCAT management node.

1. Use the rinv command to get the current firmware levels of the blades' FSPs:

rinv bladenoderange firm
(output to be added here)

2. Use the rflash command to update the firmware levels for the blades' FSPs. Then validate that the new firmware is loaded:

For a disruptive firmware update, first make sure the blades are powered off.

 rpower bladenoderange state
 rpower bladenoderange off

And then use rflash to do the update:

rflash bladenoderange -p <directory> --activate disruptive
(output to be added here) 
rinv bladenoderange firm

3. Verify that the blades are healthy and power on the blades:

rpower bladenoderange state
rvitals bladenoderange lcds
rpower bladenoderange on
