THIS PAGE UNDER CONSTRUCTION
IBM Flex combines networking, storage and servers in a single offering. It consists of an IBM Flex chassis, one or two Chassis Management Modules (CMM), and Power 7 and x86 compute node servers. The management module type for IBM Flex is 'cmm'. The blade servers include the IBM Flex System™ p260, p460, and 24L Power 7 servers, as well as the IBM Flex System™ x240 Compute Node, which is an x86 Intel-processor-based server. This document covers only the management of Power 7 blade servers.
IBM Flex System™ p260, p460, and 24L Power 7 server hardware management: Generally, xCAT uses the management type 'blade' to manage the blade center and blade servers (the management work is done through the management module). For IBM Flex, xCAT uses the management type 'fsp' to manage the Power 7 blade servers (the management work is done through xCAT DFM, Direct FSP Management). The management approach for IBM Flex Power 7 servers is therefore a mix of 'blade' and 'fsp': most of the discovery work is done through the CMM, and most of the hardware management work is done directly through the blade's FSP.
The following terms will be used in this document:
xCAT DFM: Direct FSP Management is the name used for the ability of the xCAT software to communicate directly with the service processor of an IBM Flex Power blade, without using the HMC for management.
Chassis Management Module (CMM): This term refers to the pair of management modules installed in the rear of the chassis, which have an Ethernet connection. The CMM is used to discover the servers within the chassis and to collect some data about the servers and the chassis.
Compute node: This term is used to refer to the servers in an IBM Flex system. Compute nodes can be either Power 7 servers or x86 Intel based servers.
blade node: A blade node is a node with hwtype set to blade; it represents the whole blade server. The hcp attribute of the blade node is set to the FSP's IP address.
This requires the new xCAT Direct FSP Management plugin (xCAT-dfm-*.ppc64.rpm), which is not part of the core xCAT open source, but is available as a free download from IBM. You must download this and install it on your xCAT management node (and possibly on your service nodes, depending on your configuration) before proceeding with this document.
Download xCAT-dfm RPM: http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=ibm~ClusterSoftware&product=ibm/Other+software/IBM+direct+FSP+management+plug-in+for+xCAT&release=All&platform=All&function=all
Download ISNM-hdwr_svr RPM (linux) http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=ibm~ClusterSoftware&product=ibm/Other+software/IBM+High+Performance+Computing+%28HPC%29+Hardware+Server&release=All&platform=All&function=all
Once you have downloaded these packages, install the hardware server package, and then install DFM:
rhels:
If you have been following the xCAT documentation, you should already have the yum repositories set up to pull in whatever xCAT dependencies and distro RPMs are needed (libstdc++.ppc, libgcc.ppc, openssl.ppc, etc.).
yum install xCAT-dfm-*.ppc64.rpm ISNM-hdwr_svr-*.ppc64.rpm
The discovery procedure simplifies cluster setup for the administrator, especially for clusters with thousands of nodes. Before starting discovery, the administrator must connect the Ethernet cables and supply power to the hardware. First discover and configure the CMM nodes, then discover and configure the blade servers and their FSPs.
1. Connect the Ethernet interfaces of the CMMs and of the xCAT management node to the service VLAN, so that the xCAT management node can reach the hardware for discovery and management.
2. Configure a DHCP dynamic range from which the CMMs and FSPs get temporary IP addresses to finish the hardware discovery. In this example, 10.0.0.0/16 is used as the service VLAN, and 10.0.200.0/24 is used as the temporary network for the discovery of the CMMs.
Note: As of RH6.2, the dhcpd daemon requires the "dhcpd" user to be present in the /etc/passwd file. The dhcpd user should be added automatically when the dhcp.ppc64 rpm is installed. If you need to add it by hand, run: adduser -s /sbin/nologin -d / dhcpd
chdef -t network 10_0_0_0-255_255_0_0 dynamicrange=10.0.200.1-10.0.200.200
makedhcp -n
service dhcpd restart # linux
startsrc -s dhcpsd # AIX
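Before running makedhcp, it can help to confirm that the dynamic range really lies inside the service network. The following is only a minimal sketch and not part of xCAT: the helper names `range_in_network` and `network_object_name` are hypothetical, and the addresses are the ones used in this example.

```python
import ipaddress

def range_in_network(network, first, last):
    """Return True if the address range first..last lies entirely inside network."""
    net = ipaddress.ip_network(network)
    lo = ipaddress.ip_address(first)
    hi = ipaddress.ip_address(last)
    return lo <= hi and lo in net and hi in net

def network_object_name(network):
    """Build a name in the style used by the chdef example above
    (address and netmask joined by '-', dots replaced by underscores)."""
    net = ipaddress.ip_network(network)
    return f"{net.network_address}-{net.netmask}".replace(".", "_")

# Values from this document's example.
print(network_object_name("10.0.0.0/16"))
print(range_in_network("10.0.0.0/16", "10.0.200.1", "10.0.200.200"))
```

If the check returns False, the dynamicrange value or the network definition should be corrected before restarting the DHCP service.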
1. Power on all of the chassis. This causes the CMMs to get a temporary DHCP IP address from the xCAT management node.
2. Run lsslp to discover the CMMs:
lsslp -m -z -s CMM > /tmp/cmm.stanza
3. Edit the stanza file to give the CMMs meaningful node names (the mpa attribute should have the same value as the node name). For simplicity, the names can be cmm01 to cmm99. These CMM node names require name resolution (entries in /etc/hosts).
4. Define the CMMs in the xCAT database:
cat /tmp/cmm.stanza | mkdef -z
5. Define the static IP addresses for all the CMMs:
chdef -t group cmm ip='|cmm(\d+)|10.0.100.($1+0)|'
6. Add the CMM node names to /etc/hosts, and to DNS if it is being used for name resolution:
makehosts cmm
makedns cmm
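The static IP definition in step 5 uses the |pattern|replacement| form, in which the node name is matched against the pattern and a parenthesized term such as ($1+0) is evaluated as arithmetic. The sketch below approximates that expansion in Python to show which address each CMM would receive. It is an illustration, not xCAT's implementation: the function name `expand` is hypothetical, it assumes the pattern matches the whole node name and contains no '|', and it handles only integer terms of the form (a+b) or (a-b) in decimal.

```python
import re

def expand(node, expr):
    """Approximate the expansion of '|pattern|replacement|' for one node name."""
    _, pattern, template, _ = expr.split("|")
    match = re.fullmatch(pattern, node)
    if match is None:
        return None  # the expression does not apply to this node
    # Substitute $1, $2, ... with the captured groups.
    filled = re.sub(r"\$(\d+)", lambda g: match.group(int(g.group(1))), template)

    # Evaluate simple "(a+b)" or "(a-b)" terms, so "(01+0)" becomes "1".
    def arith(term):
        left, op, right = int(term.group(1)), term.group(2), int(term.group(3))
        return str(left + right if op == "+" else left - right)

    return re.sub(r"\((\d+)([+-])(\d+)\)", arith, filled)

for cmm in ("cmm01", "cmm02", "cmm12"):
    print(cmm, expand(cmm, r"|cmm(\d+)|10.0.100.($1+0)|"))
```

The "+0" term exists to strip the leading zero from the captured number, so that cmm01 maps to 10.0.100.1 rather than 10.0.100.01.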
1. To change the password for USERID, use the following command:
rspconfig cmm USERID=<new_passwd>
2. Initialize the network configuration for cmms. The static IP will be configured to the cmm.
rspconfig cmm initnetwork=*
3. Enable ssh and snmp for all the CMMs:
rspconfig cmm sshcfg=enable snmpcfg=enable
This implementation should only be used when the blade configuration in the chassis is uniform. If the chassis contains a mixture of single-wide and double-wide blades, the admin will need to remove the unused blade node objects.
1. Define the blade server node definitions
The attribute 'mpa' should be set to the node name of the CMM. The attribute 'slotid' should be set to the physical slot id of the blade. The attribute 'hcp' should be set to the IP that the admin assigns to the FSP of the blade.
mkdef cmm[01-02]node[01-14] groups=all,blade mgt=fsp cons=fsp
chdef -t group blade mpa='|cmm(\d+)node(\d+)|cmm($1)|' slotid='|cmm(\d+)node(\d+)|($2+0)|' hcp='|cmm(\d+)node(\d+)|10.0.($1+0).($2+0)|' mgt=fsp
[root@c870f3ap01 ~]# nodels blade
cmm01node01
cmm01node03
cmm01node05
cmm01node07
cmm01node09
cmm01node10
cmm01node11
[root@c870f3ap01 ~]# lsdef cmm01node01
Object name: cmm01node01
cons=fsp
groups=blade,all
hcp=12.0.0.32
hwtype=blade
id=1
mgt=fsp
mpa=cmm01
mtm=789542X
nodetype=ppc,osi
parent=cmm01
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
serial=10F752A
slotid=1
[root@c870f3ap01 ~]#
The attribute 'mpa' should be set to the node name of cmm. The attribute 'slotid' should be set to the physical slot id of the blade. The attribute 'hcp' should be set to the IP that admin assigns to the BMC of the blade.
mkdef cmm[01-02]node[01-14] groups=all,blade mgt=ipmi cons=ipmi
chdef -t group blade mpa='|cmm(\d+)node(\d+)|cmm($1)|' slotid='|cmm(\d+)node(\d+)|($2+0)|' hcp='|cmm(\d+)node(\d+)|10.0.($1+0).($2+0)|' mgt=ipmi
[root@c870f3ap01 ~]# nodels blade
cmm01node01
cmm01node03
cmm01node05
cmm01node07
cmm01node09
cmm01node10
cmm01node11
[root@c870f3ap01 ~]# lsdef cmm01node01
Object name: cmm01node01
cons=ipmi
groups=blade,all
hcp=12.0.0.32
hwtype=blade
id=1
mgt=ipmi
mpa=cmm01
mtm=789542X
nodetype=ppc,osi
parent=cmm01
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
serial=10F752A
slotid=1
[root@c870f3ap01 ~]#
2. Run rscan -u to discover all the compute node servers. 'rscan -u' matches the discovered blades against the nodes already defined in the xCAT database and updates them instead of creating new ones. It also reports an error when a blade node object is not found in the xCAT database. This type of error is expected when a chassis contains both single-wide and double-wide blades. The admin can run the rmdef command to remove any unused blade node objects.
rscan cmm -u
If the chassis contains a mixture of single-wide and double-wide blades, the admin should remove the unused blade objects from the xCAT DB:
rmdef <cmmxxnodeyy>
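With many chassis, working out which predefined blade objects are unused can be tedious. The sketch below shows one way to list them, assuming the admin already knows which slots hold a blade. The function name `unused_blade_nodes` and the way the occupied slots are supplied are hypothetical. The slot numbers are taken from the nodels sample output earlier in this document.

```python
import re

# Node names follow the cmmXXnodeYY scheme used in this document.
NODE_RE = re.compile(r"(cmm\d+)node(\d+)")

def unused_blade_nodes(defined_nodes, occupied_slots):
    """Return the defined blade node names whose (cmm, slot) holds no blade."""
    unused = []
    for name in defined_nodes:
        m = NODE_RE.fullmatch(name)
        if m is None:
            continue  # not a predefined blade name; leave it alone
        if (m.group(1), int(m.group(2))) not in occupied_slots:
            unused.append(name)
    return unused

# Predefined objects for one chassis: cmm01node01 .. cmm01node14.
defined = [f"cmm01node{slot:02d}" for slot in range(1, 15)]
# Slots that actually report a blade (assumed input for this example).
occupied = {("cmm01", slot) for slot in (1, 3, 5, 7, 9, 10, 11)}

for name in unused_blade_nodes(defined, occupied):
    print("rmdef", name)
```

The printed lines are only candidate commands; they should be reviewed before any object is actually removed.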
The rscan command reads the actual configuration of blade server in the CMM and creates node definitions in the xCAT database to reflect them. Run the rscan command against all of the CMMs to create a stanza file for the definitions of all the compute node servers.
rscan cmm -z >nodes.stanza
The Power 7 compute node stanza file looks like this:
SN#YL10JH184084:
objtype=node
nodetype=ppc,osi
slotid=1
id=1
mtm=789542X
serial=10F69BA
mpa=flexcmm01
parent=flexcmm01
hcp=70.0.0.41
groups=blade,all
mgt=fsp
cons=fsp
hwtype=blade
SN#Y110UF18P003:
objtype=node
nodetype=ppc,osi
slotid=3
id=1
mtm=789522X
serial=10F75AA
mpa=flexcmm01
parent=flexcmm01
hcp=70.0.0.22
groups=blade,all
mgt=fsp
cons=fsp
hwtype=blade
The x86 Intel based compute node stanza file looks like this:
SN#YL10JH184095:
objtype=node
nodetype=mp,osi
id=7
mtm=8737AC1
serial=23FFP69
mpa=flexcmm01
parent=flexcmm01
groups=blade,all
mgt=ipmi
cons=ipmi
hwtype=blade
SN#Y110UF18P005:
objtype=node
nodetype=mp,osi
id=8
mtm=789522X
serial=23FFP92
mpa=flexcmm01
parent=flexcmm01
groups=blade,all
mgt=ipmi
cons=ipmi
hwtype=blade
In the stanza file, each blade server is listed with its hcp (the FSP of the blade), mtm, serial and id attributes. In the stanza file above, the node SN#YL10JH184084 is a Power blade (nodetype=ppc, hwtype=blade, mpa=flexcmm01). To make the compute node servers easier to access and operate, edit the stanza file and give each node the name you want to use in its definition.
For Power 7 compute nodes, the administrator changes the object name and sets the hcp attribute to the IP of the FSP. For example, the definition of SN#YL10JH184084 can be modified as follows:
cmm01node01:
objtype=node
cons=fsp
groups=blade,all
hcp=70.0.0.41
hwtype=blade
id=1
mgt=fsp
mpa=cmm01
mtm=789542X
nodetype=ppc,osi
parent=flexcmm01
serial=10F69BA
slotid=1
For x86 compute nodes, the administrator changes the object name only. For example, the definition of SN#Y110UF18P005 can be modified as follows:
cmm01node08:
objtype=node
nodetype=mp,osi
id=8
mtm=789522X
serial=23FFP92
mpa=flexcmm01
parent=flexcmm01
groups=blade,all
mgt=ipmi
cons=ipmi
hwtype=blade
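For a large number of blades, the renaming described above can be scripted instead of edited by hand. The following is a minimal sketch, not an xCAT tool: the function names, the `<cmm>node<NN>` naming scheme, and the alias table that maps flexcmm01 to cmm01 are assumptions made for illustration. It uses slotid when present and falls back to id, as in the x86 stanzas.

```python
SAMPLE = """\
SN#YL10JH184084:
    objtype=node
    slotid=1
    id=1
    mpa=flexcmm01
    hcp=70.0.0.41
SN#Y110UF18P005:
    objtype=node
    id=8
    mpa=flexcmm01
"""

def parse_stanzas(text):
    """Parse 'name:' headers followed by 'attr=value' lines into a dict of dicts."""
    objects, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            current = objects.setdefault(line[:-1], {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return objects

def rename_by_slot(objects, cmm_alias=None):
    """Rename each object to <cmm>node<NN>, where NN is slotid or, if absent, id."""
    cmm_alias = cmm_alias or {}
    renamed = {}
    for attrs in objects.values():
        slot = int(attrs.get("slotid", attrs.get("id")))
        prefix = cmm_alias.get(attrs["mpa"], attrs["mpa"])
        renamed[f"{prefix}node{slot:02d}"] = attrs
    return renamed

def format_stanzas(objects):
    """Write the objects back out in stanza form."""
    lines = []
    for name, attrs in objects.items():
        lines.append(f"{name}:")
        lines.extend(f"    {key}={value}" for key, value in attrs.items())
    return "\n".join(lines) + "\n"

renamed = rename_by_slot(parse_stanzas(SAMPLE), {"flexcmm01": "cmm01"})
print(format_stanzas(renamed))
```

The output can be reviewed and saved as the edited stanza file. Attributes other than the object name are left unchanged.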
Then create the definitions in the database:
cat nodes.stanza | mkdef -z
Set the network configuration for the fsp
rspconfig blade network=*
To manage the blade servers more conveniently, the customer may want a cleaner name for each blade node. The following command can be used to modify a blade device name.
rspconfig singlenode textid="cmm01node01"
The following command can be used to change the device names of a group of blades to the node names that are defined in the xCAT DB.
rspconfig blade textid=*
1. Add the blade's fsp connections for the DFM management:
mkhwconn blade -t
2. Check that the connections are LINE_UP:
lshwconn blade
3. Make sure the blade servers are powered on:
rpower blade state
rpower blade on
rcons configuration
1 Make sure the SOL on CMM has been disabled
rspconfig cmm solcfg
rspconfig cmm solcfg=disable
2 Update conserver configuration
makeconservercf
#For Linux:
service conserver stop
service conserver start
#For AIX:
stopsrc -s conserver
startsrc -s conserver
3 Check rcons
Before running rcons to open the console, make sure the blades are powered on:
rpower bladenoderange state    # if a node is off, then run:
rpower bladenoderange on
rcons onebladenode
If the blade is powered off, rcons reports "Destination BLADE is in POWER OFF state, Please power it on and wait.". In that case, power on the blade:
rpower onebladenode state    # if the node is off, then run:
rpower onebladenode on
If the blade is powered off and then on while the console is open, the console must be refreshed manually.
Check hardware control setup to the nodes
To see whether your setup is correct at this point, run rpower to check the node status:
rpower bladenoderange stat
Update the mac table with the address of the node(s)
Get mac addresses: IBM Flex POWER 7 blades support getting the mac address through the CMM:
1 Set the 'getmac' attribute to 'blade'
chdef cmm01node01 getmac=blade
Since the firmware is not stable at present, the following 2 steps are recommended to get the mac address for the specified interface.
2 Run getmacs to display all the macs:
getmacs cmm01node01 -d
3 Set the 'installnic' attribute to select the interface whose mac address will be collected. The admin must know exactly which interface is connected. Note: If 4 mac addresses are returned, they are all mac addresses of the blade, and x ranges from 0 (which maps to eth0 of the blade) to 3. If 5 mac addresses are returned, the first one is the mac address of the blade's FSP, so x ranges from 1 (which maps to eth0 of the blade) to 4.
chdef cmm01node01 installnic=x
getmacs cmm01node01
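The rule in step 3 for choosing x can be written as a small function. This is only a sketch of the rule as stated in this document: the function name `installnic_index` is hypothetical, and the MAC addresses below are placeholders, not real output of getmacs.

```python
def installnic_index(macs, eth_index):
    """Return the installnic value x for interface eth<eth_index>,
    given the list of addresses shown by 'getmacs <node> -d'."""
    if len(macs) == 4:
        offset = 0  # all four addresses belong to the blade
    elif len(macs) == 5:
        offset = 1  # the first address belongs to the blade's FSP
    else:
        raise ValueError(f"expected 4 or 5 mac addresses, got {len(macs)}")
    if not 0 <= eth_index <= 3:
        raise ValueError("eth_index must be between 0 and 3")
    return eth_index + offset

# Placeholder addresses for illustration only.
four = ["00:00:00:00:00:0%d" % i for i in range(1, 5)]
five = ["00:00:00:00:00:0%d" % i for i in range(0, 5)]

print(installnic_index(four, 0))  # x for eth0 when 4 addresses are listed
print(installnic_index(five, 0))  # x for eth0 when 5 addresses are listed
```

The resulting value is what would be passed as x in the chdef command above.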
Set boot string: Before running rbootseq to set the boot device, make sure the blades are powered on:
rpower bladenoderange state    # if a node is off, then run:
rpower bladenoderange on
Then run rbootseq to set the boot string:
rbootseq bladenoderange net
After using rbootseq to set the boot string, run rpower with cycle to make the boot string permanent, and wait about 1 minute:
rpower bladenoderange cycle
Use network boot to start the installation
This document covers hardware control management only. Before starting the installation, you need to configure some services and prepare the network settings, the osimage, and so on.
1. Installation preparation
For Linux, you should refer to the link:
https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_pLinux_Clusters
For AIX, you should refer to the link:
https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_AIX_RTE_Diskfull_Nodes
https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_AIX_mksysb_Diskfull_Nodes
https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_AIX_Diskless_Nodes
NOTE: If you want to use the hardware control commands for IBM Flex, you should use the commands listed in this doc.
2 Use network boot to start the installation
rpower bladenoderange cycle
The 'cycle' process takes about 1 minute, after which the boot screen can be seen with 'rcons'.
The CMM firmware is updated by downloading the firmware from <http TBD>. Once the firmware has been downloaded, the compressed tar file needs to be uncompressed, unzipped, and placed in a directory on the EMS. This example uses /install/firmware as the directory for the firmware.
Once the firmware is unzipped and placed in this directory you can use the CMM update command to update the firmware on either one chassis at a time or all chassis managed by xCAT. The format of the command is:
flash the file and reboot afterwards
update -r -u http://<server>/<path to file>
flash, show progress, and reboot afterwards
update -v -r -u http://<server>/<path to file>
To update a single CMM use:
ssh USERID@cmm01 update -T system:mm[1] -v -u http://70.0.0.1/install/firmware/2pet07v/cmefs.uxp
If login without a password prompt is set up on all CMMs, you can use xCAT psh to update all CMMs in the cluster at once.
psh cmm -l USERID update -T system:mm[1] -v -u http://70.0.0.1/install/firmware/2pet07v/cmefs.uxp
The firmware of the blades' FSPs is updated with the xCAT rflash command, run from the xCAT management node. The admin should download the supported GFW from the IBM Fix Central website and place it in a directory that the xCAT management node can read.
1. Use rinv command to get the current firmware levels of the blades' FSPs:
rinv bladenoderange firm
(output to be added here)
2. Use the rflash command to update the firmware levels of the blades' FSPs, then validate that the new firmware is loaded:
For a disruptive firmware update, first make sure the blades are powered off.
rpower bladenoderange state
rpower bladenoderange off
And then use rflash to do the update:
rflash bladenoderange -p <directory> --activate disruptive
(output to be added here)
rinv bladenoderange firm
3. Verify that the blades are healthy and power on the blades:
rpower bladenoderange state
rvitals bladenoderange lcds
rpower bladenoderange on