
This document assumes that you have already purchased your GPFS product, have the Linux rpms available, and are familiar with the GPFS documentation: GPFS Library
These instructions are based on GPFS 3.3, GPFS 3.4, and GPFS 3.5. If you are using a different version of GPFS, you may need to adjust the information provided here.
Before proceeding with these instructions, you should have the following already completed for your xCAT cluster:
To set up GPFS in a statelite or stateless cluster, follow these steps:
Copy the GPFS rpms from your distribution media onto the xCAT management node (MN), following the instructions you received and accepting the product licenses as required. Suggested target location to put the rpms on the xCAT MN:
/install/post/otherpkgs/<os>/<arch>/gpfs
For rhels6 ppc64, the target location is:
/install/post/otherpkgs/rhels6/ppc64/gpfs
Note: (optional) If you want to use the optional TEAL GPFS connector feature, copy the teal-gpfs-sn rpm to /install/post/otherpkgs/<os>/<arch>/gpfs on your xCAT management node. The teal-gpfs-sn rpm is shipped with the TEAL product.
(optional) Download GPFS updates following your normal procedures. This may be a useful link: GPFS Support and Downloads. Suggested target location for the update rpms:
/install/post/otherpkgs/gpfs_updates
Install the GPFS base and update rpms on your xCAT management node following the GPFS documentation.
(optional) If you want to use the optional TEAL GPFS connector feature, install the teal-base and teal-gpfs rpms onto your xCAT management node. Refer to [Setting_up_TEAL_on_xCAT_Management_Node] for details.
Assuming that the kernel and architecture of your xCAT management node are the same as those of all your compute nodes, build the GPFS portability layer rpm following the instructions provided by GPFS. See:
/usr/lpp/mmfs/src/README
NOTE: This requires that the kernel source rpms are installed on your xCAT management node. For example, for SLES11, make sure the kernel-source and kernel-ppc64-devel rpms are installed. For rhels6, make sure the cpp.ppc64, gcc.ppc64, gcc-c++.ppc64, kernel-devel.ppc64, and rpm-build.ppc64 rpms are installed. If they are not, run "yum install cpp.ppc64 gcc.ppc64 gcc-c++.ppc64 kernel-devel.ppc64 rpm-build.ppc64 compat-libstdc++-33.ppc64 rsh.ppc64" on the rhels6 xCAT MN.
If the kernel of your compute nodes will be different from that of your xCAT management node, you will first need to install one node with all of the GPFS and kernel source rpms, follow these instructions to build the GPFS portability layer rpm there, copy that rpm back to your xCAT management node and then continue with the rest of these procedures to add the rpm to your image and install/configure GPFS.
Install the new rpm on your xCAT management node and copy it to your otherpkgs directory in preparation for installing it into your diskless images:
For sles11 ppc64, please run the following commands:
cd /usr/src/packages/RPMS/ppc64
rpm -Uvh gpfs.gplbin*.rpm
cp gpfs.gplbin*.rpm /install/post/otherpkgs/<os>/<arch>/gpfs
For rhels6 ppc64, please run the following commands:
cd /root/rpmbuild/RPMS/ppc64/
rpm -Uvh gpfs.gplbin*.rpm
cp gpfs.gplbin*.rpm /install/post/otherpkgs/rhels6/ppc64/gpfs/
createrepo /install/post/otherpkgs/rhels6/ppc64/gpfs/
If the createrepo command is not found, you may need to install the createrepo rpm package that is shipped with your OS distribution.
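Because the portability layer rpm is built against one specific kernel, it can help to confirm that the otherpkgs directory holds a gpfs.gplbin rpm for the kernel your compute nodes will boot. The following is a minimal sketch, not part of xCAT or GPFS. It assumes the rpm file name embeds the kernel release directly after "gpfs.gplbin-"; check the file name produced by your own build before relying on it. The function name has_gplbin_for_kernel is hypothetical.

```shell
# has_gplbin_for_kernel DIR KERNEL_RELEASE
# Returns 0 if DIR contains a gpfs.gplbin rpm whose name starts with
# "gpfs.gplbin-<KERNEL_RELEASE>-" (assumed naming convention), 1 otherwise.
has_gplbin_for_kernel() {
    for rpm in "$1"/gpfs.gplbin-"$2"-*.rpm; do
        # When nothing matches, the glob stays literal and -e fails.
        [ -e "$rpm" ] && return 0
    done
    return 1
}

# Example (the directory is the rhels6 ppc64 location used above; this checks
# the kernel of the machine the command runs on):
# has_gplbin_for_kernel /install/post/otherpkgs/rhels6/ppc64/gpfs "$(uname -r)" \
#     || echo "no gpfs.gplbin rpm found for kernel $(uname -r)" >&2
```

If your compute nodes run a different kernel from the management node, pass that kernel release instead of the output of uname -r.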
Include GPFS in your diskless image:
Install the optional xCAT-IBMhpc rpm on your xCAT management node. This rpm is available with xCAT and should already exist in the zypper or yum repository that you used to install xCAT on your management node. A new copy can be downloaded from Download xCAT.
To install the rpm in SLES:
zypper refresh
zypper install xCAT-IBMhpc
To install the rpm in Redhat:
yum install xCAT-IBMhpc
TEAL GPFS Connector Feature (optional):
If you have a hierarchical cluster with service nodes and want to use the optional TEAL GPFS connector feature, install the teal-gpfs-sn rpm on the GPFS collector service node (and backup GPFS collector service nodes if possible) following the instructions below:
Add teal-gpfs-sn to your otherpkgs list:
vi /install/custom/install/<ostype>/<service-profile>.otherpkgs.pkglist
Add two lines:
#INCLUDE:/opt/xcat/share/xcat/IBMhpc/gpfs/gpfs.otherpkgs.pkglist#
#INCLUDE:/opt/xcat/share/xcat/IBMhpc/teal/teal-gpfs-collector.otherpkgs.pkglist#
If your service nodes are already installed and running, update the software on your service nodes:
updatenode <service-noderange> -S
Add to pkglist: Edit your /install/custom/netboot/<ostype>/<profile>.pkglist and add the base IBMhpc pkglist:
For sles11 ppc64:
#INCLUDE:/opt/xcat/share/xcat/IBMhpc/IBMhpc.sles11.ppc64.pkglist#
For rhels6 ppc64:
#INCLUDE:/opt/xcat/share/xcat/IBMhpc/IBMhpc.rhels6.ppc64.pkglist#
Verify that the above sample pkglist contains the correct packages. If you need to make changes, you can copy the contents of the file into your <profile>.pkglist and edit as you wish instead of using the #INCLUDE: ...# entry.
Add to otherpkgs: Edit your /install/custom/netboot/<ostype>/<profile>.otherpkgs.pkglist and add:
#INCLUDE:/opt/xcat/share/xcat/IBMhpc/gpfs/gpfs.<arch>.otherpkgs.pkglist#
Verify that the above sample pkglist contains the correct gpfs packages. If you need to make changes, you can copy the contents of the file into your <profile>.otherpkgs.pkglist and edit as you wish instead of using the #INCLUDE: ...# entry.
If you are building a stateless image that will be loaded into the node's memory, you will want to remove all unnecessary files from the image to reduce the image size. Add to exclude list: Edit your /install/custom/netboot/<ostype>/<profile>.exlist and add:
#INCLUDE:/opt/xcat/share/xcat/IBMhpc/IBMhpc.<osver>.<arch>.exlist#
#INCLUDE:/opt/xcat/share/xcat/IBMhpc/gpfs/gpfs.exlist#
Verify that the above sample exclude lists contain the files and directories you want deleted from your diskless image. If you need to make changes, you can copy the contents of the file into your <profile>.exlist and edit as you wish instead of using the #INCLUDE: ...# entry.
Note: Several of the exclude list files shipped with xCAT-IBMhpc re-include files (with "+directory" syntax) that are normally deleted with the base exclude lists xCAT ships in /opt/xcat/share/xcat/netboot/<os>/compute.*.exlist. Keeping these files in the diskless image is required for the install and functionality of some of the HPC products.
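The pkglist, otherpkgs.pkglist, and exlist edits above all come down to adding #INCLUDE: lines to a file under /install/custom/netboot. If you script these edits, a small helper that only appends a line when it is missing keeps repeated runs from duplicating entries. This is a sketch under that assumption, not an xCAT tool; the function name add_include is hypothetical.

```shell
# add_include FILE SAMPLE_PATH
# Appends the line "#INCLUDE:SAMPLE_PATH#" to FILE unless that exact line
# is already present. Creates FILE (and its directory) if needed.
add_include() {
    file=$1
    line="#INCLUDE:$2#"
    mkdir -p "$(dirname "$file")" && touch "$file" || return 1
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example ("rh" and "compute" are placeholder values for <ostype> and <profile>):
# add_include /install/custom/netboot/rh/compute.exlist \
#     /opt/xcat/share/xcat/IBMhpc/gpfs/gpfs.exlist
```

The helper does not check that the sample file exists or that its contents suit your cluster, so you still need to review the included lists as described above.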
If you are building a statelite image, refer to the xCAT documentation for statelite images for creating persistent files, identifying mount points, and configuring your xCAT cluster for working with statelite images. For your GPFS support, add writable and persistent directories/files required by GPFS to your litefile table in the xCAT database:
tabedit litefile
In a separate window, copy the contents of /opt/xcat/share/xcat/IBMhpc/gpfs/litefile.csv
Paste them into your tabedit session, modify as needed for your environment, and save
When using persistent files, you should also make sure that you have an entry in your xCAT database statelite table pointing to the location for storing those files for each node.
Note: The sample litefile.csv contains an entry for the /gpfs directory which is the default mount point for your GPFS filesystems on the node. If you create your GPFS filesystems with a different mount point, you will need to change this entry accordingly.
Add to postinstall scripts:
Edit your /install/custom/netboot/<ostype>/<profile>.postinstall (make sure it has executable permission) and add:
/opt/xcat/share/xcat/IBMhpc/IBMhpc.<os>.postinstall $1 $2 $3 $4 $5
NODESETSTATE=genimage installroot=$installroot /opt/xcat/share/xcat/IBMhpc/gpfs/gpfs_updates
For rhels6 ppc64, edit /install/custom/netboot/rh/compute.postinstall (make sure it has executable permission) and add:
/opt/xcat/share/xcat/IBMhpc/IBMhpc.rhel.postinstall $1 $2 $3 $4 $5
NODESETSTATE=genimage installroot=$1 /opt/xcat/share/xcat/IBMhpc/gpfs/gpfs_updates
Review these sample scripts carefully and make any changes required for your cluster. Note that some of these scripts may change tuning values and other system settings. They will be run by genimage after all of your rpms are installed into the image. First the basic IBMhpc script will be run to create filesystems, turn on services, and set some tunables. Then the script to install the gpfs update rpms into the image will be run, and the script will also do some additional configuration work such as creating a non-functional nsddevices script and adding GPFS paths to the default profile. Verify that these scripts will work correctly for your cluster. If you wish to make changes to any of these scripts, copy it to /install/postscripts and adjust the above entry in the postinstall script to invoke your updated copy.
NOTE: If you are creating an image that will be used for GPFS I/O nodes, you MUST make a local copy of the gpfs_updates script and comment out the lines that create a non-functional nsddevices script. You need a working copy of this script for your I/O server so that it can find the disks it needs to build your GPFS filesystems.
Once your nodes have been added to the GPFS cluster (see below), you can add the following after the previous entry:
installroot=$installroot /install/postscripts/gpfs_mmsdrfs
For rhels6 ppc64, the entry should be:
installroot=$1 /install/postscripts/gpfs_mmsdrfs
You need to copy this script from the /opt/xcat/share/xcat/IBMhpc/gpfs directory to /install/postscripts and edit it to provide the correct location of your master mmsdrfs file (see section below for more information).
Including the mmsdrfs file in the image before the node has been added will cause the mmaddnode command to fail with an error that GPFS thinks the node already belongs to another GPFS cluster.
Network boot your nodes:
nodeset <noderange> netboot for all your nodes
rnetboot <noderange> to boot your nodes
When the nodes are up, verify that the GPFS rpms are all correctly installed.
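One way to make this check repeatable is to compare the list of installed packages on a node against the packages you expect. The sketch below is illustrative only: the function name missing_pkgs is hypothetical, and the package names in the example are assumptions, so take the real list from the gpfs otherpkgs.pkglist you actually used. It expects one installed package per line in the "name-version-release" form that rpm -qa prints.

```shell
# missing_pkgs "EXPECTED NAMES" < installed-package-list
# Prints every expected package name for which no installed package
# of the form "<name>-<version>..." is found on stdin.
missing_pkgs() {
    installed=$(cat)
    for pkg in $1; do
        found=no
        for have in $installed; do
            case $have in
                "$pkg"-*) found=yes; break ;;
            esac
        done
        [ "$found" = yes ] || printf '%s\n' "$pkg"
    done
}

# Example for a single node. "node1" is a placeholder; xdsh prefixes each
# output line with the node name, which sed strips here:
# xdsh node1 rpm -qa | sed 's/^[^:]*: //' \
#     | missing_pkgs "gpfs.base gpfs.gpl gpfs.gplbin gpfs.msg.en_US"
```

An empty result means every expected package was found on that node.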
GPFS installation documentation advises having all your nodes running and installed with the GPFS rpms before creating your GPFS cluster. However, with very large clusters, you may choose to only have your main GPFS infrastructure nodes up and running, create your cluster, and then add your compute nodes later. If so, only install and boot those nodes that are critical to configuring your GPFS cluster and bringing your GPFS filesystems online. You can network boot the compute nodes later and add them to your GPFS configuration using the mmaddnode command.
As stated at the beginning of this page, these instructions assume that you have already created a diskless image with a base AIX operating system and tested a network installation of that image to at least one compute node. This will ensure you understand all of the processes, networks are correctly defined, NIM operates well, NFS is correct, xCAT postscripts run, and you can xdsh to the node with proper ssh authorizations. For detailed instructions, see the xCAT document for deploying AIX diskless nodes [XCAT_AIX_Diskless_Nodes].
xCAT recommends that you use the mknimimage --sharedroot option to use the NIM shared root support for your diskless nodes. Your nodes will be stateless in that they will not maintain persistent files in the / root directory across reboots, but the node NIM initialization process will be much quicker, and the load on your NFS server (NIM master) will be significantly reduced.
Include GPFS in your diskless image:
Install the optional xCAT-IBMhpc rpm on your xCAT management node. This rpm is available with xCAT and should already exist in the directory that you downloaded your xCAT rpms to. It did not get installed when you ran the instxcat script. A new copy can be downloaded from Download xCAT.
To install the rpm:
cd <your xCAT rpm directory>
rpm -Uvh xCAT-IBMhpc*.rpm
TEAL GPFS Connector Feature (optional):
If you want to use the optional TEAL GPFS connector feature, install the teal.base and teal.gpfs installp packages onto your xCAT management node. Refer to [Setting_up_TEAL_on_xCAT_Management_Node] for details.
If you have a hierarchical cluster with service nodes and want to use the optional TEAL GPFS connector feature, install the teal.gpfs-sn installp package on the GPFS collector service node (and backup GPFS collector service nodes if possible) following the instructions below:
cp /opt/xcat/share/xcat/IBMhpc/teal/teal-gpfs-collector.bnd /install/nim/installp_bundle
nim -o define -t installp_bundle -a server=master -a location=/install/nim/installp_bundle/teal-gpfs-collector.bnd teal
This assumes that the IBMhpc_base and gpfs NIM installp_bundle resources are already defined. If they are not, follow the instructions later in this section to define them.
chdef -t osimage -o <image_name> -p installp_bundle="IBMhpc_base,gpfs,teal"
If your service nodes are already installed and running, update the software on your service nodes:
updatenode <service-noderange> -S
Copy the GPFS product packages and PTFs from your distribution media onto the xCAT management node (MN). Suggested target location to put the packages on the xCAT MN:
/install/post/otherpkgs/aix/ppc64/gpfs
Note: (optional) If you want to use the optional TEAL GPFS connector feature, copy teal.gpfs-sn package to /install/post/otherpkgs/aix/ppc64/gpfs on your xCAT management node. The teal.gpfs-sn package is shipped with the TEAL product.
The packages that will be installed by the xCAT HPC Integration support are listed in sample bundle files. Review the following file to verify you have all the product packages you wish to install (instructions are provided below for copying and editing this file if you choose to use a different list of packages):
/opt/xcat/share/xcat/IBMhpc/gpfs/gpfs.bnd
Add the GPFS packages to the lpp_source used to build your diskless image:
inutoc /install/post/otherpkgs/aix/ppc64/gpfs
nim -o update -a packages=all -a source=/install/post/otherpkgs/aix/ppc64/gpfs <lpp_source_name>
Add additional base AIX packages to your lpp_source:
Some of the HPC products require additional AIX packages that may not be part of your default AIX lpp_source. Review the following file to verify all the AIX packages needed by the HPC products are included in your lpp_source (instructions are provided below for copying and editing this file if you choose to use a different list of packages):
/opt/xcat/share/xcat/IBMhpc/IBMhpc_base.bnd
To list the contents of your lpp_source, you can use:
nim -o showres <lpp_source_name>
And to add additional packages to your lpp_source, you can use the nim update command similar to above specifying your AIX distribution media and the AIX packages you need.
Create NIM bundle resources for base AIX prerequisites and for GPFS packages:
cp /opt/xcat/share/xcat/IBMhpc/IBMhpc_base.bnd /install/nim/installp_bundle
nim -o define -t installp_bundle -a server=master -a location=/install/nim/installp_bundle/IBMhpc_base.bnd IBMhpc_base
cp /opt/xcat/share/xcat/IBMhpc/gpfs/gpfs.bnd /install/nim/installp_bundle
nim -o define -t installp_bundle -a server=master -a location=/install/nim/installp_bundle/gpfs.bnd gpfs
Review these sample bundle files and make any changes as desired.
Add the bundle resource to your xCAT diskless image definition:
chdef -t osimage -o <image_name> -p installp_bundle="IBMhpc_base,gpfs"
Update the image:
Note: Verify that there are no nodes actively using the current diskless image. NIM will fail if there are any NIM machine definitions that have the SPOT for this image allocated. If there are active nodes accessing the image, you will either need to power them down and run rmdkslsnode for those nodes, or you will need to create a new image and then switch your nodes to that image later. For more information and detailed instructions on these options, see the xCAT document for updating software on AIX nodes: [Updating_AIX_Software_on_xCAT_Nodes].
mknimimage -u <image_name>
Add base HPC and GPFS postscripts
cp -p /opt/xcat/share/xcat/IBMhpc/IBMhpc.postscript /install/postscripts
cp -p /opt/xcat/share/xcat/IBMhpc/gpfs/gpfs_updates /install/postscripts
chdef -t group -o <compute nodegroup> -p postscripts="IBMhpc.postscript,gpfs_updates"
Review these sample scripts carefully and make any changes required for your cluster. Note that some of these scripts may change tuning values and other system settings. The scripts will be run on the node after it has booted as part of the xCAT diskless node postscript processing.
Optionally update the GPFS mmsdrfs configuration file in your image.
Once your nodes have been added to the GPFS cluster (see Setting_up_GPFS_in_a_Statelite_or_Stateless_Cluster/#build-your-gpfs-cluster below), you can run the following script on the xCAT management node to update the GPFS mmsdrfs configuration file in your image.
cp -p /opt/xcat/share/xcat/IBMhpc/gpfs/gpfs_mmsdrfs /install/postscripts
vi /install/postscripts/gpfs_mmsdrfs
# Edit script to set GPFS master SOURCE config file, xCAT IMAGE names, and other values
/install/postscripts/gpfs_mmsdrfs
See section Setting_up_GPFS_in_a_Statelite_or_Stateless_Cluster/#gpfs-cluster-definition-file-mmsdrfs below for more information.
Note: Including the mmsdrfs file in the image before the node has been added will cause the mmaddnode command to fail with an error that GPFS thinks the node already belongs to another GPFS cluster.
Follow the instructions in the xCAT AIX documentation [XCAT_AIX_Diskless_Nodes] to network boot your nodes.
GPFS installation documentation advises having all your nodes running and installed with the GPFS lpps before creating your GPFS cluster. However, with very large clusters, you may choose to only have your main GPFS infrastructure nodes up and running, create your cluster, and then add your compute nodes later. If so, only install and boot those nodes that are critical to configuring your GPFS cluster and bringing your GPFS filesystems online. You can network boot the compute nodes later and add them to your GPFS configuration using the mmaddnode command.
Follow the GPFS documentation to create your GPFS cluster, define manager nodes, quorum nodes, accept licenses, create NSD disk definitions, and create your filesystems. Once you have verified that your GPFS cluster is operational and that the GPFS filesystems are available to the currently active nodes, you can add your remaining compute nodes to the GPFS cluster.
The mmaddnode command will accept a file containing a list of node names as input. xCAT can help you create this file. Simply run the xCAT nodels command against the desired noderange and redirect the output to a file. This file can then be used as input to GPFS commands. For example:
nodels compute > /tmp/gpfsnodes
mmaddnode -N /tmp/gpfsnodes
GPFS stores its cluster definition in the file /var/mmfs/gen/mmsdrfs. When changes are made to the GPFS cluster, GPFS contacts all active nodes and updates the copy of this file on those nodes. With diskless images, this file may not be persistent across reboots of the node. Therefore, it is important to keep an updated copy in each diskless image to ensure the node can correctly rejoin the GPFS cluster if it is rebooted. xCAT provides the following sample script to help you with this:
/opt/xcat/share/xcat/IBMhpc/gpfs/gpfs_mmsdrfs
This script uses rsync to keep the mmsdrfs file updated in your images. You will need to copy this script to your /install/postscripts directory and edit it to identify the correct SOURCE location of your master mmsdrfs file, the target IMAGEs to be updated, and if using xCAT hierarchy with local disks on the service node, the noderange of the SERVICE nodes to be kept current. When you invoke the script, you can specify if you also want to run packimage or liteimg for each of your Linux images, and if you want to sync your /install/netboot directory to your service nodes if updates have been made:
cp /opt/xcat/share/xcat/IBMhpc/gpfs/gpfs_mmsdrfs /install/postscripts
vi /install/postscripts/gpfs_mmsdrfs
/install/postscripts/gpfs_mmsdrfs packimage syncinstall
(Specify "packimage" or "liteimg" only for your Linux stateless or statelite images, respectively. "syncinstall" is required for AIX as the first option.)
If you will be making limited changes to your GPFS cluster configuration, you may choose to run this script manually after each set of changes. However, if you are making frequent changes, or if you may forget to update the images after making changes to the GPFS cluster, you may choose to run this script periodically as a cron job. For example, add the following to your crontab to check every 10 minutes:
*/10 * * * * /install/postscripts/gpfs_mmsdrfs packimage syncinstall >> /tmp/gpfs_mmsdrfs.log 2>&1
(Specify "packimage" or "liteimg" only for your Linux stateless or statelite images, respectively. "syncinstall" is required for AIX as the first option.)
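When you send the output of a cron job to a log file, the order of the redirections decides whether error messages reach the file. The shell applies redirections from left to right, so stderr must be duplicated onto stdout after stdout has been pointed at the log. The following self-contained sketch shows the difference; emit is a hypothetical stand-in for any command, such as the gpfs_mmsdrfs script.

```shell
# emit stands in for any command that writes to both stdout and stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

both=$(mktemp)
stdout_only=$(mktemp)

# stdout is appended to the file first, then stderr is duplicated onto it:
# both lines land in the file.
emit >> "$both" 2>&1

# stderr is duplicated onto the current stdout first, and only afterwards is
# stdout redirected: the file receives the stdout line alone.
emit 2>&1 >> "$stdout_only"
```

With the second form, error messages from a cron job do not reach the log; cron typically mails them to the crontab owner instead.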
For Linux, you should also update your image postinstall scripts in /install/custom/netboot/<ostype>/<profile>.postinstall to run this script from genimage to place a current copy of the mmsdrfs file into your image each time genimage is run. When doing so, make sure to invoke this script with the $installroot environment variable set to ignore your IMAGE settings in the script and only update the image that genimage is being run for.
Note: If you have a copy of the mmsdrfs file in an image that is booted on a node that has NOT been added to the GPFS cluster yet, the mmaddnode command will fail with an error that GPFS thinks the node is already part of another cluster. You will need to remove the runtime copy of the /var/mmfs/gen/mmsdrfs file on the node before running mmaddnode:
xdsh <noderange> rm /var/mmfs/gen/mmsdrfs
There are several ways you can start GPFS on your cluster nodes.
Use the xCAT xdsh command to run the GPFS mmstartup command individually on all nodes, or use GPFS to distribute the commands by running "mmstartup -a". Note that for very large clusters, running an mmstartup command to all nodes in the cluster at one time can cause a heavy load on your network. Therefore, using xdsh with appropriate fanout values may be a better choice.
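As an illustration of the xdsh approach, the sketch below wraps the call in a small function with a dry-run switch so you can inspect the command before it touches the cluster. It relies on the xdsh -f option to set the fanout. The function names run and start_gpfs, the DRY_RUN variable, and the fanout value in the example are all hypothetical; choose a fanout that suits your network.

```shell
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# start_gpfs NODERANGE FANOUT
# Runs mmstartup on each node in NODERANGE, at most FANOUT nodes at a time.
start_gpfs() {
    run xdsh "$1" -f "$2" /usr/lpp/mmfs/bin/mmstartup
}

# Example: preview the command for the "compute" group with a fanout of 32.
# DRY_RUN=1
# start_gpfs compute 32
```

Unset DRY_RUN to run the command for real on the xCAT management node.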
The mmchconfig command allows you to set a cluster-wide option to automatically start GPFS anytime a node is booted:
mmchconfig autoload=yes
The default setting for this option is "autoload=no". When you are first setting up GPFS across your cluster, you will probably choose NOT to turn this on until after you have initially installed all your nodes and done some cluster-wide verification.
When changing this setting for your GPFS cluster, be sure to update the mmsdrfs file in all of your diskless images as described above to ensure the correct value is available when the node reboots.
If you want to use the optional TEAL GPFS connector feature, wait until the GPFS cluster is correctly configured, then verify that the TEAL packages are correctly installed: the teal-base and teal-gpfs rpms (Linux) or the teal.base and teal.gpfs installp packages (AIX) on your xCAT management node, and the teal-gpfs-sn rpm (Linux) or the teal.gpfs-sn installp package (AIX) on your GPFS collector service node (and backup GPFS collector service nodes if possible). You can then specify the selected service node as your TEAL GPFS collector node. Run the following command on the xCAT management node:
/opt/teal/bin/tlgpfschnode -C <GPFS cluster name> -N <node name> -e
For example:
/opt/teal/bin/tlgpfschnode -C gpfscluster.cluster.com -N myservicenode1 -e
You can create your own postscript in /install/postscripts and add an entry to the xCAT postscripts table for your nodes. The postscript can be as simple as:
/usr/lpp/mmfs/bin/mmsdrrestore
/usr/lpp/mmfs/bin/mmstartup
If GPFS is using a network interface that is not immediately available at node boot time (e.g. an IB interface that needs to be configured from an xCAT postscript and may take a little extra time becoming stable), you may wish to add some network health checks to your script before starting GPFS.
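A health check of this kind can be as simple as retrying a probe command until it succeeds or a retry limit is reached. The sketch below is one possible shape for such a postscript, not a script shipped with xCAT. The function name wait_until, the retry count and delay in the example, and the GPFS_PEER variable (an address reachable only over the GPFS network) are hypothetical.

```shell
# wait_until RETRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds. Returns 0 on the first success,
# or 1 if all RETRIES attempts fail.
wait_until() {
    retries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        "$@" && return 0
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Example postscript body (commented out so the sketch has no side effects):
# if wait_until 30 10 ping -c 1 "$GPFS_PEER" >/dev/null 2>&1; then
#     /usr/lpp/mmfs/bin/mmsdrrestore
#     /usr/lpp/mmfs/bin/mmstartup
# else
#     echo "GPFS network not ready; GPFS not started" >&2
#     exit 1
# fi
```

A ping only shows that the interface passes traffic; if your fabric needs a stronger check, substitute a probe of your own for the ping command.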
xCAT previously shipped support for using GPFS in stateless Linux clusters in
/opt/xcat/share/xcat/netboot/add-on/autogpfs
Note: This support (autogpfs) is no longer maintained, although it may still work for you if you choose to use it. The methods outlined in this document are preferred.
Wiki: Download_xCAT
Wiki: IBM_HPC_Stack_in_an_xCAT_Cluster
Wiki: Power_775_Cluster_Recovery
Wiki: Setting_Up_IBM_HPC_Products_on_a_Statelite_or_Stateless_Login_Node
Wiki: Setting_Up_IBM_HPC_Products_on_an_I-O_Node
Wiki: Setting_up_IBM_HPC_Products_on_an_IO_node
Wiki: Setting_up_TEAL_on_xCAT_Management_Node
Wiki: Setting_up_all_IBM_HPC_products_in_a_Statelite_or_Stateless_Cluster
Wiki: Setup_for_Power_775_Cluster_on_xCAT-MN
Wiki: Setup_for_Power_775_Cluster_on_xCAT_MN
Wiki: Updating_AIX_Software_on_xCAT_Nodes
Wiki: XCAT_AIX_Diskless_Nodes