
This document assumes that you have already purchased your LoadLeveler product, have the Linux rpms available, and are familiar with the LoadLeveler documentation: Tivoli Workload Scheduler LoadLeveler library
These instructions are based on LoadLeveler 4.1 and LoadLeveler 5.1. If you are using a different version of LoadLeveler, you may need to make adjustments to the information provided here.
When installing LoadLeveler in an xCAT cluster, it is assumed that you will be using the xCAT MySQL or DB2 database to store your LoadLeveler configuration data. Different versions of LoadLeveler support different operating systems, hardware architectures, and databases. Refer to the LoadLeveler documentation for the support required for your cluster. For example, Power 775 requires LoadLeveler 5.1 on AIX 7.1 or RedHat EL 6 with DB2.
Before proceeding with these instructions, you should have the following already completed for your xCAT cluster:
LoadLeveler requires that userids be common across all nodes in a LoadLeveler cluster, and that the user home directories are shared. There are many different ways to handle user management and to set up a cluster-wide shared home directory (for example, using NFS or through a global filesystem such as GPFS). These instructions assume that the shared home directory has already been created and mounted across the cluster and that the xCAT management node and all xCAT service nodes are also using this directory. You may wish to have xCAT invoke your custom postbootscripts on nodes to help set this up.
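For example, a minimal custom postbootscript that mounts an NFS-exported shared home directory might look like the following Linux sketch. The server name and export path are placeholders; substitute your own:

```shell
#!/bin/sh
# Hypothetical postbootscript: mount the cluster-wide shared home directory
# over NFS. NFSSERVER and EXPORT are placeholders -- substitute your own.
NFSSERVER=nfsserver
EXPORT=/export/home
MNTPT=/home

# add a persistent fstab entry if one is not already present
grep -q "$NFSSERVER:$EXPORT" /etc/fstab || \
    echo "$NFSSERVER:$EXPORT $MNTPT nfs rw,hard,intr 0 0" >> /etc/fstab

# mount it now if it is not already mounted
mkdir -p $MNTPT
mount | grep -q " $MNTPT " || mount $MNTPT
```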
Copy the LoadLeveler packages from your distribution media onto the xCAT management node (MN). Suggested target location to put the packages on the xCAT MN:
/install/post/otherpkgs/<os>/<arch>/loadl
For rhels6 ppc64, the target location is:
/install/post/otherpkgs/rhels6/ppc64/loadl
Note: LoadLeveler 4.1.1 on Linux requires a special Java rpm to run its license acceptance script. This is not required for LoadLeveler 5.1. The correct version of this rpm is identified in the LoadLeveler product documentation (at the time of this writing, the rpm was IBMJava2-142-ppc64-JRE-1.4.2-5.0.ppc64.rpm, but please verify with the LL documentation). Ensure the Java rpm is included in the loadl otherpkgs directory.
For Linux, you should create repodata in your /install/post/otherpkgs/<os>/<arch>/* directory so that yum or zypper can be used to install these packages and automatically resolve dependencies for you:
createrepo /install/post/otherpkgs/<os>/<arch>/loadl
If the createrepo command is not found, you may need to install the createrepo rpm package that is shipped with your Linux OS. For SLES 11, this is found on the SDK media.
Following the LoadLeveler Installation Guide, create the loadl group and userid:
On Linux:
groupadd loadl
useradd -g loadl loadl
On AIX:
mkgroup -a loadl
mkuser pgrp=loadl groups=loadl home=/<user_home>/loadl loadl
Commonly, the <user_home> is the /home directory. When creating the loadl group and userid, the administrator can change this location as needed. It is assumed that you have already created a common home directory in the cluster for all users either in NFS, GPFS, or some other shared filesystem. Also, LoadLeveler requires that its administrative userid have either rsh or ssh access across all nodes in the cluster and to the LL central manager. Make sure you have set this up for the loadl userid. For example, to create a .rhosts file (as root):
nodels compute > /<user_home>/loadl/.rhosts
echo "<MN hostname>" >> /<user_home>/loadl/.rhosts
chown loadl:loadl /<user_home>/loadl/.rhosts
Or, if you are using ssh for LoadLeveler communications, follow your ssh documentation to set up .ssh keys for the userid.
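For example, because the home directory is shared across the cluster, a minimal key setup (a sketch run as the loadl userid, assuming OpenSSH defaults) is:

```shell
# Run these as the loadl userid. Because the home directory is shared,
# authorizing the key once grants access on every node that mounts it.
mkdir -p $HOME/.ssh
ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys
```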
Note: xCAT does not provide any general function for just setting up a user's ssh keys. However, if the user will also be running xCAT xdsh and other commands, the xCAT wiki page on [Granting_Users_xCAT_privileges] includes instructions on how to provide the user with this access, including automatically setting up ssh keys for that user.
If the user will not be authorized to run xCAT commands, you can still "cheat" and take advantage of a side-effect of the xdsh command to set up your ssh keys:
su - <userid>
/opt/xcat/bin/xdsh xxx -K ## "xxx" can be any string
xdsh will prompt you for the user's password. Enter the correct password, and then xdsh will fail with:
Error: Permission denied for request
Even though the xdsh command failed, it still created a /<user_home>/<userid>/.ssh directory with ssh keys. Create an authorized_keys file for the user:
cat /<user_home>/<userid>/.ssh/id_rsa.pub >> /<user_home>/<userid>/.ssh/authorized_keys
Since the home directory is shared across the cluster, the userid now has non-password prompted ssh access to all nodes and to the xCAT management node.
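You can quickly verify the access, for example:

```shell
su - <userid>
# from the user's shell -- should run without a password prompt
ssh <any_compute_node> date
```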
Sync the loadl group and userid to all nodes in the cluster:
See the step below on "(Optional) Synchronize system configuration files" for more details.
The role of the central manager is to examine the job's requirements and find one or more machines in the LoadLeveler cluster that will run the job. Once it finds the machine(s), it notifies the scheduling machine. To set up the LoadLeveler Central Manager, install LoadLeveler on the node that will act as the Central Manager. When setting up LoadLeveler in an xCAT non-hierarchical cluster, it is recommended to set up the xCAT management node as the LoadLeveler Central Manager; in an xCAT hierarchical cluster, it is recommended to set up one of your xCAT service nodes as the LoadLeveler Central Manager. If you have a different LoadLeveler Central Manager configuration, please see the LoadLeveler documentation for more information.
To use the LoadLeveler database configuration option with the xCAT database, you will need to install LoadLeveler on your xCAT management node. You may also choose to configure your management node or service nodes as your LL central manager and resource manager. Following the LoadLeveler Installation Guide for details, install LoadLeveler on your xCAT management node. These are the high-level steps:
On Linux:
Make sure the following packages are installed on your management node:
compat-libstdc++-33.ppc64
libXmu.ppc64
libXtst.ppc64
libXp.ppc64
libXScrnSaver.ppc64
cd /install/post/otherpkgs/<os>/<arch>/loadl
IBM_LOADL_LICENSE_ACCEPT=yes rpm -Uvh ./LoadL-full-license*.rpm
rpm -Uvh ./LoadL-scheduler-full*.rpm ./LoadL-resmgr-full*.rpm
On AIX:
cd /install/post/otherpkgs/aix/ppc64/loadl
inutoc .
installp -Y -X -d . all
installp -X -B -d . all
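To confirm the packages installed, a quick check (Linux and AIX respectively):

```shell
# Linux: list the installed LoadLeveler rpms
rpm -qa | grep -i loadl
# AIX: list the installed LoadLeveler filesets and their levels
lslpp -l "LoadL*"
```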
You may choose to install LoadLeveler on your xCAT service node and configure it as your LL central manager or resource manager. Follow Setting_up_LoadLeveler_in_a_Stateful_Cluster/#linux to install LoadLeveler on your xCAT Linux service node. Follow Setting_up_LoadLeveler_in_a_Stateful_Cluster/#AIX to install LoadLeveler on your xCAT AIX service node.
Note: Installing LoadLeveler on xCAT service nodes follows the same process as installing LoadLeveler on compute nodes, with only the differences below:
Python
PyODBC
For example:
cp /opt/xcat/share/xcat/IBMhpc/loadl/loadl.bnd /install/nim/installp_bundle/loadl-sn.bnd
Make sure the LoadL.scheduler packages exist in the /install/post/otherpkgs/aix/ppc64/loadl directory
Add a new line to /install/nim/installp_bundle/loadl-sn.bnd:
I:LoadL.scheduler
nim -o define -t installp_bundle -a server=master -a location=/install/nim/installp_bundle/loadl-sn.bnd loadl-sn
chdef -t osimage -o <image_name> -p installp_bundle="IBMhpc_base,loadl-sn"
Make your own copy of /opt/xcat/share/xcat/IBMhpc/loadl/loadl_install, then rename and edit it.
cp -p /opt/xcat/share/xcat/IBMhpc/loadl/loadl_install /install/postscripts/loadl_install-sn
Modify /install/postscripts/loadl_install-sn, and change "aix_loadl_bin" to:
aix_loadl_bin=/usr/lpp/LoadL/full/bin
For example:
cp -p /opt/xcat/share/xcat/IBMhpc/loadl/loadl-5103.otherpkgs.pkglist /opt/xcat/share/xcat/IBMhpc/loadl/loadl-5103-sn.otherpkgs.pkglist
#ENV:IBM_LOADL_LICENSE_ACCEPT=yes#
loadl/LoadL-full-license*
#loadl/LoadL-scheduler-full*
loadl/LoadL-resmgr-full*
To:
#ENV:IBM_LOADL_LICENSE_ACCEPT=yes#
loadl/LoadL-full-license*
loadl/LoadL-scheduler-full*
loadl/LoadL-resmgr-full*
Note: By default, it is assumed that you will install LoadLeveler 5.1.0.3 or higher. If you wish to install LoadLeveler 5.1.0.2 or below, make your own copy of /opt/xcat/share/xcat/IBMhpc/loadl/loadl_install, then rename and edit it.
For example:
cp -p /opt/xcat/share/xcat/IBMhpc/loadl/loadl_install /install/postscripts/loadl_install-sn
Modify /install/postscripts/loadl_install-sn, and change these three lines to:
#linux_loadl_license_script="/opt/ibmll/LoadL/sbin/install_ll -c resmgr"
linux_loadl_license_script="/opt/ibmll/LoadL/sbin/install_ll"
linux_loadl_bin=/opt/ibmll/LoadL/full/bin
After your xCAT management node and service nodes are installed with the LoadLeveler resmgr and scheduler packages, you can start to configure the xCAT management node or an xCAT service node as the LoadLeveler Central Manager and Resource Manager.
Generally, any LoadLeveler node can be specified as the LoadLeveler Central Manager and Resource Manager, provided that it has the LoadLeveler resmgr and scheduler packages installed, has remote access to the database when using the database configuration option, and has network connectivity to all of the nodes that will be part of your LoadLeveler cluster. For an xCAT HPC cluster without hierarchy, it is recommended that you set up your xCAT management node as the LoadLeveler Central Manager. For an xCAT HPC hierarchical cluster, it is recommended that you set up one of your xCAT service nodes as the LoadLeveler Central Manager. If you are setting up a service node as your central manager in an xCAT hierarchical cluster, you may also need to set up network routing so that the xCAT management node can communicate with the service node using the interface defined as the central manager. This may not be the same interface that the xCAT management node normally uses to communicate with the service node if they are on different networks. Follow the instructions in [Setup_IP_routes_between_the_MN_and_the_CNs].
To specify the LoadLeveler Central Manager, you can use the llinit command if you are using file-based configuration:
llinit -cm <central manager host>
OR edit the LoadL_config configuration file:
CENTRAL_MANAGER_LIST = <list of central manager and alt central managers>
OR, if you plan to use the database configuration option, edit the ll-config stanza in the cluster configuration file:
CENTRAL_MANAGER_LIST = <list of central manager and alt central managers>
OR if you have already set up and are running the LoadLeveler database configuration option (see instructions below), change the attribute directly in the database:
llconfig -c CENTRAL_MANAGER_LIST=<central manager host>
Unless otherwise specified, LoadLeveler will use the central manager as the resource manager.
LoadLeveler provides the option to use configuration data from files or from a database. When setting up LoadLeveler in an xCAT HPC cluster, it is recommended that you use the database configuration option. This will use the xCAT MySQL or DB2 database. The DB2 database is required for Power 775 clusters. The hints and tips provided here will allow you to use xCAT to help set up default LoadLeveler database support. However, be sure that you have read through all of the LoadLeveler documentation for this support and understand what needs to be done to set it up. You will need to make modifications to the processes outlined here to take advantage of advanced LoadLeveler features and to set this up correctly for your environment.
All LoadLeveler nodes that will access the database must have access to the database server, and must have ODBC installed and configured correctly. When setting up LoadLeveler in an xCAT non-hierarchical cluster, it is recommended that you set up your xCAT management node as the LoadLeveler DB access node. When setting up LoadLeveler in an xCAT hierarchical cluster, it is recommended that you set up your xCAT service nodes as the LoadLeveler DB access nodes. Each LoadLeveler DB access node will serve its xCAT compute nodes as LoadLeveler "remotely configured nodes". The xCAT service nodes already have database access granted.
If you are running xCAT with the MySQL database, you will need to set up LoadLeveler to use this same database. Note that MySQL is not supported on Power 775 clusters. You must use DB2.
If you do not already have xCAT running with MySQL, follow the instructions in
[Setting_Up_MySQL_as_the_xCAT_DB] to convert your xCAT database to MySQL on xCAT management node. After that, follow the instructions below to set up ODBC for LoadLeveler DB access.
After your xCAT cluster is running with MySQL, to add LoadLeveler DB access on your xCAT management node and configure the MySQL database for use with LoadLeveler, follow the instructions in Setting_Up_MySQL_as_the_xCAT_DB/#add-odbc-support to set up ODBC support. Only the basic steps, which are fully documented in the xCAT doc, are listed here:
Note: As of Oct 2010, the AIX deps package will automatically install the perl-DBD-MySQL and unixODBC-* packages when installed on the Management or Service Nodes. On Redhat/Fedora and on SLES, MySQL comes as part of the OS. You may find these already installed.
cd <your xCAT-mysql rpm directory>
rpm -Uvh unixODBC-*
rpm -Uvh mysql-connector-odbc-*
With xCAT 2.6 and newer, run the command
mysqlsetup -o -L
This will set up /etc/odbcinst.ini, /etc/odbc.ini, and .odbc.ini in the root home directory and set the MySQL log-bin-trust-function-creators variable to on.
With xCAT 2.5 and older, run the command
mysqlsetup -o
and manually set the MySQL log-bin-trust-function-creators variable to ON using the MySQL interactive command:
mysql -u root -p
<enter password when prompted>
SET GLOBAL log_bin_trust_function_creators=1;
# On Linux:
cp /root/.odbc.ini /<user_home>/loadl
chown loadl:loadl /<user_home>/loadl/.odbc.ini
# On AIX:
cp /.odbc.ini /<user_home>/loadl
chown loadl:loadl /<user_home>/loadl/.odbc.ini
You can verify this access:
su - loadl
# On Linux:
/usr/bin/isql -v xcatdb
# On AIX:
/usr/local/bin/isql -v xcatdb
After your xCAT cluster is running with MySQL, to configure LoadLeveler DB access on xCAT service nodes, and configure the MySQL database for use with LoadLeveler, follow the instructions in Setting_Up_MySQL_as_the_xCAT_DB/#add-odbc-support - section "Setup the ODBC on the Service Node" to set up ODBC support. The basic steps are:
Note: As of Oct 2010, the AIX deps package will automatically install the perl-DBD-MySQL and unixODBC-* packages when installed on the Management or Service Nodes. On Redhat/Fedora and on SLES, MySQL comes as part of the OS. With xCAT 2.6, the sample service package lists shipped with xCAT contain the ODBC rpms. You may find these already installed.
To include the rpms and ODBC files in the service node image, first add the rpms to the service node package list:
On Linux, add the rpms to the otherpkgs.pkglist file:
vi /install/custom/install/<ostype>/<service-profile>.otherpkgs.pkglist
# add the following entries:
unixODBC
mysql-connector-odbc
On AIX, add the rpms to the bundle file (assuming this bundle file is already defined to NIM and included in your xCAT osimage definition):
vi /install/nim/installp_bundle/xCATaixSN<version>.bnd
# add the following entries:
I:X11.base.lib
R:mysql-connector-odbc-*
For AIX 6.1, the bundle file is /install/nim/installp_bundle/xCATaixSN61.bnd; for AIX 7.1, it is /install/nim/installp_bundle/xCATaixSN71.bnd
With xCAT 2.6, xCAT provides an odbcsetup postbootscript. Add this to the list of postscripts run on your servicenode to create the required ODBC files:
chdef service -p postbootscripts=odbcsetup
With xCAT 2.5 and older, you will need to add the ODBC files to the synclist for your service node image:
vi /install/custom/install/<ostype>/<service-profile>.synclist
#add the following entries:
/etc/odbcinst.ini /etc/odbc.ini -> /etc/
# On Linux:
/root/.odbc.ini -> /root/
# On AIX:
/.odbc.ini -> /
and if you don't already have a synclist defined for your image:
chdef -t osimage -o <service node image> -p synclists=/install/custom/install/<ostype>/<service-profile>.synclist
If your service nodes are actively running, push out the changes now:
For xCAT 2.6 and newer:
updatenode -P odbcsetup
For xCAT 2.5 and older:
updatenode service -S
updatenode service -F
(These need to be run as two separate commands since the files need to get pushed out AFTER the packages are installed).
If you are running xCAT with the DB2 database, you will need to set up LoadLeveler to use this same database. If you do not already have xCAT running with DB2, follow the instructions in [Setting_Up_DB2_as_the_xCAT_DB] to convert your xCAT database to DB2 on xCAT management node. After that, follow the instruction below to set up ODBC for LoadLeveler DB access.
After your xCAT cluster is running with DB2, to configure LoadLeveler DB access on xCAT management node, and configure the DB2 database for use with LoadLeveler, follow the instructions in Setting_Up_DB2_as_the_xCAT_DB/#adding-odbc-support - section "Setup the ODBC on the Management Node" to set up ODBC support.
After your xCAT cluster is running with DB2, to configure LoadLeveler DB access on xCAT service nodes, and configure the DB2 database for use with LoadLeveler, follow the instructions in Setting_Up_DB2_as_the_xCAT_DB/#adding-odbc-support - section "Setup ODBC on the Service Nodes" to set up ODBC support. Also read the section on automatic setup of DB2 on the Service Nodes during install: Setting_Up_DB2_as_the_xCAT_DB/#setting-up-the-db2-client-on-the-service-nodes.
After your xCAT cluster is running with the MySQL or DB2 database, and your xCAT management node or service nodes are set up with ODBC support for LoadLeveler DB access following the instruction above, you can start to configure the xCAT management node or xCAT service nodes as the LoadLeveler DB access nodes.
Generally, any LoadLeveler node that has access to the database can be specified as a LoadLeveler DB access node. In an xCAT non-hierarchical cluster, it is recommended that you set up your xCAT management node as the LoadLeveler DB access node; in an xCAT hierarchical cluster, set up your xCAT service nodes as the LoadLeveler DB access nodes. If you have a different LoadLeveler DB access node configuration, please see the LoadLeveler documentation for more information.
Modify the /etc/LoadL.cfg master configuration file on the xCAT management node or xCAT service nodes to add this line:
LoadLDB = xcatdb
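For reference, a minimal /etc/LoadL.cfg for the database option might look like the sketch below; verify the keyword names against your LoadLeveler release's Installation Guide:

```
LoadLUserid  = loadl
LoadLGroupid = loadl
LoadLDB      = xcatdb
```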
Follow the LoadLeveler instructions to perform the necessary steps to initialize and configure your cluster using the database configuration option. This includes properly editing your /etc/LoadL.cfg master configuration file, determining your LoadLeveler configuration information, and running the llconfig -i command to initialize the database.
Note: By default, the xCAT HPC Integration support will only install the LoadLeveler resmgr rpm on the compute nodes in your cluster. Both the LoadLeveler resmgr and scheduler rpms are installed on your xCAT management node or your xCAT service nodes, so when you run llinit on your xCAT management node or your xCAT service nodes, it will configure the default LoadL_admin and LoadL_config files to reference these. You will need to modify the BIN and NEGOTIATOR values in the LoadL_config file so that they work correctly for all the compute nodes in your cluster. If you are running with the LoadLeveler database configuration option, use the llconfig -c command to change these values:
On Linux:
BIN = /opt/ibmll/LoadL/resmgr/full/bin/
NEGOTIATOR=/opt/ibmll/LoadL/scheduler/full/bin/LoadL_negotiator
On AIX:
BIN = /usr/lpp/LoadL/resmgr/full/bin/
NEGOTIATOR=/usr/lpp/LoadL/scheduler/full/bin/LoadL_negotiator
After you have made any needed updates, initialize the LoadLeveler database configuration:
llconfig -i -t <your cluster config file> -f <LoadL_config file you have edited>
Note: After llconfig -i is executed on the xCAT management node to initialize the LL database, it creates /install/postscripts/llserver.sh and /install/postscripts/llcompute.sh postscripts and the /install/postscripts/LoadL directory with files used by these scripts. You can use these postscripts via the xCAT postscript process to configure the selected nodes as the LoadLeveler database server nodes or compute nodes when running LoadLeveler with the database option. Please see the descriptions in llserver.sh and llcompute.sh for details.
To configure the LoadLeveler database server nodes, specify the llserver.sh as xCAT postbootscripts. For example:
To run during service node installs:
chdef <loadl db servers> -p postbootscripts=llserver.sh
To run immediately on a service node that is already installed:
updatenode <loadl db servers> -P llserver.sh
To configure the LoadLeveler compute nodes, specify the llcompute.sh as xCAT postbootscripts. For example:
To run during compute node installs or diskless boot:
chdef <loadl compute nodes> -p postbootscripts=llcompute.sh
To run immediately on a compute node that is already installed:
updatenode <loadl compute nodes> -P llcompute.sh
Note: When using service nodes make sure the postscripts and the LoadL subdirectory are copied to the /install/postscripts directories on each service node before the updatenode command is issued.
For example:
xdcp service -v -R /install/postscripts/* /install/postscripts
To continue to set up LoadLeveler in a Linux stateful cluster, follow these steps:
Include LoadLeveler in your stateful image definition:
Install the optional xCAT-IBMhpc rpm on your xCAT management node and service nodes. This rpm is available with xCAT and should already exist in your zypper or yum repository that you used to install xCAT on your management node. A new copy can be downloaded from: Download xCAT.
To install the rpm in SLES:
zypper refresh
zypper install xCAT-IBMhpc
To install the rpm in Redhat:
yum install xCAT-IBMhpc
vi /install/custom/install/<ostype>/<service-profile>.otherpkgs.pkglist
If this is a new file, add the following to use the service profile shipped with xCAT:
#INCLUDE:/opt/xcat/share/xcat/install/sles/service.<osver>.<arch>.otherpkgs.pkglist
Either way, add this line:
xcat/xcat-core/xCAT-IBMhpc
updatenode <service-noderange> -S
You should create repodata in your /install/post/otherpkgs/<os>/<arch>/* directory so that yum or zypper can be used to install these packages and automatically resolve dependencies for you:
createrepo /install/post/otherpkgs/<os>/<arch>/loadl
If the createrepo command is not found, you may need to install the createrepo rpm package that is shipped with your Linux OS. For SLES 11, this is found on the SDK media.
Add to postscripts:
Copy the IBMhpc postscript to the xCAT postscripts directory:
cp /opt/xcat/share/xcat/IBMhpc/IBMhpc.postscript /install/postscripts
Review this sample script carefully and make any changes required for your cluster. Note that it may change tuning values and other system settings. This script will run after all OS rpms are installed on the node and the xCAT default postscripts have run, but before the node reboots for the first time.
Add this script to the postscripts list for your node. For example, if all nodes in your compute nodegroup will be using this script:
chdef -t group -o compute -p postscripts=IBMhpc.postscript
Add to pkglist:
Edit your /install/custom/install/<ostype>/<profile>.pkglist and add the base IBMhpc pkglist:
#INCLUDE:/opt/xcat/share/xcat/IBMhpc/IBMhpc.sles11.ppc64.pkglist#
For rhels6 ppc64, edit /install/custom/install/rh/compute.pkglist:
#INCLUDE:/opt/xcat/share/xcat/IBMhpc/IBMhpc.rhels6.ppc64.pkglist#
Verify that the above sample pkglist contains the correct packages. If you need to make changes, you can copy the contents of the file into your <profile>.pkglist and edit as you wish instead of using the #INCLUDE: ...# entry.
Note: This pkglist support is available with xCAT 2.5 and newer releases. If you are using an older release of xCAT, you will need to add the entries listed in these pkglist files to your Kickstart or AutoYaST install template file.
Add to otherpkgs:
Edit your /install/custom/install/<ostype>/<profile>.otherpkgs.pkglist and add:
#INCLUDE:/opt/xcat/share/xcat/IBMhpc/loadl/loadl-5103.otherpkgs.pkglist#
Note: If you are using LoadLeveler 5.1.0.2 or below, please use pkglist /opt/xcat/share/xcat/IBMhpc/loadl/loadl.otherpkgs.pkglist
Verify that the above sample pkglist contains the correct LoadLeveler packages. If you need to make changes, you can copy the contents of the file into your <profile>.otherpkgs.pkglist and edit as you wish instead of using the #INCLUDE: ...# entry.
Add to postbootscripts:
Copy the LoadLeveler postbootscript to the xCAT postscripts directory:
cp /opt/xcat/share/xcat/IBMhpc/loadl/loadl_install /install/postscripts
Review and edit this script to meet your needs. This script will run on the node after the OS has been installed, the node has rebooted for the first time, and the xCAT default postbootscripts have run. Note that this script will only install the LoadL-resmgr-full rpm. You will need to edit this script if you wish to also install the LoadL-scheduler rpm on all of your compute nodes.
Add this script to the postbootscripts list for your node. For example, if all nodes in your compute nodegroup will be using this script and the nodes' attribute postbootscripts value is otherpkgs:
chdef -t group -o compute -p postbootscripts=loadl_install-5103
If you already have unique postbootscripts attribute settings for some of your nodes (i.e. the value contains more than simply "otherpkgs" and that value is not part of the above group definition), you may need to change those node definitions directly:
chdef <noderange> -p postbootscripts=loadl_install-5103
Note: If you are using LoadLeveler 5.1.0.2 or below, please use postscript loadl_install instead
(Optional) Synchronize system configuration files:
LoadLeveler requires that userids be common across the cluster. There are many tools and services available to manage userids and passwords across large numbers of nodes. One simple way is to use common /etc/passwd files across your cluster. You can do this using xCAT's syncfiles function. Create the following file:
vi /install/custom/install/<ostype>/<profile>.synclist
add the following line:
/etc/hosts /etc/passwd /etc/group /etc/shadow -> /etc/
When the node is installed or 'updatenode <noderange> -F' is run, these files will be copied to your nodes. You can periodically re-sync these files as changes occur in your cluster. See the xCAT documentation for more details: [Sync-ing_Config_Files_to_Nodes]
If your nodes are already installed with the correct OS, and you are adding LoadLeveler software to the existing nodes, continue with these instructions and skip the next step to "Network boot the nodes".
The updatenode command will be used to add the LoadLeveler software and run the postscripts using the pkglist and otherpkgs.pkglist files created above. Note that support was added to updatenode in xCAT 2.5 to install packages listed in pkglist files (previously, only otherpkgs.pkglist entries were installed). If you are running an older version of xCAT, you may need to add the pkglist entries to your otherpkgs.pkglist file or install those packages in some other way on your existing nodes.
You will want updatenode to run zypper or yum to install all of the packages. Make sure the nodes' repositories have access to the base OS rpms:
SLES:
xdsh <noderange> zypper repos --details | xcoll
RedHat:
xdsh <noderange> yum repolist -v | xcoll
If you installed these nodes with xCAT, you probably still have repositories set up pointing to your distro directories on the xCAT MN or SNs. If no OS repository is listed, add appropriate remote repositories using the zypper ar command or by adding entries to /etc/yum.repos.d.
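For example, a sketch of adding the repositories by hand; the URLs assume the default xCAT HTTP export of /install on the management node and must be adjusted for your cluster:

```shell
# SLES: add the distro directory on the xCAT MN as a remote repository
zypper ar http://<xcat-mn>/install/sles11/ppc64/1 sles-base
zypper refresh

# RedHat: create a .repo file pointing at the distro directory on the xCAT MN
cat > /etc/yum.repos.d/rhels6-base.repo <<'EOF'
[rhels6-base]
name=rhels6 base OS packages
baseurl=http://<xcat-mn>/install/rhels6/ppc64
enabled=1
gpgcheck=0
EOF
```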
Also, for updatenode to use zypper or yum to install packages from your /install/post/otherpkgs directories, make sure you have run the createrepo command for each of your otherpkgs directories (see the instructions in the "Updating xCAT nodes" document [Using_Updatenode]).
Synchronize configuration files to your nodes (optional):
updatenode <noderange> -F
Update the software on your nodes:
updatenode <noderange> -S
Run postscripts and postbootscripts on your nodes:
updatenode <noderange> -P
Network boot your nodes:
To continue to set up LoadLeveler in an AIX stateful cluster, follow these steps:
Include LoadLeveler in your stateful image:
Install the optional xCAT-IBMhpc rpm on your xCAT management node. This rpm is available with xCAT and should already exist in the directory that you downloaded your xCAT rpms to. It did not get installed when you ran the instxcat script. A new copy can be downloaded from: Download xCAT.
To install the rpm:
cd <your xCAT rpm directory>
rpm -Uvh xCAT-IBMhpc*.rpm
The packages that will be installed by the xCAT HPC Integration support are listed in sample bundle files. Review the following file to verify you have all the product packages you wish to install (instructions are provided below for copying and editing this file if you choose to use a different list of packages):
/opt/xcat/share/xcat/IBMhpc/loadl/loadl.bnd
Add the LoadLeveler packages to the lpp_source used to build your image:
inutoc /install/post/otherpkgs/aix/ppc64/loadl
nim -o update -a packages=all -a source=/install/post/otherpkgs/aix/ppc64/loadl <lpp_source_name>
Some of the HPC products require additional AIX packages that may not be part of your default AIX lpp_source. Review the following file to verify all the AIX packages needed by the HPC products are included in your lpp_source (instructions are provided below for copying and editing this file if you choose to use a different list of packages):
/opt/xcat/share/xcat/IBMhpc/IBMhpc_base.bnd
To list the contents of your lpp_source, you can use:
nim -o showres <lpp_source_name>
And to add additional packages to your lpp_source, you can use the nim update command similar to above specifying your AIX distribution media and the AIX packages you need.
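For example, a sketch of such an update; substitute your media location, the fileset names you need, and your lpp_source name:

```shell
# Copy additional AIX filesets from the AIX product media into the lpp_source
nim -o update -a packages="X11.base.lib" -a source=/dev/cd0 <lpp_source_name>
```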
Create NIM bundle resources for base AIX prerequisites and for your LoadLeveler packages:
cp /opt/xcat/share/xcat/IBMhpc/IBMhpc_base.bnd /install/nim/installp_bundle
nim -o define -t installp_bundle -a server=master -a location=/install/nim/installp_bundle/IBMhpc_base.bnd IBMhpc_base
cp /opt/xcat/share/xcat/IBMhpc/loadl/loadl.bnd /install/nim/installp_bundle
nim -o define -t installp_bundle -a server=master -a location=/install/nim/installp_bundle/loadl.bnd loadl
Review these sample bundle files and make any changes as desired. Note that the loadl.bnd file will only install the LoadL.resmgr lpp. If you wish to install the full LoadLeveler product on all of your compute nodes, edit this bundle file, and make corresponding changes to the loadl_install postscript below.
Add the bundle resources to your xCAT image definition:
chdef -t osimage -o <image_name> -p installp_bundle="IBMhpc_base,loadl"
Add base HPC and LoadLeveler postscripts
cp -p /opt/xcat/share/xcat/IBMhpc/IBMhpc.postscript /install/postscripts
cp -p /opt/xcat/share/xcat/IBMhpc/loadl/loadl_install-5103 /install/postscripts
chdef -t group -o <compute nodegroup> -p postscripts="IBMhpc.postscript,loadl_install-5103"
Review these sample scripts carefully and make any changes required for your cluster. Note that some of these scripts may change tuning values and other system settings. The scripts will be run on the node after it has booted as part of the xCAT diskless node postscript processing.
(Optional) Synchronize system configuration files:
LoadLeveler requires that userids be common across the cluster. There are many tools and services available to manage userids and passwords across large numbers of nodes. One simple way is to use common /etc/passwd files across your cluster. You can do this using xCAT's syncfiles function. Create the following file:
vi /install/custom/install/aix/<profile>.synclist
add the following lines:
/etc/hosts /etc/passwd /etc/group -> /etc/
/etc/security/passwd /etc/security/group /etc/security/limits /etc/security/roles -> /etc/security/
Add this syncfile to your image:
chdef -t osimage -o <imagename> synclists=/install/custom/install/aix/<profile>.synclist
When the node is installed or 'updatenode <noderange> -F' is run, these files will be copied to the node. You can periodically re-sync these files as changes occur in your cluster. See the xCAT documentation for more details: [Sync-ing_Config_Files_to_Nodes]
If your nodes are already installed with the correct OS, and you are adding LoadLeveler software to the existing nodes, continue with these instructions and skip the next step to "Network boot the nodes".
The updatenode command will be used to synchronize configuration files, add the LoadLeveler software and run the postscripts. To have updatenode install both the OS prereqs and the base LoadLeveler packages, complete the previous instructions to add LoadLeveler software to your image.
Synchronize configuration files to your nodes (optional):
updatenode <noderange> -F
Update the software on your nodes:
updatenode <noderange> -S installp_flags="-agQXY"
Run postscripts and postbootscripts on your nodes:
updatenode <noderange> -P
Follow the instructions in the xCAT AIX documentation [XCAT_AIX_RTE_Diskfull_Nodes] to network boot your nodes.
There are several ways you can start LoadLeveler on your cluster nodes.
Use the xCAT xdsh command to run the LoadLeveler llrctl start command individually on all nodes, or use LoadLeveler to distribute the commands by running "llrctl -g start". Note that for very large clusters, running the llrctl -g command to start the daemons on all the nodes in the cluster can take a long time since this is a serial operation from the LoadLeveler central manager. Therefore, using xdsh with appropriate fanout values may be a better choice.
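For example, to start the daemons in parallel with a fanout of 64 (adjust the llrctl path to match the install locations shown earlier for your release and platform):

```shell
# Start LoadLeveler on all compute nodes, 64 at a time
xdsh compute -f 64 /opt/ibmll/LoadL/resmgr/full/bin/llrctl start
```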
You can set up /etc/inittab or /etc/init.d to automatically start LoadLeveler when your node boots. However, if your shared home directory is in GPFS, and this is a large cluster using a network interface that may take a little extra start time at node boot, this may not be a reliable way to start the daemons.
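As a sketch of boot-time startup (the llrctl paths assume the default install locations shown earlier; verify them, and whether a delayed start is needed, for your cluster):

```shell
# AIX: add an inittab entry to start LoadLeveler once at boot
mkitab 'loadl:2:once:su - loadl -c "/usr/lpp/LoadL/full/bin/llrctl start" >/dev/null 2>&1'

# Linux: append a start command to rc.local
echo 'su - loadl -c "/opt/ibmll/LoadL/resmgr/full/bin/llrctl start"' >> /etc/rc.d/rc.local
```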
Wiki: Download_xCAT
Wiki: Granting_Users_xCAT_privileges
Wiki: IBM_HPC_Stack_in_an_xCAT_Cluster
Wiki: Power_775_Cluster_on_MN
Wiki: Setting_up_all_IBM_HPC_products_in_a_Stateful_Cluster
Wiki: Sync-ing_Config_Files_to_Nodes
Wiki: Using_Updatenode
Wiki: XCAT_2.6.6_Release_Notes
Wiki: XCAT_AIX_RTE_Diskfull_Nodes
Wiki: XCAT_pLinux_Clusters_775