xCAT Wiki

An extreme cluster/cloud administration toolkit

Brought to you by: besawn, cxhong, gurevich, obihoernchen, victorhu

Setting_Up_an_AIX_Hierarchical_Cluster

There is a newer version of this page. You can find it here.

Overview
Switch to a relational database
Setup for P7 IH Cluster
Defining Cluster Nodes
Install the Service Nodes
Setup of GPFS I/O Server nodes
Install the cluster nodes
Using login nodes
Using a backup service node
Cleanup
- Removing NIM machine definitions
- Removing NIM resources

Overview

In an xCAT cluster the single point of control is the xCAT management node. However, in order to provide sufficient scaling and performance for large clusters, it may also be necessary to have additional servers to help handle the deployment and management of the cluster nodes. In an xCAT cluster these additional servers are referred to as service nodes.

For an xCAT on AIX cluster there is a primary NIM master which is on the management node. The service nodes are configured as additional NIM masters. All commands are run on the management node. The xCAT support automatically handles the NIM setup on the low level service nodes and the distribution of the NIM resources. All installation resources for the cluster are managed from the primary NIM master. The NIM resources are automatically replicated on the low level masters when they are needed.

You can set up one or more service nodes in an xCAT cluster. The number you need will depend on many factors including the number of nodes in the cluster, the type of node deployment, the type of network etc. (As a general "rule of thumb" you should plan on having at least 1 service node per 128 cluster nodes.)

A service node may also be used to run user applications in most cases.

For reliability, availability, and serviceability purposes users may wish to configure backup service nodes in hierarchical cluster environments.

The backup service node will be configured to be able to quickly take over from the original service node if a problem occurs.

This is not an automatic failover feature. You will have to initiate the switch from the primary service node to the backup manually. The xCAT support will handle most of the setup and transfer of the nodes to the new service node.

See Section 5, "Using a backup service node", later in this document for details on how to set this up.

An xCAT service node must be installed with xCAT software as well as additional prerequisite software.

AIX service nodes must be diskfull (NIM standalone) systems. Diskless xCAT service nodes are not currently supported for AIX.

In the process described below the service nodes will be deployed using a standard AIX/NIM "rte" network installation. If you are using multiple service nodes you may want to consider creating a "golden" mksysb image that you can use as a common image for all the service nodes. See the xCAT document named "Cloning AIX nodes (using an AIX mksysb image)" for more information on using mksysb images. See [XCAT_AIX_mksysb_Diskfull_Nodes].

In this document it is assumed that the cluster nodes will be diskless. The cluster nodes will be deployed using a common diskless image. It is also possible to deploy the cluster nodes using "rte" or "mksysb" type installs.

Before starting this process it is assumed you have configured an xCAT management node by following the process described in the AIX overview document. [XCAT_AIX_Cluster_Overview_and_Mgmt_Node]

Switch to a relational database

When using service nodes you must switch to a database that supports remote access. XCAT currently supports MySQL, PostgreSQL, and DB2. As a convenience, the xCAT site provides downloads for MySQL and PostreSQL. The default SQlite database cannot be used.

If you are using P7 IH hardware in your cluster you must use DB2.

( xcat-postgresql-snap201007150920.tar.gz (sourceforge.net) and xcat-mysql-201005260807.tar.gz (sourceforge.net) )

See the following xCAT documents for instructions on how to configure these databases.

[Setting_Up_MySQL_as_the_xCAT_DB]

[Setting_Up_PostgreSQL_as_the_xCAT_DB]

[Setting_Up_DB2_as_the_xCAT_DB]

Note: When configuring the database you will need to add access for each of your service nodes. This process for this is described in the documentation mentioned above.

Note #2: The sample xCAT bundle files mentioned below contain commented-out entries for each of the supported databases. You must edit the bundle file you use to un-comment the appropriate database rpms. If the required database packages are not installed on the service node then the xCAT configuration will fail.

Note #3: The database tar files that are available on the xCAT web site may contain multiple versions of RPMs - one for each AIX operating system level. When you are copying required software to your lpp_source resource make sure you copy the rpm that coincides with your OS level. Do not copy multiple versions of the same rpm to the lpp_source directory.

Setup for P7 IH Cluster

NOTE: This support will be available in xCAT 2.6 and beyond.

Defining Cluster Nodes

Note: At this point your hardware components should be defined in the xCAT database.

For more information on methods you can use to define xCAT nodes refer to the following document. ( [Defining_cluster_nodes_on_System_P] )

The examples provided below illustrate how to define the nodes and use the mkvm command to create new partitions.

Define nodes

Using the mkdef command

You can use the mkdef command to define one or more cluster nodes. You could provide the node information on the command line or you could create a stanza file and pass it to the command.

At a minimum the "mgt" and "groups" attribute of the node definitions must be set.

For example, to create a set of new node definitions you could run a command similar to the following.

**mkdef -t node -o clstrn04-clstrn10 groups=all,aixnodes mgt=hmc**

See the mkdef man page for more examples. (mkdef )

When you need to define a large number of nodes you may wish to use the a cluster configuration file as described below.

Using a cluster configuration file.

**This support will be available in xCAT 2.6 and beyond.**

A cluster configuration file contains information that can be used to create initial xCAT cluster definitions for a new cluster. It can include hardware descriptions, node definitions, site information, naming conventions, IP addresses etc.

It is intended to be used in a cluster environment that has very regular naming conventions. If your cluster does not follow consistent naming patterns you should define attribute values manually instead.

A cluster configuration file can be used with the xCAT xcatsetup command to "prime" the xCAT database.

The format of a cluster configuration file is described in the man page for the xcatsetup command. (xcatsetup)

Once you have created the configuration file you can run the xcatsetup command as follows:

**xcatsetup &lt;cluster_config_file&gt;**

Create LPARs

You can use the xCAT mkvm command to create additional logical partitions.

For example, to create the new partitions based on the partition profile for node clstrn01 you could run the following command. (This assumes you have already set up the partition for clstrn01 and that it is defined in the xCAT database.)

**mkvm -V clstn01 -i 4 -n clstrn04-clstrn10**

See the mkvm man page for more details and usage. (mkvm)

Also see the following for a description of using mkvm with the xCAT DFM support. [XCAT_System_p_Hardware_Management#Using_the_.2Avm_commands_to_define_paritions_in_xCAT_DFM]

Note: In some cases it is necessary to re-boot the hardware after running the mkvm command. Refer to the man page and documentation listed above for a more complete description.

Create LPARs and node definitions for P7 IH Cluster ????????

**TBD**

This support will be available in xCAT 2.6 and beyond.

**NOTE:**

This section will describe the process for  creating the P7 IH Octants/LPARs. It will describe the default P7 IH systems
configuration, and will describe which P7 IH octants are designated as xCAT service nodes. This section should provide
enough detail for the xCAT administrator to execute xCAT commands for the P7 IH CECs. We should try and provide the
commands and should provide sample files that can be referenced.


**TBD**

For P7/IH, Create Cluster Config File - The customer creates an hw/cluster configuration data file that contains
the info enumerated below. The purpose of this file is for the customer to describe how all the discovered
hw components should be logically arranged, ordered, and configured. This is because, during the discovery phase
(step 5), we will 1st SLP discover all of the raw hw components (HMCs, BPAs, FSPs) on the service network. But
this only gives us basic info about each component (MTMS, MAC, etc). We don't know the physical arrangement of
the components, and we don't know the IP/hostnames the customer wants for each one. Therefore, the customer
below provides that info. You can think of this as a cluster plan or blueprint - used to automate the configuration
of the system and used to verify the cluster configuration. The cluster config file also allows the customer to
provide some basic info about the other HPC products so that a basic set up of them can be accomplished throughout
the cluster. The format of the file will be typical stanza file format. See the cluster config file mini-design,
[1], for more details and the xcatsetup man page [2] for the exact format and keywords in the files.__

Install the Service Nodes

For service nodes:

Add "service" to the "groups" attribute of all the service nodes.
Specify the services that the service nodes will be providing. At a minimum, the "setupnameserver" attribute of the service nodes must be explicitly set to "yes" or "no". (Ex. "setupnameserver=no")

For non-service nodes:

Add the name of the service node for a node to the node definition. (Ex. "servicenode=xcatSN01") This is the name of the service node as it is known by the management node.
Set the "xcatmaster" attribute. This must be the name of the service node as it is known by the node. This may or may not be the same value as the "servicenode " attribute.

Note: if the "servicenode" and "xcatmaster" values are not set then xCAT will default to use the value of the "master" attribute in the xCAT "site" definition.

The updated stanzas might look something like the following.

_cl2sn01:_
_objtype=node_
_nodetype=lpar,osi_
_id=9_
_hcp=hmc01_
_pprofile=lpar9_
_parent=Server-9117-MMA-SN10F6F3D_
_groups=all,service_
_setupnameserver=no_
_mgt=hmc_
_cl2cn27:_
_objtype=node_
_nodetype=lpar,osi_
_servicenode=cl2sn01 # name of service node as known by the #management node_
_xcatmaster=cl2sn01-en1 # name of the service node as known by the #node_
_id=7_
_hcp=hmc01_
_pprofile=lpar6_
_parent=Server-9117-MMA-SN10F6F3D_
_groups=all_
_mgt=hmc_
_Server-9117-MMA-SN10F6F3D:_
_objtype=node_
_nodetype=fsp_
_id=5_
_model=9118-575_
_serial=02013EB_
_hcp=hmc01_
_pprofile=_
_parent=Server-9458-10099201WM_A_
_groups=fsp,all_
_mgt=hmc_

Note: The rscan command supports an option to automatically create node definitions in the xCAT database. To do this the LPAR name gathered by rscan is used as the node name and the command sets several default values. If you use the "-w" option make sure the LPAR name you defined will be the name you want used as your node name.

Specify the services provided by the service nodes

Distributing services to your service nodes will help alleviate the load on your management node and prevent potential bottlenecks from occurring in your cluster. You choose the services that you would like started on your service node by setting the attributes in the servicenode table. When the xcatd daemon is started or restarted on the service node, a check will be made by the xCAT code that the services from this table are configured on the service node and running, and will stop and start the service as appropriate.

This check will be done each time the xcatd is restarted on the service node. If you do not wish this check to be done, and the service not to be restarted, use the reload option when starting the daemon on the service node:

 xcatd -r

For example, the following command will setup the service node group to start the nameserver, conserver, and NTP server automatically on the service nodes. You may want to setup other services such as the monitoring server on the service node. See tabdump -d servicenode for the services available.

 chdef -t group -o service setupnameserver=1 setupconserver=1 setupntp=1

Note: When using the chdef commands, the attributes names for setting these server values do not match the actual names in the servicenode table. This is to avoid conflicts with corresponding attribute names in the noderes table. To see the correct attribute names to use with chdef:

chdef -h -t node

and search for attributes that begin with "setup".

If you do not want any service started on the service nodes, then run the following command to define the service nodes but start no services:

 chdef -t group -o service setupnameserver=0

Add IP addresses and hostnames to /etc/hosts

Make sure all node hostnames are added to /etc/hosts. Refer to the section titled "Add cluster nodes to the /etc/hosts file" in the following document for details:

[XCAT_AIX_Cluster_Overview_and_Mgmt_Node]

If you are working on a P7 IH cluster, to bring up all the HFI interfaces on service nodes, make sure IP/hostnames for all the HFI interfaces on service nodes have been updated in /etc/hosts.

Create a Service Node operating system image

Reminder: If you wish to create separate file systems for your NIM resources you should do that before continuing. For example, you might want to create a seaparate file system for /install and one for any dump resources you may need. You may also wish to change the primary hostname of your management node to the cluster management interface. This is described in XCAT_AIX_Cluster_Overview_and_Mgmt_Node

Use the xCAT mknimimage command to create an xCAT osimage definition as well as the required NIM installation resources.

An xCAT osimage definition is used to keep track of a unique operating system image and how it will be deployed.

In order to use NIM to perform a remote network boot of a cluster node the NIM software must be installed, NIM must be configured, and some basic NIM resources must be created.

The mknimimage will handle all the NIM setup as well as the creation of the xCAT osimage definition. It will not attempt to reinstall or reconfigure NIM if that process has already been completed. See the mknimimage man page for additional details.

Note: If you wish to install and configure NIM manually you can run the AIX nim_master_setup command (Ex. "nim_master_setup -a mk_resource=no -a device=<source directory>") or use other NIM commands such as nimconfig.

By default, the mknimimage command will create the NIM resources in subdirectories of /install. Some of the NIM resources are quite large (1-2G) so it may be necessary to increase the file size limit.

For example, to set the file size limit to "unlimited" for the user "root" you could run the following command.

 _**/usr/bin/chuser fsize=-1 root**_

When you run the command you must provide a source for the installable images. This could be the AIX product media, a directory containing the AIX images, or the name of an existing NIM lpp_source resource. You must also provide a name for the osimage you wish to create. This name will be used for the NIM SPOT resource that is created as well as the name of the xCAT osimage definition. The naming convention for the other NIM resources that are created is the osimage name followed by the NIM resource type, (ex. " 61cosi_lpp_source").

In this example we need resources for installing a NIM "standalone" type machine using the NIM "rte" install method. (This type and method are the defaults for the mknimimage command but you can specify other values on the command line.)

For example, to create an osimage named "610SNimage" using the images contained in the /myimages directory you could issue the following command.

 _**mknimimage -s /myimages 610SNimage**_

(Creating the NIM resources could take a while!)

Note: To populate the /myimages directory you could copy the software from the AIX product media using the AIX gencopy command. For example you could run "gencopy -U -X -d /dev/cd0 -t /myimages all".

By default the command will create NIM lpp_source, spot, and bosinst_data resources. You can also specify alternate or additional resources on the command line using the "attr=value" option, ("<nim resource type>=<resource name>").

For example:

 _mknimimage -s /dev/cd0 610SNimage resolv_conf=my_resolv_conf_

Any additional NIM resources specified on the command line must be previously created using NIM interfaces. (Which means NIM must already have been configured previously. )

Note: Another alternative is to run mknimimage without the additional resources and then simply add them to the xCAT osimage definition later. You can add or change the osimage definition at any time. When you initialize and install the nodes xCAT will use whatever resources are specified in the osimage definition.

When the command completes it will display the osimage definition which will contain the names of all the NIM resources that were created. The naming convention for the NIM resources that are created is the osimage name followed by the NIM resource type, (ex. " 610SNimage_lpp_source"), except for the SPOT name. The default name for the SPOT resource will be the same as the osimage name.

The xCAT osimage definition can be listed using the lsdef command, modified using the chdef command and removed using the rmnimimage command. See the man pages for details.

In some cases you may also want to modify the contents of the NIM resources. For example, you may want to change the bosinst_data file or add to the resolv_conf file etc. For details concerning the NIM resources refer to the NIM documentation.

You can list NIM resource definitions using the AIX lsnim command. For example, if the name of your SPOT resource is "610SNimage" then you could get the details by running:

_lsnim -l 610SNimage_

To see the actual contents of a NIM resource use

_nim -o showres &lt;resource name&gt;_

For example, to get a list of the software installed in your SPOT you could run:

_**nim -o showres 610SNimage**_

Create an image_data resource (optional)

If you are using PostgreSQL or DB2 you must make sure the node starts out with enough file system space to install the database software. This can be done using the NIM image_data resource.

A NIM image_data resource is a file that contains stanzas of information that is used when creating file systems on the node. To use this support you must create the file , define it as a NIM resource, and add it to the xCAT osimage definition.

To help simplify this process xCAT ships a sample image_data file called

_/opt/xcat/share/xcat/image_data/xCATsnData_

This file assumes you will have at least 70G of disk space available. It also sets the physical partition size to 128M.

It sets the following default file system sizes.

_/var -&gt; 5G _
_/opt -&gt; 10G _
_/ -&gt; 30G _
_/usr -&gt; 4G _
_/tmp -&gt; 3G_
_/home -&gt; 0.12G_
_/admin -&gt; 0.12 G _
_/livedump -&gt; 0.25G_

If you need to change any of these be aware that you must change two stanzas for each file system. One is the fs_data and the other is the corresponding lv_data.

Once you have settled on a final version of the image_data file you can copy it to the location that will be used when defining NIM resources.

(ex. _/install/nim/image_data/myimage_data_)

To define the NIM resource you could use the SMIT interfaces or run a command similer to the following.

_**nim -o define -t image_data -a server=master -a location= /install/nim/image_data/myimage_data myimage_data**_

To add these bundle resources to your xCAT osimage definition run:

_**chdef -t osimage -o 610SNimage image_data=myimage_data**_

Add required service node software

XCAT and prerequisite software

An xCAT AIX service node must also be installed with additional xCAT and prerequisite software.

The required software is specified in the sample bundle file discussed below.

To simplify this process xCAT includes all required xCAT and open source dependent software in the following files:

core-aix-&lt;version&gt;.tar.gz
dep-aix-&lt;version&gt;.tar.gz tar

The required software must be copied to the NIM lpp_source that is being used for the service node image. The easiest way to do this is to use the following command:

nim -o update

NOTE: The latest xCAT dep-aix package actually includes multiple sub-directories corresponding to different versions of AIX. Be sure to copy the correct versions of the rpms to your lpp_source directory.

For example, assume all the required xCAT rpm software has been copied and unwrapped in the /tmp/images directory.

Assuming you are using AIX 6.1 you could copy all the appropriate rpms to your lpp_source resource using the following commands:

_**nim -o update -a packages=all -a source=/tmp/images/xcat-dep/6.1 610SNimage_lpp_source**_
_**nim -o update -a packages=all -a source=/tmp/images/xcat-core/  610SNimage_lpp_source**_

The NIM command will find the correct directories and update the lpp_source resource.

There are also several installp filesets that will be needed. It would be good to check that they are in your lpp_source. If not, you should be able to get them from the AIX source (media).

_expect_
_tcl_
_tk_
_openssl_
_openssh_
_bos.sysmgt.nim.master_

Additional diskless dump software (optional)

The AIX ISCSI dump support requires the devices.tmiscw fileset. This is currently available from the AIX Expansion Pack.

If you plan to create dump resources for your diskless nodes you must have this software installed on you management node and service nodes.

The easiest way to include this fileset is to copy it to your lpp_source resources (as mentioned above) and to include it in the installp_bundle file (described below).

Additional P7 IH software (optional)

**NOTE**:  This support will be available in xCAT 2.6 and beyond.

The P7 IH cluster requires additional software that needs to be included in the lpp_source. This includes HFI device drivers, NIM, and bootpd binary.

The following is a list of the required software. This software can be provide on request.

bootpd
cpio
dd/README
dd/devices.chrp.IBM.HFI
dd/devices.common.IBM.hfi
dd/if_hf
dd/xCATaixHFIdd.bnd
nim/bos.sysmgt
scripts/confighfi
scripts/confignim
scripts/synclist

Assume this software has all been copied to /hfi on the management node.

Update the HFI device driver to lpp source:

_**nim -o update -a packages=all -a source=/hfi/dd/ 610SNimage_lpp_source**_

Define HFI device driver installp bundle

_**nim -o define -t installp_bundle -a server=master -a location=/hfi/dd/xCATaixHFIdd.bnd xCATaixHFIdd**_

Assign HFI devices drivers isntallp bundle to service node image so they will be installed during service node installation.

_**chdef -t osimage -o 610SNimage installp_bundle=xCATaixHFIdd**_

Include additional files for P7 IH support

The P7 IH service node require additional files and postscripts. These are all specified in the /hfi/scripts/synclist file.

To include them in the xCAT osimage used for the service nodes you can run a command similar to the following.

_**chdef -t osimage -o 610SNimage synclists=/hfi/scripts/synclist**_

Using NIM installp_bundle resources

To get all this additional software installed we need a way to tell NIM to include it in the installation. To facilitate this, xCAT provides sample NIM installp bundle files. (Always make sure that the contents of the bundle files you use are the packages you want to install and that they are all in the appropriate lpp_source directory.)

Starting with xCAT version 2.4.3 there will be a set of bundle files to use for installing a service node. They will be in:

_/opt/xcat/share/xcat/installp_bundles_

There is a version corresponding to the different AIX OS levels. (xCATaixSN71.bnd, xCATaixSN61.bnd etc.) Just use the one that corresponds to the version of AIX you are running.

Note: For earlier version of xCAT the sample bundle files are shipped as part of the xCAT tarball file.

To use the bundle file you need to define it as a NIM resource and add it to the xCAT osimage definition.

Copy the bundle file ( say xCATaixSN61.bnd ) to a location where it can be defined as a NIM resource, for example "/install/nim/installp_bundle".

To define the NIM resource you can run the following command.

_**nim -o define -t installp_bundle -a server=master -a location= /install/nim/installp_bundle/xCATaixSN61.bnd xCATaixSN61**_

To add this bundle resources to your xCAT osimage definition run:

_**chdef -t osimage -o 610SNimage installp_bundle="xCATaixSN61"**_

Important Note: The sample xCAT bundle files mentioned above contain commented-out entries for each of the supported databases. You must edit the bundle file you use to uncomment the appropriate database rpms. If the required database packages are not installed on the service node then the xCAT configuration will fail.

Check the osimage (optional)

To avoid potential problems when installing a node it is adviseable to verify that all the additional software that you wish to install has been copied to the appropriate NIM lpp_source directory.

Any software that is specified in the "otherpkgs" or the "installp_bundle" attributes of the xCAT osimage definition must be available in the lpp_source directories.

To find the location of the lpp_source directories run the "lsnim -l <lpp_source_name>" command:

_**lsnim -l 610SNimage_lpp_source**_

If the location of your lpp_source resource is "/install/nim/lpp_source/610SNimage_lpp_source/" then you would find rpm packages in "/install/nim/lpp_source/610SNimage_lpp_source/RPMS/ppc" and you would find your installp and emgr packages in "/install/nim/lpp_source/610SNimage_lpp_source/installp/ppc".

To find the location of the installp_bundle resource files you can use the NIM "lsnim -l" command. For example,

_**lsnim -l xCATaixSN61**_

Starting with xCAT version 2.4.3 you can use the xCAT chkosimage command to do this checking. For example:

_**chkosimage -V 610SNimage**_

In addition to letting you know what software is missing from your lpp_source the chkosimage command will also indicate if there are multiple files that match the entries in your bundle file. This can happen when you use wild cards in the packages names added to the bundle file. In this case you must remove any old packages so that there is only one rpm selected for each entry in the bundle file.

To automate this process you may be able to use the "-c" (clean) option of the chkosimage command. This option will keep the rpm that was most recently written to the directory and remove the others. (Be careful when using this option!)

For example,

_**chkosimage -V -c 610SNimage**_

Define xCAT networks

Create an xCAT network definition for each network that contains cluster nodes. You will need a name for the network and values for the following attributes.

net The network address.

mask The network mask.

gateway The network gateway.

In our example we will assume that all the cluster node management interfaces and the xCAT management node interface are on the same network. You can use the xCAT mkdef command to define the network.

For an flat ethernet example called net1:

_**mkdef -t network -o net1 net=9.114.0.0 mask=255.255.255.224 gateway=9.114.113.254**_

Note: The xCAT definition should correspond to the NIM network definition. If multiple cluster subnets are needed then you will need an xCAT and NIM network definition for each one. If you are using the default master_net, make sure that the default gateway is properly for your xCAT MN.

Define xCAT P7 IH HFI networks

Create an xCAT network definition for each HFI network that contains cluster nodes.

For example, to create an HFI network definition called "hfinet" you cold run a comand similar to the following.

_**mkdef -t network -o hfinet net=20.0.0.0  mask=255.0.0.0 gateway=20.7.4.5**_

Create additional NIM network definitions (optional)

For the processs described in this document we are assuming that the xCAT management node and the LPARs are all on the same network.

However, depending on your specific situation, you may need to create additional NIM network and route definitions.

NIM network definitions represent the networks used in the NIM environment. When you configure NIM, the primary network associated with the NIM master is automatically defined. You need to define additional networks only if there are nodes that reside on other local area networks or subnets. If the physical network is changed in any way, the NIM network definitions need to be modified.

To create the NIM network definitions corresponding to the xCAT network definitions you can use the xCAT xcat2nim command.

For example, to create the NIM definitions corresponding to the xCAT "clstr_net" network you could run the following command.

_**xcat2nim -V -t network -o clstr_net**_

Manual method

The following is an example of how to define a new NIM network using the NIM command line interface.

Step 1

Create a NIM network definition. Assume the NIM name for the new network is "clstr_net", the network address is "10.0.0.0", the network mask is "255.0.0.0", and the default gateway is "10.0.0.247".

_nim -o define -t ent -a net_addr=10.0.0.0 -a snm=255.0.0.0 -a routing1='default 10.0.0.247' clstr_net_

Step 2

Create a new interface entry for the NIM "master" definition. Assume that the next available interface index is "2" and the hostname of the NIM master is "xcataixmn". This must be the hostname of the management node interface that is connected to the "clstr_net" network.

_nim -o change -a if2='clstr_net xcataixmn 0' -a cable_type2=N/A master_

Step 3 - (optional)

If the new subnet is not directly connected to a NIM master network interface then you should create NIM routing information

The routing information is needed so that NIM knows how to get to the new subnet. Assume the next available routing index is "2", and the IP address of the NIM master on the "master_net" network is "8.124.37.24". Assume the IP address on the NIM master on the "clstr_net" network is " 10.0.0.241". This command will set the route from "master_net" to "clstr_net" to be " 10.0.0.241" and it will set the route from "clstr_net" to "master_net" to be "8.124.37.24".

_nim -o change -a routing2='master_net 10.0.0.241 8.124.37.24' clstr_net_

Step 4

Verify the definitions by running the following commands.

_lsnim -l master_
_lsnim -l master_net_
_lsnim -l clstr_net_

See the NIM documentation for details on creating additional network and route definitions. (IBM AIX Installation Guide and Reference. <http://www-03.ibm.com/servers/aix/library/index.html>)

Define an xCAT service group

If you did not already create the xCAT "service" group when you defined your nodes then you can do it now using the mkdef or chdef command.

There are two basic ways to create xCAT node groups. You can either set the "groups" attribute of the node definition or you can create a group directly. You can set the "groups" attribute of the node definition when you are defining the node with the mkdef command or you can modify the attribute later using the chdef command. For example, if you want to create the group called "service" with the members sn01 and sn02 you could run chdef as follows.

_chdef -t node -p -o sn01,sn02 groups=service _

The "-p" option specifies that "service" be added to any existing value for the "groups" attribute.

The second option would be to create a new group definition directly using the mkdef command as follows.

_mkdef -t group -o service members="sn01,sn02"_

These two options will result in exactly the same definitions and attribute values being created.

Include customization scripts (optional)

xCAT supports the running of customization scripts on the nodes when they are installed.

This support includes:

The running of a set of default customization scripts that are required by xCAT.
You can see what scripts xCAT will run by default by looking at the "xcatdefaults" entry in the xCAT "postscripts" database table. ( I.e. Run "tabdump postscripts".). You can change the default setting by using the xCAT chtab or tabedit command. The scripts are contained in the /install/postscripts directory on the xCAT management node.
The optional running of customization scripts provided by xCAT.
There is a set of xCAT customization scripts provided in the /install/postscripts directory that can be used to perform optional tasks such as additional adapter configuration.
The optional running of user-provided customization scripts.

To have your script run on the nodes:

Put a copy of your script in /install/postscripts on the xCAT management node. (Make sure it is executable.)
When using service nodes make sure the postscripts are copied to the /install/postscripts directories on each service node.
Set the "postscripts" attribute of the node definition to include the comma separated list of the scripts that you want to be executed on the nodes. The order of the scripts in the list determines the order in which they will be run. For example, if you want to have your two scripts called "foo" and "bar" run on node "node01" you could use the chdef command as follows.

chdef -t node -o node01 -p postbootscripts=foo,bar

(The "-p" means to add these to whatever is already set.)

The customization scripts are run during the post boot process ( during the processing of /etc/inittab).

Note: For diskfull installs if you wish to have a script run after the install but before the first reboot of the node you can create a NIM script resource and add it to your osimage definition.

Add "servicenode" script for service nodes

You must add the "servicenode" script to the postbootscripts attribute of all the service node definitions. To do this you could modify each node definition individually or you could simply modify the definition of the "service" group.

For example, to have the "servicenode" postscript run on all nodes in the group called "service" you could run the following command.

_**chdef -p -t group service postbootscripts=servicenode**_

Add NTP setup script (optional)

To have xCAT automatically set up ntp on the cluster nodes you must add the setupntp script to the list of postscripts that are run on the nodes.

To do this you can either modify the "postscripts" attribute for each node individually or you can just modify the definition of a group that all the nodes belong to.

For example, if all your nodes belong to the group "compute" then you could add setupntp to the group definition by running the following command.

_chdef -p -t group -o compute postscripts=setupntp_

Note: In hierarchy cluster, the ntpserver for the compute nodes will be pointed to the their service nodes, so if you want to set up ntp on the compute nodes, make sure the ntp server is set up correctly on the service nodes, the setupntp postscript can set up both the ntp client and the ntp server.

Add secondary adapter configuration script (optional)

It is possible to have additional adapter interfaces automatically configured when the nodes are booted. XCAT provides sample configuration scripts for both Ethernet and IB adapters. These scripts can be used as-is or they can be modified to suit you particular environment. The Ethernet sample is /install/postscript/configeth. When you have the configuration script that you want you can add it to the "postscripts" attribute as mentioned above. Make sure your script is in the /install/postscripts directory and that it is executable.

If you wish to configure IB interfaces please refer to: [Managing_the_Infiniband_Network]

P7 IH configuration scripts (optional)

There are additional post scripts required that will be used to update the NIM master on the service node and to configure HFI interfaces when working with a P7 IH configuration. This includes the HFI network and DB2 configuration scripts.

You must include the following scripts:

confighfi
confignim
db2install
odbcsetup

These scripts must be copied to the /install/postscript/ directory on the management node.

The names of the scripts must be added to the "postbootscripts" attribute of the the service node (or service group) definitions.

_ The order of the scripts is important._

The "db2install" script must come before the "servicenode" script and the "odbcsetup" script must come after.

To set the postscripts attribute you can run a command similar to the following.

**chdef service postbootscripts=confighfi,confignim,db2install,servicenode,odbcsetup**

Verify the "postbootscript" setting by listing one of the service node definitions.

**lsdef xcatsn12**

Make sure the scripts are all listed and in the correct order.

Some scripts are included by default or by setting other attributes. It may be useful to check the xCAT "postscripts" table if you are having difficulty getting the list of postscripts correct.

**tabedit postscripts**


**Note:** Currently the xCAT for AIX support does not distinguish between the "postscripts" and "postbootscripts" attributes.    Both are treated as post boot scripts and are run after the initial boot of the node.

Gather MAC information for the install adapters

Create NIM client & group definitions

You can use the xCAT xcat2nim command to automatically create NIM machine and group definitions based on the information contained in the xCAT database. By doing this you synchronize the NIM and xCAT names so that you can use the same target names when running either an xCAT or NIM command.

To create NIM machine definitions for your service nodes you could run the following command.

_**xcat2nim -t node service**_

To create NIM group definition for the group "service" you could run the following command.

_**xcat2nim -t group -o service**_

To check the NIM definitions you could use the NIM lsnim command or the xCAT xcat2nim command. For example, the following command will display the NIM definitions of the nodes contained in the xCAT group called "service", (from data stored in the NIM database).

_xcat2nim -t node -l service_

Create prescripts (optional)

The xCAT prescript support is provided to to run user-provided scripts during the node initialization process. These scripts can be used to help set up specific environments on the servers that handle the cluster node deployment. The scripts will run on the install server for the nodes. (Either the management node or a service node.) A different set of scripts may be specified for each node if desired.

One or more user-provided prescripts may be specified to be run either at the beginning or the end of node initialization. The node initialization on AIX is done either by the nimnodeset command (for diskfull nodes) or the mkdsklsnode command (for diskless nodes.)

You can specify a script to be run at the beginning of the nimnodeset or mkdsklsnode command by setting the prescripts-begin node attribute.

You can specify a script to be run at the end of the commands using the prescripts-end node attribute.

See the following for the format of the prescript table: [Postscripts_and_Prescripts]

The attributes may be set using the chdef command.

For example, if you wish to run the foo and bar prescripts at the beginning of the nimnodeset command you would run a command similar to the following.

_**chdef -t node -o node01 prescripts-begin="standalone:foo,bar"**_

When you run the nimnodeset command it will start by checking each node definition and will run any scripts that are specified by the _prescripts-begin _attributes.

Similarly, the last thing the command will do is run any scripts that were specified by the _prescripts-end _attributes.

Initialize the AIX/NIM nodes

You can use the xCAT nimnodeset command to initialize the AIX standalone nodes. This command uses information from the xCAT osimage definition and default values to run the appropriate NIM commands.

For example, to set up all the nodes in the group "service" to install using the osimage named "610SNimage" you could issue the following command.

_**nimnodeset -i 610SNimage service**_

To verify that you have allocated all the NIM resources that you need you can run the "lsnim -l" command. For example, to check the node "clstrn01" you could run the following command.

_lsnim -l clstrn01_

The command will also set the "profile" attribute in the xCAT node definitions to "610SNimage ". Once this attribute is set you can run the nimnodeset command without the "-i" option.

Note: To verify that NIM has properly initialized the nodes you can also check the contents of the /etc/bootptab or /var/lib/dhcp/db/dhcpd.leases, /etc/exports, and the node "info" file in the /tftpboot directory.

Open a remote console (optional)

You can open a remote console to monitor the boot progress using the xCAT rcons command. This command requires that you have conserver installed and configured.

If you wish to monitor a network installation you must run rcons before initiating a network boot.

To configure conserver run:

_**makeconservercf **_

To start a console:

_**rcons node01**_

Note: You must always run makeconservercf after you define new cluster nodes.

Initiate a network boot

Initiate a remote network boot request using the xCAT rnetboot command. For example, to initiate a network boot of all nodes in the group "service" you could issue the following command.

_**rnetboot service**_

Note: If you receive timeout errors from the rnetboot command, you may need to increase the default 60-second timeout to a larger value by setting ppctimeout in the site table:

_chdef -t site -o clustersite ppctimeout=180_

Verify the deployment

If you opened a remote console using rcons you can watch the progress of the installation.
For p6 lpars, it may be helpful to bring up the HMC web interface in a browser and watch the lpar status and reference codes as the node boots.
You can use the AIX lsnim command to see the state of the NIM installation for a particular node, by running the following command on the NIM master:

lsnim -l <clientname>
When the node is booted you can log in and check if it is configured properly. For example, is the password set?, is the timezone set?, can you xdsh to the node etc.

Retry and troubleshooting tips

If a node did not boot up:
Verify network connections. Try a ping test using the HMC console.
For bootp, check /etc/bootptab to make sure an entry exists for the node.
For dhcp, check /var/lib/dhcp/db/dhcpd.leases to make sure an entry exists for the node
Verify that the information in /tftpboot/<node>.info is correct.
Stop and restart inetd:

stopsrc -s inetd
startsrc -s inetd
Stop and restart tftp:

stopsrc -s tftpd
startsrc -s tftpd
Verify NFS is running properly and mounts can be performed with this NFS server:
View /etc/exports for correct mount information.
Run the showmount and exportfs commands.
Stop and restart the NFS and related daemons:

stopsrc -g nfs
startsrc -g nfs
Attempt to mount a filesystem from another system on the network.
You may need to reset the NIM client definition and start over.

nim -Fo reset node01
nim -o deallocate -a subclass=all node01
If the node booted but one or more customization scripts did not run correctly:
You can check the /var/log/messages file on the management node and the var/log/xcat/xcat.log file on the node (if it is up) to see if any error messages were produced during the installation.
Restart the xcatd daemon (xcatstop & xcatstart (xCAT2.4 restartxcatd)) and then re-run the customization scripts either manually or by using updatenode.

Configure additional adapters on the service nodes (optional)

If additional adapter configuration is required on the service nodes you could either use the xdsh command to run the appropriate AIX commands on the nodes or you may want to use the updatenode command to run a configuration script on the nodes.

XCAT provides sample adapter interface configuration scripts for Ethernet and IB. The Ethernet sample is /install/postscripts/configeth. It illustrate how to use a specific naming convention to automatically configure interfaces on the node. You can modify this script for your environment and then run it on the node using updatenode. First copy your script to the /install/postscripts directory and make sure it is executable. Then run a command similar to the following.

_updatenode clstrn01 myconfigeth_

If you wish to configure IB interfaces please refer to: [Managing_the_Infiniband_Network]

Verify Service Node configuration

During the node boot up there are several xCAT post scripts that are run that will configure the node as an xCAT service node. It is advisable to check the service nodes to make sure they are configured correctly before proceeding.

There are several things that can be done to verify that the service nodes have been configured correctly.

Check if NIM has been installed and configured by running "lsnim" or some other basic NIM command on the service node.
Check to see if all the additional software has been installed. For example, Run "rpm -qa" to see if the xCAT and dependency software is installed.
Try running some xCAT commands such as "lsdef -a"to see if the xcatd daemon is running and if data can be retrieved from the xCAT database on the management node.
If using SSH for your remote shell try to ssh to the service nodes from the management node.
Check the system services, as mentioned earlier in this document, to make sure the service node can respond to a network boot request.

Setup of GPFS I/O Server nodes

**TBD** - This is pointer to the GPFS documentation to setup GPFS I/O servers on the P7 IH cluster.

Install the cluster nodes

Planning for external NFS server(optional)

xCAT AIX stateless/statelite compute nodes need to mount NFS directories from the service node, the failure on the service node will immediately bring down all the compute nodes served by the service node, external NFS server can be used to provide high availability NFS service for the AIX stateless/statelite compute nodes to avoid single point of failure by the service node. Refer to [External_NFS_Server_Support_With_AIX_Stateless_And_Statelite] for more details.

Create a diskless image

Update the image (SPOT)

Set up statelite support (for diskless-stateless nodes only)

This support is available in xCAT version 2.5 and beyond.

The xCAT statelite support for AIX provides the ability to "overlay" specific files or directories over the standard diskless-stateless support.

There is a complete description of the statelite support in : [XCAT_AIX_Diskless_Nodes#AIX_statelite_support]

To set up the statelite support you must:

fill in one or more to the statelite tables in the xCAT database.
Run the "mknimimage -u" command which will use that information to modify the SPOT resource.

Note: You could also fill in the statelite tables before initially running the mknimimage to create the osimage. (Rather than doing the setup later with the "-u" option.)

Define xCAT networks

Create a network definition for each network that contains cluster nodes. You will need a name for the network and values for the following attributes.

net The network address.

mask The network mask.

gateway The network gateway.

This "How-To" assumes that all the cluster node management interfaces and the xCAT management node interface are on the same network. You can use the xCAT mkdef command to define the network.

For example:

_**mkdef -t network -o net1 net=9.114.113.224 mask=255.255.255.224 gateway=9.114.113.254**_

Set conserver and monserver

If the service nodes will be running the conserver or monserver daemons for the compute nodes instead of the xCAT managment node running the daemons for all of the nodes, set these attributes to the node's service node:

 chdef compute1 conserver=sn1 monserver=sn1
 chdef compute2 conserver=sn2 monserver=sn2

Gather MAC information for the node boot adapters

Use the xCAT getmacs command to gather adapter information from the nodes. This command will return the MAC information for each Ethernet adapter available on the target node. The command can be used to either display the results or write the information directly to the database. If there are multiple adapters the first one will be written to the database and used as the install adapter for that node.

The command can also be used to do a ping test on the adapter interfaces to determine which ones could be used to perform the network boot. In this case the first adapter that can be successfully used to ping the server will be written to the database.

Before running getmacs you must first run the makeconservercf command. You need to run makeconservercf any time you add new nodes to the cluster.

 _**makeconservercf**_

Shut down all the nodes that you will be querying for MAC addresses.

_**rpower aixnodes off**_

To retrieve the MAC address for all the nodes in the group "aixnodes" and write the first adapter MAC to the xCAT database you could issue the following command.

_**getmacs aixnodes**_

To display all adapter information but not write anything to the database.

_**getmacs -d aixnodes**_

To retrieve the MAC address and do a ping test to determine which adapter MAC to use for the node you could issue the following command. (The ping operation may take a while to complete.)

_**getmacs -d aixnodes -D -S 10.14.0.2 -G 10.14.0.2 -C 10.14.0.4**_

The output would be similar to the following.

Type &lt;/nowiki&gt;Location Code MAC Address Full Path Name Ping Result Device Type
ent U9125.F2A.024C362-V6-C2-T1 fef9dfb7c602 [/vdevice/l-lan@30000002](/vdevice/l-lan@30000002) successful virtual
ent U9125.F2A.024C362-V6-C3-T1 fef9dfb7c603 /vdevice/l-lan@30000003 unsuccessful virtual

From this result you can see that " fef9dfb7c602" should be used for this nodes MAC address.

For more information on using the getmacs command see the man page.

To add the MAC value to the node definition you can use the chdef command. For example:

_chdef -t node node01 mac=fef9dfb7c603_

Gather HFI MAC information for the HFI boot adapters on P7 IH

Currently xCAT only support to retrieve the HFI MAC address from doing a ping test from the HFI adapters. You could issue following command with -D to perform a ping test. (The ping operation may take a while to complete.)

_getmacs aixnodes -D --hfi _

The output would be similar to the following.

# Type  Location Code   MAC Address      Full Path Name  Ping Result
hfi-ent U78A9.001.1122233-P1 020004030004 /hfi-iohub@300000000000002/hfi-ethernet@10 unsuccessful physical
hfi-ent U78A9.001.1122233-P1 020004030004 /hfi-iohub@300000000000002/hfi-ethernet@11 unsuccessful physical

Define xCAT groups (optional)

XCAT supports both static and dynamic node groups. See the section titled "xCAT node group support" in the "xCAT2 Top Doc" document for details on using xCAT groups.

See the following doc for nodegroup support: [Node_Group_Support]

Add IP addresses and hostnames to /etc/hosts

Make sure all node hostnames are added to /etc/hosts. Refer to the section titled "Add cluster nodes to the /etc/hosts file" in the following document for details: [XCAT_AIX_Cluster_Overview_and_Mgmt_Node]

If you are working on a P7 IH cluster, to bring up all the HFI interfaces on service nodes, make sure IP/hostnames for all the HFI interfaces on service nodes have been updated in /etc/hosts.

Verify the node definitions

Verify that the node definitions include the required information.

To get a listing of the node definition you can use the lsdef command. For example to display the definitions of all nodes in the group "aixnodes" you could run the following command.

_lsdef -t node -l -o aixnodes_

The output for one diskless node might look something like the following:

_Object name: clstrn02_
_cons=hmc_
_groups=lpar,all_
_servicenode=clstrSN1_
_xcamaster=clstrSN1-en1_
_hcp=clstrhmc01_
_hostnames=clstrn02.mycluster.com_
 _id=2_
_ip=10.1.3.2_
_mac=001a64f9bfc9_
_mgt=hmc_
_nodetype=lpar,osi_
_os=AIX_
_parent=clstrf1fsp03-9125-F2A-SN024C352_
_pprofile=compute_

Most of these attributes should have been filled in automatically by xCAT.

Note: The length of a NIM object name must be no longer than 39 characters. Since the xCAT definitions will be used to create the NIM object definitions you should limit your xCAT names accordingly.

Note: xCAT supports many different cluster environments and the attributes that may be required in a node definition will vary. For diskless nodes using a servicenode, the node definition should include at least the attributes listed in the above example.

Make sure "servicenode" is set to the name of the service node as known by the management node and "xcatmaster" is set to the name of the service node as known by the node.

To modify the node definitions you can use the chdef command.

For example to set the xcatmaster attribute for node "clstrn01" you could run the following.

_chdef -t node -o clstrn01 xcatmaster=clstrSN1-en1_

Verify the node definitions for boot over HFI on P7 IH

Most of the node attributes to boot over HFI are the same as boot over ethernet above. Following is an example as an output of lsdef command.

_lsdef -t node -l -o aixnodes_


_Object name: clstrn02_
_arch=ppc64_
_cons=fsp_
_groups=lpar,all_
_hcp=Server-9125-F2C-SNP7IH019-A_
_id=9_
_ip=20.4.32.224_
_mac=020004030004_
_mgt=fsp_
_nodetype=lpar,osi_
_os=AIX_
_parent=Server-9458-100-SNBPCF007-A_
_pprofile=compute_
_servicenode=clstrSN1_
_xcatmaster=clstrSN1-hf0_

Note: xcatmaster attribute is setting to the hostname which is a HFI interface hostname on services, it should has been updated in /etc/hosts and synchronized to service node already.

Note: mac attribute is an HFI MAC address that got from compute node.

Note: ip attribute is setting to the HFI IP address that works on compute node.

Set up post boot scripts (optional)

xCAT supports the running of customization scripts on the nodes when they are installed. For diskless nodes these scripts are run when the /etc/inittab file is processed during the node boot up.

This support includes:

The running of a set of default customization scripts that are required by xCAT.
You can see what scripts xCAT will run by default by looking at the "xcatdefaults" entry in the xCAT "postscripts" database table. ( I.e. Run "tabdump postscripts".). You can change the default setting by using the xCAT chtab or tabedit command. The scripts are contained in the /install/postscripts directory on the xCAT management node.
The optional running of customization scripts provided by xCAT.
There is a set of xCAT customization scripts provided in the /install/postscripts directory that can be used to perform optional tasks such as additional adapter configuration. (See the "configiba" script for example.)
The optional running of user-provided customization scripts.

To have your script run on the nodes:

Put a copy of your script in /install/postscripts on the xCAT management node. (Make sure it is executable.)
When using service nodes make sure the postscripts are copied to the /install/postscripts directories on each service node.
Set the "postscripts" attribute of the node definition to include the comma separated list of the scripts that you want to be executed on the nodes. The order of the scripts in the list determines the order in which they will be run. For example, if you want to have your two scripts called "foo" and "bar" run on node "node01" you could use the chdef command as follows.

chdef -t node -o node01 -p postscripts=foo,bar

(The "-p" means to add these to whatever is already set.)

Note: The customization scripts are run during the boot process (out of /etc/inittab).

P7 IH configuration scripts (optional)

Thee are additional post scripts required to configure HFI interfaces when working with P7 IH configuration. You will need to make sure that the xCAT HFI scripts confighfi is properly copied into the /install/postscript/ directory. You will then need to allocate the postscripts to the proper xCAT OS images being used with installation for xCAT SN and xCAT CN.

_**chdef clstrn02 postscripts=confighfi**_

Set up prescripts (optional)

For more information about using the xCAT prescript support refer: [Using_%26_Creating_Postscripts]

Initialize the AIX/NIM diskless nodes

You can set up NIM to support a diskless boot of nodes by using the xCAT mkdsklsnode command. This command uses information from the xCAT database and default values to run the appropriate NIM commands.

For example, to set up all the nodes in the group "aixnodes" to boot using the SPOT (COSI) named "61cosi" you could issue the following command.

_**mkdsklsnode -i 61cosi aixnodes**_

The command will define and initialize the NIM machines. It will also set the "profile" attribute in the xCAT node definitions to "61cosi ".

If you are seting up all the nodes to boot over HFI, option --hfi is required to mkdsklsnode command. For example:

_**mkdsklsnode -i 61cosi aixnodes --hfi**_

To verify that NIM has allocated the required resources for a node and that the node is ready for a network boot you can run the "lsnim -l" command. For example, to check node "node01" you could run the following command.

_lsnim -l node01_

Note:

The NIM initialization of multiple nodes is done sequentially and takes approximately three minutes per node to complete. If you are planning to initialize multiple nodes you should plan accordingly. The NIM development team is currently working on a solution for this scaling issue.

Verifying the node initialization before booting (optional)

Once the mkdsklsnode command completes you can log on to the service node and verify that it has been configured correctly.

The /etc/bootptab or /var/lib/dhcp/db/dhcpd.leases, /etc/exports files.
The node info file in the /tftpboot directory.
The NIM node definition for the node.

Open a remote console (optional)

You can open a remote console to monitor the boot progress using the xCAT rcons command. This command requires that you have conserver installed and configured.

If you wish to monitor a network installation you must run rcons before initiating a network boot.

To configure conserver run:

_makeconservercf _

To start a console:

_rcons node01_

Note: You must always run makeconservercf after you define new cluster nodes.

Initiate a network boot

Initiate a remote network boot request using the xCAT rnetboot command. For example, to initiate a network boot of all nodes in the group "aixnodes" you could issue the following command.

rnetboot aixnodes

If you are trying to initiate a network boot over HFI instead of ethernet, option --hfi is required to rnetboot command.

rnetboot aixnodes --hfi

Note: If you receive timeout errors from the rnetboot command, you may need to increase the default 60-second timeout to a larger value by setting ppctimeout in the site table:

chdef -t site -o clustersite ppctimeout=180

Verify the deployment

Retry and troubleshooting tips:
For p6 lpars, it may be helpful to bring up the HMC web interface in a browser and watch the lpar status and reference codes as the node boots.
Verify network connections
If the rnetboot returns "unsuccessful" for a node, verify that bootp and tftp is configured and running properly on the .
For bootp, view /etc/bootptab to make sure an entry exists for the node.
For dhcp, view /var/lib/dhcp/db/dhcpd.leases to make sure an entry exists for the node.
Verify that the information in /tftpboot/<node>.info is correct.
Stop and restart inetd:

stopsrc -s inetd
startsrc -s inetd
Stop and restart tftp:

stopsrc -s tftp
startsrc -s tftp
Verify NFS is running properly and mounts can be performed with this NFS server:
View /etc/exports for correct mount information.
Run the showmount and exportfs commands.
Stop and restart the NFS and related daemons:

stopsrc -g nfs
startsrc -g nfs
Attempt to mount a file system from another system on the network.
If that doesn't work, you may need to re-initialize the diskless node and start over. You can use the rmdsklsnode command to uninitialize an AIX diskless node. This will deallocate and remove the NIM definition but it will not remove the xCAT node definition.

TBD

Using a backup service node

For reliability, availability, and serviceability purposes you may wish to use backup service nodes in your hierarchical cluster.

The backup service node will be set up to quickly take over from the original service node if a problem occurs.

This is not an automatic fail over feature. You will have to initiate the switch from the primary service node to the backup manually. The xCAT support will handle most of the setup and transfer of the nodes to the new service node.

Abbreviations used below:

MN : - management node.
SN : - service node.
CN : - compute node.

Initial deployment

Integrate the following steps into the hierarchical deployment process described above.

Make sure both the primary and backup service nodes are installed, configured, and can access the MN database.
When defining the CNs add the necessary service node values to the "servicenode" and "xcatmaster" attributes of the node definitions.
(Optional) Create an xCAT group for the nodes that are assigned to each SN. This will be useful when setting node attributes as well as providing an easy way to switch a set of nodes back to their original server.

Note-

xcatmaster: : The hostname of the xCAT service node as known by the node.
servicenode: : The hostname of the xCAT service node as known by the management node.

To specify a backup service node you must specify a comma-separated list of two service nodes for the "servicenode" value. The first one will be the primary and the second will be the backup for that node.

For the "xcatmaster" value you should only include the primary name of the service node as known by the node.

In the simplest case the management node, service nodes, and compute nodes are all on the same network and the interface name of the service nodes will be same for either the management node or the compute node.

For this case you could set the attributes as follows:

       chdef &lt;noderange&gt;  servicenode="xcatsn1,xcatsn2" xcatmaster="xcatsn1"

However, in some network environments the name of the SN as known by the MN may be different than the name as known by the CN. (If they are on different networks.)

In the following example assume the SN interface to the MN is on the "a" network and the interface to the CN is on the "b" network. To set the attributes you would run a command similar to the following.

       chdef &lt;noderange&gt;  servicenode="xcatsn1a,xcatsn2a" xcatmaster="xcatsn1b"

The process can be simplified by creating xCAT node groups to use as the <noderange> in the chdef command.

To create an xCAT node group containing all the nodes that have the service node "SN27" you could run a command similar to the following.

        mkdef -t group -o SN27group -w  servicenode=SN27

Note: When using backup service nodes you should consider splitting the CNs between the two service nodes. This way if one fails you only need to move half your nodes to the other service node.

When you run the xcat2nim, nimnodeset or mkdsklsnode commands to define and initialize the CNs, these commands will automatically replicate the required NIM resources on the SN used by the CN. If you have a backup SN specified then the replications and NIM definition will also be done on the backup SN. This will make it possible to do a quick takeover without having to wait for replication when you need to switch.

The xcat2nim, nimnodeset, mkdsklsnode, and rmdsklsnode commands also support the "-p|--primarySN" and "- b|--backupSN" options. You can use these options to target either the primary or backup service nodes in case you do not wish to update both. These option could be used to initialize the primary service node first, to get the nodes booted and running, and then initialized the backup SN later while the nodes are running.

Monitoring the service nodes

TBD - work in progress - need to switch to transclude

In most cluster environments it is very important to monitor the state of the service nodes and to quickly switch nodes to the backup service node as soon as possible.

See [Monitor_and_Recover_Service_Nodes#Monitoring_Service_Nodes] for details on monitoring your service nodes.

Note: The examples in this documentation are based on the following cluster environment:

Management Node: aixmn1(9.114.47.103) running AIX 7.1B and DB2 9.7

Service Node: aixsn1(9.114.47.115) running AIX 7.1B and DB2 9.7

Compute Node: aixcn1(9.114.47.116) running diskless AIX 7.1B

Service node takeover - with node reboot

Initialize the nodes on the new SN (if needed)

If the NIM replication hasn't been run on the new SN then you must run the xCAT commands to get the new SN configured properly. For diskless nodes you must run mkdsklsnode. For diskful nodes you must run xcat2nim and nimnodeset. (See the man pages for details.)

For example, if you wish to initialize the diskless compute node named "compute02" on its backup service node you could run a command similar to the following.

     mkdsklsnode -V -b -i 710dskls compute02

Use the xCAT "snmove" command to reset the node attributes to point to the backup SN.

   Syntax:
   snmove noderange [-d|--dest sn2] [-D|--destn sn2n]
           [-i|--ignorenodes]
   snmove -s|--source sn1 [-S|--sourcen sn1n] [-d|--dest sn2]
           [-D|--destn sn2n] [-i|--ignorenodes]
   snmove [-h|--help|-v|--version]

For example, if the SN named "SN27" goes down you could switch all the compute nodes that use it to the backup SN by running a command similar to the following.

     snmove -s SN27

The snmove command will check and set several node attribute values.

servicenode: : This will be set to either the second server name in the servicenode attribute list or the value provided on the command line.
xcatmaster: : Set with either the value provided on the command line or it will be automatically determined from the servicenode attribute.
nfsserver: : If the value is set with the source service node then it will be set to the destination service node.
tftpserver: : If the value is set with the source service node then it will be reset to the destination service node.
monserver: : If set to the source service node then reset it to the destination servicenode and xcatmaster values.
conserver: : If set to the source service node then reset it to the destination servicenode and run makeconservercf.

For diskful nodes, (i.e. AIX standalone clients), it will also run the NIM niminit command on the nodes and rerun several xCAT customization scripts to reset whatever services need to be re-targeted to the new SN.

Reboot the diskless nodes

Diskless CNs will have to be re-booted to have them switch to the new SN. Shut down the diskless nodes and run the "rnetboot" command to reboot nodes. This command will get the new SN information from the xCAT database and perform a directed boot request from the node to the new SN. When the node boots up it will be configured as a client of the NIM master on the new SN. For example, to reboot all the nodes that are in the xCAT group "SN27group" you could run the following command.

    rnetboot SN27group

To shut down the nodes you can use the xdsh command to run the shutdown command on the nodes. If that doesn't work you can use the rpower command to shut the nodes down.

DO NOT try to use the rpower command to reboot the nodes. This would cause the node to try to reboot from the old SN.

Service node takeover - no node reboot

**TBD - work in progress**

The SN backup support and the external nfs support may be combined to provide a more highly available diskless compute node. In this case it may be possible to switch compute nodes to the backup service node without having to reboot the nodes.

An AIX diskless node depends on filesystems mounted from a server. Normally this server is the NIM master. If the NIM master goes down the node will also. To avoid the node going down you can use the xCAT external NFS support. This means that when the NIM master goes down the node can continue to function as long as the NFS server is available.

When this happens it is important to switch the the diskless node to a new service node (NIM master) as soon as possible. The best way to do the switch to the new service node would involve rebooting the node. By rebooting the node you are able to switch any of the NIM resources you have defined to the backup service node.

However, if the node availability is a primary concern it is possible to manage the service node takeover without rebooting the nodes.

To do this you must set up both the external NFS server and the service node backup support.

The service node takeover is initiated manually using the xCAT snmove command. If the NFS server is used for the node resources and the backup service node has been specified then the xCAT code will do the takeover.

A service node takeover should be initiated as soon as possible after the the current service node failure is detected.

The xCAT live takeover support is currently limited to the following NIM diskless node resources: SPOT, shared_root, root, paging. In other words, if you wish to use this support you must not allocate other resources, ( such as dump), to the diskless nodes.

The service node live takeover is supported for AIX diskless nodes including stateful, stateless and statelite configurations.

Set up an external NFS server

See [External_NFS_Server_Support_With_AIX_Stateless_And_Statelite] for details on using external NFS server with AIX diskless nodes.

Switch to the backup service node

Switching back

The process for switching nodes back will depend on what must be done to recover the original service node. Essentially the SN must have all the NIM resources and definitions restored and operations completed before you can use it.

If all the configuration is still intact you can simply use the snmove command to switch the nodes back.

If the configuration must be restored then you will have to run either the mkdsklsnode (diskless) or xcat2nim/nimnodeset (diskful) commands. These commands will re-configure the SN using the common osimages defined on the xCAT management node.

For example:

    mkdsklsnode SN27group

This command will check each node definition to get the osimage it is using. It will then check for the primary and backup service nodes and do the required configuration for the one that needs to be configured.

Once the SN is ready you can run the snmove command to switch the node definitions to point to it. For example, if you assume the nodes are currently managed by the "SN28" service node then could could switch them back to the "SN27" SN with the following command.

    svmove SN27group -d SN27

If your compute nodes are diskless then they must be rebooted using the rnetboot command in order to switch to the other service node.

Cleanup

The NIM definitions and resources that are created by xCAT commands are not automatically removed. It is therefore up to the system administrator to do some clean up of unused NIM definitions and resources from time to time. (The NIM lpp_source and SPOT resources are quite large.) There are xCAT commands that can be used to assist in this process.

Removing NIM machine definitions

Use the xCAT rmdsklsnode command to remove all the NIM diskless machine definitions that were created for the specified xCAT nodes. This command will not remove the xCAT node definitions.

For example, to remove the NIM machine definition corresponding to the xCAT diskless node named "node01" you could run the command as follows.

_**rmdsklsnode node01**_

Use the xCAT xcat2nim command to remove all the NIM standalone machine definitions that were created for the specified xCAT nodes. This command will not remove the xCAT node definitions.

For example, to remove the NIM machine definition corresponding to the xCAT node named "node01" you could run the command as follows.

_**xcat2nim -t node -r node01**_

The xcat2nim and rmdsklsnode command are intended to make it easier to clean up NIM machine definitions that were created by xCAT. You can also use the AIX nim command directly. See the AIX/NIM documentation for details.

Removing NIM resources

Use the xCAT rmnimimage command to remove all the NIM resources associated with a given xCAT osimage definition. The command will only remove a NIM resource if it is not allocated to a node. You should always clean up the NIM node definitions before attempting to remove the NIM resources. The command will also remove the xCAT osimage definition that is specified on the command line.

For example, to remove the "610image" osimage definition along with all the associated NIM resources run the following command.

_**rmnimimage 610image**_

If necessary, you can also remove the NIM definitions directly by using NIM commands. See the AIX/NIM documentation for details.

xCAT Wiki

An extreme cluster/cloud administration toolkit

Setting_Up_an_AIX_Hierarchical_Cluster

Overview

Switch to a relational database

Setup for P7 IH Cluster

Defining Cluster Nodes

Define nodes

Using the mkdef command

Using a cluster configuration file.

Create LPARs

**Create LPARs and node definitions for P7 IH Cluster ???????? **

Install the Service Nodes

Specify the services provided by the service nodes

Add IP addresses and hostnames to /etc/hosts

Create a Service Node operating system image

Create an image_data resource (optional)

Add required service node software

XCAT and prerequisite software

Additional diskless dump software (optional)

Additional P7 IH software (optional)

Include additional files for P7 IH support

**Using NIM installp_bundle resources **

Check the osimage (optional)

Define xCAT networks

Define xCAT P7 IH HFI networks

Create additional NIM network definitions (optional)

Define an xCAT service group

Include customization scripts (optional)

Add "servicenode" script for service nodes

Add NTP setup script (optional)

Add secondary adapter configuration script (optional)

** P7 IH configuration scripts (optional)**

Gather MAC information for the install adapters

Create NIM client & group definitions

Create prescripts (optional)

Initialize the AIX/NIM nodes

Open a remote console (optional)

Initiate a network boot

Verify the deployment

Retry and troubleshooting tips

Configure additional adapters on the service nodes (optional)

Verify Service Node configuration

Setup of GPFS I/O Server nodes

Install the cluster nodes

Planning for external NFS server(optional)

Create a diskless image

Update the image (SPOT)

Set up statelite support (for diskless-stateless nodes only)

Define xCAT networks

Set conserver and monserver

Gather MAC information for the node boot adapters

Gather HFI MAC information for the HFI boot adapters on P7 IH

Define xCAT groups (optional)

Add IP addresses and hostnames to /etc/hosts

Verify the node definitions

Verify the node definitions for boot over HFI on P7 IH

Set up post boot scripts (optional)

P7 IH configuration scripts (optional)

Set up prescripts (optional)

Initialize the AIX/NIM diskless nodes

Verifying the node initialization before booting (optional)

Open a remote console (optional)

Initiate a network boot

Verify the deployment

Using login nodes

Using a backup service node

Initial deployment

Monitoring the service nodes

Service node takeover - with node reboot

Initialize the nodes on the new SN (if needed)

Use the xCAT "snmove" command to reset the node attributes to point to the backup SN.

Reboot the diskless nodes

Service node takeover - no node reboot

Set up an external NFS server

Switch to the backup service node

Switching back

Cleanup

Removing NIM machine definitions

Removing NIM resources

Create LPARs and node definitions for P7 IH Cluster ????????

Using NIM installp_bundle resources

P7 IH configuration scripts (optional)