This document is specific to iDataPlex, but gives the basic instructions for setting up any x86_64, IPMI-based, rack-mounted xCAT cluster quickly. It is meant to get you going as quickly as possible and therefore only covers the most common scenario. For additional scenarios, see [XCAT_iDataPlex_Advanced_Setup].
This configuration will have a single dx340 Management Node with 167 other dx340 servers as nodes. The OS deployed will be RH Enterprise Linux 5.2, x86_64 edition.
In our example, the management node is known as 'mgt', the node names are n1-n167, and the domain will be 'cluster'. We will use the BMCs in shared mode, so each BMC will share the NIC over which the node's operating system communicates with the xCAT management node. This is called the management LAN. We will use subnet 172.16.0.0 with a netmask of 255.240.0.0 (/12) for it. (This provides an IP address range of 172.16.0.1 - 172.31.255.254.) We will use the following subsets of this range for:
The network is physically laid out such that the port number on a switch is equal to the U position number within a column.
Here is a summary of the steps required to set up the cluster and what this document will take you through:
Install one of the supported distros on the Management Node (MN). It is recommended to ensure that dhcp, bind (not bind-chroot), httpd, nfs-utils, and perl-XML-Parser are installed. (But if not, the process of installing the xCAT software later will pull them in, assuming you follow the steps to make the distro RPMs available.)
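For example, on RHEL you could install these packages ahead of time with yum, assuming your distro repositories are already set up:

yum install dhcp bind httpd nfs-utils perl-XML-Parser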
For details on the hardware requirements for the MN, see [Setting_Up_a_Linux_xCAT_Mgmt_Node#Hardware_features_required].
In the /etc/selinux/config file, set:
SELINUX=disabled
In order for a change in this setting to take effect, you must reboot the system, but we will do this later in this section.
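If you do not want to wait for the reboot, you can also stop SELinux enforcement immediately by putting it in permissive mode for the current session:

setenforce 0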
The management node provides many services to the cluster nodes, but the firewall on the management node can interfere with this. If your cluster is on a secure network, the easiest thing to do is to disable the firewall on the Management Node:
service iptables stop
chkconfig iptables off
If disabling the firewall completely isn't an option, configure iptables to allow the following services on the NIC that faces the cluster: DHCP, TFTP, NFS, HTTP, DNS.
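As a sketch, assuming eth1 is the cluster-facing NIC, rules like the following would open those services (adjust to fit your site's existing iptables policy):

iptables -A INPUT -i eth1 -p udp --dport 67 -j ACCEPT    # DHCP
iptables -A INPUT -i eth1 -p udp --dport 69 -j ACCEPT    # TFTP
iptables -A INPUT -i eth1 -p tcp --dport 80 -j ACCEPT    # HTTP
iptables -A INPUT -i eth1 -p udp --dport 53 -j ACCEPT    # DNS
iptables -A INPUT -i eth1 -p tcp --dport 53 -j ACCEPT    # DNS (TCP)
iptables -A INPUT -i eth1 -p tcp --dport 2049 -j ACCEPT  # NFS
service iptables save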
The xCAT installation process will scan and populate certain settings from the running configuration. Having the networks configured ahead of time will aid in correct configuration. (After installation of xCAT, all the networks in the cluster must be defined in the xCAT networks table before starting to install cluster nodes.) When xCAT is installed on the Management Node, it will automatically run makenetworks to create an entry in the networks table for each of the networks the management node is on. Additional network configurations can be added to the xCAT networks table manually later if needed.
The networks that are typically used in a cluster are:
In our example, we only deal with the management network because:
For more information, see [Setting_Up_a_Linux_xCAT_Mgmt_Node#Appendix_A:_Network_Table_Setup_Example].
Configure the cluster facing NICs. An example /etc/sysconfig/network-scripts/ifcfg-eth1:
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
IPADDR=172.20.0.1
NETMASK=255.240.0.0
If the public-facing NIC on your management node is configured by DHCP, you may want to set PEERDNS=no in the NIC's config file to prevent dhclient from rewriting /etc/resolv.conf. This is important if you will be configuring DNS on the management node (via makedns, covered later in this doc) and want the management node itself to use that DNS. In this case, set PEERDNS=no in each /etc/sysconfig/network-scripts/ifcfg-* file that has BOOTPROTO=dhcp.
On the other hand, if you want dhclient to configure /etc/resolv.conf on your management node, then don't set PEERDNS=no in the NIC config files.
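For example, a DHCP-configured public NIC that should not rewrite /etc/resolv.conf might look like this (eth0 as the public-facing NIC is an assumption):

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp
PEERDNS=no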
The xCAT management node hostname should be configured before installing xCAT on the management node. The hostname or its resolvable ip address will be used as the default master name in the xCAT site table, when installed. This name needs to be the one that will resolve to the cluster-facing NIC. Short hostnames (no domain) are the norm for the management node and all cluster nodes. Node names should never end in "-enx" for any x.
To set the hostname, edit /etc/sysconfig/network to contain, for example:
HOSTNAME=mgt
If you run the hostname command, it should return the same:
# hostname
mgt
Ensure that at least the management node is in /etc/hosts:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
###
172.20.0.1 mgt mgt.cluster
When using the management node to install compute nodes, the timezone configuration on the management node will be inherited by the compute nodes, so it is recommended to set up the correct timezone on the management node. To do this on RHEL, see http://www.redhat.com/advice/tips/timezone.html. The process is similar, but not identical, for SLES. (Just google it.)
You can also optionally set up the MN as an NTP server for the cluster. See [Setting_up_NTP_in_xCAT].
It is not required, but recommended, that you create a separate file system for the /install directory on the Management Node. The size should be at least 30 GB to allow space for several install images.
Though it is possible to restart the correct services for all settings except SELinux, the simplest approach is to reboot the Management Node at this point.
xCAT will use the ethernet switches during node discovery to find out which switch port a particular MAC address is communicating over. This allows xCAT to match a random booting node with the proper node name in the database. To set up a switch, give it an IP address on its management port and enable basic SNMP functionality. (Typically, the SNMP agent in the switches is disabled by default.) Log in to the switches to allow the SNMP version 1 community string "public" read access. This will allow xCAT to communicate without further customization. If you want to use SNMP version 3, the user/password and AuthProto (default is 'md5') should be set in the switches table. It is also recommended that spanning tree be set to portfast or edge-port for faster boot performance. Please see the relevant switch documentation for how to configure these items.
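For example, SNMP v3 credentials could be recorded in the switches table with tabch (the user name, password, and auth protocol shown are placeholders):

tabch switch=switch1 switches.username=xcat switches.password=secret switches.auth=sha
tabdump switches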
If for some reason you can't configure SNMP on your switches, you can use the more manual method of entering the nodes' MACs into the database. See [#Manually_discover_a_node].
There are two options for installation of xCAT:
Pick either one, but not both.
If you are not able to, or do not wish to, use the live internet repository, choose this option.
Go to the [Download_xCAT] site and download the level of xCAT tarball you desire. Go to the xCAT Dependencies Download page and download the latest snap of the xCAT dependency tarball. (The latest snap of the xCAT dependency tarball will work with any version of xCAT.)
Copy the files to the Management Node (MN) and untar them:
mkdir /root/xcat2
cd /root/xcat2
tar jxvf xcat-core-2.*.tar.bz2 # or core-rpms-snap.tar.bz2
tar jxvf xcat-dep-2*.tar.bz2
Point YUM to the local repositories for xCAT and its dependencies:
cd /root/xcat2/xcat-dep/<release>/<arch>
./mklocalrepo.sh
cd /root/xcat2/xcat-core
./mklocalrepo.sh
When using the live internet repository, you need to first make sure that name resolution on your management node is at least set up enough to resolve sourceforge.net. Then make sure the correct repo files are in /etc/yum.repos.d:
To get the current official release:
cd /etc/yum.repos.d
wget http://sourceforge.net/projects/xcat/files/yum/stable/xcat-core/xCAT-core.repo
To get the deps package:
wget http://sourceforge.net/projects/xcat/files/yum/xcat-dep/<release>/<arch>/xCAT-dep.repo
for example:
wget http://sourceforge.net/projects/xcat/files/yum/xcat-dep/rh6/x86_64/xCAT-dep.repo
For information on installing xCAT on SLES with zypper, see [Setting_Up_a_Linux_xCAT_Mgmt_Node#.5BSLES.5D_Setup_Zypper].
xCAT depends on several packages that come from the Linux distro. Follow this section to create the repository of the OS on the Management Node.
See the following documentation: [Setting_Up_a_Linux_xCAT_Mgmt_Node#Get_the_Requisite_Packages_From_the_Distro].
[RH]: Use yum to install xCAT and all the dependencies:
yum clean metadata
yum install xCAT
[SLES]: Use zypper to install xCAT and all the dependencies:
zypper install xCAT
Add xCAT commands to the path by running the following:
source /etc/profile.d/xcat.sh
Check that the database is initialized:
tabdump site
The output should be similar to the following:
key,value,comments,disable
"xcatdport","3001",,
"xcatiport","3002",,
"tftpdir","/tftpboot",,
"installdir","/install",,
.
.
.
If you need to update the xCAT RPMs later:
To update xCAT only:
[RH]:
yum update '*xCAT*'
[SLES]:
zypper refresh
zypper update -t package '*xCAT*'
Note: this will not apply updates that may have been made to some of the xCAT deps packages. (If there are brand new deps packages, they will get installed.) In most cases, this is ok, but if you want to make all updates for xCAT rpms and deps, run the following command. This command will also pick up additional OS updates.
[RH]:
yum update
[SLES]:
zypper refresh
zypper update
Several xCAT database tables must be filled in while setting up an iDataPlex cluster. To make this process easier, xCAT provides several template files in /opt/xcat/share/xcat/templates/e1350/. These files contain regular expressions that describe the naming patterns in the cluster. With xCAT's regular expression support, one line in a table can define one or more attribute values for all the nodes in a node group. (For more information on xCAT's database regular expressions, see http://xcat.sourceforge.net/man5/xcatdb.5.html .) To load the default templates into your database:
cd /opt/xcat/share/xcat/templates/e1350/
for i in *csv; do tabrestore $i; done
These templates contain entries for a lot of different node groups, but we will be using the following node groups:
In our example, ipmi, idataplex, 42perswitch, and compute have the exact same membership because all of our iDataPlex nodes have those characteristics.
The templates define the following naming conventions:
If these conventions don't work for your situation, you can either:
Now you can use the power of the templates to define the nodes quickly:
nodeadd n1-n167 groups=ipmi,idataplex,42perswitch,compute,all
nodeadd bmc1-bmc167 groups=84bmcperrack
nodeadd switch1-switch4 groups=switch
To see the list of nodes you just defined:
nodels
To see all of the attributes that the combination of the templates and your nodelist have defined for a few sample nodes:
lsdef n100,bmc100,switch2
This is the easiest way to verify that the regular expressions in the templates are giving you attribute values you are happy with. (Or, if you modified the regular expressions, that you did it correctly.)
All networks in the cluster must be defined in the networks table. When xCAT was installed, it ran makenetworks, which created an entry in this table for each of the networks the management node is connected to. Now is the time to add or update any other cluster networks in the networks table.
For a sample Networks Setup, see the following example: [Setting_Up_a_Linux_xCAT_Mgmt_Node#Appendix_A:_Network_Table_Setup_Example]
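For example, you can review what makenetworks created and, if needed, define an additional network by hand (the network name and addresses below are purely illustrative):

tabdump networks
mkdef -t network -o clstrnet2 net=10.1.0.0 mask=255.255.0.0 gateway=10.1.0.1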
If you want to use hardware discovery, a dynamic range must be defined in the networks table. It is used by the nodes to get an IP address before xCAT knows their MAC addresses.
In this case, we'll designate 172.20.255.1-172.20.255.254 as a dynamic range:
chdef -t network 172_16_0_0-255_240_0_0 dynamicrange=172.20.255.1-172.20.255.254
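You can verify the dynamic range was recorded:

lsdef -t network 172_16_0_0-255_240_0_0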
If you are not using a terminal server, configuring SOL is recommended, though not required. To instruct xCAT to configure SOL in installed operating systems on dx340 systems:
chdef -t group -o compute serialport=1 serialspeed=19200 serialflow=hard
For dx360-m2 and newer systems, use:
chdef -t group -o compute serialport=0 serialspeed=115200 serialflow=hard
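To confirm the serial settings took effect on the group, you can display its definition:

lsdef -t group -o compute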
The template created a default passwd table. This includes the system entry, which is the password that will be assigned to root when a node is installed. You can modify this table using tabedit. To change the default password for root on the nodes, change the system line. To change the password to be used for the BMCs, change the ipmi line.
tabedit passwd
#key,username,password,cryptmethod,comments,disable
"blade","USERID","PASSW0RD",,,
"system","root","cluster",,,
"ipmi","USERID","PASSW0RD",,,
The template created a basic noderes table which defines node resources during install. In the template, servicenode and xcatmaster are not defined, so they will default to the Management Node.
At this point, xCAT should be ready to begin managing services.
Since the mapping between the xCAT node names and IP addresses has been added to the hosts table by the e1350 template, you can run the makehosts xCAT command to create the /etc/hosts file from the xCAT hosts table. (You can skip this step if you are creating /etc/hosts manually.)
makehosts switch,idataplex,ipmi
Verify the entries have been created in the file /etc/hosts. For example, your /etc/hosts should look like this:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
###
172.20.0.1 mgt mgt.cluster
172.20.101.1 n1 n1.cluster
172.20.101.2 n2 n2.cluster
172.20.101.3 n3 n3.cluster
172.20.101.4 n4 n4.cluster
172.20.101.5 n5 n5.cluster
172.20.101.6 n6 n6.cluster
172.20.101.7 n7 n7.cluster
.
.
.
To get the hostname/IP pairs copied from /etc/hosts to the DNS on the MN:
Set site.forwarders to your site-wide DNS servers that can resolve site or public hostnames. The DNS on the MN will forward any requests it can't answer to these servers.
chdef -t site forwarders=1.2.3.4,1.2.5.6
Edit /etc/resolv.conf to point the MN to its own DNS. (Note: this won't be required in xCAT 2.8 and above.)
search cluster
nameserver 172.20.0.1
Run makedns
makedns && service named start
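To check that the cluster DNS is answering, you can query the MN's name server directly, for example:

nslookup n1 172.20.0.1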
For more information about name resolution in an xCAT Cluster, see [Cluster_Name_Resolution].
This will get the network stanza part of the DHCP configuration (including the dynamic range) set:
makedhcp -n
The IP/MAC mappings for the nodes will be added to DHCP automatically as the nodes are discovered.
Nothing to do here - the TFTP server is set up by xCAT during the Management Node install.
makeconservercf && service conserver start
If you want to update node firmware when you discover the nodes, follow the steps in [XCAT_iDataPlex_Advanced_Setup#Updating_Node_Firmware] before continuing.
Walk over to the systems, press the power buttons, and on the MN watch the nodes discover themselves:
# tail -f /var/log/messages
Look for the dhcp requests, the xCAT discovery requests, and the "<node> has been discovered" messages.
A quick summary of what is happening during the discovery process is:
After a successful discovery process, the following attributes will be added to the database for each node. (You can verify this by running lsdef <node> ):
bmcpassword  # the BMC password
mac          # the MAC address of the node, obtained during discovery
mtm          # the hardware machine type/model
serial       # the hardware serial number
If you cannot discover the nodes successfully, see the next section [#Manually_discover_a_node].
If you just have a few nodes and can't configure the switch for SNMP, you can manually set up the xCAT tables instead, and then run the BMC setup process to configure the BMC on the nodes:
Set each node's MAC address in the database. The MAC address can be obtained from the back panel of the machine and should belong to the NIC that is connected to the management network:
chdef n1 mac="xx:xx:xx:xx:xx:xx"
chdef n2 mac="yy:yy:yy:yy:yy:yy"
Add the nodes to the DHCP service:
makedhcp n1-n2
Set the current runcmd to be bmcsetup:
nodeset n1-n2 runcmd=bmcsetup
Then walk over and power on the nodes.
After about 5-10 minutes, the nodes should be configured and ready for hardware management. You can verify this by:
# rpower all stat|xcoll
====================================
n1,n10,n100,n101,n102,n103,n104,n105,n106,n107,n108,n109,n11,n110,n111,
n112,n113,n114,n115,n116,n117,n118,n119,n12,n120,n121,n122,n123,n124,n125,n126,n127,n128,
n129,n13,n130,n131,n132,n133,n134,n135,n136,n137,n138,n139,n14,n140,n141,n142,n143,n144,n145,
n146,n147,n148,n149,n15,n150,n151,n152,n153,n154,n155,n156,n157,n158,n159,n16,n160,n161,
n162,n163,n164,n165,n166,n167,n17,n18,n19,n2,n20,n21,n22,n23,n24,n25,n26,n27,n28,n29,n3,n30,
n31,n32,n33,n34,n35,n36,n37,n38,n39,n4,n40,n41,n42,n43,n44,n45,n46,n47,n48,n49,n5,n50,n51,n52,
n53,n54,n55,n56,n57,n58,n59,n6,n60,n61,n62,n63,n64,n65,n66,n67,n68,n69,n7,n70,n71,n72,n73,n74,
n75,n76,n77,n78,n79,n8,n80,n81,n82,n83,n84,n85,n86,n87,n88,n89,n9,n90,n91,n92,n93,n94,n95,n96,
n97,n98,n99
====================================
on
Download the Red Hat ISOs, or load your OS DVDs, and place them in a directory:
mkdir /root/xcat2
cd /root/xcat2
wget <ISO of your Redhat OS>
Run copycds to set up the install directory for node diskful/diskless boots. The copycds command will copy the contents of the ISO to /install/rhel5/<arch>. For example:
cd /root/xcat2
copycds RHEL5.2-Server-20080430.0-x86_64-DVD.iso
The following command will commence installation to disk on all of the nodes. Modify the oslevel from rhels5.2 to whatever version you are installing.
# rinstall -o rhels5.2 all
It is possible to use the wcons command to monitor a sampling of the nodes:
# wcons n1,n20,n80,n100
or rcons to monitor one node
# rcons n1
Additionally, nodestat may be used to check the status of a node as it installs:
# nodestat n20,n21
n20: installing man-pages - 2.39-10.el5 (0%)
n21: installing prep
After some time, the nodes should be up and ready for general use.
For any given command, typing 'man <command>' should give an in-depth document on the workings of that command. Here are some examples of using key commands and command combinations in useful ways.
For a list of all xCAT commands and links to their manpages: [XCAT_Commands]
In this configuration, a handy convenience group would be the lower systems in the chassis, the ones able to read temperature and fan speed. In this case, the odd-numbered systems are on the bottom, so to define this group with a regular expression:
# nodech '/n.*[13579]$' groups,=bottom
or explicitly
chdef -p -t node -o n1-n9,n11-n19,n21-n29,n31-n39,n41-n49,n51-n59,n61-n69,n71-n79,n81-n89,\
n91-n99,n101-n109,n111-n119,n121-n129,n131-n139,n141-n149,n151-n159,n161-n167 groups="bottom"
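Either way, you can confirm the membership of the new group:

nodels bottom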
We can list discovered and expanded versions of attributes (actual VPD data will appear instead of the asterisks):
# nodels n97 nodepos.rack nodepos.u vpd.serial vpd.mtm
n97: nodepos.u: A-13
n97: nodepos.rack: 2
n97: vpd.serial: ********
n97: vpd.mtm: *******
You can also list all the attributes:
# lsdef n97
Object name: n97
arch=x86_64
.
groups=bottom,ipmi,idataplex,42perswitch,compute,all
.
.
.
rack=1
unit=A1
xCAT provides parallel commands and the sinv (inventory) command to analyze the consistency of the cluster. See [Parallel_Commands_and_Inventory].
Combining the use of in-band and out-of-band utilities with the xcoll utility, it is possible to quickly analyze the level and consistency of firmware across the servers:
# rinv n1-n3 mprom|xcoll
====================================
n1,n2,n3
====================================
BMC Firmware: 1.18
The BMC does not report the BIOS version, so to gather that, use psh:
# psh n1-n3 dmidecode|grep "BIOS Information" -A4|grep Version|xcoll
====================================
n1,n2,n3
====================================
Version: I1E123A
To update the firmware on your nodes, see [XCAT_iDataPlex_Advanced_Setup#Updating_Node_Firmware].
To update ASU settings on the nodes, see [XCAT_iDataPlex_Advanced_Setup#Updating_ASU_Settings_on_the_Nodes].
If the configuration is louder than expected (an iDataPlex chassis should nominally have a fairly modest noise impact), find the nodes with elevated fan speed:
# rvitals bottom fanspeed|sort -k 4|tail -n 3
n3: PSU FAN3: 2160 RPM
n3: PSU FAN4: 2240 RPM
n3: PSU FAN1: 2320 RPM
In this example, the fan speeds are pretty typical. If fan speeds are elevated, there may be a thermal issue. In a dx340 system, a reading near 10,000 RPM probably indicates either a defective sensor or a misprogrammed power supply.
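As a rough sketch, you could flag any fan reporting above a chosen threshold (8000 RPM here is an arbitrary example) by filtering the rvitals output with awk, relying on the 'value units' format shown above:

rvitals bottom fanspeed | awk '$(NF-1)+0 > 8000'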
To find the warmest detected temperatures in a configuration:
# rvitals bottom temp|grep Domain|sort -t: -k 3|tail -n 3
n3: Domain B Therm 1: 46 C (115 F)
n7: Domain A Therm 1: 47 C (117 F)
n3: Domain A Therm 1: 49 C (120 F)
Change tail to head in the above examples to find the slowest fans/lowest temperatures. Currently, an iDataPlex chassis without a planar tray in the top position will report '0 C' for Domain B temperatures.
For more options, see rvitals manpage: http://xcat.sourceforge.net/man1/rvitals.1.html