Linux supplies the driver update disk mechanism to support devices that cannot be driven by the drivers shipped with the distribution during the installation process. A "driver update disk" is a piece of media containing the drivers and related configuration files for such devices.
See [Using_Linux_Driver_Update_Disk] for information on using the driver update disk.
The xCAT database tables support powerful regular expressions for defining a pattern-based configuration. This can make the tables much smaller in large clusters, and can also help for more dynamic configurations or defining a site-standard set of defaults once and applying to multiple clusters.
Even though the syntax of the regular expressions looks complicated, once you understand the basics of the syntax it is easy to change the regular expressions to fit your cluster. As an example, we will change the IP addresses of the nodes from 172.30.(100+racknum).nodenuminrack to 10.0.0.nodenum. The IP address is defined in the ip attribute and the relevant node group is idataplex, so we first query the regular expression using:
mgt# lsdef -t group idataplex -i ip
Object name: idataplex
ip=|\D+(\d+).*$|172.30.(101+(($1-1)/84)).(($1-1)%84+1)|
Notice that the expression contains 3 vertical bars. The text between the first 2 vertical bars is what will be matched against the name of the node whose IP address we are trying to get from the database. The pattern match of "\D+(\d+).*$" means that we expect some non-digit characters, then some digits, then any text, then the end. Because the match of the digits is in parentheses, that value is put into $1. So if we are matching n2, $1 will equal "2". The text between the 2nd and 3rd vertical bars represents the value of the attribute (in this case the ip) for this node. Any text in parentheses will be evaluated as an expression using the current value of $1. If you do the math, the resulting ip for n2 will be 172.30.101.2. If we want it to be 10.0.0.2 instead, then change the regular expression like this:
chdef -t group idataplex ip='|\D+(\d+).*$|10.0.0.($1+0)|'
Note: you could also change this expression using:
tabedit hosts
Now test that the expression is working correctly for n2:
mgt# lsdef n2 -i ip
Object name: n2
ip=10.0.0.2
Any of the regular expressions can be changed in a similar way to suit your needs. For more details on the table regular expressions, see the xCAT database object and table descriptions.
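If you want to sanity-check how a pattern like this resolves before putting it in the table, a rough approximation is a Perl one-liner, since xCAT evaluates these expressions in Perl. This is only an illustrative sketch of the match-then-evaluate behavior, not the exact code xCAT runs:
perl -e '$_ = "n2"; if (/\D+(\d+).*$/) { print "10.0.0." . ($1+0) . "\n"; }'
# prints: 10.0.0.2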
If you want xCAT to install or diskless boot your nodes immediately after discovering and defining them, do this before kicking off the discovery process:
Associate the osimage with your nodes:
chdef ipmi provmethod=<osimage>
Add the deployment step to the chain attribute:
chdef ipmi chain='runcmd=bmcsetup,osimage=<osimage name>:reboot4deploy'
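For example, assuming a (hypothetical) osimage named rhels6.3-x86_64-install-compute:
chdef ipmi provmethod=rhels6.3-x86_64-install-compute
chdef ipmi chain='runcmd=bmcsetup,osimage=rhels6.3-x86_64-install-compute:reboot4deploy'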
Now initiate the discovery and deployment by powering on the nodes.
If you just have a few nodes and do not want to use the templates to set up the xCAT tables, and do not want to configure your ethernet switches for SNMP access, then the following steps can be used to define your nodes and prepare them for bmcsetup. (For more information, see the node object attribute descriptions.)
Add the new nodes:
nodeadd n1-n20 groups=ipmi,idataplex,compute,all
Set attributes that are common across all of the nodes:
chdef -t group ipmi mgt=ipmi netboot=xnba bmcusername=USERID bmcpassword=PASSW0RD
For each node, set the node specific attributes (bmc, ip, mac). For example:
chdef n1 bmc=10.0.1.1 ip=10.0.2.1 mac="xx:xx:xx:xx:xx:xx"
.
.
.
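If your BMC and node IP addresses follow a simple numeric pattern, a short shell loop can set the per-node bmc and ip attributes. This is just a sketch with a hypothetical addressing scheme; the MAC addresses usually still have to be set individually:
# sketch: assumes node nN has BMC address 10.0.1.N and in-band address 10.0.2.N
for i in $(seq 1 20); do
  chdef n$i bmc=10.0.1.$i ip=10.0.2.$i
done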
Add the nodes to the DHCP service:
makedhcp idataplex
Set up the current runcmd to be bmcsetup:
nodeset idataplex runcmd=bmcsetup
Then walk over and power on the nodes. This will configure each BMC with the IP address, userid, and password that were set in the database above.
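Once bmcsetup has completed on a node, you can verify from the MN that the BMC is reachable with the credentials set above, for example:
rpower n1 stat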
The process for updating node firmware during the node discovery phase, or at a later time, is:
Put the firmware update files, the uxspi utility, and a runme.sh script into a tarball. For example, the tarball might contain:
ibm_fw_imm2_1aoo27b-1.10_anyos_noarch.uxz
ibm_fw_imm2_1aoo27b-1.10_anyos_noarch.xml
ibm_fw_uefi_tde111a-1.00_anyos_32-64.uxz
ibm_fw_uefi_tde111a-1.00_anyos_32-64.xml
runme.sh
ibm_utl_uxspi_9.21_rhel6_32-64.bin
where runme.sh contains:
./ibm_utl_uxspi_9.21_rhel6_32-64.bin up -L -u
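To make the tarball available at the URL used in the runimage examples below, create it under the /install directory on the management node, which xCAT serves over HTTP. This is a sketch; the directory and tarball name just need to match the URL you use:
mkdir -p /install/firmware
cd /install/firmware          # copy the files listed above into this directory first
tar -czvf firmware-update.tgz ibm_fw_*.uxz ibm_fw_*.xml ibm_utl_uxspi_9.21_rhel6_32-64.bin runme.sh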
For this example, we assume you have a nodegroup called "ipmi" that contains the nodes you want to update.
Option 1 - update during discovery: If you want to update the firmware during the node discovery process, ensure you have already added a dynamic range to the networks table and run "makedhcp -n". Then update the chain table to do both bmcsetup and the firmware update:
chdef -t group ipmi chain="runcmd=bmcsetup,runimage=http://mgmtnode/install/firmware/firmware-update.tgz,shell"
Option 2 - update after node deployment: If you are updating the firmware at a later time (i.e., not during the node discovery process), tell nodeset that you want to do the firmware update, and then set currchain to drop the nodes into a shell when they are done:
nodeset ipmi runimage=http://mgmtnode/install/firmware/firmware-update.tgz,shell
chdef ipmi currchain=shell
Then physically power on the nodes (in the discovery case), or if the BMCs are already configured, run: rpower ipmi boot
To monitor the progress, watch the currstate attribute in the chain table. When all of the nodes show "shell", the updates are done:
watch -d 'nodels ipmi chain.currstate|xcoll'
At this point, you can check the results of the updates by ssh'ing into nodes and looking at /var/log/IBM_Support. (Or you can use psh or xdsh to grep for specific messages on all of the nodes.)
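For example, a quick way to list the update logs on all of the nodes at once (the exact file names under /var/log/IBM_Support depend on the uxspi version):
xdsh ipmi "ls /var/log/IBM_Support" | xcoll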
If you need to update CMOS/uEFI/BIOS settings on your nodes, download ASU (Advanced Settings Utility) from the IBM Fix Central web site.
Once you have the ASU RPM on your MN (management node), you have several choices of how to run it:
ASU can be run on the management node (MN) and told to connect to the IMM of a node. First install ASU on the MN:
rpm -i ibm_utl_asu_asut78c-9.21_linux_x86-64.rpm
cd /opt/ibm/toolscenter/asu
Determine the IP address, username, and password of the IMM (BMC):
lsdef node1 -i bmc,bmcusername,bmcpassword
tabdump passwd | grep ipmi # the default if username and password are not set for the node
Run ASU:
./asu64 show all --host <ip> --user <username> --password <pw>
./asu64 show uEFI.ProcessorHyperThreading --host <ip> --user <username> --password <pw>
./asu64 set uEFI.RemoteConsoleRedirection Enable --host <ip> --user <username> --password <pw> # a common setting that needs to be set
If you want to set a lot of settings, you can put them in a file and run:
./asu64 batch <settingsfile> --host <ip> --user <username> --password <pw>
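If you need to apply the same setting to many nodes from the MN, a small shell loop can look up each node's BMC address and run asu64 against it. This is a sketch that assumes a common BMC username and password; substitute your own values:
# set one uEFI setting on the BMC/IMM of every node in the "ipmi" group
for node in $(nodels ipmi); do
  bmcip=$(nodels $node ipmi.bmc | awk '{print $2}')
  ./asu64 set uEFI.RemoteConsoleRedirection Enable --host $bmcip --user USERID --password PASSW0RD
done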
Copy the RPM to the nodes:
xdcp ipmi ibm_utl_asu_asut78c-9.21_linux_x86-64.rpm /tmp
Install the RPM:
xdsh ipmi rpm -i /tmp/ibm_utl_asu_asut78c-9.21_linux_x86-64.rpm
Run asu64 with the ASU commands you want to run:
xdsh ipmi /opt/ibm/toolscenter/asu/asu64 show uEFI.ProcessorHyperThreading
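To summarize identical output across the nodes, you can pipe the result through xcoll, for example:
xdsh ipmi /opt/ibm/toolscenter/asu/asu64 show uEFI.ProcessorHyperThreading | xcoll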
Add the ASU RPM to your node image following the instructions in [Install_Additional_Packages].
If this is a stateless node image, re-run genimage and packimage and reboot your nodes. (If you don't want to reboot your nodes right now, run updatenode -S to install ASU on the nodes temporarily.)
If this is a stateful node image, run updatenode -S to install ASU on the nodes.
If you want to set ASU settings while discovering the nodes:
See [#Updating_Node_Firmware] for an example of using a tar file in a runimage statement.
{{:Configuring Secondary Adapters}}
The following documentation is old and has not been tested in a long time.
Do this on the same node you generated the image on. Note: if this is a node other than the management node, we assume you still have /install mounted from the MN, the genimage output in /root/netboot, etc.
yum install kernel-devel gcc squashfs-tools
mkdir /tmp/aufs
cd /tmp/aufs
svn co http://xcat.svn.sf.net/svnroot/xcat/xcat-dep/trunk/aufs
If your node does not have internet access, do the checkout elsewhere and copy the result to this node.
tar jxvf aufs-2-6-2008.tar.bz2
cd aufs
mv include/linux/aufs_type.h fs/aufs/
cd fs/aufs/
patch -p1 < ...
ssh <node> "echo 3 > /proc/sys/vm/drop_caches; free -m; df -h"
                    total       used       free     shared    buffers     cached
Mem:                 3961         99       3861          0          0         61
-/+ buffers/cache:              38       3922
Swap:                   0          0          0
Filesystem      Size  Used Avail Use% Mounted on
compute_ppc64   100M  220K  100M   1% /
none             10M     0   10M   0% /tmp
none             10M     0   10M   0% /var/tmp
Max for / is 100M, but only 220K is being used (down from 225M). But where's the OS?
Look at the cached column: that 61M is the compressed OS image, about 3.5x smaller than the uncompressed image. As files in the hidden OS change, they are copied into the tmpfs (compute_ppc64) with copy-on-write; to reclaim that space, reboot. The /tmp and /var/tmp filesystems are for MPI, Torque, and other user-related files; if 10M is too small, you can increase it. To reclaim this space, put the following in your epilogue:
umount /tmp /var/tmp; mount -a
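For example, a minimal Torque epilogue script doing this could look like the sketch below; the epilogue location and any additional cleanup depend on your Torque configuration:
#!/bin/sh
# reclaim tmpfs space used for job scratch files on the compute node
umount /tmp /var/tmp
mount -a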
If you want to upgrade your management node to a new release of Linux, refer to [Setting_Up_a_Linux_xCAT_Mgmt_Node#Appendix_D:_Upgrade_your_Management_Node_to_a_new_Release_of_Linux]
If you want to upgrade your service nodes to a new release of Linux, refer to [Setting_Up_a_Linux_Hierarchical_Cluster#Update_Service_Node_Diskfull_Image]
Note: this section is still under construction!
In case you need to understand this to diagnose problems, this is a summary of what happens when xCAT network boots the different types of nodes.
Booting Node | Network Transfer | Management Node
... | ... <node>.elilo) | HTTP server
bootloader (xnba/elilo) | request for linux kernel --> | HTTP server
bootloader (xnba/elilo) | ... <node>) | HTTP server
initrd | request 2nd stage bootloader --> | DHCP server
initrd | ... <os>/x86_64/images/install.img) | HTTP server
kickstart installer (install.img) | request installation pkgs --> | HTTP server
kickstart installer (install.img) | ... <node>) (localboot) | TFTP server
boot from local disk | |
/etc/init.d/xcatpostinit1 | execute /opt/xcat/xcatinstallpost |
/opt/xcat/xcatinstallpost | set "REBOOT=TRUE" in /opt/xcat/xcatinfo |
/opt/xcat/xcatinstallpost | execute mypostscript.post in /xcatpost |
mypostscript.post | execute postbootscripts in /xcatpost |
updateflag.awk | update node status to booted --> | xCAT server
Booting Node | Network Transfer | Management Node
PXE ROM | DHCP request --> | DHCP server
PXE ROM | ... <node>.elilo) | HTTP server
bootloader (xnba/elilo) | request for linux kernel --> | HTTP server
bootloader (xnba/elilo) | ... | ...