Multiple_SubCluster_support

Required Reviewers

  • ?

Required Approvers

  • Guang Cheng Li, Linda Mellor

Overview

This design covers two new customer requirements.

The first requirement is to be able to take an xCAT cluster managed by one xCAT Management Node and divide it into multiple subclusters. The nodes in each subcluster will share common root ssh keys and a common root password. This allows the nodes in a subcluster to ssh to each other without a password, but not to any node in another subcluster.

Note: We call these subclusters because they share a common xCAT Management Node and database, including the site table, which defines the attributes of the entire cluster.

The second requirement is for mkvlan enhancements (TBD).

Multiple SubCluster support

The multiple subcluster support requires several enhancements to xCAT.

root ssh keys

Currently, xCAT replaces the root ssh keys that are generated at install time on the service nodes (SN) and compute nodes (CN) with the root ssh keys from the Management Node (MN). It also replaces the ssh host keys on the SN and CN with a set of pregenerated host keys from the MN. Putting the MN public key in the authorized_keys file on the service nodes and compute nodes allows passwordless ssh from the MN to the SN and CN. This setup also allows passwordless ssh between all compute nodes and service nodes. The pregenerated host keys make all nodes look the same to ssh, so you are never prompted to update known_hosts.

compute nodes

Having subclusters whose nodes cannot use passwordless ssh to reach nodes in other subclusters requires xCAT to generate a set of root ssh keys for each subcluster and install them on the compute nodes in that subcluster. In addition, the MN public key must still be put in the authorized_keys file on the nodes in a non-hierarchical cluster, or the SN public key in a hierarchical cluster.
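A minimal Python sketch of what ends up in a compute node's root authorized_keys file under this scheme, assuming the public keys are already available as strings. The function name and parameters are hypothetical, not existing xCAT code: the subcluster key lets peers in the same subcluster log in, and exactly one managing key (MN or SN) keeps xCAT able to reach the node.

```python
from typing import Optional


def authorized_keys_for(subcluster_pub: str, mn_pub: str,
                        sn_pub: Optional[str] = None) -> str:
    """Return root's authorized_keys content for a compute node in a subcluster.

    subcluster_pub: public key shared by every node in this subcluster, so the
        nodes can ssh to each other; other subclusters hold different keys.
    mn_pub: Management Node public key (non-hierarchical cluster).
    sn_pub: service node public key; when given, the cluster is treated as
        hierarchical and this key is used instead of mn_pub.
    """
    # Pick the key of whichever node manages this compute node.
    manager_pub = sn_pub if sn_pub is not None else mn_pub
    # One key per line, as sshd expects in authorized_keys.
    return f"{subcluster_pub}\n{manager_pub}\n"
```

Because a node in another subcluster holds a different private key, it has no matching entry in this file and is refused.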

Question: What about the common ssh host keys? Should we generate a set of those for each subcluster?

service nodes

We will still use the MN root ssh keys on all service nodes. Service nodes will not be allowed to be members of a subcluster.

root password

Currently, xCAT puts the root password on the node only during install. It is taken from the passwd table entry where key=system. The new subcluster support requires installing a unique root password for each subcluster.
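A minimal sketch of how the install-time password lookup could change, assuming nodes that belong to no subcluster keep using the key=system entry (an assumption; the design does not say). The tables are simplified to plain dictionaries: nodelist.subclustername as node -> name, and the passwd table as key -> password (the real table has more columns). The function name is hypothetical.

```python
from typing import Dict, Optional


def root_password_for(node: str,
                      subclustername_of: Dict[str, Optional[str]],
                      passwd_table: Dict[str, str]) -> str:
    """Pick the root password to install on a node.

    subclustername_of models nodelist.subclustername (None or absent if unset).
    passwd_table models the passwd table, simplified to key -> password.
    Assumption: nodes outside any subcluster fall back to key=system.
    """
    sub = subclustername_of.get(node)
    if sub is not None:
        # Subcluster members must have their own passwd entry.
        if sub not in passwd_table:
            raise KeyError(f"no passwd entry for subcluster {sub!r}")
        return passwd_table[sub]
    return passwd_table["system"]
```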

xCAT changes

To support multiple subclusters, we propose the following changes:

Table Changes

The nodelist table will have a new attribute, subclustername.

Commands

mksubcluster

mksubcluster will be used to do the following:

  • define a subcluster name
  • assign nodes to the subcluster
  • generate the root ssh keys for the subcluster
  • take in the root password for the subcluster

Implementation

mksubcluster will have the following interface:

mksubcluster <noderange> -n <subclustername> -p <root password>
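The interface above can be expressed as an argument parser. This is a Python sketch using the standard argparse module, not xCAT code; the function name is hypothetical, and noderange expansion (turning a group or range into node names) is left out and kept as a raw string.

```python
import argparse
from typing import List, Optional


def parse_mksubcluster_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse: mksubcluster <noderange> -n <subclustername> -p <root password>"""
    parser = argparse.ArgumentParser(prog="mksubcluster")
    parser.add_argument("noderange",
                        help="xCAT noderange (kept unexpanded in this sketch)")
    parser.add_argument("-n", dest="subclustername", required=True,
                        help="name of the subcluster")
    parser.add_argument("-p", dest="password", required=True,
                        help="root password for nodes in the subcluster")
    return parser.parse_args(argv)
```

For example, `parse_mksubcluster_args(["node01-node04", "-n", "teamA", "-p", "secret"])` yields a namespace with `subclustername == "teamA"`. Note that a password given with -p is visible in shell history and process listings.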

It will take the input and do the following:

  • Set the nodelist.subclustername attribute of each node in the noderange to the subclustername.
  • Create a passwd table entry with key=subclustername and the input root password as the password.
  • Create a dynamic group named after the subcluster that selects the nodes where nodelist.subclustername equals the input subclustername.
  • Generate a set of root ssh keys for the subcluster and store them in /etc/xcat/sshkeys/<subclustername>.
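The four steps can be sketched in Python against an in-memory stand-in for the xCAT tables. Everything here is hypothetical: the XcatDB class, the dynamic group rule string, and the choice to reject a node that already belongs to a different subcluster (the design only states this check for rmsubcluster). Key generation is injectable so the table logic can be exercised without touching /etc/xcat; the default calls the real `ssh-keygen` with `-q -t rsa -N "" -f`.

```python
import os
import subprocess
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

SSHKEY_ROOT = "/etc/xcat/sshkeys"


def ssh_keygen(keydir: str) -> None:
    """Generate a passphrase-less RSA key pair as <keydir>/id_rsa(.pub)."""
    os.makedirs(keydir, mode=0o700, exist_ok=True)
    subprocess.run(["ssh-keygen", "-q", "-t", "rsa", "-N", "", "-f",
                    os.path.join(keydir, "id_rsa")], check=True)


@dataclass
class XcatDB:
    """Hypothetical in-memory stand-in for the tables mksubcluster touches."""
    subclustername: Dict[str, Optional[str]] = field(default_factory=dict)  # nodelist.subclustername
    passwd: Dict[str, str] = field(default_factory=dict)                    # passwd key -> password
    dynamic_groups: Dict[str, str] = field(default_factory=dict)            # group name -> rule


def mksubcluster(db: XcatDB, nodes: List[str], name: str, password: str,
                 keygen: Callable[[str], None] = ssh_keygen,
                 key_root: str = SSHKEY_ROOT) -> str:
    """Apply the mksubcluster steps; return the subcluster's key directory."""
    # Assumption: refuse nodes already in another subcluster, before changing anything.
    for node in nodes:
        current = db.subclustername.get(node)
        if current not in (None, name):
            raise ValueError(f"{node} already belongs to subcluster {current!r}")
    # Step 1: nodelist.subclustername = name for every node in the noderange.
    for node in nodes:
        db.subclustername[node] = name
    # Step 2: passwd entry keyed by the subcluster name.
    db.passwd[name] = password
    # Step 3: dynamic group selecting nodes whose subclustername matches (rule format hypothetical).
    db.dynamic_groups[name] = f"nodelist.subclustername=={name}"
    # Step 4: root ssh keys for the subcluster, generated only if not already present.
    keydir = os.path.join(key_root, name)
    if not os.path.exists(os.path.join(keydir, "id_rsa")):
        keygen(keydir)
    return keydir
```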

Question: Should this be a dynamic group, or should mksubcluster just add the subclustername as a group on the nodes when it updates the subclustername attribute in the nodelist table? Note: We decided to use the subclustername attribute rather than a group because we did not want it to be possible to add a node to two subclusters.
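The reasoning behind that decision can be illustrated with a short Python sketch (function names hypothetical). A single-valued subclustername attribute can hold only one name per node, and the dynamic group is simply every node whose attribute matches. nodelist.groups, by contrast, is a comma-separated list, so nothing stops it from naming two subclusters.

```python
from typing import Dict, List, Optional, Set


def subcluster_members(subclustername_of: Dict[str, Optional[str]], name: str) -> List[str]:
    """Resolve the dynamic group: every node whose nodelist.subclustername == name."""
    return sorted(n for n, sub in subclustername_of.items() if sub == name)


def subclusters_named_in_groups(groups: str, subclusters: Set[str]) -> List[str]:
    """Group-based alternative: nodelist.groups is comma-separated, so a node
    could list several subcluster names at once."""
    return [g for g in groups.split(",") if g in subclusters]
```

For example, `subclusters_named_in_groups("compute,teamA,teamB", {"teamA", "teamB"})` returns both names, which is exactly the ambiguity the attribute avoids.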

rmsubcluster

rmsubcluster will be used to do the following:

  • remove nodes from their defined subcluster
  • remove the subcluster if no nodes are left in it
  • clean up the subcluster's root ssh keys and passwd table entry

Implementation

rmsubcluster <noderange> -n <subclustername>

It will take the input and do the following:

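For each node in the noderange, if nodelist.subclustername matches the input subclustername, it will remove the subclustername from nodelist.subclustername, and from nodelist.groups if the subcluster was defined as a group. If the node is defined in another subcluster, the command returns an error.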

If no nodes are left in the subcluster, clean up the dynamic group, the root ssh keys for that subcluster, and the passwd table entry.
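A minimal Python sketch of the rmsubcluster flow, using plain dictionaries for the tables. The function name and parameters are hypothetical, and removing the name from nodelist.groups is omitted. As an assumption, every node is validated before any table is changed, so an error on one node leaves the tables untouched; nodes with no subcluster are skipped.

```python
import os
import shutil
from typing import Dict, List, Optional


def rmsubcluster(subclustername_of: Dict[str, Optional[str]],
                 passwd: Dict[str, str],
                 dynamic_groups: Dict[str, str],
                 nodes: List[str], name: str,
                 key_root: str = "/etc/xcat/sshkeys") -> bool:
    """Remove nodes from subcluster `name`; clean up once it is empty.

    Returns True if the subcluster itself was removed.
    """
    # A node defined in another subcluster is an error; check all nodes first.
    for node in nodes:
        current = subclustername_of.get(node)
        if current is not None and current != name:
            raise ValueError(f"{node} belongs to subcluster {current!r}, not {name!r}")
    # Clear the attribute only where it matches the requested subcluster.
    for node in nodes:
        if subclustername_of.get(node) == name:
            subclustername_of[node] = None
    if any(sub == name for sub in subclustername_of.values()):
        return False  # members remain, keep the subcluster
    # No members left: drop the dynamic group, passwd entry and root ssh keys.
    dynamic_groups.pop(name, None)
    passwd.pop(name, None)
    shutil.rmtree(os.path.join(key_root, name), ignore_errors=True)
    return True
```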

mkvlan enhancements

The needed mkvlan enhancements are TBD. Here are some comments about the current support.

Currently, mkvlan supports only Cisco switches and some BNT switch modules (EN4093, G8000, G8124, G8264, 8264E). To support more BNT modules, we need to update the OID table, because each BNT module uses different OIDs for the same function (a poor design by BNT). Supporting other switch vendors such as Juniper requires significant code changes, because Juniper does not currently support the vlan function through the SNMP interface; we would have to use its own libraries instead. This requires a framework change in our vlan code.
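One way to picture the framework change is a vendor-neutral driver interface with two kinds of backends. This Python sketch is an assumption about structure only, not the existing vlan code: all class and method names are hypothetical, the SNMP driver just returns the OID it would set (a real driver would issue an SNMP SET), and the vendor client stands in for a vendor library. It shows why adding a BNT module is a data change (one more row in the per-model OID table) while adding Juniper is a code change (a new driver class).

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class VlanDriver(ABC):
    """Vendor-neutral interface the vlan code would call (hypothetical)."""

    @abstractmethod
    def create_vlan(self, vlan_id: int) -> str: ...


class SnmpVlanDriver(VlanDriver):
    """SNMP-based switches (Cisco, BNT): same logic, per-model OIDs."""

    def __init__(self, model: str, oid_table: Dict[str, Dict[str, str]]):
        if model not in oid_table:
            raise KeyError(f"no OID table entry for switch model {model!r}")
        self.oids = oid_table[model]

    def create_vlan(self, vlan_id: int) -> str:
        # Placeholder: return the OID a real driver would SET for this VLAN.
        return f"{self.oids['vlan_create']}.{vlan_id}"


class VendorLibraryVlanDriver(VlanDriver):
    """Switches (e.g. Juniper) that need the vendor's own library instead of SNMP."""

    def __init__(self, client: Any):
        self.client = client  # hypothetical vendor-library client object

    def create_vlan(self, vlan_id: int) -> str:
        return self.client.create_vlan(vlan_id)
```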

