Chaos - Technical details

Current CHAOS feature summary:
CHAOS 1.6 has been released with the following high-level feature set:
          Live CD or PXE; runs from RAM after loading from media, so no installation is required.
          6Mbyte OS footprint; fits on a business card
          Feature packed Linux Kernel (2.4.27)
          Latest openMosix software (kernel 2.4.27-20040808, tools 0.3.6-2)
          Automatic IP configuration; boot with DHCP/BOOTP/RARP or manually
          3DES encrypted network communications; IPSEC - fully meshed!
          Stateful packet filtering; 500/UDP and ESP network accessible in IPSEC mode
          Custom INIT binary; fast, zombie-free, clear (color coded), and inflexible!
          Autodiscovery daemon tyd with multicast (local) and unicast (multi-site) support
          Supports most i586/PCI hardware (including recent Compaq/Dell desktops)

The proposed CHAOS roadmap
CHAOS version 1.6 was intended to be a package upgrade from CHAOS 1.5 and an effort to restore compatibility between CHAOS, ClusterKnoppix and Quantian. Unfortunately, due to IPSEC issues in ClusterKnoppix, this effort has not been successful. As a new version of Knoppix has been released, there is understandably little interest in repairing or upgrading the older version (clusterKNOPPIX_V3.6-2004-08-16-EN-cl1).

The plan going forward is therefore:

          CHAOS version 1.6 has been released as a package upgrade from CHAOS 1.5, only.
          CHAOS version 1.7 will be released along with the latest version of ClusterKnoppix, providing compatible, inter-operable versions of CHAOS, ClusterKnoppix and Quantian. CHAOS 1.7 will also see the beginning of the security and implementation improvements in the CHAOS-specific code (tyd, init, etc).

Understanding CHAOS' networking
CHAOS has been designed with large ad-hoc roll-outs in mind. Wherever a design problem arose, the decision was almost always made in favor of security, simplicity and automation. To save as much time and labour as possible through automated networking, the boot screen features a matrix of prefabricated boot options. These options allow combinations of DHCP, BOOTP or static addressing to be selected quickly, the default being dynamic (DHCP/BOOTP). If no DHCP or BOOTP server is available, pressing F5 at the boot prompt will display instructions for manually entering static IP interface data.

All CHAOS automagic IP configuration options are managed by init, to provide the fastest flexible boot sequence possible. This implementation method ensures that, where required, supporting drivers (such as PCMCIA support) can be loaded before attempting to obtain network connectivity.

The services used by CHAOS are listed in its /etc/services file:

 #
 # CHAOS-1.0  /etc/services
 #
 #
 # Network services, Internet style
 #
 ssh             22/tcp          # Secure Shell Login
 bootps          67/udp          # BOOTP server
 bootpc          68/udp          # BOOTP client
 tftp            69/udp          # Trivial File Transfer Protocol
 http            80/tcp          # WorldWideWeb HTTP
 ike             500/udp         # IPSEC IKE
 #
 #
 # Local openMosix
 #
 om-mfs          723/tcp         # openMosix FileSystem port
 om-disc         1334/udp        # openMosix autodiscovery protocol
 om-mig          4660/tcp        # openMosix Migration Daemon port
 om-info         5428/udp        # openMosix Info Daemon port
 #
 #
 # Local CHAOS
 #
 mgetm           2727/udp        # multicast get-m protocol
 ugetm           2728/udp        # unicast get-m protocol
 tnp             3278/tcp        # terrence-n-phillip protocol
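
Because the listing follows the usual /etc/services layout (name, port/protocol, optional comment), it is easy to process mechanically. The following is a minimal Python sketch, not part of CHAOS; the helper name parse_services is an illustrative assumption, and only the three CHAOS-local entries are reproduced.

```python
def parse_services(text):
    """Map (name, proto) -> port for lines such as 'tnp 3278/tcp # comment'."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        name, port_proto = line.split()[:2]
        port, proto = port_proto.split("/")
        table[(name, proto)] = int(port)
    return table


# The "Local CHAOS" entries from the listing above.
CHAOS_LOCAL = """
mgetm           2727/udp        # multicast get-m protocol
ugetm           2728/udp        # unicast get-m protocol
tnp             3278/tcp        # terrence-n-phillip protocol
"""

services = parse_services(CHAOS_LOCAL)
print(services[("tnp", "tcp")])
```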


Understanding CHAOS' tyd
The CHAOS auto discovery daemon is called "tyd" - pronounced "tie-dee" (like "tidy"). It is an implementation of the TNP protocol, through a brutally butchered omdiscd source framework; it is a single process/single thread binary that operates as both client and server in the TNP process. Tyd can be executed from /sbin/tyd on a running CHAOS node.

Tyd has two components, Terrence and Phillip. Terrence acts as a client, interrogating Phillip, the server. On the first node in a cluster, Tyd is executed without any parameters, forcing it to run without Terrence; we'll come back to Phillip later. On every other node, Tyd is executed with one parameter; an IP address for another node. Tyd can now tell Terrence where one Phillip is. Terrence connects to Phillip and, once they have met and exchanged their formal greeting, Terrence begins to interrogate Phillip.

Terrence is going to be busy for a little while. First, he retrieves the mosix map from the Phillip he was told about and updates the local CHAOS node with this map. Then he connects to every Phillip in the map (cluster), one by one, starting with the first (lowest numbered node-id) and moving through to the last. As he visits each Phillip, Terrence asks him to add the new CHAOS node to his own map. Phillip does this and returns the current total number of nodes to Terrence. If the total number of nodes that a given Phillip knows about is more than the total number of nodes Terrence is expecting, then Terrence has been passed by another Terrence; he aborts his cluster addition interrogation and starts again, asking the first node for a new map.

Phillip, relatively speaking, has a much easier job. Once Terrence has been successful in joining the cluster (map) he goes away, leaving a Phillip in place on the new CHAOS node. This Phillip is like any other: he holds a map of the entire cluster, is ready to serve maps to a passing Terrence, and adds new Terrences as they arrive.
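
The join procedure described above can be modelled in a few lines of Python. This is an in-memory simulation under stated assumptions, not tyd's code: the class and function names, the retry limit max_attempts, and the use of list order to stand in for node-id order are all illustrative. Running the joins one after another does not exercise the "passed by another Terrence" branch; it is there only to show where the restart would happen.

```python
class Phillip:
    """Server half of tyd: holds this node's copy of the cluster map."""

    def __init__(self, address, cluster_map=None):
        self.address = address
        self.map = list(cluster_map or [address])

    def get_map(self):                      # answers "map me"
        return list(self.map)

    def add(self, address):                 # answers "add me"
        if address not in self.map:
            self.map.append(address)
        return len(self.map)                # current total number of nodes


def terrence_join(new_address, known_address, phillips, max_attempts=10):
    """Join the cluster via one known node; return the new node's Phillip."""
    for _ in range(max_attempts):
        fetched = phillips[known_address].get_map()
        cluster = [a for a in fetched if a != new_address]
        expected = len(cluster) + 1         # the fetched map plus ourselves
        passed = False
        for address in cluster:             # list order stands in for node-id order
            if phillips[address].add(new_address) > expected:
                passed = True               # another Terrence got ahead: start over
                break
        if not passed:
            me = Phillip(new_address, cluster + [new_address])
            phillips[new_address] = me      # Terrence leaves, Phillip stays
            return me
    raise RuntimeError("could not join the cluster")


# The first node runs without Terrence; later nodes are told about any one node.
phillips = {"10.0.0.1": Phillip("10.0.0.1")}
terrence_join("10.0.0.2", "10.0.0.1", phillips)
terrence_join("10.0.0.3", "10.0.0.2", phillips)
print(sorted(phillips["10.0.0.1"].map))
```

After both joins, every Phillip in the simulation holds the same three-node map, which is the invariant the real protocol is trying to maintain.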

An ASCII diagram of the TNP communications can be seen here:

/*
 *    tnp proto
 *
 *        port 3278/tcp
 *
 *        hello
 *
 *          [t] -------> [p]  "let's look for treasure!"                (hello,)
 *          [t] <------- [p]  "yes, let's look for treasure!"           (hello.)
 *
 *
 *        interrogate - map
 *
 *          [t] -------> [p]  "aaaahahahaha .. spattered your face!"    (map me)
 *          [t] <------- [p]  [map_count][map data] - close
 *
 *
 *        interrogate - getpubkey
 *
 *          [t] -------> [p]  "get_ssh_pub_key"                         (add me)
 *          [t] <------- [p]  [rsa key ent]
 *          [t] -------> [p]  "get_ssh_pub_key_ack"
 *          [t] <------- [p]  [dsa key ent]
 *          [t] -------> [p]  "get_ssh_pub_key_ack"
 *
 *        interrogate - setpubkey
 *
 *          [t] -------> [p]  "set_ssh_pub_key"                         (add me)
 *          [t] <------- [p]  "set_ssh_pub_key_ack"
 *          [t] -------> [p]  [rsa key ent]
 *          [t] <------- [p]  "set_ssh_pub_key_ack"
 *          [t] -------> [p]  [dsa key ent]
 *          [t] <------- [p]  "set_ssh_pub_key_ack"
 *
 *        interrogate - add
 *
 *          [t] -------> [p]  "aaaahahahaha .. just kidding!"           (add me)
 *          [t] <------- [p]  [node_count] - close
 *
 *
 *        interrogate - del
 *
 *          [t] -------> [p]  "*phht* daaaahahahaha!"                   (del me)
 *          [t] <------- [p]  close
 *
 *
 *
 *    tyd help can be found with "tyd --help"
 * 
 */
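
As an illustration of the "hello" exchange in the diagram, here is a small Python sketch. The two greeting strings are taken from the diagram; everything else is an assumption, including the idea that each greeting arrives in a single read (a real TCP peer on port 3278 would need proper framing) and the use of a local socket pair in place of a network connection.

```python
import socket
import threading

HELLO = b"let's look for treasure!"             # [t] -> [p]
HELLO_REPLY = b"yes, let's look for treasure!"  # [p] -> [t]


def phillip_hello(conn):
    """Server side: answer a correct greeting, otherwise drop the peer."""
    if conn.recv(64) == HELLO:
        conn.sendall(HELLO_REPLY)
        return True
    conn.close()
    return False


def terrence_hello(conn):
    """Client side: send the greeting and check the reply."""
    conn.sendall(HELLO)
    return conn.recv(64) == HELLO_REPLY


# A connected local socket pair stands in for the TCP connection.
t_sock, p_sock = socket.socketpair()
server = threading.Thread(target=phillip_hello, args=(p_sock,))
server.start()
ok = terrence_hello(t_sock)
server.join()
t_sock.close()
p_sock.close()
print(ok)
```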
               


The TNP protocol was created to overcome the shortcomings of omdiscd. Omdiscd does an awesome job of dynamically configuring the kernel with a common cluster-wide openMosix map, provided your cluster is within a single VLAN or LAN segment. This segment boundary limitation is not intentional: the multicast traffic used to maintain the cluster configuration is only isolated by the routing infrastructure surrounding a given segment. However, after three days spent trying to get mrouted to bridge VLANs on a proxy-arp'd class-B network, it became evident that writing a unicast autodiscovery daemon would be far more productive.

At this stage tyd, like omdiscd, does not support node removal. Hooks have been coded, but the openMosix kernel was not accepting the node removal request; more investigation is required. There is also a limitation in node addition, based on cluster integrity: should a node go missing mid-process, the map chain will be broken and new nodes will fail to add correctly. This will be corrected in a coming version of tyd.

Understanding CHAOS' get-m
The CHAOS autodiscovery daemon, tyd, needed a little help. As mentioned above, tyd needs to know the IP address of just one node in the cluster (any node), so that the Terrence component of tyd can download the cluster map, and find all of the other nodes in the cluster. As you will see below, to specify this "master" node, you pass tyd the "-m" parameter.

This new daemon, get-m, pronounced "get em" (like a British "get them"), removes the need to manually key in an IP address for Terrence to retrieve a map from. Instead, get-m uses a multicast sonar to try to find any nodes that may already exist on this LAN segment. If it finds any, it records them in /var/run/get-m.info for tyd to find.
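
The "multicast sonar" idea can be sketched with a plain UDP socket. This is a rough Python illustration only: the port 2727/udp comes from the services listing, but the group address, the probe payload and the function names are hypothetical, and the format of /var/run/get-m.info is not documented here, so the sketch stops at collecting addresses.

```python
import socket

MGETM_PORT = 2727            # mgetm, from the services listing
GROUP = "239.255.0.1"        # hypothetical group address, not taken from CHAOS
PROBE = b"get-m?"            # hypothetical probe payload


def make_probe_socket(timeout=1.0):
    """UDP socket whose multicast packets stay on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.settimeout(timeout)
    return sock


def ping(sock, group=GROUP, port=MGETM_PORT):
    """Send one probe and collect the addresses of everything that answers."""
    found = []
    sock.sendto(PROBE, (group, port))
    try:
        while True:
            _, (addr, _port) = sock.recvfrom(512)
            if addr not in found:
                found.append(addr)
    except socket.timeout:
        pass                 # silence: nobody else is left to answer
    return found
```

A TTL of 1 is what keeps the probe from crossing routers, which matches the description of get-m as a helper for the local LAN segment, with unicast tyd covering multi-site clusters.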

Understanding CHAOS' init
The initd (or just plain "init") is the unix initialisation process. It is the first user space software to be executed after the kernel has loaded. Init's job is to literally initialise the operating environment. It does this by setting start-up kernel values, mounting file systems, loading server software, etc.

The CHAOS init daemon works very much like the traditional init (though far less gracefully).

At startup, each of the desired processes is initialised and launched. At shutdown (ctrl-alt-del or "kill 1") the init process gracefully brings the system down. Actually, the shutdown process is the more important of the two; during shutdown, init asks openMosix to expel all of the locally retained processes, so that no parcels are lost on reboot.

The real power of the CHAOS init won't be of interest to you unless you try to embed CHAOS, or try to customise it in some way that requires special deployment. For those who want to serve CHAOS to a network, or make their own CDs, etc, the CHAOS init supports a number of environment options that parameterise the startup characteristics of the operating environment. On the boot prompt/boot command line, after the init= specifier, you can add PARAM=value options, separated by spaces. Unfortunately the kernel will accept only a very limited number of these values (32 chars worth?), even though /proc/cmdline may reflect all of the options entered. A future version of init will probably rectify this issue by reading /proc/cmdline, downloading an options file from a tftp server, or doing both.
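
To make the PARAM=value convention concrete, here is a minimal Python sketch of how such options could be picked out of a command line string such as the contents of /proc/cmdline. It is not the CHAOS init's parser: the function name, the rule that option names are upper case, and the example command line (including the init path) are assumptions for illustration.

```python
def parse_init_params(cmdline):
    """Return the PARAM=value tokens that follow init= on a kernel command line."""
    params = {}
    seen_init = False
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "init":
            seen_init = True                 # options only count after init=
        elif seen_init and sep and key.isupper():
            params[key] = value
    return params


# Hypothetical command line; the option names are from the CHAOS init options.
example = "root=/dev/ram0 init=/sbin/init DHCP=1 TYD=2 MASTER=10.0.0.1"
print(parse_init_params(example))
```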

The init options are listed here:

/*
 *  ACPID               ACPID Start/Allow
 *              = 0             Don't
 *              = 1             Do
 *
 *  BOOT                Boot Type
 *              = 0             Unknown/ISO
 *              = 1             Network
 *              = 2             Local/Fixed Disk
 *
 *  DHCP                DHCP client Start/Allow
 *              = 0             Don't
 *              = 1             Do
 *
 *  EJECT               CDROM Find/Eject
 *              = 0             Don't
 *              = 1             Do
 *
 *  HTTPD               HTTPD Start/Allow
 *              = 0             Don't
 *              = 1             Do
 *              = 2             Do + Admin
 *
 *  IPFILT              Ipfilter Start/Allow
 *              = 0             Don't
 *              = 1             Do
 *
 *  IPSEC               IPSEC Start/Allow
 *              = 0             Don't
 *              = 1             Do
 *
 *  MASTER              Allocate discovery host (tyd)
 *              =               ip.ad.dr.ess
 *              =               host.na.me
 *
 *  OMAUTO              Autodiscovery Type
 *              = 0             Unknown/None
 *              = 1             Omdiscd
 *              = 2             Tyd
 *
 *  SETI                SETI Start/Allow
 *              = 0             Don't
 *              = 1             Do
 *
 *  SHELL               Boot to Interactive Shell
 *              = 0             Don't
 *              = 1             Do
 *
 *  SSHD                SSHD Start/Allow
 *              = 0             Don't
 *              = 1             Do
 *
 *  TYD                 TYD behaviour
 *              = 0             None/Off
 *              = 1             Slave
 *              = 2             Master
 *              = 3             PXE-Master
 *
 */
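
When preparing a boot configuration for many nodes, it can help to check a set of options against the table above before deploying it. The following Python sketch does only that; the value ranges are copied from the table, while the function name and the error wording are illustrative assumptions and say nothing about how init itself reacts to bad values.

```python
# Allowed numeric values, copied from the option table above.
ALLOWED = {
    "ACPID": {0, 1}, "BOOT": {0, 1, 2}, "DHCP": {0, 1}, "EJECT": {0, 1},
    "HTTPD": {0, 1, 2}, "IPFILT": {0, 1}, "IPSEC": {0, 1},
    "OMAUTO": {0, 1, 2}, "SETI": {0, 1}, "SHELL": {0, 1}, "SSHD": {0, 1},
    "TYD": {0, 1, 2, 3},
}


def check_options(options):
    """Return a list of problems found in a {name: value-string} mapping."""
    problems = []
    for name, value in options.items():
        if name == "MASTER":                 # free-form IP address or host name
            if not value:
                problems.append("MASTER needs an address or host name")
        elif name not in ALLOWED:
            problems.append(f"unknown option {name}")
        elif not value.isdigit() or int(value) not in ALLOWED[name]:
            problems.append(f"{name}={value} is out of range")
    return problems


print(check_options({"TYD": "2", "OMAUTO": "2", "HTTPD": "3"}))
```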


But why create a new init? In a dedicated task environment, the flexibility of a complete unix OS is simply not required. Removing that flexibility allows for focus in the distribution; CHAOS' init does only what is required to start the cluster environment, and does so without a myriad of shell scripts. This technique also makes it easier to integrate features comprehensively (such as PXE boot options).

CHAOS has stateful packet filtering
Yes - the Mosix/openMosix architecture is incredibly insecure, from vulnerabilities in the respective implementations all the way back to their insecure design. Since CHAOS 0.5, tyd has supported packet filtering using the Linux netfilter kernel-level stateful packet filtering code.
In CHAOS versions after 0.7, if cryptography is disabled, the filters allow only the node's http daemon and ty-daemon to be visible to the network beyond those nodes registered in the cluster. If cryptography is enabled, the filters allow only the node's http daemon, IPSEC's ESP protocol, and IKE's service port to be visible to the network beyond those nodes registered in the cluster. With cryptography enabled, all openMosix and tyd communications are required to route via the ipsec0 interface only.
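
The two filtering modes can be summarised as a small lookup. This Python sketch only restates the policy described above and is not the netfilter rule set CHAOS installs; the function name is illustrative, and mapping "ty-daemon" to the tnp port 3278/tcp is an assumption based on the services listing. ESP is IP protocol number 50.

```python
def visible_services(crypto_enabled):
    """Services reachable by hosts that are not registered cluster nodes."""
    if crypto_enabled:
        return [("http", "80/tcp"), ("ike", "500/udp"), ("esp", "IP protocol 50")]
    return [("http", "80/tcp"), ("tnp", "3278/tcp")]


for mode in (False, True):
    names = ", ".join(name for name, _ in visible_services(mode))
    print("IPSEC on: " if mode else "IPSEC off:", names)
```

Note that in neither mode are the openMosix ports themselves exposed to outsiders; with IPSEC on, even tyd is only reachable through the tunnels.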

CHAOS has cryptography - 3DES under IPSEC
From CHAOS 0.7, tyd has supported the layer-3 VPN standard - IPSEC - with a default encryption transform of 3DES (Triple DES). CHAOS uses pre-shared-keys (PSKs) to authenticate the tunnel.

The default value for the PSK is stored in /etc/ipsec.secrets. It is long and complicated, but it would still be most prudent to avoid using the default. Note that using the default will ensure that differing tyd versions will not successfully communicate within the same cluster. See also the following sections on compatibility for details on CHAOS nodes in non-CHAOS-homogeneous clusters.

NB: CHAOS applies the default Openswan encrypt, tunnel, compress and pfs transforms in a fully meshed topology ("n-1" tunnels per node). All tyd and openMosix communications are encrypted and encapsulated in the IPSEC tunnels.
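
To get a feel for how a full mesh scales, the tunnel counts can be worked out directly: each of n nodes keeps n-1 tunnels, and since every tunnel is shared by two nodes the cluster as a whole has n(n-1)/2. A short Python sketch (the function names and the sample cluster sizes are illustrative):

```python
def tunnels_per_node(n):
    """Each node keeps one tunnel to every other node."""
    return max(n - 1, 0)


def total_tunnels(n):
    """Every tunnel is shared by two nodes, so count each pair once."""
    return n * (n - 1) // 2


for n in (2, 10, 100):
    print(n, tunnels_per_node(n), total_tunnels(n))
```

The per-node figure grows linearly with cluster size while the total grows quadratically, which is worth keeping in mind for very large roll-outs.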
