Diskless Node

A diskless node (or diskless workstation) is a workstation or personal computer without disk drives, which employs network booting to load its operating system from a server. (A computer may also be said to act as a diskless node, if its disks are unused and network booting is used.)

Diskless nodes (or computers acting as such) are sometimes known as network computers or hybrid clients. Hybrid client may either just mean diskless node, or it may be used in a more particular sense to mean a diskless node which runs some, but not all, applications remotely, as in the thin client computing architecture.

Advantages of diskless nodes can include lower production cost, lower running costs, quieter operation, and manageability advantages (for example, centrally managed software installation).

In many universities and in some large organizations, PCs are used in a similar configuration, with some or all applications stored remotely but executed locally—again, for manageability reasons. However, these are not diskless nodes if they still boot from a local hard drive.

Distinction between diskless nodes and centralized computing

Diskless nodes process data, thus using their own CPU and RAM to run software, but do not store data persistently—that task is handed off to a server. This is distinct from thin clients, in which all significant processing happens remotely, on the server—the only software that runs on a thin client is the "thin" (i.e. relatively small and simple) client software, which handles simple input/output tasks to communicate with the user, such as drawing a dialog box on the display or waiting for user input.

A collective term encompassing both thin client computing, and its technological predecessor, text terminals (which are text-only), is centralized computing. Thin clients and text terminals can both require powerful central processing facilities in the servers, in order to perform all significant processing tasks for all of the clients.

Diskless nodes can be seen as a compromise between fat clients (such as ordinary personal computers) and centralized computing, using central storage for efficiency, but not requiring centralized processing, and making efficient use of the powerful processing power of even the slowest of contemporary CPUs, which would tend to sit idle for much of the time under the centralized computing model.

	Centralized computing or Thin client	Diskless node	Dataless node	Fat client
Local hard drives used for data	No	No	No	Yes
Local hard drives used for OS	No	No	Yes	Yes
Local general-purpose processing used	No	Yes	Yes	Yes

Principles of operation

The operating system (OS) for a diskless node is loaded from a server, using network booting. In some cases, removable storage may be used to initiate the bootstrap process, such as a USB flash drive, or other bootable media such as a floppy disk, CD or DVD. However, the firmware in many modern computers can be configured to locate a server and begin the bootup process automatically, without the need to insert bootable media.

For network auto-booting, the Preboot Execution Environment (PXE) or Bootstrap Protocol (BOOTP) network protocols are commonly used to find a server with files for booting the device. Standard full-size desktop PCs are able to be network-booted in this manner with an add-on network card that includes a UNDI boot ROM. Diskless network booting is commonly a built-in feature of desktop and laptop PCs intended for business use, since it can be used on an otherwise disk-booted standard desktop computer to remotely run diagnostics, to install software, or to apply a disk image to the local hard drive.

After the bootstrapping process has been initiated, as described above, bootstrapping will take place according to one of three main approaches.

In the first approach (used, for example, by the Linux Terminal Server Project), the kernel is loaded into memory and then the rest of the operating system is accessed via a network filesystem connection to the server. (A small RAM disk may be created to store temporary files locally.) This approach is sometimes called the "NFS root" technique when used with Linux or Unix client operating systems.

In the second approach, the kernel of the OS is loaded, and part of the system's memory is configured as a large RAM disk, and then the remainder of the OS image is fetched and loaded into the RAM disk. This is the implementation that Microsoft has chosen for its Windows XP embedded remote boot feature.

In the third approach, disk operations are virtualized and are actually translated into a network protocol. The data that are usually stored in a disk drive are then stored in virtual disks files homed on a server. The disk operations such as requests to read/write disk sectors are translated into corresponding network requests and processed by a service or daemon running on the server side. This is the implementation that is used by Neoware Image Manager, Ardence, VHD and various "boot over iSCSI" products. This third approach differs from the first approach because what is remote is not a file system but actually a disk device (or raw device) and that the client OS is not aware that it is not running off a hard disk. This is why this approach is sometimes named "Virtual Hard Disk" or "Network Virtual Disk".

This third approach makes it easier to use client OS than having a complete disk image in RAM or using a read-only file system. In this approach, the system uses some "write cache" that stores every data that a diskless node has written. This write cache is usually a file, stored on a server (or on the client storage if any). It can also be a portion of the client RAM. This write cache can be persistent or volatile. When volatile, all the data that has been written by a specific client to the virtual disk are dismissed when said client is rebooted, and yet, user data can remain persistent if recorded in user (roaming) profiles or home folders (that are stored on remote servers). The two major commercial products (the one from Hewlett-Packard, and the other one from Citrix Systems) that allow the deployment of Diskless Nodes that can boot Microsoft Windows or Linux client OS use such write caches. The Citrix product cannot use persistent write cache, but VHD and HP product can.

Diskless Windows nodes

Windows 3.x and Windows 95 OSR1 supported Remote Boot operations, from NetWare servers, Windows NT Servers and even DEC Pathworks servers.

Third party software Vendors such as Qualystem (acquired by Neoware), LanWorks (acquired by 3Com), Ardence (acquired by Citrix), APCT and Xtreamining Technology have developed and marketed software products aimed to remote-boot newer versions of the Windows product line: Windows 95 OSR2 and Windows 98 were supported by Qualystem and Lanworks, Windows NT was supported by APCT and Ardence (called VenturCom at that time), and Windows 2000/XP/2003/Vista/Windows 7 are supported by Hewlett Packard (which acquired Neoware which had previously acquired Qualystem) and Citrix Systems (which acquired Ardence).

Comparison with fat clients

Software installation and maintenance

With essentially a single OS image for an array of machines (with perhaps some customizations for differences in hardware configurations among the nodes), installing software and maintaining installed software can be more efficient. Furthermore, any system changes made during operation (due to user action, worms, viruses, etc.) can be either wiped out when the power is removed (if the image is copied to a local RAM disk) such as Windows XP Embedded remote boot or prohibited entirely (if the image is a network filesystem). This allows use in public access areas (such as libraries) or in schools etc., where users might wish to experiment or attempt to "hack" the system.

However, it is not necessary to implement network booting to achieve either of the above advantages - ordinary PCs (with the help of appropriate software) can be configured to download and reinstall their operating systems on (e.g.) a nightly basis, with extra work compared to using shared disk image that diskless nodes boot off.

Modern diskless nodes can share the very same disk image, using a 1:N relationship (1 disk image used simultaneously by N diskless nodes). This makes it very easy to install and maintain software applications: The administrator needs to install or maintain the application only once, and the clients can get the new application as soon as they boot off the updated image. Disk image sharing is made possible because they use the write cache: No client competes for any writing in a shared disk image, because each client writes to its own cache.

All the modern diskless nodes systems can also use a 1:1 Client-to-DiskImage relationship, where one client "owns" one disk image and writes directly into said disk image. No write cache is used then.

Making a modification in a shared disk image is usually made this way:

The administrator makes a copy of the shared disk image that he/she wants to update (this can be done easily because the disk image file is opened only for reading)
The administrator boots a diskless node in 1:1 mode (unshared mode) from the copy of the disk image he/she just made
The administrator makes any modification to the disk image (for instance install a new software application, apply patches or hotfixes)
The administrator shutdowns the diskless node that was using the disk image in 1:1 mode
The administrator shares the modified disk image
The diskless nodes use the shared disk image (1:N) as soon as they are rebooted.

Centralized storage

The use of central disk storage also makes more efficient use of disk storage. This can cut storage costs, freeing up capital to invest in more reliable, modern storage technologies, such as RAID arrays which support redundant operation, and storage area networks which allow hot-adding of storage without any interruption. Further, it means that losses of disk drives to mechanical or electrical failure—which are statistically highly probable events over a timeframe of years, with a large number of disks involved—are often both less likely to happen (because there are typically fewer disk drives that can fail) and less likely to cause interruption (because they would likely be part of RAID arrays). This also means that the nodes themselves are less likely to have hardware failures than fat clients.

Diskless nodes share these advantages with thin clients.

Performance of centralized storage

However, this storage efficiency can come at a price. As often happens in computing, increased storage efficiency sometimes comes at the price of decreased performance.

Large numbers of nodes making demands on the same server simultaneously can slow down everyone's experience. However, this can be mitigated by installing large amounts of RAM on the server (which speeds up read operations by improving caching performance), by adding more servers (which distributes the I/O workload), or by adding more disks to a RAID array (which distributes the physical I/O workload). In any case this is also a problem which can affect any client-server network to some extent, since, of course, fat clients also use servers to store user data.

Indeed, user data may be much more significant in size and may be accessed far more frequently than operating systems and programs in some environments, so moving to a diskless model will not necessarily cause a noticeable degradation in performance.

Greater network bandwidth (i.e. capacity) will also be used in a diskless model, compared to a fat client model. This does not necessarily mean that a higher capacity network infrastructure will need to be installed—it could simply mean that a higher proportion of the existing network capacity will be used.

Finally, the combination of network data transfer latencies (physically transferring the data over the network) and contention latencies (waiting for the server to process other nodes' requests before yours) can lead to an unacceptable degradation in performance compared to using local drives, depending on the nature of the application and the capacity of the network infrastructure and the server.

Other advantages

Another example of a situation where a diskless node would be useful is in a possibly hazardous environment where computers are likely to be damaged or destroyed, thus making the need for inexpensive nodes, and minimal hardware a benefit. Again, thin clients can also be used here.

Diskless machines may also consume little power and make little noise, which implies potential environmental benefits and makes them ideal for some computer cluster applications.

Comparison with thin clients

Major corporations tend to instead implement thin clients (using Microsoft Windows Terminal Server or other such software), since much lower specification hardware can be used for the client (which essentially acts as a simple "window" into the central server which is actually running the users operating system as a login session). Of course, diskless nodes can also be used as thin clients. Moreover, thin client computers are increasing in power to the point where they are becoming suitable as fully-fledged diskless workstations for some applications.

Both thin client and diskless node architectures employ diskless clients which have advantages over fat clients (see above), but differ with regard to the location of processing.

Advantages of diskless nodes over thin clients

Distributed load The processing load of diskless nodes is distributed. Each user gets its own processing isolated environment, barely affecting other users in the network, as long as their workload is not filesystem-intensive. Thin clients rely on the central server for the processing and thus require a fast server. When the central server is busy and slow, both kinds of clients will be affected, but thin clients will be slowed down completely, whereas diskless nodes will only be slowed down when accessing data on the server.
Better multimedia performance. Diskless nodes have advantages over thin clients in multimedia-rich applications that would be bandwidth intensive if fully served. For example, diskless nodes are well suited for video gaming.
Peripheral support Diskless nodes are typically ordinary personal computers or workstations with no hard drives supplied, which means the usual large variety of peripherals can be added. By contrast, thin clients are typically very small, sealed boxes with no possibility for internal expansion, and limited or non-existent possibility for external expansion. Even if e.g. a USB device can be physically attached to a thin client, the thin client software might not support peripherals beyond the basic input and output devices - for example, it may not be compatible with graphics tablets, digital cameras or scanners.

Advantages of thin clients over diskless nodes

The hardware is cheaper on thin clients, since processing requirements on the client are minimal, and 3D acceleration and elaborate audio support are not usually provided. Of course, a diskless node can also be purchased with a cheap CPU and minimal multimedia support, if suitable. Thus, cost savings may be smaller than they first appear for some organizations. However, many large organizations habitually buy hardware with a higher than necessary specification to meet the needs of particular applications and uses, or to ensure future proofing (see below). There are also less "rational" reasons for overspecifying hardware which quite often come into play: departments wastefully using up budgets in order to retain their current budget levels for next year; and uncertainty about the future, or lack of technical knowledge, or lack of care and attention, when choosing PC specifications. Taking all these factors into account, thin clients may bring the most substantial savings, as only the servers are likely to be substantially "gold-plated" and/or "future-proofed" in the thin client model.

Future proofing is not much of an issue for thin clients, which are likely to remain useful for the entirety of their replacement cycle - one to four years, or even longer - as the burden is on the servers. There are issues when it comes to diskless nodes, as the processing load is potentially much higher, thus meaning more consideration is required when purchasing. Thin client networks may require significantly more powerful servers in the future, whereas a diskless nodes network may in future need a server upgrade, a client upgrade, or both.

Thin client networks have less network bandwidth consumption potentially, since much data is simply read by the server and processed there, and only transferred to the client in small pieces, as and when needed for display. Also, transferring graphical data to the display is usually more suited for efficient data compression and optimisation technologies (see e.g. NX technology) than transferring arbitrary programs, or user data. In many typical application scenarios, both total bandwidth consumption and "burst" consumption would be expected to be less for an efficient thin client, than for a diskless node.

Source: Wikipedia