Saturday, February 28, 2009
HP Systems Insight Manager simplified
Modern servers for large organizations come in racks. These racks can have 5 extremely powerful and expandable servers, 42 of the thinnest servers, or even up to 128 server blades. When these computers are set up and while they're running there's no keyboard, monitor or mouse connected to them. The physical installation and the software installation and configuration are handled by completely different people. An essential piece that makes this work is a built-in system manager.
For HP servers the built-in system manager is called Integrated Lights-Out, or iLO. It's a dedicated computer inside the server that runs on standby power, so it's on even when the server is off. It's accessible through its own network port or a shared one. It has access to all of the server's health monitoring systems, the keyboard, mouse, USB and video, and can even flash the BIOS. Older servers have basic iLO. More recent servers have iLO 2. Both versions can be upgraded with a license key in the CMOS settings that enables more advanced features like the graphical remote console and virtual media.
To make working with many thousands of systems manageable, you need a coordinated system that allows you to monitor and perform operations on servers, groups of servers, or entire datacenters. That's where HP Systems Insight Manager comes in.
Systems Insight Manager (SIM) consists of a web server, database and set of utilities. Although HP doesn't make a strong point of it, the machine that hosts this service is called the Central Management Server (CMS). The web server provides a single integrated view of all the servers. It does this by presenting a visual representation of the servers themselves. It can detect servers on the network, or you can tell it where they are. Once they're configured, servers are available in the interface, monitored continuously and managed. You can actually look at the picture of the rack on the web server and see the lights blinking. It's integrated with the built-in management hardware of the servers, so by selecting a server you can perform many different operations. You can power the server on and off, turn on the blue service ID light, configure BIOS settings, flash the BIOS or even install an operating system. Using the remote console you can watch the machine boot as if you were in front of it with a keyboard, mouse and monitor, and use whatever graphical interface you install too, so configuring Windows or Linux doesn't require a trip to the server room. These features work outside the operating system using dedicated hardware inside the server that runs on standby power, so they work even when the server is turned off. Because it's web based, it can be made available anywhere on your network or anywhere in the world. Systems Insight Manager runs on its own server, and can be downloaded for free.
HP offers some software packages for sale, and provides some with each Proliant server. These software packages help in running, managing and configuring an individual server. They all also plug into Systems Insight Manager, enabling various features from the higher level view.
The Proliant Support Pack that you get with each server includes a suite of drivers and software for supported operating systems, including a service called the System Management Home page. This service runs in the operating system, presents a web based interface, and allows you to access all of the built in system monitors and the system management hardware in each server.
Systems Insight Manager detects this interface and installs a button to monitor each system's System Management home page. Included in the Proliant Support Pack is also a CD package called "Smart Start" that allows for the remote installation of an operating system for a single server, and it has a scripting toolkit for scripted installations. There's also an array configuration utility (ACU) that allows you to configure locally attached hard drives using HP RAID controllers. A diagnostic suite is also included, which allows you to perform certain diagnostic tests outside of the operating system (offline edition) or inside of it (online edition).
Insight Control Environment is available at additional cost. It includes all of these modules, some of which are also available separately:
Rapid Deployment Pack allows you to build and configure system images and stream them to individual servers or groups of servers.
Virtual Machine Management Pack assists with virtual machines.
Vulnerability and Patch Management lets you set up repositories for patches and automatically deploy them.
Insight Power Manager allows you to monitor and control power usage per server and by groups of servers.
In addition to HP servers, Systems Insight Manager can also monitor other devices that use SNMP, the (formerly) Simple Network Management Protocol, which is supported by almost all modern network devices.
Other vendors have similar systems for managing their servers. These power tools for server administration help reduce costs and enable fewer server administrators to manage far more servers.
Wednesday, February 11, 2009
Layer 2 networking - Simplified - First in series
In this first installment I'm going to cover some basic terms and describe the basic equipment. In the second installment we'll go over an example network and hopefully tie together how the things work. If I get to a third segment I should be able to step up to the next level of our network model and tie some networks together.
The real purpose of this article isn't to educate you - it's to cement these ideas in my mind in a way that is accessible to people I talk to on a daily basis. If you find this useful you're welcome to copy it in any way you like. I hereby dedicate it to the public domain.
Terminology
Technology
The addressing scheme
Reliably unreliable
The packet
Your network card
The Hub, extender and bridge
The switch
VLANs
QOS
Trunking
Routers and other gateways
Terminology
Layer 2 refers to the second layer of the 7 layer OSI networking model. Although there are other models that describe network architecture the OSI model is the accepted standard for most people. Layer two is the level that defines a "network". Below this level are devices and media. Above this level are internets and intranets. This topic is a network. We'll cover for completeness virtual networks and touch on routing between virtual networks because these are issues that are dealt with on this OSI networking level.
IEEE 802.3 is the name of the working group that standardized Ethernet and documented the standards that are still in use today, though most of the technology was first invented by Robert Metcalfe.
IEEE 802.11 is the name of the working group that adopts standards for wireless Ethernet.
There are other ways to do networking than Ethernet. They're all odd and/or dead, so I won't cover them here.
An octet is 8 bits. The term byte is technically the size of word that the information processing system can handle, but let's not be pedantic. For the purpose of this article a byte is an octet is 8 bits and is represented by eight binary digits, two hexadecimal digits or a value from 0-255.
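As a quick illustration of the three notations mentioned above, here is a minimal sketch that shows one octet as binary, hexadecimal and decimal. The value 0xAB is an arbitrary example chosen for illustration, not something from the article.

```python
# One octet shown three ways: binary, hexadecimal and decimal.
value = 0xAB  # arbitrary example value (an assumption, not from the article)

binary = format(value, "08b")   # eight binary digits
hexa = format(value, "02x")     # two hexadecimal digits

print(binary, hexa, value)
```

Any value from 0 to 255 fits in exactly eight binary digits or two hexadecimal digits, which is why MAC addresses are so often written as pairs of hex digits.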
Packets, datagrams and frames are not quite the same things. Despite this the terms here will be used interchangeably to refer both to the information being passed (data) and the control information that describes that information and how to get it to where it's going (header). The purpose of this is to make the information more accessible. If you can't deal with this please cite somebody else. Communication is not well served by excess precision.
Technology
We'll be discussing wired Ethernet over copper. For most of the material wireless networking is similar, but hopefully I'll find time to write them up in detail another time. For now the problem is big enough so I'll stick with wired networks using Cat 5e or better media. Fiber is an important part of modern networking, but fiber networking at layer two is similar enough that I can probably avoid discussing the differences. There are other ways to do networking, but either they're of historical interest or special purpose use only.
The network under discussion here will be only a single Local Area Network and the discussion will end at the first router we come to. Once a router re-addresses your data, it's no longer on the same network and passes beyond this topic. The only exception is when we get to VLANs, for which a cursory discussion of routing is necessary, since VLANs are common parts of modern networks and appropriate for discussion at layer 2.
The addressing scheme
For layer two ethernet we have a special sublayer, the Media Access Control or MAC layer, that deals with addressing. The rules are pretty simple. A MAC address uniquely identifies a particular access device that will receive packets. A MAC address is typically 6 bytes, or 48 bits. MAC addresses are usually written as pairs of hexadecimal digits, in the order of transmission, such as 01-02-03-0a-0b-0c or 01:02:03:0a:0b:0c. In both of these cases the first half is referred to as the organizationally unique identifier (OUI) and the second half is the network interface controller (NIC) specific ID. The original purpose for this was to allow network controller vendors to identify their products in the MAC address and still leave a way for each NIC on a LAN to have a unique ID. Since this is 30 years later, you can probably anticipate that we've run out of numbers and individual vendors have multiple OUIs, and MACs are no longer deliberately unique. That's OK, though, because these days the MAC address is a configurable part of the NIC, so if you have two with the same number (address collision) you can fix it.
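The split into OUI and NIC-specific halves can be sketched in a few lines. This is only an illustration: the function name parse_mac is made up for this example, and it accepts just the two written forms shown above.

```python
def parse_mac(text):
    """Split a MAC address written with '-' or ':' into (oui, nic) byte strings."""
    digits = text.replace("-", "").replace(":", "")
    raw = bytes.fromhex(digits)          # raises ValueError on non-hex input
    if len(raw) != 6:
        raise ValueError("expected 6 bytes, got %d" % len(raw))
    return raw[:3], raw[3:]              # first half is the OUI, second the NIC ID


oui, nic = parse_mac("01-02-03-0a-0b-0c")
print(oui.hex(), nic.hex())
```

Both written forms of the example address give the same result, since the separators are only there for the human reader.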
Reliably unreliable
It's counter-intuitive, but it works. On the ethernet at layer 2, the system is deliberately unreliable. Errors are detected using the checksum described below, but there is no error correction and no retransmission, so a damaged packet is simply dropped. The ethernet delivers packets on a "best effort" basis. Unexpected packets received on a port are ignored. Packets to unknown hosts are just discarded. At Layer 4 we get systems that handle detecting if communication was successful, but the equipment at layer 2 literally doesn't care. Reliable methods have been tried, but failed to keep up with the speed of Ethernet and were ultimately discarded or pushed into specialized applications.
The packet
The packet consists of a header and your data. If you have no interest in programming or network analysis you can safely skip the rest of this part. The header starts with a preamble that identifies the packet as an ethernet frame. It's 7 bytes, each with the value 10101010. This is the signal that lets the receiver know there's data coming down the wire. It's followed by the start of frame indicator, which is a single byte with the value 10101011. Then comes the MAC destination address, then the MAC source address. The next field is rather tricky.
An optional field, the 802.1Q tag, goes here. If present, its first two bytes are 0x8100, a value chosen because it is not valid in this position for other packet formats. This is called the Tag Protocol Identifier (TPID). If this value is present then an additional two-byte field called the Tag Control Information (TCI) follows. The TCI identifies VLAN and QOS, and will be described later.
Next comes the Ethertype field. This is two bytes. In the original 802.3 format it holds the length of the data in bytes, so values of 1500 or less are a length. Values of 1536 (0x0600) and above instead identify the protocol carried in the data, for example 0x0800 for IPv4.
Next comes the data, which can be between 46 and 1500 bytes.
Last comes a 4 byte field, which represents the result of performing an error detection algorithm called CRC-32 on the rest of the packet. The sender computes this value when sending the packet and adds it to the end. When the receiver performs the calculation on the received packet it's exceedingly unlikely that the computation will result in the CRC data unless the packet was transmitted correctly.
And that's all. If a packet is less than 64 bytes, which isn't allowed given the required data above, it's called a runt and discarded.
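The layout above can be sketched in code. This is a minimal illustration, not a real network driver. It leaves out the preamble and start of frame indicator (the hardware normally adds those), it leaves out the optional tag, and it assumes that zlib's CRC-32 matches the Ethernet checksum when the result is appended low byte first. The function names and example addresses are made up.

```python
import struct
import zlib


def build_frame(dst, src, ethertype, payload):
    """Assemble destination, source, type, padded data and CRC (no preamble)."""
    if len(payload) > 1500:
        raise ValueError("data field is limited to 1500 bytes")
    payload = payload.ljust(46, b"\x00")            # pad short data to 46 bytes
    body = dst + src + struct.pack("!H", ethertype) + payload
    fcs = struct.pack("<I", zlib.crc32(body))       # CRC-32, low byte first
    return body + fcs


def check_frame(frame):
    """Return True when the frame is not a runt and its CRC matches."""
    body, fcs = frame[:-4], frame[-4:]
    return len(frame) >= 64 and struct.pack("<I", zlib.crc32(body)) == fcs


frame = build_frame(bytes.fromhex("0102030a0b0c"),   # destination (example)
                    bytes.fromhex("0102030a0b0d"),   # source (example)
                    0x0800, b"hello")
print(len(frame), check_frame(frame))
```

Five bytes of data get padded to 46, which with two 6-byte addresses, the 2-byte type and the 4-byte CRC gives the 64-byte minimum. Flipping any single bit in the frame makes check_frame return False, which is the point at which a receiver would discard it.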
Your network card
I'm writing this in February of 2009. Current technology is gigabit ethernet, which is probably the same capacity you plug it into. Your network interface controller (NIC) allows your computer to connect to the physical cable and communicate with the network. If you have a laptop it almost certainly has a network port you can connect to the network. Network administration is a vast and variable field. Some networks only allow connections by known systems, or have other restrictions. We are not going to cover those issues here. It's assumed the typical permissive network found in business or homes is used.
Your NIC plugs into a cable with four twisted pairs of wires, and from there to a switch, possibly with a wall jack and premises wiring in between. Since premises wiring is just simple copper connections that extend the wires we'll ignore it here.
Until your NIC has a physical connection to another device and they've worked out between them how to communicate, you're not "on the network".
Your NIC or your switch or both might only be capable of 100 million bits per second (100Mbps, or fast ethernet). You might be connected directly to another PC's network card, which is "technically" a network, but we won't discuss this odd case. Whether you need normal cable, called a "patch" or "straight-through" cable, or a special cable that reverses the send and receive signals, called a "crossover" cable, depends on a number of factors. Most NICs and switches these days have a feature called "Auto MDI-X" that straightens out these issues. Switches and network cards can also discover between them which speeds each supports and automatically use the fastest speed they share. The only trap here is that the cable standards for modern networking are very strict. If both the sender and receiver are capable of faster communication than the wire between them, they will suffer a horrible connection. If this happens to you, throw out the old cable and get a new one. They're cheap.
Almost all computers these days come with at least one gigabit ethernet port, but they're not all the same. A high end ethernet controller is a microcomputer in itself and handles almost all aspects of the communication. Built in controllers often use the processor to calculate the checksum and for various other things, and system memory to hold the packets during processing. Built in controllers are getting better these days though and processors are powerful enough to handle this so you don't have to worry about that too much unless your needs are pretty extreme - and then you wouldn't be reading this anyway.
Now look: gigabit isn't currently the top of the networking food chain. It's not even close. Unlike other IT infrastructure, networking usually progresses by 10s. The previous generation was 100 million bits per second. The current standard is 1 billion bits per second, or 1 gigabit. 10 gigabit ethernet is now widely available, and 100 gigabit is in development. There are bizarre unrelated networking protocols like Infiniband. You don't need to worry about that right now. Today gigabit ethernet is where it's at, and it's more than enough for most of the stuff you want to do if you're my target audience.
The link
The link is shorthand for the successfully connected physical medium that data passes over.
The Hub, extender
These devices are historical oddities. If you find one, throw it away and replace it with a switch. If you don't know what these are, don't worry. You don't need to know about this. You don't want to try it.
The switch
Although some people are trying to get this named a "network bridge", its common name is "switch". This is the key piece of equipment we'll be talking about. Switches come in many varieties and capabilities and can cost more than a half million dollars on the high end or less than 50 dollars on the low. Some switches are capable of performing "routing" at OSI model layer 3, but we won't discuss this here - we'll only consider layer 2 switching, which all switches use. The switch receives the packet from your NIC and notes which port your MAC address lives on. If it has already learned which port the destination address is reachable through, whether that's a directly connected NIC or another switch, it forwards the packet out that port only. Otherwise the switch forwards the packet out all of its ports except the one it was received on, or drops it, depending on the switch configuration. When switches are connected to each other with redundant links, a facility like "Spanning Tree Protocol" disables the links that would form a loop, so that these flooded packets don't circulate forever.
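The forwarding decision can be sketched as a table lookup. This is a simplified model and not any vendor's implementation. It assumes the switch learns where each NIC is by recording the source address of every packet it receives, which is how typical layer 2 switches behave, and it ignores VLANs and spanning tree. The class name and the short address strings are made up for the example.

```python
class LearningSwitch:
    """Toy model of layer 2 forwarding on a single switch."""

    def __init__(self, port_count):
        self.ports = list(range(port_count))
        self.table = {}                      # MAC address -> port last seen on

    def receive(self, in_port, src, dst):
        """Return the list of ports the packet is forwarded out of."""
        self.table[src] = in_port            # learn where the sender lives
        if dst in self.table:
            out = self.table[dst]
            # A destination on the arrival port needs no forwarding at all.
            return [] if out == in_port else [out]
        # Unknown destination: send out every port except the arrival port.
        return [p for p in self.ports if p != in_port]


switch = LearningSwitch(4)
print(switch.receive(0, "aa", "bb"))   # "bb" not learned yet, so flood
print(switch.receive(1, "bb", "aa"))   # "aa" was learned on port 0
print(switch.receive(0, "aa", "bb"))   # "bb" is now known on port 1
```

The first packet goes out every other port because the switch has never heard from "bb". Once "bb" replies, later traffic between the two stays on ports 0 and 1 only.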
Managed switch
An unmanaged switch doesn't do QOS. It doesn't do VLANs. It probably doesn't do spanning tree. It doesn't have storable and recoverable configurations. Since managed switches start at under $200 for an 8 port gigabit switch these days, get a managed switch unless you know why you don't need one.
VLANs
Earlier we discussed the 802.1Q part of the packet header. In addition to QOS this field has 12 bits to designate the "virtual local area network". When both ends of a link are capable of 802.1Q, and are configured to use it, up to 4094 VLANs are possible (12 bits give 4096 values, two of which are reserved). In practice not all switches are capable of using any, and some only support a limited number. In most cases only servers access more than one VLAN on a single link.
So what's a VLAN? Inasmuch as a LAN is a physical network, a "virtual LAN" is some subset of the physical network. By applying a number to the VLAN it's possible to do a number of useful things. You can separate communication between servers and equipment based on role, and change the relationships in the switch software without rerouting the physical wires in the walls. This allows the network administrator to assign the accounting department to their own network, for example, so that the sales department can't inadvertently access PCs in the accounting department. They can also screw up this configuration so that an attentive user can access all VLANs, by leaving all VLANs and QOS configured on the user's port by default.
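Here is a small sketch of how the tag bits fit together, assuming the standard 802.1Q layout: 0x8100, then 3 bits of priority, 1 bit that marks a frame as eligible to be dropped, and 12 bits of VLAN number. The function names are made up for the example, and the values (priority 2, VLAN 90) are borrowed from the home video example in the QOS section.

```python
import struct


def pack_tag(priority, vlan_id, dei=0):
    """Build the 4-byte tag: 0x8100, then priority(3) / DEI(1) / VLAN(12)."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN number must fit in 12 bits")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", 0x8100, tci)   # network byte order


def unpack_tag(tag):
    """Return (priority, dei, vlan_id) from a 4-byte tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != 0x8100:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13, (tci >> 12) & 1, tci & 0x0FFF


tag = pack_tag(priority=2, vlan_id=90)
print(tag.hex(), unpack_tag(tag))
```

The 3 priority bits are what give QOS its 8 bins, and the 12 VLAN bits are what cap the number of VLANs on a link.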
A port on a switch can be dedicated to a particular VLAN, and then all traffic received on that port from the end user will belong to that VLAN. If the person at that network port moves to another desk on another floor, it's possible to restrict his access only to the network resources that are appropriate for him. Inside the network the VLANs share physical links, but switches will not pass information from one VLAN to another. In order to get a packet from one VLAN to another, a router is required.
One trick with VLANs is that you can have two sets of switches that support, say, VLAN 11, with unmanaged switches, or switches or ports configured to not pass VLAN 11, between them. In this case these two VLANs, though they share a VLAN number and physical connections, are isolated from each other. Spanning tree protocol can wind up blocking the transfer of packets on a particular VLAN if configured incorrectly.
In addition, a LAN is a broadcast domain. Layer 2 networking contains a facility for sending one packet to all receivers on all ports on all switches on that network. Having too many users in a broadcast domain increases the likelihood one of them will go crazy and create a "broadcast storm". By segregating subsets of customers in VLANs, it's possible to limit the scope of such a malfunction.
QOS
QOS is about traffic priority. If you're doing VOIP or streaming video on your network and you require a connection that doesn't stutter then you probably need QOS.
One problem we get into here is that the QOS standard for networking, 802.1p, is differently implemented by various networking equipment vendors. They've all got whiz-bang features that justify their proprietary features. After all, the standard is only 15 years old. It specifies 8 priority "bins". How it's implemented is not specified and left to implementation.
Most switching equipment vendors allow users to reserve a minimum percentage of a link for a particular bin. If no traffic is in that bin the bandwidth is available for other traffic, but if a stream in that bin appears on the link then it's permitted to consume up to the reserved percentage without hindrance by other traffic on the line. When the communication passes through a link that doesn't support this, the tags are lost, so QOS delivery is limited to the segments of the network that directly support it.
How you would use it at home, for example, is that you have a switch that supports QOS, a video server with your home movies, and a mythTV box that you watch movies on. Naturally if your spouse is downstairs remastering the video of the family Christmas event on your file server you don't want that to degrade your viewing experience of Office Space. So you configure the video server with a QOS of 2 on your video VLAN, VLAN 90. Then you tell the gigabit switch that the port to your mythTV box is VLAN 90 and that the QOS for bin 2 is 20%. Magically your mythTV box has a minimum of 20% of its link for video. This oversimplified example skips the part where you need at least two switches before this is useful.
Trunking
This is more of a business thing. There are two types of "trunking". The first is where you use one link to pass multiple VLANs. The second is where you use multiple individual links between two switches to increase the bandwidth between them. We're not going to worry about this right now.
Routers and other gateways
When traffic leaves the LAN it must pass through a gateway to an off-network device or network. For the purposes of this topic a router or gateway is just another computer. When we get to connecting VLANs together I'll cover this a little bit, but not a lot.
The main discussion.
Whew! That was a lot of background. I don't know about you, but I'm glad it's over. Let's do some network engineering now in another post.
Sunday, July 13, 2008
Linux GIS
There is a project with a long history in Linux that does this -- it's called GRASS. It was chosen for three projects in the 2008 Google Summer of Code. It's an active project with a long history and many users so it's likely to be around for quite a while longer, and it's licensed under the GNU GPL so price isn't an issue.
GRASS is pretty feature rich. GIS systems are always complex beasts, as the various methods of storing, converting and visualizing geographic data are all rich fields with long histories and plenty of room for varying preferences. This system allows GIS data to be stored in any of the common databases including MS Access, MySQL, PostgreSQL, MS-SQL Server, Oracle, dBASE and others as well as various common formats or flat files. It can use live files created for and by ESRI's ArcGIS, which is the most common commercial GIS program.
With the next version of GRASS a native Windows build will be available. For now the Windows version of the application is built under Cygwin.
Like many GPL licensed applications, GRASS has been included in a number of packages called distributions. A distribution bundles the Linux operating system and all of the usual applications with a suite of complementary applications and related tools aimed at a particular audience. ArcheOS is an example of one that's targeted to archeologists and provides GRASS and related tools as well as a rich set of new toys to play with. I'll be using ArcheOS to set up a workstation system with GRASS. As of the current version (2.0.0) ArcheOS comes as a 1.2GB .iso file to burn to DVD for live DVD use or to install, and includes version 6.2.3 (the most current stable release) of GRASS.
Anyway, give GRASS a try and tell me what you think.
Friday, July 04, 2008
LTSP configuration (Gutsy) - Episode 2
We're going to use it to help clone a bunch of Windows XP computers.
In our first episode we built an LTSP server. If you haven't read that article yet or you don't have a good LTSP server (not in production!) to work with it would be good to go back there and follow the steps so you are better able to follow along.
There are several steps to perform here. We have to select an imaging platform that's bootable with ltsp. It has to not copy the whole drive -- just the blocks that have data in them. It has to be reasonably fast. We have to select a method of getting the large image to the clients -- probably file sharing but possibly multicasting. We have to tie it all together. One advantage we have is that we have a large number of machines lying around with 40GB drives and gigabit ethernet to use as servers. They're surplus from a prior installation.
To get the image size down we need to use a project that uses ntfsclone, since that's the project that knows about the contents of NTFS formats and can copy only the blocks that have data. We need a project that uses ntfsclone and works with ltsp but allows us some flexibility in how we use it. I chose clonezilla. This project is a subproject of Diskless Remote Boot in Linux (DRBL), which is a similar project to ltsp. DRBL and clonezilla are projects of Taiwan's National Center for High-Performance Computing. It has handy installers, comes in bootable CD and pendrive formats, and a version is available for network booting. Although they claim to support multicasting the process is as yet unwieldy so we'll be using the ltsp server as a dhcp and file server, and clone multiple servers to meet our bandwidth requirements. Since we're using Ubuntu for the ltsp server, I decided to go with Clonezilla Live Experimental (Hardy). Download the .iso and burn it to a CD.
Before we put a lot of work into making it netbootable I should probably validate that it makes a good copy in a reasonable period of time. I'll be using a recently imaged old laptop that won't be a disaster if I mangle its image, just in case clonezilla does not work as advertised. My actual target clients are dual core notebooks with faster hard drives but the image is also five times the size. I'm looking for scalability on the server (serving many clients simultaneously) and on the network.
Boot the clonezilla CD from a client you would like to clone that's connected via network to your ltsp localnet.
Choose the boot to RAM option, because we're going to run from RAM when we PXE boot. After some text scrolls past you're asked for a language and keymap.
We'll choose the English version and not touch the keymaps.
Start clonezilla
device image disk/partition to/from image
We'll use the ssh server because we don't have samba or nfs set up yet on our ltsp server.
It will automatically detect our nic and network
DHCP is set up so we'll use that to get our address.
It detects our server and offers it as the default.
Port 22 is the default for ssh.
The default account is root. We don't allow remote access from a root account so we'll use an ordinary user account instead.
Here we select a directory on the host. This is a good time to make the directory and ensure it is owned by the user you selected before.
We're warned we're about to be asked a password.
Are we sure we want to connect to a new server? Of course the answer is yes here.
Here's a prompt with no useful information for what we're doing. Press enter.
We're going to choose savedisk here to take a snapshot of the hard drive in this computer. When we restore we choose restoredisk instead.
We're going to use ntfsclone, so choose the first option here.
The default here is only to choose -c for wait for confirmation. We're going to clear that and have no options set on this screen.
The hard drive on this PC has 5.4 GB of data. Using -z1 we can bring that down to 2.1 GB, which is better for our networking. -z2 is much slower for little improvement in image size, and on net it's probably a loss in speed.
Here we choose an image name. This will actually be a subdirectory in the folder chosen previously, with various files in it.
There's only one drive in this machine. It's a laptop. As soon as we confirm this last entry, it will begin taking the image and storing it on the server. First run with 10/100 networking took 445 seconds. Second copy to server took 442 seconds. Download with 10/100 took 382 seconds, and at gigabit speed we get 365 seconds. Obviously bandwidth isn't our bottleneck. One thing to watch out for -- on the server, while storing one image, both CPUs hit about 50%, considerably more than their baseline 20%. This is likely due to the encryption overhead of SSH connections. The network usage goes in spikes of about 4-12 MB/s with gigabit networking. To improve this we'll need a different network protocol to serve the images.
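The arithmetic behind "bandwidth isn't our bottleneck" can be worked out from the timings above. This sketch assumes every run moved the same 2.1 GB compressed image over the network and that the size is in binary units, as ls -h reports it. The ceiling figure is the raw line rate of fast ethernet with no allowance for protocol overhead. The variable and function names are made up.

```python
IMAGE_GIB = 2.1   # compressed image size from the article (assumed GiB)

# Timings in seconds, as reported above.
runs = {
    "upload, 10/100, first": 445,
    "upload, 10/100, second": 442,
    "download, 10/100": 382,
    "download, gigabit": 365,
}

# Raw line rate of 100 Mbps expressed in MiB/s (ignores all overhead).
FAST_ETHERNET_MIB_S = 100e6 / 8 / 2**20


def mib_per_second(seconds, size_gib=IMAGE_GIB):
    """Average transfer rate for an image of size_gib moved in the given time."""
    return size_gib * 1024 / seconds


for name, seconds in runs.items():
    print("%-24s %.1f MiB/s" % (name, mib_per_second(seconds)))
print("fast ethernet ceiling    %.1f MiB/s" % FAST_ETHERNET_MIB_S)
```

Under those assumptions every run averages somewhere around 5 to 6 MiB/s, roughly half of what even the 10/100 link could carry, so the limit has to be elsewhere.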
Now we check the image size on the network.
ltsp:/$ ls -hal /home/partimag/2008-07-05-00-img/
total 2.1G
That's good. Now we repeat the process but choose to download the image.
Test the image thoroughly. Are all the files there? Perform a chkdsk. No errors? Then we've got a viable copy but the speed needs work.
I'll try Samba next. We'll stick with the gigabit connection since it's up. I would go over the way I configured Samba, but you can figure it out from this useful page.
Testing with samba reveals that the server processor overhead for a single gigabit connection goes from the baseline of 20% to about 22%. We've removed the processor bottleneck. We're only using about 1/10th of a gigabit link on average. We have plenty of machines available so we'll probably go with six or eight clients per server depending on how much the load slows them down.
LTSP configuration (Gutsy) - Episode 1
I've gotten LTSP systems up and running before. This latest evolution is giving me grief. The purpose of this post is to document the successful steps so that I can replicate them reliably. Version 5 of LTSP is pretty slick once you get it going.
For a server platform I have an HP XW8200 with 4GB RAM and dual 3.2GHz Xeon processors. It has a 72GB U320 SCSI drive to boot from and an additional 500GB SATA drive for data. It has three gigabit network ports - one on the motherboard and two on a server grade add-in PCI-X card. I will be using one of these to connect to the upstream internet, and two for my localnets. Each localnet gigabit NIC will be connected to a different switched network. The clients will boot from the network and be offered a menu of LTSP client or imaging at boot time.
I've selected the Ubuntu 8.04 (Hardy Heron) Alternate CD mode LTSP installation. It has a text-based installer that adds all of the basic stuff required to get the server up and running. It is supposed to work right out of the box, though that's not my experience.
The first issue I've discovered is that this method will not properly install if the PC is connected to the internet during installation, but also will not if no network ports have link. The networking is universally misconfigured in these cases. The workaround for this is to unplug the NICs and plug in the one NIC that will be used for Internet into a standalone network switch. This allows the NIC to be connected and configured as the primary network interface. I've selected eth1 for this chore. After the server is up and running you can configure the network the rest of the way.
The second issue is that if I run the install with the SATA drive connected, the system tries to boot from it even though I have the BIOS set to prefer the SCSI drive. I fix this by disconnecting the SATA drive until later in the installation.
The third issue is that at work my tyrannical network admins detect linux package updates as abusive network consumption and throttle me to less than dialup bandwidth. To get around this I'll be doing the work at home where I have 6Mbps cable broadband I can abuse all I like.
The next step is to configure the network. First, connect the port that you were keeping alive to the network and boot into your new system and log in. At that point you should be able to use the Internet. Then configure the other two network ports. You'll need to know your network gateway, which is given as the last line when you use the "route" command. For my purposes here it's the home router I'm using - 192.168.0.1. You will need a network address and mask for each of your localnets. I'm choosing 192.168.10.1 255.255.255.0 for eth0 and 192.168.11.1 255.255.255.0 for eth2. One pitfall here is to try to configure these ports on the same subnet. Don't do it. It messes up your routing and your server won't know where to send the packets. If the ltsp server gets its internet from dhcp, you also want to make sure neither of these subnets is the same as a subnet you might be assigned to automatically. Now we have the server up and running online. It's time to get updates.
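The "don't put two ports on the same subnet" pitfall can be checked before touching the server. This is a minimal sketch using Python's standard ipaddress module. The two localnet entries are the addresses chosen above. The upstream entry assumes the home network behind 192.168.0.1 uses a 255.255.255.0 mask, which the article doesn't state. The function name is made up.

```python
import ipaddress
from itertools import combinations


def overlapping(networks):
    """Return pairs of interface names whose subnets overlap."""
    nets = {name: ipaddress.ip_network(cidr, strict=False)
            for name, cidr in networks.items()}
    return [(a, b) for a, b in combinations(nets, 2)
            if nets[a].overlaps(nets[b])]


planned = {
    "eth0": "192.168.10.1/255.255.255.0",   # first localnet
    "eth2": "192.168.11.1/255.255.255.0",   # second localnet
    "eth1": "192.168.0.1/255.255.255.0",    # upstream; mask is an assumption
}

print(overlapping(planned))
```

An empty list means no two interfaces share a subnet. If the upstream address comes from DHCP, it's worth rerunning the check with whatever network you actually get assigned.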
In the menu choose System->Administration->Synaptic Package Manager and click the Reload button. The list of software sources is pre-loaded for you. Reload downloads the current list of updates and checks them against your current install. Today against the basic installation I did there are 228 updates, of which 9 are new packages and 219 are upgrades to existing packages. It's 256 MB in all. I'm waiting for them to download and install right now. There are kernel updates in there so there will be a reboot afterward. Today there are over 24,000 software packages in the software repository and more than 1400 of them are installed in this basic configuration.
I get a note that my ssh keys were updated. This will require rebuilding the thin client image that was built during the install. It tells me the key was stored in:
/etc/ssh/ssh_host_rsa_key /etc/ssh/ssh_host_dsa_key
Once the updating is done, we fix this by running
sudo ltsp-update-sshkeys
Now that I have a current server, there's another step before I can boot the clients. During installation it warned me that DHCPD needed to be configured because it couldn't figure out what networks the clients were on.
The log for dhcpd is /var/log/syslog
Restart dhcpd with
sudo invoke-rc.d dhcp3-server restart
The next issue is that the ltsp server for some reason stores the dhcpd configuration file in /etc/ltsp rather than the default /etc/dhcp3 folder. I update the dhcpd.conf file in /etc/ltsp with this:
#
# Default LTSP dhcpd.conf config file.
#
authoritative;
subnet 192.168.10.0 netmask 255.255.255.0 {
    range 192.168.10.20 192.168.10.250;
    option domain-name "example1.com";
    option domain-name-servers 192.168.10.1;
    option broadcast-address 192.168.10.255;
    option routers 192.168.10.1;
#   next-server 192.168.0.254;
#   get-lease-hostnames true;
    option subnet-mask 255.255.255.0;
    option root-path "/opt/ltsp/i386";
    if substring( option vendor-class-identifier, 0, 9 ) = "PXEClient" {
        filename "/ltsp/i386/pxelinux.0";
    } else {
        filename "/ltsp/i386/nbi.img";
    }
}
subnet 192.168.11.0 netmask 255.255.255.0 {
    range 192.168.11.20 192.168.11.250;
    option domain-name "example2.com";
    option domain-name-servers 192.168.11.1;
    option broadcast-address 192.168.11.255;
    option routers 192.168.11.1;
#   next-server 192.168.0.254;
#   get-lease-hostnames true;
    option subnet-mask 255.255.255.0;
    option root-path "/opt/ltsp/i386";
    if substring( option vendor-class-identifier, 0, 9 ) = "PXEClient" {
        filename "/ltsp/i386/pxelinux.0";
    } else {
        filename "/ltsp/i386/nbi.img";
    }
}
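The two subnet blocks differ only in the third octet and the domain name, so they are easy to mistype when copied by hand. As a rough sketch, a small shell function can print both from one template. The name ltsp_subnet is hypothetical, and the script only writes to standard output, so you would still review the result and place it in /etc/ltsp/dhcpd.conf yourself.

```sh
#!/bin/sh
# Hypothetical helper: print one LTSP subnet stanza for 192.168.<octet>.0/24.
# $1 = third octet of the network, $2 = domain name handed to clients.
ltsp_subnet() {
    net="192.168.$1"
    cat <<EOF
subnet $net.0 netmask 255.255.255.0 {
    range $net.20 $net.250;
    option domain-name "$2";
    option domain-name-servers $net.1;
    option broadcast-address $net.255;
    option routers $net.1;
    option subnet-mask 255.255.255.0;
    option root-path "/opt/ltsp/i386";
    if substring( option vendor-class-identifier, 0, 9 ) = "PXEClient" {
        filename "/ltsp/i386/pxelinux.0";
    } else {
        filename "/ltsp/i386/nbi.img";
    }
}
EOF
}

# Print the header line and the two stanzas used in this article.
echo "authoritative;"
ltsp_subnet 10 example1.com
ltsp_subnet 11 example2.com
```

The commented-out next-server and get-lease-hostnames lines are left out of the template because they have no effect while commented.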
Then I PXE boot a client directly attached to eth0. It gets a DHCP address of 192.168.10.250 and loads the boot image with Busybox. It shows the Ubuntu splash screen but then drops to an initramfs shell. This generally indicates that the client image that was installed from the CD-ROM is bad. To fix this I move the directory /opt/ltsp/i386 to /opt/ltsp/i386.original and run
sudo ltsp-build-client
This directory is very important. It's a "chroot" environment. We will be working with different chroot environments when we build client images, but I'm going to get the ltsp client image built and booting properly first to validate the architecture. ltsp-build-client takes a good long time to download the component parts from the repository and build the client image.
We're not done yet. Now we update the repository sources for the client:
sudo mv /opt/ltsp/i386/etc/apt/sources.list /opt/ltsp/i386/etc/apt/sources.list.backup
sudo cp /etc/apt/sources.list /opt/ltsp/i386/etc/apt
And chroot into the client environment
sudo chroot /opt/ltsp/i386
Update the packages and upgrade them
sudo apt-get update
sudo apt-get upgrade
Today there are 43 packages to upgrade. Then I exit the chroot environment
exit
and update the client image with
sudo ltsp-update-image
When this is complete I can PXE boot the client, log in and it works fine. I have a working LTSP system. The clients boot in about 15 seconds and are ready to go immediately.
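Since the client image needs refreshing every time packages change, the steps from copying sources.list through ltsp-update-image can be collected into one script. This is a sketch, not a tested tool: the names run, refresh_client_image, CHROOT and DRY_RUN are my own, and it passes apt-get to chroot as a command instead of opening an interactive shell. With DRY_RUN set it only prints what it would do.

```sh
#!/bin/sh
# Hypothetical wrapper around the client-image refresh steps.
# CHROOT and DRY_RUN are names invented for this sketch.
CHROOT=${CHROOT:-/opt/ltsp/i386}

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy the server's package sources into the chroot, upgrade the
# packages there, then rebuild the image the clients boot from.
# Each step runs only if the previous one succeeded.
refresh_client_image() {
    run sudo mv "$CHROOT/etc/apt/sources.list" "$CHROOT/etc/apt/sources.list.backup" &&
    run sudo cp /etc/apt/sources.list "$CHROOT/etc/apt" &&
    run sudo chroot "$CHROOT" apt-get update &&
    run sudo chroot "$CHROOT" apt-get upgrade &&
    run sudo ltsp-update-image
}

# Show the commands without running them.
DRY_RUN=1 refresh_client_image
```

To do the real work, call refresh_client_image without DRY_RUN set. apt-get upgrade will still ask for confirmation, just as it does when typed by hand.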
Now is a great time to make a backup copy of your /opt/ltsp/i386 folder. If you mangle it, then you will be able to put it back.
Next I install thin-client-manager-gnome using System->Administration->Synaptic Package Manager. This lets me see the processes on the client. I'm supposed to be able to kill them also and get a remote desktop but that's not working out. I added it to my main menu with
/usr/bin/gksudo /usr/bin/student-control-panel
The icons are in /usr/share/student-control-panel/ but they're png so you'll have to use something else.
One quick test: shut down the client and the server. Boot the server. After it's up, boot two clients, one on each subnet port. If they all come up fine and working you have successfully built LTSP. That's it for this step.
For the next article I'll be building the boot menu so that instead of booting straight to LTSP you'll have a few seconds to choose a different option, such as cloning.
The third article will cover building the cloning image.
Thursday, June 19, 2008
Ubuntu + LTSP + DrQueue = Render cluster
The latest version of Ubuntu incorporates the venerable LTSP project in an interesting way: any chroot environment can be configured as an environment to be PXE booted. Since PXE boot has been built into every consumer machine for five years, many new things are possible.
LTSP was designed as a way to put ancient desktops and modern thin clients to work, saving money at the point of access. This new facility means that much more can be done with it. Understanding how requires a bit of explaining.
A chroot environment is a configuration in Linux where the user can (CH)ange the (ROOT) directory to some subdirectory of the current computer. It's used in services to isolate a particular service or user's environment so that they can't access things they're not supposed to. It's like a limited virtual machine. It can be configured the same way as a normal environment would -- with local applications, events, all the usual stuff.
LTSP extends this by building the chroot environment into an image file that a booting machine can use as its own real environment. By controlling which chroot environment is issued to a machine based on its MAC address (an address unique to the machine or network card), one can assign a specific chroot environment to a particular machine. This allows LTSP to issue a thin client Linux to ancient computers, which then deliver a modern experience using the server's greater computation power. It also allows the system to send special environments based on the client's architecture. PowerPC Macintosh computers require a special one, as do some others. You can even PXE boot a virtual machine, so as to leverage virtualization technologies and server consolidation dynamically. A controller process can be configured to monitor loads on your network and dynamically launch virtual machines to handle the loads as the need requires.
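In ISC dhcpd terms, assigning an environment by MAC address comes down to a host declaration that matches the hardware address and points that one machine at its own root path and boot file. The sketch below prints such a declaration. The function name pxe_host, the host name render01, the MAC address and the "render" chroot name are all invented for illustration, and it assumes the chroot has been built under /opt/ltsp with its boot files under /ltsp in the same layout as the stock i386 one.

```sh
#!/bin/sh
# Hypothetical helper: print a dhcpd host declaration that ties one
# machine (by MAC address) to one chroot environment.
# $1 = host name, $2 = MAC address, $3 = chroot name under /opt/ltsp.
pxe_host() {
    cat <<EOF
host $1 {
    hardware ethernet $2;
    option root-path "/opt/ltsp/$3";
    filename "/ltsp/$3/pxelinux.0";
}
EOF
}

# Example: a made-up render node that boots a made-up "render" environment.
pxe_host render01 00:16:3e:00:00:01 render
```

Adding a machine to a service then means adding one more declaration with its MAC address and restarting dhcpd.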
It has been possible for some time to build a redundant architecture for every common service, using various network and software methods to spread the work for one service across multiple servers. By combining PXE boot, specific environments for specific services, and the assignment of machines to service tasks via MAC addresses, it's possible to create a redundant architecture that provides all of these services and scales to any size.
This changes a great deal in infrastructure design. Every server can round-robin to whatever server is available. When services are slow, add another server to the list that receives the image for that service and boot it. It will automagically configure itself to receive a share of the load and serve clients. Need more power in your render cluster? Buy as many render nodes as you need and PXE boot them: no-touch configuration. A node fails? That's fine. It's all redundant. Swap it out and move on. Even the LTSP servers themselves can be made redundant in this way, so that as long as one persists the architecture will survive.
What I think is cool about this: You can build the most powerful render cluster in the world without writing even a single line of code. That's right - the programmer-free cluster. It's all off the shelf hardware and software.
Over the next few weeks I'll be building a render cluster using cheap equipment. Watch this space to see what I can do with it.
Saturday, March 29, 2008
Networking education resources
If you want to get a good basic understanding of how basic networking works, you could do worse than to take the ProCurve Networking Primer. It offers the fundamentals in an easy to understand self-paced course. It doesn't have a lot of vendor bias in it.
HP offers a great deal of training in fundamentals for free. Some of it is specific to their products and some of it is not. On the ProCurve Training page you will find some materials to study if you are interested in these things. It's accessible to the public.
A lot of the HP training is available free to the public but it's hidden behind a membership page so people can't find it easily. This is silly because the training itself is hosted on a public FTP server. For example the exam preparation guides are in the epgs directory. There's quite a lot of interesting stuff on ftp.hp.com and it's wide open for browsing.
IBM also has a good deal of online training available here.
Naturally MIT's OpenCourseWare covers networking as well.
Friday, March 28, 2008
Nettop, netbook, Mobile Internet Device, Blah
Yeah, there's lots of vapor in the air regarding "thin is in" low power, small performance mini notebooks and portable PC components. It would be easy to grouse about how we've heard this before, and how when the air cleared the thing cost $2500 if you could buy it at all and was lame until you dropped it, at which point it was worthless.
The thing is, that story's over. Flash storage as a medium has matured and become much cheaper. You can get small LCD (or newer tech) monitors at ridiculously low prices because of economies of scale. The small LCD in the eee PC for example is used in point of sale equipment, digital photo frames, kiosks, and a number of other devices. With a low power processor that's also cheap the Bill of Materials on this equipment starts getting interesting.
At IDF in a few days the NDAs for lots of companies building platforms on Intel's Atom (née Diamondville and Silverthorne) processors expire, and we're going to see what kind of device the major manufacturers can build with a 0.5W - 2.5W processor that is very cheap, uses the IA-32 architecture and clocks at reasonable (1.8 GHz?) speeds. I think there will be more than a few surprises in store.
I'm going to speculate that more than a few of them will be decent laptop computers that cost about what consumers are currently paying for an MP3 player like the Zune or the iPod. That's going to drive a lot of market in the third world. It's going to change a lot of things about the bottom end of the laptop market. Some of these things are not going to be computers at all, but they will be really cool too.
One thing's for sure though: If any of them run Vista, they won't do it well.