Sunday, July 13, 2008

Linux GIS

For some time now I've been interested in Geographic Information Systems (GIS) for Linux. This is a natural combination since a huge amount of geographic data is freely available from the US government. A GIS takes data points, usually geographic coordinates (longitude, latitude and elevation), and by associating various data with them (stream surveys, street plans, etc.) gives a very flexible graphical representation. They're helpful for making maps or visualizing different elements.

There is a project with a long history in Linux that does this -- it's called GRASS. It was chosen for three projects in the 2008 Google Summer of Code. It's an active project with many users so it's likely to be around for quite a while longer, and it's licensed under the GNU GPL so price isn't an issue.

GRASS is pretty feature rich. GIS systems are always complex beasts, as the various methods of storing, converting and visualizing geographic data are all rich fields with long histories and plenty of room for varying preferences. GRASS allows GIS data to be stored in any of the common databases including MS Access, MySQL, PostgreSQL, MS-SQL Server, Oracle, dBASE and others, as well as in various common formats or flat files. It can use live files created for and by ESRI's ArcGIS, which is the most common commercial GIS program.

With the next version of GRASS a native Windows build will be available. For now the Windows version of the application is built under Cygwin.

Like many GPL-licensed applications, GRASS has been included in a number of distributions: packages that bundle the Linux operating system and all of the usual applications with a complete suite of complementary tools aimed at a particular audience. ArcheOS is one example. It's targeted at archeologists and provides GRASS and related tools as well as a rich set of new toys to play with. I'll be using ArcheOS to set up a workstation system with GRASS. As of the current version (2.0.0) ArcheOS comes as a 1.2GB .iso file that you burn to DVD, either for live DVD use or to install, and it includes version 6.2.3 (the most current stable release) of GRASS.

Anyway, give GRASS a try and tell me what you think.

Friday, July 04, 2008

LTSP configuration (Gutsy) - Episode 2

LTSP is the Linux Terminal Server Project. Because it's popular with schools it's had quite a bit of development, and it has been adopted by Ubuntu as part of their Edubuntu package. It's generally used to allow a server to provide the horsepower for a bunch of thin clients. We'll be expanding it to serve other useful purposes.

We're going to use it to help clone a bunch of Windows XP computers.

In our first episode we built an LTSP server. If you haven't read that article yet or you don't have a good LTSP server (not in production!) to work with it would be good to go back there and follow the steps so you are better able to follow along.

There are several steps to perform here. We have to select an imaging platform that's bootable with ltsp. It must not copy the whole drive -- just the blocks that have data in them. It has to be reasonably fast. We have to select a method of getting the large image to the clients -- probably file sharing but possibly multicasting. We have to tie it all together. One advantage we have is a large number of machines lying around with 40GB drives and gigabit ethernet to use as servers. They're surplus from a prior installation.

To get the image size down we need a project that uses ntfsclone, since that's the tool that understands the NTFS format and can copy only the blocks that have data. It also has to work with ltsp while allowing us some flexibility in how we use it. I chose clonezilla. It's a subproject of Diskless Remote Boot in Linux (DRBL), which is a similar project to ltsp. DRBL and clonezilla are projects of Taiwan's National Center for High-Performance Computing. Clonezilla has handy installers, comes in bootable CD and pendrive formats, and a version is available for network booting. Although they claim to support multicasting, the process is as yet unwieldy, so we'll be using the ltsp server as a dhcp and file server and cloning multiple servers to meet our bandwidth requirements. Since we're using Ubuntu for the ltsp server, I decided to go with Clonezilla Live Experimental (Hardy). Download the .iso and burn it to a CD.

Before we put a lot of work into making it netbootable I should probably validate that it makes a good copy in a reasonable period of time. I'll be using a recently imaged old laptop that won't be a disaster if I mangle its image, just in case clonezilla does not work as advertised. My actual target clients are dual core notebooks with faster hard drives but the image is also five times the size. I'm looking for scalability on the server (serving many clients simultaneously) and on the network.

Boot the clonezilla CD from a client you would like to clone that's connected via network to your ltsp localnet.

pic

Choose the boot to RAM option, because we're going to run from RAM when we PXE boot. After some text scrolls past you see this

pic

We'll choose the English version and leave the keymaps alone.

Start clonezilla

device image disk/partition to/from image

We'll use the ssh server because we don't have samba or nfs set up yet on our ltsp server.

It will automatically detect our nic and network

DHCP is set up so we'll use that to get our address.

It detects our server and offers it as the default.

Port 22 is the default for ssh.

The default account is root. We don't allow remote access from a root account so we'll use this one.

Here we select a directory on the host. This is a good time to make the directory and ensure it is owned by the user you selected before.
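Creating that directory on the server might look something like the sketch below. The function name is mine, and the account name in the usage comment is a placeholder for whatever user you picked on the previous screen; /home/partimag is the path that shows up later in this article.

```shell
# Create the image directory and hand it to the imaging account.
# make_image_dir is a hypothetical helper, not part of clonezilla.
make_image_dir() {
    dir=$1      # directory that will hold the images
    owner=$2    # user name or numeric uid that clonezilla logs in as
    mkdir -p "$dir" || return 1
    chown "$owner" "$dir"
}

# Usage, from a root shell on the ltsp server (placeholder account name):
#   make_image_dir /home/partimag imaging
```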

We're warned we're about to be asked a password.

Are we sure we want to connect to a new server? Of course the answer is yes here.

Here's a prompt with no useful information for what we're doing. Press enter.

We're going to choose savedisk here to take a snapshot of the hard drive in this computer. When we restore we choose restoredisk instead.

We're going to use ntfsclone, so choose the first option here.

The default here is only to choose -c for wait for confirmation. We're going to clear that and have no options set on this screen.

The hard drive on this PC has 5.4 GB of data. Using -z1 we can bring that down to 2.1 GB, which is better for our networking. -z2 is much slower for little improvement in image size, so on net it's probably a loss in speed.
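If you want to see the size versus time tradeoff on your own data before committing, a rough comparison can be run on any large sample file. This is only a sketch: I'm assuming -z1 corresponds to gzip at its fastest setting and -z2 to bzip2, and the function name is made up. Timing is in whole seconds, so use a file big enough to matter.

```shell
# Compress a file with each tool and report output size and elapsed time.
# compare_compressors is a hypothetical helper; tools that are not
# installed are skipped.
compare_compressors() {
    file=$1
    for tool in "gzip -1" "bzip2"; do
        name=${tool%% *}
        command -v "$name" >/dev/null 2>&1 || continue
        start=$(date +%s)
        # $tool is left unquoted on purpose so "gzip -1" splits into two words.
        bytes=$($tool -c "$file" | wc -c)
        end=$(date +%s)
        echo "$name $((bytes)) bytes $((end - start))s"
    done
}

# Usage:
#   compare_compressors /path/to/sample.file
```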

Here we choose an image name. This will actually be a subdirectory in the folder chosen previously, with various files in it.

There's only one drive in this machine. It's a laptop. As soon as we confirm this last entry, it will begin taking the image and storing it on the server. First run with 10/100 networking took 445 seconds. Second copy to server took 442 seconds. Download with 10/100 took 382 seconds, and at gigabit speed we get 365 seconds. Obviously bandwidth isn't our bottleneck. One thing to watch out for -- while storing one image on the server, both CPUs hit about 50%, considerably more than their baseline 20%. This is likely due to the encryption overhead of SSH connections. The network usage goes in spikes of about 4-12 MB/s with gigabit networking. To improve this we'll need to use a different network protocol to serve the images.
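To put those timings in perspective, here's the average transfer rate worked out as a quick sketch. I'm assuming the 2.1 GB compressed image is what actually crosses the wire and that 1 GB means 1024 MB, the way ls -h counts it. The function name is just for illustration.

```shell
# Average transfer rate in MB/s for an image of SIZE_GB moved in SECONDS.
# Assumes 1 GB = 1024 MB.
rate_mb_s() {
    awk -v gb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", gb * 1024 / s }'
}

rate_mb_s 2.1 445   # first upload over 10/100   -> 4.8
rate_mb_s 2.1 365   # download over gigabit      -> 5.9
```

A 100 Mbit/s link tops out around 12.5 MB/s, so both averages sit well under the ceiling of even the slower network.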

Now we check the image size on the network.

ltsp:/$ ls -hal /home/partimag/2008-07-05-00-img/
total 2.1G

That's good. Now we repeat the process but choose to download the image.

Test the image thoroughly. Are all the files there? Perform a chkdsk. No errors? Then we've got a viable copy but the speed needs work.

I'll try Samba next. We'll stick with the gigabit connection since it's up. I would go over the way I configured Samba, but you can figure it out from this useful page.

Testing with samba reveals that the server processor overhead for a single gigabit connection goes from the baseline of 20% to about 22%. We've removed the processor bottleneck. We're only using about 1/10th of a gigabit link on average. We have plenty of machines available so we'll probably go with six or eight clients per server depending on how much the load slows them down.

LTSP configuration (Gutsy) - Episode 1

The Linux Terminal Server Project (LTSP) is a method of using linux as an operating system that delivers thin clients the performance of a server. It works with many linux distributions and I have previously used it with good results. I'm working on putting together a system that lets me use the ltsp architecture to also perform imaging of desktops and laptops in bulk and quickly using a complete FOSS toolchain. If I get that far I'll explore using LTSP's on-demand architecture as part of a cloud type redundant infrastructure.

I've gotten LTSP systems up and running before. This latest evolution is giving me grief. The purpose of this post is to document the successful steps so that I can replicate them reliably. Version 5 of LTSP is pretty slick once you get it going.

For a server platform I have an HP XW8200 with 4GB RAM and dual 3.2GHz Xeon processors. It has a 72GB U320 SCSI drive to boot from and an additional 500GB SATA drive for data. It has three gigabit network ports - one on the motherboard and two on a server grade add-in PCI-X card. I will be using one of these to connect to the upstream internet, and two for my localnets. Each localnet gigabit NIC will be connected to a different switched network. The clients will boot from the network and be offered a menu of LTSP client or imaging at boot time.

I've selected the Ubuntu 8.04 (Hardy Heron) Alternate CD mode LTSP installation. It has a text-based installer that adds all of the basic stuff required to get the server up and running. It is supposed to work right out of the box, though that's not my experience.

The first issue I've discovered is that this method will not install properly if the PC is connected to the internet during installation, and it also fails if none of the network ports has link. The networking is misconfigured in both cases. The workaround is to unplug all the NICs and then plug the one NIC that will be used for the Internet into a standalone network switch. This gives the NIC link so that it's configured as the primary network interface. I've selected eth1 for this chore. After the server is up and running you can configure the network the rest of the way.

The second issue is that if I run the install with the SATA drive connected, the system tries to boot from it even though I have the BIOS set to prefer the SCSI drive. I fix this by disconnecting the SATA drive until later in the installation.

The third issue is that at work my tyrannical network admins detect linux package updates as abusive network consumption and throttle me to less than dialup bandwidth. To get around this I'll be doing the work at home where I have 6Mbps cable broadband I can abuse all I like.

The next step is to configure the network. First, connect the port that you were keeping alive to the network and boot into your new system and log in. At that point you should be able to use the Internet. Then configure the other two network ports. You'll need to know your network gateway, which is given as the last line when you use the "route" command. For my purposes here it's the home router I'm using - 192.168.0.1. You will need a network address and mask for each of your localnets. I'm choosing 192.168.10.1 255.255.255.0 for eth0 and 192.168.11.1 255.255.255.0 for eth2. One pitfall here is to try to configure these ports on the same subnet. Don't do it. It messes up your routing and your server won't know where to send the packets. If the ltsp server gets its internet from dhcp, you also want to make sure neither of these subnets is the same as a subnet you might be assigned to automatically. Now we have the server up and running online. It's time to get updates.
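If you configure the two localnet ports by hand in /etc/network/interfaces rather than through the GUI, the sketch below checks that the subnets don't collide and then prints stanzas you could paste in. The helper names are mine, and I'm assuming the upstream network behind the home router also uses a 255.255.255.0 mask.

```shell
# Print the network address for an IPv4 address and netmask.
network_of() {
    old_ifs=$IFS
    IFS=.
    # Split both dotted quads into eight positional parameters.
    set -- $1 $2
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Print an /etc/network/interfaces stanza for a static address.
static_stanza() {
    printf 'auto %s\niface %s inet static\n    address %s\n    netmask %s\n\n' \
        "$1" "$1" "$2" "$3"
}

MASK=255.255.255.0
NET_A=$(network_of 192.168.10.1 "$MASK")     # eth0 localnet
NET_B=$(network_of 192.168.11.1 "$MASK")     # eth2 localnet
UPSTREAM=$(network_of 192.168.0.1 "$MASK")   # home router (mask is an assumption)

if [ "$NET_A" = "$NET_B" ] || [ "$NET_A" = "$UPSTREAM" ] || [ "$NET_B" = "$UPSTREAM" ]; then
    echo "subnets overlap, pick different addresses" >&2
else
    static_stanza eth0 192.168.10.1 "$MASK"
    static_stanza eth2 192.168.11.1 "$MASK"
fi
```

With the addresses from this article the three networks come out as 192.168.10.0, 192.168.11.0 and 192.168.0.0, so the stanzas are printed.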

In the menu choose System->Administration->Synaptic Package Manager and click the Reload button. The list of software sources is pre-loaded for you. Reload downloads the current list of updates and checks them against your current install. Today against the basic installation I did there are 228 updates, of which 9 are new packages and 219 are upgrades to existing packages. It's 256 MB in all. I'm waiting for them to download and install right now. There are kernel updates in there so there will be a reboot afterward. Today there are over 24,000 software packages in the software repository and more than 1400 of them are installed in this basic configuration.

I get a note that my ssh keys were updated. This will require rebuilding the thin client image that was built during the install. It tells me the key was stored in:

/etc/ssh/ssh_host_rsa_key /etc/ssh/ssh_host_dsa_key

We fix this by running

sudo ltsp-update-sshkeys

Run that once the updating is done. Now that I have a current server, there's another step before I can boot the clients. During installation it warned me that DHCPD needed to be configured because it couldn't figure out what networks the clients were on.

dhcpd logs to /var/log/syslog. Restart dhcpd with

sudo invoke-rc.d dhcp3-server restart
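Since syslog collects messages from everything on the box, it helps to filter it down to dhcpd when you're watching a client try to boot. Here's a small sketch; the function name is made up, and you may need sudo to read the log.

```shell
# Show the most recent dhcpd entries from a syslog-style file.
# dhcpd_log is a hypothetical helper; both arguments are optional.
dhcpd_log() {
    log=${1:-/var/log/syslog}   # file to search
    count=${2:-20}              # how many matching lines to show
    grep 'dhcpd' "$log" | tail -n "$count"
}

# Usage:
#   dhcpd_log                      # last 20 dhcpd lines from /var/log/syslog
#   dhcpd_log /var/log/syslog 5    # last 5
```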

The next issue is that the ltsp server for some reason stores the dhcpd configuration file in /etc/ltsp rather than the default /etc/dhcp3 folder. I update the dhcpd.conf file in /etc/ltsp with this:

#
# Default LTSP dhcpd.conf config file.
#

authoritative;

subnet 192.168.10.0 netmask 255.255.255.0 {
    range 192.168.10.20 192.168.10.250;
    option domain-name "example1.com";
    option domain-name-servers 192.168.10.1;
    option broadcast-address 192.168.10.255;
    option routers 192.168.10.1;
#    next-server 192.168.0.254;
#    get-lease-hostnames true;
    option subnet-mask 255.255.255.0;
    option root-path "/opt/ltsp/i386";
    if substring( option vendor-class-identifier, 0, 9 ) = "PXEClient" {
        filename "/ltsp/i386/pxelinux.0";
    } else {
        filename "/ltsp/i386/nbi.img";
    }
}
subnet 192.168.11.0 netmask 255.255.255.0 {
    range 192.168.11.20 192.168.11.250;
    option domain-name "example2.com";
    option domain-name-servers 192.168.11.1;
    option broadcast-address 192.168.11.255;
    option routers 192.168.11.1;
#    next-server 192.168.0.254;
#    get-lease-hostnames true;
    option subnet-mask 255.255.255.0;
    option root-path "/opt/ltsp/i386";
    if substring( option vendor-class-identifier, 0, 9 ) = "PXEClient" {
        filename "/ltsp/i386/pxelinux.0";
    } else {
        filename "/ltsp/i386/nbi.img";
    }
}
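The two subnet blocks differ only in the third octet and the domain name, so if you end up with more localnets it may be easier to generate them than to copy and paste. The sketch below is one way to do that. ltsp_subnet is my own helper, not something LTSP ships, and it leaves out the commented lines from the file above.

```shell
# Emit one LTSP subnet declaration for 192.168.<octet>.0/24.
# ltsp_subnet is a hypothetical helper, not part of LTSP.
ltsp_subnet() {
    octet=$1    # third octet of the localnet, e.g. 10
    domain=$2   # value for option domain-name
    cat <<EOF
subnet 192.168.$octet.0 netmask 255.255.255.0 {
    range 192.168.$octet.20 192.168.$octet.250;
    option domain-name "$domain";
    option domain-name-servers 192.168.$octet.1;
    option broadcast-address 192.168.$octet.255;
    option routers 192.168.$octet.1;
    option subnet-mask 255.255.255.0;
    option root-path "/opt/ltsp/i386";
    if substring( option vendor-class-identifier, 0, 9 ) = "PXEClient" {
        filename "/ltsp/i386/pxelinux.0";
    } else {
        filename "/ltsp/i386/nbi.img";
    }
}
EOF
}

# Usage: write to a scratch file and review it before copying it over
# /etc/ltsp/dhcpd.conf.
#   { echo 'authoritative;'; ltsp_subnet 10 example1.com; ltsp_subnet 11 example2.com; } > dhcpd.conf.new
```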


Then I PXE boot a client directly attached to eth0. It gets a DHCP address of 192.168.10.250 and loads the boot image with Busybox. It shows the Ubuntu splash screen but then fails out to an initramfs shell. This generally indicates that the client image that was installed from the cdrom is bad. To fix this I move the directory /opt/ltsp/i386 to /opt/ltsp/i386.original and run

sudo ltsp-build-client

This directory is very important. It's a "chroot" environment. We will be working with different chroot environments when we build client images, but I'm going to get the ltsp client image built and booting properly first to validate the architecture. ltsp-build-client takes a good long time to download the component parts from the repository and build the client image.

We're not done yet. Now we update the repository sources for the client:

sudo mv /opt/ltsp/i386/etc/apt/sources.list /opt/ltsp/i386/etc/apt/sources.list.backup
sudo cp /etc/apt/sources.list /opt/ltsp/i386/etc/apt

And chroot into the client environment

sudo chroot /opt/ltsp/i386

Update the packages and upgrade them

sudo apt-get update
sudo apt-get upgrade

Today there are 43 packages to upgrade. Then I exit the chroot environment
exit

and update the client image with
sudo ltsp-update-image

When this is complete I can PXE boot the client, log in and it works fine. I have a working LTSP system. The clients boot in about 15 seconds and are ready to go immediately.
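Since I'll be repeating the client update every time there are new packages, here's a sketch that wraps the same commands in one function. The function names and the DRY_RUN switch are my own additions; with DRY_RUN=1 it only prints what it would do, which is a safe way to check it first. It passes the apt-get commands to chroot directly instead of opening a shell inside it.

```shell
CHROOT=${CHROOT:-/opt/ltsp/i386}

# Print each command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Refresh the client's package sources, upgrade it, and rebuild the image.
# refresh_ltsp_client is a hypothetical helper, not an LTSP command.
refresh_ltsp_client() {
    run sudo mv "$CHROOT/etc/apt/sources.list" "$CHROOT/etc/apt/sources.list.backup" &&
    run sudo cp /etc/apt/sources.list "$CHROOT/etc/apt" &&
    run sudo chroot "$CHROOT" apt-get update &&
    run sudo chroot "$CHROOT" apt-get upgrade &&
    run sudo ltsp-update-image
}

# Usage:
#   DRY_RUN=1; refresh_ltsp_client    # show the commands only
#   DRY_RUN=0; refresh_ltsp_client    # actually run them
```

Each step only runs if the previous one succeeded, so a failed apt-get won't be followed by an image rebuild.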

Now is a great time to make a backup copy of your /opt/ltsp/i386 folder. If you mangle it, then you will be able to put it back.
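One way to take that backup is a dated copy next to the original, sketched below. backup_tree is a made-up helper name. For /opt/ltsp/i386 you'd run it from a root shell, since the tree is owned by root.

```shell
# Copy a directory tree to <tree>.backup-YYYYMMDD, preserving ownership,
# permissions and symlinks. Prints the backup path on success.
# backup_tree is a hypothetical helper.
backup_tree() {
    src=${1%/}                               # drop a trailing slash
    dest="$src.backup-$(date +%Y%m%d)"
    if [ -e "$dest" ]; then
        echo "$dest already exists, not overwriting" >&2
        return 1
    fi
    cp -a "$src" "$dest" && echo "$dest"
}

# Usage, from a root shell:
#   backup_tree /opt/ltsp/i386
```

To put a mangled chroot back, move the broken directory aside and copy the backup into its place with cp -a again.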

Next I install thin-client-manager-gnome using System->Administration->Synaptic Package Manager. This lets me see the processes on the client. I'm supposed to be able to kill them also and get a remote desktop but that's not working out. I added it to my main menu with
/usr/bin/gksudo /usr/bin/student-control-panel
The icons are in /usr/share/student-control-panel/ but they're png so you'll have to use something else.

One quick test - shut down the client and the server. Boot the server. After it's up, boot two clients, one on each subnet port. If they all come up fine and working you have successfully built ltsp. That's it for this step.

For the next article I'll be building the boot menu so that instead of booting to LTSP you'll have the option for a few seconds of choosing a different option, such as cloning.

The third article will cover building the cloning image.