
Rename VMware virtual machines on ESX

Ever wanted to rename a virtual machine, only to find out that the “Rename” option merely renames the “friendly name” in VirtualCenter? You could clone the VM to a new one with the proper name, but that usually requires a lot of downtime.
There is a quicker way:

  1. Shut down the VM
  2. Choose “Remove from inventory”
  3. Log into the ESX console and cd to the place where your VM is
  4. Rename the directory
  5. Rename all the files in the directory
  6. Change the names in the vmdk, vmsd, vmx, and vmxf files
  7. Browse the datastore and add the new vmx to the inventory

Or:

cd /vmfs/volumes/vmfs-data6                 # go to the datastore that holds the VM
mv OldVM NewVM                              # rename the VM's directory
cd NewVM
rename OldVM NewVM *                        # rename all files in the directory
perl -pi -e 's/OldVM/NewVM/g' NewVM.vm*     # fix the names inside the vmdk, vmsd, vmx and vmxf files


Areca releases driver for VMware

Areca has just released a beta driver for use with VMware ESX 3.5 🙂

This means that finally all the advantages of the Areca hardware can be used to build VMware systems.

I consider the Arecas among the best (if not the best) professional SATA RAID controllers out there.

I have used Dell servers a lot, because they offer more bang for the buck. However, Dell keeps shipping crappy RAID controllers that are full of bugs. Over the last few years it happened several times that Dell servers went down because of RAID controller problems, such as bugs in firmware.
I got really depressed looking at the firmware history of their shitty PERC controllers – they started out naming firmware releases with letters, but had to switch to another scheme as soon as they passed the 27th firmware update. How’s that for mature code. Oh, and almost every update is labeled critical by Dell.
The PERC controllers that ship with Dell servers perform OK-ish, but they are hard to manage, they lack cool features like online RAID level migration, and at the time did not offer SATA RAID.
Luckily we have an IBM Fibre Channel box to store our data on, so if one of the Dells goes down again (you know it will, once you’ve seen the driver and firmware history) we don’t risk losing too much data.

It was very frustrating to be forced to buy servers with sub-optimal hardware when you know there is much better kit out there. But now that the Areca drivers are available, I can create a multi-terabyte 1U VMware server for our disaster recovery plan.

When I get my hands on an Areca controller I will see how VMware behaves with that – to be continued.


Multiple full VM backups using VCB, rsync, OpenSSH and VSS

The problem

Our shiny new VI3 setup works really well, but the backup chapter still needs work. I P2V-ed all our Linux boxes to VMs, so the existing rsnapshot file-level backups still run. So far so good.
But in addition to file-level backups, I also want full VM backups, every day, both on-site and off-site. As a matter of fact, I also want some sort of versioning, so that I have multiple full off-site VM backups. And I don’t want to install some mega-expensive disk array that holds X times the ~900 GB of space all my raw VMs take up.
What I want is a very simple, efficient and elegant setup, without all kinds of fancy stuff and graphical bells and whistles. I run UNIX systems for a living, so I’m not afraid of console utilities.

After doing some research I was unable to find any existing solutions; the ones that come close are either commercial and expensive, or require too much complicated crap to be installed.

The solution

Our VMware license includes a license for VMware Consolidated Backup (VCB). Being a great company, VMware has plugins and manuals for all the major closed-source, expensive, buggy black-box enterprise backup suites, but the documentation for their command-line tools is pretty lame and comes down to one lousy console screen of help text.
Luckily, it seems that in order to make full VM backups you actually need just one command (vcbMounter.exe).
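
A single full-VM export looks roughly like this (host, credentials, VM identifier and destination path are made-up placeholders, and the exact options differ a bit between VCB versions, so check vcbMounter's own help output):

    vcbMounter -h virtualcenter.example.com -u vcb-backup -p secret -a ipaddr:vm01.example.com -r D:\vcb-backups\vm01-fullvm -t fullvm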

Since the open source program rsync has served me really well in the past, I decided to use it again for our VMware backups.
My setup uses two machines (both running Windows Server 2003, as VCB only runs on Windows): one hooked up to our SAN and running the VCB software, and one off-site machine housing the archive. Both are modest 1U Supermicro boxes with 4 x 1 TB SATA in RAID 5 on Areca controllers, and they are connected via our dedicated WAN link at 100 Mbit/s.

It basically comes down to:

  1. Full VM backups are created locally with VCB; old backups are deleted first (because VCB refuses to overwrite them)
  2. The new backups are transferred to the remote site efficiently and securely using rsync and OpenSSH
  3. The off-site server uses Volume Shadow Copy to create a history of full VM backups

Steps 1 and 2 are done using this batch file (rename it to .bat/.cmd).
By using the --inplace option, we update the old backup files on the remote server in place. This is an important detail, because without it each file would be deleted and recreated, thereby killing the efficiency of the VSS part later.
The rsync algorithm makes sure that only the diffs go over the line. The backup of all our VMs together is about 500 GB (VCB strips out redundant unused space, which already saves about 400 GB at this stage).
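
The rsync part of that batch file boils down to something along these lines (user, hostname and paths are made-up placeholders; with cwRsync the Windows drives show up as /cygdrive/<letter>):

    rsync -avH --inplace --delete -e ssh /cygdrive/d/vcb-backups/ backup@offsite.example.com:/cygdrive/d/vcb-backups/
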
The link to our remote site is 100 Mbit/s, so in the most optimistic theoretical case it can transport about 36 GB/hour, which would make a full synchronization take at least 13-14 hours. In practice it would take even longer, and thus be impractical to use.
With rsync, only the diffs are sent over the line. In our situation, with 10 VMs running websites, e-mail, databases, file servers, applications and so on, the first results show that the daily diffs are somewhere between 20 and 30 GB. That would theoretically take less than an hour to transport.
The practical situation is a lot different: although the actual amount of changed data is reasonably small, running rsync with its sliding checksums over half a terabyte of binary chunks also takes hours.
My real-world numbers show that the VCB backups themselves take about 1.5 hours to run, yielding a directory with ~500 GB of backups. This then gets rsynced to the remote site, which takes 5-8 hours (as seen during the last week). That makes it a workable solution for a daily schedule.

The partition that houses the data on the remote server has Volume Shadow Copy enabled, and creates Shadow Copies daily at the appropriate time (30 minutes before the other site initiates the rsync step).
The following picture shows that we now have 5 full copies of our 500 GB directory available, but instead of taking an extra 5 x 500 GB = 2.5 TB, they merely take up an extra 120 GB:
[Screenshot: the Shadow Copies dialog box on Windows Server 2003]
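
Should you ever want an extra shadow copy outside the schedule, Windows Server 2003 can also create one from the command line; a minimal example (D: being the volume that holds the backup tree here):

    vssadmin create shadow /for=D: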

At this stage we’ve got:

  • Daily full backups of our VMs on-site
  • Multiple full backups of our VMs off-site

Caveats

  • To prepare everything, I needed a full copy of the 500 GB tree on both machines. Initially I planned on using rsync and OpenSSH, but it turned out that the OpenSSH daemon on Windows is very slow: with my systems (dual Xeon 3 GHz etc.) connected via gigabit, the throughput maxed out at about 6-7 MB/s (Linux to Linux: over 30 MB/s).
    Instead of using rsync/OpenSSH, I simply mounted the disk over CIFS and copied the whole tree.
    Subsequent transfers are limited to about 12 MB/s anyway by our uplink speed, so that is not a problem in the real-world scenario.
  • I used the cwRsync package to install rsync and OpenSSH on Windows. OpenSSH with public-key authentication between two Windows systems is possible and runs perfectly fine, but setting things up can be a bit hairy, especially if you’re used to UNIX systems…
  • To secure things, you should restrict access to the OpenSSH daemon; I used the built-in Windows Firewall to accomplish this, and it works fine (see the command-line example after this list).
  • This article describes only half of the story; the other half is called restore. The utility that comes with VCB to restore VMs (vcbRestore.exe) is pretty buggy and inflexible: it is hard to restore VMs to a different place or with a different name. As some people have found out, it is possible to use VMware Converter to restore VCB backups to a different system (VMware ESX, Server, Workstation, etc.), but despite VMware now claiming that Converter can do this, the step still required manual fiddling with vmx and vmdk files.
    I have recently updated VMware Converter to 3.0.2u1 build 62456, and now everything works like a charm 🙂
    It is installed on the same machine as VCB, so Converter has direct access to the backups. The restore process is very straightforward and easy to understand. The software lets you change the disk size of the restored VM, the datastore where the VM will be put, and its name. This name is reflected at a low level: not only the ‘friendly name’ changes, but the VMDK files get the new name as well. I have restored several machines and it worked without a glitch. The only downside is that the restore runs over the network, which is a bit slower than the backup, which runs over Fibre Channel. But with gigabit Ethernet, restoring a small 4 GB VM only took a few minutes.
    This way of restoring also allows you to restore a VM onto a totally different system. This might come in handy for disaster recovery, where you might be forced to revive a VM on VMware Server or even VMware Workstation.
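
Regarding the firewall caveat above: restricting the OpenSSH port to the other backup server can also be scripted with the built-in firewall’s netsh interface. A rough sketch (the IP address is a placeholder for the other site’s machine; the GUI works just as well):

    netsh firewall add portopening protocol=TCP port=22 name=OpenSSH mode=ENABLE scope=CUSTOM addresses=192.0.2.10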

mount: special device /var/run does not exist

While P2V-ing an Ubuntu 6.06 server with my warm-cloning P2V method, I ran into a strange problem when booting:

[Screendump of the boot errors]

The exact text of the error is:


mount: special device /var/run does not exist
mount: special device /var/lock does not exist
mount: wrong fs type, bad option, bad superblock on /dev/shm/var.run,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

mount: wrong fs type, bad option, bad superblock on /dev/shm/var.lock,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

Everything seemed to work, until I found out that the extra IP addresses on secondary network interfaces were not able to carry any network traffic…
I had to cancel the virtualization and revert to the physical machine again.
The system in question has a separate /var partition (see booting picture, /dev/sda6 in my case).
I first mount the root filesystem, which holds an empty “var” directory, and then mount /dev/sda6 on that “var” directory. I had created empty “run” and “lock” directories in there, to be able to mount /var/run and /var/lock. Wrong!

Turns out that the root filesystem needs to contain /var/run and /var/lock, even though the system has a separate /var partition.

:bonk: :bonk: :bonk:

Thanks to Chris Siebenmann for pointing this out on his wiki.

The solution thus is to boot from an Ubuntu Live CD, mount ONLY the root filesystem, and create the /var/run and /var/lock directories there.
They are only needed for mounting the tmpfs partitions, and will be hidden by the real /var partition, which is mounted over them once the system has finished booting.
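
On the Live CD this comes down to something like the following (assuming /dev/sda1 is the root partition; adjust the device name to your own layout):

    sudo mount /dev/sda1 /mnt            # mount ONLY the root filesystem, not /var
    sudo mkdir -p /mnt/var/run /mnt/var/lock
    sudo umount /mnt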


Manual P2V of Debian Sarge

Now that we at TERENA have a new and shiny setup of VMware VI3, I had to migrate several of our Debian 3.1 Sarge servers. Some of them had custom kernels, because of specific hardware.

This virtualisation process (P2V, or Physical to Virtual) is properly supported for the Windows platform using the VMware Converter software. That works very nicely and supports hot cloning. However, when your Physical Machine (PM) runs Linux, hot cloning is not possible. The official way to go is the VMware Converter Boot CD. This requires rebooting the PM from a bootable CD-ROM, and from the PE boot environment on that CD-ROM the dead corpse is cloned to a VM.

The downside is of course that the machine has to be brought down for a substantial amount of time. Also, if your PM uses specific I/O controllers and/or network cards, the boot CD-ROM needs to be customized to hold the right drivers. This has to be tested too, so it might even take several rounds of downtime.

By doing things manually, you can avoid almost all of the downtime. I P2V-ed three systems, all over 100 GB, each with less than 20 minutes of downtime.

Also, because you ‘warm’ clone a live system, you don’t need to worry about disk and network drivers. Another benefit compared to cold cloning is that you can test things first on a dummy VM without any downtime at all.
This article documents all the steps needed. It assumes your old PM is running Debian Sarge with one of the 2.6.8 kernels, uses GRUB as bootloader, has rsync installed, and can be reached by the root user via SSH.

The procedure basically comes down to cloning a live system to a dead VM, stopping all services, doing a final synchronisation, and reviving the dead VM.

Step-by-step guide:

  • Create a VM with at least as much disk space as the PM.
  • Configure the VM to boot an ISO image of Ubuntu 6.06 LTS Desktop Edition.
  • Open up a shell, su to root, and partition the disk. If you stick to exactly the same partition scheme, you don’t have to change the fstab file. You can change the size without any problem too. If you decide to change the partition scheme, be sure to not split directories that contain hard links. For example, if your PM has just one big /-partition, and you decide that the new VM will have separate / and /usr partitions, this will not work because hard links cannot be created across partitions.
  • Once all partitions are created, make filesystems on them (don’t forget swap), and mount them in the correct order under a temporary directory, let’s say /root/oldbox (a rough sketch of this partitioning and mounting step can be found right after this list). Create root’s home directory /root/oldbox/root and in there create a file /root/oldbox/root/excluded that contains:

    /proc/*
    /sys/*
    /dev/*
    /mnt/*
    /tmp/*
    /root/excluded

    If you changed the partition scheme, you should put /etc/fstab here too, and manually put the correct one in place.
  • cd into /root/oldbox and rsync everything from the PM into it:
    rsync -avH --numeric-ids --delete \
    --exclude-from=root/excluded IP_of_PM:/ .

    This will take a while, depending on the amount of data your PM has.
  • Once everything is copied over, the time has come to shut down all data-writing services on your PM (mail, databases, etc). Ideally only the SSH daemon should be running. This means that most of your services will be offline from here on. The good thing is that this period can be kept really short.
  • Once you have made sure that nothing runs on your PM except SSH, rerun the rsync command. This time it will be quick, as only the diffs need to be transferred. These usually involve open files from databases, logfiles, etc.
  • Now create the initial device nodes needed for the kernel:
    cd /root/oldbox
    mknod -m 660 /root/oldbox/dev/console c 5 1
    mknod -m 660 /root/oldbox/dev/null c 1 3
  • mount the proc and dev filesystem and chroot into the /root/oldbox dir:

    mount -t proc none /root/oldbox/proc
    mount -o bind /dev /root/oldbox/dev
    chroot /root/oldbox
  • If we were to reboot now, the old initrd image would not load the proper modules (unless your PM happened to have an LSI controller already). We need to add the drivers for the LSI SCSI adapter that VMware presents. To do this, add the following to the file /etc/mkinitrd/modules (assuming your PM runs one of the 2.6.8 Debian kernels):

    mptscsih
    mptbase

    And regenerate the initrd image (depending on your specific kernel version):
    mkinitrd -o /boot/initrd.img-2.6.8-4-686-smp 2.6.8-4-686-smp
    (Since my PM had an older, custom kernel (2.6.8-3-686-smp), I installed a newer one, plus udev. During installation a new initrd image is generated automatically:
    apt-get install udev kernel-image-2.6.8-4-686-smp)
  • Now we need to reinstall the boot block. Run the grub command to enter the GRUB shell and see where it finds the stage1 file:
    find /boot/grub/stage1

    It should come up with something like (hd0,1). Use this as the argument for the next command:
    root (hd0,1)
    Then use only the hd part for the next command:
    setup (hd0)
    Then issue quit to leave the GRUB shell.
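
For reference, the partitioning and mounting step near the top of this list comes down to something like this (device names, partition numbers and the /, swap and /var layout are only an example; mirror whatever your PM actually uses):

    cfdisk /dev/sda                      # partition the empty virtual disk
    mke2fs -j /dev/sda1                  # ext3 filesystem for /
    mkswap /dev/sda2                     # swap
    mke2fs -j /dev/sda3                  # ext3 filesystem for /var
    mkdir -p /root/oldbox
    mount /dev/sda1 /root/oldbox         # mount / first...
    mkdir -p /root/oldbox/var
    mount /dev/sda3 /root/oldbox/var     # ...then the partitions below it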

By now your system is ready to boot. Leave the chroot environment (exit), unmount the dev and proc filesystems, then unmount all filesystems under /root/oldbox, issue sync, and then halt.
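
That last sequence, with the mount points used in this example, looks something like this (if you mounted extra partitions such as /var, unmount those before the root filesystem):

    exit                                 # leave the chroot
    umount /root/oldbox/dev /root/oldbox/proc
    umount /root/oldbox/var              # any extra partitions first...
    umount /root/oldbox                  # ...then the root filesystem
    sync
    halt
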
To avoid a network clash, unplug the network cable of the PM, or shut it down.
Now you can power on your VM; it should boot a virtualized copy of your Debian system 🙂

TODO – Some things to do afterwards (vmtools, etc)