Virtualization

From WBITT's Cooker!

Revision as of 17:57, 13 April 2010 by Kamran

A little about the Author:

Muhammad Kamran Azeem [ CISSP, RHCE, OCP (DBA) ]. Working on High Performance Computing Clusters at Saudi Aramco. More at http://wbitt.com


Why this document?

I have been planning to create a CBT on Xen for a long time (more than a year, actually!). The objective of the CBT was to help everyone understand virtualization, and particularly to develop training material for passing the Red Hat Virtualization exam. Later, I thought that this would be too limited a scope, so I decided to include KVM as well. (I actually plan to include Citrix XenServer and VMware ESX/vSphere too, at a later stage!) Last week, I decided to go ahead with this idea of making the CBT. I thought that explaining all these basics at the beginning of the CBT, with no real action happening on the screen, would be too boring for the viewers/students. So I thought of making some sort of presentation in the OpenOffice Presenter software. Soon enough I had another problem on my hands: it became difficult to add points to the slides, because each point added to some random slide would cause a ripple effect. That is, I had to manually cut and move text from one slide to another, then another, and so on. That became too laborious. So I decided to use the cooker, which in fact I set up earlier for exactly these types of tasks. And that, my friends, is the reason for this document, which you see here! (It is part of the training material for the CBT.)

Virtualization

  • What is Virtualization?
  • Commonly known virtualization technologies
  • Advantages and disadvantages of virtualization
  • Types of Virtualization (Para / Full, etc)
  • Types of Hyper-visors
  • Technologies we will cover: XEN, KVM
  • Note: Most of the material was obtained from Wikipedia http://en.wikipedia.org/wiki/

What is Virtualization?

Virtualization is a term that refers to the abstraction of computer resources. In simpler words, it is the mechanism used to run multiple instances/copies of various operating systems inside a base operating system, mainly to utilize under-used resources on the physical host where the base operating system is running.

History of Virtualization

  • The IBM System/360 Model 67 (S/360-67) was a mainframe, and first shipped in July 1966. It included features to facilitate time-sharing applications, notably virtual memory hardware and 32-bit addressing.
  • CP/CMS was the first fully-virtualized virtual machine operating system, running on IBM System/360 Model 67, and evolved from the ground-breaking research system CP-40.
  • The S/360-67 included various hardware and microcode features that enabled full virtualization of the raw S/360 hardware. The full-virtualization concept was pioneered with CP-40 on custom hardware; its implementation on the S/360-67 made CP-67 possible.
  • It is important to note that full hardware virtualization was not an original design goal for the S/360-67.
  • Thus, in many respects, it can be said that IBM's CP-67 and CP/CMS products anticipated (and heavily influenced) contemporary virtualization software, such as VMware Workstation, Xen, and Microsoft Virtual PC.
  • The IBM System/370 (S/370) was a model range of IBM mainframes announced on June 30, 1970 as the successors to the System/360 family.
  • Full virtualization was not quite possible with the x86 platform until the 2005-2006 addition of the AMD-V and Intel VT extensions.
  • Many platform virtual machines for the x86 platform came very close, and claimed full virtualization even prior to the AMD-V and Intel VT additions. e.g. Parallels Workstation, VMware Workstation, VMware Server (formerly GSX Server), VirtualBox, etc.

VMware

  • VMware was founded in 1998 and delivered its first product, VMware Workstation, in 1999.
  • VMware and similar virtualization software for the x86 processor family must employ binary translation techniques to trap and virtualize the execution of certain instructions. These techniques incur some performance overhead as compared to a VM running on a natively virtualizable architecture.
  • VMware is closed source.

Xen

  • Xen originated as a research project (XenoServer) at the University of Cambridge, led by Ian Pratt, who later founded XenSource, Inc.
  • XenSource supports the development of the open source project and also sells enterprise versions of the software.
  • Details about Xen's design are in the 2003 research paper: Xen and the Art of Virtualization
  • Xen is open source software.
  • The first public release of Xen occurred in 2003.
  • Citrix Systems acquired XenSource, Inc in October 2007 and subsequently renamed XenSource's products under the Citrix brand.
  • Xen Management Consoles
    • Xen Tools
    • Ganeti
    • Perl-based MLN
    • Web-based HyperVM and FluidVM, Cloudmin
    • GUI applications Convirture (formerly XenMan) and Red Hat's Virtual Machine Manager, virt-manager.
    • Novell's PlateSpin Orchestrate also manages Xen VMs in SUSE Linux Enterprise Server.
  • Xen supported architectures are:
    • 32-bit x86 with PAE support
    • Intel 64/AMD64
    • Intel Itanium 2
    • Xen's Full-Virtualization additionally requires availability of Intel VT-x or AMD-V technology within the processor.
    • Note 1: Xen does not support committing more RAM to VMs (in total) than the total physical RAM on the physical host; that is, you cannot over-commit RAM.
    • Note 2: Xen does allow committing more (virtual) CPUs to VMs (in total) than the number of physical CPUs on the physical host. That will, however, have a negative effect on performance.
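The RAM and CPU notes above map directly onto a guest's Xen configuration file. A minimal sketch (all names, paths and values here are hypothetical):

```
name   = "guest1"
memory = 512          # MB; the total across all running VMs must fit in physical RAM
vcpus  = 4            # may exceed the physical CPU count, at a performance cost
disk   = ['phy:/dev/vg0/guest1,xvda,w']
```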

QEMU

QEMU was presented at the USENIX 2005 Annual Technical Conference. QEMU was written by Fabrice Bellard and is free software. Specifically, the QEMU virtual CPU core library is released under the GNU Lesser General Public License (GNU LGPL). Many hardware device emulation sources are released under the BSD license. Here is the link to his paper on QEMU: http://www.usenix.org/publications/library/proceedings/usenix05/tech/freenix/full_papers/bellard/bellard_html/index.html

QEMU is a machine emulator: it can run an unmodified target operating system (such as Windows or Linux) and all its applications in a virtual machine. QEMU itself runs on several host operating systems such as Linux, Windows and Mac OS X. The host and target CPUs can be different.

The primary usage of QEMU is to run one operating system on another, such as Windows on Linux or Linux on Windows. Another usage is debugging because the virtual machine can be easily stopped, and its state can be inspected, saved and restored. Moreover, specific embedded devices can be simulated by adding new machine descriptions and new emulated devices.

QEMU also integrates a Linux specific user mode emulator. It is a subset of the machine emulator which runs Linux processes for one target CPU on another CPU. It is mainly used to test the result of cross compilers or to test the CPU emulator without having to start a complete virtual machine.
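The user-mode emulator described above can be sketched as follows. This assumes a hypothetical ARM cross-compiled binary ./hello-arm, and guards against qemu-arm being absent:

```shell
#!/bin/sh
# User-mode emulation: run a single ARM Linux binary on a non-ARM host.
# Falls back to a message when qemu-arm or the target binary is unavailable.
run_arm() {
    if command -v qemu-arm >/dev/null 2>&1 && [ -x "$1" ]; then
        qemu-arm "$1"
    else
        echo "qemu-arm or target binary unavailable"
    fi
}
run_arm ./hello-arm
```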

QEMU is made of several subsystems:

  • CPU emulator (currently x86, PowerPC, ARM and Sparc)
  • Emulated devices (e.g. VGA display, 16450 serial port, PS/2 mouse and keyboard, IDE hard disk, NE2000 network card, ...)
  • Generic devices (e.g. block devices, character devices, network devices) used to connect the emulated devices to the corresponding host devices
  • Machine descriptions (e.g. PC, PowerMac, Sun4m) instantiating the emulated devices
  • Debugger
  • User interface

QEMU will be discussed at a proper time in this document.


QEMU is a generic and open source machine emulator and virtualizer.

When used as a machine emulator, QEMU can run OSes and programs made for one machine (e.g. an ARM board) on a different machine (e.g. your own PC). By using dynamic translation, it achieves very good performance.

When used as a virtualizer, QEMU achieves near-native performance by executing the guest code directly on the host CPU. QEMU supports virtualization when executing under the Xen hypervisor or using the KVM kernel module in Linux. When using KVM, QEMU can virtualize x86, server and embedded PowerPC, and S390 guests.

In conjunction with CPU emulation, it also provides a set of device models, allowing it to run a variety of unmodified guest operating systems; it can thus be viewed as a hosted virtual machine monitor. It also provides an accelerated mode for supporting a mixture of binary translation (for kernel code) and native execution (for user code), in the same fashion as VMware Workstation and Microsoft Virtual PC.

One feature exclusive to QEMU is that of portability: the virtual machines can be run on any PC, even those where the user has only limited rights with no administrator access, making the 'PC-on-a-USB-stick' concept very real.


QEMU has two operating modes:

User mode emulation :

QEMU can launch Linux or Darwin/Mac OS X processes compiled for one CPU on another CPU. Target OS system calls are thunked for endianness and 32/64 bit mismatches. WINE Windows API reimplementation and DOSEMU are the main targets for QEMU in user mode emulation. This mode also eases cross-compilation and cross-debugging.

Complete Computer System mode emulation

QEMU emulates a full computer system, including a processor and various peripherals. It can be used to provide virtual hosting of several virtual computers on a single computer. QEMU can boot many guest operating systems, including Linux, Solaris, Microsoft Windows, DOS, and BSD [1]; it supports emulating several hardware platforms, including x86, AMD64, ARM, Alpha, ETRAX CRIS, MIPS, MicroBlaze and SPARC.
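A minimal full-system emulation session might look like the sketch below. The image and ISO names are hypothetical, and the commands are guarded so that nothing breaks when QEMU is not installed:

```shell
#!/bin/sh
# Full-system emulation: create a blank disk image, then boot a guest
# machine from an installer ISO, with the image as its hard disk.
boot_guest() {
    if command -v qemu-system-x86_64 >/dev/null 2>&1 && [ -f "$2" ]; then
        qemu-img create -f qcow2 "$1" 5G             # 5 GB virtual hard disk
        qemu-system-x86_64 -m 512 -hda "$1" -cdrom "$2" -boot d
    else
        echo "qemu-system-x86_64 or install ISO unavailable"
    fi
}
boot_guest guest.img install.iso
```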

KVM

  • KVM is open source software.
  • KVM (Kernel-based Virtual Machine) was developed by Qumranet, Inc.
  • On September 4, 2008, Qumranet was acquired by Red Hat, Inc.
  • KVM is a full virtualization solution for Linux on x86 hardware containing virtualization extensions (Intel VT or AMD-V).
  • Using KVM, one can run multiple VMs running unmodified Linux or Windows images.
  • Each virtual machine has private virtualized hardware: a network card, disk, graphics adapter, etc.
  • The kernel component of KVM is included in mainline Linux, as of 2.6.20.
  • KVM management tools: ovirt, Virtual Machine Manager, etc.
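Since KVM requires the CPU virtualization extensions plus its kernel module (which exposes the /dev/kvm device node), a quick readiness check can be sketched like this (Linux paths assumed):

```shell
#!/bin/sh
# A host can run KVM guests when the CPU flags are present
# AND the kvm kernel module is loaded (visible as /dev/kvm).
kvm_ready() {
    if grep -qE '(vmx|svm)' /proc/cpuinfo 2>/dev/null && [ -e /dev/kvm ]; then
        echo "yes"
    else
        echo "no"
    fi
}
kvm_ready
```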

Parallels

  • Parallels uses Intel Core's virtualization technology to allow the virtual machine direct access to the host computer's processor. Much of Parallels' software is based on a lightweight hyper-visor architecture, which provides the guest operating system direct access to the computer's hardware. Each Parallels virtual machine functions like a real computer with its own processor, RAM, floppy, CD drives, hard drive and tools.
  • First released December 8, 2005, Parallels Workstation enables users to create multiple, independent virtual machines on one PC. Workstation consists of a virtual machine suite for Intel x86-compatible computers (running Microsoft Windows or Linux), which allows the simultaneous creation and execution of multiple x86 virtual machines. Workstation supports hardware x86 virtualization technologies such as Intel VT.
  • Parallels Virtuozzo Containers is an operating system-level virtualization product designed for large-scale homogeneous server environments and data centers. Parallels Virtuozzo Containers is compatible with x86, x86-64 and IA-64 platforms. Parallels Virtuozzo Containers was first released under Parallels' former parent company SWsoft. The Linux version was released in 2001, while the Windows version was released in 2005.
  • The recently released Parallels Workstation 4.0 Extreme gives virtual machines dedicated host graphics and networking resources, allowing the workloads of several machines to be consolidated on a single, high-performance workstation without sacrificing performance or flexibility.
  • Parallels Workstation 4.0 Extreme is the first software for workstations to support Intel Virtualization Technology for Directed I/O (Intel VT-d) for full GPU acceleration, making it possible to run 3-D modeling, visualization and High-Definition (HD) video programs in Windows and Linux virtual environments at full speed.

OpenVZ

  • OpenVZ is an operating system-level virtualization technology based on the Linux kernel and operating system. OpenVZ allows a physical server to run multiple isolated operating system instances, known as containers, Virtual Private Servers (VPSs), or Virtual Environments (VEs). It is similar to FreeBSD Jails and Solaris Zones.
  • As compared to virtual machines such as VMware and paravirtualization technologies like Xen, OpenVZ is limited in that it requires both the host and guest OS to be Linux (although Linux distributions can be different in different containers). However, OpenVZ claims a performance advantage; according to its website, there is only a 1–3% performance penalty for OpenVZ as compared to using a standalone server.[1] One independent performance evaluation[2] confirms this. Another shows more significant performance penalties[3] depending on the metric used.
  • OpenVZ is the basis of Virtuozzo Containers, a proprietary software product provided by Parallels, Inc. OpenVZ is licensed under the GPL version 2 and is supported and sponsored by Parallels whereas the company does not offer commercial end-user support for OpenVZ.
  • OpenVZ is divided into a custom kernel and user-level tools.

VirtualBox

  • Oracle VM VirtualBox is an x86 virtualization software package, originally created by German software company Innotek, now developed by Oracle Corporation as part of its family of virtualization products. It is installed on an existing host operating system; within this application, additional guest operating systems, each known as a Guest OS, can be loaded and run, each with its own virtual environment.
  • Supported host operating systems include Linux, Mac OS X, OS/2 Warp, Windows XP, Windows Vista, Windows 7 and Solaris; there is also an experimental port to FreeBSD.[1] Supported guest operating systems include a small number of versions of NetBSD and various versions of DragonFlyBSD, FreeBSD, Linux, OpenBSD, OS/2 Warp, Windows, Solaris, Haiku, Syllable, ReactOS and SkyOS.[2]
  • According to a 2007 survey by DesktopLinux.com, VirtualBox was the third most popular software package for running Windows programs on Linux desktops.
  • In January 2007, VirtualBox Open Source Edition (OSE) was released as free software, subject to the requirements of the GNU General Public License (GPL), version 2.
  • Sun Microsystems acquired Innotek, the original developers of VirtualBox, in February 2008.
  • Oracle Corporation acquired Sun in January 2010, at which point the product was re-branded as Oracle VM VirtualBox.

Other examples of x86 virtualization software

  • Microsoft's Virtual PC, Hyper-V, and Microsoft Virtual Server.
  • Open-source solutions: QEMU, Kernel-based Virtual Machine (KVM) and VirtualBox.
  • Research systems: Denali, L4, and Xen.
  • The following software conditionally makes use of the support offered by AMD-V and/or Intel VT:
    • KVM, VirtualBox, Xen, VMware ESX Server (also known as vSphere), Microsoft Hyper-V, Microsoft Virtual Server (also branded as Microsoft Virtual PC or Windows Virtual PC), Oracle VM (uses Xen), Parallels Workstation, Parallels Server, Sun xVM, Virtual Iron, VMware Workstation, VMware Fusion, VMware Server.

Virtualization terminology

  • Virtualization: Virtualization is a term that refers to the abstraction of computer resources. In simpler words, the mechanism to run multiple instances/copies of various operating systems inside a base operating system, mainly to utilize under-used resources on the physical host, where base operating system is running.
  • Hyper-Visor or Virtual Machine Monitor (VMM) : It is the software which manages and supports the virtualization environment. It runs the virtual machines and isolates them from real hardware. There are three types of Hyper-Visors.
    • Type 1 Hyper-visor: A hyper-visor running on bare metal hardware, e.g. Linux KVM, IBM z/VM, VMware ESX, etc
    • Type 2 Hyper-visor: Virtualization software that runs on the host OS. e.g. VMware workstation, VMware server (formerly known as GSX server), Parallels Desktop, Microsoft Virtual Server, etc.
    • Hybrid Hyper-visor: Runs directly on bare metal like Type 1, but depends heavily on drivers and support from one of its (privileged) virtual machines to function properly. e.g. Xen. Dom-0 is the special VM, which is needed by kernel-xen.
  • Emulator: An emulator is software which emulates all pieces of hardware for its VMs, e.g. VMware, QEMU, etc.
  • Shared Kernel: Used in chrooted / jailed virtual environments. All machines share the same kernel, and most of the libraries. Only some parts of the OS are (so called) "virtualized", or made available to the VM through separate directories.
  • Domain: Any virtual machine running on hyper-visor.
  • Domain-0 / Privileged Domain: A virtual machine having privileged access to the hyper-visor. It manages the hypervisor and the other VMs, and is always started first by the hyper-visor on system boot. Also referred to as the Management Domain or Management Console. Dom-0 can be used in a "Thick" or "Thin" model. The thick model means that a lot of software is present to assist virtual machine management; it is used on laptops, desktops, etc., for development and testing. The thin model means that Dom-0 is kept as thin as possible, providing just the bare minimum software components the hyper-visor needs to run the virtual machines properly. This results in less resource utilization by Dom-0, leaving more resources for the guest domains; it is used in production environments, on production servers, etc.
  • Domain-U / Guest Domains / User Domains: VM created by Dom-0. Sometimes simply known as Guest, or Dom-U.
  • PAE: Physical Address Extension, is a feature first implemented in the Intel Pentium Pro to allow x86 processors to access more than 4 gigabytes of random access memory if the operating system supports it. It was extended by AMD to add a level to the page table hierarchy, to allow it to handle up to 52-bit physical addresses, add NX bit functionality, and make it the mandatory memory paging model in long mode.
    • PAE is provided by Intel Pentium Pro (and above) CPUs - including all later Pentium-series processors except the 400 MHz bus versions of the Pentium M, as well as by other processors such as the AMD Athlon and later AMD processor models with similar or more advanced versions of the same architecture.
    • Required to be present on a 32-bit x86 CPU if para-virtualization is to be used. (This means that you can most certainly use Xen for para-virtualization on almost any hardware lying around in your office / home.)
  • Intel VT-x (sometimes Intel VT)
    • Intel VT-x (Virtualization Technology) is Intel's hardware assistance for processors running virtualization platforms.
    • Intel VT includes a series of extensions for hardware virtualization. The Intel VT-x extensions add migration, priority and memory handling capabilities to a wide range of Intel processors. By comparison, the VT-d extensions add virtualization support to Intel chipsets that can assign specific I/O devices to specific virtual machines (VMs), while the VT-c extensions bring better virtualization support to I/O devices such as network switches.
  • AMD-V
    • AMD-V (AMD Virtualization) is a set of hardware extensions for the x86 processor architecture. AMD designed the extensions to perform repetitive tasks normally performed by software and improve resource use and virtual machine (VM) performance.
    • AMD-V technology was first announced in 2004 and added to AMD's Pacifica 64-bit x86 processor designs.
    • By 2006, AMD's Athlon 64 X2 and Athlon 64 FX processors appeared with AMD-V technology, and today, the technology is available on Turion 64 X2, second- and third-generation Opteron, Phenom and Phenom II processors.

Processor capability identification tips:

On Linux, you can check the /proc/cpuinfo file and see if the flags line has "vmx" (for Intel) or "svm" (for AMD) in it. If the following command results in some text, then your CPU (whether Intel or AMD) has Hardware-Assisted Full Virtualization support.

egrep '(vmx|svm)' /proc/cpuinfo

If the command above does not return any results, or just returns to the command prompt silently, then your processor does not support Hardware-Assisted Full Virtualization. It should be noted, however, that sometimes this feature is turned off in the BIOS; therefore you should check your BIOS settings first to verify that.

Also, if your CPU is an older model and does not have the Intel VT-x or AMD-V technologies, all hope is not lost. Check if your CPU provides the PAE feature. If it does, you can still create Para-Virtual virtual machines on this machine using Xen. You can also use emulation-based full virtualization products such as QEMU, Bochs, VirtualBox, VMware Workstation, etc. Here is how you can check for PAE for your CPU in Linux:

grep pae /proc/cpuinfo

An additional tip to check whether your processor is 64-bit is to look for the flag "lm" (meaning "long mode") in the CPU flags. If the command below returns some text, you have a 64-bit processor:

grep -w lm /proc/cpuinfo

Also, just for convenience, another tip: how do you know whether you are running a 32-bit or a 64-bit Linux OS? This is important to know because sometimes someone has a physical machine with a 64-bit processor but, out of ignorance or need, has installed a 32-bit Linux OS on it. In such a case, he cannot use the full power/features of the CPU with the installed 32-bit OS. If you see x86_64 in the output of "uname -a" (just before the words GNU/Linux), you are running a 64-bit version of Linux. Seeing i386 or i686 in the output would mean that you are running a 32-bit Linux OS. The "lm" flag described above tells you whether your processor itself is 64-bit capable or not.

[kamran@test ~]$ uname -a
Linux lnxlan215 2.6.30.8-64.fc11.x86_64 #1 SMP Fri Sep 25 04:43:32 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[kamran@test ~]$
  • PVM: Para-Virtual Machine . A virtual machine created using Xen's para-virtualization technology.
  • HVM: Hardware-assisted Virtual Machine. A virtual machine created using Xen's or KVM's hardware-assisted full virtualization technology, on a physical host which supports the Intel VT-x/AMD-V extensions in the processor.
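The processor capability checks given earlier can be combined into one small script. This is only a sketch, and since it inspects /proc/cpuinfo alone, it cannot detect a feature that is merely disabled in the BIOS:

```shell
#!/bin/sh
# Summarize the host CPU's virtualization capability from /proc/cpuinfo:
#   hvm      - hardware-assisted full virtualization (Intel VT-x / AMD-V)
#   pae-only - no VT-x/AMD-V, but Xen para-virtualization is possible
#   none     - neither flag found
virt_support() {
    if grep -qE '(vmx|svm)' /proc/cpuinfo 2>/dev/null; then
        echo "hvm"
    elif grep -qw pae /proc/cpuinfo 2>/dev/null; then
        echo "pae-only"
    else
        echo "none"
    fi
}
virt_support
```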

Why use Virtualization?

  • Consolidation
    • Power, Rack/Desk/Floor Space, Hardware, HVAC, Wiring/cabling, etc.
  • Efficient utilization of under-utilized resources
    • CPU / memory, disks, bandwidth, etc.
  • Support for applications only supporting older versions of some OS.
  • Service / domain / role based compartmentalization
    • e.g. mail server and web server on separate VMs.
  • Fail-over and Load Balancing features.
  • Development and Testing
    • Developers can test code on test servers.
    • Easy roll-backs.
    • Replica of production server can be created as a VM, so patches, etc can be tested.
    • Programs/Applications targeted to run on different OSes / platforms can be tested, e.g. a web application that needs to be tested on Firefox running on Linux and on Windows.
    • Virus testing, Spam testing, password cracking, sniffing, DOS, etc can all be tested safely.
  • Training
    • Virtual labs can be setup with less resources
    • Security training can be delivered without concerns about breaking out into the production network.
    • Each student can have his own (virtual) lab in his own PC/Laptop, in addition to the lab provided by the instructor.
  • Virtual Appliances
    • Appliances, such as a hardened mail server, can be created, which simply need to be started as a VM in your host OS. The same can be done to create fully functional web hosting servers. (I have deployed a few web hosting servers using this method!)
    • Ease of machine migration in case of hardware failure. (e.g. No need to re-install / reconfigure your favourite mail server from scratch!)
  • Legacy application support
    • Legacy applications / programs , which do not support latest hardware or OS, etc, can be made to run on the OS they support, in a virtualized environment.
    • Such applications benefit from the speed of the newer hardware, and thus run faster.
    • And someone said, less screw-drivers!

Why “not” Virtualization?

  • Administration of more than a few VMs is more complicated, and more sensitive, than administration of the same number of physical servers. The various VM management interfaces, such as VMware's Virtual Infrastructure Center and KVM's oVirt, try to address this.
  • Live migration, the movement of a VM from one physical host to another, involves extra IPs, etc., plus shared storage and sometimes cluster file systems.
  • Various networking problems arise, such as firewalls, routing, switching, bridging etc.
  • Some service providers (such as ServerBeach) do not support "bridged connections" from the rented server. This adds extra complexity in managing the physical host's firewall, routing tables, etc.
  • Hardware needs to be more fault-tolerant, and relatively powerful, compared to single server/service requirements.
  • Console access, block device access, recovery, system trouble-shooting, etc, are complex areas to handle.

Virtualization Technologies

Emulation-based Full Virtualization

    • Slower than hardware-based full virtualization.
    • Hyper-visor simulates the virtual machine in software, by analyzing all instructions and converting each one appropriately before it gets to the CPU.
    • Dynamic translation is a technique used to improve performance: the hypervisor analyzes the binary instructions just before they are run, allowing safe instructions to run unmodified but converting sensitive instructions just before they execute. The converted code is also cached in memory, to speed up future executions of the same (sensitive) instructions.
    • Dynamic recompilation optimizes frequently reused sequences on the fly.
    • Full virtualization with dynamic recompilation is the basic technique used by VMware for its initial / basic products: VMware Workstation, VMware Server, etc.
    • Full emulation can also be used to simulate non-standard processor architectures, needed by different OS / applications, by converting all instructions.
    • This method of simulating/emulating results in very slow VMs.
    • QEMU and Bochs are examples of non-native / non-standard processor emulators for/on Linux.

Native / Hardware-based / Hardware-assisted Full Virtualization

    • Requires CPU based hardware acceleration. (Intel VT-x, AMD-V)
    • Bare-metal look and feel. Access to HW is controlled through hyper-visor.
    • Almost all code coming in from VM is run directly by the CPU of the physical host, without any changes, for efficiency.
    • The hyper-visor only steps in when the code coming in from the VM uses sensitive instructions, that would interfere with the state of the hyper-visor itself, or the environment it is supported by.
    • Such sensitive instructions must be intercepted by the hyper-visor and translated/replaced with safe equivalents before they are actually executed on the CPU of the physical host.
    • To do this, all sensitive instructions in the CPU's Instruction Set Architecture (ISA), must be defined as privileged.
    • The traditional x86 architecture's instruction set has about 17 instructions which are sensitive but not defined as privileged, so a hyper-visor is unable to trap such instructions coming from a VM. Even the latest Intel Itanium-2 has three instructions which are sensitive but still not defined as privileged.
    • Intel VT-x and AMD-V technologies were developed to overcome this problem on modern 32-bit and 64-bit x86 processors.
    • In Linux, Xen hyper-visor "can use" these new CPU features. Whereas, KVM "needs/requires" these features in the CPU, for it (KVM hyper-visor) to work.
    • Examples are: KVM, VMware ESX
    • Un-modified Guest OS can be used as VM. e.g. Windows.

Para Virtualization / Cooperative Virtualization

    • Works without the newly available CPU based hardware acceleration technologies, such as Intel VT-x and AMD-V.
    • e.g. Xen.
    • The "hyper-visor aware" code is integrated into the kernel of the operating systems running on the virtual machines. This results in a "modified kernel", commonly known as "kernel-xen" instead of simply "kernel". That is why you will see "kernel-xen-x.y" booting up when you power up your virtual machine OS. The base OS / Domain-0 already runs under kernel-xen. Generally, no other changes are required on the rest of the software on the virtual machines. Xen is the actual hyper-visor, which runs directly on the CPU of the physical host, with "full speed". In other words, the (modified) kernel of each virtual machine's OS actually runs on the hyper-visor, assuming the hyper-visor to be CPU itself. This happens to all the VMs. This eliminates the need to have a separate trapping / translation mechanism to be present in the hyper-visor.
    • The above description implies that only modified guest OSes, which understand the hyper-visor, can be used as VMs. That means Windows and family products cannot be run in a para-virtualization environment. (One of the excellent books on Xen, "The Book of Xen", mentions that, though the Xen team did port Windows to Xen during the development process, no "released" version of Windows can run under Xen in para-virtualized mode.)
    • This also means that all versions / derivatives of Linux, which have "kernel-xen" included in their package list can be used as Dom-U / Guests.
    • Only the hyper-visor has privileged access to the CPU, and is designed to be as small and limited as possible.
    • The Xen hyper-visor interacts with the OSes running under its control using very few well-defined interfaces, called hyper-calls. Xen has about 50 hyper-calls, compared to about 300 system calls for Linux!
    • Hyper-calls are "asynchronous", so that the hyper-calls themselves don't block other processes or other OSes.
    • The Xen paper, mentioned earlier (Xen and the art of virtualization), indicates performance degradation of less than 2 percent for standard work-load scenarios. And a degradation of between 10 and 20 percent for worst case scenarios!
    • The base OS, which actually installs Xen hyper-visor on the physical host, is also referred to as "Privileged Domain" or "Domain-0" or "Dom-0". This privileged domain is in-turn used to manage the hypervisor. This privileged domain manages all other virtual machines created under Xen hyper-visor. These other virtual machines are referred to as "Guest Domains" or "User Domains" or "Dom-U". That means the OS of the privileged domain, also runs as a VM, under Xen hyper-visor, just like other virtual machines on the same physical host, "but", "with more privileges". Dom-0 has direct access to the hardware resources of the physical host.
    • Para-Virtualization never allows emulation. That means that any guest OS will see the same processor, as seen by the Dom-0 / physical host / base-OS.
    • Para-Virtualization should always be selected, because of its speed and performance, if there is a choice.
    • Advantages are :
      • Works on older hardware, or on the hardware which does not have hardware-assisted full virtualization.
      • Works much faster than Emulated or Hardware based virtualization technologies. Sometimes outperforming the actual bare metal performance!
    • Performance can further be enhanced by presenting virtual block devices to the virtual machines, instead of real block devices. This means that special para-virtualized drivers need to be present in the OS running on the VM. The co-operation between the kernel and the hyper-visor can allow para-virtualized drivers to have much lower overhead than native drivers.

OS Virtualization

    • OS Chroot environments.
    • OpenVZ, Solaris Containers, FreeBSD jails, etc.
    • Shared kernel is the single point of failure.

Application Virtualization

    • An application runs inside a sandbox environment, e.g. the JRE running Java applets in a browser.

API-level Virtualization

    • Virtualization provided to support a single application.
    • e.g. WINE is used to run Windows programs in a Linux environment.

Xen Architecture

  • As mentioned earlier, the Xen hyper-visor runs directly on the machine's hardware, in place of the operating system. The OS kernel is in-fact loaded as a module by GRUB.
  • When GRUB boots, it loads the hyper-visor, "kernel-xen".
  • The hyper-visor then creates the initial Xen domain, Domain-0 (Dom-0 for short).
  • Dom-0 has full / privileged access to the hardware and to the hyper-visor, through its control interfaces.
  • xend, the user-space service, is started to support utilities, which in-turn can install and control other domains and manage the Xen hyper-visor itself.
  • It is critical, and thus included in Xen's design, to secure Dom-0. If Dom-0 is compromised, the hyper-visor and the other virtual machines/domains on the same machine can also be compromised.
  • One Dom-U cannot directly access another Dom-U. All user domains are accessed only through Domain-0.
  • In RHEL 5.x the hyper-visor uses kernel-xen. The RHEL5.x user domains also use (their own, individual) kernel-xen to boot up.
  • User domains running RHEL 4.5 and higher (4.x), use kernel-xenU as their boot kernel.
  • RHEL 4 cannot be used as Dom-0.
  • Fedora 4 to Fedora 8 can be used as Dom-0. Later versions of Fedora 9-12, cannot be used as Dom-0.
  • All versions of Fedora can be used as Dom-U.
  • On the contrary, fully-virtualized hardware virtual machines (user/guest domains), must use normal "kernel", instead of "kernel-xen".

The Privilege Rings architecture

  • Security Rings, also known as privilege rings or privilege levels, are a design feature of all modern-day processors. The lowest-numbered ring has the highest privilege and the highest-numbered ring has the lowest privilege. Normally four rings are used, numbered 0-3.
  • In a non-virtualized environment, the normal Linux Operating System's kernel runs in ring-0, where it has full access to all of the hardware. The user-space programs run in ring-3, which has limited access to hardware resources. For any access needed to the hardware, the user-space programs make requests to system programs in ring-0.
  • Para-virtualization works using Ring-Compression. In this case, the hyper-visor itself runs in ring-0. The kernels of Dom-0 and the Dom-Us of the PVMs run in lower-privileged rings, in the following manner:
    • On 32-bit x86 machines, Dom-0 and Dom-U kernels run in ring-1; and segmentation is used to protect memory.
    • On 64-bit architectures, segmentation is not supported. In that case, kernel-space and user-space for virtual domains must both run in ring-3. Paging and context switches are used to protect the hyper-visor, and also to protect the kernel-address-space and user-spaces of the virtual domains from each other.
  • Hardware-assisted Full Virtualization works differently. The new processor instructions (Intel VT-x and AMD-V) place the CPU in new execution modes, depending on the situation.
    • When executing instructions for a hardware-assisted virtual machine (HVM), the CPU switches to "non-privileged" or "non-root" or "guest" mode, in which the VM kernel can run in ring-0 and its userspace can run in ring-3.
    • When an instruction arrives which must be trapped by the hyper-visor, the CPU leaves this mode and returns to the normal "privileged" or "root" or "host" mode, in which the privileged hyper-visor is running in ring-0.
    • Each virtual CPU of each virtual machine has a virtual machine control block/structure associated with it. This block is 4KB in size, also known as a page. This block/structure stores the information about the state of the processor in that particular virtual machine's "guest" mode.

Note that this can be used in parallel to para-virtualization. This means that some virtual machines may be setup to run as para-virtualized, while at the same time, on the same physical host, other virtual machines may use the virtualization extensions (Intel VT-x/AMD-V). This is of-course only possible on a physical host which has these extensions available and enabled in the CPU.
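Whether a processor advertises these extensions can be checked from its CPU flags. Here is a minimal sketch; the sample flags line below is hypothetical, and on a real host you would simply grep /proc/cpuinfo directly:

```shell
# Check for hardware virtualization extensions in the CPU flags.
# "vmx" means Intel VT-x, "svm" means AMD-V.
# On a real host: grep -E 'vmx|svm' /proc/cpuinfo
sample_flags="flags : fpu vme pae mce cx8 vmx ssse3"  # hypothetical sample line

if echo "$sample_flags" | grep -qE 'vmx|svm'; then
    result="hardware-assisted virtualization possible"
else
    result="para-virtualization only"
fi
echo "$result"
```

Remember that the flag may be present but disabled in the BIOS; the BIOS setting must be enabled as well.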

Xen Networking Concepts / Architecture

This is the most important topic after the basic Xen Architecture concepts. Disks / Virtual Block Devices will be following it.

Xen provides two types of network connectivity to the guest OS / Dom-Us.

  1. Shared Physical Device (xenbr0)
  2. Virtual Network (virbr0)
  • Shared Physical Device (xenbr0)
    • When a VM needs to have an IP on the same network to which the Xen physical host is connected, it needs to be "bridged" to the physical network. "xenbr0" is the standard bridge, or virtual switch, which connects VMs to the same network where the physical host itself is connected.
    • xenbr0 never has an IP assigned to it, because it is just a forwarding switch/bridge.
    • This kind of connectivity is used when the VMs have publicly accessible services running on them, such as an email server, a web server, etc.
    • This is a much easier mode of networking / network connections to the VMs.
    • This was the default method of networking VMs in RHEL 5.0.
  • Virtual Network (virbr0)
    • When a VM does not have to be on the same network as the physical host itself, it can be connected to another type of bridge / virtual switch, which is private to the physical host only. This is named "virbr0" in Xen, and in KVM installations too.
    • The Xen physical host is assigned a default IP of 192.168.122.1, and is connected to this private switch. All VMs created / configured to connect to this switch get an IP on the same private subnet, 192.168.122.0/24. The physical host's 192.168.122.1 interface works as a gateway for these VMs' traffic to go out of the physical host, and allows them to communicate with the outside world.
    • This communication is done through NAT. The physical host / Dom-0 acts as a NAT router, and also as a DHCP and DNS server for the virtual machines connected to virbr0. A special service running on the physical host / Dom-0, named "dnsmasq", does this.
    • It should be noted that the DHCP service running on the physical host / Dom-0 does not create any conflict with any other DHCP server on the network to which the public interface of the physical host is connected, because it listens only on virbr0. Thus it is safe.
    • This mechanism is mostly used in test environments, as it allows each developer / administrator to have his own virtual machines, in a sandbox / isolated environment, within his PC / laptop, etc.
    • Since the machines do not obtain their IPs from the public network of the physical host, public IPs are not wasted.
    • Another advantage is that even if your physical host is not connected to any network, virbr0 still has an IP (192.168.122.1), thus all virtual machines and the physical host are always connected to each other. This is not possible in Shared Physical Device mode (xenbr0), because if the network cable is unplugged from the physical host and it does not have an IP of its own, the virtual machines also don't have an IP of their own. (Unless they are configured with static IPs, of-course.)
    • Some service providers do not provide the bridging functionality (xenbr0). ServerBeach is one of them. Thus I had to use virbr0 and adjust the firewall rules to make the virtual machine accessible from outside. (This example is not covered in this text. I may explain it in the CBT.)
    • This is the default network connectivity method for VMs in RHEL 5.1 and onwards.
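The NAT behaviour of virbr0 boils down to a masquerading rule on Dom-0. The fragment below is a sketch of the kind of rule set up for the private subnet; the actual rule set installed by libvirt is longer, and this is shown for illustration only (it requires root on the physical host):

```shell
# Masquerade traffic leaving the private 192.168.122.0/24 subnet, so
# that VM packets appear to come from the physical host's own address.
# (Illustrative fragment, not the complete libvirt rule set.)
iptables -t nat -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
```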


  • Each Xen guest domain / Dom-U can have up to three (3) virtual NICs assigned to it. The physical interface on the physical host / Dom-0 is renamed to "peth0" (Physical eth0). This becomes the "uplink" from the Xen physical host to the physical LAN switch. In fact, a virtual network cable runs from this peth0 to the virtual bridge created by Xen.
  • Virtual Network Interfaces, with the naming scheme vifD.N, are created in Dom-0 as network ports for the bridges, and are connected to the virtual network interfaces (eth0) of each virtual machine. "D" is the "Domain-ID" of the Dom-U and "N" is the "NIC number" in the Dom-U. It should be noted that vifD.N never has an IP assigned to it. The IP is assigned to the actual virtual network interface in the domain it is pointing to.
  • Note that both "peth0" and "vifD.N" have the MAC address "FE:FF:FF:FF:FF:FF". This is because they are (so-called) ports connecting to the outer world (in the case of peth0), and to the internal virtual network (in the case of vifD.N). This indicates that the actual MAC address of the back-end virtual network interface card of the domain will take precedence.
  • Also note that "virbr0" will always have a MAC address of "00:00:00:00:00:00". Remember this is a NAT router component, and all MAC addresses are stripped off at the router level / layer-3. Therefore the virtual machines can have any MAC address, but when their packets go out of this interface, the source address is replaced with that of the eth0 of Dom-0. (This is because the packet is NATed.)
  • All virtual NICs of the VMs have MAC addresses with the vendor code "00:16:3E".
  • Dom-0's physical interface was renamed to peth0 in the point explained above. Therefore, for Dom-0 to communicate with the world, it needs a network interface. A virtual network interface "eth0" is assigned to it, and this eth0 of Dom-0 is then assigned an appropriate IP. This eth0 of Dom-0 is connected to the virtual bridge already created by Xen. The corresponding virtual interface on Dom-0 for this one is vif0.0. See the computer output a little below to understand this.
  • For example, if there is a Dom-U named "Fedora-12" with a domain-id of "3", and that Dom-U has only one virtual NIC, "eth0", then there will be an interface defined in Dom-0 by the name vif3.0, which means that a virtual network cable from Dom-0's bridge is connected to the eth0 of domain "Fedora-12". In the computer output below, this vif (vif3.0) is not shown, because that is a freshly installed Xen server, which doesn't have any VMs created on it at the moment.
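The vifD.N naming scheme can be illustrated with a small shell function. This is a hypothetical helper written just for this text, not something shipped with Xen:

```shell
# Hypothetical helper: given a backend interface name like "vif3.0",
# report which Dom-U and which guest NIC it belongs to, per the
# vifD.N scheme ("D" = Domain-ID, "N" = NIC number inside the Dom-U).
describe_vif() {
    local name="$1"
    local dom="${name#vif}"   # strip the "vif" prefix  -> "3.0"
    dom="${dom%%.*}"          # keep the part before the dot -> "3"
    local nic="${name##*.}"   # keep the part after the dot  -> "0"
    echo "domain-id=$dom guest-nic=eth$nic"
}

describe_vif vif3.0   # prints: domain-id=3 guest-nic=eth0
```

So vif3.0 is the Dom-0 end of a virtual cable whose other end is eth0 inside the domain with id 3.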


Here is the output of ifconfig command, from a freshly installed Xen physical host. There are a few concepts, which you must master, before you move on.

[root@xenhost ~]# ifconfig                            
eth0      Link encap:Ethernet  HWaddr 00:13:72:81:3A:3D  
          inet addr:192.168.1.20  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::213:72ff:fe81:3a3d/64 Scope:Link             
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1             
          RX packets:42 errors:0 dropped:0 overruns:0 frame:0            
          TX packets:37 errors:0 dropped:0 overruns:0 carrier:0          
          collisions:0 txqueuelen:0                                      
          RX bytes:5919 (5.7 KiB)  TX bytes:5150 (5.0 KiB)               

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host     
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0                           
          RX bytes:588 (588.0 b)  TX bytes:588 (588.0 b)      

peth0     Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1     
          RX packets:32 errors:0 dropped:0 overruns:0 frame:0
          TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000                         
          RX bytes:5147 (5.0 KiB)  TX bytes:4762 (4.6 KiB)     
          Interrupt:16 Memory:fe8f0000-fe900000                

vif0.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1     
          RX packets:64 errors:0 dropped:0 overruns:0 frame:0
          TX packets:62 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0                            
          RX bytes:9412 (9.1 KiB)  TX bytes:7239 (7.0 KiB)     

virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:468 (468.0 b)

xenbr0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:578 (578.0 b)  TX bytes:0 (0.0 b)

[root@xenhost ~]#

And here is the output of the ifconfig command from a KVM physical host. This is shown here just to show the difference in networking models between Xen and KVM.

[root@kworkbee ~]# ifconfig 
eth0      Link encap:Ethernet  HWaddr 00:1C:23:3F:B8:80  
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21c:23ff:fe3f:b880/64 Scope:Link            
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1            
          RX packets:750557 errors:0 dropped:0 overruns:0 frame:0       
          TX packets:503960 errors:0 dropped:0 overruns:0 carrier:0     
          collisions:0 txqueuelen:1000                                  
          RX bytes:857738915 (818.0 MiB)  TX bytes:55480683 (52.9 MiB)  
          Interrupt:17                                                  

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host     
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:326 errors:0 dropped:0 overruns:0 frame:0
          TX packets:326 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:11701 (11.4 KiB)  TX bytes:11701 (11.4 KiB)

virbr0    Link encap:Ethernet  HWaddr 8A:E3:7A:EA:A6:A3
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::88e3:7aff:feea:a6a3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:144 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:17135 (16.7 KiB)

[root@kworkbee ~]#


One of the most important pieces of text related to Xen Networking is at http://wiki.xensource.com/xenwiki/XenNetworking . This article has diagrams to explain Xen Networking concepts. Reproducing some of the text here:

Xen creates, by default, seven pairs of "connected virtual ethernet interfaces" for use by dom0. Think of them as two ethernet interfaces connected by an internal crossover ethernet cable. veth0 is connected to vif0.0, veth1 is connected to vif0.1, etc, up to veth7 -> vif0.7. You can use them by configuring IP and MAC addresses on the veth# end, then attaching the vif0.# end to a bridge.

Every time you create a running domU instance, it is assigned a new domain id number. You don't get to pick the number, sorry. The first domU will be id #1. The second one started will be #2, even if #1 isn't running anymore.

For each new domU, Xen creates a new pair of "connected virtual ethernet interfaces", with one end in domU and the other in dom0. For linux domU's, the device name it sees is named eth0. The other end of that virtual ethernet interface pair exists within dom0 as interface vif<id#>.0. For example, domU #5's eth0 is attached to vif5.0. If you create multiple network interfaces for a domU, its ends will be eth0, eth1, etc, whereas the dom0 end will be vif<id#>.0, vif<id#>.1, etc.

When a domU is shutdown, the virtual ethernet interfaces for it are deleted.

There is another excellent explanation of some of the Xen Networking Concepts at http://www.novell.com/communities/print/node/4094 . I am reproducing some part of it below. You should still read this document in full.

The following outlines what happens when the default Xen networking script runs on single NIC system:

  1. the script creates a new bridge named xenbr0
  2. "real" ethernet interface eth0 is brought down
  3. the IP and MAC addresses of eth0 are copied to virtual network interface [1] veth0
  4. real interface eth0 is renamed peth0
  5. virtual interface veth0 is renamed eth0
  6. peth0 and vif0.0 are attached to bridge xenbr0 as bridge ports
  7. the bridge, peth0, eth0 and vif0.0 are brought up

The process works wonderfully if there is only one network device present on the system. When multiple NICs are present, this process can get confused, or limitations can be encountered.
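The resulting topology can be inspected with brctl (from the bridge-utils package) on the Xen host. The output below is a sketch of what a single-NIC host typically shows; the bridge id and interface names will vary per setup:

```shell
# Show all bridges and the ports attached to them. On a single-NIC
# Xen host, xenbr0 should have peth0 and vif0.0 as its ports.
brctl show
#
# bridge name     bridge id               STP enabled     interfaces
# xenbr0          8000.feffffffffff       no              peth0
#                                                         vif0.0
```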

In this process, there are a couple of things to remember:

  • pethX is the physical device, but it has no MAC or IP address
  • xenbrX is the bridge between the internal Xen virtual network and the outside network, it does not have a MAC or IP address
  • vethX is a usable end-point for either Dom0 or DomU and may or may not have an IP or MAC address
  • vifX.X is a floating end-point for vethX's that is connected to the bridge
  • ethX is a renamed vethX that is connected to xenbrX via vifX.X and has an IP and MAC address

netloop

In the process of bringing up the networking, veth and vif pairs are brought up. For each veth device, there is a corresponding vif device. The veth devices are given to the DomU's, while the corresponding vif device is attached to the bridge. By default, seven of the veth/vif pairs are brought up. Each physical device consumes a veth/vif pair, thereby reducing the number of veth/vifs available for DomU's.

When a new DomU is started, a free veth/vif pair is used. The vif device is given to the DomU and is presented within DomU as ethX. (note: the veth/vif bridge is loosely like an ethernet cable. The veth end is given to either Dom0 or DomU and the vif end is attached to the bridge)

For most installations, the idea of having seven virtual machines run at the same time is somewhat difficult (though not impossible). However, for each NIC card there has to be a bridge, peth, eth and vif device. Since eth and vif devices are pseudo devices, the number of netloops is decremented for each physical NIC beyond the assumed single NIC.

   * With one NIC, 7 veth/vif pairs are present
   * Two NICs will reduce the veth/vif pairs available to 5
   * Two NICs bonded will reduce the veth/vif available to 4
   * Three NICs will reduce the veth/vifs available to 3
   * Three NICs bonded presented as a single bond leaves 0 veth/vifs available
   * Four NICs will result in a deficit of -1 veth/vifs
   * Four NICs bonded into one bond results in a deficit of -3 veth/vifs
   * Four NICs bonded into two bonds results in a deficit of -4 veth/vifs

Where most people run into problems is with bonding. The network-multinet script enables the use of bonding. It is easy to see where one could run into trouble with multiple NICs.

The solution is to increase the number of netloop devices, thereby increasing the number of veth/vif pairs available for use.

  • In /etc/modprobe.d create and open a file for editing called "netloop"
  • Add the following line to it
      options netloop nloopbacks=32
  • Save the file
  • Reboot to activate the setting

It is recommended to increase the number of netloops in any situation where multiple NICs are present. When a deficit of netloops exists, sporadic and odd behavior has been observed, including a completely broken networking configuration.

Check-list for performing an actual Xen installation on Physical Host

Alright, after the necessary theory covered in the text before this, we will now get to the actual fun part: installing Xen on a machine. Here are things to check before you start.

  • Make sure that PAE, at a minimum, is supported by your processor(s); it is needed by Xen if para-virtualization is to be used.
  • Make sure you have enough processors / processing power for both Dom-0 and Dom-U to function properly.
  • If you want to use Hardware-assisted full virtualization, make sure that Intel VT-x/AMD-V extensions are available in your processor(s).
  • At least 512 MB RAM for each domain, including Dom-0 and each Dom-U. It can be brought down to 384 MB, or even 256 MB in some cases, depending on the software configuration you select.
  • Enough space in the active partition of the OS, for each VM, if you want to use one large file as the virtual disk for each of your virtual machines. Xen creates virtual disks in the location /var/lib/xen/images .
  • You can also create virtual disks on Logical Volumes and snap-shots, as well as on a SAN, normally ISCSI based IP-SAN.
  • Enough free disk area, to create raw partitions, which can be used by virtual machines, as their virtual disks. In this case, free space in active linux partitions is irrelevant.
  • Install Linux as you would normally. (We are only focusing on RHEL, CENTOS, Fedora in this text, though there are other distributions out there too.)
  • You will need an X interface on this machine, if you want to use virt-manager, which is the GUI interface for libvirt, which in-turn controls Xen, KVM and QEMU.
  • You may want to select the package group named "Virtualization" during install process.
  • If you did not select the package-group "Virtualization" during install process, install it now.
  • Make sure that kernel-xen , xen , libvirt and virt-manager are installed.
  • Make sure that your default boot kernel in GRUB is the one with "kernel-xen" in it. You can also set "DEFAULTKERNEL=kernel-xen" in the /etc/sysconfig/kernel file.
  • You may or may not want to use SELINUX. If you are not comfortable with it, disable it. Xen and KVM are SELINUX aware, and work properly (with more security) when used (properly) on top of SELINUX enabled Linux OS.
  • Lastly, make sure that xend and libvirtd services are set to ON on boot up.
chkconfig --level 35 xend on
chkconfig --level 35 libvirtd on
  • It is normally quite helpful to disable un-necessary services, depending on your requirements. I normally disable sendmail, cups, bluetooth, etc, on my servers.
  • It is important to know that while creating para-virtual machines, you cannot use an ISO image of your Linux distribution, stored on the physical host, as the install media for the VMs being created. When you need to create PV machines, you will need an exploded version of the install media of RHEL/CENTOS 4.5 or higher, accessible to this physical host. Normally this is done by storing the exploded tree of the installation CD/DVD on the hard disk of the physical host and making it available through NFS, HTTP or FTP. Therefore you must cater for this additional disk-space requirement when you are installing the base OS on the physical host.
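As a sketch, exposing an exploded install tree over HTTP could look like the following. The ISO file name and directory paths here are hypothetical, and the commands must be run as root on a physical host with httpd installed:

```shell
# Loop-mount the distribution ISO and copy its contents into the web
# server's document root, so PV guests can use it as an HTTP install
# tree. (Hypothetical paths; run as root on the physical host.)
mount -o loop /data/isos/centos-5.iso /mnt/iso
mkdir -p /var/www/html/centos5
cp -a /mnt/iso/. /var/www/html/centos5/
umount /mnt/iso

service httpd start    # the tree is then reachable at http://<host>/centos5/
```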