Friday, February 18, 2011

Redundant setup for VMware ESX using commodity hardware

We ran a hardware consolidation project for one of our clients, aimed at reducing the hardware footprint and power consumption in the data centre. Among numerous other activities, a big part of the site was virtualized to run on VMware ESX servers. There were no new equipment purchases planned, so we had to use some of our old but still fairly powerful boxes. In particular, one Dell PowerEdge 1855 blade enclosure had ESX installed on all of its 10 blades, and numerous physical boxes were moved (with the P2V converter) to the blades. Each blade had 2 local hard drives configured as RAID1 using its hardware RAID controller. The disks were low capacity (less than 100GB).

One of the main problems in this configuration was obviously storage. Storing the VMs on the local disks of each blade would have a number of critical disadvantages compared to running them from a properly configured SAN/NAS array:
1.    Reliability – a SAN/NAS array is usually more reliable than local disks.
2.    Space – local disks are much more limited in capacity and in the ability to expand.
3.    Redundancy – with a VM stored on the SAN, any SAN-connected ESX server can run it.
       The VM can be moved to another ESX host if its current server needs maintenance
       or in the event of a crash.


Although we had a SAN array with enough capacity for the project, we could not connect the blades to the SAN fabric, as the enclosure does not support HBA or SAN switch modules.

Here is what I did to configure the blades to use the SAN:

  • Installed 2 Linux servers with 2 HBAs each. Connected them to the SAN in the classic redundant scheme – 2 SAN switches forming 2 separate fabrics, each fabric connected to 2 separate SPs of the SAN array, and each HBA of each server connected to a separate switch.
  • The servers are allocated the same set of SAN LUNs to be used for storing VMs.
  • The servers have the IET (iSCSI Enterprise Target) software installed and export the same set of LUNs as iSCSI targets. IMPORTANT – a given LUN must be exported with the same target ID (IQN) on both servers. This way the iSCSI initiators will “see” the LUN as the same resource on both IET servers and thus will have 2 possible paths for reaching it (a minimal ietd.conf sketch is shown after this list).
  • The servers are connected with 2 NICs to 2 separate network switches and have Linux NIC bonding implemented for network connection failover. Alternatively, if more network bandwidth is required, the 2 NICs can be connected to the same switch and the ports configured with port aggregation for NIC load balancing and failover. With 4 NICs per server the two configurations can be combined to get both NIC and switch load balancing and failover (a bonding sketch is also shown after this list).
  • The blade servers’ enclosure, or any standalone server on which ESX will run, is connected with at least two NICs to separate network switches.
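
Here is a minimal ietd.conf sketch for the two IET servers. The IQNs, device paths and CHAP credentials are made-up placeholders (and the config file location may vary by distribution); the point to note is that the Target name for a given SAN LUN is identical on both servers, while only the servers’ IPs differ:

      # /etc/ietd.conf – identical on both IET gateway servers (placeholder values)
      # The target IQN must be the same on both servers so that the ESX initiators
      # see two paths to the same device.
      Target iqn.2011-02.local.iet:vmstore1
          Lun 0 Path=/dev/mapper/vmstore1,Type=blockio    # the SAN LUN as seen by this server
          IncomingUser vmware MySecret12345               # CHAP user/secret (ESX expects a fairly long secret)
      Target iqn.2011-02.local.iet:vmstore2
          Lun 0 Path=/dev/mapper/vmstore2,Type=blockio
          IncomingUser vmware MySecret12345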
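
And a minimal NIC bonding sketch for the same servers, assuming a RHEL-style distribution (interface names, IP addresses and file locations are placeholders and will differ on other distributions). mode=active-backup matches the two-switch failover variant; mode=802.3ad would correspond to the port-aggregation variant on a single switch:

      # /etc/modprobe.d/bonding.conf – bonding module options (placeholder values)
      alias bond0 bonding
      options bond0 mode=active-backup miimon=100

      # /etc/sysconfig/network-scripts/ifcfg-bond0 – the bonded interface
      DEVICE=bond0
      IPADDR=10.0.0.11
      NETMASK=255.255.255.0
      BOOTPROTO=none
      ONBOOT=yes

      # /etc/sysconfig/network-scripts/ifcfg-eth0 – repeat for eth1
      DEVICE=eth0
      MASTER=bond0
      SLAVE=yes
      BOOTPROTO=none
      ONBOOT=yes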

For each ESX server:
  • In ESX network configuration, the NICs are attached to the same virtual switch and grouped with NIC Teaming for failover.
  • In ESX Storage Adapters configuration:
          *  Choose the iSCSI adapter and click Properties
          *  In the CHAP configuration, choose “Use CHAP” and fill in the user name and secret
              as configured on the iSCSI servers above (assuming the configuration is
              identical on both servers; otherwise a separate CHAP configuration will need
              to be set for each target)
          *  In the Static Discovery tab, choose Add and fill in the server IP and the target IQN.
              Add all the targets you need to be accessible from this ESX server. Add each
              target twice – once per IET server. Again, while the servers’ IPs are
              different, the target IQN should be identical for the same LUN being exported
              (a CLI equivalent is sketched after these steps).
          *  For CHAP and Advanced settings choose “Inherit from Parent” or adjust manually
              if required.
          *  A rescan is usually required after the change.
          *  Once you get back to the Storage Adapters configuration window, you will start
              seeing the devices and the list of possible paths to them.

  • In ESX Storage configuration:
          *  Choose Add Storage
          *  In the wizard window choose Disk/LUN
          *  The next window will show the newly available LUNs. Choose the one to add.
          *  For VMFS mount options specify “Keep the existing signature”
          *  Finish the wizard and the new datastore will appear in the list of datastores
  • Select the new data store and go to Properties.
  • Choose Manage Paths – here you can select the path selection policy used for load balancing/failover (Round Robin, Most Recently Used, or Fixed). A CLI sketch for the discovery and path policy steps follows below.
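
For reference, the same static discovery entries can also be added from the command line on later ESXi releases. This is a sketch only – the adapter name, portal IPs and IQN are placeholders, and the esxcli namespaces shown are the ESXi 5.x-style ones (the steps above were done through the vSphere client GUI):

      # Add the same target IQN once per IET server portal (placeholder values)
      esxcli iscsi adapter discovery statictarget add -A vmhba33 -a 10.0.0.11:3260 -n iqn.2011-02.local.iet:vmstore1
      esxcli iscsi adapter discovery statictarget add -A vmhba33 -a 10.0.0.12:3260 -n iqn.2011-02.local.iet:vmstore1
      # CHAP can also be set per adapter or per target under "esxcli iscsi adapter auth chap"
      # Rescan so that the device and both of its paths show up
      esxcli storage core adapter rescan -A vmhba33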
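
Similarly, the path selection policy can be checked and changed from the command line (again assuming an ESXi 5.x-style CLI; the device ID below is a placeholder):

      # List devices with their current path selection policy
      esxcli storage nmp device list
      # Use Round Robin across the two IET gateways for a given device
      esxcli storage nmp device set --device naa.60001234deadbeef --psp VMW_PSP_RR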

This configuration has no single point of failure (NSPF), assuming the SAN box itself is an NSPF device (which is usually the case for enterprise-level SAN arrays).


The solution uses existing (relatively old) hardware and does not require any financial investment. Yet it provides obvious benefits and serves its purpose quite well.

 
Here is the diagram of the implementation above: