One of the main problems in this configuration was obviously the storage. Storing the VMs on the local disks of each blade would have a number of critical disadvantages compared to running on a properly configured SAN/NAS array:
1. Reliability – a SAN/NAS array is usually more reliable than local disks.
2. Space – local disks are much more limited in capacity and in the ability to expand.
3. Redundancy – with a VM stored on the SAN, any SAN-connected ESX server can run it. The VM can be moved to another ESX host if its current server needs maintenance or in the event of a crash.
Although we had a SAN array with enough capacity for the project, we could not connect the blades to the SAN network because the enclosure does not support HBAs or SAN switch modules.
Here is what I did to configure the blades to use the SAN:
- Installed 2 Linux servers with 2 HBAs each and connected them to the SAN in the classic redundant scheme: 2 SAN switches forming 2 separate fabrics, each connected to 2 separate SPs of the SAN array, and each HBA of each server connected to a separate switch (a quick multipath sanity check is sketched after this list).
- The servers are allocated the same set of SAN LUNs to be used for storing VMs.
- The servers have IET (iSCSI Enterprise Target) software installed and export the same set of LUNs as iSCSI targets. IMPORTANT – the target exported for a given LUN must have the same target name (IQN) on both servers. This way iSCSI initiators will “see” the LUN as the same resource on both IET servers and will therefore have 2 possible paths for reaching it (see the ietd.conf sketch after this list).
- The servers are connected with 2 NICs to 2 separate network switches and have Linux NIC bonding configured for network-connection failover (a sample bonding configuration is sketched after this list). Alternatively, if more network bandwidth is required, the 2 NICs can be connected to the same switch and the ports configured with port aggregation for NIC load balancing and failover. With 4 NICs per server, the two configurations above can be combined to get both NIC and switch load balancing and failover.
- The blade servers’ enclosure, or any standalone server on which ESX will run, is connected with at least two NICs to separate network switches.
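How the two HBA paths are merged on the Linux servers is not spelled out above; the usual approach is device-mapper multipath, so that each shared LUN appears as a single /dev/mapper device reachable through both HBAs. A quick sanity check under that assumption:

    # On each IET server: list the multipath devices and confirm that every
    # shared SAN LUN shows two paths, one through each HBA
    multipath -ll

    # If dm-multipath is used, export the /dev/mapper/<name> devices from IET,
    # not the underlying /dev/sdX path devices, so that both HBA paths are used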
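The ietd.conf sketch referenced above: a minimal example of exporting one shared LUN under the same target name from both IET servers. The IQN, backing device, and CHAP credentials are made-up placeholders, and the identical stanza goes into /etc/ietd.conf on both servers:

    # /etc/ietd.conf -- identical on BOTH IET servers for each shared LUN
    Target iqn.2009-01.local.example:vmstore.lun0
        # Back the target with the shared SAN LUN (multipath device assumed)
        Lun 0 Path=/dev/mapper/vmstore0,Type=blockio
        # CHAP credentials the ESX initiators will present ("Use CHAP" in ESX)
        IncomingUser esxuser vmstoresecret

Because the target name is identical, the ESX initiator treats the two server portals as two paths to the same target rather than as two different devices.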
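The bonding configuration referenced above, sketched as RHEL-style network scripts (the interface names and the IP address are placeholders, and the exact file locations vary between distributions). mode=active-backup gives the plain failover setup; the port-aggregation alternative would use a different bonding mode plus matching switch-port configuration:

    # /etc/modprobe.conf (or a file under /etc/modprobe.d/)
    alias bond0 bonding
    options bond0 mode=active-backup miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.10.11        # iSCSI-facing address of this IET server
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 is analogous)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none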
For each ESX server:
- In ESX network configuration, the NICs are attached to the same virtual switch and grouped with NIC Teaming for failover.
- In ESX Storage Adapters configuration:
* In the CHAP configuration, choose “Use CHAP” and fill in the user name and secret as configured on the iSCSI target servers above (this assumes the CHAP configuration is identical on both servers; otherwise, separate CHAP settings will have to be entered for each target).
* In the Static Discovery tab, choose Add and fill in the server IP and the target name (IQN). Add all the targets that need to be accessible from this ESX server, and add each target twice – once from each IET server. Again, while the servers’ IPs are different, the target name must be identical for the same exported LUN.
* For CHAP and Advanced settings choose “Inherit from Parent” or adjust manually
if required.
* A rescan is usually required after the change.
* Back in the Storage Adapters configuration window, you will start seeing the devices and the list of possible paths to them.
- In ESX Storage configuration:
* In the wizard window, choose Disk/LUN.
* The next window will show the new targets; choose the target to add.
* For the VMFS mount options, specify “Keep the existing signature”.
* Finish the wizard and the target will appear in the list of data stores.
- Select the new data store and go to Properties.
- Choose Manage Paths – here you can select the path selection policy used for load balancing/failover (Round Robin, Most Recently Used, or Fixed).
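Once the ESX hosts have logged in to the targets, a quick way to confirm that every target really is being served from both IET servers is IET's proc interface (a sanity check, not part of the original write-up):

    # On each IET server: the exported targets/LUNs and their backing devices
    cat /proc/net/iet/volume
    # The active initiator sessions -- every ESX host should show up here on
    # BOTH IET servers, i.e. one path per IET server for each target
    cat /proc/net/iet/session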
This configuration has No Single Point of Failure (NSPF), assuming the SAN array itself is an NSPF device (which is usually the case for enterprise-level SAN boxes).
The solution uses existing (relatively old) hardware and does not require any financial investment, yet it provides obvious benefits and serves its purpose quite well.
Here is the diagram of the implementation above: