9 February 2010

XenServer 5.5 network performance woes

If you’re using Citrix XenServer, then you’ve undoubtedly run into network performance issues. These issues are even more apparent when your deployment uses an iSCSI SAN. I have been working through this problem for the last two months, and I’ve found countless complaints about Citrix XenServer consistently underperforming. The common denominator is an iSCSI SAN connected to the XenServers over copper on a gigabit network.

In most cases, users will notice that their PVs’ network connections are slow and that the SAN will not burst over 100Mbps, which is pathetic! To make matters worse, as your PVs receive more traffic, they’ll begin to display network fatigue and randomly drop packets or come to a screeching halt! So far, Citrix has not released anything concrete for users to implement. In fact, they’ve denied the issue, or rather declined to discuss it in public. I’m sure they’ll continue to do so until they’ve determined the root cause. In the meantime, there are a few steps you can take to work around the issue. To start off, you’ll want to disable the VIF/PIF checksum offloads using the script below.
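Before changing anything, it may be worth recording which offloads are currently switched on, so you have something to compare against afterwards. A minimal sketch, assuming the usual "feature: on/off" lines that `ethtool -k` prints; the helper name `offloads_on` and the interface name `eth0` are my own placeholders:

```shell
#!/bin/bash
# Read `ethtool -k` output on stdin and print the features reported as "on".
# Assumes each feature is printed as "<name>: on" or "<name>: off".
offloads_on() {
    awk -F': ' '$2 ~ /^on/ { print $1 }'
}

# Example (eth0 is an assumed interface name):
#   ethtool -k eth0 | offloads_on
```

Running the same check after the changes should show fewer features listed.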

If you’re using a Windows 2003 PV, then you can hack the registry and add “DisableTaskOffload” to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters:

  1. Reinstall XenTools 5.5
  2. On each virtual machine, click Start, click Run, type regedit, and then click OK.
  3. Locate and then click the following registry subkey:
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
  4. In the right pane, make sure that the DisableTaskOffload registry entry exists. If this entry does not exist, follow these steps to add the entry:
    a. On the Edit menu, point to New, and then click DWORD Value.
    b. Type DisableTaskOffload, and then press ENTER.
  5. Click DisableTaskOffload.
  6. On the Edit menu, click Modify.
  7. Type 1 in the Value data box, and then press ENTER.
  8. Exit Registry Editor.
  9. Restart all Virtual Machines
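If you have more than a handful of guests, clicking through regedit gets old fast. As a rough sketch (not something I have rolled out widely), the function below prints a .reg file that sets the same DisableTaskOffload value to 1. It assumes the plain REGEDIT4 file format and CRLF line endings; the function name and the file name in the usage note are made up for illustration:

```shell
#!/bin/bash
# Print a .reg file that sets DisableTaskOffload=1 under Tcpip\Parameters.
# REGEDIT4 is the plain-ANSI .reg format; lines end in CRLF because the
# file is meant to be imported on a Windows guest.
make_reg_file() {
    printf 'REGEDIT4\r\n\r\n'
    printf '[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters]\r\n'
    printf '"DisableTaskOffload"=dword:00000001\r\n'
}
```

Something like `make_reg_file > disable-task-offload.reg` writes the file; copy it to the guest, import it there (double-click it, or `regedit /s disable-task-offload.reg`), and restart the VM as in step 9.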

Bash script to disable VIF/PIF checksum (proceed at your own risk):

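#!/bin/bash
# Disable checksum and segmentation offloads on XenServer interfaces.
#   --local | -l : apply immediately to this host's devices with ethtool
#   (no option)  : store the settings in xapi for every VIF and PIF

# ethtool -K features to turn off: rx/tx checksumming, scatter-gather,
# TCP segmentation, UDP fragmentation and generic segmentation offload
if_modes="rx tx sg tso ufo gso"

if [[ "$1" == "--local" || "$1" == "-l" ]]; then
    echo -n "disabling checksum offloading for local devices... "
    # Every interface that ifconfig lists as an Ethernet device
    for iface in $(ifconfig | awk '$0 ~ /Ethernet/ { print $1 }'); do
        for if_mode in ${if_modes}; do
            # Not every device supports every feature, so errors are discarded
            ethtool -K "$iface" "$if_mode" off 2>/dev/null
        done
    done
    echo "done."
else
    echo -n "disabling checksum offloading in xapi settings... "
    # xe prints the UUIDs comma-separated; turn the commas into spaces to loop
    for VIF in $(xe vif-list --minimal | sed -e 's/,/ /g'); do
        # Uncomment to wipe every existing other-config key first:
        # xe vif-param-clear uuid=$VIF param-name=other-config
        for if_mode in ${if_modes}; do
            xe vif-param-set uuid="$VIF" other-config:ethtool-${if_mode}="off"
        done
    done
    for PIF in $(xe pif-list --minimal | sed -e 's/,/ /g'); do
        # Uncomment to wipe every existing other-config key first:
        # xe pif-param-clear uuid=$PIF param-name=other-config
        for if_mode in ${if_modes}; do
            xe pif-param-set uuid="$PIF" other-config:ethtool-${if_mode}="off"
        done
    done
    echo "done."
fi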
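
If you need a way back, here is an untested sketch of a rollback for the xapi half of the script. It assumes the ethtool-* keys did not exist in other-config before the script ran, so deleting them returns the VIFs and PIFs to their defaults; it does not restore any earlier values, and it does not touch anything changed with --local. The `run` and `remove_offload_keys` helpers and the `DRY_RUN` variable are my own names. By default it only prints the `xe` commands so you can review them first:

```shell
#!/bin/bash
# Rollback sketch: delete the ethtool-* keys from other-config on every
# VIF and PIF. Prints the xe commands by default; set DRY_RUN=0 to run them.

if_modes="rx tx sg tso ufo gso"

# Echo the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 is "vif" or "pif", $2 is the object's UUID.
remove_offload_keys() {
    local kind="$1" uuid="$2" if_mode
    for if_mode in ${if_modes}; do
        run xe "${kind}-param-remove" uuid="${uuid}" \
            param-name=other-config param-key="ethtool-${if_mode}"
    done
}

# Only walk the VIFs and PIFs when the xe CLI is present (a XenServer host).
if command -v xe >/dev/null 2>&1; then
    for kind in vif pif; do
        for uuid in $(xe "${kind}-list" --minimal | sed -e 's/,/ /g'); do
            remove_offload_keys "${kind}" "${uuid}"
        done
    done
fi
```

Run it once as-is to see what it would do, then again with `DRY_RUN=0` to apply it. As far as I can tell, the other-config settings are only picked up when an interface is next plugged, so expect to restart the VMs (and probably re-plug the PIFs or reboot the host) before the change shows.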

7 Responses to “XenServer 5.5 network performance woes”

  1. Marlon 6 May 2010 at 7:57 am #

    Oh man, you’re a lifesaver. I spent all day from 6am to 11pm trying to figure out why large SOAP responses were failing on my recently virtualized server. I did the regedit and tried the request again. It went through without issue. THANKS!

  2. Daniel 7 May 2010 at 8:09 pm #

    Hi, this is interesting. I’m currently having an issue with Xen VMs not being able to download/upload much faster than 100mbit/s; once the VMs get to about 14MB/s up or down, their network slows down to a few kbit/s, then goes back up.

    I’m interested in the above script. Can you give some more info on what it’s doing and how you believe it fixes the network issues?

    Regards, Daniel

    • Will 8 May 2010 at 9:05 pm #

      The script is simply disabling TCP checksum and offload on the VIF and PIF. Needless to say, what I recommend is not the guaranteed fix. Fact is, we’re merely getting rid of the “overhead”.

  3. Pete 29 June 2010 at 11:28 pm #

    Top script. So far, together with the 2003 reg fix, this appears to have moved us from 1Mb/s to a very good 300 – 500 Mbit/s transfer speed. We will continue testing and let you know if we have any problems. We are using a fibre-connected SAN but appear to have the problem with the guest machines’ transfers.

    • Will 30 June 2010 at 7:38 am #

      Cool! Let us know how it turns out. Have you also updated to the latest version of XenServer?

  4. Jawid 27 July 2010 at 8:51 am #

    Hi Mate,

    We have just done this on our server and it has worked like a charm. I am about to do this on a production platform, where it will cover about 50 VMs, and I am thinking about a rollback just in case things go wrong.

    Do you have a script that will undo the changes that the script makes, i.e. a reverse of the script above?

    Thanks for all your help! Your script was a lifesaver!

    • Will 27 July 2010 at 3:50 pm #

      Unfortunately, I do not have a script to reverse the changes.

