Thursday, February 9, 2012

IGMP Snooping-Enabled NLB on Cisco IOS

Microsoft NLB. What can I say...it's free, and it's Microsoft, so you're not getting a premium solution. In a virtual environment, where an NLB member can move from physical server to physical server, the real fun begins. Many HOWTOs, including Cisco's own, will have you placing static CAM entries everywhere. BLEH! I hope to show you how to avoid that.

The MS rundown of NLB as a whole:
http://technet.microsoft.com/en-us/library/bb742455.aspx

Cisco's HOWTO (needs work as you'll see below):
http://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml

There's a problem with this HOWTO from Cisco: it's a bit messy. I want to give some credit first. I gathered some of this data from this forum post, and I wouldn't have pieced it together otherwise.

Let's begin.

Microsoft NLB, when running in IGMP-enabled multicast mode (at least in 2008 R2), uses a MAC address from the IANA multicast range (01-00-5E-...), not a non-IANA one. This is an important point that I think Cisco overlooked in their guide, because with it you don't need the static CAM entry; you just need IGMP snooping.
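To make the difference concrete, here is a minimal Python sketch of how the cluster MAC is commonly described as being derived from the cluster IP in each multicast mode. The function name and the example IP 10.1.1.100 are my own for illustration, and the derivation rules are the convention as I understand it from Microsoft's documentation, so verify against what NLB Manager actually shows for your cluster.

```python
import ipaddress


def nlb_cluster_addresses(cluster_ip: str) -> dict:
    """Derive the NLB cluster MACs from the cluster (virtual) IP.

    Assumed convention (check NLB Manager for your cluster):
      plain multicast : 03-BF-w-x-y-z       (not in the IANA 01-00-5E range)
      IGMP multicast  : 01-00-5E-7F-y-z     (IANA range, group 239.255.y.z)
    where w.x.y.z are the four octets of the cluster IP.
    """
    w, x, y, z = ipaddress.IPv4Address(cluster_ip).packed
    return {
        "multicast_mac": "03-BF-%02X-%02X-%02X-%02X" % (w, x, y, z),
        "igmp_multicast_mac": "01-00-5E-7F-%02X-%02X" % (y, z),
        "igmp_group": "239.255.%d.%d" % (y, z),
    }


if __name__ == "__main__":
    for name, value in nlb_cluster_addresses("10.1.1.100").items():
        print(f"{name}: {value}")
```

The IGMP-mode MAC is exactly the Ethernet address that the group 239.255.y.z maps to, which is why a snooping switch can learn it from the servers' IGMP reports instead of needing a static CAM entry.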

IGMP snooping won't work unless the switch actually sees IGMP joins from the servers (virtual or otherwise). So you need an IGMP router on that VLAN/segment to advertise its presence with queries, so that the Windows servers respond and the snooping can happen. To do that, you need to either A) enable PIM (and therefore IGMP) on the interface or B) simply enable the interface as an "IGMP querier". I'll leave it up to the reader to find their own platform's multicast configuration guide for the commands, but I will warn you of two things:

1) Make sure multicast-routing is turned on (in the VRF your interface is on, if you're doing VRFs).
2) You will not see any "joined groups" in your show ip igmp command output.

Next, Cisco's note about process switching. Bug CSCsw87563 addresses it for the 6500 platform; I'm not sure about the others. In my environment, I've added zero CAM entries because the bug is "fixed" for my platform. If you're in the same boat, good for you. If not...you really should add them, because process switching is terrible. Even if you are affected by this, you only need to put the static CAM entry on the switch with the SVI. All downstream switches will snoop and L2 switch with ease.

>>>>>A quick Bug Toolkit search revealed nothing on the popular 3560/3750 and 4500 lines. I am very interested if anyone can find more info on this process-switching issue on other platforms....even NX-OS!

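Finally, all my work has been about avoiding static CAM entries tied to a physical interface on every switch (which is what the HOWTOs use to stop the switches from flooding). You still need a single static ARP entry on the router, mapping the cluster IP to the cluster's multicast MAC.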
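As a rough illustration of that one remaining static entry, the sketch below turns a cluster IP and the Windows-style cluster MAC into an IOS-style static ARP line. The helper name and the example values are hypothetical, and the `arp <ip> <mac> ARPA` form is the global-config syntax as I know it from IOS, so check your platform's command reference (especially if the SVI lives in a VRF) before pasting anything.

```python
import ipaddress


def cisco_static_arp(cluster_ip: str, cluster_mac: str) -> str:
    """Build an IOS-style static ARP line for an NLB cluster.

    cluster_mac may use '-', ':' or '.' separators (e.g. 01-00-5E-7F-01-64).
    The output format is an assumption based on classic IOS syntax.
    """
    ip = ipaddress.IPv4Address(cluster_ip)  # raises ValueError on a bad IP
    digits = "".join(c for c in cluster_mac if c not in "-:.").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a valid MAC address: {cluster_mac!r}")
    # Cisco writes MACs as three dot-separated groups of four hex digits.
    dotted = ".".join(digits[i:i + 4] for i in range(0, 12, 4))
    return f"arp {ip} {dotted} ARPA"


if __name__ == "__main__":
    print(cisco_static_arp("10.1.1.100", "01-00-5E-7F-01-64"))
```

This entry only goes on the device that routes for the VLAN; the downstream L2 switches need nothing beyond IGMP snooping.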