Traditional layer 2 networks have issues for three main reasons:
- Spanning-tree.
- Limited number of VLANs.
- Large MAC address tables.
Spanning-tree blocks redundant links to avoid loops. Blocking links to create a loop-free topology gets the job done, but it also means we pay for links we can’t use. We could switch to a layer 3 network, but some technologies still require layer 2 connectivity.
The VLAN ID is 12 bits, which means we can create 4094 VLANs (0 and 4095 are reserved). Having only 4094 VLANs available can be an issue for data centers. For example, imagine a service provider with 500 customers. With 4094 available VLANs, they can offer only 8 VLANs to each customer (4094 / 500 ≈ 8).
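Here is the same arithmetic as a small Python sketch. The variable names are only for illustration; the numbers come from the paragraph above.

```python
# 802.1Q VLAN ID arithmetic from the paragraph above.
VLAN_ID_BITS = 12
RESERVED_VLAN_IDS = {0, 4095}

total_ids = 2 ** VLAN_ID_BITS                      # 4096 possible values
usable_vlans = total_ids - len(RESERVED_VLAN_IDS)  # 4094 usable VLANs

customers = 500
vlans_per_customer = usable_vlans // customers     # whole VLANs per customer
leftover = usable_vlans % customers                # VLANs that remain unassigned

print(usable_vlans, vlans_per_customer, leftover)  # 4094 8 94
```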
Because of server virtualization, the number of addresses in the MAC address tables of our switches has grown significantly. Before server virtualization, a switch only had to learn one MAC address per switchport. With server virtualization, we run many virtual machines (VMs) or containers on a single physical server. Each VM has a virtual NIC and a virtual MAC address, so the switch has to learn many MAC addresses on a single switchport.
A Top of Rack (ToR) switch in a data center could connect to 24 or 48 physical servers. A data center could have many racks, and each switch has to store the MAC addresses of all VMs that communicate through it. We require much larger MAC address tables compared to networks without server virtualization.
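To get a feel for the numbers, here is a rough back-of-the-envelope sketch. The 48 servers per rack comes from the text above. The VM density and the number of racks are made-up assumptions, and the result is a worst case where a switch ends up learning every VM in the data center.

```python
# Rough worst-case estimate of the MAC addresses a switch may need to learn.
SERVERS_PER_RACK = 48   # from the text: a ToR switch connects 24 or 48 servers
VMS_PER_SERVER = 20     # assumption, for illustration only
RACKS = 10              # assumption, for illustration only

# Without virtualization: one MAC address per physical server.
macs_without_vms = SERVERS_PER_RACK * RACKS

# With virtualization: one virtual MAC address per VM
# (the MAC addresses of the hypervisors themselves are ignored here).
macs_with_vms = SERVERS_PER_RACK * VMS_PER_SERVER * RACKS

print(macs_without_vms, macs_with_vms)  # 480 9600
```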
In this lesson, I’ll explain what VXLAN is, how it works, and how it solves the above layer 2 issues.
Overlay vs Underlay
VXLAN uses an overlay and underlay network:
An overlay network is a virtual network that runs on top of a physical underlay network. Even if you have never heard this terminology before, you have probably seen it. A GRE tunnel is a simple example of an overlay network. The GRE tunnel runs on top of a physical underlay network.
With VXLAN, the overlay is a layer 2 Ethernet network. The underlay network is a layer 3 IP network. Another name for the underlay network is a transport network.
The underlay network is simple; its only job is to get packets from A to B. We don’t use any layer 2 here, only layer 3. When we use layer 3, we can use an IGP like OSPF or EIGRP and load balance traffic on redundant links.
Another advantage is that the overlay and underlay network are independent. The overlay network is virtual and requires an underlay network, but whatever changes you make in the overlay network won’t affect the underlay network. You can add and remove links in the underlay network, and as long as your routing protocol can reach the destination, your overlay network will remain unchanged.
VNI
The VXLAN Network Identifier (VNI) identifies the VXLAN and has a function similar to the VLAN ID for regular VLANs. We use 24 bits for the VNI, which means we can create 16,777,216 (~16 million) VXLANs. That’s a lot compared to the 4094 VLANs we get with a 12-bit VLAN ID. With that many VXLANs, a large service provider with even thousands of customers can use as many VXLANs per customer as needed.
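A quick sketch of the VNI number space. A 24-bit field has 2^24 distinct values (0 through 16,777,215). The helper `is_valid_vni` is a hypothetical name for illustration, not part of any library.

```python
VNI_BITS = 24
VLAN_ID_BITS = 12

vni_values = 2 ** VNI_BITS             # 16,777,216 distinct values
usable_vlans = 2 ** VLAN_ID_BITS - 2   # 4094 (0 and 4095 are reserved)

# Roughly how many times larger the VNI space is than the VLAN space.
ratio = vni_values // usable_vlans


def is_valid_vni(vni: int) -> bool:
    """Return True if the value fits in the 24-bit VNI field."""
    return 0 <= vni < 2 ** VNI_BITS


print(vni_values, ratio)
print(is_valid_vni(5012), is_valid_vni(2 ** 24))  # True False
```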
VTEP
The VXLAN tunnel endpoint (VTEP) is the device that’s responsible for encapsulating and de-encapsulating layer 2 traffic. This device is the connection between the overlay and the underlay network. The VTEP comes in two forms:
- Software (host-based)
- Hardware (gateway)
Let’s look at these two options.
Software
When I’m talking about hosts, I mean hypervisors like VMware’s ESXi or Microsoft’s Hyper-V. These hypervisors use virtual switches, and some of them support VXLAN. Here’s an illustration:
The VXLAN tunnels are between the virtual switches of the hypervisors. The underlay network is unaware of VXLAN.
Hardware
A hardware VTEP is a router, switch, or firewall that supports VXLAN. We also call a hardware VTEP a VXLAN gateway because it combines a regular VLAN and a VXLAN segment into a single layer 2 domain. Some switches support VXLAN in their ASICs, which offers better VXLAN performance than a software VTEP. Here’s what it looks like:
In the above picture, the VXLAN tunnels are between the physical switches. The devices that connect to the physical switches are unaware of VXLAN.
Interfaces
Each VTEP has two interface types:
- VTEP IP interface: Connects the VTEP to the underlay network with a unique IP address. This interface encapsulates and de-encapsulates Ethernet frames.
- VNI interface: A virtual interface that keeps network traffic separated on the physical interface. Similar to an SVI.
A VTEP can have multiple VNI interfaces, but they associate with the same VTEP IP interface. Here’s a picture to help you visualize this:
Let me explain what you see above:
- We have three VTEP devices, and each VTEP has a VTEP IP interface that connects to the underlay network.
- All VTEP devices have a VNI interface for VNI 5012 to create a layer 2 segment.
- VTEP1 and VTEP2 also have another VNI interface for VNI 5013 to create another layer 2 segment.
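The relationship described above can be sketched as a small data model: one VTEP IP interface per device and one VNI interface per layer 2 segment. The class name, the helper `segment_members`, and the IP addresses are hypothetical. Only the VTEP names and VNIs come from the example.

```python
from dataclasses import dataclass, field


@dataclass
class Vtep:
    name: str
    vtep_ip: str                                 # the single VTEP IP interface
    vnis: set[int] = field(default_factory=set)  # one VNI interface per entry


# IP addresses are made up (documentation range 192.0.2.0/24).
vteps = [
    Vtep("VTEP1", "192.0.2.1", {5012, 5013}),
    Vtep("VTEP2", "192.0.2.2", {5012, 5013}),
    Vtep("VTEP3", "192.0.2.3", {5012}),
]


def segment_members(vni: int) -> list[str]:
    """Return the names of all VTEPs that have a VNI interface for this VNI."""
    return [vtep.name for vtep in vteps if vni in vtep.vnis]


print(segment_members(5012))  # ['VTEP1', 'VTEP2', 'VTEP3']
print(segment_members(5013))  # ['VTEP1', 'VTEP2']
```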
VXLAN Frame Format
Let’s take a closer look at the VXLAN frame and header:
When a VTEP encapsulates an Ethernet frame, it adds a VXLAN header. In this header, we find the VNI and some flags.
The official UDP port number for VXLAN is 4789. However, you may also run into UDP port 8472. When VXLAN was first implemented in Linux, there was no official port number yet, and many vendors used port 8472.
The VXLAN header looks similar to the LISP header. This is not by accident. The idea was to add layer 2 support to LISP and call it layer 2 LISP. Instead, they came up with the name VXLAN.
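To make the header more concrete, here is a minimal sketch that builds and parses the 8-byte VXLAN header with Python's `struct` module. It assumes the layout from RFC 7348: 8 flag bits (the I flag, 0x08, marks the VNI as valid), 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. The function names are my own.

```python
import struct

VXLAN_UDP_PORT = 4789        # official UDP destination port
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, followed by 24 reserved bits.
    # Word 2: VNI in the top 24 bits, followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vxlan_header(header: bytes) -> int:
    """Return the VNI from a VXLAN header, or raise if the I flag is not set."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set, VNI is not valid")
    return word2 >> 8


header = pack_vxlan_header(5012)
print(header.hex())                 # 0800000000139400
print(unpack_vxlan_header(header))  # 5012
```

On the wire, this header sits between the outer UDP header and the original Ethernet frame.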
Packet Walkthrough
Let’s look at an example of how VXLAN encapsulates and de-encapsulates an Ethernet frame. Here’s the topology:
H1 and H2 are regular hosts and unaware of VXLAN. VTEP1 and VTEP2 are two switches that act as VTEP devices. We use VNI 5012 to encapsulate Ethernet frames between H1 and H2.
Let me walk you through this process:
- H1 transmits an Ethernet frame, destined for H2.
- VTEP1 receives the Ethernet frame on its VNI interface and performs the following actions:
  - Look up the VNI (5012 in my example) to which H1 is attached.
  - Find the mapping between the destination MAC address and remote VTEP IP address.
  - Add the VXLAN header with VNI 5012.
  - Add the UDP header.
  - Add the outer IP header and set the VTEP IP addresses of VTEP1 and VTEP2.
  - Transmit the IP packet on the underlay network.
- VTEP2 receives the IP packet on its VTEP interface and performs the following actions:
  - De-encapsulate the IP packet.
  - Verify whether the VNI is correct and check if there is a host that uses the destination MAC address.
  - Forward the original Ethernet frame towards H2.
- H2 receives the Ethernet frame.
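The steps above can be sketched in Python. This is a simplified model, not a real data plane: headers are fields on objects instead of bytes on the wire, each VTEP has a single VNI interface, and the mapping table is filled in by hand. All class names, MAC addresses, and IP addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

VXLAN_UDP_PORT = 4789


@dataclass(frozen=True)
class EthernetFrame:
    src_mac: str
    dst_mac: str
    payload: bytes


@dataclass(frozen=True)
class VxlanPacket:
    outer_src_ip: str           # outer IP header: local VTEP IP
    outer_dst_ip: str           # outer IP header: remote VTEP IP
    udp_dst_port: int           # UDP header
    vni: int                    # VXLAN header
    inner_frame: EthernetFrame  # the original, unchanged Ethernet frame


class SketchVtep:
    def __init__(self, vtep_ip: str, vni: int):
        self.vtep_ip = vtep_ip
        self.vni = vni                         # one VNI interface, for brevity
        self.remote_macs: dict[str, str] = {}  # destination MAC -> remote VTEP IP
        self.local_macs: set[str] = set()      # MACs of locally attached hosts

    def encapsulate(self, frame: EthernetFrame) -> VxlanPacket:
        # Find the remote VTEP for the destination MAC, then add the headers.
        remote_ip = self.remote_macs[frame.dst_mac]
        return VxlanPacket(self.vtep_ip, remote_ip, VXLAN_UDP_PORT, self.vni, frame)

    def decapsulate(self, packet: VxlanPacket) -> Optional[EthernetFrame]:
        # Verify the VNI and check that the destination host is attached here.
        if packet.vni != self.vni:
            return None
        if packet.inner_frame.dst_mac not in self.local_macs:
            return None
        return packet.inner_frame


H1_MAC = "00:00:5e:00:53:01"  # made-up MAC addresses
H2_MAC = "00:00:5e:00:53:02"

vtep1 = SketchVtep("192.0.2.1", 5012)
vtep2 = SketchVtep("192.0.2.2", 5012)
vtep1.remote_macs[H2_MAC] = vtep2.vtep_ip  # mapping configured by hand
vtep2.local_macs.add(H2_MAC)

frame = EthernetFrame(H1_MAC, H2_MAC, b"hello")
packet = vtep1.encapsulate(frame)
delivered = vtep2.decapsulate(packet)

print(packet.outer_src_ip, packet.outer_dst_ip, packet.vni)  # 192.0.2.1 192.0.2.2 5012
print(delivered == frame)                                    # True
```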
Control Plane
In the packet walkthrough example, I explained that the VTEP device looks up the mapping to figure out what VTEP IP address to use to reach the destination MAC address. I didn’t explain how VTEP1 learned this mapping information. Let’s see how this works.
With a traditional VLAN, the first time two hosts communicate with each other, it goes like this:
- H1 sends an ARP request.
- Switches in between H1 and H2 learn the MAC address of H1.
- Switches flood the ARP request.
- H2 receives the ARP request.
- H2 answers with an ARP reply.
- Switches learn the MAC address of H2.
With VXLAN, each VTEP has a VXLAN mapping (forwarding) table that maps a destination MAC address to a remote VTEP IP address. How do VTEP devices learn MAC addresses? There are different control plane solutions. Cisco supports these four options:
- VXLAN with static unicast VXLAN tunnels.
- VXLAN with multicast underlay.
- VXLAN with MP-BGP EVPN.
- VXLAN with LISP.
The first option is simple. You manually configure the VXLAN mapping table. This works, but it’s not a scalable solution. The VXLAN standard describes the second option, where we use a multicast “flood and learn” approach on the underlay.
Here’s how it works:
- Each VNI maps to a multicast group.
- The VTEP devices join the multicast group.
- When VTEP1 receives the ARP request, it transmits it to the multicast group.
- VTEP2 receives the ARP request and learns the MAC address of H1.
- VTEP2 stores the MAC address of H1 and the IP address of VTEP1 in the mapping table.
- When VTEP2 receives the ARP reply from H2, it uses the information in the mapping table to send a unicast packet to VTEP1.
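Here is a toy simulation of the flood and learn steps above. The underlay is reduced to a Python object that delivers messages, and a "packet" only carries the VNI and the source MAC address. The class names, the multicast group address, and the MAC and IP addresses are assumptions for illustration.

```python
from collections import defaultdict

BROADCAST = "ff:ff:ff:ff:ff:ff"


class ToyUnderlay:
    """Stand-in for the underlay network: delivers to a group or to one VTEP."""

    def __init__(self):
        self.nodes = {}                   # VTEP IP -> VTEP
        self.members = defaultdict(list)  # multicast group -> joined VTEPs

    def multicast(self, group, sender_ip, vni, src_mac):
        for vtep in self.members[group]:
            if vtep.ip != sender_ip:
                vtep.receive(sender_ip, vni, src_mac)

    def unicast(self, dst_ip, sender_ip, vni, src_mac):
        self.nodes[dst_ip].receive(sender_ip, vni, src_mac)


class FloodLearnVtep:
    def __init__(self, ip, underlay, vni_to_group):
        self.ip = ip
        self.underlay = underlay
        self.vni_to_group = vni_to_group  # each VNI maps to a multicast group
        self.mapping = {}                 # (VNI, MAC) -> remote VTEP IP
        underlay.nodes[ip] = self
        for group in vni_to_group.values():
            underlay.members[group].append(self)  # join the multicast group

    def send(self, vni, src_mac, dst_mac):
        """Forward a frame from a local host; return how it was sent."""
        remote_ip = self.mapping.get((vni, dst_mac))
        if remote_ip is None:
            # Unknown or broadcast destination: flood to the multicast group.
            self.underlay.multicast(self.vni_to_group[vni], self.ip, vni, src_mac)
            return "multicast"
        self.underlay.unicast(remote_ip, self.ip, vni, src_mac)
        return "unicast"

    def receive(self, sender_ip, vni, src_mac):
        # Learn: the inner source MAC is reachable through the sending VTEP.
        self.mapping[(vni, src_mac)] = sender_ip


underlay = ToyUnderlay()
groups = {5012: "239.1.1.1"}  # made-up group address
vtep1 = FloodLearnVtep("192.0.2.1", underlay, groups)
vtep2 = FloodLearnVtep("192.0.2.2", underlay, groups)
H1_MAC, H2_MAC = "00:00:5e:00:53:01", "00:00:5e:00:53:02"

arp_request = vtep1.send(5012, H1_MAC, BROADCAST)  # H1's ARP request
arp_reply = vtep2.send(5012, H2_MAC, H1_MAC)       # H2's ARP reply

print(arp_request, arp_reply)  # multicast unicast
print(vtep2.mapping)           # {(5012, '00:00:5e:00:53:01'): '192.0.2.1'}
print(vtep1.mapping)           # {(5012, '00:00:5e:00:53:02'): '192.0.2.2'}
```

After this exchange, both VTEPs have an entry for the remote host, so further traffic between H1 and H2 is sent as unicast.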
The MP-BGP EVPN solution is popular in data centers and private clouds. VXLAN with LISP is a popular choice for campus networks. For example, Cisco’s SD-Access uses VXLAN with LISP on the control plane.
Conclusion
Let’s summarize what we have learned:
- Traditional layer 2 networks have issues:
  - Spanning-tree blocks all redundant links. We can’t use ECMP.
  - Limited number of VLANs because of the 12-bit VLAN ID.
  - Large MAC address tables because of server virtualization.
- VXLAN uses an overlay and underlay network:
  - The underlay network is 100% layer 3, so we don’t have to use spanning-tree and can use load balancing.
  - The overlay network is virtual.
- The 24-bit VNI identifies the VXLAN and is similar to a VLAN ID. We can create ~16 million VXLANs, which is more than enough, even for large service providers.
- The VTEP device encapsulates and de-encapsulates layer 2 traffic. There are two versions:
  - Software: Runs on the virtual switch of a hypervisor.
  - Hardware: Runs on a router, switch, or firewall. Some hardware VTEPs use ASICs for better performance.
- Each VTEP has two interfaces:
  - VTEP IP interface: Connects the VTEP to the underlay network. This interface encapsulates and de-encapsulates VXLAN traffic.
  - VNI interface: Virtual interface, similar to an SVI.
- A VTEP can have multiple VNI interfaces, but they associate with the same VTEP IP interface.
- VXLAN encapsulates an Ethernet frame and adds a VXLAN, UDP, and IP header.
- The VXLAN standard describes a multicast “flood and learn” solution for the control plane.
- Other control plane options are static VXLAN tunnels, MP-BGP with EVPN, or VXLAN with LISP.
- You can learn more about VXLAN in RFC 7348.
VXLAN has many advantages. Let me give you an overview:
- VXLAN allows you to segment your network just like with VLANs, without the disadvantages of layer 2 networks:
  - No spanning-tree, so we can use redundant links (ECMP).
  - Simple underlay network.
- It is not limited to the 4094 VLANs that a 12-bit VLAN ID allows.
- You can create more than 16 million VXLANs because of the 24-bit VNI.
- No need to build large layer 2 topologies and span VLANs across the entire network.
- Less flooding:
  - Broadcast traffic.
  - Multicast traffic.
  - Unknown unicast traffic.
- High performance with hardware VTEPs that use ASICs.
I hope you enjoyed this lesson. If you have any questions, please leave a comment.