暗夜星空: February 2020

Tuesday, February 25, 2020

Network Maintenance

Network maintenance means doing whatever it takes to keep a network up and running. It includes a number of tasks:

Troubleshooting network problems.
Hardware and software installation/configuration.
Monitoring and improving network performance.
Planning for future network growth.
Creating network documentation and keeping it up-to-date.
Ensuring compliance with company policies.
Ensuring compliance with legal regulations.
Securing the network against all kinds of threats.

Of course, this list may be different for each network you work on, and perhaps you are only responsible for some of these tasks. All these tasks can be performed in one of two ways:

Structured tasks.
Interrupt-driven tasks.

Structured means you have a predefined plan for network maintenance that makes sure problems are solved before they occur. As a network engineer, this will also make your life a whole lot easier. Interrupt-driven means you wait for trouble to occur and then fix it as fast as you can; it’s more like the “fireman” approach. A structured approach, where you have a network maintenance strategy and plan, reduces downtime and is more cost-effective.

Of course, you can never completely get rid of interrupt-driven tasks because sometimes things “just go wrong”, but with a good plan you can definitely reduce their number.

You don’t have to come up with a complete network maintenance model yourself; there are a number of well-known network maintenance models to choose from. It’s best to pick the one that suits your organization and adjust it if needed.

Choosing which network maintenance model you will use depends on your network and the business. You can also use them as a template to create your own network maintenance model.

To give you an idea of what a network maintenance model is about and what it looks like, here’s an example based on FCAPS (Fault, Configuration, Accounting, Performance, and Security management):

Fault management: we will configure our network devices (routers, switches, firewalls, servers, etc.) to capture logging messages and send them to an external server. Whenever an interface goes down or the CPU goes above 80% we want to receive an e-mail so we can see what is going on.
Configuration management: Any changes made to the network have to be logged. We will use a change management process so that relevant personnel are notified of planned network changes. Changes to network devices have to be reported and acknowledged before they are implemented.
Accounting management: We will charge (guest) users for usage of the wireless network, for example per 100 MB of data. Accounting is also commonly used to charge people for long-distance VoIP calls.
Performance management: Network performance will be monitored on all LAN and WAN links so we know when things go wrong. QoS (Quality of Service) will be configured on the appropriate interfaces.
Security management: We will create a security policy and implement it using firewalls, VPNs, and intrusion prevention systems, and use AAA (Authentication, Authorization, and Accounting) servers to validate user credentials. Network breaches have to be logged and an appropriate response has to be made.
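
The fault management rule in this FCAPS example (send an e-mail when an interface goes down or the CPU goes above 80%) boils down to a simple threshold check. Here is a minimal Python sketch, assuming the device status has already been collected, for example from syslog messages or SNMP polling. `DeviceStatus` and `fault_alerts` are hypothetical names, and actually sending the e-mail is left out:

```python
from dataclasses import dataclass, field

CPU_THRESHOLD_PERCENT = 80  # threshold from the fault management example


@dataclass
class DeviceStatus:
    """Hypothetical snapshot of one device (e.g. built from syslog/SNMP data)."""
    name: str
    cpu_percent: float
    interfaces_down: list[str] = field(default_factory=list)


def fault_alerts(status: DeviceStatus) -> list[str]:
    """Return one alert message per fault that should trigger an e-mail."""
    alerts = []
    if status.cpu_percent > CPU_THRESHOLD_PERCENT:
        alerts.append(f"{status.name}: CPU at {status.cpu_percent:.0f}% "
                      f"(threshold {CPU_THRESHOLD_PERCENT}%)")
    for interface in status.interfaces_down:
        alerts.append(f"{status.name}: interface {interface} is down")
    return alerts


print(fault_alerts(DeviceStatus("R1", 92.0, ["GigabitEthernet0/1"])))
# ['R1: CPU at 92% (threshold 80%)', 'R1: interface GigabitEthernet0/1 is down']
```

Keeping the threshold in one named constant means the policy (“above 80%”) is written down once and can be changed in a single place.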

As you can see, FCAPS is not just a “theoretical” method; it describes “what” we will do, “how”, and “when”.

Whatever network maintenance model you decide to use, there are always a number of routine maintenance tasks that should have documented procedures. Here are a few examples:

Configuration changes: Businesses are never static; they change all the time. Sometimes you need to make changes to the network to allow access for guest users, or users move from one office to another and you have to change the network to facilitate this.
Replacement of hardware: Older hardware has to be replaced with more modern equipment, and production hardware can fail, in which case we have to replace it immediately.
Backups: If we want to recover from network problems such as failing switches or routers, we need recent backups of their configurations. Normally you schedule backups that save the running configuration every day, week, month, or whatever interval you like.
Software updates: We need to keep our network devices and operating systems up to date. Updates fix bugs, and they also make sure we don’t have devices running older software with known security vulnerabilities.
Monitoring: We need to collect and understand traffic statistics and bandwidth utilization so we can spot (future) network problems but also so we can plan for future network growth.
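
Scheduled backups are easier to manage when every file has a predictable, timestamped name and the archive has a retention limit. A minimal Python sketch, assuming a hypothetical `HOSTNAME-running-config-YYYYMMDD-HHMM.cfg` naming scheme and one archive per device; how the configuration is copied off the device is out of scope:

```python
from datetime import datetime


def backup_filename(hostname: str, taken_at: datetime) -> str:
    """Hypothetical naming scheme: HOSTNAME-running-config-YYYYMMDD-HHMM.cfg"""
    return f"{hostname}-running-config-{taken_at:%Y%m%d-%H%M}.cfg"


def files_to_prune(filenames: list[str], keep: int) -> list[str]:
    """Return the backups to delete so that only the `keep` newest remain.

    Assumes all names belong to one device and follow backup_filename(),
    so sorting the names lexically also sorts them by time.
    """
    ordered = sorted(filenames)
    return ordered[:max(len(ordered) - keep, 0)]


archive = [backup_filename("SW1", datetime(2020, 2, day, 0, 30)) for day in (23, 24, 25)]
print(files_to_prune(archive, keep=2))  # ['SW1-running-config-20200223-0030.cfg']
```

The zero-padded timestamp is the design choice that matters here: it makes alphabetical order equal chronological order, so no date parsing is needed.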

Normally you will create a list of the tasks that have to be done for your network, and each task can be assigned a priority. If an access layer switch fails, you will likely want to replace it as fast as you can, but a failed distribution or core layer device gets a much higher priority since it impacts more users of the network. Other tasks, like backups and software updates, can be scheduled. You will probably want to install software updates outside of business hours, and backups can be scheduled to run each day after midnight. The advantage of scheduling tasks is that network engineers are less likely to forget them.
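
Ordering such a task list is essentially a sort on priority. The sketch below is illustrative only: the categories and numbers in `PRIORITY` are hypothetical, with a lower number meaning “handle first” and core/distribution failures ranked above access layer failures, as described above:

```python
# Hypothetical priority scale: lower number = handle first.
PRIORITY = {"core": 1, "distribution": 1, "access": 2, "scheduled": 3}


def order_tasks(tasks: list[tuple[str, str]]) -> list[str]:
    """Return task descriptions in the order they should be handled.

    tasks: (category, description) pairs. sorted() is stable, so tasks
    with the same priority keep the order in which they were reported.
    """
    ranked = sorted(tasks, key=lambda task: PRIORITY[task[0]])
    return [description for _, description in ranked]


todo = [
    ("scheduled", "nightly configuration backup"),
    ("access", "replace failed access switch SW12"),
    ("core", "replace failed core switch CORE1"),
]
print(order_tasks(todo))
# ['replace failed core switch CORE1', 'replace failed access switch SW12', 'nightly configuration backup']
```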

Making changes to your network will sometimes impact the productivity of users who rely on network availability. Some changes have a huge impact: changes to firewalls or access-list rules might affect more users than you’d wish for. For example, you might install a new firewall and plan for a certain result, but accidentally forget about an application that uses random port numbers, and you end up troubleshooting the issue. Meanwhile, some users can’t use this application (and are shouting at you while you try to fix it…;).

Larger companies might have more than one IT department, each responsible for different network services. If you plan to replace a certain router tomorrow at 2 AM, you might want to warn the “Microsoft Windows” department because their servers will be unreachable. You can use change management for this: when you plan a change to the network, other departments are informed, and they can object if it conflicts with their planning.

When you want to implement change management you might want to think about the following:

Who will be responsible for authorizing changes to the network?
Which tasks will be performed during scheduled maintenance windows?
What procedures have to be followed before making a change? (for example: doing a “copy run start” before making changes to a switch).
How will you measure the success or failure of network changes? (For example, if you plan to change a number of IP addresses, you plan the time required to make this change. If it takes 5 minutes to reconfigure the IP addresses and you end up troubleshooting for 2 hours because something else stops working, you might want to “roll back” to the previous configuration. How much time do you allow for troubleshooting? 5 minutes? 10 minutes? 1 hour?)
How, when and who will add the network change to the network documentation?
How will you create a rollback plan so you can restore a configuration to the previous configuration in case of unexpected problems?
What circumstances will allow change management policies to be overruled?
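
The success/failure question above can be turned into an explicit rollback rule: if the change isn’t verified once the planned duration plus the agreed troubleshooting budget has run out, roll back. A minimal Python sketch; the function name and parameters are hypothetical, and the budget is whatever your change management policy sets:

```python
from datetime import timedelta


def should_roll_back(planned: timedelta,
                     troubleshooting_budget: timedelta,
                     elapsed: timedelta,
                     change_verified: bool) -> bool:
    """Decide whether to restore the previous configuration.

    A verified change is never rolled back. Otherwise, roll back once the
    elapsed time exceeds the planned time plus the troubleshooting budget.
    """
    if change_verified:
        return False
    return elapsed > planned + troubleshooting_budget


# IP readdressing planned for 5 minutes, 10 minutes allowed for troubleshooting:
print(should_roll_back(timedelta(minutes=5), timedelta(minutes=10),
                       timedelta(hours=2), change_verified=False))  # True
```

Agreeing on the budget before the change starts is the point: nobody has to decide under pressure at 2 AM whether “just five more minutes” is acceptable.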

Another task we have to do is to create and update our network documentation. Whenever a new network is designed and created it should be documented.

The more challenging part is to keep it up-to-date in the future. There are a number of items that you should find in any network documentation:

Physical topology diagram: This should show all the network devices and how they are physically connected to each other.
Logical topology diagram: This should show how the network is connected logically: the protocols that are used, VLAN information, etc.
Interconnections: It’s useful to have a diagram that shows which interfaces of one network device are connected to the interface of another network device.
Inventory: You should have an inventory of all network equipment: vendor lists, product numbers, software versions, and software license information. Each network device should also have an organization asset tag number.
IP Addresses: You should have a diagram that covers all the IP addresses in use on the network and on which interfaces they are configured.
Configuration management: Before changing a configuration we should save the current running-configuration so it’s easy to restore to a previous (working) version. It’s even better to keep an archive of older configurations for future use.
Design documents: Documents that were created during the original design of the network should be kept so you can always check why certain design decisions were made.

It’s also a good idea to work with step-by-step troubleshooting guidelines or with configuration templates that all network engineers agree to use. Here are some examples to give you an idea:

interface FastEthernet0/1
 description AccessPoint
 switchport access vlan 2
 switchport mode access
 spanning-tree portfast

Here’s an example for access interfaces connected to wireless access points. Portfast has to be enabled for spanning-tree, the access points have to be in VLAN 2 and the switchport has to be changed to “access” manually.

interface FastEthernet0/2
 description ClientComputer
 description ClientComputer
 switchport access vlan 6
 switchport mode access
 switchport port-security
 switchport port-security violation shutdown
 switchport port-security maximum 1
 spanning-tree portfast
 spanning-tree bpduguard enable

Here’s a template for interfaces that connect to client computers. The interface has to be configured for “access” mode manually. Port security has to be enabled so only 1 MAC address (the computer’s) is allowed. The interface has to go into forwarding mode immediately, so we configure spanning-tree portfast, and if we receive a BPDU, the interface should go into the err-disabled state. Working with predefined templates like these reduces the number of errors because everyone agrees on the same configuration. If you give each network engineer instructions to “protect the interface”, you’ll probably end up with 10 different configurations… Here’s one more example:

interface GigabitEthernet0/1
 description TRUNK
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk nonegotiate

If you tell 2 network engineers to “configure a trunk”, you might end up with one interface configured for 802.1Q encapsulation and the other one for ISL encapsulation. If one network engineer disables DTP and the other one configures the interface as “dynamic desirable”, the trunk will also fail. If you instruct them to configure a trunk according to a template, we’ll have the same configuration on both sides.
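
Templates are easiest to enforce when the configuration is generated rather than typed by hand. A minimal Python sketch, assuming a hypothetical `render()` helper and template names; the lines are copied from the three templates above, with the access VLAN turned into a parameter:

```python
# Interface templates from this section; {vlan} is filled in per port.
TEMPLATES = {
    "access_point": [
        " description AccessPoint",
        " switchport access vlan {vlan}",
        " switchport mode access",
        " spanning-tree portfast",
    ],
    "client": [
        " description ClientComputer",
        " switchport access vlan {vlan}",
        " switchport mode access",
        " switchport port-security",
        " switchport port-security violation shutdown",
        " switchport port-security maximum 1",
        " spanning-tree portfast",
        " spanning-tree bpduguard enable",
    ],
    "trunk": [
        " description TRUNK",
        " switchport trunk encapsulation dot1q",
        " switchport mode trunk",
        " switchport trunk nonegotiate",
    ],
}


def render(template: str, interface: str, **params: object) -> str:
    """Render one interface stanza; a missing parameter raises KeyError."""
    lines = [f"interface {interface}"]
    lines += [line.format(**params) for line in TEMPLATES[template]]
    return "\n".join(lines)


print(render("client", "FastEthernet0/2", vlan=6))
```

Because both ends of a trunk are rendered from the same “trunk” entry, the 802.1Q/ISL and DTP mismatches described above cannot happen.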

Hopefully this has given you an idea of what network maintenance is about. If you have any questions, feel free to leave a comment!

VXLAN Flood and Learn with Multicast

In the introduction to VXLAN lesson, I explained what VXLAN is and how it works. In this lesson, I’ll show you how to configure VXLAN where we use the multicast “flood and learn” system to learn the mapping between a VTEP IP address and a MAC address.

Configuration

Here’s the topology we’ll use:

All devices are CSR1000V routers running Cisco IOS XE Software, version 16.06.01. I’m using CSR1000V routers since anyone can use these. I use custom MAC addresses because those are easy to recognize when we do a packet capture.

VTEP1 and VTEP2 are our VTEP devices. The core router is there to simulate our “IP network”. We are going to create a VXLAN tunnel with VNI 5012 so that H1 and H2 can communicate directly over layer 2.

I pre-configured OSPF so that we have connectivity between the VTEP devices and the core router.

hostname CORE
!
interface Loopback0
 ip address 3.3.3.3 255.255.255.255
!
interface GigabitEthernet2
 mac-address 0000.5e00.5303
 ip address 192.168.13.3 255.255.255.0
!
interface GigabitEthernet3
 mac-address 0000.5e00.5333
 ip address 192.168.23.3 255.255.255.0
!
router ospf 1
 network 3.3.3.3 0.0.0.0 area 0
 network 192.168.13.0 0.0.0.255 area 0
 network 192.168.23.0 0.0.0.255 area 0
!
end

hostname H1
!
interface GigabitEthernet2
 mac-address 0000.5e00.5365
 ip address 192.168.12.101 255.255.255.0
!
end

hostname H2
!
interface GigabitEthernet2
 mac-address 0000.5e00.5366
 ip address 192.168.12.102 255.255.255.0
!
end

hostname VTEP1
!
interface Loopback0
 ip address 1.1.1.1 255.255.255.255
!
interface GigabitEthernet2
 mac-address 0000.5e00.5301
!
interface GigabitEthernet3
 mac-address 0000.5e00.5311
 ip address 192.168.13.1 255.255.255.0
!
router ospf 1
 network 1.1.1.1 0.0.0.0 area 0
 network 192.168.13.0 0.0.0.255 area 0
!
end

hostname VTEP2
!
interface Loopback0
 ip address 2.2.2.2 255.255.255.255
!
interface GigabitEthernet2
 mac-address 0000.5e00.5302
!
interface GigabitEthernet3
 mac-address 0000.5e00.5322
 ip address 192.168.23.2 255.255.255.0
!
router ospf 1
 network 2.2.2.2 0.0.0.0 area 0
 network 192.168.23.0 0.0.0.255 area 0
!
end

Multicast

Let’s start with the configuration of multicast. With VXLAN, we don’t have a typical scenario where we have a few sources and many receivers. All VTEP devices communicate with each other so it makes sense to use bidirectional PIM. The core router will be the RP in this network.

Let’s enable multicast routing and bidirectional PIM on all VTEP devices and the core router:

VTEP1, VTEP2 & CORE
(config)#ip multicast-routing distributed
(config)#ip pim bidir-enable

We need to enable PIM sparse mode on all physical interfaces that connect to the IP network:

VTEP1, VTEP2 & CORE
(config)#interface GigabitEthernet 3
(config-if)#ip pim sparse-mode

CORE(config)#interface GigabitEthernet 2
CORE(config-if)#ip pim sparse-mode

And don’t forget the loopback interfaces:

VTEP1, VTEP2 & CORE
(config)#interface Loopback 0
(config-if)#ip pim sparse-mode

Last but not least, configure the RP address:

VTEP1, VTEP2 & CORE
(config)#ip pim rp-address 3.3.3.3 bidir

This completes the multicast configuration.

VXLAN

We need to create a Network Virtualization Endpoint (NVE) interface. This is where we configure the VNI and multicast group that we will use. We source this interface from the loopback 0 interface, use VNI 5012, and use multicast group 239.1.1.1.

Here’s how to configure the NVE interface:

VTEP1 & VTEP2
(config)#interface NVE 1
(config-if)#no shutdown
(config-if)#source-interface Loopback 0
(config-if)#member vni 5012 mcast-group 239.1.1.1

Now we need to configure the Ethernet Flow Point (EFP) service instance. This is a logical interface that connects a bridge domain to a physical port (or EtherChannel). Under the service instance, we configure whether the incoming traffic is tagged or untagged. In our case, the hosts send untagged traffic. This is how to configure it:

VTEP1 & VTEP2
(config)#interface GigabitEthernet 2
(config-if)#service instance 1 ethernet
(config-if-srv)#encapsulation untagged
(config-if-srv)#exit
(config-if)#exit

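Last but not least, we need to configure the bridge domain.

A Bridge Domain Interface (BDI) is the routed interface of a bridge domain and the IOS XE equivalent of the IOS Bridge-Group Virtual Interface (BVI). We don’t need one here because the VTEPs only bridge traffic between H1 and H2.

The bridge domain is where we combine the VNI, physical interface, and service instance: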

VTEP1 & VTEP2
(config)#bridge-domain 1
(config-bdomain)#member vni 5012
(config-bdomain)#member GigabitEthernet 2 service-instance 1

This completes our VXLAN configuration.

I’m showing the two exit commands on purpose because I configure the bridge-domain globally. You can also configure the bridge-domain under the service instance.

Verification

Let’s verify our work.

Multicast

First, I’ll check if our multicast configuration is correct:

VTEP1#show ip mroute 239.1.1.1
IP Multicast Routing Table

(*, 239.1.1.1), 00:00:36/00:02:25, RP 3.3.3.3, flags: BCx
  Bidir-Upstream: GigabitEthernet3, RPF nbr 192.168.13.3
  Outgoing interface list:
    Tunnel0, Forward/Sparse-Dense, 00:00:36/00:02:25
    GigabitEthernet3, Bidir-Upstream/Sparse, 00:00:36/stopped

VTEP2#show ip mroute 239.1.1.1
IP Multicast Routing Table

(*, 239.1.1.1), 00:00:36/00:02:24, RP 3.3.3.3, flags: BCx
  Bidir-Upstream: GigabitEthernet3, RPF nbr 192.168.23.3
  Outgoing interface list:
    Tunnel0, Forward/Sparse-Dense, 00:00:36/00:02:24
    GigabitEthernet3, Bidir-Upstream/Sparse, 00:00:36/stopped

CORE#show ip mroute 239.1.1.1
IP Multicast Routing Table

(*, 239.1.1.1), 00:00:49/00:02:45, RP 3.3.3.3, flags: B
  Bidir-Upstream: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet3, Forward/Sparse, 00:00:44/00:02:45
    GigabitEthernet2, Forward/Sparse, 00:00:49/00:02:40

I’m seeing the (*,G) entry for the multicast group 239.1.1.1 and outgoing interfaces. This is looking good.

VXLAN

Let’s try some VXLAN specific commands. First, we’ll check if the NVE interface is up:

VTEP1#show nve interface nve1
Interface: nve1, State: Admin Up, Oper Up, Encapsulation: Vxlan,
BGP host reachability: Disable, VxLAN dport: 4789
VNI number: L3CP 0 L2DP 1
source-interface: Loopback0 (primary:1.1.1.1 vrf:0)

VTEP2#show nve interface nve1
Interface: nve1, State: Admin Up, Oper Up, Encapsulation: Vxlan,
BGP host reachability: Disable, VxLAN dport: 4789
VNI number: L3CP 0 L2DP 1
source-interface: Loopback0 (primary:2.2.2.2 vrf:0)

The command above tells us whether the NVE interface is up or not. We can add the detail parameter to also see the number of packets or bytes we transmitted or received on this interface:

VTEP1#show nve interface nve1 detail
Interface: nve1, State: Admin Up, Oper Up, Encapsulation: Vxlan,
BGP host reachability: Disable, VxLAN dport: 4789
VNI number: L3CP 0 L2DP 1
source-interface: Loopback0 (primary:1.1.1.1 vrf:0)
   Pkts In   Bytes In   Pkts Out  Bytes Out
         0          0          0          0

VTEP2#show nve interface nve1 detail
Interface: nve1, State: Admin Up, Oper Up, Encapsulation: Vxlan,
BGP host reachability: Disable, VxLAN dport: 4789
VNI number: L3CP 0 L2DP 1
source-interface: Loopback0 (primary:2.2.2.2 vrf:0)
   Pkts In   Bytes In   Pkts Out  Bytes Out
         0          0          0          0

Things are quiet but that will change soon. Let’s check the VNIs and multicast group addresses we use with the NVE interface:

VTEP1#show nve vni
Interface  VNI        Multicast-group VNI state  Mode  BD    cfg vrf                      
nve1       5012       239.1.1.1       Up         L2DP  1     CLI N/A

VTEP2#show nve vni
Interface  VNI        Multicast-group VNI state  Mode  BD    cfg vrf                      
nve1       5012       239.1.1.1       Up         L2DP  1     CLI N/A

Take a look at the show nve peers command:

VTEP1#show nve peers
Interface  VNI      Type Peer-IP          RMAC/Num_RTs   eVNI     state flags UP time

VTEP1 doesn’t know about any other VTEPs right now. This will change when we generate some traffic. The last thing we need to check is the bridge domain:

VTEP1#show bridge-domain 1
Bridge-domain 1 (2 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet2 service instance 1
    vni 5012
   AED MAC address    Policy  Tag       Age  Pseudoport

VTEP2#show bridge-domain 1
Bridge-domain 1 (2 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet2 service instance 1
    vni 5012
   AED MAC address    Policy  Tag       Age  Pseudoport

The output above is empty because our hosts haven’t sent anything yet. Let’s change that by sending some ICMP packets between H1 and H2:

H1#ping 192.168.12.102
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.12.102, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 2/2/4 ms

Excellent, at least there is connectivity. Let’s see if our VTEP devices now know about each other:

VTEP1#show nve peers
Interface  VNI      Type Peer-IP          RMAC/Num_RTs   eVNI     state flags UP time
nve1       5012     L2DP 2.2.2.2

VTEP2#show nve peers
Interface  VNI      Type Peer-IP          RMAC/Num_RTs   eVNI     state flags UP time
nve1       5012     L2DP 1.1.1.1

VTEP1 and VTEP2 now know about each other. Let’s also look again at the NVE interface:

VTEP1#show nve interface nve1 detail
Interface: nve1, State: Admin Up, Oper Up, Encapsulation: Vxlan,
BGP host reachability: Disable, VxLAN dport: 4789
VNI number: L3CP 0 L2DP 1
source-interface: Loopback0 (primary:1.1.1.1 vrf:0)
   Pkts In   Bytes In   Pkts Out  Bytes Out
         5        610          5        610

VTEP2#show nve interface nve1 detail
Interface: nve1, State: Admin Up, Oper Up, Encapsulation: Vxlan,
BGP host reachability: Disable, VxLAN dport: 4789
VNI number: L3CP 0 L2DP 1
source-interface: Loopback0 (primary:2.2.2.2 vrf:0)
   Pkts In   Bytes In   Pkts Out  Bytes Out
         5        610          5        610

Above, we see our 5 packets. Here’s the bridge domain output again:

VTEP1#show bridge-domain 1
Bridge-domain 1 (2 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet2 service instance 1
    vni 5012
   AED MAC address    Policy  Tag       Age  Pseudoport
   0   0000.5E00.5365 forward dynamic   170  GigabitEthernet2.EFP1
   0   0000.5E00.5366 forward dynamic   169  nve1.VNI5012, VxLAN 
                                             src: 1.1.1.1 dst: 2.2.2.2

VTEP2#show bridge-domain 1
Bridge-domain 1 (2 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet2 service instance 1
    vni 5012
   AED MAC address    Policy  Tag       Age  Pseudoport
   0   0000.5E00.5365 forward dynamic   149  nve1.VNI5012, VxLAN 
                                             src: 2.2.2.2 dst: 1.1.1.1
   0   0000.5E00.5366 forward dynamic   149  GigabitEthernet2.EFP1

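The output above is interesting. Each VTEP learned the MAC address of its local host on GigabitEthernet2.EFP1, and the MAC address of the remote host on nve1.VNI5012, together with the remote VTEP IP address used to reach it.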

We verified that our configuration works but there are some interesting things we can try with this topology.

Unknown Unicast Traffic

How about we look at the “flood and learn” system in action? To demonstrate this, I’ll clear the MAC address table of the bridge domain on our VTEP devices:

VTEP1 & VTEP2
#clear bridge-domain 1 mac table

Once again, the VTEP devices don’t know about any MAC addresses:

VTEP1#show bridge-domain 1
Bridge-domain 1 (2 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet2 service instance 1
    vni 5012
   AED MAC address    Policy  Tag       Age  Pseudoport

VTEP2#show bridge-domain 1
Bridge-domain 1 (2 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet2 service instance 1
    vni 5012
   AED MAC address    Policy  Tag       Age  Pseudoport

I’ll transmit two ICMP requests from H1. I do this on purpose: the first one will be flooded, and the second one will be transmitted with unicast:

H1#ping 192.168.12.102 repeat 2
Type escape sequence to abort.
Sending 2, 100-byte ICMP Echos to 192.168.12.102, timeout is 2 seconds:
!!
Success rate is 100 percent (2/2), round-trip min/avg/max = 2/3/4 ms

Here’s the first ICMP request:

Above, we see that this ICMP request is flooded because the destination is multicast group address 239.1.1.1. The ICMP reply from H2 to H1 is transmitted with unicast:

The second ICMP request from H1 is now also transmitted with unicast:


Broadcast Traffic

What about broadcast traffic? This works the same as unknown unicast. Let me show you an example:

H1#ping 192.168.12.255 repeat 1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 192.168.12.255, timeout is 2 seconds:

Reply to request 0 from 192.168.12.102, 4 ms

Above, you can see that this layer 2 broadcast traffic is also flooded to the multicast group.
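
The flood-and-learn behavior seen in these captures can be summarized as a small model. This is a hypothetical Python sketch (the `Vtep` class and its methods are illustrative names, not an IOS XE feature): a VTEP learns source MAC addresses from local frames and from decapsulated VXLAN packets, floods broadcast and unknown unicast frames to the multicast group, and unicasts to the remote VTEP once the destination MAC address is known. Multicast destination MACs and MAC aging are ignored:

```python
from dataclasses import dataclass, field
from typing import Optional

BROADCAST_MAC = "ffff.ffff.ffff"
LOCAL = "local"


@dataclass
class Vtep:
    vtep_ip: str
    mcast_group: str
    # MAC address -> LOCAL or the IP address of the remote VTEP
    mac_table: dict[str, str] = field(default_factory=dict)

    def from_local_host(self, src_mac: str, dst_mac: str) -> Optional[str]:
        """Handle a frame from a local host and return the outer destination IP
        (None if the frame stays on a local port and needs no encapsulation)."""
        self.mac_table[src_mac] = LOCAL
        location = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST_MAC or location is None:
            return self.mcast_group      # broadcast or unknown unicast: flood
        if location == LOCAL:
            return None
        return location                  # known remote MAC: unicast to its VTEP

    def from_underlay(self, outer_src_ip: str, inner_src_mac: str) -> None:
        """Learn from a decapsulated packet: the inner source MAC sits behind outer_src_ip."""
        self.mac_table[inner_src_mac] = outer_src_ip


H1, H2 = "0000.5e00.5365", "0000.5e00.5366"
vtep1 = Vtep("1.1.1.1", "239.1.1.1")
vtep2 = Vtep("2.2.2.2", "239.1.1.1")

print(vtep1.from_local_host(H1, H2))             # 239.1.1.1 (first request is flooded)
vtep2.from_underlay("1.1.1.1", H1)
print(vtep2.from_local_host(H2, H1))             # 1.1.1.1 (reply is unicast)
vtep1.from_underlay("2.2.2.2", H2)
print(vtep1.from_local_host(H1, H2))             # 2.2.2.2 (second request is unicast)
print(vtep1.from_local_host(H1, BROADCAST_MAC))  # 239.1.1.1 (broadcast is flooded)
```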


This wraps up this lesson! Here are the final configurations:

hostname CORE
!
ip multicast-routing distributed
!
interface Loopback0
 ip address 3.3.3.3 255.255.255.255
 ip pim sparse-mode
!
interface GigabitEthernet2
 mac-address 0000.5e00.5303
 ip address 192.168.13.3 255.255.255.0
 ip pim sparse-mode
!
interface GigabitEthernet3
 mac-address 0000.5e00.5333
 ip address 192.168.23.3 255.255.255.0
 ip pim sparse-mode
!
router ospf 1
 network 3.3.3.3 0.0.0.0 area 0
 network 192.168.13.0 0.0.0.255 area 0
 network 192.168.23.0 0.0.0.255 area 0
!
ip pim bidir-enable
ip pim rp-address 3.3.3.3 bidir
!
end

hostname H1
!
interface GigabitEthernet2
 mac-address 0000.5e00.5365
 ip address 192.168.12.101 255.255.255.0
!
end

hostname H2
!
interface GigabitEthernet2
 mac-address 0000.5e00.5366
 ip address 192.168.12.102 255.255.255.0
!
end

hostname VTEP1
!
ip multicast-routing distributed
!
redundancy
bridge-domain 1 
 member vni 5012
 member GigabitEthernet2 service-instance 1
!
interface Loopback0
 ip address 1.1.1.1 255.255.255.255
 ip pim sparse-mode
!
interface GigabitEthernet2
 mac-address 0000.5e00.5301
 service instance 1 ethernet
  encapsulation untagged
!
interface GigabitEthernet3
 mac-address 0000.5e00.5311
 ip address 192.168.13.1 255.255.255.0
 ip pim sparse-mode
!
interface nve1
 source-interface Loopback0
 member vni 5012 mcast-group 239.1.1.1
!
router ospf 1
 network 1.1.1.1 0.0.0.0 area 0
 network 192.168.13.0 0.0.0.255 area 0
!
ip pim bidir-enable
ip pim rp-address 3.3.3.3 bidir
!
end

hostname VTEP2
!
ip multicast-routing distributed
!
redundancy
bridge-domain 1 
 member vni 5012
 member GigabitEthernet2 service-instance 1
!
interface Loopback0
 ip address 2.2.2.2 255.255.255.255
 ip pim sparse-mode
!
interface GigabitEthernet2
 mac-address 0000.5e00.5302
 service instance 1 ethernet
  encapsulation untagged
!
interface GigabitEthernet3
 mac-address 0000.5e00.5322
 ip address 192.168.23.2 255.255.255.0
 ip pim sparse-mode
!
interface nve1
 source-interface Loopback0
 member vni 5012 mcast-group 239.1.1.1
!
router ospf 1
 network 2.2.2.2 0.0.0.0 area 0
 network 192.168.23.0 0.0.0.255 area 0
!
ip pim bidir-enable
ip pim rp-address 3.3.3.3 bidir
!
end

Conclusion

You have now learned how to configure VXLAN with multicast (bidirectional PIM), how to verify your configuration, and you have seen the flood and learn system in action. I hope you enjoyed this lesson. If you have any questions, feel free to leave a comment!

Introduction to Virtual Extensible LAN (VXLAN)

Virtual eXtensible Local Area Network (VXLAN) is a tunneling protocol that tunnels Ethernet (layer 2) traffic over an IP (layer 3) network.

Traditional layer 2 networks have issues for three main reasons:

Spanning-tree.
Limited amount of VLANs.
Large MAC address tables.

Spanning-tree blocks redundant links to avoid loops. Blocking links to create a loop-free topology gets the job done, but it also means we pay for links we can’t use. We could switch to a layer 3 network, but some technologies require layer 2 networking.

The VLAN ID is 12 bits, which means we can create 4094 VLANs (0 and 4095 are reserved). A limit of 4094 VLANs can be an issue for data centers. For example, imagine a service provider with 500 customers. With 4094 available VLANs, they can only offer 8 VLANs to each customer.
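
The arithmetic behind these numbers, as a quick Python check (the 500-customer figure is the example from the text):

```python
VLAN_ID_BITS = 12
RESERVED_VLAN_IDS = 2  # VLAN 0 and VLAN 4095

usable_vlans = 2 ** VLAN_ID_BITS - RESERVED_VLAN_IDS
customers = 500
vlans_per_customer = usable_vlans // customers  # whole VLANs only

print(usable_vlans)        # 4094
print(vlans_per_customer)  # 8
```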

Because of server virtualization, the number of entries in the MAC address tables of our switches has grown dramatically. Before server virtualization, a switch only had to learn one MAC address per switchport. With server virtualization, we run many virtual machines (VMs) or containers on a single physical server. Each VM has a virtual NIC and a virtual MAC address, so the switch has to learn many MAC addresses on a single switchport.

A Top of Rack (ToR) switch in a data center could connect to 24 or 48 physical servers. A data center could have many racks, so each switch has to store the MAC addresses of all VMs that communicate with each other. We require much larger MAC address tables compared to networks without server virtualization.

In this lesson, I’ll explain what VXLAN is, how it works, and how it solves the above layer 2 issues.

Overlay vs Underlay

VXLAN uses an overlay and underlay network:

An overlay network is a virtual network that runs on top of a physical underlay network. Even if you’ve never heard this terminology before, you have probably seen it. A GRE tunnel is a simple example of an overlay network: the GRE tunnel runs on top of a physical underlay network.

With VXLAN, the overlay is a layer 2 Ethernet network. The underlay network is a layer 3 IP network. Another name for the underlay network is a transport network.

The underlay network is simple; its only job is to get packets from A to B. We don’t use any layer 2 here, only layer 3. When we use layer 3, we can use an IGP like OSPF or EIGRP and load balance traffic on redundant links.

Another advantage is that the overlay and underlay network are independent. The overlay network is virtual and requires an underlay network, but whatever changes you make in the overlay network won’t affect the underlay network. You can add and remove links in the underlay network, and as long as your routing protocol can reach the destination, your overlay network will remain unchanged.

VNI

The VXLAN Network Identifier (VNI) identifies the VXLAN and has a similar function to the VLAN ID for regular VLANs. The VNI is 24 bits, which means we can create 2^24 = 16,777,216 (~16 million) VXLANs. That’s a lot compared to the 4094 VLANs we get with a 12-bit VLAN ID. A large service provider with even thousands of customers can use as many VXLANs per customer as needed.
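
Repeating the per-customer arithmetic from the VLAN example with a 24-bit VNI shows the difference. This simply counts every value of the 24-bit field; whether a particular platform reserves any VNI values is not covered here:

```python
VNI_BITS = 24
vni_values = 2 ** VNI_BITS   # every value of the 24-bit field
customers = 500              # same example as the VLAN calculation

print(f"{vni_values:,}")        # 16,777,216
print(vni_values // customers)  # 33554 VNIs per customer
```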

VTEP

The VXLAN tunnel endpoint (VTEP) is the device that’s responsible for encapsulating and de-encapsulating layer 2 traffic. This device is the connection between the overlay and the underlay network. The VTEP comes in two forms:

Software (host-based)
Hardware (gateway)

Let’s look at these two options.

Software

When I’m talking about hosts, I mean hypervisors like VMware’s ESXi or Microsoft’s Hyper-V. These hypervisors use virtual switches, and some of them support VXLAN. Here’s an illustration:

The VXLAN tunnels are between the virtual switches of the hypervisors. The underlay network is unaware of VXLAN.

Hardware

A hardware VTEP is a router, switch, or firewall which supports VXLAN. We also call a hardware VTEP a VXLAN gateway because it combines a regular VLAN and VXLAN segment into a single layer 2 domain. Some switches have VXLAN support with ASICs, offering better VXLAN performance than a software VTEP. Here’s what it looks like:

In the above picture, the VXLAN tunnels are between the physical switches. The devices that connect to the physical switches are unaware of VXLAN.

Interfaces

Each VTEP has two interface types:

VTEP IP interface: Connects the VTEP to the underlay network with a unique IP address. This interface encapsulates and de-encapsulates Ethernet frames.
VNI interface: A virtual interface that keeps network traffic separated on the physical interface. Similar to an SVI interface.

A VTEP can have multiple VNI interfaces, but they associate with the same VTEP IP interface. Here’s a picture to help you visualize this:

Let me explain what you see above:

We have three VTEP devices, and each VTEP has a VTEP IP interface that connects to the underlay network.
All VTEP devices have a VNI interface for VNI 5012 to create a layer 2 segment.
VTEP1 and VTEP2 also have another VNI interface for VNI 5013 to create another layer 2 segment.

VXLAN Frame Format

Let’s take a closer look at the VXLAN frame and header:

When a VTEP encapsulates an Ethernet frame, it adds a VXLAN header. In this header, we find the VNI and some flags.

The official UDP port number for VXLAN is 4789. However, it’s possible that you also run into UDP port 8472. When VXLAN was first implemented in Linux, there was no official port number yet, and many vendors used port 8472.
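The outer UDP header is small. Below is a minimal Python sketch that builds it with destination port 4789 and recognizes the legacy port 8472. For the source port, RFC 7348 recommends a hash of inner-packet fields in the 49152-65535 range so the underlay can load balance flows; hashing the inner Ethernet header with CRC32 is my own arbitrary choice, and the function names are hypothetical.

```python
import struct
import zlib

VXLAN_UDP_PORT = 4789          # IANA-assigned VXLAN port
LEGACY_VXLAN_UDP_PORT = 8472   # port used by early Linux implementations


def source_port_from_inner(inner_frame: bytes) -> int:
    """Derive a source port from the inner Ethernet header (dst MAC, src MAC, EtherType)."""
    return 49152 + zlib.crc32(inner_frame[:14]) % 16384  # stays within 49152-65535


def build_udp_header(payload: bytes, src_port: int, dst_port: int = VXLAN_UDP_PORT) -> bytes:
    """Outer UDP header; payload = VXLAN header + original Ethernet frame."""
    length = 8 + len(payload)   # the UDP header itself is 8 bytes
    checksum = 0                # RFC 7348: transmitted as zero over IPv4
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


def is_vxlan_port(port: int) -> bool:
    """True for the official port and for the legacy Linux port."""
    return port in (VXLAN_UDP_PORT, LEGACY_VXLAN_UDP_PORT)


# Hypothetical 60-byte inner frame: dst MAC, src MAC, EtherType 0x0806 (ARP), padding.
inner = bytes.fromhex("000000000002" "000000000001" "0806") + bytes(46)
sport = source_port_from_inner(inner)
udp = build_udp_header(bytes(8) + inner, sport)  # bytes(8) stands in for the VXLAN header
print(struct.unpack("!HHHH", udp)[1:])           # (4789, 76, 0)
```

Because the same inner header always hashes to the same source port, packets of one flow follow one underlay path, while different flows can spread across equal-cost paths.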

The VXLAN header looks similar to the LISP header. This is not by accident. The idea was to add layer 2 support to LISP and call it layer 2 LISP. Instead, they came up with the name VXLAN.

Packet Walkthrough

Let’s look at an example of how VXLAN encapsulates and de-encapsulates an Ethernet frame. Here’s the topology:

H1 and H2 are regular hosts and unaware of VXLAN. VTEP1 and VTEP2 are two switches that act as VTEP devices. We use VNI 5012 to encapsulate Ethernet frames between H1 and H2.

Let me walk you through this process:

H1 transmits an Ethernet frame, destined for H2.
VTEP1 receives the Ethernet frame on its VNI interface and performs the following actions:
- Look up the VNI (5012 in my example) to which H1 is attached.
- Find the mapping between the destination MAC address and remote VTEP IP address.
- Add the VXLAN header with VNI 5012.
- Add the UDP header.
- Add the outer IP header, with the VTEP IP address of VTEP1 as the source and the VTEP IP address of VTEP2 as the destination.
- Transmit the IP packet on the underlay network.
VTEP2 receives the IP packet on its VTEP IP interface and performs the following actions:
- De-encapsulate the IP packet.
- Verify whether the VNI is correct and check if there is a host that uses the destination MAC address.
- Forward the original Ethernet frame towards H2.
H2 receives the Ethernet frame.
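The walkthrough above can be sketched as a toy Python model. It works with objects instead of real bytes, and it assumes the mapping table is already filled in (the next section covers how VTEPs learn it). The port name `Gi0/1`, the MAC addresses, and the VTEP IP addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EthernetFrame:
    src_mac: str
    dst_mac: str
    payload: str


@dataclass
class VxlanPacket:
    """Simplified encapsulated packet: outer IPs, UDP port, VNI, original frame."""
    outer_src_ip: str
    outer_dst_ip: str
    udp_dst_port: int
    vni: int
    inner: EthernetFrame


class Vtep:
    def __init__(self, vtep_ip: str, port_to_vni: dict, mapping: dict, local_hosts: dict):
        self.vtep_ip = vtep_ip
        self.port_to_vni = port_to_vni   # access port -> VNI
        self.mapping = mapping           # (VNI, remote MAC) -> remote VTEP IP
        self.local_hosts = local_hosts   # (VNI, local MAC) -> access port

    def encapsulate(self, frame: EthernetFrame, in_port: str) -> VxlanPacket:
        vni = self.port_to_vni[in_port]                  # VNI the sending host is attached to
        remote_ip = self.mapping[(vni, frame.dst_mac)]   # destination MAC -> remote VTEP IP
        # VXLAN header (VNI), UDP header (port 4789), outer IP header (src/dst VTEP IPs).
        return VxlanPacket(self.vtep_ip, remote_ip, 4789, vni, frame)

    def decapsulate(self, packet: VxlanPacket) -> Optional[tuple]:
        if packet.outer_dst_ip != self.vtep_ip:
            return None                                  # not addressed to this VTEP
        out_port = self.local_hosts.get((packet.vni, packet.inner.dst_mac))
        if out_port is None:
            return None                                  # wrong VNI or unknown host: drop
        return out_port, packet.inner                    # forward the original frame


H1, H2 = "00:00:00:00:00:01", "00:00:00:00:00:02"
vtep1 = Vtep("192.168.1.1", {"Gi0/1": 5012}, {(5012, H2): "192.168.1.2"}, {(5012, H1): "Gi0/1"})
vtep2 = Vtep("192.168.1.2", {"Gi0/1": 5012}, {(5012, H1): "192.168.1.1"}, {(5012, H2): "Gi0/1"})

frame = EthernetFrame(H1, H2, "hello")
packet = vtep1.encapsulate(frame, "Gi0/1")
print(packet.outer_src_ip, packet.outer_dst_ip, packet.vni)  # 192.168.1.1 192.168.1.2 5012
print(vtep2.decapsulate(packet))
```

The original frame travels unchanged inside `VxlanPacket`, which is why H1 and H2 never need to know about VXLAN.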

Control Plane

In the packet walkthrough example, I explained that the VTEP device looks up the mapping to figure out what VTEP IP address to use to reach the destination MAC address. I didn’t explain how VTEP1 learned this mapping information. Let’s see how this works.

With a traditional VLAN, the first time two hosts communicate with each other, it goes like this:

H1 sends an ARP request.
Switches in between H1 and H2 learn the MAC address of H1.
Switches flood the ARP request.
H2 receives the ARP request.
H2 answers with an ARP reply.
Switches learn the MAC address of H2.
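For comparison with the VXLAN mapping table, here’s a minimal Python sketch of how a traditional switch learns MAC addresses and floods broadcast or unknown unicast frames. It uses a single switch with H1 on port 1 and H2 on port 2; the class name and port numbers are hypothetical.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Traditional layer 2 switch: learn source MACs, flood broadcast/unknown unicast."""

    def __init__(self, ports: list[int]):
        self.ports = ports
        self.mac_table: dict[str, int] = {}   # MAC address -> port

    def receive(self, src_mac: str, dst_mac: str, in_port: int) -> list[int]:
        """Learn the sender, then return the port(s) the frame is sent out of."""
        self.mac_table[src_mac] = in_port
        if dst_mac == BROADCAST or dst_mac not in self.mac_table:
            return [p for p in self.ports if p != in_port]   # flood
        out_port = self.mac_table[dst_mac]
        return [] if out_port == in_port else [out_port]     # filter or forward


H1, H2 = "00:00:00:00:00:01", "00:00:00:00:00:02"
sw = LearningSwitch(ports=[1, 2, 3])
print(sw.receive(H1, BROADCAST, in_port=1))   # ARP request is flooded: [2, 3]
print(sw.receive(H2, H1, in_port=2))          # ARP reply goes straight to H1: [1]
print(sw.mac_table)
```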

With VXLAN, each VTEP has a VXLAN mapping (forwarding) table that maps a destination MAC address to a remote VTEP IP address. How do VTEP devices learn MAC addresses? There are different control plane solutions. Cisco supports these four options:

VXLAN with static unicast VXLAN tunnels.
VXLAN with multicast underlay.
VXLAN with MP-BGP EVPN.
VXLAN with LISP.

The first option is simple: you manually configure the VXLAN mapping table. This works, but it’s not a scalable solution. The VXLAN standard describes the second option, which uses multicast “flood and learn” on the underlay.

Here’s how it works:

Each VNI maps to a multicast group.
The VTEP devices join the multicast group.
When VTEP1 receives the ARP request from H1, it encapsulates it and sends it to the multicast group.
VTEP2 receives the ARP request and learns the MAC address of H1.
VTEP2 stores the MAC address of H1 and the IP address of VTEP1 in the mapping table.
When VTEP2 receives the ARP reply from H2, it uses the information in the mapping table to send a unicast packet to VTEP1.
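The flood and learn steps above can be sketched in Python as a toy model: VTEPs join the multicast group of their VNI, frames for broadcast or unknown MAC addresses go to the group, and each receiving VTEP learns “inner source MAC is behind outer source IP”. The multicast group 239.1.1.1, the VTEP IP addresses, the MAC addresses, and all class names are hypothetical, and delivery to the local hosts is left out.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class Underlay:
    """Toy layer 3 underlay: delivers to a multicast group or to one VTEP IP."""

    def __init__(self):
        self.groups: dict[str, list["FloodLearnVtep"]] = {}   # group -> member VTEPs
        self.vteps: dict[str, "FloodLearnVtep"] = {}          # VTEP IP -> VTEP

    def multicast(self, group, sender, vni, src_mac, dst_mac):
        for vtep in self.groups.get(group, []):
            if vtep is not sender:
                vtep.receive(sender.vtep_ip, vni, src_mac, dst_mac)

    def unicast(self, dst_ip, sender, vni, src_mac, dst_mac):
        self.vteps[dst_ip].receive(sender.vtep_ip, vni, src_mac, dst_mac)


class FloodLearnVtep:
    def __init__(self, vtep_ip: str, vni_to_group: dict[int, str], underlay: Underlay):
        self.vtep_ip = vtep_ip
        self.vni_to_group = vni_to_group                 # each VNI maps to a multicast group
        self.underlay = underlay
        self.mapping: dict[tuple[int, str], str] = {}    # (VNI, MAC) -> remote VTEP IP
        underlay.vteps[vtep_ip] = self
        for group in vni_to_group.values():              # join the group of each VNI
            underlay.groups.setdefault(group, []).append(self)

    def from_local_host(self, vni: int, src_mac: str, dst_mac: str) -> str:
        """Encapsulate a frame from a local host; return how it was sent."""
        remote_ip = self.mapping.get((vni, dst_mac))
        if dst_mac == BROADCAST or remote_ip is None:
            self.underlay.multicast(self.vni_to_group[vni], self, vni, src_mac, dst_mac)
            return "multicast"
        self.underlay.unicast(remote_ip, self, vni, src_mac, dst_mac)
        return "unicast"

    def receive(self, outer_src_ip: str, vni: int, src_mac: str, dst_mac: str) -> None:
        # Learn: the inner source MAC sits behind the VTEP with this outer source IP.
        self.mapping[(vni, src_mac)] = outer_src_ip


H1, H2 = "00:00:00:00:00:01", "00:00:00:00:00:02"
underlay = Underlay()
vtep1 = FloodLearnVtep("192.168.1.1", {5012: "239.1.1.1"}, underlay)
vtep2 = FloodLearnVtep("192.168.1.2", {5012: "239.1.1.1"}, underlay)

print(vtep1.from_local_host(5012, H1, BROADCAST))   # ARP request is sent as multicast
print(vtep2.mapping)                                # VTEP2 learned H1 behind VTEP1
print(vtep2.from_local_host(5012, H2, H1))          # ARP reply is sent as unicast
```

Only the first frame for an unknown destination uses the multicast group; once a VTEP has learned the mapping, traffic switches to unicast between the two VTEP IP addresses.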

The MP-BGP EVPN solution is popular in data centers and private clouds. VXLAN with LISP is a popular choice for campus networks. For example, Cisco’s SD-Access uses VXLAN with LISP on the control plane.

Conclusion

Let’s summarize what we have learned:

Traditional layer 2 networks have issues:
- Spanning-tree blocks all redundant links, so we can’t use ECMP.
- Limited number of VLANs because of the 12-bit VLAN ID.
- Large MAC address tables because of server virtualization.
VXLAN uses an overlay and an underlay network:
The underlay network is 100% layer 3, so we don’t have to use spanning-tree and can use load balancing.
The overlay network is virtual.
The 24-bit VNI identifies the VXLAN and is similar to a VLAN ID. We can create about 16 million VXLANs, which is more than enough, even for large service providers.
The VTEP device encapsulates and de-encapsulates layer 2 traffic. There are two versions:
- Software: Runs on the virtual switch of a hypervisor.
- Hardware: Runs on a router, switch, or firewall. Some hardware VTEPs use ASICs for better performance.
Each VTEP has two interface types:
- VTEP IP interface: Connects the VTEP to the underlay network. This interface encapsulates and de-encapsulates VXLAN traffic.
- VNI interface: Virtual interface, similar to an SVI.
A VTEP can have multiple VNI interfaces, but they all associate with the same VTEP IP interface.
VXLAN encapsulates an Ethernet frame by adding VXLAN, UDP, and IP headers.
The VXLAN standard describes a multicast “flood and learn” solution for the control plane.
Other control plane options are static VXLAN tunnels, MP-BGP with EVPN, or VXLAN with LISP.
You can learn more about VXLAN in RFC 7348.

VXLAN has many advantages. Let me give you an overview:

VXLAN allows you to segment your network just like with VLANs, without the disadvantages of layer 2 networks.
No spanning-tree, so we can use all redundant links (ECMP).
Simple underlay network.
No 4094-VLAN limit imposed by the 12-bit VLAN ID.
You can create more than 16 million VXLANs because of the 24-bit VNI.
No need to build large layer 2 topologies and span VLANs across the entire network.
Less flooding:
- Broadcast traffic.
- Multicast traffic.
- Unknown unicast traffic.
High performance with hardware VTEPs that use ASICs.
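The numbers behind the VLAN and VNI limits follow directly from the field sizes. Here’s the arithmetic as a short Python snippet; the only outside fact is that 802.1Q reserves VLAN IDs 0 and 4095, which is where 4094 comes from.

```python
VLAN_ID_BITS = 12
VNI_BITS = 24

vlan_ids = 2 ** VLAN_ID_BITS   # 4096 possible values
usable_vlans = vlan_ids - 2    # VLAN IDs 0 and 4095 are reserved
vnis = 2 ** VNI_BITS           # 16,777,216 possible values

print(f"{usable_vlans:,} usable VLANs")            # 4,094 usable VLANs
print(f"{vnis:,} VNIs")                            # 16,777,216 VNIs
print(f"{vnis // vlan_ids}x more IDs than VLANs")  # 4096x more IDs than VLANs
```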

I hope you enjoyed this lesson. If you have any questions, please leave a comment.