Tuesday, February 25, 2020

Cisco Locator ID Separation Protocol (LISP)

Cisco Locator ID Separation Protocol (LISP) is a mapping and encapsulation protocol, originally developed to address the routing scalability issues on the Internet.





Internet routing tables have grown exponentially, putting a burden on BGP routers. Routing on the Internet is meant to be hierarchical, but because of disaggregation, a full Internet routing table nowadays contains over 800,000 prefixes.
Disaggregation is the opposite of aggregation (route summarization): instead of advertising only an aggregate (summary route), we inject more specific routes. There are two main reasons why this happens:
  • Multihoming: Customers connect to two different ISPs and advertise their provider-independent address space (PI) to both ISPs.
  • Traffic engineering: A common practice for ingress traffic engineering is to advertise a more specific route. This works, but it increases the size of the Internet routing table.
You need powerful routers with enough RAM and TCAM to store all prefixes in the Internet routing table. Injecting more specific prefixes also increases the risk of route instability. We need routers with powerful CPUs to process changes in the routing table.
With traditional IP routing, an IP address has two functions:
  • Identity: To identify the device.
  • Location: The location of the device in the network; we use this for routing.
LISP separates these two functions of an IP address into two separate functions:
  • Endpoint Identifier (EID): Assigned to hosts like computers, laptops, printers, etc.
  • Routing Locators (RLOC): Assigned to routers. We use the RLOC address to reach EIDs.
Cisco created LISP, but it’s not a proprietary solution; it’s an open standard, defined in RFC 6830. Originally it was designed for the Internet, but nowadays, you also see LISP in other environments like data centers, IoT, WAN, and the campus (Cisco SD-Access).
In this lesson, you will learn about the different LISP components and how it operates.

LISP Overview

LISP is a map and encapsulation protocol. There are three essential components in a LISP environment:
  • LISP sites: This is the EID namespace, where EIDs are.
  • non-LISP sites: This is the RLOC namespace where we find RLOCs. For example, the Internet.
  • LISP mapping service: This is the infrastructure that takes care of EID-to-RLOC mappings.
Here is a high-level simplified overview of how LISP works:
Lisp Global Overview Simplified
Let me explain what you see above:
  • We have two LISP sites, site 1 and site 2.
    • In each site, there is a host and a router configured to use LISP.
      • The hosts have an EID address:
        • H1 EID 192.168.1.101
        • H2 EID 192.168.2.102.
      • The routers have an RLOC address:
        • R1 RLOC 192.168.123.1
        • R2 RLOC 192.168.123.2
  • The RLOC space is a non-LISP area. For example, the Internet.
When H1 wants to send an IP packet to H2, here’s what happens:
  1. H1 doesn’t have anything to do with LISP and sends an IP packet to its default gateway (R1).
  2. R1 receives the IP packet and asks the LISP mapping system where it can find EID 192.168.2.102.
  3. The mapping system replies with an EID-to-RLOC mapping.
  4. R1 now knows that it can reach EID 192.168.2.102 through RLOC 192.168.123.2. The router encapsulates the IP packet with LISP encapsulation and transmits the packet.
  5. R2 receives the LISP encapsulated IP packet, de-encapsulates it, and forwards the original IP packet to H2.
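The five steps above can be sketched in a few lines of Python. This is a minimal model, not a real implementation: the mapping system is reduced to a plain dictionary, and the addresses come from the example topology.

```python
# Minimal sketch of LISP map-and-encap, using the example topology above.
# The mapping system is modeled as a dictionary (EID-prefix -> RLOC).
import ipaddress

MAPPING_SYSTEM = {
    ipaddress.ip_network("192.168.1.0/24"): "192.168.123.1",  # site 1, R1
    ipaddress.ip_network("192.168.2.0/24"): "192.168.123.2",  # site 2, R2
}

def map_request(eid: str) -> str:
    """Steps 2/3: ask the mapping system which RLOC serves this EID."""
    addr = ipaddress.ip_address(eid)
    for prefix, rloc in MAPPING_SYSTEM.items():
        if addr in prefix:
            return rloc
    raise LookupError(f"no EID-to-RLOC mapping for {eid}")

def itr_forward(src_eid: str, dst_eid: str) -> dict:
    """Step 4: encapsulate the host packet toward the resolved RLOC."""
    return {
        "outer_src": "192.168.123.1",        # R1's RLOC
        "outer_dst": map_request(dst_eid),   # resolved RLOC
        "inner": {"src": src_eid, "dst": dst_eid},
    }

packet = itr_forward("192.168.1.101", "192.168.2.102")
print(packet["outer_dst"])  # 192.168.123.2
```

At step 5, R2 would simply strip the outer addressing and deliver the inner packet to H2.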
A very simplified one-sentence explanation is that LISP is a tunneling protocol that uses a DNS-like system to figure out to which router it should send IP packets.
The LISP routers that encapsulate and de-encapsulate have a name:
  • Ingress Tunnel Router (ITR): The router that encapsulates IP packets.
  • Egress Tunnel Router (ETR): The router that de-encapsulates LISP encapsulated IP packets.
  • Tunnel Router (xTR): A router that performs both the ITR and ETR functions.
I added the ITR and ETR functions in the picture below:
Lisp Itr And Etr Function
Keep in mind that the hosts (computers, laptops, printers) don’t know anything about LISP.
  • From the LISP router’s perspective: Every endpoint (host) has an EID.
  • From the host’s perspective: It has a regular IP address. The host doesn’t even know what LISP is.
Now you know the basics of LISP, let’s add some more detail to this story.

LISP Control Plane

The LISP control plane is similar to how DNS works:
  • DNS resolves a hostname to an IP address.
  • LISP resolves an EID to an RLOC.
Lisp Comparison With Dns
With traditional IP routing, we install prefixes in the routing table. LISP doesn’t install EID-prefixes in the routing table. Instead, LISP uses a distributed mapping system where we map EIDs to RLOCs. We store these mappings in a distributed EID-to-RLOC database. When an ITR needs to find an RLOC address, it sends a Map-Request query to the mapping system.

LISP Data Plane

Once an ITR has figured out which RLOC to use to reach an EID, it encapsulates the IP packet. Let’s take a closer look at how LISP encapsulates IP packets:
Lisp Header Fields
Let me walk you through the main headers. When the ITR receives the IP packet from a host, it adds the following headers:
  • LISP Header: This header includes some LISP information needed to forward the packet. I won’t cover every bit and field here, but the instance ID is worth mentioning. The instance ID is a 24-bit value that has a similar function as the Route Distinguisher (RD) in MPLS VPN. The instance ID is a unique identifier, which keeps prefixes apart when you have overlapping (private) EID addresses in your LISP sites.
  • Outer LISP UDP header: The source port is selected by the ITR to prevent traffic from one LISP site to another from always taking the same path when there are equal-cost multipath (ECMP) links to the destination. Different source ports prevent polarization. The destination port is 4341.
  • Outer LISP IP header: Contains the source and destination RLOC IP addresses needed to route the packet from the ITR to ETR.
EIDs and RLOCs can be IPv4 or IPv6 addresses, so the LISP data-plane supports any of the following encapsulation combinations:
  • IPv4 RLOCs with IPv4 EIDs.
  • IPv4 RLOCs with IPv6 EIDs.
  • IPv6 RLOCs with IPv6 EIDs.
  • IPv6 RLOCs with IPv4 EIDs.
For a detailed explanation of all fields in the LISP header, check out RFC 6830.
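As a rough sketch of the encapsulation described above, the following Python builds the outer UDP and LISP headers around an inner packet. The layout is simplified from RFC 6830 (the outer IP header and several flag bits are omitted); the random source port mirrors the anti-polarization behavior mentioned earlier.

```python
# Simplified sketch of LISP data-plane encapsulation. Only the fields
# discussed above are modeled; this is not a full RFC 6830 implementation.
import random
import struct

LISP_DATA_PORT = 4341

def lisp_encapsulate(inner_packet: bytes, instance_id: int) -> bytes:
    # Outer UDP header: the ITR picks a source port to avoid ECMP polarization.
    src_port = random.randint(49152, 65535)
    length = 8 + 8 + len(inner_packet)           # UDP hdr + LISP hdr + payload
    udp = struct.pack("!HHHH", src_port, LISP_DATA_PORT, length, 0)

    # LISP header (8 bytes): a flags byte with the I-bit set (instance ID
    # present), a 24-bit nonce, then the 24-bit instance ID and an 8-bit
    # locator-status-bits field.
    flags = 0x08                                 # I-bit
    nonce = random.getrandbits(24)
    lisp = struct.pack("!I", (flags << 24) | nonce)
    lisp += struct.pack("!I", (instance_id << 8) | 0x01)

    return udp + lisp + inner_packet
```

The 24-bit instance ID is what keeps overlapping private EID prefixes apart, much like an RD in MPLS VPN.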

LISP Operation

You now know what LISP RLOCs and EIDs are, what an ITR and ETR do, and that LISP uses a mapping system for the control plane and how LISP encapsulates IP packets on the data plane.
Let’s dive even deeper and look at the exact steps of how the LISP mapping system works.

Mapping System

Map-Register and Map-Notify

When you configure an ETR, the router registers its EID-prefixes with the device that is responsible for keeping track of EID-to-RLOC mappings: the Map-Server (MS).
Here is an illustration of the registration process:
Lisp Map Register Map Reply Messages Example
The ETR sends a Map-Register message to the MS which contains:
  • EID-prefix: 192.168.2.0/24.
  • RLOC address: 192.168.123.2.
  • UDP source port: Chosen by the ETR.
  • UDP destination port: 4342.
The MS sends a reply called the Map-Notify to the ETR and confirms that it received and processed the Map-Register message. The Map-Notify message uses UDP source and destination port 4342.
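The registration exchange above can be sketched as follows. This models only the message contents (no real sockets); the field names are invented for illustration, while the port numbers follow the text.

```python
# Sketch of the Map-Register / Map-Notify exchange described above.
# The LISP control plane uses UDP port 4342.
LISP_CONTROL_PORT = 4342

def build_map_register(eid_prefix: str, rloc: str, src_port: int) -> dict:
    """ETR side: announce an EID-prefix and the RLOC that serves it."""
    return {
        "type": "Map-Register",
        "eid_prefix": eid_prefix,
        "rloc": rloc,
        "udp_src": src_port,            # chosen by the ETR
        "udp_dst": LISP_CONTROL_PORT,
    }

def ms_process(register: dict, mapping_db: dict) -> dict:
    """MS side: store the mapping, confirm with a Map-Notify (4342 -> 4342)."""
    mapping_db[register["eid_prefix"]] = register["rloc"]
    return {
        "type": "Map-Notify",
        "eid_prefix": register["eid_prefix"],
        "udp_src": LISP_CONTROL_PORT,
        "udp_dst": LISP_CONTROL_PORT,
    }

db = {}
notify = ms_process(build_map_register("192.168.2.0/24", "192.168.123.2", 54321), db)
```

After this exchange, the MS knows that 192.168.2.0/24 is reachable via RLOC 192.168.123.2.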

Map-Request and Map-Reply

ITRs use Map-Request messages to request an EID-to-RLOC mapping. The Map-Reply provides the mapping.
Two functions play a role here:
  • MS
  • MR (Map-Resolver)
We talked about the MS before. It’s the device where ETRs register their EID-prefixes and which stores EID-to-RLOC mappings.
When the ITR needs an EID-to-RLOC mapping, it sends a Map-Request to the MR.
When the MR receives a Map-Request, and it has an entry in its local database, then the MR responds with a Map-Reply. When it doesn’t have an entry, then the MR forwards the Map-Request to an MS. The MS forwards the Map-Request to the ETR, which answers the Map-Request with a Map-Reply directly.
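The resolver logic just described can be sketched as a small function, assuming plain dictionaries stand in for the MR's local database and the MS's registration database:

```python
# Sketch of the MR decision described above: answer locally if possible,
# otherwise hand the Map-Request to the MS, which forwards it to the
# registered ETR (and the ETR replies directly).
def mr_handle_map_request(eid_prefix: str, local_db: dict, ms_db: dict) -> dict:
    if eid_prefix in local_db:
        # MR has an entry and answers itself.
        return {"type": "Map-Reply", "from": "MR", "rloc": local_db[eid_prefix]}
    if eid_prefix in ms_db:
        # MS forwards the Map-Request to the ETR; the ETR answers directly.
        return {"type": "Map-Reply", "from": "ETR", "rloc": ms_db[eid_prefix]}
    # No one registered this prefix (this case is covered later, under PETR).
    return {"type": "Negative-Map-Reply", "from": "MS", "rloc": None}
```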
In smaller environments, we combine the MR and MS functions into a single router. We call this an MR/MS. Here is an illustration of this process:
Lisp Itr Sends Map Request To Ms Etr Sends Map Reply
Let me walk you through this process:
Step 1
Within a LISP site, we use traditional routing. Let’s say H1 in LISP site 1 wants to communicate with H2 in LISP site 2. These hosts don’t know anything about LISP. If you had multiple routers in between H1 and the ITR, they would use regular IP routing to reach the ITR.
Step 2
The ITR receives the IP packet from H1 with destination 192.168.2.102. It does a lookup in its FIB table and asks itself the following questions:
  • Is there an entry that matches 192.168.2.102?
    • If so, use regular IP routing.
    • If not, use LISP encapsulation if any of the following three checks is true:
      • We have a default route.
      • We don’t have a route.
      • We have a route with Null0 as the next hop.
  • Is the source IP a registered EID-prefix in the local map-cache?
    • If not, forward the packet with regular IP routing.
    • If so, the ITR:
      • Selects a UDP source port.
      • Sets the UDP destination port to 4342.
      • Sends an encapsulated Map-Request to the MR for 192.168.2.102.
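The Step 2 decision logic above can be condensed into a sketch, with the FIB modeled as a dictionary and the special route types reduced to simple markers:

```python
# Sketch of the ITR's Step 2 decision: only LISP-encapsulate when the FIB
# cannot route the destination natively AND the source is a registered EID.
# "default" and "Null0" are markers for a default route and a Null0 route.
def itr_decision(fib: dict, dst: str, src_is_registered_eid: bool) -> str:
    route = fib.get(dst)                 # None means: no route at all
    if route not in (None, "default", "Null0"):
        return "native-forward"          # a real, specific route exists
    if not src_is_registered_eid:
        return "native-forward"          # source is not a known EID
    return "send-map-request"            # Map-Request to the MR, UDP dst 4342
```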
Step 3
The MR and MS functions are on the same device. When the MR/MS receives the Map-Request, it forwards it to the ETR, which registered the EID-prefix.
Step 4
When the ETR receives the Map-Request, it creates and transmits a Map-Reply which includes:
  • EID-to-RLOC mapping:
    • EID: 192.168.2.0/24
    • RLOC: 192.168.123.2
  • UDP source port 4342.
  • UDP destination port is the one that the ITR selected as the UDP source port for the Map-Request.
The ETR sends the Map-Reply directly to the ITR. However, it’s also possible that the ETR requests the MS to answer the Map-Request on its behalf.
To do this, the ETR has to enable the “proxy Map-Reply” flag (P-bit) in the Map-Register message.
The ITR receives the Map-Reply from the ETR (or the MS if the ETR requested a proxy Map-Reply) and installs the EID-to-RLOC mapping in its local map-cache. The ITR also programs its FIB and is now ready to forward LISP encapsulated traffic.

LISP Data Path

Let’s see what it looks like when the ITR encapsulates the IP packet from H1 with LISP. Here is an overview:
Lisp Data Path Encapsulation
Let me explain the steps:

Step 1

The ITR receives a packet from H1 with EID 192.168.1.101 destined for H2 with EID 192.168.2.102.

Step 2

The ITR checks its FIB, finds a matching entry, encapsulates the IP packet from H1, and transmits the LISP encapsulated IP packet to the ETR:
  • Source IP: the RLOC address from ITR.
  • Destination IP: the RLOC address from the ETR.
  • Source UDP port: selected by ITR.
  • Destination UDP port: 4341.

Step 3

The ETR receives and de-encapsulates the LISP encapsulated IP packet, then forwards the IP packet to H2.

PETR

The Proxy ETR (PETR) is a router that connects to a non-LISP site like the Internet or a Data Center. Because the PETR connects with non-LISP sites, it doesn’t register EID prefixes with the mapping system.
When an ITR sends a Map-Request, and the EID is not in the mapping database system of the MS, here’s what happens:
  • The MS calculates the shortest prefix that matches the requested destination but doesn’t match any LISP EIDs.
  • The MS adds the calculated non-LISP prefix in a Negative Map-Reply.
  • The ITR can add this prefix in its map-cache and install it in the FIB.
From now on, the ITR can send traffic that matches the non-LISP prefix directly to the PETR.
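The prefix calculation can be sketched with Python's ipaddress module: walk from the least specific prefix toward more specific ones and return the first candidate that covers the destination without overlapping any registered EID-prefix. This is a simplified model of what the MS does, not its actual algorithm.

```python
# Sketch of the Negative Map-Reply calculation: find the shortest (least
# specific) prefix that covers the requested destination but does not
# overlap any registered EID-prefix.
import ipaddress

def negative_prefix(dst: str, eid_prefixes: list[str]):
    addr = ipaddress.ip_address(dst)
    eids = [ipaddress.ip_network(p) for p in eid_prefixes]
    for length in range(0, addr.max_prefixlen + 1):
        candidate = ipaddress.ip_network(f"{dst}/{length}", strict=False)
        if not any(candidate.overlaps(e) for e in eids):
            return candidate
    return None

# 0.0.0.0/0 overlaps the EID space, so the loop narrows the candidate
# until it clears all registered EID-prefixes.
print(negative_prefix("3.21.157.243", ["192.168.1.0/24", "192.168.2.0/24"]))  # 0.0.0.0/1
```

A single cached entry like 0.0.0.0/1 lets the ITR forward a huge range of non-LISP destinations to the PETR without further Map-Requests.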
Let’s look at an example. I replaced LISP site 2 with a non-LISP site and replaced the ETR with a PETR:
Lisp Petr Proxy Etr
Let me explain the steps:

Step 1

H1 wants to send an IP packet to destination IP address 3.21.157.243 (a server on the Internet named S1), so it forwards its packet to the ITR.

Step 2

The ITR doesn’t know how to reach 3.21.157.243 and sends a Map-Request to the MR to figure out what RLOC to use.

Step 3

The MR forwards the Map-Request to the MS. The MS replies with a Negative Map-Reply and includes a calculated non-LISP prefix. When the ITR receives the Negative Map-Reply, it installs the non-LISP prefix in its mapping cache and FIB.

Step 4

The ITR encapsulates the IP packet from H1 and forwards it to the PETR.

Step 5

The PETR de-encapsulates the LISP encapsulated IP packet and forwards the IP packet to 3.21.157.243.

PITR

The Proxy ITR (PITR) receives traffic from non-LISP sites destined to LISP EIDs. PITRs behave similarly to ITRs:
  • They resolve the mapping for the destination EID.
  • They encapsulate and forward traffic to the destination RLOC.
The PITR sends a Map-Request to the MR, and when it receives the Map-Reply, it encapsulates the IP packets with LISP and transmits them to the ETR.
Let’s look at an illustration:
Lisp Pitr Proxy Itr
Let me explain what we see above:

Step 1

The PITR receives traffic from S1 with destination 192.168.1.101 (H1).

Step 2

The PITR sends a Map-Request to the MR to figure out what the RLOC is for EID: 192.168.1.101.

Step 3

The MR receives the Map-Request, and forwards it to the MS. The MS forwards the Map-Request to the ETR.

Step 4

The ETR replies to the PITR with a Map-Reply which contains the EID-to-RLOC mapping:
  • EID: 192.168.1.101
  • RLOC: 192.168.123.253

Step 5

The PITR encapsulates the IP packet and forwards it to the ETR.

Step 6

The ETR receives the LISP encapsulated IP packet, de-encapsulates it, and forwards the IP packet to H1.
A router that performs both the PETR and PITR functions is a Proxy xTR (PxTR) router.

Conclusion

You have now learned the basics of LISP, a mapping and encapsulation protocol:
  • LISP was originally developed to address the routing scalability issues on the Internet.
  • Internet routing tables have grown rapidly, mainly because of multihoming and traffic engineering, where we inject specific prefixes into BGP (disaggregation).
  • With traditional IP routing, the IP address has two functions:
    • To identify the device
    • The location of the device in the network.
  • LISP separates these two functions:
    • Endpoint Identifier (EID): the IPv4 or IPv6 address of a host at a LISP site.
    • Routing Locator (RLOC): the IPv4 or IPv6 address of ETR, which faces the non-LISP network like the Internet.
  • LISP is a map and encapsulation protocol. The mapping system is similar to how DNS operates. We ask a central system what the RLOC is to reach a specific EID.
  • Ingress Tunnel Router (ITR): LISP router that encapsulates IP packets from EIDs.
  • Egress Tunnel Router (ETR): LISP router that de-encapsulates LISP encapsulated IP packets from outside of the LISP site and destined to EIDs within the LISP site.
  • Tunnel Router (xTR): LISP router that performs both the ITR and ETR functions. Most routers do this.
  • On the data plane, we add a LISP header:
    • The instance ID ensures we have unique prefixes, useful when you have overlapping EID (private) prefixes.
    • The ITR selects a UDP source port for the LISP UDP header to prevent CEF polarization when using ECMP.
  • LISP supports any combination of IPv4 and IPv6 EIDs and RLOCs.
  • An ETR uses a Map-Register message to register its EID prefixes with the MS and receives a Map-Notify message when the prefix is accepted.
  • An ITR that wants to encapsulate an IP packet from a host sends a Map-Request to the MR to figure out what the RLOC address is to reach the EID. The MR forwards the message to the MS.
  • Map-Server (MS): Device (usually a router) which learns EID-to-RLOC mapping entries from ETRs and stores them in a local mapping database.
  • Map-Resolver (MR): Device (usually a router) that receives Map-Requests from an ITR and finds the appropriate ETR to answer the Map-Request by checking the MS.
  • MS/MR: Device (usually a router) that performs both the MS and MR functions.
  • Proxy ITR (PITR): Similar to ITR but for non-LISP sites that send traffic destined to EIDs.
  • Proxy ETR (PETR): Similar to ETR. Used when a LISP site needs to send traffic to non-LISP sites.
  • Proxy xTR (PxTR): LISP router that performs both the PITR and PETR functions.
  • LISP Router: Router that performs the ITR, ETR, PITR, and PETR functions.
  • LISP is defined in detail in RFC 6830; the mapping service interface is described in RFC 6833.
I hope you enjoyed this lesson. If you have any questions, please leave a comment.

Cloud Connectivity

In this lesson, we’ll take a look at different topics related to cloud connectivity:
  • Cloud connectivity: the different options to connect to public cloud providers.
  • SD-Access: Cisco’s SDN solution for enterprise campus networks.
  • SD-WAN: the new software method to configure and manage WAN connections.
  • Virtual switching: how we connect virtual machines and containers to the rest of the network.
There is a lot to cover so let’s get started.

Cloud Connectivity

When organizations just start with public cloud services they often use VPNs over the Internet to connect their on-premises applications to the public cloud services.
Enterprise Cloud Internet Ipsec
VPNs over Internet have several advantages:
  • Cost: Internet access is cheap compared to private WAN.
  • Availability: Internet access is available almost everywhere.
  • Migration: easy to switch to another cloud provider because you can reach them all over the Internet.
  • Mobile users: if you have many mobile users then they can access cloud services whenever they have an Internet connection.
There are also several disadvantages:
  • Bandwidth: depending on your users and applications, your Internet connection might not have enough bandwidth.
  • Quality of Service (QoS): the Internet is best-effort. You can prioritize your traffic when it leaves your router but there is no end-to-end QoS.
  • SLA: most ISPs don’t offer any SLAs. If they do, your Internet connection could be almost as pricey as a private WAN solution.
If you have applications with certain bandwidth/latency/jitter requirements then VPN over the Internet might not be the best solution.
Fortunately, all large cloud providers offer dedicated connections. We’ll take a look at the top 3 cloud providers: Amazon AWS, Google Cloud, and Microsoft Azure.

Amazon AWS Direct Connect

Amazon AWS offers Direct Connect, a dedicated connection from AWS to your office, datacenter, or co-location.
You can use 802.1Q VLAN tags for multiple virtual interfaces. This allows you to use one VLAN to access public services like S3 with public IP addresses, and another VLAN for private resources like EC2 instances with private IP addresses.
You can get speeds between 1 and 10 Gbps. Lower speeds are available through APN partners that support AWS Direct Connect.

Google Cloud Dedicated Interconnect

Google Cloud Dedicated Interconnect provides a physical connection between the customer and the Google Cloud network through a Google-supported co-location. You can get speeds up to 10 Gbps per circuit, with a maximum of eight circuits per Dedicated Interconnect connection. If you don’t need as much speed, you can use the Partner Interconnect option for speeds between 50 Mbps and 10 Gbps.

Microsoft Azure ExpressRoute

With Microsoft Azure ExpressRoute, you connect directly to Azure, Office 365, and Dynamics 365 over a private WAN connection. You get access to all regions within the geopolitical region. With a premium add-on, you get connectivity across all geopolitical regions in the world. You can get speeds up to 10 Gbps.

Multicloud Connectivity

At the beginning of the cloud era, organizations typically had one or a few applications in a single cloud. Later, they started using more services but stuck to the same cloud provider.
Nowadays, there are many different cloud providers, each with its own services, strengths, and weaknesses. Many organizations are interested in using the best services from each provider, so they look into multicloud strategies.
With a multicloud strategy, you are not locked in to one cloud provider and you can build your solution based on the best services from multiple cloud providers.
There are providers, like Intercloud, that offer a private connection to multiple public cloud providers. This allows you to connect to all of them without using each cloud provider’s own dedicated connection option.

Software Defined Access (SD-Access)

Enterprise networking can get pretty complex. There’s usually a campus, some remote branches, remote workers, and we connect everything together with WAN connections. We have many devices on the physical layer including routers, switches, firewalls, wireless LAN controllers, etc.


There is a lot going on in the logical topology: we have VLANs, VRFs, routing protocols, access-lists, firewall rules, etc. We configure pretty much everything manually with perhaps a bit of network automation to make our lives easier.
Back in 2007/2008, SDN showed up with the promise of automating everything and getting rid of the CLI by defining everything in software.
SDN however, is mostly about datacenters. In the datacenter, everything is about applications. In an enterprise, it’s about (mobile) users and devices. We have users working everywhere using laptops, tablets, and smartphones.
Enterprise networks use a lot of hardware appliances. New firewall? Order a Cisco ASA. Extra wireless LAN controller (WLC)? Order another WLC appliance.
Wouldn’t it be nice if we could spin up new services for our enterprise networks, similar to how cloud services work? If you need a new firewall, just click on a button and it starts a new vASA. Need another WLC? Hit the button, and it starts a virtual WLC.
This is one of the promises of Cisco’s SD-access: complete automation of your enterprise network, similar to how SDN/cloud solutions work. Here’s what it looks like:
Cisco Sd Access Overview
There are five main components:
  • Fabric
  • APIC-EM Controller
  • Identity Services Engine (ISE)
  • Network Data Platform (NDP)
  • DNA center

Fabric

Let’s start with the fabric. This is where you find all hardware components that you are familiar with: routers, switches, wireless LAN controllers, access points, etc. This includes devices that run IOS and IOS XE.
SD-access uses an underlay responsible for forwarding traffic; its only job is to provide a transport mechanism. We keep the underlay network simple. We use an overlay network for the different services. The overlay network is flexible and programmable. Why this separation?
You don’t see this separation on most campus enterprise networks but it makes sense. For example, if you want to support a new application on your network then perhaps you have to change an existing access-list. If you change this access-list then it could also affect other applications in your network. You might want to implement the access-list changes during maintenance hours. If you mess up, you can always do a rollback.
With an underlay and overlay network, changes to the overlay network won’t affect your underlay network. It’s similar to using tunneling protocols like GRE or DMVPN. You can mess around with the tunnels, but it won’t affect the underlying network.
We use APIs to configure the hardware devices and to launch new services. It’s still possible to use the CLI for troubleshooting.
The fabric consists of three key components:
  • Control plane: based on the Locator/ID Separation Protocol (LISP)
  • Data plane: based on Virtual Extensible LAN (VXLAN)
  • Policy plane: based on Cisco TrustSec (CTS)
LISP simplifies routing by removing destination information from the routing table and moving it to a centralized mapping system. It’s similar to DNS: a router sends a query to a central LISP mapping system and asks where a destination address is. This results in smaller routing tables and requires fewer CPU cycles.
We could use LISP on the data plane, but it can only tunnel L3 traffic. SD-access uses a modified version of VXLAN on the data plane, one of the reasons is that VXLAN supports L2 encapsulation.
On the policy plane we use Cisco TrustSec, Scalable Group Tags (SGT), and the SGT Exchange Protocol (SXP). We add endpoints that require a similar network policy to a shared group, and we add an SGT to the group. The SGT is a tag that is separate from a network address and you can attach network policies (QoS, PBR, etc.) to it.
This allows you to create network policies without mapping them to IP addresses or subnets. The SGT is added in the VXLAN header.
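As a sketch of this idea, policies attach to groups instead of addresses, so an endpoint's IP can change without rewriting any policy. The group names and addresses below are invented for illustration.

```python
# Sketch of group-based policy: rules are keyed on (source group, destination
# group) pairs rather than on IP addresses or subnets.
POLICIES = {
    ("employees", "servers"): "permit",
    ("guests", "servers"): "deny",
}

# The SGT-to-endpoint binding; in SD-Access the SGT travels in the VXLAN header.
SGT_OF = {"10.1.1.50": "employees", "10.1.2.60": "guests"}

def enforce(src_ip: str, dst_group: str) -> str:
    src_group = SGT_OF[src_ip]
    return POLICIES.get((src_group, dst_group), "deny")  # default deny
```

If the employee's laptop moves to another subnet, only SGT_OF changes; the policy table stays untouched.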

APIC-EM Controller

The APIC-EM controller is Cisco’s SDN controller for enterprise networks and supports IOS/IOS XE devices. You can use this as a standalone product.
SD-access uses the APIC-EM controller as the SDN controller to control all devices in the fabric. The APIC-EM controller is controlled by DNA center.

DNA Center

DNA center is the portal where you manage everything. It’s web-based and, right now, only available as a hardware appliance. If you ask me, that’s strange considering SD-Access is all about virtualization. They will probably release a virtual version at some point.
There are four key attributes of DNA center:
  • Design
  • Policy
  • Provision
  • Assurance
You can take a look at DNA center for yourself with Cisco’s sandboxes. You can log in directly to the DNA center GUI with username “devnetuser” and password “Cisco123!”.

Design

This is where you design your entire network:
  • Build the network hierarchy: add sites, buildings, etc.
  • IP address management: add the network and subnet addresses you want to use for your networks.
  • Network settings: configure DHCP, DNS, SNMP, and Syslog servers. You also create QoS and wireless profiles here.
  • Image repository: manage all IOS/IOS XE images of your devices in one place.
Cisco Dna Center Design Network Hierarchy

Policy

This is where we configure everything that is related to network policies. You create the policies and DNA center translates your policies into configurations for the hardware devices in the fabric.
Cisco Dna Center Policy Group Based Access Control

Provision

This is where we add new devices to the network and where we apply network policies to devices.
Cisco Dna Center Provision Devices Inventory

Assurance

Assurance is where you monitor the entire network. You can see an overview of all network devices, (wireless) clients, and applications. You can monitor the health status and an overview of all issues in the network.
Cisco Dna Center Assurance Health

ISE

ISE is Cisco’s AAA product and has been out for a while now. ISE applies the policies you create through DNA center.

NDP

This is a new Cisco product. NDP is the analytics engine that analyzes all your logging information, NetFlow, SNMP, etc. It collects metrics of everything in the fabric, including devices, users, and “things” (Internet of Things). You can monitor everything that NDP collects through DNA center.

Software Defined WAN (SD-WAN)

Software Defined WAN (SD-WAN) is hot nowadays. Why?
Private WAN connections like MPLS are reliable but also expensive. WAN connections are usually a big chunk of the IT budget, so it’s understandable that organizations are interested in replacing their private WAN connections with regular Internet connections to reduce costs.

To understand SD-WAN, we first have to talk about some “problems” with traditional WAN connections. We can choose between private WAN connections or public Internet connections. Let’s compare these two options:
  • Cost: private WAN connections like MPLS are way more expensive than regular Internet connections.
  • Time to deploy: it takes longer to deploy a private WAN connection than a regular Internet connection.
  • SLA: Service providers offer SLAs for private WAN connections that we don’t have for regular Internet connections. There are providers who offer SLAs for “business” class Internet connections, but these are usually way more expensive than regular (consumer) Internet connections.
  • Packet loss: Internet connections have a higher packet loss rate compared to private WAN connections like MPLS.
  • QoS: Internet connections don’t offer any QoS. You can prioritize your outgoing traffic but that’s it, the Internet itself is like the wild west. Private WAN connections often support end-to-end QoS.
The way we use our WAN has also changed throughout the years. Most organizations had an HQ, remote users, and perhaps some branch offices. Branch offices were connected to the HQ with private WAN or VPNs over the Internet. Remote users used remote VPN over the Internet to connect.
Hq Branch Remote User Internet Wan
Nowadays, organizations also run their own applications in the cloud instead of on-premises, and they use applications like Office 365 or Gsuite. Our traffic patterns look different now:
Hq Branch Remote User Cloud Internet Wan
What about network management? Each router has its own control plane, and we use the CLI to manually create our router configurations “box-by-box”. This is time-consuming and prone to errors. We can use network automation tools to make our lives easier, but the control plane remains decentralized.
SD-WAN promises to save money by using a combination of Internet and private WAN connections and make network management much easier.
One problem with SD-WAN is that each vendor has a different idea about what SD-WAN is. I’ll give you a basic overview of what SD-WAN is about.

An SD-WAN solution has parts of the control plane centralized and is built with network automation and orchestration in mind. We create network policies globally and push them to all routers from a central location. You could create a QoS policy and push it to all your 500 branch routers with a single mouse click. We don’t use the CLI anymore. Instead, we have a GUI and use APIs to configure and manage our WAN connections. Some vendors still support a CLI if you want to do some troubleshooting.
We use multiple WAN connections and active/active per-application load-balancing. Let’s say we have a site with a fiber, cable, 4G, and DSL connection. SD-WAN monitors all these WAN connections and keeps track of performance metrics like the throughput and delay. It selects the WAN connection with the lowest latency and highest throughput.
When a certain link fails, it can fail over to the next best option. It can also do this on a per-application level. You could use the fiber connection for traffic to the public cloud and the cable connection for low-priority FTP traffic. It protects traffic over public Internet connections with IPsec.
SD-WAN could be an alternative to an expensive private WAN link with an SLA that promises “five nines” of uptime (99.999%). The idea behind it is that multiple WAN connections are always more reliable than a single WAN connection.
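The per-application path selection described above can be sketched as follows. The link names, metrics, and selection rule here are invented for illustration; they are not Cisco's actual algorithm.

```python
# Sketch of SD-WAN per-application path selection: prefer the
# highest-throughput link that meets the application's latency requirement,
# and fall back to the lowest-latency surviving link if none qualifies.
def pick_link(links: dict, max_latency_ms: float) -> str:
    ok = {name: m for name, m in links.items()
          if m["up"] and m["latency_ms"] <= max_latency_ms}
    if ok:
        return max(ok, key=lambda n: ok[n]["throughput_mbps"])
    alive = {name: m for name, m in links.items() if m["up"]}
    return min(alive, key=lambda n: alive[n]["latency_ms"])

links = {
    "fiber": {"up": True, "latency_ms": 8,  "throughput_mbps": 1000},
    "cable": {"up": True, "latency_ms": 25, "throughput_mbps": 200},
    "4g":    {"up": True, "latency_ms": 60, "throughput_mbps": 50},
}
print(pick_link(links, max_latency_ms=100))  # fiber
```

In a real deployment these metrics would come from continuous probing of each WAN connection, and the controller would push the resulting per-application policy to every edge router.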
Sd Wan Cloud Multiple Wan Links

Cisco SD-WAN Solutions

Cisco offers three SD-WAN solutions:
  • Intelligent WAN (IWAN)
  • Meraki SD-WAN
  • Cisco SD-WAN (Viptela)
IWAN is Cisco’s first SD-WAN solution for the ISR platform and intended for hybrid WAN (MPLS and Internet) or Internet-only connections.
Behind the scenes, IWAN uses familiar protocols such as DMVPN, IPsec, and Performance Routing (PfR).
Meraki SD-WAN is for existing Meraki customers that are interested in the advantages of SD-WAN. Here are some features that it offers:
  • Apply bandwidth, routing, and security policies from a central location to all WAN connections (MPLS, Internet, 4G, etc.)
  • Centralized network visibility and control.
  • QoS and bandwidth management with Meraki traffic shaping
  • Dynamic policy and performance-based path selection with automatic load balancing.
  • Secure connectivity with cloud applications, remote offices, or datacenters.

Cisco SD-WAN (Viptela)

Cisco acquired Viptela, a major SD-WAN player, in 2017 and rebranded it as Cisco SD-WAN. This is Cisco’s enterprise SD-WAN solution.

Components

This solution consists of four main components and one optional analytics component:
  • vManage (management)
  • vSmart (controller)
  • vEdge (routers)
  • vBond (orchestrator)
  • vAnalytics (analytics)
(Diagram: Cisco SD-WAN component overview)
Let me explain these components.

vManage

vManage is the Network Management System (NMS) to configure and manage the entire SD-WAN solution. You can use a GUI or REST API to access it. This is where you create device configurations and network policies. vManage also alerts you when there are events or outages.
(Screenshots: vManage dashboard, network monitoring, and software upgrade pages)

vSmart

vSmart is the control plane of the architecture. vSmart controllers advertise routes, security, and policy information. Cisco SD-WAN uses the proprietary Overlay Management Protocol (OMP) for this. vSmart implements the policies that you configure through vManage.
For example, imagine you create a policy through vManage where real-time voice traffic requires a latency of less than 100 ms. The vSmart controller downloads the policy, converts it into a format suitable for the vEdge routers and then implements it on all vEdge routers.
All vEdge routers peer with a vSmart controller in a hub and spoke topology, similar to a BGP route reflector or a DMVPN NHRP server. The vSmart controller lives only in the control plane and is never in the data plane.
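The scaling benefit of this hub and spoke control plane is easy to quantify: a full mesh between n edge routers needs n(n-1)/2 sessions, while peering every router with a central controller needs only n. A quick sketch:

```python
# Control-plane session count: full mesh between n edge routers versus
# hub-and-spoke peering with a central controller. The same math applies
# to vSmart, a BGP route reflector, or a DMVPN NHRP server.

def full_mesh_sessions(n):
    return n * (n - 1) // 2  # every router peers with every other router

def hub_and_spoke_sessions(n):
    return n  # one session per edge router, to the controller

for n in (10, 100, 500):
    print(f"{n} routers: {full_mesh_sessions(n)} mesh sessions, "
          f"{hub_and_spoke_sessions(n)} controller sessions")
```

With 500 branch routers, that is 124,750 mesh sessions versus just 500 controller sessions, which is why controllers like vSmart scale so much better.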

vEdge

vEdge routers are the software or hardware routers at your sites, responsible for the data plane. vEdge routers connect to a vSmart controller through a Datagram Transport Layer Security (DTLS) connection. If you want to use hardware, you have the following options:
  • Viptela vEdge: 100, 1000, 2000, or 5000 series routers.
  • Cisco ISR and ASR: the IOS XE SD-WAN software image allows you to use Cisco SD-WAN on the ISR 1000, ISR 4000, and ASR 1000 series.
  • Cisco ENCS: similar to the ISR series, you can use the IOS XE SD-WAN software images for the ENCS 5000 series platform.
If you want to use software, you have two options for VMs:
  • vEdge Cloud
  • Cisco Cloud Services Router (CSR)

vBond

vBond is the orchestrator. It authenticates vSmart controllers and vEdge routers and coordinates connectivity between them. It tells vEdge routers where and how to connect to vManage and vSmart controllers. vBond requires a public IP address so that all devices can connect to it. When a vEdge router joins the SD-WAN, the first thing it talks to is the vBond orchestrator.

vAnalytics

vAnalytics is an optional analytics service. It gives you visibility of applications and your infrastructure in the entire SD-WAN. You can use it for forecasting, and it gives you recommendations about your traffic and WAN connections. This can be useful to figure out whether you need to upgrade or downgrade certain WAN connections.

Cloud or on-premises

You can implement Cisco SD-WAN with a combination of cloud and on-premises options:
  • The vEdge routers and vBond orchestrator are available as hardware or VMs.
  • vManage and vSmart controllers are only available as VMs.
You can run the VMs on-premises on ESXi or KVM, or host them at cloud providers like Amazon AWS or Microsoft Azure.

Cloud onRamp

In the traditional model, you find all on-premises infrastructure and applications in a central HQ site or data center. We connect our branch offices in a hub and spoke topology and route all traffic from the branch offices to the HQ or datacenter.
Organizations nowadays often use cloud SaaS applications like Office 365, Gmail, or Salesforce. Instead of running everything on-premises, we also use IaaS services in the public cloud.
The traditional hub and spoke model where we connect and route all branch traffic to the main site or datacenter doesn’t work anymore. Cisco SD-WAN connects sites directly to these SaaS applications or IaaS services using one or more WAN connections.
There are two options:
  • Cloud onRamp SaaS
  • Cloud onRamp IaaS
Cloud onRamp SaaS monitors the performance of all WAN connections from a branch office to a SaaS application. Each path gets a “quality of experience” performance score from 0-10, 10 being the highest score. It makes real-time decisions to choose the best performing path between the end users at the branch office and the SaaS application in the cloud. You can monitor this in the vManage GUI.
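The exact formula behind the Cloud onRamp quality of experience score is not public, but the idea of turning loss and latency measurements into a 0-10 score and picking the best path can be sketched as follows. The thresholds and path names are assumptions for illustration only.

```python
# Illustrative "quality of experience" score in the 0-10 range, derived
# from latency and packet loss. This is NOT Cisco's actual formula, just
# a sketch of scoring each path and selecting the best performer.

def qoe_score(latency_ms, loss_pct, max_latency_ms=300.0, max_loss_pct=5.0):
    """Score a path from 0 (unusable) to 10 (perfect)."""
    latency_part = max(0.0, 1.0 - latency_ms / max_latency_ms)
    loss_part = max(0.0, 1.0 - loss_pct / max_loss_pct)
    return round(10.0 * latency_part * loss_part, 1)

# Hypothetical measurements from a branch office toward a SaaS application.
paths = {
    "mpls":     qoe_score(latency_ms=30,  loss_pct=0.1),
    "internet": qoe_score(latency_ms=80,  loss_pct=0.5),
    "4g":       qoe_score(latency_ms=150, loss_pct=2.0),
}
best = max(paths, key=paths.get)
print(paths, "->", best)
```

A real implementation would recompute these scores continuously and steer new sessions onto whichever path currently scores highest.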
Cloud onRamp IaaS extends the SD-WAN network into the public cloud. Through vManage, you can automatically create vEdge cloud routers in the public cloud provider infrastructure. This allows you to connect directly from your on-premises vEdge routers to the vEdge cloud routers at the public cloud provider.

Virtual Switching

VMs and containers both require network connectivity. Since they are virtual, we also need virtual networking. In this section, we’ll talk about how VMs and containers connect to the network.

Virtual Machines

VMs have virtual NICs (vNIC) which connect to virtual switches (vSwitch). A vSwitch is similar to a normal layer two switch, but it’s virtual. We use a vSwitch so that a VM on a hypervisor can communicate with other VMs or external networks outside of the physical server through the physical NIC (pNIC). You can create one or more vSwitches on a hypervisor.
(Diagram: virtual switches on a hypervisor)
One issue with vSwitches is that you have to configure them on every hypervisor. This can become an administrative burden if you have lots of hypervisors. To make our lives easier, we can use distributed virtual switching. This is a feature that aggregates the vSwitches in a cluster of hypervisors and treats them as a single distributed switch.
Distributed virtual switching has advantages:
  • Centralized management of all vSwitches in a cluster.
  • Configuration consistency.
  • Allows network policies and statistics to migrate when a VM migrates from one hypervisor to another.
Here is an overview with examples of native and third-party vSwitches you can use for different hypervisors:
  • vSphere: Standard vSwitch and Distributed Virtual Switch (native); Cisco Nexus 1000V, Cisco VM-FEX, IBM DVS 5000V, and HPE 5900v (third party).
  • Hyper-V: Hyper-V Virtual Switch (native); Cisco Nexus 1000V, Cisco VM-FEX, NEC, and Broadcom (third party).
  • KVM: Linux Bridge and Open vSwitch (OVS) (native); Cisco Nexus 1000V and Open vSwitch (OVS) (third party).
  • Xen: Open vSwitch (OVS) (native and third party).
VMware dropped support for third-party vSwitches starting with vSphere 6.5 U2.

Containers

Containers also require network connectivity, but we use virtual bridges instead of virtual switches. Docker is the most popular container engine at the moment, so let me give you an example of what Docker networking looks like:
(Diagram: Docker bridge with veth and eth0 interfaces)
Docker creates a virtual bridge called docker0 and assigns it an RFC 1918 private subnet that is not in use on the host machine. Each container connects to the docker0 bridge with a veth interface that appears as the eth0 interface inside the container. Each eth0 interface receives an IP address from the docker0 subnet.
This allows all containers on the same host to communicate with each other. If you want containers to communicate with the outside world, Docker uses iptables and NAT to forward ports from the outside physical interface to the veth interface.
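The address assignment described above can be illustrated with Python's `ipaddress` module. This is a toy allocator; real Docker uses its own IPAM with lease tracking, but the pattern is the same: the bridge takes the first host address and each new container gets the next free one.

```python
# Toy address allocator mimicking how Docker hands out container addresses
# from the docker0 bridge subnet (172.17.0.0/16 by default). Real Docker
# uses its own IPAM driver; this just takes the next free host address,
# with the first one reserved for the docker0 bridge itself.
import ipaddress

class Docker0Allocator:
    def __init__(self, subnet="172.17.0.0/16"):
        self.hosts = ipaddress.ip_network(subnet).hosts()
        self.bridge_ip = next(self.hosts)  # 172.17.0.1 goes to docker0

    def allocate(self):
        return next(self.hosts)  # next free address for a container's eth0

alloc = Docker0Allocator()
print(alloc.bridge_ip)    # 172.17.0.1 (the docker0 bridge)
print(alloc.allocate())   # 172.17.0.2 (first container's eth0)
print(alloc.allocate())   # 172.17.0.3 (second container's eth0)
```

Because every container's eth0 sits in the same subnet as docker0, containers on the same host can reach each other directly through the bridge without NAT.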
This setup with the Docker0 virtual bridge and port forwarding is fine for a single host. If you have multiple hosts, lots of containers, require networking between containers, and want to scale containers then it’s best to dive into container orchestration like Docker Swarm or Kubernetes. Kubernetes is the most popular option today, something we will discuss in another lesson.

Conclusion

In this lesson, we covered quite a few topics:
  • Cloud connectivity:
    • The advantages and disadvantages of VPN over the Internet to access public cloud services.
    • Private WAN connections for the big three cloud providers.
    • Why most organizations look for a multicloud strategy nowadays.
  • SD-Access:
    • Cisco’s SDN solution for enterprise campus networks.
    • The different components of the SD-Access architecture.
  • SD-WAN:
    • What SD-WAN is and why it is such a hot topic nowadays.
    • The three Cisco SD-WAN solutions:
      • Intelligent WAN (IWAN)
      • Meraki SD-WAN
      • Cisco SD-WAN (Viptela)
        • We took a closer look at Cisco SD-WAN and its different components.
        • We discussed Cloud onRamp so you can connect directly to SaaS applications and IaaS services in the cloud.
  • Virtual switching:
    • How VMs use vSwitches to connect to the network.
    • How Docker containers use virtual bridges to connect to the network.
I hope you enjoyed this lesson, if you have any questions, please leave a comment.