Thursday, August 19, 2021

NCS Plug USB device

 RP/0/RP0/CPU0:NCS-5501-SE#show filesystem

Thu Sep 27 00:19:12.510 UTC
File Systems:

Size(b) Free(b) Type Flags Prefixes
3962216448 3851993088 flash-disk rw apphost:
5971681280 4506451968 harddisk rw harddisk:
2069835776 2048102400 flash-disk rw disk0:
0 0 network rw ftp:
31458738176 29750853632 flash-disk rw disk2: <<<<<<<<<<<<<<<<<<<<<<<
0 0 network rw tftp:
480907264 479059968 flash rw /misc/config
RP/0/RP0/CPU0:NCS-5501-SE#dir disk0:
Thu Sep 27 00:19:17.611 UTC

 


RP/0/RP0/CPU0:NCS-5501-SE#
RP/0/RP0/CPU0:NCS-5501-SE#
RP/0/RP0/CPU0:NCS-5501-SE#show log last 10
RP/0/RP0/CPU0:Sep 27 00:17:46.329 : usb_disk[67393]: %OS-SYSLOG-6-LOG_INFO : mounted device to /disk2: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<

RP/0/RP0/CPU0:NCS-5501-SE#dir disk2:
Thu Sep 27 00:21:11.084 UTC

Directory of disk2:
41 -rwxr-xr-x 1 1092833280 Feb 14 2017 ncs5500-mini-x.iso-6.1.3 <<<<<<<<<<<<<<<<<<<<<<<<<<<

30721424 kbytes total (29053568 kbytes free)
RP/0/RP0/CPU0:NCS-5501-SE#
RP/0/RP0/CPU0:NCS-5501-SE#
RP/0/RP0/CPU0:NCS-5501-SE#copy disk2:ncs5500-mini-x.iso-6.1.3 harddisk:ncs5500-mini-x.iso-6.1.3
Thu Sep 27 00:21:34.094 UTC
Destination filename [/harddisk:/ncs5500-mini-x.iso-6.1.3]?
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

Sunday, August 8, 2021

IOS XR Embedded Event Manager (EEM) Script Example

 Here is an example of an event manager script which can be used to collect a few commands when a message like the one below is logged:

 

LC/0/0/CPU0:Dec  8 06:06:40.659 : netio[211]: %PKT_INFRA-PQMON-6-QUEUE_DROP : Taildrop on XIPC queue 5 owned by ipv4_io (jid=181) 

 

So this script is an example of what we can do:

 

  • Save the syslog message.

 

  • Do some regex matching to isolate parts of the syslog message that triggered the script. It extracts the location, but we could improve it further to also extract the XIPC queue or the jid.

 

  • Open a CLI session to collect some commands (the syntax of the commands may reuse parts of the syslog message extracted in the previous step).

 

  • Save the output of the commands to a file (with a timestamp in the filename to avoid overwriting previous files).
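The regex-matching step above can be sketched in isolation. The actual punt.tcl does the equivalent in Tcl; this is only a minimal Python illustration of the matching logic, and the function name is an assumption:

```python
import re

# Hypothetical helper (the real script is Tcl): extract the location, the
# XIPC queue number, and the jid from the PQMON QUEUE_DROP syslog message.
MSG = ("LC/0/0/CPU0:Dec  8 06:06:40.659 : netio[211]: "
       "%PKT_INFRA-PQMON-6-QUEUE_DROP : Taildrop on XIPC queue 5 "
       "owned by ipv4_io (jid=181)")

def parse_pqmon(msg):
    """Return (location, queue, jid) parsed from a QUEUE_DROP message."""
    loc = re.match(r"([^:]+):", msg).group(1)            # e.g. LC/0/0/CPU0
    queue = re.search(r"XIPC queue (\d+)", msg).group(1)
    jid = re.search(r"jid=(\d+)", msg).group(1)
    return loc, queue, jid

print(parse_pqmon(MSG))   # ('LC/0/0/CPU0', '5', '181')
```

The extracted location can then be substituted into the collected commands (for example, a show command scoped to that line card).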

 

Here are the files:

 

  • punt.tcl: The script to install on the router 

 

  • how_to_install.txt: The install instructions to send to the customer

 

When you need to troubleshoot another kind of problem that is triggered by a different syslog message, the punt.tcl script can be modified:

 

- The name of the script can be changed. The install instructions in the "how_to_install.txt" will have to be modified accordingly.

- The "event_register_syslog pattern" can be modified to match the syslog message.

- The "cmd_list" can be modified to contain another set of commands.


https://drive.google.com/file/d/1wBAzHhLOkSZpK_UXL62qU8uJg_Wi_Q2n/view?usp=sharing


https://drive.google.com/file/d/1N4S_RGre-FJ_WmDZfiwOp8tEnNsAObXR/view?usp=sharing

Quick Start Guide - Data Collection for Various SD-WAN Issues

 

Introduction

This document describes several SD-WAN issues, along with the relevant data that must be collected in advance, before you open a TAC case, to improve the speed of troubleshooting and/or problem resolution. This document is broken up into two main technical sections: vManage and Edge routers. Relevant outputs and command syntax are provided depending upon the device in question.

 

Contributed by Brandon Lynch, Nilesh Khade, Cisco Engineering; Gururajan Rao, Eugene Khabarov, Cisco TAC Engineers.

 

Prerequisites

 

Requirements

Cisco recommends that you have knowledge of these topics:

  • Cisco SD-WAN architecture
  • General understanding of the solution, including the vManage controller as well as cEdge (IOS-XE SD-WAN routers) and vEdge (ViptelaOS routers) devices

Components Used

This document is not restricted to specific software and hardware versions.

 

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.

 

Base Information Requested

 

  •  Describe the problem and its impact on your network and users:
    • Describe the expected behavior.
    • Describe the observed behavior in detail.
    • Prepare a topology diagram with addressing if possible, even if this is hand-drawn.
  •  When did the problem start?
    • Note the day and time the problem was first observed/noticed.
  • What could be a potential trigger of the problem?
    • Document any recent changes made prior to when the problem started.
    • Note any specific actions or events that occurred that could have triggered the problem to start.
    • Does this problem correspond to any other network events or actions?
  • What is the frequency of the problem?
    • Was this a one-time occurrence?
    • If not, how often does the problem happen?
  • Provide information about the device(s) in question:
    • If specific devices are affected (not random), what do they have in common?
    • System-IP and Site-ID for each device.
    • If the issue is on a vManage cluster, provide the node details (if not the same across all nodes in the cluster).
    • For general issues inside of the vManage GUI, capture screenshots showing error messages or other anomalies/disparities that need to be investigated.
  • Provide information on desired outcome from the TAC and your priorities:
    • Do you want to recover from the failure as soon as possible or find out the root cause of the failure?

vManage

The issues here are common problem conditions reported for vManage, along with useful outputs for each problem that must be collected in addition to the admin-tech file(s). For cloud-hosted controllers, a Technical Assistance Center (TAC) engineer can have access to collect the required admin-tech outputs for the devices based on the feedback in the Base Information Requested section, if you provide explicit consent for this. However, we recommend you capture admin-tech outputs per the steps described here, to ensure the data contained within is relevant to the time of the problem. This is specifically true if the problem isn't persistent, meaning that the problem can disappear by the time TAC is engaged. For on-prem controllers, an admin-tech must also be included with each set of data here. For a vManage cluster, ensure you capture an admin-tech for each node in the cluster, or only for the affected node(s).

 

Slowness/Sluggishness

Problem Report: Slowness in accessing the vManage GUI, latency when performing operations inside of the GUI, general slowness or sluggishness seen within vManage

 

Step 1. Capture 2-3 instances of a thread print, renaming each thread-print file with a numerical designation after each (note the use of the username that you log in to vManage with in the file path). Example:

vManage# request nms application-server jcmd thread-print | save /home/<username>/thread-print.1

Step 2. Log in to vshell and run vmstat as below:

vManage# vshell
vManage:~$ vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 316172 1242608 5867144 0 0 1 22 3 5 6 1 93 0 0
0 0 0 316692 1242608 5867336 0 0 0 8 2365 4136 6 1 93 0 0
0 0 0 316204 1242608 5867344 0 0 0 396 2273 4009 6 1 93 0 0
0 0 0 316780 1242608 5867344 0 0 0 0 2322 4108 5 2 93 0 0
0 0 0 318136 1242608 5867344 0 0 0 0 2209 3957 9 1 90 0 0
0 0 0 318300 1242608 5867344 0 0 0 0 2523 4649 5 1 94 0 0
1 0 0 318632 1242608 5867344 0 0 0 44 2174 3983 5 2 93 0 0
0 0 0 318144 1242608 5867344 0 0 0 64 2182 3951 5 2 94 0 0
0 0 0 317812 1242608 5867344 0 0 0 0 2516 4289 6 1 93 0 0
0 0 0 318036 1242608 5867344 0 0 0 0 2600 4421 8 1 91 0 0
vManage:~$

Step 3. Collect additional details from the vshell:

vManage:~$ top (press '1' to get CPU counts)
vManage:~$ free -h
vManage:~$ df -kh

Step 4. Capture all NMS services diagnostics:

vManage# request nms application-server diagnostics
vManage# request nms configuration-db diagnostics
vManage# request nms messaging-server diagnostics
vManage# request nms coordination-server diagnostics
vManage# request nms statistics-db diagnostics

 

API Failures/Issues

Problem Report: API calls fail to return any data or the correct data, general problems executing queries

 

Step 1. Check the memory available:

vManage:~$ free -h
total used free shared buff/cache available
Mem: 31Gi 24Gi 280Mi 60Mi 6.8Gi 6.9Gi
Swap: 0B 0B 0B
vManage:~$

Step 2. Capture 2-3 instances of a thread print with a 5-second gap in between, rename each thread-print file with a numerical designation after each run of the command (note the use of the username that you log into vManage with in the file path):

vManage# request nms application-server jcmd thread-print | save /home/<username>/thread-print.1
<WAIT 5 SECONDS>
vManage# request nms application-server jcmd thread-print | save /home/<username>/thread-print.2

Step 3. Collect details for any active HTTP sessions:

vManage# request nms application-server jcmd gc-class-histo | i io.undertow.server.protocol.http.HttpServerConnection

Step 4. Provide these details:

1. API calls executed

2. Invocation frequency

3. Login method (i.e., usage of a single token to execute subsequent API calls, or usage of basic authentication to execute the call and then log out)

4. Is the JSESSIONID being re-used?
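As an illustration of the login method question, here is a minimal stdlib-only Python sketch of the token-based flow (vManage authenticates via a POST to /j_security_check, which sets the JSESSIONID cookie, then the XSRF token is fetched from /dataservice/client/token and sent on later calls). The host and credentials are placeholders, and the certificate check is disabled only on the assumption of a self-signed lab controller:

```python
import ssl
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

def login_form(user, password):
    """Body of the j_security_check POST."""
    return urllib.parse.urlencode(
        {"j_username": user, "j_password": password}).encode()

def make_opener():
    # Lab assumption: self-signed vManage certificate, so skip verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()),  # keeps JSESSIONID
        urllib.request.HTTPSHandler(context=ctx))

def login(opener, host, user, password):
    # 1. Authenticate; vManage sets the JSESSIONID cookie on success.
    opener.open(f"https://{host}/j_security_check", login_form(user, password))
    # 2. Fetch the XSRF token and attach it to every subsequent request.
    token = opener.open(f"https://{host}/dataservice/client/token").read().decode()
    opener.addheaders = [("X-XSRF-TOKEN", token)]
    return opener

# Usage (not executed here; placeholder host):
# op = login(make_opener(), "vmanage.example.com:8443", "admin", "secret")
# devices = op.open("https://vmanage.example.com:8443/dataservice/device").read()
```

Re-using one session and token this way (rather than logging in per call) is exactly the behavior TAC asks about in points 3 and 4.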

 

Note: Starting from vManage software release 19.2, only token-based authentication is supported for API calls. For more details on token generation, timeout, and expiration, see this link.

 

Deep Packet Inspection (DPI) Stats/Slowness

Problem Report: With DPI enabled, statistics processing can be slow or introduce slowness inside of the vManage GUI.

 

Step 1. Check the disk size allocated for DPI inside of vManage by navigating to Administration > Settings > Statistics Database > Configuration.

 

Step 2. Check the index health by running the following CLI command from vManage:

vManage# request nms statistics-db diagnostics

Step 3. Confirm if any API calls related to DPI stats are executed externally.

 

Step 4. Check disk I/O stats with help of this CLI command from vManage:

vManage# request nms application-server diagnostics 

 

Template Push Failures

Problem Report: Template push or device template update fails or times out.

 

Step 1. Capture the Config Preview and Intent config from vManage before you click the Configure Devices button (navigation example provided here): 

[Screenshot: Config Preview / Configure Devices navigation, 2020-08-31]

 

Step 2. Enable viptela.enable.rest.log from the logsettings page (this must be disabled after capturing the required information):

https://<vManage IP>:8443/logsettings.html​

Step 3. If the template push failure involves a NETCONF issue or error, enable viptela.enable.device.netconf.log in addition to the REST log from Step 2. Note that this log must also be disabled after the outputs from Step 3 and Step 4 are captured.

 

Step 4. Attempt to attach the failed template again from vManage and capture an admin-tech using this CLI (capture this for each node in the case of a cluster):

vManage# request admin-tech

Step 5. Provide screenshots from the task in vManage and the Config Diff to confirm the failure details along with any CSV files used for the template.

 

Step 6. Include details about the failure and task, including the time of the failed push, the system-ip of the device that failed, and the error message you see in the vManage GUI.

 

Step 7. If a template push failure happens with an error message reported for the configuration by the device itself, collect an admin-tech from the device as well.

 

Cluster-Related Issues

Problem Report: Cluster instability leading to GUI timeouts, sluggishness, or other anomalies.

 

Step 1. Capture the output from server_configs.json from each vManage node in the cluster. For example:

vmanage# vshell
vmanage:~$ cd /opt/web-app/etc/
vmanage:/opt/web-app/etc$ more server_configs.json | python -m json.tool
{
"clusterid": "",
"domain": "",
"hostsEntryVersion": 12,
"mode": "SingleTenant",
"services": {
"cloudAgent": {
"clients": {
"0": "localhost:8553"
},
"deviceIP": "localhost:8553",
"hosts": {
"0": "localhost:8553"
},
"server": true,
"standalone": false
},
"container-manager": {
"clients": {
"0": "169.254.100.227:10502"
},
"deviceIP": "169.254.100.227:10502",
"hosts": {
"0": "169.254.100.227:10502"
},
"server": true,
"standalone": false
},
"elasticsearch": {
"clients": {
"0": "169.254.100.227:9300",
"1": "169.254.100.254:9300",
"2": "169.254.100.253:9300"
},
"deviceIP": "169.254.100.227:9300",
"hosts": {
"0": "169.254.100.227:9300",
"1": "169.254.100.254:9300",
"2": "169.254.100.253:9300"
},
"server": true,
"standalone": false
},
"kafka": {
"clients": {
"0": "169.254.100.227:9092",
"1": "169.254.100.254:9092",
"2": "169.254.100.253:9092"
},
"deviceIP": "169.254.100.227:9092",
"hosts": {
"0": "169.254.100.227:9092",
"1": "169.254.100.254:9092",
"2": "169.254.100.253:9092"
},
"server": true,
"standalone": false
},
"neo4j": {
"clients": {
"0": "169.254.100.227:7687",
"1": "169.254.100.254:7687",
"2": "169.254.100.253:7687"
},
"deviceIP": "169.254.100.227:7687",
"hosts": {
"0": "169.254.100.227:5000",
"1": "169.254.100.254:5000",
"2": "169.254.100.253:5000"
},
"server": true,
"standalone": false
},
"orientdb": {
"clients": {},
"deviceIP": "localhost:2424",
"hosts": {},
"server": false,
"standalone": false
},
"wildfly": {
"clients": {
"0": "169.254.100.227:8443",
"1": "169.254.100.254:8443",
"2": "169.254.100.253:8443"
},
"deviceIP": "169.254.100.227:8443",
"hosts": {
"0": "169.254.100.227:7600",
"1": "169.254.100.254:7600",
"2": "169.254.100.253:7600"
},
"server": true,
"standalone": false
},
"zookeeper": {
"clients": {
"0": "169.254.100.227:2181",
"1": "169.254.100.254:2181",
"2": "169.254.100.253:2181"
},
"deviceIP": "169.254.100.227:2181",
"hosts": {
"0": "169.254.100.227:2888:3888",
"1": "169.254.100.254:2888:3888",
"2": "169.254.100.253:2888:3888"
},
"server": true,
"standalone": false
}
},
"vmanageID": "0"
}
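When comparing cluster nodes, the useful part of server_configs.json is the per-service host list, which should be consistent across nodes. A small illustrative sketch (SAMPLE is an abbreviated copy of the output above; the helper name is an assumption):

```python
import json

# Abbreviated server_configs.json content for illustration.
SAMPLE = """
{
  "mode": "SingleTenant",
  "services": {
    "elasticsearch": {
      "hosts": {"0": "169.254.100.227:9300", "1": "169.254.100.254:9300"}
    },
    "zookeeper": {
      "hosts": {"0": "169.254.100.227:2888:3888"}
    }
  }
}
"""

def service_hosts(text):
    """Map each NMS service to its sorted cluster host list."""
    cfg = json.loads(text)
    return {name: sorted(svc.get("hosts", {}).values())
            for name, svc in cfg["services"].items()}

print(service_hosts(SAMPLE))
```

Running this against the file from each node quickly shows whether any node disagrees about cluster membership for a given service.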

Step 2. Capture details on which services are enabled or disabled for each node. For this, navigate to Administration > Cluster Management in the vManage GUI.

 

Step 3. Confirm underlay reachability on the cluster interface. For this, run ping <ip-address> from each vManage node in VPN 0 to the cluster interface IP of the other nodes.

 

Step 4. Collect diagnostics from all NMS services for each vManage node in the cluster:

vManage# request nms application-server diagnostics
vManage# request nms configuration-db diagnostics
vManage# request nms messaging-server diagnostics
vManage# request nms coordination-server diagnostics
vManage# request nms statistics-db diagnostics

 

Edge (vEdge/cEdge)

The issues here are common problem conditions reported for Edge devices along with useful outputs for each that must be collected. Ensure that for each problem, an admin-tech is collected for all necessary and relevant Edge devices. For cloud-hosted controllers, TAC can have access to collect the required admin-tech outputs for the devices based on the feedback in the Base Information Requested section. However, as with vManage, it can be necessary to capture these before you open a TAC case to ensure the data contained within is relevant to the time of the problem. This is specifically true if the problem isn't persistent, meaning that the problem can disappear by the time TAC is engaged.

 

Control Connections Not Forming Between Device and Controller

Problem Report: Control connection not forming from a vEdge/cEdge to one or more of the controllers

 

Step 1. Identify the local/remote error of the control connection failure:

  • For vEdge: output of show control connections-history command.
  • For cEdge: output of show sdwan control connection-history command.

Step 2. Confirm the state of the TLOC(s) and that all of them show 'up':

  • For vEdge: output of show control local-properties command.
  • For cEdge: output of show sdwan control local-properties command.

Step 3. For errors around timeouts or connection failures (i.e., DCONFAIL or VM_TMO), take control-plane captures on both the edge device as well as the controller in question:

  • For controllers:
vManage# tcpdump vpn 0 interface eth1 options "-vvvvvv host 192.168.44.6"
tcpdump -p -i eth1 -s 128 -vvvvvv host 192.168.44.6 in VPN 0
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 128 bytes
20:02:07.427064 IP (tos 0xc0, ttl 61, id 50139, offset 0, flags [DF], proto UDP (17), length 168)
192.168.44.6.12346 > 192.168.40.1.12346: UDP, length 140
20:02:07.427401 IP (tos 0xc0, ttl 64, id 37220, offset 0, flags [DF], proto UDP (17), length 210)
192.168.40.1.12346 > 192.168.44.6.12346: UDP, length 182
    • For vEdge:
vEdge-INET-Branch2# tcpdump vpn 0 interface ge0/2 options "-vvvvvv host 192.168.40.1"
tcpdump -p -i ge0_2 -vvvvvv host 192.168.40.1 in VPN 0
tcpdump: listening on ge0_2, link-type EN10MB (Ethernet), capture size 262144 bytes
20:14:16.136276 IP (tos 0xc0, ttl 64, id 55858, offset 0, flags [DF], proto UDP (17), length 277)
10.10.10.1 > 192.168.40.1.12446: [udp sum ok] UDP, length 249
20:14:16.136735 IP (tos 0xc0, ttl 63, id 2907, offset 0, flags [DF], proto UDP (17), length 129)
192.168.40.1.12446 > 10.10.10.1.12346: [udp sum ok] UDP, length 101
    • For cEdge (the capture below assumes the device was moved to CLI mode and that an Access Control List (ACL) called CTRL-CAP was created to filter; see more details in the EPC capture example in the Application/Network Performance scenario):
cEdge-Branch1#config-transaction
cEdge-Branch1(config)# ip access-list extended CTRL-CAP
cEdge-Branch1(config-ext-nacl)# 10 permit ip host 10.10.10.1 host 192.168.40.1
cEdge-Branch1(config-ext-nacl)# 20 permit ip host 192.168.40.1 host 10.10.10.1
cEdge-Branch1(config-ext-nacl)# commit
cEdge-Branch1(config-ext-nacl)# end

cEdge-Branch1#monitor capture CAP control-plane both access-list CTRL-CAP buffer size 10
cEdge-Branch1#monitor capture CAP start

cEdge-Branch1#show monitor capture CAP buffer brief
----------------------------------------------------------------------------
# size timestamp source destination dscp protocol
----------------------------------------------------------------------------
0 202 0.000000 192.168.20.1 -> 50.50.50.3 48 CS6 UDP
1 202 0.000000 192.168.20.1 -> 50.50.50.4 48 CS6 UDP
2 220 0.000000 50.50.50.3 -> 192.168.20.1 48 CS6 UDP
3 66 0.000992 192.168.20.1 -> 50.50.50.3 48 CS6 UDP
4 220 0.000992 50.50.50.4 -> 192.168.20.1 48 CS6 UDP
5 66 0.000992 192.168.20.1 -> 50.50.50.4 48 CS6 UDP
6 207 0.015991 50.50.50.1 -> 12.12.12.1 48 CS6 UDP

Step 4. For other errors observed in the control connection history outputs and for more details on the issues described, refer to this guide.

 

Control Connections Flapping Between Edge Device and Controller

Problem Report: One or more control connections flap between a vEdge/cEdge and one or more controllers. This can be frequent, intermittent, or random in nature.

 

  • Control connection flaps are generally the result of packet loss or forwarding issues between a device and a controller. Often, this is tied to TMO errors, depending on the directionality of the failure. To check this further, first verify the reason for the flap:
    • For vEdge/controllers: output of show control connections-history command.
    • For cEdge: output of show sdwan control connection-history command.
  • Confirm the state of the TLOC(s) and that all of them show 'up' while the flapping occurs:
    • For vEdge: output of show control local-properties command.
    • For cEdge: output of show sdwan control local-properties command.
  •  Collect packet captures on both the controller(s) and the edge device. Please refer to the Control Connections Not Forming Between Device and Controller section for details on capture parameters for each side.

 

Bidirectional Forwarding Detection (BFD) Sessions Not Forming or Flapping Between Edge Devices

Problem Report: BFD session is down or is flapping up and down between two edge devices.

 

Step 1. Collect the state of the BFD session on each device:

  • For vEdge: output of show bfd sessions command.
  • For cEdge: output of show sdwan bfd sessions command.

Step 2. Collect Rx and Tx packet counts on each edge router:

  • For vEdge: output of show tunnel statistics bfd command.
  • For cEdge: output of show platform hardware qfp active feature bfd datapath sdwan summary command.

Step 3. If counters do not increase for the BFD session on one end of the tunnel in the outputs above, captures can be taken with ACLs to confirm whether packets are received locally. More details on this, along with other validations that can be done, can be found here.

 

Device Crashes

Problem Report: The device unexpectedly reloaded and problems with power are ruled out. Indications from the device suggest that it potentially crashed.

 

Step 1. Check the device to confirm if a crash or unexpected reload was observed:

  • For vEdge: output of show reboot history command.
  • For cEdge: output of show sdwan reboot history command.
  • Alternatively, navigate to Monitor > Network, select the device, and then navigate to System Status > Reboot to confirm if any unexpected reloads were seen.

Step 2. If confirmed, capture an admin-tech from the device through vManage by navigating to Tools > Operational Commands. Once there, select the Options button for the device and select Admin Tech. Ensure all check boxes are checked, which will include all logs and core files on the device.

 

Application/Network Performance Degraded or Failing Between Sites

Problem Report: An application does not work or HTTP pages do not load, slowness/latency in performance, failures after policy or configuration changes

 

Step 1. Identify the source/destination IP pair for an application or flow exhibiting the problem.

Step 2. Determine all Edge devices in the path and collect an admin-tech from each through vManage.

Step 3. Take a packet capture on the edge devices at each site for this flow when the problem is seen:

  • For vEdge:
    • Enable Data Stream under Administration > Settings. For the Hostname field, enter the system IP of vManage; for VPN, enter 0.
    • Ensure HTTPS is enabled under the allow-service configuration of the vManage VPN 0 interface.
    • Follow the steps here to capture traffic on the service-side VPN interface.
  • For cEdge:
    • Move the cEdge(s) to CLI mode via Configuration > Devices > Change Mode > CLI mode
    • On the cEdge(s), configure an extended ACL to match traffic bidirectionally. Make this as specific as possible to include protocol and port to limit the size and data in the capture.
  • Configure Embedded Packet Capture (EPC) for the service-side interface in both directions, using the ACL created in the previous step to filter the traffic. The capture can be exported to PCAP format and copied off the box. A sample configuration is provided here for GigabitEthernet0/0/0 on a router using an ACL named BROKEN-FLOW:
monitor capture CAP interface GigabitEthernet0/0/0 both access-list BROKEN-FLOW buffer size 10
monitor capture CAP start

show monitor capture CAP parameter
show monitor capture CAP buffer [brief]

monitor capture CAP export bootflash:cEdge1-Broken-Flow.pcap
    • Configure Packet Trace for the traffic in both directions, using the same ACL to filter the traffic. A sample configuration is provided below:
debug platform packet-trace packet 2048 fia-trace
debug platform packet-trace copy packet input l3 size 2048
debug platform condition ipv4 access-list BROKEN-FLOW both
debug platform condition start

show platform packet-trace summary
show platform packet-trace packet all | redirect bootflash:cEdge1-PT-OUTPUT.txt

Step 4. If possible, repeat Step 3 in a working scenario for comparison.

Tip: If there is no other way to copy the corresponding files off of the cEdge directly, the files can be copied to vManage first using the method described here. Run this command on vManage:
request execute scp -P 830 <username>@<cEdge system-IP>:/bootflash/<filename> .
The file is then stored in the /home/<username>/ directory for the username you used to log in to vManage. From there, you can use Secure Copy Protocol (SCP) or Secure File Transfer Protocol (SFTP) to copy the file off vManage with a third-party SCP/SFTP client or a Linux/Unix machine CLI with OpenSSH utilities.

IOS-XR Layer2 Interconnect

 1.        Data Center trends

 

Data Center Interconnect (DCI) products are targeted at the Edge or Border leaf of Data Center environments, joining Data Centers to each other in a Point-to-Point or Point-to-Multipoint fashion, or at times extending the connectivity to Internet Gateways or peering points. Cisco has two converged DCI solutions: one with integrated DWDM and another with advanced L3 routing and L2 switching technologies. A recent Dell’Oro report forecasts that the aggregate sales of equipment for DCI will grow by 85 percent over the next five years. This is driving strong demand for Ethernet Data Center switch and routing technologies.

The emerging need for simplified DCI offerings spans four core markets.

 

·       Mega Scale DC

·       Cloud DC

·       Telco Cloud

·       Large Enterprises

The emergence of cloud computing has seen a rush of traffic being centralized in regional and global data centers, with the Data Center emerging as the core of many service deliveries. More recently, ‘far edge’ compute in 5G has re-emphasized the trend, with DCs now at the core of 5G build-outs, as Web companies and SPs use automation and modern DC tools to turn up 5G sites at unprecedented rates and look at micro data centers at the edge to enhance the user experience.

 

DCI’s newest architectures are driven by massive DCs that need connecting, either by leased lines from SPs, by deploying their own fiber, or by leasing dark fiber.

 

Inside the DC they often deploy a mix of home-grown applications over well-defined technologies, mostly L2-type services to reach compute hosts at the peripherals, although we have seen recent trends of L3 being extended all the way to the compute with Segment Routing (SR).

 

Outside the DC, fiber is less abundant and inter-DC solutions are fairly standardized, with SP-class products providing the richest functionality at the most optimal scale and price point. A motivation in the last two years for further DCI upgrades has been the migration to MACsec for inter-DCI links.

 

A most recent trend is 100GE and 400GE Data Center build-outs driving DCI upgrades. We see customers migrate to higher-speed links at different inflection points, with 100GE currently being the sweet spot, creating a catalyst for Terabit platforms that support advanced L2/L3 VPN services and Route and Bridge functions; cases in point are the ASR9000 and NCS5500.

2.        ASR 9000 L2 DCI GW feature overview

 

Ethernet VPN (EVPN) and Virtual Extensible LAN (VXLAN) have become very popular technologies for Data Center (DC) fabric solutions. EVPN is used as a control plane for the VXLAN-based fabric and provides MAC address advertisements via MP-BGP. It eliminates the flood-and-learn approach of the original VXLAN standard (RFC 7348). As a result, the DC fabric can reduce unwanted flooding traffic, increase load sharing, provide faster convergence and detection of link/device failures, and simplify DC automation.

The ASR9k, as a feature-rich platform, can be used in the DC fabric as a DC edge router. With one leg in the DC and the other in the WAN, the ASR9k is the gateway for traffic leaving/entering the DC. Facing the DC fabric, the ASR9k operates as a border leaf. Facing the MPLS WAN, the ASR9k operates as a WAN edge PE. This type of router is commonly referred to as a DC Interconnect (DCI) Gateway router or an EVPN-VXLAN L2/L3 gateway.

 

There are two main use-cases that are widely known in the industry: an L3 DCI gateway as a solution for L3VPN service on a VXLAN fabric, and an L2-based gateway which provides L2 stitching between the VXLAN fabric and the MPLS-based Core.

The first use-case became available in XR release 5.3.2. It was the first phase of the ASR9K-based DCI GW solution.

In phase 2, beginning with the 6.1.1 release, EVPN Route Type 2 (MAC route) was integrated with L2 MAC learning/forwarding on the data plane. This functionality is called the EVPN-VXLAN L2 gateway. Multicast routing was used as an underlay option for distribution of BUM (Broadcast, Unknown unicast, Multicast) traffic within the VXLAN fabric. Starting with release 6.3.1, the ASR9K-based DCI solution supports an Ingress Replication underlay capability on the VXLAN fabric side.

EVPN-VXLAN L2 gateway functionality is described in the following sections.

The reference topology for this solution is shown in the figure below.

 

 


 

The Nexus 9000 plays the role of ToR/Leaf switch in this DCI solution. N9K ToRs should provide the per-VLAN first-hop L3 GW for Hosts/VMs behind the ToRs.

The ASR9K DCI GW acts as an L2 EVPN-VXLAN GW within the fabric and participates in the fabric-side EVPN control plane to learn local fabric MAC routes advertised from the ToRs, and to distribute external MAC routes, learnt from remote DCI GWs, towards the ToRs. The VXLAN data plane is used within the fabric. Fabric-side BGP-EVPN sessions between DCI GWs and ToRs can be eBGP or iBGP.

On the core side, the DCI GW does BGP-EVPN peering with remote DCI GWs to exchange MAC routes together with host IP bindings (needed for ARP suppression on the ToRs). The MPLS data plane is used on the core side. The core-side BGP-EVPN session can be eBGP or iBGP as well.

On the WAN or external core side, the ASR9K DCI GW acts as an L2 EVPN-MPLS GW that participates in the EVPN control plane with remote DCI GWs to learn external MAC routes from them and to distribute locally learnt fabric MAC routes towards them. The MPLS data plane is used with remote PODs, outside the fabric. In essence, the L2 DCI GW stitches the fabric-side and WAN-side EVPN control planes; in the data plane, it bridges traffic between the VXLAN tunnel bridge-port and the MPLS tunnel bridge-port.

 

2.1.      Distribution of BUM traffic

 

In a DC switching network, L2 BUM traffic flooding between leaf nodes (including the border leaf on the DCI GW) is necessary. There are two operational modes for BUM traffic forwarding. The first mode is called egress replication. In this mode, the VXLAN underlay is capable of both L3 unicast and multicast routing. L2 BUM traffic is forwarded using an underlay multicast tree, and packet replication is done by L3 IP multicast, an egress replication scheme.

The second mode of operation is called VXLAN Ingress Replication (VXLAN IR). This mode is used when the VXLAN underlay transport network is not capable of L3 multicasting. In this mode, the VXLAN imposition node maintains a per-VNI list of the remote VTEP nodes which service the same tenant VNI. The imposition node replicates BUM traffic to each remote VTEP node, and each copy of the VXLAN packet is sent to the destination VTEP by underlay L3 unicast transport.
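The ingress replication logic can be summarized with a small illustrative sketch (not router code; the data structure and function names are assumptions): the imposition VTEP keeps a per-VNI flood list and unicasts one copy of each BUM frame to every remote VTEP serving that VNI.

```python
# Per-VNI list of remote VTEPs servicing the same tenant VNI (sample data).
vni_flood_list = {
    10100: ["10.0.0.2", "10.0.0.3"],
    10200: ["10.0.0.3"],
}

def ingress_replicate(vni, frame, send):
    """Unicast one copy of a BUM frame to each remote VTEP in the VNI."""
    for vtep in vni_flood_list.get(vni, []):
        send(vtep, frame)

# Collect the destinations a BUM frame on VNI 10100 would be sent to.
copies = []
ingress_replicate(10100, b"BUM", lambda vtep, f: copies.append(vtep))
print(copies)   # ['10.0.0.2', '10.0.0.3']
```

This is why IR trades underlay simplicity (no multicast) for extra bandwidth at the imposition node: the replication count grows with the number of remote VTEPs per VNI.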

 

3.        Multihoming deployment models

 

Depending on the capability of the ToRs, the ASR 9000 DCI supports two multi-homing deployment models:

·       Anycast VXLAN L2 GW model

·       ESI-based multi-homing VXLAN GW model

These models are based on different mechanisms inside DC fabric for multi-homing and load-balancing between ToRs and DCI GWs. On the MPLS WAN side both models use the same implementation approach.

 

3.1.      Anycast VXLAN L2 Gateway

 

DC gateway redundancy and load sharing is a critical requirement for modern, highly scalable data centers. Today, in every new DC deployment, a multi-homing DCI gateway is a must-have requirement. The anycast VXLAN gateway is a simple approach to multi-homing. It requires the multi-homing gateway nodes to use a common VTEP IP. Gateway nodes in the same DC advertise the common VTEP IP in all EVPN routes from Type 2 to Type 5. N9k ToR nodes in the DC see one DCI GW VTEP located on multiple physical gateway nodes. Each N9k forwards traffic to the closest gateway node via IGP routing. The closest gateway is identified by the shortest distance metric or the ECMP algorithm.

Among the multi-homing DCI gateway nodes, an EVPN Ethernet Segment is created on the VXLAN-facing NVE interface. One of the nodes is elected as the Designated Forwarder (DF) for a tenant VNI; the DF node is responsible for flooding BUM traffic from the Core to the DC.

All DCI GW/PE nodes discover each other via EVPN routes advertised in the WAN. EVPN L2 tunnels are fully meshed between the DCI PE nodes attached to the MPLS Core, shown as the blue lines in the picture below.

 

[Figure: anycast VXLAN gateway topology between the DC and the MPLS Core]

 

This picture shows the topology of an anycast VXLAN gateway between the DC and the Core. In this topology, both ASR9k DCI GW nodes share a source VTEP IP address, and the N9k leaf nodes run in vPC pairs.

The redundant DCI GWs advertise the anycast VTEP loopback toward the VXLAN fabric; this loopback IP is what the ToRs use to forward traffic to the DCI GW.

 

The pictures below show the BUM traffic distribution. Flooded BUM traffic is dropped by the non-DF DCI node in both directions to break the loop: a BUM packet received from the VXLAN fabric side is dropped on the VNI port at ingress, while the copy that the DF floods on the MPLS side also reaches the peer non-DF DCI, which drops it on the VNI port in the egress direction.

 

[Figure: BUM traffic distribution from DC-1 toward the WAN]

 

 

In the direction from DC-2/DC-3 to DC-1, both ASR9k DCI GW nodes receive the same BUM traffic from the MPLS WAN. The DF for the tenant VNI forwards the traffic to DC-1, while the non-DF drops BUM traffic from the WAN.

The N9k leaf nodes work as a vPC pair, and vPC is responsible for preventing duplicate traffic toward the VM/Host.

 

[Figure: BUM traffic distribution from the WAN toward DC-1]

 

3.2.      All-Active Multi-Homing VXLAN L2 Gateway

 

Although the anycast VXLAN gateway provides a simple multi-homing solution, traffic leaving the DC may not be properly load balanced across the DCI GW nodes, because load balancing relies on the IGP shortest-path metric: an N9k often sends traffic to only one DCI gateway. To overcome this limitation, the Ethernet Segment based all-active multi-homing VXLAN L2 gateway is introduced.

 

The figure below shows the topology of the all-active multi-homing VXLAN L2 gateway. In this scenario, every leaf node and DCI node has a unique VTEP IP. Each N9k leaf node creates an EVPN Ethernet Segment (ES1) for its dual-homed Host/Server, and the ASR9k border leaf nodes create an Ethernet Segment (ES3) for the VXLAN-facing NVE interface. Traffic from the DCI GWs is load balanced between the ToR nodes, and the same happens in the opposite direction.

Each leaf sends BUM traffic to both ASR9k nodes. To prevent traffic duplication, only one of the ASR9k nodes may accept VXLAN traffic from an N9k leaf, which is enforced by the DF rule. DF election is performed per tenant VNI: half of the VNIs elect the top DCI GW as DF, and the other half elect the bottom DCI GW. The DF accepts traffic from both the DC and the WAN, while the non-DF drops traffic from both. Load balancing across VNIs is thus achieved on the two DCI L2 gateway nodes.
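The per-VNI split between the two gateways can be sketched with a modulo-based election in the style of RFC 7432 Section 8.5 (with the VNI playing the role of the Ethernet Tag, as EVPN-VXLAN overlays do). The PE addresses below are documentation values, not real configuration.

```python
def elect_df(vni, pe_ips):
    """Modulo-style DF election: order the PEs on the Ethernet Segment
    and pick the one at index (VNI mod N) for this VNI."""
    ordered = sorted(pe_ips)
    return ordered[vni % len(ordered)]

# Hypothetical top and bottom DCI GW addresses.
pes = ["192.0.2.1", "192.0.2.2"]
dfs = {vni: elect_df(vni, pes) for vni in range(10000, 10004)}
# Even VNIs elect one node and odd VNIs the other, which is how the
# per-VNI load balance across the two gateways falls out.
```

Any deterministic function agreed on by both PEs works; the modulo rule is attractive because each node can compute the result independently from the same EVPN route data.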

 

In the all-active multi-homing topology, the data plane must perform an ingress DF check on the VXLAN-fabric-facing side. With the ASM-based underlay option this is not a problem: the non-DF recognizes multicast-encapsulated packets and drops them. But if Ingress Replication is used, the DCI GW needs an additional indication to recognize unknown-unicast traffic. The data plane must therefore implement Section 8.3.3 of RFC 8365, the BUM flag in the VXLAN header. The flag identifies L2 flood traffic received from VXLAN, so a non-DF node can drop it at ingress and prevent duplicate traffic from being sent to a destination.
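The flag check itself is simple bit manipulation on the 8-byte VXLAN header. In this sketch the I flag is the 0x08 bit of the first byte per RFC 7348, but the BUM bit position chosen here is an assumption for illustration only; see RFC 8365 for the actual encoding.

```python
import struct

I_FLAG = 0x08    # "VNI present" flag, first header byte (RFC 7348)
BUM_FLAG = 0x40  # ASSUMED bit position for the BUM indication

def vxlan_header(vni, bum=False):
    """Build an 8-byte VXLAN header: flags byte + reserved, then the
    24-bit VNI shifted into the upper bits of the second word."""
    flags = I_FLAG | (BUM_FLAG if bum else 0)
    return struct.pack("!II", flags << 24, vni << 8)

def is_bum(header):
    """Ingress DF check: a non-DF node drops packets with this set."""
    return bool(header[0] & BUM_FLAG)

hdr = vxlan_header(10010, bum=True)
```

The point is that the non-DF can make the drop decision from the outer header alone, without learning anything about the inner frame.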

[Figure: all-active multi-homing VXLAN L2 gateway topology]

 

Traffic flow on the all-active multi-homing VXLAN L2 gateway is illustrated in the figure below. DC-1 outbound BUM traffic first arrives on a leaf, which replicates it to the two ASR9k DCI nodes. The DF DCI node floods the traffic to the WAN, while the non-DF node drops the traffic from the DC fabric. The traffic flooded to the WAN reaches DC-2 and DC-3, and one copy comes back to DC-1 via the bottom DCI node, which inspects the split-horizon label in the received MPLS packet and drops the packet per the split-horizon rule.
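The split-horizon check can be sketched as a set membership test. The label value is hypothetical; conceptually, the ingress PE pushes an ESI label identifying the Ethernet Segment the BUM frame entered on, and a receiving PE attached to that same segment drops the copy instead of flooding it back.

```python
# ESI labels associated with Ethernet Segments this PE is attached to
# (hypothetical value); in EVPN these are exchanged via type-1 routes.
LOCAL_ES_LABELS = {24001}

def accept_bum_from_wan(esi_label):
    """Split-horizon rule: drop the frame if it originated on an
    Ethernet Segment we are locally attached to (True = forward)."""
    return esi_label not in LOCAL_ES_LABELS
```

This is what stops the copy that loops back through the WAN from being re-injected into the DC-1 fabric.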

[Figure: BUM traffic flow from DC-1 toward the WAN in the all-active model]

 

In the reverse direction (picture below), DC-1 inbound traffic from DC-2/DC-3 arrives on both the top and bottom DCI nodes. The bottom node drops the traffic per the DF rule, while the top node forwards two copies to the remote leaf nodes. The N9k leaf nodes apply the DF rule before forwarding traffic to the Host.

 

 

 

Sources:

Jiri Chaloupka’s presentation from Cisco Live “BRKSPG-3965 EVPN Deep Dive”