Showing posts with label SDWAN. Show all posts

Saturday, December 11, 2021

Support of: unhide viptela_internal

 

Introduction

 

Starting with the 20.4 release, support for the unhide viptela_internal command, which let TAC engineers troubleshoot customer issues, has been removed. unhide viptela_internal is no longer a valid command, and any previously hidden commands that remain for field use are either “support” commands or have been made fully supported commands.

 

 

Background

 

There are MANY hidden commands.  If you go to a Viptela device CLI you will not see “show internal” or “request internal” or “tools internal”.  But if you type “unhide viptela_internal” and then provide the password ”  ", you will then be able to see those.  And underneath them are many more commands, all usually hidden and none of them are documented.  This is considered a security violation under Cisco rules.  Because this is not documented, it is considered a back door.  Because there is a password, it appears to be a more serious back door.  And this password has been posted (by others) online. 

 

Also note that in 19.2.3, 20.1.2, 20.3.1 and 20.3.2, "unhide viptela_internal" is no longer used for access. Instead, use "unhide full". The password is the same as the one used with viptela_internal. See CSCvt00497 for more information.

 

With CSCwa45995, all traces of "unhide viptela_internal" are being removed from the cEdge platform. When the hidden config (which could be exposed via the unhide command) was removed, some commands were missed on Polaris. With this CDETS, all instances of the "viptela_internal" hidegroup will be removed from the code.

 

External Notification

 

The following CCO link is posted externally.  

For the reasons mentioned above, the password and the list of hidden commands are published at the link below.

 

https://www.cisco.com/c/dam/en/us/td/docs/routers/sdwan/Internal-Commands/Troubleshooting-Commands-f...

 

 

 

20.4 and after

 

vEdge# unhide viptela_internal
Error: unknown hide group

 

Any previously hidden commands that remain for field use have been moved under the support option.

Some commands may be missing. See below for more information.

 

vEdge# tools support ?
Possible completions:
  fp-dump   Perform fp-dump on a network interface
vEdge#

 

vEdge# show support ?     
Possible completions:
  cellular       cellular support commands
  cloudexpress   cloudexpress support commands
  control        DTLS support shell commands
  dhcp           DHCP support commands
  dnsd           dnsd support commands
  dpi            dpi support commands
  filter         filter support commands
  fp             Fast-path support commands
  ftm            ftm support commands
  nat            nat support commands
  omp            OMP support commands
  pim            pim support commands
  resolv         resolvd support commands
  tracker        tracker support commands
  ttm            TTM support commands
  vrrp           VRRP support commands
vEdge#

 

vEdge# request support ?
Possible completions:
  cellular               
  debug-malloc           Malloc-trim in a daemon
  fp                     
  router-advertisement   Enable/Disable Ipv6 Router Advertisements tx/rx interface
  software               
  tcpopt                 
  vdebug                 Control vdebug RAM disk logging
vEdge#

 

To unpin flows on vE2K:

vEdge# request support fp unpin-flows

 

Moving the device in or out of vManage mode

Currently there is no customer-facing option to move the device in or out of vManage mode. Doing so requires 'unhide viptela_internal' and then running 'no system is-vmanaged' from config mode.

In 20.4, this capability is missing. CSCvx23574 has been opened to track this and will address both cEdge and vEdge platforms.

 

 

Capturing (existing) Internal commands

 

Below are the tools, show and request internal commands as taken from a 20.3.1 node.

 

show internal

 

vEdge# show  internal ?
Possible completions:
  admin-tech     Admin-tech commands
  app-route      
  cellular       
  cfgmgr         Configuration Manager shell commands
  cflowd         
  cloudexpress   cloudexpress commands
  control        DTLS shell commands
  cxp-app        
  dbgd           
  dhcp           DHCP shell commands
  dnsd           dnsd commands
  dot1x          
  dpi            dpi commands
  filter         
  flow-db        Flow Database
  flow-summary   Flow Database Summary
  fp             Fast-path shell commands
  fpm            
  ftm            
  gps            
  igmp           
  nat            
  omp            OMP shell commands
  pim            
  policy         Policy shell commands
  resolv         
  rtm            RTM shell commands
  server-app     
  snmp           SNMP shell commands
  sysmgr         
  system         
  tcpopt-db      
  tcpopt-tcpd    
  tracker        Tracker shell commands
  ttm            TTM shell commands
  tunnel         
  vrrp           VRRP shell commands
  wlan           
  zbf            
vEdge#
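To see which show internal keywords from 20.3.1 have no same-named keyword under show support in 20.4, the two completion lists in this post can be compared mechanically. The Python sketch below is only an illustration: the function name missing_under_support is hypothetical, the keyword lists are copied from the outputs shown in this post, and a matching keyword name does not guarantee that every sub-command underneath it was carried over.

```python
# Keywords copied from the "show internal ?" output captured on a 20.3.1 node.
SHOW_INTERNAL_20_3_1 = set("""
admin-tech app-route cellular cfgmgr cflowd cloudexpress control cxp-app dbgd
dhcp dnsd dot1x dpi filter flow-db flow-summary fp fpm ftm gps igmp nat omp pim
policy resolv rtm server-app snmp sysmgr system tcpopt-db tcpopt-tcpd tracker
ttm tunnel vrrp wlan zbf
""".split())

# Keywords copied from the "show support ?" output (20.4 and after).
SHOW_SUPPORT_20_4 = set("""
cellular cloudexpress control dhcp dnsd dpi filter fp ftm nat omp pim resolv
tracker ttm vrrp
""".split())


def missing_under_support(internal: set, support: set) -> list:
    """Return the internal keywords that have no same-named support keyword."""
    return sorted(internal - support)


missing = missing_under_support(SHOW_INTERNAL_20_3_1, SHOW_SUPPORT_20_4)
print(f"{len(missing)} of {len(SHOW_INTERNAL_20_3_1)} keywords are not under 'show support':")
print(", ".join(missing))
```

The same comparison can be repeated for the request and tools lists by swapping in their keywords.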

 

request internal

 

vEdge# request internal ?
Possible completions:
  cloudexpress      Cloudexpress related tools command
  embargo           vEdge embargo check
  fec               
  fp-dump           Perform fp-dump on a network interface
  ftm               
  interface-reset   
  live-core         Generate non-disruptive coredump of a running process
  malloc-trim       Malloc-trim in a daemon
  reset             Reset system or logs
  software          
  tcpopt            
  vdebug            Control vdebug RAM disk logging
  vedge-cloud       vEdge cloud internal commands
vEdge#

 

tools internal

 

vEdge# tools internal ?
Possible completions:
  clean_db            Remove vManage data
  csr_read            Reading cavium registers.
  csr_write           Writing into cavium registers.
  ethtool             ethtool
  firmware-printenv   Display environment variables.
  fp-dump             Perform fp-dump on a network interface
  hostapd_cli         hostapd_cli
  i2cdetect           i2cdetect tool.(Only for Mips)
  i2cdump             i2cdump tool.
  i2cget              i2cget tool. (only for Mips)
  i2cset              i2cset tool.
  mdio-read           mdio-read
  mdio-write          mdio-write
  mii-tool            mii-tool
  oui-lookup          Perform OUI lookup for show arp.
  poe-tool            poe-tool
  process_id          Find process ID.
  remove_tenancy      Remove Tenancy file on vManage
  tlv_tool            TLV tool.(Only for Mips)
  touch_test_root     Create or remove /usr/share/viptela/test_root for allowing any root cert for sw vedges.
  tracker             Add Latency on the interface for tracker packets
  valgrind_tool       Enable valgrind on a process.
vEdge#

 

 

Troubleshooting SD-WAN cEdge IPsec Replay Failures

 

Introduction

 

IPsec authentication provides built-in anti-replay protection against old or duplicated IPsec packets by checking the sequence number in the ESP header on the receiver. Anti-replay packet drops are one of the most common data-plane issues with IPsec, caused by packets that are delivered out of order and fall outside of the anti-replay window. A general troubleshooting approach for IPsec anti-replay drops is documented separately, and the general technique applies to SD-WAN as well. However, there are some implementation differences between traditional IPsec and the IPsec used in the Cisco SD-WAN solution. This article explains these differences and the troubleshooting approach on the cEdge platforms running IOS-XE.

 

SDWAN Replay Detection Considerations

 

Group key vs. Pairwise key

 

Unlike traditional IPsec, where IPsec SAs are negotiated between two peers using the IKE protocol, SD-WAN uses a group key concept. In this model, an SD-WAN edge device periodically generates a data plane inbound SA per TLOC and sends these SAs to the vSmart controller, which in turn propagates the SAs to the rest of the edge devices in the SD-WAN network. For a more detailed description of the SD-WAN data plane operations, see SD-WAN Data Plane Security Overview.

 

Note: Starting from IOS-XE 16.12.1a/SD-WAN 19.2, IPsec pairwise keys are supported. See IPsec Pairwise Keys Overview. With pairwise keys, IPsec anti-replay protection works exactly like traditional IPsec. This article focuses primarily on the replay check with the group key model.

 

SPI Encoding

 

In the IPsec ESP header, the SPI (Security Parameter Index) is a 32-bit value that the receiver uses to identify the SA with which an incoming packet should be decrypted. With SD-WAN, this inbound SPI can be identified with show crypto ipsec sa:

 

cedge-2#show crypto ipsec sa | se inbound
     inbound esp sas:
      spi: 0x123(291)
        transform: esp-gcm 256 ,
        in use settings ={Transport UDP-Encaps, esn}
        conn id: 2083, flow_id: CSR:83, sibling_flags FFFFFFFF80000008, crypto map: Tunnel1-vesen-head-0
        sa timing: remaining key lifetime 9410 days, 4 hours, 6 mins
        Kilobyte Volume Rekey has been disabled
        IV size: 8 bytes
        replay detection support: Y
        Status: ACTIVE(ACTIVE)

Note: The SPI displayed with this command may not be the actual SA used in the data plane due to CSCvt06182 .

 

Notice that even though this inbound SPI is the same for all the tunnels, the receiver has a different SA, and a corresponding replay-window object, for each peer edge device, since the SA is identified by the 4-tuple of source IP address, destination IP address, source port, and destination port, together with the SPI number. So essentially, each peer has its own anti-replay window object.
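The per-peer behavior can be pictured as a lookup table keyed by the 4-tuple plus the SPI. The Python sketch below is a conceptual model only, not the data-plane implementation: ReplayState, sa_table and lookup_sa are hypothetical names, and the second peer address (172.18.124.210) is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReplayState:
    """Minimal stand-in for a per-SA replay-window object."""
    highest_seq: int = 0


# Hypothetical table: one replay state per (src IP, dst IP, src port, dst port, SPI).
sa_table = {}


def lookup_sa(src_ip: str, dst_ip: str, src_port: int, dst_port: int, spi: int) -> ReplayState:
    """Return the replay state for this key, creating it on first use."""
    key = (src_ip, dst_ip, src_port, dst_port, spi)
    if key not in sa_table:
        sa_table[key] = ReplayState()
    return sa_table[key]


# Two peers send to the same receiver with the same inbound SPI 0x123.
peer_a = lookup_sa("172.18.124.208", "172.18.124.209", 12386, 12346, 0x123)
peer_b = lookup_sa("172.18.124.210", "172.18.124.209", 12386, 12346, 0x123)

peer_a.highest_seq = 312  # traffic from peer A advances only peer A's window
print(peer_a.highest_seq, peer_b.highest_seq)
```

Because the source address is part of the key, sequence numbers received from one peer never move the window that is used for another peer, even though both use the same SPI.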

 

When looking at the actual packet sent by the peer device, one may notice the SPI value is different from the above output. Here is an example from the packet-trace output with the packet copy option enabled:

 

Packet Copy In
  45000102 0cc64000 ff111c5e ac127cd0 ac127cd1 3062303a 00eea51b 04000123
  00000138 78014444 f40d7445 3308bf7a e2c2d4a3 73f05304 546871af 8d4e6b9f

The actual SPI in the ESP header is 0x04000123. The reason is that the leading bits of the SPI field in SD-WAN are encoded with additional information, and only the low-order bits of the field are allocated to the actual SPI.
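The SPI field and the ESP sequence number can be pulled out of the packet copy without counting hex digits by hand. The following Python sketch assumes the layout seen in this capture (IPv4, then UDP, then ESP); the function name parse_esp_in_udp is hypothetical and the hex is the Packet Copy In output shown above.

```python
import ipaddress
import struct

PACKET_COPY = """
45000102 0cc64000 ff111c5e ac127cd0 ac127cd1 3062303a 00eea51b 04000123
00000138 78014444 f40d7445 3308bf7a e2c2d4a3 73f05304 546871af 8d4e6b9f
"""


def parse_esp_in_udp(hex_dump: str) -> dict:
    """Decode the IPv4/UDP/ESP headers from a packet-trace hex dump."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    ihl = (raw[0] & 0x0F) * 4  # IPv4 header length in bytes
    if raw[9] != 17:  # IP protocol 17 = UDP
        raise ValueError("packet is not UDP-encapsulated")
    src_port, dst_port = struct.unpack_from("!HH", raw, ihl)
    # The ESP header follows the 8-byte UDP header: 32-bit SPI field, 32-bit sequence.
    spi_field, seq = struct.unpack_from("!II", raw, ihl + 8)
    return {
        "src_ip": str(ipaddress.IPv4Address(raw[12:16])),
        "dst_ip": str(ipaddress.IPv4Address(raw[16:20])),
        "src_port": src_port,
        "dst_port": dst_port,
        "spi_field": spi_field,
        "seq": seq,
    }


pkt = parse_esp_in_udp(PACKET_COPY)
print(pkt["src_ip"], "->", pkt["dst_ip"], hex(pkt["spi_field"]), "seq", pkt["seq"])
```

With ESN in use, only the low-order 32 bits of the sequence number are carried in the packet, so the value read here is that low-order part.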

 

Traditional IPsec:

 

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Security Parameters Index (SPI)                 | 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 

SD-WAN:

 

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
|  CTR  | MSNS|         Security Parameters Index (SPI)         | 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 

Where:

 

  • CTR (first 4 bits, bits 0-3) - Control Bits, used to indicate specific types of control packets. For example, control bit 0x80000000 is used for BFD.
  • MSNS (next 3 bits, bits 4-6) - Multiple Sequence Number Space Index. This is used to locate the correct sequence counter in the sequence counter array to check for replay for the given packet. For SD-WAN, the 3 bits of MSNS allow 8 different traffic classes to be mapped into their own sequence number spaces. This implies that the effective SPI value that can be used for SA selection is reduced to the low-order 25 bits of the full 32-bit field. More on this below.
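The field layout above translates directly into shifts and masks. This Python sketch splits a 32-bit SPI field into its CTR, MSNS and SPI parts, following the 4/3/25 bit split described in this article; the function name decode_sdwan_spi is hypothetical.

```python
def decode_sdwan_spi(spi_field: int) -> dict:
    """Split the 32-bit ESP SPI field into CTR (4 bits), MSNS (3 bits), SPI (25 bits)."""
    if not 0 <= spi_field <= 0xFFFFFFFF:
        raise ValueError("SPI field must fit in 32 bits")
    return {
        "ctr": (spi_field >> 28) & 0xF,   # bits 0-3 (most significant)
        "msns": (spi_field >> 25) & 0x7,  # bits 4-6
        "spi": spi_field & 0x01FFFFFF,    # low-order 25 bits
    }


fields = decode_sdwan_spi(0x04000123)
print(f"ctr={fields['ctr']:#x} msns={fields['msns']} spi={fields['spi']:#x}")
```

For the packet shown earlier, 0x04000123 decodes to MSNS 2 and SPI 0x123, which matches the SPI reported by show crypto ipsec sa.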

 

 

Multiple Sequence Number Space for QoS

 

It is common to observe IPsec replay failures in an environment where packets are delivered out of order due to QoS, e.g., LLQ, since QoS always runs after IPsec encryption and encapsulation. The Multiple Sequence Number Space solution solves this problem by maintaining multiple sequence number spaces, mapped to different QoS traffic classes, for a given Security Association. Each sequence number space is indexed by the MSNS bits encoded in the ESP packet SPI field as depicted above. For a more detailed description, please see IPsec Anti Replay Mechanism for QoS.
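The effect can be illustrated with a small simulation. The Python sketch below is a simplified model and not the platform implementation: it assumes a 64-packet window, no ESN, a burst of 300 packets in which every tenth packet is priority traffic, and an extreme LLQ that drains all priority packets before any bulk packet. All names are hypothetical.

```python
class ReplayWindow:
    """Simplified sliding-window anti-replay check (no ESN handling)."""

    def __init__(self, size: int):
        self.size = size
        self.highest = 0
        self.seen = set()

    def accept(self, seq: int) -> bool:
        if seq > self.highest:
            self.highest = seq
            self.seen.add(seq)
            # Forget sequence numbers that fell off the left edge of the window.
            self.seen = {s for s in self.seen if s > self.highest - self.size}
            return True
        if seq <= self.highest - self.size or seq in self.seen:
            return False  # too old or duplicate
        self.seen.add(seq)
        return True


def build_burst(per_class_space: bool, total: int = 300, priority_every: int = 10) -> list:
    """Number packets at encryption time, then reorder them as the LLQ would."""
    next_seq, sent = {}, []
    for i in range(1, total + 1):
        traffic_class = 1 if i % priority_every == 0 else 0
        space = traffic_class if per_class_space else 0
        next_seq[space] = next_seq.get(space, 0) + 1
        sent.append((traffic_class, space, next_seq[space]))
    # QoS runs after encryption: priority packets leave first (the sort is stable).
    return sorted(sent, key=lambda packet: -packet[0])


def count_drops(arrivals: list, window_size: int = 64) -> int:
    windows, drops = {}, 0
    for _traffic_class, space, seq in arrivals:
        if space not in windows:
            windows[space] = ReplayWindow(window_size)
        if not windows[space].accept(seq):
            drops += 1
    return drops


single_space_drops = count_drops(build_burst(per_class_space=False))
multi_space_drops = count_drops(build_burst(per_class_space=True))
print("single sequence space drops:", single_space_drops)
print("per-class sequence space drops:", multi_space_drops)
```

With one shared sequence space, the priority packets push the window forward and most of the delayed bulk packets arrive behind its left edge. With one space per class, packets stay in order within their own space and nothing is dropped.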

 

As noted above, this Multiple Sequence Number Space implementation implies that the effective SPI value that can be used for SA selection is reduced to the low-order 25 bits. Another practical consideration when configuring the replay window size with this implementation is that the configured replay-window size is for the aggregate replay window, so the effective replay window size for each Sequence Number Space is 1/8 of the aggregate. For example, with the following configuration:

 

config-t
security
ipsec
replay-window 1024
commit

 

The effective replay window size for each Sequence Number Space is 1024/8 = 128.
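The same arithmetic can be written as a small helper, which is convenient when working backwards from the per-space window that is wanted. This is a minimal Python sketch with hypothetical function names; it only encodes the divide-by-8 rule described here.

```python
SEQUENCE_NUMBER_SPACES = 8  # 3 MSNS bits give 2**3 sequence number spaces


def effective_window(aggregate: int) -> int:
    """Per-space replay window for a configured aggregate replay-window."""
    return aggregate // SEQUENCE_NUMBER_SPACES


def aggregate_for(per_space: int) -> int:
    """Aggregate replay-window needed to obtain the wanted per-space window."""
    return per_space * SEQUENCE_NUMBER_SPACES


for aggregate in (1024, 8192):
    print(f"replay-window {aggregate} -> {effective_window(aggregate)} packets per space")
```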

 

Note: starting from IOS-XE 17.2.1, the aggregate replay window size has been increased to 8192 so that each Sequence Number Space can have a maximum replay window of 8192/8 = 1024 packets. This change was introduced with CSCvs51630 .

 

On an IOS-XE cEdge device, the last sequence number received for each sequence number space can be obtained from the following IPsec dataplane output:

 

cedge-2#show crypto ipsec sa peer 172.18.124.208 platform

<snip>

------------------ show platform hardware qfp active feature ipsec datapath crypto-sa 5 ------------------

 Crypto Context Handle: ea54f530
 peer sa handle: 0
 anti-replay enabled
 esn enabled
 Inbound SA
 Total SNS: 8
 Space                highest ar number
 ----------------------------------------
   0                               39444
   1                                   0
   2                                1355
   3                                   0
   4                                   0
   5                                   0
   6                                   0
   7                                   0
<snip>

In the above example, the highest anti-replay sequence number (the right edge of the anti-replay sliding window) for MSNS 0 (0x00) is 39444, and that for MSNS 2 (0x04) is 1355. These counters are used to check whether the sequence number is inside the replay window for packets in the same sequence number space.
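To make the per-space check concrete, the sketch below classifies a sequence number against the right edge of its own space. It is a simplified Python model under stated assumptions: the per-space window of 128 is taken from the replay-window 1024 example, ESN wrap is ignored, and duplicate detection inside the window (which needs a bitmap) is left out. The names HIGHEST_AR and classify are hypothetical.

```python
# "highest ar number" per sequence number space, copied from the output above.
HIGHEST_AR = {0: 39444, 1: 0, 2: 1355, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0}

PER_SPACE_WINDOW = 128  # assumption: aggregate replay-window 1024 / 8 spaces


def classify(seq: int, msns: int, window: int = PER_SPACE_WINDOW) -> str:
    """Compare a sequence number with the right edge of its own space."""
    highest = HIGHEST_AR[msns]
    if seq > highest:
        return "advances-window"
    if highest - seq >= window:
        return "outside-window"  # would be dropped as a replay failure
    return "inside-window"       # still subject to the duplicate check


print(classify(1300, msns=2))   # checked against 1355
print(classify(1300, msns=0))   # the same number checked against 39444
print(classify(39445, msns=0))
```

The same sequence number can be acceptable in one space and far outside the window in another, which is why the MSNS bits of the packet have to be decoded before comparing against the counters.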

 

Note: There are implementation differences between the ASR1k platform and the rest of the IOS-XE routing platforms (ISR4k, ISR1k, CSR1kv). As a result, there are some discrepancies in the show commands and their output for these platforms. Currently, there is no command that displays the inbound top replay window edge on the ASR1k platform. This will hopefully be addressed in 17.3 as part of our serviceability effort.

 

Troubleshooting Replay Drop Failures

 

Troubleshooting Data Collection

 

When dealing with IPsec anti-replay drops, it's important to understand the conditions and potential triggers of the problem. At a minimum, collect the following information to provide context:

 

  • Device information for both the sender and receiver for the replay packet drops, including type of device, cEdge vs. vEdge, software version, and configuration.
  • Problem history. How long has the deployment been in place? How long has the problem been happening? Any recent changes to the network or traffic conditions.
  • Any pattern to the replay drops, e.g., is it sporadic or constant? Time of the problem and/or significant event, e.g., does it only happen during high traffic peak production hours, or only during rekey, etc.?


With the above information collected, proceed with the following troubleshooting workflow.

 

Troubleshooting workflow

 

The general troubleshooting approach for IPsec replay issues is the same as for traditional IPsec, taking into account the per-peer SA sequence space and the Multiple Sequence Number Space explained above. Follow these steps:

 

1. First identify the peer for the replay drop from the syslog, and the drop rate. For drop statistics, always collect multiple timestamped snapshots of the output so that the drop rate can be quantified:

 

*Feb 19 21:28:25.006: %IOSXE-3-PLATFORM: R0/0: cpp_cp: QFP:0.0 Thread:000 TS:00001141238701410779 %IPSEC-3-REPLAY_ERROR: IPSec SA receives anti-replay error, DP Handle 6, src_addr 172.18.124.208, dest_addr 172.18.124.209, SPI 0x123

cedge-2#show platform hardware qfp active feature ipsec datapath drops
Load for five secs: 1%/0%; one minute: 1%; five minutes: 1%
No time source, *11:25:53.524 EDT Wed Feb 26 2020
------------------------------------------------------------------------
Drop Type  Name                                     Packets
------------------------------------------------------------------------
        4  IN_US_V4_PKT_SA_NOT_FOUND_SPI                              30
       19  IN_CD_SW_IPSEC_ANTI_REPLAY_FAIL                            41

It's not uncommon to see occasional replay drops due to packet reordering in the network, but persistent replay drops that are service impacting should be investigated.
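Quantifying the drop rate from two snapshots is simple enough to script. The Python sketch below parses the datapath drops table and computes drops per second; the first snapshot is the output shown above, while the second snapshot and the 60-second interval are invented purely for illustration. Function names are hypothetical.

```python
import re

FIRST_SNAPSHOT = """\
Drop Type  Name                                     Packets
------------------------------------------------------------------------
        4  IN_US_V4_PKT_SA_NOT_FOUND_SPI                              30
       19  IN_CD_SW_IPSEC_ANTI_REPLAY_FAIL                            41
"""

# Made-up second snapshot, assumed to be taken 60 seconds later.
SECOND_SNAPSHOT = """\
Drop Type  Name                                     Packets
------------------------------------------------------------------------
        4  IN_US_V4_PKT_SA_NOT_FOUND_SPI                              30
       19  IN_CD_SW_IPSEC_ANTI_REPLAY_FAIL                           101
"""


def parse_drop_counters(output: str) -> dict:
    """Map drop name to packet count for every '<type> <name> <packets>' row."""
    counters = {}
    for line in output.splitlines():
        match = re.match(r"\s*(\d+)\s+(\S+)\s+(\d+)\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(3))
    return counters


def drop_rate(first: dict, second: dict, seconds: float,
              name: str = "IN_CD_SW_IPSEC_ANTI_REPLAY_FAIL") -> float:
    """Drops per second for one counter between two snapshots."""
    return (second.get(name, 0) - first.get(name, 0)) / seconds


first = parse_drop_counters(FIRST_SNAPSHOT)
second = parse_drop_counters(SECOND_SNAPSHOT)
print(f"{drop_rate(first, second, seconds=60):.2f} replay drops per second")
```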

 

2a. For a relatively low traffic rate, take a packet-trace with the condition set to the peer IPv4 address and the copy packet option enabled. Compare the sequence number of the dropped packet against the current right edge of the replay window and the sequence numbers of the adjacent packets to confirm whether it is indeed a duplicate or outside of the replay window.

 

2b. For a high traffic rate with no predictable trigger, set up an EPC capture with a circular buffer and an EEM script to stop the capture when replay errors are detected. Since EEM is not supported on vManage as of 19.3, the cEdge has to be in CLI mode when this troubleshooting task is performed. Once the capture is taken, use the BDB IPsec replay analyzer to analyze the packet capture for replay conditions.

 

3. Collect the show crypto ipsec sa peer x.x.x.x platform output on the receiver, ideally at the same time the packet capture or packet-trace is collected. This output includes the real-time dataplane replay window information for both the inbound and outbound SA.

 

4. If the packet dropped is indeed out of order, then take simultaneous captures from both the sender and receiver to identify if the problem is with the source or with the underlay network delivery layer.

 

5. If the packets are dropped even though they are neither duplicate nor outside of the replay window, then it's usually indicative of a software problem on the receiver.

 

Known Issues/Enhancements

 

  • CSCvq31153  SDWAN BFD session stuck and packet drops due to IN_CD_SW_IPSEC_ANTI_REPLAY_FAIL drops
  • CSCvr64231  BFD down with IPSec SA receives anti-replay error after NAT session flap sometimes
  • CSCvs48535  %IPSEC-3-REPLAY_ERROR: + BFD down and drops IN_CD_COPROC_ANTI_REPLAY_FAIL (vEdge incorrectly resets ESP seq.)
  • CSCvn79788  Incorrect syslog for anti-replay error on TSN1100 platform with SDWAN per-Tunnel QoS
  • CSCvs51630  cEdge: 'security ipsec replay-window' needs to support 8192
  • CSCvq75871  SDWAN ipsec anti-replay drops for all packets when NAT session flap
  • CSCvn67507  Packet drops due to IPSec-input and anti-replay when remote TLOC flaps
  • CSCvx15750  SD-WAN:cEdge ipsec replay-window size decreases to 128 after a peer reloading
  • CSCvw00044  20.4-EFT: BFD sessions down on vEdge due to rx_replay_integrity_drops - Polaris side commit
  • CSCvs98389  Packet drops in XE-SDWAN because of "IN_CD_COPROC_ANTI_REPLAY_FAIL" errors

 

 


 

 

Troubleshoot IPsec Issues for Service Tunnels on vEdges with IKEv2

 

Introduction

 

This document describes how to troubleshoot the most common issues for Internet Protocol security (IPsec) tunnels to third-party devices with Internet Key Exchange version 2 (IKEv2) configured. These tunnels are most commonly referred to as Service/Transport Tunnels in Cisco SD-WAN documentation. This document also explains how to enable and read IKE debugs and associate them with the packet exchange to understand the point of failure in an IPsec negotiation.

 

 

Prerequisites

 

Requirements

 

Cisco recommends that you have knowledge of these topics:

 

  • IKEv2
  • IPsec negotiation
  • Cisco SD-WAN

 

Components Used

 

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.

 

Background Information

 

IKE Glossary

 

  • Internet Protocol security (IPsec) is a standard suite of protocols between 2 communication points across the IP network that provide data authentication, integrity, and confidentiality.
  • Internet Key Exchange version 2 (IKEv2) is the protocol used to set up a security association (SA) in the IPsec protocol suite.
  • Security association (SA) is the establishment of shared security attributes between two network entities to support secure communication. An SA can include attributes such as cryptographic algorithm and mode; traffic encryption key; and parameters for the network data to be passed over the connection.
  • The vendor IDs (VID) are used to identify peer devices with the same vendor implementation in order to support vendor-specific features.
  • Nonce: random values created in the exchange to add randomness and prevent replay attacks.
  • Key-exchange (KE) information for the Diffie-Hellman (DH) secure key-exchange process.
  • Identity Initiator/responder (IDi/IDr) is used to send out authentication information to the peer. This information is transmitted under the protection of the common shared secret.
  • The IPSec shared key can be derived with the use of DH again to ensure Perfect Forward Secrecy (PFS) or with a refresh of the shared secret derived from the original DH exchange.
  • Diffie–Hellman (DH) key exchange is a method of securely exchanging cryptographic keys over a public channel.
  • Traffic Selectors (TS) are the proxy identities exchanged during the IPsec negotiation that define which traffic passes through the tunnel encrypted.

 

 

IKEv2 Packet Exchange

 

Each IKE packet contains payload information for the tunnel establishment. The IKE glossary explains the abbreviations shown on this image as part of the payload content for the packet exchange.

 

[Image: IKEv2 packet exchange (IKEV2 - NAT-T.png)]

Note: It is important to verify at which packet exchange of the IKE negotiation the IPsec tunnel fails, in order to quickly determine which configuration is involved and address the issue effectively.

 

Note: This document does not describe the IKEv2 packet exchange in depth. For more references, navigate to IKEv2 Packet Exchange and Protocol Level Debugging.

 

It is necessary to correlate the vEdge configuration with the Cisco IOS® XE configuration. It is also useful to match the IPsec concepts with the payload content of the IKEv2 packet exchanges, as shown in the image.

 

[Image: vEdge and Cisco IOS XE configuration correlation (Conf.jpg)]

 

 

Note: Each part of the configuration modifies an aspect of the IKE negotiation exchange. It is important to correlate the commands to the protocol negotiation of IPsec.

 

Troubleshoot

 

Enable IKE debugs

 

On vEdges, debug iked enables debug-level information for either IKEv1 or IKEv2.

 

debug iked misc high
debug iked event high

It is possible to display the current debug information within vshell and run the command tail -f <debug path>.

vshell
tail -f /var/log/messages

In the CLI, it is also possible to display the current logs/debug information for the path specified.

monitor start /var/log/messages

 

Tips to Start the Troubleshoot Process for IPsec Issues

 

Three different IPsec scenarios can be distinguished. Identifying the symptom first is a good point of reference for deciding how to start.

 

  1. IPsec tunnel does not establish.
  2. IPsec tunnel went down and it re-established on its own. (Flapped)
  3. IPsec tunnel went down and it stays on a downstate.

 

When the IPsec tunnel does not establish, debug in real time to verify the current behavior of the IKE negotiation.

When the IPsec tunnel went down and re-established on its own (most commonly described as a tunnel flap), a root cause analysis (RCA) is needed. It is indispensable to know the timestamp when the tunnel went down, or to have an estimated time, in order to look at the debugs.

When the IPsec tunnel went down and stays in a down state, the tunnel worked before but, for some reason, came down. Both the teardown reason and the current behavior that prevents the tunnel from being established again need to be determined.

 

Identify the points before the troubleshoot starts:

 

  1. IPsec tunnel (Number) with issues and configuration.
  2. The timestamp when the tunnel went down (if applicable).
  3. IPsec peer IP address (Tunnel destination).

 

All the debugs and logs are saved in the /var/log/messages files. Current logs are saved in the messages file, but for this specific symptom the flap may only be identified hours or days after the issue, so the related debugs are most probably in messages1, 2, 3, etc. It is important to know the timestamp in order to look at the right messages file and analyze the debugs (charon) for the IKE negotiation of the related IPsec tunnel.
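When the logs have been copied off the device, finding the rotated file that covers a given timestamp can be automated. The Python sketch below is an offline helper under stated assumptions: the rotated files are uncompressed text, their names start with "messages" (the exact suffix scheme, such as messages.1, is an assumption), and the function name files_covering is hypothetical. The demo creates two throwaway files, and the Jun 19 line in it is invented.

```python
import tempfile
from pathlib import Path


def files_covering(log_dir: str, timestamp_prefix: str, process: str = "charon") -> list:
    """Return names of messages* files with a <process> line at the given time."""
    matches = []
    for path in sorted(Path(log_dir).glob("messages*")):
        if not path.is_file():
            continue
        with path.open(errors="replace") as handle:
            if any(line.startswith(timestamp_prefix) and process in line
                   for line in handle):
                matches.append(path.name)
    return matches


# Demo with temporary files standing in for a copied /var/log directory.
with tempfile.TemporaryDirectory() as log_dir:
    Path(log_dir, "messages").write_text(
        "Jun 19 09:00:01 vedge01 charon: 08[IKE] sending DPD request\n")
    Path(log_dir, "messages.1").write_text(
        "Jun 18 00:31:17 vedge01 charon: 11[IKE] giving up after 3 retransmits\n")
    found = files_covering(log_dir, "Jun 18 00:31")

print(found)
```

Passing a shorter prefix such as "Jun 18 00:3" widens the search to a ten-minute window around the flap.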

 

Most of the debugs do not print the number of the IPsec tunnel. The most common way to identify the negotiation and its packets is by the IP address of the remote peer and the IP address where the tunnel is sourced on the vEdge. Some examples of the IKE debugs printed:

 
Jun 18 00:31:22 vedge01 charon: 09[CFG] vici initiate 'child_ipsec2_1'
Jun 18 00:31:22 vedge01 charon: 16[IKE] initiating IKE_SA ipsec2_1[223798] to 10.10.10.1
Jun 18 00:31:22 vedge01 charon: 16[IKE] initiating IKE_SA ipsec2_1[223798] to 10.10.10.1

The debugs for the IKE INIT negotiation show the IPsec tunnel number. However, the subsequent information for the packet exchange only uses the IPsec tunnel IP addresses.


Jun 18 00:31:22 vedge01 charon: 09[CFG] vici initiate 'child_ipsec2_1'
Jun 18 00:31:22 vedge01 charon: 16[IKE] initiating IKE_SA ipsec2_1[223798] to 10.10.10.1
Jun 18 00:31:22 vedge01 charon: 16[IKE] initiating IKE_SA ipsec2_1[223798] to 10.10.10.1
Jun 18 00:31:22 vedge01 charon: 16[ENC] generating IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ]
Jun 18 00:31:22 vedge01 charon: 16[NET] sending packet: from 10.132.3.92[500] to 10.10.10.1[500] (464 bytes)
Jun 18 00:31:22 vedge01 charon: 12[NET] received packet: from 10.10.10.1[500] to 10.132.3.92[500] (468 bytes)
Jun 18 00:31:22 vedge01 charon: 12[ENC] parsed IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(HTTP_CERT_LOOK) N(FRAG_SUP) V ]
Jun 18 00:31:22 vedge01 charon: 12[ENC] received unknown vendor ID: 4f:85:58:17:1d:21:a0:8d:69:cb:5f:60:9b:3c:06:00
Jun 18 00:31:22 vedge01 charon: 12[IKE] local host is behind NAT, sending keep alives
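Since most lines carry only IP addresses, a small filter helps to pull the relevant charon lines out of a busy log. The Python sketch below selects charon lines that mention the remote peer IP or, optionally, the tunnel name; the function name ike_lines is hypothetical, and the last sample line (for peer 10.10.10.10) is invented to show that a longer address is not matched by accident.

```python
import re


def ike_lines(log_lines, peer_ip: str, tunnel_name: str = None) -> list:
    """Keep charon lines that mention the peer IP or the tunnel name."""
    # Guard both ends so 10.10.10.1 does not match inside 10.10.10.10.
    ip_pattern = re.compile(rf"(?<![\d.]){re.escape(peer_ip)}(?!\d)")
    selected = []
    for line in log_lines:
        if " charon: " not in line:
            continue
        if ip_pattern.search(line) or (tunnel_name and tunnel_name in line):
            selected.append(line)
    return selected


SAMPLE = [
    "Jun 18 00:31:22 vedge01 charon: 09[CFG] vici initiate 'child_ipsec2_1'",
    "Jun 18 00:31:22 vedge01 charon: 16[IKE] initiating IKE_SA ipsec2_1[223798] to 10.10.10.1",
    "Jun 18 00:31:22 vedge01 charon: 16[NET] sending packet: from 10.132.3.92[500] to 10.10.10.1[500] (464 bytes)",
    "Jun 18 00:31:22 vedge01 charon: 12[IKE] local host is behind NAT, sending keep alives",
    "Jun 18 00:31:23 vedge01 charon: 05[NET] sending packet: from 10.132.3.92[500] to 10.10.10.10[500] (464 bytes)",
]

by_ip = ike_lines(SAMPLE, "10.10.10.1")
by_ip_or_name = ike_lines(SAMPLE, "10.10.10.1", tunnel_name="ipsec2_1")
print(len(by_ip), len(by_ip_or_name))
```

Lines that carry neither the address nor the tunnel name, such as the NAT detection message, are not selected, so the surrounding lines with the same timestamp still need to be read in context.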

IPsec tunnel configuration:

interface ipsec2
  ip address 192.168.1.9/30
  tunnel-source      10.132.3.92
  tunnel-destination 10.10.10.1
  dead-peer-detection interval 30
  ike
   version      2
   rekey        86400
   cipher-suite aes256-cbc-sha1
   group        14
   authentication-type
    pre-shared-key
     pre-shared-secret $8$wgrs/Cw6tX0na34yF4Fga0B62mGBpHFdOzFaRmoYfnBioWVO3s3efFPBbkaZqvoN
    !
   !
  !
  ipsec
   rekey                   3600
   replay-window           512
   cipher-suite            aes256-gcm
   perfect-forward-secrecy group-14
  !

 

Symptom 1.  IPsec Tunnel Does Not Get Established

 

Because this may be the first implementation of the tunnel, it has never been up, and the IKE debugs are the best option.

 

Symptom 2.  IPsec Tunnel Went Down and It Was Re-established on Its Own

 

As previously mentioned, this symptom is usually investigated to find the root cause of why the tunnel went down. Once the root cause is known, the network administrator can sometimes prevent further issues.

 

Identify the points before the troubleshoot starts:

 

  1. IPsec tunnel (Number) with issues and configuration.
  2. The timestamp when the tunnel went down.
  3. IPsec peer IP address (Tunnel destination)

 

DPD Retransmissions

In this example, the tunnel went down on Jun 18 at 00:31:17.

 

Jun 18 00:31:17 vedge01 FTMD[1472]: %Viptela-vedge01-FTMD-6-INFO-1000001: VPN 1 Interface ipsec2 DOWN
Jun 18 00:31:17 vedge01 FTMD[1472]: %Viptela-vedge01-ftmd-6-INFO-1400002: Notification: interface-state-change severity-level:major host-name:"vedge01" system-ip:4.0.5.1 vpn-id:1 if-name:"ipsec2" new-state:down

Note: The logs for IPsec tunnel down are not part of the iked debugs; they are FTMD logs. Therefore, neither charon nor IKE is printed in them.

 

Note: The related logs are not usually printed together; there can be unrelated information from other processes between them.

 

Step 1. After the timestamp is identified and the time and the logs are correlated, start to review the logs from bottom to top.

 

Jun 18 00:31:17 vedge01 charon: 11[IKE] giving up after 3 retransmits

 

Jun 18 00:28:22 vedge01 charon: 08[IKE] retransmit 3 of request with message ID 543 (tries=3, timeout=30, exchange=37, state=2) 
Jun 18 00:28:22 vedge01 charon: 08[NET] sending packet: from 10.132.3.92[4500] to 10.10.10.1[4500] (76 bytes)  

 

Jun 18 00:26:45 vedge01 charon: 06[IKE] retransmit 2 of request with message ID 543 (tries=3, timeout=30, exchange=37, state=2) 
Jun 18 00:26:45 vedge01 charon: 06[NET] sending packet: from 10.132.3.92[4500] to 10.10.10.1[4500] (76 bytes)

 

Jun 18 00:25:21 vedge01 charon: 08[IKE] sending DPD request 
Jun 18 00:25:21 vedge01 charon: 08[ENC] generating INFORMATIONAL request 543 [ ]
Jun 18 00:25:21 vedge01 charon: 08[NET] sending packet: from 10.132.3.92[4500] to 10.10.10.1[4500] (76 bytes)
Jun 18 00:25:51 vedge01 charon: 05[IKE] retransmit 1 of request with message ID 543 (tries=3, timeout=30, exchange=37, state=2)
Jun 18 00:25:51 vedge01 charon: 05[NET] sending packet: from 10.132.3.92[4500] to 10.10.10.1[4500] (76 bytes)

 

The last successful DPD packet exchange is described as request # 542.

 

Jun 18 00:24:08 vedge01 charon: 11[ENC] generating INFORMATIONAL request 542 [ ] 
Jun 18 00:24:08 vedge01 charon: 11[NET] sending packet: from 10.132.3.92[4500] to 10.10.10.1[4500] (76 bytes)
Jun 18 00:24:08 vedge01 charon: 07[NET] received packet: from 10.10.10.1[4500] to 10.132.3.92[4500] (76 bytes)
Jun 18 00:24:08 vedge01 charon: 07[ENC] parsed INFORMATIONAL response 542 [ ]

 

 Step 2. Put all the information together in the right order:

 

Jun 18 00:24:08 vedge01 charon: 11[ENC] generating INFORMATIONAL request 542 [ ] 
Jun 18 00:24:08 vedge01 charon: 11[NET] sending packet: from 10.132.3.92[4500] to 10.10.10.1[4500] (76 bytes)
Jun 18 00:24:08 vedge01 charon: 07[NET] received packet: from 10.10.10.1[4500] to 10.132.3.92[4500] (76 bytes)
Jun 18 00:24:08 vedge01 charon: 07[ENC] parsed INFORMATIONAL response 542 [ ]

Jun 18 00:25:21 vedge01 charon: 08[IKE] sending DPD request
Jun 18 00:25:21 vedge01 charon: 08[ENC] generating INFORMATIONAL request 543 [ ]
Jun 18 00:25:21 vedge01 charon: 08[NET] sending packet: from 10.132.3.92[4500] to 10.10.10.1[4500] (76 bytes)
Jun 18 00:25:51 vedge01 charon: 05[IKE] retransmit 1 of request with message ID 543 (tries=3, timeout=30, exchange=37, state=2)
Jun 18 00:25:51 vedge01 charon: 05[NET] sending packet: from 10.132.3.92[4500] to 10.10.10.1[4500] (76 bytes)

Jun 18 00:26:45 vedge01 charon: 06[IKE] retransmit 2 of request with message ID 543 (tries=3, timeout=30, exchange=37, state=2)
Jun 18 00:26:45 vedge01 charon: 06[NET] sending packet: from 10.132.3.92[4500] to 10.10.10.1[4500] (76 bytes)

Jun 18 00:28:22 vedge01 charon: 08[IKE] retransmit 3 of request with message ID 543 (tries=3, timeout=30, exchange=37, state=2)
Jun 18 00:28:22 vedge01 charon: 08[NET] sending packet: from 10.132.3.92[4500] to 10.10.10.1[4500] (76 bytes)

Jun 18 00:31:17 vedge01 charon: 11[IKE] giving up after 3 retransmits
Jun 18 00:31:17 vedge01 FTMD[1472]: %Viptela-LONDSR01-FTMD-6-INFO-1000001: VPN 1 Interface ipsec2 DOWN
Jun 18 00:31:17 vedge01 FTMD[1472]: %Viptela-LONDSR01-ftmd-6-INFO-1400002: Notification: interface-state-change severity-level:major host-name:"LONDSR01" system-ip:4.0.5.1 vpn-id:1 if-name:"ipsec2" new-state:down

 

In the example described, the tunnel goes down because vEdge01 does not receive DPD responses from 10.10.10.1. After 3 DPD retransmissions with no response, the IPsec peer is expected to be declared "lost" and the tunnel goes down. There are multiple possible causes for this behavior; usually it is related to the ISP, where packets are lost or dropped in the path. If the issue occurs only once, there is no way to trace the lost traffic. If the issue persists, the packets can be traced with captures on the vEdge, the remote IPsec peer, and the ISP.
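The spacing between the retransmissions of request 543 can be checked with a short script. This is a minimal sketch, not part of the original procedure: the timestamps are copied from the log lines above, timeout=30 is taken from the log, and the backoff base of 1.8 is an assumption (it is the stock strongSwan default for charon and is not shown in the vEdge output).

```python
from datetime import datetime

# Timestamps copied from the request 543 log lines above.
events = [
    ("00:25:21", "DPD request 543 sent"),
    ("00:25:51", "retransmit 1"),
    ("00:26:45", "retransmit 2"),
    ("00:28:22", "retransmit 3"),
    ("00:31:17", "giving up"),
]

TIMEOUT = 30  # seconds, from "timeout=30" in the log
BASE = 1.8    # ASSUMPTION: strongSwan default backoff base, not shown in the log


def gaps_in_seconds(rows):
    """Return the delay in seconds between each pair of consecutive events."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for stamp, _ in rows]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


observed = gaps_in_seconds(events)
# Expected delay before event n+1 under exponential backoff: TIMEOUT * BASE**n
expected = [round(TIMEOUT * BASE ** n) for n in range(len(observed))]

for (_, label), obs, exp in zip(events[1:], observed, expected):
    print(f"{label:<14} observed {obs:>3}s  expected about {exp:>3}s")
print("seconds from DPD request to tunnel down:", sum(observed))
```

If the observed and expected columns agree, the delay between the first unanswered DPD request and the interface DOWN event is explained by the retransmission timers alone, so the timestamp to correlate with the ISP or with packet captures is that of the first unanswered request, not the DOWN event.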

 

 

Symptom 3. IPsec Tunnel Went Down and It Stays in a Down State

 

As previously mentioned, in this symptom the tunnel worked fine, but for some reason it went down and has not been able to re-establish successfully. In this scenario, the network is impacted.

 

Identify these points before troubleshooting starts:

 

  1. IPsec tunnel (Number) with issues and configuration.
  2. The timestamp when the tunnel went down.
  3. IPsec peer IP address (Tunnel destination)

 

PFS Mismatch

In this example, troubleshooting does not start with the timestamp when the tunnel went down. Because the issue persists, the IKE debugs are the best option.

 

interface ipsec1
  description             VWAN_VPN
  ip address 192.168.0.101/30
  tunnel-source-interface ge0/0
  tunnel-destination      10.10.10.1
  ike
   version      2
   rekey        28800
   cipher-suite aes256-cbc-sha1
   group        2
   authentication-type
    pre-shared-key
     pre-shared-secret "$8$njK2pLLjgKWNQu0KecNtY3+fo3hbTs0/7iJy6unNtersmCGjGB38kIPjsoqqXZdVmtizLu79\naQdjt2POM242Yw=="
    !
   !
  !
  ipsec
   rekey                   3600
   replay-window           512
   cipher-suite            aes256-cbc-sha1
   perfect-forward-secrecy group-16 
  !
  mtu                     1400
  no shutdown

 

With debug iked enabled, the negotiation is displayed.

 

daemon.info: Apr 27 05:12:56 vedge01 charon: 16[NET] received packet: from 10.10.10.1[4500] to 172.28.0.36[4500] (508 bytes) 
daemon.info: Apr 27 05:12:56 vedge01 charon: 16[ENC] parsed CREATE_CHILD_SA request 557 [ SA No TSi TSr ]
daemon.info: Apr 27 05:12:56 vedge01 charon: 16[CFG] received proposals: ESP:AES_GCM_16_256/NO_EXT_SEQ, ESP:AES_CBC_256/HMAC_SHA1_96/NO_EXT_SEQ, ESP:3DES_CBC/HMAC_SHA1_96/NO_EXT_SEQ, ESP:AES_CBC_256/HMAC_SHA2_256_128/NO_EXT_SEQ, ESP:AES_CBC_128/HMAC_SHA1_96/NO_EXT_SEQ, ESP:3DES_CBC/HMAC_SHA2_256_128/NO_EXT_SEQ
daemon.info: Apr 27 05:12:56 vedge01 charon: 16[CFG] configured proposals: ESP:AES_CBC_256/HMAC_SHA1_96/MODP_4096/NO_EXT_SEQ
daemon.info: Apr 27 05:12:56 vedge01 charon: 16[IKE] no acceptable proposal found
daemon.info: Apr 27 05:12:56 vedge01 charon: 16[IKE] failed to establish CHILD_SA, keeping IKE_SA
daemon.info: Apr 27 05:12:56 vedge01 charon: 16[ENC] generating CREATE_CHILD_SA response 557 [ N(NO_PROP) ]
daemon.info: Apr 27 05:12:56 vedge01 charon: 16[NET] sending packet: from 172.28.0.36[4500] to 10.10.10.1[4500] (76 bytes)

daemon.info: Apr 27 05:12:57 vedge01 charon: 08[NET] received packet: from 10.10.10.1[4500] to 172.28.0.36[4500] (76 bytes)
daemon.info: Apr 27 05:12:57 vedge01 charon: 08[ENC] parsed INFORMATIONAL request 558 [ ]
daemon.info: Apr 27 05:12:57 vedge01 charon: 08[ENC] generating INFORMATIONAL response 558 [ ]
daemon.info: Apr 27 05:12:57 vedge01 charon: 08[NET] sending packet: from 172.28.0.36[4500] to 10.10.10.1[4500] (76 bytes)
daemon.info: Apr 27 05:12:58 vedge01 charon: 07[NET] received packet: from 10.10.10.1[4500] to 172.28.0.36[4500] (396 bytes)
daemon.info: Apr 27 05:12:58 vedge01 charon: 07[ENC] parsed CREATE_CHILD_SA request 559 [ SA No TSi TSr ]
daemon.info: Apr 27 05:12:58 vedge01 charon: 07[CFG] received proposals: ESP:AES_GCM_16_256/NO_EXT_SEQ, ESP:AES_CBC_256/HMAC_SHA1_96/NO_EXT_SEQ, ESP:3DES_CBC/HMAC_SHA1_96/NO_EXT_SEQ, ESP:AES_CBC_256/HMAC_SHA2_256_128/NO_EXT_SEQ, ESP:AES_CBC_128/HMAC_SHA1_96/NO_EXT_SEQ, ESP:3DES_CBC/HMAC_SHA2_256_128/NO_EXT_SEQ
daemon.info: Apr 27 05:12:58 vedge01 charon: 07[CFG] configured proposals: ESP:AES_CBC_256/HMAC_SHA1_96/MODP_4096/NO_EXT_SEQ
daemon.info: Apr 27 05:12:58 vedge01 charon: 07[IKE] no acceptable proposal found
daemon.info: Apr 27 05:12:58 vedge01 charon: 07[IKE] failed to establish CHILD_SA, keeping IKE_SA

 

Note:  CREATE_CHILD_SA packets are exchanged for every rekey or new SA. For more information, see Understanding IKEv2 Packet Exchange.

 

The IKE debugs show the same behavior repeated constantly, so it is possible to take one part of the output and analyze it:

Here, CREATE_CHILD_SA indicates a rekey, whose purpose is to generate new SPIs and exchange them between the IPsec endpoints.

 

  • The vEdge receives the CREATE_CHILD_SA request packet from 10.10.10.1.
  • The vEdge processes the request and verifies the proposals (SA) sent by peer 10.10.10.1.
  • The vEdge compares the proposals received from the peer against its configured proposals.
  • The CREATE_CHILD_SA exchange fails with "no acceptable proposal found".

 

At this point, the question is: Why is there a configuration mismatch if the tunnel worked previously and no changes were made?

Looking deeper, the configured proposal contains an extra field that the peer does not send.

 

configured proposals: ESP:AES_CBC_256/HMAC_SHA1_96/MODP_4096/NO_EXT_SEQ

 

Received proposals:
ESP:AES_GCM_16_256/NO_EXT_SEQ,
ESP:AES_CBC_256/HMAC_SHA1_96/NO_EXT_SEQ,
ESP:3DES_CBC/HMAC_SHA1_96/NO_EXT_SEQ,
ESP:AES_CBC_256/HMAC_SHA2_256_128/NO_EXT_SEQ,
ESP:AES_CBC_128/HMAC_SHA1_96/NO_EXT_SEQ,
ESP:3DES_CBC/HMAC_SHA2_256_128/NO_EXT_SEQ

 

MODP_4096 is DH group 16, which the vEdge has configured for PFS (perfect-forward-secrecy) in phase 2 (the ipsec section).

PFS is the only configuration mismatch with which the tunnel can either establish successfully or fail, depending on which side is the initiator and which is the responder in the IKE negotiation. However, when the rekey starts, the tunnel cannot continue, and this symptom can appear.
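The comparison between the configured and received proposals can be reproduced offline from the debug output. The sketch below is only an illustration: it treats each proposal as a plain set of transform names and looks for an exact match, which is a simplification of how IKEv2 actually selects transforms per type. The function name parse_proposals is hypothetical; the proposal strings are copied from the debugs above.

```python
def parse_proposals(text):
    """Split 'ESP:A/B/C, ESP:D/E' into a list of sets of transform names."""
    proposals = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Drop the protocol prefix ("ESP:") and keep the transform names.
        _, _, transforms = item.partition(":")
        proposals.append(set(transforms.split("/")))
    return proposals


configured = parse_proposals(
    "ESP:AES_CBC_256/HMAC_SHA1_96/MODP_4096/NO_EXT_SEQ"
)
received = parse_proposals(
    "ESP:AES_GCM_16_256/NO_EXT_SEQ, "
    "ESP:AES_CBC_256/HMAC_SHA1_96/NO_EXT_SEQ, "
    "ESP:3DES_CBC/HMAC_SHA1_96/NO_EXT_SEQ, "
    "ESP:AES_CBC_256/HMAC_SHA2_256_128/NO_EXT_SEQ, "
    "ESP:AES_CBC_128/HMAC_SHA1_96/NO_EXT_SEQ, "
    "ESP:3DES_CBC/HMAC_SHA2_256_128/NO_EXT_SEQ"
)

local = configured[0]
exact_match = any(proposal == local for proposal in received)

# The received proposal that shares the most transforms with the local one.
closest = max(received, key=lambda proposal: len(proposal & local))
missing = sorted(local - closest)

print("exact match found:", exact_match)
print("closest received proposal:", "/".join(sorted(closest)))
print("configured locally but not offered by the peer:", missing)
```

The only transform left over is the DH group, which points at the perfect-forward-secrecy setting rather than at the cipher or integrity configuration.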

 

vEdge IPSec/Ikev2 Tunnel Not Getting Re-initiated After Being Torn Down Due to a DELETE Event

See Cisco bug ID CSCvx86427 for more information about this behavior.

 

Because the issue persists, the IKE debugs are usually the best option. However, for this particular bug, if debugs are enabled, no information is displayed on either the terminal or the message file.

To narrow down this issue and verify whether the vEdge hits Cisco bug ID CSCvx86427, find the moment when the tunnel goes down.

 

Identify these points before troubleshooting starts:

 

  1. IPsec tunnel (Number) with issues and configuration.
  2. The timestamp when the tunnel went down.
  3. IPsec peer IP address (Tunnel destination)

 

After the timestamp is identified and the time and logs are correlated, review the logs just before the tunnel went down.

 

Apr 13 22:05:21 vedge01 charon: 12[IKE] received DELETE for IKE_SA ipsec1_1[217] 
Apr 13 22:05:21 vedge01 charon: 12[IKE] deleting IKE_SA ipsec1_1[217] between 10.16.0.5[10.16.0.5]...10.10.10.1[10.10.10.1]
Apr 13 22:05:21 vedge01 charon: 12[IKE] deleting IKE_SA ipsec1_1[217] between 10.16.0.5[10.16.0.5]...10.10.10.1[10.10.10.1]
Apr 13 22:05:21 vedge01 charon: 12[IKE] IKE_SA deleted
Apr 13 22:05:21 vedge01 charon: 12[IKE] IKE_SA deleted
Apr 13 22:05:21 vedge01 charon: 12[ENC] generating INFORMATIONAL response 4586 [ ]
Apr 13 22:05:21 vedge01 charon: 12[NET] sending packet: from 10.16.0.5[4500] to 10.10.10.1[4500] (80 bytes)
Apr 13 22:05:21 vedge01 charon: 12[KNL] Deleting SAD entry with SPI 00000e77
Apr 13 22:05:21 vedge01 FTMD[1269]: %Viptela-AZGDSR01-FTMD-6-INFO-1000001: VPN 1 Interface ipsec1 DOWN
Apr 13 22:05:21 vedge01 FTMD[1269]: %Viptela-AZGDSR01-ftmd-6-INFO-1400002: Notification: interface-state-change severity-level:major host-name:"vedge01" system-ip:4.1.0.1 vpn-id:1 if-name:"ipsec1" new-state:down

Note:  There are multiple DELETE packets in an IPsec negotiation, and a DELETE for a CHILD_SA is expected during a rekey. This issue is seen when a pure IKE_SA DELETE packet is received without any particular IPsec negotiation. That DELETE removes the entire IPsec/IKE tunnel.
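When the message file is large, the pattern above (an IKE_SA DELETE followed by the interface going down) can be searched for with a small script. This is a rough sketch under stated assumptions: the regular expressions are written against the sample lines in this section only, the function name find_delete_then_down is hypothetical, and the mapping from the SA name ipsec1_1 to the interface ipsec1 is inferred from the sample, not from documentation.

```python
import re

# Sample lines copied from the log excerpt above.
LOG = """\
Apr 13 22:05:21 vedge01 charon: 12[IKE] received DELETE for IKE_SA ipsec1_1[217]
Apr 13 22:05:21 vedge01 charon: 12[IKE] IKE_SA deleted
Apr 13 22:05:21 vedge01 FTMD[1269]: %Viptela-AZGDSR01-FTMD-6-INFO-1000001: VPN 1 Interface ipsec1 DOWN
"""

# ASSUMPTION: the SA name is "<interface>_<n>", as in "ipsec1_1" above.
DELETE_RE = re.compile(
    r"^(\w{3}\s+\d+ [\d:]{8}) .*received DELETE for IKE_SA (\w+?)_\d+\["
)
DOWN_RE = re.compile(r"^(\w{3}\s+\d+ [\d:]{8}) .*Interface (\w+) DOWN")


def find_delete_then_down(lines):
    """Return (interface, delete_time, down_time) for each IKE_SA DELETE
    that is later followed by a DOWN event on the same interface."""
    pending = {}  # interface name -> timestamp of the IKE_SA DELETE
    hits = []
    for line in lines:
        match = DELETE_RE.match(line)
        if match:
            pending[match.group(2)] = match.group(1)
            continue
        match = DOWN_RE.match(line)
        if match and match.group(2) in pending:
            interface = match.group(2)
            hits.append((interface, pending.pop(interface), match.group(1)))
    return hits


for interface, deleted_at, down_at in find_delete_then_down(LOG.splitlines()):
    print(f"{interface}: IKE_SA DELETE at {deleted_at}, interface DOWN at {down_at}")
```

A hit only shows that the log sequence matches the one described here; whether the device actually hits Cisco bug ID CSCvx86427 still has to be confirmed against the bug details.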

 

Related Information