Wednesday, March 31, 2021

Lightspeed family get ready doc

 

Lightspeed

 

The Lightspeed program is the next generation of route processors and line cards for the ASR9K family. These new cards are based on the Lightspeed NPU and the Skybolt fabric. A new set of commons (RP/RSPs and Skybolt-based fabric cards) is also included as part of this project, enabling these newer cards to operate at their full capacity.

 

 

Lightspeed is the fourth-generation ASR9K hardware and includes the following set of new line cards:

  • 32-port 100G line card: A99-32X100GE-SE & A99-32X100GE-TR
  • 16-port 100G line card: A9K-16X100GE-SE & A9K-16X100GE-TR
  • 8-port 100G line card: A9K-8X100GE-X-SE & A9K-8X100GE-X-TR

 

 

The following are the release timelines for the various PIDs in the Lightspeed family:

  • 7-fabric linecards:
    • A99-32X100GE-TR (supported in 6.5.15)
    • A99-32X100GE-SE (supported with TR scale in 6.5.15; SE scale support in 6.5.2)
  • 5-fabric linecards:
    • A9K-16X100GE-TR (supported in 6.5.15)
    • A9K-16X100GE-SE (supported with TR scale in 6.5.15; SE scale support in 6.5.2)
    • A9K-8X100GE-X-TR (supported in 6.5.15)
    • A9K-8X100GE-X-SE (supported with TR scale in 6.5.15; SE scale support in 6.5.2)

Note: these versions are eXR only (the 64-bit version of IOS XR).

 

The key hardware capabilities of the Lightspeed hardware are listed below.

 

  • New in-house developed NPU
  • 420G (bi-directional) bandwidth and 300+ Mpps full-duplex forwarding per Lightspeed NPU (see the comparison sketch after this list)
    • (Tomahawk: 240G / 150+ Mpps)
  • 4.2 Tb to 8 Tb per-slot fabric capacity, depending on chassis type
  • 22 billion transistors
  • 2M MAC addresses
  • 6M IPv4 or 6M IPv6 routes
  • 48K queues per NPU
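
To make the per-NPU comparison above easy to reference, here is a minimal Python sketch of the figures quoted in this list (Tomahawk vs. Lightspeed). The dictionary structure and names are illustrative only, not a Cisco API.

# Per-NPU figures quoted above; illustrative structure only.
NPU_SPECS = {
    "Tomahawk":   {"bandwidth_gbps": 240, "forwarding_mpps": 150},
    "Lightspeed": {"bandwidth_gbps": 420, "forwarding_mpps": 300},
}

for npu, spec in NPU_SPECS.items():
    print(f"{npu}: {spec['bandwidth_gbps']}G bi-directional, {spec['forwarding_mpps']}+ Mpps")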

 

Difference between Tomahawk and Lightspeed

           

TBD (Internal consumption )

 

7-Fabric vs 5-Fabric

 

This denotes how many FC3s (fabric cards) are required in the system.

 

For a 7-fabric line card, you need 7 FC3s installed in the system to effectively utilize the Lightspeed bandwidth capacity.

 

For the ASR 9922 & ASR 9912:

 

  • 32x100GE 7-fabric linecard: you need to install 7 SFC3s
  • 16x100GE 5-fabric linecard: you need to install a minimum of 4 SFC3s
  • 8x100GE 5-fabric linecard: you need to install a minimum of 4 SFC3s

 

For the ASR 9910 & ASR 9906:

  • 32x100GE 7-fabric linecard: you need to install 2 RSP5s and 5 SFC3s
  • 16x100GE 5-fabric linecard: you need to install 2 RSP5s and a minimum of 2 SFC3s
  • 8x100GE 5-fabric linecard: you need to install 2 RSP5s and a minimum of 2 SFC3s

The 7-fabric line cards also need the A99-HighBandwidth fabric mode enabled to avoid the FIA being policed to ~93 Gbps per interface.

Information on ASR 9000 Fabric Modes

 

If you use a 7-fabric line card without that fabric mode, you will see the following syslog when the LC comes online:

 

%FABRIC-FIA-1-RATE_LIMITER_ON : Set|fialc[4320]|0x108a000|Insufficient fabric capacity for card types in 
use -FIA egress rate limiter applied 

 

The following table summarizes the fabric requirements (a small lookup sketch follows the table):

 

Linecard Type    Fabrics    Fabric Configuration
32 Port          7          7x SFC3, or 2x RSP5 & 5x SFC3
16 Port          4          4x SFC3, or 2x RSP5 & 2x SFC3
8 Port           4          4x SFC2/SFC3, or 2x RSP5 & 2x SFC3
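
As a quick reference, the following is a minimal Python sketch (not a Cisco tool) that encodes the minimum fabric counts from the table above. The dictionary and function names are purely illustrative.

# Minimum active fabric planes per Lightspeed line card, per the table above.
MIN_FABRICS = {
    "32x100GE": 7,   # 7-fabric LC: 7x SFC3, or 2x RSP5 + 5x SFC3
    "16x100GE": 4,   # 5-fabric LC: 4x SFC3, or 2x RSP5 + 2x SFC3
    "8x100GE":  4,   # 5-fabric LC: 4x SFC2/SFC3, or 2x RSP5 + 2x SFC3
}

def fabric_plan_ok(lc_type: str, installed_fabrics: int) -> bool:
    """True if the installed fabric plane count meets the minimum for this LC."""
    return installed_fabrics >= MIN_FABRICS[lc_type]

# Example: 32x100GE in a 9910 with 2x RSP5 + 5x SFC3 (7 planes) vs. only 2x SFC3.
print(fabric_plan_ok("32x100GE", 2 + 5))  # True
print(fabric_plan_ok("32x100GE", 2 + 2))  # False -> expect the FIA egress rate-limiter syslog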

 

 

Line Card Architecture - 32x100GE

 

Key highlights of the LC architecture for the 32x100GE:

  1. 3.2T line-rate QSFP28 LAN linecard: SE & TR variants (SE has TR scale with 6.5.15; full SE scale with 6.5.2)
  2. No hardware support for OTN, MACsec, or FlexE
  3. 7-fabric card
    • Works in 9922, 9912, 9910, 9906 & 9904 chassis
  4. Requires Lightspeed commons
    • RP3
    • SFC3
    • RSP5
  5. Linecard performance
    • ASR 9922, 9912, 9910 & 9906 chassis: 3.2T line rate with fabric redundancy
    • ASR 9904 chassis: 3.2T line rate with dual RSP5; 1.8T throughput with single RSP5

 

 

Line Card Architecture - 16x100GE

 

  1. 1.6T line-rate QSFP28 LAN linecard: SE & TR variants (SE has TR scale with 6.5.15; full SE scale with 6.5.2)
  2. No hardware support for OTN, MACsec, or FlexE
  3. 5-fabric card
    • Works in all ASR9K modular chassis
  4. Can work with Tomahawk commons (RP2)
    • Rules are explained later in this deck
  5. Linecard performance with Lightspeed commons (summarized in the sketch below)
    • ASR 9922, 9912, 9910 & 9906 chassis: 1.6T line rate with fabric redundancy
    • ASR 9904 chassis: 1.6T line rate with dual RSP5; 1.4T throughput with single RSP5
    • ASR 9010 & 9006 chassis: 1.6T line rate with dual RSP5; 900G throughput with single RSP5
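
The per-chassis throughput figures for both Lightspeed line cards can be collected in a small Python sketch. This is only an illustrative restatement of the numbers listed above, not an official data source.

# Line-card performance per chassis (Tbps), restating the figures above.
PERFORMANCE_TBPS = {
    "32x100GE": {
        "9922/9912/9910/9906": {"with fabric redundancy": 3.2},
        "9904":                {"dual RSP5": 3.2, "single RSP5": 1.8},
    },
    "16x100GE": {
        "9922/9912/9910/9906": {"with fabric redundancy": 1.6},
        "9904":                {"dual RSP5": 1.6, "single RSP5": 1.4},
        "9010/9006":           {"dual RSP5": 1.6, "single RSP5": 0.9},
    },
}

print(PERFORMANCE_TBPS["16x100GE"]["9010/9006"]["single RSP5"])  # 0.9 (i.e. 900G)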

 

Lightspeed Commons

 

The following is the list of common PIDs that will support the Lightspeed family of line cards:

RP/RSP

 

 

  • A99-RP3-SE - ASR 9900 Route Processor 3 for Service Edge
  • A99-RP3-TR - ASR 9900 Route Processor 3 for Packet Transport
  • A9K-RSP5-SE - ASR 9000 Route Switch Processor 5 for Service Edge
  • A9K-RSP5-TR - ASR 9000 Route Switch Processor 5 for Packet Transport

 

Fabric Cards

 

  • A99-SFC3 - ASR 9900 Series Switch Fabric Card 3
  • A99-SFC3-S - ASR 9910 Series Switch Fabric Card 3 (ASR 9910 Shockwave)
  • A99-SFC3-T - ASR 9906 Series Switch Fabric Card 3 (ASR 9906 Torchwood)

 

Fan

  • ASR-9922-FAN-V3 - ASR 9922 Fan Tray v3

 

 

Route Switch Processors and Route Processors


RSPs are used in the ASR 9910/9906/9904/9006/9010; RPs are used in the ASR 9922/9912.

 

 

 

                         RSP880 / A99-RSP / RP2                       RSP5 / RP3
Description              3rd Gen RP and Fabric ASIC                   4th Gen RP and Fabric ASIC
Switch Fabric Bandwidth  400G + 400G (9006/9010)                      900G + 900G (9006/9010)
                         700G + 700G (9904)                           1.8T + 1.8T (9904)
                         200G + 200G + 1.0T (9906/9910)               600G + 600G + 3.0T (9906/9910)
                         1.2Tb + 200G (separate fabric card)          3.6Tb + 600G (separate fabric card)
Processor                Intel x86 (Ivy Bridge EP), 8 cores, 2 GHz    Intel x86 (Skylake EP), 8 cores, 2 GHz
RAM                      -TR: 16GB / -SE: 32GB                        -TR: 16GB / -SE: 40GB
SSD                      2 x 32GB Slim SATA                           2 x 128GB Slim SATA
Punt BW                  40GE                                         40GE

 

 

ASR 9000 Switch Fabric Overview

 

ASR 9922/9912: 7-Fabric System with Lightspeed and Tomahawk

 

The 7-fabric LCs use all 7 fabrics only if all other LCs in the system are also 7-fabric; in this mode, all Lightspeed and Tomahawk LCs interoperate at full bandwidth.

 

 

 

(Figure: LS-7F.jpg - 7-fabric system with Lightspeed and Tomahawk)

 

 

 

 

 

 

ASR 99xx: 5-Fabric System with Lightspeed and Tomahawk

 

A 7-fabric LC (i.e., the 32x100GE) uses only 5 fabrics when there is a 5-fabric Lightspeed or Tomahawk LC in the system, as illustrated by the sketch below.
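
A minimal Python sketch of this interoperability rule. It is purely illustrative; the function name is hypothetical.

# 7-fabric LCs use all 7 planes only when every LC in the chassis is 7-fabric;
# otherwise they fall back to 5 planes. Illustrative only.
def active_fabric_planes(lc_native_fabrics: list[int]) -> int:
    """lc_native_fabrics holds each installed LC's native fabric count (7 or 5)."""
    return 7 if all(f == 7 for f in lc_native_fabrics) else 5

print(active_fabric_planes([7, 7, 7]))  # 7 -> full bandwidth for all LCs
print(active_fabric_planes([7, 5, 7]))  # 5 -> 7-fabric LCs run over only 5 planes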

 

 

(Figure: 5-FabLC.jpg - 5-fabric system with Lightspeed and Tomahawk)

 

 

Lightspeed NPU Internals

 

 

 

 

(Figure: LS-NPU.jpg - Lightspeed NPU internals)

 

 

 

 

LightSpeed supports up to 500 Gbps of line-side interfaces, composed of a configurable combination of 10G/25G/40G/100G/400G Ethernet ports and/or 100G/400G/500G Interlaken interfaces. In addition, there are two dedicated 10GE ports for punt/inject as well as a PDMA port (via a PCIe Gen3 interface) from the local CPU. Note that packets sourced locally (either via the punt/inject 10GE ports or the PCI interface) can only enter the ingress side of the ASIC, and packets destined to the local CPU (punt/inject or PCI) must go through the egress side.

There are several local datapaths, listed below (see the sketch that follows):

  • A 20Gbps/8Mpps local loopback path which loops back from the ingress datapath to the egress datapath, bypassing the fabric interface
  • A 60Gbps/25Mpps recycle path that loops back from the egress datapath into the egress datapath (note this path is not available in the ingress direction)
  • A 250Gbps service loopback path which loops back from the egress datapath to the ingress datapath, bypassing the L2 line interface
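
The three local datapaths above can be captured in a small Python sketch for reference; the class and field names are illustrative only.

from dataclasses import dataclass
from typing import Optional

@dataclass
class LocalPath:
    name: str
    gbps: int
    mpps: Optional[int]   # None where no packet rate is quoted above
    direction: str

LS_LOCAL_PATHS = [
    LocalPath("local loopback",   20,  8,    "ingress -> egress, bypassing the fabric interface"),
    LocalPath("recycle",          60,  25,   "egress -> egress (not available in the ingress direction)"),
    LocalPath("service loopback", 250, None, "egress -> ingress, bypassing the L2 line interface"),
]

for p in LS_LOCAL_PATHS:
    rate = f"{p.mpps}Mpps" if p.mpps is not None else "n/a"
    print(f"{p.name}: {p.gbps}Gbps/{rate}, {p.direction}")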

 

A brief description of each functional block is listed below:

  • L2 Super Block - line-side interfaces plus preliminary L2 processing
    • Flexible header parsing
    • Protocol decode and validation
    • Bundle mapping
    • Satellite encap termination
    • Priority classification
  • i/eGPB - ingress/egress Global Packet Buffer
    • Split into ingress and egress GPBs, meaning ingress and egress traffic cannot be switched directly within the same PPE; it has to go through a loopback path to reach the counterpart
    • 9 MB to store packets under processing
    • UIDB, Racetrack, and My-MAC lookups moved in from the L2 block
  • DST - manages the assignment of packets to 4608 PPE threads based on affinity
  • PA - Processor Array; has up to 3584 threads to process received packets and run microcode, uses DMA for hardware assists and the TLU for table lookups
  • FLB - Flow Lock Block; enforces packet ordering within a flow, supporting up to 1024 flows
  • i/eGTR - ingress/egress Gather Block; gathers the packet from i/eGPB memory and sends it to the fabric or line
  • EPE - Embedded Processing Element, 96 MB SRAM
    • Used for stats, policers, the C code stack & thread-local memory
  • DPE (DRAM): 2 x 4 GB
  • Internal TCAM: 5 MB
  • New hash-based PLU algorithm with 9 MB SRAM (refer to Processor Prefix Lookup)
  • Local loopback used for ingress punt & inject to wire
  • Service loopback used for egress to fabric
  • Recycle from egress TM to eGPB
  • PCIe Gen3 Host Processor Interface
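
For reference, the on-chip resources called out in the list above can be collected into a small Python sketch. The dictionary keys and structure are illustrative only.

# Key Lightspeed NPU resources as listed above; illustrative structure only.
LS_NPU_RESOURCES = {
    "GPB packet buffer":        "9 MB (split ingress/egress)",
    "DST-managed PPE threads":  4608,
    "PA (Processor Array) threads": 3584,
    "FLB flows":                1024,
    "EPE SRAM":                 "96 MB",
    "DPE DRAM":                 "2 x 4 GB",
    "Internal TCAM":            "5 MB",
    "PLU SRAM":                 "9 MB (hash-based prefix lookup)",
}

for resource, value in LS_NPU_RESOURCES.items():
    print(f"{resource}: {value}")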

 

 

Life of a packet inside the LS NPU

 

1) L2 Block:

  • An untagged frame is received on an Ethernet port of the L2 Block
  • The block decodes the L2 encapsulation, determines that the L3 type is IPv4, determines the L3 offset, and decodes the IPv4 header
  • It derives the frame type (i.e., the encapsulation type)
  • The Packet Classification Module (PCM) of the L2 Block detects a My-MAC match on the destination MAC address and prioritizes the frame based on the TOS value in the IPv4 header

                           

2) iGPB:

Prior to sending the packet to the GPM, the BAF performs the Ingress UIDB Classification TCAM lookup and a Protocol Decode TCAM lookup.

  • The Ingress UIDB Classification uses the embedded TCAM
  • The Protocol Decode TCAM lookup produces a vector that is used to accelerate PPE execution; for example, "IPv4 Unicast Normal, No Exceptions" for an IPv4 packet
  • Some of the fields in the Protocol Decode TCAM lookup key are Mark Bits, Port Bits, Frame Type, L3 Type, DA Bits, and Decode Bits

3) iGPB:

  • The packet is forwarded to a PPE for processing; the BAF Header is delivered to the PPE along with the packet
  • BAF Headers are accessible to the PPE software via DMEM. Some of the more interesting information provided in the BAF Header includes Traffic Type, Frame Type, L3 Type, L3 Offset, Mark Bits, Port Bits, DA Bits, Decode Bits, Protocol Decode Vector, BAF Port, Logical Port, and Timestamp

4) PPE: Data plane software execution begins in the main packet processing loop.

5) PPE: The software determines that the packet was received from a line-side interface by checking a field in the Distribution Header, then switches on the Protocol Decode Vector, which results in a call to the ingress IPv4 processing function.

 

6) PPE: Depending on the packet type, the appropriate packet processing takes place.

     The fabric header is built in this stage and the packet is flushed to the fabric.

 

7) The packet is transmitted across the fabric to the egress LC (see the pipeline sketch below).
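
The steps above can be summarized as an ordered pipeline in a short Python sketch. This is a simplified pseudomodel of the ingress path for an IPv4 packet, not actual NPU microcode.

# Simplified ingress pipeline for an IPv4 packet, restating the steps above.
INGRESS_IPV4_PIPELINE = [
    ("L2 Block",  "decode L2 encap, find L3 offset, My-MAC match, priority classification"),
    ("iGPB/BAF",  "ingress UIDB classification TCAM lookup + protocol decode TCAM lookup"),
    ("iGPB",      "hand the packet and BAF header to a PPE"),
    ("PPE",       "enter the main packet processing loop"),
    ("PPE",       "switch on the protocol decode vector -> ingress IPv4 processing"),
    ("PPE",       "build the fabric header and flush the packet to the fabric"),
    ("Fabric",    "transmit the packet to the egress LC"),
]

for stage, action in INGRESS_IPV4_PIPELINE:
    print(f"{stage:10s} {action}")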

 

 

LightSpeed Migration

In-service RSP5 or RP3 migration is documented here:

https://www.cisco.com/c/en/us/td/docs/iosxr/asr9000/hardware-install/hig/b-asr9k-hardware-installati...

 

QOS

QOS Deployment Guide

The ASR9K QoS Deployment Guide has been updated. There are still more updates coming, so keep checking the official doc EDCS-1226762.

Attaching version 45.1 here, but check docs.cisco.com for the latest version.

 

QOS Dynamic Packet Buffering 

http://xrdocs.io/asr9k/blogs/2018-09-06-dynamic-packet-buffering/

 

Optics

See the datasheet for the supported optics. 

a99-32x100ge Data Sheet

 

Also note the restricted usage of the 100GER4-L-S QSFP in the 32x100GE card.

This is a known limitation on the 32x100G LC only, as these optics dissipate much more heat than the others.

These are the ports that support this optic; the rest are disabled in software:

Ports: 12, 13, 14, 15 and 28, 29, 30, 31
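
A minimal Python sketch of this software restriction, as I read the note above. The function name is hypothetical and not an IOS XR API.

# On the 32x100GE LC, the 100GER4-L-S QSFP is usable only in ports 12-15 and 28-31.
ER4L_ALLOWED_PORTS = {12, 13, 14, 15, 28, 29, 30, 31}

def er4l_port_supported(lc_type: str, port: int) -> bool:
    """True if the 100GER4-L-S optic is supported in this port of this LC."""
    if lc_type != "32x100GE":
        return True          # the limitation applies to the 32x100G LC only
    return port in ER4L_ALLOWED_PORTS

print(er4l_port_supported("32x100GE", 13))  # True
print(er4l_port_supported("32x100GE", 0))   # False -> disabled in software for this optic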

 

Feature Parity

 

Keep track of feature parity with the PM team: https://salesconnect.cisco.com/#/program/PAGE-10319

SW and HW Release Dashboard - https://apps.na.collabserv.com/communities/service/html/communitystart?communityUuid=a89c8115-466a-4...
