Thursday, December 9, 2021

ASR9k BNG General Questions

 

Introduction

 

The following general BNG questions are explained in this document:

 

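1. Are there any BNG documents?
2. General troubleshooting commands for BNG
3. How to send the correct NAS-Port-Id to RADIUS?
4. How to send the caller-id attribute (31) to RADIUS?
5. How to send a QoS template by RADIUS?
6. How to send the full QoS config by RADIUS?
7. How to bypass RADIUS authentication when RADIUS fails?
8. NP resource usage when QoS is sent per session by RADIUS
9. New QoS entry limitation when a bundle has more members
10. Authorization issue with DHCP options 60 and 61 under IPoE/DHCPv6
11. DHCPv6 and ping issue when deploying PPPoE + DHCPv6
12. Packet flow when deploying L3 BNG + CoA
13. How to configure HTTP redirect with IXIA when testing CoA?
14. IPv6 RA issue when setting up DHCPv6 under PPPoE/IPoE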

 

If a session cannot come up and the customer uses QoS on a bundle, check the following commands first; these are known QoS limitations:

 

  • Queueing: tied to NP chunk resources, see Q8. Check with "sh qoshal resource summary np x location 0/x/cpu0".
  • Police rate (no queueing): tied to QoS hash entries, see Q9. Check with "show controllers np struct qoS-INTF all location 0/x/cpu0".

 

The following BNG scale figures come from Roy's document "ASR9K BNG boilerplate and RFP Q&A"; the file is attached in the TZ:

pastedImage_3.png

 

1. Are there any BNG documents?

 

ASR9000/XR: BNG deployment guide

ASR9000 BNG Training guide setting up PPPoE and IPoE sessions 

ASR9000/XR: BNG and Dual-stack ipv4 and ipv6 sessions

ASR9000/XR: BNG VSA's (vendor specific attributes) and Services

Radius Attributes - CCO documents

ASR9K BNG radius and COA deployment guide

Radius Types/Attribute and so on

Understanding BNG session disconnect reasons <<< TS Doc

Subscriber disconnect codes for PPPoE on ASR9000 BNG  <<< TS Doc

 

2. General Troubleshooting commands for BNG

 

sh ins act summary 
sh inventory
sh log
sh subs se all sum         
sh radius
sh pool ipv4
sh pppoe limits 
sh pppoe statistics  
sh pppoe summary total location x/x/cpu0 <<< check PPPoE summary info, e.g. in-flight sessions
sh pppoe sum per-access-interface  <<< check PPPoE sessions per subscriber access interface
sh pppoe throttles 

sh ipsubscriber access-interface brief <<< check IPoE sessions per subscriber access interface
sh ppp disconnect-history unique location x/x/cpu0
sh subs manager disconnect-history last summary
sh subs manager disconnect-history unique summary
sh subs manager statistics performance
sh subs manager statistics summary total
sh subs session all detail internal location x/x/cpu0
sh process block loc all
sh process cpu loc x/x/cpu0
sh mem summary loc x/x/cpu0
sh shmwin summary location x/x/cpu0 <<< shared memory status
sh controllers np struct UIDB-EGR-EXT all location x/x/cpu0
sh prm server tcam summary all all all location 0/x/cpu0 <<< check all TCAM entry resources
show tech pppoe location x/x/cpu0
show tech-support subscriber >>> ipoe
show tech-support iedge
show tech subdb
sh subscriber manager trace all last 500
debug subscriber manager
debug subscriber manager level verbose
debug subscriber manager session next-subscriber filter-id mac-address xxxx.xxxx.xxxx
sh subscriber manager trace all verbose last 1000

3. How to send correct NAS-Port-Id to radius?

 

The customer's original config:

aaa attribute format NAS_PORT_FORMAT
 format-string length 253 "eth%s/%s/%s:%s.%s0/0/0/0/0/0" physical-slot physical-subslot physical-port outer-vlan-id inner-vlan-id
!
aaa radius attribute nas-port-id format NAS_PORT_FORMAT type 36
aaa accounting subscriber ASIAINFO group ASIAINFO
aaa authorization subscriber ASIAINFO group ASIAINFO
aaa authentication subscriber ASIAINFO group ASIAINFO
!
aaa group server radius ASIAINFO
 server 122.141.232.135 auth-port 1645 acct-port 1646
 source-interface Loopback0
!
radius source-interface Loopback0 vrf default
radius-server vsa attribute ignore unknown
radius-server host 122.141.232.135 auth-port 1645 acct-port 1646
 key 7 120B1019050A150E26

The NAS-Port-Id format required by the customer:

NAS-Port-Id = "slot=4;subslot=1;port=1;vlanid=3004;vlanid2=1907;"

Debug output with the original config:

LC/0/0/CPU0:Jun  1 10:28:59.012 : radiusd[315]:  RADIUS:  Acct-Session-Id     [44]    10      04000007
LC/0/0/CPU0:Jun  1 10:28:59.012 : radiusd[315]:  RADIUS:  NAS-Port-Id         [87]    10      0/1/1/21
LC/0/0/CPU0:Jun  1 10:28:59.012 : radiusd[315]:  RADIUS:  Vendor,Cisco        [26]    16      
LC/0/0/CPU0:Jun  1 10:28:59.012 : radiusd[315]:  RADIUS:   cisco-nas-port      [2]    10      0/1/1/21  <<<
LC/0/0/CPU0:Jun  1 10:28:59.012 : radiusd[315]:  RADIUS:  User-Name           [1]     11      yj2514417
LC/0/0/CPU0:Jun  1 10:28:59.012 : radiusd[315]:  RADIUS:  Acct-Authentic      [45]    6       RADIUS[0]

The corrected config, which matches the customer requirement:

aaa attribute format NAS_PORT_FORMAT
 format-string length 253 "slot=%s;subslot=%s;port=%s;vlanid=%s;vlanid2=%s" physical-slot physical-subslot physical-port outer-vlan-id inner-vlan-id
!
aaa radius attribute nas-port-id format NAS_PORT_FORMAT

Jun  2 15:48:17.348 : radiusd[1113]:  RADIUS:  NAS-Port-Type       [61]    6       Virtual PPPoEoQinQ[37] 
Jun  2 15:48:17.348 : radiusd[1113]:  RADIUS:  Vendor,Cisco        [26]    41      
Jun  2 15:48:17.348 : radiusd[1113]:  RADIUS:   Cisco AVpair        [1]    35      client-mac-address=00f1.f310.a5db
Jun  2 15:48:17.348 : radiusd[1113]:  RADIUS:  Acct-Session-Id     [44]    10      0000007a
Jun  2 15:48:17.348 : radiusd[1113]:  RADIUS:  NAS-Port-Id         [87]    49      slot=0;subslot=0;port=666;vlanid=21;vlanid2=210 <<<
Jun  2 15:48:17.348 : radiusd[1113]:  RADIUS:  Vendor,Cisco        [26]    55      
Jun  2 15:48:17.348 : radiusd[1113]:  RADIUS:   cisco-nas-port      [2]    49      slot=0;subslot=0;port=666;vlanid=21;vlanid2=210
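
To sanity-check a format string before committing it, the expansion can be mimicked offline. The following is a minimal Python sketch, not the router's implementation: it assumes each %s is simply replaced by the listed attributes in order, and the function name expand_nas_port_id is hypothetical. The sample values are taken from the debug output above.

```python
def expand_nas_port_id(format_string, values):
    """Replace each %s in format_string with the next value, in order.

    Assumption: this mirrors how the attribute list after the
    format-string is applied; it is an illustration, not IOS XR code.
    """
    if format_string.count("%s") != len(values):
        raise ValueError("number of %s placeholders and values differ")
    return format_string % tuple(str(v) for v in values)


# Format string from the corrected config. Values in order:
# physical-slot, physical-subslot, physical-port, outer-vlan-id, inner-vlan-id
FORMAT = "slot=%s;subslot=%s;port=%s;vlanid=%s;vlanid2=%s"
nas_port_id = expand_nas_port_id(FORMAT, [0, 0, 666, 21, 210])
print(nas_port_id)
# The RADIUS attribute length is the string length plus 2 header bytes.
print(len(nas_port_id) + 2)
```

The printed length can be compared with the length column of the NAS-Port-Id [87] line in the debug output.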

Attention: the customer also tried the nas-port format commands below. These are not part of a general configuration; do not use them unless there is a specific requirement:

aaa radius attribute nas-port format e 00000000000000000000VVVVVVVVVVVV type 36
aaa radius attribute nas-port format e 00000000QQQQQQQQQQQQVVVVVVVVVVVV type 37

4. How to send caller-id attribute(31) to radius?

 

The RADIUS engineer required the caller-id, which is attribute 31 (Calling-Station-Id), in the format 00:12:3f:13:ef:8a. On checking, the ASR9k does not support that format; see CSCuc09009 (internal), "Need customization in calling-station ID formatting". Only the following three formats are supported:

mac-address: for example, 0000.4096.3e4a
mac-address-ietf: for example, 00-00-40-96-3E-4A
mac-address-raw: for example, 000040963e4a

After adding the following configuration, the format "0000.4096.3e4a" is used:

aaa attribute format caller-id
format-string length 253 "%s" client-mac-address
!
aaa radius attribute calling-station-id format caller-id

Jun 2 16:08:57.004 : radiusd[1113]: RADIUS: NAS-Port-Id [87] 49 slot=0;subslot=0;port=666;vlanid=21;vlanid2=210
Jun 2 16:08:57.004 : radiusd[1113]: RADIUS: Vendor,Cisco [26] 55
Jun 2 16:08:57.004 : radiusd[1113]: RADIUS: cisco-nas-port [2] 49 slot=0;subslot=0;port=666;vlanid=21;vlanid2=210
Jun 2 16:08:57.004 : radiusd[1113]: RADIUS: Calling-Station-Id [31] 16 00f1.f310.a5db <<<
Jun 2 16:08:57.004 : radiusd[1113]: RADIUS: User-Name [1] 11 yj2514417
Jun 2 16:08:57.004 : radiusd[1113]: RADIUS: Acct-Authentic [45] 6 RADIUS[1]
Jun 2 16:08:57.004 : radiusd[1113]: RADIUS: Framed-IP-Address [8] 6 222.162.104.171
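
Since the router cannot emit the colon-separated form, one option is to normalize the value on the RADIUS side. The sketch below is an assumption about how such a conversion could look, written in Python; the helper names are hypothetical and it is not tied to any particular RADIUS server. It renders one MAC address in the three supported formats plus the colon form the RADIUS engineer asked for.

```python
HEX_DIGITS = "0123456789abcdef"


def normalize_mac(mac):
    """Return the 12 lowercase hex digits of a MAC address string."""
    digits = "".join(ch for ch in mac.lower() if ch in HEX_DIGITS)
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def mac_formats(mac):
    """Render a MAC in the three ASR9k formats and the colon form."""
    d = normalize_mac(mac)
    pairs = [d[i:i + 2] for i in range(0, 12, 2)]
    return {
        "mac-address": ".".join(d[i:i + 4] for i in range(0, 12, 4)),
        "mac-address-ietf": "-".join(p.upper() for p in pairs),
        "mac-address-raw": d,
        # Not an ASR9k format: the form requested by the RADIUS engineer.
        "colon": ":".join(pairs),
    }


for name, value in mac_formats("00f1.f310.a5db").items():
    print(f"{name}: {value}")
```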

5. How to send qos template by radius?

 

The customer sent the following QoS template, but the session did not pick up the configuration:

Jun  8 16:46:16.711 : radiusd[1113]:  RADIUS: Received from id 63 122.141.232.135:1645, Access-Accept, len 138
Jun  8 16:46:16.711 : radiusd[1113]:  RADIUS:  authenticator 99 CE 62 3F 1E EB D3 95 - 48 93 8B 1A 5C DD 66 C2
Jun  8 16:46:16.711 : radiusd[1113]:  RADIUS:  Vendor,Unknown      [26]    12              Unsupported[43]    6       
Jun  8 16:46:16.711 : radiusd[1113]:  RADIUS:  Vendor,Unknown      [26]    8               Unsupported[44]    2       
Jun  8 16:46:16.711 : radiusd[1113]:  RADIUS:  Service-Type        [6]     6       Framed[2] 
Jun  8 16:46:16.711 : radiusd[1113]:  RADIUS:  Framed-Protocol     [7]     6       PPP[1]  
Jun  8 16:46:16.711 : radiusd[1113]:  RADIUS:  Idle-Timeout        [28]    6       14400   
Jun  8 16:46:16.711 : radiusd[1113]:  RADIUS:  Session-Timeout     [27]    6       172800  
Jun  8 16:46:16.711 : radiusd[1113]:  RADIUS:  Vendor,Cisco        [26]    36      
Jun  8 16:46:16.711 : radiusd[1113]:  RADIUS:   Cisco AVpair        [1]    30      "ip:sub-qos-policy-in=in_8M"
Jun  8 16:46:16.711 : radiusd[1113]:  RADIUS:  Vendor,Cisco        [26]    38      
Jun  8 16:46:16.711 : radiusd[1113]:  RADIUS:   Cisco AVpair        [1]    32      "ip:sub-qos-policy-out=out_8M"

RP/0/RSP0/CPU0:JLYJ-DBDL-ASR9010-SR#show running-config policy-map out_8M        
Mon Jun  8 23:26:16.958 BeiJing
policy-map out_8M
 class class-default
  shape average 8 mbps 
 !
 end-policy-map
!

RP/0/RSP0/CPU0:JLYJ-DBDL-ASR9010-SR#show running-config policy-map in_8M 
Mon Jun  8 23:26:40.804 BeiJing
policy-map in_8M
 class class-default
  police rate 8 mbps 
  !
 !
 end-policy-map
!

On checking, the attribute sent by the customer's RADIUS server was wrong: the value included literal double quotes. A correct attribute looks like this:

Jun  8 14:56:31.307 : radiusd[1118]:  RADIUS:  Vendor,Cisco        [26]    38      
Jun  8 14:56:31.307 : radiusd[1118]:  RADIUS:   Cisco AVpair        [1]    32      ip:sub-qos-policy-in=JS100M-UP
Jun  8 14:56:31.307 : radiusd[1118]:  RADIUS:  Vendor,Cisco        [26]    41      
Jun  8 14:56:31.307 : radiusd[1118]:  RADIUS:   Cisco AVpair        [1]    35      ip:sub-qos-policy-out=JS100M-DOWN
Jun  8 14:56:31.307 : radiusd[1118]:  RADIUS:  Vendor,Cisco        [26]    35      
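
A quick way to catch this on the RADIUS side is to test the AVpair value for surrounding quotes before it is sent. The following Python sketch is illustrative only; check_avpair is a hypothetical helper, and the two sample strings are the values seen in the debug outputs above.

```python
def check_avpair(value):
    """Return (ok, cleaned) for a Cisco AVpair value.

    ok is False when the value is wrapped in literal double quotes,
    which is what the failing Access-Accept above contained.
    """
    quoted = len(value) >= 2 and value[0] == '"' and value[-1] == '"'
    cleaned = value[1:-1] if quoted else value
    return (not quoted, cleaned)


bad = '"ip:sub-qos-policy-in=in_8M"'      # as received in the failing case
good = "ip:sub-qos-policy-in=JS100M-UP"   # as received in the working case

for avpair in (bad, good):
    ok, cleaned = check_avpair(avpair)
    # Attribute length in the debug output = value length + 2 header bytes.
    print(ok, cleaned, len(avpair) + 2)
```

The extra two characters also show up in the length column of the debug output, which is a second hint that the quotes are part of the value.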

6. How to send total qos config by radius?

 

Radius config:

cisco   Cleartext-Password := "cisco"
        Cisco-Avpair += "ip:qos-policy-out=add-class(sub,(class-default),shape(4096))"

Debug on 9k:

Jun  8 15:25:46.685 : radiusd[1118]:  RADIUS:  Vendor,Cisco        [26]    66      
Jun  8 15:25:46.685 : radiusd[1118]:  RADIUS:   Cisco AVpair        [1]    60      ip:qos-policy-out=add-class(sub,class-default),shape(4096)

Check status on 9k:

RP/0/RSP0/CPU0:ASR9010-1#sh policy-map interface Bundle-Ether410.398.pppoe161
Mon Jun  8 15:32:55.454 UTC
Bundle-Ether410.398.pppoe161 direction input: Service Policy not installed

Bundle-Ether410.398.pppoe161 output: __sub_6515ffffffca53
Class class-default
  Classification statistics          (packets/bytes)     (rate - kbps)
    Matched             :                   0/0                    0
    Transmitted         :                   0/0                    0
    Total Dropped       :                   0/0                    0
  Queueing statistics
    Queue ID                             : 138 
    High watermark                       : N/A 
    Inst-queue-len  (packets)            : 0
    Avg-queue-len                        : N/A 
    Taildropped(packets/bytes)           : 0/0
    Queue(conform)      :                   0/0                    0
    Queue(exceed)       :                   0/0                    0
    RED random drops(packets/bytes)      : 0/0

7. How to bypass RADIUS authentication when RADIUS fails?

 

The customer may deploy two RADIUS servers (primary and standby) for BNG users, but is worried that both could fail at the same time. They therefore want to know whether authentication can be bypassed when RADIUS is down.

 

The "authentication-no-response" event meets this requirement. With the following configuration, sessions match the PPPoE dynamic template in normal operation; if RADIUS fails, they match the radius_fail dynamic template. Because no server can push QoS while RADIUS is down, new users would otherwise have no bandwidth limit, so configure a generic QoS policy under the fallback dynamic template, e.g. "radius_fail_qos_in/out".

 

dynamic-template
type ppp radius_fail <<< radius failure template
ppp ipcp dns 202.175.36.16 202.175.3.9
ppp ipcp peer-address pool pppoe-pool
ipv4 unnumbered Loopback0
service-policy input radius_fail_qos_in

service-policy output radius_fail_qos_out
!
type ppp PPPoE <<< radius normal work template
ppp prot-reject-timeout 1
ppp authentication pap
ppp ipcp dns 202.175.36.16 202.175.3.9
ppp ipcp peer-address pool pppoe-pool
ipv4 verify unicast source reachable-via rx
ipv4 unnumbered Loopback0
!
policy-map type control subscriber noradius
event session-start match-first
class type control subscriber CLASS-PPPoE do-all
10 activate dynamic-template PPPoE
!
event session-activate match-all
class type control subscriber CLASS-PPPoE do-until-failure
10 authenticate aaa list pppoe_test
!
event authentication-no-response match-first
class type control subscriber CTM-CLASS-PPPoE do-all
10 activate dynamic-template radius_fail
!
end-policy-map
!
aaa authorization subscriber default group AUTH
aaa authentication subscriber default group AUTH
aaa authentication subscriber pppoe_test group TEST_AUTH
!
radius-server timeout 15 <<< key point
aaa group server radius TEST_AUTH
server-private 1.1.1.1 auth-port 1812 acct-port 1813
!
interface Bundle-Ether410.397
service-policy type control subscriber noradius
pppoe enable bba-group bba1
encapsulation ambiguous dot1q 397 second-dot1q any
!

With the above configuration, however, users could not dial in successfully. Online checking showed that the 15 s RADIUS timeout, once retransmits are counted, is longer than the PC client is willing to wait, as the following messages show:

 

RP/1/RSP0/CPU0:Jun  4 18:25:04.014 : iedged[247]: [IEDGE:TP3936:PPSM-TRANS:EVENT:0x0] Started transaction timer. 0x10A72A18, 1, Session start request
RP/1/RSP0/CPU0:Jun  4 18:25:04.016 : iedged[247]: [IEDGE:TP1455:PPSM-SUB:EVENT:0x56] New Subscriber - state: initialize
RP/1/RSP0/CPU0:Jun  4 18:25:04.027 : iedged[247]: [IEDGE:TP1447:PPSM-SUB:EVENT:0x56] Triggering action function [connecting, session-connected] = ppsm_subscriber_action_finalize
RP/1/RSP0/CPU0:Jun  4 18:25:04.245 : radiusd[1139]:  RADIUS: Send Access-Request to 1.1.1.1:1812 id 54, len 236  <<< 1st request
RP/1/RSP0/CPU0:Jun  4 18:25:04.246 : radiusd[1139]: Successfully sent packet and started timeout handler for rctx 0x102073e0

RP/1/RSP0/CPU0:Jun  4 18:25:19.803 : radiusd[1139]: rctx found is 0x102073e0 <<< 1st timeout from 9k to radius
RP/1/RSP0/CPU0:Jun  4 18:25:19.803 : radiusd[1139]:  RADIUS: Send Access-Request to 1.1.1.1:1812 id 54, len 236  <<< 2nd request
RP/1/RSP0/CPU0:Jun  4 18:25:19.803 : radiusd[1139]: Successfully sent packet and started timeout handler for rctx 0x102073e0

RP/1/RSP0/CPU0:Jun  4 18:25:34.277 : iedged[247]: [IEDGE:TP2700:PPSM-API:EVENT:0x0] SRVR: IPC_NOTIFY_DATA msg_len: 115  <<< client sends terminate, exactly 30 s after session start
RP/1/RSP0/CPU0:Jun  4 18:25:34.277 : iedged[247]: [IEDGE:TP2702:PPSM-API:EVENT:0x0] SRVR: client id: 5, msg_type:0 
RP/1/RSP0/CPU0:Jun  4 18:25:34.277 : iedged[247]: [IEDGE:TP1171:PPSM-API:EVENT:0x56] function 22 [Session disconnect event], client id 5 [ppp_ma], trans-id (0x3f), cxt [0x0]

RP/1/RSP0/CPU0:BNG2#show ppp disconnect-history detail location 1/rsp0/cpu0
Thu Jun  4 18:26:08.234 HKT

Disconnected Subscriber Sessions in detail:
 Authentication
   Packets                          Sent             Received
   PAP
     Request                           0                   10  <<<
     Ack                               0                    0
     Nak                               0                    0

MA IDB:
-------
  iEdge state:
    Subscriber label                    = 86
    iEdge trans ID                      = 0
    Current iEdge state                 = 0
    In-progress iEdge state             = 0
    Updated iEdge state                 = 0
    IPv4 State                          = 1
    IPv6 State                          = 0
     iEdge Pending mask                 = 0x0
     iEdge In-progress mask             = 0x0
     Reason for Disconnection           = Received Term-Req (or Prot-Rej) <<<

RP/1/RSP0/CPU0:BNG2#show ppp disconnect-history location 1/rsp0/cpu0
Thu Jun  4 18:30:48.087 HKT

Interface                 Time disconnected   CLPN LCP FSM    Reason disconnected
----------------------------------------------------------------------------------
BE410.397.pppoe30         Jun  4 18:25:34.277      202389     Received Term-Req (or Prot-Rej)  <<< time match
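
The timing problem can be checked with simple arithmetic before changing the config. The Python sketch below is only an illustration and rests on two assumptions: that the no-response event is raised only after the first request and every retransmit have timed out, and that the retransmit count is 3 (the trace above only shows that at least one retransmit occurred). The 30 s client wait is taken from the trace; the function names are hypothetical.

```python
def seconds_until_no_response(timeout_s, retransmits):
    """Time until the first request and every retransmit have timed out."""
    return timeout_s * (1 + retransmits)


def fallback_in_time(timeout_s, retransmits, client_wait_s):
    """True if RADIUS gives up before the PPPoE client does."""
    return seconds_until_no_response(timeout_s, retransmits) < client_wait_s


CLIENT_WAIT_S = 30  # observed in the trace above: 18:25:04 to 18:25:34

# 15 s timeout as configured; retransmits=3 is an assumed value.
print(seconds_until_no_response(15, 3), fallback_in_time(15, 3, CLIENT_WAIT_S))
# A shorter, hypothetical timeout that would fit inside the client's 30 s.
print(seconds_until_no_response(5, 3), fallback_in_time(5, 3, CLIENT_WAIT_S))
```

Even with a single retransmit, 15 s x 2 already reaches the client's 30 s limit, which matches the disconnect time in the history above.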

One open question: when the RADIUS server recovers, how do we move the user back to normal RADIUS authentication?

 

I tried "event timer-expiry" as a workaround, but it failed. The ASR9k must receive a new request packet before it runs through all events again and matches the normal dynamic template and RADIUS, and I found no feature that makes the user send a request again while the session is up.

 

So the only option is "ppp timeout absolute x" to disconnect the PPP session. A CPE redials automatically when the session goes down; a user dialing from a PC has to redial manually. The customer can tune the timer to fit their requirement.

 

8. NP resource usage when QoS is sent per session by RADIUS

 

When QoS is sent per session by RADIUS, you may find that the session count stops at 8k or 16k and new sessions cannot be established. This is caused by the QoS resource limit for one port under one NP. How do you check it?

 

Within an NPU, each port is mapped to a chunk in the NP’s Queuing ASIC on a per-direction basis. For example (refer to table below), in 24x10GE card port 0 is on NP 0, and allocates queueing resources from chunk 0 for ingress and chunk 3 for egress:

 

Port   NPU   Chunk for Ingress   Chunk for Egress
0      0     0                   3
1      0     0                   2
2      0     0                   1

 

Each chunk has 64K (64*1024) L4 queues and 8K (8*1024) L3 entities and 1K (1*1024) L2 entities.

 

Also, chunk 0 is used in ingress for ports 0, 1 and 2 in the NPU. This means that chunk 0’s resources are shared: every time a queuing policy is applied in the ingress direction of port 0, 1 or 2, the queues are allocated from chunk 0.

 

qos-limited.png

 

Attention: ingress shaping is not supported on LCs that have more than 30G of interface load on the (Typhoon) NPU. The 36x10, the 4x10 MPA on a MOD80, and the 8x10 on a MOD160 are examples where ingress shaping is disabled (because they offer 60G and 40G ingress per NPU respectively).

 

For BNG, each session takes its own QoS resources, so by default one port gets one chunk and supports only 8k sessions (when L3 QoS is sent). To assign more chunks to one port, use "service-policy output xxx subscriber-parent resource-id 0/1/2/3". To check NP resources, use "sh qoshal resource summary np 1 location 0/6/cpu0".
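
The 8k figure follows directly from the per-chunk numbers given earlier. As a rough planning aid, the Python sketch below estimates the session ceiling from the L3 entity count alone. It assumes one L3 entity per session and that every extra chunk is fully available to the port, which may not hold when chunks are shared with other ports; the function name is hypothetical.

```python
L3_ENTITIES_PER_CHUNK = 8 * 1024  # from the chunk description above


def max_sessions_by_l3(chunks, l3_entities_per_session=1):
    """Upper bound on sessions when only L3 entities are the limit.

    Assumption: each session consumes l3_entities_per_session L3
    entities and no other resource runs out first.
    """
    return chunks * L3_ENTITIES_PER_CHUNK // l3_entities_per_session


for chunks in (1, 2, 4):
    print(chunks, max_sessions_by_l3(chunks))
```

The real limit should always be confirmed with the "sh qoshal resource summary" output for the NP in question.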

 

For detailed chunk tests with BNG on bundles, see my separate article:

ASR9k BNG QOS Test Summary Under Bundle Scenario

 

9. New QoS entry limitation when a bundle has more members

 

Previously we had not seen customers configure many members on a bundle, because it raises two problems:

 

  • As members increase, the effective QoS rate doubles, triples, and so on: the QoS profile is copied to the NP once per member, regardless of whether the members are on the same NP. The customer can work around the output rate with "bundle load-balancing hash dst-ip", but the input rate has to be worked around on the peer devices.
  • If the customer configures queueing in QoS, the chunk limitation described in Q8 applies. See Q8 for details.

My customer CQ CATV brought a new ASR9k online with PPPoE, running 5.2.4 + all SMU (2015 12.29). About 14K RP-based PPPoE sessions were normally online. After some time, new sessions could not come up and there were no alarms in "show log". The disconnect history showed that many sessions failed because iEdge detected a warning:

 

Note that this customer's configuration differs from other BNG customers: one bundle includes 6 members, but they use only policing, not queueing, so there should be no chunk resource limitation.

 

RP/0/RSP0/CPU0:GDDS-BRAS-ASR9K-2#show subscriber manager disconnect-history last summary 
Thu Jan  8 20:10:06.771 UTC
 
[ IEDGE DISCONNECT HISTORY LAST SESSIONS ]
 Location: 0/RSP0/CPU0
 
Interface               Time Disconnected    Disconnect reason             
=========               =================    =================             
BE10.100.pppoe43643     2015:01:08 20:10:04  Session-Start Failure, 'iEdge' 
                                             detected the 'warning' condition 
                                             'Start Config Failure', DC: 0 AC: 
                                             48 TC: 10
BE10.100.pppoe43644     2015:01:08 20:10:04  Session-Start Failure, 'iEdge' 
                                             detected the 'warning' condition 
                                             'Start Config Failure', DC: 0 AC: 
                                             48 TC: 10
 
+++ show subscriber manager disconnect-history unique summary location 0/RSP0/CPU0 [23:05:43.479 UTC Thu Jan 08 2015] +++
[ IEDGE DISCONNECT HISTORY UNIQUE EVENTS ]
Count  Last Interface          Last Time Disconnected  Disconnect reason             
=====  ==============          ======================  =================             
539753 BE10.100.pppoe46026     2015:01:08 22:04:53     Session-Start Failure, 
                                                       'iEdge' detected the 
                                                       'warning' condition 
                                                       'Start Config Failure', 
                                                       DC: 0 AC: 48 TC: 10

After discussion with DE Chandra, we found that QoS had a problem:

Client Name:             qos_ma
-------------------------------------
  client_type              = BPI
  ref_count                = 3
 
  Message Counters:
    BPI Register Received              = 1             
    BPI Register Replied               = 1               (0 failed)
    BPI Send Dropped                   = 0             
    BPI Receive Dropped                = 0             
    BPI Apply Sent                     = 799241        
    BPI Apply Done Received            = 1374201         (504462 failed)
    BPI Apply Timeout                  = 0             
    Start Sync                         = 1  

Client Name:             iedge SVM
-------------------------------------
 
    Produce Done Config Generated      = 457189          (504463 failed)
    Produce Done Config Completed      = 851816          (0 failed)
    Produce All Done Received          = 93811         
    Install Callback                   = 93811           (0 failed)
Jan  8 20:16:46.608977 2505 0/RSP0/CPU0 t1  [subdb_svr:MSGHDLR11:EVENT:MSGHDLR] SessID:[0x86414] Process BPI APPLY DONE response from conn_id:[13](1) for feature:[QoS] session [Error: '0x457c2200 ''prm_server' detected the 'warning' condition 'An operation that was requested was aborted - data integrity may be compromised.''']

show process blocked:
375 471170 1 vic_0 Send 0:00:00:0005 172114 prm_server_ty
319 524472 1 qos_ma_ea Reply 0:00:00:0000 172114 prm_server_ty

The QoS PI team found no issue, so the problem had to be on the QoS PD side. Prabhakara found that the hash entries were exhausted when the issue happened:

++++ show qos-ea trace all location all [16:08:52.294 UTC Sun Jan 11 2015] ++++

Jan  8 22:04:34.195 qos_ea/qos_ma_ea_err 0/0/CPU0 t1  CAPS_LTP102: Caps_add: rc 0x457c2200: 1 elapsed_time 10 ms: ifh 895fd20 dir 0 policy: up_default_policy intf_type 66 : Status: 'prm_server' detected the 'warning' condition 'An operation that was requested was aborted - data integrity may be compromised.'
Jan  8 22:04:34.216 qos_ea/qos_ma_ea_err 0/0/CPU0 t1  HAL_LTP101 Hash pgm failed: rc 0x457c2200 action 0 qos_ifh 0x8100000023557 ifh 0x895fe20 ul_ifh 0x2c0 size 2 policy up_default_policy: Status 'prm_server' detected the 'warning' condition 'An operation that was requested was aborted - data integrity may be compromised.'
Jan  8 22:04:34.216 qos_ea/qos_ma_ea_err 0/0/CPU0 t1  HAL_LTP110 can't program hash entries, rc 1165763072
Jan  8 22:04:34.216 qos_ea/qos_ma_ea_err 0/0/CPU0 t1  HAL_LTP121 results add failed, rc 1165763072
Jan  8 22:04:34.216 qos_ea/qos_ma_ea_err 0/0/CPU0 t1  CAPS_LTP75: qos_ea_add_policy() failed

There are 256k QoS hash entries per NP. The configured policy-maps require 4 hash entries per subscriber per bundle member link. Since the bundle has 6 member links, each subscriber consumes 24 entries. 14k subscribers therefore need about 336k entries, which exceeds the 256k available, so you are hitting a hardware limit.

 

Why 4 hash entries, and how can you check that? One entry is used per class-map per bundle member; the two policy-maps below have two classes each, which gives 4.

 

The following example illustrates the limitation:

RP/0/RSP0/CPU0:GDDS-BRAS-ASR9K-2#sh controllers np ports all location 0/0/cpu0
Tue Jan 13 12:05:36.148 UTC
 
                Node: 0/0/CPU0:
----------------------------------------------------------------
 
NP Bridge Fia                       Ports                      
-- ------ --- --------------------------------------------------- 
0  --     0   GigabitEthernet0/0/0/0 - GigabitEthernet0/0/0/19 
1  --     1   TenGigE0/0/1/0, TenGigE0/0/1/1, TenGigE0/0/1/2, TenGigE0/0/1/3 

policy-map up_10240
 class outnet_class
  police rate 20480 kbps 
  !
 !
 class class-default
  police rate 10240 kbps 
  !
 !
 end-policy-map
!
policy-map down_10200
 class innet_class
  police rate 53248 kbps 
  !
 !
 class class-default
  police rate 10240 kbps 
  !
 !
 end-policy-map
!

RP/0/RSP0/CPU0:GDDS-BRAS-ASR9K-2#sh pppoe summary per-access-interface          
Tue Jan 13 12:10:50.713 UTC
 
0/RSP0/CPU0
-----------
    COMPLETE: Complete PPPoE Sessions
    INCOMPLETE: PPPoE sessions being brought up or torn down
 
Interface                        BBA-Group  READY   TOTAL  COMPLETE  INCOMPLETE
-------------------------------------------------------------------------------
BE10.100                             PPPOE      Y    5149      5147           2
BE20.100                             PPPOE      Y    4510      4510           0
BE21.100                             PPPOE      Y      24        24           0
BE100.100                            PPPOE      N       0         0           0
BE20.2500                            PPPOE      Y       1         1           0
BE20.2501                            PPPOE      Y      56        56           0
BE20.2502                            PPPOE      Y       0         0           0
BE20.2503                            PPPOE      Y       0         0           0
BE20.2504                            PPPOE      Y       0         0           0
BE20.2505                            PPPOE      Y       0         0           0
BE20.2506                            PPPOE      Y       0         0           0
BE20.2507                            PPPOE      Y       0         0           0
BE20.2508                            PPPOE      Y       0         0           0
                                             ----------------------------------
TOTAL                                          12    9740      9738           2
RP/0/RSP0/CPU0:GDDS-BRAS-ASR9K-2#show controllers np struct qoS-INTF all location 0/0/cpu0
Tue Jan 13 12:11:08.094 UTC
 
                Node: 0/0/CPU0:
----------------------------------------------------------------
NP: 0  Struct 2: QOS_INTF  
Struct is a PHYSICAL entity 
Reserved Entries: 0, Used Entries: 160360, Max Entries: 262144 
 
NP: 1  Struct 2: QOS_INTF  
Struct is a PHYSICAL entity 
Reserved Entries: 0, Used Entries: 0, Max Entries: 262144

Bundle 10 has 6 members, bundle 20 has 2 members, and all links are on the same NP.

So 4*6*5149 + 4*2*4510 = 123576 + 36080 = 159656
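
The same calculation can be scripted to estimate how many subscribers a bundle can carry before the hash table fills. This Python sketch is a planning aid under the assumptions stated in this section: 4 entries per subscriber per member link, 262144 entries per NP, and all member links on the same NP. The function names are hypothetical.

```python
QOS_HASH_ENTRIES_PER_NP = 256 * 1024  # "Max Entries: 262144" above


def entries_used(bundles, entries_per_sub_per_member=4):
    """bundles: iterable of (member_links, sessions) pairs on one NP."""
    return sum(entries_per_sub_per_member * members * sessions
               for members, sessions in bundles)


def max_subscribers(member_links, entries_per_sub_per_member=4):
    """Largest subscriber count that fits in one NP's hash table."""
    per_subscriber = entries_per_sub_per_member * member_links
    return QOS_HASH_ENTRIES_PER_NP // per_subscriber


# Bundle 10: 6 members, 5149 sessions; bundle 20: 2 members, 4510 sessions.
print(entries_used([(6, 5149), (2, 4510)]))
print(max_subscribers(6), max_subscribers(2))
```

With 6 member links on one NP the ceiling is well below the 14K sessions the customer normally runs, which is consistent with the failures described above.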

 

>>>2018-01-24 Update:

The same root cause can show a different symptom. If you see the following, check the QoS entries first:

1. At session level, the disconnect history shows:

+++ show subscriber manager disconnect-history unique summary 

10101324 BE1000.2804.pppoe35992  2017:07:24 22:23:15     PAP Authentication 
                                                       failed, DC: 0 AC: 25 
                                                       TC: 17
                                                       
2. At PPP level, the disconnect reason is "Feature Installation Failure", caused by an error returned from the qos-ma feature:

+++ show ppp disconnect-history location 0/RSP0/CPU0 [21:51:07.796 UTC Mon Jan 08 2018] +++

Interface                 Time disconnected   CLPN LCP FSM    Reason disconnected
----------------------------------------------------------------------------------
BE1.3554.pppoe33460       Jan  8 21:51:03.525      202389     Feature Installation Failure     <<<<

Jan  8 21:41:07.559245 1687 0/RSP0/CPU0 t1  [subdb_svr:SUBMGR36:ERROR:SUBMGR] SessID:[0x1af3da] BPI Apply Done failed while configuring ; [qos_ma], [0]feature:[QoS], error:[0x457c2000 ''prm_server' detected the 'warning' condition 'An operation that was requested was aborted - data integrity may be compromised.'']

3. At PPP level, the disconnect reason is "PAP Authentication failed", which looks as if the authentication response from RADIUS were in error:

Interface                 Time disconnected   CLPN LCP FSM    Reason disconnected
----------------------------------------------------------------------------------
BE1.3538.pppoe1598        Jan  8 21:51:03.528      202389     PAP Authentication failed  <<<

Jan  8 21:41:07.181303 0/RSP0/CPU0 t1  [IEDGE:TP838:AUTHEN:ERROR:16afe8] process_authen_error_response: Authen Received response ERRORED. status 5, error: 'AAA_BASE' detected the 'fatal' condition 'Invalid state (aaa base lib error)'

4. RADIUS shows many rejects:

Server: 172.31.181.4/1812/1813  is UP
  Address family: IPv4
  Total Deadtime: 0s Last Deadtime: 0s
  Timeout: 5 sec, Retransmit limit: 3
  Quarantined: No
  Authentication:
    1421322 requests, 107 pending, 1130623 retransmits
    357674 accepts, 687281 rejects, 0 challenges
    1506883 timeouts, 0 bad responses, 0 bad authenticators

10. Authorization issue with DHCP options 60 and 61 under IPoE/DHCPv6

 
The ASR9k could not pass the correct DHCPv6 option information to RADIUS. The customer uses two attributes to authorize clients, dhcp-vendor-class and dhcp-client-id-spl. The issue was easy to reproduce in my lab.

 

In DHCPv4, these are options 60 and 61.
In DHCPv6, these are options 16 and 1.

 

The DHCPv4 options:

dhcpv4-option.png

The DHCPv6 options:

dhcpv6-option.png

Lab configuration:

aaa attribute format op61@60
 format-string length 253 "%s@%s" dhcp-client-id-spl dhcp-vendor-class
!
policy-map type control subscriber IPoE-SUBSCRIBER-POLICY
 event session-start match-first
  class type control subscriber dual_stack_IPoE do-until-failure
   10 authorize aaa list for-local-dhcp format op61@60 password Cisco9k
   20 activate dynamic-template IPoE
  ! 
 !

>>> IPv6 RADIUS request:

RP/0/RSP0/CPU0:Jul 12 09:35:48.537 : radiusd[1124]:  RADIUS: Send Access-Request to 10.75.13.138:1812 id 183, len 224
RP/0/RSP0/CPU0:Jul 12 09:35:48.537 : radiusd[1124]:  RADIUS:  authenticator 58 99 2A C5 DF 35 89 D9 - F3 9B 18 82 E0 5C 55 CB
RP/0/RSP0/CPU0:Jul 12 09:35:48.537 : radiusd[1124]:  RADIUS:  Vendor,Cisco        [26]    41      
RP/0/RSP0/CPU0:Jul 12 09:35:48.537 : radiusd[1124]:  RADIUS:   Cisco AVpair        [1]    35      client-mac-address=000c.2906.961f
RP/0/RSP0/CPU0:Jul 12 09:35:48.537 : radiusd[1124]:  RADIUS:  Acct-Session-Id     [44]    10      000009cb
RP/0/RSP0/CPU0:Jul 12 09:35:48.537 : radiusd[1124]:  RADIUS:  NAS-Port-Id         [87]    16      0/0/410/10.398
RP/0/RSP0/CPU0:Jul 12 09:35:48.537 : radiusd[1124]:  RADIUS:  Vendor,Cisco        [26]    22      
RP/0/RSP0/CPU0:Jul 12 09:35:48.537 : radiusd[1124]:  RADIUS:   cisco-nas-port      [2]    16      0/0/410/10.398
RP/0/RSP0/CPU0:Jul 12 09:35:48.537 : radiusd[1124]:  RADIUS:  User-Name           [1]     17      ..........)...@ <<< not correct

>>> IPv4 RADIUS request:

RP/0/RSP0/CPU0:Jul 12 09:23:24.207 : radiusd[1124]:  RADIUS: Send Access-Request to 10.75.13.138:1812 id 119, len 260
RP/0/RSP0/CPU0:Jul 12 09:23:24.207 : radiusd[1124]:  RADIUS:  authenticator 0B 95 BD 2D 02 65 7D 70 - 9A F6 28 FA 60 C3 1C 06
RP/0/RSP0/CPU0:Jul 12 09:23:24.207 : radiusd[1124]:  RADIUS:  Vendor,Cisco        [26]    41      
RP/0/RSP0/CPU0:Jul 12 09:23:24.207 : radiusd[1124]:  RADIUS:   Cisco AVpair        [1]    35      client-mac-address=000c.2906.961f
RP/0/RSP0/CPU0:Jul 12 09:23:24.207 : radiusd[1124]:  RADIUS:  Vendor,Cisco        [26]    34      
RP/0/RSP0/CPU0:Jul 12 09:23:24.207 : radiusd[1124]:  RADIUS:   Cisco AVpair        [1]    28      dhcp-vendor-class=MSFT 5.0
RP/0/RSP0/CPU0:Jul 12 09:23:24.207 : radiusd[1124]:  RADIUS:  Acct-Session-Id     [44]    10      0000098b
RP/0/RSP0/CPU0:Jul 12 09:23:24.207 : radiusd[1124]:  RADIUS:  NAS-Port-Id         [87]    16      0/0/410/10.398
RP/0/RSP0/CPU0:Jul 12 09:23:24.207 : radiusd[1124]:  RADIUS:  Vendor,Cisco        [26]    22      
RP/0/RSP0/CPU0:Jul 12 09:23:24.207 : radiusd[1124]:  RADIUS:   cisco-nas-port      [2]    16      0/0/410/10.398
RP/0/RSP0/CPU0:Jul 12 09:23:24.207 : radiusd[1124]:  RADIUS:  User-Name           [1]     18      ...)...@MSFT 5.0 <<< correct information

 

After checking with the DE: DHCPv6 uses the complete DUID value as the client-id and does not support vendor-class. I raised DDTS CSCuv30402 to track this. For now, the customer should use different RADIUS profiles for DHCPv4 and DHCPv6. Note that DHCPv4 vendor-class also has an issue after 5.2.2, which 5.3.1 will fix; see CSCus88647 for details.
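
The two User-Name values can be reproduced offline. The sketch below is only an illustration, not the BNG implementation. It assumes the format string simply joins option 61 and option 60 with "@", that the DHCPv4 client-id is the hardware type byte plus the MAC, and that the DHCPv6 client sent a 14-byte DUID-LLT (this is consistent with the attribute lengths in the logs, but it is an inference). The four DUID time bytes are hypothetical placeholders, and build_username and render are names made up for this example.

```python
def build_username(client_id: bytes, vendor_class: bytes) -> bytes:
    """Mimic the "%s@%s" format string: <client-id>@<vendor-class>."""
    return client_id + b"@" + vendor_class


def render(value: bytes) -> str:
    """Show non-printable bytes as '.', the way the debug output appears to."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in value)


MAC = bytes.fromhex("000c2906961f")  # client-mac-address from the logs

# DHCPv4 option 61: hardware type (0x01 = Ethernet) + MAC; option 60 as logged.
v4_client_id = b"\x01" + MAC
v4_user = build_username(v4_client_id, b"MSFT 5.0")

# DHCPv6: assumed DUID-LLT = type(2) + hw type(2) + time(4) + MAC(6).
# The time bytes are placeholders; no vendor-class is available.
v6_duid = bytes.fromhex("00010001") + bytes.fromhex("1c1d1e1f") + MAC
v6_user = build_username(v6_duid, b"")

for label, user in (("ipv4", v4_user), ("ipv6", v6_user)):
    # RADIUS attribute length = 2 header bytes (type, length) + value
    print(label, render(user), "attribute length", len(user) + 2)
```

Under these assumptions the attribute lengths come out as 18 and 17, the same as the User-Name lengths in the two requests, and the IPv6 name ends at "@" because there is no vendor-class to append.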

 

11. DHCPv6 and ping issue when deploying PPPoE + DHCPv6

 

Still working on the case; summary to follow later.

 

12. Packet flow when deploying L3 BNG + CoA

 

The test was completed last year, but I have not had time to write the summary; will update later.

 

13. How to configure HTTP redirect with IXIA when testing CoA?

 

The test was completed last year, but I have not had time to write the summary; will update later.

 

14. IPv6 RA issues when setting up DHCPv6 under PPPoE/IPoE

 

  • What is RA? 

When you use dual stack for BNG + DHCPv6, you need to check the RA (Router Advertisement). The RA carries the M and O flags, which tell the CPE how to get its address and other settings, and it provides the default gateway (DHCPv6 cannot send a gateway). The "M" and "O" flags are defined as follows:

 

M   O   Meaning                                                             Description
1   1   Both Address and DNS are sent by DHCPv6                             Stateful DHCPv6
0   1   Address sent by multicast RA (prefix+EUI-64); DNS sent by DHCPv6    Stateless DHCPv6
0   0   Complete stateless config                                           Stateless AutoConfiguration
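
When reading a capture, the M/O table can be applied directly to the flags byte of the RA. The following is a minimal sketch, assuming the RFC 4861 Router Advertisement layout (ICMPv6 type 134, flags byte at offset 5, M = 0x80, O = 0x40). The function names are made up for this example, and the M=1/O=0 combination is reported as not covered because the table does not list it.

```python
RA_TYPE = 134   # ICMPv6 Router Advertisement
M_FLAG = 0x80   # Managed address configuration
O_FLAG = 0x40   # Other configuration

# (M, O) -> description, taken from the table in this section
MODES = {
    (1, 1): "Stateful DHCPv6",
    (0, 1): "Stateless DHCPv6",
    (0, 0): "Stateless AutoConfiguration",
}


def ra_flags_byte(icmpv6_ra: bytes) -> int:
    """Return the M|O|Reserved byte of an ICMPv6 Router Advertisement."""
    if len(icmpv6_ra) < 16 or icmpv6_ra[0] != RA_TYPE:
        raise ValueError("not a Router Advertisement")
    # Layout: Type, Code, Checksum(2), Cur Hop Limit, flags, ...
    return icmpv6_ra[5]


def ra_mode(flags_byte: int) -> str:
    """Map the M and O bits to the address assignment mode."""
    m = 1 if flags_byte & M_FLAG else 0
    o = 1 if flags_byte & O_FLAG else 0
    return MODES.get((m, o), "M=1, O=0 (not covered in the table)")


# Hand-built 16-byte RA header with M=1, O=1 (checksum left at zero)
sample_ra = bytes([RA_TYPE, 0, 0, 0, 64, 0xC0]) + bytes(10)
print(ra_mode(ra_flags_byte(sample_ra)))
```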

 

RA comes in two types: "Solicited RA" (unicast RA) and "Unsolicited RA" (multicast RA). How do they differ, and when is each one sent?

  • What is RS? 

RS (Router Solicitation) is sent by the CPE and triggers the BNG/ASR9k to generate an RA. Observed behavior on PCs:

 

Mac OS X: Cannot initiate RS under PPPoE (it can be generated with a special command); sends RS normally under IPoE. I reported this issue on "Apple Support Communities" and submitted a bug, but got no response: How to send "Router Solicitation" when do IPv6 PPPoE

 

Win7: Based on testing, Win7 64-bit is suggested and should work normally. Unlike the Mac, Win7 does not care whether it receives an RA; it initiates DHCPv6 packets anyway.

 

Btw, RS is defined in RFC 4861, as follows:

https://tools.ietf.org/html/rfc4861#page-18

4.1.  Router Solicitation Message Format

   Hosts send Router Solicitations in order to prompt routers to
   generate Router Advertisements quickly.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |     Type      |     Code      |          Checksum             |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                            Reserved                           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |   Options ...
     +-+-+-+-+-+-+-+-+-+-+-+-

Wireshark example:
ctm-ipv6-rs-01.png
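
For lab testing with a CPE that does not send RS on its own, the RS message format from RFC 4861 can be built by hand. This is a minimal sketch that produces only the ICMPv6 payload, not the IPv6 or PPP framing. The link-local source address is an assumption (EUI-64 derived from the lab MAC), ff02::2 is the all-routers multicast group, and the helper names are made up for this example.

```python
import ipaddress
import struct

ICMPV6 = 58                 # IPv6 next-header value for ICMPv6
ROUTER_SOLICITATION = 133   # ICMPv6 type of an RS


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, length: int) -> bytes:
    """IPv6 pseudo-header: src, dst, upper-layer length, 3 zero bytes, next header."""
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", length, ICMPV6))


def build_rs(src: str, dst: str, src_mac: bytes) -> bytes:
    """Type, Code, Checksum, Reserved, then a Source Link-Layer Address option."""
    option = struct.pack("!BB6s", 1, 1, src_mac)  # option type 1, length 1 x 8 bytes
    body = struct.pack("!BBHI", ROUTER_SOLICITATION, 0, 0, 0) + option
    csum = ~ones_complement_sum(pseudo_header(src, dst, len(body)) + body) & 0xFFFF
    return body[:2] + struct.pack("!H", csum) + body[4:]


def checksum_ok(src: str, dst: str, message: bytes) -> bool:
    """A valid message sums to 0xFFFF together with its pseudo-header."""
    return ones_complement_sum(pseudo_header(src, dst, len(message)) + message) == 0xFFFF


SRC = "fe80::20c:29ff:fe06:961f"   # assumed link-local address of the lab client
DST = "ff02::2"                    # all-routers multicast
rs = build_rs(SRC, DST, bytes.fromhex("000c2906961f"))
print(rs.hex(), checksum_ok(SRC, DST, rs))
```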
  • Solicited RA / unicast:

This is the response to a request generated by the CPE (e.g. a PC, TP-Link or Linksys device). The CPE sends an RS (Router Solicitation) message to request router information, and the BNG responds with a solicited RA.

 

solicited-ra.png
  • Unsolicited RA / multicast:

The router (BNG) sends RAs at regular intervals, independently of any RS from the CPE.

 

unsolicited-ra.png
  • Any issues under PPPoE?

For PPPoE, the session (IPv4) must be up before an RA is sent. The configuration takes effect on the session interface (dynamic template), and the RA behavior depends on the command "ipv6 nd ra-unicast". If this command is configured, CPEs that do not proactively send RS cannot get a multicast RA. The suggestion is therefore to remove the command under PPPoE. The test summary under 5.3.1 follows:

 

PPPoE          Dynamic Template Config           Ucast unsolicited RA   Mcast unsolicited RA      Unicast solicited RA      Mcast solicited RA
Ambiguous      with "ipv6 nd ra-unicast"         not sent               no mcast unsolicited RA   supported                 not sent
Ambiguous      without "ipv6 nd ra-unicast"      not sent               supported                 no unicast solicited RA   not sent
Special Vlan   with "ipv6 nd ra-unicast"         not sent               no mcast unsolicited RA   supported                 not sent
Special Vlan   without "ipv6 nd ra-unicast"      not sent               supported                 no unicast solicited RA   not sent

 

  • Any issues under IPoE?

First, note that for IPoE the session is not yet up when the RA is sent, so any configuration under the dynamic template has no effect on the RA.

 

>>> For unsolicited/multicast RA:

Because the 9k does not know the inner VLAN in the ambiguous scenario (the inner VLAN is a range), the encapsulation is incorrect for unsolicited/multicast RAs sent by the access-interface. Unsolicited RA will also not be supported in the future because of the scale challenge involved; DDTS CSCud83802 tracks this. The following comes from the IPv6 ND team:

 

"Based on the configuration we might have ‘4096x4096’ (~2^24) RAs to be sent per access interface. Based on the discussion with BNG team, supporting this might be very challenging.  Due to resource limitation, not support today, you can check CSCud83802 ."

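
The scale figure quoted by the IPv6 ND team is easy to check. This is a back-of-the-envelope sketch, assuming a full 12-bit range for both the outer and the inner VLAN tag. The 200-second interval is a hypothetical value chosen only to show the rate calculation, not a measured or configured one.

```python
VLAN_ID_SPACE = 4096  # 12-bit VLAN ID space per tag


def ras_per_round(outer_vlans: int = VLAN_ID_SPACE,
                  inner_vlans: int = VLAN_ID_SPACE) -> int:
    """One unsolicited RA per outer/inner VLAN combination on an access interface."""
    return outer_vlans * inner_vlans


def ras_per_second(total: int, interval_s: float) -> float:
    """Average RA rate if every combination is refreshed once per interval."""
    return total / interval_s


total = ras_per_round()
print(total, total == 2 ** 24)
print(ras_per_second(total, interval_s=200.0))  # hypothetical interval
```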
 

>>> For solicited/unicast RA:

So unsolicited/multicast RA is not supported. Does solicited/unicast RA work normally under IPoE? The answer is "no"! The behavior differs between two scenarios:

 

- IPv4 session established first, then IPv6 set up:

-------------------------------------------------------------

As the DE said, the command "ipv6 nd ra-unicast" is not useful on IPoE. A new command, "ipv6 nd start-ra-on-ipv6-enable", will support this scenario and should be available from 6.0.

 

- No IPv4 session, only an IPv6 session:

-------------------------------------------------------------

Because no session exists before the RA is sent, sending a solicited/unicast RA needs support on the access-interface, which we do not support today. I raised an enhancement DDTS, CSCuv36770, to track this.

 

  • More information:

The following useful document covers RA in different scenarios; it was posted by Roy Jiang:

https://cisco.jiveon.com/docs/DOC-23008

 

Another good document from Roy:

ASR9000 BNG IPv6 ND RA deployment best of practice 

 

See also Roy's document "ASR9K BNG RADIUS and CoA deployment guide".
