Sunday, July 11, 2021

New feature for PAM (64-bit IOS XR)

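PAM (Platform Automated Monitoring) was introduced in release 6.1.2 (64-bit IOS XR only, not 32-bit) and is enabled by default. It monitors for process crashes, memory leaks, CPU hogs, tracebacks, disk usage and similar issues. When it detects one of these events, it automatically collects relevant data and saves it, by default, under harddisk:/cisco_support for troubleshooting. The function is fully automatic and currently cannot be configured manually. For examples, see the following document: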

PAM Events

Starting with release 6.6.1, a new feature called on-demand EDCD (Event Driven CLI Database) was introduced. Combined with PAM, it provides two functions:

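  1. PAM Schedule: collects a set of information at a fixed interval
  2. PAM EEM Agent: monitors syslog and, when a message matches a condition, triggers collection of a set of information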

EDCD Ondemand – Create

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand ?
  add-update          Add or update ondemand EDCD entries
  add-update-trigger  Add or update ondemand EDCD entries
  delete              Delete ondemand EDCD entries
  delete-all          Delete all EDCD entries
  trigger             Trigger the collection of traces associated with given identifier

//Create a command list, for example:

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand add-update identifier xuxing_test commands "show run;show plat;show install active su"
Sun Apr 25 09:30:18.903 UTC
 
Ondemand EDCD has been updated (execute 'show edcd ondemand database' to verify.)
 
RP/0/RSP0/CPU0:ASR9910-B#
RP/0/RSP0/CPU0:ASR9910-B#show edcd ondemand database         
Sun Apr 25 09:30:58.713 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
------------------------------------------------------------
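If you maintain the command lists in a script, the CLI line above can be generated from a plain list. This is a minimal Python sketch that only builds text (it does not connect to the router); build_add_update is a hypothetical helper name, and rejecting commands that contain ';' or '"' is an assumption based on ';' being the separator inside the quoted string.

```python
def build_add_update(identifier: str, commands: list[str]) -> str:
    """Build an 'edcd ondemand add-update' line (hypothetical helper, text only)."""
    for cmd in commands:
        # Assumption: ';' separates entries and '"' closes the argument,
        # so neither can appear inside a single command.
        if ";" in cmd or '"' in cmd:
            raise ValueError(f"command must not contain ';' or '\"': {cmd!r}")
    joined = ";".join(cmd.strip() for cmd in commands)
    return f'edcd ondemand add-update identifier {identifier} commands "{joined}"'


if __name__ == "__main__":
    # Reproduces the command used above for xuxing_test.
    print(build_add_update("xuxing_test",
                           ["show run", "show plat", "show install active su"]))
```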

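//To add more commands to an existing command list, run add-update again with the same identifier; the new commands are appended to the end of the list: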

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand add-update identifier xuxing_test commands "show clock"
Sun Apr 25 09:41:01.362 UTC
 
Ondemand EDCD has been updated (execute 'show edcd ondemand database' to verify.)
 
RP/0/RSP0/CPU0:ASR9910-B#
RP/0/RSP0/CPU0:ASR9910-B#show edcd ondemand database
Sun Apr 25 09:41:08.848 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
 4: show clock   <<<<<
------------------------------------------------------------
 
RP/0/RSP0/CPU0:ASR9910-B#
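The append behavior seen above can be captured in a small in-memory model. This Python sketch is inferred only from the "show edcd ondemand database" output in this post and is not Cisco's implementation; the class name OndemandDb is hypothetical, and how the router treats a command that is already in the list is not shown here, so the model simply appends.

```python
class OndemandDb:
    """Hypothetical model of 'edcd ondemand add-update' (not Cisco's code)."""

    def __init__(self) -> None:
        # identifier -> commands in the order they were added
        self.entries: dict[str, list[str]] = {}

    def add_update(self, identifier: str, commands: str) -> None:
        """Create the list for a new identifier, or append to an existing one."""
        new = [c.strip() for c in commands.split(";") if c.strip()]
        self.entries.setdefault(identifier, []).extend(new)

    def show(self) -> str:
        """Render entries with 1-based indexes, similar to the router output."""
        lines = []
        for ident, cmds in self.entries.items():
            lines.append(f"Identifier: {ident}")
            lines.extend(f" {i}: {cmd}" for i, cmd in enumerate(cmds, start=1))
        return "\n".join(lines)


if __name__ == "__main__":
    db = OndemandDb()
    db.add_update("xuxing_test", "show run;show plat;show install active su")
    db.add_update("xuxing_test", "show clock")  # lands as entry 4
    print(db.show())
```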

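//Admin CLI and shell commands (via run) are supported as well. "show clock" does not appear in the list below because it had already been deleted at 09:43 (see the Delete section):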

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand add-update identifier xuxing_test commands "admin show plat;run ng_show_version"
Sun Apr 25 09:48:36.510 UTC
 
Ondemand EDCD has been updated (execute 'show edcd ondemand database' to verify.)
 
RP/0/RSP0/CPU0:ASR9910-B#show edcd ondemand database                                                                  
Sun Apr 25 09:48:39.145 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
 4: admin show plat    <<<<
 5: run ng_show_version    <<<<
------------------------------------------------------------
 
RP/0/RSP0/CPU0:ASR9910-B#
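The prefix of each entry decides where it runs: a plain command is XR exec CLI, "admin ..." goes to the admin CLI, and "run ..." executes in the Linux shell. As an illustration only, this Python sketch labels entries by those two prefixes; classify is a hypothetical helper, and any prefix other than "admin " and "run " is simply treated as XR exec.

```python
def classify(command: str) -> str:
    """Label an ondemand EDCD entry by its prefix (hypothetical helper)."""
    if command.startswith("admin "):
        return "admin"  # admin CLI, e.g. 'admin show plat'
    if command.startswith("run "):
        return "shell"  # Linux shell via 'run', e.g. 'run ng_show_version'
    return "xr"         # everything else is treated as XR exec CLI


if __name__ == "__main__":
    entries = ["show run", "show plat", "show install active su",
               "admin show plat", "run ng_show_version"]
    for cmd in entries:
        print(f"{classify(cmd):<6}{cmd}")
```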

EDCD Ondemand – Delete

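You can delete either a single command or the entire list; if "commands" is omitted, all entries under the identifier are deleted: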

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand delete identifier xuxing_test ?
  commands  Specify a list of commands that to be deleted (if missing all entries under this sub-pattern will be deleted)
  <cr> 
   
RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand delete identifier xuxing_test commands "show clock"
Sun Apr 25 09:43:31.815 UTC
 
Ondemand EDCD has been updated (execute 'show edcd ondemand database' to verify.)
 
RP/0/RSP0/CPU0:ASR9910-B#show edcd ondemand database                                     
Sun Apr 25 09:43:34.277 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
------------------------------------------------------------
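To confirm a deletion from a script, the "show edcd ondemand database" output can be parsed back into lists. A minimal Python sketch, assuming the output layout shown above (an "Identifier:" line followed by numbered " N: command" lines); parse_database is a hypothetical name and the regexes are fitted only to this sample.

```python
import re

IDENT_RE = re.compile(r"^\s*Identifier:\s*(\S+)\s*$")
ENTRY_RE = re.compile(r"^\s*(\d+):\s+(.+?)\s*$")

# Output captured above, after "show clock" was deleted.
SAMPLE = """Sun Apr 25 09:43:34.277 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
------------------------------------------------------------
"""


def parse_database(text: str) -> dict[str, list[str]]:
    """Map each identifier to its commands, in the displayed order."""
    db: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        ident = IDENT_RE.match(line)
        if ident:
            current = ident.group(1)
            db[current] = []
            continue
        entry = ENTRY_RE.match(line)
        # Timestamp, separator and prompt lines do not start with "N:".
        if entry and current is not None:
            db[current].append(entry.group(2))
    return db


if __name__ == "__main__":
    db = parse_database(SAMPLE)
    print("show clock" in db["xuxing_test"])
```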

EDCD Ondemand – Trigger

How can you verify that the command list works? Trigger it manually with the following command:

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand trigger identifier xuxing_test
Sun Apr 25 09:49:43.479 UTC
RP/0/RSP0/CPU0:ASR9910-B#
 
RP/0/RSP0/CPU0:Apr 25 09:36:40.033 UTC: run_cmd[69017]: %INFRA-INFRA_MSG-5-RUN_LOGIN : User cisco logged into shell from vty0
RP/0/RSP0/CPU0:Apr 25 09:36:46.775 UTC: run_cmd[69017]: %INFRA-INFRA_MSG-5-RUN_LOGOUT : User cisco logged out of shell from vty0
RP/0/RSP0/CPU0:Apr 25 09:49:54.118 UTC: logger[67945]: %OS-SYSLOG-4-LOG_WARNING : PAM has completed on-demand data collection for xuxing_test. All files are archived and saved at 0/RSP0/CPU0 : harddisk:/cisco_support/PAM-asr9k-ondemand-xr-xuxing_test-2021Apr25-094953.tgz (Please copy tgz file out of the router and send to Cisco support. This tgz file will be removed after 14 days.
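The archive name is announced only in this syslog message, so a script that fetches the files has to pull the path out of it. A minimal Python sketch, run off-box on a saved copy of the log: archive_from_syslog and removal_date are hypothetical helpers, the regexes are fitted to the single message above, and counting the 14 days from the timestamp in the file name is an assumption.

```python
import re
from datetime import datetime, timedelta

# Matches "... saved at <node> : <path>.tgz" in the PAM completion message.
ARCHIVE_RE = re.compile(r"saved at (?P<node>\S+) : (?P<path>\S+\.tgz)")
# Timestamp embedded in the file name, e.g. "2021Apr25-094953".
STAMP_RE = re.compile(r"-(\d{4}[A-Z][a-z]{2}\d{2}-\d{6})\.tgz$")

LOG = ("RP/0/RSP0/CPU0:Apr 25 09:49:54.118 UTC: logger[67945]: "
       "%OS-SYSLOG-4-LOG_WARNING : PAM has completed on-demand data collection "
       "for xuxing_test. All files are archived and saved at 0/RSP0/CPU0 : "
       "harddisk:/cisco_support/PAM-asr9k-ondemand-xr-xuxing_test-2021Apr25-094953.tgz "
       "(Please copy tgz file out of the router and send to Cisco support.")


def archive_from_syslog(line: str):
    """Return (node, path) for a PAM completion message, or None."""
    m = ARCHIVE_RE.search(line)
    return (m.group("node"), m.group("path")) if m else None


def removal_date(path: str, days: int = 14) -> datetime:
    """Estimate the removal time, assuming the 14 days start at the
    timestamp in the file name (router time, UTC in this example)."""
    m = STAMP_RE.search(path)
    if m is None:
        raise ValueError(f"no timestamp in {path!r}")
    created = datetime.strptime(m.group(1), "%Y%b%d-%H%M%S")
    return created + timedelta(days=days)


if __name__ == "__main__":
    node, path = archive_from_syslog(LOG)
    print(node, path, removal_date(path))
```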

As shown above, the system generates a tar archive, "harddisk:/cisco_support/PAM-asr9k-ondemand-xr-xuxing_test-2021Apr25-094953.tgz". Copy it off the device and extract it to view the collected files.



PAM Schedule

RP/0/RSP0/CPU0:ASR9910-B#edcd scheduler add-update cadence '*/10 * * * *' ?    <<<< two options: schedule a single command, or schedule a previously configured command list
  command     Command to be executed at the above cadence
  identifier  An identifier linked to a list of CLIs (defined in ondemand EDCD)
  <cr>       
RP/0/RSP0/CPU0:ASR9910-B#edcd scheduler add-update cadence '*/10 * * * *' identifier xuxing_test
Sun Apr 25 10:03:26.302 UTC
Adding */10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test
Updating job file on remote RP
The following job has been added successfully:
*/10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test
RP/0/RSP0/CPU0:ASR9910-B
RP/0/RSP0/CPU0:ASR9910-B#show edcd scheduler    <<<<   view the existing scheduled jobs
Sun Apr 25 10:03:33.842 UTC
<Job ID>: <job content>
1: */10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test
RP/0/RSP0/CPU0:ASR9910-B#
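Deleting a job requires its job ID, so with several schedules it helps to map job IDs back to identifiers. A minimal Python sketch, assuming the "<Job ID>: <job content>" layout shown above (five cron fields, then the user, then the command with "-i <identifier>"); parse_jobs and job_ids_for are hypothetical helpers, and jobs created with the "command" option simply get no identifier.

```python
import re

# "<id>: <5 cron fields> <user> <command>"
JOB_RE = re.compile(r"^(\d+):\s+((?:\S+\s+){4}\S+)\s+(\S+)\s+(.+)$")
# Identifier passed to edcd_cli.py with "-i".
ID_RE = re.compile(r"(?:^|\s)-i\s+(\S+)")

SAMPLE = """Sun Apr 25 10:03:33.842 UTC
<Job ID>: <job content>
1: */10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test
"""


def parse_jobs(text: str) -> dict[int, dict]:
    """Map job ID -> cadence, user, command and identifier (or None)."""
    jobs = {}
    for line in text.splitlines():
        m = JOB_RE.match(line.strip())
        if not m:
            continue  # skips the timestamp, header and prompt lines
        job_id, cadence, user, command = m.groups()
        ident = ID_RE.search(command)
        jobs[int(job_id)] = {
            "cadence": cadence,
            "user": user,
            "command": command,
            "identifier": ident.group(1) if ident else None,
        }
    return jobs


def job_ids_for(jobs: dict[int, dict], identifier: str) -> list[int]:
    """Job IDs whose command triggers the given ondemand identifier."""
    return [jid for jid, job in jobs.items() if job["identifier"] == identifier]


if __name__ == "__main__":
    for jid in job_ids_for(parse_jobs(SAMPLE), "xuxing_test"):
        print(f"edcd scheduler delete job-id {jid}")
```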

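'*/10 * * * *' means the job runs every 10 minutes. The cadence uses Linux crontab syntax (minute, hour, day of month, month, day of week); see the Linux crontab introduction for how to set these fields: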

Linux crontab
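To check what a cadence expands to before adding it, each crontab field can be expanded into the values it selects. This Python sketch handles only "*", single numbers, "a-b" ranges, "/step" on "*" or a range, and comma lists; it does not accept month or weekday names or 7 for Sunday, and expand_field and describe are hypothetical helpers. It also shows that "*/10" fires at minutes 0, 10, ..., 50 of every hour, not 10 minutes after the job is added.

```python
# (name, lowest value, highest value) for the five crontab fields
FIELDS = (("minute", 0, 59), ("hour", 0, 23), ("day_of_month", 1, 31),
          ("month", 1, 12), ("day_of_week", 0, 6))


def expand_field(field: str, lo: int, hi: int) -> list[int]:
    """Expand one crontab field such as '*/10', '5', '1-5' or '0,30'."""
    values: set[int] = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
            if step < 1:
                raise ValueError(f"bad step in {field!r}")
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            if step != 1:
                raise ValueError(f"step needs '*' or a range: {field!r}")
            start = end = int(part)
        if not lo <= start <= end <= hi:
            raise ValueError(f"{field!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def describe(cadence: str) -> dict[str, list[int]]:
    """Expand all five fields of a cadence string."""
    parts = cadence.split()
    if len(parts) != len(FIELDS):
        raise ValueError("expected 5 fields")
    return {name: expand_field(p, lo, hi)
            for p, (name, lo, hi) in zip(parts, FIELDS)}


if __name__ == "__main__":
    print(describe("*/10 * * * *")["minute"])  # [0, 10, 20, 30, 40, 50]
```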

To delete the scheduled job:

RP/0/RSP0/CPU0:ASR9910-B#edcd scheduler delete job-id 1    <<<< delete by job ID; get the job ID from "show edcd scheduler"
Sun Apr 25 10:08:42.937 UTC
The following job has been deleted:
*/10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test
 
Updating job file on remote RP
RP/0/RSP0/CPU0:ASR9910-B#

PAM EEM
