Experience: Troubleshooting a CM IPMEDPRO TN2602AP
This post is merely a log of a particular troubleshooting experience initiated by CM IPMEDPRO alarms.
My CM6 system recently notified me of a set of alarms:
Alarm Report
============
Port Maintenance On Alt Alarm Svc Ack? Date
Name Brd? Name Type State 1 2 Alarmed
02A13 IPMEDPRO y MINOR y 02/05/18:17
02A13 IPMEDPRO y MINOR y 02/05/18:17
02A1302 MEDPROPT y WARNING OUT 02/05/18:17
02A1301 MEDPROPT y WARNING OUT 02/05/18:17
Logging into SAT, I determined that the alarms were still active:
display alarms
ALARM REPORT
Port Mtce On Alt Alarm Svc Ack? Date Date
Name Brd? Name Type State 1 2 Alarmed Resolved
02A13 IPMEDPRO y MINOR y 02/05/18:17 00/00/00:00
02A13 IPMEDPRO y MINOR y 02/05/18:17 00/00/00:00
02A1302 MEDPROPT y WARNING OUT 02/05/18:17 00/00/00:00
02A1301 MEDPROPT y WARNING OUT 02/05/18:17 00/00/00:00
When there are alarms, there are generally corresponding errors that provide more information. We’ll look for all errors in port-network 2:
display errors Page 1 of 1
ERROR REPORT
The following options control which errors will be displayed.
ERROR TYPES
Error Type: Error List: active-alarms
REPORT PERIOD
Interval: a From: / / : To: / / :
EQUIPMENT TYPE ( Choose only one, if any, of the following )
Media Gateway:
Cabinet:
Port Network: 2
Board Number:
Port:
Category:
Extension:
Trunk ( group/member ): /
And get this result:
display errors
HARDWARE ERROR REPORT - ACTIVE ALARMS
Port Mtce Alt Err Aux First/Last Err Err Rt/ Al Ac
Name Name Type Data Occurrence Cnt Rt Hr St
02A1302 MEDPROPT 1025 02/05/18:17 255 10 11 a y
02/06/19:08
02A1301 MEDPROPT 1025 02/05/18:17 255 10 11 a y
02/06/19:08
02A13 IPMEDPRO 1025 131 02/04/10:32 9 0 0 a y
02/05/18:20
02A13 IPMEDPRO 1793 02/05/18:17 254 10 6 a y
02/06/19:08
By looking at the listed port numbers, I see that board 02A13 is an IPMEDPRO, and it has two ports in error (02A1301 and 02A1302). Troubleshooting a fault in CM generally starts with a board, rather than a port – particularly if the board itself is in error; port errors could be subsidiary errors.
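To make the board-before-port idea concrete, here is a minimal Python sketch that groups the error-log rows above by their parent board. It is not an Avaya tool: the `group_by_board` helper and the location pattern (two-digit cabinet, carrier letter A-E, two-digit slot, optional two-digit port) are my own assumptions, based only on the names shown in this report.

```python
import re
from collections import defaultdict

# Rows copied from the "display errors" output above:
# (port name, maintenance object name, error type).
ERROR_ROWS = [
    ("02A1302", "MEDPROPT", 1025),
    ("02A1301", "MEDPROPT", 1025),
    ("02A13", "IPMEDPRO", 1025),
    ("02A13", "IPMEDPRO", 1793),
]

# Assumed location format: cabinet (2 digits), carrier (A-E), slot (2 digits),
# optionally followed by a 2-digit port number.
LOCATION = re.compile(r"^(\d{2}[A-E]\d{2})(\d{2})?$")


def group_by_board(rows):
    """Return {board: {"board": [...], "ports": [...]}} for the given rows."""
    grouped = defaultdict(lambda: {"board": [], "ports": []})
    for name, mo, err_type in rows:
        match = LOCATION.match(name)
        if match is None:
            continue  # not a cabinet/carrier/slot location; skip it
        board, port = match.groups()
        key = "ports" if port else "board"
        grouped[board][key].append((name, mo, err_type))
    return dict(grouped)


if __name__ == "__main__":
    for board, entries in group_by_board(ERROR_ROWS).items():
        print(board, "board-level error types:", [e[2] for e in entries["board"]])
        print(board, "ports in error:", [e[0] for e in entries["ports"]])
```

A board that shows up with both board-level and port-level entries, as 02A13 does here, is the one to investigate first.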
Avaya publishes manuals detailing error types, codes, tests, etc. This particular switch is CM 6.3; we therefore want to refer to Maintenance Alarms for Avaya Aura® Communication Manager, Branch Gateways and Servers, Release 6.3. You can find the most recent publication applicable to your switch by logging into https://support.avaya.com using your Avaya SSO account, and searching for “Maintenance Alarms” while filtering for your switch release.
After finding IPMEDPRO in the index, we find that the IPMEDPRO maintenance object applies to the TN2302 IP Media Processor and the TN2602AP IP Media Resource 320 circuit packs. To determine which circuit pack is in fault, we list the configuration for the board in error:
list configuration board 2a13
SYSTEM CONFIGURATION
Board Assigned Ports
Number Board Type Code Vintage u=unassigned t=tti p=psa
02A13 IP MEDIA PROCESSOR TN2602AP HW28 FW061 01 02
Now we can instead look up “IPMEDPRO (TN2602AP IP Media Resource 320)”. The manual provides a list of error log entries and recommended course of action, when possible. Using the above “display errors” output, we find more information for the thrown errors:
Error Type 1025: a module on the board failed. Aux Data values between 16641 and 16895 indicate a critical problem. See Board Health Query Test (#1652).
Error Type 1793: no electrical signal is detected on the Ethernet cable. The Ethernet cable is unplugged or there is a problem with a connection to the network interface.
It appears that error 1025 is a fairly generic “there’s an on-board module that has failed”, but 1793 gives us something to investigate.
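The Aux Data rule quoted above for error type 1025 is easy to check mechanically. The sketch below assumes the range "between 16641 and 16895" is inclusive at both ends, which the quoted text does not state explicitly; `is_critical_1025` is a hypothetical helper name.

```python
# Aux Data range that the manual calls critical for error type 1025.
# Assumption: both endpoints (16641 and 16895) are included.
CRITICAL_AUX_RANGE = range(16641, 16896)


def is_critical_1025(aux_data):
    """Return True if an error-1025 Aux Data value falls in the critical range."""
    return aux_data is not None and aux_data in CRITICAL_AUX_RANGE


if __name__ == "__main__":
    # The 02A13 IPMEDPRO entry above logged error type 1025 with Aux Data 131.
    print(is_critical_1025(131))
```

By that rule, the Aux Data value 131 logged against 02A13 is outside the critical range.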
CM is able to test equipment on-demand, so that you can see which tests have failed (and resulted in the aforementioned errors):
test board 02a13 long Page 1
TEST RESULTS
Port Mtce Name Alt. Name Test No. Result Error Code
02A13 IPMEDPRO 52 PASS
02A13 IPMEDPRO 1402 PASS
02A13 IPMEDPRO 1371 PASS
02A13 IPMEDPRO 1383 FAIL
02A13 IPMEDPRO 1379 FAIL 2805
02A13 IPMEDPRO 1506 ABORT
02A13 IPMEDPRO 1511 PASS
02A13 IPMEDPRO 1405 PASS
02A13 IPMEDPRO 1629 PASS
02A13 IPMEDPRO 1652 FAIL 16745
02A13 IPMEDPRO 1630 ABORT 1115
02A13 IPMEDPRO 1680 PASS
02A1301 MEDPROPT 1382 PASS
02A1301 MEDPROPT 1380 ABORT
02A1301 MEDPROPT 1407 ABORT 1
02A1302 MEDPROPT 1382 PASS
02A1302 MEDPROPT 1380 ABORT
02A1302 MEDPROPT 1407 ABORT 1
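When the result list gets long, it can help to filter it down to the rows that did not pass. This is a rough Python sketch that parses pasted `test board` rows; it assumes the Alt. Name column is empty (as it is in every row here) so that each row splits into port, maintenance name, test number, result, and an optional error code. The function names are my own.

```python
RESULTS = {"PASS", "FAIL", "ABORT"}

# A subset of the rows from the "test board 02a13 long" output above.
SAMPLE = """\
Port Mtce Name Alt. Name Test No. Result Error Code
02A13 IPMEDPRO 52 PASS
02A13 IPMEDPRO 1383 FAIL
02A13 IPMEDPRO 1379 FAIL 2805
02A13 IPMEDPRO 1506 ABORT
02A13 IPMEDPRO 1652 FAIL 16745
02A13 IPMEDPRO 1630 ABORT 1115
02A1301 MEDPROPT 1382 PASS
02A1301 MEDPROPT 1407 ABORT 1
"""


def parse_test_results(text):
    """Parse pasted result rows into dicts; header and unknown lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Expected: port, maintenance name, test number, result, [error code]
        if len(fields) not in (4, 5):
            continue
        port, mo, test_no, result = fields[:4]
        if not test_no.isdigit() or result not in RESULTS:
            continue
        code = fields[4] if len(fields) == 5 else None
        rows.append({"port": port, "mo": mo, "test": int(test_no),
                     "result": result, "code": code})
    return rows


def not_passing(rows):
    """Return only the FAIL and ABORT rows, in their original order."""
    return [r for r in rows if r["result"] != "PASS"]


if __name__ == "__main__":
    for row in not_passing(parse_test_results(SAMPLE)):
        print(row["port"], row["test"], row["result"], row["code"] or "")
```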
We can see that several demand tests failed. I’ll cite text from each failed test’s error documentation from the aforementioned manual.
DSP Query Test (#1382) failed with no error code.
The DSP failed. If it continues to fail, it will be taken out of service.
Ping Test (#1379) failed with error code 2805.
The number of pings received did not match the number sent (normally one ping sent). This means that no ping responses were received from the gateway defined on the ip-interface form for the IP Media Processor.
1. Retry the command at 1-minute intervals up to 3 times.
Board Health Query Test (#1652) failed with error code 16745.
The board has a critical error and will be taken out-of-service. Check that the other tests for the board pass. If they do not:
1. Attempt to reset the circuit pack.
2. Rerun the test. If the problem continues, replace the circuit pack.
This tells me that there is a failure, likely hardware in nature, that is causing this circuit pack to be unable to communicate with the local IP network. Let’s determine what IP address it is assigned, so that we can see if the board responds to ping (the reverse direction of the ping it previously tried).
display ip-interface 02a13 Page 1 of 3
IP INTERFACES
Critical Reliable Bearer? y
Type: MEDPRO
Slot: 02A13 Slot: 02A14
Code/Suffix: TN2602 Code/Suffix: TN2602
Enable Interface? y Enable Interface? y
VLAN: n VLAN: n
Network Region: 2
VOIP Channels: 320
IPV4 PARAMETERS
Node Name: [rdctd]2a13 IP Address: 10.2.7.14
Duplicate Node Name: [rdctd]2a14 IP Address: 10.2.7.15
Gateway Node Name: [redacted] IP Address: 10.2.7.1
Subnet Mask: /22
IPV4 COMMON ATTRIBUTES
Shared Virtual Node Name: [rdctd]virt IP Address: 10.2.7.17
Virtual MAC Table: 1
Virtual MAC Address: 02:04:0d:4a:53:c1
display ip-interface 02a13 Page 2 of 3
IP INTERFACES
ETHERNET OPTIONS
Slot: 02A13 Slot: 02A14
Auto? n Auto? n
Speed: 100Mbps Speed: 100Mbps
Duplex: Full Duplex: Full
display ip-interface 02a13 Page 3 of 3
IP INTERFACES
VOIP/NETWORK THRESHOLDS
Enable VoIP/Network Thresholds? n
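Before pinging anything, one quick configuration sanity check is to confirm that the board, its duplicate, the shared virtual address, and the gateway all sit in the same /22. The sketch below uses Python's standard `ipaddress` module with the addresses from the form above; `same_subnet` is a hypothetical helper, not anything CM provides.

```python
import ipaddress


def same_subnet(board_ip, prefix_len, *others):
    """Return the board's network and whether each other address is inside it."""
    network = ipaddress.ip_interface(f"{board_ip}/{prefix_len}").network
    return network, {ip: ipaddress.ip_address(ip) in network for ip in others}


if __name__ == "__main__":
    # Values copied from the "display ip-interface 02a13" output above.
    network, members = same_subnet(
        "10.2.7.14", 22,
        "10.2.7.15",  # duplicate board 02A14
        "10.2.7.17",  # shared virtual address
        "10.2.7.1",   # gateway
    )
    print(network)
    for ip, inside in members.items():
        print(ip, "in subnet" if inside else "NOT in subnet")
```

If the gateway were outside the board's subnet, the Ping Test failure would point to an administration problem rather than hardware.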
Noting that the IPMEDPROs are paired as 10.2.7.14 and 10.2.7.15, we will try to ping both of them:
$ ping -c 5 10.2.7.14
PING 10.2.7.14 (10.2.7.14): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
--- 10.2.7.14 ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss
$ ping -c 5 10.2.7.15
PING 10.2.7.15 (10.2.7.15): 56 data bytes
64 bytes from 10.2.7.15: icmp_seq=0 ttl=62 time=0.999 ms
64 bytes from 10.2.7.15: icmp_seq=1 ttl=62 time=0.896 ms
64 bytes from 10.2.7.15: icmp_seq=2 ttl=62 time=0.946 ms
64 bytes from 10.2.7.15: icmp_seq=3 ttl=62 time=1.049 ms
64 bytes from 10.2.7.15: icmp_seq=4 ttl=62 time=1.070 ms
--- 10.2.7.15 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.896/0.992/1.070/0.064 ms
This seems to confirm CM’s report that the board is offline.
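If you save ping output alongside your other captures, the summary line is simple to parse later. This sketch only handles the summary format shown above (the "N packets transmitted, M packets received" style); other ping implementations word this line differently, and `parse_ping_summary` is my own helper name.

```python
import re

# Matches the summary line format shown in the captures above.
SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received, ([\d.]+)% packet loss"
)


def parse_ping_summary(output):
    """Extract sent/received counts and loss from ping output, or None if absent."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    return {
        "sent": sent,
        "received": received,
        "loss_pct": float(match.group(3)),
        "reachable": received > 0,
    }


if __name__ == "__main__":
    board_a = "5 packets transmitted, 0 packets received, 100.0% packet loss"
    board_b = "5 packets transmitted, 5 packets received, 0.0% packet loss"
    print("10.2.7.14", parse_ping_summary(board_a))
    print("10.2.7.15", parse_ping_summary(board_b))
```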
Let’s see if a busyout/release helps:
busyout board 2a13
COMMAND RESULTS
Port Maintenance Name Alt. Name Result Error Code
02A13 IPMEDPRO PASS
02A1301 MEDPROPT PASS
02A1302 MEDPROPT PASS
release board 2a13
COMMAND RESULTS
Port Maintenance Name Alt. Name Result Error Code
02A13 IPMEDPRO PASS
02A1301 MEDPROPT PASS
02A1302 MEDPROPT PASS
And after busyout/release, let’s try testing the board again:
test board 2a13 long Page 1
TEST RESULTS
Port Mtce Name Alt. Name Test No. Result Error Code
02A13 IPMEDPRO 52 PASS
02A13 IPMEDPRO 1402 PASS
02A13 IPMEDPRO 1371 PASS
02A13 IPMEDPRO 1383 FAIL
02A13 IPMEDPRO 1379 FAIL 2805
02A13 IPMEDPRO 1506 ABORT 2100
02A13 IPMEDPRO 1511 PASS
02A13 IPMEDPRO 1405 PASS
02A13 IPMEDPRO 1629 PASS
02A13 IPMEDPRO 1652 FAIL 16745
02A13 IPMEDPRO 1630 ABORT 1115
02A13 IPMEDPRO 1680 PASS
02A1301 MEDPROPT 1382 PASS
02A1301 MEDPROPT 1380 ABORT
02A1301 MEDPROPT 1407 ABORT 1
02A1302 MEDPROPT 1382 PASS
02A1302 MEDPROPT 1380 ABORT
02A1302 MEDPROPT 1407 ABORT 1
No change. This is the second time that this alarm has been thrown in the last 30 days. Last time, we swapped out the TN2602AP with an identical spare, and the replacement immediately came in-service. The error recurring leads me to believe that the problem is not a fault of the circuit-pack, but instead another component that may have failed.
When troubleshooting problems like these, I generally work in a progression: first check for administrative/configuration issues, then use software diagnostics (e.g., busyout, test board), then bypass/replace/mitigate hardware in something resembling an order of likelihood of failure. In this case, here is the hardware in order of my perceived likelihood of failure:
1. Cable between far-end IP switchport and CrossFire Adapter
2. The board itself
3. The CrossFire Adapter (connects the ethernet cable to the G650)
4. IP switchport
5. Port-network/G650 itself (TDM bus, internal cabling, etc)
Except that we’ll strike #2 from this list, because it was recently put into service as a result of a failure appearing to be identical to this one.
Since that equipment is at a remote site with nobody currently there, and an interruption at that site will not cause a noteworthy outage, let’s see what happens if we test the TDM bus. ==A TDM TEST WILL CAUSE A SERVICE INTERRUPTION FOR EVERYTHING IN THE PORT-NETWORK.==
test tdm port-network 2 Please Wait
TEST RESULTS
Port Mtce Name Alt. Name Test No. Result Error Code
PN 02A TDM-BUS 294 PASS
PN 02A TDM-BUS 296 PASS
PN 02A TDM-BUS 297 ABORT 1005
PN 02B TDM-BUS 294 PASS
PN 02B TDM-BUS 296 ABORT 1005
PN 02B TDM-BUS 297 PASS
The TDM bus passed every test that ran (two tests aborted with error code 1005), but I noticed that my station’s alarm lamp for MINOR shut off. Let’s check that:
display alarms
ALARM REPORT
Port Mtce On Alt Alarm Svc Ack? Date Date
Name Brd? Name Type State 1 2 Alarmed Resolved
02A13 IPMEDPRO n WARNING 02/06/19:38 00/00/00:00
02A13 IPMEDPRO n WARNING 02/06/19:38 00/00/00:00
02A1302 MEDPROPT y WARNING OUT 02/06/19:38 00/00/00:00
02A1301 MEDPROPT y WARNING OUT 02/06/19:38 00/00/00:00
02A13 IPMEDPRO n WARNING 02/06/19:38 00/00/00:00
test board 2a13
TEST RESULTS
Port Mtce Name Alt. Name Test No. Result Error Code
02A13 IPMEDPRO 52 PASS
02A13 IPMEDPRO 1371 PASS
02A13 IPMEDPRO 1383 FAIL
02A13 IPMEDPRO 1379 FAIL 2805
02A13 IPMEDPRO 1505 ABORT 2806
02A13 IPMEDPRO 1511 PASS
02A13 IPMEDPRO 1405 PASS
02A13 IPMEDPRO 1629 PASS
02A13 IPMEDPRO 1630 ABORT 1115
02A13 IPMEDPRO 1680 PASS
02A1301 MEDPROPT 1407 ABORT 1
02A1302 MEDPROPT 1407 ABORT 1
Nope, the fault still exists. The switch merely reclassified it.
The next least-effort troubleshooting step is to replace the ethernet cable between the IP switchport and the CrossFire Adapter.
At this point, I went to the site housing port-network 2.
- Observed no link-light on IP switch.
- Observed red LED at top of 2a13.
Re-seated board 2a13.
- Observed no link-light on IP switch.
- Observed red LED at top of 2a13.
Replaced ethernet cable between CrossFire Adapter and IP switchport.
- Observed no link-light on IP switch.
- Observed red LED at top of 2a13.
Re-checked IP switchport configuration. It’s on a Cisco Catalyst. (I also did this prior to coming on-site, but forgot to mention it.)
interface FastEthernet1/0/5
description CONNECTION TO AVAYA G650 2a13
switchport access vlan 2
speed 100
duplex full
mls qos trust dscp
spanning-tree portfast
The switchport is correctly configured, per the above display ip-interface 02a13 output.
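That comparison between the CM Ethernet options and the switchport can be sketched in Python as follows. It assumes that an IOS interface with no explicit `speed` or `duplex` line is auto-negotiating, and that a CM "Auto? n" setting should be matched by hard-coded speed and duplex on the switch; all function names are hypothetical.

```python
# Ethernet options for slot 02A13, from page 2 of "display ip-interface 02a13".
CM_ETHERNET_OPTIONS = {"auto": "n", "speed": "100Mbps", "duplex": "Full"}

# Switchport stanza copied from the Catalyst configuration above.
IOS_STANZA = """\
interface FastEthernet1/0/5
 description CONNECTION TO AVAYA G650 2a13
 switchport access vlan 2
 speed 100
 duplex full
 mls qos trust dscp
 spanning-tree portfast
"""


def switchport_settings(stanza):
    """Read speed/duplex from a stanza; assume 'auto' when a line is absent."""
    settings = {"speed": "auto", "duplex": "auto"}
    for line in stanza.splitlines():
        words = line.split()
        if len(words) == 2 and words[0] in settings:
            settings[words[0]] = words[1].lower()
    return settings


def expected_from_cm(cm):
    """Translate the CM form values into the settings the switchport should have."""
    if cm["auto"].lower() == "y":
        return {"speed": "auto", "duplex": "auto"}
    return {
        "speed": cm["speed"].lower().replace("mbps", ""),
        "duplex": cm["duplex"].lower(),
    }


def mismatches(cm, stanza):
    """Return (setting, wanted, found) tuples for every disagreement."""
    want = expected_from_cm(cm)
    have = switchport_settings(stanza)
    return [(key, want[key], have[key]) for key in want if want[key] != have[key]]


if __name__ == "__main__":
    print(mismatches(CM_ETHERNET_OPTIONS, IOS_STANZA) or "speed/duplex match")
```

An empty mismatch list only says that speed and duplex agree on paper; it says nothing about whether the physical port works.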
Let’s try bouncing the IP switchport:
switch(config)#int fa1/0/5
switch(config-if)#shutdown
switch(config-if)#no shutdown
switch(config-if)#exit
switch(config)#exit
- Observed no link-light on IP switch.
- Observed red LED at top of 2a13.
At this point, I’ve replaced the ethernet cable, and re-seated the board. The next thing in my ordered list of potentially failed hardware is the CrossFire Adapter. I removed the existing CrossFire Adapter, and replaced it with a spare.
- Observed no link-light on IP switch.
- Observed red LED at top of 2a13.
test board 2a13
TEST RESULTS
Port Mtce Name Alt. Name Test No. Result Error Code
02A13 IPMEDPRO 52 PASS
02A13 IPMEDPRO 1371 PASS
02A13 IPMEDPRO 1383 FAIL
02A13 IPMEDPRO 1379 FAIL 2805
02A13 IPMEDPRO 1505 ABORT 2806
02A13 IPMEDPRO 1511 PASS
02A13 IPMEDPRO 1405 PASS
02A13 IPMEDPRO 1629 PASS
02A13 IPMEDPRO 1630 ABORT 1115
02A13 IPMEDPRO 1680 PASS
02A1301 MEDPROPT 1407 ABORT 1
02A1302 MEDPROPT 1407 ABORT 1
That also didn’t help.
The next item on the list to check is the IP switchport. I’m using a Catalyst in this case; the switchports are not easily swappable. Instead, I’ll configure a nearby port such that I can move the cable onto it:
Changing IP switchport from 1/0/5 to 1/0/32
interface FastEthernet1/0/32
description TESTING BOARD 02A13
switchport access vlan 272
speed 100
duplex full
mls qos trust dscp
spanning-tree portfast
Moved cable from Fa1/0/5 to Fa1/0/32.
Feb 6 20:30:31.228: %LINK-3-UPDOWN: Interface FastEthernet1/0/32, changed state to up
Feb 6 20:30:32.234: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0/32, changed state to up
- Observed immediate link light on the IP switch
- Observed green lights blinking on board 2a13
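The two syslog messages above are what confirm link on the new port. If you collect switch logs centrally, a small parser like this sketch can pull the per-interface state out of them. It only recognizes the two message formats shown above, and `interface_states` is a hypothetical helper name.

```python
import re

# Patterns for the two message formats shown in the log lines above.
LINK = re.compile(r"%LINK-3-UPDOWN: Interface ([^,\s]+), changed state to (\w+)")
PROTO = re.compile(
    r"%LINEPROTO-5-UPDOWN: Line protocol on Interface ([^,\s]+), changed state to (\w+)"
)

LOG = """\
Feb 6 20:30:31.228: %LINK-3-UPDOWN: Interface FastEthernet1/0/32, changed state to up
Feb 6 20:30:32.234: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0/32, changed state to up
"""


def interface_states(log_text):
    """Return {interface: {"link": state, "protocol": state}} using the last message seen."""
    states = {}
    for line in log_text.splitlines():
        for kind, pattern in (("link", LINK), ("protocol", PROTO)):
            match = pattern.search(line)
            if match:
                interface, state = match.groups()
                states.setdefault(interface, {})[kind] = state
    return states


if __name__ == "__main__":
    print(interface_states(LOG))
```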
display errors
HARDWARE ERROR REPORT - ACTIVE ALARMS
Port Mtce Alt Err Aux First/Last Err Err Rt/ Al Ac
Name Name Type Data Occurrence Cnt Rt Hr St
test board 2a13
TEST RESULTS
Port Mtce Name Alt. Name Test No. Result Error Code
02A13 IPMEDPRO 52 PASS
02A13 IPMEDPRO 1371 PASS
02A13 IPMEDPRO 1383 PASS
02A13 IPMEDPRO 1379 PASS
02A13 IPMEDPRO 1505 ABORT 2806
02A13 IPMEDPRO 1511 PASS
02A13 IPMEDPRO 1405 PASS
02A13 IPMEDPRO 1629 PASS
02A13 IPMEDPRO 1630 PASS
02A13 IPMEDPRO 1680 PASS
02A1301 MEDPROPT 1407 PASS
02A1302 MEDPROPT 1407 PASS
The board immediately came in service, the errors and alarms immediately cleared, and the board passed the test that I demanded.
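To see exactly which tests changed once the cable was moved, the last failing run and the final run can be compared. The Python sketch below hard-codes the results from those two `test board 2a13` captures, keyed by (port, test number); `changed_results` is my own helper name.

```python
# Results from the run after swapping the CrossFire Adapter (still failing).
BEFORE = {
    ("02A13", 52): "PASS", ("02A13", 1371): "PASS", ("02A13", 1383): "FAIL",
    ("02A13", 1379): "FAIL", ("02A13", 1505): "ABORT", ("02A13", 1511): "PASS",
    ("02A13", 1405): "PASS", ("02A13", 1629): "PASS", ("02A13", 1630): "ABORT",
    ("02A13", 1680): "PASS", ("02A1301", 1407): "ABORT", ("02A1302", 1407): "ABORT",
}

# Results from the run after moving the cable to Fa1/0/32.
AFTER = {
    ("02A13", 52): "PASS", ("02A13", 1371): "PASS", ("02A13", 1383): "PASS",
    ("02A13", 1379): "PASS", ("02A13", 1505): "ABORT", ("02A13", 1511): "PASS",
    ("02A13", 1405): "PASS", ("02A13", 1629): "PASS", ("02A13", 1630): "PASS",
    ("02A13", 1680): "PASS", ("02A1301", 1407): "PASS", ("02A1302", 1407): "PASS",
}


def changed_results(before, after):
    """Return {(port, test): (old, new)} for tests present in both runs that differ."""
    return {
        key: (before[key], after[key])
        for key in before
        if key in after and before[key] != after[key]
    }


if __name__ == "__main__":
    for (port, test), (old, new) in changed_results(BEFORE, AFTER).items():
        print(f"{port} test {test}: {old} -> {new}")
    still_bad = [key for key, result in AFTER.items() if result != "PASS"]
    print("not passing after the fix:", still_bad)
```

The comparison also shows that test 1505 aborts with error code 2806 in both runs, so that abort is unrelated to the switchport fault.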
The only action still needed was to make the appropriate changes to the Catalyst, so that my changes are reflected (and hopefully nobody tries to use Fa1/0/5 in the future):
interface FastEthernet1/0/5
description PHYSICAL FAULT. DO NOT USE.
shutdown
!
interface FastEthernet1/0/32
description CONNECTION TO AVAYA G650 02A13
switchport access vlan 272
speed 100
duplex full
mls qos trust dscp
spanning-tree portfast
!
I honestly have no idea whether or not this post will be useful to anyone – please let me know. This really didn’t take me any time to write (<20 minutes). I grab most of the above “screen captures” whenever I’m troubleshooting anyway, so that I have them to refer to immediately afterward. (I recommend that you do the same.)