I recently had an Avaya Local Survivable Processor (LSP) unregister. There was no indication why – it just went down. I was able to get to its web interface, and it had a minor alarm indicating that it was unregistered. I suppose I could have restarted the whole thing, but my tech talked me through this procedure, which worked great.
Please note that the LSP is typically registered and inactive. This LSP only activates if network connectivity is lost to the site. So if the home site is available through the WAN, it should look like this:
list survivable-processor

                      SURVIVABLE PROCESSORS
Record  Name/            Type  Reg  Act  Translations      Net
Number  IP Address                       Updated           Rgn
1       THE_LSP          LSP   y    n    2:02 12/30/2013   10
        10.2.10.110
        No V6 Entry
However, in my case it looked like this!
list survivable-processor

                      SURVIVABLE PROCESSORS
Record  Name/            Type  Reg  Act  Translations      Net
Number  IP Address                       Updated           Rgn
1       THE_LSP          LSP   n                           10
        10.2.10.110
        No V6 Entry
Reg showed “n”, and there was no entry for the active status or the last translation date. I was able to SSH to the LSP, and I confirmed the LSP wasn’t active. The media gateway was still connected to my home site. The LSP could ping the home site. Somehow, the LSP had simply lost its registration.
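If you want to check the Reg flag from a script rather than by eye, here is a minimal Python sketch. It assumes you have already captured the data row of the "list survivable-processor" output as text (how you capture it is up to you), that the fields appear in the order shown above (record number, name, type, Reg), and that the LSP name contains no spaces. The function name lsp_registered is my own, not anything Avaya provides.

```python
def lsp_registered(data_row: str) -> bool:
    """Return True if the Reg column of a survivable-processor row is 'y'.

    Assumption: the row looks like the ones in this post, e.g.
    '1 THE_LSP LSP y n 2:02 12/30/2013 10', with no spaces in the name.
    """
    fields = data_row.split()
    # Assumed field order: record number, name, type, Reg flag, ...
    if len(fields) < 4 or fields[3] not in ("y", "n"):
        raise ValueError(f"unexpected row layout: {data_row!r}")
    return fields[3] == "y"


if __name__ == "__main__":
    print(lsp_registered("1 THE_LSP LSP y n 2:02 12/30/2013 10"))  # healthy row from this post
    print(lsp_registered("1 THE_LSP LSP n 10"))                    # the unregistered row
```

The first call is the healthy case and the second is what I was seeing.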
So I confirmed all CM processes were up with the “statapp” command:
telecom@pbx-cm01> statapp
Watchdog         9/ 9 UP SIMPLEX
TraceLogger      3/ 3 UP SIMPLEX
LicenseServer    2/ 2 UP SIMPLEX
SME              6/ 6 UP SIMPLEX
MasterAgent      1/ 1 UP SIMPLEX
MIB2Agent        1/ 1 UP SIMPLEX
MVSubAgent       1/ 1 UP SIMPLEX
LoadAgent        1/ 1 UP SIMPLEX
FPAgent          1/ 1 UP SIMPLEX
INADSAlarmAgent  1/ 1 UP SIMPLEX
GMM              4/ 4 UP SIMPLEX
SNMPManager      1/ 1 UP SIMPLEX
filesyncd        1/ 1 UP SIMPLEX
MCD              1/ 1 UP SIMPLEX
CommunicaMgr    86/86 UP SIMPLEX
telecom@pbx-cm01>
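With this many lines it is easy to miss one that isn't fully up, so here is a rough Python sketch that scans saved statapp output and reports any process whose running count doesn't match the expected count or whose state isn't UP. It assumes the line layout shown above (name, running/expected, state). The "MCD 0/ 1 DOWN" line in the sample is made up for illustration and is not real output from my system, and the function name is my own.

```python
import re

# Matches lines such as "Watchdog 9/ 9 UP SIMPLEX" (layout assumed from this post).
STATAPP_LINE = re.compile(r"^\s*(\S+)\s+(\d+)\s*/\s*(\d+)\s+(\S+)")


def unhealthy_processes(statapp_output: str) -> list[str]:
    """Return the names of processes that are not fully UP."""
    bad = []
    for line in statapp_output.splitlines():
        m = STATAPP_LINE.match(line)
        if not m:
            continue  # skip shell prompts and blank lines
        name, state = m.group(1), m.group(4)
        running, expected = int(m.group(2)), int(m.group(3))
        if running != expected or state != "UP":
            bad.append(name)
    return bad


# The MCD line is a hypothetical failure, invented for this example.
SAMPLE = """telecom@pbx-cm01> statapp
Watchdog         9/ 9 UP SIMPLEX
MCD              0/ 1 DOWN SIMPLEX
CommunicaMgr    86/86 UP SIMPLEX
telecom@pbx-cm01>"""

if __name__ == "__main__":
    print(unhealthy_processes(SAMPLE))
```

An empty list means every process line it recognized was fully up, which is what I had here.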
Sure enough, all processes were running fine. So I stopped CM with the command
stop -afcn
This will stop the processes listed above, refreshing the screen as it goes. It’s pretty nice. When all processes have stopped, you can start CM again with
start -ac
Again, it refreshes the screen as the processes start back up. When it’s done, the LSP should re-register. It’s also a good idea to watch the home CM for this registration event; you should see it about 30 seconds after CM comes back up on the LSP. On the home CM, run
list trace ras ip-a 10.2.10.110
while the LSP processes restart.
Here is the registration event:
list trace ras ip-address 10.2.10.110

LIST TRACE
time      data
17:45:39  TRACE STARTED 12/30/2013 CM Release String cold-03.0.124.0-21166
17:46:37  rcv RRQ ext endpt [10.2.10.110]:1719 switch [10.1.10.110]:1719
17:46:52  rcv KARRQ ext endpt [10.2.10.110]:1719 switch [10.1.10.110]:1719
17:47:07  rcv KARRQ ext endpt [10.2.10.110]:1719 switch [10.1.10.110]:1719
17:47:20  RAS TRACE COMPLETE ext
The RRQ is the LSP’s registration request, and the KARRQ messages that follow it are the keepalives.
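To see the keepalive spacing without doing the arithmetic by hand, here is a small Python sketch that pulls the RRQ and KARRQ lines out of a saved trace and reports the seconds between them. It assumes each line starts with an HH:MM:SS timestamp followed by "rcv RRQ" or "rcv KARRQ", as in my trace, and it does not handle a trace that crosses midnight. The function names are my own.

```python
import re
from datetime import datetime

# Matches lines such as "17:46:37 rcv RRQ ext endpt ..." (layout assumed from this post).
RAS_LINE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2})\s+rcv\s+(RRQ|KARRQ)\b")


def ras_events(trace_output: str) -> list[tuple[datetime, str]]:
    """Return (timestamp, message type) for each RRQ/KARRQ line in the trace."""
    events = []
    for line in trace_output.splitlines():
        m = RAS_LINE.match(line)
        if m:
            events.append((datetime.strptime(m.group(1), "%H:%M:%S"), m.group(2)))
    return events


def seconds_between(events: list[tuple[datetime, str]]) -> list[int]:
    """Return the gap in seconds between each pair of consecutive events."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(events, events[1:])]


TRACE = """17:45:39 TRACE STARTED 12/30/2013 CM Release String cold-03.0.124.0-21166
17:46:37 rcv RRQ ext endpt [10.2.10.110]:1719 switch [10.1.10.110]:1719
17:46:52 rcv KARRQ ext endpt [10.2.10.110]:1719 switch [10.1.10.110]:1719
17:47:07 rcv KARRQ ext endpt [10.2.10.110]:1719 switch [10.1.10.110]:1719
17:47:20 RAS TRACE COMPLETE ext"""

if __name__ == "__main__":
    events = ras_events(TRACE)
    print([kind for _, kind in events])  # ['RRQ', 'KARRQ', 'KARRQ']
    print(seconds_between(events))       # [15, 15]
```

In my trace the keepalives arrive 15 seconds apart, so a steady run of KARRQ lines at that spacing is a good sign the registration is holding.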
After this restart, the LSP was registered fine. No service impact at the site, as the LSP is inactive unless the WAN goes down. And much less scary (for me anyway) than restarting the whole LSP.
Happy New Year everyone!