In my last EEM post I provided a simple means to change an IP address and default route of a Cisco router using a script that makes the change without requiring interactive user input. This is helpful if you are remotely changing a device’s WAN/Internet IP and waiting for some on-site hands to move a cable over to a new ISP or WAN SP connection. That first script, however, would make the change and then exit. What would happen if the new Internet connection had a problem, or the on-site help couldn’t move the cable for some reason? Proper testing and preparation should help you avoid most of those issues but you just never know.
One way to deal with this possibility is to issue a “reload in 10” before kicking off the EEM change script. If the change can’t be completed, the router will reboot back to its previous configuration. That’s fine, but I like to avoid a full reboot whenever possible, and “reload in” has always been a rather clunky rollback mechanism.
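For reference, that safety net is only two commands. This is a minimal sketch of the pattern, not output captured from the lab; the 10 minute value is just the one mentioned above. IOS asks for confirmation when you schedule the reload, and if it offers to save a modified configuration you would want to decline, since the rollback depends on the startup configuration still holding the old settings.

```
! Schedule a reboot 10 minutes out before touching anything
R1#reload in 10
! ... make the change, e.g. "event manager run IP-CHANGE" ...
! Once the new connection is verified, cancel the pending reboot
R1#reload cancel
```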
Another idea, mentioned by Jody Lemoine in the comments of the last post, is to take advantage of the newer configuration archive and rollback features. While my original version of this improved script was more manual in nature, I really liked Jody’s idea and decided to incorporate it to demonstrate the rollback feature as well. The only real catch is that this version of the script will clobber any existing archive “path” value, so if you use it in an environment that already relies on the configuration archive feature, you may want to note the archive path beforehand and restore it when you’re done. I probably could have worked that into the script as well, but I was short on time to lab this improved-improved version, so we’ll just go with it.
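Noting and restoring the archive path by hand could look roughly like the following. It is a sketch that assumes your IOS release supports the "section" output filter, and "flash:/backup/r1-cfg" is a made-up placeholder for whatever path the first command shows on your device.

```
! Before running the applet: record the current archive settings
R1#show running-config | section archive

! After the change is confirmed: put the original path back
! (flash:/backup/r1-cfg is a hypothetical example value)
R1#configure terminal
R1(config)#archive
R1(config-archive)#path flash:/backup/r1-cfg
R1(config-archive)#end
```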
Here’s the new script:
event manager applet IP-CHANGE
 event none sync no default 90 maxrun 300
 action 0001 comment -----------------------------------
 action 0005 set _filesys "disk0:"
 action 0010 set _iface "fast0/0"
 action 0020 set _old_gw "192.0.2.2"
 action 0030 set _new_ip_mask "12.12.12.1 255.255.255.0"
 action 0040 set _new_gw "12.12.12.2"
 action 0050 set _ping_target "4.2.2.2"
 action 0060 set _max_tries "3"
 action 0099 comment -----------------------------------
 action 0100 syslog priority alerts msg "IP change 30 second countdown initiated, change $_iface to $_new_ip_mask"
 action 0199 wait 30
 action 0200 syslog priority alerts msg "Preparing for IP address change"
 action 0210 cli command "enable"
 action 0220 cli command "config term"
 action 0230 cli command "archive"
 action 0240 cli command "path $_filesys:/arc.cfg"
 action 0250 cli command "end"
 action 0300 syslog priority alerts msg "Proceeding with IP address change"
 action 0400 cli command "enable"
 action 0410 cli command "config term revert timer 5"
 action 0420 cli command "interface $_iface"
 action 0430 cli command "ip addr $_new_ip_mask"
 action 0440 cli command "no ip route 0.0.0.0 0.0.0.0 $_old_gw"
 action 0450 cli command "ip route 0.0.0.0 0.0.0.0 $_new_gw"
 action 0460 cli command "end"
 action 0500 syslog priority notifications msg "IP change complete. $_iface is $_new_ip_mask, gateway is $_new_gw."
 action 0600 set _num_tries "1"
 action 0700 while $_num_tries le $_max_tries
 action 0710 syslog priority notifications msg "Waiting 60 seconds for cabling change"
 action 0720 wait 60
 action 0730 syslog priority notifications msg "Initiating connectivity check to $_ping_target."
 action 0740 cli command "enable"
 action 0750 cli command "ping $_ping_target rep 10 time 1"
 action 0760 regexp "(\!\!\!\!)" "$_cli_result" _match
 action 0770 if $_regexp_result eq "1"
 action 0771 syslog priority alerts msg "Ping check succeeded! Confirm change and exit script."
 action 0772 cli command "config confirm"
 action 0775 exit
 action 0780 else
 action 0781 syslog priority alerts msg "Ping check #$_num_tries failed."
 action 0782 increment _num_tries 1
 action 0790 end
 action 0800 end
 action 0900 syslog priority alerts msg "All ping checks failed, reverting change"
 action 1000 cli command "enable"
 action 1010 cli command "config revert now"
 action 1100 syslog priority notifications msg "Reversion complete"
 action 9999 exit
The IP change itself looks very similar to the original version, with the main change being the use of variables that are all defined at the top of the script. This makes it easier to ensure you’re modifying all necessary parameters when reusing this script on different devices.
The other alterations are around the rollback and the connectivity check. To make sure the rollback will work, actions 200-250 enable the configuration archive feature and point it at a local file system. Then, when the change starts, action 410 enters configuration mode with a 5-minute reversion timer. If the configuration is not confirmed within 5 minutes of beginning this configuration session, the changes will be rolled back.
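Typed by hand, the same timed-rollback mechanism boils down to the commands below. This is a minimal sketch that assumes the archive path has already been configured as in actions 200-250; the full command names are shown, where the script uses the abbreviated “config” forms.

```
! Start a configuration session that rolls back after 5 minutes
R1#configure terminal revert timer 5
! ... make the risky changes here ...
R1(config)#end

! Keep the changes (cancels the rollback timer):
R1#configure confirm

! Or abandon them immediately instead of waiting for the timer:
R1#configure revert now
```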
Following the change, actions 600-800 form a while loop with a nested conditional. We set a loop iteration counter (_num_tries), wait a minute for the cable swap, and then try pinging a predefined target. Action 750 runs the ping, and action 760 does a regular expression match for four consecutive !’s against the result of the preceding CLI command. If the RegEx test succeeded (action 770), indicating successful pings, we automatically issue the “config confirm” command to lock in the change and exit. Otherwise (action 780) we log a message, increment the attempt counter, and go back to the top of the while loop to wait and then try the ping again. The net effect is that the router tries 3 times over a little more than 3 minutes to ping a target following the change, which gives the on-site hands plenty of time to swap cables around. If all three attempts fail, then starting at action 900 we revert the change using the “config revert now” command. Even if that fails, or the script bombs partway through, the 5 minute timer we put on the config session in the first place will revert the config a few minutes later.
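The match in action 760 only looks for four consecutive !’s anywhere in the ping output. As a variation on the same idea, the regexp action can also capture the success percentage that IOS prints at the end of the ping, so the test compares against a threshold instead. The following is an untested sketch, not what the script above does: the extra action labels, the _ok and _pct variable names, and the 80 percent threshold are all assumptions chosen for illustration.

```
 action 0750 cli command "ping $_ping_target rep 10 time 1"
 ! Assume failure until the output proves otherwise
 action 0755 set _ok "0"
 ! Capture the number from "Success rate is NN percent" into _pct
 action 0760 regexp "Success rate is ([0-9]+) percent" "$_cli_result" _match _pct
 action 0762 if $_regexp_result eq "1"
 action 0764 if $_pct ge "80"
 action 0766 set _ok "1"
 action 0767 end
 action 0768 end
 ! The original success/failure branches then key off _ok
 action 0770 if $_ok eq "1"
```

Actions 0771 through 0800 would stay as they are in the script; only the condition that feeds the if statement changes.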
Let’s see it in action! We’re using the same lab topology and change as before:
First, a failed change where we never establish connectivity at the new IP.
From R3, we reach R1 and initiate the change:
R3#telnet 192.0.2.1
Trying 192.0.2.1 ... Open
R1#event man run IP-CHANGE
R1#exit
[Connection to 192.0.2.1 closed by foreign host]
R3#
Here are the log messages from R1’s console showing what happened:
R1#
*Apr 27 14:23:15.400: %HA_EM-1-LOG: IP-CHANGE: IP change 30 second countdown initiated, change fast0/0 to 12.12.12.1 255.255.255.0
*Apr 27 14:23:45.404: %HA_EM-1-LOG: IP-CHANGE: Preparing for IP address change
*Apr 27 14:23:45.524: %SYS-5-CONFIG_I: Configured from console by on vty0 (EEM:IP-CHANGE)
*Apr 27 14:23:45.560: %HA_EM-1-LOG: IP-CHANGE: Proceeding with IP address change
*Apr 27 14:23:47.704: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_BACKUP: Backing up current running config to disk0:/arc.cfg-Apr-27-14-23-45.600-3
*Apr 27 14:23:47.704: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_START_ABSTIMER: User: : Scheduled to rollback to config disk0:/arc.cfg-Apr-27-14-23-45.600-3 in 5 minutes
*Apr 27 14:23:48.100: %SYS-5-CONFIG_I: Configured from console by on vty0 (EEM:IP-CHANGE)
*Apr 27 14:23:48.204: %HA_EM-5-LOG: IP-CHANGE: IP change complete. fast0/0 is 12.12.12.1 255.255.255.0, gateway is 12.12.12.2.
*Apr 27 14:23:48.360: %HA_EM-5-LOG: IP-CHANGE: Waiting 60 seconds for cabling change
*Apr 27 14:24:48.204: %HA_EM-5-LOG: IP-CHANGE: Initiating connectivity check to 4.2.2.2.
*Apr 27 14:24:58.360: %HA_EM-1-LOG: IP-CHANGE: Ping check #1 failed.
*Apr 27 14:24:58.556: %HA_EM-5-LOG: IP-CHANGE: Waiting 60 seconds for cabling change
*Apr 27 14:25:58.368: %HA_EM-5-LOG: IP-CHANGE: Initiating connectivity check to 4.2.2.2.
*Apr 27 14:26:08.556: %HA_EM-1-LOG: IP-CHANGE: Ping check #2 failed.
*Apr 27 14:26:08.712: %HA_EM-5-LOG: IP-CHANGE: Waiting 60 seconds for cabling change
*Apr 27 14:27:08.560: %HA_EM-5-LOG: IP-CHANGE: Initiating connectivity check to 4.2.2.2.
*Apr 27 14:27:18.712: %HA_EM-1-LOG: IP-CHANGE: Ping check #3 failed.
*Apr 27 14:27:18.720: %HA_EM-1-LOG: IP-CHANGE: All ping checks failed, reverting change
*Apr 27 14:27:18.740: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_REVERTNOW: User: : Rollback immediately.
*Apr 27 14:27:18.744: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_ROLLBACK_START: Start rolling to: disk0:/arc.cfg-Apr-27-14-23-45.600-3
*Apr 27 14:27:18.760: Rollback:Acquired Configuration lock.
*Apr 27 14:27:22.500: %HA_EM-5-LOG: IP-CHANGE: Reversion complete
You can see the ping checks that failed, eventually resulting in the script cutting back with the “config revert now” function.
Now, a successful change:
R3#telnet 192.0.2.1
Trying 192.0.2.1 ... Open
R1#event man run IP-CHANGE
R1#show event man policy active
Key: p - Priority        :L - Low, H - High, N - Normal, Z - Last
     s - Scheduling node :A - Active, S - Standby
default class - 1 applet event
no.  job id  p  s  status   time of event             event type  name
1    4       N  A  running  Sun Apr27 14:31:31 2014   none        IP-CHANGE
R1#exit
R1’s logs show the success:
R1#
*Apr 27 14:31:31.500: %HA_EM-1-LOG: IP-CHANGE: IP change 30 second countdown initiated, change fast0/0 to 12.12.12.1 255.255.255.0
*Apr 27 14:32:01.500: %HA_EM-1-LOG: IP-CHANGE: Preparing for IP address change
*Apr 27 14:32:01.664: %SYS-5-CONFIG_I: Configured from console by on vty0 (EEM:IP-CHANGE)
*Apr 27 14:32:01.700: %HA_EM-1-LOG: IP-CHANGE: Proceeding with IP address change
*Apr 27 14:32:03.872: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_BACKUP: Backing up current running config to disk0:/arc.cfg-Apr-27-14-32-01.740-5
*Apr 27 14:32:03.876: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_START_ABSTIMER: User: : Scheduled to rollback to config disk0:/arc.cfg-Apr-27-14-32-01.740-5 in 5 minutes
*Apr 27 14:32:04.188: %SYS-5-CONFIG_I: Configured from console by on vty0 (EEM:IP-CHANGE)
*Apr 27 14:32:04.200: %HA_EM-5-LOG: IP-CHANGE: IP change complete. fast0/0 is 12.12.12.1 255.255.255.0, gateway is 12.12.12.2.
*Apr 27 14:32:04.360: %HA_EM-5-LOG: IP-CHANGE: Waiting 60 seconds for cabling change
*Apr 27 14:33:04.204: %HA_EM-5-LOG: IP-CHANGE: Initiating connectivity check to 4.2.2.2.
*Apr 27 14:33:04.392: %HA_EM-1-LOG: IP-CHANGE: Ping check succeeded! Confirm change and exit script.
*Apr 27 14:33:04.400: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_CONFIRM: User: : Confirm the configuration change
And after confirming a ping to the new IP from R3, I re-established the telnet session.
R3#ping 12.12.12.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 12.12.12.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/8/8 ms
R3#telnet 12.12.12.1
Trying 12.12.12.1 ... Open
R1#exit
[Connection to 12.12.12.1 closed by foreign host]
Now, in all reality, you would probably bag everything after action 500 in the script and rely on the config archive/rollback to handle the rollback on the 5 minute timer. There’s also something to be said for requiring the engineer him/herself to actually be the one to confirm and lock in the change. I let the script do that automatically just to prove that it could be done and to demonstrate loop control within EEM including a while loop with a nested conditional.
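A trimmed-down ending along those lines might look like the sketch below. Actions 0001 through 0500 stay as they are, and everything from 0600 through 1100 is simply left out. The 0510 label and its reminder message are assumptions added for illustration; they are not part of the lab-tested script.

```
 action 0500 syslog priority notifications msg "IP change complete. $_iface is $_new_ip_mask, gateway is $_new_gw."
 ! Hypothetical reminder for whoever is watching the console
 action 0510 syslog priority alerts msg "Change will revert in 5 minutes unless confirmed"
 action 9999 exit
```

The engineer then reconnects at the new address and locks in the change by hand before the timer expires:

```
R3#telnet 12.12.12.1
Trying 12.12.12.1 ... Open
R1#configure confirm
```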
EEM is a powerful tool. When combined with IOS.sh and the embedded Tcl shell, you can do even more. I hope that in some upcoming blog posts, I’ll explore these features further. Network automation is getting lots of attention right now, and while not everything is a perfect use case for automation tools, this series of posts shows that even simple changes can be addressed using simple automation techniques. If you’ve used EEM or other IOS scripting/automation tools to do cool stuff, let me know in a comment.