Using EEM to Remotely Change a WAN IP – Part 2

In my last EEM post I provided a simple means to change the IP address and default route of a Cisco router using a script that makes the change without requiring interactive user input. This is helpful if you are remotely changing a device’s WAN/Internet IP and waiting for on-site hands to move a cable over to a new ISP or WAN SP connection. That first script, however, would make the change and then exit. What would happen if the new Internet connection had a problem, or the on-site help couldn’t move the cable for some reason? Proper testing and preparation should help you avoid most of those issues, but you just never know.

One way to deal with this possibility is to issue a “reload in 10” before kicking off the EEM change script. If the change can’t be completed, the router will reboot back to its previous configuration. That’s fine, but I like to avoid a full reboot whenever possible, and “reload in” has always been a rather clunky rollback mechanism.
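For reference, the “reload in” safety net looks roughly like the sketch below. It only works as a rollback if the change is never saved to the startup configuration before the reload fires, so answer “no” if IOS offers to save a modified configuration. The applet name is the one used later in this post; adjust it to whatever your change script is called.

```
! Schedule a reload 10 minutes out (IOS asks you to confirm)
R1#reload in 10
! Kick off the change script
R1#event manager run IP-CHANGE
! Once you are back in on the new IP and happy with the change,
! call off the pending reload and only then save the config
R1#reload cancel
R1#copy running-config startup-config
```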

Another idea, mentioned by Jody Lemoine in the comments on the last post, is to take advantage of the newer configuration archive and rollback features. While my original version of this improved script was more manual in nature, I really liked Jody’s idea and decided to incorporate it to demonstrate the rollback feature as well. The only real difference is that the newer version of the script will clobber any existing archive “path” value, so if you use it in an environment that already relies on the configuration archive feature, you may want to note the existing archive path beforehand and restore it when you’re done. I probably could have worked that into the script as well, but I was short on time to lab this improved-improved version, so we’ll just go with it.
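If you do need to preserve an existing archive path, a manual before-and-after along these lines should do it. This is a sketch rather than something I labbed; the path placeholder stands for whatever your device shows, and the “section” output filter assumes a reasonably recent IOS release.

```
! Before running the applet: note the current archive settings
R1#show running-config | section archive
! After the change is confirmed: put the original path back
R1#configure terminal
R1(config)#archive
R1(config-archive)#path <original-path>
R1(config-archive)#end
```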

Here’s the new script:

event manager applet IP-CHANGE
 event none sync no default 90 maxrun 300
 action 0001 comment -----------------------------------
 action 0005 set _filesys "disk0:"
 action 0010 set _iface "fast0/0"
 action 0020 set _old_gw "192.0.2.2"
 action 0030 set _new_ip_mask "12.12.12.1 255.255.255.0"
 action 0040 set _new_gw "12.12.12.2"
 action 0050 set _ping_target "4.2.2.2"
 action 0060 set _max_tries "3"
 action 0099 comment -----------------------------------
 action 0100 syslog priority alerts msg "IP change 30 second countdown initiated, change $_iface to $_new_ip_mask"
 action 0199 wait 30
 action 0200 syslog priority alerts msg "Preparing for IP address change"
 action 0210 cli command "enable"
 action 0220 cli command "config term"
 action 0230 cli command "archive"
 action 0240 cli command "path $_filesys/arc.cfg"
 action 0250 cli command "end"
 action 0300 syslog priority alerts msg "Proceeding with IP address change"
 action 0400 cli command "enable"
 action 0410 cli command "config term revert timer 5"
 action 0420 cli command "interface $_iface"
 action 0430 cli command "ip addr $_new_ip_mask"
 action 0440 cli command "no ip route 0.0.0.0 0.0.0.0 $_old_gw"
 action 0450 cli command "ip route 0.0.0.0 0.0.0.0 $_new_gw"
 action 0460 cli command "end"
 action 0500 syslog priority notifications msg "IP change complete. $_iface is $_new_ip_mask, gateway is $_new_gw."
 action 0600 set _num_tries "1"
 action 0700 while $_num_tries le $_max_tries
 action 0710  syslog priority notifications msg "Waiting 60 seconds for cabling change"
 action 0720  wait 60
 action 0730  syslog priority notifications msg "Initiating connectivity check to $_ping_target."
 action 0740  cli command "enable"
 action 0750  cli command "ping $_ping_target rep 10 time 1"
 action 0760  regexp "(\!\!\!\!)" "$_cli_result" _match
 action 0770  if $_regexp_result eq "1"
 action 0771   syslog priority alerts msg "Ping check succeeded! Confirm change and exit script."
 action 0772   cli command "config confirm"
 action 0775   exit
 action 0780  else
 action 0781   syslog priority alerts msg "Ping check #$_num_tries failed."
 action 0782   increment _num_tries 1
 action 0790  end
 action 0800 end
 action 0900 syslog priority alerts msg "All ping checks failed, reverting change"
 action 1000 cli command "enable"
 action 1010 cli command "config revert now"
 action 1100 syslog priority notifications msg "Reversion complete"
 action 9999 exit

The IP change itself looks very similar to the original version, with the main change being the use of variables that are all defined at the top of the script. This makes it easier to ensure you’re modifying all necessary parameters when reusing this script on different devices.

The other alterations concern the rollback and the connectivity check. To ensure the rollback will work, actions 200-250 make sure that the configuration archive feature is enabled and pointed at a local file system. Then, when the change starts, action 410 enters configuration mode with a 5-minute reversion timer. If the configuration is not confirmed within 5 minutes of beginning this configuration session, the changes will be rolled back.
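If you want to get a feel for the timed rollback before trusting it inside an applet, the same steps can be typed by hand. The sketch below mirrors what actions 200-460 do, using the interface and addressing from this lab; the archive path is written as it appears in the logs further down.

```
! One-time setup: the rollback feature needs an archive path
R1#configure terminal
R1(config)#archive
R1(config-archive)#path disk0:/arc.cfg
R1(config-archive)#end
! Start a config session that reverts itself after 5 minutes
R1#configure terminal revert timer 5
R1(config)#interface FastEthernet0/0
R1(config-if)#ip address 12.12.12.1 255.255.255.0
R1(config-if)#end
! Check the pending rollback, then either keep the change...
R1#show archive config rollback timer
R1#configure confirm
! ...or, instead of confirming, back it out immediately
R1#configure revert now
```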

Following the change, we have a set of nested conditionals in actions 600-800. Basically, we set ourselves a little loop iteration counter (_num_tries), and after waiting a minute for the cable swap we try pinging a predefined target. In action 750 we run a ping command, and action 760 then does a regular expression match for four consecutive !’s against the result of the preceding CLI command. If (action 770) the regex test succeeded (indicating successful pings), then we automatically issue the “config confirm” command to lock in the change. Otherwise (action 780) we log a message, increment the attempt counter, and go back to the top of the while loop to wait and then try the ping again. The net effect here is that the router tries 3 times over roughly three and a half minutes to ping a target following the change, since each attempt is a 60-second wait plus up to 10 seconds of ping timeouts. This gives the on-site hands plenty of time to swap cables around. If those three attempts all fail, then starting at action 900 we revert the change using the “config revert now” command. Even if the script bombs before it gets that far, the 5-minute timer we put on the config session in the first place will revert the config a few minutes later.
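The “!!!!” match passes as soon as any four consecutive replies come back. If you would rather gate on the overall success rate, one possible variation is to capture the percentage from the ping summary line and compare it to a threshold. Treat this as an untested sketch: the _pct and _ok variables and the 80 percent threshold are my own assumptions, not part of the script above, and these lines would stand in for actions 760-770.

```
 action 0759  comment Capture N from "Success rate is N percent"
 action 0760  regexp "Success rate is ([0-9]+) percent" "$_cli_result" _match _pct
 action 0762  set _ok "0"
 action 0764  if $_regexp_result eq "1"
 action 0765   if $_pct ge "80"
 action 0766    set _ok "1"
 action 0767   end
 action 0768  end
 action 0769  comment Actions 771-790 stay as they are
 action 0770  if $_ok eq "1"
```

The extra _ok flag keeps the original if/else block intact, so a ping that produces no summary line at all still falls through to the failure branch.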

Let’s see it in action! We’re using the same lab topology and change as before:

[Figure: ISP Change Topology]

First, a failed change where we never establish connectivity at the new IP.

From R3, we reach R1 and initiate the change:

R3#telnet 192.0.2.1
Trying 192.0.2.1 ... Open
R1#event man run IP-CHANGE
R1#exit

[Connection to 192.0.2.1 closed by foreign host]
R3#

Here are the log messages from R1’s console showing what happened:

R1#
*Apr 27 14:23:15.400: %HA_EM-1-LOG: IP-CHANGE: IP change 30 second countdown initiated, change fast0/0 to 12.12.12.1 255.255.255.0
*Apr 27 14:23:45.404: %HA_EM-1-LOG: IP-CHANGE: Preparing for IP address change
*Apr 27 14:23:45.524: %SYS-5-CONFIG_I: Configured from console by on vty0 (EEM:IP-CHANGE)
*Apr 27 14:23:45.560: %HA_EM-1-LOG: IP-CHANGE: Proceeding with IP address change
*Apr 27 14:23:47.704: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_BACKUP: Backing up current running config to disk0:/arc.cfg-Apr-27-14-23-45.600-3
*Apr 27 14:23:47.704: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_START_ABSTIMER: User: : Scheduled to rollback to config disk0:/arc.cfg-Apr-27-14-23-45.600-3 in 5 minutes
*Apr 27 14:23:48.100: %SYS-5-CONFIG_I: Configured from console by on vty0 (EEM:IP-CHANGE)
*Apr 27 14:23:48.204: %HA_EM-5-LOG: IP-CHANGE: IP change complete. fast0/0 is 12.12.12.1 255.255.255.0, gateway is 12.12.12.2.
*Apr 27 14:23:48.360: %HA_EM-5-LOG: IP-CHANGE: Waiting 60 seconds for cabling change
*Apr 27 14:24:48.204: %HA_EM-5-LOG: IP-CHANGE: Initiating connectivity check to 4.2.2.2.
*Apr 27 14:24:58.360: %HA_EM-1-LOG: IP-CHANGE: Ping check #1 failed.
*Apr 27 14:24:58.556: %HA_EM-5-LOG: IP-CHANGE: Waiting 60 seconds for cabling change
*Apr 27 14:25:58.368: %HA_EM-5-LOG: IP-CHANGE: Initiating connectivity check to 4.2.2.2.
*Apr 27 14:26:08.556: %HA_EM-1-LOG: IP-CHANGE: Ping check #2 failed.
*Apr 27 14:26:08.712: %HA_EM-5-LOG: IP-CHANGE: Waiting 60 seconds for cabling change
*Apr 27 14:27:08.560: %HA_EM-5-LOG: IP-CHANGE: Initiating connectivity check to 4.2.2.2.
*Apr 27 14:27:18.712: %HA_EM-1-LOG: IP-CHANGE: Ping check #3 failed.
*Apr 27 14:27:18.720: %HA_EM-1-LOG: IP-CHANGE: All ping checks failed, reverting change
*Apr 27 14:27:18.740: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_REVERTNOW: User: : Rollback immediately.
*Apr 27 14:27:18.744: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_ROLLBACK_START: Start rolling to: disk0:/arc.cfg-Apr-27-14-23-45.600-3
*Apr 27 14:27:18.760: Rollback:Acquired Configuration lock.
*Apr 27 14:27:22.500: %HA_EM-5-LOG: IP-CHANGE: Reversion complete

You can see the three failed ping checks, which eventually result in the script rolling the change back with the “config revert now” function.

Now, a successful change:

R3#telnet 192.0.2.1
Trying 192.0.2.1 ... Open
R1#event man run IP-CHANGE
R1#show event man policy active

Key: p - Priority        :L - Low, H - High, N - Normal, Z - Last
     s - Scheduling node :A - Active, S - Standby

default class - 1 applet event
 no.  job id      p s status  time of event             event type          name
 1    4           N A running Sun Apr27 14:31:31 2014   none                IP-CHANGE
R1#exit

R1’s logs show the success:

R1#
*Apr 27 14:31:31.500: %HA_EM-1-LOG: IP-CHANGE: IP change 30 second countdown initiated, change fast0/0 to 12.12.12.1 255.255.255.0
*Apr 27 14:32:01.500: %HA_EM-1-LOG: IP-CHANGE: Preparing for IP address change
*Apr 27 14:32:01.664: %SYS-5-CONFIG_I: Configured from console by  on vty0 (EEM:IP-CHANGE)
*Apr 27 14:32:01.700: %HA_EM-1-LOG: IP-CHANGE: Proceeding with IP address change
*Apr 27 14:32:03.872: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_BACKUP: Backing up current running config to disk0:/arc.cfg-Apr-27-14-32-01.740-5
*Apr 27 14:32:03.876: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_START_ABSTIMER: User: : Scheduled to rollback to config disk0:/arc.cfg-Apr-27-14-32-01.740-5 in 5 minutes
*Apr 27 14:32:04.188: %SYS-5-CONFIG_I: Configured from console by  on vty0 (EEM:IP-CHANGE)
*Apr 27 14:32:04.200: %HA_EM-5-LOG: IP-CHANGE: IP change complete. fast0/0 is 12.12.12.1 255.255.255.0, gateway is 12.12.12.2.
*Apr 27 14:32:04.360: %HA_EM-5-LOG: IP-CHANGE: Waiting 60 seconds for cabling change
*Apr 27 14:33:04.204: %HA_EM-5-LOG: IP-CHANGE: Initiating connectivity check to 4.2.2.2.
*Apr 27 14:33:04.392: %HA_EM-1-LOG: IP-CHANGE: Ping check succeeded! Confirm change and exit script.
*Apr 27 14:33:04.400: %ARCHIVE_DIFF-5-ROLLBK_CNFMD_CHG_CONFIRM: User: : Confirm the configuration change

And after confirming a ping to the new IP from R3, I re-established the Telnet session.

R3#ping 12.12.12.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 12.12.12.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/8/8 ms
R3#telnet 12.12.12.1
Trying 12.12.12.1 ... Open

R1#exit

[Connection to 12.12.12.1 closed by foreign host]

Now, in all reality, you would probably bag everything after action 500 in the script and rely on the config archive/rollback to handle the rollback on the 5 minute timer. There’s also something to be said for requiring the engineer him/herself to actually be the one to confirm and lock in the change. I let the script do that automatically just to prove that it could be done and to demonstrate loop control within EEM including a while loop with a nested conditional.
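As a rough sketch of that trimmed-down approach, the applet below keeps only the archive setup and the timed change, and leaves the confirmation to the engineer. It reuses the values and action numbering from the script above, with the archive path hardcoded as it appears in the logs. The applet name IP-CHANGE-MANUAL and the final syslog wording are made up for illustration, and I haven’t labbed this variant.

```
event manager applet IP-CHANGE-MANUAL
 event none sync no default 90 maxrun 300
 action 0010 set _iface "fast0/0"
 action 0020 set _old_gw "192.0.2.2"
 action 0030 set _new_ip_mask "12.12.12.1 255.255.255.0"
 action 0040 set _new_gw "12.12.12.2"
 action 0100 syslog priority alerts msg "IP change 30 second countdown initiated, change $_iface to $_new_ip_mask"
 action 0199 wait 30
 action 0210 cli command "enable"
 action 0220 cli command "config term"
 action 0230 cli command "archive"
 action 0240 cli command "path disk0:/arc.cfg"
 action 0250 cli command "end"
 action 0410 cli command "config term revert timer 5"
 action 0420 cli command "interface $_iface"
 action 0430 cli command "ip addr $_new_ip_mask"
 action 0440 cli command "no ip route 0.0.0.0 0.0.0.0 $_old_gw"
 action 0450 cli command "ip route 0.0.0.0 0.0.0.0 $_new_gw"
 action 0460 cli command "end"
 action 0500 syslog priority alerts msg "IP change applied. Run 'configure confirm' within 5 minutes to keep it."
```

With this version, you reconnect to the router on its new address and issue “configure confirm” yourself. If you can’t get back in, the 5-minute timer rolls everything back without any further help from the script.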

EEM is a powerful tool. When combined with IOS.sh and the embedded Tcl shell, you can do even more. I hope that in some upcoming blog posts, I’ll explore these features further. Network automation is getting lots of attention right now, and while not everything is a perfect use case for automation tools, this series of posts shows that even simple changes can be addressed using simple automation techniques. If you’ve used EEM or other IOS scripting/automation tools to do cool stuff, let me know in a comment.
