Recently, during a very carefully planned, meticulously scripted change, I hit a bump in the road. It ended up not being a total show-stopper, but it did derail the change by about 30 minutes while we figured out a work-around. I think this kind of thing has probably happened to anyone who has been working on networks for a while. Invariably you get through (or abort!) whatever you’re working on and the following morning you start digging into what you hit. That’s when you discover that one command that would have saved your bacon last night. I call this the Morning-After command.
The Night Before
In my example from the other night, we were trying to build a new layer 3 port-channel between a pair of 6500s. The goal was to move the IBGP connection for my customer (a hosting facility/hosted VoIP provider) off a VLAN that had recently blown up and onto a dedicated routed link. In the planning process I had worked out all the ports we would use for the new link. I wanted to spread the link over a couple of line cards to guard against a card failure. Since these 6500s are fairly sparsely populated at the moment, I only had a couple of blades to work from. One thing I didn’t even think to look at was the model of blade in each slot. During our change window, I configured 4 ports for my port-channel from blades 4 and 8, then we connected the cables. And this came across the console…
Mar 26 20:28:12.009 EDT: %EC-SP-5-CANNOT_BUNDLE2: Gi4/47 is not compatible with Gi8/47 and will be suspended (qos-card types of Gi4/47 do not match Gi8/47)
Although I didn’t check this beforehand as part of the planning process (clearly a miss on my part), I immediately knew what it was about. Line cards for the 6500 series differ in many hardware details, and one of these is the QoS queuing design. The data sheet for each blade lists its queuing model, such as “1p3q3t”, which signifies that the ports on that module support a single priority queue, 3 other traffic queues, and 3 discard thresholds per queue. If you don’t feel like looking up the data sheet, you can use the command “show interface <blah> capabilities” to see the queuing model(s) supported. Doing that on the ports in question from my recent change, we can readily see the mismatch (due to the different line cards):
6509-1#show int gi8/47 capabilities | i Model|QOS sched
  Model: WS-X6748-GE-TX
  QOS scheduling: rx-(2q8t), tx-(1p3q8t)
6509-1#show int gi4/47 capabilities | i Model|QOS sched
  Model: WS-X6548-GE-TX
  QOS scheduling: rx-(1q2t), tx-(1p2q2t)
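To make the notation concrete, here is a minimal Python sketch that splits a queuing-model string into its three counts. The function name `parse_queue_model` and the dictionary keys are my own illustrative choices, not anything Cisco provides, and the pattern only assumes the "NpNqNt" shape described above (with the priority part optional, as in "2q8t").

```python
import re

# Matches strings such as "1p3q8t" or "2q8t": an optional priority-queue
# count, then the count of other queues, then thresholds per queue.
QUEUE_MODEL = re.compile(r"^(?:(\d+)p)?(\d+)q(\d+)t$")


def parse_queue_model(model):
    """Split a queuing-model string into its three counts (hypothetical helper)."""
    match = QUEUE_MODEL.match(model.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized queuing model: {model!r}")
    priority, standard, thresholds = match.groups()
    return {
        "priority_queues": int(priority or 0),  # no "Np" prefix means none
        "standard_queues": int(standard),
        "thresholds_per_queue": int(thresholds),
    }


if __name__ == "__main__":
    # The four models that appear in the output above.
    for model in ("1p3q8t", "1p2q2t", "2q8t", "1q2t"):
        print(model, parse_queue_model(model))
```

Feeding it the two transmit models from my ports shows the difference at a glance: 3 standard queues with 8 thresholds each on one card versus 2 queues with 2 thresholds on the other.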
The more capable WS-X6748 card transmits with 1p3q8t queuing, while the older WS-X6548 is less flexible, with only 1p2q2t. Going into great detail on what these parameters really mean for QoS design is beyond the scope of this article (I recommend this Cisco Press book as a great source for learning about switch QoS). But why the complaint from the switch (especially for a logical port-channel interface)? On a switch, which is a very hardware-centric device, egress queuing is performed entirely by the hardware ASICs that drive the ports. Even when ports are bundled together in a port-channel, each member port handles queuing independently, because it relies on its own hardware buffers to queue frames during congestion rather than on a software queue as a software router would. And that’s why the switch refuses to bundle ports with unlike queuing models: two identical flows that happened to get hashed to different member ports might be queued differently, resulting in inconsistent treatment of like flows.
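This is exactly the check I should have run during planning. As a rough sketch, assuming you have already captured the “show interface capabilities” text for each planned member port by whatever means you normally use, a few lines of Python can group the ports by their “QOS scheduling” line and flag a mismatch. The helper names (`qos_scheduling`, `group_by_qos`, `bundle_is_consistent`) and the `CAPTURES` dictionary are hypothetical; the only thing taken from the switch is the line label seen in the output above.

```python
import re
from collections import defaultdict


def qos_scheduling(capabilities_text):
    """Return the value of the 'QOS scheduling:' line, or None if absent."""
    match = re.search(
        r"^\s*QOS scheduling:\s*(.+?)\s*$", capabilities_text, re.MULTILINE
    )
    return match.group(1) if match else None


def group_by_qos(captures):
    """Map each distinct QoS scheduling string to the ports that report it."""
    groups = defaultdict(list)
    for port, text in captures.items():
        groups[qos_scheduling(text)].append(port)
    return dict(groups)


def bundle_is_consistent(captures):
    """True only if every planned member port reports the same scheduling."""
    return len(group_by_qos(captures)) == 1


# Captured output for the two ports from the change (pasted in by hand here).
CAPTURES = {
    "Gi8/47": "Model: WS-X6748-GE-TX\nQOS scheduling: rx-(2q8t), tx-(1p3q8t)\n",
    "Gi4/47": "Model: WS-X6548-GE-TX\nQOS scheduling: rx-(1q2t), tx-(1p2q2t)\n",
}

if __name__ == "__main__":
    for scheduling, ports in group_by_qos(CAPTURES).items():
        print(scheduling, "->", ", ".join(ports))
    if not bundle_is_consistent(CAPTURES):
        print("Planned members have mismatched QoS scheduling")
```

Anything more than one group in the result means the switch is likely to suspend some of the members when you try to bundle them.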
The Morning After
So what was the Morning-After command? It came to me courtesy of Twitter:
@bobmccouch do I get you right? Tried the port-channel command no mls qos channel-consistency?
— Jimmy (@0x86DD) March 27, 2013
Ah-hah! There’s a nerd-knob for my issue!
What are some other examples of Morning-After commands? The list grows over time. Sometimes they are hidden IOS commands, other times just commands one never thinks to look for. They are always commands that cause you to face-palm the moment you see that they exist, because they would have saved you much pain had you known about them. Here are a few of my favorites:
- “speed nonegotiate” — As I recently described on this very blog
- “service unsupported-transceiver” — Thoroughly covered in Tom Hollingsworth’s post at Networking Nerd.
- “bgp bestpath cost-community ignore” — I’ve seen inclusion of the cost community mess up expected path selection. Ivan Pepelnjak explains it here.
- “sdm prefer dual-ipv4-and-ipv6” — Can’t figure out why your switch doesn’t recognize “ipv6” commands?
- “no ppp chap ignoreus” — OK, this one is probably only really useful anymore if you’re preparing for the CCIE R&S lab.
Will You Still Respect Me in the Morning?
Not knowing the Morning-After command when you needed it the night before can be frustrating and downright embarrassing. It’s easy to feel really bad when you discover that the event that got delayed or aborted could have been saved if you’d just known that command. But rest assured, no one knows every possible configuration command for every possible platform.
Of course, in a perfect world we wouldn’t be worried about such things because we’d always have a lab environment to test every change. But working at a small consulting shop like I do, frankly we don’t have a pair of 6500s with Sup 720 3Bs and a mix of 65XX and 67XX line cards, or every other exact combination of hardware that our customers do. My usual methods for testing change procedures with tools like Dynamips wouldn’t catch this particular glitch because of the hardware dependency. Even using a pair of fixed configuration Catalyst switches like the 3560s in my lab wouldn’t help because they will never have a QoS model mismatch between interfaces.
Certainly, as networking practitioners we should make every effort to validate and verify every procedure we plan to implement, simulating the real world as closely as possible. But sometimes it just doesn’t shake out that way and you have to learn a lesson the hard way. It’s important to note that once you discover that command in the morning, you will probably never forget it. Maybe you can even use it to bail someone else out someday (looking like a super-hero-genius in the process!).
If you discovered a good Morning-After command, leave it in the comments!
This one has to do with unsupported transceivers as well:
no errdisable detect cause gbic-invalid
bgp bestpath as-path multipath-relax