Learning Journal: May 2017

Tuesday, 30 May 2017

Quality of Service - 2

MQC Class-Based Generic Traffic Shaping

The purpose of traffic shaping is to “format” an outbound packet flow so that it conforms to a traffic contract. The formatting process lowers the average bit rate and changes the structure of the packet flow, resulting in a traffic flow consisting of uniformly spaced traffic bursts. For example, a customer buys an Ethernet circuit provisioned at 10Mbps, but the physical link to the provider is FastEthernet (100Mbps). Since the customer’s interface always serializes packets outbound at 100Mbps, and the service provider performs traffic policing/admission control inbound, shaping is needed on the customer side to stay within the 10Mbps contract.

To slow the rate down, the first task of the shaper is to meter the traffic coming into the output queue and decide whether it exceeds the target average rate. The concept of metering is based on the fact that traffic leaves an interface in a serial manner (bit by bit, packet by packet), and that packets are usually grouped in bursts, separated by periods of interface silence. While the router sends each burst at the Access Rate (AR), which is the physical line speed of the interface, the spacing between bursts makes the average rate less than the AR. The goal of metering is to mark those bursts that exceed (do not conform to) the desired average rate, called the Committed Information Rate (CIR).

The metering function of traffic shaping uses what is known as a token bucket model to determine whether traffic conforms to, or exceeds, the average rate. Every time a packet is about to be de-queued to the transmit ring, the metering function compares the size of the packet to the amount of tokens, or credit, in the token bucket. If the size of the packet is less than or equal to the amount of credit, the packet conforms and is sent. If the size of the packet is greater than the amount of credit in the token bucket, the packet exceeds and is delayed. The size of the token bucket is calculated by taking the desired average rate (CIR) in bits per second and breaking it down into a smaller value of bursts in bits per interval in milliseconds. These values are expressed as Bc (Committed Burst) bits and Tc (Committed Time interval) milliseconds. The size of the token bucket is Bc bits. Essentially, every Tc period the token bucket is refilled with the Bc amount in bits. Think of the Bc bits as tokens going into the bucket every Tc interval. The key point here is that Bc bits per Tc interval is the same value as CIR bits per second, simply expressed in smaller units.
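The conform/exceed check described above can be sketched in a few lines of Python. This is only a simplified model for my own understanding, not how IOS implements it: the class name TokenBucket, its methods, and the 8000-bit Bc are made up for illustration, and the deficit handling for oversized packets is left out.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    bc_bits: int      # bucket size, and the amount added every Tc (hypothetical value below)
    tokens: int = 0   # credit currently available, in bits

    def refill(self) -> None:
        # Called once per Tc: add Bc bits of credit, capped at the bucket size.
        self.tokens = min(self.bc_bits, self.tokens + self.bc_bits)

    def offer(self, packet_bits: int) -> bool:
        # Conform: enough credit, so spend it and let the packet go.
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        # Exceed: not enough credit, the packet has to wait in the shaping queue.
        return False


bucket = TokenBucket(bc_bits=8000)
bucket.refill()
first = bucket.offer(6000)    # conforms, 2000 bits of credit remain
second = bucket.offer(4000)   # exceeds, must wait for the next Tc
bucket.refill()               # next Tc interval starts
third = bucket.offer(4000)    # conforms now
print(first, second, third)
```

The second packet is not dropped in this model; it simply is not sent until a later interval has enough credit, which is the difference between shaping and policing.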

If a packet exceeds because there are not enough tokens in the bucket, the shaping process delays the packet and holds it in the internal shaping queue. By this logic, even though traffic is always sent at the AR, the periods of delay incurred by non-conforming traffic in the shaping queue result in the overall average rate (CIR) being lower than the AR. The size of Tc is not manually configurable; it is set indirectly by configuring the CIR and Bc values, based on the formula Bc = CIR * Tc/1000, where Tc is in milliseconds.
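A quick worked example of the formula, using the 10Mbps circuit from the start of this entry. The Bc value of 100,000 bits is my own pick for illustration, and the function names are hypothetical.

```python
def tc_ms(cir_bps: float, bc_bits: float) -> float:
    # Rearranged from Bc = CIR * Tc / 1000, with Tc in milliseconds.
    return bc_bits * 1000 / cir_bps


def intervals_per_second(tc: float) -> float:
    # Number of Tc intervals that fit into one second.
    return 1000 / tc


cir_bps = 10_000_000   # 10 Mbps contract
bc_bits = 100_000      # assumed committed burst

tc = tc_ms(cir_bps, bc_bits)
average_rate = bc_bits * intervals_per_second(tc)

print(f"Tc = {tc} ms, Bc per Tc over one second = {average_rate} bps")
```

With these numbers Tc works out to 10 ms, and 100 intervals of 100,000 bits each give back the 10,000,000 bps CIR, which is the "same value in smaller units" point.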

One possible problem with the above calculation for Bc is the case where the packet being de-queued is larger than Bc, which means that there would never be enough credit in the token bucket to send it, for example when a packet is 1500 bytes but Bc is only 1000 bytes. To deal with this situation, the shaper calculates a deficit counter (e.g. 1000-1500=-500) and adds this counter to the accumulated credit in the next round (the next Tc interval). In effect, this reduces the amount of traffic that can be sent the next time around. To avoid this problem altogether, ensure that Bc is greater than the average packet size, which also achieves a smoother packet distribution. This is not always possible, though, since there are cases where the CIR value is too low. In that case, layer 2 fragmentation can be introduced.

The next problem case that the scheduler can run into is when it has no traffic to send during a time interval (e.g. a pause in the packet stream), but it has more than Bc bits to send in the following time interval. Based on the leaky token bucket algorithm, no more than Bc bits can be sent per Tc interval, even if less than Bc was sent in previous intervals. The result is that the shaper achieves less than the desired average rate. To resolve this problem, traffic shaping uses what is known as a dual leaky token bucket, with the first token bucket represented as Committed Burst (Bc) and the second token bucket as Excess Burst (Be).

The Excess Burst bucket is only filled in the case that the full Bc bucket was not emptied in the previous interval. The extra credits, or tokens, left over from the Bc bucket are then moved to the Be bucket before the Bc bucket is refilled. For example, if the Bc size is 10 bits, but only 8 bits were sent in the current interval, a credit of 2 bits can be moved to the Be bucket if space is available. During the next interval, the scheduler can now de-queue up to Bc+Be bits. If Bc capacity is again not used completely, the left over credits are moved to Be, up to its maximum size.
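To convince myself of the carry-over logic, here is a small per-interval simulation. It is a sketch under my own assumptions: credit is spent from Bc first and from Be second, unsent traffic waits in a backlog, and the function name shape and the sample numbers are invented for illustration.

```python
def shape(bc: int, be_max: int, arrivals: list[int]) -> list[int]:
    """Return the bits sent in each Tc interval for a Bc/Be shaper."""
    be = 0          # credit stored in the Be bucket
    backlog = 0     # bits waiting in the shaping queue
    sent = []
    for arriving in arrivals:
        backlog += arriving
        # Up to Bc + Be bits may leave during this interval.
        out = min(backlog, bc + be)
        backlog -= out
        be_used = max(0, out - bc)   # part of the burst paid for by Be
        bc_left = max(0, bc - out)   # unused Bc credit
        # Leftover Bc credit moves to Be, but never beyond the Be size.
        be = min(be_max, be - be_used + bc_left)
        sent.append(out)
    return sent


sent = shape(bc=10, be_max=4, arrivals=[8, 14, 0, 30])
print(sent)
```

In the first interval only 8 of the 10 bits are used, so 2 bits of credit move to Be and the second interval can send 12. The idle third interval fills Be to its maximum of 4, which lets the fourth interval send 14. The total sent over the four intervals stays at or below 4 * Bc.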

Like Bc, Be has a finite size defined which controls how much credit can be stored. The size of the Be bucket is constrained by the Access Rate of the physical link, since the packets are always serialized at this rate. Therefore the maximum Be value (maxBe) is equal to (AR-CIR)*Tc/1000 which implies that if the shaper sends Bc+maxBe per Tc, it is sending at the Access Rate. The Be value can be set lower than maxBe, but should never exceed maxBe. Note that since Be is only populated due to a lack of Bc being used, the average sending rate over time still never exceeds the CIR.
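Plugging the FastEthernet example into the maxBe formula, as a sanity check. The 10 ms Tc is an assumed value and the function name is hypothetical.

```python
def max_be_bits(ar_bps: float, cir_bps: float, tc_ms: float) -> float:
    # maxBe = (AR - CIR) * Tc / 1000, with Tc in milliseconds.
    return (ar_bps - cir_bps) * tc_ms / 1000


ar_bps = 100_000_000   # FastEthernet access rate
cir_bps = 10_000_000   # contracted rate
tc_ms = 10             # assumed interval

bc = cir_bps * tc_ms / 1000
max_be = max_be_bits(ar_bps, cir_bps, tc_ms)
burst_rate = (bc + max_be) * 1000 / tc_ms   # rate if Bc + maxBe is sent every Tc

print(bc, max_be, burst_rate)
```

Bc comes out as 100,000 bits and maxBe as 900,000 bits, so sending Bc + maxBe in one 10 ms interval is exactly 100Mbps, the Access Rate.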

The shape average command under the policy-map class configuration defines the CIR, Bc, and Be. We used class-based shaping to limit the sub-interface’s sending rate. This is a common use of GTS (Generic Traffic Shaping), and the effect is that each sub-interface now uses its own software queue, whereas by default all sub-interfaces share the software queue of their main interface. This also allows the use of separate QoS policies per sub-interface, because each shaper’s queue can be tuned individually.

Using Ethanalyzer on Nexus platform for control-plane and data-plane traffic analysis

Ethanalyzer does not capture data traffic that Cisco NX-OS forwards in hardware, but as a workaround you can use ACLs with the log option to sample specific packets from the data plane.

When an ACL is used with the “log” keyword, access control entries (ACEs) that carry the keyword cause the system to punt a copy of each matching packet to the supervisor CPU. The key point is that the original traffic is still forwarded or dropped in hardware, with no performance penalty. Note that the punted copies are subject to a hardware rate limiter: the forwarding engine hardware enforces the rate to avoid saturating the inband interface and the CPU.

The hardware rate-limiter access-list-log command adjusts this rate (100 pps by default).

Full Packet Analysis

1. Define ACL entry with logging to match traffic of interest
ip access-list acl-cap
permit tcp 10.1.1.3/32 10.1.2.2/32 eq 5000 log
permit ip any any

2. Attach ACL to interface
interface e1/1
ip access-group acl-cap in

3. Define ethanalyzer capture and/or display filter to capture just the subject traffic
ethanalyzer local interface inband capture-filter "tcp port 5000"

4. View captured traffic on-switch, or copy to PC/workstation for GUI analysis
Example – Brief Decode On-Switch
n7010# ethanalyzer local interface inband brief capture-filter "tcp port 5000" limit-cap 3

Example – Full Decode On-Switch
n7010# ethanalyzer local interface inband capture-filter "tcp port 5000" limit-captured-frames 1 | no-more

Example – Write Data to File
n7010# ethanalyzer local interface inband capture-filter "tcp port 5000" limit-captured-frames 50 write bootflash:test.cap

Example Captures
This example shows detailed captured data for one HSRP packet:
switch(config)# ethanalyzer local interface mgmt capture-filter "udp port 1985" limit-captured-frames 1

Other filter examples:

ethanalyzer local interface mgmt capture-filter "dst host 172.16.185.1"
ethanalyzer local interface inband capture-filter "stp"
ethanalyzer local interface inband decode-internal capture-filter "stp"
ethanalyzer local interface inband capture-filter "stp" limit-frame-size 64
ethanalyzer local interface inband capture-filter "icmp and host 10.10.10.1" limit-captured-frames 1000 write bootflash:icmp

Wednesday, 24 May 2017

Quality of Service - 1

MQC Bandwidth Reservations and CBWFQ

Ethernet sub-interfaces have no way to learn the state of congestion that their underlying physical, or "main," interface may be experiencing. For this reason, a queuing policy applied directly to a sub-interface would have no way to know when the link is congested, and thus would never trigger. As such, Cisco IOS does not allow a policy-map that uses any sort of queuing to be applied directly to a sub-interface. A way to overcome this limitation is to apply a policy that shapes the rate of the sub-interface to create artificial congestion. This type of configuration is referred to as HQF, or Hierarchical Queuing Framework.

The logic of CBWFQ is that during congestion, a class with a bandwidth reservation of ClassBandwidth will get at least a ClassBandwidth/InterfaceBandwidth share of the total interface bandwidth. The CBWFQ reservation only becomes active when congestion occurs.
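The share arithmetic is simple, but writing it out helps. The class names, the 1000 kbps interface bandwidth, and the function name below are hypothetical values chosen for illustration.

```python
def cbwfq_min_share(class_kbps: float, interface_kbps: float) -> float:
    # Minimum share of the interface a class is guaranteed during congestion.
    if class_kbps > interface_kbps:
        raise ValueError("reservation exceeds interface bandwidth")
    return class_kbps / interface_kbps


interface_kbps = 1000
reservations_kbps = {"SIGNALING": 100, "BULK": 400}   # made-up classes

shares = {
    name: cbwfq_min_share(kbps, interface_kbps)
    for name, kbps in reservations_kbps.items()
}

for name, share in shares.items():
    print(f"{name}: at least {share:.0%} of the link under congestion")
```

These are floors, not caps: when the link is not congested, a class can use more than its share.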

The CLI will not allow you to assign a service-policy with CBWFQ weights unless the interface is using FIFO queuing, which implies that CBWFQ is not compatible with any of the legacy queuing methods, such as custom queueing or priority queueing. These must be disabled explicitly before applying the service policy.

Each class configured with a bandwidth statement under the policy-map has its own dedicated FIFO queue in the interface’s CBWFQ conversation pool. The depth of each FIFO queue can be changed on a per-class basis with the queue-limit command. The overall WFQ settings, such as the Congestive Discard Threshold (CDT) and the total queue size, can be set using queue-limit under class-default and the hold-queue <number> out command at the interface level.

As soon as the bandwidth keyword is specified under any user-defined class, the interface queue turns into CBWFQ. Any unmatched flows that fall back into class-default are then scheduled using dynamic WFQ weights: automatic classification occurs, along with precedence-based weight assignment and sharing of the single WFQ buffer space. This behavior is the default, even if you did not configure fair-queue under class-default. If you want to disable fair-queue for unclassified packets, configure an explicit bandwidth value for class-default, which turns it into a single FIFO queue.

Moreover, if you do not define any classes other than class-default, and class-default has a bandwidth value defined, the entire interface queue essentially becomes a FIFO queue.
policy-map TEST
class class-default
bandwidth 96

Bandwidth Reservations and CBWFQ

It is not possible to mix the bandwidth and bandwidth percent commands in the same policy map; the units must be the same among classes.

MQC LLQ and Remaining Bandwidth Reservations

To prevent starvation of other queues, the packets de-queued from the LLQ conversation are metered using a simple token bucket, with a configurable rate and burst size. Packets that exceed the token bucket are dropped (policed) during times of congestion; if there is no congestion, exceeding traffic is not dropped, but it is simply not prioritized. Multiple classes inside a single policy-map can use the priority keyword, but only a single priority queue exists. This design of multiple priority classes is used to ensure that one priority flow does not starve another priority flow.

When the LLQ priority reservation is configured, the CBWFQ algorithm subtracts the reserved bandwidth of the priority queue from the interface’s available bandwidth or from the available rate based on the shaping policy. The remaining bandwidth can be used to create a relative bandwidth reservation for other classes in the CBWFQ.

This relative reservation is configured with the bandwidth remaining percent command.
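A small sketch of how the remaining-bandwidth math works out. The 1000 kbps available rate, the 200 kbps priority reservation, the class names, and the function name are all assumptions for illustration.

```python
def remaining_allocations(
    available_kbps: float,
    priority_kbps: float,
    remaining_percent: dict[str, float],
) -> dict[str, float]:
    # The priority reservation is subtracted first; the other classes
    # then split what is left according to their percentages.
    if sum(remaining_percent.values()) > 100:
        raise ValueError("remaining percentages add up to more than 100")
    remaining = available_kbps - priority_kbps
    return {name: remaining * pct / 100 for name, pct in remaining_percent.items()}


allocations = remaining_allocations(
    available_kbps=1000,      # interface bandwidth or shaped rate
    priority_kbps=200,        # LLQ reservation
    remaining_percent={"WEB": 50, "BULK": 25},
)
print(allocations)
```

With 200 kbps taken by the priority queue, 800 kbps remains, so 50 percent is 400 kbps and 25 percent is 200 kbps. The percentages are relative to what is left, not to the whole interface.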

MQC WRED

CBWFQ supports three drop policies: classic tail-drop, which is the default for user-defined classes, Congestive Discard for WFQ, and Random Early Detection (RED).

When you apply the random-detect command under a user-defined class, it automatically removes the queue-limit command and enforces RED as the drop policy. When using RED with CBWFQ, each flow is considered an individual FIFO queue. This is similar to legacy flow-based WRED; the big improvement is the ability to use random drop per flow, not per whole queue.

MQC Dynamic Flows and WRED

There are two ways to enable WRED within class-default. The first is to configure a bandwidth reservation statement (turning the class’s queue into a FIFO queue) and then enable RED, and the second is to enable RED with WFQ. The second case activates RED dropping to replace Congestive Discard Threshold-based drops for dynamic flows.

MQC WRED with ECN

TCP Explicit Congestion Notification (ECN), similar to BECN and FECN in Frame Relay, is used to signal impending network congestion to TCP flows. Originally, TCP detected network congestion based on packet loss, timeouts, and duplicate acknowledgments. This was usually the result of full queues and unconditional packet drops. TCP ECN allows the network to signal the receiver of the flow that the network is close to dropping packets. It is then up to the TCP receiver to decide how to react to this notification; it usually signals the sender to slow the sending rate. The overall effect of TCP ECN is better performance compared to simple packet drops and slow start, because the sender can respond before any packet is lost and spends less time recovering from packet loss.

TCP ECN works together with RED by changing the exceed action from random drop to ECN marking. Instead of randomly dropping a packet when the average queue depth grows above the minimum threshold, RED marks the packet with the special ECN flag.
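The decision can be sketched as below. This is the textbook RED model with a linear drop probability between the two thresholds, not the exact IOS implementation. The threshold values, the maximum probability, the rule that non-ECN-capable packets are still dropped, and the tail drop above the maximum threshold are my assumptions, and red_action is a made-up name.

```python
import random
from typing import Callable


def red_action(
    avg_depth: float,
    min_th: float,
    max_th: float,
    max_prob: float,
    ecn_capable: bool,
    rng: Callable[[], float] = random.random,
) -> str:
    # Below the minimum threshold nothing happens to the packet.
    if avg_depth < min_th:
        return "enqueue"
    # At or above the maximum threshold the packet is dropped (assumed here).
    if avg_depth >= max_th:
        return "drop"
    # In between, the probability grows linearly up to max_prob.
    probability = max_prob * (avg_depth - min_th) / (max_th - min_th)
    if rng() < probability:
        # With ECN the "exceed" action is a mark instead of a drop,
        # but only for packets whose sender declared ECN capability.
        return "mark" if ecn_capable else "drop"
    return "enqueue"


# Deterministic "random" sources make the behaviour easy to see.
always_hit = lambda: 0.0
never_hit = lambda: 0.99

print(red_action(30, 20, 40, 0.1, ecn_capable=True, rng=always_hit))
print(red_action(30, 20, 40, 0.1, ecn_capable=False, rng=always_hit))
print(red_action(30, 20, 40, 0.1, ecn_capable=True, rng=never_hit))
```

The only difference ECN makes in this model is the first branch of the random hit: the packet stays in the queue with a mark rather than being discarded.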

ip tcp ecn
policy-map AAA
class BBB
random-detect
random-detect ecn