A packet-switched network provides communication services among multiple nodes/routers. There are usually multiple routes/paths from one node/router to another. Which route is the best? The optimality depends on the objective function that the network tries to optimize, for example, minimal delay, minimal number of hops, maximum bandwidth, or minimum cost.



Routing Tables

To create routing tables, a routing algorithm, in general, needs to:

  • Gather information about the state of links
    • up / down, congested, delay or other metrics
  • Distribute link state information using a routing protocol
    • what to exchange? how often? with whom?
  • Compute optimal routes based on link state information
    • single or multiple metrics, single or multiple routes

After the algorithm has determined the route, the route information is stored in the routing table, so that each router knows how to forward packets.

Here is an example of a routing table for a datagram network. The table contains many entries, each of which is for a destination in the network. An entry specifies the next hop to be taken by packets with the associated destination. When a packet arrives, its route is determined by a table lookup. The size of the table could become impractical for a very large number of destinations.

Route table

Destination address      Output port
      1705                    7
      1345                   12
      ...                    ...

For example, the Internet Protocol (IP) uses datagram packet switching across networks. Each host has a two-part IP address: a network address and a host address. Routers do table lookups based on the network address only, which reduces the size of routing tables significantly.

In a virtual-circuit packet switching network, the route is determined during connection setup. The virtual circuit identifier (VCI) is local to the router, and at each link the identifier may be translated to a different identifier by label switching, depending on the available VCIs at a given link.

The size of routing tables can be reduced if a hierarchical approach is used in the assignment of addresses.

  • Without the hierarchical approach, there is no relationship between an address and routing proximity.
  • With the hierarchical approach, hosts that are near each other have addresses with a common prefix (the first few bits). Routers only need to examine the prefix in order to decide how a packet should be routed.
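The prefix idea can be sketched as a longest-prefix-match lookup. This is an illustrative toy (8-bit addresses, a dict-based table, made-up ports); real routers use tries or TCAMs:

```python
def longest_prefix_match(table, address, width=8):
    """Return the output port for the longest matching prefix.

    `table` maps (prefix_bits, prefix_len) -> port; `address` is an int
    of `width` bits. Illustrative only: real routers use tries/TCAMs.
    """
    best_len, best_port = -1, None
    for (prefix, plen), port in table.items():
        # compare only the top `plen` bits of the address
        if plen > best_len and (address >> (width - plen)) == prefix:
            best_len, best_port = plen, port
    return best_port

# Hosts whose addresses start with 1010 share a prefix: one table entry.
table = {(0b1010, 4): 7, (0b10, 2): 12}
print(longest_prefix_match(table, 0b10100011))  # longest match 1010 -> 7
print(longest_prefix_match(table, 0b10110000))  # only 10 matches    -> 12
```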

Flooding and Deflection Routing

There are two specialized approaches to routing called flooding and deflection routing, which are used in certain network scenarios.

Flooding means sending (broadcasting) a packet to all nodes in a network. No routing tables are required, so it is useful when a network is starting up. The problem is that flooding may easily swamp the network.

To reduce resource consumption due to flooding, one simple method is to use a Time-to-Live field in each packet that limits the number of hops to a certain diameter. Each router decrements the Time-to-Live by one before flooding the packet; if the value reaches zero, the router discards the packet. Another method is for each router to add its identifier before flooding, and to discard any packet that already contains the router's own identifier. Furthermore, sequence numbers can be used for ease of implementation: a router remembers which (source, sequence number) pairs it has already flooded and discards duplicates.
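The TTL and duplicate-suppression ideas can be combined in a small sketch. This is a hypothetical toy model (integer node IDs, a (source, sequence number) packet identifier), not a real protocol implementation:

```python
def flood(graph, node, pkt_id, ttl, seen):
    """Recursively flood a packet identified by pkt_id (source, seq no).

    A node accepts the packet only if the TTL is non-negative and it has
    not seen this packet before; it then forwards with TTL decremented.
    `seen` collects (node, pkt_id) pairs, i.e. the nodes reached.
    """
    if ttl < 0 or (node, pkt_id) in seen:
        return  # TTL exhausted, or duplicate: discard
    seen.add((node, pkt_id))
    for neighbor in graph[node]:
        flood(graph, neighbor, pkt_id, ttl - 1, seen)

graph = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
seen = set()
flood(graph, 1, ("A", 1), ttl=2, seen=seen)
print(sorted(n for n, _ in seen))  # [1, 2, 3, 4]: all nodes within 2 hops
```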

Deflection routing provides multiple paths for each source and destination pair. Each network node forwards packets to the preferred port; if the preferred port is busy, it deflects the packet to another port. The node can therefore be buffer-less, and packets do not have to wait for a specific port to become available.



Shortest Path Routing

In datagram packet-switched networks, there are often many possible paths connecting any given source to any given destination. Typically, it is possible to attach a cost or distance to a link connecting two routers. Many routing algorithms are based on variants of shortest path routing, which tries to determine the shortest path according to some cost criteria.

There are two general approaches to shortest path routing: distance vector based and link state based.

Distance Vector

In this approach, neighbours exchange lists of distances to destinations. The best next hop is determined for each destination based on the Bellman-Ford algorithm. The routing table lists, for each destination, the next hop and the distance.

Destination    Next hop     Distance
   22             3            10
   ...            ..           ...

The idea is: if Di is the shortest distance from node i to the destination, and node j is a neighbor of i on the shortest path, then Di = Cij + Dj, where Cij is the cost of the link between nodes i and j, and Dj is the shortest distance from router j to the destination. However, node i only has local information from its neighbors.

        /- Cij' - - [ Dj' ] - - -
       /
[ Di ] - - Cij - - [ Dj ] - - - ... - - - [ Destination ]
       \
        \- Cij'' - - [ Dj'' ] - - -  

Bellman-Ford Algorithm

The distance vector approach works like this: consider that all routers want to know their shortest distance to the destination. These routers are one hop, two hops, or three hops from the destination.

  1. Destination node sends accurate distance information to its one-hop nodes.
  2. The one-hop nodes calculate the current shortest distance to the destination, and send that information to their neighbors (i.e. the two-hops nodes).
  3. The two-hop nodes calculate the current shortest distance to the destination, and then send the information to their neighbors too (i.e. the three-hops nodes).

By doing so hop by hop, eventually the current shortest-distance information to the destination ripples across the whole network, and the shortest path from each node to the destination is calculated.

For example, in this graph there are 6 nodes, and we want to build the shortest paths to node 6. Initially, all nodes are assigned a pair of values (-1, ∞), which means the next hop is unknown and the distance is infinite.

          2
 [ 1 ] - - - - [ 3 ]
  |    5\   2/      \1
  |3    [ 4 ]        [ 6 ]
  |   1/     \3     /2
 [ 2 ] - - - - [ 5 ]
          4
Iteration    Node 1     Node 2     Node 3     Node 4     Node 5
Initial      (-1, ∞)    (-1, ∞)    (-1, ∞)    (-1, ∞)    (-1, ∞)
1            (-1, ∞)    (-1, ∞)    (6, 1)     (-1, ∞)    (6, 2)
2            (3, 3)     (5, 6)     (6, 1)     (3, 3)     (6, 2)
3            (3, 3)     (4, 4)     (6, 1)     (3, 3)     (6, 2)
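The table above can be reproduced with a short Bellman-Ford sketch. The link costs come from the 6-node graph; the loop repeatedly relaxes Di = min(Cij + Dj) until nothing changes, a centralized simulation of the distributed exchange (variable names are mine, not from the course):

```python
import math

# Undirected link costs from the example graph; the destination is node 6.
links = {(1, 2): 3, (1, 3): 2, (1, 4): 5, (2, 4): 1, (2, 5): 4,
         (3, 4): 2, (3, 6): 1, (4, 5): 3, (5, 6): 2}
cost = {}
for (a, b), c in links.items():
    cost.setdefault(a, {})[b] = c
    cost.setdefault(b, {})[a] = c

def bellman_ford(cost, dest):
    """Relax Di = Cij + Dj over all nodes until the distances stabilize."""
    dist = {n: (0 if n == dest else math.inf) for n in cost}
    nexthop = {n: -1 for n in cost}   # -1 means "next hop unknown"
    changed = True
    while changed:
        changed = False
        for i in cost:
            if i == dest:
                continue
            for j, c in cost[i].items():
                if c + dist[j] < dist[i]:
                    dist[i], nexthop[i] = c + dist[j], j
                    changed = True
    return nexthop, dist

nexthop, dist = bellman_ford(cost, 6)
# Matches the last row of the table: node 2 reaches node 6 via node 4, cost 4.
print({n: (nexthop[n], dist[n]) for n in sorted(cost) if n != 6})
```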


The Count-To-Infinity Problem

The distance vector approach has a problem: broken links cause the minimal cost from each node to the destination node to be recomputed. It is also possible that two nodes each believe that the best path is through the other. As a result, a packet at either of these two nodes bounces back and forth until the minimal cost counts up to infinity, at which point the nodes realize that the destination is unreachable. This problem is often called count-to-infinity.

To avoid it, several changes to the distance vector algorithm have been proposed, but none of them works well in all situations. One method that was widely implemented is called Split Horizon, in which the minimum cost to a given destination is NOT sent to a neighbor if that neighbor is the next node along the shortest path.

Another variation, called Split Horizon with Poisoned Reverse, allows a node to send its minimal costs to all its neighbors, but the minimal cost to a given destination is set to infinity if the neighbor is the next node along the shortest path.

The essential problem of the distance vector approach is that information is only exchanged with local neighbors. The link-state algorithm seeks global optimization.

Routing Information Protocol (RIP)

Routing Information Protocol uses the distance vector algorithm and is implemented in the routed program distributed with BSD UNIX. It runs on top of UDP (which is unreliable) using port 520. The metric used in the computation of the shortest path is typically configured to be the number of hops, which is limited to 15; the value 16 means infinity, to deal with the count-to-infinity problem. RIP is suitable for small networks (say, a LAN).

A router which implements RIP will:

  1. Send an update message to its neighbors every 30 seconds.
  2. Expect to receive an update message from each of its neighbors within 180 seconds (worst case).
    • If no update arrives in time, the router assumes the link has failed and sets the corresponding cost to 16 (infinity).

RIP uses Split Horizon with Poisoned Reverse to reduce routing loops. Convergence is sped up by requiring a router to implement triggered updates.

Link State

The basic idea of the link state algorithm involves three steps:

  1. Each node creates a link-state packet containing the states of the links to its neighbours.
  2. Each node broadcasts (by flooding) its link-state packet (containing its neighbours’ IDs and the distances to them), so that each node in the network will
    • get a map of all nodes in the network
    • get link metrics of the entire network
  3. Find the shortest path on the map from the source node to all destination nodes.

But when should link-state packets be built? A quick answer: periodically, or when significant events occur, such as a system reboot.



Dijkstra Algorithm

Dijkstra's algorithm is an excellent approach for finding the shortest paths from a source node to all other nodes in a network. It is generally more efficient than the Bellman-Ford algorithm. Generally speaking, the idea is:

  1. Put the source node in a set; at this moment, the set contains only the source node.
  2. Among the neighbors of the set, find the node closest to the source and add it to the set.
  3. Repeat step 2. At the k-th iteration, it finds the k-th closest node to the source node.

Take the example above again, with node 1 being the source node, and use Dijkstra's algorithm to solve it:

          2
 [ 1 ] - - - - [ 3 ]
  |    5\   2/      \1
  |3    [ 4 ]        [ 6 ]
  |   1/     \3     /2
 [ 2 ] - - - - [ 5 ]
          4
Iteration    D1 (Set)              D2    D3    D4    D5    D6
Initial      {1}                    3     2     5     ∞     ∞
1            {1, 3}                 3     X     4     ∞     3
2            {1, 2, 3}              X           4     7     3
3            {1, 2, 3, 6}                       4     5     X
4            {1, 2, 3, 4, 6}                    X     5
5            {1, 2, 3, 4, 5, 6}                       X
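The same example can be checked with a compact Dijkstra sketch. This heap-based form is a common variant; the tabular version above scans for the closest node instead, but both settle one closest node per iteration:

```python
import heapq

# Undirected link costs from the example graph.
links = {(1, 2): 3, (1, 3): 2, (1, 4): 5, (2, 4): 1, (2, 5): 4,
         (3, 4): 2, (3, 6): 1, (4, 5): 3, (5, 6): 2}
cost = {}
for (a, b), c in links.items():
    cost.setdefault(a, {})[b] = c
    cost.setdefault(b, {})[a] = c

def dijkstra(cost, source):
    """Grow the set of settled nodes one closest node at a time."""
    dist = {source: 0}
    heap = [(0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)                 # u is the next closest node
        for v, c in cost[u].items():
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return dist

# Final distances from node 1, matching the table above.
print({n: d for n, d in sorted(dijkstra(cost, 1).items())})
```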

Many routers in the Internet support both distance vector and link state routing protocols. Nodes in a network must cooperate and exchange information to obtain the values of the metrics, so distance vector routing is used; the problem is that the protocol adapts to changes in the network topology slowly.

The advantage of link state routing is that it reacts to network failures very quickly. When a link fails, the node 1) sets the failed link's distance to infinity and 2) floods the network with an update packet. All nodes immediately update their link databases and recalculate their shortest paths.

In a datagram network, each router is responsible for determining the next hop along the shortest path. When every router performs the same process, it is called hop-by-hop routing.

Another routing approach is called source routing, whereby the path to the destination is determined by the source router. Source routing allows the host to control the path that its packets traverse in the network.

  1. The source host initially includes the entire path in the packet header, e.g. [1, 3, 6, 8].
  2. Each node examines the packet header, strips off the current address (e.g. [1, 3, 6, 8] becomes [3, 6, 8]) and forwards the packet to the next node.
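The header-stripping step can be sketched as follows, assuming a packet is simply a dict with a "route" list (an illustrative representation, not a real header format):

```python
def forward_source_routed(packet):
    """Strip the current hop from the header; return (current, next) nodes.

    The header is the full route chosen by the source, e.g. [1, 3, 6, 8].
    """
    route = packet["route"]
    current = route.pop(0)                  # strip off the current address
    next_node = route[0] if route else None  # None: we are the destination
    return current, next_node

packet = {"route": [1, 3, 6, 8], "payload": "data"}
print(forward_source_routed(packet))  # (1, 3)
print(packet["route"])                # [3, 6, 8]
```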

Open Shortest Path First (OSPF)

Open Shortest Path First was developed to fix some of the deficiencies in RIP. It works within an autonomous system, which is loosely defined as a set of routers or networks administered by a single organization. The OSPF protocol uses the link-state routing approach, enabling each router to learn the complete network topology.

To improve scalability, OSPF introduces a two-level hierarchy that allows an autonomous system to be partitioned into several groups called areas, which are interconnected by a central backbone area.

 area 1       area 2         area 3
     \          |            /
       central backbone area
     /          |            \
 area 4       area 5         area ...

An area is identified by a 32-bit number known as the area ID. Routers in an area only know the complete topology inside that area, and the flooding of link-state information is limited to the area, which makes the protocol more scalable.

Each area must be connected to the backbone area, which is identified by area ID 0. Area border routers summarize information from other areas, and the backbone area distributes routing information between areas.



Four types of routers are defined in OSPF:

An internal router: a router with all its links connected to networks within the same area.
An area border router: a router with links connected to more than one area.
A backbone router: a router that has its links connected to the backbone.
An autonomous system boundary router: a router that has links connected to another autonomous system; it learns about routes outside the autonomous system through an exterior gateway protocol such as BGP.

Asynchronous Transfer Mode (ATM)

Asynchronous Transfer Mode (ATM) is a connection-oriented method for time-division multiplexing (TDM) and packet switching that provides rich quality-of-service support, such as real-time voice/video, circuit emulation for digital transport, and data traffic with bandwidth guarantees.

ATM has a strong similarity to virtual circuit packet switching. One major difference is ATM's use of short, fixed-length cells. This approach simplifies switch implementation and makes high speed possible. The term asynchronous refers to the fact that the transmission of cells is not synchronized to any frame structure.

  1. The information flows generated by various users are converted into “cells”, which are 53 bytes long, including a 5-byte header.
  2. Cells are sent to an ATM multiplexer, which arranges them into one or more queues.
  3. A scheduling strategy determines the order in which the cells are transmitted, providing different data flows with different qualities of service.
                   TDM                        Packet Switching
Variable bitrate   Multi-rate only            Easily handled
Delay              Low and fixed              Variable
Burst traffic      Inefficient                Efficient
Processing         Minimal, very high speed   Header and packet processing required


Traffic Management

Traffic management is concerned with delivery of quality of service to the end user and with efficient use of network resources. Based on the traffic granularity, we classify traffic management into three levels:

Packet level (time scale: microseconds): packet queuing and packet scheduling at switches, routers and multiplexers, in order to provide differentiated treatment for packets belonging to different quality-of-service classes.
Flow level (milliseconds ~ seconds): management of individual traffic flows, to ensure the quality of service delivered to users.
Flow aggregate level (minutes ~ days): routing of aggregated traffic flows across the network.

Packet Level

The path traversed by a packet can be modeled as a sequence of queuing systems. A packet traversing the network encounters delay and possible loss at various multiplexing points. Packets from other flows may also interfere, contending with a packet for buffer space and transmission capacity along the path.

End-to-end performance is the accumulation of per-hop performance. If we can guarantee that the delay at each system is kept below some upper bound, then the end-to-end delay is kept below the sum of all the upper bounds. Packet loss occurs when no buffer is available for an incoming packet.

Queue Scheduling

A queuing system must implement strategies for controlling transmissions, called queue scheduling, which must consider priority and fairness/isolation. Buffer management is also needed to manage how packets are placed in the queuing system.

First In First Out

The simplest form of queue scheduling is First In First Out (FIFO), where all packet flows share the same buffer, and packets are transmitted in the order of their arrival. The delay and loss experienced by packets in a FIFO system depend on the packet inter-arrival times and on the packet lengths. FIFO cannot provide differential QoS to different packet flows.

The FIFO can be modified to provide different packet loss performance to different classes of traffic:

  • Packets with low access priority: when the number of packets in the buffer reaches a certain threshold, they are not allowed to enter the system – an arriving low-priority packet is discarded once the threshold is exceeded.
  • Packets with high access priority: they are allowed to enter as long as the buffer is not full.

As a result, packets of low access priority will experience a higher packet loss probability, so performance differentiation is provided.
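The two-threshold FIFO can be sketched as follows (class and parameter names are made up for illustration):

```python
from collections import deque

class ThresholdFIFO:
    """Single shared FIFO buffer with discard priorities.

    Low-priority packets are refused once the queue length reaches
    `threshold`; high-priority packets are refused only when the buffer
    is completely full (`capacity`).
    """
    def __init__(self, capacity, threshold):
        self.q, self.capacity, self.threshold = deque(), capacity, threshold

    def enqueue(self, packet, high_priority):
        limit = self.capacity if high_priority else self.threshold
        if len(self.q) >= limit:
            return False              # packet discarded
        self.q.append(packet)
        return True

    def dequeue(self):
        return self.q.popleft() if self.q else None

fifo = ThresholdFIFO(capacity=4, threshold=2)
# Third low-priority packet hits the threshold and is discarded...
print([fifo.enqueue(i, high_priority=False) for i in range(3)])  # [True, True, False]
# ...but a high-priority packet still gets in (buffer not full).
print(fifo.enqueue("hp", high_priority=True))                    # True
```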

Head of Line Priority Queuing

Head-of-line priority queuing is another approach that involves defining a number of priority classes. A separate buffer is maintained for each priority class. Each time the transmission link becomes available, the next packet to be transmitted is selected from the head of the highest-priority queue that is not empty. So a high-priority queue has a lower waiting time.

However, this approach does not provide any degree of guaranteed access to transmission bandwidth for the lower priority classes. A surge in a high-priority queue can cause the low-priority queues to saturate.
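A head-of-line priority scheduler can be sketched with one queue per class (names are illustrative; class 0 is taken to be the highest priority):

```python
from collections import deque

class PriorityQueuing:
    """Head-of-line priority: one buffer per class; always serve the
    head of the highest-priority non-empty queue."""
    def __init__(self, num_classes):
        self.queues = [deque() for _ in range(num_classes)]

    def enqueue(self, packet, priority):
        self.queues[priority].append(packet)

    def dequeue(self):
        for q in self.queues:          # scan from highest priority down
            if q:
                return q.popleft()
        return None                    # all queues empty

sched = PriorityQueuing(num_classes=2)
sched.enqueue("low-1", priority=1)
sched.enqueue("high-1", priority=0)
sched.enqueue("low-2", priority=1)
# The high-priority packet jumps ahead of earlier low-priority arrivals.
print([sched.dequeue() for _ in range(3)])  # ['high-1', 'low-1', 'low-2']
```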



Fair Queuing

Fair queuing attempts to provide isolated and equitable access to transmission bandwidth. Each user flow has its own logical buffer. In an ideal system, the transmission bandwidth is divided equally among the buffers that have packets to transmit. Weighted fair queuing (WFQ) further addresses different users with different priorities.

To achieve fair queuing, one approach could simply be to service each non-empty buffer one bit at a time in round-robin fashion. In ATM, fair queuing can be approximated in an easier way, because all ATM packets are the same length. In packet networks, implementing fair queuing requires approximation, such as simulating a fluid system or sorting packets according to their completion times.

Although fair queuing provides a fair allocation of bandwidth among multiple flows, it is rarely implemented in a core network, where tens of thousands of flows may be present at any given time. Maintaining state information for so large a number of flows increases the implementation overhead significantly and makes the system impossible to scale.

Buffer Management

Buffer management involves packet drop strategies. Fairness is important – a router should protect well-behaving sources from misbehaving sources. Random Early Detection (RED) is a buffer management technique that drops packets probabilistically once the queue length exceeds a given threshold.

The packet drop probability increases linearly with queue length. A dropped packet provides feedback to the source, implicitly informing it to reduce its transmission rate. When a given source transmits at a higher rate than others, it will suffer a higher packet drop rate.

Technically, RED maintains a running average of the queue length. Two thresholds are defined: a min threshold and a max threshold.

Average queue length < Min threshold                   RED doesn't drop any arriving packets.
Min threshold < Average queue length < Max threshold   RED drops an arriving packet with a probability that increases as the average queue length increases.
Max threshold < Average queue length                   RED drops every arriving packet.
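The three regimes can be sketched as a single drop decision. The linear slope and the maximum drop probability `max_p` are assumptions for illustration (real RED also uses an exponentially weighted average and counts packets since the last drop):

```python
import random

def red_drop(avg_qlen, min_th, max_th, max_p=0.1, rng=random.random):
    """Return True if RED drops the arriving packet.

    `avg_qlen` is the running average queue length; `max_p` (the drop
    probability reached at max_th) is an illustrative assumption.
    """
    if avg_qlen < min_th:
        return False                          # never drop below min
    if avg_qlen >= max_th:
        return True                           # always drop above max
    # drop probability grows linearly between the two thresholds
    p = max_p * (avg_qlen - min_th) / (max_th - min_th)
    return rng() < p

print(red_drop(3, min_th=5, max_th=15))   # False: below min threshold
print(red_drop(20, min_th=5, max_th=15))  # True: above max threshold
```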

Flow Level

Congestion occurs when a surge of traffic overloads network resources for transmission and buffering. The purpose of traffic management at the flow level is to control the flows of traffic and maintain performance even in the presence of congestion.

Congestion control, like packet routing, is a very hard problem. There are two main categories:

  1. Preventive Approach (open-loop, based on scheduling and resource reservation)
    • Initially for connection-oriented networks
    • Key mechanisms: admission control, policing and traffic shaping
  2. Reactive Approach (closed-loop, based on detection and discarding)

Admission Control

Admission control was initially developed for connection-oriented virtual circuit packet networks. A flow negotiates a contract with the network, specifying its requirements. The network computes the required resources and determines whether they are available.



Leaky Bucket Policing

The process of monitoring and enforcing traffic flows is called policing. To prevent a source from violating its contract, the network monitors the traffic flows continuously, to ensure they meet their traffic contracts. If a packet violates the contract, the network can discard it or tag it, giving it a lower priority. If congestion occurs, tagged packets are discarded first. Most implementations of a policing device are based on the concept of a leaky bucket.

Imagine the traffic flowing into a policing device as water being poured into a bucket that has a hole at the bottom. The bucket has a specified depth L to accommodate variations in the arrival rate. A counter is used to record the content of the leaky bucket. When a packet arrives:

  • If the counter, increased by the value I, does not exceed the bucket's limit, the counter is increased and the packet is conforming. The value I reflects the average inter-arrival time of the packets being policed.
  • If the increased counter would exceed the bucket's limit, the bucket would overflow and the packet is non-conforming.
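The counter-based leaky bucket can be sketched as follows, with I the increment per packet and L the bucket depth. The drain rate of one counter unit per time unit is an assumption for illustration:

```python
class LeakyBucketPolicer:
    """Counter-based leaky bucket policer.

    The counter drains one unit per time unit (the hole in the bucket);
    each conforming arrival adds I; depth L absorbs bursts.
    """
    def __init__(self, increment, limit):
        self.I, self.L = increment, limit
        self.counter, self.last = 0.0, 0.0

    def arrive(self, t):
        # drain the bucket for the time elapsed since the last arrival
        level = max(self.counter - (t - self.last), 0.0)
        self.last = t
        if level > self.L:
            return False          # bucket would overflow: non-conforming
        self.counter = level + self.I
        return True               # conforming

# Packets arriving every 1 time unit against an average spacing I=4:
# the depth L=6 absorbs a short burst, then the flow becomes non-conforming.
policer = LeakyBucketPolicer(increment=4, limit=6)
print([policer.arrive(t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
```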

Traffic Shaping

If a source node wants to ensure that its traffic flow conforms to the requirements specified by a leaky bucket policing device, the node may have to change the traffic flow. Traffic shaping refers to the process of changing a traffic flow to ensure conformance.

  • A traffic shaping device is often located at the node just before the traffic flow leaves the network.
  • A traffic policing device is usually located at the node that receives the traffic flow from a network.
- - โ†’ [ node | traffic shaping ] - - โ†’ - - โ†’ - - โ†’ [ traffic policing | node ]

First, take the leaky bucket traffic shaper as an example. All incoming packets are first stored in a buffer, so that surges in arrivals are buffered and smoothed out. On the other side of the shaper, packets are output smoothly. The size of the buffer defines the maximum burst that can be accommodated. One possible problem is that it might be too restrictive: it does not allow variable-rate outgoing traffic, and the delay can be long for some applications.

Leaky bucket traffic shaper

[7].[6]...[5][4][3]..[2].[1] โ†’ [packet buffer] โ†’ [7][6][5][4][3][2][1]

A more realistic shaper is the token bucket traffic shaper, which only regulates packets that are non-conforming. Tokens are generated periodically at a constant rate and are stored in a token bucket. If the token bucket is full, arriving tokens are discarded.

A packet from the traffic shaping buffer can be transmitted only if it can take a token from the token bucket. If the token bucket is empty, arriving packets have to wait in the packet buffer. A token is a permit to transmit a packet.

Token bucket traffic shaper

(t)(t)(t)(t)(t)(t)(t)(t)(t)(t) โ†’ [token bucket]
                                       โ†“
[7].[6]...[5][4][3]..[2].[1] โ†’ [packet buffer] โ†’ [7][6][5][4][3][2][1]
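A discrete-time sketch of the token bucket shaper (the step size, token rate and bucket size are illustrative parameters):

```python
from collections import deque

def token_bucket_shape(arrivals, rate, bucket_size):
    """Simulate a token bucket shaper in discrete time steps.

    `arrivals[t]` packets arrive at step t; `rate` tokens are added per
    step up to `bucket_size`; one token is a permit to send one packet.
    Returns the number of packets transmitted at each step.
    """
    tokens, queue, departures = bucket_size, deque(), []
    for t, n in enumerate(arrivals):
        if t > 0:
            tokens = min(tokens + rate, bucket_size)  # refill the bucket
        queue.extend([t] * n)          # buffer arriving packets
        sent = 0
        while queue and tokens >= 1:   # take a token, transmit a packet
            queue.popleft()
            tokens -= 1
            sent += 1
        departures.append(sent)
    return departures

# A burst of 5 packets: the bucket (size 2) lets 2 through immediately,
# and the rest leave at the token rate of 1 per step.
print(token_bucket_shape([5, 0, 0, 0], rate=1, bucket_size=2))  # [2, 1, 1, 1]
```

Compared with the plain leaky bucket shaper, the token bucket permits a limited burst (the bucket size) at full speed instead of forcing a strictly constant output rate.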

Flow Aggregate Level

At the flow aggregate level, traffic management deals with many flows. It focuses on the routing of aggregate traffic flows across a network, for efficient utilization of resources and meeting of service levels. This level of management is usually called traffic engineering, and it takes place on a scale of minutes to days.



My Certificate

For more on Routing in Packet Networks, please refer to the wonderful course here https://www.coursera.org/learn/packet-switching-networks-algorithms

