Introduction to Internet Protocol

The TCP/IP protocol suite is designed to build a network of networks (the Internet) that can operate over multiple, coexisting, and heterogeneous network technologies. It provides ubiquitous connectivity through IP packet transfer. Besides the two well-known protocols, TCP (Transmission Control Protocol) and IP (Internet Protocol), the suite includes other protocols:

UDP User Datagram Protocol
ICMP Internet Control Message Protocol
IGMP Internet Group Management Protocol
ARP Address Resolution Protocol
other basic application protocols like HTTP, SMTP, DNS, RTP, etc.

The Protocol Data Unit (PDU) of a given layer is encapsulated in the Protocol Data Unit of the layer below.

HTTP layer	`[HTTP Request]`
TCP layer	`[TCP Header][HTTP Request]` TCP Header contains source and destination port numbers
IP layer	`[IP Header][TCP Header][HTTP Request]` IP Header contains source and destination IP address, and transport protocol type
Ethernet layer	`[Ethernet Header][IP Header][TCP Header][HTTP Request][FCS]` Ethernet Header contains source and destination MAC address, and network protocol type
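
The nesting above can be sketched in Python with plain byte strings. This is only a toy illustration: the header contents are placeholder labels (an assumption for readability), not real wire formats.

```python
# Toy encapsulation: each layer prepends its own header to the PDU it
# receives from the layer above. Header bytes are placeholder labels only.

def encapsulate(payload: bytes, header: bytes, trailer: bytes = b"") -> bytes:
    """Wrap a higher-layer PDU in a lower-layer header (and optional trailer)."""
    return header + payload + trailer

http_request = b"[HTTP Request]"
tcp_segment = encapsulate(http_request, b"[TCP Header]")
ip_packet = encapsulate(tcp_segment, b"[IP Header]")
ethernet_frame = encapsulate(ip_packet, b"[Ethernet Header]", b"[FCS]")

print(ethernet_frame.decode())
```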

When an IP packet is passed to a router, the router:

Verifies the header checksum and then checks the header fields (version, total length, etc.)
Identifies the next hop for the IP packet by consulting the routing table
Updates fields such as time-to-live and header checksum
Forwards the packet along the next hop
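
The checksum and time-to-live steps above can be sketched as follows. This is a minimal illustration, assuming a 20-byte IPv4 header without options; the function names `internet_checksum` and `forward` and the sample header bytes are chosen for this example only, and the routing table lookup is left out.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward(header: bytes) -> bytes:
    """Verify the checksum, decrement TTL, and recompute the checksum.

    Assumes a 20-byte header without options. Raises ValueError when the
    packet would have to be dropped.
    """
    if internet_checksum(header) != 0:       # a valid header sums to zero
        raise ValueError("bad header checksum")
    ttl = header[8]                          # TTL is the 9th byte of the header
    if ttl <= 1:
        raise ValueError("TTL expired")
    updated = bytearray(header)
    updated[8] = ttl - 1
    updated[10:12] = b"\x00\x00"             # zero the checksum field first
    struct.pack_into("!H", updated, 10, internet_checksum(bytes(updated)))
    return bytes(updated)

# Sample header (hypothetical packet from 192.168.0.1 to 192.168.0.199, TTL 64).
header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
out = forward(header)
print(out.hex())
```

Because the TTL changes at every hop, the checksum has to be recomputed at every hop as well, which is why both fields appear together in the list above.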

The routing decision is based on the destination IP address. The Internet Protocol (IP) layer provides a connectionless, best-effort delivery service to the transport layer (TCP or UDP). The IP layer tries its best, but it guarantees neither that a packet will be delivered to the destination nor any quality of service.

Internet Protocol

Each host (computer, server, etc.) on the Internet is identified by a globally unique IP address, which is divided into two parts. This two-part structure facilitates routing and keeps routing tables small.

`Network ID`	Assigned by the Internet service provider. Identifies the network that a host is connected to. All hosts connected to the same network have the same Network ID.
`Host ID`	Assigned by network administrators at the local site. Identifies a host in a network.

IPv4 addresses are usually written in dotted-decimal notation, like 192.168.0.1. The IPv4 address space is divided into five address classes:

Class A	`[0][ NetID ( 7-bits) ][ HostID (24-bits) ]` 126 networks with up to 16 million hosts each
Class B	`[1][0][ NetID (14-bits) ][ HostID (16-bits) ]` 16382 networks with up to 65534 hosts each
Class C	`[1][1][0][ NetID (21-bits) ][ HostID ( 8-bits) ]` 2 million networks with up to 254 hosts each
Class D	`[1][1][1][0][ Multicast address (28-bits) ]` Up to 2²⁸ (about 268 million) multicast groups at the same time
Class E	Used for experiments
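
Since the class is determined by the leading bits, it can be read off the first octet. A small sketch (the function name `address_class` is just for illustration):

```python
def address_class(address: str) -> str:
    """Return the classful address class (A to E) of a dotted-decimal IPv4 address."""
    first_octet = int(address.split(".")[0])
    if first_octet < 128:   # leading bit  0
        return "A"
    if first_octet < 192:   # leading bits 10
        return "B"
    if first_octet < 224:   # leading bits 110
        return "C"
    if first_octet < 240:   # leading bits 1110
        return "D"
    return "E"              # leading bits 1111

for sample in ("10.0.0.1", "150.100.0.0", "192.168.0.1", "224.0.0.1", "240.0.0.1"):
    print(sample, address_class(sample))
```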

IDs consisting of all ones or all zeroes are reserved for special purposes:

Host ID = 0s	An address whose Host ID is set to all zeroes refers to the network itself.
Host ID = 1s	A Host ID of all ones means the packet is broadcast to all hosts on the network specified by the Network ID.
Network ID = 1s	When the Network ID contains all ones, the packet is broadcast on the local network.

Specific ranges of IP addresses are designated for use in private networks. Routers in the public Internet discard packets with these addresses:

Range 1: 10.0.0.0 ~ 10.255.255.255
Range 2: 172.16.0.0 ~ 172.31.255.255
Range 3: 192.168.0.0 ~ 192.168.255.255
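
A quick way to test whether an address falls into one of these three ranges is Python's standard `ipaddress` module. The sketch below writes the ranges in prefix form (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16); the helper name `is_private` is chosen for this example.

```python
import ipaddress

# The three private ranges listed above, written as prefixes.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.0.0.0    ~ 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.0.0  ~ 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.0.0 ~ 192.168.255.255
]

def is_private(address: str) -> bool:
    """True if the address lies in one of the three private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)

print(is_private("172.31.255.255"))   # inside Range 2
print(is_private("172.32.0.0"))       # just outside Range 2
```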

Network Address Translation (NAT) is used to convert between private and global (public) IP addresses.

Subnet Addressing

The original classes for IP addressing have drawbacks: Class A and Class B networks might be too large for an organization, whereas Class C might be too small. How can one large network be split into several smaller parts for internal use while still acting like a single network to the outside?

The basic idea of subnet addressing is to add another hierarchical level called a subnet, which is invisible from outside the network:

A host outside of this network would still see the original address structure with two levels: Network ID and Host ID.
Inside of the network, administrators are free to choose any combination of Local Area Networks (LANs) for the subnets, which simplifies the management of multiple LANs within the network.

Original address:
[1][0][ Network ID ][       Host ID        ]

Subnetted address:
[1][0][ Network ID ][ Subnet ID ][ Host ID ]

Subnet mask:
[1][1][ 1........1 ][ 1.......1 ][ Host ID ]

IP address masking is used to find the Subnet ID. Consider an organization that has a Class B address with Network ID 150.100.0.0. The organization has many Local Area Networks, each consisting of no more than 100 hosts. Since 2⁷ – 2 = 126 > 100, 7 bits are sufficient for the Host ID within each subnet. The other 16 – 7 = 9 bits identify the subnets within the organization, giving 2⁹ – 2 = 510 subnets in total.

When a packet arrives at the network from the outside, the router determines the subnet by performing a bitwise logical AND between the subnet mask and the packet’s destination IP address. The result is called the subnet address, which routers use to forward packets to the correct subnet.

Subnetwork:
[ Subnet address ][ 0.....0 ]

Broadcast in the subnet:
[ Subnet address ][ 1.....1 ]

Range of IP addresses in the subnet:
From [ Subnet address ][ 0....01 ]
To   [ Subnet address ][ 1....10 ]
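
For the 150.100.0.0 example, 16 network bits plus 9 subnet bits give the mask 255.255.255.128. The masking can be sketched as below; the destination 150.100.12.176 is an arbitrary address picked for illustration, and the helper names are not from the course.

```python
import ipaddress

# 16 network bits + 9 subnet bits = 25 one-bits, leaving 7 host bits.
SUBNET_MASK = int(ipaddress.IPv4Address("255.255.255.128"))

def to_text(value: int) -> str:
    """Convert a 32-bit integer back to dotted-decimal notation."""
    return str(ipaddress.IPv4Address(value))

def subnet_info(destination: str) -> dict:
    dest = int(ipaddress.IPv4Address(destination))
    subnet = dest & SUBNET_MASK                        # host bits cleared
    broadcast = subnet | (~SUBNET_MASK & 0xFFFFFFFF)   # host bits all ones
    return {
        "subnet": to_text(subnet),
        "broadcast": to_text(broadcast),
        "first_host": to_text(subnet + 1),
        "last_host": to_text(broadcast - 1),
    }

info = subnet_info("150.100.12.176")
print(info)
```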

Routing with Subnets

The IP layer in hosts and the IP layer in routers maintain their routing tables, and work together to route packets from source to destination. Each row in a routing table contains:

Destination IP address
IP address of next-hop router
Physical address
Statistics information
Flags
- H = 1: route is to a host; H = 0: route is to a network
- G = 1: route is to a router; G = 0: route is to a directly connected destination

Each time a packet is to be routed, the routing table is searched in the following order:

Search for the destination address; if found, send as per next-hop and G flag
Search for the destination Network ID; if found, send as per next-hop and G flag
Search for a default router; if found, send as per next-hop
Declare the packet undeliverable and send an ICMP “host unreachable error” packet to the originating host
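
The search order above can be sketched as a small lookup function. The table layout (a dict of host routes, a list of network routes, and an optional default router) and all addresses are assumptions made for this example; a real routing table also carries the physical address and statistics. Returning `None` stands in for sending the ICMP error.

```python
import ipaddress

def next_hop(destination, host_routes, network_routes, default_router=None):
    """Return the IP address to hand the packet to, or None if undeliverable.

    host_routes:    {destination IP: (router IP, G flag)}
    network_routes: [(network prefix, router IP, G flag)]
    """
    dest = ipaddress.ip_address(destination)

    def resolve(router, g_flag):
        # G = 1: send to the router; G = 0: destination is directly connected.
        return router if g_flag else str(dest)

    if str(dest) in host_routes:                      # 1. exact host match
        return resolve(*host_routes[str(dest)])
    for prefix, router, g_flag in network_routes:     # 2. network match
        if dest in ipaddress.ip_network(prefix):
            return resolve(router, g_flag)
    if default_router is not None:                    # 3. default router
        return default_router
    return None                                       # 4. undeliverable

# Hypothetical table for a host on subnet 150.100.12.128/25.
host_routes = {"150.100.15.11": ("150.100.12.131", True)}
network_routes = [
    ("150.100.12.128/25", None, False),               # directly connected
    ("150.100.15.0/25", "150.100.12.130", True),
]
print(next_hop("150.100.15.20", host_routes, network_routes, "150.100.12.129"))
```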

Dividing the IP address space into Classes A, B, and C is inflexible and inefficient. In the long term, IPv6, with its much bigger address space, is the solution. In the short term, however, techniques such as CIDR, new address allocation policies, and network address translation (NAT) can help a lot.

Classless Inter-Domain Routing (CIDR)

Most organizations under-utilize Class A and B address space, yet need more addresses than a Class C address space can provide. There is also the routing table explosion problem.

CIDR was proposed to deal with these problems. CIDR enables the supernetting technique, which allows a single routing entry to cover a block of classful addresses. With CIDR, networks are represented by a prefix and a mask instead of the classful scheme. For instance, the prefix 205.100.0.0 with a 22-bit mask is written as 205.100.0.0/22, in which /22 means the network mask is 22 bits long:

1111 1111 1111 1111 1111 1100 0000 0000
|←        22 bits        →|

An entry in a CIDR routing table contains a 32-bit IP address and a 32-bit mask. Packets are routed according to the prefix, without regard to address classes. For instance, the following 4 consecutive /24 networks often use the same outgoing line, so CIDR aggregation can reduce the number of entries at the router.

Four networks:
128.56.24.0/24    1000 0000 0011 1000 0001 1000 0000 0000
128.56.25.0/24    1000 0000 0011 1000 0001 1001 0000 0000
128.56.26.0/24    1000 0000 0011 1000 0001 1010 0000 0000
128.56.27.0/24    1000 0000 0011 1000 0001 1011 0000 0000

Instead of 4 entries in routing table,
one entry is sufficient by CIDR:
128.56.24.0/22    1000 0000 0011 1000 0001 1000 0000 0000

The aggregated entry summarizes a consecutive group of /24 (Class C sized) networks, provided all of them use the same outgoing line. When multiple entries in the routing table match a given destination IP address, the Longest Prefix Match rule requires that the packet be routed using the most specific route.
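
Both ideas can be tried out with the standard `ipaddress` module. The sketch below first aggregates the four /24 networks, then applies Longest Prefix Match on a hypothetical two-entry table; the table contents and the "line 1"/"line 2" labels are assumptions for illustration.

```python
import ipaddress

# Aggregation: four consecutive /24 blocks collapse into one /22 entry.
blocks = [ipaddress.ip_network(f"128.56.{octet}.0/24") for octet in (24, 25, 26, 27)]
aggregated = list(ipaddress.collapse_addresses(blocks))
print(aggregated)

def longest_prefix_match(destination, table):
    """table maps prefix strings to outgoing lines; return the most specific match."""
    dest = ipaddress.ip_address(destination)
    parsed = {ipaddress.ip_network(prefix): line for prefix, line in table.items()}
    matches = [network for network in parsed if dest in network]
    if not matches:
        return None
    best = max(matches, key=lambda network: network.prefixlen)
    return parsed[best]

# Hypothetical table: the /24 entry is more specific than the /22 entry.
table = {"128.56.24.0/22": "line 1", "128.56.26.0/24": "line 2"}
print(longest_prefix_match("128.56.26.9", table))
```

A linear scan like this is fine for a sketch; real routers use specialized data structures for the lookup.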

Address Resolution Protocol (ARP)

Currently, Ethernet is the most common network that IP runs on. Ethernet uses 48-bit physical (MAC) addresses, so IP addresses need to be mapped to specific physical addresses. This is the job of the Address Resolution Protocol (ARP).

Suppose host H1 wants to send an IP packet to another host H3 but doesn’t know the MAC address of H3. H1 first broadcasts an ARP request packet asking the destination host, which is identified by H3’s IP address, to reply. All hosts in the network receive the packet, but only the intended host H3 responds to H1. The ARP response packet contains H3’s MAC and IP addresses. From then on, H1 caches H3’s MAC and IP addresses in its ARP table, so that it can simply look up H3’s MAC address in the table for future use.
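
The exchange between H1 and H3 can be mimicked with a toy simulation. Everything here (the `Host` class, the IP and MAC values, the in-memory "LAN" list) is made up for illustration; no real packets are sent.

```python
class Host:
    def __init__(self, ip: str, mac: str):
        self.ip = ip
        self.mac = mac
        self.arp_table = {}          # cached IP -> MAC mappings

    def handle_arp_request(self, target_ip: str):
        """Only the host that owns target_ip answers with its MAC address."""
        return self.mac if target_ip == self.ip else None

    def resolve(self, target_ip: str, lan: list):
        if target_ip in self.arp_table:           # cache hit, no broadcast
            return self.arp_table[target_ip]
        for host in lan:                           # "broadcast" to every other host
            if host is self:
                continue
            reply = host.handle_arp_request(target_ip)
            if reply is not None:
                self.arp_table[target_ip] = reply  # cache for future use
                return reply
        return None                                # nobody answered

h1 = Host("192.168.0.1", "aa:aa:aa:aa:aa:01")
h2 = Host("192.168.0.2", "aa:aa:aa:aa:aa:02")
h3 = Host("192.168.0.3", "aa:aa:aa:aa:aa:03")
lan = [h1, h2, h3]

print(h1.resolve("192.168.0.3", lan))
```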

Fragmentation and Reassembly

IP can work on a variety of physical networks; however, each physical network usually imposes a packet size limit called the Maximum Transmission Unit (MTU).

When IP wants to send a packet that is larger than the MTU of the physical network, it must break the packet into smaller fragments that fit. Fragmentation can be done at a source host or at an intermediate router. The IP layer at the destination is responsible for reassembling the fragments into the original packet. To do this, the destination waits until it has received all the fragments belonging to the same packet. If one or more fragments are lost in the network, the destination abandons the reassembly process.

In the IP packet header, 3 fields are used for fragmentation and reassembly:

Identification	Identifies which packet a particular fragment belongs to.
Flags	If the “don’t fragment” bit is set to one, routers must not fragment the packet. If the “more fragments” bit is set to one, the destination host knows that more fragments follow.
Fragment Offset	Identifies the location of a fragment in the packet. The value is the offset, in units of 8 bytes, between the beginning of the packet's data and the beginning of the fragment's data.
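
The offset arithmetic can be made concrete with a short sketch. It assumes a 20-byte header without options that is repeated in every fragment; the function name `fragment` and the 4000-byte / MTU 1500 figures are chosen for this example.

```python
def fragment(total_length: int, mtu: int, header_length: int = 20):
    """Split a packet into (offset in 8-byte units, fragment length, more-fragments bit)."""
    data_length = total_length - header_length
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_length) // 8 * 8
    fragments = []
    sent = 0
    while sent < data_length:
        size = min(max_data, data_length - sent)
        more = 1 if sent + size < data_length else 0
        fragments.append((sent // 8, size + header_length, more))
        sent += size
    return fragments

# A 4000-byte packet (20-byte header + 3980 data bytes) over an MTU of 1500.
for offset, length, more in fragment(4000, 1500):
    print(f"offset={offset:3d}  length={length:4d}  more_fragments={more}")
```

All three fragments would carry the same Identification value, which is how the destination knows they belong together.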

Dynamic Host Configuration Protocol (DHCP)

DHCP automatically configures hosts that connect to a TCP/IP network. It was built on top of the Bootstrap Protocol (BOOTP) to deliver configuration information to a host. The server uses UDP port 67, and the client uses UDP port 68. DHCP is used extensively by Internet service providers to assign temporary IP addresses to hosts and maximize the usage of their limited IP address space.

When a host wishes to obtain an IP address:

The host broadcasts a DHCP discover message in its physical network.
The DHCP servers in the network respond with a DHCP offer message that provides an IP address and other configuration information.
The host selects one of the offers and broadcasts a DHCP request message that indicates the ID of the selected server.
The selected DHCP server then allocates the given IP address to the host and sends a DHCP ack message assigning the IP address to the host for some period, called the lease time.
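
The four messages can be walked through with a toy in-memory model. This is only a sketch of the message flow: the `DhcpServer` class, the address pools, the client MAC, and the 3600-second lease are all assumptions for illustration, and no real DHCP packets are involved.

```python
class DhcpServer:
    def __init__(self, server_id, pool, lease_seconds=3600):
        self.server_id = server_id
        self.pool = list(pool)            # addresses still available
        self.lease_seconds = lease_seconds
        self.leases = {}                  # client MAC -> leased IP

    def handle_discover(self, client_mac):
        """Answer a discover with an offer, if an address is available."""
        if not self.pool:
            return None
        return {"server_id": self.server_id, "ip": self.pool[0]}

    def handle_request(self, client_mac, server_id, ip):
        """Only the selected server allocates the address and sends an ack."""
        if server_id != self.server_id or ip not in self.pool:
            return None
        self.pool.remove(ip)
        self.leases[client_mac] = ip
        return {"ip": ip, "lease_seconds": self.lease_seconds}

def obtain_address(client_mac, servers):
    # Discover is broadcast; every server may answer with an offer.
    offers = [server.handle_discover(client_mac) for server in servers]
    offers = [offer for offer in offers if offer is not None]
    if not offers:
        return None
    chosen = offers[0]                    # the host selects one offer
    # The request is broadcast too, so the other servers see they were not chosen.
    replies = [server.handle_request(client_mac, chosen["server_id"], chosen["ip"])
               for server in servers]
    return next((reply for reply in replies if reply is not None), None)

server_a = DhcpServer("A", ["192.168.0.100", "192.168.0.101"])
server_b = DhcpServer("B", ["192.168.0.200"])
ack = obtain_address("aa:bb:cc:dd:ee:01", [server_a, server_b])
print(ack)
```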

Network Address Translation (NAT)

NAT refers to a method for mapping packets from hosts in private networks into packets that can traverse the Internet. It also transfers packets arriving from the public / global Internet to the appropriate destination host in the private network.

A NAT router acts as an agent between a private network and a public network, so that a number of hosts can share a limited number of registered IP addresses. The NAT router maintains a table for mapping packets from the private network into the Internet and back.

In the example below, 128.100.10.15 is a registered global IP address. Each time a host (say, 192.168.0.10) in the private network generates a packet destined for the Internet, a new entry is created in the NAT router table. The entry contains:

the private IP address 192.168.0.10 of the host as well as port number, say xxx
the registered global IP address 128.100.10.15 as well as another port number that is not in use, say yyy

The NAT router then sends the packet into the Internet with the registered global IP address, 128.100.10.15 in this example, as its source address.

[ Private IP addresses ]            [ Public IP address ]
    192.168.0.10:xxx     ⟺ NAT ⟺   128.100.10.15:yyy
    192.168.0.13:zzz                  128.100.10.15:sss

When the response packets arrive, although they have the same global IP address 128.100.10.15 as the destination, the port number (yyy or sss) is used to retrieve the original private IP address and port number by looking at the table (192.168.0.10:xxx or 192.168.0.13:zzz respectively). So packets can then be delivered to the appropriate host.
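
The mapping table can be sketched as two dictionaries, one per direction. The `NatRouter` class, the starting port 40000, and the private port numbers are assumptions for illustration; a real NAT also tracks the protocol and expires idle entries.

```python
class NatRouter:
    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}   # (private IP, private port) -> public port
        self.inbound = {}    # public port -> (private IP, private port)

    def translate_outbound(self, private_ip: str, private_port: int):
        """Map a private source to the shared public IP and an unused port."""
        key = (private_ip, private_port)
        if key not in self.outbound:              # create a new table entry
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_inbound(self, public_port: int):
        """Recover the private destination of a response packet, if known."""
        return self.inbound.get(public_port)

nat = NatRouter("128.100.10.15")
print(nat.translate_outbound("192.168.0.10", 5000))
print(nat.translate_outbound("192.168.0.13", 6000))
print(nat.translate_inbound(40001))
```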

In theory, one public IP address can support up to 2¹⁶ private IP addresses, because the port number is 16 bits long. However, the overhead of running the NAT operation must be considered. One potential problem is that NAT operates at the IP layer but relies on TCP/UDP port numbers (from the transport layer above) for its lookup table. This violates the OSI layered architecture, in which a higher layer uses the services provided by the lower layer, but not vice versa.

IPv6

IPv4 has played an essential role in the Internet; however, its 32-bit addresses cannot accommodate the Internet’s explosive growth. There are two major changes from IPv4 to IPv6:

A longer address field. IPv6 uses 128 bits for addressing, which can support about 3.4 x 10³⁸ addresses.
Simplified header format. All fields are of fixed size in IPv6.
- Fields kept: Version.
- Fields dropped: Header length, ID, Flags, Fragment offset, Header checksum, etc.
- Fields replaced:
  - Datagram length → Payload length, which indicates the length of the data excluding the header. 16 bits are allocated to this field, so the payload length is limited to 65535 bytes. Larger payloads can be sent by using an extension header.
  - Protocol type → Next header which is used to chain extension headers.
  - TTL → Hop limit
  - TOS → Traffic class to support differentiated services
- Fields added: Flow label, which identifies the quality of service requested by the packet. In IPv6, a flow is defined as a sequence of packets from a particular source to a particular destination for which the source requires special handling by the intervening routers.

IPv6 addresses use hexadecimal notation, and are divided into three categories:

Unicast addresses
Multicast addresses
Anycast addresses
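
The size of the address space and the hexadecimal notation can be explored with the standard `ipaddress` module. The two sample addresses are illustrative picks: one from the 2001:db8::/32 documentation prefix and the all-nodes multicast group. Anycast addresses are not shown because they look like ordinary unicast addresses.

```python
import ipaddress

address_space = 2 ** 128                          # number of 128-bit addresses
unicast = ipaddress.IPv6Address("2001:db8::1")    # example unicast address
multicast = ipaddress.IPv6Address("ff02::1")      # example multicast address

print(f"{address_space:.1e}")    # size of the address space, scientific notation
print(unicast.exploded)          # all eight 16-bit groups written out in hex
print(unicast.compressed)        # zero groups collapsed with "::"
print(multicast.is_multicast)
```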

Extension Headers

To support extra functionalities that are not provided by the basic header, IPv6 allows an arbitrary number of extension headers to be placed between the basic header and the payload. Extension headers act like options in IPv4, but are more efficient and flexible.

By using an extension header, IPv6 allows a payload of more than 64 KB, called a jumbo packet. Larger payloads are driven by high-speed networks, big data applications, and supercomputing applications.

Fragmentation

IPv6 allows only the source host to perform fragmentation; intermediate routers are not allowed to fragment. The rationale is to speed up forwarding at the intermediate routers. If the packet length is greater than the MTU of the next network, a router simply discards the packet and sends an error message back to the source.

A source host can find the smallest MTU along the path from the source to the destination by performing a path MTU discovery procedure. One disadvantage is that the path between the source and the destination must remain reasonably static.
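
The discovery procedure can be imitated with a toy loop: the source keeps probing, and each time a link rejects the packet as too big, the source retries with that link's MTU. The function name and the list of link MTUs are hypothetical; real discovery relies on the routers' error messages rather than a known list.

```python
def discover_path_mtu(link_mtus):
    """Return (path MTU, number of probes) for a path given as a list of link MTUs."""
    size = link_mtus[0]              # start with the MTU of the first link
    probes = 0
    while True:
        probes += 1
        # The first link that cannot carry the probe reports its own MTU.
        too_big = next((mtu for mtu in link_mtus if size > mtu), None)
        if too_big is None:
            return size, probes      # the probe reached the destination
        size = too_big               # retry with the reported MTU

path = [1500, 1400, 1280, 1500]      # hypothetical MTUs along the path
print(discover_path_mtu(path))
```

The result equals the smallest MTU on the path, which is why a route change can invalidate it.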

My Certificate

For more on Introduction to Internet Protocol, please refer to the wonderful course here https://www.coursera.org/learn/tcp-ip-advanced

My 131st certificate from Coursera

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai