ICOM6012 Network Layer

Overview

Services and protocols

transport segment from sending to receiving host
network layer protocols in every Internet device, including hosts and routers
IP provides best-effort services only
two functions
- forwarding (data plane): local action, move arriving packets from router's input link to appropriate router output link
- routing (control plane): global action, routing algorithms determine the end-to-end source-destination paths
  - centralized (e.g. telephone network)
    - "emerging" approach under the context of Software-Defined Networking (SDN)
    - routing is done by controller (a centralized server)
      - Q: Why can different routing algorithms be swapped in easily?
      - A: The controller alone determines the routing paths, so we can choose path x this time (the shortest way) and path y next time (the lowest delay way). The routers do not have to collaborate, so nothing needs to change inside them.
    - the controller determines the paths (based on various packet header fields), and configures the forwarding tables at routers
    - In SDN, routers are called "OpenFlow switches", because the routing function is done by the controller
      - Q: Can we use this method in the whole network?
      - A: No, not across the whole Internet. This method is only practical in a small network, such as a campus network, enterprise network or datacenter network.
    - switches and controller should obey the OpenFlow Specification (standard)
    - routing can be designed in software (programmable: you control the routing paths yourself)
  - distributed
    - routers collaborate with each other to find (shortest) paths (based on destination), and configure their own forwarding tables accordingly
    - self-healing
    - routing protocol is implemented inside routers
    - network operator lacks control of routing paths

Forwarding

What is inside a router

router architecture overview
- input port functions
  - decentralized switching (the red one)
    - given the datagram's dest, look up the output port using the forwarding table in input port memory
    - goal: complete input port processing at "line rate/speed" (use the full link bandwidth, so the port does not become the bottleneck)
    - queuing: if datagrams arrive faster than forwarding rate into switch fabric
- switching fabrics
  - memory: the packet is copied into the router's memory and then out to the output port
  - bus: a shared bus, just like broadcasting
  - crossbar: lots of buses, just like a small switch
    - connect or disconnect individual crosspoints so that packets can be transmitted in parallel
  - some high-speed routers combine different techniques
  - switching rate: rate at which packets can be transferred from inputs to outputs
    - output-queued switch: switching rate = N times the line rate
      - allows N packets (each arriving at line rate R) to reach one output port at the same time
      - a packet is output as soon as it arrives (ideal), but this is expensive
    - input-queued switch: switching rate = line rate
      - only a single packet at a time can go to one output port, the others wait (if a buffer overflows, TCP will recover the loss)
      - in practice, this method is used
- output ports
  - buffering required when datagrams arrive from fabric faster than the transmission rate
  - scheduling discipline chooses among queued datagrams for transmission

IP: the Internet protocol

routing protocols and ICMP all rely on IP, so IP is the "only" standard in the network layer
datagram format
- ver: 4 bits, e.g. 4 for IPv4, 6 for IPv6
- header length: 4 bits, counted in units of 4 bytes (just like TCP), the minimum header size is 20 bytes
- type of service: 8 bits, selects which type of service should be used, because the designers believed IP could do more than best effort (but most people do not use this)
- length: 16 bits, so theoretically the maximum size of a datagram is \(2^{16}-1\) = 65535 bytes
  - actually smaller than that, because the link-layer frame has its own size limit
- 16-bit identifier: tells which fragments belong to the same original datagram
- flags: mark the status of a fragment, e.g. whether this fragment is the last one of the original datagram
- fragment offset: used to reassemble the original datagram in order
  - many routers may not support fragmentation
  - designers try to avoid fragmentation altogether
  - IPv6 removes the 16-bit identifier, flags and fragment offset from the base header
    - fragmentation is done by the sending host, not by routers
- time to live: 8 bits, set by the sending host (at most 255)
  - each router decrements it by 1, and when it reaches 0 the datagram is dropped
  - prevents looping
- upper layer: identifies the protocol that receives the payload (links the network layer to the layer above)
  - 6: TCP
  - 17: UDP
  - 89: OSPF
- header checksum: error detection on a hop-by-hop basis (it must be recomputed at every hop because time to live changes), computed over the header only (checked before forwarding)
  - the checksums of TCP and UDP are end-to-end and cover the whole segment
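
The fields above can be exercised with a small sketch. It packs a 20-byte IPv4 header without options, verifies the header checksum, and then imitates one router hop (decrement time to live, recompute the checksum). The function names, the identifier value and the addresses in the usage lines are illustrative assumptions, not part of the notes.

```python
import struct
from typing import Optional

HEADER_FORMAT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header, no options

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of all 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_header(src: bytes, dst: bytes, ttl: int, protocol: int, payload_len: int) -> bytes:
    fields = [
        (4 << 4) | 5,      # version 4, header length 5 x 4 bytes = 20 bytes
        0,                 # type of service
        20 + payload_len,  # total datagram length
        0x1234,            # identifier (arbitrary value for this sketch)
        0,                 # flags + fragment offset: not fragmented
        ttl,
        protocol,          # upper layer, e.g. 6 = TCP, 17 = UDP
        0,                 # checksum placeholder
        src,
        dst,
    ]
    fields[7] = internet_checksum(struct.pack(HEADER_FORMAT, *fields))
    return struct.pack(HEADER_FORMAT, *fields)

def parse_header(header: bytes) -> dict:
    (ver_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, checksum, _src, _dst) = struct.unpack(HEADER_FORMAT, header[:20])
    return {
        "version": ver_ihl >> 4,
        "header_bytes": (ver_ihl & 0x0F) * 4,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
    }

def forward_one_hop(header: bytes) -> Optional[bytes]:
    """What one router does to the header; None means the datagram is dropped."""
    if internet_checksum(header[:20]) != 0:  # a valid header sums to zero
        return None
    fields = list(struct.unpack(HEADER_FORMAT, header[:20]))
    fields[5] -= 1                           # decrement time to live
    if fields[5] <= 0:
        return None
    fields[7] = 0                            # checksum must be recomputed
    fields[7] = internet_checksum(struct.pack(HEADER_FORMAT, *fields))
    return struct.pack(HEADER_FORMAT, *fields)

header = build_header(bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]), ttl=2, protocol=6, payload_len=100)
print(parse_header(header))
print(parse_header(forward_one_hop(header)))
```

The checksum differs between the two printed headers because time to live changed, which is why it can only be a hop-by-hop check.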

IP addressing

IP address is associated with interface, not host
- multiple interfaces mean multiple IP addresses
- interface: connection between host/router and physical/wireless link
  - router has multiple interfaces, typically
  - host has one or two interfaces, typically (e.g. wired Ethernet, WiFi, Bluetooth)
  - Q: How are interfaces actually connected?
  - A: Ethernet switches, WiFi base station, etc.
subnet: a network inside a network
- device interfaces in the same subnet can communicate with each other without going through a router
- each subnet should have a unique subnet address
- each forwarding table entry corresponds to a subnet, or a range of addresses, to keep the forwarding table simple
  - otherwise the forwarding table would become the bottleneck, because every route lookup has to search it
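
A minimal sketch of a forwarding table with one entry per subnet. The prefixes and output port numbers are made up for illustration, and the tie-breaking rule (the longest matching prefix wins) is an assumption the notes do not state explicitly, although it is the rule routers normally apply.

```python
import ipaddress

# Hypothetical forwarding table: (subnet, output port)
FORWARDING_TABLE = [
    (ipaddress.ip_network("223.1.1.0/24"), 0),
    (ipaddress.ip_network("223.1.2.0/24"), 1),
    (ipaddress.ip_network("223.1.0.0/16"), 2),
    (ipaddress.ip_network("0.0.0.0/0"), 3),   # default route
]

def lookup(dest: str) -> int:
    """Return the output port of the longest prefix that contains dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(net.prefixlen, port) for net, port in FORWARDING_TABLE if addr in net]
    return max(matches)[1]  # never empty: the default route matches everything

for dest in ("223.1.2.5", "223.1.9.9", "8.8.8.8"):
    print(dest, "-> port", lookup(dest))
```

Four entries cover every possible destination here, which is the point of keeping one entry per subnet rather than one per host.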
structure
- subnet part: high order bits (subnet mask)
- host part: remaining low order bits
  - within a subnet, host address must be unique
old days: classful addressing
CIDR (classless interdomain routing)
- subnet portion of address of arbitrary length
- address format: a.b.c.d/x, where x is the number of bits in the subnet portion of the address
- special address: 0.0.0.0 means the host has no IP address yet
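
The a.b.c.d/x notation can be explored with Python's standard ipaddress module. This is only a sketch: the address 223.1.2.5/24 and the helper same_subnet are chosen for illustration.

```python
import ipaddress

iface = ipaddress.ip_interface("223.1.2.5/24")
net = iface.network

print(net)                # subnet part: 223.1.2.0/24
print(net.netmask)        # subnet mask: 255.255.255.0
print(net.num_addresses)  # 2**(32 - 24) addresses in the block
print(int(iface.ip) & int(net.hostmask))  # host part of the address

def same_subnet(a: str, b: str, prefix_len: int) -> bool:
    """True if a and b agree on the first prefix_len bits."""
    net_a = ipaddress.ip_interface(f"{a}/{prefix_len}").network
    return ipaddress.ip_address(b) in net_a

print(same_subnet("223.1.2.5", "223.1.2.77", 24))
print(same_subnet("223.1.2.5", "223.1.3.1", 24))
```

Changing the prefix length changes the answer: with /16 the last pair falls in the same subnet, with /24 it does not.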
how to get IP addresses
- host part: DHCP (dynamic host configuration protocol)
  - a DHCP server sits in the subnet (e.g. at home, the WiFi router runs a DHCP server), so the user does not need to configure anything
  - overview (DORA: discover, offer, request, ack)
    - host broadcasts "DHCP discover" msg (optional: when renewing after a lease timeout, just jump to the last two steps)
      - broadcast
      - src: 0.0.0.0 68
      - dest: 255.255.255.255 67
    - DHCP server responds with "DHCP offer" msg (optional, several servers may respond, but the host accepts only one offer)
      - broadcast
      - src: 223.1.2.5 67
      - dest: 255.255.255.255 68
      - lifetime: 3600s
    - host requests IP address: "DHCP request" msg
      - broadcast (so the other DHCP servers learn which offer was accepted and that their own offers were not)
      - src: 0.0.0.0 68
      - dest: 255.255.255.255 67
      - lifetime: 3600s
    - DHCP server sends address: "DHCP ack" msg
      - broadcast
      - src: 223.1.2.5 67
      - dest: 255.255.255.255 68
      - lifetime: 3600s
  - other information returned by DHCP
    - IP address of first-hop router
    - name and IP address of DNS server (e.g. dns.google.com -> 8.8.8.8, it is free because the operator collects data)
    - subnet mask (indicating network versus host portion of address)
      - it determines whether a packet can be delivered directly or needs to go through a router
- subnet part: from ISP/ICANN -> public IP addresses (unique on the Internet)
  - but when setting up a new WiFi router, the network part comes from the private IP address ranges (unique only within the home network, so your IP address can be the same as someone else's)
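
The DORA exchange listed above can be written down as a message sequence. This is a toy model, not a DHCP implementation: the class and function names are invented, the offered address 223.1.2.4 is a hypothetical value, and the renewal case simply reuses the broadcast addressing from the notes, which is a simplification.

```python
from dataclasses import dataclass

BROADCAST = "255.255.255.255"
CLIENT_PORT, SERVER_PORT = 68, 67

@dataclass
class DhcpMessage:
    kind: str
    src: str
    src_port: int
    dst: str
    dst_port: int
    offered_ip: str = ""
    lifetime_s: int = 0

def dora(server_ip="223.1.2.5", offered_ip="223.1.2.4", lifetime_s=3600, renewing=False):
    """Return the messages exchanged; a renewal skips discover and offer."""
    msgs = []
    if not renewing:
        msgs.append(DhcpMessage("discover", "0.0.0.0", CLIENT_PORT, BROADCAST, SERVER_PORT))
        msgs.append(DhcpMessage("offer", server_ip, SERVER_PORT, BROADCAST, CLIENT_PORT,
                                offered_ip, lifetime_s))
    msgs.append(DhcpMessage("request", "0.0.0.0", CLIENT_PORT, BROADCAST, SERVER_PORT,
                            offered_ip, lifetime_s))
    msgs.append(DhcpMessage("ack", server_ip, SERVER_PORT, BROADCAST, CLIENT_PORT,
                            offered_ip, lifetime_s))
    return msgs

for m in dora():
    print(f"{m.kind:8} {m.src}:{m.src_port} -> {m.dst}:{m.dst_port}")
```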

NAT (network address translation)

used in routers
- uses a port number (must be unique, so no other connection can use it) to match the two sides of a connection
translates a set of IP addresses to another set of IP addresses (using a translation table)
helps preserve the limited supply of public IPv4 addresses (by using private IP addresses internally)
- public IP addresses
  - publicly registered
  - directly access the Internet with a public IP address
- private IP addresses
  - not publicly registered
  - cannot directly access the Internet with a private IP address
  - only used internally, these IP addresses cannot be seen on the Internet or by its routers
  - if a packet on the Internet contains one of these IP addresses, it may be considered an error and dropped immediately

name	start IP address	end IP address	subnet	remark
24-bit block	10.0.0.0	10.255.255.255	10.0.0.0/8	Apple uses this
20-bit block	172.16.0.0	172.31.255.255	172.16.0.0/12	not many use this
16-bit block	192.168.0.0	192.168.255.255	192.168.0.0/16	most use this, e.g. ASUS
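
The start and end addresses in the table follow directly from the subnet column, which can be checked with a short sketch using the standard ipaddress module. The helper name is_private_address is an illustrative choice.

```python
import ipaddress

PRIVATE_BLOCKS = {
    "24-bit block": ipaddress.ip_network("10.0.0.0/8"),
    "20-bit block": ipaddress.ip_network("172.16.0.0/12"),
    "16-bit block": ipaddress.ip_network("192.168.0.0/16"),
}

for name, net in PRIVATE_BLOCKS.items():
    # net[0] is the first address of the block, net[-1] the last
    print(name, net[0], net[-1], net)

def is_private_address(addr: str) -> bool:
    """True if addr falls inside one of the three private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_BLOCKS.values())

print(is_private_address("192.168.1.10"))
print(is_private_address("8.8.8.8"))
```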

advantage: good for security, since it acts like a firewall, which is why NAT may still be used even after moving to IPv6
when outside hosts want to communicate with an internal host
- DDNS
- configure the NAT translation table in advance (the router allows you to do that), called port forwarding
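
A toy NAT translation table, to make the port matching and port forwarding concrete. This is a sketch under stated assumptions: the class name, the starting port 5001 and all addresses are hypothetical, and a real NAT also tracks the remote endpoint, the protocol and timeouts.

```python
class Nat:
    def __init__(self, public_ip: str, first_port: int = 5001):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}   # (private ip, private port) -> public port
        self.back = {}  # public port -> (private ip, private port)

    def outbound(self, src_ip: str, src_port: int):
        """Rewrite the source of an outgoing packet, adding a table entry if needed."""
        key = (src_ip, src_port)
        if key not in self.out:
            port = self.next_port
            while port in self.back:  # the public port must be unique
                port += 1
            self.next_port = port + 1
            self.out[key] = port
            self.back[port] = key
        return (self.public_ip, self.out[key])

    def inbound(self, dst_port: int):
        """Find the internal destination; None means no entry, so drop."""
        return self.back.get(dst_port)

    def port_forward(self, public_port: int, private_ip: str, private_port: int):
        """Entry configured in advance so outside hosts can start a connection."""
        self.back[public_port] = (private_ip, private_port)

nat = Nat("138.76.29.7")
print(nat.outbound("10.0.0.1", 3345))  # first internal host
print(nat.outbound("10.0.0.2", 3345))  # same private port, different public port
print(nat.inbound(80))                 # no entry yet
nat.port_forward(80, "10.0.0.3", 8080)
print(nat.inbound(80))
```

Unsolicited inbound packets find no table entry and are dropped, which is the firewall-like behaviour mentioned above.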
Q: Is the IP address shown on your iPhone really the phone's own (public) IP address?
- A: No. This IP address is assigned by the WiFi router, on an iPhone you can dial *3001#12345#* to check.

IPv6
- differences between IPv4 and IPv6
- motivations
  - to solve the IPv4 address space shortage problem
  - to speed up packet processing/forwarding (by flow label)
  - to facilitate QoS
- transition from IPv4 to IPv6
  - use tunneling: IPv6 datagram carried as payload in IPv4 datagram among IPv4 routers
  - set the upper layer protocol field (in the IPv4 header) to 41, so the router at the end of the tunnel knows the IPv4 datagram carries an IPv6 datagram
  - problems
    - more overhead
    - packets may become too big and be fragmented by a router
      - consider that the maximum payload (MTU) of an Ethernet frame is 1500 bytes
  - example
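
As a rough numeric illustration of the overhead problem, the sketch below adds the outer IPv4 header to an IPv6 datagram and compares the result with the 1500-byte limit. It assumes the minimum 20-byte IPv4 header without options, and the function names are made up.

```python
ETHERNET_MTU = 1500   # bytes available for the IP datagram in one frame
IPV4_HEADER = 20      # minimum IPv4 header, no options (assumption)

def tunneled_size(ipv6_datagram_len: int) -> int:
    """Size of the IPv4 datagram that carries the whole IPv6 datagram as payload."""
    return ipv6_datagram_len + IPV4_HEADER

def fits_without_fragmentation(ipv6_datagram_len: int, mtu: int = ETHERNET_MTU) -> bool:
    return tunneled_size(ipv6_datagram_len) <= mtu

print(tunneled_size(1500))               # 1520 bytes, more than one frame can carry
print(fits_without_fragmentation(1500))  # False
print(ETHERNET_MTU - IPV4_HEADER)        # 1480: largest IPv6 datagram that still fits
```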

Routing

Classification overview

Link-state routing

net topology, link costs known to all nodes
- via "link state broadcast": each router knows its own neighbours by configuration and broadcasts this link state to all other routers
- all nodes have same info
each node computes its shortest paths to all other nodes using Dijkstra's algorithm
- based on the shortest paths found, configure the local forwarding table
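
A compact sketch of this computation: Dijkstra's algorithm run at one node, returning both the distances and the first hop towards every destination, which is what goes into the local forwarding table. The six-node topology and its link costs are invented for illustration, and costs are assumed to be non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Return (distance to each node, first hop from source towards each node)."""
    dist = {source: 0}
    first_hop = {}
    done = set()
    heap = [(0, source, None)]  # (distance, node, first hop used to get there)
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in done:
            continue            # stale entry, a shorter path was already settled
        done.add(node)
        if hop is not None:
            first_hop[node] = hop
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                # leaving the source, the neighbour itself is the first hop
                heapq.heappush(heap, (nd, nbr, nbr if node == source else hop))
    return dist, first_hop

# Hypothetical topology: every node holds the same map after link-state flooding
graph = {
    "u": {"v": 2, "x": 1, "w": 5},
    "v": {"u": 2, "x": 2, "w": 3},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"x": 1, "w": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}

dist, forwarding_table = dijkstra(graph, "u")
print(dist)
print(forwarding_table)  # destination -> next hop, as seen from u
```

Every router runs the same computation on the same map, each with itself as the source.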
OSPF (open shortest path first)
- open: publicly available (in the past, Cisco might otherwise have monopolized the market)
- link state routing
  - LS packet dissemination
  - topology map at each node
  - route computation using Dijkstra's algorithm
- OSPF advertisement message carries one entry per neighbour
- advertisement flooded to entire network
  - directly over IP (rather than TCP or UDP) with "upper layer = 89"
- reliability (although it runs directly over IP)
  - by retransmission, just like DNS

Distance vector routing

distance = hop count
vector = next hop
by periodically exchanging distance vectors (DVs) with neighbours, each router knows neighbours' distance to destinations
each router uses Bellman-Ford algorithm to refine its own DVs.
- e.g., using neighbour with the shortest distance to a destination as next hop
- "routing by rumors"
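
The exchange can be simulated in a few lines. The sketch below applies the Bellman-Ford update D_x(y) = min over neighbours v of (c(x, v) + D_v(y)) in synchronous rounds until nothing changes. The four-router line topology is made up, every link has cost 1 (hop count), and real protocols exchange vectors asynchronously.

```python
INF = float("inf")

def dv_round(links, tables):
    """One exchange: every router rebuilds its table from its neighbours' vectors."""
    new_tables = {}
    for node, nbrs in links.items():
        table = {node: (0, node)}  # destination -> (distance, next hop)
        for dest in links:
            if dest == node:
                continue
            best = (INF, None)
            for nbr, cost in nbrs.items():
                d = cost + tables[nbr].get(dest, (INF, None))[0]
                if d < best[0]:
                    best = (d, nbr)
            if best[0] < INF:
                table[dest] = best
        new_tables[node] = table
    return new_tables

def converge(links):
    """Repeat rounds until no table changes; return tables and rounds needed."""
    tables = {n: {n: (0, n)} for n in links}  # initially each router knows only itself
    rounds = 0
    while True:
        new_tables = dv_round(links, tables)
        if new_tables == tables:
            return tables, rounds
        tables, rounds = new_tables, rounds + 1

# Hypothetical topology A - B - C - D, each link costs one hop
links = {
    "A": {"B": 1},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"C": 1},
}

tables, rounds = converge(links)
print(tables["A"])
print("rounds until stable:", rounds)
```

Router A never sees the topology. It only learns that B claims to reach D in two hops, which is the sense of "routing by rumors".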
LS vs DV
- in LS, every router has the global topology and information, and calculates the paths by itself
- in DV, routers just trust each other and share information, so each router only knows what its neighbours know
- the resulting paths may be the same
- message complexity
  - LS: with n nodes & E links, O(nE) messages sent
  - DV: exchange between neighbours only
- speed of convergence
  - LS: relatively fast
  - DV: convergence time varies
- robustness
  - LS: a node can advertise an incorrect link cost, but each node computes only its own table
  - DV: a node can advertise an incorrect path cost, and every other node's table trusts and uses it (error propagation)
RIP (routing information protocol)
- distance metric: the number of hops (max = 15), each link has cost 1
- DVs exchanged with neighbours every 30 sec in advertisement messages
- advertisements sent in UDP segments

Hierarchical routing

aggregate routers into regions, called "autonomous systems" (AS)
- each AS is assigned a unique AS number (originally 16 bits, now extended to 32 bits because of shortage)
- routers in same AS run same routing protocol
  - "intra-AS" routing protocol, e.g. OSPF, RIP
  - routers in different ASes can run different intra-AS routing protocols
- ASes must be interconnected via gateway routers
- forwarding table is configured by both the intra- and inter-AS routing algorithms
  - intra-AS sets entries for internal dests
  - inter-AS & intra-AS sets entries for external dests
  - when several ASes can reach a destination, we can use hot potato routing (the shortest way out) or follow a policy (e.g. for a content provider), etc.
  - two levels are enough today, but it is hard to say for the future

BGP (border gateway protocol)
- the version in use is version 4 (BGP-4)
- glue that holds the Internet/ASes together
- tasks
  - for outbound traffic (how to find a suitable way to other ASes)
    - obtain subnet reachability information from neighbouring ASes
    - propagate reachability information to all AS-internal routers
    - determine good routes to other networks based on reachability information and policy
  - for inbound traffic
    - advertise subnets that the AS can help to reach
    - hosts in outside ASes need to reach my AS, so make sure the hosts of my AS are visible
- BGP session: two BGP routers ("peers") exchange BGP messages
  - advertising paths to different subnets ("path vector" protocol)
  - exchange over TCP connections (server port 179)
    - reliable data transfer
    - permanent connection, so the connection setup overhead is paid only once
  - category
    - eBGP (external BGP)
      - logical and TCP connection
      - direct link
    - iBGP (internal BGP)
      - logical and TCP connection
      - share information (purpose)
  - comprehensive example
    - AS3 is willing to carry transit traffic to N1: router 3a advertises path (N1, AS3) to router 1c over eBGP session
    - 1c applies its IMPORT policies to decide whether it wants to forward packets to N1 via 3a
      - if yes, forwarding table in 1c is updated to indicate 3a as the next-hop for N1
    - based on its EXPORT policies, assume AS1 is willing to carry transit traffic (from other ASes) to N1 (e.g. if AS3 is a customer of AS1)
    - AS1 advertises path (N1, AS3, AS1) to AS2 via eBGP session. (note: 1b receives path (N1, AS3) from 1c via iBGP session)
  - elimination rules
    - local preference (LOCAL_PREF) value attribute: policy decision
    - shortest AS-PATH: AS hops rather than router hops
    - closest NEXT-HOP router: hot potato routing
    - additional criteria: backbone, Tier-1 ISP, etc.
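
The first three elimination rules can be expressed as one sort key. This is only a sketch of the ordering, not of a real BGP implementation: the Route fields, AS names, next-hop names and cost values are hypothetical, and the "additional criteria" are left out.

```python
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    as_path: tuple   # ASes the advertisement has passed through
    local_pref: int  # policy decision, higher is better
    next_hop: str
    igp_cost: int    # intra-AS cost to reach the NEXT-HOP router

def best_route(routes):
    """Highest LOCAL_PREF, then shortest AS-PATH, then closest NEXT-HOP."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.igp_cost))

via_as2 = Route("N1", ("AS2", "AS3"), 100, "2a", 5)
via_as3 = Route("N1", ("AS3",), 100, "3a", 9)
via_as4 = Route("N1", ("AS4", "AS5", "AS3"), 200, "4a", 20)

print(best_route([via_as2, via_as3]).next_hop)           # shorter AS-PATH wins
print(best_route([via_as2, via_as3, via_as4]).next_hop)  # policy beats path length
```

Note that via_as3 wins over via_as2 even though its next hop is farther away inside the AS, because the NEXT-HOP rule only applies when the earlier rules tie.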

Internet Control Message Protocol (ICMP)

Overview

mainly used for error reporting (also echo request/reply, used by ping)
a network-layer protocol that sits "above" IP: ICMP messages are carried in IP datagrams
- upper layer protocol = 1
ICMP message: type and code

Traceroute and ICMP

source sends sets of three UDP segments to dest
- first set has TTL = 1
- second set has TTL = 2, etc.
- unlikely port number (very likely no process is listening on that port at the dest)
when the nth set of datagrams arrives at the nth router
- router discards datagrams
- and sends ICMP messages (type 11, code 0) back to the source
- the ICMP messages include the router's name & IP address
when ICMP messages arrive, source records RTTs
stopping criteria
- UDP segment eventually arrives at destination host
- destination returns ICMP "port unreachable" message (type 3, code 3)
- source stops
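
The TTL logic above can be simulated without touching the network. The sketch below does not send real packets or measure RTTs. It only models which node answers each probe and with which ICMP type and code, and the router and host names are invented.

```python
def send_probe(routers, dest, ttl):
    """Walk the path; return (replying node, ICMP type, ICMP code)."""
    for router in routers:
        ttl -= 1                 # each router decrements time to live
        if ttl == 0:
            return (router, 11, 0)   # TTL expired at this router
    return (dest, 3, 3)          # reached dest, unlikely port -> port unreachable

def traceroute(routers, dest, probes_per_set=3, max_ttl=30):
    hops = []
    for ttl in range(1, max_ttl + 1):
        replies = [send_probe(routers, dest, ttl) for _ in range(probes_per_set)]
        node, icmp_type, icmp_code = replies[0]
        hops.append((ttl, node))
        if (icmp_type, icmp_code) == (3, 3):  # stopping criterion
            break
    return hops

for ttl, node in traceroute(["r1", "r2", "r3"], "host"):
    print(ttl, node)
```

With three routers on the path, the fourth set of probes is the first to reach the destination, so the source stops after TTL = 4.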

Datacenter Networks

Load balancer: application-layer routing

receives external client requests
directs workload within datacenter
- datacenter TCP (specifically)
returns results to external client (hiding datacenter internals from client)