ICOM6012 Network Layer
Overview
Services and protocols
- transport segment from sending to receiving host
- network layer protocols in every Internet device, including hosts and routers
- IP provides best-effort services only
- two functions
- forwarding (data plane): local action, move arriving packets from router's input link to appropriate router output link
- routing (control plane): global action, performed by routing algorithms that determine source-destination (end-to-end) paths
- centralized (e.g. telephone network)
- "emerging" approach under the context of Software-Defined Networking (SDN)
- routing is done by controller (a centralized server)
- Q: Can different routing algorithms be swapped in easily, and why?
- A: Yes, because we determine the routing paths ourselves: this time the controller can choose path x (the shortest), and next time path y (the lowest delay). This works because the routers do not collaborate with each other.
- the controller determines the paths (based on various packet header fields), and configures the forwarding tables at routers
- In SDN, routers are called "OpenFlow switches", because the routing function is done by the controller
- Q: Can we use this method across the whole Internet?
- A: No. This method can only be used in small networks, such as campus, enterprise, and datacenter networks.
- should obey the OpenFlow Specification (standard)
- routing can be designed in software (by programming, you control the routing paths yourself)
- distributed
- routers collaborate with each other to find (shortest) paths (based on destination), and configure their own forwarding tables accordingly
- self-healing
- routing protocol is implemented inside routers
- network operator lacks control of routing paths
Forwarding
What is inside a router
- router architecture overview
- input port functions
- decentralized switching (the red one)
- given data dest, lookup out port using forwarding table in input port memory
- goal: complete input port processing at "line rate/speed" (use the full link bandwidth, so the input port does not become a bottleneck)
- queuing: if datagrams arrive faster than forwarding rate into switch fabric
- switching fabrics
- memory: packets are copied into the router's memory and then out to the output port
- bus: packets cross a single shared bus, just like broadcasting
- crossbar: lots of buses, just like a small switch
- crosspoints are connected or disconnected so that several packets can be transmitted in parallel
- some high-speed routers combine different techniques
- switching rate: rate at which packets can be transferred
- output-queued switch: switching rate = N times the line rate
- allows N packets (each arriving at line rate R) to reach one output port at the same time
- an arriving packet is switched to its output immediately (ideal), but this is expensive
- input-queued switch: switching rate = line rate
- only a single packet at a time is allowed through to a given output port; the others wait (if a buffer overflows, TCP recovers the loss)
- this is the method used in practice
- output ports
- buffering required when datagrams arrive from fabric faster than the transmission rate
- scheduling discipline chooses among queued datagrams for transmission
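The input-port lookup above can be sketched as follows. This is only a minimal illustration: the prefixes and port numbers are made up, and picking the most specific matching entry (longest-prefix match) is an assumption about how overlapping entries are resolved, which these notes do not spell out.

```python
import ipaddress

# Hypothetical forwarding table: (destination prefix, output port).
# All values are invented for illustration.
FORWARDING_TABLE = [
    (ipaddress.ip_network("223.1.0.0/16"), 1),
    (ipaddress.ip_network("223.1.2.0/24"), 2),
    (ipaddress.ip_network("0.0.0.0/0"), 0),  # default route, matches everything
]

def lookup_output_port(dest: str) -> int:
    """Return the output port of the most specific prefix containing dest."""
    addr = ipaddress.ip_address(dest)
    # Collect every matching entry as (prefix length, port).
    matches = [(net.prefixlen, port) for net, port in FORWARDING_TABLE if addr in net]
    # The longest prefix wins; the default route guarantees at least one match.
    return max(matches)[1]

print(lookup_output_port("223.1.2.5"))  # 2
print(lookup_output_port("223.1.9.9"))  # 1
print(lookup_output_port("8.8.8.8"))    # 0
```

A linear scan like this is far too slow for line-rate processing; real routers use specialised hardware or tree structures for the same lookup.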
IP: the Internet protocol
- routing protocols and ICMP all rely on IP, so IP is the "only" standard in the network layer
- datagram format
- ver: 4 bits, e.g. IPv4, IPv6
- header length: 4 bits, counted in units of 4 bytes (just like TCP); the minimum header size is 20 bytes
- type of service: 8 bits, selects which type of service should be used, because the designers believed IP could do more than best-effort delivery (but most people do not use it)
- length: 16 bits, so the theoretical maximum datagram size is \(2^{16}-1\) bytes
- in practice it is smaller than that, because the link-layer frame has its own size limit
- 16-bit identifier: tells which fragments belong to the same original datagram
- flags: mark a fragment's position, e.g. whether this fragment is the last one of the original datagram
- fragment offset: used to reassemble the original datagram in order
- many routers may not support fragmentation
- try to avoid having messages fragmented (from the designer's point of view)
- IPv6 removes 16-bit identifier, flags and fragment offset
- fragmentation would be done by host, not router
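- time to live: set by the sending host, at most 255 initially (the field is 8 bits)
- each router decrements it by 1; when it reaches 0, the datagram is dropped
- prevents packets from looping forever
- upper layer: identifies the protocol that the payload is delivered to, linking the network layer to the layer above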
- 6: TCP
- 17: UDP
- 89: OSPF
- header checksum: error detection on a hop-by-hop basis (it must be recomputed at every router because the time to live changes); it covers the header only and is checked before forwarding
- the checksums of TCP and UDP work on an end-to-end basis and cover the whole segment
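To see why the header checksum is a hop-by-hop value, here is a small sketch of the 16-bit one's-complement checksum over a 20-byte header. The field values (addresses, identifier, lengths) and the helper names `ipv4_checksum` and `build_header` are made up for illustration.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """16-bit one's-complement of the one's-complement sum of the header words."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:                      # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_header(ttl: int) -> bytes:
    """Build a minimal 20-byte IPv4 header with a valid checksum (illustrative values)."""
    # ver/IHL, TOS, total length, identifier, flags+offset, TTL, protocol (6 = TCP),
    # checksum placeholder, source address, destination address
    raw = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0x1234, 0, ttl, 6, 0,
                      bytes([223, 1, 2, 5]), bytes([8, 8, 8, 8]))
    csum = ipv4_checksum(raw)
    return raw[:10] + struct.pack("!H", csum) + raw[12:]

hdr = build_header(ttl=64)
print(ipv4_checksum(hdr))         # 0: a header with a valid checksum sums to zero

# A router decrements the TTL (byte 8) without touching the checksum:
stale = hdr[:8] + bytes([hdr[8] - 1]) + hdr[9:]
print(ipv4_checksum(stale) != 0)  # True: the old checksum no longer verifies
```

So each router has to recompute the checksum after decrementing the TTL, which is what `build_header(ttl=63)` would do in this sketch.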
IP addressing
- IP address is associated with interface, not host
- multiple interfaces mean multiple IP addresses
- interface: connection between host/router and physical/wireless link
- router has multiple interfaces, typically
- host has one or two interfaces, typically (e.g. wired Ethernet, WiFi, Bluetooth)
- Q: How are interfaces actually connected?
- A: Ethernet switches, WiFi base station, etc.
- subnet: a network inside a network
- device interfaces can communicate with each other without routers
- should have unique subnet address
- each forwarding table entry corresponds to a subnet, or a range of addresses, to keep the forwarding table small
- the forwarding table lookup can become a bottleneck, since every packet needs a route lookup
- structure
- subnet part: high order bits (subnet mask)
- host part: remaining low order bits
- within a subnet, host address must be unique
- old days: classful addressing
- CIDR (classless interdomain routing)
- subnet portion of address of arbitrary length
- address format: a.b.c.d/x, where x is the number of bits in the subnet portion of the address
- special case: 0.0.0.0 means the host has no IP address yet
- how to get IP addresses
- host part: DHCP (dynamic host configuration protocol)
- the DHCP server sits in the subnet (e.g. at home, the WiFi router runs a DHCP server), so no configuration by the user is needed
- overview (DORA: discover, offer, request, ack)
- host broadcasts "DHCP discover" msg (optional: on lease timeout, the host jumps straight to the last two steps)
- broadcast
- src: 0.0.0.0 68
- dest: 255.255.255.255 67
- DHCP server responds with "DHCP offer" msg (optional; several servers may offer, but the host accepts only one)
- broadcast
- src: 223.1.2.5 67
- dest: 255.255.255.255 68
- lifetime: 3600s
- host requests IP address: "DHCP request" msg
- broadcast (lets the other DHCP servers know which offer the host accepted, so they know their own offers were not taken)
- src: 0.0.0.0 68
- dest: 255.255.255.255 67
- lifetime: 3600s
- DHCP server sends address: "DHCP ack" msg
- broadcast
- src: 223.1.2.5 67
- dest: 255.255.255.255 68
- lifetime: 3600s
- other parts of DHCP
- IP address of first-hop router
- name and IP address of DNS server (e.g. dns.google.com -> 8.8.8.8; it is free because the provider collects data)
- subnet mask (indicating network versus host portion of address)
- it determines whether a packet transmission needs to go through a router
- subnet part: from ISP/ICANN -> public IP addresses (unique on the Internet)
- but when setting up a new WiFi router, the network part comes from the private IP address ranges (unique only within the home network, so your IP address can be the same as someone else's)
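The a.b.c.d/x notation and the role of the subnet mask can be tried out with Python's standard `ipaddress` module. This is a small sketch: the addresses are illustrative, and `needs_router` is a hypothetical helper name, not something from the lecture.

```python
import ipaddress

def needs_router(src_ip: str, dst_ip: str, prefix_len: int) -> bool:
    """True if dst_ip lies outside the sender's subnet, so the packet must go via a router."""
    subnet = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    return ipaddress.ip_address(dst_ip) not in subnet

subnet = ipaddress.ip_network("223.1.2.0/24")
print(subnet.netmask)                               # 255.255.255.0
print(needs_router("223.1.2.5", "223.1.2.9", 24))   # False: same subnet, deliver directly
print(needs_router("223.1.2.5", "223.1.3.9", 24))   # True: send to the first-hop router
```

This is the check a host makes with the subnet mask it learned from DHCP before deciding whether to use the first-hop router.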
NAT (network address translation)
- used in routers
- uses port numbers (each must be unique, so other processes cannot use it) to match connections on both sides
- translates a set of IP addresses to another set of IP addresses (using translation table)
- help preserve the limited amount of IPv4 public IP addresses (with private IP addresses)
- public IP addresses
- publicly registered
- directly access the Internet with a public IP address
- private IP addresses
- not publicly registered
- cannot directly access the Internet with a private IP address
- only used internally; these IP addresses cannot be seen on the Internet or by its routers
- if a packet on the Internet contains these IP addresses, it may be treated as an error and dropped immediately
| name | start IP address | end IP address | subnet | remark |
| --- | --- | --- | --- | --- |
| 24-bit block | 10.0.0.0 | 10.255.255.255 | 10.0.0.0/8 | Apple uses this |
| 20-bit block | 172.16.0.0 | 172.31.255.255 | 172.16.0.0/12 | not many use this |
| 16-bit block | 192.168.0.0 | 192.168.255.255 | 192.168.0.0/16 | most use this, e.g. ASUS |

- advantage: good for security, acting like a firewall, which is why we may keep using NAT even after moving to IPv6
- when outside hosts want to communicate with an internal host
- DDNS
- configure the NAT translation table in advance (the router lets you do that), called port forwarding
- Q: Is the IP address shown on your iPhone really the phone's own Internet-visible address?
- A: No. That IP address is assigned by the WiFi router; on an iPhone you can dial *3001#12345#* to check.
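The translation table and port forwarding described above can be sketched as a tiny class. Everything here is an illustrative assumption: the class name `Nat`, the starting port 5001, and the addresses (203.0.113.7 is a documentation address standing in for the router's public IP).

```python
from itertools import count

class Nat:
    """Toy NAT translation table keyed by port number."""

    def __init__(self, public_ip: str, first_port: int = 5001):
        self.public_ip = public_ip
        self._next_port = count(first_port)  # source of unused public ports
        self.out = {}    # (private_ip, private_port) -> public_port
        self.back = {}   # public_port -> (private_ip, private_port)

    def outbound(self, src_ip: str, src_port: int):
        """Rewrite the source of an outgoing packet, creating a mapping if needed."""
        key = (src_ip, src_port)
        if key not in self.out:
            port = next(self._next_port)
            self.out[key] = port
            self.back[port] = key
        return self.public_ip, self.out[key]

    def inbound(self, dst_port: int):
        """Map an incoming packet back to the internal host, or None (drop) if unknown."""
        return self.back.get(dst_port)

    def port_forward(self, public_port: int, private_ip: str, private_port: int):
        """Pre-configure an entry so outside hosts can start the conversation."""
        self.back[public_port] = (private_ip, private_port)

nat = Nat("203.0.113.7")
print(nat.outbound("192.168.1.10", 3345))  # ('203.0.113.7', 5001)
print(nat.inbound(5001))                   # ('192.168.1.10', 3345)
print(nat.inbound(6000))                   # None: no mapping, so the packet is dropped
nat.port_forward(8080, "192.168.1.20", 80)
print(nat.inbound(8080))                   # ('192.168.1.20', 80)
```

The dropped unsolicited packet in the third call is the firewall-like behaviour mentioned above; the sketch ignores mapping timeouts and clashes between forwarded and dynamically assigned ports.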
IPv6
- differences between IPv4 and IPv6
- motivations
- to solve the IPv4 address space shortage problem
- to speed up packet processing/forwarding (by flow label)
- to facilitate QoS
- transition from IPv4 to IPv6
- use tunneling: IPv6 datagram carried as payload in IPv4 datagram among IPv4 routers
- set the upper-layer protocol field in the IPv4 header to 41, so that routers know the IPv4 datagram carries an IPv6 datagram
- problems
- more overhead
- packets may become too big and get fragmented by routers
- consider that the maximum Ethernet frame payload is 1500 bytes
- example
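A quick arithmetic sketch of the tunneling overhead, assuming a 1500-byte Ethernet payload, the minimum 20-byte IPv4 header, and the fixed 40-byte IPv6 header. The function names are hypothetical.

```python
IPV4_HEADER = 20         # minimum IPv4 header size in bytes
IPV6_HEADER = 40         # fixed IPv6 header size in bytes
PROTO_IPV6_IN_IPV4 = 41  # upper-layer protocol value marking an encapsulated IPv6 datagram

def tunnel_budget(link_mtu: int = 1500) -> dict:
    """How much room is left for IPv6 once the outer IPv4 header is added."""
    inner_max = link_mtu - IPV4_HEADER
    return {
        "max_ipv6_datagram": inner_max,
        "max_ipv6_payload": inner_max - IPV6_HEADER,
    }

def must_fragment(ipv6_datagram_len: int, link_mtu: int = 1500) -> bool:
    """True if the encapsulated datagram no longer fits in one frame."""
    return ipv6_datagram_len + IPV4_HEADER > link_mtu

print(tunnel_budget())      # {'max_ipv6_datagram': 1480, 'max_ipv6_payload': 1440}
print(must_fragment(1480))  # False
print(must_fragment(1500))  # True: a full-size IPv6 datagram overflows the frame in the tunnel
```

An IPv6 datagram that exactly fills a 1500-byte frame on its own grows to 1520 bytes inside the tunnel, which is the fragmentation problem listed above.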
Routing
Classification overview
Link-state routing
- net topology, link costs known to all nodes
- via "link state broadcast", each router knows its neighbours by configuration
- all nodes have same info
- each node computes its shortest paths to all other nodes using Dijkstra's algorithm
- based on the shortest paths found, configure the local forwarding table
- OSPF (open shortest path first)
- open: publicly available (in the past, Cisco may have monopolized the market)
- link state routing
- LS packet dissemination
- topology map at each node
- route computation using Dijkstra's algorithm
- OSPF advertisement message carries one entry per neighbour
- advertisement flooded to entire network
- directly over IP (rather than TCP or UDP) with "upper layer = 89"
- reliability (even though it runs directly over IP)
- by retransmission, just like DNS
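The route computation step can be sketched with a heap-based Dijkstra that also records the first hop, which is what ends up in the local forwarding table. The six-node topology and its link costs are an illustrative assumption, not taken from these notes.

```python
import heapq

def dijkstra(graph, source):
    """Return (dist, first_hop): shortest distances from source and the neighbour to forward to."""
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]   # (distance so far, node, first hop used to reach it)
    done = set()
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in done:
            continue             # stale heap entry, a shorter path was already settled
        done.add(node)
        if hop is not None:
            first_hop[node] = hop
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                # Leaving the source, the first hop is the neighbour itself.
                heapq.heappush(heap, (nd, nbr, nbr if node == source else hop))
    return dist, first_hop

# Hypothetical topology; every node holds this same map after the link-state broadcast.
graph = {
    "u": {"v": 2, "x": 1, "w": 5},
    "v": {"u": 2, "x": 2, "w": 3},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"x": 1, "w": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}

dist, first_hop = dijkstra(graph, "u")
print(dist["z"], first_hop["z"])  # 4 x
print(dist["v"], first_hop["v"])  # 2 v
```

The `first_hop` dictionary is effectively u's forwarding table: destination to next-hop neighbour.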
Distance vector routing
- distance = hop count
- vector = next hop
- by periodically exchanging distance vectors (DVs) with neighbours, each router knows neighbours' distance to destinations
- each router uses Bellman-Ford algorithm to refine its own DVs.
- e.g., using neighbour with the shortest distance to a destination as next hop
- "routing by rumors"
- LS vs DV
- LS has the global topology and information, and each router calculates the paths by itself
- DV just trusts others: routers share information with each other, and each router only knows what its neighbours know
- the resulting paths may be the same
- message complexity
- LS: with n nodes & E links, O(nE) messages sent
- DV: exchange between neighbours only
- speed of convergence
- LS: relatively fast
- DV: convergence time varies
- robustness
- LS: node can advertise incorrect link cost, but each node computes only its own table
- DV: node can advertise incorrect path cost, and every other node's table just trusts and uses it (error propagation)
- RIP (routing information protocol)
- distance metric: the number of hops (max = 15), each link has cost 1
- DVs exchanged with neighbours every 30 sec in advertisement messages
- advertisements sent in UDP segments
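One Bellman-Ford refinement step at a single router can be sketched as below. The three-node example (x with neighbours y and z) and its costs are made up, and general link costs are used here rather than RIP's cost of 1 per link; `dv_update` is a hypothetical helper name.

```python
INF = float("inf")

def dv_update(self_name, link_cost, neighbour_dvs):
    """Recompute this router's distance vector from its neighbours' advertised vectors.

    link_cost:     neighbour -> cost of the direct link
    neighbour_dvs: neighbour -> {destination: neighbour's advertised distance}
    Returns destination -> (distance, next hop).
    """
    dests = set(link_cost)
    for dv in neighbour_dvs.values():
        dests |= set(dv)
    dests.discard(self_name)

    table = {self_name: (0, self_name)}
    for dest in dests:
        # Bellman-Ford: D_x(dest) = min over neighbours n of c(x, n) + D_n(dest)
        candidates = [
            (link_cost[n] + neighbour_dvs[n].get(dest, INF), n)
            for n in link_cost if n in neighbour_dvs
        ]
        table[dest] = min(candidates, default=(INF, None))
    return table

# Router x hears from its two neighbours ("routing by rumors").
table = dv_update(
    "x",
    link_cost={"y": 2, "z": 7},
    neighbour_dvs={
        "y": {"x": 2, "y": 0, "z": 1},
        "z": {"x": 7, "y": 1, "z": 0},
    },
)
print(table["z"])  # (3, 'y'): going through y beats the direct link of cost 7
print(table["y"])  # (2, 'y')
```

Note that x never sees the topology; it simply trusts the distances y and z report, which is also why a wrong advertisement propagates.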
Hierarchical routing
- aggregate routers into regions, called "autonomous systems" (AS)
- each AS is assigned a unique AS number (originally 16 bits, now extended to 32 bits due to shortage)
- routers in same AS run same routing protocol
- "intra-AS" routing protocol, e.g. OSPF, RIP
- routers in different AS can run different intra-AS routing protocol
- ASes must be interconnected via gateway routers
- forwarding table configured by both intra- and inter-AS routing algorithm
- intra-AS sets entries for internal dests
- inter-AS & intra-AS sets entries for external dests
- with multiple candidate ASes, we can use hot potato routing (the shortest way out) or follow policy (e.g. content provider), etc.
- two layers are enough for now, but it is hard to say for the future
BGP (border gateway protocol)
- currently version 4
- glue that holds the Internet/ASes together
- tasks
- for outbound traffic (how to find suitable way to other ASes)
- obtain subnet reachability information from neighbouring ASes
- propagate reachability information to all AS-internal routers
- determine good routes to other networks based on reachability information and policy
- for inbound traffic
- advertise subnets that the AS can help to reach
- hosts in outside ASes send traffic into my AS, so make sure the hosts of my AS are visible to them
BGP session: two BGP routers ("peers") exchange BGP messages
- advertising paths to different subnets ("path vector" protocol)
- exchange over TCP connections (server port 179)
- reliable data transfer
- permanent connection, so the setup overhead is paid only once
- category
- eBGP (external BGP)
- logical and TCP connection
- direct link
- iBGP (internal BGP)
- logical and TCP connection
- share information (purpose)
- comprehensive example
- AS3 is willing to carry transit traffic to N1: router 3a advertises path (N1, AS3) to router 1c over eBGP session
- 1c applies its IMPORT policies to decide whether it wants to forward packets to N1 via 3a
- if yes, forwarding table in 1c is updated to indicate 3a as the next-hop for N1
- based on its EXPORT policies, assume AS1 is willing to carry transit traffic (from other ASes) to N1 (e.g. if AS3 is a customer of AS1)
- AS1 advertises path (N1, AS3, AS1) to AS2 via eBGP session (note: 1b receives path (N1, AS3) from 1c via iBGP session)
- elimination rules
- local preference (LOCAL_PREF) value attribute: policy decision
- shortest AS-PATH: AS hops rather than router hops
- closest NEXT-HOP router: hot potato routing
- additional criteria: backbone, Tier-1 ISP, etc.
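The first three elimination rules can be sketched as a sort key. This is a simplification under stated assumptions: the `Route` fields (in particular `igp_cost` as the intra-AS cost to the NEXT-HOP router) and the example values are hypothetical, and the additional criteria are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple    # ASes the advertisement passed through
    next_hop: str     # router to hand packets to
    local_pref: int   # policy value, higher is preferred
    igp_cost: int     # assumed intra-AS cost to reach next_hop

def best_route(routes):
    """Apply the rules in order: highest LOCAL_PREF, shortest AS-PATH, closest NEXT-HOP."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.igp_cost))

candidates = [
    Route("N1", ("AS3",), "3a", local_pref=100, igp_cost=5),
    Route("N1", ("AS2", "AS3"), "2c", local_pref=200, igp_cost=1),
    Route("N1", ("AS4", "AS3"), "4a", local_pref=200, igp_cost=3),
]
print(best_route(candidates).next_hop)  # 2c
```

The route via 3a has the shortest AS-PATH but loses on LOCAL_PREF, which shows that policy is applied before path length; the remaining tie is broken by hot potato routing.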
Internet Control Message Protocol (ICMP)
Overview
- mainly focus on error reporting (also echo request/reply by ping)
- network-layer protocol that sits above IP (ICMP messages are carried in IP datagrams)
- upper layer protocol = 1
- ICMP message: type and code
Traceroute and ICMP
- source sends sets of three UDP segments to dest
- first set has TTL = 1
- second set has TTL = 2, etc.
- unlikely port number (very likely no process uses that port at the dest)
- when the nth set of datagrams arrives at the nth router
- router discards datagrams
- and sends source ICMP messages (type 11, code 0)
- ICMP messages include name of router & IP address
- when ICMP messages arrive, source records RTTs
- stopping criteria
- UDP segment eventually arrives at destination host
- destination returns ICMP "port unreachable" message (type 3, code 3)
- source stops
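The TTL logic above can be simulated without sending real packets. This is only a sketch under simplifying assumptions: one probe per TTL instead of three, no RTT measurement, and a made-up path of hypothetical node names.

```python
def simulate_traceroute(path, max_ttl=30):
    """Simulate the replies a source would collect.

    path: router names in order, followed by the destination host as the last item.
    Returns a list of (ttl, replying node, ICMP message) tuples.
    """
    replies = []
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        for index, node in enumerate(path):
            if index == len(path) - 1:
                # The probe reached the destination, where no process uses the port.
                replies.append((ttl, node, "type 3 code 3 (port unreachable)"))
                return replies          # stopping criterion
            remaining -= 1              # each router decrements the TTL
            if remaining == 0:
                # The router discards the datagram and reports back to the source.
                replies.append((ttl, node, "type 11 code 0 (TTL expired)"))
                break
    return replies

for reply in simulate_traceroute(["r1", "r2", "dest"]):
    print(reply)
# (1, 'r1', 'type 11 code 0 (TTL expired)')
# (2, 'r2', 'type 11 code 0 (TTL expired)')
# (3, 'dest', 'type 3 code 3 (port unreachable)')
```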
Datacenter Networks
Load balancer: application-layer routing
- receives external client requests
- directs workload within datacenter
- datacenter TCP (specifically)
- returns results to external client (hiding datacenter internals from client)