Overview

Services and protocols

  • transport segment from sending to receiving host
  • network layer protocols in every Internet device, including hosts and routers
  • IP provides best-effort services only
  • two functions
    • forwarding (data plane): local action, move arriving packets from router's input link to appropriate router output link
    • routing (control plane): global action, routing algorithms determine the end-to-end paths that packets take from source to destination
      • centralized (e.g. telephone network)
        • "emerging" approach under the context of Software-Defined Networking (SDN)
        • routing is done by a controller (a centralized server)
          • Q: Can different routing algorithms be swapped in easily? Why?
          • A: Yes. The paths are determined by us at the controller: this time I choose path x (the shortest), next time I can choose path y (the lowest delay). This works because the routers do not have to collaborate with each other.
        • the controller determines the paths (based on various packet header fields), and configures the forwarding tables at routers
        • in SDN, routers are called "OpenFlow switches", because the routing function is done by the controller
          • Q: Can we use this method in the whole network?
          • A: No, not for the whole Internet. This method is only used in small networks, such as campus, enterprise and datacenter networks.
        • switches and controller should obey the OpenFlow Specification (standard)
        • routing can be designed in software (by programming, you can control the routing paths yourself)
      • distributed
        • routers collaborate with each other to find (shortest) paths (based on destination), and configure their own forwarding tables accordingly
        • self-healing
        • routing protocol is implemented inside routers
        • network operator lacks control of routing paths

Forwarding

What is inside a router

  • router architecture overview
    • input port functions
      • decentralized switching (the red one)
        • given the datagram's destination, look up the output port using the forwarding table in input port memory
        • goal: complete input port processing at "line rate/speed" (fully use the link bandwidth, so the input port does not become a bottleneck)
        • queuing: if datagrams arrive faster than the forwarding rate into the switch fabric
    • switching fabrics
      • memory: the packet is copied into memory and then out to the output port, so the speed is limited by memory bandwidth
      • bus: just like broadcasting
      • crossbar: lots of buses, just like a small switch
        • crosspoints are connected or disconnected so that packets to different output ports can be transmitted in parallel
      • some high-speed routers combine different techniques
      • switching rate: rate at which packets can be transferred from inputs to outputs
        • output-queued switch: switching rate = N times the line rate
          • allows N packets (each arriving at line rate R) to reach one output port at the same time
          • a packet is moved to its output port as soon as it arrives (ideal), but this is expensive
        • input-queued switch: switching rate = line rate
          • only a single packet at a time is allowed to one output port, the others wait at the inputs (if a buffer overflows, TCP retransmission would help)
          • this is the method used in practice
    • output ports
      • buffering required when datagrams arrive from fabric faster than the transmission rate
      • scheduling discipline chooses among queued datagrams for transmission
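The forwarding-table lookup done at the input port can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes the table holds address prefixes and that the longest (most specific) matching prefix wins, a rule these notes do not spell out. The table contents and the name `lookup_output_port` are hypothetical, and a real router does this in hardware rather than with a linear scan.

```python
import ipaddress

# Hypothetical forwarding table: (destination prefix, output port).
FORWARDING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), 0),      # default route, matches everything
    (ipaddress.ip_network("223.1.0.0/16"), 1),
    (ipaddress.ip_network("223.1.2.0/24"), 2),
]

def lookup_output_port(dest: str) -> int:
    """Return the output port of the longest prefix that contains dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(net.prefixlen, port) for net, port in FORWARDING_TABLE if addr in net]
    # The default route guarantees at least one match; max() picks the longest prefix.
    return max(matches)[1]

if __name__ == "__main__":
    for dest in ("223.1.2.5", "223.1.9.9", "8.8.8.8"):
        print(dest, "-> port", lookup_output_port(dest))
```

Keeping one entry per prefix instead of one per host is what keeps the table small enough to search at line rate.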

IP: the Internet protocol

  • routing protocols and ICMP all rely on IP, so IP is the "only" standard in the network layer
  • datagram format
    • ver: 4 bits, e.g. IPv4, IPv6
    • header length: 4 bits, counted in units of 4 bytes (just like TCP), the minimum header size is 20 bytes
    • type of service: 8 bits, selects which type of service should be used, because the designers believed IP could do more than best-effort delivery (but most people do not use this)
    • length: 16 bits, so theoretically the maximum size of a datagram is \(2^{16}-1\) bytes
      • in practice it is smaller than that, because the link-layer frame has its own size limit
    • 16-bit identifier: identifies which fragments belong to the same original datagram
    • flags: control fragmentation, e.g. show whether this fragment is the last one of the original datagram
    • fragment offset: position of the fragment in the original datagram, used to reassemble it in order
      • many routers may not support fragmentation
      • designers try to avoid fragmenting messages
      • IPv6 removes the 16-bit identifier, flags and fragment offset from the base header
        • fragmentation is done by the sending host, not by routers
    • time to live: 8 bits, set by the sending host, at most 255
      • each router decrements it by 1, and drops the datagram when it reaches 0
      • prevents packets from looping forever
    • upper layer: identifies the protocol that should receive the payload, linking the network layer to the layer above it
      • 6: TCP
      • 17: UDP
      • 89: OSPF
    • header checksum: error detection on a hop-by-hop basis (it must be recomputed at every router because time to live changes), covers the header only (checked before forwarding)
      • the TCP and UDP checksums work on an end-to-end basis and also cover the data
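The header fields above can be made concrete with a short sketch that builds a minimal 20-byte IPv4 header, computes the header checksum, and shows why a router has to recompute it after decrementing time to live. This is a sketch under assumptions, not a full IP implementation: it ignores options and fragmentation, leaves the identifier and flags at 0, and the function names `internet_checksum`, `build_header` and `forward` are made up for illustration.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_header(src: str, dst: str, ttl: int, protocol: int, payload_len: int) -> bytes:
    """Build a 20-byte IPv4 header without options."""
    ver_ihl = (4 << 4) | 5                    # version 4, header length 5 * 4 = 20 bytes
    header = struct.pack("!BBHHHBBH4s4s", ver_ihl, 0, 20 + payload_len, 0, 0,
                         ttl, protocol, 0,    # checksum field is 0 while computing
                         socket.inet_aton(src), socket.inet_aton(dst))
    return header[:10] + struct.pack("!H", internet_checksum(header)) + header[12:]

def forward(header: bytes):
    """Per-hop processing: verify checksum, decrement TTL, recompute checksum."""
    if internet_checksum(header) != 0:
        return None                           # corrupted header: drop
    ttl = header[8]
    if ttl <= 1:
        return None                           # TTL would reach 0: drop
    new = bytearray(header)
    new[8] = ttl - 1
    new[10:12] = b"\x00\x00"                  # clear old checksum before recomputing
    new[10:12] = struct.pack("!H", internet_checksum(bytes(new)))
    return bytes(new)

if __name__ == "__main__":
    hdr = build_header("223.1.2.5", "8.8.8.8", ttl=64, protocol=6, payload_len=100)
    print("header length:", (hdr[0] & 0x0F) * 4, "bytes")
    print("TTL after one hop:", forward(hdr)[8])
```

A valid header sums to zero when the checksum field is included, which is the check `forward` performs before touching the TTL.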

IP addressing

  • IP address is associated with interface, not host
    • multiple interfaces mean multiple IP addresses
    • interface: connection between host/router and physical/wireless link
      • router has multiple interfaces, typically
      • host has one or two interfaces, typically (e.g. wired Ethernet, WiFi, Bluetooth)
      • Q: How are interfaces actually connected?
      • A: Ethernet switches, WiFi base station, etc.
  • subnet: a network inside a network
    • device interfaces can communicate with each other without routers
    • should have unique subnet address
    • each forwarding table entry corresponds to a subnet, or a range of addresses, to keep the forwarding table small
      • otherwise looking up routes in the forwarding table would become a bottleneck
  • structure
    • subnet part: high order bits (subnet mask)
    • host part: remaining low order bits
      • within a subnet, host address must be unique
  • old days: classful addressing
  • CIDR (classless interdomain routing)
    • subnet portion of address of arbitrary length
    • address format: a.b.c.d/x, where x is the number of bits in the subnet portion of the address
    • special address: 0.0.0.0 means the host has no IP address yet
  • how to get IP addresses
    • host part: DHCP (dynamic host configuration protocol)
      • the DHCP server is usually in the subnet (e.g. at home, the WiFi router runs a DHCP server), so the user does not need to configure anything
      • overview (DORA: discover, offer, request, ack)
        • host broadcasts "DHCP discover" msg (optional: when renewing a lease that is about to time out, the host jumps straight to the last two steps)
          • broadcast
          • src: 0.0.0.0 68
          • dest: 255.255.255.255 67
        • DHCP server responds with "DHCP offer" msg (optional, several servers may send offers, but the host accepts only one)
          • broadcast
          • src: 223.1.2.5 67
          • dest: 255.255.255.255 68
          • lifetime: 3600s
        • host requests IP address: "DHCP request" msg
          • broadcast (tells every DHCP server which offer was accepted, so the others know their offers were not taken)
          • src: 0.0.0.0 68
          • dest: 255.255.255.255 67
          • lifetime: 3600s
        • DHCP server sends address: "DHCP ack" msg
          • broadcast
          • src: 223.1.2.5 67
          • dest: 255.255.255.255 68
          • lifetime: 3600s
      • what else DHCP provides
        • IP address of first-hop router
        • name and IP address of DNS server (e.g. dns.google.com -> 8.8.8.8, which is free, in return for the data it collects)
        • subnet mask (indicating network versus host portion of address)
          • it determines whether sending a packet to a given destination needs a router
    • subnet part: allocated by the ISP/ICANN -> public IP addresses (unique on the Internet)
      • but when setting up a new WiFi router, the network part comes from a private range -> private IP addresses (unique only on the home network, so your address can be the same as someone else's)
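How the subnet mask decides whether a packet needs a router can be shown with a small check. This is a minimal sketch: the helper name `needs_router` is hypothetical and the addresses are example values in the style of these notes.

```python
import ipaddress

def needs_router(src: str, dst: str, mask: str) -> bool:
    """True if dst lies outside src's subnet, so the packet must go to the first-hop router."""
    m = int(ipaddress.ip_address(mask))
    src_subnet = int(ipaddress.ip_address(src)) & m   # keep only the subnet part
    dst_subnet = int(ipaddress.ip_address(dst)) & m
    return src_subnet != dst_subnet

if __name__ == "__main__":
    mask = "255.255.255.0"                            # same as a /24 prefix in CIDR notation
    print(needs_router("223.1.2.5", "223.1.2.9", mask))   # same subnet
    print(needs_router("223.1.2.5", "223.1.3.9", mask))   # different subnet
```

The host applies the mask it received from DHCP to both addresses: equal subnet parts mean direct delivery on the local link, otherwise the packet is sent to the first-hop router.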
  • NAT (network address translation)

    • used in routers
      • uses port numbers (each must be unique on the router, no other mapping can use the same one) to match the connection on both sides
    • translates a set of IP addresses to another set of IP addresses (using translation table)
    • help preserve the limited amount of IPv4 public IP addresses (with private IP addresses)
      • public IP addresses
        • publicly registered
        • directly access the Internet with a public IP address
      • private IP addresses
        • not publicly registered
        • cannot directly access the Internet with a private IP address
        • only used internally, those IP addresses cannot be seen on the Internet or by its routers
        • if a packet on the Internet contains these IP addresses, it may be considered an error and dropped immediately
    name          start IP address  end IP address    subnet          remark
    24-bit block  10.0.0.0          10.255.255.255    10.0.0.0/8      used by Apple
    20-bit block  172.16.0.0        172.31.255.255    172.16.0.0/12   not widely used
    16-bit block  192.168.0.0       192.168.255.255   192.168.0.0/16  most widely used, e.g. ASUS
    • advantage: good for security, acting like a firewall, which is why NAT may stay in use even after moving to IPv6
    • outside hosts want to communicate with an internal host
      • DDNS
      • configure NAT translation table in advance (the router would allow you to do that), called port forwarding
    • Q: Is the IP address shown on your iPhone the phone's own IP address on the Internet?
      • A: No. That IP address is assigned by the WiFi router. On an iPhone you can dial *3001#12345#* to test.
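The NAT translation table and port forwarding described above can be sketched as a pair of dictionaries. This is an illustrative model only: the class `Nat`, its method names, the public address and the starting port number are all hypothetical, and real NAT devices also track the transport protocol, the remote endpoint and timeouts.

```python
from itertools import count

class Nat:
    """Toy NAT: maps (private IP, private port) <-> a unique public-side port."""

    def __init__(self, public_ip: str, first_port: int = 5001):
        self.public_ip = public_ip
        self._ports = count(first_port)   # candidate public-side ports
        self.out = {}                     # (private ip, private port) -> public port
        self.back = {}                    # public port -> (private ip, private port)

    def outbound(self, src_ip: str, src_port: int):
        """Rewrite the source of an outgoing packet, creating a mapping if needed."""
        key = (src_ip, src_port)
        if key not in self.out:
            # Skip ports already taken, e.g. by a port-forwarding rule.
            port = next(p for p in self._ports if p not in self.back)
            self.out[key] = port
            self.back[port] = key
        return self.public_ip, self.out[key]

    def inbound(self, dst_port: int):
        """Find the internal destination of an incoming packet, or None to drop it."""
        return self.back.get(dst_port)

    def port_forward(self, public_port: int, private_ip: str, private_port: int):
        """Configure a mapping in advance so outside hosts can start a connection."""
        self.back[public_port] = (private_ip, private_port)
        self.out[(private_ip, private_port)] = public_port

if __name__ == "__main__":
    nat = Nat("138.76.29.7")
    print(nat.outbound("10.0.0.1", 3345))
    print(nat.inbound(5001))
    print(nat.inbound(9999))              # no mapping, so the packet is dropped
```

An unsolicited inbound packet finds no entry and is dropped, which is the firewall-like behaviour noted above, and `port_forward` is the manual exception to it.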
  • IPv6

    • differences between IPv4 and IPv6
    • motivations
      • to solve the IPv4 address space shortage problem
      • to speed up packet processing/forwarding (by flow label)
      • to facilitate QoS
    • transition from IPv4 to IPv6
      • use tunneling: IPv6 datagram carried as payload in IPv4 datagram among IPv4 routers
      • set the upper layer protocol field in the IPv4 header to 41, so that the receiving end of the tunnel knows the IPv4 datagram carries an IPv6 datagram
      • problems
        • more overhead
        • packets may become too big and get fragmented by routers
          • consider that the maximum payload (MTU) of an Ethernet frame is 1500 bytes
      • example
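As one small worked example of the tunneling overhead, the arithmetic below shows when an encapsulated IPv6 datagram no longer fits in one Ethernet frame. It is a sketch under assumptions: a 20-byte IPv4 header without options, the fixed 40-byte IPv6 base header (a size these notes do not state), and hypothetical function names.

```python
IPV4_HEADER = 20      # bytes, minimum IPv4 header (no options)
IPV6_HEADER = 40      # bytes, fixed IPv6 base header (assumed, not from the notes)
ETHERNET_MTU = 1500   # bytes of payload one Ethernet frame can carry

def tunneled_size(ipv6_datagram_len: int) -> int:
    """Size of the IPv4 datagram that carries a whole IPv6 datagram as payload."""
    return IPV4_HEADER + ipv6_datagram_len

def fits_in_mtu(ipv6_datagram_len: int, mtu: int = ETHERNET_MTU) -> bool:
    """True if the tunneled datagram can cross the link without fragmentation."""
    return tunneled_size(ipv6_datagram_len) <= mtu

def max_ipv6_payload(mtu: int = ETHERNET_MTU) -> int:
    """Largest IPv6 payload that still fits once both headers are added."""
    return mtu - IPV4_HEADER - IPV6_HEADER

if __name__ == "__main__":
    print(tunneled_size(1500), fits_in_mtu(1500))   # a full-size IPv6 datagram
    print(max_ipv6_payload())
```

An IPv6 datagram that already fills a 1500-byte frame grows to 1520 bytes inside the tunnel, so it either gets fragmented or the sender has to use smaller datagrams.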

Routing

Classification overview

Link-state routing

  • net topology, link costs known to all nodes
    • via "link state broadcast", each router knows its neighbours by configuration
    • all nodes have same info
  • each node computes its shortest paths to all other nodes using Dijkstra's algorithm
    • based on the shortest paths found, configure the local forwarding table
  • OSPF (open shortest path first)
    • open: publicly available (in the past Cisco may have monopolized the market)
    • link state routing
      • LS packet dissemination
      • topology map at each node
      • route computation using Dijkstra's algorithm
    • OSPF advertisement message carries one entry per neighbour
    • advertisement flooded to entire network
      • directly over IP (rather than TCP or UDP) with "upper layer = 89"
    • reliability (although it runs directly over IP)
      • by retransmission, just like DNS
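The route computation step can be sketched as follows: each router runs Dijkstra's algorithm on the full topology and keeps, for every destination, the first hop of the shortest path as its forwarding table entry. The topology and link costs in `GRAPH` are made-up example values, and the function name is hypothetical.

```python
import heapq

# Hypothetical topology known to every node: node -> {neighbour: link cost}.
GRAPH = {
    "u": {"v": 2, "x": 1, "w": 5},
    "v": {"u": 2, "x": 2, "w": 3},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"x": 1, "w": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}

def dijkstra(graph, source):
    """Return (distance to each node, first hop towards each node) from source."""
    dist = {source: 0}
    first_hop = {}                         # destination -> neighbour to forward to
    heap = [(0, source, None)]             # (distance so far, node, first hop used)
    done = set()
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in done:
            continue                       # stale entry, a shorter path was found earlier
        done.add(node)
        if hop is not None:
            first_hop[node] = hop
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nbr not in done and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Leaving the source, the first hop is the neighbour itself.
                heapq.heappush(heap, (nd, nbr, nbr if node == source else hop))
    return dist, first_hop

if __name__ == "__main__":
    dist, table = dijkstra(GRAPH, "u")
    for dest in sorted(table):
        print(f"to {dest}: cost {dist[dest]}, forward via {table[dest]}")
```

Because all nodes hold the same topology, each one can run this locally and still end up with consistent paths.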

Distance vector routing

  • distance = hop count
  • vector = next hop
  • by periodically exchanging distance vectors (DVs) with neighbours, each router knows neighbours' distance to destinations
  • each router uses Bellman-Ford algorithm to refine its own DVs.
    • e.g., using neighbour with the shortest distance to a destination as next hop
    • "routing by rumors"
  • LS vs DV
    • LS has the global topology and information, and each router calculates the paths by itself
    • DV just trusts others: routers share information with each other, and each router only knows what its neighbours know
    • the resulting paths may be the same
    • message complexity
      • LS: with n nodes & E links, O(nE) messages sent
      • DV: exchange between neighbours only
    • speed of convergence
      • LS: relatively fast
      • DV: convergence time varies
    • robustness
      • LS: a node can advertise an incorrect link cost, but each node computes only its own table
      • DV: a node can advertise an incorrect path cost, and every other node's table just trusts and uses it (error propagation)
  • RIP (routing information protocol)
    • distance metric: the number of hops (max = 15), each link has cost 1
    • DVs exchanged with neighbours every 30 sec in advertisement messages
    • advertisements sent in UDP segments
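The Bellman-Ford refinement step at one router can be sketched as follows, using RIP-style hop counts. This is a simplified sketch: the table layout, the function name `dv_update` and the router names are assumptions, and RIP features such as timers are left out. The value 16 stands for "unreachable", consistent with the maximum of 15 hops.

```python
INFINITY = 16   # one more than the maximum usable hop count of 15

def dv_update(own: dict, neighbour: str, neighbour_dv: dict, link_cost: int = 1) -> bool:
    """Merge one neighbour's advertised distance vector into our own table.

    own maps destination -> (distance, next hop). Returns True if the table changed.
    """
    changed = False
    for dest, dist in neighbour_dv.items():
        candidate = min(dist + link_cost, INFINITY)
        current_dist, current_hop = own.get(dest, (INFINITY, None))
        better = candidate < current_dist
        # If we already route via this neighbour, we must follow its new distance,
        # even when it got worse ("routing by rumors").
        same_hop_changed = current_hop == neighbour and candidate != current_dist
        if better or same_hop_changed:
            own[dest] = (candidate, neighbour)
            changed = True
    return changed

if __name__ == "__main__":
    table_a = {"A": (0, None), "B": (1, "B")}
    dv_update(table_a, "B", {"A": 1, "B": 0, "C": 1})
    print(table_a)
```

Router A never sees the topology: it learns that C is two hops away only because B claims to be one hop from C.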

Hierarchical routing

  • aggregate routers into regions, called "autonomous systems" (AS)
    • each AS is assigned a unique AS number (originally 16 bits, extended to 32 bits today due to shortage)
    • routers in same AS run same routing protocol
      • "intra-AS" routing protocol, e.g. OSPF, RIP
      • routers in different ASes can run different intra-AS routing protocols
    • ASes must be interconnected via gateway routers
    • forwarding table configured by both intra- and inter-AS routing algorithms
      • intra-AS routing sets entries for internal dests
      • inter-AS & intra-AS routing together set entries for external dests
      • when several ASes can reach a destination, we can use hot potato routing (the shortest way out of the AS) or follow a policy (e.g. for a content provider), etc.
      • two levels are enough today, but it is hard to say for the future

  • BGP (border gateway protocol)

    • currently version 4 (BGP-4)
    • glue that holds the Internet/ASes together
    • tasks
      • for outbound traffic (how to find suitable way to other ASes)
        • obtain subnet reachability information from neighbouring ASes
        • propagate reachability information to all AS-internal routers
        • determine good routes to other networks based on reachability information and policy
      • for inbound traffic
        • advertise subnets that the AS can help to reach
        • so that hosts in outside ASes can reach my AS, make sure the subnets of my AS are visible to them
    • BGP session: two BGP routers ("peers") exchange BGP messages

      • advertising paths to different subnets ("path vector" protocol)
      • exchange over TCP connections (server port 179)
        • reliable data transfer
        • permanent connection, so the connection setup overhead is paid only once
      • categories
        • eBGP (external BGP)
          • logical and TCP connection
          • direct link
        • iBGP (internal BGP)
          • logical and TCP connection
          • share information (purpose)

      • comprehensive example
        • AS3 is willing to carry transit traffic to N1: router 3a advertises path (N1, AS3) to router 1c over an eBGP session
        • 1c applies its IMPORT policies to decide whether it wants to forward packets to N1 via 3a
          • if yes, the forwarding table in 1c is updated to indicate 3a as the next hop for N1
        • based on its EXPORT policies, assume AS1 is willing to carry transit traffic (from other ASes) to N1 (e.g. if AS3 is AS1's customer)
        • AS1 advertises path (N1, AS3, AS1) to AS2 via an eBGP session (note: 1b receives path (N1, AS3) from 1c via an iBGP session)
      • elimination rules
        • local preference (LOCAL_PREF) value attribute: policy decision
        • shortest AS-PATH: AS hops rather than router hops
        • closest NEXT-HOP router: hot potato routing
        • additional criteria: backbone, Tier-1 ISP, etc.
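The elimination rules can be read as a sort key applied in order: highest local preference first, then shortest AS-PATH, then closest NEXT-HOP. The sketch below is an illustration only. The `Route` fields, the cost values and the router names are hypothetical, the "additional criteria" are not modelled, and real BGP implementations apply more tie-breakers than these three.

```python
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    local_pref: int          # policy decision, higher is preferred
    as_path: tuple           # ASes the advertisement passed through
    cost_to_next_hop: int    # intra-AS cost to reach the NEXT-HOP router
    next_hop: str

def best_route(routes):
    """Pick a route by local preference, then AS-PATH length, then hot potato."""
    return min(
        routes,
        key=lambda r: (-r.local_pref, len(r.as_path), r.cost_to_next_hop),
    )

if __name__ == "__main__":
    candidates = [
        Route("N1", 100, ("AS3",), 5, "3a"),
        Route("N1", 100, ("AS2", "AS3"), 1, "2c"),
    ]
    print(best_route(candidates).next_hop)
```

With equal local preference the shorter AS-PATH via 3a wins even though 2c is closer, because the AS-PATH rule is applied before hot potato routing.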

Internet Control Message Protocol (ICMP)

Overview

  • mainly used for error reporting (also echo request/reply, used by ping)
  • a network-layer protocol that sits above IP (ICMP messages are carried in IP datagrams)
    • upper layer protocol = 1
  • ICMP message: type and code

Traceroute and ICMP

  • source sends sets of three UDP segments to dest
    • first set has TTL = 1
    • second set has TTL = 2, etc.
    • unlikely port number (very likely no process is listening on that port at the dest)
  • when the nth set of datagrams arrives at the nth router
    • router discards datagrams
    • and sends source ICMP messages (type 11, code 0)
    • ICMP messages include the name of the router & its IP address
  • when ICMP messages arrive, source records RTTs
  • stopping criteria
    • UDP segment eventually arrives at destination host
    • destination returns ICMP "port unreachable" message (type 3, code 3)
    • source stops
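The TTL logic of traceroute can be followed in a small simulation. This is not a real traceroute, which needs real sockets and usually elevated privileges: the path is a made-up list of names, RTT measurement is left out, and `send_probe` and `traceroute` are hypothetical helpers.

```python
def send_probe(path, ttl):
    """Simulate one UDP probe along path = [router1, ..., routerN, destination].

    Returns (responder, ICMP type, ICMP code).
    """
    for router in path[:-1]:
        ttl -= 1                      # every router decrements the TTL
        if ttl == 0:
            return router, 11, 0      # TTL expired: router discards and reports
    return path[-1], 3, 3             # reached the host: port unreachable

def traceroute(path, max_hops=30, probes_per_set=3):
    """Return the responders discovered hop by hop, ending with the destination."""
    hops = []
    for ttl in range(1, max_hops + 1):
        replies = [send_probe(path, ttl) for _ in range(probes_per_set)]
        responder, icmp_type, icmp_code = replies[0]
        hops.append(responder)
        if (icmp_type, icmp_code) == (3, 3):
            break                     # stopping criterion: destination answered
    return hops

if __name__ == "__main__":
    print(traceroute(["r1", "r2", "r3", "dest"]))
```

The source never asks routers for their identity directly; it provokes an error at each hop and reads the sender of the ICMP reply.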

Datacenter Networks

Load balancer: application-layer routing

  • receives external client requests
  • directs workload within datacenter
    • datacenter TCP (specifically)
  • returns results to external client (hiding datacenter internals from client)