VXLAN — Virtual Extensible LAN

How VXLAN tunnels Layer 2 Ethernet frames over a Layer 3 IP network to build scalable overlay segments.

Overview

Traditional VLANs (802.1Q) are limited to 4,094 IDs — inadequate for large multi-tenant data centers where thousands of isolated Layer 2 segments are needed. VLANs also require every switch in the broadcast domain to participate in the same Spanning Tree instance, creating scalability problems across large fabrics.

VXLAN (Virtual Extensible LAN, RFC 7348) solves both problems by encapsulating Layer 2 Ethernet frames inside UDP packets and tunneling them over an existing Layer 3 IP network (the underlay). The virtual Layer 2 network carried inside is the overlay. The VXLAN ID space is 24 bits — supporting up to 16,777,216 isolated segments.
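The ID-space arithmetic behind those numbers is straightforward:

```python
VLAN_ID_BITS = 12
VNI_BITS = 24

usable_vlans = 2**VLAN_ID_BITS - 2   # IDs 0 and 4095 are reserved, leaving 4094
vxlan_segments = 2**VNI_BITS         # 16,777,216 VNIs

print(usable_vlans, vxlan_segments)  # 4094 16777216
```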

Core Concepts

| Term | Meaning |
| --- | --- |
| VNI | VXLAN Network Identifier — 24-bit segment ID, equivalent to a VLAN ID |
| VTEP | VXLAN Tunnel Endpoint — encapsulates outgoing and decapsulates incoming VXLAN traffic |
| Underlay | The physical IP/routed network connecting all VTEPs |
| Overlay | The virtual Layer 2 segment carried inside VXLAN tunnels |
| BUM traffic | Broadcast, Unknown unicast, Multicast — requires special handling |

VTEPs can be physical switches (top-of-rack hardware), hypervisor vSwitches (VMware, OVS), or router software. In VMware NSX-T the VTEP runs directly in the hypervisor kernel; AWS VPC implements the equivalent encapsulation in Nitro hardware.

Encapsulation Format

VXLAN adds four header layers around the original Ethernet frame:

VXLAN Encapsulated Packet:

  Outer Ethernet Header (14B)
  Outer IP Header (20B)
  Outer UDP Header — dst port 4789 (8B)
  VXLAN Header (8B)
  Inner Ethernet Header (14B)
  Inner IP + Payload

The VXLAN header:

VXLAN Header (8 bytes):

  Flags (8 bits, RRRRIRRR — I bit = VNI valid)
  Reserved (24 bits)
  VNI (24 bits)
  Reserved (8 bits)

The I flag must be set to indicate the VNI field is valid. The outer IP headers carry the packet between VTEPs — the underlay sees nothing but IP/UDP traffic and has no awareness of the inner Ethernet frames or VNIs.

Total overhead: 50 bytes (14 outer Eth + 20 outer IP + 8 UDP + 8 VXLAN). If the underlay MTU is 1500 bytes, inner frames must be ≤ 1450 bytes. For standard 1500-byte inner frames, underlay MTU must be ≥ 1550 bytes. Most data center fabrics use jumbo frames (MTU 9000) to accommodate this without fragmentation.
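As an illustration, the 8-byte header layout and the MTU arithmetic can be sketched in Python (field layout per RFC 7348; this is a minimal model, not a full encapsulator):

```python
import struct

VXLAN_PORT = 4789
OVERHEAD = 14 + 20 + 8 + 8   # outer Eth + outer IP + outer UDP + VXLAN = 50 bytes

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags byte with the I bit set,
    24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    flags_word = 0x08 << 24      # RRRRIRRR with I=1, followed by 24 reserved bits
    vni_word = vni << 8          # 24-bit VNI, followed by 8 reserved bits
    return struct.pack("!II", flags_word, vni_word)

hdr = vxlan_header(5000)
print(len(hdr))                  # 8 bytes; the VNI sits in bytes 4-6
print(1500 - OVERHEAD)           # max inner frame over a 1500-byte underlay: 1450
```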

Unicast Packet Flow (Known MAC)

VM-A (behind VTEP-1) sends a frame to VM-B (behind VTEP-2), VNI 5000:

  1. VM-A sends a normal Ethernet frame: Eth(MAC-A→MAC-B) / IP(10.0.0.1→10.0.0.2).
  2. VTEP-1 encapsulates it: outer IP src=VTEP-1, dst=VTEP-2, VNI=5000, UDP dst port 4789.
  3. The underlay routes the packet like any other IP traffic.
  4. The packet arrives at VTEP-2, which strips the outer Eth + IP + UDP + VXLAN headers.
  5. The inner frame is delivered to VM-B.
  6. VTEP-2 learns: MAC-A is reachable via VTEP-1.

VTEP-2 caches the mapping of MAC-A → VTEP-1 in its local VTEP MAC table. Subsequent frames to MAC-A are encapsulated directly to VTEP-1 without any flooding.
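The learning step amounts to a table keyed on (VNI, MAC). A minimal Python model of flood-and-learn state (not any vendor's implementation):

```python
class VtepMacTable:
    """Minimal model of a VTEP's flood-and-learn MAC table."""

    def __init__(self):
        self.table = {}   # (vni, mac) -> remote VTEP IP

    def learn(self, vni: int, inner_src_mac: str, outer_src_ip: str) -> None:
        # On decapsulation: the inner source MAC is reachable via the
        # VTEP whose IP appears as the outer source address.
        self.table[(vni, inner_src_mac)] = outer_src_ip

    def lookup(self, vni: int, dst_mac: str):
        # Known MAC: encapsulate directly to its VTEP.
        # None: unknown unicast, fall back to BUM flooding.
        return self.table.get((vni, dst_mac))

t = VtepMacTable()
t.learn(5000, "aa:bb:cc:00:00:01", "192.168.1.1")   # frame from VM-A via VTEP-1
print(t.lookup(5000, "aa:bb:cc:00:00:01"))           # -> 192.168.1.1
print(t.lookup(5000, "aa:bb:cc:00:00:99"))           # -> None (flood)
```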

BUM Traffic Handling

When a VTEP receives a frame for an unknown MAC (or broadcast/multicast), it must flood to all VTEPs in the same VNI. Three approaches:

1. Multicast-Based Flooding

Each VNI maps to an IP multicast group. VTEPs join the group using PIM in the underlay. BUM traffic is sent to the multicast group address and the underlay replicates it to all group members.

Requires: Multicast-capable underlay (PIM, IGMP). Common in campus and some data center deployments.
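A sketch of one possible VNI-to-group mapping. The 239.1.0.0 base and the low-16-bit embedding below are hypothetical choices for illustration; real fabrics configure the mapping explicitly per VNI:

```python
import ipaddress

def vni_to_mcast_group(vni: int, base: str = "239.1.0.0") -> str:
    # Hypothetical scheme: add the low 16 bits of the VNI to an
    # administratively scoped (239/8) base address. BUM frames for the
    # VNI are sent to this group, which member VTEPs join via IGMP/PIM.
    base_int = int(ipaddress.IPv4Address(base))
    return str(ipaddress.IPv4Address(base_int + (vni & 0xFFFF)))

print(vni_to_mcast_group(5000))   # 239.1.19.136
```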

2. Ingress Replication (Unicast Flooding)

The sending VTEP maintains a flood list — a list of all remote VTEPs in the VNI — and sends a separate unicast copy of the BUM frame to each. More CPU-intensive but requires no multicast support in the underlay.

Common in: Cloud environments (AWS VPC, Azure VNet) where underlay multicast is unavailable.
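Ingress replication reduces to a loop over the flood list. A minimal sketch, where the send callback stands in for encapsulate-and-transmit:

```python
def flood_bum(frame: bytes, vni: int, flood_lists: dict, send) -> int:
    """Unicast one copy of a BUM frame to every remote VTEP in the VNI.

    flood_lists maps VNI -> list of remote VTEP IPs; send(vtep_ip, frame)
    stands in for VXLAN encapsulation plus a UDP transmit.
    """
    remotes = flood_lists.get(vni, [])
    for vtep_ip in remotes:
        send(vtep_ip, frame)
    return len(remotes)   # number of unicast copies produced

sent = []
n = flood_bum(b"\xff" * 64, 5000,
              {5000: ["192.168.1.2", "192.168.1.3"]},
              lambda ip, f: sent.append(ip))
print(n, sent)   # 2 copies, one per remote VTEP
```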

3. BGP EVPN Control Plane (No Flooding)

Modern deployments use BGP EVPN (RFC 7432) to distribute MAC/IP reachability before traffic flows. VTEPs advertise their MAC and IP bindings as BGP EVPN routes. Flood-and-learn is eliminated entirely.

  1. VM-B comes online; VTEP-2 immediately advertises an EVPN Type-2 route:
     MAC=VM-B, IP=10.0.0.2, VTEP=192.168.1.2, VNI=5000.
  2. The BGP route reflector distributes the Type-2 update to all other VTEPs.
  3. VTEP-1 encapsulates the first frame destined for VM-B directly to VTEP-2.
     No ARP flooding: VTEP-1 already knows the destination.

BGP EVPN also distributes ARP suppression information — VTEPs answer ARP requests locally from their EVPN-learned MAC/IP table, eliminating broadcast ARP from crossing the underlay entirely.
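ARP suppression is a local table lookup performed before any flooding. A minimal sketch of the decision, with bindings populated from EVPN Type-2 routes:

```python
def handle_arp_request(target_ip: str, evpn_bindings: dict):
    # evpn_bindings: IP -> MAC, populated from EVPN Type-2 routes.
    mac = evpn_bindings.get(target_ip)
    if mac is not None:
        return ("reply-locally", mac)   # VTEP answers; no broadcast crosses the underlay
    return ("flood", None)              # unknown binding: fall back to BUM handling

bindings = {"10.0.0.2": "aa:bb:cc:00:00:02"}
print(handle_arp_request("10.0.0.2", bindings))
print(handle_arp_request("10.0.0.9", bindings))
```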

BGP EVPN Route Types

| Type | Name | Purpose |
| --- | --- | --- |
| Type 1 | Ethernet Auto-Discovery | Multi-homing fast convergence |
| Type 2 | MAC/IP Advertisement | MAC and IP binding — most common |
| Type 3 | Inclusive Multicast | BUM traffic handling (flood lists) |
| Type 4 | Ethernet Segment | Multi-homing designated forwarder election |
| Type 5 | IP Prefix | Route leaking between VNIs (inter-VRF) |

Underlay Requirements

The underlay must:

  1. Route between all VTEPs — standard IP routing (OSPF, BGP) is sufficient
  2. Support MTU ≥ 1550 (or use jumbo frames for full 9000-byte inner payloads)
  3. Hash ECMP flows on the inner packet — load balancing across multiple paths

ECMP hashing is critical. Older hardware hashed only on outer IP headers, placing all traffic between two VTEPs on one link. Modern hardware and software VTEPs use the UDP source port as an entropy field — the source port is derived from a hash of the inner frame’s 5-tuple, ensuring good load balancing across ECMP paths.
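The entropy trick can be sketched as hashing the inner flow's 5-tuple into the ephemeral port range. The hash function here is illustrative; hardware and kernel implementations use their own:

```python
import hashlib

def entropy_src_port(src_ip, dst_ip, src_port, dst_port, proto,
                     lo=49152, hi=65535) -> int:
    # Derive the outer UDP source port from the inner 5-tuple so that
    # per-flow ECMP hashing on the outer headers spreads distinct inner
    # flows across underlay paths, while keeping each flow on one path.
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:2], "big")
    return lo + h % (hi - lo + 1)

p = entropy_src_port("10.0.0.1", "10.0.0.2", 33333, 443, "tcp")
print(p)   # deterministic per flow, always within 49152-65535
```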

VXLAN in Practice

| Platform | VTEP implementation |
| --- | --- |
| VMware NSX-T | Hypervisor kernel (N-VDS) |
| AWS VPC | Nitro hypervisor hardware |
| Proxmox VE | Linux kernel VXLAN interface (ip link add vxlan0) |
| Cisco Nexus (NX-OS) | Hardware ASIC |
| Open vSwitch | Software VTEP (OpenFlow-programmable) |

VXLAN with BGP EVPN is the standard for modern spine-leaf data center fabrics and underlies virtually all large-scale cloud virtual networking.