GCP — Load Balancing and CDN


GCP's load balancing portfolio — global vs regional, HTTP(S), TCP, UDP, and internal load balancers, plus Cloud CDN for caching.


Overview

Google Cloud’s load balancing portfolio is broader than it first appears. Rather than a single “load balancer” service with configuration options, GCP provides seven distinct load balancer types, each purpose-built for a specific combination of scope (global vs regional), protocol (HTTP/HTTPS, TCP, UDP), and traffic direction (external vs internal). Understanding which product to reach for — and why — is as important as knowing how to configure any single one of them.

All GCP load balancers are software-defined, fully managed, and require no pre-warming. They scale to meet demand automatically. The key architectural split is between proxy-mode load balancers (which terminate the client connection and open a new connection to the backend) and the pass-through Network Load Balancer (which forwards packets without terminating connections, preserving the original client IP).


The Seven Load Balancer Types

GCP categorises its load balancers across three dimensions: scope (global or regional), protocol, and traffic direction (external or internal).

| Load Balancer | Scope | Protocol | Direction | Mode |
| --- | --- | --- | --- | --- |
| Global External HTTP(S) | Global | HTTP, HTTPS, HTTP/2, gRPC | External | Proxy |
| Regional External HTTP(S) | Regional | HTTP, HTTPS | External | Proxy |
| External SSL Proxy | Global | SSL/TLS (non-HTTP) | External | Proxy |
| External TCP Proxy | Global | TCP (non-HTTP, non-SSL) | External | Proxy |
| External Network TCP/UDP | Regional | TCP, UDP, ESP, ICMP, SCTP | External | Pass-through |
| Internal HTTP(S) | Regional | HTTP, HTTPS | Internal | Proxy |
| Internal TCP/UDP | Regional | TCP, UDP | Internal | Pass-through |

The External Network TCP/UDP and Internal TCP/UDP load balancers are pass-through — they do not terminate connections. The client IP is preserved at the backend, which matters for applications that log source IPs or make per-IP decisions. All other types terminate the connection at the load balancer and open a new one to the backend.

Global load balancers require Premium Network Tier, which routes traffic across Google’s private backbone from the nearest edge Point of Presence (POP) to the backend. Regional load balancers work with Standard Network Tier, where traffic uses the public internet for some legs of its journey.


Global External HTTP(S) Load Balancer

This is the flagship product and the most commonly deployed load balancer for internet-facing web applications. Its defining properties:

- A single global anycast IP address: clients worldwide connect to the same address and are routed to the nearest Google edge POP.
- Layer 7 awareness: requests are routed by host and path through a URL map.
- Cross-region backends: traffic goes to the closest healthy backend region, with automatic failover when a region degrades.
- Integration points: Cloud CDN for caching, Cloud Armor for WAF and DDoS policies, and Google-managed SSL certificates.

Component Architecture

A Global External HTTP(S) Load Balancer is assembled from several distinct components:

Forwarding rule — the entry point. Binds an IP address, port, and protocol to a target proxy. There is one forwarding rule for HTTP (port 80) and one for HTTPS (port 443) in a typical deployment.

Target HTTP(S) proxy — accepts incoming connections and references a URL map. For HTTPS, it also references an SSL certificate (Google-managed or self-managed).

URL map — the routing table. Defines host and path rules that map requests to backend services. A default service handles all traffic that does not match a specific rule.

Backend service — references one or more backend groups (Managed Instance Groups or Network Endpoint Groups), plus a health check. The backend service also holds configuration for session affinity, connection draining timeout, and Cloud CDN settings.

Health check — periodically probes backend instances to determine if they can serve traffic. Unhealthy instances are removed from the rotation. Health check probes originate from Google’s health checking IP ranges — firewall rules must allow these.
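The components above are assembled bottom-up: health check, backend service, URL map, proxy, forwarding rule. A minimal gcloud sketch, assuming a managed instance group named web-mig already exists in us-central1-a (all resource names here are illustrative):

```shell
# Health check: probe /healthz on port 80
gcloud compute health-checks create http web-hc \
    --port=80 --request-path=/healthz

# Allow Google's health-checking ranges through the firewall
gcloud compute firewall-rules create allow-health-checks \
    --allow=tcp:80 --source-ranges=130.211.0.0/22,35.191.0.0/16

# Global backend service wired to the health check
gcloud compute backend-services create web-backend \
    --protocol=HTTP --health-checks=web-hc --global

# Attach the managed instance group as a backend
gcloud compute backend-services add-backend web-backend \
    --instance-group=web-mig --instance-group-zone=us-central1-a --global

# URL map: send all traffic to the default backend service
gcloud compute url-maps create web-map --default-service=web-backend

# Target proxy referencing the URL map
gcloud compute target-http-proxies create web-proxy --url-map=web-map

# Forwarding rule: the public entry point on port 80
gcloud compute forwarding-rules create web-fr \
    --global --target-http-proxy=web-proxy --ports=80
```

An HTTPS deployment adds a target-https-proxy referencing an SSL certificate and a second forwarding rule on port 443.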


SSL Proxy and TCP Proxy Load Balancers

These are global, proxy-mode load balancers for non-HTTP protocols that still benefit from global anycast routing.

External SSL Proxy terminates SSL/TLS and then proxies the unencrypted (or re-encrypted) TCP connection to backends. Used for protocols like IMAPS, SMTPS, or any TLS-wrapped TCP service that is not HTTP. Supports certificate management and SSL policy enforcement (minimum TLS version, cipher suites).

External TCP Proxy is the same idea without SSL — it proxies raw TCP connections globally. Useful for protocols that need global reach but are not HTTP and do not use TLS at the load balancer layer. Applications must handle their own encryption end-to-end if needed.

Both products provide the same global anycast IP and Google backbone routing advantage as the HTTP(S) load balancer, but for non-HTTP workloads.
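An SSL Proxy deployment reuses the backend service and forwarding rule pattern, swapping in a target SSL proxy. A sketch for an IMAPS service, assuming a health check mail-hc and a certificate resource mail-cert already exist:

```shell
# Backend service speaks TCP to backends; TLS terminates at the proxy
gcloud compute backend-services create mail-backend \
    --protocol=TCP --health-checks=mail-hc --global

# SSL proxy referencing the backend service and certificate
gcloud compute target-ssl-proxies create mail-proxy \
    --backend-service=mail-backend --ssl-certificates=mail-cert

# Global forwarding rule on the IMAPS port
gcloud compute forwarding-rules create mail-fr \
    --global --target-ssl-proxy=mail-proxy --ports=993
```

A TCP Proxy deployment is identical except it uses target-tcp-proxies and needs no certificate.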


External Network TCP/UDP Load Balancer

This is GCP’s pass-through regional load balancer. Packets are forwarded directly to backend instances without connection termination. Key characteristics:

- Pass-through: the backend sees the original client source IP and responds directly to the client.
- Regional scope: backends live in a single region; there is no global anycast routing.
- Broad protocol support: TCP, UDP, ESP, ICMP, and SCTP.
- No L7 features: no URL routing, SSL offloading, or Cloud CDN integration.

Common use cases include gaming servers (UDP, low latency, client IP matters), VoIP, and applications where the backend needs the real client IP for logging or policy enforcement.
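A pass-through UDP deployment for a game server might look like the following sketch, assuming a managed instance group game-mig in us-east1-b (UDP has no handshake, so the health check probes a TCP port on the same instances):

```shell
# Regional TCP health check (UDP backends are health-checked over TCP or HTTP)
gcloud compute health-checks create tcp game-hc \
    --port=7777 --region=us-east1

# Regional external backend service for UDP
gcloud compute backend-services create game-backend \
    --load-balancing-scheme=EXTERNAL --protocol=UDP \
    --health-checks=game-hc --health-checks-region=us-east1 \
    --region=us-east1
gcloud compute backend-services add-backend game-backend \
    --instance-group=game-mig --instance-group-zone=us-east1-b \
    --region=us-east1

# Regional forwarding rule: packets pass through to the backends
gcloud compute forwarding-rules create game-fr \
    --region=us-east1 --load-balancing-scheme=EXTERNAL \
    --ip-protocol=UDP --ports=7777 --backend-service=game-backend
```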


Internal Load Balancers

Internal load balancers handle traffic that stays within a VPC and never reaches the internet. They are used for microservice-to-microservice communication, splitting traffic across backend tiers, and building internal service meshes.

Internal TCP/UDP Load Balancer

Regional, pass-through load balancer for internal VPC traffic. Assigns an internal RFC 1918 IP address (called a forwarding rule IP or an Internal Load Balanced IP) that other VMs in the VPC can reach. Backends are Managed Instance Groups in the same region. Because it is pass-through, the backend sees the actual client IP. Used for internal TCP or UDP services — database proxies, internal APIs over custom ports, UDP-based internal services.
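A sketch of an internal TCP forwarding setup for a database proxy tier, assuming a health check db-hc and instance group db-mig already exist (names and region are illustrative):

```shell
# Internal backend service in the region
gcloud compute backend-services create db-proxy-backend \
    --load-balancing-scheme=INTERNAL --protocol=TCP \
    --health-checks=db-hc --region=us-central1
gcloud compute backend-services add-backend db-proxy-backend \
    --instance-group=db-mig --instance-group-zone=us-central1-f \
    --region=us-central1

# Forwarding rule with an internal VPC address on the database port
gcloud compute forwarding-rules create db-proxy-fr \
    --region=us-central1 --load-balancing-scheme=INTERNAL \
    --network=default --subnet=default \
    --ip-protocol=TCP --ports=5432 --backend-service=db-proxy-backend
```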

Internal HTTP(S) Load Balancer

Regional, proxy-mode load balancer for internal HTTP and HTTPS traffic. Provides the same URL map, backend service, and health check architecture as the external HTTP(S) load balancer, but with an internal IP address accessible only within the VPC (and connected networks via Shared VPC or VPC peering). The proxy layer adds features like path-based routing, header-based routing, and traffic splitting — useful for internal microservice architectures and service mesh deployments that need L7 routing without external exposure.


Choosing the Right Load Balancer

| Requirement | Recommended Load Balancer |
| --- | --- |
| Global web app, HTTPS, URL routing | Global External HTTP(S) |
| Regional web app, HTTPS | Regional External HTTP(S) |
| Global TCP, non-HTTP, with SSL offloading | External SSL Proxy |
| Global TCP, non-HTTP, no SSL offloading | External TCP Proxy |
| UDP traffic, external, client IP required | External Network TCP/UDP |
| Internal microservices, L7 routing | Internal HTTP(S) |
| Internal TCP/UDP, client IP required | Internal TCP/UDP |

The most common mistake is using the External Network TCP/UDP load balancer for HTTP workloads — it works, but you lose URL routing, CDN integration, Cloud Armor, and SSL offloading. The Global External HTTP(S) load balancer is almost always the right choice for web workloads.


Cloud CDN

Cloud CDN caches content at Google’s globally distributed edge POPs, reducing latency for end users and reducing load on origin backends. It integrates exclusively with the Global External HTTP(S) Load Balancer — it is not available with other load balancer types.

How Caching Works

When a request arrives at a Google edge POP, Cloud CDN checks whether it has a cached response for that request. If it does (a cache hit), it returns the cached response immediately without contacting the origin backend. If it does not (a cache miss), it forwards the request to the backend, caches the response (if the response headers permit caching), and returns it to the client.

Cache eligibility is determined by response headers:

- Cache-Control: public with a max-age (or s-maxage) directive marks a response as cacheable.
- Cache-Control: private, no-store, or no-cache prevents caching (unless FORCE_CACHE_ALL overrides it).
- Responses that set cookies (Set-Cookie) are not cached.
- Only responses to GET requests are eligible.

Cache Modes

Cloud CDN offers three cache modes that control how aggressively content is cached:

| Cache Mode | Behaviour |
| --- | --- |
| USE_ORIGIN_HEADERS (default) | Caches only if the origin sends explicit Cache-Control headers permitting caching. |
| CACHE_ALL_STATIC | Automatically caches static content types (CSS, JS, images, fonts) regardless of Cache-Control headers. Dynamic responses still require explicit cache headers. |
| FORCE_CACHE_ALL | Caches all successful responses regardless of origin headers. Origin Cache-Control: no-store is ignored. Use with care for truly dynamic content. |
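Cloud CDN and its cache mode are both settings on the backend service. A sketch, assuming an existing global backend service named web-backend:

```shell
# Enable Cloud CDN on the backend service
gcloud compute backend-services update web-backend \
    --enable-cdn --global

# Cache static content types even when the origin sends no cache headers,
# with a default TTL of one hour
gcloud compute backend-services update web-backend \
    --cache-mode=CACHE_ALL_STATIC --default-ttl=3600 --global
```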

Cache Keys

Cloud CDN builds a cache key for each request; the key determines whether a new request matches a cached entry. By default the cache key includes the full URL (protocol, host, path, query string). You can customise cache keys to:

- Exclude the protocol or host, so identical content served over HTTP and HTTPS (or from several hostnames) shares one cache entry.
- Include or exclude the query string entirely.
- Include only a whitelist (or exclude a blacklist) of query parameters, so parameters that do not affect the response, such as analytics tags, do not fragment the cache.
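Cache-key settings also live on the backend service. A sketch, assuming a CDN-enabled backend service web-backend and illustrative parameter names:

```shell
# Ignore the query string entirely when building cache keys
gcloud compute backend-services update web-backend \
    --no-cache-key-include-query-string --global

# Or: key only on the parameters that actually affect the response
gcloud compute backend-services update web-backend \
    --cache-key-query-string-whitelist=page,lang --global
```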

Cache Invalidation

When content at the origin changes before it expires in the CDN, you can invalidate cached entries:

- By exact path (e.g. /logo.png) or by a path pattern ending in /* (e.g. /images/*).
- Through the Cloud Console or the gcloud compute url-maps invalidate-cdn-cache command.
- Invalidation requests are rate-limited, so they are intended for occasional corrections rather than routine cache management.

Invalidations propagate to edge POPs quickly but are not instantaneous. Setting appropriate max-age values in Cache-Control headers is a better long-term strategy than relying on invalidation.
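Invalidation is issued against the URL map, not the backend service. A sketch, assuming a URL map named web-map:

```shell
# Invalidate a single object
gcloud compute url-maps invalidate-cdn-cache web-map --path=/logo.png

# Invalidate everything under a prefix (the wildcard may only appear at the end)
gcloud compute url-maps invalidate-cdn-cache web-map --path="/images/*"
```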

Signed URLs and Signed Cookies

For content that should only be accessible to authorised users (video streams, downloadable files, private assets), Cloud CDN supports:

- Signed URLs: a signature and expiry time are appended to an individual URL, granting time-limited access to a single resource.
- Signed cookies: one signed cookie grants time-limited access to a set of related resources (for example, all the segments of a video stream) without signing every URL.
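Signed URLs require a signing key attached to the CDN-enabled backend service. A sketch, assuming a backend service web-backend and an illustrative URL:

```shell
# Generate a 128-bit base64url key and attach it to the backend service
head -c 16 /dev/urandom | base64 | tr +/ -_ > sign.key
gcloud compute backend-services add-signed-url-key web-backend \
    --key-name=key1 --key-file=sign.key --global

# Produce a URL that is valid for one hour
gcloud compute sign-url "https://example.com/video/seg1.ts" \
    --key-name=key1 --key-file=sign.key --expires-in=1h
```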


Network Endpoint Groups (NEGs)

A Network Endpoint Group is a configuration object that specifies a group of backend endpoints rather than a group of VMs. NEGs allow load balancers to send traffic to targets that are not Compute Engine VMs.

| NEG Type | Backend Target | Use Case |
| --- | --- | --- |
| Zonal NEG | VM instances or GKE pods (container-native load balancing) | Direct-to-pod routing in GKE, bypasses kube-proxy |
| Internet NEG | External hostname or IP (on-premises or other cloud) | Hybrid backends, third-party APIs as backends |
| Serverless NEG | Cloud Run services, Cloud Functions, App Engine | Route load balancer traffic to serverless workloads |
| Hybrid connectivity NEG | On-premises endpoints via Cloud Interconnect or VPN | Extend GCP load balancing to on-premises servers |

Serverless NEGs are especially useful — they allow you to place a Global External HTTP(S) Load Balancer in front of Cloud Run or Cloud Functions, gaining URL routing, Cloud Armor, and Cloud CDN without any Compute Engine infrastructure. The load balancer terminates SSL, evaluates security policies, and routes to the appropriate serverless backend.
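Putting a load balancer in front of Cloud Run follows the same backend service pattern, with the serverless NEG standing in for an instance group and no health check needed (Cloud Run manages its own). A sketch with an assumed Cloud Run service named my-api:

```shell
# Serverless NEG pointing at the Cloud Run service
gcloud compute network-endpoint-groups create run-neg \
    --region=us-central1 --network-endpoint-type=serverless \
    --cloud-run-service=my-api

# Global backend service with the NEG as its backend
gcloud compute backend-services create run-backend \
    --load-balancing-scheme=EXTERNAL_MANAGED --global
gcloud compute backend-services add-backend run-backend \
    --network-endpoint-group=run-neg \
    --network-endpoint-group-region=us-central1 --global
```

From here the URL map, target proxy, and forwarding rule are created exactly as for VM backends.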

Zonal NEGs with container-native load balancing (also called direct-path or pod-native) route traffic directly to individual GKE pods rather than to node-level NodePort services. This eliminates a layer of indirection, reduces latency, and allows health checks to probe individual pods rather than nodes.


Network Tier Consideration

GCP offers two network tiers that affect how traffic routes to your load balancer:

Premium Tier — traffic enters Google’s network at the nearest edge POP and traverses Google’s private global backbone to the region hosting your backends. This provides the lowest latency and the best reliability. Required for global load balancers.

Standard Tier — traffic uses the public internet from the user to a Google POP in the same region as your backends. Lower cost but higher and more variable latency. Only supports regional load balancers.

For latency-sensitive global applications, Premium Tier with the Global External HTTP(S) Load Balancer is the standard choice. For cost-sensitive regional applications where users are geographically close to the backend region, Standard Tier with a Regional External HTTP(S) Load Balancer is a reasonable trade-off.
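The tier is selected per resource. A sketch of reserving a Standard Tier address for a regional load balancer, assuming a regional backend service web-backend exists in us-west1 (the address and the forwarding rule that uses it must be in the same tier):

```shell
# Reserve a Standard Tier external address in the backend region
gcloud compute addresses create web-ip \
    --region=us-west1 --network-tier=STANDARD

# Use it on a regional forwarding rule in the same tier
gcloud compute forwarding-rules create web-std-fr \
    --region=us-west1 --network-tier=STANDARD --address=web-ip \
    --load-balancing-scheme=EXTERNAL --ports=80 \
    --backend-service=web-backend
```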

