Overview
Google Kubernetes Engine (GKE) is a fully managed Kubernetes service on GCP. Kubernetes grew out of Google’s experience running Borg, the internal cluster manager that underpins much of Google’s own production infrastructure, and was open-sourced by Google in 2014. As the project’s originator, Google has integrated GKE tightly with the broader GCP platform: IAM, Cloud Monitoring, Cloud Logging, Cloud Load Balancing, and Artifact Registry all have native GKE integrations.
Kubernetes itself is an open-source container orchestration system. It manages the deployment, scaling, and operation of containerised workloads across a cluster of machines. GKE takes the operational burden of running a Kubernetes control plane off your hands and integrates deeply with GCP’s networking, storage, and security primitives.
GKE vs Self-Managed Kubernetes
Running Kubernetes yourself (on Compute Engine VMs, for example) gives you full control but requires managing:
- etcd cluster health and backups
- Control plane upgrades (kube-apiserver, kube-controller-manager, kube-scheduler)
- Node OS patching
- Certificate rotation
- High availability of the control plane across zones
- Integration with cloud provider networking and storage
GKE handles all of this automatically. The GKE control plane is Google-managed, Google-operated, and covered by a 99.5% (zonal) or 99.95% (regional) uptime SLA. You focus on workloads; Google manages the Kubernetes machinery.
Cluster Modes: Standard vs Autopilot
GKE offers two cluster modes with fundamentally different operational models.
Standard Mode
In Standard mode, you manage node pools — groups of Compute Engine VMs that form the cluster’s worker nodes. You choose the machine type, OS image, disk size, and autoscaling settings for each node pool. You pay per node (per VM), regardless of how much of each VM’s capacity your pods actually use.
Standard mode is appropriate when:
- You need privileged containers or DaemonSets that require host-level access
- You have specific GPU or TPU node requirements
- Your workloads have specialised node configuration requirements
- You need fine-grained control over node pool configuration for cost or performance reasons
Autopilot Mode
In Autopilot mode, Google manages the nodes entirely. You do not see or manage node pools. You declare pod resource requests and GKE provisions the right amount of compute automatically. You pay per pod resource request (CPU and memory), not per node.
| Feature | Standard Mode | Autopilot Mode |
|---|---|---|
| Node management | You manage node pools | Google manages nodes |
| Billing unit | Per node (VM) | Per pod resource request |
| Privileged containers | Allowed | Not allowed |
| DaemonSets | Allowed | Restricted |
| Custom node images | Allowed | Not available |
| Cluster Autoscaler | Optional | Always enabled |
| Cost for idle capacity | You pay for empty nodes | No charge for unscheduled capacity |
| Use case | Complex or custom workloads | Simpler deployments, predictable billing |
Autopilot is excellent for teams that want to focus on application code without platform engineering overhead. It removes the need to right-size node pools, manage node pool upgrades, or worry about bin-packing efficiency.
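Creating an Autopilot cluster takes a single command; there are no machine-type or node-count flags because Google manages the nodes. A minimal example:
# Create an Autopilot cluster (Autopilot clusters are always regional)
gcloud container clusters create-auto my-autopilot-cluster \
    --region=us-central1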
Zonal vs Regional Clusters
Zonal Clusters
A zonal cluster has its control plane in a single zone. Worker nodes can be spread across multiple zones if you configure multi-zone node pools, but the control plane itself is a single point of failure. If the zone hosting the control plane has an outage, the Kubernetes API becomes unavailable — existing workloads may continue running (kubelet operates independently) but you cannot deploy, scale, or make configuration changes.
Zonal clusters have a 99.5% uptime SLA for the control plane.
Regional Clusters
A regional cluster replicates the control plane across three zones within the selected region. All three replicas serve API traffic simultaneously; if any zone fails, the remaining two control plane replicas continue operating. Worker nodes are also spread across three zones by default.
Regional clusters have a 99.95% uptime SLA. They are the correct choice for any production workload where cluster availability matters.
# Create a regional GKE cluster; --num-nodes is per zone (2 x 3 zones = 6 nodes)
gcloud container clusters create my-cluster \
--region=us-central1 \
--num-nodes=2 \
--machine-type=n2-standard-4 \
--release-channel=regular
Node Pools
A node pool is a group of nodes within a cluster that all share the same configuration (machine type, OS, disk, labels, and taints). A cluster can have multiple node pools with different configurations, allowing mixed workloads on the same cluster.
Common multi-pool patterns:
- Default pool — general-purpose N2 nodes for most workloads
- GPU pool — A2 or G2 nodes for ML inference or training jobs (created on demand)
- High-memory pool — M-series nodes for memory-intensive analytics workloads
- Spot pool — Spot VMs for fault-tolerant batch workloads at reduced cost
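The Spot pool above, for example, could be added to an existing cluster like this (the pool name, machine type, and autoscaling bounds are illustrative):
# Add an autoscaling Spot VM node pool for batch workloads
gcloud container node-pools create spot-pool \
    --cluster=my-cluster \
    --region=us-central1 \
    --machine-type=n2-standard-8 \
    --spot \
    --enable-autoscaling --min-nodes=0 --max-nodes=20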
Pools are typically tainted so that only intended workloads land on them; pods then use node selectors to target a specific pool and tolerations to be scheduled onto its tainted nodes:
# Pod targeting the GPU node pool (the image is a placeholder)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: gpu-pool
  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nvidia/cuda:12.4.0-base-ubuntu22.04
Node Auto-Provisioning
Node Auto-Provisioning (NAP) is an extension of the Cluster Autoscaler that automatically creates and deletes node pools as needed. Instead of pre-configuring all possible node pools, NAP creates a pool with the right machine type when a pod cannot be scheduled anywhere else. When pods are no longer running on auto-provisioned nodes, NAP removes those nodes and eventually the pool.
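NAP is enabled per cluster, with resource limits that cap the total compute it may provision. A sketch (the limits are illustrative):
# Enable Node Auto-Provisioning with cluster-wide resource limits
gcloud container clusters update my-cluster \
    --region=us-central1 \
    --enable-autoprovisioning \
    --min-cpu=1 --max-cpu=64 \
    --min-memory=4 --max-memory=256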
Key Kubernetes Objects
Understanding the core Kubernetes objects is a prerequisite for working with GKE effectively.
| Object | Purpose |
|---|---|
| Pod | Smallest deployable unit; one or more containers sharing a network namespace and storage |
| Deployment | Manages a ReplicaSet; enables declarative rolling updates and rollbacks |
| StatefulSet | For stateful applications; provides stable network identity and ordered pod management |
| DaemonSet | Ensures one pod runs on every (or selected) node; used for log collectors, monitoring agents |
| Service | Stable network endpoint for a set of pods; types: ClusterIP, NodePort, LoadBalancer |
| Ingress | HTTP(S) routing rules; routes traffic to Services based on host and path |
| PersistentVolumeClaim (PVC) | Request for persistent storage; GKE maps this to a GCP persistent disk |
| ConfigMap / Secret | Inject configuration and sensitive data into pods without embedding in images |
| HorizontalPodAutoscaler | Scales pod replica count based on CPU, memory, or custom metrics |
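To make the table concrete, here is a minimal Deployment exposed by a ClusterIP Service; the names and image are illustrative:
# Minimal Deployment plus ClusterIP Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 80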
Workload Identity
Workload Identity is the recommended way for GKE pods to authenticate to GCP APIs. Without it, pods must either mount a service account key file (a security risk) or inherit the node’s service account (too broad: every pod on the node shares the same identity).
Workload Identity maps a Kubernetes Service Account (KSA) to a GCP Service Account (GSA). Pods that run as the mapped KSA receive tokens that GCP accepts as proof of the GSA’s identity, allowing fine-grained per-workload permissions with no key management.
Setup Steps
- Enable Workload Identity on the cluster
- Create a Kubernetes Service Account
- Create a GCP Service Account with required IAM roles
- Bind the KSA to the GSA
- Annotate the KSA with the GSA email
# Enable Workload Identity on cluster creation
gcloud container clusters create my-cluster \
--workload-pool=my-project.svc.id.goog
# Create the Kubernetes Service Account in its namespace
kubectl create serviceaccount my-ksa --namespace=my-namespace
# Create GCP Service Account
gcloud iam service-accounts create my-workload-sa
# Grant the GCP SA permissions (e.g., Cloud Storage access)
gcloud projects add-iam-policy-binding my-project \
--member="serviceAccount:[email protected]" \
--role="roles/storage.objectViewer"
# Bind the KSA to the GSA
gcloud iam service-accounts add-iam-policy-binding [email protected] \
--role="roles/iam.workloadIdentityUser" \
--member="serviceAccount:my-project.svc.id.goog[my-namespace/my-ksa]"
# Annotate the Kubernetes Service Account with the GSA email
kubectl annotate serviceaccount my-ksa \
    --namespace=my-namespace \
    iam.gke.io/gcp-service-account=my-workload-sa@my-project.iam.gserviceaccount.com
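Once the KSA is annotated, any pod that runs as it authenticates as the GSA. A minimal sketch (the image, command, and bucket are placeholders):
# Pod that authenticates to GCP as the mapped GSA
apiVersion: v1
kind: Pod
metadata:
  name: storage-reader
  namespace: my-namespace
spec:
  serviceAccountName: my-ksa
  containers:
  - name: app
    image: google/cloud-sdk:slim
    command: ["gcloud", "storage", "ls", "gs://my-bucket"]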
GKE Networking
VPC-Native Clusters
GKE clusters should be created as VPC-native clusters (the default for new clusters). In VPC-native mode, pods receive IP addresses from a secondary IP range on the cluster’s subnet — alias IPs. This means pods are first-class citizens in the VPC network, reachable directly from on-premises or other VPCs via VPN/Interconnect/VPC Peering without network address translation.
In contrast, routes-based (legacy) clusters use custom routes, which do not scale as well and are less compatible with VPC features like Shared VPC and VPC Service Controls.
Cluster IP Ranges
Three separate CIDR ranges are required for a GKE cluster:
- Node IP range — the primary subnet range; one IP per node
- Pod IP range (alias IP secondary range) — typically /14 or larger; 256 IPs reserved per node by default (configurable with --max-pods-per-node)
- Service IP range (cluster services secondary range) — for ClusterIP services within the cluster
Plan these ranges carefully before cluster creation — they cannot be changed after the fact on most configurations.
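Secondary ranges can be referenced explicitly at creation time. This sketch assumes a subnet named my-subnet that already has secondary ranges named pods and services:
# Create a VPC-native cluster using pre-created secondary ranges
gcloud container clusters create my-cluster \
    --region=us-central1 \
    --enable-ip-alias \
    --subnetwork=my-subnet \
    --cluster-secondary-range-name=pods \
    --services-secondary-range-name=services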
Load Balancing in GKE
When you create a Kubernetes Service of type LoadBalancer, GKE automatically provisions a GCP external passthrough Network Load Balancer (regional) and assigns it an external IP, which is ephemeral unless you reference a reserved static address. This is suitable for TCP/UDP services.
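A minimal Service of this type (the selector and ports are illustrative):
# Service that provisions a passthrough Network Load Balancer
apiVersion: v1
kind: Service
metadata:
  name: tcp-service
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080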
For HTTP(S) workloads, the GKE Ingress controller provisions a GCP Global External HTTP(S) Load Balancer. An Ingress resource in Kubernetes translates to a URL map, backend services, and health checks in GCP — all managed automatically.
# Example Ingress routing two services on different paths
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: my-ingress
annotations:
kubernetes.io/ingress.class: "gce"
spec:
rules:
- host: app.example.com
http:
paths:
- path: /api
pathType: Prefix
backend:
service:
name: api-service
port:
number: 80
- path: /
pathType: Prefix
backend:
service:
name: frontend-service
port:
number: 80
Private Clusters
In a Private GKE cluster, nodes have only private (RFC 1918) IP addresses — no public IPs. The cluster’s control plane endpoint can also be made private (reachable only within the VPC or via Authorised Networks). Private clusters are the recommended configuration for production workloads that should not be directly reachable from the internet.
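A sketch of creating a private cluster with a private control plane endpoint; the CIDR values are illustrative:
# Create a private cluster reachable only from authorised networks
gcloud container clusters create private-cluster \
    --region=us-central1 \
    --enable-ip-alias \
    --enable-private-nodes \
    --enable-private-endpoint \
    --master-ipv4-cidr=172.16.0.0/28 \
    --enable-master-authorized-networks \
    --master-authorized-networks=10.0.0.0/8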
Cluster Autoscaler
The GKE Cluster Autoscaler (CA) automatically adjusts the number of nodes in a node pool based on scheduling demand. When pods cannot be scheduled due to insufficient resources, the CA adds nodes. When nodes are underutilised and their pods can be rescheduled elsewhere, the CA removes nodes.
Cluster Autoscaler works at the node pool level. It respects:
- Minimum and maximum node counts per pool
- Pod Disruption Budgets (PDBs) — it will not drain a node if doing so would violate a PDB
- Node labels and taints — it will only add nodes from pools that can actually schedule the pending pods
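Autoscaling is configured per node pool. For example (the bounds are illustrative):
# Enable the Cluster Autoscaler on an existing node pool
gcloud container clusters update my-cluster \
    --region=us-central1 \
    --node-pool=default-pool \
    --enable-autoscaling \
    --min-nodes=1 --max-nodes=10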
Security Features
Binary Authorization
Binary Authorization enforces a policy that only cryptographically signed container images can be deployed to GKE. Signing happens as part of the CI/CD pipeline (e.g., Cloud Build signs images after passing tests and vulnerability scanning). At deploy time, GKE’s admission controller verifies the signature against the policy — unsigned or untrusted images are rejected.
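Enforcement is enabled per cluster. A sketch assuming the current gcloud flag, which applies the project-level singleton policy:
# Enforce the project-level Binary Authorization policy on this cluster
gcloud container clusters update my-cluster \
    --binauthz-evaluation-mode=PROJECT_SINGLETON_POLICY_ENFORCE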
GKE Sandbox (gVisor)
GKE Sandbox adds an additional security layer around containers using gVisor, a Google-developed container sandbox. gVisor intercepts container system calls in a user-space kernel, isolating containers from the underlying node’s kernel. It is useful for running untrusted or multi-tenant workloads where a container escape through a kernel vulnerability is a concern. Sandboxed pods have slightly higher overhead than standard containers.
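On Standard clusters, GKE Sandbox requires a node pool created with --sandbox type=gvisor; pods then opt in through the gvisor RuntimeClass:
# Pod opting into GKE Sandbox via the gVisor RuntimeClass
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-pod
spec:
  runtimeClassName: gvisor
  containers:
  - name: app
    image: nginx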
Shielded GKE Nodes
Shielded GKE Nodes use the same protections as Shielded VMs: Secure Boot, virtual TPM, and Integrity Monitoring. They prevent node images from being tampered with between provisioning and first boot, protecting against supply-chain attacks targeting the node OS.
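Shielded nodes are the default on current GKE versions; the node protections can also be requested explicitly at cluster creation, as in this sketch:
# Create a cluster with Secure Boot and integrity monitoring on its nodes
gcloud container clusters create my-cluster \
    --region=us-central1 \
    --shielded-secure-boot \
    --shielded-integrity-monitoring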
Cluster Maintenance Windows and Exclusions
GKE automatically upgrades both the control plane and nodes to stay within a supported Kubernetes version range. Maintenance windows let you control when GKE performs these upgrades — for example, restricting upgrades to overnight hours or weekends.
Maintenance exclusions allow you to prevent any upgrades during specific periods (e.g., freeze periods around major product launches or end-of-year trading).
Release channels (Rapid, Regular, Stable) determine how quickly a cluster receives new Kubernetes versions. Most production clusters use the Regular channel; the Stable channel receives versions that have been validated for several months.
# Set a recurring weekend maintenance window
gcloud container clusters update my-cluster \
--maintenance-window-start=2026-01-01T02:00:00Z \
--maintenance-window-end=2026-01-01T06:00:00Z \
--maintenance-window-recurrence="FREQ=WEEKLY;BYDAY=SA,SU"
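An exclusion is added separately; the exclusion name and dates below are illustrative:
# Add a maintenance exclusion covering a year-end freeze
gcloud container clusters update my-cluster \
    --add-maintenance-exclusion-name=year-end-freeze \
    --add-maintenance-exclusion-start=2026-12-15T00:00:00Z \
    --add-maintenance-exclusion-end=2027-01-05T00:00:00Z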