Overview
Serverless compute means you deploy code or containers without managing servers, operating systems, or runtime environments. The platform provisions capacity on demand, scales automatically with traffic — including down to zero when there is no traffic — and charges only for actual usage.
GCP offers three serverless compute platforms that span different levels of abstraction:
- App Engine — the original GCP serverless platform; for web applications and APIs
- Cloud Functions — event-driven, single-function execution
- Cloud Run — container-based, fully managed; the most flexible of the three
Understanding the differences between these services and the trade-offs involved in choosing among them is central to GCP architecture decisions.
The Compute Spectrum
It is useful to place serverless in context alongside GCP’s full compute spectrum:
| Service | Abstraction Level | You Manage | GCP Manages |
|---|---|---|---|
| Compute Engine | IaaS | OS, runtime, app, scaling | Hardware, hypervisor |
| GKE Standard | CaaS | Nodes, node pools, containers | Control plane |
| GKE Autopilot | Managed CaaS | Containers only | Nodes + control plane |
| App Engine Flexible | PaaS (container) | Dockerfile, app | Nodes, scaling |
| Cloud Run | Serverless containers | Container image | Instances, scaling, OS |
| App Engine Standard | PaaS | App code only | Runtime, scaling, OS |
| Cloud Functions | FaaS | Function code only | Runtime, scaling, packaging |
Moving down the table, you give up control in exchange for reduced operational burden. Serverless services abstract away infrastructure management entirely — you write code or build a container, and the platform handles the rest.
App Engine
App Engine is GCP’s Platform-as-a-Service offering and the oldest serverless service on GCP. It is designed for web applications and HTTP APIs. An App Engine application has a specific structure: it is divided into services (previously called modules), each service runs a deployable unit (e.g., a frontend, an API backend, a worker service), and each service can have multiple deployed versions running simultaneously.
Standard vs Flexible Environments
| Feature | Standard Environment | Flexible Environment |
|---|---|---|
| Runtime | Specific versions only: Python 2.7/3.x, Java 8/11/17, Go, PHP, Ruby, Node.js | Any language via custom Dockerfile |
| Container model | Google-managed sandbox | Docker container on Compute Engine VMs |
| Scaling | Scales to zero (no cost when idle) | Minimum 1 instance always running |
| Startup time | Milliseconds (pre-warmed) | Minutes (Docker container start) |
| Pricing | Per instance-hour; free tier available | Per vCPU-hour and GB memory-hour |
| SSH access | No | Yes |
| Local disk writes | Restricted (temp only) | Full disk access |
| Background threads | Restricted | Full support |
| Best for | Spiky, unpredictable traffic; cost-sensitive; standard runtimes | Consistent traffic; custom dependencies; background processing |
Standard is the right choice for most web workloads: it starts fast, scales to zero, and has a free tier. When traffic is zero, you pay nothing. When a request arrives, an instance starts in milliseconds.
Flexible uses Compute Engine VMs under the hood. It is better when you need a custom runtime (a language GCP’s Standard sandbox does not support), access to native OS libraries, or long-running background threads. The trade-off is that Flexible always keeps at least one instance running — you cannot scale to zero.
App Engine Features
Traffic splitting allows routing a percentage of requests to different deployed versions. This enables:
- Canary deployments — send 5% of traffic to a new version, monitor for errors, then increase gradually
- A/B testing — split traffic between two versions to compare user behaviour
# Split traffic: 90% to v1, 10% to v2
gcloud app services set-traffic default \
--splits=v1=0.9,v2=0.1
# Deploy a new version without shifting traffic
gcloud app deploy --no-promote
Version management allows multiple versions of each service to be deployed simultaneously. You can shift traffic to any deployed version instantly — rollback is as simple as pointing traffic back to the previous version.
Cloud Functions
Cloud Functions is GCP’s Function-as-a-Service (FaaS) offering. You write a single function in a supported language, deploy it, and the platform handles everything else: packaging, deployment, execution, and scaling. Functions execute in response to events or HTTP requests.
Trigger Types
| Trigger | Description |
|---|---|
| HTTP | Function exposes an HTTPS endpoint; called by direct HTTP requests |
| Cloud Pub/Sub | Function invoked when a message is published to a Pub/Sub topic |
| Cloud Storage | Function invoked on object create, delete, archive, or metadata update |
| Cloud Firestore | Function invoked on document create, update, or delete |
| Firebase events | Auth, Realtime Database, Remote Config, Analytics events |
| Eventarc | Unified event routing from 90+ GCP services and custom sources |
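For event triggers such as Pub/Sub, the function receives an event whose message body is base64-encoded, so the handler's first job is usually decoding. A minimal sketch of that step, using only the standard library (the helper name and the assumption of JSON payloads are illustrative):

```python
import base64
import json

def extract_pubsub_payload(cloud_event_data: dict) -> dict:
    """Decode the base64-encoded Pub/Sub message body carried in an event.

    cloud_event_data mirrors what a Pub/Sub-triggered function receives:
    {"message": {"data": "<base64>", ...}, ...}.
    """
    raw = cloud_event_data["message"]["data"]         # base64-encoded string
    decoded = base64.b64decode(raw).decode("utf-8")   # original publisher payload
    return json.loads(decoded)                        # assumes JSON payloads

# Example: the shape the platform delivers for a published JSON message
event_data = {"message": {"data": base64.b64encode(b'{"order_id": 42}').decode()}}
payload = extract_pubsub_payload(event_data)
print(payload["order_id"])  # → 42
```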
Gen 1 vs Gen 2
| Feature | Gen 1 | Gen 2 |
|---|---|---|
| Underlying infrastructure | Custom FaaS runtime | Cloud Run |
| Maximum timeout | 9 minutes | 60 minutes |
| Maximum instance size | 8 GB RAM | 16 GB RAM, 4 vCPUs |
| Concurrency | 1 request per instance | Multiple concurrent requests per instance |
| Minimum instances | Not supported | Supported (avoid cold starts) |
| Traffic splitting | No | Yes (via Cloud Run revisions) |
Cloud Functions Gen 2 is built on top of Cloud Run. This means every Gen 2 function is actually a Cloud Run service under the hood, giving it Cloud Run’s concurrency model, longer timeouts, and more powerful instance sizes. For new function deployments, Gen 2 is preferred.
Runtimes
Cloud Functions supports: Node.js (18, 20), Python (3.10, 3.11, 3.12), Go (1.20, 1.21), Java (11, 17, 21), .NET (6, 8), Ruby (3.2), PHP (8.2).
# Example Cloud Functions Gen 2 HTTP function
import functions_framework

@functions_framework.http
def hello_http(request):
    """HTTP Cloud Function."""
    request_json = request.get_json(silent=True)
    name = request_json.get('name', 'World') if request_json else 'World'
    return f'Hello, {name}!'
# Deploy a Python HTTP function (Gen 2)
gcloud functions deploy hello-function \
--gen2 \
--runtime=python312 \
--region=us-central1 \
--source=. \
--entry-point=hello_http \
--trigger-http \
--allow-unauthenticated
Cloud Run
Cloud Run is GCP’s container-based serverless platform. You provide a container image (any language, any framework, as long as it listens on an HTTP port), and Cloud Run handles deployment, scaling, TLS termination, and infrastructure. It is the most flexible of the three serverless options because it places no constraints on language, runtime version, or dependencies — if it runs in a container, it runs on Cloud Run.
Core Concepts
Requests and concurrency: Cloud Run is fundamentally HTTP-request-driven. Instances receive requests and the platform scales the number of instances based on concurrent requests. The default concurrency is 80 requests per instance; the maximum is 1,000. Higher concurrency means fewer instances and lower cost, but requires the application to be thread-safe and handle concurrent requests correctly.
Minimum and maximum instances:
- --min-instances=0 (default) — scale to zero when idle; no cost between requests; cold starts possible
- --min-instances=N — keep N instances always warm; eliminates cold starts; you pay for idle instance time
- --max-instances=N — cap the number of instances; prevents runaway scaling and cost
Request timeout: HTTP requests can run for up to 3,600 seconds (1 hour); the default is 5 minutes. This is far longer than a typical HTTP response needs, accommodating long-running stream processing or data export operations.
Cloud Run vs Cloud Functions Decision
| Criterion | Use Cloud Functions | Use Cloud Run |
|---|---|---|
| Code structure | Single function | Full application or multiple endpoints |
| Container control | Not needed | Required (custom base image, dependencies) |
| Language | Supported runtimes only | Any language |
| Concurrency handling | Simple (per-invocation) | Required (multiple concurrent requests) |
| Long timeouts (> 9 min) | Gen 2 only | Yes |
| Simplest deployment | Yes | Slightly more setup (Dockerfile needed) |
| Traffic splitting | Gen 2 only | Yes (revisions) |
The simplest heuristic: if your workload is a single function triggered by an event and fits a supported runtime, Cloud Functions (Gen 2) is the easier path. For anything requiring a custom container image, a full web framework, or more complex routing, Cloud Run is more appropriate.
Revisions and Traffic Splitting
Every Cloud Run deployment creates a new revision — an immutable snapshot of the container image and configuration. Traffic can be split between revisions, enabling canary deployments and rollbacks:
# Deploy a new revision without routing any traffic to it yet
gcloud run deploy my-service \
--image=gcr.io/my-project/my-app:v2 \
--no-traffic
# Split traffic: 90% to previous revision, 10% to new
gcloud run services update-traffic my-service \
--to-revisions=my-service-00001-abc=90,my-service-00002-def=10
# Roll back instantly by routing all traffic to a specific revision
gcloud run services update-traffic my-service \
--to-revisions=my-service-00001-abc=100
Cloud Run Jobs
Cloud Run Jobs run containers to completion rather than serving HTTP requests. A job defines a container task and runs that task a specified number of times, optionally in parallel. Jobs are ideal for:
- Batch data processing
- Database migrations
- Scheduled report generation
- One-off administrative tasks
# Create and execute a Cloud Run Job
gcloud run jobs create my-job \
--image=gcr.io/my-project/batch-processor:latest \
--tasks=10 \
--parallelism=5 \
--region=us-central1
gcloud run jobs execute my-job --region=us-central1
Both the job execution as a whole and each individual task can run for up to 24 hours. Failed tasks are retried automatically (the retry count is configurable).
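Each task in a job execution receives its index and the total task count through the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables, which a task can use to claim its shard of the work. A sketch of that pattern (the helper name and modulo sharding scheme are illustrative):

```python
import os

def my_shard(items, task_index=None, task_count=None):
    """Return the slice of `items` this task is responsible for.

    Cloud Run Jobs set CLOUD_RUN_TASK_INDEX (0-based) and CLOUD_RUN_TASK_COUNT
    on each task; the defaults let this sketch run outside Cloud Run too.
    """
    if task_index is None:
        task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", 0))
    if task_count is None:
        task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", 1))
    # Modulo sharding: task i takes every task_count-th item starting at i.
    return [item for i, item in enumerate(items) if i % task_count == task_index]

records = list(range(100))
# With --tasks=10, task 3 processes every 10th record starting at 3.
print(my_shard(records, task_index=3, task_count=10)[:3])  # → [3, 13, 23]
```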
Private Networking with VPC Connector
By default, Cloud Run services can only reach public internet endpoints. To access resources in a VPC (Cloud SQL via private IP, Redis on Memorystore, internal services), you need a Serverless VPC Access connector.
A VPC connector is a managed proxy that forwards traffic from Cloud Run into a specified VPC subnet using a small pool of VMs. It adds a small amount of latency (typically < 1ms within a region).
# Create a VPC connector
gcloud compute networks vpc-access connectors create my-connector \
--region=us-central1 \
--subnet=my-subnet \
--subnet-project=my-project
# Deploy a Cloud Run service using the connector
gcloud run deploy my-service \
--image=gcr.io/my-project/my-app \
--vpc-connector=my-connector \
--vpc-egress=private-ranges-only
--vpc-egress=private-ranges-only sends only RFC 1918 traffic through the connector; internet traffic goes directly. --vpc-egress=all-traffic routes all outbound traffic through the VPC, which is required when accessing the internet via a NAT gateway for a static outbound IP.
Cold Starts
A cold start occurs when a serverless platform must start a new instance to handle a request. During a cold start, the platform must:
- Allocate a container (or function runtime)
- Pull the container image (Cloud Run) or package the function (Cloud Functions)
- Start the application process
- Initialise the application code (database connections, config loading, etc.)
- Handle the request
Cold start duration depends on container image size, runtime, and application initialisation time. Typical cold start latency:
- Cloud Functions (Python/Node.js): 100–500 ms
- Cloud Run (lightweight Go/Node.js): 200ms–1s
- Cloud Run (Java/heavy frameworks): 2–10s
- App Engine Standard: milliseconds (pre-warmed)
Mitigating Cold Starts
- Minimum instances — keep at least one instance always warm (--min-instances=1). Adds cost but eliminates cold starts for user-facing services.
- CPU always allocated (Cloud Run) — by default, Cloud Run allocates CPU only during request processing. Enabling CPU always allocated (--no-cpu-throttling) keeps CPU available between requests, allowing background tasks and faster warm-up.
- Lean container images — use minimal base images (distroless, Alpine). Smaller images are pulled faster on cold start.
- Lazy initialisation — defer expensive initialisation (database connection pools, heavy config parsing) until the first request, or do it in a goroutine/thread after the process is already listening.
- Concurrency — higher concurrency per instance means fewer cold starts under load, since each existing instance absorbs more requests.
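The lazy-initialisation point can be sketched in Python: the expensive resource is created on first use rather than at import time, so the instance starts listening sooner. Here `make_pool` is a stand-in for real connection setup, not a library call:

```python
import threading

_pool = None
_pool_lock = threading.Lock()

def make_pool():
    """Stand-in for expensive setup (e.g. opening database connections)."""
    return {"connections": 10}

def get_pool():
    """Create the shared resource on first use; later calls reuse it."""
    global _pool
    if _pool is None:             # fast path once initialised
        with _pool_lock:          # guard against concurrent first requests
            if _pool is None:     # double-checked: another thread may have won
                _pool = make_pool()
    return _pool

# At cold start nothing is built yet; the first request pays the cost once.
assert _pool is None
pool = get_pool()
print(pool is get_pool())  # → True (subsequent calls reuse the same pool)
```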
Event-Driven Architecture with Eventarc
Eventarc is GCP’s unified eventing platform that routes events from GCP services, custom applications, and third-party sources to serverless destinations (Cloud Run, Cloud Functions Gen 2, Cloud Workflows).
Eventarc supports two event formats:
- Direct events — triggered by actions in GCP services (e.g., Cloud Storage object created, BigQuery job completed, Pub/Sub message published)
- Audit log events — triggered by any Cloud Audit Log entry, providing coverage of virtually every GCP API call
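A trigger's event filters act as exact-match conditions on event attributes. The following is a simplified model of that matching, not Eventarc's implementation — the attribute names mirror the filter keys, and the event dictionaries are illustrative:

```python
def matches(trigger_filters: dict, event_attributes: dict) -> bool:
    """True if every filter key/value is matched exactly by the event."""
    return all(event_attributes.get(k) == v for k, v in trigger_filters.items())

trigger = {
    "type": "google.cloud.storage.object.v1.finalized",
    "bucket": "my-bucket",
}
upload = {"type": "google.cloud.storage.object.v1.finalized", "bucket": "my-bucket"}
other = {"type": "google.cloud.storage.object.v1.finalized", "bucket": "other-bucket"}

print(matches(trigger, upload))  # → True: routed to the destination service
print(matches(trigger, other))   # → False: ignored by this trigger
```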
# Create an Eventarc trigger to invoke Cloud Run on Cloud Storage events
gcloud eventarc triggers create my-trigger \
--destination-run-service=my-service \
--destination-run-region=us-central1 \
--event-filters="type=google.cloud.storage.object.v1.finalized" \
--event-filters="bucket=my-bucket" \
--service-account=my-trigger-sa@my-project.iam.gserviceaccount.com
For simpler scenarios, Cloud Functions can be triggered directly by Pub/Sub, Cloud Storage, or Firestore without configuring Eventarc explicitly. Eventarc becomes the right tool when you need centralised event routing, audit-log-based triggers, or routing to multiple destinations.