Overview
GCP offers two primary managed relational database services: Cloud SQL for traditional single-region relational workloads, and Cloud Spanner for globally distributed relational workloads requiring horizontal scale and strong consistency. A third option, AlloyDB, targets PostgreSQL workloads that have outgrown Cloud SQL but don’t need the global distribution of Spanner. Understanding the architectural differences between these services — and what drives the cost differential — is essential for making the right selection for a given workload.
Cloud SQL
Cloud SQL is GCP’s managed relational database service supporting three engines: MySQL, PostgreSQL, and SQL Server. “Managed” means Google handles patching, backups, failover, and storage auto-growth. You retain control over schema, query optimization, connection management, and configuration parameters.
Instance Specifications
Cloud SQL instances scale vertically — you select a machine type and can resize it (with a brief restart). Maximum instance sizes are substantial:
- Up to 96 vCPUs and 624 GB of RAM
- Up to 64 TB of storage, with auto-growth enabled by default (storage only grows, never shrinks automatically)
- SSD or HDD storage tiers; SSD is recommended for most workloads
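For illustration, a vertical resize can be scripted against the Cloud SQL Admin API. The following is a minimal sketch using the Python discovery client; the project ID, instance name, and target tier are placeholders, and the same change can be made in the console or with gcloud.

```python
from googleapiclient import discovery

# Minimal sketch: resize a Cloud SQL instance by patching its machine tier.
# "my-project" and "orders-db" are hypothetical; db-custom-8-30720 means
# 8 vCPUs and 30 GB of RAM. The operation triggers the brief restart noted above.
service = discovery.build("sqladmin", "v1beta4")
service.instances().patch(
    project="my-project",
    instance="orders-db",
    body={"settings": {"tier": "db-custom-8-30720"}},
).execute()
```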
High Availability
Cloud SQL HA operates on an active/standby model within a single region:
- A primary instance serves all reads and writes in one zone
- A standby instance in a different zone within the same region receives synchronous replication
- If the primary zone fails, Cloud SQL automatically promotes the standby — failover typically completes in under 60 seconds
- The connection string does not change after failover; the instance’s IP address is retained and traffic is served by the promoted standby
HA doubles the instance cost because two instances run simultaneously (primary + standby). HA does not provide cross-region protection — it covers zone failures only.
Read Replicas
Read replicas serve SELECT queries to offload read traffic from the primary. They use asynchronous replication, which means there is a small replication lag — applications that write and immediately read may see stale data.
| Replica Type | Location | Promotion |
|---|---|---|
| In-region replica | Same region as primary | Can be promoted to standalone instance |
| Cross-region replica | Different region | Can be promoted for cross-region DR |
| Cascading replica | Replica of a replica | Reduces primary load further |
Cross-region replicas are the mechanism for Cloud SQL disaster recovery across regions. They do not fail over automatically — a DBA must manually promote the cross-region replica if the primary region fails. This is a deliberate design choice; automatic cross-region failover would risk split-brain scenarios.
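When a DR failover is required, promotion is a single operation (the API equivalent of `gcloud sql instances promote-replica`). A minimal sketch with placeholder project and replica names:

```python
from googleapiclient import discovery

# Sketch of the manual DR step: promote a cross-region read replica to a
# standalone primary. Names are hypothetical. After promotion, the application
# must be repointed at the new primary.
service = discovery.build("sqladmin", "v1beta4")
service.instances().promoteReplica(
    project="my-project",
    instance="orders-db-replica-us-east1",
).execute()
```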
Automated Backups and Point-in-Time Recovery
Cloud SQL takes automated daily backups during a configurable backup window and retains them for a configurable period (1–365 days). Beyond these daily snapshots, Cloud SQL supports Point-in-Time Recovery (PITR): restore the database to any second within the transaction log retention window (up to 7 days). PITR relies on binary logging (MySQL) or write-ahead logging (PostgreSQL), which is enabled by default on HA instances.
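A PITR restore does not overwrite the original instance; it creates a clone of the instance at the requested timestamp. A minimal sketch via the Admin API, with placeholder names and timestamp:

```python
from googleapiclient import discovery

# Sketch: restore to a point in time by cloning the instance. The timestamp
# must fall within the transaction log retention window; all names are hypothetical.
service = discovery.build("sqladmin", "v1beta4")
service.instances().clone(
    project="my-project",
    instance="orders-db",
    body={
        "cloneContext": {
            "destinationInstanceName": "orders-db-restored",
            "pointInTime": "2024-05-01T12:34:56.000Z",
        }
    },
).execute()
```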
Maintenance Windows
Google applies patches to Cloud SQL instances during a configurable weekly maintenance window (day of week + hour). Patches may cause a brief restart. Best practice: set the maintenance window to off-peak hours and ensure the application handles brief disconnections gracefully.
Cloud SQL Auth Proxy
Direct TCP connections to Cloud SQL require either a public IP (with authorized networks) or a private IP within the same VPC. The Cloud SQL Auth Proxy is a binary that runs alongside your application and handles:
- Authentication: Uses the application’s Google credentials (service account) to authorize the connection to the instance, removing the need for authorized networks or self-managed client certificates; database users still authenticate normally unless IAM database authentication is enabled
- Encryption: Wraps the connection in TLS automatically
- Connectivity: Works from any environment (on-premises, GKE, Cloud Run, Compute Engine, local dev) without firewall rule changes
The proxy listens on 127.0.0.1:<port> and forwards to the Cloud SQL instance. Applications connect to localhost as if Cloud SQL were a local database. This is the recommended connection method for all non-VPC environments and for GKE workloads.
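From the application’s perspective, connecting through the proxy looks like connecting to a local PostgreSQL server. A minimal sketch, assuming the proxy is already running locally on port 5432; the database name and credentials are placeholders, and psycopg2 is just one example driver.

```python
import os
import psycopg2

# The Auth Proxy handles instance authorization and TLS; database-level
# credentials (or IAM database authentication) are still supplied by the app.
conn = psycopg2.connect(
    host="127.0.0.1",
    port=5432,
    dbname="appdb",                      # hypothetical database name
    user="app_user",                     # hypothetical database user
    password=os.environ["DB_PASSWORD"],  # avoid hard-coding secrets
)
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone())
conn.close()
```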
Cloud Spanner
Cloud Spanner is a fundamentally different kind of relational database. It is horizontally scalable — meaning you add capacity by adding nodes (processing units), and it automatically reshards data across them. It maintains a SQL interface with strong ACID semantics and strong consistency across all nodes globally. This combination — horizontal scale + relational model + global strong consistency — is what makes Spanner unusual.
TrueTime and External Consistency
The foundational technology that enables Spanner’s global consistency is TrueTime: a globally synchronized clock built on GPS receivers and atomic clocks in every Google data center. TrueTime provides a bounded uncertainty interval for the current time — rather than a single timestamp, each TrueTime call returns a range [earliest, latest] within which the true time lies.
Spanner uses TrueTime to assign commit timestamps to transactions. Before committing, Spanner waits for the TrueTime uncertainty interval to pass, ensuring that no two transactions in different regions can be assigned overlapping timestamps. The result is external consistency (a stronger form of serializability): if transaction T1 commits before T2 starts in wall-clock time, T2 is guaranteed to see T1’s writes — even if T1 committed in Iowa and T2 started in Singapore.
This eliminates the clock skew and coordination overhead that makes global strong consistency impractical in conventional distributed databases.
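The commit-wait rule can be shown with a small conceptual sketch. This is an illustration of the idea only, not Spanner’s actual implementation; the uncertainty bound is a made-up constant.

```python
import time

UNCERTAINTY_S = 0.005  # hypothetical clock uncertainty bound (epsilon)

def tt_now():
    """Fake TrueTime: return an (earliest, latest) interval around the local clock."""
    now = time.time()
    return now - UNCERTAINTY_S, now + UNCERTAINTY_S

def commit():
    # Assign the commit timestamp at the top of the current uncertainty interval.
    _, commit_ts = tt_now()
    # Commit wait: do not acknowledge until TrueTime guarantees commit_ts has
    # passed everywhere, i.e. until earliest > commit_ts.
    while tt_now()[0] <= commit_ts:
        time.sleep(0.001)
    # Any transaction that starts after this acknowledgement, in any region,
    # is guaranteed a strictly later timestamp -- external consistency.
    return commit_ts
```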
Nodes and Instances
Spanner capacity is purchased in processing units (1,000 processing units = 1 node). A single node provides roughly:
- 10,000 QPS for reads
- 2,000 QPS for writes (single rows of ~1 KB)
- Up to ~2 TB of storage per node (soft limit; Spanner recommends staying under ~2 TB/node for performance)
Processing units are divisible: you can start with 100 processing units (0.1 node) for small workloads, then scale incrementally without downtime.
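As a sketch, provisioning a small instance with fractional capacity might look like the following; the project, instance ID, and config name are placeholders, and the processing_units parameter assumes a reasonably recent google-cloud-spanner client release.

```python
from google.cloud import spanner

client = spanner.Client(project="my-project")  # hypothetical project
instance = client.instance(
    "orders-instance",                          # hypothetical instance ID
    configuration_name="projects/my-project/instanceConfigs/regional-us-central1",
    display_name="Orders",
    processing_units=100,                       # 0.1 node to start
)
operation = instance.create()                   # long-running operation
operation.result(timeout=300)                   # wait for provisioning to finish
```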
Instance Configurations
| Configuration | SLA | Replica Count | Use Case |
|---|---|---|---|
| Regional | 99.99% | 3 read-write replicas across 3 zones | Low latency for single-region workloads |
| Multi-region | 99.999% | Replicas in multiple regions + read-only replicas | Global applications, highest availability |
Multi-region configurations include “leader” regions (where writes are committed) and “non-leader” regions (which maintain read-write replicas and serve reads locally). Read latency from a non-leader region is local; write latency includes a round trip to the leader.
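One practical consequence: applications in non-leader regions can keep read latency local by accepting bounded staleness. A minimal sketch with placeholder instance and database IDs (strong reads may be routed to the leader; stale reads can be served by a nearby replica):

```python
import datetime
from google.cloud import spanner

client = spanner.Client()
database = client.instance("orders-instance").database("orders-db")  # hypothetical IDs

# Accept data up to 10 seconds old in exchange for a local, low-latency read.
with database.snapshot(exact_staleness=datetime.timedelta(seconds=10)) as snapshot:
    rows = snapshot.execute_sql(
        "SELECT OrderId, OrderDate FROM Orders WHERE CustomerId = @cid",
        params={"cid": 42},
        param_types={"cid": spanner.param_types.INT64},
    )
    for row in rows:
        print(row)
```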
Interleaved Tables
Spanner stores related parent-child data together physically through table interleaving. By declaring a child table as interleaved in a parent table, rows from the child table with the same parent key are stored adjacent to the parent row on disk.
```sql
-- Assumes a parent Customers table keyed by CustomerId already exists.
CREATE TABLE Orders (
  CustomerId INT64 NOT NULL,
  OrderId INT64 NOT NULL,
  OrderDate DATE
) PRIMARY KEY (CustomerId, OrderId),
  INTERLEAVE IN PARENT Customers ON DELETE CASCADE;
```
Without interleaving, querying a customer’s orders would require fetching rows from two separate tablet ranges, potentially on different servers. With interleaving, the data co-locates, and the join is local. Interleaving is the Spanner-specific optimization for one-to-many relationships.
Secondary Indexes
Spanner supports secondary indexes, but they work differently from traditional databases: each secondary index is stored internally as a separate table, and index entries are updated transactionally together with the base rows, which can add write latency when the index data lives on a different split. For this reason, index design is more impactful in Spanner — avoid over-indexing.
Cloud SQL vs Cloud Spanner — Decision Criteria
| Dimension | Cloud SQL | Cloud Spanner |
|---|---|---|
| Engine | MySQL, PostgreSQL, SQL Server | Spanner SQL (ANSI 2011 superset) |
| Scale model | Vertical (resize instance) | Horizontal (add nodes/processing units) |
| Multi-region HA | No (HA is single-region only) | Yes (built-in, part of the product) |
| Consistency model | Strong within instance | Strong globally (external consistency) |
| Write throughput ceiling | Single primary limits writes | Scales linearly with nodes |
| Cost | Low to medium | High (minimum billing, per-node pricing) |
| Best for | Standard OLTP workloads, web apps, microservices with regional scope | Global OLTP requiring no downtime, financial ledgers, inventory systems with global users |
The core decision: if you can keep the database in one region and vertical scale is sufficient, Cloud SQL is almost always the right choice — it is far cheaper, more familiar, and operationally simpler. If your write volume exceeds what a single Cloud SQL primary can handle, or if you genuinely need strong consistency across multiple regions without application-level conflict resolution, Spanner justifies its cost.
AlloyDB for PostgreSQL
AlloyDB is GCP’s newest managed database offering — a PostgreSQL-compatible database built on a custom storage engine optimized for Google’s infrastructure. Key characteristics:
- PostgreSQL wire-compatible: Existing PostgreSQL applications connect without modification
- Columnar engine: An in-memory columnar cache (enabled with a flag) accelerates analytical queries against the same database that serves OLTP — enabling HTAP (Hybrid Transactional and Analytical Processing) workloads
- Higher performance than Cloud SQL: Google claims 4x faster transactional throughput and 100x faster analytical queries compared to standard PostgreSQL on Cloud SQL
- HA within a region: Primary + read pool (one or more read replicas) with automatic failover under 60 seconds
- No cross-region replication at launch (cross-region support has been added progressively)
AlloyDB occupies the space between Cloud SQL and Cloud Spanner: more capable and higher performance than Cloud SQL, but regional (not globally distributed) and significantly cheaper than Spanner. It is the appropriate choice when you have a demanding PostgreSQL workload that has hit Cloud SQL limits but does not require Spanner’s global distribution.
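As a sketch of the wire-compatibility point, the AlloyDB Python Connector (the google-cloud-alloydb-connector package) can hand connections to ordinary PostgreSQL tooling; the instance URI and credentials below are placeholders, and the exact API should be checked against the package’s current documentation.

```python
import sqlalchemy
from google.cloud.alloydb.connector import Connector

connector = Connector()

def getconn():
    # Hypothetical instance URI and credentials; the connector handles
    # authorization and TLS, while the driver (pg8000) speaks plain PostgreSQL.
    return connector.connect(
        "projects/my-project/locations/us-central1/clusters/orders/instances/orders-primary",
        "pg8000",
        user="app_user",
        password="change-me",
        db="appdb",
    )

engine = sqlalchemy.create_engine("postgresql+pg8000://", creator=getconn)
with engine.connect() as conn:
    print(conn.execute(sqlalchemy.text("SELECT 1")).scalar())
```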
Connection Management Best Practices
All three services (Cloud SQL, Spanner, AlloyDB) benefit from connection pooling. Database connections are expensive: establishing one involves TCP, TLS, and authentication handshakes, and every open connection consumes memory on the database server (PostgreSQL, for example, dedicates a backend process to each connection).
- For Cloud SQL: use PgBouncer (PostgreSQL) or ProxySQL (MySQL) as a connection pooler in front of Cloud SQL, or rely on the connection pooling built into frameworks
- For Cloud Run and serverless workloads: use the Cloud SQL connector libraries (available for Python, Java, Go, Node.js) which handle connection lifecycle, reconnection, and Auth Proxy functionality without the sidecar binary
- For Cloud Spanner: the client libraries handle session pooling internally — Spanner uses “sessions” rather than raw connections; configure the session pool size based on expected query concurrency
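A minimal sketch of explicit session pool sizing with the Python client; the pool size of 20 and the instance/database IDs are placeholder values to be tuned to actual concurrency:

```python
from google.cloud import spanner
from google.cloud.spanner_v1.pool import FixedSizePool

client = spanner.Client()
# Keep 20 sessions warm; requests beyond that wait up to 10 seconds for a session.
pool = FixedSizePool(size=20, default_timeout=10)
database = client.instance("orders-instance").database("orders-db", pool=pool)

with database.snapshot() as snapshot:
    for row in snapshot.execute_sql("SELECT 1"):
        print(row)
```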
Avoid opening a new database connection on every HTTP request in serverless environments. Connection pooling or the connector libraries are mandatory for production Cloud SQL workloads on Cloud Run or Cloud Functions.
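For Cloud SQL from Cloud Run, the usual pattern combines the Cloud SQL Python Connector with a SQLAlchemy pool created once at startup and reused across requests. A sketch with a placeholder instance connection name and credentials:

```python
import os
import sqlalchemy
from google.cloud.sql.connector import Connector

connector = Connector()

def getconn():
    # The connector authenticates with the runtime's service account and wraps
    # the connection in TLS, so no sidecar proxy is needed.
    return connector.connect(
        "my-project:us-central1:orders-db",   # hypothetical instance connection name
        "pg8000",
        user="app_user",
        password=os.environ["DB_PASSWORD"],
        db="appdb",
    )

# Create the pool once (module scope), not per request.
engine = sqlalchemy.create_engine(
    "postgresql+pg8000://", creator=getconn, pool_size=5, max_overflow=2
)
```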