GCP — Managed SQL Databases


Cloud SQL and Cloud Spanner — GCP's relational database offerings, their architectures, replication models, and when to choose each.


Overview

GCP offers two distinct managed relational database services: Cloud SQL for traditional single-region relational workloads, and Cloud Spanner for globally distributed relational workloads requiring horizontal scale and strong consistency. A third option, AlloyDB, targets PostgreSQL workloads that have outgrown Cloud SQL but don’t need the global distribution of Spanner. Understanding the architectural differences between these services — and what drives the cost differential — is essential for making the right selection for a given workload.


Cloud SQL

Cloud SQL is GCP’s managed relational database service supporting three engines: MySQL, PostgreSQL, and SQL Server. “Managed” means Google handles patching, backups, failover, and storage auto-growth. You retain control over schema, query optimization, connection management, and configuration parameters.

Instance Specifications

Cloud SQL instances scale vertically — you select a machine type and can resize it (with a brief restart). Maximum instance sizes are substantial: the largest machine types reach roughly 96 vCPUs, 624 GB of memory, and 64 TB of storage, which is enough for the vast majority of OLTP workloads.

High Availability

Cloud SQL HA operates on an active/standby model within a single region: the primary runs in one zone and a standby in another, data is replicated synchronously between them, and if the primary's zone fails, Cloud SQL fails over to the standby automatically while keeping the same connection endpoint.

HA doubles the instance cost because two instances run simultaneously (primary + standby). HA does not provide cross-region protection — it covers zone failures only.

Read Replicas

Read replicas serve SELECT queries to offload read traffic from the primary. They use asynchronous replication, which means there is a small replication lag — applications that write and immediately read may see stale data.

| Replica Type | Location | Promotion |
|---|---|---|
| In-region replica | Same region as primary | Can be promoted to standalone instance |
| Cross-region replica | Different region | Can be promoted for cross-region DR |
| Cascading replica | Replica of a replica | Reduces primary load further |

Cross-region replicas are the mechanism for Cloud SQL disaster recovery across regions. They do not fail over automatically — a DBA must manually promote the cross-region replica if the primary region fails. This is a deliberate design choice; automatic cross-region failover would risk split-brain scenarios.
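The read-after-write hazard of asynchronous replication can be sketched in a few lines. This is a hypothetical toy model (the class and parameter names are invented for illustration, not part of any Google client library): writes land on the primary immediately and reach the replica only after a lag, so a read routed to the replica right after a write sees nothing, while a read pinned to the primary — the usual "read-your-writes" workaround — sees fresh data.

```python
import time

class ReplicatedStore:
    """Toy model of a primary with an asynchronously replicated read replica."""

    def __init__(self, lag=0.05):
        self.lag = lag
        self.primary = {}
        self.replica = {}
        self.pending = []  # (apply_at, key, value) not yet on the replica

    def write(self, key, value):
        # Writes commit on the primary immediately; replication is deferred.
        self.primary[key] = value
        self.pending.append((time.monotonic() + self.lag, key, value))

    def _apply_replication(self):
        # Apply any pending changes whose simulated lag has elapsed.
        now = time.monotonic()
        still_pending = []
        for apply_at, key, value in self.pending:
            if apply_at <= now:
                self.replica[key] = value
            else:
                still_pending.append((apply_at, key, value))
        self.pending = still_pending

    def read(self, key, require_fresh=False):
        # Read-your-writes: route to the primary when staleness is unacceptable.
        self._apply_replication()
        source = self.primary if require_fresh else self.replica
        return source.get(key)

store = ReplicatedStore(lag=0.05)
store.write("balance", 100)
stale = store.read("balance")                    # replica has not applied the write yet
fresh = store.read("balance", require_fresh=True)
time.sleep(0.06)
caught_up = store.read("balance")                # replica has caught up
```

Routing only the latency-sensitive, freshness-sensitive reads to the primary keeps most traffic on the replicas while avoiding the staleness window.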

Automated Backups and Point-in-Time Recovery

Cloud SQL takes automated daily backups during a configurable backup window and retains them for a configurable period (1–365 days). Beyond daily snapshots, Cloud SQL supports point-in-time recovery (PITR): restore the database to any second within the transaction log retention window (up to 7 days). PITR requires binary logging (MySQL) or write-ahead log archiving (PostgreSQL) to be enabled, which it is by default on new instances.

Maintenance Windows

Google applies patches to Cloud SQL instances during a configurable weekly maintenance window (day of week + hour). Patches may cause a brief restart. Best practice: set the maintenance window to off-peak hours and ensure the application handles brief disconnections gracefully.
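"Handling brief disconnections gracefully" usually means retrying transient connection failures with exponential backoff. A minimal sketch, with invented names and illustrative delays (not part of any Google client library):

```python
import time

def with_retries(fn, attempts=4, base_delay=0.01):
    """Retry a call that may fail transiently, backing off exponentially."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 10 ms, 20 ms, 40 ms, ...

# Simulate a database that is briefly unreachable during a maintenance restart.
calls = {"count": 0}
def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server closed the connection unexpectedly")
    return "ok"

result = with_retries(flaky_query)
```

In production the same pattern usually lives in the driver or ORM configuration rather than hand-rolled code, but the principle is identical: bounded retries, growing delays, and a final re-raise so genuine outages still surface.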

Cloud SQL Auth Proxy

Direct TCP connections to Cloud SQL require either a public IP (with authorized networks) or a private IP within the same VPC. The Cloud SQL Auth Proxy is a binary that runs alongside your application and handles authentication (using IAM credentials rather than static allow-lists) and TLS encryption of all traffic, with no client-side certificate management.

The proxy listens on 127.0.0.1:<port> and forwards to the Cloud SQL instance. Applications connect to localhost as if Cloud SQL were a local database. This is the recommended connection method for all non-VPC environments and for GKE workloads.
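Because the proxy presents the instance on localhost, the application-side connection string never contains the instance's real IP. A trivial sketch (the function name and credentials are hypothetical):

```python
def proxy_dsn(user, password, database, port=5432):
    """Build a PostgreSQL DSN that targets the local Auth Proxy listener.

    The proxy forwards 127.0.0.1:<port> to the Cloud SQL instance, so the
    application connects as if the database were local.
    """
    return f"postgresql://{user}:{password}@127.0.0.1:{port}/{database}"

dsn = proxy_dsn("app", "s3cret", "orders")
```

Swapping environments (local development against a test instance, production against the real one) then only changes which instance the proxy points at, not the application's connection string.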


Cloud Spanner

Cloud Spanner is a fundamentally different kind of relational database. It is horizontally scalable — meaning you add capacity by adding nodes (processing units), and it automatically reshards data across them. It maintains a SQL interface with strong ACID semantics and strong consistency across all nodes globally. This combination — horizontal scale + relational model + global strong consistency — is what makes Spanner unusual.

TrueTime and External Consistency

The foundational technology that enables Spanner’s global consistency is TrueTime: a globally synchronized clock built on GPS receivers and atomic clocks in every Google data center. TrueTime provides a bounded uncertainty interval for the current time — rather than a single timestamp, each TrueTime call returns a range [earliest, latest] within which the true time lies.

Spanner uses TrueTime to assign commit timestamps to transactions. Before committing, Spanner waits for the TrueTime uncertainty interval to pass, ensuring that no two transactions in different regions can be assigned overlapping timestamps. The result is external consistency (a stronger form of serializability): if transaction T1 commits before T2 starts in wall-clock time, T2 is guaranteed to see T1’s writes — even if T1 committed in Iowa and T2 started in Singapore.

This eliminates the clock skew and coordination overhead that makes global strong consistency impractical in conventional distributed databases.
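The commit-wait mechanism can be illustrated with a toy simulation. This is not Spanner's implementation — the uncertainty bound here is exaggerated for demonstration (real TrueTime uncertainty is typically a few milliseconds), and the function names are invented — but it shows the core idea: assign the commit timestamp at the upper bound of the uncertainty interval, then wait until that timestamp is definitely in the past before acknowledging, so a later transaction necessarily gets a later timestamp.

```python
import time

EPSILON = 0.005  # assumed clock uncertainty bound (illustrative)

def truetime_now():
    """Return an (earliest, latest) interval bounding the true time."""
    t = time.monotonic()
    return (t - EPSILON, t + EPSILON)

def commit(write_fn):
    """Commit-wait sketch: timestamp at the interval's upper bound, then
    wait until even the interval's lower bound has passed that timestamp."""
    _, latest = truetime_now()
    ts = latest                       # assigned commit timestamp
    write_fn()
    while truetime_now()[0] < ts:     # wait out the uncertainty interval
        time.sleep(EPSILON / 10)
    return ts

log = []
t1 = commit(lambda: log.append("T1"))
t2 = commit(lambda: log.append("T2"))  # starts after t1's commit returned
```

Because `commit` does not return until its timestamp is unambiguously in the past, any transaction that starts afterward — on any node, with any clock inside its own uncertainty interval — must receive a strictly larger timestamp. That is external consistency in miniature.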

Nodes and Instances

Spanner capacity is purchased in processing units (1,000 processing units = 1 node). Each node provides a fixed slice of compute, memory, and storage throughput; Google publishes per-node QPS and storage guidance to use for capacity planning.

Processing units are divisible: you can start with 100 processing units (0.1 node) for small workloads, then scale incrementally without downtime.
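The arithmetic is simple but worth pinning down; a small helper (hypothetical name, and the 100-unit increment rule below 1,000 is an assumption based on how sub-node capacity is sold):

```python
def nodes_from_processing_units(processing_units):
    """Convert Spanner processing units to the equivalent node count.

    1,000 processing units = 1 node; below one node, capacity is assumed
    to be purchased in increments of 100 processing units.
    """
    if processing_units < 1000 and processing_units % 100 != 0:
        raise ValueError("sub-node capacity must be a multiple of 100 PUs")
    return processing_units / 1000

small = nodes_from_processing_units(100)   # a 0.1-node instance
full = nodes_from_processing_units(3000)   # a 3-node instance
```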

Instance Configurations

| Configuration | SLA | Replica Count | Use Case |
|---|---|---|---|
| Regional | 99.99% | 3 read-write replicas across 3 zones | Low latency for single-region workloads |
| Multi-region | 99.999% | Replicas in multiple regions + read-only replicas | Global applications, highest availability |

Multi-region configurations include “leader” regions (where writes are committed) and “non-leader” regions (which maintain read-write replicas and serve reads locally). Read latency from a non-leader region is local; write latency includes a round trip to the leader.

Interleaved Tables

Spanner stores related parent-child data together physically through table interleaving. By declaring a child table as interleaved in a parent table, rows from the child table with the same parent key are stored adjacent to the parent row on disk.

CREATE TABLE Customers (
  CustomerId INT64 NOT NULL,
  Name STRING(MAX)
) PRIMARY KEY (CustomerId);

CREATE TABLE Orders (
  CustomerId INT64 NOT NULL,
  OrderId INT64 NOT NULL,
  OrderDate DATE
) PRIMARY KEY (CustomerId, OrderId),
  INTERLEAVE IN PARENT Customers ON DELETE CASCADE;

Without interleaving, querying a customer’s orders would require fetching rows from two separate tablet ranges, potentially on different servers. With interleaving, the data co-locates, and the join is local. Interleaving is the Spanner-specific optimization for one-to-many relationships.
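The co-location falls out of key ordering: an interleaved child row's key extends its parent's key, so parent and children sort adjacently in the key space. A toy model of that sorted key space (the key tuples and data are invented for illustration):

```python
# Interleaving makes a child row's key an extension of its parent's key,
# so sorting the key space places each customer's orders immediately
# after the customer row — a range scan touches one contiguous slice.
rows = sorted([
    (("Customers", 1), "Alice"),
    (("Customers", 1, "Orders", 101), "2024-01-05"),
    (("Customers", 1, "Orders", 102), "2024-02-11"),
    (("Customers", 2), "Bob"),
    (("Customers", 2, "Orders", 201), "2024-03-01"),
])

def scan_customer(rows, customer_id):
    """Return all rows whose key starts with this customer's key prefix."""
    prefix = ("Customers", customer_id)
    return [r for r in rows if r[0][:2] == prefix]

alice = scan_customer(rows, 1)
```

In the sorted list, Alice's customer row and both of her orders occupy one contiguous run, which is exactly why the parent-child join stays local to one server.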

Secondary Indexes

Spanner supports secondary indexes, but they work differently from traditional databases: each secondary index is stored as a separate interleaved table internally. Index writes are executed as distributed transactions, adding latency compared to indexes in traditional databases. For this reason, index design is more impactful in Spanner — avoid over-indexing.
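The cost of each index is easy to see as write amplification. A hypothetical sketch (class and method names invented): every insert into the base table triggers one additional write per secondary index, and in Spanner each of those writes participates in the same distributed transaction.

```python
class ToyTable:
    """Toy model of index write amplification: each secondary index is a
    separate structure that must be updated alongside the base row."""

    def __init__(self, indexed_columns):
        self.base = {}
        self.indexes = {col: {} for col in indexed_columns}

    def insert(self, key, row):
        writes = 1                        # the base-table row itself
        self.base[key] = row
        for col, idx in self.indexes.items():
            idx[row[col]] = key           # one extra write per index
            writes += 1
        return writes

t = ToyTable(indexed_columns=["email", "country"])
writes = t.insert(1, {"email": "a@example.com", "country": "SE"})
```

Two secondary indexes turn one logical insert into three physical writes — in a traditional database these are cheap local page updates, but in Spanner each can land on a different server, which is why trimming unused indexes pays off more there.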


Cloud SQL vs Cloud Spanner — Decision Criteria

| Dimension | Cloud SQL | Cloud Spanner |
|---|---|---|
| Engine | MySQL, PostgreSQL, SQL Server | Spanner SQL (ANSI 2011 superset) |
| Scale model | Vertical (resize instance) | Horizontal (add nodes/processing units) |
| Multi-region HA | No (HA is single-region only) | Yes (built-in, part of the product) |
| Consistency model | Strong within instance | Strong globally (external consistency) |
| Write throughput ceiling | Single primary limits writes | Scales linearly with nodes |
| Cost | Low to medium | High (minimum billing, per-node pricing) |
| Best for | Standard OLTP workloads, web apps, microservices with regional scope | Global OLTP requiring no downtime, financial ledgers, inventory systems with global users |

The core decision: if you can keep the database in one region and vertical scale is sufficient, Cloud SQL is almost always the right choice — it is far cheaper, more familiar, and operationally simpler. If your write volume exceeds what a single Cloud SQL primary can handle, or if you genuinely need strong consistency across multiple regions without application-level conflict resolution, Spanner justifies its cost.


AlloyDB for PostgreSQL

AlloyDB is GCP's newest managed database offering — a PostgreSQL-compatible database built on a custom storage engine optimized for Google's infrastructure. Key characteristics: full PostgreSQL wire compatibility (standard drivers and tools work unchanged), disaggregated compute and storage, and a built-in columnar engine that accelerates analytical queries over the same transactional data.

AlloyDB occupies the space between Cloud SQL and Cloud Spanner: more capable and higher performance than Cloud SQL, but regional (not globally distributed) and significantly cheaper than Spanner. It is the appropriate choice when you have a demanding PostgreSQL workload that has hit Cloud SQL limits but does not require Spanner’s global distribution.


Connection Management Best Practices

All three services (Cloud SQL, Spanner, AlloyDB) benefit from connection pooling. Database connections are expensive to establish — each connection has memory overhead on the database server.

Avoid opening a new database connection on every HTTP request in serverless environments. Connection pooling or the connector libraries are mandatory for production Cloud SQL workloads on Cloud Run or Cloud Functions.
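The pooling idea itself fits in a few lines. A minimal sketch, illustrative rather than production code — it uses in-memory sqlite3 as a stand-in database; with Cloud SQL you would pool PostgreSQL/MySQL connections the same way, or rely on your driver's built-in pool or a pooler such as pgbouncer:

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal connection pool: pre-open N connections, hand them out,
    and return them to the pool instead of closing them."""

    def __init__(self, size=2):
        self._pool = queue.Queue()
        for _ in range(size):
            # sqlite3 stands in for a real Cloud SQL connection here.
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    def acquire(self):
        return self._pool.get()   # blocks if the pool is exhausted

    def release(self, conn):
        self._pool.put(conn)      # return, rather than close, the connection

pool = ConnectionPool(size=2)
conn = pool.acquire()
value = conn.execute("SELECT 1 + 1").fetchone()[0]
pool.release(conn)
```

The crucial property is that `release` recycles the connection: the expensive handshake and server-side memory allocation happen once per pooled connection, not once per request.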