GCP — NoSQL and Big Data Databases

NOSQL

Firestore, Bigtable, Memorystore, and BigQuery — GCP's NoSQL and analytical database services and their appropriate use cases.

gcp, google-cloud, firestore, bigtable, memorystore, bigquery, nosql

Overview

GCP offers multiple NoSQL and analytical database services, each optimised for a distinct data model and access pattern. The challenge is not learning any single service in isolation — it is understanding which service fits which problem, and why the others would be wrong choices. Picking the wrong data store is one of the costliest architectural mistakes to correct after the fact.

The four primary options discussed here are:

  1. Cloud Firestore: a serverless document database for mobile and web applications
  2. Cloud Bigtable: a wide-column database for massive-scale, low-latency workloads
  3. Cloud Memorystore: managed in-memory caching with Redis or Memcached
  4. BigQuery: a serverless, columnar analytical data warehouse

Cloud Firestore

What It Is

Firestore is GCP’s document-model NoSQL database. Data is organised into collections of documents, where each document is a JSON-like map of fields and values. Documents can contain nested maps and arrays, and collections can contain sub-collections of documents — enabling a hierarchical data model.

Firestore is the successor to Cloud Datastore, and the two share conceptual DNA but diverged significantly in capability.

Native Mode vs Datastore Mode

When you create a Firestore database, you choose a mode at creation time. This choice is permanent — you cannot switch modes after creation, and the two modes cannot coexist in the same GCP project.

| Feature | Native Mode | Datastore Mode |
|---|---|---|
| Data model | Collections and documents | Entities and kinds |
| Real-time listeners | Yes — push updates to clients on data change | No |
| Offline support | Yes — SDKs cache data for offline use | No |
| ACID transactions | Multi-document, multi-collection | Limited |
| Target use case | Mobile apps, web apps, client-side SDKs | Server-side applications; legacy Datastore migration |
| Pricing | Per-document read/write/delete | Per-entity read/write/delete |

Choose Native mode for all new projects. Datastore mode exists primarily for organisations migrating existing Datastore workloads without rewriting application logic.

Real-Time Listeners

Native mode Firestore supports real-time listeners — a client can subscribe to a document or query, and the SDK delivers updates to the client within milliseconds whenever the underlying data changes. This is fundamentally different from a polling model. Chat applications, collaborative editors, live dashboards, and gaming leaderboards are natural fits.

Real-time listeners work over a persistent streaming connection (gRPC, with a long-polling fallback in the web SDK where streaming is unavailable) from the client SDK to Firestore. When data changes, the server pushes only the affected documents to subscribed clients rather than re-sending the full result set, keeping bandwidth usage low.

Offline Support

The Firestore mobile SDKs (iOS, Android) and the web SDK maintain a local cache of recently accessed data. When the device is offline, reads are served from this cache. Writes are queued and replayed when connectivity is restored. Conflict resolution is handled by the SDK automatically.

Security Rules

Firestore Native mode supports Firebase Security Rules — a declarative rules language evaluated server-side for all client SDK requests. Rules allow data-level access control without deploying backend code:

rules_version = '2';
service cloud.firestore {
  // Rules apply to every document under the database root
  match /databases/{database}/documents {
    // {userId} is a wildcard bound to the document ID being accessed
    match /users/{userId} {
      // Allow access only to authenticated users reading/writing their own document
      allow read, write: if request.auth != null && request.auth.uid == userId;
    }
  }
}

Rules are not applied to Admin SDK or Cloud Functions accessing Firestore server-side — only to client SDK requests. This makes Firestore viable for mobile apps that read and write directly, without a backend API layer.

Pricing

Firestore pricing is per-operation: you pay per document read, per document write, and per document delete. Storage is priced per GB. There is no concept of instance size or reserved capacity — you pay for exactly what you use.
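As a back-of-envelope illustration of the per-operation model, the sketch below prices a hypothetical workload. The unit prices are placeholders for illustration only; actual Firestore prices vary by region and change over time, so check the current pricing page.

```python
# Back-of-envelope Firestore cost model. The unit prices below are
# ILLUSTRATIVE ASSUMPTIONS, not current list prices.
READ_PER_100K = 0.06    # USD per 100k document reads (assumed)
WRITE_PER_100K = 0.18   # USD per 100k document writes (assumed)
DELETE_PER_100K = 0.02  # USD per 100k document deletes (assumed)
STORAGE_PER_GB = 0.18   # USD per GB-month stored (assumed)

def monthly_cost(reads, writes, deletes, storage_gb):
    """Estimate a monthly bill for a purely per-operation workload."""
    ops = (reads * READ_PER_100K + writes * WRITE_PER_100K
           + deletes * DELETE_PER_100K) / 100_000
    return round(ops + storage_gb * STORAGE_PER_GB, 2)

# 10M reads, 2M writes, 0.5M deletes, 20 GB stored
print(monthly_cost(10_000_000, 2_000_000, 500_000, 20))  # 13.3
```

The point of the exercise: with no instance sizing, cost scales directly with operation counts, so reducing reads (caching, bundling data into fewer documents) translates one-for-one into savings.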


Cloud Bigtable

What It Is

Cloud Bigtable is a wide-column NoSQL database designed for massive-scale, low-latency workloads. It is the same technology that powers Google Search indexing, Google Maps, and Gmail. The managed service exposes an HBase-compatible API, so applications written for Apache HBase can migrate to Bigtable with minimal code changes.

Scale: Bigtable can handle petabytes of data across billions of rows, with sub-10ms read/write latency even at that scale. This is a fundamentally different operating point from any relational database.

Data Model

Bigtable organises data as a massive sorted map:

  1. Each table is a single sorted key/value map; rows are sorted lexicographically by row key
  2. Each row contains one or more column families, which are declared when the table is created
  3. Within a family, individual columns are identified by a column qualifier
  4. Each cell can hold multiple versions of a value, indexed by timestamp

The full address of a value is: table → row key → column family → column qualifier → timestamp → value.

Unlike a relational database, Bigtable tables are very wide and very sparse — most cells in a row are empty. You do not define a fixed schema of columns; columns are created implicitly when you write to them.
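The sorted-map model can be made concrete with a toy in-memory sketch. This is an illustration of the addressing scheme only; the class and method names are invented and do not correspond to any Bigtable client API.

```python
import bisect

class ToyBigtable:
    """Toy model: a sorted map from row key to
    {(column_family, qualifier): {timestamp: value}}."""

    def __init__(self):
        self.rows = {}   # row key -> cell map
        self.keys = []   # row keys, kept in sorted order

    def write(self, row_key, family, qualifier, timestamp, value):
        if row_key not in self.rows:
            bisect.insort(self.keys, row_key)   # rows stay sorted by key
            self.rows[row_key] = {}
        cells = self.rows[row_key].setdefault((family, qualifier), {})
        cells[timestamp] = value                # cells are versioned by timestamp

    def read_row(self, row_key):
        return self.rows.get(row_key)           # exact row key lookup

    def scan_prefix(self, prefix):
        # Range read across sorted row keys -- the only other access path.
        start = bisect.bisect_left(self.keys, prefix)
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self.rows[key]

t = ToyBigtable()
t.write("device42#2024-01-01T00:00", "metrics", "temp", 1704067200, "21.5")
t.write("device42#2024-01-01T00:01", "metrics", "temp", 1704067260, "21.7")
t.write("device99#2024-01-01T00:00", "metrics", "temp", 1704067200, "19.0")
print([k for k, _ in t.scan_prefix("device42#")])
```

Note how the prefix scan falls naturally out of the sorted key order: all rows for `device42` are physically contiguous, which is exactly why row key design dominates Bigtable schema decisions.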

Row Key Design is Everything

Bigtable is sorted by row key. All lookups are either:

  1. An exact row key lookup (single row read)
  2. A row key prefix scan (range read across sorted row keys)

There are no secondary indexes. If your application needs to look up data by any attribute other than the row key, you either design the row key to encode that attribute, or you maintain a separate lookup table.

Critical pitfall: monotonically increasing row keys (like sequential integers or ISO timestamps in ascending order) cause hotspotting — all recent writes go to the same tablet (partition), overwhelming one node while others sit idle. Solutions:

  1. Field promotion: move a high-cardinality field (such as a device or user ID) to the front of the key
  2. Salting: prefix the key with a deterministic hash bucket so sequential writes fan out across key ranges
  3. Randomisation: hash the key entirely, at the cost of losing meaningful range scans
Good row key design is the single most important decision when building a Bigtable schema.
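Two common hotspotting mitigations, field promotion and salting, can be sketched as small key-building helpers. The function names and the `id#timestamp` layout are illustrative choices, not a Bigtable API.

```python
import hashlib

def promoted_key(device_id, iso_timestamp):
    """Field promotion: lead with a high-cardinality field so writes
    from many devices spread across the sorted key space."""
    return f"{device_id}#{iso_timestamp}"

def salted_key(iso_timestamp, buckets=8):
    """Salting: prefix a deterministic hash bucket so sequential
    timestamps fan out across `buckets` contiguous key ranges.
    Reads must then fan out across all buckets."""
    bucket = int(hashlib.md5(iso_timestamp.encode()).hexdigest(), 16) % buckets
    return f"{bucket:02d}#{iso_timestamp}"

print(promoted_key("sensor-7", "2024-06-01T12:00:00"))  # sensor-7#2024-06-01T12:00:00
```

Field promotion is usually preferable when a natural high-cardinality field exists, because it keeps per-entity range scans intact; salting trades read fan-out for write distribution.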

Performance and Scaling

Bigtable performance scales linearly with the number of nodes in a cluster. Adding a node increases throughput proportionally. A single node provides approximately 10,000 reads/second or 10,000 writes/second at sub-10ms latency (SSD storage).
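Under the linear-scaling assumption, cluster sizing reduces to arithmetic. The sketch below uses the approximate 10,000 reads/second per-node figure and a 70% CPU headroom target, a common sizing rule of thumb rather than a hard quota.

```python
import math

NODE_READS_PER_SEC = 10_000   # approximate per-node SSD throughput

def nodes_for(target_reads_per_sec, headroom=0.7):
    """Size a cluster for a read target while keeping average node
    utilisation near 70% (headroom is an assumed planning target)."""
    return math.ceil(target_reads_per_sec / (NODE_READS_PER_SEC * headroom))

print(nodes_for(50_000))  # 8
```

The same linearity works in reverse: halving traffic lets you shrink the cluster and the bill proportionally, which is unusual among databases at this scale.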

Storage types:

  1. SSD: lower latency and higher per-node throughput; the default choice for latency-sensitive serving workloads
  2. HDD: significantly cheaper per GB but higher latency and lower throughput; suited to large, infrequently accessed datasets
Bigtable also supports replication across up to 8 clusters in different regions. Replication is asynchronous with eventual consistency (no external consistency like Spanner). Multiple clusters enable geographic distribution of read traffic and provide availability during regional failures.

Use Cases

Bigtable is the right choice when:

  1. The dataset is large (a terabyte or more) with very high read/write throughput
  2. The workload is time-series, IoT telemetry, clickstream, or similar append-heavy data
  3. All access patterns can be expressed as row key lookups or prefix scans
  4. You are migrating an existing Apache HBase workload

It is the wrong choice when:

  1. The dataset is small; per-node pricing makes Bigtable hard to justify below roughly a terabyte
  2. You need SQL queries, joins, or secondary indexes
  3. You need multi-row transactions (writes are atomic only within a single row)
  4. You need strong consistency across replicated clusters

Cloud Memorystore

Memorystore is GCP’s managed in-memory caching service. It eliminates the operational burden of running self-managed Redis or Memcached clusters — patching, failover, scaling, and backups are handled by Google.

Memorystore for Redis

Redis is a data structure server — it stores data in memory and supports a rich set of data types: strings, hashes, lists, sets, sorted sets, streams, and more. Common use cases: session caching, rate limiting, leaderboards, pub/sub message queuing, and distributed locks.
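One of those use cases, rate limiting, is commonly built on the Redis INCR + EXPIRE pattern. The sketch below implements a fixed-window limiter with a plain dict standing in for Redis so it runs locally; the comments mark where real Redis commands would go. Class and method names are illustrative.

```python
import time

class RateLimiter:
    """Fixed-window rate limiter. One counter per (client, window);
    a dict stands in for Redis in this local sketch."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.store = {}   # window key -> request count

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        # One key per time window; old windows simply stop being touched.
        window_key = f"rate:{client_id}:{int(now // self.window)}"
        # With a real client: count = r.incr(window_key);
        #                     r.expire(window_key, self.window)
        count = self.store.get(window_key, 0) + 1
        self.store[window_key] = count
        return count <= self.limit

rl = RateLimiter(limit=3, window_seconds=60)
print([rl.allow("user-1", now=0) for _ in range(5)])  # [True, True, True, False, False]
```

Against real Redis the EXPIRE call makes old windows clean themselves up, and INCR's atomicity makes the counter safe across many application servers, which is exactly why this pattern needs a shared in-memory store rather than per-process state.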

| Feature | Detail |
|---|---|
| Max instance size | 300 GB |
| Max throughput | Up to 12 Gbps |
| HA | Primary + replica with automatic failover |
| Persistence | RDB snapshots to Cloud Storage (optional) |
| Read replicas | Supported for Redis 7.0+ (distribute read load) |
| Protocol | Standard Redis protocol — existing Redis clients work unchanged |

HA (Standard tier) Memorystore instances have a primary node and at least one replica. Failover is automatic and typically completes in under 1 minute. The connection endpoint does not change.

Memorystore for Memcached

Memcached is a simpler distributed memory object cache — it supports only string values and has no persistence, no replication, and no pub/sub. Its strength is horizontal scalability: add nodes to increase the total cache capacity without any downtime.

| Feature | Detail |
|---|---|
| Max cluster size | 5 TB across 1–20 nodes |
| Persistence | None — data is lost on node restart |
| Protocol | Standard Memcached protocol |
| HA | None — nodes are independent; node loss means cache miss, not service outage |

Memcached is appropriate for simple, stateless caching scenarios where cache misses are acceptable (the application falls back to the database). If you need persistence, pub/sub, sorted data structures, or HA, choose Redis.
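The fall-back-to-the-database behaviour described above is the cache-aside pattern. A minimal sketch follows, with plain dicts standing in for Memcached and the database so it runs locally; against a real deployment the dict operations would be memcache client get/set calls and a database query.

```python
db = {"user:1": {"name": "Ada"}}   # stand-in for the authoritative database
cache = {}                          # stand-in for Memcached

def get_user(key):
    """Cache-aside read: try the cache, fall back to the database on a
    miss (cold key, eviction, or node restart), then repopulate."""
    value = cache.get(key)
    if value is None:
        value = db[key]             # authoritative lookup
        cache[key] = value          # repopulate for subsequent reads
    return value

print(get_user("user:1"))           # first call: cache miss, hits the db
print(get_user("user:1"))           # second call: served from the cache
```

Because every miss degrades to a database read rather than an error, losing a Memcached node costs latency, not availability, which is precisely the trade-off the table above describes.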


The Full Comparison: When to Use What

This comparison covers the full spectrum of GCP data stores, including the SQL options from the previous article, to give a complete picture.

| Service | Data Model | Scale | Latency | Strong Consistency | Best Use Case |
|---|---|---|---|---|---|
| Cloud SQL | Relational (MySQL/PG/SQL Server) | Vertical, up to ~96 vCPU | Milliseconds | Yes (within instance) | Standard OLTP, web apps, regional services |
| Cloud Spanner | Relational (NewSQL) | Horizontal (global) | Milliseconds | Yes (global) | Global OLTP, financial systems, global inventory |
| Firestore | Document (JSON) | Serverless, auto-scale | Milliseconds | Yes (per-document) | Mobile apps, web apps, real-time sync |
| Cloud Bigtable | Wide-column | Petabyte-scale | Sub-10ms | Eventual (across clusters) | Time-series, IoT, clickstream, HBase migration |
| Memorystore (Redis) | In-memory key-value | Up to 300 GB | Sub-millisecond | Yes (single instance) | Session cache, rate limiting, leaderboards |
| Memorystore (Memcached) | In-memory key-value | Up to 5 TB | Sub-millisecond | No | Simple distributed caching |
| BigQuery | Columnar (analytics) | Serverless, petabyte-scale | Seconds | Yes (for reads) | Analytics, reporting, data warehouse |

Firebase Realtime Database vs Firestore

Both are Firebase products and both are NoSQL databases accessible from mobile clients without a backend. They are distinct services that still coexist in Firebase.

| Dimension | Firebase Realtime Database | Cloud Firestore |
|---|---|---|
| Data model | One large JSON tree | Collections and documents |
| Querying | Limited — queries on one property at a time | Rich — multiple field filters, ordering |
| Offline support | Yes | Yes |
| Real-time sync | Yes | Yes |
| Transactions | Yes (single-location) | Multi-document ACID |
| Scale | ~100,000 concurrent connections per database | No practical connection limit (serverless) |
| Regions | Limited to specific regions | Multi-region available |
| Pricing | Per data downloaded + storage | Per document operation |

For new projects, Cloud Firestore is the recommended choice. Firebase Realtime Database predates Firestore and remains in active use for its simpler JSON tree model, lower cost for certain streaming patterns, and 100,000 concurrent connection guarantee per database instance. Existing Realtime Database applications do not need to migrate unless they have hit its query or scale limitations.