Overview
GCP offers multiple NoSQL and analytical database services, each optimised for a distinct data model and access pattern. The challenge is not learning any single service in isolation — it is understanding which service fits which problem, and why the others would be wrong choices. Picking the wrong data store is one of the costliest architectural mistakes to correct after the fact.
The four primary options discussed here are:
- Firestore — document NoSQL, mobile/web-first
- Cloud Bigtable — wide-column NoSQL, petabyte-scale, low-latency operational workloads
- Memorystore — managed in-memory caching (Redis and Memcached)
- BigQuery — serverless columnar data warehouse for analytics
Cloud Firestore
What It Is
Firestore is GCP’s document-model NoSQL database. Data is organised into collections of documents, where each document is a JSON-like map of fields and values. Documents can contain nested maps and arrays, and collections can contain sub-collections of documents — enabling a hierarchical data model.
Firestore is the successor to Cloud Datastore, and the two share conceptual DNA but diverged significantly in capability.
Native Mode vs Datastore Mode
When you create a Firestore database, you choose a mode at creation time. This choice is permanent for that database — you cannot switch modes after creation.
| Feature | Native Mode | Datastore Mode |
|---|---|---|
| Data model | Collections and documents | Entities and kinds |
| Real-time listeners | Yes — push updates to clients on data change | No |
| Offline support | Yes — SDKs cache data for offline use | No |
| ACID transactions | Multi-document, multi-collection | Limited |
| Target use case | Mobile apps, web apps, client-side SDKs | Server-side applications; legacy Datastore migration |
| Pricing | Per-document read/write/delete | Per-entity read/write/delete |
Choose Native mode for all new projects. Datastore mode exists primarily for organisations migrating existing Datastore workloads without rewriting application logic.
Real-Time Listeners
Native mode Firestore supports real-time listeners — a client can subscribe to a document or query, and the SDK delivers updates to the client within milliseconds whenever the underlying data changes. This is fundamentally different from a polling model. Chat applications, collaborative editors, live dashboards, and gaming leaderboards are natural fits.
Real-time listeners work over a persistent streaming connection from the client SDK to Firestore. When a query's results change, the server pushes only the changed documents — not the full result set — keeping bandwidth usage low.
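To make the "only changed documents" idea concrete, here is a minimal, illustrative sketch (plain Python, not the Firestore SDK) that computes the document-level changes between two consecutive query result sets — the same ADDED / MODIFIED / REMOVED shape that Firestore listeners deliver to clients:

```python
def diff_snapshots(prev: dict, curr: dict) -> dict:
    """Compute document-level changes between two query snapshots.

    `prev` and `curr` map document IDs to field data, standing in for two
    consecutive query result sets. A listener receives changes of this
    shape rather than the full result set on every update.
    """
    added = [doc_id for doc_id in curr if doc_id not in prev]
    removed = [doc_id for doc_id in prev if doc_id not in curr]
    modified = [doc_id for doc_id in curr
                if doc_id in prev and curr[doc_id] != prev[doc_id]]
    return {"added": added, "modified": modified, "removed": removed}

# Only "bob" changed, so only one MODIFIED change needs to be delivered.
before = {"alice": {"score": 10}, "bob": {"score": 5}}
after = {"alice": {"score": 10}, "bob": {"score": 7}}
changes = diff_snapshots(before, after)
```

The bandwidth saving follows directly: a leaderboard with 10,000 entries where one score changes pushes one change, not 10,000 documents.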
Offline Support
The Firestore mobile SDKs (iOS, Android) and the web SDK maintain a local cache of recently accessed data. When the device is offline, reads are served from this cache. Writes are queued and replayed when connectivity is restored. Conflict resolution is handled by the SDK automatically.
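The queue-and-replay behaviour can be sketched as follows — a toy model in plain Python (not the SDK), where a dict stands in for the Firestore backend and replaying queued writes in order gives last-write-wins semantics:

```python
class OfflineWriteQueue:
    """Toy model of the SDK's offline behaviour: writes are queued while
    disconnected and replayed in order on reconnect (last write wins)."""

    def __init__(self):
        self.pending = []   # queued (doc_path, fields) writes
        self.server = {}    # stand-in for the Firestore backend
        self.online = False

    def set(self, doc_path, fields):
        if self.online:
            self.server[doc_path] = fields
        else:
            self.pending.append((doc_path, fields))

    def reconnect(self):
        """Replay queued writes in order; later writes overwrite earlier ones."""
        self.online = True
        for doc_path, fields in self.pending:
            self.server[doc_path] = fields
        self.pending.clear()

q = OfflineWriteQueue()
q.set("users/alice", {"status": "away"})
q.set("users/alice", {"status": "online"})   # supersedes the first write
q.reconnect()
```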
Security Rules
Firestore Native mode supports Firebase Security Rules — a declarative rules language evaluated server-side for all client SDK requests. Rules allow data-level access control without deploying backend code:
```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /users/{userId} {
      allow read, write: if request.auth != null && request.auth.uid == userId;
    }
  }
}
```
Rules are not applied to Admin SDK or Cloud Functions accessing Firestore server-side — only to client SDK requests. This makes Firestore viable for mobile apps that read and write directly, without a backend API layer.
Pricing
Firestore pricing is per-operation: you pay per document read, per document write, and per document delete. Storage is priced per GB. There is no concept of instance size or reserved capacity — you pay for exactly what you use.
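A quick way to reason about this model is a back-of-the-envelope estimator. The unit prices below are illustrative only — actual rates vary by region and change over time, so check the current pricing page:

```python
# Illustrative unit prices (USD per 100,000 operations) -- NOT current
# quotes; actual Firestore rates vary by region and over time.
PRICE_PER_100K = {"reads": 0.06, "writes": 0.18, "deletes": 0.02}

def estimate_monthly_cost(reads, writes, deletes, storage_gb,
                          storage_price_per_gb=0.18):
    """Per-operation + per-GB cost: no instance size, no reserved capacity."""
    ops = (reads * PRICE_PER_100K["reads"]
           + writes * PRICE_PER_100K["writes"]
           + deletes * PRICE_PER_100K["deletes"]) / 100_000
    return round(ops + storage_gb * storage_price_per_gb, 2)

# 50M reads, 10M writes, 1M deletes, 20 GB stored in a month
cost = estimate_monthly_cost(50_000_000, 10_000_000, 1_000_000, 20)
```

The useful takeaway is the shape of the bill: cost tracks operations, so a chatty client that re-reads documents it already has is a direct cost problem, not just a performance one.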
Cloud Bigtable
What It Is
Cloud Bigtable is a wide-column NoSQL database designed for massive-scale, low-latency workloads. It is the same technology that powers Google Search indexing, Google Maps, and Gmail. The managed service makes that technology available with a compatible HBase API, meaning applications written for Apache HBase can migrate to Bigtable with minimal changes.
Scale: Bigtable can handle petabytes of data across billions of rows, with sub-10ms read/write latency even at that scale. This is a fundamentally different operating point than any relational database.
Data Model
Bigtable organises data as a massive sorted map:
- Table: A collection of rows
- Row: Identified by a single row key (a byte string, up to 4 KB)
- Column family: A group of related columns; defined at table creation time; each family has its own compaction settings
- Column qualifier: The column name within a family; defined at write time (not at table creation)
- Cell: The intersection of row + column family + column qualifier; each cell stores the value plus a timestamp; multiple timestamped versions of a cell can be stored
The full address of a value is: table → row key → column family → column qualifier → timestamp → value.
Unlike a relational database, Bigtable tables are very wide and very sparse — most cells in a row are empty. You do not define a fixed schema of columns; columns are created implicitly when you write to them.
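The "sorted map of versioned cells" model can be sketched with nested dicts — an illustrative stand-in, not the Bigtable client, showing the full address `row key → column family → column qualifier → timestamp → value` and the implicit creation of columns on write:

```python
# Nested-dict sketch of Bigtable's sorted map. A real table is sorted by
# row key and sharded into tablets; a dict ignores that but keeps the
# addressing scheme: row key -> "family:qualifier" -> timestamp -> value.
table = {}

def write_cell(row_key, family, qualifier, timestamp, value):
    # Columns are created implicitly on write -- no fixed schema.
    cell = table.setdefault(row_key, {}).setdefault(f"{family}:{qualifier}", {})
    cell[timestamp] = value

def read_latest(row_key, family, qualifier):
    """Return the most recent version of a cell, or None if the cell is
    empty (most cells in a sparse row are)."""
    versions = table.get(row_key, {}).get(f"{family}:{qualifier}")
    if not versions:
        return None
    return versions[max(versions)]

write_cell("sensor42#001", "metrics", "temp", 1000, 21.5)
write_cell("sensor42#001", "metrics", "temp", 2000, 22.0)  # newer version
latest = read_latest("sensor42#001", "metrics", "temp")
```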
Row Key Design is Everything
Bigtable is sorted by row key. All lookups are either:
- An exact row key lookup (single row read)
- A row key prefix scan (range read across sorted row keys)
There are no secondary indexes. If your application needs to look up data by any attribute other than the row key, you either design the row key to encode that attribute, or you maintain a separate lookup table.
Critical pitfall: monotonically increasing row keys (like sequential integers or ISO timestamps in ascending order) cause hotspotting — all recent writes go to the same tablet (partition), overwhelming one node while others sit idle. Solutions:
- Reversed timestamps: `Long.MAX_VALUE - timestamp` — most recent data sorts first; scans naturally hit recent data without hot spots
- Hashed prefix: an MD5 or SHA prefix of the natural key distributes writes randomly across tablets
- Composite keys: combine multiple attributes (`sensor_id#reversed_timestamp`) to distribute by entity while keeping time-ordered scans per entity
Good row key design is the single most important decision when building a Bigtable schema.
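The three techniques above can be sketched in a few lines. This is an illustrative Python translation of common HBase/Bigtable idioms (field widths and separators are design choices, not requirements):

```python
import hashlib

LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE, as in the classic HBase idiom

def reversed_timestamp_key(ts_millis: int) -> str:
    """Zero-padded so lexicographic order matches numeric order; the most
    recent data sorts first."""
    return f"{LONG_MAX - ts_millis:019d}"

def hashed_prefix_key(natural_key: str) -> str:
    """A short hash prefix spreads sequential keys across tablets (at the
    cost of losing meaningful range scans over the natural key)."""
    prefix = hashlib.md5(natural_key.encode()).hexdigest()[:4]
    return f"{prefix}#{natural_key}"

def composite_key(sensor_id: str, ts_millis: int) -> str:
    """Distributes writes by entity while keeping per-entity scans
    time-ordered, newest first."""
    return f"{sensor_id}#{reversed_timestamp_key(ts_millis)}"

# Later events produce lexicographically *smaller* keys, so a prefix scan
# on "sensor42#" returns the newest readings first.
k_old = composite_key("sensor42", 1_000)
k_new = composite_key("sensor42", 2_000)
```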
Performance and Scaling
Bigtable performance scales linearly with the number of nodes in a cluster. Adding a node increases throughput proportionally. A single node provides approximately 10,000 reads/second or 10,000 writes/second at sub-10ms latency (SSD storage).
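Linear scaling makes capacity planning simple arithmetic. A rough sizing sketch, assuming the ~10,000 reads/second per-node figure above and a utilisation headroom target (the 70% figure is an assumption, not an official recommendation):

```python
import math

NODE_READS_PER_SEC = 10_000  # approximate per-node SSD throughput, as above

def nodes_for_throughput(target_reads_per_sec: int, headroom: float = 0.7) -> int:
    """Size a cluster so steady-state load stays below `headroom`
    utilisation, leaving room for spikes and node maintenance."""
    return math.ceil(target_reads_per_sec / (NODE_READS_PER_SEC * headroom))

# 50,000 reads/sec with 30% headroom -> 8 nodes
n = nodes_for_throughput(50_000)
```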
Storage types:
- SSD (default): Low latency (~6ms p99), higher cost
- HDD: Higher latency (~200ms p99), significantly lower cost; appropriate for batch read workloads where latency is not critical
Bigtable also supports replication across up to 8 clusters in different regions. Replication is asynchronous with eventual consistency (no external consistency like Spanner). Multiple clusters enable geographic distribution of read traffic and provide availability during regional failures.
Use Cases
Bigtable is the right choice when:
- You need petabyte-scale storage with single-digit millisecond latency
- Your access pattern is primarily single row key lookups or key range scans
- Your data is naturally structured as time-series (IoT telemetry, stock prices, network flow data) or event logs (clickstream, application events)
- You need the HBase API for application compatibility
It is the wrong choice when:
- You need secondary indexes or ad-hoc query flexibility
- Your data is relational with complex joins
- Your dataset is under 1 TB (Cloud SQL or Firestore is more cost-effective)
Cloud Memorystore
Memorystore is GCP’s managed in-memory caching service. It eliminates the operational burden of running self-managed Redis or Memcached clusters — patching, failover, scaling, and backups are handled by Google.
Memorystore for Redis
Redis is a data structure server — it stores data in memory and supports a rich set of data types: strings, hashes, lists, sets, sorted sets, streams, and more. Common use cases: session caching, rate limiting, leaderboards, pub/sub message queuing, and distributed locks.
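Rate limiting is a good example of why an in-memory store fits these use cases. The sketch below models the classic fixed-window pattern built on Redis `INCR` plus `EXPIRE`, with a plain dict standing in for the Redis instance (in real Redis, the per-window key would simply expire; the stub skips eviction):

```python
import time

class FixedWindowRateLimiter:
    """Fixed-window rate limiter modelling the Redis INCR + EXPIRE pattern,
    with a dict standing in for the Redis instance."""

    def __init__(self, limit: int, window_secs: int, clock=time.time):
        self.limit = limit
        self.window = window_secs
        self.clock = clock
        self.counters = {}   # key -> request count; Redis would EXPIRE these

    def allow(self, client_id: str) -> bool:
        window_id = int(self.clock()) // self.window
        key = f"rate:{client_id}:{window_id}"
        self.counters[key] = self.counters.get(key, 0) + 1  # Redis: INCR key
        return self.counters[key] <= self.limit

# Fixed clock for a deterministic demo: 3 requests allowed per 60s window.
limiter = FixedWindowRateLimiter(limit=3, window_secs=60, clock=lambda: 0)
results = [limiter.allow("alice") for _ in range(5)]
```

Because `INCR` is atomic on the server, many application instances can share one counter safely — the property that makes Redis, rather than per-process memory, the right home for this state.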
| Feature | Detail |
|---|---|
| Max instance size | 300 GB |
| Max throughput | Up to 12 Gbps |
| HA | Primary + replica with automatic failover |
| Persistence | RDB snapshots to Cloud Storage (optional) |
| Read replicas | Supported for Redis 7.0+ (distribute read load) |
| Protocol | Standard Redis protocol — existing Redis clients work unchanged |
HA Memorystore instances have a primary node and at least one replica. Failover is automatic and typically completes in under 1 minute. The connection endpoint does not change.
Memorystore for Memcached
Memcached is a simpler distributed memory object cache — it supports only string values and has no persistence, no replication, and no pub/sub. Its strength is horizontal scalability: add nodes to increase the total cache capacity without any downtime.
| Feature | Detail |
|---|---|
| Max cluster size | 5 TB across 1–20 nodes |
| Persistence | None — data is lost on node restart |
| Protocol | Standard Memcached protocol |
| HA | None — nodes are independent; node loss means cache miss, not service outage |
Memcached is appropriate for simple, stateless caching scenarios where cache misses are acceptable (the application falls back to the database). If you need persistence, pub/sub, sorted data structures, or HA, choose Redis.
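The "fall back to the database" behaviour is the cache-aside pattern. A minimal sketch, with dicts standing in for the Memcached client and the backing database (both hypothetical stand-ins, not real clients):

```python
# Cache-aside: read the cache first; on a miss, read the source of truth
# and repopulate. Losing a Memcached node costs misses, not correctness.
cache = {}
database = {"user:1": {"name": "Ada"}}   # hypothetical backing store

def get_user(key):
    value = cache.get(key)          # 1. try the cache
    if value is None:
        value = database.get(key)   # 2. miss -> fall back to the database
        if value is not None:
            cache[key] = value      # 3. repopulate for subsequent reads
    return value

first = get_user("user:1")    # cache miss, served from the database
second = get_user("user:1")   # cache hit
```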
The Full Comparison: When to Use What
This comparison covers the full spectrum of GCP data stores, including the SQL options from the previous article, to give a complete picture.
| Service | Data Model | Scale | Latency | Strong Consistency | Best Use Case |
|---|---|---|---|---|---|
| Cloud SQL | Relational (MySQL/PG/SQL Server) | Vertical, up to ~96 vCPU | Milliseconds | Yes (within instance) | Standard OLTP, web apps, regional services |
| Cloud Spanner | Relational (NewSQL) | Horizontal (global) | Milliseconds | Yes (global) | Global OLTP, financial systems, global inventory |
| Firestore | Document (JSON) | Serverless, auto-scale | Milliseconds | Yes (per-document) | Mobile apps, web apps, real-time sync |
| Cloud Bigtable | Wide-column | Petabyte-scale | Sub-10ms | Eventual (across clusters) | Time-series, IoT, clickstream, HBase migration |
| Memorystore (Redis) | In-memory key-value | Up to 300 GB | Sub-millisecond | Yes (single instance) | Session cache, rate limiting, leaderboards |
| Memorystore (Memcached) | In-memory key-value | Up to 5 TB | Sub-millisecond | No | Simple distributed caching |
| BigQuery | Columnar (analytics) | Serverless, petabyte-scale | Seconds | Yes (for reads) | Analytics, reporting, data warehouse |
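As a rough codification of the table above, the decision logic might be sketched like this — a deliberately simplified starting point for discussion, with hypothetical flag names, not a substitute for real requirements analysis:

```python
def pick_data_store(relational: bool, global_writes: bool = False,
                    analytics: bool = False, cache_only: bool = False,
                    key_value_scans_only: bool = False,
                    dataset_tb: float = 0.1) -> str:
    """Simplified decision sketch over the comparison table above."""
    if cache_only:
        return "Memorystore"            # in-memory, sub-millisecond
    if analytics:
        return "BigQuery"               # columnar, seconds-scale queries
    if relational:
        # Global OLTP needs Spanner; regional OLTP is fine on Cloud SQL.
        return "Cloud Spanner" if global_writes else "Cloud SQL"
    if key_value_scans_only and dataset_tb >= 1:
        return "Cloud Bigtable"         # row-key lookups at large scale
    return "Firestore"                  # document model, client SDKs

choice = pick_data_store(relational=False, key_value_scans_only=True,
                         dataset_tb=50)
```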
Firebase Realtime Database vs Firestore
Both are Firebase products and both are NoSQL databases accessible from mobile clients without a backend. They are distinct services that still coexist in Firebase.
| Dimension | Firebase Realtime Database | Cloud Firestore |
|---|---|---|
| Data model | One large JSON tree | Collections and documents |
| Querying | Limited — queries on one property at a time | Rich — multiple field filters, ordering |
| Offline support | Yes | Yes |
| Real-time sync | Yes | Yes |
| Transactions | Yes (single-location) | Multi-document ACID |
| Scale | ~100,000 concurrent connections per database | No practical connection limit (serverless) |
| Regions | Limited to specific regions | Multi-region available |
| Pricing | Per data downloaded + storage | Per document operation |
For new projects, Cloud Firestore is the recommended choice. Firebase Realtime Database predates Firestore and remains in active use for its simpler JSON tree model, lower cost for certain streaming patterns, and 100,000 concurrent connection guarantee per database instance. Existing Realtime Database applications do not need to migrate unless they have hit its query or scale limitations.