Seven Databases in Seven Weeks - A Guide to Modern Database Systems

A comprehensive exploration of seven different database systems: PostgreSQL, Redis, MongoDB, CouchDB, Neo4j, HBase, and Riak. Understanding What, Why, How, and When to use each database for your projects.

In the modern software development landscape, choosing the right database is crucial for building scalable, maintainable systems. “Seven Databases in Seven Weeks” by Eric Redmond and Jim R. Wilson provides an excellent journey through different database paradigms, each optimized for specific use cases.

This guide explores seven diverse database systems, examining What they are, Why they exist, How they work, and When to use them. Understanding these databases will help you make informed decisions for your next project.

1. PostgreSQL - The Relational Powerhouse

What

PostgreSQL is a powerful, open-source relational database management system (RDBMS) that has been evolving for over 30 years. It’s known for its SQL compliance, ACID transactions, and extensive feature set including advanced data types, full-text search, and JSON support.

Why

PostgreSQL exists to provide a robust, feature-rich relational database that balances performance, reliability, and standards compliance. It’s designed for applications that require:

  • Data integrity: ACID transactions ensure consistency
  • Complex queries: Powerful SQL with joins, subqueries, and window functions
  • Extensibility: Custom functions, operators, and data types
  • Standards compliance: SQL standard adherence for portability

How

PostgreSQL uses a traditional relational model where data is organized into tables with rows and columns. It employs:

  • MVCC (Multi-Version Concurrency Control): Allows concurrent reads and writes without locking
  • Write-Ahead Logging (WAL): Ensures durability and enables point-in-time recovery
  • Query planner: Optimizes SQL queries for performance
  • Indexes: B-tree, hash, GIN, GiST, and BRIN indexes for fast lookups
-- Example: Creating a table with JSON support
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255) UNIQUE,
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Complex query with window functions
SELECT 
    name,
    email,
    ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) as row_num
FROM users;
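The MVCC mechanism mentioned above can be illustrated with a toy model (a sketch in Python, not how PostgreSQL is implemented): each write appends a new row version tagged with the writing transaction's id, and a reader sees only the newest version at or before its snapshot.

```python
# Toy illustration of MVCC: writes append versions tagged with a
# transaction id (txid); a reader at a given snapshot txid sees only
# the newest version created at or before that snapshot.
class MVCCTable:
    def __init__(self):
        self.versions = {}  # key -> list of (txid, value), append-only

    def write(self, txid, key, value):
        self.versions.setdefault(key, []).append((txid, value))

    def read(self, snapshot_txid, key):
        visible = [v for t, v in self.versions.get(key, []) if t <= snapshot_txid]
        return visible[-1] if visible else None

table = MVCCTable()
table.write(1, "balance", 100)
table.write(3, "balance", 80)   # a later transaction updates the row

# A reader with snapshot txid 2 still sees the old version,
# so readers never block writers:
print(table.read(2, "balance"))  # -> 100
print(table.read(3, "balance"))  # -> 80
```

This is why concurrent reads and writes don't lock each other: old readers keep their consistent snapshot while new versions accumulate.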

When

Use PostgreSQL when:

  • ✅ You need ACID transactions for financial or critical data
  • ✅ Your data has clear relationships (users → orders → items)
  • ✅ You require complex queries with joins and aggregations
  • ✅ You need strong consistency guarantees
  • ✅ You want mature tooling and extensive community support
  • ✅ You’re building traditional web applications with structured data

Avoid PostgreSQL when:

  • โŒ You need horizontal scaling across many nodes (consider sharding or alternatives)
  • โŒ Your data is highly unstructured and changes frequently
  • โŒ You need extremely high write throughput (millions of writes/second)
  • โŒ Your queries are simple key-value lookups (Redis might be better)

2. Redis - The In-Memory Speed Demon

What

Redis (Remote Dictionary Server) is an in-memory data structure store that can be used as a database, cache, or message broker. It stores data in memory for ultra-fast access, with optional persistence to disk.

Why

Redis exists to solve performance problems where speed is critical. It’s designed for:

  • Caching: Store frequently accessed data in memory
  • Session storage: Fast user session management
  • Real-time analytics: Counters, leaderboards, and rate limiting
  • Pub/Sub messaging: Real-time communication between services
  • Queue management: Simple job queues and task processing

How

Redis stores data in memory using various data structures:

  • Strings: Simple key-value pairs
  • Hashes: Field-value maps (like objects)
  • Lists: Ordered collections
  • Sets: Unordered unique collections
  • Sorted Sets: Ordered sets with scores
  • Streams: Log-like data structure for messaging
# Example: Using Redis for caching and session management
SET user:123:session "active" EX 3600  # Expires in 1 hour
GET user:123:session

# Using sorted sets for leaderboards
ZADD leaderboard 1000 "player1"
ZADD leaderboard 950 "player2"
ZRANGE leaderboard 0 -1 WITHSCORES

# Pub/Sub messaging
PUBLISH notifications "User logged in"
SUBSCRIBE notifications
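These commands compose into higher-level patterns. The distributed-lock idiom, for instance, is SET with the NX (set only if absent) and PX (expiry in milliseconds) options; here is a toy in-process sketch of that semantics, not a Redis client:

```python
import time

# Sketch of the "SET key token NX PX ttl" lock pattern, simulated with
# an in-process dict. NX: acquire only if no live holder; PX: the lock
# expires so a crashed holder cannot block others forever.
class LockStore:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.locks = {}  # resource -> (owner token, expiry time)

    def acquire(self, resource, token, ttl_ms):
        now = self.clock()
        held = self.locks.get(resource)
        if held and held[1] > now:
            return False  # NX: someone else holds a live lock
        self.locks[resource] = (token, now + ttl_ms / 1000.0)
        return True

    def release(self, resource, token):
        held = self.locks.get(resource)
        if held and held[0] == token:  # only the owner may release
            del self.locks[resource]
            return True
        return False

store = LockStore()
print(store.acquire("job:42", "worker-a", 30_000))  # True: lock taken
print(store.acquire("job:42", "worker-b", 30_000))  # False: still held
store.release("job:42", "worker-a")
print(store.acquire("job:42", "worker-b", 30_000))  # True: freed
```

The unique token per worker matters: it prevents one worker from releasing a lock another worker now holds.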

When

Use Redis when:

  • ✅ You need sub-millisecond latency for data access
  • ✅ You’re building a caching layer to reduce database load
  • ✅ You need session storage for web applications
  • ✅ You want real-time features (leaderboards, counters, rate limiting)
  • ✅ You need pub/sub messaging between services
  • ✅ You’re implementing distributed locks or coordination

Avoid Redis when:

  • โŒ You need persistent storage as primary database (use with persistence carefully)
  • โŒ Your dataset is too large for memory (consider Redis Cluster or alternatives)
  • โŒ You need complex queries or relationships
  • โŒ You require ACID transactions across multiple keys
  • โŒ Your use case is simple file storage (overkill)

3. MongoDB - The Document Store Pioneer

What

MongoDB is a NoSQL document database that stores data in flexible, JSON-like documents called BSON (Binary JSON). It’s schema-less, allowing documents in the same collection to have different structures.

Why

MongoDB was created to address limitations of relational databases:

  • Flexible schemas: Adapt to changing data requirements without migrations
  • Horizontal scaling: Shard across multiple servers easily
  • Developer-friendly: Documents map naturally to programming language objects
  • Rapid development: No need to define schemas upfront
  • Rich queries: Powerful query language with indexing support

How

MongoDB organizes data into:

  • Databases: Top-level containers
  • Collections: Groups of documents (like tables)
  • Documents: BSON objects (like rows)
  • Fields: Key-value pairs within documents
// Example: Flexible document structure
// Document 1
{
  "_id": ObjectId("..."),
  "name": "John Doe",
  "email": "john@example.com",
  "address": {
    "street": "123 Main St",
    "city": "New York"
  },
  "tags": ["developer", "golang"]
}

// Document 2 - Different structure, same collection
{
  "_id": ObjectId("..."),
  "username": "jane_smith",
  "profile": {
    "bio": "Software engineer",
    "skills": ["JavaScript", "Python"]
  },
  "created_at": ISODate("2024-01-01")
}

// Querying with MongoDB
db.users.find({ "address.city": "New York" })
db.users.createIndex({ "email": 1 })  // Index for fast lookups
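How a dotted-path query like `"address.city"` resolves against nested documents can be sketched in a few lines of Python (plain dicts standing in for BSON documents; this is an illustration of the matching semantics, not the MongoDB driver):

```python
# Resolve a dotted path like "address.city" against a nested document.
def get_path(doc, path):
    cur = doc
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None  # missing fields simply don't match
        cur = cur[part]
    return cur

# A toy find(): a document matches if every queried path equals its value.
def find(collection, query):
    return [d for d in collection
            if all(get_path(d, k) == v for k, v in query.items())]

users = [
    {"name": "John Doe", "address": {"street": "123 Main St", "city": "New York"}},
    {"name": "Jane Smith", "address": {"city": "Boston"}},
    {"username": "no_address"},  # different shape, same collection
]
matches = find(users, {"address.city": "New York"})
print([d["name"] for d in matches])  # -> ['John Doe']
```

Note how the document with no `address` field at all is silently skipped rather than causing an error, which is what makes schema-less collections practical to query.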

When

Use MongoDB when:

  • ✅ Your data is document-oriented and semi-structured
  • ✅ You need rapid schema evolution without migrations
  • ✅ You’re building content management systems or catalogs
  • ✅ You need horizontal scaling for large datasets
  • ✅ Your queries are mostly single-document operations
  • ✅ You want developer productivity with JSON-like structures

Avoid MongoDB when:

  • โŒ You need complex joins across collections (limited support)
  • โŒ You require ACID transactions across multiple documents (though multi-document transactions exist)
  • โŒ Your data has strict relational integrity requirements
  • โŒ You need complex analytical queries (PostgreSQL might be better)
  • โŒ Your team lacks MongoDB expertise (learning curve exists)

4. CouchDB - The Distributed Document Database

What

CouchDB is a document-oriented NoSQL database that uses JSON for documents, JavaScript for MapReduce queries, and HTTP for an API. It’s designed for distributed systems with built-in replication and conflict resolution.

Why

CouchDB was built with distribution and offline-first capabilities in mind:

  • Master-master replication: Any node can accept writes
  • Offline-first: Works offline and syncs when online
  • Conflict resolution: Built-in mechanisms for handling conflicts
  • RESTful API: HTTP-based interface, no special drivers needed
  • Eventual consistency: Optimized for distributed, disconnected scenarios

How

CouchDB uses:

  • Documents: JSON documents stored in databases
  • Views: MapReduce functions for querying and indexing
  • Replication: Built-in replication protocol for syncing data
  • Conflict resolution: Automatic and manual conflict handling
  • Futon/Admin UI: Web-based administration interface
// Example: MapReduce views in CouchDB
// Map function
function(doc) {
  if (doc.type === 'user') {
    emit(doc.email, doc.name);
  }
}

// Reduce function (optional) -- counts documents per key; the rereduce
// branch keeps the count correct when CouchDB combines partial results
// (the built-in _count reduce does the same thing)
function(keys, values, rereduce) {
  if (rereduce) return sum(values);
  return values.length;
}

// Querying the map rows via HTTP (reduce=false skips the reduce step)
// GET /mydb/_design/users/_view/by_email?key="john@example.com"&reduce=false
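The view mechanics described above can be sketched in Python: map emits (key, value) pairs per document, the engine groups them by key, and reduce folds each group (a toy model of the view engine, not the CouchDB API):

```python
from collections import defaultdict

# Map: emit (key, value) pairs per document; emitting 1 lets the
# reduce simply sum, which stays correct under rereduce.
def map_fn(doc):
    if doc.get("type") == "user":
        yield (doc["email"], 1)

def reduce_fn(values):
    return sum(values)

# The "view engine": run map over every document, group the emitted
# pairs by key, then reduce each group.
def run_view(docs):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {k: reduce_fn(v) for k, v in grouped.items()}

docs = [
    {"type": "user", "email": "john@example.com"},
    {"type": "user", "email": "john@example.com"},
    {"type": "order", "total": 10},  # filtered out by the map function
]
print(run_view(docs))  # -> {'john@example.com': 2}
```

In CouchDB the map output is also persisted as the view's index, which is why views answer queries quickly but reflect writes only after reindexing.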

When

Use CouchDB when:

  • ✅ You need offline-first applications (mobile apps, distributed systems)
  • ✅ You require master-master replication across data centers
  • ✅ You’re building sync-enabled applications (like note-taking apps)
  • ✅ You want a RESTful API without custom drivers
  • ✅ You need conflict resolution for distributed writes
  • ✅ You’re building CouchApps (applications served from CouchDB)

Avoid CouchDB when:

  • โŒ You need strong consistency guarantees
  • โŒ You require complex queries or aggregations
  • โŒ You need high write throughput (conflict resolution overhead)
  • โŒ Your data has strict relational requirements
  • โŒ You need real-time queries (views are eventually consistent)

5. Neo4j - The Graph Database

What

Neo4j is a native graph database that stores data in nodes (entities) and relationships (edges). It’s optimized for traversing relationships and understanding connections in data.

Why

Neo4j exists to solve problems where relationships are as important as the data itself:

  • Relationship queries: Find connections between entities efficiently
  • Graph algorithms: Path finding, centrality, community detection
  • Network analysis: Social networks, recommendation engines
  • Fraud detection: Identifying suspicious patterns and connections
  • Knowledge graphs: Representing complex domain knowledge

How

Neo4j uses:

  • Nodes: Entities with labels and properties
  • Relationships: Directed connections between nodes with types and properties
  • Cypher: Graph query language (like SQL for graphs)
  • Indexes: On node labels and relationship types
  • Traversal: Efficient graph walking algorithms
// Example: Creating nodes and relationships
// (run as a single statement so the alice/bob variables stay in scope)
CREATE (alice:Person {name: "Alice", age: 30})
CREATE (bob:Person {name: "Bob", age: 25})
CREATE (alice)-[:FRIENDS_WITH {since: 2020}]->(bob)
CREATE (alice)-[:WORKS_AT]->(company:Company {name: "Tech Corp"})

// Querying relationships
MATCH (person:Person)-[:FRIENDS_WITH]->(friend:Person)
WHERE person.name = "Alice"
RETURN friend.name

// Finding paths
MATCH path = (a:Person)-[:FRIENDS_WITH*2..4]->(b:Person)
WHERE a.name = "Alice" AND b.name = "Charlie"
RETURN path
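The variable-length match above is essentially a bounded graph traversal. A breadth-first sketch in Python (an illustration of what the traversal does, not Neo4j's internals) shows why this is cheap on a graph store but painful as self-joins in SQL:

```python
from collections import deque

# Breadth-first search over an adjacency list: return the first path
# from start to goal whose length is within max_hops, else None.
def find_path(graph, start, goal, max_hops):
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 > max_hops:
            continue  # path already longer than the hop limit
        if path[-1] == goal and len(path) > 1:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in path:  # avoid revisiting nodes (cycles)
                queue.append(path + [neighbor])
    return None

# FRIENDS_WITH edges as an adjacency list
friends = {
    "Alice": ["Bob"],
    "Bob": ["Charlie"],
    "Charlie": [],
}
print(find_path(friends, "Alice", "Charlie", max_hops=4))  # -> ['Alice', 'Bob', 'Charlie']
```

A native graph store makes each hop a pointer dereference from the current node, so cost grows with the neighborhood explored rather than with total table size.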

When

Use Neo4j when:

  • ✅ Your data is highly connected (social networks, knowledge graphs)
  • ✅ You need to traverse relationships frequently
  • ✅ You’re building recommendation engines (friend suggestions, product recommendations)
  • ✅ You need fraud detection or pattern matching
  • ✅ You’re modeling complex domain relationships (organizational charts, dependencies)
  • ✅ You need graph algorithms (shortest path, centrality, clustering)

Avoid Neo4j when:

  • โŒ Your data is mostly disconnected (simple CRUD operations)
  • โŒ You need complex aggregations or analytical queries
  • โŒ Your relationships are simple and few (relational DB might suffice)
  • โŒ You need high write throughput for simple operations
  • โŒ Your team lacks graph database expertise (learning curve)

6. HBase - The Columnar Big Data Store

What

HBase is a distributed, scalable, big data store modeled after Google’s Bigtable. It’s built on top of Hadoop and provides random, real-time read/write access to large datasets.

Why

HBase was created to handle massive scale data that doesn’t fit traditional databases:

  • Horizontal scaling: Add nodes to handle more data
  • Columnar storage: Efficient for sparse data and wide tables
  • Time-series data: Optimized for timestamped data
  • Big data analytics: Integrates with Hadoop ecosystem
  • High write throughput: Handles millions of writes per second

How

HBase organizes data in:

  • Tables: Collections of rows
  • Row keys: Unique identifiers (like primary keys)
  • Column families: Groups of columns stored together
  • Columns: Key-value pairs within column families
  • Versions: Multiple versions of cell values (timestamped)
  • Regions: Tables split into regions distributed across nodes
// Example: HBase operations
// Creating a table
HTableDescriptor table = new HTableDescriptor(TableName.valueOf("users"));
table.addFamily(new HColumnDescriptor("info"));
table.addFamily(new HColumnDescriptor("contact"));
admin.createTable(table);

// Writing data
Put put = new Put(Bytes.toBytes("user123"));
put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("John"));
put.addColumn(Bytes.toBytes("contact"), Bytes.toBytes("email"), Bytes.toBytes("john@example.com"));
table.put(put);

// Reading data
Get get = new Get(Bytes.toBytes("user123"));
Result result = table.get(get);
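The data layout those API calls operate on (row key → column family → qualifier → timestamped versions) can be sketched as nested maps. This is a toy model in Python to make the layout concrete, not HBase's storage engine:

```python
import itertools

# Toy model of HBase's layout: a row key maps to column families, each
# family maps qualifiers to a list of timestamped versions of a cell.
class ToyHBase:
    def __init__(self):
        self.rows = {}
        self._ts = itertools.count(1)  # stand-in for wall-clock timestamps

    def put(self, row_key, family, qualifier, value):
        fam = self.rows.setdefault(row_key, {}).setdefault(family, {})
        fam.setdefault(qualifier, []).append((next(self._ts), value))

    def get(self, row_key, family, qualifier):
        versions = self.rows.get(row_key, {}).get(family, {}).get(qualifier, [])
        return versions[-1][1] if versions else None  # newest version wins

store = ToyHBase()
store.put("user123", "info", "name", "John")
store.put("user123", "contact", "email", "john@example.com")
store.put("user123", "info", "name", "Johnny")  # a newer version of the cell
print(store.get("user123", "info", "name"))      # -> Johnny
print(store.get("user123", "contact", "email"))  # -> john@example.com
```

Because absent qualifiers simply don't exist in the map, a row with millions of possible columns but few populated ones costs almost nothing, which is why the layout suits sparse data.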

When

Use HBase when:

  • ✅ You have massive datasets (billions of rows, petabytes of data)
  • ✅ You need horizontal scaling across hundreds of nodes
  • ✅ You’re storing time-series data (IoT sensors, logs, metrics)
  • ✅ Your data is sparse (many columns, few populated per row)
  • ✅ You need high write throughput (millions of writes/second)
  • ✅ You’re building big data analytics with the Hadoop ecosystem

Avoid HBase when:

  • โŒ Your dataset is small to medium (PostgreSQL or MongoDB might be better)
  • โŒ You need ACID transactions across multiple rows
  • โŒ You require complex queries or joins
  • โŒ You need low-latency random reads (Cassandra might be better)
  • โŒ Your team lacks Hadoop/HBase expertise (complex setup and operations)

7. Riak - The Distributed Key-Value Store

What

Riak is a distributed NoSQL key-value database designed for high availability, fault tolerance, and operational simplicity. It’s inspired by Amazon’s Dynamo paper and focuses on eventual consistency.

Why

Riak was built to provide:

  • High availability: System continues operating even with node failures
  • Fault tolerance: Automatic data replication and recovery
  • Operational simplicity: Easy to operate and scale
  • No single point of failure: Distributed architecture
  • Flexible consistency: Tunable consistency levels per operation

How

Riak uses:

  • Buckets: Namespaces for organizing keys
  • Keys: Unique identifiers for values
  • Values: Arbitrary data (JSON, binary, text)
  • Vector clocks: For conflict resolution in distributed systems
  • Consistent hashing: For data distribution across nodes
  • Riak Search: Full-text search capabilities
# Example: Riak operations via HTTP
# Storing data
curl -X PUT http://localhost:8098/buckets/users/keys/user123 \
  -H "Content-Type: application/json" \
  -d '{"name": "John Doe", "email": "john@example.com"}'

# Retrieving data
curl http://localhost:8098/buckets/users/keys/user123

# Using links (relationships)
curl -X PUT http://localhost:8098/buckets/users/keys/user123 \
  -H "Link: </buckets/orders/keys/order456>; riaktag=\"has_order\"" \
  -d '{"name": "John Doe"}'
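Consistent hashing, mentioned above, is what lets Riak add or remove nodes without reshuffling most keys. A minimal sketch (illustrative only; real Riak uses a partitioned ring with virtual nodes and replicas):

```python
import bisect
import hashlib

# Hash a string onto a large integer ring.
def ring_hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Each node owns the arc of the ring ending at its hash.
        self.ring = sorted((ring_hash(n), n) for n in nodes)

    def node_for(self, key):
        h = ring_hash(key)
        # First node clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["riak1", "riak2", "riak3"])
print(ring.node_for("user123"))  # same node every time for this key
```

The payoff: adding a fourth node only takes over one arc of the ring, so only the keys on that arc move, instead of nearly all keys as with naive `hash(key) % N` placement.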

When

Use Riak when:

  • ✅ You need high availability and fault tolerance
  • ✅ You’re building distributed systems across multiple data centers
  • ✅ You need simple key-value operations at scale
  • ✅ You want operational simplicity (easy to add/remove nodes)
  • ✅ You can tolerate eventual consistency
  • ✅ You’re storing session data or user preferences

Avoid Riak when:

  • โŒ You need strong consistency guarantees
  • โŒ You require complex queries or relationships
  • โŒ You need ACID transactions
  • โŒ Your use case is simple caching (Redis is faster)
  • โŒ You need real-time analytics or aggregations
  • โŒ Your data requires strict schema validation

Choosing the Right Database

The key to selecting the right database is understanding your requirements:

Decision Matrix

  Requirement               Best Choice    Alternatives
  ------------------------  -------------  -------------------------
  ACID transactions         PostgreSQL     MongoDB (limited)
  Sub-millisecond latency   Redis          Memcached
  Flexible schemas          MongoDB        CouchDB
  Offline-first             CouchDB        PouchDB (browser)
  Graph relationships       Neo4j          ArangoDB, Amazon Neptune
  Massive scale (PB)        HBase          Cassandra, Bigtable
  High availability         Riak           Cassandra, DynamoDB
  Complex SQL queries       PostgreSQL     MySQL, SQL Server

Hybrid Approaches

Many modern applications use multiple databases for different purposes:

  • PostgreSQL for primary transactional data
  • Redis for caching and session storage
  • MongoDB for content management
  • Neo4j for relationship analysis
  • Elasticsearch for full-text search

This polyglot persistence approach allows you to use the right tool for each job.
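The most common pairing in this list, a primary database behind a cache, follows the cache-aside pattern. A sketch with plain dicts standing in for Redis (cache) and PostgreSQL (system of record), to show the control flow rather than real client code:

```python
# Cache-aside: read from the cache first; on a miss, read the database
# and populate the cache so subsequent reads are served from memory.
class CachedUserStore:
    def __init__(self, database, cache):
        self.database = database  # stand-in for PostgreSQL
        self.cache = cache        # stand-in for Redis
        self.db_reads = 0         # counts how often we hit the database

    def get_user(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]      # cache hit
        self.db_reads += 1
        user = self.database.get(user_id)   # cache miss: go to the DB
        if user is not None:
            self.cache[user_id] = user      # fill the cache for next time
        return user

db = {1: {"name": "Alice"}}
store = CachedUserStore(db, cache={})
store.get_user(1)      # miss: hits the database, fills the cache
store.get_user(1)      # hit: served from the cache
print(store.db_reads)  # -> 1
```

In a real deployment the cache entry would also carry a TTL and be invalidated on writes; the trade-off is a window of staleness in exchange for the primary database absorbing far fewer reads.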

Key Takeaways

  1. No one-size-fits-all: Each database is optimized for specific use cases
  2. Understand your requirements: Data structure, scale, consistency needs
  3. Consider trade-offs: Consistency vs. availability, performance vs. features
  4. Start simple: PostgreSQL or MongoDB can handle most applications initially
  5. Scale when needed: Don’t over-engineer; add specialized databases as requirements grow
  6. Learn the paradigms: Understanding different data models makes you a better architect

Conclusion

The seven databases we’ve explored represent different approaches to data storage, each with unique strengths. PostgreSQL excels at relational data, Redis at speed, MongoDB at flexibility, CouchDB at distribution, Neo4j at relationships, HBase at scale, and Riak at availability.

The best database choice depends on your specific requirements: data structure, scale, consistency needs, and team expertise. Often, the best solution is a combination of databases, each handling what it does best.

As you build systems, remember: the database is a tool, not a constraint. Choose wisely, but don’t be afraid to evolve your architecture as your needs change.


Further Reading

Book Reference: “Seven Databases in Seven Weeks” by Eric Redmond and Jim R. Wilson