Seven Databases in Seven Weeks - A Guide to Modern Database Systems

A comprehensive exploration of seven different database systems: PostgreSQL, Redis, MongoDB, CouchDB, Neo4j, HBase, and Riak. Understanding What, Why, How, and When to use each database for your projects.

In the modern software development landscape, choosing the right database is crucial for building scalable, maintainable systems. “Seven Databases in Seven Weeks” by Eric Redmond and Jim R. Wilson provides an excellent journey through different database paradigms, each optimized for specific use cases.

This guide explores seven diverse database systems, examining What they are, Why they exist, How they work, and When to use them. Understanding these databases will help you make informed decisions for your next project.

1. PostgreSQL - The Relational Powerhouse

What

PostgreSQL is a powerful, open-source relational database management system (RDBMS) that has been evolving for over 30 years. It’s known for its SQL compliance, ACID transactions, and extensive feature set including advanced data types, full-text search, and JSON support.

Why

PostgreSQL exists to provide a robust, feature-rich relational database that balances performance, reliability, and standards compliance. It’s designed for applications that require:

  • Data integrity: ACID transactions ensure consistency
  • Complex queries: Powerful SQL with joins, subqueries, and window functions
  • Extensibility: Custom functions, operators, and data types
  • Standards compliance: SQL standard adherence for portability

How

PostgreSQL uses a traditional relational model where data is organized into tables with rows and columns. It employs:

  • MVCC (Multi-Version Concurrency Control): Allows concurrent reads and writes without locking
  • Write-Ahead Logging (WAL): Ensures durability and enables point-in-time recovery
  • Query planner: Optimizes SQL queries for performance
  • Indexes: B-tree, hash, GIN, GiST, and BRIN indexes for fast lookups
-- Example: Creating a table with JSON support
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255) UNIQUE,
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Complex query with window functions
SELECT 
    name,
    email,
    ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) as row_num
FROM users;
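The MVCC mechanism mentioned above can be illustrated with a toy model (a sketch in Python, not how PostgreSQL is implemented): each write appends a new row version tagged with the writing transaction's id, and a reader sees only the newest version at or before its snapshot.

```python
# Toy illustration of MVCC: writes append versions tagged with a
# transaction id (txid); a reader at a given snapshot txid sees only
# the newest version created at or before that snapshot.
class MVCCTable:
    def __init__(self):
        self.versions = {}  # key -> list of (txid, value), append-only

    def write(self, txid, key, value):
        self.versions.setdefault(key, []).append((txid, value))

    def read(self, snapshot_txid, key):
        visible = [v for t, v in self.versions.get(key, []) if t <= snapshot_txid]
        return visible[-1] if visible else None

table = MVCCTable()
table.write(1, "balance", 100)
table.write(3, "balance", 80)   # a later transaction updates the row

# A reader with snapshot txid 2 still sees the old version,
# so readers never block writers:
print(table.read(2, "balance"))  # -> 100
print(table.read(3, "balance"))  # -> 80
```

This is why concurrent reads and writes don't lock each other: old readers keep their consistent snapshot while new versions accumulate.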

When

Use PostgreSQL when:

  • ✅ You need ACID transactions for financial or critical data
  • ✅ Your data has clear relationships (users → orders → items)
  • ✅ You require complex queries with joins and aggregations
  • ✅ You need strong consistency guarantees
  • ✅ You want mature tooling and extensive community support
  • ✅ You’re building traditional web applications with structured data

Avoid PostgreSQL when:

  • โŒ You need horizontal scaling across many nodes (consider sharding or alternatives)
  • โŒ Your data is highly unstructured and changes frequently
  • โŒ You need extremely high write throughput (millions of writes/second)
  • โŒ Your queries are simple key-value lookups (Redis might be better)

2. Redis - The In-Memory Speed Demon

What

Redis (Remote Dictionary Server) is an in-memory data structure store that can be used as a database, cache, or message broker. It stores data in memory for ultra-fast access, with optional persistence to disk.

Why

Redis exists to solve performance problems where speed is critical. It’s designed for:

  • Caching: Store frequently accessed data in memory
  • Session storage: Fast user session management
  • Real-time analytics: Counters, leaderboards, and rate limiting
  • Pub/Sub messaging: Real-time communication between services
  • Queue management: Simple job queues and task processing

How

Redis stores data in memory using various data structures:

  • Strings: Simple key-value pairs
  • Hashes: Field-value maps (like objects)
  • Lists: Ordered collections
  • Sets: Unordered unique collections
  • Sorted Sets: Ordered sets with scores
  • Streams: Log-like data structure for messaging
# Example: Using Redis for caching and session management
SET user:123:session "active" EX 3600  # Expires in 1 hour
GET user:123:session

# Using sorted sets for leaderboards
ZADD leaderboard 1000 "player1"
ZADD leaderboard 950 "player2"
ZRANGE leaderboard 0 -1 WITHSCORES

# Pub/Sub messaging
PUBLISH notifications "User logged in"
SUBSCRIBE notifications
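These commands compose into higher-level patterns. The distributed-lock idiom, for instance, is SET with the NX (set only if absent) and PX (expiry in milliseconds) options; here is a toy in-process sketch of that semantics, not a Redis client:

```python
import time

# Sketch of the "SET key token NX PX ttl" lock pattern, simulated with
# an in-process dict. NX: acquire only if no live holder; PX: the lock
# expires so a crashed holder cannot block others forever.
class LockStore:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.locks = {}  # resource -> (owner token, expiry time)

    def acquire(self, resource, token, ttl_ms):
        now = self.clock()
        held = self.locks.get(resource)
        if held and held[1] > now:
            return False  # NX: someone else holds a live lock
        self.locks[resource] = (token, now + ttl_ms / 1000.0)
        return True

    def release(self, resource, token):
        held = self.locks.get(resource)
        if held and held[0] == token:  # only the owner may release
            del self.locks[resource]
            return True
        return False

store = LockStore()
print(store.acquire("job:42", "worker-a", 30_000))  # True: lock taken
print(store.acquire("job:42", "worker-b", 30_000))  # False: still held
store.release("job:42", "worker-a")
print(store.acquire("job:42", "worker-b", 30_000))  # True: freed
```

The unique token per worker matters: it prevents one worker from releasing a lock another worker now holds.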

When

Use Redis when:

  • ✅ You need sub-millisecond latency for data access
  • ✅ You’re building a caching layer to reduce database load
  • ✅ You need session storage for web applications
  • ✅ You want real-time features (leaderboards, counters, rate limiting)
  • ✅ You need pub/sub messaging between services
  • ✅ You’re implementing distributed locks or coordination

Avoid Redis when:

  • โŒ You need persistent storage as primary database (use with persistence carefully)
  • โŒ Your dataset is too large for memory (consider Redis Cluster or alternatives)
  • โŒ You need complex queries or relationships
  • โŒ You require ACID transactions across multiple keys
  • โŒ Your use case is simple file storage (overkill)

3. MongoDB - The Document Store Pioneer

What

MongoDB is a NoSQL document database that stores data in flexible, JSON-like documents called BSON (Binary JSON). It’s schema-less, allowing documents in the same collection to have different structures.

Why

MongoDB was created to address limitations of relational databases:

  • Flexible schemas: Adapt to changing data requirements without migrations
  • Horizontal scaling: Shard across multiple servers easily
  • Developer-friendly: Documents map naturally to programming language objects
  • Rapid development: No need to define schemas upfront
  • Rich queries: Powerful query language with indexing support

How

MongoDB organizes data into:

  • Databases: Top-level containers
  • Collections: Groups of documents (like tables)
  • Documents: BSON objects (like rows)
  • Fields: Key-value pairs within documents
// Example: Flexible document structure
// Document 1
{
  "_id": ObjectId("..."),
  "name": "John Doe",
  "email": "john@example.com",
  "address": {
    "street": "123 Main St",
    "city": "New York"
  },
  "tags": ["developer", "golang"]
}

// Document 2 - Different structure, same collection
{
  "_id": ObjectId("..."),
  "username": "jane_smith",
  "profile": {
    "bio": "Software engineer",
    "skills": ["JavaScript", "Python"]
  },
  "created_at": ISODate("2024-01-01")
}

// Querying with MongoDB
db.users.find({ "address.city": "New York" })
db.users.createIndex({ "email": 1 })  // Index for fast lookups
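How a dotted-path query like `"address.city"` resolves against nested documents can be sketched in a few lines of Python (plain dicts standing in for BSON documents; this is an illustration of the matching semantics, not the MongoDB driver):

```python
# Resolve a dotted path like "address.city" against a nested document.
def get_path(doc, path):
    cur = doc
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None  # missing fields simply don't match
        cur = cur[part]
    return cur

# A toy find(): a document matches if every queried path equals its value.
def find(collection, query):
    return [d for d in collection
            if all(get_path(d, k) == v for k, v in query.items())]

users = [
    {"name": "John Doe", "address": {"street": "123 Main St", "city": "New York"}},
    {"name": "Jane Smith", "address": {"city": "Boston"}},
    {"username": "no_address"},  # different shape, same collection
]
matches = find(users, {"address.city": "New York"})
print([d["name"] for d in matches])  # -> ['John Doe']
```

Note how the document with no `address` field at all is silently skipped rather than causing an error, which is what makes schema-less collections practical to query.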

When

Use MongoDB when:

  • ✅ Your data is document-oriented and semi-structured
  • ✅ You need rapid schema evolution without migrations
  • ✅ You’re building content management systems or catalogs
  • ✅ You need horizontal scaling for large datasets
  • ✅ Your queries are mostly single-document operations
  • ✅ You want developer productivity with JSON-like structures

Avoid MongoDB when:

  • โŒ You need complex joins across collections (limited support)
  • โŒ You require ACID transactions across multiple documents (though multi-document transactions exist)
  • โŒ Your data has strict relational integrity requirements
  • โŒ You need complex analytical queries (PostgreSQL might be better)
  • โŒ Your team lacks MongoDB expertise (learning curve exists)

4. CouchDB - The Distributed Document Database

What

CouchDB is a document-oriented NoSQL database that uses JSON for documents, JavaScript for MapReduce queries, and HTTP for an API. It’s designed for distributed systems with built-in replication and conflict resolution.

Why

CouchDB was built with distribution and offline-first capabilities in mind:

  • Master-master replication: Any node can accept writes
  • Offline-first: Works offline and syncs when online
  • Conflict resolution: Built-in mechanisms for handling conflicts
  • RESTful API: HTTP-based interface, no special drivers needed
  • Eventual consistency: Optimized for distributed, disconnected scenarios

How

CouchDB uses:

  • Documents: JSON documents stored in databases
  • Views: MapReduce functions for querying and indexing
  • Replication: Built-in replication protocol for syncing data
  • Conflict resolution: Automatic and manual conflict handling
  • Futon/Admin UI: Web-based administration interface
// Example: MapReduce views in CouchDB
// Map function
function(doc) {
  if (doc.type === 'user') {
    emit(doc.email, doc.name);
  }
}

// Reduce function (optional) -- counts documents per key; the rereduce
// branch keeps the count correct when CouchDB combines partial results
// (the built-in _count reduce does the same thing)
function(keys, values, rereduce) {
  if (rereduce) return sum(values);
  return values.length;
}

// Querying the map rows via HTTP (reduce=false skips the reduce step)
// GET /mydb/_design/users/_view/by_email?key="john@example.com"&reduce=false
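The view mechanics described above can be sketched in Python: map emits (key, value) pairs per document, the engine groups them by key, and reduce folds each group (a toy model of the view engine, not the CouchDB API):

```python
from collections import defaultdict

# Map: emit (key, value) pairs per document; emitting 1 lets the
# reduce simply sum, which stays correct under rereduce.
def map_fn(doc):
    if doc.get("type") == "user":
        yield (doc["email"], 1)

def reduce_fn(values):
    return sum(values)

# The "view engine": run map over every document, group the emitted
# pairs by key, then reduce each group.
def run_view(docs):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {k: reduce_fn(v) for k, v in grouped.items()}

docs = [
    {"type": "user", "email": "john@example.com"},
    {"type": "user", "email": "john@example.com"},
    {"type": "order", "total": 10},  # filtered out by the map function
]
print(run_view(docs))  # -> {'john@example.com': 2}
```

In CouchDB the map output is also persisted as the view's index, which is why views answer queries quickly but reflect writes only after reindexing.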

When

Use CouchDB when:

  • ✅ You need offline-first applications (mobile apps, distributed systems)
  • ✅ You require master-master replication across data centers
  • ✅ You’re building sync-enabled applications (like note-taking apps)
  • ✅ You want a RESTful API without custom drivers
  • ✅ You need conflict resolution for distributed writes
  • ✅ You’re building CouchApps (applications served from CouchDB)

Avoid CouchDB when:

  • โŒ You need strong consistency guarantees
  • โŒ You require complex queries or aggregations
  • โŒ You need high write throughput (conflict resolution overhead)
  • โŒ Your data has strict relational requirements
  • โŒ You need real-time queries (views are eventually consistent)

5. Neo4j - The Graph Database

What

Neo4j is a native graph database that stores data in nodes (entities) and relationships (edges). It’s optimized for traversing relationships and understanding connections in data.

Why

Neo4j exists to solve problems where relationships are as important as the data itself:

  • Relationship queries: Find connections between entities efficiently
  • Graph algorithms: Path finding, centrality, community detection
  • Network analysis: Social networks, recommendation engines
  • Fraud detection: Identifying suspicious patterns and connections
  • Knowledge graphs: Representing complex domain knowledge

How

Neo4j uses:

  • Nodes: Entities with labels and properties
  • Relationships: Directed connections between nodes with types and properties
  • Cypher: Graph query language (like SQL for graphs)
  • Indexes: On node labels and relationship types
  • Traversal: Efficient graph walking algorithms
// Example: Creating nodes and relationships
// (run as a single statement so the alice/bob variables stay in scope)
CREATE (alice:Person {name: "Alice", age: 30})
CREATE (bob:Person {name: "Bob", age: 25})
CREATE (alice)-[:FRIENDS_WITH {since: 2020}]->(bob)
CREATE (alice)-[:WORKS_AT]->(company:Company {name: "Tech Corp"})

// Querying relationships
MATCH (person:Person)-[:FRIENDS_WITH]->(friend:Person)
WHERE person.name = "Alice"
RETURN friend.name

// Finding paths
MATCH path = (a:Person)-[:FRIENDS_WITH*2..4]->(b:Person)
WHERE a.name = "Alice" AND b.name = "Charlie"
RETURN path
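The variable-length match above is essentially a bounded graph traversal. A breadth-first sketch in Python (an illustration of what the traversal does, not Neo4j's internals) shows why this is cheap on a graph store but painful as self-joins in SQL:

```python
from collections import deque

# Breadth-first search over an adjacency list: return the first path
# from start to goal whose length is within max_hops, else None.
def find_path(graph, start, goal, max_hops):
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 > max_hops:
            continue  # path already longer than the hop limit
        if path[-1] == goal and len(path) > 1:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in path:  # avoid revisiting nodes (cycles)
                queue.append(path + [neighbor])
    return None

# FRIENDS_WITH edges as an adjacency list
friends = {
    "Alice": ["Bob"],
    "Bob": ["Charlie"],
    "Charlie": [],
}
print(find_path(friends, "Alice", "Charlie", max_hops=4))  # -> ['Alice', 'Bob', 'Charlie']
```

A native graph store makes each hop a pointer dereference from the current node, so cost grows with the neighborhood explored rather than with total table size.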

When

Use Neo4j when:

  • ✅ Your data is highly connected (social networks, knowledge graphs)
  • ✅ You need to traverse relationships frequently
  • ✅ You’re building recommendation engines (friend suggestions, product recommendations)
  • ✅ You need fraud detection or pattern matching
  • ✅ You’re modeling complex domain relationships (organizational charts, dependencies)
  • ✅ You need graph algorithms (shortest path, centrality, clustering)

Avoid Neo4j when:

  • โŒ Your data is mostly disconnected (simple CRUD operations)
  • โŒ You need complex aggregations or analytical queries
  • โŒ Your relationships are simple and few (relational DB might suffice)
  • โŒ You need high write throughput for simple operations
  • โŒ Your team lacks graph database expertise (learning curve)

6. HBase - The Columnar Big Data Store

What

HBase is a distributed, scalable, big data store modeled after Google’s Bigtable. It’s built on top of Hadoop and provides random, real-time read/write access to large datasets.

Why

HBase was created to handle massive scale data that doesn’t fit traditional databases:

  • Horizontal scaling: Add nodes to handle more data
  • Columnar storage: Efficient for sparse data and wide tables
  • Time-series data: Optimized for timestamped data
  • Big data analytics: Integrates with Hadoop ecosystem
  • High write throughput: Handles millions of writes per second

How

HBase organizes data in:

  • Tables: Collections of rows
  • Row keys: Unique identifiers (like primary keys)
  • Column families: Groups of columns stored together
  • Columns: Key-value pairs within column families
  • Versions: Multiple versions of cell values (timestamped)
  • Regions: Tables split into regions distributed across nodes
// Example: HBase operations
// Creating a table
HTableDescriptor table = new HTableDescriptor(TableName.valueOf("users"));
table.addFamily(new HColumnDescriptor("info"));
table.addFamily(new HColumnDescriptor("contact"));
admin.createTable(table);

// Writing data
Put put = new Put(Bytes.toBytes("user123"));
put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("John"));
put.addColumn(Bytes.toBytes("contact"), Bytes.toBytes("email"), Bytes.toBytes("john@example.com"));
table.put(put);

// Reading data
Get get = new Get(Bytes.toBytes("user123"));
Result result = table.get(get);
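The data layout those API calls operate on (row key → column family → qualifier → timestamped versions) can be sketched as nested maps. This is a toy model in Python to make the layout concrete, not HBase's storage engine:

```python
import itertools

# Toy model of HBase's layout: a row key maps to column families, each
# family maps qualifiers to a list of timestamped versions of a cell.
class ToyHBase:
    def __init__(self):
        self.rows = {}
        self._ts = itertools.count(1)  # stand-in for wall-clock timestamps

    def put(self, row_key, family, qualifier, value):
        fam = self.rows.setdefault(row_key, {}).setdefault(family, {})
        fam.setdefault(qualifier, []).append((next(self._ts), value))

    def get(self, row_key, family, qualifier):
        versions = self.rows.get(row_key, {}).get(family, {}).get(qualifier, [])
        return versions[-1][1] if versions else None  # newest version wins

store = ToyHBase()
store.put("user123", "info", "name", "John")
store.put("user123", "contact", "email", "john@example.com")
store.put("user123", "info", "name", "Johnny")  # a newer version of the cell
print(store.get("user123", "info", "name"))      # -> Johnny
print(store.get("user123", "contact", "email"))  # -> john@example.com
```

Because absent qualifiers simply don't exist in the map, a row with millions of possible columns but few populated ones costs almost nothing, which is why the layout suits sparse data.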

When

Use HBase when:

  • ✅ You have massive datasets (billions of rows, petabytes of data)
  • ✅ You need horizontal scaling across hundreds of nodes
  • ✅ You’re storing time-series data (IoT sensors, logs, metrics)
  • ✅ Your data is sparse (many columns, few populated per row)
  • ✅ You need high write throughput (millions of writes/second)
  • ✅ You’re building big data analytics with the Hadoop ecosystem

Avoid HBase when:

  • โŒ Your dataset is small to medium (PostgreSQL or MongoDB might be better)
  • โŒ You need ACID transactions across multiple rows
  • โŒ You require complex queries or joins
  • โŒ You need low-latency random reads (Cassandra might be better)
  • โŒ Your team lacks Hadoop/HBase expertise (complex setup and operations)

7. Riak - The Distributed Key-Value Store

What

Riak is a distributed NoSQL key-value database designed for high availability, fault tolerance, and operational simplicity. It’s inspired by Amazon’s Dynamo paper and focuses on eventual consistency.

Why

Riak was built to provide:

  • High availability: System continues operating even with node failures
  • Fault tolerance: Automatic data replication and recovery
  • Operational simplicity: Easy to operate and scale
  • No single point of failure: Distributed architecture
  • Flexible consistency: Tunable consistency levels per operation

How

Riak uses:

  • Buckets: Namespaces for organizing keys
  • Keys: Unique identifiers for values
  • Values: Arbitrary data (JSON, binary, text)
  • Vector clocks: For conflict resolution in distributed systems
  • Consistent hashing: For data distribution across nodes
  • Riak Search: Full-text search capabilities
# Example: Riak operations via HTTP
# Storing data
curl -X PUT http://localhost:8098/buckets/users/keys/user123 \
  -H "Content-Type: application/json" \
  -d '{"name": "John Doe", "email": "john@example.com"}'

# Retrieving data
curl http://localhost:8098/buckets/users/keys/user123

# Using links (relationships)
curl -X PUT http://localhost:8098/buckets/users/keys/user123 \
  -H "Link: </buckets/orders/keys/order456>; riaktag=\"has_order\"" \
  -d '{"name": "John Doe"}'
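Consistent hashing, mentioned above, is what lets Riak add or remove nodes without reshuffling most keys. A minimal sketch (illustrative only; real Riak uses a partitioned ring with virtual nodes and replicas):

```python
import bisect
import hashlib

# Hash a string onto a large integer ring.
def ring_hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Each node owns the arc of the ring ending at its hash.
        self.ring = sorted((ring_hash(n), n) for n in nodes)

    def node_for(self, key):
        h = ring_hash(key)
        # First node clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["riak1", "riak2", "riak3"])
print(ring.node_for("user123"))  # same node every time for this key
```

The payoff: adding a fourth node only takes over one arc of the ring, so only the keys on that arc move, instead of nearly all keys as with naive `hash(key) % N` placement.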

When

Use Riak when:

  • ✅ You need high availability and fault tolerance
  • ✅ You’re building distributed systems across multiple data centers
  • ✅ You need simple key-value operations at scale
  • ✅ You want operational simplicity (easy to add/remove nodes)
  • ✅ You can tolerate eventual consistency
  • ✅ You’re storing session data or user preferences

Avoid Riak when:

  • โŒ You need strong consistency guarantees
  • โŒ You require complex queries or relationships
  • โŒ You need ACID transactions
  • โŒ Your use case is simple caching (Redis is faster)
  • โŒ You need real-time analytics or aggregations
  • โŒ Your data requires strict schema validation

Choosing the Right Database

The key to selecting the right database is understanding your requirements:

Decision Matrix

  Requirement               Best Choice    Alternatives
  ------------------------  -------------  -------------------------
  ACID transactions         PostgreSQL     MongoDB (limited)
  Sub-millisecond latency   Redis          Memcached
  Flexible schemas          MongoDB        CouchDB
  Offline-first             CouchDB        PouchDB (browser)
  Graph relationships       Neo4j          ArangoDB, Amazon Neptune
  Massive scale (PB)        HBase          Cassandra, Bigtable
  High availability         Riak           Cassandra, DynamoDB
  Complex SQL queries       PostgreSQL     MySQL, SQL Server

Hybrid Approaches

Many modern applications use multiple databases for different purposes:

  • PostgreSQL for primary transactional data
  • Redis for caching and session storage
  • MongoDB for content management
  • Neo4j for relationship analysis
  • Elasticsearch for full-text search

This polyglot persistence approach allows you to use the right tool for each job.
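The most common pairing in this list, a primary database behind a cache, follows the cache-aside pattern. A sketch with plain dicts standing in for Redis (cache) and PostgreSQL (system of record), to show the control flow rather than real client code:

```python
# Cache-aside: read from the cache first; on a miss, read the database
# and populate the cache so subsequent reads are served from memory.
class CachedUserStore:
    def __init__(self, database, cache):
        self.database = database  # stand-in for PostgreSQL
        self.cache = cache        # stand-in for Redis
        self.db_reads = 0         # counts how often we hit the database

    def get_user(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]      # cache hit
        self.db_reads += 1
        user = self.database.get(user_id)   # cache miss: go to the DB
        if user is not None:
            self.cache[user_id] = user      # fill the cache for next time
        return user

db = {1: {"name": "Alice"}}
store = CachedUserStore(db, cache={})
store.get_user(1)      # miss: hits the database, fills the cache
store.get_user(1)      # hit: served from the cache
print(store.db_reads)  # -> 1
```

In a real deployment the cache entry would also carry a TTL and be invalidated on writes; the trade-off is a window of staleness in exchange for the primary database absorbing far fewer reads.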

Key Takeaways

  1. No one-size-fits-all: Each database is optimized for specific use cases
  2. Understand your requirements: Data structure, scale, consistency needs
  3. Consider trade-offs: Consistency vs. availability, performance vs. features
  4. Start simple: PostgreSQL or MongoDB can handle most applications initially
  5. Scale when needed: Don’t over-engineer; add specialized databases as requirements grow
  6. Learn the paradigms: Understanding different data models makes you a better architect

Conclusion

The seven databases we’ve explored represent different approaches to data storage, each with unique strengths. PostgreSQL excels at relational data, Redis at speed, MongoDB at flexibility, CouchDB at distribution, Neo4j at relationships, HBase at scale, and Riak at availability.

The best database choice depends on your specific requirements: data structure, scale, consistency needs, and team expertise. Often, the best solution is a combination of databases, each handling what it does best.

As you build systems, remember: the database is a tool, not a constraint. Choose wisely, but don’t be afraid to evolve your architecture as your needs change.


Further Reading

Book Reference: “Seven Databases in Seven Weeks” by Eric Redmond and Jim R. Wilson