Consuming CDC Events: Patterns for Building Read Models
Event handling, idempotency, and recovery strategies for CDC consumers
From Infrastructure to Implementation
In the previous article, we covered the infrastructure side of MongoDB CDC with Debezium - the oplog retention problem, why Atlas shared tiers fail, and how to configure a production-ready connector. Now let's focus on what happens after events land in Kafka: how to consume them reliably and build read-optimized projections.
The goal is simple: react to changes in your authoritative data store and maintain derived views for specific query patterns. But "simple" doesn't mean "easy" - you'll need to handle duplicates, failures, and recovery scenarios.
Event Handling Pattern
Once events flow through Kafka, your consumers need to handle them reliably. The core pattern is straightforward:
Algorithm 1 ProcessCDCEvent
Require: e: Debezium CDC event from Kafka
Ensure: Updated search index state
1: op ← e.op
2: doc ← e.after
3: if op ∈ {c, u} then
4:     key ← doc.id
5:     Upsert(key, doc)
6: else if op = d then
7:     Delete(e.key)
8: end if
9: Commit(offset)
10: return
Key Patterns
- Operation filtering: Debezium events include an op field - c (create), u (update), d (delete) - and the handler routes on it, as in the sketch below
- Offset handling: commit the offset only after successful processing; on failure, send the event to a dead letter queue
- Idempotency: use upsert operations - handlers must tolerate duplicate events (Kafka at-least-once delivery)
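To make the routing concrete, here is a minimal consumer-loop sketch in Python. It assumes confluent-kafka and redis-py; the bootstrap server, topic name, consumer group, and key prefix are illustrative, and the Debezium envelope parsing is simplified (the exact layout depends on your converter and capture-mode settings).

```python
# Minimal consumer-loop sketch. Assumes confluent-kafka and redis-py; topic,
# group id, and key prefix are illustrative. Debezium envelope parsing is
# simplified and depends on your converter configuration.
import json

import redis
from confluent_kafka import Consumer

r = redis.Redis(decode_responses=True)
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "search-indexer",
    "enable.auto.commit": False,   # commit manually, only after successful processing
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["mongo.appdb.users"])   # hypothetical Debezium topic


def handle(event: dict, doc_id: str) -> None:
    op = event["op"]                      # c = create, u = update, d = delete, r = snapshot read
    if op in ("c", "u", "r"):
        doc = json.loads(event["after"])  # full document, serialized as JSON by the connector
        r.hset(f"user:{doc_id}", mapping={k: str(v) for k, v in doc.items()})
    elif op == "d":
        r.delete(f"user:{doc_id}")        # idempotent: deleting a missing key is a no-op


while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    try:
        event = json.loads(msg.value())       # simplified: assumes schemas are disabled
        doc_id = json.loads(msg.key())["id"]  # Debezium keys MongoDB records by document id
        handle(event, doc_id)
        consumer.commit(message=msg, asynchronous=False)   # at-least-once semantics
    except Exception:
        # A production handler would forward the record to a dead letter topic
        # (see the failure-modes section) and then commit, so the partition
        # does not stall on a poison message.
        continue
```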
Building Read Models
The search service consumes CDC events to maintain a read-optimized index. Here's the pattern using Redis with RediSearch:
Algorithm 2 BuildSearchIndex
Require: e: CDC event with user data
1: key ← "user:" + e.id
2: fields ← Flatten(e.after)
3: HSET(key, fields) // RediSearch auto-indexes
4: if e.op = d then
5:     DEL(key)
6: end if
The key insight: RediSearch watches hash keys with a matching prefix and automatically maintains the full-text index. Your CDC handler just needs to write/delete hashes.
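A sketch of what that looks like with redis-py's search commands follows. The index name (idx:users), key prefix (user:), and field list are assumptions for illustration; the point is that the index definition binds to the prefix, so plain HSET and DEL calls are all the handler needs.

```python
# Prefix-based RediSearch index sketch (redis-py 4.x+ search API).
# Index name, prefix, and fields are illustrative assumptions.
import redis
from redis.commands.search.field import TagField, TextField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

r = redis.Redis(decode_responses=True)


def ensure_index() -> None:
    """Create the index if it does not exist; an existing index is left untouched."""
    try:
        r.ft("idx:users").info()
    except redis.ResponseError:
        r.ft("idx:users").create_index(
            fields=[TextField("name"), TextField("email"), TagField("country")],
            definition=IndexDefinition(prefix=["user:"], index_type=IndexType.HASH),
        )


ensure_index()
# The CDC handler only writes and deletes hashes under the prefix;
# RediSearch keeps the full-text index in sync automatically.
r.hset("user:42", mapping={"name": "Ada Lovelace", "email": "ada@example.com", "country": "uk"})
print(r.ft("idx:users").search("ada").total)   # -> 1 once the hash is indexed
r.delete("user:42")                            # also drops it from the index
```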
Recovery Mechanism: Batch Rebuild
Even with reliable oplog retention, you need a fallback for scenarios like Redis data loss or index corruption. Implement a batch rebuild endpoint:
Algorithm 3 RebuildSearchIndex
Require: limit: number of records per batch
Require: offset: starting position
Ensure: Rebuilt search index
1: records ← FetchFromSource(limit, offset) // Bypasses CDC
2: EnsureIndexExists()
3: for record in records do
4:     Upsert(record)
5: end for
6: return
This runs in parallel with CDC processing, allowing index recovery without stopping the real-time pipeline.
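A minimal sketch of that endpoint's core, assuming pymongo; the URI, database, collection, and key prefix are illustrative, and the RediSearch index is assumed to exist already (ensure_index from the sketch above).

```python
# Batch-rebuild sketch: read directly from MongoDB (bypassing CDC) and upsert
# into Redis. URI, database, collection, and key prefix are assumptions.
# Assumes the RediSearch index already exists (ensure_index() from the sketch above).
import redis
from pymongo import MongoClient

mongo = MongoClient("mongodb://localhost:27017")
r = redis.Redis(decode_responses=True)


def rebuild_batch(limit: int, offset: int) -> int:
    """Rebuild one page of the read model; safe to re-run thanks to idempotent upserts."""
    users = mongo["appdb"]["users"]
    count = 0
    for doc in users.find().skip(offset).limit(limit):   # bypasses the CDC pipeline entirely
        r.hset(
            f"user:{doc['_id']}",
            mapping={k: str(v) for k, v in doc.items() if k != "_id"},
        )
        count += 1
    return count
```

A thin HTTP endpoint can wrap this function, looping with increasing offsets until a page returns fewer than limit records; because the upserts are idempotent, the rebuild can run while CDC events continue to flow.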
Data Ownership and Cross-Service CQRS
It's critical that the single source of truth and authoritative owner of the data remains the user service. In this architecture, we follow the database-per-microservice pattern. You could argue that search isn't a separate domain from the user service - however, search has shared concerns across many services (posts, products, users) and requires information from multiple systems to build a rich search experience.
This pattern can be viewed as a form of cross-service CQRS: separating read and write models where the write model lives in the authoritative service (user service with MongoDB) and the read model lives in a specialized query service (search service with Redis). The CDC pipeline is the synchronization mechanism between these models.
Recovery as a First-Class Concern
Since we're dealing with data replication, things can and will go wrong: data becomes stale, out of sync, or corrupted. Your system must be built to be recoverable from the start. Design for idempotent recovery paths - the ability to rebuild any downstream read model from scratch without corrupting state or losing data.
The batch rebuild API shown above is just one example. In a later post, I'll cover how to make this architecture fully fault-tolerant: essentially treating downstream services as throwaway systems that can be recreated or recovered from an arbitrary state to a consistent state at any time.
Failure Modes and Recovery
| Scenario | Behavior | Recovery |
|---|---|---|
| Debezium connector down | CDC pauses, resume token retained | Restart connector, process backlog |
| Oplog rotation (resume token lost) | snapshot.mode: when_needed triggers a re-snapshot | Automatic re-snapshot (Atlas M30+ oplog retention prevents this) |
| Search service down | Events accumulate in Kafka | Restart service, consume from last offset |
| Redis index corrupted | Search API fails | Trigger batch rebuild API |
| Handler throws error | Message sent to DLQ | Fix code, replay DLQ |
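For the "handler throws error" row, a sketch of the dead-letter handoff: the failed record is republished verbatim to a DLQ topic with diagnostic headers so it can be replayed after the fix. The topic name and header keys are illustrative assumptions.

```python
# Dead-letter handoff sketch for the "handler throws error" row above.
# Topic name and header keys are illustrative assumptions.
from confluent_kafka import Producer

dlq_producer = Producer({"bootstrap.servers": "localhost:9092"})


def send_to_dlq(msg, error: Exception) -> None:
    """Republish the failed record verbatim, with diagnostics, for later replay."""
    dlq_producer.produce(
        topic="mongo.appdb.users.dlq",
        key=msg.key(),
        value=msg.value(),
        headers={
            "error": str(error),
            "source.topic": msg.topic(),
            "source.partition": str(msg.partition()),
            "source.offset": str(msg.offset()),
        },
    )
    dlq_producer.flush()
```

Replay then amounts to pointing the same consumer loop at the DLQ topic once the handler bug is fixed; because the handlers are idempotent upserts, replaying partially processed events is harmless.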
Conclusion
Building CDC consumers is fundamentally about handling uncertainty gracefully. Events may arrive out of order, duplicates will happen, and your downstream stores can fail at any time.
The patterns covered here - idempotent upserts, operation-based routing, dead letter queues, and batch rebuild APIs - form a foundation for reliable event processing. The key insight is treating your read models as disposable: if you can rebuild them from scratch at any time, you're resilient to almost any failure mode.
Combined with the infrastructure setup from Part 1, you now have a complete picture of production CDC with MongoDB, Kafka, and Debezium.