What is a 'Rollback' in the context of a replica set, and how do you minimize its occurrence?

A rollback reverts write operations on a former primary when it rejoins a replica set after a failover, keeping the set consistent by removing writes that were never replicated.

In a MongoDB replica set, a rollback is an automatic recovery process that runs when a former primary rejoins the set after a failover. It reverts write operations that the old primary accepted but that the secondaries had not replicated before the primary stepped down, so that every member converges on the same data. Rollbacks should be rare; when they do occur, the usual causes are network partitions or secondaries that cannot keep up with the primary's write throughput.

Why Rollbacks Happen and How to Minimize Them

A rollback stems from the gap between the moment a primary acknowledges a write to the client and the moment that write reaches the secondaries. With write concern { w: 1 } (the default before MongoDB 5.0), MongoDB acknowledges a write as soon as the primary has applied it, regardless of replication status. If the primary fails before any secondary replicates the write, the data exists only on the failed primary. When that node later rejoins the set, it must roll back those unreplicated writes to match the new primary's data.
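
To make the timing gap concrete, here is a deliberately simplified model of the idea, not MongoDB's actual rollback algorithm. It assumes each oplog is a plain array and that entries can be matched by a hypothetical `ts` field; the real server compares optimes, which also carry an election term. The function name `findRolledBackEntries` is an illustration only.

```javascript
// Simplified model: each oplog is an array of entries ordered by time.
// Entries are matched by a hypothetical `ts` field; real MongoDB compares
// optimes (timestamp plus election term), which this sketch ignores.
function findRolledBackEntries(oldPrimaryOplog, newPrimaryOplog) {
  const replicated = new Set(newPrimaryOplog.map((entry) => entry.ts));
  // Walk back from the end of the old primary's oplog to the most recent
  // entry that the new primary also has (the common point).
  let commonIndex = oldPrimaryOplog.length - 1;
  while (commonIndex >= 0 && !replicated.has(oldPrimaryOplog[commonIndex].ts)) {
    commonIndex -= 1;
  }
  // Everything after the common point exists only on the old primary.
  return oldPrimaryOplog.slice(commonIndex + 1);
}

const oldPrimaryOplog = [
  { ts: 1, op: "insert order 1" },
  { ts: 2, op: "insert order 2" },
  { ts: 3, op: "insert order 3" }, // acknowledged with w: 1, never replicated
];
const newPrimaryOplog = [
  { ts: 1, op: "insert order 1" },
  { ts: 2, op: "insert order 2" },
  { ts: 4, op: "insert order 4" }, // accepted after the failover
];

const rolledBack = findRolledBackEntries(oldPrimaryOplog, newPrimaryOplog);
console.log(rolledBack.map((entry) => entry.op));
```

In this toy scenario only the third write is returned: it was acknowledged to the client but never reached the member that became the new primary.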

The most effective way to minimize or prevent rollbacks is to use stronger write concerns that ensure writes are replicated to multiple nodes before acknowledgment.

1. Use Majority Write Concern

Configure write concern { w: "majority" } so that a write is acknowledged only after it has propagated to a majority of voting nodes
Since MongoDB 5.0, { w: "majority" } is the implicit default write concern for most deployments (some replica sets that include arbiters default to { w: 1 } instead)
A majority-acknowledged write is already stored on other nodes, so it survives the loss of the primary and is not rolled back by a failover
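
As a minimal sketch of requesting this write concern on a single operation, the helper below passes `{ w: "majority" }` to `insertOne`. It assumes a collection object that exposes `insertOne(doc, options)`, as collections in mongosh and the Node.js driver do. The helper names, the `orders` collection, and the 5000 ms `wtimeout` are illustrative assumptions, not values from the text above.

```javascript
// Majority of a replica set with the given number of voting members.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// `collection` is expected to expose insertOne(doc, options). The 5000 ms
// wtimeout is an arbitrary illustrative value that bounds how long the
// client waits for the majority acknowledgment.
function insertWithMajority(collection, doc) {
  return collection.insertOne(doc, {
    writeConcern: { w: "majority", wtimeout: 5000 },
  });
}

// In mongosh this could be called as (the `orders` collection is hypothetical):
//   insertWithMajority(db.orders, { item: "abc", qty: 1 })
console.log(majorityOf(3), majorityOf(5));
```

For a three-member set the majority is two nodes and for a five-member set it is three, so a majority-acknowledged write is present on at least one node that can take part in electing the next primary.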

2. Enable Journaling

Run all voting members with journaling enabled to provide crash recovery
The writeConcernMajorityJournalDefault replica set setting controls whether majority writes wait for the on-disk journal before they are acknowledged
Setting it to false leaves majority-acknowledged writes open to rollback if a majority of nodes crash and restart
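
One way to turn the setting on, sketched under the assumption that you reconfigure from mongosh with `rs.conf()` and `rs.reconfig()`. The helper name `withMajorityJournaling`, the set name, and the host names are made up for the example.

```javascript
// Returns a copy of a replica set configuration document with
// writeConcernMajorityJournalDefault turned on. The input is left untouched.
function withMajorityJournaling(config) {
  return { ...config, writeConcernMajorityJournalDefault: true };
}

// In mongosh, this could be applied as:
//   rs.reconfig(withMajorityJournaling(rs.conf()))
// The configuration below is a made-up stand-in for rs.conf() output.
const exampleConfig = {
  _id: "rs0",
  version: 3,
  writeConcernMajorityJournalDefault: false,
  members: [
    { _id: 0, host: "db0.example.net:27017" },
    { _id: 1, host: "db1.example.net:27017" },
    { _id: 2, host: "db2.example.net:27017" },
  ],
};

const updatedConfig = withMajorityJournaling(exampleConfig);
console.log(updatedConfig.writeConcernMajorityJournalDefault);
```

Returning a copy instead of mutating the input keeps the original configuration available for comparison before the change is applied.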

3. Ensure Secondaries Can Keep Up

Monitor replication lag to prevent secondaries from falling behind
High write throughput that outpaces secondary replication increases both the likelihood and impact of rollbacks
Consider upgrading hardware on lagging secondaries, or adding secondaries to spread read load so that each one has more capacity to apply writes
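
A rough sketch of measuring lag, assuming a document shaped like the output of `rs.status()`, where each member carries `name`, `stateStr`, and `optimeDate`. The function name and the sample data are hypothetical.

```javascript
// Computes how far each secondary trails the primary, in seconds, from a
// document shaped like the output of rs.status(). Only the `name`,
// `stateStr`, and `optimeDate` fields of each member are used.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no primary found in replica set status");
  }
  const lag = {};
  for (const member of status.members) {
    if (member.stateStr === "SECONDARY") {
      // Subtracting two Date objects yields milliseconds.
      lag[member.name] = (primary.optimeDate - member.optimeDate) / 1000;
    }
  }
  return lag;
}

// Made-up sample standing in for rs.status() output.
const sampleStatus = {
  members: [
    { name: "db0.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
    { name: "db1.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:09Z") },
    { name: "db2.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:04Z") },
  ],
};

const lagByMember = replicationLagSeconds(sampleStatus);
console.log(lagByMember);
```

In mongosh the same function could be called as `replicationLagSeconds(rs.status())`; the built-in `rs.printSecondaryReplicationInfo()` helper reports similar information without custom code.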

4. Handle Priority Configuration Carefully

Be aware that a higher-priority member can trigger a rollback when it rejoins after a failure
If it reconnects and takes over as primary, writes that the current primary accepted but had not yet replicated may be rolled back
Test failover scenarios with your specific priority settings
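
Before testing failover, it can help to know which members would try to take over from the current primary. The sketch below assumes documents shaped like `rs.conf()` and `rs.status()` output, that a member's `host` in the configuration matches its `name` in the status, and that a missing `priority` means the default of 1. The function name and sample data are hypothetical.

```javascript
// Lists members whose configured priority is higher than the current
// primary's, i.e. the members that could attempt a priority takeover.
function higherPriorityThanPrimary(config, status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return [];
  const primaryConf = config.members.find((m) => m.host === primary.name);
  const primaryPriority = primaryConf?.priority ?? 1;
  return config.members
    .filter((m) => (m.priority ?? 1) > primaryPriority)
    .map((m) => m.host);
}

// Made-up sample: db0 has the highest priority but db1 is currently primary,
// as could happen after db0 failed and rejoined as a secondary.
const sampleConfig = {
  members: [
    { host: "db0.example.net:27017", priority: 2 },
    { host: "db1.example.net:27017", priority: 1 },
    { host: "db2.example.net:27017" },
  ],
};
const sampleState = {
  members: [
    { name: "db0.example.net:27017", stateStr: "SECONDARY" },
    { name: "db1.example.net:27017", stateStr: "PRIMARY" },
    { name: "db2.example.net:27017", stateStr: "SECONDARY" },
  ],
};

const takeoverCandidates = higherPriorityThanPrimary(sampleConfig, sampleState);
console.log(takeoverCandidates);
```

A non-empty result points at the members worth including in a failover test, since they are the ones that may call an election against the current primary.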
