When a bulk write operation fails midway through 10,000 documents, your response strategy depends entirely on whether the operation was ordered or unordered. Ordered operations stop at the first error, potentially leaving a partially written batch. Unordered operations continue processing remaining documents even after errors, providing a partial result. MongoDB drivers provide specific exception types like MongoBulkWriteException or BulkWriteCommandException that contain detailed error information and partial results, enabling you to identify exactly which documents failed and why.
By default, bulkWrite() performs ordered operations, executing operations serially in the order provided. If an error occurs during an ordered bulk write, MongoDB returns without processing any remaining operations. For unordered operations (set ordered: false), MongoDB can execute operations in parallel and continues processing remaining operations even if some fail. This fundamental difference determines how you handle failures—ordered operations require restarting from the failure point, while unordered operations give you a partial result with failed operations identified by index.
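A minimal sketch of the two modes, assuming the Node.js driver (other drivers expose the same option under a similar name). The helper `buildInsertOps` is hypothetical, introduced here only to wrap plain documents in the bulkWrite operation format:

```javascript
// Hypothetical helper: wrap documents in bulkWrite's operation format.
function buildInsertOps(docs) {
  return docs.map((doc) => ({ insertOne: { document: doc } }));
}

// With { ordered: false }, MongoDB keeps processing after individual
// failures; with the default { ordered: true } it stops at the first error.
async function writeBatchUnordered(collection, docs) {
  return collection.bulkWrite(buildInsertOps(docs), { ordered: false });
}
```

Whether ordered or unordered, a failure surfaces as a thrown bulk write exception; the difference is how much of the batch has been applied when it is thrown.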
For ordered operations, the first error stops execution, leaving a partial write. The simplest recovery approach is to split your 10,000 documents into smaller batches (e.g., 500-1000 documents per batch). This way, if a batch fails, only that batch needs retry logic, and you can track success across batches. After identifying the failing document, you can isolate and fix the issue (like a duplicate key violation or validation error), then retry the remainder of the batch starting from that point.
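The batching approach above can be sketched with a pure chunking helper plus a driver loop. The batch size of 500 is an assumption for illustration, not a driver requirement, and `writeInBatches` is a hypothetical wrapper around the driver's bulkWrite:

```javascript
// Split a large document array into fixed-size batches.
function chunk(docs, size = 500) {
  const batches = [];
  for (let i = 0; i < docs.length; i += size) {
    batches.push(docs.slice(i, i + size));
  }
  return batches;
}

// Hypothetical driver loop: each batch is written independently, so an
// ordered failure only affects one batch and retry scope stays small.
async function writeInBatches(collection, docs) {
  for (const batch of chunk(docs)) {
    await collection.bulkWrite(
      batch.map((doc) => ({ insertOne: { document: doc } }))
    );
  }
}
```

Tracking which batches completed (for example, by batch index) lets you resume from the failed batch rather than from document zero.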
With unordered operations, MongoDB continues processing even after errors. The driver throws a bulk write exception that contains a partial result object. This partial result includes counts of successfully processed operations (insertedCount, modifiedCount, deletedCount) and an array of write errors with the index of each failed operation. Your recovery strategy should: capture the failed operation indices from the exception, extract the corresponding documents from your original array, fix any data issues (like duplicate keys or schema violations), and retry only those failed operations in a new bulk write.
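The index-extraction step can be sketched as a pure function. The shape of the write-error entries (`index`, `code`) is assumed from the Node.js driver's bulk write exception; after catching the exception you would pass its write errors array in alongside your original documents:

```javascript
// Given the writeErrors from a bulk write exception, recover the
// original documents that failed so only they are retried.
function failedDocuments(originalDocs, writeErrors) {
  return writeErrors.map((we) => ({
    index: we.index,             // position in the original operations array
    code: we.code,               // e.g. 11000 for a duplicate key error
    doc: originalDocs[we.index], // the document to fix and retry
  }));
}
```

After fixing the offending fields, the `doc` values feed directly into a new, smaller bulkWrite.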
If you need true atomicity—either all 10,000 documents succeed or none do—you cannot rely on bulk write alone. Bulk write operations are not atomic across multiple documents; each individual write is atomic, but the batch as a whole is not. For all-or-nothing semantics across the entire batch, you must use multi-document transactions. However, transactions have performance overhead and are generally not recommended for batches this large. The practical approach is to design for idempotency and implement retry logic with partial results.
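One way to get the idempotency this approach relies on is to express inserts as upserts keyed on `_id`, so re-running a failed batch cannot create duplicates. This is a sketch of that technique, assuming documents carry a client-assigned `_id` (or another uniquely indexed field):

```javascript
// Build idempotent operations: replaceOne with upsert: true means the
// same batch can be retried wholesale without producing duplicates.
function buildUpsertOps(docs) {
  return docs.map((doc) => ({
    replaceOne: {
      filter: { _id: doc._id }, // assumes a client-assigned _id
      replacement: doc,
      upsert: true,
    },
  }));
}
```

With idempotent operations, recovery simplifies to "retry the whole failed batch" instead of carefully excising already-written documents.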
Several preventive practices reduce the need for recovery in the first place:
Batch splitting: Break 10,000 documents into smaller batches (500-1000) to limit the impact of failures and simplify retry logic.
Pre-validation: Validate documents against schema requirements and check for duplicate keys before executing the bulk write to reduce failure rates.
Idempotent operations: Design operations to be idempotent, allowing safe retries without causing duplicate data.
Index analysis: Ensure unique indexes are correctly configured and that your data won't violate them, as duplicate key errors (code 11000) are the most common bulk write failures.
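The pre-validation step above can catch in-batch duplicates before the driver ever sees them. A minimal sketch, assuming the key being checked is set client-side (here defaulting to `_id`):

```javascript
// Detect duplicate key values within a batch before calling bulkWrite,
// so predictable 11000 errors are handled up front rather than mid-write.
function findDuplicateKeys(docs, key = "_id") {
  const seen = new Set();
  const dupes = [];
  for (const doc of docs) {
    const k = doc[key];
    if (seen.has(k)) dupes.push(k);
    else seen.add(k);
  }
  return dupes;
}
```

This only covers duplicates within the batch itself; collisions with documents already in the collection still require either a pre-query or upsert-style idempotent operations.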