Use lazy_load() instead of load() to stream documents from a loader as a generator, and pass each document to your splitter's split_documents() as it arrives, so the full dataset is never held in memory at once.
LangChain document loaders expose a lazy_load() method that returns an iterator over documents instead of building the full list in memory, which matters for large files or large collections of documents. Iterate over it and call split_documents() on each document as it arrives, wrapped in a one-item list because split_documents() takes an iterable of documents. The result is a streaming pipeline in which only the current document and its chunks are in memory at a time, provided you process or persist the chunks as you go rather than accumulating them in a list. How much is actually streamed depends on the loader; some, such as the database loaders, yield documents one at a time natively.
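A minimal sketch of that loop follows. To keep it runnable without LangChain installed, Doc, DemoLoader and DemoSplitter are hypothetical stand-ins that only mimic the lazy_load() and split_documents() method names; stream_chunks is likewise an illustrative helper, not a LangChain function.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class DemoLoader:
    """Hypothetical loader: yields one document at a time."""

    def __init__(self, texts: Iterable[str]):
        self.texts = texts

    def lazy_load(self) -> Iterator[Doc]:
        for index, text in enumerate(self.texts):
            yield Doc(text, {"index": index})


class DemoSplitter:
    """Hypothetical splitter: fixed-size character windows, no overlap."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size

    def split_documents(self, docs: Iterable[Doc]) -> List[Doc]:
        chunks = []
        for doc in docs:
            text = doc.page_content
            for start in range(0, len(text), self.chunk_size):
                piece = text[start:start + self.chunk_size]
                chunks.append(Doc(piece, dict(doc.metadata)))
        return chunks


def stream_chunks(loader, splitter) -> Iterator[Doc]:
    """Yield chunks lazily; only the current document and its chunks are alive."""
    for doc in loader.lazy_load():
        # split_documents takes an iterable of documents, hence the one-item list.
        yield from splitter.split_documents([doc])


# Consume chunks one by one (e.g. embed or write them) instead of collecting them.
for chunk in stream_chunks(DemoLoader(["abcdefgh", "ijkl"]), DemoSplitter(chunk_size=3)):
    print(chunk.metadata["index"], chunk.page_content)
```

With real components, the same stream_chunks function should work unchanged with any loader that has lazy_load() and any text splitter that has split_documents(), for example a directory loader paired with RecursiveCharacterTextSplitter.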
This pattern matters most for very large inputs, such as directories with thousands of files or multi-gigabyte log files. Without lazy loading, load() materializes every document in a list before you can process any of them, which can exhaust memory and delays work until everything has been read. lazy_load() is part of the BaseLoader interface and most built-in loaders implement it, but the savings depend on the loader's granularity: a loader that reads a whole file into a single document still holds that entire file in memory. If a loader doesn't stream the way you need, you can write a custom loader by subclassing BaseLoader and implementing lazy_load() as a generator.
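As a rough sketch of such a custom loader, the example below yields one document per non-empty line of a text file, so a large log is never read in one piece. LineByLineLoader and its "line" metadata key are hypothetical names chosen for illustration. The import assumes the langchain_core package layout; the fallback classes exist only so the sketch runs where LangChain is not installed.

```python
import os
import tempfile
from typing import Iterator, Optional

try:
    from langchain_core.document_loaders import BaseLoader
    from langchain_core.documents import Document
except ImportError:
    # Fallback stand-ins (assumption: they mirror only the parts used below).
    class Document:
        def __init__(self, page_content: str, metadata: Optional[dict] = None):
            self.page_content = page_content
            self.metadata = metadata or {}

    class BaseLoader:
        def lazy_load(self):
            raise NotImplementedError

        def load(self):
            return list(self.lazy_load())


class LineByLineLoader(BaseLoader):
    """Hypothetical loader that yields one Document per non-empty line."""

    def __init__(self, file_path: str, encoding: str = "utf-8"):
        self.file_path = file_path
        self.encoding = encoding

    def lazy_load(self) -> Iterator[Document]:
        # Iterating over the file handle reads one line at a time.
        with open(self.file_path, encoding=self.encoding) as handle:
            for line_number, line in enumerate(handle, start=1):
                text = line.rstrip("\n")
                if text:
                    yield Document(
                        page_content=text,
                        metadata={"source": self.file_path, "line": line_number},
                    )


# Demo on a small temporary file standing in for a large log.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first entry\n\nsecond entry\n")

try:
    for doc in LineByLineLoader(tmp.name).lazy_load():
        print(doc.metadata["line"], doc.page_content)
finally:
    os.remove(tmp.name)
```

Yielding per line is a design choice for log-like data; for other formats a different unit (a record, a page, a row) may be the natural document boundary.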