Handle long-context RAG with head+tail truncation (keeps the beginning and end of each chunk), query-aware chunk compression (SmartChunk predicts the optimal abstraction level per query), or hierarchical summarization (chunk summaries → section rollups → document summary).
When the retrieved chunks together exceed the LLM's context window, apply one or more compression strategies. Head+tail truncation keeps the beginning and end of each chunk and drops the middle, on the assumption that introductions and conclusions often carry the most important information. SmartChunk retrieval uses a planner to predict the optimal chunk abstraction level for each query, producing high-level embeddings without repeated summarization. For very long documents, hierarchical summarization creates a local summary per chunk, consolidates these into section-level rollups, and finally generates a document summary; only the level most relevant to the query is passed to the LLM. The REFRAG framework compresses low-relevance chunks while preserving core content, with reported gains of up to 30x speedup and 16x context extension.
Head+tail truncation: Preserve beginning and end, truncate middle section
SmartChunk retrieval: Query-adaptive framework predicting optimal abstraction level
Hierarchical summarization: Chunk summaries → section rollups → final summary
REFRAG framework: Compress low-relevance chunks while preserving core content (up to 30x speedup, 16x context extension)
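
The two simplest strategies above, head+tail truncation and hierarchical summarization, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: tokens are approximated by whitespace-separated words, and the names `head_tail_truncate`, `hierarchical_summary`, and `first_sentence` are hypothetical. `first_sentence` is only a stand-in for a real summarizer such as an LLM call.

```python
from typing import Callable, Dict, List


def head_tail_truncate(text: str, max_tokens: int,
                       head_ratio: float = 0.5, marker: str = "[...]") -> str:
    """Keep the first and last words of `text` and drop the middle.

    Assumption: whitespace-separated words approximate tokens. A real
    pipeline would count tokens with the target model's tokenizer.
    """
    words = text.split()
    if len(words) <= max_tokens:
        return text  # already fits, nothing to cut
    budget = max_tokens - 1  # reserve one slot for the elision marker
    if budget < 2:
        raise ValueError("max_tokens must be at least 3")
    # Clamp so that both the head and the tail keep at least one word.
    head_n = min(max(1, int(budget * head_ratio)), budget - 1)
    tail_n = budget - head_n
    return " ".join(words[:head_n] + [marker] + words[-tail_n:])


def first_sentence(text: str) -> str:
    """Placeholder summarizer (assumption): return the first sentence only."""
    return text.split(". ")[0].rstrip(".") + "."


def hierarchical_summary(sections: List[List[str]],
                         summarize: Callable[[str], str]) -> Dict[str, object]:
    """Build chunk summaries, then section rollups, then a document summary.

    `sections` is a list of sections, each a list of chunk strings.
    All three levels are returned so the caller can pass the LLM
    whichever level best matches the query.
    """
    chunk_summaries = [[summarize(chunk) for chunk in section]
                       for section in sections]
    section_rollups = [summarize(" ".join(summaries))
                       for summaries in chunk_summaries]
    document_summary = summarize(" ".join(section_rollups))
    return {
        "chunks": chunk_summaries,
        "sections": section_rollups,
        "document": document_summary,
    }


if __name__ == "__main__":
    print(head_tail_truncate("a b c d e f g h i j", max_tokens=5))
    doc = [["Alpha is first. More detail.", "Beta is second. Extra."],
           ["Gamma is third. Tail."]]
    print(hierarchical_summary(doc, first_sentence))
```

Returning every level rather than only the final summary keeps the choice of granularity with the caller: a broad question can be served from the document summary, while a narrow one can fall back to the section rollups or chunk summaries.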