Contextual Compression post-processes retrieved documents and keeps only the parts most relevant to the query, which reduces noise and token usage for the downstream LLM.
Standard vector retrieval returns entire chunks, which often contain irrelevant or redundant information. Contextual Compression wraps a base retriever with a document compressor (e.g., LLM‑based extractor or a re‑ranker) that filters or compresses each document to keep only the parts most relevant to the query.
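The wrapping pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's API: the names CompressionRetriever, overlap_compressor, and toy_retriever are hypothetical, and the compressor uses a simple term-overlap heuristic as a stand-in for an LLM-based extractor or a re-ranker.

```python
import re
from typing import Callable, List, Set

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in",
             "and", "by", "who", "what", "how", "does", "do"}


def _terms(text: str) -> Set[str]:
    """Lowercased alphanumeric terms of the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_compressor(query: str, document: str) -> str:
    """Keep only sentences that share at least one term with the query.

    Stand-in for an LLM-based extractor; a real system would ask a model
    (or a re-ranker) which passages are relevant.
    """
    query_terms = _terms(query)
    kept = [s for s in split_sentences(document) if query_terms & _terms(s)]
    return " ".join(kept)


class CompressionRetriever:
    """Wraps a base retriever and compresses each document it returns."""

    def __init__(self,
                 base_retriever: Callable[[str], List[str]],
                 compressor: Callable[[str, str], str]) -> None:
        self.base_retriever = base_retriever
        self.compressor = compressor

    def retrieve(self, query: str) -> List[str]:
        docs = self.base_retriever(query)
        compressed = [self.compressor(query, d) for d in docs]
        # Documents with nothing relevant left are dropped entirely.
        return [c for c in compressed if c]


def toy_retriever(query: str) -> List[str]:
    """Hypothetical base retriever that always returns two whole chunks."""
    return [
        "Python was created by Guido van Rossum. The weather in Amsterdam "
        "is often rainy. Python emphasizes readable code.",
        "Bananas are rich in potassium. They grow in tropical climates.",
    ]


retriever = CompressionRetriever(toy_retriever, overlap_compressor)
```

Calling retriever.retrieve("Who created Python?") keeps the two Python sentences from the first chunk and drops the second chunk altogether. Because the compressor is just a callable taking (query, document), the heuristic can be swapped for an LLM call or a re-ranker without changing the wrapper.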
Reduces token usage by removing low-signal passages; savings of up to 50-60% are possible, depending on the corpus and the compressor.
Improves answer faithfulness by giving the LLM only relevant content to draw on.
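Lowers LLM inference cost and latency for the generation step, since the prompt is shorter. An LLM-based compressor adds calls of its own, so the net saving depends on the compressor used.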
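Token savings are easy to measure on your own data rather than taken on trust. The sketch below assumes whitespace-separated words as a rough proxy for tokens; count_tokens and token_savings are hypothetical helper names, and a real measurement would use the tokenizer of the target model.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Rough token count: whitespace-separated words (an approximation)."""
    return len(text.split())


def token_savings(original_docs: List[str], compressed_docs: List[str]) -> float:
    """Fraction of tokens removed by compression, between 0.0 and 1.0."""
    before = sum(count_tokens(d) for d in original_docs)
    after = sum(count_tokens(d) for d in compressed_docs)
    if before == 0:
        return 0.0  # nothing retrieved, so nothing saved
    return (before - after) / before
```

For example, compressing a 10-word chunk down to 4 words gives a savings fraction of about 0.6, i.e. 60%.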