A re-ranker (typically a cross‑encoder) is a slower but more accurate model that reorders the results of an initial retrieval stage. It sits after that first stage to refine relevance while keeping total latency acceptable.
Vector retrieval with a bi‑encoder embeds the query and each document separately, so document embeddings can be precomputed and search is fast, but the relevance estimate is less accurate. A cross‑encoder encodes the query and a document jointly and produces a more precise relevance score, but it is too slow to run against every document in a corpus. The typical pipeline therefore retrieves a larger candidate set (e.g., 50–100 documents) with the fast retriever, then re-ranks those candidates with the cross‑encoder and keeps the most relevant ones.
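The two-stage flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: `fast_score` and `slow_score` are toy token-overlap functions standing in for a bi‑encoder similarity and a cross‑encoder score, and the names `retrieve_then_rerank`, `n_candidates`, and `top_k` are hypothetical. In a real system you would swap in your own retriever and cross‑encoder model.

```python
from typing import Callable, List, Set, Tuple

# A scorer takes (query, document) and returns a relevance score.
Scorer = Callable[[str, str], float]


def tokens(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (toy helper for the stand-in scorers)."""
    return set(text.lower().split())


def fast_score(query: str, doc: str) -> float:
    """Stand-in for the cheap first stage: number of shared tokens."""
    return float(len(tokens(query) & tokens(doc)))


def slow_score(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: Jaccard overlap of the query/document pair."""
    q, d = tokens(query), tokens(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    first_stage: Scorer,
    reranker: Scorer,
    n_candidates: int = 50,
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Score all docs cheaply, then run the expensive scorer on the top candidates only."""
    # Stage 1: cheap scoring over the whole corpus, keep the best n_candidates.
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)
    candidates = candidates[:n_candidates]

    # Stage 2: expensive scoring, but only on the small candidate set.
    scored = [(reranker(query, d), d) for d in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    corpus = [
        "cross encoder reranking for search",
        "cooking pasta at home",
        "bi encoder vector retrieval",
    ]
    for score, doc in retrieve_then_rerank(
        "cross encoder reranking", corpus, fast_score, slow_score, n_candidates=2, top_k=1
    ):
        print(f"{score:.2f}  {doc}")
```

The key design point is that the expensive scorer is called at most `n_candidates` times per query, regardless of corpus size, which is what keeps the cross‑encoder affordable.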
A re-ranker is worth adding when:
You need higher precision for critical applications (e.g., legal search, medical QA).
Your initial retrieval returns many moderately relevant documents, and re-ranking can pick the best few.
You have enough latency budget to run a cross‑encoder on a small candidate set.
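To make the latency-budget condition concrete, here is a rough back-of-the-envelope helper. It assumes the cross‑encoder scores candidates one at a time at a roughly constant cost per query-document pair; batching on a GPU would change the arithmetic. The function name `max_rerank_candidates` and all numbers in the example are hypothetical, so measure your own retriever and model before relying on any figure.

```python
def max_rerank_candidates(budget_ms: float, retrieval_ms: float, per_pair_ms: float) -> int:
    """Largest candidate set that fits in the budget left after first-stage retrieval.

    Assumes sequential scoring with a constant cost per (query, document) pair.
    """
    if per_pair_ms <= 0:
        raise ValueError("per_pair_ms must be positive")
    remaining_ms = budget_ms - retrieval_ms
    # No room for re-ranking if retrieval alone uses up the budget.
    return max(0, int(remaining_ms // per_pair_ms))


if __name__ == "__main__":
    # Hypothetical numbers: 200 ms total budget, 20 ms retrieval, 3 ms per pair.
    print(max_rerank_candidates(budget_ms=200, retrieval_ms=20, per_pair_ms=3))
```

If the result is smaller than the candidate set you need for good recall, the options are a larger budget, a smaller or faster cross‑encoder, or batched inference.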