Implement fallback strategies: set a similarity threshold to detect empty retrieval, use web search as a secondary source, or have the LLM respond with "I don't know" rather than hallucinate.
When retrieval returns no relevant documents, the LLM is at risk of hallucinating an answer. To prevent this, implement a fallback workflow: first, check if any document's similarity score exceeds a predefined threshold. If none does, you can either use a secondary retriever (e.g., a web search API), or directly instruct the LLM to return a safe message like "I don't have enough information to answer that." This preserves user trust and avoids generating false information.[reference:12][reference:13]
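The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: `ScoredDoc`, `answer_with_fallback`, the `primary`, `secondary`, and `generate` callables, and the default threshold of 0.75 are all hypothetical names and values chosen for the example. It also assumes similarity scores where higher means more relevant, and that the secondary source returns plain text snippets without comparable scores.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

FALLBACK_MESSAGE = "I don't have enough information to answer that."


@dataclass
class ScoredDoc:
    text: str
    score: float  # assumed: higher means more similar to the query


def answer_with_fallback(
    question: str,
    primary: Callable[[str], List[ScoredDoc]],
    generate: Callable[[str, List[str]], str],
    secondary: Optional[Callable[[str], List[str]]] = None,
    threshold: float = 0.75,  # illustrative value; tune on your own data
) -> str:
    # Step 1: keep only documents whose score exceeds the threshold.
    context = [d.text for d in primary(question) if d.score > threshold]

    # Step 2: if nothing passed, try the secondary source (e.g. web search).
    if not context and secondary is not None:
        context = secondary(question)

    # Step 3: still nothing, so return the safe message without calling the LLM.
    if not context:
        return FALLBACK_MESSAGE

    return generate(question, context)
```

Returning the safe message directly, instead of asking the model to produce it, guarantees the LLM is never called with an empty context.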
Similarity Threshold: Set a minimum relevance score; reject documents below it.
Confidence Scoring: Use a small LLM grader to evaluate document relevance.
Web Search Fallback: Integrate a search API (Tavily, Bing) to fetch external data when internal retrieval is empty.
Parametric Knowledge: Let the LLM answer from its training data only when explicitly instructed to, and attach a clear warning that the answer is not grounded in retrieved documents.
Re-generation with a Different Query: Rewrite the query and retry retrieval (e.g., using MultiQueryRetriever).
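One way these strategies might be chained is sketched below. The order (internal retrieval, then rewritten queries, then web search, then optional parametric knowledge) is an assumption made for illustration, since the list above does not prescribe one. Every callable (`retrieve`, `grade`, `rewrite`, `web_search`, `generate`) is a hypothetical stand-in for your own retriever, LLM grader, query rewriter, search API client, and generator; no real library API is implied.

```python
from typing import Callable, List, Optional, Tuple

Doc = Tuple[str, float]  # (text, similarity score), higher = more relevant

NO_ANSWER = "I don't have enough information to answer that."
PARAMETRIC_WARNING = (
    "Warning: no supporting documents were found; this answer relies on "
    "the model's training data and may be inaccurate.\n"
)


def resolve_context(
    query: str,
    retrieve: Callable[[str], List[Doc]],
    grade: Callable[[str, str], bool],  # stand-in for a small LLM grader
    rewrite: Optional[Callable[[str], List[str]]] = None,
    web_search: Optional[Callable[[str], List[str]]] = None,
    threshold: float = 0.7,  # illustrative value
) -> Tuple[List[str], str]:
    """Return (context, source); source names the strategy that succeeded."""

    def usable(q: str) -> List[str]:
        # Similarity threshold first, then the grader judges each survivor
        # against the ORIGINAL query, not the rewritten one.
        return [t for t, s in retrieve(q) if s >= threshold and grade(query, t)]

    docs = usable(query)
    if docs:
        return docs, "internal"

    # Retry retrieval with rewritten variants of the query.
    if rewrite is not None:
        for alt in rewrite(query):
            docs = usable(alt)
            if docs:
                return docs, "rewritten"

    # Fall back to external data when internal retrieval is empty.
    if web_search is not None:
        docs = [t for t in web_search(query) if grade(query, t)]
        if docs:
            return docs, "web"

    return [], "none"


def answer(query, generate, allow_parametric=False, **kwargs) -> str:
    context, _source = resolve_context(query, **kwargs)
    if context:
        return generate(query, context)
    if allow_parametric:
        # Parametric knowledge only on explicit opt-in, with a warning.
        return PARAMETRIC_WARNING + generate(query, [])
    return NO_ANSWER
```

Returning the `source` label alongside the context lets the caller log which fallback fired, which is useful when tuning the threshold.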