Build a Conversational RAG chain using LangChain's create_history_aware_retriever and create_retrieval_chain, which reformulate follow-up questions into standalone queries using the chat history and then retrieve relevant documents to generate context-aware answers.
A Conversational RAG chain extends basic RAG by maintaining chat history and reformulating follow-up questions into standalone queries. The process has three steps. First, a query-condensing prompt turns the latest question and the chat history into a self-contained query. Second, the retriever fetches relevant documents using this reformulated query. Third, the retrieved documents and the full conversation history are passed to the LLM to generate a coherent answer. LangChain provides create_history_aware_retriever for the reformulation step and create_retrieval_chain to connect that retriever to the answer-generation chain.[reference:2][reference:3]
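The three steps can be sketched in plain Python to show the control flow. This is an illustration only, not the LangChain API: conversational_rag, toy_condense, toy_retrieve, toy_generate, and CORPUS are hypothetical stand-ins, and in a real chain the condense and generate callables would call an LLM. The sketch skips reformulation when the history is empty, which matches the documented behavior of create_history_aware_retriever.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for illustration; none of these names are LangChain APIs.
Message = Tuple[str, str]  # (role, content), role is "human" or "ai"


def conversational_rag(
    question: str,
    chat_history: List[Message],
    condense: Callable[[str, List[Message]], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[Message], List[str]], str],
) -> Dict[str, object]:
    # Step 1: reformulate the follow-up into a standalone query.
    # With no history the question is already standalone, so it is used as-is.
    standalone = condense(question, chat_history) if chat_history else question
    # Step 2: retrieve with the standalone query, not the raw follow-up.
    docs = retrieve(standalone)
    # Step 3: answer from the retrieved documents plus the conversation history.
    answer = generate(question, chat_history, docs)
    return {"standalone_question": standalone, "context": docs, "answer": answer}


# Toy components so the sketch runs without an LLM or a vector store.
CORPUS = {
    "langchain": "LangChain is a framework for building LLM applications.",
    "redis": "Redis is an in-memory data store.",
}


def toy_condense(question: str, history: List[Message]) -> str:
    # A real implementation would prompt an LLM; this appends the last user turn.
    last_user = next((c for r, c in reversed(history) if r == "human"), "")
    return f"{question} (context: {last_user})"


def toy_retrieve(query: str) -> List[str]:
    # Keyword match standing in for a vector-store similarity search.
    q = query.lower()
    return [text for key, text in CORPUS.items() if key in q]


def toy_generate(question: str, history: List[Message], docs: List[str]) -> str:
    return f"Answer to {question!r} using {len(docs)} document(s)."


history = [("human", "What is LangChain?"), ("ai", "A framework for LLM apps.")]
result = conversational_rag(
    "Who maintains it?", history, toy_condense, toy_retrieve, toy_generate
)
```

In this toy setup the raw follow-up "Who maintains it?" matches nothing in the corpus, while the condensed query carries the word "LangChain" and retrieves the right document, which is the reason the reformulation step exists.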
Store chat history in a session-based database (e.g., Redis, PostgreSQL) for multi-turn conversations.
Use a sliding window to keep only recent messages, preventing context window overflow.
Limit the number of retrieved documents (e.g., k=3–5) to reduce noise and token usage.
Implement a fallback for empty retrieval to avoid hallucination (e.g., "I don't have enough information to answer that").
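The four practices above can be combined in one small sketch. Everything here is an assumption made for illustration: SESSIONS is an in-memory dict standing in for Redis or PostgreSQL, trim_history and chat_turn are hypothetical helpers, the window is counted in messages rather than tokens, and the retrieve and generate callables are supplied by the caller.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)

FALLBACK_ANSWER = "I don't have enough information to answer that."

# In-memory stand-in for a session store such as Redis or PostgreSQL (assumption).
SESSIONS: Dict[str, List[Message]] = defaultdict(list)


def trim_history(history: List[Message], max_messages: int = 6) -> List[Message]:
    """Sliding window: keep only the most recent max_messages messages."""
    if max_messages <= 0:
        return []  # guard: history[-0:] would return the whole list
    return history[-max_messages:]


def chat_turn(
    session_id: str,
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[Message], List[str]], str],
    k: int = 4,
    max_messages: int = 6,
) -> str:
    # Load this session's history and apply the sliding window.
    history = trim_history(SESSIONS[session_id], max_messages)
    # Cap the number of documents; a real retriever would usually take k directly.
    docs = retrieve(question)[:k]
    # Fallback: with no retrieved context, do not ask the model to answer.
    answer = generate(question, history, docs) if docs else FALLBACK_ANSWER
    # Persist the turn so the next call in this session sees it.
    SESSIONS[session_id] += [("human", question), ("ai", answer)]
    return answer
```

Counting messages is only a rough proxy for context size. A token-based budget would be more precise, at the cost of needing a tokenizer for the target model.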