RAG

1. What is RAG and why is it preferred over fine-tuning for domain-specific knowledge in production applications?

Level: Expert | Frequency: High

2. What are the core components of a RAG pipeline in LangChain — Document Loaders, Text Splitters, Embeddings, Vector Stores, Retrievers, and Chains?

Level: Expert | Frequency: High

3. What is the difference between semantic search and keyword search and why does RAG rely on semantic similarity?

Level: Expert | Frequency: High

4. What is an Embedding in the context of RAG — what does it represent and why is cosine similarity used to compare them?

Level: Expert | Frequency: High
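As a rough illustration of the comparison this question refers to, the sketch below computes cosine similarity in plain Python. The function name and the three-dimensional vectors are hypothetical; real embeddings have hundreds or thousands of dimensions and would normally be compared with a numerical library or inside the vector store.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (assumed values, for illustration only).
query = [1.0, 2.0, 0.0]
doc_same_direction = [2.0, 4.0, 0.0]  # same direction, different magnitude
doc_orthogonal = [0.0, 0.0, 5.0]      # shares no direction with the query

print(cosine_similarity(query, doc_same_direction))
print(cosine_similarity(query, doc_orthogonal))
```

The score depends only on direction, not on vector length, which is one reason it is a common choice for comparing embeddings.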

5. What is a Vector Store and how does it differ from a traditional relational or document database?

Level: Expert | Frequency: High

6. What is the difference between a Retriever and a Vector Store in LangChain — why is the abstraction separation important?

Level: Expert | Frequency: High

7. What is a Document object in LangChain — what are pageContent and metadata fields and why does metadata matter in RAG?

Level: Expert | Frequency: High

8. What are Document Loaders in LangChain and how do you choose the right loader for PDFs, web pages, Notion, Google Drive, or SQL databases?

Level: Expert | Frequency: High

9. What is the difference between RecursiveCharacterTextSplitter and CharacterTextSplitter — when would you use one over the other?

Level: Expert | Frequency: High

10. What is chunk size and chunk overlap in text splitting — how do you decide the right values for your use case?

Level: Expert | Frequency: High
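To make the two parameters concrete, here is a minimal sliding-window sketch in plain Python. It is not how any particular LangChain splitter is implemented (those also split on separators); the function name `split_with_overlap` is hypothetical, and sizes are measured in characters as an assumption.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so content near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
```

A larger overlap reduces the chance of cutting a fact in half, at the cost of storing and embedding more duplicated text.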

11. How do you handle structured documents like tables, code blocks, or markdown files during the splitting phase to avoid breaking semantic meaning?

Level: Expert | Frequency: High

12. How do you load and split documents lazily (streaming) to handle very large files without running out of memory?

Level: Expert | Frequency: High

13. What is a SemanticChunker and how does it differ from fixed-size character-based splitting?

Level: Expert | Frequency: High

14. How do you preserve and propagate source metadata (filename, page number, URL, timestamp) through the loading and splitting pipeline?

Level: Expert | Frequency: High

15. How do you choose the right embedding model — what tradeoffs exist between OpenAI embeddings, Cohere, HuggingFace, and local models like nomic-embed?

Level: Expert | Frequency: High

16. What is the difference between dense embeddings and sparse lexical representations such as BM25, and when would you combine both in a hybrid search?

Level: Expert | Frequency: High

17. How do you handle embedding model upgrades in production — what happens to your existing vectors when you switch models?

Level: Expert | Frequency: High

18. How do you efficiently batch embed a large corpus of documents without hitting rate limits or memory constraints?

Level: Expert | Frequency: High

19. What are the tradeoffs between vector stores like Pinecone, Weaviate, Chroma, pgvector, and FAISS — how do you choose for production?

Level: Expert | Frequency: High

20. How do you implement namespace or tenant isolation in a vector store for a multi-tenant RAG application?

Level: Expert | Frequency: High

21. How do you handle incremental updates to a vector store — adding, updating, and deleting documents without full re-indexing?

Level: Expert | Frequency: High

22. What is HNSW indexing and why does it make approximate nearest neighbor search fast at scale?

Level: Expert | Frequency: High

23. What is a similarity score threshold in retrieval and how do you use it to filter out low-confidence results?

Level: Expert | Frequency: High

24. What is MMR (Maximal Marginal Relevance) retrieval and how does it balance relevance with diversity of results?

Level: Expert | Frequency: High
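The relevance-versus-diversity trade-off named in this question can be sketched as a greedy selection loop. This is a minimal plain-Python sketch, not a library implementation; the function names, the toy vectors, and the convention that `lambda_mult=1.0` means pure relevance are assumptions made for illustration.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedily pick k document indices.

    Each step scores a candidate as
        lambda_mult * similarity(query, doc)
        - (1 - lambda_mult) * max similarity(doc, already selected docs)
    and selects the highest-scoring one.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = _cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (_cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,  # nothing selected yet, so no redundancy penalty
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Hypothetical toy vectors: documents 0 and 1 are near-duplicates of each other.
query = [1.0, 1.0]
docs = [[1.0, 0.05], [1.0, 0.1], [0.0, 1.0]]

print(mmr(query, docs, k=2, lambda_mult=1.0))  # relevance only
print(mmr(query, docs, k=2, lambda_mult=0.5))  # relevance balanced with diversity
```

With relevance only, the two near-duplicates are both returned; with the balanced setting, the second pick is the document that differs from the first.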

25. What is a MultiQueryRetriever and how does it improve recall by generating multiple phrasings of the same question?

Level: Expert | Frequency: High

26. What is Contextual Compression in LangChain retrieval and how does it reduce noise in retrieved chunks?

Level: Expert | Frequency: High

27. What is a ParentDocumentRetriever — how does it index small chunks but return larger parent chunks to the LLM?

Level: Expert | Frequency: High
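The "index small, return large" idea in this question can be sketched without any library. In the sketch below, sentences act as the small child chunks and word overlap stands in for vector similarity; both choices, and every name used, are assumptions for illustration rather than the behaviour of LangChain's own retriever.

```python
def build_child_index(parents):
    """Split each parent document into sentence-sized children.

    Every child keeps the id of the parent it came from.
    """
    children = []
    for parent_id, text in parents.items():
        for sentence in text.split(". "):
            children.append((sentence, parent_id))
    return children


def retrieve_parents(query, children, parents, k=1):
    """Match the query against children, then return the distinct parents."""
    query_words = set(query.lower().split())
    # Word overlap is a stand-in for embedding similarity (assumption).
    ranked = sorted(
        children,
        key=lambda child: len(query_words & set(child[0].lower().split())),
        reverse=True,
    )
    seen, results = set(), []
    for _, parent_id in ranked:
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])
        if len(results) == k:
            break
    return results


# Hypothetical corpus of two parent documents.
parents = {
    "a": "cats purr when happy. cats sleep a lot.",
    "b": "dogs bark at strangers. dogs fetch balls.",
}
children = build_child_index(parents)

print(retrieve_parents("why do dogs bark", children, parents, k=1))
```

The match is made on a small, focused chunk, but the text handed to the LLM is the full parent, which carries the surrounding context.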

28. What is HyDE (Hypothetical Document Embeddings) and how does it improve retrieval for vague or abstract queries?

Level: Expert | Frequency: High

29. What is Self-Query Retrieval and how does it allow the LLM to generate structured metadata filters alongside the semantic query?

Level: Expert | Frequency: High

30. How do you implement hybrid search combining dense vector search with BM25 keyword search using EnsembleRetriever?

Level: Expert | Frequency: High

31. What is a Re-ranker (cross-encoder) and where does it fit in the RAG pipeline after initial retrieval?

Level: Expert | Frequency: High

32. What is the difference between Stuff, MapReduce, Refine, and MapRerank document chain strategies — when do you use each?

Level: Expert | Frequency: High

33. How do you build a Conversational RAG chain that maintains chat history and reformulates follow-up questions into standalone queries?

Level: Expert | Frequency: High

34. What is query decomposition and how do you break a complex multi-part question into sub-queries for better retrieval?

Level: Expert | Frequency: High

35. How do you implement Step-Back Prompting in a RAG pipeline to improve retrieval for highly specific questions?

Level: Expert | Frequency: High

36. What is CRAG (Corrective RAG) and how does it add a grading step to decide whether retrieved docs are relevant before answering?

Level: Expert | Frequency: High

37. What is Self-RAG and how does the LLM decide when to retrieve, whether retrieved docs are relevant, and whether the answer is grounded?

Level: Expert | Frequency: High

38. How do you implement a fallback strategy when retrieval returns no relevant documents — how do you avoid hallucination in this case?

Level: Expert | Frequency: High

39. How do you evaluate a RAG pipeline, and which metrics (such as faithfulness, answer relevancy, and context recall) do you measure using RAGAS?

Level: Expert | Frequency: High

40. How do you detect and mitigate hallucination in RAG outputs — what role does citation and source grounding play?

Level: Expert | Frequency: High

41. How do you build a citation system that maps each sentence in the LLM's answer back to the exact source chunk it came from?

Level: Expert | Frequency: High

42. How do you handle multilingual RAG — embedding and retrieving documents in multiple languages for a global user base?

Level: Expert | Frequency: High

43. How do you optimize retrieval latency in production — what caching, pre-fetching, or index optimization strategies do you apply?

Level: Expert | Frequency: High

44. How do you implement access control at the retrieval layer — ensuring users only retrieve documents they are authorized to see?

Level: Expert | Frequency: High

45. How do you handle long context RAG — when retrieved chunks exceed the LLM's context window, what strategies do you apply?

Level: Expert | Frequency: High

46. How do you design a RAG pipeline with LangGraph — turning retrieval, grading, and generation into discrete stateful graph nodes?

Level: Expert | Frequency: High
