
9th of 46 Questions.

What is the difference between RecursiveCharacterTextSplitter and CharacterTextSplitter — when would you use one over the other?

RecursiveCharacterTextSplitter tries an ordered list of separators (paragraphs, then lines, then words, then characters) so that chunks break at natural boundaries whenever the size limit allows. CharacterTextSplitter splits on a single separator only and cannot fall back to a finer one when a piece is too large.

CharacterTextSplitter is the simpler of the two. It splits text on a single user-provided separator (default "\n\n") and then merges the resulting pieces into chunks of up to `chunk_size` characters. It is straightforward but blunt: it never cuts inside a piece, so any piece longer than `chunk_size` (for example, a very long paragraph) is kept whole as an oversized chunk and LangChain only logs a warning. Chunk sizes are therefore not guaranteed.

RecursiveCharacterTextSplitter is smarter and more versatile. It uses an ordered list of separators (default ["\n\n", "\n", " ", ""]) and applies them recursively: it first splits on double newlines (paragraphs), and any piece that is still too long is split again on single newlines, then on spaces (words), and finally on individual characters. This keeps paragraphs, lines, and words intact as far as the size limit allows, which tends to give the LLM more coherent context in each chunk.
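To make the single-separator behavior concrete, here is a simplified pure-Python sketch of the idea behind CharacterTextSplitter: split on one separator, then greedily merge pieces up to `chunk_size`. It is not LangChain's source; the function name `split_on_separator` and the sample text are hypothetical, and the sketch ignores `chunk_overlap` and whitespace stripping.

```python
def split_on_separator(text: str, separator: str = "\n\n", chunk_size: int = 40) -> list[str]:
    """Split on one separator, then greedily merge pieces into chunks of at most chunk_size.

    Simplified sketch (hypothetical helper): no chunk_overlap, no whitespace stripping.
    A piece that is already longer than chunk_size is never cut; it becomes an
    oversized chunk because there is no finer separator to fall back on.
    """
    pieces = list(text) if separator == "" else [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []
    length = 0  # length of separator.join(current)
    for piece in pieces:
        # Extra length if this piece is appended (plus a separator when current is non-empty).
        added = len(piece) + (len(separator) if current else 0)
        if current and length + added > chunk_size:
            # Current chunk is full: emit it and start a new one with this piece.
            chunks.append(separator.join(current))
            current, length = [], 0
            added = len(piece)
        current.append(piece)
        length += added
    if current:
        chunks.append(separator.join(current))
    return chunks


SAMPLE = (
    "Short intro.\n\n"
    "This paragraph is deliberately much longer than the forty-character limit.\n\n"
    "End."
)

if __name__ == "__main__":
    for chunk in split_on_separator(SAMPLE, "\n\n", chunk_size=40):
        print(len(chunk), repr(chunk))
```

With `chunk_size=40`, the middle paragraph comes back as a single 74-character chunk: the sketch, like LangChain's CharacterTextSplitter, has no way to split it further without a different separator.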

Example: `CharacterTextSplitter` vs `RecursiveCharacterTextSplitter`
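A side-by-side comparison can be sketched in plain Python without installing LangChain. The sketch below is a simplified, hypothetical re-implementation of the recursive strategy (`recursive_split`, `DEFAULT_SEPARATORS`, and the sample text are illustrative names, not LangChain's code, and `chunk_overlap` is ignored). It contrasts the recursive result with a plain `split("\n\n")`, which is what a single paragraph separator gives you before any merging.

```python
DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""]


def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Simplified sketch of recursive character splitting (no chunk_overlap).

    Split on the first separator that occurs in the text, greedily merge the pieces
    up to chunk_size, and recurse into any piece that is still too long using the
    remaining, finer separators. The "" separator falls back to single characters.
    """
    if not separators:
        return [text]
    # First separator present in the text; "" always matches.
    idx = next(
        (i for i, s in enumerate(separators) if s == "" or s in text),
        len(separators) - 1,
    )
    sep, finer = separators[idx], separators[idx + 1:]
    pieces = list(text) if sep == "" else [p for p in text.split(sep) if p]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Emit the chunk in progress, then break the oversized piece down further.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, finer, chunk_size))
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


SAMPLE = (
    "LangChain splits text into chunks.\n\n"
    "Recursive splitting keeps paragraphs whole when they fit, "
    "and falls back to lines and words otherwise."
)

if __name__ == "__main__":
    # Single-separator view: the second paragraph is one 102-character piece.
    print([len(p) for p in SAMPLE.split("\n\n")])
    # Recursive view: every chunk fits in 40 characters and no word is cut.
    for chunk in recursive_split(SAMPLE, DEFAULT_SEPARATORS, chunk_size=40):
        print(len(chunk), repr(chunk))
```

With LangChain installed, the library equivalents are `CharacterTextSplitter(separator="\n\n", chunk_size=40, chunk_overlap=0)` and `RecursiveCharacterTextSplitter(chunk_size=40, chunk_overlap=0)`, each called via `.split_text(text)`. `chunk_overlap=0` is set explicitly because the default overlap (200) is larger than this small `chunk_size`. Their exact output can differ from these sketches in details such as how separators and surrounding whitespace are kept.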

As a rule of thumb, use RecursiveCharacterTextSplitter for most general text-splitting tasks; LangChain's documentation recommends it for generic text because it balances simplicity with respect for the text's structure. Use CharacterTextSplitter only when your text has a consistent, simple separator (such as double newlines between paragraphs) and you can tolerate occasional chunks that exceed `chunk_size`. RecursiveCharacterTextSplitter is also preferred for splitting source code, because it can be configured with language-specific separators (for example via `RecursiveCharacterTextSplitter.from_language`) that split on constructs such as class and function definitions before falling back to lines and words.
