You trim messages by intelligently discarding older, less relevant messages while preserving the system prompt and the most recent turns, using LangChain's trim_messages utility that can keep message counts or token budgets and optionally summarize instead of dropping.
LangChain provides a built-in trim_messages helper (from langchain_core.messages.utils) that implements various trimming strategies. You can trim by a fixed number of messages, by token count, or keep only the most recent turns while always preserving the system message. You can also configure whether to include or exclude the last message. For token-based trimming, you need a tokenizer (e.g., using tiktoken). To avoid losing context, you can optionally enable summarization for trimmed messages, replacing them with a condensed summary.
For production agents, you may want to implement a sliding window memory that keeps the last N turns while summarizing older turns. LangChain's trim_messages with strategy='last' and allow_partial=False is the simplest approach. For token-based trimming, you must provide a token_counter function that returns the number of tokens for a list of messages.