
25th of 32 Questions.

How do you handle token counting per message — accounting for role overhead, tool schemas, and system prompt tokens — to accurately predict context usage?

Use the tokenizer that matches the provider's model (e.g., tiktoken for OpenAI models) to count tokens per message, then sum the content tokens, the per-message role overhead, the tool definitions, and the system prompt. In LangChain, chat models expose a get_num_tokens_from_messages method that performs this count for a list of messages.
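A minimal sketch of per-message counting in Python. It assumes the framing constants from OpenAI's tiktoken cookbook recipe for recent chat models (3 tokens per message, 1 extra when a name field is set, 3 tokens priming the assistant reply); these are model-dependent assumptions, not guarantees. The function name count_message_tokens is hypothetical, and the tokenizer is passed in as a callable so the sketch does not hard-code a model.

```python
from typing import Callable, Dict, List, Sequence

# Any tokenizer mapping text -> token ids (e.g. tiktoken's encoding.encode).
Encoder = Callable[[str], Sequence[int]]


def count_message_tokens(
    messages: List[Dict[str, str]],
    encode: Encoder,
    tokens_per_message: int = 3,  # assumed framing overhead per message
    tokens_per_name: int = 1,     # assumed extra cost when "name" is present
    reply_priming: int = 3,       # assumed tokens that start the assistant reply
) -> int:
    """Estimate prompt tokens for a chat request: content plus role overhead."""
    total = 0
    for message in messages:
        # Fixed framing tokens wrapped around every message.
        total += tokens_per_message
        for key, value in message.items():
            # The role string ("system", "user", ...) is tokenized as well as the content.
            total += len(encode(value))
            if key == "name":
                total += tokens_per_name
    # Every request ends with tokens that prime the assistant's reply.
    total += reply_priming
    return total


# Example with tiktoken (not needed by the function itself):
#   import tiktoken
#   enc = tiktoken.encoding_for_model("gpt-4o")
#   count_message_tokens(messages, enc.encode)
```

The system prompt needs no special case here: it is sent as a message with role "system", so it picks up the same framing overhead as any other message.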

Counting only content tokens underestimates usage, because each message also carries role overhead (a few formatting tokens around, e.g., {"role": "user"}), and any tool definitions (schemas) sent with the request consume tokens too. In LangChain, get_num_tokens_from_messages is a method on the chat model instance (for example ChatOpenAI) rather than a standalone helper in langchain_core.messages.utils: it takes a list of messages and returns the total token count using the tokenizer for the model the instance was created with (e.g., model="gpt-4o"). The model name therefore has to be set correctly for the tokenizer to match. For custom tool schemas, count the JSON string of each tool definition with the same tokenizer, treating the result as an estimate of how the provider formats tools.
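Tool schemas and the overall context budget can be estimated the same way. This sketch assumes tools are plain JSON-schema dicts and tokenizes each one as compact JSON; providers may render tools in their own internal format, so the count is an approximation to calibrate against reported usage. The names count_tool_tokens and remaining_context are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List, Sequence

# Any tokenizer mapping text -> token ids (e.g. tiktoken's encoding.encode).
Encoder = Callable[[str], Sequence[int]]


def count_tool_tokens(tools: List[Dict[str, Any]], encode: Encoder) -> int:
    """Approximate tool-schema cost by tokenizing each definition as compact JSON."""
    total = 0
    for tool in tools:
        # Compact separators avoid counting cosmetic whitespace;
        # sort_keys keeps the estimate stable across dict orderings.
        serialized = json.dumps(tool, separators=(",", ":"), sort_keys=True)
        total += len(encode(serialized))
    return total


def remaining_context(
    message_tokens: int,     # all messages, including the system prompt
    tool_tokens: int,        # estimate from count_tool_tokens
    max_output_tokens: int,  # space reserved for the model's reply
    context_window: int,     # the model's total context size
) -> int:
    """Tokens left after the prompt and reserved output; negative means overflow."""
    return context_window - (message_tokens + tool_tokens + max_output_tokens)
```

A negative result from remaining_context means the request would overflow once room for the reply is reserved, which is the point at which to trim history or drop tools.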

Counting Tokens for Messages and Tools

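In production, you can also check the token usage the provider reports for each request. LangChain chat models attach usage_metadata to the returned AIMessage (input_tokens, output_tokens, total_tokens), and LangSmith traces show the same per-run counts. Because these numbers come from the provider, they include all overhead (role framing, tool schemas, system prompt), which makes them a reliable check on local estimates.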
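Reported usage can be fed back to correct local estimates. A minimal sketch, assuming usage_metadata is the dict LangChain attaches to an AIMessage (with input_tokens, output_tokens, total_tokens); the OverheadCalibrator class is hypothetical and simply adds the average observed gap to future estimates.

```python
from dataclasses import dataclass, field
from typing import List, Mapping


@dataclass
class OverheadCalibrator:
    """Tracks the gap between local estimates and provider-reported input tokens."""

    gaps: List[int] = field(default_factory=list)

    def record(self, predicted_input_tokens: int, usage_metadata: Mapping[str, int]) -> int:
        # The provider's prompt-side count is stored under "input_tokens".
        gap = usage_metadata["input_tokens"] - predicted_input_tokens
        self.gaps.append(gap)
        return gap

    def adjust(self, predicted_input_tokens: int) -> int:
        # With no observations yet, return the raw estimate unchanged.
        if not self.gaps:
            return predicted_input_tokens
        mean_gap = round(sum(self.gaps) / len(self.gaps))
        return predicted_input_tokens + mean_gap
```

In a LangChain app this would follow a call such as msg = llm.invoke(messages) with calibrator.record(estimate, msg.usage_metadata), skipping the record step when usage_metadata is None because the provider did not report usage.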

https://python.langchain.com/docs/how_to/chat_token_usage_tracking/
