RAG Context Window Calculator
Quickly estimate whether your chunking strategy fits a model context window while leaving room for system prompts and user queries.
Examples
Balanced setup for a mid-size context model
{
"model_context_tokens": 128000,
"reserved_tokens": 8000,
"chunk_size_tokens": 1000,
"chunk_overlap_percent": 15,
"top_k": 8
}

Expected output

{"retrieval_tokens":6800,"fits_context":true,"headroom_tokens":113200,"note":"Good baseline; consider top_k 6-8 for latency."}

Over-budget retrieval configuration
{
"model_context_tokens": 32000,
"reserved_tokens": 6000,
"chunk_size_tokens": 2000,
"chunk_overlap_percent": 30,
"top_k": 20
}

Expected output

{"retrieval_tokens":28000,"fits_context":false,"overflow_tokens":2000,"note":"Reduce top_k or chunk size."}

How it works
The tool estimates the effective tokens each chunk contributes after discounting overlap (chunk_size × (1 − overlap%)), multiplies that by top_k, and compares the total against the context remaining once the reserved prompt budget is subtracted from the model's window. It reports headroom when the retrieval budget fits and overflow when it does not, so you can tune chunk size and top_k before indexing large corpora.
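The calculation described above can be sketched in a few lines of Python. This is a hedged reconstruction that reproduces the numbers in the examples: the function name and output keys mirror the example JSON but are assumptions, and the real tool may round overlap differently.

```python
# Sketch of the calculator's estimate. Function name and dict keys are
# assumptions based on the example output shown on this page.
def estimate_retrieval(model_context_tokens: int, reserved_tokens: int,
                       chunk_size_tokens: int, chunk_overlap_percent: int,
                       top_k: int) -> dict:
    # Effective tokens a chunk contributes once overlap is discounted.
    # Integer math avoids floating-point rounding surprises.
    effective_per_chunk = chunk_size_tokens * (100 - chunk_overlap_percent) // 100
    retrieval_tokens = effective_per_chunk * top_k
    # Context remaining after the reserved system-prompt/query budget.
    available = model_context_tokens - reserved_tokens
    if retrieval_tokens <= available:
        return {"retrieval_tokens": retrieval_tokens,
                "fits_context": True,
                "headroom_tokens": available - retrieval_tokens}
    return {"retrieval_tokens": retrieval_tokens,
            "fits_context": False,
            "overflow_tokens": retrieval_tokens - available}
```

Running it on the two example inputs above yields the same retrieval totals, headroom, and overflow as the expected outputs.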