
RAG Context Window Calculator

Quickly estimate whether your chunking strategy fits a model context window while leaving room for system prompts and user queries.


Examples

Balanced setup for a mid-size context model

{
  "model_context_tokens": 128000,
  "reserved_tokens": 8000,
  "chunk_size_tokens": 1000,
  "chunk_overlap_percent": 15,
  "top_k": 8
}

Expected output

{
  "retrieval_tokens": 6800,
  "fits_context": true,
  "headroom_tokens": 113200,
  "note": "Good baseline; consider top_k 6-8 for latency."
}

Over-budget retrieval configuration

{
  "model_context_tokens": 32000,
  "reserved_tokens": 6000,
  "chunk_size_tokens": 2000,
  "chunk_overlap_percent": 30,
  "top_k": 20
}

Expected output

{
  "retrieval_tokens": 28000,
  "fits_context": false,
  "overflow_tokens": 2000,
  "note": "Reduce top_k or chunk size."
}

How it works

The tool estimates the effective tokens each retrieved chunk contributes after overlap (chunk_size_tokens × (100 − chunk_overlap_percent) / 100), multiplies that by top_k, and compares the total against the context remaining after the reserved prompt budget (model_context_tokens − reserved_tokens). It reports headroom when the retrieval budget fits and overflow when it does not, so you can tune chunk size and top_k before indexing large corpora.
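The arithmetic can be sketched in a few lines of Python. `estimate_retrieval` is a hypothetical name, and the overlap discount is inferred from the example outputs above rather than taken from the tool's source; the sketch also omits the advisory "note" field.

```python
def estimate_retrieval(model_context_tokens, reserved_tokens,
                       chunk_size_tokens, chunk_overlap_percent, top_k):
    """Estimate the retrieval token budget for a RAG configuration.

    Integer arithmetic keeps percent-based overlap math exact.
    """
    # Each retrieved chunk contributes its size minus the overlapped share.
    retrieval_tokens = (top_k * chunk_size_tokens
                        * (100 - chunk_overlap_percent)) // 100
    # Context left over once the system prompt / query budget is reserved.
    available = model_context_tokens - reserved_tokens
    if retrieval_tokens <= available:
        return {"retrieval_tokens": retrieval_tokens,
                "fits_context": True,
                "headroom_tokens": available - retrieval_tokens}
    return {"retrieval_tokens": retrieval_tokens,
            "fits_context": False,
            "overflow_tokens": retrieval_tokens - available}
```

Running the two example configurations reproduces the token figures shown in the Examples section: 6800 tokens with 113200 of headroom for the balanced setup, and 28000 tokens with a 2000-token overflow for the over-budget one.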
