Context Window: The Working Memory of AI Models Explained

Context Window: The maximum token amount an AI model can process at once. Claude offers 200,000 tokens. Tips on Auto-Compact & context management.

Category:AI & Machine Learning

The context window refers to the maximum amount of text that a large language model (LLM) can consider simultaneously in a single processing pass. It is the AI's working memory – everything within this limit is "visible" to the model and can inform the response. Everything outside it simply does not exist for the model at that moment.

The size of the context window is measured in tokens. A token roughly corresponds to a word or word fragment – approximately 0.75 words per token in English. A page of normal text contains around 750 tokens. The contents of the context window encompass everything: system instructions, the conversation history so far, loaded files, results of tool calls and the actual user input.

The Development of Context Window Sizes

Context window sizes have changed dramatically since the early GPT models. This development is one of the central technical advances that first made professional Agentic Coding possible:

Early Models (2020–2022): 4,000–8,000 Tokens

  • GPT-3 (2020): 4,096 tokens – approximately 5–6 pages of text
  • GPT-3.5 Turbo (2022): 4,096 tokens, later 16,384 tokens
  • Limitation: Only small code files or short conversations possible
  • Consequence: AI assistants would regularly "forget" earlier parts of a session

Middle Phase (2023): 32,000–100,000 Tokens

  • GPT-4 (2023): 8,192 tokens, later 32,768 tokens
  • Claude 2 (2023): 100,000 tokens – a breakthrough
  • Possibility: Entire modules and multiple files readable simultaneously
  • First Agentic Coding workflows emerge

Current Generation (2024–2025): 200,000+ Tokens

  • Claude 3 / Claude 3.5 Sonnet: 200,000 tokens
  • GPT-4o: 128,000 tokens
  • Gemini 1.5 Pro: up to 1 million tokens
  • Possibility: Large parts of a codebase, complete books, extensive documentation

200,000 tokens correspond to approximately 150,000 words or the contents of 500–600 average source code files. This means an AI assistant today can survey substantially larger portions of a real software project at once.

Why the Context Window Is Critical for Agentic Coding

The Problem of Small Context Windows

When the context window is too small for the task, typical problems arise:

  • Forgetting instructions: Rules from the start of the session are no longer active
  • Inconsistent changes: The AI does not know all relevant files and produces contradictory code
  • Lost dependencies: Interfaces, types and function signatures from other files are unknown
  • Truncated responses: The model cannot generate a complete response because the context is full

What a Large Context Window Enables

With a sufficient context window, an AI assistant can consider the entire relevant environment for coding tasks:

  • The complete CLAUDE.md with all project rules is always active
  • Multiple related files are readable simultaneously (frontend component + backend route + data model)
  • The full conversation history remains available
  • Results of multiple tool calls can be considered simultaneously
  • Long error messages, logs and stack traces fit completely into context

How Tokens Are Counted

Not all content uses the same number of tokens:

  • Normal text (English): ~0.75 words per token
  • Source code: Tokenized efficiently, as many characters have their own tokens
  • JSON/YAML: Structured data often uses more tokens than the pure information content might suggest
  • Images (multimodal models): 85–1,500+ tokens per image depending on resolution

Input Tokens vs. Output Tokens

The context window refers to input tokens – everything the model receives as input. The maximum response length (output tokens) is a separate limit that varies by model and plan:

  • Input tokens: System prompt + chat history + loaded files + current request
  • Output tokens: The generated response (typically 4,000–8,192 token maximum)
  • Total limit: Input + output must not exceed the context window

Context Window Management in Practice

Auto-Compact (Claude Code)

Claude Code features automatic context management. When the token limit of a session is reached, the auto-compact mechanism intelligently compresses the previous conversation history:

  • Older, less relevant parts are summarized
  • Critical information (project rules, current task, important decisions) is retained
  • The session can continue seamlessly without manual intervention
  • No knowledge from the CLAUDE.md is lost as it is always re-included

Session Handover

For very long work phases, an explicit session handover may be more appropriate than auto-compact:

  • The current state is documented in a structured file
  • The new session reads this file first
  • The context starts "clean" without compressed remnants
  • Good for complex projects with many parallel tasks

Selective Context Loading

Good context engineering loads only relevant files into context:

  • For a frontend feature: only the affected Vue components and the associated service
  • For an API change: route, controller, service and data model – not the entire backend
  • For debugging: the error message, the affected file and direct dependencies

Context Window and Response Quality

Lost in the Middle

Research shows that large language models process information at the beginning and end of the context window better than information in the middle. This phenomenon is called "Lost in the Middle". In practice this means:

  • Important instructions should be at the start (system prompt)
  • The current task should be clearly formulated at the end (current user turn)
  • Less critical reference data can be placed in the middle

Context Relevance vs. Context Size

More context is not always better. A very large context window filled with irrelevant information can reduce the quality of responses. Optimal context engineering means: loading the right context, not the largest context.

Agentic Coding Workshop: Using the Context Window Effectively

In the Agentic Coding Workshop at elasticbrains you will learn how to use the context window optimally:

  • Context engineering: which files to load into context and when
  • Structuring CLAUDE.md so that important rules are always active
  • Using auto-compact and session handover effectively
  • Optimizing project structure for better AI collaboration
  • Practical examples from real projects with 200,000-token sessions

Further Resources

Agentic Coding Workshop

Learn this topic hands-on in our workshop - with real projects and experienced trainers.

View Workshop

More Glossary Terms