Context Window: The Working Memory of AI Models Explained

Context Window: The maximum number of tokens an AI model can process at once. Claude offers 200,000 tokens. Tips on auto-compact and context management.

Category: AI & Machine Learning

The context window refers to the maximum amount of text that a large language model (LLM) can consider simultaneously in a single processing pass. It is the AI's working memory – everything within this limit is "visible" to the model and can inform the response. Everything outside it simply does not exist for the model at that moment.

The size of the context window is measured in tokens. A token roughly corresponds to a word or word fragment – approximately 0.75 words per token in English. A page of normal text contains around 750 tokens. The contents of the context window encompass everything: system instructions, the conversation history so far, loaded files, results of tool calls and the actual user input.
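The rule of thumb above can be turned into a rough estimator. This is a minimal sketch that assumes the 0.75 words-per-token ratio holds for the text in question; the function name `estimate_tokens` is made up for illustration, and a real tokenizer will give different counts, especially for code or non-English text.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb for English text, taken from the paragraph above


def estimate_tokens(text: str) -> int:
    """Estimate the token count of English text from its word count."""
    word_count = len(text.split())
    return math.ceil(word_count / WORDS_PER_TOKEN)


sample = "The context window is the working memory of a language model."
print(estimate_tokens(sample))  # 11 words -> 15 estimated tokens
```

An estimate like this is good enough for budgeting; for exact numbers, the tokenizer of the specific model has to be used.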

The Development of Context Window Sizes

Context window sizes have changed dramatically since the early GPT models. This development is one of the central technical advances that first made professional Agentic Coding possible:

Early Models (2020–2022): 2,000–4,000 Tokens

GPT-3 (2020): 2,048 tokens – approximately 3 pages of text
GPT-3.5 Turbo (2022): 4,096 tokens, later 16,384 tokens
Limitation: Only small code files or short conversations possible
Consequence: AI assistants would regularly "forget" earlier parts of a session

Middle Phase (2023): 8,000–100,000 Tokens

GPT-4 (2023): 8,192 tokens, with a 32,768-token variant
Claude 2 (2023): 100,000 tokens – a breakthrough
Possibility: Entire modules and multiple files readable simultaneously
First Agentic Coding workflows emerge

Current Generation (2024–2025): 200,000+ Tokens

Claude 3 / Claude 3.5 Sonnet: 200,000 tokens
GPT-4o: 128,000 tokens
Gemini 1.5 Pro: up to 1 million tokens
Possibility: Large parts of a codebase, complete books, extensive documentation

200,000 tokens correspond to approximately 150,000 words, or the contents of 500–600 small source code files at roughly 350–400 tokens each. Larger files reduce that count accordingly. Even so, an AI assistant today can survey substantially larger portions of a real software project at once.
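The conversion can be reproduced with a few lines of arithmetic. The sketch below uses only the two rules of thumb from this article (0.75 words per token, about 750 tokens per page) and window sizes listed above; the helper name `window_capacity` is an assumption for illustration, and the results are rough orders of magnitude, not measurements.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from this article
TOKENS_PER_PAGE = 750   # rule of thumb from this article

# Window sizes as listed in the overview above
CONTEXT_WINDOWS = {
    "GPT-4 (8k)": 8_192,
    "Claude 2": 100_000,
    "GPT-4o": 128_000,
    "Claude 3.5 Sonnet": 200_000,
}


def window_capacity(tokens: int) -> tuple[int, int]:
    """Return (approximate words, approximate pages) for a window size."""
    return round(tokens * WORDS_PER_TOKEN), round(tokens / TOKENS_PER_PAGE)


for model, tokens in CONTEXT_WINDOWS.items():
    words, pages = window_capacity(tokens)
    print(f"{model}: {tokens:,} tokens ~ {words:,} words ~ {pages} pages")
```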

Why the Context Window Is Critical for Agentic Coding

The Problem of Small Context Windows

When the context window is too small for the task, typical problems arise:

Forgetting instructions: Rules from the start of the session are no longer active
Inconsistent changes: The AI does not know all relevant files and produces contradictory code
Lost dependencies: Interfaces, types and function signatures from other files are unknown
Truncated responses: The model cannot generate a complete response because the context is full

What a Large Context Window Enables

With a sufficient context window, an AI assistant can consider the entire relevant environment for coding tasks:

The complete CLAUDE.md with all project rules is always active
Multiple related files are readable simultaneously (frontend component + backend route + data model)
The full conversation history remains available
Results of multiple tool calls can be considered simultaneously
Long error messages, logs and stack traces fit completely into context

How Tokens Are Counted

Not all content uses the same number of tokens:

Normal text (English): ~0.75 words per token
Source code: Often uses more tokens per character than prose, because symbols, punctuation and indentation are frequently split into separate tokens
JSON/YAML: Structured data often uses more tokens than the pure information content might suggest
Images (multimodal models): 85–1,500+ tokens per image depending on resolution

Input Tokens vs. Output Tokens

The context window covers input and output together: everything the model receives plus the response it generates. In addition, the maximum response length (output tokens) has its own, smaller limit that varies by model and plan:

Input tokens: System prompt + chat history + loaded files + current request
Output tokens: The generated response (typically capped at 4,000–8,192 tokens)
Total limit: Input + output must not exceed the context window
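The total limit can be checked before a request is sent. The following is a minimal sketch, assuming token counts for each part are already known; the class `TokenBudget` and its method names are hypothetical, and the example figures (200,000 and 8,192) are the ones mentioned in this article.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    context_window: int     # total window, e.g. 200_000
    max_output_tokens: int  # separate cap on the response, e.g. 8_192

    def max_input_tokens(self) -> int:
        """Input space left once room for a full-length response is reserved."""
        return self.context_window - self.max_output_tokens

    def fits(self, system: int, history: int, files: int, request: int) -> bool:
        """True if all input parts plus the reserved output fit into the window."""
        return system + history + files + request <= self.max_input_tokens()


budget = TokenBudget(context_window=200_000, max_output_tokens=8_192)
print(budget.max_input_tokens())                 # 191808
print(budget.fits(2_000, 120_000, 60_000, 500))  # True
print(budget.fits(2_000, 150_000, 60_000, 500))  # False
```

Reserving the full output cap up front is a conservative choice: it guarantees that a response is never cut off because the input consumed the remaining space.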

Context Window Management in Practice

Auto-Compact (Claude Code)

Claude Code features automatic context management. When a session approaches its token limit, the auto-compact mechanism compresses the previous conversation history by summarizing it:

Older, less relevant parts are summarized
Critical information (project rules, current task, important decisions) is retained
The session can continue seamlessly without manual intervention
Nothing from CLAUDE.md is lost, because the file is always re-included
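The general idea behind such compaction can be illustrated with a toy example. This is not Claude Code's actual implementation, which is not described here; it is a sketch under simple assumptions: token counts are estimated from word counts, `summarize` is a placeholder for a model-generated summary, and all names (`Message`, `compact`, `pinned`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str
    pinned: bool = False  # e.g. project rules that must survive compaction


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: about 0.75 words per token
    return round(len(text.split()) / 0.75)


def summarize(messages: list[Message]) -> str:
    # Placeholder: a real system would ask the model to write a summary
    return f"[Summary of {len(messages)} earlier messages]"


def compact(history: list[Message], limit: int, keep_recent: int = 2) -> list[Message]:
    """Replace older unpinned messages with one summary if the limit is exceeded."""
    total = sum(estimate_tokens(m.text) for m in history)
    if total <= limit:
        return history
    pinned = [m for m in history if m.pinned]
    unpinned = [m for m in history if not m.pinned]
    split = max(len(unpinned) - keep_recent, 0)
    old, recent = unpinned[:split], unpinned[split:]
    if not old:
        return history  # nothing left to summarize
    return pinned + [Message("system", summarize(old))] + recent


history = [
    Message("system", "Project rules: use TypeScript strict mode", pinned=True),
    Message("user", "Please add a login form " * 10),
    Message("assistant", "Added the login form component " * 10),
    Message("user", "Now add validation"),
    Message("assistant", "Validation added"),
]
compacted = compact(history, limit=50)
for message in compacted:
    print(message.role, "|", message.text)
```

Pinned messages are moved to the front and kept verbatim, the two most recent messages stay untouched, and everything in between collapses into a single summary entry.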

Session Handover

For very long work phases, an explicit session handover may be more appropriate than auto-compact:

The current state is documented in a structured file
The new session reads this file first
The context starts "clean" without compressed remnants
Good for complex projects with many parallel tasks
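A handover file can be as simple as a small JSON document. The sketch below is one possible layout, not a prescribed format: the field names (`task`, `done`, `open_items`, `decisions`), the file name and both helper functions are assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_handover(path: Path, task: str, done: list[str],
                   open_items: list[str], decisions: list[str]) -> None:
    """Document the current state of the session in a structured file."""
    state = {"task": task, "done": done, "open_items": open_items, "decisions": decisions}
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def read_handover(path: Path) -> str:
    """Render the handover file as the opening message of the new session."""
    state = json.loads(path.read_text(encoding="utf-8"))
    lines = [f"Current task: {state['task']}"]
    for title, key in (("Done", "done"), ("Open", "open_items"), ("Decisions", "decisions")):
        lines.append(f"{title}:")
        lines.extend(f"- {item}" for item in state[key])
    return "\n".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    handover = Path(tmp) / "handover.json"
    write_handover(
        handover,
        task="Migrate login flow to the new auth service",
        done=["Route updated", "Service client added"],
        open_items=["Update frontend form", "Write tests"],
        decisions=["Keep session cookies, no JWT"],
    )
    print(read_handover(handover))
```

The new session receives only this compact summary instead of the full, partially compressed history of the old one.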

Selective Context Loading

Good context engineering loads only relevant files into context:

For a frontend feature: only the affected Vue components and the associated service
For an API change: route, controller, service and data model – not the entire backend
For debugging: the error message, the affected file and direct dependencies
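Selective loading can be approximated with a simple filter. The following sketch assumes a hypothetical project layout and a hand-written mapping from task type to file patterns; the paths, the token counts and the function `select_files` are illustrative only. In practice the selection would follow the files actually affected rather than fixed patterns.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from task type to file patterns
TASK_PATTERNS = {
    "frontend": ["src/components/*.vue", "src/services/*.ts"],
    "api": ["backend/routes/*", "backend/controllers/*",
            "backend/services/*", "backend/models/*"],
}


def select_files(task: str, files: dict[str, int], budget: int) -> list[str]:
    """Pick files matching the task's patterns without exceeding the token budget.

    `files` maps each path to its estimated token count.
    """
    patterns = TASK_PATTERNS[task]
    selected: list[str] = []
    used = 0
    for path, tokens in files.items():
        if not any(fnmatchcase(path, pattern) for pattern in patterns):
            continue  # not relevant for this task
        if used + tokens > budget:
            continue  # would exceed the budget, skip this file
        selected.append(path)
        used += tokens
    return selected


project = {
    "src/components/LoginForm.vue": 1_200,
    "src/components/Dashboard.vue": 3_000,
    "src/services/auth.ts": 800,
    "backend/routes/auth.py": 900,
    "README.md": 500,
}
print(select_files("frontend", project, budget=2_500))
# ['src/components/LoginForm.vue', 'src/services/auth.ts']
```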

Context Window and Response Quality

Lost in the Middle

Research shows that large language models process information at the beginning and end of the context window better than information in the middle. This phenomenon is called "Lost in the Middle". In practice this means:

Important instructions should be at the start (system prompt)
The current task should be clearly formulated at the end (current user turn)
Less critical reference data can be placed in the middle
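The ordering described above is easy to enforce when a prompt is assembled programmatically. This is a minimal sketch; the function `assemble_prompt` and its parameters are hypothetical and not part of any particular API.

```python
def assemble_prompt(instructions: str, reference_docs: list[str], task: str) -> str:
    """Place instructions first, reference material in the middle, the task last."""
    parts = [instructions, *reference_docs, task]
    return "\n\n".join(part.strip() for part in parts if part.strip())


prompt = assemble_prompt(
    instructions="Follow the project rules in CLAUDE.md.",
    reference_docs=["API reference excerpt", "Changelog excerpt"],
    task="Fix the failing login test.",
)
print(prompt)
```

The important instructions and the current task occupy the two positions the model attends to most reliably, while the reference material sits in between.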

Context Relevance vs. Context Size

More context is not always better. A very large context window filled with irrelevant information can reduce the quality of responses. Optimal context engineering means: loading the right context, not the largest context.

Agentic Coding Workshop: Using the Context Window Effectively

In the Agentic Coding Workshop at elasticbrains you will learn how to use the context window optimally:

Context engineering: which files to load into context and when
Structuring CLAUDE.md so that important rules are always active
Using auto-compact and session handover effectively
Optimizing project structure for better AI collaboration
Practical examples from real projects with 200,000-token sessions

Further Resources

Glossary: Context Engineering – structured management of the context window
Glossary: Agentic Coding – professional use of AI in development
Glossary: Large Language Model (LLM) – technical foundations
Glossary: Prompt Engineering – formulating effective inputs
Workshop: Agentic Coding Workshop
